Articles

Analysis of Name Entities in Text Using Robust Disambiguation Method Muthia Virliani; Moch. Arif Bijaksana; Arie Ardiyanti Suryani
SISFOTENIKA Vol 10, No 2 (2020): SISFOTENIKA
Publisher : STMIK PONTIANAK

DOI: 10.30700/jst.v10i2.963

Abstract

Named entities are proper nouns or objects mentioned in a text, such as a person's name or a country name. Personal names in a text are often ambiguous, which makes it difficult for ordinary readers to tell whether identical names refer to the same person. Name ambiguity also occurs in hadith: the name Abdullah in hadith numbers 86 and 2411, for example, may or may not refer to the same person. This study therefore focuses on named entity disambiguation, which considers the semantic and lexical relations between named entities. The results are expected to help people understand and distinguish ambiguous names. The method used in this research is Robust Disambiguation, chosen because it takes the context of the named entity into account. The output is a set of named entities grouped by whether they refer to the same person or to different persons, produced with Density-Based Spatial Clustering of Applications with Noise (DBSCAN). The research achieved an accuracy of 90%, a precision of 97%, and a recall of 89%, computed from the actual and predicted values.
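The abstract above reports accuracy, precision, and recall computed from actual and predicted values. As a minimal sketch of how such figures are usually derived from a confusion matrix, assuming binary labels where True means "same person" (the function name and the sample labels are hypothetical and not taken from the paper):

```python
def binary_metrics(actual, predicted):
    """Return (accuracy, precision, recall) for two equal-length lists of booleans."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)          # true positives
    tn = sum(1 for a, p in pairs if not a and not p)  # true negatives
    fp = sum(1 for a, p in pairs if not a and p)      # false positives
    fn = sum(1 for a, p in pairs if a and not p)      # false negatives
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Hypothetical labels: True means "the two mentions are the same person".
actual = [True, True, True, False, False]
predicted = [True, True, False, False, True]
accuracy, precision, recall = binary_metrics(actual, predicted)
```

With these made-up labels the sketch yields an accuracy of 0.6 and a precision and recall of 2/3 each.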
Analysis Name Entity Disambiguation Using Mining Evidence Method Adelya Astari; Moch. Arif Bijaksana; Arie Ardiyanti Suryani
Paradigma Vol 22, No 2 (2020): Periode September 2020
Publisher : LPPM Universitas Bina Sarana Informatika

DOI: 10.31294/p.v22i2.8196

Abstract

Hadith is the second guideline and source of Islamic teachings after the Qur'an. One of the most authentic (sahih) hadith collections is Sahih al-Bukhari. Each hadith in Sahih al-Bukhari has its own chain of narrators, hadith number, and content. The hadith tradition also has a discipline that studies the history of hadith narrators, called the Science of Rijalul Hadith. In Sahih al-Bukhari, some narrators share the same name, which causes ambiguity between names. Ordinary readers find these ambiguous names difficult to understand, because they cannot tell whether two identical names refer to the same narrator. To solve this problem, a name disambiguation solution is built that checks the name, hadith number, chain of narrators, content topic, circle, country, and the companions of the Prophet, identified from the last three names before the Prophet in the chain of narrators. The solution uses the Mining Evidence method together with several other approaches, namely document association labels, word association labels, context similarity, cosine similarity, and word2vec, to obtain similarity values between all name entities. Once the similarity values are obtained, the data are grouped using a clustering algorithm. The system is expected to perform well as measured by precision, recall, and accuracy derived from a confusion matrix.
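Cosine similarity is one of the similarity measures named in the abstract above. The following is a minimal sketch of the general technique over bag-of-words counts, not the paper's implementation; the function name, the whitespace tokenization, and the sample strings are assumptions made for illustration.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between two bag-of-words count vectors."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    # Dot product over the words the two texts share.
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical context strings for two mentions of a name.
half_overlap = cosine_similarity("narrator medina", "narrator basra")
```

Two contexts that share one of their two words, as here, score 0.5; identical contexts score 1 and disjoint contexts score 0.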
Analysis of the Commutative Method Approach on English Thesaurus for Developing Synonym Sets Arini Rohmawati; Moch. Arif Bijaksana; Kemas Muslim Lhaksmana
Indonesia Journal on Computing (Indo-JC) Vol. 4 No. 2 (2019): September, 2019
Publisher : School of Computing, Telkom University

DOI: 10.34818/INDOJC.2019.4.2.332

Abstract

WordNet is a lexical database for languages. Unlike general dictionaries, WordNet focuses on synonyms. Its main unit is the synonym set (synset), a set of one or more words that share the same meaning and can replace one another in certain contexts. Synsets are therefore an essential element in implementing WordNet. This paper analyzes the synonym extraction process using the commutative approach, with test data of 51 word entries taken from the Oxford Paperback Thesaurus. The commutative method shares a key characteristic with synonym sets: members of a synonym set can replace each other in certain contexts. The test data are processed from extraction through to performance evaluation using the F1 score. The system generates synonym sets that are matched against the manual extraction; the F1 score between the program's synonym sets and the Princeton synonym sets is 10%.
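The abstract does not spell out the commutative procedure. One plausible reading, sketched here as an assumption, is that two words form a candidate synonym pair only when each lists the other in its thesaurus entry. The function name and the toy thesaurus are hypothetical.

```python
def commutative_pairs(thesaurus):
    """Return word pairs where each word lists the other as a synonym."""
    pairs = set()
    for word, synonyms in thesaurus.items():
        for candidate in synonyms:
            # Keep the pair only if the relation also holds in reverse.
            if word in thesaurus.get(candidate, ()):
                pairs.add(frozenset((word, candidate)))
    return pairs


# Toy thesaurus entries, invented for illustration.
thesaurus = {
    "big": {"large", "huge"},
    "large": {"big"},
    "huge": {"enormous"},
}
pairs = commutative_pairs(thesaurus)
```

Here only "big" and "large" qualify, because "huge" does not list "big" in return.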
Entity Recognition for Quran English Version with Supervised Learning Approach Muhammad Aris Maulana; Moch. Arif Bijaksana; Arief Fatchul Huda
Indonesia Journal on Computing (Indo-JC) Vol. 4 No. 3 (2019): December, 2019
Publisher : School of Computing, Telkom University

DOI: 10.34818/INDOJC.2019.4.3.362

Abstract

The Quran is the Muslim holy book, consisting of 6236 ayat (verses) divided into 114 surahs (chapters). Many entities are scattered across the verses of each chapter. Without a classification process, finding a particular entity is difficult, which in turn makes the Quran harder to understand. To address this, a system can be modeled to extract information about entities in the Quran. We therefore propose a method to identify and classify entities using entity recognition. The system uses a Support Vector Machine (SVM), trained on entities from the Quran so that it can identify entities correctly. The dataset, obtained from the website tanzil.net, consists of 19,473 tokens and 720 entities. The classification scenario using a linear kernel with unigram features produces the highest F-measure, 0.75.
Development Synonym Set for the English Wordnet Using the Method of Comutative and Agglomerative Clustering Munirsyah Munirsyah; Moch. Arif Bijaksana; Widi Astuti
Jurnal Sisfokom (Sistem Informasi dan Komputer) Vol 9, No 2 (2020): JULI
Publisher : ISB Atma Luhur

DOI: 10.32736/sisfokom.v9i2.855

Abstract

WordNet is a collection of words organized by the meanings they represent, and the synonym set (synset) is a central part of its development. Building synonym sets requires both synonyms and the commutative property of words. An English thesaurus serves as the reference data for obtaining synonyms. Broadly speaking, WordNet differs from a dictionary in that the meaning of a word is related to other words; determining this equivalence requires a commutative process. The commutative method simplifies this process and produces candidate synonym sets. These candidates cannot yet be used as synsets, so the words must be grouped to produce the final synonym sets. One way to group words is clustering; this study uses agglomerative clustering. Agglomerative clustering uses a threshold value that determines the number of iterations, that is, the condition for stopping the iteration process. The clustering process in this study uses threshold values from 0.1 to 1 to find the threshold that produces the best synonym sets and to calculate their accuracy. Evaluation uses the F-measure to find the best results.
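To illustrate how a threshold can act as the stopping condition in agglomerative clustering, here is a minimal sketch. It assumes single-link merging and Jaccard similarity over synonym lists, neither of which is stated in the abstract; the function names and the toy data are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def agglomerate(items, threshold):
    """Single-link agglomerative clustering; items maps a word to its synonym set."""
    clusters = [{word} for word in items]

    def similarity(c1, c2):
        # Single link: similarity of the closest pair across the two clusters.
        return max(jaccard(items[x], items[y]) for x in c1 for y in c2)

    while len(clusters) > 1:
        best, i, j = max(
            (similarity(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if best < threshold:
            break  # no remaining pair is similar enough to merge
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters


# Toy candidate data, invented for illustration.
items = {
    "big": {"big", "large", "huge"},
    "large": {"big", "large"},
    "tiny": {"tiny", "small"},
    "small": {"tiny", "small", "little"},
}
clusters = agglomerate(items, threshold=0.5)
```

With a threshold of 0.5 the toy data collapse into two clusters; with a threshold of 1.0 nothing merges, which shows how the threshold controls when iteration stops.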
Building Synsets for Indonesian WordNet using ROCK (Robust Clustering Using Links) Algorithm Mubaroq Iqbal; Moch. Arif Bijaksana; Widi Astuti
Jurnal Sisfokom (Sistem Informasi dan Komputer) Vol 9, No 2 (2020): JULI
Publisher : ISB Atma Luhur

DOI: 10.32736/sisfokom.v9i2.853

Abstract

In the development of the Indonesian WordNet, the synonym set is an important part that represents similarity of meaning between words. Synonym sets are built using the Indonesian Thesaurus as the lexical database. Extraction from the Indonesian Thesaurus yields synonym sets whose words share a word sense. In general, a dictionary and WordNet differ in their main focus: a dictionary usually focuses on a single word, while WordNet focuses on the meaning of words and their connections to other words. In previous research, synonym sets were constructed using several approaches, namely clustering and Word Sense Disambiguation (WSD). In this article, synonym sets are produced with the ROCK (Robust Clustering Using Links) algorithm, which uses similarity and link values. The resulting synonym sets will then be used for lexical database development. The main focus of this article is therefore to produce synonym sets through clustering and to calculate their accuracy using the F-measure against a gold standard.
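ROCK relies on two quantities, similarity and links. The sketch below shows only the link computation, under these assumptions: similarity is Jaccard over synonym lists, two words are neighbours when their similarity is at least theta, each word counts as its own neighbour, and the link value of a pair is the number of neighbours they share. The function names and the toy entries are hypothetical, and the cluster-merging stage of ROCK is omitted.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rock_links(items, theta):
    """Count common neighbours for every pair; neighbours have Jaccard >= theta."""
    neighbours = {
        p: {q for q in items if jaccard(items[p], items[q]) >= theta}
        for p in items
    }
    return {
        frozenset((p, q)): len(neighbours[p] & neighbours[q])
        for p, q in combinations(items, 2)
    }


# Toy thesaurus entries, invented for illustration.
items = {
    "besar": {"besar", "agung", "raya"},
    "agung": {"besar", "agung", "akbar"},
    "raya": {"besar", "raya", "akbar"},
    "kecil": {"kecil", "mungil"},
}
links = rock_links(items, theta=0.5)
```

In this toy data "besar" and "agung" share three neighbours, while "besar" and "kecil" share none, so a link-based criterion would group the first pair and keep the second apart.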
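Implementation of Word Sense Disambiguation with the Maximal Marginal Relevance Method for Text Summarization (original title: Implementasi Word Sense Disambiguation Dengan Metode Maximal Marginal Relevance Pada Peringkasan Teks) Bening Suryani Pratiwi; Shaufiah Shaufiah; Moch. Arif Bijaksana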
eProceedings of Engineering Vol 4, No 1 (2017): April, 2017
Publisher : eProceedings of Engineering

Abstract

Summarizing a text raises problems that affect the quality of the summary, such as word ambiguity and redundancy. To improve summary quality, both ambiguity and redundancy must be addressed. This final project therefore performs single-document text summarization that implements Word Sense Disambiguation with the Maximal Marginal Relevance method. The stages are preprocessing, Word Sense Disambiguation, cosine similarity calculation, Maximal Marginal Relevance calculation, and evaluation. Preprocessing cleans the data through stopword removal, tokenization, tag removal, lemmatization, and stemming. Word Sense Disambiguation is used to resolve ambiguous terms, which are replaced with their synset terms in the summarization. Cosine similarity measures the similarity of each sentence to the sentences of the whole document. Maximal Marginal Relevance (MMR) then re-ranks the cosine similarity results and selects the sentences with the highest MMR values for the summary, at a specified compression rate. MMR is a simple yet efficient method for reducing redundancy. The automatic summaries were evaluated and analyzed using precision, recall, and F-measure, together with a reader survey of the generated summaries. The system obtained a recall of 35%, a precision of 21%, and an F-measure of 25%. Keywords: Word Sense Disambiguation, Maximal Marginal Relevance, Cosine Similarity.
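The re-ranking step of Maximal Marginal Relevance can be sketched as a greedy selection that trades relevance against redundancy. This is the standard MMR formulation rather than the project's own code: the function name, the weight `lam`, and the similarity values are assumptions, and the similarities are taken as already computed (for example by cosine similarity).

```python
def mmr_select(doc_similarity, pair_similarity, k, lam=0.7):
    """Greedy MMR: pick k sentence indices balancing relevance and novelty.

    doc_similarity[i]     similarity of sentence i to the whole document
    pair_similarity[i][j] similarity between sentences i and j
    """
    selected = []
    candidates = set(range(len(doc_similarity)))
    while candidates and len(selected) < k:
        def score(i):
            # Redundancy is the closest match among sentences already chosen.
            redundancy = max((pair_similarity[i][j] for j in selected), default=0.0)
            return lam * doc_similarity[i] - (1 - lam) * redundancy

        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical similarity values for three sentences.
doc_similarity = [0.9, 0.85, 0.4]
pair_similarity = [
    [1.0, 0.95, 0.1],
    [0.95, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
summary = mmr_select(doc_similarity, pair_similarity, k=2, lam=0.5)
```

With `lam=0.5`, sentence 0 is chosen first, and sentence 2 is preferred over sentence 1 because sentence 1 nearly duplicates sentence 0.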
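Analysis and Implementation of Phonetic String Matching Using the Metaphone Algorithm for Al-Qur’an Verse Search (original title: Analisis Dan Implementasi Pencocokan String Berdasarkan Kemiripan Pengucapan (phonetic String Matching) Menggunakan Algoritma Metaphone Dalam Pencarian Ayat Al-qur’an) Tegar Graha Adiwiguna; Moch. Arif Bijaksana; Shaufiah Shaufiah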
eProceedings of Engineering Vol 2, No 2 (2015): Agustus, 2015
Publisher : eProceedings of Engineering

Abstract

Recent research has produced digital versions of the Al-Qur’an. However, existing software generally uses only exact string matching to search for information (verses). If users mistype their input, the software does not return what they are looking for. The aim of this research is therefore to build a search system based on phonetic string matching, that is, matching words by pronunciation, to address this problem. Using the Metaphone algorithm and Dice similarity, the verse search system matches strings by pronunciation with a precision of 54% and a recall of 100%. The correlation obtained was 82%.
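Dice similarity, the scoring measure named in the abstract above, can be sketched as follows. The sketch assumes the coefficient is computed over sets of character bigrams and applies it to raw strings; in the described system it would presumably be applied to Metaphone codes, which are not implemented here. The function names and sample words are hypothetical.

```python
def bigrams(text):
    """Set of overlapping two-character substrings."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


def dice_similarity(a, b):
    """Dice coefficient over character-bigram sets: 2|A∩B| / (|A| + |B|)."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        return 0.0
    return 2 * len(x & y) / (len(x) + len(y))


# Two hypothetical spellings of the same transliterated word.
score = dice_similarity("rahman", "rohman")
```

The two spellings share three of their five bigrams each, giving a score of 0.6, whereas an exact string match would simply fail.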
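Analysis and Implementation of Phonetic-Based Al-Quran Verse Search Using the N-gram Method (original title: Analisis Dan Implementasi Pencarian Ayat Al-quran Berbasis Fonetis Menggunakan Metode N-gram) Muhammad Fakhri Ar-Razi; Moch. Arif Bijaksana; Shaufiah Shaufiah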
eProceedings of Engineering Vol 2, No 2 (2015): Agustus, 2015
Publisher : eProceedings of Engineering

Abstract

Finding a verse in the Al-Qur’an is not easy for users who lack sufficient knowledge of and proficiency in Arabic. Phonetic search can therefore make it easier for users to find verses according to how they pronounce and write them. This final project aims to build such a search system, specifically for Indonesian speakers. An n-gram method combined with phonetic encoding of Quran recitation rules is proposed to match the transliterated Al-Qur’an text, converted to Latin script (following Indonesian pronunciation), against user queries in Latin script. Trigrams are indexed and used for approximate string matching. The system uses two search schemes, with vowels and without vowels; the two were compared, and search with vowels performed better. It also uses two ranking methods, trigram count and trigram position. Testing yielded fairly good precision: 0.746 for the search scheme with vowels, compared with 0.515 for the scheme without vowels. After combining the two ranking methods and using the search scheme with vowels, the system achieved a recall of 0.79 and a fairly high correlation of 0.907, and it also handled a wide variety of query variations well.
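The trigram indexing and count-based ranking described above can be sketched as follows. This is a simplified illustration, not the project's system: it omits the phonetic encoding and the position-based ranking, and the function names, document ids, and transliterated strings are hypothetical.

```python
from collections import defaultdict


def trigrams(text):
    """Set of overlapping three-character substrings, ignoring spaces."""
    text = text.replace(" ", "")
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_index(documents):
    """Map each trigram to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for gram in trigrams(text):
            index[gram].add(doc_id)
    return index


def search(index, query):
    """Rank documents by the number of query trigrams they share."""
    counts = defaultdict(int)
    for gram in trigrams(query):
        for doc_id in index.get(gram, ()):
            counts[doc_id] += 1
    # Highest count first; ties broken by document id.
    return sorted(counts, key=lambda doc_id: (-counts[doc_id], doc_id))


# Hypothetical transliterated snippets keyed by an arbitrary id.
documents = {1: "bismillah", 2: "alhamdulillah"}
index = build_index(documents)
ranking = search(index, "bismilah")  # query with a missing letter
```

Even though the query is misspelled, document 1 shares five trigrams with it and ranks ahead of document 2, which shares only one.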
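Design of Semantic Similarity Based on Word Thesaurus Using the Omiotis Measure for Application Search on I-GRACIAS (original title: Perancangan Semantic Similarity Based On Word Thesaurus Menggunakan Pengukuran Omiotis Untuk Pencarian Aplikasi Pada I-gracias) Akip Maulana; Moch. Arif Bijaksana; Mohamad Syahrul Mubarok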
eProceedings of Engineering Vol 3, No 2 (2016): Agustus, 2016
Publisher : eProceedings of Engineering

Abstract

Conventional search confuses I-GRACIAS users when the keyword they enter is spelled differently from the name of an existing application. Semantic similarity is an approach to search that relies on relatedness values between terms derived from WordNet. The semantic similarity approach used here is path-based, with Wu and Palmer (WUP) as the similarity measure. Omiotis is a method for measuring the degree of relevance between documents. It has two main components: lexical relevance and semantic similarity. The search process, which originally used the conventional approach, is thus replaced with a Semantic Textual Similarity (STS) approach. This final project therefore uses the Omiotis measure to compute similarity between documents, with the path-based approach as the semantic similarity method, which still depends on WordNet. This is expected to help address the application search problem in I-GRACIAS. Keywords: Semantic Similarity, Lexical Relevance, Omiotis, PairingWord, Wordnet.
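The Wu and Palmer measure scores two concepts by the depth of their least common subsumer relative to their own depths. The sketch below applies the standard formula to a tiny hand-made taxonomy instead of WordNet; the taxonomy, the function names, and the convention that the root has depth 1 are assumptions for illustration, and the code expects every concept to share one root.

```python
def path_to_root(node, parent):
    """Return the chain from node up to the root (node first, root last)."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def wup_similarity(a, b, parent):
    """Wu-Palmer similarity: 2 * depth(LCS) / (depth(a) + depth(b))."""
    path_a, path_b = path_to_root(a, parent), path_to_root(b, parent)
    ancestors_a = set(path_a)
    # Least common subsumer: first ancestor of b that is also an ancestor of a.
    lcs = next(node for node in path_b if node in ancestors_a)
    lcs_depth = len(path_to_root(lcs, parent))
    return 2 * lcs_depth / (len(path_a) + len(path_b))


# Toy taxonomy (child -> parent), invented for illustration.
parent = {
    "application": "entity",
    "schedule": "application",
    "timetable": "application",
    "person": "entity",
}
close = wup_similarity("schedule", "timetable", parent)
far = wup_similarity("schedule", "person", parent)
```

Sibling concepts under "application" score 2/3, while concepts that meet only at the root score 0.4, so a search could treat "timetable" as a closer match for "schedule" than "person".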