Analisis Perbandingan Clustering-Based, Distance-Based dan Density-Based dalam Mendeteksi Outlier Dedy Handriyadi; M. Arif Bijaksana; Erwin Budi Setiawan
Seminar Nasional Aplikasi Teknologi Informasi (SNATI) 2009
Publisher : Jurusan Teknik Informatika, Fakultas Teknologi Industri, Universitas Islam Indonesia

Data mining is the process of discovering interesting patterns and trends in large databases. An outlier is defined as a data point in a data set that differs markedly, by some measure, from the data points in the set in general. Although outliers behave abnormally, they often carry very useful information. Outlier detection plays a very important role in fraud detection, network robustness analysis, and intrusion detection. Outliers are usually sought using a notion of proximity based on their relationship to the rest of the data. In high-dimensional data, the data become increasingly sparse, so assumptions about proximity between data points break down. This paper compares methods for finding outliers in high-dimensional data. The methods compared are clustering-based, distance-based, and density-based, each of which supports high-dimensional data. Keywords: data mining, outlier, outlier detection, outlier detection methods.
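As an illustration of the distance-based family mentioned in the abstract above, the following minimal sketch scores each point by the distance to its k-th nearest neighbour and reports the highest-scoring points as outlier candidates. The abstract does not say which concrete algorithms the paper compares, so the scoring rule, the function names, and the sample points are all assumptions made for illustration.

```python
import math

def knn_distance_scores(points, k=2):
    """Score each point by the Euclidean distance to its k-th nearest neighbour."""
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

def top_outliers(points, k=2, n=1):
    """Return the indices of the n points with the largest k-NN distance."""
    scores = knn_distance_scores(points, k)
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return ranked[:n]

# Hypothetical 2-D data: four nearby points and one far away.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.15, 0.15), (5.0, 5.0)]
outlier_indices = top_outliers(points, k=2, n=1)
```

In high-dimensional data the pairwise distances tend to concentrate, which is the failure of proximity assumptions that the abstract describes; this sketch does nothing to address that.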
Klusterisasi Dokumen Berita Berbahasa Indonesia Menggunakan Document Index Graph Sari Ernawati; Arie Ardiyanti; Erwin Budi Setiawan
Seminar Nasional Aplikasi Teknologi Informasi (SNATI) 2009
Publisher : Jurusan Teknik Informatika, Fakultas Teknologi Industri, Universitas Islam Indonesia

Electronic news is currently the most popular and interactive information medium. It is so interactive that it has grown quite rapidly, as shown by the growing number of corporate and personal websites, which in turn increases the amount of information and data. This rapid growth is also driven by internet use, which has expanded compared with earlier eras. As a result, the amount of information is increasing exponentially. Such an abundance of data should yield correspondingly many benefits. Clustering is one method for grouping documents by finding relationships among them. At present, most clustering methods, such as the Vector Space Model, rely only on word-based similarity and ignore other aspects such as phrase similarity. This paper clusters documents using the Document Index Graph method, which combines two kinds of document similarity: word-based similarity and phrase-based similarity. The method is tested on a sample of Indonesian-language news from web-based mass media. Choosing an appropriate fragmentation factor and similarity threshold improves cluster quality. The clustering results are evaluated using precision and recall. Keywords: clustering, Document Index Graph, fragmentation factor, similarity threshold.
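The evaluation step named in the abstract above can be sketched as follows: precision and recall of one cluster measured against one reference topic. How the paper pairs clusters with topics is not stated, so the function name and the document identifiers are hypothetical.

```python
def cluster_precision_recall(cluster, relevant):
    """Precision and recall of a cluster against the set of documents on one topic."""
    cluster, relevant = set(cluster), set(relevant)
    overlap = len(cluster & relevant)
    # Precision: share of the cluster that belongs to the topic.
    precision = overlap / len(cluster) if cluster else 0.0
    # Recall: share of the topic that the cluster captured.
    recall = overlap / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical document ids.
cluster = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d3", "d5"]
precision, recall = cluster_precision_recall(cluster, relevant)
```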
Aplikasi Pelayanan Administrasi Penduduk Desa Berbasis Web Programing Yuliant Sibaroni; Erwin Budi Setiawan; Mahmud Imrona; Feby Ali Dzuhri
Seminar Nasional Aplikasi Teknologi Informasi (SNATI) 2015
Publisher : Jurusan Teknik Informatika, Fakultas Teknologi Industri, Universitas Islam Indonesia

One of the problems faced by village-level government offices is that official letters are still prepared manually, which results in less than optimal service for residents. Using Microsoft Office applications to prepare letters has several main weaknesses, such as heavy dependence on the skills of village officials, which leads to inconsistent letter formats and makes the records of issued letters prone to inaccuracy. On the other hand, the IT skills of village officials are generally below average compared with other administrative staff, so a dedicated application for correspondence administration services becomes absolutely necessary. This web-based village correspondence application is expected to resolve the problems villages face in delivering better correspondence administration services. Other benefits are that it helps record correspondence data, which in turn helps villages see their potential more clearly; that the administrative services experienced by the community become better and more transparent; and that corruption, collusion, and nepotism (KKN) in the issuing of letters in villages are reduced.
The Effect of Information Gain Feature Selection for Hoax Identification in Twitter Using Classification Method Support Vector Machine Isep Mumu Mubaroq; Erwin Budi Setiawan
Indonesia Journal on Computing (Indo-JC) Vol. 5 No. 2 (2020): September, 2020
Publisher : School of Computing, Telkom University

Twitter is now a popular social medium for news dissemination. News contains elements by which types of news can be distinguished; a hoax, for example, carries elements of panic, worry, and anxiety that can have a significant impact on social, economic, educational, and political fields. Hoaxes should be prevented as early as possible, before the news goes viral, by developing a method that identifies and analyzes hoaxes. In this research, we propose a machine learning approach using a Support Vector Machine (SVM) supported by Information Gain (IG) feature selection, with Term Frequency–Inverse Document Frequency (TF-IDF) for word weighting. The system performs very well, increasing accuracy by 37.51%, with accuracy reaching 96.55%.
Implementation Information Gain Feature Selection for Hoax News Detection on Twitter using Convolutional Neural Network (CNN) Husnul Khotimah Farid; Erwin Budi Setiawan; Isman Kurniawan
Indonesia Journal on Computing (Indo-JC) Vol. 5 No. 3 (2020): December, 2020
Publisher : School of Computing, Telkom University

Information and communication technology is currently advancing, especially in relation to social media. Many people now get information through social media, especially Twitter, because it is easy to access and costs little. However, this has a negative impact in the form of the spread of fake news, or hoaxes, that are difficult to detect. In this research, the authors developed a hoax news detection model using a Convolutional Neural Network and the TF-IDF weighting method. Feature selection is performed using Information Gain with various features: unigram, bigram, trigram, and a combination of the three. Testing is done with three scenarios: classification only, classification with weighting, and classification with weighting and feature selection. The parameter used in the Information Gain feature selection is a threshold of 0.8. The results showed that classification with weighting and feature selection produced the highest accuracy, 95.56%, on the unigram + bigram features with a 50:50 split of training data and test data.
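Information Gain feature selection, as named in the abstract above, can be sketched as follows: for each term, measure how much knowing whether the term is present reduces the entropy of the class labels, then keep the terms whose gain reaches a threshold. Treating the abstract's 0.8 as a cutoff on the raw gain in bits is an assumption, as are the function names and the toy tweets.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(docs, labels, term):
    """Reduction in label entropy from knowing whether `term` occurs in a document."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = sum(
        len(part) / n * entropy(part) for part in (with_term, without_term) if part
    )
    return entropy(labels) - conditional

def select_features(docs, labels, threshold=0.8):
    """Keep the terms whose information gain is at least `threshold` (assumed cutoff)."""
    vocabulary = sorted(set().union(*docs))
    return [t for t in vocabulary if information_gain(docs, labels, t) >= threshold]

# Hypothetical tokenized tweets, each represented as a set of terms.
docs = [{"breaking", "shock"}, {"shock", "panic"}, {"official", "report"}, {"report", "data"}]
labels = ["hoax", "hoax", "real", "real"]
selected = select_features(docs, labels, threshold=0.8)
```

Here "shock" and "report" each split the two classes perfectly, so they are the only terms that pass the cutoff.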
Sistem Deteksi Hoax pada Twitter dengan Metode Klasifikasi Feed-Forward dan Back-Propagation Neural Networks Crisanadenta Wintang Kencana; Erwin Budi Setiawan; Isman Kurniawan
Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) Vol 4 No 4 (2020): Agustus 2020
Publisher : Ikatan Ahli Informatika Indonesia (IAII)

Social media is one of the ways to connect individuals around the world. It is also used by irresponsible people to spread hoaxes. A hoax is false news that is made to appear true. It may cause anxiety and panic in society and can affect social and political conditions. Twitter is currently the most popular social media platform. It is a place for sharing information, where users around the world can share and receive news in short messages called tweets. Hoax detection has gained significant interest in the last decade. Existing hoax detection methods are based on either news content or social context using user-based features. In this study, we present a hoax detection system based on feed-forward and back-propagation (FF & BP) neural networks. In developing it, we used two vectorization methods, TF-IDF and Word2Vec. Our model is designed to automatically learn features for hoax news classification through several hidden layers built into the neural network. The neural network imitates the way the human brain receives stimuli, processes them, and produces output. Its neurons process each piece of incoming information, pass it through the network connections, and continue learning until the network is able to perform classification. Our proposed model should help provide a better solution for hoax detection. Data were collected by crawling with the Twitter API and retrieving tweets that match the chosen keywords and hashtags. The highest accuracy of the neural network, 78.76%, was obtained using TF-IDF. We also found that data quality affects performance.
Implementasi Deteksi Rumor di Twitter Menggunakan Algoritma J48 Yoan Maria Vianny; Erwin Budi Setiawan
Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) Vol 4 No 5 (2020): Oktober 2020
Publisher : Ikatan Ahli Informatika Indonesia (IAII)

The existence of rumors on Twitter has caused a lot of unrest among Indonesians. Because the validity of such information is unknown, users are left confused. In this study, an Indonesian rumor detection system is built using the J48 algorithm together with the Term Frequency Inverse Document Frequency (TF-IDF) weighting method. The dataset contains 47,449 tweets that have been manually labeled. This study offers new features, namely the number of emoticons in the display name, the number of digits in the display name, and the number of digits in the username. These three new features are used to maximize the information about information sources. The highest accuracy, 75.76%, is obtained using 90% training data and 1,000 TF-IDF features in 1-gram to 3-gram combinations.
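The combination of 1-gram to 3-gram features with TF-IDF weighting mentioned above can be sketched as follows. This uses the plain formulation tf × log(N / df) with no smoothing; the exact variant used in the study is not stated in the abstract, and the function names and sample sentences are hypothetical.

```python
import math
from collections import Counter

def ngrams(tokens, n_min=1, n_max=3):
    """All contiguous n-grams of the token list for n_min <= n <= n_max."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(n_min, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]

def tfidf(docs):
    """TF-IDF weight per n-gram per document (assumed variant: tf * log(N / df))."""
    counts = [Counter(ngrams(doc.split())) for doc in docs]
    # Document frequency: number of documents that contain each n-gram.
    df = Counter(gram for c in counts for gram in c)
    n_docs = len(docs)
    return [
        {g: (cnt / sum(c.values())) * math.log(n_docs / df[g]) for g, cnt in c.items()}
        for c in counts
    ]

# Hypothetical pre-tokenized tweets.
docs = ["kabar ini benar", "kabar ini palsu"]
weights = tfidf(docs)
```

An n-gram that occurs in every document, such as "kabar ini" here, gets weight zero under this variant; library implementations often add smoothing so that such terms keep a small positive weight.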
Identifikasi Berita Palsu (Hoax) pada Media Sosial Twitter dengan Metode Decision Tree C4.5 Brenda Irena; Erwin Budi Setiawan
Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) Vol 4 No 4 (2020): Agustus 2020
Publisher : Ikatan Ahli Informatika Indonesia (IAII)

Social media is a means of communicating and exchanging information between people, and Twitter is one such platform. However, not all of the information disseminated is true; some news does not match the facts and is often called a hoax. There have been many cases of hoaxes spreading that cause concern and often harm a particular individual or group. In this research, the authors therefore build a system to identify hoax news on Twitter using the Decision Tree C4.5 classification method on 50,610 tweets. What distinguishes this research from earlier studies is the use of several test scenarios: classification only, classification with feature weighting, and classification with feature weighting and feature selection. The weighting method used is TF-IDF, and feature selection uses Information Gain. The features are generated using n-grams consisting of unigrams, bigrams, and trigrams. The final results show that classification with feature weighting and feature selection produces the best accuracy, 72.91%, with a ratio of 90% training data to 10% test data (90:10) and 5,000 unigram features.
Semantic Approach for Big Five Personality Prediction on Twitter Ghina Dwi Salsabila; Erwin Budi Setiawan
Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) Vol 5 No 4 (2021): Agustus 2021
Publisher : Ikatan Ahli Informatika Indonesia (IAII)

Personality provides deep insight into a person and plays an important part in job performance. Predicting personality through social media has been examined in several studies. The problem is how to improve the performance of a personality prediction system. The purpose of this research is to predict the personality of Twitter users and to increase the performance of the personality prediction system. An online survey using the Big Five Inventory (BFI) questionnaire was distributed and gathered 295 Twitter users with 511,617 tweets. In this research, we experiment with two different methods: Support Vector Machine (SVM) alone, and the combination of SVM and BERT as the semantic approach. This research also implements Linguistic Inquiry Word Count (LIWC) as the linguistic feature for the personality prediction system. The results showed that the combination of these two methods achieves an accuracy of 79.35%, and that implementing LIWC can improve the accuracy to as much as 80.07%. Overall, these results show that the combination of SVM and BERT as the semantic approach, together with LIWC, is recommended for better performance of the personality prediction system.
Implementation Word2Vec for Feature Expansion in Twitter Sentiment Analysis Naufal Adi Nugroho; Erwin Budi Setiawan
Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) Vol 5 No 5 (2021): Oktober 2021
Publisher : Ikatan Ahli Informatika Indonesia (IAII)

Twitter is a microblog-based social media site launched on July 13, 2006. In March 2020, 476,696 tweets about government policy on COVID-19 circulating on Twitter were captured by the Institute for Development of Economics and Finance (Indef). Government policy has a standard meaning, namely a decision systematically made by the government with specific goals and objectives relating to the public interest, whether carried out directly or indirectly. Sentiment analysis analyzes people's opinions, sentiments, evaluations, attitudes, and emotions from written language. In this decade, sentiment analysis has become a trendy research area. This paper focuses on implementing Word2Vec, using word similarity as a feature expansion to minimize vocabulary mismatch in Twitter sentiment analysis with word embeddings. The dataset contains 11,395 tweets and is used with two classifiers: the Support Vector Machine algorithm and the Artificial Neural Network algorithm. The output of Word2Vec is used for feature expansion: for each row in the corpus, the expansion algorithm checks for words that have a similar vector and, if a word's value is 0, replaces it with its similar word. The corpus used for feature expansion consists of 142,545 articles from Indonesian media. The results show that ANN performs better than SVM: ANN reaches 68.89% without feature expansion and 72.58% with it, while SVM reaches 63.95% without feature expansion and 68.56% with it. This research shows that feature expansion can improve the final accuracy.
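One possible reading of the feature expansion step described above is sketched below: when a vocabulary word has value 0 in a tweet's feature vector, look up its similar words and, if one of them is present in the tweet, copy that value across. The abstract is ambiguous about the exact rule, so this interpretation is an assumption. The hand-written similarity table stands in for the neighbours a trained Word2Vec model would supply, and all names and sample words are hypothetical.

```python
def expand_features(vector, vocabulary, similar_words):
    """Fill zero-valued features using similar words that are present in the same row."""
    index = {word: i for i, word in enumerate(vocabulary)}
    expanded = list(vector)
    for i, word in enumerate(vocabulary):
        if vector[i] != 0:
            continue  # only features with value 0 are candidates for expansion
        for neighbour in similar_words.get(word, []):
            j = index.get(neighbour)
            # Use the first similar word that is in the vocabulary and present in the tweet.
            if j is not None and vector[j] != 0:
                expanded[i] = vector[j]
                break
    return expanded

# Hypothetical vocabulary, feature vector, and similarity table.
vocabulary = ["bagus", "baik", "buruk"]
vector = [0, 1, 0]
similar_words = {"bagus": ["baik"], "buruk": ["jelek"]}
expanded = expand_features(vector, vocabulary, similar_words)
```

The lookup reads from the original vector rather than the partially expanded one, so a filled-in value never triggers a further expansion.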
