Journal of Intelligent Systems
Vol 1, No 1 (2015)

Two-Step Cluster based Feature Discretization of Naive Bayes for Outlier Detection in Intrinsic Plagiarism Detection

Wijaya, Adi (STMIK Eresha)
Wahono, Romi Satria (Dian Nuswantoro University)



Article Info

Publish Date
18 Feb 2015

Abstract

Intrinsic plagiarism detection is the task of analyzing a document for undeclared changes in writing style, which are treated as outliers. Naive Bayes is often used for outlier detection. However, Naive Bayes assumes that the values of continuous features are normally distributed; in this setting the assumption is strongly violated, which causes low classification performance. Discretizing continuous features can improve the performance of Naive Bayes. This study proposes feature discretization based on Two-Step Cluster for Naive Bayes. The proposed method uses tf-idf and a query language model to create features, applies a False Positive/False Negative (FP/FN) threshold intended to improve accuracy, and is evaluated on the PAN PC 2009 dataset. The results indicate that the proposed method with discrete features outperforms the same method with continuous features on all evaluation measures: recall, precision, f-measure, and accuracy. Using the FP/FN threshold also affects the results, since it decreases FP and FN and thus improves all evaluation measures.
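The general idea of cluster-based discretization for Naive Bayes can be illustrated with a minimal sketch. This is not the authors' implementation: Two-Step Cluster is replaced here by a plain 1-D k-means as a stand-in, the feature values are made-up toy numbers rather than tf-idf or query language model scores, and the FP/FN threshold is not modeled. All function and class names (kmeans_1d, discretize, CategoricalNB) are hypothetical.

```python
import math
from collections import Counter, defaultdict


def nearest(centers, v):
    """Index of the cluster center closest to value v."""
    return min(range(len(centers)), key=lambda i: abs(v - centers[i]))


def kmeans_1d(values, k, iters=50):
    """Plain 1-D k-means. ASSUMPTION: a stand-in for Two-Step Cluster,
    which is a different algorithm and is not reproduced here."""
    srt = sorted(values)
    # Deterministic init: spread the starting centers over the sorted values.
    centers = [srt[(2 * i + 1) * len(srt) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            groups[nearest(centers, v)].append(v)
        new = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if new == centers:
            break
        centers = new
    return sorted(centers)


def discretize(column, centers):
    """Replace each continuous value by the index of its nearest center."""
    return [nearest(centers, v) for v in column]


class CategoricalNB:
    """Naive Bayes over discrete bins with add-one (Laplace) smoothing."""

    def fit(self, rows, labels, n_bins):
        self.n_bins = n_bins
        self.class_counts = Counter(labels)
        self.counts = defaultdict(int)  # (class, feature index, bin) -> count
        for row, y in zip(rows, labels):
            for j, b in enumerate(row):
                self.counts[(y, j, b)] += 1
        return self

    def predict(self, row):
        total = sum(self.class_counts.values())
        best, best_lp = None, -math.inf
        for y, cy in self.class_counts.items():
            lp = math.log(cy / total)  # log prior
            for j, b in enumerate(row):
                lp += math.log((self.counts[(y, j, b)] + 1) / (cy + self.n_bins))
            if lp > best_lp:
                best, best_lp = y, lp
        return best


# Toy data (hypothetical): one continuous style feature per text segment.
values = [0.10, 0.12, 0.15, 0.11, 0.13, 0.90, 0.95, 0.14]
labels = ["normal"] * 5 + ["outlier"] * 2 + ["normal"]

centers = kmeans_1d(values, k=2)
bins = discretize(values, centers)
model = CategoricalNB().fit([[b] for b in bins], labels, n_bins=2)

pred_high = model.predict(discretize([0.88], centers))
pred_low = model.predict(discretize([0.12], centers))
```

Because the classifier only sees bin indices, it estimates a per-bin frequency for each class instead of fitting a normal distribution to the raw values, which is the motivation the abstract gives for discretizing.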

Copyrights © 2015






Journal Info

Abbrev

JIS

Publisher

Subject

Computer Science & IT

Description

Journal of Intelligent Systems is a periodical scientific journal that publishes research results in the field of computing and intelligent systems, covering theoretical, practical, and applied aspects. The journal publishes original papers, both technical papers and survey or review papers on recent developments ...