EDUMATIC: Jurnal Pendidikan Informatika
Vol 6, No 1 (2022): Edumatic: Jurnal Pendidikan Informatika

Text Classification using Genetic Programming with an Implementation of Web Scraping and Map Reduce

Wirarama Wedashwara (Informatics Engineering Study Program, Universitas Mataram)
Andy Hidayat (Informatics Engineering Study Program, Universitas Mataram)
Budi Irmawati (Informatics Engineering Study Program, Universitas Mataram)
Ariyan Zubaidi (Informatics Engineering Study Program, Universitas Mataram)



Article Info

Publish Date
19 Jun 2022

Abstract

Classifying text documents from online media is a big data problem that requires automation. This research developed a text classification system that collects data by web scraping and pre-processes it with map-reduce. The study aims to evaluate text classification performance when genetic programming, map-reduce, and web scraping are combined to process large volumes of text. Data were collected through web scraping, and 8126 duplicates were removed from the collected data. The map-reduce stage performed tokenization and stop-word removal, yielding 28507 terms, of which 4306 are unique and 24201 are duplicates. The classification evaluation shows that a single tree achieves higher accuracy (0.7072) than a decision tree (0.6874), while a multi-tree achieves the lowest accuracy (0.6726). For the genetic programming support values, the multi-tree has the highest average support (0.3854), followed by the decision tree (0.3584), and the single tree has the lowest (0.3494). In general, higher support does not correspond to higher accuracy.
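The pre-processing figures in the abstract are internally consistent: 4306 unique terms plus 24201 duplicate terms gives the reported 28507 terms. The following is a minimal sketch of how such counts could be produced with a map-reduce style pipeline, not the authors' implementation. The stop-word list, the tokenization rule, and the function names `map_tokens`, `reduce_counts`, and `term_statistics` are all assumptions made for illustration.

```python
import re
from collections import Counter
from functools import reduce

# Hypothetical stop-word list; the list used in the paper is not given.
STOP_WORDS = {"the", "a", "is", "of", "and"}


def map_tokens(document):
    """Map step: lowercase, tokenize, drop stop words, count terms in one document."""
    tokens = re.findall(r"[a-z]+", document.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)


def reduce_counts(left, right):
    """Reduce step: merge two partial term-count tables."""
    return left + right


def term_statistics(documents):
    """Drop exact duplicate documents, then count total, unique, and duplicate terms."""
    unique_docs = list(dict.fromkeys(documents))  # keeps first occurrence, preserves order
    totals = reduce(reduce_counts, map(map_tokens, unique_docs), Counter())
    total_terms = sum(totals.values())
    unique_terms = len(totals)
    return {
        "removed_documents": len(documents) - len(unique_docs),
        "total": total_terms,
        "unique": unique_terms,
        "duplicates": total_terms - unique_terms,
    }


if __name__ == "__main__":
    sample = ["The cat and the dog", "A dog is a dog", "The cat and the dog"]
    print(term_statistics(sample))
```

In this sketch the duplicate-term count is defined as total terms minus unique terms, which matches the relationship between the three numbers reported in the abstract. A real deployment would distribute the map and reduce steps across workers rather than run them in a single process.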

Copyrights © 2022






Journal Info

Abbrev

edumatic

Publisher

Subject

Computer Science & IT Education

Description

EDUMATIC: Jurnal Pendidikan Informatika (e-ISSN: 2549-7472) is a scientific journal in the field of informatics education, published by Universitas Hamzanwadi twice a year, in June and December. The focus and scope of the journal are (1) Computers and Informatics in Education; (2) ...