Vandha Pradwiyasma Widartha
Telkom University, Bandung, Indonesia

Published: 2 Documents


Text Classification Using Genetic Programming with Implementation of Map Reduce and Scraping
Wirarama Wedashwara; Budi Irmawati; Heri Wijayanto; I Wayan Agus Arimbawa; Vandha Pradwiyasma Widartha
JOIV : International Journal on Informatics Visualization Vol 7, No 2 (2023)
Publisher : Politeknik Negeri Padang

DOI: 10.30630/joiv.7.2.1813

Abstract

Classifying text documents from online media is a big data problem that requires automation. Classification accuracy can decrease when many terms are ambiguous between classes. Hadoop MapReduce is a parallel processing framework that has been widely used for text processing on big data. This study presents text classification using genetic programming, with data collected by web scraping and pre-processed with Hadoop MapReduce. Genetic programming is also used to perform association rule mining (ARM) before text classification in order to analyze patterns in the data. The data are articles from ScienceDirect retrieved with three keywords. The study aims to combine ARM-based data pattern analysis with a pipeline of web-scraping data collection, MapReduce pre-processing, and genetic programming text classification. Through web scraping, the data were collected and as many as 17718 duplicates were removed. MapReduce performed tokenization and stop-word removal, producing 36639 terms, of which 5189 are unique terms and 31450 are common terms. Evaluating ARM on different amounts of data shows that the multi-tree can produce more rules, longer rules, and better support. The multi-tree also produces more specific rules and better ARM performance than a single tree. The text classification evaluation shows that a single tree achieves better accuracy (0.7042) than a decision tree (0.6892), with the multi-tree lowest (0.6754). The evaluation also shows that the ARM results are not in line with the classification results: for ARM, the multi-tree gives the best result (0.3904), followed by the decision tree (0.3588), with the single tree lowest (0.356).
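The pre-processing step described in this abstract (tokenization and stop-word removal in a map-reduce style) can be illustrated with a minimal sketch. This is not the authors' Hadoop job: it simulates the map, shuffle, and reduce phases in plain Python, and the stop-word list, the function names, and the two sample sentences are assumptions made only for illustration.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; the abstract does not say which list was used.
STOP_WORDS = {"a", "an", "and", "the", "of", "is", "for", "on", "in", "to"}


def map_phase(doc):
    """Map step: tokenize one document and emit (term, 1) for each kept token."""
    for token in re.findall(r"[a-z]+", doc.lower()):
        if token not in STOP_WORDS:
            yield token, 1


def shuffle(pairs):
    """Shuffle step: group the emitted values by key (the term)."""
    grouped = defaultdict(list)
    for term, count in pairs:
        grouped[term].append(count)
    return grouped


def reduce_phase(grouped):
    """Reduce step: sum the counts collected for each term."""
    return {term: sum(counts) for term, counts in grouped.items()}


def term_counts(docs):
    """Run map, shuffle, and reduce over a list of documents."""
    pairs = (pair for doc in docs for pair in map_phase(doc))
    return reduce_phase(shuffle(pairs))


# Illustrative input only, not data from the study.
docs = [
    "Classification of text documents is a big data problem.",
    "Map Reduce is a framework for big data.",
]
counts = term_counts(docs)
```

In a real Hadoop deployment the map and reduce functions would run on separate nodes and the framework would handle the shuffle; the single-process version here only shows how the term counts arise.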

Students Demography Clustering Based on The ICFL Program Using K-Means Algorithm
Rachmadita Andreswari; Rokhman Fauzi; Berlian Maulidya Izzati; Vandha Pradwiyasma Widartha; Dita Pramesti
JOIV : International Journal on Informatics Visualization Vol 7, No 2 (2023)
Publisher : Politeknik Negeri Padang

DOI: 10.30630/joiv.7.2.1916

Abstract

The Independent Campus, Freedom to Learn (ICFL) Program is one of the manifestations of student-centered learning. The program can help students reach their full potential by allowing them to pursue their passions and talents. This study examines how students participating in the ICFL program are segmented based on demographic data, using survey responses from program participants. The method comprises input data preparation, pre-processing, data cleansing, and data analysis; the data are pre-processed before being used and evaluated. Clustering is performed with the K-Means approach, implemented in Python, based on gender, Grade Point Average (GPA), university entrance selection, ICFL category, and the year or semester of participation in ICFL. The study produced three clusters, each with its own characteristics. Gender is fully homogeneous in cluster 2 (100% female) and cluster 3 (100% male). Software Development was the most popular ICFL category among students in cluster 1, accounting for 67%, while Design and Analysis Information Systems was the most popular in clusters 2 and 3. The ICFL Internship program is the most dominant program in all three clusters, accounting for at least 40% of participants in each cluster. The research results are expected to assist stakeholders in evaluating the implementation of the ICFL program.
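The K-Means step described in this abstract can be sketched as follows. The abstract states only that Python was used, so this is a minimal pure-Python version of the standard assign-and-update loop, not the authors' code. The two features (an encoded gender flag and a scaled GPA), the sample rows, and the choice of the first k rows as initial centroids are all assumptions made for illustration.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans(points, k, max_iterations=100):
    """Minimal K-Means: alternate assignment and centroid update until stable."""
    # Assumption for reproducibility: the first k points seed the centroids.
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, new_labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels, centroids


# Hypothetical, already-scaled rows: (gender flag with 1 = male, GPA in [0, 1]).
students = [
    (0.0, 0.90), (1.0, 0.85), (0.0, 0.20), (0.0, 0.95),
    (1.0, 0.80), (0.0, 0.25), (1.0, 0.90), (0.0, 0.15),
]
labels, centroids = kmeans(students, k=3)

# Share of male students per cluster, analogous to the per-cluster
# percentages reported in the abstract.
male_share = {}
for c in range(3):
    genders = [s[0] for s, label in zip(students, labels) if label == c]
    male_share[c] = sum(genders) / len(genders)
```

Categorical attributes such as entrance selection or ICFL category would need to be encoded numerically before a distance-based method like this can use them; how the study encoded them is not stated in the abstract.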