Journal of Applied Data Sciences
Vol 5, No 1: JANUARY 2024

A Comparative Study on Data Collection Methods: Investigating Optimal Datasets for Data Mining Analysis

Hendra Jatnika (Informatics Engineering, PLN Institute of Technology, Jakarta 11750)
Ari Waluyo (Electronics Engineering, Piksi Ganesha Polytechnic, Bandung 40247)
Abdul Azis (Information System, Amikom Purwokerto University, Purwokerto 53127)



Article Info

Publish Date
29 Jan 2024

Abstract

This study evaluates how well different data collection methods produce datasets suited to computational data mining. It compares data gathered by questionnaire with data gathered by web mining, analysing each with the Support Vector Machine (SVM) and Naive Bayes Classifier (NBC) algorithms to assess how flexibly each data type can be used. The questionnaire data proved the more flexible: it achieved accuracy above 80% with both algorithms and AUC values above 0.9, outperforming the data acquired through web mining. These results show that the method used to collect a dataset strongly affects the outcome of computational data mining, and they provide evidence that, in this specific context, questionnaires are a better data collection method than web mining. Because the questionnaire data are both more flexible and more accurate, the questionnaire is the more reliable choice for acquiring data in this domain. Beyond these practical implications, the study draws attention to a methodological question that previous research may have overlooked, namely how the data are collected, and argues that such collection methods need to be explored and assessed explicitly. Continued work in this direction should improve the robustness and effectiveness of data collection in computational data mining studies.
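The abstract reports its results as accuracy and AUC (area under the ROC curve). As a minimal sketch of how these two metrics can be computed from a classifier's outputs, the Python below assumes binary labels (1 = positive, 0 = negative) and one probability-like score in [0, 1] per sample, such as an NBC posterior probability or a calibrated SVM output. The function names, the 0.5 decision threshold and the toy data are hypothetical illustrations, not the study's code or data.

```python
from typing import List, Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predicted labels that match the true binary labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a randomly chosen positive sample
    scores higher than a randomly chosen negative one (Mann-Whitney form).
    Tied scores count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def summarize(name: str, y_true: Sequence[int], scores: Sequence[float],
              threshold: float = 0.5) -> Tuple[str, float, float]:
    """Hypothetical helper: threshold the scores into labels, then return
    (name, accuracy, AUC) so datasets or algorithms can be compared."""
    y_pred: List[int] = [1 if s >= threshold else 0 for s in scores]
    return name, accuracy(y_true, y_pred), roc_auc(y_true, scores)


if __name__ == "__main__":
    # Toy labels and scores, invented for illustration only.
    labels = [1, 1, 1, 0, 0, 0]
    probs = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1]
    name, acc, auc = summarize("toy example", labels, probs)
    print(f"{name}: accuracy={acc:.3f}, AUC={auc:.3f}")
    # prints "toy example: accuracy=0.667, AUC=0.889"
```

Accuracy depends on the chosen threshold, whereas AUC uses only the ranking of the scores. This is why a study can report both: a dataset can yield a high AUC even when a fixed threshold gives only moderate accuracy. In practice, a library implementation such as scikit-learn's metrics would normally be used instead of these hand-written versions.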

Copyright © 2024






Journal Info

Abbrev

JADS

Subject

Computer Science & IT; Control & Systems Engineering; Decision Sciences, Operations Research & Management

Description

One of the current hot topics in science is data: how can datasets be used in scientific and scholarly research in a more reliable, citable and accountable way? Data is of paramount importance to scientific progress, yet most research data remains private. Enhancing the transparency of the processes ...