Contact Name
Teuku Rizky Noviandy
Contact Email
trizkynoviandy@gmail.com
Phone
+6282275731976
Journal Mail Official
editorial-office@heca-analitika.com
Editorial Address
Jl. Makam T. Nyak Arief Kompleks BUPERTA Blok L7B, Lamgapang, Aceh Besar, Provinsi Aceh
Location
Kab. Aceh Besar
Aceh
INDONESIA
Infolitika Journal of Data Science
ISSN : -     EISSN : 30258618     DOI : https://doi.org/10.60084/ijds
Infolitika Journal of Data Science is a distinguished international scientific journal that showcases high-caliber original research articles and comprehensive review papers in the field of data science. The journal's core mission is to stimulate interdisciplinary research collaboration, facilitate the exchange of knowledge, and drive the advancement and application of innovative strategies within the data science domain. Topics of this journal include, but are not limited to, Data Mining and Analysis, Machine Learning and Artificial Intelligence, Big Data and Data Engineering, Predictive Modeling and Forecasting, Natural Language Processing, Computer Vision, Data Visualization and Interpretation, Ethics and Privacy in Data Science, Applications of Data Science, and Interdisciplinary Approaches.
Articles 10 Documents
Machine Learning Approach for Diabetes Detection Using Fine-Tuned XGBoost Algorithm Aga Maulana; Farassa Rani Faisal; Teuku Rizky Noviandy; Tatsa Rizkia; Ghazi Mauer Idroes; Trina Ekawati Tallei; Mohamed El-Shazly; Rinaldi Idroes
Infolitika Journal of Data Science Vol. 1 No. 1 (2023): September 2023
Publisher : Heca Sentra Analitika

DOI: 10.60084/ijds.v1i1.72

Abstract

Diabetes is a chronic condition characterized by elevated blood glucose levels which leads to organ dysfunction and an increased risk of premature death. The global prevalence of diabetes has been rising, necessitating an accurate and timely diagnosis to achieve the most effective management. Recent advancements in the field of machine learning have opened new possibilities for improving diabetes detection and management. In this study, we propose a fine-tuned XGBoost model for diabetes detection. We use the Pima Indian Diabetes dataset and employ a random search for hyperparameter tuning. The fine-tuned XGBoost model is compared with six other popular machine learning models and achieves the highest performance in accuracy, precision, sensitivity, and F1-score. This study demonstrates the potential of the fine-tuned XGBoost model as a robust and efficient tool for diabetes detection. The insights of this study advance medical diagnostics for efficient and personalized management of diabetes.
ANFIS-Based QSRR Modelling for Kovats Retention Index Prediction in Gas Chromatography Rinaldi Idroes; Teuku Rizky Noviandy; Aga Maulana; Rivansyah Suhendra; Novi Reandy Sasmita; Muslem Muslem; Ghazi Mauer Idroes; Raudhatul Jannah; Razief Perucha Fauzie Afidh; Irvanizam Irvanizam
Infolitika Journal of Data Science Vol. 1 No. 1 (2023): September 2023
Publisher : Heca Sentra Analitika

DOI: 10.60084/ijds.v1i1.73

Abstract

This study aims to evaluate the implementation and effectiveness of the Adaptive Neuro-Fuzzy Inference System (ANFIS) based Quantitative Structure Retention Relationship (QSRR) to predict the Kovats retention index of compounds in gas chromatography. The model was trained using 340 essential oil compounds and their molecular descriptors. The evaluation of the ANFIS models revealed promising results, achieving an R² of 0.974, an RMSE of 48.12, and an MAPE of 3.3% on the testing set. These findings highlight the ANFIS approach as remarkably accurate in its predictive capacity for determining the Kovats retention index in the context of gas chromatography. This study provides valuable perspectives on the efficiency of retention index prediction through ANFIS-based QSRR methods and the potential practicality in compound analysis and chromatographic optimization.
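For readers unfamiliar with the three metrics reported in this abstract, the following is a minimal sketch of how R², RMSE, and MAPE are commonly computed. The function name and the sample values are hypothetical illustrations, not data or code from the study.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (R^2, RMSE, MAPE in percent) for paired observations."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    # Residual and total sums of squares
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    # MAPE assumes no observed value is zero
    mape = 100.0 / n * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return r2, rmse, mape


# Hypothetical retention indices (assumed values, not from the paper)
observed = [1000.0, 1200.0, 1400.0, 1600.0]
predicted = [1010.0, 1190.0, 1420.0, 1580.0]
r2, rmse, mape = regression_metrics(observed, predicted)
print(f"R2={r2:.3f} RMSE={rmse:.2f} MAPE={mape:.2f}%")
```

RMSE is expressed in the same units as the retention index, whereas MAPE is relative, which is why the abstract can report both a two-digit RMSE and a small percentage error.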
An Implementation of Hybrid CNN-XGBoost Method for Leukemia Detection Problem Taufiq Hidayat; Edrian Hadinata; Irfan Sudahri Damanik; Zakial Vikki; Irvanizam Irvanizam
Infolitika Journal of Data Science Vol. 1 No. 1 (2023): September 2023
Publisher : Heca Sentra Analitika

DOI: 10.60084/ijds.v1i1.87

Abstract

Leukemia is a blood cancer in which blood cells become malignant and uncontrolled. It can cause damage to the function of the body's organs. Several machine learning methods have been used to automatically detect biomedical images, including blood cell images. In this study, we utilized a hybrid machine learning method, called the hybrid Convolutional Neural Network-eXtreme Gradient Boosting (CNN-XGBoost) method, to detect leukemia in blood cells. The hybrid method combines two machine learning methods. We used CNN as the basic classifier and XGBoost as the main classification method. The aim of this methodology was to assess whether incorporating the basic classification method would lead to an enhancement in the performance of the main classification model. The experimental findings demonstrated that the utilization of XGBoost as the main classifier led to a marginal increase in accuracy, elevating it from 85.32% to 85.43% compared to the basic CNN classification. This research highlights the potential of hybrid machine learning approaches in biomedical image analysis and their role in advancing the early diagnosis of leukemia and potentially other medical conditions.
Maternal and Child Healthcare Services in Aceh Province, Indonesia: A Correlation and Clustering Analysis in Statistics Novi Reandy Sasmita; Siti Ramadeska; Reksi Utami; Zuhra Adha; Ulayya Putri; Risky Haezah Syarafina; La Ode Reskiaddin; Saiful Kamal; Yarmaliza Yarmaliza; Muliadi Muliadi; Arif Saputra
Infolitika Journal of Data Science Vol. 1 No. 1 (2023): September 2023
Publisher : Heca Sentra Analitika

DOI: 10.60084/ijds.v1i1.88

Abstract

Infant mortality remains a public health problem in Aceh Province, Indonesia. Health services during pregnancy are an essential factor in reducing infant mortality. Studies examining factors such as maternal and child health services that have implications for infant mortality in Aceh Province are still scarce. Therefore, this study aims to examine the correlation between live births and maternal and child health service variables, namely Blood-Supplementing Tablets (TTD), Coverage of the First Visit of Pregnant Women (K1), Coverage of the Fourth Visit of Pregnant Women (K4), and Management of Obstetric Complications, and to map the maternal and child health services obtained during pregnancy. The research used a cross-sectional design. Descriptive statistics, such as measures of central tendency and dispersion, were used. Inferential statistical analysis was conducted using the Shapiro-Wilk test, the Spearman test, and fuzzy c-means. The Shapiro-Wilk test indicated that the live birth rate variable and all Maternal and Child Healthcare Services variables were not normally distributed (p-value < 0.05). Based on the Spearman test, all Maternal and Child Healthcare Services variables were positively correlated with the live birth rate (p-value < 0.05). With a Silhouette Index of 0.555, three clusters was the optimal number. The clustering is based on the Maternal and Child Healthcare Services that have been provided; according to the Fuzzy C-Means Clustering, the first, second, and third clusters consist of five, eight, and ten districts/cities, respectively.
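As an illustration of the rank-based correlation named in this abstract, here is a minimal sketch of Spearman's coefficient computed as the Pearson correlation of average ranks. The helper names and the sample figures are hypothetical; the sketch does not compute a p-value and is not the authors' code.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical district-level figures (assumed, not from the study)
k4_coverage = [62.0, 70.5, 81.0, 88.2, 93.4]
live_births = [1500, 2100, 2600, 4000, 5200]
rho = spearman_rho(k4_coverage, live_births)
```

Because the test works on ranks rather than raw values, it does not require normally distributed variables, which is consistent with its use after the Shapiro-Wilk results reported above.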
Ensemble Machine Learning Approach for Quantitative Structure Activity Relationship Based Drug Discovery: A Review Teuku Rizky Noviandy; Aga Maulana; Ghazi Mauer Idroes; Talha Bin Emran; Trina Ekawati Tallei; Zuchra Helwani; Rinaldi Idroes
Infolitika Journal of Data Science Vol. 1 No. 1 (2023): September 2023
Publisher : Heca Sentra Analitika

DOI: 10.60084/ijds.v1i1.91

Abstract

This comprehensive review explores the pivotal role of ensemble machine learning techniques in Quantitative Structure-Activity Relationship (QSAR) modeling for drug discovery. It emphasizes the significance of accurate QSAR models in streamlining candidate compound selection and highlights how ensemble methods, including AdaBoost, Gradient Boosting, Random Forest, Extra Trees, XGBoost, LightGBM, and CatBoost, effectively address challenges such as overfitting and noisy data. The review presents recent applications of ensemble learning in both classification and regression tasks within QSAR, showcasing the exceptional predictive accuracy of these techniques across diverse datasets and target properties. It also discusses the key challenges and considerations in ensemble QSAR modeling, including data quality, model selection, computational resources, and overfitting. The review outlines future directions in ensemble QSAR modeling, including the integration of multi-modal data, explainability, handling imbalanced data, automation, and personalized medicine applications while emphasizing the need for ethical and regulatory guidelines in this evolving field.
Enhancing the Red Wine Quality Classification Using Ensemble Voting Classifiers Deny Joefakri Iwa Supriatna; Huzair Saputra; Khaidir Hasan
Infolitika Journal of Data Science Vol. 1 No. 2 (2023): December 2023
Publisher : Heca Sentra Analitika

DOI: 10.60084/ijds.v1i2.95

Abstract

This study introduces an ensemble voting classifier for red wine quality classification using machine learning algorithms. Wine quality assessment, traditionally reliant on subjective expert evaluations, is addressed through data-driven methodologies. The dataset comprises physicochemical attributes and quality ratings of red wines. Results reveal individual models with accuracy ranging from 0.816 to 0.873, while the ensemble approach significantly enhances accuracy. The combination of Random Forest and XGBoost achieves an accuracy of 0.885, demonstrating its potential in red wine quality assessment. In conclusion, this study showcases the potential of machine learning in enhancing the classification of red wine quality, offering a more objective and precise alternative to traditional sensory evaluation. The ensemble voting classifier, especially when combining Random Forest and XGBoost, provides a robust solution for this task, improving the accuracy of wine quality assessments.
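The abstract does not state whether hard or soft voting was used. As one possible reading, the sketch below shows soft voting, in which class probabilities from several models are averaged and the most probable class is chosen. The function name and all probabilities are hypothetical and stand in for the outputs of fitted models such as Random Forest and XGBoost.

```python
def soft_vote(prob_rows_per_model):
    """Average class probabilities across models; return argmax per sample."""
    n_models = len(prob_rows_per_model)
    n_samples = len(prob_rows_per_model[0])
    predictions = []
    for s in range(n_samples):
        n_classes = len(prob_rows_per_model[0][s])
        avg = [
            sum(model[s][c] for model in prob_rows_per_model) / n_models
            for c in range(n_classes)
        ]
        predictions.append(max(range(n_classes), key=lambda c: avg[c]))
    return predictions


# Hypothetical per-sample probabilities for classes [low quality, high quality]
model_a = [[0.6, 0.4], [0.3, 0.7], [0.55, 0.45]]
model_b = [[0.7, 0.3], [0.4, 0.6], [0.2, 0.8]]
labels = soft_vote([model_a, model_b])
```

In the third sample the two models disagree, and the more confident model decides the outcome; this is the mechanism by which an ensemble can outperform its individual members.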
Maternal Health Risk Detection Using Light Gradient Boosting Machine Approach Teuku Rizky Noviandy; Sarah Ika Nainggolan; Raihan Raihan; Isra Firmansyah; Rinaldi Idroes
Infolitika Journal of Data Science Vol. 1 No. 2 (2023): December 2023
Publisher : Heca Sentra Analitika

DOI: 10.60084/ijds.v1i2.123

Abstract

Maternal health risk detection is crucial for reducing morbidity and mortality among pregnant women. In this study, we employed the Light Gradient Boosting Machine (LightGBM) model to identify risk levels using data from rural healthcare facilities. The dataset included key health indicators aligned with the United Nations Sustainable Development Goals. The LightGBM model underwent rigorous optimization through hyperparameter tuning and 10-fold cross-validation. Its predictive performance was benchmarked against other algorithms using accuracy, precision, recall, and F1-score, with feature importance assessed to identify critical risk predictors. The LightGBM model demonstrated the highest performance across all metrics. The results underscore the value of advanced machine learning techniques in public health. Future research directions include expanding the demographic scope, incorporating temporal data, and enhancing model transparency. This study highlights the transformative potential of machine learning in maternal healthcare, providing a foundation for improved risk detection and proactive healthcare interventions.
A Statistical Clustering Approach: Mapping Population Indicators Through Probabilistic Analysis in Aceh Province, Indonesia Novi Reandy Sasmita; Moh Khairul; Hizir Sofyan; Rumaisa Kruba; Selvi Mardalena; Arriz Dahlawy; Feby Apriliansyah; Muliadi Muliadi; Dimas Chaerul Ekty Saputra; Teuku Rizky Noviandy; Ahmad Watsiq Maula
Infolitika Journal of Data Science Vol. 1 No. 2 (2023): December 2023
Publisher : Heca Sentra Analitika

DOI: 10.60084/ijds.v1i2.130

Abstract

Clustering, a statistical analysis technique, can be used for understanding population patterns and as a basis for more targeted policy making. In this ecological study, we explored the population dynamics across 23 districts/cities in Aceh Province. The study used the Aceh Population Development Profile Year 2022 data, focusing on the total population, in-migrants, out-migrants, fertility, and maternal mortality as variables. The study employed descriptive statistics to ascertain the data distribution, followed by the Shapiro-Wilk test to evaluate normality, which is crucial for selecting the appropriate statistical methods. The Spearman test was used to determine correlations between the total population and the other indicator variables. The Probabilistic Fuzzy C-Means (PFCM) method was used for clustering. To optimize clustering, the silhouette coefficient was calculated using the Euclidean distance and the elbow method, with the results analyzed using R-4.3.2 software. This study's design and methods aim to provide a nuanced understanding of demographic patterns for targeted policy-making and regional development in Aceh, Indonesia. Based on the data normality test results, only fertility is normally distributed (p-value = 0.45), while the other variables are not. The Spearman test showed that only in-migrants (p-value = 1.78 × 10⁻⁶) and out-migrants (p-value = 2.30 × 10⁻⁶) were correlated with the Aceh Province population. Using the population variable and the two variables associated with it, the optimal number of clusters was found to be 4, where clusters 1, 2, 3, and 4 consist of three, nine, four, and seven districts/cities, respectively.
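The silhouette coefficient with Euclidean distance mentioned in this abstract can be sketched as follows. This is a minimal illustration assuming hard cluster labels (for a fuzzy method, one common assumption is to assign each point to its highest-membership cluster); the function name and the toy points are hypothetical and unrelated to the Aceh data.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette over all points, using Euclidean distance."""
    clusters = sorted(set(labels))
    scores = []
    for i, p in enumerate(points):
        # a: mean distance to the other members of the point's own cluster
        same = [math.dist(p, q) for j, q in enumerate(points)
                if labels[j] == labels[i] and j != i]
        if not same:
            scores.append(0.0)  # convention for single-member clusters
            continue
        a = sum(same) / len(same)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, q) for j, q in enumerate(points) if labels[j] == c)
            / labels.count(c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Toy two-dimensional points (assumed values)
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(points, [0, 0, 1, 1])
bad = silhouette_score(points, [0, 1, 0, 1])
```

Values near 1 indicate compact, well-separated clusters, while negative values indicate points that sit closer to another cluster than to their own. Comparing the score across candidate cluster counts is one way to choose the number of clusters.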
Cardiovascular Disease Prediction Using Gradient Boosting Classifier Rivansyah Suhendra; Noviana Husdayanti; Suryadi Suryadi; Ilham Juliwardi; Sanusi Sanusi; Abdurrahman Ridho; Muhammad Ardiansyah; Murhaban Murhaban; Ikhsan Ikhsan
Infolitika Journal of Data Science Vol. 1 No. 2 (2023): December 2023
Publisher : Heca Sentra Analitika

DOI: 10.60084/ijds.v1i2.131

Abstract

Cardiovascular Disease (CVD), a prevalent global health concern involving heart and blood vessel disorders, prompts this research's focus on accurate prediction. This study explores the predictive capabilities of the Gradient Boosting Classifier (GBC) in cardiovascular disease across two datasets. Through meticulous data collection, preprocessing, and GBC classification, the study achieves a noteworthy accuracy of 97.63%, underscoring the GBC's effectiveness in accurate CVD detection. The robust performance of the GBC, evidenced by high accuracy, highlights its adaptability to diverse datasets and signifies its potential as a valuable tool for early identification of cardiovascular diseases. These findings provide valuable insights into the application of machine learning methodologies, particularly the GBC, in advancing the accuracy of CVD prediction, with implications for proactive healthcare interventions and improved patient outcomes.
Unraveling Geospatial Determinants: Robust Geographically Weighted Regression Analysis of Maternal Mortality in Indonesia Latifah Rahayu; Elvitra Mutia Ulfa; Novi Reandy Sasmita; Hizir Sofyan; Rumaisa Kruba; Selvi Mardalena; Arif Saputra
Infolitika Journal of Data Science Vol. 1 No. 2 (2023): December 2023
Publisher : Heca Sentra Analitika

DOI: 10.60084/ijds.v1i2.133

Abstract

Maternal Mortality Rate (MMR) in Indonesia has experienced a concerning annual increase, reaching 4,627 deaths in 2020 compared to 4,221 in 2019. This upward trajectory underscores the urgency of investigating the factors contributing to MMR. Recognizing the spatial heterogeneity and outliers in the data, our study employs the Robust Geographically Weighted Regression (RGWR) method with the Least Absolute Deviation approach. Using secondary data from the 2020 Indonesian Health Profile publication, the research seeks to establish province-specific models for MMR in 2020 and identify the key influencing factors in each region. Standard regression analyses fall short in addressing the complexities present in the data, making the RGWR approach crucial for understanding the nuanced relationships. The chosen RGWR model utilizes the Least Absolute Deviation method and a fixed kernel exponential weighting function. Notably, this model maintains a consistent bandwidth value across all locations, showcasing its robustness. In evaluating the model variations, the exponential fixed kernel weighting function emerges as the optimal choice, with the smallest Akaike Information Criterion (AIC) value of 23.990 and the highest coefficient of determination (R²) value of 93.66%. The outcomes of this research yield 24 distinct models, each tailored to the unique characteristics of every province in Indonesia. This nuanced, location-specific approach is vital for developing effective interventions and policies to address the persistently high MMR. By providing insights into the complex interplay of factors influencing maternal mortality in different regions, the study contributes to the groundwork for targeted and impactful public health initiatives across Indonesia.
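To make the fixed exponential kernel concrete, the sketch below uses one common form of the weighting function, w = exp(-d / h), where d is the distance between two locations and h is a single bandwidth shared by all locations. The exact form used in the paper is not given in the abstract, so this form, the function names, and the distances are assumptions. The Least Absolute Deviation fitting step is not shown.

```python
import math


def exponential_kernel_weight(distance, bandwidth):
    """Assumed exponential kernel: weight decays as exp(-distance / bandwidth)."""
    return math.exp(-distance / bandwidth)


def weights_for_location(distances, bandwidth):
    """Weights of all observations relative to one regression location."""
    return [exponential_kernel_weight(d, bandwidth) for d in distances]


# Hypothetical distances (km) from one province to four observations
distances_km = [0.0, 100.0, 250.0, 500.0]
fixed_bandwidth_km = 250.0  # "fixed" means the same value at every location
weights = weights_for_location(distances_km, fixed_bandwidth_km)
```

Observations at the regression location receive weight 1, and the weight falls smoothly with distance, so each local model is influenced most by nearby provinces.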
