Agus M Soleh
Department of Statistics, IPB University, Indonesia

Published: 9 Documents
Articles

PEMODELAN CLUSTERWISE REGRESSION PADA STATISTICAL DOWNSCALING UNTUK PENDUGAAN CURAH HUJAN BULANAN [Clusterwise Regression Modeling in Statistical Downscaling for Monthly Rainfall Estimation]
Victor Pandapotan Butar-butar; Agus M Soleh; Aji H Wigena
Indonesian Journal of Statistics and Applications Vol 3 No 3 (2019)
Publisher: Department of Statistics, IPB University, with Forum Perguruan Tinggi Statistika (FORSTAT)

DOI: 10.29244/ijsa.v3i3.310

Abstract

Statistical downscaling (SDS) is one of the developing models for rainfall estimation. The SDS model is a regression model used to analyze the relation between global data (GCM output) and local data (rainfall). Rainfall has a large variance, so clustering is needed to reduce the variance. One analytical method that can be used for clustering in rainfall estimation is clusterwise regression. There are three methods for clusterwise regression, namely Linear Regression, the Finite Mixture Method (FMM), and the Cluster-Weighted Method (CWM). This study used GCM output data, namely CFSRv2, as covariates. The response variable is rainfall data from BMKG at four stations: Bandung, Bogor, Citeko, and Jatiwangi. The purpose of this study is to increase the accuracy of rainfall estimation using the three methods and to compare clusterwise regression with PCR and PLS models. Based on the RMSEP values, clusterwise regression with FMM was the best method for estimating rainfall at the four stations.
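
The RMSEP criterion used to rank the models above can be sketched as follows. This is a minimal illustration, assuming held-out observed and predicted rainfall values are available; the function name `rmsep`, the model names, and all numbers are hypothetical and are not taken from the study.

```python
import math

def rmsep(observed, predicted):
    """Root mean square error of prediction over held-out observations."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical monthly rainfall (mm) and predictions from two hypothetical models.
observed = [120.0, 80.0, 200.0, 150.0]
model_a = [110.0, 90.0, 190.0, 160.0]
model_b = [100.0, 100.0, 230.0, 120.0]

scores = {"model_a": rmsep(observed, model_a), "model_b": rmsep(observed, model_b)}
best = min(scores, key=scores.get)  # lower RMSEP means better predictions
```

The model with the smallest RMSEP is preferred, which is how the abstract's comparison between clusterwise regression, PCR, and PLS would be read.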

KAJIAN PENGARUH PENAMBAHAN INFORMASI GEROMBOL TERHADAP PREDIKSI AREA NIRCONTOH PADA DATA BINOMIAL [A Study of the Effect of Adding Cluster Information on Prediction for Non-Sampled Areas with Binomial Data]
Beny Trianjaya; Anang Kurnia; Agus M Soleh
Indonesian Journal of Statistics and Applications Vol 4 No 4 (2020)
Publisher: Department of Statistics, IPB University, with Forum Perguruan Tinggi Statistika (FORSTAT)

DOI: 10.29244/ijsa.v4i4.333

Abstract

Employment data is one of the important indicators of a country's development progress. Labor conditions in Indonesia can only be compared over time through the Survei Angkatan Kerja Nasional (Sakernas) data. The data generated from Sakernas and published by BPS are the numbers of employed and unemployed people. The obstacle in estimating the semester unemployment rate at the regency/municipality level is the insufficient sample size. One of the indirect estimation approaches currently being developed is small area estimation (SAE). This study extends the generalized linear mixed model (GLMM) by adding cluster information and examines several modified model scenarios. The purpose of this study was to develop a prediction model based on the basic GLMM in a small area approach by adding cluster information as a fixed effect or a random effect. The simulation results show that Model-2, which adds a fixed k-cluster effect and also adds the mean of the estimated random area effects in the sampled areas, is the best model, with the smallest relative bias (RB) and relative root mean squared error (RRMSE). This model is better than the basic GLMM (Model-0) and Model-1 (a model that only adds the mean of the estimated random area effects in the sampled areas). Model-2 is applied to estimate the proportion of unemployed at the sub-district level in Southeast Sulawesi Province. Estimating the proportion of unemployed with the calibrated Model-2 produced an aggregated estimate of the unemployment proportion for Southeast Sulawesi Province of 0.0272. This result is similar to the BPS figure (0.0272). Thus, the estimated proportions of unemployment at the sub-district level from the calibrated Model-2 can be considered feasible to use.

Handling of Overdispersion in the Poisson Regression Model with Negative Binomial for the Number of New Cases of Leprosy in Java: Penanganan Overdispersi pada Model Regresi Poisson dengan Binomial Negatif untuk Jumlah Kasus Baru Kusta di Jawa
Yopi Ariesia Ulfa; Agus M Soleh; Bagus Sartono
Indonesian Journal of Statistics and Applications Vol 5 No 1 (2021)
Publisher: Department of Statistics, IPB University, with Forum Perguruan Tinggi Statistika (FORSTAT)

DOI: 10.29244/ijsa.v5i1p1-13

Abstract

Based on data from the Directorate General of Disease Prevention and Control of the Ministry of Health of the Republic of Indonesia, in 2017 the number of new leprosy cases on Java Island was the highest in Indonesia compared with other islands. The purpose of this study is to compare Poisson regression with a negative binomial regression model applied to the number of new leprosy cases, and to find out which explanatory variables have a significant effect on the number of new leprosy cases in Java. The results indicate that a negative binomial regression model can overcome the overdispersion of the Poisson regression model. The variables that significantly affect the number of new leprosy cases, based on the negative binomial regression model, are total population, the percentage of children under five years who had been immunized with BCG, and the percentage of the population with sustainable access to clean water.
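
Overdispersion, the problem the negative binomial model addresses above, can be illustrated with a simple check. The sketch below assumes an intercept-only simplification in which the sample variance of the counts is compared with their mean (a Poisson model implies a ratio near 1); the study's own models include explanatory variables, and the counts and the name `dispersion_ratio` are hypothetical.

```python
from statistics import mean, variance

def dispersion_ratio(counts):
    """Sample variance divided by sample mean; close to 1 under a Poisson model."""
    m = mean(counts)
    if m == 0:
        raise ValueError("mean of counts must be positive")
    return variance(counts) / m

# Hypothetical counts of new cases per district (not data from the study).
counts = [0, 2, 1, 15, 3, 40, 0, 7, 1, 22]
ratio = dispersion_ratio(counts)

# A ratio well above 1 is an informal sign of overdispersion; formal tests
# and a fitted regression model would be needed to confirm it.
overdispersed = ratio > 1.0
```

When the ratio is far above 1, the Poisson assumption of equal mean and variance is doubtful, which motivates a model with an extra dispersion parameter such as the negative binomial.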

PENDUGAAN CURAH HUJAN DENGAN TEKNIK STATISTICAL DOWNSCALING MENGGUNAKAN CLUSTERWISE REGRESSION SEBARAN TWEEDIE [Rainfall Estimation with the Statistical Downscaling Technique Using Clusterwise Regression with the Tweedie Distribution]
Riza Indriani Rakhmalia; Agus M Soleh; Bagus Sartono
Indonesian Journal of Statistics and Applications Vol 4 No 3 (2020)
Publisher: Department of Statistics, IPB University, with Forum Perguruan Tinggi Statistika (FORSTAT)

DOI: 10.29244/ijsa.v4i3.667

Abstract

Rainfall prediction is one of the most challenging problems of the last century. Statistical downscaling is one of the most commonly used rainfall estimation techniques. The goal of this paper is to develop a clusterwise regression model for rainfall data that follow a Tweedie distribution. The data used were precipitation from the Climate Forecast System Reanalysis (CFSR) version 2 as the predictor variables and rainfall from BMKG as the response variable. Data were collected from January 2010 to December 2019 at the Bogor, Citeko, Jatiwangi, and Bandung rain posts. The best result of this study is a clusterwise regression model with 4 clusters using the Tweedie distribution at each rain post. The best model was evaluated by the root mean square error of prediction (RMSEP). The RMSEP value is 17.11 at the Bogor rain post (three clusters), 14.85 at Citeko (two clusters), 15.26 at Jatiwangi (three clusters), and 14.33 at Bandung (two clusters). The model was able to model and cluster daily rainfall well.

Nowcasting Indonesia’s GDP Growth Using Machine Learning Algorithms
Nadya Dwi Muchisha; Novian Tamara; Andriansyah Andriansyah; Agus M Soleh
Indonesian Journal of Statistics and Applications Vol 5 No 2 (2021)
Publisher: Department of Statistics, IPB University, with Forum Perguruan Tinggi Statistika (FORSTAT)

DOI: 10.29244/ijsa.v5i2p355-368

Abstract

GDP is important to monitor in real time because of its usefulness for policy making. We built and compared ML models to forecast Indonesia's GDP growth in real time. We used 18 variables consisting of quarterly macroeconomic and financial market statistics. We evaluated the performance of six popular ML algorithms, namely Random Forest, LASSO, Ridge, Elastic Net, Neural Networks, and Support Vector Machines, in real-time forecasting of GDP growth over the period 2013:Q3 to 2019:Q4. We used the RMSE, MAD, and Pearson correlation coefficient as measures of forecast accuracy. The results showed that all of these models outperformed the AR(1) benchmark. The individual model with the best performance was Random Forest. To obtain more accurate forecasts, we ran forecast combinations using equal weighting and lasso regression. The best model was obtained from the forecast combination using lasso regression with selected ML models, namely Random Forest, Ridge, Support Vector Machine, and Neural Network.

Pemodelan Pola Produktivitas Cabai Rawit di Kabupaten Magelang [Modeling the Productivity Pattern of Cayenne Pepper in Magelang Regency]
Yohanes Purnama; Farit M Affendi; Agus M Soleh
Xplore: Journal of Statistics Vol. 10 No. 1 (2021)
Publisher: Department of Statistics, IPB

DOI: 10.29244/xplore.v10i1.358

Abstract

The objective of this study was to determine the best model to describe the pattern of cayenne pepper productivity in Magelang Regency. This study uses primary data obtained from a survey of cayenne pepper production by the General Director of Horticulture on several sample plots in Magelang Regency, Central Java Province, in 2018. The data analysis was divided into two parts: grouping the sample plots by similarity in productivity pattern, and then fitting models within each group. The models fitted to the data were the Logistic Growth Model, Monomolecular Growth Model, Exponential Growth Model, Polynomial Model, and Linear B-Spline Model. The best model was determined based on R² and MAPE. The results showed that the pattern of cayenne pepper productivity in Magelang Regency had eight different characteristics. The characteristics of each group were illustrated by the similarity of their productivity patterns. The best model in each group was the Linear B-Spline Model.
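
Identifikasi Faktor-Faktor yang Memengaruhi Prestasi Mahasiswa Menggunakan Regresi Logistik Ordinal dan Random Forest Ordinal: Studi Kasus Mahasiswa FMIPA IPB Angkatan 2015-2017 [Identifying Factors That Influence Student Achievement Using Ordinal Logistic Regression and Ordinal Random Forest: A Case Study of FMIPA IPB Students, 2015-2017 Cohorts]
Zuhdiyah Izzatun Nisa'; Agus M Soleh; Hari Wijayanto
Xplore: Journal of Statistics Vol. 10 No. 1 (2021)
Publisher: Department of Statistics, IPB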

DOI: 10.29244/xplore.v10i1.465

Abstract

Student achievement is the result of students' learning processes and efforts. This research was conducted through a survey of FMIPA IPB students from the 2015-2017 cohorts, with respondents selected by stratified random sampling. The purpose of this study is to identify the factors that influence the achievement of these students using ordinal logistic regression and ordinal random forest. The response variables are the PPKU GPA category and the last even-semester GPA, categorized according to the IPB graduation predicates. Ordinal logistic regression identified 7 explanatory variables that influence the PPKU GPA and 7 explanatory variables that influence the last even-semester GPA. The explanatory variables that are significant in ordinal logistic regression and are also among the 10 most important variables in the ordinal random forest for both response variables are department, mother’s education, daily internet access for games, activity in class, and active work on group assignments.

Perbandingan Pengklasifikasian Metode Support Vector Machine dan Random Forest (Kasus Perusahaan Kebun Kelapa Sawit) [Comparison of Support Vector Machine and Random Forest Classification Methods (Case of an Oil Palm Plantation Company)]
Nabila Destyana Achmad; Agus M Soleh; Akbar Rizki
Xplore: Journal of Statistics Vol. 11 No. 2 (2022)
Publisher: Department of Statistics, IPB

DOI: 10.29244/xplore.v11i2.919

Abstract

Palm oil is one of the leading commodities supporting the Indonesian economy. One company in the oil palm plantation sector has 146 plantation units. Optimizing oil palm production is very important, so the status of the plantation units needs to be classified. The classification aims to predict the status of new plantation units and to find the most important variables in the modeling process. The variables used were plantation status as the response variable and nine explanatory variables, namely harvested area, rainfall, percentage of normal fruit, fresh fruit bunches production, oil palm loose fruits, production, harvest job performance, harvesting rotation, and farmers. Classification was carried out using the Support Vector Machine (SVM) and Random Forest methods to find which method performs best. The data were divided into 80% training data and 20% test data over ten iterations, so that ten models were produced for each method. The models were evaluated by comparing accuracy, F1 score, and Area Under the Curve (AUC). The modeling results show that the random forest method performs better than the SVM method. The random forest has an average accuracy, F1 score, and AUC of 90%, 86%, and 89%, respectively. Harvest job performance, oil palm loose fruits, harvested area, rainfall, and harvesting rotation are important variables, each contributing more than 10% to the model. The results are used in the evaluation and development of oil palm companies, taking into account the important variables that affect productivity and the predicted status of new plantation units.
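
The repeated 80%/20% evaluation scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions: the labels are randomly generated, the status is treated as binary, and a majority-class predictor stands in for the SVM and random forest models, none of which comes from the study. The names `repeated_holdout` and `accuracy` are hypothetical.

```python
import random

def repeated_holdout(n_units, test_fraction=0.2, repeats=10, seed=0):
    """Yield (train_idx, test_idx) pairs from independent random splits."""
    rng = random.Random(seed)
    n_test = round(n_units * test_fraction)
    for _ in range(repeats):
        idx = list(range(n_units))
        rng.shuffle(idx)
        yield idx[n_test:], idx[:n_test]

def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical binary status labels for 146 plantation units (assumption).
label_rng = random.Random(1)
labels = [label_rng.choice([0, 1]) for _ in range(146)]

accuracies = []
for train_idx, test_idx in repeated_holdout(len(labels)):
    train_labels = [labels[i] for i in train_idx]
    # Stand-in model: always predict the majority class of the training labels.
    majority = max(set(train_labels), key=train_labels.count)
    y_true = [labels[i] for i in test_idx]
    accuracies.append(accuracy(y_true, [majority] * len(y_true)))

# Averaging over the ten splits gives one summary figure per method.
mean_accuracy = sum(accuracies) / len(accuracies)
```

In practice the stand-in predictor would be replaced by each fitted classifier, and F1 score and AUC would be averaged over the same ten splits in the same way.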