Model Selection For Forecasting Rainfall Dataset
Amri Muhaimin; Hendri Prabowo; Suhartono
International Journal of Data Science, Engineering, and Analytics, Vol. 1 No. 1 (2021)
Publisher : International Journal of Data Science, Engineering, and Analytics

DOI: 10.33005/ijdasea.v1i1.2

Abstract

The objective of this research is to find the best method for forecasting rainfall at the Wonorejo reservoir in Surabaya. Time series and causal approaches, using both statistical and machine learning methods, are compared. Time series regression (TSR), autoregressive integrated moving average (ARIMA), linear regression (LR), and transfer function (TF) models serve as the statistical methods. Feedforward neural networks (FFNN) and deep feedforward neural networks (DFFNN) serve as the machine learning methods. Statistical methods are used to capture linear patterns, whereas the machine learning methods are used to capture nonlinear patterns. Hourly rainfall data from the Wonorejo reservoir is used as a case study. The data has a seasonal pattern, i.e. monthly seasonality. Based on cross-validation and information criteria, the results show that DFFNN using the time series approach produces more accurate forecasts than the other methods. In general, the machine learning methods are more accurate than the statistical methods. The research also identifies the parameter settings that work best for building the neural network model. Moreover, these results are not in line with the results of the M3 and M4 competitions, i.e. that more complex methods do not necessarily produce better forecasts than simpler methods.

Negative Binomial Time Series Regression – Random Forest Ensemble in Intermittent Data
Amri Muhaimin; Prismahardi Aji Riyantoko; Hendri Prabowo; Trimono Trimono
International Journal of Data Science, Engineering, and Analytics, Vol. 1 No. 2 (2021)
Publisher : International Journal of Data Science, Engineering, and Analytics

DOI: 10.33005/ijdasea.v1i2.10

Abstract

An intermittent dataset is challenging to forecast because it contains many zeros. Sales data and rainfall data are typical examples, since in both cases no value may be recorded in some periods. This research builds a model to address that problem using an ensemble approach. Intermittent data mostly follows the Negative Binomial distribution because the variance exceeds the mean. We use two datasets, rainfall data and sales data. Our approach builds a base model from a Negative Binomial time series regression and then augments it with a tree-based model, random forest. We compare the results with two benchmark methods, the Croston method and Single Exponential Smoothing (SES). Based on the metric value, our approach outperforms the benchmarks by 1.79 and 7.18.
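
The two benchmark methods named in this abstract can be sketched in a few lines. This is a minimal illustration of textbook SES and the classic Croston update, not the authors' implementation; the function names, the smoothing parameter `alpha=0.1`, and the initialization choices are assumptions made for the example.

```python
def ses_forecast(series, alpha=0.1):
    """Single Exponential Smoothing: one-step-ahead forecast after the last point.

    Assumption: the level is initialized with the first observation.
    """
    level = float(series[0])
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level


def croston_forecast(series, alpha=0.1):
    """Classic Croston forecast: smoothed demand size / smoothed demand interval.

    Assumption: both estimates are initialized at the first nonzero observation.
    """
    size = None      # smoothed size of nonzero observations
    interval = None  # smoothed number of periods between nonzero observations
    gap = 1          # periods elapsed since the last nonzero observation
    for y in series:
        if y > 0:
            if size is None:
                size, interval = float(y), float(gap)
            else:
                size = alpha * y + (1 - alpha) * size
                interval = alpha * gap + (1 - alpha) * interval
            gap = 1
        else:
            gap += 1
    # A series with no nonzero values yields a zero forecast.
    return 0.0 if size is None else size / interval
```

Both estimates in Croston's method are updated only when a nonzero value occurs, which is why it is a common baseline for series dominated by zeros, whereas SES updates its level at every period.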

Metric Comparison For Text Classification
Amri Muhaimin; Tresna Maulana Fahrudin; Trimono; Prismahardi Aji Riyantoko; Kartika Maulida Hindrayani
International Journal of Data Science, Engineering, and Analytics, Vol. 2 No. 1 (2022)
Publisher : International Journal of Data Science, Engineering, and Analytics

DOI: 10.33005/ijdasea.v2i1.34

Abstract

Text classification has been popular in recent years. The first step in classifying text is to convert it into numeric values. Candidate values include Term Frequencies, Inverse Document Frequencies, Term Frequencies – Inverse Document Frequencies, and the raw frequency of each word. This study aims to find which metric value works best for text classification. The methods used are Naïve Bayes, Logistic Regression, and Random Forest. The evaluation scores are accuracy and the Area Under Curve value. Some metric values turn out to produce similar evaluation scores. Random Forest is the best of the methods compared, and the best metric for text classification is Term Frequencies – Inverse Document Frequencies.
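
To make the Term Frequencies – Inverse Document Frequencies weighting concrete, here is a minimal sketch. The abstract does not say which TF-IDF variant the study used, so the choices here (whitespace tokenization, length-normalized term frequency, unsmoothed natural-log IDF) and the function name `tf_idf` are assumptions for illustration only.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = count / document length, idf = ln(N / document frequency).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each term.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights
```

Under this variant, a term that appears in every document gets a weight of zero, so only terms that distinguish documents contribute to the feature vector passed to a classifier.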

Social Media Analysis and Topic Modeling: Case Study of Stunting in Indonesia
Amri Muhaimin; Tresna Maulana Fahrudin; Syifa Syarifah Alamiyah; Heidy Arviani; Ade Kusuma; Allan Ruhui Fatmah Sari; Angela Lisanthoni
Telematika Vol 20, No 3 (2023): October 2023 Edition
Publisher : Jurusan Informatika

DOI: 10.31315/telematika.v20i3.10797

Abstract

Purpose: Stunting is a problem that currently requires special attention in Indonesia. The stunting rate dropped to 21.6% in 2022, and the government has set a target of 14% for 2024. Rapid technological developments and freedom of expression on the internet produce review text data that can be analyzed for evaluation. This study analyzes text data from Twitter users' reviews on stunting. The method used is a text-mining approach with topic modeling based on Latent Dirichlet Allocation.
Design/methodology/approach: The methodology used in this study is Latent Dirichlet Allocation. The data was collected from Twitter with the keyword “stunting.” The data was then cleaned and modeled using Latent Dirichlet Allocation.
Findings/results: The results show that negative sentiment dominates at 60.6%, followed by positive sentiment at 31.5% and neutral sentiment at 7.9%. In addition, this research shows that 'children,' 'decrease,' 'number,' 'prevention,' and 'nutrition' are among the words that appear most often in tweets about stunting.
Originality/value/state of the art: This study uses the keyword “stunting” and analyzes the resulting data. Social media analytics show that the people of Indonesia are largely aware of stunting. Also, Latent Dirichlet Allocation can be used to create the model.