Articles

Data Mining Approach for Breast Cancer Patient Recovery
Fahrudin, Tresna Maulana; Syarif, Iwan; Barakbah, Ali Ridho
EMITTER International Journal of Engineering Technology Vol 5, No 1 (2017)
Publisher : Politeknik Elektronika Negeri Surabaya (PENS)

DOI: 10.24003/emitter.v5i1.190

Abstract

Breast cancer is the second most common type of cancer affecting Indonesian women. Several factors are known to increase the risk of breast cancer, but in Indonesia in particular, these factors often depend on routine treatment. This research examines the determinant factors of breast cancer and analyzes breast cancer patient data to build a useful classification model using a data mining approach. The dataset was taken from an oncology hospital in East Java, Indonesia, and consists of 1097 samples, 21 attributes, and 2 classes. We used three feature selection algorithms, Information Gain, Fisher's Discriminant Ratio, and Chi-square, to select the attributes that contribute most to the data. We applied Hierarchical K-means Clustering to remove the attributes with the lowest contribution. Our experiment showed that only 14 of the 21 original attributes make the highest contribution to the breast cancer data. The clustering algorithm decreased the error ratio from 44.48% (using the 21 original attributes) to 18.32% (using the 14 most important attributes). We also applied classification algorithms to build the classification model and measure its precision on the breast cancer patient data. Naïve Bayes and Decision Tree reached precisions of 92.76% and 92.99%, respectively, under leave-one-out cross-validation. Based on our data, the recovery of breast cancer patients in Indonesia, especially in East Java, should be improved through routine treatment in the hospital, which is related to patient adherence.
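
Information Gain, one of the three attribute-ranking criteria named above, scores an attribute by how much knowing its value reduces the entropy of the class label. A minimal sketch for discrete attributes follows; the attribute names and records are hypothetical, and the paper's Fisher's Discriminant Ratio, Chi-square, and Hierarchical K-means steps are not reproduced.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(column, labels):
    """Class entropy minus the conditional class entropy given a discrete attribute."""
    total = len(labels)
    conditional = 0.0
    for value in set(column):
        subset = [y for x, y in zip(column, labels) if x == value]
        conditional += (len(subset) / total) * entropy(subset)
    return entropy(labels) - conditional

def rank_attributes(rows, labels, names):
    """Return (name, gain) pairs sorted from most to least informative."""
    gains = [(name, information_gain([r[i] for r in rows], labels))
             for i, name in enumerate(names)]
    return sorted(gains, key=lambda pair: pair[1], reverse=True)

# Hypothetical toy records: (treatment_routine, age_group) -> outcome.
rows = [("yes", "old"), ("yes", "young"), ("no", "old"),
        ("no", "young"), ("yes", "old"), ("no", "young")]
labels = ["recovered", "recovered", "not", "not", "recovered", "not"]
print(rank_attributes(rows, labels, ["treatment_routine", "age_group"]))
```

Attributes at the bottom of such a ranking are the candidates for removal; the cut-off itself is a separate design decision.
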
Feature Selection of Network Intrusion Data using Genetic Algorithm and Particle Swarm Optimization
Syarif, Iwan
EMITTER International Journal of Engineering Technology Vol 4, No 2 (2016)
Publisher : Politeknik Elektronika Negeri Surabaya (PENS)

DOI: 10.24003/emitter.v4i2.149

Abstract

This paper describes the advantages of using Evolutionary Algorithms (EA) for feature selection on network intrusion datasets. Most current Network Intrusion Detection Systems (NIDS) are unable to detect intrusions in real time because of the high-dimensional data produced during daily operation. Extracting knowledge from huge data such as intrusion data requires new approaches: the more complex the dataset, the higher the computation time and the harder it is to interpret and analyze. This paper investigates the performance of feature selection algorithms on network intrusion data. We used Genetic Algorithms (GA) and Particle Swarm Optimization (PSO) as feature selection algorithms. When applied to network intrusion datasets, both GA and PSO significantly reduced the number of features. Our experiments show that GA reduced the number of attributes from 41 to 15, while PSO reduced it from 41 to 9. Using k-Nearest Neighbour (k-NN) as the classifier, the GA-reduced dataset, which retains 37% of the original attributes, improved accuracy from 99.28% to 99.70%, and its execution time was 4.8 times faster than that of the original dataset. Using the same classifier, the PSO-reduced dataset, which retains 22% of the original attributes, had the fastest execution time (7.2 times faster than the original dataset). However, its accuracy decreased slightly, by 0.02 percentage points, from 99.28% to 99.26%. Overall, both GA and PSO are good feature selection techniques: they significantly reduced the number of features while maintaining, and sometimes improving, classification accuracy and reducing computation time.
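
The abstract describes GA as a wrapper feature selector evaluated with k-NN. The sketch below shows one common way to set that up, assuming each individual is a binary mask over attributes and fitness is leave-one-out k-NN accuracy minus a small per-feature penalty. The penalty, population size, operators (tournament selection, one-point crossover, bit-flip mutation, elitism), and the synthetic data are all assumptions, not the paper's settings.

```python
import random

def knn_loo_accuracy(X, y, mask, k=1):
    """Leave-one-out accuracy of k-NN restricted to the features where mask[j] == 1."""
    features = [j for j, bit in enumerate(mask) if bit]
    if not features:
        return 0.0
    correct = 0
    for i in range(len(X)):
        neighbours = sorted(
            (sum((X[i][j] - X[t][j]) ** 2 for j in features), y[t])
            for t in range(len(X)) if t != i
        )
        votes = [label for _, label in neighbours[:k]]
        if max(set(votes), key=votes.count) == y[i]:
            correct += 1
    return correct / len(X)

def ga_select(X, y, pop_size=20, generations=30, mutation_rate=0.05,
              penalty=0.01, seed=0):
    """Evolve binary feature masks; fitness = k-NN accuracy - penalty * #features (assumed)."""
    rng = random.Random(seed)
    n = len(X[0])
    cache = {}

    def fitness(mask):
        key = tuple(mask)
        if key not in cache:  # masks repeat often, so memoize the expensive LOO run
            cache[key] = knn_loo_accuracy(X, y, mask) - penalty * sum(mask)
        return cache[key]

    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        next_gen = [m[:] for m in ranked[:2]]  # elitism: carry the two best over
        while len(next_gen) < pop_size:
            # Tournament selection: the fitter of three random masks becomes a parent.
            p1 = max(rng.sample(ranked, 3), key=fitness)
            p2 = max(rng.sample(ranked, 3), key=fitness)
            cut = rng.randint(1, n - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation.
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness)

# Synthetic data: only feature 0 carries the class; features 1-5 are noise.
data_rng = random.Random(42)
X, y = [], []
for i in range(30):
    label = i % 2
    X.append([data_rng.uniform(0, 1) + 3 * label] +
             [data_rng.uniform(0, 1) for _ in range(5)])
    y.append(label)

best = ga_select(X, y)
print("selected features:", [j for j, bit in enumerate(best) if bit])
print("LOO 1-NN accuracy:", knn_loo_accuracy(X, y, best))
```

The per-feature penalty is what pushes the search toward smaller subsets; without it, any mask that keeps feature 0 would score the same here.
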
Arrhythmia Classification Using Long Short-Term Memory with Adaptive Learning Rate
Assodiky, Hilmy; Syarif, Iwan; Badriyah, Tessy
EMITTER International Journal of Engineering Technology Vol 6, No 1 (2018)
Publisher : Politeknik Elektronika Negeri Surabaya (PENS)

DOI: 10.24003/emitter.v6i1.265

Abstract

Arrhythmia is a heartbeat abnormality that can be harmless or harmful, depending on the kind of arrhythmia the patient suffers from. People with arrhythmia usually experience the same physical symptoms, but each arrhythmia requires different treatment. To detect arrhythmia, cardiologists use the electrocardiogram, which represents the electrical activity of the heart and is a kind of highly complex sequential data. A high-performance classification method is therefore needed to assist arrhythmia detection. In this paper, the Long Short-Term Memory (LSTM) method was used to classify arrhythmia, and its performance was boosted by using AdaDelta as an adaptive learning rate method. For comparison, LSTM without an adaptive learning rate was also evaluated. The best accuracy was obtained by LSTM with AdaDelta: the correct classification rate was 98% on the training data and 97% on the test data.
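
AdaDelta adapts each parameter's step size from running averages of squared gradients and squared updates, so no global learning rate has to be tuned. A minimal scalar sketch of the update rule (Zeiler, 2012) follows; ρ = 0.95 and ε = 1e-6 are commonly used values assumed here, and the quadratic objective is a stand-in for the LSTM training loss, not the paper's network.

```python
import math

def adadelta_minimize(grad, x0, steps, rho=0.95, eps=1e-6):
    """Run AdaDelta on a single scalar parameter and return its final value."""
    x = x0
    avg_sq_grad = 0.0    # running average E[g^2]
    avg_sq_update = 0.0  # running average E[dx^2]
    for _ in range(steps):
        g = grad(x)
        avg_sq_grad = rho * avg_sq_grad + (1 - rho) * g * g
        # RMS[dx] / RMS[g] plays the role of a per-parameter learning rate.
        update = -math.sqrt(avg_sq_update + eps) / math.sqrt(avg_sq_grad + eps) * g
        avg_sq_update = rho * avg_sq_update + (1 - rho) * update * update
        x += update
    return x

def quadratic_grad(x):
    """Gradient of the toy objective f(x) = (x - 3)^2."""
    return 2 * (x - 3)

print(adadelta_minimize(quadratic_grad, 0.0, steps=1))
print(adadelta_minimize(quadratic_grad, 0.0, steps=2000))
```

The first steps are tiny because E[dx^2] starts at zero and only ε seeds it; the effective step size then grows as updates accumulate.
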
Influence of Logistic Regression Models For Prediction and Analysis of Diabetes Risk Factors
Maulana, Yufri Isnaini Rochmat; Badriyah, Tessy; Syarif, Iwan
EMITTER International Journal of Engineering Technology Vol 6, No 1 (2018)
Publisher : Politeknik Elektronika Negeri Surabaya (PENS)

DOI: 10.24003/emitter.v6i1.258

Abstract

Diabetes is a very serious chronic disease. It occurs when the pancreas does not produce enough insulin (a hormone used to regulate blood sugar), causing blood glucose to be high. The purpose of this study is to provide a different approach to dealing with cases of diabetes, namely data mining techniques that use the logistic regression algorithm to predict and analyze the risk of diabetes, implemented in a mobile framework. The dataset used for modeling with the logistic regression algorithm was taken from Soewandhie Hospital from August 1 to September 30, 2017. The data obtained from the hospital laboratory have 11 attributes; after removing one attribute, the medical record number, 10 attributes remain. During data preparation, the dataset was preprocessed using missing-value replacement, normalization, and feature extraction to produce good accuracy. The results of this research are a performance measure using the ROC curve and an analysis, using p-values, of the attributes that influence diabetes. Using the logistic regression model with leave-one-out validation, an accuracy of 94.77% was obtained. Nine attributes affect diabetes: age, hemoglobin, sex, blood sugar pressure, serum creatinine, white cell count, urea, total cholesterol, and BMI. The triglycerides attribute has no effect on diabetes.
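
The modeling step above pairs normalization with logistic regression. A minimal sketch follows, assuming min-max normalization and a logistic regression fitted by batch gradient descent on the mean log-loss; the learning rate, epoch count, and the two-attribute toy records are hypothetical. The paper's missing-value replacement, p-value analysis, ROC evaluation, and leave-one-out validation are not reproduced.

```python
import math

def min_max_normalize(rows):
    """Scale each column to [0, 1]; constant columns become 0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(row, lows, highs)] for row in rows]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean log-loss; returns (bias, weights)."""
    n, d = len(X), len(X[0])
    bias, weights = 0.0, [0.0] * d
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * d
        for row, target in zip(X, y):
            # For the log-loss, the gradient w.r.t. the logit is (prediction - target).
            error = sigmoid(bias + sum(w * v for w, v in zip(weights, row))) - target
            grad_b += error
            for j in range(d):
                grad_w[j] += error * row[j]
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return bias, weights

def predict_proba(model, row):
    bias, weights = model
    return sigmoid(bias + sum(w * v for w, v in zip(weights, row)))

# Hypothetical records: (age, blood_sugar) -> diabetic (1) or not (0).
raw = [[25, 90], [30, 95], [35, 100], [40, 105],
       [50, 180], [55, 200], [60, 190], [65, 210]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
X_norm = min_max_normalize(raw)
model = train_logistic(X_norm, y)
print([round(predict_proba(model, r), 3) for r in X_norm])
```

Normalizing first keeps both attributes on the same scale, so a single learning rate works for every weight.
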
Dimensionality Reduction Algorithms on High Dimensional Datasets
Syarif, Iwan
EMITTER International Journal of Engineering Technology Vol 2, No 2 (2014)
Publisher : Politeknik Elektronika Negeri Surabaya (PENS)

DOI: 10.24003/emitter.v2i2.24

Abstract

Classification problems, especially for high-dimensional datasets, have attracted many researchers seeking efficient approaches to address them. However, classification becomes very complicated, especially when the number of possible combinations of variables is very high. In this research, we evaluate the performance of Genetic Algorithm (GA) and Particle Swarm Optimization (PSO) as feature selection algorithms when applied to high-dimensional datasets. Our experiments show that in terms of dimensionality reduction, PSO is much better than GA: PSO reduced the number of attributes of 8 datasets to 13.47% of the original on average, while GA reduced them only to 31.36% on average. In terms of classification performance, GA is slightly better than PSO: GA-reduced datasets performed better than their original versions on 5 of 8 datasets, while PSO-reduced datasets did so on only 3 of 8.

Keywords: feature selection, dimensionality reduction, Genetic Algorithm (GA), Particle Swarm Optimization (PSO).
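
PSO works on continuous positions, so feature selection usually needs a discrete variant. One common choice is binary PSO (Kennedy and Eberhart, 1997), in which each bit is resampled as 1 with probability sigmoid(velocity). A minimal sketch follows; the toy fitness function is a hypothetical stand-in for classifier accuracy, and the swarm size, acceleration coefficients, velocity clamp, and absence of an inertia weight are assumptions rather than the paper's settings.

```python
import math
import random

def binary_pso(fitness, n_bits, n_particles=15, iterations=100,
               c1=2.0, c2=2.0, v_max=4.0, seed=0):
    """Binary PSO: real-valued velocities; each bit becomes 1 with
    probability sigmoid(velocity). Returns (best mask, best score)."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n_particles)]
    vel = [[0.0] * n_bits for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_score = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_score[i])
    gbest, gbest_score = pbest[g][:], pbest_score[g]
    for _ in range(iterations):
        for i in range(n_particles):
            for j in range(n_bits):
                # Pull toward the particle's own best and the swarm's best bit.
                v = (vel[i][j]
                     + c1 * rng.random() * (pbest[i][j] - pos[i][j])
                     + c2 * rng.random() * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-v_max, min(v_max, v))  # clamp the velocity
                prob_one = 1 / (1 + math.exp(-vel[i][j]))
                pos[i][j] = 1 if rng.random() < prob_one else 0
            score = fitness(pos[i])
            if score > pbest_score[i]:
                pbest[i], pbest_score[i] = pos[i][:], score
                if score > gbest_score:
                    gbest, gbest_score = pos[i][:], score
    return gbest, gbest_score

USEFUL = {0, 2, 3, 6}  # hypothetical "informative" attributes

def toy_fitness(mask):
    """Stand-in for accuracy: reward useful attributes, penalize the others."""
    return sum(1.0 if j in USEFUL else -0.5 for j, bit in enumerate(mask) if bit)

best, score = binary_pso(toy_fitness, n_bits=8)
print(best, score)
```

In a real wrapper setup, toy_fitness would be replaced by a cross-validated classifier score on the selected attributes, which is where almost all of the running time goes.
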
Classification Algorithms of Maternal Risk Detection For Preeclampsia With Hypertension During Pregnancy Using Particle Swarm Optimization
Tahir, Muhlis; Badriyah, Tessy; Syarif, Iwan
EMITTER International Journal of Engineering Technology Vol 6, No 2 (2018)
Publisher : Politeknik Elektronika Negeri Surabaya (PENS)

DOI: 10.24003/emitter.v6i2.287

Abstract

Preeclampsia is a pregnancy abnormality that develops after 20 weeks of pregnancy and is characterized by hypertension and proteinuria. The purpose of this research was to predict the level of preeclampsia risk in pregnant women using Neural Network and Deep Learning algorithms and to compare the results of both algorithms. Seventeen parameters were taken from 1077 patient records at Haji General Hospital Surabaya and two hospitals in Makassar, collected from December 12th, 2017 to February 12th, 2018. We used particle swarm optimization (PSO) as the feature selection algorithm. This experiment shows that PSO can reduce the number of attributes from 17 to 7. Using LOO validation on the original data, Deep Learning achieved an accuracy of 95.12%, and it ran faster on the reduced dataset (eight times faster than on the original data). In addition, the accuracy of Deep Learning increased by 0.56 percentage points, to 95.68%. In general, PSO gave excellent results, significantly lowering the number of attributes while maintaining and improving precision and lowering computation time. Deep Learning enables an end-to-end framework that needs only input and output, without requiring the attributes or features to be tweaked, and it does not require long computation time, complex systems, or an in-depth understanding of the data.
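
The accuracies above are measured with leave-one-out (LOO) validation: train on every sample but one, predict the held-out sample, and repeat for each sample. A minimal sketch follows, assuming a classifier passed in as a function; the nearest-centroid classifier, the `select` helper, and the toy data are hypothetical stand-ins, not the Neural Network or Deep Learning models used in the paper.

```python
def leave_one_out_accuracy(X, y, train_and_predict):
    """Hold out each sample once, train on the rest, and score the prediction."""
    correct = 0
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        if train_and_predict(X_train, y_train, X[i]) == y[i]:
            correct += 1
    return correct / len(X)

def nearest_centroid(X_train, y_train, x):
    """A deliberately simple classifier used only to exercise the LOO loop."""
    centroids = {}
    for label in set(y_train):
        members = [row for row, t in zip(X_train, y_train) if t == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]
    return min(centroids,
               key=lambda lbl: sum((a - b) ** 2 for a, b in zip(centroids[lbl], x)))

def select(X, columns):
    """Keep only the given attribute columns (e.g. a feature-selected subset)."""
    return [[row[c] for c in columns] for row in X]

# Hypothetical data: attribute 0 separates the classes, attribute 1 is noisy.
X = [[1.0, 5.0], [1.2, 1.0], [0.8, 3.0], [5.0, 2.0], [5.2, 4.0], [4.8, 0.5]]
y = [0, 0, 0, 1, 1, 1]
print("all attributes:", leave_one_out_accuracy(X, y, nearest_centroid))
print("attribute 0 only:", leave_one_out_accuracy(select(X, [0]), y, nearest_centroid))
```

LOO needs one training run per sample, so with roughly a thousand patient records it multiplies training cost by about a thousand; this is one reason a smaller attribute set pays off in execution time.
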
Enhanced PEGASIS using Dynamic Programming for Data Gathering in Wireless Sensor Network
Mufid, Mohammad Robihul; Al Rasyid, M. Udin Harun; Syarif, Iwan
EMITTER International Journal of Engineering Technology Vol 7, No 1 (2019)
Publisher : Politeknik Elektronika Negeri Surabaya (PENS)

DOI: 10.24003/emitter.v7i1.360

Abstract

A number of routing protocol algorithms, such as Low-Energy Adaptive Clustering Hierarchy (LEACH) and Power-Efficient Gathering in Sensor Information Systems (PEGASIS), have been proposed to overcome the problem of energy consumption in Wireless Sensor Network (WSN) technology. PEGASIS is a development of the LEACH protocol in which all nodes are active during data transfer rounds, thus limiting the lifetime of the WSN. This study proposes an improvement to PEGASIS named Enhanced PEGASIS using Dynamic Programming (EPDP). EPDP uses the Dominating Set (DS) concept to select a subset of nodes to be activated and uses dynamic-programming-based optimization to form chains from the nodes. We use two node topologies: random and static. The Base Station (BS) placement is also divided into several scenarios: outside the network, in the corner of the network, and in the middle of the network. To compare the performance of EPDP, PEGASIS, and LEACH, we analyzed the number of dead nodes, the number of alive nodes, and the remaining energy. The experimental results show that EPDP performs better than LEACH and PEGASIS in terms of the number of dead nodes, the number of alive nodes, and the remaining energy. The best configuration for saving energy places the BS in the middle of the network and uses a static node distribution topology.
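
The abstract says EPDP forms chains with dynamic-programming-based optimization but does not give the formulation. As one possible reading, the sketch below contrasts the greedy chain construction of the original PEGASIS (start at the node farthest from the base station, then repeatedly append the nearest unvisited node) with a Held-Karp style dynamic program that finds the shortest open chain through all nodes. This DP is an assumption, not the authors' algorithm; the Dominating Set step and the energy model are omitted, and the coordinates are hypothetical.

```python
import math
from itertools import combinations

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def greedy_chain(nodes, bs):
    """PEGASIS-style greedy chain: start at the node farthest from the BS,
    then repeatedly append the nearest unvisited node."""
    start = max(range(len(nodes)), key=lambda i: dist(nodes[i], bs))
    chain, unvisited = [start], set(range(len(nodes))) - {start}
    while unvisited:
        nxt = min(unvisited, key=lambda j: dist(nodes[chain[-1]], nodes[j]))
        chain.append(nxt)
        unvisited.remove(nxt)
    return chain

def dp_chain(nodes):
    """Shortest open chain visiting every node once (Held-Karp style DP).
    best[(mask, j)] = (length of shortest path over `mask` ending at j, previous node).
    Exponential in len(nodes), so only practical for small node sets."""
    n = len(nodes)
    best = {(1 << j, j): (0.0, None) for j in range(n)}
    for size in range(2, n + 1):
        for subset in combinations(range(n), size):
            mask = sum(1 << k for k in subset)
            for j in subset:
                prev_mask = mask ^ (1 << j)
                best[(mask, j)] = min(
                    (best[(prev_mask, k)][0] + dist(nodes[k], nodes[j]), k)
                    for k in subset if k != j
                )
    full = (1 << n) - 1
    node = min(range(n), key=lambda j: best[(full, j)][0])
    chain, mask = [node], full
    while best[(mask, node)][1] is not None:  # walk predecessors back to the start
        prev = best[(mask, node)][1]
        mask ^= 1 << node
        node = prev
        chain.append(node)
    return chain[::-1]

def chain_length(nodes, chain):
    return sum(dist(nodes[a], nodes[b]) for a, b in zip(chain, chain[1:]))

nodes = [(0, 0), (4, 0), (1, 3), (5, 4), (2, 1), (6, 1)]  # hypothetical sensors
bs = (3, 10)                                              # hypothetical BS position
for name, chain in [("greedy", greedy_chain(nodes, bs)), ("dp", dp_chain(nodes))]:
    print(name, chain, round(chain_length(nodes, chain), 3))
```

A shorter chain means shorter hops between neighbours, which is what the transmission energy in chain-based protocols depends on; the exact DP is only feasible for small groups of nodes.
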
Spatio Temporal with Scalable Automatic Bisecting-Kmeans for Network Security Analysis in Matagaruda Project
Hisyam, Masfu; Barakbah, Ali Ridho; Syarif, Iwan; S, Ferry Astika
EMITTER International Journal of Engineering Technology Vol 7, No 1 (2019)
Publisher : Politeknik Elektronika Negeri Surabaya (PENS)

DOI: 10.24003/emitter.v7i1.340

Abstract

Internet attacks occur frequently and their incidence increases every year; therefore, the Matagaruda project was built to monitor and analyze internet attacks using an IDS (Intrusion Detection System). Unfortunately, the Matagaruda project lacks trend analysis and spatiotemporal analysis. This makes it difficult to obtain information about the usual seasonal attacks, which sector is attacked most, and the country or territory where an internet attack originated. Because the number of clusters is unknown, this paper proposes a new method, automatic bisecting K-means, whose average SSE is 93 percent better than that of K-means and bisecting K-means. Using Spark for big data processing makes the method highly scalable for processing massive attack data.
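
Bisecting K-means, the base of the method proposed above, builds clusters top-down: start with one cluster, repeatedly split a cluster in two with 2-means, and keep the best of several trial splits. A minimal sketch follows, choosing the cluster with the largest SSE to split (one common criterion). The abstract does not describe how the number of clusters is chosen automatically or how the work is distributed over Spark, so this sketch takes k as a parameter and runs on a single machine; the points are hypothetical.

```python
import random

def sse(points):
    """Sum of squared Euclidean distances from each point to the cluster centroid."""
    if not points:
        return 0.0
    centroid = [sum(c) / len(points) for c in zip(*points)]
    return sum(sum((p - c) ** 2 for p, c in zip(pt, centroid)) for pt in points)

def two_means(points, rng, iterations=20):
    """Lloyd's k-means with k = 2, seeded from two randomly chosen points."""
    centers = rng.sample(points, 2)
    halves = ([], [])
    for _ in range(iterations):
        halves = ([], [])
        for pt in points:
            d = [sum((p - c) ** 2 for p, c in zip(pt, ctr)) for ctr in centers]
            halves[d.index(min(d))].append(pt)
        if not halves[0] or not halves[1]:
            break  # degenerate split; the caller discards it
        centers = [[sum(c) / len(h) for c in zip(*h)] for h in halves]
    return halves

def bisecting_kmeans(points, k, trials=5, seed=0):
    """Split the cluster with the largest SSE until k clusters exist."""
    rng = random.Random(seed)
    clusters = [list(points)]
    while len(clusters) < k:
        i = max(range(len(clusters)), key=lambda idx: sse(clusters[idx]))
        target = clusters[i]
        splits = [two_means(target, rng) for _ in range(trials)] if len(target) >= 2 else []
        splits = [s for s in splits if s[0] and s[1]]
        if not splits:
            break  # nothing left that can be split
        # Keep the trial split with the lowest total SSE.
        best = min(splits, key=lambda s: sse(s[0]) + sse(s[1]))
        clusters[i:i + 1] = list(best)
    return clusters

points = [(0, 0), (0, 1), (1, 0), (10, 0), (10, 1), (11, 0), (0, 10), (0, 11), (1, 10)]
for cluster in bisecting_kmeans(points, k=3):
    print(cluster, round(sse(cluster), 3))
```

Running several trial splits and keeping the lowest-SSE one makes each bisection less sensitive to the random seeding of 2-means.
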
SVM Parameter Optimization using Grid Search and Genetic Algorithm to Improve Classification Performance
Syarif, Iwan; Prugel-Bennett, Adam; Wills, Gary
TELKOMNIKA (Telecommunication Computing Electronics and Control) Vol 14, No 4: December 2016
Publisher : Universitas Ahmad Dahlan

DOI: 10.12928/telkomnika.v14i4.3956

Abstract

Machine learning algorithms have been widely used to solve various kinds of data classification problems. Classification problems, especially for high-dimensional datasets, have attracted many researchers seeking efficient approaches to address them. However, classification becomes very complicated and computationally expensive, especially when the number of possible combinations of variables is very high. Support Vector Machine (SVM) has been shown to perform much better when dealing with high-dimensional datasets and numerical features. Although SVM works well with default parameter values, its performance can be improved significantly through parameter optimization. We applied two methods, Grid Search and Genetic Algorithm (GA), to optimize the SVM parameters. Our experiment showed that SVM parameter optimization using grid search always finds a near-optimal parameter combination within the given ranges. However, grid search was very slow; therefore, it was reliable only on low-dimensional datasets with few parameters. SVM parameter optimization using GA can solve this problem of grid search, and GA proved to be more stable than grid search. Based on average running time on 9 datasets, GA was almost 16 times faster than grid search. Furthermore, GA's results were slightly better than those of grid search on 8 of 9 datasets.
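
Grid search evaluates every combination of candidate values, so its cost is the product of the grid sizes; this is why it slows down as ranges widen or parameters are added. A minimal sketch over the RBF-SVM parameters C and gamma follows, using the coarse log2 grid commonly suggested in the LIBSVM practical guide (C = 2^-5 to 2^15, gamma = 2^-15 to 2^3). The scoring function is a hypothetical stand-in for cross-validated SVM accuracy, since no SVM is trained here.

```python
import math
from itertools import product

def grid_search(evaluate, c_exponents, gamma_exponents):
    """Evaluate every (C, gamma) pair on a log2 grid; return the best pair,
    its score, and the number of evaluations performed."""
    best_score, best_params, evaluations = float("-inf"), None, 0
    for ce, ge in product(c_exponents, gamma_exponents):
        C, gamma = 2.0 ** ce, 2.0 ** ge
        score = evaluate(C, gamma)
        evaluations += 1
        if score > best_score:
            best_score, best_params = score, (C, gamma)
    return best_params, best_score, evaluations

def toy_cv_accuracy(C, gamma):
    """Hypothetical stand-in for cross-validated accuracy, peaking at C = 2^3, gamma = 2^-5."""
    return 1.0 - 0.01 * ((math.log2(C) - 3) ** 2 + (math.log2(gamma) + 5) ** 2)

params, score, n_evals = grid_search(toy_cv_accuracy, range(-5, 16, 2), range(-15, 4, 2))
print(params, score, n_evals)
```

With 11 values of C and 10 of gamma, this coarse grid already needs 110 model evaluations, each of which would be a full cross-validation run for a real SVM. A GA instead spends its evaluation budget adaptively on promising regions.
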