Abdelrahman Elsharif Karrar
College of Computer Science and Engineering, Taibah University, Saudi Arabia

Published: 2 Documents
Articles

The Effect of Using Data Pre-Processing by Imputations in Handling Missing Values
Abdelrahman Elsharif Karrar
Indonesian Journal of Electrical Engineering and Informatics (IJEEI) Vol 10, No 2: June 2022
Publisher : IAES Indonesian Section

DOI: 10.52549/ijeei.v10i2.3730

Abstract

The evolution of big data analytics through machine learning and artificial intelligence techniques has caused organizations in a wide range of sectors, including health, manufacturing, e-commerce, governance, and social welfare, to realize the value of the massive volumes of data accumulating daily in web-based repositories. This has led to the adoption of data-driven decision models; for example, sentiment analysis in marketing, where producers leverage customer feedback and reviews to develop customer-oriented products. However, the data generated in real-world activities is subject to errors resulting from inaccurate measurements or faulty input devices, which may result in the loss of some values. Missing attribute/variable values make data unsuitable for decision analytics because of the noise and inconsistencies that create bias. The objective of this paper was to explore the problem of missing data and develop an advanced imputation model based on machine learning, implemented with the K-Nearest Neighbor (KNN) algorithm in the R programming language, as an approach to handling missing values. The methodology relied on applying advanced machine learning algorithms, which offer high accuracy in pattern detection and predictive analytics, to existing imputation techniques, which handle missing values by random replacement or deletion. According to the results, the advanced imputation technique based on machine learning models replaced missing values in a dataset with 89.5% accuracy. The experimental results showed that pre-processing by imputation delivers high-level performance efficiency in handling missing data values. These findings are consistent with the key idea of the paper, which is to explore alternative imputation techniques for handling missing values to improve the accuracy and reliability of decision insights extracted from datasets.
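
The KNN-based imputation idea described in the abstract above can be illustrated with a minimal sketch. This is not the paper's implementation: the paper reports an R implementation, while this example uses standard-library Python, and the function name `knn_impute`, the sample data, the distance measure (root mean squared difference over the attributes both rows have), and the choice of `k` are all assumptions made for illustration.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of that column over the k nearest rows.

    Hypothetical helper for illustration only; distance is the root mean
    squared difference over the columns that both rows have observed.
    """
    imputed = [list(r) for r in rows]  # copy so the input is left untouched
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                # A neighbour must be a different row with column j observed.
                if m == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(
                    sum((a - b) ** 2 for a, b in shared) / len(shared)
                )
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:  # leave the gap if no usable neighbour exists
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed


# Made-up data: the second row is missing its third attribute.
data = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, None],
    [5.0, 6.0, 7.0],
    [0.9, 1.9, 2.9],
]
filled = knn_impute(data, k=2)
print(filled[1])
```

With `k=2`, the missing value is filled from the two rows closest to the incomplete row on the attributes it does have, so the distant third row does not influence the estimate.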
Handling Imbalanced Data through Re-sampling: Systematic Review
Razan Eltayeb; Abdelrahman Elsharif Karrar; Waleed Ibrahim Osman; Moez Mutasim
Indonesian Journal of Electrical Engineering and Informatics (IJEEI) Vol 11, No 2: June 2023
Publisher : IAES Indonesian Section

DOI: 10.52549/.v11i2.4471

Abstract

Handling imbalanced data is an important issue that can affect the validity and reliability of the results. One common approach to addressing this issue is through re-sampling the data. Re-sampling is a technique that allows researchers to balance the class distribution of their dataset by either over-sampling the minority class or under-sampling the majority class. Over-sampling involves adding more copies of the minority class examples to the dataset in order to balance out the class distribution. On the other hand, under-sampling involves removing some of the majority class examples from the dataset in order to balance out the class distribution. It's also common to combine both techniques, usually called hybrid sampling. It is important to note that re-sampling techniques can have an impact on the model's performance, and it is essential to evaluate the model using different evaluation metrics and to consider other techniques such as cost-sensitive learning and anomaly detection. In addition, it is important to keep in mind that increasing the sample size is always a good idea to improve the performance of the model. In this systematic review, we aim to provide an overview of existing methods for re-sampling imbalanced data. We will focus on methods that have been proposed in the literature and evaluate their effectiveness through a thorough examination of experimental results. The goal of this review is to provide practitioners with a comprehensive understanding of the different re-sampling methods available, as well as their strengths and weaknesses, to help them make informed decisions when dealing with imbalanced data.
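
The two basic re-sampling strategies described in the abstract above, over-sampling the minority class and under-sampling the majority class, can be sketched as follows. This is only an illustration of the random variants under stated assumptions, not a method taken from the review: the function names `random_oversample` and `random_undersample`, the toy data, and the fixed seed are hypothetical.

```python
import random
from collections import Counter


def _group_by_class(samples, labels):
    """Map each class label to the list of samples carrying it."""
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    return by_class


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen examples until every class matches the largest."""
    rng = random.Random(seed)
    by_class = _group_by_class(samples, labels)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out_x.extend(xs + extra)
        out_y.extend([y] * target)
    return out_x, out_y


def random_undersample(samples, labels, seed=0):
    """Keep a random subset of each class, sized to match the smallest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(samples, labels)
    target = min(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        out_x.extend(rng.sample(xs, target))
        out_y.extend([y] * target)
    return out_x, out_y


# Toy imbalanced dataset: 8 majority-class (0) and 2 minority-class (1) examples.
samples = list(range(10))
labels = [0] * 8 + [1] * 2

over_x, over_y = random_oversample(samples, labels)
under_x, under_y = random_undersample(samples, labels)
print(Counter(over_y), Counter(under_y))
```

Over-sampling here only repeats existing minority examples, and under-sampling discards majority examples, which is why the abstract stresses evaluating the resulting model with several metrics. A hybrid scheme would apply both steps toward an intermediate class size.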