CAUCHY: Jurnal Matematika Murni dan Aplikasi

Comparing Several Missing Data Estimation Methods in Linear Regression; Real Data Example and A Simulation Study

Anwar Fitrianto (Department of Statistics, IPB University)
Jap Ee Jia (Department of Mathematics, Faculty of Science, Universiti Putra Malaysia)
Budi Susetyo (Department of Statistics, IPB University)
La Ode Abdul Rahman (Department of Statistics, IPB University)

Article Info

Publish Date
24 May 2023


Analysis of incomplete data could lead to biased estimation when standard statistical procedures are used, since they ignore the missing observations. A further disadvantage of ignoring missing data is that the researcher might not have enough data to conduct an analysis. The main objective of this study is to compare the performance of the listwise deletion (LD), mean substitution (MS), and multiple imputation (MI) methods in estimating parameters. Performance is measured through the bias, standard error, and 95% confidence interval of the estimates of interest when handling data with 10% missing observations. A complete empirical data set was used and treated as population data. Ten percent of the total observations in the population were set as missing arbitrarily by generating random numbers from a uniform distribution. The bias and confidence intervals of the parameter estimates were then calculated to compare the three methods. A Monte Carlo simulation using simulated random numbers was also carried out to investigate the properties of missing data. The simulation used 1000 sampled data sets with 20, 50, and 100 observations, and each sample was set to have 10% missing observations. Standard statistical analyses were run on each data set with missing values, and the parameter estimates were averaged to calculate the bias and standard error of the parameter estimates for every missing data method. The analysis was conducted using SAS version 9.2. It was found that the MI method provided the smallest bias and standard error of parameter estimates and a narrower confidence interval compared to the LD and MS methods. Meanwhile, the LD method gave a smaller bias and standard error of parameter estimates for missing data with a small sample size. The MS method is strongly discouraged for handling missing data because it results in large bias and standard error of parameter estimates.
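The Monte Carlo comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' SAS 9.2 procedure, and everything specific in it is an assumption made for illustration: a simple linear model with one predictor, normally distributed predictor and errors, missing values placed in the response only, the rule of flagging the observations with the smallest uniform draws as missing, and the hypothetical names `fit_slope` and `simulate_bias`. MI is left out to keep the sketch short, so only LD and MS are compared.

```python
import numpy as np


def fit_slope(x, y):
    """Ordinary least squares slope of y regressed on x (with intercept)."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))


def simulate_bias(n=50, n_rep=1000, miss_rate=0.10,
                  beta0=1.0, beta1=2.0, seed=0):
    """Monte Carlo bias and empirical standard error of the slope
    under listwise deletion (LD) and mean substitution (MS).

    Assumed setup (not taken from the paper): y = beta0 + beta1*x + e,
    x and e standard normal, missingness in y only.
    """
    rng = np.random.default_rng(seed)
    n_miss = int(round(miss_rate * n))
    ld = np.empty(n_rep)
    ms = np.empty(n_rep)

    for r in range(n_rep):
        x = rng.normal(size=n)
        y = beta0 + beta1 * x + rng.normal(size=n)

        # Assumed rule: the n_miss smallest uniform draws mark missing cases.
        u = rng.uniform(size=n)
        missing = np.zeros(n, dtype=bool)
        missing[np.argsort(u)[:n_miss]] = True

        # Listwise deletion: drop every case with a missing response.
        ld[r] = fit_slope(x[~missing], y[~missing])

        # Mean substitution: replace missing responses by the observed mean.
        y_ms = y.copy()
        y_ms[missing] = y[~missing].mean()
        ms[r] = fit_slope(x, y_ms)

    # For each method: (bias of the average estimate, empirical standard error)
    return {
        "LD": (float(ld.mean() - beta1), float(ld.std(ddof=1))),
        "MS": (float(ms.mean() - beta1), float(ms.std(ddof=1))),
    }


if __name__ == "__main__":
    for n in (20, 50, 100):
        print(n, simulate_bias(n=n))
```

Under these assumptions, substituting the observed mean for a missing response pulls the fitted slope toward zero, which is one way the large MS bias reported in the abstract can arise. Different missingness patterns or models may behave differently.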

