Imputation of Missing Data Analysis

1.4 Imputation of Missing Data

Imputation is the prediction of a missing value from the other information that an individual provides. The development of fast, accessible computational methods has made a variety of imputation methods practical. The simplest is educated guessing, in which one makes an informed "guess" about a missing value. For example, if a participant answered every other item in their row of the data matrix with a 5, one could assume that the missing value is also a 5. Average imputation fills in the missing value for a variable with the average of the other participants' responses on that variable. If the average of the 40 responses to the question is 6.5, one would use 6.5 as …
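As an illustration of average imputation, the following is a minimal sketch in Python. The function name `mean_impute` and the sample responses are hypothetical and chosen only for this example; missing values are assumed to be represented as `None` or NaN.

```python
import math


def _is_missing(value):
    """Treat None and NaN as missing (an assumption of this sketch)."""
    return value is None or math.isnan(value)


def mean_impute(column):
    """Replace missing entries with the mean of the observed entries."""
    observed = [v for v in column if not _is_missing(v)]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    mean = sum(observed) / len(observed)
    return [mean if _is_missing(v) else v for v in column]


# Hypothetical responses to one question; the third participant did not answer.
responses = [6.0, 7.0, None, 6.5, 6.5]
filled = mean_impute(responses)
```

Here the four observed responses average 6.5, so the missing entry is filled with 6.5. Every missing value in a variable receives the same number under this scheme, which is why it is usually treated as a baseline rather than a final method.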

Be that as it may, these methodologies depend strongly on the regression model. If the incomplete data cannot be properly modelled parametrically, the method may have poor predictive power. In most cases the true distribution of the data set is unknown, yet knowledge of that distribution is indispensable to building a regression model. Non-parametric methods can give better results by capturing the structure in the data sets, for example kernel-based imputation [44] and k-nearest neighbour imputation [37].

In k-nearest neighbour (k-NN or KNN) imputation, a case is imputed using values from the k most similar cases. In a column-oriented formulation, each NaN (NA) in the data is replaced with the corresponding value from the nearest-neighbour column, that is, the column closest in Euclidean distance. If the corresponding value in the nearest-neighbour column is also NaN, the next nearest column is used. In the article "An Evaluation of k-Nearest Neighbour Imputation Using Likert Data", Per Jönsson and Claes Wohlin report that they simulated the k-NN method with different values of k and different proportions of missing data, and state that their findings indicate that it is feasible to use the k-NN method with Likert data. They suggest that a suitable value of k is approximately the square root of the number of complete cases. They also demonstrated that even when the method rules with respect to selecting neighbours were …
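The case-based form of k-NN imputation can be sketched as follows. This is an illustrative sketch under stated assumptions, not the procedure of any cited paper: the function name `knn_impute` and the sample data are hypothetical, neighbours are drawn only from complete cases, distance is Euclidean over the attributes observed in the incomplete case, and a missing value is filled with the mean of the neighbours' values. The default k follows the square-root rule of thumb mentioned above.

```python
import math


def knn_impute(rows, k=None):
    """Fill None entries in each incomplete row from its k nearest complete rows."""
    complete = [r for r in rows if None not in r]
    if not complete:
        raise ValueError("need at least one complete case")
    if k is None:
        # Rule of thumb: k is roughly the square root of the number of complete cases.
        k = max(1, round(math.sqrt(len(complete))))
    k = min(k, len(complete))

    def distance(incomplete, candidate):
        # Euclidean distance over the attributes observed in the incomplete row.
        return math.sqrt(sum((x - y) ** 2
                             for x, y in zip(incomplete, candidate)
                             if x is not None))

    result = []
    for row in rows:
        if None not in row:
            result.append(list(row))
            continue
        neighbours = sorted(complete, key=lambda c: distance(row, c))[:k]
        # Assumption: use the mean of the neighbours' values for each gap.
        result.append([
            sum(n[j] for n in neighbours) / k if v is None else v
            for j, v in enumerate(row)
        ])
    return result


# Hypothetical Likert-style responses; the last case is missing its third item.
data = [
    [5, 5, 5],
    [5, 4, 5],
    [1, 2, 1],
    [2, 1, 1],
    [5, 5, None],
]
imputed = knn_impute(data, k=2)
```

In this example the two nearest complete cases both answered 5 on the third item, so the missing value is filled with 5.0. For ordinal Likert data the mean of the neighbours may not be a valid scale point, so a median or mode could be substituted in the final step.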
