Missing data

From Wikiversity

What is missing data?

Survey responses may have missing data because:

  • A respondent did not answer the question, either intentionally or unintentionally
  • The answer provided did not follow the requested format
  • More answers are provided than were requested (e.g., two options ticked on a single-choice item)
  • A response is illegible

In addition, missing data can be introduced, usually unintentionally, if:

  • The person doing data entry does not enter a response
  • The data analyst deletes data

In SPSS:

  • For numeric variables, a cell left empty is treated as system-missing and automatically displays a decimal point (full stop). Alternatively, a specific code such as -1 or 99 can be entered and then declared as a missing value in Variable View, so that SPSS excludes it from calculations.
  • For string variables, missing data is indicated by a blank cell
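Outside SPSS, a declared missing-value code must still be converted to "missing" before analysis. Otherwise a code such as 99 is treated as a real score and distorts statistics such as the mean. The following is a minimal Python sketch of that step. The codes -1 and 99 come from the example above, the data are hypothetical, and `None` stands in for an empty cell.

```python
# Declared missing-value codes (taken from the -1 / 99 example above).
MISSING_CODES = {-1, 99}

def apply_missing_codes(values, codes=MISSING_CODES):
    """Return a copy of `values` with every declared missing code replaced by None."""
    return [None if v in codes else v for v in values]

# Hypothetical ages; None represents a cell that was left blank.
ages = [23, 99, 41, -1, 35, None]
print(apply_missing_codes(ages))  # [23, None, 41, None, 35, None]
```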

In jamovi:

  • Missing values are indicated by a blank cell
  • Coded missing values can be specified through Variables

A single empty cell therefore means that both a variable (column) and a case (row) have some missing data.

Dealing with missing data

The presence of missing data should be identified through data screening.
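One basic screening step is to count missing values per variable and per case. This shows whether the missing data are concentrated in particular questions or particular respondents. The sketch below is a minimal Python illustration. It assumes each case is a dictionary of variable names to scores with `None` marking a missing cell, and the data and variable names are hypothetical.

```python
# Hypothetical survey data: one dict per case (row), None = missing cell.
data = [
    {"q1": 4, "q2": 5, "q3": None},
    {"q1": None, "q2": 3, "q3": 2},
    {"q1": 5, "q2": 4, "q3": 4},
    {"q1": 2, "q2": None, "q3": None},
]

def missing_per_variable(rows):
    """Count missing cells in each column (variable)."""
    variables = rows[0].keys()
    return {var: sum(row[var] is None for row in rows) for var in variables}

def missing_per_case(rows):
    """Count missing cells in each row (case)."""
    return [sum(value is None for value in row.values()) for row in rows]

print(missing_per_variable(data))  # {'q1': 1, 'q2': 1, 'q3': 2}
print(missing_per_case(data))      # [1, 1, 0, 2]
```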

Strategies for dealing with missing data should be decided prior to data analysis.

Listwise

One strategy for dealing with missing data is listwise deletion. This means that any case with even a single missing value on the variables in an analysis is excluded from that analysis. For example, in SPSS syntax:

* Exclude any case that is missing a value on either listed variable.
DESCRIPTIVES VARIABLES=VAR00001 VAR00002
  /STATISTICS=MEAN STDDEV MIN MAX
  /MISSING=LISTWISE.

In other words, to be used in the analysis, each case must have no missing data for the variables of interest.

This approach has the advantage of working only with complete data, but it may discard a lot of potentially useful data. A case that is missing just one value also loses all of its other values.
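The trade-off can be seen in a small example. The following minimal Python sketch performs listwise deletion over a set of analysis variables. The data and variable names are hypothetical, and `None` marks a missing value.

```python
# Hypothetical data: one dict per case, None = missing.
data = [
    {"q1": 4, "q2": 5, "q3": None},
    {"q1": None, "q2": 3, "q3": 2},
    {"q1": 5, "q2": 4, "q3": 4},
    {"q1": 2, "q2": 2, "q3": 3},
]

def listwise(rows, variables):
    """Keep only cases with no missing value on any of the given variables."""
    return [row for row in rows if all(row[v] is not None for v in variables)]

complete = listwise(data, ["q1", "q2", "q3"])
print(len(data), "->", len(complete))  # 4 -> 2

# Mean of q2 using only the complete cases.
mean_q2 = sum(row["q2"] for row in complete) / len(complete)
print(mean_q2)  # 3.0
```

Every case answered q2, yet its mean here rests on only two cases. Using all four observed scores would give 3.5.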

Pairwise

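Alternatively, missing data can be handled pairwise. This means that all available data are used: a case is excluded only from those calculations that involve a variable on which it has a missing value.

This approach includes more data, but it can mean that different statistics are based on different sample sizes (Ns). For example, each correlation in a correlation matrix may be based on a different N.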
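The following minimal Python sketch computes pairwise correlations. For each pair of variables, it uses every case that has both values, so each correlation can have its own N. The data are hypothetical, and the sketch assumes each pair has at least two complete cases with some variation.

```python
import math
from itertools import combinations

# Hypothetical variables; None = missing.
data = {
    "x": [1, 2, 3, 4, 5, 6],
    "y": [2, 4, None, 8, 10, 12],
    "z": [6, None, None, 3, 2, 1],
}

def pearson(a, b):
    """Pearson correlation of two equal-length sequences with no missing values."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def pairwise_correlations(columns):
    """For each pair of variables, correlate only the cases observed on both."""
    results = {}
    for v1, v2 in combinations(columns, 2):
        pairs = [(a, b) for a, b in zip(columns[v1], columns[v2])
                 if a is not None and b is not None]
        xs, ys = zip(*pairs)
        results[(v1, v2)] = (len(pairs), pearson(xs, ys))
    return results

for (v1, v2), (n, r) in pairwise_correlations(data).items():
    print(f"{v1}-{v2}: N = {n}, r = {r:.2f}")
```

Here x and y share five complete cases, while each pair involving z has only four. In SPSS, options such as `/MISSING=PAIRWISE` on correlation procedures give the same per-pair behaviour.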

Imputation

Other approaches use imputation, which involves predicting, or "filling in", the missing values.

The simplest form of imputation is mean replacement (i.e., replacing each missing value with the mean of the observed scores on the same variable).
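A minimal Python sketch of mean replacement for a single variable follows. The scores are hypothetical, `None` marks a missing value, and at least one observed score is assumed.

```python
def mean_replace(values):
    """Replace each missing value (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

scores = [4, None, 5, 3, None, 4]  # hypothetical scores
print(mean_replace(scores))  # [4, 4.0, 5, 3, 4.0, 4]
```

Every imputed value sits exactly at the mean. As a result, this method leaves the mean unchanged but tends to shrink the variable's variance and weaken its correlations with other variables.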

Where composite scores are computed from a reliable set of indicators (e.g., the items of a scale), some missing data on the indicators can be tolerated. The composite is calculated from whichever indicators a case did answer, so that a composite score is created for as many cases as possible.
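One common way to allow this tolerance is to average the answered items, but only when a minimum number of items were answered. SPSS's `MEAN.n` function (e.g., `MEAN.3(...)`) works in a similar way. The Python sketch below is illustrative only. The threshold `MIN_ITEMS = 3` is a hypothetical choice, not a fixed rule, and the responses are hypothetical.

```python
# Hypothetical threshold: minimum number of answered items needed for a composite.
MIN_ITEMS = 3

def composite_mean(items, min_items=MIN_ITEMS):
    """Mean of the answered items, or None if too few items were answered."""
    answered = [v for v in items if v is not None]
    if len(answered) < min_items:
        return None
    return sum(answered) / len(answered)

# Hypothetical responses to a 4-item scale (None = item not answered).
responses = [
    [4, 5, 4, 3],
    [4, None, 5, 3],
    [None, 2, None, None],
]
print([composite_mean(r) for r in responses])  # [4.0, 4.0, None]
```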

More sophisticated imputation methods use regression-based prediction (scores on other, related variables are used to predict the missing value), such as expectation maximisation (EM) or multiple imputation.
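The core idea of regression-based prediction can be shown with single regression imputation from one predictor. The method fits a straight line on the complete cases and then uses it to predict the missing scores. This is a simplified Python sketch of that idea, not an implementation of EM or multiple imputation. The variable names and data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for y = a + b * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return intercept, slope

def regression_impute(predictor, target):
    """Fill missing target values using a line fitted on complete cases.

    Cases where the predictor is also missing are left as None.
    """
    complete = [(x, y) for x, y in zip(predictor, target)
                if x is not None and y is not None]
    xs, ys = zip(*complete)
    intercept, slope = fit_line(xs, ys)
    return [intercept + slope * x if (y is None and x is not None) else y
            for x, y in zip(predictor, target)]

hours = [1, 2, 3, 4, 5]          # hypothetical predictor
score = [52, 54, None, 58, None]  # hypothetical target with missing values
print(regression_impute(hours, score))
```

The observed scores lie exactly on score = 50 + 2 × hours, so the two missing scores are predicted as approximately 56 and 60. Because imputed values fall exactly on the fitted line, single regression imputation tends to overstate relationships. Multiple imputation addresses this by adding random variation and repeating the process several times.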
