Using data mining techniques to discover bias patterns in missing data

Tremblay, MC, Dutta, K, Vandermeer, D. (2010). Using data mining techniques to discover bias patterns in missing data. ACM Journal of Data and Information Quality, 2(1). doi:10.1145/1805286.1805288

cited authors

  • Tremblay, MC; Dutta, K; Vandermeer, D

abstract

  • In today's data-rich environment, decision makers draw conclusions from data repositories that may contain data quality problems. In this context, missing data is an important and well-known problem, since it can seriously affect the accuracy of the conclusions drawn. Researchers have described several approaches for dealing with missing data, primarily attempting to infer the missing values or to estimate the impact of missing data on conclusions. However, few have considered approaches that characterize patterns of bias in missing data, that is, approaches that determine which specific attributes predict whether a data value is missing. Knowledge of the specific systematic bias patterns in the incidence of missing data can help analysts more accurately assess the quality of conclusions drawn from data sets with missing data. This research proposes a methodology that combines a number of Knowledge Discovery and Data Mining techniques, including association rule mining, to discover patterns in related attribute values that help characterize these bias patterns. We demonstrate the efficacy of the proposed approach by applying it to a demonstration census dataset seeded with biased missing data. The experimental results show that the approach was able to find the seeded biases and filter out most of the seeded noise. © 2010 ACM 1936-1955/2010/07-ART2.
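The abstract names association rule mining as one component of the methodology. As a rough illustration only, the sketch below mines rules of the form {attribute = value, ...} ⇒ {target attribute is missing}, assuming records are Python dicts in which `None` marks a missing value. The function name `missingness_rules`, the support/confidence thresholds, the lift-based ranking, and the toy census-like data are all hypothetical choices for this sketch, not the authors' implementation; the other KDD techniques and the noise-filtering step the paper combines are not reproduced here.

```python
from itertools import combinations


def missingness_rules(records, target, min_support=0.1, min_confidence=0.6, max_len=2):
    """Hypothetical sketch: mine rules {attr=value, ...} => {target is missing}.

    records: list of dicts; None marks a missing value.
    Returns (antecedent, support, confidence, lift) tuples, highest lift first.
    """
    n = len(records)
    # Overall missingness rate of the target, used as the baseline for lift.
    base_rate = sum(r[target] is None for r in records) / n
    # Candidate items: observed (attribute, value) pairs on non-target attributes.
    items = sorted({(a, v) for r in records for a, v in r.items()
                    if a != target and v is not None})
    rules = []
    for k in range(1, max_len + 1):
        for antecedent in combinations(items, k):
            if len({a for a, _ in antecedent}) < k:
                continue  # skip itemsets that assign two values to one attribute
            covered = [r for r in records if all(r.get(a) == v for a, v in antecedent)]
            if not covered:
                continue
            hits = sum(r[target] is None for r in covered)
            support = hits / n                 # fraction of all rows matching the full rule
            confidence = hits / len(covered)   # P(target missing | antecedent)
            if support >= min_support and confidence >= min_confidence:
                lift = confidence / base_rate if base_rate else float("inf")
                rules.append((antecedent, support, confidence, lift))
    return sorted(rules, key=lambda rule: -rule[3])


def rows(count, **fields):
    """Return `count` independent copies of one record."""
    return [dict(fields) for _ in range(count)]


# Toy data (hypothetical): income is seeded to go missing for respondents aged
# 18-25, plus one "noise" missing value in the 26-60 group.
data = (
    rows(4, age="18-25", region="north", income=None)
    + rows(3, age="18-25", region="south", income=None)
    + rows(1, age="18-25", region="south", income=30000)
    + rows(6, age="26-60", region="north", income=52000)
    + rows(1, age="26-60", region="south", income=None)
    + rows(5, age="26-60", region="south", income=48000)
)

# The seeded age=18-25 bias surfaces; the single noisy 26-60 row falls below min_support.
for antecedent, support, confidence, lift in missingness_rules(data, "income"):
    print(antecedent, f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

In this toy run the more specific rule (age = 18-25, region = north) ranks above the plain age = 18-25 rule because its confidence is higher; a fuller pipeline would likely prune such redundant refinements, which this sketch deliberately leaves out.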

publication date

  • July 1, 2010

Digital Object Identifier (DOI)

  • 10.1145/1805286.1805288

volume

  • 2

issue

  • 1