A method of matching data (Article)

Rishe, N., Hanani, M. (1987). A method of matching data. Mathematical Modelling, 8(C), 172-174. doi:10.1016/0270-0255(87)90565-3

cited authors

  • Rishe, N; Hanani, M

abstract

  • A probabilistic model and a software implementation have been developed to aid in finding missing persons and in related applications. The method can be applied generally to find the most probable correspondences between two sets of imprecisely described objects. These can be descriptions of illnesses vs. patients (diagnoses), job offerings vs. job applicants, special tasks vs. a personnel file (task assignment problem), etc. Every object of the two sets is described by a collection of data, a significant part of which can be erroneous, unreliable, imprecise or given in several contradictory versions. Among the parameters of the method is the following information about each of the data item types and about some of their possible combinations (the parametric information does not depend on the actual data): its logical characteristics; its importance relative to other types of data items; the meaning and the relative degrees of kinship between values of this data item for two objects to be compared (e.g. kinship of equal values; phonetic kinship; numeric kinship, whose degree is proportional to the inverse of the arithmetic difference between the values; a matrix of kinship degrees defined for possible pairs of values); the interpretation of multiple values of this data item for one object; the a priori probability of the data item's correctness (in addition, the probability of any value for any object can be provided in a set of objects' descriptions by the investigator who gathers the actual data); etc. A straightforward software implementation of the method would result in infeasible time complexity for large sets of objects. Therefore special algorithms have been designed to preprocess the sets of descriptions so that the match-finding time is reduced by an order of magnitude while the probabilistic output remains unaltered. © 1987.
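
The abstract names the ingredients of the method (per-item importance, kinship degrees between values, a priori correctness probabilities, multiple values per object) but not the formulas that combine them. The following Python sketch is therefore only an illustration of how such parameters might fit together, not the authors' probabilistic model. Every name in it (`ItemType`, `match_score`, `best_matches`), the `1 / (1 + |a - b|)` form of numeric kinship, the "best pair wins" reading of multiple values, and the weighted-average score are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


def equal_kinship(a: Any, b: Any) -> float:
    """Kinship of equal values: 1 if identical, else 0."""
    return 1.0 if a == b else 0.0


def numeric_kinship(a: float, b: float) -> float:
    """Kinship that shrinks as the arithmetic difference grows.

    Assumed form: 1 / (1 + |a - b|), chosen so that equal values
    give 1 and no division by zero occurs.
    """
    return 1.0 / (1.0 + abs(a - b))


@dataclass
class ItemType:
    """Hypothetical per-item parameters, independent of the actual data."""
    weight: float        # importance relative to other item types
    reliability: float   # a priori probability that a value is correct
    kinship: Callable[[Any, Any], float]


def _as_list(value: Any) -> list:
    # An object may carry several (possibly contradictory) values.
    return value if isinstance(value, list) else [value]


def match_score(x: dict, y: dict, schema: Dict[str, ItemType]) -> float:
    """Heuristic weighted-average score in [0, 1] for two descriptions."""
    total, weight_sum = 0.0, 0.0
    for name, item in schema.items():
        if name not in x or name not in y:
            continue  # item missing on one side: skipped (an assumption)
        # Assumed interpretation of multiplicity: the best pair counts.
        best = max(item.kinship(a, b)
                   for a in _as_list(x[name]) for b in _as_list(y[name]))
        total += item.weight * item.reliability * best
        weight_sum += item.weight
    return total / weight_sum if weight_sum else 0.0


def best_matches(queries: Dict[str, dict], candidates: Dict[str, dict],
                 schema: Dict[str, ItemType]) -> Dict[str, str]:
    """Brute-force pairing: compares every query with every candidate."""
    return {
        qid: max(candidates,
                 key=lambda cid: match_score(q, candidates[cid], schema))
        for qid, q in queries.items()
    }
```

Note that `best_matches` compares every pair of descriptions, which is the kind of straightforward implementation the abstract calls infeasible for large sets. The preprocessing algorithms mentioned in the abstract are not described there and are not reproduced here.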

publication date

  • January 1, 1987

published in

  • Mathematical Modelling

Digital Object Identifier (DOI)

  • 10.1016/0270-0255(87)90565-3

start page

  • 172

end page

  • 174

volume

  • 8

issue

  • C