SpSJoin: Parallel spatial similarity joins (Conference)

Ballesteros, J, Cary, A, Rishe, N. (2011). SpSJoin: Parallel spatial similarity joins. 481-484. 10.1145/2093973.2094054

cited authors

  • Ballesteros, J; Cary, A; Rishe, N



  • A spatial similarity join of two geospatial datasets finds pairs of records that are simultaneously similar on spatial and textual attributes. Such a join is useful for a variety of applications, such as data cleansing, record linkage, duplicate detection and geocoding enhancement. Efficient techniques exist for the individual joins on either spatial or textual attributes. However, the combined problem has received much less research attention. This paper presents the SpSJoin (Spatial Similarity Join) system to fill this need. SpSJoin is a platform that merges geospatial and text processing techniques to perform spatial similarity joins efficiently. The platform leverages parallel computing with MapReduce to tackle scalability issues in joining large datasets. The efficiency of the proposed techniques is experimentally validated with a join case that improves the geolocation of entities in a real geospatial dataset using reference entities from another dataset. © 2011 Authors.
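
To make the join definition in the abstract concrete, here is a minimal Python sketch of a spatial similarity join on toy data. It is not the SpSJoin implementation: it uses a naive nested loop rather than MapReduce, and the choice of haversine distance, token-set Jaccard similarity, the function names, the thresholds, and the sample records are all assumptions made for illustration only.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres (assumed spatial measure)."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def jaccard(a, b):
    """Jaccard similarity of lowercase word sets (assumed textual measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

def spatial_similarity_join(left, right, max_km, min_sim):
    """Return (left_id, right_id) pairs that are close in space AND similar in text.

    Naive O(|left| * |right|) nested loop, for illustration only.
    Each record is a tuple (id, name, lat, lon).
    """
    pairs = []
    for lid, lname, llat, llon in left:
        for rid, rname, rlat, rlon in right:
            close = haversine_km(llat, llon, rlat, rlon) <= max_km
            similar = jaccard(lname, rname) >= min_sim
            if close and similar:
                pairs.append((lid, rid))
    return pairs

# Hypothetical sample records.
left = [
    (1, "Joe's Pizza Miami", 25.7617, -80.1918),
    (2, "City Library", 25.7740, -80.1930),
]
right = [
    ("a", "Joe's Pizza", 25.7619, -80.1920),
    ("b", "City Library", 40.7128, -74.0060),
]

matches = spatial_similarity_join(left, right, max_km=0.5, min_sim=0.6)
print(matches)
```

In this toy input, record 2 and record "b" share an identical name but lie far apart, so they are rejected; only pairs that satisfy both conditions at once are returned. A scalable system would need to avoid comparing every pair, which is the problem the abstract says SpSJoin addresses with MapReduce.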

publication date

  • December 1, 2011

Digital Object Identifier (DOI)

  • 10.1145/2093973.2094054

start page

  • 481

end page

  • 484