Spatial Approximate String Search


IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 25, NO. 6, JUNE 2013

Feifei Li, Member, IEEE, Bin Yao, Mingwang Tang, and Marios Hadjieleftheriou
Abstract—This work deals with approximate string search in large spatial databases. Specifically, we investigate range queries augmented with a string similarity search predicate in both Euclidean space and road networks. We dub this query the spatial approximate string (SAS) query. In Euclidean space, we propose an approximate solution, the MHR-tree, which embeds min-wise signatures into an R-tree. The min-wise signature for an index node u keeps a concise representation of the union of q-grams from the strings under the subtree of u. We analyze the pruning functionality of such signatures based on the set resemblance between the query string and the q-grams from the subtrees of index nodes. We also discuss how to estimate the selectivity of a SAS query in Euclidean space, for which we present a novel adaptive algorithm that finds balanced partitions using both the spatial and string information stored in the tree. For queries on road networks, we propose a novel exact method, RSASSOL, which significantly outperforms the baseline algorithm in practice. RSASSOL combines q-gram-based inverted lists with reference-node-based pruning. Extensive experiments on large real data sets demonstrate the efficiency and effectiveness of our approaches.
Index Terms—Approximate string search, range query, road network, spatial databases
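
The idea of a min-wise signature over q-grams, as described in the abstract, can be illustrated with a minimal Python sketch. It is not the paper's implementation: the padding characters, the seeded BLAKE2 hash family, the signature length of 64, and all function names below are assumptions made for illustration. What it shows is the property the abstract relies on: the signature of a union of q-gram sets is the component-wise minimum of the individual signatures, so a parent node's signature can be built from its children's.

```python
import hashlib


def qgrams(s, q=2):
    """Return the set of q-grams of s, padded with q-1 sentinel characters.

    The '#' and '$' padding symbols are an illustrative assumption.
    """
    padded = "#" * (q - 1) + s + "$" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def _hash(gram, seed):
    """One member of a seeded hash family (assumed; any good family would do)."""
    data = f"{seed}:{gram}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minwise_signature(grams, num_hashes=64):
    """Keep, for each hash function, the minimum hash value over the set."""
    return [min(_hash(g, seed) for g in grams) for seed in range(num_hashes)]


def merge_signatures(signatures):
    """Signature of the union of the underlying sets: component-wise minimum."""
    return [min(column) for column in zip(*signatures)]


def estimate_resemblance(sig_a, sig_b):
    """Fraction of matching components, an estimate of |A ∩ B| / |A ∪ B|."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)
```

In an R-tree-like index, a leaf would hold the signature of its own strings' q-grams, and each internal node u would hold `merge_signatures` of its children. Comparing the query string's signature with that of u then gives an estimate of how much the query's q-grams overlap with those under u, which is the kind of quantity a pruning rule could use. The estimate is probabilistic, which is consistent with the abstract calling the MHR-tree an approximate solution.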

1 INTRODUCTION

Keyword search over a large amount of data is an important operation in a wide range of domains. Felipe et al. have recently extended its study to spatial databases [17], where keyword search becomes a fundamental building block for an increasing number of real-world applications, and proposed the IR²-Tree. A main limitation of the IR²-Tree is that it only supports exact keyword search.



References: Management of Data, pp. 13-24, 1999. Conf. Advances in Geographic Information Systems (GIS), pp. 61-70, 2010. pp. 322-331, 1990. ACM 30th Symp. Theory of Computing (STOC), pp. 327-336, 1998. SIGMOD Int’l Conf. Management of Data, pp. 805-818, 2008. SIGMOD Int’l Conf. Management of Data, pp. 313-324, 2003. Proc. Int’l Conf. Data Eng. (ICDE), pp. 227-238, 2004. (ICDE), pp. 5-16, 2006. Sciences, vol. 55, no. 3, pp. 441-453, 1997. vol. 2, no. 1, pp. 337-348, 2009. (ICDM), pp. 139-146, 2002. Symp. Discrete Algorithms (SODA), pp. 156-165, 2005. Bases (VLDB), pp. 491-500, 2001. Real Attributes,” The VLDB J., vol. 14, no. 2, pp. 137-154, 2005. pp. 47-57, 1984. pp. 397-408, 2005. vol. 17, no. 5, pp. 1213-1229, 2008. Structure,” Proc. Int’l Conf. Very Large Data Bases (VLDB), pp. 325-336, 2005. Very Large Data Bases (VLDB), pp. 1078-1086, 2004.
