Socialising Data with Google Fusion Tables

Hector Gonzalez, Alon Y. Halevy, Anno Langen, Jayant Madhavan, Rod McChesney, Rebecca Shapley, Warren Shen, Jonathan Goldberg Kidon

2010

Consider a universe of tokens, each associated with a weight, and a database of strings that can be represented as subsets of these tokens. Given a query string, also represented as a set of tokens, a weighted string similarity query identifies all strings in the database whose similarity to the query is larger than a user-specified threshold. Weighted string similarity queries are useful in applications such as data cleaning and integration, where approximate matches must be found despite typographical mistakes, differing formatting conventions, data transformation errors, etc. We show that this problem has semantic properties that can be exploited to design index structures that support very efficient query-answering algorithms.
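The abstract does not name the similarity measure or the index structure, so the following is only a minimal sketch under stated assumptions. It assumes a weighted Jaccard similarity, w(q ∩ s) / w(q ∪ s), where w sums token weights, and a generic inverted index plus a length filter. The filter relies on a property of this assumed measure: if the similarity exceeds a threshold t, then t·w(q) ≤ w(s) ≤ w(q)/t. The names `weighted_jaccard`, `SimilarityIndex`, and the example weights are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List

Weights = Dict[str, float]


def weight(tokens: FrozenSet[str], w: Weights) -> float:
    """Total weight of a token set; unknown tokens count as 0 (assumption)."""
    return sum(w.get(t, 0.0) for t in tokens)


def weighted_jaccard(a: FrozenSet[str], b: FrozenSet[str], w: Weights) -> float:
    """Assumed measure: weight of the intersection over weight of the union."""
    union = weight(a | b, w)
    if union == 0.0:
        return 0.0
    return weight(a & b, w) / union


class SimilarityIndex:
    """Hypothetical index: inverted token lists plus precomputed set weights."""

    def __init__(self, strings: List[FrozenSet[str]], w: Weights):
        self.strings = strings
        self.w = w
        self.lengths = [weight(s, w) for s in strings]  # w(s) for each string
        self.postings: Dict[str, List[int]] = defaultdict(list)
        for sid, s in enumerate(strings):
            for tok in s:
                self.postings[tok].append(sid)

    def query(self, q: FrozenSet[str], threshold: float) -> List[int]:
        """Ids of strings whose similarity to q is strictly above threshold."""
        assert 0.0 < threshold <= 1.0
        wq = weight(q, self.w)
        # Length filter: any answer s must satisfy t*w(q) <= w(s) <= w(q)/t.
        lo, hi = threshold * wq, wq / threshold
        candidates = set()
        # A string sharing no token with q has similarity 0, so scanning only
        # the posting lists of q's tokens cannot miss an answer.
        for tok in q:
            for sid in self.postings.get(tok, ()):
                if lo <= self.lengths[sid] <= hi:
                    candidates.add(sid)
        # Verify surviving candidates with the exact measure.
        return sorted(
            sid for sid in candidates
            if weighted_jaccard(q, self.strings[sid], self.w) > threshold
        )


if __name__ == "__main__":
    # Illustrative weights: rare tokens (house numbers) weigh more than
    # common ones ("st", "street").
    w = {"123": 2.0, "124": 2.0, "main": 1.0, "st": 0.2, "street": 0.2}
    db = [
        frozenset({"123", "main", "st"}),
        frozenset({"123", "main", "street"}),
        frozenset({"124", "main", "st"}),
    ]
    idx = SimilarityIndex(db, w)
    print(idx.query(frozenset({"123", "main", "st"}), 0.8))  # → [0, 1]
```

Weighting lets "123 main street" match "123 main st" (similarity 3.0/3.4 ≈ 0.88), while "124 main st" is rejected (1.2/5.2 ≈ 0.23) because the differing house number carries most of the weight. The length filter is a cheap pruning step. Exact verification is still needed, and the paper's own index structures may exploit different properties.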

Keywords:

Information retrieval
Data mining
Semantic property
Disk formatting
String metric
Query string
Computer science
Efficient algorithm
