Entity linking

In natural language processing, entity linking, also referred to as named entity linking (NEL), named entity disambiguation (NED), named entity recognition and disambiguation (NERD), or named entity normalization (NEN), is the task of assigning a unique identity to entities (such as famous individuals, locations, or companies) mentioned in text. For example, given the sentence 'Paris is the capital of France', the goal is to determine that 'Paris' refers to the city of Paris and not to Paris Hilton or any other entity that could be referred to as 'Paris'. Entity linking differs from named entity recognition (NER) in that NER identifies the occurrence of a named entity in text but does not identify which specific entity it is (see Differences from other techniques). In entity linking, words of interest (names of persons, locations and companies) are mapped from an input text to corresponding unique entities in a target knowledge base. Words of interest are called named entities (NEs), mentions, or surface forms.
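The mapping from mentions to knowledge base entities can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy `KNOWLEDGE_BASE`, its entity identifiers, the `link_entities` function, the use of capitalized words as a stand-in for mention detection, and the word-overlap scoring. Practical systems use far richer candidate generation and ranking.

```python
import re

# Hypothetical toy knowledge base (not a real resource): each entity has a
# surface form and a set of words that typically appear near it.
KNOWLEDGE_BASE = {
    "Paris_(city)": {"surface": "Paris", "context": {"capital", "france", "city", "seine"}},
    "Paris_Hilton": {"surface": "Paris", "context": {"hilton", "hotel", "celebrity", "heiress"}},
    "France": {"surface": "France", "context": {"country", "europe", "capital", "paris"}},
}


def link_entities(text):
    """Map each known mention in `text` to the best-matching entity id."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    links = {}
    # Crude mention detection: capitalized words (a stand-in for real NER).
    for mention in re.findall(r"\b[A-Z][a-z]+\b", text):
        candidates = [
            entity_id
            for entity_id, entry in KNOWLEDGE_BASE.items()
            if entry["surface"] == mention
        ]
        if not candidates:
            continue  # mention not covered by the toy knowledge base
        # Disambiguate by counting context words shared with the sentence.
        context = words - {mention.lower()}
        links[mention] = max(
            candidates,
            key=lambda entity_id: len(KNOWLEDGE_BASE[entity_id]["context"] & context),
        )
    return links


print(link_entities("Paris is the capital of France"))
# {'Paris': 'Paris_(city)', 'France': 'France'}
print(link_entities("Paris Hilton attended a hotel opening"))
# {'Paris': 'Paris_Hilton'}
```

The same surface form 'Paris' resolves to different entities depending on the surrounding words, which is the disambiguation step that NER alone does not perform.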
The target knowledge base depends on the intended application, but entity linking systems intended to work on open-domain text commonly use knowledge bases derived from Wikipedia (such as Wikidata or DBpedia). In this case, each individual Wikipedia page is regarded as a separate entity. Entity linking techniques that map named entities to Wikipedia entities are also called wikification. Considering again the example sentence 'Paris is the capital of France', the expected output of an entity linking system is the pair of Wikipedia pages for Paris and France. The uniform resource locators (URLs) of these pages can be used as unique uniform resource identifiers (URIs) for the entities in the knowledge base. Using a different knowledge base will return different URIs, but for knowledge bases built starting from Wikipedia there exist one-to-one URI mappings. In most cases, knowledge bases are built manually, but in applications where large text corpora are available, the knowledge base can be inferred automatically from the available text.
Entity linking is beneficial in fields that need to extract abstract representations from text, such as text analysis, recommender systems, semantic search and chatbots. In all these fields, concepts relevant to the application are separated from the text and from other non-meaningful data. For example, a common task performed by search engines is to find documents that are similar to one given as input, or to find additional information about the persons mentioned in it. Consider a sentence that contains the expression 'the capital of France': without entity linking, a search engine that looks only at the content of documents would not be able to directly retrieve documents containing the word 'Paris', leading to so-called false negatives (FN). Even worse, the search engine might produce spurious matches, or false positives (FP), such as retrieving documents referring to 'France' as a country.
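The search scenario above can be sketched in a few lines. This is only an illustration under stated assumptions: the `ALIASES` table, the `ENTITY:` identifiers, the stopword list, the two sample documents, and the functions `annotate`, `keyword_search` and `entity_search` are all hypothetical, and longest-alias-first matching stands in for a real entity linker.

```python
import re

# Hypothetical alias table: surface forms mapped to made-up entity ids.
ALIASES = {
    "paris": "ENTITY:Paris",
    "the capital of france": "ENTITY:Paris",
    "france": "ENTITY:France",
}
STOPWORDS = {"a", "in", "is", "of", "the"}

DOCS = [
    "Paris hosted the 2024 Summer Olympics.",
    "France is a country in Western Europe.",
]


def content_words(text):
    """Lowercased words of `text` with stopwords removed."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def annotate(text):
    """Return the entity ids mentioned in `text`, matching longest aliases first."""
    remaining = text.lower()
    entities = set()
    for alias in sorted(ALIASES, key=len, reverse=True):
        pattern = r"\b" + re.escape(alias) + r"\b"
        if re.search(pattern, remaining):
            entities.add(ALIASES[alias])
            # Blank out the match so shorter aliases inside it are not counted.
            remaining = re.sub(pattern, " ", remaining)
    return entities


def keyword_search(query, docs):
    """Return documents sharing at least one content word with the query."""
    return [d for d in docs if content_words(query) & content_words(d)]


def entity_search(query, docs):
    """Return documents sharing at least one linked entity with the query."""
    wanted = annotate(query)
    return [d for d in docs if wanted & annotate(d)]


query = "the capital of France"
print(keyword_search(query, DOCS))  # only the document about France (FP, and an FN for Paris)
print(entity_search(query, DOCS))   # only the document about Paris
```

With word matching, the query retrieves the document about France as a country and misses the one about Paris; after linking, both the query and the Paris document resolve to the same entity identifier.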
Many approaches orthogonal to entity linking exist to retrieve documents similar to an input document, for example latent semantic analysis (LSA) or comparing document embeddings obtained with doc2vec. However, these techniques do not allow the same fine-grained control that entity linking offers, as they return other documents instead of creating high-level representations of the original one. For example, obtaining schematic information about 'Paris', as presented by Wikipedia infoboxes, would be much less straightforward, or sometimes even infeasible, depending on the query complexity. Moreover, entity linking has been used to improve the performance of information retrieval systems and to improve search performance on digital libraries. Entity linking is also a key input for semantic search.
