Domain-specific named entity recognition (NER)?

I want to recognize named entities in a specific domain (such as baseball). I know there are tools like StanfordNER, LingPipe, and AlchemyAPI, and I've done a little testing with them. But I want the recognition to be domain-specific, as mentioned above. How is this possible?



2 answers


One approach might be:

  • Use a generic (non-domain-specific) tool to identify people's names

  • Use a subject classifier to filter out texts that are not in the domain

If the dataset is large enough and the precision of both the extractor and the classifier is good enough, you can use the result to build a list of people's names that are closely related to the domain (for example, by keeping only the names that are mentioned significantly more often in domain-specific texts than in other texts).
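The frequency comparison described above could be sketched roughly as follows. This is only a minimal illustration, not a finished implementation: it assumes the names have already been extracted by some generic NER tool and that each text has already been labeled as in-domain or not by a classifier. The function name `domain_specific_names`, the thresholds `min_ratio` and `min_count`, and the sample data are all hypothetical.

```python
from collections import Counter


def domain_specific_names(docs, min_ratio=2.0, min_count=2):
    """Return names mentioned disproportionately often in domain texts.

    docs: iterable of (names, in_domain) pairs, where `names` is the list of
    person names a generic NER tool found in one text and `in_domain` is the
    subject classifier's decision for that text (both assumed to be given).
    """
    in_counts, out_counts = Counter(), Counter()
    n_in = n_out = 0
    for names, in_domain in docs:
        # Count each name at most once per text (document frequency).
        if in_domain:
            n_in += 1
            in_counts.update(set(names))
        else:
            n_out += 1
            out_counts.update(set(names))

    selected = []
    for name, count in in_counts.items():
        if count < min_count:
            continue  # too rare in the domain texts to judge
        # Add-one smoothing avoids division by zero for names that never
        # appear outside the domain.
        rate_in = (count + 1) / (n_in + 1)
        rate_out = (out_counts[name] + 1) / (n_out + 1)
        if rate_in / rate_out >= min_ratio:
            selected.append(name)
    return sorted(selected)


# Made-up sample data: three baseball texts and three other texts.
docs = [
    (["Babe Ruth", "Barack Obama"], True),
    (["Babe Ruth", "Hank Aaron"], True),
    (["Hank Aaron"], True),
    (["Barack Obama"], False),
    (["Barack Obama", "Angela Merkel"], False),
    (["Angela Merkel"], False),
]
print(domain_specific_names(docs))
```

A simple ratio of smoothed document frequencies is used here for brevity; on real data a proper significance test would probably be a better choice, and the thresholds would need tuning.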



In the case of baseball, this should be a pretty good way to get a list of baseball-related people. However, it would not be a good way to get a list of baseball players specifically. For the latter, you would need to analyze the exact context in which each name is mentioned and what is said about it, which may not be necessary for your purpose.

Edit: By subject classifier, I mean what other people may simply call categorization, document classification, domain classification, or the like. Examples of ready-to-use tools include the classifier in Python-NLTK (see here for an example) and the one in LingPipe (see here).



Take a look at smile-ner.appspot.com, which covers over 250 categories. In particular, it covers many people, teams, and sports clubs. It may be helpful for your purpose.


