Focused Named Entity Recognition (NER)?
One approach might be:
- Use a generic (non-domain-specific) tool to identify people's names.
- Use a subject classifier to filter out texts that are not in the domain.
If the dataset is large enough and the precision of both the name extractor and the classifier is good enough, you can use the result to get a list of people who are closely related to the domain (for example, by limiting the results to names that are mentioned significantly more often in domain-specific texts than in other texts).
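As a rough illustration of the filtering idea above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: `domain_related_names`, `extract_names`, `is_in_domain`, the thresholds and the toy data are hypothetical, and the two callables stand in for whatever real name extractor and subject classifier you choose. It only shows the counting step, comparing how often a name appears in in-domain texts versus other texts.

```python
from collections import Counter
from typing import Callable, Iterable, List


def domain_related_names(
    texts: Iterable[str],
    extract_names: Callable[[str], Iterable[str]],  # hypothetical: your generic NER tool
    is_in_domain: Callable[[str], bool],            # hypothetical: your subject classifier
    min_ratio: float = 2.0,                         # illustrative threshold, tune on your data
    min_count: int = 2,                             # illustrative threshold, tune on your data
) -> List[str]:
    """Return names mentioned at a clearly higher rate in in-domain texts."""
    in_counts, out_counts = Counter(), Counter()
    n_in = n_out = 0
    for text in texts:
        names = set(extract_names(text))  # count each name at most once per text
        if is_in_domain(text):
            n_in += 1
            in_counts.update(names)
        else:
            n_out += 1
            out_counts.update(names)

    selected = []
    for name, count in in_counts.items():
        if count < min_count:
            continue
        # Add-one smoothing avoids division by zero for names never seen outside the domain.
        rate_in = (count + 1) / (n_in + 1)
        rate_out = (out_counts[name] + 1) / (n_out + 1)
        if rate_in / rate_out >= min_ratio:
            selected.append(name)
    return sorted(selected)


# Toy stand-ins for the real tools (assumptions, not real NER or classification).
KNOWN_NAMES = ["Babe Ruth", "John Smith", "Jane Doe"]
DOMAIN_WORDS = ["baseball", "home run", "ballpark"]

toy_texts = [
    "Babe Ruth hit a home run",
    "Babe Ruth pitched at the ballpark",
    "John Smith watched the baseball game",
    "John Smith went to the office",
    "John Smith filed his taxes",
    "Jane Doe cooked dinner",
]

names = domain_related_names(
    toy_texts,
    extract_names=lambda t: [n for n in KNOWN_NAMES if n in t],
    is_in_domain=lambda t: any(w in t.lower() for w in DOMAIN_WORDS),
)
print(names)
```

On the toy data, "John Smith" appears in one baseball text but is dropped because he shows up more often outside the domain. In practice the ratio and count thresholds would need tuning, and a proper statistical test could replace the simple ratio.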
In the case of baseball, this should be a pretty good way to get a list of baseball-related people. However, it would not be a good way to get a list of baseball players specifically. For the latter, you would need to analyze the exact context in which the names are mentioned and what is said about them, but that may not be required.
Edit: By subject classifier, I mean what other people may refer to simply as categorization, document classification, domain classification, or the like. Examples of ready-to-use tools include the classifier in Python NLTK (see here for an example) and the one in LingPipe (see here).