Identifying names in a string

I would like to find a good way to identify the names of people, places, etc. in user searches on my site. For example, if a user asks "how old is George Washington", I need to know from a predefined list that George Washington is a person.

Some of the lists will be global and some will be user-specific. For example, if a user asks "how old is John Smith", I only want to identify the specific John Smith who is that user's associate, and I would not want to identify him as a person if he is not an associate.

Is there an NLP library, or a way of scanning these lists, that would let me take advantage of Soundex matching, mature NLP features, misspelling tolerance, etc.? I could write this by hand, but I would rather use something mature. Thank you.

2 answers


What you need is called Named Entity Recognition (NER).

One of the best-known tools for this is Stanford NER, part of the Stanford NLP software (written in Java): http://nlp.stanford.edu/software/CRF-NER.shtml



If you are on a different platform, there are good open source projects in Ruby and Python. Search for Named Entity Recognition.


The natural language processing (NLP) task you are describing is called Named Entity Recognition (NER).

Besides Stanford CRF-NER (in Java), a popular Python choice is the Natural Language Toolkit (NLTK), which is often used as a baseline for NER tasks.



You can try installing NLTK, fetching the tokenizer, tagger, and chunker data it needs with nltk.download(), and then running the following code:

>>> from nltk.tokenize import word_tokenize
>>> from nltk.tag import pos_tag
>>> from nltk.chunk import ne_chunk
>>> sentence = "How old is John Smith?"
>>> ne_chunk(pos_tag(word_tokenize(sentence)))
Tree('S', [('How', 'WRB'), ('old', 'JJ'), ('is', 'VBZ'), Tree('PERSON', [('John', 'NNP'), ('Smith', 'NNP')]), ('?', '.')])
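
The question also asks about restricting matches to global and user-specific lists while tolerating misspellings, which a general NER model does not do by itself. Below is a minimal sketch of that lookup step, assuming the lists are small enough to hold in memory. It uses the standard library's difflib for approximate matching rather than Soundex, and the list contents, the user names, the function name find_known_people, and the 0.85 cutoff are all hypothetical choices for illustration.

```python
from difflib import get_close_matches

# Hypothetical data: one global list plus per-user lists of known people.
GLOBAL_PEOPLE = ["George Washington", "Abraham Lincoln"]
USER_PEOPLE = {"alice": ["John Smith"]}


def find_known_people(query, user, cutoff=0.85):
    """Return known names (global or specific to `user`) found in `query`."""
    known = GLOBAL_PEOPLE + USER_PEOPLE.get(user, [])
    # Map lowercased names back to their canonical spelling.
    lookup = {name.lower(): name for name in known}
    words = query.lower().replace("?", " ").split()
    found = []
    # Compare every 1- to 3-word window of the query against the known names.
    for size in (3, 2, 1):
        for i in range(len(words) - size + 1):
            candidate = " ".join(words[i:i + size])
            match = get_close_matches(candidate, list(lookup), n=1, cutoff=cutoff)
            if match and lookup[match[0]] not in found:
                found.append(lookup[match[0]])
    return found


# A misspelled global name is still matched for any user.
print(find_known_people("how old is George Washingtn", "bob"))
# John Smith is only recognised for the user whose list contains him.
print(find_known_people("how old is John Smith?", "alice"))
print(find_known_people("how old is John Smith?", "bob"))
```

In practice the candidate spans could come from the PERSON subtrees that ne_chunk returns instead of sliding word windows, with the list lookup acting as a filter on top. The cutoff trades recall against false matches and would need tuning on real queries.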

      
