Identifying the context of a word in a sentence
I have created a classifier that assigns classes to the nouns, adjectives, and named entities in a sentence. I used Wikipedia's large dataset for classification.
For example:
Where was Abraham Lincoln born?
The classifier gives a result in the form word - class:
- Where - question
- Abraham Lincoln - man, film, book (because the classifier finds Abraham Lincoln in all of these categories)
- born - time
When was Titanic released?
- When - question
- Titanic - Song, Movie, Car, Game (Titanic is classified in all of these categories).
Is there a way to determine the exact context (class) for a word?
Please note:
- Word sense disambiguation would not help here, because a sentence may not contain many words that can help.
- Lesk's algorithm with WordNet synsets doesn't help either. For example, for the word bank, the Lesk algorithm behaves like this:
    ======== TESTING simple_lesk ===========
    TESTING simple_lesk() ...
    Context: I went to the bank to deposit money
    Sense: Synset('depository_financial_institution.n.01')
    Definition: a financial institution that accepts deposits and channels the money into lending activities
    TESTING simple_lesk() with POS ...
    Context: The river bank was full of dead fish
    Sense: Synset('bank.n.01')
    Definition: sloping land (especially the slope beside a body of water)
Here, bank is assigned the financial institution sense in the first sentence and the sloping land sense in the second. In my case, however, I already get a set of candidate predictions: for Titanic, it could be a movie or a game.
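To make the overlap idea behind Lesk concrete, here is a minimal simplified-Lesk sketch in plain Python. It is not pywsd's implementation: the two sense signatures for bank are hypothetical, modelled on WordNet's glosses plus one example sentence each, and the stopword list is an assumption. The sense whose signature shares the most content words with the context wins.

```python
import re

# Hypothetical sense signatures (gloss + one example sentence), modelled on
# WordNet entries for illustration only; a real system would load them from WordNet.
SIGNATURES = {
    "bank": {
        "financial_institution": "a financial institution that accepts deposits and "
                                 "channels the money into lending activities. "
                                 "he cashed a check at the bank",
        "sloping_land": "sloping land (especially the slope beside a body of water). "
                        "he sat on the bank of the river and watched the currents",
    }
}

# Assumed minimal stopword list so that function words do not count as overlap.
STOPWORDS = {"a", "an", "the", "to", "of", "and", "into", "was", "i", "he", "on", "at", "that", "in"}


def content_words(text):
    """Lowercase word tokens with stopwords removed."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def simplified_lesk(context, word):
    """Return the sense whose signature shares the most words with the context.

    Ties go to the sense listed first, because only a strictly larger
    overlap replaces the current best.
    """
    context_words = content_words(context) - {word}
    best_sense, best_overlap = None, -1
    for sense, signature in SIGNATURES[word].items():
        overlap = len(context_words & content_words(signature))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


print(simplified_lesk("I went to the bank to deposit money", "bank"))   # financial_institution
print(simplified_lesk("The river bank was full of dead fish", "bank"))  # sloping_land
```

With so few content words in the context, a single shared word ("money", "river") decides the sense, and a tie falls back to whichever sense is listed first, which is the fragility described above for short questions.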
I want to know whether there is a different approach, apart from the Lesk algorithm, baseline algorithms, and traditional word sense disambiguation, that can help me determine which class is right for a particular keyword.
Titanic -
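One alternative to gloss overlap, sketched here under clear assumptions, is to treat the choice among the classifier's candidate classes as supervised text classification: collect sentences in which the keyword is known to refer to each class (for example from the Wikipedia data already used) and train a model on their context words. The training sentences below are hypothetical, and the model is a plain multinomial naive Bayes with add-one smoothing, not something the question's classifier is known to use.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labelled contexts in which "Titanic" refers to a known class.
TRAINING = [
    ("Movie", "Titanic was released in cinemas and won many awards for its director"),
    ("Movie", "the film Titanic was released on DVD"),
    ("Song", "the band played the song Titanic on their album tour"),
    ("Game", "the Titanic game was released for the PC and players explore the ship"),
    ("Car", "the Titanic car has a powerful engine and four doors"),
]


def train(examples):
    """Count context words per class for a multinomial naive Bayes model."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    for label, sentence in examples:
        class_counts[label] += 1
        word_counts[label].update(sentence.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocab


def classify(context, candidates, model):
    """Pick the candidate class with the highest log-posterior given the context."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    words = context.lower().replace("?", "").split()
    best, best_score = None, float("-inf")
    for label in candidates:
        if class_counts[label] == 0:
            continue  # no training data for this class, so it cannot be scored
        n_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total)  # class prior
        for w in words:
            # Add-one smoothing keeps unseen words from zeroing out a class.
            score += math.log((word_counts[label][w] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best


model = train(TRAINING)
print(classify("When was Titanic released?", ["Song", "Movie", "Car", "Game"], model))  # Movie
```

The classifier's existing candidate list restricts the decision, so the context model only has to rank a few classes rather than every possible one; its quality depends entirely on how representative the labelled contexts are.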
Thanks for using the pywsd examples. As far as WSD goes, there are many other options, and I code them myself in my spare time. So if you want this to improve, please join me in coding an open-source tool =)
Meanwhile, you will probably find the following technologies more relevant to your task:
- Knowledge base population (http://www.nist.gov/tac/2014/KBP/), where tokens or text segments are assigned an entity and the task is to link them or to solve a simplified question-answering task.
- Knowledge representation (http://groups.csail.mit.edu/medg/ftp/psz/k-rep.html)
- Knowledge extraction (https://en.wikipedia.org/wiki/Knowledge_extraction)
These technologies usually involve several subtasks, for example:
- Wikification (http://nlp.cs.rpi.edu/kbp/2014/elreading.html)
- Entity linking
- Slot filling (http://surdeanu.info/kbp2014/def.php)
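Slot filling and knowledge base population ultimately produce structured facts about entities. As a rough illustration, assuming a toy knowledge base with hypothetical relation names (`birth_place`, `release_year`) and a hand-written alias table, the sketch below shows how stored facts can resolve ambiguity: of the two entities the string Titanic may refer to, only the film has a release year, so the relation in the question narrows the answer. Mapping a question such as "When was Titanic released?" to the relation `release_year` is assumed, not shown.

```python
# Hypothetical miniature knowledge base of (subject, relation, object) triples.
TRIPLES = [
    ("Abraham_Lincoln", "type", "Person"),
    ("Abraham_Lincoln", "birth_place", "Kentucky"),
    ("Titanic_(1997_film)", "type", "Movie"),
    ("Titanic_(1997_film)", "release_year", "1997"),
    ("RMS_Titanic", "type", "Ship"),
]

# Hypothetical alias table: which KB entities a surface string may refer to.
ALIASES = {
    "Abraham Lincoln": ["Abraham_Lincoln"],
    "Titanic": ["Titanic_(1997_film)", "RMS_Titanic"],
}


def query(mention, relation):
    """Return (entity, value) pairs for candidates of `mention` that have `relation`."""
    results = []
    for entity in ALIASES.get(mention, []):
        for subject, rel, obj in TRIPLES:
            if subject == entity and rel == relation:
                results.append((entity, obj))
    return results


print(query("Titanic", "release_year"))        # [('Titanic_(1997_film)', '1997')]
print(query("Abraham Lincoln", "birth_place"))  # [('Abraham_Lincoln', 'Kentucky')]
```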
Basically, you are asking for a tool that is an NP-complete AI system for language and text processing, so I don't think such a tool exists yet. Perhaps IBM Watson comes closest.
If you're looking for a field of study, the field exists, but if you're looking for tools, wikification tools are probably the closest to what you need (http://nlp.cs.rpi.edu/paper/WikificationProposal.pdf).
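A wikification system typically generates candidate pages for a mention and then ranks them. The sketch below is a minimal, hypothetical version of that ranking step: it mixes a commonness prior (how often the anchor text links to each page) with bag-of-words cosine similarity between the sentence and a short page description. The anchor counts, descriptions, and the weight `alpha` are made-up illustrative values, not taken from any real tool or from the linked proposal.

```python
import math
import re
from collections import Counter

# Hypothetical anchor-text statistics: how often "titanic" links to each page.
ANCHOR_COUNTS = {
    "titanic": {"Titanic (1997 film)": 60, "RMS Titanic": 35, "Titanic (song)": 5},
}

# Hypothetical short descriptions standing in for each page's text.
DESCRIPTIONS = {
    "Titanic (1997 film)": "film directed released in cinemas box office movie",
    "RMS Titanic": "ship liner sank iceberg atlantic voyage passengers",
    "Titanic (song)": "song single album band recorded released",
}


def bag(text):
    """Bag-of-words counts over lowercase alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[w] for w, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def link(mention, context, alpha=0.5):
    """Rank candidate pages by alpha * prior + (1 - alpha) * context similarity."""
    candidates = ANCHOR_COUNTS.get(mention.lower(), {})
    total = sum(candidates.values())
    ctx = bag(context)
    scored = []
    for page, count in candidates.items():
        prior = count / total
        similarity = cosine(ctx, bag(DESCRIPTIONS[page]))
        scored.append((alpha * prior + (1 - alpha) * similarity, page))
    return [page for _, page in sorted(scored, reverse=True)]


print(link("Titanic", "The ship sank after hitting an iceberg"))
# ['RMS Titanic', 'Titanic (1997 film)', 'Titanic (song)']
print(link("Titanic", "When was Titanic released?"))
# ['Titanic (1997 film)', 'RMS Titanic', 'Titanic (song)']
```

Real wikifiers typically add richer features, such as link structure and coherence between entities mentioned in the same document, but the overall shape is the same: candidate generation followed by context-aware ranking.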