Relation Extraction Using Stanford CoreNLP
I am trying to extract information from natural language using the Stanford CoreNLP library.
My goal is to extract (simplified) "subject-action-object" triples from sentences.
As an example, consider the following text:
John Smith only eats an apple and a banana for lunch. He is on a diet and his mother told him it would be great to eat less for lunch. John doesn't like it at all, but since he takes his diet very seriously, he doesn't want to stop.
From this text, I would like to get the following results:
- John Smith - eats - only an apple and a banana for lunch
- He - is - on a diet
- His mother - told - him it would be great to eat less for lunch
- John - doesn't like - it (at all)
- He - takes - his diet very seriously
How would I do this?
Or, more specifically: how can I parse a dependency tree (or a more appropriate tree?) to get the results above?
Any hint, resource, or code snippet regarding this task would be much appreciated.
Side note: I managed to replace coreferences with their representative mentions, which would then change "he" and "his" to the corresponding entity (in this case, John Smith).
The Stanford CoreNLP toolkit comes with a dependency parser.
First of all, here is a link that describes the types of edges (dependency relations) in the tree:
http://universaldependencies.github.io/docs/
There are many ways to use the toolkit to generate a dependency tree.
Here's some sample code to get you started:
import java.io.*;
import java.util.*;
import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.semgraph.*;
import edu.stanford.nlp.util.*;

public class DependencyTreeExample {

    public static void main(String[] args) throws IOException {
        // set up properties: with ssplit.eolonly, each input line is one sentence
        Properties props = new Properties();
        props.setProperty("ssplit.eolonly", "true");
        props.setProperty("annotators", "tokenize, ssplit, pos, depparse");
        // set up pipeline
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        // read the whole input file (first command-line argument) into one string
        String content;
        try (Scanner scanner = new Scanner(new File(args[0]))) {
            content = scanner.useDelimiter("\\Z").next();
        }
        System.out.println(content);
        // annotate the text
        Annotation annotation = new Annotation(content);
        pipeline.annotate(annotation);
        // print the dependency graph of each sentence
        List<CoreMap> sentences = annotation.get(CoreAnnotations.SentencesAnnotation.class);
        for (CoreMap sentence : sentences) {
            System.out.println("---");
            System.out.println("sentence: " + sentence);
            SemanticGraph tree = sentence.get(SemanticGraphCoreAnnotations.CollapsedCCProcessedDependenciesAnnotation.class);
            System.out.println(tree.toString(SemanticGraph.OutputFormat.READABLE));
        }
    }
}
Instructions:
- Cut and paste this into DependencyTreeExample.java
- Put this file in the stanford-corenlp-full-2015-04-20 directory
- javac -cp "*:." DependencyTreeExample.java
- Add your sentences, one sentence per line, to a file named dependency_sentences.txt
- java -cp "*:." DependencyTreeExample dependency_sentences.txt
Example output:
sentence: John doesn't like it at all.
dep        reln       gov
---        ----       ---
like-4     root       root
John-1     nsubj      like-4
does-2     aux        like-4
n't-3      neg        like-4
it-5       dobj       like-4
at-6       case       all-7
all-7      nmod:at    like-4
.-8        punct      like-4
This will print the dependency parses. By working with the SemanticGraph object, you can write code to find the patterns you need.
In this example, you will notice that "like" points to "John" with "nsubj" and to "it" with "dobj".
For reference, you should look at edu.stanford.nlp.semgraph.SemanticGraph:
http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/semgraph/SemanticGraph.html
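To make the pattern idea a bit more concrete, here is a minimal sketch that works on the printed "dep reln gov" rows from the example output rather than on a live SemanticGraph, so it runs without CoreNLP on the classpath. The class name TripleSketch and its methods are hypothetical, made up for illustration. It assumes the relation names shown above (nsubj, dobj, neg); other CoreNLP versions may label these relations differently. It only handles the simplest case: a governor that has both a subject and a direct object.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of CoreNLP.
public class TripleSketch {

    /** Strips the token index appended to each node, e.g. "like-4" -> "like". */
    public static String word(String node) {
        int dash = node.lastIndexOf('-');
        return dash > 0 ? node.substring(0, dash) : node;
    }

    /**
     * Takes rows of the form "dep reln gov" and returns one
     * "subject - verb - object" string for every governor that has
     * both an nsubj and a dobj dependent. A neg dependent is rendered
     * as a leading "not".
     */
    public static List<String> extract(String[] rows) {
        // governor -> (relation -> dependent), kept in input order
        Map<String, Map<String, String>> byGovernor = new LinkedHashMap<>();
        for (String row : rows) {
            String[] fields = row.trim().split("\\s+");
            if (fields.length != 3) {
                continue; // skip rows that do not have exactly three columns
            }
            byGovernor
                .computeIfAbsent(fields[2], k -> new LinkedHashMap<>())
                .put(fields[1], fields[0]);
        }

        List<String> triples = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> entry : byGovernor.entrySet()) {
            Map<String, String> deps = entry.getValue();
            if (deps.containsKey("nsubj") && deps.containsKey("dobj")) {
                String verb = (deps.containsKey("neg") ? "not " : "") + word(entry.getKey());
                triples.add(word(deps.get("nsubj")) + " - " + verb + " - " + word(deps.get("dobj")));
            }
        }
        return triples;
    }

    public static void main(String[] args) {
        // The rows from the example output above, without the header lines.
        String[] rows = {
            "like-4 root root",
            "John-1 nsubj like-4",
            "does-2 aux like-4",
            "n't-3 neg like-4",
            "it-5 dobj like-4",
            "at-6 case all-7",
            "all-7 nmod:at like-4",
            ".-8 punct like-4"
        };
        for (String triple : extract(rows)) {
            System.out.println(triple);
        }
    }
}
```

For the example sentence this prints "John - not like - it". In real code you would walk the edges of the SemanticGraph directly instead of parsing printed text, and you would need more patterns (copulas such as "He is on a diet", clausal objects such as "told him it would be great ...") to cover the other sentences in the question.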
You can also try the new Stanford OpenIE system: http://nlp.stanford.edu/software/openie.shtml . In addition to the standalone download, it is now bundled with CoreNLP 3.6.0+.