Relation Extraction Using Stanford CoreNLP

I am trying to extract information from natural language text using the Stanford CoreNLP library.

My goal is to extract "subject-action-object" pairs (simplified) from sentences.

As an example, consider the following text:

John Smith only eats an apple and a banana for lunch. He is on a diet and his mother told him it would be great to eat less for lunch. John doesn't like it at all, but since he takes his diet very seriously, he doesn't want to stop.

From this text, I would like to get the following results:

  • John Smith - eats - only an apple and a banana for lunch
  • He - is - on a diet
  • His mother - told - him - it would be great to eat less for lunch
  • John - doesn't like - it (at all)
  • He - takes - his diet - very seriously

How would I do this?

Or, more specifically: how can I parse a dependency tree (or some better-suited tree) to get the results above?

Any hint, resource, or code snippet for this task would be much appreciated.

Side note: I have already managed to replace coreferences with their representative mentions, which changes "he" and "his" to the corresponding entity (in this case, John Smith).


2 answers


The Stanford CoreNLP toolkit comes with a dependency parser.

First of all, here is a link that describes the types of edges in the tree:

http://universaldependencies.github.io/docs/

There are many ways to use the toolkit to generate a dependency tree.

Here is some sample code to get you started:

import java.io.*;
import java.util.*;
import edu.stanford.nlp.io.*;
import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.util.*;
import edu.stanford.nlp.semgraph.*;
import edu.stanford.nlp.trees.TreeCoreAnnotations.*;

public class DependencyTreeExample {

    public static void main (String[] args) throws IOException {

        // set up properties
        Properties props = new Properties();
        props.setProperty("ssplit.eolonly","true");
        props.setProperty("annotators",
                "tokenize, ssplit, pos, depparse");
        // set up pipeline
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        // get contents from file
        String content = new Scanner(new File(args[0])).useDelimiter("\\Z").next();
        System.out.println(content);
        // annotate the text; with ssplit.eolonly=true, each input line is treated as one sentence
        Annotation annotation = new Annotation(content);
        pipeline.annotate(annotation);

        List<CoreMap> sentences = annotation.get(CoreAnnotations.SentencesAnnotation.class);
        for (CoreMap sentence : sentences) {
            System.out.println("---");
            System.out.println("sentence: "+sentence);
            SemanticGraph tree = sentence.get(SemanticGraphCoreAnnotations.CollapsedCCProcessedDependenciesAnnotation.class);
            System.out.println(tree.toString(SemanticGraph.OutputFormat.READABLE));
        }


    }

}


To run it:


  • Cut and paste this into DependencyTreeExample.java
  • Put this file in the stanford-corenlp-full-2015-04-20 directory
  • javac -cp "*:." DependencyTreeExample.java
  • Add your sentences, one sentence per line, to a file named dependency_sentences.txt
  • java -cp "*:." DependencyTreeExample dependency_sentences.txt

example output:

sentence: John doesn't like it at all.
dep                 reln                gov                 
---                 ----                ---                 
like-4              root                root                
John-1              nsubj               like-4              
does-2              aux                 like-4              
n't-3               neg                 like-4              
it-5                dobj                like-4              
at-6                case                all-7               
all-7               nmod:at             like-4              
.-8                 punct               like-4 

      

This will print the dependency parses. By working with the SemanticGraph object, you can write code to find the patterns you need.

In this example, you will notice that "like" points to "John" with "nsubj", and "like" points to "it" with "dobj".

For reference, take a look at edu.stanford.nlp.semgraph.SemanticGraph:

http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/semgraph/SemanticGraph.html
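
As a rough illustration of the nsubj/dobj pattern in the example output, here is a minimal sketch that builds a "subject - verb - object" string from rows in the printed dep/reln/gov format. It uses only the JDK and parses the text rows rather than the SemanticGraph object. The class name TripleSketch, its methods, and the "not" prefix for negation are my own assumptions for illustration and are not part of CoreNLP.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not a CoreNLP class.
public class TripleSketch {

    // Strip the token index that the readable format appends, e.g. "like-4" -> "like".
    public static String word(String node) {
        int dash = node.lastIndexOf('-');
        return dash > 0 ? node.substring(0, dash) : node;
    }

    // Build "subject - verb - object" strings from rows of "dep reln gov".
    public static List<String> extractTriples(List<String> rows) {
        // governor node -> (relation -> dependent node), kept in input order.
        // If a governor has several dependents with the same relation, the last one wins.
        Map<String, Map<String, String>> byGovernor = new LinkedHashMap<>();
        for (String row : rows) {
            String[] cols = row.trim().split("\\s+");
            // Skip blank lines, the header row, and the dashed separator row.
            if (cols.length != 3 || cols[0].equals("dep") || cols[0].startsWith("---")) {
                continue;
            }
            byGovernor.computeIfAbsent(cols[2], k -> new LinkedHashMap<>())
                      .put(cols[1], cols[0]);
        }

        List<String> triples = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> entry : byGovernor.entrySet()) {
            Map<String, String> deps = entry.getValue();
            // Keep only governors that have both a subject and a direct object.
            if (deps.containsKey("nsubj") && deps.containsKey("dobj")) {
                // Assumption: mark a "neg" dependent by prefixing the verb with "not".
                String verb = (deps.containsKey("neg") ? "not " : "") + word(entry.getKey());
                triples.add(word(deps.get("nsubj")) + " - " + verb + " - " + word(deps.get("dobj")));
            }
        }
        return triples;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
            "dep                 reln                gov",
            "---                 ----                ---",
            "like-4              root                root",
            "John-1              nsubj               like-4",
            "does-2              aux                 like-4",
            "n't-3               neg                 like-4",
            "it-5                dobj                like-4",
            "at-6                case                all-7",
            "all-7               nmod:at             like-4",
            ".-8                 punct               like-4");
        for (String triple : extractTriples(rows)) {
            System.out.println(triple); // prints: John - not like - it
        }
    }
}
```

In real code you would probably walk the edges of the SemanticGraph directly instead of parsing printed output; the idea of grouping edges by governor stays the same. Clauses without a direct object, such as "He is on a diet", need additional patterns.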



You can also try the new Stanford OpenIE system: http://nlp.stanford.edu/software/openie.shtml . In addition to the standalone download, it is now bundled with CoreNLP 3.6.0+.


