Stanford coreNLP: how to get labels, positions and typed dependencies on parse tree

Question

I am using Stanford CoreNLP to parse some text, which gives me several sentences. From these sentences, I was able to extract noun phrases using TregexPattern, so I get a child tree that is my noun phrase. I also managed to find the head of the noun phrase.

How can I get the position, or even the token / CoreLabel, of that head in the sentence?

Better yet, how can I find the head's dependency relations with the rest of the sentence?

Here's an example:

public void doSomeTextKarate(String text){

    Properties props = new Properties();
    props.put("annotators","tokenize, ssplit, pos, lemma, ner, parse, dcoref");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    this.pipeline = pipeline;


    // create an empty Annotation just with the given text
    Annotation document = new Annotation(text);
    // run all Annotators on this text
    pipeline.annotate(document);

    List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);

    for (CoreMap sentence : sentences) {


        SemanticGraph basicDeps = sentence.get(BasicDependenciesAnnotation.class);
        Collection<TypedDependency> typedDeps = basicDeps.typedDependencies();
        System.out.println("typedDeps ==>  "+typedDeps);

        SemanticGraph collDeps = sentence.get(CollapsedDependenciesAnnotation.class);
        SemanticGraph collCCDeps = sentence.get(CollapsedCCProcessedDependenciesAnnotation.class);

        List<CoreMap> numerizedTokens = sentence.get(NumerizedTokensAnnotation.class);
        List<CoreLabel> tokens = sentence.get(CoreAnnotations.TokensAnnotation.class);

        Tree sentenceTree = sentence.get(TreeCoreAnnotations.TreeAnnotation.class);

        sentenceTree.percolateHeads(headFinder);
        Set<Dependency<Label, Label, Object> > sentenceDeps =   sentenceTree.dependencies();
        for (Dependency<Label, Label, Object> dependency : sentenceDeps) {
            System.out.println("sentence dep = " + dependency);

            System.out.println(dependency.getClass() +" ( " + dependency.governor() + ", " + dependency.dependent() +") " );
        }


        //find noun phrases in the sentence
        TregexPattern pat = TregexPattern.compile("@NP");
        TregexMatcher matcher = pat.matcher(sentenceTree);
        while (matcher.find()) {

            Tree nounPhraseTree = matcher.getMatch();
            System.out.println("Found noun phrase " + nounPhraseTree);

            nounPhraseTree.percolateHeads(headFinder);

            Set<Dependency<Label, Label, Object> > npDeps = nounPhraseTree.dependencies();
            for (Dependency<Label, Label, Object> dependency : npDeps ) {
                System.out.println("nounPhraseTree  dep = " + dependency);
            }


            Tree head = nounPhraseTree.headTerminal(headFinder);
            System.out.println("head " + head);


            Set<Dependency<Label, Label, Object> > headDeps = head.dependencies();
            for (Dependency<Label, Label, Object> dependency : headDeps) {
                System.out.println("head dep " + dependency);
            }


            //QUESTION : 
            //How do I get the position of "head" in tokens or numerizedTokens ?
            //How do I get the dependencies where "head" is involved in typedDeps ? 

        }
    }
}

In other words, I would like to query ALL the dependency relations in which the head word / token / label participates, across the ENTIRE sentence. I thought I needed the position of this token in the sentence in order to correlate it with the typed dependencies, but is there an easier way?

Thanks in advance.

[EDIT]

So, I found an answer, or at least the beginning of one.

If I call .label() on the head, I get a CoreLabel, which is pretty much what I need to find the rest. Now I can iterate over the typed dependencies and look for those where either the governor label or the dependent label has the same index as my headLabel.

            Tree nounPhraseTree = matcher.getMatch();
            System.out.println("Found noun phrase " + nounPhraseTree);

            nounPhraseTree.percolateHeads(headFinder);
            Tree head = nounPhraseTree.headTerminal(headFinder);
            CoreLabel headLabel = (CoreLabel) head.label();

            System.out.println("tokens.contains(headLabel)" + tokens.contains(headLabel));

            System.out.println("");
            System.out.println("Iterating over typed deps");
            for (TypedDependency typedDependency : typedDeps) {
                System.out.println(typedDependency.gov().backingLabel());
                System.out.println("gov pos "+ typedDependency.gov() + " - " + typedDependency.gov().index());
                System.out.println("dep pos "+ typedDependency.dep() + " - " + typedDependency.dep().index());

                if(typedDependency.gov().index() == headLabel.index() ){

                    System.out.println("dep or gov backing label equals headlabel :" + (typedDependency.gov().backingLabel().equals(headLabel) ||
                            typedDependency.dep().backingLabel().equals(headLabel)));  //why does this return false all the time ? 


                    System.out.println(" !!!!!!!!!!!!!!!!!!!!!  HIT ON " + headLabel + " == " + typedDependency.gov());
                }
            }

So it seems that I can only match my head label against typedDeps using the index. I wonder whether this is the right way to do it. As you can see in my code, I also tried calling backingLabel() on the governor and the dependent to check for equality with my headLabel, but the comparison systematically returns false. I wonder why!?

Any feedback is appreciated.

stanford-nlp

azpublic Apr 24 At 10:09 am

1 answer

Jon Gauthier · Accepted Answer · 2015-04-26T00:11:21+0000

You can get the position of a CoreLabel within its containing sentence from CoreAnnotations.IndexAnnotation.

Your method of finding all the dependents of a given word seems to be correct, and is probably the easiest way to do it.
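The index-matching idea can be sketched in plain Java without CoreNLP on the classpath. This is only an illustration: Dep and HeadDependencyLookup are hypothetical stand-ins, not CoreNLP classes, and the sketch assumes token indices are 1-based within the sentence, with 0 reserved for ROOT. In real code the indices would come from headLabel.index(), typedDependency.gov().index() and typedDependency.dep().index(), as in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a typed dependency: the relation name plus the
// sentence indices of governor and dependent (assumed 1-based, 0 = ROOT).
final class Dep {
    final String relation;
    final int govIndex;
    final int depIndex;

    Dep(String relation, int govIndex, int depIndex) {
        this.relation = relation;
        this.govIndex = govIndex;
        this.depIndex = depIndex;
    }

    @Override
    public String toString() {
        return relation + "(" + govIndex + ", " + depIndex + ")";
    }
}

// Hypothetical helper, not part of CoreNLP.
final class HeadDependencyLookup {

    // Indices are assumed 1-based, so subtract 1 to get the list position.
    static String tokenAt(List<String> tokens, int index) {
        return tokens.get(index - 1);
    }

    // Collect every dependency in which the head takes part,
    // either as governor or as dependent.
    static List<Dep> dependenciesOf(int headIndex, List<Dep> deps) {
        List<Dep> hits = new ArrayList<>();
        for (Dep d : deps) {
            if (d.govIndex == headIndex || d.depIndex == headIndex) {
                hits.add(d);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Made-up sentence and dependencies, for illustration only.
        List<String> tokens = Arrays.asList("The", "quick", "fox", "jumps");
        List<Dep> deps = Arrays.asList(
                new Dep("det", 3, 1),
                new Dep("amod", 3, 2),
                new Dep("nsubj", 4, 3),
                new Dep("root", 0, 4));

        int headIndex = 3; // would be headLabel.index() in the question's code
        System.out.println("head token: " + tokenAt(tokens, headIndex));
        System.out.println("dependencies: " + dependenciesOf(headIndex, deps));
    }
}
```

Note that the loop in the question only tests the governor index; checking the dependent index as well, as above, also catches relations where the head is the dependent.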
