Stanford coreNLP: how to get labels, positions and typed dependencies on parse tree
I am using Stanford coreNLP to parse some text. I am getting several suggestions. On these suggestions, I was able to extract Noun Phrases using TregexPattern. So I get a child tree that is my phrase. I also managed to figure out the chapter of the phrase.
How can one get the position or even the marker / coreLabel of this chapter in a sentence?
Better yet, how can you find the Chapter's dependent relationship with the rest of the sentence?
Here's an example:
public void doSomeTextKarate(String text){
Properties props = new Properties();
props.put("annotators","tokenize, ssplit, pos, lemma, ner, parse, dcoref");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
this.pipeline = pipeline;
// create an empty Annotation just with the given text
Annotation document = new Annotation(text);
// run all Annotators on this text
pipeline.annotate(document);
List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);
for (CoreMap sentence : sentences) {
SemanticGraph basicDeps = sentence.get(BasicDependenciesAnnotation.class);
Collection<TypedDependency> typedDeps = basicDeps.typedDependencies();
System.out.println("typedDeps ==> "+typedDeps);
SemanticGraph collDeps = sentence.get(CollapsedDependenciesAnnotation.class);
SemanticGraph collCCDeps = sentence.get(CollapsedCCProcessedDependenciesAnnotation.class);
List<CoreMap> numerizedTokens = sentence.get(NumerizedTokensAnnotation.class);
List<CoreLabel> tokens = sentence.get(CoreAnnotations.TokensAnnotation.class);
Tree sentenceTree = sentence.get(TreeCoreAnnotations.TreeAnnotation.class);
sentenceTree.percolateHeads(headFinder);
Set<Dependency<Label, Label, Object> > sentenceDeps = sentenceTree.dependencies();
for (Dependency<Label, Label, Object> dependency : sentenceDeps) {
System.out.println("sentence dep = " + dependency);
System.out.println(dependency.getClass() +" ( " + dependency.governor() + ", " + dependency.dependent() +") " );
}
//find nounPhrases in setence
TregexPattern pat = TregexPattern.compile("@NP");
TregexMatcher matcher = pat.matcher(sentenceTree);
while (matcher.find()) {
Tree nounPhraseTree = matcher.getMatch();
System.out.println("Found noun phrase " + nounPhraseTree);
nounPhraseTree.percolateHeads(headFinder);
Set<Dependency<Label, Label, Object> > npDeps = nounPhraseTree.dependencies();
for (Dependency<Label, Label, Object> dependency : npDeps ) {
System.out.println("nounPhraseTree dep = " + dependency);
}
Tree head = nounPhraseTree.headTerminal(headFinder);
System.out.println("head " + head);
Set<Dependency<Label, Label, Object> > headDeps = head.dependencies();
for (Dependency<Label, Label, Object> dependency : headDeps) {
System.out.println("head dep " + dependency);
}
//QUESTION :
//How do I get the position of "head" in tokens or numerizedTokens ?
//How do I get the dependencies where "head" is involved in typedDeps ?
}
}
}
In other words, I would like to query the ALL dependency relationship where the word head / token / label participates in the ENTIRE clause. So I thought I needed to figure out the position of this token in the sentence in order to correlate it with typed dependencies, but do I have an easier way?
Thanks in advance.
[EDIT]
So, I could find an answer or a beginning.
If I call .label () on the head, I get myself a CoreLabel, which is pretty much what I need to find the rest. Now I can iterate over typed dependencies and look for dependencies where either the dominance label or the dependent label has the same index as my headLabel.
Tree nounPhraseTree = matcher.getMatch();
System.out.println("Found noun phrase " + nounPhraseTree);
nounPhraseTree.percolateHeads(headFinder);
Tree head = nounPhraseTree.headTerminal(headFinder);
CoreLabel headLabel = (CoreLabel) head.label();
System.out.println("tokens.contains(headLabel)" + tokens.contains(headLabel));
System.out.println("");
System.out.println("Iterating over typed deps");
for (TypedDependency typedDependency : typedDeps) {
System.out.println(typedDependency.gov().backingLabel());
System.out.println("gov pos "+ typedDependency.gov() + " - " + typedDependency.gov().index());
System.out.println("dep pos "+ typedDependency.dep() + " - " + typedDependency.dep().index());
if(typedDependency.gov().index() == headLabel.index() ){
System.out.println("dep or gov backing label equals headlabel :" + (typedDependency.gov().backingLabel().equals(headLabel) ||
typedDependency.dep().backingLabel().equals(headLabel))); //why does this return false all the time ?
System.out.println(" !!!!!!!!!!!!!!!!!!!!! HIT ON " + headLabel + " == " + typedDependency.gov());
}
}
So it seems that I can only match my note to my post with typedDeps using the index. I wonder if this way can help this. As you can see in my code, I also tried to use TypedDependency.backingLabel () to check for equality with my headLabel either with a governor or a dependent, but systematically returns false. I wonder why!?
Any feedback is appreciated.
source to share