ParseSiblingExample.java

The source code for this example can be obtained here.

This example shows how to use Edison to get siblings from a parse tree.

package edu.illinois.cs.cogcomp.edison.examples;

import java.io.FileNotFoundException;
import java.util.List;

import edu.illinois.cs.cogcomp.core.datastructures.trees.Tree;
import edu.illinois.cs.cogcomp.core.datastructures.trees.TreeParserFactory;
import edu.illinois.cs.cogcomp.core.io.LineIO;
import edu.illinois.cs.cogcomp.edison.sentences.Constituent;
import edu.illinois.cs.cogcomp.edison.sentences.Queries;
import edu.illinois.cs.cogcomp.edison.sentences.Relation;
import edu.illinois.cs.cogcomp.edison.sentences.TextAnnotation;
import edu.illinois.cs.cogcomp.edison.sentences.TreeView;
import edu.illinois.cs.cogcomp.edison.sentences.ViewNames;

public class ParseSiblingExample {

    public static void main(String[] args) throws FileNotFoundException {

First, create a text annotation. For more information about this step, see the basic example.

	String corpus = "2001_ODYSSEY";
	String textId = "002";

	String text = "No 9000 computer has ever made a"
		+ " mistake or distorted information. ";
	TextAnnotation ta = new TextAnnotation(corpus, textId, text);

Now we need to add the parse tree. In this example, we add it manually. The parse tree was generated by the Charniak and Johnson reranking parser and is stored in example_data/parse_tree. You can see this parse tree here. We read the file into a string with the slurp function from the coreUtilities library.

	String parseTree = LineIO.slurp("example_data/parse_tree");

Let's add the parse tree to the TextAnnotation object. First, we need to parse the string representation of the parse tree to create a Tree object. The Tree class is also part of coreUtilities.

	Tree<String> parse = TreeParserFactory.getStringTreeParser().parse(
		parseTree);
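
To make the string-to-tree step concrete, here is a minimal sketch of a reader for bracketed parse trees. It is not the coreUtilities parser: the class BracketTreeSketch, its nested Node class, and the sample tree in main are hypothetical names and data made up for illustration, and the sketch assumes the usual "(LABEL child child ...)" notation with bare words as leaves.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the coreUtilities Tree or TreeParserFactory.
class BracketTreeSketch {

    // A labeled tree node. Leaves carry a word as their label.
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();

        Node(String label) {
            this.label = label;
        }

        // Collects the leaf labels (the words) from left to right.
        void collectLeaves(List<String> out) {
            if (children.isEmpty()) {
                out.add(label);
                return;
            }
            for (Node child : children) {
                child.collectLeaves(out);
            }
        }
    }

    private final String[] tokens;
    private int pos = 0;

    private BracketTreeSketch(String text) {
        // Pad brackets with spaces so a whitespace split yields clean tokens.
        tokens = text.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+");
    }

    static Node parse(String text) {
        BracketTreeSketch reader = new BracketTreeSketch(text);
        Node tree = reader.readNode();
        if (reader.pos != reader.tokens.length) {
            throw new IllegalArgumentException("Trailing input after tree");
        }
        return tree;
    }

    private Node readNode() {
        if (pos >= tokens.length) {
            throw new IllegalArgumentException("Unexpected end of input");
        }
        String token = tokens[pos++];
        if (token.equals(")")) {
            throw new IllegalArgumentException("Unexpected ')'");
        }
        if (!token.equals("(")) {
            return new Node(token); // a bare word is a leaf
        }
        if (pos >= tokens.length) {
            throw new IllegalArgumentException("Missing label after '('");
        }
        Node node = new Node(tokens[pos++]); // the label follows '('
        while (pos < tokens.length && !tokens[pos].equals(")")) {
            node.children.add(readNode());
        }
        if (pos >= tokens.length) {
            throw new IllegalArgumentException("Missing ')'");
        }
        pos++; // consume ')'
        return node;
    }

    public static void main(String[] args) {
        // Made-up fragment for illustration, not the parser output used in this example.
        Node tree = parse("(S1 (S (NP (DT No) (CD 9000)) (VP (VBD erred))))");
        List<String> words = new ArrayList<>();
        tree.collectLeaves(words);
        System.out.println(tree.label + " -> " + words);
    }
}
```

The recursive reader mirrors the shape of the notation: each opening bracket starts a node, and everything up to the matching closing bracket becomes its children.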

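Create a new parse view. The constructor takes the view name, the name of the generator that produced the view, the text annotation, and a score.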

	double score = 1.0;
	TreeView parseView = new TreeView(ViewNames.PARSE_CHARNIAK,
		"CharniakJohnson2005", ta, score);

Set the parse tree of the first (and in this case, only) sentence.

	int sentenceId = 0;
	parseView.setParseTree(sentenceId, parse);

Now we are ready to pull out siblings. Suppose we want the siblings of the VP that corresponds to the verb phrase "made a mistake". First, we need to get that constituent from the view. Here, we use the LINQ-like query interface that Edison provides. All views implement the interface IQueryable, which allows us to make queries of the form "Get me all constituents in this view where ...". Several ready-made queries are available in the class Queries. The phrase covers tokens 5 through 7, so we ask for constituents that start at token 5 and end at token 8 (the end index is exclusive).

	for (Constituent c : parseView.where(Queries.startsAt(5)).where(
		Queries.endsAt(8))) {

	    System.out.println("Siblings of " + c.getLabel());
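
The chained where calls can be illustrated with a small stand-alone sketch. It is not Edison's IQueryable or Queries implementation: SpanQuerySketch, Span, and SpanList are hypothetical names, the spans in main are written by hand rather than taken from the parse, and the sketch assumes that the final period is its own token and that end indices are exclusive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of chained "where" filters; not Edison's API.
class SpanQuerySketch {

    // A labeled token span. The end index is exclusive.
    static final class Span {
        final String label;
        final int start;
        final int end;

        Span(String label, int start, int end) {
            this.label = label;
            this.start = start;
            this.end = end;
        }
    }

    // where() returns a new SpanList, so filters can be chained.
    static final class SpanList implements Iterable<Span> {
        private final List<Span> items;

        SpanList(List<Span> items) {
            this.items = new ArrayList<>(items);
        }

        SpanList where(Predicate<Span> condition) {
            List<Span> kept = new ArrayList<>();
            for (Span s : items) {
                if (condition.test(s)) {
                    kept.add(s);
                }
            }
            return new SpanList(kept);
        }

        int size() {
            return items.size();
        }

        @Override
        public Iterator<Span> iterator() {
            return items.iterator();
        }
    }

    static Predicate<Span> startsAt(int token) {
        return s -> s.start == token;
    }

    static Predicate<Span> endsAt(int token) {
        return s -> s.end == token;
    }

    // Joins the tokens covered by the span with single spaces.
    static String textOf(String[] tokens, Span s) {
        return String.join(" ", Arrays.copyOfRange(tokens, s.start, s.end));
    }

    public static void main(String[] args) {
        // Assumed tokenization: the final period is a separate token.
        String[] tokens =
                "No 9000 computer has ever made a mistake or distorted information .".split(" ");
        // Hand-written spans for illustration only.
        SpanList spans = new SpanList(Arrays.asList(
                new Span("NP", 0, 3),
                new Span("VP", 5, 8),
                new Span("NP", 6, 8),
                new Span("VP", 9, 11)));
        for (Span s : spans.where(startsAt(5)).where(endsAt(8))) {
            System.out.println(s.label + ": " + textOf(tokens, s));
        }
    }
}
```

Because each where call returns another queryable collection, conditions compose by chaining rather than by writing one large predicate.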

The siblings of c are all constituents that share its parent. So let's get the parent by taking the source of the incoming edge. Since we are dealing with a tree, there can be at most one incoming edge.

	    List<Relation> incomingRelations = c.getIncomingRelations();

	    if (incomingRelations.isEmpty()) {
		System.out.println("No siblings because c is the root of the tree");
	    } else {

		assert incomingRelations.size() == 1;
		Constituent parent = incomingRelations.get(0).getSource();

Now that we have the parent, all the children of the parent except c are c's siblings.

		for (Relation outgoingRelation : parent.getOutgoingRelations()) {
		    Constituent child = outgoingRelation.getTarget();
		    if (child == c)
			continue;

		    // A sibling that ends at or before c's first token lies to its left.
		    if (child.getEndSpan() <= c.getStartSpan())
			System.out.println(child.getLabel() + "\t Left sibling");
		    else
			System.out.println(child.getLabel() + "\t Right sibling");
		}
	    }
	}
    }
}
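
The sibling lookup above can also be sketched without Edison, using a plain tree with parent pointers. SiblingSketch and Node are hypothetical names, and the small tree in main is built by hand on the assumption that "made a mistake or distorted information" is a coordination of two verb phrases. It is not the parser output from example_data/parse_tree.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of sibling lookup; not Edison's Constituent or Relation classes.
class SiblingSketch {

    // A labeled node covering tokens [start, end), with a link to its parent.
    static final class Node {
        final String label;
        final int start;
        final int end;
        Node parent;
        final List<Node> children = new ArrayList<>();

        Node(String label, int start, int end) {
            this.label = label;
            this.start = start;
            this.end = end;
        }

        // Attaches a child and returns this node so calls can be chained.
        Node add(Node child) {
            child.parent = this;
            children.add(child);
            return this;
        }
    }

    // Lists every other child of c's parent, tagged by the side it lies on.
    static List<String> describeSiblings(Node c) {
        List<String> out = new ArrayList<>();
        if (c.parent == null) {
            return out; // the root has no siblings
        }
        for (Node child : c.parent.children) {
            if (child == c) {
                continue; // skip c itself
            }
            // A sibling that ends at or before c's first token lies to the left.
            String side = child.end <= c.start ? "left" : "right";
            out.add(child.label + "\t" + side);
        }
        return out;
    }

    public static void main(String[] args) {
        // Hand-built structure for illustration only.
        Node coordination = new Node("VP", 5, 11);
        Node target = new Node("VP", 5, 8);
        coordination.add(target)
                .add(new Node("CC", 8, 9))
                .add(new Node("VP", 9, 11));
        for (String line : describeSiblings(target)) {
            System.out.println(line);
        }
    }
}
```

Comparing span boundaries is enough to tell left from right here because the children of one parent never overlap.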