edu.illinois.cs.cogcomp.lbj.coref.ir
Class Chunk

java.lang.Object
  extended by edu.illinois.cs.cogcomp.lbj.coref.ir.Chunk
All Implemented Interfaces:
java.io.Serializable, java.lang.Comparable<Chunk>

public class Chunk
extends java.lang.Object
implements java.io.Serializable, java.lang.Comparable<Chunk>

Represents a chunk of text in the context of a document. Stores the start and end character positions and provides a way to determine the start and end word numbers, and the words, contained in the chunk. Chunks can be sorted.

See Also:
Serialized Form

Field Summary
private  Doc m_doc
           
private  int m_end
          Measured in characters.
private  int m_start
           
private  java.lang.String m_text
          The text of the chunk, kept for convenience.
private static long serialVersionUID
          This ID should change if the serialization changes.
 
Constructor Summary
Chunk(Doc d, int start, int end, java.lang.String text)
          Constructs a chunk given a range of characters, some text, and a document for context.
 
Method Summary
 int compareTo(Chunk c)
          Compares this chunk to another so that chunks sort in ascending order, first by start position and, if the starts are equal, by end position.
 boolean equals(java.lang.Object o)
          Determines whether this chunk is equal to a specified object.
 java.lang.String getCleanText()
          Gets a cleaned text that has newlines replaced with spaces.
 int getEnd()
          Gets the end character number of the chunk.
 int getEndWN()
          Gets the word number of the last word of the chunk.
 int getStart()
          Gets the start character number of the chunk.
 int getStartWN()
          Gets the word number of the first word of the chunk.
 java.lang.String getText()
          Gets the text of the chunk.
 java.util.List<java.lang.String> getWords()
          Gets the words of the chunk.
 int hashCode()
          Gets the hash code of this chunk.
 java.lang.String toString()
          Gets a string representation of the chunk.
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

serialVersionUID

private static final long serialVersionUID
This ID should change if the serialization changes.

See Also:
Constant Field Values

m_doc

private Doc m_doc

m_start

private int m_start

m_end

private int m_end
Measured in characters.


m_text

private java.lang.String m_text
The text of the chunk, kept for convenience.

Constructor Detail

Chunk

public Chunk(Doc d,
             int start,
             int end,
             java.lang.String text)
Constructs a chunk given a range of characters, some text, and a document for context.

Parameters:
d - The document containing the chunk. (Only a reference is kept; the document is not copied.)
start - The position of the character that starts this Chunk.
end - The position of the character that ends this Chunk.
text - The text between start and end, provided for convenience. It should match the text in that character range of the document. WARNING: text is not used for comparison or hashing.
Method Detail

toString

public java.lang.String toString()
Gets a string representation of the chunk.

Overrides:
toString in class java.lang.Object
Returns:
A string representation of the chunk.

getStart

public int getStart()
Gets the start character number of the chunk.

Returns:
The position of the character that starts this Chunk.

getEnd

public int getEnd()
Gets the end character number of the chunk.

Returns:
The position of the character that ends this Chunk.

getStartWN

public int getStartWN()
Gets the word number of the first word of the chunk. This is the position in the document of the first word of the chunk.

Returns:
The word number of the first word in the Chunk.

getEndWN

public int getEndWN()
Gets the word number of the last word of the chunk. This is the position in the document of the last word of the chunk.

Returns:
The word number of the last word in the Chunk.

getWords

public java.util.List<java.lang.String> getWords()
Gets the words of the chunk.

Returns:
An unmodifiable view of the sublist (of the Doc's getWords()) of words contained in this Chunk.
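
The following is a minimal sketch of how an unmodifiable sublist view such as the one described above could be built. It is not the actual Chunk implementation: WordViewSketch and wordsBetween are hypothetical names, and the sketch assumes the end word number is inclusive, as the description of getEndWN() suggests.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration; not part of the documented API.
final class WordViewSketch {
    // Returns a read-only view of docWords[startWN..endWN], both ends inclusive
    // (inclusive end is an assumption based on the getEndWN() description).
    static List<String> wordsBetween(List<String> docWords, int startWN, int endWN) {
        // List.subList takes an exclusive upper bound, hence endWN + 1.
        return Collections.unmodifiableList(docWords.subList(startWN, endWN + 1));
    }

    public static void main(String[] args) {
        List<String> docWords = Arrays.asList("The", "quick", "brown", "fox", "jumps");
        List<String> words = wordsBetween(docWords, 1, 3);
        System.out.println(words); // prints [quick, brown, fox]
        try {
            words.set(0, "slow"); // any write through the view is rejected
        } catch (UnsupportedOperationException e) {
            System.out.println("read-only view");
        }
    }
}
```

Because the result is a view rather than a copy, no word list is duplicated per chunk, and callers cannot alter the document's words through it.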

getText

public java.lang.String getText()
Gets the text of the chunk.

Returns:
The text; the text is for convenience only and does not affect equality or hashCode() value.

getCleanText

public java.lang.String getCleanText()
Gets a cleaned text that has newlines replaced with spaces.
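
As a rough illustration of the cleaning described above, the sketch below replaces each newline with a space. CleanTextSketch and clean are hypothetical names; whether the real method also handles carriage returns or collapses repeated spaces is not stated here, so the sketch does neither.

```java
// Hypothetical helper for illustration; not the actual Chunk code.
final class CleanTextSketch {
    // Replaces every '\n' with a single space and leaves all other characters alone.
    static String clean(String text) {
        return text.replace('\n', ' ');
    }

    public static void main(String[] args) {
        System.out.println(clean("New\nYork")); // prints New York
    }
}
```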


equals

public boolean equals(java.lang.Object o)
Determines whether this chunk is equal to a specified object. Chunks are equal if they occupy the same character positions. The equals and hashCode methods were inspired by the Technofundo article "Equals and Hash Code" by Manish Hatwalne, available as of Feb 27, 2007 at http://www.geocities.com/technofundo/tech/java/equalhash.html

Overrides:
equals in class java.lang.Object
Parameters:
o - Any object.
Returns:
Whether this is equal to the specified object.

hashCode

public int hashCode()
Gets the hash code of this chunk. The hash code of a chunk is determined entirely by its start and end character positions. The equals and hashCode methods were inspired by the Technofundo article "Equals and Hash Code" by Manish Hatwalne, available as of Feb 27, 2007 at http://www.geocities.com/technofundo/tech/java/equalhash.html

Overrides:
hashCode in class java.lang.Object
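
The equality and hashing contract described for equals and hashCode can be sketched as follows. OffsetKeySketch is a hypothetical stand-in for Chunk that models only the two offsets and the convenience text; the use of Objects.hash is an assumption, since the actual hash formula is not documented here.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for Chunk; only the offsets take part in equality.
final class OffsetKeySketch {
    private final int start;
    private final int end;
    private final String text; // carried for convenience, ignored by equals/hashCode

    OffsetKeySketch(int start, int end, String text) {
        this.start = start;
        this.end = end;
        this.text = text;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof OffsetKeySketch)) return false;
        OffsetKeySketch other = (OffsetKeySketch) o;
        return start == other.start && end == other.end;
    }

    @Override
    public int hashCode() {
        // Assumption: any function of start and end alone keeps the contract.
        return Objects.hash(start, end);
    }

    public static void main(String[] args) {
        Set<OffsetKeySketch> seen = new HashSet<>();
        seen.add(new OffsetKeySketch(0, 5, "Obama"));
        seen.add(new OffsetKeySketch(0, 5, "stale text")); // same offsets, so a duplicate
        System.out.println(seen.size()); // prints 1
    }
}
```

One consequence of this design is that two chunks with mismatched text but identical offsets collapse into a single entry in hash-based collections, which is why the constructor warns that text is not used for comparison or hashing.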

compareTo

public int compareTo(Chunk c)
Compares this chunk to another so that chunks sort in ascending order, first by start position and, if the starts are equal, by end position.

Specified by:
compareTo in interface java.lang.Comparable<Chunk>
Parameters:
c - Another chunk.
Returns:
-1 if this chunk is first, 0 if they are the same, and 1 if this chunk appears after c.
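
The ordering described for compareTo can be sketched as below. SpanSketch is a hypothetical stand-in for Chunk that models only the start and end positions; it illustrates the documented ordering and return values and is not the class's actual source.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for Chunk; only the two character offsets are modeled.
final class SpanSketch implements Comparable<SpanSketch> {
    final int start;
    final int end;

    SpanSketch(int start, int end) {
        this.start = start;
        this.end = end;
    }

    // Ascending by start; ties are broken by end. Returns exactly -1, 0, or 1.
    @Override
    public int compareTo(SpanSketch c) {
        if (start != c.start) return start < c.start ? -1 : 1;
        if (end != c.end) return end < c.end ? -1 : 1;
        return 0;
    }

    @Override
    public String toString() {
        return "(" + start + "," + end + ")";
    }

    public static void main(String[] args) {
        List<SpanSketch> spans = new ArrayList<>();
        spans.add(new SpanSketch(5, 9));
        spans.add(new SpanSketch(0, 7));
        spans.add(new SpanSketch(0, 3));
        Collections.sort(spans);
        System.out.println(spans); // prints [(0,3), (0,7), (5,9)]
    }
}
```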