Curator Annotator

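Curator annotators implement one of the Curator architecture services. Each service returns a different type of annotation data structure. The services are:

  • Labeler - returns a Labeling.

  • MultiLabeler - returns a list of Labelings.

  • ClusterGenerator - returns a Clustering.

  • Parser - returns a Forest.

  • MultiParser - returns a list of Forests.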

If your annotator does not fit into any of these service patterns you may have to define a new service.

Writing the handler

A handler is a class that implements one of the above services. This is as simple as implementing the inner Iface interface of the service (e.g., Labeler.Iface, Parser.Iface, etc.).

A handler must implement every method of the interface. All services inherit from BaseService, which defines the following methods:

  • ping() - returns true.

  • getName() - returns a human-readable long name for the service.

  • getVersion() - returns the version number of the service as a string (e.g., 1.0 or 0.1).

  • getSourceIdentifier() - returns an identifier to be used in source fields. This should be of the form <shortname>-<version>, where <shortname> contains no spaces or hyphens and <version> is parsable into a double (using Java's Double.valueOf), e.g., server-1.0. Any data structures returned must have the source field populated with the result of this function.

Pay special attention to the getSourceIdentifier() method, as the Curator uses this information to determine when Records are stale or out of date.
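The four BaseService methods can be sketched as follows. This is a minimal illustration under stated assumptions, not the Curator's own code: the class BaseMethodsSketch, the helper buildSourceIdentifier and the names "My Example Tagger" and "mytagger" are all made up here, and the class deliberately does not implement a Thrift-generated Iface so that it compiles on its own. The helper checks the <shortname>-<version> rules described above.

```java
import java.util.regex.Pattern;

/**
 * Sketch of the four BaseService methods. A real handler would declare
 * "implements Labeler.Iface" (or another service's Iface); that is omitted
 * here so the example compiles without the Thrift-generated classes.
 * All names below are hypothetical.
 */
final class BaseMethodsSketch {
    // A short name is non-empty and contains no whitespace and no hyphens.
    private static final Pattern SHORT_NAME = Pattern.compile("[^\\s-]+");

    public boolean ping() {
        return true;
    }

    public String getName() {
        return "My Example Tagger";
    }

    public String getVersion() {
        return "1.0";
    }

    public String getSourceIdentifier() {
        return buildSourceIdentifier("mytagger", getVersion());
    }

    /** Builds "<shortname>-<version>" and rejects values that break the rules. */
    static String buildSourceIdentifier(String shortName, String version) {
        if (!SHORT_NAME.matcher(shortName).matches()) {
            throw new IllegalArgumentException(
                    "short name must be non-empty without spaces or hyphens: " + shortName);
        }
        try {
            Double.valueOf(version); // the version must be parsable as a double
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(
                    "version is not parsable as a double: " + version, e);
        }
        return shortName + "-" + version;
    }
}
```

Deriving the identifier from getVersion() keeps the two values in step, so bumping the version also changes the value written into source fields.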

labelRecord, clusterRecord, parseRecord

The final method to implement is one of labelRecord, clusterRecord or parseRecord, depending on the service. This is the method that does all the work. Typically this involves:

  1. Unpacking the Record instance into the native object of the underlying annotator implementation.

  2. Calling the annotator.

  3. Packing the result into a Labeling, Clustering or Forest (or a list of them).

  4. Setting the source field with the result of getSourceIdentifier().
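The four steps above can be sketched as below. To keep the example self-contained, SimpleRecord, SimpleSpan and SimpleLabeling are hypothetical stand-ins for the Thrift-generated Record and Labeling classes, and findCapitalizedWords is a toy annotator invented for illustration; a real handler would use the generated types and its own annotator instead.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: every type here is a simplified, hypothetical stand-in. */
final class LabelRecordSketch {

    static final class SimpleRecord {
        final String rawText;
        SimpleRecord(String rawText) { this.rawText = rawText; }
    }

    static final class SimpleSpan {
        final int start;
        final int end;
        final String label;
        SimpleSpan(int start, int end, String label) {
            this.start = start;
            this.end = end;
            this.label = label;
        }
    }

    static final class SimpleLabeling {
        final List<SimpleSpan> labels = new ArrayList<>();
        String source;
    }

    /** Toy annotator: returns [start, end) offsets of capitalized words. */
    static List<int[]> findCapitalizedWords(String text) {
        List<int[]> spans = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) i++;
            int start = i;
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) i++;
            if (i > start && Character.isUpperCase(text.charAt(start))) {
                spans.add(new int[] {start, i});
            }
        }
        return spans;
    }

    static String getSourceIdentifier() {
        return "capstagger-1.0"; // hypothetical <shortname>-<version>
    }

    static SimpleLabeling labelRecord(SimpleRecord record) {
        // 1. Unpack the record into the annotator's native input.
        String text = record.rawText;
        // 2. Call the annotator.
        List<int[]> found = findCapitalizedWords(text);
        // 3. Pack the result into the labeling structure.
        SimpleLabeling labeling = new SimpleLabeling();
        for (int[] span : found) {
            labeling.labels.add(new SimpleSpan(span[0], span[1], "CAP"));
        }
        // 4. Set the source field from getSourceIdentifier().
        labeling.source = getSourceIdentifier();
        return labeling;
    }
}
```

Keeping the unpack and pack steps separate from the annotator call means the underlying annotator does not need to know anything about the Curator data structures.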

Example handlers

For example handlers see the curator-annotators directory in the source package.

Running a handler

Assume your handler is called MyTaggerHandler and implements a Labeler. Write a main method that does the following:

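#!java

// Thrift classes used below; Labeler is the Thrift-generated Curator service class.
import org.apache.thrift.server.TNonblockingServer;
import org.apache.thrift.server.TServer;
import org.apache.thrift.transport.TNonblockingServerSocket;
import org.apache.thrift.transport.TNonblockingServerTransport;

// Inside main(String[] args) throws Exception:
MyTaggerHandler handler = new MyTaggerHandler();
int port = 9090;  // port the annotator listens on

// Wrap the handler in the processor generated for its service.
Labeler.Processor processor = new Labeler.Processor(handler);

// Bind a non-blocking server socket and serve requests until the process stops.
// Note: later Thrift releases construct the server from a TNonblockingServer.Args object instead.
TNonblockingServerTransport serverTransport = new TNonblockingServerSocket(port);
TServer server = new TNonblockingServer(processor, serverTransport);
server.serve();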

Exposing the annotator via the Curator Server

Update annotators.xml as described in the Curator Server README.

In general this means adding a section of the form:

#!xml

<annotator>

  <type>labeler</type>

  <field>pos</field>

  <host>hostname:9091</host>

  <requirement>sentences</requirement>

  <requirement>tokens</requirement>

</annotator>

The fields are:

type

one of labeler, multilabeler, clustergenerator, parser or multiparser.

field

the name of the field in the Record in which the annotation will be stored.

host

hostname:port where the annotator is running

requirement

annotations that must be present before calling this annotator.
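
To make the shape of an entry concrete, the sketch below reads a single <annotator> element with the JDK's DOM parser. It is only an illustration: AnnotatorEntrySketch is a hypothetical class, not the Curator Server's own configuration reader, and it assumes the element names and the hostname:port format described above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical reader for one <annotator> entry (illustration only). */
final class AnnotatorEntrySketch {
    final String type;
    final String field;
    final String hostname;
    final int port;
    final List<String> requirements = new ArrayList<>();

    private AnnotatorEntrySketch(String type, String field, String hostname, int port) {
        this.type = type;
        this.field = field;
        this.hostname = hostname;
        this.port = port;
    }

    static AnnotatorEntrySketch parse(String xml) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("not a well-formed annotator entry", e);
        }
        // The host value has the form hostname:port.
        String host = firstText(root, "host");
        int colon = host.lastIndexOf(':');
        if (colon < 1) {
            throw new IllegalArgumentException("host must be hostname:port: " + host);
        }
        AnnotatorEntrySketch entry = new AnnotatorEntrySketch(
                firstText(root, "type"),
                firstText(root, "field"),
                host.substring(0, colon),
                Integer.parseInt(host.substring(colon + 1)));
        // An entry may list any number of requirements.
        NodeList reqs = root.getElementsByTagName("requirement");
        for (int i = 0; i < reqs.getLength(); i++) {
            entry.requirements.add(reqs.item(i).getTextContent().trim());
        }
        return entry;
    }

    private static String firstText(Element root, String tag) {
        NodeList nodes = root.getElementsByTagName(tag);
        if (nodes.getLength() == 0) {
            throw new IllegalArgumentException("missing <" + tag + ">");
        }
        return nodes.item(0).getTextContent().trim();
    }
}
```

A check like this can catch a malformed host value or a missing element before the Curator Server is restarted with the new configuration.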