Projects

I have worked on a broad range of projects in machine learning and natural language processing.
Details of some of the projects are listed below.

Amortized Inference

For many structured prediction tasks in natural language processing and computer vision, the prediction
phase is complicated and time-consuming. As we move into an era in which we dream of processing the entire web
on a daily basis or processing Twitter feeds in real time, we need to run our tools over billions
of instances. Many of these instances are similar (e.g., many people blaming Democrats or many people blaming
Republicans for the government shutdown), so our tools are bound to repeat the same computations
again and again. The method we proposed stores the predicted structure for each input as we process it.
For each new input, we check the cache of previously processed inputs to see whether some theoretical
conditions are satisfied that guarantee that one of the previously predicted structures is optimal for the
new input, without solving the new inference problem. As we process more inputs, the cache gets richer and
we need to solve fewer problems from scratch, because we can answer most of them from the cache alone.
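
The caching idea can be illustrated with a minimal Python sketch. It assumes that inference is a 0/1 linear
program whose feasible set is shared by all inputs, and it checks one simple sufficient condition: if no
coefficient of a variable set to 1 has decreased and no coefficient of a variable set to 0 has increased, the
cached solution remains optimal. The names AmortizedSolver, solve_from_scratch and cached_solution_is_optimal
are hypothetical, the brute-force solver is only a stand-in for an expensive one, and the conditions in the
publications below may be more general than this one.

```python
def objective(coeffs, y):
    """Linear objective sum_i c_i * y_i for a 0/1 assignment y."""
    return sum(c * v for c, v in zip(coeffs, y))


def solve_from_scratch(coeffs, feasible):
    """Stand-in for an expensive solver: brute-force argmax over the feasible set."""
    return max(feasible, key=lambda y: objective(coeffs, y))


def cached_solution_is_optimal(old_coeffs, old_solution, new_coeffs):
    """Sufficient condition (an assumption of this sketch): coefficients of active
    variables (y_i = 1) did not decrease and coefficients of inactive variables
    (y_i = 0) did not increase."""
    return all(
        (2 * y - 1) * (new - old) >= 0
        for old, new, y in zip(old_coeffs, new_coeffs, old_solution)
    )


class AmortizedSolver:
    """Caches solved problems that share one feasible set (hypothetical helper)."""

    def __init__(self, feasible):
        self.feasible = list(feasible)
        self.cache = []        # (coefficients, optimal assignment) pairs
        self.solver_calls = 0  # how often we had to solve from scratch

    def solve(self, coeffs):
        coeffs = tuple(coeffs)
        # Try to answer from the cache before calling the solver.
        for old_coeffs, old_solution in self.cache:
            if cached_solution_is_optimal(old_coeffs, old_solution, coeffs):
                return old_solution
        self.solver_calls += 1
        solution = solve_from_scratch(coeffs, self.feasible)
        self.cache.append((coeffs, solution))
        return solution


# Toy problem: exactly one of three labels must be chosen.
feasible = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
solver = AmortizedSolver(feasible)
first = solver.solve([2.0, 1.0, 0.5])   # solved from scratch
second = solver.solve([2.5, 0.8, 0.5])  # condition holds, answered from the cache
third = solver.solve([1.0, 3.0, 0.5])   # condition fails, solved from scratch
```

In this toy run only the first and third problems reach the solver. The linear scan over the cache is the
simplest possible lookup; a practical system would need an index over cached problems.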

Relevant Publications

Gourab Kundu*, Vivek Srikumar*, Dan Roth, Margin based Decomposed Amortized Inference, Annual Meeting of the
Association for Computational Linguistics (ACL), 2013. Poster
Vivek Srikumar*, Gourab Kundu*, Dan Roth, On Amortizing Inference Cost for Structured Prediction, Empirical
Methods in Natural Language Processing (EMNLP), 2012. Poster

Domain Adaptation

Statistical learning methods assume that the training and test data are drawn from the same
distribution. Unfortunately, in fields such as language processing and computer vision, we often
have to train a model on data from one domain and test it on data from a different domain.
Because labeling data is very expensive and the number of domains is very large, it is infeasible to label enough data
from each domain and train a separate model for each. The solution is then to train a model on
data from some domain and adapt it to a new domain without any labeled data. The direction I am working on
is to use the same model on all domains and change the text from each domain so that the model achieves high accuracy
on the adapted text. Adapting text for a new domain can be less time-consuming than building a new model for
each new domain. Moreover, model adaptation is infeasible when the model is itself a mixture of several independent
models and tools. In these cases, text adaptation is the only feasible solution.

Relevant Publications

Gourab Kundu, Dan Roth, Adapting Text instead of the Model: An Open Domain Approach, International Conference on
Computational Natural Language Learning (CoNLL), Portland, 2011. (BEST STUDENT PAPER AWARD)
Gourab Kundu, Ming-Wei Chang, Dan Roth, Prior Knowledge Driven Domain Adaptation, ICML Workshop on Combining Learning
Strategies to Reduce Label Cost, Seattle, 2011
Gourab Kundu, Ming-Wei Chang, Dan Roth, ChengXiang Zhai, A New Framework for Domain Adaptation without Model Retraining,
Computer Science Research and Technical Reports, University of Illinois (2013)

Scientific Literature Analysis

One of the basic steps in analysing the scientific literature is to identify the key techniques, problems and applications.
In this project, I built a bootstrapper from only a few seed patterns, using a variant of the Yarowsky algorithm, to identify the
techniques and applications in a scientific domain, and then resolved coreference across them using the citation network. The
idea is that two mentions should be coreferent if they are often used in the context of similar citations. This analysis proved
essential for future trend analysis of scientific research. I showed how it is possible to answer questions such as which techniques
are used in which applications and how a concept changes role over time (for example, it may come to be used more as a technique than
as an application).
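
The citation-based coreference idea can be sketched in a few lines of Python. This is only an illustration
under assumptions of my choosing: each mention is represented by the bag of citations that appear in its
contexts, similarity is measured with cosine similarity, and two mentions are linked when the similarity
reaches a fixed threshold. The function names, the threshold of 0.5 and the placeholder citation identifiers
(paper_A and so on) are hypothetical and are not taken from the project.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two citation-count vectors stored as Counters."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def coreferent_pairs(citation_contexts, threshold=0.5):
    """Link mentions whose citation profiles are similar enough.

    citation_contexts maps a mention string to the list of citation identifiers
    seen near it; the threshold is an assumed value for illustration.
    """
    profiles = {m: Counter(cites) for m, cites in citation_contexts.items()}
    return [
        (m1, m2)
        for m1, m2 in combinations(sorted(profiles), 2)
        if cosine(profiles[m1], profiles[m2]) >= threshold
    ]


# Placeholder data: the citation identifiers are made up for this example.
contexts = {
    "SVM": ["paper_A", "paper_B", "paper_C"],
    "support vector machine": ["paper_A", "paper_B"],
    "machine translation": ["paper_X", "paper_Y"],
}
pairs = coreferent_pairs(contexts)
```

Here "SVM" and "support vector machine" share two of their citations and are linked, while "machine
translation" shares none and stays separate.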

Relevant Publications

Chen-tse Tsai, Gourab Kundu, Dan Roth, Concept-Based Analysis of Scientific Literature, Conference on Information and Knowledge
Management (CIKM), 2013

Textual Entailment

Textual entailment is the task of determining whether a given text entails a given hypothesis. The entailment classifier
depends on a latent alignment between the text and the hypothesis, and this latent alignment must in turn be learned so that it is
predictive of the final task under consideration. In this project, I framed alignment as a constrained optimization problem and
jointly learned the alignment and entailment decisions.

Relevant Publications

Mark Sammons, V.G. Vinod Vydiswaran, Tim Vieira, Nikhil Johri, Ming-Wei Chang, Dan Goldwasser, Vivek Srikumar, Gourab Kundu,
Yuancheng Tu, Kevin Small, Joshua Rule, Quang Do, Dan Roth, Relation Alignment for Textual Entailment Recognition, TAC 2009
Workshop, Maryland, USA. Poster, Oral Presentation

Computational Learning Theory

Teaching is a scenario in which a teacher knows the true concept and wants to teach it to a learner using examples. The complexity of
teaching a concept class can be thought of as a lower bound on the complexity of any other form of learning. In this project, I analysed a
real-world scenario in which, first, the teacher does not know the target concept exactly but has only partial knowledge of it and, second,
the teacher is constrained to teach using examples from a set of available examples rather than all possible examples. I showed that even in
these cases the sample complexity of teaching several important concept classes is polynomial under different characterizations of the set of
examples available to the teacher.
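
The effect of restricting the teacher to a set of available examples can be illustrated with a small Python
sketch. It is a brute-force search over a toy concept class and makes several simplifying assumptions: the
teacher knows the target exactly (the partial-knowledge setting is not modeled), concepts are finite sets of
positive examples, and the learner accepts any concept consistent with the labeled examples. The function
name teaching_set and the threshold concept class are hypothetical choices for this illustration.

```python
from itertools import combinations


def teaching_set(target, concept_class, available):
    """Return a smallest tuple of available examples whose labels under `target`
    are inconsistent with every other concept in the class, or None if the
    available examples cannot single out the target."""
    rivals = [c for c in concept_class if c != target]
    for size in range(len(available) + 1):
        for subset in combinations(available, size):
            # Every rival must disagree with the target on at least one example.
            if all(any((x in c) != (x in target) for x in subset) for c in rivals):
                return subset
    return None


# Toy concept class: thresholds on {0, ..., 5}; concept t labels x positive iff x >= t.
domain = range(6)
concept_class = [frozenset(x for x in domain if x >= t) for t in range(7)]
target = concept_class[3]

full = teaching_set(target, concept_class, list(domain))        # all examples available
restricted = teaching_set(target, concept_class, [0, 1, 3, 5])  # example 2 is missing
```

With every example available, the two examples around the threshold suffice. Without example 2, no subset of
the available examples separates the target from the neighbouring threshold, so the search returns None.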

Relevant Publications

Gourab Kundu, Dan Roth, Teaching with Examples in a Real Environment, Computer Science Research and Technical Reports, University of Illinois (2013)