Reading Comprehension over Multiple Sentences


MultiRC (Multi-Sentence Reading Comprehension) is a dataset of short paragraphs and multi-sentence questions, i.e., questions that can be answered only by combining information from several sentences of the paragraph.
We have designed the dataset with three key challenges in mind:
  • The number of correct answer-options for each question is not pre-specified. This removes the over-reliance of current approaches on the answer-options themselves: unlike in previous work, the task is not simply to identify the best answer-option, but to decide on the correctness of each answer-option independently of the others.
  • The correct answers are not required to be spans in the text.
  • The paragraphs are drawn from 7 different domains (news, fiction, historical text, etc.), so their content is expected to be more diverse than that of single-domain datasets.
The goal of this dataset is to encourage the research community to explore approaches that can do more than sophisticated lexical-level matching.
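The first challenge above can be made concrete with a small sketch. It contrasts "pick the single best option" decoding with judging every answer-option on its own. The function names, the scores, and the 0.5 threshold are hypothetical and only illustrate the idea; they are not part of the dataset or its tooling.

```python
from typing import List


def pick_best_option(scores: List[float]) -> List[bool]:
    """Classic multiple-choice decoding: exactly one option is marked correct."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return [i == best for i in range(len(scores))]


def judge_each_option(scores: List[float], threshold: float = 0.5) -> List[bool]:
    """MultiRC-style decoding: each option is accepted or rejected on its own.

    The threshold value is an assumption made for illustration.
    """
    return [s >= threshold for s in scores]


# Hypothetical model confidences for four answer-options of one question.
scores = [0.9, 0.8, 0.2, 0.1]
single_choice = pick_best_option(scores)   # marks only the first option
independent = judge_each_option(scores)    # marks the first two options
```

With independent decisions, a system can return zero, one, or several correct answer-options, which is what a question with an unspecified number of correct answers requires.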


Each instance consists of a multi-sentence paragraph, a question, and several answer-options, one or more of which correctly answer the question. All instances were constructed so that a question cannot be answered correctly without gathering information from multiple sentences.
Here is an example:
Sent 1: Most young mammals, including humans, like to play.
Sent 2: Play is one way they learn the skills that they will need as adults.
Sent 3: Think about how kittens play.
Sent 4: They pounce on toys and chase each other.
Sent 5: This helps them learn how to be better predators.
Sent 6: Big cats also play.
Sent 7: The lion cubs pictured below are playing.
Sent 8: At the same time, they are also practicing their hunting skills.
Sent 9: The dogs are playing tug-of-war with a toy.
Sent 10: What do you think they are learning by playing together this way?
Sent 11: Human children learn by playing as well.
Sent 12: For example, playing games and sports can help them learn to follow rules.
Sent 13: They also learn to work together.
Sent 14: The young child pictured below is playing in the sand.
Sent 15: She is learning about the world through play.
Sent 16: What do you think she might be learning?
Question: What do human children learn by playing games and sports?
Answer-options:
  • to follow rules
  • They learn to follow rules and work together.
  • They learn about the world
  • Learn to work together
  • skills that they will need as adult
  • they learn about how to cheat
  • how to hunt
  • only learns to follow rules
  • only learns working together
  • hunting skills
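An instance like the one above could be held in memory roughly as follows. This is a minimal sketch: the class and field names are hypothetical and do not reflect the released file format, and the two labels shown are illustrative readings of Sent 12, not the official annotations.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AnswerOption:
    text: str
    is_correct: bool  # one label per option, not one label per question


@dataclass
class Instance:
    sentences: List[str]  # the paragraph, split into sentences
    question: str
    options: List[AnswerOption]


def to_binary_examples(inst: Instance) -> List[Tuple[str, str, str, bool]]:
    """Flatten an instance into one (paragraph, question, option, label) row per option."""
    paragraph = " ".join(inst.sentences)
    return [(paragraph, inst.question, o.text, o.is_correct) for o in inst.options]


# Shortened version of the example; labels are illustrative assumptions.
example = Instance(
    sentences=[
        "Human children learn by playing as well.",
        "For example, playing games and sports can help them learn to follow rules.",
        "They also learn to work together.",
    ],
    question="What do human children learn by playing games and sports?",
    options=[
        AnswerOption("to follow rules", True),
        AnswerOption("how to hunt", False),
    ],
)
rows = to_binary_examples(example)
```

Flattening to one row per answer-option is one simple way to train a binary classifier for the task; other formulations are possible.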


Here we show a summary of the best results on our dataset:
System                  Paper                      Dev F1m   Dev F1a   Test(R1) F1m   Test(R1) F1a
Human (avg of 4)        (Khashabi et al., 2018)    86.40     83.80     84.32          81.82
Logistic Regression     (Khashabi et al., 2018)    66.08     63.77     66.68          63.46
Information Retrieval   (Khashabi et al., 2018)    64.25     60.04     54.83          53.94
Random baseline         (Khashabi et al., 2018)    46.12     46.74     47.11          47.57
To see our evaluation script and a few baseline scores, take a look at this repository. For instructions on how to evaluate your system, refer to our CodaLab worksheet. To be added to the leaderboard, please email danielkh [at] cis.upenn [dot] edu with a link to your published/arXiv paper on this dataset.
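For readers who want a feel for the two scores in the table, here is a rough sketch. It assumes that F1m is the harmonic mean of precision and recall averaged over questions, and that F1a is a single binary F1 computed over all answer-options pooled together. These definitions, the function names, and the handling of empty denominators are assumptions of this sketch; the evaluation script in the repository is the authoritative reference.

```python
from typing import List, Tuple


def precision_recall(gold: List[bool], pred: List[bool]) -> Tuple[float, float]:
    """Precision and recall of predicted-correct options against gold labels."""
    tp = sum(g and p for g, p in zip(gold, pred))
    n_pred, n_gold = sum(pred), sum(gold)
    # Assumed convention: an empty denominator counts as a perfect score.
    precision = tp / n_pred if n_pred else 1.0
    recall = tp / n_gold if n_gold else 1.0
    return precision, recall


def harmonic_mean(p: float, r: float) -> float:
    return 2 * p * r / (p + r) if (p + r) else 0.0


def f1_macro(gold_by_q: List[List[bool]], pred_by_q: List[List[bool]]) -> float:
    """F1m (assumed): average precision and recall over questions, then combine."""
    pairs = [precision_recall(g, p) for g, p in zip(gold_by_q, pred_by_q)]
    avg_p = sum(p for p, _ in pairs) / len(pairs)
    avg_r = sum(r for _, r in pairs) / len(pairs)
    return harmonic_mean(avg_p, avg_r)


def f1_all(gold_by_q: List[List[bool]], pred_by_q: List[List[bool]]) -> float:
    """F1a (assumed): one binary F1 over every answer-option in the dataset."""
    gold = [g for q in gold_by_q for g in q]
    pred = [p for q in pred_by_q for p in q]
    return harmonic_mean(*precision_recall(gold, pred))


# Two toy questions with three and two answer-options respectively.
gold = [[True, True, False], [True, False]]
pred = [[True, False, False], [True, True]]
f1m = f1_macro(gold, pred)
f1a = f1_all(gold, pred)
```

The functions return fractions in [0, 1]; the table reports the corresponding percentages.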

Releases and Downloads

The entire corpus consists of ~10K questions (~6K multiple-sentence questions). We release about 60% of this data as training/dev data.

The rest of the data is held out for evaluation. Every few months we will add a new, unseen evaluation set to CodaLab. The purpose is to prevent unintentional overfitting over time through repeated evaluations. Here is our current expected release plan:

Release Tag   Release Date
R1            Spring, 2018
R2            Winter, 2019
R3            Summer, 2019
R4            Fall, 2019

This work is under active development. To hear about the most recent updates and changes, sign up for our list.

If you run into any problems, please raise them on our GitHub issue tracker.


If you find this data helpful in your work, please cite this paper: