Natural Language Inference is the task of determining whether a "hypothesis" is true (entailment), false (contradiction), or undetermined (neutral) given a "premise".
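For concreteness, the task reduces to three-way classification over a (premise, hypothesis) pair. Below is a minimal sketch, assuming the Hugging Face `transformers` library and the off-the-shelf `roberta-large-mnli` checkpoint (chosen purely for illustration; it is not one of the models in the results table below):

```python
# Minimal NLI sketch: score a (premise, hypothesis) pair as
# contradiction / neutral / entailment with a pretrained model.
# Assumes: pip install torch transformers
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL = "roberta-large-mnli"  # off-the-shelf MultiNLI-finetuned checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)

premise = "A soccer game with multiple males playing."
hypothesis = "Some men are playing a sport."

inputs = tokenizer(premise, hypothesis, return_tensors="pt")
with torch.no_grad():
    probs = model(**inputs).logits.softmax(dim=-1).squeeze()

# Label order for this checkpoint: contradiction, neutral, entailment.
for label, p in zip(["contradiction", "neutral", "entailment"], probs.tolist()):
    print(f"{label}: {p:.3f}")
```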
Dataset | # Sentence pairs |
---|---|
SNLI | 570K |
MultiNLI | 433K |
SciTail | 27K |
- SNLI is short for Stanford Natural Language Inference; it contains 570k human-annotated sentence pairs. The premises are drawn from the captions of the Flickr30k corpus, and the hypotheses were composed manually.
- MultiNLI is short for Multi-Genre NLI; it contains 433k sentence pairs, and its collection process and task definition are modeled closely on SNLI. The premises are drawn from a maximally broad range of genres of American English, including non-fiction genres (SLATE, OUP, GOVERNMENT, VERBATIM, TRAVEL), spoken genres (TELEPHONE, FACE-TO-FACE), less formal written genres (FICTION, LETTERS), and a specialized genre drawn from the 9/11 report.
- The SciTail entailment dataset consists of 27k sentence pairs. In contrast to SNLI and MultiNLI, it was not crowd-sourced but created from sentences that already exist “in the wild”: hypotheses were created from science questions and the corresponding answer candidates, while relevant web sentences from a large corpus were used as premises. A sketch for loading all three datasets follows this list.
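All three corpora can be pulled programmatically; below is a minimal loading sketch, assuming the Hugging Face `datasets` library and the Hub dataset IDs `snli`, `multi_nli`, and `scitail` (current at the time of writing):

```python
# Load SNLI, MultiNLI and SciTail from the Hugging Face Hub.
# Assumes: pip install datasets
# (Recent library versions may require trust_remote_code=True
# for older script-based datasets such as scitail.)
from datasets import load_dataset

snli = load_dataset("snli")                      # ~570k pairs
multi_nli = load_dataset("multi_nli")            # ~433k pairs
scitail = load_dataset("scitail", "tsv_format")  # ~27k pairs

ex = snli["train"][0]
print(ex["premise"], "|", ex["hypothesis"], "|", ex["label"])
# label: 0 = entailment, 1 = neutral, 2 = contradiction (-1 = no gold label)
```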
Reported test accuracy (%) on SciTail:

Model | Code | Accuracy | Paper |
---|---|---|---|
SAN (Liu et al., 2018) | — | 88.4 | Stochastic Answer Networks for Natural Language Inference |
HCRN (Tay et al., 2018) | — | 80.0 | Hermitian Co-Attention Networks for Text Matching in Asymmetrical Domains |
HBMP (Talman et al., 2018) | — | 86.0 | Natural Language Inference with Hierarchical BiLSTM Max Pooling Architecture |
CAFE (Tay et al., 2018) | — | 83.3 | Compare, Compress and Propagate: Enhancing Neural Architectures with Alignment Factorization for Natural Language Inference |
RE2 (Yang et al., 2019) | — | 86.0 | Simple and Effective Text Matching with Richer Alignment Features |
MT-DNN (Liu et al., 2019) | — | 94.1 (base) / 95.0 (large) | Multi-Task Deep Neural Networks for Natural Language Understanding |