# CANNOT dataset

Compilation of ANnotated, Negation-Oriented Text-pairs
## Introduction

**CANNOT** is a dataset that focuses on negated textual pairs. It currently
contains **77,376 samples**, of which roughly half are negated pairs of
sentences, while the other half are paraphrased versions of each other.

The most frequent negation in the dataset is verbal negation (e.g.,
will → won't), although it also contains pairs with antonyms (e.g., cold → hot).
## Format

The dataset is given as a
[`.tsv`](https://en.wikipedia.org/wiki/Tab-separated_values) file with the
following structure:

| premise     | hypothesis                                         | label |
|:------------|:---------------------------------------------------|:-----:|
| A sentence. | An equivalent, non-negated sentence (paraphrased). |   0   |
| A sentence. | The sentence negated.                              |   1   |
The dataset can be easily loaded into a Pandas DataFrame by running:

```python
import pandas as pd

dataset = pd.read_csv('negation_dataset_v1.0.tsv', sep='\t')
```
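Once loaded, the two classes can be separated through the `label` column. A minimal sketch, using a small in-memory stand-in for the dataset (the example rows are hypothetical, but the column names match the table above):

```python
import pandas as pd

# Hypothetical rows mirroring the dataset's structure.
dataset = pd.DataFrame({
    "premise": ["It will rain.", "It will rain."],
    "hypothesis": ["Rain is expected.", "It won't rain."],
    "label": [0, 1],
})

paraphrased = dataset[dataset["label"] == 0]  # non-negated pairs
negated = dataset[dataset["label"] == 1]      # negated pairs

print(len(paraphrased), len(negated))
```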
## Construction

The dataset has been created by cleaning up and merging the following datasets:

1. _Not another Negation Benchmark: The NaN-NLI Test Suite for Sub-clausal
   Negation_ (see
   [`datasets/nan-nli`](https://github.com/dmlls/cannot-dataset/tree/main/datasets/nan-nli)).

2. _GLUE Diagnostic Dataset_ (see
   [`datasets/glue-diagnostic`](https://github.com/dmlls/cannot-dataset/tree/main/datasets/glue-diagnostic)).

3. _Automated Fact-Checking of Claims from Wikipedia_ (see
   [`datasets/wikifactcheck-english`](https://github.com/dmlls/cannot-dataset/tree/main/datasets/wikifactcheck-english)).

4. _From Group to Individual Labels Using Deep Features_ (see
   [`datasets/sentiment-labelled-sentences`](https://github.com/dmlls/cannot-dataset/tree/main/datasets/sentiment-labelled-sentences)).
   In this case, the negated sentences were obtained by using the Python module
   [`negate`](https://github.com/dmlls/negate).

5. _It Is Not Easy To Detect Paraphrases: Analysing Semantic Similarity With
   Antonyms and Negation Using the New SemAntoNeg Benchmark_ (see
   [`datasets/antonym-substitution`](https://github.com/dmlls/cannot-dataset/tree/main/datasets/antonym-substitution)).
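Once each source has been cleaned into the shared premise/hypothesis/label format, the merge itself reduces to a concatenation. A sketch with in-memory stand-ins for two of the cleaned sources (the example rows are hypothetical; the per-dataset clean-up is source-specific and not shown):

```python
import pandas as pd

# In-memory stand-ins for two cleaned sources (hypothetical data).
nan_nli = pd.DataFrame(
    {"premise": ["He left."], "hypothesis": ["He did not leave."], "label": [1]}
)
glue_diag = pd.DataFrame(
    {"premise": ["She sings."], "hypothesis": ["She performs songs."], "label": [0]}
)

# Merging amounts to concatenating the sources and re-indexing.
merged = pd.concat([nan_nli, glue_diag], ignore_index=True)
print(len(merged))
```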
Additionally, for each of the negated samples, another pair of non-negated
sentences has been added by paraphrasing them with the pre-trained model
[`🤗tuner007/pegasus_paraphrase`](https://huggingface.co/tuner007/pegasus_paraphrase).

Finally, the swapped version of each pair (premise ⇋ hypothesis) has also been
included, and any duplicates have been removed.

The contribution of each of these individual datasets to the final CANNOT
dataset is:

| Dataset                                             |    Samples |
|:----------------------------------------------------|-----------:|
| Not another Negation Benchmark                      |        118 |
| GLUE Diagnostic Dataset                             |        154 |
| Automated Fact-Checking of Claims from Wikipedia    |     14,970 |
| From Group to Individual Labels Using Deep Features |      2,110 |
| It Is Not Easy To Detect Paraphrases                |      8,597 |
| **Total**                                           | **25,949** |

_Note_: The numbers above include only the original queries present in the
datasets.
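The swap-and-deduplicate step can be sketched as follows; this is a minimal illustration (with hypothetical rows) of the idea, not the project's actual pipeline. The label is symmetric under swapping, so it carries over unchanged:

```python
import pandas as pd

dataset = pd.DataFrame({
    "premise": ["It is cold.", "It is cold."],
    "hypothesis": ["It is freezing.", "It is hot."],
    "label": [0, 1],
})

# Swap premise and hypothesis by renaming both columns at once,
# then restore the original column order.
swapped = dataset.rename(columns={"premise": "hypothesis", "hypothesis": "premise"})
swapped = swapped[["premise", "hypothesis", "label"]]

# Append the swapped pairs and drop exact duplicates.
augmented = pd.concat([dataset, swapped], ignore_index=True).drop_duplicates()
print(len(augmented))  # 4: no pair here equals its own swap
```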
## Contributions

Questions? Bugs? Feel free to [open a new
issue](https://github.com/dmlls/cannot-dataset/issues/new/).
## Acknowledgments

We thank all the previous authors who have made this dataset possible:

Thinh Hung Truong, Yulia Otmakhova, Timothy Baldwin, Trevor Cohn, Jey Han Lau,
Karin Verspoor, Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer
Levy, Samuel R. Bowman, Aalok Sathe, Salar Ather, Tuan Manh Le, Nathan Perry,
Joonsuk Park, Dimitrios Kotzias, Misha Denil, Nando De Freitas, Padhraic Smyth,
Teemu Vahtola, Mathias Creutz, and Jörg Tiedemann.
## License

The CANNOT dataset is released under [CC BY-SA
4.0](https://creativecommons.org/licenses/by-sa/4.0/).
## Citation

TBA.