This repository has been archived by the owner on May 4, 2021. It is now read-only.

Merge branch 'master' into dev
achimr committed Oct 25, 2017
2 parents 4be52d0 + 54bbd3e commit 2dd960b
Showing 1 changed file, monolingual/README.md, with 5 additions and 0 deletions.
@@ -0,0 +1,5 @@
For monolingual [Common Crawl](http://commoncrawl.org) data and code to process it, please refer to these resources:
* [University of Edinburgh N-gram site](http://statmt.org/ngrams)
* Code to process corpora: https://github.com/kpu/preprocess
* Code to produce raw monolingual files from Common Crawl: https://github.com/treigerm/CommonCrawlProcessing
* Alternative monolingual data extraction under development in the ParaCrawl project: https://github.com/paracrawl/extractor
