Pretraining data
zhezhaoa edited this page Aug 25, 2023 · 2 revisions
CLUECorpusSmall consists of news, web, wiki, and comment corpora. The original data and a detailed description can be found here.
Corpus | Link |
---|---|
CLUECorpusSmall | https://share.weiyun.com/sC6PMhxx |
CLUECorpusSmall (BERT format) | https://share.weiyun.com/9SPPGUOK |
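
This page does not spell out what "BERT format" means. A common convention for BERT-style pretraining text is one sentence per line with a blank line between documents. The sketch below assumes that layout (not confirmed here) and groups lines into documents. The function name `iter_documents` and the sample text are hypothetical.

```python
from typing import Iterable, Iterator, List


def iter_documents(lines: Iterable[str]) -> Iterator[List[str]]:
    """Group lines into documents, treating blank lines as document boundaries.

    Assumption: one sentence per line, documents separated by a blank line.
    """
    doc: List[str] = []
    for raw in lines:
        line = raw.strip()
        if line:
            doc.append(line)
        elif doc:
            # A blank line closes the current document.
            yield doc
            doc = []
    if doc:
        # Flush the last document if the input does not end with a blank line.
        yield doc


# Hypothetical in-memory sample standing in for a downloaded corpus file.
sample = ["第一句。\n", "第二句。\n", "\n", "另一篇文档。\n"]
docs = list(iter_documents(sample))
```

For a real file, the same function can be fed an open file handle, for example `iter_documents(open(path, encoding="utf-8"))`, where `path` points at the downloaded corpus.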
News Commentary v13 consists of parallel data and can be downloaded from here.
Corpus | Link |
---|---|
news-Commentary-v13-en-zh | https://share.weiyun.com/PLMxw6ae |
news-Commentary-v13-zh-en | https://share.weiyun.com/5rMwRhDi |
news-Commentary-v13-en-zh_sampled | https://share.weiyun.com/1KTxq3Dc |
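
The on-disk layout of the parallel files is not described on this page. As a minimal sketch, assuming each line holds a source sentence and its translation separated by a tab (an assumption to verify against the downloaded files), the pairs could be read as follows. The name `iter_pairs` and the sample lines are hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def iter_pairs(lines: Iterable[str], sep: str = "\t") -> Iterator[Tuple[str, str]]:
    """Yield (source, target) sentence pairs from separator-delimited lines.

    Assumption: one pair per line, source and target split by `sep`.
    Lines without the separator are skipped.
    """
    for raw in lines:
        line = raw.rstrip("\n")
        if sep not in line:
            continue
        # Split only on the first separator so the target text stays intact.
        src, tgt = line.split(sep, 1)
        yield src.strip(), tgt.strip()


# Hypothetical sample lines standing in for an en-zh file.
sample_lines = ["Hello world.\t你好，世界。\n", "malformed line\n", "Thank you.\t谢谢。\n"]
pairs = list(iter_pairs(sample_lines))
```

Skipping malformed lines keeps the reader tolerant of stray headers or blank lines. Raising an error instead may be preferable when data integrity matters.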
CIFAR100_nolabel consists of 50 thousand images that can be used for unsupervised pre-training. It can be downloaded from here.
Corpus | Link |
---|---|
CIFAR100_nolabel | https://share.weiyun.com/M2tA9P8p |