You can find all the datasets in the /data directory. The datasets are stored in JSON Lines format (each line in a file is a JSON object). They are split into parts due to GitHub file size limitations.
Sources of data:
- RussiaToday
- NG
- Habrahabr
- Cyberleninka (scientific papers converted from PDF)
Every line in a file represents one document. For RussiaToday, NG, and Habrahabr, each JSON line has the following structure:

```json
{"url": "https://url.here", "content": "Text of the document here", "title": "Title of the document here",
 "summary": "Short summary of the document here", "keywords": ["key", "words", "here"]}
```
For Cyberleninka files, the JSON structure is:

```json
{"url": "https://url.here", "content": "Text of the document here", "title": "Title of the document here",
 "abstract": "Abstract of the document here", "keywords": ["key", "words", "here"]}
```
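Since the format is the same for every source, a few lines of Python are enough to load a dataset. This is a minimal sketch: the glob pattern is an assumption (adjust it to the actual file names in /data), and it collects the split parts back into one list:

```python
import glob
import json

def load_dataset(pattern):
    """Read every JSONL part matching the glob pattern into a list of dicts."""
    documents = []
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if line:  # each non-empty line is one JSON document
                    documents.append(json.loads(line))
    return documents

# Hypothetical file pattern -- check the real part names in /data.
docs = load_dataset("data/cyberleninka_part*.jsonlines")
print(len(docs), docs[0]["title"], docs[0]["keywords"])
```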
Cyberleninka documents are PDFs converted to raw text with pdf2text, so they may contain conversion mistakes and random line breaks. Also note that the keywords were extracted from the documents manually (hell, that was boring!) after conversion, and I could easily have missed something. Please inform me if you find undeleted keywords inside the content field.
My e-mail: [email protected]
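If you want to help with that, one rough way to spot a leftover keyword block is to scan the content field for a keywords header. The sketch below reuses load_dataset from above; the marker string "Ключевые слова" ("keywords") is an assumption about how such blocks typically start in Russian papers, not a guarantee about the dataset:

```python
# Rough heuristic: a leftover keyword block likely starts with a
# "Ключевые слова" ("keywords") header. The marker is an assumption;
# matches still need a manual look before reporting.
suspects = [doc["url"] for doc in docs if "ключевые слова" in doc["content"].lower()]
print(f"{len(suspects)} documents to inspect by hand")
for url in suspects[:10]:
    print(url)
```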
The following table is necessary for this dataset to be indexed by search engines such as Google Dataset Search.
| property | value |
|---|---|
| name | Datasets for evaluation of keyword extraction in Russian |
| url | https://github.com/mannefedov/ru_kw_eval_datasets |
| sameAs | https://github.com/mannefedov/ru_kw_eval_datasets |
| description | Datasets for evaluation of keyword extraction in Russian |
| author | Mikhail Nefedov |