European Court of Human Rights OpenData construction process

This repository contains the scripts that build the database and datasets of the European Court of Human Rights OpenData (ECHR-OD) project. It serves several purposes:

Reproducibility: anyone can rebuild the entire database from scratch.
Extensibility: any new version of the database must be created from an updated version of these scripts.
Revision: all cases are processed automatically, and there are many corner cases. This repository lets anyone inspect the intermediate files to check whether the results are correct and to locate the root cause of parsing errors.

DOWNLOAD DATA

General information

Official website: ECHR-OD project
Original paper: paper, code, supplementary material
Creation process: https://github.com/echr-od/ECHR-OD_process
Website sources: https://github.com/echr-od/ECHR-OD_website

If you are using the project, please consider citing:

@article{Quemy2019_ECHROD,
  title={European Court of Human Right Open Data project},
  author={Alexandre Quemy},
  journal={CoRR},
  year={2019},
  volume={abs/1810.03115}
}

Building process

The build chain starts from scratch and consists of the following steps:

get_cases_info.py: Retrieve the list and basic information about cases from HUDOC
filter_cases.py: Remove inconsistent, ambiguous or difficult-to-process cases
preprocess_documents.py: Analyse the raw judgments to construct a nested JSON structure representing the paragraphs
process_documents.py: Normalize the documents and generate Bag-of-Words and TF-IDF representations
generate_datasets.py: Combine all the information to generate several datasets
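
To make the Bag-of-Words and TF-IDF representations mentioned for process_documents.py concrete, here is a minimal sketch in plain Python. It is not the project's implementation: the function names `bag_of_words` and `tf_idf` are hypothetical, and the weighting used (relative term frequency times the natural log of inverse document frequency) is the textbook variant, assumed here because the exact scheme used by the scripts is not stated.

```python
import math
from collections import Counter


def bag_of_words(tokens):
    """Count how many times each token occurs in one document."""
    return Counter(tokens)


def tf_idf(documents):
    """Return one {token: weight} dict per document.

    `documents` is a list of token lists. The weight is
    (count / document length) * ln(N / document frequency),
    an assumed textbook formula, not necessarily the project's.
    """
    n_docs = len(documents)
    bows = [bag_of_words(doc) for doc in documents]

    # Document frequency: in how many documents each token appears.
    doc_freq = Counter()
    for bow in bows:
        doc_freq.update(bow.keys())

    weights = []
    for bow in bows:
        total = sum(bow.values())
        weights.append({
            token: (count / total) * math.log(n_docs / doc_freq[token])
            for token, count in bow.items()
        })
    return weights


# Two toy "judgments", already tokenized and normalized.
docs = [
    ["court", "article", "article"],
    ["court", "violation"],
]
weights = tf_idf(docs)
```

In this toy corpus "court" appears in every document, so its TF-IDF weight is zero, while "article" and "violation" receive positive weights because each is specific to one document.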

Installation & Usage

NLTK packages

To parse and normalize the documents, the following NLTK packages must be installed: stopwords, averaged_perceptron_tagger and wordnet. To install them, run bin/download-nltk:

python bin/download-nltk
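
The contents of bin/download-nltk are not shown here, so the following is only a sketch of an equivalent way to fetch the three packages from Python. It relies on `nltk.download`, which takes a package identifier and returns a boolean. The helper name `download_nltk_packages` and its `downloader` parameter are hypothetical and exist only to keep the example easy to try without network access.

```python
# Packages listed in this README as required for parsing and normalization.
REQUIRED_NLTK_PACKAGES = ("stopwords", "averaged_perceptron_tagger", "wordnet")


def download_nltk_packages(packages=REQUIRED_NLTK_PACKAGES, downloader=None):
    """Download each package and report success per package name.

    `downloader` defaults to nltk.download; passing another callable
    (hypothetical hook) allows a dry run without touching the network.
    """
    if downloader is None:
        import nltk  # imported lazily so a dry run does not need nltk
        downloader = nltk.download
    return {name: bool(downloader(name)) for name in packages}


# Dry run: pretend every download succeeds.
dry_run = download_nltk_packages(downloader=lambda name: True)
```

Calling `download_nltk_packages()` with no arguments performs the real downloads, provided nltk is installed and the network is reachable.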

Webdrivers

Selenium is installed as a dependency and is used to automatically retrieve the number of documents available on HUDOC. Selenium requires a webdriver, which must be installed manually. See the Selenium documentation for help.
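
Before running the build, it can help to check that a webdriver executable is visible on the PATH. The sketch below uses only the standard library. Which driver the scripts expect is not stated in this README, so the two names checked (geckodriver for Firefox, chromedriver for Chrome) are assumptions, and `find_webdrivers` is a hypothetical helper.

```python
import shutil

# Common driver executables; which one the project expects is an assumption.
CANDIDATE_DRIVERS = ("geckodriver", "chromedriver")


def find_webdrivers(candidates=CANDIDATE_DRIVERS):
    """Map each candidate driver name to its path on PATH, or None if absent."""
    return {name: shutil.which(name) for name in candidates}


for name, path in find_webdrivers().items():
    print(f"{name}: {path or 'not found on PATH'}")
```

If every candidate is reported as not found, install a driver matching your browser as described in the Selenium documentation.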

