A Contrastive Learning Method for Automated Fact-Checking (C2E2)

This repository contains the implementation of C2E2, a contrastive learning method for automated fact-checking, applied to a Korean fact-checking dataset.

Data

factcheck-ko

The Korean fact-checking dataset can be obtained from the factcheck-ko repository.

data/wiki_claims.json: human-annotated fact-checking dataset
data/train_val_test_ids.json: lists of claim IDs for the train/validation/test splits
data/wiki/wiki_docs.json: Wikipedia documents corresponding to claims in wiki_claims.json
dr/dr_results.json
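
As a rough illustration of how the claim file and the split file fit together, the sketch below groups claims by split. The JSON layout is an assumption, not something this README specifies: it supposes that wiki_claims.json maps each claim ID to a record and that train_val_test_ids.json maps a split name to a list of claim IDs. The helper names load_json and split_claims are hypothetical and are not part of this repository.

```python
import json
from pathlib import Path


def load_json(path):
    """Read a UTF-8 encoded JSON file and return the parsed object."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def split_claims(claims, split_ids):
    """Group claim records by split name.

    Assumed (hypothetical) layout:
      claims    -- dict mapping claim ID -> claim record
      split_ids -- dict mapping split name -> list of claim IDs
    IDs are compared as strings because JSON object keys are always strings.
    IDs that do not appear in `claims` are skipped.
    """
    return {
        name: {str(cid): claims[str(cid)] for cid in ids if str(cid) in claims}
        for name, ids in split_ids.items()
    }
```

With the files in place, one would call split_claims(load_json("data/wiki_claims.json"), load_json("data/train_val_test_ids.json")) and adjust the key handling to the layout the files actually use.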

Newly processed data

pretrain/data/c2e2_data.csv
pretrain/data/simcse_data.csv

Contrastive pretraining

C2E2

cd pretrain
python ./train.py --input_df="c2e2_data.csv" --pos_neg="c2e2"

SimCSE

cd pretrain
python ./train.py --input_df="simcse_data.csv" --pos_neg="simcse"

Our implementation uses KPFBERT as the fixed backbone model.
You can obtain the KPFBERT-C2E2 pretrained checkpoint here.
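
For readers new to contrastive pretraining, the sketch below shows a generic in-batch InfoNCE loss of the kind used by SimCSE, written in plain Python over precomputed embedding vectors. It is not the code in pretrain/train.py, and it does not show how C2E2 builds its positive and negative pairs. The function names and the temperature value of 0.05 are illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(anchors, positives, temperature=0.05):
    """Mean in-batch InfoNCE loss (generic sketch, assumed setup).

    For anchor i, positives[i] is its positive example and every other
    positives[j] (j != i) is treated as a negative.
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp over all candidates in the batch.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        # Cross-entropy with the matching positive as the target class.
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)
```

The loss approaches zero when each anchor is far more similar to its own positive than to the other items in the batch, and equals log(batch size) when all candidates are equally similar.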

Inference (Sentence Selection)

python sentence_selection/embedding_based_similarity.py --split="test" --gpu_number=0 --checkpoints_dir="./pretrain/checkpoints/" --max_length=512 --model="kosimcse_kpfbert_c2e2" --model_name="kpfbert_c2e2_checkpoint.pt"
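
The idea behind embedding-based sentence selection can be sketched as ranking candidate sentences by cosine similarity to the claim embedding and keeping the top few. The snippet below is a minimal sketch under that assumption. It takes precomputed embedding vectors as input, and select_sentences and top_k are hypothetical names rather than the interface of embedding_based_similarity.py.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def select_sentences(claim_vec, sentence_vecs, top_k=5):
    """Return (index, score) pairs for the top_k sentences most similar to the claim.

    Generic sketch: the embeddings are assumed to come from the pretrained
    encoder, and results are sorted by descending cosine similarity.
    """
    scored = [(i, cosine(claim_vec, s)) for i, s in enumerate(sentence_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

In practice the vectors would be produced by encoding the claim and the candidate evidence sentences with the checkpoint passed via --model_name.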

Reference

For more details on the task and method, please take a look at the paper published in the Journal of KIISE (in Korean).

@article{송선영2023팩트체킹,
  title={자동화 팩트체킹을 위한 대조학습 방법},
  author={송선영 and 안제준 and 박건우},
  journal={정보과학회논문지},
  year={2023}
}
