Train Data Release: v2022.3Q
Beomi released this 07 Nov 06:55
Quarterly Aggregated Korean News Comments Dataset: v2022.3Q
Dataset Spec
v2022.3Q = release for the third quarter (3Q) of 2022
Includes the datasets from v2019.1Q through v2022.3Q
Total lines (excluding blank lines): 345,452,030
Date range: 2019.01 ~ 2022.09
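The line total above excludes blank lines. A minimal sketch of how such a count could be reproduced on a downloaded file, assuming the data is a plain UTF-8 text file; the function name `count_nonblank_lines` and the in-memory demo input are illustrative and not part of the release:

```python
import io
from typing import TextIO


def count_nonblank_lines(stream: TextIO) -> int:
    """Count lines that contain at least one non-whitespace character."""
    return sum(1 for line in stream if line.strip())


# Tiny in-memory stand-in for a data file (hypothetical content).
demo = io.StringIO("comment A\nreply A1\n\ncomment B\n")
print(count_nonblank_lines(demo))  # 3
```

For a real file, the same function can be called on `open(path, encoding="utf-8")`, which streams the file line by line instead of loading it into memory.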
Differences from TrainData_v1
A comment and its replies (same thread) are separated by a single line break (\n)
Different threads are separated by a blank line (\n\n)
Duplicate comments within the same day are removed (only the first occurrence is kept)
Otherwise, the texts are kept as raw as possible, with minimal cleaning
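The thread layout described above (one line per comment or reply, a blank line between threads) can be read back with a small parser. This is a sketch under assumptions, not the author's tooling: it assumes each comment occupies exactly one line, and the name `iter_threads` and the sample text are hypothetical.

```python
from typing import Iterable, Iterator, List


def iter_threads(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield one thread at a time as a list of comment lines.

    Assumption: consecutive non-blank lines belong to the same thread,
    and one or more blank lines mark a thread boundary.
    """
    thread: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.strip():
            thread.append(line)
        elif thread:
            # Blank line: close the current thread.
            yield thread
            thread = []
    if thread:
        # Last thread may not be followed by a blank line.
        yield thread


# Hypothetical sample in the described format.
sample = "first comment\nreply to first\n\nsecond thread comment\n"
threads = list(iter_threads(sample.splitlines(keepends=True)))
print(threads)
# [['first comment', 'reply to first'], ['second thread comment']]
```

Because the function accepts any iterable of lines, an open file object can be passed directly, so the full dataset does not need to fit in memory.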