SEA Lang Codes Collection

A mini repo containing language codes (ISO 639-1, ISO 639-2, or ISO 639-3) for languages from countries in Southeast Asia (SEA).

Goal

NLP research and development in SEA is starting to be recognized throughout the world, thanks to the region's cultural, linguistic, and economic potential. However, there are no readily accessible resources for activists or practitioners to obtain relevant information on the language codes of languages spoken in or originating from SEA. This mini-repo aims to help people quickly check language codes in the SEA region for the following purposes:

Quickly check the language coverage of any newly released model.
Filter and extract newly released datasets that include SEA languages.
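For the second purpose, filtering usually comes down to a set-membership test on each record's language code. The sketch below assumes a dataset exposed as an iterable of dicts with a language-code field. The helper `filter_sea_records` and the field name `lang` are hypothetical and not part of this repo. It also assumes the dataset and the SEA code set use the same ISO 639 part, because "id" (ISO 639-1) will not match "ind" (ISO 639-3) without normalization.

```python
from typing import Iterable, Iterator


def filter_sea_records(
    records: Iterable[dict],
    sea_codes: set[str],
    lang_key: str = "lang",  # hypothetical field name; adapt to the dataset schema
) -> Iterator[dict]:
    """Yield only the records whose language code is one of the SEA codes.

    Records that lack the language field are skipped.
    """
    for record in records:
        if record.get(lang_key) in sea_codes:
            yield record
```

A generator keeps memory flat on large corpora. Wrap the call in `list(...)` if you need all matches at once.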

Getting Started

The full language information is available in json_resources/sea_country_lang_full_info.json. A lite version, containing only a JSON dict that maps each country to its list of languages, is available in json_resources/sea_country_lang_list.json.
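The lite file can support the first purpose, a quick coverage check of a new model. This is a minimal sketch. It assumes each value in json_resources/sea_country_lang_list.json is a list of language-code strings, which matches the "country->lang_list" description but has not been checked against the file. The functions `load_country_langs` and `coverage_report` are hypothetical helpers and are not part of the repo.

```python
import json
from pathlib import Path

# File path from this README; the value format (a list of code strings per
# country) is an assumption based on the "country->lang_list" description.
LITE_PATH = Path("json_resources/sea_country_lang_list.json")


def load_country_langs(path: Path = LITE_PATH) -> dict[str, list[str]]:
    """Load the lite country -> language-code list mapping."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def coverage_report(
    country_langs: dict[str, list[str]], model_langs: list[str]
) -> dict[str, dict[str, list[str]]]:
    """For each country, split its codes into those a model covers and those it misses."""
    supported = set(model_langs)
    report = {}
    for country, langs in country_langs.items():
        codes = set(langs)
        report[country] = {
            "covered": sorted(codes & supported),
            "missing": sorted(codes - supported),
        }
    return report
```

Usage would look like `coverage_report(load_country_langs(), model_langs=[...])`, with the model's advertised language codes in the list. The comparison is only meaningful if both sides use the same ISO 639 part.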

Construction, explained Step-by-Step

The collection itself was constructed using a hybrid approach:

1. The initial collection was retrieved from Babel using the manually constructed SEA country info in json_resources/sea_country_info_alpha2.json, and stored in json_resources/sea_country_langs_babel.json. The language names were obtained from the iso639-lang package by decoding each code as ISO 639-1, ISO 639-2, or ISO 639-3.
2. The collection was then filtered through manual curation, with the codes to add or remove listed in json_add_remove_collection.py. Only languages that originated in a country and/or are commonly used by its indigenous peoples or tribes are kept.
3. The results of Step 1 and Step 2 are then filtered and/or extended using code_updater.py.
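The manual curation step amounts to starting from the Babel-derived codes for each country, then adding and removing codes by hand. The sketch below is hypothetical. It assumes additions and removals are kept as country -> code-list dicts, and the real contents of json_add_remove_collection.py and the logic of code_updater.py may differ. The function name `apply_curation` is illustrative only.

```python
def apply_curation(
    babel_langs: dict[str, list[str]],
    to_add: dict[str, list[str]],
    to_remove: dict[str, list[str]],
) -> dict[str, list[str]]:
    """Return a new country -> codes mapping with manual edits applied.

    Hypothetical shapes: all three inputs map a country key (e.g. an
    alpha-2 code) to a list of language codes. Removals are applied after
    additions, so a code listed in both ends up removed.
    """
    curated = {}
    # A country may appear only in the manual additions.
    for country in sorted(set(babel_langs) | set(to_add)):
        codes = set(babel_langs.get(country, []))
        codes |= set(to_add.get(country, []))
        codes -= set(to_remove.get(country, []))
        curated[country] = sorted(codes)
    return curated
```

Returning a new dict leaves the Babel snapshot untouched, so it can be compared against the curated output.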

The code was run on Python 3.11.3, so using the same version is recommended for reproducibility. Contributions to this collection are greatly appreciated; feel free to open a PR or raise an issue!


sabilmakbar/sea_lang_codes
