pfeyz/wiktionary-ipa

Wiktionary IPA

This is a script I'm writing to extract all the IPA transcriptions of dictionary words in Wiktionary. I'm using a SAX XML parser because the XML dump is about 3 GB, and I don't have enough RAM to load something that big with a tree-based parser.
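To illustrate the streaming approach, here is a minimal sketch of a SAX handler built on Python's standard `xml.sax` module. It is not the code in parse-wiktionary.py. The class name `PageHandler`, the `on_page` callback, and the helper `parse_pages` are hypothetical, and the sketch assumes the dump follows the usual MediaWiki export layout where each `<page>` holds a `<title>` and a `<text>` element.

```python
import xml.sax


class PageHandler(xml.sax.ContentHandler):
    """Hypothetical handler: reports one (title, text) pair per <page>."""

    def __init__(self, on_page):
        super().__init__()
        self.on_page = on_page  # callback invoked once per finished page
        self._field = None      # element currently being buffered, if any
        self._buf = []
        self._title = None
        self._text = None

    def startElement(self, name, attrs):
        if name == "page":
            # Reset per-page state so nothing leaks between pages.
            self._title, self._text = None, None
        elif name in ("title", "text"):
            self._field = name
            self._buf = []

    def characters(self, content):
        # SAX may deliver one text node in several chunks, so accumulate.
        if self._field is not None:
            self._buf.append(content)

    def endElement(self, name):
        if name == self._field:
            value = "".join(self._buf)
            if name == "title":
                self._title = value
            else:
                self._text = value
            self._field = None
            self._buf = []
        elif name == "page":
            self.on_page(self._title, self._text or "")


def parse_pages(xml_bytes):
    """Parse an in-memory sample and return a list of (title, text)."""
    pages = []
    handler = PageHandler(lambda title, text: pages.append((title, text)))
    xml.sax.parseString(xml_bytes, handler)
    return pages


# Tiny stand-in for the real dump (assumed layout, not real data).
SAMPLE = (
    "<mediawiki><page><title>cat</title>"
    "<revision><text>{{IPA|en|/kæt/}}</text></revision>"
    "</page></mediawiki>"
).encode("utf-8")

pages = parse_pages(SAMPLE)
```

For the real dump, `xml.sax.parse(path, handler)` reads the file incrementally, so only the page currently being buffered sits in memory at any time.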

Run get-wiktionary.sh to download and decompress the Wiktionary dump of all articles, then run parse-wiktionary.py (Python 3 only) to extract the transcriptions.
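As a rough idea of what the extraction step can look like, here is a small sketch that pulls transcriptions out of a page's wikitext with a regular expression. This is an assumption about the approach, not a description of parse-wiktionary.py. It assumes transcriptions appear in `{{IPA|...}}` templates with phonemic forms in slashes and phonetic forms in square brackets, and the name `extract_ipa` is hypothetical.

```python
import re

# Matches a whole {{IPA|...}} template that has no nested templates inside.
IPA_TEMPLATE = re.compile(r"\{\{IPA\|([^{}]*)\}\}")


def extract_ipa(wikitext):
    """Return every /phonemic/ or [phonetic] argument of IPA templates."""
    found = []
    for match in IPA_TEMPLATE.finditer(wikitext):
        for arg in match.group(1).split("|"):
            arg = arg.strip()
            if len(arg) < 2:
                continue
            slashes = arg[0] == "/" and arg[-1] == "/"
            brackets = arg[0] == "[" and arg[-1] == "]"
            if slashes or brackets:
                found.append(arg)
    return found


transcriptions = extract_ipa("* {{IPA|en|/kæt/|[kʰæt]}}")
```

Arguments such as the language code are skipped because they are not wrapped in slashes or brackets. Templates that nest other templates are not matched by this pattern and would need a real wikitext parser.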

Still a work in progress.
