- Gather and clean the data; filter it into dataframes (a rough sketch of this step follows the list)
- General analysis of the overall corpus of Japanese words; figure out what percentage of the words in the corpus are written in katakana (sketch after the list)
- Filter out anything non-katakana and analyze word length; try to come up with a metric for what shortens a word (I'm not sure what else I could analyze here; I'll try to come up with more ideas later). A first pass at the length analysis is also sketched after the list.
- Maybe bring in Google n-gram data to track usage of these loanwords over time? I'm not sure this is possible, but it's something I'm interested in (a very tentative sketch follows the list)
- Perhaps some sociolinguistic analysis based on the data. I'm interested in the idea of linguistic imperialism, and since many gairaigo are of English origin, analyzing these loanwords could connect to that.
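
A minimal sketch of the gathering/cleaning step, assuming the raw word list lives in a hypothetical `corpus_words.csv` with one word per row in a `word` column (the real file name and columns will depend on whichever corpus I end up using):

```python
import pandas as pd

# Hypothetical input file and column name; adjust to the actual corpus format.
raw = pd.read_csv("corpus_words.csv")

# Basic cleaning: drop missing entries, strip whitespace, remove duplicates.
words = (
    raw.dropna(subset=["word"])
       .assign(word=lambda df: df["word"].str.strip())
       .drop_duplicates(subset=["word"])
       .reset_index(drop=True)
)

print(len(words), "unique words after cleaning")
```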
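For the katakana percentage, one simple approach is to test whether every character of a word falls inside the Unicode katakana block (U+30A0 to U+30FF, which also covers the long-vowel mark ー and the middle dot ・). A sketch, reusing the `words` dataframe from above:

```python
# A word counts as katakana if it consists entirely of characters
# from the katakana Unicode block.
KATAKANA_PATTERN = r"[\u30A0-\u30FF]+"

words["is_katakana"] = words["word"].str.fullmatch(KATAKANA_PATTERN)

pct_katakana = words["is_katakana"].mean() * 100
print(f"{pct_katakana:.1f}% of corpus words are written entirely in katakana")
```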
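And a first pass at the length analysis on the katakana-only subset. Character count is only a crude proxy for mora count, so this is just a starting point for thinking about what shortens a word:

```python
# Keep only the katakana words and look at their length distribution.
katakana_words = words[words["is_katakana"]].copy()
katakana_words["length"] = katakana_words["word"].str.len()

# Summary statistics plus a raw length histogram, to eyeball where
# clipped forms (e.g. 3-4 character truncations) might cluster.
print(katakana_words["length"].describe())
print(katakana_words["length"].value_counts().sort_index())
```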
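If the Google n-gram idea pans out at all, it would probably have to go through the unofficial JSON endpoint behind the Ngram Viewer, and as far as I can tell there is no Japanese corpus, so it could only track the English source words (e.g. "computer" for コンピューター) rather than the katakana forms themselves. The endpoint, its parameters, and the corpus identifier below are assumptions copied from how the viewer's URLs currently look, not a documented API:

```python
import requests

# Unofficial endpoint used by the Ngram Viewer itself; not a stable, documented API.
NGRAM_URL = "https://books.google.com/ngrams/json"

params = {
    "content": "computer",   # English source word as a stand-in for コンピューター
    "year_start": 1950,
    "year_end": 2019,
    "corpus": "en-2019",     # assumed corpus identifier; copy the real one from a viewer URL
    "smoothing": 3,
}

resp = requests.get(NGRAM_URL, params=params, timeout=30)
resp.raise_for_status()

# The response (as observed) is a list of {"ngram": ..., "timeseries": [...]} objects.
for series in resp.json():
    print(series["ngram"], series["timeseries"][:5], "...")
```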