- Gather and clean the data; filter it into dataframes (a rough sketch of this step follows the list)
- General analysis of the overall corpus of Japanese words; figure out what percentage of the words in the corpus are written in katakana (sketch after the list)
- Filter out anything non-katakana and analyze word length; try to come up with a metric for what shortens a word (I'm not sure what else I could analyze here; I'll try to come up with more ideas later). A first pass at the length analysis is also sketched after the list.
- Maybe bring in Google n-gram data to track usage of these loanwords over time? I'm not sure this is possible, but it's something I'm interested in (a very tentative sketch follows the list)
- Perhaps some sociolinguistic analysis based on the data. I'm interested in the idea of linguistic imperialism, and since many gairaigo are of English origin, analyzing these loanwords could connect to that.
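
A minimal sketch of the gathering/cleaning step, assuming the raw word list lives in a hypothetical `corpus_words.csv` with one word per row in a `word` column (the real file name and columns will depend on whichever corpus I end up using):

```python
import pandas as pd

# Hypothetical input file and column name; adjust to the actual corpus format.
raw = pd.read_csv("corpus_words.csv")

# Basic cleaning: drop missing entries, strip whitespace, remove duplicates.
words = (
    raw.dropna(subset=["word"])
       .assign(word=lambda df: df["word"].str.strip())
       .drop_duplicates(subset=["word"])
       .reset_index(drop=True)
)

print(len(words), "unique words after cleaning")
```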
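For the katakana percentage, one simple approach is to test whether every character of a word falls inside the Unicode katakana block (U+30A0 to U+30FF, which also covers the long-vowel mark ー and the middle dot ・). A sketch, reusing the `words` dataframe from above:

```python
# A word counts as katakana if it consists entirely of characters
# from the katakana Unicode block.
KATAKANA_PATTERN = r"[\u30A0-\u30FF]+"

words["is_katakana"] = words["word"].str.fullmatch(KATAKANA_PATTERN)

pct_katakana = words["is_katakana"].mean() * 100
print(f"{pct_katakana:.1f}% of corpus words are written entirely in katakana")
```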
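And a first pass at the length analysis on the katakana-only subset. Character count is only a crude proxy for mora count, so this is just a starting point for thinking about what shortens a word:

```python
# Keep only the katakana words and look at their length distribution.
katakana_words = words[words["is_katakana"]].copy()
katakana_words["length"] = katakana_words["word"].str.len()

# Summary statistics plus a raw length histogram, to eyeball where
# clipped forms (e.g. 3-4 character truncations) might cluster.
print(katakana_words["length"].describe())
print(katakana_words["length"].value_counts().sort_index())
```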
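If the Google n-gram idea pans out at all, it would probably have to go through the unofficial JSON endpoint behind the Ngram Viewer, and as far as I can tell there is no Japanese corpus, so it could only track the English source words (e.g. "computer" for コンピューター) rather than the katakana forms themselves. The endpoint, its parameters, and the corpus identifier below are assumptions copied from how the viewer's URLs currently look, not a documented API:

```python
import requests

# Unofficial endpoint used by the Ngram Viewer itself; not a stable, documented API.
NGRAM_URL = "https://books.google.com/ngrams/json"

params = {
    "content": "computer",   # English source word as a stand-in for コンピューター
    "year_start": 1950,
    "year_end": 2019,
    "corpus": "en-2019",     # assumed corpus identifier; copy the real one from a viewer URL
    "smoothing": 3,
}

resp = requests.get(NGRAM_URL, params=params, timeout=30)
resp.raise_for_status()

# The response (as observed) is a list of {"ngram": ..., "timeseries": [...]} objects.
for series in resp.json():
    print(series["ngram"], series["timeseries"][:5], "...")
```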