COVID-19 is a global pandemic that affects many aspects of society and is widely discussed on social networks such as Facebook and Twitter. Which topics become popular together? Are these topics discussed consistently, or do they appear only in certain periods of time? The purpose of this project is to find popular, consistent topics in each time period from the COVID19 Tweets dataset. The project implements two approaches:
- The first approach applies frequent itemset mining algorithms to find words that appear together in many tweets within a time period.
- The second approach applies topic modeling methods to find the latent topics in each tweet, then applies frequent itemset mining algorithms to find topics that occur together within a time period.
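The first approach can be illustrated with a minimal sketch. It is not the project's implementation: it counts only word pairs by brute force rather than using FP-Growth, PCY, or A-Priori, and the function name `frequent_word_pairs`, the whitespace tokenization, and the sample tweets are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_word_pairs(tweets, min_sup):
    """Return word pairs whose relative support is at least min_sup."""
    # Each tweet becomes a set of distinct lowercase words (one "basket").
    baskets = [set(t.lower().split()) for t in tweets]
    counts = Counter()
    for basket in baskets:
        # Sort so that each pair has a single canonical ordering.
        counts.update(combinations(sorted(basket), 2))
    n = len(baskets)
    # Relative support = fraction of tweets that contain both words.
    return {pair: c / n for pair, c in counts.items() if c / n >= min_sup}


# Hypothetical tweets from a single time period.
tweets = [
    "covid vaccine trial starts",
    "new covid vaccine approved",
    "lockdown extended again",
    "covid lockdown rules",
]
pairs = frequent_word_pairs(tweets, min_sup=0.5)
print(pairs)
```

Brute-force pair counting grows quadratically with tweet length, which is why algorithms such as A-Priori, PCY, and FP-Growth prune or compress the candidate space on real data.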
python3.6
pip install -r requirements.txt
python finding_similar_items.py [--pre_process_type remove_twitter_account] [--alg fpg] [--min_sup 0.01] [--min_conf 0.01]
- pre_process_type: pre-processing method applied to the tweet content
  - remove_url
  - remove_twitter_account
  - remove_url_replace_twitter_account
  - remove_twitter_account_replace_url
  - replace_twitter_account_and_url
- alg: algorithm for finding similar (frequent) item sets
  - fpg: Frequent Pattern Growth algorithm
  - pcy: PCY algorithm
  - apriori: A-Priori algorithm
- min_sup, min_conf: support and confidence parameters of the item set algorithm
- The results are written to the output folder.
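To make the min_sup and min_conf parameters concrete, here is a small sketch of the standard definitions of support and confidence. It assumes both values are fractions between 0 and 1, as the example value 0.01 suggests. The helper names `support` and `confidence` and the sample baskets are hypothetical and are not taken from the project code.

```python
def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Estimate of P(consequent | antecedent) for the rule antecedent -> consequent."""
    base = support(antecedent, baskets)
    if base == 0:
        # The rule is undefined when the antecedent never occurs.
        return 0.0
    return support(set(antecedent) | set(consequent), baskets) / base


# Hypothetical baskets: one set of words per tweet.
baskets = [
    {"covid", "vaccine"},
    {"covid", "vaccine", "trial"},
    {"covid", "lockdown"},
    {"mask"},
]
sup = support({"covid", "vaccine"}, baskets)         # 2 of 4 tweets
conf = confidence({"covid"}, {"vaccine"}, baskets)   # 2 of the 3 "covid" tweets
print(sup, conf)
```

An item set is kept only if its support reaches min_sup, and a rule is kept only if its confidence reaches min_conf.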
python finding_popular_topic.py [--pre_process_type remove_twitter_account] [--alg_similar fpg] [--alg_topic LDA] [--num_topic 20]
- pre_process_type: same as for finding similar item sets
- alg_similar: algorithm for finding topics that frequently occur together
  - fpg: Frequent Pattern Growth algorithm
  - pcy: PCY algorithm
  - apriori: A-Priori algorithm
- alg_topic: algorithm for finding the latent topics in each tweet
  - LDA: Latent Dirichlet Allocation
  - BTM: Biterm Topic Model
- num_topic: number of topics
- The results are written to the output folder.
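The second approach can be sketched in the same spirit. The example below assumes that a topic model has already produced a probability distribution over topics for each tweet. It then treats the topics above a threshold as that tweet's basket and counts co-occurring topic pairs per time period. The threshold, the period labels, and the function names are assumptions for illustration; how the project actually turns topic distributions into item sets is not stated here.

```python
from collections import Counter, defaultdict
from itertools import combinations


def topics_per_tweet(distribution, threshold):
    """Keep the ids of topics whose probability is at least threshold."""
    return frozenset(k for k, p in enumerate(distribution) if p >= threshold)


def frequent_topic_pairs(rows, threshold, min_sup):
    """rows: iterable of (period, topic_distribution) tuples."""
    by_period = defaultdict(list)
    for period, dist in rows:
        by_period[period].append(topics_per_tweet(dist, threshold))
    result = {}
    for period, baskets in by_period.items():
        counts = Counter()
        for basket in baskets:
            counts.update(combinations(sorted(basket), 2))
        n = len(baskets)
        # Keep topic pairs that co-occur in at least min_sup of the period's tweets.
        result[period] = {pair for pair, c in counts.items() if c / n >= min_sup}
    return result


# Hypothetical per-tweet topic distributions over 3 topics.
rows = [
    ("2020-03", [0.60, 0.35, 0.05]),
    ("2020-03", [0.50, 0.40, 0.10]),
    ("2020-03", [0.10, 0.10, 0.80]),
    ("2020-04", [0.10, 0.45, 0.45]),
    ("2020-04", [0.05, 0.50, 0.45]),
]
result = frequent_topic_pairs(rows, threshold=0.3, min_sup=0.5)
print(result)
```

Comparing the sets across periods shows which topic pairs persist and which appear only in a single period.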