This is our mini-project for SC1015 (Intro to Data Science and Artificial Intelligence).
@nghochi123
@Mel-NLY
Here's the sequence of files that we'd recommend for you to look through. There are more insights and explanations in the Jupyter Notebooks:
The dataset is the Global Terrorism Database (GTD), obtained from the National Consortium for the Study of Terrorism and Responses to Terrorism (START).
It is maintained by researchers headquartered at the University of Maryland and contains information on more than 200,000 terrorist attacks worldwide.
Using this dataset, we are trying to find the answer to the following categorical questions:
- What determines a successful terrorist attack?
- What are terrorists really after?
Determining what makes or breaks a terrorist attack matters because it lets us focus on the combination of features that contributes most to an attack's success. Solutions can then target those features to prevent attacks from succeeding and harming people.
Following the same reasoning, identifying the most common and most harmful motives would also help reduce the number of successful terror attacks.
- Random Forest Classifier
- Logistic Regression
- K-Nearest Neighbours Classifier
- Support Vector Classification (SVC)
- Neural Networks
- Stochastic Gradient Descent Classifier
- Kernel SVM
- Decision Tree Classifier
- AdaBoost
- Gradient Boosting
- What determines a successful terrorist attack?

  This was the best combination of features found for predicting the success of a terrorist attack:
  - Number of Kills
  - Timestamp
  - AttackType
  - Number of Wounded
  - WeaponType
  - Month
- What are terrorists really after?

  Retaliation was found to be the most common motive among the terrorists.
Folium - Interactive Leaflet Map
Keplergl - Geospatial Analytic Visualizations
Plotly - Interactive Web-based Visualizations
Pickle - Serializing and deserializing object structures
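
As a rough illustration of how Pickle can save and reload an object such as a trained model, here is a minimal sketch. The `model` dictionary is only a stand-in for a real trained estimator, and the file name `model.pkl` is hypothetical; neither is taken from the notebooks.

```python
import pickle
import tempfile
from pathlib import Path

# Stand-in for a trained model (assumption: any picklable object
# is saved and restored the same way).
model = {"name": "random_forest", "features": ["nkill", "nwound"], "threshold": 0.5}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "model.pkl"  # hypothetical file name

    # Serialize the object to disk.
    with path.open("wb") as f:
        pickle.dump(model, f)

    # Deserialize it back into a new, equal object.
    with path.open("rb") as f:
        restored = pickle.load(f)

print(restored == model)
```

Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.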
Here are some of the lessons we learnt while developing this project. More of our insights and realisations can be found in the Jupyter Notebooks and the presentation.
- Initially, we cleaned the entire dataset at once, which dropped too many data points. Instead, we first picked out the relevant columns and then cleaned only those values, which gave us a fuller dataset.
- The dataset was imbalanced between failed and successful attacks, so we had to use weights to rebalance it.
- Differences in library versions may have affected the results obtained on different operating systems (Mac/Windows). To deal with this, we checked requirements.txt, tried switching to a Windows device, and made use of Google Colab.
- Initially, all of our code was in a single file, which left our neural network without enough memory to run (and the kernel crashed). Hence we split the code across several Jupyter Notebook files.
  - We split the neural network model into a separate notebook.
- When comparing models, the score is not always the best heuristic to use.
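
Three of these lessons can be sketched in plain Python on toy data: selecting columns before dropping incomplete rows, weighting classes to offset imbalance, and checking more than one score. The records and column names below are made up for illustration, and the weighting formula is the "balanced" heuristic that scikit-learn uses for `class_weight="balanced"`; we are not claiming it is exactly what the notebooks do.

```python
from collections import Counter

# Toy records standing in for dataset rows; None marks a missing value.
# Column names are illustrative assumptions.
rows = [
    {"success": 1, "nkill": 3, "nwound": 1, "motive": None},
    {"success": 1, "nkill": 0, "nwound": 2, "motive": "Retaliation"},
    {"success": 1, "nkill": 1, "nwound": 0, "motive": None},
    {"success": 0, "nkill": 0, "nwound": 0, "motive": None},
    {"success": 1, "nkill": None, "nwound": 4, "motive": None},
    {"success": 1, "nkill": 2, "nwound": 5, "motive": None},
]

def drop_incomplete(records, columns):
    """Keep only the given columns, then drop records missing any of them."""
    selected = [{c: r[c] for c in columns} for r in records]
    return [r for r in selected if all(v is not None for v in r.values())]

cleaned_all = drop_incomplete(rows, list(rows[0]))                     # every column
cleaned_relevant = drop_incomplete(rows, ["success", "nkill", "nwound"])  # columns first

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    return {c: len(labels) / (len(counts) * n) for c, n in counts.items()}

labels = [r["success"] for r in cleaned_relevant]
weights = balanced_class_weights(labels)

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recall."""
    recalls = []
    for c in set(y_true):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

# A "model" that always predicts success looks good on plain accuracy only.
always_success = [1] * len(labels)
print(len(cleaned_all), len(cleaned_relevant))
print(weights)
print(accuracy(labels, always_success), balanced_accuracy(labels, always_success))
```

Cleaning every column keeps one record here, while selecting the relevant columns first keeps five. The rarer class receives the larger weight, and the always-success predictor scores 0.8 on accuracy but only 0.5 on balanced accuracy, which is why a single score can mislead on imbalanced data.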
- cool_resources
  - Interactive Maps,
  - GTD CodeBook, and
  - Slide Deck
- saved
  - Saved Trained Models, and
  - Images obtained from analysis
https://www.start.umd.edu/gtd/analysis/
https://ourworldindata.org/terrorism
https://realpython.com/python-statistics/
https://machinelearningmastery.com/metrics-evaluate-machine-learning-algorithms-python/
https://medcraveonline.com/FRCIJ/motivation-leading-to-radicalization-in-terrorists.html
https://en.wikipedia.org/wiki/Tf%E2%80%93idf
https://towardsdatascience.com/topic-modelling-in-python-with-nltk-and-gensim-4ef03213cd21
https://towardsdatascience.com/text-classification-supervised-unsupervised-learning-approaches-9fd5e01a036
https://www.mdpi.com/2076-0760/11/1/23#
https://monkeylearn.com/topic-analysis/