nghochi123/sc1015_project

Introducing our SC1015 Mini-Project

About


This is our mini-project for SC1015 (Intro to Data Science and Artificial Intelligence).

Contributors (SC7 Group 10)

@nghochi123
@Mel-NLY

Understanding our project


Here's the sequence of files that we'd recommend for you to look through. There are more insights and explanations in the Jupyter Notebooks:

  1. Data analysis
  2. What determines a successful terrorist attack?
  3. What do terrorists really want?

Dataset


The dataset is the Global Terrorism Database (GTD), obtained from the National Consortium for the Study of Terrorism and Responses to Terrorism (START).

It is maintained by researchers headquartered at the University of Maryland and contains information on more than 200,000 global terrorist attacks.

Problem Definition + Motivation


Using this dataset, we are trying to find the answer to the following categorical questions:

  • What determines a successful terrorist attack?
  • What are terrorists really after?

We decided that determining what makes or breaks a terrorist attack was important, because it lets us focus on the combination of features that matters most. Solutions that target those features could help prevent attacks from succeeding and hurting many people in the process.

Following the same methodology, identifying the motives that are the most harmful and the most common could also help reduce the number of successful terror attacks.

Models Used


  • Random Forest Classifier
  • Logistic Regression
  • K-Nearest Neighbours Classifier
  • Support Vector Classification (SVC)
  • Neural Networks
  • Stochastic Gradient Descent Classifier
  • Kernel SVM
  • Decision Tree Classifier
  • AdaBoost
  • Gradient Boosting

What We Discovered


  • What determines a successful terrorist attack?
    This was the best combination of features we found for predicting whether a terrorist attack succeeds:

    • Number of Kills
    • Timestamp
    • AttackType
    • Number of Wounded
    • WeaponType
    • Month
  • What are terrorists really after?
    Retaliation was found to be the most common motive among the terrorists.

Tools Used


Folium - Interactive Leaflet Map
Keplergl - Geospatial Analytic Visualizations
Plotly - Interactive Web-based Visualizations
Pickle - Serializing and deserializing object structures
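
As a rough illustration of how Pickle fits in, the sketch below saves an object to disk and loads it back, which is the usual way to reuse a trained model without retraining it. The `model` dictionary and the `model.pkl` file name are hypothetical stand-ins, not the actual objects or paths used in this project.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical stand-in for a trained model; any picklable object
# (for example a fitted estimator) is saved and loaded the same way.
model = {
    "name": "random_forest",
    "n_estimators": 100,
    "features": ["AttackType", "WeaponType"],
}

# Hypothetical location in a temporary directory, used only for this sketch.
model_path = Path(tempfile.mkdtemp()) / "model.pkl"

# Serialize the object to disk (pickle files must be opened in binary mode).
with model_path.open("wb") as f:
    pickle.dump(model, f)

# Deserialize it later instead of retraining.
with model_path.open("rb") as f:
    restored = pickle.load(f)
```

Only load pickle files from sources you trust, since unpickling can execute arbitrary code.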

Lessons Learnt


Here are some of the lessons we learnt while developing this project. More of our insights and realisations can be found in the Jupyter Notebooks and the presentation.

  • We initially cleaned the full dataset, which resulted in us dropping too many data points. Instead, we now pick out the relevant columns first and then clean those values, which gives us a fuller dataset.
  • The dataset is imbalanced between failed and successful attacks, so we had to use weights to rebalance it.
  • Library versions may have affected the results obtained on different operating systems (Mac/Windows). To rule this out, we checked requirements.txt, tried moving to a Windows device, and made use of Google Colab.
  • Initially, we had all of our code in a single file, which left our neural network without enough memory to run (and the kernel failing). Hence we split the code into several Jupyter Notebook files.
  • We also moved the neural network model into its own notebook.
  • When comparing models, the score alone is not always the best heuristic to use.
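
The lessons on class imbalance and on scores can be illustrated with a small sketch. It assumes made-up labels with a 90/10 split (not the real dataset) and computes per-class weights as `n_samples / (n_classes * class_count)`, the same heuristic scikit-learn applies for `class_weight="balanced"`. It then shows why a raw accuracy score can mislead. The helper names here are hypothetical and are not taken from our notebooks.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive):
    """Fraction of true `positive` samples that were predicted as `positive`."""
    relevant = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in relevant) / len(relevant)


# Made-up labels for illustration: 1 = successful attack, 0 = failed attack.
y_true = [1] * 90 + [0] * 10

# The rarer class receives the larger weight.
weights = balanced_class_weights(y_true)

# A baseline that always predicts the majority class.
y_pred = [1] * len(y_true)
baseline_accuracy = accuracy(y_true, y_pred)
failed_recall = recall(y_true, y_pred, positive=0)

print(weights)
print(baseline_accuracy, failed_recall)
```

In this toy setup the always-majority baseline is right 90% of the time yet never identifies a failed attack, which is why looking at per-class metrics alongside the overall score is useful on imbalanced data.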

Also Do Check Out the Other Project Folders


  • cool_resources
    • Interactive Maps,
    • GTD CodeBook, and
    • Slide Deck
  • saved
    • Saved Trained Models, and
    • Images obtained from analysis

References


https://www.start.umd.edu/gtd/analysis/
https://ourworldindata.org/terrorism
https://realpython.com/python-statistics/
https://machinelearningmastery.com/metrics-evaluate-machine-learning-algorithms-python/
https://medcraveonline.com/FRCIJ/motivation-leading-to-radicalization-in-terrorists.html
https://en.wikipedia.org/wiki/Tf%E2%80%93idf
https://towardsdatascience.com/topic-modelling-in-python-with-nltk-and-gensim-4ef03213cd21
https://towardsdatascience.com/text-classification-supervised-unsupervised-learning-approaches-9fd5e01a036
https://www.mdpi.com/2076-0760/11/1/23#
https://monkeylearn.com/topic-analysis/
