This repository has been archived by the owner on Sep 11, 2023. It is now read-only.

ubco-W2022T2-data301/project-group-group27

Group 27 - Default Payments Investigation

  • Investigating credit card clients' delayed payments in Taiwan from April 2005 to September 2005.

Milestones

Details for each milestone are available on Canvas (left sidebar, Course Project).

  • Milestone 1: Form team and find dataset.
  • Milestone 2: Load the dataset, explain it, and define research questions.
  • Milestone 3: Exploratory Data Analysis (EDA).
  • Milestone 4: Visualizations, analysis, and pipeline.
  • Milestone 5: Present the Dashboard!
  • Milestone 6: Address feedback and refine project.

Describe your topic/interest in about 150-200 words

We hope that by participating in this project, we will learn to ask descriptive and exploratory questions and to explore data through visualizations, analysis, and a dashboard that presents our results. It is also an excellent opportunity to collaborate with different people using Git and GitHub. Since working alone on a real-life project is a big challenge, it is better to learn how to communicate efficiently and work within a team. The project also provides a great platform to practice the tools we studied in class, such as Python, Markdown, the terminal, JupyterLab, Git, and GitHub. As a learning opportunity, it is an excellent way for us to find out what is useful in data analysis and what we are missing or do not understand from the lectures.

This topic is interesting because the dataset includes 24 attributes, which give us a broad view of credit card default. We can analyze it from different perspectives, and the dataset can support each member's research questions. Moreover, credit is an essential part of our lives. Our team is interested in learning how the credit card system determines which consumers are trustworthy and how it decides whether to increase or decrease the available credit limit. Because credit is closely tied to loans, the system should consider multiple perspectives to reduce risk, which touches on broader topics of risk management for financial institutions, such as filtering out unreliable clients.

Finally, we can readily imagine building a user-facing dashboard with this dataset, since it contains many aspects that can be explored. We would like it to have a clear layout and contain the data that needs to be presented, so that the dashboard is user-friendly, interactive, and informative.

Describe your dataset in about 150-200 words

The dataset is about defaults of credit card clients. It was provided by I-Cheng Yeh, affiliated with two institutions: the Department of Information Management, Chung Hua University, Taiwan, and the Department of Civil Engineering, Tamkang University, Taiwan. The collection method is not explicitly described; however, the documentation states that the data track monthly payment records from April to September 2005. The original study examined customers' default payments in Taiwan and compared the predictive accuracy of the probability of default across six data mining techniques. From a risk management perspective, the predictive accuracy of the estimated probability of default is more valuable than a binary classification of clients as credible or not credible. Since the actual probability of default is unknown, the study presented a novel Sorting Smoothing Method to estimate it. With the actual probability of default as the response variable (Y) and the predicted probability of default as the independent variable (X), a simple linear regression (Y = A + BX) shows that the forecasting model produced by the artificial neural network has the highest coefficient of determination. Its regression intercept (A) is close to zero and its regression coefficient (B) is close to one, which indicates that its predicted probabilities closely match the actual ones. Thus, the study concludes that, among the six data mining techniques, the artificial neural network is the only one that can accurately estimate the actual probability of default.
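The regression check described above can be illustrated with a small sketch. It is not the study's code: it assumes the Sorting Smoothing Method amounts to sorting clients by predicted probability and averaging the binary default outcome over a window of 2n + 1 neighbours, and it uses synthetic, well-calibrated predictions rather than the real dataset. The function names `sorting_smoothing` and `fit_line`, the window size `n=50`, and the sample size are all illustrative choices.

```python
import numpy as np


def sorting_smoothing(predicted, defaulted, n=50):
    """Simplified smoothing: sort by predicted probability, then take a
    moving average of the binary outcomes over a window of 2n + 1 rows."""
    order = np.argsort(predicted)
    p_sorted = np.asarray(predicted, dtype=float)[order]
    y_sorted = np.asarray(defaulted, dtype=float)[order]
    window = 2 * n + 1
    kernel = np.ones(window) / window
    # "valid" keeps only positions where the full window fits.
    smoothed = np.convolve(y_sorted, kernel, mode="valid")
    # Drop n rows at each end so predictions align with the smoothed values.
    return p_sorted[n:len(p_sorted) - n], smoothed


def fit_line(x, y):
    """Least-squares fit of y = a + b * x, returning (a, b, R^2)."""
    b, a = np.polyfit(x, y, 1)
    residuals = y - (a + b * x)
    r2 = 1.0 - residuals.var() / y.var()
    return a, b, r2


# Synthetic stand-in for a model's output (an assumption, not real data):
# each client defaults with exactly the predicted probability.
rng = np.random.default_rng(0)
predicted = rng.uniform(0.0, 1.0, size=5000)
defaulted = rng.uniform(0.0, 1.0, size=5000) < predicted

x, y_est = sorting_smoothing(predicted, defaulted, n=50)
a, b, r2 = fit_line(x, y_est)
print(f"intercept A = {a:.3f}, slope B = {b:.3f}, R^2 = {r2:.3f}")
```

For a well-calibrated model like this synthetic one, the intercept should land near zero and the slope near one. A poorly calibrated model would show a slope or intercept that drifts away from those values even if its classification accuracy looked acceptable.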

Team Members

  • Person 1: My name is Betty. I am a second-year student majoring in mathematics, and I enjoy playing music and snowboarding.
  • Person 2: My name is Remy. I am a second-year psychology student, and my hobby is watching Netflix.
  • Person 3: My name is Shirley. I am a third-year student majoring in statistics, and I like to play tennis.

Images

Dashboard 1

DashboardScreenshot1

Dashboard 2

DashboardScreenshot2

Dashboard 3

DashboardScreenshot3

References

https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients