This semester's challenge is especially open-ended. Here is a dataset on Kaggle called "CarsForSale". It contains data scraped from the online car marketplace Cars.com. Each row contains 25 pieces of information about a car's listing, such as its price, year, model, and color.
The challenge is to do something interesting with the data. Can you find a pattern, answer a question, or create a visualization? In case nothing comes to mind, here are some ideas of varying complexity:
- What qualities of a car do buyers seem to value the most?
- Make a graph to visualize the most popular car models over time.
- What colors of cars are most expensive?
- Do different brands try to appeal to people looking for different things?
- Come up with your own algorithm to figure out how good of a deal a listing is and compare it to the one in the dataset (`DealType`).
- Use cluster analysis to group the cars into categories.
- How do people's tastes in cars differ between states?
- Train a machine learning model to predict some aspect of a car based on other information from its listing.
However, we strongly encourage you to come up with your own problem to solve!
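As a rough illustration of one idea from the list above, here is a minimal sketch of a homemade deal score that is then compared against the dataset's `DealType` label. The listings, the field names `model` and `price`, and the label values "Great", "Good", and "Fair" are all assumptions made up for this example; check the real dataset before reusing any of them. The scoring rule (price divided by the median price of the same model) is just one simple choice, not a recommended method.

```python
from collections import defaultdict
from statistics import median

# Hypothetical listings. The field names and the DealType label values
# are assumptions for illustration, not taken from the real dataset.
listings = [
    {"model": "Civic", "price": 18000, "DealType": "Great"},
    {"model": "Civic", "price": 20000, "DealType": "Good"},
    {"model": "Civic", "price": 25000, "DealType": "Fair"},
    {"model": "F-150", "price": 30000, "DealType": "Great"},
    {"model": "F-150", "price": 40000, "DealType": "Good"},
    {"model": "F-150", "price": 41000, "DealType": "Fair"},
]


def deal_scores(rows):
    """Return price / median price of the same model for each row.

    A score below 1.0 means the listing is cheaper than the typical
    listing for that model.
    """
    prices_by_model = defaultdict(list)
    for row in rows:
        prices_by_model[row["model"]].append(row["price"])
    medians = {model: median(prices) for model, prices in prices_by_model.items()}
    return [row["price"] / medians[row["model"]] for row in rows]


scores = deal_scores(listings)

# Average homemade score for each existing DealType label.
scores_by_label = defaultdict(list)
for row, score in zip(listings, scores):
    scores_by_label[row["DealType"]].append(score)
average_by_label = {
    label: sum(values) / len(values) for label, values in scores_by_label.items()
}
print(average_by_label)
```

If the homemade score agrees with the existing label, the average score should rise as the label gets worse. A real analysis would probably also account for year, mileage, and condition if the dataset provides them.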
You can use any programming language, framework, or library you want, but we recommend creating a notebook in Kaggle and using Python. A Kaggle notebook runs in your browser, interlaces code with documentation, lets you import the CarsForSale dataset easily by pressing the "Add data" button, and gives you access to Python's high-quality, high-level libraries for working with data. Learn more about data science in Python.
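To give a feel for what working with such libraries looks like, here is a small sketch using pandas to rank car colors by median price. It reads a tiny in-memory CSV rather than the real file, and the column names `Color` and `Price` are assumptions; the actual dataset may name them differently.

```python
import io

import pandas as pd

# Stand-in for the real CSV. The column names "Color" and "Price" are
# assumptions; check the actual dataset's columns before reusing this.
sample_csv = io.StringIO(
    "Color,Price\n"
    "Red,21000\n"
    "Red,25000\n"
    "Black,30000\n"
    "Black,34000\n"
    "White,18000\n"
)

# In a Kaggle notebook, pass the path of the added dataset file instead.
cars = pd.read_csv(sample_csv)

# Median price per color, most expensive first.
median_price_by_color = (
    cars.groupby("Color")["Price"].median().sort_values(ascending=False)
)
print(median_price_by_color)
```

The median is used here instead of the mean so that a few very expensive listings do not dominate a color's ranking.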
- Create a public fork of this repository and name it `ACM-Research-coding-challenge-22F` (click the "Fork" button in the top right).
- Replace this README file with a description (written in Markdown) of your solution. Regardless of your success, describe the problem you set out to solve and how you did it. Split it up into sections with headers, and, if relevant, include figures.
- Make sure to include all relevant files in your fork. If you made the project in a Kaggle notebook, click File → Download Notebook to download it as an `.ipynb` file.
- You may have to "clone" the fork you made to edit files locally on your computer and "push" them to GitHub. Learn how to do that here.
- Submit the link to your fork in this form.
You may not collaborate with anyone on this challenge. You are allowed (and encouraged) to use internet documentation. If you use existing code (from GitHub, Stack Overflow, or other sources), please cite your sources in the README.
Please don't spend too long on this project: 30 to 60 minutes is reasonable. It's okay to put more time into your submission than that, but we don't expect you to get that much done; we really don't want this challenge to be a burden!
If you're completely new to this kind of project, however, it will likely take you more than an hour. This is a very useful project to work through (you will learn a lot), so we believe the extra time is justified.
Submissions will be evaluated holistically, in combination with the rest of your application. We will consider your effort, use of external resources, how you approached the problem, and presentation, among other considerations.
Feel free to ask for clarifications in the #research-qna channel in the ACM UTD Discord server! You can also directly message Roman Hauksson on Discord: RomanHauksson#3458.