Project Assignments
The point of the Project Assignments is to try out the skills you've learned in the course on your own dataset. We've been working on understanding networks and natural language processing, so the idea is to find a dataset to analyze that will let you show off what you can do.
You should find a dataset from a Wiki source, but remember to work with something that interests you. That way the project will be much more fun to work on, and you'll be able to use the methods from the class to learn new things about a topic you care about.
Of course you can use your own dataset to re-do the analyses you've seen in class, but we also hope that you'll use your knowledge of the data to come up with new analyses, where you combine the methods you've learned in new ways, or use methods from other classes to understand networks and text.
Below are some examples of Wiki datasets that might be fun to work with:
- Specialized wikis for whatever you're into, e.g.
  - Wookieepedia
  - Game of Thrones Wiki
  - Simpsons Wiki or one of the other Simpsons wikis
  - Bruce base
- And of course any subset of Wikipedia
- Etc., etc.
You will be working together in groups just as for the first two assignments.
Note 1: You are allowed to use secondary data sources to enrich your Wiki data. However, your primary data source should be a Wiki.
Note 2: If the network you want to analyse is big (on the order of thousands of nodes), you are more likely to find interesting patterns in your data. If it is small (on the order of hundreds of nodes), we expect you to compensate for the possible lack of complex patterns/structure with other types of analysis, e.g., more in-depth text analysis, deeper characterisation of nodes and connections, etc.
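If your primary source is Wikipedia, one possible way to download the raw wikitext of a page is through the MediaWiki Action API. The sketch below is only an illustration under stated assumptions: it targets the English Wikipedia endpoint, the helper names `build_query_url` and `fetch_wikitext` are made up for this example, and other wikis expose their API at different addresses that you would need to look up yourself.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: English Wikipedia. Other wikis use a different endpoint.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_query_url(title, api_url=API_URL):
    """Build a query URL asking for the current wikitext of one page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "rvslots": "main",
        "titles": title,
        "format": "json",
    }
    return f"{api_url}?{urlencode(params)}"


def fetch_wikitext(title, api_url=API_URL):
    """Download the wikitext of one page (hypothetical helper, needs network access).

    Raises KeyError if the page does not exist, because the response
    then contains no "revisions" entry.
    """
    request = Request(
        build_query_url(title, api_url),
        headers={"User-Agent": "course-project-sketch/0.1"},  # placeholder value
    )
    with urlopen(request) as response:
        data = json.load(response)
    # With format=json, "pages" is a dict keyed by page id.
    page = next(iter(data["query"]["pages"].values()))
    return page["revisions"][0]["slots"]["main"]["*"]


url = build_query_url("Network science")
print(url)
```

Calling `fetch_wikitext("Network science")` would then return the page source as a string. When downloading many pages, pause between requests and save the results to disk so you only download each page once.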
The first part of the final project is a 2-minute movie, which should explain the central idea/concept that you will investigate in your final project. You're making the movie so that the TAs, Anna, and I can give you feedback, and so that other groups can 'steal' your ideas (and you can steal ideas from them). The movie must contain the following:
- An explanation of the central idea behind your final project (What is the idea? Why is it interesting? Which datasets did you need to explore the idea? How did you download them?)
- An outline of the elements you'll need to reach your goal, and the implementation plan.
- A walk-through of your preliminary data analysis, addressing
  - What is the total size of your data? (MB, number of pages, other variables, etc.)
  - What is the network you will be analyzing? (Number of nodes, number of links, degree distributions, node attributes, etc.)
Other than that, there are no constraints on the video format. We do appreciate funny/inventive/beautiful movies, although the academic content is most important. Note that we'll show the movie to the entire class.
I've put some example videos here for your viewing pleasure.
Handing in the assignment: Simply upload your video to YouTube (the higher the resolution the better) and submit the link to Peergrade.
Note that since Project Assignment A now requires significant data-work, you have 2 weeks to create the video presentation.
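For the preliminary data-analysis walk-through requested in the video (number of nodes, number of links, degree distributions), a minimal sketch of the bookkeeping could look like the following. It uses only the Python standard library so it runs anywhere. The function name `network_summary` and the toy edge list are hypothetical, and in practice you would more likely use a dedicated graph library on your real link data.

```python
from collections import Counter


def network_summary(edges):
    """Basic statistics for a directed network given (source, target) pairs."""
    # Drop duplicate links and self-loops before counting.
    unique_edges = {(s, t) for s, t in edges if s != t}
    nodes = {node for edge in unique_edges for node in edge}

    in_degree = Counter(target for _, target in unique_edges)
    out_degree = Counter(source for source, _ in unique_edges)

    # Degree distribution: how many nodes have each degree value.
    # A Counter returns 0 for nodes that never appear as target/source.
    in_dist = Counter(in_degree[n] for n in nodes)
    out_dist = Counter(out_degree[n] for n in nodes)

    return {
        "nodes": len(nodes),
        "links": len(unique_edges),
        "in_degree_distribution": dict(sorted(in_dist.items())),
        "out_degree_distribution": dict(sorted(out_dist.items())),
    }


# Made-up toy data: each pair means "page X links to page Y".
toy_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A"), ("A", "B")]
summary = network_summary(toy_edges)
print(summary)
```

The same numbers are a natural starting point for the dataset description in the video: report them, then comment on what they suggest about the structure of your network.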
The deliverables for the Final project will be:
- A website. The website should contain your analysis and tell the story about the data that you're interested in getting across. It should not be technical, but should rather use visualization and explanation to get your insights across to a non-scientific reader.
- An explainer Notebook. The Notebook should contain all the behind-the-scenes stuff: details on the dataset, why you've selected this particular dataset, explanations of your choices regarding network analysis, etc. You should link to the notebook from the site.
The idea is that you can create much more complex, fun, and interactive analyses (and visualizations) online. So the website is a way for you to present your work in a way that everyone can understand, including dynamic visualizations, interactive analysis, etc., that would not work on a piece of paper. (Also, it'll hopefully be something cool you can show your friends <-- sorry, I know I'm a nerd).
This part of the assignment is quite free. The main point of the website is to present your idea/analyses to the world in a way that showcases your use of what you've learned in class. It can be as simple as an old-fashioned static web page, or as complicated as you want it to be. Let your creativity run wild (but keep in mind that this is not a coding class - we care mostly about content and analysis).
The website should be self-contained and tell the story of your dataset without the need for the Explainer Notebook (the purpose of the notebook is to provide additional details for interested readers). Here are some requirements:
- The page should say clearly what the dataset is and give the reader some idea of its most important properties (kind of Project Assignment A-style).
- The page should contain your network and text analysis (that's the main part).
- There should be download options for data sets (so the user can play around).
- You must link to the Explainer Notebook (more details below) that explains the details of your analysis (including all of the machine learning, the model selection, etc). You can achieve this with a link to an IPython notebook displayed on nbviewer.
For hosting, we recommend using your DTU website or GitHub Pages. Here, I have put together a starting tutorial to set up your website with GitHub and Hugo!
The notebook should contain your analysis and code. Please structure it into the following 5 sections:
- Motivation
  - What is your dataset?
  - Why did you choose this/these particular dataset(s)?
  - What was your goal for the end user's experience?
- Basic stats. Let's understand the dataset better
  - Write about your choices in data cleaning and preprocessing
  - Write a short section that discusses the dataset stats (here you can recycle the work you did for Project Assignment A)
- Tools, theory and analysis. Describe the process of theory to insight
  - Talk about how you've worked with text, including regular expressions, unicode, etc.
  - Describe which network science tools and data analysis strategies you've used, how those network science measures work, and why the tools you've chosen are right for the problem you're solving.
  - How did you use the tools to understand your dataset?
- Discussion. Think critically about your creation
  - What went well?
  - What is still missing? What could be improved? Why?
- Contributions. Who did what?
  - You should write (just briefly) which group member was mainly responsible for which elements of the assignment. (I want you guys to understand every part of the assignment, but usually there is someone who took the lead role on certain portions of the work. That's what you should explain.)
  - It is not OK simply to write "All group members contributed equally".
Make sure that you use references when they're needed and follow academic standards.
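As one illustration of the text handling asked about in the Tools, theory and analysis section (regular expressions), here is a minimal sketch that pulls link targets out of raw wikitext so that pages can become nodes and links can become edges. The pattern, the function name `extract_links`, and the sample string are all made up for this example. The pattern deliberately ignores many real-world cases (templates, nested markup, redirects), and skipping every target that contains a colon is a crude way to drop namespace links such as files and categories.

```python
import re

# Matches [[Target]], [[Target|label]] and [[Target#Section|label]].
# Group 1 captures only the target page title.
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def extract_links(wikitext):
    """Return the unique link targets in wikitext, in order of first appearance."""
    targets = []
    for match in WIKILINK.finditer(wikitext):
        target = match.group(1).strip()
        if ":" in target:
            # Simplification: treat anything with a colon as a namespace link
            # (File:, Category:, ...) and skip it.
            continue
        if target not in targets:
            targets.append(target)
    return targets


# Made-up sample text, not taken from any real wiki page.
sample = (
    "See [[Page B|a label]] and [[Page C]], then [[Page B#History|again]]. "
    "[[File:pic.png|thumb]]"
)
links = extract_links(sample)
print(links)
```

In the notebook, a step like this would be followed by a short discussion of which links you chose to keep or drop, since that choice defines the network you end up analyzing.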
I envision Part 3: Tools, theory and analysis as the central part of the assignment, where you basically go through the steps in the analysis. So the structure of this part would be something like:
- Explain the overall idea
- Analysis step 1
  - explain what you're interested in
  - explain the tool
  - apply the tool
  - discuss the outcome
- Analysis step 2
  - explain what you're interested in
  - explain the tool
  - apply the tool
  - discuss the outcome
- Analysis step 3
- ... and so on until the analysis is done
This class has been hand crafted for you by Sune Lehmann and Anna Sapienza
This work is licensed under a Creative Commons Attribution 4.0 International License.