[WIP] Streamlining #138

Merged
merged 45 commits into from
Oct 16, 2022
45 commits
36f4f9e
removed all "details" tags
Sep 30, 2022
01a921b
rips out fancier features which base jupyter lacks
Sep 30, 2022
051c636
changed spellings
Sep 30, 2022
c2a4253
Delete settings.json
Sep 30, 2022
fff64ed
Delete checklist.md
Sep 30, 2022
69fd1b4
grammar for m1.1
Sep 30, 2022
9be9c0b
pre grammar script
Oct 3, 2022
c72bf86
applied some grammar scripts
Oct 3, 2022
3f543d3
tweaked commas
Oct 3, 2022
7dd8c9a
Minor tweaks for - stopping to move m2
Oct 3, 2022
f2a23f9
Fixed naming for m2
Oct 3, 2022
c39dd66
increase accessibility of landing page
Oct 4, 2022
137ef94
move setting up environment to dedicated page
Oct 4, 2022
b6e3a23
add WIP glossary and contact pages
Oct 4, 2022
83c2fb9
add appendix to toc
Oct 4, 2022
4a9e652
rename module 4 notebooks for consistency
Oct 4, 2022
e4828f9
fix links and typo
Oct 4, 2022
d3de947
edit overviews for consistent styling
Oct 4, 2022
6d12e3d
increase accessibility of landing page
Oct 4, 2022
bb1805b
move setting up environment to dedicated page
Oct 4, 2022
863378b
add WIP glossary and contact pages
Oct 4, 2022
95752ec
add appendix to toc
Oct 4, 2022
c503cfa
rename module 4 notebooks for consistency
Oct 4, 2022
1e63cd6
fix links and typo
Oct 4, 2022
023dd9c
edit overviews for consistent styling
Oct 4, 2022
bc7d97e
typo in toc
Oct 4, 2022
6038441
typo in index
Oct 4, 2022
ada4a4b
change 'Exploring and Wrangling' to 'Data Wrangling'
Oct 4, 2022
dbab6f9
change title of 2.2 to Data Wrangling
Oct 5, 2022
38b85d7
typo
Oct 5, 2022
0cde0fd
large rewrite of 1.1
Oct 5, 2022
1e0ad32
Merge branch 'streamlining' into streamlining-cm
Oct 5, 2022
9f3d20d
Merge pull request #140 from alan-turing-institute/streamlining-cm
Oct 5, 2022
0aad8fc
add definition of RDS @AoifeHughes
Oct 5, 2022
7f5f166
small 4.1. tweaks
Oct 5, 2022
617001d
Shifted disclaimer to be more general
Oct 7, 2022
575e4c6
grammar!!!
Oct 7, 2022
044550f
minor grammar
Oct 7, 2022
d4d92a5
Added some placeholders
Oct 7, 2022
d5dde5c
73 warnings!
Oct 14, 2022
f2254e5
tweaked to not run on hands-on
Oct 16, 2022
f0f4bb0
70 warnings...
Oct 16, 2022
7c37bc3
Remove appendix + refs
Oct 16, 2022
fb7984c
added data
Oct 16, 2022
e3c7b0f
resolve merge
Oct 16, 2022
2 changes: 1 addition & 1 deletion .gitignore
@@ -5,4 +5,4 @@ data/*
node_modules/
package-lock.json
package.json
UKDA-7724-csv
*.sh
8 changes: 8 additions & 0 deletions .vscode/settings.json
@@ -0,0 +1,8 @@
{
"editor.fontSize": 20,
"editor.rulers": {},
"editor.renderLineHighlight": "line",
"workbench.colorCustomizations": {
"editor.lineHighlightBorder": "#fff"
}
}
6 changes: 3 additions & 3 deletions coursebook/_config.yml
@@ -11,9 +11,9 @@ execute:
execute_notebooks: force
exclude_patterns:
- '3.5-*'
- '4.3_*'
- '4.4_*'
- '2-hands-on*'
- '4.3-*'
- '4.4-*'
- '*hands-on*'

exclude_patterns: [
'*README.md'
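
A rough way to sanity-check the renamed `exclude_patterns` entries is sketched below. It assumes shell-style glob matching against the notebook file name, approximated here with Python's `fnmatch`; the matcher Jupyter Book actually uses may differ, and `is_excluded` is a hypothetical helper, not part of any library. The notebook names are taken from `_toc.yml` in this PR.

```python
from fnmatch import fnmatch

# Execution exclude patterns before and after this PR (from _config.yml).
OLD_PATTERNS = ["3.5-*", "4.3_*", "4.4_*", "2-hands-on*"]
NEW_PATTERNS = ["3.5-*", "4.3-*", "4.4-*", "*hands-on*"]


def is_excluded(name, patterns):
    """Hypothetical helper: True if `name` matches any glob pattern.

    Assumption: patterns are compared against the bare file name.
    """
    return any(fnmatch(name, pattern) for pattern in patterns)


# Notebook names after the renames in _toc.yml (extensions omitted).
renamed = ["4.1-WhatAndWhy", "4.3-ModelBuilding", "hands-on", "hands-on-complete"]

for name in renamed:
    print(name, is_excluded(name, OLD_PATTERNS), is_excluded(name, NEW_PATTERNS))
```

Under these assumptions, the old patterns no longer match the renamed files, while the broader `*hands-on*` pattern covers the hands-on notebooks regardless of module prefix.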
57 changes: 31 additions & 26 deletions coursebook/_toc.yml
@@ -25,33 +25,33 @@ parts:
- file: modules/m1/hands-on
- caption: "Module 2: Handling data"
chapters:
- file: modules/m2/2-overview
- file: modules/m2/overview
# getting data
- file: modules/m2/2-01-GettingLoading
- file: modules/m2/2.1-GettingLoading
sections:
- file: modules/m2/2-01-01-WhereToFindData
- file: modules/m2/2-01-02-LegalityAndEthics
- file: modules/m2/2-01-03-PandasIntro
- file: modules/m2/2-01-04-DataSourcesAndFormats
- file: modules/m2/2-01-05-ControllingAccess
- file: modules/m2/2.1.1-WhereToFindData
- file: modules/m2/2.1.2-LegalityAndEthics
- file: modules/m2/2.1.3-PandasIntro
- file: modules/m2/2.1.4-DataSourcesAndFormats
- file: modules/m2/2.1.5-ControllingAccess
# cleaning and wrangling
- file: modules/m2/2-02-ExploringWrangling
- file: modules/m2/2.2-DataWrangling
sections:
- file: modules/m2/2-02-01-DataConsistency
- file: modules/m2/2-02-02-ModifyingColumnsAndIndices
- file: modules/m2/2-02-03-FeatureEngineering
- file: modules/m2/2-02-04-DataManipulation
- file: modules/m2/2.2.1-DataConsistency
- file: modules/m2/2.2.2-ModifyingColumnsAndIndices
- file: modules/m2/2.2.3-FeatureEngineering
- file: modules/m2/2.2.4-DataManipulation
sections:
- file: modules/m2/2-02-04-01-TimeAndDateData
- file: modules/m2/2-02-04-02-TextData
- file: modules/m2/2-02-04-03-CategoricalData
- file: modules/m2/2-02-04-04-ImageData
- file: modules/m2/2-02-05-PrivacyAndAnonymisation
- file: modules/m2/2-02-06-LinkingDatasets
- file: modules/m2/2-02-07-MissingData
- file: modules/m2/2.2.4.1-TimeAndDateData
- file: modules/m2/2.2.4.2-TextData
- file: modules/m2/2.2.4.3-CategoricalData
- file: modules/m2/2.2.4.4-ImageData
- file: modules/m2/2.2.5-PrivacyAndAnonymisation
- file: modules/m2/2.2.6-LinkingDatasets
- file: modules/m2/2.2.7-MissingData
# hands on
- file: modules/m2/2-hands-on
- file: modules/m2/2-hands-on-complete
- file: modules/m2/hands-on
- file: modules/m2/hands-on-complete
- caption: "Module 3: Data Visualisation & Exploration"
chapters:
- file: modules/m3/overview
@@ -64,9 +64,14 @@ parts:
- caption: "Module 4: Introduction to Modelling"
chapters:
- file: modules/m4/overview
- file: modules/m4/4.1_What_and_Why
- file: modules/m4/4.2_Fitting_Models
- file: modules/m4/4.3_Building_simple_model
- file: modules/m4/4.4_Evaluating_a_model
- file: modules/m4/hands-on
- file: modules/m4/4.1-WhatAndWhy
- file: modules/m4/4.2-ModelFitting
- file: modules/m4/4.3-ModelBuilding
- file: modules/m4/4.4-ModelEvaluation
- file: modules/m4/hands-on
- caption: "Appendix"
chapters:
- file: modules/appendix/A.1-Glossary
- file: modules/appendix/A.2-SettingUp
- file: modules/appendix/A.3-ContactUs
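
The renaming scheme applied to the numbered Module 2 files above (zero-padded, hyphen-separated section numbers replaced by dotted ones) can be sketched as follows. `new_style_name` is a hypothetical helper written for illustration; nothing in this PR indicates the renames were scripted, and names without numeric section parts, such as `2-overview`, are left unchanged by it.

```python
import re

# Matches "<module>-<nn>[-<nn>...]-<Title>", e.g. "2-02-04-01-TimeAndDateData".
OLD_NAME = re.compile(r"(\d+)((?:-\d+)+)-(.+)")


def new_style_name(old):
    """Hypothetical helper: convert '2-01-01-Title' to '2.1.1-Title'."""
    match = OLD_NAME.fullmatch(old)
    if match is None:
        # No numeric section parts (e.g. "2-overview"): leave untouched.
        return old
    module, sections, title = match.groups()
    # Drop zero padding from each section number: "01" becomes "1".
    numbers = [module] + [str(int(n)) for n in sections.strip("-").split("-")]
    return ".".join(numbers) + "-" + title


for old in ["2-01-GettingLoading", "2-01-01-WhereToFindData", "2-02-04-01-TimeAndDateData"]:
    print(old, "->", new_style_name(old))
```

Renames that also change the title, such as `2-02-ExploringWrangling` to `2.2-DataWrangling` or the Module 4 notebooks, fall outside this mechanical rule.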

77 changes: 53 additions & 24 deletions coursebook/index.md
@@ -1,34 +1,46 @@
# Welcome to The Alan Turing Institute's Introduction to Research Data Science course
# Welcome!

This Research Data Science online training course was
developed by [The Alan Turing Institute's](https://www.turing.ac.uk/)
Welcome to an **Introduction to Research Data Science**, developed by [The Alan Turing Institute's](https://www.turing.ac.uk/)
[Research Engineering Group](https://www.turing.ac.uk/research-engineering).

The course consists of four modules, each involving a half-day taught session and a half-day hands-on session. The material can be used for synchronous online attendance or asynchronous study.

## Summary
## Introduction

Data science methods and tools have become commonplace in research projects across academia, government and industry. Researchers increasingly need to collaborate with multi-disciplinary teams of data scientists, software engineers and other stakeholders.

This course is designed for researchers interested in understanding and using data science methods in their work. The course will help learners move beyond data science principles, to learn how to tackle real, complex and sometimes vaguely defined research data science projects. They will learn how to do this in a collaborative environment, with an emphasis on practical techniques and technologies and with an overarching awareness of ethics and diversity issues. This is an intensive, hands-on course, informed by REG’s experience with research data science projects and aiming to bring learners in touch with day-to-day research data science practices.
The goal of this course is to introduce how you can use data science principles to tackle real, complex, and sometimes vaguely defined research data science projects. The course is not a handbook of data science methods. Rather, the focus is on how to begin using these methods on collaborative research projects, with an emphasis on awareness of ethics and diversity issues.

The course consists of:
- Taught modules that will introduce learners to key concepts, methodology and ways of solving problems.
- Hands-on modules where learners will work in teams to tackle a real research data science problem, including scoping it, discussing it from an equality, diversity and inclusion (EDI) point of view and coding collaboratively to produce a data science solution.

This course complements the Turing’s Research Software Engineering with Python course (found [here](https://alan-turing-institute.github.io/rsd-engineeringcourse/)).
## Who?

## Key objectives and learning outcomes
The main objectives of the course are the following:
- Teach attendees how to use research data science (RDS) methods in an interdisciplinary research environment.
- Move beyond core principles and methodology, towards a hands-on, practical understanding, focused on collaboration, reproducibility and openness.
- Provide exposure to a real-world RDS project and demonstrate the decision-making process used to choose the right method and tools for each setting and in each project step.
- Embed data ethics, diversity and inclusion awareness into the learners’ approach to all stages of an RDS project, providing multiple examples.
**We are** a group of data scientists and software engineers who work on a wide range of research problems.

**You are** someone interested in learning about, or using, data science methods in research. To follow the course in full, some basic programming is needed; see [Prerequisites](#prerequisites) for more information.



## Course materials

This free and open course is primarily the Jupyter Book you're reading. You can work through the material by yourself. See the [Syllabus](#syllabus).


Some tips on **how to use this course**:

- You will get a lot out of simply reading the online course book. However, the
course is built from executable Jupyter notebooks that you can run yourself, and
we encourage learners to try the hands-on sections where we tackle a real
research data science problem. Visit [the readme](https://github.com/alan-turing-institute/rds-course/tree/develop/coursebook) page to set up your computer to
follow along.

- There are some benefits to reading the course chronologically. The same dataset is used throughout the modules, especially in the hands-on sessions. However, much of the material is self-contained and can be consumed independently.


- If you are a self-learner and have questions, comments, ideas or issues, please
use [RDS-Course Issues](https://github.com/alan-turing-institute/rds-course/issues).


- There is also a synchronous, taught version of the course, where each module is spread over a half-day taught session and a half-day hands-on session.

The learning outcomes are the following:
- Attendees will understand fundamental RDS methods (e.g., data wrangling, visualisation, exploration, modeling) and know when/how to apply them to their research in order to draw data-driven insights or create data-driven tools.
- Attendees will be familiar with the stages of a collaborative RDS project, from scoping and data exploration to visualisation and modeling and will become aware of the challenges of tackling real-world problems.
- Attendees will be able to recognise power imbalances, bias and diversity issues in their technical work and in their ways of working with others and challenge them.

## Syllabus

@@ -79,11 +91,28 @@ Hands-on session:


## Prerequisites
Participants are expected to:
- Be comfortable with basic Python, either through working on a project or through attending a training course. Indicatively, they should be comfortable with the concepts covered in the “Introduction to Python” module from the Turing’s Research Software Engineering with Python course. The Programming with Python Software Carpentry also covers some of these concepts. Familiarity with Matplotlib, NumPy and Pandas is beneficial but not required.
- Have some basic knowledge of Git (setting up repositories, commits) through using it in projects or by attending training, e.g. the Software Carpentry’s Version Control with Git (Sections 1 to 4 and 7 to 9).
- Have read the first two sections of the Turing Way’s Guide for Collaboration (“Getting Started in GitHub” and “Maintainers and Reviewers in GitHub”).

There is no code in Module 1. Students will get more out of Modules 2-4 if they:

- Are comfortable with basic Python, as presented in:
- The [Introduction to Python](https://alan-turing-institute.github.io/rse-course/html/module01_introduction_to_python/index.html) module from the Turing's Research Software Engineering with Python course.
- Software Carpentry's [Programming with Python](https://swcarpentry.github.io/python-novice-inflammation/).
- Have some basic knowledge of using Git for version control, for example from Software Carpentry’s [Version Control with Git](https://swcarpentry.github.io/git-novice/) (Sections 1 to 4 and 7 to 9).
- Have basic knowledge of using GitHub for collaboration. See the first two sections of the Turing Way’s Guide for Collaboration ([Getting Started in GitHub](https://the-turing-way.netlify.app/collaboration/github-novice.html) and [Maintainers and Reviewers in GitHub](https://the-turing-way.netlify.app/collaboration/maintain-review.html)).


This course complements the [Turing’s Research Software Engineering with Python](https://alan-turing-institute.github.io/rse-course/) course.


## Disclaimer
The work and materials here were developed by a group of \[research\] data
scientists and software engineers from diverse backgrounds. Many of the topics,
examples and work discussed here are biased towards our own experiences. As such,
our definitions and understandings of certain words, phrases, or methodologies used
may differ from others'. We do not claim to be a definitive authority, and
welcome open discussion and feedback.


## Acknowledgement
This work was supported by Wave 1 of The UKRI Strategic Priorities Fund under the EPSRC Grant EP/W006022/1 & The Alan Turing Institute.
