Applied Data Analytics Program at the National Center for Science and Engineering Statistics (NCSES), National Science Foundation (NSF)
The National Center for Science and Engineering Statistics (NCSES) hosted the Fall 2021 Coleridge Initiative Applied Data Analytics training program.
Participants work in teams to define and complete a project related to career pathways for doctoral recipients. The program provides up-to-date perspectives on the use of administrative and survey data for policy analysis, as well as instruction on managing and analyzing microdata according to best practices. Instructors facilitate hands-on coding of microdata in SQL and R for data management, text analysis, data visualization, and machine learning.
This repository contains the class materials for the NCSES applied data analytics program.
Datasets Used in the Class:
- Survey of Earned Doctorates (SED), Survey of Doctorate Recipients (SDR), Higher Education Research and Development Survey (HERD) (provided by NCSES)
- UMETRICS (provided by the Institute for Research on Innovation and Science)
- Integrated Postsecondary Education Data System (https://nces.ed.gov/ipeds/, publicly available)
- Federal RePORTER grant data (https://federalreporter.nih.gov, publicly available)
Class Program
Day 1 - Overview, Project Scoping, and Privacy and Confidentiality
Day 2 - Dataset Introduction: SED and SDR
Day 3 - Dataset Introduction: UMETRICS
Day 4 - Record Linkage
Day 5 - Basics and Applications of Data Visualization
Day 6 - Text Analysis
Day 7 - Interim Presentations
Day 8 - Measurement
Day 9 - Machine Learning and Evaluation
Day 10 - Supervised Machine Learning and Evaluation
Day 11 - Inference
Day 12 - Privacy, Confidentiality, and Ethics
References
The notebooks in this repository were inspired by previous applied data analytics class materials and notebooks.