[Interactive Data Science] - Creating Nontechnical Jupyter Notebooks For Exploratory Data Analysis and Machine Learning Modeling of Cancer Data
Exploratory data analysis (EDA) is the analysis of datasets to understand their main characteristics, often through visualization. EDA techniques help to identify trends, problems in the data, and potential hypotheses. In Python, we can use libraries such as pandas and seaborn for this analysis. EDA is commonly done before any machine learning is applied, because it shows us what the data in question actually contain. Some common graphical techniques are box plots, histograms, and scatter plots.
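An autoencoder is a neural network that compresses data. Autoencoders can reduce the dimensions of data because they learn to ignore noise in the data, and they can detect anomalies in data. They consist of an encoder, which compresses the input, and a decoder, which reconstructs the input from the compressed representation.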
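To make the compression idea concrete, here is a minimal sketch of a linear autoencoder written in plain NumPy. It is an assumption-laden toy, not this project's model: the data are synthetic, there are no nonlinear activations, and the layer sizes, learning rate, and step count are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 200 samples with 10 features driven by 2 hidden factors plus noise.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 10))
X = X - X.mean(axis=0)  # center each feature

n_features, n_hidden = X.shape[1], 2
W_enc = 0.1 * rng.normal(size=(n_features, n_hidden))  # encoder weights (10 -> 2)
W_dec = 0.1 * rng.normal(size=(n_hidden, n_features))  # decoder weights (2 -> 10)


def reconstruction_error(X, W_enc, W_dec):
    """Mean squared difference between the input and its reconstruction."""
    return float(np.mean((X @ W_enc @ W_dec - X) ** 2))


initial_error = reconstruction_error(X, W_enc, W_dec)

# Plain gradient descent on the squared reconstruction error.
lr = 0.01
for _ in range(2000):
    Z = X @ W_enc                    # encode
    err = Z @ W_dec - X              # decode and compare with the input
    grad_dec = Z.T @ err / len(X)
    grad_enc = X.T @ (err @ W_dec.T) / len(X)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

final_error = reconstruction_error(X, W_enc, W_dec)

# Per-sample error: unusually large values flag candidate anomalies.
per_sample_error = np.mean((X @ W_enc @ W_dec - X) ** 2, axis=1)
print(f"error before training: {initial_error:.4f}, after: {final_error:.4f}")
```

The two-column matrix `X @ W_enc` is the reduced representation of the data. A practical autoencoder would typically add nonlinear layers and be built with a deep learning framework.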
Voila turns Jupyter notebooks into standalone web applications. Voila connects each user to a dedicated Jupyter kernel, which can execute callbacks in response to interactive widget changes. This is important because our notebooks rely on these widgets (sliders, check boxes, etc.) to be accessible to people without a coding background.
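A widget callback of this kind might look like the following sketch, which uses ipywidgets. The data, the `expression` column, and the `filter_rows` helper are hypothetical and only illustrate the pattern; they are not taken from `main.ipynb`.

```python
import ipywidgets as widgets
import numpy as np
import pandas as pd
from IPython.display import display

# Synthetic stand-in data; the column name is hypothetical.
rng = np.random.default_rng(0)
df = pd.DataFrame({"expression": rng.normal(5.0, 1.0, 100)})


def filter_rows(threshold):
    """Return the rows whose expression value is at least `threshold`."""
    return df[df["expression"] >= threshold]


slider = widgets.FloatSlider(value=5.0, min=2.0, max=8.0, step=0.1,
                             description="Min value")
output = widgets.Output()


def on_change(change):
    # Runs in the kernel each time the slider value changes.
    with output:
        output.clear_output(wait=True)
        kept = filter_rows(change["new"])
        print(f"{len(kept)} rows at or above {change['new']:.1f}")


slider.observe(on_change, names="value")
display(slider, output)
```

When a notebook containing this cell is rendered with Voila, the reader sees only the slider and the printed result, and moving the slider triggers `on_change` in the kernel without the reader touching any code.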
Here is a diagram explaining the execution model of Voila (credit):
Nontechnical scientists and biologists often want to analyze large datasets and benefit from the power of data science. Jupyter Notebooks are a step toward helping nontechnical people use these workflows, but they still present problems for those who lack the background or confidence to use them. For example, notebooks depend on running code cells that someone unfamiliar with the code may not understand or be able to modify. Through our solution, we can
We believe that this problem is important because facilitating collaboration and understanding between biologists and data scientists can lead to new breakthroughs and enrich the research of those biologists. Data science and machine learning have made a mark on fields such as genomics and healthcare, and as the data that we deal with in these fields gets larger
- Clone this repository on JupyterHub.
- Create a Conda environment using `environment.yml`.
- Open `main.ipynb` and render it with Voila.
- Packages defined in `environment.yml`
- Voila to render Jupyter Notebooks as interactive apps.
Want to contribute? Great!
- Write MORE Tests
- Add Night Mode
Here are all of the awesome people who contributed to this project:
- George Zaki
- Robin Kramer
- Garrett Stevens
- Yogesh Dhande
- Amulya Shastry
- Siqi Sun
- Ruben Cuevas
MIT