Merge pull request #2 from kameshsampath/main
(feat!): SiS deployment and Streamlit in Notebook
kameshsampath authored Nov 14, 2024
2 parents f2688f3 + 3c19939 commit 2767d8b
Showing 14 changed files with 792 additions and 104 deletions.
24 changes: 20 additions & 4 deletions docs/data_prepration.md
@@ -1,9 +1,21 @@
# Data Preparation

Before applying our machine learning model, we need to convert categorical variables into numerical format using One-Hot[^1] Encoding. In our penguin dataset, we'll use pandas `get_dummies()`[^2] to encode:

**Features (X):**
- [x] `island` - Categorical location of penguin
- [x] `sex` - Gender of penguin

**Target (y):**
- [x] `species` - Type of penguin (our prediction target)

📝 **One-Hot Encoding** converts categorical variables into binary (0 or 1) format. For example:
```python
# Original: island = ['Torgersen', 'Biscoe']
# After encoding:
# island_Torgersen = [1, 0]
# island_Biscoe = [0, 1]
```
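Here is a runnable sketch of that encoding using pandas `get_dummies` (the two-row frame and its values are illustrative stand-ins for the real dataset):

```python
import pandas as pd

# A tiny stand-in for the penguin features we want to encode
df = pd.DataFrame({
    "island": ["Torgersen", "Biscoe"],
    "sex": ["male", "female"],
})

# One-hot encode the categorical columns; each category becomes its own 0/1 column
encoded = pd.get_dummies(df, columns=["island", "sex"])
print(encoded.columns.tolist())
# → ['island_Biscoe', 'island_Torgersen', 'sex_female', 'sex_male']
```

Note that `get_dummies` orders the new columns alphabetically within each original column.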

## Encode Features and Target

@@ -156,3 +168,7 @@ with st.expander("Data Preparation"):
y
```

After successfully preprocessing our penguin dataset with appropriate encoding and feature selection, let's move forward to training our model and calculating species prediction probabilities. This step will prepare us for creating interactive visualizations in Streamlit.
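As a preview, here is a minimal sketch of that next step. It assumes scikit-learn's `RandomForestClassifier`, a common choice for tabular classification; the model the tutorial actually uses may differ, and the data below is a hypothetical stand-in for the encoded penguin features:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical, already-encoded features and a target
X = pd.DataFrame({
    "bill_length_mm": [39.1, 46.5, 49.3, 38.6],
    "island_Biscoe": [0, 0, 1, 0],
    "island_Torgersen": [1, 0, 0, 1],
})
y = pd.Series(["Adelie", "Chinstrap", "Gentoo", "Adelie"], name="species")

clf = RandomForestClassifier(random_state=42)
clf.fit(X, y)

# Per-class probabilities for the first row; columns follow clf.classes_
proba = clf.predict_proba(X.iloc[[0]])
print(clf.classes_, proba[0])
```

Each row of `predict_proba` sums to 1, which is exactly what we will display in Streamlit later.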

[^1]: <https://machinelearningmastery.com/why-one-hot-encode-data-in-machine-learning/>{:target=_blank}
[^2]: <https://pandas.pydata.org/docs/reference/api/pandas.get_dummies.html>{:target=_blank}
40 changes: 29 additions & 11 deletions docs/datset.md
@@ -1,22 +1,32 @@
# Exploring the Penguins Dataset with Streamlit

In this chapter, we'll explore the [Penguins dataset](https://github.com/dataprofessor/data/blob/master/penguins_cleaned.csv) using Streamlit's interactive features.

In this chapter, we'll cover:

- [x] Loading, preprocessing, and preparing the dataset for visualization
- [x] Using Streamlit Expander to display dataset information and summary statistics
- [x] Creating interactive scatter plots with Streamlit Scatter chart to identify patterns and relationships
- [x] Enhancing the visualization with user interactions and filters

By the end of this chapter, you'll have a solid understanding of how to use Streamlit for data exploration and be ready to move on to building machine learning models.

## Download the Dataset

Let us download the data locally,

```shell
mkdir -p "$TUTORIAL_HOME/data"
curl -sSL \
  -o "$TUTORIAL_HOME/data/penguins_cleaned.csv" \
  https://raw.githubusercontent.com/dataprofessor/data/refs/heads/master/penguins_cleaned.csv
```

## Displaying the Data

Edit and update the `streamlit_app.py` with the following code,

```py linenums="1" hl_lines="4 10-12"
import streamlit as st

# import pandas to read the our data file
@@ -47,7 +57,13 @@ git push origin master

In a few seconds you should notice your application on Streamlit Cloud refresh with the changes.

## Application Overview

As part of this machine learning application, we will be building a simple classification model to predict penguin species (y) using input variables (X). Using Streamlit's interactive widgets, we'll display these variables to make our application user-friendly and intuitive.

This classification model will help us categorize penguins into their respective species based on their physical characteristics. The input variables and target variable will be presented through Streamlit's interface, allowing users to easily interact with and understand the prediction process.
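In code, that feature/target split looks roughly like this (a sketch with a two-row stand-in for the real dataset):

```python
import pandas as pd

# Stand-in for the penguins DataFrame loaded earlier in the app
df = pd.DataFrame({
    "island": ["Torgersen", "Biscoe"],
    "bill_length_mm": [39.1, 46.5],
    "sex": ["male", "female"],
    "species": ["Adelie", "Gentoo"],
})

# X: everything except the target column; y: the species we want to predict
X = df.drop("species", axis=1)
y = df.species
print(X.columns.tolist(), list(y))
```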

## Adding Our First Widget

Let us add our first Streamlit widget, [expander](https://docs.streamlit.io/develop/api-reference/layout/st.expander){:target="_blank"}, to allow expanding and collapsing of the data frame.

@@ -70,9 +86,9 @@ with st.expander("Data"):
df
```

## Displaying the Variables

Let us create and display the input features (**X**) and target (**y**).

Edit and update the `streamlit_app.py` with the following code,

@@ -103,7 +119,7 @@ with st.expander("Data"):

## Data Visualization

Let us visualize the penguins data using a [scatter plot](https://docs.streamlit.io/develop/api-reference/charts/st.scatter_chart){:target=_blank}

Edit and update the `streamlit_app.py` with the following code,

@@ -138,4 +154,6 @@ with st.expander("Data Visualization"):
y="body_mass_g",
color="species",
)
```

Now that we have our variables and target displayed for reference, let's move to the next chapter where we'll explore Streamlit's interactive features.
52 changes: 37 additions & 15 deletions docs/deploy_to_streamlit_cloud.md
@@ -1,22 +1,37 @@
# Deploying Your Streamlit App to Streamlit Cloud

Congratulations on setting up your project using the Streamlit App Starter Kit! In this chapter, we'll take the next exciting step: deploying your bare-bones application to Streamlit Cloud. This will be the starting point of your application-building journey.

Streamlit Cloud is a platform that allows you to easily deploy, manage, and share your Streamlit applications with the world. By deploying your app to Streamlit Cloud, you'll be able to access it from anywhere, collaborate with others, and showcase your work to a wider audience.

In this chapter, we'll cover the following topics:

- [x] Creating a Streamlit Cloud account
- [x] Preparing your app for deployment
- [x] Connecting your GitHub repository to Streamlit Cloud
- [x] Configuring your app settings on Streamlit Cloud
- [x] Deploying your app and accessing it via a public URL
- [x] Making updates to your app and watching the application refresh automatically within a few seconds

By the end of this chapter, you'll have a live, publicly accessible Streamlit app that serves as a foundation for your application-building exercises. You'll be able to share the URL with others, gather feedback, and iterate on your app as you progress through the tutorial.

Let's dive in and get your app deployed to Streamlit Cloud!

## Navigate to app folder

If you are not in `$TUTORIAL_HOME`, navigate to it and open the project in VS Code,

```shell
cd $TUTORIAL_HOME
code .
```


## Update Python packages

!!!NOTE
    Make sure to use the right package versions; this will let us deploy the same application to Streamlit in Snowflake (SiS) in a later module.

Update the `requirements.txt` to look like this:

@@ -60,13 +75,17 @@ conda activate st_ml_app

!!!TIP
    - Using [direnv](https://direnv.net) declutters your environment; you can create a Python virtual environment with just one line: `layout_python`
    - The project also supports Dev Containers, in case you want to use them with VS Code

## Application Update

Let us start with a small change to the application code.

!!!NOTE
    Throughout this tutorial we will be making changes to the `streamlit_app.py` file. Each code listing shows the entire source with the changes highlighted, to avoid copy/paste errors while doing the exercises.

Edit and update the `streamlit_app.py`,

```py linenums="1"
import streamlit as st

st.title("🤖 Machine Learning App")
@@ -76,17 +95,20 @@ st.write("Welcome to world of Machine Learning with Streamlit.")

Commit and push the code to your remote GitHub repository.


## Deploy to Streamlit Community Cloud

To deploy the application, navigate to <https://streamlit.io>{:target=_blank}, sign in, and click **Create app**.

You will need the following details for deploying the app,

- **GitHub Repo name** - `<your-gh-user>/st-ml-app`
- **Branch** - `master`
- **Main file path** - `streamlit_app.py`
- **App URL** - choose a public URL for your application; to avoid name collisions, use something like `<your gh user>-ml-app`.


Great! Now that you have successfully deployed your bare-bones Streamlit app to Streamlit Cloud, you're ready to dive into the exciting world of building machine learning applications.

Any commit and push to your repository will trigger a new application update. Give it a try!
In the next chapter, we'll start transforming your starter app into a fully-fledged ML application.

Binary file added docs/images/app-starter-kit.png
69 changes: 56 additions & 13 deletions docs/index.md
@@ -1,18 +1,61 @@
# Streamlit 101: From Open Source to Snowflake Native Development

## Get Ready to Build!
Ready to transform a simple Streamlit application into an enterprise-grade solution in Snowflake? In this hands-on tutorial, you'll explore Streamlit's versatility while building an interactive data application. Using a Machine Learning example, you'll discover how easily Streamlit can evolve from your local machine to a fully integrated Snowflake application.

## What You'll Build

Your journey will take you through the complete Streamlit development lifecycle. Starting locally, you'll progress to cloud deployment, and finally integrate with Snowflake. Get ready to unlock Streamlit's powerful features at each stage!

## Your Development Journey

1. **Start with Streamlit Local Development**
- Create your first interactive web application with Streamlit's components
- Set up data handling and visualization features
- Discover Streamlit's intuitive widget system
- Learn application state management

2. **Deploy to Streamlit Cloud**
- Launch your application to the cloud
- Master deployment best practices
- Handle dependencies like a pro

3. **Connect to Snowflake**
- Level up your app with Snowflake connectivity
- Implement smart data access patterns
- Set up secure connections

4. **Go Native with Streamlit in Snowflake**
- Deploy directly in Snowflake
- Adapt your code seamlessly
- Leverage enterprise-grade security

5. **Explore Snowflake Notebooks**
- Rebuild your app in a new environment
- Combine notebook analytics with Streamlit
- Discover alternative development approaches

## What You'll Achieve

Watch your Streamlit application evolve:

- From your laptop to the cloud
- Through Snowflake integration
- Into native Snowflake deployment
- With surprisingly few code changes

## Your Learning Goals

By the end of this tutorial, you'll:

- Command Streamlit's core features
- Master multiple deployment options
- Integrate seamlessly with Snowflake
- Explore various development environments
- Create production-ready applications

Ready to begin? Let's start your journey from local Streamlit development to deploying enterprise-ready applications in Snowflake!

*Note: This tutorial uses a Machine Learning example to showcase Streamlit's capabilities, but the skills you'll learn apply to any data application you want to build.*

## Prerequisites

* Latest Chrome Browser
* [Snowflake Account](https://signup.snowflake.com)
* [Snowflake CLI](https://docs.snowflake.com/en/developer-guide/snowflake-cli/index)
* [Visual Studio Code](https://code.visualstudio.com/)
* [Docker for Desktop](https://www.docker.com/products/docker-desktop/)

35 changes: 21 additions & 14 deletions docs/interactivity.md
@@ -1,19 +1,20 @@
# Interactive Features

In this chapter, we'll discover how to add user interactivity to our application using Streamlit widgets:

- [⚡ Sidebar](https://docs.streamlit.io/library/api-reference/layout/st.sidebar){:target="_blank"} for organizing widgets in a collapsible panel
- [⚡ Select box](https://docs.streamlit.io/library/api-reference/widgets/st.selectbox){:target="_blank"} for choosing categorical features
- [⚡ Slider](https://docs.streamlit.io/library/api-reference/widgets/st.slider){:target="_blank"} for adjusting numerical values
- [⚡ Radio buttons](https://docs.streamlit.io/library/api-reference/widgets/st.radio){:target="_blank"} for making single-choice selections
- [⚡ Checkbox](https://docs.streamlit.io/library/api-reference/widgets/st.checkbox){:target="_blank"} for toggle options

These widgets will allow users to dynamically modify feature values, which will then update our model's predictions in real-time.

## Build Sidebar

Let's build the sidebar with widgets to filter the input features. Edit and update the `streamlit_app.py` with the following code:

```py linenums="1" hl_lines="39-43 50-54 62-66 74-79 86-91 93-96"
import streamlit as st
@@ -32,8 +33,8 @@ with st.expander("Data"):
df
# define and display
st.write("**X**")
X = df.drop("species", axis=1)
X

st.write("**y**")
y = df.species
@@ -114,9 +115,9 @@ with st.sidebar:
)
```

## Feature Data Preprocessing

Create a DataFrame from the user input and combine it with the existing penguin data using `pd.concat`, ensuring the new data undergoes the same preprocessing steps as our training data.

Edit and update the `streamlit_app.py` with the following code,

@@ -228,7 +229,7 @@ data = {
"sex": gender,
}
input_df = pd.DataFrame(data, index=[0])
input_penguins = pd.concat([input_df, X], axis=0)

with st.expander("Input Features"):
st.write("**Input Penguins**")
@@ -237,3 +238,9 @@ with st.expander("Input Features"):
input_penguins
```

Now that we've prepared our input data, let's handle the categorical variables using encoding techniques. Feature encoding is crucial for converting categorical data into a format suitable for machine learning models.

Helpful resources for encoding:

- [Scikit-learn encoding guide](https://scikit-learn.org/stable/modules/preprocessing.html#encoding-categorical-features)
- [Pandas `get_dummies` documentation](https://pandas.pydata.org/docs/reference/api/pandas.get_dummies.html)
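Putting the pieces together, here is a minimal non-Streamlit sketch of the preprocessing flow described above (column names and the "widget" values are illustrative stand-ins):

```python
import pandas as pd

# Existing feature data (stand-in for X from the app)
X = pd.DataFrame({
    "island": ["Biscoe", "Dream"],
    "sex": ["female", "male"],
    "bill_length_mm": [46.5, 39.1],
})

# A single row built from hypothetical sidebar widget values
data = {"island": "Torgersen", "sex": "male", "bill_length_mm": 41.0}
input_df = pd.DataFrame(data, index=[0])

# Prepend the user row so it is encoded together with the existing data,
# guaranteeing both get identical one-hot columns
input_penguins = pd.concat([input_df, X], axis=0)
encoded = pd.get_dummies(input_penguins, columns=["island", "sex"])

# The first row is the encoded user input, ready for the model
input_row = encoded[:1]
print(input_row.columns.tolist())
```

Concatenating before encoding is the key design choice: encoding the single user row alone would drop categories that are absent from it, producing a column mismatch with the training data.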