Merge pull request #2 from kameshsampath/main
(feat!): SiS deployment and Streamlit in Notebook
kameshsampath authored Nov 14, 2024
2 parents f2688f3 + 3c19939 commit 2767d8b
Showing 14 changed files with 792 additions and 104 deletions.
24 changes: 20 additions & 4 deletions docs/data_prepration.md
@@ -1,9 +1,21 @@
# Data Preparation

Before applying our machine learning model, we need to convert categorical variables into numerical format using One-Hot[^1] Encoding. In our penguin dataset, we'll use pandas `get_dummies()`[^2] to encode:

**Features (X):**
- [x] `island` - Categorical location of penguin
- [x] `sex` - Gender of penguin

**Target (y):**
- [x] `species` - Type of penguin (our prediction target)

📝 **One-Hot Encoding** converts categorical variables into binary (0 or 1) format. For example:
```python
# Original: island = ['Torgersen', 'Biscoe']
# After encoding:
# island_Torgersen = [1, 0]
# island_Biscoe = [0, 1]
```
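Here is a runnable sketch of that encoding using pandas `get_dummies` (the two-row frame and its values are illustrative stand-ins for the real dataset):

```python
import pandas as pd

# A tiny stand-in for the penguin features we want to encode
df = pd.DataFrame({
    "island": ["Torgersen", "Biscoe"],
    "sex": ["male", "female"],
})

# One-hot encode the categorical columns; each category becomes its own 0/1 column
encoded = pd.get_dummies(df, columns=["island", "sex"])
print(encoded.columns.tolist())
# → ['island_Biscoe', 'island_Torgersen', 'sex_female', 'sex_male']
```

Note that `get_dummies` orders the new columns alphabetically within each original column.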

## Encode Features and Target

@@ -156,3 +168,7 @@ with st.expander("Data Preparation"):
y
```

After successfully preprocessing our penguin dataset with appropriate encoding and feature selection, let's move forward to training our model and calculating species prediction probabilities. This step will prepare us for creating interactive visualizations in Streamlit.
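As a preview, here is a minimal sketch of that next step. It assumes scikit-learn's `RandomForestClassifier`, a common choice for tabular classification; the model the tutorial actually uses may differ, and the data below is a hypothetical stand-in for the encoded penguin features:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical, already-encoded features and a target
X = pd.DataFrame({
    "bill_length_mm": [39.1, 46.5, 49.3, 38.6],
    "island_Biscoe": [0, 0, 1, 0],
    "island_Torgersen": [1, 0, 0, 1],
})
y = pd.Series(["Adelie", "Chinstrap", "Gentoo", "Adelie"], name="species")

clf = RandomForestClassifier(random_state=42)
clf.fit(X, y)

# Per-class probabilities for the first row; columns follow clf.classes_
proba = clf.predict_proba(X.iloc[[0]])
print(clf.classes_, proba[0])
```

Each row of `predict_proba` sums to 1, which is exactly what we will display in Streamlit later.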

[^1]: <https://machinelearningmastery.com/why-one-hot-encode-data-in-machine-learning/>{:target=_blank}
[^2]: <https://pandas.pydata.org/docs/reference/api/pandas.get_dummies.html>{:target=_blank}
40 changes: 29 additions & 11 deletions docs/datset.md
@@ -1,22 +1,32 @@
# Exploring the Penguins Dataset with Streamlit

In this chapter, we'll explore the [Penguins dataset](https://github.com/dataprofessor/data/blob/master/penguins_cleaned.csv) using Streamlit's interactive features.

In this chapter, we'll cover:

- [x] Loading, preprocessing, and preparing the dataset for visualization
- [x] Using Streamlit Expander to display dataset information and summary statistics
- [x] Creating interactive scatter plots with Streamlit Scatter chart to identify patterns and relationships
- [x] Enhancing the visualization with user interactions and filters

By the end of this chapter, you'll have a solid understanding of how to use Streamlit for data exploration and be ready to move on to building machine learning models.

## Download the Dataset

Let us download the data locally,

```shell
mkdir -p "$TUTORIAL_HOME/data"
curl -sSL \
  -o "$TUTORIAL_HOME/data/penguins_cleaned.csv" \
  https://raw.githubusercontent.com/dataprofessor/data/refs/heads/master/penguins_cleaned.csv
```

## Displaying the Data

Edit and update the `streamlit_app.py` with the following code,

```py linenums="1" hl_lines="4 10-12"
import streamlit as st

# import pandas to read the our data file
@@ -47,7 +57,13 @@ git push origin master

In a few seconds you should notice your application on Streamlit Cloud refresh with the changes.

## Application Overview

As part of this machine learning application, we will be building a simple classification model to predict penguin species (y) using input variables (X). Using Streamlit's interactive widgets, we'll display these variables to make our application user-friendly and intuitive.

This classification model will help us categorize penguins into their respective species based on their physical characteristics. The input variables and target variable will be presented through Streamlit's interface, allowing users to easily interact with and understand the prediction process.
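In code, that feature/target split looks roughly like this (a sketch with a two-row stand-in for the real dataset):

```python
import pandas as pd

# Stand-in for the penguins DataFrame loaded earlier in the app
df = pd.DataFrame({
    "island": ["Torgersen", "Biscoe"],
    "bill_length_mm": [39.1, 46.5],
    "sex": ["male", "female"],
    "species": ["Adelie", "Gentoo"],
})

# X: everything except the target column; y: the species we want to predict
X = df.drop("species", axis=1)
y = df.species
print(X.columns.tolist(), list(y))
```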

## Adding Our First Widget

Let us add our first Streamlit widget, [expander](https://docs.streamlit.io/develop/api-reference/layout/st.expander){:target="_blank"}, to allow expanding and collapsing of the data frame.

@@ -70,9 +86,9 @@ with st.expander("Data"):
df
```

## Displaying the Variables

Let us create and display the input features (**X**) and target (**y**).

Edit and update the `streamlit_app.py` with the following code,

@@ -103,7 +119,7 @@ with st.expander("Data"):

## Data Visualization

Let us visualize the penguins data using a [scatter plot](https://docs.streamlit.io/develop/api-reference/charts/st.scatter_chart){:target=_blank}

Edit and update the `streamlit_app.py` with the following code,

@@ -138,4 +154,6 @@ with st.expander("Data Visualization"):
y="body_mass_g",
color="species",
)
```

Now that we have our variables and target displayed for reference, let's move to the next chapter where we'll explore Streamlit's interactive features.
52 changes: 37 additions & 15 deletions docs/deploy_to_streamlit_cloud.md
@@ -1,22 +1,37 @@
# Deploying Your Streamlit App to Streamlit Cloud

Congratulations on setting up your project using the Streamlit App Starter Kit! In this chapter, we'll take the next exciting step: deploying your bare-bones application to Streamlit Cloud. This will be the starting point of your application-building journey.

Streamlit Cloud is a platform that allows you to easily deploy, manage, and share your Streamlit applications with the world. By deploying your app to Streamlit Cloud, you'll be able to access it from anywhere, collaborate with others, and showcase your work to a wider audience.

In this chapter, we'll cover the following topics:

- [x] Creating a Streamlit Cloud account
- [x] Preparing your app for deployment
- [x] Connecting your GitHub repository to Streamlit Cloud
- [x] Configuring your app settings on Streamlit Cloud
- [x] Deploying your app and accessing it via a public URL
- [x] Making updates to your app and watching the application refresh automatically within a few seconds

By the end of this chapter, you'll have a live, publicly accessible Streamlit app that serves as a foundation for your application-building exercises. You'll be able to share the URL with others, gather feedback, and iterate on your app as you progress through the tutorial.

Let's dive in and get your app deployed to Streamlit Cloud!

## Navigate to app folder

If you are not in `$TUTORIAL_HOME`, navigate to it and open the project in VS Code,

```shell
cd $TUTORIAL_HOME
code .
```


## Update Python packages

!!!NOTE
    Make sure to use the right package versions; this will let us deploy the same application to Streamlit in Snowflake (SiS) in a later module.

Update the `requirements.txt` to look like this:

@@ -60,13 +75,17 @@ conda activate st_ml_app

!!!TIP
    - Using [direnv](https://direnv.net) declutters your environment; you can create a Python virtual environment with just one line: `layout_python`
    - The project also supports Dev Containers, in case you want to use them with VS Code

## Application Update

Let us start with a small change to the application code.

!!!NOTE
    Throughout this tutorial we will be making changes to the `streamlit_app.py` file. Each code listing shows the entire source with the changes highlighted, to avoid copy/paste errors while doing the exercises.

Edit and update the `streamlit_app.py`,

```py linenums="1"
import streamlit as st

st.title("🤖 Machine Learning App")
@@ -76,17 +95,20 @@ st.write("Welcome to world of Machine Learning with Streamlit.")

Commit and push the code to your remote GitHub repository.


## Deploy to Streamlit Community Cloud

To deploy the application, navigate to <https://streamlit.io>{:target=_blank}, sign in, and click **Create app**.

You will need the following details for deploying the app,

- **GitHub Repo name** - `<your-gh-user>/st-ml-app`
- **Branch** - `master`
- **Main file path** - `streamlit_app.py`
- **App URL** - choose a public URL for your application; to avoid name collisions, use something like `<your gh user>-ml-app`.


Great! Now that you have successfully deployed your bare-bones Streamlit app to Streamlit Cloud, you're ready to dive into the exciting world of building machine learning applications.

Any commit and push to your repository will trigger a new application update. Give it a try!
In the next chapter, we'll start transforming your starter app into a fully-fledged ML application.

Binary file added docs/images/app-starter-kit.png
69 changes: 56 additions & 13 deletions docs/index.md
@@ -1,18 +1,61 @@
# Streamlit 101: From Open Source to Snowflake Native Development

## Get Ready to Build!
Ready to transform a simple Streamlit application into an enterprise-grade solution in Snowflake? In this hands-on tutorial, you'll explore Streamlit's versatility while building an interactive data application. Using a Machine Learning example, you'll discover how easily Streamlit can evolve from your local machine to a fully integrated Snowflake application.

## What You'll Build

Your journey will take you through the complete Streamlit development lifecycle. Starting locally, you'll progress to cloud deployment, and finally integrate with Snowflake. Get ready to unlock Streamlit's powerful features at each stage!

## Your Development Journey

1. **Start with Streamlit Local Development**
- Create your first interactive web application with Streamlit's components
- Set up data handling and visualization features
- Discover Streamlit's intuitive widget system
- Learn application state management

2. **Deploy to Streamlit Cloud**
- Launch your application to the cloud
- Master deployment best practices
- Handle dependencies like a pro

3. **Connect to Snowflake**
- Level up your app with Snowflake connectivity
- Implement smart data access patterns
- Set up secure connections

4. **Go Native with Streamlit in Snowflake**
- Deploy directly in Snowflake
- Adapt your code seamlessly
- Leverage enterprise-grade security

5. **Explore Snowflake Notebooks**
- Rebuild your app in a new environment
- Combine notebook analytics with Streamlit
- Discover alternative development approaches

## What You'll Achieve

Watch your Streamlit application evolve:

- From your laptop to the cloud
- Through Snowflake integration
- Into native Snowflake deployment
- With surprisingly few code changes

## Your Learning Goals

By the end of this tutorial, you'll:

- Command Streamlit's core features
- Master multiple deployment options
- Integrate seamlessly with Snowflake
- Explore various development environments
- Create production-ready applications

Ready to begin? Let's start your journey from local Streamlit development to deploying enterprise-ready applications in Snowflake!

*Note: This tutorial uses a Machine Learning example to showcase Streamlit's capabilities, but the skills you'll learn apply to any data application you want to build.*

## Prerequisites

* Latest Chrome Browser
* [Snowflake Account](https://signup.snowflake.com)
* [Snowflake CLI](https://docs.snowflake.com/en/developer-guide/snowflake-cli/index)
* [Visual Studio Code](https://code.visualstudio.com/)
* [Docker for Desktop](https://www.docker.com/products/docker-desktop/)

35 changes: 21 additions & 14 deletions docs/interactivity.md
@@ -1,19 +1,20 @@
# Interactive Features

In this chapter, we'll discover how to add user interactivity to our application using Streamlit widgets:

- [⚡ Sidebar](https://docs.streamlit.io/library/api-reference/layout/st.sidebar){:target="_blank"} for organizing widgets in a collapsible panel
- [⚡ Select box](https://docs.streamlit.io/library/api-reference/widgets/st.selectbox){:target="_blank"} for choosing categorical features
- [⚡ Slider](https://docs.streamlit.io/library/api-reference/widgets/st.slider){:target="_blank"} for adjusting numerical values
- [⚡ Radio buttons](https://docs.streamlit.io/library/api-reference/widgets/st.radio){:target="_blank"} for making single-choice selections
- [⚡ Checkbox](https://docs.streamlit.io/library/api-reference/widgets/st.checkbox){:target="_blank"} for toggle options

These widgets will allow users to dynamically modify feature values, which will then update our model's predictions in real-time.

## Build Sidebar

Let's build the sidebar with widgets to filter the input features. Edit and update the `streamlit_app.py` with the following code:

```py linenums="1" hl_lines="39-43 50-54 62-66 74-79 86-91 93-96"
import streamlit as st
@@ -32,8 +33,8 @@ with st.expander("Data"):
df
# define and display
st.write("**X**")
X = df.drop("species", axis=1)
X

st.write("**y**")
y = df.species
@@ -114,9 +115,9 @@ with st.sidebar:
)
```

## Feature Data Preprocessing

Create a DataFrame from the user input and combine it with the existing penguin data using `pd.concat`, ensuring the new data undergoes the same preprocessing steps as our training data.

Edit and update the `streamlit_app.py` with the following code,

@@ -228,7 +229,7 @@ data = {
"sex": gender,
}
input_df = pd.DataFrame(data, index=[0])
input_penguins = pd.concat([input_df, X], axis=0)

with st.expander("Input Features"):
st.write("**Input Penguins**")
@@ -237,3 +238,9 @@ with st.expander("Input Features"):
input_penguins
```

Now that we've prepared our input data, let's handle the categorical variables using encoding techniques. Feature encoding is crucial for converting categorical data into a format suitable for machine learning models.

Helpful resources for encoding:

- [Scikit-learn encoding guide](https://scikit-learn.org/stable/modules/preprocessing.html#encoding-categorical-features)
- [Pandas `get_dummies` documentation](https://pandas.pydata.org/docs/reference/api/pandas.get_dummies.html)
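Putting the pieces together, here is a minimal non-Streamlit sketch of the preprocessing flow described above (column names and the "widget" values are illustrative stand-ins):

```python
import pandas as pd

# Existing feature data (stand-in for X from the app)
X = pd.DataFrame({
    "island": ["Biscoe", "Dream"],
    "sex": ["female", "male"],
    "bill_length_mm": [46.5, 39.1],
})

# A single row built from hypothetical sidebar widget values
data = {"island": "Torgersen", "sex": "male", "bill_length_mm": 41.0}
input_df = pd.DataFrame(data, index=[0])

# Prepend the user row so it is encoded together with the existing data,
# guaranteeing both get identical one-hot columns
input_penguins = pd.concat([input_df, X], axis=0)
encoded = pd.get_dummies(input_penguins, columns=["island", "sex"])

# The first row is the encoded user input, ready for the model
input_row = encoded[:1]
print(input_row.columns.tolist())
```

Concatenating before encoding is the key design choice: encoding the single user row alone would drop categories that are absent from it, producing a column mismatch with the training data.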