{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"name":"ClassifAI: 2 - Pandas","provenance":[{"file_id":"1RwH9tpqywnZbMuslhcVxUKhiRKdE42G6","timestamp":1654019570492},{"file_id":"1JaG-uPtzR1sCgUVrxjjG8hu1bcy9QMHf","timestamp":1653848365892},{"file_id":"https://github.com/google/eng-edu/blob/master/ml/cc/exercises/pandas_dataframe_ultraquick_tutorial.ipynb","timestamp":1633812417552}],"collapsed_sections":[]},"kernelspec":{"name":"python3","display_name":"Python 3"}},"cells":[{"cell_type":"code","metadata":{"id":"ZmL0l551Iibq"},"source":["import numpy as np\n","import pandas as pd"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"RutIK84wIp1S"},"source":["## Creating a DataFrame\n","\n","The following code cell creates a simple DataFrame containing 10 cells organized as follows:\n","\n"," * 5 rows\n"," * 2 columns, one named `temperature` and the other named `activity`\n","\n","The code cell instantiates the `pd.DataFrame` class and passes it two arguments:\n","\n"," * `data` provides the values that populate the 10 cells. The code cell calls `np.array` to generate the 5x2 NumPy array.\n"," * `columns` provides the names of the two columns."]},{"cell_type":"code","metadata":{"id":"FNZsPOgSD4F2","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1653848115059,"user_tz":420,"elapsed":8,"user":{"displayName":"Leo Huang","userId":"10046166245660810812"}},"outputId":"ee4392b3-604e-4da6-c605-d2eabcc52224"},"source":["# Create and populate a 5x2 NumPy array.\n","my_data = np.array([[0, 3], [10, 7], [20, 9], [30, 14], [40, 15]])\n","\n","# Create a Python list that holds the names of the two columns.\n","my_column_names = ['temperature', 'activity']\n","\n","# Create a DataFrame.\n","my_dataframe = pd.DataFrame(data=my_data, columns=my_column_names)\n","\n","# Print the entire DataFrame\n","print(my_dataframe)"],"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":[" temperature activity\n","0 0 3\n","1 10 7\n","2 20 9\n","3 30 14\n","4 40 15\n"]}]},{"cell_type":"markdown","metadata":{"id":"NJ-I78_7OFVs"},"source":["## Adding a new column to a DataFrame\n","\n","You can add a new column to an existing pandas DataFrame by assigning values to a new column name. 
For example, the following code creates a third column named `adjusted` in `my_dataframe`: "]},{"cell_type":"code","metadata":{"id":"JEBZyMdEOngx","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1653848162210,"user_tz":420,"elapsed":252,"user":{"displayName":"Leo Huang","userId":"10046166245660810812"}},"outputId":"4a0b7455-aa65-411b-e3e9-d38a3742923c"},"source":["# Create a new column named adjusted.\n","my_dataframe[\"adjusted\"] = my_dataframe[\"activity\"] + 2\n","\n","# Print the entire DataFrame\n","print(my_dataframe)"],"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":[" temperature activity adjusted\n","0 0 3 5\n","1 10 7 9\n","2 20 9 11\n","3 30 14 16\n","4 40 15 17\n"]}]},{"cell_type":"markdown","metadata":{"id":"RJ2aziCR5th2"},"source":["## Specifying a subset of a DataFrame\n","\n","pandas provides multiple ways to isolate specific rows, columns, slices, or cells in a DataFrame. "]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":112},"id":"gxiukyFE01N-","executionInfo":{"status":"ok","timestamp":1653848189686,"user_tz":420,"elapsed":366,"user":{"displayName":"Leo Huang","userId":"10046166245660810812"}},"outputId":"840631ee-fcce-40e1-c49a-e273aa201933"},"source":["my_dataframe.head(2)\n"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":[" temperature activity adjusted\n","0 0 3 5\n","1 10 7 9"]},"metadata":{},"execution_count":4}]},{"cell_type":"code","metadata":{"id":"RIO91Fu65s6k","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1653848266682,"user_tz":420,"elapsed":260,"user":{"displayName":"Leo Huang","userId":"10046166245660810812"}},"outputId":"fb41d5e4-e887-461d-c711-e662a59ee0d1"},"source":["print(\"Rows #0, #1, and #2:\")\n","print(my_dataframe.head(3), '\\n')\n","\n","print(\"Row #2:\")\n","print(my_dataframe.iloc[[2]], '\\n')\n","\n","print(\"Rows #1, #2, and #3:\")\n","print(my_dataframe[1:4], '\\n')\n","\n","print(\"Column 'temperature':\")\n","print(my_dataframe['temperature'])"],"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["Rows #0, #1, and #2:\n"," temperature activity adjusted\n","0 0 3 5\n","1 10 7 9\n","2 20 9 11 \n","\n","Row #2:\n"," temperature activity adjusted\n","2 
20 9 11 \n","\n","Rows #1, #2, and #3:\n"," temperature activity adjusted\n","1 10 7 9\n","2 20 9 11\n","3 30 14 16 \n","\n","Column 'temperature':\n","0 0\n","1 10\n","2 20\n","3 30\n","4 40\n","Name: temperature, dtype: int64\n"]}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"7Qt59Sjc1P8e","executionInfo":{"status":"ok","timestamp":1653848306927,"user_tz":420,"elapsed":347,"user":{"displayName":"Leo Huang","userId":"10046166245660810812"}},"outputId":"e0754b9f-8b5a-4b8a-b1f2-8bcfa12d55e1"},"source":["my_dataframe.loc[0]  # loc selects by label (index label or column name); here the row label is the integer 0"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":["temperature 0\n","activity 3\n","adjusted 5\n","Name: 0, dtype: int64"]},"metadata":{},"execution_count":6}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"HXe5TytK04E_","executionInfo":{"status":"ok","timestamp":1653848307648,"user_tz":420,"elapsed":4,"user":{"displayName":"Leo Huang","userId":"10046166245660810812"}},"outputId":"9b39b7dc-498e-4a07-9e25-726746f7b1bc"},"source":["my_dataframe.loc[:, \"temperature\"]  # as in NumPy, the colon selects every row, so this returns the whole 'temperature' column"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":["0 0\n","1 10\n","2 20\n","3 30\n","4 40\n","Name: temperature, dtype: int64"]},"metadata":{},"execution_count":7}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"JON3sou80IEt","executionInfo":{"status":"ok","timestamp":1653848312969,"user_tz":420,"elapsed":349,"user":{"displayName":"Leo Huang","userId":"10046166245660810812"}},"outputId":"df7d7f2b-ffc8-48a3-c2fa-47472b0cf4ff"},"source":["my_dataframe.iloc[0]  # iloc selects by integer position; this is the first row"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":["temperature 0\n","activity 3\n","adjusted 
5\n","Name: 0, dtype: int64"]},"metadata":{},"execution_count":8}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"7LPyH-R20LXT","executionInfo":{"status":"ok","timestamp":1653848314900,"user_tz":420,"elapsed":6,"user":{"displayName":"Leo Huang","userId":"10046166245660810812"}},"outputId":"d742a11a-1525-46a0-e717-6c7266f1dab9"},"source":["my_dataframe.iloc[:, 2]  # as in NumPy, the colon selects every row, so this returns the whole column at position 2"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":["0 5\n","1 9\n","2 11\n","3 16\n","4 17\n","Name: adjusted, dtype: int64"]},"metadata":{},"execution_count":9}]},{"cell_type":"markdown","metadata":{"id":"4dF2HAzf2li_"},"source":["## Reading a CSV file\n"]},{"cell_type":"code","metadata":{"id":"mEpidTCU2jFF"},"source":["# A local file path works too. This CSV has no header row, so pandas uses its first data row as column names; pass header=None to keep that row as data.\n","df = pd.read_csv(\"https://raw.githubusercontent.com/jbrownlee/Datasets/master/iris.csv\")"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":424},"id":"yExMSZyG2yaC","executionInfo":{"status":"ok","timestamp":1653848355030,"user_tz":420,"elapsed":256,"user":{"displayName":"Leo Huang","userId":"10046166245660810812"}},"outputId":"60e65c82-644d-49f9-a2f0-4f8c16ee65c7"},"source":["df"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":[" 5.1 3.5 1.4 0.2 Iris-setosa\n","0 4.9 3.0 1.4 0.2 Iris-setosa\n","1 4.7 3.2 1.3 0.2 Iris-setosa\n","2 4.6 3.1 1.5 0.2 Iris-setosa\n","3 5.0 3.6 1.4 0.2 Iris-setosa\n","4 5.4 3.9 1.7 0.4 Iris-setosa\n",".. ... ... ... ... ...\n","144 6.7 3.0 5.2 2.3 Iris-virginica\n","145 6.3 2.5 5.0 1.9 Iris-virginica\n","146 6.5 3.0 5.2 2.0 Iris-virginica\n","147 6.2 3.4 5.4 2.3 Iris-virginica\n","148 5.9 3.0 5.1 1.8 Iris-virginica\n","\n","[149 rows x 5 columns]"]},"metadata":{},"execution_count":11}]},{"cell_type":"markdown","metadata":{"id":"VdP36XNg2ldq"},"source":[""]}]}