diff --git a/intro2pandas-VA.IPYNB b/intro2pandas-VA.IPYNB
new file mode 100644
index 0000000..b1f2117
--- /dev/null
+++ b/intro2pandas-VA.IPYNB
@@ -0,0 +1,1275 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Review of Pandas\n",
+ "\n",
+ "-----"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Pandas is one of the most useful packages in Python. Its primary data structure is the DataFrame, which is a table of data with rows and columns.\n",
+ "\n",
+ "In this review, I will show you the basics of DataFrames: how to create, manipulate, and filter them. Before I can do that, I need to review the Series data structure, since it is the basis of the DataFrame.\n",
+ "\n",
+ "In a future lesson, I will show you how to group data in a DataFrame."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Preliminaries"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "'/home/data_scientist'"
+ ]
+ },
+ "execution_count": 1,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Print working directory\n",
+ "%pwd"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "/home/data_scientist/accy575/readonly/Pcard\n"
+ ]
+ }
+ ],
+ "source": [
+ "%cd '/home/data_scientist/accy575/readonly/Pcard'"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "PCard_FY2010.csv* PCard_FY2012.csv* PCard_FY2014.csv*\r\n",
+ "PCard_FY2011.csv* PCard_FY2013.csv* PCard_FY2015.csv*\r\n"
+ ]
+ }
+ ],
+ "source": [
+ "%ls"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 29,
+ "metadata": {
+ "collapsed": true
+ },
+ "outputs": [],
+ "source": [
+ "import pandas as pd\n",
+ "import numpy as np"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## The `Series` Data Structure"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "You can think of a series as a single column of data. Each element of the series has a label (called the index). \n",
+ "\n",
+ "Let's create a simple series:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0 a\n",
+ "1 b\n",
+ "2 c\n",
+ "3 d\n",
+ "dtype: object"
+ ]
+ },
+ "execution_count": 5,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "s1 = pd.Series(['a','b','c','d'])\n",
+ "s1"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "'c'"
+ ]
+ },
+ "execution_count": 6,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "s1[2]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Notice that, by default, Pandas created labels for each element of my series. These default labels always start at 0. If I want to use different labels, I can do so:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "element 1 a\n",
+ "element 2 b\n",
+ "element 3 c\n",
+ "element 4 d\n",
+ "dtype: object"
+ ]
+ },
+ "execution_count": 7,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "s2 = pd.Series(\n",
+ " ['a','b','c','d'], \n",
+ " index = ['element 1', 'element 2', 'element 3', 'element 4']\n",
+ ")\n",
+ "s2"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "'b'"
+ ]
+ },
+ "execution_count": 8,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "s2['element 2']"
+ ]
+ },
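+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The two lookups above can also be written with explicit accessors. As a minimal sketch, assuming only the `s2` series defined above, the cell below rebuilds `s2` and uses `.loc` for label-based access and `.iloc` for position-based access. Being explicit avoids ambiguity when the labels themselves are integers."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import pandas as pd\n",
+ "\n",
+ "s2 = pd.Series(\n",
+ "    ['a', 'b', 'c', 'd'],\n",
+ "    index=['element 1', 'element 2', 'element 3', 'element 4']\n",
+ ")\n",
+ "\n",
+ "# .loc looks up by label, .iloc by integer position (starting at 0)\n",
+ "by_label = s2.loc['element 2']\n",
+ "by_position = s2.iloc[1]\n",
+ "print(by_label, by_position)  # both are 'b'"
+ ]
+ },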
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### A useful function: value_counts\n",
+ "\n",
+ "You will likely find you want to count the number of times each item appears in a Pandas Series. Here's a built-in way to do it:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0 a\n",
+ "1 b\n",
+ "2 c\n",
+ "3 a\n",
+ "4 b\n",
+ "5 c\n",
+ "6 a\n",
+ "7 b\n",
+ "8 c\n",
+ "9 d\n",
+ "10 e\n",
+ "11 f\n",
+ "dtype: object"
+ ]
+ },
+ "execution_count": 9,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "s3 = pd.Series((list('abc') * 3) + ['d', 'e', 'f'])\n",
+ "s3"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "a 3\n",
+ "b 3\n",
+ "c 3\n",
+ "f 1\n",
+ "e 1\n",
+ "d 1\n",
+ "dtype: int64"
+ ]
+ },
+ "execution_count": 10,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "s3.value_counts()"
+ ]
+ },
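+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "`value_counts` also accepts a few optional arguments. As a small sketch (using the same letters as `s3`, plus one missing value added purely for illustration), `normalize=True` reports proportions instead of counts, and `dropna=False` includes missing values in the tally:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "import pandas as pd\n",
+ "\n",
+ "# Same letters as s3, plus one NaN added for illustration\n",
+ "s = pd.Series((list('abc') * 3) + ['d', 'e', 'f', np.nan])\n",
+ "\n",
+ "# Proportion of each value; missing values are excluded by default\n",
+ "print(s.value_counts(normalize=True))\n",
+ "\n",
+ "# Raw counts, with NaN counted as its own category\n",
+ "print(s.value_counts(dropna=False))"
+ ]
+ },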
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## The `DataFrame` Data Structure\n",
+ "\n",
+ "You can think of a DataFrame as a table of data, with rows and columns. Alternatively, you can think of a DataFrame as a collection of Series objects, _all of which share the same row index_.\n",
+ "\n",
+ "I will now show you how to create a DataFrame from a CSV file."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "df2012 = pd.read_csv('PCard_FY2012.csv')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Let's look at the first 5 rows of the DataFrame using the `head` method."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " Agency Number Agency Name Cardholder Last Name \\\n",
+ "0 1000 OKLAHOMA STATE UNIVERSITY BELL \n",
+ "1 1000 OKLAHOMA STATE UNIVERSITY BELL \n",
+ "2 1000 OKLAHOMA STATE UNIVERSITY BEROUSEK \n",
+ "3 1000 OKLAHOMA STATE UNIVERSITY FOCHT \n",
+ "4 1000 OKLAHOMA STATE UNIVERSITY FOCHT \n",
+ "\n",
+ " Cardholder First Initial Description Amount \\\n",
+ "0 D GENERAL PURCHASE $60.07 \n",
+ "1 D GENERAL PURCHASE $41.29 \n",
+ "2 M GENERAL PURCHASE ($180.00) \n",
+ "3 R GENERAL PURCHASE $9.36 \n",
+ "4 R GENERAL PURCHASE $16.86 \n",
+ "\n",
+ " Vendor Transaction Date Posted Date \\\n",
+ "0 WM SUPERCENTER 30-Jun-11 1-Jul-11 \n",
+ "1 WM SUPERCENTER 30-Jun-11 1-Jul-11 \n",
+ "2 A.C.E. SUPPLY & SERVICE 30-Jun-11 1-Jul-11 \n",
+ "3 NAPA AUTO PARTS 29-Jun-11 1-Jul-11 \n",
+ "4 NAPA AUTO PARTS 29-Jun-11 1-Jul-11 \n",
+ "\n",
+ " Merchant Category Code (MCC) \n",
+ "0 GROCERY STORES AND SUPERMARKETS \n",
+ "1 GROCERY STORES AND SUPERMARKETS \n",
+ "2 STATIONERY OFFICE SUPPLIES PRINTING AND WRIT... \n",
+ "3 AUTOMOTIVE PARTS AND ACCESSORIES STORES \n",
+ "4 AUTOMOTIVE PARTS AND ACCESSORIES STORES "
+ ]
+ },
+ "execution_count": 13,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "df2012.head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "This is looking pretty good. Let's get some basic information about our DataFrame: its shape, its column names, and the data type of each column."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(442184, 10)"
+ ]
+ },
+ "execution_count": 14,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# This gives (number of rows, number of columns)\n",
+ "df2012.shape"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 15,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "Index(['Agency Number', 'Agency Name', 'Cardholder Last Name',\n",
+ " 'Cardholder First Initial', 'Description', 'Amount', 'Vendor',\n",
+ " 'Transaction Date', 'Posted Date', 'Merchant Category Code (MCC)'],\n",
+ " dtype='object')"
+ ]
+ },
+ "execution_count": 15,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "df2012.columns"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 16,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "Agency Number int64\n",
+ "Agency Name object\n",
+ "Cardholder Last Name object\n",
+ "Cardholder First Initial object\n",
+ "Description object\n",
+ "Amount object\n",
+ "Vendor object\n",
+ "Transaction Date object\n",
+ "Posted Date object\n",
+ "Merchant Category Code (MCC) object\n",
+ "dtype: object"
+ ]
+ },
+ "execution_count": 16,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "df2012.dtypes"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### What's the actual data type??!!\n",
+ "\n",
+ "Consider the Description field in the data. A number of you have had issues removing leading and trailing spaces. That's because some of the values in the Description field are blank, and Pandas imported those blank values as the floating-point value NaN (shown as `nan`) instead of as empty strings.\n",
+ "\n",
+ "The `dtypes` output above reports the column as `object`, which doesn't tell us which Python types are actually stored. Here's a little code snippet to help:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 26,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "<class 'str'> 442147\n",
+ "<class 'float'> 37\n",
+ "Name: Description, dtype: int64"
+ ]
+ },
+ "execution_count": 26,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Take the Description column and apply the type function. \n",
+ "# Then use value_counts to see counts of the different types.\n",
+ "df2012.Description.apply(type).value_counts()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The above output shows that there are 37 problematic values in the Description column: they are not strings, but missing (NaN) values, which Pandas stores as floats."
+ ]
+ },
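+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "One way to handle these values is sketched below on a small made-up Series standing in for the Description column (the name `desc` and its values are hypothetical). It counts the missing entries with `isna`, then replaces them with empty strings before stripping whitespace, so that every element is a string:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "import pandas as pd\n",
+ "\n",
+ "# Hypothetical stand-in for the Description column\n",
+ "desc = pd.Series(['  GENERAL PURCHASE ', np.nan, 'AIR TRAVEL  '])\n",
+ "\n",
+ "# Count the missing values\n",
+ "print(desc.isna().sum())  # 1\n",
+ "\n",
+ "# Replace NaN with '' so every element is a string, then strip spaces\n",
+ "cleaned = desc.fillna('').str.strip()\n",
+ "print(cleaned.apply(type).value_counts())\n",
+ "print(cleaned.tolist())  # ['GENERAL PURCHASE', '', 'AIR TRAVEL']"
+ ]
+ },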
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Renaming Columns\n",
+ "\n",
+ "What if we want to rename the columns? You can rename the columns as follows:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 33,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " AgencyNum AgencyName LastName Cardholder First Initial \\\n",
+ "0 1000 OKLAHOMA STATE UNIVERSITY BELL D \n",
+ "1 1000 OKLAHOMA STATE UNIVERSITY BELL D \n",
+ "2 1000 OKLAHOMA STATE UNIVERSITY BEROUSEK M \n",
+ "3 1000 OKLAHOMA STATE UNIVERSITY FOCHT R \n",
+ "4 1000 OKLAHOMA STATE UNIVERSITY FOCHT R \n",
+ "\n",
+ " Description Amount Vendor Transaction Date \\\n",
+ "0 GENERAL PURCHASE $60.07 WM SUPERCENTER 30-Jun-11 \n",
+ "1 GENERAL PURCHASE $41.29 WM SUPERCENTER 30-Jun-11 \n",
+ "2 GENERAL PURCHASE ($180.00) A.C.E. SUPPLY & SERVICE 30-Jun-11 \n",
+ "3 GENERAL PURCHASE $9.36 NAPA AUTO PARTS 29-Jun-11 \n",
+ "4 GENERAL PURCHASE $16.86 NAPA AUTO PARTS 29-Jun-11 \n",
+ "\n",
+ " Posted Date Merchant Category Code (MCC) \\\n",
+ "0 1-Jul-11 GROCERY STORES AND SUPERMARKETS \n",
+ "1 1-Jul-11 GROCERY STORES AND SUPERMARKETS \n",
+ "2 1-Jul-11 STATIONERY OFFICE SUPPLIES PRINTING AND WRIT... \n",
+ "3 1-Jul-11 AUTOMOTIVE PARTS AND ACCESSORIES STORES \n",
+ "4 1-Jul-11 AUTOMOTIVE PARTS AND ACCESSORIES STORES \n",
+ "\n",
+ " PyType \n",
+ "0 <class 'str'> \n",
+ "1 <class 'str'> \n",
+ "2 <class 'str'> \n",
+ "3 <class 'str'> \n",
+ "4 <class 'str'> "
+ ]
+ },
+ "execution_count": 33,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "newColNames = {\n",
+ " 'Agency Number': 'AgencyNum', \n",
+ " 'Agency Name': 'AgencyName',\n",
+ " 'Cardholder Last Name': 'LastName'}\n",
+ "\n",
+ "df2012.rename(columns = newColNames).head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Ain't life great? Let's take a look at our DataFrame again."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 34,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " Agency Number Agency Name Cardholder Last Name \\\n",
+ "0 1000 OKLAHOMA STATE UNIVERSITY BELL \n",
+ "1 1000 OKLAHOMA STATE UNIVERSITY BELL \n",
+ "2 1000 OKLAHOMA STATE UNIVERSITY BEROUSEK \n",
+ "3 1000 OKLAHOMA STATE UNIVERSITY FOCHT \n",
+ "4 1000 OKLAHOMA STATE UNIVERSITY FOCHT \n",
+ "\n",
+ " Cardholder First Initial Description Amount \\\n",
+ "0 D GENERAL PURCHASE $60.07 \n",
+ "1 D GENERAL PURCHASE $41.29 \n",
+ "2 M GENERAL PURCHASE ($180.00) \n",
+ "3 R GENERAL PURCHASE $9.36 \n",
+ "4 R GENERAL PURCHASE $16.86 \n",
+ "\n",
+ " Vendor Transaction Date Posted Date \\\n",
+ "0 WM SUPERCENTER 30-Jun-11 1-Jul-11 \n",
+ "1 WM SUPERCENTER 30-Jun-11 1-Jul-11 \n",
+ "2 A.C.E. SUPPLY & SERVICE 30-Jun-11 1-Jul-11 \n",
+ "3 NAPA AUTO PARTS 29-Jun-11 1-Jul-11 \n",
+ "4 NAPA AUTO PARTS 29-Jun-11 1-Jul-11 \n",
+ "\n",
+ " Merchant Category Code (MCC) PyType \n",
+ "0 GROCERY STORES AND SUPERMARKETS <class 'str'> \n",
+ "1 GROCERY STORES AND SUPERMARKETS <class 'str'> \n",
+ "2 STATIONERY OFFICE SUPPLIES PRINTING AND WRIT... <class 'str'> \n",
+ "3 AUTOMOTIVE PARTS AND ACCESSORIES STORES <class 'str'> \n",
+ "4 AUTOMOTIVE PARTS AND ACCESSORIES STORES <class 'str'> "
+ ]
+ },
+ "execution_count": 34,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "df2012.head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "What happened? Why didn't the new column names stick? The reason is that the `rename` method returned a new DataFrame; it didn't make the changes \"in place\". To make the changes permanent, use one of the following two commands."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 35,
+ "metadata": {
+ "collapsed": true
+ },
+ "outputs": [],
+ "source": [
+ "# The following two statements are equivalent. Run only one of them!\n",
+ "df2012.rename(columns = newColNames, inplace = True)\n",
+ "#df2012 = df2012.rename(columns = newColNames)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 36,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " AgencyNum AgencyName LastName Cardholder First Initial \\\n",
+ "0 1000 OKLAHOMA STATE UNIVERSITY BELL D \n",
+ "1 1000 OKLAHOMA STATE UNIVERSITY BELL D \n",
+ "2 1000 OKLAHOMA STATE UNIVERSITY BEROUSEK M \n",
+ "3 1000 OKLAHOMA STATE UNIVERSITY FOCHT R \n",
+ "4 1000 OKLAHOMA STATE UNIVERSITY FOCHT R \n",
+ "\n",
+ " Description Amount Vendor Transaction Date \\\n",
+ "0 GENERAL PURCHASE $60.07 WM SUPERCENTER 30-Jun-11 \n",
+ "1 GENERAL PURCHASE $41.29 WM SUPERCENTER 30-Jun-11 \n",
+ "2 GENERAL PURCHASE ($180.00) A.C.E. SUPPLY & SERVICE 30-Jun-11 \n",
+ "3 GENERAL PURCHASE $9.36 NAPA AUTO PARTS 29-Jun-11 \n",
+ "4 GENERAL PURCHASE $16.86 NAPA AUTO PARTS 29-Jun-11 \n",
+ "\n",
+ " Posted Date Merchant Category Code (MCC) \\\n",
+ "0 1-Jul-11 GROCERY STORES AND SUPERMARKETS \n",
+ "1 1-Jul-11 GROCERY STORES AND SUPERMARKETS \n",
+ "2 1-Jul-11 STATIONERY OFFICE SUPPLIES PRINTING AND WRIT... \n",
+ "3 1-Jul-11 AUTOMOTIVE PARTS AND ACCESSORIES STORES \n",
+ "4 1-Jul-11 AUTOMOTIVE PARTS AND ACCESSORIES STORES \n",
+ "\n",
+ " PyType \n",
+ "0 <class 'str'> \n",
+ "1 <class 'str'> \n",
+ "2 <class 'str'> \n",
+ "3 <class 'str'> \n",
+ "4 <class 'str'> "
+ ]
+ },
+ "execution_count": 36,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "df2012.head()"
+ ]
+ },
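+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "To see the difference between returning a copy and keeping the change, here is a minimal sketch on a small made-up DataFrame (the name `toy` and its values are hypothetical). Calling `rename` without assigning the result leaves the original untouched; assigning the result back makes the change stick:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import pandas as pd\n",
+ "\n",
+ "# Hypothetical two-row table\n",
+ "toy = pd.DataFrame({'Agency Number': [1000, 1000], 'Amount': ['$60.07', '$41.29']})\n",
+ "\n",
+ "# rename returns a new DataFrame; toy itself is unchanged\n",
+ "renamed = toy.rename(columns={'Agency Number': 'AgencyNum'})\n",
+ "print(list(toy.columns))      # ['Agency Number', 'Amount']\n",
+ "print(list(renamed.columns))  # ['AgencyNum', 'Amount']\n",
+ "\n",
+ "# Assigning the result back makes the change stick\n",
+ "toy = toy.rename(columns={'Agency Number': 'AgencyNum'})\n",
+ "print(list(toy.columns))      # ['AgencyNum', 'Amount']"
+ ]
+ },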
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "A better approach is to rename the columns when you import the file, like this:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 40,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Note: if you pass names without header = 0, pandas will treat the file's original header row as the first row of your dataset!\n",
+ "\n",
+ "df2012 = pd.read_csv(\n",
+ " 'PCard_FY2012.csv', \n",
+ " header = 0,\n",
+ " names = [\n",
+ " 'AgencyNum', 'AgencyName', \n",
+ " 'LastName', 'FirstInit',\n",
+ " 'Description','Amount','Vendor',\n",
+ " 'TransDate','PostDate',\n",
+ " 'MCC']\n",
+ ")"
+ ]
+ },
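+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The effect of `header = 0` is easier to see on a tiny example. The sketch below reads a two-line CSV from an in-memory string (made-up data, wrapped in `io.StringIO` so no file is needed) with and without `header = 0`, and compares the resulting shapes:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import io\n",
+ "import pandas as pd\n",
+ "\n",
+ "# Made-up CSV: one header line and one data line\n",
+ "csv_text = 'Agency Number,Amount\\n1000,$60.07\\n'\n",
+ "\n",
+ "# Without header=0, the file's own header line becomes the first data row\n",
+ "no_header = pd.read_csv(io.StringIO(csv_text), names=['AgencyNum', 'Amount'])\n",
+ "print(no_header.shape)  # (2, 2)\n",
+ "\n",
+ "# With header=0, the file's header line is replaced by the new names\n",
+ "with_header = pd.read_csv(io.StringIO(csv_text), header=0, names=['AgencyNum', 'Amount'])\n",
+ "print(with_header.shape)  # (1, 2)"
+ ]
+ },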
+ {
+ "cell_type": "code",
+ "execution_count": 39,
+ "metadata": {
+ "scrolled": true
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " AgencyNum AgencyName LastName FirstInit Description \\\n",
+ "0 1000 OKLAHOMA STATE UNIVERSITY BELL D GENERAL PURCHASE \n",
+ "1 1000 OKLAHOMA STATE UNIVERSITY BELL D GENERAL PURCHASE \n",
+ "2 1000 OKLAHOMA STATE UNIVERSITY BEROUSEK M GENERAL PURCHASE \n",
+ "3 1000 OKLAHOMA STATE UNIVERSITY FOCHT R GENERAL PURCHASE \n",
+ "4 1000 OKLAHOMA STATE UNIVERSITY FOCHT R GENERAL PURCHASE \n",
+ "\n",
+ " Amount Vendor TransDate PostDate \\\n",
+ "0 $60.07 WM SUPERCENTER 30-Jun-11 1-Jul-11 \n",
+ "1 $41.29 WM SUPERCENTER 30-Jun-11 1-Jul-11 \n",
+ "2 ($180.00) A.C.E. SUPPLY & SERVICE 30-Jun-11 1-Jul-11 \n",
+ "3 $9.36 NAPA AUTO PARTS 29-Jun-11 1-Jul-11 \n",
+ "4 $16.86 NAPA AUTO PARTS 29-Jun-11 1-Jul-11 \n",
+ "\n",
+ " MCC \n",
+ "0 GROCERY STORES AND SUPERMARKETS \n",
+ "1 GROCERY STORES AND SUPERMARKETS \n",
+ "2 STATIONERY OFFICE SUPPLIES PRINTING AND WRIT... \n",
+ "3 AUTOMOTIVE PARTS AND ACCESSORIES STORES \n",
+ "4 AUTOMOTIVE PARTS AND ACCESSORIES STORES "
+ ]
+ },
+ "execution_count": 39,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "df2012.head()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 41,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(442184, 10)"
+ ]
+ },
+ "execution_count": 41,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "df2012.shape"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Filtering\n",
+ "\n",
+ "Sometimes, you only want to work with a subset of your DataFrame. There are many ways to filter a DataFrame and I will only show you a few.\n",
+ "\n",
+ "Let's say we're only interested in cardholders whose last name is Bell."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": true
+ },
+ "outputs": [],
+ "source": [
+ "# The following are equivalent:\n",
+ "\n",
+ "df2012[df2012['LastName'] == 'BELL']\n",
+ "#df2012[df2012.LastName == 'BELL']\n",
+ "#df2012[df2012.loc[:,'LastName'] == 'BELL']"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### Multiple filters\n",
+ "\n",
+ "What if we want to use multiple filters? Use the following tips:\n",
+ "* Each condition *MUST* be grouped in parentheses\n",
+ "* Use the operators & for and, | for or, and ~ for not\n",
+ "\n",
+ "In the following, note that the date field hasn't been converted to a Python date, so we compare it to a string."
+ ]
+ },
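+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The tips above can be sketched on a small made-up DataFrame before applying them to the real data. The name `toy` and its rows are hypothetical; the dates are plain strings, as in the imported file:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import pandas as pd\n",
+ "\n",
+ "toy = pd.DataFrame({\n",
+ "    'LastName': ['BELL', 'BELL', 'FOCHT'],\n",
+ "    'TransDate': ['30-Jun-11', '29-Jun-11', '29-Jun-11'],\n",
+ "})\n",
+ "\n",
+ "# and: both conditions must hold (row 0)\n",
+ "print(toy[(toy['LastName'] == 'BELL') & (toy['TransDate'] == '30-Jun-11')])\n",
+ "\n",
+ "# or: either condition may hold (rows 0 and 1)\n",
+ "print(toy[(toy['LastName'] == 'BELL') | (toy['TransDate'] == '30-Jun-11')])\n",
+ "\n",
+ "# not: invert a condition (row 2)\n",
+ "print(toy[~(toy['LastName'] == 'BELL')])"
+ ]
+ },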
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": true
+ },
+ "outputs": [],
+ "source": [
+ "df2012[(df2012['LastName'] == 'BELL') & (df2012['TransDate'] == '30-Jun-11')]"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.6.1"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 1
+}