Skip to content

Latest commit

 

History

History
53 lines (32 loc) · 2.58 KB

problem_set_1_pandas.md

File metadata and controls

53 lines (32 loc) · 2.58 KB

Data Analysis with Pandas - Problem Set

I recommend doing these problems in a Jupyter notebook.

Reading in Data

  1. Read in the comma-separated file "client_list.csv". Assign as variable df1.
  2. Read in the delimted file "client_list.table". Assign as variable as df2.
  3. Read in the fixed-width file "client_list.txt". Assign as variable df3
  4. Read in the comma-separated file "client_list.csv", skip the first 3 rows, and ignore the header. Do not assign to variable (just return a view).
  5. Read in the comma-separated file "client_list.csv". Set the column headers in all caps. Assign as variable df.
  6. Read in the comma-separated file "client_list_practice.csv" and only extract the columns ["FIRST_NAME","AGE","EYE_COLOR"]. Do not assign to a variable.

Slicing a Data Set

  1. Slice rows 5 through 11 of df. Can you provide two ways of doing this?
  2. Return only the columns ['LAST_NAME','AGE','HAIR_COLOR'] for df. Can you provide two ways of doing this?
  3. Combine problems 7 and 8: return rows 5 though 11 and columns ['LAST_NAME','AGE','HAIR_COLOR'] for df. Can you provide two ways of doing this?

Simple Queries

  1. Find the subset of df where the client's last name is "Smith".
  2. Find the subset of df where the client's hair color is not black.
  3. Find the subset of df where the client's hair color is red and reset the values to "ginger".

Complex Queries

  1. Find the subset of df where the clients are females older than 30 years.
  2. Repeat problem 13, but return only the hair color and eye color.
  3. Find the unique combination of hair and eye color for women older than 25 years.

Additional Dataframe Operations

  1. Perform a merge using "client_list.csv" and "customer_id_list.csv". Assign the resulting dataframe as clients.
  2. Perform a merge using clients and "purchase_log.csv" and limit the subset to only clients who made purchases. Assign the resulting dataframe as detailed_sales.
  3. Use groupby to find the client who spent the most money on purchases. Determine how much he/she spent. HINT: save the intermediate dataframe from using groupby as spenders before applying slicing to determine the client who spent the most money on purchases.
  4. (BONUS) Modify the answer to problem 18 slightly to determine exactly what items where purchased by the top spending client.

Writing Files

  1. Save detailed_sales as a csv file named "df_out.csv" with no indices.
  2. Save detailed_sales to a pickle file named "df_out.p"

Solutions

Back to Lecture