This section of the Python pre-work is intended to be an introduction to Pandas. Explore the "Pandas Resources" links below for more advanced options.
Pandas is a Python library for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series.
Pandas provides fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real-world data analysis in Python.
The name is derived from the term "Panel data", an econometrics term for multidimensional structured data sets.
DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or SQL table, or a dict of Series objects. It is generally the most commonly used pandas object.
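As a minimal sketch of the "dict of Series" view described above, the snippet below builds a small DataFrame whose columns have different types. The column names and values are made up for illustration and are not taken from the course data files.

```python
import pandas as pd

# Hypothetical columns: each Series becomes one column of the DataFrame.
columns = {
    "region": pd.Series(["North", "South", "South"]),  # text
    "acidity": pd.Series([0.31, 0.45, 0.52]),          # floats
    "samples": pd.Series([12, 7, 9]),                  # integers
}

df = pd.DataFrame(columns)

print(df)         # spreadsheet-like view: rows and labeled columns
print(df.dtypes)  # each column keeps its own type
print(df.shape)   # (number of rows, number of columns)
```

Selecting a single column, for example `df["acidity"]`, returns a Series again, which is why the dict-of-Series picture is a useful way to think about it.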
Review the Jupyter Notebook `1_pandas_jeopardy_example.ipynb`, which uses the `jeopardy.csv` data.
- Open the Jupyter Notebook `2_pandas_olive_questions.ipynb`, which uses the `olive.csv` data.
- Make a copy of the notebook and name it `3_pandas_olive_answers_myname.ipynb`. (Example: my notebook would be called `3_pandas_olive_answers_reshama.ipynb`.)
- Update the header at the top by adding in your name and date.
- Edit this notebook and complete the exercises.
- `groupby` objects
- applying functions
- indexing
- conditional selecting; filtering
- selecting rows and columns: `.loc`, `.iloc`, `.ix` (note: `.ix` is deprecated and was removed in pandas 1.0)
- working with missing data: `Null`, `NaN`, `None`
- sorting
- merge, join
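
The topics listed above can be sketched on a toy table. This is only an illustrative example: the `scores` and `cities` tables and their column names are hypothetical and unrelated to the notebooks' data, and row selection is shown with `.loc` and `.iloc` only.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing value.
scores = pd.DataFrame(
    {
        "team": ["a", "b", "a", "b", "c"],
        "points": [10.0, np.nan, 30.0, 40.0, 50.0],
    },
    index=["r1", "r2", "r3", "r4", "r5"],
)

# groupby + applying a function: mean points per team (NaN is skipped).
mean_by_team = scores.groupby("team")["points"].mean()

# Conditional selecting / filtering: keep rows with more than 25 points.
high = scores[scores["points"] > 25]

# Selecting rows and columns: by label (.loc) and by position (.iloc).
first_by_label = scores.loc["r1", "points"]
first_by_position = scores.iloc[0, 1]

# Working with missing data: count missing values, then fill them.
n_missing = scores["points"].isna().sum()
filled = scores["points"].fillna(0.0)

# Sorting: highest points first; missing values go last by default.
ranked = scores.sort_values("points", ascending=False)

# Merge: a left join keeps every row of `scores`, even without a match.
cities = pd.DataFrame({"team": ["a", "b"], "city": ["Oslo", "Lima"]})
merged = scores.merge(cities, on="team", how="left")

print(mean_by_team)
print(high)
print(ranked)
print(merged)
```

A left join is used here so that team `c`, which has no entry in `cities`, stays in the result with a missing `city`. With `how="inner"` that row would be dropped.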