# DS101-03-01compare_averages.py
import pandas as pd
import scipy.stats


def compare_averages(filename):
    """
    Performs a t-test on two sets of baseball data (left-handed and
    right-handed hitters).

    You will be given a csv file that has three columns: a player's
    name, handedness (L for left-handed or R for right-handed), and
    career batting average (called 'avg'). You can look at the csv
    file via the following link:
    https://www.dropbox.com/s/xcn0u2uxm8c4n6l/baseball_data.csv

    Write a function that will read the csv file into a pandas data frame
    and run Welch's t-test on the two cohorts defined by handedness.
    One cohort should be a data frame of right-handed batters, and the
    other cohort should be a data frame of left-handed batters.

    We have included the scipy.stats library to help you write
    or implement Welch's t-test:
    http://docs.scipy.org/doc/scipy/reference/stats.html

    At a 95% confidence level (significance level 0.05), if there is no
    significant difference between the two cohorts, return a tuple
    consisting of True, and then the result returned by
    scipy.stats.ttest_ind. If there is a difference, return a tuple
    consisting of False, and then the result returned by
    scipy.stats.ttest_ind.

    For example, the tuple that you return may look like:
    (False, (9.93570222, 0.000023))
    """
    # Read the data and split it into the two cohorts by handedness.
    df = pd.read_csv(filename)
    left_handed = df[df['handedness'] == 'L']
    right_handed = df[df['handedness'] == 'R']

    # equal_var=False selects Welch's t-test (unequal variances).
    result = scipy.stats.ttest_ind(left_handed['avg'], right_handed['avg'],
                                   equal_var=False)

    # Retain the null hypothesis when the p-value exceeds 0.05.
    retain_null = result[1] > .05
    return retain_null, result


print(compare_averages(r'Data\baseball_data.csv'))
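
For readers who want to see what Welch's t-test computes, here is a minimal standard-library sketch of the t statistic and the Welch-Satterthwaite degrees of freedom. The helper name `welch_t` and the sample batting averages are hypothetical and made up for illustration; this is not how scipy implements `ttest_ind`, and it leaves out the p-value, which requires the t distribution.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom).

    Hypothetical helper for illustration; not part of scipy or pandas.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a = statistics.mean(sample_a)
    mean_b = statistics.mean(sample_b)

    # Squared standard error of each mean: sample variance (n - 1
    # denominator) divided by the sample size.
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b

    # Welch's statistic does not pool the two variances.
    t_stat = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)

    # Welch-Satterthwaite approximation to the degrees of freedom.
    dof = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t_stat, dof


# Made-up batting averages, not taken from baseball_data.csv.
lefties = [0.300, 0.280, 0.310, 0.290]
righties = [0.250, 0.270, 0.260, 0.240, 0.255]

t_stat, dof = welch_t(lefties, righties)
print(t_stat, dof)
```

When both samples have the same size and the same variance, the degrees of freedom reduce to the pooled value of 2(n - 1); otherwise they fall between the smaller sample's n - 1 and the pooled n_a + n_b - 2.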