chapter 6: two mistakes in Example 6.1 #8

damian0604 · 2023-02-24T13:59:51Z

There are two mistakes in Example 6.1:

- a typo: it refers to `d` instead of `d2` for the mean imputation
- an unnecessary `lambda df2` that can be removed

Correct code, I believe:

```python
import re

import pandas as pd

# version of the guns polls with some errors
url = "https://cssbook.net/d/guns-polls-dirty.csv"
d2 = pd.read_csv(url)

# Option 1: clean with direct assignment
# Note that when creating a new column,
# you have to use df["col"] rather than df.col
# (regex=True is required in pandas >= 2.0 for pattern matching)
d2["rep2"] = d2.rep.str.replace("[^0-9\\.]", "", regex=True)
d2["rep2"] = pd.to_numeric(d2.rep2)
d2["Support2"] = d2.Support.fillna(d2.Support.mean())

# Alternatively, clean with .assign
# d2.rep2 already exists from Option 1, so it can be
# referenced directly here without a lambda
cleaned = d2.assign(
    rep2=d2.rep.str.replace("[^0-9\\.]", "", regex=True),
    rep3=pd.to_numeric(d2.rep2),
    Support2=d2.Support.fillna(d2.Support.mean()),
)

# Finally, you can create your own function
def clean_num(x):
    x = re.sub("[^0-9\\.]", "", x)
    return int(x)

cleaned["rep3"] = cleaned.rep.apply(clean_num)
cleaned.head()
```
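
One caveat, as far as I can tell: dropping the lambda works in the code above only because Option 1 has already added `rep2` to `d2`. A minimal sketch of the difference, using a made-up three-row frame (the name `toy` and its values are assumptions for illustration, not the book's data):

```python
import pandas as pd

# Hypothetical stand-in for guns-polls-dirty.csv (values are invented)
toy = pd.DataFrame({"rep": ["10", "2o", "35.5%"], "Support": [50.0, None, 70.0]})

# With a lambda, .assign passes in the intermediate frame, so rep2 is
# visible even though it was never added to toy itself
chained = toy.assign(
    rep2=toy.rep.str.replace(r"[^0-9.]", "", regex=True),
    rep3=lambda df: pd.to_numeric(df.rep2),
    Support2=toy.Support.fillna(toy.Support.mean()),
)

# Without the lambda, toy.rep2 is looked up on the original frame,
# which has no such column, so the call fails before .assign runs
try:
    toy.assign(
        rep2=toy.rep.str.replace(r"[^0-9.]", "", regex=True),
        rep3=pd.to_numeric(toy.rep2),
    )
    direct_ok = True
except AttributeError:
    direct_ok = False

print(chained)
print("direct reference worked:", direct_ok)
```

So if the `.assign` variant is meant to stand alone, without Option 1 having run first, keeping a lambda for `rep3` may still be the safer choice.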
