chapter 6: two mistakes in Example 6.1 #8

Open
damian0604 opened this issue Feb 24, 2023 · 0 comments

There are two mistakes in Example 6.1:

  • a typo: it refers to `d` instead of `d2` for the mean imputation
  • an unnecessary `lambda df2` that can be removed

Correct code, I believe:

```python
import re

import pandas as pd

# version of the guns polls with some errors
url = "https://cssbook.net/d/guns-polls-dirty.csv"
d2 = pd.read_csv(url)

# Option 1: clean with direct assignment
# Note that when creating a new column,
# you have to use df["col"] rather than df.col
# (regex=True is needed in pandas >= 2.0, where
# str.replace treats the pattern as a literal by default)
d2["rep2"] = d2.rep.str.replace("[^0-9\\.]", "", regex=True)
d2["rep2"] = pd.to_numeric(d2.rep2)
d2["Support2"] = d2.Support.fillna(d2.Support.mean())

# Alternatively, clean with .assign
# No lambda is needed here: rep3 reads d2.rep2,
# which Option 1 already created above
cleaned = d2.assign(
    rep2=d2.rep.str.replace("[^0-9\\.]", "", regex=True),
    rep3=pd.to_numeric(d2.rep2),
    Support2=d2.Support.fillna(d2.Support.mean()),
)

# Finally, you can create your own function
def clean_num(x):
    x = re.sub("[^0-9\\.]", "", x)
    return int(x)

cleaned["rep3"] = cleaned.rep.apply(clean_num)
cleaned.head()
```
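
For comparison, here is a minimal sketch of the case where the lambda does matter: when the column it reads is created inside the same `.assign` call and does not yet exist on the original frame. The `toy` frame and its values are made up for illustration and are not the real polls data.

```python
import pandas as pd

# Hypothetical three-row stand-in for the dirty polls file (assumed values)
toy = pd.DataFrame({"rep": ["44", "50*", "61"], "Support": [55.0, None, 65.0]})

# toy has no rep2 column yet, so rep3 cannot read toy.rep2 directly.
# A callable passed to .assign receives the frame being built, which
# already contains the rep2 column defined one line earlier.
chained = toy.assign(
    rep2=toy.rep.str.replace("[^0-9\\.]", "", regex=True),
    rep3=lambda df: pd.to_numeric(df.rep2),
    Support2=toy.Support.fillna(toy.Support.mean()),
)

print(chained)
```

In the corrected example above, `d2.rep2` already exists because Option 1 ran first, which is why the lambda can be dropped there.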