There are two mistakes in Example 6.1:

- a typo: it refers to `d` instead of `d2` for the mean imputation
- an unnecessary `lambda df2` that can be removed

Correct code, I believe:
```python
import re

import pandas as pd

# version of the guns polls with some errors
url = "https://cssbook.net/d/guns-polls-dirty.csv"
d2 = pd.read_csv(url)

# Option 1: clean with direct assignment
# Note that when creating a new column,
# you have to use df["col"] rather than df.col
# regex=True is required in pandas >= 2.0, where str.replace is literal by default
d2["rep2"] = d2.rep.str.replace("[^0-9\\.]", "", regex=True)
d2["rep2"] = pd.to_numeric(d2.rep2)
d2["Support2"] = d2.Support.fillna(d2.Support.mean())

# Alternatively, clean with .assign
# No lambda is needed here: d2.rep2 already exists from Option 1
cleaned = d2.assign(
    rep2=d2.rep.str.replace("[^0-9\\.]", "", regex=True),
    rep3=pd.to_numeric(d2.rep2),
    Support2=d2.Support.fillna(d2.Support.mean()),
)

# Finally, you can create your own function
def clean_num(x):
    x = re.sub("[^0-9\\.]", "", x)
    return int(x)

cleaned["rep3"] = cleaned.rep.apply(clean_num)
cleaned.head()
```
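
To check the fix without downloading the CSV, here is a minimal sketch on a made-up frame. The name `toy` and all of its values are invented for illustration and are not taken from `guns-polls-dirty.csv`. It also shows the one case where, as far as I can tell, a lambda is still useful in `.assign`: when a column created earlier in the same call (`rep2` here) does not yet exist on the original frame.

```python
import pandas as pd

# Hypothetical toy data standing in for the dirty polls file
toy = pd.DataFrame({
    "rep": ["45", "4x1", "53%", "61"],
    "Support": [50.0, None, 70.0, 60.0],
})

cleaned = toy.assign(
    # Strip everything that is not a digit or a dot
    rep2=toy.rep.str.replace("[^0-9\\.]", "", regex=True),
    # rep2 only exists inside this assign call, so refer to it via a lambda
    rep3=lambda df: pd.to_numeric(df.rep2),
    # Mean imputation on the same frame the column comes from
    Support2=toy.Support.fillna(toy.Support.mean()),
)

print(cleaned)
```

If `rep2` had already been added to `toy` by direct assignment, `pd.to_numeric(toy.rep2)` would work without the lambda, which is the situation in the corrected example.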