Resolve error(s) in looping exercise solution. #670
🆗 Pre-flight checks passed 😃 This pull request has been checked and contains no modified workflow files, spoofing, or invalid commits. It should be safe to Approve and Run the workflows that need maintainer approval.
Thank you! This used to work because pandas would silently skip non-numeric columns in numeric operations such as computing the mean. Setting the index column to 'country' works for some files, but one of the files (americas) also has a continent column (with all values the same). This could be an oversight in generating the data file, since the regional tables now do not all follow the same pattern. So what to do? Your suggestion together with cleaning up the data files would work. On the other hand, life is full of unclean data, so a more realistic approach could be to filter out the 'gdp' columns. Something like

which gives

Whoops, the x labels do not look good here either; they overlap each other. We need to handle them somehow.

To summarise: great that you found this error! I haven't decided myself which solution is best here, and I would appreciate comments from other maintainers.

Olav
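A minimal sketch of the read-then-filter idea, assuming the gapminder layout of a `country` column plus `gdpPercap_<year>` columns. The in-memory CSV and its numbers are made up for illustration, and converting the labels to integer years is only one possible way to tame the crowded x axis, not something settled in this thread:

```python
import io

import pandas as pd

# Hypothetical stand-in for gapminder_gdp_americas.csv: an extra
# non-numeric 'continent' column sits alongside the GDP columns.
CSV_TEXT = """continent,country,gdpPercap_1952,gdpPercap_1957
Americas,Argentina,5911.3,6856.9
Americas,Bolivia,2677.3,2127.7
"""

# Use 'country' as the row index so it is not treated as data.
data = pd.read_csv(io.StringIO(CSV_TEXT), index_col="country")

# Keep only the columns whose name contains 'gdp'; 'continent' is dropped.
gdp_columns = [name for name in data.columns if "gdp" in name]
gdp_data = data[gdp_columns].copy()

# Turn labels like 'gdpPercap_1952' into integer years so a later plot
# gets a numeric x axis instead of a row of long, overlapping strings.
gdp_data.columns = gdp_data.columns.str.replace(
    "gdpPercap_", "", regex=False
).astype(int)

mean_gdp = gdp_data.mean()
print(mean_gdp)
```

Plotting `mean_gdp` would then put plain years on the x axis rather than the full column labels.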
Thank you for catching this, @davidwilby, and for taking the time to document and submit a PR for it! Ditto @vahtras for the additional context. I like the filtering approach and think it's useful. To muddy the waters even more, here are two other ways we could do this:
to only compute the mean on columns with
to only compute the mean on columns with numeric data
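Both options side by side, as a sketch on made-up data. The first line assumes the label-based option is `DataFrame.filter(like="gdp")`, which matches how the thread later refers to it:

```python
import io

import pandas as pd

# Hypothetical sample with one non-numeric column ('continent').
CSV_TEXT = """continent,country,gdpPercap_1952,gdpPercap_1957
Americas,Argentina,5911.3,6856.9
Americas,Bolivia,2677.3,2127.7
"""

data = pd.read_csv(io.StringIO(CSV_TEXT), index_col="country")

# Option 1: keep only columns whose label contains the substring 'gdp'.
by_label = data.filter(like="gdp").mean()

# Option 2: let pandas skip every column that does not hold numbers.
by_dtype = data.mean(numeric_only=True)

print(by_label)
print(by_dtype)
```

The two key on different things: `filter(like=...)` relies on column names, so it would also leave out a numeric column such as population, whereas `numeric_only=True` relies on dtypes and would keep any numeric column regardless of its name.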
Great options, @alee! I did not know about filter(like=...). Both are fine with me, but I guess numeric_only is closest to the original solution, so I would prefer that in this case.
Glad to be of help! Depending on how kind you want to be to learners, it may be worth including a note in the exercise description that there's a complication/gotcha/extra challenge in this one, along with a comment in the solution explaining the reason for it.
pandas now raises errors when computing the mean on a dataframe with non-numeric columns; add a note to the challenge describing the issue and provide a few additional ways of solving it.

Thanks to @davidwilby for finding this bug and outlining possible solutions to it in swcarpentry#670

Co-authored-by: David Wilby <[email protected]>
Co-authored-by: Olav Vahtras <[email protected]>
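A small sketch of the behaviour change the commit message describes, on hypothetical data. The exact exception and message depend on the installed pandas version, so the failing call is wrapped in a `try` block:

```python
import io

import pandas as pd

# Hypothetical sample: 'continent' is a text column.
CSV_TEXT = """continent,country,gdpPercap_1952
Americas,Argentina,5911.3
Americas,Bolivia,2677.3
"""

data = pd.read_csv(io.StringIO(CSV_TEXT), index_col="country")

try:
    # pandas 2.x raises here because 'continent' holds strings;
    # older releases warned and silently dropped the column instead.
    print(data.mean())
except TypeError as err:
    print("mean() failed:", err)

# Asking for numeric columns explicitly works on old and new versions.
result = data.mean(numeric_only=True)
print(result)
```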
Closing this in favor of #674, thanks!
Hi, I think there is an error in the last exercise's solution in episode 14: Looping Datasets.
I don't think that this is associated with an open issue, sorry if I'm mistaken.
The last exercise "Comparing Data" has the following solution:
However, this results in an error in the call to `dataframe.mean()`, e.g.:

This should be corrected by using `index_col="country"` when reading the CSV.

Secondly, there is another error caused by the `continent` column being present in `gapminder_gdp_americas.csv` (but not in any of the others).

I appreciate that these may be deliberate mistakes, though I suspect not, since this is within the solution.
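Putting the two fixes above together, the loop might look roughly like this. The function name `mean_gdp_by_region` and the default file pattern are hypothetical, and plotting is left out so the sketch runs without matplotlib:

```python
import glob
import os

import pandas as pd


def mean_gdp_by_region(pattern="data/gapminder_gdp_*.csv"):
    """Return {region: mean GDP per capita Series} for every matching file.

    The default glob pattern is an assumption about where the lesson's
    data files live; adjust it to your own layout.
    """
    means = {}
    for filename in sorted(glob.glob(pattern)):
        # index_col="country" keeps country names out of the numeric data.
        dataframe = pd.read_csv(filename, index_col="country")
        # Region name taken from 'gapminder_gdp_<region>.csv'.
        stem = os.path.splitext(os.path.basename(filename))[0]
        region = stem.rpartition("_")[2]
        # numeric_only=True skips extra text columns such as 'continent'
        # in the Americas file.
        means[region] = dataframe.mean(numeric_only=True)
    return means
```

Each returned Series could then be drawn on a shared axis with `series.plot(ax=ax, label=region)`.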