Remove CSV copies, read from original CSV in notebook #23

colevandersWands · 2024-02-15T15:58:05Z

Maintaining copies of data files in your repository is a risk. It's easy to update the original and forget to copy it everywhere else.

Better to read directly from the original generated CSV file.


cleaning analysis folder: use generated CSV, remove unused CSV

joshuaSamuel06 · 2024-02-16T15:23:16Z

The file named original is not sorted and is the original dataset that we downloaded. I think I should only keep the latest and cleaned dataset which is not the file named original. I will delete the unwanted dataset. I can't merge this pull request as you have deleted the file we are using for analysis, but I will delete the unwanted data.

Question: I kept the other files because, as you said, if someone tries to run your code it should run without errors. But I think I got it wrong. It's the main (Analysis) code that should run without errors and not the other data cleaning related code. Right?

colevandersWands · 2024-02-16T18:02:17Z

> The file named original is not sorted and is the original dataset that we downloaded. I think I should only keep the latest and cleaned dataset which is not the file named original.

It's a good idea to keep all copies, especially the original. Without the original copy, it's hard for anyone to know if you made a mistake cleaning and sorting it. Anyone should be able to run your script on the original data to regenerate the cleaned/sorted set.
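A minimal sketch of what "run your script on the original data" could look like, using only the standard library. The function name `regenerate_cleaned`, the file names, and the `country` column are hypothetical and are not taken from this repository; the real cleaning steps will differ.

```python
import csv
import os
import tempfile


def regenerate_cleaned(original_path, cleaned_path, sort_key):
    """Rebuild the cleaned CSV from the original file and return the row count."""
    with open(original_path, newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src)
        fieldnames = reader.fieldnames
        # Assumed cleaning rule: drop rows with no value in the sort column.
        rows = [row for row in reader if (row[sort_key] or "").strip()]
    # Sort so the generated file is deterministic from run to run.
    rows.sort(key=lambda row: row[sort_key])
    with open(cleaned_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


if __name__ == "__main__":
    # Demo in a temporary folder with made-up data.
    folder = tempfile.mkdtemp()
    original = os.path.join(folder, "original.csv")
    cleaned = os.path.join(folder, "cleaned.csv")
    with open(original, "w", newline="", encoding="utf-8") as f:
        f.write("country,gdp\nPeru,2\n,9\nChile,1\n")
    print(regenerate_cleaned(original, cleaned, "country"), "rows written")
```

With a script like this, the original file is never edited by hand: replacing it and re-running the script is enough to refresh the cleaned set.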

colevandersWands · 2024-02-16T18:05:37Z

> I can't merge this pull request as you have deleted the file we are using for analysis.

I haven't! Check the changes in your analysis notebook: I updated the path to read from the generated file in Data Analysis. This way you never need to copy-paste or manually update any data. Everything you expect someone to do manually is something they can either forget to do or do incorrectly, even if it's carefully documented.

I know this may seem unnecessary because the data is already cleaned, but you need to think about reproducibility. What if someone had an update to your original data? They should just need to replace the original file and re-run the scripts.

colevandersWands · 2024-02-16T18:06:33Z

> It's the main (Analysis) code that should run without errors and not the other data cleaning related code. Right?

All of your scripts should run without errors. Someone checking your project shouldn't need to debug it, or even need to read all the code if they don't want to understand the details.

joshuaSamuel06 · 2024-02-19T12:38:12Z

I tried giving the relative path, but I think relative paths are not supported in VS Code.

colevandersWands · 2024-02-19T14:25:38Z

Can you share a screenshot of your notebook and the error? I was able to run these notebooks in VS Code on my machine.

joshuaSamuel06 · 2024-02-19T14:39:14Z

…data set

colevandersWands · 2024-02-21T09:16:24Z

@joshuaSamuel06, thanks for the screenshot! I replaced the string path with os.path.join, so it should now work on both Windows and macOS.
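For reference, a small sketch of the idea, not the exact change in the notebook: a path written as a literal with one separator style can fail on another operating system, while `os.path.join` uses the right separator for whichever system runs the code. The helper name `data_path` and the folder and file names below are hypothetical.

```python
import os


def data_path(base_dir, *parts):
    """Join path segments using the current operating system's separator."""
    # normpath collapses ".." segments so the result is easier to read.
    return os.path.normpath(os.path.join(base_dir, *parts))


# Hypothetical layout: the notebook folder sits next to the folder
# that holds the generated CSV.
csv_path = data_path(os.getcwd(), "..", "data_cleaning", "cleaned.csv")
print(csv_path)
```

Note that a relative path is resolved against the current working directory, which may not be the notebook's own folder, so printing the resulting path is a quick way to check where the code is actually looking.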

colevandersWands added 3 commits February 15, 2024 16:54

Delete Data_cleaning/Second_iteration/countries_of_interest_data.csv

86c38af

Delete Data_cleaning/Second_iteration/Gdp/countries_of_interest_data.csv

f367e84

read from original generated CSV file

c6d0d86


colevandersWands requested a review from joshuaSamuel06 February 15, 2024 15:58

colevandersWands added the enhancement New feature or request label Feb 15, 2024

colevandersWands added this to the Data Cleaning milestone Feb 15, 2024

colevandersWands added 4 commits February 15, 2024 17:19

Delete unused countries_of_interest_data.csv

5568dae

Delete copy: merged_Dataset.csv

ece4aa8

use original generated CSV file

5602095

Merge pull request #1 from colevandersWands/main

2a70e2a

cleaning analysis folder: use generated CSV, remove unused CSV

colevandersWands added 3 commits February 21, 2024 09:41

replace path literal with os-generated path for cross-platform support

5b2596e

update with changes from group's main notebook

276c0a7

replaced path literal with os-generated path, ran script to generate data set

cab8cd5
