Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Docs] Some P0 tests are broken #4764

Closed
2 tasks done
Tracked by #5783
iaroslav-ciupin opened this issue Jan 23, 2024 · 8 comments · Fixed by flyteorg/flytesnacks#1766
Closed
2 tasks done
Tracked by #5783

[Docs] Some P0 tests are broken #4764

iaroslav-ciupin opened this issue Jan 23, 2024 · 8 comments · Fixed by flyteorg/flytesnacks#1766
Labels
backlogged For internal use. Reserved for contributor team workflow. bug Something isn't working documentation Improvements or additions to documentation hacktoberfest

Comments

@iaroslav-ciupin
Copy link
Contributor

Description

P0 priority tests data_types_and_io.file.normalize_csv_file and data_types_and_io.folder.download_and_normalize_csv_files are broken. Error:

Message:

    Failed to get data from https://people.sc.fsu.edu/~jburkardt/data/csv/biostats.csv to /tmp/flyte2vp55qml/local_flytekit/7c6f30c3abdb118b985175720c546321/biostats.csv (recursive=False).

Original exception: https://people.sc.fsu.edu/~jburkardt/data/csv/biostats.csv

User error.

GET https://people.sc.fsu.edu/~jburkardt/data/csv/biostats.csv indeed returns 404 Not Found. Probably, examples/tests should not rely on external data sources that are not hosted by Flyte.

Are you sure this issue hasn't been raised already?

  • Yes

Have you read the Code of Conduct?

  • Yes
@iaroslav-ciupin iaroslav-ciupin added documentation Improvements or additions to documentation untriaged This issues has not yet been looked at by the Maintainers labels Jan 23, 2024
@dosubot dosubot bot added the bug Something isn't working label Jan 23, 2024
Copy link

dosubot bot commented Jan 23, 2024

🤖

Hello @iaroslav-ciupin! I'm Dosu, a bot here to assist while we wait for a human maintainer. Whether it's solving bugs, answering your questions, or helping you become a contributor, I'm here to help. Let's work together to get things sorted out!

Thank you for bringing this to our attention. It seems like the tests data_types_and_io.file.normalize_csv_file and data_types_and_io.folder.download_and_normalize_csv_files are indeed failing due to an external data source that is no longer available.

As you correctly pointed out, relying on external data sources that are not hosted by Flyte can lead to such issues. A more reliable approach would be to host the necessary test data within the Flyte repository or another controlled environment.

As for the specific functionalities of these tests, based on the context and naming conventions, it can be inferred that:

  • data_types_and_io.file.normalize_csv_file likely verifies the functionality of a method that normalizes a CSV file. Normalization in this context typically means transforming data to a common scale without distorting differences in the ranges of values or losing information.

  • data_types_and_io.folder.download_and_normalize_csv_files likely checks the functionality of a method that downloads CSV files from provided URLs and then normalizes them.

However, without the actual test files or more context, we cannot confirm this with certainty.

In the meantime, I would recommend updating the tests to use a reliable data source. If the data is not sensitive, consider adding it to the repository or hosting it in a controlled environment that the tests can access.

Please let me know if you need further assistance or have any other questions. Dosu

Sources


This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

1 similar comment
Copy link

dosubot bot commented Jan 23, 2024

🤖

Hello @iaroslav-ciupin! I'm Dosu, a bot here to assist while we wait for a human maintainer. Whether it's solving bugs, answering your questions, or helping you become a contributor, I'm here to help. Let's work together to get things sorted out!

Thank you for bringing this to our attention. It seems like the tests data_types_and_io.file.normalize_csv_file and data_types_and_io.folder.download_and_normalize_csv_files are indeed failing due to an external data source that is no longer available.

As you correctly pointed out, relying on external data sources that are not hosted by Flyte can lead to such issues. A more reliable approach would be to host the necessary test data within the Flyte repository or another controlled environment.

As for the specific functionalities of these tests, based on the context and naming conventions, it can be inferred that:

  • data_types_and_io.file.normalize_csv_file likely verifies the functionality of a method that normalizes a CSV file. Normalization in this context typically means transforming data to a common scale without distorting differences in the ranges of values or losing information.

  • data_types_and_io.folder.download_and_normalize_csv_files likely checks the functionality of a method that downloads CSV files from provided URLs and then normalizes them.

However, without the actual test files or more context, we cannot confirm this with certainty.

In the meantime, I would recommend updating the tests to use a reliable data source. If the data is not sensitive, consider adding it to the repository or hosting it in a controlled environment that the tests can access.

Please let me know if you need further assistance or have any other questions. Dosu

Sources


This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

@eapolinario eapolinario added backlogged For internal use. Reserved for contributor team workflow. and removed untriaged This issues has not yet been looked at by the Maintainers labels Jan 25, 2024
@flyteorg flyteorg deleted a comment from dosubot bot Feb 19, 2024
@10sharmashivam
Copy link
Contributor

10sharmashivam commented Oct 22, 2024

Hi,
I would like to work on this issue. Instead of depending on external resources, I propose that we host the necessary CSV files within the Flyte repository. By creating a folder like test_data/ within flytesnacks/examples/data_types_and_io/, we can store the required CSV files and then use raw GitHub links to access them in the code. This approach will ensure that our tests are stable and not affected by the availability of external files.

Implementing this change will enhance the robustness of our testing framework and make it easier to maintain moving forward.

Please let me know your thoughts on this! #take

@10sharmashivam
Copy link
Contributor

10sharmashivam commented Oct 23, 2024

Hi,

I've created the necessary CSV files in the test_data folder within flytesnacks/examples/data_types_and_io/ as part of my work on this issue. However, I've encountered an issue where the CSV files are not showing up in git status, while other files, like .py, are tracked correctly.

I checked the .gitignore file in the repository and noticed that it might be excluding CSV files. Since I cannot modify the .gitignore myself, I wanted to reach out to see if the exclusion of CSV files was intentional and, if so, what the reasoning behind it might be.

Here are some troubleshooting steps I've taken:

  • Checked for any global .gitignore configurations.
  • Verified file permissions.
  • Created a new test CSV file, which still didn't show up.

Or is there a better place to keep the test data? Like flyte's static-resources

I'd appreciate any guidance or insights you could provide regarding this. Thank you!

@davidmirror-ops
Copy link
Contributor

@eapolinario @cosmicBboy I guess referencing datasets hosted somewhere else helps keep the repo lightweight, but also may affect the example when those datasets are not available anymore in the hardcoded location. What do you think about hosting datasets used in examples in flytesnacks or static-resources?

@eapolinario
Copy link
Contributor

Keeping the necessary datasets in the repo (as long as they are not big files) is the sensible choice. Here's a PR to allow csv files in flytesnacks: flyteorg/flytesnacks#1763

@10sharmashivam
Copy link
Contributor

Thank you @eapolinario @davidmirror-ops for the insights! I agree that hosting the datasets within the repository is a sensible approach, especially to avoid issues with broken links in the future. Also, thanks for creating the PR to allow CSV files in Flytesnacks.

Once the PR#1763 is merged, I’ll go ahead and add the necessary CSV files for the normalization examples. If there’s anything else you think I should consider while preparing my changes, please let me know!

@10sharmashivam
Copy link
Contributor

@davidmirror-ops Since PR for allowing .csv file to be added got merged, I also created test_data folder and and added .csv files. Here is the PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backlogged For internal use. Reserved for contributor team workflow. bug Something isn't working documentation Improvements or additions to documentation hacktoberfest
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants