Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/translate query to python snippet #98

Merged
merged 8 commits into from
May 22, 2024

Conversation

aswallace
Copy link
Contributor

@aswallace aswallace commented May 13, 2024

Context

In the Code Snippet section, provide the code that would allow the File Explorer query to be converted to Python (pandas)
Should close #80

Testing

Tested manually in Python 3.8-3.10 using the Variance Paper dataset.
Works as is for Pandas 1.5-2.2. I tested manually down to Python 3.7 and pandas 1.3, but the groupby method changed after 1.5 so will throw an error for older versions. It's not a huge deprecation so if we wanted we could make it more dynamic or add a dropdown option for older Python/pandas versions

To Dos:

See #99
Currently only works for external data sources & string type fields, which means it doesn't consider range filters and the sorting may not work as expected for numerical and date fields.

Screenshot

image

@aswallace aswallace requested a review from SeanLeRoy May 13, 2024 17:44
const extension = collection.uri.substring(collection.uri.lastIndexOf(".") + 1);
// Currently suggest setting all fields to strings; otherwise pandas assumes type conversions
// TO DO: Address different non-string type conversions
const code = `df = pandas.read_${extension}('${collection.uri}').astype('str')`;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per the comment: Even though external data sources are read into FES app as strings, when pandas converts them to a dataframe it tries to guess datatypes unless we strictly enforce .astype

@SeanLeRoy
Copy link
Contributor

Works as is down to Pandas 1.5. Was unsure if we want to enforce versioning in the code (e.g., change it to pip install pandas>=1.5). I tested manually down to Python 3.7 and pandas 1.3, but the groupby method changed after 1.5 so will throw an error for older versions. It's not a huge deprecation so if we wanted we could make it more dynamic

Lets add the version floor to pandas to make sure it works for people / is explicit. I'd also include a note at the top that says this only works for Python version 3.7+

@aswallace
Copy link
Contributor Author

aswallace commented May 13, 2024

Lets add the version floor to pandas to make sure it works for people / is explicit. I'd also include a note at the top that says this only works for Python version 3.7+

Will do! Also since we have that dropdown and the difference isn't substantial, I could theoretically differentiate options for Python 3.8+ pandas >= 1.5 and add a separate dropdown option for pandas < 1.5

@aswallace
Copy link
Contributor Author

Lets add the version floor to pandas

Added in 34b3d35

Copy link
Contributor

@BrianWhitneyAI BrianWhitneyAI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the code snippet is confusing when not using your own datasource. Having the option seems like it is supported for AICS FMS. Maybe include some logic that disables the code snippet option if the source url is undefined/not readable?

packages/core/entity/FileExplorerURL/index.ts Outdated Show resolved Hide resolved
packages/core/entity/FileExplorerURL/index.ts Show resolved Hide resolved
Copy link
Contributor

@SeanLeRoy SeanLeRoy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolving brian's suggestions then merging sounds good to me 👍

@aswallace
Copy link
Contributor Author

The only comment I didn't resolve was the windows part, and it's partially because with the new data source system I don't think I have access to the file paths of the data sources anymore. Unsure how best to address this

@aswallace aswallace requested a review from SeanLeRoy May 21, 2024 21:00
@aswallace aswallace merged commit ab23c6b into main May 22, 2024
3 checks passed
@aswallace aswallace deleted the feature/translate-query-to-python-snippet branch May 22, 2024 17:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Translate query into Python Snippet
4 participants