-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/translate query to python snippet #98
Conversation
const extension = collection.uri.substring(collection.uri.lastIndexOf(".") + 1); | ||
// Currently suggest setting all fields to strings; otherwise pandas assumes type conversions | ||
// TO DO: Address different non-string type conversions | ||
const code = `df = pandas.read_${extension}('${collection.uri}').astype('str')`; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per the comment: Even though external data sources are read into FES app as strings, when pandas converts them to a dataframe it tries to guess datatypes unless we strictly enforce .astype
Lets add the version floor to pandas to make sure it works for people / is explicit. I'd also include a note at the top that says this only works for Python version 3.7+ |
Will do! Also since we have that dropdown and the difference isn't substantial, I could theoretically differentiate options for Python 3.8+ pandas >= 1.5 and add a separate dropdown option for pandas < 1.5 |
Added in 34b3d35 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the code snippet is confusing when not using your own datasource. Having the option seems like it is supported for AICS FMS. Maybe include some logic that disables the code snippet option if the source url is undefined/not readable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolving brian's suggestions then merging sounds good to me 👍
…er-app into feature/translate-query-to-python-snippet
The only comment I didn't resolve was the windows part, and it's partially because with the new data source system I don't think I have access to the file paths of the data sources anymore. Unsure how best to address this |
Context
In the Code Snippet section, provide the code that would allow the File Explorer query to be converted to Python (pandas)
Should close #80
Testing
Tested manually in Python 3.8-3.10 using the Variance Paper dataset.
Works as is for Pandas 1.5-2.2. I tested manually down to Python 3.7 and pandas 1.3, but the
groupby
method changed after 1.5 so will throw an error for older versions. It's not a huge deprecation so if we wanted we could make it more dynamic or add a dropdown option for older Python/pandas versionsTo Dos:
See #99
Currently only works for external data sources & string type fields, which means it doesn't consider range filters and the sorting may not work as expected for numerical and date fields.
Screenshot