Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable Multiple Data Sources #109

Merged
merged 18 commits into from
Jun 17, 2024
Merged

Conversation

SeanLeRoy
Copy link
Contributor

@SeanLeRoy SeanLeRoy commented May 21, 2024

Description
This changeset enables users to select more than one data source in a single query. This is largely a proof-of-concept implementation of this feature as I think we could spend more time to make this much more performant like by perhaps creating views that are representations of a query's data sources.

How to review
Most of the changes to look for are in the SQLBuilder & logics.tsx

Testing
Tested on desktop and in the web adding various types of data source combination and performing queries.

@SeanLeRoy SeanLeRoy self-assigned this May 21, 2024
@SeanLeRoy SeanLeRoy changed the title Feature/multi data source support Enable Multiple Data Sources May 21, 2024
@SeanLeRoy SeanLeRoy marked this pull request as ready for review May 21, 2024 19:44
Copy link
Contributor

@aswallace aswallace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having some trouble testing locally. Main thing is that when I use two different source files that should have the same columns, it shows up like this with null columns:
image
The data from these two sources came from the same file, I just downloaded them as two separate files and then re-selected them each as datasources in the same query

Also, if I add a third source I get

Error: Parser Error: syntax error at or near "JOIN"
LINE 5: ...,355,118,721"."File Path") AS "File Path"
            FROM "file-selection-Tue...

@@ -578,7 +608,7 @@ const replaceDataSourceLogic = createLogic({
deps.ctx.replaceDataSourceAction = deps.action;
const queries = selectionSelectors.getQueries(deps.getState());
const updatedQueries = queries.map((query) => {
if (query.parts.source?.name !== replacementDataSource.name) {
if (query.parts.sources[0]?.name !== replacementDataSource.name) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't load for me locally until I put query.parts.source?.[0]?.name. It's possible that this was just a first time load error?

packages/web/src/services/DatabaseServiceWeb.ts Outdated Show resolved Hide resolved
@BrianWhitneyAI
Copy link
Contributor

BrianWhitneyAI commented May 22, 2024

image
Seeing this when trying to load in various datasets locally.

fro variance set

:9001/?source=%7B%22…ri%22%3A%7B%7D%7D:1 Uncaught (in promise) Catalog Error: Table with name Variance Paper Dataset does not exist!
Did you mean "pg_database"?
LINE 4:             FROM "Variance Paper Dataset"

And this for the second dataset

localhost/:1 Uncaught (in promise) Catalog Error: Table with name AIND Image Atlas - AIND Image Atlas does not exist!
Did you mean "pg_settings"?
LINE 4:             FROM "AIND Image Atlas - AIND Image Atlas"

@aswallace
Copy link
Contributor

aswallace commented Jun 10, 2024

I'm now able to select, filter, group on multiple data sources, both web and desktop. Still having issues with downloading manifests:
Web:

  • I can download successfully for non AICS sources, but when I try to download from AICS source it claims to download successfully, but then the browser gives me a "File wasn't available on site" error and doesn't actually complete the download

Desktop:

  • Still having the header issue with downloading CSVs
  • I'm still seeing the error Brian mentioned when it tries to reload an external data source, and the modal + error won't go away even if I select a new file

@SeanLeRoy
Copy link
Contributor Author

I'm now able to select, filter, group on multiple data sources, both web and desktop. Still having issues with downloading manifests: Web:

  • I can download successfully for non AICS sources, but when I try to download from AICS source it claims to download successfully, but then the browser gives me a "File wasn't available on site" error and doesn't actually complete the download
    This was an issue with the type I was sending to the databaseservice, should work now.

Desktop:

  • Still having the header issue with downloading CSVs
    Added option "HEADER" to CSV exports, should work
  • I'm still seeing the error Brian mentioned when it tries to reload an external data source, and the modal + error won't go away even if I select a new file
    Fixed, the cache of annotations was saving [] as a valid list of annotations, seems to be working now

This should all be fixed now.

@aswallace
Copy link
Contributor

This should all be fixed now.

Confirmed that these fixes all worked for me!
Only running into a couple small issues now:

  • Desktop: Sometimes when I try to add a new query/data source it adds it to the old/current query rather than creating a new one? I'm not exactly sure how to reproduce
  • If the columns aren't all exactly the same for both data sources (e.g., one is missing one column), I get errors (first screenshot is desktop, second is web):
Screenshot 2024-06-13 at 4 29 09 PM image
  • Saw that when a query with multiple data sources has a source changed, it always throws an error even if it's successful: Uncaught (in promise) Catalog Error: Table with name AICS-13, AICS-5 does not exist! Did you mean "AICS-13"? FROM "AICS-13, AICS-5"

Also two future feature requests:

  • would be really nice if we could delete queries without opening them, since clicking on them to delete means the app automatically attempts to run the query, which can be annoying if it's an old data source I don't intend to replace
  • Some errors don't render in the file list UI, so it can look like it's just loading forever if you don't know to check console

@aswallace aswallace self-requested a review June 14, 2024 17:17
@SeanLeRoy SeanLeRoy merged commit 0a53f42 into main Jun 17, 2024
3 checks passed
@SeanLeRoy SeanLeRoy deleted the feature/multi-data-source-support branch June 17, 2024 16:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants