Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multi line SQL queries from Python cause problems with GA4GH Search #3

Open
ianfore opened this issue Nov 8, 2020 · 3 comments
Open

Comments

@ianfore
Copy link
Collaborator

ianfore commented Nov 8, 2020

In my BigQuery example I can write a SQL statement which extends over multiple lines e.g.

 	searchClient = BigQuerySearchClient()
 	query = """
 		SELECT submitter_id, read_drs_id
		FROM `isbcgc-216220.onek_genomes.ssd_drs`
 		where population = 'BEB'
 		LIMIT 1"""
 	res = searchClient.runQuery(query)

This is helpful for readability.

If I submit a query formatted this way to DiscoverySearchClient the GA4GH Search implementation gives me an error.

@ianfore ianfore changed the title Resolve the issue with why multi line SQL queries cause problems with GA4GH Search Multi line SQL queries from Python cause problems with GA4GH Search Nov 8, 2020
@sdhutchins
Copy link
Collaborator

I'm going to take a look at this!

@ianfore
Copy link
Collaborator Author

ianfore commented Nov 18, 2020

This is the error

{'error': 'Bad Request',
 'message': 'JSON parse error: Illegal unquoted character ((CTRL-CHAR, code '
            '10)): has to be escaped using backslash to be included in string '
            'value; nested exception is '
            'com.fasterxml.jackson.databind.JsonMappingException: Illegal '
            'unquoted character ((CTRL-CHAR, code 10)): has to be escaped '
            'using backslash to be included in string value\n'
            ' at [Source: (PushbackInputStream); line: 1, column: 10] (through '
            'reference chain: '
            'com.dnastack.ga4gh.search.adapter.presto.SearchRequest["query"])',
 'path': '/search',
 'status': 400,
 'timestamp': '2020-11-18T13:19:32.316+0000'}

The new line characters (ascii 10) that end up in the query are illegal json and are rejected by the parser on the server.

Using the following approach should have some portability across Windows and unix as new line characters vary across platforms. It works for me but I'm going on hearsay about the portability.
query = query.replace("\n", " ").replace("\t", " ")

Note that I also had to replace the tab character that was ending up in the query string. Some will frown that I have python code indented with tabs, but as far as making the code robust goes, protecting against people like me is a good practice.

It makes best sense to handle this replacement in DiscoverySearchClient.

@ianfore
Copy link
Collaborator Author

ianfore commented Nov 18, 2020

Checked in a fix to DiscoverySearchClient.
Keeping this open pending further comment or alternative solutions.

ianfore added a commit that referenced this issue Nov 19, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants