Killing SQL queries #178

Open
julienbaley opened this issue Feb 18, 2016 · 13 comments

@julienbaley
Contributor

Is it a feature that could be added to impyla, or is there no way to do that?

I've tried to use Cursor.cancel_operation, but it didn't seem to do anything. In fact, I've found that Cursor.execute_async was not returning immediately (it basically behaved like execute). Is this a known problem? Do I need some extra configuration?

(I'm using Hive 2 and Python 3.4)

@wesm
Contributor

wesm commented Feb 20, 2016

@julienbaley this was just fixed in 1253050

Can you confirm whether this solved your problem? (I'm going to cut a release next week.)

@julienbaley
Contributor Author

Ah, I should have installed master! I used the latest version from pip and then looked through the code that was already fixed.
I'll have to try on Monday first thing and I'll let you know. Thanks

@julienbaley
Contributor Author

@wesm : it's now working, indeed, thanks :)
There was a problem with the fetch* methods not blocking; I've sent PR #180.
Could you ping me when you make a new release?

@wesm
Contributor

wesm commented Mar 15, 2016

This fix was released

@wesm wesm closed this as completed Mar 15, 2016
@julienbaley
Contributor Author

@wesm Out of curiosity, why have you guys stopped using Git tags / GitHub releases? It makes it difficult to see what's in which release.

@wesm
Contributor

wesm commented Mar 16, 2016

Sorry about that. I just pushed a 0.13.5 tag. The others were mostly bumping dependencies (plus 0.13.4 contained a packaging hiccup that required bumping the release number to push a new tarball)

@julienbaley
Contributor Author

julienbaley commented Apr 25, 2016

@wesm
I guess I should just reopen this one. It is now possible to kill queries, but not in a satisfying way:
I create a cursor and call execute_async on it. In another thread, I call cancel_operation on the same cursor. This doesn't run gracefully, and both execute_async and cancel_operation raise exceptions. While I could expect the execute_async to raise one, I don't see why cancel_operation does.

Is that not what cancel_operation is for? I haven't found how I'm supposed to use it otherwise.
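For readers following along, the two-thread flow described above can be sketched roughly as follows. This is only an illustration: `FakeCursor` is a hypothetical stand-in that contacts no server, and polling with `is_executing` is an assumption about how the worker thread waits for the query. Only the names `execute_async` and `cancel_operation` come from this thread.

```python
import threading
import time


class FakeCursor:
    """Hypothetical stand-in for an impyla cursor; no server is contacted."""

    def __init__(self):
        self._cancelled = threading.Event()
        self._started = False

    def execute_async(self, sql):
        # Returns immediately; the "query" counts as running from here on.
        self._started = True

    def is_executing(self):
        return self._started and not self._cancelled.is_set()

    def cancel_operation(self):
        self._cancelled.set()


def run_query(cursor, sql, poll_interval=0.01):
    """Thread 1: submit the query, then poll until it stops running."""
    cursor.execute_async(sql)
    while cursor.is_executing():
        time.sleep(poll_interval)


cur = FakeCursor()
worker = threading.Thread(target=run_query, args=(cur, "SELECT * FROM big_table"))
worker.start()

# Thread 2 (here: the main thread) plays the user pressing "cancel" in the UI.
time.sleep(0.05)
cur.cancel_operation()
worker.join(timeout=1)
```

With a real cursor, both calls would go to the server over the same connection, which is where the exceptions reported here show up.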

@wesm wesm reopened this Apr 25, 2016
@wesm
Contributor

wesm commented Apr 25, 2016

Is cancel_operation being called before execute_async has completed? It sounds like there may be a thread-safety issue (we may need to add a lock to the Cursor object).
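A minimal sketch of what such a lock could look like, assuming it is added as a wrapper around the cursor rather than inside impyla itself. `LockedCursor` and `RecordingCursor` are hypothetical names made up for this illustration; the stand-in cursor only records whether two calls ever overlapped.

```python
import threading
import time


class LockedCursor:
    """Hypothetical wrapper that serialises calls made on one shared cursor."""

    def __init__(self, cursor):
        self._cursor = cursor
        self._lock = threading.Lock()

    def execute_async(self, sql):
        with self._lock:
            return self._cursor.execute_async(sql)

    def cancel_operation(self):
        with self._lock:
            return self._cursor.cancel_operation()


class RecordingCursor:
    """Stand-in cursor that notes whether two calls ever ran at the same time."""

    def __init__(self):
        self.calls = []
        self.overlap = False
        self._active = 0

    def _record(self, name):
        self._active += 1
        if self._active > 1:
            self.overlap = True
        time.sleep(0.001)  # widen the window in which an overlap could occur
        self.calls.append(name)
        self._active -= 1

    def execute_async(self, sql):
        self._record("execute_async")

    def cancel_operation(self):
        self._record("cancel_operation")


raw = RecordingCursor()
cur = LockedCursor(raw)
threads = [threading.Thread(target=cur.execute_async, args=("SELECT 1",))]
threads += [threading.Thread(target=cur.cancel_operation) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

A lock like this only helps if execute_async really returns quickly. If it blocked for the duration of the query, cancel_operation would wait on the lock until the query finished, which defeats the purpose.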

@julienbaley
Contributor Author

I'm calling it after. Do I misunderstand the name of the function? From the name, I take cancel_operation to allow me to kill a running query (which it does, but raises exceptions on both ends).
This is very useful if the user realises that their query is going to take an entire day to complete because e.g. they forgot the WHERE clause.

@wesm
Contributor

wesm commented Apr 25, 2016

Could you break this down more concretely? I'm still not 100% clear on what is happening. It sounds like:

  • Thread 1: cur.execute_async(sql) <-- this succeeds
  • Thread 2: cur.cancel_operation() <-- this fails, but is invoked after Thread 1's call was successful

Is that not the case? Or can you explain where execute_async is failing? I don't understand what "raises exceptions on both ends" means.

If execute_async fails, then it may not be appropriate to call cancel_operation (which relies on execute_async having been successful)

If I'm misunderstanding, can you please give more detail on the exact control flow and the stack traces? Thank you.

@julienbaley
Contributor Author

Sure, here is what I run:
Thread 1: cur.execute_async(sql)  # runs for hours
Thread 2 (while Thread 1 is running): cur.cancel_operation()  # triggered by the user in a web UI when they realise that the query in Thread 1 is going to make the server crash. The goal is to make Hive stop running the query from Thread 1. (Does this make sense?)

(I'm not at work so I can't check, but from memory)
Thread 1 raises a ThriftProtocolException(type=4).
Thread 2 raises a HiveServer2Error.

Of course, if it happens just because I'm using cancel_operation for something it's not intended for, then fine, I can just catch the exceptions and it's ok.
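The "just catch the exceptions" workaround could look something like the sketch below. `cancel_quietly` and the two stand-in cursors are hypothetical names used only for illustration. With a real cursor, the `expected` tuple would presumably hold the HiveServer2Error mentioned above (assumed here to be importable from `impala.error`) rather than the broad default.

```python
import logging

logger = logging.getLogger(__name__)


def cancel_quietly(cursor, expected=(Exception,)):
    """Try to cancel; return True only if cancel_operation raised nothing."""
    try:
        cursor.cancel_operation()
    except expected as exc:
        # The query may still have been killed server-side; just note the error.
        logger.warning("cancel_operation raised %r", exc)
        return False
    return True


class OkCursor:
    """Stand-in cursor whose cancel succeeds silently."""

    def cancel_operation(self):
        pass


class FailingCursor:
    """Stand-in cursor whose cancel raises, mimicking the behaviour reported."""

    def cancel_operation(self):
        raise RuntimeError("simulated server-side error")


clean = cancel_quietly(OkCursor())
noisy = cancel_quietly(FailingCursor())
```

Returning a flag instead of re-raising lets the web UI report that the cancel request was sent even when the call itself complained.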

@wesm
Contributor

wesm commented Apr 25, 2016

I see. execute_async in theory should not block, so that suggests that the runAsync option of the Thrift request is not working correctly. So this is either a Hive bug or an Impyla bug -- any way you can help debug this further?

@julienbaley
Contributor Author

execute_async is not blocking, as far as I can see (it used to block, but that was fixed a few months ago). I can try to find the root of the problem and come back here.


2 participants