GraphQL Apache Arrow Flight Transport vs ASGI idea #867
Replies: 3 comments
Here are the test results with the client script running on a laptop over home Wi-Fi against the same server on a Windows desktop. Over the network Arrow Flight really improved transport speed, which is what I expected, but the speedup should be scaling up as data size increases, and it is not. I need to talk to the Arrow folks. I suspect that list structures are not being serialized into vectors.
For 10 records it was about 0.047 seconds faster, or 80% faster.
Also here is the data structure that is sent across the wire:
pyarrow.Table data: [
===============
ASGI Fetch Time for 10 records:0.05800437927246094
Server side data generation time:0.0
Arrow Flight Fetch Time for 10 records:0.011480569839477539
Server side data generation time:0.0
===============
ASGI Fetch Time for 10000 records:0.17502927780151367
Server side data generation time:0.03500080108642578
Arrow Flight Fetch Time for 10000 records:0.07013273239135742
Server side data generation time:0.03199934959411621
===============
ASGI Fetch Time for 10000000 records:100.53821158409119
Server side data generation time:30.61162829399109
Arrow Flight Fetch Time for 10000000 records:59.43856120109558
Server side data generation time:30.664714813232422
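To make the scaling concern concrete, here is a rough sketch that subtracts the server-side generation time from each fetch time in the network results above and reports how many times faster Arrow Flight was at each size. It assumes the fetch time includes the generation time (the local benchmark output reports the same subtraction as "minus server side"); the names network_runs, transport_seconds and speedup_by_size are made up for this illustration and are not part of the benchmark script.

```python
# (total fetch seconds, server-side generation seconds), copied from the
# network results above. Keys are record counts.
network_runs = {
    10: {
        "asgi": (0.05800437927246094, 0.0),
        "flight": (0.011480569839477539, 0.0),
    },
    10_000: {
        "asgi": (0.17502927780151367, 0.03500080108642578),
        "flight": (0.07013273239135742, 0.03199934959411621),
    },
    10_000_000: {
        "asgi": (100.53821158409119, 30.61162829399109),
        "flight": (59.43856120109558, 30.664714813232422),
    },
}


def transport_seconds(fetch, generation):
    """Assumption: fetch time includes generation time, so subtract it."""
    return fetch - generation


def speedup_by_size(runs):
    """Return {record count: ASGI transport time / Arrow Flight transport time}."""
    ratios = {}
    for records, run in runs.items():
        asgi = transport_seconds(*run["asgi"])
        flight = transport_seconds(*run["flight"])
        ratios[records] = asgi / flight
    return ratios


if __name__ == "__main__":
    for records, ratio in speedup_by_size(network_runs).items():
        print(f"{records} records: Arrow Flight is {ratio:.2f}x faster")
```

With these numbers the ratio shrinks as the record count grows, which is the opposite of what a columnar transport should show and matches the suspicion that something is not being serialized as vectors.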
In an ideal scenario, a forked version of GraphQL Playground would also communicate with a GraphQL server using Arrow Flight, and this REST-style JSON response could be abandoned. Arrow Flight clients can be written in Python, Java, JavaScript, Ruby, etc. Streamlit recently switched to using Arrow to move data from Python environments into JavaScript web components.
Here's a similar GraphQL implementation using Go and Arrow Flight: "GraphQL and Apache Arrow: A Match Made in Data".
So I was thinking about GraphQL schemas over the weekend and how well they fit the Apache Arrow columnar format. The Apache Arrow project has its own RPC protocol called Arrow Flight, which uses gRPC + HTTP/2 + custom payloads to send columnar batches of data across the wire.
This could be incorporated into a new pyarrow.graphql file like how asgi is added in asgi.graphql.
I decided to fork the Ariadne Repo and put together some benchmarks using an Arrow GraphQL Server.
This would include:
A Python Arrow Flight server receiving the query and running graphql_sync to produce a Python object (lists of 1k, 100k, 10 million, etc. random ints, floats and strings).
Convert the Python object to an Arrow columnar object.
Send the Arrow columnar object back to the client.
A Python Arrow client sending the GraphQL query as an Arrow Flight Ticket.
Client converts the Arrow columnar object into a standard Python dictionary.
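The two conversion steps above can be pictured with a small pure-Python sketch. It is only a stand-in for the columnar layout: the benchmark itself uses pyarrow (which offers pyarrow.Table.from_pydict and Table.to_pydict for this kind of conversion), and the helper names and the id/score/name fields below are hypothetical. It also assumes every row carries the same fields.

```python
def rows_to_columns(rows):
    """Pivot a list of row dicts into a dict of column lists.

    Assumes every row has the same keys; a missing key would misalign columns.
    """
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def columns_to_rows(columns):
    """Inverse pivot: rebuild the list of row dicts from column lists."""
    names = list(columns)
    return [dict(zip(names, values)) for values in zip(*columns.values())]


if __name__ == "__main__":
    # Hypothetical GraphQL result rows (ints, floats and strings).
    rows = [
        {"id": 1, "score": 0.5, "name": "a"},
        {"id": 2, "score": 1.5, "name": "b"},
    ]
    columns = rows_to_columns(rows)
    print(columns)
    print(columns_to_rows(columns) == rows)
```

Each column ends up as one homogeneous list, which is what lets a columnar format ship it as a single typed buffer instead of repeating field names per row the way a JSON response does.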
I ran both the client and server on my local PC, but I suspect (will test later) that splitting the client and server across a network will really show how Arrow Flight improves transport speed.
Even on my local PC the Arrow Flight results were roughly 33% faster than ASGI.
Here's the code and how to run it:
https://github.com/davlee1972/ariadne_arrow/blob/arrow_flight/benchmark/test_asgi_server.py
Standard ASGI server. Start it up with uvicorn --host=0.0.0.0 test_asgi_server:app
https://github.com/davlee1972/ariadne_arrow/blob/arrow_flight/benchmark/test_arrow_flight_server.py
New Arrow Flight server. Start it up with python test_arrow_flight_server.py
https://github.com/davlee1972/ariadne_arrow/blob/arrow_flight/benchmark/test_asgi_arrow_client.py
Benchmarking script. Pass in length of lists to test and server host.
python test_asgi_arrow_client.py -l 10 -s 127.0.0.1
python test_asgi_arrow_client.py -l 10000 -s 127.0.0.1
python test_asgi_arrow_client.py -l 10000000 -s 127.0.0.1
Arrow Flight was faster in all 3 test cases (after subtracting server-side data generation time):
For 10 records it was 0.002 seconds faster.
For 10000 records it was 0.025 seconds faster.
For 10 million records it was 14 seconds faster, or 33%.
===============
ASGI Fetch Time for 10 records:0.00400233268737793
ASGI Convert JSON to Dictionary Total Time:0.005000591278076172
Server side data generation time:0.0
Actual ASGI minus server side:0.005000591278076172
Arrow Flight Fetch Time for 10 records:0.0030961036682128906
Arrow Flight Convert Arrow to Dictionary Total Time:0.0030961036682128906
Server side data generation time:0.0
Actual Arrow Flight minus server side:0.0030961036682128906
===============
ASGI Fetch Time for 10000 records:0.14105224609375
ASGI Convert JSON to Dictionary Total Time:0.14405202865600586
Server side data generation time:0.05375194549560547
Actual ASGI minus server side:0.09030008316040039
Arrow Flight Fetch Time for 10000 records:0.11061930656433105
Arrow Flight Convert Arrow to Dictionary Total Time:0.11061930656433105
Server side data generation time:0.04554128646850586
Actual Arrow Flight minus server side:0.0650780200958252
===============
ASGI Fetch Time for 10000000 records:73.41773104667664
ASGI Convert JSON to Dictionary Total Time:76.54971837997437
Server side data generation time:32.58742570877075
Actual ASGI minus server side:43.96229267120361
Arrow Flight Fetch Time for 10000000 records:60.792750120162964
Arrow Flight Convert Arrow to Dictionary Total Time:60.792750120162964
Server side data generation time:31.123136520385742
Actual Arrow Flight minus server side:29.66961359977722
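The differences quoted above can be rechecked from the "minus server side" lines of the output. This is just a small arithmetic sketch; the timings dictionary and the compare helper are names made up for this example and are not part of the benchmark script.

```python
# "Actual ... minus server side" timings in seconds, copied from the output
# above as (ASGI, Arrow Flight). Keys are record counts.
timings = {
    10: (0.005000591278076172, 0.0030961036682128906),
    10_000: (0.09030008316040039, 0.0650780200958252),
    10_000_000: (43.96229267120361, 29.66961359977722),
}


def compare(asgi_seconds, flight_seconds):
    """Return (seconds saved, percent reduction relative to the ASGI time)."""
    saved = asgi_seconds - flight_seconds
    return saved, 100.0 * saved / asgi_seconds


if __name__ == "__main__":
    for records, (asgi, flight) in timings.items():
        saved, percent = compare(asgi, flight)
        print(f"{records} records: {saved:.3f} s saved, {percent:.1f}% less time")
```

The percentage here is measured against the ASGI time, which is the reading under which the 10 million record run comes out at roughly a third.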