I couldn't find any benchmarks of TensorFlow Serving and wanted to know the throughput of the native gRPC client compared to a REST-based client that forwards requests to the a gRPC client (which most people are likely to use for external-facing services).
docker-compose run grpc-benchmark
Creating tensorflowservingbenchmark_server_1 ... done 10000 requests (10 max concurrent) 2487.13826442827 requests/second
docker-compose run wsgi-benchmark
Starting tensorflowservingbenchmark_server_1 ... done Creating tensorflowservingbenchmark_wsgi-client_1 ... done This is ApacheBench, Version 2.3 <$Revision: 1706008 $> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/ Licensed to The Apache Software Foundation, http://www.apache.org/ Benchmarking wsgi-client (be patient) ... Server Software: gunicorn/19.7.1 Server Hostname: wsgi-client Server Port: 8000 Document Path: /prediction Document Length: 12 bytes Concurrency Level: 10 Time taken for tests: 7.116 seconds Complete requests: 10000 Failed requests: 0 Total transferred: 1790000 bytes HTML transferred: 120000 bytes Requests per second: 1405.31 [#/sec] (mean) Time per request: 7.116 [ms] (mean) Time per request: 0.712 [ms] (mean, across all concurrent requests) Transfer rate: 245.66 [Kbytes/sec] received Connection Times (ms) min mean[+/-sd] median max Connect: 0 0 0.1 0 1 Processing: 2 7 25.4 6 953 Waiting: 2 7 25.4 6 953 Total: 3 7 25.4 6 954 Percentage of the requests served within a certain time (ms) 50% 6 66% 7 75% 7 80% 7 90% 8 95% 8 98% 10 99% 11 100% 954 (longest request)
docker-compose run tornado-benchmark
Starting tensorflowservingbenchmark_server_1 ... done Starting tensorflowservingbenchmark_tornado-client_1 ... done This is ApacheBench, Version 2.3 <$Revision: 1706008 $> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/ Licensed to The Apache Software Foundation, http://www.apache.org/ Benchmarking tornado-client (be patient) ... Server Software: TornadoServer/4.5.3 Server Hostname: tornado-client Server Port: 8001 Document Path: /prediction Document Length: 15 bytes Concurrency Level: 10 Time taken for tests: 19.140 seconds Complete requests: 10000 Failed requests: 0 Total transferred: 2100000 bytes HTML transferred: 150000 bytes Requests per second: 522.47 [#/sec] (mean) Time per request: 19.140 [ms] (mean) Time per request: 1.914 [ms] (mean, across all concurrent requests) Transfer rate: 107.15 [Kbytes/sec] received Connection Times (ms) min mean[+/-sd] median max Connect: 0 0 0.1 0 2 Processing: 3 19 15.2 17 447 Waiting: 3 19 15.2 16 447 Total: 3 19 15.2 17 447 Percentage of the requests served within a certain time (ms) 50% 17 66% 21 75% 23 80% 25 90% 31 95% 36 98% 44 99% 50 100% 447 (longest request)