Google Summer Of Code '17 (PSF)
- To identify the set of regressed benchmarks in the performance suite.
- To find the reasons for these regressions.
- To fix the benchmark suite itself wherever possible.
- To come up with new benchmarks for modules that do not have benchmarks yet.
NOTE: For parallel reports on progress see here
This repository contains the source files of the project, but it is suggested to use the blog for navigation.
- NOTE: For system-specs see here
Regressed benchmarks:
+-------------------------+----------+----------+------------------------+
| Benchmark               | py2      | py3      | Times slower           |
+-------------------------+----------+----------+------------------------+
| python_startup_no_site  | 9.42 ms  | 26.0 ms  | 2.76x slower (+176%)   |
| python_startup          | 19.2 ms  | 42.3 ms  | 2.21x slower (+121%)   |
| spectral_norm           | 194 ms   | 259 ms   | 2.20x slower (+120%)   |
| sqlite_synth            | 6.70 us  | 8.49 us  | 1.27x slower (+27%)    |
| crypto_pyaes            | 158 ms   | 199 ms   | 1.26x slower (+26%)    |
| xml_etree_parse         | 193 ms   | 242 ms   | 1.25x slower (+25%)    |
| xml_etree_iterparse     | 154 ms   | 179 ms   | 1.16x slower (+16%)    |
| go                      | 439 ms   | 493 ms   | 1.12x slower (+12%)    |
+-------------------------+----------+----------+------------------------+
- Startup-time
Startup-time is the most regressed benchmark.
+-------------------------+----------+-------------------------------+
| python_startup          | 19.2 ms  | 42.3 ms: 2.21x slower (+121%) |
+-------------------------+----------+-------------------------------+
| python_startup_no_site  | 9.42 ms  | 26.0 ms: 2.76x slower (+176%) |
+-------------------------+----------+-------------------------------+
- Import time of specific modules:
  * encodings + encodings.utf_8 + encodings.latin_1 took 2.5 ms
  * io + abc + _weakrefset took 1.2 ms
  * _collections_abc took 2.1 ms
  * sysconfig + _sysconfigdata took 0.9 ms
- py2 uses the "read method of the file object", which was done away with in py3. py3 imports the io module (the prime reason for it being slow), and TextIOWrapper then uses the encoding passed to its constructor. So the following were the suggested solutions:
  i) Improving the time spent in areas of the 'abc' module like these, because getattr(value, "__isabstractmethod__", False) is called for all class attributes of ABCs (including subclasses of ABCs). It is slow because:
     - When the value is not an abstractmethod, an AttributeError is raised and cleared internally.
     - getattr uses the method cache (via _PyType_Lookup), but __isabstractmethod__ mostly lives in the instance dict, so checking the method cache is mostly wasted effort.
     (A minimal sketch of this check appears after this list.)
  ii) Avoiding the import of the whole of sysconfig and importing only the variables required. This is fixed in this "bpo" thread.
  iii) Avoiding the import of uncommon modules.
  iv) If a module excluded from startup is very common (like functools, pathlib, os, enum, collections, re), it will be imported during "application" startup anyway, so for such common modules a faster import time is better than avoiding the import.
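To make the abc overhead concrete, here is a minimal sketch of the kind of probe described in (i). It is not CPython's actual implementation, and the class and function names are made up for illustration; it only shows why getattr with a default falls back so often.

```python
import abc


class Base(abc.ABC):
    @abc.abstractmethod
    def run(self):
        ...

    def helper(self):
        # Ordinary method: it has no __isabstractmethod__ attribute, so the
        # probe below raises and clears an AttributeError internally.
        return 42


def collect_abstract_names(cls):
    """Mimic the scan ABCMeta performs: probe every attribute for
    __isabstractmethod__ with a default, which is the slow pattern
    described above."""
    return {
        name
        for name in dir(cls)
        if getattr(getattr(cls, name, None), "__isabstractmethod__", False)
    }


print(collect_abstract_names(Base))  # {'run'}
```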
- Idea of parallelizing marshalling: If we could somehow parallelize marshalling, and thus "loading" (not "executing"), it could speed up imports and hence "startup time". But it won't improve things drastically, as loading is a small fraction of execution time. E.g., for a complex module like "typing" execution takes about 29x longer than loading, while for smaller ones like "abc" it is about 4x longer. (A rough sketch of this comparison follows below.)
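As a rough illustration of why the ceiling is low, the sketch below compares the time to unmarshal a module's compiled code object ("loading") with the time to execute it. The module names, iteration counts and methodology are only assumptions, not the measurements quoted above.

```python
import importlib.util
import marshal
import timeit


def load_vs_exec(module_name, number=50):
    """Compare unmarshalling a module's code object with executing it.
    Re-executing stdlib code like this has side effects, so treat the
    numbers only as a rough illustration."""
    spec = importlib.util.find_spec(module_name)
    source = spec.loader.get_source(module_name)
    if source is None:  # e.g. frozen or extension modules
        return None
    code = compile(source, spec.origin, "exec")
    blob = marshal.dumps(code)

    load = timeit.timeit(lambda: marshal.loads(blob), number=number)
    execute = timeit.timeit(lambda: exec(code, {"__name__": module_name}),
                            number=number)
    return load, execute


for name in ("abc", "typing"):
    result = load_vs_exec(name)
    if result is None:
        print(f"{name}: source not available (e.g. frozen), skipped")
        continue
    load, execute = result
    print(f"{name}: executing is ~{execute / load:.0f}x the cost of loading")
```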
- NOTE: For a comparison relating to the C portion of the code see here.
- Next I tried writing a C version of WeakSet (here), but it wasn't approved by Raymond Hettinger as it would have been difficult to maintain.
- On Nick Coghlan's suggestion I tried whether we could:
  - Push "commonly-imported" modules into a separate zip archive
  - Seed sys.modules with the contents of that archive
  - Freeze the import of those modules
  I wrote a Python script to create a zip archive from the common modules and ran the different versions of Python inside Docker containers (a simplified sketch of the archive-building step follows below). See this blog entry for more details. But it was realised that this might not reap huge benefits, because writing a custom importer already requires importing some common modules, and Python by itself adds a .zip of the library to sys.path.
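A simplified sketch of the archive-building side of that experiment. The module list and archive name are illustrative, and only single-file top-level modules are bundled; the actual script linked above differs in detail.

```python
import importlib.util
import sys
import zipfile

# Illustrative selection: single-file, top-level stdlib modules only, since
# packages would also need their __init__.py laid out inside the archive.
COMMON_MODULES = ["abc", "functools", "enum", "types"]


def build_archive(path="common_modules.zip", modules=COMMON_MODULES):
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as archive:
        for name in modules:
            spec = importlib.util.find_spec(name)
            if spec and spec.origin and spec.origin.endswith(".py"):
                # Store abc.py, functools.py, ... at the archive root so the
                # built-in zipimport machinery can find them by name.
                archive.write(spec.origin, name + ".py")
    return path


if __name__ == "__main__":
    archive = build_archive()
    # Putting the archive first means later imports of these modules are
    # served from the zip (modules already imported at startup are unaffected).
    sys.path.insert(0, archive)
    print("archive prepended to sys.path:", sys.path[0])
```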
- Lazy loading: I used a custom lazy-loader/importer for modules imported during "startup", to prevent the import of modules which are not necessary and to explore a possible decrease in startup time. Here's a blog entry explaining the implementation and the code for the custom lazy-loader/importer. But lazy loading didn't decrease the startup time and rather increased it slightly (mostly because the lazy-loader already imports some common modules itself). A generic sketch of the idea follows below.
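The custom loader linked above differs in detail, but the general idea can be shown with the stdlib's own importlib.util.LazyLoader. This is a generic sketch, not the project's code.

```python
import importlib.util
import sys


def lazy_import(name):
    """Create the module object now, but defer executing its code until the
    first attribute access (the stdlib LazyLoader recipe)."""
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)
    return module


json = lazy_import("json")            # nothing heavy has executed yet
print(json.dumps({"lazy": True}))     # first attribute access triggers the real load
```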
- There was also a suggestion to write Cython versions of the common modules, but it wasn't pursued due to lack of time.
- Optimizing the "logging" benchmark: py3 showed regressions on the logging benchmarks:

+-------------------------+----------+-------------------------------+
| logging_format          | 57.7 us  | 75.1 us: 1.30x slower (+30%)  |
+-------------------------+----------+-------------------------------+
| logging_silent          | 818 ns   | 1.00 us: 1.22x slower (+22%)  |
+-------------------------+----------+-------------------------------+
| logging_simple          | 46.2 us  | 70.0 us: 1.51x slower (+51%)  |
+-------------------------+----------+-------------------------------+

This was fixed by this PR. See this blog entry for details.
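For context, the "silent" and "format" cases boil down to call paths like the ones below. This is only a rough timeit sketch, not the suite's own runner or benchmark code.

```python
import io
import logging
import timeit

logger = logging.getLogger("bench")
logger.addHandler(logging.StreamHandler(io.StringIO()))  # keep output off the console
logger.setLevel(logging.WARNING)

# "Silent" path: the record is dropped by the level check before formatting.
silent = timeit.timeit(lambda: logger.debug("dropped %d", 1), number=100_000)
# Formatting path: the record passes the level check and is formatted/emitted.
formatted = timeit.timeit(lambda: logger.warning("value=%d name=%s", 42, "x"),
                          number=100_000)
print(f"silent: {silent:.3f}s  formatted: {formatted:.3f}s  (100k calls each)")
```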
- Other benchmarks (pickle, sqlite_synth, crypto_pyaes):
  - Pickle/Unpickle was realised to be of lower practical importance; see here.
  - For sqlite_synth see here.
  - For crypto_pyaes see here.
- There was some work on FAT Python, but it wasn't pursued as it didn't pass the test suite and generated incorrect bytecode. PRs merged: #13 and #12.
The present performance-suite lacks benchmarks for many of the common library modules.
- zlib: This benchmark tries to measure basic compression and decompression using zlib. It showed significant regression as the length of the binary string increased. The code and the details to reproduce are in this blog entry; a simplified sketch follows below.
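A simplified version of the idea: the payload construction, sizes and iteration counts here are assumptions, and the real benchmark in the linked entry may differ.

```python
import timeit
import zlib

for size in (2 ** 12, 2 ** 16, 2 ** 20):                 # ~4 KiB, ~64 KiB, ~1 MiB
    payload = b"benchmark-data-" * (size // 15)           # repetitive binary string
    compressed = zlib.compress(payload)
    c = timeit.timeit(lambda: zlib.compress(payload), number=100)
    d = timeit.timeit(lambda: zlib.decompress(compressed), number=100)
    print(f"{len(payload):>8} bytes  compress: {c:.3f}s  decompress: {d:.3f}s")
```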
- math: This benchmark measures basic math operations, and here py2 comes out faster. Here's the blog entry for the code and statistics; an illustrative sketch follows below.
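An illustrative kernel of the kind of arithmetic such a benchmark exercises; the exact operations and counts in the linked code are not reproduced here.

```python
import math
import timeit


def math_kernel(n=10_000):
    # Mix of float arithmetic and math-module calls as a stand-in workload.
    total = 0.0
    for i in range(1, n):
        total += math.sqrt(i) + math.sin(i) * math.cos(i)
    return total


elapsed = timeit.timeit(math_kernel, number=100)
print(f"100 runs of the kernel took {elapsed:.3f}s")
```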
- smtplib: The smtplib module defines an SMTP client session object that can be used to send mail to any Internet machine with an SMTP or ESMTP listener daemon. This benchmark measures that performance, and again py3 regresses. Here's the entry for the code; a client-side sketch follows below.
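A client-side sketch of the measurement only. It assumes a local debugging SMTP server is already running (for example, started with `python -m smtpd -n -c DebuggingServer localhost:8025`), and the host, port and message contents are illustrative rather than the linked benchmark's setup.

```python
import smtplib
import timeit
from email.mime.text import MIMEText

HOST, PORT = "localhost", 8025        # assumed local test server


def send_one():
    msg = MIMEText("benchmark body")
    msg["Subject"] = "bench"
    msg["From"] = "sender@example.com"
    msg["To"] = "rcpt@example.com"
    with smtplib.SMTP(HOST, PORT) as client:
        client.send_message(msg)


elapsed = timeit.timeit(send_one, number=100)
print(f"100 messages sent in {elapsed:.3f}s")
```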
- Concurrency benchmark and concurrency primitives: Python primarily offers two concurrency primitives, threading and multiprocessing. This benchmark measures the same number-crunching task when done concurrently with "threading" and with "multiprocessing" separately. Here's the benchmark that measures concurrency, and also the cost of creating threading objects (the latter is not of much use as such). A simplified comparison sketch follows below.
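A simplified version of the comparison; the worker counts and the number-crunching kernel are illustrative, not the linked benchmark's code.

```python
import multiprocessing
import threading
import time


def crunch(n=200_000):
    # CPU-bound kernel: pure-Python arithmetic, so threads contend on the GIL.
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_threads(workers=4):
    threads = [threading.Thread(target=crunch) for _ in range(workers)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


def run_processes(workers=4):
    start = time.perf_counter()
    with multiprocessing.Pool(workers) as pool:
        pool.map(crunch, [200_000] * workers)
    return time.perf_counter() - start


if __name__ == "__main__":            # guard required for multiprocessing spawn
    print(f"threads:   {run_threads():.3f}s  (GIL-bound for CPU work)")
    print(f"processes: {run_processes():.3f}s  (true parallelism, plus startup cost)")
```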
- Implementation of Cython modules.
- Improving the ABC code paths pointed out above.
- Accumulating use cases for newer benchmarks.
- @Botanic (Matthew Lagoe)
- Victor Stinner
- Inada Naoki
- James Lopeman
- Ezio Melotti
- And everyone in the Python community :)