Perform update in parallel #110
Conversation
The update command implementation runs over files that are independent from each other. As such, the overall update operation can be trivially parallelised to speed things up.

This change introduces a two-step process: the list of files that need to be compared/updated is collected in a first pass, and this list is then given to a multiprocessing pool to farm out the actual update of each individual file. The amount of parallelism is controlled through a new "jobs" parameter, command line option and environment variable. If no value is given for this option, all CPUs are used.

I noticed this chance for improvement when doing a test run of the update of .po files for the Spanish translation of the CPython documentation. Local numbers on my 8-core, hyper-threaded AMD Ryzen 7 5825U:

- `-j 1` (same as the old behaviour): real 12m5.402s, user 12m4.942s, sys 0m0.273s
- `-j 8`: real 2m23.609s, user 17m45.201s, sys 0m0.460s
- no value given (all CPUs): real 1m57.398s, user 26m22.654s, sys 0m0.989s

Signed-off-by: Rodrigo Tobar <[email protected]>
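For illustration, here is a minimal sketch of that two-step scheme built around a `multiprocessing.Pool`. This is not sphinx-intl's actual code; the `collect_po_files` and `update_single_file` helpers and the `locales` directory are hypothetical stand-ins for the real implementation.

```python
import multiprocessing
from pathlib import Path


def collect_po_files(locale_dir: str) -> list[Path]:
    """First pass: gather the independent .po files to compare/update."""
    return sorted(Path(locale_dir).rglob("*.po"))


def update_single_file(po_path: Path) -> None:
    """Update one catalogue; each file is independent of the others."""
    # Real code would merge po_path against its corresponding .pot
    # template here; this stand-in only reports the file it was given.
    print(f"updated {po_path}")


def update_in_parallel(locale_dir: str, jobs: int | None = None) -> None:
    """Second pass: farm the per-file updates out to a worker pool."""
    files = collect_po_files(locale_dir)
    # processes=None makes Pool use os.cpu_count() workers, matching
    # the PR's default of "all CPUs" when no jobs value is given.
    with multiprocessing.Pool(processes=jobs) as pool:
        pool.map(update_single_file, files)


if __name__ == "__main__":
    update_in_parallel("locales", jobs=8)  # analogous to running with -j 8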
Thank you for your wonderful improvement suggestions.
Unfortunately, I wasn't able to verify the speed-up on the sphinx-doc project I used for checking, as it doesn't contain many documents. However, the change seems to have no negative effect on the operation.
I will merge it.
Thanks @shimizukawa for reviewing and merging. I wonder whether you'd be inclined to make a release to PyPI with these changes at some point? No rush though; we can always install the package from this git repository in the meantime.
I'm going to drop py38 and release as soon as possible.
2.3.0 has been shipped ;)
@shimizukawa thank you very much, this is very helpful 😄
These are small changes that slightly improve the documentation build process and make the code more maintainable going forward.

First, the list of relative paths that need fixing in cpython's .rst files has been simplified, removing unnecessary entries and updating only the files that actually need it (instead of running every update over all files each time).

Second, the `build` target of the Makefile was split into its constituent sub-parts, so the CI step that previously held a copy of the `sed` commands now contains a single invocation of `make fix_relative_paths`.

Finally, the PR I sent to `sphinx-intl` to perform updates in parallel [has now been accepted](sphinx-doc/sphinx-intl#110) and a new version has been published, so the requirements list is now updated to use that latest version (and thus speed up the update to 3.13).

---------

Signed-off-by: Rodrigo Tobar <[email protected]>