~2x speedup using numba decorator in one single place #13

Open: baurst wants to merge 1 commit into main

@baurst baurst commented May 9, 2024

Thanks a lot for this pure python port of potrace!

Looking at cProfile results, I saw that findnext() is a performance hotspot. Introducing numba and adding a single numba.njit() decorator to findnext gives a ~2x speedup. I am sure much more is possible; this is just the lowest of the low-hanging fruit.

I did not touch the default installation and only added numba as an optional dependency. Users who need more performance can install with pip install potracer[numba]. I updated the Readme.md accordingly.
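For readers who want to try the optional-dependency approach in their own code, here is a minimal sketch of one way to do it. It is not the code from this PR: `first_set_pixel` is a hypothetical stand-in for a hotspot function (it is not potracer's findnext), and the fallback `njit` is an assumption about how one might keep the pure Python path working when numba is not installed.

```python
import numpy as np

try:
    # Optional dependency: only used when the user has installed numba.
    from numba import njit
except ImportError:
    def njit(*args, **kwargs):
        """No-op replacement that supports both @njit and @njit()."""
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]

        def wrap(func):
            return func

        return wrap


@njit()
def first_set_pixel(bm):
    """Hypothetical hotspot: return (y, x) of the first set pixel in
    row-major order, or (-1, -1) if the bitmap is empty.

    The explicit loop stops at the first hit and uses only constructs
    that work both in plain Python and under numba.
    """
    for y in range(bm.shape[0]):
        for x in range(bm.shape[1]):
            if bm[y, x]:
                return y, x
    return -1, -1


bitmap = np.zeros((4, 5), dtype=np.bool_)
bitmap[2, 3] = True
print(first_set_pixel(bitmap))
```

With numba installed, the first call includes the JIT compilation time, so a speedup only shows up when the function is called many times, as findnext is in the profiles in this PR (3993 calls).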

Profiling results:
I tested with this image: link. The runtime dropped from ~60 seconds to ~26 seconds.

Before:

44505289 function calls (43271614 primitive calls) in 61.255 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    154/1    0.000    0.000   61.266   61.266 {built-in method builtins.exec}
        1    0.069    0.069   61.266   61.266 test.py:1(<module>)
        1    0.076    0.076   61.069   61.069 test.py:6(file_to_svg)
        1    0.000    0.000   60.926   60.926 potrace.py:39(trace)
        1    0.020    0.020   46.160   46.160 potrace.py:810(bm_to_pathlist)
     3993    2.480    0.001   44.978    0.011 potrace.py:644(findnext)            <-- ~45 seconds spent here
     3993    0.006    0.000   42.493    0.011 fromnumeric.py:1881(nonzero)
     3994    0.008    0.000   42.487    0.011 fromnumeric.py:53(_wrapfunc)
     3993   42.478    0.011   42.478    0.011 {method 'nonzero' of 'numpy.ndarray' objects}
        1    0.021    0.021   14.734   14.734 potrace.py:1921(process_path)
     2988    7.397    0.002   10.508    0.004 potrace.py:1169(_calc_lon)
     2988    0.513    0.000    2.736    0.001 potrace.py:1348(_bestpolygon)
  1185378    1.745    0.000    2.159    0.000 potrace.py:1305(penalty3)
 15840196    1.517    0.000    1.517    0.000 potrace.py:1007(xprod)
     3992    0.680    0.000    0.926    0.000 potrace.py:570(findpath)
  8628578    0.773    0.000    0.773    0.000 potrace.py:853(sign)
  ...

After adding @numba.njit() to findnext():

45923494 function calls (44621093 primitive calls) in 26.091 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    481/1    0.002    0.000   26.103   26.103 {built-in method builtins.exec}
        1    0.064    0.064   26.103   26.103 test.py:1(<module>)
        1    0.071    0.071   25.788   25.788 test.py:6(file_to_svg)
        1    0.000    0.000   25.650   25.650 potrace.py:40(trace)
        1    0.020    0.020   14.359   14.359 potrace.py:1923(process_path)
        1    0.026    0.026   11.139   11.139 potrace.py:812(bm_to_pathlist)
     2988    7.186    0.002   10.198    0.003 potrace.py:1171(_calc_lon)
     3993    9.534    0.002    9.534    0.002 potrace.py:645(findnext)            <-- ~10 seconds spent here
     2988    0.493    0.000    2.675    0.001 potrace.py:1350(_bestpolygon)
  1185378    1.709    0.000    2.120    0.000 potrace.py:1307(penalty3)
 15840196    1.466    0.000    1.466    0.000 potrace.py:1009(xprod)
       49    0.001    0.000    0.800    0.016 __init__.py:1(<module>)
     3992    0.569    0.000    0.787    0.000 potrace.py:571(findpath)
     2988    0.334    0.000    0.767    0.000 potrace.py:1143(_calc_sums)
  8628578    0.759    0.000    0.759    0.000 potrace.py:855(sign)
  ...
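
Tables like the two above can be produced with the standard library's cProfile and pstats modules. The sketch below is an assumption about the workflow, not the test.py used for this PR: `workload` is a hypothetical placeholder for the tracing call being measured.

```python
import cProfile
import io
import pstats


def workload():
    """Hypothetical placeholder for the code under test."""
    return sum(i * i for i in range(100_000))


profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Sort by cumulative time, as in the tables above, and keep the top 10 rows.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(10)
report = buffer.getvalue()
print(report)
```

Comparing the cumtime column of the same function before and after a change shows where the saving comes from.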

Looking at cProfile results, I saw that the method findnext() is a
performance hotspot. By introducing numba and adding
a single numba.njit() decorator to findnext, a 2x speedup is achieved.
@tatarize
Owner

I probably won't be adding anything that depends on another package. The main point of the port was that I couldn't get the Windows Python port to work at all. Speed is the sacrifice there, but vendoring the code with whatever speedups you find is likely worthwhile where speed is an issue.

@baurst
Author

baurst commented May 13, 2024

Thank you for the quick reply!

In the meantime I have discovered vtracer, which is even faster and offers nice Python bindings (pip install vtracer).

I totally understand your hesitation to add any extra dependencies; that's why I made it completely optional in this PR. I just thought I'd share my profiling results and the performance improvement as a PR in case anybody wants to take this pure Python approach further.
