Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Factoring stalls on certain inputs #187

Closed
mhostetter opened this issue Oct 18, 2021 · 5 comments
Closed

Factoring stalls on certain inputs #187

mhostetter opened this issue Oct 18, 2021 · 5 comments
Assignees
Labels
bug Something isn't working help wanted Extra attention is needed

Comments

@mhostetter
Copy link
Owner

mhostetter commented Oct 18, 2021

As discovered in #185, factoring 2^256 - 1 stalls out which prevents testing if degree-256 polynomials over GF(2) are primitive. This would likely (?) be fixed once the quadratic sieve (#146) is added.

In [1]: import galois                                                                                                                        

# Takes forever... and may never return?
In [2]: galois.factors(2**256 - 1) 
^C---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
<ipython-input-3-6806de99fcaa> in <module>
----> 1 galois.factors(2**256 - 1)

~/repos/galois/galois/_polymorphic.py in factors(value)
    514     """
    515     if isinstance(value, (int, np.integer)):
--> 516         return integer_factor.factors(value)
    517     elif isinstance(value, Poly):
    518         return poly_factor.factors(value)

~/repos/galois/galois/_factor.py in factors(n)
    250     # Step 4
    251     while n > 1 and not is_prime(n):
--> 252         f = pollard_rho(n)  # A non-trivial factor
    253         while f is None:
    254             # Try again with a different random function f(x)

~/repos/galois/galois/_factor.py in pollard_rho(n, c)
    571         a = f(a)
    572         b = f(f(b))
--> 573         d = math.gcd(a - b, n)
    574 
    575     if d == n:

KeyboardInterrupt:
@pivis
Copy link
Contributor

pivis commented Dec 12, 2022

Apologies if all of this is obvious to you or already known/considered. I got here specifically because I really liked this project and was playing with larger GF(2**n) fields.

It looks like 2**256-1 factorization as implemented currently gets to the point where it needs to factorize 340282366920938463463374607431768211457==2**128+1 which is 59649589127497217 * 5704689200685129054721. Unfortunately, it looks like Pollard's rho algorithm is getting much more than usual time for this number for offset c=1 (not sure why - square root of the smallest factor is ~24M but it seems that it takes around 455M steps for Pollard's rho to find the factor). That said running pollard_rho with c=3 finds the factor after 19M iterations which is close to regular working times of Pollard's rho.

I only see 2 things can be done:

  1. Have as many as possible p**n-1 factorizations hardcoded (or in a sqlite db similar to conway_polys.db). A reasonable trick would be to introduce a hardcoded list of primes to check as factors where the list specifically contains large-ish primes that are known factors of 2**n-1 with n up to some value (maybe p**n-1 for other p too).
  2. Use more advanced factorization algorithms so they can get a bit further).

For the 1st approach there seems to be some very interesting public materials regarding mersenne numbers factorizations such as Factorization tables from Cunningham book.

I got a bit confused at first with the book only containing 2**n-1 factorization for odd values of n, but that turned out to be just because otherwise 2**(2k)-1 = (2**k-1) * (2**k +1) and there is a separate section of the book for the 2**n+1 factorization where n=4k. E.g. you can find 59649589127497217 there in factorization of 2**128+1.

So if just prime factors from 2**n-1 and 2**n+1 factorizations (tables called 2- and 2+) are taken from the book and saved aside to be checked e.g. after all factors under 10M and before running more advanced algorithms this should be enough to factorize 2**n-1 for some higher n numbers.

More about the notations they are using and such here. There are some things that are not very convenient (e.g. P40 meaning 40-digit prime number without specifying it, and C200 for not-factored number). The primes hidden under P40 notation are present in separate tables it seems, but they would not be necessarily needed here.

@pivis
Copy link
Contributor

pivis commented Dec 12, 2022

Specifically the link for the notations explanations: https://homes.cerias.purdue.edu/~ssw/cun/notat.txt

@mhostetter
Copy link
Owner Author

@pivis thanks for looking into this and responding. I've been hoping someone would help out here -- the prime factorization has always been a weakness in this library. Prime factorization algorithms were new to me when I started this library, so I'm open to review / suggestions / improvements to the factorization algorithms used, and their deployment order in factors().

I got here specifically because I really liked this project and was playing with larger GF(2**n) fields.

Taking too long / forever to factor 2^n and 2^n - 1 is a problem, and has been reported before. In fact, I made and documented some workarounds to avoid those factorizations, if necessary in a pinch. See the "Speed up creation of large finite field classes" note. However, this is only a stopgap and a better solution is definitely needed.

My thoughts on your ideas:

  1. Have as many as possible p**n-1 factorizations hardcoded (or in a sqlite db similar to conway_polys.db)

I think this is a good idea. Finite fields with order 2^n seem to be particularly popular, so adding a database of factorizations for 2^n and 2^n - 1 seems wise. I can work on incorporating that. Thanks for the links. I will definitely use them to construct the database.

Potentially in the future, we can add more p, as you mention. Although, which ones and how many degrees? That gets large quickly...

  1. Use more advanced factorization algorithms so they can get a bit further).

I tried implementing the Quadratic Sieve algorithm twice before, but for various reasons never got it fully working / merged. If you're aware of more powerful algorithms and are interested in contributing, I'd love the help.

I also recently noticed that the perfect power detection (Step 2 in the factors()) seems way too slow (or perhaps I don't have an appropriate expectation), see #443. My suspicion is that iroot() is the culprit. Any help in debugging / diagnosing issues there would be helpful too.

@mhostetter
Copy link
Owner Author

@pivis the parsing of those tables was more annoying than anticipated! But I added a database of all the prime factorizations in #452. This will really speed things up!

I'm also patching perfect_power() in #454 to speed up factoring powers of small primes.

Thanks for commenting here, the suggestions, and feedback. I'll release 0.3.2 with the improvement today.

@mhostetter
Copy link
Owner Author

I'm closing this. This will now be tracked in the consolidated #456.

@mhostetter mhostetter unpinned this issue Dec 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

2 participants