forked from radii/msieve
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Changes
2003 lines (1919 loc) · 98 KB
/
Changes
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
Vesrion 1.54:
- Merged the CPU lanczos code from the msieve-lacuda branch, which
includes long vector support; compiling with long vectors on
a modern machine with a modern gcc has yielded a 10-15%
speedup
- NFS Polynomial selection updates from many people:
- improved estimate of final e-value
- improved handling of the factors of leading alg. coefficient
- reduced verbosity
- search bounds informed by years of mersenneforum work
- Removed exit() call from error path in NFS filtering (thanks jrcrombie)
- Made the maximum merge weight configurable (thanks Lionel Debroux)
- Updated the makefile to build GPU code for more recent GPUs
- Updated CUB version
- Changed the default NFS filtering target density to 90
Version 1.53: 11/11/16
- Replaced the GPU sorting library with calls to CUB; this is more
compatible with the latest GPU models and works with CUDA
toolkits more recent than v5.5, which the old library was
stuck with
- Added primality proving of factors found (thanks David Cleaver /
Brian Gladman)
- Added a primality test for factors of NFS relations; apparently
the sieving tools will occasionally output relations with
factors that are composite, and this may be the cause of
mysterious problems with extremely large jobs
- Modified the matrix build in the NFS linear algebra to always use
quadratic characters near 2^32, to always choose them so they
do not occur in relations, and to decide the number of
characters at runtime with a compiled-in maximum. This paranoia
should prevent huge factorizations from failing in the
square root, like almost happened to Greg Childers
- Added an old fix for potential memory corruption doing NFS filtering
with extremely large datasets, since the patch actually
saved a factorization from failing (thanks Greg Childers)
- Fixed a memory corruption bug in the linear algebra when using many
MPI processes (thanks Ilya Popovyan)
- Fixed a bug in factor base generation that stopped NFS line sieving
from working (thanks DDEM)
- Fixed more problems with >4GB files in windows (thanks Nooby)
- Fixed a bug in the NFS line sieve (thanks Erik Branger)
- Simplify the inner loop of the QS hashtable sieving code;
modern processors run QS noticeably faster now
(thanks Mick Francis)
- Added a slight speedup for NFS relation parsing (thanks Lionel Debroux)
- Added an option to go straight to NFS skipping all preprocessing
(thanks Paul Zimmermann / Catalin Patulea)
- Fix a buffer overflow rating degree-8 polynomials (thanks jyb)
- Capped the number of GPU poly select stage 1 threads at 4
- Made the default compile flags include '-march=native' since it's
unlikely Apple's gcc still doesn't support it
- Overhaul the Visual Studio builds (thanks Brian Gladman)
Version 1.52: 2/4/14
- Added a major overhaul of the liner algebra; this uses the
thread pool with tighter coordination between threads
and much-reduced memory use. On a fast modern processor
with big caches the new code is 30% faster, and the speedup
increases with more threads
- Added the ability to start the NFS linear algebra after using
the CADO-NFS suite for filtering (thanks Paul Zimmermann)
- Fixed a bug in the MPI code that makes Nx1 and 1xN grids work again
- Fixed 32-bit overflow problems in the NFS square root (thanks Greg
Childers)
- Fixed problems parsing arguments in the NFS square root (thanks
Carlos Pinho / Lionel Debroux)
- Fixed a typo in the threadpool that caused races when shutting down
- Fixed the E-value estimate for poly selection (thanks Serge Batalov)
- Turned off asm in QS when compiling debug builds
- Fixed a buffer overflow parsing large inputs (thanks Ryan Propper)
- Fixed Windows CPU time measurement (thanks Roman Rusakov)
- Modified makefiles to account for environment vars in CUDA 5.5
Version 1.51: 2/17/13
- Performed another massive overhaul of the GPU polynomial selection
code; stage 1 now runs dozens to *hundreds* of times faster
on a GPU
- Added a thread pool implementation, and made GPU polynomial
selection multithread-capable. Eventually the CPU code should
be overhauled to look more like the GPU code, it will
probably be able to run several times faster
- Split stage 2 of NFS polynomial selection into the size optimization
and root optimization portions, which can be invoked inde-
pendently from the demo binary
- Added a caching layer for reading in the matrix, to reduce the
amount of disk IO required by an MPI grid (thanks Greg Childers)
- Changed the main API to allow free-text strings for configuring NFS,
then allowed all the parameters for polynomial selection to be
specified when calling the library
- Finally overhauled the Makefile to avoid everyone having to edit it
- Fixed a potential 32-bit overflow in the hashtable code, that could
occur for extremely large problems (thanks Paul Zimmermann)
- Fixed the computation of alpha value when polynomials are linear
(thanks Paul Zimmermann / Shi Bai)
- Fixed a line sieve initialization problem (thanks Ilya Popovyan)
Version 1.50: 2/3/12
- NFS polynomial selection changes:
- Added a massive overhaul of the stage 1 GPU code by Jayson
King, making it both much simpler and much faster
- Added a second size optimization pass when searching for
degree 6 polynomials. This makes stage 2 much more
reliable for very large problems
- Fixed a bug translating the degree 6 root sieve to
degree 5
- Fixed a long-standing problem initializing the root
sieve so that it will correctly detect roots modulo
small prime powers
- Patches from Jayson King: use a custom hashtable structure
to greatly speed up the stage 1 CPU code
- Patches from Jayson King: use a sieve to find larger
leading algebraic coefficients
- Patch from Jayson King: allow stage 2 to be interrupted
with Ctrl-C
- Modified the NFS code to remove almost all dependencies on mp_t
functions, using GMP instead
- Patch from Ilya Popovyan: make all MPI processes contribute to
a single vector-vector operation in the liner algebra,
instead of just the MPI processes in a single grid row.
This makes the entire Lanczos iteration up to 20% faster
for very large problems and grid sizes
- Patch from Brian Gladman: add ZLIB code to windows build
- Patches from Brian Gladman: lots of changes to the Visual Studio
projects; only MSVC10 is supported now
- Patch from Jayson King: fix longstanding problems that would
crop up rarely in tinyQS code
Version 1.49: 6/16/11
- Generalized the degree 6 root sieve to also handle degree
4 and 5. This makes stage 2 of NFS polyomial selection
hugely faster for very large problems
- Allowed the target matrix density within NFS filtering to
be specified from the demo binary (multiple people have
asked for this and I'd been too lazy to supply it)
- Modified the MPI code to flag in-place gather and scatter
operations as such (thanks Greg Childers)
- Performed a major overhaul of the various Readme files
- Fixed an erroneous error check in the MPI code (thanks
Ilya Popovyan)
- Fixed an MPI race condition in the Lanczos restart code
(thanks Jeff Gilchrist)
- From Brian Gladman: added build fixes for the latest CUDA tools
- Modified the NFS square root to print out factors as they are
found (thanks Paul Leyland)
- Made the library report the current SVN revision, determined at
compile time. This should finally end the confusion about
exactly which revision of the demo binary is running
- Added the (current) linux CUDA include and library paths to
the Makefile (thanks Paul Leyland)
Version 1.48: 1/8/11
- Performed a massive overhaul of the stage 1 NFS polynomial
selection, with a huge amount of help from Jayson King.
Once this is tuned a little better, polynomial selection
should become massively faster, especially on CPUs.
The GPU code is much simpler and more flexible now too
- Added a fast MPI parallel all-against-all xor implementation
courtesy of Ilya Popovyan
- Added more cache size detection for Intel processors
- Added a fix to prevent potential overflow in the hashtable code
- Increased the maximum input size to 1024 bits
- Changes from Brian Gladman:
- Corrected a bug in Windows win32 inline assembler code
- Removed the unmaintained Visual Studioo 2008 build projects
- Updated Visual Studio 2010 CUDA build for NVIDIA Parallel
Nsight 1.5 and the CUDA 3.2 toolkit
Version 1.47: 9/18/10
- Fixed several bugs in the linear algebra (thanks Serge
Batalov and many mersenneforum testers)
- Patches from Jayson King: tune some of the choices for NFS
polynomial selection
- Patches from Serge Batalov: fix some portability problems
dealing with the hodgepodge of zlib versions everybody
has on their unix systems
- Added a little optimization for Fermi GPUs
- Patch from Brian Gladman: fix bad printf format string
- Fixed other format string problems introduced in v1.46
Version 1.46: 7/31/10
The polynomial selection work in this release has benefitted
greatly from a week-long visit to Paul Zimmermann's CARAMEL group
in Nancy, France
The MPI changes in this release were made possible by generous
support from a startup allocation of CPU time on the National
Science Foundation's Teragrid system (TG-DMS100013), courtesy
of Greg Childers at Cal State Fullerton
- NFS linear algebra changes:
- Added MPI support to the linear algebra. Still a work in
progress, but for large problems the speedup from using
many nodes of a parallel system is just incredible
- Added multithreading to the vector-vector operations
- Patch from Serge Batalov: use larger structures to represent
matrix blocks; the larger blocks that are possible make
matrix multiplies run noticeably faster at the expense of
needing more memory to represent the matrix
- Made the linear algebra use actual time measurements to
decide how often a checkpoint file gets written
- NFS poly selection changes:
- Added a high-performance root sieve for degree 6 problems
- Allowed stage 1 and stage 2 to be run separately,
with stage 1 output saved to file and read back in stage 2
- For large problems, added a global optimizer that runs
before the conventional size optimization step; this is
currently not used because it destroys the size property
of polynomials that leave stage 1 with a good score
- Performed an overhaul of the size optimization (thanks
Paul Zimmermann)
- When rating poynomials, remember the number of real roots
(thanks Paul Zimmermann)
- Patch from Jayson King: resolve a GPU race condition
- Patch from Jayson King: start overhaul of the arithmetic
progression generator
- Patches from Serge Batalov: use zlib to allow QS or NFS relations
to be compressed on unix systems
- Patch from Brian Gladman: add support for MSVC 10
- Reduced the number of clique removal passes when there is a large
amount of excess (thanks Greg Childers)
- Optimized the power detection code in the main driver (thanks axn)
- The demo binary now uses GMP 5.0.1 and GMP-ECM 6.3
Version 1.45: 4/21/10
- NFS poly selection changes:
- Merged the CPU and GPU branches more tightly, and centralized
much of the GPU handling code
- Added PTX inline assembly language for a small speedup
- Added specialized routines that make poly selection
for inputs < 135 digits about 35% faster
- Fixed some degree 5 synchronization issues (thanks
Jayson King)
- Tightened up the construction of arithmetic progressions
in stage 1 (thanks Jayson King)
- Added code to increase the size of host arrays when using
more powerful GPUs (thanks Paul Zimmermann)
- Added code to automatically randomize the search for
inputs that are large enough
- Made the cutoff E-value more aggressive for the largest
jobs (thanks Tom Womack / Paul Leyland / Greg Childers)
- Modified the linear algebra to write new checkpoint files first,
then overwrite the old checkpoint only if the write completed
(thanks Greg Childers)
- Cleaned up the makefile a bit
- Cleaned up the wording of the makefile usage
- Added code to delete the largest temporary file generated during
filtering (thanks Greg Childers)
Version 1.44: 2/5/10
- NFS poly selection changes:
- Added a branch that uses Nvidia GPU code. This makes stage 1
run enormously faster (my medium-end GPU is 35x faster than
one core of my medium-end CPU). The GPU branch also contains
root generation code that is critically needed to search for
degree-6 polynomials. Hopefully in future releases the
GPU and CPU code will be more tightly integrated (thanks
to many folks at Mersenneforum for helping to test this)
- Force the bounds checking for degree 4 to be more strict
than degree 5 or 6 (thanks Jayson King)
- Modified the main driver to avoid saving nonexistent
polynomials
- Modified the main driver to use stricter bounds for
accepting polynomials
- Fixed a bug in the relation reading code that caused crashes when
initializing empty dependencies (thanks Jeff Gilchrist)
- Fixed a bug reading factor base files without a trailing newline
(thanks Jayson King)
- Allowed NFS poly selection and sieving for inputs down to 80 digits
(thanks andi47)
Version 1.43: 10/18/09
- Made the GMP library mandatory
- Removed the arbitrary precision math library, replaced with GMP
- Modified the NFS relation handling code to allow 63-bit 'a'
values and 48-bit prime factors. The code now also performs
runlength-encoding on the list of prime factors
- Added 64-bit modmul operations to all builds
- Added checks that the input NFS number corresponds to the input
NFS polynomials (thanks Tom Womack)
- NFS square root changes:
- Optimized the brute force square root code, especially
when dealing with degree > 6 (thanks Serge Batalov)
- Modified the NFS relation reading code to ignore
relations that would not participate in dependencies anyway.
This also makes the NFS square root a bit simpler
- Modified the initialization to be a little more
robust (thanks andi47)
- Add support for degree 8 NFS polynomials (thanks Serge Batalov)
- Deleted the ancient unskewed NFS polynomial selector
- Removed now-unused double-double code
- Reset the number of relations and ideals properly when reading
LP relations with increasing max weight (thanks Greg Childers)
- Increased the input size limit to ~300 digits
- Adjusted the library link order in the makefile
- Added float.h for unix builds (thanks Christian Cornelssen)
Version 1.42: 7/19/09
- NFS polynomial selection changes:
- Added support for generating degree-4 and degree-6
GNFS polynomials, as well as some tuning for the degree-4
case. Degree 6 is not completely done (needs a better
root sieve and parameter selection)
- Added a tweak to the end of stage 1 that makes sure
most polynomials passed to stage 2 have third-to-last
coefficients that are small enough. This allows stage
2 to run on more candidates, especially when the leading
coefficient is large
- Added several internal checks
- Tweaked the stage 1 sieving parameters to make the inner
loops longer during large jobs
- Switched to exponential (not linear) interpolation
when choosing parameters (thanks Tom Womack)
- Added degree-5 parameters for very large jobs, up to
180 digits (thanks Tom Womack)
- Reduced the minimum NFS input size to just under 90 digits
- Reverted back to to using the L2-norm for the initial
multivariable optimization step in stage 2; using Bernstein's
scoring function is not sensitive enough
- Pushed the file for saved NFS polynomials into the skewed
poly selector, instead of making it globally visible in
general poly config structures. This also allows the file
to only be created if polynomial selection happens
- NFS filtering changes:
- Removed most of the NFS-specific singleton removal; instead,
write the large ideals to file and read that in sub-
sequent passes
- Made the disk-based singleton removal into common code
- Retool the filtering driver to make the minimum number of
disk passes, based on the available memory and the number
of relations to deal with
- Made the duplicate removal guess the best size of the
hashtable to use (saves lots of memory for big jobs)
- Performed a complete overhaul of the polynomial rootfinder,
replacing Laguerre's method with a simplified Jenkins-
Traub rootfinder combined with Newton polishing. This
will hopefully always work, even for the surprising
number of SNFS polynomials whose equimodular roots
completely foiled the original rootfinder (thanks to
Brian Gladman for a great deal of help)
- Changed the ECM driver to use the supported public interface
to GMP-ECM; also compiled the demo binary with GMP-4.3.1
and GMP-ECM 6.2.3
- Added an integrity check to the linear algebra, to detect when
the state gets corrupted (thanks Kazumaro Aoki)
- Modified the NFS relation parsing to allow relations with
factor lists that are empty, since the smallest factors
are not printed (thanks Serge Batalov)
- Improved the error handling of the multivariable minimization
and special-cased 1-dimensional minimizations
- Added many patches to the inline asm (thanks Alex Kruppa)
- Made most of the matrix file IO routines into common code
- Got started cleaning up the huge number of type conversion warnings
that gcc 4.4.0 emits
- Fixed a bug in the linear algebra that caused an immediate
write of a checkpoint on startup (thanks Serge Batalov)
- Fixed a potential register allocation bug in the linear
algebra (thanks Emmanuel Thomé)
- Made sure the search for QCB primes does not get too close
to 2^32 (thanks Tom Womack)
- Fixed a small memory leak in the integrator (thanks valgrind)
Version 1.41: 4/3/09
- Added an extra phase after the clique removal in the NFS filtering,
that deletes heavy relations until the specified excess
is achieved. I only expect this to be useful in the case of
extreme oversieving (thanks to Bruce Dodson for showing
how necessary this was, even for very large jobs)
- Added tweaks to the GMP conversion functions to account for
64-bit MSVC (thanks Brian Gladman)
- Added assembly language for 64-bit MSVC, for use with NFS
polynomial selection (thanks Brian Gladman / Jeff Gilchrist)
- Fixed a crash in NFS polynomial selection, that happens when
two products of small primes have very different size
(thanks Mikael Klasson)
- Made the polynomial rootfinder choose initial values away from
the origin (thanks to Al Edwards for a very pathological
SNFS polynomial)
- Set the default skewness to 1.0 in more places (thanks Tom Womack)
- Lowered the minimum size that's allowed to run GNFS
- Recompiled the demo binary to use GMP-ECM v6.2.2
Version 1.40: 3/24/09
- NFS polynomial selection changes:
- Added Murphy's scoring algorithm, expressed as a
numerical integration. The Murphy score is used as the
final measure of polynomial goodness, and is directly
comparable to the scores produced by the GGNFS tools
- Made the numerical integration code adaptive, and greatly
simplified it
- Added major changes to the stage 2 root sieve, which reduce
overhead and allow quick searching of extremely large search
spaces. This is required so that large inputs do not cause
the root sieve to literally take forever
- Made the polynomial rootfinder work in double-double
precision. This is neeeded to compute roots to full
double precision accuracy, preventing numerical instability
in Bernstein's algorithm
- Reduced some of the overhead in stage1 and added 64-bit
assembly language (much more to do here)
- Changed the initial stage 2 numerical optimization to only
select rotations, and to use Bernstein's scoring function
- Fixed a bug in the multivariable optimization and made
the solver into common code
- Added error bailout code to the poly rootfinder
- Changed the format of intermediate saved polynomials to
be compatible with GGNFS; this means an entry from
the ".p" file can be cut-n-pasted into the GGNFS tools
- Made lots of NFS utility functions, and most of the NFS filtering,
into common code, in preparation for an overhaul of the QS code
- Generalized the hashtable code to automatically grow the hash
array and to index arbitrary size structures. This is a
necessary first step for allowing NFS postprocessing to
scale beyond what it can handle now
- Modified the main driver to allow NFS on any size input, no matter
how small, if only the postprocessing is desired
- Added patches from Brian Gladman that allow the Lanczos
inline asm to work with MSVC Express (thanks Ben Buhrow)
- Added more Intel cache codes and better CPU identification
- Made NFS input ranges 64-bit numbers to deal with large leading
coefficients for NFS polynomial selection
- Switched to measuring CPU time when calculating deadlines or
elapsed time (thanks andi47)
- Added printing of elapsed time in each stage of NFS postprocessing,
for compatibility with GGNFS scripts (thanks Jo Yeung Uk)
- Inlined the modular inverse routines and added 64-bit versions of
several functions
- Fixed a typo when conditionally defining HAS_CMOV, and also
when turning on MMX and SSE for the QS code
- Cleaned up the MSVC project files (thanks Jeff Gilchrist)
- Tweaked some asm to compile correctly with gcc 4.x; also changed
the generic code branch of mp_mod{add|sub}_1
- Allowed the NFS filtering to limit the number of relations read in
Version 1.39: 12/1/08
- NFS polynomial selection changes:
- Rewrote stage 1 to use Kleinjung's new algorithm as
described at the CADO workshop. This is much simpler and
potentially can find better polynomials
- Major overhaul of the root sieve; this was badly needed
because Kleinjung's new algorithm generates enormous
search spaces for roots. The result is a lot more modular,
allows 64-bit rotations, uses lattice sieving to cover
large ranges, and searches arbitrarily-shaped intervals
without regard to their size
- Added error aborts to the stage 2 size minimization step
- Added the ability to specify coefficient ranges to search
via command line option in the demo binary, and also
enforced a time limit on polynomial search that is specified
in the driver
- Merged extensive patches from Brian Gladman that fix the use of
inline assembly language for MSVC and Intel's compiler, on
both Windows and Linux
- Corrected many long-standing errors in the x86 assembly language
- Modified the NFS filtering to not preemptively rerun clique
processing unless the merge phase proves that it has the
capacity to see more ideals (thanks Bruce Dodson)
- Allowed NFS polynomials up to degree 7 (thanks Serge Batalov)
- Reduced the number of quadratic characters to 20; this makes
the NFS linear algebra slightly more efficient
Version 1.38: 9/24/08
- Changed the NFS square root to only consider relations that appear
an odd number of times, and then only once. This reduces
the runtime and memory use of the entire square root by up
to a factor of two, though perhaps this depends on the
amount of oversieving (thanks Paul Zimmermann)
- Upgraded the library to expect GMP-ECM v6.2.1, and merged many
fixes by Christian Cornelssen to the GMP-ECM driver
- Fixed a bug enumerating polynomial coefficients in stage 1 of
the skewed NFS polynomial selection (thanks Patrick Stach)
- Changed the NFS filtering to test the density of the dataset
after clique removal and retry the later stages if the
density is too low (thanks Tom Womack)
- Changed the linear algebra to report the estimated time to
completion (thanks Serge Batalov)
- NFS sieving changes:
- Modified the factor base generation to correctly report
progress when the algebraic and rational bounds are different
- Increased the maximum size of primes used by the
batch factoring
- Changed the sieving to not quit in the middle of
a sieve line when interrupted
- Incremented the maximum size of numbers multiplied in the NFS
square root (thanks Paul Zimmermann / Alex Kruppa)
- Removed the extra sanity check added to the NFS square root in
the previous version; Paul Zimmermann rightly points out
that it treats a natural outcome of the square root as an error
- Fixed a bug initializing the savefile memory buffer
Version 1.37: 8/27/08
- Performed a significant overhaul of stage 1 in the NFS polynomial
selection (needs much more work)
- NFS filtering changes:
- Combined the duplicate removal with analysis of free
relations; now successive filtering runs will add more
free relations, and large primes have free relations chosen
for them independent of any factor base bound (thanks
to Paul Zimmermann for showing that extra free relations
make a noticeable difference in filtering quality)
- Overhauled the main filtering driver and the singleton
removal; now the singleton and clique passes are completely
separate and redundant operations are removed. The most
noticeable effect is that there is no clique processing
after the disk-based singleton removal
- Changed the merging to avoid favoring merges that cancel
out relations; this makes merging faster, and the merge
postprocessing phase deletes redundant relations anyway
- Fixed a small bug in the merge phase, and added a
little extra cleanup
- Added a limit on the size of cliques (thanks Hugo Platzer)
- Increased the effort expended to calculate the root score
for NFS polynomials, for p dividing the polynomial
discriminant (thanks Paul Zimmermann)
- Rearranged the main driver to print any factors found if it
is interrupted
- Fixed an inconsistency in the way the windows version of the
savefile reading code emulated end-of-file
- Allow the demo binary to optionally run at idle priority
(thanks Mark Rodenkirch)
- Made GNFS parameters more sensible (thanks Jo Yeong Uk)
- Fixed the win32 version of rint() (thanks Brian Gladman)
- Added a little extra paranoia to the NFS square root
- Increased the maximum input size to 275 digits; anyone
who pushes *this* limit is nuts
Version 1.36: 5/17/08
- NFS polynomial selection changes:
- Complete overhaul of the root sieve code. The result
will potentially work a lot harder, and does not depend
on the approximate measures of polynomial size that were
originally used
- Added many changes and bugfixes to the stage 2 multivariate
optimization
- Changed the objective function in the initial multivariate
optimization pass of stage 2 to match the rounding of
variables at the end. This prunes many more polynomials
- Added buffering of the most promising polynomial rotations
so that Bernstein's rating algorithm is applied much more
often than before (it's much faster than Murphy's algorithm
so we can afford to use it more)
- Linear algebra changes:
- Removed one instruction from one of the Lanczos
vector-vector inner loops
- Moved the allocation of per-thread memory into the
thread loop proper, to hopefully allocate node-local
memory on NUMA systems
- Added extra paranoia when calculating the number of threads
to spawn; also added code to compensate if the actual number
of threads is different
- Add extra safeguards when constructing the post-Lanczos
matrix in the beginning of the linear algebra
- Reduced the minimum matrix size for which blocking and
multithreading is turned on
- Added forgotten 64-bit windows define
- Improved the cache size detection to account for the latest Intel
and AMD processors (thanks Greg Childers)
- Changed the NFS filtering to increase the target matrix density,
since the improved filtering code generates much better
matrices in the presence of significant excess relations
- Added a forgotten patch to the MP library, in the non-assembly-
language branch (thanks Christian Cornelssen)
Version 1.35: 4/14/08
- NFS skewed polynomial selection changes:
- Replaced the rather simple stage 2 polynomial optimization
code with more advanced global and local methods from
Numerical Recipes (needs lots of cleanup)
- Improved the calculation of projective roots after the
initial stage 2 optimization pass
- Substituted Bernstein's algorithm in place of Murphy's
for the final rating of skewed polynomials
- Modified stage 1 to forward all polynomials to the
optimization code in stage 2; this centralizes the
optimization step for a small performance penalty
- Removed large amounts of now-redundant code
- Modified the size score to consider all complex roots
of the polynomial; this is necessary even if the imaginary
parts of the root have large absolute value
- Fixed a buffer overflow bug in the NFS square root, which was
actually a long division bug that only showed up in MSVC
(thanks Chad Davis and Sergey Miklin for debugging help)
- Modified the NFS filtering to reduce the filtering bound
if more than two passes are needed (thanks Chad Davis)
- Updated the MSVC build project to v9 and fixed several
porting errors in the previous release (thanks Brian Gladman,
Chad Davis, Sergey Miklin)
Version 1.34: 3/22/08
- Added a heavily modified version of the Franke/Kleinjung NFS
polynomial selection tools. Massive additional work is
still needed here
- NFS filtering changes:
- Reduce the number of singleton removal passes to two
(on average). The second pass now uses an extremely small
filtering bound, writes to disk, and reads back only the
ideals that occur rarely. This is a lot faster and more
robust, especially in the case of heavy oversieving
- Fine-tuned the choice of initial bound. The new algorithm
is less needlessly conservative
- Changed the hashing of (a,b) pairs in the duplicate removal
to hopefully avoid most false positives
- Changed the QS code to use signed_mp_t functions where appropriate.
This removes a number of silly hacks
- Forced the NFS square root to always use IEEE precision, after
taking a week to find out it was set incorrectly for a large
SNFS run (thanks Richard Wackerbarth)
- Added tuning of the size of reciprocals in the NFS square root,
to avoid wasting time on small inputs (thanks Chad Davis)
- Added initialization for NFS parameters when the input size exceeds
the pre-tabulated list (thanks Greg Childers)
- Modified the mp_mod{add|sub}_1 routines to be more efficient and
avoid potential register allocation errors (thanks Alex Kruppa)
- Linked the win32 demo binary to be large-address aware, allowing
3GB of memory to be used on windows machines (thanks _dc_)
Version 1.33: 1/13/08
- Centralized all the handling of savefiles, in order to abstract
away a bug fix that lets Windows binaries read and write
savefiles over 4GB in size
- Overhauled the computation of root score in the NFS polynomial
selection. The result is more accurate, deterministic,
and much faster
- NFS square root changes
- Modified the FFT library to change the number of bits
in convolution elements dynamically, which halves the
size of the convolution sometimes
- Adjusted the size of reciprocals computed during the
algebraic square root to be as large as possible
for a given size FFT
- Implemented a cap on the number of quadratic characters; this
allows caching of the computed values within the relation
structures, and speeds up building of the initial NFS matrix
(the code is cleaner too)
- NFS filtering changes:
- Changed the duplicate and singleton removal to save the
line numbers of relations to *skip*, not keep, when
performing the initial disk-based passes. This saves having
to write out huge disk files when most relations start
off being valid and not duplicates
- Changed the final disk-based singleton pass to dump
relations to disk and read into memory at the end of the
pass. This greatly reduces the maximum memory use of the
singleton removal
- Generalized the clique removal to be able to handle
really huge amounts of excess relations (thanks Chad Davis)
- Changed the clique removal to avoid bothering to do any
work if the number of cliques to delete is too small
- Updated the comments to reflect the fact that what this
deletes is not really cliques but just connected components
of a graph (thanks Alex Kruppa / Chris Card)
- Modified the line sieve parameter lists to allow asymmetric sieve
lines (thanks Tom Womack)
- Added reporting of memory use to the NFS merge phase and linear
algebra (the former is not very accurate, for some reason)
- Modified the NFS free relation building code to avoid having
to generate a factor base
- Restored the code that allowed separate rational and algebraic
filtering bounds
- Modified the NFS batch factoring to be much more careful checking
the size of factors found (thanks Tom Womack)
- Added logging of the amount of ECM work completed, and improved
the stopping conditions to account for multiple factors found
(thanks henryzz)
- Restored error recovery code that was mistakenly removed from
the linear algebra (thanks Cedric Vonck)
- Fixed a bug in the early-exit part of the new trial factoring code
(thanks Dennis Langdeau)
- Added more special cases for Intel's compiler
- Fine-tuned the checks for corrupt NFS relations (thanks Tom Womack)
- Modified the NFS initialization to not delete the savefile if
no sieving is specified (thanks Philippe Strohl)
- Switched to a consistent buffer size to receive lines from
the savefile; also increase the size to accomodate larger N
- Removed the silly approximations used when reporting the size of
inputs to the QS and NFS routines
- Increased the maximum input size to 255 digits
Version 1.32: 12/15/07
- Merged a patch from 'forzles' that changes the threading
implementation in the linear algebra to work using a
thread pool instead of spawn-and-join. This allows a
significant speedup on multi-core CPUs, sometimes 2 to
4 times the speedup from using extra threads compared to
v1.31
- Split out the main computational linear algebra routines from the
multithread framework that uses them; this cuts out 2/3
of the code that people optimizing the linear algebra need
to see in one file
- Changed the type assigned to uint32, to silence cygwin warnings
- Changed the relation-reading code to reject relations whose
(a,b) pair is zero (thanks Greg Childers)
Version 1.31: 12/13/07
- NFS sieving changes:
- Added experimental code to batch relations containing large
primes together and factor them all at once, using an
algorithm described by D.J. Bernstein
- Changed the sieving to dynamically choose whether relations
will have two or three large primes, with the choice re-
calculated for every sieve block. This item and the
previous one makes it possible to find relations with three
rational and/or algebraic large primes at modest extra
runtime cost, and increases the overall speed of line
sieving by 50-100% when the input is large enough. The
same techniques can apply to a future lattice siever
- Made the choice of cutoff for trial factoring relations
automatic
- Fixed a minor bug in the resieving code
- Modified the sieving to avoid printing factors < 256
- Added the ability to optionally compile and optionally run GMP-ECM
on all inputs, with a little tuning to attempt to minimize
the overall time
- Overhauled the Pollard Rho code; the new version is much more
efficient and configured to do more work in order to find
factors of expected size
- Overhauled the main driver to remove some hacks and some
unnecessary work
- Added a compile-time list of primes that several subsystems within
the library now use
- Removed trial factoring from the MPQS and NFS drivers
- Added early-out optimizations to the trial factoring
- Modified the tiny factoring code to revert to QS if SQUFOF
fails (thanks Dennis Langdeau), and always perform
Pollard Rho
- Added wrappers for all the C memory allocation functions that abort
if memory allocation failure is detected
- Added compile options that allow accessing large files on many
32-bit unix systems (thanks Greg Childers)
Version 1.30: 11/16/07
- NFS filtering changes:
- Modified the merge phase to use pairs of 32-bit words
to represent nonzero matrix elements, instead of a single
linked list structure with pointers. This simplifies
a lot of code, makes merging 30-50% faster for big jobs, and
reduces the memory use on 64-bit machines by over 60%
- Modified the merge phase to concatenate the lists
of relations and ideals within a relation set. This removes
half of the memory management during merges
- Removed legacy #defines from the duplicate removal
- Added checks for the Intel compiler, which can now understand
gcc inline assembly (thanks to Brian Gladman)
- Restored MSVC versions of the Lanczos inline assembly (thanks to
Brian Gladman)
- Added an extra check for zero polynomials in the factor base
root finder (thanks Hallstein Hansen)
- Fixed a typo in the main library driver (thanks Philip Mason)
- Modified the tiny factoring code to revert to Pollard Rho if
the SQUFOF or tiny QS routines fail (thanks to Philip Mason
for a test case)
- Made the numerical rootfinder work harder to find difficult roots
- Turned the breakover point for using the tiny factoring routines
into a library-wide #define
- Restored logging of the initial NFS matrix size; this was mistakenly
removed in the previous version
Version 1.29: 10/28/07
- Linear algebra changes:
- Added checkpoint and restart for large matrix sizes
- Added Montgomery's optimizations to reduce the number
of vector-vector operations
- Modified the matrix multiply assembly code for slightly
higher performance on 64-bit x86
- Pushed the dense matrix rows into the per-thread state,
so that multithreaded runs execute the dense row portion
of the matrix multiply in parallel
- Modified the test for iteration failure *again*, after
Greg Childers encountered it on a very large job
- NFS linear algebra changes:
- Performed an overhaul of the matrix-building code.
This makes the process much more modular and eliminates
unnecessary sorting. It also noticeably decreases memory use
- Modified the matrix building code to avoid needing a
complete factor base
- NFS filtering changes:
- Increased the array sizes used for the 2-way merges,
and added extra sanity checking (thanks Tom Womack)
- Modified 2-way merge phase to delete relation sets
that would never survive the full merge phase
- Allowed the filtering bound and number of relations read
to be passed in from applications. The latter is convenient
for exploring the filtering space and the former can
be used to correct suboptimal bounds that the library chooses
- Modified find_large_ideals() to use a single bound for large
ideals, not separate rational and algebraic bounds
- Changed the file format for NFS cycles. The new format takes up
slightly more disk space, but simplifies nfs_read_cycles and
removes some hacks that were necessary with the old format
- Modified nfs_read_cycles to optionally return just the cycles
and not the relations they need
- Switched the 'size' and 'number' arguments of most calls to
fread and fwrite
- Increased the maximum size of the FFT multiply
- Removed the warning printed for large inputs; most factorization
attempts for numbers > 150 digits are SNFS jobs, not RSA keys
(thanks Sander Hoogendorn)
Version 1.28: 10/3/07
- Fixed logging of factors to avoid trying to free a stack array
- Fixed 64-bit compile warnings
Version 1.27: 10/2/07
- Much more work on NFS polynomial selection (still not runnable)
- Linear algebra changes:
- Added special block-multiply code for the moderately
dense matrix rows. This reduces the memory use of the
linear algebra by about 15%
- Used 32-bit loads to grab pairs of array offsets at a
time within the matrix multiply. Since the number of
memory accesses is a bottleneck, the reduced number of
these provides a 1-2% speedup
- Always track the number of dimensions solved. This prevents
spurious failures when the solver runs on small matrices
- Aligned the vector-vector loops
- Modified the polynomial rootfinder to use different starting points
when looking for roots. This is required for many SNFS
polynomials whose derivatives vanish at the default start point
- Modified the driver to not log any factors if there is only
one composite factor found (i.e. the library was interrupted)
- Updated the default makefile target to state that gcc
on x86 must use the specialized make targets
- Raised the input size limit to 235 digits
Version 1.26: 8/2/07
- Began experiments on improved polynomial selection (nothing
is in a runnable state yet)
- Changed the scratch structure used in the NFS filtering
merge phase to use hardwired array sizes. This removes
a lot of needlessly complex code and prevents a
subtle buffer overflow (thanks Hallstein Hansen)
- Added signed_mp_{add|sub|mul|clear|copy}
Version 1.25: 6/27/07
- Changed the QS sieve core to empty MMX state before attempting
any trial factoring of sieve values. This had been broken
since v1.20, but the last version's changes to mp_iroot
were the first to make the bug fatal (thanks Miroslaw Kwasniak)
- Fixed comments in the numerical integrator
Version 1.24: 6/25/07
- Removed the dependency on the Gnu Scientific Library. The NFS
polynomial selector now is completely self-contained, and
hopefully will be able to handle any input polynomial
(GSL's numerical integrator sometimes fails). Many thanks
to Brian Gladman for lots of help doing this
- Added numerical integration code provided by Brian Gladman
- Added a polynomial rootfinder derived from Numerical Recipes
- Modified the NFS filtering driver to lower the filtering bound
and retry the merge phase if the matrix produced is too
sparse. This seems to be required in order for the linear
algebra to produce nontrivial dependencies sometimes
(thanks to Tom Womack for a very obstinate factorization)
- Changed the NFS filtering to hardwire the maximum number of
relations in a relation set
- Performed a major overhaul of mp_iroot
- Modified an error test in the linear algebra, after a report
from Hallstein Hansen that the error still occurs,
probably spuriously
- Fixed a possible buffer overflow in mp_mul (thanks valgrind)
- Fixed a serious memory leak in the cleanup after the merge
portion of NFS filtering (thanks valgrind)
- Fixed a memory leak in the trial division used by the NFS
driver. Also made trial division mandatory in the NFS
driver, since the factor base bound will always exceed the
trial factoring bound (thanks valgrind)
- Corrected a typo in the NFS siever (thanks Joppe Bos)
Version 1.23: 6/3/07
- Linear algebra changes:
- Made singleton removal alternate with deleting cliques
from the input matrix. This seems to be required when
the initial matrix is large and very sparse, otherwise
only trivial dependencies are found
- Made the iteration skip verifying that all columns are
used in the last iteration. I think that this causes
unnecessary restarts of the linear algebra
- Allowed for combine_cols to find zero dependencies
- NFS filtering changes:
- Allowed the singleton removal to grow the size of
hashtables used during disk-based passes
- Modified the filtering driver to more intelligently tune
the bound on large ideals when there is a very large
amount of excess, and/or when sieving uses 29+ bit
large primes
- Changed the singleton removal cutoffs to avoid doing the
last (pretty useless) disk-based pass most of the time
- Changed the singleton removal to delete the file from the
duplicate removal phase when finished with it
- Fixed a bug reading NFS relations (thanks Greg Childers)
Version 1.22: 5/27/07
- Linear algebra changes:
- Added simple multithreading
- Modified the dense matrix multiply to use blocks of 64
rows instead of 32. This is slightly faster, and allows
reusing the general vector-vector code from elsewhere
- Modified the matrix packing code to pack the matrix in
two passes instead of one; this saves an enormous number
of reallocations and significantly reduces the working
set size of the linear algebra
- Modified the driver to correctly choose when to pack
the matrix, and to correctly pack dense rows when no
post-Lanczos matrix is desired
- Fixed a memory leak freeing the packed matrix
- Cleaned up combine_cols
- Added more fixes for OS X Intel (thanks Romain Muguet)
- Forced the buffer size for ideals in a relation_lp_t to match
that of the buffer in a relation_t. This fixes a fairly
common NFS square root failure (thanks Hallstein Hansen)
- Fixed a bug in the factor base generator change from v1.21
(thanks Tom Womack and Jes Hansen)
- Made the Pentium 2 and 3 QS sieve cores have a 64kB block size
- Modified the expression evaluator to accept integers in octal or
hex format. Dear reverse engineers: it isn't that hard
- Added more sanity checking to NFS matrix construction
- Added more QS tuning from Bill Hart
- Fixed a typo in the tiny QS code (thanks Volturno)
Version 1.21: 5/11/07
- NFS filtering changes:
- Changed the merge phase to count the average weight
of cycles that would appear in the matrix, not the average
weight of all cycles found. This allows for a somewhat
smaller matrix when there is a large amount of excess
- Changed the filtering driver to iteratively rerun the
singleton removal with gradually smaller bounds. This is
much better at dynamically figuring out a bound for large
ideals that is small and can produce a sensible matrix
- Added back the code that buries ideals that are too
heavy. The current merging architecture makes it possible
to apply burying of ideals more intelligently than my
initial experiments, and burying ideals saves a lot of
memory as merging progresses
- Modified the singleton removal to use a larger hashtable
for the initial disk-based pruning passes, and also to
rebuild the hashtable after a pruning pass if it was
saturated going into the pruning pass
- Increased the maximum clique batch size, to save time
when filtering datasets that have a lot of excess
- Increased the number of excess columns in the final matrix;
maybe this will help prevent occaisional bad behavior
- Merged Brian Gladman's MSVC versions of the inline assembly code,
as well as his rearrangement of the unrolled Lanczos loops
- Merged Brian Gladman's latest MSVC build files
- Modified the NFS square root to handle inputs where there are no
q for which (algebraic polynomial) mod q is irreducible
- Changed the NFS linear algebra to assemble the matrix on disk
and read in the columns after relations are freed. This
reduces the memory use of the linear algebra by about 30%
- Fixed yet another bug in the NFS factor base generator, which
would sometimes miss roots of polynomials that would be
degree 1 after reduction by p (thanks to Hallstein Hansen)
- Documented the QS improvements
- Sharpened the detection of older and newer AMD processors
- Allowed for use of AMD's MMX extensions; this allows a slight QS
speedup when the full SSE instruction set is not available
- Added some QS tuning from Bill Hart
Version 1.20: 5/1/07
- Added support throughout the NFS module for relations with
64-bit 'a' values. This makes the postprocessing a little
slower, but finally allows msieve to complete any previously
started GGNFS run
- QS changes:
- Modified the makefile to create a 'fat binary', the same
core sieving routine compiled over and over again with
different optimizations and preprocessor directives. The
QS driver now selects the optimal routine at runtime,
and this makes several CPUs noticeably faster for small
and moderate size factorizations
- Used two different reciprocal values instead of one,
so that remainder operations involving small factor
base primes are faster now
- Modified the sizing up of the factor base into distinct
regions; this gives ranges of factor base primes that
use reciprocals, and also allows a large range where
remainder operations are essentially free (for primes
smaller than the cutoff for using hashtables)
- Increased the unrolling of the sieve scanning loop to 64
- Used multimedia vector instructions to scan the sieve
interval, if they are available. This makes small and
moderate size factorizations up to 5% faster
- When sizing MPQS polynomials, use the rounded up value
of the sieve interval instead of the original. Not doing
this means the sieve interval would be slightly lopsided,
making sieve values slightly larger on average
- Modified generation of reciprocals so that the quotient
from integer division is exact, i.e. doesn't need correcting