2021-12-14 v2.5.0
Improvements:
* fix many bugs
* replace micro-architecture detection with -march=native by default, or user-provided flags
* drop support for gcc-4.9.2; the minimum version is now gcc-5
* improved SIMD support
* autotools refactorization/uniformization with Givaro/LinBox
New Features:
* first support for quasiseparable matrices (Compact Bruhat generators)
* full featured sub-cubic fsyrk (C <- a. A x A^T + b C)
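The fsyrk operation listed above, C <- a. A x A^T + b C, can be illustrated with a minimal reference sketch in Python. This is only a plain cubic-time model of the result over Z/pZ, not the library's sub-cubic algorithm or its C++ API; the function name `fsyrk_reference`, its argument order, and the assumption that C is symmetric are all hypothetical choices made for illustration.

```python
def fsyrk_reference(p, alpha, A, beta, C):
    """Return (alpha * A * A^T + beta * C) mod p as a new matrix.

    A is n x k, C is n x n and assumed symmetric (illustrative assumption).
    Only the lower triangle is computed; the upper triangle is mirrored.
    """
    n = len(A)
    k = len(A[0]) if n else 0
    out = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            # Entry (i, j) of A * A^T is the dot product of rows i and j of A.
            dot = sum(A[i][t] * A[j][t] for t in range(k))
            out[i][j] = (alpha * dot + beta * C[i][j]) % p
            out[j][i] = out[i][j]
    return out


A = [[1, 2], [3, 4]]
C = [[1, 0], [0, 1]]
result = fsyrk_reference(7, 2, A, 3, C)  # [[6, 1], [1, 4]]
```

Because A x A^T is symmetric, only about half of the entries need to be computed, which is what the loop over `j <= i` exploits.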
2019-06-07 v2.4.1
Minor updates:
* fix setnumthreads in pDet
* add README.md in distribution
* support for Hygon Dhyana
2019-05-10 v2.4.0
New features:
* fsytrf: a symmetric triangular factorization, revealing the rank profile matrix (RPM)
* fsyrk, fsyr2k, ftrssyr2k, ftrstr: subroutines for symmetric operations
* support for AVX512 vectorization
* parallelization of fgemm-rns, fsytrf, echelon forms, det, rank, etc
* API for parallel routines outside of par-block (for e.g. SageMath)
Improvements:
* more examples
* more benchmarks: fgesv
* many bug fixes
* improved testsuite
* update to Givaro's revamped modular fields
* improved igemm
* improved test coverage for SIMD
* improved charpoly
* improved freduce and consequently speed up most routines
2017-12-21 v2.3.2
Improvements:
* minor bug fixes in the build system and with GF2
* new specialization for fgemv over recint
2017-11-22 v2.3.1
Improvements:
* minor bug fixes in the build system
* improved cblas/fblas detection and use
2017-11-17 v2.3.0
Improvements:
* improved build system (instruction set detection, C++11 and clang compatibility, ...)
* improved ftrtri (triangular matrix inverse)
* increased test-suite coverage
* more autotuning
* clean up and update all random matrix generators so they can be seeded
* clean up the test suite and enable a seeding parameter
* many bug fixes (and merging sage patches)
New features:
* new pfgemv routine (parallel matrix vector product)
* new fpotrf routine (Cholesky factorization) and symmetric rand generator
* new tutorials
* Gauss-Jordan inverse made to work
Changes in API
* change signature for CharPoly (now takes a polynomial domain as input)
* change the signature of ftrtrm
2016-07-30 v2.2.2
* many bug fixes ensuring consistent support of clang, gcc-4.8, 5.3, 6.1 and
icpc on i386, x86_64, Ubuntu and Fedora, ppcle and OS X
* new SIMD detection
* use pkgconfig
* new feature: checkers for Freivalds based verification
* improved performance of permutation application
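The Freivalds-based verification mentioned in this release can be sketched as follows. This is a minimal Python illustration of the general technique (checking a claimed product C = A x B over Z/pZ by multiplying with random vectors), not the library's checker classes; the names `matvec_mod` and `freivalds_check` and the `rounds` parameter are hypothetical.

```python
import random


def matvec_mod(M, v, p):
    """Matrix-vector product over Z/pZ."""
    return [sum(a * b for a, b in zip(row, v)) % p for row in M]


def freivalds_check(A, B, C, p, rounds=20, rng=None):
    """Probabilistically check that A * B == C over Z/pZ (p prime).

    Each round costs three matrix-vector products instead of a full
    matrix product. A correct C always passes; an incorrect C passes
    one round with probability at most 1/p.
    """
    rng = rng or random.Random()
    n = len(B[0])  # number of columns of B and C
    for _ in range(rounds):
        x = [rng.randrange(p) for _ in range(n)]
        # Compare A * (B * x) with C * x, never forming A * B.
        if matvec_mod(A, matvec_mod(B, x, p), p) != matvec_mod(C, x, p):
            return False
    return True


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
good = [[19, 22], [43, 50]]
bad = [[19, 22], [43, 51]]
ok = freivalds_check(A, B, good, 101)
rejected = not freivalds_check(A, B, bad, 101, rng=random.Random(0))
```

The appeal of this kind of check is that verification costs O(n^2) per round, so it can be run alongside a sub-cubic matrix product at small relative overhead.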
2016-04-08 v2.2.1
* many fixes to the build system
* more consistent use of flags and dependency on precompiled code
* fixes all remaining issues for the integration in SageMath
* numerous minor fixes to the parallel code
2016-02-23 v2.2.0
* new precompiled interface
* improvements and API change for the parallel code
* new random matrix generators
* fix many bugs
2015-06-11 v2.1.0
Test suite and benchmark improvements:
* much larger coverage
* run most tests over a wide range of fields
* systematic interface and options
New features:
* parallel PLUQ
* computation of rank profiles and rank profile matrices
* echelon and reduced echelon forms from both LUdivine and PLUQ
* getters to the forms and the transformation matrices
* igemm routine for BLAS-like gemm on 64-bit ints
* support of Modular<int64_t> and ModularBalanced<int64_t> using igemm,
to support fields of bitsize between 25 and 31
* support of Modular<rint<K> > for Z/pZ with p of size > 32 bits (based
on Givaro's RecInt multiprecision integers)
* support of RNS based gaussian elimination on multiprecision fields
* Paladin: DSL for parallel programming addressing OMP, TBB and Kaapi
Improvements:
* a lot of new sparse mat-vec product improvements
* faster parallel and sequential fgemm
* many bugs found and removed (no known bugs at release time)
* improved helper system, with modes of operation
2014-08-08 v2.0.0
code update :
* rank profile
* clean namespaces
* use field one, zero, etc
* fix clang warnings
* more blas wrappers (sger, sdot, copy, etc)
* simplification of fgemm
* simplify blas detection (+cflags)
* easier permutation handling
* improve testers
* use std::min, max
* many functions have an API change: the last pointer argument is now used for the return value
* some more doc
* and probably many more in 2+ years !
bugs :
* correct permutations
* fix fgemm, fgemv, ftrmm, ftrsm bugs
* mem leaks
* bugs for degenerate cases
* fix bounds
* and probably many more in 2+ years !
new features :
* new pluq 2x2 recursive alg
* leftlooking
* parallel OMP fgemm, ftrmm, ftrsm
* parallel KAAPI fgemm, ftrmm, ftrsm
* new testers for pluq, fgemm, etc
* new tester for Bini approximate formula
* fadd, fsub, finit, fscal, etc
* vectorisation using AVX(2)
* in place schedules
* new Echelon code
* helper design for fgemm, fgemv, etc
* template factorisation for modular/multiprecision fields
* helper traits
* automatic matrix field conversion (i.e. double -> float)
* add spmv kernels
* enable use of sparse MKL
* parallel.h, avx and simd files
* new DSL for parallelism
* RNS and multiprecision fields
* new const_cast, fflas_new etc functions
* element_ptr in fields
* use Givaro dependency (compulsory now)
* new test for regressions (with tickets)
* and probably many more in 2+ years !
2011-04-15 v1.4.0
* Convert project to autotools (à la LinBox and Givaro)
2008-06-05 v1.3.3
* fix the design of specializations for modular<double> and modular<float>
* give a proper name to ModularBalanced
* fix the bugs in the bound computations (Winograd recursion over the
finite field was too deep)
* prepare the interface for integrating compressed representation for
small finite fields
2007-09-28 v1.3.2
* add routines fgetrs and fgesv (cf. LAPACK) for system solving;
supports rectangular, over- and underdetermined systems.
2007-08-29 v1.3.1
* add the benchmark directory, for automatic benchmarking against GOTO
and ATLAS BLAS. Adapted from Pascal Giorgi's benchmark system.
2007-08-28 v1.3.0
* new version of ftrmm and ftrsm: ftrsm based on a multicascade algorithm
(reducing the number of modular reductions). Automated generation of each
of the 48 specializations
* several bug fixes
* add regression tests: testeur_fgemm, testeur_lqup and testeur_ftrsm
2007-07-05 v1.2.2
* add a transposed version of the LQUP decomposition routine
LUdivine
* fix many bugs in LUdivine
* new schedules for Winograd algorithm for matrix multiplication:
2 cases depending on whether beta = 0 or not, taken from [Huss
Ledermann & Al. 96]
* add rowEchelon and ReducedRowEchelon routines + associated tests
2007-06-21 v1.2.1
* add the use of float BLAS, if the field cardinality is small enough
* improve genericity: gemm can be used over any field domain (not
requiring any conversion to an integral representation)
* add a variant of Winograd's algorithm with less temporaries for
the operation C = AxB
* add ColumnEchelon and ReducedColumnEchelon routines, using an
in-place algorithm, based on the LQUP decomposition of LUdivine
* add routines ftrtri (replacing invL), ftrtrm.
* fix a bunch of memory leaks in the tests (not yet finished)
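The condition "field cardinality is small enough" for using float BLAS can be illustrated with the classical exactness bound for delayed modular reduction. This is a hedged sketch, not the bound computation actually implemented in the library (which also has to account for Winograd recursion levels and balanced representations): it assumes field elements stored as integers in [0, p-1] and requires every dot product of length k to stay below 2^24, the range in which single-precision floats represent integers exactly. The function name `max_delayed_dimension` is hypothetical.

```python
def max_delayed_dimension(p, mantissa_bits=24):
    """Largest k with k * (p - 1)**2 < 2**mantissa_bits, or 0 if none.

    mantissa_bits=24 models IEEE single precision (float),
    mantissa_bits=53 models IEEE double precision (double).
    """
    if p < 2:
        raise ValueError("p must be at least 2")
    limit = 2 ** mantissa_bits
    # k * (p-1)^2 <= limit - 1  <=>  k <= (limit - 1) // (p-1)^2
    return (limit - 1) // ((p - 1) ** 2)


k_float = max_delayed_dimension(101)       # dot products of length up to 1677
k_double = max_delayed_dimension(101, 53)  # far larger with doubles
```

Under this model, a modulus such as p = 4099 leaves no room at all in single precision (the bound is 0), which is why the choice between float and double depends on the cardinality.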
2007-03-13 v1.1.2
* change the genericity system for trsm to detect Field
implementations over double (compatibility with LinBox)
2007-03-11 v1.1.1
* complete preconditioning phase for the new Charpoly algorithm
* new Charpoly algorithm renamed CharpolyArithProg
* add exception for failure of the Las Vegas algorithm
* default charpoly is now: 2 attempts to CharpolyArithProg, then LUKrylov
2007-02-27 v1.1.0
* change some naming conventions in the directories
* add a LQUP routine for small dimension (LUdivine_small) and the
cascading with LUdivine
* put the bound computations in the same file
* add dense_generator.C for the generation of random dense
matrices in tests
* add the new algorithm for characteristic polynomial (temporarily
named frobenius)
2006-08-11 v1.0.1
* add the field implementation modular-positive.h, especially for
p=2
* add the flag 'balanced' to the finite fields modular<double>,
to switch to the appropriate bound computation (fgemm and trsm)
* fix a bug in LUdivine LQUP elimination (initialisation of the
permutation P for N=1 in the terminal case)
* fix a bug in the determination of the number of recursive levels
of Winograd Algorithm.