ChangeLog

2021-12-14 v2.5.0
	Improvements:
	* fix many bugs
	* replace micro-architecture detection by march=native by default or user provided flags
	* dump support of gcc-4.9.2 -> minversion is now gcc-5
	* improved SIMD's
	* autotools refactorization/uniformization with Givaro/LinBox
	New Features:
	* first support for quasiseparable matrices (Compact Bruhat generators)
	* full featured sub-cubic fsyrk (C <- a. A x A^T + b C)
2019-06-07 v2.4.1
	Minor updates:
	* fix setnumthreads in pDet
	* add README.md in distribution
	* support for Hygon Dhyana
2019-05-10 v2.4.0
	New features:
	* fsytrf: a symmetric triangular factorization, revealign the RPM
	* fsyrk, fsyr2k, ftrssyr2k, ftrstr: subroutines for symmetric operations
	* support for AVX512 vectorization
	* parallelization of fgemm-rns, fsytrf, echelon forms, det, rank, etc
	* API for parallel routines outside of par-block (for e.g. SageMath)
	Improvements:
	* more examples
	* more benchmarks: fgesv
	* many bug fixes
	* improved testsuite
	* update to the Givaro's revamped modular fields
	* improved igemm
	* improved test coverage for SIMD
	* improved charpoly
	* improved freduce and consequently speed up most routines

2017-12-21 v2.3.2
	Improvements:
	* minor bug fixes in the build system and with GF2
	* new specialization for fgemv over recint

2017-11-22 v2.3.1
	Improvements:
	* minor bug fixes in the build system
	* improved cblas/fblas detection and use

2017-11-17 v2.3.0
	Improvements:
	* improved build system (instruction set detection, C++11 and clang compatibility, ...)
	* improved fttrtri (triangular matrix inverse)
	* increased test-suite coverage
	* more autotuning
	* clean-up and update all random matrix generator so they can be seeded.
	* clean-up the test-suite and enable seeding parameter
	* many bug fixes (and merging sage patches)

	New features:
	* new pfgemv routine (parallel matrix vector product)
	* new fpotrf routine (Cholesky factorization) and symmetric rand generator
	* new tutorials
	* Gauss-Jordan inverse made to work

	Changes in API
	* change signature for CharPoly (now takes a polynomial domain as input)
	* change the signature of ftrtrm

2016-07-30 v2.2.2
	* many bug fixes ensuring a consistent support of clang, gcc-4.8 5.3 6.1
	icpc on i386 x86_64, ubuntu and fedora, ppcle and osx
	* new SIMD detection
	* use pkgconfig
	* new feature: checkers for Freivalds based verification
	* improved performance of permutation application
2016-04-08 v2.2.1
	* many fixes to the build system
	* more consistent use of flags and dependency to precompiled code
	* fixes all remaining issues for the integration in SageMath
	* numerous minor fixes to the parallel code

2016-02-23 v2.2.0
	* new precompiled interface
	* improvements and API change for the parallel code
	* new random matrix generators
	* fix many bugs

2015-06-11 v2.1.0

	Test suite and benchmark improvement :
	* much larger coverage
	* run  most tests over a wide range of fields
	* systematic interface and options

	New features:
	* parallel PLUQ
	* computation of rank profiles and rank profile matrices
	* echelon and reduced echelon forms form both LUdivine and PLUQ
	* getters to the forms and the transformation matrices
	* igemm routine for BLAS like gemm on 64bits ints
	* support of Modular<int64_t> and ModularBalanced<int64_t> using igemm,
	to support fields of bitsize between 25 and 31
	* support of Modular<rint<K> > for Z/pZ with p of size > 32bits (based
	on Givaro's RecInt multiprecision integers)
	* support of RNS based gaussian elimination on multiprecision fields
	* Paladin: DSL for parallel programming adressing OMP, TBB and Kaapi

	Improvements:
	* a lot of new sparse mat-vec product improvements
	* faster parallel and sequential fgemm
	* many bugs found and removed (no known bugs at release time)
	* improved helper system, with mode of operations

2014-08-08 v2.0.0
	code update :
        * rank profile
	* clean namespaces
	* use field one, zero, etc
	* fix clang warnings
	* more blas wrappers (sger, sdot, copy, etc)
	* simplification of fgemm
	* simplify blas detection (+cflags)
	* easier permutation handling
	* improve testers
	* use std::min, max
	* many functions have API change to use last pointer argument for return
	* some more doc
	* and probably many more in 2+ years !

	bugs :
	* correct permutations
	* fix fgemm, fgemv, ftrmm, ftrsm bugs 
	* mem leaks
	* bugs for degenerate cases
	* fix bounds
	* and probably many more in 2+ years !

	new features :
	* new pluq 2x2 recursive alg
	* leftlooking
	* parallel OMP fgemm, ftrmm, ftrsm
	* parallel KAAPI fgemm, ftrmm, ftrsm
	* new testers for pluq, fgemm, etc
	* new tester for Bini approximate formula
	* fadd, fsub, finit, fscal, etc
	* vectorisation using AVX(2)
	* in place schedules
	* new Echelon code
	* helper design for fgemm, fgemv, etc
	* template factorisation for modular/multiprecision fields
	* helper traits
	* automatic matrix field conversion (ie double -> float)
	* add spmv kernels
	* enable use of sparse MKL
	* parallel.h, avx and simd files
	* new DSL for parallelism
	* RNS and multiprecision fields
	* new const_cast, fflas_new etc functions
	* element_ptr in fields
	* use Givaro dependency (compulsory now)
	* new test for regressions (with tickets)
	* and probably many more in 2+ years !
	
2011-04-15 v1.4.0
        * Convert project  to autotools (à la LinBox et Givaro)
2008-06-05 v1.3.3
	* fix the design of specializations to modular<double> modular<float>
	* give a proper name to ModularBalanced
	* fix the bugs in the bound computations (Winograd recursion over the
	finite field was too deep)
	* prepare the interface for integrating compressed representation for
	small finite fields
2007-09-28 v1.3.2
	* add routines fgetrs and fgesv (cf LAPACK), for system solving.
	  supports rectangular, over/underdetermined systems.
2007-08-29 v1.3.1
	* add the benchmark directory, for automatic benchmarking against GOTO
	and ATLAS BLAS. Adapted from Pascal Giorgi's benchmark system.
2007-08-28 v1.3.0
	* new version of ftrmm ftrsm: ftrsm based on a multicascade algorithm
	reducing the number of modular reductions). Automated generation of each
	of the 48 specializations 
	* several bug fixes
	* add regression tests: testeur_fgemm, testeur_lqup and testeur_ftrsm
2007-07-05 v1.2.2
	* add a transposed version of the LQUP decomposition routine
	LUdivine
	* fix many bugs in LUdivine
	* new schedules for Winograd algorithm for matrix multiplication:
	2 cases depending whether beta = 0 or not, taken form [Huss
	Ledermann & Al. 96]
	* add rowEchelon and ReducedRowEchelon routines + associated tests
2007-06-21 v1.2.1
	* add the use of float BLAS, if the field caradinality is small enough
	* improve genericity: gemm can be use over any field domain (not
	requiring any conversion to a integral representation)
	* add a variant of Winograd's algorithm with less temporaries for
	the operation C = AxB
	* add ColumnEchelon and ReducedColumnEchelon routines, using an
	inplace algorithm, based on the LQUP decompositon of LUdivine
	* add routines ftrtri (replacing invL), ftrtrm.
	* fix bunch of memory leaks in the tests (not yet finished)
2007-03-13 v1.1.2
	* change the genericity system for trsm to detect Field
	implementations over double (compatibility with LinBox)
2007-03-11 v1.1.1
	* complete preconditioning phase for the new Charpoly algorithm
	* new Charpoly algorithm renamed CharpolyArithProg
	* add exception for failure of the LasVegas algrithm
	* default charpoly is now: 2 attempts to CharpolyArithProg, then LUKrylov

2007-02-27 v1.1.0
	* change some naming conventions in the directories
	* add a LQUP routine for small dimension (LUdivine_small) and the
	cascading with LUdivine
	* put the bound computations in the same file
	* add dense_generator.C for the generation of random dense
	matrices in tests
	* add the new algorithm for characteristic polynomial (temporarily
	named frobenius)
	
2006-08-11 v1.0.1
	* add the field implementation modular-positive.h, especially for
	p=2
	* add a the flag 'balanced' to the finite fields modular<double>,
	to switch to the apropriate bound computation (fgemm and trsm)
	* fix a bug in LUDivine LQUP elimination (initialisation of the
	permutation P for N=1 in the terminal case)
	* fix a bug in the determination of the number of recursive levels
	of Winograd Algorithm.