Lucene Change Log
For more information on past and future Lucene versions, please see:
http://s.apache.org/luceneversions
======================= Lucene 4.10.4 ======================
Bug fixes
* LUCENE-6019, LUCENE-6117: Remove -Dtests.assert to make IndexWriter
infoStream sane. (Robert Muir, Mike McCandless)
* LUCENE-6161: Resolving deletes was failing to reuse DocsEnum likely
causing substantial performance cost for use cases that frequently
delete old documents (Mike McCandless)
* LUCENE-6192: Fix int overflow corruption case in skip data for
high frequency terms in extremely large indices (Robert Muir, Mike
McCandless)
* LUCENE-6207: Fixed consumption of several terms enums on the same
sorted (set) doc values instance at the same time.
(Tom Shally, Robert Muir, Adrien Grand)
* LUCENE-6093: Don't throw NullPointerException from
BlendedInfixSuggester for lookups that do not end in a prefix
token. (jane chang via Mike McCandless)
* LUCENE-6279: Don't let an abusive leftover _N_upgraded.si in the
index directory cause index corruption on upgrade (Robert Muir, Mike
McCandless)
* LUCENE-6287: Fix concurrency bug in IndexWriter that could cause
index corruption (missing _N.si files) the first time 4.x kisses a
3.x index if merges are also running. (Simon Willnauer, Mike
McCandless)
* LUCENE-6205: Fixed intermittent concurrency issue that could cause
FileNotFoundException when writing doc values updates at the same
time that a merge kicks off. (Mike McCandless)
* LUCENE-6214: Fixed IndexWriter deadlock when one thread is
committing while another opens a near-real-time reader and an
unrecoverable (tragic) exception is hit. (Simon Willnauer, Mike
McCandless)
* LUCENE-6105: Don't cache FST root arcs if the number of root arcs is
small, or if the cache would be > 20% of the size of the FST.
(Robert Muir, Mike McCandless)
* LUCENE-6001: DrillSideways hits NullPointerException for certain
BooleanQuery searches. (Dragan Jotannovic, jane chang via Mike
McCandless)
* LUCENE-6306: Merging of doc values and norms now checks whether the
merge was aborted so IndexWriter.rollback can more promptly abort a
running merge. (Robert Muir, Mike McCandless)
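LUCENE-6192 above belongs to a general class of bug: int arithmetic that
silently wraps once a value passes 2^31 - 1. The sketch below is not the
Lucene skip-data code. The class and method names are hypothetical and only
illustrate how widening to long before multiplying avoids the wrap.

```java
// Hypothetical illustration; not the Lucene skip-data implementation.
final class OverflowDemo {
    // Buggy: the product is computed in int and wraps before it is widened.
    static long offsetBuggy(int blockIndex, int blockSize) {
        return blockIndex * blockSize;
    }

    // Fixed: widen one operand first so the product is computed in long.
    static long offsetFixed(int blockIndex, int blockSize) {
        return (long) blockIndex * blockSize;
    }

    public static void main(String[] args) {
        int blockIndex = 1 << 20; // 1,048,576 blocks
        int blockSize = 1 << 12;  // 4,096 bytes per block
        System.out.println("buggy = " + offsetBuggy(blockIndex, blockSize));
        System.out.println("fixed = " + offsetFixed(blockIndex, blockSize));
    }
}
```

With these inputs the true product is 2^32, which does not fit in an int,
so only the widened version returns it.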
API Changes
* LUCENE-6212: Deprecate IndexWriter APIs that accept per-document Analyzer.
These methods were trappy as they made it easy to accidentally index
tokens that were not easily searchable and will be removed in 5.0.0.
(Mike McCandless)
======================= Lucene 4.10.3 ======================
Bug fixes
* LUCENE-6046: Add maxDeterminizedStates safety to determinize (which has
an exponential worst case) so that if it would create too many states, it
now throws an exception instead of exhausting CPU/RAM. (Nik
Everett via Mike McCandless)
* LUCENE-6054: Allow repeating the empty automaton (Nik Everett via
Mike McCandless)
* LUCENE-6049: Don't throw cryptic exception writing a segment when
the only docs in it had fields that hit non-aborting exceptions
during indexing but also had doc values. (Mike McCandless)
* LUCENE-6060: Deprecate IndexWriter.unlock (Simon Willnauer, Mike
McCandless)
* LUCENE-3229: Overlapping ordered SpanNearQuery spans should not match.
(Ludovic Boutros, Paul Elschot, Greg Dearing, ehatcher)
* LUCENE-6004: Don't highlight the LookupResult.key returned from
AnalyzingInfixSuggester (Christian Reuschling, jane chang via Mike McCandless)
* LUCENE-6075: Don't overflow int in SimpleRateLimiter (Boaz Leskes
via Mike McCandless)
* LUCENE-5980: Don't let document length overflow. (Robert Muir)
* LUCENE-6042: CustomScoreQuery explain was incorrect in some cases,
such as when nested inside a boolean query. (Denis Lantsman via Robert Muir)
* LUCENE-5948: RateLimiter now fully inits itself on init. (Varun
Thacker via Mike McCandless)
* LUCENE-6055: PayloadAttribute.clone() now does a deep clone of the underlying
bytes. (Shai Erera)
* LUCENE-6094: Allow IW.rollback to stop ConcurrentMergeScheduler even
when it's stalling because there are too many merges. (Mike McCandless)
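The state limit described in LUCENE-6046 above can be pictured with a small
subset construction that gives up once a budget is exceeded. This is a
minimal sketch under stated assumptions, not Lucene's automaton code: the
class, the array-based NFA layout, and the use of IllegalStateException are
all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of a state budget during NFA-to-DFA subset construction.
final class BoundedDeterminize {
    /**
     * nfa[state][symbol] lists the NFA states reachable from 'state' on 'symbol'
     * (symbols are 0..alphabetSize-1, state 0 is the start state).
     * Returns the number of reachable DFA states, or throws if more than
     * maxStates would be needed.
     */
    static int countDfaStates(int[][][] nfa, int alphabetSize, int maxStates) {
        Map<Set<Integer>, Integer> seen = new HashMap<>();
        Deque<Set<Integer>> work = new ArrayDeque<>();
        Set<Integer> start = new TreeSet<>();
        start.add(0);
        seen.put(start, 0);
        work.add(start);
        while (!work.isEmpty()) {
            Set<Integer> current = work.remove();
            for (int symbol = 0; symbol < alphabetSize; symbol++) {
                Set<Integer> next = new TreeSet<>();
                for (int state : current) {
                    for (int target : nfa[state][symbol]) {
                        next.add(target);
                    }
                }
                if (next.isEmpty() || seen.containsKey(next)) {
                    continue;
                }
                // Refuse to continue instead of exhausting CPU and RAM.
                if (seen.size() >= maxStates) {
                    throw new IllegalStateException(
                        "determinizing would create more than " + maxStates + " states");
                }
                seen.put(next, seen.size());
                work.add(next);
            }
        }
        return seen.size();
    }

    // NFA for "the n-th symbol from the end is 1" over {0,1}; its DFA needs 2^n states.
    static int[][][] nthFromEnd(int n) {
        int[][][] nfa = new int[n + 1][2][];
        nfa[0][0] = new int[] {0};
        nfa[0][1] = new int[] {0, 1};
        for (int s = 1; s < n; s++) {
            nfa[s][0] = new int[] {s + 1};
            nfa[s][1] = new int[] {s + 1};
        }
        nfa[n][0] = new int[0];
        nfa[n][1] = new int[0];
        return nfa;
    }
}
```

The nthFromEnd helper builds a textbook worst case: an NFA with n + 1 states
whose deterministic equivalent has 2^n states, so a modest n is enough to
trip the budget.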
Documentation
* LUCENE-6057: Improve Sort(SortField) docs (Martin Braun via Mike McCandless)
======================= Lucene 4.10.2 ======================
Bug fixes
* LUCENE-5977: Fix tokenstream safety checks in IndexWriter to properly
work across multi-valued fields. Previously some cases across multi-valued
fields would happily create a corrupt index. (Dawid Weiss, Robert Muir)
* LUCENE-6019: Detect when DocValuesType illegally changes for the
same field name. Also added -Dtests.asserts=true|false so we can
run tests with and without assertions. (Simon Willnauer, Robert
Muir, Mike McCandless).
======================= Lucene 4.10.1 ======================
Bug fixes
* LUCENE-5934: Fix backwards compatibility for 4.0 indexes.
(Ian Lea, Uwe Schindler, Robert Muir, Ryan Ernst)
* LUCENE-5939: Regenerate old backcompat indexes to ensure they were built with
the exact release
(Ryan Ernst, Uwe Schindler)
* LUCENE-5952: Improve error messages when version cannot be parsed;
don't check for too old or too new major version (it's too low level
to enforce here); use simple string tokenizer. (Ryan Ernst, Uwe Schindler,
Robert Muir, Mike McCandless)
* LUCENE-5958: Don't let exceptions during checkpoint corrupt the index.
Refactor existing OOM handling too, so you don't need to handle OOM specially
for every IndexWriter method: instead such disasters will cause IW to close itself
defensively. (Robert Muir, Mike McCandless)
* LUCENE-5904: Fixed a corruption case that can happen when 1)
IndexWriter is uncleanly shut down (OS crash, power loss, etc.), 2)
on startup, when a new IndexWriter is created, a virus checker is
holding some of the previously written but unused files open and
preventing deletion, 3) IndexWriter writes these files again during
the course of indexing, then the files can later be deleted, causing
corruption. This case was detected by adding evilness to
MockDirectoryWrapper to have it simulate a virus checker holding a
file open and preventing deletion (Robert Muir, Mike McCandless)
* LUCENE-5916: Static scope test components should be consistent between
tests (and test iterations). Fix for FaultyIndexInput in particular.
(Dawid Weiss)
* LUCENE-5975: Fix reading of 3.0-3.3 indexes, where bugs in these old
index formats would result in CorruptIndexException "did not read all
bytes from file" when reading the deleted docs file. (Patrick Mi, Robert Muir)
Tests
* LUCENE-5936: Add backcompat checks to verify what is tested matches known versions
(Ryan Ernst)
======================= Lucene 4.10.0 ======================
New Features
* LUCENE-5778: Support hunspell morphological description fields/aliases.
(Robert Muir)
* LUCENE-5801: Added (back) OrdinalMappingAtomicReader for merging search
indexes that contain category ordinals from separate taxonomy indexes.
(Nicola Buso via Shai Erera)
* LUCENE-4175, LUCENE-5714, LUCENE-5779: Index and search rectangles with spatial
BBoxSpatialStrategy using most predicates. Sort documents by relative overlap
of query areas or just by indexed shape area. (Ryan McKinley, David Smiley)
* LUCENE-5806: Extend expressions grammar to support array access in variables.
Added helper class VariableContext to parse complex variable into pieces.
(Ryan Ernst)
* LUCENE-5826: Support proper hunspell case handling, LANG, KEEPCASE, NEEDAFFIX,
and ONLYINCOMPOUND flags. (Robert Muir)
* LUCENE-5815: Add TermAutomatonQuery, a proximity query allowing you
to create an arbitrary automaton, using terms on the transitions,
expressing which sequence of sequential terms (including a special
"any" term) are allowed. This is a generalization of
MultiPhraseQuery and span queries, and enables "correct" (including
position length) search-time graph synonyms. (Mike McCandless)
* LUCENE-5819: Add OrdsLucene41 block tree terms dict and postings
format, to include term ordinals in the index so the optional
TermsEnum.ord() and TermsEnum.seekExact(long ord) APIs work. (Mike
McCandless)
* LUCENE-5835: TermValComparator can sort missing values last. (Adrien Grand)
* LUCENE-5825: Benchmark module can use custom postings format, e.g.:
codec.postingsFormat=Memory (Varun Shenoy, David Smiley)
* LUCENE-5842: When opening large files (where it's too expensive to compare
checksum against all the bytes), retrieve checksum to validate structure
of footer; this can detect some forms of corruption such as truncation.
(Robert Muir)
* LUCENE-5739: Added DataInput.readZ(Int|Long) and DataOutput.writeZ(Int|Long)
to read and write small signed integers. (Adrien Grand)
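On LUCENE-5739 above: the "Z" in readZInt/writeZInt is assumed here to
refer to zig-zag encoding, which maps signed integers of small magnitude to
small unsigned values so that a variable-length encoding stays short. The
class below is a hypothetical standalone sketch, not the DataInput and
DataOutput code.

```java
// Hypothetical standalone sketch of zig-zag encoding for 32-bit integers.
final class ZigZag {
    // Maps 0, -1, 1, -2, 2, ... to 0, 1, 2, 3, 4, ...
    static int encode(int value) {
        return (value >> 31) ^ (value << 1);
    }

    // Inverse of encode.
    static int decode(int encoded) {
        return (encoded >>> 1) ^ -(encoded & 1);
    }

    // Bytes needed by a 7-bits-per-byte variable-length encoding of 'value',
    // treating it as unsigned.
    static int vIntLength(int value) {
        int length = 1;
        while ((value & ~0x7F) != 0) {
            value >>>= 7;
            length++;
        }
        return length;
    }

    public static void main(String[] args) {
        for (int v : new int[] {0, -1, 1, -2, 2, -64, 64}) {
            int z = encode(v);
            System.out.println(v + " -> " + z + " (" + vIntLength(z) + " byte(s))");
        }
    }
}
```

Without the mapping, a negative number such as -1 has all 32 bits set and
needs the maximum five bytes in a 7-bits-per-byte scheme. After the mapping
it needs one.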
API Changes
* LUCENE-5752: Simplified Automaton API to be immutable. (Mike McCandless)
* LUCENE-5793: Add equals/hashCode to FieldType. (Shay Banon, Robert Muir)
* LUCENE-5692: DisjointSpatialFilter is deprecated (used by RecursivePrefixTreeStrategy)
(David Smiley)
* LUCENE-5771: SpatialOperation's predicate names are now aliased to OGC standard names.
Thus you can use: Disjoint, Equals, Intersects, Overlaps, Within, Contains, Covers,
CoveredBy. The area requirement on the predicates was removed, and Overlaps' definition
was fixed. (David Smiley)
* LUCENE-5850: Made Version handling more robust and extensible. Deprecated
Constants.LUCENE_MAIN_VERSION, Constants.LUCENE_VERSION and current Version
constants of the form LUCENE_X_Y. Added version constants that include bugfix
number of form LUCENE_X_Y_Z. Changed Version.LUCENE_CURRENT to Version.LATEST.
CheckIndex now prints the Lucene version used to write each segment.
(Ryan Ernst, Uwe Schindler, Robert Muir, Mike McCandless)
* LUCENE-5836: BytesRef has been split into BytesRef, whose intended usage is
to be just a reference to a section of a larger byte[] and BytesRefBuilder
which is a StringBuilder-like class for BytesRef instances. (Adrien Grand)
* LUCENE-5883: You can now change the MergePolicy instance on a live IndexWriter,
without first closing and reopening the writer. This allows you to, e.g., run a special
merge with UpgradeIndexMergePolicy without reopening the writer. Also, MergePolicy
no longer implements Closeable; if you need to release your custom MergePolicy's
resources, you need to implement close() and call it explicitly. (Shai Erera)
* LUCENE-5859: Deprecate Analyzer constructors taking Version. Use Analyzer.setVersion()
to set the version on an analyzer to replicate behavior from a specific release.
(Ryan Ernst, Robert Muir)
Optimizations
* LUCENE-5780: Make OrdinalMap more memory-efficient, especially in case the
first segment has all values. (Adrien Grand, Robert Muir)
* LUCENE-5782: OrdinalMap now sorts enums before being built in order to
improve compression. (Adrien Grand)
* LUCENE-5798: Optimize MultiDocsEnum reuse. (Robert Muir)
* LUCENE-5799: Optimize numeric docvalues merging. (Robert Muir)
* LUCENE-5797: Optimize norms merging (Adrien Grand, Robert Muir)
* LUCENE-5803: Add DelegatingAnalyzerWrapper, an optimized variant
of AnalyzerWrapper that doesn't allow wrapping components or readers.
This wrapper class is the base class of all analyzers that just delegate
to another analyzer, e.g. per field name: PerFieldAnalyzerWrapper and
Solr's schema support. (Shay Banon, Uwe Schindler, Robert Muir)
* LUCENE-5795: MoreLikeThisQuery now only collects the top N terms instead
of collecting all terms from the like text when building the query.
(Alex Ksikes, Simon Willnauer)
* LUCENE-5681: Fix RAMDirectory's IndexInput to not do double buffering
on slices (causes useless data copying, especially on random access slices).
This also improves slices of NRTCachingDirectory, because the cache
is based on RAMDirectory. BufferedIndexInput.wrap() was marked with a
warning in javadocs. It is almost always a better idea to implement
slicing on your own! (Uwe Schindler, Robert Muir)
* LUCENE-5834: Empty sorted set and numeric doc values are now singletons.
(Adrien Grand)
* LUCENE-5841: Improve performance of block tree terms dictionary when
assigning terms to blocks. (Mike McCandless)
* LUCENE-5856: Optimize Fixed/Open/LongBitSet to remove unnecessary AND.
(Robert Muir)
* LUCENE-5884: Optimize FST.ramBytesUsed. (Adrien Grand, Robert Muir,
Mike McCandless)
* LUCENE-5882: Add Lucene410DocValuesFormat, with faster term lookups
for SORTED/SORTED_SET fields. (Robert Muir)
* LUCENE-5887: Remove WeakIdentityMap caching in AttributeFactory,
AttributeSource, and VirtualMethod in favour of Java 7's ClassValue.
Always use MethodHandles to create AttributeImpl classes.
(Uwe Schindler)
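For readers unfamiliar with the ClassValue mentioned in LUCENE-5887 above:
java.lang.ClassValue (added in Java 7) lazily computes and caches one value
per Class. The sketch below is a hypothetical illustration of that caching
behavior only. The class name and the "Impl" naming rule are assumptions
made for the example and do not reproduce the AttributeFactory code.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration of per-class caching with java.lang.ClassValue.
final class ImplNameCache {
    // Counts how often the computation actually runs.
    static final AtomicInteger COMPUTATIONS = new AtomicInteger();

    // Derives "<ClassName>Impl" on first lookup; later lookups reuse the cached value.
    static final ClassValue<String> IMPL_NAME = new ClassValue<String>() {
        @Override
        protected String computeValue(Class<?> type) {
            COMPUTATIONS.incrementAndGet();
            return type.getName() + "Impl";
        }
    };

    public static void main(String[] args) {
        System.out.println(IMPL_NAME.get(CharSequence.class));
        System.out.println(IMPL_NAME.get(CharSequence.class));
        System.out.println("computed " + COMPUTATIONS.get() + " time(s)");
    }
}
```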
Bug Fixes
* LUCENE-5796: Fixes the Scorer.getChildren() method for two combinations
of BooleanQuery. (Terry Smith via Robert Muir)
* LUCENE-5790: Fix compareTo in MutableValueDouble and MutableValueBool, which caused
incorrect results when grouping on fields with missing values.
(海老澤 志信, hossman)
* LUCENE-5817: Fix hunspell zero-affix handling: previously only zero-strips worked
correctly. (Robert Muir)
* LUCENE-5818, LUCENE-5823: Fix hunspell overgeneration for short strings that also
match affixes; words are only stripped to a zero-length string if the FULLSTRIP option
is specified in the dictionary. (Robert Muir)
* LUCENE-5824: Fix hunspell 'long' flag handling. (Robert Muir)
* LUCENE-5838: Fix hunspell when the .aff file has over 64k affixes. (Robert Muir)
* LUCENE-5869: Added restriction to positive values for maxExpansions in
FuzzyQuery. (Ryan Ernst)
* LUCENE-5672: IndexWriter.addIndexes() calls maybeMerge(), to ensure the index stays
healthy. If you don't want merging use NoMergePolicy instead. (Robert Muir)
* LUCENE-5908: Fix Lucene43NGramTokenizer to be final
Test Framework
* LUCENE-5786: Unflushed/truncated events file (hung testing subprocess).
(Dawid Weiss)
* LUCENE-5881: Add "beasting" of tests: repeats the whole "test" Ant target
N times with "ant beast -Dbeast.iters=N". (Uwe Schindler, Robert Muir,
Ryan Ernst, Dawid Weiss)
Build
* LUCENE-5770: Upgrade to JFlex 1.6, which has direct support for
supplementary code points - as a result, ICU4J is no longer used
to generate surrogate pairs to augment JFlex scanner specifications.
(Steve Rowe)
* SOLR-6358: Remove VcsDirectoryMappings from idea configuration
vcs.xml (Ramkumar Aiyengar via Steve Rowe)
======================= Lucene 4.9.1 ======================
Bug fixes
* LUCENE-5907: Fix corruption case when opening a pre-4.x index with
IndexWriter, then opening an NRT reader from that writer, then
calling commit from the writer, then closing the NRT reader. This
case would remove the wrong files from the index leading to a
corrupt index. (Mike McCandless)
* LUCENE-5919: Fix exception handling inside IndexWriter when
deleteFile throws an exception, to not over-decRef index files,
possibly deleting a file that's still in use in the index, leading
to corruption. (Mike McCandless)
* LUCENE-5922: DocValuesDocIdSet on 5.x and FieldCacheDocIdSet on 4.x
are not cacheable. (Adrien Grand)
* LUCENE-5843: Added IndexWriter.MAX_DOCS which is the maximum number
of documents allowed in a single index, and any operations that add
documents will now throw IllegalStateException if the max count
would be exceeded, instead of silently creating an unusable
index. (Mike McCandless)
* LUCENE-5844: ArrayUtil.grow/oversize now returns a maximum of
Integer.MAX_VALUE - 8 for the maximum array size. (Robert Muir,
Mike McCandless)
* LUCENE-5827: Make all Directory implementations correctly fail with
IllegalArgumentException if slices are out of bounds. (Uwe Schindler)
* LUCENE-5897, LUCENE-5400: JFlex-based tokenizers StandardTokenizer and
UAX29URLEmailTokenizer tokenize extremely slowly over long sequences of
text partially matching certain grammar rules. The scanner default
buffer size was reduced, and scanner buffer growth was disabled, resulting
in much, much faster tokenization for these text sequences.
(Chris Geeringh, Robert Muir, Steve Rowe)
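The cap described in LUCENE-5844 above can be sketched as a growth function
that adds headroom but never returns more than Integer.MAX_VALUE - 8. This
is a hypothetical sketch: the class name and the one-eighth-plus-three
growth rule are assumptions for illustration, not the ArrayUtil source.

```java
// Hypothetical sketch of a capped array growth policy.
final class BoundedGrow {
    // Upper bound taken from the entry above.
    static final int MAX_ARRAY_LENGTH = Integer.MAX_VALUE - 8;

    // Returns a size >= minTargetSize with some headroom, capped at MAX_ARRAY_LENGTH.
    static int oversize(int minTargetSize) {
        if (minTargetSize < 0 || minTargetSize > MAX_ARRAY_LENGTH) {
            throw new IllegalArgumentException("invalid array size " + minTargetSize);
        }
        if (minTargetSize == 0) {
            return 0;
        }
        // Computed in long so the headroom cannot overflow int.
        long grown = (long) minTargetSize + (minTargetSize >> 3) + 3;
        return (int) Math.min(grown, (long) MAX_ARRAY_LENGTH);
    }

    public static void main(String[] args) {
        System.out.println(oversize(100));
        System.out.println(oversize(2_000_000_000));
    }
}
```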
======================= Lucene 4.9.0 =======================
Changes in Runtime Behavior
* LUCENE-5611: Changing the term vector options for multiple field
instances by the same name in one document is no longer accepted;
IndexWriter will now throw IllegalArgumentException. (Robert Muir,
Mike McCandless)
* LUCENE-5646: Remove rare/undertested bulk merge algorithm in
CompressingStoredFieldsWriter. (Robert Muir, Adrien Grand)
New Features
* LUCENE-5610: Add Terms.getMin and Terms.getMax to get the lowest and
highest terms, and NumericUtils.get{Min/Max}{Int/Long} to get the
minimum and maximum numeric values from the provided Terms. (Robert Muir, Mike
McCandless)
* LUCENE-5675: Add IDVersionPostingsFormat, a postings format
optimized for primary-key (ID) fields that also record a version
(long) for each ID. (Robert Muir, Mike McCandless)
* LUCENE-5680: Add ability to atomically update a set of DocValues
fields. (Shai Erera)
* LUCENE-5717: Add support for multiterm queries nested inside
filtered and constant-score queries to postings highlighter.
(Luca Cavanna via Robert Muir)
* LUCENE-5731, LUCENE-5760: Add RandomAccessInput, a random access API for directory.
Add DirectReader/Writer, optimized for reading packed integers directly
from Directory. Add Lucene49Codec and Lucene49DocValuesFormat that make
use of these. (Robert Muir)
* LUCENE-5743: Add Lucene49NormsFormat, which can compress in some cases
such as very short fields. (Ryan Ernst, Adrien Grand, Robert Muir)
* LUCENE-5748: Add SORTED_NUMERIC docvalues type, which is efficient
for processing numeric fields with multiple values. (Robert Muir)
* LUCENE-5754: Allow "$" as part of variable and function names in
expressions module. (Uwe Schindler)
Changes in Backwards Compatibility Policy
* LUCENE-5634: Add reuse argument to IndexableField.tokenStream. This
can be used by custom fieldtypes, which don't use the Analyzer, but
implement their own TokenStream. (Uwe Schindler, Robert Muir)
* LUCENE-5640: AttributeSource.AttributeFactory was moved to a
top-level class: org.apache.lucene.util.AttributeFactory
(Uwe Schindler, Robert Muir)
* LUCENE-4371: Removed IndexInputSlicer and Directory.createSlicer() and replaced
with IndexInput.slice(). (Robert Muir)
* LUCENE-5727, LUCENE-5678: Remove IndexOutput.seek, IndexOutput.setLength().
(Robert Muir, Uwe Schindler)
API Changes
* LUCENE-5756: IndexWriter now implements Accountable and IW#ramSizeInBytes()
has been deprecated in favor of IW#ramBytesUsed() (Simon Willnauer)
* LUCENE-5725: MoreLikeThis#like now accepts multiple values per field.
The pre-existing method has been deprecated in favor of a variable-arguments
method for the like text. (Alex Ksikes via Simon Willnauer)
* LUCENE-5711: MergePolicy accepts an IndexWriter instance
on each method rather than holding state against a single
IndexWriter instance. (Simon Willnauer)
* LUCENE-5582: Deprecate IndexOutput.length (just use
IndexOutput.getFilePointer instead) and IndexOutput.setLength.
(Mike McCandless)
* LUCENE-5621: Deprecate IndexOutput.flush: this is not used by Lucene.
(Robert Muir)
* LUCENE-5611: Simplified Lucene's default indexing chain / APIs.
AttributeSource/TokenStream.getAttribute now returns null if the
attribute is not present (previously it threw
IllegalArgumentException). StoredFieldsWriter.startDocument no
longer receives the number of fields that will be added (Robert
Muir, Mike McCandless)
* LUCENE-5632: In preparation for coming Lucene versions, the Version
enum constants were renamed to make them more readable. The constant
for Lucene 4.9 is now "LUCENE_4_9". Version.parseLeniently() is still
able to parse the old strings ("LUCENE_49"). The old identifiers were
deprecated and will be removed in Lucene 5.0. (Uwe Schindler,
Robert Muir)
* LUCENE-5633: Change NoMergePolicy to a singleton with no distinction between
compound and non-compound types. (Shai Erera)
* LUCENE-5640: The Token class was deprecated. Since Lucene 2.9, TokenStreams
are using Attributes, Token is no longer used. (Uwe Schindler, Robert Muir)
* LUCENE-5679: Consolidated IndexWriter.deleteDocuments(Term) and
IndexWriter.deleteDocuments(Query) with their varargs counterparts.
(Shai Erera)
* LUCENE-5706: Removed the option to unset a DocValues field through DocValues
updates. (Shai Erera)
* LUCENE-5700: Added oal.util.Accountable that is now implemented by all
classes whose memory usage can be estimated. (Robert Muir, Adrien Grand)
* LUCENE-5708: Remove IndexWriterConfig.clone, so now IndexWriter
simply uses the IndexWriterConfig you pass it, and you must create a
new IndexWriterConfig for each IndexWriter. (Mike McCandless)
* LUCENE-5701: Core closed listeners are now available in the AtomicReader API,
they used to sit only in SegmentReader. (Adrien Grand, Robert Muir)
* LUCENE-5678: IndexOutput no longer allows seeking, so it is no longer required
to use RandomAccessFile to write Indexes. Lucene now uses standard FileOutputStream
wrapped with OutputStreamIndexOutput to write index data. BufferedIndexOutput was
removed, because buffering and checksumming are provided by FilterOutputStreams,
provided by the JDK. (Uwe Schindler, Mike McCandless)
* LUCENE-5703: BinaryDocValues API changed to work like TermsEnum and not allocate/
copy bytes on each access, you are responsible for cloning if you want to keep
data around. (Adrien Grand)
* LUCENE-5695: DocIdSet implements Accountable. (Adrien Grand)
* LUCENE-5757: Moved RamUsageEstimator's reflection-based processing to RamUsageTester
in the test-framework module. (Robert Muir)
* LUCENE-5761: Removed DiskDocValuesFormat, it was very inefficient and saved very little
RAM over the default codec. (Robert Muir)
* LUCENE-5775: Deprecate JaspellLookup. (Mike McCandless)
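LUCENE-5632 above notes that both the old ("LUCENE_49") and new
("LUCENE_4_9") spellings remain parseable. One way such lenient parsing
could work is sketched below. This is a hypothetical standalone parser
written for illustration, not the Version.parseLeniently() implementation.
Its rule for separator-free strings (first digit is the major version) is
an assumption.

```java
import java.util.Locale;

// Hypothetical lenient version parser; not Lucene's Version class.
final class LenientVersion {
    // Returns {major, minor} or {major, minor, bugfix}.
    static int[] parse(String text) {
        String v = text.trim().toUpperCase(Locale.ROOT);
        if (v.startsWith("LUCENE_")) {
            v = v.substring("LUCENE_".length());
        }
        String[] parts;
        if (v.matches("\\d{2,}")) {
            // Old style without separators: first digit is major, the rest is minor.
            parts = new String[] {v.substring(0, 1), v.substring(1)};
        } else {
            // New style: components separated by '_' or '.'.
            parts = v.split("[._]");
        }
        if (parts.length < 2 || parts.length > 3) {
            throw new IllegalArgumentException("cannot parse version: " + text);
        }
        int[] numbers = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            try {
                numbers[i] = Integer.parseInt(parts[i]);
            } catch (NumberFormatException e) {
                throw new IllegalArgumentException("cannot parse version: " + text, e);
            }
        }
        return numbers;
    }
}
```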
Optimizations
* LUCENE-5603: hunspell stemmer more efficiently strips prefixes
and suffixes. (Robert Muir)
* LUCENE-5599: HttpReplicator did not properly delegate bulk read() to wrapped
InputStream. (Christoph Kaser via Shai Erera)
* LUCENE-5591: pass an IOContext with estimated flush size when applying DV
updates. (Shai Erera)
* LUCENE-5634: IndexWriter reuses TokenStream instances for String and Numeric
fields by default. (Uwe Schindler, Shay Banon, Mike McCandless, Robert Muir)
* LUCENE-5638, LUCENE-5640: TokenStream uses a more performant AttributeFactory
by default, that packs the core attributes into one implementation
(PackedTokenAttributeImpl), for faster clearAttributes(), saveState(), and
restoreState(). In addition, AttributeFactory uses Java 7 MethodHandles for
instantiating Attribute implementations. (Uwe Schindler, Robert Muir)
* LUCENE-5609: Changed the default NumericField precisionStep from 4
to 8 (for int/float) and 16 (for long/double), for faster indexing
time and smaller indices. (Robert Muir, Uwe Schindler, Mike McCandless)
* LUCENE-5670: Add skip/FinalOutput to FST Outputs. (Christian
Ziech via Mike McCandless).
* LUCENE-4236: Optimize BooleanQuery's in-order scoring. This speeds up
some types of boolean queries. (Robert Muir)
* LUCENE-5694: Don't score() subscorers in DisjunctionSumScorer or
DisjunctionMaxScorer unless score() is called. (Robert Muir)
* LUCENE-5720: Optimize DirectPackedReader's decompression. (Robert Muir)
* LUCENE-5722: Optimize ByteBufferIndexInput#seek() by specializing
implementations. This improves random access as used by docvalues codecs
if used with MMapDirectory. (Robert Muir, Uwe Schindler)
* LUCENE-5730: FSDirectory.open returns MMapDirectory for 64-bit operating
systems, not just Linux and Windows. (Robert Muir)
* LUCENE-5703: BinaryDocValues producers don't allocate or copy bytes on
each access anymore. (Adrien Grand)
* LUCENE-5721: Monotonic compression doesn't use zig-zag encoding anymore.
(Robert Muir, Adrien Grand)
* LUCENE-5750: Speed up monotonic addressing for BINARY and SORTED_SET
docvalues. (Robert Muir)
* LUCENE-5751: Speed up MemoryDocValues. (Adrien Grand, Robert Muir)
* LUCENE-5767: OrdinalMap optimizations, that mostly help on low cardinalities.
(Martijn van Groningen, Adrien Grand)
* LUCENE-5769: SingletonSortedSetDocValues now supports random access ordinals.
(Robert Muir)
Bug fixes
* LUCENE-5738: Ensure NativeFSLock prevents opening the file channel for the
lock if the lock is already obtained by the JVM. Trying to obtain an already
obtained lock in the same JVM can unlock the file, which might allow other processes
to lock the file even without explicitly unlocking the FileLock. This behavior
is operating system dependent. (Simon Willnauer)
* LUCENE-5673: MMapDirectory: Work around a "bug" in the JDK that throws
a confusing OutOfMemoryError wrapped inside IOException if the FileChannel
mapping failed because of lack of virtual address space. The IOException is
rethrown with more useful information about the problem, omitting the
incorrect OutOfMemoryError. (Robert Muir, Uwe Schindler)
* LUCENE-5682: NPE in QueryRescorer when Scorer is null
(Joel Bernstein, Mike McCandless)
* LUCENE-5691: DocTermOrds lookupTerm(BytesRef) would return incorrect results
if the underlying TermsEnum supports ord() and the insertion point would
be at the end. (Robert Muir)
* LUCENE-5618, LUCENE-5636: SegmentReader referenced unneeded files following
doc-values updates. Now doc-values field updates are written in separate file
per field. (Shai Erera, Robert Muir)
* LUCENE-5684: Make best effort to detect invalid usage of Lucene,
when IndexReader is reopened after all files in its index were
removed and recreated by the application (the proper way to do
this is IndexWriter.deleteAll, or opening an IndexWriter with
OpenMode.CREATE) (Mike McCandless)
* LUCENE-5704: Fix compilation error with Java 8u20. (Uwe Schindler)
* LUCENE-5710: Include the inner exception as the cause and in the
exception message when an immense term is hit during indexing (Lee
Hinman via Mike McCandless)
* LUCENE-5724: CompoundFileWriter was failing to pass through the
IOContext in some cases, causing NRTCachingDirectory to cache
compound files when it shouldn't, then causing OOMEs. (Mike
McCandless)
* LUCENE-5747: Project-specific settings for the eclipse development
environment will prevent automatic code reformatting. (Shawn Heisey)
* LUCENE-5768, LUCENE-5777: Hunspell condition checks containing character classes
were buggy. (Clinton Gormley, Robert Muir)
Test Framework
* LUCENE-5622: Fail tests if they print over the given limit of bytes to
System.out or System.err. (Robert Muir, Dawid Weiss)
* LUCENE-5619: Added backwards compatibility tests to ensure we can update existing
indexes with doc-values updates. (Shai Erera, Robert Muir)
Build
* LUCENE-5442: The Ant check-lib-versions target now runs Ivy resolution
transitively, then fails the build when it finds a version conflict: when a
transitive dependency's version is more recent than the direct dependency's
version specified in lucene/ivy-versions.properties. Exceptions are
specifiable in lucene/ivy-ignore-conflicts.properties.
(Steve Rowe)
* LUCENE-5715: Upgrade direct dependencies known to be older than transitive
dependencies: com.sun.jersey.version:1.8->1.9; com.sun.xml.bind:jaxb-impl:2.2.2->2.2.3-1;
commons-beanutils:commons-beanutils:1.7.0->1.8.3; commons-digester:commons-digester:2.0->2.1;
commons-io:commons-io:2.1->2.3; commons-logging:commons-logging:1.1.1->1.1.3;
io.netty:netty:3.6.2.Final->3.7.0.Final; javax.activation:activation:1.1->1.1.1;
javax.mail:mail:1.4.1->1.4.3; log4j:log4j:1.2.16->1.2.17; org.apache.avro:avro:1.7.4->1.7.5;
org.tukaani:xz:1.2->1.4; org.xerial.snappy:snappy-java:1.0.4.1->1.0.5 (Steve Rowe)
======================= Lucene 4.8.1 =======================
Bug fixes
* LUCENE-5639: Fix PositionLengthAttribute implementation in Token class.
(Uwe Schindler, Robert Muir)
* LUCENE-5635: IndexWriter didn't properly handle IOException on TokenStream.reset(),
which could leave the analyzer in an inconsistent state. (Robert Muir)
* LUCENE-5599: HttpReplicator did not properly delegate bulk read() to wrapped
InputStream. (Christoph Kaser via Shai Erera)
* LUCENE-5600: HttpClientBase did not properly consume a connection if a server
error occurred. (Christoph Kaser via Shai Erera)
* LUCENE-5628: Change getFiniteStrings to iterative not recursive
implementation, so that building suggesters on a long suggestion
doesn't risk overflowing the stack; previously it consumed one Java
stack frame per character in the expanded suggestion. If you are building
a suggester this is a nasty trap. (Robert Muir, Simon Willnauer,
Mike McCandless)
* LUCENE-5559: Add additional argument validation for CapitalizationFilter
and CodepointCountFilter. (Ahmet Arslan via Robert Muir)
* LUCENE-5641: SimpleRateLimiter would silently rate limit at 8 MB/sec
even if you asked for higher rates. (Mike McCandless)
* LUCENE-5644: IndexWriter clears which threads use which internal
thread states on flush, so that if an application reduces how many
threads it uses for indexing, that results in a reduction of how
many segments are flushed on a full-flush (e.g. to obtain a
near-real-time reader). (Simon Willnauer, Mike McCandless)
* LUCENE-5653: JoinUtil with ScoreMode.Avg on a multi-valued field
with more than 256 values would throw an exception.
(Mikhail Khludnev via Robert Muir)
* LUCENE-5654: Fix various close() methods that could suppress
throwables such as OutOfMemoryError, instead returning scary messages
that look like index corruption. (Mike McCandless, Robert Muir)
* LUCENE-5656: Fix rare fd leak in SegmentReader when multiple docvalues
fields have been updated with IndexWriter.updateXXXDocValue and one
hits an exception. (Shai Erera, Robert Muir)
* LUCENE-5660: AnalyzingSuggester.build will now throw IllegalArgumentException if
you give it a longer suggestion than it can handle (Robert Muir, Mike McCandless)
* LUCENE-5662: Add missing checks to Field to prevent IndexWriter.abort
if a stored value is null. (Robert Muir)
* LUCENE-5668: Fix off-by-one in TieredMergePolicy (Mike McCandless)
* LUCENE-5671: Upgrade ICU version to fix an ICU concurrency problem that
could cause exceptions when indexing. (feedly team, Robert Muir)
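Sketch for LUCENE-5654 above: one way a close() over several resources can
avoid hiding the original failure. This is a JDK-only illustration, not
Lucene's code; CloseAllSketch and closeAll are hypothetical names. The first
throwable (for example an OutOfMemoryError) is kept and rethrown, and any
later failures are attached to it as suppressed exceptions.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

final class CloseAllSketch {
    /** Closes every resource; rethrows the first failure with later ones attached as suppressed. */
    static void closeAll(List<? extends Closeable> resources) throws IOException {
        Throwable first = null;
        for (Closeable c : resources) {
            try {
                if (c != null) {
                    c.close();
                }
            } catch (Throwable t) {
                if (first == null) {
                    first = t;               // remember the root cause
                } else {
                    first.addSuppressed(t);  // keep later failures visible too
                }
            }
        }
        if (first instanceof IOException) throw (IOException) first;
        if (first instanceof RuntimeException) throw (RuntimeException) first;
        if (first instanceof Error) throw (Error) first;
        if (first != null) throw new IOException(first); // any other checked throwable
    }

    public static void main(String[] args) {
        Closeable ok = () -> {};
        Closeable oom = () -> { throw new OutOfMemoryError("simulated"); };
        Closeable io = () -> { throw new IOException("second failure"); };
        try {
            closeAll(Arrays.asList(ok, oom, io));
        } catch (Throwable t) {
            System.out.println(t + " suppressed=" + t.getSuppressed().length);
        }
    }
}
```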
======================= Lucene 4.8.0 =======================
System Requirements
* LUCENE-4747, LUCENE-5514: Move to Java 7 as minimum Java version.
(Robert Muir, Uwe Schindler)
Changes in Runtime Behavior
* LUCENE-5472: IndexWriter.addDocument will now throw an IllegalArgumentException
if a Term to be indexed exceeds IndexWriter.MAX_TERM_LENGTH. To recreate previous
behavior of silently ignoring these terms, use LengthFilter in your Analyzer.
(hossman, Mike McCandless, Varun Thacker)
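Sketch for LUCENE-5472 above: a JDK-only check of which terms would exceed
the limit. It assumes the limit is 32766 and is measured in UTF-8 bytes (the
value of IndexWriter.MAX_TERM_LENGTH in Lucene 4.x, as far as we know);
TermLengthSketch and its methods are hypothetical names, not Lucene API. The
point is that byte length and char length can differ, so a char-based length
cap may need to be lower than 32766 for non-ASCII text.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TermLengthSketch {
    // Assumption: mirrors IndexWriter.MAX_TERM_LENGTH (length in UTF-8 bytes).
    static final int MAX_TERM_BYTES = 32766;

    /** True if the term's UTF-8 encoding fits within the assumed limit. */
    static boolean fits(String term) {
        return term.getBytes(StandardCharsets.UTF_8).length <= MAX_TERM_BYTES;
    }

    /** Returns only the terms that fit, i.e. the old "silently ignore" behavior. */
    static List<String> dropOversized(List<String> terms) {
        List<String> kept = new ArrayList<>();
        for (String t : terms) {
            if (fits(t)) {
                kept.add(t);
            }
        }
        return kept;
    }

    /** Builds a string of n copies of c (helper for the example only). */
    static String repeat(char c, int n) {
        char[] buf = new char[n];
        Arrays.fill(buf, c);
        return new String(buf);
    }

    public static void main(String[] args) {
        String ascii = repeat('a', 32766);     // 32766 bytes
        String euros = repeat('\u20ac', 11000); // 11000 chars, 3 bytes each
        System.out.println(fits(ascii) + " " + fits(euros));
    }
}
```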
New Features
* LUCENE-5356: Morfologik filter can accept custom dictionary resources.
(Michal Hlavac, Dawid Weiss)
* LUCENE-5454: Add SortedSetSortField to lucene/sandbox, to allow sorting
on multi-valued field. (Robert Muir)
* LUCENE-5478: CommonTermsQuery now allows creating custom term queries,
similar to the query parser, by overriding a newTermQuery method.
(Simon Willnauer)
* LUCENE-5477: AnalyzingInfixSuggester now supports near-real-time
additions and updates (to change weight or payload of an existing
suggestion). (Mike McCandless)
* LUCENE-5482: Improve default TurkishAnalyzer by adding apostrophe
handling suitable for Turkish. (Ahmet Arslan via Robert Muir)
* LUCENE-5479: FacetsConfig subclass can now customize the default
per-dim facets configuration. (Rob Audenaerde via Mike McCandless)
* LUCENE-5485: Add circumfix support to HunspellStemFilter. (Robert Muir)
* LUCENE-5224: Add iconv, oconv, and ignore support to HunspellStemFilter.
(Robert Muir)
* LUCENE-5493: SortingMergePolicy and EarlyTerminatingSortingCollector
support arbitrary Sort specifications.
(Robert Muir, Mike McCandless, Adrien Grand)
* LUCENE-3758: Allow the ComplexPhraseQueryParser to search ordered or
unordered proximity queries. (Ahmet Arslan via Erick Erickson)
* LUCENE-5530: ComplexPhraseQueryParser throws ParseException for fielded queries.
(Erick Erickson via Tomas Fernandez Lobbe and Ahmet Arslan)
* LUCENE-5513: Add IndexWriter.updateBinaryDocValue which lets
you update the value of a BinaryDocValuesField without reindexing the
document(s). (Shai Erera)
* LUCENE-4072: Add ICUNormalizer2CharFilter, which lets you do unicode normalization
with offset correction before the tokenizer. (David Goldfarb, Ippei UKAI via Robert Muir)
* LUCENE-5476: Add RandomSamplingFacetsCollector for computing facets on a sampled
set of matching hits, in cases where there are millions of hits.
(Rob Audenaerde, Gilad Barkai, Shai Erera)
* LUCENE-4984: Add SegmentingTokenizerBase, abstract class for tokenizers
that want to do two-pass tokenization such as by sentence and then by word.
(Robert Muir)
* LUCENE-5489: Add Rescorer/QueryRescorer, to resort the hits from a
first pass search using scores from a more costly second pass
search. (Simon Willnauer, Robert Muir, Mike McCandless)
* LUCENE-5528: Add context to suggesters (InputIterator and Lookup
classes), and fix AnalyzingInfixSuggester to handle contexts.
Suggester contexts allow you to filter suggestions. (Areek Zillur,
Mike McCandless)
* LUCENE-5545: Add SortRescorer and Expression.getRescorer, to
resort the hits from a first pass search using a Sort or an
Expression. (Simon Willnauer, Robert Muir, Mike McCandless)
* LUCENE-5558: Add TruncateTokenFilter which truncates terms to
the specified length. (Ahmet Arslan via Robert Muir)
* LUCENE-2446: Added checksums to lucene index files. As of 4.8, the last 8
bytes of each file contain a zlib-crc32 checksum. Small metadata files are
verified on load. Larger files can be checked on demand via
AtomicReader.checkIntegrity. You can configure this to happen automatically
before merges by enabling IndexWriterConfig.setCheckIntegrityAtMerge.
(Robert Muir)
* LUCENE-5580: Checksums are automatically verified on the default stored
fields format when performing a bulk merge. (Adrien Grand)
* LUCENE-5602: Checksums are automatically verified on the default term
vectors format when performing a bulk merge. (Adrien Grand, Robert Muir)
* LUCENE-5583: Added DataInput.skipBytes. ChecksumIndexInput can now seek, but
only forward. (Adrien Grand, Mike McCandless, Simon Willnauer, Uwe Schindler)
* LUCENE-5588: Lucene now calls fsync() on the index directory, ensuring
that all file metadata is persisted on disk in case of power failure.
This does not work on all file systems and operating systems, but Linux
and MacOSX are known to work. On Windows, fsyncing a directory is not
possible with Java APIs. (Mike McCandless, Uwe Schindler)
* LUCENE-6299: IndexWriter was failing to enforce the 2.1 billion doc
limit. (Robert Muir, Mike McCandless)
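Sketch for LUCENE-2446 above: the idea of a trailing CRC32 checksum, using
only java.util.zip.CRC32. The layout here (payload followed by the checksum
as an 8-byte big-endian long) is a simplification for illustration and is not
claimed to match Lucene's actual file footer; ChecksumFooterSketch and its
methods are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

final class ChecksumFooterSketch {
    /** Returns payload followed by its CRC32 stored as an 8-byte big-endian long. */
    static byte[] withFooter(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        return ByteBuffer.allocate(payload.length + 8)
            .put(payload)
            .putLong(crc.getValue())
            .array();
    }

    /** Recomputes the CRC32 over everything but the last 8 bytes and compares. */
    static boolean verify(byte[] file) {
        if (file.length < 8) {
            return false; // too short to carry a footer
        }
        int bodyLen = file.length - 8;
        CRC32 crc = new CRC32();
        crc.update(file, 0, bodyLen);
        long stored = ByteBuffer.wrap(file, bodyLen, 8).getLong();
        return stored == crc.getValue();
    }

    public static void main(String[] args) {
        byte[] file = withFooter("some index bytes".getBytes(StandardCharsets.UTF_8));
        System.out.println("intact: " + verify(file));
        file[0] ^= 0x01; // simulate a flipped bit on disk
        System.out.println("after corruption: " + verify(file));
    }
}
```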
API Changes
* LUCENE-5454: Add RandomAccessOrds, an optional extension of SortedSetDocValues
that supports random access to the ordinals in a document. (Robert Muir)
* LUCENE-5468: Move offline Sort (from suggest module) to OfflineSort. (Robert Muir)
* LUCENE-5493: SortingMergePolicy and EarlyTerminatingSortingCollector take
Sort instead of Sorter. BlockJoinSorter is removed, replaced with
BlockJoinComparatorSource, which can take a Sort for ordering of parents
and a separate Sort for ordering of children within a block.
(Robert Muir, Mike McCandless, Adrien Grand)
* LUCENE-5516: MergeScheduler#merge() now accepts a MergeTrigger as well as
a boolean that indicates if a new merge was found in the caller thread before
the scheduler was called. (Simon Willnauer)
* LUCENE-5487: Separated bulk scorer (new Weight.bulkScorer method) from
normal scoring (Weight.scorer) for those queries that can do bulk
scoring more efficiently, e.g. BooleanQuery in some cases. This
also simplified the Weight.scorer API by removing the two confusing
booleans. (Robert Muir, Uwe Schindler, Mike McCandless)
* LUCENE-5519: TopNSearcher can now return incomplete results if the max
size of the candidate queue is unknown. The queue can still be bounded in order
to apply pruning while retrieving the top N, but it will not throw an exception if
too many results are rejected to guarantee an absolutely correct top N result.
TopNSearcher now returns a struct-like class that indicates whether the result
is complete in the sense of the top N. Consumers of this API should assert
on the completeness if the bounded queue size is known ahead of time. (Simon Willnauer)
* LUCENE-4984: Deprecate ThaiWordFilter and smartcn SentenceTokenizer and WordTokenFilter.
These filters would not work correctly with CharFilters and could not be safely placed
at an arbitrary position in the analysis chain. Use ThaiTokenizer and HMMChineseTokenizer
instead. (Robert Muir)
* LUCENE-5543: Remove/deprecate Directory.fileExists (Mike McCandless)
* LUCENE-5573: Move docvalues constants and helper methods to o.a.l.index.DocValues.
(Dawid Weiss, Robert Muir)
* LUCENE-5604: Switched BytesRef.hashCode to MurmurHash3 (32 bit).
TermToBytesRefAttribute.fillBytesRef no longer returns the hash
code. BytesRefHash now uses MurmurHash3 for its hashing. (Robert
Muir, Mike McCandless)
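Sketch for LUCENE-5604 above: a standalone 32-bit MurmurHash3 (the x86_32
variant) over a byte range, for readers who want to see what the hash
involves. This is written from the public algorithm description, not copied
from Lucene; the class name is hypothetical, and the seed is an explicit
parameter here (which seed BytesRef.hashCode uses is not stated in this
entry).

```java
import java.nio.charset.StandardCharsets;

final class Murmur3Sketch {
    /** MurmurHash3 x86_32 of data[offset, offset+len) with the given seed. */
    static int hash(byte[] data, int offset, int len, int seed) {
        final int c1 = 0xcc9e2d51;
        final int c2 = 0x1b873593;
        int h1 = seed;
        int roundedEnd = offset + (len & 0xfffffffc); // end of the last full 4-byte block

        for (int i = offset; i < roundedEnd; i += 4) {
            // little-endian load of one 4-byte block
            int k1 = (data[i] & 0xff) | ((data[i + 1] & 0xff) << 8)
                | ((data[i + 2] & 0xff) << 16) | (data[i + 3] << 24);
            k1 *= c1;
            k1 = Integer.rotateLeft(k1, 15);
            k1 *= c2;
            h1 ^= k1;
            h1 = Integer.rotateLeft(h1, 13);
            h1 = h1 * 5 + 0xe6546b64;
        }

        // tail: the remaining 1 to 3 bytes, if any
        int k1 = 0;
        switch (len & 0x03) {
            case 3:
                k1 = (data[roundedEnd + 2] & 0xff) << 16;
                // fall through
            case 2:
                k1 |= (data[roundedEnd + 1] & 0xff) << 8;
                // fall through
            case 1:
                k1 |= (data[roundedEnd] & 0xff);
                k1 *= c1;
                k1 = Integer.rotateLeft(k1, 15);
                k1 *= c2;
                h1 ^= k1;
                break;
            default:
                break;
        }

        // finalization: mix in the length and avalanche the bits
        h1 ^= len;
        h1 ^= h1 >>> 16;
        h1 *= 0x85ebca6b;
        h1 ^= h1 >>> 13;
        h1 *= 0xc2b2ae35;
        h1 ^= h1 >>> 16;
        return h1;
    }

    public static void main(String[] args) {
        byte[] term = "lucene".getBytes(StandardCharsets.UTF_8);
        System.out.println(Integer.toHexString(hash(term, 0, term.length, 0)));
    }
}
```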
Optimizations
* LUCENE-5468: HunspellStemFilter uses 10 to 100x less RAM. It also loads
all known openoffice dictionaries without error, and supports an additional
longestOnly option for a less aggressive approach. (Robert Muir)
* LUCENE-4848: Use Java 7 NIO2-FileChannel instead of RandomAccessFile
for NIOFSDirectory and MMapDirectory. This allows open files to be deleted
on Windows if NIOFSDirectory is used; mmapped files are still locked.
(Michael Poindexter, Robert Muir, Uwe Schindler)
* LUCENE-5515: Improved TopDocs#merge to create a merged ScoreDoc
array of length at most the specified size, instead of length
at most from + size as before. (Martijn van Groningen)
* LUCENE-5529: Spatial search of non-point indexed shapes should be a little
faster due to skipping intersection tests on redundant cells. (David Smiley)
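Sketch for LUCENE-5515 above: merging per-shard hit lists while allocating
only the requested window. This is a simplified model on bare float scores,
not the TopDocs#merge implementation; MergeWindowSketch is a hypothetical
name. The result array has length min(size, total - from), never from + size.

```java
import java.util.PriorityQueue;

final class MergeWindowSketch {
    /** Merges score lists (each sorted descending) and returns hits [from, from + size). */
    static float[] merge(int from, int size, float[][] shards) {
        int total = 0;
        for (float[] s : shards) {
            total += s.length;
        }
        int resultLen = Math.max(0, Math.min(size, total - from));
        float[] result = new float[resultLen];
        if (resultLen == 0) {
            return result;
        }

        // Queue entries are {shardIndex, positionInShard}; highest current score first.
        PriorityQueue<int[]> queue = new PriorityQueue<>(
            (a, b) -> Float.compare(shards[b[0]][b[1]], shards[a[0]][a[1]]));
        for (int i = 0; i < shards.length; i++) {
            if (shards[i].length > 0) {
                queue.add(new int[] {i, 0});
            }
        }

        int seen = 0;
        int filled = 0;
        while (filled < resultLen) {
            int[] top = queue.poll();
            if (seen++ >= from) {
                result[filled++] = shards[top[0]][top[1]]; // inside the window: keep it
            }
            if (++top[1] < shards[top[0]].length) {
                queue.add(top); // advance this shard's cursor
            }
        }
        return result;
    }

    public static void main(String[] args) {
        float[][] shards = {{0.9f, 0.5f, 0.1f}, {0.8f, 0.7f}};
        System.out.println(java.util.Arrays.toString(merge(1, 2, shards)));
    }
}
```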
Bug fixes
* LUCENE-5483: Fix inaccuracies in HunspellStemFilter. Multi-stage affix-stripping,
prefix-suffix dependencies, and COMPLEXPREFIXES now work correctly according
to the hunspell algorithm. Removed the recursionCap parameter, as it is no longer needed: rules for
recursive affix application are driven correctly by continuation classes in the affix file.
(Robert Muir)
* LUCENE-5497: HunspellStemFilter properly handles escaped terms and affixes without conditions.
(Robert Muir)
* LUCENE-5505: HunspellStemFilter ignores BOM markers in dictionaries and handles varying
types of whitespace in SET/FLAG commands. (Robert Muir)
* LUCENE-5507: Fix HunspellStemFilter loading of dictionaries with large amounts of aliases
etc before the encoding declaration. (Robert Muir)
* LUCENE-5111: Fix WordDelimiterFilter to return offsets in correct order. (Robert Muir)
* LUCENE-5555: Fix SortedInputIterator to correctly encode/decode contexts in presence of payload (Areek Zillur)
* LUCENE-5559: Add missing argument checks to tokenfilters taking
numeric arguments. (Ahmet Arslan via Robert Muir)
* LUCENE-5568: Benchmark module's "default.codec" option didn't work. (David Smiley)
* SOLR-5983: HTMLStripCharFilter is treating CDATA sections incorrectly.
(Dan Funk, Steve Rowe)
* LUCENE-5615: Validate per-segment delete counts at write time, to
help catch bugs that might otherwise cause corruption (Mike McCandless)
* LUCENE-5612: NativeFSLockFactory no longer deletes its lock file. This cannot be done
safely without the risk of deleting someone else's lock file. If you use NativeFSLockFactory,
you may see write.lock hanging around from time to time: it's harmless.
(Uwe Schindler, Mike McCandless, Robert Muir)
* LUCENE-5624: Ensure NativeFSLockFactory does not leak file handles if it is unable
to obtain the lock. (Uwe Schindler, Robert Muir)
* LUCENE-5626: Fix bug in SimpleFSLockFactory's obtain() that sometimes threw
IOException (ERROR_ACCESS_DENIED) on Windows if the lock file was created
concurrently. This error is now handled the same way as in NativeFSLockFactory,
by returning false. (Uwe Schindler, Robert Muir, Dawid Weiss)
* LUCENE-5630: Add missing META-INF entry for UpperCaseFilterFactory.
(Robert Muir)
Tests
* LUCENE-5630: Fix TestAllAnalyzersHaveFactories to correctly check for existence
of class and corresponding Map<String,String> ctor. (Uwe Schindler, Robert Muir)
Test Framework
* LUCENE-5592: Incorrectly reported uncloseable files. (Dawid Weiss)
* LUCENE-5577: Temporary folder and file management (and cleanup facilities)
(Mark Miller, Uwe Schindler, Dawid Weiss)
* LUCENE-5567: When a suite fails with zombie threads, the failure marker and count
were not propagated properly. (Dawid Weiss)
* LUCENE-5449: Rename _TestUtil and _TestHelper to remove the leading _.
* LUCENE-5501: Added random out-of-order collection testing (when the collector
supports it) to AssertingIndexSearcher. (Adrien Grand)
Build
* LUCENE-5463: RamUsageEstimator.(human)sizeOf(Object) is now a forbidden API.
(Adrien Grand, Robert Muir)
* LUCENE-5512: Remove redundant typing (use diamond operator) throughout
the codebase. (Furkan KAMACI via Robert Muir)
* LUCENE-5614: Enable building on Java 8 using Apache Ant 1.8.3 or 1.8.4
by adding a workaround for the Ant bug. (Uwe Schindler)
* LUCENE-5612: Add a new Ant target in lucene/core to test LockFactory
implementations: "ant test-lock-factory". (Uwe Schindler, Mike McCandless,
Robert Muir)
Documentation
* LUCENE-5534: Add javadocs to GreekStemmer methods.
(Stamatis Pitsios via Robert Muir)
======================= Lucene 4.7.2 =======================
Bug Fixes
* LUCENE-5574: Closing a near-real-time reader no longer attempts to
delete unreferenced files if the original writer has been closed;
this could cause index corruption in certain cases where index files
were directly changed (deleted, overwritten, etc.) in the index
directory outside of Lucene. (Simon Willnauer, Shai Erera, Robert
Muir, Mike McCandless)
* LUCENE-5570: Don't let FSDirectory.sync() create new zero-byte files, instead throw
exception if a file is missing. (Uwe Schindler, Mike McCandless, Robert Muir)
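Sketch for LUCENE-5570 above: an fsync helper that cannot create a file as a
side effect. It uses plain JDK NIO and is not FSDirectory's code; SyncSketch
and fsync are hypothetical names. Opening the channel with WRITE but without
CREATE makes a missing file fail with NoSuchFileException, whereas opening a
RandomAccessFile in "rw" mode would create an empty file.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class SyncSketch {
    /** Forces the file's content and metadata to disk; fails if the file is missing. */
    static void fsync(Path file) throws IOException {
        // WRITE without CREATE: a missing file raises NoSuchFileException.
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
            ch.force(true);
        }
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("sync-sketch", ".bin");
        fsync(p); // existing file: succeeds
        Files.delete(p);
        try {
            fsync(p); // missing file: must not be recreated
        } catch (NoSuchFileException e) {
            System.out.println("missing file reported, exists=" + Files.exists(p));
        }
    }
}
```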
======================= Lucene 4.7.1 =======================
Changes in Runtime Behavior
* LUCENE-5532: AutomatonQuery.equals is no longer implemented as "accepts same language".
This was inconsistent with hashCode, and unnecessary for any subclasses in Lucene.
If you desire this in a custom subclass, minimize the automaton. (Robert Muir)
Bug Fixes
* LUCENE-5450: Fix getField() NPE issues with SpanOr/SpanNear when they have an
empty list of clauses. This can happen for example, when a wildcard matches