From 9ad1c0e8a8180683efbcc208858a35ad26e86288 Mon Sep 17 00:00:00 2001
From: Katharina Schmid
Date: Tue, 18 Jun 2024 13:24:28 +0200
Subject: [PATCH 1/2] Fix read offsets and length

---
 .../java/com/github/dbmdz/solrocr/reader/BaseSourceReader.java | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/main/java/com/github/dbmdz/solrocr/reader/BaseSourceReader.java b/src/main/java/com/github/dbmdz/solrocr/reader/BaseSourceReader.java
index 58d0de83..e56a6b50 100644
--- a/src/main/java/com/github/dbmdz/solrocr/reader/BaseSourceReader.java
+++ b/src/main/java/com/github/dbmdz/solrocr/reader/BaseSourceReader.java
@@ -218,7 +218,7 @@ public Section getAsciiSection(int offset) throws IOException {
     int readLen = Math.min(sectionSize, this.length() - startOffset);
     int numRead = 0;
     while(numRead < readLen) {
-      numRead += this.readBytes(copyBuf, 0, startOffset, readLen);
+      numRead += this.readBytes(copyBuf, numRead, startOffset + numRead, readLen - numRead);
     }
     // Construct a String without going through a decoder to save on CPU.
     // Given that the method has been deprecated since Java 1.1 and was never removed, I don't think

From b560ee36dcbc1f7e2be5dc4e447e7d45ac962dc1 Mon Sep 17 00:00:00 2001
From: Katharina Schmid
Date: Tue, 18 Jun 2024 13:29:56 +0200
Subject: [PATCH 2/2] Fix parameter name in documentation

---
 docs/performance.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/performance.md b/docs/performance.md
index 2f0e465c..f8de68d0 100644
--- a/docs/performance.md
+++ b/docs/performance.md
@@ -43,11 +43,11 @@ the cache size, the more data is read from the disk, i.e. the chances of cache h
 comes at the cost of more memory usage and more allocations in the JVM, which can have a performance impact.
 By default, the plugin uses a section size of 8KiB with a maximum number of cached sections of 10, which is a good
 trade-off for most setups and performed well in our benchmarks. If you want to tweak these
-settings, use the `sectionReadSizeKib` and `maxSectionCacheSizeKib` parameters on the `OcrHighlightComponent`
+settings, use the `sectionReadSizeKiB` and `maxSectionCacheSizeKiB` parameters on the `OcrHighlightComponent`
 in your `solrconfig.xml`:
 
-- `sectionReadSizeKib`: The size of the sections that are read from the OCR files. The default is 8KiB.
-- `maxSectionCacheSizeKib`: The maximum memory that is used for caching sections. The default is 10 * `sectionReadSizeKib`.
+- `sectionReadSizeKiB`: The size of the sections that are read from the OCR files. The default is 8KiB.
+- `maxSectionCacheSizeKiB`: The maximum memory that is used for caching sections. The default is 10 * `sectionReadSizeKiB`.
 
 ## Concurrency
 The plugin can read multiple files in parallel and also process them concurrently. By default, it will
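Patch 1 changes how the read loop in `getAsciiSection` advances. Before the fix, every iteration called `readBytes` with destination offset `0`, source offset `startOffset` and the full `readLen`. If `readBytes` returned fewer bytes than requested, the next call therefore rewrote the beginning of `copyBuf` with the same bytes instead of continuing where the previous read stopped. The sketch below is a minimal, self-contained illustration of both loops and is not code from the plugin. `ChunkySource` is a hypothetical stand-in that returns at most 3 bytes per call, and its `readBytes(dst, dstOffset, srcOffset, len)` parameter order is an assumption inferred from the call site in the diff.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ReadFullyDemo {

  /**
   * Hypothetical positional source (not the plugin's API). readBytes copies up to len bytes,
   * starting at srcOffset, into dst at dstOffset and returns the number of bytes copied. To
   * simulate short reads it never copies more than 3 bytes per call.
   */
  static final class ChunkySource {
    private final byte[] data;

    ChunkySource(byte[] data) {
      this.data = data;
    }

    int readBytes(byte[] dst, int dstOffset, int srcOffset, int len) {
      int n = Math.min(3, Math.min(len, data.length - srcOffset));
      System.arraycopy(data, srcOffset, dst, dstOffset, n);
      return n;
    }
  }

  // Pre-fix pattern: every call starts again at dst[0] and at srcOffset = start with the full
  // length, so a short read is overwritten by the same bytes instead of being continued.
  static byte[] readBuggy(ChunkySource src, int start, int len) {
    byte[] buf = new byte[len];
    int numRead = 0;
    while (numRead < len) {
      numRead += src.readBytes(buf, 0, start, len);
    }
    return buf;
  }

  // Post-fix pattern: destination offset, source offset and remaining length all advance by
  // numRead. Assumes start + len <= data.length (the patched method caps readLen at
  // this.length() - startOffset); otherwise a source returning 0 would make this loop spin.
  static byte[] readFixed(ChunkySource src, int start, int len) {
    byte[] buf = new byte[len];
    int numRead = 0;
    while (numRead < len) {
      numRead += src.readBytes(buf, numRead, start + numRead, len - numRead);
    }
    return buf;
  }

  public static void main(String[] args) {
    byte[] data = "abcdefghij".getBytes(StandardCharsets.US_ASCII);
    ChunkySource src = new ChunkySource(data);
    byte[] expected = Arrays.copyOfRange(data, 2, 9); // the bytes "cdefghi"
    System.out.println("buggy loop correct: " + Arrays.equals(expected, readBuggy(src, 2, 7)));
    System.out.println("fixed loop correct: " + Arrays.equals(expected, readFixed(src, 2, 7)));
  }
}
```

Running `main` prints `false` for the buggy loop, whose buffer ends up as `cde` followed by four zero bytes, and `true` for the fixed loop, which fills the buffer with `cdefghi` over three calls (3 + 3 + 1 bytes).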