Release 0.8.1
This is a bugfix release targeting mainly the MiniOCR and ALTO
implementations.

**Bugfixes:**

- ALTO: Fix handling of empty words. Previously any words after a word element
  with no text **would be skipped entirely during indexing** 😱😱.
- MiniOCR: Fix handling of empty words. Previously a word element with no text
  would make the parser crash.
- MiniOCR: Make the `wh` attribute on `<p>` page elements actually optional.
  The documentation said it was optional, but the parser would crash when
  attempting to handle elements without the attribute.

**Other Changes:**

- A warning will now be logged if none of the fields requested with `hl.ocr.fl`
  exist or are defined as stored fields. Previously highlighting would just
  not work, with no indication to users as to why this was the case.
jbaiter committed Jun 10, 2022
1 parent 9060551 commit a505a25
Showing 6 changed files with 30 additions and 8 deletions.
25 changes: 25 additions & 0 deletions docs/changes.md
@@ -1,4 +1,29 @@
## 0.8.1 (2022-06-10)
[GitHub Release](https://github.com/dbmdz/solr-ocrhighlighting/releases/tag/0.8.1)

This is a bugfix release targeting mainly the MiniOCR and ALTO
implementations.

**Bugfixes:**

- ALTO: Fix handling of empty words. Previously any words after a word element
with no text **would be skipped entirely during indexing** 😱😱.
- MiniOCR: Fix handling of empty words. Previously a word element with no text
would make the parser crash.
- MiniOCR: Make the `wh` attribute on `<p>` page elements actually optional.
The documentation said it was optional, but the parser would crash when
attempting to handle elements without the attribute.

**Other Changes:**

- A warning will now be logged if none of the fields requested with `hl.ocr.fl`
exist or are defined as stored fields. Previously highlighting would just
not work, with no indication to users as to why this was the case.


## 0.8.0 (2022-06-01)
[GitHub Release](https://github.com/dbmdz/solr-ocrhighlighting/releases/tag/0.8.0)

The major improvement in this version is compatibility with Solr 9.

Due to a number of API changes in Solr and Lucene, we now have to ship two separate releases,
2 changes: 1 addition & 1 deletion docs/installation.md
@@ -19,7 +19,7 @@ your Solrcloud cluster. All paths are relative to the Solr installation director
`$ ./bin/solr package add-repo dbmdz.github.io https://dbmdz.github.io/solr`
- **Install package** in the latest version:<br>
`$ ./bin/solr package install ocrhighlighting` if you're on Solr 9, otherwise:
-`$ ./bin/solr package install ocrhighlighting:0.8.0-solr78`
+`$ ./bin/solr package install ocrhighlighting:0.8.1-solr78`

!!! caution "Be sure to use the `ocrhighlighting:` prefix when specifying classes in your configuration."
When using the Package Manager, classes from plugins have to be prefixed (separated by a colon) by
2 changes: 1 addition & 1 deletion integration-tests/run.sh
@@ -51,7 +51,7 @@ for version in $SOLR9_VERSIONS; do
-v "$plugin_dir:/build" \
-p "31337:8983" \
solr:$version \
-solr-precreate ocr /opt/core-config & > /dev/null 2>&1 & \
+solr-precreate ocr /opt/core-config > /dev/null 2>&1 & \
wait_for_solr "$container_name"
if ! python3 test.py; then
printf " !!!FAIL!!!\n"
2 changes: 1 addition & 1 deletion pom.xml
@@ -6,7 +6,7 @@

<groupId>de.digitalcollections</groupId>
<artifactId>solr-ocrhighlighting</artifactId>
-<version>0.8.1-SNAPSHOT</version>
+<version>0.8.1</version>

<name>Solr OCR Highlighting Plugin</name>
<description>
@@ -54,9 +54,7 @@ public OcrPage parsePageFragment(String pageFragment) {
}
Dimension dims = null;
if (m.group("width") != null && m.group("height") != null) {
-dims = new Dimension(
-Integer.parseInt(m.group("width")),
-Integer.parseInt(m.group("height")));
+dims = new Dimension(Integer.parseInt(m.group("width")), Integer.parseInt(m.group("height")));
}
String pageId = m.group("pageId");
return new OcrPage(pageId, dims);
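
The null check in the hunk above is what lets `dims` stay `null` when a page element carries no `wh` attribute. The following is a minimal, self-contained sketch of that pattern, not the plugin's actual code: the regex `PAGE_PAT`, the `Dims` holder class, and the `parseDims` helper are hypothetical stand-ins, and the `wh="<width> <height>"` layout is an assumption about the MiniOCR page element.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionalPageDims {
  /** Hypothetical stand-in for the Dimension type used in the hunk above. */
  static final class Dims {
    final int width;
    final int height;

    Dims(int width, int height) {
      this.width = width;
      this.height = height;
    }
  }

  // Assumed pattern (not the plugin's real one): an opening <p> tag with an
  // optional xml:id attribute, optionally followed by wh="<width> <height>".
  private static final Pattern PAGE_PAT =
      Pattern.compile(
          "<p(?:\\s+xml:id=\"(?<pageId>[^\"]*)\")?"
              + "(?:\\s+wh=\"(?<width>\\d+) (?<height>\\d+)\")?\\s*/?>");

  /** Returns the page dimensions, or null if the fragment has no wh attribute. */
  static Dims parseDims(String pageFragment) {
    Matcher m = PAGE_PAT.matcher(pageFragment);
    if (!m.find()) {
      return null;
    }
    // Named groups inside an optional group that did not participate in the
    // match are reported as null, so guard before calling Integer.parseInt.
    if (m.group("width") != null && m.group("height") != null) {
      return new Dims(
          Integer.parseInt(m.group("width")), Integer.parseInt(m.group("height")));
    }
    return null;
  }
}
```

Without the guard, `Integer.parseInt(null)` would throw a `NumberFormatException`, which is one plausible way a missing attribute could crash a parser of this shape.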
3 changes: 1 addition & 2 deletions src/test/java/com/github/dbmdz/solrocr/solr/MiniOcrTest.java
@@ -336,8 +336,7 @@ public void testPagesWithoutDimensions() {
assertQ(
req,
"count(//lst[@name='57371']//arr[@name='snippets']/lst)='10'",
-"(//lst[@name='57371']//arr[@name='snippets']/lst)[1]/arr[@name='pages']/lst/str[@name='id']/text()='716'"
-);
+"(//lst[@name='57371']//arr[@name='snippets']/lst)[1]/arr[@name='pages']/lst/str[@name='id']/text()='716'");
assertU(delI("57371"));
assertU(commit());
}