-
Notifications
You must be signed in to change notification settings - Fork 63
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add EAD collection import #6297
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some notes from reading the changes.
@@ -108,6 +122,10 @@ public CreateProcessForm() { | |||
this.priorityList = priorityList; | |||
} | |||
|
|||
public void calculateNumberOfEadElements() throws XMLStreamException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing JavaDoc
Kitodo/src/main/java/org/kitodo/production/forms/createprocess/CatalogImportDialog.java
Outdated
Show resolved
Hide resolved
private SearchResult searchResult = null; | ||
|
||
/** | ||
* Empty default constructor. Sets default catalog and search field, if configured. | ||
*/ | ||
public LazyHitModel() { | ||
public LazyHitModel(CatalogImportDialog dialog) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Javadoc should be adjusted.
if (Objects.isNull(parentID)) { | ||
this.parentTempProcess = null; | ||
return; | ||
public void checkForParent(String parentID, Ruleset ruleset, int projectID) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JavaDoc should be adjusted.
Kitodo/src/main/java/org/kitodo/production/thread/ImportEadProcessesThread.java
Outdated
Show resolved
Hide resolved
/** | ||
* Set searchTerm. | ||
* | ||
* @param searchTerm as java.lang.String |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
java.lang.String
could be simplified to string
or the whole wording should be adjusted.
StringConstants.LEVEL, level); | ||
} | ||
|
||
public boolean isMaxNumberOfRecordsExceeded(String xmlString, String tagName) throws XMLStreamException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing JavaDoc.
@@ -409,8 +412,7 @@ public String getParentID(Document document, String higherLevelIdentifier, Strin | |||
*/ | |||
public TempProcess createTempProcessFromDocument(ImportConfiguration importConfiguration, Document document, | |||
int templateID, int projectID) | |||
throws ProcessGenerationException, IOException, TransformerException, InvalidMetadataValueException, | |||
NoSuchMetadataFieldException { | |||
throws ProcessGenerationException, IOException, TransformerException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JavaDoc should be adjusted.
While testing this i first made a dumb configuration error. I associated the wrong XSL-file with the EAD-Mapping-Definition (i associated the pica2ktodo.xsl). This error is produced by the fact that the kitodo-production/Kitodo/src/main/java/org/kitodo/production/services/data/ImportService.java Line 894 in c947be2
It is a dumb mistake on my side, but maybe we can assist by also check if we also check if the importDocument is empty (file has no real content) - as we do later with the resultDocument - and the importer has to check the mapping file configuration. |
I tested with the given test ead file. A problem appears, when i try to import the file two times. I then get a Null pointer for the kitodo-production/Kitodo/src/main/java/org/kitodo/production/services/data/ImportService.java Line 1227 in c947be2
the process titles are considered invalid because they do already exist. kitodo-production/Kitodo/src/main/java/org/kitodo/production/services/data/ImportService.java Lines 1342 to 1344 in c947be2
This should throw a Then the Nullpointer is thrown in the FileService, because the file location is based on the missing ID. kitodo-production/Kitodo/src/main/java/org/kitodo/production/services/file/FileService.java Lines 242 to 246 in c947be2
When processing the list in the background (because the number of imported processes is greater than the configured limit) the process stops with a ProcessGenerationException. For the background processing the behaviour should probably be more like in the mass import, where items with duplicate titles are skipped. In the form Kitodo should catch invalid titles and force the user to correct them. |
* @return number of EAD elements with given level | ||
* @throws XMLStreamException when parsing XML string fails | ||
*/ | ||
public static int getNumberOfElements(String xmlString, String eadLevel) throws XMLStreamException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public static int getNumberOfElements(String xmlString, String eadLevel) throws XMLStreamException { | |
public static int getNumberOfEADElements(String xmlString, String eadLevel) throws XMLStreamException { |
This can only be used with EAD elements. Maybe a generic function for getting the number of elements is useful as well, but right now it does not work like that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@BartChris yes, you are right. This was intended to be used in more general imports, but at the moment only works for the EAD import. I have renamed the method as you suggested.
|
||
if (parentElements.size() != 1) { | ||
throw new ProcessGenerationException( | ||
String.format("EAD XML does not contain exactly one element of parent level '%s'!", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should maybe be translated as it is displayed in the UI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
While inspecting the Pull request i noticed a general problem: |
@BartChris thanks for your analysis. Actually I implemented it this way deliberately. The goal here was to restore the original state - from before the import started - when an error occurs so that the user can first check the uploaded file for problems, fix those problems and then re-try the upload with the updated file. Therefore, all processes that were already created up to the point where the exception is thrown are "cleaned up", e.g. removed. I do understand, though, that a differnt behavior could be preferable, where importing collections does not fail with individual problematic EAD data records, but instead skips those processes and proceeds with the next "items" from the EAD file. If there is no consense on which option is better, we could ideally introduce a parameter in |
Maybe nothing for now: but could here help a dry run on importing without a real save on database, index and file system? Deleting processes can even cause additional work in the background. |
I wouldn't do that in this particular case. Uploaded EAD XML files are parsed using a SAX parser in order to be able to handle large XML files without increasing memory requirements. To keep this advantage no lists of complex objects should be retained during the import of an EAD collection (which would grow with XML files containing large collections). Instead each process is completely handled (including saving it to filesystem, index and database) when reaching its corresponding "end event" in the main while loop of I added a parameter that controls how the system should handle exceptions during EAD import. The default behavior is now as @BartChris described, so that erroneous records are skipped and the import is continued with the following EAD element. In this case, no intermediate processes are removed, so no additional deletion of processes is performed anyway. |
I still have the problem, that when using the standard import way (without background task) i get a NullPointer exception, when i try to import the same EAD collection multiple times. |
This seems to be a problem in the hierarchical import in general that also occurs in the master branch and is not related to the changes in this pull request. There (in the master branch), I just tried to import a process hierarchy from Kalliope (ID @BartChris If this is easy to resolve I can try to include a fix in this pull request, otherwise I would suggest to handle that issue separately. |
@BartChris I think this will take a little more time to properly fix. I would therefor suggest to deal with this problem separately. I will create a ticket for this problem. |
@solth I agree and approve the Pull request. I think the real usage in the archives has to show, wether we need additional adjustments, but it works good for me. Some additional ideas for the future:
|
Yes, these are definitely the next two incremental steps that we should try to achieve and that I already had in mind as well. Thanks a lot for the review! |
This pull request adds the option to import EAD collections from uploaded XML files. The import creates one (parent) process for the collection and one (child) for each "file" contained in the XML file, thus allowing complete collections ("Bestände) into individual processes from a single file upload.
To be able to use this EAD import, first a ruleset and an Import configuration of type "File Upload" has to be configured with metadata format
EAD
:Accordingly, the mapping file used in the import configuration has to map EAD to the Kitodo internal metadata format:
A separate mapping file can be configured to map the collection level of the imported EAD XML file, in the field "Mapping file for parent record":
This pull request adds example mapping files for the parent and child processes (
eadParent2kitodo.xsl
andead2kitodo.xsl
) as well as a corresponding ruleset (ruleset_ead.xml
) that can be used for this purpose and that of cause can also extended when required.The user can then use this import configuration to upload an EAD XML file containing the EAD collection. The upload dialog also allows to select the EAD level of the element that is to be transformed into the parent process from
COLLECTION
,CLASS
andSERIES
and the EAD level of items that are transformed into child processes fromFILE
andITEM
:Additionally, a limit for the maximum number or child processes that can be processed interactively in the import mask can now be configured in
kitodo_config.properties
using the parametermaxNumberOfProcessesForImportMask
. (this value defaults to 5). If the number of child elements in the uploaded file exceeds this limit, the user is offered with a choice to relocate the import of all processes to a background task:If the user has the corresponding permission, the progress of this background import can then be monitored on the "Task manager" page of Kitodo:
Fixes #5984