Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update init data.xml file with new tenant data (presentation v5.0) #284

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

csidirop
Copy link
Contributor

See #283

Structures and Metadata are imported correctly. Entries now have the correct Data Format (see kitodo/kitodo-presentation#1147) and only theirs.

But I just can't get the default namespaces to be imported as well:
grafik

I manually added the old subtree, but it still don't works.

@beatrycze-volk beatrycze-volk added the 🛠️ maintenance A task to keep the code up-to-date and manageable. label Mar 21, 2024
@beatrycze-volk beatrycze-volk added this to the DFG-Viewer 6.1.0 milestone Mar 21, 2024
@beatrycze-volk beatrycze-volk linked an issue Mar 22, 2024 that may be closed by this pull request
@beatrycze-volk
Copy link
Contributor

beatrycze-volk commented Mar 28, 2024

I have few questions, I hope they can help you to fix the problem with data insertion.

  1. Manually added tx_dlf_format table has mixed order of uids
                        <table index="tx_dlf_formats" type="array">
				<rec index="3" type="array">
					<uid>3</uid>
					<pid>0</pid>
					<title>ALTO</title>
					<relations index="rels" type="array"></relations>
					<softrefs type="array"></softrefs>
				</rec>
				<rec index="4" type="array">
					<uid>4</uid>
					<pid>0</pid>
					<title>IIIF1</title>
					<relations index="rels" type="array"></relations>
					<softrefs type="array"></softrefs>
				</rec>
				<rec index="5" type="array">
					<uid>5</uid>
					<pid>0</pid>
					<title>IIIF2</title>
					<relations index="rels" type="array"></relations>
					<softrefs type="array"></softrefs>
				</rec>
				<rec index="6" type="array">
					<uid>6</uid>
					<pid>0</pid>
					<title>IIIF3</title>
					<relations index="rels" type="array"></relations>
					<softrefs type="array"></softrefs>
				</rec>
				<rec index="2" type="array">
					<uid>2</uid>
					<pid>0</pid>
					<title>MODS</title>
					<relations index="rels" type="array"></relations>
					<softrefs type="array"></softrefs>
				</rec>
				<rec index="1" type="array">
					<uid>1</uid>
					<pid>0</pid>
					<title>TEIHDR</title>
					<relations index="rels" type="array"></relations>
					<softrefs type="array"></softrefs>
				</rec>
			</table>

Do you think that it could help if they would be sorted 1, 2, 3, 4, 5, 6?

  1. tx_dlf_formats is only inserted in section records type="array"
    <records type="array">
    but missing in pid_lookup type="array" section
    <pid_lookup type="array">
    and is also missing in actual inserting of records to database
    <records type="array">

Could you also add it there?

  1. Is it possible that now with usage of EXTBASE the data is validated during import?

In this file are first inserted table rows of tx_dlf_metadata table and their are containing ids to tx_dlf_metadataformat rows which are not yet inserted. Additionally tx_dlf_metadataformat rows references rows from tx_dlf_formats which are not existing at all.

Ein frohes Osterfest! :)

@csidirop csidirop marked this pull request as draft April 9, 2024 09:59
@csidirop
Copy link
Contributor Author

Hi, sorry I had no time to answer earlier.

It took me a bit to realize what the issue was, but you comments showed me what was missing and the right direction. Now should everything work again:

grafik

grafik

What I do not know is, why PHPStan is throwing an error now.

@csidirop csidirop marked this pull request as ready for review April 11, 2024 09:02
@beatrycze-volk
Copy link
Contributor

beatrycze-volk commented Apr 11, 2024

It took me a bit to realize what the issue was, but you comments showed me what was missing and the right direction. Now should everything work again:

Great! :)

What I do not know is, why PHPStan is throwing an error now.

I think, you need to rebase your branch on top of the current master branch. It should fix the problem.

Edit: Or actually I need to fix it, as it is definition of the error which doesn't exist anymore in Kitodo.Production.

@csidirop
Copy link
Contributor Author

What I do not know is, why PHPStan is throwing an error now.

I think, you need to rebase your branch on top of the current master branch. It should fix the problem.

Edit: Or actually I need to fix it, as it is definition of the error which doesn't exist anymore in Kitodo.Production.

Yes, I had the same error when requiring this branch by composer. Composer loads an older version of presentation without tha changes. Trying to open a document results an exactly that error Call to undefined method Kitodo\Dlf\Domain\Model\Document::getCurrentDocument(). But with the current master everything works as aspected

@beatrycze-volk
Copy link
Contributor

I have removed this error message from configuration file. Now you can rebase your branch :)

@csidirop csidirop force-pushed the fix/init_data branch 2 times, most recently from 7605fcd to 1f4a669 Compare April 11, 2024 09:51
@csidirop
Copy link
Contributor Author

There is still an error unfortunately. Some documents throwing following exception while indexing:

Uncaught TYPO3 Exception Argument 1 passed to Kitodo\Dlf\Domain\Model\Document::setStructure() must be an instance of Kitodo\Dlf\Domain\Model\Structure, null given, called in /var/www/typo3/public/typo3conf/ext/dlf/Classes/Command/BaseCommand.php on line 227 thrown in file /var/www/typo3/public/typo3conf/ext/dlf/Classes/Domain/Model/Document.php in line 474

Need a bit more work here ...

@csidirop
Copy link
Contributor Author

I am unsure why this issue occurs. When I initialize a new instance using a clean export (without the new tenant data), such as
the file here cleanstate.xml.txt, and then add the new tenant data, everything works as expected.

However, if I use this export to initialize a new instance, everything appears to be working correctly at first, but then throws an error on some documents.

@beatrycze-volk
Copy link
Contributor

I am unsure why this issue occurs. When I initialize a new instance using a clean export (without the new tenant data), such as the file here cleanstate.xml.txt, and then add the new tenant data, everything works as expected.

However, if I use this export to initialize a new instance, everything appears to be working correctly at first, but then throws an error on some documents.

You mean this error with empty structure? If yes, then probably some of the documents are missing the structures. Adding one to each of the document should solve the problem.

@csidirop
Copy link
Contributor Author

csidirop commented May 15, 2024

First sorry for the late response! And second sorry, sometimes I express myself really unclearly. What I ment:

  • If I set up the Tenant data manually in presentation, I can index the documents without any problems.
  • When I initialize presentation & DFG Viewer with the new data (this PR), I can only index some of the documents.

But I looked now deeper into it and found following:
Some structure types have a tailing zero as index_name (eg. monograph0). Presumably to have unique index name. If the indexname with tailing zero is representing the German translation, the indexing aborts. Otherwise there is no issue.
Manually renaming those fixes the issue.

Here a screenshot:
grafik
In this example the type letter would throw an exception but document wouldn't.

Setting up the New Tenant manually does not produce entries with tailing zeros:
grafik

I remember that we added tailing zeros an some point in presentation. Why again? Uniqueness? (Edit: we did not. I made a proposal for the metadata) Are those missing in the first place and TYPO3 is adding them when importing the xml file? (I don't remember how the init xml is handelt)

@beatrycze-volk
Copy link
Contributor

But I looked now deeper into it and found following: Some structure types have a tailing zero as index_name (eg. monograph0). Presumably to have unique index name. If the indexname with tailing zero is representing the German translation, the indexing aborts. Otherwise there is no issue. Manually renaming those fixes the issue.

In this example the type letter would throw an exception but document wouldn't.

I have checked out the data.xml file to see how looks those two structures there.

Letter

Parent structure

  • uid 150
  • label Brief
  • index_name letter

Translation structure

  • uid 80
  • label Letter
  • index_name letter0
<tablerow index="tx_dlf_structures:150" type="array">
	<fieldlist index="data" type="array">
		<field index="uid" type="integer">150</field>
		...
		<field index="sys_language_uid" type="integer">0</field>
		<field index="l18n_parent" type="integer">0</field>
		<field index="l18n_diffsource">a:7:{s:8:&quot;toplevel&quot;;N;s:5:&quot;label&quot;;N;s:10:&quot;index_name&quot;;N;s:8:&quot;oai_name&quot;;N;s:16:&quot;sys_language_uid&quot;;N;s:6:&quot;hidden&quot;;N;s:6:&quot;status&quot;;N;}</field>
		...
		<field index="label">Brief</field>
		<field index="index_name">letter</field>
		...
	</fieldlist>
	<related index="rels" type="array"></related>
</tablerow>
<tablerow index="tx_dlf_structures:80" type="array">
	<fieldlist index="data" type="array">
		<field index="uid" type="integer">80</field>
		...
		<field index="sys_language_uid" type="integer">1</field>
		<field index="l18n_parent" type="integer">150</field>
		<field index="l18n_diffsource">a:2:{s:16:&quot;sys_language_uid&quot;;N;s:11:&quot;l18n_parent&quot;;N;}</field>
		...
		<field index="label">Letter</field>
		<field index="index_name">letter0</field>
		...
	</fieldlist>
	<related index="rels" type="array">
		...
	</related>
</tablerow>

Document

Parent structure

  • uid 129
  • label Dokument
  • index_name document

Translation structure

  • uid 131
  • label Document
  • index_name document0
<tablerow index="tx_dlf_structures:129" type="array">
	<fieldlist index="data" type="array">
		<field index="uid" type="integer">129</field>
		...
		<field index="sys_language_uid" type="integer">0</field>
		<field index="l18n_parent" type="integer">0</field>
		<field index="l18n_diffsource">a:8:{s:8:&quot;toplevel&quot;;N;s:5:&quot;label&quot;;N;s:10:&quot;index_name&quot;;N;s:8:&quot;oai_name&quot;;N;s:9:&quot;thumbnail&quot;;N;s:16:&quot;sys_language_uid&quot;;N;s:6:&quot;hidden&quot;;N;s:6:&quot;status&quot;;N;}</field>
		...
		<field index="label">Dokument</field>
		<field index="index_name">document</field>
		...
	</fieldlist>
	<related index="rels" type="array"></related>
</tablerow>
<tablerow index="tx_dlf_structures:131" type="array">
	<fieldlist index="data" type="array">
		<field index="uid" type="integer">131</field>
		...
		<field index="sys_language_uid" type="integer">1</field>
		<field index="l18n_parent" type="integer">129</field>
		<field index="l18n_diffsource">a:2:{s:16:&quot;sys_language_uid&quot;;N;s:11:&quot;l18n_parent&quot;;N;}</field>
		...
		<field index="label">Document</field>
		<field index="index_name">document0</field>
		...
	</fieldlist>
	<related index="rels" type="array">
		...
	</related>
</tablerow>

At first it looks quiet ok for both structures, but then I have noticed one difference between them. For document uid of the parent is 129 and then translation 131. It makes totally sens, the parent is saved and after that the translation. But for letter is the other way round. The parent has uid 150 and the translation 80.

I have observed similar dependency for other structures.

  • cover (correct) - the parent uid 107 and the translation uid 139
  • collation (incorrect) - the parent uid 152 and the translation uid 143

I'm not sure if it really matters but maybe the records are saved one after another and it causes the trouble here. What do you think?

@csidirop csidirop force-pushed the fix/init_data branch 2 times, most recently from cbfc9ec to 1f4a669 Compare May 23, 2024 11:40
@csidirop csidirop marked this pull request as draft May 23, 2024 11:41
@csidirop csidirop marked this pull request as ready for review May 23, 2024 13:35
@csidirop
Copy link
Contributor Author

csidirop commented May 23, 2024

Thank you for pointing me to the right direction!

The issue was again, that the index_name had to be unique but wasn't! I fixed this for metadata (kitodo/kitodo-presentation#1186) but not for structures (which I have done now: kitodo/kitodo-presentation#1211).

With the new export the metadata and structures are getting set up correct:
grafik

Indexing works again for all my test documents worked again. I hope that its done now :D

@beatrycze-volk
Copy link
Contributor

Thanks for your work :)

Do you think that it will still work properly with changes from kitodo/kitodo-presentation#1212 ?

@csidirop
Copy link
Contributor Author

So at first everything looked good, but after exporting the data TYPO3 again added tailing zeros to the metadata and structure index_names.
Setting up DFG-Viewer resulted in the same mess:
grafik

We are thinking that TYPO3 has some issue with the index_name. According to the database column (ext_tables.sql#L69 or ext_tables.sql#L97), we know that the value is not UNIQUE but a KEY. Maybe the latter is messing with TYPO3.

Because in addition this, if we move an entry in the backend TYPO3 will again add a tailing zero:
grafik
Like it says The value ... has been changed ... as it is required to be unique.

So we could ignore this issue, hope no one moves the entries and just delete the zeros from the init file, or look deeper into it. What was the problem with my solution again, that we append the language_id to the index_name?

@beatrycze-volk
Copy link
Contributor

We are thinking that TYPO3 has some issue with the index_name. According to the database column (ext_tables.sql#L69 or ext_tables.sql#L97), we know that the value is not UNIQUE but a KEY. Maybe the latter is messing with TYPO3.

Yes, it looks like some problem inside the TYPO3... I would say, we leave it as it is.

So we could ignore this issue, hope no one moves the entries and just delete the zeros from the init file, or look deeper into it. What was the problem with my solution again, that we append the language_id to the index_name?

It was not a problem with your solution. It was problem with the way how Kitodo inserts data in backend controller (using Extbase instead of DataHandler). It was causing a lot of mess and confusion.

Eventually we could try still one approach. If it is possible, we could remove index_name field for the translated records and see if then insertion works correctly. If not, then we just leave it in the current state and I merge it. What do you think?

@csidirop
Copy link
Contributor Author

Eventually we could try still one approach. If it is possible, we could remove index_name field for the translated records and see if then insertion works correctly. If not, then we just leave it in the current state and I merge it. What do you think?

I had the same thought :D lets try it

@beatrycze-volk
Copy link
Contributor

Eventually we could try still one approach. If it is possible, we could remove index_name field for the translated records and see if then insertion works correctly. If not, then we just leave it in the current state and I merge it. What do you think?

I had the same thought :D lets try it

How looks situation here?

@beatrycze-volk beatrycze-volk removed this from the DFG-Viewer 6.1.0 milestone Jun 27, 2024
@beatrycze-volk beatrycze-volk added this to the DFG-Viewer 6.2.0 milestone Jun 27, 2024
@csidirop
Copy link
Contributor Author

csidirop commented Jul 1, 2024

Good question. I cant remember where I stopped before the BiblioCon. I'll look at it again during the day.

@beatrycze-volk
Copy link
Contributor

Good question. I cant remember where I stopped before the BiblioCon. I'll look at it again during the day.

How looks situation with this PR? :)

 - all entries created by new tenant removed
 - all sites left as they where
 - updated to slightly new structure in TYPO3 v11

Signed-off-by: Christos Sidiropoulos <[email protected]>
 - including `sys_file` like `FormatDefaults.json`, `MetadataDefaults.json` and `StructureDefaults.json`

Signed-off-by: Christos Sidiropoulos <[email protected]>
@csidirop
Copy link
Contributor Author

csidirop commented Nov 5, 2024

Hi @beatrycze-volk ,
i finally found some time again. I brought the PR up to date with master and retried my attempt to update the data. But the result was the same as in April (with the error while indexing).

So I wanted to finally implement your suggestion but I just cant remember nor figure out at this moment where I need to remove the index_name fields. Can you point me to the right location?

@beatrycze-volk
Copy link
Contributor

Sorry, I'm a bit confused now.

Does it work or throws some errors? IF there are errors could you paste them here?

@csidirop
Copy link
Contributor Author

csidirop commented Nov 8, 2024

Oh sorry, Ill try to explain again:

tldr: This is the error (same as in April) Uncaught TYPO3 Exception Argument 1 passed to Kitodo\Dlf\Domain\Model\Document::setStructure() must be an instance of Kitodo\Dlf\Domain\Model\Structure, null given, called in /var/www/typo3/public/typo3conf/ext/dlf/Classes/Command/BaseCommand.php on line 227 thrown in file /var/www/typo3/public/typo3conf/ext/dlf/Classes/Domain/Model/Document.php in line 474

In more detail:

  1. I removed all data set up by the new tenant module
  2. Setup a new TYPO3 instance with the modified DFG-Viewer code
  3. Generate all metadata with the new tenant
    -> Everything looks good in the backend List view
  4. Export
  5. (Test if indexing works: yes for all my 20 test documents)
  6. Setup a new TYPO3 instance with the updated DFG-Viewer code
    -> Again everything looks good in the backend List view
  7. Test if indexing works: some fail with that error

@beatrycze-volk
Copy link
Contributor

So, I would suspect that some structures are not inserted to database. You would need to check which document objects throw this error and which structure object they actually need. Maybe the uids have changed and now document objects have some references to structure objects which are not there (or are there but with the different uid).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🛠️ maintenance A task to keep the code up-to-date and manageable.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update Initialisation/data.xml
2 participants