Load OWL Database using tdbloader #58

Open
dwagmuse opened this issue Aug 28, 2024 · 2 comments
@dwagmuse
Contributor

User Story

As noted in this ticket (#57), loading performance of the owl-load task is rather poor when the database is large. Fuseki offers the ability to load the data at startup, and experiments have shown that this is several orders of magnitude faster. So what we need is a new Gradle plugin that can load the database from a set of OWL files.

Detailed Description

The intent is that this step would replace owl-load in a workflow. The omlToOwl step already puts all of the OWL files in the build/owl folder, and the Jena tdbloader can create a TDB database in the filesystem from a list of OWL files before Fuseki starts.
So what we need this plugin adapter to do is take the build/owl folder as one argument and perhaps the .fuseki folder as the other, enumerate all of the OWL files in the given folder, and then build the TDB database from those OWL files.
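The enumerate-and-load step could look roughly like the minimal sketch below. It assumes the Jena TDB2 command-line loader `tdb2.tdbloader` is on the PATH and that the generated files use the `.owl` extension; the class name `TdbLoadCommand` and the default database folder `build/tdb` are hypothetical (the issue only says the database might go in the `.fuseki` folder). Every file goes into a single loader invocation. The sketch does not handle the union-graph requirement.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: builds one tdbloader call covering every OWL file in a folder. */
final class TdbLoadCommand {
    private TdbLoadCommand() {}

    /** Recursively collects *.owl files under owlDir, sorted so the load order is reproducible. */
    static List<Path> findOwlFiles(Path owlDir) {
        try (Stream<Path> paths = Files.walk(owlDir)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".owl"))
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a single command line; assumes the "tdb2.tdbloader --loc DIR file..." form. */
    static List<String> buildCommand(Path dbDir, List<Path> owlFiles) {
        if (owlFiles.isEmpty()) {
            throw new IllegalArgumentException("no .owl files to load");
        }
        List<String> cmd = new ArrayList<>(List.of("tdb2.tdbloader", "--loc", dbDir.toString()));
        for (Path f : owlFiles) {
            cmd.add(f.toString());
        }
        return cmd;
    }

    /** Usage: java TdbLoadCommand [owlDir] [dbDir]; run before Fuseki starts. */
    public static void main(String[] args) throws IOException, InterruptedException {
        Path owlDir = Path.of(args.length > 0 ? args[0] : "build/owl");
        Path dbDir = Path.of(args.length > 1 ? args[1] : "build/tdb"); // hypothetical default
        List<String> cmd = buildCommand(dbDir, findOwlFiles(owlDir));
        int exit = new ProcessBuilder(cmd).inheritIO().start().waitFor();
        if (exit != 0) {
            throw new IllegalStateException("tdbloader exited with status " + exit);
        }
    }
}
```

Sorting the file list keeps the load order the same from build to build. Rejecting an empty list also catches a misconfigured folder before the loader runs.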

Our configuration uses a union graph, and Maged says we also need to load data into the union graph. If the loader can't get that detail from the Fuseki config file (fuseki.ttl), then it should simply be an option for the plugin. I don't want to have to call the loader twice; one invocation should do all the loading.

(I expect that this would make owl-load obsolete: I can't think of any use case where we would want the slow method if the fast method works, nor any where we would want to build the database and then load more OWL files.)

Acceptance Criteria

  • All queries produce the same results in a workflow where we only replace owl-load with the new owl-build-database step

@dwagmuse
Contributor Author

This works once we publish owl-tools 2.11, which should run under JRE 11 or JRE 17.

dwagmuse self-assigned this on Sep 24, 2024
@dwagmuse
Contributor Author

Verified in the clipper workflow that this works.

Note that creating the TDB database in the build folder adds several gigabytes of binary data to it. We don't need or want to save this data to the normalized or auxiliary branch, so the simplest thing to do is to delete it at the end of the build (and perhaps also list it in .gitignore).
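The end-of-build cleanup can be sketched in plain Java as shown below. It assumes the database lives in a hypothetical `build/tdb` folder, since the issue only says it is created in the build folder. In a Gradle build, the built-in `Delete` task type would do the same job; the Java version just shows the logic.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical cleanup step: removes the generated TDB database folder at the end of the build. */
final class DeleteTdbDatabase {
    private DeleteTdbDatabase() {}

    /** Deletes dir and everything below it; does nothing if dir does not exist. */
    static void deleteRecursively(Path dir) {
        if (!Files.exists(dir)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(dir)) {
            // A parent path sorts before its children, so reverse order deletes children first.
            List<Path> ordered = paths.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
            for (Path p : ordered) {
                Files.delete(p);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        deleteRecursively(Path.of(args.length > 0 ? args[0] : "build/tdb")); // hypothetical location
    }
}
```

The paths are collected before any deletion so that the directory walk never observes a half-deleted tree.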

We also need to make sure that the container volumes are big enough to hold this data.
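One way to size the volumes is to measure a database built by a real workflow. The sketch below uses only the JDK. It sums the file sizes under a hypothetical `build/tdb` folder and reports the usable space on the file store that holds that folder. Note that it reports apparent (logical) file sizes, which can differ from the space actually allocated on disk.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

/** Hypothetical check: how much space the TDB database uses and how much the volume has left. */
final class TdbDiskUsage {
    private TdbDiskUsage() {}

    /** Sums the sizes of all regular files below dir. */
    static long sizeInBytes(Path dir) {
        try (Stream<Path> paths = Files.walk(dir)) {
            return paths.filter(Files::isRegularFile).mapToLong(p -> {
                try {
                    return Files.size(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }).sum();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Bytes still available to this JVM on the file store (volume) that holds dir. */
    static long usableBytes(Path dir) {
        try {
            return Files.getFileStore(dir).getUsableSpace();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path db = Path.of(args.length > 0 ? args[0] : "build/tdb"); // hypothetical location
        System.out.printf("database: %d MiB, free on volume: %d MiB%n",
                sizeInBytes(db) >> 20, usableBytes(db) >> 20);
    }
}
```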
