kit-data-manager/pit-service

Typed PID Maker

The Typed PID Maker enables the creation, maintenance, and validation of PIDs. It ensures the PID contains typed, machine-actionable information using validation. This is especially helpful in the context of FAIR Digital Objects (FAIR DOs / FDOs). To make this work, our validation strategy requires a reference to a registered Kernel Information Profile within the PID record, as defined by the recommendations of the Research Data Alliance (RDA). In the RDA context, this kind of service is called a "PIT service". We use Handle PIDs, which can be created using a Handle Prefix (not included). For testing or other local purposes, we support sandboxed PIDs, which require no external service.

Go to: Documentation | Configuration details | Features | Build | Run | License

Features

  • ✅ Create PIDs containing typed key-value-pairs for easy, fast, and automated decision-making.
  • ✅ Maintain the information within these PIDs.
  • ✅ Validate PIDs.
  • ✅ Resolve PIDs.
  • ✅ Store the created PIDs in your database and query them.
    • ✅ Pagination support
    • ✅ Tabulator.js support
  • ✅ Build & use your own search index
    • ✅ Search for information stored within PIDs. This includes PIDs you created, updated or resolved at some point.
    • ✅ Supports the full Elasticsearch Query DSL (requires an Elasticsearch 8 instance).
  • ✅ Authentication via JWT or Keycloak
  • ✅ Bootstrap with existing PIDs in your PID Prefix (see command line options).
  • ✅ Extract all your PIDs to CSV files (see command line options).
  • ✅ Make your PIDs distinguishable with a customizable branding-prefix and other customization options.
  • ✅ Use the Handle System's redirection feature to redirect browsers directly to the data.

Some of the features are described in more detail in the following sections.

Search example

The search can be executed via the provided Swagger interface (default location: http://localhost:8090/swagger-ui.html). For example, the following request body matches every PID and therefore returns all indexed records:

{
  "query": {
    "regexp": {
      "pid": {
        "value": ".*",
        "flags": "ALL",
        "case_insensitive": true
      }
    }
  }
}

You can also use other HTTP clients, such as curl. A curl request (Swagger can generate one for you) may look like this:

curl -X 'POST' \
  'http://localhost:8090/api/v1/search?page=0&size=20' \
  -H 'accept: application/hal+json' \
  -H 'Content-Type: application/json' \
  -d '{
  "query": {
    "regexp": {
      "pid": {
        "value": ".*",
        "flags": "ALL",
        "case_insensitive": true
      }
    }
  }
}'
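
The same request can also be sent from a script. The following is a minimal Python sketch using only the standard library. It assumes the service is reachable at http://localhost:8090 as in the examples above; the helper names `build_search_request` and `search` are hypothetical and not part of the service.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the service runs at the default location used in this README.
BASE_URL = "http://localhost:8090"

# Same query as in the request body above: a regexp that matches every PID.
MATCH_ALL_PIDS = {
    "query": {
        "regexp": {
            "pid": {"value": ".*", "flags": "ALL", "case_insensitive": True}
        }
    }
}


def build_search_request(query, page=0, size=20, base_url=BASE_URL):
    """Build (but do not send) the POST request for the search endpoint."""
    params = urllib.parse.urlencode({"page": page, "size": size})
    return urllib.request.Request(
        url=f"{base_url}/api/v1/search?{params}",
        data=json.dumps(query).encode("utf-8"),
        headers={
            "accept": "application/hal+json",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def search(query, page=0, size=20, base_url=BASE_URL):
    """Send the search request and return the decoded JSON response."""
    request = build_search_request(query, page, size, base_url)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Calling `search(MATCH_ALL_PIDS, page=0, size=20)` requires a running instance with a configured search index. Building the request separately from sending it makes the URL and body easy to inspect without a running service.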

The look of a PID (customization)

The detailed configuration documentation lists the available properties that influence the way PIDs look. PID customization has no functional benefit. Some examples of possible PIDs with the Typed PID Maker:

  • ProjectX--1d6c-152c-c9e0-c136-1509
    • branding-prefix = ProjectX--
    • mode = HexChunk
    • num-chunks = 4
    • casing = lower
  • d08a-3c11-8e8a-55f7-76a6
    • without branding-prefix
  • AADF-46A0-661F-9CAF-43A2
    • casing = upper
  • 8d819cba-ba84-4080-b86c-8d2d318c240f
    • (default configuration)
    • mode = UUID4
    • casing = lower

If you have an interest in more customization, feel free to contact us.

In general, a PID is built like the following scheme:

PID = prefix + suffix
  1. The prefix is a string which is prepended to all your PIDs. It can be considered a namespace and is given to you by your PID system (here: the Handle System). It usually ends with a slash (/) as a separator.
  2. The suffix is a random string, generated by the Typed PID Maker. This generation can be customized.

As the suffix is flexible, we can prepend a branding prefix to it to show a relation to a project or institution. Please note that the branding is then part of the suffix, and therefore part of the whole PID. It cannot be changed once the PID has been registered, but it can be changed for new PIDs. If a branding is applied, the scheme of a PID looks like this:

PID = prefix + (branding + uniquely-generated-string)
               ^------------- <suffix> -------------^

All other configuration properties affect only the uniquely-generated-string. For example, you may choose a different generation method (UUID (default) or Hex Chunks) or enforce casing (lower-case or upper-case).
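
The composition rule above can be sketched in a few lines of Python. This illustrates the scheme only and is not the service's actual generator: the function names, the chunk width of four hex digits (taken from the look of the examples above), and the placeholder prefix `21.EXAMPLE/` are assumptions made for the example.

```python
import secrets
import uuid


def hex_chunk_suffix(num_chunks=4, upper=False):
    """Random chunks of four hex digits joined by dashes (illustrative only)."""
    chunks = [secrets.token_hex(2) for _ in range(num_chunks)]  # 2 bytes = 4 hex digits
    text = "-".join(chunks)
    return text.upper() if upper else text


def uuid4_suffix(upper=False):
    """Random UUID (version 4) in its canonical textual form."""
    text = str(uuid.uuid4())
    return text.upper() if upper else text


def make_pid(prefix, unique_string, branding=""):
    """PID = prefix + (branding + uniquely-generated-string)."""
    return f"{prefix}{branding}{unique_string}"


if __name__ == "__main__":
    # "21.EXAMPLE/" is a hypothetical placeholder, not a real Handle prefix.
    print(make_pid("21.EXAMPLE/", hex_chunk_suffix(), branding="ProjectX--"))
    print(make_pid("21.EXAMPLE/", uuid4_suffix()))
```

Because the branding is concatenated into the suffix before registration, changing it later would produce a different PID rather than rename an existing one.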

How to build

Note: Alternatively, you can use the docker image.

Required: Java SE Development Kit 21 (or OpenJDK 21) or higher

  • Building (with tests): ./gradlew clean build
  • Building (with verbose test output): ./gradlew -Dprofile=verbose clean build
  • Building (without tests): ./gradlew clean build -x test
  • Run docker integration tests:
    • ./gradlew clean build (by default, this will reuse the local build)
    • time bash ./docker/test_docker.sh (runs test script)
  • Doing a release: ./gradlew clean build release
    • Will prompt you about version number to use and next version number
    • Will make a git tag which can later be used in a GitHub release
  • Build documentation: ./gradlew javadoc

On Windows, replace ./gradlew with gradlew.bat.

After a successful build, a jar file containing the entire service is created at build/libs/TypedPIDMaker-$(version).jar.

How to run

Currently, you can either run it via docker or via the compiled JAR file.

Running via docker

Required: Up-to-date Docker (or Docker Desktop) installation

We provide docker images hosted on GitHub.

  • Available versions and other details are listed in the package section.
  • Configuration / Mount points:
    • Containers are considered "throwaway objects". To update the application, you simply stop the container and create a new one from the updated image. Therefore, you need to persist configuration and database information!
    • The configuration file is located within the container at /app/conf/application-default.properties
    • For configuration, either use environment variables (e.g. with docker compose) or mount a custom application.properties into /app/conf/.
    • For production, you'll want to configure your own Handle prefix. We recommend mounting the required private key into /app/conf/ as well, so you'll likely use a mount point anyway.
    • To persist a database file, consider mounting /data/. This also requires adjusting the configuration accordingly!
  • Exposed ports (inner ports of the container)
    • These are the exposed ports. Do not attempt to change them in the configuration, as the container does not expose other ports.
    • 8090: Provides the API, as well as the Swagger documentation.

Running the compiled JAR file

For development purposes, the easiest way to run the service with your configuration file is:

./gradlew run --args="--spring.config.location=config/application-default.properties"

This command uses the settings defined and documented in the configuration file passed via --spring.config.location (see command line options). Changes to this file take effect only after a restart of the Typed PID Maker, in case it is already running. If you move the file or want to use a different configuration, adjust the path in the command above or use one of the default locations for Spring Boot configurations.

For production use, the service can also be started directly like this:

./build/libs/TypedPIDMaker-$(version).jar

The start will take a moment. The service indicates its readiness by printing "Spring is started!" to stdout. As soon as the microservice is started, you can browse to

OpenAPI / Swagger documentation:
http://localhost:8090/swagger-ui.html

in order to see available RESTful endpoints and their documentation. You may have to adapt the port according to your local settings. Furthermore, you can use this web interface to test single API calls in order to get familiar with the service.

Details on the version number and other build information can be found at http://localhost:8090/actuator/info.

Command line options

  • --spring.config.location=config/application.properties sets the location of the configuration file to be used. Not required if the file is in the same directory as the jar file or in another default location for Spring Boot configurations.
  • --spring.profiles.active=$PROFILE makes Spring use your adjusted application-$PROFILE.properties instead of (or in addition to) application-default.properties. May also take multiple profiles as a comma-separated list.
  • bootstrap all-pids-from-prefix starts the service and bootstraps all PIDs. This means:
    • store the PIDs as "known PIDs" in the local database (as configured)
    • send one message per PID to the message broker (if configured)
    • store the PID records in the search index (if configured)
    • after the bootstrap, the application will continue to run
  • bootstrap known-pids same as above, but:
    • not using all PIDs from prefix, but only the ones stored in the local database ("known PIDs")
    • useful to, for example, re-send PIDs via messaging to notify new services
  • write-file all-pids-from-prefix writes all PIDs of the configured PID prefix to a CSV file (one PID per line).
  • write-file known-pids same as above but:
    • only with the PIDs stored in the local database ("known PIDs").
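
A file written by one of the write-file options can be processed with a short script. The following Python sketch reads such a file and groups the PIDs by prefix. It assumes one PID per line with no header row; the function names and any file name you pass in are hypothetical, since the output file name is not specified here.

```python
import csv
from collections import Counter


def read_pids(csv_path):
    """Return the PIDs from a file with one PID per line, skipping blank lines."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return [row[0].strip() for row in csv.reader(handle) if row and row[0].strip()]


def split_pid(pid):
    """Split a PID at the first slash into (prefix, suffix)."""
    prefix, _, suffix = pid.partition("/")
    return prefix, suffix


def count_by_prefix(pids):
    """Count how many PIDs belong to each prefix."""
    return Counter(split_pid(pid)[0] for pid in pids)
```

Splitting at the first slash keeps any later slashes and the branding inside the suffix, which matches the PID scheme described above.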

License

The license for the KIT Data Manager source code is available within the LICENSE file.