Exercise the GA4GH data-object-schemas
- As a researcher, in order to maximize the amount of data I can process from disparate repositories, I can use DOS to harmonize those repositories.
- As a researcher, in order to minimize cloud costs and processing time, I can use DOS-harmonized data to decide which platform/region I should download from, or where my code should execute.
- As an informatician, in order to ingest from disparate repositories, I need to ingest an existing repository into DOS.
- As an informatician, in order to keep DOS up to date, I need to observe changes to the repository and automatically update DOS.
- As a developer, in order to enable DOS, I need to integrate DOS into my backend stack.
This project provides two high-level capabilities:
- observation: long-lived services that observe the object store and populate a webserver with data-object-schema records. These observations catch adds, moves, and deletes in the object store.
- inventory: on-demand commands that capture a snapshot of the object store as data-object-schema records (see the example below).
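For illustration, a record produced by either capability might look like the following sketch; the field names follow the GA4GH data-object-schemas, and all values here are hypothetical:

```python
# A hypothetical data-object-schema record (abbreviated field set, illustrative values).
data_object = {
    "id": "dc821b21-57d0-4c77-928a-08215cbbd8d8",
    "name": "expression-counts.tsv",
    "size": "12345",
    "created": "2018-01-01T00:00:00Z",
    "updated": "2018-01-02T12:00:00Z",
    "checksums": [{"checksum": "9a0364b9e99bb480dd25e1f0284c8555", "type": "md5"}],
    "urls": [{"url": "s3://my-bucket/expression-counts.tsv"}],
}
```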
The data-object-schema is 'unopinionated' in several areas:
- authentication and authorization are unspecified.
- no specific backend is specified.
- the 'system of record' for ids, if unspecified, is driven by the client.
dos_connect addresses these on both the server and the client with duck-typed plugins.
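Plugin selection amounts to a dynamic import driven by environment variables; a minimal sketch of the mechanism, assuming each variable holds a dotted module path:

```python
import importlib
import os

# Duck-typed plugin loading (sketch): the env var names an importable module;
# the server then calls whatever functions it expects that module to expose.
backend = importlib.import_module(os.environ["BACKEND"])
authorizer = importlib.import_module(os.environ["AUTHORIZER"])
```

Any importable module that exposes the expected functions can be substituted; no base class or registration step is required.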
Server plugins:
BACKEND
: for storage. Implementations: in-memory and elasticsearch. e.g. BACKEND=dos_connect.server.elasticsearch_backend

AUTHORIZER
: for authentication and authorization. Implementations: noop, keystone, and basic. e.g. AUTHORIZER=dos_connect.server.keystone_api_key_authorizer

REPLICATOR
: for downstream consumers. Implementations: noop and kafka. e.g. REPLICATOR=dos_connect.server.kafka_replicator
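Because the contract is duck-typed, a BACKEND is just a module exposing the functions the server calls. The sketch below shows the general shape with hypothetical function names; the real in-memory and elasticsearch backends define the authoritative interface:

```python
# Hypothetical in-memory BACKEND plugin sketch; dos_connect's real
# duck-typed contract may use different function names and signatures.
_store = {}

def save(data_object):
    """Persist a data_object keyed by its id."""
    _store[data_object["id"]] = data_object
    return data_object

def get(object_id):
    """Fetch a single data_object, or None if absent."""
    return _store.get(object_id)

def delete(object_id):
    """Remove a data_object."""
    _store.pop(object_id, None)
```

Pointing BACKEND at such a module (e.g. BACKEND=mypackage.my_backend) swaps it in without changes to the server.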
Client plugins:
All observers and inventory tasks leverage a middleware plugin capability:
- user_metadata(): customize the collection of metadata
- before_store(): modify the data_object before it is persisted
- md5sum(): calculate the md5 of the file
- id(): customize the id

To specify your own customizer, set the CUSTOMIZER environment variable, e.g. CUSTOMIZER=dos_connect.apps.aws_customizer. A minimal customizer is sketched below.
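The hook names come from the list above; the sketch assumes each hook is a plain module-level function, and the exact signatures dos_connect passes may differ:

```python
# mypackage/my_customizer.py -- hypothetical module;
# activate with CUSTOMIZER=mypackage.my_customizer
import hashlib

def user_metadata(path):
    # Attach site-specific metadata to the observation (illustrative).
    return {"site": "example-lab"}

def before_store(data_object):
    # Last chance to modify the record before it is persisted.
    data_object.setdefault("aliases", []).append(data_object["name"])
    return data_object

def md5sum(path):
    # Stream the file so large objects don't exhaust memory.
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def id(data_object):
    # Derive the record id from the object's first url (illustrative).
    return data_object["urls"][0]["url"]
```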
For example, AWS S3 returns a special hash (a non-md5 ETag) for multipart uploads, so the aws_customizer uses a lambda to calculate the true md5 hash of multipart files. Other client customizers include noop, url_as_id, and smmart (obfuscates paths and associates user metadata).
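For context, a multipart ETag is recognizable by its '-&lt;part-count&gt;' suffix, which is why it cannot stand in for the object's md5. A minimal check (hypothetical helper, not part of dos_connect):

```python
def is_multipart_etag(etag):
    # S3 ETags for multipart uploads have the form "<md5-of-part-md5s>-<parts>",
    # so they are not the md5 of the whole object and must be recomputed.
    return "-" in etag.strip('"')
```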
Setup: .env file
# ******* webserver
# http port
DOS_CONNECT_WEBSERVER_PORT=<port-number>
# configure backend
BACKEND=dos_connect.server.elasticsearch_backend
ELASTIC_URL=<url>
# configure authorizer
AUTHORIZER=dos_connect.server.keystone_api_key_authorizer
# openstack auth url (include the /v3 suffix)
DOS_SERVER_OS_AUTH_URL=<url>
AUTHORIZER_PROJECTS=<project_name>
# replicator
REPLICATOR=dos_connect.server.kafka_replicator
KAFKA_BOOTSTRAP_SERVERS=<url>
KAFKA_DOS_TOPIC=<topic-name>
Server Startup:
$ alias web='docker-compose -f docker-compose-webserver.yml'
$ web build ; web up -d
Client Startup:
note: execute source <openstack-openrc.sh> first
# webserver endpoint
export DOS_SERVER=<url>
# sleep in between inventory runs
export SLEEP=<seconds-to-sleep>
# bucket to monitor
export BUCKET_NAME=<existing-bucket-name>
$ alias client='docker-compose -f docker-compose-swift.yml'
$ client build; client up -d
Verification:
- see swagger
- note: you will need to belong to an openstack project and provide a token from openstack token issue
- see the kafka topic 'dos-events' for the event stream
- the kafka queue is populated with {'method': method, 'doc': doc}, where doc is a data_object and method is one of ['CREATE', 'UPDATE', 'DELETE']
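To watch the stream, a throwaway consumer is enough. A minimal sketch using the kafka-python package, assuming the topic and bootstrap servers configured in the .env above:

```python
# Hypothetical event watcher for the 'dos-events' topic (uses kafka-python).
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "dos-events",                          # KAFKA_DOS_TOPIC
    bootstrap_servers="localhost:9092",    # KAFKA_BOOTSTRAP_SERVERS (assumed)
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
for message in consumer:
    event = message.value                  # {'method': ..., 'doc': ...}
    print(event["method"], event["doc"]["id"])
```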
Todos:
- testing
- evangelization
- swagger improvements (403, 401 status codes)