feat: make it work for serp #73

Open
wants to merge 152 commits into
base: main
152 commits
8bde32c
working version - Fast DuckDB
tekkisse Apr 10, 2024
2b4e4c8
bit of tidying up
tekkisse Apr 10, 2024
3fa3709
auto run DAG & UTF-8 testing
tekkisse Apr 10, 2024
07c1263
check in
tekkisse Apr 10, 2024
5d0d76f
FIX: install minio to base image
tekkisse Apr 10, 2024
934d2ca
multi part download
tekkisse Apr 10, 2024
1e816b1
working version
tekkisse Apr 11, 2024
d65fe71
use updated docker container
tekkisse Apr 11, 2024
24201f2
tracking
tekkisse Apr 11, 2024
1eb3973
parquete file size tracking
tekkisse Apr 12, 2024
5fdca92
MVP
tekkisse Apr 13, 2024
18eb439
FEAT: add jinja to docker image
tekkisse Apr 16, 2024
b17010e
compute ledger working
tekkisse Apr 16, 2024
42896ad
tweaking
tekkisse Apr 17, 2024
35ae96b
turn debug off
tekkisse Apr 17, 2024
5079d8d
track end iceberg table
tekkisse Apr 17, 2024
19fda0d
add notifitcation
tekkisse Apr 21, 2024
08d8393
make build
tekkisse Apr 23, 2024
d78c907
.
tekkisse Apr 23, 2024
d0ae2a4
made cooler
tekkisse Apr 23, 2024
e2e7d96
sort out docker compose start order
tekkisse Apr 25, 2024
8b5fb30
regex single for list
tekkisse Apr 28, 2024
c225b44
master switch
tekkisse Apr 28, 2024
08f68a2
working version hitting assetv3
tekkisse Apr 29, 2024
db63ec8
few error fixed
tekkisse Apr 30, 2024
e5e3ec0
fix
tekkisse Apr 30, 2024
4f3ac35
added action
tekkisse Apr 30, 2024
f36a61a
create, append, replace options
tekkisse Apr 30, 2024
8f466bc
MVP1
tekkisse Apr 30, 2024
39a6b02
use proxy
tekkisse Apr 30, 2024
9c49204
bug fixes
tekkisse Apr 30, 2024
fe30b55
bug fixing
tekkisse May 1, 2024
b71fbde
tidy up
tekkisse May 1, 2024
4f6af6a
turn proxy server on
tekkisse May 2, 2024
5001c7c
change default
tekkisse May 2, 2024
2cc7c8d
change logging
tekkisse May 2, 2024
246aedb
.
tekkisse May 2, 2024
e516ad0
refactor functions
tekkisse May 7, 2024
c023310
refactored into modules
tekkisse May 8, 2024
e2a6de0
full refactor
tekkisse May 8, 2024
b01c54d
separate DAG's
tekkisse May 8, 2024
e314b8a
extra logging
tekkisse May 9, 2024
44e9d4c
ISSUE WITH SERIALISING DICT OBJECTS
tekkisse May 9, 2024
6185b0c
tweaking
tekkisse May 9, 2024
8845877
add manual SED
tekkisse May 9, 2024
5e4514c
turn proxy on
tekkisse May 10, 2024
4c7ff41
deployable version
tekkisse May 10, 2024
ce997a4
a
tekkisse May 10, 2024
5e29347
cope with daft hard coded values
tekkisse May 10, 2024
9561e33
dam need to get more setting into config files
tekkisse May 10, 2024
d371af7
bucket default configurations if not dataset configuration
tekkisse May 12, 2024
cb689a4
allow for manual overide
tekkisse May 13, 2024
ba70254
fix(container): use base container with our root ca baked in
alee-x May 28, 2024
3306be8
update gitignore
alee-x May 28, 2024
16f8b44
fix(container): add chi root ca to trust store
alee-x May 28, 2024
c6fc7cb
small changes
tekkisse May 28, 2024
c8bac8c
Merge branch 'feat/make-it-work-for-serp' of https://github.com/Swans…
tekkisse May 28, 2024
812a204
switch to usding redis settings from Airflow connection
tekkisse May 29, 2024
d48b618
remove db number (redis) from connection
tekkisse May 29, 2024
29f55d8
fix: when getting the redis connection, if auth is enabled then use t…
alee-x Jun 3, 2024
23a98ea
change tracking info
tekkisse Jun 9, 2024
77efc18
correct logic
tekkisse Jun 9, 2024
2265d77
correct regex
tekkisse Jun 9, 2024
7fc8829
increase power of SED
tekkisse Jun 10, 2024
f8abb94
SED validation
tekkisse Jun 10, 2024
f15e6c9
add code to pull schema and max out
tekkisse Jun 10, 2024
9a58c83
fix stupid error
tekkisse Jun 10, 2024
6014996
change way table refernced
tekkisse Jun 10, 2024
666c6be
.
tekkisse Jun 10, 2024
f94b955
remove schema name from table name
tekkisse Jun 10, 2024
03523bc
trying to make this work
tekkisse Jun 10, 2024
baa9327
try again
tekkisse Jun 10, 2024
96365e3
correction
tekkisse Jun 10, 2024
d882d6a
try
tekkisse Jun 10, 2024
b66fd5e
.
tekkisse Jun 10, 2024
4eccdeb
deal with varchar
tekkisse Jun 10, 2024
be3ec1e
correct if statement
tekkisse Jun 10, 2024
353a83f
better solution
tekkisse Jun 10, 2024
ae253ea
python error
tekkisse Jun 10, 2024
6021796
whoops - missed that
tekkisse Jun 10, 2024
771d101
move marker
tekkisse Jun 10, 2024
967a8c3
fix/turn debug off
tekkisse Jun 10, 2024
f8110d1
turn debug back on
tekkisse Jun 11, 2024
c4bbf66
correct error
tekkisse Jun 11, 2024
43cc96b
add time datatype into mapping
tekkisse Jun 11, 2024
6de1e4e
try time stamp
tekkisse Jun 11, 2024
947e828
leave file for debigging
tekkisse Jun 11, 2024
3638818
rename DAGS
tekkisse Jun 11, 2024
8d7c27c
switch duckdb to disk instead of memory
tekkisse Jun 11, 2024
34c5703
tweaking
tekkisse Jun 11, 2024
0c88e90
lower case for file extension
tekkisse Jun 11, 2024
6b0c604
feat(dags-container): upgrade duckdb to v1.0.0, upgrade base containe…
alee-x Jun 12, 2024
f640682
Merge branch 'main' into feat/make-it-work-for-serp
alee-x Jun 12, 2024
d9a18a7
fix: use airflow connection schema keys for the aws connection where …
alee-x Jun 14, 2024
7586eeb
fix: don't hardcode minio secure/ssl to false because then it doesn't…
alee-x Jun 14, 2024
c108861
fix: correctly detect the connection protocol for minio and trino bas…
alee-x Jun 14, 2024
4c4873f
fix: don't hardcode the trino connection as though there no bloody au…
alee-x Jun 14, 2024
8f8a8bb
fix: we can't have a module with the same name as a package we need t…
alee-x Jun 14, 2024
beb411f
cope with different etag
tekkisse Jun 15, 2024
33c54d4
Merge branch 'feat/make-it-work-for-serp' of https://github.com/Swans…
tekkisse Jun 15, 2024
f072adc
Update docker-compose.yml
tekkisse Jun 26, 2024
0040bc8
increase max connections
tekkisse Jun 26, 2024
2baf179
remove auth as does not work with http
tekkisse Jun 26, 2024
0de5efd
options auth for trino
tekkisse Jun 26, 2024
745a1a8
correct code
tekkisse Jun 26, 2024
4646f7f
up max connections
tekkisse Jun 26, 2024
8fd0a8f
change port for mariadb
tekkisse Jun 26, 2024
6761328
switch ports
tekkisse Jun 26, 2024
f91a4e0
.
tekkisse Jun 26, 2024
a62c6cc
.
tekkisse Jun 26, 2024
51f8053
switch hive server
tekkisse Jun 26, 2024
8b2ce46
add second postgres
tekkisse Jun 26, 2024
aacedc2
.
tekkisse Jun 26, 2024
29e8979
change portal tables
tekkisse Jul 5, 2024
c26e6cc
corrections
tekkisse Jul 5, 2024
a0af32e
.
tekkisse Jul 5, 2024
f504676
new table for portal
tekkisse Jul 5, 2024
4abcc89
send schema request to rabbitmq
tekkisse Aug 2, 2024
0bee53a
use external DAGS
tekkisse Aug 23, 2024
e2d6d8f
correction
tekkisse Aug 23, 2024
551a65e
bump airflow build up
tekkisse Aug 23, 2024
a0c7f29
real correction :-)
tekkisse Aug 23, 2024
b416746
extra trino worker
tekkisse Aug 27, 2024
8df9ad2
upgrade trino
tekkisse Aug 27, 2024
8a12026
change airflow image
tekkisse Aug 27, 2024
676b535
extra max threads
tekkisse Aug 27, 2024
c8dbe31
up container version
tekkisse Aug 28, 2024
51e4b0a
changing rabbit
tekkisse Sep 2, 2024
cafde4b
use latest PR
tekkisse Sep 2, 2024
bec25db
shell scripts
tekkisse Sep 2, 2024
7be9e0a
correct spelling
tekkisse Sep 2, 2024
0ba1348
add portal details
tekkisse Sep 2, 2024
130944c
add restart script
tekkisse Sep 2, 2024
3ad1bc0
whoops - fix
tekkisse Sep 2, 2024
2222d2b
instructions
tekkisse Sep 2, 2024
faca597
add start firehose
tekkisse Sep 2, 2024
ac9503c
wire up firehose
tekkisse Sep 2, 2024
c7fa94b
correct rabbitmq def
tekkisse Sep 3, 2024
5b447cc
use correct DLM PR
tekkisse Sep 3, 2024
d0a1e30
use local DAG's
tekkisse Sep 3, 2024
c9e5a4b
a poke
John-Vaughan Sep 5, 2024
b7d623c
added request_schema_save
John-Vaughan Sep 25, 2024
156c6cb
update
John-Vaughan Oct 1, 2024
c873662
add postgres for nrdav2
tekkisse Oct 8, 2024
1a4f488
Merge branch 'feat/make-it-work-for-serp' of https://github.com/Swans…
tekkisse Oct 8, 2024
db38ef7
fix
John-Vaughan Oct 10, 2024
a3b01f0
added ledger
John-Vaughan Oct 16, 2024
50c59eb
PG_MAX_CONNECTIONS?
John-Vaughan Oct 21, 2024
8171b49
joss stuff
John-Vaughan Oct 21, 2024
046592c
update
John-Vaughan Oct 23, 2024
f733ef1
add
John-Vaughan Nov 11, 2024
fe921c7
update
John-Vaughan Nov 12, 2024
4 changes: 4 additions & 0 deletions .gitignore
@@ -1,3 +1,7 @@
.idea
.coverage
docker-compose/airflow/logs/**
.DS_Store
**/.DS_Store

**/__pycache__
20 changes: 16 additions & 4 deletions containers/dags/Dockerfile
@@ -1,17 +1,30 @@
FROM apache/airflow:2.8.4-python3.10 as build
FROM apache/airflow:2.9.0-python3.12 as build

# Ensure that release notes get picked up from our repo and not from the apache airflow base repo
# If the apache airflow notes are used we will incorrectly pickup the release notes from our version
# string searched against their release history....
LABEL org.opencontainers.image.source=https://github.com/SwanseaUniversityMedical/DARE-Airflow

# Add trusted certificates
USER root
ADD containers/dags/tls-certs/chi-root-ca.pem /usr/local/share/ca-certificates/chi-root-ca.crt
ADD containers/dags/tls-certs/chi-domain-ca.pem /usr/local/share/ca-certificates/chi-domain-ca.crt
ADD containers/dags/tls-certs/bundle.pem /usr/local/share/ca-certificates/bundle.pem
ENV SSL_CERT_FILE="/usr/local/share/ca-certificates/bundle.pem"

RUN chmod 644 /usr/local/share/ca-certificates/bundle.pem && \
chmod 644 /usr/local/share/ca-certificates/chi-root-ca.crt && \
chmod 644 /usr/local/share/ca-certificates/chi-domain-ca.crt && \
update-ca-certificates

USER airflow
ENV SSL_CERT_FILE="/usr/local/share/ca-certificates/bundle.pem"
ENV REQUESTS_CA_BUNDLE="/etc/ssl/certs/ca-certificates.crt"
# Add directory to the python path so we can import modules nested within that directory
# DAGs injected via volume mount live in /opt/airflow/dags
# DAGs injected via git-sync live in /opt/airflow/dags/repo/dags
ENV PYTHONPATH="/opt/airflow/dags/modules:/opt/airflow/dags/repo/dags/modules:${PYTHONPATH}"

USER airflow

# Install additional python dependencies
COPY containers/dags/requirements.txt .
RUN pip install --no-cache-dir pyclean && \
@@ -23,6 +36,5 @@ RUN pip install --no-cache-dir pyclean && \
RUN duckcli -D ":memory:" -e "INSTALL httpfs; LOAD httpfs" && \
python3 -c "import duckdb; duckdb.connect(':memory:').sql('LOAD httpfs')"


# Copy dags code
COPY dags /opt/airflow/dags
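
Reviewer note: the Dockerfile above sets two CA-related variables that point at different files (`SSL_CERT_FILE` at `bundle.pem`, `REQUESTS_CA_BUNDLE` at the system store rebuilt by `update-ca-certificates`). The sketch below is a hypothetical helper, not code from this PR, that could be run inside the container to see what each mechanism would pick up. It relies only on the standard-library `ssl` module, which reports the environment variable name OpenSSL consults for a default CA file, and on the fact that the `requests` library reads `REQUESTS_CA_BUNDLE`. The function name `ca_settings` is an assumption made for illustration.

```python
import os
import ssl
from typing import Mapping, Optional


def ca_settings(environ: Mapping[str, str] = os.environ) -> dict[str, Optional[str]]:
    """Report which CA files the environment points Python TLS clients at.

    Hypothetical helper for illustration; it only reads configuration and
    does not open any network connection.
    """
    paths = ssl.get_default_verify_paths()
    return {
        # Name of the variable OpenSSL checks for a CA file (normally SSL_CERT_FILE).
        "openssl_env_var": paths.openssl_cafile_env,
        # Value of that variable, if set (bundle.pem in the Dockerfile above).
        "ssl_cert_file": environ.get(paths.openssl_cafile_env),
        # Variable read by the requests library for its CA bundle.
        "requests_ca_bundle": environ.get("REQUESTS_CA_BUNDLE"),
    }


if __name__ == "__main__":
    for name, value in ca_settings().items():
        print(f"{name}: {value}")
```

Passing a plain dict instead of `os.environ` makes it possible to check the expected values without building the image.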
26 changes: 15 additions & 11 deletions containers/dags/requirements.txt
@@ -1,12 +1,16 @@
-pika==1.3.2
-aio-pika==9.3.0
-duckdb==0.10.1
+pika
+aio-pika
+duckdb==1.0.0
 duckcli==0.2.1
-apache-airflow-providers-amazon==8.10.0
-airflow-provider-rabbitmq==0.6.1
-ydata-profiling==4.6.1
-pyarrow==14.0.1
-s3fs==2023.10.0
-pandas==2.0.3
-polars==0.19.12
-apache-airflow-providers-trino==5.4.0
+apache-airflow-providers-amazon
+airflow-provider-rabbitmq
+ydata-profiling==4.8.3
+pyarrow
+s3fs
+pandas
+polars
+apache-airflow-providers-trino
+apache-airflow-providers-hashicorp
+beartype
+minio
+jinja2
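
Reviewer note: most entries in the new requirements.txt no longer carry a version pin. A minimal sketch, assuming only the simple `name==version` and bare `name` forms that appear in this file, of how the pinned and unpinned entries could be listed. The helper name `split_pins` is hypothetical, and the parser deliberately ignores other specifier forms (`>=`, extras, markers) that pip supports.

```python
# New requirements.txt contents as shown in the diff above.
NEW_REQUIREMENTS = """\
pika
aio-pika
duckdb==1.0.0
duckcli==0.2.1
apache-airflow-providers-amazon
airflow-provider-rabbitmq
ydata-profiling==4.8.3
pyarrow
s3fs
pandas
polars
apache-airflow-providers-trino
apache-airflow-providers-hashicorp
beartype
minio
jinja2
"""


def split_pins(text: str) -> tuple[dict[str, str], list[str]]:
    """Split requirement lines into {name: version} pins and unpinned names."""
    pinned: dict[str, str] = {}
    unpinned: list[str] = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if "==" in line:
            name, version = line.split("==", 1)
            pinned[name.strip()] = version.strip()
        else:
            unpinned.append(line)
    return pinned, unpinned


if __name__ == "__main__":
    pinned, unpinned = split_pins(NEW_REQUIREMENTS)
    print("pinned:", pinned)
    print("unpinned:", unpinned)
```

Unpinned entries resolve to whatever version is current at image build time, so two builds of the same commit may not be identical.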
65 changes: 65 additions & 0 deletions containers/dags/tls-certs/bundle.pem
@@ -0,0 +1,65 @@
-----BEGIN CERTIFICATE-----
MIIGCzCCA/OgAwIBAgIRAKGIqJbj5x6d4e4OiI2WQN0wDQYJKoZIhvcNAQELBQAw
LjEsMCoGA1UECgwjQ0hJIENlcnRpZmljYXRlIEF1dGhvcml0eSAtIEFuc2libGUw
HhcNMjIxMDI4MTUzMjEyWhcNMzIxMDI1MTUzMjEyWjBCMQwwCgYDVQQKDANDaGkx
HzAdBgNVBAsMFks4cyBQcm9kIENISSBEb21haW4gQ0ExETAPBgNVBAMMCEs4cyBQ
cm9kMIICIjANBgkqhkiG9w0BAQEFAAOCAg8AMIICCgKCAgEArqoz/Tp6IpdyDl3f
OYm6JiHXBCKPNVO0bExorfMTVhc5An+2fT6eLbh/u3xavFAE3VHTuDjkP6oC0739
te8HpzO1P0uFmlA02YEZQdgzXbgPn6fMlue+HjYQVX09fy/5/3K7QY+Bp7OTNDQh
QWZlnaELJAin9Gkv/tAEb6PKC7NIhMeCMZihjl6OExTKhExyp3wfuV0gbBbZF5RX
wa0nnHPaW2qPMPL1L70GYm7B+NE26ihSZ0Q7WZiUJ8fhnaQKoQapcA0hxiCyre67
ehzJn53zoDZi0XaBAkFbcSyPKryCkVwPJ2PfGvSubGqxO//PDgX4zciYnnviqduX
CUDBqpqiDakzYUecUuH10Y6Yb4S9wyIYBV2A8pzf5b/6C5Gs/z1B42qpru0TA+ov
L2FqEaqDPkv3KuQOohgvflZicfAtBo5qW3LXfBRBr1vsSOv1jIuiQMuA8vtIlYSr
rnuNbtjFmaIM/GBLTeqSmVGwb4+MGozTQJp/SAPwejen2TFh8zsIIlvYy9Gy+GM9
9oO/IuVvaFSMrySq8DVLrifcb2bPeT3+T03nsHoveQbzpzEdxl9NwUEvvMhVebaY
w0saplk/7HegUbGhd618K9ltl2ntalutlHWe/HiIscJ+Cve23sGS+KexMUhj28dr
SeOEwKXG/9JfpoQO8/GQoeS80IECAwEAAaOCAQ4wggEKMG8GCCsGAQUFBwEBBGMw
YTAuBggrBgEFBQcwAoYiaHR0cDovL3Jvb3QtY2EuY2hpLnN3YW4uYWMudWsvY3J0
LzAvBggrBgEFBQcwAYYjaHR0cDovL3Jvb3QtY2EuY2hpLnN3YW4uYWMudWsvb2Nz
cC8wHwYDVR0jBBgwFoAUvO8HQ3rGyrb6KU5fyKDMAcwxpOQwEgYDVR0TAQH/BAgw
BgEB/wIBADAzBgNVHR8ELDAqMCigJqAkhiJodHRwOi8vcm9vdC1jYS5jaGkuc3dh
bi5hYy51ay9jcmwvMA4GA1UdDwEB/wQEAwIBBjAdBgNVHQ4EFgQU3jyk9fu4Tn0b
UBO3fTYkjRKbgTMwDQYJKoZIhvcNAQELBQADggIBAK9LKO+AeoOZ/lfdRD1w0gxq
Za8ILmdB7XZVMLFvIWUAWW4yDhatUSfzr0B+VHnzgIUpdPQOjme0SjtmsKyFPV8v
fEh4Lp03B81UyqzZo9GVTfgobyOSK2QaCT76qH84vUmmaoRILK0f6aBOXqmVcybB
KGw+pMkTxKdxRgPR+R7ATYXwcqMyi0axMPhylepjcbT/ccK23pOPuTcJsP8kBztI
l6qLFUXuymD1sqP6wR8ZFpAEQyZdUF7I9p/MrFLPPTYzGCYyJqAjuhZvrgrP18/E
6pwaagtaeckGNVH1Aqh+Ac1dBhnZNh2Uw+i7Nr96Tp+DpXfsmlnwX5z14mj9AIgy
V9aGxxAOA+FUVBFEN/2zyDfNarhNlh9lxTYk6y2+BUelntw6oU+ouIseXjsJjKOh
Yczt1iB0AvsyTBVAD/YrAGuJf8kse1yMf/oMmySQ+GXaWuXo04eXgFfu43hwZajw
6ku3WFfCOrHc6OCAexBxyADeAfAOgUO+eTw5QuzhHMNkQNgHuGQPm1WMbg5EHJ8E
sjxSjpCNTH/KsVw9qpamtcjZXoF03IOpR+fZ77grltWUYsVZu8xDIi+VYwkT7/Yv
zKA3btTPvR01eDEgLrXOvlwu31ebpeVTwQr4eWYdmvsbn+CuXbDnQpYmII2TeUPq
4TnY4JX7ejhj3XPielyR
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
MIIFKTCCAxGgAwIBAgIRAKGIqJbj5x6d4e4OiI2WQNgwDQYJKoZIhvcNAQELBQAw
LjEsMCoGA1UECgwjQ0hJIENlcnRpZmljYXRlIEF1dGhvcml0eSAtIEFuc2libGUw
HhcNMTYxMTE3MTYyODQzWhcNMjgxMTE0MTYyODQzWjAuMSwwKgYDVQQKDCNDSEkg
Q2VydGlmaWNhdGUgQXV0aG9yaXR5IC0gQW5zaWJsZTCCAiIwDQYJKoZIhvcNAQEB
BQADggIPADCCAgoCggIBANrN76lFRk3Lq1HYzt9rBBgJJqKfWihFX4kaQZZO22ad
H7rvmuxpNgBMzm3BuQnIFd0JYN26hx9wLOUPAvVyDHumoFa5mI3FI+FjuyMX+hQy
GmgEz92Qn+e5T12pidMK0qVQXdxVnTRwMCjieaDLx5s3BYlEJCGvkl+v+uK1UeGd
tRE8kle6wpLwZa1sJnSbQcoBYQCYTUUlqvCBxEnBMhACJR97ga0OBNUsk5+S/Pye
Riga7/nzmtCvMg9PoM4UCzpeN1trNoJftGDSwPuhf7PvZ4EEYzRMKheIv47Yp2py
6J8++3THOvE8qk2evkeNgYsb5qS9iUBNOw41TByK5v+lYpis9T0YsqDKUKkkq4OD
/S5gh1GNbKfnkteLUHvGtPjOuYYxSM1P37URKiqH5rNQwH30ICW6V5wVyq2DM48W
c9/QFOYRY27bf+cZOezFjadIYQnu8m8G118S6DIVelJ4qlLVdOfZ/a2lUmoAl6HP
XL2ipI758EsFdJcTahkAy9do3hUJHVd5Ja9rV4shfyl+lVqW4gb0r2F59/TsIWmb
tI4cGdwG6+QmdYa1DqIwaGDm81NdXy70qK7qgOvu2sew6r+u0jU1aJYQB/TwNvsa
VoOLslSi1u179sPX92bO83e0hpaU70Y5iMGUXjBC5eHug2gTlcJSwS9aY/w5435l
AgMBAAGjQjBAMA8GA1UdEwEB/wQFMAMBAf8wDgYDVR0PAQH/BAQDAgEGMB0GA1Ud
DgQWBBS87wdDesbKtvopTl/IoMwBzDGk5DANBgkqhkiG9w0BAQsFAAOCAgEAt6dI
jSOB0Sl0+2FNoULN36H5pV5LpVK8JK2bCY//ntvXqOcDbCom86CQzdUcbTGhdxvK
bpuhG4Rt5acQQaTvUyQL6f9oIQB6TBPocV9pJbgoE+2OKa+jNE5AS9fdNGc2AAhG
OWzJ/620nAbVZBLdDyETtX6cix94+KLDDbuVmKtN+Nao9pUdvHJ4IEhhaldmGPD2
hrWsA9g3CtP8NFhKfYo7qpIZNbWWRx1svz7655yP9ukvOj3Siyj/nbGGt3vqoSvM
F2fpg5SZRCIejxc+DPUwdpMoaetRVM0okP7vZfT8ryqM1IA43US7p/ye2gOEvpAk
lpXnpY9rXu8hdp6y/LHCjlXk5VjsX/BiRxua39idqbhMivJRAmAonr4q4PKgI1O3
IBF9ZNo3AxinzEttDX9/CJLiQ9ejHqlECKW+nl6lMsWhV+1RskMvNnCsPUyuT4Hs
cSxyGhcuB+1ZSpEfWinJeZ6NJAn4k4AGbQhPH3xs3mjrer3r8kg5d+1y60D5MDq6
bzoeUuVmSTbgUBOGKQkuOz9J99gfG0PTNriy3rcs3/Kul8/flIUHc0Vcxn5LpR8O
igzn89sJC+VxVbjleUychdZTzuRs7JhhMPKUuhBXc3D836E69jJHtioJo6oXnX4B
OjKpgF8vJG4E0zd6sDJdvY/yPpmVjoJ7ha2cwSA=
-----END CERTIFICATE-----
35 changes: 35 additions & 0 deletions containers/dags/tls-certs/chi-domain-ca.pem
@@ -0,0 +1,35 @@
-----BEGIN CERTIFICATE-----
MIIGCzCCA/OgAwIBAgIRAKGIqJbj5x6d4e4OiI2WQN0wDQYJKoZIhvcNAQELBQAw
LjEsMCoGA1UECgwjQ0hJIENlcnRpZmljYXRlIEF1dGhvcml0eSAtIEFuc2libGUw
HhcNMjIxMDI4MTUzMjEyWhcNMzIxMDI1MTUzMjEyWjBCMQwwCgYDVQQKDANDaGkx
HzAdBgNVBAsMFks4cyBQcm9kIENISSBEb21haW4gQ0ExETAPBgNVBAMMCEs4cyBQ
cm9kMIICIjANBgkqhkiG9w0BAQEFAAOCAg8AMIICCgKCAgEArqoz/Tp6IpdyDl3f
OYm6JiHXBCKPNVO0bExorfMTVhc5An+2fT6eLbh/u3xavFAE3VHTuDjkP6oC0739
te8HpzO1P0uFmlA02YEZQdgzXbgPn6fMlue+HjYQVX09fy/5/3K7QY+Bp7OTNDQh
QWZlnaELJAin9Gkv/tAEb6PKC7NIhMeCMZihjl6OExTKhExyp3wfuV0gbBbZF5RX
wa0nnHPaW2qPMPL1L70GYm7B+NE26ihSZ0Q7WZiUJ8fhnaQKoQapcA0hxiCyre67
ehzJn53zoDZi0XaBAkFbcSyPKryCkVwPJ2PfGvSubGqxO//PDgX4zciYnnviqduX
CUDBqpqiDakzYUecUuH10Y6Yb4S9wyIYBV2A8pzf5b/6C5Gs/z1B42qpru0TA+ov
L2FqEaqDPkv3KuQOohgvflZicfAtBo5qW3LXfBRBr1vsSOv1jIuiQMuA8vtIlYSr
rnuNbtjFmaIM/GBLTeqSmVGwb4+MGozTQJp/SAPwejen2TFh8zsIIlvYy9Gy+GM9
9oO/IuVvaFSMrySq8DVLrifcb2bPeT3+T03nsHoveQbzpzEdxl9NwUEvvMhVebaY
w0saplk/7HegUbGhd618K9ltl2ntalutlHWe/HiIscJ+Cve23sGS+KexMUhj28dr
SeOEwKXG/9JfpoQO8/GQoeS80IECAwEAAaOCAQ4wggEKMG8GCCsGAQUFBwEBBGMw
YTAuBggrBgEFBQcwAoYiaHR0cDovL3Jvb3QtY2EuY2hpLnN3YW4uYWMudWsvY3J0
LzAvBggrBgEFBQcwAYYjaHR0cDovL3Jvb3QtY2EuY2hpLnN3YW4uYWMudWsvb2Nz
cC8wHwYDVR0jBBgwFoAUvO8HQ3rGyrb6KU5fyKDMAcwxpOQwEgYDVR0TAQH/BAgw
BgEB/wIBADAzBgNVHR8ELDAqMCigJqAkhiJodHRwOi8vcm9vdC1jYS5jaGkuc3dh
bi5hYy51ay9jcmwvMA4GA1UdDwEB/wQEAwIBBjAdBgNVHQ4EFgQU3jyk9fu4Tn0b
UBO3fTYkjRKbgTMwDQYJKoZIhvcNAQELBQADggIBAK9LKO+AeoOZ/lfdRD1w0gxq
Za8ILmdB7XZVMLFvIWUAWW4yDhatUSfzr0B+VHnzgIUpdPQOjme0SjtmsKyFPV8v
fEh4Lp03B81UyqzZo9GVTfgobyOSK2QaCT76qH84vUmmaoRILK0f6aBOXqmVcybB
KGw+pMkTxKdxRgPR+R7ATYXwcqMyi0axMPhylepjcbT/ccK23pOPuTcJsP8kBztI
l6qLFUXuymD1sqP6wR8ZFpAEQyZdUF7I9p/MrFLPPTYzGCYyJqAjuhZvrgrP18/E
6pwaagtaeckGNVH1Aqh+Ac1dBhnZNh2Uw+i7Nr96Tp+DpXfsmlnwX5z14mj9AIgy
V9aGxxAOA+FUVBFEN/2zyDfNarhNlh9lxTYk6y2+BUelntw6oU+ouIseXjsJjKOh
Yczt1iB0AvsyTBVAD/YrAGuJf8kse1yMf/oMmySQ+GXaWuXo04eXgFfu43hwZajw
6ku3WFfCOrHc6OCAexBxyADeAfAOgUO+eTw5QuzhHMNkQNgHuGQPm1WMbg5EHJ8E
sjxSjpCNTH/KsVw9qpamtcjZXoF03IOpR+fZ77grltWUYsVZu8xDIi+VYwkT7/Yv
zKA3btTPvR01eDEgLrXOvlwu31ebpeVTwQr4eWYdmvsbn+CuXbDnQpYmII2TeUPq
4TnY4JX7ejhj3XPielyR
-----END CERTIFICATE-----
30 changes: 30 additions & 0 deletions containers/dags/tls-certs/chi-root-ca.pem
@@ -0,0 +1,30 @@
-----BEGIN CERTIFICATE-----
MIIFKTCCAxGgAwIBAgIRAKGIqJbj5x6d4e4OiI2WQNgwDQYJKoZIhvcNAQELBQAw
LjEsMCoGA1UECgwjQ0hJIENlcnRpZmljYXRlIEF1dGhvcml0eSAtIEFuc2libGUw
HhcNMTYxMTE3MTYyODQzWhcNMjgxMTE0MTYyODQzWjAuMSwwKgYDVQQKDCNDSEkg
Q2VydGlmaWNhdGUgQXV0aG9yaXR5IC0gQW5zaWJsZTCCAiIwDQYJKoZIhvcNAQEB
BQADggIPADCCAgoCggIBANrN76lFRk3Lq1HYzt9rBBgJJqKfWihFX4kaQZZO22ad
H7rvmuxpNgBMzm3BuQnIFd0JYN26hx9wLOUPAvVyDHumoFa5mI3FI+FjuyMX+hQy
GmgEz92Qn+e5T12pidMK0qVQXdxVnTRwMCjieaDLx5s3BYlEJCGvkl+v+uK1UeGd
tRE8kle6wpLwZa1sJnSbQcoBYQCYTUUlqvCBxEnBMhACJR97ga0OBNUsk5+S/Pye
Riga7/nzmtCvMg9PoM4UCzpeN1trNoJftGDSwPuhf7PvZ4EEYzRMKheIv47Yp2py
6J8++3THOvE8qk2evkeNgYsb5qS9iUBNOw41TByK5v+lYpis9T0YsqDKUKkkq4OD
/S5gh1GNbKfnkteLUHvGtPjOuYYxSM1P37URKiqH5rNQwH30ICW6V5wVyq2DM48W
c9/QFOYRY27bf+cZOezFjadIYQnu8m8G118S6DIVelJ4qlLVdOfZ/a2lUmoAl6HP
XL2ipI758EsFdJcTahkAy9do3hUJHVd5Ja9rV4shfyl+lVqW4gb0r2F59/TsIWmb
tI4cGdwG6+QmdYa1DqIwaGDm81NdXy70qK7qgOvu2sew6r+u0jU1aJYQB/TwNvsa
VoOLslSi1u179sPX92bO83e0hpaU70Y5iMGUXjBC5eHug2gTlcJSwS9aY/w5435l
AgMBAAGjQjBAMA8GA1UdEwEB/wQFMAMBAf8wDgYDVR0PAQH/BAQDAgEGMB0GA1Ud
DgQWBBS87wdDesbKtvopTl/IoMwBzDGk5DANBgkqhkiG9w0BAQsFAAOCAgEAt6dI
jSOB0Sl0+2FNoULN36H5pV5LpVK8JK2bCY//ntvXqOcDbCom86CQzdUcbTGhdxvK
bpuhG4Rt5acQQaTvUyQL6f9oIQB6TBPocV9pJbgoE+2OKa+jNE5AS9fdNGc2AAhG
OWzJ/620nAbVZBLdDyETtX6cix94+KLDDbuVmKtN+Nao9pUdvHJ4IEhhaldmGPD2
hrWsA9g3CtP8NFhKfYo7qpIZNbWWRx1svz7655yP9ukvOj3Siyj/nbGGt3vqoSvM
F2fpg5SZRCIejxc+DPUwdpMoaetRVM0okP7vZfT8ryqM1IA43US7p/ye2gOEvpAk
lpXnpY9rXu8hdp6y/LHCjlXk5VjsX/BiRxua39idqbhMivJRAmAonr4q4PKgI1O3
IBF9ZNo3AxinzEttDX9/CJLiQ9ejHqlECKW+nl6lMsWhV+1RskMvNnCsPUyuT4Hs
cSxyGhcuB+1ZSpEfWinJeZ6NJAn4k4AGbQhPH3xs3mjrer3r8kg5d+1y60D5MDq6
bzoeUuVmSTbgUBOGKQkuOz9J99gfG0PTNriy3rcs3/Kul8/flIUHc0Vcxn5LpR8O
igzn89sJC+VxVbjleUychdZTzuRs7JhhMPKUuhBXc3D836E69jJHtioJo6oXnX4B
OjKpgF8vJG4E0zd6sDJdvY/yPpmVjoJ7ha2cwSA=
-----END CERTIFICATE-----
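
Reviewer note: from the diff text, bundle.pem appears to be the concatenation of chi-domain-ca.pem and chi-root-ca.pem. A small sketch, using only the standard library, of how that could be checked by fingerprinting each certificate block. The helper names `pem_fingerprints` and `bundle_contains` are hypothetical, and the code only decodes the base64 payload; it does not parse or validate the X.509 structure.

```python
import base64
import hashlib
import re

# Matches one PEM certificate block and captures its base64 payload.
_PEM_RE = re.compile(
    r"-----BEGIN CERTIFICATE-----(.*?)-----END CERTIFICATE-----", re.DOTALL
)


def pem_fingerprints(pem_text: str) -> list[str]:
    """Return the SHA-256 hex digest of the DER bytes of each PEM block."""
    prints = []
    for match in _PEM_RE.finditer(pem_text):
        payload = "".join(match.group(1).split())  # remove line breaks
        der = base64.b64decode(payload)
        prints.append(hashlib.sha256(der).hexdigest())
    return prints


def bundle_contains(bundle_text: str, *cert_texts: str) -> bool:
    """True if every certificate in cert_texts also appears in the bundle."""
    have = set(pem_fingerprints(bundle_text))
    return all(
        fp in have for text in cert_texts for fp in pem_fingerprints(text)
    )
```

A possible use, with paths taken from the diff: read `containers/dags/tls-certs/bundle.pem` and the two standalone files as text and call `bundle_contains(bundle, domain_ca, root_ca)`.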
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
87 changes: 87 additions & 0 deletions dags/constants.py
@@ -0,0 +1,87 @@
Check failures (GitHub Actions / dags-flake8) on lines 1-3, 5-6, 8-11 and 13 of dags/constants.py: E225 missing whitespace around operator

rabbitmq_queue_minio_event='afload'
rabbitmq_queue_object_event='afobjectload'
rabbitmq_queue_minio_register='afregister'

rabbitmq_exchange_load='load'
rabbitmq_exchange_load_key_s3file='s3'

rabbitmq_exchange_notify='notify'
rabbitmq_exchange_notify_key_s3file='s3'
rabbitmq_exchange_notify_key_trino='hive'
rabbitmq_exchange_notify_key_trino_iceberg='iceberg'

rabbitmq_exchange_schema='schemarequest'
rabbitmq_queue_schema_request_key='request'

redis_expiry = 30

assets3_url = 'https://cat-hdp.demo.ukserp.ac.uk/doc/GetFilteredData2?profile=dlm&Filter=%22Dataset='


process_s3_option_default = "default"
process_s3_option_load = "load"
process_s3_option_manual = "manual"
process_s3_option_whatif = "whatif"

process_s3_formoption_yesauto = "yesAlways"
process_s3_formoption_yesmanual = "yesManual"
process_s3_formoption_no = "no"

Check warnings (GitHub Actions / dags-flake8) on lines 30, 32-35, 38-39, 44 and 46 of dags/constants.py: W291 trailing whitespace

sql_trackingtable='''
CREATE TABLE IF NOT EXISTS trackingtable (
id VARCHAR(150),
dataset VARCHAR(150),
version VARCHAR(150),
label VARCHAR(150),
dated timestamp,
bucket VARCHAR(150),
key VARCHAR(350),
tablename VARCHAR(150),
physical VARCHAR(200)
);
'''

sql_tracking='''
CREATE TABLE IF NOT EXISTS tracking (
id VARCHAR(350),
bucket VARCHAR(100),
path VARCHAR(500),
s_marker timestamp,
e_marker timestamp,
d_marker INT,
dataset VARCHAR(100),
version VARCHAR(100),
label VARCHAR(200),
schema_hive VARCHAR(150),
tablename_hive VARCHAR(200),
location_hive VARCHAR(300),
schema_ice VARCHAR(150),
tablename_ice VARCHAR(200),
location_ice VARCHAR(300),
filesize BIGINT,
filesize_par BIGINT,
columns INT,
s_download timestamp,
e_download timestamp,
d_download INT,
s_convert timestamp,
e_convert timestamp,
d_convert INT,
s_par timestamp,
e_par timestamp,
d_par INT,
s_upload timestamp,
e_upload timestamp,
d_upload INT,
s_schema timestamp,
e_schema timestamp,
d_schema INT,
s_hive timestamp,
e_hive timestamp,
d_hive INT,
s_iceberg timestamp,
e_iceberg timestamp,
d_iceberg INT,
params VARCHAR(1000)
);
'''
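
Reviewer note: the `tracking` table pairs each stage with three columns, for example `s_download`, `e_download` and `d_download`. The sketch below assumes these are the start time, end time and duration in whole seconds; the PR does not state this, so treat it as a guess. It uses the standard-library `sqlite3` module as a stand-in so the example runs anywhere, whereas the PR's commits point to DuckDB. The table is trimmed to six of the columns above, and `record_download` is a hypothetical helper, not a function from this PR.

```python
import sqlite3
from datetime import datetime

# Trimmed subset of the tracking DDL above (assumption: enough to show the
# s_/e_/d_ column pattern; the real table has many more columns).
TRACKING_SUBSET_DDL = """
CREATE TABLE IF NOT EXISTS tracking (
    id VARCHAR(350),
    bucket VARCHAR(100),
    path VARCHAR(500),
    s_download timestamp,
    e_download timestamp,
    d_download INT
);
"""


def record_download(conn, row_id, bucket, path, started, ended):
    """Insert one row, storing the download duration in whole seconds."""
    duration = int((ended - started).total_seconds())
    conn.execute(TRACKING_SUBSET_DDL)
    conn.execute(
        "INSERT INTO tracking "
        "(id, bucket, path, s_download, e_download, d_download) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        # Timestamps are stored as ISO 8601 strings to stay driver-neutral.
        (row_id, bucket, path, started.isoformat(), ended.isoformat(), duration),
    )
    return duration


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    seconds = record_download(
        conn, "demo-id", "demo-bucket", "demo/file.csv",
        datetime(2024, 1, 1, 0, 0, 0), datetime(2024, 1, 1, 0, 0, 42),
    )
    print(seconds, conn.execute("SELECT * FROM tracking").fetchall())
```

Parameter binding with `?` placeholders keeps bucket names and object keys out of the SQL text itself.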