[SCHEMATIC-192] V24.12.1 into Main (#1562)
* add new tests

* add unit tests

* ran black

* Update schematic/models/validate_attribute.py

Co-authored-by: BryanFauble <[email protected]>

* added tests

* Update README.md

* Update README.md

* add unit tests

* run black

* Update README.md

* temp commit

* remove old tests

* [FDS-2386] Synapse entity tracking and code concurrency updates (#1505)

* [FDS-2386] Synapse entity tracking and code concurrency updates

* ran black

* Update CODEOWNERS

* updated data model type rules to include error param

* fix validate type attribute to use msg level param

* added error handling

* run black

* create Node class

* set up Node class so that nodes with no displayName fields cause an error on creation
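The fail-fast behavior described in this commit can be sketched as a minimal, hypothetical version of the `Node` class (the real constructor takes more arguments; this only illustrates the check on `displayName`):

```python
class Node:
    """Minimal sketch: a graph node that refuses to be created without a
    displayName field, so a bad data model fails immediately instead of
    producing a confusing error later during schema generation."""

    def __init__(self, node_name: str, node_dict: dict):
        if "displayName" not in node_dict:
            raise ValueError(f"Node {node_name} has no displayName field")
        self.name = node_name
        self.display_name = node_dict["displayName"]
```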

* ran black

* ran mypy

* added new configs for CLI tests

* added new manifests for testing CLI commands

* automate manual CLI tests

* ran black

* Update CODEOWNERS

* Update scan_repo.yml

* Update .github/CODEOWNERS

* Update .github/workflows/scan_repo.yml

* Attach additional telemetry data to OTEL traces (#1519)

* Attach additional telemetry data to OTEL traces

* feat: added tracing for cross manifest validation and file name validation  (#1509)

* add tracing for GX validation

* temp commit

* Updating contribution doc to expect squash and merge (#1534)

* [FDS-2491] Integration tests for Schematic API Test plan (#1512)

Integration tests for Schematic API Test plan

* [FDS-2500] Add Integration Tests for: Manifest Validation (#1516)

* Add Integration Tests for: Manifest Validation

* [FDS-2449] Lock `sphinx` version and update `poetry.lock` (#1530)

Also install `typing-extensions` in the build

* manual test files now being saved in manifests folder

* manual test files now being saved in manifests folder

* remove lines to delete json files that were under git control

* ran black

* add try finally blocks to remove created files

* ran black

* add lines to remove created json files

* Update file annotation store process to require filename be present in order to annotate file

* add lines to remove created json files

* Revert "Update file annotation store process to require filename be present in order to annotate file"

This reverts commit f57c718.

* Don't attempt to annotate the table

* add code in finally blocks to reset config to default values, when tests change them

* complete submit manifest command test

* ran black

* add test for bug case

* update test for table tidiness

* remove unused import

* remove etag column if already present when building temp file view

* catch all exceptions to switch to sequential mode

* update test for updated data

* Revert "update test for updated data"

This reverts commit 255e3c0.

* Revert "catch all exceptions to switch to sequential mode"

This reverts commit 68b0b24.

* catch ValueErrors as well

* Updates for integration test failures (#1537)

* Updates for integration test failures, Config file reset and scope changes

* add todos for removing config resets

* [FDS-2525] Authenticated export of telemetry data (#1527)

* Authenticated export of telemetry data, updating to HTTP otel library

* temp reduce tests

* restore tests

* uncomment tests

* redid how files are deleted, manual tests values are set

* ran black

* [SCHEMATIC-157] Make some dependencies required to avoid `schematic CLI` commands from potentially erroring when doing a pip install (#1540)

* Make otel flash non-optional

* Add dependencies as non-optional

* Include schematic_api for now (#1547)

* update toml version to 24.11.1 (#1548)

* [SCHEMATIC-193] Support exporting telemetry data from GH integration test runs (#1550)

* Support exporting telemetry data from GH run via access token retrieved via oauth2

* [SCHEMATIC-30, SCHEMATIC-200] Add version to click cli / use pathlib.Path module for checking cache size (#1542)

* Add version to click cli

* Add version

* Run black

* Reformat

* Fix

* Update schematic/schemas/data_model_parser.py

* Add test for check_synapse_cache_size

* Reformat

* Fix tests

* Remove unused parameter

* Install all-extras for now

* Make otel flash non-optional

* Update dockerfile

* Add dependencies as non-optional

* Update pyproject toml

* Fix trivy issue

* Add service version

* Run black

* Move all utils.general tests into separate folder

* Use pre-commit

* Add updates to contribution doc

* Fix

* Add service version to log provider

---------

Co-authored-by: BryanFauble <[email protected]>
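The `pathlib.Path`-based cache-size check mentioned in this PR might look roughly like the following; the function name and default cache path are assumptions based on the commit messages, not schematic's exact API:

```python
from pathlib import Path


def check_synapse_cache_size(cache_dir: str = "~/.synapseCache") -> float:
    """Sum the sizes of all files under the cache directory using
    pathlib's rglob, instead of shelling out to a `du`-style command."""
    root = Path(cache_dir).expanduser()
    return float(sum(f.stat().st_size for f in root.rglob("*") if f.is_file()))
```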

* [SCHEMATIC-212] Prevent traces from being combined (#1552)

* Set instance id in github CI run, uninstrument flask auto during integration test run

* [SCHEMATIC-163] Catch error when manifest is generated and existing one doesn't have `entityId` (#1551)

* adds error handling

* adds unit tests for _get_file_entityIds

* updates error message

* adds entityid check to parent func

* updates docstring
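A hedged sketch of the kind of guard this PR adds (names simplified for illustration; the real helper is `_get_file_entityIds` and operates on a pandas manifest):

```python
def get_file_entity_ids(manifest_records: list) -> list:
    """Return the entityId for each manifest row, raising a clear error
    when the column is missing instead of a bare KeyError deep inside
    manifest generation."""
    entity_ids = []
    for row in manifest_records:
        if "entityId" not in row:
            raise LookupError(
                "The existing manifest has no 'entityId' column; it cannot "
                "be used to look up files during manifest generation."
            )
        entity_ids.append(row["entityId"])
    return entity_ids
```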

* [SCHEMATIC-183] Use paths from file view for manifest generation (#1529)

source manifest file paths from synapse fileviews at generation

* [SCHEMATIC-214] Wrap pandas functions to support not including `None` with the NA values argument (#1553)

* Wrap pandas functions to support not including `None` with the NA values argument

* Ignore types

* pylint issues

* ordering of ignore

* Add to integration test to cover none in a manifest

* Add additional test for manifest
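By default, pandas treats the literal string "None" as a missing value alongside "NA", "NaN", and friends. A wrapper like the following (an assumption-based sketch, not schematic's exact `read_csv`) keeps "None" as a real string while preserving the other NA markers:

```python
import io

import pandas as pd

# pandas' usual NA markers, minus the literal "None", so that cells
# containing the string "None" survive instead of becoming NaN.
NA_VALUES = ["", "#N/A", "N/A", "NA", "NaN", "n/a", "nan", "null"]


def read_csv(path_or_buffer, **kwargs):
    # Disable the built-in NA list and substitute our trimmed-down one,
    # unless the caller explicitly overrides either option.
    kwargs.setdefault("na_values", NA_VALUES)
    kwargs.setdefault("keep_default_na", False)
    return pd.read_csv(path_or_buffer, **kwargs)


df = read_csv(io.StringIO("component,value\nNone,NA\n"))
```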

* [SCHEMATIC-210] Add attribute to nones data model (#1555)

Update example_test_nones.model.csv component and add new invalid manifest with nones

* first commit

* ran black

* add test for validateModelManifest

* [SCHEMATIC-214] change data model and component (#1556)

* add valid values to Patient attributes

* update data model

* add test manifests

* update test for new model

* update test for new valid value

* change test to use new manifests

* remove unneeded test file

* revert file

* revert file

* change tests to use new manifests

* remove unneeded manifests

* ran black

* add tests back in

* ran black

* revert manifest

* Split up valid and errored test as separate testing functions

* Remove unused import

---------

Co-authored-by: Gianna Jordan <[email protected]>
Co-authored-by: Andrew Lamb <[email protected]>
Co-authored-by: Thomas Yu <[email protected]>

* incremented package version number

* Update publish.yml

* Update test.yml

* Update api_test.yml

* Update pdoc.yml

* Update version.py

* updates publish.yml (#1558) (#1561)

Co-authored-by: Brad Macdonald <[email protected]>

---------

Co-authored-by: BryanFauble <[email protected]>
Co-authored-by: Jenny V Medina <[email protected]>
Co-authored-by: Thomas Yu <[email protected]>
Co-authored-by: Lingling <[email protected]>
Co-authored-by: GiaJordan <[email protected]>
Co-authored-by: Brad Macdonald <[email protected]>
Co-authored-by: Gianna Jordan <[email protected]>
8 people authored Dec 16, 2024
1 parent 816534a commit 917236e
Showing 40 changed files with 1,255 additions and 460 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/api_test.yml
@@ -13,7 +13,7 @@ on:

jobs:
test:
runs-on: ubuntu-latest
runs-on: ubuntu-22.04
env:
POETRY_VERSION: 1.3.0
strategy:
2 changes: 1 addition & 1 deletion .github/workflows/pdoc.yml
@@ -25,7 +25,7 @@ concurrency:
cancel-in-progress: true
jobs:
build:
runs-on: ubuntu-latest
runs-on: ubuntu-22.04
env:
POETRY_VERSION: 1.3.0
PYTHON_VERSION: "3.10"
9 changes: 5 additions & 4 deletions .github/workflows/publish.yml
@@ -7,9 +7,10 @@ on:

jobs:
pypi_release:
runs-on: ubuntu-latest
runs-on: ubuntu-22.04
env:
POETRY_VERSION: 1.3.0
PYTHON_VERSION: "3.10"
if: github.event_name == 'push' && contains(github.ref, 'refs/tags')
steps:
#----------------------------------------------
@@ -18,10 +19,10 @@ jobs:
- name: Check out repository
uses: actions/checkout@v4

- name: Set up Python ${{ matrix.python-version }}
- name: Set up Python ${{ env.PYTHON_VERSION }}
uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}
python-version: ${{ env.PYTHON_VERSION }}

#----------------------------------------------
# install & configure poetry
@@ -48,7 +49,7 @@
- name: Get current pushed tag
run: |
echo "RELEASE_VERSION=${GITHUB_REF#refs/*/}" >> $GITHUB_ENV
echo ${{ env.RELEASE_VERSION }}
echo "$RELEASE_VERSION"
#----------------------------------------------
# override version tag
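The `${GITHUB_REF#refs/*/}` expansion in the publish workflow strips the shortest leading `refs/<type>/` segment from the pushed ref, leaving the tag or branch name. A Python sketch of the same transformation, for illustration:

```python
def release_version_from_ref(github_ref: str) -> str:
    """Mirror the shell expansion ${GITHUB_REF#refs/*/}: drop the shortest
    leading 'refs/<something>/' segment, leaving the tag or branch name.
    Refs that don't match the pattern are returned unchanged."""
    prefix = "refs/"
    if github_ref.startswith(prefix):
        rest = github_ref[len(prefix):]
        if "/" in rest:
            return rest.split("/", 1)[1]
    return github_ref
```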
3 changes: 3 additions & 0 deletions .github/workflows/scan_repo.yml
@@ -12,6 +12,9 @@ jobs:
trivy:
name: Trivy
runs-on: ubuntu-latest
env:
TRIVY_DB_REPOSITORY: public.ecr.aws/aquasecurity/trivy-db:2
TRIVY_JAVA_DB_REPOSITORY: public.ecr.aws/aquasecurity/trivy-java-db:1
steps:
- name: Checkout code
uses: actions/checkout@v4
29 changes: 28 additions & 1 deletion .github/workflows/test.yml
@@ -25,7 +25,7 @@ concurrency:
cancel-in-progress: true
jobs:
test:
runs-on: ubuntu-latest
runs-on: ubuntu-22.04
env:
POETRY_VERSION: 1.3.0
strategy:
@@ -127,11 +127,31 @@ jobs:
#----------------------------------------------
# run integration test suite
#----------------------------------------------

- name: Retrieve telemetry access token from IDP
if: ${{ contains(fromJSON('["3.10"]'), matrix.python-version) }}
id: retrieve-telemetry-access-token
run: |
response=$(curl --request POST \
--url ${{ vars.TELEMETRY_AUTH_CLIENT_URL }} \
--header 'content-type: application/json' \
--data '{"client_id":"${{ vars.TELEMETRY_AUTH_CLIENT_ID }}","client_secret":"${{ secrets.TELEMETRY_AUTH_CLIENT_SECRET }}","audience":"${{ vars.TELEMETRY_AUTH_AUDIENCE }}","grant_type":"client_credentials"}')
access_token=$(echo $response | jq -r .access_token)
echo "::add-mask::$access_token"
echo "TELEMETRY_ACCESS_TOKEN=$access_token" >> "$GITHUB_OUTPUT"
- name: Run integration tests
if: ${{ contains(fromJSON('["3.10"]'), matrix.python-version) }}
env:
SYNAPSE_ACCESS_TOKEN: ${{ secrets.SYNAPSE_ACCESS_TOKEN }}
SERVICE_ACCOUNT_CREDS: ${{ secrets.SERVICE_ACCOUNT_CREDS }}
OTEL_EXPORTER_OTLP_HEADERS: "Authorization=Bearer ${{ steps.retrieve-telemetry-access-token.outputs.TELEMETRY_ACCESS_TOKEN }}"
DEPLOYMENT_ENVIRONMENT: ${{ vars.DEPLOYMENT_ENVIRONMENT }}
OTEL_EXPORTER_OTLP_ENDPOINT: ${{ vars.OTEL_EXPORTER_OTLP_ENDPOINT }}
TRACING_EXPORT_FORMAT: ${{ vars.TRACING_EXPORT_FORMAT }}
LOGGING_EXPORT_FORMAT: ${{ vars.LOGGING_EXPORT_FORMAT }}
TRACING_SERVICE_NAME: ${{ vars.TRACING_SERVICE_NAME }}
LOGGING_SERVICE_NAME: ${{ vars.LOGGING_SERVICE_NAME }}
SERVICE_INSTANCE_ID: ${{ github.head_ref || github.ref_name }}
run: >
poetry run pytest --durations=0 --cov-append --cov-report=term --cov-report=html:htmlcov --cov-report=xml:coverage.xml --cov=schematic/
-m "not (rule_benchmark or single_process_execution)" --reruns 4 -n 8 --ignore=tests/unit
@@ -141,6 +161,13 @@ jobs:
env:
SYNAPSE_ACCESS_TOKEN: ${{ secrets.SYNAPSE_ACCESS_TOKEN }}
SERVICE_ACCOUNT_CREDS: ${{ secrets.SERVICE_ACCOUNT_CREDS }}
OTEL_EXPORTER_OTLP_HEADERS: "Authorization=Bearer ${{ steps.retrieve-telemetry-access-token.outputs.TELEMETRY_ACCESS_TOKEN }}"
DEPLOYMENT_ENVIRONMENT: ${{ vars.DEPLOYMENT_ENVIRONMENT }}
OTEL_EXPORTER_OTLP_ENDPOINT: ${{ vars.OTEL_EXPORTER_OTLP_ENDPOINT }}
TRACING_EXPORT_FORMAT: ${{ vars.TRACING_EXPORT_FORMAT }}
LOGGING_EXPORT_FORMAT: ${{ vars.LOGGING_EXPORT_FORMAT }}
TRACING_SERVICE_NAME: ${{ vars.TRACING_SERVICE_NAME }}
LOGGING_SERVICE_NAME: ${{ vars.LOGGING_SERVICE_NAME }}
run: >
poetry run pytest --durations=0 --cov-append --cov-report=term --cov-report=html:htmlcov --cov-report=xml:coverage.xml --cov=schematic/
-m "single_process_execution" --reruns 4 --ignore=tests/unit
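The token-retrieval step above POSTs an OAuth2 client-credentials grant and extracts `access_token` with `jq -r .access_token`. A Python equivalent of the response handling (the endpoint and credential values come from CI variables and secrets; this is only a sketch):

```python
import json


def parse_token_response(body: bytes) -> str:
    """Extract the bearer token from an OAuth2 client-credentials
    response, mirroring `jq -r .access_token` in the workflow."""
    token = json.loads(body)["access_token"]
    if not token:
        raise ValueError("IDP response contained an empty access_token")
    return token
```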
14 changes: 7 additions & 7 deletions CONTRIBUTION.md
@@ -6,7 +6,7 @@ Please note we have a [code of conduct](CODE_OF_CONDUCT.md), please follow it in

## How to report bugs or feature requests

You can **create bug and feature requests** through [Sage Bionetwork's FAIR Data service desk](https://sagebionetworks.jira.com/servicedesk/customer/portal/5/group/8). Providing enough details to the developers to verify and troubleshoot your issue is paramount:
You can **create bug and feature requests** through [Sage Bionetwork's DPE schematic support](https://sagebionetworks.jira.com/servicedesk/customer/portal/5/group/7/create/225). Providing enough details to the developers to verify and troubleshoot your issue is paramount:
- **Provide a clear and descriptive title as well as a concise summary** of the issue to identify the problem.
- **Describe the exact steps which reproduce the problem** in as many details as possible.
- **Describe the behavior you observed after following the steps** and point out what exactly is the problem with that behavior.
@@ -25,7 +25,7 @@ For new features, bugs, enhancements:

#### 1. Branch Setup
* Pull the latest code from the develop branch in the upstream repository.
* Checkout a new branch formatted like so: `develop-<feature/fix-name>` from the develop branch
* Checkout a new branch formatted like so: `<JIRA-ID>-<feature/fix-name>` from the develop branch

#### 2. Development Workflow
* Develop on your new branch.
@@ -35,22 +35,22 @@ For new features, bugs, enhancements:
* You can choose to create a draft PR if you prefer to develop this way

#### 3. Branch Management
* Push code to `develop-<feature/fix-name>` in upstream repo:
* Push code to `<JIRA-ID>-<feature/fix-name>` in upstream repo:
```
git push <upstream> develop-<feature/fix-name>
git push <upstream> <JIRA-ID>-<feature/fix-name>
```
* Branch off `develop-<feature/fix-name>` if you need to work on multiple features associated with the same code base
* Branch off `<JIRA-ID>-<feature/fix-name>` if you need to work on multiple features associated with the same code base
* After feature work is complete and before creating a PR to the develop branch in upstream
a. ensure that code runs locally
b. test for logical correctness locally
c. run `pre-commit` to style code if the hook is not installed
d. wait for git workflow to complete (e.g. tests are run) on GitHub

#### 4. Pull Request and Review
* Create a PR from `develop-<feature/fix-name>` into the develop branch of the upstream repo
* Create a PR from `<JIRA-ID>-<feature/fix-name>` into the develop branch of the upstream repo
* Request a code review on the PR
* Once code is approved merge in the develop branch. The **"Squash and merge"** strategy should be used for a cleaner commit history on the `develop` branch. The description of the squash commit should include enough information to understand the context of the changes that were made.
* Once the actions pass on the main branch, delete the `develop-<feature/fix-name>` branch
* Once the actions pass on the main branch, delete the `<JIRA-ID>-<feature/fix-name>` branch

### Updating readthedocs documentation
1. Navigate to the docs directory.
2 changes: 1 addition & 1 deletion Dockerfile
@@ -29,4 +29,4 @@ RUN poetry install --no-interaction --no-ansi --no-root

COPY . ./

RUN poetry install --only-root
RUN poetry install --only-root
2 changes: 2 additions & 0 deletions env.example
@@ -11,6 +11,8 @@ SERVICE_ACCOUNT_CREDS='Provide service account creds'
# LOGGING_EXPORT_FORMAT=otlp
# TRACING_SERVICE_NAME=schematic-api
# LOGGING_SERVICE_NAME=schematic-api
## Instance ID is used during integration tests export to identify the git branch
# SERVICE_INSTANCE_ID=schematic-1234
## Other examples: dev, staging, prod
# DEPLOYMENT_ENVIRONMENT=local
# OTEL_EXPORTER_OTLP_ENDPOINT=https://..../telemetry
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
[tool.poetry]
name = "schematicpy"
version = "24.11.2"
version = "24.12.1"
description = "Package for biomedical data model and metadata ingress management"
authors = [
"Milen Nikolov <[email protected]>",
20 changes: 14 additions & 6 deletions schematic/__init__.py
@@ -10,7 +10,13 @@
from opentelemetry.instrumentation.flask import FlaskInstrumentor
from opentelemetry.sdk._logs import LoggerProvider, LoggingHandler
from opentelemetry.sdk._logs.export import BatchLogRecordProcessor
from opentelemetry.sdk.resources import DEPLOYMENT_ENVIRONMENT, SERVICE_NAME, Resource
from opentelemetry.sdk.resources import (
DEPLOYMENT_ENVIRONMENT,
SERVICE_INSTANCE_ID,
SERVICE_NAME,
SERVICE_VERSION,
Resource,
)
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, Span
from opentelemetry.sdk.trace.sampling import ALWAYS_OFF
@@ -20,6 +26,7 @@

from schematic.configuration.configuration import CONFIG
from schematic.loader import LOADER
from schematic.version import __version__
from schematic_api.api.security_controller import info_from_bearer_auth

Synapse.allow_client_caching(False)
@@ -91,16 +98,14 @@ def set_up_tracing(session: requests.Session) -> None:
Synapse.enable_open_telemetry(True)
tracing_service_name = os.environ.get("TRACING_SERVICE_NAME", "schematic-api")
deployment_environment = os.environ.get("DEPLOYMENT_ENVIRONMENT", "")
service_instance_id = os.environ.get("SERVICE_INSTANCE_ID", "")
trace.set_tracer_provider(
TracerProvider(
resource=Resource(
attributes={
SERVICE_INSTANCE_ID: service_instance_id,
SERVICE_NAME: tracing_service_name,
# TODO: Revisit this portion later on. As of 11/12/2024 when
# deploying this to ECS or running within a docker container,
# the package version errors out with the following error:
# importlib.metadata.PackageNotFoundError: No package metadata was found for schematicpy
# SERVICE_VERSION: package_version,
SERVICE_VERSION: __version__,
DEPLOYMENT_ENVIRONMENT: deployment_environment,
}
)
@@ -122,11 +127,14 @@ def set_up_logging(session: requests.Session) -> None:
logging_export = os.environ.get("LOGGING_EXPORT_FORMAT", None)
logging_service_name = os.environ.get("LOGGING_SERVICE_NAME", "schematic-api")
deployment_environment = os.environ.get("DEPLOYMENT_ENVIRONMENT", "")
service_instance_id = os.environ.get("SERVICE_INSTANCE_ID", "")
if logging_export == "otlp":
resource = Resource.create(
{
SERVICE_INSTANCE_ID: service_instance_id,
SERVICE_NAME: logging_service_name,
DEPLOYMENT_ENVIRONMENT: deployment_environment,
SERVICE_VERSION: __version__,
}
)

2 changes: 2 additions & 0 deletions schematic/__main__.py
@@ -13,6 +13,7 @@
from schematic.visualization.commands import (
viz as viz_cli,
) # viz generation commands
from schematic import __version__

logger = logging.getLogger()
click_log.basic_config(logger)
@@ -24,6 +25,7 @@
# invoke_without_command=True -> forces the application not to show aids before losing them with a --h
@click.group(context_settings=CONTEXT_SETTINGS, invoke_without_command=True)
@click_log.simple_verbosity_option(logger)
@click.version_option(version=__version__, prog_name="schematic")
def main():
"""
Command line interface to the `schematic` backend services.
2 changes: 2 additions & 0 deletions schematic/manifest/generator.py
@@ -1904,6 +1904,8 @@ def get_manifest(
# TODO: avoid explicitly exposing Synapse store functionality
# just instantiate a Store class and let it decide at runtime/config
# the store type
# TODO: determine which parts of fileview are necessary for `get` operations
# and pass query parameters at object instantiation to avoid having to re-query
if access_token:
# for getting an existing manifest on AWS
store = SynapseStorage(access_token=access_token)
7 changes: 5 additions & 2 deletions schematic/models/validate_attribute.py
@@ -17,6 +17,7 @@

from schematic.schemas.data_model_graph import DataModelGraphExplorer
from schematic.store.synapse import SynapseStorage
from schematic.utils.df_utils import read_csv
from schematic.utils.validate_rules_utils import validation_rule_info
from schematic.utils.validate_utils import (
comma_separated_list_regex,
@@ -868,7 +869,7 @@ def _get_target_manifest_dataframes(
entity: File = self.synStore.getDatasetManifest(
datasetId=dataset_id, downloadFile=True
)
manifests.append(pd.read_csv(entity.path))
manifests.append(read_csv(entity.path))
return dict(zip(manifest_ids, manifests))

def get_target_manifests(
@@ -2119,7 +2120,9 @@ def filename_validation(

where_clauses = []

dataset_clause = f"parentId='{dataset_scope}'"
dataset_clause = SynapseStorage.build_clause_from_dataset_id(
dataset_id=dataset_scope
)
where_clauses.append(dataset_clause)

self._login(
3 changes: 2 additions & 1 deletion schematic/store/database/synapse_database_wrapper.py
@@ -8,6 +8,7 @@
from opentelemetry import trace

from schematic.store.synapse_tracker import SynapseEntityTracker
from schematic.utils.df_utils import read_csv


class SynapseTableNameError(Exception):
@@ -108,7 +109,7 @@ def execute_sql_query(
pandas.DataFrame: The queried table
"""
result = self.execute_sql_statement(query, include_row_data)
table = pandas.read_csv(result.filepath)
table = read_csv(result.filepath)
return table

def execute_sql_statement(
