Releases: Sage-Bionetworks/schematic
v23.6.2
Created a minor release
- Feat: allowed production tag, staging tag, and manual run on existing tags to trigger building docker images for AWS deployment by @linglp in #1241
- Added hide-blanks parameter when submitting manifest through API by @linglp in #1242
- Revert "Feat: allowed production tag, staging tag, and manual run on existing tags to trigger building docker images for AWS deployment" by @linglp in #1243
- feat: allowed production tag, staging tag, and manual run on existing tags to trigger building docker images for AWS deployment by @linglp in #1244
- Minor release v23.6.2 by @linglp in #1245
Full Changelog: v23.6.1...v23.6.2
Release v23.6.1
Release notes
New Features and Enhancements
- Update and insert (upsert) rows in Synapse tables. This feature allows piecewise updates to a table in Synapse: a user only needs a csv manifest containing new or changed data/metadata. Given a manifest csv file and a dataset folder on Synapse, schematic finds the associated metadata table for this dataset folder. For each row in the manifest file, schematic checks whether the row is already present in the Synapse metadata table. If the row is present, schematic updates it with values from the corresponding manifest row. If the row is not present, schematic inserts it as a new row in the Synapse metadata table. Instructions for using the upsert feature via the schematic CLI are here. Note: this feature works differently from the existing table replace option (the default table manipulation option) in schematic. A table replace substitutes the full content of an existing table with the content of a manifest csv file, which allows removing rows from the existing table. The upsert feature does not remove existing rows in the table. This feature does not impact users who only work with csv manifest files and do not store metadata in Synapse tables.
- Added a parameter controlling whether to execute the validation rules that are part of the Great Expectations (GX) suite. GX is great, but some rules take a while to load and execute. This is undesirable in certain situations (e.g. a large number of data records that need to be validated in real time). A user can now turn off GX validation rules.
- Standardized validation error format: previously, different types of data validation errors may have had a different 'look and feel' (in addition to a different structure). Validation error format and structure are now standardized, which allows users and client apps to process them reliably.
- New REST API endpoints:
  - Retrieving validation rules associated with an attribute in a data model schema: if a schema attribute has a validation rule specifying its type (e.g. int, string, etc.), this endpoint allows retrieving the validation rule and determining the type of the attribute via the schematic REST API. The endpoint retrieves any other validation rules associated with an attribute as well.
  - Retrieving the display name associated with an attribute in a data model: aside from machine-friendly labels, attributes in data-model schemas have human-friendly names (aka display names); this endpoint allows retrieving the display name of an attribute given its label.
  - Checking if an entity is within an asset view (aka fileview in Synapse): this is useful when a user is uncertain whether a dataset has been deleted; users can provide the dataset ID and schematic checks whether a dataset with this ID is present.
  These endpoints can be accessed by running the schematic REST API locally or through cloud deployments, using schematic version v23.6.1 or greater.
- Updated REST API web server: previously schematic used the default Flask web server. That was suitable for development, but unreliable for production deployments. The new schematic REST API server (uWSGI) remedies security and performance issues.
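
To make the difference between upsert and replace described above concrete, here is a minimal sketch in plain Python. It is not schematic's implementation (which works against Synapse tables); the `upsert` and `replace` helpers, the `id` key column, and the sample rows are all hypothetical and exist only to illustrate the semantics.

```python
from typing import Dict, List

Row = Dict[str, str]


def upsert(table: List[Row], manifest: List[Row], key: str) -> List[Row]:
    """Update rows whose key already exists; insert rows whose key is new.

    Rows that are in the table but not in the manifest are kept.
    """
    merged = {row[key]: dict(row) for row in table}
    for row in manifest:
        # Existing key: overwrite the stored values. New key: add the row.
        merged.setdefault(row[key], {}).update(row)
    return list(merged.values())


def replace(table: List[Row], manifest: List[Row]) -> List[Row]:
    """Discard the existing table content and keep only the manifest rows."""
    return [dict(row) for row in manifest]


# Hypothetical existing table and incoming manifest.
table = [
    {"id": "a", "diagnosis": "healthy"},
    {"id": "b", "diagnosis": "unknown"},
]
manifest = [
    {"id": "b", "diagnosis": "cancer"},   # changed row
    {"id": "c", "diagnosis": "healthy"},  # new row
]

upserted = upsert(table, manifest, key="id")
replaced = replace(table, manifest)
```

In this sketch `upserted` still contains row `a`, has the updated row `b`, and gains row `c`, whereas `replaced` contains only the manifest rows `b` and `c`.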
Performance improvements:
- Loading manifest (or other) csv files now takes advantage of multiple processors, speeding up the loading of large files if the user's machine has multiple cores (the more cores, the larger the speedup).
- REST API calls are profiled and benchmarked against a standard set of inputs (e.g. data models, csv manifests, etc.).
- Validation rules are benchmarked against a standard set of inputs (e.g. data models, csv manifests, etc.).
These benchmarks allow us to detect when feature performance is degraded (or improved) by an update; they also allow us to maintain performance guarantees in the future.
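
As an illustration of how benchmarking against a fixed input can surface a performance regression, the following is a minimal sketch. It does not reflect schematic's actual benchmarking setup; `benchmark`, `is_regression`, the 20% tolerance, and `standard_workload` are hypothetical names and values chosen for the example.

```python
import time
from statistics import median
from typing import Callable, List


def benchmark(func: Callable[[], object], repeats: int = 5) -> float:
    """Return the median wall-clock time (in seconds) of calling func."""
    timings: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return median(timings)


def is_regression(current: float, baseline: float, tolerance: float = 0.2) -> bool:
    """Flag a run that is more than `tolerance` (as a fraction) slower than baseline."""
    return current > baseline * (1.0 + tolerance)


def standard_workload() -> int:
    """Hypothetical stand-in for an API call or validation rule on a standard input."""
    return sum(i * i for i in range(10_000))


elapsed = benchmark(standard_workload)
```

A stored baseline timing can then be compared with `is_regression(elapsed, baseline)`; using the median of several runs reduces the influence of one-off slow runs.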
Bug fixes:
- Data template formatting: catching edge cases and ensuring column headers are aligned with column values; ensuring conditional formatting works as expected in both Excel and Google Sheets templates.
- Ensuring properties of attributes in the data-model schema are properly loaded in schematic: the same property can be reused in multiple attributes (e.g. if the property represents the same concept: name, diagnosis); previously, a property would only be added to one schema attribute. This fix allows setting up data models for relational databases (RDB) where different tables may have columns with the same name (e.g. both the Patient and Biospecimen tables can have a column 'name').
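
The property-reuse fix above can be illustrated with a small sketch. This is not schematic's code; it only shows one way a property can end up attached to a single attribute (a one-to-one mapping that gets overwritten) and how a one-to-many mapping avoids that. The `domain_pairs` data and both mapping names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical (property, attribute) pairs, as might be read from a data model.
domain_pairs: List[Tuple[str, str]] = [
    ("name", "Patient"),
    ("name", "Biospecimen"),
    ("diagnosis", "Patient"),
]

# Lossy mapping: each property keeps only one attribute, so a later pair
# silently overwrites an earlier one.
single_owner: Dict[str, str] = {}
for prop, attribute in domain_pairs:
    single_owner[prop] = attribute

# Lossless mapping: each property keeps every attribute that uses it.
all_owners: Dict[str, List[str]] = defaultdict(list)
for prop, attribute in domain_pairs:
    all_owners[prop].append(attribute)
```

With the one-to-many mapping, `name` is recorded for both `Patient` and `Biospecimen`, which is what a relational model with a `name` column in both tables needs.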
Security fixes:
Updated dependencies and hardened the handling of access tokens, among other security and reliability fixes, allowing schematic to be deployed in secure production environments that handle PHI data.
Technical debt:
Code doesn't escape the 2nd law of thermodynamics. We put energy into refactoring the handling of validation rules and interactions with Synapse (so that adding features and avoiding bugs is easier later); catching errors and exceptions more robustly and specifically (so that users and clients know what's causing a problem and can handle, report, or fix it more effectively); and improving coverage of automated testing (so that we reduce the likelihood of letting bugs into released versions of schematic).
For more details on specific changes, please refer to the changelog below.
What's Changed
- Skip api tests when rule combination tests are run by @GiaJordan in #1068
- Added workflow to deploy schematic docker container in Github container registry by @linglp in #1062
- Remove schematic support for Python `v3.7` and `v3.8` by @GiaJordan in #1090
- Refactor table operations structure in asset store by @GiaJordan in #1069
- Added input_token as a parameter for /manifest/get endpoint to fix credential issues when getting an existing manifest on AWS by @linglp in #1080
- Fixed `getProjectManifests` function in synapse storage by @linglp in #1084
- Develop api node display names by @mialy-defelice in #1094
- Create API endpoint for get_node_validation_rules by @mialy-defelice in #1095
- Update schematic dependencies by @GiaJordan in #1092
- Raise `errors` for `wrong schema` errors by @GiaJordan in #1073
- Set default of "table_manipulation" as "replace" in API endpoint when users enter None and updated tests by @linglp in #1115
- Update `synapseClient` dependency and api for manifest table uploads by @GiaJordan in #1101
- set pyopenssl = "^23.0.0" by @andrewelamb in #1125
- added date GE rule by @andrewelamb in #1103
- Implement table upsert feature by using `schematic-db` by @GiaJordan in #1081
- Add `use_schema_label` parameter to manifest submission endpoint, separate manifest submission and table upsert tests by @GiaJordan in #1129
- Delete GE checkpoint after completion of GE validation by @GiaJordan in #1136
- Remove `try: catch:` block from manifest submission command function by @GiaJordan in #1130
- Save all properties that are Included in the domain of a Class by @mialy-defelice in #1134
- Display exceptions raised during validation with Great expectations, allow exclusion of upper bound OR lower bound for `inRange` rule by @GiaJordan in #1131
- Update Documentation - python/package versions and POCs by @GiaJordan in #1139
- Increase buffer size to a higher limit to deal with long token by @linglp in #1144
- lock `schematic-db` to version `0.0.6` by @GiaJordan in #1145
- use try: finally: to delete checkpoint even if running the checkpoint fails or errors out by @GiaJordan in #1155
- Allowed CORS on given routes instead of all routes by @linglp in #1168
- Added restrict rules param to `manifest/validate` by @linglp in #1178
- Bug Fix: remedy negation of table manipulation specification by @GiaJordan in #1186
- Added an endpoint to check entity type on Synapse and an endpoint to check if an entity is in the asset view by @linglp in #1078
- add restrict rules control to manifest validate by @linglp in #1189
- Added a parameter to control if GE gets used when using `manifest/validate` endpoint by @linglp in #1177
- Propagate logger level entered in from command line to other schematic submodules by @GiaJordan in https://github.com/Sage-Bion...
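
The checkpoint cleanup change in #1155 relies on Python's `try: finally:` construct. The sketch below shows the general pattern only; `run_checkpoint`, `delete_checkpoint`, and `validate` are hypothetical stand-ins and not functions from schematic or Great Expectations.

```python
from typing import List

# Records the order in which the steps run, for illustration.
events: List[str] = []


def run_checkpoint(should_fail: bool) -> None:
    """Hypothetical stand-in for running a validation checkpoint."""
    events.append("run")
    if should_fail:
        raise RuntimeError("checkpoint failed")


def delete_checkpoint() -> None:
    """Hypothetical stand-in for removing the checkpoint afterwards."""
    events.append("delete")


def validate(should_fail: bool) -> None:
    try:
        run_checkpoint(should_fail)
    finally:
        # Runs whether run_checkpoint returned normally or raised.
        delete_checkpoint()


validate(should_fail=False)
try:
    validate(should_fail=True)
except RuntimeError:
    # The original error still reaches the caller after cleanup.
    events.append("error propagated")
```

The cleanup step runs in both cases, and the exception from the failing run is not swallowed.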
Release v23.1.1
What's Changed
- Turned `service_account_creds.json` as an environment variable and default way of authentication by @linglp in #1015
- Changed base docker image and removed package for security reasons by @linglp in #1037
- Add support for treating date entries as type `datetime` by @GiaJordan in #1041
- Automatically updated CLI documentation on Readthedocs by @linglp in #1047
- Deprecated token pickle and credentials.json by @linglp in #1040
- Introduce tests to cover current table operations: creation and replacement by @GiaJordan in #1046
- Schematic Release v23.1.1 by @linglp in #1057
Full Changelog: v22.11.3...v23.1.1
Release v22.11.3
What's Changed
- Removed validity requirement for unrequired attribute entries, allow users to specify validation rule conformity required for entries, fix DCA disconnect caused by JSON Schema validation errors by @GiaJordan in #1000
- Strip hyphens from Synapse annotation keys by @milen-sage in #1020
- Fixed a bug where in some instances manifests would be uploaded with different display and `downloadAs` names by @GiaJordan in #1017
- Added an API endpoint to check if a given node is required or not by @linglp in #1024
- added more instructions of running CLI in jupyter notebook by @linglp in #1031
- Added a slack bot to notify successful new releases by @linglp in #1025
- Release 22.11.3 by @GiaJordan in #1032
Full Changelog: v22.11.2...v22.11.3
v22.11.2
What's Changed
- Fixed typo in conditional requirement for schema viz attribute table by adding quotes by @linglp in #1003
- Added tests for all API endpoints by @linglp in #995
- Fixed an issue where the DCA would disconnect on manifest submission by casting UUIDs to type string by @GiaJordan in #1006
- Reminded users to run `synapse login --rememberMe` after schematic init step in ReadMe by @linglp in #1004
- Added the option to populate manifest as an excel spreadsheet to avoid sending metadata to Google APIs by @linglp in #994
- For schematic APIs, return excel file instead of excel file path by @linglp in #988
- Emphasized that users should download the `config.yml` from develop branch in ReadMe by @linglp in #1007
- Release 22.11.2 by @GiaJordan in #1014
Full Changelog: v22.11.1...v22.11.2
v22.11.1
What's Changed
- HOTFIX: Extend table replacement sleep duration to ensure synapse deletion operation completes by @GiaJordan in #999
- Pass environment variable to docker container by @linglp in #975
- update error messages in synapseStore when `manifest_basename` key is missing in config.yml by @linglp in #997
- Added instructions for installing and using schematic and the schematic API docker containers to README by @GiaJordan in #953
- Develop pdoc by @linglp in #957
- correct typo: master_basename to manifest_basename in error message by @linglp in #1005
- Release 22.11.1 by @GiaJordan in #1001
Full Changelog: v22.10.3...v22.11.1
Schematic Release (v22.10.3)
What's Changed
- Skip Trashed Entities and Entities that "do not exist" by @GiaJordan in #985
- Reword conditional requirements for the attribute table by @linglp in #987
- fix api endpoint for docker container by @linglp in #974
- Remove Manifest Table Submission Dependency on Project Scope when not Validating by @GiaJordan in #989
- Schematic Release v22.10.3 by @linglp in #990
Full Changelog: v22.10.2...v22.10.3
Schematic Release (v22.10.2)
What's Changed
- Develop schema viz tool tests by @mialy-defelice in #659
- Develop schema viz tool cors by @linglp in #942
- Move Schema Visualization to Schematic by @mialy-defelice in #645
- add default max size for columns missing this key by @GiaJordan in #952
- Schematic Release V22.10.2 by @linglp in #962
Full Changelog: v22.10.1...v22.10.2
v22.10.1
What's Changed
- allow users to upload csv or json file when submitting manifest using API endpoints by @linglp in #908
- allow users to upload csv or json file when validating manifest using API endpoints by @linglp in #910
- Develop table schema fix replacements by @GiaJordan in #947
- Fixing parsing of table schema parameters on table replace by @milen-sage in #945
- Release 22.10.1 by @GiaJordan in #950
Full Changelog: v22.9.1...v22.10.1
v22.09.1
What's Changed
- Develop optional rule args by @GiaJordan in #843
- Containerizing schematic (and update dependencies as needed) by @BrunoGrandePhD in #882
- add endpoint find_class_specific_properties by @linglp in #890
- add route get_subgraph_by_edge_type by @linglp in #891
- create endpoint to return schema as a pickle file by @linglp in #892
- poetry documentation by @linglp in #895
- Add support for Schematic API to Dockerfile by @BrunoGrandePhD in #887
- Create project table manifests, move entities to new project by @mialy-defelice in #829
- add concurrency to github action by @linglp in #896
- HOTFIX: Mixed type column manifest table upload by @GiaJordan in #902
- update asset view table endpoint to allow returning a json by @linglp in #912
- added get_node_range endpoint by @andrewelamb in #904
- added get_property_label endpoint by @andrewelamb in #905
- added node_dependencies endpoint by @andrewelamb in #906
- Allow schema.org schema to be loaded and used by @GiaJordan in #903
- add example workbook to demo cli functions by @linglp in #916
- change manifest name for censored manifest when upload by @linglp in #931
- Add Restriction Flag to Make Synapse Table Method by @GiaJordan in #935
- Develop Table Uploads: Replace by @GiaJordan in #917
- Release 22.9.1 by @linglp in #934
Full Changelog: v22.8.1...v22.9.1