-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-24283: Add updateSchema tool #32
Conversation
@bsmartradio would you be able to also look at #31 for me? (DM-39756). As far as I can tell none of the tests work so I can't tell if my fixes break anything. |
I realized I should probably use fastavro instead of avro because it isn't in the stack. I'm swapping it over now. |
bee6cb1
to
a9349b9
Compare
820ee8f
to
90804cb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest changing updateSchema.py to make it more useful in the future--not just for docstring synchronization but for generating Avro schemas that match an apdb.yaml source of truth.
__all__ = ['update_schema'] | ||
|
||
|
||
def update_docs(schema, apdb): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even thought this is a small script, let's include docstrings.
90804cb
to
b602c68
Compare
30cb226
to
ec8b6c6
Compare
e0f8d41
to
c23e696
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @bsmartradio, a number of comments throughout for consistency.
|
||
""" | ||
|
||
registry = SchemaRegistry.from_filesystem() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can skip instantiating a schema registry here and just have the function take the user-supplied version number directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a new input, schema_path
, which requires the user to include a path to where the schemas will be added.
return schema | ||
|
||
|
||
def update_schema(apdb_filepath, update_version=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're not updating existing schemas but generating new ones based on an apdb I'd choose a different name for this function and update the docstrings accordingly throughout the function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm swapping to 'generate_schema' since we are generating a schema from the apdb.
version_name = version.split(".")[0] + "_" + version.split(".")[1] | ||
|
||
# The first 4 columns in the apdb are the ones we use for alerts | ||
for x in range(0, 4): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be less fragile if you iterate through all of the entries in apdb['tables']
and compare the name
to a list of the tables you want.
else: | ||
version_name = version.split(".")[0] + "_" + version.split(".")[1] | ||
|
||
# The first 4 columns in the apdb are the ones we use for alerts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not columns, but tables
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your right, I believe avro treats them as columns of the parent schema, but its really nested tables. Changed to tables.
return schema | ||
|
||
|
||
def populate_fields(apdb): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this takes a single table schema I wouldn't call this variable apdb
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Swapped to apdb_table
Parameters | ||
---------- | ||
apdb: `dict` | ||
The name of the schema as a string. E.G. diaSource. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this docstring description isn't right--you're passing in a dictionary from the sdm_schemas yaml, not a string name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the docstring. Thanks for catching this. A lot of the docstrings were leftover from when I was just updating things and not generating new schemas.
field_dictionary_array = [] | ||
for column in apdb['columns']: | ||
# We are still finalizing the time series feature names. | ||
if (column['name'] != 'validityStart') and ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would be nicer looking if you made a list of strings to exclude and then used a loop to check rather than retyping column['name']
a bunch of times.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to iterating over a list of excluded fields.
|
||
if __name__ == '__main__': | ||
|
||
parser = argparse.ArgumentParser(description='Update the schema docstrings so that they' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again, I'd update the docstring here: rather than updating docstrings we're generating a new schema set from apdb.yaml.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed the docstring to reflect that we are generating new schemas.
README.rst
Outdated
|
||
* Rename all of the ``lsst.vX*`` files to the new version. | ||
* Update the ``"namespace": "lsst.v5_0",`` line at the top of each ``*.avsc`` file to the new version. | ||
* run ``python updateSchema.py /path/to/LSST/code/sdm_schemas/yml/apdb.yaml Path/To/Yaml/sdm_schemas/yml/apdb.yaml "6.0"`` All Generated files do not need to be altered. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* run ``python updateSchema.py /path/to/LSST/code/sdm_schemas/yml/apdb.yaml Path/To/Yaml/sdm_schemas/yml/apdb.yaml "6.0"`` All Generated files do not need to be altered. | |
* run ``python updateSchema.py /path/to/LSST/code/sdm_schemas/yml/apdb.yaml Path/To/Yaml/sdm_schemas/yml/apdb.yaml /path/to/alert_packet/schema/ "6.0"`` All Generated files do not need to be altered. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a couple of minor items of docs cleanup but I think it's ready otherwise!
87dbc80
to
dfa5860
Compare
28e3842
to
946a730
Compare
Tool that takes a path to the apdb.yaml and a version string and generates new schemas. This tool excludes a number of fields we do not currently want in the schemas. sample_data, diaNondetectionLimit.avsc, and alert.avsc still need to be manually updated as they are not read from the apdb.