Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-24283: Add updateSchema tool #32

Merged
merged 3 commits into from
Jan 3, 2024
Merged

DM-24283: Add updateSchema tool #32

merged 3 commits into from
Jan 3, 2024

Conversation

bsmartradio
Copy link
Contributor

@bsmartradio bsmartradio commented Jun 30, 2023

Tool that takes a path to the apdb.yaml and a version string and generates new schemas. This tool excludes a number of fields we do not currently want in the schemas. sample_data, diaNondetectionLimit.avsc, and alert.avsc still need to be manually updated as they are not read from the apdb.

@bsmartradio bsmartradio requested a review from ebellm June 30, 2023 22:06
@timj
Copy link
Member

timj commented Jun 30, 2023

@bsmartradio would you be able to also look at #31 for me? (DM-39756). As far as I can tell none of the tests work so I can't tell if my fixes break anything.

@bsmartradio
Copy link
Contributor Author

I realized I should probably use fastavro instead of avro because it isn't in the stack. I'm swapping it over now.

@bsmartradio bsmartradio force-pushed the tickets/DM-24283 branch 3 times, most recently from bee6cb1 to a9349b9 Compare July 8, 2023 03:16
@timj timj changed the title Add updateSchema tool DM-24283: Add updateSchema tool Jul 8, 2023
@bsmartradio bsmartradio force-pushed the tickets/DM-24283 branch 2 times, most recently from 820ee8f to 90804cb Compare July 20, 2023 18:34
@bsmartradio bsmartradio requested a review from erinleighh August 1, 2023 23:44
Copy link
Contributor

@ebellm ebellm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest changing updateSchema.py to make it more useful in the future--not just for docstring synchronization but for generating Avro schemas that match an apdb.yaml source of truth.

python/lsst/alert/packet/updateSchema.py Outdated Show resolved Hide resolved
__all__ = ['update_schema']


def update_docs(schema, apdb):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even thought this is a small script, let's include docstrings.

python/lsst/alert/packet/updateSchema.py Outdated Show resolved Hide resolved
python/lsst/alert/packet/updateSchema.py Outdated Show resolved Hide resolved
Copy link
Contributor

@ebellm ebellm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @bsmartradio, a number of comments throughout for consistency.

README.rst Show resolved Hide resolved

"""

registry = SchemaRegistry.from_filesystem()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can skip instantiating a schema registry here and just have the function take the user-supplied version number directly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a new input, schema_path, which requires the user to include a path to where the schemas will be added.

return schema


def update_schema(apdb_filepath, update_version=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're not updating existing schemas but generating new ones based on an apdb I'd choose a different name for this function and update the docstrings accordingly throughout the function.

Copy link
Contributor Author

@bsmartradio bsmartradio Dec 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm swapping to 'generate_schema' since we are generating a schema from the apdb.

version_name = version.split(".")[0] + "_" + version.split(".")[1]

# The first 4 columns in the apdb are the ones we use for alerts
for x in range(0, 4):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be less fragile if you iterate through all of the entries in apdb['tables'] and compare the name to a list of the tables you want.

else:
version_name = version.split(".")[0] + "_" + version.split(".")[1]

# The first 4 columns in the apdb are the ones we use for alerts
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not columns, but tables

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your right, I believe avro treats them as columns of the parent schema, but its really nested tables. Changed to tables.

return schema


def populate_fields(apdb):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this takes a single table schema I wouldn't call this variable apdb

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Swapped to apdb_table

Parameters
----------
apdb: `dict`
The name of the schema as a string. E.G. diaSource.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this docstring description isn't right--you're passing in a dictionary from the sdm_schemas yaml, not a string name

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the docstring. Thanks for catching this. A lot of the docstrings were leftover from when I was just updating things and not generating new schemas.

field_dictionary_array = []
for column in apdb['columns']:
# We are still finalizing the time series feature names.
if (column['name'] != 'validityStart') and (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would be nicer looking if you made a list of strings to exclude and then used a loop to check rather than retyping column['name'] a bunch of times.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to iterating over a list of excluded fields.


if __name__ == '__main__':

parser = argparse.ArgumentParser(description='Update the schema docstrings so that they'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, I'd update the docstring here: rather than updating docstrings we're generating a new schema set from apdb.yaml.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed the docstring to reflect that we are generating new schemas.

README.rst Outdated

* Rename all of the ``lsst.vX*`` files to the new version.
* Update the ``"namespace": "lsst.v5_0",`` line at the top of each ``*.avsc`` file to the new version.
* run ``python updateSchema.py /path/to/LSST/code/sdm_schemas/yml/apdb.yaml Path/To/Yaml/sdm_schemas/yml/apdb.yaml "6.0"`` All Generated files do not need to be altered.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* run ``python updateSchema.py /path/to/LSST/code/sdm_schemas/yml/apdb.yaml Path/To/Yaml/sdm_schemas/yml/apdb.yaml "6.0"`` All Generated files do not need to be altered.
* run ``python updateSchema.py /path/to/LSST/code/sdm_schemas/yml/apdb.yaml Path/To/Yaml/sdm_schemas/yml/apdb.yaml /path/to/alert_packet/schema/ "6.0"`` All Generated files do not need to be altered.

Copy link
Contributor

@ebellm ebellm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple of minor items of docs cleanup but I think it's ready otherwise!

README.rst Outdated Show resolved Hide resolved
python/lsst/alert/packet/updateSchema.py Outdated Show resolved Hide resolved
python/lsst/alert/packet/updateSchema.py Outdated Show resolved Hide resolved
python/lsst/alert/packet/updateSchema.py Outdated Show resolved Hide resolved
README.rst Show resolved Hide resolved
@bsmartradio bsmartradio merged commit 2b9c5b9 into main Jan 3, 2024
5 checks passed
@bsmartradio bsmartradio deleted the tickets/DM-24283 branch January 3, 2024 02:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants