Capture upsert strategy information in media properties doc #4521
Labels
📄 aspect: text
Concerns the textual material in the repository
✨ goal: improvement
Improvement to an existing user-facing feature
help wanted
Open to participation from the community
🟩 priority: low
Low priority and doesn't need to be rushed
🧱 stack: catalog
Related to the catalog and Airflow DAGs
🧱 stack: documentation
Related to Sphinx documentation
🐍 tech: python
Involves Python
Description
When updating certain JSON fields in the catalog, we use a "merge" upsert strategy:
openverse/catalog/dags/common/storage/columns.py
Lines 70 to 78 in ca82f84
This came up in a recent discussion about tags: #4475 (comment)
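To illustrate the "merge" behavior that prompted the tags discussion, here is a minimal Python sketch. It is not the catalog's implementation; the real strategy is defined in `columns.py`. The function name `merge_tags` and the choice to deduplicate on whole-element equality are assumptions made for illustration.

```python
import json


def merge_tags(existing: list[dict], incoming: list[dict]) -> list[dict]:
    """Combine stored and incoming tags without ever dropping a stored tag.

    Hypothetical helper: deduplicates on whole-element equality, which is
    an assumption and may differ from what the catalog actually does.
    """
    merged = list(existing)
    # Serialize each tag with sorted keys so key order does not affect equality.
    seen = {json.dumps(tag, sort_keys=True) for tag in existing}
    for tag in incoming:
        key = json.dumps(tag, sort_keys=True)
        if key not in seen:
            merged.append(tag)
            seen.add(key)
    return merged


stored = [{"name": "cat", "provider": "flickr"}]
reingested = [{"name": "dog", "provider": "flickr"}]

# "cat" is absent from the re-ingested data, yet it is kept after the merge.
result = merge_tags(stored, reingested)
```

Under this kind of strategy, a tag that disappears upstream stays in the catalog, which is the behavior the documentation should spell out.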
The catalog media properties expansion (#4366) added `DB Column Type` to the description of each field for easier referencing. We also add `upsert_strategy` to the table that gets generated at the top of the doc:
openverse/catalog/utilities/media_props_gen/helpers/column_parser.py
Lines 52 to 58 in 5041280
We should add an `Upsert Strategy` auto-generated field for each column description (similar to `DB Column Type`) which carries this information down to each field. It would also be helpful to have a section in the preamble/postamble which describes each update strategy and links to the code for each strategy in the `columns.py` file. The auto-generated field documentation can also then link to this description.
Alternatives
This was prompted by a recent mention about only new tags being added (and tags never being removed during ingestion). We could add a note explicitly to the `tags` column documentation instead, but it seems more prudent to do this programmatically, not only because we can but because it would be more resilient going forward.
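The programmatic approach described in this issue could look roughly like the sketch below. Everything in it is hypothetical: `ColumnInfo`, `STRATEGY_DESCRIPTIONS`, `render_column_fields`, `render_strategy_section`, the strategy names, and the `#upsert-strategies` anchor are placeholders, not the actual `media_props_gen` helpers or the strategies defined in `columns.py`.

```python
from dataclasses import dataclass

# Hypothetical strategy names and descriptions, for illustration only.
# The real strategies and their semantics are defined in columns.py.
STRATEGY_DESCRIPTIONS = {
    "merge": "Existing and incoming values are combined; nothing is removed.",
    "replace": "The incoming value overwrites the stored value.",
}


@dataclass
class ColumnInfo:
    """Hypothetical stand-in for the parsed column data."""

    name: str
    db_type: str
    upsert_strategy: str


def render_column_fields(column: ColumnInfo) -> str:
    """Render the auto-generated fields for one column description."""
    return "\n".join(
        [
            f"- DB Column Type: `{column.db_type}`",
            # Link each field back to a shared section describing the strategies.
            f"- Upsert Strategy: [`{column.upsert_strategy}`](#upsert-strategies)",
        ]
    )


def render_strategy_section() -> str:
    """Render the preamble/postamble section that describes each strategy."""
    lines = ["## Upsert strategies", ""]
    for name, description in STRATEGY_DESCRIPTIONS.items():
        lines.append(f"- `{name}`: {description}")
    return "\n".join(lines)


tags_fields = render_column_fields(ColumnInfo("tags", "jsonb", "merge"))
strategy_section = render_strategy_section()
```

Generating both pieces from the same source keeps the per-column field and the strategy descriptions in sync, which a hand-written note on the `tags` column would not.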