Share FCI L1c metadata between segments #2828
base: main
Codecov Report
Attention: Patch coverage is

```diff
@@           Coverage Diff           @@
##             main    #2828   +/-   ##
=======================================
  Coverage   95.94%   95.94%
=======================================
  Files         366      366
  Lines       53515    53580   +65
=======================================
+ Hits        51343    51409   +66
+ Misses       2172     2171    -1
```
Pull Request Test Coverage Report for Build 9515844065 (Coveralls)
I'll see what I can do for testing. It's not easy, because the file handler is never used in the tests.
I've now added two tests that hopefully show that the storing and reusing of common items between the FCI L1c segments works.
Interesting!
Anything in the list shown here that doesn't end in the strings listed in
In
Thank you for the work! I left some comments/questions where I think things might be implemented differently. In addition, could more of the implementation live in `netcdf_utils` so that it can be used by other readers where relevant?
```python
NONSHAREABLE_VARIABLE_ENDINGS = [
    "index",
    "time",
    "measured/effective_radiance",
    "measured/y",
    "position_row",
    "index_map",
    "pixel_quality"]
```
In #2686 I have added this information to the YAML file. Essentially, I have changed `required_variable_names` to be a dict rather than a list. The keys of the dict remain the required variable names, and the values indicate how they can be shared between segments or between repeat cycles:
```yaml
required_netcdf_variables: &required-variables
  # key/value; keys are names, value is a list of strings on how this may be
  # cached between segments or between repeat cycles or neither
  attr/platform:
    - segment
    - rc
  data/{channel_name}/measured/start_position_row:
    - rc
  data/{channel_name}/measured/end_position_row:
    - rc
  data/{channel_name}/measured/radiance_to_bt_conversion_coefficient_wavenumber:
    - segment
    - rc
  data/{channel_name}/measured/radiance_to_bt_conversion_coefficient_a:
    - segment
    - rc
  data/{channel_name}/measured/radiance_to_bt_conversion_coefficient_b:
    - segment
    - rc
  data/{channel_name}/measured/radiance_to_bt_conversion_constant_c1:
    - segment
    - rc
  data/{channel_name}/measured/radiance_to_bt_conversion_constant_c2:
    - segment
    - rc
  data/{channel_name}/measured/radiance_unit_conversion_coefficient:
    - segment
    - rc
  data/{channel_name}/measured/channel_effective_solar_irradiance:
    - segment
    - rc
  data/{channel_name}/measured/effective_radiance: []
  data/{channel_name}/measured/x:
    - segment
    - rc
  data/{channel_name}/measured/y:
    - rc
  data/{channel_name}/measured/pixel_quality: []
  data/{channel_name}/measured/index_map: []
```
We should agree on an approach to avoid duplicating information. One difference is that I need to know not only what can be shared between segments, but also what can be shared between repeat cycles. By adding this information to the YAML file, I don't need to repeat variable names (or parts of variable names) in different places (YAML file + source code).
I think `swath_number` is not shareable between segments.
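A minimal sketch of how the YAML mapping could be turned into the set of concrete variable names that may be cached for a given scope ("segment" or "rc"). It assumes the YAML has already been parsed into a plain dict; the subset below is copied from the mapping, while the function name `variables_shareable_by` and the channel name are hypothetical and not necessarily how #2686 consumes the mapping.

```python
# Subset of the YAML mapping, as it would look after parsing into a dict
REQUIRED_NETCDF_VARIABLES = {
    "attr/platform": ["segment", "rc"],
    "data/{channel_name}/measured/start_position_row": ["rc"],
    "data/{channel_name}/measured/x": ["segment", "rc"],
    "data/{channel_name}/measured/y": ["rc"],
    "data/{channel_name}/measured/effective_radiance": [],
}


def variables_shareable_by(required, scope, channel_name):
    """Hypothetical helper: concrete names whose sharing list contains ``scope``."""
    return {
        # Fill in the channel placeholder; names without it are unchanged
        template.format(channel_name=channel_name)
        for template, scopes in required.items()
        if scope in scopes
    }


segment_shared = variables_shareable_by(REQUIRED_NETCDF_VARIABLES, "segment", "ir_105")
rc_shared = variables_shareable_by(REQUIRED_NETCDF_VARIABLES, "rc", "ir_105")
print(sorted(segment_shared))
```

Since an empty list means "never cached", anything not explicitly marked stays uncached, which is the whitelist behaviour raised further down in the review.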
Yup. I kinda made this on the basis of "let's see if this makes things faster", so the approach was the simplest I saw. I think yours is more general, and I'll update this PR to match yours when yours is ready.
Could be that `swath_number` can't be shared, but apparently sharing it didn't break anything when creating imagery 😅
```python
if any(key.endswith(k) for k in NONSHAREABLE_VARIABLE_ENDINGS):
    continue
```
By default you are sharing (in case of unknown variable names). Would it be safer to default to not-sharing, i.e. use a whitelist rather than a blacklist?
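A minimal sketch of the whitelist alternative, where unknown variables are not shared by default. The tuple `SHAREABLE_VARIABLE_ENDINGS` is hypothetical, loosely derived from entries marked "segment" in the YAML mapping; `str.endswith` accepts a tuple of suffixes, so one call checks them all.

```python
# Hypothetical whitelist: only these suffixes are cached between segments
SHAREABLE_VARIABLE_ENDINGS = (
    "attr/platform",
    "measured/x",
    "measured/radiance_unit_conversion_coefficient",
    "measured/channel_effective_solar_irradiance",
)


def is_shareable(key):
    """Share only explicitly whitelisted variables; anything else is read per segment."""
    return key.endswith(SHAREABLE_VARIABLE_ENDINGS)


# A variable the code does not know about is re-read in every segment
print(is_shareable("data/ir_105/measured/some_new_variable"))  # False
```

The trade-off is that a new shareable variable must be added explicitly before it is cached, whereas with the blacklist a new non-shareable variable would silently be reused across segments.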
I think your dict of sharing stuff would make this obsolete in any case as it explicitly says what can be shared and how.
```python
    if any(key.endswith(k) for k in NONSHAREABLE_VARIABLE_ENDINGS):
        continue
    shared_info[key] = self.file_content[key]
filetype_info["shared_info"] = shared_info
```
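A self-contained sketch of the populate-then-reuse pattern in the diff: the first segment stores its shareable variables under `filetype_info["shared_info"]`, and later segments look there before reading their own file. The functions `collect_shared_info` and `get_variable` and the read-counting `CountingContent` dict (a stand-in for slow remote access such as S3) are hypothetical, not the PR's actual code.

```python
NONSHAREABLE_VARIABLE_ENDINGS = [
    "index", "time", "measured/effective_radiance", "measured/y",
    "position_row", "index_map", "pixel_quality"]


class CountingContent(dict):
    """Dict that counts item reads, standing in for (slow) file access."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.reads = 0

    def __getitem__(self, key):
        self.reads += 1
        return super().__getitem__(key)


def collect_shared_info(file_content, filetype_info):
    """Store the shareable variables of one segment in the shared cache."""
    shared_info = {}
    for key in file_content:  # iterating keys does not count as a read
        if any(key.endswith(k) for k in NONSHAREABLE_VARIABLE_ENDINGS):
            continue
        shared_info[key] = file_content[key]
    filetype_info["shared_info"] = shared_info


def get_variable(key, file_content, filetype_info):
    """Prefer the value cached from an earlier segment, else read this segment."""
    shared_info = filetype_info.get("shared_info", {})
    if key in shared_info:
        return shared_info[key]
    return file_content[key]


filetype_info = {}
seg1 = CountingContent({"data/ir_105/measured/x": [1, 2],
                        "data/ir_105/measured/effective_radiance": [10, 11]})
seg2 = CountingContent({"data/ir_105/measured/x": [1, 2],
                        "data/ir_105/measured/effective_radiance": [12, 13]})

collect_shared_info(seg1, filetype_info)  # reads only x from segment 1
x2 = get_variable("data/ir_105/measured/x", seg2, filetype_info)  # from cache
rad2 = get_variable("data/ir_105/measured/effective_radiance", seg2, filetype_info)
print(seg1.reads, seg2.reads)  # 1 1
```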
Is `filetype_info` really an appropriate place to put this cache? I fear this might be confusing.
We talked about this with @ameraner at PCW, and this approach is already used somewhere. LI L2, perhaps? It was also the path of least resistance, as it was already in place and the same dict is passed between different file handlers by the YAML reader.
If there are other backwards-compatible ways to pass the info between the file handlers, let me know.
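The approach works because the YAML reader hands the same `filetype_info` dict object to every file handler of a file type. A minimal sketch with a hypothetical `SegmentFileHandler` class (not Satpy's file handler API) shows why a value stored by the first handler is visible to later ones; it uses `dict.setdefault` rather than the direct assignment in the diff, so the cache is created only once.

```python
class SegmentFileHandler:
    """Hypothetical stand-in for a per-segment file handler."""

    def __init__(self, filename, filetype_info):
        self.filename = filename
        # setdefault creates the cache for the first handler only; every later
        # handler gets a reference to that same inner dict
        self.shared_info = filetype_info.setdefault("shared_info", {})


# One dict object passed to all handlers, as the YAML reader does
filetype_info = {}
handlers = [SegmentFileHandler(f"segment_{i:02d}.nc", filetype_info) for i in range(1, 4)]

# The first segment fills the cache ...
handlers[0].shared_info["attr/platform"] = "example-platform"
# ... and later segments see the value without reading their own file
print(handlers[2].shared_info["attr/platform"])  # example-platform
```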
I think so. As I said in another comment above, this snowballed from a quick test at PCW, so I touched as few other parts of the code as possible. The FCI L1c data format is the most demanding in this regard, and the most important to me, so I started here. I'll see about generalizing things later and will convert this to a draft for the summer period.
Some of the metadata are identical in every FCI L1c segment, so they need to be read only once. This will save a lot of time in `Scene` creation when the data are in S3 storage:

- `main`: 37.0 s

There will be conflicts with #2686, but maybe adding the pickle would also benefit this feature.

`AUTHORS.md` if not there already