Maximize state size or guarantee replayability (or both)? #10

Open
wombaugh opened this issue Feb 5, 2021 · 1 comment

wombaugh commented Feb 5, 2021

Consider a transient observed at increasing times A, B, C, D, E.
These photopoints are delivered in two alerts: 1 and 2.
1 contains ABC and 2 contains CDE.
The fact that 2 does not contain all photopoints could be due, e.g., to the 30-day ZTF cap on alert history.

How should the ingester create states based on these alerts, and how is this affected by the order in which the alerts are received?

The current implementation (post afb8e32) only ingests datapoints contained in the alerts.
Two states are created, one with the ABC and one with the CDE datapoints.
This guarantees full replayability (alert-order does not matter).

The previous implementation complemented a state with any older datapoints in the DB.
https://github.com/AmpelProject/Ampel-ZTF/blob/7c15460de57c2026795cb9136af919f20c9655fd/ampel/ztf/ingest/ZiAlertContentIngester.py#L268-L272
This means that if the alerts are received in correct time-order (1 -> 2), we get the states ABC and ABCDE.
However, if the alerts are processed in a time-reversed fashion (2 -> 1) the states will be CDE and ABC.
The outcome thus varies with alert order.

From an information perspective, one could argue that one should always maximize the data available in the state (and thus complement every alert with all DB data). With alerts in the correct time-order this would yield the states ABC and ABCDE; with reversed alert order it would yield CDE and ABCDE.
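
The three behaviours described above can be compared with a small simulation. This is only an illustrative sketch, not the actual Ampel-ZTF ingester code: the function names (`alert_only`, `complement_older`, `maximize`, `ingest`) are made up for this example, photopoints are modelled as the letters A to E (so alphabetical order stands in for time order), and the DB is a plain set.

```python
# Toy model of the example: alert 1 delivers ABC, alert 2 delivers CDE.
# Letters stand in for photopoints; alphabetical order == time order.
ALERTS = {1: "ABC", 2: "CDE"}


def alert_only(alert_points, db):
    """State contains only the datapoints in the alert (current behaviour)."""
    return set(alert_points)


def complement_older(alert_points, db):
    """State is the alert plus any DB datapoints older than the alert's
    latest point (assumed model of the previous behaviour)."""
    latest = max(alert_points)
    return set(alert_points) | {p for p in db if p < latest}


def maximize(alert_points, db):
    """State is the alert plus everything currently in the DB."""
    return set(alert_points) | set(db)


def ingest(order, policy):
    """Process alerts in the given order and return the states created."""
    db = set()
    states = []
    for alert_id in order:
        points = ALERTS[alert_id]
        db |= set(points)  # store the alert's datapoints
        states.append("".join(sorted(policy(points, db))))
    return states


for policy in (alert_only, complement_older, maximize):
    print(policy.__name__, ingest([1, 2], policy), ingest([2, 1], policy))
```

In this toy model only `alert_only` produces the same set of states for both alert orders; `complement_older` and `maximize` agree with each other in the forward order but differ in the reversed one.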

This leaves some questions:

  • Is this behaviour solely controlled by this clause in the implemented ingester, or does the same question appear in different places?
  • Is there a "correct" or "desired" solution?
  • Do we actually want this to be something that is parameter driven? In principle, a channel could give config parameters to the ingester that regulate whether to maximize state content or guarantee replayability.
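
To make the last question concrete, here is one way a parameter-driven variant might look. Everything here is hypothetical: `StatePolicy`, `ChannelConfig`, the `state_policy` option and `state_for_channel` are names invented for this sketch and do not correspond to existing Ampel configuration keys or classes.

```python
from dataclasses import dataclass
from enum import Enum


class StatePolicy(Enum):
    """Hypothetical per-channel switch for how states are built."""
    REPLAYABLE = "replayable"  # state = alert content only
    MAXIMIZE = "maximize"      # state = alert content + all DB datapoints


@dataclass(frozen=True)
class ChannelConfig:
    """Hypothetical channel config; defaults to the replayable behaviour."""
    name: str
    state_policy: StatePolicy = StatePolicy.REPLAYABLE


def state_for_channel(config, alert_points, db_points):
    """Select the datapoints that make up the state for one channel."""
    points = set(alert_points)
    if config.state_policy is StatePolicy.MAXIMIZE:
        points |= set(db_points)
    return frozenset(points)


# Alert 1 (ABC) arrives after alert 2 (CDE) has already been stored.
replay = ChannelConfig(name="replay_channel")
full = ChannelConfig(name="full_channel", state_policy=StatePolicy.MAXIMIZE)
print(sorted(state_for_channel(replay, "ABC", "CDE")))
print(sorted(state_for_channel(full, "ABC", "CDE")))
```

One consequence of such a design would be that the same alert can yield different states for different channels, which may or may not be desirable.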
@jvansanten

For ZTF there are two kinds of light curves: alert, which contain only alert photometry, and archive, which contain all photometry older than the most recent observation. The alert version is strongly reproducible, since it only depends on the contents of the alert. The archive version, on the other hand, depends on the order in which alerts are received, as well as the contents of the archive database. I'm not sure there is a single correct solution, but those two seem to cover most cases.

vbrinnel referenced this issue Feb 5, 2021
Synchronously bulk-update t0 collection on ingest, and check that no new data were added while the update was in flight. This removes a data race that occurred any time two AlertProcessors received superseding alerts between flushes of their update buffers. Fixes #8.