Maximize state size or guarantee replayability (or both)? #10

wombaugh · 2021-02-05T10:00:37Z (edited by vbrinnel)

Consider a transient observed at increasing times A, B, C, D, E.
These photopoints are delivered in two alerts, 1 and 2.
Alert 1 contains ABC and alert 2 contains CDE.
The fact that alert 2 does not contain all photopoints could be due, e.g., to the 30-day ZTF cap on alert history.

How should the ingester create states based on these alerts, and how is this affected by the order in which the alerts are received?

The current implementation (post afb8e32) only ingests datapoints contained in the alerts.
Two states are created, one with the ABC and one with the CDE datapoints.
This guarantees full replayability (alert-order does not matter).

The previous implementation complemented a state with any older datapoints in the DB.
https://github.com/AmpelProject/Ampel-ZTF/blob/7c15460de57c2026795cb9136af919f20c9655fd/ampel/ztf/ingest/ZiAlertContentIngester.py#L268-L272
This means that if the alerts are received in correct time-order (1 -> 2), we get the states ABC and ABCDE.
However, if the alerts are processed in a time-reversed fashion (2 -> 1) the states will be CDE and ABC.
The outcome thus varies with alert order.

From an information perspective, one could argue that a state should always contain as much data as possible, i.e. every alert should be complemented with all datapoints in the DB. In the correct time order this would yield the states ABC and ABCDE, and with reversed alert order CDE and ABCDE.
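To make the three options concrete, here is a toy sketch (not the actual `ZiAlertContentIngester` code). Photopoints are single letters whose alphabetical order stands in for observation time, and all function names are hypothetical. It assumes that "older datapoints" in the previous implementation means datapoints older than the newest one in the alert, and that an alert's datapoints are written to the DB before its state is built.

```python
from __future__ import annotations

# Toy model: letters are photopoints, alphabetical order is time order.
ALERTS = {1: "ABC", 2: "CDE"}


def alert_only(alert: str, db: set[str]) -> set[str]:
    """Current behaviour: the state holds only the alert's datapoints."""
    return set(alert)


def complement_older(alert: str, db: set[str]) -> set[str]:
    """Previous behaviour (assumed reading): add DB datapoints that are
    older than the newest datapoint in the alert."""
    newest = max(alert)
    return set(alert) | {p for p in db if p < newest}


def complement_all(alert: str, db: set[str]) -> set[str]:
    """Maximized state: add every datapoint currently in the DB."""
    return set(alert) | db


def ingest(order, strategy) -> list[str]:
    """Process alerts in the given order and return the states created."""
    db: set[str] = set()
    states = []
    for alert_id in order:
        alert = ALERTS[alert_id]
        db |= set(alert)  # assumption: datapoints are stored first
        states.append("".join(sorted(strategy(alert, db))))
    return states


for strategy in (alert_only, complement_older, complement_all):
    print(strategy.__name__, ingest((1, 2), strategy), ingest((2, 1), strategy))
# alert_only ['ABC', 'CDE'] ['CDE', 'ABC']
# complement_older ['ABC', 'ABCDE'] ['CDE', 'ABC']
# complement_all ['ABC', 'ABCDE'] ['CDE', 'ABCDE']
```

Only `alert_only` produces the same set of states for both alert orders. The other two strategies produce ABCDE in at least one order, but which states exist depends on what was already in the DB.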

This leaves several questions:

Is this behaviour controlled solely by this clause in the implemented ingester, or does the same question appear in other places?
Is there a "correct" or "desired" solution?
Do we actually want this to be parameter driven? In principle, a channel could pass config parameters to the ingester that determine whether to maximize state content or guarantee replayability.
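As a rough illustration of the parameter-driven option, the sketch below shows one way such a switch could look. `StatePolicy`, `IngesterConfig` and `build_state` are hypothetical names invented for this example and do not exist in Ampel.

```python
from dataclasses import dataclass
from enum import Enum


class StatePolicy(Enum):
    """Hypothetical per-channel choice of how states are composed."""
    ALERT_ONLY = "alert_only"  # replayable: state depends only on the alert
    MAXIMIZE = "maximize"      # state also includes all datapoints in the DB


@dataclass(frozen=True)
class IngesterConfig:
    """Hypothetical config a channel could hand to the ingester."""
    state_policy: StatePolicy = StatePolicy.ALERT_ONLY


def build_state(alert_points, db_points, config: IngesterConfig) -> frozenset:
    """Return the datapoint ids that make up the state for one alert."""
    if config.state_policy is StatePolicy.ALERT_ONLY:
        return frozenset(alert_points)
    return frozenset(alert_points) | frozenset(db_points)


replayable = build_state("CDE", "ABCDE", IngesterConfig())
maximal = build_state("CDE", "ABCDE", IngesterConfig(StatePolicy.MAXIMIZE))
```

Defaulting to the replayable policy keeps the current behaviour unless a channel explicitly opts in to maximized states.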


jvansanten · 2021-02-05T10:24:22Z

For ZTF there are two kinds of light curves: alert, which contain only alert photometry, and archive, which contain all photometry older than the most recent observation. The alert version is strongly reproducible, since it only depends on the contents of the alert. The archive version, on the other hand, depends on the order in which alerts are received, as well as the contents of the archive database. I'm not sure there is a single correct solution, but those two seem to cover most cases.

Synchronously bulk-update t0 collection on ingest, and check that no new data were added while the update was in flight. This removes a data race that occurred any time two AlertProcessors received superseding alerts between flushes of their update buffers. Fixes #8.
