add provenance section
mservillat committed Aug 28, 2024
1 parent 18452ff commit 56c0b74
Showing 1 changed file with 18 additions and 15 deletions.
33 changes: 18 additions & 15 deletions VOHE-Note.tex
@@ -61,15 +61,15 @@ \section{Introduction}
Such high-level HE observations have been included in the VO via data access endpoints provided by observatories or agencies and indexed in the VO Registry.
%Some high-energy (HE) data is already available via the VO. Images, time series, and spectra may be described with Obscore and access.

However, after browsing this data, users may want to download lower-level data and reapply the data reduction steps relevant to their science objectives. A common scenario is to download HE ``event'' lists, i.e., lists of events recorded by a HE detector that are expected to be detections of particles (e.g. a HE photon), together with the corresponding calibration files, including Instrument Response Functions (IRFs). The findability and accessibility of these data via the VO is the focus of this note.

We first report typical use cases for data access and analysis of data from current HE observatories. From those use cases, we note that some existing IVOA Recommendations are of interest to the domain and should be further explored by HE observatories. We then discuss how standards could evolve to better integrate specific aspects of HE data, and whether new standards should be developed.

\subsection{Objectives of the document}

The main objective of the document is to analyse how HE data can be better integrated into the VO.

We first identify and describe the specificities of HE data from several HE observatories. We then illustrate how HE data is, or can be, handled using current IVOA standards. Finally, we explore several topics that could lead to HE-specific recommendations.

A related objective is to provide a context and a list of topics to be further discussed within the IVOA by a dedicated HE Interest Group.

@@ -78,7 +78,7 @@ \subsection{Scope of the document}

This document mainly focuses on HE data discovery through the VO, with the identification of common use cases in the HE Astrophysics domain, which provides insight into the specific metadata to be exposed through the VO for HE data.

To some extent, all existing IVOA Recommendations could be discussed in this document in the HE context.



@@ -113,7 +113,6 @@ \section{High Energy observatories and experiments}

\subsection{Gamma-ray programs}


\subsubsection{H.E.S.S}
\label{sec:hess}

@@ -144,7 +143,6 @@ \subsubsection{H.E.S.S}
tentative ObsCore description of each dataset. We hope that, in the future, the H.E.S.S. legacy archive will be published
in a similar way and accessible through the VO.


\subsubsection{CTAO}
\label{sec:ctao}

@@ -167,7 +165,7 @@ \subsubsection{CTAO}

In this context, a focus of CTAO is to distribute its Data Level 3 (DL3) datasets, which correspond to lists of Cherenkov
events detected by the telescopes together with the appropriate IRFs. CTAO is planning internal and public Science Data
Challenges, which represent opportunities to build ``VO inside'' solutions.
%% Need to describe the IRFs like for Chandra?

The CTAO is complementary to other gamma-ray instruments observing the sky up to ultra-high energies (i.e., PeV).
@@ -217,7 +215,6 @@ \subsubsection{Chandra}\label{sec:chandra}
observations. The Sherpa modeling and fitting package supports N-dimensional model fitting and optimization in Python,
and supports advanced Bayesian Markov chain Monte Carlo analyses.


\subsubsection{XMM-Newton}

The European Space Agency's (ESA) X-ray Multi-Mirror Mission (XMM-Newton) was launched in 1999. XMM-Newton is ESA's
@@ -261,6 +258,7 @@ \subsection{KM3Net and neutrino detection}
% mireille : what is specific for the community in terms of data interpretation and computation steps

\section{Common practices in the High Energy community}
\label{sec:vhespec}

\subsection{Data flow specificities}

@@ -297,7 +295,7 @@ \subsubsection{Background signal}

Observations in HE may contain a high background component, which may be due to instrumental noise, unresolved astrophysical sources, emission from extended regions, or terrestrial sources producing particles similar to the signal. Characterizing and estimating this background can be particularly important for applying corrections during the analysis of a source signal.

In the VHE domain, with the IACT, WCD and neutrino techniques, the background is created by cosmic-ray-induced events, while unresolved astrophysical sources and emission from extended regions are treated as part of the model of the gamma-ray or neutrino emission. In the X-ray domain, contributions to the background can include an instrumental component, the local radiation environment (i.e., space weather), which can change dynamically, and, depending on the spatial resolution of the instrument, the cosmological background due to unresolved astrophysical sources.
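
As an illustration of such a correction, a common approach in IACT analyses, and in X-ray analyses using background regions, is to count events in a source (ON) region and in background (OFF) regions, and to scale the OFF counts by the ratio $\alpha$ of ON to OFF exposure, so that the excess is $N_\mathrm{on} - \alpha N_\mathrm{off}$. The Python sketch below only shows this arithmetic; the function name and inputs are hypothetical, not taken from any HE analysis package, and significance estimation is left out.

```python
from __future__ import annotations


def on_off_excess(n_on: int, n_off: int, exposure_on: float, exposure_off: float) -> tuple[float, float]:
    """Return (excess, alpha) for a simple ON/OFF background subtraction.

    alpha is the ratio of ON to OFF exposure (e.g. area or solid angle times
    live time times acceptance), so alpha * n_off estimates the background
    counts expected in the ON region.
    """
    if exposure_on <= 0 or exposure_off <= 0:
        raise ValueError("exposures must be positive")
    alpha = exposure_on / exposure_off
    excess = n_on - alpha * n_off
    return excess, alpha


# 150 counts ON, 400 counts OFF, OFF exposure five times larger than ON
excess, alpha = on_off_excess(150, 400, 1.0, 5.0)
print(f"alpha = {alpha:.2f}, excess = {excess:.1f}")  # alpha = 0.20, excess = 70.0
```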


\subsubsection{Time intervals}
@@ -327,7 +325,7 @@ \subsubsection{Granularity of data products}
Where feasible, the most efficient granularity for distributing HE data products seems to be the full combination of data and associated IRFs. Depending on the instrument, some IRFs may need to be (re-)computed by a service or tool after parameter selection by the user, so any additional files required for these steps should be included in the package where appropriate.
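
A minimal Python sketch of such a package is given below, assuming a tar archive with a small JSON manifest that records the role of each file. The manifest layout and the role names (events, gti, aeff) are hypothetical choices for illustration, not an existing format, and the file contents are placeholders.

```python
from __future__ import annotations

import io
import json
import tarfile


def build_bundle(files: dict[str, bytes], roles: dict[str, str]) -> bytes:
    """Pack data and IRF files, plus a JSON manifest of their roles, into an in-memory tar.gz."""
    manifest = json.dumps(
        {"files": [{"name": name, "role": roles[name]} for name in sorted(files)]}, indent=2
    ).encode("utf-8")
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for name, content in [("manifest.json", manifest), *sorted(files.items())]:
            info = tarfile.TarInfo(name=name)
            info.size = len(content)
            tar.addfile(info, io.BytesIO(content))
    return buffer.getvalue()


def read_manifest(archive: bytes) -> dict:
    """Read the manifest back from an archive produced by build_bundle."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:gz") as tar:
        member = tar.extractfile("manifest.json")
        return json.loads(member.read().decode("utf-8"))


# Placeholder file contents; real files would be FITS event lists and IRFs.
archive = build_bundle(
    files={"events.fits": b"...", "gti.fits": b"...", "aeff.fits": b"..."},
    roles={"events.fits": "events", "gti.fits": "gti", "aeff.fits": "aeff"},
)
print([entry["role"] for entry in read_manifest(archive)["files"]])  # ['aeff', 'events', 'gti']
```

Keeping the manifest inside the archive lets a client check what the package contains without relying on file-naming conventions.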

% mir: already mentioned above why we should consider IRFs
%The coverage information, i.e., how the data product spans the sky coordinates and the time and energy axes, is an important criterion for data selection. In the case of HE observations, these parameters vary depending on the selected good time intervals.
% to be developed

The event-list dataset is generally stored as a table, with one row per candidate detection (event) and several columns for the observed and/or estimated physical parameters, which can vary with the data level: e.g. arrival time, position (on the detector or in the sky), energy or pulse height, and additional project-dependent properties such as errors or flags.
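
A minimal in-memory Python sketch of such a table is shown below, together with the kind of selection a user typically re-applies: keeping only events inside the good time intervals (GTIs) and within an energy range. The columns (time, sky position, energy in TeV) and the example values are assumptions for illustration; real event lists are FITS binary tables read with dedicated tools.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One row of an event list (hypothetical minimal set of columns)."""
    time: float    # arrival time, seconds since a mission reference epoch
    ra: float      # reconstructed sky position, degrees
    dec: float
    energy: float  # reconstructed energy, TeV


def in_gti(t: float, gtis: list[tuple[float, float]]) -> bool:
    """True if time t falls inside any [start, stop) good time interval."""
    return any(start <= t < stop for start, stop in gtis)


def select_events(events: list[Event], gtis: list[tuple[float, float]],
                  e_min: float, e_max: float) -> list[Event]:
    """Keep events inside the GTIs and with energy in [e_min, e_max)."""
    return [ev for ev in events if in_gti(ev.time, gtis) and e_min <= ev.energy < e_max]


events = [
    Event(time=10.0, ra=83.63, dec=22.01, energy=0.8),
    Event(time=55.0, ra=83.60, dec=22.05, energy=2.5),     # outside the GTIs below
    Event(time=120.0, ra=83.65, dec=21.98, energy=12.0),
    Event(time=130.0, ra=83.61, dec=22.02, energy=150.0),  # above e_max
]
gtis = [(0.0, 50.0), (100.0, 200.0)]
selected = select_events(events, gtis, e_min=0.5, e_max=100.0)
print([ev.time for ev in selected])  # [10.0, 120.0]
```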
@@ -438,7 +436,7 @@ \subsection{IVOA Recommendations}

\subsubsection{ObsCore and TAP}

Event-list datasets can be described in ObsCore by setting dataproduct\_type to ``event''. However, this is not widely used in current services: we observe only a few services with event-list datasets declared in the VO Registry, mainly the H.E.S.S. public data release (see \ref{sec:hess}).

As services based on the Table Access Protocol (TAP) \citep{2019ivoa.spec.0927D} and ObsCore are well developed within the VO, they would be a straightforward option for discovering HE event-list datasets, as well as associated multi-wavelength and multi-messenger data.
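
As a sketch of such a discovery step, the Python function below only builds an ADQL query against the standard ivoa.obscore table, selecting event-list datasets around a sky position; submitting it to a TAP service is left to the client. Since ObsCore expresses the spectral coverage (em\_min, em\_max) as wavelengths in metres, energy bounds are converted with $\lambda = hc/E$, so the highest energy maps to em\_min. The function names and the selected columns are our own choices.

```python
from typing import Optional

# h * c in eV * m (about 1.23984e-6 eV m)
HC_EV_M = 1.239841984e-6


def tev_to_wavelength_m(energy_tev: float) -> float:
    """Convert a photon energy in TeV to a wavelength in metres (lambda = h c / E)."""
    return HC_EV_M / (energy_tev * 1e12)


def event_list_query(ra_deg: float, dec_deg: float, radius_deg: float,
                     e_min_tev: Optional[float] = None,
                     e_max_tev: Optional[float] = None) -> str:
    """Build an ADQL query for ObsCore event-list datasets near a sky position,
    optionally requiring overlap with the energy band [e_min_tev, e_max_tev]."""
    conditions = [
        "dataproduct_type = 'event'",
        "CONTAINS(POINT('ICRS', s_ra, s_dec), "
        f"CIRCLE('ICRS', {ra_deg}, {dec_deg}, {radius_deg})) = 1",
    ]
    # em_min/em_max are wavelengths, so energy bounds swap roles:
    # the dataset's highest energy must be >= e_min, i.e. em_min <= hc / e_min ...
    if e_min_tev is not None:
        conditions.append(f"em_min <= {tev_to_wavelength_m(e_min_tev):.6E}")
    # ... and its lowest energy must be <= e_max, i.e. em_max >= hc / e_max.
    if e_max_tev is not None:
        conditions.append(f"em_max >= {tev_to_wavelength_m(e_max_tev):.6E}")
    columns = ("obs_publisher_did, obs_collection, access_url, access_format, "
               "t_min, t_max, em_min, em_max")
    return f"SELECT {columns} FROM ivoa.obscore WHERE " + " AND ".join(conditions)


# Event lists within 1 degree of the Crab Nebula that overlap the 1-10 TeV band
print(event_list_query(83.63, 22.01, 1.0, e_min_tev=1.0, e_max_tev=10.0))
```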

@@ -454,7 +452,7 @@ \subsubsection{DataLink}
table. In the case of an ObsCore response, each dataset can be linked this way (via the access\_url
FIELD content) to previews, documentation pages and calibration data, as well as to the dataset itself.
Some dynamic links to web services may also be provided. In that case, the service input parameters are
described with the help of the ``service descriptor'' feature defined in the same DataLink specification.
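
To illustrate how a client might use such a response, the sketch below parses a small, made-up DataLink VOTable with the Python standard library and keeps the links whose semantics is \#calibration, a term of the IVOA DataLink core vocabulary. The identifiers and URLs are placeholders, and only a subset of the DataLink FIELDs is included.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

NS = {"v": "http://www.ivoa.net/xml/VOTable/v1.3"}

# Made-up DataLink response with a subset of the standard FIELDs.
DATALINK_XML = """<VOTABLE xmlns="http://www.ivoa.net/xml/VOTable/v1.3" version="1.3">
  <RESOURCE type="results">
    <TABLE>
      <FIELD name="ID" datatype="char" arraysize="*"/>
      <FIELD name="access_url" datatype="char" arraysize="*"/>
      <FIELD name="semantics" datatype="char" arraysize="*"/>
      <FIELD name="content_type" datatype="char" arraysize="*"/>
      <DATA><TABLEDATA>
        <TR><TD>ivo://example.org/obs?1</TD><TD>https://example.org/obs1/events.fits</TD><TD>#this</TD><TD>application/fits</TD></TR>
        <TR><TD>ivo://example.org/obs?1</TD><TD>https://example.org/obs1/preview.png</TD><TD>#preview</TD><TD>image/png</TD></TR>
        <TR><TD>ivo://example.org/obs?1</TD><TD>https://example.org/obs1/aeff.fits</TD><TD>#calibration</TD><TD>application/fits</TD></TR>
      </TABLEDATA></DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>"""


def parse_links(xml_text: str) -> list[dict[str, str]]:
    """Return the rows of a DataLink VOTable as dicts keyed by FIELD name."""
    root = ET.fromstring(xml_text)
    table = root.find(".//v:TABLE", NS)
    if table is None:
        raise ValueError("no TABLE in DataLink response")
    names = [f.get("name") for f in table.findall("v:FIELD", NS)]
    return [
        dict(zip(names, (td.text or "" for td in tr.findall("v:TD", NS))))
        for tr in table.findall(".//v:TR", NS)
    ]


def urls_with_semantics(rows: list[dict[str, str]], semantics: str) -> list[str]:
    """Keep the access_url of the links tagged with the given semantics term."""
    return [row["access_url"] for row in rows if row.get("semantics") == semantics]


rows = parse_links(DATALINK_XML)
print(urls_with_semantics(rows, "#calibration"))  # ['https://example.org/obs1/aeff.fits']
```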

\subsubsection{HiPS}

@@ -479,6 +477,11 @@ \subsubsection{MIVOT}

\subsubsection{Provenance}

Providing provenance information for VHE data products is crucial, especially given the complexity of the data preparation and analysis workflows in the VHE domain. This complexity stems from the specificities of VHE data described in Section~\ref{sec:vhespec}.

The IVOA Provenance Data Model \citep{2020ivoa.spec.0411S} was developed with these use cases in mind. It structures provenance information as activities and entities (as in the W3C PROV recommendation) and adds the concepts of description and configuration for each step, so that the complex provenance of VHE data can be exposed.
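
A much simplified, in-memory Python sketch of these concepts follows: an entity (e.g. a DL3 event list) was generated by an activity, which used other entities, points to a description of the kind of step it performs, and carries its configuration parameters. The class names, attribute names and example values are hypothetical and only loosely follow the Provenance Data Model; in particular this is not its standard serialization.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ActivityDescription:
    """What kind of processing step an activity is (hypothetical minimal fields)."""
    name: str
    version: str


@dataclass
class Activity:
    """One execution of a processing step, with the entities it used and its configuration."""
    activity_id: str
    description: ActivityDescription
    parameters: dict[str, str] = field(default_factory=dict)
    used: list[Entity] = field(default_factory=list)


@dataclass
class Entity:
    """A data product, e.g. an event list or an IRF file."""
    entity_id: str
    was_generated_by: Optional[Activity] = None


def lineage(entity: Entity, depth: int = 0) -> list[str]:
    """Walk back from an entity through the activities (and their inputs) that produced it."""
    indent = "  " * depth
    lines = [f"{indent}entity {entity.entity_id}"]
    activity = entity.was_generated_by
    if activity is not None:
        params = ", ".join(f"{k}={v}" for k, v in sorted(activity.parameters.items()))
        desc = activity.description
        lines.append(f"{indent}  <- {desc.name} v{desc.version} ({params})")
        for source in activity.used:
            lines.extend(lineage(source, depth + 2))
    return lines


dl2 = Entity("dl2_run_001")
selection = Activity("act_001", ActivityDescription("event_selection", "1.0"),
                     parameters={"gh_cut": "0.7"}, used=[dl2])
dl3 = Entity("dl3_run_001", was_generated_by=selection)
print("\n".join(lineage(dl3)))
# entity dl3_run_001
#   <- event_selection v1.0 (gh_cut=0.7)
#     entity dl2_run_001
```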


\todo[inline]{To be completed (e.g. Mathieu)}


@@ -499,7 +502,7 @@ \subsubsection{Current definition in the VO}
event is a dataproduct\_type with the following definition:
\begin{quote}
\textbf{event}: an event-counting (e.g. X-ray or other high energy) dataset of some sort. Typically this is
instrumental data, i.e., ``event data''. An event dataset is often a complex object containing multiple files or
other substructures. An event dataset may contain data with spatial, spectral, and time information for each
measured event, although the spectral resolution (energy) is sometimes limited. Event data may be used to produce
higher level data products such as images or spectra.
@@ -583,7 +586,7 @@ \subsubsection{Metadata re-interpretation for the VOHE context}
The initial role of this metadata was to hold the access\_url allowing data access.
Depending on whether the event bundle is packaged in one compact format (OGIP, GADF, tarball, ...)
or as separate files available independently at various URLs, a DataLink pointer can be used to access the various parts (IRFs, background maps, etc.).
In such a case, the value of access\_format should be ``application/x-votable+xml;content=datalink''. The format of each data file is then given by the content\_type FIELD of the DataLink response.
See Section~\ref{sec:datalink}.
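
A client reading an ObsCore row therefore has to check access\_format before following access\_url. A minimal Python sketch, assuming the row is available as a plain dictionary keyed by ObsCore column names (the URL is a placeholder), parses the MIME type and its parameters and tests for the DataLink content type.

```python
from __future__ import annotations


def parse_mime(value: str) -> tuple[str, dict[str, str]]:
    """Split a MIME type such as 'application/x-votable+xml;content=datalink'
    into its lower-cased main type and a dict of parameters."""
    main, *raw_params = [part.strip() for part in value.split(";")]
    params = {}
    for raw in raw_params:
        if "=" in raw:
            key, val = raw.split("=", 1)
            params[key.strip().lower()] = val.strip().strip('"')
    return main.lower(), params


def is_datalink(access_format: str) -> bool:
    """True if access_url points to a DataLink document rather than to the data file."""
    mime, params = parse_mime(access_format)
    return mime == "application/x-votable+xml" and params.get("content") == "datalink"


row = {"access_url": "https://example.org/datalink?ID=obs1",
       "access_format": "application/x-votable+xml;content=datalink"}
if is_datalink(row["access_format"]):
    print("fetch and parse the DataLink document, then select links")
else:
    print("download the data file directly")
```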

\paragraph{o\_ucd}
@@ -672,9 +675,9 @@ \subsection{Use of Datalink for HE products}
\label{sec:datalink}
There are two options for providing access to a full event-bundle package.

In the first option, the ``event-bundle'' dataset (\ref{sec:event-bundlle-or-list}) exposed in the discovery service contains all the relevant information, e.g. several extensions (HDUs) in the FITS file, one corresponding to the event list itself and the others providing good/stable time intervals or IRFs. This is the approach taken in the current GADF data format (see \ref{sec:GADF}). In this option, the content of the event-bundle package should be properly defined in its description: what information is included, and where is it located in the dataset structure? The Event-list Context Data Model (see \ref{sec:EventListContext}) would be useful to provide that.
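
For this option, a client must locate each component inside the file. The Python sketch below assumes that the header keywords of each HDU have already been read (e.g. with a FITS library) into plain dictionaries, and identifies the components through the HDUCLAS1 and HDUCLAS2 keywords. The keyword values (EVENTS, GTI, and RESPONSE with EFF\_AREA, EDISP, RPSF or BKG) reflect our reading of the GADF conventions and should be checked against that specification; the headers themselves are made up.

```python
from __future__ import annotations

from typing import Optional

# Header keywords of each HDU of a made-up single-file event bundle,
# as they could be read with a FITS library (index 0 is the primary HDU).
HDU_HEADERS = [
    {},
    {"EXTNAME": "EVENTS", "HDUCLAS1": "EVENTS"},
    {"EXTNAME": "GTI", "HDUCLAS1": "GTI"},
    {"EXTNAME": "EFFECTIVE AREA", "HDUCLAS1": "RESPONSE", "HDUCLAS2": "EFF_AREA"},
    {"EXTNAME": "ENERGY DISPERSION", "HDUCLAS1": "RESPONSE", "HDUCLAS2": "EDISP"},
]

# Component -> (expected HDUCLAS1, expected HDUCLAS2 or None if not checked)
COMPONENTS = {
    "events": ("EVENTS", None),
    "gti": ("GTI", None),
    "aeff": ("RESPONSE", "EFF_AREA"),
    "edisp": ("RESPONSE", "EDISP"),
    "psf": ("RESPONSE", "RPSF"),
    "bkg": ("RESPONSE", "BKG"),
}


def locate_components(headers: list[dict[str, str]]) -> dict[str, Optional[int]]:
    """Map each component to the index of the first HDU holding it, or None if absent."""
    found: dict[str, Optional[int]] = {}
    for name, (clas1, clas2) in COMPONENTS.items():
        found[name] = next(
            (i for i, hdr in enumerate(headers)
             if hdr.get("HDUCLAS1") == clas1
             and (clas2 is None or hdr.get("HDUCLAS2") == clas2)),
            None,
        )
    return found


print(locate_components(HDU_HEADERS))
# {'events': 1, 'gti': 2, 'aeff': 3, 'edisp': 4, 'psf': None, 'bkg': None}
```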

In the second option, links to the relevant information are provided from the base ``event-list'' (\ref{sec:event-bundlle-or-list}) exposed in the discovery service. This could be done using DataLink, with a list of links to each piece of contextual information, such as the IRFs. The Event-list Context Data Model (see \ref{sec:EventListContext}) would provide the concepts and vocabulary to characterise the IRFs and other information relevant to the analysis of an event list. These specific concepts and terms describing the various flavors of IRFs and GTIs would be given in the semantics and content\_qualifier FIELDs of the DataLink response to qualify the links. The different links can point to different
dereferenceable URLs or, alternatively, to different fragments of the same dereferenceable URL, as stated by the DataLink specification.
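
In this second option, several links may point to fragments of the same file. Assuming the DataLink rows have already been parsed into dictionaries, the Python sketch below groups the calibration links by their dereferenceable base URL (dropping the fragment with urllib.parse.urldefrag), so that a client downloads each file only once. The URLs, fragment names and content\_qualifier values are hypothetical placeholders, since the corresponding Event-list Context vocabulary is still to be defined.

```python
from __future__ import annotations

from collections import defaultdict
from urllib.parse import urldefrag

# Already-parsed DataLink rows (made-up values; content_qualifier terms are hypothetical).
LINKS = [
    {"access_url": "https://example.org/obs1/events.fits", "semantics": "#this", "content_qualifier": "event"},
    {"access_url": "https://example.org/obs1/irfs.fits#aeff", "semantics": "#calibration", "content_qualifier": "aeff"},
    {"access_url": "https://example.org/obs1/irfs.fits#edisp", "semantics": "#calibration", "content_qualifier": "edisp"},
    {"access_url": "https://example.org/obs1/bkg.fits", "semantics": "#calibration", "content_qualifier": "bkg"},
]


def group_calibration_links(links: list[dict[str, str]]) -> dict[str, list[str]]:
    """Group calibration links by base URL, listing the qualified parts each file holds."""
    groups: dict[str, list[str]] = defaultdict(list)
    for link in links:
        if link.get("semantics") != "#calibration":
            continue
        base_url, _fragment = urldefrag(link["access_url"])
        groups[base_url].append(link.get("content_qualifier", ""))
    return dict(groups)


for url, parts in group_calibration_links(LINKS).items():
    print(url, parts)
# https://example.org/obs1/irfs.fits ['aeff', 'edisp']
# https://example.org/obs1/bkg.fits ['bkg']
```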

