Merge pull request #326 from lsst-sqre/tickets/DM-47262a
DM-47262: Fix IVOA DALI timestamp parsing, add data types
rra authored Nov 14, 2024
2 parents cda04c2 + a4c7ed5 commit 0cc2c9a
Showing 12 changed files with 212 additions and 82 deletions.
8 changes: 8 additions & 0 deletions changelog.d/20241113_152723_rra_DM_47262a.md
@@ -0,0 +1,8 @@
+### Backwards-incompatible changes
+
+- `parse_isodatetime` and `normalize_isodatetime` now accept exactly the date formats accepted by the IVOA DALI standard. This means seconds are now required, the trailing `Z` is now optional (times are always interpreted as UTC regardless), and the time is optional and interpreted as 00:00:00 if missing.
+
+### New features
+
+- Add new `safir.pydantic.UtcDatetime` type that is equivalent to `datetime` but coerces all incoming times to timezone-aware UTC. This type should be used instead of using `normalize_datetime` as a validator.
+- Add new `safir.pydantic.IvoaIsoDatetime` type that accepts any ISO 8601 date and time that matches the IVOA DALI standard for timestamps. This follows the same rules as `parse_isodatetime` now follows. This type should be used instead of using `normalize_isodatetime` as a validator.
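
As a rough illustration of the parsing rules listed in this changelog entry, the following standard-library sketch reuses the regular expression that this commit adds to `_parse.py`. The function name `parse_ivoa_timestamp` is hypothetical and exists only for this example; the real entry point is `safir.datetime.parse_isodatetime`.

```python
import re
from datetime import datetime, timezone

# Same pattern as _IVOA_TIMESTAMP_PATTERN in this commit.
_PATTERN = re.compile(
    r"(?P<timestamp>\d{4}-\d\d-\d\d(?P<time>T\d\d:\d\d:\d\d(\.\d\d\d)?)?)Z?$"
)


def parse_ivoa_timestamp(text: str) -> datetime:
    """Hypothetical stand-in for parse_isodatetime, for illustration only."""
    m = _PATTERN.match(text)
    if not m:
        raise ValueError(f"{text} does not match IVOA format")
    timestamp = m.group("timestamp")
    if not m.group("time"):
        # Date only: the time defaults to midnight.
        timestamp += "T00:00:00"
    # The result is always UTC, with or without the trailing Z.
    return datetime.fromisoformat(timestamp + "+00:00")


expected = datetime(2022, 9, 16, 12, 3, 45, tzinfo=timezone.utc)
assert parse_ivoa_timestamp("2022-09-16T12:03:45Z") == expected
assert parse_ivoa_timestamp("2022-09-16T12:03:45") == expected
assert parse_ivoa_timestamp("2022-09-16") == datetime(
    2022, 9, 16, tzinfo=timezone.utc
)

# Seconds are required and explicit offsets are not allowed.
for rejected in ("2022-09-16T12:03Z", "2022-09-16T12:03:45+00:00"):
    try:
        parse_ivoa_timestamp(rejected)
    except ValueError:
        pass
    else:
        raise AssertionError(f"{rejected} should have been rejected")
```

The two rejected inputs correspond to the backwards-incompatible part of the change: a time without seconds no longer parses, and an explicit offset such as `+00:00` is refused even though it denotes UTC.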
2 changes: 2 additions & 0 deletions docs/documenteer.toml
@@ -40,7 +40,9 @@ nitpick_ignore = [
     ["py:obj", "safir.pydantic.EnvAsyncPostgresDsn"],
     ["py:obj", "safir.pydantic.EnvRedisDsn"],
     ["py:obj", "safir.pydantic.HumanTimedelta"],
+    ["py:obj", "safir.pydantic.IvoaIsoDatetime"],
     ["py:obj", "safir.pydantic.SecondsTimedelta"],
+    ["py:obj", "safir.pydantic.UtcDatetime"],
     # TypeVar references used as parameters to Generic seem to create target
     # not found errors even if they are exported when they are used in private
     # submodules.
21 changes: 3 additions & 18 deletions docs/user-guide/datetime.rst
@@ -41,28 +41,13 @@ ISO 8601 is a large and complex standard that supports numerous partial date or
 However, its most basic date and time format, ``YYYY-MM-DDTHH:MM:SSZ`` (where the ``T`` and ``Z`` are fixed letters and the other letters represent their normal date and time components), provides a good balance of unambiguous parsing and human readability.
 The trailing ``Z`` indicates UTC.

-This subset of ISO 8601 is used by both Kubernetes and the IVOA UWS standard.
+This subset of ISO 8601 is used by both Kubernetes and the IVOA DALI standard, but the IVOA DALI standard allows omitting ``Z`` (the time is still interpreted as UTC) and omitting the time portion entirely.

-Safir provides two utility functions for this date and time serialization format.
+Safir provides two utility functions for this subset of ISO 8601.
 `safir.datetime.isodatetime` converts a `~datetime.datetime` to this format.
 `safir.datetime.parse_isodatetime` goes the opposite direction, converting this format to a time zone aware `~datetime.datetime` in UTC.

-To use this format as the serialized representation of any `~datetime.datetime` objects in a Pydantic model, use the following Pydantic configuration:
-
-.. code-block:: python
-
-   from datetime import datetime
-
-   from pydantic import BaseModel, field_serializer
-   from safir.datetime import isodatetime
-
-
-   class Example(BaseModel):
-       some_time: datetime
-
-       _serialize_some_time = field_serializer("some_time")(isodatetime)
-
-Also see the Pydantic validation function `safir.pydantic.normalize_isodatetime`, discussed further at :ref:`pydantic-datetime`.
+If, as is more often the case, you are accepting or generating `~datetime.datetime` fields as part of a Pydantic model, see :ref:`pydantic-datetime`.
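
To make the serialized form described in this hunk concrete, here is a minimal standard-library sketch of producing ``YYYY-MM-DDTHH:MM:SSZ`` from a `datetime`. The function `format_utc_z` is a hypothetical stand-in for `safir.datetime.isodatetime`, whose implementation is not part of this diff; rejecting non-UTC input and dropping fractional seconds are assumptions of this sketch, not documented behavior.

```python
from datetime import datetime, timedelta, timezone


def format_utc_z(timestamp: datetime) -> str:
    """Hypothetical formatter producing YYYY-MM-DDTHH:MM:SSZ."""
    if timestamp.utcoffset() != timedelta(0):
        # Naive datetimes have a utcoffset() of None and are rejected too.
        raise ValueError(f"{timestamp} is not in UTC")
    return timestamp.strftime("%Y-%m-%dT%H:%M:%SZ")


when = datetime(2022, 9, 16, 12, 3, 45, tzinfo=timezone.utc)
assert format_utc_z(when) == "2022-09-16T12:03:45Z"
```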

Formatting datetimes for logging
================================
45 changes: 24 additions & 21 deletions docs/user-guide/pydantic.rst
@@ -62,40 +62,43 @@ Normalizing datetime fields
 Pydantic supports several input formats for `~datetime.datetime` fields, but the resulting `~datetime.datetime` object may be timezone-naive.
 Best practice for Python code is to only use timezone-aware `~datetime.datetime` objects in the UTC time zone.

-Safir provides a utility function, `~safir.pydantic.normalize_datetime`, that can be used as a field validator for a `~datetime.datetime` model field.
-It ensures that any input is converted to UTC and is always timezone-aware.
+Safir provides a data type, `~safir.pydantic.UtcDatetime`, that can be used in models.
+It is equivalent to `~datetime.datetime` except that it coerces any input to UTC and ensures that it is always timezone-aware.

 Here's an example of how to use it:

 .. code-block:: python

+   from typing import Annotated
+
    from pydantic import BaseModel, field_validator
-   from safir.pydantic import normalize_datetime
+   from safir.pydantic import UtcDatetime


    class Info(BaseModel):
-       last_used: Optional[datetime] = Field(
-           None,
-           title="Last used",
-           description="When last used in seconds since epoch",
-           examples=[1614986130],
-       )
+       last_used: Annotated[
+           UtcDatetime | None,
+           Field(
+               title="Last used",
+               description="When last used",
+               examples=[1614986130, "2021-03-05T15:15:30+00:00"],
+           ),
+       ]

-       _normalize_last_used = field_validator("last_used", mode="before")(
-           normalize_datetime
-       )
+This data type accepts all of the input formats that Pydantic accepts.
+
+IVOA DALI timestamps
+--------------------
+
+In some cases, such as services that implement IVOA standards, it may be desirable to require input timestamps compatible with the `IVOA DALI`_ standard.

-Multiple attributes can be listed as the initial arguments of `~pydantic.field_validator` if there are multiple fields that need to be checked.
+.. _IVOA DALI: https://www.ivoa.net/documents/DALI/20170517/REC-DALI-1.1.html

-This field validator accepts all of the input formats that Pydantic accepts.
-This includes some ambiguous formats, such as an ISO 8601 date without time zone information.
-All such dates are given a consistent interpretation as UTC, but the results may be surprising if the caller expected local time.
-In some cases, it may be desirable to restrict input to one unambiguous format.
+This can be done using `~safir.pydantic.IvoaIsoDatetime` as the data type instead of `~safir.pydantic.UtcDatetime`.
+This data type produces the same timezone-aware UTC `~datetime.datetime` objects, but it only accepts ``YYYY-MM-DD[THH:MM:SS[.mmm]][Z]`` as the input format.

-This can be done by using `~safir.pydantic.normalize_isodatetime` as the field validator instead.
-This function only accepts ``YYYY-MM-DDTHH:MM[:SS]Z`` as the input format.
-The ``Z`` time zone prefix indicating UTC is mandatory.
-It is called the same way as `~safir.pydantic.normalize_datetime`.
+Following the IVOA DALI standard, the trailing ``Z`` is optional, but the timestamp is always interpreted as UTC.
+Explicit timezone information is not allowed.
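
The UTC coercion that this section attributes to `UtcDatetime` can be sketched with the standard library alone. The helper `to_utc` below is hypothetical and is not Safir's implementation. It assumes that integers are seconds since the epoch, that naive datetimes are treated as already being in UTC, and that aware datetimes are converted.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def to_utc(value: int | datetime) -> datetime:
    """Hypothetical helper: return a timezone-aware datetime in UTC."""
    if isinstance(value, int):
        # Integers are read as seconds since the Unix epoch.
        return datetime.fromtimestamp(value, tz=timezone.utc)
    if value.tzinfo is not None and value.utcoffset() is not None:
        # Aware datetimes are converted to UTC.
        return value.astimezone(timezone.utc)
    # Naive datetimes are assumed to already be in UTC.
    return value.replace(tzinfo=timezone.utc)


utc = datetime(2021, 3, 5, 15, 15, 30, tzinfo=timezone.utc)
aware = datetime(2021, 3, 5, 10, 15, 30, tzinfo=timezone(timedelta(hours=-5)))
naive = datetime(2021, 3, 5, 15, 15, 30)

assert to_utc(aware) == utc
assert to_utc(aware).utcoffset() == timedelta(0)
assert to_utc(naive) == utc
assert to_utc(0) == datetime(1970, 1, 1, tzinfo=timezone.utc)
```

The naive case is the one most likely to surprise a caller who expected local time, which is the motivation the documentation gives for the stricter `IvoaIsoDatetime` type.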

.. _pydantic-timedelta:

3 changes: 3 additions & 0 deletions pyproject.toml
@@ -51,6 +51,9 @@ extend = "ruff-shared.toml"
 "*/src/safir/**" = [
     "N818", # Exception is correct in some cases, others are part of API
 ]
+"safir/src/safir/pydantic/_validators.py" = [
+    "TRY004", # pydantic requires ValueError, not TypeError
+]
 "safir/src/safir/testing/**" = [
     "S101", # test support functions are allowed to use assert
 ]
33 changes: 21 additions & 12 deletions safir/src/safir/datetime/_parse.py
@@ -5,6 +5,11 @@
 import re
 from datetime import datetime, timedelta

+_IVOA_TIMESTAMP_PATTERN = re.compile(
+    r"(?P<timestamp>\d{4}-\d\d-\d\d(?P<time>T\d\d:\d\d:\d\d(\.\d\d\d)?)?)Z?$"
+)
+"""Regular expression matching an IVOA DALI timestamp."""
+
 _TIMEDELTA_PATTERN = re.compile(
     r"((?P<weeks>\d+?)\s*(weeks|week|w))?\s*"
     r"((?P<days>\d+?)\s*(days|day|d))?\s*"
@@ -26,30 +31,34 @@ def parse_isodatetime(time_string: str) -> datetime:
     Parameters
     ----------
     time_string
-        Date and time formatted as an ISO 8601 date and time using ``Z`` as
-        the time zone. This is the same format produced by `isodatetime` and
-        is compatible with Kubernetes and the IVOA UWS standard.
+        Date and time formatted as an ISO 8601 date and time, either using
+        ``Z`` as the timezone or without timezone information. This is the
+        same format produced by `isodatetime` and is compatible with
+        Kubernetes and the IVOA DALI standard.

     Returns
     -------
     datetime.datetime
-        The corresponding `datetime.datetime`.
+        The corresponding `~datetime.datetime`.

     Raises
     ------
     ValueError
-        The provided ``time_string`` is not in the correct format.
+        Raised if the provided time string is not in the correct format.

     Notes
     -----
-    When parsing input for a model, use `safir.pydantic.normalize_isodatetime`
-    instead of this function. Using a model will be the normal case; this
-    function is primarily useful in tests or for the special parsing cases
-    required by the IVOA UWS standard.
+    When parsing input for a model, use the `~safir.pydantic.IvoaIsoDatetime`
+    type instead of this function. Using a model will be the normal case; this
+    function is primarily useful in tests.
     """
-    if not time_string.endswith("Z"):
-        raise ValueError(f"{time_string} does not end with Z")
-    return datetime.fromisoformat(time_string[:-1] + "+00:00")
+    if m := re.match(_IVOA_TIMESTAMP_PATTERN, time_string):
+        timestamp = m.group("timestamp")
+        if not m.group("time"):
+            timestamp += "T00:00:00"
+        return datetime.fromisoformat(timestamp + "+00:00")
+    else:
+        raise ValueError(f"{time_string} does not match IVOA format")


 def parse_timedelta(text: str) -> timedelta:
4 changes: 4 additions & 0 deletions safir/src/safir/pydantic/__init__.py
@@ -5,7 +5,9 @@
     EnvAsyncPostgresDsn,
     EnvRedisDsn,
     HumanTimedelta,
+    IvoaIsoDatetime,
     SecondsTimedelta,
+    UtcDatetime,
 )
 from ._validators import (
     normalize_datetime,
@@ -18,7 +20,9 @@
     "EnvAsyncPostgresDsn",
     "EnvRedisDsn",
     "HumanTimedelta",
+    "IvoaIsoDatetime",
     "SecondsTimedelta",
+    "UtcDatetime",
     "normalize_datetime",
     "normalize_isodatetime",
     "to_camel_case",
35 changes: 33 additions & 2 deletions safir/src/safir/pydantic/_types.py
@@ -3,7 +3,7 @@
 from __future__ import annotations

 import os
-from datetime import timedelta
+from datetime import datetime, timedelta
 from typing import Annotated, TypeAlias

 from pydantic import (
@@ -14,13 +14,17 @@
 )
 from pydantic_core import Url

-from safir.datetime import parse_timedelta
+from safir.datetime import isodatetime, parse_timedelta
+
+from ._validators import normalize_datetime, normalize_isodatetime

 __all__ = [
     "EnvAsyncPostgresDsn",
     "EnvRedisDsn",
     "HumanTimedelta",
+    "IvoaIsoDatetime",
     "SecondsTimedelta",
+    "UtcDatetime",
 ]


@@ -154,3 +158,30 @@ def _validate_human_timedelta(v: str | float | timedelta) -> float | timedelta:
 built-in Pydantic handling of `~datetime.timedelta`, an integer number of
 seconds as a string is accepted, and ISO 8601 durations are not supported.
 """
+
+UtcDatetime: TypeAlias = Annotated[
+    datetime, AfterValidator(normalize_datetime)
+]
+"""Coerce a `~datetime.datetime` to UTC.
+
+Accepts as input all of the normal Pydantic representations of a
+`~datetime.datetime`, but then forces the result to be timezone-aware and in
+UTC.
+"""
+
+IvoaIsoDatetime: TypeAlias = Annotated[
+    datetime,
+    BeforeValidator(normalize_isodatetime),
+    PlainSerializer(isodatetime, return_type=str, when_used="json"),
+]
+"""Accept the ISO datetime format required by IVOA standards.
+
+The IVOA DALI standard requires timestamps to be formatted as in ISO 8601 but
+without full timezone information. Either the timezone should be omitted or
+the time portion should end in ``Z``, and in either case the timestamp should
+be interpreted as UTC.
+
+This type accepts the input formats that DALI accepts, provides a
+timezone-aware `~datetime.datetime` in UTC, and always serializes to the ISO
+8601 format ending in ``Z``.
+"""
51 changes: 37 additions & 14 deletions safir/src/safir/pydantic/_validators.py
@@ -8,6 +8,8 @@

 from pydantic import BaseModel

+from safir.datetime import parse_isodatetime
+
 T = TypeVar("T")

 __all__ = [
@@ -36,6 +38,16 @@ def normalize_datetime(v: Any) -> datetime | None:
         The timezone-aware `~datetime.datetime` or `None` if the input was
         `None`.

+    Raises
+    ------
+    ValueError
+        Raised if the input could not be parsed as a `~datetime.datetime`.
+
+    Notes
+    -----
+    Prefer to use the `~safir.pydantic.UtcDatetime` type instead of using this
+    function as a validator.
+
     Examples
     --------
     Here is a partial model that uses this function as a field validator.
@@ -66,27 +78,40 @@ class Info(BaseModel):
         return v.replace(tzinfo=UTC)


-def normalize_isodatetime(v: str | None) -> datetime | None:
+def normalize_isodatetime(v: Any) -> datetime | None:
     """Pydantic field validator for datetime fields in ISO format.

-    This field validator requires the ISO 8601 date and time format with ``Z``
-    as the time zone (``YYYY-MM-DDTHH:MM:SSZ``). This format is compatible
-    with Kubernetes and the ISO UWS standard and is the same format produced
-    by `safir.datetime.isodatetime`. It should be used when the ambiguous
-    formats supported by Pydantic by default (such as dates and times without
-    time zone information) shouldn't be allowed.
+    This field validator requires a subset of the ISO 8601 date and time
+    format, ``YYYY-MM-DD[THH:MM:SS[.mmm]][Z]``. Regardless of whether the
+    trailing ``Z`` is included, the date and time are interpreted as being in
+    UTC, not local time. This format is compatible with Kubernetes, the IVOA
+    DALI standard, and the format produced by `safir.datetime.isodatetime`.
+
+    It should be used when the other formats supported by Pydantic by default
+    (such as dates and times in other timezones) shouldn't be allowed, such as
+    when strict conformance with the IVOA standard is desired.

     Parameters
     ----------
     v
-        The field representing a `~datetime.datetime`.
+        Field representing a `~datetime.datetime`.

     Returns
     -------
     datetime.datetime or None
         The timezone-aware `~datetime.datetime` or `None` if the input was
         `None`.

+    Raises
+    ------
+    ValueError
+        Raised if the provided time string is not in the correct format.
+
+    Notes
+    -----
+    Prefer to use the `~safir.pydantic.IvoaIsoDatetime` type instead of using
+    this function as a validator.
+
     Examples
     --------
     Here is a partial model that uses this function as a field validator.
@@ -107,12 +132,10 @@ class Info(BaseModel):
     """
     if v is None:
         return None
-    if not isinstance(v, str) or not v.endswith("Z"):
-        raise ValueError("Must be a string in YYYY-MM-DDTHH:MM[:SS]Z format")
-    try:
-        return datetime.fromisoformat(v[:-1] + "+00:00")
-    except Exception as e:
-        raise ValueError(f"Invalid date {v}: {e!s}") from e
+    if not isinstance(v, str):
+        msg = "Must be a string in YYYY-MM-DD[THH:MM:SS[.mmm]][Z] format"
+        raise ValueError(msg)
+    return parse_isodatetime(v)


 def validate_exactly_one_of(
6 changes: 5 additions & 1 deletion safir/tests/datetime_test.py
@@ -54,8 +54,12 @@ def test_parse_isodatetime() -> None:
     assert time == datetime(2022, 9, 16, 12, 3, 45, tzinfo=UTC)
     now = current_datetime()
     assert parse_isodatetime(isodatetime(now)) == now
+    time = parse_isodatetime("2022-09-16T12:03:45")
+    assert time == datetime(2022, 9, 16, 12, 3, 45, tzinfo=UTC)
+    time = parse_isodatetime("2022-09-16")
+    assert time == datetime(2022, 9, 16, 0, 0, 0, tzinfo=UTC)

-    with pytest.raises(ValueError, match=r".* does not end with Z"):
+    with pytest.raises(ValueError, match="does not match IVOA format"):
         parse_isodatetime("2022-09-16T12:03:45+00:00")

