AVRO-3884: Add local-timestamp-nanos and timestamp-nanos #2554

Merged
merged 6 commits into from
Dec 7, 2023
10 changes: 10 additions & 0 deletions doc/content/en/docs/++version++/Specification/_index.md
@@ -862,6 +862,11 @@ The `timestamp-micros` logical type represents an instant on the global timeline

A `timestamp-micros` logical type annotates an Avro `long`, where the long stores the number of microseconds from the unix epoch, 1 January 1970 00:00:00.000000 UTC.

### Timestamp (nanosecond precision)
The `timestamp-nanos` logical type represents an instant on the global timeline, independent of a particular time zone or calendar, with a precision of one nanosecond. Please note that time zone information gets lost in this process. Upon reading a value back, we can only reconstruct the instant, but not the original representation. In practice, such timestamps are typically displayed to users in their local time zones, therefore they may be displayed differently depending on the execution environment.

Contributor

I don't think it's accurate to say that "time zone information gets lost in this process", because the type is independent of a zone. I also would not refer to "the instant". Assuming that this logical type corresponds to TIMESTAMP(9) WITHOUT TIME ZONE, I would say that any such statement should be that the displayed value must never be adjusted to the system time zone, because the value has no time zone.

Contributor Author

I agree, and I think it can be confusing to refer to a time zone at all. I copied this from both millis and micros; do we want to deviate from that?

Contributor

Actually, I do not agree with the statement that "time zone information gets lost in this process": the paragraph below explicitly states the time zone in use.

This is different from the logical type local-timestamp-nanos below, which does not have time zone information.

Contributor

The timestamp logical types should have examples to clarify the semantics. No need to repeat those examples for each of -millis, -micros, and -nanos though.

Given an event at noon local time on January 1, 2000, in Helsinki where the local time was two hours east of UTC:

  • For timestamp-millis, the timestamp is converted to UTC 2000-01-01T10:00:00 and that is then converted to Avro long (fill in the number).
  • For local-timestamp-millis, the timestamp is kept in local time 2000-01-01T12:00:00 and that is then converted to Avro long (fill in the number).

In either case, the schema author may add a separate field for the time zone offset (+02:00) or a time zone identifier (Europe/Helsinki), or the recipient of the data may know these via some out-of-band agreement.
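
The two "(fill in the number)" values in the Helsinki example can be worked out with a short Python sketch. It is only an illustration: the helper names are hypothetical, and the sketch assumes a fixed +02:00 offset for the event rather than a time zone database lookup.

```python
from datetime import datetime, timedelta, timezone

# Epoch for timestamp-millis: an instant on the global (UTC) timeline.
EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)
# Epoch for local-timestamp-millis: a wall-clock value with no zone attached.
EPOCH_LOCAL = datetime(1970, 1, 1)

ONE_MILLI = timedelta(milliseconds=1)


def timestamp_millis(aware_dt):
    """Hypothetical helper: milliseconds from the UTC epoch to an aware datetime."""
    return (aware_dt - EPOCH_UTC) // ONE_MILLI


def local_timestamp_millis(naive_dt):
    """Hypothetical helper: milliseconds from 1970-01-01T00:00:00 in wall-clock terms."""
    return (naive_dt - EPOCH_LOCAL) // ONE_MILLI


# Assumption: Helsinki was at UTC+02:00 on 1 January 2000, as stated in the comment.
helsinki = timezone(timedelta(hours=2))
event = datetime(2000, 1, 1, 12, 0, 0, tzinfo=helsinki)

# timestamp-millis: the event is first normalised to 2000-01-01T10:00:00 UTC.
utc_millis = timestamp_millis(event)

# local-timestamp-millis: the wall-clock reading 12:00:00 is kept, the offset is dropped.
local_millis = local_timestamp_millis(event.replace(tzinfo=None))

print(utc_millis, local_millis)
```

The two values differ by exactly the two-hour offset (7,200,000 ms), which is the information a reader cannot recover from the `long` alone.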

Contributor Author

Thanks everyone for the great input here. I think we all agree that this needs some reworking.

> Actually, I do not agree with the statement that "time zone information gets lost in this process": the paragraph below explicitly states the time zone in use.

The time zone is always UTC, but the writer's own local time zone is lost. I would suggest removing this sentence since it is confusing. Any objections?

> The timestamp logical types should have examples to clarify the semantics. No need to repeat those examples for each of -millis, -micros, and -nanos though.

I agree there, and I also like the examples. I've restructured the documentation to remove the duplicate sections.


A `timestamp-nanos` logical type annotates an Avro `long`, where the long stores the number of nanoseconds from the unix epoch, 1 January 1970 00:00:00.000000000 UTC.
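
As a rough illustration of this encoding, the Python sketch below converts an instant given as whole epoch seconds plus a nanosecond fraction into the `long` value and back. The helper names are hypothetical and not part of any Avro library, and the schema dictionary assumes the usual form of a logical type annotating a `long`. Plain integers are used because Python's `datetime` only resolves microseconds.

```python
from datetime import datetime, timedelta, timezone

# Assumed schema form: a logical type annotating an Avro long.
TIMESTAMP_NANOS_SCHEMA = {"type": "long", "logicalType": "timestamp-nanos"}

NANOS_PER_SECOND = 1_000_000_000
LONG_MIN = -(2**63)
LONG_MAX = 2**63 - 1


def to_timestamp_nanos(epoch_seconds, nano_of_second):
    """Hypothetical helper: nanoseconds since 1970-01-01T00:00:00 UTC as a signed 64-bit value."""
    value = epoch_seconds * NANOS_PER_SECOND + nano_of_second
    if not LONG_MIN <= value <= LONG_MAX:
        raise OverflowError("instant does not fit in an Avro long")
    return value


def from_timestamp_nanos(value):
    """Hypothetical helper: split the long back into (epoch_seconds, nano_of_second)."""
    # divmod floors, so nano_of_second stays in [0, 1e9) even before the epoch.
    return divmod(value, NANOS_PER_SECOND)


# 2000-01-01T10:00:00.123456789 UTC
encoded = to_timestamp_nanos(946_720_800, 123_456_789)
decoded = from_timestamp_nanos(encoded)

# Latest representable instant, truncated to microseconds so datetime can hold it.
latest = datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(microseconds=LONG_MAX // 1000)

print(encoded, decoded, latest.isoformat())
```

A signed 64-bit nanosecond count spans roughly 292 years on either side of the epoch, so the representable range is much narrower than for `timestamp-millis` or `timestamp-micros`.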

### Local timestamp (millisecond precision) {#local_timestamp_ms}
The `local-timestamp-millis` logical type represents a timestamp in a local timezone, regardless of what specific time zone is considered local, with a precision of one millisecond.

@@ -872,6 +877,11 @@ The `local-timestamp-micros` logical type represents a timestamp in a local time

A `local-timestamp-micros` logical type annotates an Avro `long`, where the long stores the number of microseconds, from 1 January 1970 00:00:00.000000.

### Local timestamp (nanosecond precision)
The `local-timestamp-nanos` logical type represents a timestamp in a local timezone, regardless of what specific time zone is considered local, with a precision of one nanosecond.

Contributor

Is this related to the Iceberg work? I don't think that we would want to use this type for timestamptz_ns because we don't consider that a "local" timestamp.

Contributor Author

No, this is unrelated to Iceberg. But because local-timestamp-{millis,micros} already exist, I think people will also expect the local equivalent for nanos.


A `local-timestamp-nanos` logical type annotates an Avro `long`, where the long stores the number of nanoseconds, from 1 January 1970 00:00:00.000000.
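
A minimal Python sketch of the local variant follows, assuming the value is simply the wall-clock reading counted in nanoseconds from 1970-01-01T00:00:00 with no zone conversion on write or read. The helper names and the `extra_nanos` parameter (the sub-microsecond part that `datetime` cannot hold) are hypothetical.

```python
from datetime import datetime, timedelta

# Wall-clock epoch: deliberately naive, since the type carries no time zone.
LOCAL_EPOCH = datetime(1970, 1, 1)
ONE_MICRO = timedelta(microseconds=1)


def to_local_timestamp_nanos(wall_clock, extra_nanos=0):
    """Hypothetical helper: encode a naive datetime plus 0..999 extra nanoseconds."""
    if wall_clock.tzinfo is not None:
        raise ValueError("local timestamps must not carry a time zone")
    if not 0 <= extra_nanos < 1000:
        raise ValueError("extra_nanos must be in the range 0..999")
    micros = (wall_clock - LOCAL_EPOCH) // ONE_MICRO
    return micros * 1000 + extra_nanos


def from_local_timestamp_nanos(value):
    """Hypothetical helper: decode to (naive datetime, extra nanoseconds)."""
    micros, extra_nanos = divmod(value, 1000)
    return LOCAL_EPOCH + timedelta(microseconds=micros), extra_nanos


noon = datetime(2000, 1, 1, 12, 0, 0)
encoded = to_local_timestamp_nanos(noon, extra_nanos=7)
decoded, leftover = from_local_timestamp_nanos(encoded)

# The reader gets back the same wall-clock fields regardless of its own time zone.
print(encoded, decoded.isoformat(), leftover)
```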
Fokko marked this conversation as resolved.

### Duration
The `duration` logical type represents an amount of time defined by a number of months, days and milliseconds. This is not equivalent to a number of milliseconds, because, depending on the moment in time from which the duration is measured, the number of days in the month and number of milliseconds in a day may differ. Other standard periods such as years, quarters, hours and minutes can be expressed through these basic periods.
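
To make the point about months concrete, the Python sketch below adds a one-month duration to two different start dates and counts the resulting days. The `add_months` helper is hypothetical, and clamping the day to the length of the target month is an assumption of this sketch, not something the text above specifies.

```python
import calendar
from datetime import date


def add_months(start, months):
    """Hypothetical helper: add calendar months, clamping the day to the month's length."""
    index = start.year * 12 + (start.month - 1) + months
    year, month_index = divmod(index, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


january = date(2023, 1, 1)
february = date(2023, 2, 1)

# The same one-month duration covers a different number of days per start date.
days_from_january = (add_months(january, 1) - january).days
days_from_february = (add_months(february, 1) - february).days

print(days_from_january, days_from_february)
```

Because the two results differ, a one-month duration cannot be replaced by a fixed number of milliseconds.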
