Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: replace ndjson with jsonl specification #9

Merged
merged 5 commits into from
Mar 17, 2024

Conversation

byCedric
Copy link
Member

@byCedric byCedric commented Mar 17, 2024

Linked issue

There are two (slightly) different standards for Newline-Delimited JSON; ndjson and jsonl.

The main difference between these two are around the allowed content characters within the JSON files:

  • ndjson - 3.1. Serialization

    Each JSON text MUST conform to the [RFC8259] standard and MUST be written to the stream followed by the newline character \n (0x0A). The newline character MAY be preceded by a carriage return \r (0x0D). The JSON texts MUST NOT contain newlines or carriage returns.

    All serialized data MUST use the UTF8 encoding.

  • jsonl - 1. UTF-8 Encoding

    JSON allows encoding Unicode strings with only ASCII escape sequences, however those escapes will be hard to read when viewed in a text editor. The author of the JSON Lines file may choose to escape characters to work with plain ASCII files.

    Encodings other than UTF-8 are very unlikely to be valid when decoded as UTF-8 so the chance of accidentally misinterpreting characters in JSON Lines files is low.

Since we embed code in these stats files, which may or may not contain (un)escaped UTF-8/UTF-16 characters and likely contains JSON-escaped newlines (which makes it hard to parse JSON by partial line chunk), we don't fully match the ndjson spec.

On top of that, .jsonl seems to be a recognized format in vscode. So fully swapping over to jsonl and using .jsonl extension seems to be a low hanging fruit.

image

@byCedric byCedric force-pushed the refactor/rename-ndjson-to-jsonl branch from 91faefd to 9dc1185 Compare March 17, 2024 14:40
@byCedric byCedric changed the title refactor: rename ndjson to jsonl in utils refactor: replace ndjson with jsonl specification Mar 17, 2024
@byCedric byCedric merged commit b3bb892 into main Mar 17, 2024
2 checks passed
@byCedric byCedric deleted the refactor/rename-ndjson-to-jsonl branch March 17, 2024 14:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant