Change Lexing/Parsing of embedded docs to not eagerly validate #507

Merged
jpschorr merged 3 commits into dev-ion-doc from feat-embedded-doc on Oct 19, 2024

jpschorr (Contributor) commented Oct 11, 2024

This changes the lexer and parser to pass through strings enclosed in backticks unparsed. (Currently, these documents are parsed during lowering.)

Since embedded documents may themselves contain backticks, the beginning and ending delimiters consist of an arbitrary odd number of backticks (e.g., `, ```, `````, etc.) and must be paired (e.g., `$ion_data_here::[]`, ```$ion_data_here::[ $string_with_embedded_backtick:"`" ]```, etc.).

Because opening and closing delimiters must each contain an odd number of backticks, a contiguous run of an even number of backticks is interpreted as an empty document.
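
The delimiter rule above can be illustrated with a minimal sketch. This is not the lexer code from this PR: the names `EmbeddedDoc`, `backtick_run`, and `scan_embedded_doc` are hypothetical, and the sketch assumes that a closing delimiter is a run of exactly as many backticks as the opening one, with any run of a different length treated as document contents.

```rust
/// Result of scanning one backtick-delimited embedded document.
/// (Hypothetical type for illustration; not part of the PartiQL crates.)
#[derive(Debug, PartialEq)]
struct EmbeddedDoc<'a> {
    /// Raw, unvalidated text between the delimiters.
    contents: &'a str,
    /// Number of bytes consumed, including both delimiters.
    consumed: usize,
}

/// Counts the backticks at the very start of `s`.
fn backtick_run(s: &str) -> usize {
    s.bytes().take_while(|&b| b == b'`').count()
}

/// Scans an embedded document starting at the beginning of `input`.
/// Returns `None` if `input` does not start with a backtick or if no
/// matching closing delimiter is found.
fn scan_embedded_doc(input: &str) -> Option<EmbeddedDoc<'_>> {
    let open = backtick_run(input);
    if open == 0 {
        return None; // not an embedded document at all
    }
    if open % 2 == 0 {
        // An even run of backticks is an empty document.
        return Some(EmbeddedDoc { contents: "", consumed: open });
    }
    let body = &input[open..];
    let mut pos = 0;
    while let Some(off) = body[pos..].find('`') {
        let start = pos + off;
        let run = backtick_run(&body[start..]);
        if run == open {
            // Assumption: only a run of the same length closes the document.
            return Some(EmbeddedDoc {
                contents: &body[..start],
                consumed: open + start + run,
            });
        }
        // A run of a different length is part of the contents; skip it.
        pos = start + run;
    }
    None // unterminated: no matching closing delimiter
}
```

Note that the contents are returned as a raw slice and are never inspected, which mirrors the pass-through behavior described above: whether the text is valid Ion is not decided at this stage.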


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

codecov bot commented Oct 11, 2024

Codecov Report

Attention: Patch coverage is 93.91304% with 7 lines in your changes missing coverage. Please review.

Project coverage is 80.86%. Comparing base (3f9d17f) to head (52b360c).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| partiql-parser/src/lexer/mod.rs | 89.47% | 3 Missing and 1 partial ⚠️ |
| partiql-parser/src/lexer/partiql.rs | 90.00% | 2 Missing ⚠️ |
| ...sion/partiql-extension-visualize/src/ast_to_dot.rs | 0.00% | 1 Missing ⚠️ |

Additional details and impacted files
@@               Coverage Diff               @@
##           dev-ion-doc     #507      +/-   ##
===============================================
+ Coverage        80.84%   80.86%   +0.01%     
===============================================
  Files               80       80              
  Lines            19398    19401       +3     
  Branches         19398    19401       +3     
===============================================
+ Hits             15683    15689       +6     
- Misses            3291     3293       +2     
+ Partials           424      419       -5     


github-actions bot commented Oct 11, 2024

Conformance comparison report

| | Base (3f9d17f) | b1c975d | +/- |
|---|---|---|---|
| % Passing | 90.37% | 90.37% | 0.00% |
| ✅ Passing | 5732 | 5732 | 0 |
| ❌ Failing | 611 | 611 | 0 |
| 🔶 Ignored | 0 | 0 | 0 |
| Total Tests | 6343 | 6343 | 0 |

Number passing in both: 5732

Number failing in both: 611

Number passing in Base (3f9d17f) but now fail: 0

Number failing in Base (3f9d17f) but now pass: 0

@jpschorr jpschorr marked this pull request as ready for review October 17, 2024 18:06
@jpschorr jpschorr requested review from alancai98 and am357 October 17, 2024 18:06
@@ -394,7 +394,7 @@ impl PrettyDoc for Lit {
 Lit::FloatLit(inner) => arena.text(inner.to_string()),
 Lit::DoubleLit(inner) => arena.text(inner.to_string()),
 Lit::BoolLit(inner) => arena.text(inner.to_string()),
-Lit::IonStringLit(inner) => inner.pretty_doc(arena),
+Lit::EmbeddedDocLit(inner) => inner.pretty_doc(arena), // TODO better pretty for embedded doc
Contributor Author

Work is needed to do type-specific pretty printing for embedded documents.

Contributor Author

Added #508

@@ -1978,6 +1978,7 @@ fn lit_to_value(lit: &Lit) -> Result<Value, AstTransformError> {
Ok(val)
}

// TODO
Contributor Author

This will be moved/removed in a future PR as part of this dev-ion-doc feature branch.

partiql-ast/src/pretty.rs (outdated, resolved)
@@ -46,7 +46,7 @@ time = { version = "0.3", features = ["macros"] }
criterion = "0.5"
rand = "0.8"

-assert_matches = "1.5"
+assert_matches = "1"
Contributor

Just to double-check: Do we want to move from 1.5 to 1?

Contributor Author

It's what all other crates use:

13:50❯ rg assert_matches -g "*.toml"
partiql-eval/Cargo.toml
35:assert_matches = "1"

partiql/Cargo.toml
49:assert_matches = "1"

partiql-logical-planner/Cargo.toml
42:assert_matches = "1"

partiql-parser/Cargo.toml
50:assert_matches = "1"

partiql-parser/src/lexer/mod.rs (outdated, resolved)
LineAndColumn::from(offset_tracker.at(query, BytePosition::from(1)).unwrap()),
LineAndColumn::new(1, 2).unwrap()
);
// ion is not eagerly parsed, so unterminated ion does not cause a lex/parse error
Member

When will an error be thrown for the unterminated Ion string? Presumably when the contents of the document are later used?
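
The PR description notes that embedded documents are currently parsed during lowering, so that is presumably where a malformed document would be reported. The deferred-validation idea can be sketched as follows. Everything here is hypothetical: `RawDoc` and `DocError` are illustrative names, and the bracket-balance check is only a stand-in for real Ion parsing, which this sketch does not perform.

```rust
/// An embedded document carried through lexing and parsing as raw text.
/// (Hypothetical type for illustration; not part of the PartiQL crates.)
#[derive(Debug, Clone, PartialEq)]
struct RawDoc {
    raw: String,
}

/// Errors reported only when the document is actually validated.
#[derive(Debug, PartialEq)]
enum DocError {
    /// An opening bracket was never closed.
    Unterminated(char),
    /// A closing bracket did not match the most recent opening bracket.
    UnexpectedClose(char),
}

impl RawDoc {
    /// Construction stands in for lexing/parsing: it never inspects the
    /// contents, so it cannot fail on malformed text.
    fn new(raw: &str) -> Self {
        RawDoc { raw: raw.to_string() }
    }

    /// Stand-in for a later stage (e.g., lowering) that validates the text.
    /// Assumption: bracket balance is checked instead of parsing Ion.
    fn validate(&self) -> Result<(), DocError> {
        let mut stack = Vec::new();
        for c in self.raw.chars() {
            match c {
                '[' | '{' | '(' => stack.push(c),
                ']' | '}' | ')' => {
                    let expected = match c {
                        ']' => '[',
                        '}' => '{',
                        _ => '(',
                    };
                    if stack.pop() != Some(expected) {
                        return Err(DocError::UnexpectedClose(c));
                    }
                }
                _ => {}
            }
        }
        match stack.pop() {
            Some(open) => Err(DocError::Unterminated(open)),
            None => Ok(()),
        }
    }
}
```

With this shape, `RawDoc::new("[1, 2")` succeeds just as lexing does in the test above, and the problem only surfaces when `validate` is called.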

partiql-parser/src/lexer/mod.rs (resolved)
@jpschorr jpschorr merged commit 3c82a85 into dev-ion-doc Oct 19, 2024
17 checks passed
@jpschorr jpschorr deleted the feat-embedded-doc branch October 19, 2024 19:20
jpschorr added a commit that referenced this pull request Oct 22, 2024
jpschorr added a commit that referenced this pull request Dec 5, 2024
jpschorr added a commit that referenced this pull request Dec 5, 2024
* Change Lexing/Parsing of embedded docs to not eagerly validate (#507)

* Behavior-preserving refactor of `Value` into a module. (#509)

* Behavior-preserving refactor of Value into a module. (#510)

* Change modeling of Literals in the AST to remove ambiguity (#517)

Change parsing and AST-modeling of literals to not share AST structures with non-scalar expressions.

* Change modeling of boxed ion literals to be lazy until evaluator. (#519)

Changes the logical plan to have a distinct `Lit` type to hold literals instead of embedded `Value`

* Refactor lifetimes for new rust warnings