-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds APIs for accessing encoding of raw stream items #760
Conversation
* Uses the bump allocator to handle text escape processing, allowing `RawSymbolTokenRef` to hold a reference to a `&'bump str` instead of potentially owning a `String`. This change allows the `RawSymbolTokenRef` type to implement `Copy`, which in turn allows all of the `LazyExpandedValue`- and `LazyValue`-related types to also implement `Copy`. * Removes the `RawSymbolToken` type, which is now redundant to the `RawSymbolTokenRef` type. * Adds a `Span` type that provides access to the input bytes that comprised various raw stream items. * Adds a `LazyRawVersionMarker` trait and per-encoding impls that can provide a `Span` upon request. * Adds a `LazyRawField` trait and per-encoding impls that can provide a `Span` upon request. * Adds an `UnexpandedField` type that can represent both raw struct fields and struct fields from a template body. This simplified the code for expanding structs. * Adds methods to convert container types back to the general value type. * Adds `EncodedBinaryValueData_1_0` and `EncodedBinaryAnnotations_1_0` types that can be used to access spans and ranges for the various components of a binary 1.0 value. This patch exposes many functions and types which we likely wish to feature gate, but that change is being left for a future PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ PR tour
.write(15, RawSymbolToken::SymbolId(19))? // format | ||
.write(13, RawSymbolTokenRef::SymbolId(17))? // logger name | ||
.write(14, RawSymbolTokenRef::SymbolId(18))? // log level | ||
.write(15, RawSymbolTokenRef::SymbolId(19))? // format |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ Now that RawSymbolTokenRef
can hold a reference to a decoded string in the bump allocator, there's no longer a need for a separate RawSymbolToken
type that can own a string.
Down the line, I'd like to rename RawSymbolTokenRef
to simply RawSymbol
.
pub enum LazyRawAnyVersionMarkerKind<'top> { | ||
Text_1_0(LazyRawTextVersionMarker_1_0<'top>), | ||
Binary_1_0(LazyRawBinaryVersionMarker_1_0<'top>), | ||
Text_1_1(LazyRawTextVersionMarker_1_1<'top>), | ||
Binary_1_1(LazyRawBinaryVersionMarker_1_1<'top>), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ There's now a per-encoding version marker type that can provide a Span
of its input bytes. Previously we only surfaced the (major: u8, minor: u8)
pair, but didn't have information about its encoding (particularly its location).
match self.encoding { | ||
Text_1_0(marker) => marker.span(), | ||
Binary_1_0(marker) => marker.span(), | ||
Text_1_1(marker) => marker.span(), | ||
Binary_1_1(marker) => marker.span(), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ This file (LazyRawAny_____
types) is entirely enum dispatch. A while ago I looked to see if I could use the enum_dispatch
crate to eliminate all the repetition, but the use of lifetimes was an obstacle. I have not checked recently to see if that limitation has been overcome.
pub fn kind(&self) -> LazyRawValueKind<'top> { | ||
self.encoding | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ Allows an application to access the encoding-specific data type that underlies this value.
@@ -1211,7 +1412,7 @@ mod tests { | |||
|
|||
assert!(matches!( | |||
reader.next(&allocator)?, | |||
LazyRawStreamItem::<AnyEncoding>::EndOfStream | |||
LazyRawStreamItem::<AnyEncoding>::EndOfStream(_) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ The EndOfStream
item now returns an EndPosition
that can provide its range and span. This is not especially useful in and of itself, but it allows the RawStreamItem
type that contains it to also provide a range and span for every item.
fn token_is_identifier(token: &str) -> bool { | ||
if token.is_empty() { | ||
return false; | ||
} | ||
let mut chars = token.chars(); | ||
let first = chars.next().unwrap(); | ||
(first == '$' || first == '_' || first.is_ascii_alphabetic()) | ||
&& chars.all(|c| c == '$' || c == '_' || c.is_ascii_alphanumeric()) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ The methods in this section were originally copied from the IonValueFormatter
. This PR introduces the IoFmtShim
type that allows them to be re-used instead of copied.
let mut iter_adapter = raw_struct_iter.map( | ||
|field: IonResult<LazyRawFieldExpr<'top, D>>| match field? { | ||
RawFieldExpr::NameValuePair(name, RawValueExpr::MacroInvocation(m)) => { | ||
let resolved_invocation = m.resolve(context)?; | ||
Ok(RawFieldExpr::NameValuePair( | ||
name, | ||
RawValueExpr::MacroInvocation(resolved_invocation.into()), | ||
)) | ||
} | ||
RawFieldExpr::NameValuePair(name, RawValueExpr::ValueLiteral(value)) => Ok( | ||
RawFieldExpr::NameValuePair(name, RawValueExpr::ValueLiteral(value)), | ||
), | ||
RawFieldExpr::MacroInvocation(invocation) => { | ||
let resolved_invocation = invocation.resolve(context)?; | ||
Ok(RawFieldExpr::MacroInvocation(resolved_invocation.into())) | ||
} | ||
}, | ||
); | ||
Self::next_field_from(context, state, e_exp_evaluator, &mut iter_adapter) | ||
Self::next_field_from(context, state, e_exp_evaluator, raw_struct_iter) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ The introduction of the UnexpandedField
enum made it possible to eliminate a lot of the complexity and generics involved in struct expansion.
/// This helper function takes a parser and returns a closure that performs the same parsing | ||
/// but prints the Result before returning the output. This is handy for debugging. | ||
// A better implementation would use a macro to auto-generate the label from the file name and | ||
// line number. | ||
fn dbg_parse<I: Debug, O: Debug, E: Debug, P: Parser<I, O, E>>( | ||
label: &'static str, | ||
mut parser: P, | ||
) -> impl Parser<I, O, E> { | ||
move |input: I| { | ||
let result = parser.parse(input); | ||
#[cfg(debug_assertions)] | ||
println!("{}: {:?}", label, result); | ||
result | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ This method would cause a clippy warning in release builds because label
wouldn't be used. The text reader is spec compliant, so I feel we don't really need this debug helper anymore.
// TODO: We're discarding the name encoding information here. When we revise our field name | ||
// storage strategy[1], we should make sure to capture this for tooling's sake. | ||
// [1]: https://github.com/amazon-ion/ion-rust/issues/631 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ 🎉
pub fn read<'data>(&self, matched_input: TextBufferView<'data>) -> IonResult<StrRef<'data>> { | ||
pub fn read<'data>( | ||
&self, | ||
allocator: &'data BumpAllocator, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ Text escapes and base64 decoding now happen in bump-allocated scratch space instead of heap-allocated String
s and Vec
s.
/// Returns the stream byte offset range that this buffer contains. | ||
pub fn range(&self) -> Range<usize> { | ||
self.offset..self.offset + self.len() | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should TextBufferView
just implement HasSpan
/HasRange
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could, sure. This method predates those traits.
TextBufferView
is not a public-facing type though, so there's not really a need for it IMO.
In aggregate, the changes in this PR make it straightforward to access a "lower level" view of a given item in an Ion stream, determining whether it was an input literal, part of a template, or a constructed value. This enables the development of a variety of tools.
Changes:
RawSymbolTokenRef
to hold a reference to a&'bump str
instead of potentially owning aString
. This change allows theRawSymbolTokenRef
type to implementCopy
, which in turn allows all of theLazyExpandedValue
- andLazyValue
-related types to also implementCopy
.RawSymbolToken
type, which is now redundant to theRawSymbolTokenRef
type.Span
type that provides access to the input bytes that comprised various raw stream items.LazyRawVersionMarker
trait and per-encoding impls that can provide aSpan
upon request.LazyDecoder::Value
types into implementations of a newLazyRawFieldName
trait. This fixes issue Explore moving struct field info out ofLazy*Value
and intoLazy*Field
#631.LazyRawField
trait and per-encoding impls that can provide aSpan
upon request.UnexpandedField
type that can represent both raw struct fields and struct fields from a template body. This simplified the code for expanding structs.EncodedBinaryValueData_1_0
andEncodedBinaryAnnotations_1_0
types that can be used to access spans and ranges for the various components of a binary 1.0 value.This patch exposes many functions and types which we likely wish to feature gate, but that change is being left for a future PR.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.