Merge #52: Revamp BufEncoder
e1d380f Revamp `BufEncoder` (Martin Habovstiak)

Pull request description:

  Due to MSRV limitations we previously had to supply the buffer externally. Now that const generics are available, this can be avoided. This removes the possibility of allocating the buffer separately from the encoder, but that is practically never needed: all uses construct both on the stack, use them to encode something, and then drop them.
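To illustrate the shape of this change, here is a minimal, std-only sketch of an encoder that owns its buffer through a const generic parameter. `SketchEncoder` and `encode_two` are hypothetical names used only for this illustration; the encoder in this PR wraps `arrayvec::ArrayString` instead of a zeroed byte array.

```rust
/// Hypothetical stand-in for a const-generic hex encoder (not the crate's type).
/// `CAP` is the capacity in hex digits, so it holds `CAP / 2` encoded bytes.
pub struct SketchEncoder<const CAP: usize> {
    buf: [u8; CAP],
    len: usize,
}

impl<const CAP: usize> SketchEncoder<CAP> {
    /// Creates an empty encoder; the buffer lives inside the struct.
    pub fn new() -> Self { SketchEncoder { buf: [0u8; CAP], len: 0 } }

    /// Appends `byte` as two lowercase hex digits; panics if the buffer is full.
    pub fn put_byte(&mut self, byte: u8) {
        const TABLE: &[u8; 16] = b"0123456789abcdef";
        assert!(self.space_remaining() > 0, "buffer full");
        self.buf[self.len] = TABLE[usize::from(byte >> 4)];
        self.buf[self.len + 1] = TABLE[usize::from(byte & 0x0f)];
        self.len += 2;
    }

    /// Number of bytes (not hex digits) that can still be written.
    pub fn space_remaining(&self) -> usize { (CAP - self.len) / 2 }

    /// Returns the hex digits written so far.
    pub fn as_str(&self) -> &str {
        core::str::from_utf8(&self.buf[..self.len]).expect("only ASCII is written")
    }
}

/// Encodes two bytes without the caller declaring a separate buffer.
pub fn encode_two() -> String {
    let mut enc = SketchEncoder::<4>::new();
    enc.put_byte(42);
    enc.put_byte(255);
    enc.as_str().to_owned()
}
```

With this shape the caller writes `SketchEncoder::<4>::new()` rather than declaring `[0u8; 4]` first and passing a reference to it; `encode_two()` returns `"2aff"`.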

  To further improve the code, this uses `ArrayString` from the `arrayvec` crate (privately). This simplifies the implementation and provides better performance, since that crate uses `unsafe` to handle uninitialized bytes and doesn't need a double UTF-8 check (the one we have should get optimized out).

  Unfortunately, it's still not possible to implement the `AsHex` trait for all arrays, but this at least simplifies the `DisplayArray` type. To avoid accidental panics, its `new` method was made private and it's now constructed only with the `as_hex` method. This also improves forward compatibility, since we could one day change `new` to accept an array instead (when const generics can accept expressions) without breaking (most of) the code.
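The diff to `src/buf_encoder.rs` also adds an associated constant, `_CHECK_EVEN_CAPACITY`, that indexes a one-element array with `CAP % 2`. The sketch below shows how that kind of check can reject odd capacities at compile time. `EvenCap`, `byte_capacity` and `byte_capacity_of_64` are hypothetical names, and the explicit reference to the constant inside `new` is an assumption of this sketch, made because associated constants in generic impls are evaluated only where they are used.

```rust
/// Hypothetical type that accepts only even capacities.
pub struct EvenCap<const CAP: usize>;

impl<const CAP: usize> EvenCap<CAP> {
    // `[(); 1]` has a single element, so index 1 (an odd `CAP`) is out of bounds.
    // Evaluating the constant for an odd `CAP` is therefore a compile-time error.
    const CHECK_EVEN_CAPACITY: () = [(); 1][CAP % 2];

    pub fn new() -> Self {
        // Assumption of this sketch: reference the constant so that it is
        // evaluated for every `CAP` this constructor is instantiated with.
        let () = Self::CHECK_EVEN_CAPACITY;
        EvenCap
    }

    /// Number of bytes that fit when each byte takes two hex digits.
    pub fn byte_capacity(&self) -> usize { CAP / 2 }
}

/// Compiles because 64 is even; `EvenCap::<63>::new()` would be rejected.
pub fn byte_capacity_of_64() -> usize { EvenCap::<64>::new().byte_capacity() }
```

The check costs nothing at run time, since the constant has unit type and is resolved during compilation.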

Top commit has no ACKs.

Tree-SHA512: 94ed62f2dc30f0219b95ee8c9eaf52a4823d2d43aa7b00760ce7d2309b14cd4e1088a7c36f73b30ffb3e8d4d536c5c6a86b20ad716c0987cb4ca2d3056e11d05
tcharding committed Dec 11, 2023
2 parents 79111e6 + e1d380f commit 406fede
Showing 5 changed files with 106 additions and 251 deletions.
1 change: 1 addition & 0 deletions Cargo.toml
@@ -25,6 +25,7 @@ std = ["alloc"]
alloc = []

[dependencies]
arrayvec = { version = "0.7", default-features = false }
core2 = { version = "0.3.2", default-features = false, optional = true }
serde = { version = "1.0", default-features = false, optional = true }

4 changes: 2 additions & 2 deletions examples/wrap_array_display_hex_trait.rs
@@ -65,7 +65,7 @@ impl FromHex for Wrap {
// fn hex_reserve_suggestion(self) -> usize { self.0.as_ref().hex_reserve_suggestion() }
// }
impl<'a> DisplayHex for &'a Wrap {
type Display = DisplayArray<core::slice::Iter<'a, u8>, [u8; 64]>;
fn as_hex(self) -> Self::Display { DisplayArray::new(self.0.iter()) }
type Display = DisplayArray<'a, 64>;
fn as_hex(self) -> Self::Display { self.0.as_hex() }
fn hex_reserve_suggestion(self) -> usize { 64 }
}
220 changes: 25 additions & 195 deletions src/buf_encoder.rs
@@ -10,188 +10,25 @@
use core::borrow::Borrow;

use super::Case;

#[rustfmt::skip] // Keep public re-exports separate.
#[doc(inline)]
pub use self::out_bytes::OutBytes;

/// Trait for types that can be soundly converted to `OutBytes`.
///
/// To protect the API from future breakage this sealed trait guards which types can be used with
/// the `Encoder`. Currently it is implemented for byte arrays of various interesting lengths.
///
/// ## Safety
///
/// This is not `unsafe` yet but the `as_out_bytes` should always return the same reference if the
/// same reference is supplied. IOW the returned memory address and length should be the same if
/// the input memory address and length are the same.
///
/// If the trait ever becomes `unsafe` this will be required for soundness.
pub trait AsOutBytes: out_bytes::Sealed {
/// Performs the conversion.
fn as_out_bytes(&self) -> &OutBytes;

/// Performs the conversion.
fn as_mut_out_bytes(&mut self) -> &mut OutBytes;
}

/// A buffer with compile-time-known length.
///
/// This is essentially `Default + AsOutBytes` but supports lengths 1.41 doesn't.
pub trait FixedLenBuf: Sized + AsOutBytes {
/// Creates an uninitialized buffer.
///
/// The current implementtions initialize the buffer with zeroes but it should be treated a
/// uninitialized anyway.
fn uninit() -> Self;
}

/// Implements `OutBytes`
///
/// This prevents the rest of the crate from accessing the field of `OutBytes`.
mod out_bytes {
use super::AsOutBytes;

/// A byte buffer that can only be written-into.
///
/// You shouldn't concern yourself with this, just call `BufEncoder::new` with your array.
///
/// This prepares the API for potential future support of `[MaybeUninit<u8>]`. We don't want to use
/// `unsafe` until it's proven to be needed but if it does we have an easy, compatible upgrade
/// option.
///
/// Warning: `repr(transparent)` is an internal implementation detail and **must not** be
/// relied on!
#[repr(transparent)]
pub struct OutBytes([u8]);

impl OutBytes {
/// Returns the first `len` bytes as initialized.
///
/// Not `unsafe` because we don't use `unsafe` (yet).
///
/// ## Panics
///
/// The method panics if `len` is out of bounds.
#[track_caller]
pub(crate) fn assume_init(&self, len: usize) -> &[u8] { &self.0[..len] }

/// Writes given bytes into the buffer.
///
/// ## Panics
///
/// The method panics if pos is out of bounds or `bytes` don't fit into the buffer.
#[track_caller]
pub(crate) fn write(&mut self, pos: usize, bytes: &[u8]) {
self.0[pos..(pos + bytes.len())].copy_from_slice(bytes);
}

/// Returns the length of the buffer.
pub(crate) fn len(&self) -> usize { self.0.len() }

fn from_bytes(slice: &[u8]) -> &Self {
// SAFETY: copied from std
// conversion of reference to pointer of the same referred type is always sound,
// including in unsized types.
// Thanks to repr(transparent) the types have the same layout making the other
// conversion sound.
// The pointer was just created from a reference that's still alive so dereferencing is
// sound.
unsafe { &*(slice as *const [u8] as *const Self) }
}

fn from_mut_bytes(slice: &mut [u8]) -> &mut Self {
// SAFETY: copied from std
// conversion of reference to pointer of the same referred type is always sound,
// including in unsized types.
// Thanks to repr(transparent) the types have the same layout making the other
// conversion sound.
// The pointer was just created from a reference that's still alive so dereferencing is
// sound.
unsafe { &mut *(slice as *mut [u8] as *mut Self) }
}
}
use arrayvec::ArrayString;

macro_rules! impl_encode {
($($len:expr),* $(,)?) => {
$(
impl super::FixedLenBuf for [u8; $len] {
fn uninit() -> Self {
[0u8; $len]
}
}

impl AsOutBytes for [u8; $len] {
fn as_out_bytes(&self) -> &OutBytes {
OutBytes::from_bytes(self)
}

fn as_mut_out_bytes(&mut self) -> &mut OutBytes {
OutBytes::from_mut_bytes(self)
}
}

impl Sealed for [u8; $len] {}

impl<'a> super::super::display::DisplayHex for &'a [u8; $len / 2] {
type Display = super::super::display::DisplayArray<core::slice::Iter<'a, u8>, [u8; $len]>;
fn as_hex(self) -> Self::Display {
super::super::display::DisplayArray::new(self.iter())
}

fn hex_reserve_suggestion(self) -> usize {
$len
}
}
)*
}
}

impl<T: AsOutBytes + ?Sized> AsOutBytes for &'_ mut T {
fn as_out_bytes(&self) -> &OutBytes { (**self).as_out_bytes() }

fn as_mut_out_bytes(&mut self) -> &mut OutBytes { (**self).as_mut_out_bytes() }
}

impl<T: AsOutBytes + ?Sized> Sealed for &'_ mut T {}

impl AsOutBytes for OutBytes {
fn as_out_bytes(&self) -> &OutBytes { self }

fn as_mut_out_bytes(&mut self) -> &mut OutBytes { self }
}

impl Sealed for OutBytes {}

// As a sanity check we only provide conversions for even, non-empty arrays.
// Weird lengths 66 and 130 are provided for serialized public keys.
impl_encode!(
2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 40, 64, 66, 128, 130, 256, 512,
1024, 2048, 4096, 8192
);

/// Prevents outside crates from implementing the trait
pub trait Sealed {}
}
use super::Case;

/// Hex-encodes bytes into the provided buffer.
///
/// This is an important building block for fast hex-encoding. Because string writing tools
/// provided by `core::fmt` involve dynamic dispatch and don't allow reserving capacity in strings
/// buffering the hex and then formatting it is significantly faster.
pub struct BufEncoder<T: AsOutBytes> {
buf: T,
pos: usize,
pub struct BufEncoder<const CAP: usize> {
buf: ArrayString<CAP>,
}

impl<T: AsOutBytes> BufEncoder<T> {
impl<const CAP: usize> BufEncoder<CAP> {
const _CHECK_EVEN_CAPACITY: () = [(); 1][CAP % 2];

/// Creates an empty `BufEncoder`.
///
/// This is usually used with uninitialized (zeroed) byte array allocated on stack.
/// This can only be constructed with an even-length, non-empty array.
#[inline]
pub fn new(buf: T) -> Self { BufEncoder { buf, pos: 0 } }
pub fn new() -> Self { BufEncoder { buf: ArrayString::new() } }

/// Encodes `byte` as hex in given `case` and appends it to the buffer.
///
@@ -201,8 +38,7 @@ impl<T: AsOutBytes> BufEncoder<T> {
#[inline]
#[track_caller]
pub fn put_byte(&mut self, byte: u8, case: Case) {
self.buf.as_mut_out_bytes().write(self.pos, &super::byte_to_hex(byte, case.table()));
self.pos += 2;
self.buf.push_str(&case.table().byte_to_hex(byte));
}

/// Encodes `bytes` as hex in given `case` and appends them to the buffer.
@@ -251,56 +87,54 @@ impl<T: AsOutBytes> BufEncoder<T> {

/// Returns true if no more bytes can be written into the buffer.
#[inline]
pub fn is_full(&self) -> bool { self.pos == self.buf.as_out_bytes().len() }
pub fn is_full(&self) -> bool { self.space_remaining() == 0 }

/// Returns the written bytes as a hex `str`.
#[inline]
pub fn as_str(&self) -> &str {
core::str::from_utf8(self.buf.as_out_bytes().assume_init(self.pos))
.expect("we only write ASCII")
}
pub fn as_str(&self) -> &str { &self.buf }

/// Resets the buffer to become empty.
#[inline]
pub fn clear(&mut self) { self.pos = 0; }
pub fn clear(&mut self) { self.buf.clear(); }

/// How many bytes can be written to this buffer.
///
/// Note that this returns the number of bytes before encoding, not number of hex digits.
#[inline]
pub fn space_remaining(&self) -> usize { (self.buf.as_out_bytes().len() - self.pos) / 2 }
pub fn space_remaining(&self) -> usize { self.buf.remaining_capacity() / 2 }

pub(crate) fn put_filler(&mut self, filler: char, max_count: usize) -> usize {
let mut buf = [0; 4];
let filler = filler.encode_utf8(&mut buf);
let max_capacity = self.space_remaining() / filler.len();
let max_capacity = self.buf.remaining_capacity() / filler.len();
let to_write = max_capacity.min(max_count);

for _ in 0..to_write {
self.buf.as_mut_out_bytes().write(self.pos, filler.as_bytes());
self.pos += filler.len();
self.buf.push_str(filler);
}

to_write
}
}

impl<const CAP: usize> Default for BufEncoder<CAP> {
fn default() -> Self { Self::new() }
}

#[cfg(test)]
mod tests {
use super::*;

#[test]
fn empty() {
let mut buf = [0u8; 2];
let encoder = BufEncoder::new(&mut buf);
let encoder = BufEncoder::<2>::new();
assert_eq!(encoder.as_str(), "");
assert!(!encoder.is_full());
}

#[test]
fn single_byte_exact_buf() {
let mut buf = [0u8; 2];
let mut encoder = BufEncoder::new(&mut buf);
let mut encoder = BufEncoder::<2>::new();
assert_eq!(encoder.space_remaining(), 1);
encoder.put_byte(42, Case::Lower);
assert_eq!(encoder.as_str(), "2a");
@@ -317,8 +151,7 @@ mod tests {

#[test]
fn single_byte_oversized_buf() {
let mut buf = [0u8; 4];
let mut encoder = BufEncoder::new(&mut buf);
let mut encoder = BufEncoder::<4>::new();
assert_eq!(encoder.space_remaining(), 2);
encoder.put_byte(42, Case::Lower);
assert_eq!(encoder.space_remaining(), 1);
@@ -334,8 +167,7 @@ mod tests {

#[test]
fn two_bytes() {
let mut buf = [0u8; 4];
let mut encoder = BufEncoder::new(&mut buf);
let mut encoder = BufEncoder::<4>::new();
encoder.put_byte(42, Case::Lower);
assert_eq!(encoder.space_remaining(), 1);
encoder.put_byte(255, Case::Lower);
@@ -352,8 +184,7 @@ mod tests {

#[test]
fn put_bytes_min() {
let mut buf = [0u8; 2];
let mut encoder = BufEncoder::new(&mut buf);
let mut encoder = BufEncoder::<2>::new();
let remainder = encoder.put_bytes_min(b"", Case::Lower);
assert_eq!(remainder, b"");
assert_eq!(encoder.as_str(), "");
@@ -393,8 +224,7 @@ mod tests {
}

let mut writer = Writer { buf: [0u8; 2], pos: 0 };
let mut buf = [0u8; 2];
let mut encoder = BufEncoder::new(&mut buf);
let mut encoder = BufEncoder::<2>::new();

for i in 0..=255 {
write!(writer, "{:02x}", i).unwrap();