This repository has been archived by the owner on May 16, 2022. It is now read-only.

Use JsonUtf8Encoding : Encoding #17

Open
neuecc opened this issue Oct 9, 2017 · 3 comments

@neuecc
Owner

neuecc commented Oct 9, 2017

Escaping string characters hurts the performance of JSON serialization.
The escaping cost can be reduced by creating a custom UTF-8 Encoding that performs JSON escaping/unescaping as part of encoding/decoding.
To take advantage of the internal FastAllocateString (which Encoding.GetString uses), it is necessary to inherit from Encoding.

public class JsonUtf8Encoding : Encoding
{
    #region decode(for reader)

    // (Encoding.GetString) -> GetCharCount -> (FastAllocateString) -> GetChars

    public override int GetCharCount(byte[] bytes, int index, int count)
    {
        // Returns the char count of the (.+) group in \"(.+)\", after unescaping.
        if (bytes[index] != '\"') throw new InvalidOperationException();

        throw new NotImplementedException();
    }

    public override int GetChars(byte[] bytes, int byteIndex, int byteCount, char[] chars, int charIndex)
    {
        throw new NotImplementedException();
    }

    #endregion

    #region encode(for writer)

    // should use GetByteCount? too large?

    public override int GetMaxByteCount(int charCount)
    {
        return Encoding.UTF8.GetMaxByteCount(charCount) * 2; // worst case, escaped.
    }

    public override unsafe int GetBytes(string s, int charIndex, int charCount, byte[] bytes, int byteIndex)
    {
        int byteCount = bytes.Length - byteIndex;

        fixed (char* pChars = s)
        fixed (byte* pBytes = bytes)
        {
            return GetBytes(pChars + charIndex, charCount, pBytes + byteIndex, byteCount);
        }
    }

    public override unsafe int GetBytes(char* chars, int charCount, byte* bytes, int byteCount)
    {
        throw new NotImplementedException();
    }

    #endregion

    public override int GetBytes(char[] chars, int charIndex, int charCount, byte[] bytes, int byteIndex)
    {
        throw new NotSupportedException();
    }

    public override int GetByteCount(char[] chars, int index, int count)
    {
        throw new NotSupportedException();
    }

    public override int GetMaxCharCount(int byteCount)
    {
        throw new NotSupportedException();
    }
}
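
As a rough illustration of what the unimplemented GetBytes could do, here is a minimal sketch that escapes and UTF-8 encodes in a single pass. It is written in safe code for readability, not with pointers as in the class above. The names JsonUtf8EscapeSketch and WriteQuoted are hypothetical and not part of this proposal. It assumes the destination buffer is large enough (the GetMaxByteCount above covers the worst case of 6 bytes per char plus the two quotes), and it replaces lone surrogates with U+FFFD, which is only one possible policy.

```csharp
using System;
using System.Text;

public static class JsonUtf8EscapeSketch
{
    private const string Hex = "0123456789abcdef";

    // Hypothetical helper: writes s as a quoted, escaped JSON string in UTF-8
    // into dest starting at offset, and returns the number of bytes written.
    // No bounds checks: the caller must supply a large enough buffer.
    public static int WriteQuoted(string s, byte[] dest, int offset)
    {
        int start = offset;
        dest[offset++] = (byte)'"';
        for (int i = 0; i < s.Length; i++)
        {
            char c = s[i];
            switch (c)
            {
                case '"':
                case '\\': dest[offset++] = (byte)'\\'; dest[offset++] = (byte)c; continue;
                case '\n': dest[offset++] = (byte)'\\'; dest[offset++] = (byte)'n'; continue;
                case '\r': dest[offset++] = (byte)'\\'; dest[offset++] = (byte)'r'; continue;
                case '\t': dest[offset++] = (byte)'\\'; dest[offset++] = (byte)'t'; continue;
                case '\b': dest[offset++] = (byte)'\\'; dest[offset++] = (byte)'b'; continue;
                case '\f': dest[offset++] = (byte)'\\'; dest[offset++] = (byte)'f'; continue;
            }

            if (c < 0x20)
            {
                // Remaining control characters become \u00XX (6 bytes).
                dest[offset++] = (byte)'\\';
                dest[offset++] = (byte)'u';
                dest[offset++] = (byte)'0';
                dest[offset++] = (byte)'0';
                dest[offset++] = (byte)Hex[c >> 4];
                dest[offset++] = (byte)Hex[c & 0xF];
            }
            else if (c < 0x80)
            {
                dest[offset++] = (byte)c; // ASCII: 1 byte
            }
            else if (c < 0x800)
            {
                dest[offset++] = (byte)(0xC0 | (c >> 6));
                dest[offset++] = (byte)(0x80 | (c & 0x3F));
            }
            else if (char.IsHighSurrogate(c) && i + 1 < s.Length && char.IsLowSurrogate(s[i + 1]))
            {
                // Surrogate pair: one code point, 4 bytes.
                int cp = char.ConvertToUtf32(c, s[i + 1]);
                i++;
                dest[offset++] = (byte)(0xF0 | (cp >> 18));
                dest[offset++] = (byte)(0x80 | ((cp >> 12) & 0x3F));
                dest[offset++] = (byte)(0x80 | ((cp >> 6) & 0x3F));
                dest[offset++] = (byte)(0x80 | (cp & 0x3F));
            }
            else if (char.IsSurrogate(c))
            {
                // Lone surrogate: write U+FFFD (assumed policy).
                dest[offset++] = 0xEF;
                dest[offset++] = 0xBF;
                dest[offset++] = 0xBD;
            }
            else
            {
                dest[offset++] = (byte)(0xE0 | (c >> 12));
                dest[offset++] = (byte)(0x80 | ((c >> 6) & 0x3F));
                dest[offset++] = (byte)(0x80 | (c & 0x3F));
            }
        }
        dest[offset++] = (byte)'"';
        return offset - start;
    }
}
```

A caller could allocate `new byte[Encoding.UTF8.GetMaxByteCount(s.Length) * 2]` and pass offset 0. A real implementation would likely scan for runs that need no escaping and copy them in bulk.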

Also, it is necessary to implement efficient UTF-8 encoding/decoding.
I found this article:
http://bjoern.hoehrmann.de/utf-8/decoder/dfa/
If there are any other good examples, please let me know.
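
For the reader side, the GetCharCount step described in the class comments (count the chars between the quotes after unescaping, so the string can be allocated once) might look roughly like this. This is a minimal sketch under loud assumptions: the names JsonUtf8CountSketch and CountUnescapedChars are hypothetical, the input is assumed to be well-formed JSON in valid UTF-8, and nothing is validated beyond the opening and closing quotes. It is not the DFA decoder from the article above.

```csharp
using System;

public static class JsonUtf8CountSketch
{
    // Hypothetical helper: returns the number of UTF-16 chars that the JSON
    // string starting at bytes[index] (which must be '"') decodes to.
    // Assumes well-formed escapes and valid UTF-8; no validation is done.
    public static int CountUnescapedChars(byte[] bytes, int index, int count)
    {
        if (count < 2 || bytes[index] != (byte)'"') throw new InvalidOperationException();

        int end = index + count;
        int i = index + 1;
        int chars = 0;
        while (i < end)
        {
            byte b = bytes[i];
            if (b == (byte)'"') return chars; // closing quote

            if (b == (byte)'\\')
            {
                if (i + 1 >= end) break;
                // \uXXXX is 6 bytes, every other escape is 2 bytes; both yield 1 char.
                i += bytes[i + 1] == (byte)'u' ? 6 : 2;
                chars += 1;
            }
            else if (b < 0x80) { i += 1; chars += 1; } // ASCII
            else if (b < 0xE0) { i += 2; chars += 1; } // 2-byte sequence
            else if (b < 0xF0) { i += 3; chars += 1; } // 3-byte sequence
            else { i += 4; chars += 2; }               // 4-byte sequence: surrogate pair
        }
        throw new InvalidOperationException("Unterminated JSON string.");
    }
}
```

GetChars would then walk the same bytes a second time and write the unescaped chars into the preallocated buffer.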

@neuecc
Owner Author

neuecc commented Oct 9, 2017

@itn3000 is experimenting with fast UTF-8 <-> UTF-16 conversion utilities.
https://github.com/itn3000/unicode-convert-utilities

@ufcpp is building a custom UTF-8 decoder.
https://github.com/ufcpp/Utf8Utils

NStack is a new, Go-like encoding system.
https://github.com/migueldeicaza/NStack

System.Text.Utf8String is a new Span-based primitive.
https://github.com/dotnet/corefxlab/tree/master/src/System.Text.Utf8String/System/Text
https://github.com/dotnet/corefxlab/tree/master/src/System.Text.Utf8String/System/Text

@Tornhoof

Regarding UTF-8:
http://nullprogram.com/blog/2017/10/06/
https://news.ycombinator.com/item?id=15423674
and related from dotnet/corefxlab#1831

@penguinawesome

Hi @neuecc, we badly need your help. Do you have an idea or workaround for our issue? #224
