
.splat universal format discussion #47

Open
dylanebert opened this issue Nov 7, 2023 · 51 comments

Comments

@dylanebert

dylanebert commented Nov 7, 2023

Hello! I'm the author of gsplat.js, in which I'm using the splat format as provided in antimatter15/splat

I have opened an issue on splat compression, and I think it would be great if we could have a universal representation, with a consistent header to support different compression methods.

I can replicate your compressed format, but maybe we can open a common format repo with test files, so we can stay on the same page?

What do you think?

@mkkellogg
Owner

mkkellogg commented Nov 8, 2023

I think that is a great idea; given all the different splat viewers that have been developed, I think this was pretty much inevitable.

I think one of the first questions we need to answer is: have we identified all the potential stakeholders in such a project? Ideally the decisions we make here would produce a universal compact format that would be beneficial to anyone that has implemented a viewer (commercial, open-source, and so on). Or maybe my thinking is a little too grandiose at this point and we should just move forward and hope others will join. I'm sure you've read over Aras Pranckevičius's blog about compressing splat files; it would also probably be good to reach out to Kevin Kwok (if you haven't already).

As far as the actual approaches we use to organize the data and/or compress it, I'm totally open to suggestions. I admit my current implementation is fairly quick & dirty; I just wanted to get something in place that would cut down the size of the .splat files (and they're still pretty big).

@dylanebert
Author

Sounds great! I think a universal compact format would be very nice, or at least a plan for how the open-source community can stay in sync as the format keeps improving.

I have reached out to Aras Pranckevičius and Kevin Kwok and pointed them here.

@aras-p

aras-p commented Nov 8, 2023

Wall of text! ⚠️

Unity Gaussian Splatting format

Thought process behind it is in my blog posts (one, two), but it's kinda like this:

enum VectorFormat {
    Float32 = 0, // 12 bytes: 3x float32
    Norm16 = 1,  // 6 bytes: 3x unorm16
    Norm11 = 2,  // 4 bytes: 11+10+11 bit unorm
    Norm6 = 3    // 2 bytes: 5+6+5 bit unorm
}
enum SHFormat {
    // same as above
    Float32,
    Norm16,
    Norm11,
    Norm6,
    // "palette", each SH data is an index into a separate table.
    // table itself uses Float16 ("half") data
    Cluster64k,
    Cluster32k,
    Cluster16k,
    Cluster8k,
    Cluster4k,
}
enum ColorFormat {
	Float32x4,	// 16 bytes: 4x float32
	Float16x4,	// 8 bytes: 4x float16	
	Unorm8x4,	// 4 bytes: 4x unorm8 (colors are *not* sRGB)
	BC7, 		// 1 byte: BC7 GPU format, not sRGB (PC/consoles only, not mobile!)
	// I don't have these yet, but potential candidates:
	ASTC4x4,	// 1 byte: ASTC 4x4 GPU format (mobile & Mac)
	ASTC5x5, 	// 0.64 byte: ASTC 5x5 GPU format
	ASTC6x6, 	// 0.44 byte: ASTC 6x6 GPU format
	UASTC,		// 1 byte: Basis Universal UASTC format, can be transcoded to either BC7 or ASTC4x4 at load
}
// Each variable is min,max value of that thing per chunk.
// shR/shG/shB are not used if one of Cluster SH formats is in use.
struct ChunkInfo {
	half2 colR, colG, colB, colA;
	float2 posX, posY, posZ;
	half2 scaleX, scaleY, scaleZ;
	half2 shR, shG, shB;
}

Data "header" is like:

struct Header {
	uint splatCount;
	VectorFormat posFormat;
	VectorFormat scaleFormat;
	ColorFormat colorFormat;
	SHFormat shFormat;
}
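For a web viewer, a header like the one above could be read with a DataView. The concrete byte layout below (five little-endian uint32 fields, enums stored as uint32) is only my assumption for illustration, not a spec:

```typescript
// Hypothetical binary layout for the Header struct above.
// Field order follows the struct; little-endian uint32 encoding is an assumption.
function readHeader(buf: ArrayBuffer) {
  const dv = new DataView(buf);
  return {
    splatCount: dv.getUint32(0, true),
    posFormat: dv.getUint32(4, true),    // VectorFormat
    scaleFormat: dv.getUint32(8, true),  // VectorFormat
    colorFormat: dv.getUint32(12, true), // ColorFormat
    shFormat: dv.getUint32(16, true),    // SHFormat
  };
}

// Chunks hold 256 splats, so the chunk count is ceil(splatCount / 256)
const chunkCount = (splatCount: number) => (splatCount + 255) >> 8;
```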

Now the data itself is separate "conceptual files" of:

  • Chunk information: array of ChunkInfo, one entry per chunk (chunk count: (splatCount+255)/256).
  • Splat positions: array of posFormat-dependent items.
  • Splat colors: 2D array (texture) of colorFormat-dependent items. Notes on layout below.
  • Splat rotation+scale+SHindex data: array of scaleFormat+shFormat-dependent items.
  • SH data: per-splat SH entries (dependent on shFormat), or palette of N SH items for Cluster SH formats.

For color data (which is RGB color plus opacity), I store that in a 2D texture, to enable GPU compression. The texture width is always 2048 (allows for up to 32M splats total, given that max GPU texture height is 16k); height is dependent on splat count but always a multiple of 16. And the order that the splat data is laid out inside the texture is not simple row-major: each "chunk" (256 splats) is put into a 16x16 block, and within a block pixels are arranged in Morton order.

uint2 DecodeMorton2D_16x16(uint t)
{
    t = (t & 0xFF) | ((t & 0xFE) << 7); // -EAFBGCHEAFBGCHD
    t &= 0x5555;                        // -E-F-G-H-A-B-C-D
    t = (t ^ (t >> 1)) & 0x3333;        // --EF--GH--AB--CD
    t = (t ^ (t >> 2)) & 0x0f0f;        // ----EFGH----ABCD
    return uint2(t & 0xF, t >> 8);  // --------EFGHABCD
}
int SplatIndexToPixelIndex(uint idx)
{
    uint2 xy = DecodeMorton2D_16x16(idx);
    uint width = kTextureWidth / 16;
    idx >>= 8;
    uint x = (idx % width) * 16 + xy.x;
    uint y = (idx / width) * 16 + xy.y;
    return (int)(y * kTextureWidth + x);
}
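For a JavaScript/TypeScript viewer, the same decoding can be ported directly; a sketch, with kTextureWidth = 2048 as described above:

```typescript
const kTextureWidth = 2048;

// Decode a Morton index (0..255) into (x, y) within a 16x16 block
function decodeMorton2D_16x16(t: number): [number, number] {
  t = (t & 0xff) | ((t & 0xfe) << 7);
  t &= 0x5555;
  t = (t ^ (t >> 1)) & 0x3333;
  t = (t ^ (t >> 2)) & 0x0f0f;
  return [t & 0xf, t >> 8];
}

function splatIndexToPixelIndex(idx: number): number {
  const [bx, by] = decodeMorton2D_16x16(idx & 0xff);
  const blocksPerRow = kTextureWidth / 16; // 128 blocks of 16x16 per texture row
  const block = idx >> 8;                  // 256 splats per block
  const x = (block % blocksPerRow) * 16 + bx;
  const y = Math.floor(block / blocksPerRow) * 16 + by;
  return y * kTextureWidth + x;
}
```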

For rotation+scale+SH data, it's like this:

  • Rotation quaternion is always 4 bytes, stored in "smallest three" encoding, using 10+10+10+2 bits. Ten bits for each of three smallest components, and then 2 bits for the index of which component was the largest one. The smallest components can only ever be in -0.707..+0.707 range so the bits are spread over that range.
  • Scale is dependent on scaleFormat.
  • SH index is only present for Cluster SH formats, and if present, it's two bytes (16 bit index).
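A sketch of the "smallest three" encoding in TypeScript. The exact bit order (largest-component index in the low 2 bits, then three 10-bit values) is my own choice for illustration, not necessarily the layout the Unity project uses:

```typescript
const SQRT1_2 = Math.SQRT1_2; // smallest components lie in [-1/sqrt(2), +1/sqrt(2)]

function encodeQuatSmallest3(q: [number, number, number, number]): number {
  // find the component with the largest magnitude
  let largest = 0;
  for (let i = 1; i < 4; i++) if (Math.abs(q[i]) > Math.abs(q[largest])) largest = i;
  // flip sign so the dropped component is positive; q and -q are the same rotation
  const s = q[largest] < 0 ? -1 : 1;
  let packed = largest; // 2 bits: index of the largest component (layout assumed)
  let shift = 2;
  for (let i = 0; i < 4; i++) {
    if (i === largest) continue;
    // remap from [-1/sqrt(2), +1/sqrt(2)] to 10-bit unorm
    const v = Math.round(((s * q[i]) / SQRT1_2) * 0.5 * 1023 + 0.5 * 1023);
    packed |= Math.min(1023, Math.max(0, v)) << shift;
    shift += 10;
  }
  return packed >>> 0;
}

function decodeQuatSmallest3(packed: number): [number, number, number, number] {
  const largest = packed & 3;
  const q = [0, 0, 0, 0];
  let shift = 2;
  let sumSq = 0;
  for (let i = 0; i < 4; i++) {
    if (i === largest) continue;
    const v = (((packed >>> shift) & 1023) / 1023) * 2 - 1;
    q[i] = v * SQRT1_2;
    sumSq += q[i] * q[i];
    shift += 10;
  }
  // reconstruct the dropped (largest, positive) component from unit length
  q[largest] = Math.sqrt(Math.max(0, 1 - sumSq));
  return q as [number, number, number, number];
}
```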

Additional data transformations that are done:

  • Scale is first converted into linear scaling factor (i.e. not log scale as in PLY data), and then raised to 1/8th power. The reason is that most scales I've found tend to be really on the "small side" of the per-chunk scale range, and this distributes scale values over available few bits better.
  • Color (SH 0 aka DC component) is transformed to regular color (dc0*0.2820948+0.5).
  • Opacity is first transformed into regular opacity (apply sigmoid function to PLY value), and then apply a sort of "square it, centered around 0.5" transformation. The reason is that most opacities tend to be towards "almost fully opaque" or "almost fully transparent" ends, and this again distributes values towards available bits better.
    float SquareCentered01(float x)
    {
      x -= 0.5f;
      x *= x * sign(x);
      return x * 2.0f + 0.5f;
    }
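Put together, the scale and opacity transforms (plus the inverses a decoder would need) look roughly like this in TypeScript; function names are mine, the constants follow the description above:

```typescript
// PLY stores log-scale; convert to linear, then take the 1/8th power
// so small scales use the available quantization bits better.
function encodeScale(logScale: number): number {
  return Math.pow(Math.exp(logScale), 1 / 8);
}
function decodeScale(encoded: number): number {
  return Math.log(Math.pow(encoded, 8));
}

// PLY stores opacity pre-sigmoid
function sigmoid(x: number): number {
  return 1 / (1 + Math.exp(-x));
}

// "square it, centered around 0.5": pushes values toward the middle so the
// near-0 and near-1 extremes quantize better
function squareCentered01(x: number): number {
  x -= 0.5;
  x = x * x * Math.sign(x); // square, keeping the sign
  return x * 2 + 0.5;
}
function invSquareCentered01(y: number): number {
  let x = (y - 0.5) / 2;
  x = Math.sign(x) * Math.sqrt(Math.abs(x));
  return x + 0.5;
}
```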
    

In my implementation the "Very Low" preset is the one that's not absolutely smallest allowed by these formats,
but the one where I deemed that "it still looks acceptable", and that is:

  • Positions: Unorm11 (4 bytes/splat)
  • Rotations: 10.10.10.2 quaternion (4 bytes/splat)
  • Scale: Unorm6 (2 bytes/splat)
  • Color+opacity: BC7 (1 byte/splat)
  • SH: Cluster4k entries, plus 2 bytes/splat SH index.
  • Chunk data is 64 bytes/chunk, or 0.25 bytes/splat.

So overall this is like 13.25 bytes/splat, plus 360 KB of SH palette data for the whole splat cloud. For a million splats, this
would be 13.6 MB.

antimatter15/splat .splat format

From what I can tell, the format there is 32 bytes/splat (for a million splats: 32 MB):

struct Splat {
	float3 position;
	float3 scale;
	unorm8x4 color; // color + opacity
	unorm8x4 rotation; // quaternion
}

The format drops all SH data, so some of the realism of the "shininess" of surfaces is lost when looking at them from different angles.

Some "probably easy" ways of making this data smaller:

  • Store position in unorm16x3, uniformly quantized over the bounding box (store bounding box separately somewhere). Or just store as half3. 12 bytes -> 6 bytes.
  • Store scale as half3. 12 bytes -> 6 bytes.
  • Store rotation as 10.10.10.2 quaternion. Does not improve storage, but way better accuracy.

The above would get it down to 20 bytes/splat, but the format would still not have spherical harmonics.
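The unorm16-over-bounding-box idea, sketched in TypeScript (names are mine):

```typescript
// Quantize one coordinate uniformly over the cloud's bounding box into 16 bits.
function packUnorm16(v: number, min: number, max: number): number {
  const t = (v - min) / (max - min); // normalize into [0, 1]
  return Math.min(65535, Math.max(0, Math.round(t * 65535)));
}

function unpackUnorm16(u: number, min: number, max: number): number {
  return min + (u / 65535) * (max - min);
}
```

The bounding box itself would be stored once per file (or per chunk, in the chunked schemes discussed below).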

gsplat.tech format

I don't recall that format details right now, but IIRC it was something like:

struct Splat {
	half3 position; // 6 bytes
	half6 covariance3d; // 12 bytes
	unorm8x4 color; // 4 bytes
	ushort16 shIndex; // 2 bytes
}

The difference here is that instead of storing rotation+scale, they store the 6 numbers of the 3D covariance. This saves a tiny bit of calculation in the shader, but you can't easily factor it back out into rotation/scale if you want to visualize splats as something other than splats (e.g. as oriented boxes). Their genius bit is the clustered/paletted SH table idea. They also do something with opacities: they are not stored directly in the 8 bits available; instead each of the 256 opacity values indexes into a premade table. I guess this achieves a similar effect to my "non-linear opacity transformation" above.
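For reference, the 6 unique covariance values come from rotation R (from the quaternion) and scale S as Sigma = (R*S) * (R*S)^T, which is a one-way trip: recovering R and S from Sigma needs an eigendecomposition. A sketch of the forward direction:

```typescript
// Build the 6 unique values of the symmetric 3D covariance from a unit
// quaternion (x, y, z, w) and per-axis scale. Output order (an assumption
// for illustration): xx, xy, xz, yy, yz, zz.
function covarianceFromQuatScale(
  q: [number, number, number, number],
  s: [number, number, number]
): number[] {
  const [x, y, z, w] = q;
  // row-major rotation matrix from the quaternion
  const R = [
    1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y),
    2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x),
    2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y),
  ];
  // M = R * S (S is diagonal, so it scales columns), then Sigma = M * M^T
  const M = R.map((v, i) => v * s[i % 3]);
  const sigma: number[] = [];
  for (let r = 0; r < 3; r++)
    for (let c = r; c < 3; c++) {
      let v = 0;
      for (let k = 0; k < 3; k++) v += M[r * 3 + k] * M[c * 3 + k];
      sigma.push(v);
    }
  return sigma;
}
```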

Their splat data is thus 24 bytes/splat, and IIRC they always store 64k possible SH entries, at half precision float format, so that table is always like 6 MB. For a million splats: 30 MB of data.

Wot I think a format could be

So the first question is: do you want a "simple" format like .splat or gsplat.tech, where there are no "chunks" and the data for each splat is just stored with some acceptable quantization? This is simple, but probably hard to get below ~20 bytes/splat.

With "chunking" like what the Unity project does, it gets a bit more complicated, but since each chunk stores a min/max value range, it is possible to quantize the actual splat values into a smaller number of bits while still retaining good quality. This is important for position, scale and color data.

Another question is, do you want to have spherical harmonics data or not. It's probably out of the question that each splat would store any form of SH data per-splat, since it's way too large. Even if you cut it down massively (e.g. BC1 GPU compression like in https://aras-p.info/blog/2023/09/13/Making-Gaussian-Splats-smaller/), that is still 7.5 bytes/splat just for SH. For the web, I think the only practical choices are:

  • Either discard SH data completely (like what .splat does),
  • Or cluster all possible SH values into a "palette" (like what gsplat.tech and unity project does), and store index into palette per splat,
  • Or maybe keep only the 1st order SH data, like the first 3 RGB entries. But that's still quite large, unless you go for like BC1 GPU format.

I quite like the chunking approach TBH, and it's not terribly complicated. Keeping in mind WebGL2, which can't read from arbitrary data buffers, there's a certain elegance in putting all the possible data formats into GPU textures and letting the GPU hardware do all the sampling and decoding. I initially had that in the unity project, but then backed out of it partially because things like "float3" just don't exist as a GPU format (WebGL has it, but internally for GPUs that gets turned into a float4, thus wasting some VRAM). However, that is not a big deal, and specifically for the web, I doubt anyone would use the float3 format options. So it might make sense to put everything into textures, laid out in the same order as the color data in the unity project case, i.e.:

All the splats are put into "chunks" of 256 splats size. These are preferably put in some sort of "chunk is small / close in space" fashion, e.g. by rearranging splats in 3D morton order by position or some other way. Each chunk stores min/max values of: position (float3 x2), scale (half3 x2), color and opacity (half4 x2). This is 52 bytes/chunk (or 0.2 bytes/splat). For WebGL2 usage, this could be put into a R32UI texture, with rows of 13 pixels containing raw chunk data bits, and within the shader you convert from raw bits into floats and halfs (using unpackHalf2x16 etc.).
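As a sketch of the per-chunk quantization, here is the 11+10+11-bit "Norm11" layout from the enum above applied to a chunk-relative position (bit order is my assumption):

```typescript
// Pack three [0, 1] values into 11 + 10 + 11 bits of a uint32.
function packNorm11(x: number, y: number, z: number): number {
  const qx = Math.min(2047, Math.max(0, Math.round(x * 2047))); // 11 bits
  const qy = Math.min(1023, Math.max(0, Math.round(y * 1023))); // 10 bits
  const qz = Math.min(2047, Math.max(0, Math.round(z * 2047))); // 11 bits
  return (qx | (qy << 11) | (qz << 21)) >>> 0;
}

function unpackNorm11(p: number): [number, number, number] {
  return [(p & 2047) / 2047, ((p >>> 11) & 1023) / 1023, (p >>> 21) / 2047];
}

// Chunk-relative: normalize a coordinate against the chunk's min/max first.
function normalizeToChunk(v: number, min: number, max: number): number {
  return max > min ? (v - min) / (max - min) : 0;
}
```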

Now, you also have more textures, with per-splat data:

  • Positions (various formats, R10G10B10A2 good default),
  • Rotations (always R10G10B10A2),
  • Scales (various formats, R5G6B5 good default),
  • Color+opacity (various formats, R8G8B8A8 is easy, UASTC viable),
  • SH index (ushort16)

So defaults listed above would be 16 bytes/splat (or 13 bytes/splat when using UASTC).

And the SH palette data would be stored in half3 format, each of 15 RGB SH values arranged in 4x4 pixel block (with one pixel unused), similar to how gsplat.tech does it. For 4k SH palette that would be 386 KB.

But if really really needed, you could get crazily lower, with zero added complexity (since all the data is "just textures" and shader code does not really care how GPU decodes them): positions, rotations, scales and color all using UASTC, and drop SH index. Now it's just 4 bytes/splat; it would look a bit like https://aras-p.info/img/blog/2023/gaussian-splat/GsTruck_4VeryLow.jpg but if you really need to go super small then heck why not.

Now, another question is whether all this data should be in one file, or multiple files. I don't know much about web practices, like is it better to load one file, or multiple files in parallel? One file (like .splat or .ply) is very convenient to use. gsplat.tech IIRC loads from something like 4 files at once. I don't know which approach is better.

Conceptually, if it's one file, I'd put data in this order:

  • First header (splat count, data formats),
  • Chunk data,
  • Positions,
  • Colors,
  • Rotations,
  • Scales,
  • SH indices,
  • SH palette.

This way you can display "something" while it's loading, kinda similar to Luma's "magic reveal" but not quite:

  • Even when you only have chunks (which is a very small amount of data), I think you could display "something": you know, for each group of 256 splats, their center (middle of min/max) position, center color and center scale. Could display some "blob" in there.
  • Once positions start coming in, you can start displaying them as points (similar to Luma). The actual chunks could be ordered so they start out in the middle of the scene, since that's where most of the interest is in typical splat files.
  • Later on their colors start coming in, so start displaying those too.
  • And then once rotations & scales are there, do those.
  • Finally SHs once they arrive.

"Technology" needed to build all of the above (all/most of that exists in Unity project, but it's all written in C#):

  • Reorder input splats in spatially aware order (Morton), to cut into chunks.
  • Cluster SH data using k-means or similar clustering mechanism.
  • Compress data into GPU compression formats (typically you'd only do this for color/opacity, but you can do it on others if you need to go really low). For the web use case, I'd start by looking into the Basis Universal UASTC format, which works on both desktop and mobile, is 1 byte/pixel (same as BC7 or ASTC4x4), and supposedly has fast web-friendly transcoders. I have not used it myself though.
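The SH clustering step above could be sketched as plain Lloyd-style k-means over per-splat SH vectors, returning the palette (centroids) and a per-splat palette index. This is a toy version with first-k initialization; a real tool would want k-means++ seeding and convergence checks:

```typescript
// Cluster `data` (each entry an SH coefficient vector) into k palette entries.
// Assumes data.length >= k.
function kmeans(
  data: number[][],
  k: number,
  iters = 10
): { centroids: number[][]; labels: number[] } {
  const dim = data[0].length;
  const centroids = data.slice(0, k).map((v) => v.slice()); // naive init
  const labels = new Array<number>(data.length).fill(0);
  for (let it = 0; it < iters; it++) {
    // assignment step: nearest centroid by squared distance
    for (let i = 0; i < data.length; i++) {
      let best = 0, bestD = Infinity;
      for (let c = 0; c < k; c++) {
        let d = 0;
        for (let j = 0; j < dim; j++) {
          const t = data[i][j] - centroids[c][j];
          d += t * t;
        }
        if (d < bestD) { bestD = d; best = c; }
      }
      labels[i] = best;
    }
    // update step: each centroid becomes the mean of its members
    const sums = Array.from({ length: k }, () => new Array<number>(dim).fill(0));
    const counts = new Array<number>(k).fill(0);
    for (let i = 0; i < data.length; i++) {
      counts[labels[i]]++;
      for (let j = 0; j < dim; j++) sums[labels[i]][j] += data[i][j];
    }
    for (let c = 0; c < k; c++)
      if (counts[c] > 0)
        for (let j = 0; j < dim; j++) centroids[c][j] = sums[c][j] / counts[c];
  }
  return { centroids, labels };
}
```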

All of the above maybe could be done as some ad-hoc format, or maybe as some way of using glTF2. @hybridherbst might know more there.

@Snosixtyboo

Hi,

we would be very much on board with finding a good universal representation for the ref implementation too!

Speaking only for myself, my current thought is that it makes sense to leave this mainly to the community, but there's one concrete concern: In order to preserve the usability of the representation in scientific contexts as well, it would require at least an OPTION where everything, including SHs, is stored losslessly.

We would love to be kept in the loop! If a reasonable consensus is found, we would do our best to quickly support it in the ref!

Best,
Bernhard

@aras-p

aras-p commented Nov 8, 2023

In order to preserve the usability of the representation in scientific contexts as well, it would require at least an OPTION where everything, including SHs, is stored losslessly.

Yeah that makes a lot of sense. This is what I do in the unity project too; e.g. some of the (WIP) splat editing tools are only enabled, and only work, if literally everything is full floats. In that case I don't do any "chunking" at all (since it's both cumbersome when editing splats, plus it might lead to a precision loss).

@chris-aeviator

needs:

  • dynamic fields (e.g. all 64 SH channels vs. just some)
  • to consider additional data (e.g. movement fields from 4DGS) now/ in the future
  • compression/ quantization
  • streaming!
  • global coordinate spaces (think stitching multiple splats)

@slimbuck

slimbuck commented Nov 8, 2023

In order to preserve the usability of the representation in scientific contexts as well, it would require at least an OPTION where everything, including SHs, is stored losslessly.

Isn't this just original PLY files?

@dylanebert
Author

Thanks @aras-p for the amazing wall of text! I'm on board with the layout you described.

My general sense is that .splat tends to lean toward compression, and .ply tends to lean toward losslessness/flexibility (similar to .jpg vs .png). I think we can support both and cover most needs/use-cases well. What do you think?

@aras-p

aras-p commented Nov 8, 2023

Not sure if .png vs .jpg analogy holds up all that well. PLY is completely uncompressed, full float32 precision data (whereas .png is "compressed losslessly"). FWIW I tried doing some lossless compression of 3DGS data, but it does not compress well, mostly because things like rotations and scales are very random.

Anyway, probably the first question is what scope we're targeting. Everything I wrote above leans towards "this represents a single gaussian splat cloud, nothing else". The @chris-aeviator comment above indicates needs for it to be more extensible and/or the ability to augment it with additional metadata.

I think (ab)using glTF2 might be a very viable way to look into that. glTF2 itself would provide ability to put "more than just the splat" into a file in case someone needs it (e.g. positions of the cameras, transform of the splat itself, etc.). A good question is though, how exactly to represent the data of the splats. If we go towards the "everything is actually put into textures" idea as in my previous comment, then maybe splat data could be put into glTF2 roughly like so:

  • The glTF2 file defines "something dummy" for a mesh (if that's needed at all), like one quad or something.
  • Then it defines / references all the data textures needed for the splats.
  • And then it defines a custom "splat material" with some "extra" properties that have nothing to do with standard PBR materials, but instead reference the needed data textures, as well as any other data as needed.

Advantage of glTF2 is that it's very much "native" for the web stack, i.e. almost any 3D engine on the web supports it, including things like UASTC texture data transcoding handling.

And it would be somewhat "extensible" for future (animated splats, etc.), because the "file format" is just glTF2.

But whether anything above makes sense at all, would have to be evaluated by someone who actually knows anything about glTF2. Maybe I'll ask some people around :)

@antimatter15

antimatter15 commented Nov 8, 2023

I haven't fully digested all the great points above (and I don't have any experience with texture compression), so some of this might be wrong! But so far here's how I've been thinking:

I would like the format to support streaming— where something can be shown as soon as possible. And in particular, I would like to support a kind of "early termination" where on devices that are either compute and/or bandwidth constrained, some subset of the splats are loaded. i.e. I would like it if mobile devices could just fetch the first n-MB of a file and then abort the transfer and still be able to deliver an acceptable interactive user experience.

I would like the format to be deliverable as a single file, rather than a number of files or a folder.

I would like the file format to support sharing additional information— for instance the contents of cameras.json. Ideally information about the authorship and what tool generated it. Perhaps information about real-world units when available.

I think it would be nice to support palletized spherical harmonics, but loading them after all the uniformly colored splats.

Additionally, I would like the format to be fairly simple to parse and to generate. I think .ply is a fine format for representing "raw" uncompressed data, and I think that having a single format for both compressed and uncompressed information might lead to user confusion.

Another thing I have played around with a little bit is to take the "far away" splats and condense them into a panoramic skybox. At the very least I think that the format should be able to represent whether the background is assumed to be black, white or transparent. But having an arbitrary skybox cubemap texture might also be useful for compression.

I think that the space is probably evolving too fast for this format to be the "last word" on splat shipping. I haven't really thought of what the right format would be for dynamic/animated splats, for scenes consisting of multiple splats (either somewhat naively composed, or arranged into some regular grid). Perhaps in the future there might be a way to do coarse-to-fine/LOD splats.

My thinking was a new .splat file (i like the filename) which is actually a .tar file containing a number of different internal files.

test.splat is a tar file containing (in order):

- metadata.json - authorship, generator tool, etc.
- cameras.json - camera views
- main.splatdata: Splat[]
- main.shdata: SHPaletteEntry[]

struct Splat {
	half3 position;
	unorm8x4 color; // color + opacity
	half6 cov3d;  // potentially split this out into separate scale/rotation
}

struct SHPaletteEntry {
	int splat_count;
	int[] splat_indices;
	half48 sh_coeffs;
}
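Reading the proposed container would just be walking standard tar records: a 512-byte header per file (name in the first 100 bytes, size as octal ASCII at offset 124), followed by the contents padded up to a 512-byte boundary. A minimal sketch (checksum validation omitted):

```typescript
// Iterate over the entries of an in-memory tar archive.
function* tarEntries(buf: Uint8Array): Generator<{ name: string; data: Uint8Array }> {
  // NUL-terminated ASCII field -> string
  const cstr = (a: Uint8Array) => new TextDecoder().decode(a).split("\0")[0];
  let off = 0;
  while (off + 512 <= buf.length) {
    const name = cstr(buf.subarray(off, off + 100));
    if (!name) break; // empty header block terminates the archive
    const size = parseInt(cstr(buf.subarray(off + 124, off + 136)).trim(), 8);
    yield { name, data: buf.subarray(off + 512, off + 512 + size) };
    off += 512 + Math.ceil(size / 512) * 512; // contents padded to 512 bytes
  }
}
```

A viewer consuming a streamed .splat-as-tar could start acting on metadata.json and cameras.json as soon as their records arrive, before the splat data finishes downloading.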

@chris-aeviator

chris-aeviator commented Nov 8, 2023 via email

@zalo

zalo commented Nov 9, 2023

I'd be thrilled if GLTF2 could be made to store splats; on the surface, it seems to support both the streaming and extensibility features one would want for an evolving, bleeding-edge rendering primitive.

This article provides context for how BabylonJS handles the API for streaming GLTF2 loading via Microsoft's LOD Extension: https://doc.babylonjs.com/features/featuresDeepDive/importers/glTF/progressiveglTFLoad

Gaussian Splats have the added benefit that LODs are additive; presumably the low LODs will consist of the largest % of splats, with smaller, more transparent splats loading as part of the high LODs.

@mmalex

mmalex commented Nov 9, 2023

I love Aras's chunked approach, yes yes and more yes. For LOD, I wonder if you could do something within the 256-splat chunks that is like a mini treelet: you have a single mega splat representing the whole chunk, then do a radix-4 or radix-16 tree from there, storing deltas to your parent's params. I guess you'd have to see if it actually helped, but I'm kinda thinking along the lines of: it's more likely to make the splat values close to 0, so that a byte-oriented compression of the output (brotli or lz4 or something) would get to squish it harder, without overly complicating the format. You could even 'delta' and 'undelta' at serialisation time, in place, so that the in-memory / in-texture format is exactly as Aras describes (plus the tree structure for LOD, I guess) but the on-disk version has had the parent values subtracted out to make things more compressible.
ANYWAY. thinking out loud. I do love the chunked style.

@aras-p

aras-p commented Nov 9, 2023

I do love the chunked style

@mmalex you know where I got the idea for the chunked style, right? From your own Dreams presentation :)

@mkkellogg
Owner

I second what several others have already mentioned about supporting "future" data. This space is indeed evolving fast, and I think whatever strategy we ultimately land on should be adaptable/flexible enough to evolve with it (I know that's easy to say and possibly not so easy to do :) ).

I also believe supporting some sort of LOD mechanism is very important and will ultimately be required if we want these viewers to be capable of rendering large scenes. Gaussian splat LODs are out of my wheelhouse so I am unsure whether or not the lower-fidelity data should be produced at the same time as the original .ply and ultimately included in the .splat file, if it is something we can generate during the .ply to .splat conversion, or if it is something that could be generated on-the-fly when a scene is loaded, thereby eliminating the need for the .splat file to contain any LOD data.

@zalo and @mmalex -- I would definitely like to learn more about the LOD strategies you are proposing. @zalo -- If Gaussian splat LODs are additive, would supporting LODs simply be a matter of properly ordering the base data (highest fidelity) within the .splat file? It seems like we'd just get LODs for free, and it would be great for streaming/progressive loading. @mmalex -- For the approach you described, it sounds like we'd also only need to store the highest precision data in the .splat file, and LODs would be computed at runtime. However it seems like a lower-LOD-first streaming strategy wouldn't be feasible, since lower-LOD splats would need to be constructed from higher-LOD splats. Sorry if my questions are confusing and please correct me if I'm wrong in any of my thinking here :)

As far as compression goes, I'm a big fan of the chunked approach as well -- thank you @aras-p for sharing your very detailed and insightful thoughts on this matter.

@donmccurdy

donmccurdy commented Nov 12, 2023

glTF 2.0 has a concept of "extensions", and that's usually the path by which new features are added and adopted. Here, I'd imagine defining EXT_splat, extending a glTF node:

"scenes": [
  { "children": [ 0 ] }
],
"nodes": [
  {
    "name": "MySplat",
    "extensions": {
      "EXT_splat": {
        "count": 1024,
        "chunkData": 25, // accessor index to f32[] data?
        "positionTexture": 0, // texture index
        "rotationTexture": 1,
        "scaleTexture": 2,
        "colorTexture": 3,
        "shPaletteTexture": 4,
      }
    }
  }
],

The texture indices resolve to a texture associated with the file, which could be PNG or UASTC or something else. Future extensions could add new texture formats to glTF 2.0, and that wouldn't affect the EXT_splat definition above.

The chunkData index points to an accessor (N scalar or vector elements stored in a binary buffer), which glTF clients will know how to resolve.

If anyone would like to help with creating input data (.png or .exr uncompressed textures?) and defining the metadata, I'd be happy to help with converting textures to KTX2/UASTC and constructing glTF files using the hypothetical extension above. Also see https://github.com/donmccurdy/KTX2-Samples/blob/main/encode.sh for examples of KTX2 encoding steps (requires the latest KTX Software CLI alpha release).

Now, another question is whether all this data should be in one file, or multiple files. I don't know much about web practices, like is it better to load one file, or multiple files in parallel?

Neither is strictly better. Web clients can use range requests to grab chunks of a file as if they were multiple files. But not all applications or servers implement range requests, and there is a bit of overhead on each request, so choices vary.

glTF has some flexibility here — .glb uses embedded resources, .gltf uses external resources, and conversion between the two (including any glTF extensions) is trivial.

@fasteinke

We're obviously heading at breakneck speed towards the Splataverse ... someone had better reserve the domain names, etc, for this creature ...

@oreasono

We're obviously heading at breakneck speed towards the Splataverse ... someone had better reserve the domain names, etc, for this creature ...

I took the domain splats.ai last month LOL.

[@donmccurdy's EXT_splat comment, quoted above in full]

I come from a 3D graphics background and have played with glTF a lot; I will see what I can do to help with a "glTF2 extension that loads splats". Even if the community agrees on creating a new file format, the glTF extension approach would still be valuable because glTF easily reaches the broader 3D audience. Do you think this glTF extension idea should be discussed separately?

@slimbuck

We're busy implementing a slightly compressed GS PLY format as an interim solution while an all-bells-and-whistles format is being thrashed out.

It takes ideas directly from dreams(@mmalex) and @aras-p and packages the data into a standard PLY file (though using non-standard PLY properties).

The PLY file contains two elements:

  • chunks storing float32 position and scale bounding box of 256 splats
  • vertices storing 4 uint32s per splat for position, rotation, scale and color

This gives roughly a 4x saving over uncompressed PLY data without much quality degradation (visual tests still to be done).

We have a PR implementing decompression here.

It's a simple format with narrow scope, but please do let us know if we've missed anything obvious!

@aras-p

aras-p commented Nov 18, 2023

It's a simple format with narrow scope, but please do let us know if we've missed anything obvious!

@slimbuck nice! I very much like the simplicity. I would think that within the same size, you could improve quality slightly (just a guess, I haven't actually tested it), at expense of a small amount of complexity:

  • Store quaternion in "smallest 3" format - 10 bits for three smallest components (each within -0.7..+0.7 range), two bits for which component index was the largest. Still 4 bytes, but quite a bit more accurate than just storing xyz in -1..+1 range.
  • For opacity, do transform similar to what I mentioned above, since most opacities are either almost fully opaque, or almost fully transparent.
  • For scale, raise it to power since most scales are towards the small end.
  • Super minor thing, but I would think that storing "min, max" for chunk bounds might be better than storing "min, range". With min/max and essentially a "lerp" to reconstruct the value you can be guaranteed that both min and max can be exactly represented under floating point, whereas with "min, range" the max value might not be. Probably does not matter all that much overall though.
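For reference, the "smallest three" quaternion scheme can be sketched like this (my own illustrative packing, with the 2 index bits at the top of the 32-bit word; real formats may order the fields differently):

```python
import numpy as np

SQRT2_INV = 1.0 / np.sqrt(2.0)  # smallest three components lie in [-1/sqrt(2), 1/sqrt(2)]

def encode_quat(q: np.ndarray) -> int:
    """Pack a unit quaternion into 32 bits: 2 bits for the index of the
    largest component, 3 x 10 bits for the remaining three components."""
    q = q / np.linalg.norm(q)
    largest = int(np.argmax(np.abs(q)))
    if q[largest] < 0:  # q and -q represent the same rotation
        q = -q
    packed = largest
    for i in range(4):
        if i == largest:
            continue
        # map [-1/sqrt(2), 1/sqrt(2)] -> [0, 1023]
        v = int(round((q[i] * np.sqrt(2.0) * 0.5 + 0.5) * 1023))
        packed = (packed << 10) | max(0, min(1023, v))
    return packed

def decode_quat(packed: int) -> np.ndarray:
    vals = []
    for _ in range(3):
        vals.append((packed & 1023) / 1023.0 * 2.0 - 1.0)
        packed >>= 10
    largest = packed & 3
    vals = [v * SQRT2_INV for v in reversed(vals)]  # restore component order
    q = np.zeros(4)
    j = 0
    for i in range(4):
        if i == largest:
            continue
        q[i] = vals[j]
        j += 1
    # The largest component is recovered from the unit-norm constraint.
    q[largest] = np.sqrt(max(0.0, 1.0 - sum(v * v for v in vals)))
    return q
```

Because the three stored components span ±1/√2 instead of ±1, each 10-bit step covers a smaller range, which is where the extra accuracy comes from.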

@slimbuck
Copy link

Thanks @aras-p! I wasn't sure the extra complexity was worth it in such a simple format, but perhaps it is. I'll make these changes and compare the results to see if they make any difference. Thanks again!

@slimbuck
Copy link

I did some testing today with the garden scene, train scene and guitar scene and found the 2/10/10/10 quaternion format lowered reconstruction error deviation by a good 12-33%. I've updated the format to adopt this. I also changed chunks to store min/max instead of min/size, which just makes a lot of sense.

I tried the squared opacity mapping change too, but actually found that it resulted in worse error variance with my test scenes. Not too sure why this might be. So we're just storing plain old opacity for now.

We added this format to our editor tool's import & export if anyone is interested to give it a try.

Next up we will likely investigate some sort of splat LOD and support for skinning. It would be great to hear if anyone has started investigating either of these!

@drcmda
Copy link

drcmda commented Nov 30, 2023

If the splat format is being renewed, would it be possible to include bounds in the header somewhere? Especially for streaming, it would be very useful to be able to dynamically center/position a splat before it has fully loaded. I am currently using the antimatter format with streaming, so the model appears immediately, but it has no interop with centering and fitting components.

@koktavy
Copy link

koktavy commented Jan 19, 2024

Would the LightGaussian paper be of help here?
https://github.com/VITA-Group/LightGaussian

@raymondyfei
Copy link

raymondyfei commented Feb 12, 2024

It would be cool if there were a way to compose with draco compression

@pwais Yes I fully agree on this! There's already some discussion on supporting Draco for point cloud in GLTF. KhronosGroup/glTF#1809
I think Gsplat is a good reason to gather more consensus there and to push this initiative further.

In theory, couldn't gsplat GLTF use no extensions, just use a bunch of standard buffers, then leave it up to the viewer gl code to interpret the data?

That could also be doable... but the main goal of using glTF is that we need some hierarchy to do some sort of composition (otherwise just using PLY is fine), and hence we need some tag to identify that a given buffer is related to a Gsplat and another buffer is not.

@donmccurdy
Copy link

donmccurdy commented Feb 12, 2024

If attributes were stored in accessors rather than buffers (see #47 (comment)), we would:

  1. avoid +33% filesize overhead of base64 and extra parsing cost
  2. enable Meshopt compression out of the box. no extension needed, except perhaps to add custom filters

By using the existing glTF accessor/bufferview/buffer constructs, you have the choice of storing data as base64 data URIs, or as binary. The three.js community uses glTF extensively, and I always recommend avoiding these data URIs because the loading cost is significant, unless the data involved is trivially small.
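The +33% figure is just the base64 expansion ratio (every 3 input bytes become 4 output characters), which is easy to verify:

```python
import base64

payload = bytes(range(256)) * 300      # 76,800 bytes of sample binary data
encoded = base64.b64encode(payload)    # 4 output chars per 3 input bytes

# ratio is 4/3, i.e. the ~33% size overhead of data URIs vs raw binary
print(len(encoded) / len(payload))
```

On top of the size overhead, the decoded bytes also have to pass through JS before they become typed arrays, which is the parsing cost mentioned above.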

@pwais
Copy link

pwais commented Feb 13, 2024

That could also be doable...but the main goal of using GLTF is that we need some hierarchy to do some sort of composition (otherwise using just PLY is fine)

The hierarchy and organization is def helpful, but pretty sure that the performance benefit of GLTF is that the buffers largely skip any JS intervention / SERDES, unlike PLY (at least canonical THREE PLY loader appears to be non-native code). That might make a marginally small benefit if you just load one single object / scene to viz, even if it's a few megabytes...

But if you want to viz a stream of splats / point cloud data, then GLTF beats the pants off anything that requires javascript SERDES. E.g. try loading 1,000 frames of ~1MB PLYs (or protobufs or your fav buffer format) versus 1,000 frames of GLTFs. And scrub over those frames. Browsers / phones from even 2-4 years ago, GLTFs are night-and-day better, especially above 1M points. (Like imagine if you're working in robotics, and you want to introduce a replacement for rosbag... you surely would try out GLTF wouldn't you? no?!)

By using the existing glTF accessor/bufferview/buffer constructs, you have the choice of storing data as base64 data URIs

@donmccurdy 💯 💯 base64 data uris are extremely handy hacks but +1 to optional but not required

@raymondyfei
Copy link

If attributes were stored in accessors rather than buffers

Agreed. Storing the attributes into individual accessors would definitely be better, just as how mesh vertices are stored.

@raymondyfei
Copy link

But if you want to viz a stream of splats / point cloud data, then GLTF beats the pants off anything that requires javascript SERDES.

Yes! That's another reason we'd make a schema to store Gsplat in GLTFs.

@arcman7
Copy link

arcman7 commented Mar 5, 2024

A bit late to the party here, but is there a magic number or specific file header that we all agree on that I can reliably use to detect whether or not a given file is a .splat file?
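As far as I know, the antimatter15 .splat format has no header at all: it's a flat array of 32-byte records (3×float32 position, 3×float32 scale, 4×uint8 RGBA, 4×uint8 rotation), so there is no magic number to check. The best you can do is a heuristic like the sketch below (the record layout is taken from antimatter15/splat; treat it as an assumption):

```python
import math
import struct

SPLAT_RECORD_SIZE = 32  # 12B position + 12B scale + 4B RGBA + 4B rotation

def looks_like_splat(data: bytes) -> bool:
    """Heuristic only: with no magic number, this can reject obvious
    mismatches but cannot positively identify a .splat file."""
    if len(data) == 0 or len(data) % SPLAT_RECORD_SIZE != 0:
        return False
    if data[:4] == b"ply\n":  # a PLY file starts with its own magic string
        return False
    # Sanity-check the first record: position and scale must be finite floats.
    floats = struct.unpack_from("<6f", data, 0)
    return all(math.isfinite(v) for v in floats)
```

A universal header/magic number is exactly the kind of thing a common format spec could settle.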

@kitaedesigns
Copy link

Stumbled upon this thread, but as I've been working more and more with .splats in the architecture space/use cases, I think .gltf would be a great container for them, especially with meshopt compression etc.

@arcman7
Copy link

arcman7 commented Aug 13, 2024

Stumbled upon this thread, but as I've been working more and more with .splats in the architecture space/use cases, I think .gltf would be a great container for them, especially with meshopt compression etc.

So you would split the different attributes of a splat vertex into which parts of the glTF file, exactly? I'm not as familiar with it, so I don't immediately know how I would go about that.

@kitaedesigns
Copy link

@arcman7 if you scroll up to some previous comments you can see suggestions from folks, but .gltf can already represent point clouds using point primitives, so that handles the XYZ locations of all the points in a splat:

#47 (comment)
#47 (comment)
#47 (comment)

@IHKYoung
Copy link

Has anyone tried to use Python to implement ply conversion to ksplat?

@SharkWipf
Copy link

After posting on an issue on the KhronosGroup glTF repo, it was suggested I post this here as well, since it's relevant here too, and would get more eyes on it:

One concern raised in the MrNerf Discord server, is that standardizing on a format this early in the game might either stifle innovation, or result in an explosion of competing standards, especially in a field as actively researched as this.
New methods that may need to store additional information would be forced to forego early standardizations like this.
On top of that, most whitepaper implementations likely simply won't care to support anything beyond what they need.

While the creation of standards is inevitable, whether they will be able to be successful at this point remains to be seen.

After some further discussion, we instead worked out a super early(!) draft for a high-level container format.
To be clear: This would have no direct implications on your work as it operates at a different level entirely, but it felt worth mentioning it here since it's closely related and adds a different perspective, and now is technically ongoing work in parallel.
An initial early version of this draft is posted here, for reference: https://gist.github.com/SharkWipf/a02a2616424d0a2ab69af2d3ad8c1829

@jo-chemla
Copy link

jo-chemla commented Nov 12, 2024

Just cross-linking the discussion nianticlabs/spz#7, given Niantic's recent announcement open-sourcing the spz format, which might be interesting to some people in this discussion.
Per the related blog post, the choice was made to carefully select the amount of data per splat primitive (e.g. 3×8 bits for rotation rather than 4×32 bits), appropriate quantization, a column-based layout, and standard gzipping.
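The 3×8-bit rotation idea can be sketched like so: normalize, flip the quaternion so w ≥ 0, store only x, y, z, and reconstruct w on load. This is a sketch of the general technique, not spz's exact byte layout:

```python
import numpy as np

def quantize_rotation(q: np.ndarray) -> bytes:
    """Store only x, y, z as signed 8-bit values; w is reconstructed
    on load from the unit-norm constraint."""
    q = q / np.linalg.norm(q)
    if q[3] < 0:
        q = -q  # q and -q encode the same rotation, so force w >= 0
    xyz = np.clip(np.round(q[:3] * 127.0), -127, 127).astype(np.int8)
    return xyz.tobytes()

def dequantize_rotation(data: bytes) -> np.ndarray:
    xyz = np.frombuffer(data, dtype=np.int8).astype(np.float64) / 127.0
    w = np.sqrt(max(0.0, 1.0 - float(np.dot(xyz, xyz))))
    return np.append(xyz, w)  # (x, y, z, w)
```

Because w is derived rather than stored, the decoded quaternion is exactly unit length, which the renderer needs anyway.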

@LaFeuilleMorte
Copy link

LaFeuilleMorte commented Nov 21, 2024

Hi, there's recent work that achieves a very high compression ratio while maintaining the SH coefficients. See https://fraunhoferhhi.github.io/Self-Organizing-Gaussians/
The compression and decompression code is easy to implement:

https://github.com/nerfstudio-project/gsplat/blob/dd66cbd597f376f103e90de8b6265087b482eac1/gsplat/compression/png_compression.py#L75

  1. Compression depends on PLAS to sort the Gaussians, plus https://github.com/DeMoriarty/TorchPQ?tab=readme-ov-file#install
  2. Decompression is very easy; we don't need extra Python libs to get it working, and it's easy to port to other languages (TypeScript or C#, for example).
  3. Compression ratio: in my case, a raw PLY file of about 1 GB compressed to 46 MB.
  4. File format:
    [screenshot: the compressed output files, one PNG per splat attribute]
    A slightly tricky point is that the output is a bunch of PNG images, one per Gaussian splat property. Maybe we can encode them as base64 and store them in one file to reduce complexity.

@LaFeuilleMorte
Copy link

I cleaned up the decompression code a bit. I think it's quite easy to implement, but the file format should be discussed further.

import os
import json
from torch import Tensor
from typing import Any, Callable, Dict
import torch
import numpy as np

def inverse_log_transform(y):
    return torch.sign(y) * (torch.expm1(torch.abs(y)))

def _decompress_png_16bit(
    compress_dir: str, param_name: str, meta: Dict[str, Any]
) -> Tensor:
    """Decompress parameters from PNG files.

    Args:
        compress_dir (str): compression directory
        param_name (str): parameter field name
        meta (Dict[str, Any]): metadata

    Returns:
        Tensor: parameters
    """
    import imageio.v2 as imageio

    if not np.all(meta["shape"]):
        # Empty parameter: return an empty tensor of the stored shape/dtype.
        params = torch.zeros(meta["shape"], dtype=getattr(torch, meta["dtype"]))
        return params

    img_l = imageio.imread(os.path.join(compress_dir, f"{param_name}_l.png"))
    img_u = imageio.imread(os.path.join(compress_dir, f"{param_name}_u.png"))
    img_u = img_u.astype(np.uint16)
    img = (img_u << 8) + img_l

    img_norm = img / (2**16 - 1)
    grid_norm = torch.tensor(img_norm)
    mins = torch.tensor(meta["mins"])
    maxs = torch.tensor(meta["maxs"])
    grid = grid_norm * (maxs - mins) + mins

    params = grid.reshape(meta["shape"])
    params = params.to(dtype=getattr(torch, meta["dtype"]))
    return params


def _decompress_npz(compress_dir: str, param_name: str, meta: Dict[str, Any]) -> Tensor:
    """Decompress parameters with numpy's NPZ compression."""
    arr = np.load(os.path.join(compress_dir, f"{param_name}.npz"))["arr"]
    params = torch.tensor(arr)
    params = params.reshape(meta["shape"])
    params = params.to(dtype=getattr(torch, meta["dtype"]))
    return params


def _decompress_png(compress_dir: str, param_name: str, meta: Dict[str, Any]) -> Tensor:
    """Decompress parameters from PNG file.

    Args:
        compress_dir (str): compression directory
        param_name (str): parameter field name
        meta (Dict[str, Any]): metadata

    Returns:
        Tensor: parameters
    """
    import imageio.v2 as imageio

    if not np.all(meta["shape"]):
        # Empty parameter: return an empty tensor of the stored shape/dtype.
        params = torch.zeros(meta["shape"], dtype=getattr(torch, meta["dtype"]))
        return params

    img = imageio.imread(os.path.join(compress_dir, f"{param_name}.png"))
    img_norm = img / (2**8 - 1)

    grid_norm = torch.tensor(img_norm)
    mins = torch.tensor(meta["mins"])
    maxs = torch.tensor(meta["maxs"])
    grid = grid_norm * (maxs - mins) + mins

    params = grid.reshape(meta["shape"])
    params = params.to(dtype=getattr(torch, meta["dtype"]))
    return params



def _decompress_kmeans(
    compress_dir: str, param_name: str, meta: Dict[str, Any], **kwargs
) -> Tensor:
    """Decompress parameters from K-means compression.

    Args:
        compress_dir (str): compression directory
        param_name (str): parameter field name
        meta (Dict[str, Any]): metadata

    Returns:
        Tensor: parameters
    """
    if not np.all(meta["shape"]):
        # Empty parameter: return an empty tensor of the stored shape/dtype.
        params = torch.zeros(meta["shape"], dtype=getattr(torch, meta["dtype"]))
        return params

    npz_dict = np.load(os.path.join(compress_dir, f"{param_name}.npz"))
    centroids_quant = npz_dict["centroids"]
    labels = npz_dict["labels"]

    centroids_norm = centroids_quant / (2 ** meta["quantization"] - 1)
    centroids_norm = torch.tensor(centroids_norm)
    mins = torch.tensor(meta["mins"])
    maxs = torch.tensor(meta["maxs"])
    centroids = centroids_norm * (maxs - mins) + mins

    params = centroids[labels]
    params = params.reshape(meta["shape"])
    params = params.to(dtype=getattr(torch, meta["dtype"]))
    return params

def _get_decompress_fn(param_name: str) -> Callable:
    decompress_fn_map = {
        "means": _decompress_png_16bit,
        "scales": _decompress_png,
        "quats": _decompress_png,
        "opacities": _decompress_png,
        "sh0": _decompress_png,
        "shN": _decompress_kmeans,
    }
    return decompress_fn_map.get(param_name, _decompress_npz)

def decompress(compress_dir: str) -> Dict[str, Tensor]:
    """Run decompression

    Args:
        compress_dir (str): directory that contains compressed files

    Returns:
        Dict[str, Tensor]: decompressed Gaussian splats
    """
    with open(os.path.join(compress_dir, "meta.json"), "r") as f:
        meta = json.load(f)

    splats = {}
    for param_name, param_meta in meta.items():
        decompress_fn = _get_decompress_fn(param_name)
        splats[param_name] = decompress_fn(compress_dir, param_name, param_meta)

    # Param-specific postprocessing
    splats["means"] = inverse_log_transform(splats["means"])
    return splats

if __name__ == "__main__":
    compress_dir = ""
    splats = decompress(compress_dir)

@ebeaufay
Copy link

Another use-case here is serving tiled and multileveled splats as OGC 3DTiles that use Gltf for tile content

https://www.jdultra.com/splats2/index.html

Using the OGC 3D Tiles spec could make sense; it already supports meshes and point clouds, so why not splats?

For info, as @donmccurdy suggested, I use extra accessors for covariance and harmonics allowing for easy compression.

An extension spec describing how these accessors are arranged would be nice for interoperability.

Another note: KHR_draco_mesh_compression doesn't support point primitives. That issue has been discussed here.

Draco gives twice as good compression as quantize or meshopt (in this case), and three.js, Cesium, and Unity all decode Draco-compressed glTF points without problems. glTF encoders tend to follow the spec, though.

Changing the KHR_draco spec would allow viewing Draco-compressed splats as points out of the box. For now I use triangle primitives to get Draco compression, which doesn't make sense when rendering splats directly.

@jo-chemla
Copy link

multileveled splats as OGC 3DTiles

Interesting take on tiling/streaming large Gaussian splat scenes via OGC 3D Tiles, thanks @ebeaufay! The OGC + Cesium team (now acqui-hired by Bentley, sorry for the ping @pjcozzi @lilleyse) could probably be very interested in the standard being extended to this new type of primitive. Thanks also for the comments on the underlying implementation, like the extra glTF accessors for compression, and Draco not supporting point primitives.

Are you making any assumptions regarding locality, e.g. that splats are small enough that you can use only the Gaussian centers to assign them to regions of the hierarchy, the same way points from a point cloud would be assigned to a tile? Or do you account for their sizes/scales, more akin to how faces/polygons are assigned to a tile? Also, do you rely on ImplicitTiling at all (e.g. an octree), or just use standard point-cloud tilers? Very interesting demo, even if the viewer does not yet seem to unload lower-resolution or out-of-frustum tiles while navigating!

@ebeaufay
Copy link

ebeaufay commented Dec 9, 2024

Hi @jo-chemla the demo loads/unloads tiles, try playing with the detail multiplier slider.

The thing is, most "small" splat scenes have detail near the center and less further out. Since the render camera is near the center, most of the tileset is immediately loaded; it's just that the tree is imbalanced. The use case would be much stronger with a large splat dataset and an even level of detail.

I use an octree or quadtree (or k-d tree), so implicit tiling is possible, but I don't recommend it for a pure visual use case. Splitting at the median rather than the center makes for even tiling, constant LOD "jumps", and smaller tilesets. It totally counterbalances the smaller JSON you get with implicit tiles, and this holds for meshes and point clouds too in my experience.

I think I used something like 2.5 standard deviations around splats to compute the bounds of the leaves, so leaves do overlap. Whether that's necessary depends on the simplification algorithm, but it can be a big deal: even in detailed areas, large transparent splats can appear.

I have a draft for a gltf extension I'll publish to work with this viewer until an official one comes out. An OGC3DTILES specific extension would be good too.

@lilleyse
Copy link

lilleyse commented Dec 10, 2024

@jo-chemla @ebeaufay - we're working on a draft glTF extension for splats.

The idea is to add _ROTATION and _SCALE attributes to a glTF POINTS primitive. The zeroth-order spherical harmonic is stored in COLOR_0. This is sufficient for a lot of geospatial use cases where we would have previously used unlit photogrammetry meshes.

We're interested in the higher-order spherical harmonics as well. I could see there being multiple glTF extensions that build on top of each other.

Here's what the base extension looks like so far:

| Name | Accessor Type | Component Type | Description |
| --- | --- | --- | --- |
| POSITION | "VEC3" | 5126 (FLOAT) | Gaussian splat position |
| COLOR_0 | "VEC4" | 5121 (UNSIGNED_BYTE) normalized | Color (spherical harmonic 0 (diffuse) and opacity) |
| _ROTATION | "VEC4" | 5126 (FLOAT) | Gaussian splat rotation |
| _SCALE | "VEC3" | 5126 (FLOAT) | Gaussian splat scale |
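As a sketch, a minimal primitive following this draft might look like the JSON below. The accessor indices are placeholders (a real file would define the matching accessors, bufferViews, and buffers), and since the thread hasn't settled on an extension name to flag the primitive as splats, none is shown:

```python
import json

primitive = {
    "mode": 0,  # POINTS
    "attributes": {
        "POSITION": 0,    # VEC3 / FLOAT
        "COLOR_0": 1,     # VEC4 / UNSIGNED_BYTE normalized (SH0 + opacity)
        "_ROTATION": 2,   # VEC4 / FLOAT quaternion
        "_SCALE": 3,      # VEC3 / FLOAT
    },
}
print(json.dumps(primitive, indent=2))
```

A viewer without splat support would still render this as a plain point cloud, which is part of the appeal of building on POINTS.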

We've also seen good results with EXT_mesh_quantization + EXT_meshopt_compression

  • Full bitrate: 16 bits per component for position, scale, and rotation
  • Minimum bitrate: 10 bits per position component, 8 bits per scale and rotation component
  • Rotation uses the quaternion filter, scale uses the exponential filter
| Source | Splats | PLY size | glTF | meshopt glTF (full bitrate) | meshopt glTF (min bitrate) |
| --- | --- | --- | --- | --- | --- |
| Los Reyes (Polycam) | 1,214,130 | 301.1 MB | 77.7 MB | 35.0 MB | 24.8 MB |

Here's what it looks like on a branch of CesiumJS, loading in as 3D Tiles:

[screenshot: splats loading as 3D Tiles in a branch of CesiumJS]

@ebeaufay I'm curious how this compares to your approach.

@ebeaufay
Copy link

ebeaufay commented Dec 10, 2024

@lilleyse,
I encode harmonics in color_0, color_1to3, color_4to8 and color_9to15. I use an extension parameter to specify how many harmonic levels are used per primitive. I'd say harmonics are good to have, at least as an option, because the use cases for splats I've encountered are those where photographic fidelity matters most.

I encode the covariance matrix instead of the scale + quaternion, that way the covariance doesn't need to be reconstructed after decoding/decompressing.
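For reference, the covariance is conventionally built as Σ = R S SᵀRᵀ from the rotation R (derived from the unit quaternion) and the diagonal scale matrix S; storing its six unique entries is what skips this reconstruction at load time. A sketch, assuming an (x, y, z, w) quaternion layout:

```python
import numpy as np

def covariance_from_quat_scale(q: np.ndarray, s: np.ndarray) -> np.ndarray:
    """Sigma = R S S^T R^T for unit quaternion q = (x, y, z, w) and per-axis
    scales s. The 6 unique entries of this symmetric 3x3 matrix are what
    gets stored instead of scale + rotation."""
    x, y, z, w = q / np.linalg.norm(q)
    R = np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])
    M = R * s  # equivalent to R @ np.diag(s)
    return M @ M.T  # symmetric, positive semi-definite by construction
```

The trade-off mentioned above follows from this: the stored six entries quantize less gracefully than scale + quaternion, since quantization error can push the matrix away from positive semi-definiteness.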

Quantization is less efficient, and the matrices might not stay positive-definite during compression, so they need some sort of regularization. Still, fast decoding is important for streaming use cases where 50k splats need to be decoded in under a couple of milliseconds.

I need 24 bits for covariance and 20 for position quantization to get near-lossless compression, which seems like a lot, but I think this depends on the dataset. Luma has huge skydome splats compared to the tiny splats near the cameras.

Also, why not allow encoding the splats as triangle primitives when the extension is specified as "required"? If it's not required, the splats can be viewed as points; when it is required, they can only be viewed as splats, which would allow Draco. Perhaps test it with your spec, but Draco gave me far better results than quantize and meshopt.

@pjcozzi
Copy link

pjcozzi commented Dec 14, 2024

@ebeaufay great discussion. As part of the Metaverse Standards Forum, we are planning to host a virtual industry-wide townhall on the potential for an open standard for gsplats on January 22, 2025. We would love for you to join to present/demo your ideas for ~7-10 minutes. @LossieFree is helping organizing and will email you at [email protected] (per your GitHub profile).

@jo-chemla
Copy link

Looks like this Forum is going to be a very interesting place to discuss existing tooling and efforts. Is it open, @pjcozzi @LossieFree?

@pjcozzi
Copy link

pjcozzi commented Dec 16, 2024

@jo-chemla for sure, the virtual town hall will be open to everyone. We can add a link to this thread once it is posted. @LossieFree
