From e70027341ae34a1c29a62e2c3e8e8b1a95c47d02 Mon Sep 17 00:00:00 2001
From: Richard Vargas
Date: Mon, 24 Jan 2022 12:47:16 -0800
Subject: [PATCH 1/3] Updated steps on creating md5 hash.

---
 docs/1.7/slpk_hashtable.cmn.md | 1 -
 docs/1.8/slpk_hashtable.cmn.md | 37 ++++++++
 ...xed 3d Scene Layer Format Specification.md | 93 ++++++-------------
 3 files changed, 64 insertions(+), 67 deletions(-)
 create mode 100644 docs/1.8/slpk_hashtable.cmn.md

diff --git a/docs/1.7/slpk_hashtable.cmn.md b/docs/1.7/slpk_hashtable.cmn.md
index 49ae890..a298861 100644
--- a/docs/1.7/slpk_hashtable.cmn.md
+++ b/docs/1.7/slpk_hashtable.cmn.md
@@ -7,7 +7,6 @@ A hash table is a data structure that implements an associative array abstract d
 ## To create SLPK hash table
 1. The offset of each SLPK file is known. For example, the byte offset from the beginning of the SLPK file to the first byte of its ZIP local file header. See ZIP specification for reference.
 2. Convert all file paths to their canonical path. Canonical paths must:
-   - Be lower case
    - Use a forward slash as the path separator `/`
    - Not contain a heading forward slash
    - Example: `/my/PATH.json` converts to `my/path.json`
diff --git a/docs/1.8/slpk_hashtable.cmn.md b/docs/1.8/slpk_hashtable.cmn.md
new file mode 100644
index 0000000..da32284
--- /dev/null
+++ b/docs/1.8/slpk_hashtable.cmn.md
@@ -0,0 +1,37 @@
+# SLPK Hash Table
+
+Scanning an SLPK (ZIP store) containing millions of documents is usually inefficient and slow. A hash table file may be added to the SLPK to improve first load and file scanning performances.
+
+A hash table is a data structure that implements an associative array abstract data type, a structure that can map keys to values. A hash table uses a hash function to compute an index, also called a hash code, into an array of buckets or slots, from which the desired value can be found (Wikipedia).
+
+## To create SLPK hash table
+1. The offset of each SLPK file is known.
For example, the byte offset from the beginning of the SLPK file to the first byte of its ZIP local file header. See ZIP specification for reference.
+2. Convert all file paths to their canonical path. Canonical paths must:
+   - Use a forward slash as the path separator `/`
+   - Not contain a heading forward slash
+   - Example: `/my/PATH.json` converts to `my/path.json`
+3. Compute the MD5 128 bit-hash for each canonical path to create an array of key-value pairs [MD5-digest ->Offset 64bit].
+
+4. Sort the key-value pairs by ascending keys using the following comparison based on little-endian architecture:
+```cpp
+    //for performance the following C++ comparator is used: (**little-endian**)
+    typedef std::array< unsigned char, 16 > md5_hash;
+    bool less_than( const md5_hash& hash_a, const md5_hash& hash_b )
+    {
+        const uint64* a = reinterpret_cast<const uint64*>(&hash_a[0]);
+        const uint64* b = reinterpret_cast<const uint64*>(&hash_b[0]);
+        return a[0] == b[0] ? a[1] < b[1] : a[0] < b[0];
+    }
+```
+5. Write this sorted array as the last file of the SLPK archive (last entry in the ZIP central directory). The file must be named `@specialIndexFileHASH128@`. Each array element is 24-bytes long, and includes the following restrictions:
+   - 16 bytes for the MD5-digest and 8 bytes for the offset.
+   - Must be in little-endian order.
+   - Must **not** contain padding.
+   - Must **not** contain a header.
+## To read SLPK hash table
+1. Convert the input path to the canonical path and compute its MD5 hash (i.e. key).
+
+2. Search for key in the hash table. This can be easily implemented as a binary (i.e. dichotomic) search since the keys are sorted.
+
+3. Retrieve the file from the ZIP archive using the offset associated with key.
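
Reviewer note (not part of the patch): the sort, write, and read steps described in `docs/1.8/slpk_hashtable.cmn.md` can be sketched in portable C++ as below. This is a minimal sketch under stated assumptions, not the reference implementation. It assumes the MD5 digests are computed elsewhere, that each 24-byte element is the 16-byte digest followed by the offset as an 8-byte little-endian integer, and that keys compare as two little-endian 64-bit words, which is what the patch's comparator does on a little-endian host. The names `hash_entry`, `load_le64`, `key_less`, `sort_entries`, `serialize`, and `find_offset` are hypothetical and chosen only for illustration.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

using md5_hash = std::array<unsigned char, 16>;

// One hash table element (hypothetical name): digest of the canonical
// path plus the byte offset of the file's ZIP local file header.
struct hash_entry {
    md5_hash      key;
    std::uint64_t offset;
};

// Read 8 bytes as a little-endian integer, independent of host byte order.
inline std::uint64_t load_le64(const unsigned char* p) {
    std::uint64_t v = 0;
    for (int i = 7; i >= 0; --i) v = (v << 8) | p[i];
    return v;
}

// Same ordering as the patch's comparator on a little-endian machine:
// compare the first 64-bit word, then the second.
inline bool key_less(const md5_hash& a, const md5_hash& b) {
    const std::uint64_t a0 = load_le64(a.data());
    const std::uint64_t b0 = load_le64(b.data());
    if (a0 != b0) return a0 < b0;
    return load_le64(a.data() + 8) < load_le64(b.data() + 8);
}

// Sort step: ascending keys.
inline void sort_entries(std::vector<hash_entry>& entries) {
    std::sort(entries.begin(), entries.end(),
              [](const hash_entry& x, const hash_entry& y) {
                  return key_less(x.key, y.key);
              });
}

// Write step (assumed layout): 24 bytes per element, no header, no padding.
inline std::vector<unsigned char> serialize(const std::vector<hash_entry>& entries) {
    std::vector<unsigned char> out;
    out.reserve(entries.size() * 24);
    for (const hash_entry& e : entries) {
        out.insert(out.end(), e.key.begin(), e.key.end());
        for (int i = 0; i < 8; ++i)
            out.push_back(static_cast<unsigned char>((e.offset >> (8 * i)) & 0xFF));
    }
    return out;
}

// Read step: binary search over the sorted entries.
// Returns true and sets offset_out when the key is present.
inline bool find_offset(const std::vector<hash_entry>& sorted,
                        const md5_hash& key, std::uint64_t& offset_out) {
    auto it = std::lower_bound(sorted.begin(), sorted.end(), key,
                               [](const hash_entry& e, const md5_hash& k) {
                                   return key_less(e.key, k);
                               });
    if (it == sorted.end() || key_less(key, it->key)) return false;
    offset_out = it->offset;
    return true;
}
```

Reading bytes explicitly instead of using `reinterpret_cast` keeps the ordering identical on big-endian hosts and avoids alignment and aliasing concerns; whether that trade-off matters depends on the target platforms.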
+ diff --git a/format/Indexed 3d Scene Layer Format Specification.md b/format/Indexed 3d Scene Layer Format Specification.md index f71a763..22f3350 100644 --- a/format/Indexed 3d Scene Layer Format Specification.md +++ b/format/Indexed 3d Scene Layer Format Specification.md @@ -1,14 +1,14 @@ # Esri Indexed 3d Scene Layer (I3S) and Scene Layer Package (*.slpk) Format Specification -Version 1.8. May 21, 2021 +Version 1.8. June 30, 2019 -*Contributors:* Chris Andrews, Tamrat Belayneh, Jillian Foster, Javier Gutierrez, Markus Lipp, Sud Menon, Pascal Müller, Dragan Petrovic, Ronald Poirrier, Simon Reinhard, Juan Ruiz, Johannes Schmid, Ivonne Seler, Chengliang Shan,Thorsten Reitz, Ben Tan, Richard Vargas, Moxie Zhang +*Contributors:* Chris Andrews, Tamrat Belayneh, Jillian Foster, Javier Gutierrez, Markus Lipp, Sud Menon, Pascal Müller, Dragan Petrovic, Ronald Poirrier, Simon Reinhard, Juan Ruiz, Johannes Schmid, Ivonne Seler, Chengliang Shan,Thorsten Reitz, Ben Tan, Richard Vargas, Moxie Zhang *Acknowledgements:* Bart van Andel, Fabien Dachicourt, Carl Reed --- -The Indexed 3D Scene Layer (I3S) format is an open 3D content delivery format used to rapidly stream and distribute large volumes of 3D GIS data to mobile, web and desktop clients. I3S content can be shared across enterprise systems using both physical and cloud servers. ArcGIS Scene Layers and [Scene Services](http://server.arcgis.com/en/server/latest/publish-services/windows/scene-services.htm) use the I3S infrastructure. See the [version history on the main ReadMe](../README.md) for more details about previous versions of I3S as well as information on compatability with the OGC I3S Community Standard. +The Indexed 3D Scene Layer (I3S) format is an open 3D content delivery format used to rapidly stream and distribute large volumes of 3D GIS data to mobile, web and desktop clients. I3S content can be shared across enterprise systems using both physical and cloud servers. 
ArcGIS Scene Layers and [Scene Services](http://server.arcgis.com/en/server/latest/publish-services/windows/scene-services.htm) use the I3S infrastructure. See the [version history on the main ReadMe](../README.md) for more details about previous versions of I3S as well as information on compatability with the OGC I3S Community Standard. # Table of Contents @@ -31,7 +31,6 @@ The Indexed 3D Scene Layer (I3S) format is an open 3D content delivery format us  [I3SREST](#i3sREST)
 [Extracted Scene Layer Package](#ESLPK)
 [Scene Layer Packages](#SLPK)
-  [1.8 SLPK Structure](#1.8-SLPK-Structure)
  [1.7 SLPK Structure](#1.7-SLPK-Structure)
  [1.6 SLPK Structure](#1.6-SLPK-Structure)
  [Metadata](#metadata)
@@ -50,7 +49,7 @@ Scene Layers provide a structured way for clients to store and visualize large v I3S is designed to support large datasets of 3D content ranging from local to global extent containing detailed features. Clients can visualize scene layers by taking advantage of the multi-LoD (level of detail) representation and symbology to create the right user experience for visualizing the 3D content. The I3S format continues to evolve and new functionality continues to be added. Previous versions of I3S (SLPK) can be converted and validated using the [I3S Converter](../i3s_converter/i3s_converter_ReadMe.md). You can find an overview of the[Version History of I3S](../README.md). -I3S is organized as [nodes](#Nodes), which are structured into [node pages](#NodePages). The node page includes the bounding volume, child reference, count and the [level of detail selection](LevelofDetail.md). Nodes contain all the information to describe features including [geometry](#Geometry), [attributes](#Attributes) and [textures](#textures). Scene Layers can be created in cartesian 3D or in global 3D world [coordinate reference systems](#CRS). I3S Scene Layers can be delivered to web, mobile and desktop clients. Most users will interact with Scene Layers using applications with cloud or server-based information. In these cases, the scene layer content is on the server and is provided to clients through a RESTful interface. These web addressable resources provide access to the scene layer, nodes, and associated resources. Alternatively, a scene layer can be delivered as a Scene Layer Package. This is a single file that includes the complete node tree and all necessary resources in an archive. +I3S is organized as [nodes](#Nodes), which are structured into [node pages](#NodePages). The node page includes the bounding volume, child reference, count and the [level of detail selection](LevelofDetail.md). 
Nodes contain all the information to describe features including [geometry](#Geometry), [attributes](#Attributes) and [textures](#textures). Scene Layers can be created in cartesian 3D or in global 3D world [coordinate reference systems](#CRS). I3S Scene Layers can be delivered to web, mobile and desktop clients. Most users will interact with Scene Layers using applications with cloud or server-based information. In these cases, the scene layer content is on the server and is provided to clients through a RESTful interface. These web addressable resources provide access to the scene layer, nodes, and associated resources. Alternatively, a scene layer can be delivered as a Scene Layer Package. This is a single file that includes the complete node tree and all necessary resources in an archive. # Organization and Structure @@ -61,7 +60,7 @@ To ensure high performance when visualizing 3D content, data are spatially group The bounding volume is defined either as minimum bounding sphere (MBS) or oriented bounding box (OBB) representation. ![Minimum bounding sphere..](../docs/img/MBS_Example.png) - + *3D objects enclosed in minimum bounding spheres.* ![Oriented bounding box.](../docs/img/OBB_Example.png) @@ -82,7 +81,7 @@ In order to provide a scalable representation of the original data, parent nodes ## Nodes -In a Scene Layer, data are spatially grouped into nodes. The nodes contain node resources for the bounding volume. Each node has a unique identifier, which allows clients to efficiently locate and load the resources. +In a Scene Layer, data are spatially grouped into nodes. The nodes contain node resources for the bounding volume. Each node has a unique identifier, which allows clients to efficiently locate and load the resources. 
### Feature @@ -108,9 +107,9 @@ In addition to a bounding volume, each node contains references to node resource *Node resource for backward compatibility with 1.6* | Node Resources |Integrated Mesh | 3D Object | Building Scene Layer | -| -------------- | ----------------|---------------------------- | ---------------------------- | +| -------------- | ----------------|---------------------------- | ---------------------------- | | sharedResources | ![yes](images/checkmark.png) |![yes](images/checkmark.png) |![yes](images/checkmark.png) | -| 3dNodeIndexDocument|![yes](images/checkmark.png) |![yes](images/checkmark.png) | ![yes](images/checkmark.png) | +| 3dNodeIndexDocument|![yes](images/checkmark.png) |![yes](images/checkmark.png) | ![yes](images/checkmark.png) | *Note: All binary data is stored in little endian.* @@ -119,7 +118,7 @@ In addition to a bounding volume, each node contains references to node resource Depending on the scene layer type and the version of I3S, different geometry representations are used. For example, an integrated mesh scene layer geometry data includes all vertex attributes, feature counts, and mesh segmentation. -In I3S version 1.7 3D Objects and Integrated Mesh scene layer are using [geometryBuffer](../docs/1.7/geometryBuffer.cmn.md) with draco compression to describe the geometry. Previous versions of 3D Object and Integrated Mesh scene layers (1.6 and earlier) define geometry in the [defaultGeometrySchema](../docs/1.6/defaultGeometrySchema.cmn.md). The expected triangle/face winding order in all geometry resources is counterclockwise. +In I3S version 1.7 3D Objects and Integrated Mesh scene layer are using [geometryBuffer](../docs/1.7/geometryBuffer.cmn.md) with draco compression to describe the geometry. Previous versions of 3D Object and Integrated Mesh scene layers (1.6 and earlier) define geometry in the [defaultGeometrySchema](../docs/1.6/defaultGeometrySchema.cmn.md). 
The expected triangle/face winding order in all geometry resources is counterclockwise. Point and Point cloud layers model geometries as points. Point scene layer define the geometry in [featureData](../docs/1.6/featureData.cmn.md). For a Point Cloud Scene Layer the binary [geometry](../docs/2.0/defaultGeometrySchema.pcsl.md") is lepcc-xyz compressed. @@ -134,7 +133,7 @@ Metadata on each attribute resource is made available to clients via the scene s #### Textures -The texture resource contains texture image files. Textures are stored as a binary resource. Individual textures should be aggregated into texture atlases (An image containing a collection of smaller images.). Client capabilities for handling complex UV cases may vary, so texture coordinates are used. Texture coordinates do not take atlas regions into account directly. The client is expected to use the sub-image region values and the texture coordinates to best handle repeating textures in atlases. As of I3S version 1.8, the recommended compressed texture format is [Basis Universal](https://github.com/BinomialLLC/basis_universal) in [Khronos KTX2™️](http://github.khronos.org/KTX-Specification/) container format. The benefits of this texture format can be seen [in this blog](https://www.esri.com/arcgis-blog/products/arcgis/3d-gis/esri-collaborates-with-binomial-to-improve-basis-universal-texture-compression-speeds/). +The texture resource contains texture image files. Textures are stored as a binary resource. Individual textures should be aggregated into texture atlases (An image containing a collection of smaller images.). Client capabilities for handling complex UV cases may vary, so texture coordinates are used. Texture coordinates do not take atlas regions into account directly. The client is expected to use the sub-image region values and the texture coordinates to best handle repeating textures in atlases. 
As of I3S version 1.8, the recommended compressed texture format is [Basis Universal](https://github.com/BinomialLLC/basis_universal) in [Khronos KTX2™️](http://github.khronos.org/KTX-Specification/) container format. The benefits of this texture format can be seen [in this blog](https://www.esri.com/arcgis-blog/products/arcgis/3d-gis/esri-collaborates-with-binomial-to-improve-basis-universal-texture-compression-speeds/). For more details, see the [texture](../docs/1.8/texture.cmn.md) and [textureSetDefinition](../docs/1.8/textureSetDefinition.cmn.md). @@ -143,10 +142,10 @@ For more details, see the [texture](../docs/1.8/texture.cmn.md) and [textureSetD To ensure backward compatibility with 1.6 clients, a 1.7 scene layer needs to also include the [3dNodeIndexDocument](../docs/1.7/3DNodeIndexDocument.cmn.md) resource as well as the [sharedResources](../docs/1.7/sharedResource.cmn.md) available for any node. SharedResource includes the material definition of the node. ## Node Page -In version 1.6 and earlier, each node is stored individually as a 3DNodeIndexDocument, causing the tree traversal performance to be negatively impacted due to the large number of small resource requests required. Version 1.7 packs many nodes into a single resource called a node page. +In version 1.6 and earlier, each node is stored individually as a 3DNodeIndexDocument, causing the tree traversal performance to be negatively impacted due to the large number of small resource requests required. Version 1.7 packs many nodes into a single resource called a node page. These node pages are created by representing the tree as a flat array of nodes where internal nodes reference their children by their array indices. -I3S creators are free to use any ordering (e.g. breadth first, depth first) of the nodes into a flat array of nodes. In version 1.7, the ID for a node is an integer that represents the index of the node within this flattened array. +I3S creators are free to use any ordering (e.g. 
breadth first, depth first) of the nodes into a flat array of nodes. In version 1.7, the ID for a node is an integer that represents the index of the node within this flattened array. ![bounding volume hierarchy tree](../docs/img/BoundingVolumeHierarchyTree.png) ![node page](../docs/img/NodePageArray.png) @@ -160,7 +159,7 @@ I3S creators are free to use any ordering (e.g. breadth first, depth first) of t The I3S specification supports specifying the Coordinate Reference System (CRS) as a Well Known Text, as defined in clause 6.4 in OGC Simple Features [99-036/ISO 19125](http://portal.opengeospatial.org/files/?artifact_id=13227) standard. I3S also supports specifying CRS in the WKT standard [CRS/ISO 19162:2015](http://docs.opengeospatial.org/is/12-063r5/12-063r5.html), Geographic information – Well-known text representation of coordinate reference systems, which provided an update to the original WKT representation. The two standards are referred to as WKT1 and WKT2 respectively. -In I3S implementation the CRS may be represented using either WKT1 or WKT2. While WKT1 has been in use for many years, WKT1 has been superseded by WKT2. Although implementations of OGC standards using WKT2 are not yet widely available, the guidance from the OGC/ISO community is to implement WKT2. +In I3S implementation the CRS may be represented using either WKT1 or WKT2. While WKT1 has been in use for many years, WKT1 has been superseded by WKT2. Although implementations of OGC standards using WKT2 are not yet widely available, the guidance from the OGC/ISO community is to implement WKT2. WKT1 does not support explicit definition of axis order. Therefore, I3S implementers need to note for their implementations if they support WKT1 only or both (as WKT2 requires continued support of WKT1). In addition, please note that not all ArcGIS clients support WKT2 yet. @@ -176,7 +175,7 @@ All I3S profiles support writing 3D content in two modes: *global* and *local*. 
In both modes, node index and position vertex must have the same CRS. In addition, all vertex positions are specified as an *offset* from a node's Minimum Bounding Volume (MBV) center. The MBV could be specified as a Minimum Bounding Sphere (MBS) or as an Oriented Bounding Box (OBB). -All vertex positions SHALL be specified using a geodetic CRS (including Cartesian coordinate reference systems), where x,y,z axes are all in same unit, and with a per-node offset (from the center point of the node's minimum bounding sphere) for all vertex positions. +All vertex positions SHALL be specified using a geodetic CRS (including Cartesian coordinate reference systems), where x,y,z axes are all in same unit, and with a per-node offset (from the center point of the node's minimum bounding sphere) for all vertex positions. Axis Order: Axis order explicitly defined by the CRS SHALL be used when present. When the axis order is not defined by the CRS, Easting, Northing, Height axis order SHALL be used. The Height axis SHALL always point upwards towards the sky (away from the center of the earth). @@ -191,7 +190,7 @@ The location of all vertex positions and index-related data structures, such as For an I3S layer to be in a *local* mode the following requirements must be met: -All vertex positions are specified using geodetic CRS, identified by an EPSG code. Any CRS with an EPSG code *other* than 4326 or 4490 will be treated as in a local mode. +All vertex positions are specified using geodetic CRS, identified by an EPSG code. Any CRS with an EPSG code *other* than 4326 or 4490 will be treated as in a local mode. - All three components of a vertex position (XYZ) and the Minimum Bounding Volume (MBV) radius (for MBS) or halfSize (for OBB) need to be in the same unit. @@ -207,7 +206,7 @@ The heightModelInfo, included in the 3DSceneLayerInfo resource, is used by clien # I3S Services -A RESTful API allows access to I3S scene layers. 
Each scene layer profile has different components and features. For details on the API of a specific profile and version, refer to the individual README documents. +A RESTful API allows access to I3S scene layers. Each scene layer profile has different components and features. For details on the API of a specific profile and version, refer to the individual README documents. Version 1.7 support for [3D Objects](../docs/1.7/3Dobject_ReadMe.md), [Integrated Mesh](../docs/1.7/IntegratedMesh_ReadMe.md), [Building](../docs/1.7/BSL_ReadMe.md), and [Point](../docs/1.7/Point_ReadMe.md) @@ -238,7 +237,7 @@ An SLPK is a [zip](https://en.wikipedia.org/wiki/Zip_(file_format)) archive cont Both 64-bit and 32-bit zip archives are supported. 64-bit is required for datasets larger than 2GB. -Please note that this method is slightly different than a typical zip archive. In general, when a file is added to a zip archive, the new file is individually compressed, and the overall archive is compressed. **That is not the case for SLPK.** When adding files to an SLPK, the new file is compressed, but the overall archive remains uncompressed and is archived using compression level not compressed (`STORE`). +Please note that this method is slightly different than a typical zip archive. In general, when a file is added to a zip archive, the new file is individually compressed, and the overall archive is compressed. **That is not the case for SLPK.** When adding files to an SLPK, the new file is compressed, but the overall archive remains uncompressed and is archived using compression level not compressed (`STORE`). This is an example of a geometry resource opened in 7-zip. Notice that both the Size and the Packed Size are equal. The method is `STORE`. @@ -246,7 +245,7 @@ This is an example of a geometry resource opened in 7-zip. Notice that both the **File Extensions** -SLPK require file extensions to determine the file type. +SLPK require file extensions to determine the file type. 
Here are a few examples of SLPK file extensions: @@ -257,46 +256,7 @@ Here are a few examples of SLPK file extensions: **Hash** -In I3S verison 1.7, an [MD5](https://en.wikipedia.org/wiki/MD5) [hash](../docs/1.7/slpk_hashtable.cmn.md) is used to improve loading time. The hash must be the last item at the end of the central directory and named `@specialIndexFileHASH128@`. - - #### Example 1.8 SLPK Structure Summary for 3D Objects - - ``` - .\example_17.slpk - +--nodePages - | +--0.json.gz - | +-- (...) - +--nodes - | +--root - | | +--3dNodeIndexDocument.json.gz - | +--0 - | | +--attributes - | | | +--f_0 - | | | | +--0.bin.gz - | | | +--(...) - | | +--features - | | | +-- 0.json.gz - | | | +--(...) - | | +--geometries - | | | +-- 0.bin.gz - | | | +--(...) - | | +--textures - | | | +--0.jpg - | | | +--0_0_1.bin.dds.gz - | | | +--1.Ktx2 - | | | +--(...) - | | +--shared - | | | +--sharedResource.json.gz - | | + 3dNodeIndexDocument.json.gz - | +--(...) - +--statistics - | +--f_1 - | | +--0.json.gz - | +--(...) - +--3dSceneLayer.json.gz - +--@specialIndexFileHASH128@ - ``` - +In I3S verison 1.7, an [MD5](https://en.wikipedia.org/wiki/MD5) [hash](../docs/1.7/slpk_hashtable.cmn.md) was introduced to improve loading time. The hash must be the last item at the end of the central directory and named `@specialIndexFileHASH128@`. #### Example 1.7 SLPK Structure Summary for 3D Objects @@ -323,7 +283,7 @@ In I3S verison 1.7, an [MD5](https://en.wikipedia.org/wiki/MD5) [hash](../docs/1 | | | +--0.jpg | | | +--0_0_1.bin.dds.gz | | | +--(...) - | | +--shared + | | +--shared | | | +--sharedResource.json.gz | | + 3dNodeIndexDocument.json.gz | +--(...) @@ -340,7 +300,7 @@ Paths are the same as in the API, but without the `layers/0` prefix. 
Exceptions |-----|---|------| |Scene layer document|3dSceneLayer.json.gz|layers/0| |Legacy node resource|/nodes/4/3dNodeIndexDocument.json.gz|layers/0/nodes/4| -|Legacy shared resource|/nodes/4/shared/sharedResource.json.gz|layers/0/nodes/4/shared| +|Legacy shared resource|/nodes/4/shared/sharedResource.json.gz|layers/0/nodes/4/shared|
#### Example 1.6 Structure Summary for 3D Objects @@ -365,14 +325,14 @@ Paths are the same as in the API, but without the `layers/0` prefix. Exceptions | | | +--0.jpg | | | +--0_0_1.bin.dds.gz | | | +--(...) - | | +--shared + | | +--shared | | | +--sharedResource.json.gz | | +--3dNodeIndexDocument.json.gz | +--0-0 | | +--(...) | +--0-0-0 | | +--(...) - | +--1 + | +--1 | | +--(...) | +--1-0 | | +--(...) @@ -396,19 +356,19 @@ Scene layer packages (SLPK) contain metadata information regarding its content i |folderPattern | One of {BASIC, EXTENDED},
Default is {EXTENDED} | |archiveCompressionType | One of {STORE, DEFLATE64, [DEFLATE]},
Default is {STORE} | |resourceCompressionType | One of {GZIP, NONE}, Default is {GZIP} | -|I3SVersion | One of {1.2, 1.3, 1.4, 1.6, 1.7, 1.8, 2.0},
Default is {1.8} (Point cloud is {2.0}) | +|I3SVersion | One of {1.2, 1.3, 1.4, 1.6, 1.7, 2.0},
Default is {1.7} (Point cloud is {2.0}) | |nodeCount | Total number of nodes in the SLPK |
-**Example of 1.8 Metadata json** +**Example of 1.7 Metadata json** ``` .\metadata.json { "folderPattern":"BASIC", "archiveCompressionType":"STORE", "resourceCompressionType":"GZIP", - "I3SVersion":"1.8", + "I3SVersion":"1.7", "nodeCount":62 } ``` @@ -424,3 +384,4 @@ Scene layer packages (SLPK) contain metadata information regarding its content i "nodeCount":1156 } ``` + From b49f4695a07e595843260db96f09e4345f0427d9 Mon Sep 17 00:00:00 2001 From: Richard Vargas Date: Mon, 24 Jan 2022 12:50:46 -0800 Subject: [PATCH 2/3] Revert "Updated steps on creating md5 hash." This reverts commit e70027341ae34a1c29a62e2c3e8e8b1a95c47d02. --- docs/1.7/slpk_hashtable.cmn.md | 1 + docs/1.8/slpk_hashtable.cmn.md | 37 -------- ...xed 3d Scene Layer Format Specification.md | 93 +++++++++++++------ 3 files changed, 67 insertions(+), 64 deletions(-) delete mode 100644 docs/1.8/slpk_hashtable.cmn.md diff --git a/docs/1.7/slpk_hashtable.cmn.md b/docs/1.7/slpk_hashtable.cmn.md index a298861..49ae890 100644 --- a/docs/1.7/slpk_hashtable.cmn.md +++ b/docs/1.7/slpk_hashtable.cmn.md @@ -7,6 +7,7 @@ A hash table is a data structure that implements an associative array abstract d ## To create SLPK hash table 1. The offset of each SLPK file is known. For example, the byte offset from the beginning of the SLPK file to the first byte of its ZIP local file header. See ZIP specification for reference. 2. Convert all file paths to their canonical path. Canonical paths must: + - Be lower case - Use a forward slash as the path separator `/` - Not contain a heading forward slash - Example: `/my/PATH.json` converts to `my/path.json` diff --git a/docs/1.8/slpk_hashtable.cmn.md b/docs/1.8/slpk_hashtable.cmn.md deleted file mode 100644 index da32284..0000000 --- a/docs/1.8/slpk_hashtable.cmn.md +++ /dev/null @@ -1,37 +0,0 @@ -# SLPK Hash Table - -Scanning an SLPK (ZIP store) containing millions of documents is usually inefficient and slow. 
A hash table file may be added to the SLPK to improve first load and file scanning performances.
-
-A hash table is a data structure that implements an associative array abstract data type, a structure that can map keys to values. A hash table uses a hash function to compute an index, also called a hash code, into an array of buckets or slots, from which the desired value can be found (Wikipedia).
-
-## To create SLPK hash table
-1. The offset of each SLPK file is known. For example, the byte offset from the beginning of the SLPK file to the first byte of its ZIP local file header. See ZIP specification for reference.
-2. Convert all file paths to their canonical path. Canonical paths must:
-   - Use a forward slash as the path separator `/`
-   - Not contain a heading forward slash
-   - Example: `/my/PATH.json` converts to `my/path.json`
-3. Compute the MD5 128 bit-hash for each canonical path to create an array of key-value pairs [MD5-digest ->Offset 64bit].
-
-4. Sort the key-value pairs by ascending keys using the following comparison based on little-endian architecture:
-```cpp
-    //for performance the following C++ comparator is used: (**little-endian**)
-    typedef std::array< unsigned char, 16 > md5_hash;
-    bool less_than( const md5_hash& hash_a, const md5_hash& hash_b )
-    {
-        const uint64* a = reinterpret_cast<const uint64*>(&hash_a[0]);
-        const uint64* b = reinterpret_cast<const uint64*>(&hash_b[0]);
-        return a[0] == b[0] ? a[1] < b[1] : a[0] < b[0];
-    }
-```
-5. Write this sorted array as the last file of the SLPK archive (last entry in the ZIP central directory). The file must be named `@specialIndexFileHASH128@`. Each array element is 24-bytes long, and includes the following restrictions:
-   - 16 bytes for the MD5-digest and 8 bytes for the offset.
-   - Must be in little-endian order.
-   - Must **not** contain padding.
-   - Must **not** contain a header.
-## To read SLPK hash table
-1. Convert the input path to the canonical path and compute its MD5 hash (i.e. key).
-
-2.
Search for key in the hash table. This can be easily implemented as a binary (i.e. dichotomic) search since the keys are sorted. - -3. Retrieve the file from the ZIP archive using the offset associated with key. - diff --git a/format/Indexed 3d Scene Layer Format Specification.md b/format/Indexed 3d Scene Layer Format Specification.md index 22f3350..f71a763 100644 --- a/format/Indexed 3d Scene Layer Format Specification.md +++ b/format/Indexed 3d Scene Layer Format Specification.md @@ -1,14 +1,14 @@ # Esri Indexed 3d Scene Layer (I3S) and Scene Layer Package (*.slpk) Format Specification -Version 1.8. June 30, 2019 +Version 1.8. May 21, 2021 -*Contributors:* Chris Andrews, Tamrat Belayneh, Jillian Foster, Javier Gutierrez, Markus Lipp, Sud Menon, Pascal Müller, Dragan Petrovic, Ronald Poirrier, Simon Reinhard, Juan Ruiz, Johannes Schmid, Ivonne Seler, Chengliang Shan,Thorsten Reitz, Ben Tan, Richard Vargas, Moxie Zhang +*Contributors:* Chris Andrews, Tamrat Belayneh, Jillian Foster, Javier Gutierrez, Markus Lipp, Sud Menon, Pascal Müller, Dragan Petrovic, Ronald Poirrier, Simon Reinhard, Juan Ruiz, Johannes Schmid, Ivonne Seler, Chengliang Shan,Thorsten Reitz, Ben Tan, Richard Vargas, Moxie Zhang *Acknowledgements:* Bart van Andel, Fabien Dachicourt, Carl Reed --- -The Indexed 3D Scene Layer (I3S) format is an open 3D content delivery format used to rapidly stream and distribute large volumes of 3D GIS data to mobile, web and desktop clients. I3S content can be shared across enterprise systems using both physical and cloud servers. ArcGIS Scene Layers and [Scene Services](http://server.arcgis.com/en/server/latest/publish-services/windows/scene-services.htm) use the I3S infrastructure. See the [version history on the main ReadMe](../README.md) for more details about previous versions of I3S as well as information on compatability with the OGC I3S Community Standard. 
+The Indexed 3D Scene Layer (I3S) format is an open 3D content delivery format used to rapidly stream and distribute large volumes of 3D GIS data to mobile, web and desktop clients. I3S content can be shared across enterprise systems using both physical and cloud servers. ArcGIS Scene Layers and [Scene Services](http://server.arcgis.com/en/server/latest/publish-services/windows/scene-services.htm) use the I3S infrastructure. See the [version history on the main ReadMe](../README.md) for more details about previous versions of I3S as well as information on compatability with the OGC I3S Community Standard. # Table of Contents @@ -31,6 +31,7 @@ The Indexed 3D Scene Layer (I3S) format is an open 3D content delivery format us  [I3SREST](#i3sREST)
 [Extracted Scene Layer Package](#ESLPK)
 [Scene Layer Packages](#SLPK)
+  [1.8 SLPK Structure](#1.8-SLPK-Structure)
  [1.7 SLPK Structure](#1.7-SLPK-Structure)
  [1.6 SLPK Structure](#1.6-SLPK-Structure)
  [Metadata](#metadata)
@@ -49,7 +50,7 @@ Scene Layers provide a structured way for clients to store and visualize large v I3S is designed to support large datasets of 3D content ranging from local to global extent containing detailed features. Clients can visualize scene layers by taking advantage of the multi-LoD (level of detail) representation and symbology to create the right user experience for visualizing the 3D content. The I3S format continues to evolve and new functionality continues to be added. Previous versions of I3S (SLPK) can be converted and validated using the [I3S Converter](../i3s_converter/i3s_converter_ReadMe.md). You can find an overview of the[Version History of I3S](../README.md). -I3S is organized as [nodes](#Nodes), which are structured into [node pages](#NodePages). The node page includes the bounding volume, child reference, count and the [level of detail selection](LevelofDetail.md). Nodes contain all the information to describe features including [geometry](#Geometry), [attributes](#Attributes) and [textures](#textures). Scene Layers can be created in cartesian 3D or in global 3D world [coordinate reference systems](#CRS). I3S Scene Layers can be delivered to web, mobile and desktop clients. Most users will interact with Scene Layers using applications with cloud or server-based information. In these cases, the scene layer content is on the server and is provided to clients through a RESTful interface. These web addressable resources provide access to the scene layer, nodes, and associated resources. Alternatively, a scene layer can be delivered as a Scene Layer Package. This is a single file that includes the complete node tree and all necessary resources in an archive. +I3S is organized as [nodes](#Nodes), which are structured into [node pages](#NodePages). The node page includes the bounding volume, child reference, count and the [level of detail selection](LevelofDetail.md). 
Nodes contain all the information to describe features including [geometry](#Geometry), [attributes](#Attributes) and [textures](#textures). Scene Layers can be created in cartesian 3D or in global 3D world [coordinate reference systems](#CRS). I3S Scene Layers can be delivered to web, mobile and desktop clients. Most users will interact with Scene Layers using applications with cloud or server-based information. In these cases, the scene layer content is on the server and is provided to clients through a RESTful interface. These web addressable resources provide access to the scene layer, nodes, and associated resources. Alternatively, a scene layer can be delivered as a Scene Layer Package. This is a single file that includes the complete node tree and all necessary resources in an archive. # Organization and Structure @@ -60,7 +61,7 @@ To ensure high performance when visualizing 3D content, data are spatially group The bounding volume is defined either as minimum bounding sphere (MBS) or oriented bounding box (OBB) representation. ![Minimum bounding sphere..](../docs/img/MBS_Example.png) - + *3D objects enclosed in minimum bounding spheres.* ![Oriented bounding box.](../docs/img/OBB_Example.png) @@ -81,7 +82,7 @@ In order to provide a scalable representation of the original data, parent nodes ## Nodes -In a Scene Layer, data are spatially grouped into nodes. The nodes contain node resources for the bounding volume. Each node has a unique identifier, which allows clients to efficiently locate and load the resources. +In a Scene Layer, data are spatially grouped into nodes. The nodes contain node resources for the bounding volume. Each node has a unique identifier, which allows clients to efficiently locate and load the resources. 
### Feature @@ -107,9 +108,9 @@ In addition to a bounding volume, each node contains references to node resource *Node resource for backward compatibility with 1.6* | Node Resources |Integrated Mesh | 3D Object | Building Scene Layer | -| -------------- | ----------------|---------------------------- | ---------------------------- | +| -------------- | ----------------|---------------------------- | ---------------------------- | | sharedResources | ![yes](images/checkmark.png) |![yes](images/checkmark.png) |![yes](images/checkmark.png) | -| 3dNodeIndexDocument|![yes](images/checkmark.png) |![yes](images/checkmark.png) | ![yes](images/checkmark.png) | +| 3dNodeIndexDocument|![yes](images/checkmark.png) |![yes](images/checkmark.png) | ![yes](images/checkmark.png) | *Note: All binary data is stored in little endian.* @@ -118,7 +119,7 @@ In addition to a bounding volume, each node contains references to node resource Depending on the scene layer type and the version of I3S, different geometry representations are used. For example, an integrated mesh scene layer geometry data includes all vertex attributes, feature counts, and mesh segmentation. -In I3S version 1.7 3D Objects and Integrated Mesh scene layer are using [geometryBuffer](../docs/1.7/geometryBuffer.cmn.md) with draco compression to describe the geometry. Previous versions of 3D Object and Integrated Mesh scene layers (1.6 and earlier) define geometry in the [defaultGeometrySchema](../docs/1.6/defaultGeometrySchema.cmn.md). The expected triangle/face winding order in all geometry resources is counterclockwise. +In I3S version 1.7 3D Objects and Integrated Mesh scene layer are using [geometryBuffer](../docs/1.7/geometryBuffer.cmn.md) with draco compression to describe the geometry. Previous versions of 3D Object and Integrated Mesh scene layers (1.6 and earlier) define geometry in the [defaultGeometrySchema](../docs/1.6/defaultGeometrySchema.cmn.md). 
The expected triangle/face winding order in all geometry resources is counterclockwise. Point and Point cloud layers model geometries as points. Point scene layer define the geometry in [featureData](../docs/1.6/featureData.cmn.md). For a Point Cloud Scene Layer the binary [geometry](../docs/2.0/defaultGeometrySchema.pcsl.md") is lepcc-xyz compressed. @@ -133,7 +134,7 @@ Metadata on each attribute resource is made available to clients via the scene s #### Textures -The texture resource contains texture image files. Textures are stored as a binary resource. Individual textures should be aggregated into texture atlases (An image containing a collection of smaller images.). Client capabilities for handling complex UV cases may vary, so texture coordinates are used. Texture coordinates do not take atlas regions into account directly. The client is expected to use the sub-image region values and the texture coordinates to best handle repeating textures in atlases. As of I3S version 1.8, the recommended compressed texture format is [Basis Universal](https://github.com/BinomialLLC/basis_universal) in [Khronos KTX2™️](http://github.khronos.org/KTX-Specification/) container format. The benefits of this texture format can be seen [in this blog](https://www.esri.com/arcgis-blog/products/arcgis/3d-gis/esri-collaborates-with-binomial-to-improve-basis-universal-texture-compression-speeds/). +The texture resource contains texture image files. Textures are stored as a binary resource. Individual textures should be aggregated into texture atlases (An image containing a collection of smaller images.). Client capabilities for handling complex UV cases may vary, so texture coordinates are used. Texture coordinates do not take atlas regions into account directly. The client is expected to use the sub-image region values and the texture coordinates to best handle repeating textures in atlases. 
As of I3S version 1.8, the recommended compressed texture format is [Basis Universal](https://github.com/BinomialLLC/basis_universal) in [Khronos KTX2™️](http://github.khronos.org/KTX-Specification/) container format. The benefits of this texture format can be seen [in this blog](https://www.esri.com/arcgis-blog/products/arcgis/3d-gis/esri-collaborates-with-binomial-to-improve-basis-universal-texture-compression-speeds/). For more details, see the [texture](../docs/1.8/texture.cmn.md) and [textureSetDefinition](../docs/1.8/textureSetDefinition.cmn.md). @@ -142,10 +143,10 @@ For more details, see the [texture](../docs/1.8/texture.cmn.md) and [textureSetD To ensure backward compatibility with 1.6 clients, a 1.7 scene layer needs to also include the [3dNodeIndexDocument](../docs/1.7/3DNodeIndexDocument.cmn.md) resource as well as the [sharedResources](../docs/1.7/sharedResource.cmn.md) available for any node. SharedResource includes the material definition of the node. ## Node Page -In version 1.6 and earlier, each node is stored individually as a 3DNodeIndexDocument, causing the tree traversal performance to be negatively impacted due to the large number of small resource requests required. Version 1.7 packs many nodes into a single resource called a node page. +In version 1.6 and earlier, each node is stored individually as a 3DNodeIndexDocument, causing the tree traversal performance to be negatively impacted due to the large number of small resource requests required. Version 1.7 packs many nodes into a single resource called a node page. These node pages are created by representing the tree as a flat array of nodes where internal nodes reference their children by their array indices. -I3S creators are free to use any ordering (e.g. breadth first, depth first) of the nodes into a flat array of nodes. In version 1.7, the ID for a node is an integer that represents the index of the node within this flattened array. +I3S creators are free to use any ordering (e.g. 
breadth first, depth first) of the nodes into a flat array of nodes. In version 1.7, the ID for a node is an integer that represents the index of the node within this flattened array. ![bounding volume hierarchy tree](../docs/img/BoundingVolumeHierarchyTree.png) ![node page](../docs/img/NodePageArray.png) @@ -159,7 +160,7 @@ I3S creators are free to use any ordering (e.g. breadth first, depth first) of t The I3S specification supports specifying the Coordinate Reference System (CRS) as a Well Known Text, as defined in clause 6.4 in OGC Simple Features [99-036/ISO 19125](http://portal.opengeospatial.org/files/?artifact_id=13227) standard. I3S also supports specifying CRS in the WKT standard [CRS/ISO 19162:2015](http://docs.opengeospatial.org/is/12-063r5/12-063r5.html), Geographic information – Well-known text representation of coordinate reference systems, which provided an update to the original WKT representation. The two standards are referred to as WKT1 and WKT2 respectively. -In I3S implementation the CRS may be represented using either WKT1 or WKT2. While WKT1 has been in use for many years, WKT1 has been superseded by WKT2. Although implementations of OGC standards using WKT2 are not yet widely available, the guidance from the OGC/ISO community is to implement WKT2. +In I3S implementation the CRS may be represented using either WKT1 or WKT2. While WKT1 has been in use for many years, WKT1 has been superseded by WKT2. Although implementations of OGC standards using WKT2 are not yet widely available, the guidance from the OGC/ISO community is to implement WKT2. WKT1 does not support explicit definition of axis order. Therefore, I3S implementers need to note for their implementations if they support WKT1 only or both (as WKT2 requires continued support of WKT1). In addition, please note that not all ArcGIS clients support WKT2 yet. @@ -175,7 +176,7 @@ All I3S profiles support writing 3D content in two modes: *global* and *local*. 
In both modes, node index and position vertex must have the same CRS. In addition, all vertex positions are specified as an *offset* from a node's Minimum Bounding Volume (MBV) center. The MBV could be specified as a Minimum Bounding Sphere (MBS) or as an Oriented Bounding Box (OBB). -All vertex positions SHALL be specified using a geodetic CRS (including Cartesian coordinate reference systems), where x,y,z axes are all in same unit, and with a per-node offset (from the center point of the node's minimum bounding sphere) for all vertex positions. +All vertex positions SHALL be specified using a geodetic CRS (including Cartesian coordinate reference systems), where x,y,z axes are all in same unit, and with a per-node offset (from the center point of the node's minimum bounding sphere) for all vertex positions. Axis Order: Axis order explicitly defined by the CRS SHALL be used when present. When the axis order is not defined by the CRS, Easting, Northing, Height axis order SHALL be used. The Height axis SHALL always point upwards towards the sky (away from the center of the earth). @@ -190,7 +191,7 @@ The location of all vertex positions and index-related data structures, such as For an I3S layer to be in a *local* mode the following requirements must be met: -All vertex positions are specified using geodetic CRS, identified by an EPSG code. Any CRS with an EPSG code *other* than 4326 or 4490 will be treated as in a local mode. +All vertex positions are specified using geodetic CRS, identified by an EPSG code. Any CRS with an EPSG code *other* than 4326 or 4490 will be treated as in a local mode. - All three components of a vertex position (XYZ) and the Minimum Bounding Volume (MBV) radius (for MBS) or halfSize (for OBB) need to be in the same unit. @@ -206,7 +207,7 @@ The heightModelInfo, included in the 3DSceneLayerInfo resource, is used by clien # I3S Services -A RESTful API allows access to I3S scene layers. 
Each scene layer profile has different components and features. For details on the API of a specific profile and version, refer to the individual README documents. +A RESTful API allows access to I3S scene layers. Each scene layer profile has different components and features. For details on the API of a specific profile and version, refer to the individual README documents. Version 1.7 support for [3D Objects](../docs/1.7/3Dobject_ReadMe.md), [Integrated Mesh](../docs/1.7/IntegratedMesh_ReadMe.md), [Building](../docs/1.7/BSL_ReadMe.md), and [Point](../docs/1.7/Point_ReadMe.md) @@ -237,7 +238,7 @@ An SLPK is a [zip](https://en.wikipedia.org/wiki/Zip_(file_format)) archive cont Both 64-bit and 32-bit zip archives are supported. 64-bit is required for datasets larger than 2GB. -Please note that this method is slightly different than a typical zip archive. In general, when a file is added to a zip archive, the new file is individually compressed, and the overall archive is compressed. **That is not the case for SLPK.** When adding files to an SLPK, the new file is compressed, but the overall archive remains uncompressed and is archived using compression level not compressed (`STORE`). +Please note that this method is slightly different than a typical zip archive. In general, when a file is added to a zip archive, the new file is individually compressed, and the overall archive is compressed. **That is not the case for SLPK.** When adding files to an SLPK, the new file is compressed, but the overall archive remains uncompressed and is archived using compression level not compressed (`STORE`). This is an example of a geometry resource opened in 7-zip. Notice that both the Size and the Packed Size are equal. The method is `STORE`. @@ -245,7 +246,7 @@ This is an example of a geometry resource opened in 7-zip. Notice that both the **File Extensions** -SLPK require file extensions to determine the file type. +SLPK require file extensions to determine the file type. 
Here are a few examples of SLPK file extensions: @@ -256,7 +257,46 @@ Here are a few examples of SLPK file extensions: **Hash** -In I3S verison 1.7, an [MD5](https://en.wikipedia.org/wiki/MD5) [hash](../docs/1.7/slpk_hashtable.cmn.md) was introduced to improve loading time. The hash must be the last item at the end of the central directory and named `@specialIndexFileHASH128@`. +In I3S verison 1.7, an [MD5](https://en.wikipedia.org/wiki/MD5) [hash](../docs/1.7/slpk_hashtable.cmn.md) is used to improve loading time. The hash must be the last item at the end of the central directory and named `@specialIndexFileHASH128@`. + + #### Example 1.8 SLPK Structure Summary for 3D Objects + + ``` + .\example_17.slpk + +--nodePages + | +--0.json.gz + | +-- (...) + +--nodes + | +--root + | | +--3dNodeIndexDocument.json.gz + | +--0 + | | +--attributes + | | | +--f_0 + | | | | +--0.bin.gz + | | | +--(...) + | | +--features + | | | +-- 0.json.gz + | | | +--(...) + | | +--geometries + | | | +-- 0.bin.gz + | | | +--(...) + | | +--textures + | | | +--0.jpg + | | | +--0_0_1.bin.dds.gz + | | | +--1.Ktx2 + | | | +--(...) + | | +--shared + | | | +--sharedResource.json.gz + | | + 3dNodeIndexDocument.json.gz + | +--(...) + +--statistics + | +--f_1 + | | +--0.json.gz + | +--(...) + +--3dSceneLayer.json.gz + +--@specialIndexFileHASH128@ + ``` + #### Example 1.7 SLPK Structure Summary for 3D Objects @@ -283,7 +323,7 @@ In I3S verison 1.7, an [MD5](https://en.wikipedia.org/wiki/MD5) [hash](../docs/1 | | | +--0.jpg | | | +--0_0_1.bin.dds.gz | | | +--(...) - | | +--shared + | | +--shared | | | +--sharedResource.json.gz | | + 3dNodeIndexDocument.json.gz | +--(...) @@ -300,7 +340,7 @@ Paths are the same as in the API, but without the `layers/0` prefix. 
Exceptions |-----|---|------| |Scene layer document|3dSceneLayer.json.gz|layers/0| |Legacy node resource|/nodes/4/3dNodeIndexDocument.json.gz|layers/0/nodes/4| -|Legacy shared resource|/nodes/4/shared/sharedResource.json.gz|layers/0/nodes/4/shared| +|Legacy shared resource|/nodes/4/shared/sharedResource.json.gz|layers/0/nodes/4/shared|
#### Example 1.6 Structure Summary for 3D Objects @@ -325,14 +365,14 @@ Paths are the same as in the API, but without the `layers/0` prefix. Exceptions | | | +--0.jpg | | | +--0_0_1.bin.dds.gz | | | +--(...) - | | +--shared + | | +--shared | | | +--sharedResource.json.gz | | +--3dNodeIndexDocument.json.gz | +--0-0 | | +--(...) | +--0-0-0 | | +--(...) - | +--1 + | +--1 | | +--(...) | +--1-0 | | +--(...) @@ -356,19 +396,19 @@ Scene layer packages (SLPK) contain metadata information regarding its content i |folderPattern | One of {BASIC, EXTENDED},
Default is {EXTENDED} | |archiveCompressionType | One of {STORE, DEFLATE64, [DEFLATE]},
Default is {STORE} | |resourceCompressionType | One of {GZIP, NONE}, Default is {GZIP} | -|I3SVersion | One of {1.2, 1.3, 1.4, 1.6, 1.7, 2.0},
Default is {1.7} (Point cloud is {2.0}) | +|I3SVersion | One of {1.2, 1.3, 1.4, 1.6, 1.7, 1.8, 2.0},
Default is {1.8} (Point cloud is {2.0}) | |nodeCount | Total number of nodes in the SLPK |
-**Example of 1.7 Metadata json** +**Example of 1.8 Metadata json** ``` .\metadata.json { "folderPattern":"BASIC", "archiveCompressionType":"STORE", "resourceCompressionType":"GZIP", - "I3SVersion":"1.7", + "I3SVersion":"1.8", "nodeCount":62 } ``` @@ -384,4 +424,3 @@ Scene layer packages (SLPK) contain metadata information regarding its content i "nodeCount":1156 } ``` - From ab6414035a55e800bcae6043e3f23be412a108da Mon Sep 17 00:00:00 2001 From: Richard Vargas Date: Mon, 24 Jan 2022 12:51:57 -0800 Subject: [PATCH 3/3] Updated only docs for hashtable --- docs/1.7/slpk_hashtable.cmn.md | 1 - docs/1.8/slpk_hashtable.cmn.md | 37 ++++++++++++++++++++++++++++++++++ 2 files changed, 37 insertions(+), 1 deletion(-) create mode 100644 docs/1.8/slpk_hashtable.cmn.md diff --git a/docs/1.7/slpk_hashtable.cmn.md b/docs/1.7/slpk_hashtable.cmn.md index 49ae890..a298861 100644 --- a/docs/1.7/slpk_hashtable.cmn.md +++ b/docs/1.7/slpk_hashtable.cmn.md @@ -7,7 +7,6 @@ A hash table is a data structure that implements an associative array abstract d ## To create SLPK hash table 1. The offset of each SLPK file is known. For example, the byte offset from the beginning of the SLPK file to the first byte of its ZIP local file header. See ZIP specification for reference. 2. Convert all file paths to their canonical path. Canonical paths must: - - Be lower case - Use a forward slash as the path separator `/` - Not contain a heading forward slash - Example: `/my/PATH.json` converts to `my/path.json` diff --git a/docs/1.8/slpk_hashtable.cmn.md b/docs/1.8/slpk_hashtable.cmn.md new file mode 100644 index 0000000..da32284 --- /dev/null +++ b/docs/1.8/slpk_hashtable.cmn.md @@ -0,0 +1,37 @@ +# SLPK Hash Table + +Scanning an SLPK (ZIP store) containing millions of documents is usually inefficient and slow. A hash table file may be added to the SLPK to improve first load and file scanning performances. 
+
+A hash table is a data structure that implements an associative array abstract data type, a structure that can map keys to values. A hash table uses a hash function to compute an index, also called a hash code, into an array of buckets or slots, from which the desired value can be found (Wikipedia).
+
+## To create SLPK hash table
+1. The offset of each file in the SLPK must be known, that is, the byte offset from the beginning of the SLPK file to the first byte of that file's ZIP local file header. See ZIP specification for reference.
+2. Convert all file paths to their canonical path. Canonical paths must:
+    - Use a forward slash as the path separator `/`
+    - Not contain a leading forward slash
+    - Example: `/my/PATH.json` converts to `my/path.json`
+3. Compute the 128-bit MD5 hash for each canonical path to create an array of key-value pairs [MD5-digest -> Offset 64bit].
+
+4. Sort the key-value pairs by ascending keys using the following comparison based on little-endian architecture:
+```cpp
+    // For performance the following C++ comparator is used (**little-endian**).
+    // Each 16-byte digest is compared as two 64-bit words: first word, then second.
+    typedef std::array< unsigned char, 16 > md5_hash;
+    bool less_than( const md5_hash& hash_a, const md5_hash& hash_b )
+    {
+      const uint64* a = reinterpret_cast<const uint64*>(&hash_a[0]);
+      const uint64* b = reinterpret_cast<const uint64*>(&hash_b[0]);
+      return a[0] == b[0] ? a[1] < b[1] : a[0] < b[0];
+    }
+```
+5. Write this sorted array as the last file of the SLPK archive (last entry in the ZIP central directory). The file must be named `@specialIndexFileHASH128@`. Each array element is 24 bytes long, with the following restrictions:
+    - 16 bytes for the MD5-digest and 8 bytes for the offset.
+    - Must be in little-endian order.
+    - Must **not** contain padding.
+    - Must **not** contain a header.
+## To read SLPK hash table
+1. Convert the input path to the canonical path and compute its MD5 hash (i.e. key).
+
+2. Search for the key in the hash table. This can be easily implemented as a binary (i.e. dichotomic) search since the keys are sorted.
+
+3. Retrieve the file from the ZIP archive using the offset associated with the key.
+
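The create and read steps described in the patch above can be sketched roughly as follows. This is a minimal illustration, not the reference implementation: it assumes C++17, assumes the MD5 digests of the canonical paths were already computed by some external library (the C++ standard library has no MD5), and the names `Entry`, `read_le64`, `key_less`, `build_table`, and `find_offset` are hypothetical. Unlike the comparator in the patch, it reads the 64-bit words byte by byte, so the ordering should not depend on the byte order of the host.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

using Md5Digest = std::array<unsigned char, 16>;

// Hypothetical in-memory pair: key = MD5 of the canonical path (computed elsewhere),
// offset = byte offset of the file's ZIP local file header.
struct Entry {
    Md5Digest key;
    std::uint64_t offset;
};

constexpr std::size_t kRecordSize = 24;  // 16-byte digest + 8-byte offset, no padding

// Reads 8 bytes as a little-endian unsigned integer on any host.
std::uint64_t read_le64(const unsigned char* p) {
    std::uint64_t v = 0;
    for (int i = 7; i >= 0; --i) v = (v << 8) | p[i];
    return v;
}

// Same ordering as the comparator in the text: first 64-bit word, then the second.
bool key_less(const Md5Digest& a, const Md5Digest& b) {
    const std::uint64_t a0 = read_le64(a.data());
    const std::uint64_t b0 = read_le64(b.data());
    return a0 == b0 ? read_le64(a.data() + 8) < read_le64(b.data() + 8) : a0 < b0;
}

// Create steps 4-5: sort by key, then emit headerless 24-byte records.
std::vector<unsigned char> build_table(std::vector<Entry> entries) {
    std::sort(entries.begin(), entries.end(),
              [](const Entry& x, const Entry& y) { return key_less(x.key, y.key); });
    std::vector<unsigned char> out;
    out.reserve(entries.size() * kRecordSize);
    for (const Entry& e : entries) {
        out.insert(out.end(), e.key.begin(), e.key.end());
        for (int i = 0; i < 8; ++i)  // offset written least significant byte first
            out.push_back(static_cast<unsigned char>((e.offset >> (8 * i)) & 0xFF));
    }
    return out;
}

// Read steps 2-3: binary search over the records, returning the offset if the key exists.
std::optional<std::uint64_t> find_offset(const std::vector<unsigned char>& table,
                                         const Md5Digest& key) {
    const std::size_t count = table.size() / kRecordSize;
    std::size_t lo = 0, hi = count;
    while (lo < hi) {
        const std::size_t mid = lo + (hi - lo) / 2;
        Md5Digest probe;
        std::memcpy(probe.data(), table.data() + mid * kRecordSize, probe.size());
        if (key_less(probe, key)) lo = mid + 1; else hi = mid;
    }
    if (lo < count) {
        const unsigned char* rec = table.data() + lo * kRecordSize;
        if (std::equal(key.begin(), key.end(), rec)) return read_le64(rec + 16);
    }
    return std::nullopt;
}
```

A writer would append the bytes returned by `build_table` to the archive as the `@specialIndexFileHASH128@` entry, and a reader would pass that entry's bytes to `find_offset` together with the digest of the requested canonical path.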