
Multistream rendering and instancing

Chuck Walbourn edited this page Aug 25, 2021 · 22 revisions

In this lesson we learn how to use multistream rendering to implement GPU instancing.

Setup

First create a new project using the instructions from the earlier lessons: Using DeviceResources and Adding the DirectX Tool Kit which we will use for this lesson.

Input assembler

For these tutorial lessons, we've been providing a single stream of vertex data to the input assembler. Generally the most efficient layout for rendering is a single vertex buffer with a stride of 16, 32, or 64 bytes, but there are times when arranging the render data in such a layout is expensive. For those cases, the Direct3D Input Assembler can pull vertex information from up to 32 vertex buffers at once, which gives you a lot of freedom in managing your vertex buffers.

For example, if we return to a case from Simple rendering, here is the 'stock' vertex input layout for VertexPositionNormalTexture:

const D3D12_INPUT_ELEMENT_DESC c_InputElements[] =
{
    { "SV_Position", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, D3D12_APPEND_ALIGNED_ELEMENT, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0 },
    { "NORMAL",      0, DXGI_FORMAT_R32G32B32_FLOAT, 0, D3D12_APPEND_ALIGNED_ELEMENT, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0 },
    { "TEXCOORD",    0, DXGI_FORMAT_R32G32_FLOAT,    0, D3D12_APPEND_ALIGNED_ELEMENT, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0 },
};

This describes a single vertex stream with three elements. We could arrange this into three VBs as follows:

// Position in VB#0, NORMAL in VB#1, TEXCOORD in VB#2
const D3D12_INPUT_ELEMENT_DESC c_InputElements[] =
{
    { "SV_Position", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, D3D12_APPEND_ALIGNED_ELEMENT, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0 },
    { "NORMAL",      0, DXGI_FORMAT_R32G32B32_FLOAT, 1, D3D12_APPEND_ALIGNED_ELEMENT, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0 },
    { "TEXCOORD",    0, DXGI_FORMAT_R32G32_FLOAT,    2, D3D12_APPEND_ALIGNED_ELEMENT, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0 },
};

To render, we'd need to create a Pipeline State Object (PSO) for this input layout, and then bind one vertex buffer to each slot:

D3D12_VERTEX_BUFFER_VIEW vbViews[3] = {};
vbViews[0].BufferLocation = ...;
vbViews[0].StrideInBytes = sizeof(float) * 3;
vbViews[0].SizeInBytes = ...;

vbViews[1].BufferLocation = ...;
vbViews[1].StrideInBytes = sizeof(float) * 3;
vbViews[1].SizeInBytes = ...;

vbViews[2].BufferLocation = ...;
vbViews[2].StrideInBytes = sizeof(float) * 2;
vbViews[2].SizeInBytes = ...;

commandList->IASetVertexBuffers(0, 3, vbViews);

Note that if we are using DrawIndexed, the same index value is used to retrieve the i-th element from each vertex buffer (i.e., there is only one index per vertex, and every VB must contain enough elements to cover the highest index value).
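As a rough illustration of the point above, here is a CPU-side sketch that splits interleaved vertices into three tightly packed streams and then fetches one vertex by index the way the input assembler conceptually does. The names Vertex, VertexStreams, SplitStreams, and FetchVertex are hypothetical helpers made up for this example; they are not part of Direct3D or the DirectX Tool Kit.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for VertexPositionNormalTexture: 8 floats, 32 bytes.
struct Vertex
{
    float position[3];
    float normal[3];
    float texcoord[2];
};
static_assert(sizeof(Vertex) == 32, "expected a 32-byte interleaved vertex");

// One tightly packed array per input slot.
struct VertexStreams
{
    std::vector<float> positions; // slot 0, stride = sizeof(float) * 3
    std::vector<float> normals;   // slot 1, stride = sizeof(float) * 3
    std::vector<float> texcoords; // slot 2, stride = sizeof(float) * 2
};

// Splits an interleaved vertex array into three separate streams.
VertexStreams SplitStreams(const std::vector<Vertex>& vertices)
{
    VertexStreams s;
    s.positions.reserve(vertices.size() * 3);
    s.normals.reserve(vertices.size() * 3);
    s.texcoords.reserve(vertices.size() * 2);
    for (const Vertex& v : vertices)
    {
        s.positions.insert(s.positions.end(), v.position, v.position + 3);
        s.normals.insert(s.normals.end(), v.normal, v.normal + 3);
        s.texcoords.insert(s.texcoords.end(), v.texcoord, v.texcoord + 2);
    }
    return s;
}

// Mimics an indexed fetch: the same index i selects element i in every stream.
Vertex FetchVertex(const VertexStreams& s, uint32_t i)
{
    Vertex v{};
    for (size_t c = 0; c < 3; ++c)
    {
        v.position[c] = s.positions[size_t(i) * 3 + c];
        v.normal[c] = s.normals[size_t(i) * 3 + c];
    }
    for (size_t c = 0; c < 2; ++c)
    {
        v.texcoord[c] = s.texcoords[size_t(i) * 2 + c];
    }
    return v;
}
```

Each of the three arrays would be uploaded to its own vertex buffer, with the per-slot strides matching the element sizes noted in the comments.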

Instancing

In addition to pulling vertex data from multiple streams, the input assembler can also 'loop' over some streams to implement a feature called "instancing". Here the same vertex data is drawn multiple times, with some of the input data changing only "once per instance" as it loops over the rest. This allows you to efficiently render a large number of copies of the same object in many locations, such as grass or boulders.
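The looping behavior can be pictured with a small CPU emulation. This is only a sketch of the concept, not how the hardware is implemented: FetchPair and EmulateInstancedFetch are hypothetical names, and the stepRate parameter is assumed to play the role of InstanceDataStepRate (the number of instances drawn before the per-instance stream advances by one element).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Which element is read from the per-vertex stream and from the
// per-instance stream for one vertex shader invocation.
struct FetchPair
{
    uint32_t vertexElement;
    uint32_t instanceElement;
};

// Emulates a non-indexed instanced draw: every instance replays the full
// per-vertex stream, while the per-instance stream advances once every
// 'stepRate' instances.
std::vector<FetchPair> EmulateInstancedFetch(
    uint32_t vertexCount, uint32_t instanceCount, uint32_t stepRate)
{
    // Sketch only: a step rate of 0 is not modeled here, so treat it as 1.
    if (stepRate == 0)
    {
        stepRate = 1;
    }

    std::vector<FetchPair> fetches;
    fetches.reserve(size_t(vertexCount) * instanceCount);
    for (uint32_t inst = 0; inst < instanceCount; ++inst)
    {
        const uint32_t instanceElement = inst / stepRate;
        for (uint32_t v = 0; v < vertexCount; ++v)
        {
            fetches.push_back({ v, instanceElement });
        }
    }
    return fetches;
}
```

With a step rate of 1, instance N reads element N of the per-instance stream, which is the common case for placing many copies of one mesh.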

The NormalMapEffect supports GPU instancing using a per-instance XMFLOAT3X4 matrix, which can include translations, rotations, scales, etc. For example, if we were using VertexPositionNormalTexture model data with instancing, we'd create an input layout as follows:

// VertexPositionNormalTexture in VB#0, XMFLOAT3X4 in VB#1
const D3D12_INPUT_ELEMENT_DESC c_InputElements[] =
{
    { "SV_Position", 0, DXGI_FORMAT_R32G32B32_FLOAT,    0, D3D12_APPEND_ALIGNED_ELEMENT, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA,   0 },
    { "NORMAL",      0, DXGI_FORMAT_R32G32B32_FLOAT,    0, D3D12_APPEND_ALIGNED_ELEMENT, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA,   0 },
    { "TEXCOORD",    0, DXGI_FORMAT_R32G32_FLOAT,       0, D3D12_APPEND_ALIGNED_ELEMENT, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA,   0 },
    { "InstMatrix",  0, DXGI_FORMAT_R32G32B32A32_FLOAT, 1, D3D12_APPEND_ALIGNED_ELEMENT, D3D12_INPUT_CLASSIFICATION_PER_INSTANCE_DATA, 1 },
    { "InstMatrix",  1, DXGI_FORMAT_R32G32B32A32_FLOAT, 1, D3D12_APPEND_ALIGNED_ELEMENT, D3D12_INPUT_CLASSIFICATION_PER_INSTANCE_DATA, 1 },
    { "InstMatrix",  2, DXGI_FORMAT_R32G32B32A32_FLOAT, 1, D3D12_APPEND_ALIGNED_ELEMENT, D3D12_INPUT_CLASSIFICATION_PER_INSTANCE_DATA, 1 },
};

Here the first vertex buffer has enough data for one instance, and the second vertex buffer has as many entries as instances.
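To make the second vertex buffer's contents more concrete, here is a minimal sketch that fills an array with one transform per instance, laid out on a grid. InstanceTransform is a hypothetical stand-in for XMFLOAT3X4 (three rows of four floats, 48 bytes, matching the three InstMatrix elements), and MakeTranslation and MakeGrid are illustrative helpers. The sketch assumes the translation sits in the fourth component of each row, which is the layout XMStoreFloat3x4 is understood to produce; verify this against DirectXMath before relying on it.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for DirectX::XMFLOAT3X4: three float4 rows.
struct InstanceTransform
{
    float m[3][4];
};
static_assert(sizeof(InstanceTransform) == 48, "stride for VB#1 should be 48 bytes");

// Builds a translation-only transform (assumed layout: translation in column 4).
InstanceTransform MakeTranslation(float x, float y, float z)
{
    InstanceTransform t = {{
        { 1.f, 0.f, 0.f, x },
        { 0.f, 1.f, 0.f, y },
        { 0.f, 0.f, 1.f, z },
    }};
    return t;
}

// Produces columns * rows instances spaced evenly on the XZ plane.
std::vector<InstanceTransform> MakeGrid(size_t columns, size_t rows, float spacing)
{
    std::vector<InstanceTransform> instances;
    instances.reserve(columns * rows);
    for (size_t r = 0; r < rows; ++r)
    {
        for (size_t c = 0; c < columns; ++c)
        {
            instances.push_back(MakeTranslation(
                static_cast<float>(c) * spacing,
                0.f,
                static_cast<float>(r) * spacing));
        }
    }
    return instances;
}
```

The resulting array would be uploaded as the slot 1 vertex buffer with a stride of sizeof(InstanceTransform), and its element count would be the instance count passed to DrawIndexedInstanced.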

UNDER CONSTRUCTION

More to explore

Next lessons: Using HDR rendering

Further reading
