Compressing LUTs with KTX 2.0

August 31, 2022

“Lookup tables (LUTs) are a technique for optimizing the evaluation of functions that are expensive to compute and inexpensive to cache, ... By precomputing the evaluation of a function over a domain of common inputs, expensive runtime operations can be replaced with inexpensive table lookups. ... LUTs are also useful when wanting to separate the calculation of a transform from its application. For example, in color pipelines it is often useful to “bake” a series of color transforms into a single lookup table, which is then suitable for distribution and re-use.”

~ Nick Shaw, Cinematic Color 2: LUTs and Transforms

2D and 3D renderers use LUTs to transform pixel values very flexibly and very quickly. LUTs achieve technical or stylistic goals like custom tone mapping, color gradients for scientific visualization, day/night transitions, baking results of a simulation, and so on. LUTs can enhance both photorealistic rendering and stylized graphics.

The three minute video above, by EightyEightGames, gives a short and clear introduction to LUTs in games. A quick search for “gamedev LUT” on Twitter should offer some other interesting ideas.

What follows is a guide — for web developers, or anyone else — to compressing LUTs for faster loading in realtime applications. With KTX 2.0, we'll reduce both file size and parsing time by ~99% compared to traditional formats like .cube.

What's in a LUT

Simpler operations like contrast or saturation can be applied with small 1D LUTs. For more sophisticated operations, larger 3D LUTs are required. A complete 3D LUT, containing every possible mapping on an 8-bit sRGB [0,1] input domain, would store 256⨉256⨉256 = 16,777,216 values. Fortunately, that's rarely necessary: we interpolate between values when sampling from the table, so a smaller 3D LUT containing only 65⨉65⨉65 = 274,625 (about 275K) values is often enough.
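To make the interpolation step concrete, here is a minimal CPU-side sketch of trilinear sampling from a 3D LUT. On the GPU this is normally done by the texture sampler with linear filtering. The function names (sampleLUT3D, createIdentityLUT) are hypothetical, and the data layout (RGB triplets with red varying fastest) is an assumption borrowed from the .cube row order.

```javascript
// Sample a 3D LUT with trilinear interpolation.
// Assumption: `data` holds size^3 RGB triplets, with red varying fastest.
function sampleLUT3D(data, size, r, g, b) {
  const clamp01 = (v) => Math.min(1, Math.max(0, v));
  const maxIndex = size - 1;

  // Continuous grid coordinates in [0, size - 1].
  const x = clamp01(r) * maxIndex;
  const y = clamp01(g) * maxIndex;
  const z = clamp01(b) * maxIndex;

  // Lower corner of the enclosing cell; the upper corner is clamped to the grid edge.
  const x0 = Math.floor(x), y0 = Math.floor(y), z0 = Math.floor(z);
  const x1 = Math.min(x0 + 1, maxIndex);
  const y1 = Math.min(y0 + 1, maxIndex);
  const z1 = Math.min(z0 + 1, maxIndex);
  const fx = x - x0, fy = y - y0, fz = z - z0;

  const lerp = (p, q, t) => p + (q - p) * t;
  const offset = (xi, yi, zi) => 3 * (xi + size * (yi + size * zi));

  const out = [0, 0, 0];
  for (let c = 0; c < 3; c++) {
    const at = (xi, yi, zi) => data[offset(xi, yi, zi) + c];
    // Interpolate along red, then green, then blue.
    const c00 = lerp(at(x0, y0, z0), at(x1, y0, z0), fx);
    const c10 = lerp(at(x0, y1, z0), at(x1, y1, z0), fx);
    const c01 = lerp(at(x0, y0, z1), at(x1, y0, z1), fx);
    const c11 = lerp(at(x0, y1, z1), at(x1, y1, z1), fx);
    out[c] = lerp(lerp(c00, c10, fy), lerp(c01, c11, fy), fz);
  }
  return out;
}

// Build an identity LUT (output equals input) to exercise the sampler.
function createIdentityLUT(size) {
  const data = new Float32Array(size * size * size * 3);
  let i = 0;
  for (let b = 0; b < size; b++) {
    for (let g = 0; g < size; g++) {
      for (let r = 0; r < size; r++) {
        data[i++] = r / (size - 1);
        data[i++] = g / (size - 1);
        data[i++] = b / (size - 1);
      }
    }
  }
  return data;
}
```

Sampling an identity LUT should return the input color, give or take floating point error, which makes it a handy sanity check before swapping in a real LUT.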

Common LUT file formats, like .cube or .spi3d, store these values in plaintext rows:

LUT_3D_SIZE 65
0.000176 0.000176 0.000176
0.015625 0.000176 0.000176
0.031250 0.000176 0.000176
... <274,622 more rows>

This is not an efficient way to store data, and 275K values in plaintext add up fast: each 3D LUT might easily be 5-10 MB. For desktop applications, this might be fine. But on the web, most use cases would benefit from much smaller files.
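A rough back-of-the-envelope estimate shows where those megabytes come from. This sketch assumes every row looks like the sample rows above (three 8-character numbers, two spaces, and a newline) and ignores header lines, so real files will differ somewhat.

```javascript
const size = 65;
const entries = size ** 3; // 274,625 RGB triplets

// Plaintext: a row like "0.000176 0.000176 0.000176\n" takes 27 bytes in ASCII.
const bytesPerRow = '0.000176 0.000176 0.000176\n'.length;
const plaintextBytes = entries * bytesPerRow;

// Binary: 3 channels per entry, at 1, 2, or 4 bytes per channel.
const binaryBytes = {
  u8: entries * 3 * 1,
  f16: entries * 3 * 2,
  f32: entries * 3 * 4,
};

console.log((plaintextBytes / 1e6).toFixed(3), 'MB as plaintext');
for (const [type, bytes] of Object.entries(binaryBytes)) {
  console.log(type, (bytes / 1e6).toFixed(3), 'MB as raw binary');
}
```

Under these assumptions the plaintext comes to roughly 7.4 MB, while the same values stored as raw 32-bit floats take about 3.3 MB, before any lossless compression is applied.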

Lossless compression (gzip, brotli, or zstd) reduces the download size of these files pretty well. But we don't just need to download the files: we also need to parse them and upload them to the GPU as textures, for fast parallel lookups in a fragment shader. After decompressing a compressed text file, we're still stuck with 5–10 MB of text to process and convert to a texture before we can draw any pixels. On a 2021 M1 Pro chip, JavaScript might spend 100–200 ms just parsing a 5–10 MB LUT in these plaintext formats. We can do much better.
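For a sense of what that parsing work involves, here is a minimal sketch of a .cube parser. It is not the three.js LUTCubeLoader implementation; parseCube is a hypothetical name, and the sketch only handles 3D LUTs with the LUT_3D_SIZE, DOMAIN_MIN, and DOMAIN_MAX keywords.

```javascript
// Parse the text of a .cube file into a flat Float32Array of RGB triplets.
function parseCube(text) {
  let size = 0;
  let domainMin = [0, 0, 0];
  let domainMax = [1, 1, 1];
  let data = null;
  let cursor = 0;

  for (const rawLine of text.split('\n')) {
    const line = rawLine.trim();
    if (line === '' || line.startsWith('#')) continue; // blank line or comment

    const tokens = line.split(/\s+/);
    if (tokens[0] === 'LUT_3D_SIZE') {
      size = parseInt(tokens[1], 10);
      data = new Float32Array(size * size * size * 3);
    } else if (tokens[0] === 'DOMAIN_MIN') {
      domainMin = tokens.slice(1, 4).map(Number);
    } else if (tokens[0] === 'DOMAIN_MAX') {
      domainMax = tokens.slice(1, 4).map(Number);
    } else if (/^[-+.\d]/.test(tokens[0])) {
      // A data row: three numbers, one RGB triplet.
      if (data === null) throw new Error('Data row found before LUT_3D_SIZE');
      if (cursor + 3 > data.length) throw new Error('More rows than LUT_3D_SIZE allows');
      data[cursor++] = Number(tokens[0]);
      data[cursor++] = Number(tokens[1]);
      data[cursor++] = Number(tokens[2]);
    }
    // Other keywords (TITLE, LUT_1D_SIZE, ...) are ignored in this sketch.
  }

  if (data === null || cursor !== data.length) {
    throw new Error('Incomplete or missing 3D LUT data');
  }
  return { size, domainMin, domainMax, data };
}
```

Every one of the 274,625 rows has to be split and converted from text to numbers, and that per-row work is exactly what a binary format avoids.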

GPU-optimized compression with KTX 2.0

With KTX 2.0, a flexible texture container format defined by the Khronos Group in 2021, we can store the GPU texture directly instead of using plain text. We'll avoid nearly all of that parsing overhead: in binary formats like KTX 2.0, loaders only need to parse a small header to learn what's in the file. After that the loader can slice views out of the larger payload, without parsing any pixel data.
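As an illustration of how little parsing is involved, here is a sketch that reads the fixed-size KTX 2.0 header and level index with a DataView, then slices out a level without touching the pixel data. The byte offsets follow my reading of the KTX 2.0 file format specification. readKTX2Header and getLevelBytes are hypothetical names, and real loaders such as ktx-parse also handle the data format descriptor, key/value data, and supercompression.

```javascript
// 12-byte file identifier: «KTX 20»\r\n\x1A\n
const KTX2_IDENTIFIER = [0xab, 0x4b, 0x54, 0x58, 0x20, 0x32, 0x30, 0xbb, 0x0d, 0x0a, 0x1a, 0x0a];

function readKTX2Header(bytes) {
  for (let i = 0; i < KTX2_IDENTIFIER.length; i++) {
    if (bytes[i] !== KTX2_IDENTIFIER[i]) throw new Error('Not a KTX 2.0 file');
  }

  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const u32 = (offset) => view.getUint32(offset, true); // little-endian
  const u64 = (offset) => Number(view.getBigUint64(offset, true));

  const header = {
    vkFormat: u32(12),
    typeSize: u32(16),
    pixelWidth: u32(20),
    pixelHeight: u32(24),
    pixelDepth: u32(28), // non-zero for 3D textures such as 3D LUTs
    layerCount: u32(32),
    faceCount: u32(36),
    levelCount: u32(40),
    supercompressionScheme: u32(44), // 0 = none, 2 = Zstandard
  };

  // The level index starts at byte 80, after the 48-byte header
  // and the 32-byte index of the other sections.
  const levels = [];
  const count = Math.max(1, header.levelCount);
  for (let i = 0; i < count; i++) {
    const base = 80 + i * 24;
    levels.push({
      byteOffset: u64(base),
      byteLength: u64(base + 8),
      uncompressedByteLength: u64(base + 16),
    });
  }
  return { header, levels };
}

// Return a view over one level's bytes. No pixel data is copied or parsed.
// If the file is supercompressed, these bytes still need to be decompressed.
function getLevelBytes(bytes, level) {
  return bytes.subarray(level.byteOffset, level.byteOffset + level.byteLength);
}
```

Reading the header is a handful of fixed-offset integer reads, regardless of how large the texture payload is.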

KTX 2.0 supports a lot of GPU texture formats, and a few other “universal” formats. glTF 2.0 models (.gltf, .glb) often use KTX 2.0 with Basis Universal texture compression. We aren't going to use Basis Universal in this case¹ — LUTs work best with other types of compression. We'll instead use uncompressed texture formats (u8, f16, and f32), with some lossless compression applied to the container (“supercompression”).

To convert a LUT to KTX2 in your own software, you'd do something like this:

Parse the original LUT, allocating an array of 8-bit, 16-bit, or 32-bit values
Write the array to a KTX2 container, perhaps using KTX-Software or ktx-parse
Apply supercompression, perhaps using KTX-Software's ktxsc command
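The first step above might look something like the sketch below, which converts parsed floating point LUT values into 8-bit or 32-bit typed arrays. The function names are hypothetical. A 16-bit variant is omitted here because it needs a float-to-half conversion, for which a helper such as three.js's DataUtils.toHalfFloat could be used. Writing the container and applying supercompression are left to the tools named in the other two steps.

```javascript
// 8-bit: clamp to [0, 1], scale to [0, 255], and round.
// Assumption: output values outside [0, 1] are not needed, since u8 cannot store them.
function toUint8LUT(floatValues) {
  const out = new Uint8Array(floatValues.length);
  for (let i = 0; i < floatValues.length; i++) {
    const clamped = Math.min(1, Math.max(0, floatValues[i]));
    out[i] = Math.round(clamped * 255);
  }
  return out;
}

// 32-bit: copy into a Float32Array, preserving unbounded values.
function toFloat32LUT(floatValues) {
  return Float32Array.from(floatValues);
}
```

The choice of type is a trade-off: u8 gives the smallest files but clamps and quantizes the output, while f32 keeps the full range and precision of the source LUT.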

Using JavaScript and three.js, we could also define a LUT programmatically:

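import * as THREE from 'three';

// Example size (an assumption): 11 entries gives steps of 0.1 across [0, 1].
const NUM_VALUES = 11;

// Example values (an assumption): a linear ramp 0, 0.1, ..., 1.0.
const values = Float32Array.from({ length: NUM_VALUES }, (_, i) => i / (NUM_VALUES - 1));

const customLUT1D = new THREE.DataTexture(
    values,
    NUM_VALUES,
    1,
    THREE.RedFormat,
    THREE.FloatType,
    THREE.UVMapping,
    THREE.ClampToEdgeWrapping,
    THREE.ClampToEdgeWrapping,
    THREE.LinearFilter,
    THREE.LinearFilter
);

// Flag the texture for upload to the GPU.
customLUT1D.needsUpdate = true;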

Because three.js also includes utilities like LUTCubeLoader.js, LUT3dlLoader.js, KTX2Exporter.js, and KTX2Loader.js, we can use these to convert LUT-specific formats (.cube, .3dl) to KTX 2.0. To make this easier, I made a small application for loading .cube LUTs and converting them to KTX2. Try it out here:

🔗 https://lut-to-texture.donmccurdy.com/

This tool encodes the LUT to a more compact binary representation, but doesn't yet include supercompression. I haven't been able to find a Zstandard encoder that works in a web browser — if you know of one, please let me know. In the meantime, add supercompression in a separate step with KTX-Software's ktxsc utility:

ktxsc --zcmp 19 lut.ktx2

Results

Testing KTX 2.0 compression with a 3D LUT, the results are dramatic. The KTX 2.0 file is a little over 1% of the size of the original (94 KB vs. 7.4 MB), parses hundreds of times faster, and loses none of the original precision.

format	size	parse time
.cube	7.414 MB	90 – 100 ms
.ktx2	0.094 MB (94 KB)	0.1 – 0.2 ms

Using the LUT

Using a .ktx2 LUT in three.js is no different from using .cube. Just replace THREE.LUTCubeLoader with THREE.KTX2Loader. While some KTX 2.0 files must be transcoded for particular GPUs using WASM transcoders, these LUTs do not require transcoding. If the LUT is being applied to the full viewport, that's done in post-processing, and either three/postprocessing or pmndrs/postprocessing will work.

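// `loader` is a THREE.KTX2Loader instance, created and configured elsewhere.
const texture = await loader.loadAsync( './lut.ktx2' );

// three.js post-processing: use the loaded texture directly
const lut = texture;

// pmndrs post-processing: wrap the loaded texture in a LookupTexture instead
import { LookupTexture } from 'postprocessing';
const lookupTexture = LookupTexture.from( texture );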

In other engines, you may need to ask about support for 3D textures in KTX2 files. This is a newer and less widely supported pattern for now. I'd love to hear if you found this article useful, and how you're using LUTs in your projects. Please reach out on Twitter!

More technical details

Input domain: Every LUT is defined only for a certain range of input values. Often that range is RGB ∈ [0,1], but another input domain might be defined at the top of the file. Inputs outside this range are clamped, so it's important to get the range right. Custom input domains are not embedded in the KTX 2.0 texture, and would need to be assigned manually before using the KTX 2.0 LUT in an application².
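As a sketch of what assigning the domain manually could mean in practice, the function below remaps an input color from a custom domain to the [0,1] coordinates used for the lookup, clamping anything outside the range. The name normalizeToDomain is hypothetical. In a real renderer this remapping would more likely live in the shader, with the domain passed in as uniforms.

```javascript
// Remap an RGB input from [domainMin, domainMax] to [0, 1] lookup coordinates.
// Inputs outside the domain are clamped to its edges.
function normalizeToDomain(rgb, domainMin = [0, 0, 0], domainMax = [1, 1, 1]) {
  return rgb.map((value, i) => {
    const t = (value - domainMin[i]) / (domainMax[i] - domainMin[i]);
    return Math.min(1, Math.max(0, t));
  });
}
```

For example, with a domain of [0,2] on every channel, an input of 0.5 maps to lookup coordinate 0.25, while an input of 3 is clamped to 1.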

Color spaces: Like its input domain, every LUT makes assumptions about the color space of input values. Output values may be in the same color space, or the LUT may include conversion to another color space. Unfortunately, color space information is typically not defined in the LUT file. The output color space can be included in a KTX 2.0 file, if you know it.

KTX 2.0 vs. PNG: We've discussed .cube and other LUT-specific formats, but not traditional image formats like PNG. PNG can store 1D LUTs in some cases, and 3D LUTs that are flattened (24⨉24⨉24px → 24⨉576px). PNG compression works well, but it has two major limitations in this context. First, output values are constrained to the [0,1] range in PNG, while KTX 2.0 supports unbounded floating point values. Second, precision of output values is limited to 8 bits by PNG — that's just enough for a final pixel color, and is likely to cause banding if any additional processing (tone mapping, color space conversion, etc.) is applied after the LUT later in the post processing stack. While PNG has a 16-bit mode, no web browser currently supports it.
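The flattening mentioned above is just an index mapping. Here is a small sketch assuming one possible layout, where blue slices are stacked vertically so a 24⨉24⨉24 LUT becomes a 24⨉576 image. Other tools may arrange the slices differently, and the function names are hypothetical.

```javascript
// Map a 3D LUT grid index (r, g, b) to a pixel in a size x (size * size) image.
// Assumed layout: red along x, green along y within a slice, blue slices stacked vertically.
function flattenIndex(r, g, b, size) {
  return { x: r, y: b * size + g };
}

// Inverse mapping: recover the 3D grid index from a pixel position.
function unflattenIndex(x, y, size) {
  return { r: x, g: y % size, b: Math.floor(y / size) };
}
```

A shader reading a flattened LUT has to do this mapping itself and blend between neighboring blue slices by hand, which a true 3D texture handles in hardware.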

Other LUT formats: If you have LUTs in a format other than those discussed here, tools like ociobakelut may be helpful to convert or reformat the LUT. I'm not aware of any LUT-specific formats that are both (1) open, and (2) as efficient as the KTX 2.0 approach described here.

¹ Basis Universal, by Binomial LLC, is an outstanding technology for compressing textures used in 3D models and materials, reducing GPU memory usage, and generally improving performance in GPU-based applications. Learn more about Basis Universal, if you aren't using it for material textures yet.

² KTX 2.0 does support key/value data and an extension-like mechanism, so formal or informal ways of embedding the LUT domain could be added in the future.