21 Display Signal Processing
From Chapter 20, we know that ultimately it is the \(V_{Data}\) signals that the display has to set in order to get a desired response on the pixels. How do we set \(V_{Data}\)? This question can only be answered by positioning the display in an end-to-end workflow that involves imaging, image processing, and display. We will first give a big-picture view, showing that the central task of this signal processing chain is tone mapping, which is realized by a chain of signal processing steps (Section 21.1). We will then walk through this chain step by step (Section 21.2), and discuss practical issues in realizing tone mapping (Section 21.3). Finally, we will discuss color management, a framework that makes everything we discuss in this chapter much more consistent and reliable across software and hardware platforms (Section 21.4).
21.1 The Big Picture
Consider a typical workflow where you capture the scene as an image and then view it on a display. Figure 21.1 illustrates the chain of signal processing that takes place in this workflow. At the beginning of this chain is the luminance in the physical scene; at the end is the luminance emitted by the display. The transformation from the former to the latter can be abstracted as the Opto-Optical Transfer Function (OOTF), ① in Figure 21.1. In practice, the OOTF is indirectly realized through a long processing chain (① through ⑩ in Figure 21.1) that spans imaging hardware, image processing algorithms, and display hardware. Each step can be represented as a transfer function, and together these functions collectively constitute the OOTF.
Ideally, the OOTF should be an identity function, which, one could argue, is the Holy Grail of an imaging-display workflow: faithfully capturing and reproducing the actual luminance in the scene. The former is dealt with by HDR imaging (Section 16.2.4), and the latter is the job of display signal processing and display hardware design.
It would be amazing if a display could accurately reproduce the scene luminance (assuming it is accurately captured). This, however, is hardly possible, for a variety of reasons.
- First, the peak luminance of a display is usually lower than that of the real world.
- Second, the real world has a much larger luminance dynamic range (DR) than that afforded by the display. The luminance DR of the scene is the ratio between the maximum and minimum luminance in the scene, and the luminance DR of the display is the ratio between the maximum and minimum luminance producible by the display.
- The definitions are concerned with luminance (a photometric quantity) rather than radiance (a radiometric quantity) because we care about the perceived power, not the radiant power, in the scene.
- We use “luminance DR” rather than simply DR to emphasize its difference from the DR of a sensor (Section 16.2.4), which is concerned with the ratio of peak measurable luminance in the scene to the noise floor. For simplicity we will use DR when it is clear what it refers to in a given context. Luminance DR is also often referred to as the contrast ratio while the sensor DR can be thought of as a form of signal-to-noise ratio (Mantiuk et al. 2015).
- Third, the luminance levels in a real-world scene are continuous whereas the luminance levels in digital displays are quantized (e.g., 256 levels in an 8-bit encoding), so there are quantization errors.
The difference between the scene luminance range and that of various output devices is illustrated in Figure 21.2.
- The DR of a real-world scene usually spans 4-5 log units.
- The DR of a typical display is usually limited to about 3 log units, which is slightly higher than prints.
- Modern high-dynamic-range (HDR) displays have luminance DRs that might match that of a scene, but their peak luminance still falls far short of what a real-world scene can produce1.
To enhance display DR, we not only need to be able to produce a high peak luminance but also a very low, ideally 0, luminance when the pixel value is 0. There are many industry standards/certifications for HDR displays, almost all of which include metrics such as minimum peak luminance, maximum black-level luminance, contrast ratio, and bit depth (VESA 2024).
Given that it is unlikely that a display can fully reproduce the scene luminance, the next best question to ask is: how do we accurately reproduce the perceptual experience of the intended scene? To achieve this, the knob we have is the mapping of the intended luminance of each pixel in the image to a new luminance that is within the range afforded by the display (assuming of course the chromaticity is maintained throughout this mapping through appropriate color space transformations).
The mapping from scene luminance to displayed luminance is the OOTF in Figure 21.1, and is the central task of tone mapping. As noted earlier, the OOTF is not directly controlled; instead, it emerges from a cascade of transfer functions within the signal processing chain. Some of these transfer functions are determined by the design of the imaging and display hardware (② and ⑩), others by the image encoding and decoding formats (e.g., ③, ④, ⑥, ⑦), and the remainder by the image processing algorithms (⑤), abstracted as the Electro-Electrical Transfer Function (EETF).
The EETF is the component over which we have the greatest degree of control and, thus, where tone mapping operators (TMOs) are usually implemented. In principle, of course, we can influence tone mapping at any stage in the pipeline. For instance, the recent Rec. 2100 standard defines image encoding formats, on both the imaging and display sides, that allow for better tone mapping in HDR workflows.
We refer you to Reinhard (2010) and Mantiuk et al. (2015) for surveys of tone mapping techniques. The key is to preserve contrast. Recall that the human visual system has a contrast sensitivity function (Section 2.4.2), which tells us the minimal contrast necessary at each frequency for a pattern to be detectable. When we compress a large DR into a small one, (local) contrasts are lost due to quantization errors (insufficient bit depth) and, as a result, the displayed image looks “dull”.
21.2 The Chain of Processing
Let’s now walk through the chain of processing from luminance in the scene to the luminance emitted from a display.
21.2.1 Hardware-Intrinsic OETF
An imaging system fundamentally performs a signal transduction from the optical domain to the electrical domain. This transduction can be abstracted as Equation 16.7, where the scene power is converted to RAW pixels. Equation 16.7 can be thought of as the intrinsic Opto-Electrical Transfer Function (OETF) of the imaging system (② in Figure 21.1). Barring noise and ADC quantization errors, the RAW pixel values are roughly proportional to the scene luminance. A RAW pixel value can be expressed as:
\[ P_{cam} = \begin{bmatrix} hOETF_R(\Phi(\lambda)) \\ hOETF_G(\Phi(\lambda)) \\ hOETF_B(\Phi(\lambda)) \end{bmatrix}, \tag{21.1}\]
where \(P_{cam}\) is the pixel color in the camera RAW space, \(hOETF_R(\cdot)\), \(hOETF_G(\cdot)\), and \(hOETF_B(\cdot)\) represent the hardware-intrinsic OETF for the red, green, and blue channel, respectively, and \(\Phi(\lambda)\) represents the SPD of the incident light. We have three OETFs here because there are three different spectral sensitivity functions (Section 16.7.1). If using a 10-bit encoding, each channel in \(P_{cam}\) is bounded between 0 and 1023. The \(hOETF\)s are defined accordingly.
In a rendering system, the image pixel values are rendered/simulated rather than captured, but the same principle applies: the rendered pixels should ideally be proportional to, or even directly encode, the absolute luminance information of the rendered scene. Two main differences exist between rendering and imaging. First, in rendering we generally do not intentionally model sensor noise. Second, numerically solving the (volume) rendering equation leads to inaccuracies (Pharr, Jakob, and Humphreys 2018, chaps. 2, 13, 14), whereas the rendering equations are effectively “solved by nature” in imaging. Of course, rendering still incurs quantization error when the results are encoded with a finite bit depth. Therefore, the rendered pixels are usually not perfectly proportional to luminance.
21.2.2 Reference OETF
When saving RAW pixels as an image file (such as JPEG and PNG), we usually have a low bit budget. For instance, RAW pixels are typically encoded using 10 or 12 bits, but typical RGB images use 8 bits per color channel. Recall from Section 5.3.2 that when quantizing luminance-linear signals into digital values, we use a gamma-based encoding strategy, which attempts to encode the perceived brightness, rather than the physical luminance, uniformly. Gamma encoding makes better use of the limited bit budget by reducing the perceptual quantization errors in the low luminance range.
The encoding function \(f_{L \mapsto V}\) that maps a luminance-linear signal \(L\) to a digital value \(V\) that is actually saved in an image file is also called the OETF (③ in Figure 21.1). This OETF, however, purely represents an encoding strategy, and is clearly different from the hardware-intrinsic OETF of the imaging system.
- The hardware-intrinsic OETF represents an actual signal transduction, whereas the encoding OETF here purely manipulates information in the electrical domain: both \(L\) and \(V\) are just numbers, where \(L\) represents luminance information and \(V\) represents a digital value.
- The two need not be, and mostly will not be, the same. When people say OETF without any qualifier, what they refer to is the encoding OETF \(f_{L \mapsto V}\), a convention we will follow. We will explicitly use hardware-intrinsic OETF when referring specifically to the transfer function intrinsic to the signal transduction process.
After \(f_{L \mapsto V}\), pixel values are roughly proportional to perceived brightness. A good OETF should be designed based on models of human brightness perception. Over the years, many reference OETFs have been defined in video/broadcast standards, such as Rec. 601, Rec. 709, and, more recently, Rec. 2020 and Rec. 2100—all published by ITU-R2. In contrast, sRGB and Display P3 are color space standards (not defined by ITU-R). sRGB shares the same primaries and white point chromaticities as Rec. 709 but uses a different OETF. Display P3 offers a wider gamut than sRGB while using the same OETF.
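To make the encoding concrete, here is a minimal sketch of the sRGB OETF (the piecewise curve of Equation 5.1) and its inverse, assuming normalized inputs in [0, 1]; the constants are those published in the sRGB standard.

```python
import numpy as np

def srgb_oetf(L):
    """sRGB OETF: luminance-linear values in [0, 1] -> encoded values in [0, 1]."""
    L = np.asarray(L, dtype=np.float64)
    # A linear segment near black avoids the infinite slope of a pure power law.
    return np.where(L <= 0.0031308, 12.92 * L, 1.055 * np.power(L, 1 / 2.4) - 0.055)

def srgb_oetf_inverse(V):
    """Inverse of the sRGB OETF: encoded values -> luminance-linear values."""
    V = np.asarray(V, dtype=np.float64)
    return np.where(V <= 0.04045, V / 12.92, np.power((V + 0.055) / 1.055, 2.4))
```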
All of the standards above, except Rec. 2100, use relative luminance as input, where L=1 is given by some form of maximum luminance measure manually determined for a particular setting and, therefore, usually does not correspond to a fixed, absolute luminance level. That maximum luminance could be, for instance, the absolute luminance that just saturates the sensor in the imaging system. In theory, though, the brightness-vs-luminance model should take absolute luminance into account. The sRGB standard does specify a recommended display luminance of 80 nits, but that is just a recommendation and nothing prevents you from displaying an sRGB image on a dimmer or brighter display, in which case the brightness model underlying the OETF in sRGB technically would not apply.
In practice, the OETF is applied after a color space conversion (CSC) from the raw camera space to a standard color space such as sRGB or Display P3 (which we cover in Zhu (2022)), each of which specifies a reference OETF. The OETF is applied to each of the three color channels. Mathematically:
\[ \begin{aligned} P_{XYZ} &= T_{cam\_to\_XYZ} \times \text{diag}^{-1}(1024) \times P_{cam}, \\ P_{sRGB\_linear} &= T_{XYZ\_to\_sRGB} \times P_{XYZ}, \\ P_{sRGB} &= \text{diag}(255) \times \begin{bmatrix} OETF(P_{sRGB\_linear}(R))\\ OETF(P_{sRGB\_linear}(G))\\ OETF(P_{sRGB\_linear}(B)) \end{bmatrix}, \end{aligned} \tag{21.2}\]
where:
- \(\text{diag}^{-1}(1024) \times P_{cam}\) (\(\mathbb{R}^3 \in [0, 1]^3\)) is a color in the camera RAW space normalized to the [0, 1] range (assuming 10-bit RAW encoding).
- \(T_{cam\_to\_XYZ}\) is the transformation matrix from the normalized camera RAW space to the CIE 1931 XYZ space; it is usually illuminant dependent and normalized such that when the illuminant is normalized to have a Y value of 1, one of the RGB channels saturates (Rowlands 2020).
- \(P_{XYZ}\) is the color in the XYZ space.
- \(T_{XYZ\_to\_sRGB}\) is the transformation from the XYZ space to a color space, say sRGB, used to encode the image file; the matrix is usually normalized such that [1, 1, 1] in the linear sRGB space translates to Y=1.
- \(P_{sRGB\_linear}\) (\(\mathbb{R}^3 \in [0, 1]^3\)) is the color in the linear sRGB space.
- \(OETF(\cdot)\) is the OETF of the encoding space (the OETF for sRGB in this example is Equation 5.1).
- \(P_{sRGB}\) (\(\mathbb{Z}^3 \in [0, 255]^3\)) is the color in the sRGB space.
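Putting Equation 21.2 into code, a sketch of the RAW-to-sRGB encoding path might look like the following. The camera-to-XYZ matrix is device and illuminant specific, so it is passed in rather than assumed; the XYZ-to-linear-sRGB matrix is the standard one, and any reference OETF (e.g., the `srgb_oetf` sketched earlier) can be supplied.

```python
import numpy as np

# Standard XYZ -> linear sRGB matrix (D65 white point).
T_XYZ_TO_SRGB = np.array([[ 3.2406, -1.5372, -0.4986],
                          [-0.9689,  1.8758,  0.0415],
                          [ 0.0557, -0.2040,  1.0570]])

def raw_to_srgb(P_cam, T_cam_to_xyz, oetf, raw_scale=1024, out_scale=255):
    """Equation 21.2: normalized camera RAW -> XYZ -> linear sRGB -> encoded sRGB.
    `T_cam_to_xyz` must be calibrated per camera/illuminant; `oetf` is the
    reference OETF of the encoding space (e.g., srgb_oetf)."""
    p = np.asarray(P_cam, dtype=np.float64) / raw_scale        # diag^-1(1024) * P_cam
    p_xyz = T_cam_to_xyz @ p                                   # camera RAW -> XYZ
    p_srgb_linear = np.clip(T_XYZ_TO_SRGB @ p_xyz, 0.0, 1.0)   # XYZ -> linear sRGB
    return np.round(out_scale * oetf(p_srgb_linear)).astype(np.uint8)
```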
21.2.3 EETF
When an OETF-encoded image is later processed, we can use \(OETF^{-1}\) to recover the original luminance (④ in Figure 21.1). This is important because any further image processing should ideally operate in the luminance-linear space, where operations correspond to physical units; forgetting this can lead to many subtle bugs in code (Chen, Chang, and Zhu 2024)!
The image processing pipeline can be abstracted as an Electro-Electrical Transfer Function (EETF), as it processes digital pixels (⑤ in Figure 21.1). The EETF is usually the part of the entire processing pipeline that we get to control, so it is where we can impact the overall tone mapping and OOTF. Mathematically: \[ P'_{sRGB\_linear} = f(OETF^{-1}(\text{diag}^{-1}(255) \times P_{sRGB})), \tag{21.3}\]
where \(f(\cdot)\) is the tone mapping operator operating on luminance-linear signals, and \(P'_{sRGB\_linear}\) is the tone-mapped pixel value in the luminance-linear space. If \(f\) depends only on the value of \(P_{sRGB}\), the TMO is a global operator. In contrast, local TMOs can apply different transformations to pixels that share the same color but appear at different spatial locations.
The EETF-based tone mapping is most commonly implemented at the end of a rendering pipeline or a camera signal processing pipeline (Chapter 18), which is where we have access to raw (relative) luminance information before we have to turn luminance into digital values in, e.g., a JPEG or PNG image. But it is also common to control tone mapping by processing a JPEG/PNG image directly, which is what Equation 21.3 assumes and what is visualized in Figure 21.1.
One obvious thing to notice here is that even though ideally we would want to manipulate absolute luminance, as discussed in Section 21.1, the tone mapping operator here has to work with relative luminance. This is because the processing stages before the EETF usually do not preserve absolute luminance information, because of all the normalizations. Section 21.3 discusses the challenges in implementing a good EETF and the typical solutions.
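As a concrete (and deliberately simple) example of a global TMO that could serve as the \(f(\cdot)\) in Equation 21.3, the sketch below applies the classic global Reinhard operator to relative luminance; the `key` parameter is exactly the kind of heuristic knob discussed above, since no absolute luminance is available.

```python
import numpy as np

def reinhard_global_tmo(rgb_linear, key=0.18, eps=1e-6):
    """Global Reinhard tone mapping on luminance-linear RGB (relative units).
    Compresses a large dynamic range into [0, 1) while preserving ordering."""
    rgb = np.asarray(rgb_linear, dtype=np.float64)
    # Relative luminance from linear RGB (Rec. 709 / sRGB weights).
    Y = 0.2126 * rgb[..., 0] + 0.7152 * rgb[..., 1] + 0.0722 * rgb[..., 2]
    Y_avg = np.exp(np.mean(np.log(Y + eps)))   # log-average luminance of the image
    Y_scaled = (key / Y_avg) * Y               # anchor the average at `key`
    Y_out = Y_scaled / (1.0 + Y_scaled)        # smoothly compress the highlights
    ratio = Y_out / np.maximum(Y, eps)         # rescale RGB to preserve chromaticity
    return np.clip(rgb * ratio[..., None], 0.0, 1.0)
```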
21.2.4 Reference EOTF
After the EETF, each image pixel is mapped to an intended (relative) luminance. Now comes the time to display the image. We have to again turn luminance back into digital values3. Minimizing perceptual quantization error is still the key, since these digital values will eventually be decoded back to luminance. This requires, again, modeling human brightness perception, but this time the luminance range is limited by what the display can afford to produce, so the model would be somewhat different from that used to define the OETF on the imaging side.
We need a function \(f_{V \mapsto L}\) that maps a digital value \(V\) to a luminance \(L\). \(f_{V \mapsto L}\) is called the Electro-Optical Transfer Function (EOTF). This is potentially confusing: why do we not construct the function to map luminance to digital value, like we have done on the imaging side, but the other way around? Mathematically, this is largely a moot point because the function is constructed to be monotonic and, thus, invertible. In practice, we use EOTF, rather than OETF, on the display side simply to signify the fact that a display converts electrical signals to optical signals.
Over the years, there have been a set of reference EOTFs defined in various standards. Rec. 1886 is meant to give a good approximation of the hardware-intrinsic EOTF of CRT displays, and Rec. 2100 is meant to be used for HDR workflows, where absolute luminance is tracked. Rec. 709, Rec. 2100, sRGB, and Display P3 define both an OETF and an EOTF that are inverses of each other. In general, however, an OETF and an EOTF need not be inverses of each other. Both are designed with a good model of human brightness perception in mind. The difference in the underlying model is that at the imaging side the luminance is dictated by the scene, whereas at the display side the luminance is dictated by the display hardware; the two do not match, which, in turn, impacts the EOTF and OETF design. Therefore, while we use the OETF to encode scene luminance to a file (and \(OETF^{-1}\) to recover the scene luminance from the file), the display EOTF might not necessarily be \(OETF^{-1}\).
Given an intended luminance \(L\) we want to display, we use \(EOTF^{-1}\) to obtain the corresponding digital value \(V\) to be sent to the display (⑥ in Figure 21.1). This is usually carried out in a CSC. For example, if the input image is encoded in sRGB, the pixels remain in the sRGB space after applying the EETF. If the display operates in the Display P3 color space, a CSC must be performed from linear sRGB to linear Display P3, after which the inverse of the P3 EOTF is applied to obtain the digital pixel values. These EOTF-encoded digital pixels will then be transmitted through the MIPI DSI interface to the driver IC, as discussed in Section 20.2. Mathematically, the sequence of processing is:
\[ \begin{aligned} P_{XYZ} &= T^{-1}_{XYZ\_to\_sRGB} \times P'_{sRGB\_linear}, \\ P_{P3\_linear} &= T_{XYZ\_to\_P3} \times P_{XYZ}, \\ P_{P3} &= \text{diag}(1024) \times \begin{bmatrix} EOTF^{-1}(P_{P3\_linear}(R)) \\ EOTF^{-1}(P_{P3\_linear}(G)) \\ EOTF^{-1}(P_{P3\_linear}(B)) \end{bmatrix}, \end{aligned} \tag{21.4}\]
where:
- \(T_{XYZ\_to\_P3}\) is the transformation from the XYZ space to the linear Display P3 space; the matrix is usually normalized such that [1, 1, 1] in the linear P3 space translates to an XYZ value where Y=1.
- \(P_{P3\_linear}\) is the color in the linear P3 space.
- \(P_{P3}\) (\(\mathbb{Z}^3 \in [0, 1023]^3\)) is the color in the P3 space, assuming 10-bit encoding.
Even though the TMO in Equation 21.3 operates completely within the sRGB space, by cascading Equation 21.3 and Equation 21.4 we can see that an EETF-based TMO effectively maps pixels from the color space where the input image is encoded (\(P_{sRGB}\) here) to the color space where the tone-mapped image is to be displayed (\(P_{P3}\) here).
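A sketch of Equation 21.4 in code, building on the matrices above: the XYZ-to-P3 matrix is left as an input because its exact values should be taken from the Display P3 specification, and we exploit the fact (noted in Section 21.2.2) that Display P3 uses the same transfer curve as sRGB, so the inverse reference EOTF here can simply be the sRGB OETF sketched earlier.

```python
import numpy as np

# Standard linear sRGB -> XYZ matrix (D65); this is the inverse of T_XYZ_TO_SRGB above.
T_SRGB_TO_XYZ = np.array([[0.4124, 0.3576, 0.1805],
                          [0.2126, 0.7152, 0.0722],
                          [0.0193, 0.1192, 0.9505]])

def tonemapped_srgb_to_p3(P_srgb_linear_tm, T_xyz_to_p3, eotf_inverse, levels=1023):
    """Equation 21.4: tone-mapped linear sRGB -> XYZ -> linear Display P3 ->
    10-bit EOTF^-1-encoded P3 values. `T_xyz_to_p3` should come from the
    Display P3 specification; `eotf_inverse` is the inverse reference EOTF
    (the sRGB/P3 transfer curve in this example)."""
    p = np.asarray(P_srgb_linear_tm, dtype=np.float64)
    p_xyz = T_SRGB_TO_XYZ @ p                                  # linear sRGB -> XYZ
    p_p3_linear = np.clip(T_xyz_to_p3 @ p_xyz, 0.0, 1.0)       # XYZ -> linear P3
    return np.round(levels * eotf_inverse(p_p3_linear)).astype(np.uint16)
```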
21.2.5 Hardware-Intrinsic EOTF
The driver IC will then turn the P3-encoded pixel values to the DAC inputs. Can we directly use the former for the latter? Most likely not.
To see why, let’s assume that we are dealing with an AMOLED display; using Equation 20.1 and Equation 19.1, we know that to achieve a particular optical power \(P\), the following must hold:
\[ e\frac{P/(h f)}{\eta} = k(V_{DD} - V_{Data} - V_{th})^2, \]
Therefore, the desired \(V_{Data}\) is given by4:
\[ V_{Data} = V_{DD} - V_{th} - \sqrt{e\frac{P/(h f)}{\eta k}}. \]
With a DAC, we can convert a digital value to an analog voltage. Using an ideal DAC transfer function, the digital value to be sent to the DAC is then:
\[ \begin{aligned} D = \frac{V_{Data} - V_{min}}{\Delta}, \\ \Delta = \frac{V_{max} - V_{min}}{2^N - 1}, \end{aligned} \]
where \([V_{min}, V_{max}]\) is the DAC output range and \(N\) is the resolution.
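To make the arithmetic concrete, the sketch below solves for \(V_{Data}\) from a target optical power and then quantizes it with the ideal DAC transfer function. All device constants (\(\eta\), \(k\), \(V_{DD}\), \(V_{th}\), the DAC range) are hypothetical placeholders; real values are device specific.

```python
import numpy as np

E_CHARGE = 1.602176634e-19   # elementary charge (C)
H_PLANCK = 6.62607015e-34    # Planck constant (J*s)

def vdata_for_power(P, f, eta, k, V_DD, V_th):
    """Solve k*(V_DD - V_Data - V_th)^2 = e*(P/(h*f))/eta for V_Data,
    taking the root consistent with V_gs = V_DD - V_Data > V_th (saturation)."""
    I = E_CHARGE * (P / (H_PLANCK * f)) / eta   # OLED drive current for power P
    return V_DD - V_th - np.sqrt(I / k)

def dac_code(V_data, V_min, V_max, N):
    """Ideal N-bit DAC transfer: quantize V_data over the range [V_min, V_max]."""
    delta = (V_max - V_min) / (2 ** N - 1)
    return int(np.clip(np.round((V_data - V_min) / delta), 0, 2 ** N - 1))

# Hypothetical numbers, for illustration only (green light at ~550 nm).
V = vdata_for_power(P=1e-6, f=5.45e14, eta=0.2, k=1e-4, V_DD=5.0, V_th=1.0)
print(dac_code(V, V_min=0.0, V_max=5.0, N=10))
```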
The relationship between the digital value \(D\) and the emitted optical power \(P\) is what we call the display-intrinsic EOTF. From the theoretical analysis we can see that the relationship is non-linear. Figure 21.3 shows examples for four inorganic LEDs. In practice, the hardware-intrinsic EOTF is affected by many factors (such as variation in manufacturing, the particular driving circuit design, etc.), and is usually measured offline rather than modeled analytically.
We want to very explicitly differentiate between the display-intrinsic EOTF and the reference EOTF defined in a standard.
- The former maps digital values sent to the DAC to the luminance emitted: it is an inherent property of the display hardware (both the driving circuits and the emissive devices) and represents an actual signal transduction. The latter is purely a theoretical construction that is meant for efficient and effective digital encoding (based on human brightness perception); it operates completely within the electrical domain, except the input \(V\) represents a digital value and the output \(L\) represents (relative) luminance.
- The two EOTFs do not have to match and most definitely do not match5. When people say EOTF without any qualifier, they mean \(f_{V \mapsto L}\). We will specifically use display-intrinsic EOTF to refer to the actual EOTF that maps DAC values to emitted luminance.
Given the display-intrinsic EOTF, converting from P3-encoded pixels to DAC inputs requires two steps.
- First, we use the reference EOTF (in this case part of the Display P3 standard) to decode the actual luminance intended to be displayed (⑦ in Figure 21.1).
- Second, we perform a CSC from the Display P3 space to the display native space, after which we invert the display-intrinsic EOTF to obtain the digital value to send to the DACs (⑧ in Figure 21.1). This CSC is necessary because it is unlikely that the display primaries and white point exactly match those of a color space standard (e.g., Display P3), but we can measure them offline and construct a transformation matrix from the P3 space to the display native space.
Mathematically, this is:
\[ \begin{aligned} P_{P3\_linear} &= EOTF(\text{diag}^{-1}(1024) \times P_{P3}), \\ P_{disp} &= T_{P3\_to\_disp} \times P_{P3\_linear}, \\ P_{DAC} &= \begin{bmatrix} hEOTF^{-1}_R(P_{disp}(R)) \\ hEOTF^{-1}_G(P_{disp}(G)) \\ hEOTF^{-1}_B(P_{disp}(B)) \end{bmatrix} \end{aligned} \]
where:
- \(T_{P3\_to\_disp}\) is the transformation matrix from the linear Display P3 space to the display native space.
- \(P_{disp}\) is the pixel color in the display native space.
- \(P_{DAC}\) contains the DAC inputs, one for each channel, since each sub-pixel might have a different hardware-intrinsic EOTF.
In reality, we could calibrate, offline, three look-up tables (LUTs), each of which maps each digital level in a \(P_{P3}\) channel to a corresponding DAC input (⑨ in Figure 21.1). In this way, after going through the display-intrinsic EOTF (⑩ in Figure 21.1), the luminance emitted by the display matches the intended luminance. The LUTs can be constructed by, at each P3 digital level, repeatedly changing the DAC input and measuring the actual emitted luminance from the display until it matches the intended luminance. If the LUT size is a concern, we could measure just a few digital values and interpolate between them in hardware.
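A rough sketch of this offline calibration is shown below. Instead of searching one level at a time, it characterizes the display-intrinsic EOTF once and then, for each sampled P3 level, picks the DAC code whose measured luminance is closest to what the reference EOTF prescribes; `measure_luminance` is a hypothetical stand-in for driving the panel and reading a photometer.

```python
import numpy as np

def build_channel_lut(reference_eotf, measure_luminance, peak_nits,
                      p3_levels=1024, dac_levels=1024, sample_every=32):
    """Calibrate a LUT mapping P3 digital levels -> DAC codes for one channel.
    Only every `sample_every`-th level is calibrated; the rest are filled in
    by interpolation, mirroring how hardware keeps the table small."""
    # Characterize the display-intrinsic EOTF once, at every DAC code.
    measured = np.array([measure_luminance(d) for d in range(dac_levels)])
    sampled_levels, sampled_codes = [], []
    for level in range(0, p3_levels, sample_every):
        # Luminance the reference EOTF says this digital level should produce.
        target = peak_nits * reference_eotf(level / (p3_levels - 1))
        sampled_levels.append(level)
        sampled_codes.append(int(np.argmin(np.abs(measured - target))))
    lut = np.interp(np.arange(p3_levels), sampled_levels, sampled_codes)
    return np.round(lut).astype(np.uint16)
```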
21.3 Practical Tone Mapping
Tone mapping controls the OOTF of the end-to-end system, so ideally TMOs should manipulate absolute luminance information. In reality, however, TMOs are implemented as the EETF, which has no access to the absolute luminance information.
- For instance, if we are given an sRGB image to display, an image pixel [10, 20, 30] in the sRGB space tells us nothing about the absolute luminance of each channel.
- Even if an image is captured through an HDR imaging workflow and encoded in an HDR format, e.g., OpenEXR (ILM 2025), which has a very high bit depth (even allows for floating point numbers!), the absolute luminance information is usually still not encoded.
- Worse, we do not always know the target display’s luminance range, which is ultimately what matters since that is where the image will be displayed! This is often the case when tone mapping is done in a camera signal processing pipeline that is agnostic to the viewing display.
- Perhaps the only exception is when images are generated using physically-based spectral rendering, where spectral radiance information is tracked throughout the rendering pipeline.
Absent absolute luminance information, the TMO has to operate in normalized, luminance-linear spaces. Some guesswork and heuristics are involved. For instance, if the input image is encoded using sRGB, one fair assumption to make is that the image is to be displayed on a display with a peak luminance of 80 nits, which is the recommended luminance in the sRGB standard (that is rarely followed!). If the image is to be displayed on a display that supports the Rec. 2100 standard, we can assume that the display will have a peak luminance of at least 1,000 nits and can go up to 10,000 nits.
Many software tools allow us to interactively adjust the EETF, such as the famous Curves tool in Photoshop and Lightroom. With these tools, even though we are not explicitly told of the display luminance range, the absolute output luminance information is directly seen on the display, so we can judge for ourselves whether we like it or not.
Figure 21.4 shows three such examples in Lightroom, each of which has a tonal adjustment curve that maps an input pixel value in the normalized, luminance-linear input range (x-axis) to an output pixel value in the same range and color space (y-axis). The curves here are effectively the TMO \(f\) in Equation 21.3.
With a simple linear mapping in the first example, the image looks quite dark and dull. This is because most of the input pixel values are quite low (judging from the color histogram at the top), so essentially most of the pixels are mapped to low output digital values. We can raise the brightness by raising the tonal curve, as done in the second example. That curve essentially increases the contrast ratio of the low-to-mid luminance pixels and compresses the contrast ratio of the mid-to-high luminance pixels.
In my last example, I have raised the tonal curve so much that many input digital levels are mapped to the same maximum output digital level, as if those pixels were “saturated” during imaging. What this does is give the low-to-mid luminance pixels an even larger contrast ratio, so the details look more vivid. Perhaps surprisingly, this intentional saturation does not actually lead to visible “over-exposure” in the final image. Why? Look at the color histogram at the top of the third example: only a small fraction of pixels are actually saturated, even though a relatively wide range of pixel values is mapped to saturation.
The tonal adjustment curve is also a place for creative expression even if we are not concerned with tone mapping per se. Readers familiar with the Curves tool in Photoshop will be familiar with the notion of an “S-curve” or an “inverse S-curve” (if not, see these articles). The former essentially increases the contrast ratio between the highlights and shadows in an image and the latter does the opposite. These adjustments are meant to enhance the visual experience (e.g., increasing the contrast improves the visibility of some otherwise less detectable details) at the cost of technically changing the relative luminance information in the physical scene.
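As an illustration, one of many possible parameterizations of an S-curve (and its inverse-S counterpart) can be built from a smoothstep blended with the identity; this is merely a sketch, not the curve any particular tool uses.

```python
import numpy as np

def s_curve(x, strength=1.0):
    """Blend the identity with a smoothstep: raises midtone contrast while
    compressing shadows and highlights (an "S-curve")."""
    x = np.clip(np.asarray(x, dtype=np.float64), 0.0, 1.0)
    smooth = x * x * (3.0 - 2.0 * x)               # classic smoothstep
    return (1.0 - strength) * x + strength * smooth

def inverse_s_curve(x, strength=1.0):
    """The opposite adjustment: lifts shadows and pulls down highlights,
    lowering midtone contrast."""
    x = np.clip(np.asarray(x, dtype=np.float64), 0.0, 1.0)
    # Reflect the S-curve about the identity line (stays monotone in [0, 1]).
    return 2.0 * x - s_curve(x, strength)
```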
As another example, Figure 21.5 shows four tone mapping examples and their associated tonal adjustment curves. The first three use global TMOs similar to the three examples in Figure 21.4. The last example uses a local TMO from the HDR+ pipeline in Google Pixel phones (Hasinoff et al. 2016). We can see it uses a local TMO because there is no single mapping function from an input pixel value to an output pixel value. Instead, the adjustment “curve” is actually a heatmap showing, for each input pixel, the output pixel distribution after tone mapping. The local TMO is realized by dividing an image into tiles and designing a curve for each tile.
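The sketch below captures the tile-based idea in its simplest form (it is not the HDR+ algorithm): each tile contributes its own local anchor, so pixels with the same value can be mapped differently depending on where they sit.

```python
import numpy as np

def local_tmo_tiles(Y, tiles=(8, 8), key=0.18, eps=1e-6):
    """Tile-based local tone mapping on a relative-luminance image Y (H x W).
    Each pixel is compressed against its tile's log-average luminance, so dark
    and bright regions get different effective curves. Assumes H and W are
    divisible by the tile counts; a sketch of the idea only."""
    H, W = Y.shape
    th, tw = H // tiles[0], W // tiles[1]
    local_avg = np.empty(tiles)
    for i in range(tiles[0]):
        for j in range(tiles[1]):
            tile = Y[i * th:(i + 1) * th, j * tw:(j + 1) * tw]
            local_avg[i, j] = np.exp(np.mean(np.log(tile + eps)))
    # Nearest-neighbor upsampling of the per-tile anchors; a real implementation
    # would interpolate between tile centers to hide seams at tile boundaries.
    up = np.kron(local_avg, np.ones((th, tw)))
    Y_scaled = (key / (up + eps)) * Y
    return Y_scaled / (1.0 + Y_scaled)
```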
21.4 Color Management
Everything we have discussed so far benefits immensely from principled color management, which is concerned with maintaining a consistent color appearance throughout a workflow that might involve wildly different capturing devices (e.g., cameras, scanners) and output devices (e.g., displays, printers).
Color management requires a collaboration between every single piece that touches color in the workflow. The image file must come with a profile that specifies what color space its pixel colors are encoded in and (an estimation of) the viewing condition under which the image was originally edited/viewed. The software that manipulates image content must correctly read and interpret the profile and perform the necessary transformation, potentially through APIs exposed by the Operating System (OS). And the display firmware and driver must communicate a similar profile of the display itself to the OS. Giorgianni and Madden (2009) and Sharma (2018) are two excellent references for color management.
First, the image file should ideally have metadata that tells us what color space its pixel colors are encoded in or, better, the transformation matrix from the image’s color space to a device-independent color space, say the CIE XYZ space. The way to describe such information has been standardized by ICC in what is called the ICC profile (International Color Consortium 2019; Sharma 2018, chap. 5). We can embed an ICC profile in common image file formats such as JPEG.
Second, the display itself also has to report its native color space. To do that, modern displays usually come with an ICC profile that describes how to transform from the CIE XYZ space to the display’s native space. Now when the Operating System gets the image file, it would first transform the, say, sRGB colors to the XYZ space using the ICC profile in the image and then transform the colors in the XYZ to the display’s native space using the display ICC profile. You can see that the XYZ space here serves to connect the input color space and the output color space. ICC calls such a space a Profile Connection Space (PCS).
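In code, the color-managed path through the PCS might look like the sketch below, where the two matrices stand in for the transforms recorded in the image’s and the display’s ICC profiles (real profiles also carry tone curves and rendering-intent data, omitted here for simplicity).

```python
import numpy as np

def color_managed_convert(pixels_linear, T_image_to_xyz, T_xyz_to_display):
    """Route colors through the Profile Connection Space (CIE XYZ):
    image color space -> XYZ -> display native space. The two matrices
    would be derived from the image's and the display's ICC profiles;
    pixel values are assumed to be linear (already decoded)."""
    p = np.asarray(pixels_linear, dtype=np.float64)           # shape (..., 3)
    p_xyz = p @ T_image_to_xyz.T                              # source space -> PCS
    return np.clip(p_xyz @ T_xyz_to_display.T, 0.0, 1.0)      # PCS -> display native
```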
21.4.1 Gamut Mapping
A display might support a color space whose gamut is smaller than that of the image’s encoding space. For instance, the display might support only sRGB while the image is encoded in DCI-P3, so some of the P3 colors might not be accurately reproduced. That is, \(P_{disp}\) in the equation of Section 21.2.5 might fall outside the [0, 1] bound. The best we can do is approximate an out-of-gamut color with an in-gamut color so as to minimize the color error. This is called gamut mapping. Morovič (2008) and Glassner (1995, chap. 3.6) describe the basic algorithms, with the former being more recent and comprehensive.
The simplest strategy would be to simply clamp out-of-range values, so a color of [12, 200, 300] would become [12, 200, 255]. Clearly, other than being extremely simple to implement, this strategy introduces large color reproduction errors. The International Color Consortium (ICC) has defined four rendering intents, each of which corresponds to a gamut mapping algorithm (the intents are vaguely worded, and the implementation details can vary).
For instance, the Absolute rendering intent leaves all the in-gamut colors unchanged but maps the out-of-gamut colors to the boundary of the color gamut. The Perceptual rendering intent can be implemented by uniformly projecting all the colors toward the white point so that all the colors are in gamut. You can imagine that while this maintains the relative color appearance between colors (which the Absolute rendering intent fails to do), it also changes in-gamut colors that could otherwise have been accurately rendered!
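The two intents can be sketched as below; the “perceptual” variant shown is only one naive reading of the intent (a uniform, image-wide desaturation toward the white point), not the algorithm any particular ICC implementation mandates.

```python
import numpy as np

def clip_to_gamut(rgb_linear):
    """Absolute-style handling: leave in-gamut colors alone and clamp
    out-of-gamut ones to the gamut boundary."""
    return np.clip(rgb_linear, 0.0, 1.0)

def desaturate_toward_white(rgb_linear, eps=1e-6):
    """Perceptual-style handling (one naive reading): out-of-gamut chromaticities
    show up as negative channels in the destination linear RGB. Blend *every*
    pixel toward the white point by the single smallest amount that makes all
    channels non-negative, preserving relative appearance at the cost of also
    shifting colors that were already in gamut."""
    rgb = np.asarray(rgb_linear, dtype=np.float64)
    worst = np.min(rgb)                                  # most negative channel value
    t = 0.0 if worst >= 0.0 else (-worst) / (1.0 - worst + eps)
    return (1.0 - t) * rgb + t * np.ones(3)
```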
21.4.2 Chromatic Adaptation
During the color space transformation, we usually perform an additional transformation so that sRGB white becomes the white point in the display space. This is called chromatic adaptation, which is discussed in Section 6.3. This is to accommodate the fact that the viewer might be under a different viewing condition than the condition under which the photo was originally edited. The viewing condition could affect the actual appearance of a color, so we must account for this shift in viewing condition through chromatic adaptation.
1. Case in point: you have probably never really felt too uncomfortable staring at a display, but staring at white paper under noon sunlight is excruciating.
2. ITU-R refers to the Radiocommunication Sector of the International Telecommunication Union; these standards are formally designated with names such as ITU-R BT.2100, commonly shortened to Rec. 2100.
3. We encode luminance into digital values because the raw luminance data are continuous and would require floating-point representation, which cannot be sent directly to the display. Encoding reduces bandwidth demands and ensures compatibility with nearly all existing interface protocols.
4. Given that \(V_{gs} > V_{th}\) for the TFT to operate in the saturation region.
5. Except maybe in the CRT case, where its \(EOTF^{-1}\) roughly matches human luminance-to-brightness perception.