5  Colorimetry

Colorimetry is concerned with quantitatively studying color, a subjective experience. Not until we can put our experience into numbers can we rigorously study colors. In Section 4.2, we have seen two ways to geometrically interpret a color as a point in a three-dimensional space: the cone space and the CIE 1931 RGB space.

The main goal of this chapter is to introduce other commonly used color spaces to quantitatively analyze colors. Some of these color spaces are device-independent, just like the LMS cone space, so they permit us to analyze all human visible colors. Other color spaces are device-dependent; they are concerned with colors that can physically be captured (by an imaging device) or produced (by a display device). Studying device-dependent spaces allows us to appreciate many subtle but important issues in real-world color workflows.

Classic colorimetry is concerned only with color matching under the same viewing condition. It tells us if two objects or light sources have the same color when viewed under exactly the same conditions (e.g., ambient illumination). It does not tell us 1) how different two colors are and 2) the actual appearance of a color, which depends on the viewing condition. Color difference will be discussed in Section 5.7; Fairchild (2013) is a classic reference on color appearance modeling, which we will touch upon in Section 7.3.

5.1 CIE 1931 XYZ Space

There are two slight inconveniences with the CIE 1931 RGB color space. First, it depends on the exact primary colors (and reference white) you choose. Second, there are inevitably going to be colors that can be “produced” only by using negative amounts of the primaries, no matter what primaries you choose. While mathematically and physically rigorous, a negative amount of a primary is not quite intuitive. So CIE in 1931 wanted to standardize a color space that 1) can be used as a “common language” (without having to laboriously specify what primaries are used every time you say “the RGB color space”) and 2) produces all the human-visible colors by mixing non-negative amounts of its primaries. That color space is called the CIE 1931 XYZ color space, sometimes referred to simply as the XYZ color space.

You might be wondering: isn’t the LMS cone space already a color space that satisfies the two conditions above, and if so, why do we have to invent a new XYZ space? The cone space is tied intrinsically to the HVS, so it does not vary (significantly) across the population. It is also a color space where all the colors are expressed using positive amounts of the primaries (cone responses). These are all true, but remember that the cone fundamentals were not reliably available back in 1931 (Section 3.2).

Figure 5.1: The CIE 1931 XYZ color space (right) is constructed to be a linear transformation from the CIE 1931 RGB color space (left). Notice how a color, say, 600 \(\text{nm}\) spectral light is represented differently in the two color spaces. This figure visualizes how the spectral locus and the CMFs are transformed. The exact coefficients of the transformation matrix \(T_{rgb2xyz}\) are omitted here but are widely available online. The CIE 1931 RGB CMFs figure is adapted from Marco Polo (2007), and the XYZ CMFs figure is adapted from Acdx (2009).

Fairman, Brill, and Hemmendinger (1997), Brill (1998), and Service (2016, sec. 4) describe the process and the (sometimes rather arbitrary) design decisions that went into turning the CIE 1931 RGB space into the 1931 XYZ space. Zhu (2022c) is an interactive tutorial that walks through the math.

The bottom line is that the transformation from the CIE RGB to the XYZ space is constructed to be a linear transformation. Figure 5.1 shows how the spectral locus is transformed from the RGB to the XYZ space, governed by the matrix \(\mathbf{T}_{rgb2xyz}\). We can see that in the RGB space the spectral locus enters negative octants, but it stays entirely within the all-positive, first octant in the XYZ space. The transformation also gives a new set of CMFs in the XYZ space. The Y CMF is intentionally designed to match the CIE 1924 Luminous Efficiency Function (LEF), so that by looking at the Y value of a color, we can tell what its luminance is (refer to Section 4.3.2 for the definition of the LEF and its various caveats).
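To make the linear transformation concrete, here is a minimal sketch that applies one commonly quoted set of CIE 1931 RGB-to-XYZ coefficients. The coefficients (including the \(1/0.17697\) normalization, which some references drop) are an assumption taken from widely available tables rather than from this chapter, and the function name `cie_rgb_to_xyz` and the sample tristimulus values are purely illustrative.

```python
# Commonly quoted CIE 1931 RGB -> XYZ coefficients (assumed, not from this chapter).
# Some references omit the 1/0.17697 normalization factor.
SCALE = 1.0 / 0.17697
T_RGB2XYZ = [
    [0.49000, 0.31000, 0.20000],
    [0.17697, 0.81240, 0.01063],
    [0.00000, 0.01000, 0.99000],
]


def cie_rgb_to_xyz(rgb):
    """Apply the 3x3 linear transformation to an [R, G, B] tristimulus vector."""
    return [
        SCALE * sum(T_RGB2XYZ[i][j] * rgb[j] for j in range(3))
        for i in range(3)
    ]


# Equal amounts of the three primaries map to equal X, Y, and Z,
# because every row of the unscaled matrix sums to 1.
white_xyz = cie_rgb_to_xyz([1.0, 1.0, 1.0])

# A hypothetical tristimulus vector with a negative R component
# still lands in the all-positive octant of the XYZ space.
xyz = cie_rgb_to_xyz([-0.1, 0.3, 0.1])
```

Because the mapping is linear, scaling the RGB vector scales the XYZ vector by the same amount.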

5.2 Chromaticity Diagram

How do a color that is mixed from 1:2:4 units of RGB primaries and a color that is mixed from 2:4:8 units of the primaries relate? The amount of a primary is directly proportional to the power of that primary, so the second color can be obtained by doubling the power of each primary in the first color. Similarly, halving the power of each primary in the second color gets us the first color. Intuitively, lights that have the same primary quantity ratio have the same “objective color quality” while differing in the intensity.

5.2.1 Chromaticity is the Result of a Perspective Projection

More formally, we can calculate the primary ratio \(r:g:b\) of a color and then normalize the ratio such that \(r + g + b = 1\) (100%). The so-calculated \(r\), \(g\), \(b\) values of a color are called the (RGB) chromaticity values of that color. Mathematically, the chromaticity of a color defined in an RGB space is calculated from its absolute quantity by:

\[ \begin{align} r = \frac{R}{R+G+B}\\ g = \frac{G}{R+G+B}\\ b = \frac{B}{R+G+B} \end{align} \]

Geometrically, going from the RGB values of a color to the rgb chromaticity is equivalent to a perspective projection, where we project an [R, G, B] point through the origin to the \(r+g+b=1\) plane. The left panel in Figure 5.2 visualizes this projection. Each line that goes through the origin is an “equi-chromaticity” line, in that all the colors on that line have the same chromaticity. The spectral locus is so projected to the \(r+g+b=1\) plane. Since there are only two degrees of freedom in chromaticity, we can visualize the chromaticity in a two-dimensional space, and usually the \(r\) and \(g\) coordinates are used. The right panel in Figure 5.2 shows the spectral locus in the rg-chromaticity diagram.
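The projection can be sketched in a few lines of code. The function name `chromaticity` is hypothetical; the example reuses the 1:2:4 and 2:4:8 mixtures from the beginning of this section to show that both lie on the same equi-chromaticity line.

```python
def chromaticity(R, G, B):
    """Project an [R, G, B] color through the origin onto the r + g + b = 1 plane."""
    total = R + G + B
    if total == 0:
        raise ValueError("chromaticity is undefined when R + G + B = 0")
    return (R / total, G / total, B / total)


# Two lights with the same primary ratio but different power.
dim = chromaticity(1, 2, 4)
bright = chromaticity(2, 4, 8)

# Both share the chromaticity (1/7, 2/7, 4/7); only r and g need to be
# plotted, since b = 1 - r - g.
r, g, _ = dim
```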

Figure 5.2: Visualization of the CIE 1931 RGB space and its rg-chromaticity diagram. Left: the transformation from an [R, G, B] color to its [r, g, b] chromaticity is a perspective projection to the \(r+g+b=1\) plane. Each line that goes through the origin is an “equi-chromaticity” line, in that all the colors on that line have the same chromaticity. We use the CIE 1931 RGB color space for illustration here, but the same idea applies to other color spaces as well, e.g., the CIE 1931 XYZ space. From the interactive tutorial in Zhu (2022b). Right: visualization of the spectral locus in CIE 1931 RGB space; from Fairman, Brill, and Hemmendinger (1997, fig. 2).

5.2.2 xy-Chromaticity Diagram and Its Interpretation

Of course we can do the same if a color is defined in the XYZ space or the LMS cone space, and we omit the trivial math here. The left panel in Figure 5.3 shows the xy-chromaticity diagram. It is obtained by first converting from the XYZ space to the xyz space and then plotting only the x and y axes (z is implicit in that \(x+y+z=1\)). The horseshoe curve is the spectral locus. For reference, we also show the three primary lights and the white point of the CIE 1931 RGB color space as well as the Planckian locus, which shows the chromaticities of the black-body radiation at different temperatures (Figure 4.5).
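For completeness, the conversion mirrors the rgb case. The inverse direction, which is a standard identity rather than something specific to this chapter, needs the luminance \(Y\) to be supplied separately because chromaticity discards intensity:

\[ x = \frac{X}{X+Y+Z}, \quad y = \frac{Y}{X+Y+Z}, \quad z = \frac{Z}{X+Y+Z} = 1 - x - y \]

\[ X = \frac{x}{y}\,Y, \quad Z = \frac{1 - x - y}{y}\,Y \quad (y \neq 0) \]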

Figure 5.3: Left: The gamut and spectral locus of the CIE 1931 RGB space visualized in the xy-chromaticity diagram; adapted from PAR (2012). The Planckian locus is shown for reference too. A point outside the (convex) spectral locus is an imaginary color. Right: comparison of different color spaces in the xy-chromaticity diagram; from Myndex (2022). A color space’s chromaticity gamut is a triangle; a color outside the triangle cannot be physically produced in that color space.

We can make a few general observations. First, the triangle in the diagram represents the chromaticity values of all the colors that can be produced by mixing different amounts of the three colors whose chromaticities are the vertices of the triangle. That is, given three colors \([R_1, G_1, B_1]\), \([R_2, G_2, B_2]\), \([R_3, G_3, B_3]\) and their chromaticity coordinates \(\mathbf{c_1} = [\frac{R_1}{R_1+G_1+B_1}, \frac{G_1}{R_1+G_1+B_1}]\), \(\mathbf{c_2} = [\frac{R_2}{R_2+G_2+B_2}, \frac{G_2}{R_2+G_2+B_2}]\), and \(\mathbf{c_3} = [\frac{R_3}{R_3+G_3+B_3}, \frac{G_3}{R_3+G_3+B_3}]\), we can show that if we mix these colors to form a color C, \([\alpha R_1 + \beta R_2 + \gamma R_3, \alpha G_1 + \beta G_2 + \gamma G_3, \alpha B_1 + \beta B_2 + \gamma B_3]\) (\(\alpha, \beta, \gamma\) are the non-negative contributions of the primary colors), C’s chromaticity is necessarily inside the triangle \(\bigtriangleup \mathbf{c_1}\mathbf{c_2}\mathbf{c_3}\). So the triangle \(\bigtriangleup \mathbf{R}\mathbf{G}\mathbf{B}\) represents the chromaticities that can be physically produced by the CIE 1931 RGB primary lights. We call that the chromaticity gamut of the color space, or sometimes simply the gamut of the color space, but we should keep in mind that the actual gamut of a color space is always a three-dimensional concept.
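The triangle claim can be checked with a short derivation, sketched here under the assumptions that \(\alpha, \beta, \gamma \geq 0\) (not all zero) and that each sum \(S_i = R_i + G_i + B_i\) is positive. The symbols \(S_i\), \(r_i\), and \(w_i\) are shorthand introduced only for this sketch. Writing \(r_i = R_i / S_i\), the r-chromaticity of the mixture C is

\[ r_C = \frac{\alpha R_1 + \beta R_2 + \gamma R_3}{\alpha S_1 + \beta S_2 + \gamma S_3} = w_1 r_1 + w_2 r_2 + w_3 r_3 \]

\[ w_1 = \frac{\alpha S_1}{\alpha S_1 + \beta S_2 + \gamma S_3}, \quad w_2 = \frac{\beta S_2}{\alpha S_1 + \beta S_2 + \gamma S_3}, \quad w_3 = \frac{\gamma S_3}{\alpha S_1 + \beta S_2 + \gamma S_3} \]

The same weights apply to the g-chromaticity. Since each \(w_i \geq 0\) and \(w_1 + w_2 + w_3 = 1\), the chromaticity of C is a convex combination of the three vertices, and the set of all such combinations is exactly the triangle. Note that the weights depend on the total amounts \(S_i\) as well as on \(\alpha, \beta, \gamma\), so equal contributions do not generally land at the centroid.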

Second, we can extend from mixing three colors to mixing an arbitrary number of colors and show that the interior of the spectral locus represents the chromaticities of all the colors that humans can see, i.e., the gamut of the HVS. This is true because the shape of the spectral locus is convex, so connecting any two points (i.e., mixing two colors) on or inside the locus will never go beyond the locus. By extension, a positive linear combination of any points on or inside the locus will always stay inside the locus. A natural implication is that any point outside the spectral locus represents an imaginary color, since that point can never be constructed by a positive linear combination of points on or inside the spectral locus.

Third, the right panel in Figure 5.3 shows the gamut of a few common color spaces. The sRGB color space is the most commonly used color space; virtually every single display supports it, and images, by default, are encoded in the sRGB format. We will have more to say about displays and image encoding later. Observe how small the sRGB gamut is: it covers about 35% of the HVS gamut. P3 is a wider gamut that is supported in many new displays. Rec.2020 is an even wider gamut that is yet to be widely supported; it is 72% larger than the sRGB gamut and 37% larger than the P3 gamut. ProPhotoRGB contains colors that are beyond the HVS gamut, so to produce all the real colors in the ProPhotoRGB space we will need more than three primary lights. It is mostly used in Adobe Lightroom and Adobe Camera RAW software. They both deal with RAW images before they are encoded in a format that is displayable. We will talk about RAW imaging and processing later in Chapter 14.

Finally, no display can produce all the colors that humans can see. No matter where you choose to place the primary colors in the chromaticity diagram and how many primaries you choose, the resulting gamut will never completely cover the entire HVS gamut as long as the primary colors are real colors (i.e., on or inside the spectral locus) and you have a finite number of them. This is again because the spectral locus is convex. For this reason, do not trust the colors in any xy-chromaticity diagram: the undisplayable colors are approximated by in-gamut, displayable colors. This is called gamut mapping, which we will discuss in Section 5.6.3.

5.2.3 HVS Gamut

We can systematically sample the chromaticities in the chromaticity diagram to visualize what the HVS gamut looks like. Figure 5.4 visualizes the HVS gamut in both the XYZ space and the xy-chromaticity diagram. Comparing the two, you can see how a selected set of colors highlighted in the XYZ space maps to a curve in the xy-chromaticity diagram.

Figure 5.4: HVS gamut visualized in the XYZ space and in the xy-chromaticity diagram. We systematically sample the chromaticities in the chromaticity diagram using square pulses as the light SPDs (insets on the right). From the interactive tutorial in Zhu (2022d).

There are, of course, many ways you can sample the chromaticities to get good coverage of the HVS gamut, and Zhu (2022d) is an interactive tutorial that talks about this in detail (you can also see what the HVS gamut looks like in different color spaces). A common way seems to be to generate SPDs that are square pulses with equal peaks (see the insets on the right), which will guarantee that you do not repeatedly sample the same chromaticity point. This is what the popular Python package Colour (NumFOCUS n.d.) does, but nothing prevents you from using a different method, as explored in Zhu (2022d). Of course, the actual HVS gamut has no boundary: we can indefinitely grow the gamut by simply scaling up the light power.
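As a rough illustration of the square-pulse idea, the sketch below enumerates SPDs on a coarse wavelength grid, where each SPD is a single contiguous pulse of unit height. The cyclic wraparound (a pulse running off the long-wavelength end continues at the short-wavelength end) and the function name `square_pulse_spds` are assumptions of this sketch; it is not claimed to reproduce what any particular package does. Turning each SPD into a chromaticity would additionally require integrating it against the CMFs, which is omitted here.

```python
def square_pulse_spds(num_bins):
    """Enumerate unit-height square-pulse SPDs over `num_bins` wavelength bins.

    Each SPD is one contiguous run of 1.0 values (with cyclic wraparound),
    for every possible width and starting bin, plus the all-ones (flat) SPD.
    """
    spds = []
    for width in range(1, num_bins):
        for start in range(num_bins):
            spd = [0.0] * num_bins
            for k in range(width):
                spd[(start + k) % num_bins] = 1.0
            spds.append(spd)
    spds.append([1.0] * num_bins)  # the widest pulse covers every bin
    return spds


# With 4 bins there are 3 widths x 4 starting positions + 1 flat SPD.
pulses = square_pulse_spds(4)
```

Every SPD in the list is distinct, so no spectrum is visited twice; finer grids give denser coverage at a quadratic cost in the number of SPDs.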

5.3 Color Cube

The various color spaces we have been discussing are great, but they do not seem to be the sort of color spaces we use in everyday software when specifying colors. By far the most common way in practical applications to specify colors is by using a color cube, where you can specify the primary values (usually R, G, and B) of a color, each an integer between 0 and 255. What exactly are the colors that can be represented by such a color cube? How is it related to the color gamut we have discussed, and how do we construct a color cube? These are questions explored in the interactive tutorial (Zhu 2022d), which you are invited to go through. Figure 5.5 illustrates the idea, and we will give a brief summary of the main steps.

Figure 5.5: Pick the primary colors (which usually are termed R, G, and B, because they usually are red-ish, green-ish, and blue-ish) and the white point in the xy-chromaticity space (left panel) and then construct a color cube from them (right panel). Note how the spectral locus is now positioned in the constructed RGB space. From the interactive tutorial in Zhu (2022a), which we invite you to study, you can see that as you change the primary colors and/or the white point, the resulting color gamut and the color cube will change accordingly.

5.3.1 Step 1: A Linear Transformation From the XYZ Space

  • We know that a color space is defined by its three primary colors and the white point, which you get to choose when building your own color cube. The left panel shows one such choice, which happens to be what is used by the sRGB color space.

  • Knowing these four points uniquely defines the shape of a parallelepiped in the XYZ space (middle panel). The space inside the parallelepiped corresponds to actual colors that can be produced by using the primary colors.

    Note that at this point we know only the relative shape, but not the absolute scale, of the parallelepiped: we can uniformly scale the power of the primary colors and white point, which will not change their chromaticity values but will expand or shrink the parallelepiped. The convention is to set the Y value of white to be 1 and normalize everything else accordingly, but of course the actual luminance of white (and any other color) depends on the actual device used.

  • Now we turn the parallelepiped into a cube that is positioned between [0, 1] in all three directions (right panel). The white point in the XYZ space will be [1, 1, 1] in the color cube, signifying that white is produced from equal units of the three primary colors. This amounts to a linear transformation from the XYZ space.

    Note also how the spectral locus is now positioned in the RGB space: part of the locus (and by extension the HVS gamut) is now outside the RGB cube, showing that there exist real colors (i.e., inside the HVS gamut) that cannot be produced by the choice of the primary colors. This is consistent with our gamut interpretation in the chromaticity diagram (Figure 5.3).

What we have done so far is to construct a linear transformation matrix, \(T_{xyz2rgb}\), which transforms the parallelepiped (middle panel in Figure 5.5) to a cube (right panel in Figure 5.5). This transformation matrix will change if we change any primary color or the white point of our color space (the interactive tutorial in Zhu (2022a) will allow you to do exactly that). Either way, the color cube we have built so far is luminance-linear: if we double the power of a light whose color is [R, G, B], we will get a color [2R, 2G, 2B]. This is because the XYZ space is luminance-linear, and the RGB cube we have so far is a linear transformation from the XYZ space.
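The steps above can be sketched in code. This is a minimal illustration, not the tutorial's implementation: it assumes the standard sRGB primary chromaticities and the D65 white point, sets the Y of white to 1 as described, and uses hypothetical helper names (`xy_to_xyz`, `build_matrices`, and so on).

```python
def xy_to_xyz(x, y, Y=1.0):
    """Chromaticity (x, y) plus luminance Y -> XYZ tristimulus values."""
    return [x * Y / y, Y, (1.0 - x - y) * Y / y]


def mat_vec(m, v):
    """Multiply a 3x3 matrix by a length-3 vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def inverse_3x3(m):
    """Invert a 3x3 matrix via its adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[adj[r][k] / det for k in range(3)] for r in range(3)]


def build_matrices(primaries_xy, white_xy):
    """Return (rgb2xyz, xyz2rgb) for the given primaries and white point."""
    # Columns of P are the primaries' XYZ values at an arbitrary scale (Y = 1).
    cols = [xy_to_xyz(x, y) for (x, y) in primaries_xy]
    P = [[cols[j][i] for j in range(3)] for i in range(3)]
    # Choose per-primary scales so that RGB = [1, 1, 1] lands on white (Y = 1).
    white = xy_to_xyz(*white_xy)
    scale = mat_vec(inverse_3x3(P), white)
    rgb2xyz = [[P[i][j] * scale[j] for j in range(3)] for i in range(3)]
    return rgb2xyz, inverse_3x3(rgb2xyz)


# Assumed inputs: sRGB primaries (R, G, B) and the D65 white point.
SRGB_PRIMARIES_XY = [(0.64, 0.33), (0.30, 0.60), (0.15, 0.06)]
D65_XY = (0.3127, 0.3290)

RGB2XYZ, XYZ2RGB = build_matrices(SRGB_PRIMARIES_XY, D65_XY)
```

With these inputs, the middle row of `RGB2XYZ` comes out close to the familiar luminance weights 0.2126, 0.7152, and 0.0722, and `XYZ2RGB` maps the white point to [1, 1, 1]. Changing any primary or the white point changes both matrices.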

5.3.2 Step 2: Color Quantization and Gamma

We get a cube now, but we are not done yet. The cube is a continuous solid between [0, 0, 0] and [1, 1, 1], but the digital representation of a color is discrete and finite, so we have to quantize the solid. Assuming we have, say, 8 bits (i.e., 256 discrete levels) to represent the contribution of each primary color, the question is how to allocate the 256 levels to the [0, 1] range.

So far in our discussion, the contribution of a primary color is linearly correlated with the power of the primary: doubling the contribution of a primary requires doubling the power of the corresponding light. Therefore, a uniform allocation of the bits would mean uniformly quantizing the power range, which, however, is not ideal. As we have seen in Section 3.6.1, the electrical response of a photoreceptor is not linearly proportional to the light power (even though the amount of photon absorption and pigment excitation are!); the response incrementally saturates as the light power increases. As a result, the perceptual brightness level also gradually saturates with the light power. Therefore, uniformly quantizing the power range would lead to a non-uniform quantization of the brightness range, leading to large quantization error perceptually.

To best use the limited bit budget, therefore, we would ideally want to uniformly quantize the brightness range, not the power range. A common method is to first model the brightness level (\(B\)) as a power-law function of the raw channel value (\(v \in [0, 1]\)) by \(B=v^{1/2.2}\) and then quantize \(B\) uniformly. The constant \(2.2\) is called the gamma of the system. For instance, a red-channel value of 0.5 would translate to \(\lfloor 0.5^{1/2.2} \times 255 \rfloor = 186\) in an 8-bit encoding. The relationship between \(B\) and \(v\) is called the Opto-Electronic Transfer Function (OETF). The OETF is usually applied by an imaging system such as a camera, which turns optical signals (luminance) into electrical signals (bits in a color space).
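The worked example can be reproduced with a small sketch. The function names are hypothetical, and the floor-based quantization follows the example in the text rather than any particular standard.

```python
import math

GAMMA = 2.2  # the gamma used in the running example


def gamma_encode(v, bits=8):
    """Map a linear channel value v in [0, 1] to an integer code."""
    max_code = (1 << bits) - 1
    return math.floor(v ** (1.0 / GAMMA) * max_code)


def gamma_decode(code, bits=8):
    """Map an integer code back to an (approximate) linear channel value."""
    max_code = (1 << bits) - 1
    return (code / max_code) ** GAMMA


def linear_encode(v, bits=8):
    """Uniform quantization of the power range, for comparison."""
    max_code = (1 << bits) - 1
    return math.floor(v * max_code)


code_gamma = gamma_encode(0.5)    # 186
code_linear = linear_encode(0.5)  # 127
```

Under the power-law encoding, the darker half of the linear power range receives roughly 186 of the 256 codes instead of 128, which is where finer steps are most useful perceptually.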

The sRGB standard (Anderson et al. 1996) essentially uses this approach with one slight tweak to avoid numerical issues when \(v\) is small. In particular, the sRGB standard uses linear scaling when \(v\) is very small (below 0.0031308). This makes sense given our understanding in Section 3.6.3 that the receptor’s electrical response is approximately linear against the light luminance when the luminance is very low. The sRGB standard also adjusts the gamma to be 2.4 so that the overall quantization function approximates a uniform power-law function with a gamma of 2.2. As a result, the OETF used in sRGB is:

\[ B = \begin{cases} 12.92 v, & v \leq 0.0031308\\ 1.055 v^{1/2.4} - 0.055, & v > 0.0031308 \end{cases} \]

where \(v \in [0, 1]\) is one of the three RGB channels, and an 8-bit quantization is applied to \(B \in [0, 1]\) to bring each channel to an integer between 0 and 255.
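A direct transcription of this piecewise function is sketched below, together with its inverse. The inverse threshold 0.04045 and the use of round-to-nearest for the 8-bit quantization are assumptions of this sketch (the text above does not fix a rounding rule), and the function names are hypothetical.

```python
def srgb_oetf(v):
    """sRGB OETF: linear channel value v in [0, 1] -> encoded value B in [0, 1]."""
    if v <= 0.0031308:
        return 12.92 * v
    return 1.055 * v ** (1.0 / 2.4) - 0.055


def srgb_oetf_inverse(b):
    """Inverse of the OETF: encoded value B in [0, 1] -> linear value v."""
    if b <= 0.04045:  # assumed threshold, approximately 12.92 * 0.0031308
        return b / 12.92
    return ((b + 0.055) / 1.055) ** 2.4


def srgb_encode_8bit(v):
    """Quantize the encoded value to an integer in [0, 255] (round to nearest)."""
    return int(round(srgb_oetf(v) * 255))


mid_gray = srgb_encode_8bit(0.5)  # 188
```

The two branches meet almost exactly at the threshold, so the curve has no visible jump there. Compared with the pure power law, the linear segment keeps the slope finite near zero.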

Note that the gamma-based OETF does not model the actual relationship between perceived brightness and light luminance, but it is a close engineering hack. The behavioral brightness perception is largely accounted for by the photoreceptor/RGC response to light intensity. As we discussed in Section 3.6.3, the relationship between the electrical response of a photoreceptor and the light intensity is usually modeled by a (generalized) Michaelis equation, which incrementally saturates and exhibits a diminishing return, the same characteristic that the power-law function also possesses.

There are two caveats here. First, \(v\) is proportional to luminance \(L\), but is not exactly \(L\), so the same \(v\) will result in different \(L\)s on different displays that differ in their peak luminance. So encoding \(B\) as a power-law function of \(v\) does not mean the OETF actually models the correct relationship between \(B\) and \(L\). That is why the sRGB standard specifies the peak luminance of the display (white point) as 80 cd/m\(^2\). Presumably this means that at this particular luminance range (0 to 80 cd/m\(^2\)), the relationship between \(B\) and \(L\) roughly follows the power law. Second, light adaptation (Section 7.1) will also play a role, since the HVS responds to contrasts over the mean illuminance, rather than absolute illuminance, and the mean illuminance varies widely across viewing environments. The sRGB standard also specifies the mean illuminance level of the viewing environment to be 64 lux. When actually viewing an sRGB image, both conditions are rarely met, so take all of this with a huge grain of salt.

5.4 HSB/HSL/HSV Space

A color cube is one way to represent an RGB color space. Another common way to represent an RGB color space is to use a cylindrical-coordinate representation. There are two such representations, HSL (Hue, Saturation, and Lightness) and HSV (Hue, Saturation, and Value), which is also called HSB (B for Brightness). These are not new color spaces; their gamut is identical to that of the corresponding RGB color space. They are just different ways to organize colors in a color space; instead of using three-dimensional coordinates to represent a color as in a color cube, they use cylindrical coordinates.

Figure 5.6: We can represent an RGB color cube (left) using cylindrical coordinates. One such representation is the HSL color space (right), where hue, saturation, and lightness have intuitive interpretations. Hue and saturation also have intuitive interpretations in the CIE 1931 xy-chromaticity diagram, which normalizes luminance so lightness information is absent. Left: from SharkD (2010b). Middle: adapted from SharkD (2010a). Right: from BenRG (2009).

Figure 5.6 compares a typical color cube (left) and its HSL representation (right). We omit the transformation math here, but one can imagine how we turn the white point in a color cube to the top plane, the black point to the bottom plane, and expand everything else so that a cube surface morphs into a cylindrical surface. The three dimensions in an HSL space are hue, saturation, and lightness. Very informally, hue represents subjectively different colors (red, orange, yellow, etc.), saturation represents how much white a color has (a color with a higher saturation is more “pure”), and lightness represents the brightness. In this sense, hue and saturation also find their interpretations in the CIE 1931 xy-chromaticity diagram (right), where a color closer to the spectral locus has a higher saturation (and colors closer to white-ish colors are desaturated), and the spectral locus cycles through different hues. Lightness is not represented in the chromaticity diagram, which normalizes the color intensity.
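Although the transformation math is omitted, it is readily available in code. The sketch below uses Python's standard-library `colorsys` module, which returns values in the order hue, lightness, saturation (hence its `hls` naming) and expresses hue as a fraction of a full turn; the wrapper name `describe` is hypothetical.

```python
import colorsys


def describe(r8, g8, b8):
    """Convert an 8-bit RGB triple to HSL and HSV, with hue in degrees."""
    r, g, b = (c / 255.0 for c in (r8, g8, b8))
    h, l, s = colorsys.rgb_to_hls(r, g, b)       # note the H, L, S order
    h_v, s_v, v = colorsys.rgb_to_hsv(r, g, b)
    return {
        "hsl": (h * 360.0, s, l),
        "hsv": (h_v * 360.0, s_v, v),
    }


red = describe(255, 0, 0)       # hue 0 degrees, fully saturated
gray = describe(128, 128, 128)  # zero saturation in both representations
```

Pure red comes out at lightness 0.5 in HSL but value 1.0 in HSV, which illustrates that the two representations organize the same RGB cube differently along their third axis.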

You can imagine what the benefit of using an HSL/HSB color space is. It is more intuitive to pick colors in these color representations since the three dimensions have intuitive interpretations that better align with how we describe colors in our everyday language. So we can more easily reason about how a color will change if we vary a dimension. In contrast, it is sometimes hard to predict how a color will change when we, say, increase the red channel by 10. I almost exclusively use the HSL/HSB space when picking colors in graphing software.

5.5 Display Native Gamut

A display has a native color space. Each display pixel is implemented by (usually) three sub-pixels, each of which has an implementation-specific SPD and acts as a primary light. The retina then spatially integrates the lights from the three sub-pixels, i.e., mixing the three primary colors. We can individually control the luminance of each sub-pixel and, by extension, the actual color of the mixed pixel. The luminance can be controlled by 1) the duty cycle of a pixel through Pulse Width Modulation (PWM), 2) the current supply to each sub-pixel, or 3) the voltage supply to each sub-pixel. The luminance is strictly linear with respect to the drive signal in the first case, approximately linear in the second case, and non-linear in the third case (Miller 2019, p. 112). The mapping from the electrical drive signal strength to the luminance level is usually called the Electro-Optical Transfer Function (EOTF).

The display’s native color space is most likely not exactly sRGB or any standard color space. The primary colors (and the white point) depend on the emission spectrum of each sub-pixel, which in turn depends on the material used. For instance, inorganic LEDs have narrower emission spectra than organic LEDs (Huang et al. 2020), so they tend to be able to generate more saturated colors and, thus, the resulting display gamut is wider. One has to balance multiple trade-offs in a display design, such as invariance of chromaticity vs. luminance, lifetime, power consumption, and cost, so it is difficult to tune the pixel spectra just so that the colors precisely match those of a standard.

Field Sequential Displays (FSD) rely on the temporal integration of our visual system to create different colors. The most common example of an FSD is modern Digital Light Processing (DLP) projectors. We will not discuss specific display implementations; instead, we will focus on the color space of a display regardless of how the colors are produced.

Figure 5.7: Microscope-magnified subpixel images of P3 green and sRGB green primary (both are [0, 255, 0] in their respective color spaces) on a 4th-generation iPad Pro taken from an iPhone 12 Pro (whose image signal processing chain introduces color inaccuracies; the red sub-pixel contributions to the sRGB green are not as strong when seen by naked eye). As a side note, you can also see that when the image is focused on the green sub-pixels, the red (and blue) sub-pixels are out of focus, a result of chromatic aberration.

As an example, Figure 5.7 shows the sub-pixel images of the green primary colors in the P3 and sRGB color spaces as displayed on a 4th-generation iPad Pro. We can make a few observations. First, the emission patterns of P3 green and sRGB green are different. The P3 green is more “pure”, where the red and blue sub-pixels are contributing very little, whereas the sRGB green requires noticeable contribution from the red sub-pixels. This is not surprising because the P3 green is much more saturated (closer to spectral colors) than the sRGB green, as shown in the right panel of Figure 5.3. The actual contributions of red sub-pixels in sRGB green as seen by my eye are not as strong as seen in this iPhone-taken image; the image signal processing pipeline in the iPhone definitely has introduced its artifacts.

Second, even for the P3 green, there are still some contributions from the red sub-pixels. This suggests that the native display gamut is different from (and larger than) P3. This makes sense: for a display to support a particular color space, say, the P3 space, the display’s native color space must be no smaller than the P3 space.

5.6 Color Management

An end-to-end workflow might involve multiple output media (e.g., displays, prints), and it is important to correctly translate colors between them to accurately reproduce the color appearance. There are a few issues that need our attention.

First, you might edit a photo encoded in the P3 color space, save the photo in a file, and share it with your friend, who will view the image on a display that supports only the sRGB color space. Multiple color spaces are involved here. The image is first encoded in the P3 space and then will have to be reinterpreted in the sRGB space. A color, say, [10, 20, 30] encoded in the P3 color space is not the same color as the sRGB color [10, 20, 30], so we must correctly translate a color encoded in the source color space to the destination color space.

Second, a potential issue in this transformation is that the P3 color space has a larger gamut than that of sRGB, so there will necessarily be colors in the photo that will never be accurately reproduced on your friend’s display. What do we do with these colors? Each display also has its own native color space, and an sRGB/P3 image will have to be transformed to the display’s native space. Fundamentally, if we want to display, say, a P3-encoded image, the display’s native gamut must be no smaller than P3.

Finally, the viewer might be under a different viewing condition than the condition under which the photo was originally edited. The viewing condition could affect the actual appearance of a color, so we must account for this shift in viewing condition.

Taking care of all these is part of color management, whose goal is to maintain a consistent color appearance throughout the workflow that might involve wildly different devices. It requires a collaboration between every single piece that touches color in the workflow: the image file must come with a profile that specifies what color space its pixel colors are encoded in and (an estimate of) the viewing condition under which the image was originally edited/viewed; the software that manipulates image content must correctly read and interpret the profile and perform the necessary transformation, potentially through APIs exposed by the Operating System (OS); and the display firmware and driver must communicate to the OS a similar profile of the display itself. Giorgianni and Madden (2009) and A. Sharma (2018) are two excellent references for color management. We will describe the key issues here.

5.6.1 Color Space Transformation

When opening and viewing an image encoded in, say, sRGB on a display, a few transformations have to happen (Miller 2019, chap. 7.1). The display’s native color space is most likely not exactly sRGB or any standard color space; we must correctly translate a color encoded in the sRGB space to the display’s space. A color [10, 20, 30] encoded in sRGB is not the same color as [10, 20, 30] in the display’s color space. This transformation is done in two steps.

First, the image file ideally has metadata that tells us what color space its pixel colors are encoded in or, better, the transformation matrix from the image’s color space to a device-independent color space, say the CIE XYZ space. The way to describe such information has been standardized by the International Color Consortium (ICC) in what is called the ICC profile (International Color Consortium 2019). We can embed an ICC profile in common image file formats such as JPEG. Second, the display itself also has to report its native color space. To do that, modern displays usually come with an ICC profile that describes how to transform from the CIE XYZ space to the display’s native space. Now when the Operating System gets the image file, it would first transform the sRGB colors to the XYZ space using the ICC profile in the image and then transform the colors in the XYZ space to the display’s native space using the display profile. You can see that the XYZ space here serves to connect the input color space and the output color space. ICC calls such a space a Profile Connection Space (PCS).
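The two-step route through the PCS can be sketched as two matrix multiplications. This is an illustration under stated assumptions, not how any particular OS implements it: the matrices are rounded, commonly quoted values; a Display P3 matrix stands in for a measured display profile; and the input is assumed to be already linearized (i.e., the OETF has been undone). All names are hypothetical.

```python
# Image profile: linear sRGB -> XYZ (rounded, commonly quoted values).
SRGB_TO_XYZ = [
    [0.4124, 0.3576, 0.1805],
    [0.2126, 0.7152, 0.0722],
    [0.0193, 0.1192, 0.9505],
]

# Display profile: XYZ -> linear display RGB. A Display P3 matrix is used
# here as a stand-in for a real, measured display (assumption).
XYZ_TO_DISPLAY = [
    [2.4935, -0.9314, -0.4027],
    [-0.8295, 1.7627, 0.0236],
    [0.0358, -0.0762, 0.9569],
]


def mat_vec(m, v):
    """Multiply a 3x3 matrix by a length-3 vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def srgb_to_display(rgb_linear):
    """Linear sRGB -> XYZ (the PCS) -> linear display RGB."""
    xyz = mat_vec(SRGB_TO_XYZ, rgb_linear)
    return mat_vec(XYZ_TO_DISPLAY, xyz)


# The sRGB red primary needs a little green and blue on the wider-gamut display.
red_on_display = srgb_to_display([1.0, 0.0, 0.0])
```

With these matrices, sRGB red maps to roughly [0.82, 0.03, 0.02] in the stand-in display space, which shows concretely that the same triple does not mean the same color in two spaces. White stays at approximately [1, 1, 1] because both spaces share the same white point.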

The transformation from the XYZ space to the display’s native space is necessarily linear. To calculate the transformation matrix, we will first measure the chromaticity values of the display’s native primary colors and the white point offline (Balasubramanian 2003). Then we take exactly the same steps as described in Section 5.3.1: we are essentially creating a color cube for the display ([1, 1, 1] represents the display white point, i.e., when all the sub-pixels emit maximum luminance, etc.).

5.6.2 Converting Pixel Colors to Drive Signals

After this transformation, we have obtained a set of luminance-linear, analog (between [0, 1]) color values in the display’s native color space. The next step is to turn the real-valued colors into discrete values (drive signals) that can be sent to the display to control the luminance of each sub-pixel. Ideally, we want 255 (assuming 8 bits) to produce maximum luminance and 0 to produce minimum luminance. Depending on how the display adjusts its luminance (by PWM, current, or voltage), the drive signal vs. luminance relationship, i.e., EOTF, may or may not be linear. Either way, we can offline calibrate an EOTF look-up table (or regress a function), from which we can then map a desired luminance level to a discrete value.
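The look-up step can be sketched as follows. The table here is generated from a hypothetical power-law EOTF with exponent 2.2, purely as a placeholder for measured calibration data; the function names are likewise invented for this example. The only property the search relies on is that luminance increases monotonically with the drive code.

```python
import bisect


def build_eotf_lut(gamma=2.2, bits=8):
    """Placeholder EOTF table: normalized luminance for each drive code.

    A real table would come from offline measurements of the display.
    """
    max_code = (1 << bits) - 1
    return [(code / max_code) ** gamma for code in range(max_code + 1)]


def luminance_to_code(lut, target):
    """Return the drive code whose luminance is closest to `target`."""
    # Clamp to the range the display can actually produce.
    target = min(max(target, lut[0]), lut[-1])
    hi = bisect.bisect_left(lut, target)  # first code with lut[code] >= target
    if hi == 0:
        return 0
    lo = hi - 1
    # Pick whichever neighbor is nearer in luminance.
    return lo if target - lut[lo] <= lut[hi] - target else hi


lut = build_eotf_lut()
print(luminance_to_code(lut, 0.0), luminance_to_code(lut, 1.0))  # 0 255
```

Because the table is non-linear, equal steps in drive code do not correspond to equal steps in luminance, which is why the inverse look-up is needed at all.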

What is the desired luminance level for a pixel? It would be amazing if your display could reproduce the scene luminance, but that is unlikely, because the real world has a dynamic range (DR) that is orders of magnitude higher than that of a display. A main challenge in imaging and display, thus, is tone mapping, which is concerned with mapping a high-dynamic-range scene to a low-dynamic-range display. This mapping can be described by an Opto-Optical Transfer Function (OOTF). Both the OETF of an imaging system and the EOTF of a display participate in the OOTF, and if the OETF and the EOTF combined do not yield the desired OOTF, one would need to implement an Electro-Electrical Transfer Function (EETF) as part of the image processing pipeline to reach the desired OOTF. Tone mapping is the focus of extensive research (Reinhard 2010; Mantiuk et al. 2015).
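To make the relationship between the three transfer functions concrete, here is a small sketch that chains an OETF and an EOTF with no EETF in between. It assumes the BT.709 OETF on the capture side and a pure 2.4 power-law EOTF (the BT.1886 form with zero black level) on the display side. These curves are choices made for this illustration only; they are not prescribed by the discussion above.

```python
def oetf_bt709(scene):
    """BT.709 OETF: normalized scene light in [0, 1] -> signal in [0, 1]."""
    if scene < 0.018:
        return 4.5 * scene
    return 1.099 * scene ** 0.45 - 0.099


def eotf_power(signal, gamma=2.4):
    """Simplified display EOTF: signal in [0, 1] -> normalized luminance."""
    return signal ** gamma


def ootf(scene):
    """End-to-end mapping when no EETF is applied: EOTF after OETF."""
    return eotf_power(oetf_bt709(scene))


# Mid-gray in the scene comes out darker on the display than a linear
# reproduction would give.
print(ootf(0.18) < 0.18)  # True
```

Over most of the range the chained curve behaves roughly like a power function with an exponent somewhat above 1, so it is not an identity mapping. If a different end-to-end behavior is wanted, an EETF inserted between the two functions is the place to correct it.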

5.6.3 Gamut Mapping

When viewing a P3-encoded image on a display whose gamut is smaller, e.g., similar to that of sRGB, the colors might not be accurately reproduced. The best thing we can do is to approximate an out-of-gamut color with an in-gamut color to minimize the color error. This is called gamut mapping. Morovič (2008) and Glassner (1995, chap. 3.6) describe the basic algorithms, with the former being more recent and comprehensive.

The simplest strategy is to clamp out-of-range values, so a color of [12, 200, 300] would become [12, 200, 255]. Clearly, other than being extremely simple to implement, this strategy would introduce large color reproduction errors. ICC has defined four rendering intents, each of which corresponds to a gamut mapping algorithm (vaguely worded, and the implementation detail might vary). For instance, the Absolute rendering intent leaves all the in-gamut colors unchanged but maps the out-of-gamut colors to the boundary of the color gamut. The Perceptual rendering intent can be implemented by uniformly projecting all the colors toward the white point so that all the colors are in-gamut. You can imagine that while this maintains the relative color appearance between colors (which the Absolute rendering intent fails at), it would also change in-gamut colors that could have been accurately rendered!
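The contrast between clamping and a boundary-mapping strategy can be sketched in a few lines. The second function below is one hypothetical variant, not the ICC's algorithm: it pulls an out-of-gamut color toward the gray of equal luminance until it reaches the gamut boundary, and leaves in-gamut colors alone. The luminance weights are those of sRGB/Rec. 709 primaries and are an assumption; a real display would use its own.

```python
def clamp(rgb, lo=0.0, hi=1.0):
    """Clip each channel independently to [lo, hi]."""
    return [min(max(c, lo), hi) for c in rgb]


def desaturate_into_gamut(rgb, weights=(0.2126, 0.7152, 0.0722)):
    """Move a linear RGB color toward equal-luminance gray until in gamut.

    In-gamut colors are returned unchanged (t stays at 1).
    """
    y = sum(w * c for w, c in zip(weights, rgb))
    y = min(max(y, 0.0), 1.0)  # the gray target itself must be displayable
    t = 1.0  # fraction of the original offset from gray that is kept
    for c in rgb:
        if c > 1.0:
            t = min(t, (1.0 - y) / (c - y))
        elif c < 0.0:
            t = min(t, (0.0 - y) / (c - y))
    return [y + t * (c - y) for c in rgb]


print(clamp([12, 200, 300], 0, 255))  # [12, 200, 255]

out_of_gamut = [0.05, 0.8, 1.2]
print(clamp(out_of_gamut))                  # only the blue channel changes
print(desaturate_into_gamut(out_of_gamut))  # all channels move; blue ends near 1.0
```

Clamping alters the ratio between channels and therefore shifts hue and luminance in an uncontrolled way, whereas the second approach keeps luminance (when it is displayable) and trades away saturation instead.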

5.7 Color Discrimination and Color Difference

In many practical applications, we need to calculate color differences. For instance, an image synthesis algorithm might want to minimize the color difference between the synthesized image and some form of “ground truth”; a display’s color reproduction might not be 100% accurate, so we want to quantitatively compare the quality of different displays by measuring the color difference (compared to the colors to be reproduced) each introduces.

Fortunately, once we put colors into a three-dimensional coordinate system, calculating color differences becomes natural: the Euclidean distance between two colors gives a measure of the difference between the two colors. However, for the Euclidean distance to be a good measure, we must be sure that the distance is proportional to the perceptual color difference. How do we quantify the perceptual color difference?

5.7.1 Color Discrimination

Practically there are not many cases where we need to quantify large color differences. What is more important is to quantify small color differences. For a given reference color, we can use a threshold-detection psychophysical paradigm such as the one described in Krauskopf and Karl (1992) to estimate the set of colors that can just barely be discriminated from the reference color. These experiments are called color discrimination tests.

Color discrimination experiments typically use an n-alternative forced choice (nAFC) paradigm (Krauskopf and Karl 1992; Hansen, Pracejus, and Gegenfurtner 2009; Duinkharjav et al. 2022; Danilova and Mollon 2025; Hong et al. 2025). Figure 5.8 (a) shows a 4AFC variant from Duinkharjav et al. (2022). The visual field consists of an adapting background and four color patches. The background controls the light and chromatic adaptation state of the participant, which is shown to have a significant impact on the color discrimination results (Krauskopf and Karl 1992). Among the four color patches, three have the same reference color whose discrimination contour we want to estimate. The other randomly placed color patch, the test patch, has a different color. A participant is instructed to identify which one of the four color patches appears different. The participant fixes their gaze on a crosshair at the center of the screen for the duration that the stimuli are shown; this makes sure that the four color patches are all placed at a given eccentricity.

Figure 5.8: (a): A 4AFC color discrimination trial; adapted from Duinkharjav et al. (2022, fig. 2a). (b): iso-luminance discrimination contours plotted in the DKL space (Derrington, Krauskopf, and Lennie 1984), where all the colors have the same luminance (L+M response); from Krauskopf and Karl (1992, fig. 14). (c): MacAdam ellipses (measured at 2\(^{\circ}\) eccentricity) plotted in the xy-chromaticity diagram (the ellipse sizes are magnified 10 times to be more visible); from Anonymous (2009). (d): iso-luminance discrimination contours in the DKL space similar to (b), but now the adapting field is the reference color itself; from Krauskopf and Karl (1992, fig. 8). (e): contours corresponding to a \(\Delta E_{00}\) = 1.0 in the \(a^*\)-\(b^*\) plane in the CIELAB space; from G. Sharma (2003, fig. 1.18). (f): discrimination contours under two different eccentricities (10\(^{\circ}\) and 25\(^{\circ}\)); from Duinkharjav et al. (2022, fig. 4a).

Across trials, a 1-up-2-down staircase procedure (Cornsweet 1962; Leek 2001; Treutwein 1995) adjusts the color of the test patch: the test color is moved closer to the reference color (a harder trial) if the participant identifies the test patch correctly twice in a row, and moved farther away from the reference color (an easier trial) upon an incorrect response.
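The update rule just described is small enough to write down directly. The class below is a minimal sketch with a fixed step size and a one-dimensional "distance from the reference color"; the class name, the step size, and the floor are illustrative assumptions, and actual experiments typically shrink the step over time and stop after a set number of reversals.

```python
class Staircase:
    """1-up-2-down staircase over the test/reference color distance."""

    def __init__(self, start, step, floor=0.0):
        self.level = start   # current distance from the reference color
        self.step = step     # fixed step size (an assumption of this sketch)
        self.floor = floor   # smallest distance we allow
        self.streak = 0      # consecutive correct responses so far

    def update(self, correct):
        """Record one response and return the level for the next trial."""
        if correct:
            self.streak += 1
            if self.streak == 2:
                # Two correct in a row: move closer (harder).
                self.level = max(self.floor, self.level - self.step)
                self.streak = 0
        else:
            # Any miss: move farther away (easier) and reset the streak.
            self.level += self.step
            self.streak = 0
        return self.level


stair = Staircase(start=10, step=1)
responses = [True, True, True, False, True, True]
print([stair.update(r) for r in responses])  # [10, 9, 9, 10, 10, 9]
```

A rule of this form is commonly described as converging on the stimulus level that yields about 70.7% correct responses, which then serves as the threshold estimate along the tested direction.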

Figure 5.8 (b) shows a set of discrimination contours obtained by Krauskopf and Karl (1992) plotted in the DKL space (Derrington, Krauskopf, and Lennie 1984), which we have discussed in Section 4.4.3. The individual discrimination thresholds of a reference color are fit with an ellipse to approximate the discrimination contour, where the reference color is placed at the center of the ellipse. In their experiments, the reference colors were all iso-luminant (i.e., identical L+M responses) and the test colors were all forced to be on the same iso-luminance plane. Of course, the actual discrimination contour would be a 3D ellipsoid, and obtaining that data would take a huge amount of work: we would have to sample a 3D space for the reference colors and, for each reference color, sample another 3D space to obtain its discrimination thresholds. The results in Figure 5.8 (b) essentially simplify the 6D sampling to 4D.

We can see, from Figure 5.8 (b), that the discrimination contours are quite regularly placed in the DKL space. A striking feature is that the thresholds along one dimension do not change as the reference color changes along the other dimension. For instance, the threshold along the S-(L+M) axis, i.e., the Yellow-Blue dimension, is roughly constant for the three reference colors that differ only along the L-M axis, i.e., the Red-Green dimension. This seems to suggest that color discrimination might be independently mediated by the two opponent processes.

The first set of color discrimination data was collected by David MacAdam in his seminal work MacAdam (1942) and MacAdam (1943) 2. MacAdam did not use the 4AFC strategy above, but indirectly estimated the thresholds using variations in color matching experiments. A modern rendition of his results is shown in Figure 5.8 (c) in the CIE 1931 xy-chromaticity space; the ellipse sizes are magnified ten times to be visible. A practical interpretation of a discrimination contour is that all the colors within an ellipse are indiscriminable from the reference color at its center.
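Read this way, a discrimination ellipse is simply a membership test in the chromaticity plane. The sketch below checks whether a test chromaticity falls inside an ellipse given by its center, semi-axes, and orientation. The numbers used are made up for illustration and are not MacAdam's measurements.

```python
import math


def inside_ellipse(point, center, semi_major, semi_minor, angle_deg):
    """True if `point` lies within (or on) the rotated ellipse.

    `angle_deg` is the counter-clockwise angle of the major axis from the
    horizontal (x) axis of the chromaticity diagram.
    """
    dx = point[0] - center[0]
    dy = point[1] - center[1]
    theta = math.radians(angle_deg)
    # Express the offset in the ellipse's own axes.
    u = dx * math.cos(theta) + dy * math.sin(theta)
    v = -dx * math.sin(theta) + dy * math.cos(theta)
    return (u / semi_major) ** 2 + (v / semi_minor) ** 2 <= 1.0


# Hypothetical ellipse around a reference chromaticity (0.30, 0.30).
ref = (0.30, 0.30)
print(inside_ellipse((0.303, 0.300), ref, 0.004, 0.002, 0.0))  # True
print(inside_ellipse((0.300, 0.303), ref, 0.004, 0.002, 0.0))  # False
```

The two test points are equally far from the reference in the xy plane, yet only one of them is indiscriminable from it, which is exactly why a plain Euclidean distance in this space is a poor difference measure.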

5.7.2 Adaptation and Eccentricity Dependence

Discrimination contours depend on both the adaptation state and eccentricity. Figure 5.8 (d), also from Krauskopf and Karl (1992), shows the discrimination contours when the background adapting light is the reference color itself and the participant is fully adapted to that color. We will study chromatic adaptation later in Section 7.3, but what this practically means is that, in this experiment, the background color is perceived as achromatic, even though it might not normally be considered so. We can see that the ellipses are wildly different from those in Figure 5.8 (b). In particular, it seems like the threshold along the L-M dimension does not change at all with the L-M response of the reference color.

MacAdam’s original data were collected at 2\(^{\circ}\) eccentricity. Given that visual acuity reduces as the eccentricity increases, it is only natural that the discrimination contours expand in size as the eccentricity increases. Duinkharjav et al. (2022) measured the ellipses under different eccentricities. Figure 5.8 (f) compares the results between 10\(^{\circ}\) and 25\(^{\circ}\). Not surprisingly, the ellipses are larger in the latter. The qualitatively different contour shapes between Duinkharjav et al. (2022) and Krauskopf and Karl (1992) are perhaps due to the differences in the measurement methodology, which warrants further investigation.

5.7.3 Color Difference and Perceptually Uniform Color Space

The difference between the reference color and a color on its discrimination contour is called the Just Noticeable Difference (JND). A color space is said to be “perceptually uniform” if the JND measure is a constant anywhere in the color space along any direction. In such a color space, the Euclidean distance would be a measure of color difference.

Unfortunately, no perceptually uniform color space has ever been identified. Consider the results of Krauskopf and Karl (1992) or MacAdam: the discrimination contours are ellipses (both in the DKL space and in the xy space), not circles, so the JND for a reference color varies angularly. Worse, the discrimination contour changes its shape as the reference color changes, suggesting that the JND is also spatially varying.

Quite a few attempts have been made to transform the XYZ space into a more perceptually uniform space. Among them, the two common ones are the CIE 1976 \(L^*u^*v^*\) (CIELUV) space and the (more widely used) CIE 1976 \(L^*a^*b^*\) (CIELAB) space, both of which are non-linear transformations from the XYZ space. The so-called CIE Delta E 1976 color difference metric (\(\Delta E_{ab}^*\)) is defined as the Euclidean distance in the CIELAB space. If CIELAB were truly perceptually uniform (as far as color discrimination is concerned), \(\Delta E_{ab}^*\) being 1.0 would mean a JND. However, this is not the case (G. Sharma 2003, fig. 1.18).
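For reference, the XYZ-to-CIELAB transformation and the resulting \(\Delta E_{ab}^*\) can be sketched as below, using the standard CIE 1976 formulas. The D65 reference white and the function names are assumptions of this example; in practice the white point should match the viewing condition or the profile in use.

```python
import math

# Assumed reference white: D65, normalized so that Y = 1.
D65_WHITE = (0.95047, 1.0, 1.08883)


def _f(t):
    """CIELAB companding: cube root, with a linear segment near zero."""
    delta = 6.0 / 29.0
    if t > delta ** 3:
        return t ** (1.0 / 3.0)
    return t / (3.0 * delta ** 2) + 4.0 / 29.0


def xyz_to_lab(xyz, white=D65_WHITE):
    """Convert XYZ (same scale as `white`) to (L*, a*, b*)."""
    fx, fy, fz = (_f(c / w) for c, w in zip(xyz, white))
    return (116.0 * fy - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz))


def delta_e_ab(lab1, lab2):
    """CIE 1976 color difference: Euclidean distance in CIELAB."""
    return math.dist(lab1, lab2)


white_lab = xyz_to_lab(D65_WHITE)        # L* = 100, a* = b* = 0
black_lab = xyz_to_lab((0.0, 0.0, 0.0))  # L* = 0 (up to rounding)
print(round(delta_e_ab(white_lab, black_lab), 6))  # 100.0
```

The cube root is what makes the space non-linear: it compresses differences among bright colors relative to dark ones, which is a large part of why CIELAB is closer to uniform than XYZ even though it is not exactly so.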

CIE has since recommended a new, much more involved, and non-Euclidean measure in the CIELAB space, called the Delta E 2000 metric (\(\Delta E_{00}\)), to achieve better perceptual uniformity (G. Sharma, Wu, and Dalal 2005). Figure 5.8 (e) shows iso-discrimination contours corresponding to \(\Delta E_{00}\) = 1.0 in an \(a^*\)-\(b^*\) (iso-luminance) plane in the CIELAB space. If \(\Delta E_{00}\) is to be considered a perceptually uniform color difference metric, the CIELAB space itself must not be perceptually uniform given the varying contour shapes throughout the space. What if we construct a new space by transforming the CIELAB space using the \(\Delta E_{00}\) metric: would that space be perceptually uniform? Unlikely: the discrimination contours at the low \(b^*\) end become weirdly non-convex, which is not a reflection of the human discrimination data but, rather, of the limitation of the \(\Delta E_{00}\) metric itself. This means there is no simple, true color difference measure in that space either, at least if we insist on defining a perceptually uniform space to be one where a simple Euclidean distance is proportional to perceptual difference.

5.7.4 Science vs. Engineering

I always feel that color discrimination is a topic that exemplifies the similarities and differences between engineering and science.

From an engineering perspective, we would like to have a practical tool that allows us to quantify perceptual color differences, which has huge implications for capturing, storing, computing, and displaying colors. We want to make sure our workflow preserves color fidelity as much as possible, and a quantitative metric is, as in any other engineering problem, a prerequisite. Since the linear LMS/XYZ spaces and Euclidean distances are insufficient, we turn to non-linear spaces like CIELAB and non-Euclidean measures like \(\Delta E_{00}\). The goal is engineering convenience. It is fair to say that while they are not perfect, they have significantly improved color workflows in practice.

Vision scientists approach this problem with a different goal. The retina has access to only the cone responses but the cone space itself is not perceptually uniform, so a natural question to ask is: how does color discrimination arise from cone responses? It is of course of practical value to figure out the mechanisms that mediate our ability to discriminate between two colors; at the very least, we can design better color difference metrics for engineering applications. But fundamentally, understanding color discrimination might provide new insights into human color vision as a whole, and that is the scientific value of studying this problem.


  1. While in the XYZ space, we usually perform an additional transformation so that sRGB white becomes the white point in the display space. This is called chromatic adaptation, which we will discuss later in Section 7.3.↩︎

  2. MacAdam did the work while working for Eastman Kodak at Rochester and he later was an adjunct professor at the Institute of Optics, University of Rochester.↩︎