2  From Light to Vision

This chapter gives an overview of the human visual system. We start from eye optics, which direct light to the retina, where the optical-to-electrical signal transduction takes place. We describe a few basic facts about the retina, focusing on the structure and functions of retinal processing. We then briefly talk about processing that takes place after the retina, i.e., in the Lateral Geniculate Nucleus (LGN) and in the visual cortex.

2.1 The Big Picture

Before studying the HVS, it is useful to start by discussing why we care about the HVS at all — after all, if you are a computer science and/or engineering student, why would you care? We will then discuss the methodology we will use when studying the HVS.

2.1.1 Why Do We Study HVS?

Why do we care about studying the HVS? First and foremost, for the science itself: it is extremely satisfying to just understand “how stuff works”. Understanding the basics of the HVS will also allow us to investigate its unknowns, and computer scientists have a lot to offer here. For instance, modern computational methods, especially deep (artificial) neural networks, have provided us with a new toolbox for better understanding biological neural networks: if a signal representation or a learning paradigm is effective in deep neural networks, could our HVS use a similar representation or learn in a similar way?

For computer scientists and engineers working on visual computing systems, there is another reason, which is already illustrated in Figure 1.3. The psychological experiences of the users of a computing platform, be it an AR/VR headset or a smartphone, are what we want to influence, but we, for the most part, exert that influence indirectly, by designing and optimizing the imaging, rendering, and computer systems. The outputs of these systems, i.e., the visual stimuli coming out of the display, become the input to the HVS of a human whose psychological states we care to optimize. So if we understand the HVS, we could invert the HVS process, given the desired psychological states, to solve for the optimal visual stimuli, and from there we can then think about how to best design the various engineered systems.

Understanding the cellular, molecular, and neural processes in the HVS has also inspired people to better engineer systems such as imaging systems (Liao et al. 2022; Wodnicki, Roberts, and Levine 1995) and deep neural networks, even though the output of these systems is not meant to be consumed by the HVS (Idrees et al. 2024).

2.1.2 How Do We Study HVS?

How do photons in the real world give rise to perception and cognition in our brain when they enter our eyes? We want to show you that there is really no magic here. The perception and cognition we experience are fundamentally a result of the complicated, first optical and eventually electrical, signal processing in the physiological system — our eyes and brains.

This relationship between low-level electrical signals and high-level behavioral responses in humans is conceptually no different from one that we find in computers. This comparison is shown earlier in Figure 1.2. For someone unfamiliar with computer systems and chip design, it would seem rather magical that a computer does what it does. But we know that the high-level, observable behaviors of a computer program are a result of low-level processing in the electrical circuits. Similarly, the experiences humans have in response to visual stimuli are a result of the collective behaviors of the underlying neurons in the nervous system, whose behaviors result from the cellular and molecular processes within and between individual neurons.

The circuits in a computer are made of engineered material such as transistors, whereas circuits in the HVS are made of biological materials such as neurons. Fundamentally, however, it is all physics — electrons and/or ions move around and cause changes in voltage potentials and currents, and these changes are how information is propagated.

With the advancements in modern science and engineering, we can now measure, at a neuronal or even sub-neuronal level, the electrical responses of the HVS when presented with visual inputs. These measurements allow us to correlate electrical responses with perception and cognition, which, in turn, allows us to say something like “this part of the HVS supports or is responsible for that particular function (e.g., object detection).” It is important to note, however, that we still do not know why the electrical responses cause our perception and cognition. The causation problem, for the moment, is at best a philosophical problem or, if you will, a religious one.

The goal of this chapter is to give you an overview of the Human Visual System (HVS). We will focus on the main components and key facts of the HVS so that you can start appreciating the connections between signal processing at the physiological level and perception, cognition, and action at the behavioral level while leaving many details to later chapters.

The signal processing in the HVS consists of three main components; this is illustrated in Figure 2.1. First, lights are processed in the optical domain as they enter our eyes and go through the eye optics. The optical signals then reach the retina and are first converted to electrical signals by the photoreceptors (cones and rods), which are further processed before exiting the retina. The retina output neurons, i.e., the retinal ganglion cells, encode low-level information such as wavelengths, contrast, timing of object motion, etc. The retinal outputs are then transmitted to the Lateral Geniculate Nucleus (LGN) and, for the most part, relayed to the visual cortex. Cortical processing essentially knits together the low-level, upstream information to give us vision. The retino-geniculo-cortical pathway is the main pathway for the electrical signals.

Figure 2.1: Pupil, under the control of the iris, lets in lights. Cornea and lens focus light, with the former contributing most of the optical bending power. The lens changes shape to accommodate object depth under the control of the ciliary muscle. Retina transforms optical signals to electrical signals, which are further processed and exit the retina through the optic nerve. Retinal signals go through the Lateral Geniculate Nucleus and then are projected to the visual cortex. This retino-geniculo-cortical pathway carries the main information flow in the HVS, with the cortex also providing feedback to the LGN. Adapted from Selket (2007).

2.2 Eye Optics

The optical signal impinging on the retina is called the optical image, a 2D continuous signal in the sense that at any position on the retinal surface we can ask: how much optical power arrives here? Ideally, the optical image is a perfect perspective projection of the 3D physical world, with no loss of information other than the projection itself. The reality is much more complicated.

Figure 2.2: Much of the optical bending power in the eye is contributed by the cornea, which has a large refractive index difference with respect to its adjacent ocular media (Snell’s law). The lens also contributes to light bending, albeit with a lower contribution. Cornea is rigid but the lens is malleable, so accommodation is attributed exclusively to the lens. From LaValle (2023, fig. 4.25).

2.2.1 The Main Goal is to Focus Lights

The main goal of the eye is to focus light on the retina. To focus light the optics need to bend light, which is achieved collectively by all the ocular media in the eye, including the cornea, aqueous humour, lens, and vitreous humour. This is illustrated in Figure 2.2. Lights bend because of the difference in refractive index between adjacent ocular media. Most of the bending is done by the cornea because there is a large difference in the refractive index between the cornea and the air. The lens also contributes to light bending, albeit with a lower contribution, because the differences in refractive index between the lens and its adjacent media (aqueous fluid and vitreous fluid) are relatively small.

The cornea is fixed in shape. The lens, in contrast, is malleable. The ciliary muscle contracts and relaxes to change the shape of the lens, which changes the focal length, and thus the bending power, of the lens and, by extension, of the entire eye optical system. Adjusting the focal length to bring an object into focus is called accommodation.
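The relative contributions of the cornea and the lens can be sketched with Snell’s law. The refractive indices below are commonly cited approximate values, used here purely for illustration:

```python
import math

def refract(theta_i_deg, n1, n2):
    # Snell's law: n1 * sin(theta_i) = n2 * sin(theta_t)
    theta_t = math.asin(n1 / n2 * math.sin(math.radians(theta_i_deg)))
    return math.degrees(theta_t)

# Approximate refractive indices (illustrative, commonly cited values).
n_air, n_cornea, n_aqueous, n_lens = 1.000, 1.376, 1.336, 1.406

theta = 20.0  # incidence angle, in degrees
bend_cornea = theta - refract(theta, n_air, n_cornea)  # air -> cornea
bend_lens = theta - refract(theta, n_aqueous, n_lens)  # aqueous -> lens

# The air-cornea interface bends the ray several times more than the
# aqueous-lens interface because its refractive index difference is larger.
print(f"cornea: {bend_cornea:.2f} deg, lens: {bend_lens:.2f} deg")
```

The exact numbers depend on the chosen indices and incidence angle, but the ordering does not: the largest index jump, at the air–cornea interface, does most of the bending.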

But if the ciliary muscle cannot properly adjust the lens, we get defocus blur, which is a form of optical aberration. There are a number of other optical aberrations; astigmatism and chromatic aberration are two common ones found in eyes. While not an optical aberration, diffraction also contributes substantially to visible blur when the pupil size is very small (e.g., under strong illumination).

For our purposes, the “imperfections” introduced by the eye optics (aberration and diffraction) can be modeled by the Point Spread Function (PSF) of the optical system, which we will see in a later chapter.
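As a preview, the effect of a PSF can be sketched as a convolution. The Gaussian PSF below is a hypothetical stand-in for the eye’s true PSF, chosen only to show how a sharp edge gets spread out:

```python
import numpy as np

def blur_with_psf(image, psf):
    # Optical "imperfections" act as a convolution with the PSF (1-D sketch).
    psf = psf / psf.sum()  # a passive optical system conserves power
    return np.convolve(image, psf, mode="same")

# A sharp luminance edge, as it would land on the retina with perfect optics...
ideal = np.concatenate([np.zeros(20), np.ones(20)])

# ...and a Gaussian PSF standing in for aberrations plus diffraction.
x = np.arange(-5, 6)
psf = np.exp(-x**2 / (2 * 1.5**2))

blurred = blur_with_psf(ideal, psf)
# The edge is now spread across several samples instead of being abrupt.
```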

2.2.2 Ocular Media Absorb Light Selectively

While all the ocular media are generally transparent, they still absorb some amount of light. Critically, the absorption and, by extension, transmittance, are strongly wavelength dependent. Color vision is fundamentally tied to the power distribution of light over wavelengths, so the selective absorption of light by the ocular media significantly influences our color vision.

Figure 2.3: Transmittance spectra of ocular media. Adapted from Boettner and Wolter (1962, fig. 7)

Boettner and Wolter (1962, fig. 7) measured the spectral transmittance of the eye, which defines the amount of light allowed to transmit through the media at each wavelength; the results are shown in Figure 2.3. Each curve represents the percentage of light remaining at a given ocular medium and at the retina (including both direct transmission and forward scattering). Considering the visible range (we will discuss in the next chapter why there is even such a thing as invisible light), roughly between 380 \(\text{nm}\) and 780 \(\text{nm}\), we can see that the ocular media significantly reduce the light power at short wavelengths. In other words, the ocular media generally absorb blue-ish light, so if the incident light is white-ish, it would appear yellow-ish after traveling through the ocular media.
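The yellowing effect can be sketched numerically. The transmittance curve below is a smooth hypothetical stand-in for the measured spectra in Figure 2.3, not the actual data:

```python
import numpy as np

wavelengths = np.arange(380, 781, 10)           # visible range, in nm
white = np.ones_like(wavelengths, dtype=float)  # flat, "white-ish" spectrum

# Hypothetical ocular transmittance: low in the blue, high in the red,
# a smooth stand-in for the measured curves in Figure 2.3.
transmittance = np.clip((wavelengths - 380) / 250.0, 0.1, 0.95)

reaching_retina = white * transmittance
short_power = reaching_retina[wavelengths < 500].mean()  # blue-ish end
long_power = reaching_retina[wavelengths > 600].mean()   # red-ish end

# Short wavelengths lose more power, shifting the light toward yellow.
print(f"short: {short_power:.2f}, long: {long_power:.2f}")
```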

2.3 Retina: Basic Facts

Now the photons have arrived at the retina. The retina is where optical signals are transformed into electrical signals. The electrical signals undergo further processing on the retina and are then carried by the optic nerve to the brain. The signal transduction and processing are carried out through layers of neurons on the retina, of which there are five categories (each of which has sub-categories). They are the photoreceptors, bipolar cells, horizontal cells, amacrine cells, and retinal ganglion cells (RGCs). This is illustrated in Figure 2.4.

The main information flow starts from the photoreceptors and goes through the bipolar cells, which synapse with the photoreceptors and send their outputs to the RGCs. The horizontal cells synapse with the photoreceptors (and other horizontal cells), and the amacrine cells connect with both the bipolar cells and the RGCs (and other amacrine cells). The identification of the different classes of neurons and their connections is largely credited to Santiago Ramón y Cajal 1.

Interestingly, while we might be used to neurons communicating through spikes, i.e., action potentials 2, the RGCs are the only type of neurons on the retina that spike. The rest of the neurons are non-spiking neurons; they communicate through graded potentials.

Optical-to-Electrical Signal Transduction Takes Place in Photoreceptors

Photoreceptors are where optical signals are transformed into electrical signals. Photoreceptors absorb incident photons; once a photon is absorbed, it can generate an electrical response through the phototransduction cascade (Wald 1968) 3. The electrical response can be represented as a photocurrent or, equivalently, a photovoltage across the cell membrane of the photoreceptor. We will have a lot to say about this process in a later chapter.

Figure 2.4: The basic neural network on the retina. The photoreceptors convert optical signals to electrical signals. The electrical signals go through the bipolar cells and then to the retinal ganglion cells, which carry all the output of the retina. Horizontal and amacrine cells mediate lateral interactions, giving rise to important features such as the receptive field. Since the RGCs form the first layer that incoming light encounters, the optical information and the electrical information flow in opposite directions. Adapted from Purves et al. (2017, fig. 11.5B)

Functional and Anatomical Organizations of the Retina are Opposite

The functional organization of the cells is opposite to the anatomical organization of the cells. This is illustrated in Figure 2.4.

Functionally, the first layer of the retina is the photoreceptors, which convert photons to electrical responses, and the last layer is the RGCs, which carry all the retinal output information and connect directly to the optic nerve, which is effectively a bundle of RGC axons. Anatomically, however, the RGCs form the first layer that incoming light encounters, and the photoreceptors lie at the back of the retina. Photons reaching the retina thus first pass the RGCs and the other neurons before eventually hitting the photoreceptors, where the signal transduction takes place. As far as a photon is concerned, the neurons before the photoreceptors are transparent and simply let the photon through without doing much to it (with an exception that we will see soon).

Blind Spot Exists Because of the Routing Issue

An implication of this anatomical organization is that the optic nerve must be routed from the front of the retina and pass through the retina at a single location, called the optic disk. The optic disk must be free of any neurons, including photoreceptors, simply for the optic nerve to exit. Since photoreceptors sense light, the optic disk is also called the blind spot. This is illustrated in Figure 2.5. Some animals, like the octopus, do not have this “wiring” issue, since their retinal signals exit from the back of the retina.

Figure 2.5: Vertebrate eyes have a blind spot (scotoma) because the RGC axons exit the retina from the front of the retina. It is purely a “wiring” issue. Octopus eyes do not have this issue. Adapted from Caerbannog (2016).

It is unclear whether there are evolutionary advantages of having a blind spot on our retina, but it does not seem to be a disadvantage: we clearly do not notice the blind spot in our daily life — the downstream visual system fills in the missing information there. Our head and eye movements further mitigate the impact of the blind spot.

ipRGCs are Light-Sensitive but Do Not Contribute to Image-Forming Vision

Photoreceptors are the only type of neurons on the retina that are sensitive to light and contribute to image-forming vision. There is another type of neuron, a sub-type of the RGCs actually, called the intrinsically photosensitive RGCs (ipRGCs) that are also sensitive to light (i.e., they absorb photons and convert optical signals to electrical signals), but interestingly they do not (primarily) contribute to image-forming vision.

The ipRGCs were discovered fairly recently, and it is fair to say that the discovery was a big deal for the field (Berson, Dunn, and Takao 2002; Hattar et al. 2002). For the past 150 years or so, human vision could be adequately explained by photoreceptors being the only light-sensitive neurons. Now, if the ipRGCs are also light sensitive, do we have to rewrite the science behind human vision? It turns out that while the ipRGCs do respond to lights, they primarily contribute to non-image-forming vision (but see Dacey et al. (2005)). For instance, they are shown to impact circadian rhythms, mood, and pupillary light reflex (Lazzerini Ospri, Prusky, and Hattar 2017; Do and Yau 2010).

2.4 Retinal Structure and Functions

The retina is organized to perform a set of low-level tasks that are crucial to vision. “Low-level” here refers to the fact that the information encoded by the retina forms the building blocks for more complicated visual functions later in the HVS. At the risk of over-simplification, each task is achieved by a visual stream of neurons. These visual streams are also called parallel pathways. This section briefly discusses a set of basic functions of the retina and their visual streams.

2.4.1 Rod vs. Cone Specialization

Sensitivity and Kinetics

There are two types of photoreceptors: rods and cones. Perhaps the most important difference between the two is that rods are much more sensitive to light than cones. This is evident in Figure 2.6, which compares the single-photon responses of rods and cones in primates. The response here is represented by the photocurrent, the change in the current that flows into the photoreceptor as a result of photon absorption, which we will talk about in detail in a later chapter.

Figure 2.6: Comparing the single-photon responses (photocurrents) of a rod and a cone in a primate. Rods are more sensitive but have slower kinetics. From Angueyra-Aristizábal (2014, fig. 1.4C).

Due to their high sensitivity, rod responses saturate quickly as the ambient light level increases, so rods are primarily responsible for vision at low illumination levels (e.g., at night); rod-mediated vision is called scotopic vision. Cones are much less sensitive, so they are responsible for vision at normal illumination levels, such as during the day; cone-mediated vision is called photopic vision. Figure 2.7 shows the luminance ranges that scotopic and photopic vision are sensitive to. The two ranges overlap, so there is a luminance range where both rods and cones contribute to vision; vision in this range is called mesopic vision.

Figure 2.7: Sensitivity range of rod-mediated vision and cone-mediated vision. From Purves et al. (2017, fig. 11.11).

Cones also have faster response kinetics than rods: cone responses rise and fall much faster than rods; this is illustrated in Figure 2.6. The faster kinetics allows cones to track moving objects better than rods do. To reason about the influence of the response kinetics, think of a camera where the exposure time is very long: the resulting image is (motion) blurred. Shorter exposure/shutter time captures motion better. Cones have a shorter effective “exposure time” than rods.
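The camera analogy can be sketched as a temporal low-pass filter. The window lengths below are arbitrary illustrative numbers, not measured rod and cone integration times:

```python
import numpy as np

def temporal_average(stimulus, window):
    # Average the stimulus over a sliding window, an effective "exposure time".
    kernel = np.ones(window) / window
    return np.convolve(stimulus, kernel, mode="same")

t = np.arange(200)
flicker = (np.sin(2 * np.pi * t / 20) > 0).astype(float)  # fast on/off stimulus

cone_like = temporal_average(flicker, window=3)   # short window: tracks it
rod_like = temporal_average(flicker, window=31)   # long window: smears it

cone_range = cone_like.max() - cone_like.min()
rod_range = rod_like.max() - rod_like.min()
# The long window blurs the flicker toward its mean, like motion blur from
# a long camera exposure; the short window preserves the modulation.
print(f"cone-like range: {cone_range:.2f}, rod-like range: {rod_range:.2f}")
```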

Spectral Sensitivity and Color Vision

Yet another important difference between rods and cones is that cone-mediated vision provides color information, whereas rod-mediated vision encodes only light intensity. This is because there is only one class of rods but three different classes of cones, each with a different (linearly independent) wavelength sensitivity function. Fundamentally, color arises from the wavelength information in incident lights. Having three classes gives cones a much stronger capability for encoding wavelength information than rods have. An entire chapter later in the book is devoted to color vision; for now, let us just appreciate how different cones have different wavelength selectivities.

Figure 2.8: The absorbance spectra of the three cones (L, M, S) and the rod (R) in humans; data from Dartnall, Bowmaker, and Mollon (1983). The spectra are normalized to peak at 1.

One way to measure the spectral differences between photoreceptors is using a technique called microspectrophotometry (MSP), which measures the fraction of photons that gets absorbed by a photoreceptor at each wavelength. Using MSP, Dartnall, Bowmaker, and Mollon (1983) collected data for cones and rods from human donors, shown in Figure 2.8. The \(y\)-axis plots absorbance, which is \(\log(I_{\text{incident}}/I_{\text{transmitted}})\), i.e., the log ratio between the incident light intensity and transmitted (i.e., unabsorbed) light intensity 4.
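As a quick numerical check of this definition (assuming the base-10 logarithm that is conventional for absorbance):

```python
import math

def absorbance(i_incident, i_transmitted):
    # Absorbance = log10(I_incident / I_transmitted), per the definition above.
    return math.log10(i_incident / i_transmitted)

# If half the light is transmitted (half absorbed), absorbance is ~0.301;
# if only 1% is transmitted, absorbance is 2.
half = absorbance(100.0, 50.0)
one_percent = absorbance(100.0, 1.0)
print(round(half, 3), one_percent)
```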

While many cones were measured, only three distinct spectra emerged, with absorbance peaks at relatively long, medium, and short wavelengths, respectively. We call them the L, M, and S cones. The rod’s peak lies between those of the S and M cones. Note that the spectra in Figure 2.8 are normalized to peak at unity; the absolute absorbance of rods is slightly lower than that of the cones.

Notably, the L and M cone spectra are much more similar to each other than either is to the S cone spectrum. This is a clue about the evolution of the three cone types. Most mammals have only two cone types, one sensitive to short-wavelength light and the other to long-wavelength light; the former evolved into the S cones, and the latter separated into the L and M cones through a local gene duplication (Jacobs 2008). Since the duplication is relatively recent (about 30 to 35 million years ago), the L and M cones are still rather similar.

Bowmaker et al. (1978) shows similar data for a macaque. There, the L and M cone spectra are also closer to each other than to the S cone spectrum, indicating that the divergence between the L and M cones occurred before the split between modern Old World monkeys and great apes (including humans).

Spatial Distribution

There are about 120 million rods and about 6 million cones. The left panel in Figure 2.9 shows the distribution of both cones and rods on the retina. Cone density peaks sharply at the fovea, a small, central pit on the retina that is approximately 2 mm in diameter and subtends a visual angle of about 1\(^{\circ}\). The position in the fovea that has the peak cone density is defined to have an eccentricity of 0\(^{\circ}\). There are no rods in the fovea; the rods all lie in the retinal periphery, with their density peaking at about 20\(^{\circ}\) away from the fovea.

The right panel in Figure 2.9 shows images of photoreceptors at the fovea and at the periphery, taken by Curcio et al. (1990). Cones exclusively occupy the fovea; they become sparser and larger in the periphery, where rods fill in the spaces between them.

There are many important implications of the photoreceptor mosaic and distribution. First, visual acuity decreases in the visual periphery. Think of the photoreceptors as sampling the continuous optical image impinging upon the retina: a higher density leads to a higher sampling rate. In addition, the larger cone sizes in the periphery are equivalent to a higher degree of blurring, since photons hitting a cone are integrated together just like by a camera pixel (although, critically, the electrical response of a photoreceptor is not proportional to the photon count, unlike a camera pixel), and integration is a form of low-pass filtering.

Figure 2.9: Left: cone and rod distribution on the retina; the x-axis is the eccentricity (angular distance from the fovea, which has an eccentricity of 0\(^{\circ}\)). From Glassner (1995, fig. 1.4). Right: photos of photoreceptors in the fovea and periphery; rods are absent in the fovea, and cones become sparser and larger in the periphery. Adapted from Curcio et al. (1990).

We hasten to add that the lower acuity in the periphery is not exclusively attributed to the photoreceptor mosaic. As we will see shortly, how photoreceptors communicate with other neurons on the retina plays an important role, too.

Second, since the fovea has the highest visual acuity, our ocular motor system has evolved in such a way that when we want to see the fine details of an object, we move our eyes so that light from the object is captured by the fovea. This also means that we cannot see the details of an object in dim environments if we fixate on it, because the rod-free fovea is nearly blind at scotopic light levels. Instead, we have a better chance of seeing the object if we intentionally place it in our peripheral vision.

Rod vs. Cone Pathways and Visual Streams

Rods and cones have their own pathways initially, which merge later. This is shown in Figure 2.4. Both rods and cones synapse with bipolar cells, but they synapse with distinct bipolar cells. That is, an individual bipolar cell receives information from either rods only or cones only; the rod pathway and the cone pathway are parallel streams at this point. The bipolar cells then feed their outputs to the RGCs. An RGC can mix information from both rod and cone bipolar cells. This mixing is enabled by the amacrine cells, which synapse with both the rod and cone bipolar cells and with the RGCs. Thus, the distinct information in the rod pathway and the cone pathway gets merged at the RGC layer.

Why are rod and cone pathways initially parallel but merge later? The initial parallel pathways allow rods and cones to extract low-level information, such as contrast, independently under different lighting conditions, but once the information is collected, it is processed similarly, so there is really no need to duplicate the processing circuitry.

2.4.2 Contrast Detection

Another important function of the retina is to extract contrast information. Arguably, most of the interesting information in the physical world lives in image contrast, i.e., local differences in light intensity. Take a look at your surroundings: regions of uniform light level, where there is absolutely no change in light, are rare and do not carry much useful information. The fine details of an object are really encoded in contrasts.

This imposes two requirements on our visual system. First, we need to extract contrasts and encode them in neural signals so that they can be processed by the rest of the brain. This is the focus of this section. Second, we must reliably encode contrast across a wide range of ambient light levels, which is the focus of Section 2.4.3.

Figure 2.10: Weber contrast is often used for detecting objects against a uniform background, and Michelson contrast is used for detecting patterns. The two definitions are compatible: they both describe the ratio of the maximal variation of the signal to its mean.

Contrast is Variation Over Mean

Before discussing how the RGCs meet these requirements, we must first define contrast more rigorously. Intuitively, contrast describes how much variation there is in a signal relative to the average strength of the signal. There are two commonly used definitions, both of which are compatible with this intuition. They are usually used in different scenarios. Figure 2.10 illustrates the two definitions.

Weber contrast is often used in scenarios where there is a small object against a relatively uniform background. The contrast \(C_w\) is defined as:

\[ \begin{align} C_w = \frac{I-I_b}{I_b}, \end{align} \tag{2.1}\]

where \(I_b\) is the background luminance and \(I\) is the object luminance. If the object is small, the mean luminance of the entire field is approximately the background luminance, and naturally \(I-I_b\) is the maximal variation over the mean.

The Michelson contrast is used in scenarios where we want to detect patterned signals. Taking a sinusoidal pattern as an example (and recall any arbitrary pattern can be decomposed into sinusoidal basis patterns), the contrast \(C_m\) of a sinusoidal signal is usually defined as:

\[ \begin{align} C_m = \frac{I_{max} - I_{min}}{I_{max} + I_{min}}, \end{align} \tag{2.2}\]

where \(I_{max}\) and \(I_{min}\) are the highest and lowest luminance, respectively, of the signal. We can see that \(C_m\) can also be interpreted as the ratio between the variation and the mean of the signal. A higher \(C_m\) would mean that the pattern is more easily detected, and vice versa.
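The two definitions can be written directly in code; the luminance values below are arbitrary examples:

```python
import numpy as np

def weber_contrast(i_object, i_background):
    # Equation 2.1: C_w = (I - I_b) / I_b
    return (i_object - i_background) / i_background

def michelson_contrast(signal):
    # Equation 2.2: C_m = (I_max - I_min) / (I_max + I_min)
    return (signal.max() - signal.min()) / (signal.max() + signal.min())

# A small object at luminance 150 on a background of 100:
print(weber_contrast(150.0, 100.0))  # 0.5

# A sinusoidal grating with mean 100 and amplitude 50: its Michelson
# contrast is amplitude/mean = 0.5, matching the variation-over-mean view.
x = np.linspace(0.0, 2.0 * np.pi, 1000)
grating = 100.0 + 50.0 * np.sin(x)
print(round(float(michelson_contrast(grating)), 3))
```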

RGCs Pool Signals from Many Photoreceptors

There are about 120 million rods, 6 million cones, and 1 million RGCs on the retina. Therefore, a single RGC necessarily receives signals from multiple rods and/or cones. Pooling signals from multiple neurons into a single neuron is generally called neural convergence, a many-to-one mapping. Evidently, there is a much higher degree of neural convergence in rods than in cones. The fovea, which, recall, contains only cones, is an extreme case where there is no neural convergence. In fact, each foveal cone sends its signal to multiple RGCs, so there is a one-to-many mapping there.

Figure 2.11: Dendritic field sizes (of two RGC subtypes) increase with eccentricity, indicating a higher degree of neural convergence at the periphery. From Wandell (1995, fig. 5.7), which is after Dacey and Petersen (1992, fig. 2A).

The higher degree of neural convergence in the rod pathway is another reason why rod-mediated vision is more sensitive than cone-mediated vision: the weak responses of many rods are pooled into the same downstream RGC, so the RGC can be driven by light levels far too low for any single photoreceptor to drive it alone. The flip side of the higher degree of convergence is that rod vision offers low spatial acuity: if an RGC generates a response, we cannot resolve the source of that response, since it could have come from anywhere within the large group of photoreceptors feeding that RGC. From a signal processing perspective, summation is a form of low-pass filtering (equivalent to convolving the signal with a box filter), which removes the high-frequency content of the signal.
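The acuity cost of pooling can be sketched directly. The pool sizes below are arbitrary, chosen only so that the pooled window spans exactly one period of a fine pattern:

```python
import numpy as np

def rgc_outputs(photoreceptors, pool_size):
    # Each RGC sums (pools) the signals of `pool_size` adjacent photoreceptors.
    n = len(photoreceptors) // pool_size * pool_size
    return photoreceptors[:n].reshape(-1, pool_size).sum(axis=1)

x = np.arange(240)
fine_pattern = np.sin(2 * np.pi * x / 4)  # detail at a 4-photoreceptor period

fovea_like = rgc_outputs(fine_pattern, pool_size=1)      # no convergence
periphery_like = rgc_outputs(fine_pattern, pool_size=4)  # 4-to-1 convergence

# Summing over a full period averages the pattern away: the pooled RGCs
# cannot signal the fine detail, even though each photoreceptor saw it.
print(np.abs(fovea_like).max(), np.abs(periphery_like).max())
```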

The degree of neural convergence increases as the eccentricity increases. Figure 2.11 shows the dendritic field sizes of two RGC subtypes; the size increases with the eccentricity. The higher degree of neural convergence is another reason why peripheral acuity is much worse than that at the fovea.

RGCs Have a Center-Surround Receptive Field

Neural convergence gives rise to an important concept called the receptive field, which is central to contrast encoding. The receptive field of a neuron is the retinal area that influences the neuron’s activity. For an RGC, its receptive field is the collection of photoreceptors whose output signals converge at that RGC. Because each foveal cone drives its own RGCs, an RGC connected to a foveal cone has a receptive field of just one cone.

The way an RGC aggregates information from its receptive field is not to simply sum up the signals from the individual photoreceptors. If we illuminate the entire receptive field of an RGC uniformly, the RGC responds similarly regardless of the illumination intensity. This is a form of light adaptation, which we will discuss shortly. Let’s call the RGC’s response rate under uniform illumination its spontaneous rate.

Figure 2.12: RGCs have a center-surround receptive field with two types. The ON-center RGCs are excited by stimuli presented at the center but inhibited by stimuli presented at the surround (stimulus 2 on the left); OFF-center RGCs have the opposite response (stimulus 4 on the right). Drawn after Hubel (1995, p. 41).

If uniformly changing the light level does not change the RGC’s response rate, what does? It turns out that there must be variation in the illumination within the receptive field. The RGCs respond best to variation patterns that have a center-surround structure. For about half of the RGCs, the response rate is maximized if we present bright light to the center photoreceptors and keep the surround photoreceptors dark. These are called ON-center, OFF-surround RGCs, since they have an excitatory center (excited by light) and an inhibitory surround (inhibited by light). The other half prefer the opposite pattern: dark at the center and bright at the surround. They are the OFF-center, ON-surround RGCs, since they have an inhibitory center and an excitatory surround. The RGCs are thus said to have a center-surround receptive field. Figure 2.12 illustrates the receptive fields of the two RGC types.
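A common computational sketch of a center-surround receptive field is a difference of two Gaussians (DoG); the model and its parameters below are illustrative, not fit to any measured RGC:

```python
import numpy as np

# An ON-center receptive field sketched as a difference of Gaussians:
# a narrow excitatory center minus a broader inhibitory surround.
r = np.arange(-10, 11)
xx, yy = np.meshgrid(r, r)

def gaussian(sigma):
    g = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return g / g.sum()  # normalize so center and surround balance exactly

receptive_field = gaussian(1.0) - gaussian(3.0)

def rgc_response(stimulus):
    return float(np.sum(receptive_field * stimulus))

bright_spot = (xx**2 + yy**2 <= 2**2).astype(float)  # light only on the center
uniform = np.ones_like(bright_spot)                  # light everywhere

# The bright center excites strongly; uniform light excites the center and
# inhibits the surround in equal measure, so the net response is ~0.
print(rgc_response(bright_spot), rgc_response(uniform))
```

Flipping the sign of the two Gaussians gives the OFF-center case.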

H.K. Hartline measured the RGC responses from horseshoe crabs (Hartline and Graham 1932), and with these recordings he famously demonstrated inhibitory signals (Hartline 1949; Hartline, Wagner, and Ratliff 1956) 5; he was also the first to use the term receptive field (Hartline 1938, 1939, 1940a, 1940b). Barlow (1953) demonstrated the inhibitory signals in a frog’s RGC; Stephen Kuffler (Kuffler 1952, 1953) was the first to demonstrate the center-surround receptive-field structure in a mammalian (cat) RGC, with Barlow also making significant contributions (Barlow, Fitzhugh, and Kuffler 1957).

Center-Surround Receptive Fields are Designed to Encode Contrasts

Looking at the preferred stimulus of the two RGC types in Figure 2.12 (stimulus 2 for ON-center and stimulus 4 for OFF-center), evidently the RGCs are designed to extract illuminant variations, i.e., contrast. If a visual field has a high (positive) Weber contrast, i.e., there is a small object that is significantly lighter than the background, the ON-center RGC would respond well to it. Similarly, an OFF-center RGC would respond well to a dark object placed against a light background.

Figure 2.13: Contrast sensitivity function (CSF) of an ON-center midget RGC 6; the filled circle represents the sensitivity to a uniform signal (i.e., zero spatial frequency). The CSF is bandpass. The \(x\)-axis is the cycles per degree (CPD) of the retinal signal (see Figure 2.14). Adapted from Derrington and Lennie (1984, fig. 3C).

We can also quantify how the center-surround receptive fields respond to patterns of different Michelson contrast. A complication is that a pattern is described not only by its contrast but also by its spatial frequency. At each frequency, we determine the minimal amount of contrast needed to produce a criterion level of RGC response (say 30 spikes/second) 7. The contrast sensitivity at that frequency is defined as the reciprocal of the threshold contrast. We then sweep the frequency and repeat this exercise at each frequency. The result of such a measurement is called the Contrast Sensitivity Function (CSF); Figure 2.13 shows one such example.

We can see that the RGC’s CSF is bandpass: there is a preferred frequency to which an RGC responds best. When the frequency is too low, the signal is equivalent to a uniform background (filled circle); when the frequency is too high, the positive and negative cycles of the signal cancel each other within the receptive field. In both cases, an RGC would respond weakly, so the contrast needed to produce a criterion level of response is high (i.e., the sensitivity is low). At a spatial frequency of about 5 cycles per degree, the positive half-cycle coincides with the ON-center of the cell and the negative half-cycle coincides with the OFF-surround of the cell. As a result, the contrast required to produce the same level of response can afford to be low, resulting in a higher sensitivity.
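The bandpass shape can be reproduced with the classic difference-of-Gaussians (DoG) model of a center-surround receptive field: the cell’s amplitude response to a grating is the difference of two Gaussians in the frequency domain. The sketch below is illustrative only; all parameter values are invented, not fit to the data in Figure 2.13.

```python
import numpy as np

def dog_sensitivity(f, w_c=1.0, sigma_c=0.02, w_s=0.9, sigma_s=0.1):
    """Relative sensitivity of a DoG receptive field to a grating of
    spatial frequency f (in CPD). The Fourier transform of a Gaussian is
    a Gaussian, so the DoG's amplitude response is a difference of two
    Gaussians in frequency: center minus surround."""
    center = w_c * np.exp(-2.0 * (np.pi * sigma_c * f) ** 2)
    surround = w_s * np.exp(-2.0 * (np.pi * sigma_s * f) ** 2)
    return center - surround

freqs = np.linspace(0.0, 60.0, 601)
sens = dog_sensitivity(freqs)
peak_f = freqs[np.argmax(sens)]
# Bandpass behavior: at f = 0 the center and surround nearly cancel, at high
# f both terms vanish, and the response peaks at an intermediate frequency.
```

Varying the surround weight `w_s` toward 1 deepens the low-frequency roll-off, which is one way such models capture how strongly the surround "discounts" uniform illumination.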

Note that the Michelson contrast of a signal, as defined in Equation 2.2, is bounded between 0 and 1, so long as the signal is positive everywhere (which is of course the case in real-world visual signals). As a result, the contrast sensitivity is lower-bounded by 1. Sometimes we will see CSF plots where the \(y\)-axis goes below 1, because usually people fit a smooth CSF curve based on the measured data and do not cut off the curve below 1. In practice, any contrast sensitivity below 1 should be interpreted as “not detectable”.
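To make the bound concrete, here is a minimal check. Equation 2.2 is not reproduced in this excerpt, so the function below assumes the standard Michelson definition \((L_{\max}-L_{\min})/(L_{\max}+L_{\min})\):

```python
def michelson_contrast(l_max, l_min):
    """Michelson contrast of a signal with extrema l_max >= l_min >= 0
    (assumes l_max + l_min > 0). Bounded in [0, 1] for nonnegative signals."""
    return (l_max - l_min) / (l_max + l_min)

# Since the contrast threshold can never exceed 1, the contrast
# sensitivity (its reciprocal) is lower-bounded by 1.
```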

Figure 2.14: The relationship between ordinary spatial frequency and cycle per degree (CPD). Signals \(S_1\) and \(S_2\) have different spatial frequencies but the same CPD, as they project to the same retinal signal \(R_b\). \(S_2\) and \(S_3\) share the same spatial frequency but differ in CPD, as they correspond to different retinal signals (\(R_a\) and \(R_b\)). Since retinal signal is what matters for vision, we usually use CPD to represent signal frequency (e.g., Figure 2.13).

Figure 2.13 quantifies the spatial frequency of a signal using Cycle Per Degree (CPD), which is the number of cycles/periods in a degree (\(\pi/180\) of a radian). CPD is an angular measure of spatial frequency. The relationship between the ordinary spatial frequency and CPD is illustrated in Figure 2.14. Why do we prefer CPD when describing the spatial frequency? This is because CPD better quantifies the frequency of retinal signals, which is what matters for vision, not the frequency of the physical objects themselves.

Consider an object \(S_1\) in the object space. It produces a retinal signal \(R_b\), which dictates how well the pattern in \(S_1\) is detected. Now we shrink the size of \(S_1\) but move it closer to the eye to obtain another signal \(S_2\). Clearly \(S_2\) has a higher spatial frequency than does \(S_1\), but they produce identical retinal signals (assuming the eye optics is approximated as a pinhole system), so the patterns in \(S_1\) and \(S_2\) are equally detectable. This is captured by the fact that \(S_1\) and \(S_2\) have the same CPD. In contrast, if we move \(S_2\) closer to our eye, we get another signal \(S_3\) that has an ordinary spatial frequency identical to that of \(S_2\) but whose pattern is more easily detected. This is adequately captured by the lower CPD of \(S_3\).
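The conversion between ordinary spatial frequency and CPD is a one-liner under the small-angle approximation (the function name and unit choices below are our own, for illustration):

```python
import math

def cycles_per_degree(spatial_freq, distance):
    """Convert an ordinary spatial frequency (cycles per meter) of a pattern
    viewed at `distance` meters into cycles per degree (CPD).
    Small-angle approximation: 1 meter at distance d subtends about
    (180 / pi) / d degrees."""
    degrees_subtended_per_meter = math.degrees(1.0 / distance)
    return spatial_freq / degrees_subtended_per_meter

# S1 vs. S2: halving the pattern (doubling cycles/meter) while halving the
# viewing distance leaves the CPD, hence the retinal signal, unchanged.
cpd_s1 = cycles_per_degree(100, 2.0)
cpd_s2 = cycles_per_degree(200, 1.0)
# S2 vs. S3: the same physical pattern moved closer subtends more degrees,
# so its CPD drops.
cpd_s3 = cycles_per_degree(200, 0.5)
```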

The CSF in Figure 2.13 allows us to study the joint effect of spatial frequency and contrast in detecting a patterned signal. In general, the ability of pattern detection depends on a number of other factors, such as the spatial frequency, eccentricity, color, and temporal frequency (if the stimulus is time-varying) (Mantiuk, Ashraf, and Chapiro 2022; Ashraf et al. 2024). Customarily, this high-dimensional data is plotted as a set of different CSFs, each quantifying the contrast sensitivity as a function of other factors.

Functionally, detecting contrast allows us to detect edges and contours: information across the two sides of an edge has the highest contrast. We will see shortly how later processing stages in the HVS leverage the contrasts to extract more specific information from the visual field to aid tasks such as object recognition.

2.4.3 Light Adaptation

Looking at Figure 2.12 again, the RGC responses do not change much with uniform illumination (stimulus 1 and stimulus 3), regardless of the illumination level. This is true for a wide range of illumination levels. In some sense, the RGCs are able to “discount” the ambient light level so that contrast is reliably encoded at arbitrary light levels. This is called light adaptation.

Figure 2.15: Illustration of RGC adaptation. Through the increment-threshold experiment, we show that, over a wide range of the background intensity \(I_b\), the threshold \(\Delta I\) needed for the spot light to be detectable is linearly proportional to \(I_b\). That is, the minimal detectable contrast \(\frac{\Delta I}{I_b}\) is roughly constant, a.k.a. Weber’s law, the result of light adaptation. The extended dashed line shows that Weber’s law does not hold at all luminance levels. Enroth-Cugell, Hertz, and Lennie (1977, fig. 6) and Sakmann and Creutzfeldt (1969) report actual data for the cat’s RGC.

Figure 2.15 illustrates an experiment showing the effect of light adaptation. It uses the “increment-threshold” paradigm, where there is a uniform background light with an intensity of \(I_b\) and a spot light is superimposed over the background; the spot light has an intensity increment \(\Delta I\) over \(I_b\). The entire stimulus (background + spot light) impinges on the receptive field of an RGC. The goal is to adjust the increment of the spot light so that the RGC’s response reaches a criterion level (e.g., 30 spikes per second). The plot in Figure 2.15 shows the minimal amount of increment (\(y\)-axis) under different background intensities (\(x\)-axis).

We can see that over a wide range of background intensity \(I_b\), the threshold \(\Delta I\) needed for the spot light to be detectable is linearly proportional to \(I_b\). That is, the minimal detectable (Weber) contrast \(\frac{\Delta I}{I_b}\) is roughly constant. We could also perform this increment-threshold experiment behaviorally on human participants, through which we can derive the minimal \(\Delta I\) needed for the spot light to be detectable to humans (Blakemore and Rushton 1965; Fuortes, Gunkel, and Rushton 1961; Aguilar and Stiles 1954; Barlow 1957). Perhaps unsurprisingly, the same trend holds: over a rather wide range of background levels, the increment threshold varies linearly with the background intensity. This means, behaviorally, the minimal detectable contrast is also constant, and this constancy could potentially be accounted for by the physiological constancy 8.

Weber’s Law Means Desensitization

The minimal detectable contrast being constant over different background intensities is called Weber’s law or the “Weber–Fechner law” (Fechner 1860). A direct interpretation of Weber’s law is that stronger signals are needed at higher ambient light levels for a signal to be barely detectable. It is almost as if our visual system is desensitized at higher ambient light levels. This desensitization is very well documented for photoreceptors (Matthews et al. 1988; Nakatani and Yau 1988; Fain et al. 2001), and it is unsurprising that photoreceptor desensitization can lead to (although does not fully account for) the desensitization observed in the RGCs and in the behavioral experiments (Dunn, Lankheet, and Rieke 2007).

This desensitization allows us to extract contrasts rather than absolute light levels, which is of significant advantage to us. The ambient level varies over several orders of magnitude, but the contrast of a scene is relatively stable regardless of the ambient light level. Consider our ape ancestors who need to find apples from a tree to survive. As the ambient light level increases, both the apple and the tree become brighter, but the contrast is relatively constant. To be able to reliably detect the apple, an ape needs to reliably extract contrast at all light levels but not the absolute light level itself.
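The invariance in the apple story is a one-line calculation (the luminance values below are made up for illustration):

```python
apple, tree = 40.0, 10.0   # illustrative luminances at some baseline light level

def weber_contrast(target, background):
    """Weber contrast of a target against its background."""
    return (target - background) / background

# Changing the ambient light multiplies every luminance by the same gain,
# so the apple's contrast against the tree is the same at every light level.
contrasts = [weber_contrast(apple * g, tree * g) for g in (0.1, 1.0, 100.0)]
```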

Weber’s Law Fails at Low and High Intensities

Sharp readers like you have most definitely noticed that Weber’s law does not hold at all background illumination levels (Kolb, Fernandez, and Nelson 2005, pt. VIII Light and Dark Adaptation). The extended dashed line in Figure 2.15 indicates that Weber’s law fails at very low background levels. When the ambient light level is very low, Weber’s law fails because the retinal responses are dominated by noise, both retinal internal noise (called dark light or dark noise) (Barlow 1957; Blakemore and Rushton 1965; Donner 1992) and external photon shot noise (Rose 1948; De Vries 1943). At extremely high background levels, Weber’s law also fails because of photoreceptor saturation. All in all, however, Weber’s law holds reasonably well under the very wide range of normal lighting conditions that we encounter in everyday life.

Formally, Weber’s law states that the increment threshold grows in proportion to the background intensity:

\[ \begin{align} \Delta I =kI_b, \end{align} \tag{2.3}\]

where \(k\) is a constant representing how fast the threshold increases with the background and is called Weber’s constant.

When Equation 2.3 is written in the log-log domain, as is plotted in Figure 2.15, we have:

\[ \begin{align} \log(\Delta I) = \log(k) + \log(I_b). \end{align} \tag{2.4}\]

We can see that in the log-log plot, Weber’s constant affects the intercept of the threshold-vs-background line (the intersection of the dashed line and the \(y\)-axis; not shown in Figure 2.15).

For Weber’s law to hold exactly, the slope of the threshold-vs-background line in the log-log plot must be 1, which is roughly the case in Figure 2.15 (for the range where the relationship is linear). In many measurements, the slope fit from the data is not exactly 1. To account for this, Weber’s law is extended, phenomenologically, to take the following form, which is also called Stevens’s power law (Stanley S. Stevens 1957; Stanley Smith Stevens 1961):

\[ \begin{align} \Delta I =kI_b^d, \\ \log(\Delta I) = \log(k) + d\log(I_b), \end{align} \tag{2.5}\]

where \(d\) is a free parameter that permits this additional degree of freedom.
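Equations 2.4 and 2.5 suggest a simple recipe for estimating \(k\) and \(d\) from increment-threshold data: fit a line in the log-log domain, where the slope is \(d\) and the intercept is \(\log(k)\). A sketch with synthetic, noiseless data (the values of \(k\) and \(d\) below are invented):

```python
import numpy as np

# Ground-truth parameters for the synthetic data (invented for illustration).
k_true, d_true = 0.02, 0.9
Ib = np.logspace(0, 4, 20)       # background intensities (arbitrary units)
dI = k_true * Ib ** d_true       # thresholds following Stevens's power law

# Linear fit in log-log space: log10(dI) = d * log10(Ib) + log10(k).
d_fit, log_k_fit = np.polyfit(np.log10(Ib), np.log10(dI), 1)
k_fit = 10.0 ** log_k_fit
# A fitted slope of exactly 1 would recover Weber's law; here d_fit ~ 0.9.
```

On real data the fit would of course be noisy, and one would restrict it to the background range where the log-log relationship is actually linear (excluding the noise-dominated and saturated ends discussed above).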

Dark and Chromatic Adaptations

A concept related to light adaptation is dark adaptation. Dark adaptation deals with the situation where the eye is first exposed to light at a certain level and then the light is removed. We can all tell from experience that our visual sensitivity is terrible when the light is just removed but will improve over time as we spend more time in the dark. Dark adaptation is concerned with quantifying the dynamics of the visual sensitivity recovery at different times in the dark. Once again, dark adaptation can be studied both psychophysically (Hecht, Haig, and Chase 1937; Crawford 1937, 1947) and physiologically (T. D. Lamb and Pugh 2006; T. Lamb and Pugh Jr 2004).

While light and dark adaptations are concerned with visual experiences under different intensity levels, chromatic adaptation is concerned with how our vision adapts to illuminant colors. It turns out that our visual system can pretty reliably discount the color of the light illuminating a scene so that an object’s color appears relatively stable under different illuminations. We will study light, dark, and chromatic adaptations in greater depth in Chapter 7.

2.5 Post Retinal Processing

The signals leaving the retina are first routed to the Lateral Geniculate Nucleus (LGN) and then to the cortex, where vision is formed.

2.5.1 Lateral Geniculate Nucleus

Different classes of RGCs project to distinct LGN layers with virtually the same receptive fields: midget RGCs project to the Parvocellular layers (P cells) in the LGN (forming the P pathway/stream), parasol RGCs project to the Magnocellular layers (M cells) in the LGN (forming the M pathway/stream), and bistratified RGCs project to the Koniocellular layers (K cells) in the LGN (forming the K pathway/stream).

Similar to the RGCs, the LGN neurons also have center-surround receptive fields, and their receptive-field organizations are almost exact copies of those of their corresponding RGCs. This is why, by and large, the LGN has been thought to be mainly a relay station, transmitting information from the retina to the brain. Interestingly, the way the LGN relays information to the brain is to gather information from one hemifield and send it to the other side of the cortex.

If LGN simply relays information, why does it exist at all? It turns out that LGN receives about 90% of its inputs from the cortex (Sherman and Koch 1986). This is different from the retina, which is a “closed” system that does not receive information from the rest of the brain. The feedback from the brain serves to regulate the visual signals before they are sent to the brain. Higher-order brain regions encode cognitive information such as attention, and one can imagine how attention can be used to influence what subsequent information is sent to the brain (O’Connor et al. 2002). If the brain were to send the feedback signals to the retina, the blind spot would have been 10 times larger, so the LGN seems like a convenient and cost-effective place where the feedback-driven regulation can take place.

Another Example of Parallel Pathways

Rods vs. cones is an example of parallel pathways in the HVS. The parvocellular vs. magnocellular pathway is another example; they encode different spatial/temporal frequency information. The magnocellular pathway responds to high temporal frequency well, is sensitive to low spatial frequency, and responds strongly to contrast changes. The parvocellular pathway, in large part, behaves oppositely. It is worth noting that these two visual streams start from the retina, where they start from distinct RGC cell types, and remain physically separated all the way into the primary visual cortex V1. This is different from the rod vs. cone pathways, which start at the photoreceptors and merge at the RGC layer.

2.5.2 Visual Cortex

Once in the cortex, the visual signals are first processed in the primary visual cortex, also known as visual area 1 (V1) or the striate cortex. V1 neurons primarily encode edge orientations but are also tuned to edge lengths, object motion direction, and specific colors. David Hubel and Torsten Wiesel were the first to elucidate the responses of V1 neurons and the architecture of V1 in general (Hubel and Wiesel 1959, 1962, 1968) 9.

V1 Simple Cells are Orientation Selective

Perhaps the most striking feature of V1 neurons is that they are orientation selective. The left panel of Figure 2.16 shows the responses of a cat V1 neuron, recorded by Hubel and Wiesel (1959), when presented with a slit of illumination at different orientations. This neuron responds best to a particular orientation (vertical in this case) and responds very weakly, if at all, to other orientations. The right panel in Figure 2.16 plots the neuron responses (spikes/second) as a function of the illumination orientation; a plot like this is called the neuron’s orientation tuning curve.

Figure 2.16: Left: orientation selectivity of a cat V1 simple cell; from Hubel and Wiesel (1959, fig. 3). Right: orientation tuning curves of two illustrative V1 simple cells (do not necessarily correspond to the experimental data on the left); different cells can have different preferred orientations.
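The tuning curves on the right of Figure 2.16 are illustrative; their qualitative shape can be sketched with a minimal Gaussian model over orientation (all parameter values below are invented, not fit to Hubel and Wiesel’s data), taking care that orientation is periodic with period 180°:

```python
import numpy as np

def tuning_curve(theta_deg, preferred_deg, bandwidth_deg=20.0, peak_rate=50.0):
    """Illustrative Gaussian orientation tuning curve (spikes/second).
    The stimulus-vs-preferred orientation difference is wrapped into
    [-90, 90) degrees to respect the 180-degree periodicity of orientation."""
    d = (np.asarray(theta_deg) - preferred_deg + 90.0) % 180.0 - 90.0
    return peak_rate * np.exp(-0.5 * (d / bandwidth_deg) ** 2)

# A cell preferring vertical (90 deg) fires strongly at its preferred
# orientation and barely at the orthogonal one.
resp_preferred = float(tuning_curve(90, preferred_deg=90))
resp_orthogonal = float(tuning_curve(0, preferred_deg=90))
```

Different simple cells would simply differ in their `preferred_deg` (and possibly `bandwidth_deg`), which is how a population of such cells tiles all orientations.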


Figure 2.17: Left: responses of a V1 simple cell to spot lights at different locations in the receptive field. \(\triangle\): inhibitory areas; \(\times\): excitatory areas. \(f\) is when the entire field is illuminated uniformly. Right: the receptive field of the cell. From Hubel and Wiesel (1959, fig. 1).

Why would this neuron be tuned to a specific orientation? The reason lies in its receptive field structure. Figure 2.17 shows the responses of such a neuron when illuminated with spot lights at different locations. The neuron is inhibited when spot lights fall along the central vertical axis and excited when they fall on the flanking areas. The right panel shows the receptive field of such a neuron, where the skinny, tall central area is inhibitory and the flanking areas are excitatory. There are other neurons where the excitatory and inhibitory regions are swapped.

This receptive field explains why a neuron can be orientation selective: when the orientation of the stimulus coincides with the excitatory region of the receptive field, the neuron is optimally stimulated 10. Other orientations would engage both the excitatory and inhibitory regions, reducing or abolishing the response. V1 cells with such a receptive field are called simple cells. Different simple cells might have different preferred orientations; for instance, the first cell in the right panel of Figure 2.16 prefers a 90\(^{\circ}\) orientation.

Figure 2.18: Bottom: typical receptive-field maps for V1 simple cells (C – G); while there are on and off regions, they are not organized in a center-surround fashion as they are in RGCs/LGN (A and B). Top: multiple center-surround (LGN) neurons synapse with a V1 simple cell, producing the receptive field in C at the bottom. \(\triangle\): inhibitory areas; \(\times\): excitatory areas. Adapted from Hubel and Wiesel (1962, figs. 2, 19).

C–G in Figure 2.18 illustrate typical receptive fields found in V1 simple neurons. All are oriented (only one orientation is shown) but differ in arrangements. In comparison, A and B show the center-surround receptive fields found in RGCs and LGN neurons. Clearly, center-surround receptive fields simply cannot be orientation selective: try superimposing an edge and rotating it over the center-surround receptive field; will the response change much?

How would a V1 simple neuron acquire such an oriented receptive field? This can be explained by looking at how LGN neurons are connected to a V1 simple neuron. The top panel in Figure 2.18 illustrates the model suggested by Hubel and Wiesel (1962), which is supported by later electrophysiological results (Clay Reid and Alonso 1995). Each V1 simple cell synapses with and sums the inputs from multiple LGN neurons (which, recall, have the same center-surround receptive fields as the RGCs), whose receptive fields abut and overlap on the retina and are arranged along an oblique line. When those receptive fields all have the same ON-center (or OFF-center) structure, the simple cell is tuned to an oblique, elongated edge. Therefore, even though center-surround cells have no orientation selectivity, V1 simple cells do.
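This wiring model is easy to check numerically: sum a few ON-center difference-of-Gaussians (DoG) receptive fields whose centers lie along a (here, vertical) line, and the summed field responds more strongly to a bar of the matching orientation than to the orthogonal one. All sizes and weights below are illustrative, not physiological measurements.

```python
import numpy as np

def dog(x, y, cx, cy, sigma_c=1.0, sigma_s=2.0):
    """An ON-center difference-of-Gaussians receptive field centered at (cx, cy)."""
    r2 = (x - cx) ** 2 + (y - cy) ** 2
    center = np.exp(-r2 / (2 * sigma_c**2)) / (2 * np.pi * sigma_c**2)
    surround = np.exp(-r2 / (2 * sigma_s**2)) / (2 * np.pi * sigma_s**2)
    return center - surround

n = 41
yy, xx = np.mgrid[0:n, 0:n] - n // 2   # image coordinates centered at the origin

# Hubel-Wiesel model: the simple cell sums LGN-like DoG fields whose
# centers abut along a vertical line.
simple_rf = sum(dog(xx, yy, 0, cy) for cy in (-8, -4, 0, 4, 8))

def bar(theta_deg, half_width=1.5):
    """A unit-intensity bar through the origin, oriented at theta_deg."""
    t = np.deg2rad(theta_deg)
    dist = np.abs(np.sin(t) * xx - np.cos(t) * yy)  # distance to the bar's axis
    return (dist <= half_width).astype(float)

# A vertical bar covers all five excitatory centers; a horizontal bar covers
# one center plus the inhibitory surrounds of its neighbors.
resp_vertical = float((simple_rf * bar(90)).sum())
resp_horizontal = float((simple_rf * bar(0)).sum())
```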

Direction, Length, and Binocular Vision Emerge from (Hyper)Complex Cells

The majority of neurons in V1 are actually not simple cells. Three-quarters of the V1 neurons are complex cells, which have, well, complex selectivities. Fundamentally, their receptive fields cannot be subdivided into excitatory and inhibitory areas. That is, they do not respond to a spot light no matter where the light is placed in the receptive field. Therefore, their responses to complicated geometries cannot be explained/predicted by their responses to spot lights, unlike those of simple cells.

The complex cells are also orientation selective, but unlike simple cells, many complex cells respond only to a properly oriented edge sweeping across the receptive field, as if (but not actually) the entire receptive field were excitatory. When we present a properly oriented but stationary edge, however, complex cells do not respond at all, or respond only weakly at the onset or offset of the edge. This further shows that the responses of complex cells are not a linear superposition of responses to spot lights.

Figure 2.19: Some V1 complex neurons prefer properly oriented edges sweeping across their receptive field; these neurons also have direction selectivity — even under the same orientation. From Hubel and Wiesel (1968, fig. 2).

Interestingly, about one-fifth of the complex cells prefer movement in a particular direction, showing the direction selectivity of many complex cells. Hubel and Wiesel (1968) measured the direction selectivity of V1 complex cells in monkeys, and some of the results are shown in Figure 2.19. In this example, the cell is excited by a properly oriented edge moving upward, but not by the same edge moving in the opposite direction, showing selectivity for motion direction.

Hubel and Wiesel (1968) also discovered a set of what they called end-stopped neurons, or hypercomplex cells, in V1. Those neurons are tuned to properly oriented edges of a specific length, beyond which the neurons are inhibited. These neurons play a role in encoding corners, curvatures, and sudden breaks in lines (Hubel 1995, p. 85).

Finally, Hubel and Wiesel also found that some V1 neurons respond to stimuli only from the left eye or only from the right eye, a property termed ocular dominance. There are also binocular cells that can be stimulated independently by stimuli from either eye. These cells represent the first stage where information from the two eyes converges, which is critical for depth perception.

“Be More Specific”

An obvious conclusion we can draw from comparing the V1 neurons and the retina/LGN neurons is this: as we progress along the visual pathway, the stimulus needed to drive a neuron becomes more and more specific. Put another way, our visual system extracts increasingly specific information as signals progress along the pathway.

Being more specific is critical, as that allows us to recognize objects by their subtle details. For instance, the RGCs/LGN neurons provide the contrast/edge detection capability, but virtually any object has contrasts and edges, so they are not terribly useful in recognizing specific objects. The V1 simple neurons, however, allow us to detect orientations, and that is critical to our vision — from orientations we can then infer shapes, as we recognize objects mostly by their shapes.

Critically, however, the V1 simple neurons offer orientation selectivity precisely because the RGCs/LGN neurons have contrast/edge detection capabilities, as demonstrated in Figure 2.18 (top). This is why we say the early visual system extracts low-level information, but the later visual system extracts high-level information: the former is used as the building blocks by the latter.

The Rest of the Cortex

Figure 2.20: Once in the cortex, signals are projected from area V1 to other areas, each generally specialized in a particular information process. The two main pathways from V1 are the ventral pathway (“what”) and the dorsal pathway (“where”). There is top-down feedback in the cortex from higher-order areas to lower-order areas. Adapted from Dowling and Dowling Jr (2016, fig. 1.3).

From V1, signals are projected to other areas such as V2, V4, IT, MT, etc. There are two main projection pathways (Nassi and Callaway 2009; Ungerleider and Mishkin 1982; Mishkin, Ungerleider, and Macko 1983), as shown in Figure 2.20. The first is the dorsal pathway, which is concerned with observing objects in space, such as their spatial location and motion, information that is also useful to guide actions (Goodale et al. 1991). Therefore, this pathway is also called the “where/how” pathway. The other is the ventral pathway, or the “what” pathway, which carries information about the details and identities of objects and supports visual functions such as object recognition, facial recognition, and color perception. The two pathways interact. For instance, to guide visual action we not only need to know the position and motion of the objects but also the shape, color, etc.

The discussion so far focuses on the bottom-up information flow, the flow of information from lower-order representations in the hierarchy, such as V1, to higher-order representations, such as V4 and beyond. There is also a top-down information flow from the higher regions to the lower regions. This information flow provides feedback information such as attention, knowledge, and expectation to influence the early information processing in the cortex (Gilbert and Li 2013; Briggs 2020). Combining the bottom-up and the top-down flows, the HVS acts essentially as a self-adaptive system that automatically optimizes its performance for a given task.


  1. Cajal shared the Nobel Prize in 1906 with Camillo Golgi, who invented a method that Cajal used to study neuronal connections.↩︎

  2. which were first recorded by Edgar Adrian, a Nobel Prize laureate in 1932 who developed the all-or-none theory of action potentials; Hodgkin and Huxley (Hodgkin and Huxley 1952), who shared the Nobel Prize in 1963, explained the ionic mechanisms underlying the action potentials.↩︎

  3. George Wald won his Nobel Prize in 1967 by essentially elucidating this process.↩︎

  4. \(\text{absorbance} = \log(I_{\text{incident}}/I_{\text{transmitted}})\), and the fraction absorbed, i.e., \(\text{absorptance} = 1 - I_{\text{transmitted}}/I_{\text{incident}}\). Therefore, \(\text{absorptance} = 1-e^{-\text{absorbance}}\). Numerically, absorptance is approximately equal to absorbance when absorbance is low, which is the case here when using MSP to illuminate the photoreceptors.↩︎

  5. Hartline won the Nobel Prize in 1967 for this discovery.↩︎

  6. This is actually a parvocellular LGN neuron, which is directly projected from the midget RGC and shares the same receptive field with that of the midget RGC.↩︎

  7. The implicit assumption here is that once the RGC responses reach a criterion level, the pattern becomes subjectively detectable at the behavioral level.↩︎

  8. We emphasize “potentially” because while correlation is easy to establish, claiming causation requires ruling out other factors.↩︎

  9. They shared the Nobel Prize in 1981.↩︎

  10. Note that the receptive field in Figure 2.17 has an inhibitory central region and excitatory flanking areas, but the neuron in Figure 2.16 evidently has the opposite arrangement of excitatory and inhibitory regions, so the two figures do not share the same underlying data.↩︎