Foundations of Visual Computing

Author

Yuhao Zhu

Published

July 12, 2025

Preface

This book attempts to present a unified view of visual computing, looking at it as a series of signal transductions across different domains — optical, analog, digital, and semantic — along with the processing that happens within each. Any sufficiently complex visual computing system worth studying will likely involve both transductions and processing in all of these domains.

Take Augmented Reality glasses as an example. The input signals — light — are in the optical domain. These first need to be converted into electrical signals by an image sensor so that a computer system can process them to extract semantic information — say, the orientation of a table in a room. The system then simulates light transport to generate photorealistic, context-appropriate virtual objects — perhaps a mug correctly oriented on that very table. Finally, these virtual objects, now in the form of pixels, must be converted from electrical signals back into optical signals by the display.

But wait, we are still not done! The light emitted by the display enters our eyes, where photoreceptors in the retina convert the optical signals back into electrical ones. The retina and, further downstream, the brain process these signals, eventually giving rise to our perception and cognition: we see a virtual mug sitting naturally on a real table.

Why This Book

Visual computing is a wonderfully broad field, touching everything from the sciences of human vision to the engineering of sensors, optics, displays, and rendering. The main motivation for writing this book grew out of a simple observation: while each of these areas is well covered in excellent texts, they are rarely explored together in a single, coherent story.

Why bring them together? If you are an engineer, the ability to move comfortably “across the stack” lets you see the whole system at once, revealing connections and optimizations that are not obvious when working in isolation. For the scientifically curious, there is joy in understanding the fundamentals even without immediate application. As Edwin H. Land once put it, the true application of science is that “we finally know what we are doing.”

Every so often I ask myself: why write this book at all, when none of the material is new and so many fine resources already exist? Then I remind myself that most textbooks contain no new results; by the time knowledge reaches a textbook, it is already established. What a book offers is perspective, and perspective always reveals the idiosyncrasies of its author.

The perspective I most want to convey is this: every conclusion rests on a set of precisely defined terms and assumptions, which, unfortunately, are not always stated. By taking a first-principles approach, I hope to make those definitions and assumptions explicit, so it is always clear when each conclusion applies and when it does not.

What We Do Not Cover

We will not attempt to be comprehensive — no one understands everything, and no one needs to understand everything to get started. Our aim is to take a first-principles approach, giving you a solid foundation and the confidence to learn new concepts as they arise.

We will not cover computer vision — a field central to, and arguably the best-understood part of, visual computing. There are simply too many excellent texts already.

In human vision, our focus will be on the early visual system, particularly the retina. This is where the first steps of seeing occur, setting the limits of our vision and representing the most thoroughly studied part of the visual system.

In imaging and display, we review basic principles (optical-electrical signal transductions) while being intentionally light on implementation details — these evolve quickly and are often proprietary.

In rendering, we focus on the classical computational models of light-matter interactions while setting aside implementation issues such as modern graphics pipelines (e.g., programming in Vulkan) and GPU hardware. We will not cover the emerging (neural) rendering paradigm in detail, but we will explore its connections to the classical theories.

Other Books

This is necessarily an incomplete list, because I can mention only books that I find myself revisiting from time to time.

For human vision, the monumental book by Wandell (1995) is a must-read; it is freely available, and I am told a second edition is in the works. Cornsweet (1970), while slightly dated, is an all-time classic that has inspired a whole generation of vision scientists. Rodieck (1998) is a breathtaking walk-through of the early stages of vision (eye optics and retinal processing), where every step is described operationally rather than conceptually; in the end, you get the feeling that there is no magic. What’s most powerful about these books is that none of them relies on complicated equations — they focus on building intuitions and basic principles. Goldstein and Brockmole (2017) and Yantis and Abrams (2017) are popular introductory texts on human perception (vision and beyond).

For rendering, Pharr, Jakob, and Humphreys (2023) is the gold-standard reference, and Dorsey, Rushmeier, and Sillion (2010) provides comprehensive coverage of light-matter interactions in the graphics context. Glassner (1995) is a bit dated but a classic. I would also highly recommend Johnsen (2012); it approaches light-matter interactions through the birth (emission), life (scattering), and death (absorption) of a photon. The book is written in a biological context, but the intuitions it builds are general. For a technical treatment of light-matter interactions, read Bohren and Clothiaux (2006); it is written for atmospheric scientists but, again, the principles apply broadly. Of course, for a quantum treatment of light-matter interactions, I’d be remiss not to mention Feynman (1985).

There are comparatively few texts on imaging. The one I particularly like is Rowlands (2020). Trussell and Vrhel (2008) and Nakamura (2006) are two excellent references: the former focuses more on the principles (of both imaging and display), while the latter is heavier on the hardware implementation of image sensors.