Hi,
Having binged a lot of the content on this forum, watched a few of the meeting recordings, and chatted with Troy a bit, I thought I would try to synthesize some core goals in a way that might clear up some common threads of “disagreement” that I’ve seen (which I think is ultimately a bit of talking around each other rather than real disagreement).
First, defining a couple of terms for posterity, as I might be misusing them or others might not know them completely:
1 - Stimulus: I think @Troy_James_Sobotka gave quite a good definition of this in another thread, which I’ll use here:
Everything based off of the CIE XYZ chromaticity diagram is essentially a stimulus specification. When dealing with the three vectors such as RGB, it’s three discrete stimuli that sum to a cumulative singular stimulus. Note that this should not be conflated with observer sensation or appearance of the stimulus, but rather the singular stimulus specified via the original colour matching experiments.
2 - Sensation: This is the set of feelings or experiences that we actually have once a given stimulus has passed through the complexity of the human visual system (HVS). As a side note, sensation is essentially what colour appearance models (CAMs) attempt to model.
A key thing to realize about these definitions is that both the input and the output of our DRT will be stimulus information. The input is an unbounded, scene-referred stimulus, while the output is a bounded, display-referred stimulus.
Building on that, to me, the goals of an ideal DRT are:
1 - Insofar as a display is capable of producing the exact stimulus requested by the ground truth, produce that stimulus.
2 - Insofar as the display is not able to produce the ground-truth stimulus, create a well-defined, consistent mapping from out-of-bounds stimulus to in-bounds (for the display device) stimulus, so that the mapping doesn’t just happen by device-dependent accident.
3 - Exploit properties of the human visual system when devising the above mapping, so that for any stimulus which falls outside display range and must therefore be mapped back into it, the sensation that the originally intended stimulus would have produced, given HVS peculiarities, is mimicked as closely as possible.
Now… this is not quite so simple, as goals 1 and 3 are in some ways diametrically opposed to each other – due to HVS peculiarities, exact mimicry of ground truth stimulus in one area of an image may, indeed, destroy any chance of replicating the sensation that an out-of-bounds stimulus intended to create in another part of an image.
So, there is an inherent tradeoff to be made between exactly replicating ground-truth stimulus and creating a sensation-based mapping such that out-of-bounds stimulus still feels correct from a sensation perspective upon final output from the DRT.
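To make goals 1 and 2 and the tradeoff a bit more concrete, here is a minimal sketch of my own, not anything proposed in the working group. It treats a single non-negative stimulus value, reproduces it exactly below a threshold, and above that compresses it along a smooth shoulder that approaches the display peak. The function name, the `threshold` and `peak` parameters, and the choice of a Reinhard-style shoulder are all assumptions made for illustration. It operates purely on stimulus and says nothing about sensation, so it does not address goal 3.

```python
def compress_to_display(x, threshold=0.8, peak=1.0):
    """Map an unbounded, non-negative scene value into [0, peak).

    Hypothetical helper for illustration only.
    Below `threshold` the value is reproduced exactly (goal 1).
    Above it, the value follows a well-defined shoulder that
    approaches `peak` asymptotically (goal 2).
    """
    if x <= threshold:
        return x
    headroom = peak - threshold   # display range reserved for compression
    excess = x - threshold        # how far the value sits above the threshold
    # Reinhard-style shoulder: slope 1 at the threshold, asymptote at peak.
    return threshold + headroom * excess / (excess + headroom)


for value in (0.25, 0.8, 1.0, 4.0, 100.0):
    print(value, "->", round(compress_to_display(value), 4))
```

The tradeoff shows up in `threshold`: raising it reproduces more of the range exactly, but leaves less headroom in which out-of-range values can remain distinguishable from one another.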
As was, I think, correctly but not rigorously identified in the past by the (now somewhat legacy) term “tone mapping”, and more recently laid out explicitly in the thread On Brightness (Fry-ZCAM / Jed-JzAzBz Image Formation), I think it can be agreed that brightness, and more specifically the relationships between brightness levels throughout the image, is of the highest importance for mimicking sensation when the stimulus cannot be replicated. This should come as no surprise, as it’s a technique that’s been used and taught in art fields for decades at the very least, and in reality for hundreds of years… ask any AD, painter, or photographer, and they’ll tell you that (at least as the terminology was taught in my program) “value structure” is the most important piece of getting an image to “look right”… what they mean by “look right” is, as I now interpret it, a simple way of saying “replicate the sensation we desire given the inability to replicate the exact stimulus which we are trying to depict.”
And, as hinted at in the above-linked thread, this is a way to justify the oft-requested “path to white” or “highlight chroma compression” which @jedsmith has done awesome work attempting to engineer… not as a creative “flair” but as a core mechanic of mimicking the desired sensation given the inability to mimic the stimulus… we’re choosing the sensation of brightness as the most important thing to mimic, and then exploiting the fact that lower chroma creates the sensation of higher brightness, in order to create the sensation of more brightness than is really available on a particular display in a well-defined way, rather than leaving it up to display-dependent chance.
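As a rough illustration of what a well-defined “path to white” could look like, here is a small sketch. To be clear, this is not @jedsmith’s implementation or anything from the linked thread: the function name, the `peak` and `power` parameters, the use of the maximum channel as the intensity norm, and the Reinhard-style compression are all assumptions I picked to keep the example short. It compresses the intensity into display range, then blends the colour toward achromatic by an amount that grows with the compressed intensity.

```python
def path_to_white(rgb, peak=1.0, power=2.0):
    """Compress a linear RGB triplet into [0, peak), desaturating highlights.

    Hypothetical sketch for illustration only.
    """
    # Assumption: negative components are clamped before processing.
    r, g, b = (max(c, 0.0) for c in rgb)
    norm = max(r, g, b)              # intensity norm: the largest channel
    if norm == 0.0:
        return (0.0, 0.0, 0.0)
    # Compress the unbounded norm into the display range [0, peak).
    norm_out = peak * norm / (norm + peak)
    # Blend weight: near 0 in the shadows, approaches 1 near display peak.
    t = (norm_out / peak) ** power
    # Rescale each channel to the compressed norm, then blend toward grey.
    return tuple(
        (c / norm * norm_out) * (1.0 - t) + norm_out * t for c in (r, g, b)
    )


print(path_to_white((0.0, 0.0, 0.1)))    # dim blue stays strongly saturated
print(path_to_white((0.0, 0.0, 100.0)))  # very bright blue ends up near white
```

Because the blend only ever raises the smaller channels toward the largest one, the output stays inside display range for any input, so the desaturation is a defined behaviour of the transform rather than a side effect of per-channel clipping on a particular device.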
Now, as for exactly how to do this… I don’t have the solution x( It’s a hard problem, in my view one of the hardest given that it so directly straddles the line between stimulus and sensation – usually, you’d want any operation you apply to stay strictly on one side or the other of that barrier. But, the goals outlined above necessitate doing both at the same time, or at the very least blending between them.
I do think the CAM idea is a very interesting path for this, as in many ways it’s exactly the purpose of a CAM, as was outlined by @Alexander_Forsythe… to take an input stimulus, extract the sensations it evokes, and retarget those sensations into a replacement stimulus that mimics the sensation as closely as possible.