Debating CAMs

Not to nitpick, but this is false as best as I have come to understand things. No tristimulus model can ever be a “Colour Appearance Model”, let alone settle whether such a model is what should drive the picture formation process.

Visual cognition appears to leverage fields, and as such, number fornication should be called out for what it is. It will not hold up in other spatiotemporal fields, as is trivially demonstrable.

Every single three channel “Colour Appearance Model” is nothing short of misinformation.


Tomorrow is meeting #100.

Does anyone feel this CAM prototype is too complex for what it does? Have we cornered ourselves in some kind of dead end?

I am just worried by the amount of hacks and layers on top of each other.



I’m quite far from understanding the technical details in the CAM’s development progress but I follow the meetings with great interest. Personally I do feel that a lot of tuning has been done since the decision to use it in order to
1: achieve a pleasing baseline color appearance without technical errors.
2: match SDR/HDR appearance
3: have an as perfect as possible invertible DRT.
And from what I can tell and see none of these points have been fully satisfied yet.

All I’m wondering is how trivial it would be to roughly re-evaluate ZCAM for these same goals and see how quickly the ‘knobs can be tweaked’ to get to the desired result versus the current prototype. Using another model would feel like a big set back but if the same progress can be reached as that of the current one in a timely manner it could be a win?

In the initial 3 candidates testing my personal preference was already biased to ZCAM. I noticed that during development some of the appearance in the current model was tweaked to match that, or at least be similar. If such things are already part of a model perhaps performance and control is potentially in a better place?


There’s really not that much to disagree with here.

The author in the linked paper explicitly tries to point out errors or incorrect assumptions made when using XYZ for color appearance/difference models.

The line you quoted was just my attempt to paraphrase the ZCAM authors’ intention for that particular function.

I probably could have worded it a little more strongly to say “it addresses the inherent limits of CAMs (and XYZ) by enabling the transform to adjust the inevitable trade off between perceptual uniformity and hue linearity”.

As I mentioned in my post, the equation might just be an extra lever/hack to adjust the blues, it is by no means a solution.

I wouldn’t have even pointed out that particular equation if Pekka hadn’t commented on the ZCAM handling of Blue and if he wasn’t already adjusting his custom primaries by hand.

In my prototype repo are LUTs (and DCTL) for the version v034 with the new compress mode, and with the stock Hellwig/CAM16 primaries. It’s the version described in the very first post in this thread. These are for those that want to see what difference the stock primaries make in images, and of course how the different compress mode described in the first post behaves. In other respects it’s identical to v034.


My main concern is about this issue with magenta-blue gradient that is still present in v35 and v34.
It’s a little bit better with ACES2 CAM DRT v034 Rec709 new compress mode.dctl, but still would require manual fixing per clip in a real project.

I probably wasn’t terribly clear. There’s no “trade off”. It’s an impossible nut to crack as not a single person on the planet has even the slightest idea as to how visual cognition works.

Following the tremendous amount of research given by the likes of Adelson, Grossberg, Shapiro, Kingdom, Gilchrist, and a couple of dozen other incredibly important names, no model that uses a discrete sampling approach can even remotely behave as a “colour appearance model”. It’s pure nonsense. (The demonstrations above will range from compelling to meh depending on field of view dimensions, adjacent field contribution, etc.)

These aren’t “edge cases”, nor “illusions”, nor anything more than isolating how visual fields appear to play critical roles in the formulation of colour cognition psychologically.

Try any “CAM” on even the most basic “achromatic” values example as per the Adelson Snake. They will all fail spectacularly because, lo and behold, not a person on the planet understands visual cognition. What are they affording this project, because again, a discrete sampling of tristimulus approach will never and can never be a colour appearance model? Where did this orthodoxy manifest?

And I wish I didn’t have to spend a breath of time on colour appearance models, because they have become the subject, when the subject should be forming pictures from abstract, colourimetrically fit data.

Rewinding to creative film from this subject for a moment…

Creative colour and black and white film was, and remains, the apex predator of photographic picture formation.

Creative film wasn’t a colour appearance model.

Where did this whole idea to use a “colour appearance model” even come up in the first place? Surely someone can trace the wisdom of starting with the overly complex numerical fornicating of ZCAM back to a patient zero? Who was this person? I swear I’ve been around these parts and I have no idea where this started.

There seems to be some goofy idea that the notion of “What is a picture?” has long since been “resolved” as a matter of stimulus conveyance. This is also pure nonsense, and the heavens know that countless tomes have been written on attempts to reconcile how pictures work in terms of mechanics. From Gombrich, to Peacocke, to Millikan, to Gibson, to Caron-Pargue, to name but less than a sliver of hundreds of minds.

To this end, we could enumerate the importance of picture-formation and picture authorship in two broad strokes categories:

  1. Make pictures that don’t look like complete ass.
  2. A protocol that affords picture authors full authorship control.

On the subject of 1., again, creative film has been the apex predator here, and it was not a “colour appearance model”.

Specifically, the method of chromatic attenuation and amplification afforded by dye layer channel mechanics, or even the adjacent per channel mechanics, have not been enumerated in the totality of this effort as best as I can tell. Those rates of change along the amplification / attenuation of purities are absolutely essential to bridge between “looks hideous” to “looks acceptable”.

On note 2., does anyone actually believe that creative control is being afforded? When folks open up their ACES project and see five different pictures depending on output medium, I’m 95% confident that there isn’t a single author on the face of the planet with cognitive faculty who says “Yep… this is working!”

I want to restate loud and clear that this whole “Colour Appearance Model” approach has failed to address a critical point of protocol, one present in the earliest historical manifestations of “the film colourist”. As many people familiar with the history of cinema know, the earliest film colourists, often women, would use dyes or paints to tint the formed pictures present within creative film, oftentimes with stark depths of purity. It should be viewed as completely ironic that the protocol currently outlined in ACES, and other protocols, prohibit the very practice that marked the first appearance of colourist work as we know it. This is mechanics. No one tried this I guess?

If there is a joke in here, that has to be a glaring punch line. Even the protocol is fundamentally… curious. There’s no attempt to localize “Where is the picture I am looking at?” so that further visual manipulations can be applied, but I guess that’s another great “That’s a different VWG” problem like we saw with the “Gamut” nonsense…

So even if everyone agrees that because visual cognition of pictures is complex, and that therefore authorship affordances are of utmost importance, the insistence on this “Colour Appearance Model” protocol, or the reductio-ad-absurdism of “Take derp values to derp display” has hampered even the most basic investigations as to what the apex predator did (or even the earliest per channel additive models) in terms of “effective” versus “ineffective” picture formation.

Honestly, I would bet that most folks who have been involved in this process in some capacity can realize that these nonsense “CAMs” (hello fault line patient zero ZCAM nonsense) are not the fundamental mechanics of what is delivering anything remotely “acceptable” in the formed pictures.

The clip clamps on the working data pre-picture formation, and the per channel mechanic, are.

That is it. Nothing more. The mechanics of the per channel are creating wholly new measurable values out of the “input tristimulus”. That’s forming the picture, and that picture, the thing we are all looking at, is formed out of the most basic of mechanics that has nothing to do with putting on a black robe and worshipping at the foot of what amounts to a per channel model, for no good reason.

Could a basic clip clamp be “improved”? No, because the surface of these problems has been so obfuscated with absolute nonsense that no one can even formulate in words what the precise “problem” is.

Ill defined problems beget ill defined non-solutions.

(In fairness, I still don’t have a shred of a clue what “tone” is, so I’m a little behind. Apologies.)


I still think the title of this thread is important and that particular step of the current prototypes is a critical step, and I hope to have something short and useful to share on that.

Unfortunately there may be no turning back from the direction this thread has taken, apologies if I have contributed to that, but maybe it is still useful and the occasional tendency to divert into deeper conceptual matters on this forum is probably overdue. Skip to the last few lines of this post to avoid this diversion.

It can often be helpful to (re)define useful terminology (yes, even “tone”) and clarify expectations to make sure there is not too much talking past each other.

Keep in mind the original post I made referred to the function of a particular line of code that might or might not be useful. In general the “trade offs” I would refer to are purely practical: there is no perfect solution due to the limits of technology or understanding, as is definitely the case with the human visual system, yet something needs to be delivered!

These are the kind of choices/trade offs that members of this forum make every day. Lighter/darker, warmer/cooler, pinker/greener, etc… Often no extra technology or information makes these choices/trade offs easier.

Most CAMs are not expected to model all the spatial temporal phenomena and complexity of the human visual system.

CAMs often do a decent job at the task they were designed for: making statistically accurate predictions of observer-reported color appearance/description from specific data set(s) under very specific conditions.

This is not the goal of the ACES DRT, nor should it be.

The task that most on this forum have is to make pleasing pictures, and the ACES DRT should help them achieve that goal.

These tasks obviously have a great deal in common, but the end goals are not the same and I don’t really think anyone reading this needs reminding of that.

The hope is that tools/ideas from one (CAMs) can be applied to the other (DRT), which is very likely/somewhat already proven.

It is useful for Troy to reference the development of film and what could now be referred to as “Device Color”

Probably most of the critical development in the history of Color imaging came from innovation in dyes, Colorants, phosphors, substrates, sensors, and the chemical or electrical means to control their application.

One could argue that Color models (like CIE XYZ ) have mostly been useful to evaluate and compare the output from “Device Color” rather than a direct means of producing/controlling the output.

This is of course no longer the case, so we should probably pay a great deal of attention to details/limits of these models if they are directly affecting the look of our pictures, and be ready to hack them or scrap them and build new ones if that is what is needed.

It is probably good to keep in mind that CIE XYZ or other tri-Color colorimetric measures are still a very useful and accurate measure of “stimulus”.

If the light source and colorant/filters are known, then XYZ coordinates are sufficient to exactly reproduce that same stimulus with the appropriate colorant or filters, even if we do not have a model that describes the appearance of that stimulus under all conditions.
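
As a rough illustration of the point above, here is a minimal Python sketch that solves for the display weights reproducing a given XYZ stimulus. It assumes an ideal additive display whose primaries are the published linear sRGB / Rec.709 (D65) values; the helper names are mine and purely illustrative. It only matches the stimulus and says nothing about how that stimulus appears.

```python
# Linear RGB -> CIE XYZ for sRGB / Rec.709 primaries with a D65 white.
# Assumption: an ideal additive display with exactly these primaries.
RGB_TO_XYZ = [
    [0.4124564, 0.3575761, 0.1804375],
    [0.2126729, 0.7151522, 0.0721750],
    [0.0193339, 0.1191920, 0.9503041],
]


def mat_vec(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return tuple(sum(m[r][c] * v[c] for c in range(3)) for r in range(3))


def inverse_3x3(m):
    """Invert a 3x3 matrix via the adjugate (fine for well-conditioned cases)."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]


XYZ_TO_RGB = inverse_3x3(RGB_TO_XYZ)


def weights_for_stimulus(xyz):
    """Linear primary weights that reproduce the XYZ stimulus on this display."""
    return mat_vec(XYZ_TO_RGB, xyz)


# Round trip: build a stimulus from known weights, then recover the weights.
stimulus = mat_vec(RGB_TO_XYZ, (0.2, 0.5, 0.8))
print(weights_for_stimulus(stimulus))
```

A negative weight in the result would mean the stimulus cannot be physically produced by that particular set of primaries.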

I will reply in further detail to some of the points/examples Troy has demonstrated, and some thoughts on CAMs and DRTs that have been raised by others (maybe in a new thread?).

A quick (and intentionally provocative) summary of this post, as a reply to/expansion of some of Troy’s comments in the thread (with which I mostly, but not totally, agree):

We don’t absolutely need to know how the human visual system works.

We only really need to know how to make pictures work!

(and maybe a little about the devices that we use to produce them)

What tools do we have to achieve this and what tools still need to be built?

A more productive summary that fits with the original intent of this thread would be:

WTF is going on with the shadows and the blue light on the pool table in blue bar!

How did it look? How should it look? How could it look? and what tools are available in the ACES DRT to control/modify the look?



Well put!

The main appeal for using a CAM should be the ability to change the viewing conditions and have an image that appears roughly similar. Because they have good perceptual uniformity, using them to control chroma / saturation whilst preserving hue linearity seems sensible when this was one of the VWG requirements!


Without the articulation of spatiotemporal fields as the basis, these things are pure nonsense. Hence why for the past forty years everyone has relied on basic power function discrepancies between encoding and output. For example, Determination of the Perceived Contrast Compensation Ratio for a Wide Range of Surround Luminance by Ye Seul Baek, Hong-suk Kim, and Seung-ok Park.

Again, if there is a consistent ground truth to visual cognition, it is fields based, not discrete samples. Even constant spectra is variegated, with vision research left with unknowns.

Good Enough is a power function until spatiotemporal articulation forms the basis.
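
To make concrete what a power function discrepancy amounts to, here is a minimal Python sketch. The exponent of 1.2 is a placeholder chosen for illustration and is not a value taken from the paper cited above; the function name is hypothetical.

```python
def surround_compensate(value, exponent):
    """Apply a plain power function to a normalised [0, 1] display-linear value."""
    return max(value, 0.0) ** exponent


# A five-step ramp from black to peak white.
ramp = [i / 4 for i in range(5)]

# Placeholder exponent: values above 1.0 darken the mid-range and raise contrast.
compensated = [surround_compensate(v, 1.2) for v in ramp]

for before, after in zip(ramp, compensated):
    print(round(before, 3), round(after, 3))
```

Black and peak white stay put while everything in between is pulled down, which is the whole of the mechanism: one exponent, no spatial or temporal terms.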

Regardless, these absurdist CAMs as picture formation chains are already showing how janky they are. After all, they are disconnected from appearance, and are nothing more than gravely distorted per channel systems. So why not cut out all the jank.


I barely understood 15-20% of what you’ve said, because I’m not a color science engineer.

So, in short, your proposal is to use a common per-channel tonemapping and to deal with different hue skews in SDR vs HDR manually?

Curious to see more about what could be a better solution instead of the current candidates in your opinion. Or you mean that it needs more research and for now it’s impossible to build anything?

Regarding even just the spatial part: it can’t be baked into a LUT and loaded into a field monitor on set. I think this dictates per-pixel as a requirement more than anything else. I can’t see how productions could be adapted to spatial “LUTs” in even the next 5-10 years.
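
The difference between a per-pixel transform and a spatial one can be shown with a small Python sketch. The `spatial` operator below is a made-up local-mean normalisation, used only as a stand-in for any field-dependent process; it is not a proposal from this thread.

```python
def per_pixel(row, curve):
    """LUT-like transform: each output depends only on its own input value."""
    return [curve(v) for v in row]


def spatial(row, radius=1):
    """Hypothetical field-dependent operator: value divided by its local mean."""
    out = []
    for i, v in enumerate(row):
        window = row[max(0, i - radius): i + radius + 1]
        out.append(v / (sum(window) / len(window)))
    return out


# Two identical 0.5 samples (index 1 and 4) sit in dark and bright surrounds.
row = [0.1, 0.5, 0.1, 0.9, 0.5, 0.9]

lut_like = per_pixel(row, lambda v: v ** 0.5)
field_based = spatial(row)

print(lut_like[1], lut_like[4])        # same value in, same value out
print(field_based[1], field_based[4])  # same value in, different values out
```

Because the second result depends on the neighbours, no table indexed by a single pixel value can reproduce it, which is why it cannot be baked into a LUT.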


Now this is the sort of cutting humour I can get behind! Respect. “Somewhat”, “likely”, and “proven” are doing more work than Conan did behind that massive wheel!

This is a perplexing question to me.

If you and I went out and shot a creative colour film, and ended up at a print, would we be confused with how to translate the singular measured stimulus of the picture formed in the print to an additive medium? There’s one picture in this case, and it is not the “stimulus” in front of the camera. Starry Night is one picture. The Mona Lisa is one picture. Ansel Adams printed one picture.

While I would certainly want to account for folks wanting to completely rephotograph the film, I can’t believe for a moment that you or I would want to shoot multiple pictures? Are we forming pictures or selling subscriptions or televisions?

If no one can explain where the picture is, as a data state, we have a problem. (This also is precisely why ACES is currently a failure, as an aside.)

I am suggesting that the mechanic “working” as a picture formation mechanism in the completely bogus CAMs are rather clear:

  1. The harder legwork of shaping the data prior to picture formation. Fry, Pekka, etc. have done a significant amount of legwork on shaping tristimulus data pre picture formation.
  2. The per channel mechanic, the mechanic that underlies the forming and shaping of the reduction in purity that, for whatever mystical reason, is mandatory in picture formation, is completely unrelated to “appearance” modelling. The “CAM” is just a per channel system. Make Occam’s Carpaccio out of it, and the odds of it being coaxed to behave go up by an order of magnitude.

Given that control over the mechanics was gained by dumping the ridiculous ZCAM and moving to the more reduced Hunt inspired model, does it not follow that stripping the Hunt model down to the simple per channel would also afford more control? Seems logical to cut it down to the parts required, which is nothing more than a per channel mechanic as best as can be seen?

The point was not that none of the “appearance” models function as “appearance models”, which of course they do not. This is not the essential point.

The essential point is that forming pictures based on an “appearance model” is the fault line nonsense. No “appearance” model even affords an explanation!

Creative colour film was not a CAM. Despite this, to this day, chemical colour film still thrashes every attempt at forming pictures that have existed since the advent of electronic sensors. We should have learned from it, instead of making offerings to false CAM gods.


Thank you for your answer!
To be honest, and I apologize for being stupid again, I’m not sure I understand what you mean. It all became too complicated for me. But I’m still interested to see where these debates about using CAM in ACES are going to bring.

I have a non-scientific feeling that a DRT that gives us nice pictures with strictly no artifacts (and invertible!) is way more important than preserving true colors of the original scene (even if this would be possible).
I hope this is at least somehow relevant to what you all are discussing here. Because I’m completely lost. Maybe I better just shut up and watch until I understand at least a half of the discussion.
Thanks again for your time!


This feels like a brain worm. Don’t get me wrong, I can appreciate and understand the desire to have something that could be hacked to backproject, but perhaps energy should be expended on specific tooling to achieve this facet.

Creative colour film pictures are not invertible. Black and white film pictures are not invertible. X-Ray pictures are not invertible.

The idea that a picture can be “inverted” at all is part and parcel of a larger and suffocating conceptual framework. If I said “invert Starry Night” you’d bust out laughing at me, yet folks utter it with a straight face. Again, I’m not diminishing the desirability to be able to take a fully formed picture and hack / approximate an open domain tristimulus representation from it, but rather as a requirement over the ultimate goal, it is a fool’s errand of ahistorical nonsense.

Note that the idea of using some mythical “appearance” model goes very far back in history. In fact, Judd, Plaza, and Balcom made this very mistake in their 1950 Condon report^1; the seductive assumption that a picture is “ideal” when “reproduction” of the stimulus is achieved. They apparently made the mistake of not doing their history and reading the work of Jones^234, who, despite being focused on the black and white formation of pictures, provided hints that the depth of picture-texts was more than a simulacrum of “standing there”. MacAdam smashed that idea to bits in 1951, as has been cited at this forum several times^5.

It may be interesting to note that Bartleson^678, whilst employed by Kodak, was clearly leaning away from a “conveyance of stimulus” in colour in pictures, researching memory colour trends.

At any rate, there is a massive history of attempts at understanding picture-texts that goes back over a century, with an even longer history if we look at psychology and philosophy.

All of that disappeared in a puff of smoke when the number fornicators of electrical sensors took over, as depressing as that is.

  1. Senate Advisory Committee on Color Television, E. U. Condon, Chairman, "Present Status of Color Television”, Judd, Plaza, Balcom, 1950.
  2. Psychophysics and Photography, Jones, 1944.
  3. Photographic Reproduction of Tone, Jones, 1921.
  4. On the Theory of Tone Reproduction, with a Graphic Method for the Solution of Problems, Jones, 1921.
  5. Quality of Color Reproduction, 1951.
  6. Memory Colors of Familiar Objects, Bartleson, 1960.
  7. Some observations on the reproduction of flesh colors, Bartleson, 1959.
  8. Color in memory in relation to photographic reproduction, Bartleson, 1961.

PhotoCD would beg to differ

Last time I looked into that vault, it was about the archival of already-formed pictures from film. Even then, I’m sceptical there’s any “inversion back to stimulus of the stuff in front of the camera”. The broader error is introduced when we skip over the critical stage of actually forming the picture via the quantal catches from an electronic sensor.

PhotoCD unbuilds densities to the exposures that created those densities.

As far as the CAM DRT goes, to me the “CAM” part is unimportant. What we’re mainly using from the model is the “perceptual” space, values for lightness and colorfulness, and that’s about it. Pretty much any other perceptual space should work also. We’re not even using the viewing conditions from the model (to change viewing conditions).

This is an interesting question. Is it more control? With the current DRT we’ve had to engineer every part, including the path-to-white (aka. chroma compression), which affords an enormous amount of control. It comes at the cost of that engineering work and a more complicated transform.

The hacky bit, IMO, is the compression (the compress mode) we have to do to avoid the model breaking apart because of those way out there values that would go negative in the perceptual space.

The inversion requirement to AP1 is tough. We want forward direction even in Rec.709 to handle those way way out there values but then the cube has to invert to (only) AP1. I think forward direction suffers as a result.

Densities would indeed be a formed picture. Dye densities are from the creative film, and correlate on the scale of maximal to minimal densities, subject to the aforementioned interactions that place the resulting purities on an amplification / attenuation continuum.

Still… a formed picture, where the densities of dyes now form the stimulus of the picture, not the stimulus of “light transport” happening in front of the camera.

It should hopefully be clear from the examples above, as well as a hundred more that could be supplied, that this is not the case. No such magical “lightness” scale has been derived, ever. Further, even the basic terminology of “brightness” versus “lightness” is contested territory in visual cognition research^123. The point to be stressed: what folks think the metrics align with is not how visual cognition works.

Just look at the Adelson Snake demonstration. Articulated fields are everything. (Even Bartleson notes this!)

I’m a believer it is. It’s up to someone to trial it.

I don’t think anyone engineered the attenuation of purity. That’s a byproduct of applying a curve to three channels with numerical values in the complements. If you trace back to some of the earliest demonstrations that avoid the per channel mechanic, you’ll see that they grab onto the complement remainders, and gradually add them back.

Ergo, there’s no engineering of the actual attenuation rate, and this also happens to be what is currently exploding the “blues”, as well as the rather odd rates of change on the flesh.
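
For anyone following along without the maths, the byproduct described above can be seen in a minimal Python sketch. The curve is a plain x / (1 + x), chosen only for brevity and not the curve used in any of the candidates; `purity` is a crude hypothetical proxy, (max - min) / max, and not a colourimetric measure.

```python
def tone_curve(x):
    """Simple compressive curve, x / (1 + x). Illustrative only."""
    return x / (1.0 + x)


def per_channel(rgb):
    """Apply the same curve independently to each of the three channels."""
    return tuple(tone_curve(c) for c in rgb)


def purity(rgb):
    """Crude purity proxy: (max - min) / max. Hypothetical metric."""
    hi, lo = max(rgb), min(rgb)
    return (hi - lo) / hi if hi > 0 else 0.0


# The same channel ratios at increasing exposure, two stops at a time.
base = (1.0, 0.5, 0.25)
results = []
for stops in range(0, 7, 2):
    scaled = tuple(c * 2.0 ** stops for c in base)
    results.append((stops, purity(scaled), purity(per_channel(scaled))))

for stops, purity_in, purity_out in results:
    print(stops, round(purity_in, 3), round(purity_out, 3))
```

The input purity never changes, yet the output purity falls as exposure rises. Nothing in the code chooses that rate; it drops out of the curve shape and the channel ratios.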

Further still, the control of the chromaticity flights is incredibly challenging when dealing with the per-channel mechanic. If one seeks to assert greater control, it would require gutting the model down to the essence of what it is: a per channel model.

It should be noted that this is actually where the “acceptable picture” aspect comes from. In fact, if one were to trial a simple random set of “primaries” in such a system, and clip at a small value above zero, there would be virtually no gains exhibited between the clip slightly above zero mechanic and the added layers of math that the shibboleth has going on. When values are negative they are beyond the domain. Zero is simultaneously problematic in a per channel mechanic. It should be trivial to showcase that the above holds true in terms of “acceptable pictures”.
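
The trial described above could be sketched roughly as follows in Python. Everything here is an assumption for illustration: the perturbed identity matrix standing in for a random set of “primaries”, the floor of 1e-4, and the x / (1 + x) curve. None of it is taken from any candidate DRT.

```python
import random


def mat_vec(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return tuple(sum(m[r][c] * v[c] for c in range(3)) for r in range(3))


def random_primaries(seed, spread=0.2):
    """Hypothetical stand-in: a perturbed identity matrix.

    Each row is normalised to sum to 1 so equal-channel input stays equal.
    """
    rng = random.Random(seed)
    rows = []
    for r in range(3):
        row = [(1.0 if r == c else 0.0) + rng.uniform(-spread, spread)
               for c in range(3)]
        total = sum(row)
        rows.append([x / total for x in row])
    return rows


def form_picture(rgb, matrix, floor=1e-4):
    """Move to the working primaries, clip just above zero, curve per channel."""
    working = mat_vec(matrix, rgb)
    clipped = tuple(max(c, floor) for c in working)
    return tuple(c / (1.0 + c) for c in clipped)


matrix = random_primaries(seed=1)
print(form_picture((0.18, 0.18, 0.18), matrix))  # equal-channel input
print(form_picture((-0.5, 0.2, 4.0), matrix))    # out-of-domain input
```

Comparing pictures from something this small against the full chain would be one way to test the claim; the sketch only shows how little machinery the trial needs.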

Why this cannot be considered unto itself as a separate design problem, complete with a more well defined design solution, is beyond me. Creative film is not invertible back to the stimulus that interacted with the formation. Creative black and white film is not invertible back to the stimulus that interacted with the formation. Just about every single picture that exists in human history is not invertible back to the stimulus that interacted with the formation. Notions of “inversion” are in fact a hack for specific challenges, and as such, probably deserve a specific attention to solve?

  1. Articulation effects in lightness: Historical background and theoretical implications, Gilchrist, Annan, 2002.
  2. The Dynamic Range of Human Lightness Perception, Radonjic, Allred, Gilchrist, Brainard, 2011.
  3. Layer and framework theories of lightness, Soranzo, Gilchrist, 2019.

Wow, that escalated quickly…

:person_raising_hand: Guilty here. But:
One of the most prominent appearances of colour appearances in literature close to our industry can be attributed to R.W.E Hunt. He got a whole chapter about that subject in his book ‘The Reproduction of Colour’ (PART SIX EVALUATING COLOUR APPEARANCE).
Also, he names chromatic adaptation as a form of colour appearance. He puts it as the basis of creative colour reproduction (also in film). He is referring to CAMs in his definition of preferred colour reproduction in his taxonomy of colour reproduction:

  • Colorimetric colour reproduction
    • Exact colour reproduction
    • Equivalent colour reproduction
    • Colorimetric colour reproduction as a practical criterion
    • Corresponding colour reproduction
    • Preferred colour reproduction

There is a considerable body of evidence that for Caucasian skin colour the above concepts must be supplemented to allow for the fact that a sun-tanned appearance is generally preferred to average real skin colour (MacAdam, 1951; Bartleson and Bray, 1962). There may also be other colours where similar considerations apply: for instance, blue sky and blue water are usually preferred in real life to grey sky and grey water; colour films can have some sensitivity to ultra-violet radiation and hence tend to increase the blueness of sky and water relative to the saturation of the other reproduced colours, but such a tendency, if not overdone, may well be preferred to a more consistent reproduction. It may also be desirable to introduce other distortions of colour rendering to create mood or atmosphere in a picture. These factors may be very important in practice, but it is felt that the concepts of spectral, colorimetric, exact, equivalent, and corresponding colour reproduction, provide a framework that is a necessary preliminary to any discussion of deliberate distortions of colour reproduction. In this context, preferred colour reproduction is defined as reproduction in which the colours depart from equality of appearance to those in the original, either absolutely or relative to white, in order to give a more pleasing result to the viewer.
(R.W.E Hunt)

Emphasis on depart from equality of appearance.
I tend to agree with the overall conceptual framework.

But I must agree with Troy that we should not treat CAMs as a godsend because, as Troy pointed out, there is no per-pixel solution. I also don’t think that we actually have a problem. We do not need to ‘pre-perceive’ the image for the viewer.
We just need a transform which allows us to do our work with ‘a bit of abstraction’ to the final explicit display technology (try to choose my words carefully here), so we can modify it easily to be reproduced on displays with slightly different capabilities.
Especially the CAMs derived with no requirements for image rendering should be taken with an extra portion of grain of salt. They were never intended to be used in the context we are using them for.


It is naive to think we can calculate lightness or anything remotely close to that without a big ‘scene understanding’ model in the background. And we are years away from tackling that (but things are progressing quickly lately). The best summary I know is from Paradiso ‘Illuminating the dark corners’. It is just three pages.

Brightness and Luminance are useless or even misleading in my eyes.
At best, we can calculate a proxy of achromatic. But then it needs to be simple and robust and not suggest some arbitrary weights up to the sixth decimal place resulting from fitting exercises. That is nonsense in my eyes. Also, complex models that make our lives harder are not helping us in the end.
That all applies to all other colour-appearance scales.


I agree 100% with Troy. The desire for invertibility comes from those who want to avoid using the process in the first place. It is a naive wish, chasing rainbows, a daydream of an unrealistic utopia. It is seductive and leads to the wrong working paradigm and bad habits.

We can use some considerations from CAMs to guide us, but if we see that we are adding duct tape on top of duct tape to fix something somewhere, we went too far.


I cannot resist restating that a good DRT is one you can swap for another. :slight_smile: