Debating CAMs

Wow, that escalated quickly…

:person_raising_hand: Guilty here. But:
One of the most prominent treatments of colour appearance in literature close to our industry can be attributed to R.W.G. Hunt. He devotes a whole part of his book ‘The Reproduction of Colour’ to the subject (PART SIX: EVALUATING COLOUR APPEARANCE).
He also names chromatic adaptation as a form of colour appearance and puts it at the basis of creative colour reproduction (in film too). He refers to CAMs in his definition of preferred colour reproduction within his taxonomy of colour reproduction:

  • Spectral colour reproduction
  • Colorimetric colour reproduction
  • Exact colour reproduction
  • Equivalent colour reproduction
  • Colorimetric colour reproduction as a practical criterion
  • Corresponding colour reproduction
  • Preferred colour reproduction

There is a considerable body of evidence that for Caucasian skin colour the above concepts must be supplemented to allow for the fact that a sun-tanned appearance is generally preferred to average real skin colour (MacAdam, 1951; Bartleson and Bray, 1962). There may also be other colours where similar considerations apply: for instance, blue sky and blue water are usually preferred in real life to grey sky and grey water; colour films can have some sensitivity to ultra-violet radiation and hence tend to increase the blueness of sky and water relative to the saturation of the other reproduced colours, but such a tendency, if not overdone, may well be preferred to a more consistent reproduction. It may also be desirable to introduce other distortions of colour rendering to create mood or atmosphere in a picture. These factors may be very important in practice, but it is felt that the concepts of spectral, colorimetric, exact, equivalent, and corresponding colour reproduction, provide a framework that is a necessary preliminary to any discussion of deliberate distortions of colour reproduction. In this context, preferred colour reproduction is defined as reproduction in which the colours depart from equality of appearance to those in the original, either absolutely or relative to white, in order to give a more pleasing result to the viewer.
(R.W.G. Hunt)

Emphasis on “depart from equality of appearance”.
I tend to agree with the overall conceptual framework.

But I must agree with Troy that we should not treat CAMs as a godsend because, as he pointed out, there is no per-pixel solution. I also don’t think that we actually have a problem: we do not need to ‘pre-perceive’ the image for the viewer.
We just need a transform which allows us to do our work with ‘a bit of abstraction’ from the final explicit display technology (I am trying to choose my words carefully here), so we can modify it easily to reproduce on displays with slightly different capabilities.
The CAMs derived with no image rendering requirements in mind, especially, should be taken with an extra grain of salt. They were never intended to be used in the context we are using them in.

Lightness

It is naive to think we can calculate lightness, or anything remotely close to it, without a big ‘scene understanding’ model in the background. And we are years away from tackling that (though things have been progressing quickly lately). The best summary I know is Paradiso’s ‘Illuminating the dark corners’. It is just three pages.
https://www.cell.com/action/showPdf?pii=S0960-9822(99)00249-3

Brightness and luminance are useless or even misleading in my eyes.
At best, we can calculate a proxy for the achromatic. But then it needs to be simple and robust, and not suggest some arbitrary weights given to the sixth decimal place as the result of fitting exercises. That is nonsense in my eyes. Also, complex models that make our lives harder do not help us in the end.
All of that applies to the other colour-appearance scales as well.
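
To make concrete what I mean by simple and robust, a small sketch; the ‘fitted’ weights below are invented for illustration, only the plain Rec.709 luma weights are real:

```python
import numpy as np

def achromatic_simple(rgb):
    # Plain Rec.709 luma weights: simple, robust, easy to reason about.
    return rgb @ np.array([0.2126, 0.7152, 0.0722])

def achromatic_fitted(rgb):
    # Hypothetical weights from a fitting exercise. The six decimal places
    # imply a precision the underlying experiments cannot support.
    return rgb @ np.array([0.212639, 0.715169, 0.072192])

rgb = np.array([0.45, 0.30, 0.12])
print(achromatic_simple(rgb), achromatic_fitted(rgb))  # differ only in noise
```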

Inversion

I agree 100% with Troy. The desire for invertibility comes from those who want to avoid using the process in the first place. It is a naive wish, chasing rainbows, a daydream of an unrealistic utopia. It is seductive, and it leads to the wrong working paradigm and bad habits.

We can use some considerations from CAMs to guide us, but if we see that we are adding duct tape on top of duct tape to fix something somewhere, we went too far.

Flexibility

I cannot resist restating that a good DRT is one you can swap for another. :slight_smile:

7 Likes

Even CIELAB, at times, has been referred to as a color appearance model.

CAMs … no, though we are hardly the first to try (e.g., Windows Color System, circa 2002).

There are real use cases requiring an inverse, including bringing output-referred material into the system. Not everyone needing an inverse is just trying to avoid the process.

I’d be all too happy to blame you, but sadly I cannot here. :rofl:

ZCAM surely wasn’t you, but it was perhaps “inspired” by the idea that Filmlight has something they labeled a “CAM”.

He was also the knucklehead trying to sell folks that it could be done using a discrete sample model, despite glaring evidence to the contrary. I’m sure he ran away with his bag too.

This is seductive wishing on Hunt’s part, cleaving to a humanist vantage on things. I promise that nowhere in any of the Bartleson papers, nor MacAdam, is this “sun-tanned” nonsense present. At least not in any of the papers I’ve read on the subject from them. Perhaps I overlooked it.

The reason this is nonsense in my mind is that it appears Hunt had already made his mind up. MacAdam and Bartleson do not forward such hypothesis-laden claims in their cited works. If anything, they simply point out the confusing nature of the phenomena. In fact, as a counterpoint, Bartleson makes a few astute observations, clearly aware of the receptive field of the picture-text. For example:

In the case of the present experiment, for example, there was only one very simple image configuration: a simple portrait. Undoubtedly, more complex images, smaller areas of flesh, variations in viewing and adaptation conditions, and variations in the luminances and chromaticities of surrounding image areas, to name only a few factors, will exhibit an influence on the choice of optimum flesh reproduction.^12

Where I think Hunt grossly fails is in his reduction of spatiotemporal articulated regions to a singular quality. For example, in MacAdam’s work, he notes specifically:

On the other hand, when the print of the highest acceptance is masked and compared with the original subject, it seems quite pale.^3

The “pale” flies in the face of Hunt’s attempt at imbuing the preference toward the “sun-tanned” nonsense.

Picture-texts are an encoded signal, to be decoded by the author-reader. Where the idea of a “CAM” fails absolutely miserably is in the regions that, without acute focus, are cognized invisibly and transparently.

If we take a laser pointer in standard ecological cognition and blast it at a wall, it will appear “green” or “blue” or “red”, for example. It never appears “white”. Yet this common vernacular in picture-texts goes unnoticed. In fact, I challenge anyone here to look at highly pure sources of great brightness and suggest that “they appear achromatic” in vision. This does not happen. Ever.

Yet in picture-texts this is not only common, but a mandatory vernacular.

So Hunt’s claim falls on its face when discussing “colour appearance”; he fails to separate the activity of standard ecological cognition from the act of cognizing a text. Literary texts involve a unique mode of cognition, and so too do picture-texts. This decoding, or active cognition, is unrelated to the standard model of propelling our bodies through space.

Consider the following example:

This sample from Grossberg^4 is a fascinating example in that the reification of “dog-ness” broadly is an exercise in Gestalt belongingness and continuation. The separation of the marks, and their unique constellations of consistency versus inconsistency, connects up to a constellation of signs. Despite no one having experienced the arrangement of visible light radiation in the form shown in the picture, we are able to orient the constellation of signs in the text as “dog-ness”. Importantly, it is a constellation of signs, not a singular sign.

In much the same way, we might be wise to consider Caucasian “skin-signs” not as a singular (e.g., a singular chromaticity) but rather as a constellation. The constellation of these patches is what is actively reified by our cognition. That is, it’s not a singular patch that is reified, but the spatiotemporal relationship.

This broken ass picture nicely touches on the idea of a constellation of stimulus. There are variegated regions of high articulation, where we can get broad senses of gradations. Yet on the chest of the red character, there is suddenly no such gradation; the constellation of stimulus exists as “other” in relation to the field patterns around. Worse still, the constellation of signs points to a human-like figure, but the region near the torso is lacking the signs present in other regions of the picture-text.

Perhaps an interesting question can be had in how, on one hand, Dalmatians in the snow form a cohesive picture-text, absent smooth gradations, yet the picture-text abysmally formed by ACES is a bed poop. Is there something about the continuity of spatiotemporal frequencies at work within the bounds of the picture-text? What happens when these broad frequency distributions are disrupted?

Looping back to Hunt and CAMs, we can ask ourselves what these nonsensical CAM models are doing with respect to picture-formation that they are not doing with respect to standard ecological cognition.

For starters, they are three channel models operating on a closed domain. It is that closed domain that is delivering the picture-text cues that are highly desirable: specifically, the attenuation of chroma toward achromatic at the higher end, and the amplification of chroma toward the lower end. And to repeat, as “colour appearance models”, these things fall flat on their faces; bright and pure stimuli do not attenuate to achromatic in standard ecological cognition, they are just painfully bright and pure.

This desirable trait is exclusive to the three channel model and to how crosstalk manifests within it.

I would broadly agree, except greater care and attention should perhaps be paid to the model in the way that it crafts and engineers the attenuation of chroma on the higher side.

There are clear earmarks that the current three channel model is pooping the sheets here. I have seen more than a few pictures formed through the chain that have peculiar “kinks” in the attenuations. The diver images seem to provide some good examples here, but it is also noticeable in the formation of face gradations, specifically paler skin.

The reason that these issues should be focused on is that it is easier to push a picture toward posterizing than it is to recover it, and these visual kinks are similar; it is likely easier to create a kink than to undo it.

  1. Some Observations on the Reproduction of Flesh Colours, Bartleson, 1959.
  2. Apologies to anyone who happens to not be of the pasty Caucasian flesh variety. Caucasian flesh was sadly considered “flesh” in most of the research of this time. Similar unique peculiarities happen with darker skin, where “as measured” can look strangely “orange” for example. Here’s to hoping that moving forward we can extend our understanding to include the plethora of other human skin types and their related colour formations in picture-texts.
  3. Quality of Color Reproduction, MacAdam, 1951.
  4. How Humans Consciously See Paintings and Paintings Illuminate How Humans See, Grossberg, Zajac, 2017.
3 Likes

This deserves its own discussion. The blunt force trauma of back-projecting through a singular picture-formation chain is ridiculous. There are many options here for a design that fits the broader needs and would include more optimal back-projecting.

1 Like

I’m all ears. What’s your proposal? Code illustrating your solution would be best.

Design first, then code.

Otherwise it will lead to the same place this mess is currently in.

1 Like

You seem to have a design in mind. Please include code to illustrate it, as words alone can be misunderstood, particularly when the audience includes non-native English speakers.

1 Like

Hi everyone, I cannot resist answering. :wink: I think it is good to have a “safe” place to discuss these topics.

If I may, I would like to share a few ideas that have been on my mind for the past few years. I understand it might be too late to go back to the conceptual framework conversation, but nonetheless I always think it is interesting to share ideas and opinions.

I went back to the Miro board to take a screenshot. Here it is:

The framework that has been on my mind for these past years is something between III and IV, and I would like to share it from a CG/Animation perspective. Nowadays most of our workflows have two “states”:

  • scene-referred
  • display-referred

The CG industry has moved from “display-referred” to “scene-referred” workflows quite successfully and I guess everybody (myself included!) has seen it as progress. There is an interesting PDF from Naughty Dog about Uncharted 4 where two pictures are shared:
[two comparison images from the Naughty Dog presentation]

I could be wrong, but I think the “HDR Tint” is presented here as superior, when one could argue that these two images have different creative intentions and both approaches are valid. I am not saying that one “Tint” is better than the other, but that we could think of a framework that allows both. It would have three “states”:

  • scene-referred
  • image-referred
  • display-referred

I can think of several operations that would benefit from such a workflow: sampling and denoising. I recently discovered that a major render engine’s denoise process expects a [0, 1] range, and I thought this “image-referred” state would be a good place for that.
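
As a purely hypothetical sketch (the `denoise` function stands in for any engine denoiser expecting [0, 1] input, and the Reinhard-style curve is just a placeholder for a real “image-referred” transform), the wrapping could look like this:

```python
import numpy as np

def forward(scene_rgb):
    # Open-domain scene-referred -> bounded [0, 1] "image-referred" state.
    return scene_rgb / (1.0 + scene_rgb)

def inverse(image_rgb):
    # Back to the open domain once the [0, 1] operation has run.
    image_rgb = np.clip(image_rgb, 0.0, 1.0 - 1e-6)
    return image_rgb / (1.0 - image_rgb)

def denoise(image_rgb):
    return image_rgb  # stand-in for the engine's [0, 1]-domain denoiser

scene = np.random.rand(4, 4, 3) * 50.0  # open-domain radiance values
denoised = inverse(denoise(forward(scene)))
```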

Also, with adaptive sampling, since the convergence criterion (or threshold) is “perceptual”, I guess it would make sense to use an “image-referred” state to sample properly. Some render engines ship an arbitrary curve to sample with (completely unrelated to the actual CMW used) and some of them (such as Karma) have implemented OCIO in the sampling options.



The issue with that is that users would sample using an SDR Output Transform, and then when the HDR grade comes in, noise would appear (especially in the highlights). Because HDR view transforms usually have much less compression in the upper range of the image data, we cannot hide sampling issues up there as much. This is the main feedback I got from a famous plumber CG movie…
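
A toy illustration of that feedback (both curves are placeholders, not actual Output Transforms): the same highlight noise survives a less compressive “HDR-like” curve far more visibly.

```python
import numpy as np

rng = np.random.default_rng(7)
highlight = 20.0 + rng.normal(0.0, 2.0, 100_000)  # noisy scene highlight

def sdr_like(x):
    return x / (1.0 + x)        # compresses highlights early and hard

def hdr_like(x):
    return 4.0 * x / (4.0 + x)  # same family, compresses much later

print(np.std(sdr_like(highlight)))  # tiny: the noise is crushed, hidden
print(np.std(hdr_like(highlight)))  # ~10x larger: the noise re-emerges
```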

I remember that denoising has been discussed in this thread with a different take on the issue. So I guess there are several possibilities out there.

Finally, there are several grading operations that I believe would fit perfectly in this “image-referred” state, such as saturation or contrast. It always feels weird to me to use contrast on tristimulus data in the open domain.
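
For example, a classic pivoted-power contrast (pivot and power arbitrary here) behaves on [0, 1] data, but on open-domain data it sends values off to extremes, and negative lobes are not even representable:

```python
import numpy as np

def contrast(x, pivot=0.18, power=1.5):
    # Pivoted power contrast; well behaved only on bounded data.
    return pivot * (x / pivot) ** power

print(contrast(np.array([0.05, 0.18, 0.90])))    # sensible adjustments
print(contrast(np.array([20.0, 100.0, 500.0])))  # open domain: explodes
print(contrast(np.array([-0.01])))               # negative lobe: NaN
```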

Anyway, that is my tuppence…

No need to apologize. We are all learning here.

And finally… if I understand correctly, “The main appeal for using a CAM should be the ability of changing the viewing conditions and have an image that appears roughly similar”, but “We’re not even using the viewing conditions from the model (to change viewing conditions)”. I thought that was worth pointing out, given the different comments about the CAM.

Regards,
Chris

3 Likes

Shouldn’t LMT be per-show (as a show LUT) and IDT per-shot?

This thread has already been relegated once, so I will try and keep replies bite-sized and on topic, although I am not exactly sure what positions are being debated in “Debating CAMs”. Is it CAM good vs. film-and-eyes good/CAM bad? Or maybe it has become scene-referred vs. display-referred? All these are good debates to have, and I am eager to weigh in, as long as they are productive and maybe even offer some useful code, practical solutions or image comparisons.

I take Christophe’s comment above quite seriously and it has been echoed by others, but I think it is all very relative.

Let’s suppose it is all hacks, and nothing but hacks all the way down.

ACES DRT 1.0, 1.2, 1.3… also became a collection of hacks, as feedback and use cases began to accumulate.

The question should be: is a CAM or CAM-like space (Oklab, for example) a better space than some other space (RGB, HSV, CIELAB, etc.) in which to perform all the hacks that will eventually be needed to do the required scene/ACES-to-display transform?

If the user needs to add the additional “hack” of an LMT, is a CAM DRT a good/better transform for that case?

As Pekka and Thomas point out, CAMs just offer a few more tools and attributes, but they still need to be used carefully, and, as we see with the gamut mapper, every detail matters.

So far the results are promising and have achieved many of the goals that ACES 1.0 could not, but there are still more refinements to be done.

Christopher

2 Likes

I do agree that in the end colorimetry is just bullshit and it is all a hack. But then, one of the design requirements was to create a “simple” algorithm. And this current CAM Output Transform is arguably one of the most complex pieces of software engineered by man. Ever.

Jokes aside, I really appreciate the work and effort, but I still wonder whether the complexity-to-visual-result ratio is worth it, and what this CAM really brings to the table.

Again, the discussion is interesting. So let’s keep going! :wink:

Regards,
Chris

1 Like

It is not, by any means. You should dive into the Unreal Engine codebase for some perspective.

The Hellwig et al. (2022) CAM is not adding a lot of complexity; we are talking about roughly 3-4 times the line count of a colour model like ICtCp. ZCAM and CIECAM16 are in the same ballpark. The source of complexity is all the tweaks we are bolting on around the models: tone curve, chroma compression, gamut mapping, etc…
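
For scale, here is a complete linear BT.2020 → ICtCp conversion, using the published matrices and the SMPTE ST 2084 (PQ) curve; Hellwig et al. (2022) is roughly 3-4 times this, before any of the DRT machinery is bolted on:

```python
import numpy as np

RGB_TO_LMS = np.array([[1688, 2146, 262],
                       [683, 2951, 462],
                       [99, 309, 3688]]) / 4096.0

LMS_TO_ICTCP = np.array([[2048, 2048, 0],
                         [6610, -13613, 7003],
                         [17933, -17390, -543]]) / 4096.0

M1, M2 = 2610 / 16384, 2523 / 4096 * 128
C1, C2, C3 = 3424 / 4096, 2413 / 4096 * 32, 2392 / 4096 * 32

def pq_encode(y):
    # y is absolute luminance normalised to 10000 cd/m^2.
    y_m1 = np.power(np.maximum(y, 0.0), M1)
    return np.power((C1 + C2 * y_m1) / (1.0 + C3 * y_m1), M2)

def rgb2020_to_ictcp(rgb):
    lms = rgb @ RGB_TO_LMS.T
    return pq_encode(lms) @ LMS_TO_ICTCP.T

print(rgb2020_to_ictcp(np.array([0.01, 0.01, 0.01])))  # 100 nit grey: I~0.5, Ct=Cp=0
```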

As the main advantage of the CAM is viewing conditions management, it could be argued that if we don’t use it we should drop the CAM entirely, but that would be dismissing the fact that its perceptual uniformity is good. Another appeal of this specific model is the lightness correlate that accounts for the Helmholtz-Kohlrausch effect. The model is also mathematically simple and invertible, cf. Hunt’s model for comparison.

If we pedal back to some of the requirements, something that was pointed out by many people, yourself included @ChrisBrejon, is that the horrible™ hue skews should be removed. We also talked at length about the desire for highlight desaturation, i.e. the path to white. Those two points can be handled by a good perceptually uniform model (or colour appearance model) with low effort, because brightness/lightness is decorrelated from hue and chroma.
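
As a sketch of that “low effort” claim, using Oklab as a compact stand-in (not the model the VWG uses, but it has the same decorrelation property; the matrices are Ottosson’s published constants and the roll-off shape below is arbitrary):

```python
import numpy as np

M1 = np.array([[0.4122214708, 0.5363325363, 0.0514459929],
               [0.2119034982, 0.6806995451, 0.1073969566],
               [0.0883024619, 0.2817188376, 0.6299787005]])
M2 = np.array([[0.2104542553, 0.7936177850, -0.0040720468],
               [1.9779984951, -2.4285922050, 0.4505937099],
               [0.0259040371, 0.7827717662, -0.8086757660]])

def srgb_linear_to_oklab(rgb):
    return np.cbrt(rgb @ M1.T) @ M2.T

def oklab_to_srgb_linear(lab):
    return (lab @ np.linalg.inv(M2).T) ** 3 @ np.linalg.inv(M1).T

def path_to_white(rgb, start=0.75):
    lab = srgb_linear_to_oklab(rgb)
    # Attenuate chroma (a, b) as lightness L rises past `start`; hue is
    # untouched because it only lives in the a/b angle.
    scale = np.clip((1.0 - lab[..., :1]) / (1.0 - start), 0.0, 1.0)
    lab = np.concatenate([lab[..., :1], lab[..., 1:] * scale], axis=-1)
    return oklab_to_srgb_linear(lab)

print(path_to_white(np.array([0.9, 0.9, 0.2])))  # bright yellow eases to white
```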

No one EVER said that this is the perfect approach, and no one said that CAMs are the silver bullet. @Alexander_Forsythe and I were talking about their immaturity and lack of spatial modelling circa 2016 on Slack, nothing to see really…

It is worth remembering that we had a few simpler DRT candidates, but the author retracted them by adopting a license that is incompatible with the Academy and ASWF, so it was natural to fall back on another candidate. If anyone has a simpler and better display rendering transform to offer, then this is the right place and time to do so. Everybody will be super grateful!

Cheers,

Thomas

4 Likes

Not possible. There is no “perceptual uniformity” without considering the spatiotemporal articulation.

You are conflating authorship control with the mechanics of pictures.

TCAMv2, for example, utilized a chromaticity-linear approach (clip distortions notwithstanding) to assert the control aspect, not for the mechanics of picture making.

The assumption is that a CAM can be used to form a picture. Seems like a broad leap.

Some pretty hard revisionist history going on here…

Cheers,
T

Both of these questions are predicated on a certain orthodoxy. “Scene to display”, for example, might be considered as “We have meaningful data, and we seek to reveal it”, which loops back to the tautology of “The role of a camera is to present the stimulus as measured”. This is the precise error Judd, Plaza, and Balcom predicated their work upon.

If folks believe that a discrete sample based approach using tristimulus colourimetry can be used for an “appearance” model, the question comes down to a very boolean series of questions.

When we look at a high intensity, pure coloured laser projected onto a wall, does it appear to be white? If the answer is no, and the discrete sampling “CAM” predicts such an appearance, then the model is broken. The way that colours attenuate in a picture does not correlate to any visual cognition model.

When we look at a Caucasian face in standard ecological cognition contexts, does it appear “more yellow” and “more pale”? If the answer is no, and the discrete sampling “CAM” approach manifests this, then the model is broken. The way that Caucasian skin is presented in a picture does not correlate to any visual cognition model.

The idea that a picture is present in open domain tristimulus is an a priori error of logic; the stimulus that we look at in a picture is formed and shaped by the mechanics at work in the picture formation chain. Everything from a well engineered per channel approach (e.g. Harald Brendel’s / ARRI’s work), to an inverse 2.2 EOTF encoding, to more detailed efforts, creates something wholly new that is not present in the camera or render colourimetric triplets. Specifically, the happy accident of crosstalk from the per channel mechanics of dyes and additive light is the thing we would be wise to be actively analyzing more deeply.

I would suggest folks take a critical look at the above questions. It comes down to an either-or scenario:

  1. That a picture is nothing more than an emulation of the appearances of stimuli. If this were even remotely correct, we should expect an intense green laser to appear “white”, or Caucasian skin to appear pale and shifted toward the Tritan confusion line through achromatic, in day to day visual cognition.
  2. If we reject the idea that the above effects manifest in standard ecological cognition, yet we can see them manifesting in some model, then the model cannot be behaving as an appearance model, as these effects do not occur in vision as they do in pictures.

If any of these models were of any utility, one would think that the “simple” problem of achieving an appearance match between Display P3 and BT.709 pictures would be the perfect application of such.

Sadly, every single one of these self-professed “appearance” models provides no such solution to this “simple” problem either.

2 Likes

Of course there is; in a complete form, certainly not. Perceptually uniform spaces enable better control over hue paths: one can saturate / desaturate a sky or a colourful light with a hue path that appears more linear. Spaces like IPT et al. have been designed to do exactly that, irrespective of what you think.

We are not using the CAM to render the picture…

Do you think I would be saying that without any data point? You weren’t even there at the time, how could you even know?

troy_s — 2017-01-15 19:20 — has joined #general

This leads me to something more interesting to discuss: given you seem to know better than everybody else, how about, for once, you formally propose your solution to render pictures so that the group can evaluate it?

Is that it?

Cheers,

Thomas

2 Likes

So we agree it is complex. Cool.

To be exact, our comment was: hue skews should either be removed or properly engineered, and not be an accidental result.

Well, there was a slide around September 2021 stating that “ACES is science” to introduce the CAMs.

Maybe it would be worth thinking about why we lost “the author” along the way (and a few brilliant people). Maybe the gaslighting is not helping…

There was this approach from back in the day. Maybe there is more than meets the eye.

By looking at the #aces Slack archive, where you stated in 2016:

Regarding the RRT, do you think it would be worth touching base with the Academy and let them know we don’t like it? :smile:

I have to say that it is a bit sad that any comment or criticism towards ACES generates such petty debate. A shame really, because I cannot help but think about the missed opportunity of this VWG.

Again thanks for the hard work and all,
Chris

Sometimes you have to do complex things to get somewhere. The Wētā FX pipeline and processes are complex, but that is what empowers us to do Avatar. Manuka is most certainly more complex than all the renderers out there, but there are reasons for that.

I don’t see a problem with stating that CAMs are science; they are part of advanced colorimetry, so I’m not sure what you are trying to say. If it is that the science is incomplete, it certainly is, and this is what makes that field exciting.

Regarding OpenDRT, I’m again not sure what you are talking about. The only thing I remember is Jed expressing that he did not feel it would be right to have OpenDRT licensed with a license compatible with that of the Academy, given how much Daniele inspired its design, so he decided to adopt GPLv3 instead. It is a decision that I respect. There is also no animosity in my statement. I don’t think anyone in the TAC or ACES leadership has any grudge against Jed; I certainly don’t.

WRT your last paragraph, two things:

  • We did let the Academy know that we did not like the RRT, invited Alex and Scott, and produced the RAT document, which has been the basis of ACES 2.0.
  • You did not ask us whether it was appropriate to quote conversations from the colour-science Slack workspace. The #aces channel has been made public only very recently, and I never got agreement from all the participants for the full history to be made public. Please never do that again.

Thomas

3 Likes

Of course there is not. The elasticity of our visual cognition is perhaps deluding us? The reason this is fundamentally impossible is that our visual cognition is a fields-first cognition loop. There are countless examples of the influence of fields, including quite a number of models that provide such a transducer stage, post fields analysis. Citations as per some of the names already offered, and others if anyone sees any use.

If one truly believes that a field-agnostic metric is of any service, one merely needs to look at examples of how the spatiotemporal articulation is a primary stage in our visual cognition, cascading upwards and receiving feedback downwards in the reification-of-meaning process. I have found no better demonstration than Adelson’s Snakes.



Given we know from these demonstrations that the spatiotemporal articulation fields are incredibly low in the order stack, we can also hypothesize that the fields and visual cognition will shift with a shift in spatiotemporal dimensions.

Some conclusions one might draw from these demonstrations:

  • Visual cognition, such as the reification of lightness, clearly has field relationships as a primary driver in the reification process; there is also a bit of research suggesting that cognition-based instability is present.
  • The idea of the transducer, as well as amplification, becomes apparent in the field relationships. This has implications for HDR, for example: the R=G=B peak output of the last diamond set is often cognized as exceeding the tristimulus magnitude of the display in terms of lightness reification.

The last question is in relation to the following demonstration, posted by @priikone, which I believe is the ARRI Reveal system:

We should be able to predict that at a given quantisation level of the signal we can induce the cognition of “edge” or “other”. The picture in this case is expressed in some unit of colourimetric-adjacent magnitudes, relative to the observer. Indeed it seems feasible that at some scales the signal is discretized, and an aliasing may be more or less cognized:

Indeed we see a repeating patterned relationship with the classic Cornsweet, Craik, and O’Brien demonstration.

Observers of the picture may cognize:

  1. A “left looks lighter than right” reification.
  2. A “dip” immediately adjacent to the ramps, aka “Mach” band.

If we attempt to provide a metric for this, we might harness luminance metrics. While it can be stated that the transduction / amplification mechanism makes no singular luminous efficacy unit function feasible, for a high level analysis it seems at least applicable.

At specific octaves, we are able to at least get a semblance of visual cognition reification probability at the “full screen” dimension. For example, at low and middling high frequencies, and assuming a simplistic calibration to something like a 12” diagonal at 18-24” viewing:


We can at least see some degree of hope for a practical utility to help guide our analysis of the signal, one that may aid in locating regions where cognitive scission / segmentation has a higher probability of occurring.
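
A rough sketch of the kind of analysis hinted at here: octave-spaced difference-of-Gaussians bands over a luminance proxy, flagging strong band responses as candidate regions for scission. Every scale and threshold is a placeholder; a real attempt would calibrate against the display dimension and viewing distance, as per the 12” / 18-24” example above:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def band_energy(luma, octaves=5):
    bands = []
    for i in range(octaves):
        lo, hi = 2.0 ** i, 2.0 ** (i + 1)
        # Difference of Gaussians: a crude octave band-pass response.
        bands.append(gaussian_filter(luma, lo) - gaussian_filter(luma, hi))
    return np.stack(bands)

def scission_candidates(luma, threshold=0.05):
    # Union of strong responses across octaves.
    return (np.abs(band_energy(luma)) > threshold).any(axis=0)

luma = np.random.rand(128, 128)  # stand-in for a display-luminance proxy
mask = scission_candidates(luma)
```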

For the inquisitive: visual fields play a tremendous role in the reification of colour in our cognition, into which the aforementioned fields-first frequency analysis can provide some insight and predictive capability. There is likely a direct line between this and some of the incredible complexity in the formed picture from Red XMas, for example. The discs in the following are of equal tristimulus magnitudes, and follow the patterns outlined above in the transduction / amplification concept.


Broad conclusions:

  • Fields first thinking should be at the forefront of analysis.
  • A general consistency of the viewing field in terms of spatiotemporal dimensions is likely mandatory for evaluating “smoothness” of fields.

This is not what I think. I am a dilettante idiot buffoon who reads vastly wiser and more experienced minds. I don’t have an original thought in my body.

It strikes me that the claims of such systems are bogus. That’s just my pure hack opinion. Folks are free to evaluate the evidence and believe what they want.

Not a single tristimulus triplet, in terms of colourimetry as it exists in the open domain data buffer, is ever presented in the form of a spatiotemporal articulation. Not a single one. If we were to apply some colourimetric measurement between the thing we are looking at (the picture / image) and the colourimetric data in the EXRs, there are new samples formed.

The whole discussion of hue flights and the attenuation of chroma? That’s a byproduct of the crosstalk from the per channel model, not the higher level lofty idea of the model. It is an accident.

Maybe this is how you personally approach understanding. I do not. I’ve been openly saying forever that I don’t even understand what “tone” means, and I’ve tried to be diligent in exploring concepts and understanding without cleaving to the orthodoxy. So let me be clear:

I have no ### idea how visual cognition works.
I consider picture-texts a higher order of complexity above the basic ecological cognition of moving a body through space.

What I do believe, is that much of the iron fisted beliefs that orbit in some small circles do not afford any shred of veracity under scrutiny.

I have proposed what I believe to be the best path for a long while now: attempt to enumerate the rates of change and model them according to a specific metric that holds a connection to the ground truth in question. Try to hook the map (the metric) up to the territory (the specific thing being measured).

Curves, for example…

The basic mechanics of a curve in a per channel model are far from “simple” (see the sketch after this list):

  1. A curve does not hold a connection to reified lightness, yet it is analysed as such. It holds a direct link to a metric of luminance in exactly one edge case, R=G=B, when applied on a channel by channel basis.
  2. A curve adjusts purity in terms of rates of change, depending on the engineering of the three channels, in the output colourimetry.
  3. A curve adjusts the rates of change of the flights of axial colourimetric angle.
  4. A curve adjusts the intensity in a non-uniform manner, origin triplet depending.
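
The promised sketch: one toy per-channel curve, one “high purity” triplet swept over exposure, and three crude metrics (the curve and the metrics are all placeholders). The take-away is that luminance, purity, and hue angle all drift together under a single per-channel curve:

```python
import numpy as np

def curve(x):
    return x / (1.0 + x)            # toy per-channel tone curve

def luminance(rgb):
    return rgb @ np.array([0.2126, 0.7152, 0.0722])

def purity(rgb):
    return (rgb.max() - rgb.min()) / max(rgb.max(), 1e-6)

def hue_angle(rgb):
    # Crude opponent-style hue angle, radians.
    return np.arctan2(np.sqrt(3) * (rgb[1] - rgb[2]),
                      2 * rgb[0] - rgb[1] - rgb[2])

base = np.array([1.0, 0.1, 0.05])   # a "high purity" reddish triplet
for stops in range(0, 7, 2):
    out = curve(base * 2.0 ** stops)
    # Purity collapses toward achromatic, hue angle drifts, and the
    # luminance gain is non-uniform across the sweep.
    print(stops, luminance(out), purity(out), hue_angle(out))
```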

Some questions I believe deserve due diligence:

  1. When considering a triplet of “high purity”, how does the transformation to the result relate to the curve’s rates of change across the above three metrics? “Middling purity”? “Low purity”?
  2. When considering the above three broad classes of “purities”, how do the above track in relation to the equal-energy case, and at what spatiotemporal frequencies?
  3. Are there known methods that can be used to broadly analyse, predict, and estimate where visual discontinuities exist? Could they be leveraged to make predictions in relation to the given curves at given spatiotemporal frequencies?
  4. In the case of negative lobed values, are there broad trends that can be used to coerce those values into the legitimate domain prior to picture formation? Are there “rules” that should be established here? Why?

Of course not.

The reason behind the link you posted is to try and get a handle on how the seeming simplicity of per-channel mechanics in forming pictures is actually incredibly complex. I have used the experiments and the basic mechanics to glean insight into the surprisingly complex interactions when projected as colourimetry, and have tried to get a better understanding of how these rates of change interact with our picture cognition.

I think we all can, or at least should aspire to, be far better at kindling our understanding of pictures.

5 Likes

I beg to differ…

All those spatial effects are great and well known but:

  • How many pictures do you see every day looking like those that you posted?
  • What makes you think that the mechanics for a complex visual field are exactly the same as with your simple examples? What you are showing here are extreme outliers in a standard distribution of pictures as we author them in the entertainment industry. The effects that you highlighted are mixed, averaged together and dependent on so many factors that it becomes extremely hard to point them out in a specific picture. Can you, for example, identify and show them with precision in the Blue Bar picture?
  • How do you apply the learnings to picture rendering? Which model should we be using?

Because we are pragmatic and need to move forward, and because those critical questions won’t be solved anytime soon, the VWG is working with colour models that give better control over hue, among other things.


Would the appearance of the Red Christmas Lights picture be affected by a blue surround or a patch in its centre? I’m certain that it would be. Would the appearance of an overexposed blue sky “desaturated” through that model be affected by a bright purple patch around it? Of course it would be! Here is a photograph of yours of such a sky:

[image: photograph of an overexposed blue sky]

Thomas

PS: That last one was admittedly snarky, see it as fair game :slight_smile:

2 Likes

For historical purposes, as I was replying and you deleted your post (which is quite a habit, I must admit):

The proposition is not boolean; I was describing the opposite: spatially induced effects have an infinite range of magnitudes. Those magnitudes form a standard distribution, and it turns out that you actually picked the most extreme outlier examples.

I then proceeded to take one of those and showed that perceptual uniformity is still a thing, even under the strongest spatial induction, but you still dismiss it, which is quite baffling. No one with normal vision would say that the Oklab and IPT gradients look less perceptually uniform than the CIELab or HSV ones. Do their overall hues change because of the purple induction? Yes, they certainly do.

I asked you to highlight areas in the Blue Bar image where spatial induction has magnitudes similar to your examples. I’m genuinely curious whether they can be identified with precision, and what should be done with them.

Again, no one denies that spatio-temporally induced effects are important, but I (and plenty of others) gave up on modelling them years ago because it is the hardest problem in vision. The current models (or their extensions), e.g. iCAM06 or Retinex, are not exactly successful either and introduce objectionable artefacts, e.g. haloing. I tend to leave this stuff to researchers while following their work very closely.
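
For anyone curious why the haloing is so stubborn, a minimal sketch of the failure mode; this is not iCAM06 nor Retinex, just the naive divide-by-blurred-luminance family that they refine:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

luma = np.ones((64, 64))
luma[:, 32:] = 100.0                      # hard edge, dark vs bright

local = gaussian_filter(luma, sigma=8.0)  # blurred "adaptation" field
out = luma / (local + 1.0)                # naive local tone scaling

# Near the edge, `local` leaks across it: the dark side is divided by an
# inflated estimate (too dark) and the bright side by a deflated one (too
# bright) -- the classic halo.
print(out[0, :32].min(), out[0, 32:].max())
```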


From a pure complexity standpoint, we are talking about easily an order (or orders) of magnitude more code, so if the 50-60 lines of Hellwig et al. (2022) are “one of the most complex pieces of software engineered by man. Ever”, well… hold my beer :slight_smile:.

Ultimately, photographers, artists and colorists have always done better work than any spatio-temporal model or algorithm.

This brings back fond memories of the time when local tone mapping operator halos were all the rage:

2 Likes