Dear ACES professionals:
I have worked with the ACES framework on various shows, and my experience has been very good. ACES is such a great gift to the VFX industry.
I recently ran into a question and am unsure how to handle it properly.
I converted a dark-scene camera raw ACES2065-1 (AP0) EXR to ACEScg (AP1) in Resolve, and a huge number of negative values appeared in the EXR image! These negative pixel values caused a bit of trouble during compositing in Nuke.
Question 1: Is this normal?
Question 2: Any suggestions for dealing with these negative pixel values within the ACES framework?
If there are any other resources related to this topic, please let me know.
Thanks all.
This is expected, as ACEScg is smaller than ACES2065-1. It is also possible to have negative values even in ACES2065-1, as it is potentially smaller than the camera RGB space. In such situations there are two common practices:
Work is done in the camera RGB space.
Use the Reference Gamut Compression operator to bring the negative values inside ACEScg.
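To make the “AP1 is smaller than AP0” point concrete, here is a minimal numpy sketch. The matrix is the standard AP0-to-AP1 matrix from the ACES reference implementation (worth double-checking against your own pipeline), and the sample value is just an illustrative saturated blue:

```python
import numpy as np

# ACES2065-1 (AP0) -> ACEScg (AP1) matrix, as published in the ACES reference
# implementation (AP0_2_AP1_MAT). Verify against your own pipeline.
AP0_TO_AP1 = np.array([
    [ 1.4514393161, -0.2365107469, -0.2149285693],
    [-0.0765537734,  1.1762296998, -0.0996759264],
    [ 0.0083161484, -0.0060324498,  0.9977163014],
])

# An illustrative saturated blue sample: comfortably positive in AP0...
ap0_pixel = np.array([0.02, 0.01, 0.30])

# ...but the red and green components come out negative once encoded as AP1.
ap1_pixel = AP0_TO_AP1 @ ap0_pixel
print(ap1_pixel)  # approx. [-0.038, -0.020, 0.299]
```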
Thanks for your reply, Thomas.
Could you give me some more detail on these two points?
Work is done in the camera RGB space.
Do you mean setting up the shooting camera in its native RGB space, such as ARRI ALEXA V3 LogC? Or something else?
Use the Reference Gamut Compression operator to bring the negative values inside ACEScg.
How do I perform that step? In Resolve or Nuke? Could you be more specific?
Thanks for your reply, Daniele.
What do you mean by “flare the image until your noise floor is all positive”?
Is that part of the on-set camera setup, or is it a post process in Resolve or Nuke?
@vfxjimmy don’t be distracted by @Troy_James_Sobotka’s “cognition” mumbo-jumbo. It’s just a distraction from the practical question you have asked.
I believe the question was why ACES2065-1 converted to ACEScg would produce previously unseen negative values. @Thomas_Mansencal’s explanation that AP1 is a smaller encoding space than AP0 is correct and would explain why negatives that were not seen in AP0 would appear when re-encoded as AP1.
Negative values that already existed in the ACES2065 encoding - either from being “outside” the encoding triangle (but maybe inside a camera encoding triangle) or from the noise floor being set to the average of the noise - will persist as negatives in AP1.
So if the negative values are “new” (i.e. not previously present in the AP0 encoding), then the Reference Gamut Compression is a valid way to compress values into the AP1 encoding. It is not the only way, but it is one existing tool that might prove helpful, depending on the exact situation. It will alter and “compress” your colorimetry by squeezing all values to fall within the AP1 encoding triangle. If the content does not need to maintain provenance to the camera capture colorimetry, then it is a useful tool if all downstream work will be using ACES outputs, as AP1 is the effective limit of the rendering transform.
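For context, the general shape of the RGC is a per-pixel compression of each channel’s “distance” from the achromatic axis (the max of R, G and B). The sketch below illustrates that idea only; it is not the actual ACES Reference Gamut Compression, which uses specific per-hue thresholds, limits and a parametric power curve defined in the official CTL and documentation:

```python
import numpy as np

def compress_toward_achromatic(rgb, threshold=0.8):
    """Simplified distance-based gamut compression (illustration only).

    Measures each channel's "distance" below the per-pixel achromatic
    reference (max of R, G, B), leaves distances below `threshold` untouched,
    and softly rolls off larger distances so they never reach 1.0, which pulls
    negative channels back above zero. The real RGC uses per-hue thresholds,
    limits and a different parametric curve.
    """
    rgb = np.asarray(rgb, dtype=float)
    ach = rgb.max(axis=-1, keepdims=True)             # achromatic reference
    ach_abs = np.where(ach == 0.0, 1.0, np.abs(ach))  # avoid divide-by-zero
    dist = (ach - rgb) / ach_abs                      # 0 on the axis, > 1 when negative

    over = np.maximum(dist - threshold, 0.0)
    compressed = np.where(
        dist <= threshold,
        dist,
        threshold + (1.0 - threshold) * over / (over + (1.0 - threshold)),
    )
    # Note: pixels where even max(R, G, B) is negative still end up negative.
    return ach - compressed * ach_abs

print(compress_toward_achromatic([-0.05, 0.02, 0.6]))  # -> all channels >= 0
```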
Why not ask the authors of the actual formula? If memory serves, it was authored by @jedsmith, based on a Baselight algorithm authored by @daniele?
The bottom line is that bending the stimuli prior to picture formation will potentially cause problems in the formed picture. Using an offset, as per Daniele’s suggestion, will maintain stimuli relationships with respect to the achromatic centroid.
We cognize the stimuli presented. There is no “scene” or other magical metadata in the chain. There is no “compression” relative to our cognition, and arbitrarily shimming in operations that profess to “compress” will have cascading implications on the presented stimuli of the picture.
I don’t want to appear to be speaking for @daniele, but my understanding of the thrust of his (excellent) video was that preservation of additive mixtures is beneficial not for anything related to cognition, but rather because many processes used in VFX (and some used in grading) assume that the image data has a linear relationship to scene light because it is modelling the behaviour of light. Introducing non-linearities (as the RGC does) may break this assumption. He does talk in the video about the option of applying non-linearities downstream of operations that benefit from the preservation of linearity.
Plenty of people are using the RGC without issues. It depends (as has been said elsewhere in this thread) on the nature and cause of your negative values. It also depends on what is happening to the pixels with negative values in compositing. There is no one perfect solution. Try the options that have been presented here, and see what works in your use case.
If the issue is that highly chromatic portions of the image are OOG because either the IDT or the transformation to AP1 puts them OOG, lifting / offsetting the image might really be problematic.
Think about a green or blue screen lit by narrow-band lights, for example: compensating by offset in this scenario has a high potential for introducing unacceptable shifts in the image.
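To put rough numbers on that (all values below are made up for illustration): the flat offset required to clear a large chromatic negative also lifts everything else in the frame, most visibly the shadows.

```python
import numpy as np

# Hypothetical values: a green-screen pixel lit by narrow-band LEDs after the
# IDT, plus a dark neutral patch elsewhere in the same frame.
screen = np.array([-0.10, 0.70, -0.04])
shadow = np.array([ 0.01, 0.01,  0.01])

offset = -screen.min()                      # smallest flat offset that clears the negatives
print(offset)                               # 0.10
print(screen + offset)                      # [0.00, 0.80, 0.06]  screen is now all positive
print(shadow + offset)                      # [0.11, 0.11, 0.11]  but the shadows are lifted...
print(np.log2((shadow + offset) / shadow))  # ...by roughly 3.5 stops
```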
No one’s cognition “knows” what “out of gamut” means. The only thing that is processed is the stimuli presented in the formed pictorial depiction.
Applying arbitrary and misguided number manipulations to “fit” some arbitrary stimuli into some specified stimuli region lacks a properly outlined problem domain. Doing so while ignoring the unexplored relationship will almost always lead to problems. I am reasonably sure the authors, @jedsmith and @daniele, would verify this. E.g. in cognitions of transparency, these relationships will often fall apart, as well as forming a troubling layer that creative folks like colourists are left to fix.
Offsetting, while not ideal, will maintain stimuli energy relationships to the achromatic centroid. The same cannot be said for the hack fix of the “RGC”.
In the end, new world primates crunch the stimuli as presented, and the totality of the computation is cognitive mumbo jumbo.
There is no meaning to any of the acronyms. There is only the stimuli as presented to the audience. We would all do well to consider the stimuli of the pictorial depiction, and avoid the non-answers of “negative this” or “OOG that”, as they are all utterly empty, vacuous, and meaningless terms relative to our visual cognition.
The typical image processing algorithm, e.g., keying, convolution, does not really enjoy negative values and produces results that “cognition” deems unsatisfactory when encountering them. People in the industry dealing with practical production problems are using the RGC among some of the other aforementioned techniques to great effect.
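A toy example of the convolution case (values made up): a simple 3-tap blur over a scanline that contains one strongly negative pixel smears that negative energy into its positive neighbours, creating a dark dip that was not in the source.

```python
import numpy as np

# A 1-D "scanline": a bright highlight next to one strongly negative pixel,
# e.g. an out-of-gamut edge left over from an AP0 -> AP1 conversion.
row = np.array([0.1, 0.1, 2.0, -0.8, 0.1, 0.1])

kernel = np.array([0.25, 0.5, 0.25])  # simple 3-tap blur
blurred = np.convolve(row, kernel, mode="same")
print(blurred)
# [ 0.075  0.575  0.825  0.125 -0.125  0.075]
# The pixel that was 0.1 in the source now comes out at -0.125: a dark halo
# next to the highlight that was not there before.
```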
For VFX work like CGI integration and keying in compositing, I think it is most sensible to tackle the problem like this:
flare the image to get rid of negative values in the noise floor
The negative values in the noise floor are created by a black cap subtract in the camera, which I think is the wrong thing to do: you take a spatial measurement (an integration over an area) and apply the result to each individual pixel. But that is a different topic, to be discussed with the camera manufacturers.
matrix back into camera RGB-ish space to get rid of negative values introduced by colourimetric profiling of the camera.
Non-linear gamut compression can be applied further downstream in the chain, but I would not do it right at the beginning.
Indeed. That is why the RGC documentation for VFX recommends not baking it into VFX pulls, leaving it to the compositor’s discretion to apply it at an appropriate stage in their node tree.
Flaring and matrixing to a different working space can be inverted, to revert pixels to their original values, downstream of operations which benefit from maintaining the linear additive mixtures @daniele describes in his video. If this reverts some pixels to have negative values which will be problematic for subsequent operations, non-linear operations (including but not limited to the RGC) may be useful to correct those, while leaving non-problematic pixel values unchanged.
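For what it’s worth, here is a minimal numpy sketch of that flare-and-matrix round trip. The offset amount and the working-space-to-camera matrix below are hypothetical placeholders; a real pipeline would use the actual (inverse) IDT matrix for the camera in question:

```python
import numpy as np

# Hypothetical working-space <-> camera-RGB-ish matrices. In practice these
# would come from the camera's colourimetric profile / IDT, not made-up numbers.
WORK_TO_CAMERA = np.array([
    [ 1.20, -0.15, -0.05],
    [-0.10,  1.15, -0.05],
    [ 0.00, -0.10,  1.10],
])
CAMERA_TO_WORK = np.linalg.inv(WORK_TO_CAMERA)

FLARE = 0.002  # small flare/offset, just enough to lift the noise floor above zero

def to_comp_space(rgb):
    """Flare, then matrix into a camera-RGB-ish space for linear comp work."""
    rgb = np.asarray(rgb, dtype=float)
    return (rgb + FLARE) @ WORK_TO_CAMERA.T

def from_comp_space(rgb):
    """Exact inverse: matrix back, then remove the flare."""
    rgb = np.asarray(rgb, dtype=float)
    return rgb @ CAMERA_TO_WORK.T - FLARE

pixels = np.array([[ 0.180, 0.180, 0.180],   # mid grey
                   [-0.001, 0.002, 0.001]])  # noise-floor sample with a negative

comp = to_comp_space(pixels)
print(comp)                                        # the noise-floor negative is gone
print(np.allclose(from_comp_space(comp), pixels))  # True: fully invertible
```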
As a colorist, the last time I had to 3x3-matrix some negative values (is this what is called offsetting here?) instead of using the RGC was about 18 hours ago. Unfortunately, if the scene is something like a party or a club with blue LED lights mixed with warmer colors, and if the required image is “filmic colors please”, the RGC helps in maybe half of the cases. The only solution that works all the time is a simple 3x3 matrix. Yes, it skews all colors, but it’s often better than visible sharp transitions in gradients. Or I just use both, the RGC and then a 3x3 matrix, depending on how strong the look LUT is.
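(For reference, one generic construction of such a 3x3 is a desaturation-style matrix that blends each channel toward a weighted luminance. The Rec.709 luma weights and the amount below are illustrative assumptions, not necessarily what is being described above:)

```python
import numpy as np

def saturation_matrix(sat, weights=(0.2126, 0.7152, 0.0722)):
    """3x3 matrix that blends each channel toward a weighted luminance.

    sat = 1.0 is the identity; sat < 1.0 mixes the channels together, which
    pulls moderately negative components back toward (and often above) zero,
    at the cost of slightly skewing all colours. The Rec.709 luma weights
    here are purely an illustrative choice.
    """
    w = np.asarray(weights, dtype=float)
    return sat * np.eye(3) + (1.0 - sat) * np.tile(w, (3, 1))

M = saturation_matrix(0.85)
pixel = np.array([-0.02, 0.40, 0.30])  # a cyan-ish value with a negative red
print(M @ pixel)                       # red comes out around +0.03
```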
If I didn’t understand why there are negative values when dealing with wide gamut, I would say that the whole wide-gamut thing is pure evil that kills creativity by restricting what you can do with the image without breaking it. It also eats a lot of time, and it is never useful anyway, since everything is effectively constrained to Rec.709. Even in a P3 delivery the image rarely goes beyond Rec.709 colors, for aesthetic reasons and/or because of film emulation LUTs.
But unfortunately, I guess, there is no way to build a sensor that would somehow work completely differently, not produce these out-of-spectral-locus colors, and still cost about the same.
This is very much what I would call best practice and is being done by plenty of studios. Some do not even need to convert back to Camera RGB: they do all their comp work in Camera RGB directly. There is no real advantage to using ACEScg at this stage of the VFX pipeline.
The solution is simple:
Don’t use the human reference observer as the mezzanine observer.
Define a standard generic camera observer that all the cameras agree with on average. Then you move the mapping challenge into colourimetric land at the output stage (the DRT), which is where I think it belongs.
We’ve watched at least half a century of folks shovelling numbers around and pretending that there is some “reason”. The issue is that the problem surface is ill defined. When folks say “gamut mapping” they might as well be saying “flying spaghetti monster”.
I stress, the issue is not in the stimuli metric. No amount of C0, C1, C2 continuity in stimuli metrics will reveal what we need without first evaluating whether the metric we are using is the correct one.
We need to evaluate the cognitive implications of the stimuli relationships. No one is doing this. Instead, it has been turned into a peak scientism chase of what folks are pretending to be the “problem”.
The problem surface is cognitive; how do we read a formed picture?
Case in point, we should be examining how we cognitively arrive at probabilities of reading a pictorial depiction along transparency mechanisms. I believe this is the axis that also embodies pictorial exposure.