I’m studying color science before learning ACES, and I’m trying to better understand the difference between two types of encoding: log and gamma.
Encoding with a power function (“gamma correction”) compresses a wide dynamic range (photometric, linear) in a perceptually uniform way, keeping information where it is needed and discarding what is not, for a particular type of screen: sRGB for consumer monitors, Rec. 709 for HD TVs, and so on.
But many professionals also say that this correction is still applied for “production” reasons: when LCD monitors came to market, they mimicked the way CRT monitors worked in order to stay compatible with capture systems, which encoded video signals with “1 / 2.2”.
Which of these is the main reason we still encode with a gamma function today?
Personally, the answer I gave myself is that we still need gamma encoding because our screens can’t recreate the scene light, so we need to compress that gamma for a particular screen. So backwards compatibility or not, gamma encoding is mainly for this … am I wrong?
As for log encoding, this is not meant for a screen, but is meant to intelligently store a large amount of dynamic range with a “photographic” approach. So it is possible to take a 16-bit linear signal (for example) and save it with 10-12 bits.
However, I have seen graphs in which the Rec. 709 encoding is compared with log curves such as C-Log or S-Log, and I notice that the Rec. 709 encoding is practically linear, with a greater slope than the log curves. But what does this mean in practice?
Let’s say we have a camera whose sensor can represent a dynamic range of 15 stops before reaching full scale. If I choose to encode using Rec. 709, will I have all my stops, or just the first ten (for example)?
If I choose to encode with C-Log (for example), I will have all the stops, but since I have to be able to see it on an HD TV, I need to convert with a C-log2Rec709 LUT. In that case, isn’t it just like shooting directly in Rec. 709 and losing some of my dynamic range?
Idiot here, so take all of the following with a massive grain of salt.
First, a word from the sRGB specification, standardized in 1999:
Historically, both the photographic and television industries claim integral use of the term “gamma” for different effects. Hurter and Driffield first used the term in the 1890s in describing the straight-line portion of the density versus log exposure curves that describe photographic sensitometry. The photographic sensitometry field has used several interrelated terms to describe similar effects, including gamma, slope, gradient, and contrast. Both Languimier in the 1910s and Oliver in the 1940s defined “gamma” for the television industry (and thus the computer graphics industry) as the exponential value in both simple and complex power functions that describe the relationship between gun voltage and intensity (or luminance). In fact, even within the television industry, there are multiple, conflicting definitions of “gamma”. These include differences in describing physical aspects (such as gun “gamma” and phosphor “gamma”). These also include differences in equations for the same physical aspect (there are currently at least three commonly used equations in the computer graphics industry to describe the relationship between gun voltage and intensity, all of which provide significantly different results). After significant insightful feedback from many industries, this standard has explicitly chosen to avoid the use of the term “gamma”. Furthermore, it appears that the usefulness of the term in unambiguous, constructive standard terminology is zero and its continued use is detrimental to consistent cross-reference between standards and unambiguous communication.
As you note further down in your own post, the colour component transfer function is a means of compression. That is, it is a means of compressing linear light data that exploits the psychophysical response of our perceptual systems.
Displays output linear light ratios. The encoding is nothing more than a means to save bits.
Not wrong in the larger sense, but wrong in this specific display-referred instance. If we want 18% emission from a display, we simply encode it using the inverse of the output transfer function (EOTF). For example, 0.18^(1.0/2.2) would yield an encoded signal for a commodity sRGB-like display that, when decoded by the display hardware, would output… 18% of the total display emission! The output transfer function in most instances of commodity sRGB-like displays is a pure power function of 2.2, so the encoded value of roughly 0.4587 would be decoded in the hardware back to 0.18. The encoding to decoding chain forms a no-operation.
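To make the no-operation concrete, here is a minimal sketch of that round trip. It assumes the display is a pure 2.2 power function, as described above; the function names encode and decode are my own and not from any library.

```python
def encode(linear, exponent=2.2):
    """Inverse EOTF: display-linear light -> encoded signal."""
    return linear ** (1.0 / exponent)


def decode(signal, exponent=2.2):
    """EOTF (assumed pure power function): encoded signal -> display-linear light."""
    return signal ** exponent


signal = encode(0.18)
emitted = decode(signal)

print(round(signal, 4))   # 0.4587
print(round(emitted, 4))  # 0.18
```

The encoded value lands near 0.4587, and decoding it returns the original 18%, so the pair cancels out as far as the emitted light is concerned.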
In the larger sense, there is indeed another nonlinear compression that is applied to larger radiometric ranges. This is a form of gamut mapping, but is typically poorly referred to as “tone mapping”. Which leads us to a larger answer; a generic power function can only ever yield values of 1.0 or below when the input value is 1.0 or below.
To encode a larger linear range of reflectance, as opposed to a normalized percentage, typically something closer to a normalized log encoding is used. Most follow a log-like curve with a linear section.
Watch the domains of the plots! On a log-log plot, a straight line indicates a power function. BT.709 is not a straight line on a linear-linear plot, but rather a two-part curve that can be approximated by a pure power function with an exponent of roughly 1/1.9.
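A rough sketch of the two-part shape, for anyone who wants to poke at the numbers. The constants (4.5, 0.018, 1.099, 0.45, 0.099) are the BT.709 OETF values as I recall them, so check them against the recommendation itself; the function names and the 1/1.9 comparison exponent are my own choices for illustration.

```python
def bt709_oetf(L):
    """BT.709 two-part OETF: linear segment near black, power segment above."""
    if L < 0.018:
        return 4.5 * L
    return 1.099 * L ** 0.45 - 0.099


def pure_power(L, exponent=1.0 / 1.9):
    """Pure power function used here only as a rough approximation."""
    return L ** exponent


# Print the two curves side by side at a few scene-linear values.
for L in (0.01, 0.05, 0.18, 0.5, 1.0):
    print(f"{L:5.2f}  bt709={bt709_oetf(L):.4f}  power={pure_power(L):.4f}")
```

The two columns track each other loosely rather than exactly, which is all that "approximated by" is meant to claim. Neither column is a straight line against the linear input.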
And there’s the original answer to your query, too. If we didn’t use them, we’d need a larger bandwidth pipe for imagery.
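One way to see the bandwidth point is to count how many 8-bit code values end up describing the range from black to 18% display-linear light. This is only a sketch under assumptions: a pure 2.2 power function stands in for the display decoding, and codes_at_or_below is a hypothetical helper name.

```python
def codes_at_or_below(linear_limit, decode, bits=8):
    """Count integer code values whose decoded linear light is <= linear_limit."""
    max_code = 2 ** bits - 1
    return sum(
        1 for code in range(max_code + 1)
        if decode(code / max_code) <= linear_limit
    )


# Codes stored as linear light directly.
linear_codes = codes_at_or_below(0.18, lambda v: v)
# Codes stored through a 2.2 power encoding (decoded with v ** 2.2).
power_codes = codes_at_or_below(0.18, lambda v: v ** 2.2)

print(linear_codes, power_codes)  # 46 117
```

With the same 8 bits, the power encoding spends roughly two and a half times as many code values on the darker range. Getting that density from a linear encoding would take more bits.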
There’s a lot packed in there.
BT.709 and most camera encodings typically come packaged in an integer encoding. This means that the total range of the encoding is normalized to a 0-100% range of values, or in decimals, 0.0-1.0.
BT.709, despite being a scene-referred camera encoding (OETF) in theory, has trouble with scene reflectance values above 1.0 because of the nature of the two-part function and its math. C-Log, on the other hand, is capable of encoding a higher scene reflectance quantity at the 1.0 encoded value position.
To “view on a TV”, though, is another matter. It isn’t enough to simply dump the linear light directly to a particular display (after encoding via an inverse EOTF), as the rough “sensation” of that light would be quite different depending on contextual factors such as the viewing surround. For proper output from a display, an assumption of a “dark” surround is made, and a further bit of “contrast” must be added to the otherwise display-linear values.
So the TL;DR on that last one is tricky because there are quite a few layers moving around:
The BT.709 encoding transfer function on normalized integer encoding ranges can’t represent anything larger than 1.0 / 100%, which is approximately 2.47 EV over 18% middle grey. Camera log encodings negotiate this via their specific transfer functions that may also address specific hardware facets.
There is a “system” level “contrast” that needs to be tacked onto the output chain to accommodate particular surrounds, which in BT.1886’s case is assumed “dark”.
There is potentially an aesthetic contrast transfer function, subject to image crafting whims.
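The 2.47 EV figure in the first point is just the base-2 log of the ratio between the 1.0 ceiling and 18% middle grey. A small sketch of the arithmetic, with stops_above_grey being my own helper name:

```python
import math


def stops_above_grey(linear_value, middle_grey=0.18):
    """Exposure difference, in stops (EV), between a linear value and middle grey."""
    return math.log2(linear_value / middle_grey)


print(round(stops_above_grey(1.0), 2))  # 2.47
```

The same helper can be pointed at whatever scene-linear value a given log curve places at its 1.0 code value to see how much more headroom it carries.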