One of the issues the CLF VWG is wrestling with is whether, and how, to put a tolerance on processing accuracy. This came up during the meeting today and I wanted to start a thread to solicit feedback. To start it off, I’ll begin with the easier part of the problem, which is how to put a tolerance on something that is not perceptually uniform.
TOPIC 1: How to calculate a tolerance
As you know, tolerances are easier when one is comparing integer-based systems where the encoded values are perceptually uniform quantities such as video or log color spaces. But now we have a LUT format that is designed to handle floating-point values and work with linear color spaces such as ACES2065-1 and so we need to put a bit more thought into how differences are evaluated.
This is a similar problem to comparing floating-point numbers for near equality and has a standard solution which is to measure the difference in terms of “units of least precision” or ULPs.
When comparing floats, it’s generally a bad idea to do absolute comparisons like “|A - B| < tolerance” because the appropriate tolerance for values near one thousand is probably much larger than for values near one thousandth.
The basic idea behind ULP-based comparison is that if you reinterpret the float bit pattern as an unsigned int (for floats of the same sign), it takes you from a linear scale to a roughly logarithmic (and more perceptually uniform) scale where a fixed tolerance is more meaningful.
For example, each of these pairs of floats is 128 half-float ULPs apart:
0.00390625 and 0.00439453125
1.0 and 1.125
256.0 and 288.0
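As a rough illustration of the idea (a minimal sketch, not code from OCIO or the CLF spec), the ULP distance between two half-floats can be computed by reinterpreting their bit patterns as integers. The function names here are made up for the example, and it assumes both values are finite and fit in the half-float range; negative values are mapped so that the integer ordering matches the float ordering.

```python
import struct


def half_bits(x: float) -> int:
    """Return the 16-bit IEEE 754 half-float pattern of x as an unsigned int."""
    # The 'e' format packs a half-precision float; it raises OverflowError
    # if x is too large to be represented as a half-float.
    return struct.unpack("<H", struct.pack("<e", x))[0]


def ordered_half(x: float) -> int:
    """Map the sign-magnitude bit pattern to an int that increases with x."""
    bits = half_bits(x)
    magnitude = bits & 0x7FFF
    # Negative floats count downward from zero, so +0.0 and -0.0 coincide.
    return -magnitude if bits & 0x8000 else magnitude


def half_ulp_distance(a: float, b: float) -> int:
    """Number of representable half-floats between a and b (hypothetical helper)."""
    return abs(ordered_half(a) - ordered_half(b))


for a, b in [(0.00390625, 0.00439453125), (1.0, 1.125), (256.0, 288.0)]:
    print(a, b, half_ulp_distance(a, b))
```

Each of the three pairs above comes out as 128, even though the absolute differences range from about 0.0005 to 32.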
A similar idea is to do a relative comparison. So rather than putting a tolerance on |A - B|, put a tolerance on |A - B| / |A|.
The problem that both ULP-based and relative comparisons run into is that they tend to over-predict the amount of difference as the numbers approach zero. In the limit, when A == 0.0 and B is almost zero, the relative difference is infinite. ULP-based comparisons are somewhat better due to the presence of the denormalized encoding near 0, but still have a similar problem.
One solution is to transition from a relative comparison for most numbers to an absolute comparison for very small numbers. So basically, instead of |A - B| / |A|, the comparison is based on |A - B| / max(|A|, minA), where minA is a positive threshold that prevents the result from becoming too large. This could be called a “safe-guarded relative comparison.”
The problem is that a good choice of minA is quite specific to what those floating-point numbers represent and what sort of tolerances are expected.
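To make the safe-guarded relative comparison concrete, here is a minimal sketch. It is not the OCIO implementation; the function names, parameter names, and the eps and min_a values are assumptions chosen only for illustration.

```python
def safe_rel_error(value: float, expected: float, min_a: float) -> float:
    """Relative error of value vs. expected, with the denominator clamped to min_a.

    min_a must be positive. For |expected| >= min_a this is an ordinary relative
    error; below that it becomes an absolute error scaled by 1 / min_a.
    """
    return abs(value - expected) / max(abs(expected), min_a)


def equal_with_safe_rel_error(value: float, expected: float,
                              eps: float, min_a: float) -> bool:
    """True if the safe-guarded relative error does not exceed eps (hypothetical helper)."""
    return safe_rel_error(value, expected, min_a) <= eps


# Illustrative values only: a 0.2% tolerance with the clamp at 1e-3.
EPS = 2e-3
MIN_A = 1e-3

print(equal_with_safe_rel_error(1001.0, 1000.0, EPS, MIN_A))  # 0.1% off
print(equal_with_safe_rel_error(2e-7, 1e-7, EPS, MIN_A))      # tiny values
print(equal_with_safe_rel_error(2e-7, 1e-7, EPS, 1e-9))       # clamp too small to help
```

The first two comparisons pass. The third fails because, with min_a far below the values being compared, the test is an ordinary relative comparison again and a difference of 1e-7 counts as a 100% error. That is the sense in which the choice of minA depends on what the numbers represent.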
The unit tests in OpenColorIO use several different types of comparison, depending on the details of what is being compared, including absolute, relative, safe-guarded relative, and ULP-based comparison. For the purposes of CLF implementation testing, I would recommend safe-guarded relative comparison. OCIO has code for this in a function called EqualWithSafeRelError in UnitTestUtils.h.
The other aspect of how to calculate the tolerance is to decide what test target image to use and what CLF files to test with, but those are separate topics of discussion and work is already in progress on those.
TOPIC 2: Should there be a tolerance?
So that was the easy part and I’m confident that we could come up with a reasonable way of measuring processing accuracy. The harder part is deciding if we want to impose a tolerance on implementations, and if so, what the tolerance should be. Also, as Josh pointed out during the meeting today, we want to keep in mind what is feasible for various types of products.
If we do want to define tolerances that respect a range of capabilities, one solution would be to have several performance tiers/levels. For example, one for products used on set and another for those used in a DI suite.
Just wanted to open this topic for discussion on the forum since not everyone is able to attend the working group meetings and I imagine this might be a topic of wider interest.
CLF Implementation VWG chair