Hi
Sorry, this is my first post.
I am interested in seeing whether it is possible to do a scene-linear (or even display-linear) fine-tune of the popular open-source video diffusion model WAN 2.1.
Training the model is straightforward: you provide a set of example video files and, for each one, a label describing its contents.
Creating the descriptions is simple: there is a model called Janus Pro that, given an sRGB still from the video, will produce a text description of the scene contents.
The following ComfyUI custom node will do this for a folder of .png files, writing a similarly named .txt file for each image.
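For the stills themselves, something like this should work to pull one frame per clip for Janus Pro (just a sketch: the paths are placeholders, and if the clips are scene-linear they would need a conversion to sRGB first, since the captioner expects sRGB stills):

```
# Grab the first frame of each training clip as a .png for captioning.
# Placeholder path ./clips/*.mov; add a linear-to-sRGB conversion here
# if the clips are not already display-referred.
mkdir -p stills
for clip in ./clips/*.mov; do
  name=$(basename "$clip" .mov)
  ffmpeg -hide_banner -i "$clip" -frames:v 1 "stills/${name}.png"
done
```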
So that is the labels sorted.
The training is simple too: just follow this project.
It makes use of libavcodec from FFmpeg to load the videos, so lots of formats are available.
Given that .mov is available, I can use the FFV1 codec and produce 4:4:4 16-bit-per-channel video, which should have sufficient fidelity to encode a large gamut of colour values.
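A quick way to double-check that the FFV1 encoder in a given FFmpeg build actually offers a 16-bit 4:4:4 pixel format:

```
# Prints the FFV1 encoder's options and supported pixel formats;
# yuv444p16le should be in the list.
ffmpeg -hide_banner -h encoder=ffv1
```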
So my current idea is to grab a bunch of scene-linear renders from Unreal Engine and make QuickTime movies from them, as follows:
`ls ./*/img/0001.exr | awk -F "/" '{print "ffmpeg -color_trc linear -color_range full -color_primaries bt709 -colorspace rgb -i "$1"/"$2"/img/%04d.exr -codec:v ffv1 -pix_fmt yuv444p16le "$2".mov"}' | sh`
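Once the files are written I can sanity-check one of them with ffprobe to confirm the codec, pixel format, and colour metadata survived (hypothetical file name):

```
# Shows the codec, pixel format, and colour tags of the first video stream.
ffprobe -v error -select_streams v:0 \
  -show_entries stream=codec_name,pix_fmt,color_range,color_space,color_transfer,color_primaries \
  -of default=noprint_wrappers=1 scene.mov
```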
Using this dataset from Hugging Face
Now for my question:
Am I doing it right?
What would be the best way to do an A/B comparison of two sets of QuickTime files, one set of 8-bit videos and one of 16-bit videos, and then show that the 16-bit videos have better colour fidelity?
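For concreteness, this is the kind of measurement I had in mind: decode each encode and compare it back to the original EXR sequence with FFmpeg's psnr filter (placeholder names, and something like SSIM or a proper ΔE metric may be a better fit for colour fidelity than PSNR):

```
# Placeholder names: ./scene/img/%04d.exr is the original render,
# scene_8bit.mov and scene_16bit.mov are the two encodes under test.
# Both inputs are converted to a common 16-bit format so the psnr filter
# sees matching pixel formats; a higher PSNR means closer to the source.
for depth in 8bit 16bit; do
  echo "== ${depth} =="
  ffmpeg -hide_banner -i "scene_${depth}.mov" -framerate 24 -i ./scene/img/%04d.exr \
    -lavfi "[0:v]format=yuv444p16le[a];[1:v]format=yuv444p16le[b];[a][b]psnr" \
    -f null - 2>&1 | grep Parsed_psnr
done
```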
Also, are there any existing HDR video datasets I could use instead of Unreal Engine captures?
Thanks In Advance
Sam