Framing metadata

Hi all,

We decided to include framing metadata as an optional element in AMF, but its form still needs to be decided upon. Thanks to Josh Pines and Walter A for both providing proposals on how this could look.

For discussion on the call tomorrow, here is an example framing setup from ARRI Frame Line Tool (Alexa LF Open Gate 4.5K, framing for UHD center crop) and proposed representations of this in XML form.

Proposal #1 - ‘coordinates’:

    <inputFrame>0 0 4447 3095</inputFrame>
    <extractionArea>304 468 4143 2627</extractionArea>

Proposal #2 - ‘size with center offset’:

    <inputFrame>4448 3096</inputFrame>
    <extractionArea>3840 2160</extractionArea>
    <centerOffset>0 0</centerOffset>

Proposal #3 - ‘size with coordinate offset’:

    <inputFrame>4448 3096</inputFrame>
    <extractionArea>3840 2160</extractionArea>
    <originTopLeft>304 468</originTopLeft>

I think I did that right.

Generally I am a fan of #2 or #3. The benefit of #2 is that it is straightforward for a center crop, which in my experience is the most common case. The benefit of #3 over #2 is that you know exactly where to crop when the width is an odd number and the ‘center’ falls in the middle of a pixel; the ‘origin’ method of #3 solves that problem.

#1 is the most absolute, and closest to the EXR header attributes, but is less human readable (to me) and requires 8 numbers rather than 6.
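Since the three proposals carry the same information, converting between them is simple arithmetic. A minimal sketch, assuming #1 uses inclusive pixel coordinates like an EXR dataWindow (the function name is illustrative, not part of any schema):

```python
# Hypothetical helper converting proposal #1 (inclusive corner coordinates,
# EXR dataWindow style) to proposal #3 (size with top-left origin).
def coords_to_origin_size(x_min, y_min, x_max, y_max):
    """Inclusive corners -> (width, height, origin_x, origin_y)."""
    return (x_max - x_min + 1, y_max - y_min + 1, x_min, y_min)

# The UHD extraction from the ARRI example above, as inclusive corners:
print(coords_to_origin_size(304, 468, 4143, 2627))
# -> (3840, 2160, 304, 468)
```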

ARRI and RED seem to store frame line info similar to #3
(@joseph and @Graeme_Nattress, please correct me if I’m wrong)

PDF, EXR, and Nuke seem to use something closer to #1
(proposed originally by @walter.arrighetti, please confirm I represented it correctly)

(@peterpostma, @rohitg, @brunomunger, any opinions from dailies / color software side?)

Any of these could be made proportional, rather than absolute,
so #3 above would turn into:

    <inputFrame>4448 3096</inputFrame>
    <extractionArea>0.86330935 0.69767442</extractionArea>
    <originTopLeft>0.06834532 0.15116279</originTopLeft>
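For reference, the proportional values above follow directly from the absolute pixel numbers: each value is divided by the matching inputFrame dimension.

```python
# Deriving the proportional form of proposal #3 from absolute pixels.
input_w, input_h = 4448, 3096

extraction = (3840 / input_w, 2160 / input_h)  # ~ (0.86330935, 0.69767442)
origin = (304 / input_w, 468 / input_h)        # ~ (0.06834532, 0.15116279)

print(extraction, origin)
```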

I think we would also want to include the ‘final frame’ or ‘final resolution’ i.e. 1920x1080 for dailies - but wanted to get opinions on preference out of the above proposals.

Please let us know, or if you can make the call tomorrow at 9am, speak to you then.


cc: @Alexander_Forsythe

Thanks Chris,
I assume proposal #3 came from me, so maybe it goes without saying that I’m in favor of that one :slight_smile: I personally like it because it handles every scenario I have come across, regardless of how likely they are to come up. The other thing I wonder is whether it’s possible to actually label it:

    <inputFrame>4448 3096</inputFrame>
    <extractionArea>3840 2160</extractionArea>
    <originTopLeft>304 468</originTopLeft>
    <inputFrame>4448 3096</inputFrame>
    <extractionArea>3840 2160</extractionArea>
    <originTopLeft>304 468</originTopLeft>

That way we can track multiple frames. It is very common that we deliver two different frames during the dailies process, or have VFX track two different frames: the intended frame versus the extended area they deliver back for finishing.

As an example, maybe the frame is 2.39, but we want to deliver 1.78 to editorial during the dailies pass, while PIX gets 2.39. Editorial then adds their own matte in the Avid.

The ultimate goal in my mind for this would be to:

  • Create frame lines in camera
  • Within Colorfront, Daylight, etc., just click a button for Frame 1 and it would apply that framing; click a button for Frame 2, and it applies that framing. You could then create your software presets based on these, or just use the in-camera frame lines as your framing option.
  • Then when we render EXRs for VFX pulls, we place this information (somewhere?) in the EXR extended attributes, so the VFX vendors can click a button in Nuke to re-apply said framing.
  • Same would go for stereo pulls for features

There is a lot to think about with this, but I think it’s important not to forget why we’re asking for it. To me, the reason is so that post-production departments no longer each create slightly different framing presets based on non-pixel-accurate framing charts shot on set. But I assume I’m preaching to the choir on that one!

The other thing we may want to think about documenting for software vendors is what to do when the extraction area is wider than the frame of your project. Do they auto matte it? I would think yes, but it’s something that would need to be defined for them.

For what it’s worth, I definitely also agree on #3 as the best approach.
Also, I’m with Jesse on this one: I think we should be able to include multiple frame-extraction pipelines in the AMF (although this conversation goes along with that of multiple color pipelines inside an AMF; if the decision is not to proceed with multiple color pipelines, then the idea of having one AMF per intended pipeline should probably also apply to the framing extractions).
I also agree with the goals Jesse is aiming at: ultimately, the idea is that an AMF can be used to set up pipelines automatically, avoiding the need to double- and triple-check the turnover of each vendor involved in a project. That’s why it’s so important that not only color is tracked, but also frame extractions.

As Chris was saying, I also second that the extraction process should include a scaling node (and blanking?), so one can scale both H and V by the same amount to upscale or downscale a picture while keeping the aspect ratio of the original source material, or use a different ratio to apply a desqueeze (of any nature) to the image.
For instance, for a 1080p dailies process with 2.39:1 blanking on:

    <inputFrame>4448 3096</inputFrame>
    <extractionArea>3840 2160</extractionArea>
    <originTopLeft>304 468</originTopLeft>
    <scaling>1920 1080</scaling>
    <blanking>0 138 0 138</blanking>

Or a 2x anamorphic desqueeze from a 2.8K 4:3 Alexa sequence for a 2K DCI DI timeline would be like

    <inputFrame>2880 2160</inputFrame>
    <extractionArea>2570 2160</extractionArea>
    <originTopLeft>155 0</originTopLeft>
    <scaling>2048 858</scaling>
    <blanking>0 0 0 0</blanking>
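The blanking values in the 1080p example can be derived by fitting 2.39:1 into the output raster. A sketch (rounding the active height to an even pixel count so the mattes are symmetric, which matches the 138/138 split above; the function name is mine):

```python
def letterbox(out_w, out_h, aspect):
    """Top/bottom blanking when fitting `aspect` into an out_w x out_h raster."""
    active_h = round(out_w / aspect / 2) * 2  # even, for symmetric mattes
    matte = (out_h - active_h) // 2
    return matte, matte

print(letterbox(1920, 1080, 2.39))  # -> (138, 138)
```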

I think the order of the elements in the XML should also represent the order in which each process happens (the way I originally thought of it was something like the LMT stack position, but maybe just the order of the nodes will do fine). Obviously the order of operations is essential to obtain the right results.
I also think we should include a comment recommending which scaling algorithm to use, although I would not make it mandatory or design it to actually produce any effect (it’s more an instruction between DI houses and VFX to say things like "we’re going to do this with a Catmull-Rom algorithm, see if you can do the same!").

Hope it makes sense.

Hi Chris.
Yes, proposal #1 is exactly mine; and yes, it’s how PDF, EXR and ARRIRAW v2 behave.

Personally I think almost any choice is good; I’m really agnostic as to which solution is picked in the end. All in all:

  • my proposal (#1) really covers all frame-mapping cases, i.e. scaling (+blanking) and H/V reflections (which are relevant to Stereo-3D and a few VFX shots), although it’s less human-readable;
  • proposals #2 and #3 may easily be extended to support reflections if negative values are allowed – but still not scaling;

what I would tend to discourage, instead, is:

  • adding scaling and blanking as separate XML elements, which adds complexity and ambiguity: the same framing can be described by several metadata combinations;
  • the solution with float numbers, because that may lead to ambiguous renders at frame edges.

As regards adding multiple framing info, I believe we all agree it follows the same road as having multiple color pipelines: it’s reasonable that each pipeline also carries framing info along with it.

This is not the right topic, but allow me to say that a wise and smart dosage of multi-pipeline and history elements (even if they are optional) allows one to rebuild the DoP’s imaging intent: with history, one also retrieves original formats, color encodings and framing – all of which help in recreating the wider perspective on a piece of footage.

As others have said, any would do as they are all a simple mathematical transform from one another. I would pick #2 if it were up to me, simply because it is instantly human readable in terms of the size of the extracted frame and “is it centred?”

Thank you all for your input.

On the call, we decided to proceed with #3. It has the readability of #2 but solves for the rare case when an odd number of pixels is involved. We also decided to stay with absolute pixel counts, rather than proportional / ratio values, to reduce the chance of rounding errors. We also decided to not add additional elements for output scaling / blanking, since this adds unnecessary complexity outside of the most important part: “what is the active image area to view?”.

We will be updating the schema before the next meeting, so if anyone has any strong objections please voice them now!


One remaining question: should we have a desqueeze ratio? (i.e. 2.0 for anamorphic). I’m thinking that, since we are not adding output scaling / blanking, the desqueeze ratio would be necessary for an anamorphic extraction to be fully automated. Open to thoughts here!


Hey Chris,
Desqueeze ratio is definitely useful, but I think it should go along with scaling.
If you decide to include it without scaling (I do appreciate that there is a use for it even when scaling is not involved), please make sure the number carries three decimal places, as certain squeeze factors are not really conventional and sometimes more precision is required.

Hope it helps

I agree with Francesco on both points.
Squeezing is important, but it should be there only if scaling is: I hope no one will ever hand-type an AMF.

As for the three decimal places, I agree and am all-in on that: let’s define them as a ratio, as is already done for noninteger frame rates – numerator and denominator. That way no rounding or higher-level code processing is required.

So 1.66 really is 5:3.
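Python’s `fractions` module illustrates why carrying the ratio as integer numerator/denominator matters: a decimal rounded to two places cannot recover the intended ratio (a quick illustration, not production code):

```python
from fractions import Fraction

intended = Fraction(5, 3)   # a "1.66" squeeze, carried exactly as a ratio
rounded = 1.66              # the same value after decimal rounding

# Recovering a ratio from the rounded float gives the wrong answer:
recovered = Fraction(rounded).limit_denominator(100)
print(recovered)               # 83/50, not 5/3
print(recovered == intended)   # False
```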

Hey Walter and Francesco,
I am curious why it is important to keep a scaling attribute alongside the de-squeeze? I have never had to think about what I am scaling to when de-squeezing an image, so I assume I’m missing something.

As an example, when receiving 2:1 anamorphic media from set in the lab, I apply the 2:1 (or 1.33:1), but regardless of whether I am rendering HD files for editorial, 720p files for PIX, smaller files for a different screener deliverable, or a random resolution for VFX, the de-squeeze never changes.

And the same would go for my team in finishing. When we conform, these two values are very independent of one another.

I guess I’ve just never heard of a de-squeeze changing in any way due to the scaling you’re applying to your output. Maybe I’m missing something, though? If not, I would wonder how often we will actually get this de-squeeze value from set. The camera may not always know there is an anamorphic lens on it, and we may not always have smart lenses, so I think we need to assume this field will sometimes be blank. I agree it would be good to have it in there, though.

Therefore: (Sorry, I can’t remember how you wrote in the frame ID #'s during the meeting, so those are missing below, but essentially:)

    <inputFrame>2880 2160</inputFrame>
    <extractionArea>2570 2160</extractionArea>
    <originTopLeft>155 0</originTopLeft>
    <deSqueeze>2 1</deSqueeze>

Hi Jesse.
Here’s my three cents.

  • First point: the anamorphic factor, as far as it is proposed to be represented here (i.e. as a single number rather than detailed optical parameters of a cylindrical lens), is nothing more than a vertical scaling (squeezing) factor; so why neglect horizontal scaling?
  • Second point: generally speaking, all you might ever be concerned with in transporting framing metadata along with AMF is squeezing. However, in some advanced image evaluation workflows, you might want to preserve, along with the AMF history, the output resolution of what was exactly viewed. For example, from a compositing or QC perspective, you want to preserve not only that a Scope central extraction of a specific area was made, but also that this 2.39:1 frame was viewed on a sandboxed HD monitor (in Rec.709) rather than on something else (e.g. an 8K Rec.2020 reference monitor, where pixels were viewed 1-to-1 without scaling). In such a case, a full (i.e. both horizontal and vertical) scaling factor may be relevant in AMF.
  • Third point: neither the anamorphic factor nor scaling is, strictly speaking, color-related metadata; as they may both be considered framing parameters, they should be either both in or both out of AMF.

What I’m saying is that the squeeze factor is of course fundamental (mostly for the reason you gave at the end: it’s not always included in raw camera footage metadata because of the lack of smart lenses).
From an imaging-science perspective, I propose to call the squeeze factor <verticalScaling> and have both it and <horizontalScaling> as an independently optional pair of parameters. That way squeezing-only or full scaling can both be used, according to different needs.

I question whether a single stage scale+offset is sufficient.

The format choice is fine, and exact integers are important for QC as well.

But in the general model of processing, including film scans (which are still relevant in restoration), there is the image capture, the extraction of the working image – perhaps with extra margin (Bayer-pattern requirements) – possible desqueezing as noted above, and then extraction of the final work-product frame.

Sometimes there are extractions of TWO different output ratios, and they may be shifted (top-line or center-line, for example); both a 1.85 and a 1.78 (TV) are sometimes extracted when a larger canvas is available. VFX would have been instructed to work in the larger ‘protection format’ the DP shot (protect for 1.78 with a 1.85 cinema release).

If these are not carried as a chain in an AMF, then point-to-point files for each of these steps are needed and have to be managed for the history of the transforms applied. Some of these transforms are unique to a small number of frames, so you will still get some splitting of the AMF over time. I think part of the goal is to recreate, from the ‘camera’ original, the image transforms that get you to the current result, in both color and size? True?

In general, it is far better to have exact IN and OUT pixel integers than to apply an approximate ratio that is never really correct. Sometimes they are cheated because the ratio is not important, the output deliverable is.


I’ve updated the XSD based on the discussion so far.

And the example

As another reason not to use ‘scaling factors’ but go right to the deliverable raster…
the aspect ratio for DCI Scope cinema is NOT 2.39, it is 2048x858 (i.e. 2.386946…:1).
Exact integer line placement is important.

Also for TV, it is not 1.78:1 but rather 1.77777777777…:1 (1920x1080)

All the ratios you have heard of are approximate, and sometimes non-standard fitting choices have to be made – easy with integers, hard with ratios. Another example: some old films have to be shown at 2.4, which was the actual projection ratio; at 2.39 a splice might appear.

So for the same reason we should not use float positions, we should not use float scalers.
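This point can be checked numerically, assuming the DCI Scope container of 2048x858 quoted above (a quick illustration):

```python
# Reconstructing the DCI Scope width from the quoted 2.39 ratio fails:
height, width = 858, 2048

print(width / height)        # 2.3869..., not 2.39
print(round(height * 2.39))  # 2051 -- three pixels off from the real 2048
```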

BTW, I am not talking about the centering or offsetting to get a frame in the middle for letter boxing or pillar boxing, or for odd formats or even for pan and scan, I am just referring to creating the working active image. The origin is ALWAYS going to be relative to the inputFrame size.

I think we should always be able to describe what the flat image size is, even if you are working in an anamorphic working frame.


Hi @JesseKorosi ,

sorry for the late reply, it has been a busy week.

I think both @walter.arrighetti and @jim_houston describe and clarify my points better than I did. I like the <verticalScaling> and <horizontalScaling> idea a lot; I think it solves both unusual squeeze factors and pixel-accurate scaling.

However, I don’t think it’s enough still. Please don’t hate me if I try once more (and one last time, I promise) to express my doubts.

@jim_houston mentioning “active image area” in his last email gives me the chance to argue again for the need for two additional (optional) nodes in the framing tree.

My argument is mainly an attempt to clarify (mostly to myself) what the aim of this framing metadata is and what we refer to when we talk about extraction. I might be stating the obvious here – I’m sure you have all considered the following points and I’m just overkilling this topic – but for the sake of clarifying what I have in mind, I’ll go further.

I think it’s very important to make a difference between how an extraction guide is used on set and how it is used in post.

I think we all agree that extraction guide lines designed for on-set purposes should specify what the operator has to frame for on set and no more, especially if we aim to be able to translate those numbers into a camera-compatible frame-line file. In 99.99% of cases operators only want to see what they need to frame for; too many lines in the viewfinder make their life impossible and distract them.

Post-production framing, on the other hand, in my personal experience, specifies what the working image area should be – in other words, how a given frame should be cropped and adjusted to fit the current post-production stage. Most of the time (I would say 80% of the time) the extraction isn’t the target frame; it is the area from which the target area is obtained afterwards. In other words, in post production we never crop to the target area, but rather to a larger working area.

I know that different AMFs will be generated for different stages of production, so the concept of extraction can vary from set to post – from what the target area is to what needs to be pulled for VFX or DI – but I still think a single instruction defining a frame and a working area isn’t enough. I strongly believe there is a need for a multi-layer system.

If this framing metadata is to automate post workflows, I think it needs to account for these instructions:

  1. INPUT FRAME
  2. EXTRACTION
  3. SCALING
  4. TARGET AREA
  5. ACTIVE AREA

We got the first three nailed down; allow me to argue that we need the fourth one to make the whole thing work, and possibly the fifth one to account for every scenario I have ever had to deal with.

  • TARGET AREA: I previously referred to this as the “blanking” instruction, but after reading Jim and Walter’s posts I think we could call it “target area” – conceptually two different approaches to the same result. Once an image has been cropped and scaled, this instruction tells the software what portion of the frame matches what was framed on set. Implementations can then leave the software/user to decide what to do with it (blank it, draw a reference line, create a different canvas). This is also what would be used on set to calculate the frame lines.

  • ACTIVE AREA would mostly be used for VFX, when the workflow requires the vendor to receive, and deliver back to post, an image different from (usually bigger than) the rendered CG area. To elaborate: what happens 99% of the time on my projects is that we have to account for a 5–10% extra area outside the target area, to allow post production to stabilise, reframe, make a 3D release, make an IMAX release and so on. For these reasons, VFX CG needs to be rendered outside the main target frame, so that once VFX pulls go back to DI, the VFX shots have the same extra room as the drama shots, and the CG has been rendered to account for all needs, like different releases (e.g. IMAX) or further post-production stages (e.g. 3D post-conversion). You don’t want to have to render twice, right?

I reckon that by adding those two extra nodes we could really account for every need both on set and in post.

I have a bunch of projects I can mention, and I can provide documentation if it is required. But I’m sure you all know what I’m talking about here…

Sorry for the long email.


Thank you all for your comments.

I’m in favor of adding Vertical & Horizontal scale factors to indicate anamorphic squeezing (most that I have seen use a float, e.g. 2.0 for traditional anamorphic, 1.3 for Hawk lenses, 1.0 for spherical).

And while I agree that Target Area would make this a more complete snapshot of what resize was done, I am not in favor of adding Target Area for a couple reasons:

  • You will have many different target areas depending on the specific use case and software configuration – to name a few: HD 1920x1080 for dailies, UHD 3840x2160 for reviewing on a consumer TV, DCI 2K or 4K for a projector, etc. The software in which you do these resizes all have different methodologies for where you set your output resolution, the order of operations, etc. Even if we specify a target area, it would most likely get overridden by whatever output resolution the user has set in the specific scenario.

  • What these all have in common, though, is the extraction area, which is what we are trying to communicate with this metadata. It should be possible to pass a single AMF between all of these places and get the same active image to be fit within the target area of choice within the software.

So I guess my question is: with target area, are we trying to solve something outside the scope of ‘this is the area of the image to view’?

Hello Chris,
fair enough if we don’t want to over complicate the framing metadata node.

I just don’t quite understand what the expectations are for the extraction node, as it now seems quite a hybrid concept (especially after adding the V+H scaling): do we expect it to allow software to pre-process a source frame so it can be transformed to meet specific pipeline requirements, or simply to provide an instruction of some sort so that users can visualise a specific region of the source frame?

The latter only requires the extraction node (one could argue that not even the scaling is needed), as its function basically matches that of the camera frame lines on set. If so, then I understand what your point was initially, and I agree: we’ll be fine with extraction+offset.

If, on the other hand, we hope this set of metadata will automate some steps in post – a bit like the metadata contained in Alexa Mini .mxf files, which allows software to crop files recorded with a 4:3 2.8K flag to the right extraction (2880x2160) instead of leaving them Open Gate (which is what the camera really records), or like all cameras shooting with an “anamorphic” flag (ARRI, RED, Sony all do the same), which instructs the software to automatically desqueeze the input frame to the right ratio – then I have to reiterate my points and say that I believe we need the four (maybe five) nodes I was trying to explain in my post above, to make sure that things are instructed properly for each stage of production and post.

To go back to your points: the way I see it, the Target Area is not affected by the specific use case, as it communicates exclusively the area framed on set by the camera operators. In fact, nothing really changes from use case to use case apart from the output scaling (the output resize, which we are currently not even considering here), except when specific needs come into play, especially at the far end of post production (like the ones quoted by Walter and Jim, i.e. center crop, pan and scan, QC, etc.).


  • INPUT FRAME and EXTRACTION as mandatory fields, relating to how the source frame is meant to be extracted (cropped).
  • SCALING as an optional third element of the extraction pipeline, relative to the extraction output, telling how the cropped frame needs to be up/down-scaled.
  • TARGET and ACTIVE as optional instructions aimed at the software that works on the extracted and scaled images, to work out the canvas size (ACTIVE AREA) and the intended on-set framed area (TARGET AREA). These two instructions won’t directly affect/transform the image, only how the software shows it to the user, or what it allows the user to work on.

I will try to give some examples of what the framing metadata section would look like for me, using some real-life extraction guidelines designed on different shows over the years.
To put things in context, however, I would also like to pick a show and share all the frame lines we had to design for Aladdin (Disney, 2019), which was a multi-camera, multi-lens, multi-format show with complex needs. You can download them from here (Dropbox Link):

Let me consider two, quite standard, use cases:

  1. Multiple target frames, with an extraction crop due to lens coverage and extra room for a VFX area (from LIFE – Sony).

Put into context: shooting Open Gate, the lenses chosen by the DP wouldn’t cover it (no surprises there, right?), hence all target frames needed to be calculated from an area the lenses would cover. We would normally just scale the input source to the desired VFX resolution, without cropping, but there was a problem: Sony required a 4K master, VFX costs demanded keeping their bits in 2K, and DI (tech) and the DP wanted to keep the highest possible resolution. We all managed to agree on a 3.2K pipeline, as it matched a full 35mm film gate (so the lenses would cover it) and was the maximum allowed without incurring extra budget for the VFX renders. This way both drama and VFX could be done at the same resolution, and DI would be able to scale to 4K or 2K with better results.
Also, because the project required a 3D post conversion and an IMAX release, there was need of some extra room to allow all that fun too.

This would be the AMF:

<framing extractionName="AM_OG">
	<inputFrame>3424 2202</inputFrame>
	<extractionArea>3280 1844</extractionArea>
	<extractionOriginTopLeft>72 179</extractionOriginTopLeft>
	<activeArea activeAreaName="VFX">
		<activeAreaSize>3280 1728</activeAreaSize>
		<activeAreaOriginTopLeft>0 58</activeAreaOriginTopLeft>
	</activeArea>
	<target targetName="IMAX">
		<targetArea>3116 1642</targetArea>
		<targetOriginTopLeft>82 101</targetOriginTopLeft>
	</target>
	<target targetName="2-39">
		<targetArea>3116 1306</targetArea>
		<targetOriginTopLeft>82 269</targetOriginTopLeft>
	</target>
</framing>
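As a sketch of the kind of sanity check implementing software could run on such a framing element, each child area must fit inside its parent frame. The tag names follow the example above (not a finalized AMF schema), and the check itself is my assumption of how an implementation might behave:

```python
import xml.etree.ElementTree as ET

# A trimmed, well-formed version of the framing element above.
FRAMING = """
<framing extractionName="AM_OG">
  <inputFrame>3424 2202</inputFrame>
  <extractionArea>3280 1844</extractionArea>
  <extractionOriginTopLeft>72 179</extractionOriginTopLeft>
</framing>
"""

def pair(elem):
    """Parse a 'W H' (or 'X Y') text payload into two ints."""
    a, b = (int(v) for v in elem.text.split())
    return a, b

root = ET.fromstring(FRAMING)
in_w, in_h = pair(root.find("inputFrame"))
ex_w, ex_h = pair(root.find("extractionArea"))
ox, oy = pair(root.find("extractionOriginTopLeft"))

# The extraction must sit entirely inside the input frame.
assert ox + ex_w <= in_w and oy + ex_h <= in_h
print("extraction", (ex_w, ex_h), "at", (ox, oy), "fits in", (in_w, in_h))
```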

The same AMF would work for VFX pulls and drama, but not for dailies, since for Avid we normally crop the east-west edges of the frame to the target frame and then leave the north-south to fill the 1.78:1 container, when possible. So the dailies version of the above AMF would look like:

<framing extractionName="AM_OG_dailies">
	<inputFrame>3424 2202</inputFrame>
	<extractionArea>3116 1752</extractionArea>
	<extractionOriginTopLeft>154 225</extractionOriginTopLeft>
	<target targetName="IMAX">
		<targetArea>3116 1642</targetArea>
		<targetOriginTopLeft>0 55</targetOriginTopLeft>
	</target>
	<target targetName="2-39">
		<targetArea>3116 1306</targetArea>
		<targetOriginTopLeft>0 223</targetOriginTopLeft>
	</target>
</framing>

This should cover most of the needs for this project.

  2. Anamorphic 2x squeeze, with extra room for a VFX area (from Aladdin).

The AMF would be:

<framing extractionName="ASXT_4-3_2x">
	<inputFrame>2880 2160</inputFrame>
	<extractionArea>2880 2160</extractionArea>
	<extractionOriginTopLeft>0 0</extractionOriginTopLeft>
	<activeArea activeAreaName="VFX">
		<activeAreaSize>2578 2160</activeAreaSize>
		<activeAreaOriginTopLeft>151 0</activeAreaOriginTopLeft>
	</activeArea>
	<target targetName="2-39-desqueezed">
		<targetArea>2450 2052</targetArea>
		<targetOriginTopLeft>215 54</targetOriginTopLeft>
	</target>
</framing>

This time VFX gets the full gate, as does DI, so no crops are required for post. Once again, though, we need to extract differently for dailies:

<framing extractionName="ASXT_4-3_2x_dailies">
	<inputFrame>2880 2160</inputFrame>
	<extractionArea>2450 2160</extractionArea>
	<extractionOriginTopLeft>215 0</extractionOriginTopLeft>
	<target targetName="2-39-desqueezed">
		<targetArea>2450 2052</targetArea>
		<targetOriginTopLeft>0 54</targetOriginTopLeft>
	</target>
</framing>

This would cover most of the needs for this show.

The general idea is that the frame gets pre-processed by the software following the extraction node; the active and target areas could then be just a visual reference or, in some software implementations, be used to set up the project/timeline or double-check that the existing timeline/canvas size matches what is required. I’m guessing, for example, that Avid would be able to turn the target-frame instructions into a blanking filter for the timeline, which editors now set up manually. DaVinci could do the same (with the blanking).

I understand and now agree that we don’t want to over-complicate things with output scaling resolutions, and should leave the software to adapt these framing numbers to the desired resolution using their own methods; but if we add those additional nodes we could at least allow all the useful information to be carried through and properly communicated to each vendor.

As I’m writing this I’m also realising that the target area should really be expressed as relative numbers, like ARRI does in its XML frame lines, so that the pixel count can be calculated by the software after the internal scaling (if you scale a 2880 px source frame to 1080 px, the target-area numbers won’t mean much unless they get scaled as well – so maybe it’s easier if they are written down as relative instructions in the first place).

I know I’m insisting a lot here, but I just wanted to make sure that my points were clear.
I’m not going to bring this up again if you guys think I’m overthinking it.

As usual, sorry for the thousand words.


Thanks for the detailed message here! I think really getting down to real world scenarios is what will set this up for success, so this was great.

For the dailies example, you have the extraction area, so this gets you down into the 1.78 frame they want delivered for editorial. In Colorfront, as an example, this is what we’d actually choose as our framing/render preset. But then is the idea that the ‘target’ is just metadata that is there should they want to apply that 2.39 matte back on in Avid? If we’re really trying to pare things down, this could be something out of scope. Definitely nice, but not necessarily a must in my opinion.

What if the AMF looked like this:

With this example, there will be a lot of jobs that just have one or two framing options within, but some, like your job, would have many more. At that point it’s all in one self-contained file, and still pretty easily manageable.

And then for our ASC Advanced Data Management Committee looking to also get framing data for non ACES jobs, we put these into columns (Like the CDL):
Width: 4448
Height: 3096
extraction001: (3280 1844)(72 179)(1.0 1.0)
extraction002: (3280 1728)(0 58)(1.0 1.0)
extraction003: (3116 1642)(82 101)(1.0 1.0)
extraction004: (3116 1306)(82 269)(1.0 1.0)
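If the column format were adopted, parsing it would be straightforward. A sketch assuming the `(w h)(x y)(sx sy)` layout shown above (Jesse’s sketch, not a defined format):

```python
import re

line = "extraction001: (3280 1844)(72 179)(1.0 1.0)"

name, rest = line.split(":", 1)
# Pull each parenthesised group and split it into numbers.
size, origin, scale = (
    tuple(float(v) for v in group.split())
    for group in re.findall(r"\(([^)]+)\)", rest)
)
print(name, size, origin, scale)
```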

Hey Jesse,
thanks for the feedback!

For the dailies example, you have the extraction area, so this gets you down into the 1.78 frame they want delivered for editorial. In Colorfront, as an example, this is what we’d actually choose as our framing/render preset. But then is the idea that the ‘target’ is just metadata that is there should they want to apply that 2.39 matte back on in Avid? If we’re really trying to pare things down, this could be something out of scope. Definitely nice, but not necessarily a must in my opinion.

Yes, that’s exactly the idea.
I’m sorry if it sounds out of scope; as I said, I’m struggling to understand the exact scope of the whole framing metadata if the goal is not to automate these things.

Your proposed structure would work too for me, absolutely.

I have, however, some comments/concerns on it:

  1. The problems I see with leaving the Extraction node only (without the Target and Active Area nodes) are:
  • If the Extraction metadata only needs to express what has been framed on set, then it cannot be used to express an extraction (crop). If you don’t instruct an extraction (crop), then you leave the hardest part of the work to the users: cropping and scaling a frame to specs… see two points below for what I think about this.
  • If the Extraction metadata could express both “what has been framed on set” and “the crop” required to adjust an input frame to a specific need, then the scope/intent of that metadata will change from use case to use case, even within the same workflow, making the whole thing a bit confusing. For instance: on set, it would communicate what you are meant to frame; for dailies or post, it would be how the frame needs to be cropped before processing; for VFX, it would go back to expressing what was framed on set; etc. Which brings me back to my question: what are we trying to achieve here?
  • If the Extraction only needs to express how to prepare/crop/adjust an input frame (as I propose), then by not adding the Target Area you rely on the user/vendor to do so correctly, as per the status quo. Hence we are back in the same old conundrum of having to trust, check, and verify everything, every time.

I’m not saying that my proposed structure would fix that problem once and for all, but it would definitely be a step towards semi-complete framing-extraction automation by actually carrying all the required data in one package. I’m not really too worried about the Avid blanking (although I am to a point); I’m more worried about VFX vendors (on our last job we had 14 of them, from around the world, and I’m sure you know what a PAINFUL job it is having to check pixel-accurate turnovers for each delivery coming from each vendor, every time).

  2. My proposal can be condensed into a single AMF as well, just by adding multiple Framing trees; that’s why I added an “extractionName=” attribute, so they can be distinguished from each other and co-exist.
  3. Your proposal will be hard to translate into a camera-compatible frameline instruction (let’s use the Alexa XMLs as an example) when multiple framing targets are required (all the time for me). I mean, you can, but the user would theoretically have to append multiple extractions together manually. In my proposal, because multiple Target Areas are allowed within a single Extraction node, these could be translated into multi-target camera-compatible framelines (this would work for RED too, using their “absolute values”).
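To make that last point concrete, a single Extraction node carrying multiple Target Areas could look roughly like this (element names and values are illustrative, not the actual proposal; the 2.39 raster is 3840 / 2.39 ≈ 1607, rounded to an even 1606):

    <framing extractionName="UHD_center_crop">
        <inputFrame>4448 3096</inputFrame>
        <extractionArea>3840 2160</extractionArea>
        <originTopLeft>304 468</originTopLeft>
        <targetArea>3840 2160</targetArea>
        <targetArea>3840 1606</targetArea>
    </framing>

A camera could then render both Target Areas as frame lines from the one node, instead of the user appending separate extractions manually.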

Hope this makes sense.



Currently, I have to create pixel-accurate framing charts for any of the high-end jobs we do at Sim. As you know, the cameras’ framing charts are never pixel accurate. They are a good reference, but that’s it. We do more TV than features here, so it’s often less about many framing options for each camera, and more about plenty of cameras to deal with. So for one of the new Netflix shows I’m on right now, as an example, which has actually been pretty light on various cameras, I’ve created a chart for the:

  • AlexaMini_OpenGate_3424x2202
  • Phantom4K
  • SonyVenice4K

I create these diagrams, people create their framing presets knowing my chart is pixel accurate, then they apply said framing preset on the camera-recorded chart, and we see how it lines up. It’s never bang on, because again… camera charts never are. But it’s a great reference!

So to me, the goal here would be to stop having humans manually choose how much to zoom in, pan/tilt, crop, matte, etc. Have the software read the framing preset and apply it. And then you could even save that as a preset in your software.
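As a sketch of what that automation boils down to: once the active (extraction) size and the delivery raster are known, the zoom/pan/matte decision is just a fit computation (a simplified, center-fit model; a real conform would also honor scale factors and non-centered offsets):

```python
def conform(active_w, active_h, target_w, target_h):
    """Uniformly scale an active area to fit a target raster, centered;
    the leftover area becomes the matte (letterbox or pillarbox).
    Returns (scale, offset_x, offset_y, scaled_w, scaled_h)."""
    scale = min(target_w / active_w, target_h / active_h)
    scaled_w = round(active_w * scale)
    scaled_h = round(active_h * scale)
    offset_x = (target_w - scaled_w) // 2
    offset_y = (target_h - scaled_h) // 2
    return scale, offset_x, offset_y, scaled_w, scaled_h

# A 2.39 active area (3840x1606) dropped into a 1.78 UHD window:
# no zoom needed, matted 277 px top and bottom.
print(conform(3840, 1606, 3840, 2160))  # (1.0, 0, 277, 3840, 1606)
```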

Because not every job has people like you and me making these pixel-accurate charts, dailies may frame one way, and VFX then re-creates the framing trying to match the QT reference. But maybe dailies f-ed up the framing and now VFX is just matching it, but no one hears about it because they are both wrong. haha. Anyhow:

I think we’re on the same page for the goals :slight_smile: But what I’m curious to hear is why the example I gave would not allow software to make the crop/zoom, etc. to match the frame from on set. Where I wrote extraction frame, really what this meant is: this is the area that is intended to be active. If this active area happens to be 2.39, and you throw it into a 1.78 window, the software would know to matte it.

My thought had been that in camera, you set up your frame lines the same way you currently do. Each of these is then a named extraction set of parameters from the saved XML. But in camera you could choose to use any of these four optional frame lines. I have never actually created these frame lines, though, and am not sure how they currently correlate with what gets saved into an XML. So if you’re saying this way is not doable, copy that. I don’t think I fully get it yet, though.

Maybe the disconnect is that I don’t understand the difference between your point about the framing on set versus the crop? Is this, as an example, people framing on set for 1.78, but the crop would bring it down into a 2.39 as a different frame? If so, wouldn’t the 1.78 be one frame line you set in camera, and the 2.39 be another? Then we have both.

I assume I’m missing something, but thought I’d also offer up the idea of chatting over the phone, versus continuing in this thread?

P.S. I totally hear your point about lots of vendors. We have 28 right now on Watchmen… It’s nuts!