July 25, 2005
Chroma Sampling: An Investigation
By Graeme Nattress www.nattress.com
In this article I hope to be able to show if there are any picture quality benefits to bringing in DV (I'll refer to miniDV, DVCAM and DVCPro as DV as they all share the same codec) to Apple's Final Cut Pro over uncompressed SDI compared to the "normal" route of transfer over Firewire and using the Apple DV codec. I will also investigate new algorithms for dealing with DV chroma sampling issues. Many thanks to Peter McAuley of AXYZ Edit in Toronto who made this article possible and AOL Canada for permission to use the footage of the boy in the red shirt. Over the course of writing this article, there was a bit of "scope creep" as more ideas to analyse the video were thought of, and we decided to throw analogue BetacamSP into the mix for consideration. Near the end of writing the article I came across a controversy over whether Digital Betacam is an 8bit format or a 10bit format, so that too was included into the scope of the investigations.
Before I start, I'd first like to look at some terminology and an explanation of chroma sampling.
Component digital video is often referred to as YUV, but this is not an accurate description: YUV instead refers to a set of intermediate quantities used in the formation of analogue composite video. The correct terminology for the components of digital video is Y' Cb Cr. The Y' represents luma; the ' is important, reminding us that it is non-linear (gamma corrected). The Cb and Cr are chroma components that are often wrongly referred to as U and V. Cb is a scaled and offset version of B' - Y', and Cr is a scaled and offset version of R' - Y'. If you are interested in more details, check out Charles Poynton's excellent articles at http://www.poynton.com/
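To make the relationship between R'G'B' and Y'CbCr concrete, here is a minimal sketch in Python. It assumes the Rec. 601 luma weights and the usual 8bit studio code ranges used for standard definition video; the article itself does not name a standard, and the function names are purely illustrative.

```python
# Rec. 601 luma weights (an assumption: the article does not name the standard).
KR, KG, KB = 0.299, 0.587, 0.114


def rgb_to_ycbcr(r, g, b):
    """Convert gamma-corrected R'G'B' (0.0-1.0) to Y' (0-1) and Cb, Cr (-0.5 to +0.5)."""
    y = KR * r + KG * g + KB * b
    cb = (b - y) / (2 * (1 - KB))  # B' - Y', scaled to +/-0.5
    cr = (r - y) / (2 * (1 - KR))  # R' - Y', scaled to +/-0.5
    return y, cb, cr


def to_8bit_codes(y, cb, cr):
    """Map to 8bit studio range: Y' 16-235, Cb and Cr 16-240 centred on 128."""
    return round(16 + 219 * y), round(128 + 224 * cb), round(128 + 224 * cr)


if __name__ == "__main__":
    for name, rgb in [("white", (1, 1, 1)), ("black", (0, 0, 0)), ("red", (1, 0, 0))]:
        print(name, to_8bit_codes(*rgb_to_ycbcr(*rgb)))
```

Note that white and black both land on Cb = Cr = 128: neutral colours carry no chroma, which is why the chroma components can be treated separately from the luma.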
The CCDs that capture our image inside our video camera capture colour by the use of Red, Green, and Blue filters. The phosphors in our CRT monitor, and the filters in our LCD monitor and DLP projector, also work in Red, Green and Blue. Why is it then, that digital video is stored and manipulated as separate luma and chroma components? Separate luma and chroma are used so that the resolution of the chroma can be reduced with respect to the resolution of the luma so that large savings can be made in the amount of data that needs to be transmitted - it is a form of compression. This reduction in the resolution of the chroma components works perceptually, as our human vision systems are not as able to see fine details in colour as they are in lightness.
There are a number of methods of chroma sampling, and they each have terminology that refers to the chroma resolution (the second and third numbers) as compared to the luma resolution (first number):
4:4:4 Full resolution luma is represented by the number 4, and as the chroma components Cb and Cr are also 4, there is no reduction in resolution. 4:4:4 sampling is mostly used for RGB images; it can also be used for Y'CbCr, although no camera records 4:4:4 Y'CbCr.
4:2:2 Full resolution luma, and half (2/4 = 0.5) resolution horizontally on the chroma components. This is the traditional broadcast standard for chroma sampling and is used by DigiBeta, DVCpro50 etc.
4:1:1 Full resolution luma and quarter (1/4 = 0.25) resolution horizontally on the chroma components. This is the system used by NTSC DV and PAL DVCPro.
4:2:0 Full resolution luma, and half resolution in the horizontal direction and vertical direction for the chroma components. 4:2:0 is a very complex chroma sampling with many variants depending on whether the video is progressive or interlaced, or if it is being used by PAL DV or MPEG2. 4:2:0 compresses the resolution of the colour to 1/4, just as 4:1:1 does, but whereas the compression in 4:1:1 is horizontal only, the compression in 4:2:0 is horizontal and vertical. The illustration below is for PAL DV 4:2:0 chroma sampling.
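The data savings of each scheme in the list above can be worked out with a little arithmetic. The sketch below counts samples per frame before any further compression; the 720 x 480 frame size and the function names are my own illustrative assumptions, and the many siting variants of 4:2:0 are ignored.

```python
# (horizontal divisor, vertical divisor) applied to each chroma plane.
SCHEMES = {"4:4:4": (1, 1), "4:2:2": (2, 1), "4:1:1": (4, 1), "4:2:0": (2, 2)}


def samples_per_frame(width, height, scheme):
    """Total number of Y', Cb and Cr samples stored for one frame."""
    h_div, v_div = SCHEMES[scheme]
    luma = width * height
    chroma = 2 * (width // h_div) * (height // v_div)  # Cb plane plus Cr plane
    return luma + chroma


def relative_size(width, height, scheme):
    """Size of a frame relative to the same frame sampled at 4:4:4."""
    return samples_per_frame(width, height, scheme) / samples_per_frame(width, height, "4:4:4")


if __name__ == "__main__":
    for scheme in SCHEMES:
        print(scheme, samples_per_frame(720, 480, scheme), round(relative_size(720, 480, scheme), 3))
```

4:1:1 and 4:2:0 come out at exactly the same size, half that of 4:4:4; they differ only in where the chroma resolution is given up.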
Test 1 - Apple DV codec compared to Digital Betacam
Before we look at the DV over SDI v Firewire comparison, it is useful to see what a full quality 4:2:2 Digital Betacam image looks like, so that we have a sense of how both the SDI DV and Firewire DV compare. Due to the nature of these tests we do not have the same footage for both this comparison and the later tests, but I still think this is a worthwhile exploration.
To make these comparisons possible, I wrote a few plugins for Apple's Final Cut Pro which allow us to examine the Y' Cb and Cr components separately (G Take). Also, because some codecs (like the uncompressed 4:2:2 codec used for the Digital Betacam captures) smooth chroma and some (like the Apple DV codec) leave chroma unsmoothed, I wrote plugins (G Make 422 & G Make 411) to show us unsmoothed chroma where the codec wants to smooth it!
The Digital Betacam footage used is from 35mm film (Kodak 5293) transferred on a Rank C Reality with a Da Vinci 2K colour corrector to a DVW500 Digital Betacam deck. The Digital Betacam footage was brought into Final Cut Pro using a Kona SD card (with Decklink 4.6 driver) via SDI as fully uncompressed 8bit and 10bit files. With regards to chroma sampling, there is no real difference between the 8bit and 10bit files, but the 10bit files are of an overall higher quality. We will look at the difference between 10bit and 8bit capture later.
Here are the block diagrams that show the path the video took to get to Final Cut Pro.
The Uncompressed 4:2:2 codec smooths the chroma and this cannot be turned off, so G Make 4:2:2 was used to remove the smoothed chroma samples and replace them with copies of the unsmoothed chroma pixels
The Apple DV codec does not smooth chroma by default and has no option to turn it on, so the Apple 4:1:1 keying filter was used to smooth the chroma
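The difference between unsmoothed and smoothed chroma can be sketched on a single row of chroma samples. This is not the code of the Apple 4:1:1 filter or of the G Make plugins, whose internals are not described here; it is a minimal illustration that assumes plain sample replication for the unsmoothed case and plain linear interpolation for the smoothed case.

```python
def upsample_replicate(chroma_row, factor):
    """Unsmoothed: repeat each stored chroma sample `factor` times."""
    return [c for c in chroma_row for _ in range(factor)]


def upsample_linear(chroma_row, factor):
    """Smoothed: linearly interpolate between neighbouring stored samples."""
    out = []
    for i, c in enumerate(chroma_row):
        # At the end of the row there is no neighbour, so hold the last sample.
        nxt = chroma_row[i + 1] if i + 1 < len(chroma_row) else c
        for k in range(factor):
            out.append(c + (nxt - c) * k / factor)
    return out


if __name__ == "__main__":
    stored = [100, 200]  # two 4:1:1 chroma samples covering eight luma pixels
    print(upsample_replicate(stored, 4))
    print(upsample_linear(stored, 4))
```

The replicated row jumps from 100 to 200 in a single step, which is the blockiness seen in unsmoothed DV chroma, whereas the interpolated row ramps between the two stored values.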
As you can clearly see, there is a vast difference in the resolution of the chroma Cb and Cr when the components are separated out. These images clearly demonstrate the difference between 4:1:1 and 4:2:2 colour sampling. It should be remembered, however, that when a DV deck converts its signal from digital to analogue for output over component, composite or S-Video, the chroma is smoothed in a very similar manner to that shown here, and hence the artifacts visible on your computer screen are much less obvious when viewed on a TV monitor.
Test 2 - DV over Firewire compared to DV over SDI
Traditionally DV has always been captured using its native Firewire connection that effectively allows an NLE like Final Cut Pro to copy the exact data on tape to the hard drive, placing it in a Quicktime wrapper for easy use. Apple supply a DV codec for Quicktime and it has evolved over the last few years to become very capable. However, the Apple DV codec, unlike the Avid DV codec, does not smooth the DV's blocky 4:1:1 chroma. This may sound like a bad move on Apple's part, but it allows the codec to have better generational quality. As an option, Apple include a filter named 4:1:1 in the keying section of filters to smooth DV 4:1:1 chroma.
Some professional DV decks, like the Sony DSR-1800 that was used for these tests, have an SDI output option. SDI, or Serial Digital Interface, is designed for a professional level 4:2:2 uncompressed transfer of video between decks. It can also be used to allow Final Cut Pro to capture 4:2:2 uncompressed video if you have a suitable capture card like the Kona SD used in these tests. Because DV is neither uncompressed nor 4:2:2, the deck must first transform the data into a format suitable for transfer over SDI. The DV data on tape gets uncompressed, and the 4:1:1 chroma gets upsampled to 4:2:2. I have not been able to find out the exact mathematical method by which the chroma is upsampled, but we shall see in these tests what it looks like compared to both unsmoothed and smoothed DV via the Apple DV codec.
This is the workflow used to examine the DV footage over Firewire.
This is the workflow to examine the DV footage over SDI.
The next image was generated using the above workflows, and the footage was shot on a Canon XL2. It was brought in via SDI from the DSR-1800 deck through the KonaSD card into Final Cut Pro. The Firewire footage was brought in via Firewire from the DSR-1800 deck. Great care was taken to align the captures so that the frames match precisely, although due to timecode uncertainties, this was tricky. The still images were brought into Photoshop, aligned and cropped to allow easy comparison.
With the naked eye, it is practically impossible, in this example, to see much difference in the Y'CbCr colour pictures. It is also very difficult to see any difference at all in the Y' pictures. This is good. There is no reason why the Y' should be any different between getting decoded by the hardware in the deck compared to being decoded by Apple's DV codec. The Cb and Cr chroma component images are very telling. It is quite obvious that the unsmoothed Firewire chroma images are very inferior to the chroma via SDI, but that the smoothed Firewire chroma images are practically identical to the SDI chroma images.
To highlight any subtle differences I decided to examine another piece of footage shot on the Canon XL2. This time I did not look at the unsmoothed Firewire chroma as we know it's going to look bad. Instead I concentrated on the difference between the smoothed Firewire Chroma and the SDI chroma.
In this case I have highlighted the differences by using a scaled difference technique in Photoshop whereby the difference between the two images being compared is scaled to allow subtle differences to be easily seen. Because we have no reference image, it is impossible for us to tell which is better, SDI or Firewire, but we can tell that the differences between them are subtle indeed. There seems to be a subtle, random difference between the two Y' components, the origin of which is hard to determine due to its very low level. The chroma components are certainly different, especially on the visible edges. Again, the differences are at such a low level that they are practically inconsequential.
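For readers who want to try a scaled difference outside Photoshop, the idea can be sketched in a few lines. This is my own minimal version, assuming the absolute per-pixel difference is multiplied by a scale factor and clipped at white; the exact arithmetic Photoshop applies may differ, and images are represented here simply as lists of rows of 8bit values.

```python
def scaled_difference(image_a, image_b, scale, max_value=255):
    """Absolute per-pixel difference, multiplied by `scale` and clipped to max_value."""
    return [
        [min(max_value, abs(a - b) * scale) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(image_a, image_b)
    ]


if __name__ == "__main__":
    sdi = [[10, 20, 30]]
    firewire = [[10, 22, 90]]
    # A difference of 2 levels becomes 16; a difference of 60 clips to white.
    print(scaled_difference(sdi, firewire, scale=8))
```

Identical pixels stay black, so anything visible in the result is a genuine difference between the two captures, exaggerated by the scale factor.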
Test 3 - DVCAM, Digital Betacam and BetacamSP Compared.
In this test we take the original video clip that we used in Test 1, of the boy in the red shirt, and we use the Digital Betacam version as a reference master. From the Digital Betacam, dubs were made to BetacamSP via component to a UVW1800 BetacamSP deck, and to DVCAM via SDI.
The Digital Betacam was brought into Final Cut Pro via SDI and the Kona SD card again,
and the DVCAM was brought in by both SDI and Firewire.
The BetacamSP was brought in by multiple methods: by using the Digital Betacam deck as a high quality component to SDI converter to allow the analogue video to be brought in via SDI uncompressed 4:2:2, and via a component dub to DVCAM, and from there via SDI and Firewire.
This produced a lot of data which I've summarised in the picture below.
Looking carefully at the results, I can see the instant superiority of the Digital Betacam master, in both the luma Y' component and both the chroma Cb and Cr components. It's hard to make out, but the Y' on the BetacamSP captures is slightly softer than the original Digital Betacam and the DVCAM. Again the DVCAM over SDI is practically identical to the DVCAM over Firewire after it has been smoothed.
The chroma on the BetacamSP is very interesting as it is significantly softer than the 4:2:2 chroma of the Digital Betacam, but also appears slightly sharper than the 4:1:1 chroma of the DVCAM. The BetacamSP to DVCAM dubs look worst of all, which is not surprising given that they have gone down an extra generation compared to the other examples.
To see this more clearly, I have produced two sets of differences, the first is a plain difference, the second has the differences scaled by a factor of 8. I could not scale the differences to the factor of 128 as the differences are very much more pronounced.
Differences are unscaled in this diagram.
Differences scaled by a factor of 8.
The scaled difference pictures are very interesting. I'm not quite sure why the Y'CbCr colour difference shows such a large variation between DVCAM Firewire smoothed and DVCAM SDI, but again, the luma components are practically identical and the chroma components very similar. It's hard to say which is better, but I'd give the edge to the SDI chroma upsampling, although it is a difference that's unlikely ever to be visible without such analysis. Final Cut Pro's scaling algorithm is fairly poor, so if a better scaling algorithm was employed it is doubtful whether even this slight difference would remain between the two methods.
The BetacamSP shows the largest luma differences from the original Digital Betacam. It is interesting how the right hand edge of the boy's red shirt shows a significant luma difference, probably caused by a subtle but noticeable ringing / sharpening of the image coupled with the slight blurring noticed in the previous test. The chroma of the BetacamSP is also showing significant differences from the original Digital Betacam, of similar magnitude to that of the DVCAM differences, but again we see clearly that the chroma resolution is somewhere between that of Digital Betacam and DVCAM.
To allow you to see the subtly lower resolution of the BetacamSP luma compared to the Digital Betacam and DVCAM, I have extracted out just the luma differences and placed them side by side. Again, it's tricky to see but I'm noticing ringing / sharpness artifacts on the right hand edge of the boy's shirt and a thickening in the stalk and head of the grasses on the left of the picture. There is a general softness and subtle lack of detail in the BetacamSP luma.
I found the differences above hard to see so I've enlarged the images in Photoshop by a factor of 4 (nearest neighbour scaling) to allow them to be seen more clearly.
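Nearest neighbour scaling is the right choice for this kind of inspection because it only repeats existing pixels and never invents new values by blending. A minimal sketch of the enlargement, with an illustrative function name and images again held as lists of rows:

```python
def enlarge_nearest(image, factor):
    """Enlarge by an integer factor, repeating each pixel `factor` times in both directions."""
    out = []
    for row in image:
        wide = [p for p in row for _ in range(factor)]  # repeat horizontally
        out.extend(list(wide) for _ in range(factor))   # repeat vertically
    return out


if __name__ == "__main__":
    for row in enlarge_nearest([[1, 2], [3, 4]], 2):
        print(row)
```

Every value in the enlarged image also exists in the original, so any softness or ringing seen in the enlargement comes from the video itself and not from the scaling.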
Test 4 - The 10bit difference
During the course of these experiments, it came to my attention that there is some confusion in some people's minds over whether Digital Betacam is a 10bit format or an 8bit format. In digital video, 10bit or 8bit refers to the precision of the quantisation of the luma and chroma components. A 10bit format has four times the precision compared to an 8bit format, so there should be a significant and measurable difference.
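The "four times the precision" figure follows directly from the number of quantisation levels: 2^8 = 256 against 2^10 = 1024. The sketch below checks this numerically. For simplicity it assumes the full code range is used for a 0.0 to 1.0 signal, whereas real video reserves some headroom and footroom; the function names are illustrative.

```python
def quantise(value, bits):
    """Quantise a 0.0-1.0 value to the nearest of 2**bits levels, returned as a float."""
    top = (1 << bits) - 1
    return round(value * top) / top


def worst_error(bits, samples=100001):
    """Largest quantisation error found over an evenly spaced sweep from 0.0 to 1.0."""
    worst = 0.0
    for i in range(samples):
        value = i / (samples - 1)
        worst = max(worst, abs(quantise(value, bits) - value))
    return worst


if __name__ == "__main__":
    print("levels:", 1 << 8, "vs", 1 << 10)
    print("error ratio:", worst_error(8) / worst_error(10))
```

The worst case error at 8bit comes out at roughly four times that at 10bit, which is the measurable difference the following test sets out to find.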
This test involved the use of Apple's Shake due to its floating point compositing precision and flexible tool set. I also received some help from Apple Shake support in the creation of some of the analytical scripts.
First, to get an idea of what a known 10bit and a known 8bit video look like under the test script, a 16bit gradient from black to white was created in Shake, and exported as an 8bit movie using the 8bit uncompressed Black Magic Design codec, and also exported as a 10bit movie using the 10bit Black Magic Design codec. These original movies were kept for comparison and also dumped to Digital Betacam tape, and recaptured over SDI using the KonaSD with the 10bit Black Magic Design codec.
By examining the output of the PlotScanline1 node, I hoped to be able to determine 8bit or 10bit by looking at the smoothness of the resulting line.
The resulting image showed to me that Digital Betacam is indeed a 10bit format.
Next was the development of the script to allow me to determine whether an arbitrary image was 8bit or 10bit. This involved the addition of a ColorX node and Histogram node to the script. The idea is to take the scanline and colour each pixel in the line a different shade of grey depending on its height, and then to plot the entire image as a histogram. The resulting graph is scaled so that it displays full scale, and any gaps will show where there is no luma value. Of course, it must first be tested on known data:
From the above images it is clear that 8bit images leave clear gaps in the histogram, whereas 10bit images do not. Next I repeated the experiment upon the Digital Betacam video and the dubs to BetacamSP and DVCAM we used for the earlier chroma sampling experiments. Because these are colour videos, the resulting histogram was also displayed in colour, although in RGB rather than Y'CbCr. There is no option in Shake to deal with video in its native Y'CbCr format and any format conversion in Shake was found to muddy the results somewhat. Shake was chosen for these experiments because of its ability to handle high precision video, and its useful tools for analysing images.
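The histogram gap test can be reproduced outside Shake on a synthetic ramp. The sketch below is not the Shake script; it is a minimal stand-in that assumes an 8bit signal is carried in a 10bit file by padding each code value with two zero bits, which may not be exactly what a given deck or codec does.

```python
def ramp(bits, length=4096):
    """A black to white gradient quantised to the given bit depth."""
    top = (1 << bits) - 1
    return [round(i / (length - 1) * top) for i in range(length)]


def promote_8_to_10(codes):
    """Carry 8bit codes in a 10bit container by appending two zero bits (an assumption)."""
    return [c << 2 for c in codes]


def unused_levels(codes):
    """Count the code values between the darkest and brightest pixel that never occur."""
    used = set(codes)
    return sum(1 for level in range(min(used), max(used) + 1) if level not in used)


if __name__ == "__main__":
    print("true 10bit gaps:", unused_levels(ramp(10)))
    print("8bit in 10bit gaps:", unused_levels(promote_8_to_10(ramp(8))))
```

The true 10bit ramp uses every level, while the promoted 8bit ramp leaves three of every four levels empty, which is the comb pattern of gaps that shows up in the histogram.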
From these histograms we can see quite clearly whether a video format is 8bit or 10bit. This does not, however, mean that the quality of the video is 8bit or 10bit, just that it's using or not using 8bit or 10bit quantisation levels. For instance the 10bit DVCAM image does not show all the missing quantisation levels that the 8bit formats do - it appears to be somewhere between 8bit and 10bit. We know that DVCAM is 8bit in nature, so the filling in of the missing quantisation levels could be caused by the transfer to Digital Betacam or the 10bit codec used, or perhaps by Shake dealing with the video as RGB rather than Y'CbCr. BetacamSP, being an analogue format, can be digitised at whatever quantisation you choose, whether that be 8bit or 10bit. That does not mean that BetacamSP is equivalent in quality to 10bit, however, as noise from the analogue tape will be a limiting factor.
The limiting factor in the quality of DV in Final Cut Pro is related to the method by which the 4:1:1 chroma is dealt with. By using the SDI output into an uncompressed Final Cut Pro editing system, DV chroma can be quickly and easily upsampled to 4:2:2, although the upsampled chroma is visually lacking compared to a reference 4:2:2 Digital Betacam original. If SDI outputs are not available, then a traditional Firewire approach to DV capture can be used if care is taken over the workflow to ensure that the DV 4:1:1 chroma is upsampled before a chroma key or before the DV footage is "bumped" up to uncompressed to be mastered out to Digital Betacam or other 4:2:2 format video tape. There is a quality difference between the two approaches. The luma and chroma components are practically identical in both cases, but the chroma generated from the hardware upsample in the DSR-1800 DVCAM deck is slightly closer to the original Digital Betacam 4:2:2 chroma. The difference is of such a small magnitude, however, that I recommend you pick your DV capture method based upon workflow requirements rather than any quality difference between the two methods.
By including BetacamSP in these experiments I hope I have broadened the reader's understanding of how both DVCAM and BetacamSP compare to a reference of Digital Betacam. After performing these experiments and carefully examining the video footage, I find it hard to decide whether BetacamSP is better or worse than DVCAM, because although BetacamSP offers a quality of chroma that appears to be about half way between that of DVCAM and Digital Betacam, its luma softness and edge enhancement ringing detract more visibly. The chroma differences would seem to require special analysis to be easily seen by eye, whereas the luma differences are obvious as the video is playing without the aid of analysis tools.
However, this is not the end of the story. I have spent the last year working with the chroma components of DV format video in an attempt to develop smart algorithms that would allow the quality of upsampled chroma to be improved beyond what "simple" interpolation offers especially for use when chroma keying from DV sources or for up-rezzing DV to HD formats.
Again using the footage of the boy in the red shirt, I applied two versions of my algorithm, labelled G Nicer (a) and G Nicer (b). One of the benefits of using your own algorithm is that you can tweak the settings to your own liking. The (a) and (b) represent different settings of the same algorithm. The version of G Nicer being used is a beta of V2.0.
The algorithm used in G Nicer (a) and G Nicer (b) is designed to enhance edges and one of the main uses for it is to allow better chroma keying with DV footage. In the (a) version it is quite clear that it does indeed both find and enhance the sharpness of edges, and its ability to add in detail can be seen in the grass to the left of the boy. (b) uses a setting which aims for a smoother balance of edges and detail and seems to deal with the details and edge of the red shirt better, at the expense of absolute sharpness. Now for a more detailed close-up look at the boy's shoulder (scale factor 4).
In close-up, you can see that the Cb component is quite blocky in all the DV versions, however the DV via Firewire smoothed with the Apple 4:1:1 filter is more blocky than the rest. This is purely down to the interpolation algorithm used by Apple. G Nicer (b) fares slightly better at the neck, but is blocky on the shoulder. The DV over SDI is again slightly better than the Apple 4:1:1 filter. I can see definite improvements with the G Nicer (b) algorithm which manages to match the appearance of the original Digital Betacam better than the rest, although it still has some blockiness visible.
From this, it can be seen that by using different algorithms to deal with DV's 4:1:1 chroma sampling, significant improvements to its appearance can be made.
copyright © Graeme Nattress 2005
This article first appeared on www.nattress.com and is reprinted here with permission.