In defense of interlaced video

There’s been a lot of chatter about interlaced vs. progressive video and how progressive video is inherently superior. Interlaced video, in comparison, is said to be Just Plain Evil. Perhaps it’s my analog video background, but I have a far more favorable view of interlaced video.

Given a choice between interlaced and progressive video at the same resolution…and no bitrate constraints…I’d of course choose progressive. Progressive video is always easier to deal with than interlaced, and it’s so easy to process interlaced video incorrectly and get weird artifacts. But progressive’s not always an option. For ATSC today, your choices are 720p/59.94 or 1080i/59.94. 1080p/59.94 generally requires a higher bitrate than 1080i/59.94 to achieve the same perceptual quality, especially when you consider the amount of work that’s gone into post-processing interlaced content so it looks good on a progressive display.

Some types of content benefit from 1080i more than others, of course. Movies and shows shot in 1080p/24 or 1080p/29.97 tend to do well when broadcast at 1080i. The reason is that there aren’t 60 frames/s to begin with, and through some processing at the television (inverse telecine in the case of 24p source content), it’s possible to recover something resembling the original frames. Sports (and in general content with a lot of motion) tends to benefit more from progressive video. Here it’s generally preferable to sacrifice a bit of resolution in favor of a higher (full-frame) refresh rate.

Vidiots and Geeks

There’s a huge misconception about interlacing artifacts, particularly among those who grew up in the digital television age with flat panel displays. Most people think of the dreaded fingering that can occur when two fields are slapped together:

Well, if you see things like that, then the video’s being processed incorrectly. And no, you are not allowed to slap two fields together that way.

Interlaced video grew out of analog television, and understanding exactly how interlaced video works in the analog world gives a lot of insight into how it’s best handled in a progressive world.

The introduction of computers to analog television wasn’t exactly graceful. Back in the 80s and 90s, when computers were first being used to manipulate TV, one of the common complaints (at least from my “vidiot” colleagues on the true video side of the industry, at companies like Ampex and Abekas) was that computer (graphics) people didn’t know video. There were a number of complaints – performance (dropping a field is not acceptable under any circumstances), color spaces (no, we do not use graphics RGB. Are you crazy?), pixel aspect ratios (square? Square? Maybe you are, but our pixels aren’t!)…and, not least, an utter lack of understanding of interlaced video. Even MPEG-1, an ISO standard, got it wrong (from the vidiot’s perspective) by supporting only (progressive) frames. It wasn’t until MPEG-2 that support for fields came along.

Why fields?

In the US, a broadcast TV signal has 6 MHz of bandwidth available to it. For analog NTSC there are 525 horizontal lines in a frame and 29.97 frames/s. This translates to 15734 lines/s. Some parts of each line (horizontal blanking) are reserved for signaling to the TV and don’t get “drawn” (in CRT-speak, the picture is “painted” or “drawn” by the electron gun in the CRT). When all is said and done, with 6 MHz of bandwidth and 15734 lines/s, there are about 330 vertical lines per frame. (You can roughly think of this as 330 pixels per line. Do you find it confusing that TV people call both the rasters and the columns lines? You’re not alone.) It should be noted that this horizontal resolution limit results from the 6 MHz bandwidth allocated to a broadcast channel. In non-broadcast environments (e.g. DVDs) the resolution can be much higher.
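To put numbers on this, here’s a back-of-the-envelope calculation. It assumes two standard NTSC figures that aren’t stated above: the picture’s luminance occupies roughly 4.2 MHz of the 6 MHz channel, and the visible part of each line lasts about 52.6 µs (the rest of the line is blanking). Treat it as a sketch of where the “about 330” comes from, not a measurement.

```python
# Back-of-the-envelope NTSC horizontal resolution.
# Assumed standard NTSC figures (not from the text above):
#   - luminance bandwidth of about 4.2 MHz inside the 6 MHz channel
#   - active (visible) part of each line of about 52.6 microseconds
LINES_PER_FRAME = 525
FRAME_RATE = 30000 / 1001      # 29.97 frames/s
LUMA_BANDWIDTH_HZ = 4.2e6
ACTIVE_LINE_S = 52.6e-6
ASPECT = 4 / 3

line_rate = LINES_PER_FRAME * FRAME_RATE   # lines per second
line_period_us = 1e6 / line_rate           # whole line, including blanking

# One cycle of the highest luma frequency can show one light and one dark
# "pixel", so the active line resolves about 2 * bandwidth * time pixels.
pixels_per_line = 2 * LUMA_BANDWIDTH_HZ * ACTIVE_LINE_S

# TV resolution is quoted per picture *height*, hence the 3/4 factor.
tv_lines = pixels_per_line / ASPECT

print(f"line rate:       {line_rate:.0f} lines/s")    # 15734
print(f"line period:     {line_period_us:.1f} us")    # 63.6
print(f"pixels per line: {pixels_per_line:.0f}")      # 442
print(f"TV lines:        {tv_lines:.0f}")             # 331
```

The roughly 331 TV lines per picture height is the “about 330” figure; the wider DVD case simply isn’t limited by the 4.2 MHz broadcast luminance bandwidth.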

While there are 525 horizontal lines in an NTSC frame, the electron gun actually sweeps vertically 59.94 times per second. So for each frame time, it paints the screen twice – the first time the even frame lines, and the next time the odd frame lines.  This is where the fields come from, with each field having half the lines of a frame. You’ll sometimes hear that the second field is offset from the first field by half of a line. This refers to half of a field line. Vidiots often prefer to speak in terms of fields. I’ve been around them long enough to have become infected, so you may see that terminology here.

Now a horizontal resolution of 330 isn’t a lot, even when you consider a 15” TV. Neither is 240 vertical lines per field. But we are constrained by the available bandwidth so that’s the best that can be done.

So we have 59.94 pictures (fields) per second, each with a resolution of 330×240. How can we provide an illusion of higher resolution? On the horizontal side we’re at an impasse, but something can be done on the vertical front. This is where the half-line vertical offset comes in. By moving every other field down by half a (field) line, we double the effective vertical resolution. This gives us the illusion of 330×480 resolution, yet we can still capture motion at 59.94 Hz.
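The interleaving itself is easy to express in code. This Python sketch models a picture as a list of rows and assumes the top field carries frame lines 0, 2, 4, … while the bottom field, shifted down half a field line, carries lines 1, 3, 5, … (which field comes first varies between formats, so treat that as a convention of this sketch). Field line k of a field with parity p lands on frame line 2k + p.

```python
def frame_line(field_line, parity):
    """Frame line that a given field line lands on (parity 0 = top field)."""
    return 2 * field_line + parity

def weave(top_field, bottom_field):
    """Interleave two equal-height fields into one frame of twice the height."""
    frame = []
    for top_row, bottom_row in zip(top_field, bottom_field):
        frame.append(top_row)     # even frame line: 2k + 0
        frame.append(bottom_row)  # odd frame line:  2k + 1
    return frame

# Two 330x240 fields weave into 480 frame lines.
top = [[0] * 330 for _ in range(240)]
bottom = [[1] * 330 for _ in range(240)]
frame = weave(top, bottom)
print(len(frame), len(frame[0]))  # 480 330
print(frame_line(239, 1))         # 479, the last frame line
```

Weaving only produces a sensible picture when both fields show the same instant. For 59.94 Hz interlaced capture they don’t, which is exactly where the fingering comes from.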

To go back to our image above that had two fields woven together to produce fingering, we now know that these fields are offset in time by 1/59.94th of a second. So really there are two independent pictures here:

But really…what’s a field? (or…graphics geeks vs. video vidiots)

This seems like a simple question, but it has complex undertones, and many computer/graphics types still get it wrong. The naïve (and incorrect) description is that the two fields are made by taking every other line of a frame. I mean, this is what was described above, and it’s why there are odd and even fields, right? One contains the odd lines, and one contains the even lines? Well, not quite.

In traditional broadcast video, the two fields are offset in time by a field time (half of a frame time). So for the 59.94 Hz field rate of NTSC, you take pictures at 59.94 Hz. From one picture take the even lines, from the next the odd lines, and so on, to get fields. This is closer to what goes on, but still not quite correct.

We’re now saying we’re taking the even lines and the odd lines, as if we were throwing out lines when we created fields. More appropriate would be to say that the frame is filtered a bit to drop some vertical resolution. So I suppose you could say that a line in a field spans (almost) two frame lines.
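Here’s a Python sketch of that more accurate description, under assumptions of my own: the camera delivers a progressive picture every field time, each picture is low-pass filtered vertically with simple [1, 2, 1]/4 taps (an illustrative choice, not a broadcast specification), and only then are the lines of the appropriate parity kept. Each field line ends up as a blend of a frame line and its neighbours rather than a frame line with its neighbour thrown away.

```python
def vertical_lowpass(picture):
    """Filter each column with [1, 2, 1]/4 taps, repeating the edge rows."""
    height = len(picture)
    out = []
    for y in range(height):
        above = picture[max(y - 1, 0)]
        here = picture[y]
        below = picture[min(y + 1, height - 1)]
        out.append([(a + 2 * c + b) / 4 for a, c, b in zip(above, here, below)])
    return out

def pictures_to_fields(pictures, first_parity=0):
    """Turn pictures captured at the field rate into (parity, field) pairs.

    Picture n is filtered, then only its lines of parity (first_parity + n) % 2
    are kept, so consecutive fields alternate between even and odd lines.
    """
    fields = []
    for n, picture in enumerate(pictures):
        parity = (first_parity + n) % 2
        fields.append((parity, vertical_lowpass(picture)[parity::2]))
    return fields

# A one-column picture with a single bright line on frame line 2, shown twice.
picture = [[0], [0], [4], [0], [0], [0]]
for parity, field in pictures_to_fields([picture, picture]):
    print(parity, [row[0] for row in field])
# 0 [0.0, 2.0, 0.0]
# 1 [1.0, 1.0, 0.0]
```

Without the filter, that one-line-high detail would be missing from every odd field and would flicker at 29.97 Hz; with it, the detail shows up (at lower contrast) in both fields.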

To make this a bit more concrete, consider how one of those ancient tube TVs works. This, after all, is what the original analog broadcast signal was intended to work with. The cathode ray tube has an electron gun that sweeps left to right and top to bottom. As the electron gun hits a point on the CRT, that point rapidly begins to glow with an intensity corresponding to the intensity of the beam. As the beam moves on, the portions of the screen it leaves behind begin to fade (decay). How rapidly the glow fades depends on the particular CRT, but in general for consumer TVs, by the time a line for the second field is being painted, the corresponding line for the first field has (just) decayed enough that it’s not visible. So we never see both interlaced fields on the screen at the same time. In fact, since the electron beam sweeps from top to bottom at a fairly leisurely rate, we never even see a complete field on the screen at once. Instead we see the lines of the current field that have been drawn so far, and whichever lines of the previous field haven’t faded yet.

In consumer (analog) televisions, you can see that a field’s lines are rather thick. This is most obvious with video game consoles. There’s a property of the analog television signal that causes the second field to be drawn one frame line (or half of a field line) lower. Video game consoles, whose content is authored/generated for progressive rather than interlaced display, play with this so that the offset never happens. The result is 240p video rather than 480i. But when playing these video games on your typical television, you don’t see black lines where the electron gun didn’t paint anything. That’s because each field line is thick enough to span almost two frame lines. However, if you look at this on a studio monitor that draws finer lines, then you do see black rasters representing the lines that never got painted.
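That property is the half line: 525 lines per frame means 262.5 lines per field, so every other field starts its vertical sweep half a line later and its lines fall between those of the previous field. The simplified Python model below tracks only the fractional part of where each field starts, in field lines. The 262-line console case is an assumption for illustration (it’s the figure commonly quoted for consoles like the NES); any whole number of lines per field gives the same result.

```python
from fractions import Fraction

def field_start_offsets(lines_per_field, num_fields):
    """Sub-line start position of each successive field, in field lines.

    Field n starts n * lines_per_field lines after the first one. Only the
    fractional part matters: 1/2 puts its lines between the previous field's
    lines (interlaced), 0 puts them on top of them (progressive).
    """
    offsets = []
    for n in range(num_fields):
        start = n * lines_per_field
        offsets.append(start - int(start))
    return offsets

ntsc = field_start_offsets(Fraction(525, 2), 4)  # 262.5 lines per field
console = field_start_offsets(262, 4)           # whole lines per field
print([float(o) for o in ntsc])     # [0.0, 0.5, 0.0, 0.5] -> 480i
print([float(o) for o in console])  # [0.0, 0.0, 0.0, 0.0] -> 240p
```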

Interlaced Video in a Progressive World

All the history about analog interlaced video may be interesting, but we now live in a progressive world. Electron guns no longer sweep across CRTs to display an image. Just about everyone has flat panel TVs, which display progressive images. So to be displayed properly, interlaced video needs to be turned into progressive video. This is a non-trivial task, but when done properly it can yield high resolution with good refresh rates.

Telecine

For content that was originally produced at 24 frames/s and underwent telecine to produce 59.94 fields/s, the inverse process (inverse telecine) can be used to restore the original frames. A very similar process handles content that was originally 30 or 29.97 frames/s. This is the easy case.
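For film, the usual telecine is 2:3 pulldown (often called 3:2): four film frames A, B, C, D are held for 2, 3, 2, 3 fields, so 24 frames become 60 fields (23.976 frames/s become 59.94 fields/s). The Python sketch below models each field only by its source frame and its parity. A real inverse telecine has to work out which fields belong together by comparing picture content; here the labels stand in for that detection step, so this illustrates the bookkeeping, not a working detector. The `cadence` parameter is my own addition: with (2, 2) the same code describes 29.97 frames/s content carried as 59.94 fields/s.

```python
def pulldown(frames, cadence=(2, 3)):
    """Spread progressive frames over alternating-parity fields.

    Frame i is held for cadence[i % len(cadence)] fields. Returns a list of
    (frame, parity) pairs, parity 0 = top field, 1 = bottom field.
    """
    fields = []
    parity = 0
    for i, frame in enumerate(frames):
        for _ in range(cadence[i % len(cadence)]):
            fields.append((frame, parity))
            parity ^= 1  # fields always alternate top/bottom
    return fields

def inverse_telecine(fields):
    """Rebuild the original frames from labelled fields.

    A top/bottom pair from the same source frame is one frame; a third field
    from that frame is the repeated field and is dropped.
    """
    frames = []
    i = 0
    while i + 1 < len(fields):
        (a, parity_a), (b, parity_b) = fields[i], fields[i + 1]
        if a == b and parity_a != parity_b:
            frames.append(a)
            repeated = i + 2 < len(fields) and fields[i + 2][0] == a
            i += 3 if repeated else 2
        else:
            i += 1  # orphan field with no partner from the same frame
    return frames

fields = pulldown("ABCD")
print("".join(frame for frame, _ in fields))  # AABBBCCDDD
print(inverse_telecine(fields))               # ['A', 'B', 'C', 'D']
```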

Motion Adaptive Deinterlacing

The harder case is where the original content was shot at 59.94 fields/s. In this case, half the vertical resolution is just gone (filtered out). However, it’s possible to use that half-line shift of every other field to guess what the missing pixels were. That’s what the more sophisticated deinterlacers, such as motion adaptive deinterlacers, do. To give you an idea of how a very simple motion adaptive deinterlacer would work: for each pixel, the deinterlacer determines whether there was motion in that area. (This in itself can be a complex process.) Areas with no motion can be treated as if the source were progressive, much as 29.97 frames/s content is recovered from a 59.94 fields/s broadcast.
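A very simple motion adaptive deinterlacer along these lines might look like the Python sketch below. It rebuilds a full frame for the instant of the current field. For each missing line it compares the current field with the field two fields earlier (same parity, so the lines line up) just above and below that line. Where nothing changed, it weaves in the line from the previous field; where something moved, it averages the current field’s lines above and below instead. The per-pixel test and the threshold of 10 are assumptions for illustration; real deinterlacers use far more elaborate motion detection and usually blend the two cases rather than switching.

```python
def motion_adaptive_deinterlace(cur, prev, prev2, parity, threshold=10):
    """Build a progressive frame for the instant of the field `cur`.

    cur:    current field; its line k is frame line 2k + parity
    prev:   previous field, opposite parity, so it holds the missing lines
    prev2:  the field before prev, same parity as cur (for motion detection)
    parity: 0 if cur holds frame lines 0, 2, 4, ...; 1 if lines 1, 3, 5, ...
    """
    n = len(cur)
    frame = [None] * (2 * n)
    for k in range(n):
        frame[2 * k + parity] = list(cur[k])
    for k in range(n):
        # Current-field lines directly above and below missing line
        # 2k + 1 - parity, clamped at the top and bottom of the picture.
        if parity == 0:
            ka, kb = k, min(k + 1, n - 1)
        else:
            ka, kb = max(k - 1, 0), k
        row = []
        for x in range(len(cur[0])):
            motion = max(abs(cur[ka][x] - prev2[ka][x]),
                         abs(cur[kb][x] - prev2[kb][x]))
            if motion > threshold:
                # Moving: ignore prev and interpolate within the current field.
                row.append((cur[ka][x] + cur[kb][x]) / 2)
            else:
                # Static: treat the two fields as one progressive frame (weave).
                row.append(prev[k][x])
        frame[2 * k + 1 - parity] = row
    return frame

cur = [[1, 1], [3, 3]]
prev = [[2, 2], [4, 4]]
static = motion_adaptive_deinterlace(cur, prev, prev2=cur, parity=0)
moving = motion_adaptive_deinterlace(cur, prev, prev2=[[100, 100]] * 2, parity=0)
print(static)  # [[1, 1], [2, 2], [3, 3], [4, 4]]
print(moving)  # [[1, 1], [2.0, 2.0], [3, 3], [3.0, 3.0]]
```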

Areas with motion are more problematic. Consider a ball moving left to right across the screen. In one field, the ball is at one position. In the next field, it’ll be slightly to the right. If you were to merge these two, you’d get the much dreaded interlacing artifacts. You could decide to ignore the previous field and scale the current field up. While you wouldn’t be enhancing the resolution, you wouldn’t be introducing artifacts either. You could also try to determine how much and what type of motion occurred and extrapolate what the previous field would have looked like one field time later. Then you can merge that extrapolated field with the current one.
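The last option, motion compensation, can be sketched too. This Python toy estimates a single horizontal, whole-pixel displacement between the previous and current fields by trying every shift up to `max_shift` and keeping the one with the smallest sum of absolute differences, moves the previous field by that amount, and then weaves. It ignores the one-frame-line vertical offset between the two fields and assumes one motion for the whole picture; both are simplifications for illustration, since practical motion-compensated deinterlacers estimate vectors per block, often at sub-pixel precision.

```python
def shift_field(field, s):
    """Move every row right by s pixels (left if negative), repeating edge pixels."""
    width = len(field[0])
    return [[row[min(max(x - s, 0), width - 1)] for x in range(width)] for row in field]

def estimate_shift(prev, cur, max_shift=8):
    """Horizontal shift that best maps prev onto cur (smallest sum of absolute
    differences). Shifts are tried smallest first, so ties favour less motion."""
    best_s, best_cost = 0, float("inf")
    for s in sorted(range(-max_shift, max_shift + 1), key=abs):
        moved = shift_field(prev, s)
        cost = sum(abs(m - c) for mrow, crow in zip(moved, cur) for m, c in zip(mrow, crow))
        if cost < best_cost:
            best_s, best_cost = s, cost
    return best_s

def motion_compensated_weave(cur, prev, parity, max_shift=8):
    """Move prev to where its content should be at cur's instant, then weave."""
    moved = shift_field(prev, estimate_shift(prev, cur, max_shift))
    frame = [None] * (2 * len(cur))
    for k in range(len(cur)):
        frame[2 * k + parity] = list(cur[k])
        frame[2 * k + 1 - parity] = moved[k]
    return frame

# A two-pixel-wide "ball" moving right by 2 pixels per field.
prev = [[0, 0, 9, 9, 0, 0, 0, 0, 0, 0]] * 2
cur = [[0, 0, 0, 0, 9, 9, 0, 0, 0, 0]] * 2
print(estimate_shift(prev, cur))  # 2
for row in motion_compensated_weave(cur, prev, parity=0):
    print(row)  # every line shows the ball at x = 4..5, so no fingering
```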

The amount of guessing that has to go on explains why it’s preferable to use progressive video for content with a lot of motion. Even the most sophisticated algorithms get things wrong on occasion. So either the deinterlacer needs to be conservative (in which case it generally won’t improve detail), or it will introduce visible artifacts from time to time. In these circumstances it may be desirable to just use progressive video and avoid the processing altogether.

Mixed-Mode Content

Interlaced to progressive conversion is made more difficult by “mixed-mode” content. This is content where interlaced and progressive source material was mixed together in the editing stage. Commercials are notorious sources of this, but sci-fi shows can be particularly troublesome. The CGI used in sci-fi shows is often progressive, while the cameras on the set may capture interlaced video. When all the compositing and editing is said and done, not only may some segments of the show be interlaced and others progressive, but parts of a frame may be interlaced and parts progressive!
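One way to cope is to decide region by region whether a woven frame looks interlaced. The Python sketch below uses a simple comb test: at each pixel it multiplies the differences between a line and the lines directly above and below. A large positive product means the line is brighter (or darker) than both neighbours, the fingering signature of two fields from different instants woven together. The tile size, threshold, and 50% vote are assumptions for illustration, and genuinely progressive content with fine horizontal stripes will also trip this test.

```python
def comb_score(frame, y, x):
    """Product of the differences between line y and its two neighbours.
    Large and positive when line y sticks out from both in the same direction."""
    return (frame[y][x] - frame[y - 1][x]) * (frame[y][x] - frame[y + 1][x])

def combed_tiles(frame, tile=4, threshold=100, min_fraction=0.5):
    """Flag each tile x tile region of a woven frame as interlaced (True)
    or progressive-looking (False), keyed by the tile's top-left corner."""
    height, width = len(frame), len(frame[0])
    flags = {}
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            hits = total = 0
            # Skip the first and last frame lines, which lack a neighbour.
            for y in range(max(ty, 1), min(ty + tile, height - 1)):
                for x in range(tx, min(tx + tile, width)):
                    total += 1
                    if comb_score(frame, y, x) > threshold:
                        hits += 1
            flags[(ty, tx)] = total > 0 and hits / total >= min_fraction
    return flags

# Left half: flat progressive graphics. Right half: two fields of a moving
# object woven together, so alternate lines disagree.
frame = [[50] * 4 + [0 if y % 2 == 0 else 200] * 4 for y in range(8)]
print(combed_tiles(frame))
# {(0, 0): False, (0, 4): True, (4, 0): False, (4, 4): True}
```

Tiles flagged False can be treated as progressive (weaved), while flagged tiles go through a deinterlacer.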

Needless to say, much engineering has gone into figuring out how to turn fields into frames. And when done properly, the results can be quite good.

The time of fields appears to be drawing to a close, however, as newer encoding standards such as H.265 seem to be dropping support for them. Until then, though, I’ll happily watch 1080i content on my flat panel TV, and give thanks for all those hours that were spent developing good deinterlacers.