Composite Video
The various composite video standards define the content of a signal. They do not define the mechanism of its encoding or decoding. Therefore there is quite a lot of flexibility in those mechanisms. This page attempts to document both the form of a composite video signal and some of the valid approaches taken to decoding it.
A composite signal is one dimensional. It describes a two-dimensional image as a series of discrete scans — individual left-to-right slightly diagonal sweeps of content. Those scans belong to fields, each complete field being a collection of scans that covers the display.
Partitioning is achieved through the placement of syncs within the video stream. These are decoded to determine where the input device intended horizontal retrace (i.e. returning to the left of the display) and vertical retrace (i.e. returning to the top) to occur. Most TVs will attempt to rationalise the incoming sync signals, to smooth over noisy signals and to prevent potential mechanical damage.
A conventional TV image is 'interlaced'. That means that fields alternate between commencing in the top left of the display and in the top centre; because scans are diagonal, the scans of a field that starts in the top centre naturally sit half a scan's vertical spacing lower than those of a field that begins in the top left. Viewed all together, the field that begins in the top left therefore contributes the first (i.e. highest), third, fifth, etc lines, all the odd-numbered ones, and so is often called the 'odd' field, while the field that begins in the top centre contributes the second, fourth, sixth, etc lines and is the 'even' field.
A classic TV need have no awareness of this interlacing. It is a mere side effect of the timing of syncs.
An analogue-sourced TV signal also traditionally didn't impose any specific temporal correlation between fields. Each is individually just a capture of whatever the camera was pointed at, at that moment in time. No field inherently pairs with the one before it any more than the one after it.
So the temporal resolution remains the field rate, but the spatial resolution is improved by alternating the exact sampling positions.
Computers and other digital sources tend instead to work in terms of frames. When outputting a composite signal, they will either:
- effectively decline interlacing by posting only 'odd' fields. Every field is then a complete depiction of a single discrete video image; or
- pair fields off so that each two together represent a discrete frame. But even then there's no industry consensus on whether they're paired so that each even field kicks off a new pair, or each odd does so.
To paint its display, a CRT has two independent free-running pieces of state: horizontal scan position and vertical scan position. A CRT contains an electron gun which has a fixed position and orientation, and uses electromagnetism to deflect the electrons it emits in order to set the current scanning position. Since the electrons are deflected from their direct path, the scan positions are synonymously known as the deflector positions.
Both positions increase autonomously and independently. When one receives a sync, it will start to return to its origin. Such is the nature of the electronics that its return is not instantaneous, but it is speedy.
It is because both are running independently that scans are slightly diagonal. The vertical continues increasing during a horizontal run, but increases a lot more slowly.
It is because retraces are triggered independently that interlaced video is possible: the distinction in where the signal starts is simply a product of where the horizontal deflector happens to be when vertical retrace ends.
Composite signals have a single sync level. Sources should either hold the sync level for a short period (around 6% of a line) to communicate a horizontal sync, or hold it for a long period (2.5 or 3 lines, depending on the standard; with short gaps) to communicate a vertical sync.
The first thing a TV does when processing a composite signal is to find and classify the syncs. Very early sets simply assigned a horizontal sync to every leading edge of the sync level and charged a capacitor while the signal stayed at the sync level; if that capacitor ever filled, a vertical sync was indicated. So horizontal syncs are detected differentially, vertical syncs integrally.
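As a rough illustration of that differential/integral approach, here is a minimal C++ sketch operating on an already-sampled signal; the function name, thresholds and charge rates are all invented for the example rather than taken from any real circuit:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical classifier: `samples` holds one normalised composite level per
// sampling instant, and anything below `sync_threshold` counts as the sync
// level. All rates here are illustrative only.
struct SyncEvents {
    std::vector<std::size_t> horizontal;    // sample indices of horizontal syncs
    std::vector<std::size_t> vertical;      // sample indices of vertical syncs
};

SyncEvents classify_syncs(const std::vector<float> &samples, float sync_threshold) {
    SyncEvents events;
    bool was_sync = false;
    bool vertical_latched = false;
    float capacitor = 0.0f;                     // charges while at the sync level
    const float charge_rate = 1.0f / 200.0f;    // fills after ~200 sync-level samples
    const float discharge_rate = 1.0f / 50.0f;  // drains quickly otherwise

    for(std::size_t i = 0; i < samples.size(); ++i) {
        const bool is_sync = samples[i] < sync_threshold;

        // Differential detection: every leading edge of the sync level is
        // taken to be a horizontal sync.
        if(is_sync && !was_sync) events.horizontal.push_back(i);

        // Integral detection: only a long sync (even one broken up by short
        // gaps) can fill the capacitor, indicating a vertical sync.
        capacitor += is_sync ? charge_rate : -discharge_rate;
        if(capacitor < 0.0f) capacitor = 0.0f;
        if(capacitor >= 1.0f) {
            capacitor = 1.0f;
            if(!vertical_latched) {
                events.vertical.push_back(i);
                vertical_latched = true;        // report each vertical sync once
            }
        }
        if(capacitor == 0.0f) vertical_latched = false;

        was_sync = is_sync;
    }
    return events;
}
```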
Detected syncs are then fed into circuits that produce the real deflector syncs. These are in effect phase-locked loops. The horizontal phase-locked loop will be configured to produce an output within a certain tolerance of the expected horizontal period. It will attempt to lock to the incoming syncs. The vertical loop acts similarly but with a different period.
One approach to these phase-locked loops is flywheel sync. A flywheel sync:
- has an inherent velocity and current position (usually termed 'phase' in this context);
- will update position only as a direct function of velocity;
- will trigger sync only upon position reaching the end of its scale; but
- will use incoming syncs to adjust velocity, as a function of phase.
The simplest scheme is that velocity is adjusted proportionately to the difference between current phase and whatever the incoming signal implies that it should be. So, for example, if the flywheel appears to be a little behind the incoming signal (sync has just been detected but the flywheel won't trigger its own sync for a short while longer), the velocity is increased slightly; if the flywheel appears to have fired early, it is slowed down a little.
As a result it will converge over time on the incoming signal if the incoming signal is consistent. It will specifically exhibit damped simple harmonic motion, producing sinusoidal errors in the interim.
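A minimal flywheel along those lines might look like the following C++ sketch; the representation (phase running from 0 to 1 over one nominal line) and the proportional gain are invented purely for illustration:

```cpp
// Illustrative flywheel for horizontal sync: 'phase' runs from 0 to 1 over one
// nominal scan line and a retrace fires when it wraps.
class Flywheel {
public:
    explicit Flywheel(float expected_period_in_samples)
        : velocity_(1.0f / expected_period_in_samples) {}

    // Advance by one sample; returns true if this sample triggers a retrace.
    bool update() {
        phase_ += velocity_;
        if(phase_ >= 1.0f) {
            phase_ -= 1.0f;
            return true;
        }
        return false;
    }

    // Call whenever an incoming sync is detected. The error is the distance
    // from the nearest wrap point: a flywheel that hasn't quite fired yet
    // (phase just below 1) is sped up; one that has only just fired (phase
    // just above 0) is slowed down.
    void apply_sync() {
        const float error = (phase_ < 0.5f) ? phase_ : phase_ - 1.0f;
        velocity_ -= error * gain_;
    }

private:
    float phase_ = 0.0f;
    float velocity_;
    static constexpr float gain_ = 0.01f;   // strength of proportional adjustment
};
```

In practice the adjustment would also be bounded, in keeping with the tolerance described above, so that a wildly mistimed or missing sync cannot pull the output far from the expected period.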
For black and white images, simply decoding syncs is almost all the work a set has to do. Within the composite signal there is a 'back porch' which establishes the blanking level, the reference against which brightness is scaled; thereafter the horizontal content is simply a direct transfer of amplitude from the composite input to amplitude at the electron gun. Horizontal resolution is limited only by the bandwidth of the incoming signal.
The NTSC standard adds two colour channels within the same amount of bandwidth in a backwards-compatible fashion.
It partitions the amplitude content into two parts:
- all content below a certain frequency is a traditional brightness signal; but
- the content above that threshold describes colour.
The colour part is timed so that it advances by half a cycle on each consecutive scan. This is intended to have the perceptual effect of causing it largely to cancel itself out on black and white sets that do not filter it.
For colour content it uses quadrature amplitude modulation. This is an improvement on ordinary amplitude modulation.
Supposing there were only one colour channel to encode, f(t) as a function of time, amplitude modulation would modulate a fixed-frequency sine wave to output f(t) * sin(m * t), where m is a scalar that defines the frequency of the wave.
One way to decode f(t) * sin(m * t), if you already know m and t, is counter-intuitively to multiply it by sin(m * t) again and then apply a frequency filter:
f(t) * sin(m * t) * sin(m * t)
= f(t) * sin^2(m * t)
= f(t) * (1 - cos(2*m*t))/2 ; applying the trig identity
= f(t)/2 - [a function of cosine at twice the carrier frequency; filter this bit away]
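As a concrete sketch of that multiply-and-filter recovery, the following self-contained C++ demonstration modulates a slowly varying signal, multiplies by the carrier again, and uses a box average spanning one carrier cycle as a crude stand-in for a proper low-pass filter; the carrier frequency, sample rate and test signal are illustrative choices only:

```cpp
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>

int main() {
    const double pi = 3.141592653589793;
    const double carrier = 3579545.0;              // cycles per second, NTSC-ish
    const double sample_rate = 8.0 * carrier;      // samples per second
    const double m = 2.0 * pi * carrier;           // as in f(t) * sin(m * t)

    // A slowly varying stand-in for the colour channel f(t).
    const auto f = [&](double t) { return 0.5 + 0.3 * std::sin(2.0 * pi * 15625.0 * t); };

    // Modulate, then multiply by the carrier again: the product contains
    // f(t)/2 plus a component at twice the carrier frequency.
    const std::size_t length = 4096;
    std::vector<double> product(length);
    for(std::size_t i = 0; i < length; ++i) {
        const double t = double(i) / sample_rate;
        const double modulated = f(t) * std::sin(m * t);
        product[i] = modulated * std::sin(m * t);
    }

    // Crude low-pass filter: a box average spanning one carrier cycle,
    // which cancels the double-frequency component almost exactly.
    const std::size_t window = std::size_t(sample_rate / carrier);    // 8 samples here
    for(std::size_t i = window; i < length; i += 512) {
        double sum = 0.0;
        for(std::size_t j = i - window; j < i; ++j) sum += product[j];
        const double recovered = 2.0 * sum / double(window);   // undo the factor of 1/2
        const double t = double(i) / sample_rate;
        std::printf("t=%.6fs original=%.3f recovered=%.3f\n", t, f(t), recovered);
    }
    return 0;
}
```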
In NTSC the wave that carries colour is known as the colour subcarrier and has a frequency of approximately 3.58MHz.
Encoding of two channels is achieved by using the sum of two amplitude modulations, one exactly ninety degrees out of phase with the other, which is the same thing as modulating one sine wave and one cosine wave. The same per-channel decoding rule works though:
output = f(t) * sin(m * t) + g(t) * cos(m * t)
=> output * sin(m * t) = (f(t) * sin(m * t) + g(t) * cos(m * t)) * sin(m * t)
= [f(t)/2 - (high frequency part)] + g(t) * cos(m * t) * sin(m * t)
= [f(t)/2 - (high frequency part)] + g(t) * 1/2 * (sin(m * t + m * t) + sin(m * t - m * t)) ; applying the trig identity
= [f(t)/2 - (high frequency part)] + g(t) * 1/2 * (sin(2 * m * t) + sin(0))
= [f(t)/2 - (high frequency part)] + g(t) * 1/2 * sin(2 * m * t)
= f(t)/2 - (high frequency part) + (another high frequency part)
So f(t) is recovered in isolation after filtering exactly as if there were no second channel.
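Extending the same demonstration to two channels, the following sketch (again with arbitrary, purely illustrative parameters and invented names) recovers both f and g by multiplying the received signal once by sin and once by cos, then averaging over whole carrier cycles as a stand-in for a real low-pass filter:

```cpp
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>

// Recovers (f, g) from signal = f*sin(m*t) + g*cos(m*t) over a short window in
// which f and g are assumed roughly constant; the window is assumed to span a
// whole number of carrier cycles so that the averages below are clean.
struct Demodulated { double f, g; };

Demodulated demodulate(const std::vector<double> &signal, double m, double sample_rate) {
    double f_sum = 0.0, g_sum = 0.0;
    for(std::size_t i = 0; i < signal.size(); ++i) {
        const double t = double(i) / sample_rate;
        f_sum += signal[i] * std::sin(m * t);   // leaves f/2 after averaging
        g_sum += signal[i] * std::cos(m * t);   // leaves g/2 after averaging
    }
    // Multiply by 2 to undo the factor of 1/2 left by averaging sin^2 and cos^2.
    return { 2.0 * f_sum / double(signal.size()),
             2.0 * g_sum / double(signal.size()) };
}

int main() {
    const double pi = 3.141592653589793;
    const double carrier = 3579545.0;
    const double sample_rate = 16.0 * carrier;
    const double m = 2.0 * pi * carrier;

    // Encode a constant pair of colour values over four carrier cycles.
    const double f = 0.4, g = -0.2;
    std::vector<double> signal;
    for(std::size_t i = 0; i < 64; ++i) {
        const double t = double(i) / sample_rate;
        signal.push_back(f * std::sin(m * t) + g * std::cos(m * t));
    }

    const Demodulated result = demodulate(signal, m, sample_rate);
    std::printf("f: expected %.3f got %.3f\n", f, result.f);
    std::printf("g: expected %.3f got %.3f\n", g, result.g);
    return 0;
}
```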
Being able to recover colour from the colour channels as above requires that the decoder know the phase of the colour subcarrier. A burst of pure subcarrier is therefore added to the back porch, known as the colour burst. A colour TV will use that to establish phase for the coming line. Ideally, a colour TV will also decline to decode colour if no burst is present.
Various approaches are available, including:
- standard frequency filtering;
- a comb filter; and
- trigonometric decoding.
Applying a low-pass filter below the colour subcarrier frequency can be used to extract luminance information; that can be subtracted from the complete signal to leave the colour signal.
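A crude sketch of that separation follows, using a box average spanning one subcarrier cycle as a stand-in for a real low-pass filter; the function and parameter names are invented for the example:

```cpp
#include <cstddef>
#include <vector>

// Splits a sampled composite line into luminance and chrominance:
// luma = low-passed signal, chroma = whatever remains.
void separate(const std::vector<float> &composite,
              std::size_t samples_per_subcarrier_cycle,
              std::vector<float> &luma,
              std::vector<float> &chroma) {
    const std::size_t n = composite.size();
    const std::size_t half = samples_per_subcarrier_cycle / 2;
    luma.resize(n);
    chroma.resize(n);
    for(std::size_t i = 0; i < n; ++i) {
        // Centred box average covering roughly one subcarrier cycle.
        const std::size_t begin = (i > half) ? i - half : 0;
        const std::size_t end = (i + half < n) ? i + half + 1 : n;
        float sum = 0.0f;
        for(std::size_t j = begin; j < end; ++j) sum += composite[j];
        luma[i] = sum / float(end - begin);
        chroma[i] = composite[i] - luma[i];
    }
}
```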
The comb filter uses the observation that an entirely consistent colour signal of frequency n MHz contributes an equal and opposite amount of signal at time t + 1/(2n) microseconds as it does at time t. Therefore the average of two samples spaced that far apart can be used as brightness, and the difference between that average and the source signal is colour.
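In code, the comb filter as described reduces to something like the following sketch, where half_cycle is the assumed number of samples in half a subcarrier cycle (names invented for the example):

```cpp
#include <cstddef>
#include <vector>

// Comb filtering: samples half a subcarrier cycle apart carry equal
// luminance but opposite chrominance, so their average is luminance and
// the remainder is chrominance.
void comb_separate(const std::vector<float> &composite,
                   std::size_t half_cycle,
                   std::vector<float> &luma,
                   std::vector<float> &chroma) {
    const std::size_t n = composite.size();
    luma.assign(n, 0.0f);
    chroma.assign(n, 0.0f);
    for(std::size_t i = half_cycle; i < n; ++i) {
        luma[i] = 0.5f * (composite[i] + composite[i - half_cycle]);
        chroma[i] = composite[i] - luma[i];
    }
}
```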
The formula for composite encoding joins three independent variables as a function of phase; therefore the three can be derived given three separate samples plus knowledge of phase. Assuming that the three samples are placed so that they observe the same luminance and colour, e.g.:
s1 = y + q * sin(a) + r * cos(a)
s2 = y + q * sin(b) + r * cos(b)
s3 = y + q * sin(c) + r * cos(c)
=> s1 - s2 = (y + q * sin(a) + r * cos(a)) - (y + q * sin(b) + r * cos(b))
= q*( sin(a) - sin(b) ) + r*( cos(a) - cos(b) ) [1]
=> s2 - s3 = q*( sin(b) - sin(c) ) + r*( cos(b) - cos(c) ) [2]
[1] => s1 - s2 - r*( cos(a) - cos(b) ) = q*( sin(a) - sin(b) )
=> q = ( s1 - s2 - r*( cos(a) - cos(b) ) ) / ( sin(a) - sin(b) )
[2] => q = ( s2 - s3 - r*( cos(b) - cos(c) ) ) / ( sin(b) - sin(c) )
=> ( s1 - s2 - r*( cos(a) - cos(b) ) ) / ( sin(a) - sin(b) )
= ( s2 - s3 - r*( cos(b) - cos(c) ) ) / ( sin(b) - sin(c) )
=> ( sin(b) - sin(c) ) * ( s1 - s2 - r*( cos(a) - cos(b) ) )
= ( sin(a) - sin(b) ) * ( s2 - s3 - r*( cos(b) - cos(c) ) )
=> ( sin(b) - sin(c) ) * ( s1 - s2 ) - ( sin(a) - sin(b) ) * ( s2 - s3 )
= r * ( cos(a) - cos(b) ) * ( sin(b) - sin(c) ) - r * ( cos(b) - cos(c) ) * ( sin(a) - sin(b) )
=> r = ( ( sin(b) - sin(c) ) * ( s1 - s2 ) - ( sin(a) - sin(b) ) * ( s2 - s3 ) ) /
       ( ( cos(a) - cos(b) ) * ( sin(b) - sin(c) ) - ( cos(b) - cos(c) ) * ( sin(a) - sin(b) ) )
The other colour component, q, is a variation on the same result, and y can be found by similar means.
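Implemented directly, this is just a small linear solve per group of samples. The following C++ sketch solves the three-sample system via Cramer's rule, under the same assumption that all three samples observe the same y, q and r; the test values and phase spacing are purely illustrative:

```cpp
#include <cmath>
#include <cstdio>

// Solves s = y + q*sin(phase) + r*cos(phase) for (y, q, r) given three samples
// s1, s2, s3 taken at known subcarrier phases a, b, c.
struct YQR { double y, q, r; };

YQR solve(double s1, double a, double s2, double b, double s3, double c) {
    const double sa = std::sin(a), ca = std::cos(a);
    const double sb = std::sin(b), cb = std::cos(b);
    const double sc = std::sin(c), cc = std::cos(c);

    // Determinant of [[1, sa, ca], [1, sb, cb], [1, sc, cc]] and the three
    // column-replaced determinants, expanded down the first column.
    const double det   = (sb*cc - sc*cb) - (sa*cc - sc*ca) + (sa*cb - sb*ca);
    const double det_y = s1*(sb*cc - sc*cb) - s2*(sa*cc - sc*ca) + s3*(sa*cb - sb*ca);
    const double det_q = (s2*cc - s3*cb) - (s1*cc - s3*ca) + (s1*cb - s2*ca);
    const double det_r = (sb*s3 - sc*s2) - (sa*s3 - sc*s1) + (sa*s2 - sb*s1);

    return { det_y / det, det_q / det, det_r / det };
}

int main() {
    // Round-trip check with arbitrary values; phases are a sixth of a cycle apart.
    const double y = 0.5, q = 0.2, r = -0.1;
    const double a = 0.3, b = a + 1.0471975512, c = b + 1.0471975512;
    const double s1 = y + q*std::sin(a) + r*std::cos(a);
    const double s2 = y + q*std::sin(b) + r*std::cos(b);
    const double s3 = y + q*std::sin(c) + r*std::cos(c);
    const YQR result = solve(s1, a, s2, b, s3, c);
    std::printf("y=%.4f q=%.4f r=%.4f\n", result.y, result.q, result.r);
    return 0;
}
```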
PAL is a development of NTSC that gives greater resilience to phase errors induced by indirect radio signal receipt. This was manifested in the equipment as NTSC sets needing a 'tint' control, with which the viewer would manually shift the whole colour palette to account for those errors. PAL sets had no such control, being resistant to those errors.
A phase error causes the colour part of a signal to be shifted in phase.
On an NTSC set that causes the colours to shift as it affects the QAM decoding.
PAL gets around the problem by alternating the phase of the colour signal on each consecutive line. Collapsing the signal to a brightness part, b(t), and a colour part, c(t)*sin(m * t), the difference is:
NTSC encoding, every line: b(t) + c(t)*sin(m * t)
PAL encoding, even lines: b(t) + c(t)*sin(m * t)
PAL encoding, odd lines:
b(t) - c(t)*sin(m * t)
= b(t) + c(t)*sin(-m * t) ; as -sin(x) = sin(-x)
So if m * t acquires a constant error, then that error has an opposite effect on each consecutive line. Early PAL sets leave the human eye to resolve the problem; more advanced ones average the colour signal across each pair of lines to remove any transmission error.
In terms of two-channel colour, negating the phase amounts to changing the sign of only the sin component, as:
f(t) * sin(-m * t) + g(t) * cos(-m * t)
= -f(t) * sin(m * t) + g(t) * cos(m * t) ; as sin(-x) = -sin(x) and cos(-x) = cos(x)
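The following sketch illustrates numerically why averaging each pair of lines removes the error, under the simplifying assumptions that the same colour spans both lines of a pair and that the decoder's reference phase is wrong by a constant amount; the figures and names are illustrative only:

```cpp
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>

// Demodulates f and g from one line of chroma by multiplying by the receiver's
// nominal sin/cos references and averaging over whole subcarrier cycles.
struct FG { double f, g; };

FG demodulate(const std::vector<double> &chroma, double m, double sample_rate) {
    double f_sum = 0.0, g_sum = 0.0;
    for(std::size_t i = 0; i < chroma.size(); ++i) {
        const double t = double(i) / sample_rate;
        f_sum += chroma[i] * std::sin(m * t);
        g_sum += chroma[i] * std::cos(m * t);
    }
    return { 2.0 * f_sum / double(chroma.size()),
             2.0 * g_sum / double(chroma.size()) };
}

int main() {
    const double pi = 3.141592653589793;
    const double carrier = 4433618.75;           // PAL-ish; value is illustrative
    const double sample_rate = 16.0 * carrier;
    const double m = 2.0 * pi * carrier;

    const double f = 0.3, g = 0.2;               // the colour actually transmitted
    const double phase_error = 0.4;              // radians of accumulated error

    // Even line: f*sin + g*cos; odd line: the sin component is negated.
    std::vector<double> even_line, odd_line;
    for(std::size_t i = 0; i < 64; ++i) {
        const double t = double(i) / sample_rate;
        even_line.push_back( f * std::sin(m*t + phase_error) + g * std::cos(m*t + phase_error));
        odd_line.push_back (-f * std::sin(m*t + phase_error) + g * std::cos(m*t + phase_error));
    }

    const FG even = demodulate(even_line, m, sample_rate);
    FG odd = demodulate(odd_line, m, sample_rate);
    odd.f = -odd.f;     // undo the known PAL inversion on odd lines

    std::printf("even line: f=%.3f g=%.3f\n", even.f, even.g);
    std::printf("odd line:  f=%.3f g=%.3f\n", odd.f, odd.g);
    std::printf("average:   f=%.3f g=%.3f (ideal %.3f %.3f)\n",
                0.5*(even.f + odd.f), 0.5*(even.g + odd.g), f, g);
    return 0;
}
```

Each line individually decodes to a skewed hue, but the average of the pair keeps the correct hue and is merely slightly desaturated, by a factor of the cosine of the phase error.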
Less substantially, PAL is broadcast with a different subcarrier frequency and a line-by-line offset of approximately 0.25 cycles. The phase of the colour burst also swings by 90 degrees between alternate lines (45 degrees either side of its average) so that the receiver knows which way to decode each line.
SECAM avoids phase transmission errors by not using QAM at all. Instead it frequency-modulates the colour subcarrier with a single colour component at a time, alternating which component is supplied with each line.
It was therefore more expensive to implement than PAL or NTSC when introduced, because it mandates a line of storage; NTSC requires no storage at all, and for PAL storage is only an optional improvement.