Video technology

This article on Video technology deals with a few common technical issues related to video, in particular underwater video. First, some fundamental properties of a video data stream are discussed. Then analogue and digital video are described, followed by a short note on compression of the latter. For a discussion of the uses of underwater video, see the article application and use of underwater video; for an introduction to equipment used under water, see underwater video systems.

Fundamental properties of the video signal

A video data stream (analogue or digital) is characterized by a number of parameters. Their actual values constitute a trade-off between the available bandwidth (or channel capacity/data rate for digital signals) and the information content. Information theory states that the conveyable information content of a transmission channel is directly proportional to the frequency range (or the data rate) of the signal used for the communication. The larger the bandwidth/data rate, the more information can be conveyed.

Each individual image in a video stream is called a frame. For analogue video formats, a frame is specified as a number of horizontal (scan) lines (sometimes called TV-lines, TVL), each with a determined length in time. A digital image is defined as a number of rows of picture elements/pixels, or a matrix of pixels if you like. The size of a frame and the frame rate are important components in the description of a video stream.

A complete description of a video stream is called a video format; this term is sometimes extended to descriptions also of physical media (tapes, discs), transmission details or equivalent.

Many of the parameters used to describe video formats originate in analogue video/TV standards and are more or less obsolete in the context of digital video. Since they are widespread and still in use, they are nevertheless described here in some detail.

Frame-rate

The number of still images, frames, per second is also known as the frame rate, measured in frames per second (fps) or Hz. If a sequence of still pictures is shown at a frame rate above 10-12 fps or so, the human perceptual system will regard them as a moving scene (Anderson & Anderson, 1993[1]).

Different video standards have different frame rates. The analogue television standards used in Europe, Australia and elsewhere (PAL, SECAM) specify 25 Hz, as does the digital MPEG-2/DVB-T broadcasting that is replacing them. Another standard, NTSC, used in North America, Japan and elsewhere, specifies 29.97 Hz. Digital formats sometimes allow arbitrary frame rates, specified in the file or streaming format.
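The NTSC rate of 29.97 Hz is actually the exact fraction 30000/1001, which matters when computing timestamps over long recordings. As a small sketch (the rates are from the standards above; the function and variable names are our own):

```python
from fractions import Fraction

# Exact frame rates for the two analogue TV standards mentioned above;
# NTSC is not exactly 29.97 Hz but 30000/1001 Hz.
PAL_RATE = Fraction(25)
NTSC_RATE = Fraction(30000, 1001)

def frame_timestamp(frame_index, rate):
    """Return the start time (in seconds) of a given frame."""
    return frame_index / rate

# After one nominal hour of frames, NTSC video lags real time slightly:
pal_hour = frame_timestamp(25 * 3600, PAL_RATE)    # exactly 3600 s
ntsc_hour = frame_timestamp(30 * 3600, NTSC_RATE)  # 3603.6 s
print(float(pal_hour), float(ntsc_hour))
```

This small discrepancy is the reason NTSC workflows use "drop-frame" timecode to keep clock time and frame counts aligned.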

Aspect ratio

The aspect ratio describes the relation between width and height of a video screen (screen aspect ratio) or of the individual pixels (pixel aspect ratio). A traditional television screen has a screen aspect ratio of 4:3, while wide-screen sets use an aspect ratio of 16:9. Computer monitors and digital video usually have nominal screen aspect ratios of either 4:3 or 16:9.

The pixel aspect ratio is related to a single pixel in digital video. The pixel aspect ratio displayed on monitors is usually 1:1 (square pixels), while digital video formats often specify other ratios, inherited from analogue video standards and the conversion from analogue to digital signals.

As an example, consumer camcorders, often used for underwater video recordings, are often based on a digital video standard called DV (commonly transferred to computers over the IEEE 1394/FireWire interface). DV is defined with a 4:3 screen aspect ratio and a screen resolution (for PAL) of 720x576, but with a pixel aspect ratio of approximately 1.09:1, i.e. pixels slightly wider than square. This means that a DV PAL video will appear horizontally distorted if displayed on a computer monitor with square pixels, for example in an editing program. There are ways to correct this, either by cropping the image or by re-sampling it, and in practice it is often not important.
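The relation between stored and displayed size can be computed directly from the pixel aspect ratio. A sketch, assuming the commonly quoted PAL DV pixel aspect ratio of 59:54 (about 1.09); the function name is our own:

```python
def display_size(storage_width, storage_height, pixel_aspect):
    """Width/height a frame should be shown at on a square-pixel display.

    pixel_aspect is width/height of a single stored pixel; widths are
    scaled by it so the picture keeps its intended proportions.
    """
    return round(storage_width * pixel_aspect), storage_height

# PAL DV storage size with the assumed 59:54 pixel aspect ratio:
print(display_size(720, 576, 59 / 54))
# -> (787, 576); slightly wider than 4:3 because the full 720-pixel
#    line includes some blanking samples beside the 702 active pixels
```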

Analogue video

In older video cameras (until around 1990) a picture is projected by the camera's optics onto a light-sensitive plate in a specialized electronic component called a video camera tube. The photons of the projected picture change the electrical properties of the camera tube plate; more light (more photons) induces a larger change.

By scanning the plate with a focused beam of electrons, moved in a pattern, these property changes can be read out from the camera tube as small current changes. The optical picture can thus (after amplification) be represented by a variation in voltage or amplitude.

It is of course essential to know the way in which the photoelectric plate is scanned – for example when a new frame is started, the number of scan lines, when a line scan is started, the time to scan each line, etc. These synchronizing elements are indicated in the video signal by certain amplitude levels, different from those used for picture information.

If the image is monochrome, these voltage variations, picture as well as synchronization, are called the luma signal. Colour is added to the picture in a slightly more involved way and is transmitted as a chrominance (or chroma) signal.

To recreate the picture from an analogue signal, another cathode ray tube (CRT) can be used (cf. above). In this case, an electron beam is swept over a fluorescent surface, producing light (emitting photons) in proportion to its amplitude – higher amplitude means more light. By synchronizing the electron beam of the CRT to the one emanating from the camera (cf. above) the picture projected onto the camera tube plate can be recreated.

The number of analogue scan lines in a full frame is different for different video standards, but is 625 lines in the European analogue video standards PAL and SECAM; not all of these lines are used for image data, though.

Although modern cameras use solid-state image sensors (CCDs or CMOS), the signal emanating from them follows the standards that were established for CRT technology. This creates a number of anachronistic complications that may seem confusing, for example the interlaced lines in PAL and NTSC video.

Interlaced and progressive

Although the perceptual system interprets a sequence of images as motion, we will still see a flickering scene if the image is updated at a rate below 15 Hz or so; the flicker only gradually decreases up to maybe 75 Hz, above which most people are unable to see it. To increase the perceived rate of image updates without increasing the bandwidth needed, some video systems (notably those used for TV broadcasting) use a concept called interlaced video (as opposed to progressive), which sometimes causes unnecessary concern and confusion.

Interlacing is related to how the individual frames are captured in the camera and recreated on the monitor. Consider an image composed of horizontal lines. If every line is numbered consecutively, the image can be partitioned into two fields: the odd field (odd-numbered lines) and the even field (even-numbered lines). If the odd field is captured/recreated first, then the even field, the monitor screen is updated twice for each complete frame; a 25 Hz frame rate is seen as something updated at 50 Hz (the field rate). A disadvantage is that the technique can create visual artefacts, such as jagged edges, apparent motion or flashing. These artefacts are often seen when interlaced video is displayed on computer monitors or LCD projectors (which are progressive by nature), in particular when played back in slow motion or when capturing still pictures from the video stream.
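The partitioning into fields described above is just a matter of taking every second scan line. A minimal sketch (the function name is our own):

```python
def split_fields(frame):
    """Split a frame (a list of scan lines) into its two interlaced fields.

    Scan lines are conventionally numbered from 1, so the odd field
    starts at list index 0.
    """
    odd_field = frame[0::2]   # lines 1, 3, 5, ...
    even_field = frame[1::2]  # lines 2, 4, 6, ...
    return odd_field, even_field

frame = ["line%d" % n for n in range(1, 7)]
odd, even = split_fields(frame)
print(odd)   # -> ['line1', 'line3', 'line5']
print(even)  # -> ['line2', 'line4', 'line6']
```

In an interlaced stream the two fields are also captured at different moments in time, which is exactly why fast motion produces the jagged "combing" artefacts mentioned above.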

Progressive scan video formats capture and recreate all of the horizontal lines in a frame consecutively. The result is a higher (perceived) resolution and the absence of the artefacts described above.

Interlaced video can be converted to non-interlaced (progressive) video by more or less sophisticated procedures, collectively known as de-interlacing. De-interlacing removes the visual artefacts to an extent, but not entirely, and it sometimes introduces new impairments to the image, such as an apparent blurring.
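One of the simplest de-interlacing procedures keeps one field and interpolates the other from its neighbours; the sketch below averages the lines above and below each discarded line. This is only one naive strategy among many, and the function name is our own:

```python
def deinterlace_average(frame):
    """Naive de-interlacer: keep the odd field and replace each line of
    the even field with the average of the lines above and below it.

    frame is a list of rows, each row a list of pixel intensities.
    """
    out = [row[:] for row in frame]
    for i in range(1, len(frame) - 1, 2):  # even-field rows (0-indexed odd)
        out[i] = [(a + b) / 2 for a, b in zip(frame[i - 1], frame[i + 1])]
    return out

# Tiny 4-line, 2-pixel-wide frame where the even field disagrees with
# its neighbours (as it would during fast motion):
frame = [[0, 0], [100, 100], [10, 10], [90, 90]]
print(deinterlace_average(frame))
# -> [[0, 0], [5.0, 5.0], [10, 10], [90, 90]]
```

The interpolation removes the combing but, as noted above, at the price of vertical blurring: the true content of the discarded field is simply thrown away.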

Digital video

As described above, analogue video is a continuously varying value as a function of time – a signal – representing light changes in a projected scene. An analogue video signal is continuous in both time and amplitude, and (in theory) arbitrarily small fluctuations in the signal are meaningful and carry information.

A binary digital signal is either on or off (high/low, true/false, 1/0 etc.), but note that these states are generally represented by analogue levels being below or over set threshold values.

To represent the analogue signal as binary values, the signal is constrained to a discrete set of values, in both time and amplitude. This is done by a process called Analogue-to-Digital Conversion (ADC). In short, the analogue value is measured at regular intervals of time (the sampling rate) and represented as a stream of binary numbers. The size of the binary number (number of bits) determines the number of possible amplitude levels; the sampling rate limits the frequency content of the digitized signal.
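The two steps – sampling in time and quantizing in amplitude – can be sketched as follows for an arbitrary signal (the function name and toy parameters are our own):

```python
import math

def sample_and_quantize(signal, duration, sample_rate, bits):
    """Sample a continuous signal (a function of time, range -1..1) at
    sample_rate Hz and quantize each sample to the given number of bits."""
    levels = 2 ** bits
    samples = []
    for i in range(int(duration * sample_rate)):
        t = i / sample_rate                     # sampling: discrete time
        value = signal(t)
        # Quantization: map -1..1 onto the integer codes 0..levels-1
        code = min(levels - 1, int((value + 1) / 2 * levels))
        samples.append(code)
    return samples

# One period of a 1 kHz sine sampled at 8 kHz with 8-bit resolution:
codes = sample_and_quantize(lambda t: math.sin(2 * math.pi * 1000 * t),
                            duration=0.001, sample_rate=8000, bits=8)
print(codes)  # -> [128, 218, 255, 218, 128, 37, 0, 37]
```

With 8 bits only 256 amplitude levels exist, so the smooth sine becomes a staircase; more bits give finer steps, and a higher sampling rate captures faster variations, exactly the trade-offs described above.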

If a video stream is represented as digital values, either by converting an analog signal or by creating digital values directly, we have a digital video signal.

The number of rows and columns in the digital frame depends on the sample rate used. As an example, a PAL frame sampled for DV (a common digital video standard) at 13.5 MHz consists of 576 lines, each 720 pixels long (actually 702 image pixels, corresponding to the 52 µs active line, but part of the horizontal blanking is sampled too), while the same frame sampled at 6.4 MHz may contain 320x240 pixels.
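The 702-pixel figure follows directly from the sample rate and the active line time: microseconds times megahertz gives the number of samples. A one-line check (the function name is our own):

```python
def active_pixels(active_line_time_us, sample_rate_mhz):
    """Number of samples (pixels) taken during the active part of one
    analogue scan line: us * MHz = samples."""
    return active_line_time_us * sample_rate_mhz

# The 52 us active line of PAL sampled at 13.5 MHz:
print(active_pixels(52, 13.5))  # -> 702.0 pixels
```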

When converting from analogue to digital format, the synchronizing components of the video signal are used to determine what part of the signal to sample, but a digital frame contains only the image information, and the number of lines in the digital frame is reduced when compared to its analogue counterpart; for PAL 576 lines instead of 625 if sampled at 13.5 MHz, for example.

Other digital video formats, not originating in analogue video, have other sizes. For example High Definition video (as standardized in ITU-R BT.709) can have a picture size of 720 rows of 1280 pixels, the computer monitor standard SVGA has 600 rows of 800 pixels each, etc.

Compression of digital video

Bit rate is a measure of the channel capacity, the amount of data conveyed over a (binary) digital channel. It is measured in bits per second (bit/s, sometimes bps). More bits per second generally means better video quality. The bit rate can be fixed or variable; real-time streaming video often uses a fixed rate, while recorded video may use a variable bit rate.
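For a fixed bit rate, the storage needed follows directly from rate times duration. A sketch using nominal, illustrative figures (roughly 25 Mbit/s for DV video and 5 Mbit/s for a typical DVD; the function name is our own):

```python
def storage_bytes(bit_rate_bps, seconds):
    """Bytes needed to store a stream of the given fixed bit rate."""
    return bit_rate_bps * seconds / 8  # 8 bits per byte

# One hour of video at the two assumed rates:
dv_hour = storage_bytes(25_000_000, 3600)
dvd_hour = storage_bytes(5_000_000, 3600)
print(dv_hour / 1e9, dvd_hour / 1e9)  # -> 11.25 2.25 (decimal gigabytes)
```

The factor of five between the two illustrates why compression (next section) matters: the DVD stream carries comparable pictures at a fraction of the data.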

Digital video can be compressed, i.e. the number of bits necessary to convey the images can be decreased, lowering the bit rate. This data compression (or encoding) is possible because the images contain spatial and temporal redundancies, which are removed in the compression process.

As a simplified example, consider transmitting "20xZ" instead of "ZZZZZZZZZZZZZZZZZZZZ" – a compression ratio of 1:5. This of course implies that the receiver knows how to interpret "20xZ" – a process known as decoding.
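This "20xZ" trick is known as run-length encoding, and a working encoder/decoder pair fits in a few lines (the function names are our own):

```python
def rle_encode(data):
    """Run-length encode a string into (count, character) pairs."""
    encoded = []
    i = 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i]:
            j += 1                      # extend the current run
        encoded.append((j - i, data[i]))
        i = j
    return encoded

def rle_decode(pairs):
    """Invert rle_encode: expand each (count, character) pair."""
    return "".join(ch * count for count, ch in pairs)

runs = rle_encode("ZZZZZZZZZZZZZZZZZZZZ")
print(runs)              # -> [(20, 'Z')]
print(rle_decode(runs))  # -> 'ZZZZZZZZZZZZZZZZZZZZ'
```

Note that the scheme only pays off when runs are long; data without repetition can even grow, which foreshadows the remarks below on poorly compressible underwater footage.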

Generally, the spatial redundancy is reduced by analysis of changes within a frame (intra-frame compression), while the temporal redundancy is reduced by registering differences between frames (inter-frame compression).
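The idea behind inter-frame compression can be sketched as storing only the pixels that changed since the previous frame (real codecs are far more sophisticated, using motion estimation and transform coding; the function names here are our own):

```python
def frame_delta(previous, current):
    """Inter-frame sketch: record only changed pixels as (index, value)
    pairs instead of transmitting the whole frame."""
    return [(i, c) for i, (p, c) in enumerate(zip(previous, current)) if p != c]

def apply_delta(previous, delta):
    """Reconstruct the current frame from the previous one and the delta."""
    frame = list(previous)
    for i, value in delta:
        frame[i] = value
    return frame

prev = [10, 10, 10, 10]          # previous frame (flattened pixels)
curr = [10, 99, 10, 10]          # one pixel changed
delta = frame_delta(prev, curr)
print(delta)                     # -> [(1, 99)]
print(apply_delta(prev, delta))  # -> [10, 99, 10, 10]
```

When almost nothing moves between frames the delta is tiny; when the whole picture changes (a moving camera, as discussed below) the delta approaches the size of the frame itself and the scheme gains little.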

There are a number of standards for video compression; DV and MPEG-2 (used for DVDs) are just two. The scheme or algorithm for encoding/decoding is often packaged as a small plug-in-like program (a codec) that fits into a larger framework (a container file format). For example, the Microsoft format AVI is a container format in which many different codecs can be used. Other well-known container formats include Apple's QuickTime, DMF and RealMedia.

A compression algorithm, and hence a codec, is a trade-off, emphasizing different aspects of the compressed video: colour, detail resolution, motion, file size or low bit rate, ease of (de)compression, etc. There are hundreds of codecs with different qualities available.

Unfortunately, underwater video often does not compress well. Consider a typical underwater scene, where the camera is moving across a seabed covered with vegetation. The scene itself contains few areas with uniform properties: there are many details, changes in light and colour, and so on; there is no blue sky covering 50% of the picture. Intra-frame compression is therefore not very efficient. Since the camera is moving, the entire picture changes between frames, which hampers inter-frame compression. How this "incompressibility" shows up in the result depends on the actual codec used. It may be seen as larger files, a lower frame rate, flattening of the colours, artefacts, loss of resolution, etc.

There is really no way around this other than to increase the amount of data, that is, to use a higher bit rate. In practice, and for standard resolution video, DV or DVD bit rates are sufficient for all but the most demanding tasks.

References

  1. Anderson, J. and Anderson, B. (1993). The Myth of Persistence of Vision Revisited. Journal of Film and Video 45(1): 3-12.
The main author of this article is Peter Jonsson
Please note that others may also have edited the contents of this article.