Animation can be generated either in real time or in single-frame mode. Real-time implies that the images are generated at a rate fast enough to produce the perception of persistence of motion. For general purposes, this rate is usually taken to be one image every 1/24th of a second (24 frames per second); the actual rate required depends on the types of images being viewed and on the specific viewing conditions. Unfortunately, it is not uncommon to hear people speak of real-time when referring to rates as low as five frames a second.
If the imagery cannot be produced at a fast enough rate to provide real-time animation, then it can be generated a single frame at a time, and each frame can be recorded on some medium so that it can be played back later at animation rates (i.e., rates fast enough to produce persistence of motion). Whether real-time or single-frame animation is called for depends on the image quality required, the computational complexity of the motion, and the power of the hardware used to calculate the motion and render the images. Model-based motion control algorithms (e.g., computational fluid dynamics) can require processing that is too intense to be done in real time. Sometimes it is possible to precompute the motion and then render it in real time.
The reader should note the distinction between the two processes taking place here: motion control and rendering. In one possible scenario, the motion control might be a sophisticated simulation of physical processes that brings a supercomputer to its knees; but at the end of the calculation, a series of transformations for the objects over time is produced, which the display hardware can read in and render in real time. At the other extreme, simple motion control, such as linear interpolation, may be used in conjunction with a software ray tracer for the rendering. In this case, the motion calculation may be able to run at real-time rates, but the renderer cannot produce images in real time.
Another distinction to be made is between playback rate and update rate. The playback rate is the rate at which frames are displayed on the display device; the update rate is the rate at which the motion is computed. For example, as is common with some cartoons, the update rate may be as low as 8 frames a second even though the playback rate is 30 frames a second. Similarly, with interlaced scan (explained below), the update rate may be 60 times a second while the playback rate is 30 frames a second.
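To make the relationship concrete, the following sketch (an illustrative calculation only, not drawn from the text) shows one simple way to decide which motion update is displayed on each playback frame when the update rate is lower than the playback rate: hold each motion sample until the next one is due.

    #include <stdio.h>

    int main(void)
    {
        const double update_rate   = 8.0;    /* motion samples computed per second */
        const double playback_rate = 30.0;   /* frames displayed per second        */

        /* For each displayed frame, show the most recently computed motion sample. */
        for (int frame = 0; frame < 30; frame++) {
            int update = (int)(frame * update_rate / playback_rate);
            printf("playback frame %2d shows motion update %d\n", frame, update);
        }
        return 0;
    }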
The first computer animation used film to record the single frames. Single-frame film technology has been around for a long time; it was used in the late 1800s to produce some of the first film animation. The requirement is a film recorder capable of precisely positioning a single frame of film over the shutter and holding it static. If the positioning is not precise enough, the image will jitter or float when played back. For each frame, the image is displayed and the shutter is opened and then closed, during which time the film is exposed to the image. The film can then be advanced to the next frame and the next image displayed in preparation for the next exposure. The opening and closing of the shutter can be done manually, although this is a very labor-intensive operation; it is facilitated greatly if there is a computer interface to the camera. This technology had been developed for traditional single-frame animation (stop-motion animation). One advantage of film technology is the resolution of the medium itself: the emulsion that coats the film celluloid can capture very high-frequency image components.
The early film medium was 16mm film. The problem with this technology was its standard playback rate of 16 frames per second, which often is not fast enough to avoid flicker. The 35mm standard, with its playback rate of 24 frames per second, results in a much more stable image and has been used for most of the computer animation captured on film. For very high quality animation, a 70mm standard is used, also with a playback rate of 24 fps; 70mm usually requires that images be calculated at a resolution of at least 2000 by 2000, which drives up the cost of computer-generated animation.
An easy way to transfer an image onto film is to position a camera in front of the screen, plot an image on the computer screen, open the camera shutter, close the camera shutter, advance the film to the next frame, and repeat the process. Drawbacks of this approach include the difficulty of eliminating extraneous light, the curvature of the screen, color shifting, and the mechanical difficulty of maintaining a stable device. Another thing to keep in mind is that the image on the computer screen is not static but is continuously being drawn, decaying, and redrawn (refreshed), even when its content does not change. Therefore it is important either that the camera shutter be open for several refreshes of the computer screen, so that a solid image is recorded on the film, or that it be precisely coordinated with the screen refresh, so that exactly one complete refresh is captured on film. Usually the former approach is taken because of its relative simplicity.
Some of the early computer animation used random vector displays to render the frames. A color filter was placed on the camera, and that component of the image was scanned out while the camera shutter was open. The process was repeated for the various color components (usually red, green, and blue).
Currently there are many film recording products made specifically for computer animation. Film recorders have a special high-resolution, flat screen onto which the image is drawn so as to minimize distortion. On some units, mounting brackets allow either SLR cameras or single-frame motion-picture cameras to be attached.
Film plotters use special electronics that 'draw' an image directly onto the film without an intermediate screen. Typically, these special-purpose recorders and plotters are designed as very high-resolution devices (e.g., 4000 by 4000).
Some of the main drawbacks to using film technology are 1) the medium (film) can't be reused and 2) there is a delay between the time the recording is done and the time it can be viewed (developing).
The advent of video technology and the fact that it is driven by a mass consumer market have brought it into the price range of just about everybody. This has resulted in affordable video single-frame recorders and controllers.
Video technology is based on a raster scan refresh format. Raster scan refers to the pattern used to scan out the image: top to bottom, a line at a time, and left to right along each line. A line is called a scanline. The image is drawn by an electron beam that strikes a phosphor-coated screen, which emits photons in the form of light. The intensity of the electron beam is controlled by the image being scanned out, whether that image is stored in the digital memory of a computer or generated by the similar raster scanning of a video camera. After a scan of an individual scanline, the electron beam is turned off and repositioned at the beginning of the next scanline. The time it takes to do this is called the horizontal retrace interval, and the signal that notifies the electronics of this is called horizontal blanking or horizontal sync. When the beam gets to the bottom of the image, it is turned off and returned to the top left of the screen. The time it takes to do this is called the vertical retrace interval, and the signal that notifies the electronics of this is called vertical blanking or vertical sync. A complete scan of all the scanlines of an image is called a frame. In some video formats, all of the scanlines are scanned in a single pass (progressive scan). In other video formats, every odd-numbered scanline is scanned on one pass and every even-numbered scanline on the next pass (interlaced scan). In interlaced scan, each pass is called a field (two fields per frame).
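As a concrete illustration of the two scanning orders, the short program below (illustrative only; it assumes scanlines are numbered from 1 at the top and that the first field carries the odd-numbered lines) prints the order in which scanlines are refreshed for a small progressive frame and for the two fields of an interlaced frame.

    #include <stdio.h>

    #define LINES 10   /* a tiny example frame; real frames have hundreds of scanlines */

    int main(void)
    {
        int line;

        /* Progressive scan: every scanline is drawn in a single pass (one frame). */
        printf("progressive frame: ");
        for (line = 1; line <= LINES; line++)
            printf("%d ", line);
        printf("\n");

        /* Interlaced scan: odd scanlines in the first field, even scanlines in the
           second field; the two fields together make up one frame.               */
        printf("interlaced field 1: ");
        for (line = 1; line <= LINES; line += 2)
            printf("%d ", line);
        printf("\ninterlaced field 2: ");
        for (line = 2; line <= LINES; line += 2)
            printf("%d ", line);
        printf("\n");
        return 0;
    }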
The National Television Systems Committee (NTSC) in 1941 established 525-line, 60.00 Hz field rate, 2:1 interlaced monochrome television in the United States. In 1953, 525-line, 59.94 Hz field rate, 2:1 interlaced, composite color television signals were established. Broadcast video must correspond to this specific standard, which sets specific times for the horizontal scanline, the frame, the amplitude and duration of the vertical sync pulse, and so on. Home video units typically generate much sloppier signals and would not qualify for broadcast. There are encoders that can strip the old sync signals off a video signal and re-encode it so that it does correspond to broadcast-quality standards. Specific pieces of video equipment are mentioned later in this section.
There are 525 total scanline times per frame time in the NTSC format, and 29.97 frames are transmitted per second with a 2:1 interlace of the scanlines in alternate fields. Of the 525 total raster lines, 480 contain picture information; the remainder comprise vertical scanning overhead. The aspect ratio of a 525-line television picture is 4:3, so equal vertical and horizontal resolution is obtained at a horizontal resolution of 480 times 4/3, or 640 pixels per scanline. PAL and SECAM are the other two standards in use around the world. They differ from NTSC in specifics, such as the number of scanlines per frame and the refresh rate, but both are interlaced raster formats. One of the reasons that television technology uses interlaced scanning is that, when a camera is providing the image, the motion is updated every field, producing smoother motion.
A black-and-white video signal is basically a single line that has the sync information and the intensity signal superimposed on one signal. The vertical and horizontal sync pulses are negative with respect to a reference level, with vertical sync being a much longer pulse than horizontal sync. On either side of the sync pulses are reference levels called the front porch and back porch. Between horizontal sync pulses, which mark the boundaries between scanlines, is the active scanline interval. During the active scanline interval, the intensity of the signal controls the intensity of the electron beam of the monitor as it scans out the image (see Figure X).
A color monitor has three electron guns, each of which is focused on one of the three phosphor coatings on the screen. These phosphors are almost always some shade of red, green, and blue. One way to drive a color monitor is to have four lines going into it: red, green, blue, and sync. Sometimes green and sync are superimposed onto one line, in which case that line resembles a black-and-white TV signal; the monitor then has three lines going into it: red, green/sync, and blue.
It is often the case that a doubling of the input value on one of the lines does not result in a doubling of the light emitted from the screen. Gamma correction is a modulation of the input signal used to compensate for the non-linear response of the display screen. In graphics systems this is often done by a look-up table which converts the input value to a new value such that a linear response is produced at the screen output.
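As a minimal sketch of how such a table might be built (assuming an 8-bit frame buffer and a display whose light output follows a simple power law with a gamma of around 2.2; both assumptions are illustrative rather than taken from the text):

    #include <math.h>

    /* Fill an 8-bit gamma-correction look-up table.  If the display's light
       output follows out = in^gamma, pre-distorting each value by the inverse
       power 1/gamma yields an approximately linear response at the screen.   */
    void build_gamma_lut(unsigned char lut[256], double gamma)
    {
        for (int i = 0; i < 256; i++) {
            double v = i / 255.0;                       /* normalize to [0,1] */
            lut[i] = (unsigned char)(pow(v, 1.0 / gamma) * 255.0 + 0.5);
        }
    }

A program would then write lut[value] to the frame buffer (or load the table into the display hardware) instead of writing the value directly.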
When color came on the scene in broadcast television, engineers were faced with incorporating the color information in such a way that black-and-white TVs could still display a color signal and color TVs could still display black-and-white signals. The solution was to encode color in a high-frequency component superimposed on the intensity signal of the black-and-white video. A reference signal for the color component, called the color burst, was added to the back porch of each horizontal sync pulse. The color was encoded as an amplitude and phase shift with respect to this reference signal.
A signal that has separate lines for the color signals is referred to as a component signal. A signal such as the color TV signal, with all of the information superimposed on one line, is referred to as a composite signal.
Because of the limited room for color information in the composite signal, the television engineers optimized the color encoding for the hue they considered most important: Caucasian skin tone. To do this, the RGB information had to be converted into a different color space, YIQ. Y is luminance and is essentially the intensity information found in the black-and-white signal. It is computed as:
Y= 0.299*R + 0.587*G + 0.114*B
The YIQ television signal is related to the YUV color space (and, through Y, to the CIE XYZ space) in that the Y's (luminance) are the same. U and V are color difference signals: scaled versions of B-Y (by .5/.886) and R-Y (by .5/.701), respectively. The I and Q chrominance signals used in television carry the same two degrees of freedom as the UV space. I and Q are the signals used to modulate the amplitude and phase of the 3.58 MHz color reference signal. The phase of this chroma signal, C, conveys a quantity related to hue, and its amplitude conveys a quantity related to color saturation; in fact, I and Q stand for "in phase" and "quadrature," respectively. The NTSC system mixes Y and C together and conveys the result on one piece of wire. The result of this addition operation is not theoretically reversible; the process of separating luminance and color often confuses one for the other (e.g., the appearance of color patterns seen on TV shots of people wearing black-and-white seersucker suits).
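As an illustration of the conversion, the routine below maps an RGB pixel (components in the range [0,1]) to YIQ; the Y row uses the luminance weights given above, and the I and Q rows use the commonly quoted (approximate) NTSC coefficients.

    /* Convert an RGB pixel (each component in [0,1]) to YIQ.  The I and Q
       coefficients are the commonly quoted approximations for NTSC.        */
    void rgb_to_yiq(double r, double g, double b,
                    double *y, double *i, double *q)
    {
        *y = 0.299 * r + 0.587 * g + 0.114 * b;
        *i = 0.596 * r - 0.274 * g - 0.322 * b;
        *q = 0.211 * r - 0.523 * g + 0.312 * b;
    }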
The video format information can be summarized as follows:

- Composite video formats: NTSC, PAL, SECAM; newer developments: IDTV, EDTV, HDTV.
- NTSC format: 525 lines, 59.94 Hz, 2:1 interlaced (525/59.94/2:1); 4:3 aspect ratio; 480 scanlines of picture information, which with the aspect ratio gives 640 pixels per scanline.
- Black-and-white NTSC: sync and amplitude (intensity) on one signal, with specific timings.
- Color NTSC: color added while maintaining compatibility with the b&w signal; color burst: 3.58 MHz subcarrier; YUV (similar to the YIQ actually used: In-phase and Quadrature); Y = luminance = .299R + .587G + .114B; U = B-Y; V = R-Y; I and Q form a coordinate system in U,V space used to actually encode the color information; I and Q carry the color information and are allotted less bandwidth; I and Q modulate the 3.58 MHz subcarrier, with phase encoding hue and amplitude encoding saturation.
- S-VHS and ED-Beta format connectors: Y and C on separate wires; S-VHS has severely limited bandwidth for chroma.
- PAL and SECAM: 625/50/2:1 (still 4:3 aspect ratio); PAL and SECAM are actually the color modulation methods; PAL is similar to NTSC; 576 lines contain picture information.
- IDTV (improved definition TV): processing that uses field-store and/or frame-store (memory) techniques at the receiver (e.g., de-interlacing at the receiver); involves no change to picture-origination equipment and no change to emission standards.
- EDTV (extended or enhanced definition TV): employs techniques at the transmitter and receiver that are transparent to existing receivers (e.g., separating luminance and color components by pre-combing the signals prior to transmission, reducing NTSC artifacts such as dot crawl; or using progressive scan at the camera and de-interlacing at the receiver); requires changes in picture-origination equipment but complies with emission regulations.
- HDTV (high definition TV): approximately twice the horizontal and twice the vertical resolution, component color coding, 16:9 aspect ratio, and a frame rate of at least 24 Hz (e.g., 1120/60.00/2:1).
The size of the tape, the speed of the tape, and the encoding format all contribute to the quality that can be supported by a particular video format. Common tape sizes are 1/2", 3/4", 1", and 2". Up until about 15 years ago, before 1" made its debut, 1/2" was strictly consumer grade, 3/4" was industrial strength, and 2" was professional broadcast quality.
Current common 1/2" video formats are VHS, Beta, S-VHS, and ED-Beta. VHS and Beta are the two consumer-grade video formats. They differ primarily in the speed of the tape, which determines how much tape is used for a single frame; the more tape used per frame, the more information can be stored and, therefore, the better the image and/or sound that can be recorded and played back. S-VHS is a format in which the Y and C signals are kept separate when played back, avoiding the problems created when the signals are superimposed. All video equipment actually records the signals this way, but S-VHS allows the Y (luminance) signal to be recorded at a higher than normal resolution. The color information is recorded to the same fidelity as on VHS. The sound is also encoded differently from regular VHS, again resulting in greater fidelity. The advantages of S-VHS are especially pronounced when played back on an S-VHS compatible television.
In addition, there are two digital formats, D-1 and D-2. Both were originally 8-bit formats but have recently been expanded to 10 bits. Current recording devices are still 8-bit, but supporting equipment such as frame synchronizers and switchers handles 10-bit formats.
D-1 came first and is a component format. It uses YUV coding, so-called 4:2:2, which means that the U and V components are horizontally subsampled 2-to-1. Luminance is sampled at 13.5 MHz, giving 720 samples per picture width. The aggregate data rate is roughly 27 MB/s (megabytes per second). D-1 was standardized back when the industry thought it would make the composite-analog to component-digital transition in one fell swoop, but that didn't happen. The cost was somewhere above $100K.
Ampex saw a niche and came up with the less expensive D-2 composite NTSC digital format (i.e., digitized NTSC). The composite signal is sampled at four times the color subcarrier, about 14.318 MHz, at one byte per sample (an aggregate data rate, of course, of 14.318 MB/s). It has all the impairments of NTSC, but with the reliability and performance of digital. It uses the same cassette as D-1.
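As a check on these figures, the small program below (an illustrative calculation only) works out the aggregate data rates of both formats from their sampling frequencies, at one byte per sample.

    #include <stdio.h>

    int main(void)
    {
        /* D-1 (component 4:2:2): luminance at 13.5 MHz plus two chroma
           components at 6.75 MHz each, one byte per sample.              */
        double d1_rate = 13.5e6 + 2.0 * 6.75e6;

        /* D-2 (composite): NTSC sampled at four times the color subcarrier
           (4 x 3.579545 MHz), one byte per sample.                        */
        double d2_rate = 4.0 * 3.579545e6;

        printf("D-1: %.1f MB/s\n", d1_rate / 1.0e6);   /* about 27 MB/s     */
        printf("D-2: %.3f MB/s\n", d2_rate / 1.0e6);   /* about 14.318 MB/s */
        return 0;
    }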
The components of a typical video facility are:

- Image generation: camera; computer (either single-frame or real-time).
- Video signal generation: scan converter (averages a high-resolution image down to NTSC resolution and encodes it); NTSC encoder (RGB to NTSC conversion); sync generator; time-base corrector (used for image capture or for generating a broadcast-quality signal).
- Image storage: single-frame VCR and controllers, or a film recorder; digital still store; laser disc.
- Image manipulation: paint systems; image processing; image compositing (video compositor, digital compositing, chroma keys, wipes, mattes).
- Monitoring: scopes; NTSC signal monitor; RGB vs. NTSC monitors.
The main problem in producing animation is, of course, recording the frames in sequence so that they can be played back as an animation sequence. There are various alternatives for recording video in sequence. One is to record it directly onto video tape. This requires a video recorder capable of recording a single frame at a time, and it requires a video controller that can control the recorder based on signals it gets from a computer. Another medium on which single frames can be recorded is the digital disk, which must be capable of real-time conversion and playback of the video signal.
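In outline, single-frame recording is a simple loop: compute a frame, wait for the recorder to be ready, record it, and move on. The sketch below is only a schematic of that loop; the routines it calls are hypothetical placeholders, not the interface of any particular controller or renderer.

    #include <stdio.h>

    /* Hypothetical stand-ins for an application's renderer and a frame
       controller's interface; a real system would substitute its own
       rendering code and recorder driver here.                          */
    static void render_frame(int frame) { printf("rendering frame %d\n", frame); }
    static int  recorder_ready(void)    { return 1; }
    static void record_one_frame(void)  { printf("frame recorded\n"); }

    /* Schematic single-frame recording loop. */
    static void record_animation(int num_frames)
    {
        for (int frame = 0; frame < num_frames; frame++) {
            render_frame(frame);          /* compute and display the frame      */
            while (!recorder_ready())
                ;                         /* wait for the controller/tape deck  */
            record_one_frame();           /* lay the displayed image onto tape  */
        }
    }

    int main(void)
    {
        record_animation(3);
        return 0;
    }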
Representative equipment and approximate costs:

- Abekas: D1 format for input and output; needs an NTSC encoder and sync generator; $64.5K for 25 seconds of video, plus $50K for an additional 25 seconds; supports digital editing.
- Single-frame video: needs an NTSC encoder and sync generator (about $5K for both); about $12K for a controller with external controls (e.g., Lyon-Lamb); 1", 3/4", or the newer 1/2" single-frame recorders; at least $6K for a single-frame recorder (SONY 3/4"); the new 1/2" single-frame recorders (ED-Beta) might be cheaper.
- Laser disc: Panasonic, SONY (about $20K); 45 seconds per side of disc, at $300 per disc; the SONY is analogue recording(?).
Special-purpose graphics hardware can produce real-time or near-real-time computer animation. It comes in forms ranging from flight simulators to graphics workstations to personal computers with built-in graphics processors. While the term real-time is loosely defined in manufacturers' claims, true real-time performance would produce on the order of thirty frames of animation a second. Notice the difference between the refresh rate, which is the number of times the image on the display gets refreshed (typically thirty or sixty times a second on most displays), and the animation rate, which refers to the number of different images that can be produced and displayed per second. Saturday morning cartoons have degenerated into the range of six to eight frames of animation per second (while the TVs they are shown on still operate at a refresh rate of thirty frames a second).
Simulators are special-purpose computer graphics systems designed solely to produce displays of shaded imagery in response to human-manipulated controls in mock-up cockpits. Usually most of the database is static, containing only a few moving objects such as planes, boats, tanks, and cars. A human operator manipulates controls, which the simulator samples and processes to update the status of the vehicle being simulated. This results in a new viewpoint, which must be used to produce new images on the screens of the cockpit.
Graphics workstations have built-in display processors that can typically handle tens of thousands of polygons in real time. However, the definition of a 'display polygon' varies from spec to spec and can drastically affect performance statistics. Silicon Graphics, HP, DEC, and SUN are among the manufacturers that support special graphics facilities in otherwise general-purpose workstations to enhance display performance. Application programs, if they are fast enough, can then produce polygon definitions at real-time rates and pass the polygons on to the rendering engines inside these workstations.
Some personal computers, most notably the Amiga, also have special-purpose graphics hardware built in that operates in the same way. However, at this level the support is almost exclusively for two-dimensional graphics.