From The MPEG-4 Structured Audio Book by John Lazzaro and John Wawrzynek.

Part IV/3: Signal Processing Core Opcodes

Sections

Introduction.
Level Matching. Absolute and relative.
Specialops. Blending a-rate and k-rate semantics.
Window Wavetables. Generating window function tables.
Sample Rate Conversion. Map between a-rate and k-rate.
Gain Control. Compressor/limiter/expander/noise-gate opcode.
Fourier Analysis. Core opcodes fft and ifft.
Portamento. Smooth pitch changes.

Core Opcodes:
balance compressor decimate downsamp fft gain ifft port rms samphold sblock upsamp
Wavetable Generator:
window

Introduction

In this chapter, we complete our description of the core opcode library.

We describe opcodes that perform signal processing operations on a buffer of a-rate signal values. These opcodes perform operations such as gain control, sample-rate conversion, and Fourier analysis on the buffer.

We describe the specialop semantics that govern several of the opcodes described in the chapter. A specialop opcode computes at the a-rate, but returns values at the k-rate.

We describe the core wavetable generator window, which computes popular windowing functions used in block-based signal processing. Several of the opcodes in this chapter have wavetable parameters that window buffer data.

Level Matching

The a-rate core opcodes gain and balance act as simple automatic gain control systems. See the right panel for header syntax.

On each call, these opcodes return the scaled copy a*x of the input signal parameter x, where a is an internal variable that sets the attenuation of the system.

The attenuation variable is initialized to 1 during the first opcode call, and is updated as specified by the gain control algorithm for each opcode.

`gain`

The gain opcode returns a signal whose RMS power approximates the power level specified by the parameter g.

To perform this task, gain periodically recalculates the attenuation variable, using a formula (shown on the right panel) that measures the power level of recent values of the signal parameter x.

By default, the attenuation is updated once every control period (the inverse of the k-rate). The optional i-rate parameter length (units of seconds) overrides the default value for the attenuation period.

During the first call to gain, a buffer is created of sufficient size to hold the x values for an entire attenuation period, and the current x value is placed at the start of the buffer. Subsequent calls to gain fill successive positions in the buffer.

On the gain call that fills the buffer, a new attenuation value is computed, using the equation shown on the right panel. The buffer is cleared, and future calls to gain refill the buffer in preparation for the next attenuation update.

`balance`

The balance opcode returns a scaled copy of the parameter x. The returned signal has an RMS power level that approximates the power of the signal parameter ref.

To achieve this behavior, the opcode creates two buffers, to hold recent values of ref and x. The opcode periodically updates the attenuation parameter, to reflect the energy of the signals in the two buffers.

The control period (1/k_rate) sets the default length of balance's buffers, which may be overridden by the optional i-rate parameter length.

During the first call to balance, buffers are created for ref and x parameters, and the current values for ref and x are placed at the start of the buffers. Subsequent calls fill successive positions in the ref and x buffers.

On the opcode call that fills the buffers, a new attenuation value is computed, using the equation shown on the right panel. The buffers are cleared, and future calls to balance refill the buffers in preparation for the next attenuation update.

`gain`

    aopcode gain(asig x, ksig g [, ivar length])

    on every call, return a*x.

    on first call: set internal variable a to 1, and create
    buffer xh. the optional parameter length (units of seconds)
    sets the buffer size. if this parameter is not given, set
    length to 1/k_rate. xh contains

      L = floor(length*s_rate)

    samples. insert x into xh[0].

    on subsequent calls: put x into next position in xh.
    if xh is filled, compute new value of a:

                     g*sqrt(L)
      a = -------------------------------
          sqrt(xh[0]^2 + ... + xh[L-1]^2)

    this completes one cycle of the algorithm. on the next
    call, insert x into xh[0], starting the next cycle.
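The buffer-and-update cycle above can be modelled outside SAOL. The following is a minimal Python sketch, not the normative algorithm: the class name `Gain` and its constructor arguments are hypothetical, the attenuation is assumed to be updated before the return value of the buffer-filling call is computed, and the attenuation is assumed to be held when the buffer contains only zeros (the panel does not address either point).

```python
import math


class Gain:
    """Models one call site of the gain opcode (hypothetical helper)."""

    def __init__(self, length, s_rate):
        self.a = 1.0                                   # attenuation starts at 1
        self.size = int(math.floor(length * s_rate))   # L samples per cycle
        self.xh = []                                   # recent x values

    def __call__(self, x, g):
        self.xh.append(x)
        if len(self.xh) == self.size:                  # buffer is filled
            energy = sum(v * v for v in self.xh)
            if energy > 0.0:                           # assumption: hold a on silence
                self.a = g * math.sqrt(self.size) / math.sqrt(energy)
            self.xh = []                               # start the next cycle
        return self.a * x


gain = Gain(length=0.5, s_rate=8.0)                    # L = 4 samples
outputs = [gain(2.0, 1.0) for _ in range(8)]
```

With a constant input of 2 and a target level g of 1, the first three outputs pass through unscaled, and once the buffer fills the attenuation settles at 0.5.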

`balance`

    aopcode balance(asig x, asig ref [, ivar length])

    on every call, return a*x.

    on first call: set internal variable a to 1. create
    buffers xh and rh. optional parameter length (units of
    seconds) sets the buffer sizes. if this parameter is not
    given, set length to 1/k_rate. xh and rh contain

      L = floor(length*s_rate)

    samples. insert x into xh[0] and ref into rh[0].

    on subsequent calls: put x into next position in xh, and
    ref into next position of rh. if buffers filled, compute
    the new value of a:

          sqrt(rh[0]^2 + ... + rh[L-1]^2)
      a = -------------------------------
          sqrt(xh[0]^2 + ... + xh[L-1]^2)

    this completes one cycle of the algorithm. on the next
    call, insert x into xh[0], and ref into rh[0], starting
    the next cycle.
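The attenuation update for balance reduces to a ratio of two buffer energies. The Python sketch below shows only that step; the function name `balance_attenuation` is hypothetical, and the fallback to unity gain for a silent x buffer is an assumption, since the panel does not say what happens when the denominator is zero.

```python
import math


def balance_attenuation(xh, rh):
    """Returns the new attenuation a from full xh and rh buffers."""
    x_energy = sum(v * v for v in xh)
    r_energy = sum(v * v for v in rh)
    if x_energy == 0.0:
        return 1.0                  # assumption: unity gain for a silent x buffer
    return math.sqrt(r_energy) / math.sqrt(x_energy)


# x is twice as loud as ref, so x must be scaled by one half.
a = balance_attenuation([2.0, 2.0], [1.0, 1.0])
```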

Specialops

The gain opcode, if called without the optional length parameter, fills its buffer by accepting new x values with each a-rate call, and computes the signal energy of its buffer once per k-rate.

The rms opcode also performs this function, but returns the signal energy as its k-rate return value. See the right panel for the header syntax and exact semantics for the rms opcode.

The rms opcode is an example of a SAOL specialop opcode, which has aspects of both aopcode and kopcode semantics.

Like an a-rate opcode, the rms opcode runs at the a-rate in order to fill the buffer. But like a k-rate opcode, it also runs at k-rate, and returns a k-rate value.

The rules below set the semantics of specialop opcodes:

A specialop returns values at the k-rate. For the purpose of evaluating the rate of expressions, a specialop is considered to be a kopcode.
A specialop is evaluated at both the a-rate and the k-rate. However, the expression returns a value, and the statement containing it executes, at the k-rate.
A specialop may appear in an a-rate statement. If so, its k-rate return semantics work in the same way as a normal k-rate opcode call.

Specialop calls may only appear in instrument code, and in aopcode user-defined opcodes (described in Part IV/4). Specialop calls are also restricted in these ways:

An expression containing a specialop opcode is considered a specialop expression.
A specialop expression may not appear in the code block or guard expression of a while statement.
A specialop expression may only appear in the code block of an if or if-else statement if the guard expression of the statement is also specialop.

The right panel shows several examples of specialop semantics, using the rms opcode.

`rms`

    specialop rms(asig x [, ivar length])

    as a specialop, it runs at the a-rate and k-rate, but
    only returns values at the k-rate.

    k-rate, first call: create the buffer xh, and initialize
    values to zero. the optional parameter length (units of
    seconds, must be > 0) sets the buffer size. if this
    parameter is not given, set length to 1/k_rate. xh
    contains

      L = floor(length*s_rate)

    samples. create buffer index, set it to zero (first
    element).

    k-rate, all calls: return the value

      sqrt(xh[0]^2 + ... + xh[L-1]^2)
      -------------------------------
                  sqrt(L)

    a-rate, all calls: place the x value into the buffer xh,
    at the position of the buffer index. then increment
    buffer index. if index has value L, reset the index to 0.
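The split between a-rate and k-rate work can be made explicit by modelling rms as an object with one method per pass. This is a Python sketch under stated assumptions: the names `Rms`, `a_pass`, and `k_pass` are hypothetical, and the caller is assumed to invoke `a_pass` once per audio sample and `k_pass` once per control period.

```python
import math


class Rms:
    """Models the rms specialop: fills at the a-rate, reports at the k-rate."""

    def __init__(self, length, s_rate):
        self.size = int(math.floor(length * s_rate))   # L samples
        self.xh = [0.0] * self.size                    # buffer starts zeroed
        self.index = 0

    def a_pass(self, x):
        self.xh[self.index] = x
        self.index += 1
        if self.index == self.size:                    # wrap to the first element
            self.index = 0

    def k_pass(self):
        return math.sqrt(sum(v * v for v in self.xh)) / math.sqrt(self.size)


rms = Rms(length=0.5, s_rate=8.0)                      # L = 4 samples
before = rms.k_pass()                                  # buffer is still all zeros
for _ in range(4):
    rms.a_pass(3.0)
after = rms.k_pass()
```

Unlike gain, the buffer is never cleared: it is a circular buffer, so each k-rate call reports the level of the most recent L samples.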

Examples

    asig x;
    ksig y;

    // legal, rms runs at a-rate
    // and k-rate, but returns
    // a value at k-rate that is
    // assigned to y.

    y = rms(x);

    // legal, both rms run at a-rate
    // and k-rate, if condition is
    // true at k-rate, assignment
    // is made. rms in assignment
    // returns same value as rms in
    // conditional

    if (rms(x) > y)
      {
        y = rms(x);
      }

Sample Rate Conversion

The rms opcode converts the information carried by an a-rate parameter to a k-rate return value. In this sense, it performs a type of sample-rate conversion.

In this section, we describe other core opcodes that perform sample-rate conversion.

Downsampling Opcodes

Three other simple opcodes make a-rate signal information available at the k-rate. These opcodes are all specialop opcodes.

The decimate opcode returns (during its k-pass) one of the in parameter values that it received in the preceding set of a-pass calls. The opcode definition does not specify which in value is chosen.

The downsamp opcode buffers the in values of the last s_rate/k_rate opcode calls at the a-rate. At the k-rate call following the a-rate calls, it returns the mean of the buffer.

If the downsamp call includes the optional table parameter win, the wavetable values are multiplied with the buffer values point by point, and the opcode returns the sum of all multiplication results. If the win table is shorter than the buffer, zeros are used for the extra window values.

The sblock opcode buffers in values of the last s_rate/k_rate opcode calls at the a-rate. During the k-rate call, it places these buffer values in the table provided by parameter t, which must have at least s_rate/k_rate table values. The opcode always returns zero.
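The two return modes of downsamp described above can be sketched as a plain function over one control period of buffered samples. This is a Python illustration, not SAOL; the function name and the list-based buffer are assumptions, and a window longer than the buffer is simply truncated, a case the text does not cover.

```python
def downsamp(buf, win=None):
    """Returns the k-rate value for one control period of a-rate samples."""
    if win is None:
        return sum(buf) / len(buf)            # no window: mean of the buffer
    total = 0.0
    for i, value in enumerate(buf):
        w = win[i] if i < len(win) else 0.0   # short window: pad with zeros
        total += w * value
    return total


period = [1.0, 2.0, 3.0, 4.0]
mean_value = downsamp(period)
short_window = downsamp(period, [1.0, 1.0])   # only the first two samples count
```

Note that the windowed form returns a sum rather than a mean, so a window whose values add up to 1 is needed to keep the result on the same scale as the unwindowed form.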

Upsampling Opcodes

The simplest way to upsample control information to the audio rate is to assign a k-rate value to an a-rate variable. The upsamp and samphold core opcodes offer more sophisticated methods of upsampling.

The upsamp opcode upsamples the k-rate parameter in to a-rate via a shift-and-add technique. An optional table parameter win controls the spectral properties of the upsampling. The upsamp opcode reduces the aliasing artifacts produced by assigning k-rate values to a-rate variables directly. See the right panel for a complete explanation of this opcode.

The polymorphic samphold opcode performs a sample-and-hold operation on the polymorphic input parameter in, under the control of the k-rate parameter gate. It acts as an upsampling system if the in parameter is a-rate.

The samphold opcode returns the value of an internal state variable, that is initialized to zero at the start of the first call to the opcode. If the gate parameter is non-zero, the internal state variable is updated to the value of the in parameter.
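The sample-and-hold behavior amounts to a single state variable, as the following Python sketch shows. The class name `SampHold` is hypothetical, and rate polymorphism is not modelled: the caller decides how often the object is invoked.

```python
class SampHold:
    """Models samphold: latch the input while gate is non-zero."""

    def __init__(self):
        self.state = 0.0          # initialized to zero before the first call

    def __call__(self, value, gate):
        if gate != 0:
            self.state = value    # gate open: track the input
        return self.state         # gate closed: hold the last latched value


hold = SampHold()
trace = [hold(5.0, 0), hold(5.0, 1), hold(7.0, 0), hold(9.0, 1)]
```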

Downsampling Opcodes

    specialop decimate(asig in)

    specialop downsamp(asig in [,table win])

    specialop sblock(asig in, table t)

    see left panel for algorithms.

Upsampling Opcodes

    opcode samphold(xsig in, ksig gate)

    see left panel for algorithm.

    aopcode upsamp(ksig in [,table win])

    This opcode upsamples the k-rate in parameter to a-rate,
    using a smoothing buffer. In the interesting case, the
    buffer size is the size of the table win, and is several
    times greater than a_rate/k_rate in length.

    On the first call to upsamp, the buffer buf[] is created,
    and initialized to zeros.

    On the first a-pass call to upsamp in a given execution
    cycle, the contents of buf[] are shifted forward by
    a_rate/k_rate samples. The last a_rate/k_rate buf[]
    values are set to zero. Then, all buf[] values are
    updated using this formula:

      buf[i] = buf[i] + in*win[i]

    This first a-pass call returns buf[0]; future a-pass
    calls in the execution cycle return buf[1], buf[2], ...

    If the win table has fewer than a_rate/k_rate elements,
    buf[] has a size of a_rate/k_rate, and zeros are used for
    the extra win values in the formula.

    If no win table is provided, a win of size a_rate/k_rate
    is used, with all samples of value 1. The size of buf[]
    is also a_rate/k_rate.
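The shift-and-add procedure can be traced with a small Python model. This is a sketch under stated assumptions: the class name `Upsamp`, the `block` argument (the number of a-rate samples per control period), and the `k_cycle` method, which returns all a-pass outputs of one execution cycle at once, are hypothetical conveniences and not part of SAOL.

```python
class Upsamp:
    """Models upsamp: one k-rate value in, `block` a-rate values out."""

    def __init__(self, block, win=None):
        self.block = block
        # No window given: a window of `block` ones.
        self.win = list(win) if win is not None else [1.0] * block
        self.buf = [0.0] * max(len(self.win), block)

    def k_cycle(self, value):
        # Shift forward by one control period and zero the vacated tail.
        shifted = self.buf[self.block:] + [0.0] * self.block
        # Add the windowed input; a short window acts as if zero-padded.
        for i, w in enumerate(self.win):
            shifted[i] += value * w
        self.buf = shifted
        return self.buf[:self.block]           # buf[0], buf[1], ...


triangle = [0.25, 0.5, 0.75, 1.0, 0.75, 0.5, 0.25, 0.0]
smooth = Upsamp(4, triangle)
first = smooth.k_cycle(1.0)                    # ramps up over the first period
second = smooth.k_cycle(1.0)                   # overlapping windows sum to 1
```

With no window, each cycle returns the input value repeated, which matches direct assignment of a k-rate value to an a-rate variable. With a window spanning two control periods, a step in the input becomes a ramp.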

Window Wavetables

Several of the opcodes in the previous section let the programmer specify a windowing function as a wavetable of window values.

The core wavetable generator window simplifies the creation of windowing wavetables. The right panel shows the declaration syntax and algorithm for this wavetable generator.

The size parameter sets the number of samples in the window table, and must be greater than zero. The type parameter is an integer that sets the window type.

The window generator produces six window types.

1. Hamming window.
2. Hanning window.
3. Bartlett window.
4. Gaussian window.
5. Kaiser window.
6. Boxcar window.

The numbering of the list indicates the value of the type parameter that produces the associated window shape.

The Kaiser window algorithm creates a family of windows, controlled by the optional parameter p.

`window`

    table t(window, size, type[,p]);

    Type parameter is an integer that codes the window shape.
    Listing below shows algorithm, for samples that lie in
    range 0 <= x <= size-1.

    [1] Hamming window.

        0.54 - 0.46*cos(2*pi*x/(size-1))

    [2] Hanning window.

        0.5*(1 - cos(2*pi*x/(size-1)))

    [3] Bartlett window (triangle).

            2*fabs(x - ((size-1)/2))
        1 - ------------------------
                    (size-1)

    [4] Gaussian window.

        exp(-((m-x)^2)/a)

        where m = size/2
              a = (size*size)/18

    [5] Kaiser window.

        a = (size-1)/2

        Io[p*sqrt(a^2 - (x-a)^2)]
        -------------------------
                 Io[p*a]

    [6] Boxcar window -- all table values are 1.

    Slib defines the constants WINDOW_HAMMING, WINDOW_HANNING,
    WINDOW_BARTLETT, WINDOW_GAUSSIAN, WINDOW_KAISER, and
    WINDOW_BOXCAR to use as the type parameter in the window
    wavetable generator.
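The six window shapes can be tabulated with a short Python function, which is useful for checking table values by hand. This is a sketch, not the generator itself: the names `window` and `bessel_i0` are hypothetical, the Hanning case uses the conventional 0.5 coefficient, Io is taken to be the zeroth-order modified Bessel function (evaluated here by a truncated power series), the default of 1 for p is an assumption, and size is assumed to be at least 2 so that size-1 is non-zero.

```python
import math


def bessel_i0(z, terms=30):
    """Zeroth-order modified Bessel function, by truncated power series."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term * term                 # adds ((z/2)^k / k!)^2
        term *= (z / 2.0) / (k + 1)
    return total


def window(size, wtype, p=1.0):
    """Returns a list of `size` window values for type codes 1 to 6."""
    n = size - 1                             # assumes size >= 2
    table = []
    for x in range(size):
        if wtype == 1:                       # Hamming
            v = 0.54 - 0.46 * math.cos(2 * math.pi * x / n)
        elif wtype == 2:                     # Hanning
            v = 0.5 * (1 - math.cos(2 * math.pi * x / n))
        elif wtype == 3:                     # Bartlett (triangle)
            v = 1 - 2 * abs(x - n / 2) / n
        elif wtype == 4:                     # Gaussian
            m, a = size / 2, size * size / 18
            v = math.exp(-((m - x) ** 2) / a)
        elif wtype == 5:                     # Kaiser
            a = n / 2
            v = bessel_i0(p * math.sqrt(a * a - (x - a) ** 2)) / bessel_i0(p * a)
        elif wtype == 6:                     # Boxcar
            v = 1.0
        else:
            raise ValueError("unknown window type")
        table.append(v)
    return table


bartlett = window(5, 3)
kaiser = window(5, 5, 2.0)
```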

Gain Control

The compressor opcode implements a complete gain control system. The opcode may be configured to perform gain control functions such as compression, expansion, noise-gating, and limiting. The right panel shows the header syntax and algorithm for this opcode.

The opcode returns a scaled version of the a-rate input signal parameter x, with a latency set by the parameter look. The scaling depends on the loudness of the a-rate signal parameter comp. For most uses, comp and x are set to the same value.

The opcode measures the loudness of comp, expressed in terms of decibels (dB), and changes the scaling of x in response to this loudness. The loudness is not computed as an instantaneous value, but by evaluating the signal over a short analysis window (set by the parameter look).

In this scale, 90 dB corresponds to a signal with a peak waveform value of 1, 70 dB corresponds to a signal with a peak waveform value of 0.1, etc. The noise floor of the system is set by the k-rate parameter nfloor, in units of dB.
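The dB scale described above corresponds to the mapping 90 + 20*log10(|amplitude|). The Python helpers below are a small sketch for converting between the two; the function names are hypothetical, and returning negative infinity for a zero amplitude is an assumption, since the logarithm is undefined there.

```python
import math


def amp_to_db(amplitude):
    """Maps a waveform amplitude to the 90 dB = 1.0 scale."""
    if amplitude == 0.0:
        return float("-inf")          # assumption: silence maps to -infinity
    return 90.0 + 20.0 * math.log10(abs(amplitude))


def db_to_amp(level_db):
    """Inverse mapping: a dB level back to a peak amplitude."""
    return 10.0 ** ((level_db - 90.0) / 20.0)


full_scale = amp_to_db(1.0)           # 90 dB
tenth = amp_to_db(0.1)                # 20 dB lower
```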

The parameters att and rel set the attack and release times (in seconds) for the loudness measurement of comp. Short attack and release times let the loudness track quick signal transients, while longer attack and release times result in a smoother loudness estimate.

Given the loudness measurement of comp, the opcode calculates the scaling factor for the delayed version of x using the table shown on the right panel. The k-rate parameters nfloor, thresh, loknee, hiknee, and ratio control this scaling. All of these parameters have units of dB.

Noise gating

The nfloor and thresh parameters control noise gating. If the loudness of comp is above thresh, the noise gate is open, and the opcode returns a delayed replica of the x signal. If the loudness of comp is below nfloor, the noise gate is closed, and the opcode returns zero. Non-normative interpolation occurs in the transition regime between nfloor and thresh.

To turn off noise gating, both nfloor and thresh should be set to the noise floor of this system (for most applications, a value of -40 dB yields good results).

Compression/expansion

If the loudness of comp is above hiknee, the opcode acts as a compressor or expander. The value of ratio determines the exact behavior in this regime. If the loudness of comp increases by ratio dB, the opcode returns a delayed version of x whose loudness has increased by 1 dB. Thus, ratio values greater than 1 dB result in compression, and ratio values between 0 and 1 dB result in expansion. Negative ratio values are prohibited.

If the loudness of comp is below loknee, the opcode performs as a "wire with latency", returning a replica of parameter x delayed by the analysis window time look. Non-normative interpolation occurs in the transition regime between loknee and hiknee.

Cross-signal effects

By choosing the comp signal to be different than the x signal, the opcode produces a version of the x signal whose dynamics are shaped by the comp signal.

`compressor`

    aopcode compressor(asig x, asig comp, ksig nfloor,
                       ksig thresh, ksig loknee,
                       ksig hiknee, ksig ratio,
                       ksig att, ksig rel, ivar look)

    The compressor opcode delays the signal parameter x for
    look seconds, and returns the delayed value after
    weighting it by R.

    R is determined by measuring the dB level of the signal
    parameter comp, as shown by the table. The parameters
    nfloor, thresh, loknee, hiknee, and ratio are all in
    units of dB (90 dB corresponds to a signal amplitude of
    1.0).

      comp (dB)          |  R
      -------------------+----------------------------
      less than nfloor   |  0
                         |  (noise gate: closed)
      -------------------+----------------------------
      between nfloor     |  0 < R < 1
      and thresh         |  (noise gate: transition)
      -------------------+----------------------------
      between thresh     |  1
      and loknee         |  (noise gate: open)
      -------------------+----------------------------
      between loknee     |  transition regime
      and hiknee         |
      -------------------+----------------------------
      greater than       |  R is set so that a ratio
      hiknee             |  dB increase in comp yields
                         |  a 1 dB increase in x.

    given that:

      nfloor <= thresh
      thresh <= loknee
      loknee <= hiknee
      ratio  >  0

    If ratio is < 1 dB, the opcode acts as an expander. If
    ratio is > 1 dB, the opcode acts as a compressor.

    To compute the comp dB value, the opcode keeps a buffer
    of instantaneous dB values of the comp signal, using the
    equation:

      90 + 20*log_10(abs(comp))

    This buffer length is set by the parameter look.

    The comp dB signal is computed by extrapolating signal
    trends in this buffer, under the guidance of the attack
    and release times of the opcode, set by parameters att
    and rel (which have units of seconds). Short att and rel
    values produce quick changes in R, longer att and rel
    values produce slower changes in R.

Fourier Analysis

The fft opcode computes a windowed and overlapped complex-valued Discrete Fourier Transform (DFT) on the a-rate parameter signal in. It stores the results in the wavetables re and im.

The complementary opcode ifft computes a windowed and overlapped Inverse Discrete Fourier Transform on the wavetable pair re and im, and returns samples of the resulting audio waveform.

These opcodes are designed to be used together to implement sound synthesis algorithms that use spectral modification techniques. If a boxcar window is used for both fft and ifft, an fft-ifft pair has unity gain. See the right panel for the header syntax of fft and ifft.

`fft`

The fft opcode is a specialop, that executes at the a-rate and k-rate, but returns a value at the k-rate.

The fft opcode returns a 1 if a new DFT has been calculated since the last k-pass, and 0 otherwise. If a new DFT has been computed, the real components are placed in the wavetable parameter re, and the imaginary components are placed in the wavetable parameter im.

The optional parameters len, shift, and size control the operation of the fft opcode.

The len parameter sets the size of the holding buffer for new audio samples. In most cases, len is also the size of the DFT.

The shift parameter controls the number of audio samples to add to the holding buffer before computing a new DFT. For a simple, non-overlapped DFT, shift is set to the same value as len. For an overlapped DFT, shift is set to a value smaller than len. For example, if len is 1024 and shift is 128, the opcode computes a new 1024-point DFT every 128 samples.

On the first call to fft, a buffer hbuf of size len is created and zeroed, and the in parameter is placed in position hbuf[len - shift].

Subsequent calls fill hbuf[len - shift + 1], hbuf[len - shift + 2] ... until the buffer is filled, and then the DFT computation begins. The optional size parameter may be used to set the DFT size; if size is not used, the len parameter is used. The DFT size may be no larger than 8192, and must be a power of 2.

The table win may be supplied to window the audio samples prior to computing the DFT. If it is not supplied, a boxcar window is used. When hbuf is filled for the first time, a buffer new with size values is created. Each buffer variable new[i] takes the value win[i]*hbuf[i]. If size is greater than len, the extra values of new[i] are set to zero.

Once new is filled, a DFT is performed on the buffer, and the real and imaginary results placed in the wavetables re and im respectively, which must be able to hold size values. The first position in each table holds the DC DFT value, the size/2 position holds the Nyquist frequency coefficient value, and the positions after size/2 hold values that code the reflection of the spectrum above the Nyquist frequency.

The shift parameter controls the data overlap between successive DFT calculations. After the first DFT is computed, the hbuf buffer is shifted forward by shift values. The shift spaces at the end of the buffer are where future calls to fft place in values. Once the hbuf buffer is refilled, the new buffer is refilled, and a new DFT is performed.
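The interaction of len, shift, and size is easiest to follow as buffer operations. The Python sketch below models the framing logic with a naive DFT so that it stays self-contained. The names `FftFramer`, `a_pass`, and `k_pass` are hypothetical, the forward transform is assumed to use the common exp(-2*pi*i*k*t/N) convention (the text does not state one), and the power-of-2 and 8192 limits on the DFT size are not enforced.

```python
import cmath


def dft(values):
    """Naive O(N^2) DFT, standing in for a real FFT routine."""
    n = len(values)
    return [sum(values[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n)) for k in range(n)]


class FftFramer:
    """Models the hbuf handling of the fft specialop."""

    def __init__(self, length, shift, size=None, win=None):
        self.length, self.shift = length, shift
        self.size = size or length                 # DFT size defaults to len
        assert self.size >= length                 # assumption of this sketch
        self.win = win or [1.0] * length           # default: boxcar window
        self.hbuf = [0.0] * length
        self.pos = length - shift                  # first sample lands here
        self.re = [0.0] * self.size
        self.im = [0.0] * self.size
        self.flag = 0

    def a_pass(self, x):
        self.hbuf[self.pos] = x
        self.pos += 1
        if self.pos == self.length:                # hbuf is full
            new = [self.win[i] * self.hbuf[i] for i in range(self.length)]
            new += [0.0] * (self.size - self.length)   # zero-pad up to size
            spectrum = dft(new)
            self.re = [c.real for c in spectrum]
            self.im = [c.imag for c in spectrum]
            # Shift forward; the freed tail receives the next `shift` samples.
            self.hbuf = self.hbuf[self.shift:] + [0.0] * self.shift
            self.pos = self.length - self.shift
            self.flag = 1

    def k_pass(self):
        flag, self.flag = self.flag, 0             # 1 only if a new DFT exists
        return flag
```

With len of 4 and shift of 2, the first DFT is taken after two input samples (the front half of hbuf is still zero), and every later DFT reuses the last two samples of the previous frame.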

The right panel describes the default values and legal ranges for the fft parameters len, shift, size, and win.

`ifft`

The ifft opcode runs at a-rate, and returns audio samples created from the complex DFT values in the re and im tables. The opcode assumes these tables are in the format created by the fft opcode.

The optional parameters len, shift, and size control the operation of the ifft opcode.

The len parameter sets the size of the holding buffer for output audio samples. Since in most cases len is also the size of the IDFT, the size parameter defaults to len. The IDFT size may be no larger than 8192, and must be a power of 2.

During the first call to ifft, the opcode computes the IDFT of the re and im tables. If re and im are greater than size, only the first size elements of the wavetables are used to compute the IDFT.

The first len components of the IDFT result are multiplied point-by-point by the windowing table win, and placed in an output buffer out of length len.

The value out[0] is returned on this first call, the next call returns out[1], etc. Each sample is scaled by shift/len, so that an fft-ifft pair using boxcar windows has unity signal gain.

On the call where out[shift-1] is returned, the next IDFT is calculated, in the following way.

The contents of the out buffer are shifted forward shift elements, and the last shift values of out are set to zero. A new IDFT is computed, and the first len components of the result are multiplied point-by-point with the win table, and added into the out buffer. Values from out[0] to out[shift-1] are returned as described above, and the cycle repeats.
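The overlap-add cycle, including the shift/len scaling, can be checked with a small Python model that takes already-computed IDFT results as input. This is a sketch under stated assumptions: the class name `OverlapAdd` and the `next_block` method, which returns the `shift` output samples of one cycle as a list, are hypothetical, and the IDFT itself is left out.

```python
class OverlapAdd:
    """Models the out buffer of ifft for time-domain frames of `length`."""

    def __init__(self, length, shift, win=None):
        self.length, self.shift = length, shift
        self.win = win or [1.0] * length           # default: boxcar window
        self.out = [0.0] * length

    def next_block(self, frame):
        # Window the first `length` IDFT values and add them into out.
        for i in range(self.length):
            self.out[i] += self.win[i] * frame[i]
        # Return out[0] .. out[shift-1], each scaled by shift/len.
        block = [v * self.shift / self.length for v in self.out[:self.shift]]
        # Shift forward and zero the tail, ready for the next IDFT result.
        self.out = self.out[self.shift:] + [0.0] * self.shift
        return block


ola = OverlapAdd(length=4, shift=2)
constant_frame = [1.0, 1.0, 1.0, 1.0]
blocks = [ola.next_block(constant_frame) for _ in range(3)]
```

With boxcar windows, len/shift frames overlap at every output sample once the buffer has filled, so the shift/len factor brings the sum back to unity. The first block is quieter because only one frame has contributed to it.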

The right panel describes the default values and legal ranges for the ifft parameters len, shift, size, and win.

Example

The right panel shows a simple example, using fft and ifft together in a simple spectral modification algorithm.

FFT and IFFT

    specialop fft(asig in, table re, table im
                  [, ivar len, ivar shift,
                     ivar size, table win])

    See left panel for algorithm details. Characteristics of
    parameters described below.

    in: audio input signal that is processed by the opcode.

    re: table that holds the real portion of the DFT. Must
    have at least size samples.

    im: table that holds the imaginary portion of the DFT.
    Must have at least size samples.

    len: optional parameter that sets the number of samples
    to use. may not be negative. if zero or not provided, it
    is the next power of two greater than a_rate/k_rate.

    shift: optional parameter that sets the shift amount of
    the analysis window. may not be negative. if not provided
    or zero, set to len.

    size: optional parameter that sets the DFT size. may not
    be negative. if zero, set to len. must be a power of 2,
    and no greater than 8192.

    win: windowing table for analysis. if not provided, a
    boxcar of length len. may not have fewer than len
    samples.

    aopcode ifft(table re, table im
                 [, ivar len, ivar shift,
                    ivar size, table win])

    See left panel for algorithm details. Descriptions of
    parameter limits for fft also hold for ifft.

Example

    // hanning window table

    table win(window, 1024, 2);

    // space for fft

    table re(empty, 1024);
    table im(empty, 1024);
    table re_m(empty, 1024);
    table im_m(empty, 1024);

    // signal new fft done

    ksig flag;

    // signal to process

    asig in;

    flag = fft(in, re, im, 1024, 128, 1024, win);

    if (flag)
      {
        // modify re and im here
        // put results in re_m and im_m
      }

    output(ifft(re_m, im_m, 1024, 128, 1024, win));

Portamento

The core opcode port is a k-rate filter, that converts a step transition of the k-rate parameter ctrl into a smooth transition with an exponential trajectory. When applied to a pitch control signal (in Hertz), it confers a portamento effect on pitch changes.

The right panel shows the header syntax and algorithm for the port opcode. A k-rate parameter htime sets the time that the output signal traverses one half of its total excursion.

This section concludes our descriptions of the SAOL core opcode library. In the final chapter in this section, we describe how users may write new opcodes in SAOL.

Next section: Part IV/4: User-Defined Opcodes

`port`

    kopcode port(ksig ctrl, ksig htime)

    ctrl: input k-rate signal to be filtered.

    htime: half-transition time, in seconds. the time for the
    return value of port to traverse one half of a step
    change in ctrl.

    port returns the value:

      o + (n - o)*(1 - 2^(-t/htime))

    where o is the old value of ctrl and n is the new value
    of ctrl. o and n are updated whenever ctrl and n are not
    equal (o = n, n = ctrl). t is set to zero at each ctrl
    transition, and incremented by 1/k_rate on each call.

    on first call, both o and n are set to ctrl.
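The exponential trajectory can be traced with a short Python model. This is a sketch under stated assumptions: the class name `Port` is hypothetical, the exponent is taken to be negative so that the output has covered exactly half of the step when t equals htime (as the half-transition description requires), and t is assumed to be incremented after the return value is computed, so the call that sees a transition still returns the old value.

```python
class Port:
    """Models the port opcode, called once per control period."""

    def __init__(self, k_rate):
        self.dt = 1.0 / k_rate
        self.started = False

    def __call__(self, ctrl, htime):
        if not self.started:                   # first call: o = n = ctrl
            self.o = self.n = ctrl
            self.t = 0.0
            self.started = True
        elif ctrl != self.n:                   # ctrl transition
            self.o, self.n = self.n, ctrl
            self.t = 0.0
        value = self.o + (self.n - self.o) * (1.0 - 2.0 ** (-self.t / htime))
        self.t += self.dt                      # assumption: increment after use
        return value


port = Port(k_rate=4.0)
glide = [port(0.0, 0.5), port(1.0, 0.5), port(1.0, 0.5), port(1.0, 0.5)]
```

With a control period of 0.25 seconds and htime of 0.5 seconds, the output reaches the midpoint of the step two control periods after the transition, and it approaches the new value asymptotically from there.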