Chapter 11
VLSI FOR TELECOMMUNICATION SYSTEMS

11.6. Case study: ATM switch

This section presents the architecture of the critical routing part of an ATM switch. Before discussing an existing ATM chip, we present the technological constraints that drive the design.

The switch functionality can be split into two main parts:

A routing function to carry data from one input port to an output port.
A queuing function to temporarily store incoming data that would otherwise be blocked.

11.6.1. Main switching considerations

11.6.1.1. Solving the blocking problem (Head of line)

This section shows why output buffering is the better way to solve blocking problems (section 11.2.2.1 shows the blocking scenario).

Consider a simple 2X2 (2 input ports and 2 output ports) switch (see figure 11.16). Each number represents the destination port address. Queued cells are in yellow and routed cells are in blue.

Figure-11.16: Input and Output buffering sequence

With an input buffering technique we need four cycles to route all cells.

The first cycle shows the queuing of one cell and the routing of the other.
The second cycle shows the routing of the previously queued cell and the queuing of the two incoming cells.
The third cycle shows the routing of the two previously queued cells and the queuing of the incoming cell.
The last cycle shows the routing of the last cell.

With an output buffering technique we need three cycles to route all cells.

The first cycle shows the routing of all incoming cells. One is queued and the other is sent through the connected output line #2.
The second cycle shows the routing of the second pair of incoming cells. One is queued in queue #2, the previously queued cell is sent through output line #2, and the other incoming cell is sent directly through output line #1.
The last cycle shows the queued cell being sent through line #2 and the incoming cell being routed and sent through line #1.

In certain cases, such as this one (three cycles instead of four), output buffering gives a lower cell latency. Cells then spend less time in the queues, so less memory capacity is needed in the switch. For this reason, output buffering has been chosen to solve the blocking problem.
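
The cycle counts above can be reproduced with a small simulation. This is a minimal sketch under stated assumptions: the arrival pattern `ARRIVALS` is hypothetical, chosen to be consistent with the description of the two sequences and not read from figure 11.16; the input-buffered model serves the oldest head-of-queue cell first; the output-buffered model assumes the routing function can place both arriving cells in the same output queue within one cycle.

```python
from collections import deque

# Hypothetical traffic: one tuple per cycle, (destination of the cell on
# input A, destination of the cell on input B); None means "no cell".
ARRIVALS = [(2, 2), (2, 1), (None, 1)]


def input_buffered_cycles(arrivals):
    """Cycles needed when cells wait in one FIFO per input port."""
    queues = [deque() for _ in arrivals[0]]
    cycle = 0
    while cycle < len(arrivals) or any(queues):
        if cycle < len(arrivals):
            for port, dest in enumerate(arrivals[cycle]):
                if dest is not None:
                    queues[port].append((cycle, dest))
        # Only the head of each FIFO may be routed; the oldest head goes first.
        heads = sorted((q[0][0], port) for port, q in enumerate(queues) if q)
        busy_outputs = set()
        for _, port in heads:
            dest = queues[port][0][1]
            if dest not in busy_outputs:
                busy_outputs.add(dest)
                queues[port].popleft()
            # Otherwise the head stays, and blocks every cell queued behind it.
        cycle += 1
    return cycle


def output_buffered_cycles(arrivals):
    """Cycles needed when cells are routed at once and wait at the outputs."""
    queues = {}
    cycle = 0
    while cycle < len(arrivals) or any(queues.values()):
        if cycle < len(arrivals):
            for dest in arrivals[cycle]:
                if dest is not None:
                    queues.setdefault(dest, deque()).append(dest)
        # Each output line sends one cell per cycle.
        for q in queues.values():
            if q:
                q.popleft()
        cycle += 1
    return cycle


print(input_buffered_cycles(ARRIVALS), output_buffered_cycles(ARRIVALS))
```

With this traffic the input-buffered model needs four cycles and the output-buffered model three. In the second cycle of the input-buffered run, the cell for port 1 waits behind a cell for port 2 although output line #1 is idle, which is the head-of-line effect.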

Having made this choice, we need to know how the routing function can be implemented. The next section presents the techniques currently in use.

11.6.1.2. Routing function implementation

The simplest way to implement the routing function is to link all the inputs to all the outputs. By programming this array of connections, data can be routed from any input port to any output port. We can implement this function using a crossbar architecture.

11.6.1.2.1. Crossbar switch

A crossbar is an array of buses and transmission gates implementing paths from any input port to any output port. This section describes this technique. To understand its limitations, we first describe the transmission gate.

Figure-11.17: Electric view of a transmission gate.

11.6.1.2.1.1 Transmission gate

Figure 11.17 shows an electrical view of a transmission gate, and figure 11.18 shows a schematic view of the same gate. Two complementary transistors transmit the input signal without degradation: the NMOS transistor passes VSS and the PMOS transistor passes VDD. The Command input enables or disables the transmission function. For instance:

If Command = VSS, both transistors are off and the gate is open.
If Command = VDD, both transistors are on and the input is connected to the output.

Cin represents the parasitic load on the input line and Cout represents the parasitic load on the output line.

Figure-11.18: Schematic view of a transmission gate.

11.6.1.2.1.2 The crossbar switch

If we wire an array of transmission gates as shown in figure 11.19, we obtain a programmable system capable of routing any incoming data to any output port.

Figure-11.19: 2X2-crossbar switch.
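
The behaviour of such a programmable array can be sketched in software. The class below is only an illustrative model, not a description of the circuit in the figure: `Crossbar`, `connect` and `route` are hypothetical names, each entry of `closed` stands for the Command input of one transmission gate, and the rule that at most one gate may drive an output bus is an assumption made to keep the model well defined.

```python
class Crossbar:
    """Switch-level model of an N x N crossbar of transmission gates."""

    def __init__(self, n_ports):
        self.n_ports = n_ports
        # closed[i][o] is True when the gate joining input i to output o is on.
        self.closed = [[False] * n_ports for _ in range(n_ports)]

    def connect(self, inp, out):
        """Turn on one gate, refusing to let two inputs drive the same output."""
        for other in range(self.n_ports):
            if other != inp and self.closed[other][out]:
                raise ValueError(f"output {out} is already driven by input {other}")
        self.closed[inp][out] = True

    def route(self, inputs):
        """Return the value seen on each output bus (None if nothing drives it)."""
        outputs = [None] * self.n_ports
        for inp in range(self.n_ports):
            for out in range(self.n_ports):
                if self.closed[inp][out]:
                    outputs[out] = inputs[inp]
        return outputs


xbar = Crossbar(2)
xbar.connect(0, 1)   # input 0 -> output 1
xbar.connect(1, 0)   # input 1 -> output 0
print(xbar.route(["cell A", "cell B"]))
```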

We can implement a 4X4 switch repeating this 2X2 structure (see figure 11.20).

Figure-11.20: 4X4-crossbar switch.

We can repeat this structure to obtain the required number of input and output ports, but this approach causes a bus load problem: the larger the number of ports, the greater the load and the length of each bus. For example, in figure 11.20 the load on input bus #1 is four times the input load of one transmission gate plus the parasitic capacitance of the wire. The routing delay from an input to an output therefore grows with the number of ports, and we cannot use this technique to implement high-throughput switches with a large number of ports.
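
The growth of the bus load can be made concrete with a first-order estimate. The sketch below assumes that the wire capacitance is proportional to the number of crosspoints along the bus; the function name and the capacitance values are hypothetical and are not taken from any real process.

```python
def bus_load(n_ports, c_gate, c_wire_per_port):
    """First-order load on one crossbar bus.

    n_ports gates hang on the bus, and the wire is assumed to get one
    segment of capacitance c_wire_per_port for each of them.
    """
    return n_ports * c_gate + n_ports * c_wire_per_port


# Hypothetical values in femtofarads, for illustration only.
for n in (2, 4, 16, 64):
    print(n, bus_load(n, c_gate=2.0, c_wire_per_port=1.0))
```

Under these assumptions the load, and with it the delay of the driver charging the bus, grows linearly with the number of ports.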

To solve this problem, a switch based on a network of 2X2 switches has been developed. The next section shows how these switches are implemented.

11.6.1.2.2. The Batcher-Banyan switch

Figure 11.21 shows the 2X2-switch module. This switch is composed of one 2X2 crossbar implementing the routing function and four FIFO memories implementing the output buffer function. The delay to carry data from an input to an output is shorter than in a large crossbar switch because the buses are short and each is loaded by only two transmission gates.

Figure 11.22 shows an 8X8 Banyan switch. Input ports are connected to output ports by a three-stage routing network. There is exactly one path from any input port to any output port. Each 2X2-switch module simply routes one input to one of its two outputs.

Figure-11.21: 2X2 switch.

Figure-11.22: Banyan network switch.

A blocking scenario in a Banyan switch is shown in figure 11.23. In this figure, red paths show successfully routed cells and blue paths show blocked cells. The numbers at the inputs are the destination output port numbers of the cells.

All the incoming cells have different output destinations, but only two cells are routed. The others are lost to internal collisions, where two cells need the same link inside the network.

One solution is to make sure that this internal collision scenario never occurs. This can be achieved if the incoming cells are sorted before they enter the Banyan routing network. The sorter should order the incoming cells according to the bitonic sequence rules. A Batcher sorter, built from a network of 2X2 comparators, implements this function.
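
A network of 2X2 comparators of this kind can be sketched in software with the classic bitonic sort. This is a minimal illustration, not the circuit of any particular chip: each call to `compare_exchange` stands for one 2X2 comparator, the recursion visits one after another the comparators that hardware would operate in parallel, the number of cells is assumed to be a power of two, and the sketch sorts destination addresses into ascending order.

```python
def compare_exchange(cells, i, j, ascending):
    """One 2X2 comparator: put cells[i] and cells[j] in the requested order."""
    if (cells[i] > cells[j]) == ascending:
        cells[i], cells[j] = cells[j], cells[i]


def bitonic_merge(cells, lo, n, ascending):
    """Sort the bitonic run cells[lo:lo+n] in the requested direction."""
    if n > 1:
        half = n // 2
        for i in range(lo, lo + half):
            compare_exchange(cells, i, i + half, ascending)
        bitonic_merge(cells, lo, half, ascending)
        bitonic_merge(cells, lo + half, half, ascending)


def bitonic_sort(cells, lo=0, n=None, ascending=True):
    """Sort cells[lo:lo+n] in place; n must be a power of two."""
    if n is None:
        n = len(cells)
    if n > 1:
        half = n // 2
        # Build a bitonic run: first half ascending, second half descending.
        bitonic_sort(cells, lo, half, True)
        bitonic_sort(cells, lo + half, half, False)
        bitonic_merge(cells, lo, n, ascending)


destinations = [3, 7, 0, 5, 1, 6, 2, 4]   # hypothetical incoming cells
bitonic_sort(destinations)
print(destinations)
```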

Figure-11.23: Blocking in a Banyan network

Figure 11.24 shows routing scenarios without internal collisions.

Figure-11.24: Routing scenario without collision
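
Blocking and collision-free routing can both be reproduced with a small model of an 8X8 network. The sketch rests on several assumptions that are not taken from the figures: the wiring is that of an omega (shuffle-exchange) network, one common member of the Banyan family; each 2X2 module routes a cell according to one bit of its destination address, most significant bit first; and a cell that loses a collision is simply dropped. The function name and the blocked example are hypothetical.

```python
def route_banyan(dests, n_bits=3):
    """Route one cell per input through an omega network of 2X2 modules.

    dests[i] is the destination of the cell on input i (None for no cell).
    Returns (delivered destinations, blocked destinations).
    """
    mask = (1 << n_bits) - 1
    cells = [(src, d) for src, d in enumerate(dests) if d is not None]
    blocked = []
    for stage in range(n_bits):
        bit = n_bits - 1 - stage          # destination bit used by this stage
        used_links = set()
        survivors = []
        for pos, dest in cells:
            # Perfect shuffle: rotate the line number left by one bit.
            shuffled = ((pos << 1) | (pos >> (n_bits - 1))) & mask
            # The 2X2 module picks its upper or lower output from the bit.
            new_pos = (shuffled & ~1) | ((dest >> bit) & 1)
            if new_pos in used_links:
                blocked.append(dest)      # internal collision on this link
            else:
                used_links.add(new_pos)
                survivors.append((new_pos, dest))
        cells = survivors
    return sorted(dest for _, dest in cells), blocked


print(route_banyan([0, 2, 4, 6, 1, 3, 5, 7]))   # hypothetical pattern that blocks
print(route_banyan([0, 1, 2, 3, 4, 5, 6, 7]))   # ascending sequence
print(route_banyan([7, 5, 2, 1, 0, 3, 4, 6]))   # descending then ascending
```

In this model the first pattern loses four cells in the first stage although all destinations are distinct, while the ascending sequence and the sequence {7, 5, 2, 1, 0, 3, 4, 6} reach their outputs without any collision.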

For instance, the following sequence is a bitonic sequence: {7, 5, 2, 1, 0, 3, 4, 6}.

A sequence is bitonic if it is one of the following:

An ascending order sequence, such as {0, 1, 2, 3, 4, 5, 6, 7} in the first scenario of figure 11.24.
A descending order sequence.
An ascending order sequence followed by a descending order sequence.
A descending order sequence followed by an ascending order sequence, such as {7, 5, 2, 1, 0, 3, 4, 6} in the second scenario of figure 11.24.
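
The four rules above amount to saying that the sequence changes direction at most once. A minimal check, with the hypothetical name `is_bitonic`, could look as follows; equal neighbours are ignored, which is an assumption since the rules do not mention repeated values.

```python
def is_bitonic(seq):
    """True if seq rises then falls, falls then rises, or is monotonic."""
    # Direction of each step: +1 for up, -1 for down (equal neighbours skipped).
    steps = [1 if b > a else -1 for a, b in zip(seq, seq[1:]) if a != b]
    # Count how many times the direction changes along the sequence.
    changes = sum(1 for s, t in zip(steps, steps[1:]) if s != t)
    return changes <= 1


print(is_bitonic([7, 5, 2, 1, 0, 3, 4, 6]))   # descending then ascending
print(is_bitonic([0, 2, 4, 6, 1, 3, 5, 7]))   # changes direction twice
```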

This well-known architecture is currently used to implement the switching function. The next section discusses an existing switching chip that uses this technique.

11.6.2. ATM Cell Switching

11.6.2.1. ATM high-level Switch Architecture

Table 11.2 shows the main functions of each ATM layer.

Function	Sublayer	Layer
Convergence	CS	AAL
Segmentation and reassembly	SAR	AAL
GFC field management; header generation and extraction; VCI and VPI processing; multiplexing and demultiplexing of the cells	-	ATM
Flow rate adaptation; HEC generation and check; cell synchronization; transmission adaptation	TC	PL
Synchronization; data emission and detection	PM	PL

Table-11.2: ATM layer structure

AAL: ATM Adaptation Layer

CS: Convergence Sublayer

SAR: Segmentation and Reassembly sublayer

ATM: ATM Layer

PL: Physical Layer

TC: Transmission Convergence

PM: Physical Medium

Figure 11.25 shows the high-level architecture of a switch. Each block implements some of the functions described in Table 11.2.

Figure-11.25: Switch architecture

An explanation of the general functionality of each layer can be found in section 11.5.4.

The management block drives and synchronizes the other layers; for instance, it drives the control check and the administrative functions. High data transfer rates can be reached (up to several gigabits per second).

One of the critical blocks of this architecture is the switching module (outlined in bold in figure 11.25).

The previous section discussed one of the most widely used techniques to implement this function. In the next section we discuss an existing chip designed with the techniques described above.

11.6.2.2. Existing Switch Architecture

Figure 11.26, from Yam[97], shows the mapping between the chip architecture and the functional architecture.

Figure-11.26: Comparison of functional and real architectures

There are three main blocks in this chip:

The first block implements the header processing.
The second one implements the switching (commutation) table.
The third one implements the switch function.
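
The cooperation between header processing and the switching table can be illustrated with a lookup keyed on the incoming connection identifiers. This is a generic sketch of VPI/VCI translation as commonly done in ATM switches, not the organisation of this particular chip; the names `Route` and `translate_header` and all table entries are hypothetical.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class Route(NamedTuple):
    out_port: int   # output port the switch function must reach
    out_vpi: int    # VPI written into the outgoing header
    out_vci: int    # VCI written into the outgoing header


# Key: (input port, incoming VPI, incoming VCI). Entries are made up.
SwitchingTable = Dict[Tuple[int, int, int], Route]

TABLE: SwitchingTable = {
    (0, 1, 32): Route(out_port=3, out_vpi=5, out_vci=77),
    (1, 1, 32): Route(out_port=2, out_vpi=9, out_vci=40),
}


def translate_header(table: SwitchingTable, in_port: int,
                     vpi: int, vci: int) -> Optional[Route]:
    """Look up the outgoing port and header fields; None if no connection."""
    return table.get((in_port, vpi, vci))


print(translate_header(TABLE, 0, 1, 32))
print(translate_header(TABLE, 0, 9, 9))    # unknown connection
```

The same VPI/VCI pair on two different input ports maps to two different routes, which is why the input port is part of the key in this sketch.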

Figure 11.27 shows the details of the entire switching system.

Figure-11.27: switching system

The switching network module is mainly composed of the following blocks: a Batcher-Banyan network, one input multiplexer bank and one output demultiplexer bank. The Batcher-Banyan network implements the switching function. The multiplexer and demultiplexer banks are used to reduce the internal bus width of the Batcher-Banyan network from 8 bits to 2 bits and to restore it afterwards.

This means that four internal Batcher-Banyan network cycles are needed to switch one incoming 8-bit word. The drawback of the bus width reduction is therefore a fourfold increase in the internal switch frequency, so the chip designers had to choose a faster technology to keep a high switching throughput. In this case they chose GaAs (gallium arsenide) technology, which is commonly used for high-frequency systems.
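
The width conversion can be sketched as follows. The function names are hypothetical, and sending the most significant pair of bits first is an assumption; the source only states the 8-bit and 2-bit widths.

```python
EXTERNAL_WIDTH = 8   # bits per word outside the Batcher-Banyan network
INTERNAL_WIDTH = 2   # bits per word inside the network


def mux_8_to_2(word):
    """Split one 8-bit word into four 2-bit slices, high-order pair first."""
    return [(word >> shift) & 0b11 for shift in (6, 4, 2, 0)]


def demux_2_to_8(slices):
    """Rebuild the 8-bit word from four 2-bit slices in the same order."""
    word = 0
    for piece in slices:
        word = (word << 2) | (piece & 0b11)
    return word


slices = mux_8_to_2(0xB4)
print(slices, hex(demux_2_to_8(slices)))
# Internal cycles needed per external word:
print(EXTERNAL_WIDTH // INTERNAL_WIDTH)
```

Each external word takes `EXTERNAL_WIDTH // INTERNAL_WIDTH` internal cycles, which is where the fourfold increase in internal frequency comes from.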

This chapter edited by E. Juarez, L. Cominelli and D. Mlynek

EJM 17/2/1999
