# A 622Mbps ATM Physical Layer ASIC and Its "Design for Test" Methods

Chan Kim, Yeong Ho Park, Doo Sub Eom, Jae Geun Kim

Broadband Communications Department, ETRI

## Abstract

In this paper, a 622Mbps ATM Physical Layer ASIC design is described. This ASIC performs the full 622Mbps ATM physical layer functions according to the ITU-T I.432 and ATM Forum UNI standards. The cells are processed at 77.76MHz, but most of the other STM-related circuits run on a 19.44MHz clock. Each functional block is explained together with the robust mechanism that synchronizes it with the other processing blocks. The design aspects and techniques for the ASIC test are explained briefly, covering the scan test and an additional functional test in a reduced frame mode that was specially designed for this ASIC. Most of the basic functions were verified through loop-back tests, including fiber loops.

#### 1. INTRODUCTION

Though the dominant ATM physical layer is 155Mbps, the 622Mbps ATM physical layer interface will be the major network-side interface for ATM switches and access nodes, and it can be used to increase the internal ATM transfer speed in some distributed network equipment. In this paper, a 622Mbps ATM Physical Layer ASIC implementation is described. This ASIC performs the full 622Mbps ATM physical layer functions according to the ITU-T and ATM Forum UNI standards. The content of this paper is as follows. In section 2, the basic specification of the 622Mbps ATM physical layer is briefly explained with respect to its frame structure and its OAM functions. Section 3 describes the design of the ASIC with its processing blocks and its robust synchronization mechanism. In section 4, the "design for test" methods used in this ASIC are explained: a full scan test with several supporting techniques and an additional functional test in reduced frame mode. With these tests, fault coverage of 98% was achieved and the chip is believed to be highly reliable. Section 5 concludes this paper.

#### 2. 622Mbps ATM PHYSICAL LAYER SPECIFICATION [1],[2],[3]

#### 2.1 Frame Structure

Fig. 1 shows the 622Mbps ATM physical layer frame structure. In this physical layer, the frame carries a VC4-4c virtual container, and the virtual container carries the actual C4-4c container, to which a continuous stream of ATM cells is mapped. One frame is transmitted every 125 microseconds, giving a 622.080Mbps transmission rate, and the C4-4c container has 599.04Mbps of bandwidth capacity. The frame has a pointer which points to the start of the VC4-4c in the current frame, and it is this pointer operation that dynamically absorbs the frequency and phase offsets between network elements.
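The two rates quoted above follow from the frame dimensions. The short Python sketch below reproduces the arithmetic, assuming the usual STM-4 layout of 9 rows by 1080 byte columns, of which 36 columns hold section overhead and pointer, 1 column holds path overhead, and 3 columns are fixed stuff. These column counts are taken from the SDH recommendations as an assumption; they are not stated in this paper.

```python
# Frame arithmetic for a 622Mbps (STM-4 with VC4-4c) signal.
# Column counts are assumptions based on the SDH frame layout.
ROWS = 9
FRAME_COLUMNS = 270 * 4        # whole STM-4 frame
SOH_COLUMNS = 9 * 4            # section overhead and pointer columns
POH_COLUMNS = 1                # VC4-4c path overhead column
FIXED_STUFF_COLUMNS = 3        # fixed stuff columns inside the VC4-4c
FRAMES_PER_SECOND = 8000       # one frame every 125 microseconds


def rate_bps(columns: int) -> int:
    """Bit rate carried by `columns` byte columns of every frame."""
    return ROWS * columns * 8 * FRAMES_PER_SECOND


line_rate = rate_bps(FRAME_COLUMNS)
vc_columns = FRAME_COLUMNS - SOH_COLUMNS
c_columns = vc_columns - POH_COLUMNS - FIXED_STUFF_COLUMNS
payload_rate = rate_bps(c_columns)

print(f"line rate   : {line_rate / 1e6:.2f} Mbps")
print(f"C4-4c rate  : {payload_rate / 1e6:.2f} Mbps")
```

The same arithmetic gives the clock figures used later in the paper: 622.08Mbps divided by 8 bits is 77.76M bytes per second, and dividing again by four parallel streams gives 19.44MHz.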



Figure 1. STM4-4c Frame Structure

The SDH (SONET) has a highly layered network architecture consisting of the RS (Regenerator Section), MS (Multiplex Section) and Path layers. The MS layer is served by the RS layer, and the Path layer is served by the MS layer. OAM (Operations and Maintenance) information is carried for each layer in overhead bytes, so a fixed bandwidth is allocated to the OAM of each layer. The details of the STM overheads and OAM operations are not explained here.

#### 3. ASIC FUNCTIONAL ARCHITECTURE

#### 3.1 Overall Hardware Architecture

The architecture of the ASIC is shown in Fig. 2. It has a generic 8 bit CPU interface, an 8 bit interface to the lower transceiver and a 16 bit UTOPIA interface<sup>[4]</sup> to the upper ATM layer. The ATM cell processing block and the POH processing block run at 77.76MHz. Between the POH processing block and the pointer processing block there is a demultiplexer in the transmit direction and a multiplexer in the receive direction. The pointer and STM processing is done in parallel on four 19.44MHz data streams. All major synchronization signals are generated from the 19.44MHz clock, even in the POH processing block, and if a 77M control signal is needed, it is derived locally from its 19M counterpart.



Fig. 2. 622Mbps ATM Hardware block diagram

Special care was taken to guarantee the clock phase relations between the 19M and 77M clocks. The 77M data stream is aligned so that the first column data is located at the first slot of the 19M clock. For data transfers between the 19M and 77M clock domains, negative edge flip-flops were used to guarantee that the data are transferred at the intended edge regardless of the clock phase difference (though it is guaranteed to be very small). The processing blocks have a robust synchronization mechanism in which every block is synchronized and enabled by its synch mastering block using one or two signals per frame, so no data buffers are required between blocks.<sup>[5]</sup>

#### 3.2 Functions of each block

#### 3.2.1 transmit ATM cell processing block

The ATM cell processing block is enabled during the C4-4c interval by the C\_EN signal coming from the POH processing block. It contains four cell FIFOs, built from four 16bit x 27word DPRAMs. The external ATM layer circuit writes cells to the FIFO through the 16 bit UTOPIA interface. When there are cells to be transmitted, the cell processing block reads the cell data; when there is no cell to transmit, it inserts idle cells. On exiting the FIFO, the 16 bit format is converted to the standard 8 bit form without interrupting the cell stream. The ATM cell processing block generates the HEC value and scrambles the payload data. As a result, the cell processing block sends a continuous C4-4c cell stream to the POH processing block, together with a start of cell signal that enables the generation of the H4 value in the POH processing block.
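As an illustration of the HEC generation step, the sketch below computes the HEC the way I.432 is generally understood to define it: a CRC-8 over the four header octets with generator x^8 + x^2 + x + 1, XORed with the fixed pattern 01010101. The function name `hec` and the bit-serial loop are illustrative assumptions; the circuit used in the ASIC is not described in this paper.

```python
HEC_POLY = 0x07    # x^8 + x^2 + x + 1, with the x^8 term implicit
HEC_COSET = 0x55   # fixed pattern added to the CRC remainder


def hec(header: bytes) -> int:
    """Return the HEC octet for the first four octets of a cell header."""
    if len(header) != 4:
        raise ValueError("HEC is computed over exactly four header octets")
    crc = 0
    for octet in header:
        crc ^= octet
        for _ in range(8):
            # Shift left one bit; reduce by the generator when a 1 falls out.
            if crc & 0x80:
                crc = ((crc << 1) & 0xFF) ^ HEC_POLY
            else:
                crc = (crc << 1) & 0xFF
    return crc ^ HEC_COSET


# Header of an idle cell as inserted when the FIFO has nothing to send.
IDLE_HEADER = bytes([0x00, 0x00, 0x00, 0x01])
print(hex(hec(IDLE_HEADER)))  # prints 0x52
```

The idle cell makes a convenient check because its full five-octet header is a well-known constant.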

#### 3.2.2 transmit POH processing block

The transmit POH processing block is enabled during the VC4-4c interval by the VC\_EN signal coming from the pointer processing block, and it generates the C4-4c enable signal (C\_EN) that enables the cell processing block. The POH processing block generates the POH bytes and multiplexes them with the C4-4c cell stream to form the VC4-4c data. The C4-4c cell stream arrives at the POH processing block already aligned to the VC4-4c timing because the POH processing block governs the generation of the C4-4c interval. Therefore no buffer is required between the blocks. The VC4-4c data is delivered to the pointer processor with the TXJ1\_OFS signal indicating the start of the VC4-4c. The POH bytes processed are J1, B3, C2, G1, F2, H4, Z3, Z4 and Z5. On exiting the POH processing block, the VC4-4c data is de-multiplexed into four 19M streams and delivered to the pointer processing block.
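Of the POH bytes listed above, B3 is the path error monitoring byte. In SDH it is a BIP-8 (bit interleaved parity with even parity) computed over all bytes of the previous VC4-4c, which reduces to a bytewise XOR. The sketch below shows that calculation; the function name and the toy three-byte container are hypothetical, and nothing here is taken from the actual circuit of the ASIC.

```python
from functools import reduce
from operator import xor


def bip8(container: bytes) -> int:
    """Even-parity BIP-8: XOR of every byte in the container."""
    return reduce(xor, container, 0)


# Toy stand-in for the bytes of the previous VC4-4c frame.
previous_vc = bytes([0xB0, 0x61, 0x0F])
b3 = bip8(previous_vc)

# With B3 appended, every bit position has even parity.
assert bip8(previous_vc + bytes([b3])) == 0
print(f"B3 = 0x{b3:02X}")
```

A receiver can run the same function over the received container and compare the result with the B3 byte of the following frame; each differing bit indicates a parity violation.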

#### 3.2.3 transmit pointer processing block

The pointer processing block, which is the master of the whole transmitter, commands the start of the STM transmission frame with the F8k signal. It also enables the POH processing block with the VC\_EN signal. The pointer processing block generates the pointer value using the TXJ1\_OFS signal and multiplexes the pointer value (H1, H2) with the VC4-4c data to form the AU4-4c (VC4-4c plus pointer). The VC4-4c data stream arrives at the pointer processing block already aligned to the pointer (STM frame) generation because the pointer processing block governs the generation of both the VC4-4c and the STM frame, so there is no need for a data buffer between the POH processing block and the pointer processing block. In this ASIC, since the pointer processing block explicitly commands the start of the VC4-4c every frame, the transmit pointer value is always fixed to 1.

#### 3.2.4 transmit SOH processing block

The timing of the SOH processing block is synchronized by the pointer processing block. The block generates the frame periodically by generating all the SOH bytes and multiplexing them with the already aligned AU4-4c data to form the whole STM frame. There is no buffer between the pointer processing block and the SOH processing block. The overhead bytes processed in the SOH processing block are J0(C1), B1, E1, F1, D1~D3, B2, K1, K2, D4~D12, S1, M1 and E2. The 19M data streams are finally multiplexed into a single 77M stream, which is scrambled and sent out.
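The final scrambling step can be illustrated with the frame-synchronous scrambler that SDH line signals normally use: a 7-bit shift register with generator polynomial 1 + x^6 + x^7, preset to all ones, whose output sequence is XORed onto the data. The paper does not state the polynomial or the preset point, so both are assumptions here, and the function names are hypothetical. The sketch also scrambles a plain byte string and ignores the first-row overhead bytes that are normally left unscrambled.

```python
def prbs7_bytes(count: int) -> bytes:
    """Generate `count` bytes of the 1 + x^6 + x^7 sequence, MSB first."""
    state = 0x7F                      # register preset to all ones
    out = bytearray()
    for _ in range(count):
        byte = 0
        for _ in range(8):
            msb = (state >> 6) & 1    # output of the last stage
            feedback = msb ^ ((state >> 5) & 1)
            byte = (byte << 1) | msb
            state = ((state << 1) | feedback) & 0x7F
        out.append(byte)
    return bytes(out)


def scramble(data: bytes) -> bytes:
    """XOR the data with the scrambler sequence (same call descrambles)."""
    key = prbs7_bytes(len(data))
    return bytes(d ^ k for d, k in zip(data, key))


payload = bytes(range(16))
assert scramble(scramble(payload)) == payload
print(prbs7_bytes(2).hex())  # prints fe04
```

Because the sequence depends only on the position after the preset, a descrambler that is preset at the same point recovers the data by applying the identical operation.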

#### 3.2.5 receive SOH processing block

In the receiver, the framer detects the frame start from the received 77.76M stream and synchronizes the SOH processing block to the incoming data stream. The frame data is de-scrambled and de-multiplexed into four 19MHz STM1 streams for further SOH processing. The SOH processing block extracts the SOH bytes and processes them. The data stream is then delivered to the pointer processing block with the frame start (F8k) signal. The SOH bytes processed in the SOH processing block are J0(C1), B1, E1, F1, D1~D3, B2, K1, K2, D4~D12, S1, M1 and E2.

#### 3.2.6 receive pointer processing block

The pointer processing block is synchronized to the received data stream by the frame start signal (F8k) from the SOH processing block. It extracts and interprets the pointer to detect the start location and the interval of the received VC4-4c data, and delivers this information (the RXJ1\_OFS and VC\_EN signals) together with the received VC4-4c data to the POH processing block. Pointer increment, decrement, new pointer, and three consecutive valid pointer conditions are processed.
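The four pointer conditions named above can be sketched as a small interpreter. This is a simplified model under stated assumptions, not the logic of the ASIC: it assumes the usual H1/H2 layout (4 NDF bits, 2 SS bits, then a 10-bit value whose bits alternate I, D starting from the most significant bit), an NDF "enabled" code of 1001, a valid range of 0 to 782, and majority voting over the five I or D bits. It ignores the concatenation indications of an AU4-4c and the rule that justifications may not follow each other too closely. All names are hypothetical.

```python
AU4_POINTER_MAX = 782
NDF_ENABLED = 0b1001
I_BITS = 0b1010101010   # increment bits within the 10-bit value
D_BITS = 0b0101010101   # decrement bits within the 10-bit value


def pointer_bytes(value: int, ndf: int = 0b0110, ss: int = 0b10):
    """Pack NDF, SS and a 10-bit value into the (H1, H2) byte pair."""
    word = (ndf << 12) | (ss << 10) | value
    return word >> 8, word & 0xFF


class PointerInterpreter:
    def __init__(self, value: int = 0) -> None:
        self.value = value
        self._candidate = None
        self._repeats = 0

    def _accept(self, value: int, event: str) -> str:
        self.value = value
        self._candidate, self._repeats = None, 0
        return event

    def update(self, h1: int, h2: int) -> str:
        word = (h1 << 8) | h2
        received = word & 0x3FF
        if (word >> 12) == NDF_ENABLED and received <= AU4_POINTER_MAX:
            return self._accept(received, "new pointer")
        if received == self.value:
            return self._accept(self.value, "normal")
        diff = received ^ self.value
        inverted_i = bin(diff & I_BITS).count("1")
        inverted_d = bin(diff & D_BITS).count("1")
        modulus = AU4_POINTER_MAX + 1
        if inverted_i >= 3 and inverted_d < 3:
            return self._accept((self.value + 1) % modulus, "increment")
        if inverted_d >= 3 and inverted_i < 3:
            return self._accept((self.value - 1) % modulus, "decrement")
        if received > AU4_POINTER_MAX:
            self._candidate, self._repeats = None, 0
            return "invalid"
        if received == self._candidate:
            self._repeats += 1
        else:
            self._candidate, self._repeats = received, 1
        if self._repeats == 3:
            return self._accept(received, "three consecutive")
        return "invalid"


interp = PointerInterpreter(522)
print(interp.update(*pointer_bytes(522 ^ I_BITS)), interp.value)
```

In this model a value that differs from the current one is only adopted after it has been seen in three consecutive frames, unless the NDF is set or a clear majority of I or D bits is inverted.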

#### 3.2.7 receive POH processing block

The four data streams from the pointer processor are again multiplexed into a single 77M VC4-4c data stream on entering the POH processing block. The POH processing block is enabled during the received VC4-4c interval by the VC\_EN signal coming from the pointer processing block and synchronizes its timing to the received data using the RXJ1\_OFS signal. It then extracts and processes all the POH bytes. The POH processor also delivers the C4-4c payload to the ATM cell processor with the C\_EN signal. The POH bytes processed are J1, B3, C2, G1, F2, H4, Z3, Z4 and Z5.

#### 3.2.8 receive ATM cell processing block

The receive ATM cell processor is enabled by the C\_EN signal coming from POH processor and is operated only within the C4-4c payload period. The cell processor performs the cell delineation using the HEC decoding process and extracts the start of cell information. The cell payload is then descrambled and idle cells are filtered out. The valid cells are written to the receive FIFO with 16bit UTOPIA format conversion without delay. The ATM layer reads cell data from the 16 bit UTOPIA interface.
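HEC-based cell delineation can be sketched as a search through HUNT, PRESYNC and SYNC states. The model below works on a byte-aligned buffer: it hunts byte by byte for a position where the fifth octet matches the HEC of the previous four, then requires `delta` further consecutive correct HECs at 53-octet spacing before declaring SYNC. The value 6 for `delta` is the figure commonly quoted from I.432 for SDH-based interfaces and is an assumption here, as are the function names; loss of delineation after consecutive HEC errors is not modelled.

```python
from typing import Optional

CELL_SIZE = 53


def hec(header: bytes) -> int:
    """CRC-8 (x^8 + x^2 + x + 1) over four octets, XORed with 0x55."""
    crc = 0
    for octet in header:
        crc ^= octet
        for _ in range(8):
            crc = ((crc << 1) & 0xFF) ^ (0x07 if crc & 0x80 else 0)
    return crc ^ 0x55


def find_cell_boundary(stream: bytes, delta: int = 6) -> Optional[int]:
    """Return the offset of the first delineated cell, or None."""

    def hec_ok(pos: int) -> bool:
        return (pos + 5 <= len(stream)
                and hec(stream[pos:pos + 4]) == stream[pos + 4])

    for pos in range(len(stream)):
        if not hec_ok(pos):
            continue                      # HUNT: slide by one octet
        # PRESYNC: confirm the HEC on the next `delta` cells in a row.
        if all(hec_ok(pos + k * CELL_SIZE) for k in range(1, delta + 1)):
            return pos                    # SYNC reached
    return None


idle_cell = bytes([0x00, 0x00, 0x00, 0x01, 0x52]) + bytes([0x6A]) * 48
stream = bytes(3) + idle_cell * 8         # three stray octets, then cells
print(find_cell_boundary(stream))         # prints 3
```

Once the boundary is known, the start of cell information is simply every 53rd octet from that offset, which is what the later descrambling and idle cell filtering steps need.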

#### 3.3 Some Design Considerations

#### 3.3.1 Data Multiplexing and Demultiplexing

In this design, the timing relations between processing blocks are kept as before, based on the 19.44MHz clock domain, and all the control signals are generated with the 19M clock. If a 77.76MHz version of a control signal is needed, it is generated locally from the corresponding 19M version. As shown in Fig. 3, the rising edge of the 19M clock was controlled to be as close as possible to the rising edge of the 77M clock, and the first column data of the 77M stream is controlled to always be located at the first slot of the 19M clock period. In this manner, the data and control signals are always aligned to the 19M clock at every interface, which makes the interfaces easy to understand.

For each direction, the 19M clock is derived from the 77M clock, and the two clocks use physically separate clock buffers (note that the clock buffer delay can only be estimated statistically). To guarantee the phase relationship of the clocks, the 19M clock edge was first delayed to be as close as possible to the correct (column 0) 77M clock edge using the 77M clock itself (including its negative edge, to leave the smallest phase gap), and the remaining gap was then adjusted using a delay element. Because the amount of added delay is kept as small as possible, the phase relation holds across ASIC fabrication process variations and changes in operating conditions. The 77M and 19M clocks are all of BCT (Balanced Clock Tree) type, and the worst and best case clock buffer delays were taken into account to guarantee the phase relation. In addition, for data transfers between the 19M and 77M clock domains, negative edge flip-flops were used to guarantee that the data are transferred at the intended edge regardless of the clock phase difference (though it is guaranteed to be very small).
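The column alignment described in this subsection can be pictured with a small software model: the 77M byte stream is split so that column 0 always lands in the first of the four 19M streams, and the four streams are interleaved back in the same order. This is only a behavioural sketch with hypothetical function names; it says nothing about the clocking circuit itself.

```python
LANES = 4  # four 19M streams per 77M stream


def deinterleave(stream77: bytes) -> list[bytes]:
    """Split a 77M byte stream into four 19M lanes, column 0 in lane 0."""
    if len(stream77) % LANES:
        raise ValueError("stream length must be a multiple of four columns")
    return [stream77[lane::LANES] for lane in range(LANES)]


def interleave(lanes: list[bytes]) -> bytes:
    """Merge four 19M lanes back into one 77M byte stream."""
    return bytes(column for group in zip(*lanes) for column in group)


columns = bytes(range(12))
lanes = deinterleave(columns)
print([list(lane) for lane in lanes])
assert interleave(lanes) == columns
```

Keeping column 0 in a fixed lane is what allows every control signal to be expressed in 19M clock periods, with the 77M position implied by the lane number.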



Fig. 3. Data Interleaving and Deinterleaving

#### 3.3.2 Synchronization Mechanism

As shown in the block diagram (Fig. 2), the controller of each processing block is synchronized by a single external signal sent from its synch mastering block. Once a controller is synchronized to its data stream, it generates its internal control signals periodically every frame, even if no external synchronization is received. Whenever an external synchronization signal is received, however, the internal controller is re-synchronized by that external command. At the same time, each block sends another synchronization signal every frame to its synchronization slave block. With this periodic master-slave mechanism, synchronization is acquired safely and is hard to break. It makes the chip more robust, since each block keeps running autonomously even when synchronization information is temporarily unavailable. The synchronization scheme of each block is described in Section 3.2.
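The behaviour described in this subsection amounts to a free-running frame counter that can be reset by a pulse from its master. A minimal model is sketched below; the class name, the `tick` method and the short frame period of 4 clocks are all hypothetical choices made to keep the example readable.

```python
class FrameController:
    """Free-running frame counter that re-synchronizes on a master pulse."""

    def __init__(self, period: int, phase: int = 0) -> None:
        self.period = period
        self.count = phase

    def tick(self, sync_in: bool = False) -> bool:
        """Advance one clock; return True when the frame pulse is emitted."""
        if sync_in:
            self.count = 0            # external command overrides local phase
        frame_pulse = self.count == 0
        self.count = (self.count + 1) % self.period
        return frame_pulse


def pulse_times(controller: FrameController, sync_at: set, clocks: int):
    """Clock indices at which the controller emits its frame pulse."""
    return [t for t in range(clocks) if controller.tick(sync_in=t in sync_at)]


# The slave starts out of phase, receives one sync pulse at clock 1,
# and then keeps generating frame pulses without further help.
slave = FrameController(period=4, phase=2)
print(pulse_times(slave, sync_at={1}, clocks=10))  # prints [1, 5, 9]
```

The frame pulse returned by one controller can be fed to the `sync_in` of the next, which reproduces the master-slave chain between blocks.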

#### 4. ASIC "DESIGN FOR TEST" METHODS

#### 4.1 SCAN test [6]

Scan test is a method of testing the chip by loading all the flip-flops with known test vectors, applying one clock, and then reading back the changed values of the flip-flops. By comparing the changed values with the verification data prepared by simulation, internal faults can be detected. To load and read the flip-flops, all of them are temporarily connected into shift register chains during the scanning cycles using multiplexers at each flip-flop's input. The system mode clock is applied when the circuit is configured in its normal, unchained mode. The loading (scan-in) of a test vector and the reading (scan-out) of the changed data are performed at the same time. The detailed scan test methodology is not explained here, but the design aspects of the scan test of this ASIC are explained.
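The load, clock and read sequence can be made concrete with a toy model. The sketch below treats the flip-flops as a list, shifts a vector in while the previous contents shift out, and applies one system clock through a combinational function in between. The three-flop logic and the stuck-at-0 fault are invented for illustration and have nothing to do with the actual netlist.

```python
from typing import Callable, List

Bits = List[int]


class ScanChain:
    def __init__(self, length: int, logic: Callable[[Bits], Bits]) -> None:
        self.flops = [0] * length
        self.logic = logic

    def shift(self, vector: Bits) -> Bits:
        """Scan mode: load `vector` while the old contents shift out."""
        out = []
        for bit in reversed(vector):
            out.append(self.flops[-1])
            self.flops = [bit] + self.flops[:-1]
        return out[::-1]

    def capture(self) -> None:
        """System mode: one clock through the combinational logic."""
        self.flops = self.logic(self.flops)


def run_scan_test(chain: ScanChain, vectors: List[Bits]) -> List[Bits]:
    """Apply each vector; scan-out overlaps the scan-in of the next one."""
    responses = []
    chain.shift(vectors[0])
    for following in vectors[1:] + [[0] * len(chain.flops)]:
        chain.capture()
        responses.append(chain.shift(following))
    return responses


def good_logic(s: Bits) -> Bits:
    return [s[0] ^ s[2], s[0] & s[1], s[1] | s[2]]


def faulty_logic(s: Bits) -> Bits:
    return [s[0] ^ s[2], 0, s[1] | s[2]]   # AND output stuck at 0


vectors = [[1, 1, 0], [0, 1, 1]]
expected = run_scan_test(ScanChain(3, good_logic), vectors)
observed = run_scan_test(ScanChain(3, faulty_logic), vectors)
print("fault detected:", observed != expected)
```

Here `expected` plays the role of the verification data prepared by simulation, and any mismatch in the scanned-out values reveals the fault.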

During the scan test, all the clocks were driven by a single external clock input, using a multiplexer at each clock buffer. In this ASIC, the whole circuit was divided into four scan chains according to the physical clock buffers used. Where different clocks are used within a single chain, the flip-flops were grouped by clock domain and lock-up latches were inserted at the boundary between the two clock domains. The grouping and the insertion of lock-up latches are done automatically by the SCAN synthesis software with appropriate commands. Also, all the asynchronous inputs and internal asynchronous feedback paths, such as reset, are disabled or broken during the scan test to prevent unintended reset or preset of data by test vectors or by changed flip-flop values.

The CPU registers were converted to synchronous circuits using enable signals generated for the chosen address in the middle of the write cycle, and the DPRAMs were forced into transparent mode during the scan test by making the read and write addresses equal and keeping the write signals constantly enabled. During ATPG (Automatic Test Pattern Generation), the equivalent transparent model was used for these DPRAMs. The integrity of the DPRAMs is tested in the reduced frame mode function test, which is described later.

Likewise, the latches and negative edge flip-flops were also forced to look transparent using multiplexers at their outputs. This technique was also extended to bypass two functional blocks. The two main pointer processing blocks, which did not conform to the scan rules and were carried over from the former 155M design, were bypassed during the scan test by making them look like arbitrary combinational logic, and the corresponding combinational model was used during the ATPG process. For bi-directional pins, a special input pin was added to inhibit the internal tri-state buffers, making it possible both to supply test patterns and to read the changed primary output values through the same pins. This tri-state inhibit control signal is periodically activated for all the test vectors.

Since simulating the actual scanning operation takes too much time, the loading of the test vectors and the comparison of the changed values with the verification data are done in parallel during the ATPG simulation, and the resulting test patterns are later expanded to the actual serial scan. Using parallel scan simulation, some possible test errors caused by asynchronous loops or negative clocks were fixed and the verification data was prepared. Through serial scan simulation of several test vectors, the scanning operation was verified before test sign-off. Because of some untestable pins such as reset, this full scan test obtained 90% fault coverage with almost 1000 test vectors, each of length about 500. With the function test described later, an overall fault coverage of 98% was achieved.

#### 4.2 function test in reduced frame mode

Because the main pointer processing blocks and the DPRAMs are excluded from the scan test, a loop-back functional test (at 5MHz) was added to cover the whole chip. In an STM circuit, however, most actions are based on frames, and it would require too many test vectors to synchronize the receiver all the way up to the cell delineation process. This problem was solved by modifying the timing circuits of each block to have a test mode in which the block runs in 9 x 30 mode instead of 9 x 270 mode (in the 19.44M clock domain), while the relations between the control signals of the blocks are preserved exactly as in normal mode. In this reduced frame mode, 8 actual cells with test patterns were transmitted and received in loop-back, and most of the major functions, including FIFO integrity, could be verified.

As shown in Fig. 4, the controller of each block can be divided into periodic timing pulse generation and actual control signal generation. The reduced frame mode can be achieved, for example, by bypassing the 3x3 part of the periodic pulse generator of the SOH processing block (Fig. 5). This is possible because all the processing blocks are actuated only by the control signals, frame by frame, without any internal "memory" of timing across frames. This can be called a scalable architecture, because scaling the timing control scales the whole circuitry proportionately. It is noteworthy that some logic was also modified so that the circuit reaches its normal operating condition as fast as possible during the reduced frame mode loop-back test. For example, the pointer stability check in the transmitter and the C2 mismatch check in the receiver are disabled during this test.
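A rough software analogue of such a timing generator is sketched below. It models the frame counter as cascaded divide stages, 30 x 3 x 3 columns by 9 rows, and lets a test-mode flag bypass the two divide-by-3 stages so that the frame shrinks from 9 x 270 to 9 x 30 clocks of the 19.44M domain. The stage ordering and all names are assumptions made for illustration; the actual circuit of Fig. 5 may differ.

```python
from math import prod

# (stage name, modulus), fastest stage first. Names are hypothetical.
NORMAL_STAGES = (("col30", 30), ("div3a", 3), ("div3b", 3), ("row9", 9))
BYPASSED_IN_TEST = {"div3a", "div3b"}   # the "3x3 part"


def active_moduli(reduced: bool) -> list:
    return [modulus for name, modulus in NORMAL_STAGES
            if not (reduced and name in BYPASSED_IN_TEST)]


def frame_length(reduced: bool = False) -> int:
    """Number of 19.44M clocks per frame in the selected mode."""
    return prod(active_moduli(reduced))


def frame_pulses(clocks: int, reduced: bool = False) -> list:
    """Clock indices at which the cascaded counter emits a frame pulse."""
    moduli = active_moduli(reduced)
    counters = [0] * len(moduli)
    pulses = []
    for t in range(clocks):
        if not any(counters):
            pulses.append(t)
        for i, modulus in enumerate(moduli):
            counters[i] = (counters[i] + 1) % modulus
            if counters[i] != 0:
                break                   # no carry into the next stage
    return pulses


print(frame_length(), frame_length(reduced=True))
print(frame_pulses(600, reduced=True))
```

In normal mode the frame length of 2430 clocks matches 19.44MHz times 125 microseconds, and in reduced mode every frame-based event simply arrives nine times sooner, which is why far fewer test vectors are needed.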



Fig. 4 Structure of each block's controller



Fig. 5. Example of timing generator

#### 5. CONCLUSION

A 622Mbps ATM Physical layer ASIC was designed and implemented. It performs the full functions specified in the ITU-T and ATM Forum standards. It also supports full OAM functions and diagnostic functions. Most of the basic functions were verified through loop-back tests, including optical fiber loops, and the chip will undergo formal testing when the test equipment is set up. This chip will be used in the design of an ATM service node having a 622Mbps ATM physical interface. Figure 6 shows the chip photograph.



Fig. 6 Chip Photograph

#### 6. REFERENCES

- 1. "Synchronous Multiplexing Structure," ITU Recommendation G.709.
- 2. "B-ISDN User Network Interface-Physical Interface Specification," ITU Recommendation I.432.
- 3. ATM User-Network Interface Specification, ATM Forum V3.0, October 1993.
- 4. UTOPIA Level 1 Specification, ATM Forum.
- 5. Thomas J. Robe and Kenneth A. Walsh, "A SONET STS-3c User Network Interface Integrated Circuit," IEEE J. Selected Areas Commun., Vol. 9, No. 5, June 1991.
- 6. Test Builder Manual, LSI Logic.