DYNAMIC PACKET PROCESSOR ARCHITECTURE
This application claims the benefit of Provisional Application No. 60/268,813, filed Feb. 14, 2001.
TECHNICAL FIELD
The present invention relates to a dynamic packet processor architecture that includes generic pipeline stages which may handle a variety of packet information.
BACKGROUND AND SUMMARY OF THE INVENTION
Most conventional packet-processor-based systems, such as L2/L3/L4 switches and routers, include separate static and pre-defined input filters, route filters and output filters. Each filter is often designed to perform only one specific function and cannot handle a wide range of functions. Conventional router/switch systems are also designed to handle a predetermined number of processing steps.
Specialized processors and dedicated hardware are currently used to perform packet processing in today's routers and switches. Each of these approaches presents some advantages and limitations. While specialized processors provide the flexibility to support new protocols and packet flows, they cannot handle line speed processing rates. The opposite is true for specialized hardware, whose flexibility is very limited but which can handle line speed processing rates.
If, for example, the conventional packet-processor-based system has three fixed filters and is designed to handle three stages, and the packet is non-conforming in that it requires more than three stages, then the conventional router system cannot easily handle such a request and requires complicated re-looping mechanisms that dramatically slow down the processing of the non-conforming request. In many cases, the lookup tables of the input filters are relatively small, which limits the global functionality and system flexibility. The approach of relying on a fixed number of filters that are each designed for a specific function unduly limits the flexibility and performance of the system when the required stages of incoming packets do not conform to the specific design of the system. It is also often cumbersome to rely on many different specialized filters instead of generic filters that may handle almost any type of incoming packet, packet flow and input port.
The dynamic processor of the present invention provides a high degree of packet processing flexibility at line speed rates. More particularly, the dynamic packet processor architecture of the present invention includes a generic pipeline stage assembly in which every pipeline stage may be dynamically configured depending in part upon the packet type, the packet flow requirements and the input/output port type. In this way, the generic pipeline stages may be adjusted to the required processing stages and are designed for any packet flow, packet type and input/output port type. The input port types may, for example, include Ethernet, POS (Packet Over SONET), DTM (Dynamic Transfer Mode) and raw data. The raw data may be any input that is not using a predefined network protocol, i.e., voice data received by an E1/T1 framer. The configuration may be performed on a per packet flow basis so that, for example, the same stage may function as an input filter, a route filter or an output filter, as required.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a schematic block diagram of a packet processor system of the present invention;
FIG. 2 is a schematic block diagram of the dynamic packet processor of the present invention;
FIGS. 3a/3fc are schematic flow diagrams of the various states in a pipeline stage of the present invention;
FIG. 4 is a schematic block diagram of a generic pipeline stage of the present invention;
FIG. 5 is a schematic block diagram of a queue and memory management unit of the present invention;
FIG. 6 is a schematic flow diagram of a descriptor chain of the present invention; and
FIG. 7 is a schematic block diagram of a lookup engine of the present invention.
DETAILED DESCRIPTION
With reference to FIGS. 1-7, the processor architecture of the present invention is suitable for a variety of network systems and the dynamic transfer mode (DTM) topology is only mentioned as a suitable example of an application. In other words, the generic distributed architecture of the present invention is suitable and optimized for, but not limited to, DTM systems. Every packet that is received initiates a task that requires processing. The task includes the full packet and some overhead information. The task then traverses a generic routing pipeline where the task is modified based on the routing lookup results received by different pipeline stages along the path. At the end of the pipeline, the task may be written into a memory to allow the backplane output interface unit to read it out and send it over the DTM backplane or over any other suitable interface. On the transmit side of the architecture, the packet may traverse a similar path from the backplane interface unit to the front end unit.
FIG. 1 schematically illustrates a packet processor blade system 10 that is connected to an I/O port unit 12, a DTM backplane unit 14 and a CPU host 16 that is part of a central main CPU that communicates with all the nodes of the DTM backplane unit 14. The system 10 could, for example, be a component of a node that is connected to a DTM ring or bus topology so that the system could receive information, such as information packets, from the outside and inside of the DTM node topology. All nodes in the backplane unit 14 are local but the host 16 is in communication with all the nodes through a different path. As explained in detail below, the system 10 may be used to determine, based on the packet information coming in through an incoming line interface, to which node an outgoing line interface should forward the packet information.
The unit 12 may be connected to any suitable user unit that carries information to and from the unit 12, such as Ethernet, POS (Packet Over SONET), DTM (Dynamic Transfer Mode) or raw data. The I/O port unit 12 may be connected to a front end unit 18 via two-way connectors 20 or copper traces using a printed circuit board. For example, the front end unit 18 may be a MAC/PHY device. The front end unit 18 may be used to determine if the incoming packets are valid and supported by the system 10. Preferably, the front end unit 18 is connected, via two-way connectors 21, to a line interface unit 22 of a dynamic packet processor 24. In this way, the line interface unit 22 may be the interface between the front end 18 and the packet processor 24.
One function of the line interface unit 22 is to convert an incoming packet 61 to a task and to determine which type the incoming packet is and what would be the first instruction to perform on the task. The task may be a block of data created from the original packet by the line interface unit 22 that may be modified by one or a plurality of pipeline stages in the pipeline assembly 26. The task is the actual data that traverses the pipeline stages and has a variable but limited length.
As indicated above, when the line interface 22 receives a packet from a port at the front end 18, the line interface 22 generates the task containing the full packet and additional task overhead (TOH) information. In this way, the packet will not be stored in the packet buffer until it has traversed the routing pipeline stages in the pipeline assembly 26. Therefore, the packet ordering within a flow can be guaranteed without special handling. Every valid packet received by the line interface unit 22 will generate a task. By transforming an incoming packet to a task, unnecessary information that is not required by the packet processor 24 is removed from the packet.
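The conversion of a packet into a task can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the implementation of the line interface 22: the names `Task`, `TaskOverhead`, `make_task` and `FIRST_INSTRUCTION`, the instruction labels and the mapping from port type to first instruction are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical mapping from input port type to the first instruction
# placed in the task overhead (TOH). The real mapping is not specified.
FIRST_INSTRUCTION = {
    "ethernet": "INPUT_FILTER",
    "pos": "INPUT_FILTER",
    "dtm": "ROUTE_FILTER",
    "raw": "ENCAPSULATE",
}


@dataclass
class TaskOverhead:
    port_type: str     # type of the port the packet arrived on
    packet_type: str   # e.g. "ip"; determined by the line interface
    instruction: str   # first instruction for the pipeline assembly


@dataclass
class Task:
    toh: TaskOverhead  # task overhead (TOH)
    packet: bytes      # the full packet travels with the task


def make_task(packet: bytes, port_type: str, packet_type: str) -> Task:
    """Wrap a valid packet in a task together with its overhead."""
    if port_type not in FIRST_INSTRUCTION:
        raise ValueError(f"unsupported port type: {port_type}")
    toh = TaskOverhead(port_type, packet_type, FIRST_INSTRUCTION[port_type])
    return Task(toh, packet)


task = make_task(b"\x45\x00\x00\x14", "ethernet", "ip")
```

Because the full packet is carried inside the task, nothing needs to be written to the packet buffer before the pipeline has been traversed, which is the property the paragraph above relies on for in-order delivery within a flow.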
In addition to the line interface unit 22 and the first generic pipeline stage assembly 26, the packet processor 24 includes a second generic pipeline stage assembly 28 for transmission, a lookup engine unit 30 comprising a single lookup engine or multiple lookup engines, a packet buffer/memory 32 and a backplane interface unit 34. The pipeline stage assemblies 26, 28 are generic and can handle virtually all incoming tasks. The pipeline assemblies 26, 28 include routing pipeline stages that can be dynamically instructed to perform a specific function on a per packet/task basis. For increased system flexibility, and to be able to support future protocols, an unsupported task may be sent to the local blade processor automatically. As described below, each assembly has a plurality of identical generic pipeline stages. All lookup requests from the pipeline stage assembly 26 are handled by the lookup engine 30 for interface rates up to several million lookups per second. With currently available technology, it is possible to build a lookup engine that can handle 150 million lookups per second.
The pipeline stage assembly 26 is connected to the interface unit 22 via a one-way RBUS bus 36 and to the lookup engine unit 30 via a two-way bus 38. The pipeline stage assembly 26 is also connected to the packet memory 32 via an RFQI bus 40 and to the backplane interface 34 via a bus 42.
The bus 42 may be used if a particular packet needs more stages than the available number of stages in the assembly 26. The extra stages may be performed by the assembly 26 in a second loop. In this way, when the backplane line interface 70 retrieves a packet from the packet memory 32 and realizes that all the steps are not completed, the interface 70 may send back the packet via the bus 42 to take care of the uncompleted stages in the assembly 26. It should be noted that this procedure does not require additional memory for partially processed packets.
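The second-loop mechanism can be sketched as follows. This is a simplified Python model under the assumption that each stage performs at most one pending processing step per traversal; the function names `run_pass` and `count_passes` and the instruction labels are hypothetical.

```python
def run_pass(pending, num_stages):
    """One traversal of the assembly: each of the num_stages stages
    performs at most one pending step; the rest remain uncompleted."""
    return pending[num_stages:]


def count_passes(instructions, num_stages):
    """Return how many traversals are needed before no step is pending.
    After each pass the backplane interface checks for uncompleted
    steps and, if any remain, sends the packet back for another loop."""
    if num_stages <= 0:
        raise ValueError("the assembly needs at least one stage")
    pending = list(instructions)
    passes = 0
    while True:
        pending = run_pass(pending, num_stages)
        passes += 1
        if not pending:
            return passes


steps = ["INPUT_FILTER", "DECAPSULATE", "ROUTE_FILTER",
         "ENCAPSULATE", "OUTPUT_FILTER"]
conforming = count_passes(steps[:3], num_stages=3)  # fits in one pass
looped = count_passes(steps, num_stages=3)          # needs a second loop
```

In this model a packet that fits within the available stages completes in a single traversal, and only a packet that needs more steps pays for an extra loop.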
The packet memory 32 is connected back to the line interface 22 via a TLQI bus 44, to the second pipeline stage assembly 28 via a TFQI bus 46 and to the backplane interface 34 via an RLQI bus 48. The second pipeline stage assembly 28 is connected to the line interface 22 via a bus 50 and to the backplane interface 34 via the TBUS bus 52. The backplane interface 34 may be the interface between the packet processor 24 and the DTM backplane 14. The interface 34 is connected to the DTM backplane 14, or any other suitable network, via two-way buses 54.
The dynamic packet processor 24 is connected to a blade control processor 56 via a two-way bus device 58. The processor 56 is connected to the host CPU 16 via a two-way bus 60. The processor 56 is responsible for updating the memory content of the lookup engine 30 and for configuring the packet processor 24. The processor 56 is the interface between the host processor 16 and the packet processor 24 and does not require significant processing power to perform its basic functions.
When the system 10 receives an incoming packet 61 through the I/O port 12, the packet 61 will pass to the front end 18 via the bus connectors 20 and into the line interface 22 via the bus connectors 21. The line interface 22 converts the packet 61 to a task and forwards the task, via the bus 36, to the first pipeline stage assembly 26. One function of the line interface 22 is to control the way the incoming tasks are distributed to the pipeline stage assembly 26 by analyzing the overhead of the task. The overhead includes instructions regarding which functionality/purpose or type of process is required for the packet 61. The overhead changes as the information is passed through the pipeline stage assembly 26. The overhead may include instructions about what is required in the next step of the process. For higher I/O performance, it is possible for the line interface 22 to interface with multiple pipelines 26 for a distributed processing architecture.
Depending on the input port type and the packet type, the packet processor 24 needs to perform specific filtering tasks in a given order at each processing stage of the packet. Therefore, every packet flow has a predefined processing path with different pipeline stages and lookup numbers. To be able to handle all types of packets and input ports listed in the design requirements, without having to build complicated multi-branch pipelines, the generic pipeline stage assembly 26 can perform all of the required filtering tasks.
A typical first overhead instruction may be to filter the task information. The pipeline stage assembly 26 may be designed to conduct some flow level filtering at a low level so that, for example, the assembly 26 may drop all tasks that have been sent from a particular port. The pipeline stage assembly 26 is also designed to conduct higher level filtering such as route filtering and input filtering. As explained in detail below, if the task is passed through an input filter, the lookup may provide information about whether the task should be dropped or not. If, for example, the task is passed through a route filter, the lookup may provide information about where the packet should be sent. A lookup for an output filter may determine which channel or port should be used on the DTM backplane. Other lookup types include encapsulation and decapsulation lookups that determine if the task needs to be encapsulated or decapsulated before it is sent into the DTM backplane 14. The lookup results may also provide information related to the type of encapsulation/decapsulation that is needed, what additional information that encapsulation includes and the next hop address of the task/packet. There are different protocols for the various encapsulation/decapsulation procedures. The hop address refers to the IP (Internet Protocol) address of the next router.
In general, each stage in the generic pipeline assembly 26 receives tasks at line speed and variable length from the previous stage or line interface with pre-defined instructions. Since this is a cut-through architecture, it is possible to have variable length tasks without performance penalty or buffer size limitations. For every lookup request, the pipeline stage receives back lookup results that include the computations or modifications that need to be performed and the next instructions to be performed. The pipeline stage applies the modified instructions accordingly and sends out the modified task with the next instruction to the next pipeline stage.
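The behavior of a single generic stage, and of a chain of identical stages, can be sketched as below. This is a minimal Python illustration and not the hardware design: `Task`, `LookupResult`, `generic_stage`, `run_pipeline` and `toy_lookup` are hypothetical names, the instruction labels and addresses are invented for the example, and the lookup engine is stood in for by a plain function.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    instruction: str                              # what the next stage must do
    overhead: dict = field(default_factory=dict)  # other task overhead
    payload: bytes = b""


@dataclass
class LookupResult:
    modifications: dict    # changes to apply to the task
    next_instruction: str  # instruction for the following stage
    drop: bool = False     # e.g. an input filter rejecting the task


def generic_stage(task, lookup):
    """One generic stage: request a lookup for the current instruction,
    apply the returned modifications, and set the next instruction."""
    if task.instruction == "NOP":  # nothing to do: forward unchanged
        return task
    result = lookup(task.instruction, task.overhead)
    if result.drop:
        return None
    task.overhead.update(result.modifications)
    task.instruction = result.next_instruction
    return task


def run_pipeline(task, lookup, num_stages):
    """Pass the task through num_stages identical stages."""
    for _ in range(num_stages):
        task = generic_stage(task, lookup)
        if task is None:
            return None
    return task


def toy_lookup(instruction, overhead):
    """Stand-in for the lookup engine, with made-up table contents."""
    if instruction == "INPUT_FILTER":
        blocked = overhead.get("src") == "10.0.0.9"
        return LookupResult({}, "ROUTE_FILTER", drop=blocked)
    if instruction == "ROUTE_FILTER":
        return LookupResult({"channel": 3, "next_hop": "192.168.1.1"}, "NOP")
    raise ValueError(f"unknown instruction: {instruction}")


routed = run_pipeline(
    Task("INPUT_FILTER", {"src": "10.0.0.1", "dst": "192.168.1.7"}),
    toy_lookup, num_stages=4)
dropped = run_pipeline(
    Task("INPUT_FILTER", {"src": "10.0.0.9", "dst": "192.168.1.7"}),
    toy_lookup, num_stages=4)
```

In the first call only two of the four stages perform lookups; the remaining stages see the `NOP` instruction and forward the task untouched, so the same stage code serves as input filter, route filter or pass-through depending only on the instruction it receives.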
More particularly, the task traverses through each stage of the pipeline stage assembly 26. In each stage, the overhead of the task includes instructions about what needs to be performed in each particular stage. For example, the instructions may relate to the type of lookup and modifications that are required. As indicated above and described in detail below, the payload of the packet is forwarded to the FIFO buffer for buffering while the lookup engine 30 looks up the information requested by the stage assembly 26 in a database.
FIG. 2 is a more detailed schematic view of the packet processor 24 that has a receiving portion, as indicated by a thick receiving arrow 62 directed in a first direction, and a transmitting portion, as indicated by a thick transmitting arrow 64 directed in a second direction that is opposite the first direction. The white arrows represent transfer of packets/tasks from one component to another component of the packet processor. Both the transmit and receive portions together form the packet processor.
The line interface 22 of FIG. 1 may be divided into a receiving line interface 66 and a transmitting line interface 68. Similarly, the backplane interface 34 of FIG. 1 may be divided into a receiving backplane interface 70 and a transmitting backplane interface 72. As explained below, the backplane interface 70 may retrieve packets stored in the QMMU 32 by sending a retrieval signal 71 that includes a correct descriptor for the required packet and the amount of data requested. The line interface unit 68 may retrieve packets from the QMMU 32 by sending a signal 69 that includes the correct descriptor for the required packet.
The pipeline stage assembly 26 contains a plurality of identical generic pipeline stages 26a-26n. An important feature of the present invention is that each pipeline stage is identical and can handle a wide range of functions. However, the last stage 26n may be slightly different in that it may need to change the task into a packet format with descriptors that are suitable for being written into the memory. It is necessary to include a plurality of generic pipeline stages in the pipeline stage assembly 26 because certain packets may require more than one generic pipeline stage if many lookup functions are required. For example, if a packet only requires two lookups, only two generic pipeline stages will be used while the task is passed through all the other pipeline stages in the assembly 26 without any lookups or modifications.
As best shown in FIG. 4, each pipeline stage 26k receives a task 72 from either the previous pipeline stage or, if the pipeline stage 26k is the first pipeline stage, from the line interface unit 22. The task 72 comprises the task overhead information that may be divided into an instruction portion and a miscellaneous overhead portion that go into the request engine 76. The instruction portion as well as portions of the packet may be read by a key extractor or key engine that reads the instruction portion. The engine 76 has access to a table that may be used to interpret the instructions in the instruction portion of the packet. If the instruction bit in the packet corresponds to a value in the table that requires no action, then the pipeline stage will not modify the packet and will simply read the packet and forward it as it is to the next pipeline stage in the assembly. The entire task, including the payload and all the overhead information, goes into a FIFO buffer 86. It is also possible to send the entire task simultaneously into the request engine 76 so that the request engine may use the portions of the task that are needed for the lookup process.
The request engine 76 reviews the instruction portion and determines what type of processing is required. For example, the request engine 76 may determine the type of packet that originated the task 72. The originating packet may have been an IP packet so that the request engine knows that the next step is to generate a correct key 78 for the lookup engine 30 so that the correct lookups are performed by the lookup engine for the particular IP packet. The request engine 76 may have a destination address but does not know which route the packet with the specified IP destination address should take. The lookup engine 30 may provide this type of information. If the incoming task 72 is based on an IP packet, then the instruction portion may include information related to the source address, the destination address and other fields of information. The lookup engine 30 determines who the sender is and where the packet is going. Based on a table disposed in the lookup engine 30, the engine 30 will send back a modification data signal 82 to a modifier engine 84. The signal 82 may include modification instructions for the next step in the subsequent generic pipeline stage. For example, if the lookup is a routing lookup, the signal 82 includes results related to, among other things, instructions regarding which channel is to be used, the next IP address and what the next step in the process is. The lookup results may bypass the modifier engine 84 by using a next instruction bus 87 that is linked to the bus 85.
The task may also need some modifications before it is transmitted to the destination address. In this way, the lookup engine 30 receives the key 78 and obtains the requested information from a database 80 to, for example, find out whether the output I/O should be instructed to route the task to a certain node. The task may be dropped if, for example, the sender of the packet is not permitted to transmit the packet or the receiver is not permitted to receive the packet. If the lookup was related to an input filter, the results retrieved from the database and issued by the lookup engine 30 may include information about whether the task should be dropped or not. An important feature of the present invention is that the results from the lookup engine 30 depend upon the requested function submitted to the lookup engine. The requested function, in turn, depends on the type of the incoming packet and what processing needs to be performed. Therefore, any stage may request any lookup type and handle any possible modification.
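How the result of a lookup depends on the requested function can be illustrated with a small Python sketch of key generation and table lookup. The key layout, the `KEY_FIELDS` table, the exact-match dictionary standing in for the database 80 and all addresses are assumptions made for illustration; the specification does not define the key format or the matching scheme.

```python
# Hypothetical choice of which overhead fields form the key for each
# requested function.
KEY_FIELDS = {
    "INPUT_FILTER": ("src",),
    "ROUTE_FILTER": ("dst",),
}

# Toy stand-in for the database: exact-match entries keyed by
# (requested function, field values...).
DATABASE = {
    ("INPUT_FILTER", "10.0.0.9"): {"drop": True},
    ("ROUTE_FILTER", "192.168.1.7"): {
        "channel": 3,
        "next_hop": "192.168.1.1",
        "next_instruction": "OUTPUT_FILTER",
    },
}


def make_key(instruction, overhead):
    """Request engine: build the lookup key from the instruction and the
    overhead fields that the requested function needs."""
    names = KEY_FIELDS[instruction]
    return (instruction,) + tuple(overhead[name] for name in names)


def lookup(key):
    """Lookup engine: return the entry for the key, or None on a miss."""
    return DATABASE.get(key)


overhead = {"src": "10.0.0.9", "dst": "192.168.1.7"}
filter_result = lookup(make_key("INPUT_FILTER", overhead))
route_result = lookup(make_key("ROUTE_FILTER", overhead))
```

The same overhead yields a drop decision when the requested function is an input filter and a channel and next hop when it is a route filter, which is why any stage can issue any lookup type.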
Preferably, the entire task 75, including the payload and task overhead, goes into the FIFO buffer 86 and remains in the buffer while the lookup engine 30 processes the instruction portion. The buffer 86 may store several tasks while the lookup engine 30 processes a plurality of tasks to compensate for the delay between submitting a request to the lookup engine 30 and obtaining a result from the lookup engine 30.
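The role of the FIFO buffer in hiding lookup latency can be illustrated with a small cycle-based simulation in Python. The model is an assumption made for illustration: one task arrives per cycle, every lookup takes the same fixed number of cycles, and results return in order. The function name `simulate` and its parameters are hypothetical.

```python
from collections import deque


def simulate(num_tasks, lookup_latency):
    """Return (completion order, peak FIFO occupancy) when one task
    arrives per cycle and each lookup takes lookup_latency cycles."""
    fifo = deque()       # tasks waiting for their lookup result
    in_flight = deque()  # (cycle when the result is ready, task id)
    completed = []
    peak = 0
    cycle = 0
    next_task = 0
    while len(completed) < num_tasks:
        # A returning result releases the task at the head of the FIFO.
        if in_flight and in_flight[0][0] <= cycle:
            in_flight.popleft()
            completed.append(fifo.popleft())
        # A new task enters the FIFO and its lookup request is issued.
        if next_task < num_tasks:
            fifo.append(next_task)
            in_flight.append((cycle + lookup_latency, next_task))
            next_task += 1
        peak = max(peak, len(fifo))
        cycle += 1
    return completed, peak


order, peak_occupancy = simulate(num_tasks=5, lookup_latency=3)
```

Under these assumptions the tasks leave in arrival order and the peak occupancy equals the lookup latency in cycles, which suggests that the buffer depth should be chosen with the worst-case lookup delay in mind.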
In certain cases, the request engine 76 may be congested and unable to process the incoming tasks for any reason. A rate control engine 81 may inform the previous pipeline stage 26j, via a backpressure bus 83 to the previous modifier engine, that the pipeline stage 26k is congested so that the previous pipeline stage may stop sending tasks to the congested pipeline stage. The request engine 76, the modifier engine 84 and the FIFO buffer 86 may send signals 77, 79, 91, respectively, to the rate control engine 81 when congestion occurs. For example, when the buffer 86 is filled to capacity, the buffer 86 may generate the signal 91. Any lookup results that arrive at the modifier engine 84 from the lookup engine 30 during this time may be internally stored in the modifier engine 84 until the congestion problem has been resolved. When the upstream pipeline stages receive congestion signals from downstream pipeline stages, the upstream pipeline stages will slow down or delay the sending of tasks until the temporary congestion has been resolved. If the congestion is