METHOD AND APPARATUS FOR NETWORK INTRUSION DETECTION SYSTEM
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of U.S. Provisional Application No. 60/534,283, filed January 5, 2004.
TECHNICAL FIELD
[0002] The present invention generally relates to computer network monitoring and, more particularly, to a system and method of monitoring a computer network for unauthorized access and intrusion.
BACKGROUND
[0003] Intrusion detection is a security technology that attempts to identify and isolate potential intrusions into a computer system. An intrusion may include unauthorized usage or misuse of a computer system, or various other types of anomalous activity related to the computer system. Such intrusions, which in many instances are attempted by so-called "computer hackers," may include, for example, computer viruses and Trojan horses. To prevent, or at least significantly reduce the likelihood of, such unauthorized intrusions, many computer systems include some type of intrusion detection system.
[0004] Various types of intrusion detection systems presently exist. These systems include host-based intrusion detection systems, stack-based intrusion detection systems, and network-based intrusion detection systems (or "NIDS"). A NIDS operates on network data flow to detect improper activity. In many instances, a NIDS uses a network adapter in promiscuous mode that listens to and analyzes all network traffic in real-time as the traffic traverses the network. A NIDS interprets the traffic and attempts to detect intrusions by monitoring for patterns of suspicious activity in the traffic. A NIDS can typically discern attacks that involve low-level manipulation of the network, and correlate data regarding attacks against multiple machines on a network.
[0005] A NIDS monitors traffic using a network segment to which it is connected as a data source. This is generally accomplished by placing a network interface card (NIC) in a promiscuous mode to capture all network traffic that crosses the network segment, which is also sometimes referred to as using a NIDS sensor. Packets are considered to be of interest if they match a signature, which typically corresponds to a specific type of intrusion. There are three primary types of signatures: header condition signatures, port signatures, and string signatures. Most signatures may use a combination of header, port, and string signatures.
[0006] Header signatures correspond to dangerous or illogical combinations in packet headers. For instance, a well-known header signature is a TCP packet with both the SYN and FIN flags set, which signifies that the requestor wishes to start and stop a connection at the same time.
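By way of illustration only, the SYN/FIN header signature described above may be sketched in software as follows. The constant names and the function name are hypothetical; the bit positions are those of the standard TCP flags field.

```python
# Bit masks for two of the flags in the TCP header flags field.
TCP_FIN = 0x01
TCP_SYN = 0x02


def has_syn_fin(tcp_flags: int) -> bool:
    """Return True when both the SYN and FIN flags are set."""
    both = TCP_SYN | TCP_FIN
    return (tcp_flags & both) == both


if __name__ == "__main__":
    print(has_syn_fin(0x03))  # SYN and FIN together
    print(has_syn_fin(0x02))  # SYN only
```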
[0007] String signatures correspond to a text string in the payload that indicates a possible intrusion. For instance, a UNIX signature may be "/bin/chmod" which may indicate an attempt by an intruder to change user permissions. To refine the string signature to reduce the number of false positives, it may be necessary to use a compound string signature. A compound string signature for a common Web server attack might be "cgi-bin" AND "aglimpse" AND "IFS". Another compound string signature for a common Web server attack might be the presence of "/view-source" and "../" strings in a packet.
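A compound string signature of the kind described above may be sketched, purely as an example, as a test that every component string is present in the payload. The function name and the sample payloads are hypothetical.

```python
def matches_compound(payload: bytes, required: list[bytes]) -> bool:
    """Return True only if every required string occurs somewhere in the payload."""
    return all(part in payload for part in required)


# Component strings taken from the two compound signatures named in the text.
AGLIMPSE_SIGNATURE = [b"cgi-bin", b"aglimpse", b"IFS"]
VIEW_SOURCE_SIGNATURE = [b"/view-source", b"../"]

if __name__ == "__main__":
    sample = b"GET /cgi-bin/aglimpse/80|IFS=5;CMD=5mail HTTP/1.0"
    print(matches_compound(sample, AGLIMPSE_SIGNATURE))
```

Requiring all components together is what reduces false positives: a payload containing only "cgi-bin" does not match.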
[0008] Typically, the data size "n" of a packet may be, for example, up to 1,522 bytes. In the case of a search for a content string of average byte length S (for example, 24 bytes) anywhere in the packet, the number of byte comparisons required is S(n-S), which for the foregoing example is 24(1,522-24)=35,952. In the case of a search in a typical signature database containing 1,800 signatures, the number of byte comparisons would be 64,713,600. Thus, a larger number of signatures requiring searching may create a significant processing burden for a NIDS.
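The arithmetic in the foregoing paragraph can be checked with a few lines of code. The function name is hypothetical; the formula S(n-S) and the example values are those given in the text.

```python
def byte_comparisons(packet_len: int, string_len: int) -> int:
    """Byte comparisons for one content string, using the S(n-S) estimate."""
    return string_len * (packet_len - string_len)


per_signature = byte_comparisons(packet_len=1522, string_len=24)
total = per_signature * 1800  # 1,800 signatures in the example database

if __name__ == "__main__":
    print(per_signature)  # 35952
    print(total)          # 64713600
```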
[0009] There are several existing algorithms used to decrease the time to search for a string in a data packet, such as the Boyer-Moore algorithm and its derivatives. These are more suited for software implementations to decrease the time of search. The search time will linearly scale with the number of patterns to be searched.
[0010] There are also existing finite state machine algorithms that arrange all the content strings in a finite state machine. This decreases the time of search for all patterns, but this method requires a significant amount of memory. For instance, if a content string byte is used to index a next state, then each state can have 256 next state pointers. So, for comparisons involving longer content strings, this method may use a lot of memory and is likely more suited for a software implementation in a general purpose microprocessor. However, the use of a general purpose microprocessor for pattern searches limits the rate at which network intrusion detection may be performed.
[0011] Several existing NIDS implementations begin to ignore a portion of the network traffic data packets because the implementations are not able to maintain processing at telecommunications line rates of, for example, 100 megabit-per-second (Mbps) Fast Ethernet. This may permit certain network intrusions to escape detection by the NIDS. For example, several network attacks are based on flooding the NIDS with data containing trigger signatures to overload the NIDS processing capability so that it becomes the first target of an intrusion.
[0012] Thus, it would be desirable to have a NIDS that provides faster signature comparisons for improved detection of network traffic at telecommunications line rates.
BRIEF DESCRIPTION OF THE DRAWINGS
[0013] The invention is pointed out with particularity in the appended claims. However, for a more complete understanding of the present invention, reference is now made to the following figures, wherein like reference numbers refer to similar items throughout the figures:
[0014] FIG. 1 illustrates a gateway device coupling an internal network to an external network;
[0015] FIG. 2 illustrates a simplified functional block diagram of a specific example of a system architecture suitable for use in implementing a network intrusion detection system and method, in accordance with an embodiment of the present invention;
[0016] FIG. 3 illustrates a high-level simplified functional block diagram of a network intrusion detection system, in accordance with an embodiment of the present invention; and
[0017] FIG. 4 illustrates a more-detailed functional block diagram of the hash circuit of FIG. 3, in accordance with an embodiment of the present invention.
DETAILED DESCRIPTION
[0018] The following description and the drawings illustrate specific embodiments of the invention sufficiently to enable those skilled in the art to practice it. Other embodiments may incorporate structural, logical, electrical, process and other changes. Examples merely typify possible variations. Individual components and functions are optional unless explicitly required, and the sequence of operations may vary. Portions and features of some embodiments may be included in or substituted for those of others. The scope of the invention encompasses the full ambit of the claims and all available equivalents.
[0019] The elements that implement the various embodiments of the present invention are described below, in some cases at an architectural level. Many elements may be configured using well-known structures. The functionality and processes herein are described in such a manner to enable one of ordinary skill in the art to implement the functionality and processes within the architecture.
[0020] An exemplary communications system 100 that may be used to implement an embodiment of the present invention is shown in FIG. 1, and includes two communication networks, an external communication network 102 and an internal communication network 104, which are coupled to one another via a gateway device 106. The external communication network 102 is any one of numerous communication networks that may be used to connect a remote network or device with the internal communication network 104. For example, the external communication network 102 may be the Internet, which may in turn provide a connection to yet another (non-illustrated) external network, or one or more non-illustrated single-user computers. The internal network 104 may be any one of numerous types of local network communication systems such as, for example, a local Ethernet network. By passing I/O traffic destined for internal network 104 through gateway device 106 with processing by NIDS 206 (shown in FIG. 2), undesirable intrusions may be detected in such I/O traffic. Intrusions identified by NIDS 206 may be stopped in the I/O traffic flow, and any further related improper intrusion prevented from passing to internal network 104.
[0021] The gateway device 106 is used to facilitate communication between the external 102 and internal 104 networks. It will be appreciated that the gateway device 106 may be any one of numerous known devices useful for providing this functionality including, for example, a router, a network switch, a network processor, or at least part of a network interface card (NIC). The gateway device 106 preferably performs various packet classifications and other network processing functions, and further performs security processing such as encryption, decryption, and authentication. The details of many of the various packet classification and network processing functions performed by the gateway device 106 are not needed to understand or enable the present invention, and are thus not fully described in detail.
[0022] In addition to the above-mentioned packet classification and network processing functions, the gateway device 106 preferably integrates the functions of a stateful firewall, a VPN (virtual private network), and network intrusion detection. As part of the firewall functionality, the gateway device 106 extracts so-called "flow information" from the header of IP data packets it receives, and searches the extracted flow information in a lookup table or database (not shown). As is generally known, the flow information of an IP data packet header typically includes data representative of the IP Source Address, the IP Destination Address, the Layer 4 Source Port, the Layer 4 Destination Port, and the protocol byte (e.g., TCP, UDP, etc.). As is also generally known, the flow information is typically referred to as a "five-tuple." In any case, the results from the lookup table or database are used to form a data structure known as a connection table entry. It will be appreciated that the firewall and VPN lookup results may be combined, in which case the connection table entry will have a pointer to the VPN security association.
[0023] The gateway device 106 also includes several processors (not shown in FIG. 1) that, among other things, examine data packet headers and, for the most-common packets, perform modifications to them. This functionality is typically referred to as fast-path processing. Data packets that cannot be handled by the fast-path processing are sent as exception items for so-called slow-path processing. The slow-path processing preferably combines the policy tables of the firewall and VPN with a NIDS policy (not shown). However, if not so combined, the slow-path processing functionality searches the NIDS policy during the formation of a connection table entry, and adds the resulting flow identifier information from the search to the connection table.
[0024] The NIDS policy is administered by a rules file, which is read during initialization and whenever the NIDS policy is updated. The NIDS initialization software builds two databases, a NIDS flow database and a NIDS signature database (neither of which is shown in FIG. 1). The NIDS flow database entries are similar to the firewall rules, in that the entries are also five-tuples, but with the addition of several checks to header variables and patterns. The difference in the five-tuple lookup is that all five-tuple lookups belonging to the same type of flow may be classified together. For example, all http server traffic may have the same NIDS flow identifier.
[0025] Implementing the NIDS functionality along with the stateful firewall and VPN permits combining the five-tuple lookup database for the NIDS, the stateful firewall, and the VPN, thereby requiring that only one lookup be performed. This single lookup is performed in the firewall fast-path processing for every packet. The resulting connection table entry flow identifier information is used by the NIDS functionality to search for a match using only signatures relevant to that particular flow.
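The single combined lookup described above may be pictured, as a non-limiting sketch, with a table keyed by the five-tuple whose entry carries the firewall result, a VPN security association pointer, and a NIDS flow identifier. All type names, field names, addresses, and table contents below are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple, Optional


class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int  # protocol byte, e.g. 6 for TCP, 17 for UDP


class ConnectionEntry(NamedTuple):
    allowed: bool          # firewall result
    vpn_sa: Optional[int]  # pointer to a VPN security association, if any
    nids_flow_id: int      # flow identifier consumed by the NIDS


# One table serves the firewall, the VPN, and the NIDS (illustrative contents).
CONNECTION_TABLE = {
    FiveTuple("198.51.100.7", "192.0.2.10", 40000, 80, 6): ConnectionEntry(True, None, 1),
}

# Signatures grouped by flow identifier, so only relevant ones are searched.
SIGNATURES_BY_FLOW = {
    1: [b"/VIEW-SOURCE", b"CGI-BIN"],  # e.g. http server traffic
    2: [b"/BIN/CHMOD"],
}


def signatures_for_packet(key: FiveTuple) -> Optional[list]:
    """Perform one lookup; return the signatures relevant to the packet's flow.

    Returns None when no entry exists, which in this sketch stands for the
    packet being handed to slow-path processing.
    """
    entry = CONNECTION_TABLE.get(key)
    if entry is None:
        return None
    return SIGNATURES_BY_FLOW.get(entry.nids_flow_id, [])
```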
[0026] Turning to FIG. 2, a particular embodiment of the gateway device 106 that implements the above-described function is shown in functional block diagram form, and will now be described in more detail.
[0027] The gateway device 106 includes a plurality of packet engines 202, a plurality of cryptographic core processors 204, and a NIDS 206. In the depicted embodiment, the gateway device 106 includes three packet engines 202, which are configured to operate in parallel, and thus supply three channels of network processing. It will be appreciated that this is merely exemplary, and that the gateway device 106 could be implemented with more or less than this number of packet engines 202, to thereby provide more or less network processing channels. No matter the specific number of packet engines 202, each packet engine 202 is preferably coupled to, and transmits data packets to/receives data packets from, an I/O interface 208. It will be appreciated that the data packets may be formatted according to any one of numerous transmission protocols such as, for example, internet protocol (IP) packets. It will be appreciated that the data packet may be an IP packet, or the IP packet may be assembled externally into a stream such as, for example, a TCP or UDP stream, and a window of bytes from this stream may be sent to the NIDS 206. It will additionally be appreciated that the data packet may be normalized data for a particular flow. For example, the data packet may be http normalized data, instead of raw data.
[0028] Each packet engine 202 is in operable communication with one of the cryptographic core processors 204, via a pre-crypto bus 210 and a post-crypto bus 212. The cryptographic core processors 204 implement security processing such as, for example, IPSec and/or SSL processing for IP packets, and further implement data encryption/decryption and authentication routines on packets of data each receives from its associated packet engine 202. It will be appreciated that clear text data packets that have not been encrypted are supplied to the cryptographic core processors 204 via the pre-crypto bus 210, and that post-decryption clear text data packets are supplied to the packet engines 202 via the post-crypto bus 212.
[0029] The NIDS 206, which is described in more detail further below, is configured to match parts of a data packet (e.g., the packet header, payload, trailer) against one or more signatures in the above-mentioned NIDS signature database (not shown in FIG. 2). For example, during fast-path processing, the header information of a data packet arriving via the I/O interface 208 may be examined and passed to the NIDS 206, which then uses the header and flow identifier information written by the fast-path processing functionality to aid in its search for possible signature pattern matches. In the depicted embodiment, the NIDS 206 is configured to implement this signature matching routine only on clear text data packets. Thus, as is shown in FIG. 2, the NIDS 206 is coupled to receive clear text data packets from the pre-crypto 210 or post-crypto 212 buses, depending on whether the data packets are encrypted or decrypted. It will be appreciated that this configuration is merely exemplary, and that the NIDS 206, in an alternative embodiment, could be configured to implement the routine on encrypted data packets. The NIDS 206 is also coupled to a register bus 214, which is used to provide set-up and control information for fast-path and/or slow-path processing, and to the packet engines 202 to assist in controlling future related traffic flow that is determined to be suspect.
[0030] Referring now to FIG. 3, a high-level simplified functional block diagram of the NIDS 206 is shown, and will be described. As shown in FIG. 3, the NIDS 206 includes a packet memory 302, a hash circuit 304, a hash memory 306, an address memory 308, a signature request circuit 310, a signature database 312, a signature compare circuit 314, and a command and status circuit 316. The packet memory 302 is coupled to receive clear text data packets from the pre-crypto 210 and post-crypto 212 busses. Thus, the packet memory 302 may receive and store two data streams simultaneously. In the depicted embodiment, the pre-crypto 210 and post-crypto 212 busses are 32-bit data busses, and the packet memory 302 is configured to receive the clear text data packets in 64 byte bursts. It will be appreciated that this is merely exemplary, and that the pre-crypto 210 and post-crypto 212 busses could be implemented to transmit more or less than this number of bits, and the packet memory 302 could be implemented to receive data bursts of more or less than this number of bytes. It will be additionally appreciated that the packet memory 302 may be implemented using any one of numerous memory types, sizes, and configurations, but in the depicted embodiment it is implemented as a one kilobyte, first-in first-out (FIFO) buffer, with 512 bytes being used for each data stream.
[0031] The hash circuit 304 is in operable communication with, and is configured to receive clear text data packets from, the packet memory 302. Although the manner in which the clear text data packets are supplied to the hash circuit 304 may vary, in the depicted embodiment, the data packets are supplied in 8 byte bursts via, for example, a 64-bit data bus 303. The hash circuit 304 performs "N" hash computations on sequential bytes of the data packets, to thereby provide a set of "N" hash values. Preferably, prior to performing the hash computations, the hash circuit 304 converts all the packet data bytes to upper case (e.g., characters "a" to "z" are converted to characters "A" to "Z"). This is done because some signatures are case insensitive, and could be missed if the data bytes are not converted to upper case. In any case, as will be described in more detail further below, the "N" hash values are used as read addresses 318 into the hash memory 306. As will also be described further below, the hash circuit 304 retrieves the data stored at these addresses 318 in the hash memory 306 and, using the retrieved data, supplies a read request to the address memory 308. A functional block diagram of the hash circuit 304 is shown in FIG. 4, and before describing the remainder of the NIDS 206, it will be described in more detail.
[0032] The content strings of the signatures stored in the signature database 312 may occur anywhere in the packet data. Hence, each signature content string should be compared to the packet data at each and every byte offset of the packet data. In a typical embodiment, there may be, for example, up to 1,800 content strings to be searched. In this particular embodiment, each signature content string has a size of less than 48 bytes. Larger content strings may be accommodated by using the multiple content signature feature discussed herein, for up to, for example, seven signatures per multiple content signature, each of which may be up to 48 bytes in length.
[0033] As FIG. 4 shows, the hash circuit 304 includes a state machine 402 and a plurality of hash units 404 (404-1, 404-2, 404-3, . . . 404-N). The state machine 402 receives the clear text packet data stored in the packet memory 302, via the 64-bit bus 303, as described above. The state machine 402 converts the packet data to upper case, as described above, and supplies the converted data 406, byte-by-byte, to the hash units 404. The state machine 402 is also configured to retrieve data from the hash memory 306, to supply one or more address data values to the address memory 308, and, if necessary, to implement an arbitration routine prior to supplying the address data values to the address memory 308. As will be described in more detail, the data that the state machine 402 retrieves from the hash memory 306 is based on the hash values computed by the hash units 404, and the address data values correspond to the computed hash values.
[0034] The hash units 404 are each configured to receive two data values and, using the same hash algorithm, to supply a hash value 408 having a width of Na bits. In the depicted embodiment, it is seen that one of the two data values supplied to each hash unit 404 is the same, in that each hash unit 404 is coupled to receive the same data byte 406 from the state machine 402. However, the second data value supplied to each hash unit 404 is different. Specifically, the second data value supplied to the first hash unit 404-1 is a fixed initial value 410, while the second data value supplied to the remainder of the hash units 404-2, 404-3, 404-4, . . . 404-N is the resultant hash value from another hash unit 404. For example, in the depicted embodiment, the second data value supplied to the second hash unit 404-2 is the hash value 408-1 output from the first hash unit 404-1, the second data value supplied to the third hash unit 404-3 is the hash value 408-2 output from the second hash unit 404-2, and so on to the Nth hash unit 404-N, for which the second data value is the hash value 408-(N-1) output from the (N-1)th hash unit. Thus, it may be seen that because the two data values supplied to the second hash unit 404-2 are the hash for the previous packet byte (e.g., the hash value 408-1) and the current data byte 406, the second hash unit 404-2 supplies a hash value 408-2 that reflects a 2-byte hash. Similarly, the third hash unit 404-3 supplies a hash value 408-3 that reflects 3 bytes, and so on to the Nth hash unit, which supplies a hash value 408-N that reflects N bytes. Stated in more general terms, each hash unit 404-i receives two values - the current data byte 406 supplied from the state machine 402, and a value reflecting the hash of the previous i-1 bytes (or if i=1, the initial value 410). Therefore, the hash value 408-i supplied from each hash unit 404-i reflects an i-byte hash.
[0035] From the foregoing, it is seen that, upon initialization, only the first hash unit 404-1 will supply a valid hash value 408-1, since another valid data value has not yet been supplied to any of the other hash units 404. However, after the first clock cycle, not only does the state machine 402 supply the second data byte 406 to each hash unit 404, but the first hash unit 404-1 supplies a second valid data value (i.e., its resultant hash value 408-1) to the second hash unit 404-2. As a result, only the first 404-1 and second 404-2 hash units will supply valid hash values 408-1, 408-2. Then, on the next clock cycle, only the first 404-1, second 404-2, and third 404-3 hash units will supply valid hash values 408-1, 408-2, 408-3, and so on until all of the hash units 404 supply valid hash values 408-1, 408-2, 408-3, . . . 408-N. Thereafter, the hash units 404 operate in parallel to provide the set of N hash values, as mentioned above, as read addresses 318 to hash memory 306.
[0036] It will be appreciated that the initial value 410 supplied to the first hash unit 404-1 may be a fixed, predetermined value, or it may be programmable. It will additionally be appreciated that in other embodiments the bit size of the data 406 provided to each hash unit 404 may be longer or shorter than one byte. Moreover, in a particular preferred embodiment, the hash circuit 304 is implemented with 48 hash units 404. However, it will be appreciated this number is merely exemplary, and that the hash circuit 304 may be implemented with more or less than this number of hash units 404.
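The chained operation of the hash units described above may be modeled in software, purely as a sketch, by advancing all units once per data byte. Here `toy_hash` is a stand-in 12-bit mixing function assumed for illustration (it is not the hash algorithm of the depicted embodiment), and the names `clock`, `direct_hash`, and `run` are hypothetical. A value of `None` represents a hash unit whose output is not yet valid.

```python
def toy_hash(prev: int, byte: int) -> int:
    """Stand-in two-input, 12-bit hash (an assumption, not the embodiment's)."""
    return (prev * 31 + byte + 1) & 0xFFF


def clock(prev_outputs, byte, initial_value=0, hash_fn=toy_hash):
    """Advance every hash unit by one clock cycle for one data byte."""
    # Unit 1 always combines the fixed initial value with the current byte.
    outputs = [hash_fn(initial_value, byte)]
    for i in range(1, len(prev_outputs)):
        # Unit i+1 combines the preceding unit's previous output with the byte.
        prev = prev_outputs[i - 1]
        outputs.append(None if prev is None else hash_fn(prev, byte))
    return outputs


def direct_hash(data: bytes, initial_value=0, hash_fn=toy_hash) -> int:
    """Reference: hash a byte string sequentially, starting from the initial value."""
    value = initial_value
    for byte in data:
        value = hash_fn(value, byte)
    return value


def run(packet: bytes, n_units: int):
    """Return the per-cycle outputs of all units for an upper-cased packet."""
    outputs = [None] * n_units
    history = []
    for byte in packet.upper():  # models the upper-case conversion step
        outputs = clock(outputs, byte)
        history.append(outputs)
    return history
```

In this model, after the pipeline fills, unit i holds the hash of the i most recent bytes, so every byte offset yields hashes for all string lengths from 1 to N at once.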
[0037] Before proceeding further, it will be appreciated that the bit-width (Na) of the hash values 408 computed by each hash unit 404 will depend, at least in part, on the hash algorithm that the hash units 404 implement. In the depicted embodiment, the implemented hash algorithm is a 12-bit cyclic redundancy check polynomial (CRC12). However, it will be appreciated that the hash units 404 could implement any one of numerous other hash algorithms, including any one of numerous other CRC polynomials that generate hash values of different widths. As is generally known, the CRC12 generator polynomial, which generates a 12-bit CRC result, is:
G(x) = x^{12} + x^{11} + x^{3} + x^{2} + x + 1.
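For illustration, one possible byte-wise update using this generator polynomial is sketched below. The text specifies only the polynomial; the most-significant-bit-first ordering, the absence of any final XOR, and the names `CRC12_POLY` and `crc12_update` are assumptions made for this sketch.

```python
# Low 12 coefficient bits of x^12 + x^11 + x^3 + x^2 + x + 1;
# the x^12 term is implicit in the 12-bit register width.
CRC12_POLY = 0x80F


def crc12_update(crc: int, byte: int) -> int:
    """Fold one data byte into a 12-bit CRC value, most significant bit first."""
    crc ^= byte << 4  # align the byte with the top 8 bits of the register
    for _ in range(8):
        if crc & 0x800:  # the bit about to be shifted out is set
            crc = ((crc << 1) ^ CRC12_POLY) & 0xFFF
        else:
            crc = (crc << 1) & 0xFFF
    return crc


def crc12(data: bytes, initial_value: int = 0) -> int:
    """Hash a byte string by chaining single-byte updates."""
    value = initial_value
    for byte in data:
        value = crc12_update(value, byte)
    return value
```

A function of this shape takes a previous hash value and a current byte, which is the two-input form that each hash unit 404 is described as receiving.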
[0038] The hash memory 306 stores a plurality of single-bit data values that indicate whether the valid hash values 408 computed by the hash circuit 304, and which are used as hash memory 306 read addresses 318, match one or more hash values that result from the hash of one or more signatures stored in the signature database 312. More specifically, each data bit stored in the hash memory 306 represents either a logical "0" or a logical "1." If the data bit at a particular address in the hash memory 306 is set (e.g., a logical "1"), this indicates that the particular hash value 408 computed by the hash circuit 304 matches the hash value resulting from the hash of the content string of one or more signatures in the signature database 312. For example, if the first hash unit 404-1 supplies a 12-bit hash value of "H1," and the data bit stored at address "H1" in the associated portion of hash memory 306 is a logical "1," then the content string of one or more signatures also hashed to a 12-bit hash value of "H1."
[0039] It will be appreciated that the hash memory 306 may be implemented in any one of numerous configurations, which may depend, for example, on the specific hash algorithm implemented in the hash circuit 304. In the depicted embodiment, in which the hash circuit implements the CRC12 polynomial hash algorithm, the hash memory 306 is configured as a 4096x1-bit random access memory (RAM) (e.g., 2^12 = 4,096). It is noted that with this particular configuration, all the data read from the hash memory 306 will occur in one clock cycle.
[0040] It will be appreciated that the above-described hash memory 306 configuration may be implemented using either a single memory or a plurality of memories, which may be either on-chip memories, off-chip memories, or a combination of both. For example, the hash memory 306 may be implemented using N-number of arrays or memories. With this implementation, one memory (or array) is associated with each hash value 408 calculated by the hash circuit 304, and each memory (or array) has a width of 2^Na bits, and a depth of 1 bit. Moreover, the data 320 retrieved from the hash memory 306 during each clock cycle has a length of "N" bits. Thus, in the particular preferred embodiment described above, in which the hash circuit 304 calculates 48 hash values using the CRC12 hash algorithm, the hash memory 306 is implemented using 48 memories (or arrays), each with a width of 4,096 bits (e.g., 2^12) and a depth of 1 bit, and the data 320 retrieved therefrom has a length of 48 bits.
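The per-length hash memories described above may be sketched as N arrays of 2^12 single-bit entries, one array per content string length. The sketch below uses four arrays rather than 48 for brevity, stores each bit in its own byte for readability, and assumes a most-significant-bit-first CRC with the polynomial given earlier; the names `hash_bytes`, `build_hash_memory`, and `read_hit_bits` are hypothetical.

```python
N_UNITS = 4      # number of hash units modeled here (48 in the preferred embodiment)
HASH_BITS = 12   # width of each hash value


def hash_bytes(data: bytes) -> int:
    """12-bit CRC of a byte string (MSB-first ordering is an assumption)."""
    crc = 0
    for byte in data:
        crc ^= byte << 4
        for _ in range(8):
            crc = ((crc << 1) ^ 0x80F) & 0xFFF if crc & 0x800 else (crc << 1) & 0xFFF
    return crc


def build_hash_memory(content_strings):
    """Set one bit per content string in the array matching its byte length."""
    memory = [bytearray(1 << HASH_BITS) for _ in range(N_UNITS)]
    for content in content_strings:
        memory[len(content) - 1][hash_bytes(content.upper())] = 1
    return memory


def read_hit_bits(memory, hash_values):
    """Return the N-bit result for one cycle; None marks a not-yet-valid hash."""
    return [0 if h is None else memory[i][h] for i, h in enumerate(hash_values)]
```

A set bit only indicates that some content string of that length hashed to the same value; the actual content comparison still has to confirm the match.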
[0041] It will be further appreciated that the hash memory 306 may be modified to save chip area by sharing the memories for sets of hash values 408 calculated by the hash circuit 304. For example, a first memory (not shown) may contain data corresponding to the first hash value 408-1, the second hash value 408-2, the third hash value 408-3, and the fourth hash value 408-4, a second memory (not shown) may contain data corresponding to the fifth hash value 408-5, the sixth hash value 408-6, the seventh hash value 408-7, and the eighth hash value 408-8, and so on. It should be understood, however, that sharing in this manner may cause a false positive hit by indicating a hash value match in the hash memory 306 when the signature content strings correspond to different byte offsets, but any such false positive hit will be later corrected when all fields of the signature data are accessed from the signature database (as described in more detail below) and the content string byte length is compared to the byte length of the corresponding hash value 408 that was used as the read address 318 for the hash memory 306.
[0042] As was mentioned above, the hash circuit 304, using the data 320 retrieved from the hash memory 306, supplies a read request 322 to the address memory 308. More specifically, the hash circuit 304, based on the N-bit length data 320 retrieved from the hash memory 306, uses one or more of the calculated hash values 408 as a read address 324 for the address memory 308. The hash values 408 indicate at least the data packet offset and the byte length of the corresponding hash values 408 computed by the hash circuit 304. As portions of the clear text data packet are processed by hash circuit 304, if the data 320 read from the hash memory 306 indicates that there is more than one signature match (e.g., two or more bits in the data 320 are set), then the hash circuit 304, as previously alluded to, implements an arbitration routine to select the order in which to send the plurality of hash values 408 to the address memory 308.
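The text does not specify the arbitration policy. As one hypothetical possibility, the sketch below services set bits in order of increasing byte length and attaches to each read request the byte length and the packet offset at which the candidate string would begin. The function name, the dictionary layout, and the ordering rule are all assumptions for illustration.

```python
def arbitrate(hit_bits, hash_values, end_offset):
    """Order pending address-memory read requests for one cycle.

    hit_bits[i] is the bit read from the hash memory for hash unit i+1,
    hash_values[i] is that unit's hash value, and end_offset is the packet
    offset of the byte just processed. Shortest lengths are issued first
    (an assumed policy).
    """
    requests = []
    for i, bit in enumerate(hit_bits):
        if bit:
            length = i + 1  # hash unit i+1 reflects an (i+1)-byte hash
            requests.append({
                "read_address": hash_values[i],
                "length": length,
                "offset": end_offset - length + 1,  # where the candidate begins
            })
    return requests
```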
[0043] The address memory 308 stores data representative of the start addresses of linear linked lists of one or more signatures that are stored in the signature database 312. The address memory 308 may be implemented in any one of numerous configurations, but preferably has the same depth as the hash memory 306 (e.g., 2^Na words). It will be appreciated that the width of each word stored in the address memory 308 depends on the maximum address to be requested by the address memory. In a particular preferred implementation, the address memory 308 includes a 16-bit wide base address register 326 and a 16-bit wide RAM 328. With this implementation, the address register 326 is used to store the higher address bits (e.g., bits 16-31) of the linked list starting addresses, and the RAM 328 is used to store the lower address bits (e.g., bits 0-15) of the linked list starting addresses. In addition, each word in address memory RAM 328 may store a data bit to indicate whether the signature data is located on-chip or off-chip. For example, this bit may be set to "1" to indicate that the linked list starting address is located on-chip, or set to "0" to indicate that it is located off-chip. No matter the particular location of the linked list start address, the address memory 308, in response to the read requests 322 received from the hash circuit 304, supplies the linear linked list starting addresses 330, and signature read requests 332 to the signature request circuit 310. The address memory 308 also preferably passes the data representative of the data packet offset and the byte length for the corresponding hash values to the signature request circuit 310.
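The composition of a linked list starting address from the base address register and a RAM word may be sketched as follows. The text does not say where the on-chip/off-chip bit is held within the RAM word; placing it at bit 16, just above the 16 address bits, is an assumption, and the function name is hypothetical.

```python
ON_CHIP_FLAG_BIT = 16  # assumed position of the location bit within a RAM word


def linked_list_start(base_register: int, ram_word: int):
    """Combine the 16-bit base register and a RAM word into a 32-bit address.

    Returns (start_address, on_chip), where on_chip is True when the
    location bit is "1".
    """
    high = (base_register & 0xFFFF) << 16          # address bits 16-31
    low = ram_word & 0xFFFF                        # address bits 0-15
    on_chip = bool((ram_word >> ON_CHIP_FLAG_BIT) & 1)
    return (high | low), on_chip
```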
[0044] The signature request circuit 310 is in operable communication with the address memory 308, the signature database 312, and the signature compare circuit 314. The signature request circuit 310 receives the linked-list starting addresses 330 and the signature read requests 332 from the address memory 308 and, in response, issues read requests 334 to the signature database 312 using the linear linked list starting addresses. The signature request circuit 310 also supplies the data representative of the offset and byte length to the signature compare circuit 314 along with the signature read from the signature database 312. In a preferred embodiment, the signature request circuit 310 is implemented in a pipelined configuration, which allows the signature request circuit 310 to issue multiple read requests before the completion of a previous read request.
[0045] The signature database 312 stores each of the signatures that are compared, if necessary, to the network data packets. The signatures are each preferably stored in the signature database 312 using a data structure that includes the pattern content string of the signature, the length (in bytes) of the content string, and the range to search for the first content string. In at least one particular rule implementation, the range is specified by two parameters, the offset and the depth. The offset specifies the number of bytes to ignore from the start of a packet payload, and the depth specifies the maximum byte offset to search for a content string. For multi-content rules, a subsequent content may specify offset and depth from the previous content match (sometimes referred to using the terms "distance" and "within"). Preferably, the data structure further includes one or more fields indicating the type of packet communication protocol (e.g., TCP, UDP, or ICMP) that is associated with the signature, the case sensitivity of the content string, a flow identifier to identify the particular data packet flow of which the data packet is a part, a field to indicate whether a selected number or all of a signature's fields must match a data packet for a signature match to occur, and a pattern identifier number to identify the particular signature.
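The signature data structure and its search range may be illustrated, in a hedged and non-limiting manner, by the following Python sketch. The class name `Signature`, the function `find_in_range`, and all field names are hypothetical. The sketch assumes one reading of the description, namely that the depth is an upper bound on the absolute byte position within the payload; other rule languages measure depth from the offset instead. The "distance" and "within" parameters of multi-content rules are omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Signature:
    """Illustrative subset of the signature fields named in the description."""
    pattern_id: int
    content: bytes
    offset: int = 0               # payload bytes to skip before searching
    depth: Optional[int] = None   # assumed: last byte position (exclusive) to search
    protocol: str = "TCP"         # e.g., TCP, UDP, or ICMP
    case_sensitive: bool = True
    flow_id: Optional[int] = None

    @property
    def length(self) -> int:
        """Length of the content string in bytes."""
        return len(self.content)


def find_in_range(sig: Signature, payload: bytes) -> int:
    """Return the payload position of the first match in the range, or -1."""
    end = len(payload) if sig.depth is None else min(sig.depth, len(payload))
    window = payload[sig.offset:end]
    needle = sig.content
    if not sig.case_sensitive:
        window, needle = window.upper(), needle.upper()
    pos = window.find(needle)
    return -1 if pos < 0 else sig.offset + pos
```

Under these assumptions, a signature with an offset of 4 and a depth of 12 is searched only in payload bytes 4 through 11.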
[0046] In the depicted embodiment, the signature database 312 is implemented using two memories, an on-chip signature memory 336 and an off-chip signature memory 338. The on-chip signature memory 336 is preferably used to store signatures that have a higher probability of a match in the hash memory 306, or signatures that have a higher collision count in the linear linked list. For example, signatures having a length of one byte are preferably stored in the on-chip signature memory 336, since these will typically have a higher probability of a hit.
[0047] In a particular preferred embodiment, 64 kilobytes of the on-chip memory 336 are configured as 8,192 words, using a 64-bit data width. Of this total memory, 256 bytes are reserved as a cache, which is used as a temporary space for caching signatures read from the off-chip memory 338. After allocating this portion of the on-chip memory 336, the remaining portion is first allocated to those signatures having a length of one byte and then, if available, to signatures having a higher probability of a hit or a higher collision count in the linked list. Once all of the on-chip memory 336 space is allocated, the remaining signatures in the signature database are allocated to the off-chip memory 338. No matter the specific location of a signature in the signature database 312, be it in the on-chip signature memory 336 or the off-chip signature memory 338, each signature that is read from the signature database 312 is supplied to the signature compare circuit 314.
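The memory budget and allocation order described above may be sketched as follows. The arithmetic follows directly from the stated figures: 64 kilobytes at 8 bytes per word gives 8,192 words, and the 256-byte cache occupies 32 of them. The function `allocate`, the tuple layout, the per-signature word counts, and the single numeric `score` standing in for hit probability or collision count are all assumptions made for illustration only.

```python
from typing import List, Tuple

ON_CHIP_BYTES = 64 * 1024
WORD_BYTES = 8                                  # 64-bit data width
CACHE_BYTES = 256

TOTAL_WORDS = ON_CHIP_BYTES // WORD_BYTES       # 8,192 words
CACHE_WORDS = CACHE_BYTES // WORD_BYTES         # 32 words
USABLE_WORDS = TOTAL_WORDS - CACHE_WORDS        # 8,160 words

# (pattern_id, content_length_bytes, words_needed, score); layout is hypothetical
SigInfo = Tuple[int, int, int, float]


def allocate(signatures: List[SigInfo],
             usable_words: int = USABLE_WORDS) -> Tuple[List[int], List[int]]:
    """Place one-byte signatures first, then higher scores, until space runs out."""
    ordered = sorted(signatures, key=lambda s: (s[1] != 1, -s[3]))
    on_chip: List[int] = []
    off_chip: List[int] = []
    used = 0
    spilled = False  # once on-chip space is exhausted, the rest goes off-chip
    for pattern_id, _length, words, _score in ordered:
        if not spilled and used + words <= usable_words:
            on_chip.append(pattern_id)
            used += words
        else:
            spilled = True
            off_chip.append(pattern_id)
    return on_chip, off_chip
```

The sketch sends every signature after the first one that does not fit to the off-chip list, which is one possible reading of "once all of the on-chip memory space is allocated."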
[0048] The signature compare circuit 314 is in operable communication with the packet memory 302, the signature request circuit 310, the signature database 312, and the command and status circuit 316. In the depicted embodiment, the signature compare circuit 314 is coupled to, and receives the data packet presently stored in, the packet memory 302, via a 64-bit data bus 339. The signature compare circuit 314 also receives each of the signatures, one by one, in the linear linked list of signatures that are read from the signature database 312. In the depicted embodiment, the signature compare circuit 314 is coupled to receive the signatures directly from either the on-chip memory 336 or the off-chip memory 338, as appropriate. It will be appreciated that this is merely exemplary, and that the signature compare circuit 314 could receive the read signatures from, for example, the signature request circuit 310.
[0049] Upon receipt of the currently stored data packet and the signatures, the signature compare circuit 314 compares the data packet to one or more fields of each signature in the linear linked list. The specific fields that the signature compare circuit 314 is to compare may depend, for example, on whether the current data packet was a pre-crypto data packet or a post-crypto data packet. This information is supplied to the command and status circuit 316, which in turn communicates the specific fields to be compared to the signature compare circuit 314. As was previously mentioned, the hash circuit 304 converts all packet data to upper case prior to conducting the hash computations and thus does not take case sensitivity into account. However, the signature compare circuit 314 is preferably configured to examine both upper and lower cases during content string comparisons.
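The division of labor between case-insensitive hashing and case-aware comparison may be illustrated by the following sketch. The actual hash function of the hash circuit 304 is not reproduced here; `zlib.crc32` truncated to 16 bits is used purely as a stand-in, and the function names `hash_key` and `content_equal` are hypothetical.

```python
import zlib


def hash_key(data: bytes) -> int:
    """Hash upper-cased bytes so that case variants share one bucket.

    CRC-32 truncated to 16 bits is only a placeholder hash function.
    """
    return zlib.crc32(data.upper()) & 0xFFFF


def content_equal(packet_bytes: bytes, content: bytes, case_sensitive: bool) -> bool:
    """Apply the signature's case sensitivity during the content comparison."""
    if case_sensitive:
        return packet_bytes == content
    return packet_bytes.upper() == content.upper()
```

Because the hash is computed on upper-cased data, "root" and "ROOT" select the same linked list, and only the comparison step decides whether a case-sensitive signature actually matches.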
[0050] The signature compare circuit 314 determines that the data packet and the signature match if, for example, the content string, the offset, and the byte length fields in the signature data structure match the data packet content, offset, and byte length information. It will be appreciated that the signature compare circuit 314 may be configured to compare one or more additional signature data structure fields to one or more additional characteristics associated with the data packet, before determining whether a match occurs. Some non-limiting examples of additional fields that may be checked in the signature compare circuit 314 include IP, TCP flags, TCP sequence number, TCP acknowledge number, ICMP type, ICMP code, ICMP sequence number, ICMP identifier, and flow identifier.
[0051] It is noted that the signature compare circuit 314, at least in the depicted embodiment, is configured such that the signature comparison for each signature in the linked list is conducted serially. It will thus be appreciated that the longer the linear linked list is, the more time it will take to process the entire list. Therefore, as will now be discussed in more detail, the signature compare circuit 314 is preferably configured to compare the signature data structure fields with the data packet in a manner that optimizes the operational efficiency of the NIDS 204 (e.g., in a minimum number of clock cycles).
[0052] Specifically, the comparison of the signature data structure content string to the data packet is preferably made conditional on other signature data structure fields matching the data packet. For example, if the signature byte length does not match, then the signature compare circuit 314 determines that there is no match and does not perform a content string comparison. Similarly, if the signature data structure offset or packet protocol fields do not match the data packet characteristics, then the signature compare circuit 314 does not perform the content string comparison. It will be appreciated that these conditional comparisons are merely exemplary, and that the signature compare circuit 314 could be configured such that a full comparison is not implemented unless the signature flow identifier field matches the data packet flow identifier and/or that the signature depth field matches the hit identified for the data packet. If the content string comparison is not performed, the first transfer of packet data from the packet memory 302 is not affected. Moreover, if the content string comparison fails in the initial fields of the first transfer, then subsequently accessed packet data bytes will be ignored, and as a result, no further transfer requests will occur.
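The conditional comparison described above, in which inexpensive field checks gate the content string comparison, may be sketched as follows. The types `Sig` and `Hit`, the function `compare`, and the `stats` counter are hypothetical; the order of the checks and the interpretation of the offset and depth as bounds on the hit position are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Sig:
    content: bytes
    offset: int     # first payload position at which a match may start
    depth: int      # assumed: position past which a match may not extend
    protocol: str


@dataclass(frozen=True)
class Hit:
    offset: int     # payload position reported with the hash hit
    length: int     # byte length reported with the hash hit
    protocol: str   # protocol of the current data packet


def compare(sig: Sig, hit: Hit, payload: bytes, stats: Dict[str, int]) -> bool:
    """Run cheap field checks first; read packet bytes only if they all pass."""
    if sig.protocol != hit.protocol:
        return False
    if len(sig.content) != hit.length:
        return False
    if not (sig.offset <= hit.offset and hit.offset + hit.length <= sig.depth):
        return False
    stats["content_compares"] += 1  # counts comparisons that touch packet data
    return payload[hit.offset:hit.offset + hit.length] == sig.content
```

The counter makes the saving visible: a hit whose byte length, range, or protocol disagrees with the signature returns without any content string comparison being counted.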
[0053] If a signature match is not found, the signature compare circuit 314 supplies a request to the signature request circuit 310 to fetch the next signature in the linear linked list. It is noted here that the location of the next signature in the linear linked list is indicated by an address increment field in each signature data structure. If the next signature in the linear linked list also does not match, then additional signatures in the linear linked list are similarly tested until each signature in the linear linked list has been tested. The end of the linear linked list is indicated, for example, if all of the bits in the address increment field have a "0" value.
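The traversal of a linear linked list by means of the address increment field may be sketched as follows. The signature memory is modeled as a Python dictionary keyed by address, and the names `walk_list`, `increment`, and `matches` are hypothetical. Stopping at the first matching signature is an assumption; the description states only that the next signature is fetched when no match is found.

```python
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, int]


def walk_list(memory: Dict[int, Record],
              start: int,
              matches: Callable[[Record], bool]) -> Tuple[Optional[int], List[int]]:
    """Follow address increments from start; an increment of 0 ends the list.

    Returns the address of the first matching record (or None) and the
    addresses that were tested, in order.
    """
    address = start
    tested: List[int] = []
    while True:
        record = memory[address]
        tested.append(address)
        if matches(record):
            return address, tested
        if record["increment"] == 0:   # all-zero field marks the end of the list
            return None, tested
        address += record["increment"]
```

For example, a list with records at addresses 0x100, 0x104, and 0x10C uses increments of 4, 8, and 0, respectively.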
[0054] The signature compare circuit 314 sends status data resulting from the comparisons it performs to the command and status circuit 316. If a signature match is found, then the signature compare circuit 314 supplies status data regarding the match to the command and status circuit 316. Such status data may be, for example, a 64-bit word indicating the byte offset at which the match occurred, the pattern identifier, and any relative minimum and maximum distances (if present). In the case of multiple content signature matches, relative minimum and maximum distances between strings are passed to the command and status circuit 316 for ordered checking.
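One possible packing of the 64-bit status word may be sketched as follows. The description names the contents of the word but not its layout, so the four 16-bit fields and their order are purely assumptions, as are the function names `pack_status` and `unpack_status`.

```python
from typing import Tuple

FIELD_MASK = 0xFFFF  # assumed width: four 16-bit fields in one 64-bit word


def pack_status(offset: int, pattern_id: int,
                min_dist: int = 0, max_dist: int = 0) -> int:
    """Pack match offset, pattern identifier, and relative distances."""
    for value in (offset, pattern_id, min_dist, max_dist):
        if not 0 <= value <= FIELD_MASK:
            raise ValueError("each field must fit in 16 bits")
    return (offset << 48) | (pattern_id << 32) | (min_dist << 16) | max_dist


def unpack_status(word: int) -> Tuple[int, int, int, int]:
    """Recover (offset, pattern_id, min_dist, max_dist) from a status word."""
    return ((word >> 48) & FIELD_MASK,
            (word >> 32) & FIELD_MASK,
            (word >> 16) & FIELD_MASK,
            word & FIELD_MASK)
```

A signature without relative distances would simply leave the two distance fields at zero under this layout.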
[0055] The command and status circuit 316 is in operable communication with the register bus 214 and the signature compare circuit 314. The command and status circuit 316 receives control and configuration information from the register bus 214. The command and status circuit 316 also supplies status data, which indicates, among other things, the pattern identifier of a matched signature, to the register bus 214 for subsequent communication to the fast path software. In turn, the fast path software preferably sends a list of pattern identifiers to an exception path, thereby avoiding the need for the slow path software to repeat a search for the signature. This latter functionality is discussed further below.
[0056] The command and status circuit 316 is coupled to the signature compare circuit 314 and, as was noted above, is configured to supply the signature compare circuit 314 with packet field data that is representative of the specific fields that the signature compare circuit 314 is to use when comparing the current data packet for a signature match. Thus, it will be appreciated that in addition to the above, the register bus 214 also preferably supplies the command and status circuit 316 with header-specific data for each packet. The command and status circuit 316 is also configured to receive the status data that is supplied from the signature compare circuit 314. Preferably, the command and status circuit 316 orders the status data based on absolute offsets and pattern identifiers. Moreover, in the case of multiple content signatures, the command and status circuit 316 checks whether all of the multiple content signatures are found with the correct relative minimum and maximum distances.
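The ordering of status data and the relative distance check for multiple content signatures may be sketched as follows. The report tuple layout, the names `order_reports` and `multi_content_ok`, and the convention that a distance is measured from the end of the previous content match to the start of the next are all assumptions. The sketch further assumes at most one report per content part.

```python
from typing import Dict, List, Tuple

# (absolute offset, pattern id, content part index, matched length); hypothetical
Report = Tuple[int, int, int, int]


def order_reports(reports: List[Report]) -> List[Report]:
    """Order status reports by absolute offset, then by pattern identifier."""
    return sorted(reports, key=lambda r: (r[0], r[1]))


def multi_content_ok(reports: List[Report], pattern_id: int,
                     limits: List[Tuple[int, int]]) -> bool:
    """Check that every content part was found at a permitted relative distance.

    limits[k] holds (min_dist, max_dist) for part k+1 relative to part k.
    """
    parts: Dict[int, Report] = {
        r[2]: r for r in order_reports(reports) if r[1] == pattern_id
    }
    if set(parts) != set(range(len(limits) + 1)):
        return False  # at least one content part was never reported
    for k, (min_dist, max_dist) in enumerate(limits):
        prev, cur = parts[k], parts[k + 1]
        gap = cur[0] - (prev[0] + prev[3])
        if not min_dist <= gap <= max_dist:
            return False
    return True
```

For example, if the first content of a pattern matches 4 bytes at offset 3 and the second content matches at offset 20, the gap is 13 bytes, which satisfies limits of 0 to 16 but not limits of 0 to 10.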
[0057] By the foregoing description, an improved system and method for monitoring network access have been described. The system and method improve the efficiency of, and/or reduce the number or extent of, signature or content comparisons that are used to implement network access or intrusion monitoring and detection. In addition, the NIDS 206 implements filtering that compares several header fields in the packet data to those in the signature, which reduces the number of false positive detections. The NIDS 206 is preferably implemented along with a stateful firewall and VPN, which enhances network traffic monitoring by searching data before encryption or after decryption, and which permits combining the five-tuple lookup database for all three functionalities. As a result, only one lookup needs to be performed for each data packet. The resulting connection table entry includes a flow identifier, which is used by the NIDS 206 to compare the data packet to only those signatures that are relevant to that packet's traffic flow, and for a VPN connection may additionally include a security association (if present).