US9385949B2 - Routing controlled by subnet managers - Google Patents

Routing controlled by subnet managers

Info

Publication number
US9385949B2
Authority
US
United States
Prior art keywords
subnet
router
routing
specific
routers
Prior art date
Legal status
Active, expires
Application number
US13/721,052
Other versions
US20140177639A1 (en)
Inventor
Ilya Vershkov
Dror Goldenberg
Eitan Zahavi
Diego Crupnicoff
Marina Lipshteyn
Current Assignee
Mellanox Technologies TLV Ltd
Original Assignee
Mellanox Technologies TLV Ltd
Priority date
Filing date
Publication date
Application filed by Mellanox Technologies TLV Ltd
Priority to US13/721,052
Assigned to MELLANOX TECHNOLOGIES LTD. Assignment of assignors interest (see document for details). Assignors: CRUPNICOFF, DIEGO; GOLDENBERG, DROR; ZAHAVI, EITAN; LIPSHTEYN, MARINA; VERSHKOV, ILYA
Assigned to MELLANOX TECHNOLOGIES TLV LTD. Assignment of assignors interest (see document for details). Assignors: MELLANOX TECHNOLOGIES LTD.
Publication of US20140177639A1
Assigned to JPMORGAN CHASE BANK, N.A., AS ADMINISTRATIVE AGENT. Patent security agreement. Assignors: MELLANOX TECHNOLOGIES TLV LTD.
Application granted
Publication of US9385949B2
Assigned to MELLANOX TECHNOLOGIES TLV LTD. Release of security interest in patent collateral at Reel/Frame No. 37898/0959. Assignors: JPMORGAN CHASE BANK, N.A., AS ADMINISTRATIVE AGENT

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/58 Association of routers
    • H04L45/16 Multipoint routing

Definitions

  • the present invention relates generally to computer networks, and particularly to routing data packets between subnets.
  • a subnetwork is a logical subdivision of a Layer-3 network.
  • Network ports of nodes within a given subnet share the same Layer-3 network address prefix.
  • IP Internet Protocol
  • the ports in each subnet share the same most-significant bit-group in their IP address, so that the IP address is logically divided into two fields: a network or routing prefix, and the rest field or host identifier.
  • IB InfiniBand™
  • each subnet is uniquely identified with a subnet identifier known as the Subnet Prefix.
  • this prefix is combined with a respective Globally-Unique Identifier (GUID) to give the IB Layer-3 address of the port, known as the Global Identifier (GID).
  • GUID Globally-Unique Identifier
  • the logical subdivision of a Layer-3 network into subnets reflects the underlying physical division of the network into Layer-2 local area networks.
  • the subnets are connected to one another by routers, which forward packets on the basis of their Layer-3 (IP or GID) destination addresses, while within a given subnet packets are forwarded among ports by Layer-2 switches or bridges.
  • These Layer-2 devices operate in accordance with the applicable Layer-2 protocol and forward packets within the subnet according to the Layer-2 destination address, such as the Ethernet™ medium access control (MAC) address or the IB link-layer Local Identifier (LID).
  • Layer-2 addresses in a given subnet are recognized only within that subnet, and routers will swap the Layer-2 address information of packets that they forward from one subnet to another.
  • a Subnet Manager (SM) in each subnet assigns an LID to each physical port of each host within the given subnet.
  • SA subnet administration
  • SMA Subnet Management Agent
  • Layer-2 switches within the subnet are configured by the SM to forward packets among the ports on the basis of the destination LID (D-LID) in the packet header.
  • D-LID destination LID
  • the SM is typically implemented as a software process running on a suitable computing platform in one of the nodes in the subnet, such as a host computer, switch or appliance.
  • Routing protocols are used to distribute routing information among routers, so as to enable each router to determine the port through which it should forward a packet having any given Layer-3 destination address.
  • OSPF Open Shortest Path First
  • BGP Border Gateway Protocol
  • Embodiments of the present invention provide improved methods and devices for routing packets between subnets.
  • a method for communication in a packet data network including at least first and second subnets interconnected by multiple routers and having respective first and second subnet managers.
  • the method includes assigning respective local identifiers to ports for addressing of data link traffic within each subnet, such that the first subnet manager assigns the local identifiers in the first subnet, and the second subnet manager assigns the local identifiers in the second subnet.
  • the routers are configured by transmitting and receiving control traffic between the subnet managers and the routers. Data packets are transmitted between network nodes in the first and second subnets via one or more of the configured routers under control of the subnet managers.
  • transmitting the data packets includes receiving at the first subnet manager a routing query from a sending node in the first subnet with respect to transmission of a packet to a destination node in the second subnet, and in response to the routing query, sending an instruction from the first subnet manager to the sending node to direct the packet to a specified router.
  • Sending the instruction may include selecting the specified router so as to balance a traffic load among the multiple routers.
  • sending the instruction includes instructing the sending node to direct the packet to a first router and upon occurrence of a failure of the first router, to direct the packet to a second router.
  • sending the instruction may include selecting the specified router as a numerical function of the address field.
  • the routing query specifies a global identifier of the destination node
  • sending the instruction includes instructing the sending node to address the packet to a local identifier that the subnet manager has assigned to a port of the specified router.
  • the method may include transmitting, from the sending node to a distributed name server, a name query with respect to a host name of the destination node, and receiving the global identifier at the sending node from the distributed name server in response to the name query.
  • transmitting the data traffic includes receiving at the second subnet manager a routing query from a router in response to having received at the router a packet from a sending node in the first subnet for transmission to a destination node in the second subnet, and in response to the routing query, sending an instruction from the second subnet manager to the router to direct the packet to a port having a specified local identifier in the second subnet.
  • transmitting the data packets includes receiving at the first subnet manager a routing query from a node in the network, sending an instruction, in response to the routing query, from the first subnet manager to the node to direct the packet to a specified port, and caching the instruction at the node for use in forwarding of subsequent packets.
  • configuring the routers includes forming a multicast group extending over at least the first and second subnets via one or more of the routers.
  • apparatus for communication including a plurality of routers interconnecting at least first and second subnets in a packet data network.
  • At least first and second subnet managers are operative to assign respective local identifiers to ports for addressing of data link traffic within each subnet, such that the first subnet manager assigns the local identifiers in the first subnet, and the second subnet manager assigns the local identifiers in the second subnet.
  • the subnet managers configure the routers by transmitting and receiving control traffic to and from the routers, and control transmission of data packets between network nodes in the first and second subnets via one or more of the configured routers.
  • a computer software product including a non-transitory computer-readable medium in which program instructions are stored, which instructions, when read by a computer in a first subnet of a packet data network that includes a plurality of routers interconnecting multiple subnets, cause the computer to function as a first subnet manager in the first subnet so as to assign respective local identifiers to ports for addressing of data link traffic within the first subnet, while at least a second subnet manager assigns the local identifiers to the ports in at least a second subnet.
  • the instructions cause at least the first and second subnet managers to configure the routers by transmitting and receiving control traffic to and from the routers, and to control transmission of data packets between network nodes in the first and second subnets via one or more of the configured routers.
  • FIG. 1 is a block diagram that schematically illustrates a computer network, in accordance with an embodiment of the present invention.
  • FIG. 2 is a flow chart that schematically illustrates a method for packet routing, in accordance with an embodiment of the present invention.
  • In order to support the routing protocols that have become standard in IP networks, IP routers must typically have substantial autonomous computing power, memory and communication capabilities. These sorts of routing protocols and capabilities have not been developed in Layer-3 routers for other network architectures, such as InfiniBand (IB) networks.
  • Embodiments of the present invention that are described hereinbelow provide methods and apparatus for routing packets between subnets that take advantage of management capabilities that already exist within the subnets and thus relieve routers of the need to support complex routing protocols. Such an approach is appropriate particularly for IB networks, in which the capabilities and responsibilities of the existing subnet manager can be expanded to manage inter-subnet routing, as well.
  • the embodiments described below therefore relate particularly to IB networks and use the vocabulary of IB specifications.
  • the principles of the present invention may also be applied, mutatis mutandis, in other network architectures that have a similar subnet management function.
  • a packet data network comprises at least two subnets, which have respective subnet managers and are interconnected by multiple routers.
  • the subnet managers assign local identifiers to the ports in their respective subnets for addressing of data-link (Layer 2) traffic within the subnet.
  • the subnet managers transmit and receive control traffic, typically in the form of management packets, in order to learn the network topology and configure the routers accordingly. Data packets can then be transmitted between network nodes in the first and second subnets via the routers so configured under control of the subnet managers.
  • Before a sending node in a first subnet transmits a data packet to a destination node in another subnet, the sending node submits a routing query to the subnet manager in the first subnet.
  • this sort of query may be referred to as a “path query.”
  • the subnet manager sends an instruction to the sending node to direct the data packet to a specified router that connects the subnets.
  • the subnet manager may take into account considerations such as load balancing among two or more routers, as well as other facets of route optimization and protection in case of router failure.
  • the router may then query the subnet manager in the destination subnet for forwarding instructions to the destination node, or multiple destination nodes in the case of a multicast packet.
  • the subnet manager in the first subnet may provide the router with complete path information in response to the initial routing query, so that no further query by the router will be required.
  • the above approach is advantageous, as noted earlier, in leveraging capabilities that already exist within the subnets. It can provide optimized performance and quality of service while avoiding any need for a central routing authority or global synchronization of routing information, and while having no single point of failure. As routing intelligence is focused in the subnet managers, the routers themselves need be little more than switches with forwarding information provided by the subnet managers. Exchange of routing information between routers themselves is unnecessary.
  • FIG. 1 is a block diagram that schematically illustrates a computer network 20, in accordance with an embodiment of the present invention. It will be assumed, for clarity and convenience of description, that network 20 operates in accordance with IB specifications, although as noted earlier, the principles of the present embodiment may similarly be applied in other Layer-3 networks that have a subnet management function similar to that defined in IB networks. Relevant features of the IB architecture are described in the InfiniBand™ Architecture Specification Volume 1 (Release 1.2.1, November 2007), distributed by the InfiniBand Trade Association and incorporated herein by reference, and particularly in Chapter 14: “Subnet Management” and Chapter 19: “Routers.”
  • Network 20 comprises multiple subnets 22 (labeled subnets A, B and C), which are interconnected by Layer-3 routers 24 (labeled R0, R1 and R2).
  • Each subnet 22 comprises multiple Layer-2 switches 26, which connect to hosts 28 via suitable host channel adapters (not shown).
  • Switches 26 within each subnet may be interconnected in any suitable topology, such as a “fat tree” topology. Certain of the switches (for example, spine switches in the case of a fat tree topology) connect to routers 24 and thus enable packet transfer between subnets.
  • any given pair of subnets 22 is separated by no more than a single routing hop, but the principles of the present invention may also be extended to networks in which traffic between certain subnets must traverse two or more routers in sequence.
  • each pair of subnets 22 is connected by two or more routers 24, for purposes of load balancing and failure protection.
  • a subnet manager (SM) 30 in each subnet 22 performs management and administration functions defined by the above-mentioned IB specification, as well as additional routing functions that are described herein. (Optionally, more than one subnet manager may exist in a given subnet to provide backup in case of failure, but typically only a single subnet manager is active in performing these functions at any given time.)
  • SM 30 is typically a combined hardware/software element, comprising a computing platform, such as an embedded or stand-alone central processing unit (CPU) with a memory and suitable interfaces, which runs management software that performs the functions described herein.
  • the computing platform may be dedicated to subnet management functions, or it may alternatively be shared with other computing and communication functions.
  • the software components of the SM may be downloaded to the computing platform in electronic form, for example over network 20 or via a separate control network (not shown). Alternatively or additionally, these software components may be stored on tangible, non-transitory computer-readable media, such as in optical, magnetic, or electronic memory.
  • SM 30 in each subnet 22 assigns a Layer-2 address, in the form of a LID, possibly including a multicast LID (MLID), to each port of each switch 26 and host 28 within the subnet.
  • Each port also receives a GID Layer-3 address, wherein all ports in a given subnet have the same GID prefix, as explained above.
  • Subnet managers 30 learn the topology of their respective subnets using methods defined by the IB specification, such as transmission and reception of suitable management packets, for example Direct Route Management Datagrams. By transmitting and receiving such packets to and from routers 24, the subnet managers are also able to learn which other subnets are connected to each router, as well as collecting information on other network features, such as multicast groups.
  • routers may autonomously publish their respective subnet connections to the subnet managers.
  • SM 30 in subnet A may discover, for example, that this subnet is connected by both router R1 and router R2 to subnet B.
  • the subnet managers save this intra- and inter-subnet topology information in their respective memories for use in making subsequent routing decisions, and update the information periodically when changes occur (due to failures or reconfiguration, for example).
  • DNS distributed name server
  • DNS 32 may be implemented by any suitable means that are known in the art, such as manual tables, standard DNS servers, or SM-based translations.
  • FIG. 2 is a flow chart that schematically illustrates a method for packet routing in network 20, in accordance with an embodiment of the present invention. It is assumed in the description that follows, for the sake of simplicity, that the packet in question is a unicast packet, but similar methods may be applied, mutatis mutandis, in routing multicast packets.
  • the method of FIG. 2 is initiated when one of the hosts in subnet A (referred to as the sending host, or S-HOST) has to send a packet to a destination host (D-HOST) in another subnet, for example subnet B.
  • the sending host may obtain the GID of the destination host from DNS 32, as described above, or by any other suitable means.
  • the sending host queries subnet manager 30 in subnet A (referred to as SM-A) for a path to the GID of the destination host, at a host query step 40.
  • SM-A checks its topology records to identify the router or routers 24 that can provide access to the destination GID. (Alternatively, if SM-A determines that the destination GID refers to a node in subnet A, then it may simply return the LID of that node to the sending host.) SM-A chooses an appropriate one of these routers 24, such as R1, and returns a response to the sending host containing the LID of the port of R1 on subnet A, at a host response step 42. When multiple routers are available for this purpose, the subnet manager may apply various considerations in choosing the response to return at step 42.
  • the subnet manager may choose different routers for different packets (based on the source and/or destination address, for instance) in order to balance the traffic load among the routers and thus optimize bandwidth availability. Additionally or alternatively, the subnet manager may give the sending host both primary and backup router LIDs, and instruct the sending host to direct the packet to the primary router first, or to the backup router in the event of a failure of the primary router.
  • For purposes of router selection at step 42, it may be useful for SM-A simply to take a numerical function of a destination address field specified in the query of step 40.
  • the subnet manager may choose the router by taking the destination GID (DGID) modulo the number of routers available.
  • DGID destination GID
  • each router will have a routing table whose size is on the order of 1/N (wherein N is the number of routers). This algorithm is useful in load balancing and scales readily with the numbers of hosts and routers that are supported.
  • After receiving instructions from the subnet manager, the sending host transmits a data packet containing the GID of the destination host in the destination GID (DGID) header field and the LID of the router port specified by the subnet manager in the destination LID (DLID) header field, at a packet transmission step 44.
  • the sending host inserts its own GID and LID in the appropriate source address fields of the packet.
  • the router identifies the destination GID as belonging to subnet B and therefore sends a routing query to subnet manager 30 in subnet B (SM-B) with respect to this GID, at a router query step 46.
  • SM-B checks its own memory for the LID of the destination host corresponding to the specified GID, and returns this LID to router R1, at a router response step 48. Based on this information, the router replaces the destination LID of the packet that it received from the source host with the LID provided by SM-B, and replaces the source LID with the router's own port LID on subnet B, and thus transmits the packet to the destination host, at a packet forwarding step 50.
  • the above flow may not necessarily be repeated every time a packet is to be transmitted to a given DGID; rather, the sending host and the router may cache the responses that they receive at steps 42 and 48, and then use this cached information in forwarding subsequent packets to the same DGID without querying the subnet managers each time.
  • the query responses and cached information may include not only GID/LID correspondence, but also other forwarding information, such as service levels.
  • SM-A may provide the necessary forwarding information not only to the sending host, but also to the router, in which case steps 46 and 48 may be unnecessary.
  • subnet managers 30 may form multicast groups extending over multiple subnets via routers 24 .
  • the subnet managers are capable of supporting dynamic groups, which may have multipath and/or asymmetrical packet distribution routes.
  • subnet managers 30 may send and receive queries via routers 24 to discover multicast groups that are supported in neighboring subnets.
  • a subnet manager may instruct a router to register a multicast group in which the router serves as the transit point between members in different subnets.
  • Routers 24 are programmed to support only loop-free topologies in this regard.
  • the topologies may be tree- or mesh-type and either uni- or bi-directional, and they may be shared among multiple multicast groups and subnets, or they may be specific to a given group and/or subnet.
  • The techniques by which subnet managers discover and distribute routing information may be used not only for exploring network connectivity, as described above (including multicast groups), but also for collecting other information regarding subnets 22 and the nodes that they contain, such as network maximum transfer units (MTU) and partition keys (PKEY), for example.
  • MTU network maximum transfer units
  • PKEY partition keys
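
The unicast flow of FIG. 2 (steps 40 to 50) can be pictured with a short Python sketch. It is only an illustration under stated assumptions: the names `SubnetManager`, `path_query`, `router_forward` and `gid`, the 64-bit prefix width, and every address value are hypothetical and are not taken from the patent or from any InfiniBand software.

```python
from dataclasses import dataclass, replace
from typing import Dict, List

PREFIX_SHIFT = 64  # assumed width of the GUID part of a GID


@dataclass(frozen=True)
class Packet:
    sgid: int  # source GID (Layer 3)
    dgid: int  # destination GID (Layer 3)
    slid: int  # source LID (Layer 2)
    dlid: int  # destination LID (Layer 2)


class SubnetManager:
    """Hypothetical SM holding the topology records it has learned."""

    def __init__(self, prefix: int, host_lids: Dict[int, int],
                 router_lids: Dict[int, List[int]]) -> None:
        self.prefix = prefix
        self.host_lids = host_lids      # GID -> LID of ports in this subnet
        self.router_lids = router_lids  # remote prefix -> router port LIDs here

    def path_query(self, dgid: int) -> int:
        """Return the LID to which a packet for dgid should be addressed."""
        dest_prefix = dgid >> PREFIX_SHIFT
        if dest_prefix == self.prefix:
            return self.host_lids[dgid]          # destination is local
        routers = self.router_lids[dest_prefix]  # routers reaching that subnet
        return routers[dgid % len(routers)]      # DGID modulo router count


def router_forward(pkt: Packet, dest_sm: SubnetManager, egress_lid: int) -> Packet:
    """Steps 46 to 50: query the destination SM, then swap the Layer-2 addresses."""
    dlid = dest_sm.path_query(pkt.dgid)
    return replace(pkt, slid=egress_lid, dlid=dlid)


def gid(prefix: int, guid: int) -> int:
    """Build a GID from a subnet prefix and a GUID (illustrative helper)."""
    return (prefix << PREFIX_SHIFT) | guid


# Hypothetical two-subnet setup: prefixes 0xA and 0xB, two routers between them.
s_host, d_host = gid(0xA, 1), gid(0xB, 2)
sm_a = SubnetManager(0xA, {s_host: 5}, {0xB: [0x11, 0x12]})
sm_b = SubnetManager(0xB, {d_host: 9}, {0xA: [0x21, 0x22]})

router_lid = sm_a.path_query(d_host)                             # steps 40 and 42
pkt = Packet(sgid=s_host, dgid=d_host, slid=5, dlid=router_lid)  # step 44
delivered = router_forward(pkt, sm_b, egress_lid=0x21)           # steps 46 to 50
```

In this sketch the GIDs travel unchanged end to end, while the LIDs are rewritten at the router, which mirrors the Layer-2 address swap described above.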

Abstract

A method for communication in a packet data network that includes at least first and second subnets interconnected by multiple routers and having respective first and second subnet managers. The method includes assigning respective local identifiers to ports for addressing of data link traffic within each subnet, such that the first subnet manager assigns the local identifiers in the first subnet, and the second subnet manager assigns the local identifiers in the second subnet. The routers are configured by transmitting and receiving control traffic between the subnet managers and the routers. Data packets are transmitted between network nodes in the first and second subnets via one or more of the configured routers under control of the subnet managers.

Description

FIELD OF THE INVENTION
The present invention relates generally to computer networks, and particularly to routing data packets between subnets.
BACKGROUND
A subnetwork, commonly referred to as a subnet, is a logical subdivision of a Layer-3 network. Network ports of nodes within a given subnet share the same Layer-3 network address prefix. For example, in Internet Protocol (IP) networks, the ports in each subnet share the same most-significant bit-group in their IP address, so that the IP address is logically divided into two fields: a network or routing prefix, and the rest field or host identifier. Similarly, in InfiniBand™ (IB) networks, each subnet is uniquely identified with a subnet identifier known as the Subnet Prefix. For each port in the subnet, this prefix is combined with a respective Globally-Unique Identifier (GUID) to give the IB Layer-3 address of the port, known as the Global Identifier (GID).
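
As an illustration of how a GID is composed from a Subnet Prefix and a GUID, the following Python sketch may be helpful. The 64-bit widths of the two fields follow the InfiniBand architecture specification rather than the text above, and the function names and example values are hypothetical.

```python
FIELD_BITS = 64                    # assumed width of both prefix and GUID
FIELD_MASK = (1 << FIELD_BITS) - 1


def make_gid(subnet_prefix: int, guid: int) -> int:
    """Combine a subnet prefix (upper bits) and a port GUID (lower bits)."""
    if not 0 <= subnet_prefix <= FIELD_MASK:
        raise ValueError("subnet prefix must fit in 64 bits")
    if not 0 <= guid <= FIELD_MASK:
        raise ValueError("GUID must fit in 64 bits")
    return (subnet_prefix << FIELD_BITS) | guid


def split_gid(gid: int) -> tuple:
    """Return (subnet_prefix, guid) for a 128-bit GID."""
    return gid >> FIELD_BITS, gid & FIELD_MASK


def same_subnet(gid_a: int, gid_b: int) -> bool:
    """Two ports are in the same subnet when their GID prefixes match."""
    return (gid_a >> FIELD_BITS) == (gid_b >> FIELD_BITS)


# Example values below are made up for illustration only.
example_gid = make_gid(0xFE80000000000000, 0x0002C90300001234)
```

A router, or a subnet manager answering a routing query, needs only the prefix half of the GID to decide whether the destination lies in the local subnet.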
Typically, the logical subdivision of a Layer-3 network into subnets reflects the underlying physical division of the network into Layer-2 local area networks. The subnets are connected to one another by routers, which forward packets on the basis of their Layer-3 (IP or GID) destination addresses, while within a given subnet packets are forwarded among ports by Layer-2 switches or bridges. These Layer-2 devices operate in accordance with the applicable Layer-2 protocol and forward packets within the subnet according to the Layer-2 destination address, such as the Ethernet™ medium access control (MAC) address or the IB link-layer Local Identifier (LID). In general, Layer-2 addresses in a given subnet are recognized only within that subnet, and routers will swap the Layer-2 address information of packets that they forward from one subnet to another.
In IB networks, a Subnet Manager (SM) in each subnet assigns an LID to each physical port of each host within the given subnet. A subnet administration (SA) function provides nodes with information gathered by the SM, including communication of the LID information to a Subnet Management Agent (SMA) in each node of the subnet. For simplicity and clarity in the description that follows, all of these subnet management and administration functions will be assumed to be carried out by the SM. Layer-2 switches within the subnet are configured by the SM to forward packets among the ports on the basis of the destination LID (D-LID) in the packet header. The SM is typically implemented as a software process running on a suitable computing platform in one of the nodes in the subnet, such as a host computer, switch or appliance.
Routing protocols are used to distribute routing information among routers, so as to enable each router to determine the port through which it should forward a packet having any given Layer-3 destination address. In IP networks, the routing information is developed and distributed by and among the routers themselves. A number of routing protocols are commonly used to exchange routing information among IP routers, such as Open Shortest Path First (OSPF) and the Border Gateway Protocol (BGP).
SUMMARY
Embodiments of the present invention provide improved methods and devices for routing packets between subnets.
There is therefore provided, in accordance with an embodiment of the present invention, a method for communication in a packet data network including at least first and second subnets interconnected by multiple routers and having respective first and second subnet managers. The method includes assigning respective local identifiers to ports for addressing of data link traffic within each subnet, such that the first subnet manager assigns the local identifiers in the first subnet, and the second subnet manager assigns the local identifiers in the second subnet. The routers are configured by transmitting and receiving control traffic between the subnet managers and the routers. Data packets are transmitted between network nodes in the first and second subnets via one or more of the configured routers under control of the subnet managers.
In some embodiments, transmitting the data packets includes receiving at the first subnet manager a routing query from a sending node in the first subnet with respect to transmission of a packet to a destination node in the second subnet, and in response to the routing query, sending an instruction from the first subnet manager to the sending node to direct the packet to a specified router. Sending the instruction may include selecting the specified router so as to balance a traffic load among the multiple routers. Additionally or alternatively, sending the instruction includes instructing the sending node to direct the packet to a first router and upon occurrence of a failure of the first router, to direct the packet to a second router. Further additionally or alternatively, when the routing query includes an address field of the destination node, sending the instruction may include selecting the specified router as a numerical function of the address field.
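
A minimal Python sketch of such a selection follows. It assumes that the numerical function is a simple modulo over the destination address and that the backup is the next router in the list; both choices, together with the names `PathInstruction`, `select_router` and `lid_to_use`, are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Set


@dataclass(frozen=True)
class PathInstruction:
    primary_lid: int            # LID of the router port to try first
    backup_lid: Optional[int]   # LID to use if the primary router fails


def select_router(dest_addr: int, router_lids: Sequence[int]) -> PathInstruction:
    """Subnet manager side: pick a primary and a backup router port."""
    if not router_lids:
        raise ValueError("no router reaches the destination subnet")
    n = len(router_lids)
    idx = dest_addr % n  # spreads destinations across the routers
    backup = router_lids[(idx + 1) % n] if n > 1 else None
    return PathInstruction(router_lids[idx], backup)


def lid_to_use(instr: PathInstruction, failed_lids: Set[int]) -> int:
    """Sending node side: fall back to the backup when the primary has failed."""
    if instr.primary_lid not in failed_lids:
        return instr.primary_lid
    if instr.backup_lid is not None and instr.backup_lid not in failed_lids:
        return instr.backup_lid
    raise RuntimeError("no usable router")
```

Because the choice depends only on the destination address, different destinations are spread over the available routers without any per-flow state in the subnet manager.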
In a disclosed embodiment, the routing query specifies a global identifier of the destination node, and sending the instruction includes instructing the sending node to address the packet to a local identifier that the subnet manager has assigned to a port of the specified router. The method may include transmitting, from the sending node to a distributed name server, a name query with respect to a host name of the destination node, and receiving the global identifier at the sending node from the distributed name server in response to the name query.
In some embodiments, transmitting the data traffic includes receiving at the second subnet manager a routing query from a router in response to having received at the router a packet from a sending node in the first subnet for transmission to a destination node in the second subnet, and in response to the routing query, sending an instruction from the second subnet manager to the router to direct the packet to a port having a specified local identifier in the second subnet.
Additionally or alternatively, transmitting the data packets includes receiving at the first subnet manager a routing query from a node in the network, sending an instruction, in response to the routing query, from the first subnet manager to the node to direct the packet to a specified port, and caching the instruction at the node for use in forwarding of subsequent packets.
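
The caching behaviour can be sketched in Python as below. The class `PathCache`, its `query_sm` callback and the `invalidate` method are hypothetical; in particular, the text does not say when cached instructions are discarded, so explicit invalidation is an assumption.

```python
from typing import Callable, Dict


class PathCache:
    """Node-side cache of subnet manager instructions, keyed by destination GID."""

    def __init__(self, query_sm: Callable[[int], int]) -> None:
        self._query_sm = query_sm        # sends a routing query, returns a LID
        self._cache: Dict[int, int] = {}
        self.queries_sent = 0            # counts queries actually sent to the SM

    def resolve(self, dgid: int) -> int:
        """Return the LID for dgid, querying the SM only on a cache miss."""
        if dgid not in self._cache:
            self.queries_sent += 1
            self._cache[dgid] = self._query_sm(dgid)
        return self._cache[dgid]

    def invalidate(self, dgid: int) -> None:
        """Drop a cached entry, for example after a router failure (assumed policy)."""
        self._cache.pop(dgid, None)
```

With such a cache, only the first packet to a given destination incurs a round trip to the subnet manager; later packets reuse the stored instruction.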
In a disclosed embodiment, configuring the routers includes forming a multicast group extending over at least the first and second subnets via one or more of the routers.
There is also provided, in accordance with an embodiment of the present invention, apparatus for communication, including a plurality of routers interconnecting at least first and second subnets in a packet data network. At least first and second subnet managers are operative to assign respective local identifiers to ports for addressing of data link traffic within each subnet, such that the first subnet manager assigns the local identifiers in the first subnet, and the second subnet manager assigns the local identifiers in the second subnet. The subnet managers configure the routers by transmitting and receiving control traffic to and from the routers, and control transmission of data packets between network nodes in the first and second subnets via one or more of the configured routers.
There is additionally provided, in accordance with an embodiment of the present invention, a computer software product, including a non-transitory computer-readable medium in which program instructions are stored, which instructions, when read by a computer in a first subnet of a packet data network that includes a plurality of routers interconnecting multiple subnets, cause the computer to function as a first subnet manager in the first subnet so as to assign respective local identifiers to ports for addressing of data link traffic within the first subnet, while at least a second subnet manager assigns the local identifiers to the ports in at least a second subnet. The instructions cause at least the first and second subnet managers to configure the routers by transmitting and receiving control traffic to and from the routers, and to control transmission of data packets between network nodes in the first and second subnets via one or more of the configured routers.
The present invention will be more fully understood from the following detailed description of the embodiments thereof, taken together with the drawings in which:
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram that schematically illustrates a computer network, in accordance with an embodiment of the present invention; and
FIG. 2 is a flow chart that schematically illustrates a method for packet routing, in accordance with an embodiment of the present invention.
DETAILED DESCRIPTION OF EMBODIMENTS
In order to support the routing protocols that have become standard in IP networks, IP routers must typically have substantial autonomous computing power, memory and communication capabilities. These sorts of routing protocols and capabilities have not been developed in Layer-3 routers for other network architectures, such as InfiniBand (IB) networks.
Embodiments of the present invention that are described hereinbelow provide methods and apparatus for routing packets between subnets that take advantage of management capabilities that already exist within the subnets and thus relieve routers of the need to support complex routing protocols. Such an approach is appropriate particularly for IB networks, in which the capabilities and responsibilities of the existing subnet manager can be expanded to manage inter-subnet routing, as well. The embodiments described below therefore relate particularly to IB networks and use the vocabulary of IB specifications. The principles of the present invention, however, may also be applied, mutatis mutandis, in other network architectures that have a similar subnet management function.
In the disclosed embodiments, a packet data network comprises at least two subnets, which have respective subnet managers and are interconnected by multiple routers. The subnet managers assign local identifiers to the ports in their respective subnets for addressing of data-link (Layer 2) traffic within the subnet. The subnet managers transmit and receive control traffic, typically in the form of management packets, in order to learn the network topology and configure the routers accordingly. Data packets can then be transmitted between network nodes in the first and second subnets via the routers so configured under control of the subnet managers.
According to this scheme, before a sending node in a first subnet transmits a data packet to a destination node in another subnet, the sending node submits a routing query to the subnet manager in the first subnet. (In the InfiniBand context, this sort of query may be referred to as a “path query.”) In response to the query, the subnet manager sends an instruction to the sending node to direct the data packet to a specified router that connects the subnets. In providing these instructions, the subnet manager may take into account considerations such as load balancing among two or more routers, as well as other facets of route optimization and protection in case of router failure. Upon receiving the data packet from the sending node, the router may then query the subnet manager in the destination subnet for forwarding instructions to the destination node, or multiple destination nodes in the case of a multicast packet. Alternatively, the subnet manager in the first subnet may provide the router with complete path information in response to the initial routing query, so that no further query by the router will be required.
The above approach is advantageous, as noted earlier, in leveraging capabilities that already exist within the subnets. It can provide optimized performance and quality of service while avoiding any need for a central routing authority or global synchronization of routing information, and while having no single point of failure. As routing intelligence is focused in the subnet managers, the routers themselves need be little more than switches with forwarding information provided by the subnet managers. Exchange of routing information between routers themselves is unnecessary.
FIG. 1 is a block diagram that schematically illustrates a computer network 20, in accordance with an embodiment of the present invention. It will be assumed, for clarity and convenience of description, that network 20 operates in accordance with IB specifications, although as noted earlier, the principles of the present embodiment may similarly be applied in other Layer-3 networks that have a subnet management function similar to that defined in IB networks. Relevant features of the IB architecture are described in the InfiniBand™ Architecture Specification Volume 1 (Release 1.2.1, November 2007), distributed by the InfiniBand Trade Association and incorporated herein by reference, and particularly in Chapter 14: “Subnet Management” and Chapter 19: “Routers.”
Network 20 comprises multiple subnets 22 (labeled subnets A, B and C), which are interconnected by Layer-3 routers 24 (labeled R0, R1 and R2). Each subnet 22 comprises multiple Layer-2 switches 26, which connect to hosts 28 via suitable host channel adapters (not shown). Switches 26 within each subnet may be interconnected in any suitable topology, such as a “fat tree” topology. Certain of the switches (for example, spine switches in the case of a fat tree topology) connect to routers 24 and thus enable packet transfer between subnets. In the pictured implementation, any given pair of subnets 22 is separated by no more than a single routing hop, but the principles of the present invention may also be extended to networks in which traffic between certain subnets must traverse two or more routers in sequence. Typically (although not necessarily), each pair of subnets 22 is connected by two or more routers 24, for purposes of load balancing and failure protection.
A subnet manager (SM) 30 in each subnet 22 performs management and administration functions defined by the above-mentioned IB specification, as well as additional routing functions that are described herein. (Optionally, more than one subnet manager may exist in a given subnet to provide backup in case of failure, but typically only a single subnet manager is active in performing these functions at any given time.) SM 30 is typically a combined hardware/software element, comprising a computing platform, such as an embedded or stand-alone central processing unit (CPU) with a memory and suitable interfaces, which runs management software that performs the functions described herein. The computing platform may be dedicated to subnet management functions, or it may alternatively be shared with other computing and communication functions. The software components of the SM may be downloaded to the computing platform in electronic form, for example over network 20 or via a separate control network (not shown). Alternatively or additionally, these software components may be stored on tangible, non-transitory computer-readable media, such as in optical, magnetic, or electronic memory.
SM 30 in each subnet 22 assigns a Layer-2 address, in the form of a LID, possibly including a multicast LID (MLID), to each port of each switch 26 and host 28 within the subnet. Each port also receives a GID Layer-3 address, wherein all ports in a given subnet have the same GID prefix, as explained above. Subnet managers 30 learn the topology of their respective subnets using methods defined by the IB specification, such as transmission and reception of suitable management packets, for example Direct Route Management Datagrams. By transmitting and receiving such packets to and from routers 24, the subnet managers are also able to learn which other subnets are connected to each router, as well as to collect information on other network features, such as multicast groups. Alternatively or additionally, routers may autonomously publish their respective subnet connections to the subnet managers. By such mechanisms, SM 30 in subnet A may discover, for example, that this subnet is connected by both router R1 and router R2 to subnet B. The subnet managers save this intra- and inter-subnet topology information in their respective memories for use in making subsequent routing decisions, and update the information periodically when changes occur (due to failures or reconfiguration, for example).
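By way of illustration only, the inter-subnet topology information saved by a subnet manager may be pictured as a map from each remote subnet's GID prefix to the routers that give access to it. The following Python sketch is not taken from the IB specification: the class name TopologyRecords, its methods, and the prefix and LID values are assumptions made for the sake of the example.

```python
from collections import defaultdict


class TopologyRecords:
    """Hypothetical store of the routers connecting the local subnet to remote subnets."""

    def __init__(self):
        # remote subnet GID prefix -> {router name -> LID of the router port on the local subnet}
        self._routers_by_prefix = defaultdict(dict)

    def record_router(self, router, local_port_lid, remote_prefixes):
        """Record (or refresh) the remote subnets reported as reachable through a router."""
        for prefix in remote_prefixes:
            self._routers_by_prefix[prefix][router] = local_port_lid

    def remove_router(self, router):
        """Drop a router from all records, for example after a failure is detected."""
        for routers in self._routers_by_prefix.values():
            routers.pop(router, None)

    def routers_to(self, prefix):
        """Return a sorted list of (router, LID) pairs that give access to a remote subnet."""
        return sorted(self._routers_by_prefix.get(prefix, {}).items())


# Example: SM-A learns that both R1 and R2 connect subnet A to subnet B.
SUBNET_B = 0xFEC0_0000_0000_000B  # made-up 64-bit GID prefix
records = TopologyRecords()
records.record_router("R1", 0x11, [SUBNET_B])
records.record_router("R2", 0x12, [SUBNET_B])
```

In this sketch, a call to remove_router corresponds to the periodic update of the saved information when a failure or reconfiguration occurs.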
In many network applications, processes and nodes are identified by names and/or numbers other than the LID and GID, such as a domain name, IP address or MAC address. Therefore, when a process running on one of hosts 28 needs to communicate with another node, possibly in a different subnet, the process may have only the name and not the GID needed to transmit a packet. To find the appropriate GID, the host may query a distributed name server (DNS) 32 at a predefined address in network 20. In response to a name query from one of hosts 28, DNS 32 returns the appropriate GID, which the host may then use as described below. DNS 32 may be implemented by any suitable means that are known in the art, such as manual tables, standard DNS servers, or SM-based translations.
FIG. 2 is a flow chart that schematically illustrates a method for packet routing in network 20, in accordance with an embodiment of the present invention. It is assumed in the description that follows, for the sake of simplicity, that the packet in question is a unicast packet, but similar methods may be applied, mutatis mutandis, in routing multicast packets.
The method of FIG. 2 is initiated when one of hosts 28 in subnet A (referred to as the sending host, or S-HOST) has to send a packet to a destination host (D-HOST) in another subnet, for example subnet B. The sending host may obtain the GID of the destination host from DNS 32, as described above, or by any other suitable means. To identify the LID in subnet A to which this packet should initially be sent, the sending host queries subnet manager 30 in subnet A (referred to as SM-A) for a path to the GID of the destination host, at a host query step 40.
In response to this query, SM-A checks its topology records to identify the router or routers 24 that can provide access to the destination GID. (Alternatively, if SM-A determines that the destination GID refers to a node in subnet A, then it may simply return the LID of that node to the sending host.) SM-A chooses an appropriate one of these routers 24, such as R1, and returns a response to the sending host containing the LID of the port of R1 on subnet A, at a host response step 42. When multiple routers are available for this purpose, the subnet manager may apply various considerations in choosing the response to return at step 42. For example, the subnet manager may choose different routers for different packets (based on the source and/or destination address, for instance) in order to balance the traffic load among the routers and thus optimize bandwidth availability. Additionally or alternatively, the subnet manager may give the sending host both primary and backup router LIDs, and instruct the sending host to direct the packet to the primary router first, or to the backup router in the event of a failure of the primary router.
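By way of illustration only, a response carrying both primary and backup router LIDs, and the way the sending host might use it, can be sketched in Python as follows. The names PathResponse, build_response and choose_dlid, and the manner in which a failure of the primary router is signaled to the sending host, are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class PathResponse:
    """Hypothetical reply to a routing query; both LIDs are on the sender's subnet."""
    primary_lid: int
    backup_lid: Optional[int] = None


def build_response(router_lids: Sequence[int], preferred: int = 0) -> PathResponse:
    """Pick a primary router port LID and, if another router exists, a backup LID."""
    if not router_lids:
        raise LookupError("no router provides access to the destination subnet")
    primary = router_lids[preferred % len(router_lids)]
    backup = None
    if len(router_lids) > 1:
        backup = router_lids[(preferred + 1) % len(router_lids)]
    return PathResponse(primary, backup)


def choose_dlid(response: PathResponse, primary_failed: bool) -> int:
    """Sender side: use the primary LID unless the primary router has failed."""
    if primary_failed and response.backup_lid is not None:
        return response.backup_lid
    return response.primary_lid
```

For example, build_response([0x11, 0x12]) yields a primary LID of 0x11 and a backup LID of 0x12, and choose_dlid returns 0x12 only when the failure flag is set.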
For purposes of router selection at step 42, it may be useful for SM-A simply to take a numerical function of a destination address field specified in the query of step 40. For example, the subnet manager may choose the router by taking the modulus of the destination GID (DGID) by the number of routers available. In this case, each router will have a routing table whose size is on the order of 1/N (wherein N is the number of routers). This algorithm is useful in load balancing and scales readily with the numbers of hosts and routers that are supported.
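By way of illustration only, the modulus-based selection can be sketched in Python as shown below. The function name select_router and the representation of the available routers as an ordered list of port LIDs are assumptions made for this example; the DGID is treated as an integer.

```python
from typing import Sequence


def select_router(dgid: int, router_lids: Sequence[int]) -> int:
    """Return the LID of the router port chosen as DGID modulo the number of routers."""
    if not router_lids:
        raise LookupError("no router provides access to the destination subnet")
    return router_lids[dgid % len(router_lids)]
```

Because a given DGID always maps to the same index, each of the N routers handles only those destinations whose DGID leaves its particular remainder, which is why each routing table is on the order of 1/N of the total.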
After receiving instructions from the subnet manager, the sending host transmits a data packet containing the GID of the destination host in the destination GID (DGID) header field and the LID of the router port specified by the subnet manager in the destination LID (DLID) header field, at a packet transmission step 44. The sending host inserts its own GID and LID in the appropriate source address fields of the packet. Upon receiving this packet, the router (R1 in this example) identifies the destination GID as belonging to subnet B and therefore sends a routing query to subnet manager 30 in subnet B (SM-B) with respect to this GID, at a router query step 46. SM-B checks its own memory for the LID of the destination host corresponding to the specified GID, and returns this LID to router R1, at a router response step 48. Based on this information, the router replaces the destination LID of the packet that it received from the source host with the LID provided by SM-B, and replaces the source LID with the router's own port LID on subnet B, and thus transmits the packet to the destination host, at a packet forwarding step 50.
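By way of illustration only, the address handling at steps 44 through 50 can be sketched in Python as follows. The Packet class, the helper names, and all address values are assumptions made for this example; the sketch further assumes that a GID is a 128-bit value whose upper 64 bits hold the subnet prefix, and it models the query to SM-B as a plain function call.

```python
from dataclasses import dataclass, replace
from typing import Callable

GUID_BITS = 64  # assumed layout: 64-bit subnet prefix followed by a 64-bit port identifier


@dataclass(frozen=True)
class Packet:
    sgid: int  # source GID (Layer 3), unchanged by the router
    dgid: int  # destination GID (Layer 3), unchanged by the router
    slid: int  # source LID (Layer 2), rewritten at each subnet boundary
    dlid: int  # destination LID (Layer 2), rewritten at each subnet boundary


def make_gid(prefix: int, guid: int) -> int:
    """Combine a subnet prefix and a port identifier into a GID."""
    return (prefix << GUID_BITS) | guid


def subnet_prefix(gid: int) -> int:
    """Extract the subnet prefix shared by all ports in a subnet."""
    return gid >> GUID_BITS


def forward(packet: Packet, router_port_lid: int,
            query_destination_sm: Callable[[int], int]) -> Packet:
    """Steps 46-50: look up the destination LID and rewrite both LID fields."""
    dest_lid = query_destination_sm(packet.dgid)  # router query and response
    return replace(packet, slid=router_port_lid, dlid=dest_lid)


# Example: S-HOST in subnet A sends to D-HOST in subnet B via R1.
SUBNET_A, SUBNET_B = 0xA, 0xB                 # made-up prefixes
d_host_gid = make_gid(SUBNET_B, 0x2222)
sm_b_table = {d_host_gid: 0x45}               # stand-in for SM-B's memory
sent = Packet(sgid=make_gid(SUBNET_A, 0x1111), dgid=d_host_gid, slid=0x07, dlid=0x11)
delivered = forward(sent, router_port_lid=0x21, query_destination_sm=sm_b_table.__getitem__)
```

Only the Layer-2 fields change in this sketch; the GID fields inserted by the sending host pass through the router untouched.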
The above flow need not be repeated every time a packet is to be transmitted to a given DGID; rather, the sending host and the router may cache the responses that they receive at steps 42 and 48, and then use this cached information in forwarding subsequent packets to the same DGID without querying the subnet managers each time. The query responses and cached information may include not only GID/LID correspondence, but also other forwarding information, such as service levels. Furthermore, as noted earlier, SM-A may provide the necessary forwarding information not only to the sending host, but also to the router, in which case steps 46 and 48 may be unnecessary.
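By way of illustration only, the caching of query responses at the sending host or the router can be sketched in Python as follows. The names PathCache and ForwardingInfo are assumptions made for this example, and the invalidate method is an addition of the sketch that the above description does not specify.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class ForwardingInfo:
    """Hypothetical cached response: a LID plus other forwarding information."""
    dlid: int
    service_level: int = 0


class PathCache:
    """Caches subnet manager responses per DGID so that repeat packets skip the query."""

    def __init__(self, query_sm: Callable[[int], ForwardingInfo]):
        self._query_sm = query_sm  # stand-in for the routing query to the subnet manager
        self._entries: Dict[int, ForwardingInfo] = {}

    def lookup(self, dgid: int) -> ForwardingInfo:
        """Return cached forwarding information, querying the subnet manager on a miss."""
        if dgid not in self._entries:
            self._entries[dgid] = self._query_sm(dgid)
        return self._entries[dgid]

    def invalidate(self, dgid: int) -> None:
        """Forget an entry (assumed mechanism, for example after a router failure)."""
        self._entries.pop(dgid, None)
```

With such a cache, only the first packet to a given DGID incurs a query; later packets to the same DGID reuse the stored LID and service level.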
As noted earlier, although the examples presented above relate mainly to routing of unicast packets, the principles of the present invention and the capabilities of the subnet managers that are described above may similarly be applied to multicast routing. In this case, subnet managers 30 may form multicast groups extending over multiple subnets via routers 24. In contrast to IP routers that are known in the art, the subnet managers are capable of supporting dynamic groups, which may have multipath and/or asymmetrical packet distribution routes.
To set up multi-subnet multicast groups, subnet managers 30 may send and receive queries via routers 24 to discover multicast groups that are supported in neighboring subnets. A subnet manager may instruct a router to register a multicast group in which the router serves as the transit point between members in different subnets. Routers 24 are programmed to support only loop-free topologies in this regard. The topologies may be tree- or mesh-type and either uni- or bi-directional, and they may be shared among multiple multicast groups and subnets, or they may be specific to a given group and/or subnet.
Furthermore, the mechanisms by which subnet managers discover and distribute routing information may be used not only for exploring network connectivity, as described above (including multicast groups), but also for collecting other information regarding subnets 22 and the nodes that they contain, such as network maximum transfer units (MTU) and partition keys (PKEY), for example.
It will be appreciated that the embodiments described above are cited by way of example, and that the present invention is not limited to what has been particularly shown and described hereinabove. Rather, the scope of the present invention includes both combinations and subcombinations of the various features described hereinabove, as well as variations and modifications thereof which would occur to persons skilled in the art upon reading the foregoing description and which are not disclosed in the prior art.

Claims (22)

The invention claimed is:
1. A method for communication, comprising:
in a packet data network comprising at least first and second subnets interconnected by multiple routers and having respective first and second subnet managers, assigning respective local identifiers to ports for addressing of data link traffic within each subnet, such that the first subnet manager assigns the local identifiers in the first subnet, and the second subnet manager assigns the local identifiers in the second subnet;
receiving by the first subnet manager, from a sending node in the first subnet, a routing query for a specific data packet directed to a destination node in the second subnet;
sending by the first subnet manager to the sending node, a routing instruction as to a specific router to which the specific data packet is to be transmitted, responsive to the routing query; and
providing the specific router, by one of the at least first and second subnet managers, a forwarding instruction for the specific data packet.
2. The method according to claim 1, wherein sending the routing instruction comprises selecting the specific router so as to balance a traffic load among the multiple routers.
3. The method according to claim 1, wherein sending the routing instruction comprises instructing the sending node to direct the specific data packet to a first router and upon occurrence of a failure of the first router, to direct the specific data packet to a second router.
4. The method according to claim 1, wherein the routing query comprises an address field of the destination node, and wherein sending the routing instruction comprises selecting the specific router as a numerical function of the address field.
5. The method according to claim 1, wherein the routing query specifies a global identifier of the destination node, and wherein sending the routing instruction comprises instructing the sending node to address the specific data packet to a local identifier that the subnet manager has assigned to a port of the specific router.
6. The method according to claim 5, and comprising transmitting, from the sending node to a distributed name server, a name query with respect to a host name of the destination node, and receiving the global identifier at the sending node from the distributed name server in response to the name query.
7. The method according to claim 1, and further comprising configuring the routers to form a multicast group extending over at least the first and second subnets via one or more of the routers.
8. The method according to claim 1, wherein providing the forwarding instruction comprises providing the forwarding instruction by the second subnet manager in response to a query from the specific router generated responsively to receiving the specific data packet.
9. The method according to claim 1, wherein providing the forwarding instruction comprises providing the forwarding instruction by the first subnet manager in response to the routing query.
10. The method according to claim 1, and further comprising caching the forwarding instruction at the sending node for use in forwarding subsequent data packets to a destination of the specific packet.
11. Apparatus for communication, comprising:
a plurality of routers interconnecting at least first and second subnets in a packet data network; and
at least first and second subnet managers, which are operative to assign respective local identifiers to ports for addressing of data link traffic within each subnet, such that the first subnet manager assigns the local identifiers in the first subnet, and the second subnet manager assigns the local identifiers in the second subnet, wherein the first subnet manager is configured to receive from a sending node in the first subnet, a routing query for a specific data packet directed to a destination node in the second subnet, and to send to the sending node, a routing instruction as to a specific router to which the specific data packet is to be transmitted, responsive to the routing query, and wherein one of the at least first and second subnet managers is configured to provide to the specific router a forwarding instruction for the specific data packet, responsively to the routing query.
12. The apparatus according to claim 11, wherein the first subnet manager is configured to select the specific router so as to balance a traffic load among the plurality of the routers.
13. The apparatus according to claim 11, wherein the first subnet manager is configured to instruct the sending node to direct the specific data packet to a first router and upon occurrence of a failure of the first router, to direct the specific data packet to a second router.
14. The apparatus according to claim 11, wherein the routing query comprises an address field of the destination node, and wherein the first subnet manager is configured to select the specific router as a numerical function of the address field.
15. The apparatus according to claim 11, wherein the routing query specifies a global identifier of the destination node, and wherein the routing instruction to the sending node specifies a local identifier that the subnet manager has assigned to a port of the specific router.
16. The apparatus according to claim 15, and comprising a distributed name server, which is configured to receive from the sending node a name query with respect to a host name of the destination node, and to provide the global identifier of the destination node to the sending node in response to the name query.
17. The apparatus according to claim 11, wherein the subnet managers are operative to configure the routers to form a multicast group extending over at least the first and second subnets via one or more of the routers.
18. The apparatus according to claim 11, wherein the second subnet manager is configured to provide the forwarding instruction to the specific router in response to a query from the specific router generated responsively to receiving the specific data packet.
19. The apparatus according to claim 11, wherein the first subnet manager is configured to provide the forwarding instruction to the specific router in response to the routing query.
20. The apparatus according to claim 11, wherein the sending node is configured to cache the forwarding instruction at the sending node for use in forwarding subsequent data packets to a destination of the specific packet.
21. The apparatus according to claim 11, wherein the multiple routers are configured to operate such that they do not exchange routing information.
22. A computer software product, comprising a non-transitory computer-readable medium in which program instructions are stored, which instructions, when read by a computer in a first subnet of a packet data network that includes a plurality of routers interconnecting multiple subnets, cause the computer to function as a first subnet manager in the first subnet so as to assign respective local identifiers to ports for addressing of data link traffic within the first subnet, while at least a second subnet manager assigns the local identifiers to the ports in at least a second subnet,
wherein the program instructions cause the first subnet manager to be configured to receive from a sending node in the first subnet, a routing query for a specific data packet directed to a destination node in the second subnet, and to send to the sending node, a routing instruction as to a specific router to which the specific data packet is to be transmitted, responsive to the routing query, and cause one of the at least first and second subnet managers to be configured to provide to the specific router a forwarding instruction for the specific data packet, responsively to the routing query.
US13/721,052 2012-12-20 2012-12-20 Routing controlled by subnet managers Active 2034-05-22 US9385949B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US13/721,052 US9385949B2 (en) 2012-12-20 2012-12-20 Routing controlled by subnet managers

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US13/721,052 US9385949B2 (en) 2012-12-20 2012-12-20 Routing controlled by subnet managers

Publications (2)

Publication Number Publication Date
US20140177639A1 US20140177639A1 (en) 2014-06-26
US9385949B2 true US9385949B2 (en) 2016-07-05

Family

ID=50974607

Family Applications (1)

Application Number Title Priority Date Filing Date
US13/721,052 Active 2034-05-22 US9385949B2 (en) 2012-12-20 2012-12-20 Routing controlled by subnet managers

Country Status (1)

Country Link
US (1) US9385949B2 (en)



US20120063466A1 (en) 2010-09-10 2012-03-15 Futurewei Technologies, Inc. Specifying Priority On a Virtual Station Interface Discovery and Configuration Protocol Response
US20120093023A1 (en) * 2009-07-02 2012-04-19 Bull Sas Methods and devices for evaluating interconnection efficiency of parallel computer networks based upon static routing schemes
US8175094B2 (en) 2007-04-06 2012-05-08 International Business Machines Corporation Method and system for personalizing a multimedia program broadcasted through IP network
US8243745B2 (en) 2009-02-27 2012-08-14 Hitachi, Ltd. Buffer management method and packet communication apparatus
US20120275301A1 (en) 2011-04-29 2012-11-01 Futurewei Technologies, Inc. Port and Priority Based Flow Control Mechanism for Lossless Ethernet
US20130182704A1 (en) 2001-10-16 2013-07-18 Cisco Technology, Inc. Prioritization and preemption of data frames over a switching fabric
US20130301646A1 (en) * 2012-05-11 2013-11-14 Oracle International Corporation System and method for routing traffic between distinct infiniband subnets based on fat-tree routing

Non-Patent Citations (12)

* Cited by examiner, † Cited by third party
Title
Almog et al., U.S. Appl. No. 13/717,733, filed Dec. 18, 2012.
Almog et al., U.S. Appl. No. 13/754,912, filed Jan. 31, 2013.
Annex 31B of IEEE802.3x, "MAC Control Pause operation", pp. 741-751, year 2008.
Ayoub et al., U.S. Appl. No. 13/731,030, filed Dec. 30, 2012.
Cisco Systems, "Priority Flow Control: Build Reliable Layer 2 Infrastructure", 8 pages, Jun. 2009.
IEEE 802.1Q, "Media Access Control (MAC) Bridges and Virtual Bridged Local Area Networks", IEEE Standard for Local and metropolitan area networks, chapter 6 (pp. 47-96) and chapter 9.6 (pp. 150-151), Aug. 31, 2011.
IEEE 802.1Qbb, "Media Access Control (MAC) Bridges and Virtual Bridged Local Area Networks-Amendment 17: Priority-based Flow Control", IEEE Standard for Local and metropolitan area networks, 40 pages, Sep. 30, 2011.
InfiniBand Trade Association, "Architecture Specification", vol. 1, Release 1.2.1, Nov. 2007.
Nichols et al., "Definition of the Differentiated Services Field (DS Field) in the IPv4 and IPv6 Headers", Network Working Group, Internet Engineering Task Force, RFC 2474, 19 pages, Dec. 1998.
U.S. Appl. No. 13/717,733 Office Action dated Jun. 12, 2014.
U.S. Appl. No. 13/754,912 Office Action dated Apr. 22, 2015.
U.S. Appl. No. 13/754,912 Office Action dated Oct. 23, 2014.

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9888010B2 (en) 2014-09-09 2018-02-06 Oracle International Corporation System and method for providing an integrated firewall for secure network communication in a multi-tenant environment
US20190173786A1 (en) * 2016-01-27 2019-06-06 Oracle International Corporation System and method for supporting resource quotas for intra and inter subnet multicast membership in a high performance computing environment
WO2019203414A1 (en) * 2018-04-19 2019-10-24 엘지전자 주식회사 Method for transmitting multicast frame in wireless lan system including plurality of subnets and wireless terminal using same
US11005724B1 (en) 2019-01-06 2021-05-11 Mellanox Technologies, Ltd. Network topology having minimal number of long connections among groups of network elements
US11575594B2 (en) 2020-09-10 2023-02-07 Mellanox Technologies, Ltd. Deadlock-free rerouting for resolving local link failures using detour paths
EP3989513A1 (en) 2020-10-26 2022-04-27 Mellanox Technologies, Ltd. Routing across multiple subnetworks using address mapping
US11411911B2 (en) 2020-10-26 2022-08-09 Mellanox Technologies, Ltd. Routing across multiple subnetworks using address mapping
US11870682B2 (en) 2021-06-22 2024-01-09 Mellanox Technologies, Ltd. Deadlock-free local rerouting for handling multiple local link failures in hierarchical network topologies
US11765103B2 (en) 2021-12-01 2023-09-19 Mellanox Technologies, Ltd. Large-scale network with high port utilization

Also Published As

Publication number Publication date
US20140177639A1 (en) 2014-06-26

Similar Documents

Publication Publication Date Title
US9385949B2 (en) Routing controlled by subnet managers
US6397260B1 (en) Automatic load sharing for network routers
EP2109962B1 (en) Triple-tier anycast addressing
US9253140B2 (en) System and method for optimizing within subnet communication in a network environment
US11038834B2 (en) Selecting an external link of a plurality of external links
EP2748992B1 (en) Method for managing network hardware address requests with a controller
EP2907279B1 (en) Ensuring any-to-any reachability with opportunistic layer 3 forwarding in massive scale data center environments
US10333793B2 (en) Network fabric topology expansion and self-healing devices
WO2014161408A1 (en) Internet protocol address resolution
EP2584742B1 (en) Method and switch for sending packet
US11736393B2 (en) Leveraging multicast listener discovery for discovering hosts
US11290497B2 (en) Access-control list generation for security policies
Scott et al. Addressing the Scalability of Ethernet with MOOSE
CN104734930B (en) Method and device for realizing access of Virtual Local Area Network (VLAN) to Variable Frequency (VF) network and Fiber Channel Frequency (FCF)
CN114338512A (en) MLAG link fault switching method and device
EP3989513A1 (en) Routing across multiple subnetworks using address mapping
US8732335B2 (en) Device communications over unnumbered interfaces
KR101786616B1 (en) Method, apparatus and computer program for subnetting of software defined network
KR102211282B1 (en) Methods of data routing and a switch thereof
CN111064818B (en) Configuration method and device
KR20050054003A (en) System and method for switching data between virtual local area networks included in same ip subnet
US9521065B1 (en) Enhanced VLAN naming
Liu Efficient Data Switching in Large Ethernet Networks using VLANs
KR20150052773A (en) Communication method in software defined network using hierarchical structure and system thereof

Legal Events

Date Code Title Description
AS Assignment

Owner name: MELLANOX TECHNOLOGIES LTD., ISRAEL

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:VERSHKOV, ILYA;GOLDENBERG, DROR;ZAHAVI, EITAN;AND OTHERS;SIGNING DATES FROM 20121216 TO 20121218;REEL/FRAME:029505/0737

AS Assignment

Owner name: MELLANOX TECHNOLOGIES TLV LTD., ISRAEL

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:MELLANOX TECHNOLOGIES LTD.;REEL/FRAME:030138/0225

Effective date: 20130129

FEPP Fee payment procedure

Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

AS Assignment

Owner name: JPMORGAN CHASE BANK, N.A., AS ADMINISTRATIVE AGENT

Free format text: PATENT SECURITY AGREEMENT;ASSIGNOR:MELLANOX TECHNOLOGIES TLV LTD.;REEL/FRAME:037898/0959

Effective date: 20160222

STCF Information on status: patent grant

Free format text: PATENTED CASE

AS Assignment

Owner name: MELLANOX TECHNOLOGIES TLV LTD., ISRAEL

Free format text: RELEASE OF SECURITY INTEREST IN PATENT COLLATERAL AT REEL/FRAME NO. 37898/0959;ASSIGNOR:JPMORGAN CHASE BANK, N.A., AS ADMINISTRATIVE AGENT;REEL/FRAME:046542/0699

Effective date: 20180709

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 4

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 8