WO2015035891A1 - Patching method, device, and system - Google Patents


Publication number
WO2015035891A1
Authority
WO
WIPO (PCT)
Prior art keywords
patch
error message
directory information
protocol
data structure
Prior art date
Application number
PCT/CN2014/086042
Other languages
French (fr)
Chinese (zh)
Inventor
王工艺
李涛
李生
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2015035891A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/06 Addressing a physical block of locations, e.g. base addressing, module addressing, memory dedication
    • G06F12/0646 Configuration or reconfiguration

Definitions

  • the present application relates to the field of storage, and in particular to a patch method, device and system.
  • A cache-coherent Non-Uniform Memory Access (CC-NUMA) multiprocessor system includes a plurality of CPUs 110 and a plurality of node controllers (NC) 120.
  • the node controller 120 and the CPU 110 are topologically connected, and the node controller 120 coordinates the work between the plurality of CPUs 110.
  • Each CPU 110 has at least one private cache (Cache), and all CPUs share a main memory.
  • The CPUs 110 each load data into their own private caches and modify it according to their own needs. When one CPU 110 modifies a piece of data, the other CPUs 110 do not know that it has been modified and continue to work on the original data. This causes conflicts and leads to the Cache consistency problem.
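  • The consistency problem described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the `Cpu` class, the `main_memory` dictionary, and the address `0x100` are hypothetical names chosen for illustration and do not appear in the application.

```python
# Toy model: two CPUs with private caches and one shared main memory.
# All names here are illustrative assumptions, not part of the application.
main_memory = {0x100: 1}


class Cpu:
    def __init__(self, name):
        self.name = name
        self.cache = {}  # private cache: address -> value

    def load(self, addr):
        # Fill the private cache from main memory on the first access only.
        if addr not in self.cache:
            self.cache[addr] = main_memory[addr]
        return self.cache[addr]

    def store(self, addr, value):
        # Write only to the private cache; no other CPU is notified.
        self.cache[addr] = value


cpu_a, cpu_b = Cpu("A"), Cpu("B")
cpu_a.load(0x100)
cpu_b.load(0x100)
cpu_a.store(0x100, 2)        # CPU A modifies its private copy
stale = cpu_b.load(0x100)    # CPU B still sees the original value
```

  • Without a coherency protocol, `cpu_b` keeps working on its original copy after `cpu_a` has modified the same address, which is the conflict the node controllers are meant to prevent.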
  • the prior art provides a Cache coherency protocol, and at least one protocol processing engine is set in each node controller 120.
  • The protocol processing engine looks up the corresponding protocol table according to the type of the message and the directory state recorded in its directory information. The protocol table records the processing mode of the current message, the operation mode of the directory information, and whether a conflict occurs. If a conflict occurs, the protocol processing engine processes the current message according to the processing mode of the current message and the operation mode of the directory information, and obtains a new message and updated directory information, thereby avoiding the Cache consistency problem.
  • the application provides a protocol processing engine fault tolerance processing method, device and system, which can ensure the normal operation of the system when the protocol is in error.
  • A first aspect of the present application provides a patching method. If a protocol processing engine in a node controller encounters an error while executing the Cache coherency protocol, it suspends the error message and continues to process subsequent messages whose addresses differ from that of the error message. An added patch module receives the lookup table data structure, the directory information, and the error message sent by the node controller, and executes a patch according to the lookup table data structure, the directory information, and the error message.
  • The patch processes the error message in place of the protocol processing engine to generate a new message and corresponding new directory information.
  • the error message includes a current error message and a message having the same address as the current error message.
  • the added patch module is configured to perform a patch procedure according to the table lookup data structure, the directory information, and the error message.
  • the method further includes: the added patch module receives the lookup table data structure, the directory information, the error message, and the identifier of the protocol table that needs to be queried sent by the node controller.
  • A second aspect of the present application provides a node controller including a patch module, where the patch module is coupled to a protocol processing engine module in the node controller and includes a receiving unit and an executing unit.
  • The receiving unit is configured to receive the lookup table data structure, the directory information, and the error message sent by the node controller when the protocol processing engine module in the node controller encounters an error while executing the Cache coherency protocol, suspends the error message, and continues to process subsequent messages whose addresses differ from that of the error message.
  • The receiving unit sends the lookup table data structure, the directory information, and the error message to the executing unit.
  • The executing unit is configured to receive the lookup table data structure, the directory information, and the error message, and to execute a patch according to them, where the patch processes the error message in place of the protocol processing engine module to generate a new message and corresponding new directory information.
  • the error message includes a current error message and a message having the same address as the current error message.
  • The receiving unit is further configured to receive the lookup table data structure, the directory information, the error message, and the identifier of the protocol table that needs to be queried, all sent by the node controller.
  • A third aspect of the present application provides a patch system including a patch module and a node controller, where the patch module is coupled to a protocol processing engine module in the node controller. The node controller is configured to, when a protocol processing engine encounters an error while executing the Cache coherency protocol, suspend the error message and continue to process subsequent messages whose addresses differ from that of the error message.
  • The patch module is configured to receive the lookup table data structure, the directory information, and the error message sent by the node controller, and to execute a patch according to the lookup table data structure, the directory information, and the error message, where the patch processes the error message in place of the protocol processing engine to generate a new message and corresponding new directory information.
  • the error message includes a current error message and a message having the same address as the current error message.
  • The patch module is further configured to receive the lookup table data structure, the directory information, the error message, and the identifier of the protocol table that needs to be queried, all sent by the node controller.
  • The patch module includes an input random access memory, a patch processor, and an output random access memory. One end of the input random access memory is coupled to one end of the protocol processing engine module in the node controller, and the other end is coupled to one end of the patch processor. The other end of the patch processor is coupled to one end of the output random access memory, and the other end of the output random access memory is coupled to the other end of the protocol processing engine in the node controller. The input random access memory is used to store the lookup table data structure, the directory information, and the error message sent by the node controller.
  • The patch processor is configured to execute a patch according to the lookup table data structure, the directory information, and the error message, where the patch processes the error message in place of the protocol processing engine to generate a new message and corresponding new directory information. The output random access memory is used to store the new message and the new directory information.
  • The number of input random access memories and the number of output random access memories are each the same as the number of protocol processing engines. One end of each input random access memory is coupled to one end of one protocol processing engine, and the other ends of all the input random access memories are coupled to the patch processor. The other end of the patch processor is coupled to one end of every output random access memory, and the other end of each output random access memory is coupled to the other end of one protocol processing engine module.
  • The above solution adds a patch module.
  • When the protocol processing engine encounters a protocol error, it suspends the error message and continues to process subsequent messages whose addresses differ from that of the error message, while the error messages are processed by the patch module. This ensures that the system still works properly in the event of a protocol error.
  • FIG. 1 is a schematic structural view of an embodiment of a prior art multiprocessing system
  • FIG. 2 is a schematic structural diagram of an embodiment of a patch system of the present application.
  • FIG. 3 is a flowchart of an embodiment of a patching method of the present application.
  • FIG. 4 is a schematic structural diagram of an embodiment of a node controller according to the present application.
  • FIG. 5 is a schematic structural diagram of another embodiment of a node controller according to the present application.
  • FIG. 2 is a schematic structural diagram of an embodiment of a patch system of the present application.
  • the patch system of this embodiment includes a patch module 210 and a node controller 220.
  • the patch module 210 and the node controller 220 collectively schedule a plurality of CPUs.
  • the patch module 210 includes a plurality of input random access memories 211, a patch processor 213, and a plurality of output random access memories 215.
  • the node controller 220 includes a plurality of protocol processing engines 221 and a plurality of protocol table (P-Table) memories 223. Moreover, the numbers of the input random access memory 211, the output random access memory 215, the protocol processing engine 221, and the protocol table memory 223 are all equal.
  • One end of each input random access memory 211 is coupled to one end of one protocol processing engine 221.
  • The other end of each input random access memory 211 is coupled to the patch processor 213.
  • The other end of the patch processor 213 is coupled to one end of all the output random access memories 215 at the same time, and the other end of each output random access memory 215 is coupled to the other end of one protocol processing engine 221.
  • Each protocol processing engine 221 is coupled to a protocol table memory 223.
  • The protocol processing engine 221 reads the protocol in the protocol table memory 223 according to the type of the message and the current directory state.
  • The protocol in the protocol table memory 223 includes the processing mode of the message, how the directory operates, how internal entries operate, and whether the protocol is in error. If the protocol design is incomplete, processing a message according to the protocol produces an incorrect result. If the protocol cannot process a message, the protocol is in error, and the message that cannot be processed is an error message. If the protocol is not in error, the message is processed under the Cache coherence protocol according to the protocol, so that a new message is obtained, new directory information is generated, and the new message is added to the pipeline.
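  • The lookup described above can be sketched as a table keyed by message type and directory state. This is a minimal sketch, assuming a dictionary-based table; the message types (`"read"`, `"write"`), the table contents, and the names `PROTOCOL_TABLE`, `ProtocolError`, and `process` are hypothetical and are not taken from the application. The missing entry stands in for an incomplete protocol design.

```python
from enum import Enum


class DirState(Enum):
    EXCLUSIVE = "exclusive"
    SHARED = "shared"
    INVALID = "invalid"


class ProtocolError(Exception):
    """Raised when the protocol table cannot process a message."""


# Hypothetical protocol table:
# (message type, current directory state) -> (new message type, new state).
# The ("write", SHARED) entry is deliberately missing to model an
# incomplete protocol design.
PROTOCOL_TABLE = {
    ("read", DirState.INVALID): ("read_reply", DirState.SHARED),
    ("read", DirState.SHARED): ("read_reply", DirState.SHARED),
    ("write", DirState.INVALID): ("write_ack", DirState.EXCLUSIVE),
}


def process(msg_type, addr, directory):
    """Process one message; update the directory and return the new message."""
    key = (msg_type, directory[addr])
    if key not in PROTOCOL_TABLE:
        # The protocol cannot handle this message: it is an error message.
        raise ProtocolError(f"no protocol entry for {key}")
    new_type, new_state = PROTOCOL_TABLE[key]
    directory[addr] = new_state
    return {"type": new_type, "addr": addr}
```

  • In this sketch a `ProtocolError` marks the point at which the engine would suspend the message and hand it to the patch module instead of stalling the pipeline.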
  • If the protocol is in error, the protocol processing engine 221 suspends the error message, that is, it suspends processing of both the current error message and any message with the same address as the current error message. Because every message has a corresponding address, these messages are easy to find; the address here refers to the system address of the CPU. The protocol processing engine 221 then sends the lookup table data structure, the directory information, and the error message to the input random access memory 211 corresponding to that protocol processing engine 221.
  • the lookup table data structure is used to record environmental data including the error message address, whether it is in the conflict processing stage, whether it is in the write operation phase, and whether the error message is suspended due to the conflict.
  • The directory information includes a pointer and a state. The pointer points to an address space in the processor, and the state indicates the specific state of that address space: exclusive, shared, or invalid. For example, if an address space of a processor is being accessed before an error occurs, the pointer points to that address space, and the state records its specific state.
  • the lookup table data structure records the environmental data when accessing this address space.
  • The lookup table data structure includes a plurality of field definitions that record the status of the transaction: the address of the message, the target ID number, whether the transaction is in the conflict processing stage, whether it is in the write operation stage, and whether the message was suspended because of a conflict.
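  • The two records handed to the patch module can be sketched as plain data classes. This is an illustrative sketch only: the application lists the fields but not their names, types, or widths, so `LookupRecord`, `DirectoryInfo`, and every field name below are assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class DirectoryState(Enum):
    EXCLUSIVE = "exclusive"
    SHARED = "shared"
    INVALID = "invalid"


@dataclass
class DirectoryInfo:
    pointer: int            # points to an address space in the processor
    state: DirectoryState   # state of the address space the pointer refers to


@dataclass
class LookupRecord:
    address: int                 # address of the (error) message
    target_id: int               # target ID number
    in_conflict_stage: bool      # in the conflict processing stage?
    in_write_stage: bool         # in the write operation stage?
    suspended_by_conflict: bool  # was the message suspended due to a conflict?


record = LookupRecord(
    address=0x1000,
    target_id=3,
    in_conflict_stage=False,
    in_write_stage=True,
    suspended_by_conflict=False,
)
directory = DirectoryInfo(pointer=0x1000, state=DirectoryState.SHARED)
```

  • Together the two records capture the environment of the failed access, which is what lets a patch reconstruct the operation without re-running the protocol processing engine.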
  • If the processor has enough information to process the message, it does not need to continue to query the protocol table. If the processor does not have enough information to process the message, it needs to continue to query the protocol table. In that case, the identifier of the protocol table that needs to be queried may be sent, together with the lookup table data structure, the directory information, and the error message, to the input random access memory 211 corresponding to the protocol processing engine module 221. The protocol processing engine 221 continues to process subsequent messages whose addresses differ from that of the error message, to avoid blocking.
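  • The suspend-and-continue behavior can be sketched as a filter over the message stream. This is a simplified sketch, assuming messages are `(address, payload)` tuples and that a hypothetical predicate `fails` marks the message on which the protocol errs; neither is specified by the application.

```python
def run_engine(messages, fails):
    """Split a message stream into processed and suspended messages.

    messages: list of (address, payload) tuples, in pipeline order.
    fails:    hypothetical predicate returning True for a message that
              triggers a protocol error.
    """
    suspended_addrs = set()  # addresses that currently have an error message
    suspended = []           # error message plus later same-address messages
    processed = []           # messages the engine kept processing

    for msg in messages:
        addr = msg[0]
        if addr in suspended_addrs or fails(msg):
            # Suspend the error message and every message sharing its address.
            suspended_addrs.add(addr)
            suspended.append(msg)
        else:
            # Messages with a different address are not blocked.
            processed.append(msg)
    return processed, suspended


stream = [(1, "a"), (2, "b"), (1, "c"), (3, "d")]
processed, suspended = run_engine(stream, lambda m: m == (1, "a"))
```

  • Here the messages for addresses 2 and 3 keep flowing, while both messages for address 1 are held back for the patch module, which preserves per-address ordering.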
  • When the patch processor 213 detects that there is an error message to be processed in the input random access memory 211, it reads the lookup table data structure, the directory information, and the error message from the input random access memory 211 and executes the patch according to them. The patch processes the error message in place of the protocol processing engine 221, following the original processing mode of the protocol processing engine, to generate a new message and corresponding new directory information.
  • If the protocol table needs to be queried further, the patch processor 213 queries the protocol table according to the identifier of the protocol table that needs to be queried, and then executes the patch according to the lookup table data structure, the directory information, and the error message to generate a new message and corresponding new directory information.
  • The patch can restore the operation on the address space according to the lookup table data structure and obtain the state of the address space from the directory information. Therefore, the patch can, in place of the protocol processing engine 221, process the error message according to the original processing mode of the protocol processing engine and generate a new message. After the processing is completed, the state of the address space may have changed, so the directory information is updated to generate new directory information.
  • The patch processor 213 writes the new message and the new directory information to the output random access memory 215 corresponding to the protocol processing engine module 221, where each protocol processing engine module 221 is connected to one output random access memory 215. After the protocol processing engine 221 detects that there is a new message in its corresponding output random access memory 215, it inserts the new message into the second half of the pipeline, where the pipeline is formed by the ordered sequence of messages.
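  • The data path from input random access memory through the shared patch processor to output random access memory can be sketched with one queue pair per engine. This is a minimal sketch under stated assumptions: the queues stand in for the RAMs, and `PatchModule`, `submit`, `step`, and `demo_patch` are hypothetical names; the patch logic shown is a placeholder, not the application's patch.

```python
from collections import deque


class PatchModule:
    """One input/output queue per protocol engine, one shared patch processor."""

    def __init__(self, num_engines, patch):
        self.input_ram = [deque() for _ in range(num_engines)]
        self.output_ram = [deque() for _ in range(num_engines)]
        # patch(lookup, directory_info, message) -> (new_message, new_directory_info)
        self.patch = patch

    def submit(self, engine_id, lookup, directory_info, message):
        # The engine writes the three items into its own input RAM.
        self.input_ram[engine_id].append((lookup, directory_info, message))

    def step(self):
        # The shared patch processor drains every input RAM and writes each
        # result to the output RAM of the engine that submitted it.
        for engine_id, ram in enumerate(self.input_ram):
            while ram:
                lookup, directory_info, message = ram.popleft()
                result = self.patch(lookup, directory_info, message)
                self.output_ram[engine_id].append(result)


def demo_patch(lookup, directory_info, message):
    # Placeholder patch: acknowledge the write and mark the line exclusive.
    new_message = {"type": "write_ack", "addr": message["addr"]}
    new_directory = {"pointer": directory_info["pointer"], "state": "exclusive"}
    return new_message, new_directory
```

  • Keeping a separate output queue per engine means a result can only return to the engine that raised the error, so that engine can reinsert the new message into its own pipeline.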
  • In this embodiment, each protocol processing engine 221 is provided with its own input random access memory 211 and output random access memory 215, and the plurality of input random access memories 211 and the plurality of output random access memories 215 share one patch processor 213.
  • Alternatively, the plurality of protocol processing engines 221 may share one input random access memory 211, one output random access memory 215, and one patch processor 213.
  • Alternatively, one protocol processing engine 221 may use one input random access memory 211, one output random access memory 215, and one patch processor 213.
  • FIG. 3 is a flowchart of an embodiment of a patching method of the present application.
  • the patch method of this embodiment includes the following steps:
  • When a protocol processing engine module in the node controller encounters an error while executing the Cache coherency protocol, the protocol processing engine module suspends the error message, that is, it suspends processing of both the current error message and any message with the same address as the current error message. Because every message has a corresponding address, these messages are easy to find; the address here refers to the system address of the CPU. The lookup table data structure, the directory information, and the error message are then sent to the patch module.
  • the lookup table data structure is used to record environmental data including the error message address, whether it is in the conflict processing stage, whether it is in the write operation phase, and whether the error message is suspended due to the conflict.
  • The directory information includes a pointer and a state. The pointer points to an address space in the processor, and the state indicates the specific state of that address space: exclusive, shared, or invalid. For example, if an address space of a processor is being accessed before an error occurs, the pointer points to that address space, and the state records its specific state.
  • the lookup table data structure records the environmental data when accessing this address space.
  • The lookup table data structure includes a plurality of field definitions that record the status of the transaction: the address of the message, the target ID number, whether the transaction is in the conflict processing stage, whether it is in the write operation stage, and whether the message was suspended because of a conflict.
  • If the processor has enough information to process the message, it does not need to continue to query the protocol table. If the processor does not have enough information to process the message, it needs to continue to query the protocol table. In that case, the identifier of the protocol table to be queried may be sent to the patch module together with the lookup table data structure, the directory information, and the error message. The protocol processing engine continues to process subsequent messages whose addresses differ from that of the error message, to avoid blocking.
  • The added patch module correspondingly receives the lookup table data structure, the directory information, and the error message sent by the node controller. If the protocol table needs to be queried further, the patch module receives the lookup table data structure, the directory information, the error message, and the identifier of the protocol table to be queried.
  • The added patch module executes a patch according to the lookup table data structure, the directory information, and the error message, where the patch processes the error message in place of the protocol processing engine to generate a new message and corresponding new directory information.
  • For example, when the added patch module detects an error message that needs to be processed, it reads in the lookup table data structure, the directory information, and the error message, and executes the patch according to them. The patch processes the error message in place of the protocol processing engine, following the original processing mode of the protocol processing engine, to generate a new message and corresponding new directory information.
  • The patch can restore the operation on the address space according to the lookup table data structure and obtain the state of the address space from the directory information. Therefore, the patch can, in place of the protocol processing engine 221, process the error message according to the original processing mode of the protocol processing engine and generate a new message. After the processing is completed, the state of the address space may have changed, so the directory information is updated to generate new directory information.
  • If the protocol table needs to be queried further, the added patch module queries the protocol table according to the identifier of the protocol table to be queried, and then executes the patch according to the lookup table data structure, the directory information, and the error message to generate a new message and corresponding new directory information.
  • The added patch module inserts the new message and the new directory information into the second half of the pipeline, where the pipeline is formed by the ordered sequence of messages.
  • FIG. 4 is a schematic structural diagram of an embodiment of a node controller according to the present application.
  • the node controller of the present embodiment includes a patch module, and the patch module is coupled to a protocol processing engine module in the node controller, where the patch module includes: a receiving unit 410 and an executing unit 420.
  • The receiving unit 410 is configured to receive the lookup table data structure, the directory information, and the error message sent by the node controller when the protocol processing engine module in the node controller encounters an error while executing the Cache coherency protocol, suspends the error message, and continues to process subsequent messages whose addresses differ from that of the error message. For example, when a protocol processing engine module in the node controller encounters an error while executing the Cache coherency protocol, it suspends the error message, that is, it suspends processing of both the current error message and any message with the same address as the current error message. Because every message has a corresponding address, these messages are easy to find; the address here refers to the system address of the CPU. The lookup table data structure, the directory information, and the error message are then sent to the receiving unit 410.
  • the lookup table data structure is used to record environmental data including the error message address, whether it is in the conflict processing stage, whether it is in the write operation phase, and whether the error message is suspended due to the conflict.
  • The directory information includes a pointer and a state. The pointer points to an address space in the processor, and the state indicates the specific state of that address space: exclusive, shared, or invalid. For example, if an address space of a processor is being accessed before an error occurs, the pointer points to that address space, and the state records its specific state.
  • the lookup table data structure records the environmental data when accessing this address space.
  • The lookup table data structure includes a plurality of field definitions that record the status of the transaction: the address of the message, the target ID number, whether the transaction is in the conflict processing stage, whether it is in the write operation stage, and whether the message was suspended because of a conflict.
  • If the processor has enough information to process the message, it does not need to continue to query the protocol table. If the processor does not have enough information to process the message, it needs to continue to query the protocol table. In that case, the identifier of the protocol table that needs to be queried may be sent to the receiving unit 410 together with the lookup table data structure, the directory information, and the error message. The protocol processing engine module continues to process subsequent messages whose addresses differ from that of the error message, thereby avoiding blocking.
  • The receiving unit 410 correspondingly receives the lookup table data structure, the directory information, and the error message sent by the node controller. If the protocol table needs to be queried further, the receiving unit 410 correspondingly receives the lookup table data structure, the directory information, the error message, and the identifier of the protocol table that needs to be queried.
  • the receiving unit 410 transmits the lookup table data structure, the directory information, and the error message to the execution unit 420.
  • The executing unit 420 is configured to receive the lookup table data structure, the directory information, and the error message, and to execute the patch according to them, where the patch processes the error message in place of the protocol processing engine module, following the original processing mode of the protocol processing engine, to generate a new message and corresponding new directory information.
  • When the executing unit 420 detects an error message that needs to be processed, it receives the lookup table data structure, the directory information, and the error message, and executes the patch according to them, where the patch processes the error message in place of the protocol processing engine module to generate a new message and corresponding new directory information. If the protocol table needs to be queried further, the executing unit 420 queries the protocol table according to the identifier of the protocol table that needs to be queried, and then executes the patch according to the lookup table data structure, the directory information, and the error message to generate a new message and corresponding new directory information.
  • The patch can restore the operation on the address space according to the lookup table data structure and obtain the state of the address space from the directory information. Therefore, the patch can, in place of the protocol processing engine 221, process the error message according to the original processing mode of the protocol processing engine and generate a new message. After the processing is completed, the state of the address space may have changed, so the directory information is updated to generate new directory information.
  • The executing unit 420 inserts the new message and the new directory information into the second half of the pipeline, where the pipeline is formed by the ordered sequence of messages.
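  • The division of work between the receiving unit and the executing unit, including the optional protocol table identifier, can be sketched as two small classes. This is a sketch under stated assumptions only: the class and method names, the shape of the protocol tables, and the fallback rule inside `demo_patch` are all hypothetical and are not defined by the application.

```python
class ExecutingUnit:
    def __init__(self, patch, protocol_tables):
        self.patch = patch                      # hypothetical patch callable
        self.protocol_tables = protocol_tables  # table identifier -> table

    def execute(self, lookup, directory_info, message, table_id=None):
        # Query a protocol table only when an identifier was supplied.
        table = self.protocol_tables[table_id] if table_id is not None else None
        return self.patch(lookup, directory_info, message, table)


class ReceivingUnit:
    def __init__(self, executing_unit):
        self.executing_unit = executing_unit

    def receive(self, lookup, directory_info, message, table_id=None):
        # Forward everything received from the node controller.
        return self.executing_unit.execute(lookup, directory_info, message, table_id)


def demo_patch(lookup, directory_info, message, table):
    key = (message["type"], directory_info["state"])
    if table is not None and key in table:
        new_type, new_state = table[key]                 # rule from the queried table
    else:
        new_type, new_state = "write_ack", "exclusive"   # assumed built-in fix
    new_message = {"type": new_type, "addr": message["addr"]}
    new_directory = {"pointer": directory_info["pointer"], "state": new_state}
    return new_message, new_directory


tables = {"T1": {("read", "invalid"): ("read_reply", "shared")}}
receiver = ReceivingUnit(ExecutingUnit(demo_patch, tables))
```

  • Passing `table_id=None` models the case where the patch already has enough information, while passing an identifier models the case where the protocol table must be queried before the patch runs.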
  • FIG. 5 is a schematic structural diagram of another embodiment of a node controller according to the present application.
  • the protocol processing engine fault tolerance processing apparatus of the present embodiment includes a patch module including an input random access memory 510, a patch processor 520, and an output random access memory 530.
  • One end of the input random access memory 510 is coupled to the protocol processing engine module in the node controller, and the other end of the input random access memory 510 is coupled to one end of the patch processor 520. The other end of the patch processor 520 is coupled to one end of the output random access memory 530, and the other end of the output random access memory 530 is coupled to the protocol processing engine module in the node controller.
  • The input random access memory 510 is configured to receive the lookup table data structure, the directory information, and the error message sent by the node controller when the protocol processing engine module in the node controller encounters an error while executing the Cache coherency protocol, suspends the error message, and continues to process subsequent messages whose addresses differ from that of the error message. For example, when a protocol processing engine in the node controller encounters an error while executing the Cache coherency protocol, it suspends the error message, that is, it suspends processing of both the current error message and any message with the same address as the current error message. Because every message has a corresponding address, these messages are easy to find; the address here refers to the system address of the CPU. The lookup table data structure, the directory information, and the error message are then sent to the input random access memory 510.
  • the lookup table data structure is used to record environmental data including the error message address, whether it is in the conflict processing stage, whether it is in the write operation phase, and whether the error message is suspended due to the conflict.
  • The directory information includes a pointer and a state. The pointer points to an address space in the processor, and the state indicates the specific state of that address space: exclusive, shared, or invalid. For example, if an address space of a processor is being accessed before an error occurs, the pointer points to that address space, and the state records its specific state.
  • the lookup table data structure records the environmental data when accessing this address space.
  • The lookup table data structure includes a plurality of field definitions that record the status of the transaction: the address of the message, the target ID number, whether the transaction is in the conflict processing stage, whether it is in the write operation stage, and whether the message was suspended because of a conflict.
  • If the processor has enough information to process the message, it does not need to continue to query the protocol table. If the processor does not have enough information to process the message, it needs to continue to query the protocol table. In that case, the identifier of the protocol table that needs to be queried may be sent to the input random access memory 510 together with the lookup table data structure, the directory information, and the error message. The protocol processing engine continues to process subsequent messages whose addresses differ from that of the error message, to avoid blocking.
  • The input random access memory 510 correspondingly receives the lookup table data structure, the directory information, and the error message sent by the node controller. If the protocol table needs to be queried further, the input random access memory 510 correspondingly receives the lookup table data structure, the directory information, the error message, and the identifier of the protocol table to be queried.
  • the patch processor 520 is configured to execute a patch according to the lookup table data structure, the directory information, and the error message, wherein the patch is used to process the error message in place of the protocol processing engine so as to generate a new message and corresponding new directory information.
  • for example, when the patch processor 520 detects that the input random access memory 510 holds an error message that needs to be processed, it receives the lookup table data structure, the directory information, and the error message and executes the patch according to them, wherein the patch processes the error message in place of the protocol processing engine, following the engine's original processing mode, to generate a new message and corresponding new directory information.
  • the patch processor 520 queries the protocol table according to the identifier of the protocol table to be queried, and then executes a patch according to the lookup table data structure, the directory information, and the error message to generate a new message and corresponding new directory information.
  • for example, the patch can resume the operation on the address space according to the lookup table data structure and obtain the state of the address space from the directory information. The patch can therefore process the error message in place of the protocol processing engine 221, following the engine's original processing mode, and generate a new message. After processing, the state of the address space may have changed, so the directory information is updated to produce new directory information.
  • the patch processor 520 inserts the new message and the new directory information, which are temporarily stored in the patch processor 520, into the second half of the pipeline, where the pipeline is formed by the messages arranged in order.
  • the output random access memory 730 is used to temporarily store the new message output by the patch processor 720 and the new directory information.
  • the application also provides a protocol processing engine fault-tolerant processing system, including a patch module, and the patch module is coupled to a protocol processing engine module in the node controller.
  • by adding a patch module, the above application ensures that when the protocol processing engine encounters an error, the engine suspends the error message and continues to process subsequent messages whose addresses differ from that of the error message, while the error message is processed by the patch module, so that the system still works normally when a protocol error occurs.
  • the disclosed system, apparatus, and method may be implemented in other manners.
  • the device implementations described above are merely illustrative.
  • the division of the modules or units is only a logical function division.
  • there may be other division manners; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be in an electrical, mechanical or other form.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
  • the integrated unit if implemented in the form of a software functional unit and sold or used as a standalone product, may be stored in a computer readable storage medium.
  • the technical solution of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium.
  • the software product includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform all or part of the steps of the methods of the various embodiments of the present application.
  • the foregoing storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer And Data Communications (AREA)
  • Memory System Of A Hierarchy Structure (AREA)
  • Techniques For Improving Reliability Of Storages (AREA)

Abstract

A patching method, device, and system. The method comprises: if an error occurs when the protocol processing engine in a node controller executes a cache coherence protocol, the engine suspends the error message and continues to process subsequent messages whose addresses differ from that of the error message, and an added patch module receives a lookup table data structure, directory information, and the error message, all sent by the node controller; the added patch module executes a patch program according to the lookup table data structure, the directory information, and the error message, the patch program being used in place of the protocol processing engine to process the error message and to generate a new message and corresponding new directory information. By adding a patch module, when an error occurs the protocol processing engine module suspends the error message and continues processing subsequent messages while the error message is processed by the patch module, so that the system is still guaranteed to operate normally when a protocol error occurs.

Description

Patch method, device and system
The present application claims priority to Chinese Patent Application No. 201310421697.5, filed with the Chinese Patent Office on September 16, 2013 and entitled "Patch method, device and system", which is incorporated herein by reference in its entirety.
Technical field
The present application relates to the field of storage, and in particular to a patch method, device and system.
Background
With the rapid development of various terminals, advanced applications are becoming increasingly abundant, and the demands on the computing speed of the central processing unit (CPU) keep rising. However, the performance gains available from a single CPU are limited, so multiprocessor systems have developed rapidly. As shown in FIG. 1, a Cache coherence protocol Non-Uniform Memory Access (CC-NUMA) multiprocessor system includes a plurality of CPUs 110 and a plurality of node controllers (NC) 120. The node controllers 120 are topologically connected to the CPUs 110, and the node controllers 120 coordinate the work among the CPUs 110.
Each CPU 110 has at least one private cache (Cache), while all CPUs share one main memory. When different CPUs 110 use the same data in main memory, each of them places a copy of the data in its own private Cache and modifies it as needed. When one CPU 110 modifies the data, another CPU 110 does not know that the data has been modified, so it continues to work with the original data, which causes a conflict. This is the Cache coherence problem.
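The stale-copy situation described above can be illustrated with a minimal sketch. The `PrivateCache` class, the write-through behavior, and the address `0x100` are assumptions made only for this illustration and are not taken from the application.

```python
# Illustrative only: two private caches over one shared main memory,
# with no coherence mechanism between them.
main_memory = {0x100: 1}


class PrivateCache:
    """Hypothetical private cache of one CPU (not from the application)."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}

    def read(self, addr):
        # Fill from main memory on first access, then serve the private copy.
        if addr not in self.lines:
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        # Update the private copy and main memory; other caches are not told.
        self.lines[addr] = value
        self.memory[addr] = value


cache_a = PrivateCache(main_memory)
cache_b = PrivateCache(main_memory)

cache_a.read(0x100)          # both CPUs load the same data
cache_b.read(0x100)
cache_a.write(0x100, 2)      # CPU A modifies it
stale_value = cache_b.read(0x100)  # CPU B still sees its old private copy
```

Here `stale_value` is the old value even though main memory has been updated, which is the kind of conflict a coherence protocol is meant to prevent.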
The prior art provides a Cache coherence protocol in which at least one protocol processing engine is provided in each node controller 120. When a message enters the protocol processing engine, the engine looks up the corresponding protocol table according to the message type and the directory state recorded in the directory information. The protocol table records how the current message is to be processed, how the directory information is to be operated on, whether a conflict occurs, and so on. If a conflict occurs, the protocol processing engine processes the message according to the recorded processing mode and directory operation, obtaining a new message and updated directory information, thereby avoiding the Cache coherence problem.
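As a rough sketch of such a lookup, the protocol table can be modeled as a mapping keyed by message type and directory state. The message type names, the action strings, and the table rows below are hypothetical; only the three directory states (exclusive, shared, invalid) come from the description.

```python
from enum import Enum
from typing import NamedTuple, Optional


class DirState(Enum):
    EXCLUSIVE = "exclusive"
    SHARED = "shared"
    INVALID = "invalid"


class ProtocolEntry(NamedTuple):
    message_action: str     # how the current message is processed
    directory_action: str   # how the directory information is operated on
    conflict: bool          # whether this combination counts as a conflict


# Hypothetical table contents, for illustration only.
PROTOCOL_TABLE = {
    ("read", DirState.INVALID): ProtocolEntry("fetch_from_memory", "set_shared", False),
    ("read", DirState.SHARED): ProtocolEntry("fetch_from_memory", "keep_shared", False),
    ("read", DirState.EXCLUSIVE): ProtocolEntry("snoop_owner", "set_shared", True),
    ("write", DirState.SHARED): ProtocolEntry("invalidate_sharers", "set_exclusive", True),
}


def lookup(message_type: str, state: DirState) -> Optional[ProtocolEntry]:
    """Return the matching entry, or None when the table has no row for it."""
    return PROTOCOL_TABLE.get((message_type, state))
```

A `None` result stands in for a combination the protocol does not cover, which is the situation the patch mechanism is intended to handle.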
However, factors such as the rapid growth of system scale, the uncertainty of network latency, and the diversity of memory consistency models make the Cache coherence protocol extremely complex, and the state space of the protocol grows exponentially. Under existing technical conditions, a Cache coherence protocol cannot cover every case, that is, it cannot guarantee that every conflict is resolved. Once a conflict cannot be resolved, the protocol fails: the protocol processing engine cannot process the current message, subsequent messages cannot be processed either, and the system can no longer work normally.
Summary of the invention
The present application provides a fault-tolerant processing method, apparatus, and system for a protocol processing engine, which keep the system working normally when a protocol error occurs.
A first aspect of the present application provides a patch method: if a protocol processing engine in a node controller encounters an error while executing the Cache coherence protocol, suspends the error message, and continues to process subsequent messages whose addresses differ from that of the error message, an added patch module receives the lookup table data structure, the directory information, and the error message sent by the node controller; the added patch module executes a patch program according to the lookup table data structure, the directory information, and the error message, wherein the patch program is used to process the error message in place of the protocol processing engine so as to generate a new message and corresponding new directory information.
With reference to the first aspect, in a first possible implementation of the first aspect of the present application, the error message includes the current error message and messages having the same address as the current error message.
With reference to the first aspect, in a second possible implementation of the first aspect of the present application, before the step in which the added patch module executes the patch program according to the lookup table data structure, the directory information, and the error message, the method further includes: the added patch module receives the lookup table data structure, the directory information, the error message, and the identifier of the protocol table to be queried, all sent by the node controller.
A second aspect of the present application provides a node controller including a patch module, the patch module being coupled to a protocol processing engine module in the node controller. The patch module includes a receiving unit and an execution unit. The receiving unit is configured to receive the lookup table data structure, the directory information, and the error message sent by the node controller when the protocol processing engine module in the node controller encounters an error while executing the Cache coherence protocol, suspends the error message, and continues to process subsequent messages whose addresses differ from that of the error message; the receiving unit sends the lookup table data structure, the directory information, and the error message to the execution unit. The execution unit is configured to receive the lookup table data structure, the directory information, and the error message, and to execute a patch program according to them, wherein the patch program is used to process the error message in place of the protocol processing engine module so as to generate a new message and corresponding new directory information.
With reference to the second aspect, in a first possible implementation of the second aspect of the present application, the error message includes the current error message and messages having the same address as the current error message.
With reference to the second aspect, in a second possible implementation of the second aspect of the present application, the receiving unit is further configured to receive the lookup table data structure, the directory information, the error message, and the identifier of the protocol table to be queried, all sent by the node controller.
A third aspect of the present application provides a patch system including a patch module and a node controller, the patch module being coupled to a protocol processing engine module in the node controller. The node controller is configured to suspend the error message and continue to process subsequent messages whose addresses differ from that of the error message when the protocol processing engine encounters an error while executing the Cache coherence protocol. The patch module is configured to receive the lookup table data structure, the directory information, and the error message sent by the node controller, and to execute a patch program according to them, wherein the patch program is used to process the error message in place of the protocol processing engine so as to generate a new message and corresponding new directory information.
With reference to the third aspect, in a first possible implementation of the third aspect of the present application, the error message includes the current error message and messages having the same address as the current error message.
With reference to the third aspect, in a second possible implementation of the third aspect of the present application, the patch module is further configured to receive the lookup table data structure, the directory information, the error message, and the identifier of the protocol table to be queried, all sent by the node controller.
With reference to the third aspect, in a third possible implementation of the third aspect of the present application, the patch module includes an input random access memory, a patch processor, and an output random access memory. One end of the input random access memory is coupled to one end of the protocol processing engine module in the node controller, and the other end is coupled to one end of the patch processor; the other end of the patch processor is coupled to one end of the output random access memory, and the other end of the output random access memory is coupled to the other end of the protocol processing engine in the node controller. The input random access memory is configured to store the lookup table data structure, the directory information, and the error message sent by the node controller. The patch processor is configured to execute a patch program according to the lookup table data structure, the directory information, and the error message, wherein the patch program is used to process the error message in place of the protocol processing engine so as to generate a new message and corresponding new directory information. The output random access memory is configured to store the new message and the new directory information.
With reference to the third possible implementation of the third aspect, in a fourth possible implementation of the third aspect of the present application, the number of input random access memories and the number of output random access memories are both equal to the number of protocol processing engines. One end of each input random access memory is coupled to one end of one protocol processing engine, and the other ends of all input random access memories are coupled to the patch processor. The other end of the patch processor is coupled to one end of every output random access memory, and the other end of each output random access memory is coupled to the other end of one protocol processing engine module.
By adding a patch module, the present application ensures that when the protocol processing engine encounters an error, the engine suspends the error message and continues to process subsequent messages whose addresses differ from that of the error message, while the error message is processed by the patch module, so that the system still works normally when a protocol error occurs.
Brief description of the drawings
FIG. 1 is a schematic structural diagram of an embodiment of a prior-art multiprocessor system;
FIG. 2 is a schematic structural diagram of an embodiment of the patch system of the present application;
FIG. 3 is a flowchart of an embodiment of the patch method of the present application;
FIG. 4 is a schematic structural diagram of an embodiment of the node controller of the present application;
FIG. 5 is a schematic structural diagram of another embodiment of the node controller of the present application.
Detailed description
In the following description, specific details such as particular system structures, interfaces, and techniques are set forth for purposes of illustration rather than limitation, in order to provide a thorough understanding of the present application. However, it will be apparent to those skilled in the art that the present application may be practiced in other embodiments without these specific details. In other instances, detailed descriptions of well-known devices, circuits, and methods are omitted so that unnecessary detail does not obscure the description of the present application.
Referring to FIG. 2, FIG. 2 is a schematic structural diagram of an embodiment of the patch system of the present application. The patch system of this embodiment includes a patch module 210 and a node controller 220, which together schedule a plurality of CPUs. The patch module 210 includes a plurality of input random access memories (input RAM) 211, one patch processor 213, and a plurality of output random access memories (output RAM) 215. The node controller 220 includes a plurality of protocol processing engines 221 and a plurality of protocol table (P-Table) memories 223. The input random access memories 211, the output random access memories 215, the protocol processing engines 221, and the protocol table memories 223 are equal in number.
One end of each input random access memory 211 is coupled to one end of one protocol processing engine 221, and the other ends of all input random access memories 211 are coupled to the patch processor 213. The other end of the patch processor 213 is coupled to one end of every output random access memory 215, and the other end of each output random access memory 215 is coupled to the other end of one protocol processing engine 221. Each protocol processing engine 221 is coupled to one protocol table memory 223.
When a message passes through a protocol processing engine 221, the engine reads the protocol in the protocol table memory 223 according to the message type and the current directory state. The protocol in the protocol table memory 223 specifies how the message is processed, how the directory is operated on, how internal table entries are operated on, whether the protocol is in error, and so on. If the protocol design is incomplete, so that processing a message according to the protocol would produce an erroneous result and the protocol cannot handle the message, the protocol is in error, and the message that cannot be handled is an error message. If the protocol is not in error, the message undergoes Cache coherence protocol processing according to the protocol, yielding a new message and a new directory, and the new message is added to the pipeline.
When a protocol processing engine 221 encounters an error while executing the Cache coherence protocol, the protocol processing engine module 221 suspends the error messages, that is, it pauses processing of the current error message and of messages with the same address as the current error message. Because every message carries an address (here, a CPU system address), these messages are easy to locate. The engine then sends the lookup table data structure, the directory information, and the error message to the input random access memory 211 corresponding to that protocol processing engine 221. The lookup table data structure records environment data, including the address of the error message, whether processing is in the conflict-handling phase, whether it is in the write-operation phase, and whether the error message was suspended because of a conflict. The directory information includes a pointer and a state: the pointer points to an address space in a processor, and the state indicates the specific state of that address space, which is one of exclusive, shared, or invalid. For example, if an address space of a controller was being accessed before the error occurred, the pointer points to that address space and the state entry records its specific state, while the lookup table data structure records the environment data at the time of the access.
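The three items handed to the input random access memory can be pictured as plain records. The following is a minimal sketch; the class names, field names, field types, and the `PatchRequest` wrapper are assumptions for illustration, since the application lists the recorded information but not a concrete layout.

```python
from dataclasses import dataclass
from enum import Enum


class AddressState(Enum):
    EXCLUSIVE = "exclusive"
    SHARED = "shared"
    INVALID = "invalid"


@dataclass
class DirectoryInfo:
    pointer: int            # points to an address space in a processor
    state: AddressState     # specific state of that address space


@dataclass
class LookupContext:
    """Environment data recorded by the lookup table data structure."""
    error_address: int
    in_conflict_phase: bool
    in_write_phase: bool
    suspended_by_conflict: bool


@dataclass
class Message:
    address: int            # CPU system address carried by the message
    kind: str               # hypothetical message type label


@dataclass
class PatchRequest:
    """Hypothetical bundle written into the input RAM for one error."""
    context: LookupContext
    directory: DirectoryInfo
    error_message: Message


request = PatchRequest(
    context=LookupContext(0x2000, in_conflict_phase=True,
                          in_write_phase=False, suspended_by_conflict=False),
    directory=DirectoryInfo(pointer=0x2000, state=AddressState.SHARED),
    error_message=Message(address=0x2000, kind="read"),
)
```

Grouping the records in one request object is a design convenience of this sketch; the description only requires that all three reach the patch module together.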
In another embodiment, the lookup table data structure includes a plurality of field definitions that record the state of transaction processing, including the address of the message, the target ID number, whether processing is in the conflict-handling phase, whether it is in the write-operation phase, and whether the message was suspended because of a conflict. When the patch program needs to run, the transaction is in an intermediate state, and the patch program must know that intermediate state in order to complete the message processing correctly. Likewise, after the patch program finishes, the intermediate state must be output and saved so that it can be used the next time the same transaction is processed. Once the transaction finally completes, the data structure is cleared automatically.
If the processor has enough information to process the message, there is no need to continue querying the protocol table; otherwise the protocol table must be queried further. In that case, the identifier of the protocol table to be queried may be sent, together with the lookup table data structure, the directory information, and the error message, to the input random access memory 211 corresponding to the protocol processing engine module 221. The protocol processing engine 221 continues to process subsequent messages whose addresses differ from that of the error message, thereby avoiding blockage.
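The suspend-and-continue behavior can be sketched as follows: messages at an address that has produced an error are diverted, while messages at other addresses keep flowing. The `Engine` class, the `ProtocolError` exception, and the `handler` callback are hypothetical stand-ins, and a plain list stands in for the input random access memory.

```python
class ProtocolError(Exception):
    """Raised by the hypothetical handler when the protocol cannot handle a message."""


class Engine:
    def __init__(self, handler):
        self.handler = handler             # hypothetical per-message protocol step
        self.suspended_addresses = set()   # addresses with a pending error
        self.input_ram = []                # stands in for the input RAM
        self.done = []                     # messages processed normally

    def submit(self, address, payload):
        # Messages sharing an address with an error message are suspended too.
        if address in self.suspended_addresses:
            self.input_ram.append((address, payload))
            return
        try:
            self.done.append(self.handler(address, payload))
        except ProtocolError:
            self.suspended_addresses.add(address)
            self.input_ram.append((address, payload))


def handler(address, payload):
    if payload == "bad":
        raise ProtocolError("protocol table does not cover this case")
    return (address, payload)


engine = Engine(handler)
for addr, data in [(0x10, "bad"), (0x20, "ok"), (0x10, "later"), (0x30, "ok")]:
    engine.submit(addr, data)
```

After the loop, the two messages at address `0x10` sit in `engine.input_ram` awaiting the patch, and the messages at `0x20` and `0x30` have been processed without waiting.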
When the patch processor 213 detects that an input random access memory 211 holds an error message that needs to be processed, it reads the lookup table data structure, the directory information, and the error message from the input random access memory 211 and executes a patch program according to them. The patch program processes the error message in place of the protocol processing engine 221, following the engine's original processing mode, to generate a new message and corresponding new directory information. If the protocol table must be queried further, the patch processor 213 queries the protocol table according to the identifier of the protocol table to be queried, and then executes the patch program according to the lookup table data structure, the directory information, and the error message to generate a new message and corresponding new directory information. For example, the patch program can resume the operation on the address space according to the lookup table data structure and obtain the state of the address space from the directory information, so it can process the error message in place of the protocol processing engine 221, following the engine's original processing mode, and generate a new message. After processing, the state of the address space may have changed, so the directory information is updated to produce new directory information.
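A minimal sketch of this loop is given below, assuming the input random access memory is a list of records and the patch program is an ordinary function. The dictionary keys, the `example_patch` rule, and the `write_ack` message type are invented for illustration and do not come from the application.

```python
from typing import Callable, Dict, List, Tuple

# A patch takes (context, directory, message) and returns
# (new_message, new_directory). This signature is an assumption.
PatchFn = Callable[[Dict, Dict, Dict], Tuple[Dict, Dict]]


def example_patch(context: Dict, directory: Dict, message: Dict) -> Tuple[Dict, Dict]:
    # Hypothetical rule: the stalled write ends with exclusive ownership.
    new_directory = dict(directory, state="exclusive")
    new_message = {"address": message["address"], "kind": "write_ack"}
    return new_message, new_directory


def run_patch_processor(input_ram: List[Dict], patch: PatchFn) -> List[Dict]:
    """Drain the input RAM, run the patch on each record, fill the output RAM."""
    output_ram = []
    while input_ram:
        item = input_ram.pop(0)
        new_message, new_directory = patch(
            item["context"], item["directory"], item["message"]
        )
        output_ram.append({"message": new_message, "directory": new_directory})
    return output_ram


pending = [{
    "context": {"error_address": 0x40, "in_write_phase": True},
    "directory": {"pointer": 0x40, "state": "shared"},
    "message": {"address": 0x40, "kind": "write"},
}]
old_directory = pending[0]["directory"]
results = run_patch_processor(pending, example_patch)
```

The patch returns a fresh directory record rather than mutating the old one, so the pre-error state remains available for inspection; that choice belongs to this sketch, not to the application.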
The patch processor 213 writes the new message and the new directory information to the output random access memory 215 corresponding to the protocol processing engine module 221, each protocol processing engine module 221 being connected to one output random access memory 215. When the protocol processing engine 221 detects a new message in its output random access memory 215, it inserts the new message into the second half of the pipeline. The pipeline is formed by the messages arranged in order.
If another protocol processing engine 221 encounters an error while executing the protocol, it is handled in the same way.
In this embodiment, each protocol processing engine 221 has its own input random access memory 211 and output random access memory 215, and the plurality of input random access memories 211 and output random access memories 215 share one patch processor 213. When the error rate of the protocol processing engines 221 in executing the protocol is relatively low, a plurality of protocol processing engines 221 may instead share one input random access memory 211, one output random access memory 215, and one patch processor 213. Conversely, when requirements are more demanding, each protocol processing engine 221 may have its own input random access memory 211, output random access memory 215, and patch processor 213.
Referring to FIG. 3, FIG. 3 is a flowchart of an embodiment of the patching method of the present application. The patching method of this embodiment includes the following steps:
S301: If a protocol processing engine in the node controller encounters an error while executing the Cache coherence protocol, suspends the error message, and continues to process subsequent messages whose addresses differ from that of the error message, the added patch module receives the lookup table data structure, the directory information, and the error message sent by the node controller.
When a protocol processing engine module in the node controller encounters an error while executing the Cache coherence protocol, it suspends the error message, that is, it pauses processing of both the current error message and any messages with the same address as the current error message. Because every message carries an address (here, a CPU system address), these messages are easy to identify. The protocol processing engine module then sends the lookup table data structure, the directory information, and the error message to the patch module. The lookup table data structure records context data, including the address of the error message, whether processing is in the conflict-handling phase, whether it is in the write-operation phase, and whether the error message was suspended because of a conflict. The directory information includes a pointer and a state: the pointer points to an address space in the processor, and the state indicates the specific state of that address space, which may be exclusive, shared, or invalid. For example, if an address space of a controller was being accessed before the error occurred, the pointer points to that address space and the state field records its specific state, while the lookup table data structure records the context data at the time of the access.
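The two records described above can be pictured as plain data types. The following is a minimal sketch, assuming Python dataclasses; the type and field names (`LineState`, `DirectoryInfo`, `LookupTableEntry`, `PatchInput`) are hypothetical, and the application does not specify field widths or encodings.

```python
from dataclasses import dataclass
from enum import Enum


class LineState(Enum):
    """The three states named in the description."""
    EXCLUSIVE = "exclusive"
    SHARED = "shared"
    INVALID = "invalid"


@dataclass
class DirectoryInfo:
    pointer: int        # points to an address space in the processor
    state: LineState    # specific state of that address space


@dataclass
class LookupTableEntry:
    """Context data captured when the protocol step failed."""
    error_address: int            # address of the error message
    in_conflict_handling: bool    # in the conflict-handling phase?
    in_write_phase: bool          # in the write-operation phase?
    suspended_by_conflict: bool   # suspended because of a conflict?


@dataclass
class PatchInput:
    """Everything the engine hands to the patch module for one error."""
    lookup: LookupTableEntry
    directory: DirectoryInfo
    error_message: bytes          # raw message; the format is not specified
```

A record of this kind is what the patch module would read before running the patch program.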
In another embodiment, the lookup table data structure includes multiple field definitions that record the state of the transaction, including the message address, the target ID, whether processing is in the conflict-handling phase, whether it is in the write-operation phase, and whether the message was suspended because of a conflict. When the patch needs to run, the transaction is in an intermediate state, and the patch must know this intermediate state to complete message processing correctly. After the patch finishes, the intermediate state must likewise be output and saved so that it is available the next time the same transaction is processed. Once the transaction finally completes, the data structure is cleared automatically.
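The life cycle of this intermediate state (saved after each patch run, reused on the next step of the same transaction, cleared on completion) might be modeled as below. This is a sketch only: the class name `TransactionStateStore` is hypothetical, and keying the state by message address is an assumption, since the application does not say how a transaction is identified.

```python
class TransactionStateStore:
    """Keeps the intermediate lookup-table state of in-flight transactions."""

    def __init__(self):
        # Keyed by message address (an assumption made for this example).
        self._by_address = {}

    def save(self, address, fields):
        """Store a copy of the intermediate state output by the patch."""
        self._by_address[address] = dict(fields)

    def load(self, address):
        """Return the saved state for this address, or None if there is none."""
        return self._by_address.get(address)

    def complete(self, address):
        """Clear the state once the transaction has finally finished."""
        self._by_address.pop(address, None)
```

For example, a patch run could call `save(addr, {"target_id": 3, "in_write_phase": True})`, a later step of the same transaction could call `load(addr)`, and the final step would call `complete(addr)`.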
If the processor has enough information to process the message, the protocol table need not be queried further; otherwise, the protocol table must still be queried. In that case, the identifier of the protocol table to be queried may be sent to the patch module together with the lookup table data structure, the directory information, and the error message. The protocol processing engine continues to process subsequent messages whose addresses differ from that of the error message, thereby avoiding congestion.
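The engine-side behavior described so far (suspend messages at the failing address, keep processing the rest, and optionally attach a protocol table identifier) can be sketched as follows. The function names `split_pending` and `build_patch_request` and the dictionary layout of a message are assumptions made for illustration.

```python
def split_pending(pending, error_address):
    """Separate messages to suspend from messages the engine may keep processing.

    Messages sharing the error message's address are suspended; all others
    stay runnable so the pipeline is not blocked.
    """
    suspended = [m for m in pending if m["addr"] == error_address]
    runnable = [m for m in pending if m["addr"] != error_address]
    return suspended, runnable


def build_patch_request(lookup, directory, message, protocol_table_id=None):
    """Bundle what is sent to the patch module.

    The protocol table identifier is included only when the protocol table
    still has to be queried.
    """
    request = {"lookup": lookup, "directory": directory, "message": message}
    if protocol_table_id is not None:
        request["protocol_table_id"] = protocol_table_id
    return request
```

Relative order is preserved within each list, so suspended messages can later be handled in their original order.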
The added patch module correspondingly receives the lookup table data structure, the directory information, and the error message sent by the node controller. If the protocol table must still be queried, the patch module also receives the identifier of the protocol table to be queried.
S302: The added patch module executes a patch program according to the lookup table data structure, the directory information, and the error message, where the patch program processes the error message in place of the protocol processing engine to generate a new message and corresponding new directory information.
When the added patch module detects an error message that needs to be processed, it takes in the lookup table data structure, the directory information, and the error message and executes the patch program according to them. The patch program takes the place of the protocol processing engine and processes the error message in the manner the protocol processing engine would originally have used, generating a new message and corresponding new directory information. For example, the patch program can restore the operation on the above address space from the lookup table data structure and obtain the state of that address space from the directory information. The patch program can therefore process the error message in place of the protocol processing engine, in the engine's original manner, and generate a new message. Once processing is complete, the state of the address space may have changed, so the directory information is updated to produce new directory information.
If the protocol table must still be queried, the added patch module queries it using the identifier of the protocol table to be queried, and then executes the patch program according to the lookup table data structure, the directory information, and the error message to generate the new message and the corresponding new directory information. The added patch module inserts the new message and the new directory information into the second half of the pipeline. The pipeline is formed by the messages arranged in order.
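Steps S301 and S302 can be summarized in a short sketch. All names here (`run_patch_module`, `example_patch_program`, the dictionary keys) are hypothetical, and the state transition applied by the example patch (marking the line as shared after a read) is an assumption chosen only to make the example concrete; the application does not prescribe any particular transition.

```python
def example_patch_program(lookup, directory, message, table=None):
    """Illustrative patch: answers a request and marks the line as shared.

    The transition used here is an assumption for the example only.
    """
    # If a protocol table was supplied, use it to choose the reply type.
    action = table.get(message["op"], "reply") if table else "reply"
    new_message = {"addr": message["addr"], "op": action}
    new_directory = dict(directory, state="shared")
    return new_message, new_directory


def run_patch_module(request, protocol_tables, patch_program, pipeline_second_half):
    """S301/S302 in miniature: receive the request, run the patch, reinsert."""
    table = None
    table_id = request.get("protocol_table_id")
    if table_id is not None:
        # Query the protocol table named by the received identifier.
        table = protocol_tables[table_id]
    new_message, new_directory = patch_program(
        request["lookup"], request["directory"], request["message"], table)
    # The results re-enter the pipeline in its second half.
    pipeline_second_half.append((new_message, new_directory))
    return new_message, new_directory
```

Passing the patch program as a parameter reflects the idea that the patch replaces the engine's faulty handling without changing the surrounding flow.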
Everything described for the method of this embodiment also applies to the patch system described above and to the node controller described below.
Referring to FIG. 4, FIG. 4 is a schematic structural diagram of an embodiment of the node controller of the present application. The node controller of this embodiment includes a patch module coupled to a protocol processing engine module in the node controller. The patch module includes a receiving unit 410 and an execution unit 420.
The receiving unit 410 is configured to receive the lookup table data structure, the directory information, and the error message sent by the node controller when a protocol processing engine module in the node controller encounters an error while executing the Cache coherence protocol, suspends the error message, and continues to process subsequent messages whose addresses differ from that of the error message. For example, when a protocol processing engine module in the node controller encounters an error while executing the Cache coherence protocol, it suspends the error message, that is, it pauses processing of both the current error message and any messages with the same address as the current error message. Because every message carries an address (here, a CPU system address), these messages are easy to identify. The protocol processing engine module then sends the lookup table data structure, the directory information, and the error message to the receiving unit 410. The lookup table data structure records context data, including the address of the error message, whether processing is in the conflict-handling phase, whether it is in the write-operation phase, and whether the error message was suspended because of a conflict. The directory information includes a pointer and a state: the pointer points to an address space in the processor, and the state indicates the specific state of that address space, which may be exclusive, shared, or invalid. For example, if an address space of a controller was being accessed before the error occurred, the pointer points to that address space and the state field records its specific state, while the lookup table data structure records the context data at the time of the access.
In another embodiment, the lookup table data structure includes multiple field definitions that record the state of the transaction, including the message address, the target ID, whether processing is in the conflict-handling phase, whether it is in the write-operation phase, and whether the message was suspended because of a conflict. When the patch needs to run, the transaction is in an intermediate state, and the patch must know this intermediate state to complete message processing correctly. After the patch finishes, the intermediate state must likewise be output and saved so that it is available the next time the same transaction is processed. Once the transaction finally completes, the data structure is cleared automatically.
If the processor has enough information to process the message, the protocol table need not be queried further; otherwise, the protocol table must still be queried. In that case, the identifier of the protocol table to be queried may be sent to the receiving unit 410 together with the lookup table data structure, the directory information, and the error message. The protocol processing engine module continues to process subsequent messages whose addresses differ from that of the error message, thereby avoiding congestion.
The receiving unit 410 correspondingly receives the lookup table data structure, the directory information, and the error message sent by the node controller. If the protocol table must still be queried, the receiving unit 410 also receives the identifier of the protocol table to be queried.
The receiving unit 410 sends the lookup table data structure, the directory information, and the error message to the execution unit 420.
The execution unit 420 is configured to receive the lookup table data structure, the directory information, and the error message and to execute a patch program according to them. The patch program takes the place of the protocol processing engine module and processes the error message in the manner the protocol processing engine would originally have used, generating a new message and corresponding new directory information.
When the execution unit 420 detects an error message that needs to be processed, it receives the lookup table data structure, the directory information, and the error message and executes the patch program according to them, where the patch program processes the error message in place of the protocol processing engine module to generate a new message and corresponding new directory information. If the protocol table must still be queried, the execution unit 420 queries it using the identifier of the protocol table to be queried, and then executes the patch program according to the lookup table data structure, the directory information, and the error message to generate the new message and the corresponding new directory information. For example, the patch program can restore the operation on the above address space from the lookup table data structure and obtain the state of that address space from the directory information. The patch program can therefore process the error message in place of the protocol processing engine, in the engine's original manner, and generate a new message. Once processing is complete, the state of the address space may have changed, so the directory information is updated to produce new directory information.
The execution unit 420 inserts the new message and the new directory information into the second half of the pipeline. The pipeline is formed by the messages arranged in order.
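The division of the patch module into a receiving unit and an execution unit can be pictured as two small cooperating objects. This is a sketch under assumptions: the class and method names are hypothetical, the patch program is passed in as an ordinary callable, and the second half of the pipeline is modeled as a list.

```python
class ExecutionUnit:
    """Runs the patch program and reinserts the results into the pipeline."""

    def __init__(self, patch_program, pipeline_second_half):
        self.patch_program = patch_program
        self.pipeline_second_half = pipeline_second_half

    def execute(self, lookup, directory, message):
        new_message, new_directory = self.patch_program(lookup, directory, message)
        self.pipeline_second_half.append((new_message, new_directory))
        return new_message, new_directory


class ReceivingUnit:
    """Accepts what the node controller sends and forwards it for execution."""

    def __init__(self, execution_unit):
        self.execution_unit = execution_unit

    def receive(self, lookup, directory, message):
        # Forward the three inputs unchanged to the execution unit.
        return self.execution_unit.execute(lookup, directory, message)
```

As the application itself notes, this split is a logical division of function; an implementation could equally merge the two units.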
Referring to FIG. 5, FIG. 5 is a schematic structural diagram of another embodiment of the node controller of the present application. The protocol processing engine fault-tolerance apparatus of this embodiment includes a patch module, which includes an input random access memory 510, a patch processor 520, and an output random access memory 530. One end of the input random access memory 510 is coupled to the protocol processing engine module in the node controller, and its other end is coupled to one end of the patch processor 520. The other end of the patch processor 520 is coupled to one end of the output random access memory 530, and the other end of the output random access memory 530 is coupled to the protocol processing engine module in the node controller.
The input random access memory 510 is configured to receive the lookup table data structure, the directory information, and the error message sent by the node controller when a protocol processing engine module in the node controller encounters an error while executing the Cache coherence protocol, suspends the error message, and continues to process subsequent messages whose addresses differ from that of the error message. For example, when a protocol processing engine in the node controller encounters an error while executing the Cache coherence protocol, it suspends the error message, that is, it pauses processing of both the current error message and any messages with the same address as the current error message. Because every message carries an address (here, a CPU system address), these messages are easy to identify. The protocol processing engine then sends the lookup table data structure, the directory information, and the error message to the input random access memory 510. The lookup table data structure records context data, including the address of the error message, whether processing is in the conflict-handling phase, whether it is in the write-operation phase, and whether the error message was suspended because of a conflict. The directory information includes a pointer and a state: the pointer points to an address space in the processor, and the state indicates the specific state of that address space, which may be exclusive, shared, or invalid. For example, if an address space of a controller was being accessed before the error occurred, the pointer points to that address space and the state field records its specific state, while the lookup table data structure records the context data at the time of the access.
In another embodiment, the lookup table data structure includes multiple field definitions that record the state of the transaction, including the message address, the target ID, whether processing is in the conflict-handling phase, whether it is in the write-operation phase, and whether the message was suspended because of a conflict. When the patch needs to run, the transaction is in an intermediate state, and the patch must know this intermediate state to complete message processing correctly. After the patch finishes, the intermediate state must likewise be output and saved so that it is available the next time the same transaction is processed. Once the transaction finally completes, the data structure is cleared automatically.
If the processor has enough information to process the message, the protocol table need not be queried further; otherwise, the protocol table must still be queried. In that case, the identifier of the protocol table to be queried may be sent to the input random access memory 510 together with the lookup table data structure, the directory information, and the error message. The protocol processing engine continues to process subsequent messages whose addresses differ from that of the error message, thereby avoiding congestion.
The input random access memory 510 correspondingly receives the lookup table data structure, the directory information, and the error message sent by the node controller. If the protocol table must still be queried, the input random access memory 510 also receives the identifier of the protocol table to be queried.
The patch processor 520 is configured to execute a patch program according to the lookup table data structure, the directory information, and the error message, where the patch program processes the error message in place of the protocol processing engine to generate a new message and corresponding new directory information. For example, when the patch processor 520 detects that the input random access memory 510 holds an error message that needs to be processed, it receives the lookup table data structure, the directory information, and the error message and executes the patch program according to them. The patch program takes the place of the protocol processing engine and processes the error message in the manner the protocol processing engine would originally have used, generating a new message and corresponding new directory information. If the protocol table must still be queried, the patch processor 520 queries it using the identifier of the protocol table to be queried, and then executes the patch program according to the lookup table data structure, the directory information, and the error message to generate the new message and the corresponding new directory information. For example, the patch program can restore the operation on the above address space from the lookup table data structure and obtain the state of that address space from the directory information. The patch program can therefore process the error message in place of the protocol processing engine, in the engine's original manner, and generate a new message. Once processing is complete, the state of the address space may have changed, so the directory information is updated to produce new directory information. The patch processor 520 inserts the new message and the new directory information temporarily stored in the patch processor 520 into the second half of the pipeline. The pipeline is formed by the messages arranged in order.
The output random access memory 530 is configured to temporarily store the new message and the new directory information output by the patch processor 520.
The present application further provides a protocol processing engine fault-tolerance system that includes a patch module coupled to a protocol processing engine module in a node controller. For a specific embodiment, refer to FIG. 2 and the related description; the details are not repeated here.
In the above application, a patch module is added. When the protocol processing engine encounters an error, it suspends the error message and continues to process subsequent messages whose addresses differ from that of the error message, while the error message is processed by the patch module. This ensures that the system can continue to operate normally when a protocol error occurs.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into modules or units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, and some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, apparatuses, or units, and may be electrical, mechanical, or of another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as a standalone product, it may be stored in a computer-readable storage medium. On this understanding, the technical solution of the present application, in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform all or some of the steps of the methods described in the embodiments of the present application. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Claims (11)

  1. A patching method, characterized in that, if a protocol processing engine in a node controller encounters an error while executing a Cache coherence protocol, suspends an error message, and continues to process subsequent messages whose addresses differ from that of the error message, an added patch module receives a lookup table data structure, directory information, and the error message sent by the node controller;
    the added patch module executes a patch program according to the lookup table data structure, the directory information, and the error message, wherein the patch program is used to process the error message in place of the protocol processing engine to generate a new message and corresponding new directory information.
  2. The method according to claim 1, characterized in that the error message includes a current error message and messages having the same address as the current error message.
  3. The method according to claim 1, characterized in that, before the step in which the added patch module executes the patch program according to the lookup table data structure, the directory information, and the error message, the method further comprises: the added patch module receiving the lookup table data structure, the directory information, the error message, and an identifier of a protocol table to be queried, all sent by the node controller.
  4. A node controller, characterized by comprising a patch module, the patch module being coupled to a protocol processing engine module in the node controller and comprising a receiving unit and an execution unit,
    the receiving unit being configured to receive a lookup table data structure, directory information, and an error message sent by the node controller when the protocol processing engine module in the node controller encounters an error while executing a Cache coherence protocol, suspends the error message, and continues to process subsequent messages whose addresses differ from that of the error message, and to send the lookup table data structure, the directory information, and the error message to the execution unit;
    the execution unit being configured to receive the lookup table data structure, the directory information, and the error message, and to execute a patch program according to the lookup table data structure, the directory information, and the error message, wherein the patch program is used to process the error message in place of the protocol processing engine module to generate a new message and corresponding new directory information.
  5. The node controller according to claim 4, characterized in that the error message includes a current error message and messages having the same address as the current error message.
  6. The node controller according to claim 4, characterized in that the receiving unit is further configured to receive the lookup table data structure, the directory information, the error message, and an identifier of a protocol table to be queried, all sent by the node controller.
  7. A patch system, characterized by comprising a patch module and a node controller, the patch module being coupled to a protocol processing engine in the node controller,
    the node controller being configured, when the protocol processing engine encounters an error while executing a Cache coherence protocol, to suspend an error message and continue to process subsequent messages whose addresses differ from that of the error message;
    the patch module being configured to receive a lookup table data structure, directory information, and the error message sent by the node controller, and to execute a patch program according to the lookup table data structure, the directory information, and the error message, wherein the patch program is used to process the error message in place of the protocol processing engine to generate a new message and corresponding new directory information.
  8. The system according to claim 7, wherein the error message comprises a current error message and any message having the same address as the current error message.
  9. The system according to claim 7, wherein the patch module is further configured to receive the lookup table data structure, the directory information, the error message, and an identifier of a protocol table to be queried, all of which are sent by the node controller.
  10. The system according to claim 7, wherein the patch module comprises an input random access memory, a patch processor, and an output random access memory, wherein one end of the input random access memory is coupled to one end of the protocol processing engine in the node controller, the other end of the input random access memory is coupled to one end of the patch processor, the other end of the patch processor is coupled to one end of the output random access memory, and the other end of the output random access memory is coupled to the other end of the protocol processing engine in the node controller,
    the input random access memory is configured to store the lookup table data structure, the directory information, and the error message sent by the node controller;
    the patch processor is configured to execute a patch program according to the lookup table data structure, the directory information, and the error message, wherein the patch program is used to process the error message in place of the protocol processing engine, so as to generate a new message and corresponding new directory information;
    the output random access memory is configured to store the new message and the new directory information.
  11. The system according to claim 10, wherein the number of input random access memories and the number of output random access memories are each equal to the number of protocol processing engines, wherein one end of each input random access memory is coupled to one end of a respective protocol processing engine, the other ends of all the input random access memories are coupled to the patch processor, the other end of the patch processor is coupled to one end of every output random access memory, and the other end of each output random access memory is coupled to the other end of a respective protocol processing engine.
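
The data flow recited in claims 7, 8, and 10 can be illustrated with a minimal software sketch. It is not the claimed hardware: the class names, the "bad" payload used to trigger an error, the "next_state" key in the lookup table, and the "patched:" prefix are all hypothetical and chosen only for illustration. The sketch shows the node controller suspending an error message (and later messages with the same address), continuing to process messages with other addresses, and handing the suspended messages to a patch module built as input RAM, patch processor, and output RAM.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Message:
    address: int
    payload: str


@dataclass
class PatchModule:
    """Input RAM -> patch processor -> output RAM (structure of claim 10)."""
    input_ram: deque = field(default_factory=deque)
    output_ram: deque = field(default_factory=deque)

    def receive(self, lookup, directory, message):
        # The input RAM stores what the node controller sends.
        self.input_ram.append((lookup, directory, message))

    def run_patch(self):
        # Hypothetical patch program: it stands in for the protocol
        # processing engine and emits a new message plus new directory info.
        while self.input_ram:
            lookup, directory, msg = self.input_ram.popleft()
            new_msg = Message(msg.address, "patched:" + msg.payload)
            new_dir = dict(directory, state=lookup["next_state"])
            self.output_ram.append((new_msg, new_dir))


class NodeController:
    def __init__(self, patch):
        self.patch = patch
        self.suspended = set()   # addresses whose messages are on hold
        self.processed = []      # results produced by the normal engine

    def engine_process(self, msg):
        # Assumed failure condition, for illustration only.
        if msg.payload == "bad":
            raise ValueError("cache coherence protocol error")
        return "ok:" + msg.payload

    def handle(self, msg, directory):
        lookup = {"next_state": "S"}  # hypothetical lookup table contents
        if msg.address in self.suspended:
            # Same address as an earlier error message: treated as an
            # error message as well (claim 8) and sent to the patch module.
            self.patch.receive(lookup, directory, msg)
            return
        try:
            self.processed.append(self.engine_process(msg))
        except ValueError:
            # Suspend the error message; other addresses keep flowing.
            self.suspended.add(msg.address)
            self.patch.receive(lookup, directory, msg)


patch = PatchModule()
nc = NodeController(patch)
directory = {"state": "M"}
for m in [Message(0x10, "bad"), Message(0x20, "read"), Message(0x10, "write")]:
    nc.handle(m, directory)
patch.run_patch()
```

In this run the message at address 0x20 is handled by the normal engine, while both messages at address 0x10 end up in the output RAM as patched messages with updated directory information. For the arrangement of claim 11, one such input/output RAM pair would exist per protocol processing engine, all sharing a single patch processor.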
PCT/CN2014/086042 2013-09-16 2014-09-05 Patching method, device, and system WO2015035891A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201310421697.5 2013-09-16
CN201310421697.5A CN103488505B (en) 2013-09-16 2013-09-16 Patch method, equipment and system

Publications (1)

Publication Number Publication Date
WO2015035891A1 true WO2015035891A1 (en) 2015-03-19

Family

ID=49828762

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/086042 WO2015035891A1 (en) 2013-09-16 2014-09-05 Patching method, device, and system

Country Status (2)

Country Link
CN (1) CN103488505B (en)
WO (1) WO2015035891A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103488505B (en) * 2013-09-16 2016-03-30 杭州华为数字技术有限公司 Patch method, equipment and system
CN105162725B (en) * 2015-09-22 2018-06-05 浪潮(北京)电子信息产业有限公司 Method and device for preprocessing message addresses in a protocol processing pipeline
CN114125915B (en) * 2022-01-26 2022-04-12 舟谱数据技术南京有限公司 Positioning thermal repair system and method for setting terminal APP

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101093464A (en) * 2006-06-19 2007-12-26 国际商业机器公司 High speed caching coherence method and smp system
US20090006769A1 (en) * 2007-06-26 2009-01-01 International Business Machines Corporation Programmable partitioning for high-performance coherence domains in a multiprocessor system
CN103294611A (en) * 2013-03-22 2013-09-11 浪潮电子信息产业股份有限公司 Server node data cache method based on limited data consistency state
CN103488505A (en) * 2013-09-16 2014-01-01 杭州华为数字技术有限公司 Patching method, device and system

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6892319B2 (en) * 2000-09-08 2005-05-10 Hewlett-Packard Development Company, L.P. Method for verifying abstract memory models of shared memory multiprocessors
US7290085B2 (en) * 2004-11-16 2007-10-30 International Business Machines Corporation Method and system for flexible and efficient protocol table implementation
CN102063406B (en) * 2010-12-21 2012-07-25 清华大学 Network shared Cache for multi-core processor and directory control method thereof
CN102103568B (en) * 2011-01-30 2012-10-10 中国科学院计算技术研究所 Method for realizing cache coherence protocol of chip multiprocessor (CMP) system
CN102880537A (en) * 2012-09-07 2013-01-16 浪潮电子信息产业股份有限公司 Software simulation verification method based on Cache coherence protocol

Also Published As

Publication number Publication date
CN103488505B (en) 2016-03-30
CN103488505A (en) 2014-01-01

Similar Documents

Publication Publication Date Title
WO2019127916A1 (en) Data read/write method and device implemented on the basis of distributed consensus protocol
JP6685323B2 (en) Method, device and system for accessing extended memory
TWI516933B (en) Apparatus and method for memory mirroring and migration at home agent and computer-readable medium
US11550819B2 (en) Synchronization cache seeding
CN106843749B (en) Write request processing method, device and equipment
JP6067230B2 (en) High performance data storage using observable client side memory access
CN103458036B (en) Access device and method of cluster file system
CN110673941B (en) Migration method of micro-services in multiple computer rooms, electronic equipment and storage medium
CN107257957B (en) Application cache replication to secondary applications
US11070979B2 (en) Constructing a scalable storage device, and scaled storage device
US20170168756A1 (en) Storage transactions
JP6514329B2 (en) Memory access method, switch, and multiprocessor system
WO2018054079A1 (en) Method for storing file, first virtual machine and namenode
CN110119304B (en) Interrupt processing method and device and server
US20220179792A1 (en) Memory management device
CN105095254A (en) Method and apparatus for achieving data consistency
CN106855834B (en) Data backup method, device and system
CN105359122B (en) enhanced data transmission in multi-CPU system
WO2015035891A1 (en) Patching method, device, and system
WO2016101759A1 (en) Data routing method, data management device and distributed storage system
CN112052230A (en) Multi-machine room data synchronization method, computing equipment and storage medium
US10324646B2 (en) Node controller and method for responding to request based on node controller
US20230385220A1 (en) Multi-port memory link expander to share data among hosts
CN110309224A (en) Data copy method and device
JPWO2007088582A1 (en) Asynchronous remote procedure call method, asynchronous remote procedure call program and recording medium in shared memory multiprocessor

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14843584

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14843584

Country of ref document: EP

Kind code of ref document: A1