US6553512B1 - Method and apparatus for resolving CPU deadlocks - Google Patents

Method and apparatus for resolving CPU deadlocks

Info

Publication number
US6553512B1
Authority
US
United States
Prior art keywords
cpu
deadlock
mca
bus error
abort
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
US09/505,978
Inventor
James Douglas Gibson
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hewlett Packard Enterprise Development LP
Original Assignee
Hewlett Packard Development Co LP
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hewlett Packard Development Co LP
Priority to US09/505,978
Assigned to HEWLETT-PACKARD COMPANY. Assignment of assignors interest (see document for details). Assignors: GIBSON, J. DOUGLAS
Priority to DE10056828A (DE10056828B4)
Application granted
Publication of US6553512B1
Assigned to HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P. Assignment of assignors interest (see document for details). Assignors: HEWLETT-PACKARD COMPANY
Assigned to HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP. Assignment of assignors interest (see document for details). Assignors: HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P.
Anticipated expiration
Legal status: Expired - Lifetime

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0721Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment within a central processing unit [CPU]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0751Error or fault detection not based on redundancy
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0793Remedial or corrective actions

Abstract

A method and apparatus for handling errors that deadlock a CPU by first attempting to resolve the deadlock without issuing a bus error and without restarting the CPU. If the deadlock cannot be resolved without issuing a bus error, then a bus error is issued and the CPU attempts to restart. The method involves comparing the number of clock cycles taken to execute an instruction to a designated abort value. When an instruction has taken the full abort value of cycles but has not retired, a machine-check abort (MCA) is issued to attempt to resolve the deadlock. The method also involves comparing the number of clock cycles to a larger bus error value. If the MCA does not break the deadlock within a certain period—i.e., before the bus error value is reached—then a bus error is issued and the machine attempts to reset.

Description

TECHNICAL FIELD
The technical field relates generally to digital computer systems and more particularly, but not by way of limitation, to systems for detecting errors within the instructions processed in such computer systems.
BACKGROUND
A central processing unit (CPU) may stop making forward progress for various reasons. For example, a CPU deadlock may occur when the code makes a memory reference to a non-existing memory. In some systems, the memory controllers will not respond to such an erroneous memory reference, causing the system to deadlock, waiting for data to return from a memory that does not exist. When a CPU deadlock occurs, there must be some mechanism for releasing the CPU from this deadlocked state.
One such mechanism is the triggering of a bus error to clear the deadlock. However, triggering a bus error substantially impacts the system by requiring the system to be restarted. In particular, triggering a bus error requires resetting the memory controllers. Triggering a bus error is expensive in terms of time and software required to fix the problem. A bus may have multiple CPUs, in which case all of them usually must be reset upon the triggering of a bus error.
What is needed is a method and an apparatus to resolve the CPU deadlock without triggering a bus error, if possible. In particular, what is needed is a method of attempting to resolve the CPU deadlock first through software, and then, if that method fails, invoking traditional methods of resolving the deadlock, such as triggering a bus error.
SUMMARY
A method is provided for handling errors that deadlock a CPU by first attempting to resolve the deadlock without issuing a bus error and without restarting the computer. If the deadlock cannot be resolved without issuing a bus error, then a bus error is issued and the computer attempts to restart itself. The method involves comparing the number of clock cycles taken to execute an instruction to a designated abort value. When the instruction has taken the full abort value of cycles but has not retired, a machine-check abort (MCA) is issued to attempt to resolve the deadlock. The method also involves comparing the number of clock cycles to a larger bus error value. If the MCA does not break the deadlock within a certain period—i.e., before the bus error value is reached—then a bus error is issued and the computer attempts to reset.
A computer system includes a CPU, a counter, and a software programmable register. The counter determines the number of clock cycles consumed during the execution of an instruction and stores that number in the register. The number of clock cycles taken to execute an instruction is compared to a designated abort value. When an instruction has taken the full abort value of cycles but has not retired, a machine-check abort (MCA) is issued to attempt to resolve the deadlock. The number of clock cycles is also compared to a larger bus error value. If the MCA does not break the deadlock within a certain period (i.e., before the bus error value is reached), then a bus error is issued and the CPU attempts to reset itself.
SUMMARY OF DRAWINGS
FIG. 1 is a flow chart showing a method for resolving CPU deadlocks.
FIG. 2 is a more detailed flow chart of FIG. 1.
FIG. 3 is a block diagram of the computer system capable of resolving CPU deadlocks.
DETAILED DESCRIPTION
A means is provided for handling CPU deadlocks without causing a bus error and without resetting the computer system, when possible. Many CPU deadlocks are caused by memory bus errors and cannot be resolved without resetting the system. Some CPU deadlocks may be caused by errors other than memory bus errors. This is particularly true in mixed-architecture CPUs (i.e., those architectures capable of processing more than one type of instruction set) such as the IA-64 architecture. For example, a deadlock may be caused during hardware emulation of the IA-32 architecture on the CPU. In cases in which the deadlock is caused by errors other than a memory bus error, the system may be able to recover without resetting the system. Existing methods of handling CPU deadlocks simply cause the bus error every time and reset the system. An apparatus and a method attempt to resolve CPU deadlocks without resetting the system by invoking a machine-check abort (MCA) before triggering a memory bus error. This gives the system the opportunity to resolve the error without resetting if that is possible.
FIG. 1 shows a flow chart of the method of operation for resolving CPU deadlocks. The CPU processes instructions 110. This processing continues until a CPU error occurs, which deadlocks the CPU. If an error occurs 120, then the system triggers 130 an MCA. The MCA invokes a software mechanism that attempts to resolve the CPU error without resetting the CPU. If the MCA is successful 140, then the CPU continues processing instructions 110. If the machine-check abort fails to resolve the CPU problem, then a bus error is triggered 150 and the CPU attempts to reset itself in the traditional fashion.
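The two-stage escalation of FIG. 1 can be illustrated with a minimal Python sketch. The function name `handle_deadlock` and the callback `try_mca` are hypothetical names chosen for illustration; the source describes only the flow, not any software interface.

```python
from typing import Callable


def handle_deadlock(try_mca: Callable[[], bool]) -> str:
    """Return the action taken once a deadlock has been detected (step 120).

    try_mca is a hypothetical callback standing in for the MCA software
    mechanism; it returns True if the deadlock was resolved.
    """
    # Step 130: trigger the MCA, which attempts recovery without a reset.
    if try_mca():
        # Step 140 succeeded: the CPU continues processing instructions (110).
        return "continue"
    # Step 150: the MCA failed, so fall back to a bus error and a reset.
    return "bus_error"
```

The bus error is reached only through the failure branch, which reflects the intent that the expensive path is taken only after the cheaper one has been tried.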
FIG. 2 is a more detailed flow chart of one embodiment of the operation for resolving CPU deadlocks. Initially, a counter is set 102 to zero. The CPU then attempts to process 112 an instruction. As soon as the instruction is retired 104, the counter is reset 102 to zero and the process begins anew. If the instruction is not retired 104, then the counter is incremented 106. A test function 108 determines whether the MCA has been disabled. As explained below, when an MCA issues, future MCAs are disabled 132. If the MCA is disabled, then the test function 108 skips the MCA-related functions 122, 130, 132. If the MCA is not disabled, then the CPU continues to process the instruction as usual 112, for so long as the counter does not reach a predefined abort value 120 (2^(n−1) in the embodiment shown). If the counter does reach the abort value, then the MCA is triggered 130, with the hope that the MCA will resolve the CPU problem. At the same time, a software programmable bit is also set 132 to prevent any further MCAs from issuing. The counter is then compared 142 to a predefined bus error value (2^n in the embodiment shown). If the counter has not reached the bus error value, then the CPU continues processing 112. If the counter reaches or exceeds the bus error value, then a bus error is triggered 150, and the CPU tries to reset 160. The CPU then clears the counter 102 and continues processing 112.
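The per-cycle logic of FIG. 2 can be modeled in software as a small state machine. The following is a sketch under stated assumptions: the class name `DeadlockWatchdog`, the method `tick`, and the returned event strings are hypothetical, and the abort and bus error values are read as 2^(n−1) and 2^n respectively. The source does not say when the MCA-disable bit is cleared again, so the sketch leaves it set.

```python
from dataclasses import dataclass


@dataclass
class DeadlockWatchdog:
    n: int = 4                  # hypothetical exponent defining the thresholds
    counter: int = 0            # cycles since the last retired instruction
    mca_disabled: bool = False  # software-programmable bit (step 132)

    @property
    def abort_value(self) -> int:
        return 1 << (self.n - 1)   # assumed reading: 2^(n-1)

    @property
    def bus_error_value(self) -> int:
        return 1 << self.n         # assumed reading: 2^n

    def tick(self, retired: bool) -> str:
        """Advance one clock cycle; return 'ok', 'mca', or 'bus_error'."""
        if retired:
            self.counter = 0       # step 102: retirement resets the counter
            return "ok"
        self.counter += 1          # step 106
        event = "ok"
        # Steps 108/122: skip the MCA path once an MCA has already issued.
        if not self.mca_disabled and self.counter >= self.abort_value:
            self.mca_disabled = True   # step 132: block further MCAs
            event = "mca"              # step 130
        # Step 142: compare against the larger bus error value.
        if self.counter >= self.bus_error_value:
            self.counter = 0           # steps 150 and 160, then back to 102
            return "bus_error"
        return event
```

With `n=3`, an instruction that never retires produces an MCA on the fourth cycle and a bus error on the eighth, so the MCA handler has as many cycles to work as elapsed before it was invoked.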
FIG. 3 shows a block diagram of the hardware for resolving CPU deadlocks. A computer system 10 has a CPU 20 electrically connected to a counter 30. The counter 30 increments 112 every clock cycle while the CPU 20 attempts to execute an instruction. When the counter 30 reaches 122 a predetermined abort value (represented as 2^(n−1) in the embodiment of FIG. 3), then the system invokes an MCA 40 to attempt to resolve the problem. Meanwhile, the counter 30 continues to increment 112. If the deadlock is not resolved by the MCA 40 before the counter 30 reaches 142 a predetermined bus error value (represented as 2^n in the embodiment of FIG. 3), then traditional methods of handling the deadlock are used, such as invoking a bus error 50 and restarting the system. The counter 30 receives a retire instruction signal 22 from the CPU 20 whenever an instruction retires. That retire instruction signal 22 resets the counter 30 as illustrated by the reset port 32 shown.
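One plausible reason for choosing powers of two as thresholds is that each comparison reduces to checking whether any counter bit at or above a given position is set. This is an inference, not something the source states. The sketch below assumes the thresholds are 2^(n−1) and 2^n; the function name `threshold_signals` is hypothetical.

```python
from typing import Tuple


def threshold_signals(counter: int, n: int) -> Tuple[bool, bool]:
    """Return (abort_reached, bus_error_reached) for a counter value.

    Assumes an abort value of 2**(n-1) and a bus error value of 2**n.
    """
    # Any bit at position n-1 or higher means counter >= 2**(n-1).
    abort_reached = (counter >> (n - 1)) != 0
    # Any bit at position n or higher means counter >= 2**n.
    bus_error_reached = (counter >> n) != 0
    return abort_reached, bus_error_reached
```

For `n=4`, the abort signal first asserts at a count of 8 and the bus error signal at 16.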
In use, the MCA causes the current CPU state to be destroyed and uses a special software handler that tries to repair the CPU. The MCA is an event that causes the system to restart at a particular memory address so that it can attempt to repair the CPU. The MCA may run on all of the CPUs or just some, for instance if only some CPUs take the MCA. The CPU quits the execution of its current code and the CPU is restarted at a particular memory address, from which code is executed. By triggering the MCA, only the current CPU is reset, and the machine tries to resolve the deadlock without resetting any other CPUs on the bus.
The MCA checks the status registers. In the event that the MCA determines that the deadlock cannot be resolved without resetting the entire system, then it triggers a bus error.
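The handler's decision can be sketched as follows. The status flags and the function `mca_handler` are hypothetical: the source says only that the MCA checks the status registers and triggers a bus error when the deadlock cannot be resolved without a full reset. It does not name any register fields. The sketch assumes that a memory bus error is the unrecoverable case, in line with the description above.

```python
from enum import Flag, auto


class Status(Flag):
    """Hypothetical status-register bits, for illustration only."""
    NONE = 0
    MEMORY_BUS_ERROR = auto()
    IA32_EMULATION_FAULT = auto()


def mca_handler(status: Status) -> str:
    """Decide whether the deadlock can be repaired without a system reset."""
    if status & Status.MEMORY_BUS_ERROR:
        # Assumed unrecoverable: escalate to the traditional bus error.
        return "bus_error"
    # Otherwise attempt to repair the current CPU and resume execution.
    return "resume"
```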
Although the method and apparatus for resolving CPU deadlocks have been described in detail with reference to certain embodiments thereof, variations are possible. For example, although the relative values of the abort value and the bus error value and certain other specific information were given as examples, these examples were by way of illustration only, and not by way of limitation. The apparatus may be embodied in other specific forms without departing from the essential spirit or attributes thereof. It is desired that the embodiments described herein be considered in all respects as illustrative, not restrictive, and that reference be made to the appended claims for determining the scope of the invention.

Claims (20)

What is claimed is:
1. A method for handling CPU deadlock comprising:
detecting a CPU deadlock;
initiating a machine-check abort (MCA);
determining whether the MCA cleared the CPU deadlock;
if the MCA did clear the CPU error, continuing processing; and
if the MCA did not clear the CPU error, triggering a bus error.
2. The method of claim 1, further comprising:
counting a number of clock cycles while an instruction is being processed;
comparing the number of clock cycles to an abort value; and
if the number of clock cycles equals or exceeds the abort value, initiating the MCA.
3. The method of claim 2, further comprising:
comparing the number of clock cycles to a bus error value; and
if the number of clock cycles issued equals or exceeds the bus error value, triggering the bus error and attempting to reset the CPU.
4. The method of claim 3, wherein the bus error value exceeds the abort value.
5. The method of claim 4, wherein the CPU deadlock occurs in a CPU that is capable of implementing a plurality of instruction sets, and wherein the MCA attempts to clear an error that is not a memory bus error.
6. The method of claim 1, wherein the MCA attempts to clear an error that is not a memory bus error.
7. The method of claim 6, wherein the CPU deadlock occurs in a CPU that is capable of implementing a plurality of instruction sets.
8. A method for resolving a deadlock in a mixed-architecture CPU, comprising:
detecting a deadlock;
attempting to resolve the deadlock without resetting the CPU; and
if the deadlock cannot be resolved without resetting the CPU, issuing a memory bus error to reset the CPU.
9. The method of claim 8, further comprising:
counting the number of cycles that an instruction has been pending; and
issuing the memory bus error when the number of cycles reaches a bus error value.
10. The method of claim 8, further comprising counting the number of clock cycles that an instruction has been pending, and wherein the step of attempting comprises using a machine-check abort (MCA) to attempt to resolve the deadlock without resetting the CPU by invoking the MCA when the number of cycles reaches an abort value.
11. A method for resolving a deadlock in a mixed-architecture CPU, comprising:
detecting a deadlock;
using a machine-check abort (MCA) to attempt to resolve the deadlock without resetting the CPU; and
if the deadlock cannot be resolved without resetting the CPU, resetting the CPU.
12. The method of claim 11, further comprising:
counting the number of clock cycles that an instruction has been pending; and
invoking the MCA when the number of cycles reaches an abort value.
13. The method of claim 12, wherein the abort value is less than a memory bus error value at which a memory bus error is caused.
14. A computer system comprising:
a CPU;
a counter electrically connected to the CPU that increments every CPU cycle and resets on each instruction retirement; and
a software programmable register electrically connected to the counter, which register
detects when the counter reaches or exceeds an abort count; and
issues a machine-check abort (MCA) when the counter equals or exceeds the abort count.
15. The computer system of claim 14, wherein the MCA attempts to resolve a CPU deadlock without resetting the system.
16. The computer system of claim 15, wherein the MCA is capable of resolving CPU deadlocks that are not caused by memory bus errors.
17. The computer system of claim 15, wherein the software programmable register sets a software programmable bit, which bit disables further machine-check aborts from the computer.
18. The computer system of claim 14, wherein the software programmable register detects whether the counter reaches or exceeds a bus error value, and if the counter reaches or exceeds a bus error value, triggers a bus error.
19. The computer system of claim 18, wherein the bus error value is greater than the abort value.
20. The computer system of claim 14, wherein the CPU supports a plurality of instruction sets.
US09/505,978 2000-02-16 2000-02-16 Method and apparatus for resolving CPU deadlocks Expired - Lifetime US6553512B1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US09/505,978 US6553512B1 (en) 2000-02-16 2000-02-16 Method and apparatus for resolving CPU deadlocks
DE10056828A DE10056828B4 (en) 2000-02-16 2000-11-16 Method and device for releasing CPU deadlocks

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US09/505,978 US6553512B1 (en) 2000-02-16 2000-02-16 Method and apparatus for resolving CPU deadlocks

Publications (1)

Publication Number Publication Date
US6553512B1 (en) 2003-04-22

Family

ID=24012664

Family Applications (1)

Application Number Title Priority Date Filing Date
US09/505,978 Expired - Lifetime US6553512B1 (en) 2000-02-16 2000-02-16 Method and apparatus for resolving CPU deadlocks

Country Status (2)

Country Link
US (1) US6553512B1 (en)
DE (1) DE10056828B4 (en)

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20010023392A1 (en) * 2000-03-17 2001-09-20 Norihiro Nakatsuhama Abnormality detection device for detecting an abnormality in a communication bus
US20020116664A1 (en) * 2000-12-22 2002-08-22 Steven Tu Method and apparatus for machine check abort handling in a multiprocessing system
US20020124207A1 (en) * 2001-01-23 2002-09-05 Masao Ohwada System for facilitated analysis of PCI bus malfuntion
US20040054876A1 (en) * 2002-09-13 2004-03-18 Grisenthwaite Richard Roy Synchronising pipelines in a data processing apparatus
US20040054837A1 (en) * 2001-01-31 2004-03-18 International Business Machines Corporation Controlling flow of data between data processing systems via a memory
US6742135B1 (en) * 2000-11-07 2004-05-25 At&T Corp. Fault-tolerant match-and-set locking mechanism for multiprocessor systems
US6973590B1 (en) * 2001-11-14 2005-12-06 Unisys Corporation Terminating a child process without risk of data corruption to a shared resource for subsequent processes
US20060061689A1 (en) * 2004-09-09 2006-03-23 Chen Jiann-Tsuen Deadlock detection and recovery logic for flow control based data path design
US20060282205A1 (en) * 2005-06-09 2006-12-14 Lange Arthur F System for guiding a farm implement between swaths
US20070028127A1 (en) * 2005-07-26 2007-02-01 Samsung Electronics Co., Ltd. Universal serial bus system, and method of driving the same
US7383114B1 (en) 2003-08-29 2008-06-03 Trimble Navigation Limited Method and apparatus for steering a farm implement to a path
US20080263379A1 (en) * 2007-04-17 2008-10-23 Advanced Micro Devices, Inc. Watchdog timer device and methods thereof
CN103268276A (en) * 2012-03-29 2013-08-28 威盛电子股份有限公司 Deadlock/livelock resolution using service processor
CN105849705A (en) * 2014-12-13 2016-08-10 上海兆芯集成电路有限公司 Pattern detector for detecting hangs
EP3066559A4 (en) * 2014-12-13 2017-07-19 VIA Alliance Semiconductor Co., Ltd. Logic analyzer for detecting hangs
US9753799B2 (en) 2014-12-13 2017-09-05 Via Alliance Semiconductor Co., Ltd. Conditional pattern detector for detecting hangs
US10169137B2 (en) 2015-11-18 2019-01-01 International Business Machines Corporation Dynamically detecting and interrupting excessive execution time
US10324842B2 (en) 2014-12-13 2019-06-18 Via Alliance Semiconductor Co., Ltd Distributed hang recovery logic

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4348722A (en) * 1980-04-03 1982-09-07 Motorola, Inc. Bus error recognition for microprogrammed data processor
US5006980A (en) * 1988-07-20 1991-04-09 Digital Equipment Corporation Pipelined digital CPU with deadlock resolution
US5664088A (en) * 1995-09-12 1997-09-02 Lucent Technologies Inc. Method for deadlock recovery using consistent global checkpoints
US5682551A (en) * 1993-03-02 1997-10-28 Digital Equipment Corporation System for checking the acceptance of I/O request to an interface using software visible instruction which provides a status signal and performs operations in response thereto
US5889975A (en) * 1996-11-07 1999-03-30 Intel Corporation Method and apparatus permitting the use of a pipe stage having an unknown depth with a single microprocessor core
US6247118B1 (en) * 1998-06-05 2001-06-12 Mcdonnell Douglas Corporation Systems and methods for transient error recovery in reduced instruction set computer processors via instruction retry
US6292910B1 (en) * 1998-09-14 2001-09-18 Intel Corporation Method and apparatus for detecting a bus deadlock in an electronic system
US6453430B1 (en) * 1999-05-06 2002-09-17 Cisco Technology, Inc. Apparatus and methods for controlling restart conditions of a faulted process

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS61276497A (en) * 1985-05-31 1986-12-06 Hitachi Ltd Method for recognizing deadlock
US6151655A (en) * 1998-04-30 2000-11-21 International Business Machines Corporation Computer system deadlock request resolution using timed pulses

Non-Patent Citations (2)

Title
Microsoft Corporation. Microsoft Knowledge Base Article-Q171773, How to Eliminate a Process That Is Not Responding Without Restarting the Computer. Jul. 23, 1997. *

Cited By (33)

Publication number Priority date Publication date Assignee Title
US7424383B2 (en) * 2000-03-17 2008-09-09 Fujitsu Limited Abnormality detection device for detecting an abnormality in a communication bus
US20010023392A1 (en) * 2000-03-17 2001-09-20 Norihiro Nakatsuhama Abnormality detection device for detecting an abnormality in a communication bus
US7536582B1 (en) * 2000-11-07 2009-05-19 At&T Corp. Fault-tolerant match-and-set locking mechanism for multiprocessor systems
US6742135B1 (en) * 2000-11-07 2004-05-25 At&T Corp. Fault-tolerant match-and-set locking mechanism for multiprocessor systems
US6684346B2 (en) * 2000-12-22 2004-01-27 Intel Corporation Method and apparatus for machine check abort handling in a multiprocessing system
US20020116664A1 (en) * 2000-12-22 2002-08-22 Steven Tu Method and apparatus for machine check abort handling in a multiprocessing system
US7216252B1 (en) * 2000-12-22 2007-05-08 Intel Corporation Method and apparatus for machine check abort handling in a multiprocessing system
US7003701B2 (en) * 2001-01-23 2006-02-21 Nec Corporation System for facilitated analysis of PCI bus malfunction
US20020124207A1 (en) * 2001-01-23 2002-09-05 Masao Ohwada System for facilitated analysis of PCI bus malfuntion
US20040054837A1 (en) * 2001-01-31 2004-03-18 International Business Machines Corporation Controlling flow of data between data processing systems via a memory
US7409468B2 (en) * 2001-01-31 2008-08-05 International Business Machines Corporation Controlling flow of data between data processing systems via a memory
US6973590B1 (en) * 2001-11-14 2005-12-06 Unisys Corporation Terminating a child process without risk of data corruption to a shared resource for subsequent processes
US20040054876A1 (en) * 2002-09-13 2004-03-18 Grisenthwaite Richard Roy Synchronising pipelines in a data processing apparatus
US7024543B2 (en) * 2002-09-13 2006-04-04 Arm Limited Synchronising pipelines in a data processing apparatus
US7383114B1 (en) 2003-08-29 2008-06-03 Trimble Navigation Limited Method and apparatus for steering a farm implement to a path
US7418625B2 (en) * 2004-09-09 2008-08-26 Broadcom Corporation Deadlock detection and recovery logic for flow control based data path design
US20060061689A1 (en) * 2004-09-09 2006-03-23 Chen Jiann-Tsuen Deadlock detection and recovery logic for flow control based data path design
US20060282205A1 (en) * 2005-06-09 2006-12-14 Lange Arthur F System for guiding a farm implement between swaths
US7860628B2 (en) 2005-06-09 2010-12-28 Trimble Navigation Limited System for guiding a farm implement between swaths
US20070028127A1 (en) * 2005-07-26 2007-02-01 Samsung Electronics Co., Ltd. Universal serial bus system, and method of driving the same
US20080263379A1 (en) * 2007-04-17 2008-10-23 Advanced Micro Devices, Inc. Watchdog timer device and methods thereof
US9575816B2 (en) 2012-03-29 2017-02-21 Via Technologies, Inc. Deadlock/livelock resolution using service processor
EP2645237A3 (en) * 2012-03-29 2015-05-27 VIA Technologies, Inc. Deadlock/livelock resolution using service processor
CN103268276A (en) * 2012-03-29 2013-08-28 威盛电子股份有限公司 Deadlock/livelock resolution using service processor
CN105849705A (en) * 2014-12-13 2016-08-10 上海兆芯集成电路有限公司 Pattern detector for detecting hangs
EP3066559A4 (en) * 2014-12-13 2017-07-19 VIA Alliance Semiconductor Co., Ltd. Logic analyzer for detecting hangs
EP3047380A4 (en) * 2014-12-13 2017-07-19 VIA Alliance Semiconductor Co., Ltd. Pattern detector for detecting hangs
US9753799B2 (en) 2014-12-13 2017-09-05 Via Alliance Semiconductor Co., Ltd. Conditional pattern detector for detecting hangs
US9946651B2 (en) 2014-12-13 2018-04-17 Via Alliance Semiconductor Co., Ltd Pattern detector for detecting hangs
US10067871B2 (en) 2014-12-13 2018-09-04 Via Alliance Semiconductor Co., Ltd Logic analyzer for detecting hangs
CN105849705B (en) * 2014-12-13 2019-06-04 上海兆芯集成电路有限公司 Logic analyzer for detecting hangs
US10324842B2 (en) 2014-12-13 2019-06-18 Via Alliance Semiconductor Co., Ltd Distributed hang recovery logic
US10169137B2 (en) 2015-11-18 2019-01-01 International Business Machines Corporation Dynamically detecting and interrupting excessive execution time

Also Published As

Publication number Publication date
DE10056828B4 (en) 2004-05-06
DE10056828A1 (en) 2001-09-06

Similar Documents

Publication Publication Date Title
US6553512B1 (en) Method and apparatus for resolving CPU deadlocks
US6012154A (en) Method and apparatus for detecting and recovering from computer system malfunction
US6438709B2 (en) Method for recovering from computer system lockup condition
EP1588260B1 (en) Hot plug interfaces and failure handling
EP0955585B1 (en) Method and system for handling bus errors in a data processing system
US6880113B2 (en) Conditional hardware scan dump data capture
WO2020239060A1 (en) Error recovery method and apparatus
US10761776B2 (en) Method for handling command in conflict scenario in non-volatile memory express (NVMe) based solid-state drive (SSD) controller
JP2017527902A5 (en)
US20200151048A1 (en) System for configurable error handling
US20100192029A1 (en) Systems and Methods for Logging Correctable Memory Errors
US7200772B2 (en) Methods and apparatus to reinitiate failed processors in multiple-processor systems
EP1967950A2 (en) Multiprocessor system for continuing program execution upon detection of abnormality
US6799285B2 (en) Self-checking multi-threaded processor
US8028189B2 (en) Recoverable machine check handling
JPH06324914A (en) Runaway detecting method for computer
JP2006344087A (en) Task management device for controller and task management method for controller
US11366710B1 (en) Methods and systems for reducing downtime from system management mode in a computer system
CN107423029B (en) Calculation unit
US5673391A (en) Hardware retry trap for millicoded processor
JP2008262557A (en) Task management device for controller and task management method for controller
US20070150703A1 (en) Breaking a lock situation in a system
JP2695775B2 (en) How to recover from computer system malfunction
JPS6159545A (en) Method for detecting interface faults of data processor
JPS6256544B2 (en)

Legal Events

Date Code Title Description

AS Assignment
Owner name: HEWLETT-PACKARD COMPANY, COLORADO
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:GIBSON, J. DOUGLAS;REEL/FRAME:010966/0098
Effective date: 20000627

STCF Information on status: patent grant
Free format text: PATENTED CASE

AS Assignment
Owner name: HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P., TEXAS
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:HEWLETT-PACKARD COMPANY;REEL/FRAME:013862/0623
Effective date: 20030728

FPAY Fee payment
Year of fee payment: 4

FPAY Fee payment
Year of fee payment: 8

FPAY Fee payment
Year of fee payment: 12

AS Assignment
Owner name: HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP, TEXAS
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P.;REEL/FRAME:037079/0001
Effective date: 20151027