WO2000055953A1 - System and method of event management and early fault detection - Google Patents


Info

Publication number
WO2000055953A1
Authority
WO
WIPO (PCT)
Prior art keywords
event
client
fault
block
list
Application number
PCT/US2000/006919
Other languages
French (fr)
Inventor
Kumar Gajjar
Nghiep Tran
Original Assignee
Smartsan Systems, Inc.
Priority date
1999-03-15
Filing date
2000-03-15
Publication date
2000-09-21
Application filed by Smartsan Systems, Inc.
Priority to AU38892/00A (AU3889200A)
Publication of WO2000055953A1


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0681 Configuration of triggering conditions
    • H04L41/069 Management of faults, events, alarms or notifications using logs of notifications; Post-processing of notifications
    • H04L41/22 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks comprising specially adapted graphical user interfaces [GUI]
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/16 Threshold monitoring
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/40 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass for recovering from a failure of a protocol instance or entity, e.g. service redundancy protocols, protocol state redundancy or protocol service redirection


Abstract

Users or clients of a computer system can set and change, as needed, the fault reporting, fault logging, fault notification, and fault trigger point thresholds for any event in a network or system. A 'point and click' graphical user interface (GUI) allows users to perform these tasks conveniently, or they can be performed by calling API functions. Another advantage is the integration of all appropriate fault management functions into a central point, an Event Manager (EM), including an Event Table, Registration, Event Thresholding, Logging and Notification, as well as Recovery Operations or Actions to be Taken.

Description

SYSTEM AND METHOD OF EVENT MANAGEMENT AND EARLY FAULT DETECTION
CROSS-REFERENCE TO RELATED APPLICATIONS
The present application is related to, and extends in a novel way the tasks and utility of, copending U.S. Provisional Patent Application Serial No. 60/124,494, entitled "System and Method of Zoning and Access Control, Event Management and Network Management in a Computer Network," filed on March 15, 1999.
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates generally to network fault management via a software event manager (EM) inserted in the network at a central point in the system and controlled by the user through a Graphical User Interface (GUI).
2. Description of the Background Art
The introduction and proliferation of fibre channel has greatly increased network connectivity between central servers and local storage, so that many more devices can be connected to a network over wider geographical areas.
Fibre channel is an ANSI-standard, high-speed data communications technology providing gigabit-per-second transmission rates for server/storage and for large, high-performance, geographically dispersed networking environments. Increases in computer network speed, size and connectivity require that early fault detection and fault management controls be embedded in the central server, or elsewhere with connections to all the devices and storage that make up the network. The main components or functions of the fault manager or EM are:
1. Event Table
2. Registration
3. Event Thresholding, Logging and Notification
4. Recovery Operations or Actions to be Taken.
Prior art network fault or event management allows only predefined, built-in controls, that is, preset, fixed trigger thresholds. Therefore, there remains a need for an improved fault management system that lets users conveniently set and change the trigger point threshold, as needed by each application, through an easy-to-use method such as a Graphical User Interface (GUI).
SUMMARY OF THE INVENTION
The present invention provides a software system and method for the users or clients of the system to set and change, as needed, the fault reporting, fault logging, fault notification, and fault trigger point thresholds for any event in the network or system. A "point and click" graphical user interface (GUI) can allow users to perform these tasks, or they can be performed by calling API functions.
Another advantage of the present invention is the integration into a central point, the EM, of all appropriate fault management functions as follows:
1. Event Table
2. Registration
3. Event Thresholding, Logging and Notification
4. Recovery Operations or Actions to be Taken.
With the EM in the system, other subsystem modules do not need code to monitor and track events or faults, which frees them from the burden of bookkeeping, reporting and the other functions the EM provides.
Other advantages and features of the present invention will be apparent from the drawings and detailed description as set forth below.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 is a block diagram of one embodiment of a controller device according to the invention embodying an Event Manager (EM) for managing events and faults in a computer network;
Figure 2 is a block diagram illustrating one embodiment of the process of client registration;
Figure 3 is a block diagram of nested hierarchical blocks illustrating one embodiment of the format and the ordered information content in the Client Event Table;
Figure 4 is a block diagram illustrating one embodiment of the Event Notification Registration List;
Figure 5 is a flow chart diagram of the Event Notification process in accordance with the present invention;
Figure 6 is a block diagram of one embodiment illustrating the Event Threshold Registration List;
Figure 7 is a flow chart diagram of Event Thresholding in accordance with the present invention;
Figure 8 is a flow chart diagram of Ordered Event Thresholding in accordance with the present invention;
Figure 9 is a block diagram of one embodiment of the Event Reporting feature; and
Figure 10 is a block diagram of an Event Reporting example in accordance with the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
The present invention is a novel system and method of providing fault management and early fault detection, reporting and system response in a computer or logic device network that reaches all the way down to the device level, including logical devices.
Figure 1 is a schematic block diagram illustrating one embodiment of the controller or EM 100 and identifying its key elements. These elements include the Processor Module 110, comprising a Processor 120 connected to a random access memory (RAM) 130, a non-volatile memory 140, a read-only memory (ROM) 150, a Cache/Staging memory 170, and the input/output connections to all the relevant components of the network (FC I/Os 172, 174, etc., and I/Os 182, 184, 186, etc.).
During the initialization process, all the software modules or clients initialize themselves, generate an Event Table listing all the possible events they can detect, and register themselves with the EM. The EM then adds each new client to the Client List 132 region of the RAM 130, as described in detail with reference to Figure 2. During registration, the client passes a pointer to its Client Event Table to the EM.
Next, during operation of the system, if an event or fault occurs, the client calls the EM with that event or fault. The EM then references the Action Table and takes the appropriate measures defined in it, as described in detail with reference to Figure 3.
Now referring to Figure 2, a block diagram illustrates one of the ways that a client XYZ registers with the EM. First, the client assembles an event/fault table, as shown in block A, which lists, at the required level of detail, the possible or anticipated events that can occur to the client and its components. This table is described in detail with reference to Figure 3. Next, in step B, client XYZ registers with the EM by passing its Client Identification (ID) and a pointer to its Event Table. The EM, in turn, adds client XYZ to its Client List, as shown in block C. The EM also establishes a link, D, to client XYZ's Event Table.
Figure 3 is a set of nested hierarchical blocks of lists illustrating the format and the ordered information content of the Client Event Table 300. In block 300, all the event elements for one client, say 301 to 309, are listed in ascending numerical order. For each element, say 303, the event element block 360 gives a list of attributes or tags (361-366) that identify the components 361, 363, the event description and its severity 364, 362, the recommended correction actions 365, and additional actions 366. Each of the tags 361-366 also corresponds to an entire block of ordered lists of choices, attributes and actions, down to the level of detail required to identify the component, the fault and its severity, and to take component and system fault remedial actions, as illustrated by the information inside the right-hand blocks 361-366 of Figure 3. A designer skilled in the art can add further options, choices or list members to the lists in blocks 300 and 360-366 as suitable for or required by a specific application.
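The registration steps of Figures 2 and 3 can be sketched in C as follows. This is a minimal illustration, not the patent's implementation: the type names, the fixed-size Client List, the function `em_register_client`, and the rule that event numbers must be strictly ascending (no two elements sharing a number) are all assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Event Element (block 360 of Figure 3); this field set is a simplification. */
typedef struct {
    uint32_t    event_number;
    const char *component;     /* component tags (361, 363) */
    const char *description;   /* event description (364) */
    int         severity;      /* severity (362) */
    const char *correction;    /* recommended correction action (365) */
} EventElement;

/* A client's Event Table (block 300): elements in ascending event order. */
typedef struct {
    const EventElement *elements;
    size_t              count;
} ClientEventTable;

/* One Client List entry: the client ID plus link D to its Event Table. */
typedef struct {
    uint32_t                client_id;
    const ClientEventTable *event_table;
} ClientEntry;

#define EM_MAX_CLIENTS 16  /* arbitrary capacity for this sketch */

static ClientEntry em_client_list[EM_MAX_CLIENTS];  /* stands in for Client List 132 */
static size_t      em_client_count;

/* Figure 2, step B: register a client ID and a pointer to its Event Table.
   Returns 0 on success, or -1 if the table is not strictly ascending,
   the ID is already registered, or the Client List is full. */
int em_register_client(uint32_t client_id, const ClientEventTable *table)
{
    assert(table != NULL);
    for (size_t i = 1; i < table->count; i++)
        if (table->elements[i - 1].event_number >= table->elements[i].event_number)
            return -1;
    for (size_t i = 0; i < em_client_count; i++)
        if (em_client_list[i].client_id == client_id)
            return -1;
    if (em_client_count == EM_MAX_CLIENTS)
        return -1;
    /* Step C: add the client to the Client List, keeping link D. */
    em_client_list[em_client_count++] = (ClientEntry){ client_id, table };
    return 0;
}
```

A client would build its table during initialization and call `em_register_client` once. A fixed array stands in for the Client List region of RAM only to keep the sketch free of dynamic allocation.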
Next, the Event Notification Registration feature allows a client to register itself with the EM in order to be notified after the occurrence of a specified event.
Figure 4 illustrates in detail the Event Notification Registration (ENR) function of the EM. When a client registers an Event Notification request, the EM creates an ENR Element (450, 136, 142), adds it to its ENR List 436, and increments the ENRCount. The ENR function, block 436 of Figure 4, stores this information for the given client when the client sends block 436 an ENR in the format of blocks 450, 451, etc. of Figure 4. The format of a client ENR, say block 450, includes the Client ID, the Event Code, data on the previous event occurrence (ENRPrevP), data on the next event occurrence (ENRNextP), and a Callback Function List, whose contents are shown in block 480. For every event it receives, the EM 120 checks the ENR Link List of Figure 4 for a match. If a match is found, the EM calls the Callback Function 480 that the client registered in the EM Control Block 436, so that the client can take the appropriate response.
Figure 5 illustrates in detail, via a flow chart, the Event Notification process. The flow chart starts with the EM receiving an event. The EM then walks the notification entries in the list of block 136 or block 436, first checking whether there is a next entry. If there is not, it exits. If there is, it checks whether the event matches the one stored in that entry. If not, it returns to the top of the flow chart to test the next entry. If so, it calls the callback functions in the callback function list 480 and then returns to the top of the flow chart to check the next notification entry.
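A minimal C sketch of the ENR List of Figure 4 and the notification loop of Figure 5 follows. It assumes that ENRPrevP and ENRNextP are the links of a doubly linked list and caps the Callback Function List at four entries; `em_register_notification`, `em_notify` and `demo_callback` are hypothetical names, not the patent's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef void (*EmCallback)(uint32_t client_id, uint32_t event_code);

#define ENR_MAX_CALLBACKS 4  /* assumed capacity of the Callback Function List */

/* One ENR Element (block 450). */
typedef struct EnrElement {
    uint32_t           client_id;
    uint32_t           event_code;
    struct EnrElement *prev;   /* ENRPrevP, read here as a list link */
    struct EnrElement *next;   /* ENRNextP, read here as a list link */
    EmCallback         callbacks[ENR_MAX_CALLBACKS];  /* block 480 */
    size_t             callback_count;
} EnrElement;

static EnrElement *enr_head;   /* ENR List (block 436) */
static size_t      enr_count;  /* ENRCount */

/* Create an ENR Element, link it at the head of the list, bump ENRCount. */
EnrElement *em_register_notification(uint32_t client_id, uint32_t event_code,
                                     EmCallback cb)
{
    assert(cb != NULL);
    EnrElement *e = calloc(1, sizeof *e);
    if (e == NULL)
        return NULL;
    e->client_id = client_id;
    e->event_code = event_code;
    e->callbacks[0] = cb;
    e->callback_count = 1;
    e->next = enr_head;
    if (enr_head != NULL)
        enr_head->prev = e;
    enr_head = e;
    enr_count++;
    return e;
}

/* Figure 5: visit every entry; on a match, call each registered callback.
   Returns the number of callbacks invoked. */
size_t em_notify(uint32_t event_code)
{
    size_t calls = 0;
    for (EnrElement *e = enr_head; e != NULL; e = e->next) {
        if (e->event_code != event_code)
            continue;              /* no match: test the next entry */
        for (size_t i = 0; i < e->callback_count; i++) {
            e->callbacks[i](e->client_id, event_code);
            calls++;
        }
    }
    return calls;                  /* no next entry: exit */
}

/* Example client callback that only counts invocations (illustration). */
static unsigned demo_hits;
static void demo_callback(uint32_t client_id, uint32_t event_code)
{
    (void)client_id;
    (void)event_code;
    demo_hits++;
}
```

Because each event is compared against every entry, several clients can register for the same event code and each receives its own callback.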
Figure 6 illustrates in detail the Event Threshold Registration (ETR) function of the controller 100, shown as blocks 138, 142 in Figure 1 and as block 638 in Figure 6. When a client registers an Event Threshold request, the EM creates an ETR element (650, 138, 142), adds it to its ETR List 638, and increments the ETRCount. The format of a client ETR, say block 650, includes the Client ID, the Event Code, data on the previous event occurrence (ETRPrevP), data on the next (current) event occurrence (ETRNextP), Occurrence Count, Timestamp, Threshold Type, Threshold Duration, Threshold Event Count, Callback Function, Event Count, and Event Code List. The Event Code List in block 650 is further delineated into a Threshold Event List, block 680, that tags the threshold events. Each threshold event in block 680 is further delineated into a Threshold Element List, block 690, containing the Element Type, Event Number, Client ID, Severity Level, Component Type and Component ID.
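One possible C layout for the ETR element of Figure 6 is shown below, with a registration routine that links the element into the ETR List and increments ETRCount. The field types, the millisecond time unit, the `callback_context` pointer, the reading of ETRPrevP/ETRNextP as list links, and the name `em_register_etr` are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef enum { THRESHOLD_NOT_ORDERED, THRESHOLD_ORDERED } ThresholdType;

/* Threshold Element List entry (block 690). */
typedef struct {
    uint32_t element_type;
    uint32_t event_number;
    uint32_t client_id;
    uint32_t severity_level;
    uint32_t component_type;
    uint32_t component_id;
} ThresholdElement;

typedef void (*EtrCallback)(void *context);

/* ETR element (block 650); names are illustrative. */
typedef struct EtrElement {
    uint32_t           client_id;
    uint32_t           event_code;
    struct EtrElement *prev;                  /* ETRPrevP (assumed list link) */
    struct EtrElement *next;                  /* ETRNextP (assumed list link) */
    uint32_t           occurrence_count;
    uint64_t           timestamp_ms;
    ThresholdType      threshold_type;
    uint64_t           threshold_duration_ms;
    uint32_t           threshold_event_count;
    EtrCallback        callback;
    void              *callback_context;      /* hypothetical addition */
    size_t             event_count;           /* length of event_list */
    ThresholdElement  *event_list;            /* Threshold Event List (680) */
} EtrElement;

static EtrElement *etr_list;   /* ETR List (block 638) */
static size_t      etr_count;  /* ETRCount */

/* Create an ETR element, copy the client's threshold events into it,
   link it at the head of the ETR List, and increment ETRCount. */
EtrElement *em_register_etr(uint32_t client_id, uint32_t event_code,
                            ThresholdType type, uint64_t duration_ms,
                            uint32_t threshold_count,
                            const ThresholdElement *events, size_t n,
                            EtrCallback cb, void *ctx)
{
    assert(events != NULL && n > 0);
    EtrElement *e = calloc(1, sizeof *e);    /* zeroes count and timestamp */
    if (e == NULL)
        return NULL;
    e->event_list = malloc(n * sizeof *e->event_list);
    if (e->event_list == NULL) {
        free(e);
        return NULL;
    }
    memcpy(e->event_list, events, n * sizeof *e->event_list);
    e->event_count = n;
    e->client_id = client_id;
    e->event_code = event_code;
    e->threshold_type = type;
    e->threshold_duration_ms = duration_ms;
    e->threshold_event_count = threshold_count;
    e->callback = cb;
    e->callback_context = ctx;
    e->next = etr_list;
    if (etr_list != NULL)
        etr_list->prev = e;
    etr_list = e;
    etr_count++;
    return e;
}
```

Copying the event list into the element means the client does not have to keep its own copy alive after registration.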
The ETR feature of EM 100 allows clients to register one or more events with the EM, so that the EM will notify the client when the threshold parameters for those events are met. This allows a client to monitor activity in the system (e.g., for failure analysis, a client can request to be notified when 5 "Media error" events occur within 2 seconds, and when this happens it can decide what to do with the device). There are two types of threshold: ordered and not-ordered. For every event it receives, the EM checks the ETR Link List for a match. If a match is found, the occurrence counter is incremented. If the interval between the timestamp and the current time is greater than the threshold duration, the timestamp is reset. Otherwise, if the counter equals the threshold event count, the EM calls the callback; otherwise, if the counter is 1 (the first occurrence), the timestamp is also reset. The difference between the two types is that in an ordered threshold the events must occur in sequential order (i.e., event 0 must occur first, then event 1, and so on), as shown in Figure 6, block 680, whereas in a not-ordered threshold the events can occur in any order or combination.
Example 1: The user, as the client, sets the trigger parameter as: "Notify the user via SNMP Trap if 3 Bad Block Errors occur within a 10-second time interval from Storage Device 0".
In this example, the EM monitors all Bad Block errors generated by Device 0, logs the time each error occurs, and checks whether 3 errors occur within the 10-second time interval. If so, it notifies the user by sending an SNMP Trap to the management station.
Example 2: The Fibre Channel Driver, as the client, sets the trigger parameter as:
"Call the fcdInit(port1) function if 5 LIP Resets are detected by the FC Driver within a 15-second time interval from Fibre Channel Port 1".
In this example, the EM monitors all LIP Resets detected on Fibre Channel Port 1, logs the time each reset occurs, and checks whether 5 resets occur within the 15-second time interval. If so, it calls the function fcdInit(port1).
The function emETR() returns a unique ID that can be used to deregister the ETR.
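The text does not say how emETR() generates its unique ID. The sketch below assumes a monotonically increasing counter, reserves 0 to signal failure, and keeps registrations in a small fixed table; `em_etr` and `em_etr_deregister` are hypothetical stand-ins, not the patent's actual signatures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ETR_SLOTS 8  /* arbitrary capacity for this sketch */

typedef struct {
    bool     in_use;
    uint32_t id;                     /* unique ID handed back to the client */
    uint32_t event_code;
    uint32_t threshold_event_count;
    uint64_t threshold_duration_ms;
} EtrSlot;

static EtrSlot  etr_slots[ETR_SLOTS];
static uint32_t etr_next_id = 1;     /* 0 is reserved to mean "failed" */

/* Hypothetical analogue of emETR(): store the request and return its ID,
   or 0 if no slot is free. */
uint32_t em_etr(uint32_t event_code, uint32_t count, uint64_t duration_ms)
{
    assert(count > 0);
    for (size_t i = 0; i < ETR_SLOTS; i++) {
        if (!etr_slots[i].in_use) {
            etr_slots[i] = (EtrSlot){ true, etr_next_id++, event_code,
                                      count, duration_ms };
            return etr_slots[i].id;
        }
    }
    return 0;
}

/* Deregister the ETR with the given ID; returns true if it was found. */
bool em_etr_deregister(uint32_t id)
{
    for (size_t i = 0; i < ETR_SLOTS; i++) {
        if (etr_slots[i].in_use && etr_slots[i].id == id) {
            etr_slots[i].in_use = false;
            return true;
        }
    }
    return false;
}
```

Because IDs are not reused (until the 32-bit counter wraps), a stale ID cannot accidentally remove a newer registration that happens to occupy the same slot.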
Figure 7 is a flowchart of the steps in a method for checking whether the threshold parameters are met for the specified event(s), which allows a client to monitor activity in the system. Starting from step 700, where all events are received by the EM, a Threshold Event List is created as shown in Figure 6 and placed in step 702 of Figure 7. In step 710, the event thresholding program initiates or continues the evaluation of threshold entries. If there are no more threshold entries in the list, the program exits. If there is another threshold entry, then in step 720 the program compares the time elapsed since the entry's Timestamp against the preset threshold duration. If the elapsed time is greater than the preset duration, then in step 722 it resets the Timestamp and the Counter and proceeds to step 730. If the elapsed time is less than or equal to the preset duration, it proceeds directly to step 730, where the event is compared with the preset event for a match. If it does not match, the program returns to step 710 to look for the next entry to evaluate. If it does match, the program increments the Counter and proceeds to step 740, where it checks whether the Counter equals one (1). If so, it proceeds to step 742, where it resets the Timestamp, and returns to step 710 to test the next entry. If not, it proceeds to step 750, where it checks whether the Counter value is greater than or equal to the Threshold Event Count. If not, it returns to step 710 to test the next entry. If so, it continues to step 760, where it calls the Callback function, resets the Timestamp and the Counter, and returns to step 710 to evaluate the next entry.
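The Figure 7 loop can be sketched in C for the not-ordered case as shown below. The entry layout, millisecond timestamps from a monotonic clock, setting the Timestamp to the current time on every "reset", and the names `em_threshold_event` and `count_trigger` are assumptions; the step numbers in the comments follow the flowchart.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*ThresholdCallback)(void *context);

typedef struct {
    const uint32_t   *event_codes;   /* Threshold Event List (block 680) */
    size_t            n_codes;
    uint64_t          duration_ms;   /* Threshold Duration */
    uint32_t          event_count;   /* Threshold Event Count */
    ThresholdCallback callback;
    void             *context;
    uint64_t          timestamp_ms;  /* start of the current window */
    uint32_t          counter;       /* matches counted in the window */
} ThresholdEntry;

/* Not-ordered threshold: any listed event code counts. */
static int matches_any(const ThresholdEntry *t, uint32_t code)
{
    for (size_t i = 0; i < t->n_codes; i++)
        if (t->event_codes[i] == code)
            return 1;
    return 0;
}

/* Evaluate one incoming event against every threshold entry.
   Assumes now_ms never goes backwards (monotonic clock). */
void em_threshold_event(ThresholdEntry *list, size_t n,
                        uint32_t code, uint64_t now_ms)
{
    assert(list != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {                       /* step 710 */
        ThresholdEntry *t = &list[i];
        if (now_ms - t->timestamp_ms > t->duration_ms) {   /* step 720 */
            t->timestamp_ms = now_ms;                      /* step 722 */
            t->counter = 0;
        }
        if (!matches_any(t, code))                         /* step 730 */
            continue;
        t->counter++;
        if (t->counter == 1) {                             /* step 740 */
            t->timestamp_ms = now_ms;                      /* step 742 */
            continue;
        }
        if (t->counter >= t->event_count) {                /* step 750 */
            t->callback(t->context);                       /* step 760 */
            t->timestamp_ms = now_ms;
            t->counter = 0;
        }
    }
}

/* Example callback that counts how often a threshold fired (illustration). */
static void count_trigger(void *context) { ++*(unsigned *)context; }
```

With Example 1's parameters (3 events within 10 000 ms), Bad Block events at 1 s, 4 s and 9 s fire the callback once, while events spread more than 10 s apart only restart the window. Taken literally, the flowchart never reaches step 750 on the first matching event, so a Threshold Event Count of 1 would never trigger in this sketch.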
Figure 8 addresses the case in which the threshold event list is ordered, as shown in block 680 of Figure 6. The only difference between FIGS. 7 and 8 is the insertion of steps 831 and 833 between steps 830 and 832. The Yes branch of step 830 leads to a new step 831, where the matched index is compared with the counter. If they are not equal, the timestamp and the counter are reset and the program returns to step 810 to evaluate the next entry. If they are equal (an ordered event), the remaining steps are identical to the corresponding steps of Figure 7.
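Extending the same idea to the ordered case of Figure 8 adds the step 831 check: the position of the matched event in the Threshold Event List must equal the number of events already counted. Treating that position as zero-based, resetting the Timestamp to the current time, and the names `em_ordered_threshold_event` and `count_trigger` are assumptions of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*ThresholdCallback)(void *context);

typedef struct {
    const uint32_t   *event_codes;   /* ordered Threshold Event List (680) */
    size_t            n_codes;
    uint64_t          duration_ms;   /* Threshold Duration */
    uint32_t          event_count;   /* Threshold Event Count */
    ThresholdCallback callback;
    void             *context;
    uint64_t          timestamp_ms;
    uint32_t          counter;       /* events matched so far, in order */
} OrderedEntry;

/* Zero-based position of code in the list, or -1 if absent. */
static long index_of(const OrderedEntry *t, uint32_t code)
{
    for (size_t i = 0; i < t->n_codes; i++)
        if (t->event_codes[i] == code)
            return (long)i;
    return -1;
}

static void reset_entry(OrderedEntry *t, uint64_t now_ms)
{
    t->timestamp_ms = now_ms;
    t->counter = 0;
}

/* Figure 7's loop plus the order check added in Figure 8. */
void em_ordered_threshold_event(OrderedEntry *list, size_t n,
                                uint32_t code, uint64_t now_ms)
{
    assert(list != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {                 /* step 810 */
        OrderedEntry *t = &list[i];
        if (now_ms - t->timestamp_ms > t->duration_ms)
            reset_entry(t, now_ms);                  /* window expired */
        long idx = index_of(t, code);                /* step 830: match? */
        if (idx < 0)
            continue;
        if ((uint32_t)idx != t->counter) {           /* step 831: in order? */
            reset_entry(t, now_ms);                  /* out of order: reset */
            continue;
        }
        t->counter++;
        if (t->counter == 1) {
            t->timestamp_ms = now_ms;                /* first event opens window */
            continue;
        }
        if (t->counter >= t->event_count) {
            t->callback(t->context);
            reset_entry(t, now_ms);
        }
    }
}

/* Example callback that counts how often a threshold fired (illustration). */
static void count_trigger(void *context) { ++*(unsigned *)context; }
```

An out-of-order event only resets the entry; following the flowchart literally, it is not counted as the start of a new sequence, even when it is the first event in the list.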
Figure 9 illustrates in detail how a client reports an event 900 to the EM. The client calls emReportEvent() 910 in the EM with the following parameters: client ID 920, event number 930, component ID 940 and software context 950. The software context block 950 contains the File Name, Line Number and Version Number. The remaining blocks 960, 970, 980, 991, 992, 993, 994, 995 and 996 are identical in format to those in FIGS. 2 and 3. When the EM receives an Event Reporting request, it indexes into the Client Table using the Client ID to find the Client Event Table. Then, using the Event Number, the EM indexes into the Client Event Table and retrieves the Event Element of FIGS. 4 and 6.
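The two-level lookup performed on an Event Reporting request can be sketched in C as follows. It assumes that client IDs and event numbers are small integers usable directly as array indices, which the word "index" suggests but the text does not confirm; `em_report_event` and the `EM_REPORT` macro are hypothetical. The macro uses the standard C predefined macros `__FILE__` and `__LINE__` to fill in the software context of block 950.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Software context (block 950). */
typedef struct {
    const char *file_name;
    int         line_number;
    const char *version;
} SoftwareContext;

/* Simplified Event Element; a real one would carry the Figure 3 tags. */
typedef struct {
    const char *description;
    int         severity;
} EventElement;

typedef struct {
    const EventElement *events;    /* indexed directly by event number */
    size_t              n_events;
} ClientEventTable;

#define EM_MAX_CLIENTS 8
/* Client Table indexed by client ID (assumed to be a small integer). */
static const ClientEventTable *client_table[EM_MAX_CLIENTS];

/* Returns the Event Element for (client_id, event_number), or NULL if
   either index is unknown. */
const EventElement *em_report_event(uint32_t client_id, uint32_t event_number,
                                    uint32_t component_id, SoftwareContext ctx)
{
    assert(ctx.file_name != NULL);
    (void)component_id;  /* a full EM would pass this on to logging/actions */
    if (client_id >= EM_MAX_CLIENTS || client_table[client_id] == NULL)
        return NULL;
    const ClientEventTable *t = client_table[client_id];
    if (event_number >= t->n_events)
        return NULL;
    return &t->events[event_number];
}

/* Captures the caller's file name and line number automatically. */
#define EM_REPORT(client, event, component, version)          \
    em_report_event((client), (event), (component),            \
                    (SoftwareContext){ __FILE__, __LINE__, (version) })
```

Direct indexing keeps each report to two bounds checks and two array accesses, at the cost of requiring dense, pre-assigned client IDs and event numbers.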
Figure 10 illustrates an event reporting example from the FC driver: the AL Loop Up event. Block 1000 identifies the event from the event element. Block 1010 identifies the event element. Block 1020 identifies the relevant Correction Description Table. Block 1030 identifies the two actions that are enabled on this event, as specified in the first two elements.

Claims

What is claimed is:
1. A system for early fault detection in a computer network, comprising: an event manager; a client list; a client event table; and an event notification registration.
2. The system of claim 1, further including an event threshold registration.
3. The system of claim 2, further including event logging and notification.
4. The system of claim 3, further including a list of actions to be taken.

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU38892/00A AU3889200A (en) 1999-03-15 2000-03-15 System and method of event management and early fault detection

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US12449499P 1999-03-15 1999-03-15
US60/124,494 1999-03-15

Publications (1)

Publication Number Publication Date
WO2000055953A1 (en) 2000-09-21

Family

ID=22415207

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2000/006919 WO2000055953A1 (en) 1999-03-15 2000-03-15 System and method of event management and early fault detection

Country Status (2)

Country Link
AU (1) AU3889200A (en)
WO (1) WO2000055953A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7546488B2 (en) 2004-07-02 2009-06-09 Seagate Technology Llc Event logging and analysis in a software system
US7546489B2 (en) 2005-01-25 2009-06-09 Seagate Technology Llc Real time event logging and analysis in a software system
US20130198573A1 (en) * 2000-07-18 2013-08-01 Apple Inc. Event logging and performance analysis system for applications

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5029170A (en) * 1989-11-30 1991-07-02 Hansen Robert G Assembly language programming potential error detection scheme which recognizes incorrect symbolic or literal address constructs
US5119377A (en) * 1989-06-16 1992-06-02 International Business Machines Corporation System and method for software error early detection and data capture
US5132972A (en) * 1989-11-29 1992-07-21 Honeywell Bull Inc. Assembly language programming potential error detection scheme sensing apparent inconsistency with a previous operation
US5383201A (en) * 1991-12-23 1995-01-17 Amdahl Corporation Method and apparatus for locating source of error in high-speed synchronous systems
US5432795A (en) * 1991-03-07 1995-07-11 Digital Equipment Corporation System for reporting errors of a translated program and using a boundry instruction bitmap to determine the corresponding instruction address in a source program
US5594861A (en) * 1995-08-18 1997-01-14 Telefonaktiebolaget L M Ericsson Method and apparatus for handling processing errors in telecommunications exchanges

Also Published As

Publication number Publication date
AU3889200A (en) 2000-10-04

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 EP: The EPO has been informed by WIPO that EP was designated in this application
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 EP: PCT application non-entry in European phase