CA1301938C - Operations controller for a fault tolerant multiple node processing system

Operations controller for a fault tolerant multiple node processing system

Info

Publication number
CA1301938C
Authority
CA
Canada
Prior art keywords
task
node
message
error
nodes
Prior art date
Legal status
Expired - Lifetime
Application number
CA000564342A
Other languages
French (fr)
Inventor
Alan M. Finn
Roger M. Kieckhafer
Chris J. Walter
Current Assignee
Honeywell International Inc
Original Assignee
AlliedSignal Inc
Priority date
Filing date
Publication date
Application filed by AlliedSignal Inc
Application granted
Publication of CA1301938C
Anticipated expiration
Expired - Lifetime

Classifications

    • G06F 11/188: Voting techniques where exact match is not required
    • G06F 11/0724: Error or fault processing not based on redundancy, the processing taking place within a central processing unit in a multiprocessor or a multi-core unit
    • G06F 11/1425: Reconfiguring to eliminate the error by reconfiguration of node membership
    • G06F 11/1482: Generic software techniques for error detection or fault masking by means of middleware or OS functionality
    • G06F 11/1658: Data re-synchronization of a redundant component, or initial sync of replacement, additional or spare unit
    • G06F 11/181: Eliminating the failing redundant component
    • G06F 11/182: Passive fault-masking of the redundant circuits based on mutual exchange of the output between redundant processing components
    • G06F 11/187: Voting techniques
    • G06F 11/202: Active fault-masking, e.g. by switching out faulty elements or by switching in spare elements, where processing functionality is redundant
    • G06F 15/161: Computing infrastructure, e.g. computer clusters, blade chassis or hardware partitioning
    • G06F 9/4881: Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G06F 11/076: Error or fault detection not based on redundancy, by exceeding a count or rate limit, e.g. word- or bit count limit
    • G06F 11/10: Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's

Abstract

ABSTRACT OF THE DISCLOSURE

An operations controller (12) for a multiple node fault tolerant processing system having a transmitter (30) for transmitting inter-node messages, a plurality of receivers (32a...32n), each receiving inter-node messages from only one of the nodes, and a message checker (34) for checking each received message for physical and logical errors. A fault tolerator (36) assembles all of the detected errors and decides which nodes are faulty based on the number and severity of the detected errors. A voter (38) generates a voted value for each value received from the other nodes, which is stored in a data memory (42) by a task communicator (44). A scheduler (40) selects the tasks to be executed by an applications processor (14), and the selection is passed to the task communicator (44). The task communicator (44) passes the selected task and the data required for its execution to the applications processor (14) and transmits the data resulting from that task to all of the nodes in the system. A synchronizer (46) synchronizes the operation of its own node with all of the other nodes in the system.

Description


BACKGROUND OF THE INVENTION
1. Field of the Invention

The invention is related to the field of multiple node processing systems and in particular to an operations controller for each node in the multiple node processing system for controlling the operation of its own node in a fault tolerant manner.
2. Description of the Prior Art

The earliest attempts to produce fault tolerant computer systems provided redundant computers in which each computer simultaneously executed every task required for the control operation. Voting circuits monitoring the outputs of the multiple computers determined a majority output which was assumed to be the correct output for the system. In this type of system, a faulty computer may or may not be detected and the faulty computer may or may not be turned off.
The redundant computer concept, although highly successful, is expensive because it requires multiple computers of equivalent capabilities. These systems require powerful computers because each computer has to perform every task required for the operation of the system. As an alternative, the master-slave concept was introduced in which the operation of several computers was controlled and coordinated by a master control. The master control designated which tasks were to be executed by the individual computers. This reduced the execution time of the control operation because all the computers were no longer required to execute every task, and many of the tasks could be executed in parallel. In this type of system, when a computer is detected as faulty, the master could remove it from active participation in the system by assigning the tasks that would normally have been assigned to the faulty computer to the other computers. The problem encountered in the master-slave concept is that the system is totally dependent upon the health of the master, and if the master fails then the system fails. This defect may be rectified by using redundant master controls; however, the increased cost of redundant masters limits the applicability of these systems to situations where the user is willing to pay for the added reliability. Typical of such situations are the controls of nuclear power plants, space exploration, and other situations where failure of the control system would endanger lives.

Recent improvements to the master-slave and redundant execution fault tolerant computer systems discussed above are exemplified in the October 1978 Proceedings of the IEEE, Volume 66, No. 10, which is dedicated to fault tolerant computer systems.
Of particular interest are the papers entitled "Pluribus: An Operational Fault Tolerant Microprocessor" by D. Katsuki et al., pages 1146-1159, and "SIFT: The Design and Analysis of a Fault Tolerant Computer for Aircraft Control" by J. H. Wensley et al., pages 1240-1255. The SIFT system uses redundant execution of each system task and of the master control functions. The Pluribus system has a master copy of the most current information which can be lost if certain types of faults occur.
More recently, a new fault tolerant multiple computer architecture has been disclosed by Whiteside et al. in U.S. Patent No. 4,256,547, in which each of the individual task execution nodes has an applications processor and an operations controller which functions as a master for its own node.

The present invention is an operations controller for a fault tolerant multiple node processing system based on the system taught by Whiteside et al. in U.S. Patent No. 4,323,966, which has improved fault tolerance and control capabilities. A predecessor of this operations controller has been described by C. J. Walter et al. in their paper "MAFT: A Multicomputer Architecture for Fault-Tolerance in Real-Time Control Systems" published in the proceedings of the Real-Time Systems Symposium, San Diego, December 1985.

SUMMARY OF THE INVENTION

The invention is an operations controller for each node in a fault tolerant multiple node processing system. Each node has an applications processor for executing a predetermined set of tasks and an operations controller for establishing and maintaining its own node in synchronization with every other node in the system, for controlling the operation of its own node, and for selecting the task to be executed by its own applications processor in coordination with all of the other nodes in the system through the exchange of inter-node messages.

The operations controller has a transmitter for transmitting all of the inter-node messages generated by its own operations controller to all the other nodes in the system. The transmitter has an arbitrator for deciding the order in which the inter-node messages are to be transmitted when two or more messages are ready for transmission. A plurality of receivers is provided, each receiver associated with a respective one node and only receiving messages from that node. A message checker checks each received message for physical and logical errors to generate an inter-node error report containing an error status byte identifying each detected error. The message checker polls each of the receivers to unload the received messages in a repetitive sequence. A voter subsystem has a voter for voting on the content of all error free messages containing the same information to generate a voted value, and has a deviance checker for generating an inter-node error report identifying each node which sent a message used in the generation of the voted value whose content differed from the voted value by more than a predetermined amount.

A fault tolerator passes all error free messages received from the message checker to the voter subsystem, generates an inter-node error message containing all of the error reports accumulated by all of the subsystems of its own operations controller, generates a base penalty count for each node in the system based on the number and severity of the detected errors identified in such inter-node error reports, globally verifies the base penalty count for each node through the exchange of inter-node base penalty count messages, and generates a system state vector identifying each node whose base penalty count exceeds a predetermined exclusion threshold. A task scheduler selects the next task to be executed by its own applications processor from an active task list, maintains a global data base on the scheduling and execution of tasks by each node through the exchange of task completed/started messages, and generates an error report identifying each node whose scheduling process differs from the scheduling process replicated for that node. The operations controller also has a data memory and a task communicator for storing the voted values in the data memory.
The task communicator further has means for passing the identity of the task selected by the scheduler to the applications processor, means for extracting the voted values required for the execution of the selected task and passing them to the applications processor, means for generating the task completed/started messages identifying the task just completed and the new task started by the applications processor, and means for generating inter-node data value messages containing the data values generated by the applications processor in the execution of the selected tasks. The operations controller further includes a synchronizer for synchronizing the operation of its own node with all of the other non-faulty nodes in the system through the exchange of inter-node time-dependent messages.
The object of the invention is an architecture for a multiple node fault tolerant processing system based on the functional and physical partitioning of the application tasks and the overhead functions.
Another object of the invention is a distributed multiple node processing system in which no one node is required to execute every one of the application tasks and in which failure of one or more nodes need not prevent execution of any application task.
Another object of the invention is a multiple node computer architecture in which task selection and fault detection are globally verified.
Another object of the invention is a fault tolerant computer architecture in which the exclusion or readmittance of a node into the active set of nodes is made on a global basis.
These and other objects of the invention will become more apparent from a reading of the specification in conjunction with the drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a block diagram of the multi-computer architecture;
Figure 2 is a block diagram of the Operations Controller;
Figure 3 is the master/atomic period timing diagram;
Figure 4 is the atomic/subatomic period timing diagram;
Figure 5 is a block diagram of the Transmitter;
Figure 6 is a circuit diagram of one of the interfaces;
Figure 7 is a block diagram of the Arbitrator;
Figure 8 shows waveforms for the Self-Test Arbitration Logic;
Figure 9 is a block diagram of the Longitudinal Redundancy Code Generator;
Figure 10 is a block diagram of a Receiver;
Figure 11 is a block diagram of the Message Checker;
Figure 12 is a block diagram of the decision logic for the Between Limits Checker;
Figure 13 is the format for the error status byte generated by the Message Checker;
Figure 14 is a block diagram of the Fault Tolerator;
Figure 15 shows the partitioning of the Fault Tolerator RAM;
Figure 16 shows the format of the Message partition of the Fault Tolerator RAM;
Figure 17 shows the format of the Error Code Files partition of the Fault Tolerator RAM;
Figure 18 shows the format of the Group Mapping partition of the Fault Tolerator RAM;
Figure 19 shows the format of the Error Code Files partition of the Fault Tolerator RAM;
Figure 20 shows the format of the Penalty Weight partition of the Fault Tolerator RAM;
Figure 21 is a block diagram of the Fault Tolerator's Message Checker Interface;
Figure 22 is a block diagram of the Fault Tolerator's Error Handler;
Figure 23 is a block diagram of the Error Handler's Error Consistency Checker;
Figure 24 is a block diagram of the Error Handler's Validity Checker;
Figure 25 illustrates the format of the error byte in an error message;
Figure 26 is a timing diagram of the reconfiguration sequence;
Figure 27 is a block diagram of the Voter Subsystem;
Figure 28 is a flow diagram for the Upper and Lower Medial Value Sorters;
Figure 29 is a circuit diagram of the Lower Medial Value Sorter;
Figure 30 is a flow diagram for the Averaging Circuit;
Figure 31 is a circuit diagram of the Averaging Circuit;
Figure 32 is a flow diagram of the Deviance Checker;
Figure 33 is a circuit diagram of a Deviance Checker;
Figure 34 is a block diagram of the Scheduler;
Figure 35 shows the data format of the Scheduler RAM;
Figure 36 shows the data format of the Scheduler ROM;
Figure 37 is a block diagram of the Scheduler's Task Selector Module;
Figure 38 is a flow diagram of the Wake-Up Sequencer's operation;
Figure 39 is a flow diagram of the Execution Timer's operation;
Figure 40 is a flow diagram of the TIC Handler's operation;
Figure 41 is a flow diagram of the TIC Handler's Selection Queue Update sub-process;
Figure 42 is a flow diagram of the TIC Handler's Completion/Termination sub-process;
Figure 43 is a flow diagram of the TIC Handler's Execution Timer Reset sub-process;
Figure 44 is a flow diagram of the TIC Handler's Priority Scan List Update sub-process;
Figure 45 is a flow diagram of the Priority Scanner's operation;
Figure 46 is a flow diagram of the Next Task Selector's operation;
Figure 47 is a block diagram of the Reconfigure Module;
Figure 48 is a flow diagram for the Task Swapper's operation in response to a Node being excluded from the operating set;
Figure 49 is a flow diagram of the Task Swapper's operation in response to a Node being readmitted to the operating set;
Figure 50 is a flow diagram of the Task Reallocator's operation in response to a Node being excluded from the operating set;
Figure 51 is a flow diagram of the Task Status Matcher's operation;
Figure 52 is a block diagram of the Task Communicator;
Figure 53 is a partial block diagram of the Task Communicator showing the elements associated with the operation of the Store Data Control;
Figure 54 is a flow diagram of the Store Data Control's operation;
Figure 55 is a partial block diagram of the Task Communicator showing the elements associated with the operation of the DID Request Handler;
Figure 56 is a flow diagram of the DID Request Handler's operation;
Figure 57 is a partial block diagram of the Task Communicator showing the elements associated with the operation of the Task Terminated Recorder;
Figure 58 is a flow diagram of the Task Terminated Recorder's operation;
Figure 59 is a partial block diagram of the Task Communicator showing the elements associated with the operation of the Task Started Recorder;
Figure 60 is a flow diagram of the Task Started Recorder's operation;
Figure 61 is a partial block diagram of the Task Communicator showing the elements associated with the operation of the AP Input Handler;
Figure 62 is a flow diagram of the AP Input Handler's operation;
Figure 63 is a partial block diagram of the Task Communicator showing the elements associated with the operation of the AP Output Handler;
Figure 64 is a flow diagram showing the AP Output Handler's operation;
Figure 65 shows the format of the DID information as stored in the DID List;
Figure 66 shows the format of the DID information with the NUDAT bit appended;
Figure 67 is a partial block diagram of the Task Communicator showing the subsystems involved in "reconfiguration";
Figure 68 is a flow diagram showing the operation of the Reconfigure Control during reconfiguration;
Figure 69 is a partial block diagram of the Task Communicator showing the subsystems involved in "reset";
Figure 70 is a flow diagram of the Reset Control during reset;
Figure 71 is a block diagram of the Synchronizer;
Figure 72 shows the format of the Synchronizer Memory;
Figure 73 shows the format of the Message Memory;
Figure 74 shows the format of the Time Stamp Memory;
Figure 75 shows the format of the Scratch Pad Memory;
Figure 76 shows the waveforms of the signals generated by the Timing Signal Generator;
Figure 77 is a block diagram of the Synchronizer Control;
Figure 78 is a flow diagram showing the operation of the Data Handler and Expected Message Checker;
Figure 79 is a flow diagram showing the operation of the Within Hard Error Window and Soft Error Window Checker and the Time Stamper;
Figure 80 is a flow diagram for the operation of the "HEW to warning count";
Figure 81 is a partial block diagram of the Synchronizer showing the elements associated with the operation of the Message Generator;
Figure 82 is a flow diagram of the operation of the Message Generator and the Transmitter Interface;
Figure 83 shows the waveforms of the timing signals for generating a TIC message;
Figure 84 shows the waveforms of the timing signals for generating a sync System State message;
Figure 85 shows the format of the "cold start" pre-sync message;
Figure 86 is a flow diagram showing the operation of the Synchronizer during a "cold start";
Figures 87 and 87a are flow diagrams showing the generation of the HEW to warning signal during "cold start";
Figure 88 is a flow diagram showing the storing of data during a "cold start";
Figure 89 is a flow diagram showing the operation of the Operating Condition Detector during a "cold start";
Figure 90 is a timing diagram used in the description of the "cold start";
Figure 91 is a flow diagram of the operation of the Synchronizer during a "warm start";
Figure 92 is a timing diagram used in the description of a "warm start";
Figure 93 is a flow diagram of the operation of the Byzantine Voter to generate the Byzantine voted task completed vector and Byzantine voted branch condition bits for the Scheduler;
Figure 94 is a perspective of the Byzantine Voter's three-dimensional memory;
Figure 95 shows the two-dimensional format of ISW vectors resulting from the first Byzantine vote on the three-dimensional ISW matrices; and
Figure 96 is a functional circuit diagram of the Byzantine Voter.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
The multi-computer architecture for fault tolerance is a distributed multi-computer system based on the functional and physical partitioning of the application tasks and the overhead functions, such as fault tolerance and systems operations. As shown in Figure 1, the multi-computer architecture consists of a plurality of Nodes 10A through 10N, each having an Operations Controller 12 for performing the overhead functions and an Applications Processor 14 for executing the application tasks.

For each application, the multi-computer architecture is required to execute a predetermined set of tasks, collectively called application tasks. Each Node is allocated an active task set which is a subset of the application tasks. Each Node in coordination with all of the other Nodes is capable of selecting tasks from its active task set and executing them in a proper sequence. The active task set for each Node may be different from the active task set allocated to the other Nodes, and each task in the application tasks may be included in the active task set of two or more Nodes depending upon how many Nodes are in the system and the importance of the task to the particular application. In this way, the multi-computer architecture defines a distributed multi-computer system in which no one Node 10 is required to execute every one of the application tasks, yet the failure of one or more Nodes need not prevent the execution of any application task.
As shall be more fully explained later on, the active task set in each Node is static for any given system configuration or system state and will change as the system state changes with an increase or decrease in the number of active Nodes. This change in the active task set, called "reconfiguration," takes place automatically and assures that every one of the important or critical application tasks will be included in the active task set of at least one of the remaining active Nodes in the system.
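For illustration only, the static allocation described above can be modelled as a lookup from system state to active task set. The following C sketch uses invented structure names, field widths, and table contents; none of these identifiers appear in the patent.

    #include <stdint.h>

    typedef struct {
        uint8_t  operating_nodes;  /* bit mask of Nodes in the operating set       */
        uint32_t active_task_set;  /* bit i set means task i is in this Node's set */
    } task_allocation_t;

    /* Each Node holds its own allocation table, one row per system state.  A
     * change of system state (reconfiguration) simply selects a different row. */
    uint32_t active_tasks_for_state(const task_allocation_t *table, int entries,
                                    uint8_t system_state)
    {
        for (int i = 0; i < entries; ++i)
            if (table[i].operating_nodes == system_state)
                return table[i].active_task_set;
        return 0;  /* unknown state: no tasks enabled (illustrative fallback) */
    }
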
Each Node 10A through 10N is connected to every other Node in the multi-computer architecture through its Operations Controller 12 by means of a private communication link 16. For example, the Operations Controller "A" is the only Operations Controller capable of transmitting on communication link 16a. All of the other Nodes are connected to the communication link 16a and will receive every message transmitted by the Operations Controller "A" over communication link 16a. In a like manner, the Operations Controller "B" of Node 10B is the only Operations Controller capable of transmitting messages on communication link 16b, and Operations Controller "N" of the Node 10N is the only Operations Controller capable of transmitting messages on communication link 16n.
External information from sensors and manually operated devices, collectively identified as Input Devices 20, is transmitted directly to the Applications Processors 14 of each Node through an input line 18. It is not necessary that every Applications Processor receive information from every sensor and/or Input Device; however, each Applications Processor 14 will receive the information from every sensor and/or Input Device which it needs in the execution of the applications task.

In a like manner, the Applications Processor 14 in each Node will transmit data and control signals resulting from the execution of the applications task to one or more actuators and/or display devices collectively identified as Output Devices 22.
The data and/or control signals generated by the Applications Processor 14 in the individual Nodes 10A through 10N may be combined by a Combiner/Voter Network 24 before they are transmitted to the Output Devices 22. Further, when multiple values of the same data and/or control signals are generated by two or more of the Nodes, the Combiner/Voter Network 24 may also be used to generate a single voted value which is transmitted to the Output Devices 22. The use or omission of a Combiner/Voter Network 24 is optional. It is not necessary that every actuator or display receive the output generated by every Node in the system. The specific actuator or display only needs to be connected to the Node or Nodes whose Applications Processor 14 is capable of generating the data or command signals it requires.

The network of Operations Controllers 12 is the heart of the system and is responsible for the inter-node communications, system synchronization, data voting, error detection, error handling, task scheduling, and reconfiguration. The Applications Processors 14 are responsible for the execution of the application tasks and for communications with the Input Devices 20 and Output Devices 22. In the multi-computer architecture, the overhead functions performed by the Operations Controllers 12 are transparent to the operations of the Applications Processor 14.
Therefore, the structure of the Applications Processor 14 may be based solely upon the application requirements. Because of this, dissimilar Applications Processors 14 may be used in different Nodes without destroying the symmetry of the multi-computer architecture.

The structural details of the Operations Controller 12 in each Node 10A through 10N are shown in Figure 2. Each Operations Controller 12 has a Transmitter 30 for serially transmitting messages on the Node's private communication link 16.
For discussion purposes, it will be assumed that the Operations Controller illustrated in Figure 2 is the Operations Controller "A" as shown in Figure 1. In this case, the Transmitter 30 will transmit messages on the private communication link 16a. Each Operations Controller also has a plurality of Receivers 32a through 32n, each of which is connected to a different private communication link. In the preferred embodiment, the number of Receivers 32a through 32n is equal to the number of Nodes in the multi-computer architecture. In this way, each Operations Controller 12 will receive all of the messages transmitted by every Node in the system, including its own. Each Receiver 32a through 32n will convert each message received over the private communication link to which it is connected from a serial format to a parallel format, then forward it to a Message Checker 34. Each Receiver 32a through 32n will also check the vertical parity and the longitudinal redundancy codes appended to each of the received messages and will generate an error signal identifying any errors detected.

The Message Checker 34 monitors the Receivers 32a through 32n and subjects each received message to a variety of physical and logical checks. After completion of these physical and logical checks, the messages are sent to a Fault Tolerator 36.
Upon the detection of any errors in any message, the Message Checker 34 will generate an error status byte which is also transmitted to the Fault Tolerator 36.

The Fault Tolerator 36 performs five basic functions.
First, the Fault Tolerator performs further logical checks on the messages received from the Message Checker 34 to detect certain other errors that were not capable of being detected by the Message Checker 34. Second, the Fault Tolerator passes error free messages to a Voter 38 which votes on the content of all messages containing the same information to generate a voted value. Third, it passes selected fields from the error free messages to other subsystems as required. Fourth, the Fault Tolerator aggregates the internal error reports from the various error detection mechanisms in the Operations Controller and generates Error messages which are transmitted to all of the other Nodes in the system by the Transmitter 30. Finally, the Fault Tolerator 36 monitors the health status of each Node in the system and will initiate a local reconfiguration when a Node is added to or excluded from the current number of operating Nodes. The Fault Tolerator 36 maintains a base penalty count table which stores the current base penalty counts accumulated for each Node in the system. Each time a Node transmits a message containing an error, every Node in the system, including the one that generated the message, should detect this error and generate an Error message identifying the Node that sent the message containing the error, the type of error detected, and a penalty count for the detected error or errors.
Each Fault Tolerator 36 will receive these Error messages from every other Node and will increment the base penalty count for that Node which is currently being stored in the base penalty count table, if the detection of the error is supported by Error messages received from a majority of the Nodes. The magnitude of the penalty count increment is predetermined and is proportional to the severity of the error. If the incremented base penalty count exceeds an exclusion threshold, as shall be discussed later, the Fault Tolerator initiates a Node exclusion and a reconfiguration process in which the faulty Node is excluded from active participation in the system and the active task sets for the remaining Nodes are changed to accommodate for the reduction in the number of active Nodes.

The Fault Tolerator 36 will also periodically decrement the base penalty count for each Node in the system so that a Node which was previously excluded may be readmitted into the active system. When a previously excluded Node continues to operate in an error free manner for a sufficient period of time, its base penalty count will be decremented below a readmittance threshold, which will initiate a Node readmittance and reconfiguration process in which the previously excluded Node is readmitted into the active system. When the previously excluded Node is readmitted into the system, the active task set for each Node is readjusted to accommodate for the increase in the number of active Nodes in the system.
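The exclusion and readmittance mechanism can be summarized in software terms. The C sketch below is illustrative only; the threshold values, the size of the periodic decrement, and the exact majority test are assumptions, since the patent leaves these parameters to the designer.

    #include <stdint.h>
    #include <stdbool.h>

    #define NUM_NODES            8
    #define EXCLUSION_THRESHOLD  100   /* assumed value */
    #define READMIT_THRESHOLD    20    /* assumed value */

    typedef struct {
        uint16_t base_penalty[NUM_NODES]; /* voted base penalty count per Node */
        bool     excluded[NUM_NODES];     /* current view: Node in or out      */
    } fault_tolerator_t;

    /* Apply a penalty increment to a Node, but only when a majority of the
     * operating Nodes reported the same error. */
    void apply_penalty(fault_tolerator_t *ft, int node, uint16_t increment,
                       int supporting_nodes, int operating_nodes)
    {
        if (supporting_nodes * 2 <= operating_nodes)
            return;                              /* not supported by a majority */

        ft->base_penalty[node] += increment;
        if (!ft->excluded[node] && ft->base_penalty[node] > EXCLUSION_THRESHOLD)
            ft->excluded[node] = true;           /* triggers Node exclusion */
    }

    /* Periodic decrement: an error free Node slowly works its way back in. */
    void decay_penalties(fault_tolerator_t *ft)
    {
        for (int n = 0; n < NUM_NODES; ++n) {
            if (ft->base_penalty[n] > 0)
                ft->base_penalty[n]--;
            if (ft->excluded[n] && ft->base_penalty[n] < READMIT_THRESHOLD)
                ft->excluded[n] = false;         /* triggers readmittance */
        }
    }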

The Voter 38 performs an "on-the-fly" vote using all of the current copies of the data values received from the Fault Tolerator 36. The voted data value and all copies of the received data are passed to a Task Communicator 44 which stores them in a Data Memory 42. The Voter will select a voted data value using an appropriate algorithm, as shall be discussed relative to the Voter 38 itself. Each time a new copy of a data value is received, a new voted data value is generated which is written over the prior voted data value stored in the Data Memory 42. In this manner, the Data Memory 42 always stores the most current voted data value, assuring that a voted data value is always available for subsequent processing independent of one or more copies of the data value failing to be generated or "hanging" and causing a late arrival.
The Voter 38 will also perform a deviance check between the voted data value and each copy of the received data value, and will generate an error vector to the Fault Tolerator identifying each Node which generated a data value which differed from the voted data value by more than a predetermined amount. This arrangement will support both exact and approximate agreement between the copies of the data values. The Voter 38 supports several data types, including packed boolean values, fixed point formats, and the IEEE standard 32-bit floating point format.
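As a rough software analogue of the Voter 38, the sketch below sorts the copies of one data value, averages the two medial values (echoing the Upper and Lower Medial Value Sorters and the Averaging Circuit named in the drawings), and flags every copy that lies outside the permitted deviance. The function, the floating point representation, and the limit of sixteen copies are assumptions; the hardware algorithm may differ in detail.

    #include <stdlib.h>

    static int cmp_double(const void *a, const void *b)
    {
        double x = *(const double *)a, y = *(const double *)b;
        return (x > y) - (x < y);
    }

    /* copies[i] is the value received from Node i (n <= 16 assumed).  On return,
     * deviant[i] is non-zero when copy i differs from the voted value by more
     * than 'deviance'.  The voted value is returned. */
    double vote_and_check(const double *copies, int n,
                          double deviance, int *deviant)
    {
        double sorted[16];
        for (int i = 0; i < n; ++i)
            sorted[i] = copies[i];
        qsort(sorted, (size_t)n, sizeof sorted[0], cmp_double);

        /* Average of the two medial values; extreme copies contributed by
         * faulty Nodes are discarded by the sort-and-select step. */
        double voted = (sorted[(n - 1) / 2] + sorted[n / 2]) / 2.0;

        for (int i = 0; i < n; ++i) {
            double diff = copies[i] - voted;
            deviant[i] = (diff > deviance || diff < -deviance);
        }
        return voted;
    }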

A Scheduler 40 has two modes of operation, normal and reconfiguration. In the normal mode of operation the Scheduler 40 is an event driven, priority based, globally verified scheduling system which selects from its active task set the next task to be executed by its associated Applications Processor 14. For a given system configuration (set of active Nodes) the active task set assigned to each Node is static. Each time the associated Applications Processor begins a task, the Scheduler 40 selects the next task to be executed. The Applications Processor will immediately begin the execution of the selected task, and the Task Communicator 44 will immediately initiate the generation of a message informing all of the other Nodes of the identity of the selected task, the identity of the preceding task finished by the Applications Processor 14, and the branch condition of the preceding task.
Conditional branching is controlled by the Applications Processor 14 and is determined by conditions in the applications environment. The precedence relationship between a task and its successor task may include conditional branches, concurrent forks, and join operations implemented at task boundaries.

Conditional branching provides an efficient means of switching operational modes and avoids the necessity of scheduling tasks not required by the current conditions. An interactive consistency voting process guarantees agreement on the branch conditions generated by the other Nodes which executed the same task.
The Scheduler 40 in each Node replicates the scheduling process for every other Node in the system and maintains a global data base on the scheduling and execution of tasks by each Node.
Upon the receipt of a message from another Node identifying the task completed and the task started, the Scheduler 40 will compare the task completed with the task previously reported as started and generate a scheduling error signal if they are not the same.
The Scheduler 40 will also compare the task reported as started with the task it has scheduled to be started by that Node. If they are different, the Scheduler will also generate a scheduling error signal. The Scheduler 40 will pass all scheduling error signals to the Fault Tolerator 36. All of the Scheduler's error detection mechanisms are globally verified and have been designed to ensure that failure of one or more copies of a task does not upset scheduling.
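The two scheduling cross-checks just described can be expressed compactly in code. The data structure and identifiers below are hypothetical; only the comparisons themselves come from the description.

    #define NODES 8

    typedef struct {
        unsigned char last_started[NODES];  /* TID each Node last reported started    */
        unsigned char expected_next[NODES]; /* TID the local replica scheduled for it */
    } sched_model_t;

    enum sched_error { SCHED_OK, SCHED_COMPLETED_MISMATCH, SCHED_STARTED_MISMATCH };

    /* Called when a Task Completed/Started message arrives from 'node'. */
    enum sched_error check_tcs(sched_model_t *m, int node,
                               unsigned char completed_tid,
                               unsigned char started_tid)
    {
        /* The task reported completed must be the task previously reported started. */
        if (completed_tid != m->last_started[node])
            return SCHED_COMPLETED_MISMATCH;

        /* The task reported started must match what the replicated scheduling
         * process selected for that Node. */
        if (started_tid != m->expected_next[node])
            return SCHED_STARTED_MISMATCH;

        m->last_started[node] = started_tid;
        return SCHED_OK;   /* any mismatch is reported to the Fault Tolerator */
    }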

In the reconfiguration mode of operation, a reversible, path independent reconfiguration algorithm provides graceful degradation of the workload as faulty Nodes are excluded from the operating system. Because the algorithm is reversible, it also supports graceful restoration of the workload as previously excluded Nodes are readmitted following an extended period of error free operation.

In reconfiguration, the active task set allocated to each Node is altered to compensate for the change in the number of active Nodes. During reconfiguration after the exclusion of a faulty Node, the active task set, or at least the critical tasks of the faulty Node's active task set, may be reallocated and included in the active task sets of the other Nodes. In other instances, individual tasks may be globally disabled and replaced with simpler tasks, and some noncritical tasks may be disabled with no replacement. The reconfiguration process readjusts the active task set for the active Nodes to accommodate the system capabilities. The algorithm supports true distributed processing, rather than just a replication of uniprocessor task loads on redundant Nodes.

A Task Communicator 44 functions as an input/output (I/O) interface between the Operations Controller 12 and the Applications Processor 14. The Applications Processor 14 signals the Task Communicator 44 when it is ready for the next task. A simple handshaking protocol is employed to synchronize communications between the Applications Processor 14 and the Task Communicator 44. Upon receipt of this signal the Task Communicator 44 reads the selected task from the Scheduler 40 and transfers it to the Applications Processor 14. Concurrently, the Task Communicator 44 will initiate the transmission of the task completed/task started message identifying the task completed by the Applications Processor 14, the task being started by the Applications Processor, and the branch conditions of the completed task. The Task Communicator 44 will then fetch the data required for the execution of the started task from the Data Memory 42 and temporarily store it in a buffer in the order in which it is required for the execution of the started task. The Task Communicator will pass these data values to the Applications Processor as they are requested. Effectively, the Task Communicator 44 looks like an input file to the Applications Processor 14. The Task Communicator 44 also receives the data values generated by the Applications Processor 14 in the execution of the selected task and generates Data Value messages which are broadcast by the Transmitter 30 to all of the other Nodes in the system. The Task Communicator will also append to the Data Value message a data identification (DID) code and a message type (MT) code which uniquely identifies the message as a Data Value message.
The Synchronizer 46 provides two independent functions in the operation of the multi-computer architecture. The first function pertains to the synchronization of the operation of the Nodes 10A through 10N during steady state operation; the second function pertains to the synchronization of the Nodes on start up.
During steady state operation, the Synchronizer 46 effects a loose frame based synchronization of the Nodes by the exchange of messages which implicitly denote local clock times. The Synchronizer 46 in each Node counts at its own clock rate, up to a "nominal sync count," then issues a presynchronization System State message which is immediately broadcast by the Transmitter 30 to all of the other Nodes in the system. As the presynchronization System State messages from all the Nodes in the system, including its own, are received at each Node, they are time stamped in the Synchronizer as to their time of arrival from the Message Checker 34. The time stamps are voted on to determine a voted value for the arrival time of the presynchronization System State messages from all the Nodes. The difference between the voted time stamp value and the time stamp of the Node's own presynchronization System State message is an error estimate which is used to compute a corrected sync count. The error estimate includes any accumulated skew from previous synchronization rounds and the effects of clock drift.
The Synchronizer 46 will then count up to the corrected sync count and issue a synchronization System State message which is immediately transmitted by the Transmitter 30 to all of the other Nodes in the system. The synchronization System State messages will also be time stamped as to their arrival in the Synchronizers in each Node in the system.
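The arithmetic of the correction is simple and can be sketched as follows; the nominal sync count value and the sign convention are assumptions rather than figures taken from the patent.

    #define NOMINAL_SYNC_COUNT 10000L  /* local clock ticks per round (assumed) */

    /* The error estimate is the difference between the voted arrival time of the
     * presynchronization System State messages and this Node's own time stamp;
     * it is folded into the next count.  The sign convention is an assumption. */
    long corrected_sync_count(long voted_time_stamp, long own_time_stamp)
    {
        long error_estimate = voted_time_stamp - own_time_stamp;
        return NOMINAL_SYNC_COUNT + error_estimate;
    }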

The time stamps of all presynchronization and synchronization System State messages are compared with the voted time stamp value to determine which Nodes are in synchronization with its own Node and which are not. When the difference in the time stamps exceeds a first magnitude, a soft error signal is generated signifying a potential synchronization error. However, if the time stamp difference exceeds a second magnitude, larger than the first magnitude, a hard error signal is generated signifying a synchronization error has definitely occurred. The soft and hard error signals are transmitted to the Fault Tolerator 36 and are handled in the same manner as any other detected error.
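A minimal sketch of the soft and hard error window test, assuming the two window widths are supplied as design parameters:

    enum sync_check { SYNC_IN_STEP, SYNC_SOFT_ERROR, SYNC_HARD_ERROR };

    enum sync_check classify_skew(long time_stamp, long voted_time_stamp,
                                  long soft_window, long hard_window)
    {
        long skew = time_stamp - voted_time_stamp;
        if (skew < 0)
            skew = -skew;
        if (skew > hard_window)
            return SYNC_HARD_ERROR;   /* a synchronization error has definitely occurred */
        if (skew > soft_window)
            return SYNC_SOFT_ERROR;   /* potential synchronization error */
        return SYNC_IN_STEP;          /* this Node considers the sender in sync */
    }
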
Start up is defined as a process for creating a functional configuration of Nodes called an "operating set." If an "operating set" is in existence, and the functional configuration is changed by the admittance or readmittance of one or more Nodes, the process is called a "warm start." If no "operating set" is in existence, it is called a "cold start." In a warm start, the Synchronizer 46 will recognize the existence of an operating set and will attempt to achieve synchronization with the operating set. A cold start is initiated by a power on reset (POREST) signal generated in response to the initial application of electrical power to the system. Each Synchronizer 46 will attempt to achieve point-to-point synchronization with all the Nodes until an operating set is formed. Once an operating set is formed, those Nodes not included in the operating set will switch to the warm start process and will attempt to achieve synchronization with the operating set.

INTER-NODE MESSAGES

The operation of the multi-computer architecture depends upon the exchange of data and operational information by the exchange of inter-node messages. These inter-node messages are data-flow instructions which indicate to each individual Operations Controller how the message should be processed.
The various inter-node messages and their information content are listed in Table I.

TABLE I

Inter-Node Message Formats

MT0 - One Byte Data Value
  Byte 1: NID/Message Type
  Byte 2: Data I.D.
  Byte 3: Data Value
  Byte 4: Block Check

MT1 - Two Byte Data Value
  Byte 1: NID/Message Type
  Byte 2: Data I.D.
  Bytes 3-4: Data Value
  Byte 5: Block Check

MT1 - Task Interactive Consistency (TIC)
  Byte 1: NID/Message Type
  Byte 2: Data I.D. = 0
  Byte 3: Task Completed Vector
  Byte 4: Task Branch Condition Bits
  Byte 5: Block Check

MT2 - Four Byte Data Value (D4B)
  Byte 1: NID/Message Type
  Byte 2: Data I.D.
  Bytes 3-6: Data Value
  Byte 7: Block Check

MT3 - Four Byte Data Value (D4B2)
  Byte 1: NID/Message Type
  Byte 2: Data I.D.
  Bytes 3-6: Data Value
  Byte 7: Block Check

MT4 - Base Penalty Count (BPC)
  Byte 1: NID/Message Type
  Byte 2: Base Count 0
  Byte 3: Base Count 1
  Byte 4: Base Count 2
  Byte 5: Base Count 3
  Byte 6: Base Count 4
  Byte 7: Base Count 5
  Byte 8: Base Count 6
  Byte 9: Base Count 7
  Byte 10: Block Check

MT5 - System State (SS)
  Byte 1: NID/Message Type
  Byte 2: Function Bits
  Byte 3: Task Completed Vector
  Byte 4: Task Branch Condition Bits
  Byte 5: Current System State
  Byte 6: New System State
  Byte 7: Period Counter (High)
  Byte 8: Period Counter (Low)
  Byte 9: ISW Byte
  Byte 10: Reserved
  Byte 11: Block Check

MT6 - Task Completed/Started (TC/S)
  Byte 1: NID/Message Type
  Byte 2: Completed Task ID
  Byte 3: Started Task ID
  Byte 4: Branch Condition/ECC
  Byte 5: Block Check

MT7 - Error (ERR)
  Byte 1: NID/Message Type
  Byte 2: Faulty Node ID
  Byte 3: Error Byte 1
  Byte 4: Error Byte 2
  Byte 5: Error Byte 3
  Byte 6: Error Byte 4
  Byte 7: Penalty Base Count
  Byte 8: Penalty Increment Count
  Byte 9: Block Check

The inter-node messages all have the same basic format so as to simplify their handling in the receiving node. The first byte of each inter-node message contains the Node identification (NID) code of the Node from which the message originated and a message type (MT) code identifying the message type. The last byte in each inter-node message is always a block check byte which is checked by the Receivers 32a through 32n to detect transmission errors.
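For illustration, the common NID/Message Type byte of Table I can be packed and unpacked as shown below. Placing the three NID bits in the most significant positions follows the Transmitter description later in this specification; the width assumed for the message type field is an assumption.

    #include <stdint.h>

    enum { MT0_D1B = 0, MT1_D2B = 1, MT2_D4B = 2, MT3_D4B2 = 3,
           MT4_BPC = 4, MT5_SS = 5, MT6_TCS = 6, MT7_ERR = 7 };

    /* First byte of every inter-node message: 3-bit NID in the most significant
     * bits (per the Transmitter description), message type in the remainder. */
    static inline uint8_t make_header(uint8_t nid, uint8_t mt)
    {
        return (uint8_t)(((nid & 0x07u) << 5) | (mt & 0x1Fu));
    }

    static inline uint8_t header_nid(uint8_t header) { return header >> 5; }
    static inline uint8_t header_mt(uint8_t header)  { return (uint8_t)(header & 0x1Fu); }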

There are four different Data Value messages which range from a one byte Data Value message to a four byte Data Value message. These Data Value messages are identified as message types MT0 through MT3. The second byte of a Data Value message is a data identification (DID) code which, when combined with the message type code, uniquely identifies that particular data value from the other data values used in the system. The data identification (DID) code is used by the Message Checker 34 to define the types of checks that are to be performed. The MT/DID codes are used to identify which limits will be used by the Message Checker 34 and the deviance to be used by the Voter 38 to define the permissible deviance of each actual data value from the voted values, and by the Task Communicator 44 to identify the data value to be supplied to the Applications Processor 14 in the execution of the current task. The bytes following the data identification byte are the data values themselves, with the last byte being the block check byte as previously indicated.

A Task Interactive Consistency (TIC) message is a special case of the two byte Data Value message which is identified by the DID being set to zero (0). The Task Interactive Consistency message, message type MT1, is a rebroadcast of the task completed vector and branch condition data contained in Task Completed/Started (TC/S) messages received from the other Nodes and is transmitted at the end of each Subatomic period (SAP), as shall be explained in the discussion of the timing sequence. The information content of the Task Interactive Consistency messages is voted on by each Node and the voted values are used by the Scheduler 40 in the task selection and scheduling process.

A Base Penalty Count (BPC) message, message type MT4, contains the base penalty count that the individual Node is storing for each Node in the system, including itself. Each Node will use this information to generate a voted base penalty count for each Node in the system. Thereafter, each Node will store the voted base penalty count as the current base penalty count for each Node. This assures that at the beginning of each Master period each Node is storing the same number of base penalty counts for every other Node in the system. The Base Penalty Count message is transmitted by each Node at the beginning of each Master period timing interval.

A System State (SS) message, message type MT5, is sent at the end of each Atomic period timing interval and is used for the point-to-point synchronization of the Nodes and to globally affirm reconfiguration when a majority of the Nodes conclude that reconfiguration is required. The transmission of the System State message is timed so that the end of its transmission coincides with the end of the preceding Atomic period and the beginning of the next Atomic period. The first byte of the System State message contains the node identification (NID) code of the originating Node and the message type (MT) code. The second byte contains three function bits; the first two bits are the synchronization and presynchronization bits which are used in the synchronization process described above. The third bit identifies whether or not the Node is operating or excluded. The third and fourth bytes of the System State message are the task completed vector and the branch condition vector, respectively. Byte five contains the current system state vector and byte six contains the new system state vector. When the sending Node has concluded reconfiguration is necessary, the new system state vector will be different from the current state vector. Bytes seven and eight contain the higher and lower order bits of the Node's own period counter. Byte nine is an "in sync with" (ISW) vector which defines which Nodes that particular Node determines it is synchronized with, and byte ten is reserved for future use. Byte eleven is the conventional block check byte at the end of the message.
The Synchronizer uses the time stamp of the presynchronization System State messages, identified by the presynchronization bit in the second byte being set, to generate an error estimate used to compute a correction to the time duration of the last Subatomic period. This correction synchronizes the beginning of the next Atomic period in that Node with the Atomic period being generated by the other Nodes. The period counter bytes are used to align the Master periods of all the Nodes in the system. The period counter counts the number of Atomic periods from the beginning of each Master period and is reset when it counts up to the fixed number of Atomic periods in each Master period. Byte nine is used only during an automatic cold start, as shall also be explained in more detail in the discussion of the Synchronizer 46.
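Written as a C structure, the System State message byte layout described above is as follows; the field names are paraphrases of the byte descriptions, not identifiers used in the patent.

    #include <stdint.h>

    typedef struct {
        uint8_t nid_mt;            /* byte 1: NID / message type              */
        uint8_t function_bits;     /* byte 2: presync, sync, operating bits   */
        uint8_t task_completed;    /* byte 3: task completed vector           */
        uint8_t branch_conditions; /* byte 4: task branch condition bits      */
        uint8_t current_state;     /* byte 5: current system state vector     */
        uint8_t new_state;         /* byte 6: new system state vector         */
        uint8_t period_high;       /* byte 7: period counter, high order bits */
        uint8_t period_low;        /* byte 8: period counter, low order bits  */
        uint8_t isw;               /* byte 9: "in sync with" (ISW) vector     */
        uint8_t reserved;          /* byte 10: reserved                       */
        uint8_t block_check;       /* byte 11: block check                    */
    } ss_message_t;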

The Task Completed/Started (TC/S) message, message type MT6, is generated by the Task Communicator 44 each time the Applications Processor 14 starts a new task. The second and third bytes of the Task Completed/Started message contain the task identification (TID) codes of the task completed and the new task started by the Node's Applications Processor 14. The fourth byte of this message contains the branch condition of the completed task and an error correction code (ECC).

The last inter-node message is the Error message, message type MT7, which is sent whenever the Transmitter 30 is free during an Atomic period. Only one Error message reporting the errors attributed to a particular Node can be sent in an Atomic period. The second byte of the Error message is the Node identification (NID) code of the Node accused of being faulty.
The following four bytes contain error flags identifying each error detected. The seventh and eighth bytes of the Error message contain the base penalty count of the identified Node and the increment penalty count which is to be added to the base penalty count if the errors are supported by Error messages received from other Nodes. The increment penalty count is based on the number of errors detected and the severity of these errors.
This information is used by the other Nodes to generate a new voted base penalty count for the Node identified in the Error message. A separate Error message is sent for each Node which generates a message having a detected error.

TIMING PERIODS
TI~IN~ PERIOnS

The overall control system of the multi-computer archi tecture contains a number of concurrently operat1ng control loops with d~fferent time cycles. The system imposes the constraint that each cycle t~me be an integer power of two times a fundamen tal time interval called an Aeom k period~ Th~s greatly simpli-fies the implementation of the Operations Controller 12 and facil1tates the verification of correct task schedul~ng. The length of the Atomic period is selected within broad l~mits by the system designer for each particular appllcatinn. The System State messages, which are used for synchronization are sent,at the end of each Atomic period.

The longest control loop employed by the system is the Master period. Each Master period contains a fixed number of Atomic periods, as shown in Figure 3. All task scheduling parameters are reinitialized at the beginning of each Master period to prevent the propagation of any scheduling errors. The Nodes will also exchange Base Penalty Count messages immediately following the beginning of each Master period.

The shortest time period used in the system is the Subatomic (SAP) period, as shown in Figure 4, which defines the shortest execution time recognized by the Operations Controller 12 for any one task. For example, if the execution time of a task is less than a Subatomic period, the Operations Controller 12 will not forward the next scheduled task to the Applications Processor 14 until the beginning of the next Subatomic period. However, when the execution time of a task is longer than a Subatomic period, the Operations Controller 12 will forward the next scheduled task to the Applications Processor as soon as it is ready for it. There are an integer number of Subatomic periods in each Atomic period which are selectable by the systems designer to customize the multi-computer architecture to the particular application. As shown in Figure 4, each Subatomic period is delineated by a Task Interactive Consistency message as previously described.
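The forwarding rule for short tasks can be summarized by the sketch below, which delays dispatch of the next task to the Applications Processor until the next Subatomic period boundary whenever the previous task finished early. The timing variables are hypothetical and the sketch assumes, for simplicity, that the completed task started on a Subatomic period boundary.

    #include <stdint.h>

    /* Earliest time the Operations Controller may hand the next task to the
     * Applications Processor, assuming the completed task started on a
     * Subatomic period (SAP) boundary.  A task shorter than one SAP holds the
     * next dispatch until the following SAP boundary; a longer task lets the
     * next task go out as soon as it is ready.                                */
    static uint32_t next_dispatch_time(uint32_t task_start, uint32_t task_end,
                                       uint32_t sap_length)
    {
        uint32_t elapsed = task_end - task_start;

        if (elapsed < sap_length)
            return task_start + sap_length;   /* wait for the next SAP boundary */
        return task_end;                      /* forward immediately            */
    }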

TRANSMITTER

Figure 5 is a block diagram of the Transmitter 30 embodied in each of the Operations Controllers 12. The Transmitter 30 has three interfaces: a Synchronizer Interface 50 receiving Task Interactive Consistency messages and System State messages generated by the Synchronizer 46, a Fault Tolerator Interface 52 receiving the Error and Base Penalty Count messages generated by the Fault Tolerator 36, and a Task Communicator Interface 54 receiving Data Value and Task Completed/Started messages generated by the Task Communicator 44. The three interfaces are connected to a Message Arbitrator 56 and a Longitudinal Redundancy Code Generator 58. The Message Arbitrator 56 determines the order in which the messages ready for transmission are to be sent. The Longitudinal Redundancy Code Generator 58 generates a longitudinal redundancy code byte which is appended as the last byte to each transmitted message. The message bytes are individually transferred to a Parallel-to-Serial Converter 60 where they are framed between a start bit and two stop bits, then transmitted in a serial format on communication link 16.

The Transmitter 30 also includes a Self-Test Interface 62 which upon command retrieves a predetermined self-test message from an external ROM (not shown) which is input into the Longitudinal Redundancy Code Generator 58 and transmitted to the

communication link by the Parallel-to-Serial Converter 60. The Transmitter 30 also has an Initial Parameter Load Module 64 which will load into the Transmitter various predetermined parameters, such as the length of the minimum synchronization period between messages, the length of a warning period for Interactive Consistency and System State messages, and the starting address in the ROM where the self-test messages are stored.

As shown in Figure 6, each of the three interfaces has an eight bit input register 66 which receives the messages to be transmitted from its associated message source through a multiplexer 68. The multiplexer 68 also receives the three bit Node identification (NID) code which identifies the Node which is generating the message.

Whenever the associated message source has a message to be transmitted, it will hold the message until a buffer available signal is present signifying the input register 66 is empty. The message source will then transmit the first byte of the message to the input register 66. A bit counter 70 will count the strobe pulses clocking the message into the Input Register 66 and will, in coordination with a flip flop 72 and an AND gate 74, actuate the multiplexer 68 to clock the three bit Node identification code into the Input Register 66 as the last three most significant bits of the first byte. The flip flop 72 is responsive to the signal "transmit quiet period" (TQP) generated at the end of its preceding message to generate a first byte signal at its Q output which enables AND gates 74 and 76. The AND gate 74 will transmit the three most significant bits generated by the bit counter 70 in response to the strobe signals loading the first byte into the input register 66 and will actuate the multiplexer 68 to load the three bit Node identification code into the three most significant bit places of the input register 66.

The AND gate 76 will respond to the loading of the eighth bit into input register 66 and will generate an output which will actuate the flip flop 78 to a set state. In the set state, the flip flop 78 will generate a message available signal at its Q output and will terminate the buffer available signal at its Q output. The message available (MA) signal will reset the flip flop 72, terminating the first byte signal which in turn disables the AND gates 74 and 76. The message available (MA) signal is also transmitted to the Message Arbitrator 56 signifying a message is ready for transmission.

Termination of the buffer available (BA) signal when the flip flop 78 is put in the set state inhibits the message source from transmitting the remaining bytes of the message to the Transmitter 30. The first three least significant bits of the first byte, which are the message type code, are communicated directly to the Message Arbitrator 56 and are used in the arbitration process to determine which message is to be sent if more than one message is available for transmission, or whether the sending of that message will interfere with the transmission of a time critical message generated by the Synchronizer 46.

The Message Arbitrator 56 will generate a transmit (Txxx) signal identifying the next message to be sent when there is more than one message ready for transmission. This signal will actuate the Longitudinal Redundancy Code Generator 58 to pass the selected message to the Parallel-to-Serial Converter for transmission. The transmit signal will also reset the flip flop 78 in the appropriate interface which reasserts the buffer available (BA) signal, actuating the associated message source to transmit the remaining bytes of the message to the interface.
These are then transmitted directly to the Longitudinal Redundancy Code Generator 58 as they are received. When all of the bytes of the message are transmitted, the Message Arbitrator 56 will generate a transmit quiet period (TQP) signal which actuates the Parallel-to-Serial Converter to transmit a null (synchronization) signal for a predetermined period of time following the transmission of each message. In the preferred embodiment, the quiet period is the time required for the transmission of 24 bits or two (2) null bytes. The transmit quiet period (TQP) signal will also set the flip flop 72 indicating that the preceding message has

been sent and that the next byte received from the associated message source will be the first byte of the next message.
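A rough C sketch of the buffer-available / message-available handshake between a message source and its Transmitter interface is given below. It collapses the flip-flop logic of Figure 6 into software; the state machine, names, and bit packing are illustrative assumptions rather than the patent's implementation.

    #include <stdbool.h>
    #include <stdint.h>

    /* Simplified model of one Transmitter interface (Figure 6): the source may
     * load the first byte only while the buffer-available (BA) signal is true;
     * once the first byte (with the NID folded into its three most significant
     * bits) is latched, message-available (MA) is raised and BA is dropped
     * until the Message Arbitrator selects the message for transmission.      */
    typedef struct {
        bool    buffer_available;   /* BA: input register 66 is empty          */
        bool    message_available;  /* MA: first byte latched, ready to send   */
        uint8_t first_byte;
    } tx_interface_t;

    static void load_first_byte(tx_interface_t *ifc, uint8_t byte, uint8_t nid)
    {
        if (!ifc->buffer_available)
            return;                              /* source must hold the message */
        ifc->first_byte        = (uint8_t)((byte & 0x1Fu) | (uint8_t)(nid << 5));
        ifc->message_available = true;           /* MA asserted                  */
        ifc->buffer_available  = false;          /* BA dropped                   */
    }

    static void arbitrator_selects(tx_interface_t *ifc)
    {
        ifc->message_available = false;
        ifc->buffer_available  = true;           /* remaining bytes may follow   */
    }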

The details of the Message Arbitrator 56 are shown in Figure 7. Under normal operation when no time critical messages, such as Task Interactive Consistency (TIC) and System State (SS) messages, are to be sent, a Fault Tolerator (FLT) - Task Communicator (TSC) Arbitration Logic 82 will generate, in an alternating manner, PFLT and PTSC polling signals which are received at the inputs of AND gates 84 and 86, respectively. The AND gate 84 will also receive the Fault Tolerator message available (FLTMA)
signal generated by the Fault Tolerator Interface 52, while AND
gate 86 will receive a Task Communicator message available (TSCMA) signal generated by the Task Communicator Interface 54 after the Task Communicator 44 has completed the loading of the first byte of the message ready for transmission. The outputs of the AND
gates 84 and 86 are the transmit Fault Tolerator (TFLT) and transmit Task Communicator (TTSC) signals which are applied to AND gates 88 and 90, respectively. The alternate inputs to AND gates 88 and 90 are received from a Time Remaining-Message Length Comparator 92 which produces an enabling signal whenever the transmission of the selected message will not interfere with the transmission of a time dependent message, as shall be explained hereinafter. If the AND gate 88 is enabled it will pass the transmit Fault Tolerator (TFLT) signal to the Fault Tolerator Interface 52 to reassert the buffer available signal, enabling it to receive the remaining bytes of the message from the Fault Tolerator 36, and to the Longitudinal Redundancy Code Generator 58, enabling it to pass the message byte-by-byte from the Fault Tolerator Interface 52 to the Parallel-to-Serial Converter 60 for transmission on the communication link 16. In a like manner, when the AND gate 90 is enabled, and the polling of the Task Communicator Interface 54 indicates that the Task Communicator 44 has a message ready for transmission, the AND gate 86 will generate a transmit Task Communicator (TTSC) signal which, if passed by the AND gate 90, will result in the transmission of the Task Communicator's message. The TFLT and the TTSC signals, when generated, are fed back to lock the FLT-TSC Arbitration Logic 82 in its current state until after the message is sent.

The message arbitration between the Fault Tolerator's and Task Communicator's messages is primarily dependent upon the type of the message currently being transmitted. The logic performed by the FLT-TSC Arbitration Logic 82 is summarized in Table II.

TABLE II
FLT-TSC Arbitration Logic Table

Current Message                  Poll Next
Fault Tolerator                  Task Communicator
Task Communicator                Fault Tolerator
System State (Master Period)     Fault Tolerator
System State (Atomic Period)     Task Communicator
Interactive Consistency          Task Communicator
Self Test                        Task Communicator

Normally the FLT-TSC Arbitration Logic 82 will poll the Fault Tolerator Interface 52 and the Task Communicator Interface 54 in an alternating sequence. However, at the beginning of each Atomic period, the FLT-TSC Arbitration Logic 82 will first poll the Task Communicator Interface 54 for a Task Completed/Started message which will identify the task being started by that Node.

If the Task Completed/Started message is not available it will then poll the Fault Tolerator Interface 52.

At the beginning of each Master period, all of the Nodes should transmit a Base Penalty Count message which is used for global verification of the health of each Node in the system.
Therefore, after each System State message which is coincident with the beginning of a Master period, the FLT-TSC Arbitration Logic will first poll the Fault Tolerator Interface 52 and wait until it receives the Base Penalty Count message from the Fault Tolerator 36. After the transmission of the Base Penalty Count message, it will then poll the Task Communicator Interface 54 and transmit a Task Completed/Started message identifying the task scheduled to be started by the Applications Processor. If the Fault Tolerator 36 does not generate a Base Penalty Count message within a predetermined period of time, the FLT-TSC Arbitration Logic 82 will resume polling of the Fault Tolerator Interface 52 and the Task Communicator Interface 54 in an alternating sequence. In a like manner, after a self-test message, the FLT-TSC Arbitration Logic 82 will poll the Task Communicator Interface 54 and wait for a Task Completed/Started message.
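The polling rules of Table II can be paraphrased as a small selection routine. This is a hedged software sketch of the hardware arbitration, and the enumeration and function names are assumptions introduced only for the example.

    typedef enum { SRC_FLT, SRC_TSC } source_t;

    typedef enum {
        MSG_FLT, MSG_TSC,
        MSG_SS_MASTER,      /* System State at a Master period boundary  */
        MSG_SS_ATOMIC,      /* System State at an Atomic period boundary */
        MSG_TIC,            /* Task Interactive Consistency              */
        MSG_SELF_TEST
    } last_msg_t;

    /* Which interface is polled first for the next transmission, per Table II.
     * Normally the two sources alternate; after a Master-period System State
     * message the Fault Tolerator is polled first (for its Base Penalty Count
     * message), and after an Atomic-period System State, a TIC, or a self-test
     * message the Task Communicator is polled first.                          */
    static source_t poll_first(last_msg_t last)
    {
        switch (last) {
        case MSG_SS_MASTER:
            return SRC_FLT;             /* wait for the Base Penalty Count     */
        case MSG_SS_ATOMIC:
        case MSG_TIC:
        case MSG_SELF_TEST:
        case MSG_FLT:
            return SRC_TSC;             /* TC/S message expected, or alternate */
        case MSG_TSC:
        default:
            return SRC_FLT;             /* alternate                           */
        }
    }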
The Synchronizer 46 will load the first byte of either a Task Interactive Consistency or System State message in the Synchronizer Interface 50 a predetermined period of time before the beginning of the next Subatomic or Atomic period. A Warning Period Generator 94 will load a warning period counter with a number corresponding to the number of bits that are capable of being transmitted before the Task Interactive Consistency or System State messages are to be transmitted. As described previously, the transmission of the final bit of either of these messages marks the end of the previous Subatomic or Atomic period, respectively; therefore, their transmission will begin a predetermined time (bit count) before the end of the period. Since the Task Interactive Consistency and System State messages are of different bit lengths, the number loaded into the warning period counter will be different. The Warning Period Generator 94 will

decode the message type code contained in the first byte of the message stored in the Synchronizer Interface 50 and will load the warning period counter with a number indicative of the length of the warning period for that particular type of time critical message. The warning period counter will be counted down at the bit transmission rate of the Parallel-to-Serial Converter 60 to generate a number indicative of the time remaining for the transmission of a time critical message. The number of counts remaining in the warning period counter is communicated to a Synchronizer Transmission Control 96 and the Time Remaining-Message Length Comparator 92. When the warning period counter is counted down to zero the Synchronizer Transmission Control 96 will generate a transmit synchronizer (TSYN) signal which will actuate the Synchronizer Interface 50 to reassert the buffer available signal and will actuate the Longitudinal Redundancy Code Generator 58 to pass the message from the Synchronizer Interface 50 to the Parallel-to-Serial Converter 60 for transmission on the Node's own communication link 16.
The Time Remaining-Message Length Comparator 92 will decode the message type of a message selected for transmission by the FLT-TSC Arbitration Logic and determine the number of bits that have to be transmitted for that message. To this number the Time Remaining-Message Length Comparator 92 will add a number equal to the number of bits corresponding to the quiet period between the messages and compare the sum of the message and the quiet period with the count remaining in the warning period counter to determine whether or not the transmission of the selected message will interfere with the transmission of the time critical message from the Synchronizer Interface 50. If the transmission of the selected message will not interfere with the sending of the time critical message from the Synchronizer 46, the Time Remaining-Message Length Comparator 92 will generate a signal enabling AND
gates 88 and 90 to pass the TFLT or TTSC signals; otherwise the Time Remaining-Message Length Comparator 92 will generate a signal disabling AND gates 88 and 90, inhibiting the transmission of the selected message from either the Fault Tolerator Interface 52 or

the Task Communicator Interface 54. This signal will also toggle the FLT-TSC Arbitration Logic 82 to poll the nonselected interface to determine if it has a message to transmit. If the nonselected interface has a message ready for transmission, the Time Remaining-Message Length Comparator 92 will determine if there is sufficient time to transmit the message from the nonselected interface before the transmission of the time critical message from the Synchronizer Interface 50. If there is sufficient time, the message from the nonselected interface will be transmitted; otherwise the AND gates 88 and 90 will remain disabled.
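The fit test performed by the Time Remaining-Message Length Comparator 92 reduces to a single comparison, sketched below in C with assumed bit-count parameters.

    #include <stdbool.h>
    #include <stdint.h>

    /* The selected message may be sent only if its full length, plus the quiet
     * period that must follow it, can be transmitted before the warning period
     * counter (counting down at the serial bit rate) reaches zero.             */
    static bool message_fits(uint32_t message_bits,
                             uint32_t quiet_period_bits,
                             uint32_t warning_counts_remaining)
    {
        return (message_bits + quiet_period_bits) <= warning_counts_remaining;
    }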

The Message Arbitrator 56 also has a Byte Counter 100 which counts the number of bytes transmitted by the Parallel-to-Serial Converter 60. The output of the Byte Counter 100 is received by a Message Byte Logic 102. The Message Byte Logic 102 decodes the message type code of the message being transmitted and determines the number of bytes in that message. After the last byte of the message is transmitted, the Message Byte Logic 102 will first generate a transmit longitudinal redundancy code (TLRC) signal which enables the Longitudinal Redundancy Code Generator 58 to transmit the generated longitudinal redundancy code as the final byte of the message. The Message Byte Logic 102 will then generate a transmit quiet period (TQP) signal enabling the Parallel-to-Serial Converter 60 to transmit the null signal for a predetermined number of bytes which is used for message synchronization. The transmit quiet period (TQP) signal is also transmitted to the Synchronizer Transmission Control 96 where it is used to terminate the transmit synchronizer (TSYN) signal. At the end of the quiet period, the Message Byte Logic 102 will generate an end of quiet period (EQP) signal which will reset the Byte Counter 100 and unlatch the FLT-TSC Arbitration Logic 82 for selection of the next message for transmission.

A Self-Test Arbitration Logic 104 recognizes a request for a self-test in response to a transmitted Task Completed/Started message in which the task identification (TID) code is the same as the Node identification (NID) code. After the
transmission of a self-test request message, the Self-Test Arbitration Logic 104 will inhibit a Task Communicator Enable (TSCE) signal and a Fault Tolerator Enable (FLTE) signal as shown in Figure 8 which, when applied to AND gates 84 and 86, respectively, inhibits all transmissions from the Fault Tolerator Interface 52 or the Task Communicator Interface 54. Immediately following the next Task Interactive Consistency or System State message, the Self-Test Arbitration Logic 104 will generate a transmit self-test (TSLT) signal which will actuate the Self-Test Interface 62 to read the self-test message from an associated off-board read only memory (ROM). The TSLT signal will also enable the Longitudinal Redundancy Code Generator 58 to pass the self-test message from the Self-Test Interface 62 to the Parallel-to-Serial Converter 60 for transmission. After transmission of the self-test message, the Self-Test Arbitration Logic 104 will restore the Task Communicator Enable (TSCE) signal to permit the transmission of a Task Completed/Started message signifying the completion of the self-test. As indicated in Table II, the FLT-TSC Arbitration Logic 82 will automatically select the message from the Task Communicator Interface 54 as the next message to be transmitted following the transmission of the self-test message. After the transmission of the Task Completed/Started message the Self-Test Arbitration Logic 104 will terminate the Task Communicator Enable (TSCE) signal until after the next Task Interactive Consistency or System State message is transmitted, as indicated in Figure 8.
The Self-Test Interface 62 serves to transfer the self-test message from the off-board ROM (not shown) to the Longitudinal Redundancy Code Generator 58. The off-board ROM will store a plurality of Self-test messages which are transmitted one at a time, each time a Self-test is requested. The first byte of each Self-test message is a number indicative of the number of bytes in the Self-test message, which is passed back to the Message Byte Logic 102 to identify the completion of the self-test. The last byte in each Self-test message stored in the off-
The Self-Test Interface 62 serves to transfer the self test message from the off board ROM (not shown) to the Long~tudlnal Redundancy Code Generator 58. The off board ROM will store a plurallty of Self~test messages wh~ch are transm~tted one at a tlme ln response each t~me a Self-test ~s requested. The flrst byte of eaGh Self-~est message ~s a number lndlcative of the number of bytes ~n ~he Self-test message wh1ch ls passed back to the Message Byte Loglc 102 to ldent~fy the complet~on of the self-test. The tast byte ln each self-test message store~ ~n the off ~ 3l~3 board ROM ~s the starting address for the next Self-test message.
The starting address is not transmitted, but rather is stored in the Self-Test Interface 62 to locate the next Self-test message in the off-board ROM to be transmitted. The last byte of the last Self-test message stored in the off-board ROM contains the starting address of the first Self-test message, so that the Self-test message sequence is repeated. The starting address for the first Self-test message is loaded into the Self-Test Interface 62 by the Initial Parameter Load Module 64 in response to an initial load command generated by the Synchronizer 46 when electrical power is turned on.

As illustrated in Figure 9, the Longitudinal Redundancy Code Generator 58 has a 4:1 Input Multiplexer 110 which receives the message bytes from the Synchronizer Interface 50, Fault Tolerator Interface 52, Task Communicator Interface 54, and Self-Test Interface 62. The Input Multiplexer 110 controls which message will be transmitted to the Parallel-to-Serial Converter 60 in response to the transmit (TFLT, TTSC, TSYN, and TSLT) signals generated by the Message Arbitrator 56, as previously described.
Each byte of a message selected for transmission by the Message Arbitrator 56 is transmitted to an Output Multiplexer 112 by means of nine parallel lines, one for each bit in the received byte plus the parity bit generated by the associated interface. A
Longitudinal Redundancy (LR) Bit Generator 114 is connected to each of the nine parallel bit lines, and collectively these generate a nine bit longitudinal redundancy code. Each bit in the longitudinal redundancy code is a function of the bit values in the same bit locations in the preceding bytes. The outputs of all the LR
bit generators 114 are also received by the Output Multiplexer 112. The Output Multiplexer 112 is responsive to the transmit longitudinal redundancy code (TLRC) signal generated by the Message Arbitrator 56 to output the last bit generated by each of the LR bit generators 114 as the last byte of the message being transmitted. The output of the Output Multiplexer 112 is con-


nected directly to the Parallel-to-Serial Converter 60 which frames each received byte between predetermined start and stop bits before it is transmitted on the Node's communication link.
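A minimal software analogue of the Longitudinal Redundancy Code Generator is shown below. The patent does not spell out the exact bit function, so running exclusive-OR parity per bit position, a common realization of such a code, is assumed here, along with the 9-bit words (eight data bits plus parity) described above.

    #include <stdint.h>
    #include <stddef.h>

    /* Longitudinal redundancy code over 9-bit words (8 data bits + parity).
     * Each bit of the code is assumed to be the exclusive-OR of that bit
     * position across every word of the message, accumulated the way the
     * per-line LR bit generators accumulate their outputs in hardware.     */
    static uint16_t lrc_generate(const uint16_t *words, size_t count)
    {
        uint16_t lrc = 0u;

        for (size_t i = 0; i < count; ++i)
            lrc ^= (uint16_t)(words[i] & 0x1FFu);   /* keep the low 9 bits */

        return lrc;
    }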

RECEIVERS

The structures of the Receivers 32a through 32n are identical; therefore, only the structure of the Receiver 32a will be discussed in detail. Referring to Figure 10, the messages from Node A transmitted on communication link 16a are received by a Noise Filter and Sync Detector 116. The synchronization portion of the Noise Filter and Sync Detector 116 requires that a proper synchronization interval exists prior to the reception of a message. As described relative to the Transmitter 30, the synchronization interval preferably is the time required for the Transmitter 30 to transmit two complete null bytes after each transmitted message.
The low pass portion of the Noise Filter and Sync Detector 116 prevents false sensing of the "start" and "stop" bits by the Receiver 32a due to noise which may be present on the communication link 16a. The low pass filter portion requires that the signal on the communication link 16a be present for four (4) consecutive system clock cycles before it is interpreted as a start or a stop bit. The Noise Filter and Sync Detector 116 will generate a new message signal in response to receiving a start bit after a proper synchronization interval.
After passing through the Noise Filter and Sync Detector 116 the message, byte-by-byte, is converted from a serial to a parallel format in a Serial-to-Parallel Converter 118. The Serial-to-Parallel Converter 118 also determines when a complete 12-bit byte has been received. If the 12-bit byte is not properly framed by a "start" and two "stop" bits, a new bit is added, the bit first received is discarded and the framing is rechecked.
Framing errors are not flagged by the Receiver 32a since this fault will manifest itself during a vertical parity check. After conversion to a parallel format, the start and stop bits are stripped from each byte and the remaining 9-bit byte is transferred to a Longitudinal Redundancy Code and Vertical Parity Code (LRC and VPC) Checker 122 to check for parity errors. The error checking logic outputs the current combinational value of the vertical parity and the longitudinal redundancy codes. The vertical parity check portion checks the parity vertically across the received message while the longitudinal redundancy code checker portion performs a longitudinal redundancy code check on each byte received from the Serial-to-Parallel Converter 118. The Message Checker 34 decodes the message type information contained in the first byte of the message and determines which byte is the last byte in the message and, therefore, for which byte the longitudinal redundancy code check is valid. The Message Checker 34 will ignore all other LRC error signals generated by the LRC and VPC
Code Checker 122.
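For illustration, the 12-bit frame check performed by the Serial-to-Parallel Converter (one start bit, nine data/parity bits, two stop bits) and the bit-slip recovery described above might look like the following sketch. The on-the-wire bit ordering and polarity chosen here are assumptions.

    #include <stdbool.h>
    #include <stdint.h>

    /* Check that a received 12-bit frame is bounded by one "start" bit (taken
     * as 0) and two "stop" bits (taken as 1).  Bit 0 is assumed to be the
     * first bit received; the nine bits in between carry the 8-bit message
     * byte plus its parity bit.                                               */
    static bool frame_ok(uint16_t frame)
    {
        bool start_ok = (frame & 0x001u) == 0u;        /* start bit = 0     */
        bool stop_ok  = (frame & 0xC00u) == 0xC00u;    /* two stop bits = 1 */
        return start_ok && stop_ok;
    }

    /* If the frame is bad, the receiver discards the oldest bit, shifts in the
     * next bit received, and rechecks the framing, as described above.        */
    static uint16_t slip_one_bit(uint16_t frame, uint16_t next_bit)
    {
        return (uint16_t)(((frame >> 1) | (uint16_t)(next_bit << 11)) & 0xFFFu);
    }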

In parallel with the vertical parity and longitudinal redundancy checks, the 8-bit message byte is transferred to a Buffer 120 which interfaces with the Message Checker 34. The Buffer 120 temporarily stores each 8-bit message byte until the Message Checker 34 is ready to check it. Upon receipt of a message byte, the Buffer will set a byte ready flag signifying to the Message Checker 34 that it has a message byte ready for transfer. The Message Checker 34 will unload the message bytes from the Buffer 120 independent of the loading of new message bytes by the Serial-to-Parallel Converter 118. The 8-bit message bytes are transferred to the Message Checker 34 via a common bus 124 which is shared with all of the Receivers 32a through 32n in the Operations Controller 12. The transfer of the message between the Receivers 32 and the Message Checker 34 is on a byte-by-byte basis in response to a polling signal generated by the Message Checker. The Message Checker 34 will systematically poll each Receiver one at a time in a repetitious sequence.
MESSAGE CHECKER

The details of the Message Checker 34 are shown in Figure 11. The Message Checker 34 processes the messages

received by the Receivers 32a through 32n and verifies their logical content, records any errors detected, and forwards the messages to the Fault Tolerator 36. The operation of the Message Checker 34 is controlled by a Sequencer 126 which context switches among the multiple Receivers 32a through 32n in order to prevent overrun of the Buffers 120 in each Receiver. Each Receiver 32a through 32n is polled in a token fashion to determine if it has a message byte ready for processing. If the message byte is ready for processing when it is polled by the Sequencer 126 the byte will be processed immediately by the Message Checker 34.
Otherwise the Sequencer 126 will advance and poll the next Receiver in the polling sequence. The Sequencer 126 stores the Node identification (NID) code of the Node 10 associated with each Receiver. The Sequencer 126 also has a Byte Counter associated with each Receiver 32a through 32n which is indexed each time the Sequencer 126 unloads a byte from that particular Receiver. The byte count uniquely identifies the particular byte being processed by the Message Checker 34. The Sequencer 126 will transfer the Node identification code and the byte count to a Data Multiplexer 128 to tag the message byte as it is transferred to the Fault Tolerator 36. The Node identification code and the byte count are also transmitted to an Error Check Logic 130 and a Context Storage 132. The Error Check Logic 130 will check the Node identification code expected by the Sequencer 126 against the Node identification code contained in the first byte of the message being checked to determine if they are the same. When they are different the Error Check Logic 130 will generate an error signal which is recorded in an error status byte being generated in the Context Storage 132. The Node identification code is also used as an address into the Context Storage 132 where the relevant information pertaining to the message being processed is stored. The Context Storage 132 has a separate storage location for each Node 10 in the system which is addressed by the Node identification code contained in the message.

The Context Storage 132 stores the message type (MT) code, the data identification (DID) code, the byte count, an error status byte, a data value mask, and an intermediate error signal for each message as it is being processed. As each byte is unloaded from the Receivers, the information in the Context Storage 132 will be used by an Address Generator 134 along with the message type (MT) code, the data identification (DID) code, and the byte count which identifies the specific byte to be processed.
In response to this information, the Address Generator 134 will output an address where the required processing information is stored in a Message Checker ROM 136. The Message Checker ROM 136 stores the maximum and minimum values for the data contained in the message, the valid data identification numbers for each message type, and a data mask which identifies how many data values are contained in the message being processed and the number of bytes in each data value.
The maximum and minimum data values are transmitted to a Between Limits Checker 138 which will check the data contained in each data byte against these maximum and minimum values.
The Between Limits Checker 138 will generate four different error signals as a result of the between limits checks. The first two are the maximum value (MXER) and minimum value (MNER) error signals, signifying the data value exceeded the maximum value or was less than the minimum value. The other two error signals are the equal to maximum value (MXEQ) and equal to minimum value (MNEQ) signals. These latter error signals are transmitted to the Error Check Logic 130 which will store them in the Context Storage 132 as intermediate error signals.

The Error Check Logic 130 will OR the vertical parity code and the longitudinal redundancy code error signals generated by the Receiver and generate a parity error signal which is recorded in the error status byte being generated in the Context Storage 132. As previously described, the Error Check Logic 130 will check the expected Node identification (NID) code against the Node identification code contained in the first byte of the message and will check the message type (MT) code by checking to

see if the bits in bit positions 1, 3, and 4 of the first byte are identical. As previously described in the detailed description of the Transmitter 30, the middle bit of the 3-bit message type code is repeated in bit positions 3 and 4 for message type error detection. The Error Check Logic 130 will also check the validity of the data identification (DID) code contained in the second byte of the message against the maximum value for a (DID) code received from the Message Checker ROM 136 and will generate an error signal if the data identification code has a value greater than the maximum value. The Error Check Logic 130 will further check the two's complement range of the appropriate data byte and generate a range error (RNGER) signal when a two's complement range error is detected. It will also record in the Context Storage 132 the maximum (MXER) and the minimum (MNER) error signals generated by the Between Limits Checker 138.

With regard to the Between Limits Checker 138, often it can be determined from the first byte of a multi-byte data value whether the data value is within or outside the maximum or minimum values received from the Message Checker ROM 136, and checking of the remaining bytes is no longer necessary. However, when the Between Limits Checker 138 generates a MXEQ or MNEQ signal signifying that the data value of the byte being checked is equal to either the maximum or minimum limit value, it will be necessary to check the next byte against a maximum or a minimum value to make a factual determination of whether or not the received data value is within or outside the predetermined limits. The Error Check Logic 130, in response to an MXEQ or an MNEQ signal from the Between Limits Checker 138, will store in the Context Storage an intermediate value signal which signifies to the Context Storage 132 that the between limits check is to be continued on the next byte containing that data value. This process will be repeated with the next subsequent byte if necessary to make a final determination.
During the checking of the next byte of the particular data value, the Context Storage 132 will supply to the Error Check Logic 130 the stored intermediate value which identifies to which limit, maximum or minimum, the data value of the preceding data byte was equal.
From this information, the existence or non-existence of a between limits error can readily be determined by relatively simple logic as shown in Figure 12. A Decoder 140 responsive to the intermediate value stored in the Context Storage 132 will enable AND gates 142 and 144 if the preceding between limits check generated a signal signifying the data value contained in the preceding byte was equal to the maximum value. Alternatively, the intermediate value will enable AND gates 146 and 148 signifying that the data value contained in the preceding byte was equal to the minimum value. If on the second byte the Between Limits Checker 138 detects a maximum limit error (MXER) and AND gate 142 is enabled, the maximum limit error MXER will be recorded in the error status byte being generated in the Context Storage 132. In a like manner, if a minimum limit error (MNER) is detected on the second byte and the AND gate 146 is enabled, the minimum limit error (MNER) will be stored in the error status byte. If the second byte applies an equal to maximum (MXEQ) or equal to minimum (MNEQ) signal to the inputs of the AND gates 144 and 148, respectively, an intermediate value will again be stored in the Context Storage 132 and the final decision delayed to the next byte. The data value mask received by the Context Storage 132 from the Message Checker ROM 136 identifies the number of individual data values that are in the Data Value message being processed and which data bytes belong to each data value. This mask is used by the Error Check Logic 130 to identify the last byte in each data value. On the last byte of any data value, only maximum or minimum limit errors will be recorded in the Context Storage error status byte. The MXEQ and MNEQ signals will be ignored.
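The byte-serial limit comparison described above behaves like a small state machine: a definite over or under result on any byte settles the check, an equal-to-limit result defers the decision to the next byte, and on the last byte only definite errors are recorded. The minimal sketch below captures that logic under assumed names and simplifies the simultaneous-equality corner cases.

    #include <stdbool.h>
    #include <stdint.h>

    typedef enum {
        CHK_START,        /* no byte processed yet                            */
        CHK_IN_RANGE,     /* settled: value is inside the limits              */
        CHK_MAX_ERROR,    /* settled: value exceeds the maximum (MXER)        */
        CHK_MIN_ERROR,    /* settled: value is below the minimum (MNER)       */
        CHK_EQ_MAX,       /* intermediate: equal to the maximum so far (MXEQ) */
        CHK_EQ_MIN        /* intermediate: equal to the minimum so far (MNEQ) */
    } limit_state_t;

    /* Feed one data byte (most significant byte first) into the check.  The
     * previous state plays the role of the stored intermediate value kept in
     * the Context Storage; on the last byte an "equal" outcome counts as in
     * range because MXEQ/MNEQ are ignored there.                              */
    static limit_state_t limit_check_step(limit_state_t prev, uint8_t byte,
                                          uint8_t max_byte, uint8_t min_byte,
                                          bool last_byte)
    {
        /* A settled result is never revisited on later bytes. */
        if (prev == CHK_IN_RANGE || prev == CHK_MAX_ERROR || prev == CHK_MIN_ERROR)
            return prev;

        limit_state_t next = CHK_IN_RANGE;

        if (prev == CHK_START || prev == CHK_EQ_MAX) {       /* maximum side */
            if (byte > max_byte)       next = CHK_MAX_ERROR;
            else if (byte == max_byte) next = CHK_EQ_MAX;
        }
        if (next == CHK_IN_RANGE &&
            (prev == CHK_START || prev == CHK_EQ_MIN)) {      /* minimum side */
            if (byte < min_byte)       next = CHK_MIN_ERROR;
            else if (byte == min_byte) next = CHK_EQ_MIN;
        }
        if (last_byte && (next == CHK_EQ_MAX || next == CHK_EQ_MIN))
            next = CHK_IN_RANGE;      /* MXEQ/MNEQ are ignored on the last byte */

        return next;
    }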

The Error Check Logic 130 will also detect if the message contained the correct number of bytes. The Context Storage 132 stores the message type (MT) code for each message being processed. In response to a message signal received with a message byte from a particular Receiver 32, the Error Check Logic 130 will decode the message type code stored in the Context Storage 132 and generate a number corresponding to the number of bytes that type of message should have. It will then compare this number with the byte count generated by the Sequencer 126 prior to

receiving a new message signal from the Receiver 32 and will generate a message length error (LENER) signal when they are not the same. Because the length error (LENER) signal may not be generated until after the error status byte has been sent to the Fault Tolerator 36, the message length error signal will be passed to the Fault Tolerator 36 in the error status byte for the next message received from that Node.

The format of the error status byte formed in the Context Storage 132 is shown in Figure 13. In ascending order of bit positions, starting with the least significant or zero bit position, the error status byte contains a flag for the parity error (PARER), a flag for the length error (LENER) of the preceding message, a flag bit for the Node identification (NID) error, a flag bit for the data identification (DID) error, a flag bit for the message type (MT) error, a flag bit for the two's complement range error (RNGER), and flag bits for the maximum and minimum limit (MXER and MNER) errors.
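Expressed as C bit masks, an assumed encoding that simply follows the bit order listed above, the error status byte of Figure 13 would look like this; the mask names are illustrative, not identifiers from the patent.

    /* Error status byte of Figure 13, least significant bit first. */
    #define ERR_PARER  (1u << 0)   /* vertical parity / LRC error              */
    #define ERR_LENER  (1u << 1)   /* message length error (previous message)  */
    #define ERR_NIDER  (1u << 2)   /* Node identification (NID) error          */
    #define ERR_DIDER  (1u << 3)   /* data identification (DID) error          */
    #define ERR_MTER   (1u << 4)   /* message type (MT) error                  */
    #define ERR_RNGER  (1u << 5)   /* two's complement range error             */
    #define ERR_MXER   (1u << 6)   /* maximum limit error                      */
    #define ERR_MNER   (1u << 7)   /* minimum limit error                      */

This split is consistent with the later description of the Fault Tolerator, where the four least significant flags are filed under one subsystem code and the four most significant flags under another.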

Returning to Figure 11, the Data Multiplexer 128 transmits each message byte directly to the Fault Tolerator 36 as it is processed by the Message Checker 34. The Data Multiplexer will append to each message byte a descriptor byte which contains the Node identification code (NID) and the byte count (BYTC) received from the Sequencer 126 for that particular byte of the message. At the end of the message, independent of its length, the Data Multiplexer 128 will transmit the error status byte stored in the Context Storage 132 as the last byte. The last byte is identified by a byte count of "15" so that it can readily be identified by the Fault Tolerator 36 for fault analysis.

FAULT TOLERATOR

The details of the Fault Tolerator 36 are shown in Figure 14. The Fault Tolerator 36 has a Message Checker Interface

150 which receives the messages byte-by-byte after being checked by the Message Checker 34. Upon receipt of an error free Task Completed/Started message, the Message Checker Interface 150 will forward the identity (NID) of the Node which sent the message to a Synchronizer Interface 152, and the identity (TID) of the new task started and the branch condition contained in the message to the Scheduler Interface 154.
The Message Checker Interface 150 will also send the Node identification (NID) code and the message type (MT) code to a Voter Interface 158, and the data along with a partition bit to a Fault Tolerator RAM Interface 160. The Message Checker Interface 150 will also forward the error status byte (byte count = 15) generated by the Message Checker 34 to an Error Handler 164 for processing.
The Synchronizer 46 will report to the Error Handler 164 through the Synchronizer Interface 152 any errors it has detected in the Task Interactive Consistency (TIC) and System State (SS) messages. The Scheduler Interface 154 will forward to the Scheduler 40 the task identification (TID) code of the task started and the Node identity (NID) of each received Task
Completed/Started message. In return, the Scheduler 40 will transmit to the Error Handler 164 through the Scheduler Interface 154 any errors it has detected.

The Transmitter Interface 156 will forward to the Transmitter 30 the Base Penalty Count and Error messages generated by the Error Handler 164. As previously described, the Transmitter Interface 156 will load the first byte of the message to be transferred into the Transmitter's Input Register to signify it has a message ready for transmission. It will then await the reassertion of the buffer available (BA) signal by the Transmitter 30 before forwarding the remainder of the message to the Transmitter 30 for transmission.

A Reset Generator 157 is responsive to a reset signal generated by the Error Handler 164 when it determines its own Node is faulty, and to a power on reset (POR) signal generated when electrical power is first applied to the Node, to generate an

Operations Controller reset (OCRES) signal and an initial parameter load (IPL) signal which are transmitted to the other subsystems, effecting a reset of the Operations Controller 12.
The Fault Tolerator RAM Interface 160 will store in a Fault Tolerator RAM 162 the data contained in the message bytes as they are received from the Message Checker Interface 150. The Fault Tolerator RAM 162 is a random access memory partitioned as shown in Figure 15. A message partition section 166, as shown in Figure 15, stores in predetermined locations the messages received
from each Node. In the message partition section 166 the messages are reassembled to their original format using the identifier byte appended to the message bytes by the Message Checker 34. A
double buffering or double partitioning scheme is used to prevent overwriting of the data that is still being used by the Voter 38.
A context bit generated by the Message Checker Interface 150 determines into which of the two partitions the new data is to be written. Separate context bits are kept for each Node and are toggled only when the error status byte indicates the current message is error free. As previously discussed relative to the Message Checker 34, the message length (LENER) bit of the error status byte signifies that the preceding message had a message length error and, therefore, is ignored in the determination of an error free condition for the current message.
The format for a single message in the message partition section 166 is illustrated in Figure 16. As shown, the message is reconstructed in its original format in the Fault Tolerator RAM
162 using the Node identification (NID) code and the byte count appended to each message byte in the Message Checker as a portion of the address. The context bit generated by the Message Checker Interface 150, along with the message partition code (bits 8 through 11) generated by the Fault Tolerator RAM Interface 160, completes the address and identifies in which of the two locations in the message partition 166 the message from each Node is to be stored.

The Fault Tolerator RAM 162 has three sections used by the Error Handler 164 for generating the Base Penalty Count and

Error messages. An error code file section 170 stores the error codes used to generate the Error messages transmitted immediately after the beginning of each Atomic period and to generate the increment penalty count which is included in the Error message.
Since there are thirty-five different error detection mechanisms in each Operations Controller 12, there is a possibility of two to the thirty-fifth power of error combinations that may result from each message transmitted in the system. In order to reduce the number of combinations of errors to a reasonable number, compatible with the state of the art storage capabilities of the Fault Tolerator RAM 162, the error reports from the various subsystems are formatted into special error codes as they are received. The formatted error codes, as shown in Figure 17, include an identification of the subsystem which reported the error plus a flag indication of the errors detected. For example, the error status byte received from the Message Checker 34 is formatted into two separate error codes. The first error code contains the subsystem code 0000 which reported the errors and the error flags from the four least significant bits of the error status byte. The second error code contains the subsystem code 0001 and the error flags from the four most significant bits of the error status byte. These error codes are stored in the error code file section 170 at an address defined by the faulty Node's identification (NID) code and report number as shown in Figure 19. The error code file section 170 is double partitioned the same as the message partition section 166 so that two error files are stored for each Node. The context bit generated by the Message Checker Interface 150 identifies in which of the two error files for that Node the error code will be reported.
Each error code is used to address a group mapping section 168 of the Fault Tolerator RAM 162. The error code addresses a penalty weight pointer, as shown in Figure 18, which addresses a penalty weight section 172 of the Fault Tolerator RAM. As shown in Figure 20, the penalty weight pointer addresses a specific penalty weight which is assigned to the specific combination of reported errors contained in the formatted error code. The penalty weights resulting from each error code stored in the error file for that Node are summed in the Error Handler 164 and appended to the Error message as an increment penalty count (byte 8) for that Node. As previously indicated, the Error Handler 164 will generate only one Error message in each Atomic period for each Node which transmitted a message which contained an error.
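The two-level lookup described above (error code to group penalty-weight pointer, pointer to penalty weight) and the summing of weights into the increment penalty count can be sketched as follows. The table sizes and names are assumptions for illustration only.

    #include <stdint.h>
    #include <stddef.h>

    #define GROUP_MAP_SIZE      256u   /* assumed: one entry per 8-bit error code   */
    #define PENALTY_TABLE_SIZE  64u    /* assumed size of the penalty weight table  */

    /* Each formatted error code selects a penalty weight pointer from the group
     * mapping section, and that pointer selects the penalty weight itself.  The
     * weights of all error codes filed for a Node in the current Atomic period
     * are summed into the increment penalty count carried in byte 8 of the
     * Error message.                                                             */
    static uint16_t increment_penalty_count(const uint8_t *error_codes, size_t n,
                                            const uint8_t group_map[GROUP_MAP_SIZE],
                                            const uint8_t weights[PENALTY_TABLE_SIZE])
    {
        uint16_t sum = 0u;

        for (size_t i = 0; i < n; ++i) {
            uint8_t pointer = group_map[error_codes[i]];
            sum += weights[pointer % PENALTY_TABLE_SIZE];
        }
        return sum;
    }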
The Fault Tolerator RAM 162 will also store the deviance limits for the one byte (MT0), two byte (MT1), and four byte (MT3 and MT4) Data Value messages in four separate sections, 174, 176, 178 and 180, which are used by the Voter 38, as shall be explained with reference to the Voter hereinafter.
The details of the Message Checker Interface 150 are illustrated in Figure 21. A Store Message Module 182 receives the message bytes directly from the Message Checker 34 and stores them in the message partition section 166 of the Fault Tolerator RAM
162. The Store Message Module 182 will add the context bit stored in the Message Checker Interface Context Store 190 to the descriptor (NID plus byte count) appended to the message byte by the Message Checker 34 to generate a partition address (PID). The partition address identifies the location in the message partition section 166 where the particular message byte is to be stored. As previously discussed, at the beginning of each Master period, each Node will first transmit a Base Penalty Count message followed by a Task Completed/Started message. The Store Message Module 182 stores for each Node a first flag signifying the receipt of the Base Penalty Count message and a second flag signifying the receipt of the subsequent Task Completed/Started message. These flags are set to false at the beginning of each Master period and are reset to true when the Base Penalty Count and the Task Completed/Started messages are received for that Node. Unless both of these flags are set to true the Store Message Module 182 will disable the writing of the address of any subsequently

received messages from that Node in a Voter Interface Buffer 184.
As a result, the subsequently received data from that Node will not be processed by the Voter 38 and will be ignored during any subsequent processing. The Voter Interface Buffer is an 8 x 7 first in-first out buffer in which the four most significant bits are the four most significant bits of the partition address (context bit plus NID) for the received message in the message partition section 166 of the Fault Tolerator RAM 162. The remaining three bits are the message type code contained in the first byte of the message.
An Error Status Byte Detector 186 listens to the messages being transmitted from the Message Checker 34 to the Fault Tolerator 36 and will detect the receipt of each error status byte (byte 15) generated by the Message Checker 34. If the contents of the error status byte, with the exception of the length error (LENER) bit, are all zeros, the Error Status Byte Detector 186 will enable the Message Checker Interface Context Storage 190 to load the Voter Interface Buffer 184 through the Store Message Module 182, or to load a Task Completed Register 202 or to load a Branch Condition Register 200 as required. Otherwise the Error Status Byte Detector 186 will load each non-zero error status byte in an Error Status Buffer 188 for subsequent processing by the Error Handler 164. The Error Status Byte Detector 186 will also detect if a message is a self-test message (TID=NID) and set a self-test flag in the Error Status Buffer 188. The Error Status Buffer 188 is an 8 x 12 first in-first out buffer in which the most significant bit is a self-test flag, the next three bits are the Node's identification (NID) code and the remaining 8 bits are the received error status byte.

The Message Checker Interface Context Storage 190 temporarily stores for each Node the information contained in Table III. This information is temporarily stored since it is not known if the message is error free until the error status byte is received.

TABLE III
Message Checker Interface Context Storage

Bit     Description              When Written
13      TID Flag                 MT1, Byte Count = 2 (DID=0)
12      Partition Context Bit    Byte Count = 15
11-9    Message Type Code        Byte Count = 1
8       Branch Condition Bit     MT6, Byte Count = 4
7-0     Started TID              MT6, Byte Count = 3

The most significant bit, bit 13, signifies that the received message is a Task Interactive Consistency (TIC) message which is processed by the Synchronizer 46. This flag is set by a Task Interactive Consistency Message Detector 192 in response to a message type MT1 having a data identification code which is all zeros (DID=0), and it inhibits the loading of the address of this message in the Voter Interface Buffer 184 since the message is only used by the Synchronizer and no other subsystem of the Operations Controller. The twelfth bit is the partition context bit which identifies in which partition of the message partition section 166 the message will be stored. The context bit is toggled when the Error Status Byte Detector 186 indicates the prior message was error free. If the message is not error free, the context bit is not toggled and the next message received from that Node is written over the prior message in the Fault Tolerator RAM 162.

The message type code bits are received directly from the first byte of the message. The branch condition bit, bit 8, is received from a Branch Condition Detector 194 which detects the branch condition contained in the fourth byte of the Task Completed/Started (MT6) message. The identification of the started task (TID) is obtained from a Task Started Detector 196 which loads the TID of the started task into the seven least significant bit locations of the Message Checker Interface Context Storage 190.

Upon the receipt of an error status byte which signifies that the received message was error free, and if the message is not a Task Interactive Consistency message, the Message Checker Interface Context Storage 190 will transfer the context bit and the message type to the Store Message Module 182. In the Store Message Module 182, the context bit is added to the Node identification (NID) code to form the starting partition (PID) address of that message in the Fault Tolerator RAM 162. The message type code is appended to the partition address and they are transferred to the Voter Interface Buffer 184 for subsequent use by the Voter 38 to extract the data necessary for the voting process.
Upon the receipt of an error status byte signifying the receipt of an error free Task Completed/Started (MT6) message, the Message Checker Interface Context Storage 190 will transfer the identification (TID) code of the started task and the Node identification (NID) code to a Scheduler Interface Buffer 198 where it is transferred to the Scheduler 40 when requested. The Scheduler Interface Buffer 198 is an 8 x 11 bit first in-first out buffer which is reset at the end of the soft error window (SEW). The soft error window is generated by the Synchronizer 46 and defines a period of time bracketing the end of each Subatomic period during which the time critical messages from other Nodes should be received if they are in synchronization with each other.

In parallel, the Message Checker Interface Context Storage 190 will transfer the stored branch condition (BC) bit to the Branch Condition Register 200 and transfer the node identification (NID) code of the Node that sent the message to the Task Completed Register 202. These registers are read by the Synchronizer Interface 152 when requested by the Synchronizer 46.
The Branch Condition Register 200 and the Task Completed Registers 202 are double buffered, with a different set of registers being reset at the end of each hard error window (HEW) signal. The hard error window signal is generated by the Synchronizer 46 and brackets the soft error window (SEW) at the end of each Subatomic period and defines the maximum deviance in the arrival time of the time critical messages from the other Nodes. The function of the hard error window (HEW) and soft error window (SEW) will be discussed in greater detail in the detailed description of the Synchronizer 46.

The Error Handler, as shown in Figure 22, includes an Error Filer 204, an Error Consistency Checker 206, an Error Message Generator 208, and an Error Handler Context Store 210.
The Error Filer 204 polls the Message Checker Interface 150, the Synchronizer Interface 152, the Scheduler Interface 154, and the Voter Interface 158 for error reports from the various subsystems within the Operations Controller. The Error Filer will format the received error reports into a formatted error code, as shown in Figure 17, and tag them with an error file address, as shown in Figure 19. The error file address is a 3-bit error file identification code, a context bit which is the one generated by the Message Checker Interface 150 for filing the message in the message partition of the Fault Tolerator RAM 162, the Node identification (NID) code and a report number. As previously described, the formatted error code contains a 4-bit code which identifies the subsystem which detected the error and four flag bits identifying the errors detected.

The Error Filer 204 will pass these formatted error codes to the Fault Tolerator RAM Interface 160 which will store them in the error code file section 170 of the Fault Tolerator RAM 162.
The Error Filer 204 will also forward the number of error reports written to the Error Handler Context Store 210 so that the Error Message Generator 208 will be able to determine how many error reports to process from the Fault Tolerator RAM 162. The Error Filer 204 will also detect the self-test flag generated by the Message Checker 34 and forward this flag to the Error Message Generator 208. The self-test flag is part of one of the group codes whose penalty weight is programmed to be zero or a very small value. The self-test error message will identify all of the errors detected and will include the incremental and base penalty counts.
The Error Consistency Checker 206 is responsible for consistent handling of the error reports and the base penalty counts for each Node in the system. A form of implicit interac-

tive consistency is used to achieve this goal. At the beginning of each Master period, the Error Consistency Checker 206 receives
through the Voter Interface 158 a voted base penalty count (VBPC) which is generated by the Voter 38 in response to the Base Penalty Count messages received from all the Nodes in the system including its own. Referring now to Figure 23, these voted base penalty counts are stored in a Base Penalty Count Store 212 as the base penalty counts for each Node, independent of the values of the base penalty count stored for the preceding Master period. In this manner all the Nodes in the system will begin each Master period with the same base penalty counts for each Node in the system. The Base Penalty Count Store 212 also receives a voted increment penalty count (VIPC) which is generated by the Voter 38 from the Error messages received from all of the Nodes including its own. The voted increment penalty count (VIPC) is added to the base penalty count of the accused Node when the error is verified by a Validity Checker 218. Preferably the Validity Checker 218 is embodied in the Voter 38, but may be part of the Error Consistency Checker 206 as shown in Figure 23.

The Error Consistency Checker 206 also maintains a Current System State Register 214 which stores a voted current system state (CSS) vector and a Next System State Register 216 which stores a next system state (NSS) vector. The current system state vector identifies which Nodes are currently active in the system and which are excluded, while the next system state vector identifies which Nodes are to be included and/or which are to be excluded in the next system state. The system will change its state at the beginning of the next Master period if the voted next system state vector is different from the current system state vector. The current and next system state vectors have 8 flag bits, one for each Node, which are set when the Node is excluded and which are reset when the Node is readmitted to the operating set of Nodes.

Prior to the discussion of the Validity Checker 218, the various types of errors that are detected in each Node will be

discussed briefly. Table IV is a list of twenty-five fault detection mechanisms used in the system.

TABLE IV


Error                               Subsystem   Sym/Asym
Message Vertical Parity             MSC         A
Message Longitudinal Redundancy     MSC         A
Message Length                      MSC         A
Synchronization - Hard              MSC         A
Synchronization - Soft              MSC         A
Send Node ID                        MSC         S
Invalid Message Type                MSC         S
Invalid Data ID                     MSC         S
Task ID Sequence                    FLT         S
Data ID Sequence                    FLT         S
Data Limit                          MSC         S
Data Deviance                       FLT         S
Task Run Time                       SCH         S
Current System State                FLT         S
Next System State                   FLT         S
Penalty Count Base Deviance         FLT         S
Penalty Count Increment Deviance    FLT         S
Missed BPC Message                  FLT         S
Unsupported Error Report            FLT         S
Missing Error Report                FLT         S
Self Detection Monitor              FLT         S
M.P. Misalignment                   SYN         S
Sync Sequence Error                 SYN         S
Sync Missing Message                SYN         S
Too Many Data Messages              VTR         S
AP Reported Error                   TSC         S
Last DID Shipped                    TSC         S
Wrong Message during SEW            FLT         A
Too Many Error Reports              VTR         S
Too Many BPC                        VTR         S
Exceeded Max. No. of Errors         FLT         A

This table lists the error, the subsystem which detects the error, and whether the detection of the error is symmetric (S)
or asymmetric (A). Since the system is symmetric in its structure, most of the errors contained in the messages transmitted to each other should be detected by every other Node. Therefore, every Node should generate an Error message which identifies the error detected and the incremental penalty counts to be charged against the Node that made the error. These errors which are detected by all of the Nodes are called symmetric errors.
Therefore, the existence of symmetric errors should be verified by at least a majorlty of the active Nodès ~n the system. There also is the case where channel no~se occurs so that an error manifests 25 itself differently among the receiving Nodes. In this case, ~he majority of the Nodes w~ll agree which Node ~s faulty. However, the error or errors detected may be different for each No~e and the incremental penalty count reported in the var~ous error messa-ges may llkewise be different, A med~an vote on the incremental penalty count will be used to increment $he base penalty count for that Node. Howeverg the Validity Checker 218 w~ll not generate a ~ .

deviance error report to the Error F~ler 204 identify1ng those Nodes who~e incremental penalty counts dlffered from the voted incremental penalty count by more than the allowed amount. This is to prevent the unjust penallzing of a healthy Node.
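The median-vote step described above can be pictured with a short sketch. This is an illustration only, not the patented circuit; the names reports and allowed_deviance are assumptions:

    def vote_increment_penalty(reports, allowed_deviance):
        # reports: {reporting_node_id: increment penalty count it charged}
        # for a single accused Node.
        ordered = sorted(reports.values())
        voted = ordered[len(ordered) // 2]        # median of the reported increments
        # Nodes whose report deviates from the voted value by more than the
        # allowed amount would normally draw a deviance error report, unless
        # the errors are asymmetric (see the Validity Checker discussion below).
        deviant = [nid for nid, value in reports.items()
                   if abs(value - voted) > allowed_deviance]
        return voted, deviant

The voted value would then be added to the accused Node's base penalty count once the Validity Checker confirms that a majority supports the accusation.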
Turning now to Figure 24, the Validity Checker 218, whether embodied in the Voter 38 or the Fault Tolerator 36, has a Majority Agree Detector 224, an Asymmetric Error Detector 226, and an Error Reporter 230. The Majority Agree Detector 224 receives a "missing vector" from the Voter 38 indicating from which Nodes the Base Penalty Count or Error messages were not received and a "deviance vector" indicating which Nodes generated a Base Penalty Count or Error message in which the base penalty or increment penalty counts were outside of the permitted deviances about the voted values. The Majority Agree Detector 224 also receives a current system state vector from the Current System State Register 214. The Majority Agree Detector 224 will subtract the "deviance vector" and the "missing vector" from the current system state vector to generate a number corresponding to the number of Nodes which agree with the voted value. This number is then compared with the number of Nodes currently active in the system identified by the current system state vector. If a tie or a majority of the Nodes sent messages whose values agree with the voted values, then if the message is a Base Penalty Count message, the voted base penalty counts are stored in the Base Penalty Count Store 212. Otherwise, if the message is an Error message, the base penalty count stored in the Base Penalty Count Store 212 is incremented by the voted increment penalty count. If the messages received from the other Nodes do not represent a majority, then the Majority Agree Detector will generate a write inhibit signal which is applied to the Base Penalty Count Store 212 through an AND gate 234. This write inhibit signal will inhibit the writing of the voted values in the Base Penalty Count Store 212 provided the reported error or errors is not an asymmetric error.
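As a rough software analogue of the majority-agreement test (assuming, for this sketch only, that the current system state, missing, and deviance vectors are held as bit masks with one bit per Node):

    def majority_agrees(css, missing, deviance):
        # css, missing, deviance: bit masks with one bit per Node.
        active = bin(css).count("1")
        # Active Nodes that are either missing or deviant do not support the vote.
        not_agreeing = bin(css & (missing | deviance)).count("1")
        agreeing = active - not_agreeing
        # A tie or a majority of the currently active Nodes is sufficient.
        return 2 * agreeing >= active

    # The write inhibit signal corresponds to majority_agrees(...) being False,
    # unless the Asymmetric Error Detector overrides it as described below.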
The Asymmetric Error Detector 226 receives the deviance vector, the missing vector, and the current system state vector, and generates a deviance report inhibit signal when a majority of the Nodes send error messages identifying a particular Node as faulty but they disagree as to the incremental penalty counts to be charged against the faulty Node. The Asymmetric Error Detector will interrogate the Error Handler Context Store 210 and will generate the deviance report inhibit signal when the detected errors are determined to be asymmetric errors of the type identified in Table IV. The deviance report inhibit signal will inhibit the Error Reporter 230 from reporting to the Error Filer 204 a deviance error for any Node which sent an error message containing an incremental penalty count which deviated from the voted incremental penalty count by more than the permitted tolerance.
The deviance report inhibit signal is also applied to an inverted (negative) input of the AND gate 234. The deviance report inhibit signal will disable the AND gate 234 and block the write inhibit signal generated by the Majority Agree Detector 224. This will enable the voted increment penalty count to be added to the base penalty count stored in the Base Penalty Count Store 212.
The Error Reporter 230 receives the missing and deviance vectors from the Voter 38, the current system state (CSS) vector from the Current System State Register 214, the error report inhibit signal from the Asymmetric Error Detector 226, and the write inhibit signal from the output of the AND gate 234. In response to the absence of a write inhibit signal, the Error Reporter 230 will report to the Error Filer 204 the Node identified in the deviance vector as having deviance errors; it will also report, in response to the missing vector, each Node which did not send a Base Penalty Count or Error message as required. In response to a write inhibit signal and the absence of an error report inhibit signal from the Asymmetric Error Detector 226, the Error Reporter 230 will report each Node having reported an unsupported error. No deviance errors are reported for these unsupported Error messages. Finally, in response to an error report inhibit signal from the Asymmetric Error Detector 226, the Error Reporter 230 will report to the Error Filer 204 any Node which fails to report the asymmetric error as identified by the missing vector. As previously described, the Error Reporter 230 will not report any deviance errors in the presence of a deviance report inhibit signal from the Asymmetric Error Detector.
Returning to Figure 23, the Error Consistency Checker 206 also includes an Exclude/Readmit Threshold Comparator 220 responsive to the incrementing of the base penalty count in the Base Penalty Count Store 212 by the voted increment penalty count. The Exclude/Readmit Threshold Comparator 220 will compare the incremented base penalty count with a predetermined exclusion threshold value and, when the incremented base penalty count exceeds the exclusion threshold value, the Exclude/Readmit Threshold Comparator 220 will set the excluded flag in the Next System State Register 216 in the bit position which corresponds to the faulty Node. The setting of the excluded flag signifies that in the next System State the Fault Tolerator has determined that the Node whose exclusion flag was set should be excluded from the operating set. At the end of each Atomic period, the current (CSS) and next (NSS) System State vectors are transferred to the Synchronizer 46 and are included in the next System State (MT5) message as the current system state and the new system state vectors, respectively. The new system state is globally verified by the Voter 38 upon the receipt of the System State messages from all of the participating Nodes in the system. The majority view of what the new system state is to be is the medial value generated by the voting process. Thus, an error in a local decision to exclude or include a Node will manifest itself as a deviance error.

Actual reconfiguration of the workload to the new voted system state is carried out by the Scheduler 40, and the time at which the sequence is initiated is based on an application designer-selectable parameter. Reconfiguration can either occur at the next Atomic period after which a new system state is globally verified, or wait until the next Master period. If reconfiguration occurs at any Atomic period, then the voted new system state vector is passed to the Scheduler 40 as a system state vector during the normal transfer sequence between the Fault Tolerator and the Scheduler 40. However, if reconfiguration occurs at the Master period boundaries, the voted new system state vector is passed to the Scheduler 40 only when the flag signifying the last Subatomic period (LSAP) in the Master period is true.
To permit the readmittance of an excluded Node following an extended period of error free operation, the Error Consistency Checker 206 has a Base Penalty Count Decrementor 222 which will decrement the Base Penalty Count for each Node by a predetermined quantity at the end of each Master period. After decrementing the base penalty count for each Node, the Base Penalty Count Decrementor 222 will enable the Exclude/Readmit Threshold Comparator 220 to compare the decremented base penalty count of each excluded Node with a predetermined readmittance value. The Exclude/Readmit Threshold Comparator 220 will reset the flag in the Next System State Register 216 for each previously excluded Node whose decremented base penalty count is less than the readmittance threshold value. This permits Nodes to be readmitted to the operating set the next time the system is reconfigured, since their operation has been error free for an extended period of time. This error free operation indicates that the original fault was transient or had been corrected (repaired or replaced). Preferably, the readmittance threshold value is less than the exclusion threshold value to prevent the system from oscillating between two different system states if a Node has an intermittent fault which causes its base penalty count to fluctuate about the exclusion threshold value. The Base Penalty Count Store 212, the Current System State Register 214 and the Next System State Register 216, preferably, are incorporated in the Error Handler Context Store 210 but may be independent elements in the Error Consistency Checker 206, as shown in Figure 23. The Exclude/Readmit Threshold Comparator 220 will also detect the exclusion of its own Node and generate a RESET signal which activates a Reset Generator 157 shown in Figure 22 to generate an Operations Controller Reset signal (OCRES) and an Initial Parameter Load signal (IPL) which will cause the Operations Controller to reset and reload the initial parameters as previously described. The Reset Generator 157 is also responsive to the Power On Reset (POR) signal to generate the OCRES and IPL signals each time the electrical power to the Operations Controller is turned on.
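The exclusion/readmission hysteresis described above can be summarized in a brief sketch. The constants below (EXCLUDE_THRESHOLD, READMIT_THRESHOLD, DECREMENT) are illustrative assumptions; the patent specifies only that the readmittance threshold is lower than the exclusion threshold:

    EXCLUDE_THRESHOLD = 100   # assumed value for illustration
    READMIT_THRESHOLD = 20    # deliberately lower, to provide hysteresis
    DECREMENT = 5             # amount removed at the end of each Master period

    def end_of_atomic_period(base_count, voted_increment, excluded):
        base_count += voted_increment
        if base_count > EXCLUDE_THRESHOLD:
            excluded = True           # set exclusion flag in the next System State
        return base_count, excluded

    def end_of_master_period(base_count, excluded):
        base_count = max(0, base_count - DECREMENT)
        if excluded and base_count < READMIT_THRESHOLD:
            excluded = False          # reset flag; Node readmitted at reconfiguration
        return base_count, excluded

Keeping the readmittance threshold below the exclusion threshold prevents a Node with an intermittent fault from repeatedly toggling the system between two configurations.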
The Error Message Generator 208 will generate, during each Atomic period, an Error Message for each Node which generated a message containing an error detected by its own Operations Controller. The Error Message Generator 208 will also generate a Base Penalty Count Message at the beginning of each Master period. These messages are transmitted to the Transmitter 30 through the Transmitter Interface 156.
At the beginning of each Atomic period, the Error Message Generator 208 will set to zero (0) the Increment Penalty Count for each Node. It will then check the error code file section 170 of the Fault Tolerator RAM 162 for any error reports. The error code of each error report is used to address the Group Mapping Section 168 to obtain the pointer to the penalty weight section 172 of the Fault Tolerator RAM 162 to extract a penalty weight. This penalty weight is stored as the Increment Penalty Weight for the faulty Node in the Error Handler Context Store 210 and is used to increment the base penalty count currently being stored for that Node. This process is repeated for each reported error for each Node until the Fault Tolerator receives a System State message signifying the end of the Atomic period for each individual Node. In response to receiving a System State message from a particular Node, the increment penalty count and base penalty count for that Node are frozen. The Message Generator will then, using the content of the error code file section 170 of the Fault Tolerator RAM 162 and the stored increment penalty count and the base penalty counts stored in the Error Handler Context Store 210, construct an error message for each Node for which a fault was detected. In the event a System State message is not received from a faulty Node, the base penalty count and the increment penalty count will be frozen by sensing the High End Of Fuzzy (HEOF) signal generated by the Node's own Synchronizer 46 which signifies that all valid System State messages should have been received. This prevents the transmission of the error and base penalty count messages from being hung up while waiting for the missing System State message.
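As a rough sketch of the per-Atomic-period accumulation just described (the table names group_map and penalty_weights are assumptions standing in for the Group Mapping and penalty weight sections of the Fault Tolerator RAM):

    def accumulate_increment_counts(error_reports, group_map, penalty_weights, num_nodes):
        # error_reports: list of (node_id, error_code) gathered during the Atomic period.
        increment = [0] * num_nodes          # reset to zero at the start of the period
        for node_id, error_code in error_reports:
            group = group_map[error_code]    # error code -> pointer into the weight table
            increment[node_id] += penalty_weights[group]
        return increment                     # frozen when the Node's System State
                                             # message (or the HEOF signal) is seen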
The format of the four (4) error bytes of the Error message (bytes 3 through 7) is shown in Figure 25. The most significant bit of the first byte is a self-test flag which identifies that the reported errors were detected in a Self-Test message. In a Self-Test Error message, the increment penalty count Byte 8 will have a small value or be zero.
At the beginning of each Master period, the Error Message Generator 208 will retrieve the base penalty counts currently stored for each Node and will generate a Base Penalty Count message which is the first message transmitted after the System State message which is sent by the Synchronizer 46 at the end of the last Atomic period in each Master period. As discussed relative to the Transmitter 30, the Transmitter's Message Arbitrator 56 will poll the Fault Tolerator Interface 52 after it sends a System State message at the end of the Master period, then wait for a Base Penalty Count message generated by the Fault Tolerator 36.
Figure 26 depicts the sequence of operations of the Operations Controller which results in a reconfiguration of the system and the role played by the Fault Tolerator 36. Referring to Figure 26, at the beginning of each Master period, (a) signifies that each Node will broadcast its Base Penalty Count message and reset all of the increment penalty counts in its Error Handler Context Store to zero. The Error Handler 164 will then begin the processing of any errors detected by its own Operations Controller's error detection mechanisms. At the end of the first or any subsequent Atomic period, (b), in which a message containing an error occurs, each Node will broadcast Error messages identifying the Node which sent the message for which the errors were detected and the increment penalty count and the base penalty count for that Node. By the end of the next Atomic period (c) the Error messages from all of the Nodes should have been received.
During the next Atomic period (d) the Fault Tolerator will process the received Error messages and detect any unsupported Error messages from other Nodes and perform a medial vote on the increment penalty count for the accused Node whose reported error or errors are supported by a majority of the Nodes. This medial increment penalty count is then added to the base penalty count of the accused Node. The incremented base penalty count is then compared with the exclusion threshold. If the incremented base penalty count exceeds the exclusion threshold, the exclusion bit for that Node is set in the next System State Vector which is passed to the Synchronizer 46. At the end of that Atomic period, (e) the Synchronizer 46 will include the next System State Vector in the System State message which is broadcast to all of the other Nodes. At the beginning of the next Atomic period (f) the Fault Tolerator 36 will verify the correctness of the next System State by using a median vote of the healthy Nodes and pass this information to the Synchronizer 46 and to the Scheduler 40. Upon receipt of this information (g) the Synchronizer 46 and the Scheduler 40 will initiate a reconfiguration process in which the System State identified in the voted next System State Vector becomes the current System State for the system. After the reconfiguration is completed (h) the system will begin a new Master period in the new System State. Although the above example is directed to a single fault by a single Node, the Fault Tolerator operation is no different if more than one fault is detected for any one Node and more than one Node is accused of being faulty in the received Error messages. This sequence can be overlaid if successive failures occur in different Atomic periods.
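Purely as an aid in following steps (a) through (h), the sequence can be compressed into the outline below. Every function name is hypothetical, and the outline does not reproduce the exact Atomic-period boundaries described above:

    # Illustrative only: each step actually occurs in the period noted in the text.
    def reconfiguration_sequence(node):
        node.broadcast_base_penalty_count()               # (a) start of Master period
        node.reset_increment_penalty_counts()
        node.broadcast_error_messages()                   # (b) Atomic period with an error
        node.collect_error_messages_from_all_nodes()      # (c) next Atomic period
        voted_ipc = node.median_vote_increment_counts()   # (d) following Atomic period
        node.add_to_base_count_and_compare_threshold(voted_ipc)
        node.broadcast_next_system_state()                # (e) end of that Atomic period
        node.verify_next_state_by_median_vote()           # (f) next Atomic period
        node.reconfigure_to_voted_state()                 # (g)
        node.begin_new_master_period()                    # (h)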
VOTER
The Voter 38 performs two primary functions in the processing of the data. First, it generates a voted value for all available copies of the data and, second, it performs a deviance check to determine if the data value of each copy of the data is within a predetermined tolerance or allowed deviance. Referring to Figure 27, the Voter 38 has a Loader 236 which receives from the Fault Tolerator 36 the message type (MT) code, the node identification (NID) code, and the data identification (DID) code for each message for which a voted value is to be generated. As each such message is received, the Loader 236 will retrieve and temporarily store each copy of the data currently available in the Fault Tolerator RAM 162 which has the same data identification (DID) code as the received message. The Loader 236 will also, using the message type code, retrieve from the deviance sections 174 through 180 of the Fault Tolerator RAM 162 the predetermined deviances for that particular message.
The Loader 236 will first transmit the most significant bit of each copy of the data in parallel to an Upper Medial Value Sorter 238 and a Lower Medial Value Sorter 240 which will, respectively, sort the received bits to generate an upper (Un) and a lower (Ln) medial bit value. These upper and lower medial bit values (Un and Ln) are transferred, as they are generated, to an Averaging Circuit 242 and a Deviance Checker 244. At the end of the hard error window (HEW) the Loader 236 will generate a missing vector (MV) identifying each Node which did not send a Task Interactive Consistency or System State message. The Loader 236 will also generate a missing vector at the end of each Atomic period identifying each Node which did not generate an Error Message or a Base Penalty Count message.
The Averaging Circuit 242 adds the upper and lower medial bit values and divides the sum by two to produce a voted average.
The Deviance Checker 244 receives the upper (Un) and the lower (Ln) medial bit values, the deviance values retrieved from the Fault Tolerator RAM 162, and the corresponding data bit from each copy of the data being processed, and will determine for each bit in each copy of the data value whether or not it is within the allowed deviance. This process is repeated for each bit in each copy, starting with the most significant bit to the least significant bit. At the end of each message, a deviance error (DERR) vector is sent to the Fault Tolerator 36 identifying each Node whose message contained a deviance error.
The voted data values generated by the Averaging Circuit 242 for the Data Value messages (MT0, MT1, MT2, and MT3) are transmitted to a Voter-Task Communicator Interface 246 which passes them to the Task Communicator 44 along with the data identification (DID) code for that data value. The voted values for the base penalty counts contained in the Base Penalty Count messages, the voted values for the current and new System State Vectors contained in the System State messages, and the voted values for the incremental and base penalty counts in the Error messages are transmitted to a Voter-Fault Tolerator Interface 248 where they are passed to the Fault Tolerator 36 along with the deviance error (DERR) and the missing vector, as previously described.
The voting process and the deviance checks are repeated each time a message is received which requires a voted value to be generated. This assures that at all times the Task Communicator 44 and the Fault Tolerator 36 will have the best and most current voted values for the data values they may need. Using this type of on-the-fly voting, the system will not hang up if one or more copies of the data are unavailable due to a detected fault in the received message or a faulty Node fails to generate the required message.
The Upper and Lower Medial Value Sorters 238 and 240, respectively, extract two values from the data values being processed. The values chosen depend upon whether the median select (MS) or the mean of the medial extremes (MME) voting is implemented. To avoid confusion, only the implementation for extracting the upper and lower medial values will be discussed. Minor changes to implement the mean of the medial extremes (MME) sorting will be discussed briefly hereinafter.
Referring first to Flgure 28, the process begins hy ini-t~aliz~ng the bit count (n) to zero and to generate an initial median value S'n = Sn = (m-l)/2 as shown in block 250. In the calculat~on of the Inltlal median value Sn, m ls the actual number of cop1es of the data being processed whlch is obtained from the Loader 236. The Upper Medi~l Value Sorter 238 then counts, as shown in block 252, the numben of ones (1's) contained in the most signlficant blt positions of all the copies to generate a number n1, whlch is the number of one blts counted.
The Upper Me~ial Value Sorter 238 wil1 then lnquire lf nl - S'n is equal to or less than zero, as shown in decls1On block 254. If the number of l's ls less than S'n than the upper medial bit value ls a 0-blt as indicated in block 256. If the upper medial value Un is a 0-bit, then all the cop~es hav~ng a l-bit in the same bit posit1On are excluded from the subsequent processlng to determine the value of the rema~ning upper med~al value b~ts. ~ecause some coples of the data value are now excluded, a new value S~n is com-puted as indlcated ~n block 258 by subtractlng From S'n the number of excluded coples (nl) to generate a new value for the analysis of the next h~ghest blt.

When the number of 1-bits ~s greater than S'n then the upper med1an ~alue Un is a l-bit as lndicated ln block 260, and ~; ~ all of the copies having 0-blts 1n the same bit position are ;

~3~ ?3 -6h-excluded from the subsequent processing. After the upper median value Un for the most significant bit is determined, the process wlll proceed to the next most slgn~f~cant bit, block 27~, and the above procedure is repeated until all the bits in the data value (n=nmaX) haYe been processed dS lndlcated in dec~sion block 274.
In a s~m~lar manner, the Lower Medlal Value Sorter 240 w~ll count the number of O's as lnd~cated ~n block 262 to generate a number nO equal to the number of O's counted. If the rum~er oF
O's (nO) ~s less than Sn, as ind~cated ln decision block 264, (nO ~ Sn < O) then the lower med~al blt (Ln) is a 1-bit and all the data copies havlng a O-b~t in the same bit position are excluded from processing o~ the subsequent lower medial bits.
Aga~r, the med~al Yalue S~ is corrected by subtracting the number of excluded cop~es tnO) from the precedlng values for Sn as indi-cated in block 268. If the numher of O-bits (nO~ is greater than Sn~ then the lower me~al value of the b~t pos~tion is a O-bit as ~ndicated ~n block 270 ~nd the data copies hav~ng a 1-bit in the same bit position are excluded, This process is repeated until all of the lower medlal value b~ts are detenmine~.
The circuit details for the Lower Medial Value Sorter 240 are shown in Figure 29. The circuit details for the Upper Medial Value Sorter 238 are the mirror of the circuit shown in Figure 29 except that a Zero Counter 280 is replaced by an equivalent One Counter. Referring to Figure 29, the data values from each copy of the data value retrieved by the Loader 236 are received bit by bit, from the most significant value to the least significant value, by a plurality of OR gates 276-0 through 276-N and by a like plurality of Exclusive NOR gates collectively indicated by block 278. The OR gates 276-0 through 276-N will pass the received bits to the Zero Counter 280. The Zero Counter 280 will actually count the number of 1-bits received and subtract that number from the number of copies (m) being processed to generate the number of 0's (n0). The Loader 236 counts the number of copies of the data it retrieves from the Fault Tolerator RAM 162 and supplies this number to the Zero Counter 280 and to an Sn Generator 282. The Sn Generator 282 subtracts one from m and divides the remainder by two to generate the initial value for Sn. The output of the Sn Generator 282 is received by a 2:1 Multiplexer 284 which will pass the initial value of Sn to a Register 286. The output (n0) of the Zero Counter 280 and the content of the Register 286 are received by a Dual Comparator 288 which performs the dual comparison of n0 = Sn and n0 < Sn. The outputs of the Dual Comparator 288 are applied to the inputs of an OR gate 290 which outputs the lower medial value Ln. If n0 = Sn or n0 < Sn, then the lower medial bit value Ln is a 1-bit, as indicated by decision block 264 and block 266 of Figure 28. The exclusion of the copies having 0-bits in the same bit position is performed by the Exclusive NOR gates 278 and Register 294. The 1-bit value produced by the OR gate 290 is applied to the input of each of the Exclusive NOR gates. The Exclusive NOR gates 278 will generate a logical 1 signal for each copy of the data value which has a 0-bit in the same bit position. This logical 1 is stored in the Exclusion Register 294, the output of which is connected to the alternate inputs of OR gates 276-0 through 276-N. As a result, the inputs of the respective OR gates 276-0 through 276-N whose data has a 0-bit in the same bit position will be a 1-bit which is passed to the Zero Counter, thereby excluding them from further participation in the determination of the lower medial bit values. If the lower medial bit value Ln is a 0, then a logical 0 signal is applied to the Exclusive NOR gates 278 which causes a logical 1 to be stored in the Exclusion Register 294 for each copy of the data value which presented a 1-bit for processing. The output, n0, from the Zero Counter 280 and the lower medial bit value Ln are applied to the input of a Gating Circuit 296 which passes the value of n0 to a Subtraction Circuit 298 when the lower medial bit value Ln is equal to 1. The Subtraction Circuit 298 also receives the current value of Sn stored in Register 286, and performs the subtraction Sn = Sn - n0 indicated in block 268 of Figure 28. This new value of Sn is applied to a second input of the Multiplexer 284 and is passed to the Register 286 for use in the processing of the next lower medial bit value. A sequencer circuit (not shown) will monitor the number of bits processed and will clear the Exclusion Register 294 and Sn Register 286 after the last lower medial bit Ln is generated for the current set of data values in preparation for the processing of the next set of data values.
The operation of the Upper Medial Value Sorter 238 and the Lower Medial Value Sorter 240 for producing upper (Un) and lower (Ln) medial values for generating a Mean of the Medial Extremes (MME) voted value is identical to that described above except that the median values S'n and Sn are replaced with values T'n and Tn which are the smallest of S'n and Sn, respectively, or a fixed value. The resulting voted value generated by the Averaging Circuit 242 using these values is then the Mean of the Medial Extremes.
The operation of the Averaging Circuit 242 will be explained with reference to the flow diagram shown in Figure 30 and the circuit diagram shown in Figure 31. The averaging process forms the mean of the upper and lower values by keeping two versions of the mean value M and choosing between them as later bits arrive. This process rests on the following two facts:
a) If the bits Un and Ln are identical at any particular bit position, the mean bit M is the same, except for the case described in (b) below. If the bits are different, then the mean is ½, which in binary form is a 0,1; and

b) A sequence ½, ½, ½, ... ½ can be resolved into a binary format only when the first identical pair following the sequence arrives. For example, the sequence ½, ½, ½, ... ½, 0, where 0 represents the arrival of Un and Ln both having 0 values, resolves to 011...11, and the sequence ½, ½, ½, ... ½, 1, where 1 represents the arrival of Un and Ln both having 1 values, resolves to 100...00.


Referring to Figure 30, the process begins by initializing the value A to 0 and the bit number n to 0 as indicated by block 300. The value A is the Exclusive OR of the preceding values of Ln and Un and is set to 0 at the beginning of the process. The process then inquires, decision block 302, if the Exclusive OR of Ln and Un is equal to zero (Ln ⊕ Un = 0). If the Exclusive OR of Ln and Un is equal to zero, the process then inquires if A is equal to 0, as indicated by decision block 310. If A is equal to 0, the value of Ln is inserted into both registers M1 and M2 as indicated by block 312. Registers M1 and M2 keep two different versions of the mean M in order to resolve the problem discussed above in (b) where Un and Ln are different. If A in decision block 310 is not equal to 0, then the Averaging Circuit 242 enters the complement (Ūn) of Un into registers M1 and M2 as indicated by block 314. The process then inquires, decision block 316, if Ln is equal to 0. This is the resolution of the sequences discussed in (b) above, if the sequences exist. In the instant embodiment M1 stores the sequence (1, 0, 0, ... 0, 0) described above and M2 stores the second sequence (0, 1, 1, ... 1, 1). If Ln = 0, then the sequence is resolved to be the sequence stored in M2; therefore, M1 is made equal to M2, as indicated in block 320. Otherwise, if Ln is not equal to 0, then Ln is a 1, and the sequence is resolved to be the sequence stored in M1 and M2 is made equal to M1, as indicated in block 318.

If the Exclusive OR of Ln and Un in decision block 302 is equal to 1, signifying Ln and Un are different, the process inquires, decision block 304, if this is the first time this has occurred (A=0). If A=0, then a 1 is inserted into the corresponding bit position of the register M1, starting the sequence (1, 0, 0, ... 0, 0), and a zero is inserted into the register M2, starting the sequence (0, 1, 1, ... 1, 1). If A=1, signifying that this is not a first occurrence of Ln and Un being different, a zero is inserted into the corresponding bit position of register M1 and a 1 is inserted in the corresponding bit position of register M2. The process then generates a new value for A depending upon the Exclusive OR of the current values of Un and Ln as indicated by block 322. The process will then index the bit count to n = n+1, block 324, then inquire, decision block 326, if the last bit (nmax) has been processed. If not, the Averaging Circuit 242 will proceed to process the next values of Un and Ln generated by the Upper and Lower Medial Value Sorters 238 and 240, respectively.
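The two-register resolution just described can be mirrored in a short sketch. This is an interpretation of the Figure 30 flow, not the circuit itself, and the names are assumptions:

    def average_medials(lower, upper, width):
        # Forms the mean of the lower (Ln) and upper (Un) medial values bit by
        # bit, keeping the two candidate versions M1 and M2 described above.
        m1 = m2 = 0
        a = 0                                    # A: previous Ln XOR Un
        for n in range(width - 1, -1, -1):       # most significant bit first
            ln = (lower >> n) & 1
            un = (upper >> n) & 1
            if ln ^ un == 0:                     # bits identical (block 302)
                bit = ln if a == 0 else (un ^ 1)     # block 312 / block 314
                m1 = (m1 << 1) | bit
                m2 = (m2 << 1) | bit
                if a == 1:                       # resolve any pending ½ sequence
                    if ln == 0:
                        m1 = m2                  # block 320
                    else:
                        m2 = m1                  # block 318
            else:                                # bits differ (blocks 306 / 308)
                m1 = (m1 << 1) | (1 if a == 0 else 0)
                m2 = (m2 << 1) | (0 if a == 0 else 1)
            a = ln ^ un                          # block 322
        # When the last pair of bits is identical, m1 == m2 and either register
        # holds the voted average; otherwise the exact mean lies halfway between
        # m2 and m1.
        return m1, m2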
Referring now to Figure 31, the medial values Un and Ln, respectively, are received by an Exclusive OR gate 328 which produces a 1 at its output when Un and Ln are different and a 0 bit at its output when Un and Ln are the same. The output of the Exclusive OR gate 328 is transmitted to the input of a 2-bit Shift Register 330, an inverted input of AND gate 332, and an input of AND gate 348. The Shift Register 330 temporarily stores the output of the Exclusive OR gate 328 for use in processing the next Un and Ln bits received from the Upper and Lower Medial Value Sorters 238 and 240. The delayed output of the Shift Register 330 is the value A discussed with reference to Figure 30. The upper medial bit Un is also applied to the 0 and 3 inputs of a pair of 8:1 Multiplexers 334 and 336. The other inputs to Multiplexers 334 and 336 are preset as shown. The values of A, Un, and Ln are used to address the Multiplexers 334 and 336 to output the value Un or one of the preset values. For example, if A=Ln=Un=0 then the Multiplexers 334 and 336 would both output the 0 input which is the value of Un, as indicated by block 312 in Figure 30. Likewise, if A=0 and Ln=Un=1 then the Multiplexers 334 and 336 would both output the value applied to the third input to the Multiplexers 334 and 336, which is the value of Un. In the first example, Un was equal to 0 and in the second example, Un was equal to 1. Note, if A is 0 and Un and Ln are different, then the Multiplexer 334 will output a 1 and the Multiplexer 336 will output a 0 as indicated by block 306. However, if A=1 and Ln and Un are different, the outputs of the Multiplexers 334 and 336 will be reversed as indicated by block 308 of Figure 30.
The outputs of the Multiplexers 334 and 336 are received by 3:1 Multiplexers 338 and 340 as shown. The Multiplexers 338 and 340 also receive the output of an Inverter 342 which is the complement (Ūn) of the upper medial bit value Un. The outputs of the 3:1 Multiplexers 338 and 340 are received by an M1 Register 344 and an M2 Register 346, respectively. The outputs of the 3:1 Multiplexers 338 and 340 are controlled by NAND gate 332 and AND gate 348. The NAND gate 332 produces a logical 1 output when the output of Exclusive OR gate 328 is 0 and the value A is 1. This actuates the 3:1 Multiplexers 338 and 340 to store the complement Ūn of the upper medial bit value in both the M1 Register 344 and the M2 Register 346, respectively, as indicated by block 314 of Figure 30. The AND gate 348 produces a logical 1 output when the output of the Exclusive OR gate 328 is a 1 and A is a 1, which causes the output of the Multiplexer 334 to be stored in the M2 Register 346 and the output of Multiplexer 336 to be stored in the M1 Register 344 as indicated by block 308 in Figure 30.
The output of the NAND gate 332 is also used to actuate the M1 Register 344 and the M2 Register 346 to copy the content of the M1 Register 344 into the M2 Register 346 or vice versa depending upon the value of Ln as indicated by block 316 in Figure 30. The output of the NAND gate 332 and the lower medial bit Ln are applied to the inputs of an AND gate 350, the output of which determines whether the content of the M1 Register 344 will be transferred to the M2 Register 346 or vice versa as indicated by blocks 318 and 320 of Figure 30.
The operation of the Deviance Checker 244 shall be discussed with respect to the flow diagram shown in Figure 32 and the circuit implementation shown in Figure 33. The circuit shown in Figure 33 is replicated in the Deviance Checker 244, one circuit for each Node in the system, so that the deviance checks on all the data values being checked can be performed in parallel.
In order not to cause any significant delays in checking the deviance, the Deviance Checker 244 processes the data being checked on a bit-by-bit basis from the most significant bit to the least significant bit as the upper medial Un and the lower medial Ln values become available from the Upper Medial Value Sorter 238 and the Lower Medial Value Sorter 240. The deviance checking process is based on the condition that A > B can be distinguished from A < B by adding B to the two's complement of A and looking for an overflow at the most significant bit (MSB). In the instant application the Deviance Checker actually checks the relationship of the following equation:
M - D < V < M + D     (1)

where:  M is the medial value ½(Ln + Un);
        D is the predetermined deviance limit retrieved from the Fault Tolerator RAM 162 for the particular data value being checked; and
        V is the data value being checked.

Since the solution for M - D < V is substantially equivalent to the solution for V < M + D, we will only discuss the latter in detail. The differences between the two solutions are well within the purview of one skilled in the art.

The process adds the four available inputs Ln, Un, D and V and looks for an overflow at the most significant bit position using the equation:

M + D - V = ½(L + U) + D - V ≥ 0     (2)

which can be rewritten as:

L + U + 2D - 2V = L + U + 2D + 2V̄ + 1 = L + U + (2D + 1) + 2V̄ ≥ 0     (3)

where 2V̄ is the 2's complement of 2 times the data value V.
This process is complicated by the fact that we are adding four bits rather than three, since the value of the mean M is not available. In the addition of four bits there is the possibility that all four bits are 1's, causing a double carry to the second previous bit. The solution to this is as follows:
a) A sequence such as ...110XY cannot overflow. For example, in the worst case (X=Y=1) even two double carries gives the result ...11100. Therefore, if an overflow has not already occurred, a zero (0) in the second previous bit position unconditionally indicates that no overflow will occur at the most significant bit position whatever happens to the later bits; and

b) The sequence before the second previous bit will always be 111...111 if neither an overflow nor the condition in (a) above has occurred. Therefore, a carry past the second previous bit will always cause an overflow.

The process proceeds by successively examining the value of the second previous bit B as carries from the later bits are added to it. If a carry occurs beyond the second previous bit, then an overflow occurs and V < M + D. However, if the second previous bit B is 0, without a prior or current overflow, then V > M + D. Finally, if all bits pass without either of the above conditions occurring, then the sum M + D - V is less than 0 and V > M + D.
Referring now to the flow diagram shown in Figure 32, the circuit is initialized as shown in block 352 by setting the bit number n = 0, the initial sum bit S_-1 = 1, and the interim sum bit of the second preceding bit B'_-1 = 0. The process then proceeds to add 2V̄ + D' + L + U as indicated in block 354, where 2V̄ is the 2's complement of 2V, D' is 2D + 1, which is the deviance value actually stored in the Fault Tolerator RAM 162, and Un and Ln are the upper and lower medial values received from the Upper and Lower Medial Value Sorters 238 and 240, respectively. As indicated in block 354, the result of this addition produces a first previous sum bit S_-1, which is the sum value obtained during the processing of the preceding data value bit, a carry bit C, and a double carry bit C' obtained in the processing of the current data bit value.
Next, the process adds the first previous sum bit S_-1, generated during the processing of the preceding data value bit, with the current carry bit C as indicated in block 356 to generate a second previous sum bit interim value B'_-2 which is used in the processing of the next data value bit. The addition also produces a third carry bit C'' which is indicative of an additional carry resulting from the processing of the current bits and the first previous bits. The carry bit C'', from block 356, is added to the double carry bit C' of the current data value being processed and to the interim value B'_-2. The sum and carry bits resulting from the addition of (C' + C'') + B'_-2 are a carry bit A for the second preceding bit and a bit value B, which is the final bit value of the second previous bit after correcting for all carries. The process then inquires if the carry bit A is equal to 1 as indicated in decision block 360. If A=1, then V < M + D as previously indicated and the "pass" flag is set as indicated in block 362. However, if A=0, the process inquires, decision block 364, if B=0. If the second previous bit B is equal to zero, then there will be no overflow at the most significant bit position. Therefore, V > M + D and the "fail" flag is set, indicating that the data value failed the deviance test as indicated in block 366. If B is not equal to zero, the process will proceed to check the next bit of the data value as indicated by block 368. Finally, if all of the data value bits are checked and neither the "pass" nor "fail" flag is set, the process will automatically set the "fail" flag as indicated, ending the process.
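Functionally, the bit-serial circuit above carries out the word-level comparison shown in the sketch below. This direct form is offered only as an illustrative equivalent of equation (1), not as the patented bit-serial implementation, and the names are assumptions:

    def deviance_check(value, lower_medial, upper_medial, deviance):
        # M is the mean of the lower and upper medial values (equation 1).
        # The test is scaled by 2 to avoid the half bit, mirroring equation 3.
        m2 = lower_medial + upper_medial          # 2 * M
        return (m2 - 2 * deviance) <= 2 * value <= (m2 + 2 * deviance)

    # A copy that fails this test for a given message would be flagged in the
    # deviance error (DERR) vector returned to the Fault Tolerator.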
Referring now to Figure 33, an Adder 372 adds the first 3 bits Un, Ln, and D' to produce an interim sum bit S' and a first carry bit C1. The interim sum bit S' is received by an AND gate 374 and an Exclusive OR gate 376 as indicated in equation 3. The AND gate 374 will output a second carry bit C2 which is applied to one input of an AND gate 378 and an input of an Exclusive OR gate 380. The AND gate 378 receives the carry bit C1 from the Adder 372 at its other input. The Exclusive OR gate 380 also receives the carry bit C1 from the Adder 372 at its alternate input.
The output of the Exclusive OR gate 376 is a sum bit S which is temporarily stored in a two-bit Shift Register 382 until the processing of the next data value bit. The output of the Exclusive OR gate 380 is a single carry bit C which is received at the inputs of an AND gate 384 and an Exclusive OR gate 386. The AND gate 384 and the Exclusive OR gate 386 receive the sum bit S_-1 at their other inputs from the Shift Register 382. The sum bit S_-1 is the sum bit S generated during the processing of the previous data value bit. The output of the Exclusive OR gate 386 is the sum of the sum bit S_-1 and a carry bit generated during the processing of the current data bit, which is a preliminary bit value B'_-1 which is stored in a second Shift Register 392. The preliminary value B'_-1 is an interim value of the second preceding bit value before correction for the carry bits. The output of the AND gate 384 is a carry bit C'' which is received at an input to an Exclusive OR gate 390 which also receives at its alternate input the double carry output C' from the AND gate 378.
The output of the Exclusive OR gate 390 is received at an input to an AND gate 388 and an input to an Exclusive OR gate 394. The output of the Shift Register 392 is received at the alternate input to the AND gate 388 and Exclusive OR gate 394. The output of the AND gate 388 is the carry bit signal "A" for the second preceding bit which is applied to the set input of an S-R flip flop 398. The Q output of the S-R flip flop 398 is applied to the D input of a D-type flip flop 400. The output of the D-type flip flop 400 is the pass-fail flag for the deviance check.
If A=1, as indicated in decision block 360 of Figure 32, then the Q outputs of the S-R flip flop 398 and D-type flip flop 400 are 1's, signifying that the data value (V) is less than the median (M) plus the deviance (D). If the Q outputs of the S-R flip flop 398 and D-type flip flop 400 are 0's, then the data value failed the deviance check.
The output of the Exclusive OR gate 394 is the final bit value B of the second preceding data value after corrections for single and double carries. The final bit value B is inverted by an Inverter 402 whose output is connected to the SET input of a second S-R flip flop 404. The Q output of S-R flip flop 404 is applied to one input of an AND gate 406 whose output is connected to the clock input of the D-type flip flop 400 through an OR gate 408. A clock pulse (CLK) is applied to the alternate input of the AND gate 406 and is applied to the input of the D-type flip flop 400 when the AND gate 406 is enabled by the Q output of the S-R flip flop 404.
A Bit Counter 410 counts the number of bits processed and generates an overflow pulse after all the bits have been processed. The overflow pulse is applied to the clock input of the D-type flip flop 400 through an AND gate 412 and the OR gate 408. The alternate input to the AND gate 412 is received from the Q output of the S-R flip flop 398 and is disabled when the S-R flip flop 398 is placed in its SET state by the carry signal A being a 1.
In operation, the Adder 372 produces the interim sum bit S' and the carry bit C1 resulting from the adding of Un, Ln, and D'. The AND gate 374 produces a carry bit C2 which results from adding the 2's complement (2V̄) of 2V to the sum of Un, Ln, and D'. The carry bit C2 is combined with the carry bit C1 from the Adder 372 in AND gate 378 to produce the double carry bit C' when both C1 and C2 are 1's. The output of the Exclusive OR gate 380 is indicative of a single carry bit C from either the Adder 372 or the AND gate 374. The sum signal S_-1 is the sum S output from the Exclusive OR gate 376 which is output from the Shift Register 382 during the processing of the next subsequent data bit. These are the operations specified in block 354 of Figure 32. The operations of block 356 are carried out by the AND gate 384, Exclusive OR gate 386, and Shift Register 392. The Exclusive OR gate 386 produces an interim sum value bit B'_-1 from the sum bit S_-1 from the Shift Register 382 and the carry bit C from the Exclusive OR gate 380. The sum bit B'_-2 is the signal B'_-1 output from the Shift Register 392 during the processing of the second subsequent data value bit. The carry bit C'' is the output of the AND gate 384, which is a continuation of the carry bit C when the sum bit S_-1 generated in the processing of the preceding data value bit is a 1. During the processing of the next data value bit, the Exclusive OR gate 390 and the AND gate 388 will generate the value A and the Exclusive OR gate 394 will generate the bit value B as indicated in block 358. The value of A is 0 when the interim value of the second preceding sum bit B'_-2 is 0, or when both C' and C'' are 0's, indicating no carry bits C1 or C2 have resulted from the processing of the current data value bit. The value of A is 1 when C' or C'' is a 1 and the interim value of the second preceding sum bit B'_-2 is 1. The value of B is 1 when B'_-2 is 1 and C' and C'' are 0's, or when B'_-2 is 0 and C' or C'' is a 1.
When A is a 1, the S-R flip flop 398 will be set and its Q output will be a 1, which when applied to the D input of the D-type flip flop 400 will cause its Q output to become a 1. A 1 at the Q output of the D-type flip flop 400 is the pass flag as indicated in block 362. The Q output of the S-R flip flop 398 will disable the AND gate 412, preventing the overflow bit from the Bit Counter 410 from toggling the D-type flip flop 400 after the processing of the last bit. If the AND gate 412 is not disabled by the Q output of the S-R flip flop 398, the overflow bit from the Bit Counter 410 will toggle the D-type flip flop 400, changing its Q output from a 1 to a 0. A 0 Q output of the D-type flip flop 400 is the fail flag, as indicated by block 366.
The function of the decision block 364 is carried out by the Inverter 402, the S-R flip flop 404 and the AND gate 406. When B is 0, the Inverter 402 will cause the S-R flip flop 404 to be placed in the set state causing its Q output to be a 1. A 1 from the Q output of the S-R flip flop 404 enables the AND gate 406 to pass a clock (CLK) pulse which will toggle the D-type flip flop 400 through the OR gate 408. The Q output of the D-type flip flop 400, in the absence of the Q output of the S-R flip flop 398 being a 1, will go to a low or 0 signal. A low or 0 Q output of the D-type flip flop 400, as previously indicated, is the fail flag indicated in block 366. The pass or fail flag is passed to the Fault Tolerator 36 through the Voter-Fault Tolerator Interface 248 as a deviance error (DERR).

SCHEDULER

The Scheduler 40 has two modes of operation, a normal mode and a reconfiguration mode. In the normal mode, the Scheduler 40 schedules the application tasks for each operating Node in the system including its own, and monitors the execution of these tasks. The reconfiguration mode is entered whenever the Fault Tolerator 36 determines that one or more Nodes are to be excluded from or readmitted to the operating set. The two modes interact through an activation status which defines which tasks are eligible for execution by each Node. The reconfiguration mode modifies the activation status, whereas the normal mode utilizes the activation status to schedule the tasks.
During normal mode operation, the Scheduler 40 implements a dynamic, priority-based, nonpreemptive task scheduling process. Concurrent programming practices and the resolution of inter-task dependencies are supported at the boundaries between the tasks. Task-to-node allocation is static for any given System State (configuration), but the sequencing of tasks and the resolution of dependencies are performed dynamically. The Scheduler 40 in each Node replicates the scheduling process for every active Node in the system. Fault detection mechanisms permit each Node to recognize erroneous behavior in the sequencing or timing of the tasks executed by any Node.

-7~- 3l~r~ 3~3 During reconfiguration, tasks may be reallocaten among the operating Nodes. Tasks may also be added or deleted fr~m the actlve task set to conform to the changes in the overall system capabilities.

During start up or reset of the Operations Controller 12 the Scheduler 40 enters the reconfiguration mode with the assumption that no Nodes are operating. When the Fault Tolerator 36 recognizes an "operating set," that information is passed to the Scheduler 40 as a new System State Vector. The Scheduler then reconfigures the tasks in accordance with the received new System State Vector. By using this method the operation of the Scheduler 40 is self-bootstrapping.

A block diagram of the Scheduler 40 is shown in Figure 34. A Task Selector Module 414 receives information from the Fault Tolerator 36 through a Fault Tolerator Interface 416, from the Synchronizer 46 through a Synchronizer Interface 418, and from the Task Communicator 44 through a Task Communicator Interface 420. The Task Selector Module 414 also communicates with a Scheduler RAM 422 and a Scheduler ROM 424 through a Memory Interface 426.

A Reconfiguration Module 428 is responsive to the reception of a new System State Vector from the Fault Tolerator 36 to reallocate the tasks to be selected and executed by the new set of operating Nodes. The Reconfiguration Module 428 will change the activation status of the tasks stored in the Scheduler RAM 422 using predetermined information stored in the Scheduler ROM 424.
A map of the Scheduler RAM 422 is shown in Figure 35.
The entry Old TID contains an entry for each Node in the system and stores the TID previously started by that Node. The Swap Table entry contains an entry for each task (TID) and stores a predecessor count which is the total number of immediate predecessors to that particular task, a periodicity corresponding to how many Atomic periods must pass between the executions of the task, and two swap count numbers which are used to swap or change the active status of a task on a particular Node, as shall be explained during the discussion relative to reconfiguration.

The Selection Queue 450 has 3 pages9 NEXT, PREVIOUS an~
10 CHECK. Each paye contains three entries for each Node corresponding to the three highest priority tasks currently ready for execution by that Node. "Used" is a Boolean value indicating whether the current iteration of the task in the entry has been started by that Node, ITER is the interation number of that task in the entry~ and T[D is the task ldentlfication code for that task. The NEXT page is the entry from which the next task to be executed for each Node is selected, the PREVIOUS page lists the tasks selected dur~ng the precedlng Subatom~c period, and the CHECK page contalns the tasks selected during the second preceding Subatom~c period for that Node. The pages are rotated at the beginning of each Subatomic period~ and the newly selected task for each Node is stored ~n the NEXT page.

The Completion Status List contains~ for each task, a complet10n count wh~ch corresponds to the number of copies of that task that have been completed, the hranch condition count which stcres a number correspond~ng to the number of received Task Completed/Started messages ln which the branch conditlon has a value of 1 and an allocation entry which contains the allocation of that task among the various Nodes.

The Priority Scan List stores for each task the predecessor count, which is the number of preceding tasks which have to be completed before that task can be executed, the iteration number of that task, and its allocation. The Task Activity List entry stores for each task the predecessor count, the periodicity of the task, and its allocation.
A map of the Scheduler ROM 424 is shown ;n Figure 36.
The first entry is the Success~r L~st which llsts the successor tasks for each termlnated task~ ~h~s 11st is accessed by the address of the Successor Offset as shall be exp1ained hereinafter.
There are two Successor-Lists, one for each of the two possible branch condit~ons. The next four entries are the Preference Vec-tors for each task and ident~fies those Nodes preferred for the execution of that task. The Relevance Yector contains two entries, IO the first INCLUDE/EXCLUDE 1dentifies whether the task is to ~e executed by the Nodes included in the Operating Set or executed hy the Nodes excluded from the Opera~ing Set, and a Relevance Vector which identifies to which Nodes the task is relevant. The Initial Swap Table entry contains for each task, the initial predecessor count, the per~odicity, and the initial swap counts for each task which are loaded into the Task Activity List of the Scheduler RAM
422 during reset or recon~lguration as shall be dlscussed later.

The next two entries are the Initial Allocation Counters for each task and list the initial allocation count or toggle point for each task-node combination. These values are loaded into the Allocation Tables in the Scheduler RAM 422 following reset or power-up. The entry Maximum Execution Time Table stores the 2's complement of the maximum execution time for each task and is loaded into the execution timer, for that Node, when the task is started. The entry Minimum Execution Time Table stores the 2's complement of the minimum execution time for each task and is used to check the execution time of each task when it is reported as being completed. The Successor Offset entry contains for each task the starting address in the Successor List where the successor tasks are stored. Finally, the Initializing Table entry stores the maximum Node Identification code (NID) and the maximum Task Identification code (TID) used in the system which are used to identify when a particular operation is completed.

Figure 37 shows the details of the Task Selector Module 414. The NID and started TID fields of the Task Completed/Started messages are transferred directly from the Fault Tolerator Interface 416 to the Task Communicator Interface 420, and are also temporarily stored in an On-Board RAM 430. A Completed/Started Handler 432 transfers the TID and NID of each task identified in a Task Completed/Started message from the On-Board RAM 430 to a Started TID Register 434 shortly after the end of the Soft Error Window (SEW) at the end of each Subatomic period. This is the period of time when all non-faulty Operations Controllers are transmitting their Task Interactive Consistency or System State messages and all the Task Completed/Started messages from the preceding Subatomic period should have been received. The Started TID Register 434 for each Node is a 3-deep queue in which the new NID and TID are added to the tail of the queue and removed from the head.
The Task Selector Module 414 also has a TIC Handler 436, which is responsive to the Byzantine voted values of the task completed vector and the branch condition bits of the Task Interactive Consistency (TIC) messages; this data, received from the Byzantine Voter in the Synchronizer 46, is used to update a Selection Queue 450 and a Completion Status List 438. The Task Selector Module 414 further includes a Wake-up Sequencer 440, responsive to the various period signals generated by a Period Counter 442, for transferring active tasks from a Task Activity List 444 to a Priority Scan List 446 and to the Completion Status List 438; a Priority Scanner 448 which selects the tasks in the Priority Scan List 446 which are placed in the Selection Queue 450; the Next Task Selector 452 which selects the highest priority task in the Selection Queue 450 and places it in a Next Task Register 454 from where it is transferred to the Task Communicator 44 for execution by the Applications Processor; an Execution Timer 456 which monitors the execution time of each task being executed by the individual Nodes in the system; and an Old TID List 458 which stores the current task being executed by each Node. The Task Activity List 444, the Priority Scan List 446, the Completion Status List 438, the Selection Queue 450 and the Old TID List 458 are embodied in the Scheduler RAM 422 as discussed relative to Figure 35.

The operation of the Wake-up Sequencer 440, the Execution Timer 456, the TIC Handler 436, the Priority Scanner 448, and the Next Task Selector 452 will be discussed relative to the flow diagrams shown in Figures 38 through 44. The operation of the Completed/Started Handler 432 is relatively simple in that it transfers the content of the On-Board RAM 430 to the Started TID Register 434 at the beginning of each Subatomic period.
The flow diagram shown in Figure 38 describes the operation of the Wake-up Sequencer 440. The process begins by repeatedly inquiring if the Subatomic period is the last Subatomic period (LSAP) or is the third Subatomic period (SAP), as indicated by inquiry blocks 460 and 462. If it is the last Subatomic period, the process initializes the TID pointer to the Task Activity List 444 to 0, as indicated by block 464. The process then inquires, decision block 468, if the periodicity of the task's TID is less than the period indicated by the Period Counter 442.
If it is, the Priority Scan List is initialized, as indicated in block 470. The Priority Scan List iteration is set equal to the current iteration for that task, the predecessor count is set equal to the predecessor count contained in the Task Activity List, and the allocation is set equal to the allocation contained in the Task Activity List. The process then proceeds to inquire, decision block 472, if the task just processed was the last task. If it is, the operation of the Wake-up Sequencer 440 is completed.
Otherwise, the process will index to the next task on the Task Activity List 444, as indicated by block 480, and again check if the periodicity of that task is less than the period of the Period Counter, as indicated by decision block 468. If the TID period is greater than the Period Counter, then the task is not entered into the Priority Scan List 446 and the task pointer is indexed to the next task in the Task Activity List, as indicated by block 480.
The last task in the Task Activity List 444 is a null task which has a periodicity of 0. Thus the last task will always be entered into the Priority Scan List 446, as indicated by block 470, when there is no other task whose periodicity is less than the period of the Period Counter 442.

If the period indicated by the Period Counter 442 is the third Subatomic period, the Wake-up Sequencer 440 will again initialize the pointer to the Task Activity List to the first task, as indicated by block 482. The Wake-up Sequencer will then inquire, decision block 484, if the periodicity of the task is less than the period indicated by the Period Counter 442. If it is, the Wake-up Sequencer will initialize the Completion Status List 438, as indicated by block 486. It will then set the iteration in the Completion Status List to 0, set the Branch Condition List to 0, and set the allocation to the allocation indicated in the Task Activity List. The Wake-up Sequencer 440 will then inquire, decision block 488, if it is the last task in the Task Activity List.
If it is, the operation of the Wake-up Sequencer 440 is completed.
Otherwise the TID pointer in the Task Activity List will be indexed to the next task, as indicated by block 490, and the above procedure will be repeated. If the periodicity of the task is greater than the period indicated by the Period Counter 442, the Completion Status List 438 will not be updated and the pointer to the task in the Task Activity List will be indexed to the next task. When the pointer in the Task Activity List is indexed to the last task, it will always be entered into the Completion Status List since it has a periodicity of 0.
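The two branches of Figure 38 can be summarized in a minimal C sketch. This is a simplified model of the behaviour described above, not the hardware implementation; the data-structure names and helper routines are assumptions.

    #define NULL_TID 63                        /* assumed: null task = maximum TID */

    struct task_entry { int periodicity, iteration, pred_count, allocation; };
    extern struct task_entry task_activity[NULL_TID + 1];  /* Task Activity List   */
    extern void init_priority_scan_entry(int tid);         /* block 470            */
    extern void init_completion_status_entry(int tid);     /* block 486            */

    /* Sketch of the Wake-up Sequencer: on the last Subatomic period it seeds
       the Priority Scan List, on the third it seeds the Completion Status
       List; the null task's periodicity of 0 always passes the test.        */
    void wakeup_sequencer(int period_count, int last_sap, int third_sap)
    {
        if (last_sap) {
            for (int tid = 0; tid <= NULL_TID; tid++)               /* 464/480 */
                if (task_activity[tid].periodicity < period_count)  /* 468     */
                    init_priority_scan_entry(tid);                  /* 470     */
        } else if (third_sap) {
            for (int tid = 0; tid <= NULL_TID; tid++)               /* 482/490 */
                if (task_activity[tid].periodicity < period_count)  /* 484     */
                    init_completion_status_entry(tid);              /* 486     */
        }
    }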
After the wake-up process is completed, the Execution Timer 456 will check the execution timer for each Node, as shown in the flow diagram in Figure 39. As previously indicated, the execution time for the task being executed by each Node is stored as the 2's complement of the maximum execution time. This is done because, with current technology, it is easier to increment the time than to decrement it. The operation of the Execution Timer 456 begins by initializing the timer pointer to the first Node, as indicated in block 492. The Execution Timer will then increment the time stored for each Node by one (1), as indicated in block 494. The Execution Timer 456 will then check each timer for the time remaining for the execution of the task, as indicated by decision block 496. If the timer for any particular Node is equal to 0, the timer will set an error flag for that Node to true.
This information is then sent to the TIC Handler 436 before it is passed to the Fault Tolerator Interface 416 for reasons which shall be explained later. If the current time is not equal to 0, the Execution Timer 456 will inquire, decision block 500, if it has checked the last Node and, if it has, it will exit the execution timer process. Otherwise it will increment the Node pointer to the next Node, as indicated by block 502, and check the current time of the next Node.
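A minimal C sketch of the timer check of Figure 39, assuming the array names shown (which are not part of the specification), illustrates the 2's-complement up-counting scheme:

    #include <stdint.h>
    #include <stdbool.h>

    #define NUM_NODES 8                      /* assumed node count */

    extern int16_t exec_timer[NUM_NODES];    /* holds 2's complement of max time */
    extern bool    time_error[NUM_NODES];    /* per-node overtime flag           */

    /* Each timer is loaded with the negative (2's complement) of the task's
       maximum execution time and incremented once per check; reaching zero
       means the task has used up its execution allowance.                    */
    void execution_timer_check(void)
    {
        for (int nid = 0; nid < NUM_NODES; nid++) {   /* blocks 492, 500, 502 */
            exec_timer[nid]++;                        /* block 494            */
            if (exec_timer[nid] == 0)                 /* block 496            */
                time_error[nid] = true;               /* overtime error       */
        }
    }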
The operation of the TIC Handler 436 will be described with reference to Figures 40 through 44. The TIC Handler responds to the arrival of the voted value of the Task Interactive Consistency message and modifies the main data structure based upon that data. It treats the occurrence of a time error as equivalent to a confirmed completion so that a stalled or permanently hung copy of a task does not hold up the rest of the work load. The operation of the TIC Handler 436 starts following the completion of the Execution Timer checks and the receipt of the Byzantine data from the Synchronizer 46. The TIC Handler 436 selects a Node for which either a confirmed completion or an overtime error has been reported. If a confirmed completion has been reported, the TIC Handler clears the timer error bit associated with that Node since the completion was confirmed during the same Subatomic period in which the timer expired. The TIC Handler then searches the CHECK page of the Selection Queue 450 for the TID of the first unused task encountered for the Node which was reported to have completed a task. This is the TID of the task which the Node should have started. If this TID does not match the TID currently stored in the Started TID Register 434 for that Node, then a sequence error is recorded. Finally, the TIC Handler calls each of its sub-processes, Selection Queue Update, Completion/Termination, Execution Timer Reset, and Priority Scan Update, and sequentially updates the data structure for the selected Node.

The TIC Handler process is repeated for each Node.
As shown in Figure 40, the operation of the TIC Handler begins by inquiring if the Byzantine data is available, as indicated by decision block 504. If it is not available, the TIC Handler 436 will wait until it does become available. Otherwise the TIC Handler will initialize the pointer to the CHECK page of the Selection Queue 450 to the first Node position, as indicated by block 506. The process will then inquire, as indicated by decision block 508, if the Node completed a task, as indicated by the Byzantine data. In parallel, if the Byzantine data did not indicate that a task was completed by that Node, the process will check to see if a time error had occurred, as indicated in decision block 524. If the Byzantine data indicated that the Node did not complete a task and there was no time error, the process will increment the Node pointer to the next Node, as indicated by block 526. The process will then check to determine if it had investigated the last or the maximum Node, as indicated by block 528. If it was the last Node, it will exit the program; otherwise it will proceed to check the next Node to see if it had completed a task or a time error had occurred.

When a Node has completed a task and a time error has been recorded for that Node, the TIC Handler will set the time error to false, as indicated by block 510, since the task was completed in the same Subatomic period in which the time error was detected. Therefore, the time error is invalid and it is cancelled. If either the Node had completed a task or a time error had occurred, the process will then mark as used the first unused entry for that Node found in the CHECK page of the Selection Queue, as indicated by block 512. It will then store as the current TID the TID of the entry that had just been marked used, and it will store the current iteration as the iteration of that same entry, as indicated by block 514. The process will then check to determine that the current task is also the same task that was reported by that Node in its last Task Completed/Started message, which was stored in the Started TID Register 434 as shown in Figure 37. If the current task and the task reported as completed in the last Task Completed/Started message for that Node are not the same, the TIC Handler 436 will set the Sequence Error flag to "true," as indicated by block 520. The process will then call the Selection Queue (SQ) Update sub-process, as indicated by block 518, and wait for the completion of the Priority Scan List (PSL) Update sub-process, as indicated by block 522. When the Priority Scan List Update is completed, the process will then index the Node pointer to the next Node, as indicated by block 526, and then check to see if it has processed the last Node, as indicated by decision block 528.
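A compact C sketch of this per-node loop of Figure 40, assuming the variable and helper names shown (none of which come from the specification), is:

    #include <stdbool.h>

    #define NUM_NODES 8                                   /* assumed */

    extern bool completed[NUM_NODES];     /* Byzantine-voted task-completed bits */
    extern bool time_error[NUM_NODES];    /* overtime flags from the timer check */
    extern bool sequence_error[NUM_NODES];
    extern int  mark_first_unused_check_entry(int nid);   /* blocks 512-514       */
    extern int  started_tid(int nid);                     /* Started TID Register */
    extern void update_node_structures(int nid, int tid); /* the four sub-processes */

    /* A completion confirmed in the same Subatomic period as a timer expiry
       cancels the time error, so a hung copy is treated like a completion.  */
    void tic_handler(void)
    {
        for (int nid = 0; nid < NUM_NODES; nid++) {        /* blocks 506, 526, 528 */
            if (!completed[nid] && !time_error[nid])       /* blocks 508, 524      */
                continue;
            if (completed[nid])
                time_error[nid] = false;                   /* block 510            */

            int current_tid = mark_first_unused_check_entry(nid);
            if (current_tid != started_tid(nid))           /* sequence check       */
                sequence_error[nid] = true;                /* block 520            */

            update_node_structures(nid, current_tid);      /* blocks 518-522       */
        }
    }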
Handler searches t~e NEXT and the PREVIOUS pages of the Selec~ion ~ueue 450 for the Nodes selected by the TIC Handler. When an entry ~s found containing both the current task and the current lteratlon7 lt is marked "used." Such entr~es may or may not be found because the tasks with a h~gher prlor~ty than the current task may have become available between the generation of the CHECK
page and the generation of the PREVIOUS or NEXT page. It is not necessary to mar~ the CHECK page entry since it wlll no~ be accessed again before it ls refreshe~. The Selection Queue Update sub-process begins by init~aliz1ng the po~nter to the PREVIOUS
page to the O entry, as 1nd~cated hy block 530. The process will then index the entry to the f~rst entry~ as ind~cated by block 532, and w~ll inquire lf the current TID and iterat~on are equal to the T]D and ~teration of the entry~ as lnd1cated in declsion block 5340 If they are the same, then the entry 'lused" is marked "true,'l as indicated by block 5360 Otherw~se the process wlll '~ lnqulre ~f ~t has checked all of the three entries of the PREYIOUS
page, as lnd~cated by decislon block 538. If ~t has not ~hecked all of the entrles on the PREVIOUS page of the Select~on ~ueue 450, ~t will proceed to ~ndex the entry to the second entry and so on untll it has checked all three entries on the PREYIOUS page.
After e~ther f1nding the TID 1n one of the entries ln the PRE~IOUS
page or complet~ng check~ng the PREYIQUS page and not flnd~ng an 88- 3~ 3 entry, the program wlll then proceed to the NEXT page of the Selection Queue 450 and again will set the po~nter to the O entry, as ~ndlcated by block 540, It will index the entry, as 1nd1cated by block 542, then lnqu~re ~f the current TID and ~teratlon are the same as the TID and iterat~on of the entry, as ind1cated ~y block 544. If they are9 ~t w~ll mark the "used" entry "true," as lndicated by block 546. Otherwise the process will then inquire 1f lt has checked all three entries, as ~nd1cated by decision block 54a. If it has not, it will then index the po~nter to the next entry and cont~nue to investigate untll ~t has elther found ~he current TID and ~teration in the entry or it has checked all three entr~es. The process will then call the Completed Term~nation (CT) sub-processg as ind~cated by block 550.
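The search over the PREVIOUS and NEXT pages can be sketched in C as below. The page layout (three entries per Node) follows the text; the structure and array names are assumptions.

    #include <stdbool.h>

    struct sq_entry { int tid; int iteration; bool used; };

    extern struct sq_entry sq_previous[][3];   /* PREVIOUS page of Selection Queue */
    extern struct sq_entry sq_next[][3];       /* NEXT page of Selection Queue     */

    /* Mark "used" any PREVIOUS or NEXT page entry carrying the confirmed task
       and iteration, so the Node is not offered the same copy again.          */
    void selection_queue_update(int nid, int cur_tid, int cur_iter)
    {
        for (int e = 0; e < 3; e++) {                             /* blocks 530-538 */
            if (sq_previous[nid][e].tid == cur_tid &&
                sq_previous[nid][e].iteration == cur_iter) {
                sq_previous[nid][e].used = true;                  /* block 536      */
                break;
            }
        }
        for (int e = 0; e < 3; e++) {                             /* blocks 540-548 */
            if (sq_next[nid][e].tid == cur_tid &&
                sq_next[nid][e].iteration == cur_iter) {
                sq_next[nid][e].used = true;                      /* block 546      */
                break;
            }
        }
        /* block 550: continue with the Completion/Termination sub-process */
    }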

The TIC Handler sub-process Completion/Termination records the completion of each copy of a task in the Completion Status List. If the final copy has been completed (or timed out), then the task is "terminated." The Successor List entries associated with the terminated task and the majority branch conditions are accessed via the base address in the Successor-Offset List, as indicated in Figure 36. The predecessor count for each successor of the terminated task is then decremented. If the branch conditions generated by the various copies result in a tie, then branch condition 0 is selected by default.

The TIC Handler 436 retains an old valid bit for each Node indicating whether the TID listed in the Old TID section of the Scheduler RAM 422, as shown in Figure 35, is a valid Old TID or not. All of the old valid bits are set to false during system reconfiguration to indicate that the next task to be executed by each Node is the first task and that there are no previous tasks to process. The old valid bit is set to true after the confirmed start of the first task on the Node and before the confirmed start of the second task on the Node.

If the old valid bit is false, then the started task is the first task being executed on that Node following a reconfiguration. Therefore, there is no completed task to process and the Completion/Termination sub-process need not be executed. Similarly, if the completed task is a null task, there is no need to terminate the task. In the flow diagram shown in Figure 41, the point at which the task is compared to the maximum task is the latest point at which the comparison can be made without potentially reporting a termination of a null task, and this makes the content of the maximum task entry on the Completion Status List irrelevant.
Referring now to Figure 42, the sub-process Completion/Termination begins by checking the old valid flag for the Node, as indicated by block 552. As previously indicated, if old valid is not true, the process will proceed to the next sub-process, Execution Timer Reset, as shall be discussed hereinafter. However, if old valid is true, the process will record the completion of the task, using the TID stored in Old TID as the TID of the completed task, by accessing the Completion Status List 438 and setting the allocation for that TID-NID combination to false, as indicated by block 554. The process will then inquire, as indicated in decision block 556, if the branch condition is equal to 1. If it is, it will increment the branch condition entry in the Completion Status List 438, as indicated by block 558. However, if the branch condition is equal to 0, the process will proceed to inquire, as indicated by decision block 560, if all of the copies of that task have been completed. This is indicated by all the entries in the allocation section of the Completion Status List being set to false.

If all of the copies of the task have been completed, the sub-process will proceed to report to the Task Communicator the identity of the terminated task, as indicated by block 562. After reporting the termination of the task to the Task Communicator 44, the process will then get the address of the first successor task from the Successor-Offset entry contained in the Scheduler ROM 424, as indicated by block 564. The process will then inquire, as indicated by decision block 566, if the successor task is equal to the maximum successor task, which corresponds to the end of the Successor Task List for the terminated task. If that is the end of the Successor Task List, the program will then proceed to call the Execution Timer Reset sub-process, as indicated by block 572. If the successor task is not the maximum TID listed on the Successor List for the terminated task, the process will continue to update the Completion Status Table by decrementing the predecessor count for each successor task by 1, as indicated by block 568. The process will then increment the address to the Successor List, as indicated by block 570, and proceed to analyze the next task on the Successor List.
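A minimal C sketch of this sub-process, assuming the table layouts and names shown (which are illustrative, not from the specification), is:

    #include <stdbool.h>

    #define NUM_NODES 8                                /* assumed */
    #define MAX_TID   63                               /* assumed null/maximum TID */

    struct cs_entry { bool alloc[NUM_NODES]; int branch_count; };
    extern struct cs_entry completion_status[];        /* Completion Status List 438 */
    extern int  pred_count[];                          /* per-task predecessor counts */
    extern int  successor_offset[];                    /* base address per task       */
    extern int  successor_list[];                      /* successor TIDs, ended by MAX_TID */
    extern void report_termination(int tid);           /* block 562 */

    /* Record the copy completion and, once every allocated copy is done,
       "terminate" the task and release its successors by decrementing
       their predecessor counts.                                           */
    void completion_termination(int nid, int old_tid, int branch_condition)
    {
        struct cs_entry *cs = &completion_status[old_tid];
        cs->alloc[nid] = false;                               /* block 554      */
        if (branch_condition == 1)
            cs->branch_count++;                               /* blocks 556-558 */

        for (int n = 0; n < NUM_NODES; n++)                   /* block 560      */
            if (cs->alloc[n])
                return;                                       /* copies pending */

        report_termination(old_tid);                          /* block 562      */
        for (int a = successor_offset[old_tid];               /* block 564      */
             successor_list[a] != MAX_TID; a++)               /* block 566      */
            pred_count[successor_list[a]]--;                  /* blocks 568-570 */
    }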
The Execution Timer Reset sub-process of the TIC Handler 436 checks the execution timer for each Node for a minimum-time error and reloads the timer for the newly started task. If the old valid flag for that Node is false, then there is no completed task and the error is not recorded. The Execution Timer Reset is the last process to access the Old TID entry in the Scheduler RAM 422. It is, therefore, a convenient place in which to copy the current TID and to set the old valid flag true.
Figure 43 is a flow diagram showing the process executed by the Execution Timer Reset sub-process. The process begins by setting the TID equal to the Old TID for that particular Node, as indicated by block 574. The process then compares, as indicated by decision block 576, the current execution time for that TID with the minimum time. If the current execution time is greater than the minimum execution time, it then inquires, as indicated by block 578, if the old valid flag is true. If old valid is true, then the Execution Timer Reset sub-process will set the time error flag for that Node to "true," as indicated by block 580. If the current execution time is not greater than the minimum time, or if the old valid flag is not true, or if a time error has been recorded, the process will then reset the Execution Timer, as indicated by block 582, by setting the current time for that Node equal to the maximum time for the current task, which is contained in the Scheduler ROM 424 in the entry entitled Maximum Execution Time Table, as shown in Figure 36. The process will then update the Old TID entry in the Scheduler RAM 422 by setting the Old TID for that Node equal to the current TID, as indicated by block 584, then set the old valid flag for that Node to true, as indicated by block 586. The process will then proceed to call up the Priority Scan List Update sub-process, as indicated by block 588.
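The same steps, expressed as a hedged C sketch with assumed array names, look as follows; the comparison is written exactly as described above, and the reload uses the Maximum Execution Time Table entry.

    #include <stdint.h>
    #include <stdbool.h>

    extern int16_t exec_timer[];   /* per-node execution timers (2's complement) */
    extern int16_t min_time_2c[];  /* Minimum Execution Time Table               */
    extern int16_t max_time_2c[];  /* Maximum Execution Time Table (ROM 424)     */
    extern bool    time_error[];   /* per-node time error flags                  */
    extern int     old_tid[];      /* Old TID entries in Scheduler RAM 422       */
    extern bool    old_valid[];

    void execution_timer_reset(int nid, int current_tid)
    {
        int completed_tid = old_tid[nid];                        /* block 574 */
        if (exec_timer[nid] > min_time_2c[completed_tid] &&      /* block 576 */
            old_valid[nid])                                      /* block 578 */
            time_error[nid] = true;                              /* block 580 */

        exec_timer[nid] = max_time_2c[current_tid];              /* block 582 */
        old_tid[nid]    = current_tid;                           /* block 584 */
        old_valid[nid]  = true;                                  /* block 586 */
        /* block 588: continue with the Priority Scan List Update */
    }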
The Priority Scan List Update sub-process of the TIC Handler 436 records the start of the current task on the Node by clearing the Priority Scan List current TID allocation for the Node. This makes the current task ineligible for re-execution by that Node until its next iteration, at which time the Wake-up Sequencer 440 reinitializes the Priority Scan List entry for the task. Two conditions must be satisfied before the update is performed: 1) the started task must not be a null task, since a null task must always be available and may never be removed from the Priority Scan List; and 2) the iteration number of the started task must be the same as the iteration number in the Priority Scan List. The two iteration values may differ within the first three Subatomic periods of an Atomic period if the task ran during the last three Subatomic periods of the previous Atomic period.

Figure 44 is a flow diagram showing the procedure executed by the TIC Handler 436 in the execution of the Priority Scan List Update. The process begins by inquiring if the entry is current, as indicated by block 590. If the entry is current, the process will then proceed to inquire if the current task is a null task (maximum TID), as indicated by block 592. If the current task is not a null task, the Priority Scan List is updated by recording that the Node has started that particular task, as indicated by block 594. Effectively, the process sets the flag in the allocation entry of the Priority Scan List for that particular Node to false. If the entry is not current, or if the task is a null task, the process returns, as indicated by block 596, to the TIC Handler process illustrated in Figure 40.

The Priority Scanner 448 selects a candidate task for the next Subatomic period based on the latest confirmed data about the progress of the application work load. The operation of the Priority Scanner 448 follows the updating of the Priority Scan List by the TIC Handler 436. The Priority Scanner 448 will first rotate the page pointers of the Selection Queue 450, then select three tasks for each Node by scanning the Priority Scan List in order of increasing TIDs. In the Priority Scan List the highest priority tasks have the lower TID numbers and the lowest priority tasks have the higher TID numbers. The selected tasks are then written into the NEXT page of the Selection Queue for their respective Nodes.
The operation of the Priority Scanner 448 begins by rotating the pointers in the Selection Queue 450, as indicated by block 598. The Priority Scanner then sets all of the Node entry pointers to the first entry, as indicated by block 600. It then starts at the top of the TID list for the first task, as indicated by block 602. The Priority Scanner 448 then inquires, as indicated by block 604, if the predecessor count for that task is equal to 0, indicating that all of the predecessor tasks have been completed. If all of the preceding conditions are satisfied, the Priority Scanner 448 will investigate if the task has been previously started on that particular Node, as indicated by decision block 606. If the task has not been previously started on that Node, the Priority Scanner will then inquire if that particular Node already has three entries, as indicated by block 608. If it does have three entries, it will then check to see if that Node was the last Node, as indicated by block 610. If it is not the last Node, it will then index to the next Node, as indicated by block 612, and will proceed to check the entries for the next Node. If the Node being evaluated is the last Node, the Priority Scanner 448 will proceed to check if each Node has more than three entries, as indicated by block 618. If each Node has more than three entries, then the operation of the Priority Scanner is completed and it will exit. However, if not all of the Nodes have three entries, then the Priority Scanner 448 will inquire, as indicated in block 620, if it has processed the last task. If it has processed the last task, then it will fill all the remaining entries with the null task, which is the maximum TID, as indicated by block 622. However, if the TID is not the maximum or last task in the list, the process will increment the TID number and will repeat.
Referring back to decision block 608, if the entries for a particular Node are not greater than 3, then the process will copy the TID and iteration from the Priority Scan List to the NEXT page of the Selection Queue 450 for that Node, as indicated by block 614. It will then increment the entry for that Node, as indicated by block 616, and then inquire, as indicated by decision block 610, if that Node was the last Node. If it is not the last Node, then the process will proceed to the next Node, as indicated by block 612, or will check if the entries in all the Nodes are full, as indicated by decision block 618.

The Next Task Selector 452 examines the first entry of the NEXT page of the Selection Queue 450 for its own Node (NID). If that task has not been previously started by its own Node, then it records that task in its Next Task Register 454, which is passed to the Task Communicator 44 through the Task Communicator Interface 420 when requested by the Applications Processor. If the task has been previously started, then the next entry on the NEXT page of the Selection Queue 450 is examined for the same criteria. The process continues until an entry is found which has not been executed, or until the third entry has been examined.

Since the Selection Queue 450 is not updated until the third Subatomic period after a task is started, the Next Task Selector must maintain a local record of tasks started on its own Node. The TIDs of the previously started tasks are maintained in a two-entry-deep stack to record the previous two tasks actually started by the Node's Task Communicator 44. The Scheduler 40 receives immediate notification from the Task Communicator whenever a task is started. It then pushes the currently selected task onto the previous TID stack, allowing the oldest entry to fall off the bottom of the stack. The operation of the Next Task Selector 452 is triggered by the beginning of the soft error window, while the Transmitter is occupied with the transmission of a Task Interactive Consistency or a System State message. Therefore, the Task Communicator cannot transmit a Task Completed/Started message or start the selected task while the Next Task Selector 452 is modifying the selected task. The Next Task Selector 452 is the only module in the Scheduler which has access to its own Node Identification (NID) code.

The operation of the Next Task Selector 452 will be discussed with reference to the flow diagram shown in Figure 46. The operation of the Next Task Selector begins with the setting of the entry pointer to the NEXT page to its own NID and to entry 0, as indicated by block 626. The Next Task Selector then increments the entry pointer to the first task, as indicated by block 628, and records as the selected task the task that is entered for its own Node in the entry of the Selection Queue 450, as indicated by block 630. The Next Task Selector will then inquire, decision block 632, if this is the third entry in its own entry of the NEXT page. If it is, it will store the selected task in the Next Task Register 454. However, if it is not the third entry, the Next Task Selector will inquire, as indicated by decision block 636, if the selected task and iteration are the same as the first or second previously selected task and iteration. If the selected task and iteration are the same as a first or second previously selected task and iteration, the Next Task Selector will proceed to increment the entry and examine the next task in the Selection Queue, as indicated by block 628. However, if the selected task and iteration were not previously selected, the Next Task Selector will store the selected task in the Next Task Register 454, as indicated by block 634, completing the selection process.

It can be seen from the above flow diagram that if the first two entries in the Selection Queue 450 have been previously executed by this Node, the Next Task Selector 452 selects the third entry regardless of its previous selection status. This feature allows multiple entries of the null task to be placed in the Selection Queue simultaneously in the event there are no other tasks ready to run. Thus, when no other tasks are ready to execute, the Node will start the null task every Subatomic period until another task becomes available.
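The selection rule just described can be sketched in C as follows. The structure names, the two-deep stack representation and the helper are assumptions; only the selection order comes from the text.

    #include <stdbool.h>

    struct sq_entry { int tid; int iteration; };
    extern struct sq_entry sq_next[][3];      /* NEXT page of the Selection Queue   */
    extern struct sq_entry prev_started[2];   /* two-deep stack of started tasks    */
    extern int  my_nid;                       /* only this module knows its own NID */

    static bool already_started(struct sq_entry e)
    {
        for (int i = 0; i < 2; i++)
            if (prev_started[i].tid == e.tid &&
                prev_started[i].iteration == e.iteration)
                return true;
        return false;
    }

    /* Take the first NEXT-page entry for this Node that was not one of the
       last two tasks actually started; the third entry is taken regardless,
       which is how repeated null tasks get selected when nothing is ready.  */
    struct sq_entry next_task_select(void)
    {
        struct sq_entry selected = sq_next[my_nid][0];            /* blocks 626-630 */
        for (int e = 1; e < 3 && already_started(selected); e++)  /* blocks 632-636 */
            selected = sq_next[my_nid][e];                        /* block 628      */
        return selected;                                          /* block 634      */
    }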

The operation of the Task Selector Module begins with the end of the Soft Error Window (SEW), at which time no Task Completed/Started messages should be arriving from the non-faulty Nodes. First, the Completed/Started Handler will transfer the content of the Task Completed/Started messages stored in the On-Board RAM 430 to the Started TID Register 434 before the earliest possible arrival of the Task Completed/Started messages for the next Subatomic period. All of the other processes executed by the submodules, with the exception of the Next Task Selector 452, must be completed before the beginning of the next Soft Error Window. The operation of the Next Task Selector 452 is triggered by the beginning of the soft error window and must be completed by the time the Transmitter completes sending its Task Interactive Consistency and/or System State messages and becomes available to the Task Communicator for sending Task Completed/Started messages. The operation of the Wake-up Sequencer is triggered by the end of the operation of the Completed/Started Handler 432. After the operation of the Wake-up Sequencer 440 is completed, the Execution Timer 456 will perform its execution timer checks. The TIC Handler 436 will then proceed to update the Selection Queue 450 and the Completion Status List 438, reset the execution timers, and update the Priority Scan List 446. After the Priority Scan List is updated, the Priority Scanner 448 will transfer the selected tasks from the Priority Scan List 446 to the Selection Queue 450. Finally, the Next Task Selector 452 will select the next task from the Selection Queue 450 and place it in the Next Task Register 454.

The details of the Reconfiguration Module 428 will be discussed relative to Figure 47. When the System State is modified by the exclusion or readmission of a Node, it is necessary to reconfigure the assignment of tasks to the remaining operating Nodes. There are 2^N possible states for an N Node system. Thus, in an 8 Node system there are 256 possible states.
The storage of a separate Assignment List for each of these states would require an excessive amount of memory. Therefore, reconfiguration is effected by a transition-based algorithm which does not deal with the new state directly. Rather, it reconfigures the task load based upon the change between the old and new states.
The transition-based approach is inherently less complex than a state based approach since there are only 2N possible transitions, representing exclusion or readmission of each of the N Nodes.
The active task set for a Node is defined as the set of tasks enabled for execution on that Node. For given tasks and Nodes, a Boolean value "activation-status" may be used to represent whether a given task is enabled for execution on the given Node. The purpose of reconfiguration is to modify the activation-status for each task-node pair when the System State is modified by the exclusion or readmission of a Node. Three independent operations are needed to correctly manage the activation-status values.
1) Individual tasks may be enabled or disabled for all Nodes in the system to account for changes in the overall system capabilities. For example, when the total number of operating Nodes falls below some preset value, a task may be eliminated completely from the active task set or replaced by functionally equivalent simpler tasks. This operation of activation or deactivation of a task is referred to as swapping. A task which may be enabled for execution is said to be swapped in, while a task which is disabled is said to be swapped out.
2) Active tasks may be reallocated among the operating Nodes of the system. For example, if a Node is excluded, one copy of each task executed by that Node will be lost. In order to maintain the desired redundancy of each task, one copy of each affected task must be executed by some other Node. The Scheduler does not require all these tasks to be reassigned to one Node but rather may distribute these tasks among the remaining Nodes as desired. A side effect of reallocation is that it may require that lower priority tasks be swapped out if the remaining Nodes are highly utilized.

3) Tasks may be prohibited from executing on individual Nodes based upon their operational status. For example, when a Node is excluded by a state transition, it is generally desirable to prohibit any application tasks from executing on that Node. However, it is desirable for the excluded Node to initiate a comprehensive sequence of diagnostic tasks. The set of all the tasks in the system is divided into two mutually exclusive subsets, the included task set and the excluded task set. Members of the included task set may only be executed by the included Nodes, and members of the excluded task set may only be executed by excluded Nodes.

The following discussions define the operations required for the reconfiguration of the tasks in response to a State Transition. If multiple changes to the System State are required, they are performed sequentially, one Node at a time. In any multiple reconfiguration, all readmissions are processed before any exclusions are processed.
Referring now to Figure 47, the Reconfiguration Module includes a Task Swapper 638, a Task Reallocator 640, and a Task Status Matcher 642. A Current and Next System State Comparator 644 receives the System State Vector from the Fault Tolerator Interface 416, as indicated in Figure 34, and generates a Delta System State Vector which identifies only those Nodes whose System State has changed between the next System State and the current System State. The Delta System State Vector also includes a flag indicating whether any Node has been readmitted to the current operating set. The Task Swapper 638 generates a Boolean swap-status value indicating whether the task is swapped in or swapped out of the Active Task set. This process uses the Swap Table 646 which is contained in the Scheduler RAM 422 as previously described. The Task Reallocator 640 generates one Boolean allocation-status value for each task-node pair in the system. The Task Reallocator 640 uses the Allocation Tables 648 which are contained in the Scheduler RAM 422, as indicated in Figure 35. The Task Status Matcher 642 generates a Boolean match-status value for each task-node pair. The Task Status Matcher 642 uses the Relevance Vector Table 650, which is one of the tables stored in the Scheduler ROM 424, as previously discussed with reference to Figure 36. The swap-status value, the allocation-status value, and the match-status value are combined, as symbolically indicated by AND Gate 652, and stored in the Task Activity List 444 shown in Figure 37.
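The combination symbolized by AND Gate 652 amounts to a simple conjunction, as in the following C sketch; the array names are assumptions.

    #include <stdbool.h>

    extern bool swap_status[];      /* per task, from the Task Swapper 638           */
    extern bool alloc_status[][8];  /* per task-node pair, from the Task Reallocator */
    extern bool match_status[][8];  /* per task-node pair, from the Status Matcher   */

    /* A task is active on a Node only if it is swapped in, allocated to that
       Node, and its included/excluded status matches the Node's status.      */
    bool activation_status(int tid, int nid)
    {
        return swap_status[tid] && alloc_status[tid][nid] && match_status[tid][nid];
    }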

The operation performed by the Task Swapper 638 is largely one of determining the relevance of each Node to each task. The application designer may define any subset of the Nodes as relevant to the performance of each task. The swap-status of each task is determined solely by the number of relevant Nodes included in the operating set. When a State Transition occurs, the new System State is examined to determine whether the number of relevant Nodes in the operating set will change the swap-status of each task. The number of relevant Nodes at which the change is required is defined as the "toggle value" for that task. In the preferred embodiment of the system, two toggle values are provided to enhance the flexibility for system reconfiguration. The operation of the Task Swapper 638 will be discussed relative to the flow diagram in Figure 48.

The operation of the Task Swapper 638 begins with the setting of the pointer to the Relevance Vector in the Scheduler ROM 424 and the pointer to the Swap Tables in the Scheduler RAM 422 to the first task, as indicated by block 653. The Task Swapper will then inquire if the task is relevant to the Node excluded from the operating set, as indicated by decision block 654. If the task is not relevant to the excluded Node, the Task Swapper will proceed to evaluate the next task, as indicated by block 662. However, if the task is relevant to the excluded Node, the Task Swapper will inquire, as indicated by block 656, if the number of relevant Nodes in the System State is equal to the Toggle Point (swap count = 0). If the number of relevant Nodes equals the Toggle Point, the Task Swapper 638 will complement the swap status, as indicated by block 658, then will decrement the swap count for that task in the Swap Table 646, as indicated by block 660. However, if the swap count is not equal to 0, the Task Swapper 638 will not complement the swap status of that task, but will simply decrement the swap count stored in the Swap Table 646. After decrementing the Swap Table 646, the Task Swapper will proceed to increment the TID pointer to the next task, as indicated by block 662, then inquire if this task is the last task in the system, as indicated by decision block 664. If it is the last task, the operation of the Task Swapper is completed; otherwise the Task Swapper will repeat the above process until all the tasks have been evaluated.

The operation of the Task Swapper 638 when the Delta System State Vector indicates that a Node has been readmitted to the system is indicated in the flow diagram in Figure 49. As indicated with reference to the operation of the Task Swapper for an excluded Node, when a Node is readmitted into the operating set, the Task Swapper 638 will first set the pointers to the Preference Vector entry of the ROM 424 and the Swap Table 646 to the first task (TID = 1), as indicated by block 666. The Swap Table 646 is part of the Scheduler RAM 422 as illustrated in Figure 34. The Task Swapper will then inquire, decision block 668, if the task is relevant to the Node which has been readmitted into the operating set. If the task is not relevant to the readmitted Node, the Task Swapper will proceed to evaluate the next task, as indicated by block 676 and decision block 678. However, if the task is relevant to the readmitted Node, the Task Swapper will increment the swap count in the Swap Table 646, as indicated by block 670, then inquire, as indicated by decision block 672, if the number of relevant Nodes is equal to the Toggle Point. If the number of relevant Nodes equals the Toggle Point, then the Task Swapper 638 will complement the swap status of that task, as indicated by block 674, and proceed to the next task, as indicated by block 676. If the number of relevant Nodes is not equal to the Toggle Point (swap count not equal to 0), the swap-status of the task will not be complemented and the Task Swapper will proceed to evaluate the next task, as indicated in block 676. The Task Swapper will then inquire, as indicated by decision block 678, if the task was the last task to be evaluated. If the last task has been processed, the Task Swapper 638 is finished with its operation; otherwise the process will be repeated for each task until the last task is processed.
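The exclusion and readmission flows of Figures 48 and 49 can be condensed into one C sketch. The relevance table and counter handling below are a simplified model of the description above; the names and the single-counter assumption are not from the specification.

    #include <stdbool.h>
    #include <stdint.h>

    #define NUM_TASKS 64                   /* assumed */

    extern const bool relevance[NUM_TASKS][8];   /* node relevance per task (ROM)  */
    extern int8_t swap_count[NUM_TASKS];         /* Swap Table 646 counters        */
    extern bool   swap_status[NUM_TASKS];        /* swapped in / swapped out       */

    /* Excluding a relevant Node counts the swap counter down; readmitting one
       counts it up; hitting the toggle point (count of zero) flips the task's
       swap status.                                                             */
    void task_swapper(int changed_nid, bool readmitted)
    {
        for (int tid = 0; tid < NUM_TASKS; tid++) {
            if (!relevance[tid][changed_nid])              /* blocks 654 / 668 */
                continue;
            if (readmitted) {
                swap_count[tid]++;                         /* block 670 */
                if (swap_count[tid] == 0)                  /* block 672 */
                    swap_status[tid] = !swap_status[tid];  /* block 674 */
            } else {
                if (swap_count[tid] == 0)                  /* block 656 */
                    swap_status[tid] = !swap_status[tid];  /* block 658 */
                swap_count[tid]--;                         /* block 660 */
            }
        }
    }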

The swapping process has the following properties:
1) All tasks and toggle points are treated independently;

2) The swapped status depends on the number of available relevant Nodes, not on the identity of those Nodes; and

3) The process is reversible and path independent. The swapped status of a task depends only on the System State and not on the sequence of transitions which preceded that state.
The operation of the Task Reallocator 640 is very similar to the process of the Task Swapper. There are, however, two major differences between swapping and reallocation:

1) In reallocation, not all Nodes respond identically to a particular change of state. For example, if a given Node is excluded, a second Node may be required to assume the excluded Node's tasks, while the rest of the Nodes take no action whatsoever. It is, therefore, necessary to treat each Node independently.

2) In order to reallocate active tasks, it is not sufficient to note just the relevance of a given Node to each task. A method is required to determine which of the operating Nodes will assume or drop tasks in response to the transition. This is accomplished by allocating each task to various Nodes in a predetermined order of preference.


The "preferred set" f~r a given task-node pair is def1ned as the set of Nodes which are more preferred than others for execution of a given task. The applicat~on designer may define any subset o~ system Nodes which are the pre~erred set ~or each task-node pair. The allocation-status of each task-node pair ~ detenm7ned solely by the number of preferred Nodes ~ncluded ~n the current System State. When a State Transition occurs, the new System State i5 examined to determine whether the number of pre-~erred Nodes in the operatlng set w~ll change the allocation-status of each task. The number of preferred Nodes at which thechange is required is defined as a Toggle Value for that task and Node. In general 9 any number of Toggle V~lues may be defined for any task pair. However, only one Toggle Value is required for each task-node pair to prov~de the ~lexibility desired for system reconflguration.

The Reallocation process begins with the first task, as indicated by block 680 in Figure 50. The Task Reallocator 640 will then start with the first Node (NID = 0), as indicated by block 682. The Task Reallocator 640 will then inquire if the excluded Node (i) is a more preferred Node for that task than the Node (n) being evaluated, as indicated in decision block 684. If the excluded Node (i) is not a more preferred Node for that task, the Task Reallocator will then proceed to determine if it is a more preferred Node than the next Node, as indicated by block 692 and decision block 694. If the excluded Node is a more preferred Node for the execution of the task, then the Task Reallocator inquires if the number of preferred Nodes in the state equals the Toggle Point (allocation count = 0), as indicated in block 686. If the number of preferred Nodes is equal to the Toggle Point, the allocation-status for Node (n) is complemented, as indicated in block 688; otherwise the allocation-status is not complemented and the allocation count for that task-node combination is decremented, as indicated by block 690. After decrementing the allocation count, the Task Reallocator will increment the pointer to the next Node, as indicated by block 692, then inquire, as indicated by decision block 694, if the Node is the last Node in the system. If it is not the last Node, the Task Reallocator will repeat the process for each Node until the last Node is evaluated; then the Task Reallocator will index to the next task, as indicated by block 696, and repeat this process until all of the task-node combinations have been completed, as indicated by decision block 698.
The operation of the Task Reallocator treats all tasks, Nodes, and Toggle Points independently. The allocation-status depends on the number of available preferred Nodes and not on the identity of those Nodes. Also, the operation of the Task Reallocator is reversible and path independent. For example, if the Delta System State Vector indicates a Node has been readmitted into the operating set, the operation of the Task Reallocator parallels that of the Task Swapper in that the allocation count is incremented rather than decremented, as indicated by block 680, and the incrementing of the allocation count takes place prior to the inquiry to determine if the number of preferred Nodes in the state is equal to the Toggle Point, as indicated by decision block 676. The allocation-status of the task-node pairs depends only on the System State and not on the sequence of transitions which preceded that state.
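The per-node character of reallocation is the main difference from swapping, as the following C sketch of the exclusion case of Figure 50 illustrates. The preference encoding, array names and the counter handling are assumptions; readmission would mirror this with the counter incremented before the toggle-point test.

    #include <stdbool.h>
    #include <stdint.h>

    #define NUM_TASKS 64                   /* assumed */
    #define NUM_NODES 8                    /* assumed */

    /* preferred[t][n][i] is true when Node i is more preferred than Node n
       for task t (an assumed encoding of the preference ordering).          */
    extern const bool preferred[NUM_TASKS][NUM_NODES][NUM_NODES];
    extern int8_t alloc_count[NUM_TASKS][NUM_NODES];   /* Allocation Tables 648 */
    extern bool   alloc_status[NUM_TASKS][NUM_NODES];

    void task_reallocator_exclude(int i)               /* i = excluded Node */
    {
        for (int tid = 0; tid < NUM_TASKS; tid++) {             /* blocks 680, 696 */
            for (int n = 0; n < NUM_NODES; n++) {               /* blocks 682, 692 */
                if (!preferred[tid][n][i])                      /* block 684       */
                    continue;
                if (alloc_count[tid][n] == 0)                   /* block 686       */
                    alloc_status[tid][n] = !alloc_status[tid][n]; /* block 688     */
                else
                    alloc_count[tid][n]--;                      /* block 690       */
            }
        }
    }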
The operation of the Task Status Matcher 642 will be discussed with reference to Figure 51. When a Node is excluded from the operating set for faulty behavior, it is generally desirable to prohibit application tasks from executing on that Node. However, it is desirable to institute a comprehensive set of diagnostic tasks on the excluded Nodes. The Swapping and Allocation processes described above are not capable of supporting this function. Therefore, the total task set is divided into two mutually exclusive sets, the included task set and the excluded task set. The tasks of the included task set are permitted to be active only on the Nodes included in the operating set. Similarly, the tasks in the excluded task set are permitted to be active only on the Nodes excluded from the operating set.


If the included/excluded status of a given task matches the included/excluded status of a given Node, then the activation-status of that task on the Node is determined by the Swapping and Allocation processes. Conversely, if the status of the task does not match the status of the Node, then that task is prohibited from being executed on that Node regardless of the results of the swapping and reallocation processes.
Referring now to Figure 51, the Task Matching process begins by setting the task pointer to the first task, as indicated by block 700. The Task Status Matcher 642 then sets the pointer to the first Node (NID = 0), as indicated by block 702. The Task Status Matcher then determines if there is a match between the included/excluded status of the task and the included/excluded status of the Node, as indicated by the System State Vector. This is indicated in block 704, which says "TID-NID match equal to the Exclusive OR of the Relevance Vector contained in ROM 424 and bit "n" of the System State Vector." The Task Status Matcher will then process the next Node, as indicated by block 706 and decision block 708, until all of the Nodes have been evaluated with respect to the given task. The Task Status Matcher 642 will then index the task pointer to the next task, as indicated by block 710, and repeatedly evaluate all of the tasks until the last task has been evaluated, as indicated by decision block 712. After all of the task-node combinations have been evaluated, the operation of the Task Status Matcher is completed.
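A possible C rendering of the test in block 704 is given below; the bit polarities and array names are assumptions, and the essential point is only that a task-node pair "matches" when the task's included/excluded bit agrees with the Node's bit in the System State Vector.

    #include <stdint.h>
    #include <stdbool.h>

    extern const uint8_t task_set_bit[];   /* assumed: 1 = included task set,
                                              0 = excluded task set           */
    extern uint8_t       system_state;     /* assumed: bit n set when Node n is
                                              in the operating set             */

    /* Match-status per block 704: computed with an exclusive-OR of the two
       bits, a pair matching when the bits agree (polarity assumed).         */
    bool match_status(int tid, int nid)
    {
        uint8_t node_bit = (uint8_t)((system_state >> nid) & 1u);
        return (task_set_bit[tid] ^ node_bit) == 0;
    }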
TASK COMMUNICATOR

The details of the Task Communicator 44 are shown in Figure 52. However, the operation of the Task Communicator will be discussed in detail with reference to the subsystem diagrams and flow diagrams shown in Figures 53 through 70.
The Task Communicator 44 coordinates and supports all the communications between the Operations Controller 12 and its associated Applications Processor 14. Upon demand, the Task Communicator 44 provides the Applications Processor 14 with the Task Identification Code (TID) of the next task to be executed and supplies the required input data values. The Task Communicator receives all output data generated by the Applications Processor 14 and broadcasts it to all the other Nodes in the system via the Transmitter 30. When the Applications Processor 14 reports an error condition, the Task Communicator 44 reports the error condition to the Fault Tolerator 36 through the Voter 38. When a task is completed by the Applications Processor 14, the Task Communicator receives the value of the Branch Condition (BC) generated by the Applications Processor 14 and broadcasts it to all the other Nodes in the next Task Completed/Started message.
The Task Communicator 44 communicates directly with the Scheduler 40, the Voter 38, the Transmitter 30, and the Applications Processor 14. The Task Communicator has a table which lists, by task, the expected sequence of the input data to be used by the Applications Processor 14 and the expected sequence of the output data generated by the Applications Processor. Using messages from the Scheduler 40, the Task Communicator keeps track of the tasks currently being executed by all the Nodes and uses this information to supply information to the Voter relating to the message currently being processed.
Referring to Figure 52, the Task Communicator has a Voter Interface 714, a Scheduler Interface 716, and a Transmitter Interface 718. The Voter Interface 714 interfaces with the Voter 38 and receives the voted data and deviance vector and the Message Type Code (MT) and Data Identification Code (DID) of the voted data. The Voter will also send the Node Identification Code (NID) of a Node from which it has received data and request that the Task Communicator identify the DID of the data it has received. The Voter Interface 714 will also receive an Error Report from an Error Reporter 754 which is communicated to the Voter Interface 714.

The Voter Interface 714 receives data and the associated MT and DID codes from the Voter 38. A Store Data Control 720 passes the data to the Data Memory, where it is stored using the MT and DID codes and the complement of a context bit taken from a Context Bit Memory 732 as an address. The Data Memory is partitioned in a manner similar to that previously discussed with reference to the Fault Tolerator RAM 162 and shown in Figure 16. The context bit taken from the Context Bit Memory 732 is used to store the data in the Data Memory 42 in the appropriate partition.
A DID Request Handler 724 receives a DID request from the Voter 38 through the Voter Interface 714 in the form of a Node Identification Code (NID). The DID Request Handler 724 will access a Pointer Table 726 and transmit back to the Voter 38 the identity (DID) of the expected data that the Voter should currently be processing. If the Voter detects a mismatch between the expected DID and the DID of the data value it is currently processing, it will set an error flag.
The Scheduler Interface 716 receives from the Scheduler 40 the identification of the task terminated, the identification of the task started, the identification of the Node, and the next task selected by the Scheduler. A Task Terminated Recorder 730 will flip the context bit of the Context Bit Memory 732 for the output DIDs of the task reported as terminated. The Task Terminated Recorder 730 will also set a new NUDAT bit in a Nudat Bit Memory 722, indicating to the Voter that the next time it requests that same DID it will be the first request for that DID since the task that generated it was terminated. The Task Terminated Recorder 730 will also OR all the deviances stored in the Data Memory 42 for the terminated task's output DIDs and store them in a Deviance Error Register 734.
A Task Started Recorder 736, upon the receipt of a message from the Scheduler Interface 716 indicating the starting of a new task by any Node in the system, will access the Pointer Table 726 to determine if the address stored in the Pointer Table points to a null DID, indicating that that Node had completed the preceding task. If the address stored in the Pointer Table 726 does not point to a null DID, the Task Started Recorder 736 will set a sequence error flag for that Node which is stored in a Sequence Error Register 738. After it has completed this check, the Task Started Recorder 736 will access the pointer in a DID List 728 with the Task Identification Code (TID) of the task started and store the address for the first DID in that task in the Pointer Table 726. A Next Task Recorder 740 will store the next task received from the Scheduler in a Next Task Register 742.
An AP Input Handler 744 will transfer the identification of the next task stored in the Next Task Register 742 to an AP Input FIFO 746. The AP Input Handler will then access the Pointer and DID List 728 with the Task Identification Code (TID) and get the address for the data stored in the Data Memory 42 needed for the execution of that task. This data will then be stored in the AP Input FIFO 746. When the Applications Processor 14 is ready to begin the execution of the next task, it will access the AP Input FIFO 746 for the Task Identification Code and the data necessary for the execution of the task. The AP Input Handler 744 will also generate a Task Completed/Started message which is sent to the Transmitter 30 through the Transmitter Interface 718, which transmits this message to all the other Nodes in the system. When the Applications Processor 14 executes the selected task, the data resulting from the execution of the task will be stored in an AP Output FIFO 748. An Applications Output Handler 750 will access the Pointer and DID List 728 and obtain the Message Type Code (MT) and the Data Identification Code (DID) for each data value generated by the Applications Processor. Each Message Type Code and Data Identification Code, along with the data, is transmitted to the Transmitter Interface 718 and transmitted by the Transmitter 30 to all the other Nodes in the system. The last word generated by the Applications Processor 14 contains an Applications Processor Reported Error (APRE) vector which is stored in an APRE Register 752. The last word generated by the Applications Processor 14 also contains the branch condition bit which is to be included in the next Task Completed/Started message generated by the AP Input Handler 744. This branch condition is stored in the Transmitter Interface 718 until it receives the remainder of the Task Completed/Started message from the AP Input Handler 744.
The Error Reporter 754 receives the Deviance Error Vector from the Deviance Error Register 734, the Sequence Error Vector from the Sequence Error Register 738, and the Applications Processor Reported Error Vector from the APRE Register 752, and transmits these error vectors to the Voter 38 through the Voter Interface 714. An Arbitrator 756 arbitrates the operation of the various modules in the Task Communicator.

The operation of the Store Data Control will be discussed relative to Figure 53 and the flow diagram shown in Figure 54. As more clearly indicated in Figure 53, the Store Data Control receives the voted data and deviances from the Voter. Along with this information, it also receives the Message Type Code (MT) and the Data Identification Code (DID).

Referring now to Figure 54, the operation of the Store Data Control 720 begins by evaluating the first byte received from the Voter Interface 714, as indicated by decision block 760. When the first byte is all 0's, as indicated in block 760, there is no data available and the Store Data Control 720 will wait until it receives the first non-zero first byte. After receiving a non-zero first byte, the Store Data Control 720 will inquire, as indicated in decision block 762, if the data is a System State Vector. If the data is not a System State Vector, the Store Data Control 720 will access the Context Bit Memory 732 for the state of the context bit, using the MT and DID codes, as indicated by block 764. The Store Data Control 720 will then complement the context bit, as indicated by block 766, then generate an address, block 768, using the complemented context bit and the deviance bit set equal to zero (DEV = 0). However, if the data is a System State Vector, the Store Data Control 720 will access a TOC Bit Flip-Flop 758 for the TOC bit, as indicated by block 770, then generate the address using the TOC bit as the context bit and setting the deviance bit to zero, as indicated by block 772. The TOC bit marks the Atomic period in which the System State Vector was generated. The TOC Bit Flip-Flop 758 is complemented to mark the beginning of each new Atomic period in response to the rising edge of the Atomic period (AP) signal.
In either case, after the address is generated, the Store Data Control 720 will store the voted data in the Data Memory 42 at the generated address, as indicated by block 774. The Store Data Control 720 will then generate an address for the deviance vector by setting the deviance bit equal to 1 and the context bit equal to 0, as indicated by block 776. It will then store the deviance vector in the Data Memory 42, as indicated by block 778.
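The addressing scheme can be pictured with the small C sketch below. The bit packing is an assumption made for illustration; the point carried over from the text is only that MT and DID select the word, the context (or TOC) bit selects the current/pending partition, and a deviance bit separates the voted data from its deviance vector.

    #include <stdint.h>
    #include <stdbool.h>

    /* Illustrative Data Memory address generation for the Store Data
       Control 720 (blocks 764-778); field widths are assumed.           */
    static uint16_t data_addr(uint8_t mt, uint8_t did, bool context, bool dev)
    {
        return (uint16_t)((dev ? 1u : 0u) << 12) |
               (uint16_t)((context ? 1u : 0u) << 11) |
               (uint16_t)((mt & 0x7u) << 8) | did;
    }

    /* Usage: the voted data goes to the complement of the stored context
       bit with DEV = 0; the deviance vector goes to DEV = 1, context = 0. */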

As previously indicated, a change in the System State will only occur at the Atomic period boundaries. Therefore, for those tasks which require the current System State Vector, this Vector is stored in the Data Memory 42 and the TOC bit identifies where the current System State Vector is stored. The Context Bit Memory 732 stores a context bit for each DID used in the system, and the context bit identifies which of the data stored in the Data and Deviance RAM is the current data and which data is pending or incomplete because all of the copies of that data have not been received. The context bits in the Context Bit Memory 732 for the output DIDs of each terminated task are complemented by the Task Terminated Recorder 730 in response to a message from the Scheduler that a particular task is terminated. The Task Terminated Recorder 730 will then complement all of the DIDs which resulted from the terminated task, as shall be discussed hereinafter.

The DID Request Handler 724 receives from the Voter Interface 714 the identification or NID of the Node whose data is being processed by the Voter 38, indicating that the Voter is requesting the Data Identification Code (DID) of the data currently being processed. A NUDAT bit embedded in this DID code tells the Voter 38 whether this is the first time the Voter has requested this particular DID since termination of the task that generated the data.

As shown in Figure 55, the DID Request Handler will address the Pointer Table 726 with the NID to obtain a Pointer to a DID List 830 which is part of the Pointer and DID List 728 shown in Figure 52. The DID Request Handler will then access the DID List 830 and obtain the Data Identification Code (DID) from the DID List 830. It will then access the Nudat Bit Memory 722 and transmit the NUDAT bit with the DID back to the Voter 38 through the Voter Interface 714.
Referring now to Figure 56, the operation of the DID Request Handler 724 begins by accessing the Pointer Table 726 with the NID to get the Pointer to the DID List 830, as indicated by block 780. The DID Request Handler will then access the DID List 830 to get the DID of the data which is currently being processed by the Voter, as indicated by block 782. The DID Request Handler 724 will then access the Nudat Bit Memory 722 to get the NUDAT bit, as indicated by block 784. It will then append the NUDAT bit to the DID, as indicated by block 786, and pass the DID and the NUDAT bit to the Voter Interface 714. The DID Request Handler will then set the NUDAT bit to 0, as indicated by block 788, then inquire, as indicated by decision block 790, if the DID was a null DID. If it was not a null DID, the DID Request Handler 724 will then increment the pointer in the Pointer Table 726 to the next DID for that Node, as indicated by block 792. However, if the DID was a null DID, the DID Request Handler 724 will not increment the pointer in the Pointer Table 726 but will leave it at the null DID, indicating that all the DIDs for that task have already been transmitted.
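The same steps, reduced to a C sketch with assumed encodings (the NULL_DID value, table widths and reply packing are illustrative only), are:

    #include <stdint.h>
    #include <stdbool.h>

    #define NULL_DID 0xFF                    /* assumed encoding of the null DID */

    extern uint16_t pointer_table[];         /* Pointer Table 726, one per Node  */
    extern const uint8_t did_list[];         /* DID List 830 entries             */
    extern bool nudat[256];                  /* Nudat Bit Memory 722, per DID    */

    /* Return the DID the Voter should be processing for Node `nid`, tagged
       with its NUDAT bit, and advance the Node's pointer unless the task's
       DIDs are exhausted.                                                    */
    uint16_t did_request(int nid)
    {
        uint8_t  did   = did_list[pointer_table[nid]];          /* blocks 780-782 */
        uint16_t reply = (uint16_t)((nudat[did] ? 1u : 0u) << 8) | did;
                                                                /* blocks 784-786 */
        nudat[did] = false;                                     /* block 788      */
        if (did != NULL_DID)                                    /* block 790      */
            pointer_table[nid]++;                               /* block 792      */
        return reply;
    }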

The data stored in the DID List has two 8 bit bytes, as shown in Figure 65. The first byte consists of three fields, a Data Value Mask, a Data Type, and a Message Type. The second byte is the actual Data Identification Code (DID) of the data. This information is used by the AP Output Handler 750 to tag the data generated by the Applications Processor 14 in the transmitted Data Value messages. The DID Request Handler 724 will append the NUDAT bit to the most significant bit position (MSB) of the first byte obtained from the DID List 830, as indicated in Figure 66, since the most significant bit of the Data Value Mask is not needed by the Voter 38.
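
The following C fragment is a minimal sketch of this request sequence of Figure 56 (blocks 780 through 792); the structure layout, array names, and NULL_DID encoding are illustrative assumptions rather than the actual hardware implementation.

    /* Sketch of the DID request sequence (Figure 56). Names are assumptions. */
    #include <stdint.h>
    #include <stdbool.h>

    #define NUM_NODES 8            /* assumed number of Nodes                  */
    #define NULL_DID  0x00         /* assumed encoding of a null DID entry     */

    typedef struct {
        uint8_t first_byte;        /* Data Value Mask, Data Type, Message Type */
        uint8_t did;               /* actual Data Identification Code          */
    } did_entry_t;

    extern did_entry_t did_list[];               /* Pointer and DID List 728 ROM */
    extern uint16_t    pointer_table[NUM_NODES]; /* Pointer Table 726            */
    extern bool        nudat_bit[256];           /* Nudat Bit Memory 722, per DID */

    /* Produce the two bytes handed back to the Voter through the Voter
       Interface 714 for the Node identified by nid. */
    void did_request(uint8_t nid, uint8_t out[2])
    {
        did_entry_t e = did_list[pointer_table[nid]];        /* blocks 780-782 */

        /* Append the NUDAT bit in the MSB of the first byte (block 786); the
           MSB of the Data Value Mask is not needed by the Voter. */
        out[0] = (uint8_t)((e.first_byte & 0x7F) | (nudat_bit[e.did] ? 0x80 : 0x00));
        out[1] = e.did;

        nudat_bit[e.did] = false;                            /* block 788 */

        if (e.did != NULL_DID)                               /* block 790 */
            pointer_table[nid]++;                            /* block 792 */
        /* otherwise the pointer stays at the null DID: all DIDs already sent */
    }
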
The operation of the Task Terminated Recorder 730 shall be discussed relative to the block diagram shown in Figure 57 and the flow diagram shown in Figure 58. Referring first to the block diagram shown in Figure 57, the Task Terminated Recorder 730 receives the Task Identification Code (TID) of the terminated task from the Scheduler Interface 716. The Task Terminated Recorder 730 will then access the Output Pointer List 794 which is part of the Pointer and DID List 728 shown in Figure 52. The Output Pointer List 794 and the DID List 830 are embodied in a common off-board ROM, not shown. The Task Terminated Recorder 730 will then access the Context Bit Memory 732 and the Nudat Bit Memory 722, and complement the context bits and set the NUDAT bits for all the DIDs that resulted from the terminated task. The Task Terminated Recorder 730 will then, using the addresses obtained from the DID List 830, access the deviance vectors stored in the Data Memory 42 for all the deviance vectors associated with the DIDs of the terminated task. It will then OR all of these deviance vectors with the content of the Deviance Error Register 734.

The operation of the Task Terminated Recorder 730 will now be discussed with reference to the flow diagram shown in Figure 58. The operation of the Task Terminated Recorder 730 begins by inquiring, as indicated by block 796, if there is a task to be terminated. If there is no task to be terminated, as indicated by the two bits of the Status Word being 0's, the Task Terminated Recorder 730 will do nothing. However, if either of the bits of the Status Word is a 1, then the Task Terminated Recorder will inquire, as indicated by decision block 798, if the Status Word is a 10. The least significant bit of this Status Word indicates whether the task is a null task which requires no further action by the Task Terminated Recorder 730. If it is not a null task, the Task Terminated Recorder 730 will access the Output Pointer List 794 to get a Pointer to the DID List 830, as indicated by block 800. The Task Terminated Recorder will then set the least significant bit of the Status Word to 1, as indicated by block 802. The Task Terminated Recorder 730 will then access the DID List 830 with the Pointer and will inquire, as indicated by block 804, if the DID is a null DID, indicating that it is the last DID of that task. If it is a null DID, then the Task Terminated Recorder 730 will set the least significant bit of the Status Word to 0, as indicated by block 814, and terminate the processing of the terminated task. If, however, the DID is not a null DID, the Task Terminated Recorder will set the NUDAT bit for that DID to 1, as indicated by block 806, and complement the context bit in the Context Bit Memory for that DID, as indicated by block 808.
The Task Terminated Recorder 730 will then OR the deviances stored in the Data Memory 42, as indicated by block 810, and store the OR values in the Deviance Error Register 734 to generate a deviance vector which is transmitted to the Voter Interface 714 by the Error Reporter 754. The task terminated pointer in the Pointer Table 726 is incremented, as indicated by block 812.
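
A hypothetical sketch of this terminated-task walk over the DID List (Figure 58), continuing the assumed declarations of the previous fragment: set the NUDAT bits, complement the context bits, and OR the deviance vectors into the Deviance Error Register.

    extern uint16_t output_pointer_list[];    /* Output Pointer List 794, by TID */
    extern bool     context_bit[256];         /* Context Bit Memory 732, per DID */
    extern uint8_t  deviance_error_register;  /* Deviance Error Register 734     */

    /* Assumed helper: read the deviance vector for a DID from the Data Memory 42. */
    uint8_t read_deviance_vector(uint8_t did, bool context);

    void task_terminated(uint8_t tid)
    {
        uint16_t ptr = output_pointer_list[tid];             /* block 800 */

        for (;;) {
            did_entry_t e = did_list[ptr];
            if (e.did == NULL_DID)                           /* blocks 804, 814 */
                break;                                       /* last DID of task */

            nudat_bit[e.did]   = true;                       /* block 806 */
            context_bit[e.did] = !context_bit[e.did];        /* block 808 */

            /* OR this DID's deviance vector into the error register (blocks 810-812). */
            deviance_error_register |= read_deviance_vector(e.did, context_bit[e.did]);
            ptr++;
        }
    }
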
The operation of the Task Started Recorder 736 will be discussed with reference to the block diagram shown in Figure 59 and the flow diagram in Figure 60. Referring first to the block diagram of Figure 59, the Task Started Recorder 736 receives the TID and NID of the task started which is contained in a received Task Started message from the Scheduler 40. The Task Started Recorder 736 will first check the Pointer Table 726 to determine if the current DID is a null DID. If it is not, it will record a sequence error in the Sequence Error Register 738 for the identified Node. This check is made because a new task should not have been started until all the data from the preceding task has been received. The Task Started Recorder 736 will then update the Pointer Table 726 by accessing the Output Pointer List 794 with the TID to get the Pointer to the DID List 830 for the first DID resulting from the execution of that task. The Task Started Recorder 736 will then store the Pointer obtained from the Output Pointer List 794 into the Pointer Table 726.
Referring now to Figure 60, the operation of the Task Started Recorder 736 begins with accessing the Pointer Table 726 to get the pointer to the DID List 830, as indicated in block 816. The Task Started Recorder will then access the DID List 830 with the pointer to get the DID currently being stored for that Node, as indicated by block 818. Then it will inquire, as indicated by decision block 820, if the DID is a null DID. If it is not, the Task Started Recorder will record a scheduling error in the Sequence Error Register 738, as indicated by block 826. However, if the DID is a null DID, the Task Started Recorder will access the Output Pointer List 794 with the TID of the started task to get the Pointer to the DID List 830 to the first DID for that task, as indicated by block 822. It will then access the DID List 830 with the Pointer, then write the address of the first DID into the Pointer Table 726 for that Node, as indicated by block 824, completing the recording of the task started.
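
A sketch of the Figure 60 check, using the same assumed structures: a Task Started message for a Node whose current DID is not yet null means the preceding task's outputs were not all received, which is recorded as a scheduling error.

    extern uint8_t sequence_error_register;   /* Sequence Error Register 738 */

    void task_started(uint8_t tid, uint8_t nid)
    {
        did_entry_t cur = did_list[pointer_table[nid]];      /* blocks 816-818 */

        if (cur.did != NULL_DID) {
            /* Preceding task incomplete: scheduling error (block 826). */
            sequence_error_register |= (uint8_t)(1u << nid);
        } else {
            /* Point this Node's entry at the first DID of the started task
               (blocks 822-824). */
            pointer_table[nid] = output_pointer_list[tid];
        }
    }
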
The details of the AP Input Handler 744 will be discussed relative to the block diagram shown in Figure 61 and the flow diagram shown in Figure 62. The AP Input Handler 744 identifies and fetches the input data values required by the Applications Processor 14 for the execution of the next task. From the viewpoint of the Applications Processor 14, the Task Communicator 44 acts like a storage device containing an input file. The values are provided in a predetermined order as specified by the input TID-to-DID Mapping List for the current TID. As previously described, each input data mapping word consists of the Data Value Mask (DVM), the Data Type (DT), the Message Type (MT), and the actual Data Identification Code (DID) which is the starting address of the data value in the Data Memory 42. The addresses are ambiguous because each address points to two different locations in the Data Memory 42. The context bit for a particular DID defines the location that has the current input data.

The AP Input Data Handler 744 fetches each current data value from the Data Memory 42 and loads it into the AP Input FIFO 746. Upon a request from the Applications Processor 14, the data values from the AP Input FIFO 746 are transferred to the Applications Processor. This cycle is repeated until all input data values have been transferred. Referring now to Figure 61, the AP Input Handler 744 interacts with an Input Pointer List 828, the DID List 830, an AP Input Pointer Table 832, and the AP Input FIFO 746. The Input Pointer List 828 and the DID List 830 are part of the Pointer and DID List 728 shown in Figure 52 and are embodied in an off-board ROM (not shown). The AP Input Handler 744 receives the next task from the Next Task Register 742 and, using the Input Pointer List 828, DID List 830, and the AP Input Pointer Table 832, will extract from the Data Memory 42 the data required for the execution of the task. This information is stored in the AP Input FIFO 746 and made available to the Applications Processor 14. The AP Input Handler 744 will also generate a Task Completed/Started message identifying the task completed and the next task to be started by its own Applications Processor 14. The AP Input Handler 744 will also generate a Task Release message sent to the Scheduler 40 through the Scheduler Interface 716. The contents of the TOC Bit Flip Flop 758 are appended to the addresses stored in the AP Input Pointer Table 832 to identify the current system state. The contents of the Context Bit Memory 732 are appended to the addresses to identify the current data values stored in the Data Memory 42. The mapping list for each task in the Input Pointer List 828 consists of a contiguous group of DID's terminated by a null DID. A pointer to the beginning of each mapping list is stored in the Input Pointer List 828 and is addressed by the TID of the task. The null DID is used to identify the end of the contiguous group of DID's in the DID List 830.
Referring to the flow diagram shown in Figure 62, the AP Input Handler 744 first inquires if this is the first task to be executed in this Subatomic period, as indicated by decision block 834. This is because only one task can be started in any given Subatomic period. If this is not the first task, then the AP Input Handler 744 will wait until the beginning of the next Subatomic period. Otherwise the AP Input Handler 744 will inquire if the next task is a new task received since the last CS message or reset or reconfiguration commands, as indicated by decision block 836. If the task is not a new task, the AP Input Handler 744 will wait until it receives a new task. If, however, the task is a new task and it is the first task in the Subatomic period, the AP Input Handler 744 will send a Task Completed/Started message to the Transmitter Interface 718, as indicated by block 838. This Task Completed/Started message will subsequently be transmitted by the Transmitter 30 to all of the other Nodes in the system. The AP Input Handler 744 will then inquire, as indicated by decision block 840, if the transmission of the Task Completed/Started message has started. In the event that the transmission of the Task Completed/Started message is delayed because of the transmission of a time dependent message, such as a System State or a Task Interactive Consistency message, the AP Input Handler will wait until the transmission of the Task Completed/Started message has begun. After the transmission of the Task Completed/Started message has begun, the AP Input Handler 744 will send a Task Release message to the Scheduler 40 through the Scheduler Interface 716, as indicated by block 842, informing it that the Task Communicator has transmitted a Task Completed/Started message identifying the selected next task as the started task. The AP Input Handler 744 will then transfer the next task from the Next Task Register 742 and store it as the current task in the AP Input Pointer Table, as indicated by block 844. The AP Input Handler will then write the TID of the current task into the AP Input FIFO 746, informing the Applications Processor 14 of the identity of the task, as indicated by block 846. Then, using the current task, the AP Input Handler 744 will get the DID Pointer from the Input Pointer List 828 and store the DID Pointer in the AP Input Pointer Table 832, as indicated by block 848. Using the DID Pointer, the AP Input Handler will then get the address of the DID in the Data and Deviance RAM and store this address in the AP Input Pointer Table, as indicated by block 850. The AP Input Handler 744 will then inquire, as indicated in decision block 852, if the DID is a null DID. If the DID is a null DID, indicating it is the last DID in the task, the operation of the AP Input Handler 744 is completed. Otherwise, the AP Input Handler 744 will inquire, as indicated in decision block 854, if the DID is a System State Vector. If the DID is a System State Vector, the AP Input Handler 744 will access the TOC Bit Flip Flop and get the TOC bit which is used in place of the context bit for addressing the Data Memory 42, as indicated by block 866. Otherwise, if the DID is not a null DID nor a System State Vector, the AP Input Handler will access the Context Bit Memory for the context bit, as indicated by block 856, and append it to the address in the AP Input Pointer Table. Using the address in the DID List 830 and the context bit or the TOC bit from the TOC Bit Flip Flop, the AP Input Handler will access the Data Memory 42 and write the first two data bytes into the AP Input FIFO, as indicated by block 858. The AP Input Handler will then inquire, as indicated by block 860, if the Message Type is either a Message Type 2 or Message Type 3, which have 4 bytes rather than 2 bytes. If the Message Type is not either a Message Type 2 or 3, the AP Input Handler 744 will index the DID Pointer and store the indexed DID Pointer in the AP Input Pointer Table, as indicated by block 864. If the Message Type is either a Message Type 2 or a Message Type 3, the AP Input Handler 744 will address the Data Memory 42 again and write the final 2 data bytes into the AP Input FIFO 746, as indicated by block 862, then increment the DID Pointer, as indicated in block 864. The AP Input Handler 744 will repeat this process to obtain all of the data values needed by the Applications Processor 14 for the execution of the task. As previously discussed, the last DID for any task is a null DID. This null DID is recognized by the AP Input Handler 744, as indicated in decision block 852, and will terminate the loading of the AP Input FIFO 746.
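
A condensed sketch of the per-DID fetch loop of Figure 62 (blocks 848 through 864), again under the assumed structures used above; the Data Memory access and the System State Vector test are abstracted behind assumed helpers.

    extern bool     toc_bit;                  /* TOC Bit Flip Flop 758          */
    extern uint16_t input_pointer_list[];     /* Input Pointer List 828, by TID */

    bool    is_system_state_vector(uint8_t did);            /* assumed helper     */
    uint8_t message_type(uint8_t first_byte);               /* MT field of byte 1 */
    void    read_data_memory(uint8_t did, bool select, uint8_t *dst, int nbytes);
    void    ap_input_fifo_write(const uint8_t *src, int nbytes);

    void load_input_fifo(uint8_t tid)
    {
        uint16_t ptr = input_pointer_list[tid];              /* block 848 */
        uint8_t  buf[2];

        for (;;) {
            did_entry_t e = did_list[ptr];
            if (e.did == NULL_DID)                           /* block 852 */
                break;

            /* System State Vectors are addressed with the TOC bit, everything
               else with the context bit for that DID (blocks 854, 856, 866). */
            bool select = is_system_state_vector(e.did) ? toc_bit : context_bit[e.did];

            read_data_memory(e.did, select, buf, 2);         /* block 858 */
            ap_input_fifo_write(buf, 2);

            uint8_t mt = message_type(e.first_byte);
            if (mt == 2 || mt == 3) {                        /* 4-byte data values */
                read_data_memory(e.did, select, buf, 2);     /* block 862 */
                ap_input_fifo_write(buf, 2);
            }
            ptr++;                                           /* block 864 */
        }
    }
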
The AP Output Handler 750 will be discussed relative to the block diagram shown in Figure 63 and the flow diagram shown in Figure 64. The AP Output Handler 750 receives and processes the output data values generated by the Applications Processor 14 and broadcasts them to all the Nodes in the system. As with input values, output values are expected in a predetermined order specified by the output TID-to-DID Mapping List for the current task.

When a Data Value is received from the Applications Processor 14, the AP Output Handler 750 loads it into an Output Data Register along with its MT and DID codes and initiates broadcast of the Data Value message. After outputting the last value for each task, the Applications Processor generates a combined Applications Processor Reported Error Vector (APRE) and the Branch Condition (BC). The Applications Processor Reported Error portion is used with the previous Applications Processor Reported Error data to be reported to the Voter 38 through the Error Reporter 754. The Branch Condition (BC) is used as part of the next Task Completed/Started message generated by the AP Input Handler 744.
Referring now to Figure 63, the AP Output Handler 750 responds to the AP Input Handler completing its operation. The AP Output Handler 750 first accesses the Pointer Table 726 and obtains the pointer which identifies the current task which is stored in the DID List 830. This Pointer is then used to address the DID List 830 to obtain the Data Identification Code (DID), the Message Type (MT), and the Data Value Mask (DVM). The AP Output Handler 750 will then store the Message Type bits and the Data Identification Code in an Output Data Register 870. As the data is generated by the Applications Processor 14, the data is stored in the AP Output FIFO 748. The data is then transferred from the AP Output FIFO 748 to the Output Data Register 870. The AP Output Handler 750 will then transmit the first 2 bytes to the Transmitter Interface 718, and will transmit the Applications Processor Reported Error Vector to the APRE Register 752.
The details of the operation of the AP Output Handler 750 will now be discussed with reference to the flow diagram shown in Figure 64. After the AP Input Handler 744 has completed the loading of the initial data value into the AP Input FIFO 746, the AP Output Handler 750 will read the current DID Pointer from the Pointer Table 726 and store it in the AP Output Pointer Table 868, as indicated by block 872. The AP Output Handler will then access the DID List using the pointer stored in the Pointer Table 726 and store the Message Type and DID in the Output Data Register 870, as indicated by block 874. After the Message Type and DID bytes are stored in the Output Data Register 870, the AP Output Handler 750 will inquire, as indicated by decision block 876, whether the Applications Processor has placed any data values in the AP Output FIFO 748. If no data has been placed in the AP Output FIFO 748, the AP Output Handler 750 will wait until data is generated by the Applications Processor 14, as indicated by decision block 876. After the Applications Processor 14 has stored the generated data values in the AP Output FIFO 748, the AP Output Handler 750 will transfer the data values to the Output Data Register 870, as indicated by block 878. The AP Output Handler 750 will then inquire, as indicated in decision block 880, if the Message Type is a Message Type 2 or a Message Type 3. If the Message Type is a Message Type 2 or 3, the AP Output Handler 750 will wait until the final two data bytes of the data are generated and placed in the AP Output FIFO 748, as indicated by decision block 882. After the Applications Processor has written the second two data bytes into the AP Output FIFO 748, the AP Output Handler 750 will transfer the contents of the AP Output FIFO 748 into the Output Data Register 870, as indicated in block 884. If the Message Type is a Message Type 0 or 1, or after the second two data bytes are written into the Output Data Register 870, the AP Output Handler will inquire if the DID of this data is a null DID, as indicated by decision block 886. If the DID is not a null DID, the AP Output Handler 750 will send the Message Type and Data Identification Code (DID) bytes to the Transmitter Interface 718, as indicated by block 888. The AP Output Handler will then send the data bytes to the Transmitter Interface 718 to complete the Data Value message, as indicated by block 890. The AP Output Handler 750 will then increment the current DID Pointer in the AP Output Pointer Table 868, as indicated in block 892, and repeat the processing of the next data value generated by the Applications Processor 14. As previously indicated, the last DID in the DID List 830 for that particular task will be a null DID which will be detected by the AP Output Handler 750, as indicated by decision block 886. If the DID is a null DID, indicating that the Applications Processor 14 has generated all of the data values for that task, the Applications Processor will generate a last data word containing the Applications Processor Reported Error and the Branch Condition (BC) bit for the subsequent tasks. The AP Output Handler 750 will store the Applications Processor Reported Error Vector in the APRE Register 752, as indicated by block 894, and then will proceed to process the branch condition, as indicated by decision block 896. The AP Output Handler will first inquire if the majority of the branch condition bits are equal to 0. If the majority of the branch condition bits are 0's, then the AP Output Handler will generate a branch condition byte consisting of all 0's, as indicated by block 898; otherwise, if the majority of the branch condition bits are 1's, the AP Output Handler 750 will generate a branch condition byte of consecutive 1's, as indicated by block 900. Finally, the AP Output Handler 750 will store the branch condition byte in the Transmitter Interface 718, as indicated by block 902. This byte will be appended to the next Task Completed/Started message generated by the AP Input Handler 744 as the branch condition of the completed task.
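
The branch-condition handling at the end of the flow (blocks 896 through 902) reduces to a simple majority reduction of the BC bits into an all-zeros or all-ones byte; a sketch, in which the 8-bit width of the BC field and the handling of a tie are assumptions:

    uint8_t branch_condition_byte(uint8_t bc_bits)
    {
        int ones = 0;
        for (int i = 0; i < 8; i++)
            ones += (bc_bits >> i) & 1u;

        /* Majority of 1's -> 0xFF (block 900); otherwise all 0's (block 898). */
        return (ones > 4) ? 0xFF : 0x00;
    }
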
The Task Communicator 44 also includes a Reconfigure Control 904, as shown in Figure 67, and a Reset Control 920, as shown in Figure 69. Referring first to Figure 67, the Reconfigure Control 904 transmits an interrupt to the Applications Processor 14 and awaits acknowledgement. After acknowledgement, the Reconfigure Control 904 will initialize the AP Output Handler 750 and its Output Data Register 870, the AP Input FIFO 746, the AP Output FIFO 748, and the Pointer Table 726. The operation of the Reconfigure Control 904 will be discussed relative to the flow diagram shown in Figure 68. In response to a reconfiguration request from the Scheduler 40, the Reconfigure Control 904 will send a reconfigure interrupt to the Applications Processor 14, as indicated in block 906. It will then terminate all messages to the Transmitter Interface 718, as indicated by block 908, by clearing the Output Data Register 870. The Reconfigure Control 904 will then await acknowledgement of the interrupt signal from the Applications Processor 14, as indicated by decision block 910. After the Applications Processor has acknowledged the interrupt, the Reconfigure Control will clear the AP Input FIFO 746 and the AP Output FIFO 748, as indicated by block 912, then set all the pointers in the Pointer Table 726 to null DID's, as indicated by block 914. After the Input and Output FIFO's have been cleared, the Reconfigure Control will restart the AP Input Handler 744, as indicated in block 916, then send a Task Communicator Ready (TSCRDY) signal to the Scheduler 40, as indicated in block 918, indicating that the Task Communicator 44 is ready to begin processing data in the new reconfigured System State.

The Task Communicator also has a Reset Control 920 responsive to the Operations Controller Reset (OCRES) signal, as indicated in Figure 69. The Reset Control 920 interfaces with the Applications Processor 14, an AP Ready Flag 922, a Reset Flag 924, the AP Input FIFO 746, the AP Output FIFO 748, the Pointer Table 726, and the AP Input Handler 744. Referring to Figure 70, the operation of the Reset Control 920 begins by sending a reset request to the Applications Processor 14, as indicated by block 926. The Reset Control 920 will then set the Reset Flag 924 to "ON," as indicated by block 928, to signify to the other subsystems of the Task Communicator 44 that the Operations Controller is being reset. The Reset Control 920 will then set the AP Ready Flag 922 to "OFF," as indicated by block 930, to signify to the Scheduler 40 that the Applications Processor is not yet ready to start processing any tasks. The Reset Control 920 will then proceed to clear the AP Input FIFO 746 and the AP Output FIFO 748, as indicated by block 932, then set all of the pointers in the Pointer Table 726 to null DID's, as indicated by block 934. The Reset Control will then start the AP Input Handler 744, as indicated by block 936, and wait for the Applications Processor to signify that it is ready, as indicated by decision block 938. After the Applications Processor 14 signifies that it is ready to start processing data, the Reset Control 920 will turn the Reset Flag 924 "OFF" and the AP Ready Flag 922 "ON," signifying that the Task Communicator 44 is now ready to start processing data, as indicated by block 940.
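
As a rough illustration of this reset sequence (blocks 926 through 940), with the flag variables and helper functions standing in as assumptions for the hardware blocks named in the text:

    extern bool reset_flag;                    /* Reset Flag 924    */
    extern bool ap_ready_flag;                 /* AP Ready Flag 922 */

    void ap_send_reset_request(void);          /* block 926 */
    void clear_ap_fifos(void);                 /* block 932: FIFO's 746 and 748 */
    void set_all_pointers_to_null_did(void);   /* block 934: Pointer Table 726  */
    void start_ap_input_handler(void);         /* block 936 */
    bool ap_signals_ready(void);               /* decision block 938 */

    void reset_control(void)
    {
        ap_send_reset_request();
        reset_flag    = true;                  /* block 928: controller being reset */
        ap_ready_flag = false;                 /* block 930: AP not yet ready       */

        clear_ap_fifos();
        set_all_pointers_to_null_did();
        start_ap_input_handler();

        while (!ap_signals_ready())            /* wait for the Applications Processor */
            ;

        reset_flag    = false;                 /* block 940 */
        ap_ready_flag = true;
    }
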
SYNCHRONIZER

The Synchronizer 46 establishes and maintains the synchronization between all of the Operations Controllers in the system. The multicomputer architecture uses loose synchronization which is accomplished by synchronous rounds of message transmission by each Node in the system. In this method, each Synchronizer 46 detects and time stamps each time dependent message received by its own Node. These time dependent messages are transmitted by every other Node in the system at predetermined intervals and they are received by all the other Nodes in the system. As a result of the wrap-around interconnection shown in Figure 1, a Node will receive its own time dependent messages along with the other time dependent messages sent by the other Nodes. The comparison of the time stamps on a Node's own time dependent message with the time stamps on all of the other time dependent messages is what drives the fault tolerant convergence algorithm. The synchronization is done over two timing message intervals. These intervals are delimited by a pre-sync message and a sync message which are transmitted alternately. A Node is defined to be in point-to-point synchronization with another Node when it is sending its own sync and pre-sync messages at the same time the other Node is sending its sync and pre-sync messages. Since the synchronization cannot be exact, a window is specified by the user which defines the allowable error in the time that messages are received from the Nodes that are considered to be in synchronization with each other.
Fundamentally, the mechanism which forces the Nodes into synchronization with each other involves a calculation, done by each Node, that determines where in time the center of the last cluster of pre-sync messages occurred. Each Node will then determine the difference between its own pre-sync time stamp and that of the center. Each Node will then use that difference to adjust the length of the interval from the time it sent its pre-sync message to the transmission of its sync message. Fault tolerance in these calculations is required and is accomplished with an approximate agreement algorithm. System convergence is accomplished by having all Nodes repeat these steps continuously.

Over every sync to pre-sync interval all Nodes listen to the time dependent messages received from all of the Nodes, including their own, determine a correction, and then apply the correction over the following pre-sync to sync interval. The pattern is repetitive, taking the form: measure error, make correction, measure error, make correction, and so on. The time interval from the sync to the pre-sync message is a nominal user defined value.
The explanation of the synchronization procedure described above is accurate. However, the actual mechanism implemented is more involved than the steady state system described because it must also be fault tolerant under all conditions, be capable of detecting and characterizing system timing errors, and must support the other time dependent functions of the system.
The implemented synchronization logic supports three categories of operation: a cold start, a warm start, and steady state operation. Cold start synchronization logic handles the situation where the system has just powered up and no one Node is necessarily in synchronization with any other Node. In particular, a cold start is executed when no operating set of Nodes exists. In this case, every good Node in the system attempts to synchronize with every other good Node, and the Nodes then simultaneously and consistently decide together which Nodes are in synchronization and whether or not the number which are synchronized is larger than the user specified start up size. In order to accomplish initial synchronization and start up, each Node maintains a byte of information called the "in-sync-with" (ISW) vector. The content of the "in-sync-with" vector defines which other Nodes in the system the local Node believes it is in synchronization with.
Byzantine Agreement on the ISW vectors of all the good Nodes in the system is used to define the initial operating set. Byzantine Agreement is required concerning this "in-sync-with" data in order for cold starts to be fault tolerant. Once enough Nodes reach Byzantine Agreement on a potential operating set (POS), all the Nodes that are in that set begin operating. All the other healthy Nodes not in the potential operating set will reach the same conclusion, that an operating set is formed, but they will also recognize that they are not included in the operating set and will switch to a warm start mode of operation.
In the warm start mode of operation each good Node continues its efforts to synchronize with the existing operating set. Once a Node in the warm start mode of operation believes it is in synchronization with the operating set, it will begin normal operation. After the warm starting Node has behaved correctly long enough, the other Nodes will admit it into the operating set.

The last situation is the steady state mode of operation, where each operating Node simply maintains synchronization and alignment with the other good Nodes in the system. In practice the steady state convergence algorithm runs under all conditions since it has the ability to converge the local Node to a common synchronization point with all other good Nodes in a fault tolerant manner. The real differences between warm and cold starts center around the logic used to determine when the operating set is formed and the Nodes in the operating set are ready to start scheduling tasks to be executed by their Applications Processors.

The details of the Synchronizer 46 are shown in Figure 71. The Synchronizer includes a Synchronizer Control 952 which receives the Task Interactive Consistency (TIC) messages and the System State (SS) messages from the Message Checker through a Message Checker Interface 942. The System State messages are the sync and pre-sync messages previously described and are distinguished by a function bit which identifies the System State message as a sync or pre-sync message. A Timing Signal Generator 950 generates timing signals which are transmitted to the Synchronizer Control 952. The signals generated by the Timing Signal Generator are the Subatomic period (SAP) signal, the Atomic period (AP) signal, the Master period (MP) signal, the Last Subatomic period (LSAP) signal, the Last Atomic period (LAP) signal, the Soft Error Window (SEW) signal, and the Hard Error Window (HEW) signal. The Synchronizer Control 952 also receives a Clock signal and a Reset signal from the system bus. The Reset signal may be either the power on Reset (RESET) or the internal Operations Controller Reset (OCRESET) signal. These signals have been previously discussed and need not be repeated here. The Synchronizer Control 952 will also receive its own 3 bit Node identification (NID) code.

A Byzantine Voter 954 performs a byzantine vote on the "in-sync-with" matrices received from itself and the other Nodes during the cold start mode of operation and on the content of the Task Interactive Consistency messages. The byzantine vote on the content of the Task Interactive Consistency (TIC) messages is transmitted directly to a Scheduler Interface 944, while the result of the byzantine vote on the "in-sync-with" matrices is passed to a Fault Tolerator Interface 946. A Time Stamp Voter 956 will vote on the time stamps of the System State messages received from all of the Nodes to generate a voted time stamp value. A Synchronizer Memory 948 stores the data received from the Message Checker Interface and other data required by the Synchronizer Control 952 for establishing and maintaining synchronization between its own Node and the other Nodes in the system. The Synchronizer Memory 948 has a Scratch Pad Memory 962, a Message Memory 964, and a Time Stamp Memory 966, as shown in Figure 72. An Error Reporter 958 receives the errors detected by the Synchronizer Control 952 and transmits them to the Fault Tolerator Interface 946. The Synchronizer Control 952 generates the time dependent Task Interactive Consistency (TIC) and the System State (SS) messages which are transmitted to the other Nodes in the system through the Transmitter Interface 960, as previously described.
Figure 76 shows the waveforms of the various signals generated by the Timing Signal Generator 950. The Master period (MP) is a timing signal which reflects the length of each Master period interval of the Operations Controller. This interval is the longest of the synchronization clocks and reflects the periodicity of the lowest repetition task being run in the application. The Master period can be considered to be the "frame size" of the application. During the Master period interval the total pattern of tasks is repeated. The Atomic period (AP) is a timing signal which reflects the beginning and end of each Atomic period interval. This interval is representative of the fastest repetition task being run in the application. The Master period described above is user specified as an integer number of Atomic periods. The Subatomic period (SAP) is a timing signal which reflects the beginning and end of each Subatomic period interval. The Atomic period interval is user specified as an integer number of Subatomic periods. The Last Atomic period (LAP) is an active high signal that windows the last Atomic period that occurs in each Master period. The Last Subatomic period (LSAP) is an active high signal that windows the last Subatomic period that occurs in each Atomic period. The Soft Error Window (SEW) is an active high signal that brackets a span of time around an event time mark that defines the soft error window for the arrival of system synchronization messages. Finally, the Hard Error Window (HEW) is an active high signal that brackets a span of time around an event time mark that defines the hard error window for the arrival of system synchronization messages.
The format of the Message Memory 964 is shown in Figure 73. The Message Memory 964 stores for each Node the branch condition byte, the task completed vector, the next system state vector, the current system state vector, the content of the Atomic period counter, and two bytes, one reserved for a cold start and the other reserved for a warm start. This format is repeated for each Node in the system.

The format of the Time Stamp Memory 966 is shown in Figure 74. The Time Stamp Memory consists of a coarse time count and a fine time count and includes an update (U) flag and a time stamp (TS) flag. The update flag signifies that the stored time stamp is for a System State message received during the current time stamp interval. The time stamp flag indicates whether the time stamp is for a System State message in which the sync function bit is set or for a System State message in which the pre-sync function bit is set. The coarse count of the time stamp is indicative of the number of Subatomic periods that have passed since the preceding System State message was generated. The fine time stamp count corresponds to the number of synchronizer clock pulses received during the last Subatomic period of the Atomic period. The coarse or Subatomic period counts are used primarily during the cold start and the warm start to achieve rapid convergence of the synchronization between the local Node and the other Nodes in the system. The fine or synchronizer clock time stamp counts are used primarily during the steady state operation to maintain the synchronization between the Nodes. The Time Stamp Memory 966 will store a time stamp for each Node in the system and includes a special entry for storing the time stamp of the System State messages transmitted by its own Node.

The format of the Scratch Pad Memory 962 is shown in Figure 75. The Scratch Pad Memory 962 stores the "in-sync-with" (ISW) vectors for each Node in the system. These "in-sync-with" vectors are contained in the sync and pre-sync System State messages. The Scratch Pad Memory 962 will also store two message warning counts, one indicative of the time from the end of the warning period to the transmission of the Task Interactive Consistency message and the other indicative of the time from the end of the warning period to the transmission of the System State message. The Scratch Pad Memory will also store the Subatomic period count which is used to time stamp the received messages. The Scratch Pad Memory also has an entry storing the number of Subatomic periods per Atomic period, the Atomic period count, and the number of Atomic periods per Master period. The Scratch Pad Memory also will store an actual hard error window (HEW) to warning period count and a nominal hard error window (HEW) to warning period count. The actual hard error window to warning period reflects the corrected length of the Atomic period between the pre-sync and sync messages, which is computed from the difference between the voted time stamp value and its own time stamp value. The next entries in the Scratch Pad Memory are the error window parameters. The error window parameters include a hard error window count and a soft error window count. The next two entries in the Scratch Pad Memory are the computed correction for the Subatomic period delta and the computed correction for the Subatomic period count. The next entry is the maximum allowed correction for the Subatomic period delta. The final entry in the Scratch Pad Memory is the minimum start up size for determining the existence of a potential operating set.

The details of the Synchronizer Control 952 are illustrated in Figure 77. The data received by the Message Checker Interface 942 is passed directly to a Data Handler 968 and a Time Stamper 972. The Data Handler 968 will store the data in the Message Memory 964, the Scratch Pad Memory 962, or a Byzantine Voter Memory 970 as required. Prior to the Time Stamper 972 storing the time stamp of the message in the Time Stamp Memory 966, the received message is checked by an Expected Message Checker 974 and a Within Hard Error Window and Soft Error Window Checker 976. If the message is not an expected message, as shall be discussed later, the Expected Message Checker 974 will generate a sequence error signal which is transmitted to an Error Reporter 978 and to the Time Stamper 972. In a like manner, if the received message is outside the hard error window or the soft error window, the Within Hard Error Window and Soft Error Window Checker will generate either a hard error or a soft error which is also transmitted to the Error Reporter 978 and the Time Stamper 972. The Time Stamper 972 will not record the time stamp in the Time Stamp Memory 966 if either a sequence error or a hard error is detected. Any message which is received outside the hard error window or not received at all is essentially ignored by the Synchronizer. However, a received vector will be generated showing a missing message error for each Node which failed to report during the hard error window interval. The synchronization process will not use any time stamp value associated with the Nodes which failed to report within the hard error window. This prevents good Nodes from trying to move towards badly out-of-sync Nodes that may possibly be faulty. The Time Stamper 972, however, will record the time stamp in the Time Stamp Memory if only a soft error is detected.

The Time Stamper 972 will record the number of Subatomic periods counted in a Subatomic Period (SAP) Counter 971 as the coarse time stamp count and the number of sync clock bits from a Sync Clock 969 as the fine time stamp. The Time Stamper 972 will set the update flag in the Time Stamp Memory and set the time stamp flag to indicate if the received message was either a sync or pre-sync System State message.
A Byzantlne Voter 954 performs a byzantine ~ote on the task completed vector and the branch condition b;ts contained in the Task Interact~ve Cons~stency messages wh~ch are passed back to the Scheduler 40 through the Scheduler Interface 944. During a cold start the ~yzantine Voter ~54 w111 also perfonm a byzantine ~ 03L.~3~3~3 vo~e on the "in-sync-with" ma~rix transm~tted in the pre-sync Systen7 State messages to generate a voted "ln-sync-with" vector.
Thls "in-sync-with" vector is transmltted to an Operat7ng Condit~on Detector 1000 which sums the number of Uin-sync-with"
bits contained in the voted "1n-sync-with" vector and compares this sum with the mlnimum start up s~ze for a potentlal operating set (POS) of Nodes. If the sum of the bits in the Byzantine voted ~ln-sync-with" vector is greater than the m1nlmum start up size9 the Operat~ng Cond~tion Detector 1000 wil1 then determine if its own Node is contained in the "1n-sync-w~th" vector. If its own Node is contained within the "~n-sync-with" vector> the Operating Condition Detector will set an Operating Flag 1004 indicating that it is ready to start operating. However~ ~f an operating set is detected and the Uperating Condition Detector 1000 determines that its own Node is not within the operatlng set, it will set a Warm Start Flag 1002 1ndlcatlng the exlstence of an operat1ng set and that it ~s not in synchronlzat~on w~th that set. Th~s will cause the Synchroni2er 46 to enter into a warm start mode of operation as previously discussed. If an operating set is detected, and its own Node ~s in the operatlng set, the Operating Condltion Detector lOOO wlll then transmit the "ln-sync-w~th" (ISW) vector and the operatlng flag blt to the Fault Tolerator 36 through the Fault Tolerator Interface 946.
An In-Sync-With (ISW) Detector 994 will compare the time stamp of its own System State message with each time stamp stored in the Time Stamp Memory 966 to generate an "in-sync-with" vector which is stored in an In-Sync-With (ISW) Register 996. The "in-sync-with" vector stored in the In-Sync-With Register 996 and the state of the Operating Flag 1004 are passed to a Message Generator 998 and are used in the generation of the next System State message. The output of the Message Generator 998 is passed to the Transmitter through the Transmitter Interface 960.

The steady state operation of the Synchronizer will be discussed with reference to the flow diagrams illustrated in Figures 78 through 82 and the waveforms shown in Figures 83 and 84. The flow diagram shown in Figure 78 describes the operation of the Data Handler 968, the Expected Message Checker 974, and the Within Hard Error Window and Soft Error Window Checker 976. As indicated by block 1006, all of the data received from the Message Checker Interface 942 is stored in the Message Memory 964. The system will then inquire, as indicated by decision block 1008, if the operating flag is true. If the operating flag is not true, the system will default to either a cold start or a warm start as will be described later herein. If the operating flag is true, the Expected Message Checker will then inquire if the message is a Task Interactive Consistency (TIC) message, as indicated by decision block 1010. If it is not a Task Interactive Consistency message, then the message is a System State message, and the Expected Message Checker 974 will inquire if the Subatomic period count in the Time Stamp Memory is equal to zero, as indicated by block 1012. The Subatomic period count stored in the Time Stamp Memory is the two's complement of the number of Subatomic periods in the Atomic period. This Subatomic period count is incremented each time the Timing Signal Generator 950 generates a Subatomic period signal. When the Subatomic period count in the Time Stamp Memory is equal to zero, then a System State message is the expected message. If the Subatomic period count is equal to zero, the Expected Message Checker will reload the time stamp counters for the Node from which the message was received, as indicated by block 1014, and then inquire if the sync/pre-sync (s/p) function bit contained in the message is equal to the complement of the TS flag stored in the Time Stamp Memory. In normal operation the sync and pre-sync System State messages are sent in an alternating manner; therefore, the function bits in the received message should be the complement of the function bits of the previous message which is currently stored by the TS flag in the Time Stamp Memory 966. If the sync/pre-sync function bit is the complement of the time stamp flag stored in the Time Stamp Memory, then the sequence error flag for the Node from which the message was received (Node j) is set to false, as indicated by block 1020.
If the received message is a Task Interact~ve Consistency (TIC) message, as determined in decision block 1010, the ~xpected Message Checker 974 will then ~nqu~re if the Subatomîc period count in the Time Stamp Memory is equal to or greater than zero, as indicated by decis;on hlock 1028. If the Subatomic period time stamp count is equal to or gre~ter than zero, then a System State message should have been received and, therefore, there ~s a sequence error. The Expected Message ~hecker 974 will then set the sequence error flag for the Node from whlch the message was recelved to true, as indicated by block 1034. However, if the Subatomic period count in the Time Stamp Memory ls less than zero, the Expected Message Checker 974 will increment the time stamp count stored in the Time Stamp Memory for that Node (Node j), as indicated by block 1032.
The operation of the W~thin Hard Error Window and Soft Error W~ndow Checker 976 and the T~me Stamper 972 w~ll be dlscussed w~th reference to the flow diagram shown in Figure 79.
The operation of the Within Hard Error Window and Soft Error Window Checker 976 and the Time Stamper 972 will be discussed with reference to the flow diagram shown in Figure 79. The operation of the Within Hard Error Window and Soft Error Window Checker 976 begins by checking to determine if a sequence error has been detected, as indicated by decision block 1036. If a sequence error has been detected by the Expected Message Checker, the Within Hard Error Window and Soft Error Window Checker 976 will set the update flag in the Time Stamp Memory 966 to false, as indicated by block 1054. Otherwise, the Within Hard Error Window and Soft Error Window Checker 976 will inquire whether the message was received within the hard error window, as indicated by decision block 1040. If the message was not received within the hard error window (HEW), the Within Hard Error Window and Soft Error Window Checker 976 will set the hard error window flag to true, as indicated by block 1042, then set the update flag in the Time Stamp Memory to false, as indicated by block 1054. If the message was received within the hard error window, the Within Hard Error Window and Soft Error Window Checker 976 will inquire, as indicated by decision block 1044, whether the message was received within the soft error window. If the message was not received within the soft error window, the Checker will set the soft error window flag to true, as indicated by block 1046, and the Checker will proceed to ask, as indicated by decision block 1048, whether the received message was a Task Interactive Consistency (TIC) message. If the message is not a TIC message, the Checker will then proceed to ask if the message was a pre-sync System State message, as indicated by decision block 1049. If the message was a pre-sync System State message, then the Time Stamper will be enabled to time stamp the received message. The time stamp equals the SAP count received from the SAP Counter 971 and the sync clock count received from the Sync Clock 969. The Time Stamper 972 will then set the TS flag bit to pre-sync and the update flag equal to true, as indicated by block 1050. However, if the message is a System State sync message, the Time Stamper 972 will time stamp the received message and set the time stamp flag to sync and the update flag to true, as indicated by block 1052. After the message has been time stamped it is stored in the Time Stamp Memory 966, as indicated by block 1038. As indicated by decision block 1049, the Task Interactive Consistency (TIC) messages are not time stamped.
The generation of the actual HEW to warning period count is descri~ed with reference to the flow diagram shown in Figure 80.
The generat~on of the actual HEW warn~ng period counts beg1ns by ; 30 setting the Node (NID) pointer in the Time Stamp Memory to the first Node (NID=01, as ~ndicated by block lU56. The Time Stamp Voter wlll then inquire if the update flag is true, as indicated by decision block 1058. If the update flag is not true, indi-cat1n~ that the time stamp value has not been updated durin~ the current Atomlc period, the Tlme Stamp Vot~r will then increment the time stamp memory Node pointer to the next Node and inquire if -132~ 3~

the update flag of that Node is true. If the update flag is true, then the time stamp value is loaded into the Time Stamp Voter, as indicated by b10ck 1060, then the Node pointer to the Time Stamp Memory is lncremented, as indi~ated by block 1062~ The Time Stamp Yoter 956 wlll then inquire if the Node to which the pointer is pointlng is the maxlmum or last Node to be polled, as ind~cate~ by decision block 1064. If the Node ~s not the last Node, (MAX ~ID~
the process of loading the Tlme Stamp Voter w~ll cont1nue until the time stamp value from the last Node is loaded into the Time Stamp Yoter 956. Once the T~me Stamp Voter is loaded w~th all of the current time stamp values it will vote on the time stamp values which were loaded into it and generate a voted time stamp value (TSV~, as indicated by block 10fi60 The Sync Correction Generator 990 wlll then subtract the Node's own time stamp value from the voted time stamp value to generate a sync delta, as indicated by block 1068. The actual HEW to warning period is ~hen generated by add~ng in Adder 992 the sync delta to the nomi-nal HEW to warning period stored in the Scratch Pad Memory, as indicated by block 1070. This actual HEW to warn~ng count is then stored in the Scratch Pad Memory, as indicated by block 1071.
The operation of the Message Generator 998 will be discussed relative to the block diagram shown in Figure 81 and the flow diagram shown in Figure 82. Referring first to the block diagram shown in Figure 81, the Message Generator receives the clock (CLK), the Last Subatomic period (LSAP), and the HEW signals from the Timing Signal Generator 950. It also receives the Warm Start Flag 1002, the Operating Flag 1004, and the "in-sync-with" vector from the In-Sync-With (ISW) Register 996. The data used in the messages generated by the Message Generator 998 is obtained from the Synchronizer Memory 948, which includes the Scratch Pad Memory 962, the Message Memory 964, and the Time Stamp Memory 966. The messages generated by the Message Generator are passed to the Transmitter Interface 960 which ultimately passes these messages to the Transmitter 30.

Referring now to the flow diagram shown in Figure 82, the Message Generator 998 first waits until the end of the hard error window, as indicated by decision block 1074. At the end of the HEW the Message Generator will inquire if the Subatomic period is a Last Subatomic period, as indicated by decision block 1076. If the Subatomic period is not a Last Subatomic period, then the message to be generated is a Task Interactive Consistency (TIC) message in which the data identification code (DID) is set equal to zero, as indicated by block 1078. If the current Subatomic period is the Last Subatomic period, then the message to be transmitted is a System State message in which the sync/pre-sync bit is equal to the complement of the TS flag currently stored in the Time Stamp Memory, as indicated by block 1094.

If the message type is a Task Interactive Consistency message, the Message Generator will inquire if the Operating Flag is true, as indicated by decision block 1080. If the operating flag is not true, then no TIC message is to be sent. However, if the operating flag is true, the Message Generator 998 will load an Event Counter 1072 with the nominal HEW to warning count stored in the Scratch Pad Memory 962, as indicated in block 1082, then assemble a normal Task Interactive Consistency message, as indicated by block 1083. As shown in Table 1, the normal Task Interactive Consistency message includes the task completed vector and the branch condition bits obtained from the Message Memory 964. The Message Generator will then wait until the Event Counter 1072 is incremented to zero by the clock signals, as indicated by decision block 1084. When the Event Counter is equal to zero, the Message Generator 998 will send the first byte of the message to the Transmitter through the Transmitter Interface 960, as indicated by block 1086, then transfer the remaining bytes of the message to the Transmitter Interface 960, as indicated by block 1088. The Transmitter Interface 960 will then wait for the buffer available (BA) signal from the Transmitter 30, as indicated by decision block 1090, then send the remaining bytes of the message to the Transmitter, as indicated by block 1092.

As previously described relative to the operation of the Transmitter 30, the sending of the first byte of a message from the Synchronizer will start the warning period for the time dependent Task Interactive Consistency and System State messages. At the end of the warning period, the Transmitter will begin the transmission of the time dependent message and will transmit a buffer available signal to the Transmitter Interface, which triggers the transferring of the remaining bytes stored in the Transmitter Interface to the Transmitter.

If the message type is a System State message, as indicated by block 1094, the Message Generator will then inquire if the System State message to be transmitted is a sync or pre-sync message, as indicated by block 1096. If the message is a sync message, the Message Generator will load the Event Counter 1072 with the actual HEW to warning count from the Scratch Pad Memory, as indicated by block 1098, and then will generate a normal System State message, as indicated by block 1099. If, however, the message is a pre-sync System State message, the Message Generator will load the Event Counter 1072 with the nominal HEW to warning count, as indicated by block 1097, and then will interrogate the warm start and operating flags to determine if the system is in the cold start mode, as indicated by decision block 1077. A cold start is indicated by both the Warm Start and Operating Flags being false. If the system is not in a cold start mode, the Message Generator 998 will then generate a normal System State message, as indicated by block 1099. However, if the Synchronizer is in the cold start mode, the Message Generator will generate a cold start pre-sync message, as indicated by block 1081. The cold start pre-sync message has a format as indicated in Figure 85, which is different from the normal pre-sync System State message shown in Table I. This cold start pre-sync message contains an "in-sync-with" matrix containing the in-sync vectors received from all the operating Nodes in the system. The Message Generator 998 will then wait for the end of the HEW to warning period by monitoring the Event Counter, as indicated in decision block 1084. The Message Generator will then send the first byte of the message to the Transmitter 30, as indicated by block 1086, then transfer the remaining bytes of the message to the Transmitter Interface, as indicated by block 1088. When the Transmitter generates the buffer available signal, as indicated in decision block 1090, the Transmitter Interface 960 will then pass the remaining bytes of the System State message to the Transmitter, as indicated by block 1092.
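
The choice of which count is loaded into the Event Counter 1072 can be summarized in a few lines; the enum and parameter names are assumptions, and the actual message assembly is omitted.

    typedef enum { BUILD_TIC, BUILD_SS_SYNC, BUILD_SS_PRESYNC, BUILD_SS_COLDSTART } build_t;

    build_t select_message(bool last_sap, bool prev_was_sync,
                           bool operating, bool warm_start,
                           uint16_t nominal_count, uint16_t actual_count,
                           uint16_t *event_counter_load)
    {
        if (!last_sap) {                       /* blocks 1076-1082: TIC message   */
            *event_counter_load = nominal_count;
            return BUILD_TIC;                  /* suppressed if not operating     */
        }
        if (prev_was_sync) {                   /* next System State msg: pre-sync */
            *event_counter_load = nominal_count;              /* block 1097 */
            return (!operating && !warm_start) ? BUILD_SS_COLDSTART   /* block 1081 */
                                               : BUILD_SS_PRESYNC;    /* block 1099 */
        }
        *event_counter_load = actual_count;    /* block 1098: sync message        */
        return BUILD_SS_SYNC;
    }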

The timing sequences for the Task Interactive Consistency messages and the System State messages are shown in Figures 83 and 84, respectively. Referring first to the waveforms shown in Figure 83, the Message Generator's Event Counter 1072 is loaded at the end of the hard error window (HEW) with the nominal HEW to warning count, as indicated by block 1082 in the flow diagram of Figure 82. The Message Generator 998 will then wait until the end of the HEW to warning period and then transmit the first byte of the Task Interactive Consistency message to the Transmitter, as indicated by the sync dat waveform. As previously described with reference to the Transmitter 30, the receipt of this first byte of the Task Interactive Consistency message will initiate the beginning of the Task Interactive Consistency warning period and will also terminate the buffer available (BA) signal, as indicated by the buffer available (BA) waveform in Figure 83. At the end of the Task Interactive Consistency warning period, the Transmitter will initiate the transmission of the first byte to all of the other Nodes in the system. It will also reassert the buffer available signal, causing the Transmitter Interface 960 to send the remaining data bytes to the Transmitter, as indicated by the sync dat and buffer available waveforms. As shown, the last byte transmitted by the Transmitter is the longitudinal redundancy code check byte, the end of which is timed to coincide with the end of the Subatomic period. As discussed relative to Figure 82, when the next message to be sent is a pre-sync System State message, the HEW to warning period is the same as for the Task Interactive Consistency message; however, the Transmitter will substitute the System State warning period for the Task Interactive Consistency warning period and will begin the transmission of the System State message at a point in time earlier than it would have started to transmit the Task Interactive Consistency message, as indicated in Figure 84.
Referring now to Figure 84, there is shown the timing sequence for the transmission of a sync System State message. In the transmission of a sync System State message, the Event Counter 1072 in the Message Generator 998 is loaded with the actual HEW to warning count, as indicated by block 1098 in Figure 82. As previously discussed, the actual HEW to warning count is the sum of the nominal HEW to warning count plus the calculated sync delta.
At the end of the actual HEW to warning count, the Message Generator will transmit the first byte of the sync System State message directly to the Transmitter 30 through the Transmitter Interface 960. The Transmitter then will initiate the transmission of the System State message at the end of the System State message warning period and will reinstate the buffer available signal, as indicated, causing the Transmitter Interface to transmit the remaining bytes of the sync System State message to the Transmitter 30. The transmission of the last byte of the System State message defines the end of the Atomic period. Adding the sync delta to the nominal HEW to warning period corrects the length of the Atomic period so that its ending should coincide with the ends of the Atomic periods generated by the other Nodes in the system, thus establishing point-to-point synchronization with all of the other Nodes.
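A minimal sketch of the Adder 992 correction described above; the names are illustrative and the arithmetic simply restates "actual count equals nominal count plus sync delta".

#include <stdint.h>

/* The sync delta has already been limited to plus or minus the maximum sync
 * delta by the Sync Correction Generator 990, so the Atomic period is only
 * stretched or shrunk by a bounded amount in any one period.                 */
int32_t actual_hew_to_warning(int32_t nominal_hew_to_warning, int32_t sync_delta)
{
    return nominal_hew_to_warning + sync_delta;
}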

The operation of the Synchronizer in a cold start mode is discussed relative to the flow diagram shown in Figures 86 through 89 and the timing diagram shown in Figure 90.
Referring first to Figure 86, the cold start procedure begins by inquiring if the Synchronizer is in the cold start mode, as indicated by decision block 1100. The cold start is indicated by the absence of the warm start flag and the operating flag. If the Synchronizer is not in the cold start mode of operation, it will inquire whether it is in the warm start mode of operation, as indicated by decision block 1102. If the Synchronizer is in the warm start mode of operation, as indicated by the warm start flag being true, the Synchronizer will call the warm start procedure, as indicated by block 1105. Otherwise, the Synchronizer will exit the cold start routine and default to the steady state mode of operation, as indicated by block 1103.


If the Synchronizer is in the cold start mode of operation, the Synchronizer will listen for messages from the other Nodes over the listening period shown in Figure 90. The Synchronizer will then inquire, as indicated by decision block 1104, if the message received is a sync or a pre-sync message, as determined from the function bits contained in the message. If the message is a pre-sync message, the message is time stamped, as indicated by block 1106, and the Pre-Sync Message Counter 980 is incremented, as indicated by block 1108. The In Sync With Detector 994 will then inquire if the time stamp of the received message minus the time stamp of its own message is less than the hard error window, as indicated by decision block 1110. If the difference between the time stamp of the received message and the time stamp of its own message is less than the hard error window, the "in-sync-with" flag corresponding to the Node from which the message was received is set to true, as indicated by block 1112.
Otherwise, if the difference between the time stamp of the received message and the time stamp of its own message is greater than the hard error window, the "in-sync-with" flag in the In-Sync-With Register 996 is set to false, as indicated by block 1114.
Returning now to decision block 1104, if the sync/pre-sync function bit contained in the received message is a sync bit, the Time Stamper will time stamp the received message and set the TS flag to sync and the update flag to true, as indicated by block 1116. The Synchronizer will then increment the Sync Message Counter 982, as indicated by block 1118.
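The test performed by the In Sync With Detector 994 in blocks 1110 through 1114 can be sketched as follows; the array name, the signed time-stamp type and the use of an absolute difference are assumptions made for the example, not the actual register layout.

#include <stdint.h>

#define NUM_NODES 8

static uint8_t isw_register[NUM_NODES];   /* In-Sync-With Register 996, one flag per Node */

void update_in_sync_with(int node, int32_t ts_received, int32_t ts_own,
                         int32_t hard_error_window)
{
    int32_t diff = ts_received - ts_own;
    if (diff < 0)
        diff = -diff;

    if (diff < hard_error_window)
        isw_register[node] = 1;           /* block 1112: in sync with that Node     */
    else
        isw_register[node] = 0;           /* block 1114: not in sync with that Node */
}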

The operation of the Sync Correction Generator 990 and Adder 992 shall be explained with reference to the flow diagram illustrated in Figure 87. The Sync Correction Generator 990 first inquires, as indicated by decision block 1120, if the listening period is done. The listening period during a cold start is equal to a full Atomic period plus the hard error window, as indicated in Figure 90. During this phase of the operation, the Within Hard Error Window and Soft Error Window Checker 976 will not generate any error signals, in response to the Warm Start Flag and the Operating Flag being set to false.


Once the listening period is over, the Sync Correction Generator 990 will inquire if the number of pre-sync counts stored in the Pre-Sync Message Counter 980 is equal to the number of sync counts stored in the Sync Message Counter 982. If the pre-sync count is equal to the sync count, the Sync Correction Generator will set the Subatomic period delta equal to zero and the sync delta equal to zero, as indicated by block 1148. If the pre-sync count is not equal to the sync count, the Sync Correction Generator 990 will then inquire if the pre-sync count is greater than the sync count, as indicated by block 1124. If the pre-sync count is greater than the sync count, the Time Stamp Voter 956 will extract from the Time Stamp Memory all of the time stamps for which the TS flag is set to pre-sync and the update flag is set to true. The Time Stamp Voter 956 will then generate a voted Subatomic period count and a voted sync clock count using the extracted values. The Sync Correction Generator 990 will then subtract its own Subatomic period count from the voted Subatomic period count to generate the SAP delta and will subtract its own sync clock count from the voted sync clock count to generate a sync delta, as indicated by block 1126.
Alternatively, if the sync count is greater than the pre-sync count, the Time Stamp Voter 956 will generate a SAP delta and a sync delta using the time stamps having their TS flag set to sync and the update flag equal to true, as indicated by block 1146.
If the Subatomic period delta is equal to zero, as indicated by decision block 1127, then the Sync Correction Generator 990 will set the Subatomic period delta equal to zero and the sync delta equal to the computed sync delta, as indicated in block 1129.
The Sync Correction Generator 990 will then inquire if the sync delta is greater than the maximum sync delta, as indicated by decision block 1132. If it is, the Sync Correction Generator will set the sync delta equal to the maximum sync delta stored in the Scratch Pad Memory 962, as indicated in Figure 75. If the sync delta is not greater than the maximum sync delta, as determined by decision block 1132, the Sync Correction Generator will inquire if the sync delta is greater than the two's complement of the maximum sync delta, as indicated by decision block 1136. If the sync delta is greater than the two's complement of the maximum sync delta, the Sync Correction Generator 990 will set the sync delta equal to the two's complement of the maximum sync delta, as indicated by block 1138. Otherwise, the sync delta will remain the computed sync delta.

Returning now to decision block 1127, if the Subatomic period delta is not equal to zero, then the Sync Correction Generator 990 will inquire if the Subatomic period delta is greater than zero, as indicated by decision block 1128. If the Subatomic period delta is greater than zero, the Sync Correction Generator will set the Subatomic period delta equal to the Subatomic period delta minus 1 and the sync delta equal to the maximum sync delta, as indicated in block 1130. Otherwise, the Sync Correction Generator will set the Subatomic period delta equal to the Subatomic period delta plus 1 and the sync delta equal to the two's complement of the maximum sync delta, as indicated by block 1144.
Once the Subatomic period delta and the sync delta are determined, the actual Subatomic period per Atomic period count is generated by adding the Subatomic period delta to the nominal Subatomic period per Atomic period count, as indicated in block 1140. The actual HEW to warning period is generated by adding the sync delta to the nominal HEW to warning period in the Adder 992, as indicated by block 1141. The actual Subatomic period per Atomic period count and the actual HEW to warning period count are stored in the Scratch Pad Memory 962, in the locations identified in Figure 75. The final operation of the Sync Correction Generator 990 is to set the Pre-Sync Message Counter 980 and the Sync Message Counter 982 to zero, as indicated in block 1142.
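The Sync Correction Generator 990 flow of blocks 1122 through 1142 is condensed in the sketch below. The voted counts are taken as inputs, standing in for the Time Stamp Voter 956 output; the clamp to the maximum sync delta is written with signed arithmetic rather than the two's-complement comparisons of the hardware; and all structure and variable names are assumptions made for the example.

#include <stdint.h>

typedef struct {
    int32_t sap_delta;     /* correction to the Subatomic periods per Atomic period */
    int32_t sync_delta;    /* correction to the HEW-to-warning count                */
} sync_correction;

sync_correction compute_sync_correction(int pre_sync_count, int sync_count,
                                        int32_t voted_sap_count,  int32_t own_sap_count,
                                        int32_t voted_sync_count, int32_t own_sync_count,
                                        int32_t max_sync_delta)
{
    sync_correction c = { 0, 0 };

    if (pre_sync_count == sync_count)            /* block 1148: no correction        */
        return c;

    /* Blocks 1124, 1126 and 1146: the deltas are the voted counts minus this
     * Node's own counts, the vote having used whichever set of time stamps
     * (pre-sync or sync) was the more numerous.                                */
    c.sap_delta  = voted_sap_count  - own_sap_count;
    c.sync_delta = voted_sync_count - own_sync_count;

    if (c.sap_delta == 0) {                      /* blocks 1127, 1129, 1132-1138     */
        if (c.sync_delta >  max_sync_delta) c.sync_delta =  max_sync_delta;
        if (c.sync_delta < -max_sync_delta) c.sync_delta = -max_sync_delta;
    } else if (c.sap_delta > 0) {                /* block 1130                       */
        c.sap_delta  -= 1;
        c.sync_delta  = max_sync_delta;
    } else {                                     /* block 1144                       */
        c.sap_delta  += 1;
        c.sync_delta  = -max_sync_delta;
    }
    return c;
}

/* Blocks 1140 and 1141: fold the corrections into the working counts that are
 * written back to the Scratch Pad Memory.                                      */
void apply_sync_correction(const sync_correction *c,
                           int32_t nominal_sap_per_ap, int32_t nominal_hew_to_warning,
                           int32_t *actual_sap_per_ap, int32_t *actual_hew_to_warning)
{
    *actual_sap_per_ap     = nominal_sap_per_ap     + c->sap_delta;
    *actual_hew_to_warning = nominal_hew_to_warning + c->sync_delta;
}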
The operation of the Data Handler 968 during the cold start mode of operation is shown in the flow diagram of Figure 88.
As each message is received from the Message Checker Interface 942, the Data Handler inquires if the sync/pre-sync bit is a sync bit, as indicated by decision block 1150. If the sync/pre-sync function bit contained in the message from Node j is a sync bit, the Data Handler, as indicated by block 1152, will store the "in-sync-with" vector of the received message in row j of the In-Sync-With matrix contained in the Scratch Pad Memory, as shown in Figure 75. However, if the sync/pre-sync function bit contained in the message is a pre-sync bit, the In-Sync-With matrix contained in the pre-sync message is stored in the Byzantine Voter Memory 970, as indicated by block 1154.

The determination of a potential operating set of Nodes and the setting of the Operating and Warm Start Flags shall be discussed relative to the flow diagram shown in Figure 89. The Byzantine Voter 954 will wait until the listening period is over, as indicated by decision block 1156, then execute a byzantine vote using the In-Sync-With matrices stored in the Byzantine Voter Memory 970, as indicated by block 1157. Since each Node sends an In-Sync-With matrix which is stored in the Byzantine Voter Memory, these In-Sync-With matrices form a three-dimensional cube of "in-sync-with" vectors, as shown in Figure 94. The Byzantine Voter makes a first vote through the In-Sync-With matrices, as shown by the arrow 1204 in Figure 94, which will reduce the three-dimensional matrix to a two-dimensional matrix, as shown in Figure 95. The Byzantine Voter 954 will then take a second vote in the direction of the arrow 1206, shown in Figure 95, to generate a Byzantine Agreement as to which Nodes are in synchronization with each other. The Byzantine Agreement is then forwarded to the Operating Condition Detector 1000 as a potential operating set (POS), as indicated by block 1158. The Operating Condition Detector 1000 will then compare the number of Nodes in the potential operating set with the minimum number of Nodes required for start up, as indicated by decision block 1160. If the number of Nodes in the potential operating set is less than the minimum start up size, the Operating Condition Detector will set the Warm Start Flag 1002 and the Operating Flag 1004 to false, as indicated by block 1161. However, if the number of Nodes in the potential operating set is greater than the start up size, the Operating Condition Detector 1000 will then determine if its own Node is in the potential operating set, as indicated by block 1162. If the Node is in the potential operating set, the Operating Condition Detector will set the Operating Flag equal to true, as indicated by block 1164, and then send the potential operating set to the Fault Tolerator along with the Operating Flag, as indicated by block 1166. If the Node is not within the potential operating set, the Operating Condition Detector will set the Warm Start Flag 1002 to true, as indicated in block 1168. The setting of the Warm Start Flag to true will switch the operation of the Synchronizer from the cold start mode to the warm start mode, as indicated by block 1105 in Figure 86. The potential operating set and the Operating Flag transmitted to the Fault Tolerator are transferred to the Scheduler 40 and initiate the operation of the Operations Controller.
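The decision made by the Operating Condition Detector 1000 in blocks 1156 through 1168 reduces to the sketch below. The potential operating set is represented as an 8-bit mask, one bit per Node, and min_startup_size, my_node and the two flag variables are illustrative assumptions rather than the actual register names.

#include <stdint.h>

static int operating_flag  = 0;    /* Operating Flag 1004   */
static int warm_start_flag = 0;    /* Warm Start Flag 1002  */

static int count_nodes(uint8_t set)
{
    int n = 0;
    while (set) {
        n += set & 1u;
        set >>= 1;
    }
    return n;
}

void evaluate_potential_operating_set(uint8_t pos, int my_node, int min_startup_size)
{
    if (count_nodes(pos) < min_startup_size) {      /* block 1160 */
        warm_start_flag = 0;                        /* block 1161: stay in cold start */
        operating_flag  = 0;
    } else if (pos & (uint8_t)(1u << my_node)) {    /* block 1162: is this Node in the set? */
        operating_flag = 1;                         /* block 1164 */
        /* block 1166: the POS and the Operating Flag would now be sent to the
         * Fault Tolerator, starting up the rest of the Operations Controller. */
    } else {
        warm_start_flag = 1;                        /* block 1168: switch to warm start */
    }
}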

Figure 90 shows the operation of the Synchronizer during a cold start. At the beginning of the cold start, each Synchronizer will transmit an initial sync System State message in which the "in-sync-with" vector is all zeros. The Time Stamper 972 will then time stamp the System State messages received from all of the other Nodes in the system during the time stamp listening period, which is equal to an Atomic period plus the hard error window interval, as indicated. During this period, the Synchronizer will count the number of Subatomic periods which have elapsed since the initial sync System State message and will send a pre-sync System State message at the appropriate time. In this first pre-sync message the In-Sync-With matrix is all zeros, since the Synchronizer has not received any in-sync-with vectors from the other Nodes at this point in time. At the end of the hard error window following the transmission of the pre-sync System State message, the Synchronizer will process the received time stamps, generate the required SAP delta and sync delta, and adjust the interval between the pre-sync message and the next sync message.
The Synchronizer will also compare its time stamp with the voted time stamp and determine which Nodes it is in synchronization with. At the end of the adjusted interval, the Synchronizer will again transmit a sync message which contains the generated "in-sync-with" vector. During the interval from the preceding pre-sync message to the following pre-sync message, the Synchronizer will collect and store the "in-sync-with" vectors received from the other Nodes in the Scratch Pad Memory and assemble an "In-Sync-With" matrix in the Scratch Pad Memory.
The Synchronizer will then count the nominal number of Subatomic periods per Atomic period and will generate the special "cold-start" pre-sync System State message which contains the In-Sync-With matrix assembled in the Scratch Pad Memory. During the listening period preceding the sending of the second pre-sync System State message, the Synchronizer will time stamp all of the sync messages received from the other Nodes. In the hard error window interval on either side of the transmission of the second pre-sync System State message, the Synchronizer will collect the In-Sync-With matrices transmitted by the other Nodes and store them in the Byzantine Voter Memory 970. After the end of the hard error window, the Synchronizer will compute the sync correction for the interval between the pre-sync and the next sync message to effect synchronization between the Nodes. It will then determine its own "in-sync-with" vector and perform a byzantine vote on the In-Sync-With matrices stored in the Byzantine Voter Memory.

During this processing interval immediately following the end of the HEW interval, the Synchronizer will also test to determine whether a potential operating set exists and whether or not its own Node is included in the potential operating set.
At the end of the adjusted synchronization interval, the Synchronizer will once again transmit a sync System State message which will include its own "in-sync-with" vector. It will also assemble a new "in-sync-with" matrix from the "in-sync-with" vectors generated by the other Nodes between the second and the third pre-sync System State messages. This process is repeated until a potential operating set is determined by the result of the byzantine vote on the In-Sync-With matrices stored in the Byzantine Voter Memory.

The operation of the Synchronizer during a warm start will be discussed relative to the flow diagram shown in Figure 91 and the timing diagram shown in Figure 92. During the warm start, the Synchronizer recognizes the existence of a potential operating set, and its main function is to establish synchronization with that operating set.

Referring now to Figure 91, the warm start begins with the detection of the warm start flag, as indicated by decision block 1170. If the warm start flag is true, the Time Stamper will time stamp each received message, as indicated by block 1172. The In Sync With Detector 994 will then determine if it is "in-sync-with" any of the other Nodes, as indicated by decision block 1174, in which the difference between the Node's own time stamp and the time stamp for each received message is compared with the hard error window. If the difference between the Node's own time stamp and the time stamp of the received message is less than the hard error window interval, the "in-sync-with" flag in the ISW Register 996 is set to true for each Node for which this occurs, as indicated by block 1176. If the difference between its own time stamp and the time stamp of the received message is greater than the hard error window interval, the ISW Detector 994 will set the "in-sync-with" bit for that particular Node stored in the ISW Register 996 to false, as indicated by block 1178.

During the warm start, the Synchronizer will time stamp all of the System State messages received during a listening period which is equal to one Atomic period plus the hard error window interval. This is the same listening period used during a cold start. When the listening period is done, as indicated by decision block 1180, the Synchronizer will compute the sync correction which will adjust the length of the Atomic period between the pre-sync and the next sync System State message, as indicated by block 1184. The computation of this correction is the same as the computation used during a cold start. If the Operating Condition Detector 1000 concludes that its own Node is in synchronization with the existing operating set of Nodes, the Operating Condition Detector will set the operating flag equal to true and the warm start flag equal to false, as indicated by block 1188, and then it will send the "in-sync-with" vector and the operating flag to the Fault Tolerator 36, as indicated by block 1190. The Fault Tolerator 36 will use this in-sync-with vector as its initial system state vector during the subsequent start up operations.

Referring now to the timing diagram shown in Figure 92, during the warm start period the Synchronizer will only transmit sync and pre-sync System State messages in an alternating sequence. In the processing interval following the hard error window associated with each sync and pre-sync System State message, the Synchronizer will compute the sync correction to adjust the length of the Atomic period following the pre-sync message to effect synchronization with the existing operating set. It will also generate its own local "in-sync-with" vector during the same processing interval and test this "in-sync-with" vector to determine if its own Node is in synchronization with the operating set.
If its own Node is in synchronization with the existing operating set, the Synchronizer will then go to the operating state and will exit the warm start state. As shown in Figure 92, this process is repeated until the Synchronizer is in sync with the existing operating set.
The Synchronizer also performs a byzantine vote on the task completed vector and the branch condition bits, as previously described with reference to the Scheduler 40. The task completed vector and the branch condition bits are embodied as separate bytes in the Task Interactive Consistency and the System State messages and are stored in the Message Memory 964.

Referring now to Figure 93, at the end of each hard error window, as indicated by block 1192, the Synchronizer will transfer the task completed vectors from the Message Memory 964 to the Byzantine Voter Memory 970, as indicated by block 1194. After all the task completed vectors are transferred to the Byzantine Voter Memory, the Byzantine Voter 954 will execute a byzantine vote on all of the transferred task completed vectors and generate a voted task completed (TC) vector, as indicated by block 1196. The Synchronizer will then transfer the branch condition bits to the Byzantine Voter Memory 970, as indicated by block 1198, then execute a byzantine vote to generate voted branch condition bits, as indicated by block 1200. The Byzantine Voter 954 will then pass the voted task completed vector and the voted branch condition bits to the Scheduler, as indicated by block 1202. This assures that the Scheduler in each Node will record the task completed in a consistent fault tolerant manner.
BYZANTINE VOTER

The function of the Byzantine Voter is to guarantee consistency among the Nodes in reaching agreement on certain critical matters. The reliability of a distributed fault-tolerant system depends on the ability of all the non-faulty Nodes to reach a consistent agreement despite the presence of one or more faulty Nodes. Since all failure modes of the faulty Node cannot be enumerated, any mechanism for achieving agreement must be provably correct in the presence of arbitrary failures.

The problem of reaching agreement was originally expounded by analogy to the several divisions of the Byzantine army encamped around an enemy city, as described by Lamport, L., Shostak, R., and Pease, M., "The Byzantine Generals Problem," ACM TOPLAS, Volume 4, Number 3, July 1982, and "Reaching Agreement in the Presence of Faults," JACM, Volume 27, No. 2, April 1980.
In the Byzantine army analogy, each division is commanded by a general (Node) which can communicate with the other generals via messengers (communication links). The generals need to reach a consistent decision about whether to attack or retreat. Some of the generals may be traitors who will attempt to confuse the other generals. Since all possible failure modes must be considered, a traitorous general is permitted to lie, to send different messages to different generals, to tamper with relayed messages, to act in collusion with other traitors, or otherwise to appear to act in a pernicious manner.

The system state which guarantees system consistency is referred to as a Byzantine Agreement, and is defined by two conditions:


1. Agreement: All loyal generals agree on the contents of every message sent.
2. Validity: If the sending general is loyal, then all loyal receiving generals agree on the content of his messages as originally sent.

These Agreement conditions embody three important concepts. First, if the sending general is a traitor, the specific decision made by the loyal generals is immaterial provided they all make the same decision. Second, reaching agreement does not require identification of the traitors. Third, no assumptions have been made restricting the traitor's behavior.
In order to guarantee Byzantine Agreement regarding a given message, one or more synchronous rounds of transmission are required. During each round, every general broadcasts a copy of every message received during the previous round. Agreement can be guaranteed in the presence of one traitor if there are at least four generals and two rounds of messages are transmitted.
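For reference, the general form of this requirement, taken from the Lamport, Shostak and Pease results cited above rather than from the flow diagrams of this embodiment, may be stated as follows: to tolerate m traitors,

n \ge 3m + 1 \quad \text{generals and} \quad m + 1 \quad \text{rounds of message exchange}

are required; the case described in the text is m = 1, which gives at least four generals and two rounds.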

For numerical data, it is also possible to define a state of approximate agreement as meeting two similar conditions.
1. Agreement: All non-faulty Nodes eventually agree on values that are within some small difference of each other.

2. Validity: The voted value obtained by each non-faulty Node must be within the range of the initial values generated by the non-faulty Nodes.
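Restated symbolically, with v_i denoting the voted value adopted by non-faulty Node i, x_k ranging over the initial values generated by the non-faulty Nodes, and \epsilon the permitted difference:

\text{Agreement:}\quad |v_i - v_j| \le \epsilon \quad \text{for all non-faulty Nodes } i, j

\text{Validity:}\quad \min_k x_k \;\le\; v_i \;\le\; \max_k x_k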
The details of the Byzantine Voter are shown in Figure 96 and will be discussed with reference to the "in-sync-with" matrices and vectors shown in Figures 94 and 95, respectively. It is to be recognized that the Byzantine Voter discussed here is not limited to voting on the "in-sync-with" vectors, the task completed vectors, or the binary bits as applied in the instant application.

As previously discussed, each Synchronizer will generate its own "in-sync-with" vector which is transmitted to all of the other Nodes in the System State messages. Each Node will store the "in-sync-with" vectors received from all of the other Nodes in the Scratch Pad Memory 962 to form an "in-sync-with" matrix, as shown in Figure 75. During the cold start mode of operation this "in-sync-with" matrix, as shown in Figure 85, is transmitted with each pre-sync System State message to all of the other Nodes in the system. Each Synchronizer will then store each of these "in-sync-with" matrices in the Byzantine Voter Memory to form a three-dimensional cube, as shown in Figure 94. This constitutes the two rounds of transmission required for a Byzantine Agreement.
The Byzantine Voter will first vote on the value of each "in-sync-with" bit of the matrix longitudinally through the matrix, as indicated by the direction of the arrow 1204 in Figure 94. The first vote will reduce the three-dimensional cube to a two-dimensional matrix, as shown in Figure 95, where each "in-sync-with" bit is the voted value of the first vote. The Byzantine Voter 954 will then vote on the values of the "in-sync-with" bits in each column of the "in-sync-with" matrix shown in Figure 95. The direction of the second vote by the Byzantine Voter is indicated by arrow 1206. The result of the second vote will be a Byzantine Agreement on the individual "in-sync-with" bits for each Node in the system, which is transmitted to the Operating Condition Detector 1000, as shown in Figure 77. The circuit details of the Byzantine Voter are shown in Figure 96.
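The two votes just described can be sketched in software as follows, with the 8 x 8 x 8 cube of "in-sync-with" bits indexed as isw[k][i][j], Node i's opinion of Node j as relayed in the matrix received from Node k. A plain bitwise majority stands in for the hardware voter, the exclusion of each Node's bit about itself (performed by the Decoder 1228 described below) is omitted for brevity, and the array layout and names are assumptions made for the example.

#include <stdint.h>

#define NUM_NODES 8

static int majority(int ones, int total)
{
    return (2 * ones) > total;     /* strict majority of the contributions */
}

uint8_t byzantine_vote_in_sync_with(const uint8_t isw[NUM_NODES][NUM_NODES][NUM_NODES])
{
    uint8_t matrix[NUM_NODES][NUM_NODES];
    uint8_t agreed = 0;

    /* First vote (arrow 1204): collapse the cube into one two-dimensional
     * matrix by voting across the copies relayed by the eight Nodes.        */
    for (int i = 0; i < NUM_NODES; i++) {
        for (int j = 0; j < NUM_NODES; j++) {
            int ones = 0;
            for (int k = 0; k < NUM_NODES; k++)
                ones += isw[k][i][j];
            matrix[i][j] = (uint8_t)majority(ones, NUM_NODES);
        }
    }

    /* Second vote (arrow 1206): vote down each column to obtain one agreed
     * "in-sync-with" bit per Node.                                          */
    for (int j = 0; j < NUM_NODES; j++) {
        int ones = 0;
        for (int i = 0; i < NUM_NODES; i++)
            ones += matrix[i][j];
        if (majority(ones, NUM_NODES))
            agreed |= (uint8_t)(1u << j);
    }
    return agreed;
}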

Referring now to Figure 96, the Data Handler 968 will load the data to be voted on by the Byzantine Voter into the Byzantine Voter Memory 970. A Byzantine Voter Control Logic 1230 will activate an Address Generator 1210 in response to the message type and the operating flags. As previously discussed, the Byzantine Voter will vote on the task completed and branch condition vectors contained in the Task Interactive Consistency messages and on the "in-sync-with" matrices contained in the pre-sync System State messages during the cold start mode of operation.


The Address Generator 1210 will address the Byzantine Voter Memory in the appropriate manner and store the addressed data in a Data Register 1208. Each bit in the Data Register 1208 is applied to one input of a plurality of AND gates 1212 through 1226. Each AND gate receives a respective one of the data bits stored in the Data Register 1208. A Decoder 1228, responsive to the addresses being generated by the Address Generator, will selectively deactivate the one of the AND gates which corresponds to the "in-sync-with" bit generated by each Node with respect to itself, as indicated by the X's in the blocks shown in Figure 95.

A Message Counter 1284 monitors the number of vectors or matrices loaded into the Byzantine Voter Memory 970 and generates a two's complement value corresponding to one half of the number of vectors or matrices loaded into the Byzantine Voter Memory 970.
This value is loaded into a plurality of Accumulators 1264 through 1278 through a like plurality of 2:1 Multiplexers 1232 through 1246 and Adders 1248 through 1262. Under the control of the Byzantine Voter Control Logic, the 2:1 Multiplexers 1232 through 1246 are then switched to the outputs of the AND gates 1212 through 1226 and the content stored in the Data Register is added to the amounts stored in the Accumulators. The Byzantine Voter Control Logic will then load the data from the Byzantine Voter Memory into the Data Register 1208 in accordance with the message type and the operating flags. For example, if the byzantine vote is being taken among the "in-sync-with" matrices generated during the cold start, the Address Generator 1210 will sequentially load the in-sync-with vector for Node 0 from the matrix transmitted by Node 0 and then sequentially from the matrices transmitted by Node 1 through Node 7. During the adding of the bit value in each of the Adders 1248 through 1262 to the amount stored in the Accumulators 1264 through 1278, an overflow bit will be generated when the sum is greater than 0. Any overflow bits generated during the addition process will be stored in a Byzantine Voted Value Register 1280. After the data from the matrix from Node 7 is processed, the content of the Byzantine Voted Value Register is passed to a 1:3 Multiplexer 1282 which stores this data in the Byzantine Voter Memory 970 to form the two-dimensional matrix, such as shown in Figure 95.
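The majority decision taken by each Accumulator can be modelled as below: the accumulator is preloaded with the two's complement of half the number of contributions, the bit values are summed in, and the bit is voted true when the total climbs above zero, which corresponds to the overflow condition described above. For eight contributions the preload is -4, so five or more ones are needed. Names and widths are illustrative only.

#include <stdint.h>

int accumulator_voted_bit(const uint8_t *bits, int count)
{
    int32_t acc = -(count / 2);     /* two's-complement preload from the Message Counter 1284 */

    for (int k = 0; k < count; k++)
        acc += bits[k];             /* Adders 1248 through 1262 add each bit value            */

    return acc > 0;                 /* "overflow": the sum is greater than zero               */
}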

The Address Generator will then index and process the "in-sync-with" vectors for Node 1 from the matrices from Node 0 through Node 7, as previously done with regard to the "in-sync-with" vectors of Node 0. First the Message Counter 1284 will load the Accumulators with a two's complement value corresponding to one half of the number of matrices that will be processed. The Address Generator will then load the "in-sync-with" vectors generated by Node 1 taken from the matrices received from Node 0 through Node 7, as previously described. Again, the overflow bits, signifying that the results of the addition exceed zero, are stored in the Byzantine Voted Value Register 1280, which is again stored in the Byzantine Voter Memory 970 through the Multiplexer 1282. This process is repeated for the "in-sync-with" vector generated by each of the Nodes and is terminated when the vectors from Node 7 are completely processed and all the Byzantine Voted Values are stored back into the Byzantine Voter Memory 970, forming the two-dimensional matrix shown in Figure 95.

After the first vote has been completed on all of the "in-sync-with" vectors stored in the "in-sync-with" matrices from all of the Nodes, the Byzantine Voter Control Logic 1230 will initiate the second vote, in which the voting is taken down the columns, as indicated by arrow 1206 in Figure 95. During the second vote the Address Generator 1210 will load the column for Node 0 into the Data Register 1208. The Message Counter again will load the two's complement corresponding to one half the number of bits to be processed by the Byzantine Voter into the Accumulators 1264 through 1278. The Adders will then add the bits stored in the Data Register to the values stored in the Accumulators 1264 through 1278. This process is repeated until the columns for all of the Nodes have been processed. Again, the overflow bits from the Adders 1248 through 1262 are stored in the Byzantine Voted Value Register 1280. The Byzantine Voter Control Logic 1230 will then activate the 1:3 Multiplexer to pass the "in-sync-with" vector stored in the Byzantine Voted Value Register to the Operating Condition Detector 1000, as previously described. This "in-sync-with" vector represents a Byzantine Agreement on which Nodes are in synchronization with each other.

When the Byzantine Voter is voting on the task completed vectors and the branch condition bits contained in the Task Interactive Consistency and System State messages, the Data Handler will load these values into the Byzantine Voter Memory 970. The Byzantine Voter Control Logic 1230 will then activate the Address Generator 1210 to load the columns of the task completed vectors into the Data Register 1208, as previously described with reference to the second vote on the "in-sync-with" vectors. The voting process is then identical to the voting process for the second vote on the "in-sync-with" vectors, and the voted value is loaded into the Byzantine Voted Value Register from the overflow outputs of the Adders 1248 through 1262. The Byzantine Voter Control Logic 1230 will then activate the 1:3 Multiplexer 1282 to pass the voted task completed vector and the voted branch condition bits to the Scheduler Interface 944, as previously described.

The Operations Controller and the subsystems discussed herein represent a distributed multi-computer fault-tolerant architecture based on the functional and physical partitioning of the application tasks and the overhead functions. It is not intended that the invention be limited to the structures illustrated and discussed herein. It is known that those skilled in the art are capable of making changes and improvements within the spirit of this invention as described above and set forth in the appended claims.
What is claimed is:

Claims (5)

1. In a multiple node fault tolerant processing system capable of processing a set of application tasks in which each node (10) has an applications processor (14) for executing a predetermined subset of said set of application tasks and an operations controller (12) for controlling the operation of the node (10) and scheduling the order in which the individual tasks in said predetermined subset of tasks are to be executed by the applications processor (14) through the exchange of inter-node messages containing data and operation information with all of the other nodes (10) in the processing system, the operations controller (12) generating at least two timing period intervals, a fundamental timing period and a master period which is an integer multiple of the fundamental timing period, the master period defining the timing interval during which every task in said predetermined subset of tasks is scheduled for execution by the applications processor at least once, a task scheduler (40) characterized by:
a task activity list (444) containing an entry for each active task in said multiple node processing system, each entry containing an execution periodicity and a node allocation for that task;
a priority scan list (446) containing a selected portion of the active tasks in the task activity list (444) which are available for execution, said selected portion of said active tasks being stored in their preferred order of execution;
a completion status list (438) storing the same selected portion of said active tasks stored in said priority scan list (446);
a selection queue (450) storing for each node the active tasks ready for selection in their preferred order of execution;
a period counter (442) for counting said fundamental timing periods to generate a period count corresponding to the number of fundamental periods which have expired since the beginning of a new master period;
wake-up sequencer means (440)for interrogating said task activity list (444) to transfer to said priority scan list (446) and said completion status list (438) all of the active tasks whose periodicity is greater than said period count;

priority scan means (448) for transferring to said selection queue (450) for each node entry the highest three priority active tasks which are ready for execution by that node;
task selector means (452) for selecting the highest priority active task currently stored in said selection queue for its own node as the next task scheduled for execution by its own applications processor; and a task interactive consistency handler (436) for updating the status of each task in said task activity list (444), said priority scan list (446), said completion status list (438) and said selection queue (450) which are identified in inter-node messages reporting the completion of a task.
2. The task scheduler (40) of Claim 1 wherein each task entry of said completion status list (438) has a completion count entry storing the 2's complement of the number of nodes which are scheduled to execute that task, said task interactive consistency handler (436) having means for incrementing said completion count in response to inter-node messages identifying which node completed that task and for setting a terminated flag when said completion count is incremented to zero, indicating that the task has been executed by all of the nodes (10) scheduled to execute that task.
3. The task scheduler (40) of Claim 1 wherein each task entry of said task activity list (444) and said priority scan list (446) has a predecessor count entry indicative of the number of tasks which must be terminated before it can be executed, said task interactive consistency handler (436) having a successor list storing the identity of all the tasks for which the terminated task is a predecessor and means responsive to the termination of a task for accessing said successor list to identify each task for which the terminated task is a predecessor and for decrementing the predecessor count in said task activity list (444) and said priority scan list (446) for each of said identified tasks.
4. The task scheduler (40) of Claim 3 further having an "old task" list (458) storing for each node (10) the task currently being executed by that node, said task interactive consistency handler (436) having means for recording as "used" in said selection queue (450) the highest priority task currently stored for each node (10) which reported it has started a new task and for recording the identity of said highest priority task in said "old task" list (458).
5. The task scheduler (40) of Claim 1 in which the inter-node messages exchanged between the nodes (10) include a task completed/started message and a task interactive consistency message, said task completed/started message is sent to all of the other nodes (10) whenever a node begins a new task, said task completed/started message containing at least the identity of the task started and the identity of the task completed by that node, and said task interactive consistency message is sent at predetermined timing intervals and contains a task completed vector identifying each node (10) which sent a task completed/started message, said task completed vector being a voted composite of the task completed/started messages received from all the nodes (10) in said predetermined timing interval, said task scheduler (40) having a started task register (434) for storing the identity of the task reported as started in said task completed/started messages received from that node (10) and said task interactive consistency handler (436) responsive to the task completed vector contained in said task interactive consistency message to compare the identity of the highest priority task stored for each node (10) identified as having completed a task in said task completed vector with the identity of the task stored in said started task register (434) and to generate a sequence error signal when they are not the same.
CA000564342A 1987-04-15 1988-04-15 Operations controller for a fault tolerant multiple node processing system Expired - Lifetime CA1301938C (en)

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US038,818 1987-04-15
US07/038,813 US4914657A (en) 1987-04-15 1987-04-15 Operations controller for a fault tolerant multiple node processing system
US07/038,818 US4816989A (en) 1987-04-15 1987-04-15 Synchronizer for a fault tolerant multiple node processing system
US038,813 1987-04-15
US07/039,190 US4805107A (en) 1987-04-15 1987-04-15 Task scheduler for a fault tolerant multiple node processing system
US039,190 1987-04-15

Publications (1)

Publication Number Publication Date
CA1301938C true CA1301938C (en) 1992-05-26

Family

ID=27365455

Family Applications (1)

Application Number Title Priority Date Filing Date
CA000564342A Expired - Lifetime CA1301938C (en) 1987-04-15 1988-04-15 Operations controller for a fault tolerant multiple node processing system

Country Status (5)

Country Link
US (6) US4914657A (en)
EP (1) EP0356460A4 (en)
JP (1) JPH02503122A (en)
CA (1) CA1301938C (en)
WO (1) WO1988008161A1 (en)

US6446165B1 (en) 1999-07-30 2002-09-03 International Business Machines Corporation Address dependent caching behavior within a data processing system having HSA (hashed storage architecture)
US6470442B1 (en) 1999-07-30 2002-10-22 International Business Machines Corporation Processor assigning data to hardware partition based on selectable hash of data address
US6801951B1 (en) 1999-10-08 2004-10-05 Honeywell International Inc. System and method for fault-tolerant clock synchronization using interactive convergence
EP1912124B8 (en) * 1999-10-14 2013-01-09 Bluearc UK Limited Apparatus and system for implementation of service functions
US6473660B1 (en) 1999-12-03 2002-10-29 The Foxboro Company Process control system and method with automatic fault avoidance
US6502141B1 (en) * 1999-12-14 2002-12-31 International Business Machines Corporation Method and system for approximate, monotonic time synchronization for a multiple node NUMA system
US6694362B1 (en) * 2000-01-03 2004-02-17 Micromuse Inc. Method and system for network event impact analysis and correlation with network administrators, management policies and procedures
US7099855B1 (en) 2000-01-13 2006-08-29 International Business Machines Corporation System and method for electronic communication management
US6779128B1 (en) 2000-02-18 2004-08-17 Invensys Systems, Inc. Fault-tolerant data transfer
US6408277B1 (en) * 2000-06-21 2002-06-18 Banter Limited System and method for automatic task prioritization
US8290768B1 (en) 2000-06-21 2012-10-16 International Business Machines Corporation System and method for determining a set of attributes based on content of communications
US9699129B1 (en) 2000-06-21 2017-07-04 International Business Machines Corporation System and method for increasing email productivity
US6801202B2 (en) * 2000-06-29 2004-10-05 Sun Microsystems, Inc. Graphics system configured to parallel-process graphics data using multiple pipelines
JP2002082815A (en) * 2000-09-07 2002-03-22 Oki Electric Ind Co Ltd Task program control system
US6963887B2 (en) * 2000-09-22 2005-11-08 Sarnoff Corporation Method and device for performing data pattern matching
US20050157654A1 (en) * 2000-10-12 2005-07-21 Farrell Craig A. Apparatus and method for automated discovery and monitoring of relationships between network elements
US7383191B1 (en) 2000-11-28 2008-06-03 International Business Machines Corporation Method and system for predicting causes of network service outages using time domain correlation
US20020120874A1 (en) * 2000-12-22 2002-08-29 Li Shu Method and system for secure exchange of messages
DE10065117A1 (en) * 2000-12-28 2002-07-04 Bosch Gmbh Robert Method and communication system for exchanging data between at least two participants via a bus system
US7644057B2 (en) * 2001-01-03 2010-01-05 International Business Machines Corporation System and method for electronic communication management
GB0102220D0 (en) * 2001-01-27 2001-03-14 Galleria Software Developement Databases
DE10109558C1 (en) * 2001-02-28 2003-01-30 Siemens Ag Additional circuit on the receiver side for the boundary scan during data transmission with differential signals
US6966015B2 (en) * 2001-03-22 2005-11-15 Micromuse, Ltd. Method and system for reducing false alarms in network fault management systems
US6526491B2 (en) * 2001-03-22 2003-02-25 Sony Corporation Entertainment Inc. Memory protection system and method for computer architecture for broadband networks
US6744739B2 (en) * 2001-05-18 2004-06-01 Micromuse Inc. Method and system for determining network characteristics using routing protocols
US7043727B2 (en) * 2001-06-08 2006-05-09 Micromuse Ltd. Method and system for efficient distribution of network event data
US6633856B2 (en) * 2001-06-15 2003-10-14 Flarion Technologies, Inc. Methods and apparatus for decoding LDPC codes
US7673223B2 (en) * 2001-06-15 2010-03-02 Qualcomm Incorporated Node processors for use in parity check decoders
US7516208B1 (en) 2001-07-20 2009-04-07 International Business Machines Corporation Event database management method and system for network event reporting system
US20030023922A1 (en) * 2001-07-25 2003-01-30 Davis James A. Fault tolerant magnetoresistive solid-state storage device
US7036068B2 (en) * 2001-07-25 2006-04-25 Hewlett-Packard Development Company, L.P. Error correction coding and decoding in a solid-state storage device
US6981196B2 (en) 2001-07-25 2005-12-27 Hewlett-Packard Development Company, L.P. Data storage method for use in a magnetoresistive solid-state storage device
US20050286685A1 (en) * 2001-08-10 2005-12-29 Nikola Vukovljak System and method for testing multiple dial-up points in a communications network
US6766482B1 (en) 2001-10-31 2004-07-20 Extreme Networks Ethernet automatic protection switching
JP3823044B2 (en) * 2001-10-31 2006-09-20 パナソニック モバイルコミュニケーションズ株式会社 Time stamp value controller
US7512931B2 (en) * 2001-11-13 2009-03-31 National Instruments Corporation Graphical program nodes for implementing a measurement state model
US7171493B2 (en) * 2001-12-19 2007-01-30 The Charles Stark Draper Laboratory Camouflage of network traffic to resist attack
US7363368B2 (en) 2001-12-24 2008-04-22 International Business Machines Corporation System and method for transaction recording and playback
US6560512B1 (en) * 2002-01-04 2003-05-06 Machine Consciousness, Inc. Relational robotic controller
US6996443B2 (en) * 2002-01-11 2006-02-07 Bae Systems Information And Electronic Systems Integration Inc. Reconfigurable digital processing system for space
US7168008B2 (en) * 2002-01-18 2007-01-23 Mobitv, Inc. Method and system for isolating and protecting software components
US7171605B2 (en) * 2002-02-01 2007-01-30 International Business Machines Corporation Check bit free error correction for sleep mode data retention
US7093004B2 (en) * 2002-02-04 2006-08-15 Datasynapse, Inc. Using execution statistics to select tasks for redundant assignment in a distributed computing platform
US6973604B2 (en) 2002-03-08 2005-12-06 Hewlett-Packard Development Company, L.P. Allocation of sparing resources in a magnetoresistive solid-state storage device
US7069442B2 (en) * 2002-03-29 2006-06-27 Intel Corporation System and method for execution of a secured environment initialization instruction
WO2003089995A2 (en) * 2002-04-15 2003-10-30 Invensys Systems, Inc. Methods and apparatus for process, factory-floor, environmental, computer aided manufacturing-based or other control system with real-time data distribution
GB2387683B (en) * 2002-04-19 2007-03-28 Hewlett Packard Co Workflow processing scheduler
US7668899B2 (en) * 2002-05-07 2010-02-23 Alcatel-Lucent Usa Inc. Decoupled routing network method and system
US7174551B2 (en) * 2002-05-20 2007-02-06 International Business Machines Corporation Multiple task wait system for use in a data warehouse environment
US7440462B2 (en) * 2002-05-23 2008-10-21 Motorola, Inc. Quality of service (QOS) control mechanisms using mediation devices in an asynchronous network
US7492773B2 (en) * 2002-05-23 2009-02-17 Motorola, Inc. Media access control and distributed data processing using mediation devices in an asynchronous network
US7107283B2 (en) * 2002-05-28 2006-09-12 Kabushiki Kaisha Toshiba Relational job queue
US7154886B2 (en) 2002-07-22 2006-12-26 Qlogic Corporation Method and system for primary blade selection in a multi-module fiber channel switch
US20040133642A1 (en) * 2002-07-23 2004-07-08 Vazquez Pedro A. Server and application programming interface for distributed rendezvous
US20040024954A1 (en) * 2002-07-30 2004-02-05 Rust Robert A. Time stamp management system for disk arrays
US7334046B1 (en) 2002-08-05 2008-02-19 Qlogic, Corporation System and method for optimizing frame routing in a network
US7397768B1 (en) 2002-09-11 2008-07-08 Qlogic, Corporation Zone management in a multi-module fibre channel switch
US7362717B1 (en) 2002-10-03 2008-04-22 Qlogic, Corporation Method and system for using distributed name servers in multi-module fibre channel switches
US7089484B2 (en) * 2002-10-21 2006-08-08 International Business Machines Corporation Dynamic sparing during normal computer system operation
US7127446B1 (en) * 2002-10-30 2006-10-24 Advanced Micro Devices, Inc. File system based task queue management
US7533382B2 (en) * 2002-10-30 2009-05-12 Stmicroelectronics, Inc. Hyperprocessor
US7707351B2 (en) * 2002-10-31 2010-04-27 Ring Technology Enterprises Of Texas, Llc Methods and systems for an identifier-based memory section
US7415565B2 (en) * 2002-10-31 2008-08-19 Ring Technology Enterprises, Llc Methods and systems for a storage system with a program-controlled switch for routing data
US7197662B2 (en) * 2002-10-31 2007-03-27 Ring Technology Enterprises, Llc Methods and systems for a storage system
AU2003286638A1 (en) * 2002-10-31 2004-06-07 Ring Technology Enterprises, Llc Methods and systems for a storage system
US6879526B2 (en) * 2002-10-31 2005-04-12 Ring Technology Enterprises Llc Methods and apparatus for improved memory access
US7243264B2 (en) * 2002-11-01 2007-07-10 Sonics, Inc. Method and apparatus for error handling in networks
US7457822B1 (en) * 2002-11-01 2008-11-25 Bluearc Uk Limited Apparatus and method for hardware-based file system
US8041735B1 (en) 2002-11-01 2011-10-18 Bluearc Uk Limited Distributed file system and method
US7319669B1 (en) 2002-11-22 2008-01-15 Qlogic, Corporation Method and system for controlling packet flow in networks
US7149923B1 (en) * 2003-01-17 2006-12-12 Unisys Corporation Software control using the controller as a component to achieve resiliency in a computer system utilizing separate servers for redundancy
US7389507B2 (en) * 2003-02-10 2008-06-17 Tandberg Data Corporation Operating-system-independent modular programming method for robust just-in-time response to multiple asynchronous data streams
US6957375B2 (en) * 2003-02-26 2005-10-18 Flarion Technologies, Inc. Method and apparatus for performing low-density parity-check (LDPC) code operations using a multi-level permutation
JP2004280558A (en) * 2003-03-17 2004-10-07 Ricoh Co Ltd Interface circuit and optical disk device provided with interface circuit
US7389230B1 (en) 2003-04-22 2008-06-17 International Business Machines Corporation System and method for classification of voice signals
US20050187913A1 (en) * 2003-05-06 2005-08-25 Yoram Nelken Web-based customer service interface
US8495002B2 (en) * 2003-05-06 2013-07-23 International Business Machines Corporation Software tool for training and testing a knowledge base
US20040237089A1 (en) * 2003-05-20 2004-11-25 Teh Jin Teik Separation of data and instruction for improving system performance in a multiple process environment
JP4296050B2 (en) * 2003-07-14 2009-07-15 三菱電機株式会社 Electronic control unit
US7406628B2 (en) * 2003-07-15 2008-07-29 Seagate Technology Llc Simulated error injection system in target device for testing host system
US7453802B2 (en) 2003-07-16 2008-11-18 Qlogic, Corporation Method and apparatus for detecting and removing orphaned primitives in a fibre channel network
US7355966B2 (en) 2003-07-16 2008-04-08 Qlogic, Corporation Method and system for minimizing disruption in common-access networks
US7620059B2 (en) 2003-07-16 2009-11-17 Qlogic, Corporation Method and apparatus for accelerating receive-modify-send frames in a fibre channel network
US7525910B2 (en) 2003-07-16 2009-04-28 Qlogic, Corporation Method and system for non-disruptive data capture in networks
US7388843B2 (en) 2003-07-16 2008-06-17 Qlogic, Corporation Method and apparatus for testing loop pathway integrity in a fibre channel arbitrated loop
US7471635B2 (en) 2003-07-16 2008-12-30 Qlogic, Corporation Method and apparatus for test pattern generation
US7463646B2 (en) 2003-07-16 2008-12-09 Qlogic Corporation Method and system for fibre channel arbitrated loop acceleration
US7573909B2 (en) 2003-07-21 2009-08-11 Qlogic, Corporation Method and system for programmable data dependant network routing
US7684401B2 (en) 2003-07-21 2010-03-23 Qlogic, Corporation Method and system for using extended fabric features with fibre channel switch elements
US7792115B2 (en) 2003-07-21 2010-09-07 Qlogic, Corporation Method and system for routing and filtering network data packets in fibre channel systems
US7580354B2 (en) 2003-07-21 2009-08-25 Qlogic, Corporation Multi-speed cut through operation in fibre channel switches
US7447224B2 (en) 2003-07-21 2008-11-04 Qlogic, Corporation Method and system for routing fibre channel frames
US7583597B2 (en) 2003-07-21 2009-09-01 Qlogic Corporation Method and system for improving bandwidth and reducing idles in fibre channel switches
US7894348B2 (en) 2003-07-21 2011-02-22 Qlogic, Corporation Method and system for congestion control in a fibre channel switch
US7420982B2 (en) 2003-07-21 2008-09-02 Qlogic, Corporation Method and system for keeping a fibre channel arbitrated loop open during frame gaps
US7512067B2 (en) 2003-07-21 2009-03-31 Qlogic, Corporation Method and system for congestion control based on optimum bandwidth allocation in a fibre channel switch
US7522529B2 (en) 2003-07-21 2009-04-21 Qlogic, Corporation Method and system for detecting congestion and over subscription in a fibre channel network
US7430175B2 (en) * 2003-07-21 2008-09-30 Qlogic, Corporation Method and system for managing traffic in fibre channel systems
US7406092B2 (en) 2003-07-21 2008-07-29 Qlogic, Corporation Programmable pseudo virtual lanes for fibre channel systems
US7525983B2 (en) 2003-07-21 2009-04-28 Qlogic, Corporation Method and system for selecting virtual lanes in fibre channel switches
US7466700B2 (en) 2003-07-21 2008-12-16 Qlogic, Corporation LUN based hard zoning in fibre channel switches
US7477655B2 (en) 2003-07-21 2009-01-13 Qlogic, Corporation Method and system for power control of fibre channel switches
US7558281B2 (en) 2003-07-21 2009-07-07 Qlogic, Corporation Method and system for configuring fibre channel ports
US7522522B2 (en) 2003-07-21 2009-04-21 Qlogic, Corporation Method and system for reducing latency and congestion in fibre channel switches
US7646767B2 (en) 2003-07-21 2010-01-12 Qlogic, Corporation Method and system for programmable data dependant network routing
US7630384B2 (en) 2003-07-21 2009-12-08 Qlogic, Corporation Method and system for distributing credit in fibre channel systems
US8234395B2 (en) 2003-07-28 2012-07-31 Sonos, Inc. System and method for synchronizing operations among a plurality of independently clocked digital data processing devices
US11650784B2 (en) 2003-07-28 2023-05-16 Sonos, Inc. Adjusting volume levels
US11106424B2 (en) 2003-07-28 2021-08-31 Sonos, Inc. Synchronizing operations among a plurality of independently clocked digital data processing devices
US11106425B2 (en) 2003-07-28 2021-08-31 Sonos, Inc. Synchronizing operations among a plurality of independently clocked digital data processing devices
US11294618B2 (en) 2003-07-28 2022-04-05 Sonos, Inc. Media player system
US9207905B2 (en) 2003-07-28 2015-12-08 Sonos, Inc. Method and apparatus for providing synchrony group status information
US8086752B2 (en) 2006-11-22 2011-12-27 Sonos, Inc. Systems and methods for synchronizing operations among a plurality of independently clocked digital data processing devices that independently source digital data
US8290603B1 (en) 2004-06-05 2012-10-16 Sonos, Inc. User interfaces for controlling and manipulating groupings in a multi-zone media system
US20050033625A1 (en) * 2003-08-06 2005-02-10 International Business Machines Corporation Method, apparatus and program storage device for scheduling the performance of maintenance tasks to maintain a system environment
US7852856B2 (en) * 2003-08-29 2010-12-14 Broadcom Corp. System and method for providing pooling or dynamic allocation of connection context data
US7352701B1 (en) 2003-09-19 2008-04-01 Qlogic, Corporation Buffer to buffer credit recovery for in-line fibre channel credit extension devices
US20050120272A1 (en) * 2003-11-13 2005-06-02 Smith Zachary S. Systems and methods for determining bug ownership
US7500152B2 (en) * 2003-12-05 2009-03-03 Freescale Semiconductor, Inc. Apparatus and method for time ordering events in a system having multiple time domains
JP2005184335A (en) * 2003-12-18 2005-07-07 Oki Electric Ind Co Ltd Missynchronization preventing device in wireless communication device
US7039661B1 (en) * 2003-12-29 2006-05-02 Veritas Operating Corporation Coordinated dirty block tracking
US7564789B2 (en) 2004-02-05 2009-07-21 Qlogic, Corporation Method and system for reducing deadlock in fibre channel fabrics using virtual lanes
US7480293B2 (en) 2004-02-05 2009-01-20 Qlogic, Corporation Method and system for preventing deadlock in fibre channel fabrics using frame priorities
US7761923B2 (en) * 2004-03-01 2010-07-20 Invensys Systems, Inc. Process control methods and apparatus for intrusion detection, protection and network hardening
US20050228967A1 (en) * 2004-03-16 2005-10-13 Sony Computer Entertainment Inc. Methods and apparatus for reducing power dissipation in a multi-processor system
US8224639B2 (en) 2004-03-29 2012-07-17 Sony Computer Entertainment Inc. Methods and apparatus for achieving thermal management using processing task scheduling
GB2412755A (en) * 2004-03-30 2005-10-05 Hewlett Packard Development Co Coordination of lifecycle state changes in software components
US20050273657A1 (en) * 2004-04-01 2005-12-08 Hiroshi Ichiki Information processing apparatus and method, and recording medium and program for controlling the same
US9374607B2 (en) 2012-06-26 2016-06-21 Sonos, Inc. Media playback system with guest access
US9977561B2 (en) 2004-04-01 2018-05-22 Sonos, Inc. Systems, methods, apparatus, and articles of manufacture to provide guest access
US7340167B2 (en) 2004-04-23 2008-03-04 Qlogic, Corporation Fibre channel transparent switch for mixed switch fabrics
US7930377B2 (en) 2004-04-23 2011-04-19 Qlogic, Corporation Method and system for using boot servers in networks
GB0411054D0 (en) * 2004-05-18 2004-06-23 Ricardo Uk Ltd Fault tolerant data processing
US8868698B2 (en) 2004-06-05 2014-10-21 Sonos, Inc. Establishing a secure wireless network with minimum human intervention
US8326951B1 (en) 2004-06-05 2012-12-04 Sonos, Inc. Establishing a secure wireless network with minimum human intervention
US7404020B2 (en) 2004-07-20 2008-07-22 Qlogic, Corporation Integrated fibre channel fabric controller
US7593997B2 (en) 2004-10-01 2009-09-22 Qlogic, Corporation Method and system for LUN remapping in fibre channel networks
US8171474B2 (en) * 2004-10-01 2012-05-01 Serguei Mankovski System and method for managing, scheduling, controlling and monitoring execution of jobs by a job scheduler utilizing a publish/subscription interface
US8295299B2 (en) 2004-10-01 2012-10-23 Qlogic, Corporation High speed fibre channel switch element
US7411958B2 (en) 2004-10-01 2008-08-12 Qlogic, Corporation Method and system for transferring data directly between storage devices in a storage area network
US7593344B2 (en) * 2004-10-14 2009-09-22 Temic Automotive Of North America, Inc. System and method for reprogramming nodes in an automotive switch fabric network
WO2006044140A2 (en) * 2004-10-14 2006-04-27 Motorola, Inc. System and method for time synchronizing nodes in an automotive network
US20060083172A1 (en) * 2004-10-14 2006-04-20 Jordan Patrick D System and method for evaluating the performance of an automotive switch fabric network
US7623552B2 (en) 2004-10-14 2009-11-24 Temic Automotive Of North America, Inc. System and method for time synchronizing nodes in an automotive network using input capture
US7593429B2 (en) * 2004-10-14 2009-09-22 Temic Automotive Of North America, Inc. System and method for time synchronizing nodes in an automotive network using input capture
US7599377B2 (en) * 2004-10-15 2009-10-06 Temic Automotive Of North America, Inc. System and method for tunneling standard bus protocol messages through an automotive switch fabric network
US7613190B2 (en) * 2004-10-18 2009-11-03 Temic Automotive Of North America, Inc. System and method for streaming sequential data through an automotive switch fabric
FR2877169B1 (en) * 2004-10-27 2007-01-26 France Telecom METHOD FOR SYNCHRONIZING A RADIO RECEIVER, AND RECEIVER ADAPTED FOR IMPLEMENTING SUCH A METHOD
US20060155770A1 (en) * 2004-11-11 2006-07-13 Ipdev Co. System and method for time-based allocation of unique transaction identifiers in a multi-server system
US20060155753A1 (en) * 2004-11-11 2006-07-13 Marc Asher Global asynchronous serialized transaction identifier
US20060123098A1 (en) * 2004-11-11 2006-06-08 Ipdev Multi-system auto-failure web-based system with dynamic session recovery
US7519058B2 (en) 2005-01-18 2009-04-14 Qlogic, Corporation Address translation in fibre channel switches
JP4626322B2 (en) * 2005-02-03 2011-02-09 富士ゼロックス株式会社 program
US8589944B2 (en) * 2005-03-16 2013-11-19 Ricoh Production Print Solutions Method and system for task mapping to iteratively improve task assignment in a heterogeneous computing system
US7539931B2 (en) * 2005-04-08 2009-05-26 Hewlett-Packard Development Company, L.P. Storage element for mitigating soft errors in logic
US20060236158A1 (en) * 2005-04-15 2006-10-19 Thayer Larry J Memory element for mitigating soft errors in logic
US7831882B2 (en) * 2005-06-03 2010-11-09 Rambus Inc. Memory system with error detection and retry modes of operation
US9459960B2 (en) 2005-06-03 2016-10-04 Rambus Inc. Controller device for use with electrically erasable programmable memory chip with error detection and retry modes of operation
US20070033247A1 (en) * 2005-08-02 2007-02-08 The Mathworks, Inc. Methods and system for distributing data to technical computing workers
US8769495B1 (en) * 2005-09-30 2014-07-01 Sony Computer Entertainment Inc. Systems and methods for debugging in a multiprocessor environment
TWI297237B (en) * 2005-10-28 2008-05-21 Hon Hai Prec Ind Co Ltd Power switching circuit and power supply system using the same
US7676604B2 (en) * 2005-11-22 2010-03-09 Intel Corporation Task context direct indexing in a protocol engine
EP1796000A1 (en) * 2005-12-06 2007-06-13 International Business Machines Corporation Method, system and computer program for distributing software products in trial mode
US7562285B2 (en) 2006-01-11 2009-07-14 Rambus Inc. Unidirectional error code transfer for a bidirectional data link
WO2007104330A1 (en) 2006-03-15 2007-09-20 Freescale Semiconductor, Inc. Task scheduling method and apparatus
US8315274B2 (en) * 2006-03-29 2012-11-20 Honeywell International Inc. System and method for supporting synchronous system communications and operations
US7860857B2 (en) 2006-03-30 2010-12-28 Invensys Systems, Inc. Digital data processing apparatus and methods for improving plant performance
TW200743028A (en) * 2006-05-12 2007-11-16 Benq Corp State synchronization apparatuses and methods
ES2436609T3 (en) * 2006-05-16 2014-01-03 Saab Ab Fault tolerance data bus node in a distributed system
US8352805B2 (en) 2006-05-18 2013-01-08 Rambus Inc. Memory error detection
US7925791B2 (en) * 2006-07-17 2011-04-12 The Math Works, Inc. Recoverable error detection for concurrent computing programs
US8483853B1 (en) 2006-09-12 2013-07-09 Sonos, Inc. Controlling and manipulating groupings in a multi-zone media system
US9202509B2 (en) 2006-09-12 2015-12-01 Sonos, Inc. Controlling and grouping in a multi-zone media system
US8788080B1 (en) 2006-09-12 2014-07-22 Sonos, Inc. Multi-channel pairing in a media system
US7730029B2 (en) * 2006-09-15 2010-06-01 Alcatel Lucent System and method of fault tolerant reconciliation for control card redundancy
CN101617354A (en) 2006-12-12 2009-12-30 埃文斯和萨瑟兰计算机公司 System and method for calibrating the RGB light of a single modulator projector
US7925626B2 (en) * 2006-12-20 2011-04-12 International Business Machines Corporation Immediate copy target pull of volume data
US8019723B2 (en) * 2006-12-20 2011-09-13 International Business Machines Corporation Deferred copy target pull of volume data
CN101779193B (en) * 2007-08-17 2012-11-21 Nxp股份有限公司 System for providing fault tolerance for at least one micro controller unit
WO2009029549A2 (en) * 2007-08-24 2009-03-05 Virtualmetrix, Inc. Method and apparatus for fine grain performance management of computer systems
CN101377746A (en) * 2007-08-31 2009-03-04 鸿富锦精密工业(深圳)有限公司 System and method for updating arranged task
US7743274B2 (en) * 2007-09-12 2010-06-22 International Business Machines Corporation Administering correlated error logs in a computer system
US7453910B1 (en) 2007-12-18 2008-11-18 International Business Machines Corporation Synchronization of independent clocks
US8117512B2 (en) * 2008-02-06 2012-02-14 Westinghouse Electric Company Llc Failure detection and mitigation in logic circuits
US7925742B2 (en) * 2008-02-28 2011-04-12 Microsoft Corporation Correlating performance data of multiple computing devices
EP2257874A4 (en) 2008-03-27 2013-07-17 Rocketick Technologies Ltd Design simulation using parallel processors
US8358317B2 (en) 2008-05-23 2013-01-22 Evans & Sutherland Computer Corporation System and method for displaying a planar image on a curved surface
US8702248B1 (en) 2008-06-11 2014-04-22 Evans & Sutherland Computer Corporation Projection method for reducing interpixel gaps on a viewing surface
CN102124432B (en) 2008-06-20 2014-11-26 因文西斯系统公司 Systems and methods for immersive interaction with actual and/or simulated facilities for process, environmental and industrial control
JP2010011093A (en) * 2008-06-27 2010-01-14 Hitachi Ltd Distributed system
US9032377B2 (en) * 2008-07-10 2015-05-12 Rocketick Technologies Ltd. Efficient parallel computation of dependency problems
US7983175B2 (en) * 2008-09-19 2011-07-19 International Business Machines Corporation System and method for detecting a network failure
US8077378B1 (en) 2008-11-12 2011-12-13 Evans & Sutherland Computer Corporation Calibration system and method for light modulation device
IT1391785B1 (en) * 2008-11-21 2012-01-27 St Microelectronics Srl ELECTRONIC SYSTEM FOR DETECTION OF FAILURE
US8266477B2 (en) * 2009-01-09 2012-09-11 Ca, Inc. System and method for modifying execution of scripts for a job scheduler using deontic logic
US8463964B2 (en) * 2009-05-29 2013-06-11 Invensys Systems, Inc. Methods and apparatus for control configuration with enhanced change-tracking
US8127060B2 (en) * 2009-05-29 2012-02-28 Invensys Systems, Inc Methods and apparatus for control configuration with control objects that are fieldbus protocol-aware
US8463865B2 (en) * 2010-03-09 2013-06-11 Texas Instruments Incorporated Video synchronization with distributed modules
US8782653B2 (en) * 2010-03-26 2014-07-15 Virtualmetrix, Inc. Fine grain performance resource management of computer systems
US8677071B2 (en) * 2010-03-26 2014-03-18 Virtualmetrix, Inc. Control of processor cache memory occupancy
US8626927B2 (en) * 2010-05-06 2014-01-07 Verizon Patent And Licensing Inc. System for and method of distributing files
US8930896B1 (en) 2010-07-23 2015-01-06 Amazon Technologies, Inc. Data anonymity and separation for user computation
US11265652B2 (en) 2011-01-25 2022-03-01 Sonos, Inc. Playback device pairing
US11429343B2 (en) 2011-01-25 2022-08-30 Sonos, Inc. Stereo playback configuration and control
JP5648544B2 (en) * 2011-03-15 2015-01-07 富士通株式会社 Scheduling program and information processing apparatus
US9128748B2 (en) 2011-04-12 2015-09-08 Rocketick Technologies Ltd. Parallel simulation using multiple co-simulators
JP5932242B2 (en) * 2011-05-20 2016-06-08 キヤノン株式会社 Information processing apparatus, communication method, and program
US20130074088A1 (en) * 2011-09-19 2013-03-21 Timothy John Purcell Scheduling and management of compute tasks with different execution priority levels
US9641826B1 (en) 2011-10-06 2017-05-02 Evans & Sutherland Computer Corporation System and method for displaying distant 3-D stereo on a dome surface
US9729115B2 (en) 2012-04-27 2017-08-08 Sonos, Inc. Intelligently increasing the sound level of player
US9008330B2 (en) 2012-09-28 2015-04-14 Sonos, Inc. Crossover frequency adjustments for audio speakers
KR20140071688A (en) * 2012-12-04 2014-06-12 삼성디스플레이 주식회사 Display Device and Driving Method Thereof
RU2543316C2 (en) 2012-12-25 2015-02-27 Закрытое акционерное общество "Лаборатория Касперского" System and method of fail-safe execution of scheduled tasks in distributed media
US9510055B2 (en) 2013-01-23 2016-11-29 Sonos, Inc. System and method for a media experience social interface
US9092313B2 (en) * 2013-01-25 2015-07-28 Honeywell International Inc. System and method for three input voting
US9573277B2 (en) * 2013-04-15 2017-02-21 Alan Rosen Intelligent visual humanoid robot and computer vision system programmed to perform visual artificial intelligence processes
US9307508B2 (en) 2013-04-29 2016-04-05 Google Technology Holdings LLC Systems and methods for syncronizing multiple electronic devices
US9256461B2 (en) * 2013-09-18 2016-02-09 International Business Machines Corporation Handling interrupt actions for inter-thread communication
US9288596B2 (en) 2013-09-30 2016-03-15 Sonos, Inc. Coordinator device for paired or consolidated players
US9720576B2 (en) 2013-09-30 2017-08-01 Sonos, Inc. Controlling and displaying zones in a multi-zone system
US9654545B2 (en) 2013-09-30 2017-05-16 Sonos, Inc. Group coordinator device selection
US20150095679A1 (en) 2013-09-30 2015-04-02 Sonos, Inc. Transitioning A Networked Playback Device Between Operating Modes
US9300647B2 (en) 2014-01-15 2016-03-29 Sonos, Inc. Software application and zones
US9313591B2 (en) 2014-01-27 2016-04-12 Sonos, Inc. Audio synchronization among playback devices using offset information
US20150220498A1 (en) 2014-02-05 2015-08-06 Sonos, Inc. Remote Creation of a Playback Queue for a Future Event
US9226087B2 (en) 2014-02-06 2015-12-29 Sonos, Inc. Audio output balancing during synchronized playback
US9226073B2 (en) 2014-02-06 2015-12-29 Sonos, Inc. Audio output balancing during synchronized playback
US9679054B2 (en) 2014-03-05 2017-06-13 Sonos, Inc. Webpage media playback
US10587693B2 (en) 2014-04-01 2020-03-10 Sonos, Inc. Mirrored queues
US9302393B1 (en) * 2014-04-15 2016-04-05 Alan Rosen Intelligent auditory humanoid robot and computerized verbalization system programmed to perform auditory and verbal artificial intelligence processes
US20150324552A1 (en) 2014-05-12 2015-11-12 Sonos, Inc. Share Restriction for Media Items
US20150356084A1 (en) 2014-06-05 2015-12-10 Sonos, Inc. Social Queue
US9874997B2 (en) 2014-08-08 2018-01-23 Sonos, Inc. Social playback queues
US9959087B2 (en) 2014-09-24 2018-05-01 Sonos, Inc. Media item context from social media
US9667679B2 (en) 2014-09-24 2017-05-30 Sonos, Inc. Indicating an association between a social-media account and a media playback system
US10645130B2 (en) 2014-09-24 2020-05-05 Sonos, Inc. Playback updates
WO2016049342A1 (en) 2014-09-24 2016-03-31 Sonos, Inc. Social media connection recommendations based on playback information
US9723038B2 (en) 2014-09-24 2017-08-01 Sonos, Inc. Social media connection recommendations based on playback information
US9690540B2 (en) 2014-09-24 2017-06-27 Sonos, Inc. Social media queue
US9860286B2 (en) 2014-09-24 2018-01-02 Sonos, Inc. Associating a captured image with a media item
DE102014114883A1 (en) * 2014-10-14 2016-04-14 Beckhoff Automation Gmbh Method and system for monitoring a first subscriber of a communication network
WO2016067420A1 (en) * 2014-10-30 2016-05-06 三菱電機株式会社 Computer, and data processing method and program
US10248376B2 (en) 2015-06-11 2019-04-02 Sonos, Inc. Multiple groupings in a playback system
US9886234B2 (en) 2016-01-28 2018-02-06 Sonos, Inc. Systems and methods of distributing audio to one or more playback devices
US10101941B2 (en) * 2016-09-20 2018-10-16 International Business Machines Corporation Data mirror invalid timestamped write handling
US10712997B2 (en) 2016-10-17 2020-07-14 Sonos, Inc. Room association based on name
US11108698B2 (en) * 2017-02-03 2021-08-31 Microsoft Technology Licensing, Llc Systems and methods for client-side throttling after server handling in a trusted client component
US10802878B2 (en) * 2017-03-31 2020-10-13 Bmc Software, Inc. Phased start and stop of resources in a mainframe environment
DE102017208484A1 (en) * 2017-05-19 2018-11-22 Robert Bosch Gmbh Method and device for detecting hardware errors in microprocessors
JP6934754B2 (en) * 2017-06-15 2021-09-15 株式会社日立製作所 Distributed processing system, distributed processing system management method, and distributed processing system management program
CN107943567B (en) * 2017-10-20 2021-12-28 北京知道未来信息技术有限公司 High-reliability task scheduling method and system based on AMQP protocol
TWI639921B (en) * 2017-11-22 2018-11-01 大陸商深圳大心電子科技有限公司 Command processing method and storage controller using the same
US10802929B2 (en) * 2018-01-03 2020-10-13 Tesla, Inc. Parallel processing system runtime state reload
US11361839B2 (en) 2018-03-26 2022-06-14 Rambus Inc. Command/address channel error detection
KR20240007689A (en) 2018-05-31 2024-01-16 조비 에어로, 인크. Electric power system architecture and fault tolerant vtol aircraft using same
US10666565B2 (en) * 2018-06-08 2020-05-26 Citrix Systems, Inc. Method to measure relative QOS gains and to reduce the variance in QOS for similar connections for during bandwidth contention
WO2020009871A1 (en) 2018-07-02 2020-01-09 Joby Aero, Inc. System and method for airspeed determination
WO2020061085A1 (en) 2018-09-17 2020-03-26 Joby Aero, Inc. Aircraft control system
WO2020118310A1 (en) 2018-12-07 2020-06-11 Joby Aero, Inc. Rotary airfoil and design method therefor
WO2020180373A2 (en) 2018-12-07 2020-09-10 Joby Aero, Inc. Aircraft control system and method
WO2020132332A1 (en) 2018-12-19 2020-06-25 Joby Aero, Inc. Vehicle navigation system
US11230384B2 (en) 2019-04-23 2022-01-25 Joby Aero, Inc. Vehicle cabin thermal management system and method
CN116646641A (en) 2019-04-23 2023-08-25 杰欧比飞行有限公司 Battery thermal management system and method
JP2022530463A (en) 2019-04-25 2022-06-29 ジョビー エアロ インク Vertical takeoff and landing aircraft
US10951475B2 (en) * 2019-06-28 2021-03-16 Intel Corporation Technologies for transmit scheduler dynamic configurations
US11673649B2 (en) 2020-06-05 2023-06-13 Joby Aero, Inc. Aircraft control system and method
CN113254177B (en) * 2021-05-31 2023-06-27 广州虎牙科技有限公司 Task submitting method based on cluster, computer program product and electronic equipment

Family Cites Families (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3544778A (en) * 1967-11-29 1970-12-01 Westinghouse Electric Corp Decision network
US3667057A (en) * 1970-05-22 1972-05-30 Bendix Corp Method and means for providing an output corresponding to the average of acceptable input signals
US3748449A (en) * 1971-12-02 1973-07-24 Litton Systems Inc Device for determining the median number in a series of numbers
US3805235A (en) * 1972-12-26 1974-04-16 Collins Radio Co Equalization means for multi-channel redundant control system
CH601874A5 (en) * 1976-07-07 1978-07-14 Bbc Brown Boveri & Cie
US4183087A (en) * 1978-03-07 1980-01-08 Hughes Aircraft Company Peak deviation sampling
DE2939487A1 (en) * 1979-09-28 1981-04-16 Siemens AG, 1000 Berlin und 8000 München COMPUTER ARCHITECTURE BASED ON A MULTI-MICROCOMPUTER STRUCTURE AS A FAULT-TOLERANT SYSTEM
US4318173A (en) * 1980-02-05 1982-03-02 The Bendix Corporation Scheduler for a multiple computer system
US4323966A (en) * 1980-02-05 1982-04-06 The Bendix Corporation Operations controller for a fault-tolerant multiple computer system
US4334223A (en) * 1980-06-18 1982-06-08 Sperry Corporation Median detector
US4375683A (en) * 1980-11-12 1983-03-01 August Systems Fault tolerant computational system and voter circuit
US4438494A (en) * 1981-08-25 1984-03-20 Intel Corporation Apparatus of fault-handling in a multiprocessing system
DE3268099D1 (en) * 1982-06-15 1986-02-06 Ibm Method and apparatus for controlling access to a communication network
US4513440A (en) * 1982-06-29 1985-04-23 Harris Corporation Hardware median filter
US4503534A (en) * 1982-06-30 1985-03-05 Intel Corporation Apparatus for redundant operation of modules in a multiprocessing system
US4503535A (en) * 1982-06-30 1985-03-05 Intel Corporation Apparatus for recovery from failures in a multiprocessing system
EP0111277B1 (en) * 1982-12-03 1991-06-12 Nec Corporation Loop network system controlled by a simple clock station
US4523273A (en) * 1982-12-23 1985-06-11 Purdue Research Foundation Extra stage cube
US4630196A (en) * 1983-04-13 1986-12-16 At&T Information Systems, Inc. Store and forward facility for use in multiprocessing environment
US4626843A (en) * 1983-09-27 1986-12-02 Trw Inc. Multi-master communication bus system with parallel bus request arbitration
US4554661A (en) * 1983-10-31 1985-11-19 Burroughs Corporation Generalized fault reporting system
US4642756A (en) * 1985-03-15 1987-02-10 S & H Computer Systems, Inc. Method and apparatus for scheduling the execution of multiple processing tasks in a computer system
US4914657A (en) * 1987-04-15 1990-04-03 Allied-Signal Inc. Operations controller for a fault tolerant multiple node processing system

Also Published As

Publication number Publication date
WO1988008161A1 (en) 1988-10-20
EP0356460A4 (en) 1992-06-24
US4805107A (en) 1989-02-14
US4933940A (en) 1990-06-12
EP0356460A1 (en) 1990-03-07
US4914657A (en) 1990-04-03
US4972415A (en) 1990-11-20
US4980857A (en) 1990-12-25
US4816989A (en) 1989-03-28
JPH02503122A (en) 1990-09-27

Similar Documents

Publication Publication Date Title
CA1301938C (en) Operations controller for a fault tolerant multiple node processing system
Bhargava et al. Independent Checkpointing and Concurrent Rollback for Recovery in Distributed System—An Optimistic Approach
Kieckhafer et al. The MAFT architecture for distributed fault tolerance
US4937741A (en) Synchronization of fault-tolerant parallel processing systems
EP0135499B1 (en) A method for achieving multiple processor agreement optimized for no faults
US5261085A (en) Fault-tolerant system and method for implementing a distributed state machine
EP0069438B1 (en) A multiprocessor system, a system and method for intercommunicating between processors, a system for effecting data transfer, a system for controlling routing of messages, and an arrangement for ascertaining a global state of readiness of a system
Gusella et al. The accuracy of the clock synchronization achieved by TEMPO in Berkeley UNIX 4.3 BSD
CA1306546C (en) Dual zone, fault tolerant computer system with error checking on i/o writes
US4672535A (en) Multiprocessor system
US5276899A (en) Multi processor sorting network for sorting while transmitting concurrently presented messages by message content to deliver a highest priority message
US9596301B2 (en) Distributed-leader-election service for a distributed computer system
US20080071878A1 (en) Method and system for strong-leader election in a distributed computer system
US5442785A (en) Method and apparatus for passing messages between application programs on host processors coupled to a record lock processor
US5210871A (en) Interprocessor communication for a fault-tolerant, mixed redundancy distributed information processing system
JP2002517819A (en) Method and apparatus for managing redundant computer-based systems for fault-tolerant computing
JPS6066538A (en) Method of synchronizing clock
US3909795A (en) Program timing circuitry for central data processor of digital communications system
CN105302489A (en) Heterogeneous multi-core remote embedded memory system and method
LALA Advanced information processing system
Jensen A distributed function computer for real-time control
Rangarajan et al. A robust distributed mutual exclusion algorithm
CA1119274A (en) Communications processor
CA1136728A (en) Multiprocessor system
CA1176338A (en) Input/output system for a multiprocessor system

Legal Events

Date Code Title Description
MKLA Lapsed