CA2040322C

CA2040322C - Asynchronous resynchronization of a commit procedure

Info

Publication number: CA2040322C
Application number: CA002040322A
Authority: CA
Inventors: Kathryn H. Britton; Andrew P. Citron; James P. Gray; Barbara A. Maslak; Timothy J. Thatcher
Original assignee: International Business Machines Corp
Current assignee: International Business Machines Corp
Priority date: 1990-05-16
Filing date: 1991-04-12
Publication date: 1995-10-10
Anticipated expiration: 2011-04-12
Also published as: EP0457112A3; US5319773A; JPH0831043B2; BR9102018A; US5613060A; EP0457112A2; CA2040322A1; DE69132065D1; JPH04229333A; EP0457112B1

Abstract

A computer system and process efficiently provides resource recovery for a failure during a commit procedure. An application is run on a processor and requests a work operation involving a resource such as a protected conversation with another application in a different real machine. A commit procedure is begun for the work request, and if the commit procedure fails before completion, the following steps are taken to optimize the use of one or both of the applications. At some time after the commit procedure fails, a return code is sent to at least the application that initiated the commit indicating the result of the application commit order and that the application can continue to run and does not have to wait for resynchronization (recovery).
Then, while the initiating application continues to run and do other useful work, resynchronization is implemented in parallel, asynchronously.

Description

EN9-90-001 ASYNCHRONOUS RESYNCHRONIZATION OF A COMMIT PROCEDURE

BACKGROUND OF THE INVENTION

The invention relates generally to computer operating systems, and deals more particularly with a distributed computer operating system for a distributed application which operating system can automatically and efficiently resynchronize a two-phase commit procedure after a sync point failure.

The operating system of the present invention can be used in a network of computer systems. Each such computer system can comprise a central, host computer and a multiplicity of virtual machines or other types of execution environments. The host computer for the virtual machines includes a system control program to schedule access by each virtual machine to a data processor of the host, and help to manage the resources of the host, including a large memory, such that each virtual machine appears to be a separate computer. Each virtual machine can also converse with the other virtual machines to send messages or files via the host. Each VM virtual machine has its own CMS portion of the system control program to interact with (i.e., receive instructions from and provide prompts for) the user of the virtual machine.
There may be resources such as shared file system (SFS) and shared SQL relational databases which are accessible by any user virtual machine and the host.

Each such system is considered to be one real machine. It is common to interconnect two or more such real machines in a network, and transfer data via conversations between virtual machines of different real machines. Such a transfer is made via communication facilities such as AVS Gateway and VTAM facilities.

An application can change a database or file resource by first making a work request defining the changes. In response, provisional changes according to the work request are made in shadow files while the original database or file is unchanged. At this time, the shadow files are not valid. Then, the application can request that the changes be committed to validate the shadow file changes, and thereby, substitute the shadow file changes for the original file. A one-phase commit procedure can be utilized. The one-phase commit procedure consists of a command to commit the change of the resource as contained in the shadow file. When resources such as SFS or SQL resources are changed, the commits to the resources can be completed in separate one-phase commit procedures. In the vast majority of cases, all resources will be committed in the separate procedures without error or interruption. However, if a problem arises during any one-phase commit procedure some of the separate commits may have completed while others have not, causing inconsistencies. The cost of rebuilding non-critical resources after the problem may be tolerable in view of the efficiency of the one-phase commit procedure.

However, a two-phase commit procedure is required to protect critical resources and critical conversations.
For example, assume a first person's checking account is represented in a first database and a second person's savings account is represented in a second database. If the first person writes a check to the second person and the second person deposits the check in his/her savings account, the two-phase commit procedure ensures that if the first person's checking account is debited then the second person's savings account is credited or else neither account is changed. The checking and savings accounts are considered protected, critical resources because it is very important that data transfers involving the checking and savings accounts be handled reliably. An application program can initiate the two-phase commit procedure with a single command, which procedure consists of the following steps, or phases:

(1) During a prepare phase, each participant (debit and credit) resource is polled by the sync point manager to determine if the resource is ready to commit all changes. Each resource promises to complete the resource update if all resources successfully complete the prepare phase, i.e. are ready to be updated.

(2) During a commit phase, the sync point manager directs all resources to finalize the updates or back them out if any resource could not complete the prepare phase successfully.
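
By way of illustration only, the two phases described above can be sketched as follows. This is a minimal sketch, not the procedure of any particular product; the `Resource` class, its `can_prepare` flag and the `two_phase_commit` function are hypothetical names introduced for the example.

```python
from typing import List


class Resource:
    """Hypothetical participant holding provisional changes."""

    def __init__(self, name: str, can_prepare: bool = True) -> None:
        self.name = name
        self.can_prepare = can_prepare
        self.state = "pending"

    def prepare(self) -> bool:
        # Phase one: promise to complete the update, or report that it cannot.
        self.state = "prepared" if self.can_prepare else "failed"
        return self.can_prepare

    def commit(self) -> None:
        self.state = "committed"

    def backout(self) -> None:
        self.state = "backed out"


def two_phase_commit(resources: List[Resource]) -> str:
    # Phase one: poll every participant.
    votes = [resource.prepare() for resource in resources]
    decision = "commit" if all(votes) else "backout"
    # Phase two: every participant receives the same order.
    for resource in resources:
        if decision == "commit":
            resource.commit()
        else:
            resource.backout()
    return decision
```

In the checking and savings example, both accounts would be passed to `two_phase_commit`; if either one cannot complete the prepare phase, both are backed out and neither account is changed.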

An IBM System Network Architecture (SNA) LU6.2 architecture (reference SC31-6808, Chapter 5.3 "Presentation Services - Sync Point Verbs", published by IBM Corp.) was previously known to coordinate commits between two or more protected resources. This architecture previously addressed sync point facilities consisting of a sync point manager which performed both sync point and associated recovery processing running in a single application environment. Several adapters could run simultaneously in this environment. The LU6.2 architecture supports a sync point manager (SPM) which is responsible for resource coordination, sync point logging and recovery. The prior art CICS/VSE(TM) environment supports such an architecture.

According to the IBM SNA LU6.2 architecture prior art, in phase one and in phase two, commit procedures are executed and the sync point manager logs the phase in the sync point log. Also, the sync point manager logs an identification number of a logical unit of work which is currently being processed. Such logging assists the sync point manager in resource recovery or resynchronization in the event that a problem arises during the two-phase commit procedure. If such a problem arises after the two-phase commit procedure has begun, the log is read and resource recovery processing is implemented to bring associated resources to a consistent state. The problems include failure of a communication path or failure in a resource manager.

The aforesaid SNA LU6.2 sync point architecture manages a commit failure in the following manner. The sync point manager that knows its second phase decision based on the state in the log entry invokes a complete resynchronization operation with any failed resources which it was coordinating before returning control to the application program that requested the commit. One of the failed resources can be a protected conversation. In the aforesaid SNA LU6.2 sync point architecture, the initiator's sync point manager must reestablish a session with the partner sync point manager or recovery facility in the system where the failure occurred. If such a session is not immediately available, the sync point manager continues to seek a session until one is available. For other protected resources which also need to be resynchronized, a session may also be needed with the resource manager that encountered the failure. The sync point manager cannot complete its processing until recovery takes place. The delay can be protracted and the initiating application and possibly other participating applications are prevented from doing other useful work during the delay.
The SNA LU6.2 sync point architecture permits a heuristic decision (manual or system default intervention) to force resynchronization. The intervention could be programmed or directly controlled by an operator to prevent indefinite interruption to the application program.
However, the intervention may cause heuristic damage whereby some resources involved in the sync point are committed and some are backed out.

It was also known from an article entitled "A Commit Protocol for Resilient Transactions" by Pui Ng from the University of Illinois at Urbana-Champaign, to provide an application program which is checkpointed at certain intervals in its processing. During each checkpoint, information about the state of a process is written onto a back-up node. If a failure occurs after a completed checkpoint and before the next checkpoint, all processing and updates occurring after the completed checkpoint must be backed out. This backout occurs asynchronously relative to the application program, and the application program can restart at the checkpoint without waiting for the backout to occur. When restarted, the application program can attempt a new instance of the same routine to process the same data under a new name. This new instance becomes the valid one, and the prior one under its original name becomes invalid. The article also describes a method for naming the instances to differentiate the valid one from the invalid one. However, this article is not concerned with asynchronous recovery of a failed commit procedure.

Accordingly, a general object of the present invention is to provide a process for resynchronizing a commit procedure for protected resources and conversations while avoiding extensive delays in the operation of an application program that initiated the commit procedure.

Another object of the present invention is to allow an application to make a local decision whether or not the sync point manager should wait for resynchronization to occur before returning to the application.

SUMMARY

The invention resides in a system and process for resource recovery which efficiently handles a failure during a commit procedure. An application is run on a processor and requests a work operation involving a resource such as a protected conversation with another application in a different real machine. A commit procedure is begun for the work request, and if the commit procedure fails before completion, the following steps are taken to optimize the use of one or both of the applications. At some time after the commit procedure fails, a return code is sent to at least the application that initiated the commit indicating the intent of the application commit order and that the application can continue to run and does not have to wait for resynchronization (recovery). Then, while the initiating application continues to run, resynchronization is implemented in parallel, asynchronously.
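
By way of illustration only, the behavior summarized above can be sketched as follows: when delivery of the commit order fails, the initiating application receives a return code at once and resynchronization continues on a separate thread. This is a minimal sketch under stated assumptions; `FlakyPartner`, `RESYNC_IN_PROGRESS`, `resynchronize` and `commit_with_async_resync` are hypothetical names, and the retry loop merely stands in for the recovery processing detailed later in this description.

```python
import threading
import time
from typing import Optional, Tuple

# Hypothetical return code: reports the intended outcome and that
# resynchronization is still in progress.
RESYNC_IN_PROGRESS = "commit ordered; resynchronization in progress"


class FlakyPartner:
    """Hypothetical partner that is unreachable for the first few attempts."""

    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.state = "in doubt"

    def deliver(self, decision: str) -> bool:
        if self.failures > 0:
            self.failures -= 1
            return False  # simulated communication failure
        self.state = decision
        return True


def resynchronize(partner: FlakyPartner, decision: str, done: threading.Event) -> None:
    # Keep retrying until the partner has received the decision.
    while not partner.deliver(decision):
        time.sleep(0.01)
    done.set()


def commit_with_async_resync(
    partner: FlakyPartner, decision: str = "committed"
) -> Tuple[str, Optional[threading.Event]]:
    if partner.deliver(decision):
        return "ok", None
    # Failure: start recovery in parallel and return to the caller at once.
    done = threading.Event()
    worker = threading.Thread(
        target=resynchronize, args=(partner, decision, done), daemon=True
    )
    worker.start()
    return RESYNC_IN_PROGRESS, done
```

The caller can continue with other work after receiving `RESYNC_IN_PROGRESS` and, if it chooses, wait on the returned event later. A caller that prefers to wait for resynchronization before continuing could simply call `done.wait()` immediately.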

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 is a block diagram of a computer system which incorporates all commit and recovery functions in each execution environment, according to the prior art.

FIG. 2 is a block diagram of a computer network including two interconnected computer systems according to the present invention. Each of the systems supports multiple execution environments with a common recovery facility and log.

FIG. 3 is a flowchart of a two-phase commit procedure for resources used by an application running in an execution environment of FIG. 2.

FIG. 4 is a flowchart of recovery processing that is implemented when an interruption occurs during the two-phase commit procedure described in FIG. 3.

FIGS. 5 (A) and 5 (B) are a flowchart of a two-phase commit procedure for resources used by partner applications running in two distributed application environments connected by a protected conversation supporting sync point facilities of FIG. 2.

FIG. 6 is a block diagram illustrating plural work units defining different commit scopes within a single application environment of FIG. 2, and a commit scope transversing more than one system of FIG. 2.

FIG. 7 is a flowchart illustrating the use of local work units and a global logical unit of work by one application environment of FIG. 2 to define the scope of and facilitate commit processing.

FIG. 8 is a flowchart illustrating the use of local work units and the global logical unit of work of FIG. 7 by another related application environment of FIG. 2 to define the scope of and facilitate commit processing.

FIG. 9 is a timing diagram of a protected conversation in the global logical unit of work of FIGS. 7 and 8.

FIG. 10 is a block diagram that illustrates automatic and generic registration of resources within the systems of FIG. 2.

FIG. 11 is a flowchart illustrating a procedure for registering resources in a sync point manager of FIG. 6 for a suitable type of commit procedure and the steps of the commit procedure.

FIG. 12 is a block diagram illustrating registration on a work unit basis within the systems of FIG. 2.

FIG. 13 is a time flow diagram of bank transactions illustrating registration on a work unit basis.

FIG. 14 is a flowchart illustrating a procedure for registering resources, changing registration information for resources and unregistering resources in the sync point manager.

FIG. 15 is a flowchart illustrating the procedure used by resource adapters, protected conversation adapters, and the sync point manager to unregister resources.

FIG. 16 is a flowchart illustrating processing by the sync point manager in response to a sync point request, and optimizations by the sync point manager in selecting one-phase or two-phase commit procedures.

FIG. 17 is a flowchart illustrating the two-phase commit procedure.

FIG. 18 is a flow diagram illustrating three distributed application programs participating in a two-phase commit procedure.

FIG. 19 is a block diagram illustrating the components and procedure for exchanging log names to support recovery of a failed commit procedure when a protected conversation is made between an application in one system and a partner application in another system of FIG. 2.

FIGS. 20 (A) and 20 (B) are flowcharts of communications facility processing associated with FIG. 19 for an initial event and a subsequent conversation event, respectively.

FIG. 21 is a flowchart of recovery facility processing associated with FIG. 19 that results when a local communications facility requests that the recovery facility exchange log names for a path.

FIG. 22 is a flowchart of recovery facility processing associated with FIG. 19 that results from receiving an exchange of log names request from another recovery facility.

FIG. 23 is a block diagram illustrating the components and procedure for exchanging log names with a local resource manager in a section of FIG. 2.

FIG. 24 is a block diagram illustrating the components and procedure for exchanging log names using a system of FIG. 2 and a remote resource manager.

FIG. 25 is a block diagram illustrating the contents of a recovery facility of FIG. 2.

FIG. 27 is a flowchart illustrating the processing for exchange of log names between a participating resource manager and the recovery facility.

FIG. 28 is a block diagram illustrating portability of the sync point log and capability for activating back up recovery facilities.

FIG. 29 is a block diagram which illustrates participation by the resource adapter and sync point manager of FIG. 2 in passing an error flag and information that defines a problem in a commit procedure to an application program.

FIG. 30 is a flowchart illustrating a procedure for using the components of FIG. 29 to pass the error information to the application program.

FIG. 31 is a control block structure for sharing the pages used by error blocks associated with FIG. 29 in order to reduce system working storage.

FIG. 32 is a block diagram of components of FIG. 2 that participate in the generation and management of the error flags and information of FIG. 29.

FIG. 33 is a block diagram illustrating three systems including commit cycles that encompass more than one of the systems commit scopes incorporating resource managers that reside in the same and different systems as an initiating application and communications paths employed during commit processing as well as paths used for sync point recovery processing.

FIG. 34 is a block diagram illustrating three participating application and application environments from FIG. 33 and the resource managers that they employ, forming a tree of sync point participants.

FIG. 35 is a high level flowchart illustrating the recovery facility procedures for pre-sync point agreements and procedures for recovery from a sync point failure.

FIG. 36 is a flowchart illustrating in more detail the recovery facility procedures for recovery from a sync point failure.

FIG. 37 is a block diagram illustrating the contents of logs 72 of FIG. 2 and control structures required to control the procedures represented by FIG. 35.

FIG. 38 is a flowchart providing detail for FIG. 35, steps 299 and 300.

FIG. 39 is a flowchart providing detail for FIG. 35, steps 301 and 302.

FIG. 40 is a flowchart providing detail for FIG. 36, step 311.

FIG. 41 is a flowchart providing detail for FIG. 36, step 312.

FIG. 42 is a flowchart providing detail for FIG. 36, step 313.

FIG. 43 is a flowchart providing detail for FIG. 36, step 314.

FIG. 44 is a flowchart providing detail for FIG. 36, step 315.

FIG. 45 is a flowchart providing detail for FIG. 36, step 304.

FIG. 46 is a flowchart providing detail for FIG. 36, step 317.

FIG. 47 is a flowchart providing detail for FIG. 36, step 318.

FIG. 48 is a flowchart providing detail for FIG. 36, step 319.

FIG. 49 is a flowchart providing detail for FIG. 36, step 306.

FIGS. 50 (A) and 50 (B) are block diagrams which illustrate application 56A and application 56D requesting asynchronous resynchronization should an error occur during sync point processing.

FIG. 51 is a flow graph illustrating the steps of the asynchronous, resynchronization-in-progress process involving an additional system 50C.

FIG. 52 is a flow graph illustrating the steps of the asynchronous, resynchronization-in-progress process involving a failed backout order originating from system 50C.

FIG. 53 is a flow graph illustrating the steps of the asynchronous, resynchronization-in-progress process involving a failed backout order originating from system 50A.

FIG. 53A is a flow graph illustrating the steps of asynchronous, resynchronization-in-progress process involving a failed prepare call originating from system 50A.

FIG. 54 is a block diagram of another embodiment of the invention as an alternate to FIG. 2.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Referring to the drawings in detail wherein like reference numerals indicate like elements throughout the several views, Figure 1 illustrates an LU6.2 syncpoint tower model or architecture according to the Prior Art.
This architecture is defined as one execution environment. In the illustrated example, three application programs 14, 16, and 18 are run in execution environment 12 in a time-shared manner. Resource Managers 26 and 27, DB/2 or CICS File Control (DB/2 and CICS are trademarks of IBM Corp.), control access to resources 22 and 24, respectively. It should be noted that if a DB/2 (CICS/MVS operating system) or a SQL/DS (CICS/VSE operating system) resource manager were located outside of environment 12, then environment 12 would include a resource adapter to interface to the resource manager according to the prior art.
In this prior art architecture, application program 14 makes a work request invoking resources 22 and 24 to syncpoint manager 20 before requesting committal of resources involved in the work request.

Next, application program 14 requests a commit from syncpoint manager 20 to commit the data updates of the previous work request. In response, syncpoint manager 20 implements a two-phase commit procedure by polling resource managers 26 and 27 to determine if they are ready to commit the resources and if so, to subsequently order the commit. At each phase (and each step of each phase) of the two-phase commit procedure, the syncpoint manager transfers syncpoint information to log 30 indicating the state of the two-phase commit procedure. If a failure occurs during the two-phase commit procedure, the syncpoint manager will implement a synchronization point recovery procedure to bring the resources to a consistent state. The syncpoint manager relies on the synchronization point information in log 30 to determine how far the two-phase commit procedure had progressed before interruption.

Syncpoint manager 20 and the two-phase commit procedure are also used when any one of the applications 14, 16 or 18 attempts to communicate via protected conversation manager 40 using a protected conversation to an application partner in a separate environment in the same system (not shown) or to an application partner within another system (not shown) which is interconnected via a communication facility. According to the prior art synchronization point architecture, this other system/other environment is functionally identical to the execution environment 12 and includes another syncpoint manager functionally identical to 20, another synchronization point log functionally identical to 30, another protected conversation manager functionally identical to 40 and other resource managers functionally identical to 26 and 27. This other environment provides coordination and recovery functions which are separate from those of execution environment 12.

COORDINATED SYNC POINT MANAGEMENT OF PROTECTED RESOURCES

FIG. 2 illustrates a syncpoint architecture according to the Present Invention. The invention includes a distributed computer operating system which supports distributed and non-distributed applications executing within their own execution environment such as a UNIX environment, OS/2 environment, DOS environment in OS/2 operating system, CMS environment in VM operating system, AIX environment in VM operating system, CICS in VM operating system, and MUSIC environment in VM operating system. A distributed application is distinguished by using a resource in another execution environment or having a communications conversation - a special type of resource - with an application partner in another execution environment. The execution environment for the resource manager or application partner may be in the same system or a different one; it can be in the same type environment or a foreign environment. A distributed application execution environment comprises one or more systems supporting applications in their own environments that might not have all the resources required; those resources are distributed elsewhere and are acquired with the aid of a communication facility. The complete environment of a distributed application appears to be full function because the distributed application involves resources that are in other environments, especially the recovery facility and communication facility.

The present invention comprises one or more systems (real machines or central electronic complexes (CECs)) 50 A, D. In the illustrated embodiment, system 50A comprises a plurality of identical, distributed application environments 52A, B, and C, a conversation manager 53A and execution environment control programs 61A, B, and C which are part of a system control program 55A, and a recovery facility 70A. By way of example and not limitation, each of the environments 52A, B, and C can be an enhanced version of a VM virtual machine, recovery facility 70A can reside in another enhanced version of a VM virtual machine and system control program 55A can be an enhanced version of a VM operating system for virtual machines 52A, B, and C. Applications running in distributed application environments 52A-C in real machine 50A can communicate with application partners running in similar distributed application environments running in real machine 50D or other systems (not shown) via communication facilities 57A and D. By way of example, communication facility 57A comprises Virtual Telecommunications Access Method (VTAM) facility and APPC/VM VTAM Support (AVS) gateway facility. Each distributed application environment 52 comprises a single syncpoint manager (SPM) 60A and a plurality of protected resource adapters 62A-B and 64A. A syncpoint manager allows a group of related updates to be committed or backed out in such a way that the changes appear to be atomic. The updates performed between syncpoints (i.e. commit/backout) are called a logical unit of work and the related updates are identified through a unique name assigned by the syncpoint manager via the recovery facility called a logical unit of work identifier. The logical unit of work can involve multiple protected resources accessed by an application in the same distributed application environment and can also involve protected resources accessed by a partner application in other application environments via a conversation which is one type of protected resource.

A conversation is a path established in an architected manner between two partner applications. The use of the conversation by each application is determined by the applications' design and the conversation paradigm used. When a conversation is to be included in the syncpoint process, it is called a protected conversation.
Protected resources become part of the logical unit of work by contacting the syncpoint manager through a process called registration as described below in Registration of Resources for Commit Procedure. Each protected resource adapter provides an interface to a resource manager both for an application and for the syncpoint manager. (Alternatively, the protected resource adapter can be merged with the resource manager if the resource manager resides in the same execution environment as the application.) In the illustrated embodiment, protected resources are files and conversations. In other embodiments of the present invention, protected resources could be database tables, queues, remote procedure calls, and others.
Protected resource adapters 62A and B handle interfaces on behalf of application 56A for resource managers 63A and B, respectively, which manage files 78A and B. Resource managers 63A and B are located in the same system. Alternatively, they could reside in a different system in a communication network. In the illustrated embodiment, conversations are managed by a conversation manager which manages the conversations or paths from an application to other partner applications running in different distributed application environments in the same system, or different distributed application environments in different systems in a communication network. If the protected conversation is between two application partners running in different application environments in the same system, e.g. between application partners running in 52A and 52B, then the conversation manager is totally contained in the system control program 55A of system 50A, and communication is made between the application partners via each protected conversation adapter 64A and 64B (not shown). If the protected conversation is between different application environments in different systems, e.g. between application partners running in 52A and 52D, then communication is made between the conversation managers 53A and 53D in systems 50A and 50D via communication facilities 57A and 57D. In this embodiment, such communications utilize a peer to peer communication format. Conversation managers 53A, D use an intra-environment format to communicate with communication facilities 57A, D. Communication facilities 57A, D translate the intra-environment format to an architected intersystem communication standard format and vice versa. By way of example this architected intersystem communication standard format can be of a type defined by IBM's System Network Architecture, LU 6.2 protocol.

Recovery facility 70A serves all distributed application environments 52A, B, and C within real machine 50A.
It contains log 72A, its processes handle logging for the syncpoint managers 60A, B, and C and it provides recovery for failing syncpoints for all distributed application environments 52A, B, and C. The same is true for recovery facility 70D and its log 72D, and syncpoint manager 60D on system 50D.

When application 56A within distributed application environment 52A desires to update files 78A and 78B, application 56A makes two separate update requests via a file application program interface within application 56A. The requests invoke protected resource adapters (henceforth called protected file adapter for this type of resource) 62A and 62B respectively for files 78A and 78B (step 500 of FIG. 3). Based on resource manager specific implementation, the protected file adapter knows the file is protected. If not already registered with the syncpoint manager for the work unit, protected file adapters 62A and 62B register with syncpoint manager 60A the fact that they want to be involved in all Commit/Backout requests for this work unit (step 502). A "work unit" is a grouping of all resources, directly accessible and visible by the application, that participate in a sync point. It is generally associated with a logical unit of work identifier. For a further explanation of work units, see Local and Global Commit Scopes Tailored to Work Units below.
Then protected file adapters 62A and 62B contact their respective resource managers 63A and 63B to update files 78A and 78B (Step 504). Return is made to application 56A. Next application 56A requests a syncpoint 58A, i.e. a commit in this case, to syncpoint manager 60A (Step 506). In response, syncpoint manager 60A initiates a two-phase commit procedure (step 508) to be carried out for both of its registered resources, files 78A and 78B, represented by protected file adapters 62A and 62B and their respective resource managers 63A and 63B. In step 508, syncpoint manager 60A calls each of its registered resources at the adapter syncpoint exit entry point, given to the syncpoint manager by each resource adapter during registration, with a phase one "prepare" call.
During the course of executing its two-phase commit procedures, syncpoint manager 60A issues a request to recovery facility 70A to force log ("force log" means to make sure the information was written to the actual physical device before returning to syncpoint manager 60A) on log 72A phase one syncpoint manager information (Step 508). This information includes the logical unit of work identifier, the syncpoint manager state and the names and other pertinent information about each registered protected resource adapter participating in the commit request. This information was given to syncpoint manager 60A when file adapters 62A and 62B registered. Syncpoint manager 60A's state is determined by the rules of the two-phase commit paradigm being followed. For example, the two-phase commit paradigm is of a type described by System Network Architecture LU 6.2 Reference: Peer Protocols, SC31-6808, Chapter 5.3 Presentation Services - Sync Point Verbs, published by the IBM Corporation. If a failure occurs during the syncpoint processing, the syncpoint manager state is used to determine the outcome (Commit or Backout) of the logical unit of work. As per the rules of the two-phase commit paradigm used by this embodiment, the syncpoint manager phase one state is "Initiator, Syncpoint Manager Pending". If the first phase of the two-phase commit procedure is not interrupted and is completed (decision block 512), syncpoint manager 60A issues a second request to recovery facility 70A to force log 72A to its phase two state. Based on the replies from the protected file adapters and resource managers and the rules of the two-phase commit paradigm being used, syncpoint manager 60A knows its second phase decision. In this embodiment, the paradigm is as follows. If one or more protected resource adapters respond "backout" to the phase one request, the phase two decision is "backout"; if all respond "request commit", the decision is "commit".
In the example illustrated in Figure 3, protected file adapters 62A and 62B responded "request commit" (Step 510) and the phase two state is logged by syncpoint manager 60A as "Initiator, Committed". It should be noted that in this example, file managers 63A and 63B, after replying "request commit" through their respective file adapters 62A and 62B to the phase one request, are in a state of "indoubt", that is, they can commit or backout the file updates based on the phase two decision from syncpoint manager 60A.
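The phase two decision rule stated above can be sketched in a few lines of Python. This is a minimal illustration only, not code from the described system; the function name `phase_two_decision` and the use of plain strings for the replies are assumptions chosen to mirror the wording of the text.

```python
def phase_two_decision(phase_one_replies):
    """Derive the phase two decision from the phase one replies.

    phase_one_replies: one string per registered protected resource
    adapter, each either "request commit" or "backout" (hypothetical
    encoding of the replies named in the text).
    """
    replies = list(phase_one_replies)
    # One or more "backout" replies force a backout decision.
    if any(reply == "backout" for reply in replies):
        return "backout"
    # Only a unanimous "request commit" yields a commit decision.
    if replies and all(reply == "request commit" for reply in replies):
        return "commit"
    # Anything else is outside the rule as stated in the text.
    raise ValueError("unexpected or missing phase one reply")
```

With the replies of Figure 3 (both file adapters answering "request commit"), the function returns "commit"; a single "backout" reply changes the result to "backout".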

After logging, syncpoint manager 60A then issues the phase two call with the decision of commit to protected file adapters 62A and 62B (Step 513). When the file managers 63A and 63B receive the phase two commit decision, each proceeds to do whatever processing is necessary to commit the data, i.e. make the updates permanent (Step 516). When a successful reply is received from protected file adapters 62A and 62B on behalf of their respective resource managers and there is no interruption in syncpoint processing (decision block 514), syncpoint manager 60A calls recovery facility 70A to write to log 72A the state of "forget" for this logical unit of work (Step 515). This does not have to be a force log write, which means the log record is written to a data buffer and return can be made to syncpoint manager 60A. The buffer can be written to the physical media at a later point in time. Based on the two-phase commit paradigm used in this embodiment, syncpoint manager 60A updates the logical unit of work identifier (increments it by one) which guarantees uniqueness for the next logical unit of work done by application 56A. The syncpoint manager then returns to application 56A (Step 515A).
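The difference between a force log write and an ordinary (buffered) log write can be illustrated with a small Python sketch. The class name `SyncpointLog` and the record format are hypothetical; the sketch assumes a plain file stands in for log 72A and uses the standard `os.fsync` call to request that buffered data reach the storage device before the call returns.

```python
import os


class SyncpointLog:
    """Hypothetical append-only log with forced and unforced writes."""

    def __init__(self, path):
        # Binary append mode; Python keeps a user-space buffer for the file.
        self._file = open(path, "ab")

    def write(self, record, force):
        self._file.write(record.encode("utf-8") + b"\n")
        if force:
            # Force log: empty the buffer and ask the operating system to
            # push the data to the device before returning to the caller.
            self._file.flush()
            os.fsync(self._file.fileno())
        # Without force, the record may remain in the buffer and be
        # written to the physical media at a later point in time.

    def close(self):
        self._file.flush()
        self._file.close()
```

In this sketch the phase one and phase two state records would be written with `force=True`, while the "forget" record could be written with `force=False`, matching the distinction drawn in the text.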

The two-phase commit paradigms have rules for recovery processing, such that recovery facility 70A knows how to complete an interrupted syncpoint (Step 517 and FIG. 4). If syncpoint manager 60A's process was interrupted, decision block 514 leads to step 517 in which syncpoint manager 60A contacts recovery facility 70A. In step 517 recovery facility 70A receives the logical unit of work identifier and information about the associated failed resource or resources from syncpoint manager 60A. Recovery facility 70A then finds the correct log entry (Step 518 of FIG. 4). The log information, in combination with the two-phase commit paradigm being used, allows recovery facility 70A's procedures to complete the interrupted syncpoint processing (Step 519). Based on the two-phase commit paradigm being used in this illustrated example, if the syncpoint state entry for the logical unit of work identifier on log 72A is "Initiator, Syncpoint Manager Pending", each failed resource manager 63A or 63B will be told to backout; otherwise, each will be told the syncpoint manager phase two state which is on the log, i.e. commit or backout (Step 520). Once the recovery state is determined, recovery facility 70A will start recovery processes with each failed protected resource manager as described below in Log Name Exchange For Recovery of Protected Resources and in Recovery Facility For Incomplete Sync Points For Distributed Application. This processing consists of exchanging log names and a comparison of states whereby the recovery process of recovery facility 70A tells the failed resource manager 63A or 63B what to do, i.e. commit or backout, and the resource manager 63A or 63B tells the recovery process what it did. The recovery process of recovery facility 70A knows how to contact the failed resource based on information written by syncpoint manager 60A during its phase one logging activity. If the failed resource manager can be contacted (decision block 521), recovery takes place immediately (Step 522). After recovery takes place with each failed resource (decision block 523), return can be made to syncpoint manager 60A (Step 523A). Syncpoint manager 60A will then return to the application 56A (Step 515A). If the failed resource manager could not be contacted, decision block 521 leads to decision block 524 in which recovery facility 70A checks to see if it must complete the recovery processing before returning to application 56A. This decision is based on information contained in the log record for the logical unit of work written by the syncpoint manager during phase one logging. If it must complete recovery, the recovery process keeps trying to contact the failed resource (Step 525); if it can complete the recovery at a later point in time, i.e. wait for recovery was previously selected, recovery facility 70A returns to syncpoint manager 60A with the intent of the recovery processing (i.e. commit or backout) and an indication that the recovery will be completed later (Step 526) as described below in Asynchronous Resynchronization of a Commit Procedure. When all resources are recovered (Step 525A), syncpoint manager 60A returns to application 56A (Step 515) with this information.
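The two decisions made during recovery (Steps 520 through 526) can be sketched as two small Python functions. This is an illustrative sketch under stated assumptions: the function names are hypothetical, and the state name "Initiator, Backout" is assumed here for the logged backout decision, since the text only names the pending and committed states explicitly.

```python
def recovery_order(logged_state):
    """Order sent to each failed resource manager (Step 520)."""
    if logged_state == "Initiator, Syncpoint Manager Pending":
        # Phase one was never completed, so the work is backed out.
        return "backout"
    phase_two_orders = {
        "Initiator, Committed": "commit",
        "Initiator, Backout": "backout",  # hypothetical state name
    }
    if logged_state not in phase_two_orders:
        raise ValueError("unrecognized logged state: %r" % (logged_state,))
    # Otherwise the logged phase two state is passed on unchanged.
    return phase_two_orders[logged_state]


def recovery_action(resource_reachable, must_complete_before_return):
    """Choose how recovery proceeds (decision blocks 521 and 524)."""
    if resource_reachable:
        return "recover now"                    # Step 522
    if must_complete_before_return:
        return "keep retrying"                  # Step 525
    return "return intent and recover later"    # Step 526
```

The third outcome of `recovery_action` corresponds to the case in which the application regains control while recovery is completed at a later point in time.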

Figure 2 also illustrates that application 56A can be part of a distributed application. This means there is at least one partner application that can work with application 56A to complete its processing. To establish a distributed application, application 56A initiates a protected conversation which starts partner application 56D in system 50D by invoking the application program conversation initiate interface and indicates the conversation is to be protected (FIG. 5A, Step 530). This request is handled by protected conversation adapter 64A. Protected conversation adapter 64A asks syncpoint manager 60A for the logical unit of work identifier and includes it along with a unique conversation identifier in the information sent to the remote system 50D. Protected conversation adapter 64A then sends the request to the conversation manager 53A which sends it to communications facility 57A. Protected conversation adapter 64A gets an indication that the conversation initiate request was (or will be) sent from communications facility 57A to communications facility 57D. At this time protected conversation adapter 64A registers with syncpoint manager 60A (Step 532). Asynchronously to this registration process, the conversation initiate request is transmitted to communication facility 57D, and then to conversation manager 53D, and then to protected conversation adapter 64D (Step 532 of FIG. 5A). Protected conversation adapter 64D retrieves the logical unit of work identifier and unique conversation identifier and registers with syncpoint manager 60D on behalf of the conversation manager (Step 532). Protected conversation adapter 64D at this time also gives syncpoint manager 60D the logical unit of work identifier it received on the conversation initiate request. Protected work done by application 56D will be associated with this logical unit of work originally started by application 56A (Step 532). The logical unit of work identifier will also be assigned to a new work unit for application 56D and application 56D is started.

Thus, applications 56A and 56D are partner applications, and together they are called a distributed application. The protected conversation allows application 56A and 56D to send and receive data in a peer to peer manner. This means each side, application 56A or application 56D, can originate the send or receive, which is determined by the application writer and the paradigm being used by the communication manager. As described above, a protected conversation is registered with both syncpoint managers by protected conversation adapters 64A and 64D, respectively. During syncpoint processing for the application that issued the first commit, a protected conversation adapter represents a resource to the syncpoint manager that must respond if it can commit (first phase) and whether or not it successfully performed the work requested (second phase). To the other protected conversation adapter receiving the first phase call from its partner protected conversation adapter, the protected conversation is a partner syncpoint manager over which it will receive phase one and phase two orders. Its local syncpoint manager acts like a resource manager, that is, the protected conversation adapter will get the results of what the syncpoint manager's resources did (phase one and phase two). It should be noted that the syncpoint paradigm used provides rules for which application partner can issue the first commit. In this example, any application partner can issue the commit first and this is determined by the distributed application design.

Application 56A gets control with the indication that the request to start was successfully sent by communication facility 57A. At this point application 56A is able to send requests to application 56D and application 56A sends a request to application 56D over the established conversation. In this illustrated example, this request eventually causes application 56D to invoke a file application program interface to update file 78D. As described above, the update request causes protected file adapter 62D to register with syncpoint manager 60D under the same work unit (previously assigned for application 56D (Step 532) when application 56D was started) (Step 533). Also in step 533, application 56D sends a reply to application 56A over the conversation indicating that it completed its work. Next, application 56A issues update requests for files 78A and 78B. As previously described, protected file adapters 62A and 62B had previously registered with syncpoint manager 60A and they each contact resource managers 63A and 63B to perform the updates (Steps 533 and 533A).

Application 56A now issues a commit 58A to syncpoint manager 60A (Step 534). As described above, syncpoint manager 60A contacts recovery facility 70A for its phase one logging and issues a phase one "prepare" call to each registered resource (Steps 534A and 535A). Protected file adapters 62A and 62B behave as described above. When protected conversation adapter 64A receives the phase one "prepare" call, it sends an intersystem architected "prepare" call over the protected conversation it represents, i.e. the one originally established by application 56A to application 56D (Step 535). Protected conversation adapter 64D recognizes this "prepare" call and gives application 56D, which had issued a conversation message receive call, a return code requesting it to issue a commit (Step 536). Application 56D then issues a commit 58D to syncpoint manager 60D (Step 537). As described above, syncpoint manager 60D contacts its recovery facility, in this case recovery facility 70D, to force log 72D with phase one information (Step 538). Because application 56A issued the original commit request which caused application 56D to subsequently issue a commit, and based on the two-phase commit paradigm used in this embodiment, syncpoint manager 60D's phase one state is "Initiator Cascade, Syncpoint Manager Pending" (Step 538). Syncpoint manager 60D contacts protected file adapter 62D with a phase one "prepare" call (Step 538). Protected file adapter 62D and its associated resource manager 63D perform phase one processing as previously described and return a reply of "request commit".

In this example, there were no interruptions and decision block 539 leads to step 540 in which syncpoint manager 60D contacts recovery facility 70D to force log 72D to a state of "Agent, Indoubt". This state means that if an interruption subsequently occurs such that syncpoint manager 60D does not receive the phase two decision from syncpoint manager 60A, it would have to wait for recovery processing from recovery facility 70A to complete its syncpoint processing. Syncpoint manager 60D then contacts protected conversation adapter 64D with a reply of "request commit". Protected conversation adapter 64D then sends an intersystem architected "request commit" reply to protected conversation adapter 64A (Step 541) which in turn replies "request commit" to syncpoint manager 60A (Step 542). As described above, syncpoint manager 60A received "request commit" from protected file adapters 62A and 62B (Step 535A). Since there are no interruptions in the illustrated example, decision block 543 leads to step 544 in which syncpoint manager 60A contacts the recovery facility 70A to force log 72A to a phase two state of "Initiator, Committed" (Step 544). Syncpoint manager 60A then calls each registered protected resource adapter with the phase two decision of "Committed" (FIG. 5B, Step 545). Protected file adapters 62A and 62B process the commit decision as described above (Step 545A). When protected conversation adapter 64A receives the commit decision, it sends an intersystem architected "committed" call over the protected conversation it represents, i.e. the one originally established by application 56A to application 56D (Step 546). Protected conversation adapter 64D receives the "commit" call and replies to syncpoint manager 60D the phase two decision of "commit" (Step 547).


As described above, syncpoint manager 60D contacts recovery facility 70D to force log 72D to the phase two state. Because application 56A issued the original commit request which caused application 56D to subsequently issue a commit, and based on the two-phase commit paradigm used in this embodiment, syncpoint manager 60D's phase two state is "Initiator Cascade, Committed" (Step 548). Syncpoint manager 60D contacts protected file adapter 62D with the phase two commit decision (Step 549). Protected file adapter 62D and its associated resource manager 63D perform commit processing as previously described and return a reply of "forget". Since there were no interruptions (decision block 550), syncpoint manager 60D contacts recovery facility 70D to log in log 72D a state of "Forget" for the syncpoint log record for this logical unit of work (Step 551). "Forget" means that syncpoint processing is complete and the log record can be erased. Syncpoint manager 60D then contacts protected conversation adapter 64D with a reply of "forget". Based on the two-phase commit paradigm used in this embodiment, syncpoint manager 60D increments the logical unit of work identifier by one and returns to application 56D with an indication that the commit completed successfully (Step 552). Updating the logical unit of work identifier guarantees uniqueness for the next logical unit of work done by the distributed application.

Next, protected conversation adapter 64D sends an intersystem architected "forget" reply to protected conversation adapter 64A which in turn replies "forget" to syncpoint manager 60A (Step 553). As described above, syncpoint manager 60A also receives "forget" replies from protected file adapters 62A and 62B (Step 545A). Assuming there are no interruptions, decision block 554 leads to step 555 in which syncpoint manager 60A contacts recovery facility 70A to log in log 72A a state of "forget" for this logical unit of work. Again based on the paradigm of the two-phase commit process being used, syncpoint manager 60A then increments the logical unit of work identifier by one (Step 556). This change guarantees a new unique logical unit of work identifier for the distributed application. Syncpoint manager 60A then notifies application 56A that the Commit request completed successfully. If, during the two-phase commit procedure, the syncpoint processing was interrupted in either syncpoint manager 60A, syncpoint manager 60D or both, recovery facility 70A and recovery facility 70D would implement a recovery operation which is represented in the logical flow by steps 557, 558 and 559, 560 and is more fully described below in Log Name Exchange For Recovery of Protected Resources, Recovery Facility For Incomplete Sync Points For Distributed Application, and Asynchronous Resynchronization of a Commit Procedure.
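The order in which states are logged on the two sides of the uninterrupted commit walked through above can be summarized in a short Python sketch. The list and function names are hypothetical, and the sketch covers only the commit path of this example, not backout or recovery.

```python
# States logged by syncpoint manager 60A (the initiator) on log 72A.
INITIATOR_STATES = [
    "Initiator, Syncpoint Manager Pending",
    "Initiator, Committed",
    "Forget",
]

# States logged by syncpoint manager 60D (the cascaded partner) on log 72D.
CASCADE_STATES = [
    "Initiator Cascade, Syncpoint Manager Pending",
    "Agent, Indoubt",
    "Initiator Cascade, Committed",
    "Forget",
]


def next_logged_state(sequence, current):
    """Return the state logged after `current`, or None once at "Forget"."""
    position = sequence.index(current)  # raises ValueError if unknown
    if position + 1 == len(sequence):
        return None
    return sequence[position + 1]
```

The extra "Agent, Indoubt" entry on the partner side reflects the point at which syncpoint manager 60D has replied "request commit" and must wait for the phase two decision.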


FIGURE 54 is an alternate embodiment to that illustrated in FIGURE 2 and can best be described by comparison to FIGURE 2. In both FIGURE 2 and FIGURE 54, application environments, system facilities, and resource managers are distributed. However, in FIGURE 2 one physical device, system 50A, contains multiple application environments 52A,B,C, two resource managers 63A,B, recovery facility 70A and communication facility 57A. FIGURE 2 shows that System Control Program 55A contains the conversation manager 53A and the Syncpoint Manager 60A,B,C. System 50A of FIGURE 2 can be a mainframe computer and configurations of this type are often called centralized computing. Also, FIGURE 2 shows application environments in system 50A connected to application environments in system 50D through a communication network. In contrast, FIGURE 54 shows each application environment, system facility and resource manager in a separate physical machine. This configuration is called distributed computing. In this environment systems 90A,B,C, 110E, 114F, and 120G are programmable workstations similar in function but not necessarily similar in size and power to systems 50A,D of FIGURE 2. The systems of FIGURE 54 are connected by a communication network which, for example, is a local area network (LAN). Application environments 92A, B, and C of FIGURE 54 are functionally equivalent to application environments 52A, B, and C of FIGURE 2. However, each application environment 92A, B, and C is contained in a separate programmable workstation. Each system control program 95A, B, and C of FIGURE 54 is functionally equivalent to system control program 55A of FIGURE 2. Each system control program 95A, B, and C contains (a) a Syncpoint Manager 100A, B, or C which is functionally equivalent to Syncpoint Managers 60A, B, and C, (b) execution environment control programs 91A, B, and C which are functionally equivalent to execution environment control programs 61A, B, and C, (c) protected conversation adapters (PCA) 104A, B, and C which are functionally equivalent to PCA 64A, B, and C, (d) resource adapters (RA) 102A,B,C and 103A,B,C which are functionally equivalent to resource adapters 62A, B, and (e) conversation managers 93A,B,C which are functionally equivalent to conversation managers 53A,B,C and communication facilities 97A,B,C each of which is functionally equivalent to communication facility 57A. However, in the example of FIGURE 54, the communication facility is part of each system control program 95A, B, and C and not in its own execution environment. Also in FIGURE 54, resource managers 112E and 113F and their respective files/logs 115E, 116E and 117F, 118F are functionally equivalent to resource managers 63A and 63B and their respective files/logs 78A, 800A and 78B, 800B of FIGURE 2. However, resource managers 112E and 113F are each on separate programmable workstations. Recovery facility 121G and its log 122G in FIGURE 54 are functionally equivalent to recovery facility 70A and its log 72A in FIGURE 2. However, recovery facility 121G is in a programmable workstation. System 50D of FIGURE 54 is the same as system 50D of FIGURE 2 and is included to show the versatility of the network. A description of syncpoint processing in this environment can be obtained by substituting the correct numbers from FIGURE 54 for the corresponding numbers from FIGURE 2, as just described, into the syncpoint processing description above. Thus, there is a wide range of computer systems and networks in which the present invention can reside.

It is possible in system 50A, FIG. 2, for recovery facility 70A to become unavailable for a variety of reasons. Accordingly, system 50A provides back-ups. For example, if recovery facility 70A is part of an execution environment which also controls a resource manager and the resource manager encounters a disabling failure, then recovery facility 70A will also become inoperational. In the example illustrated in FIG. 28, system 50A includes more than one execution environment dedicated to a resource manager, and each execution environment containing the resource manager also contains a recovery facility program, although only one recovery facility in a system may be active at one time.

Specifically, FIG. 28 illustrates that in system 50A there are three identical execution environments 52E, 52F and 52G each containing a resource manager (program) 260A, 260B and 260C, respectively. Preferably, each resource manager 260A, 260B and 260C is an enhanced version of the Shared File System (SFS) resource manager of the VM/SP Release 6 operating system ("VM" is a trademark of the IBM Corp. of Armonk, N.Y.) and associated resources 262A, 262B and 262C, respectively. In addition, each execution environment 52E, 52F and 52G also contains a program 70A, B and C to provide the function of recovery facility 70A illustrated in FIG. 23. An advantage of locating each recovery facility in an execution environment which includes the shared file system is that the shared file system includes services, i.e. communication and tasking services, that the recovery facility can use. The communication services handle communication protocols, interrupt processing, and message management. In system 50A, FIG. 28, recovery facility 70A is initially identified to the system control program as the recovery facility associated with recovery facility log 72A when the execution environment 52E is initialized. This is accomplished by specification of a parameter as input to the execution environment 52E's initialization process. Execution environment 52E identifies itself to the system control program as the recovery facility and as the target of all communication in system 50A for the sync point log resource identifier. (Refer to section "Log Name Exchange for Recovery of Protected Resources" for description of term sync point log resource identifier.) This sync point log resource identifier must be unique in system 50A and can be associated with only one execution environment at any time.
In the illustrated embodiment, execution environment 52E defines a nonvolatile storage area which contains recovery facility log 72A so that specification of execution environment 52E automatically implies log 72A as the resource recovery log, absent an overruling specification of another storage area.

However, if execution environment 52E is not available, the user can activate recovery facility 70B or 70C as a backup and move log 72A to execution environment 52F or 52G by specifying the aforesaid parameter at initialization of execution environment 52F or 52G and specifying to the execution environment the location of recovery facility log 72A. The user specifies the location of log 72A by giving the system control program the necessary commands from the chosen execution environment 52F or 52G to identify the location of the non-volatile storage area that contains recovery facility log 72A.

All the information that is needed by the recovery facility to complete resynchronization after a syncpoint failure is contained in recovery facility log 72A, and no information required for the syncpoint recovery is contained in the execution environment, resource manager, or associated non-volatile storage. Therefore, any execution environment with the resource manager that contains the recovery facility program can act as the recovery facility 70A as long as the active recovery facility has access to log 72A. The back-up transfer of the recovery facility function to execution environment 52F is indicated by communication path 272B, and the back-up transfer of the recovery facility function to execution environment 52G is indicated by communication path 272C.

Communication between any of the syncpoint managers 60A, 60B, or 60C in any application environment with the recovery facility 70 is accomplished by using the sync point log resource identifier when initiating a conversation through the system control program to the recovery facility.
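The rule that the sync point log resource identifier is associated with only one execution environment at a time, and that conversations addressed to it reach whichever environment currently holds it, can be sketched as a small registry. The class `SystemControlProgram`, its method names, and the identifier string used in the usage note are all hypothetical; the sketch only illustrates the routing idea, not how the system control program is built.

```python
class SystemControlProgram:
    """Hypothetical registry of the active recovery facility per log id."""

    def __init__(self):
        self._target_by_log_id = {}

    def identify(self, log_resource_id, environment):
        """An execution environment identifies itself as recovery facility."""
        current = self._target_by_log_id.get(log_resource_id)
        if current is not None and current != environment:
            # Only one execution environment may hold the identifier.
            raise RuntimeError(
                "%s is already associated with %s" % (log_resource_id, current)
            )
        self._target_by_log_id[log_resource_id] = environment

    def release(self, log_resource_id):
        """Drop the association, e.g. when the environment is unavailable."""
        self._target_by_log_id.pop(log_resource_id, None)

    def route(self, log_resource_id):
        """Return the environment that receives conversations for this id."""
        return self._target_by_log_id[log_resource_id]
```

In this sketch, a back-up transfer corresponds to releasing the identifier held by environment 52E and identifying 52F or 52G instead; syncpoint managers keep addressing the same identifier and are routed to the new environment.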

LOCAL AND GLOBAL COMMIT SCOPES TAILORED TO WORK UNITS

The foregoing flowcharts of Figures 5A, B illustrate an example where a single logical unit of work or commit scope extends to two application partners in different systems, for example, to resources and applications in more than one execution environment in different systems, and the commit procedure is coordinated between the two application partners. The following describes in detail this process as well as the ability of System 50A to provide separate work units or commit scopes for the same application in the same execution environment. Thus, all systems 50 can tailor commit scopes to the precise resources which are involved in one or more related work units.

As noted above, a "work unit" is the scope of resources that are directly accessible by one application and participate in a common syncpoint. For example (in Figure 2), the resources coupled to resource adapters 62A and 62B and protected conversation adapter 64A are all directly accessible by application 56A and therefore, could all have the same work unit. They would all have the same work unit if they all were involved in related work requests made by application 56A. The work unit identifiers are selected by the system control program 55 and are unique within each execution environment. In the illustrated embodiment, the system control program 55A comprises a conversation manager 53A, and an execution environment control program 61 for each execution environment 52. By way of example and not limitation, execution environment control program 61A can be an enhanced CMS component of the VM/SP Release 6 operating system ("VM" is a trademark of IBM Corp. of Armonk, NY). This execution environment control program controls the execution of application 56A and, as noted above, assigns the work unit identifications. Thus, the work unit identifications are unique within each execution environment. The application uses the same work unit for multiple, related work requests and different work units for unrelated work requests. A "logical unit of work" identifier is a globally unique (network wide) identifier for all resources that are involved in related work requests and encompasses all the related work requests. The logical unit of work identifiers are assigned by the recovery facility 70 of the system in which the work request originated and in this embodiment comprises:

(1) A network identifier which identifies a group of interconnected systems;

(2) A system identifier which identifies one communication facility within the network;

(3) An instance number that provides a locally unique element to the LUWID (for example, a timestamp may be used); and

(4) A sequence number which identifies a particular syncpoint instance.
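The four components listed above can be pictured as a small immutable record. The following Python sketch is illustrative only: the class name `Luwid`, the field names and the field types are assumptions, and the choice to model "incrementing the logical unit of work identifier by one" as an increment of the sequence number is likewise an assumption rather than something the text states.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Luwid:
    """Hypothetical record for a logical unit of work identifier."""

    network_id: str       # (1) group of interconnected systems
    system_id: str        # (2) one communication facility in the network
    instance_number: int  # (3) locally unique element, e.g. a timestamp
    sequence_number: int  # (4) a particular syncpoint instance

    def next_syncpoint(self):
        # Assumption: the next logical unit of work differs from this one
        # only in its sequence number, which is incremented by one.
        return replace(self, sequence_number=self.sequence_number + 1)
```

Because the record is frozen, each syncpoint yields a new identifier value instead of mutating the old one, which keeps identifiers already written to a log unchanged.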

By way of example, this is of the type defined by System Network Architecture LU 6.2 Reference: Peer Protocols, SC31-6808, Chapter 5.3 Presentation Services - Sync Point Verbs. The syncpoint manager 60 requests the logical unit of work identifier (LUWID) from the recovery facility when a protected conversation is involved in the work unit or when a two-phase commit procedure will be required, even if the work request does not require a protected conversation. The LUWID may be requested by the resource adapter by calling the syncpoint manager, or by the syncpoint manager by requesting an LUWID at the beginning of commit processing if one has not been acquired yet and it is needed for the commit. As described in more detail below, a work unit is associated with a LUWID when protected resources such as a protected conversation or multiple protected resources are involved in the work unit. A work unit can include a mixture of multiple files and multiple file repositories, other protected resources and other participating resource managers, and protected conversations between different parts of a distributed application. In the case of a protected conversation, a single logical unit of work extends between two or more application partners, even though each application partner assigns a different work unit (within each execution environment) to the same protected conversation and to other resources directly accessed by this application. Thus, each application partner associated with a protected conversation assigns and uses its own work unit locally, but the work units of the two or more application partners refer to the same distributed logical unit of work. It should be noted that each execution environment is ignorant of the work unit identifications assigned by the other execution environment, and it is possible by coincidence only that work units in different execution environments have the same identifier. Work units with the extended scope described above, rather than LUWIDs, are used to define local commit scopes because existing applications can benefit from the extended function with a minimum of change.
Changing from work units to LUWIDs would be cumbersome and would require existing applications to change.
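The relationship between locally assigned work units and a shared LUWID can be sketched as follows. The classes `RecoveryFacility` and `SyncpointManager`, their method names, and the string form of the LUWID are hypothetical; the sketch assumes the LUWID is acquired on first request for a work unit and that a partner adopts the LUWID it receives over the protected conversation.

```python
import itertools


class RecoveryFacility:
    """Hypothetical source of LUWIDs for one system."""

    def __init__(self, system_name):
        self._system_name = system_name
        self._counter = itertools.count(1)

    def new_luwid(self):
        return "%s-LUWID-%d" % (self._system_name, next(self._counter))


class SyncpointManager:
    """Hypothetical mapping from local work units to LUWIDs."""

    def __init__(self, recovery_facility):
        self._recovery_facility = recovery_facility
        self._luwid_by_work_unit = {}

    def luwid_for(self, work_unit):
        # Acquire a LUWID only if none is associated with the work unit yet.
        if work_unit not in self._luwid_by_work_unit:
            luwid = self._recovery_facility.new_luwid()
            self._luwid_by_work_unit[work_unit] = luwid
        return self._luwid_by_work_unit[work_unit]

    def adopt_luwid(self, work_unit, luwid):
        # Partner side: associate a LUWID received over a protected
        # conversation with a locally chosen work unit.
        self._luwid_by_work_unit[work_unit] = luwid
```

For example, the initiating side may call `luwid_for("Y")` while the partner calls `adopt_luwid(7, ...)` with the received value: the two work unit identifiers are unrelated, yet both refer to the same logical unit of work.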

Figures 6-9 illustrate, by example, a process for establishing different work units and logical units of work for the same application 56A, and another logical unit of work which extends to multiple resources associated with a plurality of application partners 56A and 56D running in different systems 50A and 50D, respectively. In the illustrated example in Figure 7, application 56A is initiated and obtains a work unit identifier X from execution environment control program 61A (Step 928). The execution environment control program is responsible for selecting a unique work unit identifier within each execution environment. Then, application 56A makes a work request to resource adapter 62A within execution environment 52A to update a file located in resource 78A specifying that the work request is to be made under work unit X, or by default, the work request is assigned to be under a "current work unit" designated by application 56A (Step 930). If the resource adapter requests the LUWID for work unit X (Decision Block 935), then syncpoint manager 60A requests a LUWID from recovery facility 70A to encompass work unit X if one is not already assigned and associates it with work unit X. Then the syncpoint manager returns the LUWID to the resource adapter (Step 936). In the illustrated example in Figure 6, resource 78A (accessed via resource adapter 62A) is not a protected conversation so Decision Block 937 (Figure 7) leads to Step 939 in which the resources are updated. If resource adapter 62A was not previously registered for work unit X (Decision Block 933), then resource adapter 62A registers with syncpoint manager 60A (Step 934). In the foregoing example, application 56A does not desire to perform additional work under the same work unit (Decision Block 940), and does not desire to do new unrelated work (Decision Block 941), so the next step is for application 56A to issue a commit (Step 942).
In response, syncpoint manager 60A initiates the one-phase commit procedure (Step 944). However, it should be noted that application 56A is not required to issue the commit for work unit X before beginning some other unrelated work request (Decision Block 941). In this particular case, the syncpoint manager is performing a one-phase commit procedure and so, does not need a LUWID.

In the illustrated example, application 56A next begins the following process to do work independently of work unit X. Application 56A requests a new work unit from execution environment control program 61A, and execution environment control program 61A returns work unit Y (Step 928). Next, application 56A makes a request to update resource 78B via resource adapter 62B under work unit Y (Step 930). If the resource adapter requests the LUWID for work unit Y (Decision Block 935), syncpoint manager 60A obtains from recovery facility 70A a LUWID and associates it with work unit Y (Step 936). At this time, the logical unit of work for work unit Y extends only to resource manager 63B. Next, an update to resource 78B is implemented (Step 939). Since resource adapter 62B has not yet registered for work unit Y, it registers with syncpoint manager 60A (Step 934).

Next, application 56A desires to do additional work under the same work unit Y (Decision Block 940), e.g. to make changes to data in other resources. In the example illustrated in Figure 6, the other resource is a protected conversation, and the protected conversation is used to access resources in system 50D via distributed application partner 56D. In the illustrated example, this is the beginning of a new protected conversation. Thus, application 56A initiates a new protected conversation with application 56D under work unit Y (Step 930). Because protected conversation adapter 64A requests the LUWID for work unit Y, the syncpoint manager invokes the recovery facility if a LUWID has not yet been assigned and associated with the work unit, and returns the LUWID to the protected conversation adapter (Step 936). (The protected conversation adapter will need the LUWID when the conversation is initiated (Step 947).) Decision Block 937 leads to Decision Block 946. Because this is a new protected conversation, conversation manager 53A initiates a protected conversation and sends the LUWID associated with work unit Y to a communication facility (Step 947). In the illustrated example, where application partner 56D resides in a different system, communication facility 57A is utilized. However, it should be noted that if the application partner resided in another execution environment, for example 52B, within the same system 50A, then the communication function is provided by conversation manager 53A of system control program 55A, without involvement of communication facility 57A. When protected conversation adapter 64A receives control back from conversation manager 53A and the protected conversation initiation request was indicated as successful, protected conversation adapter 64A registers with syncpoint manager 60A (Step 948) and gives control back to application 56A. At this time application 56A sends a message to application 56D requesting the update of resource 78D (Step 949). However, the message is buffered in system 50D until application 56D is initiated. After the message is sent, application 56A has no more work to do (Decision Blocks 940 and 941) and issues a commit on work unit Y (Step 942). Syncpoint manager 60A initiates a two-phase commit procedure (Step 944).

When system control program 55D receives the conversation initiation request from communication facility 57A via communication facility 57D (Step 960 in Figure 8), system control program 55D initiates execution environment 52D (Step 962). Protected conversation adapter 64D obtains new work unit Z for execution environment 52D, in which application 56D will run, from execution environment control program 61D. This work unit is unique within execution environment 52D. Also, protected conversation adapter 64D tells the syncpoint manager to associate the LUWID received with the initiated conversation to the new work unit, and then registers with syncpoint manager 60D under the new work unit (Step 966). (The flow of the conversation initiation request in Step 947 is from protected conversation adapter 64A to conversation manager 53A, to communication facility 57A, to communication facility 57D, to conversation manager 53D, and to protected conversation adapter 64D.) Application 56D is then started.

Next, application 56D makes a work request in Step 930D, and in the illustrated example, the first work request is to receive a message on the conversation. Because the protected conversation already has the LUWID, Decision Block 935D leads to Decision Block 937D. Because this is a protected conversation but not a new outbound protected conversation (i.e., not an initiation of a new protected conversation), Decision Blocks 937D and 946D lead to Step 949D in which the message is received by application 56D.

In the illustrated example from Figure 6, the protected conversation causes application 56D to perform additional work, e.g. update a file within resource 78D (via resource adapter 62D), and therefore Decision Block 940D leads to Step 930D in which application 56D makes a work request to update resource 78D using work unit Z. If the resource adapter requests the LUWID (Decision Block 935D), the syncpoint manager returns the LUWID to the resource adapter (Step 936D). It was not necessary to invoke the recovery facility to assign the LUWID since it was already assigned and associated with the work unit in Step 966. Because this work request does not involve a protected conversation resource, Decision Block 937D leads to Step 939D in which resource 78D is updated according to the work request. Because resource adapter 62D was not previously registered, Decision Block 933D leads to Step 934D in which resource adapter 62D is registered with syncpoint manager 60D. Application 56D now needs to determine when application 56A requests the commit of the work. This is accomplished by application 56D by doing a receive (work request) on the protected conversation. Application 56D will get a return code of Take_Syncpoint when application 56A has issued the commit. Therefore, Decision Block 940D leads to Step 930D in which application 56D issues a receive on the protected conversation under work unit Z. Since protected resource adapter 64D does not need the LUWID (Decision Block 935D), the work request involves a protected conversation (Decision Block 937D), and the protected conversation is not a new outbound conversation (Decision Block 946D), the receive is done (Step 949D). Since application 56D has no additional work to do on work unit Z, Decision Block 940D will lead to Decision Block 941D. When application 56A has issued the commit (Decision Block 941D), application 56D will get a Take_Syncpoint return code on the receive, and issue a commit (Step 942D). Next, syncpoint manager 60D will initiate the commit procedure (Step 944D). In the illustrated example, this concludes the work request associated with work unit Z, and Decision Block 950D leads to the end of application 56D. At this time, application 56A receives control back from syncpoint manager 60A and ends.

Figure 9 (and Figures 3 - 5 above) illustrate the timing of the commits in execution environments 52A and 52D according to the example used in this invention. When the protected conversation is in a send state relative to execution environment 52A, application 56A issues a commit for work unit Y, as previously described in Step 942 (Figure 7). When execution environment 52D is in receive state for the protected conversation, it receives a message along with a return code of Take_Syncpoint from execution environment 52A. It should be noted that after receipt of the Take_Syncpoint return code, application 56D should issue a commit as soon as possible because this return code indicates that application 56A has issued the commit and is waiting for execution environment 52D to issue the corresponding commit. Thus, after receipt of the message on the protected conversation and the return code, application 56D completes work on other protected resources associated with the work unit in System 50D to get those other resources into a consistent state. After this is done, such that all resources in System 50D associated with the work unit Z are consistent, application 56D issues the commit. Next, syncpoint managers 60A and 60D implement respective two-phase commit procedures for resources directly accessed by the respective applications 56A and 56D. Even though separate commits are invoked to commit those resources which are directly accessed by the applications, during the two-phase commit processing each syncpoint manager reports syncpoint status information to the other syncpoint manager. For a more detailed description of syncpoint processing, see Coordinated Sync Point Management of Protected Resources.

REGISTRATION OF RESOURCES FOR COMMIT PROCEDURE

FIG. 10 schematically illustrates automatic and generic registration of resources, where registration is a facility that identifies protected resources to synchronization point manager (SPM) 60. In each application execution environment 52, the resource adapter 62/64 and the SPM 60 participate in registration on behalf of the application 56. In the illustrated embodiment, the resource manager 63 and the resource 78 are located outside of this environment.

In FIG. 10, the application 56 is shown as having two parts, a work request and a commit request. Both parts usually execute in the same application execution environment. However, a broken line between the two parts is shown in the figure to indicate that the application may be distributed and that the two request types may originate from different environments.

Assume that an end user starts application 56 by invoking the start facility of the system control program.

The start facility builds the application execution environment 52, and loads and transfers control to the application 56. When the application 56 starts to execute, there are no resources 78 yet registered with SPM 60.

When the application 56 in FIG. 2 makes a work request (steps 500/530 in FIGS. 3/5(A)) to use a resource 78, this request invokes a specific adapter 62/64 associated with the resource 78. The general function of the adapter 62/64 is to connect the application 56 to the resource manager 63. In system 50 the resource adapter 62/64 is extended to include a registration sub-routine that automatically registers in the sync point manager 60, and an adapter sync point exit entry point that supports the two-phase commit procedure.

The work request entry point indicates code lines in the adapter 62/64 that pass the work request (e.g., to open a file, insert records into a data base, initiate a conversation, etc.) from the application 56 to the resource manager 63. These code lines also interact with the registration sub-routine in the adapter 62/64 to do automatic registration. Registration informs SPM 60 that the resource 78 is part of a work unit. Also, registration identifies the resource manager 63 to SPM 60. This consists specifically of telling SPM 60 the adapter sync point exit entry point, and the resource manager's object recovery resource identifier.

The adapter sync point exit entry point indicates code lines within the resource adapter 62/64 to be used by the SPM 60's two-phase commit facility when a commit request is made (Steps 506/534 in FIGS. 3/5A). The object recovery resource identifier is the identifier used by the recovery facility 70, described in the below section entitled "Log Name Exchange for Protected Resources" (Step 225 of FIG. 26), to initiate a conversation with the resource manager 63 in the event of a failure during the SPM 60 two-phase commit process.

The process initiated by a work request to any resource adapter 62/64 to handle automatic registration for the application 56 is resource dependent. The resource 78 to be used can be inherently protected regardless of the nature of the work request, and if it has not yet registered, the adapter 62/64 uses its registration sub-routine to automatically register the resource with SPM 60 for the application 56. Alternatively, the adapter 62/64 may not know if the resource 78 is protected. The resource manager 63 may have this knowledge. In this case, the adapter 62/64 may register and pass the work request to the resource manager 63. The resource manager 63 may do the work request and return to the adapter 62/64 with an indicator whether the resource 78 requires or does not require protection. If protection is not required, the adapter 62/64 may use its registration sub-routine to unregister with SPM 60. Or the adapter 62/64 may determine inherently from the work request or from the resource manager 63 that the resource will not be changed by the application 56; that is, the resource is used only for read. For this case, the adapter 62/64 may use the registration facility of SPM 60 to change the registration to read-only. Finally, the adapter 62/64 may determine that the resource 78 is a read-only resource or an unprotected resource that should be made available to other applications as soon as possible. In this case, the adapter may remain registered in order to obtain the prepare order during a two-phase commit procedure. The resource adapter 62/64 can then use the order as a cue to unlock the resource 78. In this case the adapter 62/64 may respond "prepared" and "commit" to the orders from SPM 60.
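
One way to picture the "register first, then adjust" behaviour described above is the following sketch. The names `Spm`, `handle_work_request`, and the three reply strings are hypothetical, and the resource manager is reduced to a callable; removing the entry outright on unregistration is a simplification made only for this illustration.

```python
class Spm:
    """Hypothetical registration facility: resource name -> registration entry."""

    def __init__(self):
        self.entries = {}

    def register(self, resource, sync_exit, recovery_id, mode="write"):
        self.entries[resource] = {
            "sync_exit": sync_exit,        # adapter sync point exit entry point
            "recovery_id": recovery_id,    # object recovery resource identifier
            "mode": mode,
        }

    def change_mode(self, resource, mode):
        self.entries[resource]["mode"] = mode

    def unregister(self, resource):
        # Simplification: the entry is dropped immediately.
        del self.entries[resource]


def handle_work_request(spm, resource, do_work, sync_exit, recovery_id):
    """Register before the work, then adjust from the resource manager's reply.

    do_work is a hypothetical callable standing in for the resource manager;
    it returns "protected", "read-only", or "unprotected".
    """
    if resource not in spm.entries:
        spm.register(resource, sync_exit, recovery_id)
    reply = do_work()
    if reply == "unprotected":
        spm.unregister(resource)
    elif reply == "read-only":
        spm.change_mode(resource, "read-only")
    return reply


def noop_exit(order):
    """Placeholder sync point exit; a real exit would forward the order."""
    return None


spm = Spm()
handle_work_request(spm, "file-A", lambda: "protected", noop_exit, "RM-1")
handle_work_request(spm, "file-B", lambda: "read-only", noop_exit, "RM-1")
handle_work_request(spm, "scratch", lambda: "unprotected", noop_exit, "RM-2")
```

After these three requests, only the protected and read-only resources remain registered, and the read-only one is marked as such.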

By supporting unregistration and change of registration, as described in more detail below, the adapter 62/64 can give information to SPM 60 that allows for optimizing the two-phase commit procedure (also, as described below). When the application 56 issues a commit request, the SPM 60 may realize that only one resource is registered as having been changed (either no other resource is registered, or all other resources are registered as read-only). For this case the SPM 60 may use the more efficient one-phase commit process.
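
The choice between the two commit procedures can be sketched as a small function. The function name and the "write"/"read-only" strings are assumptions for illustration; treating a work unit with no changed resource as one-phase is also an assumption of this sketch.

```python
def choose_commit_procedure(registrations):
    """Pick a commit procedure from the registered modes.

    registrations maps a resource name to "write" or "read-only".
    """
    changed = [name for name, mode in registrations.items() if mode == "write"]
    # More than one changed resource needs coordination across resources.
    return "two-phase" if len(changed) > 1 else "one-phase"


only_one = choose_commit_procedure({"file-A": "write"})
one_writer = choose_commit_procedure({"file-A": "write", "file-B": "read-only"})
two_writers = choose_commit_procedure({"file-A": "write", "file-B": "write"})
```

Here `only_one` and `one_writer` select the one-phase procedure, and `two_writers` selects the two-phase procedure.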

Now consider the foregoing general control flow as applied to a specific example where application 56A of FIG. 2 is executing and makes a work request for a protected conversation with a partner application 56D (Step 530 of FIG. 5A). The request is processed by protected conversation adapter 64A which is one type of resource adapter. This adapter uses its registration sub-routine to invoke the registration facility of SPM 60A (Step 532). Next the adapter 64A uses communication facility 57A, which acts as a resource manager, to initialize the partner application 56D. As illustrated in FIG. 2, the conversation manager 53A is capable of starting a partner application on the same system 50A, or of communicating with a counterpart communication facility 57D on another system 50D via communication facility 57A to start an application within system 50D. In the latter case, the partner application runs on system 50D and the communication facility 57D starts the partner application 56D by invoking the system control program 55D's start facility. This facility builds the new application execution environment 52D for the partner application 56D. Since the start facility knows that it is building a partner application 56D, it knows that the communications facility 57D will be used in the protected conversation with the originating application 56A. Thus, the start facility temporarily acts as the partner application 56D and invokes the resource adapter 64D for protected conversations. Then, adapter 64D registers the protected conversation with the SPM 60D. Thus, the partner application 56D's protected conversation with the originating application 56A is registered prior to the invocation of the partner (alternatively, the registration could be delayed until the partner application 56D uses the conversation with the application 56A). Thus, in FIG. 2, the SPM 60A within execution environment 52A of the application 56A and the SPM 60D within the execution environment 52D of the partner application 56D are each informed of the protected conversation resource.

At this point in the discussion in FIG. 2, the application 56A and the partner application 56D are each executing in their own execution environments 52A and 52D
under respective work units, and each may use one or more protected resources 78A or 78D. Each may, for example, use protected files. When the application 56A makes a request to use a file resource 78A, the file resource adapter 62A is invoked. The adapter uses its registration sub-routine to invoke the SPM 60A registration facility. Then the adapter invokes the file resource manager 63A. Thus, again, application 56A's usage of a protected resource 78A is automatically registered. Analogous registrations can be made in execution environment 52D for one or more resources such as resource 78D.

From the above examples we see that this embodiment of registration is generic because registration does not depend on resource type. In FIG. 10, any resource manager 63 that wants to support a protected resource 78 may add the registration sub-routine to its resource adapter 62/64. No changes would be required to the system 50 sync point support.

In FIG. 10, the application 56 may also use non-protected resources. For example, the application may want to create a non-protected partner application that periodically displays messages about the work being done, where the display need not be synchronized with the actual completion of work. For this case, the application 56 makes a work request to have a non-protected conversation. The control flow is much the same as for a protected conversation in the above example. The only difference is that the resource adapter 64 knows from information in the work request that the conversation is not protected and in the illustrated embodiment, does not register with the SPM
60. Thus, the non-protected conversation will not participate in the synchronization point processing of SPM
60.

In FIG. 10, given the registration process described above, whenever the application 56 issues a commit request, the SPM 60 has a complete list of protected resources that need to be synchronized. See the foregoing section entitled "Coordinated Sync Point Management of Protected Resources", where the two-phase commit procedure in SPM 60 is described.
This shows how SPM 60 uses the adapter sync point exit entry points in the resource adapter 62/64 to use the sync point support in the resource managers 63. Although not shown in FIG. 10, the application 56 may issue a back out request.
For this case, the SPM 60 gives a back out order to the adapter sync point exit entry point in the resource adapter 62/64.

At the end of the synchronization point process, each SPM 60 does not destroy the application 56's registration list. It does, however, invoke the resource adapter's exit one more time for post synchronization processing. For this invocation, the adapter may decide to modify its registration. For performance reasons, the adapter may keep the resource registered until the application 56 ends. On the other hand, if the adapter knows that the resource 78 will no longer be used (for example, a protected conversation may end before the application 56 ends), the adapter may use its registration entry point 62 to unregister with SPM 60.

The control flows above assumed distributed resource managers 63. Thus, any request to use a resource 78 always went to the appropriate resource adapter 62/64 which, in turn, invoked the registration facility in SPM 60 and the work request in the distributed resource manager 63. However, for the case where the resource manager 63 is not distributed, the adapter need not get involved with a work request. For this case, since resource manager 63 and SPM 60 are in the same application execution environment 52, the resource manager 63 may directly invoke the registration facility in SPM 60.

In the illustrated example of FIGURE 12, application 56A makes multiple work requests. They are processed by system 50A concurrently and involve more than one resource manager and resource. Specifically for the example, application 56A makes eight work requests for two work units, C and D, that are processed concurrently by system 50A. The commit points, shown in FIGURE 13, are at times 19 and 44 for work unit C and at time 33 for work unit D. The time units in FIGURE 13 are logical clock units denoting sequence (not physical clock units). In the illustration of FIGURE 13, events occurring at the same time imply that their order is not important.

A work unit is an application's understanding, or scope, of which resources participate in a synchronization point. An application can specify for which work unit changes to protected resources are made. An application can also specify under what work unit protected conversations are initiated. System 50A permits multiple work units in the application execution environment (52A in FIGURE 12). Specifically, applications, sync point manager 60A, and protected adapters (e.g., SQL Resource Adapter in FIGURE 12) can support multiple concurrent work units. System 50A also permits tying together the work units of two application execution environments via a protected conversation. Each work unit can have a series of synchronization points. A synchronization point request to a work unit does not affect activity on other work units in an application's environment.

Consider the following example illustrated in FIGUREs 12 and 13. Mr. Jones of Hometown wishes to make a transfer to his son's trust fund. The security department for Mr. Jones' bank keeps track of all people involved in any transaction, including both customers and employees. The security log and financial records are not in a mutual "all or nothing" embrace, but the two work units may need to be processed concurrently; one reason could be that response time would be too slow if the two work units were processed serially. In the illustrated example, the work request for work unit C at time 1 involves resource manager 63A, which controls the security log in the bank's headquarters in Chicago. Unprotected conversation 1 is used by resource adapter 62A to communicate with resource manager 63A. The work request for work unit D at time 1 also involves resource manager 63A in Chicago for Mr. Jones' trust fund, while the request at time 7 is to resource manager 63B in Hometown, where Mr. Jones' other financial records are kept. Unprotected conversation 2 is used by resource adapter 62A to communicate with resource manager 63A, and unprotected conversation 3 is used by resource adapter 62B to communicate with resource manager 63B.

When application 56A writes its first record, a "start security event" message, using work unit C (Step 612 in FIGURE 14), resource manager 63A registers via its resource adapter 62A in application execution environment 52A. Sync point manager 60A builds a registry entry for resource manager 63A in FIGURE 12 table 126 under work unit C (Step 614). This entry contains the parameter list to pass to the exit for resource adapter 62A, which includes the routine name of the exit and a special and private value that resource adapter 62A passed on registration. The resource adapter exit can use the special value to locate its control block for conversation 1.

Consequently, when application 56A requests a commit at time 19 for work unit C, sync point manager 60A reads table 126 to determine which resource adapter exits should be notified to initiate the commit procedure. In the illustrated embodiment, at time 19 when commit is requested for work unit C, synchronization point manager 60A calls the exit routine for resource adapter 62A to initiate a one-phase commit procedure since only one protected resource is registered; resource adapter 62A's exit routine knows to use conversation 1 to communicate with resource manager 63A since it receives from synchronization point manager 60A the special value saved in table 126 during registration. Registration is subsequently avoided (Step 613) at time 26 when logging the employee id of the bank clerk handling Mr. Jones' transaction. Re-registration is not required because sync point manager 60A already knows, from the work unit registration table 126, that resource manager 63A is participating in work unit C. Consequently, the processing of each work request for work unit C after the first work request and the subsequent commit at time 44 is expedited. Also, at each synchronization point for work unit C, only resource adapter 62A and resource manager 63A are notified; there is no time wasted notifying other resource adapters or other resource managers.

When application 56A makes work requests at times 1 and 7 under Work Unit D, both resource adapters 62A and 62B register with sync point manager 60A, which adds registry entries 63A and 63B to table 127. When the first security log commit is done at time 19, the trust fund update at time 17 is not affected in any way. When the trust fund and financial records are committed at time 33, the clerk-id message is not affected either. Note that resource manager 63A in Chicago is not confused since it is communicating on two separate conversations, 1 and 2, to application 56A.
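
The per-work-unit registration tables in this example can be sketched as follows. This is an illustrative model only: the class and function names are hypothetical, the conversation number plays the role of the adapter's private value, and every registered entry is treated as a changed resource.

```python
class SyncPointManager:
    """Hypothetical model of tables 126 and 127: one registry per work unit."""

    def __init__(self):
        self.tables = {}   # work unit -> {resource manager: (exit, private value)}

    def register(self, work_unit, resource_manager, exit_routine, private_value):
        table = self.tables.setdefault(work_unit, {})
        if resource_manager in table:
            return False                 # already known: skip re-registration
        table[resource_manager] = (exit_routine, private_value)
        return True

    def commit(self, work_unit):
        table = self.tables.get(work_unit, {})
        # Simplification: every registered entry counts as a changed resource.
        procedure = "one-phase" if len(table) <= 1 else "two-phase"
        for exit_routine, private_value in table.values():
            exit_routine(procedure, private_value)
        return procedure


notified = []   # (conversation, procedure) pairs, in call order


def adapter_exit(procedure, conversation):
    # The private value lets the exit locate its conversation.
    notified.append((conversation, procedure))


spm = SyncPointManager()
spm.register("C", "63A", adapter_exit, 1)   # security log, conversation 1
spm.register("D", "63A", adapter_exit, 2)   # trust fund, conversation 2
spm.register("D", "63B", adapter_exit, 3)   # financial records, conversation 3
again = spm.register("C", "63A", adapter_exit, 1)   # later request on work unit C
result_c = spm.commit("C")
result_d = spm.commit("D")
```

Committing work unit C reaches only the exit for conversation 1, and committing work unit D reaches only conversations 2 and 3, so neither work unit disturbs the other.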

The development of a resource adapter is simplified because system 50A knows which work units are active for the resource manager, relieving the resource adapter of that task. Since the design is simple, the resource adapter exit performs well; it has everything it needs and simply sends sync point manager 60A's actions to its resource manager. Another performance perspective is that sync point manager 60A can optimize synchronization point procedures because it knows for which work units the resource manager is active, avoiding the overhead of calling resource adapters or resource managers for resources which are not involved in synchronization points.

In system 50A, there may be occasions when the type of work request made on a protected resource, such as a shared file or database, changes the state of the resource such that the registration information should be changed. This is important because an original work request may be a read-only request and require only a one-phase commit procedure, but a subsequent related work request under the same work unit may be a write request and require a two-phase commit procedure in order to coordinate the multiple protected resources involved.

As another example illustrated in FIG. 3, an application 56A typically makes one or more read requests on a file before making a write request in order to locate a particular record in the file to update. Such read operations can be implemented using a one-phase commit procedure, in which case, upon receipt of the read work request by resource adapter 62A (Step 500), the resource adapter registers with syncpoint manager 60A for read mode (Step 502). It should be noted that during subsequent read operations, the resource adapter 62A need not interact with syncpoint manager 60A because there is no change in the type of commit procedure that is required. However, when application 56A subsequently makes a write request to resource adapter 62A under the same work unit (Step 504), resource adapter 62A changes its registration status with syncpoint manager 60A to write mode. As described in more detail below, the rather time-consuming two-phase commit procedure will be used if more than one protected resource is registered for write mode on the same work unit.

This example of registration change is illustrated in detail by the flow chart of FIG. 11. When the work request in step 580 is the first one for the protected resource and the request is read-only, decision block 581 leads to decision block 582. It should be noted that the resource adapter 62A keeps an internal indicator for each resource under each work unit for which it has already registered. This indicator is tested in decision block 581. The resource is not a protected conversation, therefore decision block 582 leads to decision block 583. Because the work is read-only, decision block 583 leads to step 585. In step 585, the corresponding resource adapter 62A registers as a read-only resource. When the next work request to step 580 is to write into, or update, the same resource under the same work unit, decision block 581 leads to decision block 584 because the resource adapter 62A previously registered in step 585, albeit for read mode. Decision block 584 leads to decision block 586 because the resource is not a protected conversation, and decision block 586 leads to decision block 588 because the request is for update mode. Next, decision block 588 leads to step 590 where the resource adapter 62A (which had previously registered in step 585 for read mode) changes its registration within syncpoint manager 60A to write mode. It should be noted that according to FIG. 11, if the first work request under a work unit for the resource is write mode, then the resource adapter 62A registers for write mode in step 592.
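
The read-to-write registration change of FIG. 11 can be sketched as a single function. The function name, the dictionary used as the adapter's internal indicator, and the return strings are hypothetical; the protected-conversation branches (decision blocks 582, 584 and 586) are omitted from this sketch.

```python
def on_work_request(registry, work_unit, resource, is_write):
    """Update the registration for one work request.

    registry maps (work unit, resource) to "read" or "write" and stands in
    for the adapter's internal indicator plus the syncpoint manager's entry.
    """
    key = (work_unit, resource)
    current = registry.get(key)              # decision block 581
    if current is None:
        # First request: step 592 for write mode, step 585 for read mode.
        registry[key] = "write" if is_write else "read"
        return "registered"
    if current == "read" and is_write:       # decision block 588
        registry[key] = "write"              # step 590
        return "changed"
    return "unchanged"                       # no call to the syncpoint manager


registry = {}
steps = [
    on_work_request(registry, "Y", "78A", is_write=False),  # first read
    on_work_request(registry, "Y", "78A", is_write=False),  # second read
    on_work_request(registry, "Y", "78A", is_write=True),   # first write
    on_work_request(registry, "Y", "78A", is_write=True),   # second write
]
```

Only the first read and the first write touch the registration; the repeated requests leave it alone, which is the saving the text describes.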

There is also the situation of a resource manager 63 which has completed a sync point and has had no further requests since completing that sync point. Its resource adapter 62 is allowed to modify its registration status to "suspended", at the completion of a sync point procedure, so that the sync point manager 60 will know that resource manager 63 is currently not participating in any sync points for the work unit. The suspension of a write mode resource may permit sync point manager 60 to optimize a subsequent commit procedure (one-phase commit) for the remaining resources when, for example, there is only one other write mode resource in the work unit. If the suspended resource adapter 62 receives a new work request for the work unit, it can reactivate its registration through the same registration modification function.
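
The effect of a "suspended" registration on the next commit can be sketched as follows, assuming a hypothetical `WorkUnitRegistry` class in which a suspended entry is simply ignored when write mode resources are counted.

```python
class WorkUnitRegistry:
    """Hypothetical registrations for a single work unit."""

    def __init__(self):
        self.entries = {}   # resource -> {"mode": str, "suspended": bool}

    def register(self, resource, mode):
        self.entries[resource] = {"mode": mode, "suspended": False}

    def suspend(self, resource):
        # Called at the completion of a sync point procedure.
        self.entries[resource]["suspended"] = True

    def reactivate(self, resource):
        # Called when a new work request arrives for the work unit.
        self.entries[resource]["suspended"] = False

    def commit_procedure(self):
        writers = [
            name for name, entry in self.entries.items()
            if entry["mode"] == "write" and not entry["suspended"]
        ]
        return "two-phase" if len(writers) > 1 else "one-phase"


unit = WorkUnitRegistry()
unit.register("78A", "write")
unit.register("78B", "write")
before = unit.commit_procedure()
unit.suspend("78B")
while_suspended = unit.commit_procedure()
unit.reactivate("78B")
after = unit.commit_procedure()
```

With both write mode resources active the sketch selects two-phase commit; while one is suspended, the remaining single writer allows one-phase commit.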

The designs of certain resource managers require that their resource adapters register early in their interaction with the application in order to be notified of distributed sync point activities. However, they may not have a complete set of registration information at that time. For example, the protected conversation adapter 64A needs to register at the point that it initiates a protected conversation with a partner application 56D because it needs to know if a sync point occurs, yet it will not have all registration information until the conversation partner accepts the conversation, an event which may occur much later. This information can be added later under the foregoing change of registration process illustrated in step 590.

System 50 provides additional time-saving techniques in the registration process. When each resource adapter 62 registers a first time with syncpoint manager 60, it registers information in addition to the identification of the resource manager 63 and the resource adapter exit routine name for sync point processing. Much of this additional information usually does not change when the registration changes. Consequently, this additional information is not re-registered when the registration changes in step 590 for a resource adapter 62. The following is a list of some of the additional information which the resource adapter 62 registers only once with the syncpoint manager and which does not change when other registration information changes:

1. Resource and network identifiers which describe where the resource manager and resource are located in the system and the network;

2. Product identifier which indicates the product and thus the type of resource, e.g., shared file, database, protected conversation, etc.; and

3. Additional data which is required for resynchronization.

Because this additional information is not re-registered each time, the registration process is expedited.
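
A minimal sketch of this split between register-once information and changeable registration state is shown below. The field names and sample values are hypothetical, chosen to mirror the three list items above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StaticInfo:
    """Information registered only once (field names are assumptions)."""
    resource_id: str
    network_id: str
    product_id: str
    resync_data: bytes


@dataclass
class Registration:
    static: StaticInfo   # never re-sent on a registration change
    mode: str            # "read", "write", ...

    def change_mode(self, new_mode):
        # Only the changeable part is touched (compare step 590).
        self.mode = new_mode


info = StaticInfo("78A", "NET1", "shared-file", b"log-name")
reg = Registration(static=info, mode="read")
reg.change_mode("write")
```

The registration change replaces only `mode`; the frozen `StaticInfo` object is the same one supplied at first registration.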

There are a variety of occasions when an application can or will no longer use a protected resource. Examples include such events as end of application, termination of a resource manager, or unavailability of the path to the resource manager. There may be application/resource manager protocols which allow the application to declare a resource to no longer be in use. The application execution environment may support protocols which make it appropriate to unregister resources prior to end of application. Protected conversations may also terminate due to application action or due to an error condition such as a path failure. Upon any such occasion, it is preferable for the resource adapter or protected conversation adapter to unregister all applicable instances of the resource from the syncpoint manager because such unregistration will make subsequent syncpoint processing more efficient (fewer resources to consider and probably less memory consumed) (step 618 of FIGURE 14). In addition, the resource adapter or protected conversation adapter can delete any control information about the registered resource and thus be more efficient in its subsequent processing.

FIGURE 15 shows the flow of unregistration activity when a resource adapter 62 or a protected conversation adapter 64 discovers that a resource 78 or protected conversation is not available (step 904) or that the application has ended (step 903). Note that the adapter would typically discover that the resource is not available while processing an application work request (step 902). The adapter would determine from its own resource registration status information what registered resources should be unregistered (step 906). For each such registered resource, the adapter would call the syncpoint manager 60 to unregister the resource (step 907). Note that the adapter must identify the resource and the work unit to the syncpoint manager 60.

In FIGURE 15, for each call to syncpoint manager 60 (step 910), the syncpoint manager 60 uses the adapter-supplied work unit identifier to locate the work unit resource table (step 911). Within this work unit resource table, the syncpoint manager 60 uses the adapter-supplied resource identifier to locate the desired resource entry (step 912). The syncpoint manager 60 then flags the resource entry as unregistered (step 913) and returns to the calling adapter (step 914 back to step 907). However, the syncpoint manager 60 cannot yet erase the unregistered resource entry because the resource entry logically contains error information which must be preserved until the next synchronization point (see "Coordinated Handling of Error Codes and Information Describing Errors in a Commit Procedure").

The adapter can now delete its control information (or otherwise mark it as unregistered) about the unregistered resource (step 908). Note that an event which causes unregistration may cause multiple resource registrations to be deleted (for example, a resource may be registered for multiple work units). Thus, steps 906, 907, and 908 can be a program loop to handle each applicable previously registered resource. At this point, the adapter can return to its caller (step 909). If the work request has failed due to an unavailable resource, the adapter can report the error condition to the application by whatever mechanism the resource adapter has chosen to return error information to its application users.

The resource adapter may have other processing considerations as a result of the unavailable resource or the application termination. For example, if the unavailable resource condition will cause the backout of resource updates, the adapter will need to notify the application and/or the syncpoint manager 60 that the next syncpoint on the applicable work unit(s) must be a backout. This condition during syncpoint processing requires the adapter to notify syncpoint manager 60 of the resource status (which is backing out). There may be other resource, environment, or implementation dependencies.

Syncpoint manager 60 is now concerned with handling the flagged unregistered resources (from step 913) so that they are ignored for normal operation and so that they are eventually erased. Syncpoint manager 60 can erase flagged unregistered resource entries at the beginning of the next syncpoint for the affected work unit. FIGURE 16 describes the syncpoint process flow within syncpoint manager 60. When the next syncpoint process reads the registered resource table (step 622), it can erase any flagged unregistered resource entries in that table (an action not shown in FIGURE 16). Because step 622 builds all syncpoint resource participation lists for the duration of the current syncpoint process, resource unregistrations and modifications of resource registry entries by adapters will not affect the current syncpoint process. At this point, the total unregistration process is complete.
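The flag-now, erase-later handling of unregistered resources can be sketched as follows. This is a simplified illustration, not the described implementation: the class `SyncpointManager`, its method names, and the dictionary layout are assumptions made for the example.

```python
class SyncpointManager:
    def __init__(self):
        # work unit identifier -> {resource identifier -> entry}
        self.work_units = {}

    def register(self, work_unit, resource):
        table = self.work_units.setdefault(work_unit, {})
        table[resource] = {"unregistered": False, "error_info": None}

    def unregister(self, work_unit, resource):
        # Steps 911-913: locate the work unit resource table, locate the
        # entry, and flag it. The entry is kept because it may still hold
        # error information needed until the next synchronization point.
        self.work_units[work_unit][resource]["unregistered"] = True

    def begin_syncpoint(self, work_unit):
        # Step 622: read the table, erasing flagged entries on the way.
        table = self.work_units[work_unit]
        for resource in [r for r, e in table.items() if e["unregistered"]]:
            del table[resource]
        # Return a snapshot, so later registry changes by adapters do not
        # affect the syncpoint that is already in progress.
        return list(table)
```

Returning a copy of the participant list, rather than a view of the table, is the design choice that isolates the current syncpoint from concurrent registration changes.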

OPTIMIZATION OF COMMIT PROCEDURES

Each participating resource manager is capable of performing the two-phase commit procedure, such as the two-phase commit procedure described by System Network Architecture LU 6.2: Peer Protocols, SC31-6808, Chapter 5.3 Presentation Services - Sync Point verbs, and may or may not be capable of performing the one-phase commit procedure. The two-phase commit procedure is important to protect resources; however, the two-phase commit procedure is a relatively complex and time-consuming process compared to the one-phase commit procedure. For example, as described in more detail below, the two-phase commit procedure requires the time-consuming step of logging information about the sync point participants in the recovery facility log 72 (FIG. 2), whereas the one-phase commit procedure does not require such logging. Also, the two-phase commit procedure requires two invocations of the resource adapter coordination exit to perform the commit, whereas the one-phase commit procedure requires only one such invocation to commit data. A "resource adapter coordination exit" is the mechanism for the sync point manager 60 (FIG. 2) to provide information to the associated resource manager. The sync point manager utilizes the two-phase commit procedure only when necessary, so that the system operates as expeditiously as possible. In summary, the sync point manager utilizes the two-phase commit procedure whenever a protected conversation is involved, or at least two resources are in update mode, or one or more participating resource managers is not capable of performing the one-phase commit procedure. Whenever all resources are capable of performing the one-phase commit procedure and no more than one resource is in update mode, the sync point manager utilizes the one-phase commit procedure. Also, if any resource is in read-only mode, such that the data in the resource is read and not updated, and the resource manager is capable of performing the one-phase commit procedure, then a one-phase commit procedure is used for this resource regardless of the type of commit procedure used for the other resources. A key component of this optimization is the ability of the resource manager and the resource adapter to determine, prior to the synchronization point, the state of the resource as defined by the work request, that is, whether the resource is in read-only mode or in update mode. When a resource is in read-only mode, the application has only read data from the resource. When a resource is in update mode, the application has changed the data in the resource.
The optimization process begins as follows. Application 56 (FIG. 2) makes a work request to a resource (step 612 of FIG. 14). If this is the first work request for a particular work unit (decision block 613 in FIG. 14), the resource adapter 62 (FIG. 2) associated with the resource registers with the synchronization point manager the fact that it is now an active, participating resource for the work unit (step 615 in FIG. 14). One of the pieces of information about the resource that must be provided at registration time (step 616 in FIG. 14) is whether the associated resource manager is capable of performing the one-phase commit procedure, e.g., is the resource a database manager which under certain circumstances could perform a one-phase commit procedure. Also during registration, the resource adapter records with the sync point manager whether the work request made by the application placed the resource in the read-only mode or update mode (step 616 in FIG. 14).

After the initial registration of a resource, subsequent work requests made by the application against that resource may change the state of the resource. That is, the resource may change from read-only to update mode. When these changes occur, the resource adapter must inform the sync point manager about these changes, and the registration information is updated to reflect the new state (step 619 in FIG. 14).

If the work request from the application is for a protected conversation, the registration entry for the protected conversation adapter will always show that the protected conversation adapter is in update mode and that it is not capable of performing a one-phase commit procedure. Since the protected conversation adapter represents a communication path to another application execution environment, which may involve a plurality of resources, it is not possible for the protected conversation adapter to determine accurately if it represents a communication path to read-only mode resources or to update mode resources. Therefore, the presence of a communication path to another application execution environment requires the two-phase commit procedure, to provide the necessary protection of the critical resources. The protected conversation adapter ensures that the two-phase commit procedure will be used by registering as an update mode resource that is not capable of performing the one-phase commit procedure.

After the application has completed all its work, it will attempt to either commit or back out the data at the resources. To accomplish this, the application issues a sync point request to the sync point manager. To start processing the sync point request (step 620 in FIG. 16), the sync point manager reads the work unit table to find the entry for the affected work unit (step 621 in FIG. 16). For more information on work units, see Local and Global Commit Scopes Tailored To Work Units. Once the correct work unit entry is located, the sync point manager reads the information in that entry about the resources registered for that work unit and creates three lists of resources (step 622 in FIG. 16).

Each of these lists has a different meaning. The read-only list contains those resources whose data has only been read by the application. The update list contains those resources whose data has been changed by the application and those resources that are in read-only state but whose resource manager is not capable of performing the one-phase commit procedure. The initiator list contains the list of communication partners that have sent a message that they want to synchronize updates to resources. Each resource may appear in only one of the lists.

In practice, the registration for each resource includes two flags which are read by the sync point manager and used to determine if a resource should be entered into the update list or the read-only list. The first flag is on when the resource is in read-only mode, and is off when the resource is in update mode. The second flag is on when the resource supports both the one-phase commit procedure and the two-phase commit procedure, and is off when the resource is capable of performing only the two-phase commit procedure. In practice, the registration for each resource also includes a field that contains information about whether this resource adapter received a message from a communication partner indicating that it wants to synchronize updates to resources. The sync point manager reads this field and uses the data to determine if the resource should be entered into the initiator list.
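The construction of the three lists from the two flags and the initiator field can be sketched as follows. The `Entry` type and the name `build_lists` are hypothetical; checking the initiator field first is an assumption made here so that each resource lands in exactly one list.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    name: str
    read_only: bool                  # first flag: on in read-only mode
    one_phase_capable: bool          # second flag: on if one-phase is supported
    partner_initiated: bool = False  # partner asked to synchronize updates


def build_lists(entries):
    """Step 622: split registered resources into three disjoint lists."""
    read_only, update, initiator = [], [], []
    for e in entries:
        if e.partner_initiated:
            initiator.append(e.name)
        elif e.read_only and e.one_phase_capable:
            read_only.append(e.name)
        else:
            # Update mode, or read-only but limited to two-phase commit.
            update.append(e.name)
    return read_only, update, initiator
```

Note that a read-only resource whose resource manager supports only the two-phase procedure falls into the update list, as the text above specifies.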

Once the lists of resources have been built, the sync point manager examines the sync point request type (decision block 623 in FIG. 16). If the sync point request is to back out, the sync point manager performs backout processing as follows. First, all the resource adapters in the update list, if any, are told to back out the changes to their resource (step 626 in FIG. 16). Then, all the resource adapters in the read-only list, if any, are told to back out the effects on their resource (step 627 in FIG. 16). It should be noted that the processing of a "backout" for a read-only resource is defined by the resource implementation, since there are no changes to the actual data in the resource to be backed out. For example, processing for a backout of a read-only file in a shared file resource manager 63 (FIG. 2) could include closing the file and discarding any file positioning information previously maintained for the application's use. After the read-only resources are told to back out, then all the resource adapters in the initiator list, if any, are told that this application execution environment backed out the changes for this synchronization point (step 628 in FIG. 16).
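The ordering of backout notifications can be sketched as follows. The function name `back_out` and the `notify` callback are hypothetical; the sketch shows only the order of steps 626, 627, and 628, not what each adapter does on backout.

```python
def back_out(update, read_only, initiator, notify):
    """Tell every participant about the backout, in the order of FIG. 16."""
    for adapter in update:       # step 626: back out changes
        notify(adapter, "backout")
    for adapter in read_only:    # step 627: back out effects (resource-defined)
        notify(adapter, "backout")
    for adapter in initiator:    # step 628: report that this environment backed out
        notify(adapter, "backed-out")
```

For example, passing `notify=lambda a, m: print(a, m)` prints the notifications in the order in which they would be issued.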
If instead the sync point request is to commit (decision block 623 in FIG. 16), then the sync point manager starts the optimization process for the commit. The first step in the optimization process is to determine if the initiator list is not empty (decision block 624 in FIG. 16). If the initiator list is not empty, this means that this application execution environment is a cascaded initiator in the sync point tree, and that the full two-phase commit procedure must be used for this commit. This is necessary because neither application execution environment knows the full scope of the sync point tree, that is, how many resources are active and in update mode for this synchronization point. Since the number is not known, the two-phase commit procedure must be used, to provide the necessary protection of these critical resources.

If the initiator list is empty (decision block 624 in FIG. 16), the next step is to determine if more than one resource is in the update list (decision block 625 in FIG. 16). If this is true, then the full two-phase commit procedure must be used for this commit. The two-phase commit procedure provides more protection for the update mode resources, because no resource commits its changes until all resources have voted that they can commit their changes.

If there are fewer than two resources in the update list (decision block 625 in FIG. 16), the next step is to determine if there are zero or one resources in the update list (decision block 640 in FIG. 16). If there are zero resources in the update list, then the one-phase commit procedure will be used to commit the read-only resources. Likewise, if there is exactly one resource in the update list, and its resource manager is capable of performing the one-phase commit procedure, then the one-phase commit procedure will be used.
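The decision sequence of blocks 624, 625, and 640 can be sketched as follows. The function name and the tuple layout of the update list are assumptions for illustration; the final fallback to two-phase for a single update resource that is not one-phase capable follows from the summary rule given earlier in this section.

```python
def choose_commit_procedure(update, initiator):
    """update: list of (name, one_phase_capable); initiator: list of names."""
    if initiator:              # block 624: cascaded initiator in the tree
        return "two-phase"
    if len(update) > 1:        # block 625: more than one update resource
        return "two-phase"
    if not update:             # block 640: only read-only resources
        return "one-phase"
    _, one_phase_capable = update[0]
    return "one-phase" if one_phase_capable else "two-phase"
```

For instance, a single one-phase-capable database in update mode with no protected conversation yields `"one-phase"`, while the same database with any entry in the initiator list yields `"two-phase"`.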

The one-phase commit procedure starts by the sync point manager telling the resource adapters in the update list, if any, to commit their changes (step 641 in FIG. 16). It should be noted that the one-phase commit of data by the resource manager is achieved by only one invocation of the resource adapter, in contrast with the two invocations needed during the two-phase commit procedure. Since there can be only zero or one resources in update mode in the entire synchronization point, there is no chance of data inconsistency caused by different decisions for different resources. Also note that during the one-phase commit procedure, there is no writing to the recovery facility log 72 (FIG. 2), as opposed to the required logging that is part of the two-phase commit procedure (steps 644, 648, 651, 658, 659 of FIG. 17). The one-phase commit procedure ends with the sync point manager telling the resource adapters in the read-only list, if any, to commit their changes (step 642 in FIG. 16). It should be noted that a "commit" of a read-only resource is defined by the resource implementation, since there are no actual changes to the data to be committed. For example, some shared file resource managers 63 (FIG. 2) provide read consistency, so when an application reads a file in a shared file resource manager, the application is provided with a consistent image of the file, that is, changes made to the file by other application environments will not interfere with the reading of the contents of the file, as they existed at the time the file was opened. When the application opens the file with the intent of read, the image is created by the resource manager, which is considered to be a read-only resource. When the application is done reading the file, it closes the file and attempts a commit. When the shared file resource manager performs the commit as a read-only resource, it could discard the image maintained for the application's use. Now, if the application opens the file again, it will see an image of the file which contains all committed updates made by other applications.
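A minimal sketch of the one-phase commit procedure follows. The `Adapter` class stands in for a resource adapter coordination exit and is hypothetical, as is the `log` list standing in for the recovery facility log; the point is that each adapter is invoked once and the log is never written.

```python
class Adapter:
    """Stand-in for a resource adapter coordination exit (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.calls = []

    def exit(self, action):
        self.calls.append(action)
        return "ok"


def one_phase_commit(update, read_only, log):
    if len(update) > 1:
        raise ValueError("one-phase commit allows at most one update resource")
    for adapter in update:       # step 641: single invocation per adapter
        adapter.exit("commit")
    for adapter in read_only:    # step 642: resource-defined "commit"
        adapter.exit("commit")
    # The recovery facility log (`log`) is deliberately left untouched.
    return "committed"
```

Compared with the two-phase procedure, this saves one exit invocation per resource and all of the log writes.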

If the sync point request results in a two-phase commit procedure according to the outcome of decision blocks 624, 625, or 640 of FIG. 16, the sync point manager 60 (FIG. 2) still optimizes the commit of the read-only resources. There are several parts to this optimization for the read-only resources. First (step 644 of FIG. 17), information about the read-only resources is not written to the recovery facility log 72 (FIG. 2). Information about the read-only resources does not have to be logged at the recovery facility 70 (FIG. 2) because the read-only resources will never log the state of "In-doubt" on their own logs. This means that the resource manager will never attempt to resynchronize with the recovery facility 70 (FIG. 2), so the recovery facility does not need any knowledge about the resource. Second, the read-only resources are not involved in the first phase of the commit, which is sending Prepare to all resource adapters in the update list (step 645 of FIG. 17). The actions of a read-only resource cannot affect the protection of the resources, since in terms of data consistency, a backout is equivalent to a commit for a read-only resource.

The only time that the read-only resources are involved in the two-phase commit procedure is when they are told the final direction of the commit, that is, they are told whether to commit their changes (step 653 of FIG. 17) or told to back out their changes (step 655 of FIG. 17).
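The read-only optimization inside a two-phase commit can be sketched as follows. This is a simplified illustration: the `Adapter` class, the vote string `"request-commit"`, and the shape of the log records are assumptions, and the state records written between the phases are reduced to a single decision record.

```python
class Adapter:
    """Stand-in for a resource adapter coordination exit (hypothetical)."""

    def __init__(self, name, vote="request-commit"):
        self.name = name
        self.vote = vote
        self.calls = []

    def exit(self, action):
        self.calls.append(action)
        return self.vote if action == "prepare" else "ok"


def two_phase_commit(update, read_only, log):
    # Step 644: only update resources are recorded in the log.
    log.append(("SPM Pending", [a.name for a in update]))
    # Step 645: Prepare goes to the update list only.
    votes = [a.exit("prepare") for a in update]
    decision = "commit" if all(v == "request-commit" for v in votes) else "backout"
    log.append(("decision", decision))
    for adapter in update:
        adapter.exit(decision)
    # Read-only resources hear only the final direction (step 653 or 655).
    for adapter in read_only:
        adapter.exit(decision)
    return decision
```

Each read-only adapter is thus invoked once per synchronization point, while each update adapter is invoked twice.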


The following is an example of a two-phase commit procedure involving three different application execution environments, which are part of a system such as System 50 (FIG. 2). Each application execution environment is executing a different application. Application A and Application B are communicating via a protected conversation; Application B and Application C are communicating via a protected conversation. The two-phase commit procedure is started when Application A attempts to commit by issuing a commit request B1 (FIG. 18) to the sync point manager which is currently running in the same execution environment as Application A. Phase one starts when the sync point manager writes the SPM Pending log record to the recovery facility log B2 (FIG. 18). The SPM Pending log record contains the logical unit of work identifier for the synchronization point and information about the synchronization point participants; in this case, the SPM Pending record shows one participant, Application B.

After the SPM Pending log record is successfully written to the recovery facility log, the sync point manager sends a prepare message via the protected conversation adapters to Application B. Application B is notified that its conversation partner Application A wishes to synchronize resources, and Application B subsequently issues a commit request B3 (FIG. 18) to the sync point manager which is currently running in the same execution environment as Application B.

For the sync point manager at B, the first phase of the two-phase commit procedure starts by writing the SPM Pending record to the recovery facility log B4 (FIG. 18). The SPM Pending record contains the logical unit of work identifier for the synchronization point and information about the synchronization point participants. In this case, the SPM Pending log record contains information about Application A, showing it as the synchronization point initiator, and Application C as a synchronization point participant. Once the SPM Pending log record is successfully written to the recovery facility log, the sync point manager sends a prepare message via the protected conversation adapters to Application C. Application C is notified that its conversation partner Application B wishes to synchronize resources, and Application C subsequently issues a commit request B5 (FIG. 18) to the sync point manager which is currently running in the same execution environment as Application C.

The sync point manager at C starts the first phase of the two-phase commit procedure by writing the SPM Pending record to the recovery facility log B6 (FIG. 18). The SPM Pending record contains information about the synchronization point participants and the logical unit of work identifier for the synchronization point. In this instance, the SPM Pending record contains information about Application B, which is the synchronization point initiator. The SPM Pending record also shows that there are no synchronization point participants for Application C.

Since there are no more participants, there is no need for the sync point manager at C to send a prepare message via any protected conversation adapter. The sync point manager at C then sends a state record to the recovery facility, updating the state of the sync point to Agent, In-Doubt B7 (FIG. 18). Once the state record is successfully written to the recovery facility log, the sync point manager at C responds to the prepare message by sending a request commit message via the protected conversation adapters to the sync point manager at B.

The sync point manager at B receives the request commit message from the sync point manager at C via the protected conversation adapters. Since only request commit messages were received, the next step is to send a state record to the recovery facility, updating the state of the synchronization point to Agent, In-Doubt B8 (FIG. 18). Once the state record is successfully written to the recovery facility log, the sync point manager at B responds to the prepare message from A by sending a request commit message via the protected conversation adapters to the sync point manager at A.

The sync point manager at A receives the request commit message from the sync point manager at B, which completes the first phase of the synchronization point. The sync point manager must then make the decision, as the synchronization point initiator, whether to commit or back out the logical unit of work. Since only request commit messages were received by the sync point manager at A, the sync point manager at A will decide to commit the logical unit of work. The second phase of the two-phase commit procedure starts by the sync point manager recording this decision by sending a state record to the recovery facility. The state record changes the state of the synchronization point to Initiator, Committed B9 (FIG. 18). Once the state record is successfully written to the recovery facility log, the sync point manager sends a committed message via the protected conversation adapters to the sync point manager at B.


The sync point manager at B receives the committed message, which completes the first phase of the two-phase commit procedure. The second phase is started when the sync point manager sends a state record to the recovery facility, updating the state of the synchronization point to Initiator-Cascade, Committed B10 (FIG. 18). The sync point manager at B then sends a committed message to the sync point manager at C via the protected conversation.

The sync point manager at C receives the committed message, which completes the first phase of the two-phase commit procedure. The sync point manager at C starts the second phase by sending a state record to the recovery facility, updating the state of the synchronization point to Initiator-Cascade, Committed B11 (FIG. 18). Since there are no more participants to receive the committed message, the sync point manager at C is finished with the synchronization point. To record this, the sync point manager at C sends a state record to the recovery facility, updating the state of the synchronization point to Forget B12 (FIG. 18). This state tells the recovery facility that all records written by the sync point manager at C for the logical unit of work identifier are no longer needed and can be erased. After the state record is successfully written to the recovery facility log, the sync point manager at C responds to the committed message by sending a forget message to the sync point manager at B via the protected conversation adapters, which ends the second phase of the two-phase commit procedure for the sync point manager at C. After the forget message is sent, the sync point manager at C returns control to Application C, with an indication that the synchronization point has completed successfully.

The sync point manager at B receives the forget message from the sync point manager at C via the protected conversation adapters. The receipt of the forget message indicates that the sync point manager at B has completed the synchronization point. To record this, the sync point manager at B sends a state record to the recovery facility, updating the state of the synchronization point to Forget B13 (FIG. 18). This state tells the recovery facility that all records written by the sync point manager at B for the logical unit of work identifier are no longer needed and can be erased. After the state record is successfully written to the recovery facility log, the sync point manager at B responds to the committed message by sending a forget message to the sync point manager at A via the protected conversation adapters, which ends the second phase of the two-phase commit procedure for the sync point manager at B. After the forget message is sent, the sync point manager at B returns control to Application B, with an indication that the synchronization point has completed successfully.

The sync point manager at A receives the forget message. The receipt of the forget message indicates that the sync point manager at A has completed the synchronization point. To record this, the sync point manager at A sends a state record to the recovery facility, updating the state of the synchronization point to Forget B14 (FIG. 18), which tells the recovery facility that all records written by the sync point manager at A for the logical unit of work identifier are no longer needed and can be erased. This ends the second phase of the two-phase commit procedure for the sync point manager at A, which means that the sync point has completed at every participant. After the state record is successfully written to the recovery facility log, the sync point manager at A returns control to Application A, with an indication that the synchronization point has completed successfully.
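The order of recovery facility log records in the three-environment example can be reproduced with a short simulation. This is a sketch of the all-votes-commit path only; the `TREE` table and the function names are hypothetical, and the state names follow the wording of the example above.

```python
# Each environment maps to the partners it sends Prepare to.
TREE = {"A": ["B"], "B": ["C"], "C": []}


def prepare(node, parent, log):
    log.append((node, "SPM Pending"))
    for child in TREE[node]:
        prepare(child, node, log)   # send Prepare, wait for Request Commit
    if parent is not None:
        # An agent logs In-Doubt before replying Request Commit upward.
        log.append((node, "Agent, In-Doubt"))


def commit(node, parent, log):
    if parent is None:
        log.append((node, "Initiator, Committed"))
    else:
        log.append((node, "Initiator-Cascade, Committed"))
    for child in TREE[node]:
        commit(child, node, log)    # send Committed, wait for Forget
    log.append((node, "Forget"))    # then reply Forget upward (if any parent)


def run(root="A"):
    log = []
    prepare(root, None, log)
    commit(root, None, log)
    return log
```

Calling `run()` yields eleven records whose order corresponds to log writes B2, B4, B6, B7, B8, B9, B10, B11, B12, B13, and B14 of FIG. 18; the commit requests B1, B3, and B5 are not log writes and so do not appear.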

COORDINATED HANDLING OF ERROR CODES AND INFORMATION DESCRIBING ERRORS IN A COMMIT PROCEDURE

Figures 29-32 illustrate components of system 50A which provide to application 56A a return code, if any resource or protected conversation reports an error or warning. Also, application 56A can request detailed error information from each resource and protected conversation. The detailed error information identifies the reporting resource and describes the reason for synchronization point errors or could be a warning about the synchronization point.

Application 56A is running in application execution environment 52A (see Figure 32) in system 50A. Resource adapter 62A is the adapter for a shared file resource manager 63A, resource adapter 62G is the adapter for SQL resource manager 63G, and protected conversation adapter 64A is the adapter for a protected conversation with system 50B via protected conversation adapter 64B. In this example, adapters 62A and 64A have the same product identifier since they are integral components of the system control program in system 50A; adapter 62G has a unique product identifier since it is part of a different product; adapters 62A and 64A have different resource adapter exit identifiers. For illustrative purposes, resource adapter 62G produces error blocks that are indecipherable to application 56A and has a prior art function to return detailed error information to application 56A.

In response to work requests (Step 651, Figure 29), adapters 62A, 62G, and 64A register with sync point manager 60 (Step 653). Sync point manager 60 creates registry objects 162A, 162B, and 162C, filling in the identifiers of the participating resources (shared file resource manager 63A, SQL resource manager 63G, and the protected conversation partner in system 50B). Also, the registration information includes the resource adapter exit routine names, product identifiers for the resources and protected conversation, and the required length of an error block for each resource. The resource adapter exit name is required when a product, such as the system control program in system 50A in this illustrated example, owns two resource types. The product identifier and the resource adapter exit name both identify the participating resource type, e.g., a shared file resource manager, a SQL resource manager, or a protected conversation. All resource adapters of the same resource type within an execution environment use error blocks from the same pool to reduce the paging set of the system 50A. (See Figure 31 for a graphical description.) If a resource asks in Step 653 (Figure 29) for an error block of the same size as another resource type, the error block pool is shared by both resources.

For each registrant (62A, 62G, and 64A), the parameter list to call a resource adapter exit is built by sync point manager 60; it contains the address and length of usable error information of the resource's error block. Placing the usable error information length in the registry entry results in system 50A's paging set being unaffected if no error occurs.
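Sharing of error block pools by block size can be sketched as follows. This is a loose illustration under stated assumptions: the class `ErrorBlockPools` and its methods are hypothetical, pools are keyed purely by requested length, and blocks are allocated lazily, which is one possible way to leave memory untouched when no error occurs.

```python
class ErrorBlockPools:
    def __init__(self):
        self._pools = {}   # block length -> list of free blocks

    def pool_for(self, length):
        # Registrants asking for the same length share one pool.
        return self._pools.setdefault(length, [])

    def acquire(self, length):
        pool = self.pool_for(length)
        return pool.pop() if pool else bytearray(length)

    def release(self, block):
        block[:] = bytes(len(block))          # clear before reuse
        self.pool_for(len(block)).append(block)
```

Two resource types that both request, say, 256-byte error blocks would draw from the same free list in this sketch.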

Next, application 56A requests a commit from sync point manager 60 (Step 654, Figure 29). If application 56A desires detailed information from shared file resource manager 63A in the event an error occurs during this synchronization point (a prior-art function of the shared file resource manager in system 50A), then application 56A transmits an error data address on the Commit verb (Step 654, Figure 29) of a data area in its execution environment to store a copy of the detailed error information. This area is used if resource manager 63A reports an error or warning. The sync point manager 60 receives the verb instead of the shared file resource adapter 62A, and the error data address is saved by the sync point manager 60. On completion of the synchronization point, all errors and warnings (stored in error block 66A, Figure 29) would be moved to application 56A's error data area (not shown). Thus, compatibility with the prior-art error-pass-back architecture of the shared file resource manager is preserved.

In Step 655 (Figure 29), sync point manager 60 passes each resource adapter (62A, 62G, 64A, shown in Figure 32) the address of its error block (objects 66A-C) saved in registry objects 162A-C that were built for each resource adapter when the resource adapter registered (Step 653). If there are no failures, the commit from Step 654 is complete, and sync point manager 60 reports back to application 56A the fact that the updates have been committed (Step 657).

However, if a resource detects errors or warnings, its adapter, 62A, 62G, or 64A (Step 670 in Figure 30), fills in the detailed error information using the error block 66A-C (Figure 29) as a place to store whatever is required by its design and updates the usable error length, which is an input/output parameter. Since a resource adapter exit can be called many times during a two-phase commit procedure, it can append error information to the error block if necessary; it may have three warnings and one severe error, for instance. It manages the usable error length itself (Step 672).
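Appending error records to an error block while maintaining the usable error length can be sketched as follows. The `ErrorBlock` class and `append_error` function are hypothetical; the record format is left opaque, since the text says the content is whatever the adapter's design requires.

```python
class ErrorBlock:
    def __init__(self, capacity):
        self.data = bytearray(capacity)   # fixed-size block from registration
        self.usable_length = 0            # input/output: bytes currently in use


def append_error(block, record):
    """Append one opaque record (bytes) after the existing ones."""
    end = block.usable_length + len(record)
    if end > len(block.data):
        raise ValueError("error block is full")
    block.data[block.usable_length:end] = record
    block.usable_length = end
```

An exit that is invoked several times during one synchronization point can call `append_error` on each invocation, so that several warnings and a severe error accumulate in one block.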

Sync point manager 60 receives from the resource adapter exit a single return code in a common format and proceeds with the two-phase commit procedure's logic (Step 673); sync point manager 60 neither knows nor cares about the contents of the error blocks 66A-C. If the two-phase commit procedure's logic dictates an error or warning, the sync point manager transmits a consolidated return code to application 56A (Step 657 in Figure 29 and 614 in Figure 30).

On receipt of the return code, application 56A asks for a detailed error block by calling a routine provided by sync point manager 60 (Step 676, Figure 30). In response, the error block manager (Function 690, Figure 32) within sync point manager 60 looks for a non-empty error block and moves it to application 56A's buffer. Other output parameters are the product identifier and resource adapter exit name for the owner of this error block. Application 56A then examines the product identifier. If the reporting product is the system control program in system 50A (decision block 678, Figure 30), then application 56A examines the resource adapter exit name to distinguish between the two system control program adapters. Now it can look at the error block for the resource name and the cause of failure (Step 680A or B). Mapping macros are provided by the system control program in system 50A for the shared file resource manager and for protected conversations to aid in reading error blocks. Also, a routine (Interaction 693, Figure 32) is provided by each adapter to reformat its error block into a convenient form, a parameter list. Existing applications using the shared file resource manager require no change since its error-pass-back method is unchanged. Protected conversations are new, so the compatibility objective is not violated for existing applications using communications.

If the product is a SQL resource manager (decision block 681, Figure 30), then the error block must be deciphered, assuming for illustration that it is not in a form which application 56A can presently understand. Thus, application 56A asks resource adapter 62G to identify the type of error in a form that application 56A can understand (Step 682). In response (Step 683), the SQL resource adapter 62G reads the error block from the sync point manager, using a routine very similar to the routine used by application 56A but specialized for resource adapters. Note that the SQL resource adapter 62G and application 56A are given unique tokens so that both can loop through the same error blocks without confusion. SQL resource adapter 62G reformats the data in error block 66C (Figure 29) to a form compatible with application 56A (Step 684, Figure 30), and then sends the reformatted detailed error information to application 56A (Step 685). It should be noted that only a minor internal change is required to this example of a pre-existing SQL resource adapter to participate in coordinated handling of error information, i.e., it must ask sync point manager 60 for its error blocks. No change is required by pre-existing applications if only one resource is updated by application 56A; the external appearance of the SQL resource adapter error-pass-back interface is preserved. Additional error codes indicating that application 56A is using a new function, coordinated synchronization point, are not considered an incompatibility.
Application 56A then queries sync point manager 60 to determine if there are additional error blocks (Step 676, Figure 30). If so (decision block 677), Steps 678-685 are repeated to obtain one or more additional error blocks from sync point manager 60. If there are no additional error blocks, decision block 677 leads to Step 688 in Figure 29, in which application 56A continues processing, either to pursue a different function or to attempt to correct the failure.
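The retrieval loop described above can be sketched as follows. This is a minimal illustration only: the class names, method names and the string tokens are hypothetical and are not taken from the patent. It assumes only what the text states, namely that each adapter appends to its own error block, that the caller is handed the next non-empty block together with a product identifier and exit name, and that each caller holds its own token so that the application and an adapter can loop over the same blocks independently.

```python
from dataclasses import dataclass, field


@dataclass
class ErrorBlock:
    """One adapter's error block (66A-C in the text); field names are illustrative."""
    product_id: str
    exit_name: str
    entries: list = field(default_factory=list)  # appended to across exit calls


class ErrorBlockManager:
    """Hypothetical stand-in for the error block manager (Function 690)."""

    def __init__(self):
        self._blocks = []
        self._cursors = {}  # caller token -> index of the next block to examine

    def register(self, product_id, exit_name):
        """Create an empty error block for a newly registered adapter."""
        block = ErrorBlock(product_id, exit_name)
        self._blocks.append(block)
        return block

    def next_block(self, token):
        """Return the next non-empty block for this caller, or None when done."""
        i = self._cursors.get(token, 0)
        while i < len(self._blocks):
            block = self._blocks[i]
            i += 1
            if block.entries:
                self._cursors[token] = i
                return block
        self._cursors[token] = i
        return None

    def reset(self):
        """Discard error information when the next synchronization point begins."""
        for block in self._blocks:
            block.entries.clear()
        self._cursors.clear()
```

An application would call `next_block` with its own token until `None` is returned, which corresponds to decision block 677 leading to Step 688.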

The sync point manager 60 keeps error blocks until the next synchronization point, as described in the foregoing section entitled "Registration of Resources For Commit Procedure."

LOG NAME EXCHANGE FOR RECOVERY OF PROTECTED RESOURCES

When application 56 (FIG. 2) issues a sync point request, a two-phase commit procedure is initiated for committing changes for all protected resources. Protected resources include resources such as data bases managed by a resource manager, as well as a special classification of resources called protected conversations, which represent a distributed partner application. As noted in the section "Coordinated Sync Point Management of Protected Resources for Distributed Application", the first phase in the two-phase commit procedure is to prepare the resources for the commit. Once all resource managers have agreed to a commit during the first phase, then the second phase accomplishes the actual commit. If any resource is unable to prepare during the first phase, then all the resources are ordered to back out their changes during the second phase instead of committing them. All resource data changes are subject to back out until the time that they are actually committed.
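The two phases just described can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patented procedure: the `Resource` class, its `can_prepare` flag and the `sync_point` function are hypothetical names, logging and failure handling are omitted, and every resource is asked to prepare even after one has refused.

```python
class Resource:
    """Hypothetical protected resource; `can_prepare` simulates its phase-one vote."""

    def __init__(self, name, can_prepare=True):
        self.name = name
        self.can_prepare = can_prepare
        self.state = "pending"

    def prepare(self):
        self.state = "prepared" if self.can_prepare else "failed"
        return self.can_prepare

    def commit(self):
        self.state = "committed"

    def back_out(self):
        self.state = "backed_out"


def sync_point(resources):
    """Phase one: ask every resource to prepare. Phase two: commit all or back out all."""
    votes = [r.prepare() for r in resources]
    if all(votes):
        for r in resources:
            r.commit()
        return "committed"
    for r in resources:
        r.back_out()
    return "backed_out"
```

A single resource that cannot prepare is enough to make `sync_point` back out every participant, which mirrors the rule that all changes remain subject to back out until they are actually committed.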

In order to support a recovery procedure, as described in the section "Recovery Facility For Incomplete Sync Points For Distributed Application", for completing a sync point when the sync point cannot complete due to a failure, it is necessary that sync point information be previously stored and retained in recovery facility logs 72 and resource manager logs 800, which are in non-volatile storage facilities. Logging is done by each sync point manager 60 as well as by each participating resource manager 63. Information recorded in the log includes the current state of the sync point from the standpoint of the logging sync point manager or resource manager, the current name(s) associated with the sync point log of known sync point participants, and, in the case of sync point managers, information required to establish conversations with sync point participants at the time of recovery from sync point failures.

Information concerning the log name of known sync point participants is logged separately or partitioned from the remaining sync point information. The log name information is recorded in a log name log 72A2 (FIG. 19), while the remaining information is recorded in a sync point log 72A1.


When a failure occurs in recording information in any of the sync point logs, requiring that the log be reinitiated, in effect beginning a new log, the log is assigned a new name. When this occurs, it is important that other sync point managers and resource managers that are sync point participants with the holder and maintainer of the new log be notified that the log has been reinitialized and that a new name is in effect.

It is essential for automatic resynchronization that each sync point manager and participant have valid sync point logs. That is, the logs at the time of resynchronization must be the same logs that were used during sync point. If any logs have been replaced or damaged, then resynchronization cannot proceed normally. To ensure that all logs are correct, there is a pre-sync point agreement on the log names of each sync point manager and participating resource, which is accomplished by a procedure called exchange of log names. There is another exchange of log names just before the resynchronization begins, whereupon, the log names of all participants being determined to be the same as when the sync point began, the resynchronization can proceed to recover the failed sync point, knowing that no participant had a log reinitialization. Without this procedure, invalid sync point log information could lead to a failure in or erroneous results from the recovery processing.
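The check performed by the second exchange can be illustrated with a short sketch. The function names and the dictionary representation (participant mapped to log name) are assumptions made for illustration; the patent does not prescribe a data layout.

```python
def changed_logs(agreed, current):
    """List participants whose log name differs from the pre-sync point agreement.

    `agreed` maps each participant to the log name recorded before the sync point;
    `current` maps each participant to the name reported just before resynchronization.
    A participant missing from `current` is treated as changed.
    """
    return sorted(p for p, name in agreed.items() if current.get(p) != name)


def can_resynchronize(agreed, current):
    """Resynchronization may proceed only if no participant's log was reinitialized."""
    return not changed_logs(agreed, current)
```

For example, if the agreement recorded names for a recovery facility and a resource manager and only the resource manager later reports a different name, `changed_logs` returns that one participant and `can_resynchronize` returns `False`.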

As an optimization for protected conversations between application environments in the same system (for example, application environments 52A and 52B in system 50A), it is not necessary to exchange log names, since the respective sync point managers 60A and 60B share the same recovery facility 70A and recovery facility log 72A. When there is a common recovery facility log 72A, the step of synchronizing logs (by exchanging log names) is not necessary and may be omitted. Sync point manager logging is accomplished by the common recovery facility 70, which resides in the same system as the supported sync point manager(s) 60. All sync point managers 60A, 60B, and 60C in a system 50A share the common recovery facility 70A and the same supporting pair (sync point and log name) of logs in recovery facility log 72A.

FIG. 33 illustrates three systems 50A, 50D, and 50F, the recovery facilities in each, and communications between the systems. Each application environment 52A, 52B, 52D, 52F, and 52G includes an application program 56A, 56B, 56D, 56F, and 56G respectively (not illustrated), which utilizes a sync point manager 60A, 60B, 60D, 60F, and 60G, respectively, for purposes of coordinated resource recovery. A sync point manager uses the recovery facility in its system to manage the sync point and log name logs required for recovery from a failing sync point. For example, the sync point managers in application environments 52A and 52B use the recovery facility 70A to record in log 72A. Resource managers 63A, 63B, 63D, 63E, 63F, and 63G maintain their own sync point and log name logs 800A, 800B, 800D, 800E, 800F, and 800G, respectively. The illustrated scopes of sync points are indicated by solid lines and arrows. Although sync points may be initiated by any participant and the scope of a sync point is dynamic, the illustration is static for simplicity. For the illustrated static cases, sync points flow between application environments 52B to 52D to 52F via the associated sync point managers and protected conversation adapters (not shown) via communication solid lines 801 and 802; and from application environments 52A, 52B, 52D, 52F, and 52G via the associated sync point managers and resource adapters to the resource managers 63A, 63B, 63D, 63E, 63F, and 63G via communication solid lines 803A-1, 803A-2, 803B, 803D, 803E, 803F, and 803G, respectively. The dotted lines show communication paths employed at the time of pre-sync point agreements and at the time of resynchronization for recovering a failing sync point. For resource managers, this dotted line communication is between the resource manager and the recovery facility of the system of the originating application environment, for example, resource manager 63E to 70A, not 70B.

Three sync point scopes are included in FIG. 33. The first involves a single application environment 52A (and sync point manager) and utilizes two resource managers 63A and 63E. The second sync point scope involves three application environments 52B, 52D, and 52F, each involving various participating resource managers (63B for 52B, 63D and 63E for 52D, and 63F,G for 52F), as further illustrated by a sync point tree in FIG. 34.

The FIG. 19 block diagram and the FIG. 20, 21, and 22 flowcharts illustrate by example the process for log name exchange involving a protected conversation between system 50A and 50D. Application 56A initiates a protected conversation with application 56D (step 831 in FIG. 20). Application 56A is running in application environment 52A in system 50A and application 56D is running in application environment 52D in system 50D. The conversation initiation includes specification of a path (system identifier), "B" in the current example, and a resource identifier for the application partner. The path identifies system 50D and the resource identifier identifies target application 56D. Resource identifiers are explained in detail below in this section. The system control program includes a facility which acts as the resource manager of applications, to support the establishment of an application resource identifier for applications and to recognize those identifiers when used in conversation initiation, then to either activate the application in an execution environment or, if already activated, route the new conversation to that active application. Thus conversation routing for applications utilizes paths (system identifiers) and resource identifiers, where paths accomplish the routing between systems, as interpreted by communication facilities, each of which represents a system, and resource identifiers accomplish routing to or activation of an application in an execution environment within a system, as interpreted by the system control program, which acts as the resource manager for application resources.
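The two-level routing described above (a path selects the target system, and a resource identifier selects or activates the application within it) can be sketched as follows. All names here, including `SystemControlProgram`, `deliver`, `initiate_conversation` and the identifier string used in the usage note, are hypothetical and serve only to illustrate the division of responsibility.

```python
class SystemControlProgram:
    """Hypothetical stand-in for the facility that manages application resource identifiers."""

    def __init__(self, system_name):
        self.system_name = system_name
        self.registered = set()  # application resource identifiers known in this system
        self.active = set()      # applications currently running

    def register(self, resource_id):
        self.registered.add(resource_id)

    def deliver(self, resource_id):
        """Activate the application if needed, otherwise route to the running copy."""
        if resource_id not in self.registered:
            raise LookupError(f"unknown resource identifier {resource_id!r}")
        if resource_id in self.active:
            return "routed to active application"
        self.active.add(resource_id)
        return "application activated"


def initiate_conversation(paths, path, resource_id):
    """`paths` maps a path (system identifier) to the target system's control program."""
    target = paths[path]  # routing between systems
    return target.system_name, target.deliver(resource_id)  # routing within the system
```

With path "B" mapped to the control program of system 50D, a first conversation to an assumed identifier such as `"APP56D"` activates the application, and a second conversation is routed to the already active copy.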

Upon receipt of this conversation initiation, communication facility 57A searches its exchange log name status table (ELST) 208A for an entry for the current path, path B (step 833 in FIG. 20). The exchange log name status table entry for path B indicates by status zero that no protected conversations have occurred on this path since system 50A was last initiated. Therefore (decision step 834 in FIG. 20), the exchange log name status table entry 208A for path B is changed to status one (step 836 in FIG. 20), the conversation initiation message 505 FIG. 19 is intercepted, and the conversation is suspended by the communication facility 57A (step 837 in FIG. 20). Next, communication facility 57A FIG. 19 sends message 200 FIG. 19 on a control path to the local recovery facility 70A to indicate that an exchange of log names should be initiated for path B before the conversation initiation is accepted by the communication facility 57A (step 838 in FIG. 20).

Recovery facility 70A receives this message (step 850 in FIG. 21) and then sets the ELST 207A entry for path B to status 1, indicating that exchange of log names for path B is in progress (step 851 in FIG. 21). Then recovery facility 70A FIG. 19 initiates a non-protected conversation on communication path B (message 202 FIG. 19). Since the conversation is NOT "protected", there is no possibility of interception by a communication facility, since only protected conversations are monitored for interception to enforce log name exchange procedures. The routing from system 50A to system 50D through their communication facilities is as described above.

The conversation initialization also utilizes a globally reserved resource identifier called the protected conversation recovery resource identifier, which permits routing to the recovery facility 70D of the identified target system 50D. As each recovery facility 70A, 70D is initialized, the recovery facility identifies itself to the system control program as the local resource manager for the global resource called "protected conversation recovery". The result is that the system control program for system 50D routes conversations with the protected conversation recovery resource identifier to the local recovery facility 70D, and that recovery facility 70D also determines, based on the protected conversation recovery resource identifier that was used to initiate the conversation, that the purpose of the conversation is to exchange log names with another recovery facility 70A. The initial message 202 FIG. 19 in this conversation includes the log name of log 72A along with an indication of whether that log name is "new", that is, whether the name of the log was changed to reflect a new log as a result of a major failure/loss associated with the "old" log (step 852 in FIG. 21). The current example assumes the log is not new. Recovery facility 70A waits for a response to message 202 FIG. 19.

After recovery facility 70D receives the log name information transmitted by recovery facility 70A along communication line 202 (step 870 in FIG. 22), recovery facility 70D sets ELST 207D for path B to status 1, and the local communication facility 57D is notified via message 203 FIG. 19 to also change ELST 208D to status 1 for path B (step 871 in FIG. 22). Steps 841 in FIG. 20, 842 in FIG. 20, 843 in FIG. 20 and 846 in FIG. 20 illustrate the steps for changing the ELST in a communication facility. Recovery facility 70D determines from message 202 FIG. 19 that the log name of recovery facility 70A is not new (decision step 872 in FIG. 22) and that its own log is also not new (decision step 876 in FIG. 22), and finally that the log name in message 202 FIG. 19 matches the log name stored in the recovery facility 70D log name log 72D2 entry for path B (decision step 877 in FIG. 22); therefore ELST 207D is set to status 2 for path B and the local communication facility 57D is notified via message 203 FIG. 19 to also change ELST 208D to status 2 for path B (steps 879 in FIG. 22, 841 in FIG. 20 and 842 in FIG. 20). Then recovery facility 70D responds (message 206 FIG. 19) normally to recovery facility 70A by passing the log name of its log 72D and an indication of whether it is new or not (step 882 in FIG. 22).

Recovery facility 70A receives this normal response (decision step 853 in FIG. 21) and, since recovery facility 70A's log 72A is not new (decision step 857 in FIG. 21) and recovery facility 70D's log 72D is not new according to message 206 FIG. 19 (decision step 858 in FIG. 21), recovery facility 70A successfully matches the name of log 72D sent by recovery facility 70D in message 206 FIG. 19 with the log name stored in the log 72A2 entry for path B (decision step 859 in FIG. 21) and therefore sets the ELST 207A entry for path B to status 2 and notifies the local communication facility 57A via message 204 FIG. 19 to set ELST 208A to status 2 (step 862 in FIG. 21). Then recovery facility 70A does a normal termination of the conversation on path B with recovery facility 70D (step 863 in FIG. 21), allowing recovery facility 70D to complete normally (decision step 883 in FIG. 22 and step 886 in FIG. 22). Once the communication facility 57A has received message 204 FIG. 19 to post the status for path B in ELST 208A (steps 841 in FIG. 20 and 842 in FIG. 20), the intercepted and suspended conversation 505 on path B is permitted to complete its initialization (decision step 843 in FIG. 20, and steps 845 in FIG. 20 and 846 in FIG. 20). This completion removes the suspended status of the conversation and permits it to flow to its destination, communication facility 57D. In the target communication facility 57D there is a protected conversation arrival event (step 832 in FIG. 20); then the search for the path entry in the ELST 208D (decision step 834 in FIG. 20) indicates a status of 2, permitting the conversation initiation to flow normally (step 839 in FIG. 20) to application 56D.

This completes the normal case flow for conversation interception and exchange of log names. Some additional cases are also illustrated. Steps 834 in FIG. 20 and 835 in FIG. 20 illustrate that additional conversations on the same path are also suspended once the status of 1 has been established to indicate that an exchange of log names for the path is already in progress.
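The communication facility's handling of the three ELST status values can be sketched as a small state machine. This is a hedged illustration: the class, its method names and the lists used to record outcomes are hypothetical, and the step numbers in the comments are the ones cited in the text. The status values follow the description (0: no exchange yet, 1: exchange in progress, 2: exchange complete).

```python
class CommunicationFacility:
    """Hypothetical model of a communication facility and its ELST (e.g. 208A)."""

    NOT_EXCHANGED, IN_PROGRESS, EXCHANGED = 0, 1, 2

    def __init__(self, request_exchange):
        self.elst = {}        # path -> status; a missing entry means status 0
        self.suspended = {}   # path -> conversations held pending the exchange
        self.delivered = []
        self.rejected = []
        self._request_exchange = request_exchange  # stands in for message 200

    def protected_conversation_arrives(self, path, conversation):
        status = self.elst.get(path, self.NOT_EXCHANGED)
        if status == self.EXCHANGED:
            self.delivered.append(conversation)          # step 839
            return
        # Status 0 or 1: hold the conversation (steps 837 and 835).
        self.suspended.setdefault(path, []).append(conversation)
        if status == self.NOT_EXCHANGED:
            self.elst[path] = self.IN_PROGRESS           # step 836
            self._request_exchange(path)                 # step 838

    def set_status(self, path, status):
        """Post a status sent by the local recovery facility (steps 841 and 842)."""
        self.elst[path] = status
        if status == self.IN_PROGRESS:
            return
        held = self.suspended.pop(path, [])
        if status == self.EXCHANGED:
            self.delivered.extend(held)                  # step 845
        else:
            self.rejected.extend(held)                   # step 844
```

Only the first conversation on a path triggers a request for log name exchange; later arrivals on the same path are simply held until the recovery facility posts status 2 (release) or status 0 (rejection).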

In the case where the target recovery facility 70D finds a log name mismatch between the log name sent in message 202 FIG. 19 and the one stored in log 72D2 for path B (decision step 877 in FIG. 22), an error is returned in message 206 FIG. 19 (step 880 in FIG. 22), ELST 207D is set to status 0 for path B, and communication facility 57D is notified to change its ELST 208D via message 203 FIG. 19 similarly (steps 841 in FIG. 20, 842 in FIG. 20 and 881 in FIG. 22).


In the case where recovery facility 70D receives a message 202 FIG. 19 indicating that the source log 72A is new (decision step 872 in FIG. 22) and log 72D is also new (decision step 873 in FIG. 22), the new log name for 72A is stored in log 72D2 for path B (step 878 in FIG. 22) and normal completion continues as before (steps 879 in FIG. 22, 882 in FIG. 22, etc.).

In the case where recovery facility 70D receives a message 202 FIG. 19 indicating that the source log 72A is new (decision step 872 in FIG. 22), but log 72D is not new (decision step 873 in FIG. 22), and it is determined from the sync point log 72D1 that there is an unresolved sync point recovery (outstanding resynchronization) for path B (decision step 874 in FIG. 22), an error message is generated for the system 50D operator (step 875 in FIG. 22), an error is returned to recovery facility 70A in message 206 FIG. 19 (step 880 in FIG. 22), ELST 207D is changed to status 0, and the local communication facility is notified via message 203 FIG. 19 to change ELST 208D to status 0 (steps 881 in FIG. 22, 841 in FIG. 20, and 842 in FIG. 20) before return (step 882 in FIG. 22).
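The target recovery facility's decisions in the cases described so far can be collected into one function. This is a sketch under stated assumptions: the function name, parameter names and the `Outcome` tuple are hypothetical. Two combinations (source log not new with a new local log, and source log new with an old local log and no outstanding resynchronization) are not walked through in this section, so the sketch declines to guess and raises `NotImplementedError` for them.

```python
from typing import NamedTuple, Optional


class Outcome(NamedTuple):
    reply: str                  # "normal" or "error" (the response in message 206)
    elst_status: int            # status posted for the path in ELSTs 207D and 208D
    stored_name: Optional[str]  # partner log name held in log name log 72D2 afterwards
    notify_operator: bool


def target_exchange(src_name, src_is_new, own_is_new, stored_name, resync_outstanding):
    """Decide the target recovery facility's response to message 202 for one path."""
    if not src_is_new:                                      # decision step 872
        if own_is_new:                                      # decision step 876
            raise NotImplementedError("branch not described in this section")
        if src_name == stored_name:                         # decision step 877
            return Outcome("normal", 2, stored_name, False) # steps 879, 882
        return Outcome("error", 0, stored_name, False)      # steps 880, 881
    if own_is_new:                                          # decision step 873
        return Outcome("normal", 2, src_name, False)        # step 878, then 879, 882
    if resync_outstanding:                                  # decision step 874
        return Outcome("error", 0, stored_name, True)       # steps 875, 880, 881
    raise NotImplementedError("branch not described in this section")
```

Keeping the outcome as plain data makes the four described cases easy to compare: only the matching case and the case where both logs are new lead to status 2.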

When recovery facility 70A detects an error response in message 206 FIG. 19 from recovery facility 70D (decision step 853 in FIG. 21) and there is an outstanding resynchronization indicated in log 72A1 (decision step 854 in FIG. 21), then a message is sent to the system 50A operator (step 855 in FIG. 21) and ELSTs 207A and 208A are changed to status 0 (step 856 in FIG. 21). ELST 208A is changed to status 0 via message 204 FIG. 19 to the communication facility 57A (steps 841 in FIG. 20, and 842 in FIG. 20). This results in an error return to the application 56A that originated the intercepted conversation, and rejection of the conversation (step 844 in FIG. 20). If no resynchronizations are outstanding (decision step 854 in FIG. 21), then the operator message is avoided.

When a new log name is returned to recovery facility 70A in message 206 FIG. 19 from recovery facility 70D (decision step 857 in FIG. 21), then it is stored in the log 72A2 entry for path B (step 861 in FIG. 21), ELST status of 2 is set for path B (step 862 in FIG. 21), and the communication facility 57A permits the conversation to be released from suspension (steps 841 in FIG. 20, 842 in FIG. 20, decision step 843 in FIG. 20, and step 845 in FIG. 20).

When recovery facility 70A detects that the log name returned by recovery facility 70D in message 206 FIG. 19 does not match that stored in log 72A2 for path B (decision step 858 in FIG. 21 and 859 in FIG. 21), or a new log name for 72D2 is returned (decision step 858 in FIG. 21), and recovery facility 70A determines from log 72A1 that there are outstanding resynchronizations required for path B (decision step 860 in FIG. 21), then recovery facility 70A signals recovery facility 70D that there is a serious error by abnormally terminating the conversation that supported messages 202 and 206 FIG. 19 (step 864 in FIG. 21), generates a message for the operator of system 50A (step 865 in FIG. 21), resets the status of ELST 207A, and, through message 204 FIG. 19 to communication facility 57A, also resets the status of ELST 208A (step 866 in FIG. 21). This results in an error return to the application 56A that originated the intercepted conversation, and rejection of the conversation (step 844 in FIG. 20).

After recovery facility 70D responds to recovery facility 70A in all cases (step 882 in FIG. 22), it can nevertheless detect (decision step 883 in FIG. 22) errors signalled by recovery facility 70A (step 864 in FIG. 21) through abnormal conversation termination. When this occurs, path B entries in ELST 207D and, through message 203 FIG. 19 to communication facility 57D (and steps 841 in FIG. 20 and 842 in FIG. 20), ELST 208D are reset to 0 status (step 884 in FIG. 22), and the log name entry in log 72D2 for path B is erased (step 885 in FIG. 22), negating previous step 878 in FIG. 22.

As illustrated in FIG. 20, using ELSTs 208A and 208D, each communication facility controls conversation interception for each path (status other than 2), initiation of log name exchange (status 0), and normal conversation flow (status 2). The ELSTs 207A and 207D maintained by each recovery facility are similar, but are optional optimizations. They permit bypassing messages to the local communication facility to update the ELST of the communication facility when the update is not really necessary. This is further illustrated below.


FIG. 19 further illustrates the processing required when one of the systems experiences a failure. Assume that there is a failure of communication facility 57A, recovery facility 70A or the communication paths between them. Any such failure causes all entries in the exchange log name status tables 208A, 207A in communication facility 57A and recovery facility 70A to be reset to status zero. This is essential because any such failure could otherwise mask the possibility that there may have also been a log failure. Because of this possibility, all sync point agreements are reset by zeroing the status of the exchange log name status table entries. It should be noted that failure of either application environment 52A or 52D does not cause a resetting of the exchange log name status tables, because the application environments do not directly affect the log name exchange process. This is important because application environments are more prone to failure than the system facilities. Likewise, failure of one of several application environments sharing a common logical path (not illustrated) does not affect the processing for other applications that utilize the path.

Assume further that after the failure of communication facility 57A, recovery facility 70A or the control paths between them, application 56D initiates a conversation along path B to application 56A in application environment 52A. This conversation is not intercepted by communication facility 57D because the exchange log name status table within communication facility 57D indicates status two for path B; the tables in communication facility 57D were not reset upon the failure in system 50A. However, when the conversation proceeds to communication facility 57A, there is a protected conversation arrival event (step 832 in FIG. 20), the search of the ELST 208A (decision step 833 in FIG. 20) indicates status 0, the communication facility 57A intercepts the routing of the conversation (steps 836 in FIG. 20 and 837 in FIG. 20), and therefore communication facility 57A requests a log name exchange (step 838 in FIG. 20) by message 200 FIG. 19 to recovery facility 70A. This causes a repetition of the previously described log name exchange process. When the log name exchange is received at recovery facility 70D during the exchange process, the exchange log name status table within recovery facility 70D indicates status two for the path B entry. Therefore, recovery facility 70D does not notify communication facility 57D to change the exchange log name status table for path B, since such a change is not necessary. This is the only difference in this log name exchange process from that described above before the failure. At the completion of the log name exchange process, recovery facility 70A notifies communication facility 57A via message 204 FIG. 19 to change the status for path B from zero to two. Then, the communication facility 57A releases the conversation along path B so that it flows to application environment 52A.


It should be noted that in the foregoing two examples, recovery facility 70A initiated the log name exchange on path B via message 202 FIG. 19. However, if instead communication facility 57D were the first communication facility to intercept a protected conversation, then recovery facility 70D would initiate the log name exchange process, as illustrated by message 206 FIG. 19. It should also be noted that a single log name exchange is sufficient to satisfy the pre-sync point agreement for all application environments in the same system 50A that utilize the same path for protected conversations. The recording in the common exchange log name status table 208A makes this possible. Moreover, it should be noted that the single log name exchange process described above is sufficient to satisfy the requirement for pre-sync point agreement even when there is more than one application environment in each system 50A and 50D involved in the protected conversation, because all of the application environments within the same system share the same log 72. Also, when a protected conversation is initiated from application environment 52A to application environment 52B in the same system 50A, then communication facility 57A does not intercept the conversation, because both application environments 52A and 52B share the same log 72A and no log name exchange is necessary.

By way of example, the architected intersystem communication standard can be of a type defined by IBM's System Network Architecture LU 6.2 Reference: Peer Protocols, SC31-6808 and chapter 5.3 Presentation Services - Sync Point Verbs, published by IBM Corporation. The exchange of log names described in the current section addresses the process for executing, controlling, and optimizing the exchange, not the architected protocol for the exchange.

Exchange of log names is also required between recovery facilities and resource managers of protected resources such as shared files or databases. Unlike protected conversations, where exchange of log names is not necessary when conversations take place in the same system (since they share a common sync point log), log name exchange is necessary for participating resource managers, even where resource managers are in the same system as the initiating application, because resource managers maintain their own sync point logs. Unlike protected conversations, which may utilize a communication protocol for establishing protected conversations and log name exchange as described by System Network Architecture LU 6.2 cited above, protected resources utilize non-protected conversations and a private message protocol for those functions. Also, for protected resources, it is not practical in all cases to centrally intercept initial communications to the resource manager by using a communication facility as the interceptor, because the communications do not in all cases proceed through a communications facility. One example of this is the case of a resource manager 63A FIG. 2 that is in the same system 50A as the application environment 52A and application 56A that uses its resource. This situation does not require conversations with the resource to pass through the communications facility, but instead supports conversations through the conversation manager 53A or other local facilities. Another reason is to afford the flexibility of supporting resource managers without requiring them to entirely change their method of communication with the users of their resource in order to conform to the System Network Architecture LU 6.2 communication protocols. Automatic recovery processing from a sync point failure requires that the names of the various participants' logs remain the same as they were before the sync point began, as was the case for protected conversations described above.

FIG. 23 illustrates log name exchange for managers of protected resources. In the illustrated embodiment, system 50A comprises application environment 52A, associated resource adapter 62A, recovery facility 70A, and a common resource recovery log 72A. Although resource managers may be local or remote, the illustration is for the local case. As described in more detail below, the process for the remote resource manager case is basically the same, except that communications facilities are involved in completing the inter-system communications. Whereas protected conversations, whether local or remote, always utilize a communications facility for communications, providing a common intercept point for initiating log name exchange for the pre-sync point agreement, resource managers, as illustrated, may bypass the use of a communication facility in the local case, and do not have such a centralized intercept point to initiate pre-sync point log name exchange.

A log name log 800A2 within log 800A is associated with resource manager 63A and stores the name of log 72A of the originating recovery facility 70A. Also, a sync point log 800A1 within log 800A is associated with resource manager 63A and stores the state of its protected resource in a sync point procedure. As described in more detail below, FIG. 23 illustrates the essential elements required to ensure the timely exchange of log names between a sync point manager and a participating resource manager, as well as the ability to recognize log name changes brought about by failure that forces re-initializing one or more of the logs 72A, 800A.

When an application 56A sends a request to resource manager 63A via resource adapter 62A (step 221 of FIG. 26), resource adapter 62A calls the sync point manager 60A (step 222) requesting:

1. The log name of the recovery facility's log 72A, and

2. The log name log resource identifier for recovery facility 70A required to establish a conversation to the recovery facility 70A for the initial exchange of log names by resource manager 63A. This identifier uniquely identifies recovery facility 70A and also permits recovery facility 70A to distinguish incoming log name exchange conversations from other conversations, such as a sync point manager conversation that uses a sync point log resource identifier to connect as described below.

Sync point manager 60A then establishes a conversation to the local recovery facility 70A using a sync point log resource identifier (step 223 FIG. 26).

A resource identifier is used to identify a resource within a system or, more particularly, to complete a conversation to the manager of a resource in its current execution environment in that system. The manager of a resource uses a system control program facility to identify a resource to the system when the manager of the resource is initialized. The system control program enforces the uniqueness of these resource identifiers. In addition to resource manager 63 FIG. 2, other facilities may act as resource managers. An example is the recovery facility 70, whose logs are considered resources for which it has resource identifiers. There are four types of resources, each of which is identified by a type of resource identifier. The first of these is basically generic and can be extended to include any resource. The others are defined specifically for resource recovery.

1. object resource, identified by an object resource identifier, which is the set of objects 78 managed by a resource manager 63. This is the case of a generic resource manager and its resource, extendible to any resource, including sets of data files, queues, storage, or applications. This type of resource identifier is used to establish a connection to the manager of the resource 63 in order to use the resource in some way, for example to open a file, start up an application, etc. that is owned by that resource manager.

2. object_recovery resource, identified by an object_recovery resource identifier, which is a resource manager log 800 and supporting procedures for cooperating with a recovery facility 70 in the recovery from a failed sync point procedure. This identifier is used by a recovery facility 70 at the time of recovering from a failed sync point to establish a conversation with the manager of the resource 63 to exchange log names and complete the sync point as a part of automatic recovery.

3. sync_point_log resource, identified by a sync_point_log resource identifier, which is the log 72A FIGS. 19 and 23 managed by the recovery facility 70A and the set of procedures supporting the maintenance of that log 72A. This identifier is used by a sync point manager 60 FIG. 2 to establish a conversation with its recovery facility 70 in order to provide log information on the status of sync points.

4. log_name log resource, identified by a log_name log resource identifier, which is the log name log 72A2 FIG. 23, managed by the recovery facility 70A and the set of procedures supporting the maintenance of that log 72A2. This identifier is used by resource manager 63A to establish a conversation with the recovery facility 70A to exchange log names with the appropriate recovery facility 70A.
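The four resource types, and the uniqueness rule enforced by the system control program, can be pictured with a minimal Python sketch. The class names, the registry, and the identifier strings such as "RM63A.OBJ" are hypothetical illustrations and do not appear in the specification.

```python
from enum import Enum


class ResourceType(Enum):
    """The four resource types named in the text."""
    OBJECT = "object"
    OBJECT_RECOVERY = "object_recovery"
    SYNC_POINT_LOG = "sync_point_log"
    LOG_NAME_LOG = "log_name_log"


class ResourceRegistry:
    """Hypothetical stand-in for the system control program facility
    that enforces uniqueness of resource identifiers."""

    def __init__(self):
        self._by_id = {}

    def identify(self, resource_id, rtype, manager):
        # Reject a second registration of the same identifier.
        if resource_id in self._by_id:
            raise ValueError(f"resource identifier already in use: {resource_id}")
        self._by_id[resource_id] = (rtype, manager)

    def lookup(self, resource_id):
        # Return (type, manager) so a caller knows where to connect and why.
        return self._by_id[resource_id]
```

Because each identifier carries its type, a manager that owns several resources (for example an object resource and an object_recovery resource) can tell from the identifier alone which kind of conversation is arriving.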

After establishing the connection to the recovery facility 70A, sync point manager 60A obtains the recovery information requested by resource adapter 62A. This recovery information is returned by sync point manager 60A to resource adapter 62A (step 224 FIG. 26) and is held by sync point manager 60A for release to any other requesting resource adapter. Next, resource adapter 62A FIG. 23 also provides the following sync point recovery information to sync point manager 60A FIG. 23 (step 225 FIG. 26):

1. An object_recovery resource identifier which can be used by recovery facility 70A FIG. 23 to connect to resource manager 63A in the event of a failure during sync point. This object_recovery resource identifier permits the resource manager to distinguish between incoming conversations from resource adapter 62A and from recovery facility 70A, each of which requires different programs for processing. By giving resource manager 63A, through its resource adapter 62A, the capability of providing its own object_recovery resource identifier, rather than establishing a standard recovery resource identifier for all resource managers, the recovery facility 70A avoids conflicts with other resource identifiers employed by this resource manager 63A or any other resource manager, maintaining a generalized, non-disruptive interface for any resource manager to participate in sync point processing.

2. An object resource identifier which can be used by recovery facility 70A when there is a sync point failure, to identify resource manager 63A which participates in the sync point and to find the log name log 72A2 entry for it. This identifier uniquely identifies the resource manager for purposes of managing the sync point, logging the sync point in case of a sync point failure, and recovering from a failing sync point.

Following the application's 56A first request for use of resource 78A, described above, resource adapter 62A initializes a conversation to resource manager 63A using its own object resource identifier, and passes recovery information including the log_name log resource identifier of recovery facility 70A and the current name of log 72A, acquired from the sync point manager (step 226 FIG. 26).

Although FIG. 23 illustrates only one recovery facility 70A that is responsible for resource recovery, a single resource manager may be involved with many recovery facilities since the resource may be used by applications in many systems, each with its own recovery facility. This is illustrated in FIG. 33 where resource manager 63E is used by both application 52A in system 50A and application 52D in system 50D, therefore requiring log name information from two recovery facilities 70A and 70D.

To support recovery of a failed sync point, a resource manager 63A requires a log name log 800A2 FIG. 23 entry for the name of each recovery facility log 72, where each such log name represents a system 50 that utilizes the resource through one or more applications 56 and sync point managers 60. The log name log 800A2 FIG. 23 for the participating resource manager 63A includes the following information for each associated recovery facility 70:

1. A log_name log resource identifier which identifies each associated recovery facility 70 (in the case of FIG. 23, recovery facility 70A);

2. Recovery facility's 70 log name (in the case of FIG. 23, the name of log 72A);

3. An exchange done flag which indicates when a log name has been successfully exchanged. Although the exchange done flag is logically a part of the log name log 800A2 FIG. 23, it need not be written to non-volatile storage because it is logically reset for each initiation of the resource manager with which it is associated. The purpose of the flag is to avoid the exchange of log names for a particular recovery facility 70A except for the first conversation from the resource adapter 62A that is operating in the system 50A of the recovery facility 70A. There may be many application environments in a system, all serviced by the same recovery facility and each with a resource adapter with a conversation to the same or different resource manager. It is only necessary for a resource manager to initiate an exchange of log names upon the first instance of a conversation with one of the resource adapters that are associated with the same recovery facility. The exchange done flag is set to prevent subsequent exchanges.

The remainder of FIG. 26 illustrates an algorithm executed by resource manager 63A FIG. 23 to determine when to initiate a log name exchange. Upon first receipt of the object resource identifier (step 226), resource manager 63A searches log name log 800A2 to determine if it has an entry for recovery facility 70A identified by the log_name log resource identifier that was included in the recovery information passed from resource adapter 62A FIG. 23 (step 230 FIG. 26). The resource manager uses the log_name log resource identifier received from the resource adapter to search the log name log 800A2 FIG. 23. If there is no entry, then resource manager 63A initiates the log name exchange (step 232 FIG. 26). If an entry is found in step 230 for recovery facility 70A FIG. 23, then resource manager 63A determines if the exchange done flag is set (step 234 FIG. 26).

The exchange done flag is set when a successful log name exchange occurs, and remains set until the resource manager terminates abnormally or is shut down normally. If a resource manager is unable to exchange log names due to a failure to initiate a conversation with the recovery facility, the resource manager terminates the conversation initiated by its resource adapter. If the exchange done flag is not set, then resource manager 63A FIG. 23 initiates the log name exchange in step 232 FIG. 26. However, if the exchange done flag is set, resource manager 63A FIG. 23 then compares the log name transmitted by resource adapter 62A to the log name in the entry (step 236 FIG. 26). If these two log names are the same, then the resource manager does not initiate the log name exchange (step 242 FIG. 26), but if they are different, resource manager 63A FIG. 23 initiates the log name exchange in step 232 FIG. 26. The foregoing algorithm assures a log name exchange for any recovery facility the first time that a resource manager communicates with a resource adapter associated with that recovery facility. Also, the algorithm assures a subsequent log name exchange whenever the log names for the recovery facility 70A FIG. 23 change. In the latter case, the log name exchange is necessary, even though the resource manager 63A gets the new recovery facility log 72A name from the resource adapter, since it is necessary to provide the log name of the resource manager's log 800A to the recovery facility 70A, whose log name log must be synchronized with that of resource manager 63A.
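The decision logic of steps 230 through 242 can be summarized in a short Python sketch. The dictionary keyed by recovery facility identifier, the `LogNameEntry` class, the function name `must_exchange`, and the sample log names are assumptions made for illustration; only the branch structure is taken from the description of FIG. 26.

```python
from dataclasses import dataclass


@dataclass
class LogNameEntry:
    """One entry of log name log 800A2, for one recovery facility."""
    recovery_log_name: str        # recorded name of the recovery facility's log (e.g. 72A)
    exchange_done: bool = False   # volatile flag; reset when the resource manager restarts


def must_exchange(log_name_log, rf_id, log_name_from_adapter):
    """Return True when the resource manager should initiate a log name exchange."""
    entry = log_name_log.get(rf_id)      # step 230: search by log_name log resource identifier
    if entry is None:
        return True                      # no entry: go to step 232
    if not entry.exchange_done:          # step 234
        return True                      # flag not set: go to step 232
    # Step 236: compare the name sent by the adapter with the recorded name.
    # Equal names mean no exchange (step 242); different names mean step 232.
    return entry.recovery_log_name != log_name_from_adapter
```

For example, with an entry recorded as "LOG72A.v1" and the flag set, an adapter reporting "LOG72A.v1" triggers no exchange, while an adapter reporting a new name forces one.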

The log name exchange of step 232 FIG. 26 between resource manager 63A FIG. 23 and recovery facility 70A is further illustrated in FIG. 27, and comprises the following steps (assume that log 72A is the log):

1. Step 243 of FIG. 27: Resource manager 63A FIG. 23 initiates a conversation 250 to recovery facility 70A using a log_name log resource identifier obtained from resource adapter 62A;

2. Step 243 of FIG. 27: Resource manager 63A FIG. 23 transmits the object resource identifier that uniquely identifies resource manager 63A to recovery facility 70A;

3. Step 244 of FIG. 27: Resource manager 63A FIG. 23 transmits the log name for log 800A to recovery facility 70A;

4. Step 245 of FIG. 27: Recovery facility 70A FIG. 23 updates log name log 72A2 with the log name of resource manager log 800A;

5. Step 246 of FIG. 27: Recovery facility 70A FIG. 23 returns a response to resource manager 63A providing the log name of log 72A;

6. Step 247 of FIG. 27: Resource manager 63A FIG. 23 updates log name log 800A2 with the name of log 72A;

7. Step 248 of FIG. 27: Resource manager 63A FIG. 23 sets the exchange done flag in log name log 800A2.
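The seven steps above can be sketched as two cooperating objects. This is a simplified illustration, not the patented implementation: conversation 250 is modeled as a direct method call, and the class names, attribute names, and sample identifiers are hypothetical.

```python
class RecoveryFacility:
    def __init__(self, rf_id, log_name):
        self.rf_id = rf_id          # log_name log resource identifier
        self.log_name = log_name    # current name of log 72A
        self.log_name_log = {}      # log name log 72A2: object resource id -> RM log name

    def exchange(self, object_resource_id, rm_log_name):
        self.log_name_log[object_resource_id] = rm_log_name   # step 245
        return self.log_name                                   # step 246


class ResourceManager:
    def __init__(self, object_resource_id, log_name):
        self.object_resource_id = object_resource_id
        self.log_name = log_name    # current name of log 800A
        self.log_name_log = {}      # log name log 800A2: rf id -> recorded info

    def exchange_with(self, rf):
        # Steps 243-244: connect, then send own identifier and own log name.
        rf_log_name = rf.exchange(self.object_resource_id, self.log_name)
        # Steps 247-248: record the facility's log name, set the exchange done flag.
        self.log_name_log[rf.rf_id] = {"log_name": rf_log_name, "exchange_done": True}
```

After `ResourceManager("RM63A.OBJ", "LOG800A.v1").exchange_with(RecoveryFacility("RF70A", "LOG72A.v1"))`, each side holds the other's current log name, which is the state that the later recovery-time exchange checks against.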

When application 56A FIG. 23 requests a sync point from sync point manager 60A, sync point manager 60A sends the above object_recovery resource identifier and object resource identifier to recovery facility 70A where it is stored in sync point log 72A1 along with the information describing the state in the sync point process. If a failure occurs during a sync point, recovery facility 70A is activated to perform the operations necessary to complete the sync point procedure. If resources were participating in the failing sync point, recovery information in the associated recovery facility's sync point log entry is available to permit contact with those resources in order to accomplish recovery. For example, if application 56A goes down during a two-phase commit operation, then recovery facility 70A is activated and subsequently exchanges log names with resource manager 63A. When this second exchange indicates that log names have not changed since the sync point was initiated, recovery facility 70A knows that it can continue with the recovery of the sync point. A log name mismatch in the exchange would indicate that log information required for automatic recovery has been lost and therefore automatic recovery should not be attempted. The recovery facility 70A initiates the second log name exchange and asks resource manager 63A what state or phase it was in prior to the failure. Even though the initial exchange of log names was initiated by resource manager 63A, as described above, the exchange of log names required after the failure is initiated by recovery facility 70A as follows:

1. For each resource for which there is recovery information in sync point log 72A1 associated with the failing sync point, recovery facility 70A identifies the log name log entry for the resource by using the object resource identifier found in the sync point log 72A1 entry as a search argument applied to log name log 72A2 entries, yielding the resource's log name. This is illustrated in FIG. 25.

2. The recovery facility establishes a conversation 252 FIG. 23 to resource manager 63A using the object_recovery resource identifier found in the sync point log entry.

3. Recovery facility 70A sends its own log name, the log_name log resource identifier (unique identifier of recovery facility 70A), and the resource's log name to resource manager 63A using conversation 252.

In response, resource manager 63A performs the following steps:

1. Resource manager 63A recognizes that the conversation from recovery facility 70A is intended for the purpose of sync point recovery because the conversation includes the object_recovery resource identifier.

2. Resource manager 63A uses the log_name log resource identifier sent by recovery facility 70A to verify the entry in log name log 800A2 that is associated with recovery facility 70A.

3. Resource manager 63A verifies that the log name of the resource transmitted by recovery facility 70A corresponds with the log name of its own log 800A.

4. Resource manager 63A returns an error signal to recovery facility 70A on conversation 252 if it finds no entry in log name log 800A2 associated with recovery facility 70A.

5. Resource manager 63A sends an error signal to recovery facility 70A on conversation 252 if either of the verification steps described above fails.

An error condition detected in the exchange of log names at the beginning of recovery prevents the continuation of the automatic sync point failure recovery procedure of recovery facility 70A. Such an error condition indicates that a failure of one or more of the participating logs occurred concurrently with the sync point failure. The loss of a log implies the loss of all information in the log and the assignment of a new log name. Such a failure requires manual intervention and heuristic decisions to resolve the failing sync point. Detection of such an error condition is the main purpose of the log name exchange process implemented after sync point failure.
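The checks that the resource manager applies to the recovery-time exchange can be sketched as a single function. The function name, argument names, and error strings are hypothetical, and the sketch reads "verify the entry" as a comparison of the recorded recovery facility log name with the one just received, which is an interpretation rather than something the text states outright.

```python
def verify_recovery_exchange(rm_log_name, rm_log_name_log, rf_id,
                             rf_log_name, resource_log_name):
    """Return "OK" or an error string, as the resource manager might on conversation 252.

    rm_log_name       -- current name of the resource manager's own log (800A)
    rm_log_name_log   -- log name log 800A2: rf id -> recorded recovery facility log name
    rf_id             -- log_name log resource identifier sent by the recovery facility
    rf_log_name       -- the recovery facility's own log name, as sent
    resource_log_name -- the resource's log name as the recovery facility recorded it
    """
    recorded = rm_log_name_log.get(rf_id)
    if recorded is None:
        return "ERROR: no log name log entry for recovery facility"
    if recorded != rf_log_name:
        return "ERROR: recovery facility log name changed"
    if resource_log_name != rm_log_name:
        return "ERROR: resource manager log name changed"
    return "OK"
```

Any error result corresponds to the case described above in which a log was lost concurrently with the sync point failure, so automatic recovery stops and the sync point is left for manual resolution.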

Similar to the case of the local resource manager 63A illustrated in FIG. 23, FIG. 24 illustrates log name exchange where the resource manager 63E of system 50D is remote from application environment 52A and the application 56A of system 50A that uses the resource managed by resource manager 63E. Communications between remote resource manager 63E and local application 56A and recovery facility 70A are made via inter-system communications facilities 57A and 57D, rather than through intra-system communications support provided by the system control program. Sync point manager 60A uses recovery facility 70A to manage the sync point and log name logs 72A required for recovery from a failing sync point. Resource manager 63E maintains its own resource manager logs 800E. The communications path utilized at the time of pre-sync point agreements and at the time of resynchronization for recovery of failing sync points is between resource manager 63E and recovery facility 70A of system 50A. The recovery facility 70D (not shown) of system 50D is not utilized in this case since the originating sync point manager, application and authorized recovery facility are not local to system 50D, but are remote in system 50A. The only difference between the log name exchange process for local and remote resource managers is that communications between a remote resource manager 63E and resource adapter 62A and recovery facility 70A are made via communications facilities 57A and 57D instead of through intra-system communications services of the local system control program. Otherwise the exchange of log names process is the same as described above with reference to FIG. 23. The communications facilities 57A and 57D do not play a role in determining when to exchange log names with a remote log, i.e. the communications facilities do not intercept conversations as was the case for protected conversations in FIG. 19.

RECOVERY FACILITY FOR INCOMPLETE SYNC POINTS FOR DISTRIBUTED APPLICATION

Recovery Facility 70A illustrated in FIG. 2 is used to complete a sync point that encounters a failure. In most cases the recovery (resynchronization) is accomplished automatically by a Recovery Facility 70A, which recognizes the failure and then acts as a surrogate for the local sync point manager 60A to complete the sync point normally through alternate or reacquired communications to participants in the sync point. Failures include a failing sync point manager 60A, a failure in communications between a sync point manager 60A and its recovery facility 70A, failure of communications with or failure of an application partner 56D or resource manager 63, and failure of the recovery facility 70A.

By way of example the architected intersystem communication standard can be of a type defined by IBM's System Network Architecture LU 6.2. Reference: Peer Protocols SC31-6808 and chapter 5.3 Presentation Services -Sync Point verbs published by IBM Corporation.

Recovery facility 70A serves all of the application execution environments 52A, B, C and participating sync point applications within system 50A and utilizes common recovery facility log 72A for the purpose of sync point recovery. Typically, there are many systems interconnected with each other by communication facilities 57 and therefore, many recovery facilities 70 can be involved in recovery processing.

FIG. 33 illustrates various recovery situations involving systems 50A, 50D and 50F. Each application execution environment 52A, B, D, F, and G executes an application 56A, B, D, F, and G respectively (not illustrated) which utilizes a sync point manager 60A, B, D, F, and G respectively (not illustrated) for the purposes of coordinating resource recovery. Each sync point manager uses the recovery facility in its system to manage the sync point and log name logs required for recovery from a failing sync point. For example, the sync point managers in application environments 52A and 52B use the recovery facility 70A to record sync point recovery information in recovery facility log 72A. Resource managers 63A, B, D, E, F, and G maintain their own sync point and log name logs 800A, B, D, E, F, and G respectively. In the illustrated examples, scopes of sync points are indicated by solid lines with arrows. Although sync points may be initiated by any participant and the scope of a sync point is dynamic, the illustration is static for simplicity. For the illustrated static cases, sync points flow between application environments 52B to 52D to 52F via the associated sync point managers and protected conversation adapters (not shown) via communication solid lines 801 and 802; and from application environments 52A, B, D, F and G via the associated sync point managers and resource adapters to the resource managers 63A, B, D, E, F and G via communication solid lines 803A-1, 803A-2, 803B, 803D, 803E, 803F and 803G, respectively.

Three sync point scopes are included in the FIG. 33 illustration. The first involves a single application environment 52A including sync point manager 60A and utilizes two resource managers 63A and 63E. The second sync point scope involves three application environments 52B, 52D and 52F, each involving various participating resource managers 63B for 52B, 63D, E for 52D, and 63F, G for 52F, as further illustrated by a sync point tree illustrated in FIG. 34. The third sync point scope involves application environment 52G and a resource manager 63G.

The dotted lines in FIG. 33 show communications paths employed at the time of pre-sync point agreements and at the time of resynchronization for recovering a failing sync point (refer to the section "Log Name Exchange For Recovery of Protected Resources" below). For resource managers, the pre-sync point and resynchronization path is between the resource manager and the recovery facility of the system of the originating application environment (i.e. user, for example updater, of the resource managed by the resource manager), for example, between resource manager 63E and recovery facility 70A via path 804A-2 when application environment 52A is the originator (user of the resource managed by resource manager 63E), and between resource manager 63E and recovery facility 70D via path 804D when application environment 52D is the originator.

A sync point propagates through participants of the sync point in a cascaded manner forming the sync point tree illustrated in FIG. 34. Applications 56B, 56D and 56F communicate with each other via protected conversations 801 and 802 managed by protected conversation adapters 64B, D and F (not shown), respectively. Applications 56B, 56D and 56F utilize resource adapters 62B, D and F (not shown), respectively, which use non-protected conversations 803B, 803D, 803E, 803G, and 803F to communicate with the resource managers 63B, D, E, G and F, respectively. This tree includes the sync point initiator application 56B whose participants are a resource manager 63B and a distributed application 56D, which in turn has participants resource managers 63E, 63D and distributed application 56F, which in turn has participant resource managers 63G and 63F.

For purposes of sync point recovery, a sync point log, 72D for example, is maintained by sync point manager 60D (through recovery facility 70D, not shown) with information about its immediate predecessor in the sync point tree, application 56B in environment 52B, and the immediate participants known to it, resource managers 63E, 63D and application 56F in application environment 52F, but maintains nothing in its sync point log 72D concerning any of the other sync point participants 63B, 63G or 63F.
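The limited view that each node keeps of the sync point tree can be shown with a small Python sketch. The tree below copies the participants listed for FIG. 34; the dictionary layout and the function name `log_view` are illustrative assumptions.

```python
# Sync point tree of FIG. 34, written as node -> immediate participants.
TREE = {
    "56B": ["63B", "56D"],
    "56D": ["63E", "63D", "56F"],
    "56F": ["63G", "63F"],
}


def log_view(node):
    """Return (immediate predecessor, immediate participants) for a node.

    This is all that the node's sync point log records; more distant
    members of the tree are not known to it.
    """
    predecessor = next((p for p, kids in TREE.items() if node in kids), None)
    return predecessor, TREE.get(node, [])
```

For application 56D the view is `("56B", ["63E", "63D", "56F"])`, which matches the text: nothing about 63B, 63G or 63F is recorded in log 72D.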

FIG. 35 is a high level flowchart 298 of the principal elements for sync point recovery. It represents the two parts of a recovery facility 70; pre-sync point recovery agreement (Steps 299, 300, 301 and 302) and recovery from sync point failure (Steps 303-306).

Prior to a sync point occurrence there must be agreement between the participants in the sync point concerning the identity of the logs associated with the sync point and the current level of their respective logs 72. (Refer to the foregoing section entitled "Log Name Exchange For Recovery of Protected Resources"). This pre-sync point recovery agreement is important in case of a sync point failure to ensure that the logs used to recover from the sync point failure are the same ones and are at the same level as they were before the sync point was initiated. If, between the time of the pre-sync point recovery agreement (exchange of log names described above) and the occurrence of a sync point failure, one or more of the participants has a log failure and must begin with a new log, then the automatic recovery procedures associated with the failing log will fail.

The exchange of log names between the sync point participants and the recording of log names in the logs 72 make this information available for validation in the case of a sync point failure. These exchanges are initiated upon the detection of the first establishment of communications over a particular path. Because communications can be initiated locally or remotely, the recovery facility 70 supports both local detection (Steps 299 and 300) requiring an outgoing log name exchange and remote detection (Steps 301, 302) requiring an incoming log name exchange.

The recovery facility 70 provides automatic recovery from sync point failure and includes Step 303 - the various events that may occur to initiate the recovery procedure, Step 304 - the initialization of the recovery procedure, Step 305 - the actual recovery, referred to as a recovery driver process, and Step 306 - the termination of the recovery procedure. The recovery facility 70 includes asynchronous handling of multiple sync point failure events.

FIG. 36 shows more detail for the "Recovery From Syncpoint Failure" portion of the recovery procedure (Steps 303-306). Five types of events (Step 303) initiate the recovery procedure:

(1) A sync point request event 311 occurs as a result of receiving a request from a sync point manager 60 when it encounters a communications failure with one or more of its sync point participants (e.g. resource managers 63). The sync point manager 60 initiates the recovery procedure explicitly by sending a request to the recovery facility 70 using the same path that is used for logging the sync point activity. The request includes a description of the failing participant(s) using the corresponding sync point identifier(s). An event occurs for each sync point identifier that is specified.

(2) A recovery request event 312 occurs at a target recovery facility 70 (one that represents a participant in a failing sync point) when a recovery process that represents a sync point initiator sends a recovery request to one of its participants.
(3) A communications failure event 313 occurs in a recovery facility 70 when there is a broken connection on the path used to send log information from the application environment to that recovery facility. An event occurs for each sync point that is in progress for the application environment that was utilizing the failed path.

(4) A recovery facility failure event 314 occurs when there is a termination failure for a recovery facility such that sync point logging cannot take place. An event occurs for each incomplete sync point at the time of the failure and the events occur when the recovery facility is restarted.

(5) A recovery administrative request event 315 results from an administrative command that is used to repair sync point failures that have encountered prolonged delays or serious failures during the normal, automatic recovery procedure. The request manually supplies response state information that is normally available through automatic recovery protocols. The appropriate response data (state information) is determined off-line by administrators from manual investigation of sync point log records.

When the recovery procedure is initiated, Step 304 starts an asynchronous sub-process for each recovery event received. A participation driver sub-process (Step 317) initiates communications and accepts responses from each downstream participant in the failing sync point for the purpose of agreeing upon a consistent resolution. This communication involves the participation driver sending a message that includes the recovery server log name and a sync point state such as commit or back out, and then receiving a response from the participant that includes an indication of agreement or disagreement with the recovery server log name sent, a participant log name, and a response to the sync point state, such as committed or backed out. The participation driver invokes a response processing driver (Step 318) for each response message thus received. The response processing driver analyzes the response and completes all required actions and recording. This involves checking the participant's log name against the one recorded for the participant in log 72 to verify that the participant has not had a log failure since the sync point began. It further involves posting the sync point response to the recovery facility log 72. Then the response processing driver returns to the participation driver. When all responses are received and processed, an initiator response driver (Step 319) is invoked to build and send a response to the recovery facility that represents the initiator of the sync point, permitting it, in turn, to resolve the sync point with its initiator, if applicable. The response to the initiator is similar to the response that the current recovery facility received from its participants, involving a return of the current recovery facility log name and the response sync point state, such as committed or backed out, that is based on the results from all of its own sync point participants. Finally, a recovery terminator (Step 306) terminates all involved processes.
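The flow through the participation driver, the response processing driver and the initiator response driver can be sketched with Python's asyncio. This is only a rough model under stated assumptions: the participant conversation is simulated by a local coroutine, the message fields and function names are hypothetical, and the rule used to combine participant states (committed only if every participant committed) is a simplification that ignores heuristic outcomes.

```python
import asyncio


async def ask_participant(participant, rf_log_name, state):
    """Hypothetical stand-in for the conversation with one downstream participant."""
    await asyncio.sleep(0)  # yield control, as a real network round trip would
    return {
        "log_name_agreed": participant["expects_rf_log"] == rf_log_name,
        "participant_log_name": participant["log_name"],
        "state": "committed" if state == "commit" else "backed out",
    }


def process_response(sync_log, name, recorded_log_name, response):
    """Response processing driver (Step 318): validate log names, post the state."""
    if (not response["log_name_agreed"]
            or response["participant_log_name"] != recorded_log_name):
        raise RuntimeError(f"log name mismatch for {name}; manual intervention needed")
    sync_log[name] = response["state"]


async def recover(rf_log_name, state, participants, recorded_names):
    """Participation driver (Step 317), then the initiator response (Step 319)."""
    sync_log = {}
    names = list(participants)
    # Contact all downstream participants concurrently.
    responses = await asyncio.gather(
        *(ask_participant(participants[n], rf_log_name, state) for n in names)
    )
    for name, response in zip(names, responses):
        process_response(sync_log, name, recorded_names[name], response)
    committed = all(s == "committed" for s in sync_log.values())
    outcome = "committed" if committed else "backed out"
    # The reply to the initiator carries this facility's log name and the combined state.
    return {"rf_log_name": rf_log_name, "state": outcome}, sync_log
```

A caller would run one such coroutine per recovery event, for example with `asyncio.run(recover(...))`, which mirrors the one asynchronous sub-process per event started in Step 304.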

FIG. 37 illustrates control structures required for the recovery procedure. A recovery control structure 340 contains information about a particular recovery event and exists throughout the current processing of the event. It contains information that is common to the recovery of all participants for the related sync point. It also contains anchors to an associated entry 342 in log 72 and to a chain of participant control structures 344, each of which contains the current recovery status and path identifier for the recovery participant. The sync point log entry 342 has header information 348 that is common to the local sync point participants as well as body information 350 about the immediate initiator and each immediate participant. Finally there is a log name log entry 354 which contains initial log name exchange information for each sync point path known to the recovery facility that is associated with the sync point log.

The purpose of these fields is further indicated by the structural flows that follow. Some fields require preliminary description:

"Chain" fields are used to interconnect structures of like type.

"State" fields:

SPL_SYNCPOINT_STATE is the overall sync point state. Once the sync point has reached phase two, this state permits driving downstream participants to resolve the sync point. If the sync point was in phase one at the time of failure, recovery request event processing may change this state according to the direction provided by the initiator recovery facility.

SPL_PARTICIPANT_STATE is updated with response states from participants by the Response Processing Driver 318.

RCS_PARTICIPANTS_STATE is set by the various recovery event processing for the purpose of driving the affected downstream sync point participants.

RCS_INITIATOR_RESPONSE_STATE is initialized by the various recovery event processing 311-315 along with RCS_PARTICIPANTS_STATE, but under some circumstances is also updated by the response processing driver 318 where the response to the initiator is to reflect unusual and unexpected responses from participants that result from unilateral decisions known as heuristic responses. This field is used by the initiator response driver 319 to provide the state returned to the initiator.

"Path ID" fields:

RCS_PATH_ID is the path associated with an incoming event and may be used to respond to the originator of that event.

PCS_PATH_ID is the path associated with a participant in a failed sync point. It would be the same as the SPL_RECOVERY_PATH_ID for participants.

SPL_RECOVERY_PATH_ID is the path to get to the participant or the initiator as needed by the sync point recovery facility.

SPL_SYNCPOINT_PATH_ID is the path used by sync point processing in the application environment to supply sync point log information to the local recovery facility's sync point log.

"Flags":

RCS_RESPOND_TO_INITIATOR indicates that a response should be generated to the immediate initiator of the sync point recovery facility;

RCS_RETURN_TO_CALLER is used for controlling synchronous return from a sync point recovery request when the wait indicator (described below) is used;

RCS_ERASE_LOG is used to record that a recovery administrative request included a PURGE option, causing the sync point log entry to be erased at the conclusion of processing; and

SPL_INITIATOR indicates that the information in the particular sub-entry of the BODY of the sync point log entry concerns the initiator of the sync point; otherwise it concerns a participant.

"Miscellaneous" fields:

RCS_FUNCTION_ID is used by the sub-process starter service to determine the function to be invoked to execute in the new process.

SPL_SYNCPOINT_ID is the unique identifier of the sync point and the node in the sync point tree. Each sync point log entry has a distinct sync point identifier.

SPL_SUSPENDED_PROCESS_ID is set by the timer wait service to identify the suspended process and reset when the timed wait interval expires. It is used to invoke the resume service to prematurely terminate the timed wait for a particular process.

PCS_STATUS is used to record the status of communications with each participant in the recovery procedure. It has four possible values: RESTART, CONNECTED, RETRY, and RESPONDED.

LL_LOGNAME is the log name of the sync point participant. One is recorded for each path involved in any potential sync point communication.
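The fields described above can be pictured as plain record types. The following is a minimal Python sketch, not the patented implementation: the class names, the field types, and the use of strings for state values are assumptions made for illustration. Only the field names and the four PCS_STATUS values come from the text.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class PcsStatus(Enum):
    """The four PCS_STATUS values named in the text."""
    RESTART = "RESTART"
    CONNECTED = "CONNECTED"
    RETRY = "RETRY"
    RESPONDED = "RESPONDED"


@dataclass
class ParticipantControlStructure:
    pcs_path_id: str                              # path to the recovery participant
    pcs_status: PcsStatus = PcsStatus.RESTART     # communication status


@dataclass
class SyncPointLogSubEntry:
    """One sub-entry in the BODY of a sync point log entry."""
    spl_recovery_path_id: str
    spl_initiator: bool = False                   # True: initiator, False: participant
    spl_participant_state: Optional[str] = None   # string encoding is assumed


@dataclass
class SyncPointLogEntry:
    spl_syncpoint_id: str
    spl_syncpoint_state: str                      # string encoding is assumed
    spl_syncpoint_path_id: str
    body: List[SyncPointLogSubEntry] = field(default_factory=list)
    spl_suspended_process_id: Optional[int] = None


@dataclass
class LogNameLogEntry:
    path_id: str
    ll_logname: str                               # log name recorded for this path


@dataclass
class RecoveryControlStructure:
    log_entry: SyncPointLogEntry                  # anchor to the sync point log entry
    rcs_participants_state: str
    rcs_initiator_response_state: Optional[str] = None
    rcs_path_id: Optional[str] = None
    rcs_respond_to_initiator: bool = False
    rcs_return_to_caller: bool = False
    rcs_erase_log: bool = False
    rcs_function_id: Optional[str] = None
    participants: List[ParticipantControlStructure] = field(default_factory=list)
```

In this sketch the "chain" fields are modeled as Python lists, and the "anchor" from the recovery control structure to its log entry is an ordinary object reference.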

FIG. 38 is a flowchart which illustrates the processing (Step 300), triggered by event Step 299 (corresponding to the same step in FIG. 35) and executed by recovery facility 70 when a sync point communication is initiated for the first time during the activation of the recovery facility. It initiates a process (Step 359) for exchanging log names between the local recovery facility and the recovery facility associated with the target of the sync point communication.

A receive service (Step 361) provides the input data (path identifier) for the process. The log name log is used (Step 362) to retrieve the log name associated with the path for use in the exchange of log names. In the log name exchange, the expected log name for the target is sent along with the log name of the local recovery facility. The log name exchange request is sent (Steps 363-365) and the response is processed (Step 366). When the exchange is successful, the log name log is updated with the new or changed target log name. Then the recovery facility disconnects from the path (Step 367) and invokes a first communication service to record either that the exchange was successful, to prevent future exchange events for the processed path, or that it was unsuccessful, to ensure continued suspension of communications and further attempts to complete an exchange of log names (Step 368).

FIG. 39 is a flowchart which illustrates in detail the processing (Step 302), triggered by event Step 301 (corresponding to the same step in FIG. 35), that takes place as a result of an incoming log name exchange request arrival. After an initiation (Step 370), the log name and path identifier are received (Step 371) and the log name log is updated accordingly (Steps 371-373). If there are any recovery processes associated with the path that are in suspension (timer-wait) (Step 374), then the recovery facility 70 invokes the resume service for each to cause resumption of the processes. The log name exchange response (Step 374A) includes the local log name and an indication of agreement/disagreement with the exchange data received. The response is sent to the originator (Step 375) and, for successful exchange, the first communications service is invoked (Step 376) to prevent subsequent exchange of log names for the path.
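The receiving side of the log name exchange can be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions: the message classes, the function name, and the rule that "agreement" means the sender's expected name equals the local log name (or no name is expected on first contact) are hypothetical. The text only states that the response carries the local log name and an agreement/disagreement indication.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ExchangeRequest:
    path_id: str
    sender_log_name: str
    expected_target_log_name: Optional[str]   # None on first contact (assumption)


@dataclass
class ExchangeResponse:
    local_log_name: str
    agreed: bool


def handle_exchange_request(local_log_name: str,
                            log_name_log: Dict[str, str],
                            request: ExchangeRequest) -> ExchangeResponse:
    """Process an incoming log name exchange request for one path."""
    # Assumed agreement rule: the sender expects our current log name,
    # or has no expectation yet.
    agreed = request.expected_target_log_name in (None, local_log_name)
    if agreed:
        # Record the sender's log name against the path it arrived on.
        log_name_log[request.path_id] = request.sender_log_name
    return ExchangeResponse(local_log_name=local_log_name, agreed=agreed)
```

A disagreement leaves the log name log unchanged in this sketch, so a later recovery attempt over the same path would trigger another exchange.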

FIG. 40 is a flowchart which illustrates the procedure for an explicit request event (Step 311 corresponds to the same step in FIG. 35) from an active sync point to perform sync point recovery. This would occur if there were a partial failure in an application environment 52 requiring recovery from a sync point but not terminating the application or sync point. The request from the sync point manager in the application environment 52 provides the sync point identifier and the direction (commit or back-out) to be used to complete the failing sync point. Additionally, for each failed participant in the sync point, the recovery path identifier is supplied. The required action can complete synchronously (wait indicator supplied) or asynchronously as described in more detail below (no wait indicator supplied).

The arrival of this request is an event that initiates (Step 379) a procedure (Step 380) which requires searching the sync point log (Step 381) for an entry that has a matching SPL_SYNCPOINT_ID. When found, a recovery control structure is built (Step 382) with an anchor to the sync point log entry and RCS_PARTICIPANTS_STATE set to the direction passed in the request. Additionally, the RCS_RESPOND_TO_INITIATOR flag setting prevents sending a response to a recovery facility representing the initiator of the sync point and, in the case where the wait indicator is passed, the RCS_RETURN_TO_CALLER flag is set, causing the response to the request to be deferred until the recovery procedure is completed. Without the wait indicator, there is a response to the initiating request after the recovery procedure is started. Next, an agent control structure is built (Step 383) for each participant, represented by the path identifiers provided, and PCS_STATUS is initialized to RESTART. The chain of agent control structures is anchored to the recovery control structure. Next, recovery initialization is invoked (Step 384), passing the recovery control structure. When returning from the initialization, there is a response to the invoker (Step 385). When the wait indicator was used, the invoker is advised of completion; otherwise, the notification is either completion or an indication that the request processing was begun (will complete later).

FIG. 41 is a flowchart illustrating the procedure that results from an event initiated (Step 312) by receiving a recovery request from a recovery facility that represents the immediate initiator in a failing sync point. This initiates (Step 388) a procedure (Step 390) which invokes the receive service (Step 391) to obtain the path ID associated with the incoming request, the sync point identifier for the failing sync point (which also identifies the local node in that sync point), the log name associated with the originator's sync point log, the log name that the initiator's recovery facility expects to match with the name of the sync point log for the current recovery facility, and the direction (state) to be used to resolve the failure.

The path identifier is used to find an entry in the local log name log (Step 392). Then LL_LOGNAME is verified with the originator's log name and the local sync point log name is verified with the expected log name passed (Step 393). Next, the sync point log is searched for an entry with the matching sync point identifier (Step 394). When found, a recovery control structure is built (Step 395) with an anchor to the sync point log entry and RCS_PARTICIPANTS_STATE set to the direction passed in the request. Additionally, the RCS_RESPOND_TO_INITIATOR flag is set to indicate that a response to the initiator is appropriate and the RCS_PATH_ID is set to the path identifier of the initiator's incoming path. The RCS_RETURN_TO_CALLER flag is set to prevent return to the calling sync point manager 60 in the application environment 52. Finally, recovery initialization is invoked (Step 396), passing the recovery control structure.

FIG. 42 is a flowchart illustrating the processing (Step 400) that results when there is a failure in the path (Step 313) between the application environment 52 and the recovery facility 70 such that sync point logging is inoperative. After the process is initiated (Step 399), the sync point log is searched for entries that satisfy both of the following conditions (Step 401):

(1) SPL_SYNCPOINT_PATH_ID matches the failing path.

(2) SPL_SYNCPOINT_STATE indicates that the immediate sync point participants can be driven to complete the sync point. This is indicated by one of the following: SPL_SYNCPOINT_STATE indicates sync point phase one and there has not been a response to the initiator's "prepare", or SPL_SYNCPOINT_STATE indicates sync point phase two.

Where these conditions are met, a recovery control structure is built (for each such log entry) (Step 402) with an anchor to the sync point log entry, where both RCS_INITIATOR_RESPONSE_STATE and RCS_PARTICIPANTS_STATE are derived from the SPL_SYNCPOINT_STATE. In some cases, SPL_PARTICIPANT_STATE also affects the setting of the RCS_INITIATOR_RESPONSE_STATE. This occurs, for example, when a response from a participant had indicated a unilateral (heuristic) action. Additionally, the RCS_RESPOND_TO_INITIATOR flag setting prevents sending a response to a recovery facility representing the initiator of the sync point and the RCS_RETURN_TO_CALLER flag setting indicates that there is no calling sync point manager to which to return. The resulting recovery control structures are chained together. Finally, recovery initialization is invoked (Step 403), passing the chain of recovery control structures.
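The log search for a failed path amounts to a filter over the sync point log. A minimal Python sketch follows, assuming a simplified log entry in which SPL_SYNCPOINT_STATE is reduced to a phase number plus a flag recording whether the initiator's "prepare" has been answered; that encoding and the function names are illustrative assumptions, not the patent's representation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LogEntry:
    spl_syncpoint_id: str
    spl_syncpoint_path_id: str
    phase: int                    # 1 or 2; assumed encoding of SPL_SYNCPOINT_STATE
    responded_to_prepare: bool    # True once the initiator's "prepare" was answered


def can_drive_participants(entry: LogEntry) -> bool:
    """Condition (2): phase two, or phase one with no response to "prepare" yet."""
    if entry.phase == 2:
        return True
    return entry.phase == 1 and not entry.responded_to_prepare


def entries_to_recover(log: List[LogEntry], failing_path: str) -> List[LogEntry]:
    """Apply conditions (1) and (2) to every sync point log entry."""
    return [
        entry for entry in log
        if entry.spl_syncpoint_path_id == failing_path
        and can_drive_participants(entry)
    ]
```

One recovery control structure would then be built for each entry returned.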

FIG. 43 is a flowchart which illustrates processing (Step 408) that results when there is a failure of the recovery facility 70 (Step 314). When the recovery facility 70 is restarted (Step 407), the log 72 is searched (Step 411) for all entries that satisfy the following condition:

SPL_SYNCPOINT_STATE indicates that the immediate sync point participants can be driven to complete the sync point. This is indicated by one of the following: SPL_SYNCPOINT_STATE indicates sync point phase one and there has not been a response to the initiator's "prepare", or SPL_SYNCPOINT_STATE indicates sync point phase two.

Where this condition is met, a recovery control structure is built for each such log entry (Step 412) with an anchor to the sync point log entry, where both RCS_INITIATOR_RESPONSE_STATE and RCS_PARTICIPANTS_STATE are derived from the SPL_SYNCPOINT_STATE. In some cases, SPL_PARTICIPANT_STATE also affects the setting of the RCS_INITIATOR_RESPONSE_STATE. This occurs when a response from a participant had indicated, for example, a unilateral (heuristic) action. Additionally, the RCS_RESPOND_TO_INITIATOR flag setting allows for sending a notification to the recovery facility representing the initiator of the sync point and the RCS_RETURN_TO_CALLER flag setting indicates that there is no calling process to which to return. The resulting recovery control structures are chained together. Finally, recovery initialization is invoked (Step 413), passing the chain of recovery control structures.

FIG. 44 is a flowchart which illustrates a support (Step 409) for recovery administrative requests (Step 315) which permits manually initiated repair of stalled automatic sync point recovery due to failure to initiate a conversation with a sync point participant (participant case) for downstream resolution or a sync point initiator (initiator case) for providing the direction (state) to drive its participants to completion.

In the participant case, the request provides a substitution for the participant's response so that the recovery facility 70 that is driving the downstream participants can complete the recovery without actually communicating with the participant. In the initiator case, the request provides a substitution for the normal recovery initiated recovery request event (as described in FIG. 41) that cannot occur due to the inability of the initiator to connect to the local recovery facility 70; in the latter case, the response permits the local recovery facility 70 to drive its participants without the event depicted in FIG. 41.

In the initiator case, after the support is initiated (Step 408), a recovery control structure is built (Step 414), setting the RCS_INITIATOR_RESPONSE_STATE and RCS_PARTICIPANTS_STATE to the direction passed, providing the equivalent of a recovery initiated recovery request. In addition, RCS_RESPOND_TO_INITIATOR is set off to prevent response generation and RCS_RETURN_TO_CALLER is set off to prevent return from recovery initialization when processing is complete. Recovery initialization is invoked (Step 415) to initiate the processing.

In the participant case, a recovery control structure and a suspended recovery process should already exist. The process is suspended while in timer-wait, retrying the initialization of a conversation to the participant at the end of each time interval. After verifying this (Step 416), the PCS for the participant associated with the passed recovery path identifier is located and the PCS_STATUS is set (Step 417) to RESPONDED, as if the participant had actually responded, and the SPL_PARTICIPANT_STATE is set to the direction passed; then the sync point log entry is updated. Next, the SPL_SUSPENDED_PROCESS_ID is used to call the resume service to restart the suspended process (Step 418). In either case, there is a response made to the originating request (Step 419), indicating that the proper substitutions have been made and the recovery process is active again. If the purge option is passed, RCS_ERASE_LOG is turned on to erase the sync point log entry at the conclusion of processing.

FIG. 45 is a flowchart which illustrates the steps required for the recovery initialization function (Step 304). After initialization (Step 303), the RCS_RETURN_TO_CALLER flag determines (Step 421) whether the participation driver is invoked in the current process (ON) or in a separate, parallel process (OFF). Where RCS_RETURN_TO_CALLER is set, the participation driver is invoked (Step 422), passing the recovery control structure. Otherwise, the RCS_FUNCTION_ID is set to indicate the "participation driver" and the sub-process starter service is invoked for each recovery control structure passed (Step 423).
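The choice between running the participation driver in the current process and starting it in parallel can be sketched as below. This is a hedged Python illustration: a thread stands in for the patent's sub-process starter service, and the `Rcs` class, the function signature, and the returned list of workers are assumptions introduced for the example.

```python
import threading
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Rcs:
    name: str
    rcs_return_to_caller: bool
    rcs_function_id: Optional[str] = None


def recovery_initialization(chain: List[Rcs],
                            participation_driver: Callable[[Rcs], None]
                            ) -> List[threading.Thread]:
    """Dispatch each recovery control structure according to RCS_RETURN_TO_CALLER."""
    started: List[threading.Thread] = []
    for rcs in chain:
        if rcs.rcs_return_to_caller:
            # ON: run in the current process; the caller waits for completion.
            participation_driver(rcs)
        else:
            # OFF: record the function to run and start it in parallel.
            rcs.rcs_function_id = "participation driver"
            worker = threading.Thread(target=participation_driver, args=(rcs,))
            worker.start()
            started.append(worker)
    return started
```

Returning the started workers is a convenience of the sketch so that a test or caller can join them; the text itself does not describe such a return value.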

FIG. 46 is a flowchart which illustrates the flow for the participation driver (Step 317). The primary function of the participation driver is to initiate communications with the participants of the failing sync point and obtain responses from them, in order to ensure that the associated sync point logs are at the same level as they were when the sync point began and to provide sync point state information that will be the basis for resolving the sync point.

After initiation of the participation driver (Step 430), the SPL_SYNCPOINT_STATE is set (Step 431) according to the current RCS_PARTICIPANTS_STATE. If participation control structures have not already been built for the sync point participants, they are built at this time, chained together, and anchored to the current recovery control structure. PCS_PATH_ID comes from the SPL_RECOVERY_PATH_ID of each participant and the PCS_STATUS is initialized to RESTART, unless SPL_PARTICIPANT_STATE indicates that the sync point is resolved for the particular participant, whereupon it is set to RESPONDED.

The flow of Steps 432-444 is controlled by the PCS_STATUS value for each participant. The possible values are:

(1) RESTART - indicates that a conversation with the participant is required.

(2) CONNECTED - indicates that there was success in initializing a conversation with the participant and causes the sending of the recovery request message to the participant.

(3) RESPONDED - indicates that the sending of the recovery request message to the participant completed with a response from the participant. The response processing driver is invoked (Steps 438-439) to handle the response.

(4) RETRY - indicates failure in an attempt to connect (i.e. establish a conversation) (Steps 436-437) or send a message (Steps 440-441), or a mismatch of log names (Steps 440-441). After all PCS_STATUS flags for participants have progressed beyond the RESTART and CONNECTED status, but there are some that have encountered communications failures (the remainder RESPONDED), the participation driver for the current sync point recovery suspends itself for a timed interval. When the suspension is completed, all PCS_STATUS of RETRY are changed to RESTART, which causes attempts to reconnect.

The multiple event wait service (Step 433) is used to wait for completion of the first of any outstanding connect or send service requests, returning control to the participation driver with the path identifier and indication of success or failure. The recovery request sent to the participant (Steps 434-435) includes the log name of the sending recovery facility 70 and the expected log name associated with the participant. The RCS_PARTICIPANTS_STATE is sent to permit a comparison with the participant's actual state, defining the appropriate recovery action. The timed wait service (Steps 442-443) is used to delay processing for a system-defined time interval before re-attempting unsuccessful initiation of a conversation. This intentional delay is undertaken only after all participation paths have been driven and some failures have been encountered. Timed-wait completion (Step 444) serves to restart suspended processes, which causes another attempt to connect with the participant. After all participants have attained a RESPONDED status and completed processing by the response processing driver, the initiator response driver is invoked (Step 445) to handle possible responses to the recovery process that represents the sync point initiator.
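The PCS_STATUS progression driven by the participation driver can be sketched as a small state machine. The Python below is a simplified illustration, not the flow of FIG. 46 itself: it visits participants sequentially instead of using a multiple event wait service, the `connect`, `send`, and `timed_wait` callables are hypothetical stand-ins for the services named in the text, and the `max_rounds` bound is an assumption added so the sketch always terminates.

```python
from typing import Callable, Dict

RESTART, CONNECTED, RETRY, RESPONDED = "RESTART", "CONNECTED", "RETRY", "RESPONDED"


def drive_participants(status: Dict[str, str],
                       connect: Callable[[str], bool],
                       send: Callable[[str], bool],
                       timed_wait: Callable[[], None],
                       max_rounds: int = 10) -> int:
    """Drive every path in `status` to RESPONDED; return the number of timed waits."""
    waits = 0
    for _ in range(max_rounds):
        for path in status:
            if status[path] == RESTART:
                # A conversation is required: try to establish it.
                status[path] = CONNECTED if connect(path) else RETRY
            if status[path] == CONNECTED:
                # Send the recovery request; a failed send also leads to RETRY.
                status[path] = RESPONDED if send(path) else RETRY
        if all(state == RESPONDED for state in status.values()):
            return waits
        # Some paths failed: suspend once for all of them, then retry.
        timed_wait()
        waits += 1
        for path, state in status.items():
            if state == RETRY:
                status[path] = RESTART
    raise TimeoutError("participants still unresolved after max_rounds")
```

As in the text, the delay is taken only after every path has been driven once in the round, so a single unreachable participant does not hold up the others.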

FIG. 47 is a flowchart which illustrates the processing required to process a response to a recovery request sent to a participant in a failed sync point. The response processing driver (Step 318) is passed the sync point identifier, path identifier, and the state received from the participant (Step 450). Then, the log name exchange response is processed (Step 451). If log names do not match, flow is returned to the participation driver (Step 317 FIG. 36) with an error that will cause a timed-wait retry to occur.

The sync point identifier is used to locate the sync point log entry; then the path identifier is used to locate the participant in the body of that sync point log entry, matching on SPL_RECOVERY_PATH_ID. Then the SPL_PARTICIPANT_STATE is updated with the state (Step 452).

The RCS_INITIATOR_RESPONSE_STATE is updated in some cases as a result of unexpected responses from participants, e.g. reflecting unilateral (heuristic) decisions (Step 453). Finally, the disconnection service is invoked to disconnect the current path (Step 454).

FIG. 48 is a flowchart which illustrates the initiator response driver (Step 319). First, the initiator response driver is initiated (Step 460). When RCS_RESPOND_TO_INITIATOR is not set (decision block 461), it is not necessary to respond; therefore, it is only necessary to erase (Step 468) the sync point log entry. Response is also bypassed when (Step 462) there is no initiator to which to respond, i.e. when the recovery facility represents the first node in the sync point tree.

When there is no suspended recovery initiated recovery request (event illustrated in FIG. 41) to handle the response to the initiator and there is no existing conversation to which to respond to the initiator (Decision Step 479), then it is appropriate to attempt upstream communications with the recovery facility that represents the initiator in order to notify it that the participant represented by the current recovery facility 70 is ready with a response (Step 464). This is most effective when there is a recovery facility for the initiator that is in timed suspension due to an earlier failed attempt to communicate with the local recovery facility 70, i.e., when the currently completed recovery resulted from a sync point failure that resulted in a failure of the local recovery facility 70 (event illustrated in FIG. 43). This upstream communication would have the effect of prematurely terminating the timed suspension and therefore minimizing the delay in resolving the sync point. FIG. 39, Step 374 illustrates the action by the receiving recovery facility (representing the initiator).

If the SPL_SUSPENDED_PROCESS_ID is not defined and the RCS_PATH_ID is not set (decision block 479), the upstream communication is accomplished by finding the entry for the initiator in the body of the sync point log entry for the recovering sync point and using the SPL_RECOVERY_PATH_ID that is associated with it to invoke the connection service for SPL_RECOVERY_PATH_ID. There is no retry when this attempt to initialize a conversation fails ("no" decision path in Step 464A) because it is an optional optimization to complete the conversation and notify the initiator. If the conversation is initiated ("yes" decision path in Step 464A), a normal exchange of log names request is sent (Step 464B), as illustrated in FIG. 38, Steps 364 through 367, then exit via decision Step 477. In the case of connection not completed, invoke recovery termination (Step 479).

When the RCS_PATH_ID is not set (Decision Block 465), the response to the initiator (Steps 466 and 467) is bypassed. Otherwise, a normal response to the initiator is made, using the RCS_INITIATOR_RESPONSE_STATE (Step 466) and the respond service (Step 467). In the case where RCS_RESPOND_TO_INITIATOR or RCS_ERASE_LOG is on (Decision Block 477), the recovery termination function is invoked (Step 469) before completion.

FIG. 49 is a flowchart which illustrates the recovery termination logic (Step 306) which involves, after initiation in Step 470, cleaning up storage and control structures (Step 471), and either returning to the caller (END) or invoking the sub-process termination service to complete the current process (Step 472).


ASYNCHRONOUS RESYNCHRONIZATION OF A COMMIT PROCEDURE

When there is a failure during syncpoint processing in system 50, the following asynchronous resynchronization procedure and facilities are provided to optimize the use of the participating applications. This procedure avoids extended delays in executing the application which issued a commit because the application need not wait idly during resynchronization. Instead, as described in more detail below, the application can do other useful work while waiting for resynchronization. The syncpoint manager and recovery facility execute this procedure, provided either the application or a system default requested it. The recovery facility 70 supports asynchronous resynchronization (resynchronization-in-progress) and supports the new enhancements to the architected intersystem communications flows in support of this asynchronous resynchronization process. By way of example, the intersystem communications protocols are defined by IBM's System Network Architecture LU 6.2 Reference: Peer Protocols, SC31-6808, Chapter 5.3 Presentation Services - Sync Point verbs. The architected intersystem communication enhancements within systems 50 include additional indications on such flows of Committed (last agent only), Forget, and Backout indicating resynchronization is in progress. In the data field defined for exchange log names between two different system recovery facilities during initial exchange or during resynchronization, there is an indicator that the sender of the exchange log names supports resynchronization-in-progress. Exchange log names processing is described above in the section entitled Log Name Exchange For Recovery of Protected Resources. Both recovery facilities must support resynchronization-in-progress in order for the facility to be used. Finally, there is an indicator in the compare states data field that tells the partner that resynchronization is in progress.

The foregoing section entitled Coordinated Sync Point Management of Protected Resources and FIG. 2, FIG. 54, FIG. 3, FIG. 4, and FIG. 5(a,b) describe and illustrate two partner applications, 56A and 56D, their application environments, their processing and successful commit processing. The present section will extend the above to include a description of a failure during commit processing which results in asynchronous resynchronization. It should be understood that the asynchronous resynchronization process described herein is also applicable when a protected conversation is made between application partners on the same system and both are in different application environments, for example different virtual machines of the enhanced version of the VM operating system ("VM" is a trademark of IBM Corporation of Armonk, N.Y.). It should also be noted that in other embodiments, application 56A or application 56D could execute in a different type of execution environment.

As described in the section entitled Coordinated Sync Point Management of Protected Resources, application 56A starts application 56D via a protected conversation (FIG. 5A, Step 530). Protected conversation adapters 64A and 64D register with their respective syncpoint managers (FIG. 5A, Step 532). FIG. 50A expands the processing done next by application 56A (FIG. 5A, Step 533). As shown in FIG. 50A, application 56A issues to syncpoint manager 60A a 'set syncpoint options wait=no' call to indicate that application 56A does not desire to wait indefinitely for a synchronous resynchronization if there is a failure during syncpoint processing (Step 900) and syncpoint manager 60A records the option (Step 902). Similar processing (Steps 904 and 906 of FIG. 50B) is done by application 56D after application 56A contacts it to do some work (FIG. 5A, Step 533). It should be noted that in the illustrated embodiment, the architected default is WAIT=yes. However, if desired, the default condition could be WAIT=no at system 50A and system 50D. In such cases, it is not necessary for application 56A and application 56D to issue the 'set syncpoint options' call if they desired WAIT=no. 'Set syncpoint options' is a local value. Therefore, the value of the 'syncpoint options' in effect at the syncpoint manager where the failure is detected is the one used.

Processing continues as described in the foregoing section entitled Coordinated Sync Point Management of Protected Resources and illustrated in FIG. 2 and FIG. 5(a,b), Steps 533A through 546. Summarizing the above details, application 56A sends a request to application 56D over the protected conversation, causing application 56D to update file 78D. Application 56D replies to application 56A, causing application 56A to update files 78A and 78B. Application 56A issues a commit (Step 534 of FIG. 5A), causing syncpoint manager 60A to call protected conversation adapter 64A to send a phase one 'prepare' call to protected conversation adapter 64D. This causes application 56D to receive a request asking it to issue a commit. Application 56D issues a commit (Step 537) and syncpoint manager 60D does its phase one processing and calls protected conversation adapter 64D to reply 'request commit' to protected conversation adapter 64A. At this time syncpoint manager 60D's state is 'in doubt' (and is so noted on its log 72D). Protected conversation adapter 64A replies 'request commit' to syncpoint manager 60A. Since its other resources also replied 'request commit', syncpoint manager 60A's state is now 'committed' and it writes this state to its log, 72A. Syncpoint manager 60A now contacts its registered resources with the phase two decision of 'committed' (FIG. 5B, Step 545). Protected conversation adapter 64A then sends the phase two decision of 'committed' to protected conversation adapter 64D (FIG. 5B, Step 546). However, during this processing protected conversation adapter 64A discovers a failure such that the path between system 50A and system 50D for the protected conversation between application 56A and application 56D is no longer available. Protected conversation adapter 64A replies 'resource failure' to syncpoint manager 60A. This is an interruption in syncpoint manager 60A's processing (FIG. 5B, Step 550), causing syncpoint manager 60A to start recovery processing (FIG. 5B, Step 557).

The recovery procedures are defined by the two-phase commit example being used. In the illustrated embodiment, the two-phase commit example is the one used in the section entitled Coordinated Sync Point Management of Protected Resources. Recovery processing occurs if a protected resource adapter replies abnormally to the syncpoint manager's phase one or phase two call. The abnormal reply is the result of a resource failure which may be caused by a system failure, path failure, program failure or resource manager failure. Recovery is conducted independently for each failed protected resource for which it is required. Recovery has the following purposes:

1. to place protected resources in a consistent state if possible; if not possible, to notify the operators at the system or, in the case of a failed protected conversation, systems that detected the damage;

2. to unlock locked resources in order to free them for other uses; and

3. to update the recovery facility log, showing that no more syncpoint work is needed for all protected resources, for that LUWID.

The steps involved in recovery, i.e. resynchronization, include the following:

1. The data structures from the recovery facility log records representing the status of the syncpoint operation are restored if the system failed where this recovery facility operates. From these data structures, the recovery facility can determine the resources for which it is responsible for initiating recovery (in other embodiments the recovery facility might be called the syncpoint manager because one facility performs both syncpoint and recovery processing). If the recovery occurs without a system failure, it is not necessary to restore information from the log because the data structures written during syncpoint used by the recovery facility are still intact.

2. A program in the recovery facility that is responsible for initiating recovery is started. For the conversation example used for protected conversations in this illustrated embodiment this means:

for protected conversations, establishing a non-protected conversation of a type requiring confirmation with a partner recovery program running in the recovery facility in the system originally involved in the syncpoint (this may require a new path between the two systems to be activated);

exchanging log names to verify that the partner has the appropriate memory of the LUWID;

comparing and adjusting the state of the LUWID (i.e. commit or backout) at both partners; and

erasing recovery facility log entries and notifying the operators at both partners of the outcome when the recovery completes.

3. For other resource managers participating in the two-phase commit processing, a similar method of recovery is defined. In general, recovery processing for protected resource managers that do not distribute is defined by operating systems implementing syncpoint support. Recovery processing for protected conversations is defined by an intersystem communications architecture. By way of example, the former can be of a type described by the enhanced version of the VM operating system ("VM" is a trademark of IBM Corp. of Armonk, NY); the latter can be of a type defined in part by System Network Architecture LU 6.2 Reference: Peer Protocols, SC31-6808, Chapter 5.3 Presentation Services - Sync Point verbs.
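The exchange of log names followed by the comparison and adjustment of states in step 2 can be pictured with the short Python sketch below. It is illustrative only: the function and exception names, the string values for states, and the exact mismatch handling are assumptions based on general two-phase commit practice, not on the architected flows.

```python
class LogNameMismatch(Exception):
    """The partner's log is not the one expected to remember this LUWID."""


class StateMismatch(Exception):
    """The partner already resolved the LUWID differently (operators must be told)."""


def resynchronize(decision: str,
                  partner_state: str,
                  expected_partner_log_name: str,
                  actual_partner_log_name: str) -> str:
    """Return the state the partner ends in for this LUWID.

    `decision` is "commit" or "backout"; `partner_state` is "in doubt",
    "commit", or "backout" (string encoding assumed for the sketch).
    """
    # Exchange log names first: the partner must have the right memory.
    if actual_partner_log_name != expected_partner_log_name:
        raise LogNameMismatch(actual_partner_log_name)
    # Compare states: an in-doubt partner adopts the recorded decision.
    if partner_state == "in doubt":
        return decision
    if partner_state != decision:
        raise StateMismatch(f"decision={decision}, partner={partner_state}")
    return partner_state
```

In the failure scenario of this section, the initiating side has logged 'committed' and the partner is 'in doubt', so the call would resolve the partner to commit once a conversation can be established.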

Next, syncpoint manager 60A calls recovery facility 70A with the identifier of the resource that failed (in this example the resource would be protected conversation 64A) and the LUWID being processed. Recovery facility 70A finds the log entry for the LUWID and the entry for protected conversation 64A (FIG. 4, Step 518). Recovery facility 70A determines the recovery decision from the state information in the entry (Step 519). Based on the processing described above, the decision is 'Commit'. Recovery facility 70A knows the resource to be recovered is a protected conversation and starts a recovery process, which is an application whose processing is described by the recovery methods architected for the conversation and two-phase commit paradigm being used. That recovery process starts a non-protected conversation for a partner recovery process in recovery facility 70D on system 50D (Step 520). The recovery attempt fails because a conversation cannot be started between the two systems (decision block 521, the NO branch) due to a path failure. Recovery facility 70A then checks the log entry to see whether application 56A had requested WAIT=No, meaning recovery facility 70A could return to syncpoint manager 60A before recovery was complete. Recovery facility 70A could then complete recovery later asynchronously from application 56A (Step 524). This information was written by syncpoint manager 60A during its phase one log write. As described above, application 56A issued a 'set syncpoint options wait=no' call. Therefore recovery facility 70A returns to syncpoint manager 60A with the intent of the recovery, i.e., commit, and an indication that resynchronization (recovery) is still in progress (Step 526). Because syncpoint manager 60A had already heard 'forget' from its other protected resources (FIG. 5b, Step 545A), it updates the value of the LUWID by one and returns to application 56A with a return code of "RC = OK.LUW_OUTCOME_PENDING", which indicates the intended outcome, Commit, and that not all resources have been committed (FIG. 5a, Step 558). This means that the commit processing will be completed asynchronously to application 56A. Thus, application 56A can then continue processing other work and not waste time waiting for resynchronization.
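The WAIT=No handling described above can be sketched as follows. This is a minimal illustration under stated assumptions: `finish_syncpoint` and `try_recovery` are hypothetical names, the return codes are written as plain strings, and the WAIT=Yes branch (the application stays blocked until recovery succeeds) is inferred for the example, not quoted from the text.

```python
from typing import Callable, Tuple


def finish_syncpoint(try_recovery: Callable[[], bool], wait: bool) -> Tuple[str, bool]:
    """Return (return_code, resync_still_pending) for a 'commit' decision.

    try_recovery is a hypothetical callable that returns True when the
    recovery exchange with the partner recovery facility completes.
    """
    if try_recovery():            # the single synchronous recovery attempt
        return "OK.ALL_AGREED", False
    if not wait:                  # WAIT=No: give control back to the application
        return "OK.LUW_OUTCOME_PENDING", True
    while not try_recovery():     # WAIT=Yes (assumed): keep the application waiting
        pass
    return "OK.ALL_AGREED", False
```

The second element of the result tells the caller whether resynchronization must still be completed in the background.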

Recovery facility 70A repeatedly tries to successfully complete recovery for protected conversation adapter 64A with recovery facility 70D on system 50D (FIG. 4, Step 527). When recovery is started and finally completed (decision block 521, YES branch), both recovery facility 70A and recovery facility 70D write operator messages stating that the recovery had started and that it had successfully completed (Step 522). Syncpoint manager 60D had also learned of the failed conversation through its registered resource, protected conversation adapter 64D. It too had contacted its recovery facility 70D, with the identifier of the failed resource, in this case protected conversation 64D, and the LUWID. Based on the syncpoint manager state of "In Doubt", recovery facility 70D knew it had to wait to be contacted for recovery by recovery facility 70A. When the recovery finally completes (decision block 523, YES branch), recovery facility 70D returns to syncpoint manager 60D a decision of commit (Step 523A). Syncpoint manager 60D then performs its phase two processing. Because of the protected conversation breakage, syncpoint manager 60D subsequently gets a new unique LUWID. It then returns to application 56D with an outcome of Commit. Application 56D can now perform its processing. It should be noted that in the previous example, there could have been a failure with file manager 63A in Step 545A instead of with the protected conversation, represented to syncpoint manager 60A by protected conversation adapter 64A. In this alternate case, recovery facility 70A would initiate recovery with file manager 63A instead of recovery facility 70D, based on the recovery methods for non-protected conversations defined by the operating system.
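The repeated recovery attempts that run while the application continues can be sketched with a background thread. The function name `start_async_resync`, the retry interval, and the `on_complete` callback (standing in for the operator message) are assumptions for illustration; the text does not specify how retries are scheduled.

```python
import threading
import time
from typing import Callable, Optional


def start_async_resync(try_recovery: Callable[[], bool],
                       retry_interval: float = 0.01,
                       on_complete: Optional[Callable[[], None]] = None) -> threading.Event:
    """Retry recovery on a background thread until it succeeds.

    Returns an Event that is set once recovery has completed, so the
    caller can keep doing other work and check the Event later.
    """
    done = threading.Event()

    def worker() -> None:
        while not try_recovery():        # path still down: wait, then try again
            time.sleep(retry_interval)
        if on_complete is not None:      # e.g. write the operator message
            on_complete()
        done.set()

    threading.Thread(target=worker, daemon=True).start()
    return done
```

The caller receives the Event immediately, which corresponds to the application regaining control before resynchronization has finished.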


In FIG. 5(a,b), application 56A (and thus syncpoint manager 60A) was the initiator of the commit request. However, FIG. 51 illustrates another example in which another application 56H at System 50H initiated a commit (Step 700) instead of application 56A. Application 56H is running in an application environment that can be similar to or different than the one where application 56A is running; however, both systems and application environments support the aforesaid communications and two-phase commit procedures. System 50A and System 50D are the same as in FIG. 2. For purposes of the example illustrated in FIG. 51 (and FIGS. 52 and 53 which follow), application 56H issued a commit request (SYNCPT) to syncpoint manager 60H within System 50H, which commit request involved resources in system 50H, system 50A, and system 50D. In response to the commit request, syncpoint manager 60H calls its registered resource, protected conversation adapter 64H, with a phase one 'prepare' call. Protected conversation adapter 64H then sends the architected intersystem 'prepare' call to protected conversation adapter 64B within System 50A (Step 701). As noted above, the 'prepare' signal is part of the first phase of the two-phase commit procedure. Next, protected conversation adapter 64B gives application 56A a notification of "Take Syncpoint" (Step 704), and in response, application 56A issues a commit request (SYNCPT) to syncpoint manager 60A (Step 706). Next, syncpoint manager 60A calls protected conversation adapter 64A with a phase one 'prepare' call. Protected conversation adapter 64A sends an architected intersystem prepare call to protected conversation adapter 64D in System 50D (Step 708). In response, protected conversation adapter 64D gives application 56D a notification of "Take Syncpoint" (Step 710). In response, application 56D issues a commit (SYNCPT) request to syncpoint manager 60D (Step 712). Syncpoint manager 60D issues a phase one 'prepare' call to all its registered resources. When all the resources accessed by syncpoint manager 60D are ready to commit, syncpoint manager 60D calls protected conversation adapter 64D with a reply of 'request commit'. Protected conversation adapter 64D sends an architected intersystem 'request commit' call to the initiator of the commit request, in this case protected conversation adapter 64A, which replies to syncpoint manager 60A 'request commit' (Step 714). After syncpoint manager 60A receives this request and notification that all of its resources are ready, syncpoint manager 60A replies to protected conversation adapter 64B with 'request commit'. Protected conversation adapter 64B sends an architected intersystem 'request commit' call to the initiator of the commit request, in this case the initiating protected conversation adapter 64H and syncpoint manager 60H (Step 716). After receiving this reply from protected conversation adapter 64H on behalf of syncpoint manager 60A and notification that all of syncpoint manager 60H's resources are ready, syncpoint manager 60H's phase two decision is commit. Syncpoint manager 60H calls all resources with a phase two decision of 'commit'. When protected conversation adapter 64H is called, it sends an architected intersystem 'commit' call to protected conversation adapter 64B, which in turn replies 'committed' to syncpoint manager 60A, which becomes its phase two decision (Step 718).

So far, there have been no problems in implementing the two-phase commit procedure. Also, it should be noted that after each application issues the commit request to the respective syncpoint manager in Steps 700, 706 and 712, the respective syncpoint managers log the phase one information and state into the respective recovery facility logs. Similarly, when each of the syncpoint managers 60A and 60D receives the notifications from its associated resources that all resources are ready, they log 'in doubt' in their respective recovery facility log entries. If one or more resources cannot commit, no log entry is made, but backout processing is completed before replying 'backout' to its upstream initiator. Similarly, when syncpoint manager 60H receives 'request commit' from all its registered resources, it writes the decision of 'commit' in its recovery facility log. When syncpoint managers 60A and 60D, respectively, receive the commit decision, they too will write the commit decision in their respective recovery facility logs before contacting their registered resources.
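The logging rule just described can be condensed into a small sketch. The function name `phase_one_log_state` and the string values are hypothetical. Returning `None` for an initiator whose resources are not all ready is an assumption, since the text only states the no-log-entry rule for the cascaded syncpoint managers.

```python
from typing import List, Optional


def phase_one_log_state(replies: List[str], is_initiator: bool) -> Optional[str]:
    """State a syncpoint manager writes to its recovery facility log
    after collecting the phase-one replies of its registered resources."""
    if not all(reply == "request commit" for reply in replies):
        return None                # no log entry; backout processing runs instead
    # The initiator logs the decision itself; a cascaded manager only knows
    # that it is ready and must wait for the decision, so it logs 'in doubt'.
    return "commit" if is_initiator else "in doubt"
```

For the example of FIG. 51, syncpoint manager 60H would correspond to `is_initiator=True` and syncpoint managers 60A and 60D to `is_initiator=False`.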

Next, syncpoint manager 60A calls all its registered resources with the phase two 'commit' decision. When syncpoint manager 60A calls protected conversation adapter 64A with the 'commit' call, protected conversation adapter 64A attempts to send an architected intersystem 'committed' call to protected conversation adapter 64D, which in turn should reply committed to syncpoint manager 60D. In the illustrated example, however, this transmission is unsuccessful (Step 720) due to a failure in the conversation path. In response to this failure, syncpoint manager 60A contacts recovery facility 70A for recovery processing for this LUWID and protected conversation. As described above, recovery facility 70A tries once to perform recovery with recovery facility 70D (Step 722). This attempt is also unsuccessful in this example due to the persistence of the communication path failure. Next, recovery facility 70A reads the log entry and learns that asynchronous resynchronization is required. Recovery facility 70A then notifies syncpoint manager 60A of the failed attempt to recover and that recovery will continue asynchronously. Syncpoint manager 60A then calls protected resource adapter 64B with 'forget, resynchronization-in-progress (RIP)'. Protected conversation adapter 64B sends an architected intersystem 'forget, RIP' call to protected conversation adapter 64H, which replies 'forget, RIP' to syncpoint manager 60H (Step 726). Syncpoint manager 60A then gives application 56A a return code, "RC = OK.LUW_OUTCOME_PENDING", to advise application 56A of the intent of Commit and that the commit processing will be completed asynchronously (Step 724). The "Forget, RIP" notification of Step 726 serves as an acknowledgment to Step 718 and causes syncpoint manager 60H to write a state of 'forget' in its recovery facility log for the syncpoint information relating to the syncpoint of Step 700, because two-phase commit processing is now complete for the commit requested by application 56H. Syncpoint manager 60H, upon receiving the "Forget, RIP" indication from its protected conversation adapter 64H (and assuming it had heard from all other resources involved in the commit), can return to application 56H with a return code, "RC = OK.LUW_OUTCOME_PENDING", advising application 56H of the intent of Commit and that the commit processing will be completed asynchronously (Step 728).

Recovery facility 70A periodically attempts to execute recovery processing with recovery facility 70D on system 50D and to simultaneously order the commit (Step 730). As discussed above, when recovery is complete, recovery facility 70D replies to syncpoint manager 60D with a phase two decision of 'commit'. Syncpoint manager 60D will complete its phase two processing and return to application 56D with a return code, "RC = OK.ALL_AGREED", meaning the commit request completed successfully (Step 732). Applications 56H, 56A, and 56D can all continue with other processing. It should be noted that when recovery processing takes place between recovery facility 70A and recovery facility 70D, messages are sent to the operator consoles indicating recovery is starting and the outcome of the processing.

It should be noted also that when syncpoint manager 60A received the "FAILED ATTEMPT TO RESYNC" notification from recovery facility 70A, syncpoint manager 60A updates the state for the LUWID to 'Forget, RIP' in the log entry in log 72A. System 50A will later write a state of 'forget' for this LUWID when the next normal flow arrives over the conversation path between System 50A and System 50H which has or had carried the protected conversation involved in this LUWID. This is an "implied forget" operation. If there is a failure such that the conversation path fails between System 50A and System 50H (over which the protected conversation flowed that was involved in the commit procedure which received the resynchronization-in-progress notification) after syncpoint manager 60A writes the state of 'Forget, RIP' and before the "implied forget" is received, the log entry for the LUWID at System 50A will be erased by normal recovery procedures as defined by the two-phase commit paradigm being used. This would involve, however, that new resynchronization-in-progress indicators be sent in the compare states data flow as defined earlier. It should also be noted that if the "implied forget" is received, causing System 50A to write a state of 'forget' on recovery facility log 72A, recovery facility 70A will not allow the recovery record to really be forgotten until recovery is complete with recovery facility 70D.
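The interplay between the "implied forget" and the outstanding recovery can be sketched as a log entry that is erasable only when both conditions hold. The class `LuwidEntry` and its method names are hypothetical and serve only to illustrate the rule stated above.

```python
from dataclasses import dataclass


@dataclass
class LuwidEntry:
    """Hypothetical log entry for one LUWID at the cascaded initiator."""
    state: str = "forget, RIP"        # written after the failed resync attempt
    recovery_complete: bool = False   # recovery with the downstream partner

    def implied_forget(self) -> None:
        # The next normal flow on the upstream conversation path arrived.
        self.state = "forget"

    def recovery_done(self) -> None:
        # Asynchronous recovery with the partner recovery facility finished.
        self.recovery_complete = True

    def can_erase(self) -> bool:
        # The record is really forgotten only when both conditions hold.
        return self.state == "forget" and self.recovery_complete
```

The two events may arrive in either order; the entry becomes erasable only after both have occurred.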

It should also be noted that there is a migration path between syncpoint managers such that syncpoint managers which support the foregoing asynchronous resynchronization (resynchronization-in-progress) function can communicate with other syncpoint managers that do not. When the systems that support syncpoint processing originally communicate with each other, it is determined in the initial capabilities exchange, as defined by the communications architecture and the two-phase commit procedures used by both systems, if they support the foregoing resynchronization-in-progress function. If the initiator of the commit request, in the above example from FIG. 51, syncpoint manager 60H, does not support resynchronization-in-progress, the cascaded initiator (the syncpoint manager that receives the commit request, in the above example, syncpoint manager 60A) will send back to the syncpoint manager who initiated the commit request (in the above example syncpoint manager 60H) the intent of a syncpoint request (either commit or backout) and not an indication that resynchronization will take place later asynchronously. The local application, where the outage took place (in the above example, application 56A) and where the syncpoint manager supports resynchronization-in-progress (in the above example, syncpoint manager 60A), will receive this resynchronization-in-progress notification.

FIG. 52 illustrates the resynchronization-in-progress function in the event that syncpoint manager 60H issues a backout as described in more detail below. Steps 700-716 are the same as in FIG. 51. However, after receipt of the 'request commit' reply from syncpoint manager 60A via protected conversation adapter 64H in Step 716, syncpoint manager 60H decides to back out because one or more of its protected resources are not ready. Then, syncpoint manager 60H calls its registered resources with a phase two decision of 'backout'.
The 'backout' decision is given to syncpoint manager 60A (protected conversation adapter 64H sends an architected intersystem backout call to protected conversation adapter 64B, which replies 'backout' to syncpoint manager 60A) (Step 740). Syncpoint manager 60A calls its registered resources with a phase two decision of 'backout'. Protected conversation adapter 64A attempts to send an intersystem backout call to syncpoint manager 60D via protected conversation adapter 64D in Step 742. However, in the example, Step 742 fails due to a communication path failure or other type of failure. In response, syncpoint manager 60A calls recovery facility 70A with the LUWID and failed resource identifier to perform recovery processing with recovery facility 70D on System 50D in Step 744. However, in the illustrated example, this recovery attempt also fails. Recovery facility 70A replies to syncpoint manager 60A that the recovery attempt failed, but that it will complete recovery processing asynchronously. Having heard from its other protected resources, syncpoint manager 60A writes a state of 'backout, rip' on its recovery facility log 72A. Syncpoint manager 60A then calls protected conversation adapter 64B with a reply of 'backout, rip'. Based on the architected intersystem backout call, protected conversation adapter 64B sends an error reply to the original phase two 'backout' call from protected conversation adapter 64H (Step 748). It then sends an architected intersystem 'backout, rip' call to protected conversation adapter 64H (Step 750). Having received the 'backout, rip' indication, protected conversation adapter 64H sends an architected intersystem acknowledgment (Step 752) and replies 'backout, rip' to syncpoint manager 60H (Step 752). Having heard from its other resources, syncpoint manager 60H returns to application 56H with a return code, "RC = Backout, LUW_OUTCOME_PENDING", which notifies it that backout is pending and advises application 56H that it is free to perform other useful work (Step 754). When protected conversation adapter 64B gets an acknowledgment to the 'backout, rip' call from protected conversation adapter 64H (response to Steps 748 and 750), it replies 'ok' to syncpoint manager 60A. Syncpoint manager 60A then writes a state of 'forget' in the log entry for this LUWID in recovery facility log 72A and returns to application 56A with a return code, "RC = Backout, LUW_OUTCOME_PENDING" (Step 746), which means that the intended result of the commit request is backout, but not all resources have backed out. Application 56A can then continue with its processing. The LUWID entry in recovery facility log 72A will be forgotten by System 50A as an "implied forget", which was described above. When 'forget' is written, if the failed resource in the LUWID has not been recovered yet, the LUWID entry will not be really forgotten until recovery takes place. Meanwhile, recovery facility 70A continues to attempt to recover with recovery facility 70D in system 50D asynchronously (Step 756). When recovery completes, syncpoint manager 60D is notified of the backout and completes its phase two processing. Syncpoint manager 60D then returns to application 56D with a return code of "RC = BACK_OUT.ALL_AGREED", which means all resources have backed out (Step 758). Applications 56H, 56A, and 56D can all continue with other processing. It should be noted that when recovery processing takes place between recovery facility 70A and recovery facility 70D, messages are sent to the operator consoles indicating recovery is starting and the outcome of the processing.

FIG. 53 illustrates the resynchronization-in-progress function in the event that syncpoint manager 60A issues a backout as described in more detail below. Steps 700-714 are the same as in FIG. 52. However, after receipt of the 'request commit' reply in Step 714, syncpoint manager 60A calls its registered resources with a phase two call of 'backout' because one or more of the resources associated with syncpoint manager 60A cannot commit (Step 759). Protected conversation adapter 64A attempts to send an architected intersystem 'backout' call to protected conversation adapter 64D (Step 760). However, as illustrated in Step 760, the 'backout' call is not received by protected conversation adapter 64D due to a communication path failure or other failure. Syncpoint manager 60A calls recovery facility 70A with the LUWID and failed resource identifier, asking it to perform recovery processing. Recovery facility 70A tries to perform recovery processing with recovery facility 70D in system 50D (Step 744). Step 744 also fails because the communication path failure persists, and consequently, syncpoint manager 60A transmits the signal of Step 746 described above in reference to FIG. 52. Steps 750-758 are also the same as in FIG. 52.

FIG. 53A illustrates the resynchronization-in-progress function in the event that syncpoint manager 60A issues a backout because of a different failure as described in more detail below. Steps 700-706 are the same as in FIG. 52. However, after receipt of the commit request in Step 706, syncpoint manager 60A calls its registered resources with a phase one call of 'prepare'. Protected conversation adapter 64A attempts to send an architected intersystem 'prepare' call to protected conversation adapter 64D (Step 708a). However, as illustrated in Step 708a, the 'prepare' call is not received by protected conversation adapter 64D due to a communication path failure or other failure. Syncpoint manager 60A calls its local registered resource with a phase two call of backout (Step 763). Syncpoint manager 60A then calls recovery facility 70A with the LUWID and failed resource identifier, asking it to perform recovery processing. Recovery facility 70A tries to perform recovery processing with recovery facility 70D in system 50D (Step 744). Step 744 also fails because the communication path failure persists, and consequently, syncpoint manager 60A transmits the signal of Step 746 described above in reference to FIG. 52. Steps 750-756 are also the same as in FIG. 52. Asynchronously to the processing being done by syncpoint manager 60A, application 56D receives a path failure indication on its previously established (when application 56A initiated application 56D) protected conversation with application 56A (Step 761). This path failure prevented protected conversation adapter 64D from receiving the prepare call from protected conversation adapter 64A. Because the path failure was on a protected conversation, application 56D must issue a backout request. Application 56D issues a backout request (Step 762) and eventually receives a return code that indicates all registered resources are backed out (Step 764). At this point, applications 56H, 56A, and 56D can all continue with other processing. Meanwhile, recovery facility 70A continues to attempt to recover with recovery facility 70D in system 50D asynchronously (Step 756). It should be noted that when recovery processing takes place between recovery facility 70A and recovery facility 70D, messages are sent to the operator consoles indicating recovery is starting and the outcome of the processing.

Based on the foregoing, processes and systems embodying the present invention have been disclosed. However, numerous modifications and substitutions may be made without deviating from the scope of the invention. Therefore, the invention has been disclosed by way of illustration and not limitation, and reference should be made to the following claims to determine the scope of the invention.

The following is a partial glossary of terms:

Application: User or service program(s) or a work distribution function integrated with a resource manager, that execute in an execution environment and can issue one or more of the following: commit, back out or work request.

Execution Environment: Any computing means for executing applications, system facilities (recovery facility, communication facility, etc.), resource managers, and/or other programs in virtual machine, personal computer, work station, mini computer, mainframe computer, and/or other type of computers.

Protected Conversation: A conversation that is subject to any form of synchronization point processing or protective commit or back out procedure.

Protected Resource: A resource that is subject to any form of synchronization point processing or other protective commit or back out procedure.

Recovery Facility: A facility that has a responsibility for recovery of a failed synchronization point or other commit or back out procedure.

Two-Phase Commit Procedure: A procedure for coordinating and/or synchronizing a commit or back out of updates and/or a protected conversation. Usually, the two phase commit procedure is used to atomically commit or back out a plurality of resources or a single resource via a protected conversation. By way of example, the two phase commit procedure can include a polling or prepare phase and a back out or commit phase.

Claims

1. A method for attempting to implement a commit procedure and recovering from a failure to complete said commit procedure, said method comprising the steps of:
requesting a work operation, said work operation requiring use of a plurality of resources, said request being made by an application;
attempting to implement a commit procedure for said work operation, said commit procedure not being completed due to a failure;
after said commit procedure fails, notifying said application that said application can continue with other operations, whereby said application need not wait for recovery from said failure; and
while said application continues with said other operations, recovering from said failure by resynchronizing said commit procedure for said resources asynchronously relative to said application.

2. A method as set forth in claim 1 wherein said resynchronizing step includes the step of completing said commit procedure.

3. A method as set forth in claim 1 wherein said work operation is a protected conversation with another application.

4. A method as set forth in claim 3 wherein said commit procedure is two phase.

5. A method as set forth in claim 3 wherein said applications execute on different processors and said failure is a failure to communicate between said processors.

6. A method as set forth in claim 5 wherein said resynchronizing step includes the step of regaining communication between said processors.

7. A method as set forth in claim 6 wherein said resynchronizing step includes the step of making a communication between a recovery facility serving one of said processors and another recovery facility serving the other processor.

8. A method as set forth in claim 3 wherein one of said applications runs in a first virtual machine in a first real machine and the other application runs in a second virtual machine in a second real machine.

9. A method as set forth in claim 1 wherein said resources are repositories for data, and said resynchronizing step includes the step of making the data within both of said repositories consistent with each other.

10. A method as set forth in claim 9 wherein said resynchronizing step includes the step of regaining communication with either or both of said resources.

11. A method as set forth in claim 1 further comprising the step of attempting synchronous resynchronization a predetermined number of times while said application waits for resynchronization before the asynchronous resynchronization step.

12. A method as set forth in claim 1 wherein said application requests said asynchronous resynchronization before the asynchronous resynchronizing step is performed.

13. A method as set forth in claim 12 wherein another application participates in said work operation, and further comprising the step of recording a selection by said other application for synchronous resynchronization only, whereby when said commit procedure fails to complete, said other application is made to wait while said resynchronization takes place.

14. A computer system which attempts to implement a commit procedure and recovers from a failure to complete said commit procedure, said system comprising:
a processor for executing an application which requests a work operation;
means for implementing a commit procedure for said work operation;
means for notifying said application to continue with other operations if said commit procedure fails before completion, whereby said application need not wait for said commit procedure to be resynchronized; and
means for resynchronizing the failed commit procedure asynchronously relative to said application.

15. A computer system as set forth in claim 14 wherein the resynchronizing means completes said commit procedure.

16. A computer system as set forth in claim 14 further comprising:
another processor for executing another application which participates in said work operation; and
means for making a protected conversation between said applications; and
wherein said commit procedure is for said protected conversation.

17. A computer system as set forth in claim 16 wherein said commit procedure is two-phase.

18. A computer system as set forth in claim 17 further comprising:
a first recovery facility serving one of said processors; and
a second recovery facility serving the other of said processors; and
wherein said failure is a failure to communicate between said processors; and
the resynchronizing means includes means, responsive to said failure, for re-establishing a communication between said recovery facilities.

19. A computer system as set forth in claim 18 wherein said resynchronizing means is contained at least in part in said recovery facilities.

20. A computer system as set forth in claim 14 further comprising a recovery facility including said processor, said recovery facility including the resynchronizing means.