US20070244937A1 - System and method for application fault tolerance and recovery using topologically remotely located computing devices - Google Patents
- Publication number
- US20070244937A1 (U.S. application Ser. No. 11/403,050)
- Authority
- US
- United States
- Prior art keywords
- application instance
- primary
- computing device
- shadow
- log
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/20—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
- G06F11/202—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
- G06F11/2048—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant where the redundant components share neither address space nor persistent storage
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/1658—Data re-synchronization of a redundant component, or initial sync of replacement, additional or spare unit
- G06F11/1662—Data re-synchronization of a redundant component, or initial sync of replacement, additional or spare unit the resynchronized component or unit being a persistent storage device
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/20—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
- G06F11/2097—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements maintaining the standby controller/processing unit updated
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/20—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
- G06F11/202—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
- G06F11/2038—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant with a single idle spare processing component
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
- G06F16/273—Asynchronous replication or reconciliation
Definitions
- the present application relates generally to an improved data processing system and method. More specifically, the present application is directed to a system and method for relocating running applications to topologically remotely located computing systems.
- High availability of applications may be achieved by providing clustered server/storage environments that protect against server and storage failures.
- when a failure occurs, the application is restarted on a redundant server and/or storage system in the clustered server/storage environment, with only a small period during which the application is unavailable.
- a hot standby server/storage system is provided and a log shipping technique is used.
- a log of the application state is maintained by a production server and is shipped to the hot standby server in order to keep the application state of the standby server close to the current state of the application on the production server. If a failover to the standby server is required, only updates made since the last log was shipped to the standby server will be lost.
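The log-shipping approach described above can be sketched as follows. This is an illustrative Python sketch, not taken from any product; the `ProductionServer`/`StandbyServer` classes and their methods are hypothetical names:

```python
class StandbyServer:
    """Hot standby: applies shipped logs to stay close to production state."""

    def __init__(self):
        self.state = {}

    def receive_log(self, log):
        for update in log:  # replay shipped updates in order
            self.state.update(update)


class ProductionServer:
    """Production server: batches state updates and periodically ships them."""

    def __init__(self, standby):
        self.standby = standby
        self.pending_log = []  # updates not yet shipped

    def apply_update(self, update):
        self.pending_log.append(update)

    def ship_log(self):
        # Ship the accumulated log. Anything appended after this call
        # is lost if the production server fails before the next ship.
        self.standby.receive_log(list(self.pending_log))
        self.pending_log.clear()


standby = StandbyServer()
primary = ProductionServer(standby)
primary.apply_update({"balance": 100})
primary.ship_log()
primary.apply_update({"balance": 150})  # not yet shipped: lost on failover
```

The sketch makes the stated limitation concrete: on failover the standby holds only the updates shipped so far, so the unshipped `{"balance": 150}` update would be lost.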
- server clusters or storage system clusters are topologically and geographically limited, such that the devices that make up the cluster must be in relatively close proximity to one another.
- the clustered server/storage environments do not provide any application independent mechanism for providing availability and disaster recovery at remote network topological and/or geographic distances.
- the clustered server/storage environments also do not provide any such availability and recovery mechanism with zero data loss, including no loss of in-flight transactions.
- VMotionTM software available from VMWare (an evaluation copy of VMotionTM is available from www.vmware.com/products/vc/vmotion.html).
- the VMotionTM software allows users to move live, running virtual machines from one physical server computing system to another physical server computing system connected to the same SAN while maintaining continuous service availability.
- the VMotionTM software is able to perform such relocation because of the virtualization of the disks in the SAN.
- VMotionTM is limited in that it requires that the entire virtual machine, which may comprise the operating system and a plurality of running applications, be moved to the new physical server computing device. There is no ability in the VMotionTM software to be able to move individual applications from one physical server computing device to another.
- VMotionTM is limited in that the movement of virtual machines can only be performed from one server computing device to another in the same SAN. Thus, VMotionTM cannot be used to move virtual machines to other server computing devices that are outside the SAN. This, in essence, places a network topology and geographical limitation on the server computing devices to which virtual machines may be moved using the VMotionTM software product.
- Another known solution for providing high availability and disaster recovery of running applications is the MetaClusterTM UC 3.0 software product available from Meiosys, Inc., which was recently acquired by International Business Machines, Inc. As described in the article “Meiosys Releases MetaCluster UC Version 3.0,” available from PR Newswire at www.prnewswire.com, the MetaClusterTM software product is built upon a Service Oriented Architecture and embodies the latest generation of fine-grained virtualization technologies to enable dynamic data centers to provide preservation of service levels and infrastructure optimization on an application-agnostic basis under all load conditions.
- Unlike coarse-grained virtual machine technologies and virtual machine mobility technologies, such as VMotionTM described above, which run at the operating system level and can only move an entire virtual machine at one time, the MetaClusterTM software product runs in a middleware layer between the operating system and the applications. MetaClusterTM provides a container technology which surrounds each application, delivering both resource isolation and machine-to-machine mobility for applications and application processes.
- MetaClusterTM software product's application virtualization and container technology enables relocation of applications both across physical and virtual machines.
- MetaClusterTM also provides substantial business intelligence which enables enterprises to set thresholds and define rules for managing the relocation of applications and application processes from machine to machine, both to address high availability and utilization business cases.
- MetaClusterTM UC 3.0 for business critical applications allows applications to be virtualized very efficiently so that the performance impact is unnoticeable (typically under 1%). Virtualized applications may then be moved to the infrastructure best suited from a resource optimization and quality of service standpoint. Server capacity can be reassigned dynamically to achieve high levels of utilization without compromising performance. Since MetaClusterTM UC 3.0 enables the state and context of the application to be preserved during relocation, the relocation is both fast and transparent to the users of the applications.
- MetaClusterTM UC 3.0 uses a transparent “checkpoint and restart” functionality for performing such relocation of applications within server clusters.
- This checkpoint may then be provided to another server computing device in the same cluster as the original server computing device.
- the server computing device to which the checkpoint is provided may then use the checkpoint information to restart the application, using application data available from a shared storage system of the cluster, and recreate the state, connections, and context of the application on the new server computing device.
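As a rough illustration of checkpoint-and-restart relocation (not the actual MetaClusterTM implementation, which also captures connections, sockets, and process context), application state can be serialized on one machine and recreated on another. The state dictionary and function names below are purely illustrative:

```python
import pickle


def take_checkpoint(app_state):
    """Serialize the application's current state into a checkpoint blob.

    A production-grade checkpoint also captures network connections and
    execution context, not just application data.
    """
    return pickle.dumps(app_state)


def restart_from_checkpoint(blob):
    """Recreate the application state on the new server from the checkpoint."""
    return pickle.loads(blob)


# On the original server in the cluster:
state = {"open_orders": [101, 102], "cursor": 57}
blob = take_checkpoint(state)

# On the destination server (blob delivered over the cluster network),
# using application data available from the shared storage system:
recreated = restart_from_checkpoint(blob)
```

The restarted copy is an independent object with identical contents, which is the essence of recreating state and context on the new server.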
- in MetaClusterTM FT, a “Record and Replay” technology is provided in which events that continuously impact the behavior of an application at runtime are recorded to disk in the form of log files, and those events may then be replayed in the event of a failure.
- the “Record and Replay” technology of MetaClusterTM FT allows recorded events to be replayed on a redundant application instance in the same server cluster in the event of a failure in order to provide failover fault tolerance.
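The record-and-replay idea can be sketched as follows. This is a toy Python model, not the MetaClusterTM FT implementation; the `Counter` application and `Recorder` wrapper are hypothetical stand-ins for an application whose state is driven by delivered events:

```python
class Counter:
    """Toy application whose state is determined entirely by its events."""

    def __init__(self):
        self.value = 0

    def handle(self, event):
        self.value += event


class Recorder:
    """Records each event impacting the application (to disk, in practice)."""

    def __init__(self, app):
        self.app = app
        self.event_log = []

    def deliver(self, event):
        self.event_log.append(event)  # record before (or while) applying
        self.app.handle(event)


def replay(event_log, fresh_app):
    """Replay recorded events on a redundant instance after a failure."""
    for event in event_log:
        fresh_app.handle(event)
    return fresh_app


primary = Recorder(Counter())
for e in (3, 4, 5):
    primary.deliver(e)

# Failover: a fresh redundant instance replays the log and converges
# on the failed primary's state.
standby = replay(primary.event_log, Counter())  # standby.value == 12
```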
- more information about the MetaClusterTM FT (formerly “Meiosys FT”) software product may be found, for example, in “Meiosys breaks technology barrier for cost-effective fault tolerance technology designed to be embedded in OEM platform solutions,” available from Primeur Monthly at www.hoise.com/primeur/2017articles/monthly/AE-PR-05-05-46.html, and the presentation materials for the 47th Meeting of IFIP WG10.4 in Puerto Rico available at www2.laas.fr/IFIPWG/Workshops&Meetings/47/WS/04-Rougier.pdf.
- MetaClusterTM UC 3.0 and MetaClusterTM FT allow relocation and failover of individual applications within the same cluster, as opposed to requiring entire virtual machines to be relocated.
- MetaClusterTM UC and FT are still limited to a localized cluster of server computing devices. That is, MetaClusterTM relies on all of the server computing devices having access to a shared storage system for accessing application data.
- MetaClusterTM UC and FT do not allow movement or relocation of running applications outside of the server cluster or failover of application instances outside of the server cluster. Again this limits the network topology and geographical locations of computing devices to which running applications may be relocated and to which failover may be performed.
- topologically remotely located refers to the computing system being outside the cluster or storage area network of the computing device from which the running application is being relocated.
- a topologically remotely located computing system may be geographically remotely located as well, but this is not required for the computing system to be topologically remotely located. Rather, the topologically remotely located computing system need only be remotely located in terms of the network topology connecting the various computing devices.
- a primary computing device runs one instance of an application (i.e. the primary application instance) at a production site and an active standby computing device runs a second instance of the application (i.e. the shadow application instance) at a recovery site which may be topologically remotely located from the production site.
- the primary computing device and active standby computing device have the same virtual network address such that both computing devices may receive the same inputs from a network via network address virtualization, as is generally known in the art.
- the two instances of the application are brought into a consistent state by running an initial application “checkpoint” on the primary computing device followed by an application “restart” on the active standby computing device.
- the generation of the “checkpoint” of the primary application instance on the primary computing device and the “restarting” of the shadow application instance on the active standby computing device may make use of the mechanism described in commonly assigned and co-pending U.S. patent application Ser. No. 11/340,813 (Attorney Docket No. SJO920050108US1), filed on Jan. 25, 2006, which is hereby incorporated by reference.
- the generation of a checkpoint may involve copying application data for the primary application instance on the primary computing device to a storage system of the topologically remotely located computing device.
- the copying of application data may be performed using mirroring technology, such as a peer-to-peer remote copy operation, for example.
- a stateful checkpoint of the primary application instance may be generated and stored to a storage medium.
- the stateful checkpoint comprises a set of metadata describing the current state of the primary application instance at the particular point in time when the stateful checkpoint is generated.
- the stateful checkpoint is generated at substantially the same time as the copying of the application data so as to ensure that the state of the application as represented by the stateful checkpoint metadata matches the application data.
- the stateful checkpoint metadata may be copied to the same or different storage system associated with the topologically remotely located computing device in a similar manner as the application data. For example, a peer-to-peer remote copy operation may be performed on the checkpoint metadata to copy the checkpoint metadata to the topologically remotely located storage system.
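The coordination described above, capturing application data and stateful checkpoint metadata at substantially the same point in time and copying both to the recovery site, can be sketched as follows. The `ToyApp` class, its `quiesce`/`resume` methods, and the dict standing in for the remote-copy target are all hypothetical; a real implementation would use a product such as PPRC for the copy step:

```python
import time


class ToyApp:
    """Minimal stand-in for the primary application instance."""

    def __init__(self):
        self.data = {"rows": [1, 2, 3]}
        self.paused = False

    def quiesce(self):
        self.paused = True  # briefly hold in-flight work

    def resume(self):
        self.paused = False


def take_consistent_checkpoint(app, remote_storage):
    """Capture application data and checkpoint metadata together.

    Both are taken while the application is quiesced so that the state
    described by the metadata matches the data, then both are copied to
    the recovery site (here, a dict emulating a peer-to-peer remote copy
    target).
    """
    app.quiesce()
    try:
        data_snapshot = dict(app.data)
        metadata = {
            "captured_at": time.time(),
            "row_count": len(app.data["rows"]),
        }
    finally:
        app.resume()
    remote_storage["application_data"] = data_snapshot
    remote_storage["checkpoint_metadata"] = metadata
    return metadata


recovery_site = {}
app = ToyApp()
take_consistent_checkpoint(app, recovery_site)
```

The `try`/`finally` ensures the primary resumes even if the snapshot fails, a design point any real checkpoint mechanism must address.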
- the MetaClusterTM product may be used to generate stateful checkpoint metadata for the primary application instance as if the application were being relocated within a local cluster of server computing devices.
- the stateful checkpoint metadata and application data may be relocated to a topologically remotely located computing device using the Peer-to-Peer Remote Copy (PPRC) product available from International Business Machines, Inc. of Armonk, N.Y., also referred to by the name Metro MirrorTM.
- the relocation may be performed using the Global MirrorTM product, also available from International Business Machines, Inc., which implements an asynchronous replication method that can be guaranteed to be consistent.
- Global MirrorTM is actually a combination of the Metro MirrorTM product, Global CopyTM product, and FlashCopyTM product available from International Business Machines, Inc.
- Global MirrorTM uses Global Copy to maximize data replication throughput and dynamically switches to synchronous mode (Metro Mirror) to get to a consistent state, preserves that consistent state using FlashCopyTM, and switches back to Global CopyTM (also referred to as PPRC-XD).
- Metro MirrorTM has no data loss but is limited to a range of approximately 300 kilometers, while Global MirrorTM has minimal data loss but no distance limitation.
- after generating an initial checkpoint of the primary application instance running on the primary computing device, and copying this checkpoint's stateful checkpoint metadata and application data to the topologically remotely located computing device, the shadow application instance is “restarted” using the application data and stateful checkpoint metadata. This in effect causes the shadow application instance to have the same state and application data as the primary application instance, thereby synchronizing the shadow application instance with the primary application instance.
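The shadow-side "restart" step can be sketched as follows, continuing in the same illustrative style (the dict layout and `restart_shadow` function are hypothetical, not any product's API):

```python
# Data previously copied from the production site via the remote copy
# operation described above (sample values for illustration):
recovery_site = {
    "application_data": {"rows": [1, 2, 3]},
    "checkpoint_metadata": {"state": "ready", "row_count": 3},
}


def restart_shadow(storage):
    """Initialize the shadow instance from copied data and metadata.

    After this, the shadow holds the same state as the primary held at
    checkpoint time. Its network outputs stay disabled until failover to
    avoid conflicts with the primary (e.g. duplicated messages).
    """
    return {
        "data": storage["application_data"],
        "state": storage["checkpoint_metadata"]["state"],
        "outputs_enabled": False,
    }


shadow = restart_shadow(recovery_site)
```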
- events occurring in the primary application instance may be continuously recorded in a log and then replayed at the recovery site so as to maintain the states of the application instances consistent.
- the log may be a high speed pipeline to the shadow application instance, a log file on a shared file system, a log file that is automatically replicated and applied at the recovery site, or the like.
- continuous consistency between the application instances may be provided by automatically performing a peer-to-peer remote copy operation on the log event data as the log event data is written to the log storage in the production site or upon the closing of a log file. That is, log event data may be written to the log file in the log storage of the production site and then automatically copied to a storage system associated with the recovery site using a Metro MirrorTM or Global MirrorTM operation, for example. Alternatively, the log event data may be written to the log file in the log storage of the production site and then copied to the log storage at the recovery site when the log file is closed.
- the log will eventually be closed, such as when the log becomes full or at some predetermined time interval, at which time the logging of events is switched from a first log file to a secondary log file.
- the closing of the log may be communicated to the active standby computing device in the recovery site at which time events logged in the first log file may be replayed by the shadow application instance while event logging is continued using the secondary log file.
- the event logging using the secondary log file is likewise automatically replicated in a storage system associated with the recovery site.
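The two-log rotation described above, continuous replication of log writes to the recovery site, then replay of the closed log while logging continues in the secondary log, can be sketched as follows. The class and log names are illustrative; a real deployment would replicate with Metro MirrorTM or Global MirrorTM rather than a method call:

```python
class ShadowSite:
    """Recovery site: holds replicated logs and replays closed ones."""

    def __init__(self):
        self.replicated = {"A": [], "B": []}
        self.applied_state = []

    def receive_event(self, log_name, event):
        # Events arrive as they are written at the production site,
        # emulating an automatic peer-to-peer remote copy.
        self.replicated[log_name].append(event)

    def on_log_switch(self, closed_log):
        # Replay the closed log while logging continues in the other file.
        self.applied_state.extend(self.replicated[closed_log])
        self.replicated[closed_log] = []


class PrimarySite:
    """Production site: writes events to the active log and replicates."""

    def __init__(self, shadow):
        self.shadow = shadow
        self.active_log = "A"
        self.logs = {"A": [], "B": []}

    def record_event(self, event):
        self.logs[self.active_log].append(event)
        self.shadow.receive_event(self.active_log, event)

    def switch_logs(self):
        # Log is full or a time interval elapsed: switch and notify.
        closed = self.active_log
        self.active_log = "B" if closed == "A" else "A"
        self.shadow.on_log_switch(closed)


shadow_site = ShadowSite()
primary_site = PrimarySite(shadow_site)
primary_site.record_event("e1")
primary_site.record_event("e2")
primary_site.switch_logs()       # shadow replays log A
primary_site.record_event("e3")  # logging continues in log B
```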
- the shadow application instance preferably has all of its network outputs, i.e. shadow sockets, disabled in order to avoid conflicts on the network with the primary application instance, such as message duplication.
- the shadow application instance's network outputs may be activated in the event that the primary application instance at the production site fails and the shadow application instance must take over for the failed primary application instance.
- when the primary application instance fails, the active standby computing device detects the loss of the primary application instance.
- the shadow application instance is at a state corresponding to the last replay of logged events, which will typically occur when the logs are switched, e.g., from a first log file to a second log file.
- the second log file which is continuously and automatically replicated by use of the peer-to-peer copy operation, contains all of the events that have occurred in the primary application instance since the log switch.
- the shadow application instance need only replay the events in the second log in order to bring the shadow application instance to a state corresponding to the state of the primary application instance just prior to the failure of the primary application instance.
- the network outputs, i.e. shadow sockets, may then be enabled such that the shadow application instance may now generate outputs that are sent across the network to client devices. In this way, the shadow application instance may then take over the functions of the primary application instance without data loss.
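The takeover sequence just described can be condensed into a short sketch: replay the events replicated since the last log switch, then enable the shadow's network outputs. As before, the dict layout and function name are illustrative only:

```python
def failover(shadow, events_since_last_switch):
    """Bring the shadow to the primary's last known state and take over.

    The events passed in are those replicated to the second log since the
    last log switch; replaying them advances the shadow to the state of
    the primary just prior to its failure, with no data loss.
    """
    for event in events_since_last_switch:
        shadow["applied"].append(event)
    shadow["outputs_enabled"] = True  # enable shadow sockets only now
    return shadow


# Shadow state as of the last log switch, plus events replicated since:
shadow = {"applied": ["e1", "e2"], "outputs_enabled": False}
failover(shadow, ["e3", "e4"])
```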
- a computer program product comprising a computer usable medium having a computer readable program.
- the computer readable program when executed on a standby computing device, causes the standby computing device to automatically receive, from a primary computing device topologically remotely located from the standby computing device, event data for a primary application instance written to a first log data structure associated with the primary computing device.
- the computer readable program may further cause the computing device to store the event data in a second log data structure associated with the standby computing device and a shadow application instance.
- the primary application instance and shadow application instance may be instances of a same application.
- the computer readable program may also cause the computing device to update a state of the shadow application instance by replaying events in the second log associated with the shadow application instance to thereby bring a state of the shadow application instance to a consistent state with the primary application instance. Moreover, the computer readable program may further cause the computing device to relocate the primary application instance to the standby computing device using the shadow application instance in response to a failure of the primary application instance.
- the computer readable program may cause the standby computing device to update a state of the shadow application instance in response to receiving a log switch event message from the primary computing device indicating a log switch event.
- the computer readable program may cause the standby computing device to store event data received subsequent to receiving the log switch event message to a third log data structure associated with the shadow application instance.
- the computer readable program may further cause the standby computing device to relocate the primary application instance to the standby computing device by detecting a failure of the primary application instance on the primary computing device, and replaying events in the third log data structure in response to detecting the failure of the primary application instance.
- the primary computing device and standby computing device may have a same virtual network address. Input and output sockets of the primary application instance may be enabled and only input sockets of the shadow application instance may be enabled prior to relocating the primary application instance.
- the computer readable program may further cause the standby computing device to detect a failure of the primary application instance on the primary computing device and replay events in the third log data structure in response to detecting the failure of the primary application instance to thereby bring the shadow application instance to a consistent state with the primary application instance just prior to the detected failure of the primary application instance.
- the computer readable program may also enable output sockets of the shadow application instance in response to detecting the failure of the primary application instance.
- the computer readable program may further cause the standby computing device to synchronize a state of the shadow application instance with a state of the primary application instance prior to receiving event data from the primary computing device.
- the computer readable program may cause the standby computing device to synchronize a state of the shadow application instance with a state of the primary application instance by receiving application data for the primary application instance from the topologically remotely located primary computing device and receiving an application checkpoint comprising checkpoint metadata that represents a same point in time as the copy of the application data.
- the computer readable program may initialize the shadow application instance on the standby computing device using the application data and checkpoint metadata.
- the primary computing device may automatically write the event data for events occurring in the primary application instance in the first log data structure and automatically copy the event data to the second log data structure as the event data is written to the first log data structure.
- the event data may be automatically received via a peer-to-peer remote copy operation executed by the primary computing device.
- the standby computing device may be geographically remotely located from the primary computing device.
- the standby computing device may be outside a cluster or storage area network of the primary computing device.
- a standby computing system is provided that comprises a processor and a memory coupled to the processor.
- the memory may contain instructions which, when executed by the processor, cause the processor to perform various ones and combinations of the operations outlined above with regard to the computer program product illustrative embodiment.
- a method, in a standby computing device, for providing a shadow application instance for relocation of a primary application instance is provided.
- the method may comprise various ones and combinations of the operations outlined above with regard to the computer program product illustrative embodiment.
- FIG. 1 is an exemplary block diagram of a distributed data processing system in which exemplary aspects of the illustrative embodiments may be implemented;
- FIG. 2 is an exemplary block diagram of a server computing device in which exemplary aspects of the illustrative embodiments may be implemented;
- FIG. 3 is an exemplary block diagram illustrating the peer-to-peer remote copy operation in accordance with one illustrative embodiment;
- FIG. 4 is an exemplary block diagram illustrating an operation for maintaining continuous consistent application states between two related application instances that are topologically remotely located in accordance with one illustrative embodiment;
- FIG. 5 is an exemplary block diagram of the primary operational components of a primary fault tolerance engine in accordance with one illustrative embodiment;
- FIG. 6 is an exemplary block diagram of the primary operational components of a remote fault tolerance engine in accordance with one illustrative embodiment;
- FIG. 7 is a flowchart outlining an exemplary operation for maintaining a continuous consistent state between two related application instances that are topologically remotely located in accordance with an illustrative embodiment.
- FIG. 8 is a flowchart outlining an exemplary operation for performing a failover operation from a primary application instance to a shadow application instance in accordance with one illustrative embodiment.
- the illustrative embodiments set forth herein provide mechanisms for maintaining a shadow application instance that is running in an active standby computing device that is topologically, and often times geographically, remotely located, i.e. not within the same storage area network or cluster, from a primary computing device running a primary application instance of the same application.
- the mechanisms of the illustrative embodiments are preferably implemented in a distributed data processing environment.
- FIGS. 1 and 2 provide examples of data processing environments in which aspects of the illustrative embodiments may be implemented.
- the depicted data processing environments are only exemplary and are not intended to state or imply any limitation as to the types or configurations of data processing environments in which the exemplary aspects of the illustrative embodiments may be implemented. Many modifications may be made to the data processing environments depicted in FIGS. 1 and 2 without departing from the spirit and scope of the present invention.
- FIG. 1 depicts a pictorial representation of a network of data processing systems 100 in which the present invention may be implemented.
- Network data processing system 100 contains a local area network (LAN) 102 and a large area data network 130 , which are the media used to provide communication links between various devices and computers connected together within network data processing system 100 .
- LAN 102 and large area data network 130 may include connections, such as wired communication links, wireless communication links, fiber optic cables, and the like.
- server computing devices 102 - 105 are connected to LAN 102 .
- the server computing devices 102 - 105 may comprise a storage area network (SAN) or a server cluster 120 , for example.
- SANs and server clusters are generally well known in the art and thus, a more detailed explanation of SAN/cluster 120 is not provided herein.
- client 112 is connected to LAN 102 .
- Clients 108 and 110 are connected to the large area data network 130 .
- These clients 108 , 110 , and 112 may be, for example, personal computers, workstations, application servers, or the like.
- server computing devices 102 - 105 may store, track, and retrieve data objects for clients 108 , 110 and 112 .
- Clients 108 , 110 , and 112 are clients to server computing devices 102 - 105 and thus, may communicate with server computing devices 102 - 105 via the LAN 102 and/or the large area data network 130 to run applications and interface with running applications on the server computing devices 102 - 105 and obtain data objects from these server computing devices 102 - 105 .
- Network data processing system 100 may include additional servers, clients, and other devices not shown.
- the large area data network 130 is coupled to the LAN 102 .
- the large area data network 130 may be the Internet, representing a worldwide collection of networks and gateways that use the Transmission Control Protocol/Internet Protocol (TCP/IP) suite of protocols to communicate with one another.
- At the heart of the Internet is a backbone of high-speed data communication lines between major nodes or host computers, consisting of thousands of commercial, government, educational and other computer systems that route data and messages.
- large area data network 130 may also be implemented as a number of different types of networks, such as for example, an intranet, another local area network (LAN), a wide area network (WAN), or the like.
- FIG. 1 is only intended as an example, and is not intended to state or imply any architectural limitations for the illustrative embodiments described herein.
- the Internet is typically used by servers in a cluster to communicate with one another using TCP/IP for messaging traffic.
- Storage controllers participating in mirroring such as Metro MirrorTM or Global MirrorTM as discussed hereafter, typically communicate over a separate storage network using FICON channel commands, SCSI commands, or TCP/IP.
- Server computing device 140 is coupled to large area data network 130 and has an associated storage system 150 .
- Storage system 150 is shown as being directly coupled to the server computing device 140 but, alternatively, may be indirectly accessed by the server computing device 140 via the large area data network 130 or another network (not shown).
- Server computing device 140 is topologically remotely located from the SAN/cluster 120 . That is, server computing device 140 is not part of the SAN/cluster 120 . Moreover, server computing device 140 may be geographically remotely located from the SAN/cluster 120 .
- the server computing device 140 operates as an active standby server computing device for one or more of the server computing devices 102 - 105 in SAN/cluster 120 .
- the server computing device 140 has the same virtual network address on the large area data network 130 as the server computing devices 102 - 105 in SAN/cluster 120 .
- network traffic directed to or from these server computing devices 102 - 105 and 140 may make use of the same virtual network address with mechanisms provided for redirecting such traffic to the appropriate server computing device 102 - 105 and 140 .
- the illustrative embodiments described hereafter provide mechanisms for fault recovery of running application instances on one or more of the server computing devices 102 - 105 of the SAN/cluster 120 by utilizing shadow application instances on the topologically remotely located server computing device 140 . It should be appreciated that while the illustrative embodiments will be described in terms of fault recovery of running application instances on a SAN/cluster 120 , the illustrative embodiments and the present invention are not limited to such.
- a single server computing device, or even client computing device may be the source of a primary running application instance whose state is made consistent with a corresponding shadow application instance on the topologically remotely located computing device (either server or client computing device) in order to provide fault tolerance, without departing from the spirit and scope of the present invention.
- Data processing system 200 may be a symmetric multiprocessor (SMP) system including a plurality of processors 202 and 204 connected to system bus 206 . Alternatively, a single processor system may be employed. Also connected to system bus 206 is memory controller/cache 208 , which provides an interface to local memory 209 . I/O Bus Bridge 210 is connected to system bus 206 and provides an interface to I/O bus 212 . Memory controller/cache 208 and I/O Bus Bridge 210 may be integrated as depicted.
- Peripheral component interconnect (PCI) bus bridge 214 connected to I/O bus 212 provides an interface to PCI local bus 216 .
- a number of modems may be connected to PCI local bus 216 .
- Typical PCI bus implementations will support four PCI expansion slots or add-in connectors.
- Communications links to clients 108 - 112 in FIG. 1 and/or other network coupled devices may be provided through modem 218 and/or network adapter 220 connected to PCI local bus 216 through add-in connectors.
- Additional PCI bus bridges 222 and 224 provide interfaces for additional PCI local buses 226 and 228 , from which additional modems or network adapters may be supported. In this manner, data processing system 200 allows connections to multiple network computers.
- a memory-mapped graphics adapter 230 and hard disk 232 may also be connected to I/O bus 212 as depicted, either directly or indirectly.
- the hardware depicted in FIG. 2 may vary.
- other peripheral devices such as optical disk drives and the like, also may be used in addition to or in place of the hardware depicted.
- the depicted example is not meant to imply architectural limitations with respect to the present invention.
- the data processing system depicted in FIG. 2 may be, for example, an IBM eServer pSeries system, a product of International Business Machines Corporation in Armonk, N.Y., running the Advanced Interactive Executive (AIX) operating system or LINUX operating system.
- the topologically and/or geographically remotely located server computing device 140 may operate as an active standby server computing device that runs a shadow application instance of the same application that is being run in one or more of the server computing devices 102 - 105 .
- the topologically and/or geographically remotely located server computing device 140 may operate as a “hot swappable” server computing device for one or more of the server computing devices 102 - 105 even though it is remotely located from the SAN/cluster 120 .
- Such “hot swapping” may be performed with no data loss in the process.
- known mechanisms such as VMotionTM and MetaClusterTM only permit relocation and failover of running applications within a local topology, i.e. within SAN/cluster 120 .
- the computing devices to which running applications may be relocated or failed-over must have access to the same shared storage system, thereby limiting relocation to a local topology and geographical area.
- the known mechanisms do not permit relocation/failover of running applications to topologically and/or geographically remotely located computing devices.
- a primary computing device, such as server 102 , runs one instance of an application, i.e. the primary application instance, while an active standby computing device, such as server 140 , runs a second instance of the application, i.e. the shadow application instance.
- the primary computing device 102 and active standby computing device 140 have the same virtual network address such that both computing devices may receive the same inputs from a network via network address virtualization, as is generally known in the art.
- the two instances of the application are brought into a consistent state by running an initial application “checkpoint” on the primary computing device 102 followed by an application “restart” on the active standby computing device 140 .
- the generation of the “checkpoint” of the primary application instance on the primary computing device 102 and the “restarting” of the shadow application instance on the active standby computing device 140 may make use of the mechanism described in commonly assigned and co-pending U.S. patent application Ser. No. 11/340,813, filed on Jan. 25, 2006, which is hereby incorporated by reference.
- the generation of a checkpoint may involve copying application data for the primary application instance on the primary computing device 102 to a storage system of the topologically remotely located computing device 140 .
- the copying of application data may be performed using mirroring technology, such as a peer-to-peer remote copy operation, for example.
- a stateful checkpoint of the primary application instance may be generated and stored to a storage medium.
- the stateful checkpoint comprises a set of metadata describing the current state of the primary application instance at the particular point in time when the stateful checkpoint is generated.
- the stateful checkpoint is generated at substantially the same time as the copying of the application data so as to ensure that the state of the application as represented by the stateful checkpoint metadata matches the application data.
- the stateful checkpoint metadata may be copied to the same or different storage system associated with the topologically remotely located computing device 140 in a similar manner as the application data. For example, a peer-to-peer remote copy operation may be performed on the checkpoint metadata to copy the checkpoint metadata to the topologically remotely located storage system.
- the MetaClusterTM product may be used to generate stateful checkpoint metadata for the primary application instance as if the application were being relocated within a local cluster of server computing devices.
- the stateful checkpoint metadata and application data may be relocated to a topologically remotely located computing device using a peer-to-peer remote copy operation, such as is provided by either the Metro MirrorTM or Global MirrorTM products available from International Business Machines, Inc. of Armonk, N.Y.
- Information regarding the MetaClusterTM product may be found, for example, in the articles “Meiosys Releases MetaCluster UC Version 3.0” and “Meiosys Relocates Multi-Tier Applications Without Interruption of Service,” available from the PR Newswire website (www.prnewswire.com). Additional information regarding MetaClusterTM and the ability to replicate application states within a cluster may be found in U.S. Patent Application Publication No. 2005/0251785. Information regarding an exemplary peer-to-peer remote copy operation may be found, for example, in the Redbooks paper entitled “IBM TotalStorage Enterprise Storage Server PPRC Extended Distance,” authored by Castets et al., which is available at the official website of International Business Machines, Inc. (www.ibm.com). These documents are hereby incorporated herein by reference.
- After generating an initial checkpoint of the primary application instance running on the primary computing device 102 , and copying this checkpoint's stateful checkpoint metadata and application data to the topologically remotely located computing device 140 , the shadow application instance is “restarted” using the application data and stateful checkpoint metadata. This in effect causes the shadow application instance to have the same state and application data as the primary application instance, thereby synchronizing the shadow application instance with the primary application instance.
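- The checkpoint-and-restart synchronization just described can be sketched in simplified form. The state layout and function names below are illustrative assumptions made for this sketch; they are not interfaces of the MetaClusterTM product or of any particular implementation:

```python
import copy

def take_checkpoint(app_state):
    """Capture application data plus stateful checkpoint metadata together.

    Taking both at substantially the same time ensures the metadata
    always describes exactly this version of the application data.
    """
    application_data = copy.deepcopy(app_state["data"])
    checkpoint_metadata = {"last_event_id": app_state["last_event_id"]}
    return {"data": application_data, "meta": checkpoint_metadata}

def restart_shadow(checkpoint):
    """Initialize a shadow instance from the copied data and metadata."""
    return {
        "data": copy.deepcopy(checkpoint["data"]),
        "last_event_id": checkpoint["meta"]["last_event_id"],
    }

# The shadow starts in the same state as the primary at checkpoint time.
primary = {"data": {"balance": 100}, "last_event_id": 7}
shadow = restart_shadow(take_checkpoint(primary))
assert shadow["data"] == primary["data"]
assert shadow["last_event_id"] == primary["last_event_id"]
```

In a real deployment the copy between the two dictionaries would be the peer-to-peer remote copy of the application data and metadata volumes; the deep copy merely stands in for that transfer here.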
- events occurring in the primary application instance may be continuously recorded in a log and then replayed at the recovery site so as to maintain the states of the application instances consistent.
- the log may be a high speed pipeline to the shadow application instance, a log file on a shared file system (not shown), a log file that is automatically replicated and applied at the recovery site, or the like.
- the recording of events in the log may be performed in a similar manner as provided in the Record and Replay technology of the MetaClusterTM FT, discussed previously, for example.
- continuous consistency between the application instances may be provided by automatically performing a peer-to-peer remote copy operation on the log event data, using Metro MirrorTM or Global MirrorTM, for example, as the log event data is written to the log storage in the production site or upon the closing of a log file. That is, log event data may be written to the log file in the log storage of the production site and then automatically copied to a storage system associated with the recovery site using a peer-to-peer remote copy operation, for example. Alternatively, the log event data may be written to the log file in the log storage of the production site and then copied to the log storage at the recovery site when the log file is closed.
- the log will eventually be closed, such as when the log becomes full or at some predetermined time interval, at which time the logging of events is switched from a first log file to a secondary log file.
- the closing of the log may be communicated to the active standby computing device in the recovery site, at which time, events logged in the first log file may be replayed by the shadow application instance while event logging is continued using the secondary log file.
- the event logging using the secondary log file is likewise automatically replicated in a storage system associated with the recovery site.
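- The alternation between a first log and a secondary log can be sketched as follows. The two-log structure, the capacity trigger, and the names are assumptions of the sketch; the embodiments only require that closing one log signal that it is ready for replay while logging continues on the other:

```python
class DualLogWriter:
    """Alternate event logging between two logs; closing one makes it
    available for replay while logging continues uninterrupted on the other."""

    def __init__(self, capacity=3):
        self.capacity = capacity
        self.logs = {"first": [], "secondary": []}
        self.active = "first"
        self.closed_for_replay = None   # set when a log switch occurs

    def record(self, event):
        self.logs[self.active].append(event)
        if len(self.logs[self.active]) >= self.capacity:
            self._switch()

    def _switch(self):
        # The closed log may now be replayed by the shadow instance.
        self.closed_for_replay = self.active
        self.active = "secondary" if self.active == "first" else "first"
        self.logs[self.active] = []

writer = DualLogWriter(capacity=2)
for e in ["e1", "e2", "e3"]:
    writer.record(e)
assert writer.closed_for_replay == "first"      # closed after "e2"
assert writer.logs["secondary"] == ["e3"]       # logging never paused
```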
- the shadow application instance preferably has all of its network outputs, i.e. shadow sockets, disabled in order to avoid conflicts on the network with the primary application instance, such as message duplication.
- Such disabling of sockets may comprise, for example, creating a socket and then closing it.
- the socket output is disabled (closed) while the socket input is enabled. This allows for all the recorded logs to be replayed by the active standby computing device at the recovery site, but without sending any messages out to the network.
- the production site computing device receives input messages (events) from the network clients while the active standby receives the recorded logs of events from the production site computing device, but only the production site computing device has the socket output enabled.
- the shadow application instance's network outputs may be activated in the event that the primary application instance at the production site fails and the shadow application instance must take over for the failed primary application instance.
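- The selective disabling of socket output while keeping socket input enabled can be illustrated with an ordinary connected socket pair. Using `shutdown(SHUT_WR)` is one conventional way to achieve this effect; the embodiments do not prescribe a specific system call:

```python
import socket

# A connected pair stands in for the shadow instance's network socket.
shadow_side, network_side = socket.socketpair()

# Disable the output direction only: the shadow can still receive
# input, but any attempt to send to the network now fails.
shadow_side.shutdown(socket.SHUT_WR)

network_side.sendall(b"logged event")          # input still flows in
assert shadow_side.recv(64) == b"logged event"

try:
    shadow_side.sendall(b"duplicate reply")    # output is disabled
    output_blocked = False
except OSError:
    output_blocked = True
assert output_blocked                          # no duplicate messages escape

shadow_side.close()
network_side.close()
```

This mirrors the arrangement described above: recorded events can be replayed into the shadow instance, but nothing it produces reaches the network until failover enables the outputs.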
- the active standby computing device 140 detects the loss of the primary application instance. The detection of the loss of the primary application instance may be based on, for example, a failure to receive a heartbeat signal from the production site computing device. Other mechanisms for detecting the primary application instance failure may be used without departing from the spirit and scope of the present invention.
- the shadow application instance is at a state corresponding to the last replay of logged events, which will typically occur when the logs are switched, e.g., from a first log file to a second log file.
- the second log file which is continuously and automatically replicated by use of the peer-to-peer copy operation, contains all of the events that have occurred in the primary application instance since the log switch.
- the shadow application instance need only replay the events in the second log in order to bring the shadow application instance to a state corresponding to the state of the primary application instance just prior to the failure of the primary application instance.
- the network outputs i.e. shadow sockets, may then be enabled such that the shadow application instance may now generate outputs that are sent across the network to client devices.
- Since the active standby computing device 140 has the same virtual network address as the primary computing device 102 , inputs from client devices may be received by the active standby computing device 140 . In this way, the shadow application instance may then take over the functions of the primary application instance without data loss.
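- Detection of the loss of the primary application instance by a missing heartbeat signal can be sketched as follows; the timeout value, clock injection, and method names are assumptions of the sketch, since the embodiments permit any suitable detection mechanism:

```python
import time

class FailureDetector:
    """Declare the primary lost when no heartbeat arrives within a timeout."""

    def __init__(self, timeout_s=2.0, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_heartbeat = clock()

    def heartbeat(self):
        # Called each time a heartbeat signal arrives from the production site.
        self.last_heartbeat = self.clock()

    def primary_lost(self):
        return self.clock() - self.last_heartbeat > self.timeout_s

# A simulated clock keeps the example deterministic.
now = [0.0]
detector = FailureDetector(timeout_s=2.0, clock=lambda: now[0])
detector.heartbeat()
now[0] = 1.5
assert not detector.primary_lost()   # heartbeat still fresh
now[0] = 4.0
assert detector.primary_lost()       # failover processing should begin
```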
- the logs of events occurring in a primary application instance are automatically and continuously transferred to the active standby computing device 140 so as to keep the state of the shadow application instance running on the active standby computing device 140 consistent with the current state of the primary application instance running on the primary computing device 102 .
- This automatic and continuous transfer is facilitated by the use of a remote copy operation, such as a peer-to-peer remote copy operation.
- FIG. 3 is an exemplary block diagram illustrating the peer-to-peer remote copy operation in accordance with one illustrative embodiment.
- the Metro MirrorTM product is used to perform the peer-to-peer remote copy operation, although the present invention is not limited to using Metro MirrorTM or Global MirrorTM. Rather, any mechanism that permits the remote copying of data and metadata to a topologically remotely located storage system may be used without departing from the spirit and scope of the present invention.
- Global MirrorTM allows the shadowing of application system data from one site (referred to as the production site) to a second site (referred to as the recovery site).
- the logical volumes that hold the data at the production site are referred to as primary volumes and the corresponding logical volumes that hold the mirrored data at the recovery site are referred to as secondary volumes.
- the connection between the primary and the secondary volumes may be provided using fiber channel protocol (FCP) links.
- FIG. 3 illustrates the sequence of a write operation when operating Metro MirrorTM in synchronous mode, i.e. peer-to-peer remote copy-synchronous (PPRC-SYNC).
- the data at the recovery site secondary volumes 330 is real time data that is always consistent with the data at the primary volumes 320 .
- PPRC-SYNC can provide continuous data consistency at the recovery site without needing to periodically interrupt the application to build consistency checkpoints. From the application perspective this is a non-disruptive way of always having valid data at the recovery location.
- the mechanisms of the illustrative embodiments may be equally applicable to both synchronous and asynchronous remote copy operations.
- the “write complete” may be returned from the primary volumes 320 prior to the data being committed in the secondary volumes 330 .
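- The difference between the synchronous and asynchronous acknowledgment orders can be modeled in a few lines. This is a toy model of the PPRC write sequence, not the actual storage controller protocol:

```python
class MirroredVolume:
    """Toy model of a remote-copy pair: a write to the primary volume is
    forwarded to the secondary; in synchronous mode "write complete" is
    returned only after the secondary has committed the data."""

    def __init__(self, synchronous=True):
        self.synchronous = synchronous
        self.primary, self.secondary = [], []
        self.pending = []   # async writes not yet committed remotely

    def write(self, record):
        self.primary.append(record)
        if self.synchronous:
            self.secondary.append(record)   # commit remotely before acking
        else:
            self.pending.append(record)     # ack now, replicate later
        return "write complete"

    def drain(self):
        """Deliver any pending asynchronous writes to the secondary."""
        self.secondary.extend(self.pending)
        self.pending.clear()

sync_vol = MirroredVolume(synchronous=True)
sync_vol.write("log event 1")
assert sync_vol.secondary == ["log event 1"]   # consistent at ack time

async_vol = MirroredVolume(synchronous=False)
async_vol.write("log event 1")
assert async_vol.secondary == []               # ack precedes remote commit
async_vol.drain()
assert async_vol.secondary == ["log event 1"]
```

The synchronous path is why the recovery-site logs are always at a data-consistent state with the production-site logs; the asynchronous path trades that guarantee for lower write latency.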
- the logs in the storage system of the topologically remotely located computing device need to be at a data-consistent state with regard to the logs at the primary computing device prior to any replay of log events on the topologically remotely located computing device.
- both the primary computing device 102 and the topologically remotely located computing device 140 must have access to the shared storage system and the logs maintained on the shared storage system.
- the primary computing device 102 may write to the logs of the shared storage system and the topologically remotely located computing device 140 may read from these logs so as to replay log events to bring the shadow application instance into a consistent state with the primary application instance.
- Such reading and replaying of events may occur when the primary computing device 102 signals a log switch event to the topologically remotely located computing device 140 , for example.
- an initial checkpoint is generated and used to synchronize the states of a primary application instance and a remotely located shadow application instance. Thereafter, logs are automatically and continuously maintained consistent between the primary application instance location and the remotely located shadow application instance.
- Periodically, logs are closed and logging of events is switched to secondary logs such that the events recorded in the initial logs may be used to update the state of the shadow application instance. This replaying of events from the closed log may be performed without interrupting the continuous logging of events.
- the switching between the initial and secondary logs may be repeated in the opposite direction, and so forth, as many times as necessary so as to facilitate the continued logging of events and updating of application states.
- the state of the shadow application instance may be automatically updated by replaying events from the log that is currently being used to log events. This brings the state of the shadow application instance to a current state of the failed primary application instance.
- the network outputs of the shadow application instance may then be enabled and the shadow application instance thereby takes over operation for the primary application instance. All of this is done with the shadow application instance running on a topologically and/or geographically remotely located computing device from the primary computing device upon which the primary application instance is running.
- FIG. 4 is an exemplary block diagram illustrating an operation for maintaining continuous consistent application states between two related application instances that are topologically remotely located in accordance with one illustrative embodiment.
- a primary computing device 410 is provided at a production site 450 and an active standby computing device 420 is provided at a recovery site 460 .
- the recovery site 460 is topologically and/or geographically remotely located from the production site 450 .
- the primary computing device 410 has associated storage devices A, B and C that are coupled to the primary computing device 410 .
- Data storage device A in the depicted example, stores application data for the primary application instance 412 running in the primary computing device 410 .
- Data storage device B stores stateful checkpoint metadata as well as one or more logs of events that occur during the running of the primary application instance 412 .
- Data storage device C also stores one or more logs of events that occur during the running of the primary application instance 412 .
- the logs in data storage device C are secondary logs to which logging of events is switched when an initial log in storage device B is closed. Such switching may occur back and forth between logs in storage devices B and C while the primary application instance is running as necessary.
- the active standby computing device 420 is coupled to storage devices D, E and F.
- Storage devices D, E and F are mirrors of storage devices A, B and C.
- storage device D stores application data for the primary application instance 412 .
- Storage device E stores checkpoint metadata for the primary application instance 412 as well as one or more logs of events occurring in application instance 412 .
- Storage device F stores the secondary logs of events occurring in the application instance 412 .
- the data stored in data storage devices A, B, and C at the production site 450 may be transferred to the data storage devices D, E and F at the remotely located recovery site 460 using a peer-to-peer remote copy operation, as previously described above.
- PPRC is used to copy the initial application data for the primary application instance 412 from the data storage device A to the data storage device D.
- PPRC is also used to copy initial stateful checkpoint data from data storage B to data storage E. Thereafter, PPRC is used to provide automatic and continuous copying of write data to logs stored in the data storage devices B and C to data storage devices E and F.
- the primary computing device 410 includes a primary fault tolerance engine 414 that is responsible for performing operations in accordance with the illustrative embodiments with regard to the primary computing device 410 .
- the active standby computing device 420 includes a remote fault tolerance engine 424 that is responsible for performing operations in accordance with the illustrative embodiments with regard to the active standby computing device 420 .
- In operation, upon initialization of a primary application instance 412 on the primary computing device 410 , for example, or at any other suitable time point at which a stateful checkpoint may be generated, the primary fault tolerance engine 414 generates a checkpoint of the state of the primary application instance 412 .
- This checkpoint involves a copy of the application data at the checkpoint time and stateful checkpoint metadata at the checkpoint time.
- the checkpoint application data copy is generated and stored in the data storage device A while the stateful checkpoint metadata is generated and stored in the data storage device B.
- the primary fault tolerance engine 414 may generate the checkpoint using, for example, the MetaClusterTM product as discussed previously.
- the primary fault tolerance engine 414 initiates a transfer of a copy of the checkpoint application data and metadata to the recovery site 460 using a peer-to-peer remote copy (PPRC) operation.
- a copy of the application data and checkpoint metadata is sent from storage devices A and B to storage devices D and E associated with the recovery site 460 .
- the primary fault tolerance engine 414 may send a message to the active standby computing device 420 to “restart” a shadow application instance 422 of the primary application instance 412 .
- the remote fault tolerance engine 424 may initiate a “restart” operation on the shadow application instance 422 in response to the message from the primary fault tolerance engine 414 .
- the restart operation makes use of the copy of the application data and stateful checkpoint metadata to restart the shadow application instance 422 at a state that corresponds to the initial state of the primary application instance 412 specified by the application data and stateful checkpoint metadata.
- the remote fault tolerance engine 424 may use the MetaClusterTM product to perform this restart operation.
- the primary application instance 412 is made available to client devices for use.
- the primary computing device 410 may receive inputs from client devices via one or more networks and may generate outputs that are sent to the client devices via the one or more networks.
- the inputs, processing of the inputs, and generation of outputs results in events occurring in the primary application instance 412 .
- the primary fault tolerance engine 414 records these events occurring in the primary application instance 412 in a log stored in data storage device B. The logging of these events continues until a predetermined criterion is met, at which time the log in data storage device B is closed and logging of events is switched to a secondary log in data storage device C. This same logging is then performed with regard to data storage device C until the predetermined criterion is again met, at which point logging is switched back to a log maintained in data storage device B. This switching back and forth between logs may be performed repeatedly as necessary during the running of the primary application instance 412 .
- this PPRC operation is performed immediately after writing the event data to the primary volume, e.g., the log in the data storage device B or C.
- the logs in data storage devices E and F are kept consistent with the current state of the logs in data storage devices B and C in an automatic and continuous manner by use of the PPRC operation.
- the primary fault tolerance engine 414 may close the log in data storage device B and no longer logs events in the log in data storage device B.
- the primary fault tolerance engine 414 switches the logging of events to a secondary log in data storage device C.
- the primary fault tolerance engine 414 may signal the switch of logs to the remote fault tolerance engine 424 which may then close its own copy of the log in data storage device E.
- the closing of the log in data storage device E may be followed by the remote fault tolerance engine 424 replaying the events recorded in the closed log so as to bring the state of the shadow application instance 422 up to date as of the time of the log switch.
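- The replay of a closed log against the shadow application instance may be sketched as follows; the event format and the dictionary-shaped application state are assumptions of the sketch:

```python
def replay(shadow_state, closed_log):
    """Apply each recorded event to the shadow instance in order, bringing
    its state up to the moment of the log switch."""
    for op, key, value in closed_log:
        if op == "set":
            shadow_state[key] = value
        elif op == "del":
            shadow_state.pop(key, None)
    return shadow_state

# Mirrored copy of the closed log, as replicated to the recovery site.
closed_log_copy = [
    ("set", "orders", 3),
    ("set", "status", "open"),
    ("del", "draft", None),
]
shadow = {"draft": True}
replay(shadow, closed_log_copy)
assert shadow == {"orders": 3, "status": "open"}
```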
- the input and output sockets of the primary application instance 412 are enabled during the running of the primary application instance 412 on the primary computing device 410 .
- the output sockets of the shadow application instance 422 are disabled such that the outputs generated by the shadow application instance 422 are not sent to the client devices. This ensures that there are no conflicts on the network with the primary application instance 412 , such as message duplication.
- the primary application instance 412 may fail and it will be necessary to transfer operations over to the shadow application instance 422 on the topologically and/or geographically remotely located active standby computing device 420 .
- the following is an example illustrating such a failover from the primary application instance 412 to the shadow application instance 422 .
- events occurring in the primary application instance 412 are logged in a first log on data storage device B of the production site 450 .
- logging may be switched from the first log on data storage device B to a second log on data storage device C.
- events in the first log may be automatically applied to the shadow application instance 422 from the shared storage system in response to the log switch event.
- the first log and second log are automatically replicated to the data storage devices E and F at the recovery site 460 .
- the events in the copy of the first log on data storage device E may be applied to the shadow application instance 422 without having to terminate the PPRC operation on log events being written to the second log in data storage device C. This allows continuous replication of log events on data storage devices E and F.
- the primary application instance 412 fails.
- the loss of the primary application instance 412 at the production site 450 is detected by the remote fault tolerance engine 424 , such as by way of a heartbeat signal not being detected, a predetermined time interval elapsing without a log event being replicated from the primary application instance, or the like.
- the state of the shadow application instance 422 is the state of the primary application instance 412 at the point in time when the log switch event occurred.
- the remote fault tolerance engine 424 facilitates this replaying of events from the copy of the second log maintained in the data storage device F.
- a topologically and/or geographically remotely located active standby computing device 420 may be used to provide failover for a primary application instance with no data loss.
- While FIG. 4 shows separate storage devices A-F for the storage of application data, checkpoint metadata, and logs, the illustrative embodiments are not limited to such. Rather, any number of storage devices may be utilized without departing from the spirit and scope of the present invention.
- the production site 450 may comprise a single storage device upon which application data, checkpoint metadata and various logs may be stored.
- the recovery site 460 may have one or more storage devices for storing copies of this data that is copied over from the production site 450 using the PPRC operations described above.
- FIG. 5 is an exemplary block diagram of the primary operational components of a primary fault tolerance engine in accordance with one illustrative embodiment.
- the primary fault tolerance engine includes a controller 510 , a peer-to-peer remote copy module 520 , a storage system interface 540 , an initial checkpoint generation module 530 , a network interface 550 , and an application log module 560 .
- the elements 510 - 560 may be implemented in hardware, software, or any combination of hardware and software. In one illustrative embodiment, the elements 510 - 560 are implemented as software instructions executed by one or more processing devices.
- the controller 510 is responsible for the overall operation of the primary fault tolerance engine and orchestrates the operation of the other elements 520 - 560 .
- the peer-to-peer remote copy module 520 is responsible for performing peer-to-peer remote copying of application data, initial checkpoint metadata, and log event data from the production site 450 to the recovery site 460 .
- the storage system interface 540 provides an interface through which data may be written to or read from the storage device(s) associated with the production site 450 .
- the initial checkpoint generation module 530 is responsible for generating initial copies of application data and stateful checkpoint metadata for transfer to the recovery site 460 for initializing a shadow application instance.
- the network interface 550 is responsible for providing an interface through which application data, stateful checkpoint metadata, and log data may be transmitted to the recovery site 460 . Messages indicative of initialization of an application instance, log switches, failure of an application instance, and the like may also be transmitted from the primary fault tolerance engine to the remote fault tolerance engine via the network interface 550 .
- the application log module 560 is responsible for logging events occurring in an application instance in one or more logs that are maintained in a storage system of the production site.
- the application log module 560 also performs operations for switching between logs.
- the controller 510 instructs the initial checkpoint generation module 530 to generate an initial checkpoint for an application instance and store that checkpoint data to the storage system of the production site via storage system interface 540 .
- the controller 510 then instructs the peer-to-peer remote copy module 520 to copy the checkpoint data to a topologically and/or geographically remotely located recovery site 460 .
- the controller 510 then instructs the application log module 560 to begin logging events for the primary application instance and performing a peer-to-peer remote copy of such event data to the recovery site 460 using the peer-to-peer remote copy module 520 .
- the logging of events may be done with regard to one or more logs stored in the storage system via the storage system interface 540 .
- Peer-to-peer remote copy operations may be performed via the network interface 550 .
- the application log module 560 may further send messages to the recovery site 460 indicating log switches via the network interface 550 .
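The cooperation of the FIG. 5 components might be sketched as below. This is an illustrative model, not the actual implementation: the class name, the dictionary standing in for the storage system interface, and the list standing in for the network interface (over which peer-to-peer remote copies travel) are all assumptions for the sake of the example.

```python
class PrimaryFaultToleranceEngine:
    """Sketch of FIG. 5: a controller orchestrating initial checkpoint
    generation, event logging, log switching, and remote copy."""

    def __init__(self, storage, network):
        self.storage = storage      # stands in for the storage system interface
        self.network = network      # stands in for the network interface (PPRC)
        self.active_log = "log_a"

    def initialize(self, app_state):
        # Initial checkpoint generation module: store the checkpoint locally,
        # then peer-to-peer remote copy it to the recovery site.
        self.storage["checkpoint"] = dict(app_state)
        self.network.append(("checkpoint", dict(app_state)))

    def log_event(self, event):
        # Application log module: write the event to the active log, then
        # automatically remote-copy it to the recovery site.
        self.storage.setdefault(self.active_log, []).append(event)
        self.network.append((self.active_log, event))

    def switch_logs(self):
        # Application log module: switch between the two logs and send a
        # log switch message to the remote fault tolerance engine.
        self.active_log = "log_b" if self.active_log == "log_a" else "log_a"
        self.network.append(("switch", self.active_log))
```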
- FIG. 6 is an exemplary block diagram of the primary operational components of a remote fault tolerance engine in accordance with one illustrative embodiment.
- the remote fault tolerance engine includes a controller 610 , an application log replay module 620 , a storage system interface 630 , a shadow application failover module 640 , and a network interface 650 .
- the elements 610 - 650 may be implemented in hardware, software, or any combination of hardware and software. In one illustrative embodiment, the elements 610 - 650 are implemented as software instructions executed by one or more processing devices.
- the controller 610 is responsible for the overall operation of the remote fault tolerance engine and orchestrates the operation of the other elements 620 - 650 .
- the application log replay module 620 is responsible for replaying log events from logs stored in a storage system associated with the recovery site 460 . Such replaying of events may occur when a log switch event occurs or when a primary application instance fails, for example.
- the storage system interface 630 provides an interface through which the remote fault tolerance engine may access checkpoint data and logs stored in a storage system associated with the recovery site 460 .
- the network interface 650 provides an interface through which messages and data may be received from the primary fault tolerance engine via one or more networks.
- the shadow application failover module 640 performs the necessary operations for failing-over the operations of a primary application instance to a shadow application instance using the logs maintained in the storage system associated with the recovery site 460 .
- the shadow application failover module 640 may perform operations to cause the replay of events from a currently active log so as to bring the shadow application instance up to a current state and then enable shadow sockets of the shadow application instance so that the shadow application instance may take over operations for the failed primary application instance.
- the controller 610 may receive messages from the primary fault tolerance engine of the production site 450 and may instruct various ones of the elements 620 and 640 to operate accordingly. For example, in response to receiving an instruction to initiate a shadow application instance, and in response to receiving checkpoint data which is stored in the storage system associated with the recovery site 460 , the controller 610 may initialize a shadow application instance. Thereafter, updates to logs stored in the storage system associated with the recovery site 460 may be received via the network interface 650 and stored in the storage system via the storage system interface 630 .
- the controller 610 may instruct the application log replay module 620 to perform a replay of events from the previous log so as to bring the shadow application instance up to date as of the time of the log switch event.
- the controller 610 may instruct the shadow application failover module 640 to perform operations for failing-over the operations of the primary application instance to the shadow application instance, as described previously.
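The message handling on the recovery side, as described for FIG. 6, might look roughly like the following. Again a sketch under assumptions: the class name and the `(kind, payload)` message shape are hypothetical, chosen only to mirror the checkpoint, log update, and log switch messages discussed above.

```python
class RemoteFaultToleranceEngine:
    """Sketch of FIG. 6: store replicated log events, replay the closed log
    on a log switch message, and initialize the shadow from checkpoint data."""

    def __init__(self):
        self.logs = {"log_a": [], "log_b": []}
        self.shadow_state = None   # state of the shadow application instance
        self.applied = []          # events replayed into the shadow so far

    def handle(self, message):
        kind, payload = message
        if kind == "checkpoint":
            # Initialize the shadow application instance from the checkpoint.
            self.shadow_state = dict(payload)
        elif kind == "switch":
            # payload names the newly active log; the application log replay
            # module replays the log that just closed.
            closed = "log_a" if payload == "log_b" else "log_b"
            self.applied.extend(self.logs[closed])
            self.logs[closed].clear()
        else:
            # kind names a log file; store the replicated event in it.
            self.logs[kind].append(payload)
```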
- FIGS. 7 and 8 are flowcharts outlining exemplary operations of an illustrative embodiment. It will be understood that each block of the flowchart illustrations, and combinations of blocks in the flowchart illustrations, can be implemented by computer program instructions. These computer program instructions may be provided to a processor or other programmable data processing apparatus to produce a machine, such that the instructions which execute on the processor or other programmable data processing apparatus create means for implementing the functions specified in the flowchart block or blocks.
- These computer program instructions may also be stored in a computer-readable memory or storage medium that can direct a processor or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory or storage medium produce an article of manufacture including instruction means which implement the functions specified in the flowchart block or blocks.
- blocks of the flowchart illustrations support combinations of means for performing the specified functions, combinations of steps for performing the specified functions and program instruction means for performing the specified functions. It will also be understood that each block of the flowchart illustrations, and combinations of blocks in the flowchart illustrations, can be implemented by special purpose hardware-based computer systems which perform the specified functions or steps, or by combinations of special purpose hardware and computer instructions.
- FIG. 7 is a flowchart outlining an exemplary operation for maintaining a continuous consistent state between two related application instances that are topologically remotely located in accordance with an illustrative embodiment.
- the operation starts with the primary computing device initiating a primary application instance (step 710 ).
- a primary fault tolerance engine generates an initial checkpoint of the primary application instance (step 720 ) and transmits the initial checkpoint data via a peer-to-peer remote copy operation to a recovery site (step 730 ).
- a remote fault tolerance engine receives the checkpoint data and initiates a shadow application instance using the checkpoint data (step 740 ).
- the primary fault tolerance engine initiates logging of events in a first log associated with the primary application instance (step 750 ).
- the events that are written to the log are automatically and continuously transmitted to the recovery site by the primary fault tolerance engine using a PPRC operation (step 760 ).
- the primary fault tolerance engine determines if a switching of logs is to be performed (step 770 ). If so, the primary fault tolerance engine switches logging of events from the first log to a second log (or vice versa in subsequent switching events) and a switch log event message is sent to the remote fault tolerance engine.
- the remote fault tolerance engine receives the switch log event message and replays events recorded in the previous log, e.g., the first log (step 780 ).
- the primary fault tolerance engine determines whether a termination event has occurred (step 790 ). This termination event may be, for example, the discontinuing of the primary application instance. If a termination event has occurred, the operation is terminated. Otherwise, the operation returns to step 750 and continues to log events using the new event log.
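The FIG. 7 loop (steps 750-790) can be summarized in a short sketch. The function name, the two-log naming, and the fixed switch threshold are illustrative assumptions; in practice a log would switch when it fills or a time interval elapses.

```python
def consistency_loop(event_stream, send, switch_after=2):
    # Steps 750-790: log events in the active log, remote-copy each write
    # (step 760), and switch logs once the active log reaches switch_after
    # entries (step 770), telling the recovery site to replay the closed log.
    logs = {"first": [], "second": []}
    active = "first"
    for event in event_stream:
        logs[active].append(event)
        send(("event", active, event))          # step 760: PPRC copy
        if len(logs[active]) >= switch_after:   # step 770: switch decision
            closed = active
            active = "second" if active == "first" else "first"
            logs[active] = []                   # reuse the other log
            send(("switch", closed))            # triggers replay (step 780)
    return logs
```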
- FIG. 8 is a flowchart outlining an exemplary operation for performing a failover operation from a primary application instance to a shadow application instance in accordance with one illustrative embodiment.
- the operation starts with the detection of a failure of a primary application instance by the remote fault tolerance engine (step 810 ).
- the remote fault tolerance engine replays the events logged in the currently active log using the shadow application instance (step 820 ).
- the shadow sockets of the shadow application are enabled (step 830 ) and inputs to the primary application instance are redirected to the shadow application instance (step 840 ). The operation then terminates.
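The FIG. 8 failover path reduces to a few steps, sketched here under obvious simplifications: socket enabling and input redirection are modeled as a returned flag rather than real network operations, and the function name is hypothetical.

```python
def failover(shadow_state, active_log_copy):
    # Step 820: replay the currently active log into the shadow instance,
    # bringing it to the primary's state just prior to the failure.
    state = list(shadow_state) + list(active_log_copy)
    # Step 830: enable the shadow sockets so the shadow may produce outputs
    # (modeled as a boolean flag in this sketch).
    sockets_enabled = True
    # Step 840: inputs reach the shadow via the shared virtual network
    # address, so no further state change is needed here.
    return state, sockets_enabled
```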
- the illustrative embodiments provide mechanisms for topologically and/or geographically remote computing systems to be used as active standby computing systems for failover operations.
- the illustrative embodiments provide mechanisms for the automatic and continuous replication of log events between the primary application instance and the shadow application instance such that the two application instances may be maintained with a consistent state.
- the shadow application instance at the topologically and/or geographically remotely located active standby computing device may be used to take over operations for the failed primary application instance with a simple update of the shadow application state using the consistent copy of logged events stored at the remote location. All of this is done with no loss in data.
- the illustrative embodiments as described above may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment containing both hardware and software elements.
- the invention is implemented in software, which includes but is not limited to firmware, resident software, microcode, and the like.
- the illustrative embodiments may take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system.
- a computer-usable or computer readable medium can be any apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
- the medium may be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium.
- Examples of a computer-readable medium include a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk and an optical disk.
- Current examples of optical disks include compact disk-read only memory (CD-ROM), compact disk-read/write (CD-R/W) and DVD.
- a data processing system suitable for storing and/or executing program code will include at least one processor coupled directly or indirectly to memory elements through a system bus.
- the memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution.
- I/O devices (including but not limited to keyboards, displays, pointing devices, etc.) can be coupled to the system either directly or through intervening I/O controllers.
- Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks.
- Modems, cable modem and Ethernet cards are just a few of the currently available types of network adapters.
Description
- 1. Technical Field
- The present application relates generally to an improved data processing system and method. More specifically, the present application is directed to a system and method for relocating running applications to topologically remotely located computing systems.
- 2. Description of Related Art
- High availability and disaster recovery are increasingly more important in the information technology industry as today's society relies more heavily on electronic systems to perform daily activities. In this vein, it is becoming more important to be able to transfer a running application from one server computing device to another so as to ensure that the running application is available if a server computing system fails. Moreover, it is important to be able to relocate running applications in the event of a failure of a server computing system so that the running application may be recovered on a different computing system.
- High availability of applications may be achieved by providing clustered server/storage environments that protect against server and storage failures. In such environments, when a failure occurs, the application is restarted on a redundant server and/or storage system in the clustered server/storage environment with only a small time in which the application is made unavailable.
- In some clustered server/storage environments, a hot standby server/storage system is provided and a log shipping technique is used. With the log shipping technique, a log of the application state is maintained by a production server and is shipped to the hot standby server in order to keep the application state of the standby server close to the current state of the application on the production server. If a failover to the standby server is required, only updates since the last log update was shipped to the standby server will be lost.
- It should be noted that such server clusters or storage system clusters are topologically and geographically limited such that the devices that make up the cluster must be in relatively close proximity to one another. The clustered server/storage environments do not provide any application independent mechanism for providing availability and disaster recovery at remote network topological and/or geographic distances. Moreover, the clustered server/storage environments do not provide any such availability and recovery mechanism that has zero data loss, including no loss of in-flight transactions.
- One known solution for relocating running applications in a storage area network (SAN) is provided by the VMotion™ software available from VMWare (an evaluation copy of VMotion™ is available from www.vmware.com/products/vc/vmotion.html). The VMotion™ software allows users to move live, running virtual machines from one physical server computing system to another physical server computing system connected to the same SAN while maintaining continuous service availability. The VMotion™ software is able to perform such relocation because of the virtualization of the disks in the SAN.
- However, VMotion™ is limited in that it requires that the entire virtual machine, which may comprise the operating system and a plurality of running applications, be moved to the new physical server computing device. There is no ability in the VMotion™ software to be able to move individual applications from one physical server computing device to another.
- Moreover, VMotion™ is limited in that the movement of virtual machines can only be performed from one server computing device to another in the same SAN. Thus, VMotion™ cannot be used to move virtual machines to other server computing devices that are outside the SAN. This, in essence, places a network topology and geographical limitation on the server computing devices to which virtual machines may be moved using the VMotion™ software product.
- Another known solution for providing high availability and disaster recovery of running applications is the MetaCluster™ UC 3.0 software product available from Meiosys, Inc., which has been recently acquired by International Business Machines, Inc. As described in the article “Meiosys Releases MetaCluster UC Version 3.0,” available from PR Newswire at www.prnewswire.com, the MetaCluster™ software product is built upon a Service Oriented Architecture and embodies the latest generation of fine-grained virtualization technologies to enable dynamic data centers to provide preservation of service levels and infrastructure optimization on an application-agnostic basis under all load conditions.
- Unlike coarse-grained virtual machine technologies and virtual machine mobility technologies, such as VMotion™ described above, which run at the operating system level and can only move an entire virtual machine at one time, the MetaCluster™ software product runs in a middleware layer between the operating system and the applications. MetaCluster™ provides a container technology which surrounds each application, delivering both resource isolation and machine-to-machine mobility for applications and application processes.
- The MetaCluster™ software product's application virtualization and container technology enables relocation of applications both across physical and virtual machines. MetaCluster™ also provides substantial business intelligence which enables enterprises to set thresholds and define rules for managing the relocation of applications and application processes from machine to machine, both to address high availability and utilization business cases.
- Deploying MetaCluster™ UC 3.0 for business critical applications allows applications to be virtualized very efficiently so that the performance impact is unnoticeable (typically under 1%). Virtualized applications may then be moved to the infrastructure best suited from a resource optimization and quality of service standpoint. Server capacity can be reassigned dynamically to achieve high levels of utilization without compromising performance. Since MetaCluster™ UC 3.0 enables the state and context of the application to be preserved during relocation, the relocation is both fast and transparent to the users of the applications.
- MetaCluster™ UC 3.0 uses a transparent “checkpoint and restart” functionality for performing such relocation of applications within server clusters. When generating a checkpoint, the necessary stateful data and metadata for recreating the full state, connections and context of the running application are preserved for a particular point in time. This checkpoint may then be provided to another server computing device in the same cluster as the original server computing device. The server computing device to which the checkpoint is provided may then use the checkpoint information to restart the application, using application data available from a shared storage system of the cluster, and recreate the state, connections, and context of the application on the new server computing device.
- In a further product from Meiosys, i.e. MetaCluster™ FT, a “Record and Replay” technology is provided in which events that continuously impact the behavior of an application at runtime are recorded to disk in the form of log files and then those events may be replayed in the event of a failure. Thus, the “Record and Replay” technology of MetaCluster™ FT allows recorded events to be replayed on a redundant application instance in the same server cluster in the event of a failure in order to provide failover fault tolerance. Information about the “Record and Replay” aspects of the MetaCluster™ FT (formerly referred to as “Meiosys FT”) software product may be found, for example, in “Meiosys breaks technology barrier for cost-effective fault tolerance technology designed to be embedded in OEM platform solutions” available from Primeur Monthly at www.hoise.com/primeur/05/articles/monthly/AE-PR-05-05-46.html and the presentation materials for the 47th Meeting of IFIP WG10.4 in Puerto Rico available at www2.laas.fr/IFIPWG/Workshops&Meetings/47/WS/04-Rougier.pdf.
- While MetaCluster™ UC 3.0 and MetaCluster™ FT allow relocation and failover of individual applications within the same cluster, as opposed to requiring entire virtual machines to be relocated, MetaCluster™ UC and FT are still limited to a localized cluster of server computing devices. That is, MetaCluster™ relies on all of the server computing devices having access to a shared storage system for accessing application data. Thus, MetaCluster™ UC and FT do not allow movement or relocation of running applications outside of the server cluster or failover of application instances outside of the server cluster. Again this limits the network topology and geographical locations of computing devices to which running applications may be relocated and to which failover may be performed.
- In view of the above, it would be beneficial to have a system, method and computer program product for providing application fault tolerance and recovery using topologically and/or geographically remotely located computing devices. Moreover, it would be beneficial to have a system, method, and computer program product for relocating running applications to computing devices topologically remotely located outside a storage area network or cluster of computing devices in which the running applications were previously present. Furthermore, it would be beneficial to have such a fault tolerance and relocation mechanism that maintains synchronization between a production computing device and the remotely located computing device such that no data loss is experienced, including no data loss with regard to in-flight transactions. The illustrative embodiments described hereafter provide such a system, method and computer program product.
- It should be noted that the description of the illustrative embodiments provided herein refers to computing devices being “topologically remotely located.” Being “topologically remotely located,” in the present description, refers to the computing system being outside the cluster or storage area network of the computing device from which the running application is being relocated. In many cases a topologically remotely located computing system may be geographically remotely located as well, but this is not required for the computing system to be topologically remotely located. Rather, the topologically remotely located computing system need only be remotely located in terms of the network topology connecting the various computing devices.
- With the mechanisms of the illustrative embodiments, a primary computing device runs one instance of an application (i.e. the primary application instance) at a production site and an active standby computing device runs a second instance of the application (i.e. the shadow application instance) at a recovery site which may be topologically remotely located from the production site. The primary computing device and active standby computing device have the same virtual network address such that both computing devices may receive the same inputs from a network via network address virtualization, as is generally known in the art.
- The two instances of the application are brought into a consistent state by running an initial application “checkpoint” on the primary computing device followed by an application “restart” on the active standby computing device. The generation of the “checkpoint” of the primary application instance on the primary computing device and the “restarting” of the shadow application instance on the active standby computing device may make use of the mechanism described in commonly assigned and co-pending U.S. patent application Ser. No. 11/340,813 (Attorney Docket No. SJO920050108US1), filed on Jan. 25, 2006, which is hereby incorporated by reference. For example, the generation of a checkpoint may involve copying application data for the primary application instance on the primary computing device to a storage system of the topologically remotely located computing device. The copying of application data may be performed using mirroring technology, such as a peer-to-peer remote copy operation, for example.
- In addition to copying the application data, a stateful checkpoint of the primary application instance may be generated and stored to a storage medium. The stateful checkpoint comprises a set of metadata describing the current state of the primary application instance at the particular point in time when the stateful checkpoint is generated. Preferably, the stateful checkpoint is generated at substantially the same time as the copying of the application data so as to ensure that the state of the application as represented by the stateful checkpoint metadata matches the application data.
- The stateful checkpoint metadata may be copied to the same or different storage system associated with the topologically remotely located computing device in a similar manner as the application data. For example, a peer-to-peer remote copy operation may be performed on the checkpoint metadata to copy the checkpoint metadata to the topologically remotely located storage system.
- In one illustrative embodiment, the MetaCluster™ product may be used to generate stateful checkpoint metadata for the primary application instance as if the application were being relocated within a local cluster of server computing devices. In such an illustrative embodiment, the stateful checkpoint metadata and application data may be relocated to a topologically remotely located computing device using the Peer-to-Peer Remote Copy (PPRC) product available from International Business Machines, Inc. of Armonk, N.Y., also referred to by the name Metro Mirror™.
- Moreover, in another illustrative embodiment, the relocation may be performed using the Global Mirror™ product, also available from International Business Machines, Inc., which implements an asynchronous replication method that can be guaranteed to be consistent. Global Mirror™ is actually a combination of the Metro Mirror™ product, Global Copy™ product, and FlashCopy™ product available from International Business Machines, Inc. Global Mirror™ uses Global Copy™ to maximize data replication throughput and dynamically switches to synchronous mode (Metro Mirror™) to get to a consistent state, preserves that consistent state using FlashCopy™, and switches back to Global Copy™ (also referred to as PPRC-XD). Metro Mirror™ has no data loss but is limited to approximately a 300 kilometer range, while Global Mirror™ has minimal data loss but no distance limitation.
- After generating an initial checkpoint of the primary application instance running on the primary computing device, and copying this checkpoint's stateful checkpoint metadata and application data to the topologically remotely located computing device, the shadow application instance is “restarted” using the application data and stateful checkpoint metadata. This in effect causes the shadow application instance to have the same state and application data as the primary application instance and thereby synchronize the shadow application instance with the primary application instance.
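The initial synchronization just described, checkpoint on the primary followed by restart on the standby, can be sketched as below. The function names and the dictionary representation of application data and stateful metadata are assumptions made for illustration only.

```python
def checkpoint(app_data, app_state):
    # Capture the application data and the stateful checkpoint metadata at
    # substantially the same time, so the metadata matches the data it
    # describes; both copies are then remote-copied to the recovery site.
    return {"data": dict(app_data), "state": dict(app_state)}

def restart_shadow(remote_copy):
    # "Restart" the shadow from the remote copies, giving it the same state
    # and application data as the primary, i.e. synchronized instances.
    return dict(remote_copy["data"]), dict(remote_copy["state"])
```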
- After having synchronized the application instances so that they have consistent states with each other, events occurring in the primary application instance may be continuously recorded in a log and then replayed at the recovery site so as to maintain the states of the application instances consistent. The log may be a high speed pipeline to the shadow application instance, a log file on a shared file system, a log file that is automatically replicated and applied at the recovery site, or the like.
- In one illustrative embodiment, continuous consistency between the application instances may be provided by automatically performing a peer-to-peer remote copy operation on the log event data as the log event data is written to the log storage in the production site or upon the closing of a log file. That is, log event data may be written to the log file in the log storage of the production site and then automatically copied to a storage system associated with the recovery site using a Metro Mirror™ or Global Mirror™ operation, for example. Alternatively, the log event data may be written to the log file in the log storage of the production site and then copied to the log storage at the recovery site when the log file is closed.
- In either embodiment, the log will eventually be closed, such as when the log becomes full or at some predetermined time interval, at which time the logging of events is switched from a first log file to a secondary log file. The closing of the log may be communicated to the active standby computing device in the recovery site at which time events logged in the first log file may be replayed by the shadow application instance while event logging is continued using the secondary log file. The event logging using the secondary log file is likewise automatically replicated in a storage system associated with the recovery site.
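The two replication choices described above, copy each log write as it happens or copy the whole log when it closes, differ only in when the remote copy occurs. A minimal sketch, with lists standing in for the production and recovery log storage:

```python
def write_event(local_log, remote_log, event, per_write=True):
    # Write the event to the production log; with per_write=True the remote
    # copy happens automatically as the write occurs (synchronous,
    # Metro Mirror-style), otherwise it is deferred until close_log.
    local_log.append(event)
    if per_write:
        remote_log.append(event)

def close_log(local_log, remote_log, per_write=True):
    # On close, the deferred variant ships the whole log file at once.
    if not per_write:
        remote_log.extend(local_log)
```

Either way, by the time a log closes the recovery site holds a complete copy of it; the per-write variant simply narrows the window of unreplicated events to a single write.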
- The shadow application instance preferably has all of its network outputs, i.e. shadow sockets, disabled in order to avoid conflicts on the network with the primary application instance, such as message duplication. The shadow application instance's network outputs may be activated in the event that the primary application instance at the production site fails and the shadow application instance must take over for the failed primary application instance.
- When a failure of the primary application instance occurs at the production site, the active standby computing device detects the loss of the primary application instance. The shadow application instance is at a state corresponding to the last replay of logged events, which will typically occur when the logs are switched, e.g., from a first log file to a second log file. At this point, the second log file, which is continuously and automatically replicated by use of the peer-to-peer copy operation, contains all of the events that have occurred in the primary application instance since the log switch. Thus, the shadow application instance need only replay the events in the second log in order to bring the shadow application instance to a state corresponding to the state of the primary application instance just prior to the failure of the primary application instance. The network outputs, i.e. shadow sockets, may then be enabled such that the shadow application instance may now generate outputs that are sent across the network to client devices. In this way, the shadow application instance may then take over the functions of the primary application instance without data loss.
- In one illustrative embodiment, a computer program product comprising a computer usable medium having a computer readable program is provided. The computer readable program, when executed on a standby computing device, causes the standby computing device to automatically receive, from a primary computing device topologically remotely located from the standby computing device, event data for a primary application instance written to a first log data structure associated with the primary computing device. The computer readable program may further cause the standby computing device to store the event data in a second log data structure associated with the standby computing device and a shadow application instance. The primary application instance and shadow application instance may be instances of a same application. The computer readable program may also cause the standby computing device to update a state of the shadow application instance by replaying events in the second log data structure associated with the shadow application instance to thereby bring a state of the shadow application instance to a consistent state with the primary application instance. Moreover, the computer readable program may further cause the standby computing device to relocate the primary application instance to the standby computing device using the shadow application instance in response to a failure of the primary application instance.
- The computer readable program may cause the standby computing device to update a state of the shadow application instance in response to receiving a log switch event message from the primary computing device indicating a log switch event. The computer readable program may cause the standby computing device to store event data received subsequent to receiving the log switch event message to a third log data structure associated with the shadow application instance. The computer readable program may further cause the standby computing device to relocate the primary application instance to the standby computing device by detecting a failure of the primary application instance on the primary computing device, and replaying events in the third log data structure in response to detecting the failure of the primary application instance.
- The primary computing device and standby computing device may have a same virtual network address. Input and output sockets of the primary application instance may be enabled and only input sockets of the shadow application instance may be enabled prior to relocating the primary application instance.
- The computer readable program may further cause the standby computing device to detect a failure of the primary application instance on the primary computing device and replay events in the third log data structure in response to detecting the failure of the primary application instance to thereby bring the shadow application instance to a consistent state with the primary application instance just prior to the detected failure of the primary application instance. The computer readable program may also enable output sockets of the shadow application instance in response to detecting the failure of the primary application instance.
- The computer readable program may further cause the standby computing device to synchronize a state of the shadow application instance with a state of the primary application instance prior to receiving event data from the primary computing device. The computer readable program may cause the standby computing device to synchronize a state of the shadow application instance with a state of the primary application instance by receiving application data for the primary application instance from the topologically remotely located primary computing device and receiving an application checkpoint comprising checkpoint metadata that represents a same point in time as the copy of the application data. The computer readable program may initialize the shadow application instance on the standby computing device using the application data and checkpoint metadata.
- The primary computing device may automatically write the event data for events occurring in the primary application instance in the first log data structure and automatically copy the event data to the second log data structure as the event data is written to the first log data structure. The event data may be automatically received via a peer-to-peer remote copy operation executed by the primary computing device.
- The standby computing device may be geographically remotely located from the primary computing device. The standby computing device may be outside a cluster or storage area network of the primary computing device. In a further illustrative embodiment, a standby computing system is provided that comprises a processor and a memory coupled to the processor. The memory may contain instructions which, when executed by the processor, cause the processor to perform various ones and combinations of the operations outlined above with regard to the computer program product illustrative embodiment.
- In yet another illustrative embodiment, a method, in a standby computing device, for providing a shadow application instance for relocation of a primary application instance is provided. The method may comprise various ones and combinations of the operations outlined above with regard to the computer program product illustrative embodiment.
- These and other features and advantages of the illustrative embodiments will be described in, or will become apparent to those of ordinary skill in the art in view of, the following detailed description.
- The novel features believed characteristic of the invention are set forth in the appended claims. The invention itself, however, as well as a preferred mode of use, further objectives and advantages thereof, will best be understood by reference to the following detailed description of an illustrative embodiment when read in conjunction with the accompanying drawings, wherein:
-
FIG. 1 is an exemplary block diagram of a distributed data processing system in which exemplary aspects of the illustrative embodiments may be implemented; -
FIG. 2 is an exemplary block diagram of a server computing device in which exemplary aspects of the illustrative embodiments may be implemented; -
FIG. 3 is an exemplary block diagram illustrating the peer-to-peer remote copy operation in accordance with one illustrative embodiment; -
FIG. 4 is an exemplary block diagram illustrating an operation for maintaining continuous consistent application states between two related application instances that are topologically remotely located in accordance with one illustrative embodiment; -
FIG. 5 is an exemplary block diagram of the primary operational components of a primary fault tolerance engine in accordance with one illustrative embodiment; -
FIG. 6 is an exemplary block diagram of the primary operation components of a remote fault tolerance engine in accordance with one illustrative embodiment; -
FIG. 7 is a flowchart outlining an exemplary operation for maintaining a continuous consistent state between two related application instances that are topologically remotely located in accordance with an illustrative embodiment; and -
FIG. 8 is a flowchart outlining an exemplary operation for performing a failover operation from a primary application instance to a shadow application instance in accordance with one illustrative embodiment. - The illustrative embodiments set forth herein provide mechanisms for maintaining a shadow application instance that is running in an active standby computing device that is topologically, and often times geographically, remotely located, i.e. not within the same storage area network or cluster, from a primary computing device running a primary application instance of the same application. As such, the mechanisms of the illustrative embodiments are preferably implemented in a distributed data processing environment.
- In the following description, the mechanisms of the illustrative embodiments will be described in terms of a distributed data processing environment in which there is a plurality of data processing systems provided that may communicate with one another via one or more networks and communication links.
FIGS. 1 and 2 provide examples of data processing environments in which aspects of the illustrative embodiments may be implemented. The depicted data processing environments are only exemplary and are not intended to state or imply any limitation as to the types or configurations of data processing environments in which the exemplary aspects of the illustrative embodiments may be implemented. Many modifications may be made to the data processing environments depicted in FIGS. 1 and 2 without departing from the spirit and scope of the present invention. - With reference now to the figures,
FIG. 1 depicts a pictorial representation of a network of data processing systems 100 in which the present invention may be implemented. Network data processing system 100 contains a local area network (LAN) 102 and a large area data network 130, which are the media used to provide communication links between various devices and computers connected together within network data processing system 100. LAN 102 and large area data network 130 may include connections, such as wired communication links, wireless communication links, fiber optic cables, and the like. - In the depicted example, server computing devices 102-105 are connected to
LAN 102. The server computing devices 102-105 may comprise a storage area network (SAN) or a server cluster 120, for example. SANs and server clusters are generally well known in the art and thus, a more detailed explanation of SAN/cluster 120 is not provided herein. - In addition to server computing devices 102-105,
client 112 is connected to LAN 102. Clients 108 and 110 are connected to the large area data network 130. These clients 108-112 may utilize the LAN 102 and/or the large area data network 130 to run applications and interface with running applications on the server computing devices 102-105 and obtain data objects from these server computing devices 102-105. Network data processing system 100 may include additional servers, clients, and other devices not shown. - The large
area data network 130 is coupled to the LAN 102. In the depicted example, the large area data network 130 may be the Internet, representing a worldwide collection of networks and gateways that use the Transmission Control Protocol/Internet Protocol (TCP/IP) suite of protocols to communicate with one another. At the heart of the Internet is a backbone of high-speed data communication lines between major nodes or host computers, consisting of thousands of commercial, government, educational and other computer systems that route data and messages. - Of course, large
area data network 130 may also be implemented as a number of different types of networks, such as, for example, an intranet, another local area network (LAN), a wide area network (WAN), or the like. FIG. 1 is only intended as an example, and is not intended to state or imply any architectural limitations for the illustrative embodiments described herein. - It should be noted that the Internet is typically used by servers in a cluster to communicate with one another using TCP/IP for messaging traffic. Storage controllers participating in mirroring, such as Metro Mirror™ or Global Mirror™ as discussed hereafter, typically communicate over a separate storage network using FICON channel commands, SCSI commands, or TCP/IP.
-
Server computing device 140 is coupled to large area data network 130 and has an associated storage system 150. Storage system 150 is shown as being directly coupled to the server computing device 140 but, alternatively, may be indirectly accessed by the server computing device 140 via the large area data network 130 or another network (not shown). Server computing device 140 is topologically remotely located from the SAN/cluster 120. That is, server computing device 140 is not part of the SAN/cluster 120. Moreover, server computing device 140 may be geographically remotely located from the SAN/cluster 120. - In one illustrative embodiment, the
server computing device 140 operates as an active standby server computing device for one or more of the server computing devices 102-105 in SAN/cluster 120. As such, the server computing device 140 has the same virtual network address on the large area data network 130 as the server computing devices 102-105 in SAN/cluster 120. Such virtualization of network addresses, e.g., Internet Protocol (IP) addresses, is generally known in the art and thus, a detailed explanation is not provided herein. Suffice it to say that through virtualization of the network addresses of the server computing devices 102-105 and 140, network traffic directed to or from these server computing devices 102-105 and 140 may make use of the same virtual network address with mechanisms provided for redirecting such traffic to the appropriate server computing device 102-105 and 140. - The illustrative embodiments described hereafter provide mechanisms for fault recovery of running application instances on one or more of the server computing devices 102-105 of the SAN/
cluster 120 by utilizing shadow application instances on the topologically remotely located server computing device 140. It should be appreciated that while the illustrative embodiments will be described in terms of fault recovery of running application instances on a SAN/cluster 120, the illustrative embodiments and the present invention are not limited to such. Rather, instead of the SAN/cluster 120, a single server computing device, or even client computing device, may be the source of a primary running application instance whose state is made consistent with a corresponding shadow application instance on the topologically remotely located computing device (either server or client computing device) in order to provide fault tolerance, without departing from the spirit and scope of the present invention. - Referring now to
FIG. 2, a block diagram of a data processing system that may be implemented as a server computing device, such as one or more of server computing devices 102-105 or server computing device 140 in FIG. 1, is depicted in accordance with a preferred embodiment of the present invention. Data processing system 200 may be a symmetric multiprocessor (SMP) system including a plurality of processors connected to system bus 206. Alternatively, a single processor system may be employed. Also connected to system bus 206 is memory controller/cache 208, which provides an interface to local memory 209. I/O Bus Bridge 210 is connected to system bus 206 and provides an interface to I/O bus 212. Memory controller/cache 208 and I/O Bus Bridge 210 may be integrated as depicted. - Peripheral component interconnect (PCI)
bus bridge 214 connected to I/O bus 212 provides an interface to PCI local bus 216. A number of modems may be connected to PCI local bus 216. Typical PCI bus implementations will support four PCI expansion slots or add-in connectors. Communications links to clients 108-112 in FIG. 1 and/or other network coupled devices may be provided through modem 218 and/or network adapter 220 connected to PCI local bus 216 through add-in connectors. - Additional
PCI bus bridges connected to I/O bus 212 may provide interfaces for additional PCI local buses, from which additional modems or network adapters may be supported. In this manner, data processing system 200 allows connections to multiple network computers. A memory-mapped graphics adapter 230 and hard disk 232 may also be connected to I/O bus 212 as depicted, either directly or indirectly. - Those of ordinary skill in the art will appreciate that the hardware depicted in
FIG. 2 may vary. For example, other peripheral devices, such as optical disk drives and the like, also may be used in addition to or in place of the hardware depicted. The depicted example is not meant to imply architectural limitations with respect to the present invention. - The data processing system depicted in
FIG. 2 may be, for example, an IBM eServer pSeries system, a product of International Business Machines Corporation in Armonk, N.Y., running the Advanced Interactive Executive (AIX) operating system or LINUX operating system. - Referring again to
FIG. 1, with the mechanisms of the illustrative embodiments, it is desirable to provide high availability and disaster recovery for application instances running on one or more of the server computing devices 102-105 of the SAN/cluster 120. In particular, it is desirable to have a topologically and/or geographically remotely located server computing device 140 which may operate as an active standby server computing device that runs a shadow application instance of the same application that is being run in one or more of the server computing devices 102-105. In this way, the topologically and/or geographically remotely located server computing device 140 may operate as a “hot swappable” server computing device for one or more of the server computing devices 102-105 even though it is remotely located from the SAN/cluster 120. Such “hot swapping” may be performed with no data loss in the process. - As discussed above, known mechanisms, such as VMotion™ and MetaCluster™, only permit relocation and failover of running applications within a local topology, i.e. within SAN/
cluster 120. With these known mechanisms, the computing devices to which running applications may be relocated or failed-over must have access to the same shared storage system, thereby limiting relocation to a local topology and geographical area. The known mechanisms do not permit relocation/failover of running applications to topologically and/or geographically remotely located computing devices. - With the mechanisms of the illustrative embodiments, a primary computing device, such as
server 102, runs one instance of an application (i.e. the primary application instance) at a production site and an active standby computing device, such as server 140, runs a second instance of the application (i.e. the shadow application instance) at a recovery site which may be topologically, and/or geographically, remotely located from the production site. The primary computing device 102 and active standby computing device 140 have the same virtual network address such that both computing devices may receive the same inputs from a network via network address virtualization, as is generally known in the art. - The two instances of the application are brought into a consistent state by running an initial application “checkpoint” on the
primary computing device 102 followed by an application “restart” on the active standby computing device 140. The generation of the “checkpoint” of the primary application instance on the primary computing device 102 and the “restarting” of the shadow application instance on the active standby computing device 140 may make use of the mechanism described in commonly assigned and co-pending U.S. patent application Ser. No. 11/340,813, filed on Jan. 25, 2006, which is hereby incorporated by reference. For example, the generation of a checkpoint may involve copying application data for the primary application instance on the primary computing device 102 to a storage system of the topologically remotely located computing device 140. The copying of application data may be performed using mirroring technology, such as a peer-to-peer remote copy operation, for example.
- The stateful checkpoint metadata may be copied to the same or different storage system associated with the topologically remotely located
computing device 140 in a similar manner as the application data. For example, a peer-to-peer remote copy operation may be performed on the checkpoint metadata to copy the checkpoint metadata to the topologically remotely located storage system. - In one illustrative embodiment, the MetaCluster™ product may be used to generate stateful checkpoint metadata for the primary application instance as if the application were being relocated within a local cluster of server computing devices. In such an illustrative embodiment, the stateful checkpoint metadata and application data may be relocated to a topologically remotely located computing device using a peer-to-peer remote copy operation, such as is provided by either the Metro Mirror™ or Global Mirror™ products available from International Business Machines, Inc. of Armonk, N.Y.
- Information regarding the MetaCluster™ product may be found, for example, in the articles “Meiosys Releases MetaCluster UC Version 3.0” and “Meiosys Relocates Multi-Tier Applications Without Interruption of Service,” available from the PR Newswire website (www.prnewswire.com). Additional information regarding MetaCluster™ and the ability to replicate application states within a cluster may be found in U.S. Patent Application Publication No. 2005/0251785. Information regarding an exemplary peer-to-peer remote copy may be obtained, for example, in the Redbooks paper entitled “IBM TotalStorage Enterprise Storage Server PPRC Extended Distance,” authored by Castets et al., and is available at the official website for International Business Machines, Inc. (www.ibm.com). These documents are hereby incorporated herein by reference.
- After generating an initial checkpoint of the primary application instance running on the
primary computing device 102, and copying this checkpoint's stateful checkpoint metadata and application data to the topologically remotely located computing device 140, the shadow application instance is “restarted” using the application data and stateful checkpoint metadata. This in effect causes the shadow application instance to have the same state and application data as the primary application instance, and thereby synchronizes the shadow application instance with the primary application instance.
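The checkpoint/restart synchronization described above can be sketched in a few lines of Python. This is a minimal illustration of the idea only, not the mechanism of the incorporated patent application; the class and field names are invented for the example.

```python
import copy

class PrimaryInstance:
    """Toy stand-in for a primary application instance (illustrative only)."""
    def __init__(self):
        self.data = {"orders": [1, 2, 3]}   # application data
        self.state = {"last_event_id": 7}   # runtime state

    def checkpoint(self):
        # Capture application data and stateful checkpoint metadata at
        # substantially the same point in time, as described above.
        return copy.deepcopy(self.data), copy.deepcopy(self.state)

class ShadowInstance:
    """Shadow instance 'restarted' from the replicated checkpoint."""
    def __init__(self):
        self.data = None
        self.state = None

    def restart(self, app_data, checkpoint_meta):
        # Restarting from the copied data and metadata brings the shadow
        # to the same state as the primary, synchronizing the two.
        self.data = copy.deepcopy(app_data)
        self.state = copy.deepcopy(checkpoint_meta)

primary = PrimaryInstance()
app_data, meta = primary.checkpoint()   # checkpoint at the production site
shadow = ShadowInstance()
shadow.restart(app_data, meta)          # restart at the recovery site
assert shadow.data == primary.data and shadow.state == primary.state
```

In practice the copied data and metadata would travel to the recovery site via the mirroring technology described above; here the transfer is simply an in-memory deep copy.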
- In one illustrative embodiment, continuous consistency between the application instances may be provided by automatically performing a peer-to-peer remote copy operation on the log event data, using Metro Mirror™ or Global Mirror™, for example, as the log event data is written to the log storage in the production site or upon the closing of a log file. That is, log event data may be written to the log file in the log storage of the production site and then automatically copied to a storage system associated with the recovery site using a peer-to-peer remote copy operation, for example. Alternatively, the log event data may be written to the log file in the log storage of the production site and then copied to the log storage at the recovery site when the log file is closed.
- In either embodiment, the log will eventually be closed, such as when the log becomes full or at some predetermined time interval, at which time the logging of events is switched from a first log file to a secondary log file. The closing of the log may be communicated to the active standby computing device in the recovery site, at which time, events logged in the first log file may be replayed by the shadow application instance while event logging is continued using the secondary log file. The event logging using the secondary log file is likewise automatically replicated in a storage system associated with the recovery site.
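The dual-log scheme just described, in which logging switches from a first log file to a secondary log file and the closed log is handed off for replay at the recovery site, can be sketched as follows. The capacity-based trigger and the callback are illustrative assumptions; an implementation could equally switch at a predetermined time interval.

```python
class EventLogger:
    """Sketch of dual-log event logging with switch-on-full (illustrative)."""
    def __init__(self, capacity, on_switch):
        self.capacity = capacity
        self.on_switch = on_switch   # e.g. notify recovery site to replay
        self.logs = [[], []]         # first and secondary log "files"
        self.active = 0              # index of the log receiving events

    def record(self, event):
        self.logs[self.active].append(event)
        if len(self.logs[self.active]) >= self.capacity:
            # Close the full log, continue logging in the other log file,
            # and hand the closed log off for replay by the shadow instance.
            closed = self.logs[self.active]
            self.active = 1 - self.active
            self.logs[self.active] = []
            self.on_switch(closed)

replayed = []                                  # events replayed by the shadow
logger = EventLogger(capacity=3, on_switch=replayed.extend)
for e in range(7):
    logger.record(e)
assert replayed == [0, 1, 2, 3, 4, 5]          # two switches have occurred
assert logger.logs[logger.active] == [6]       # event 6 awaits the next switch
```

Note that replay of a closed log proceeds while new events continue to accumulate in the other log, so logging is never interrupted.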
- The shadow application instance preferably has all of its network outputs, i.e. shadow sockets, disabled in order to avoid conflicts on the network with the primary application instance, such as message duplication. Such disabling of sockets may comprise, for example, creating a socket and then closing it. However, on an active standby computing device at the recovery site, only the socket output is disabled (closed) while the socket input is enabled. This allows for all the recorded logs to be replayed by the active standby computing device at the recovery site, but without sending any messages out to the network. The production site computing device receives input messages (events) from the network clients while the active standby receives the recorded logs of events from the production site computing device, but only the production site computing device has the socket output enabled.
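The input-enabled/output-disabled behavior of a shadow socket described above might be modeled as in the following sketch. A real implementation would close the underlying operating-system socket rather than filter sends; the class here only illustrates the asymmetry.

```python
class ShadowSocket:
    """Sketch of a 'shadow socket': input stays enabled so the shadow can
    consume replicated log events, but output is disabled to avoid
    duplicate messages on the network (names are illustrative)."""
    def __init__(self):
        self.output_enabled = False  # output closed on the active standby
        self.received = []
        self.sent = []

    def receive(self, msg):
        self.received.append(msg)    # input is always allowed

    def send(self, msg):
        if not self.output_enabled:
            return False             # dropped: no conflict with the primary
        self.sent.append(msg)
        return True

    def enable_output(self):
        # Called only when the shadow takes over for a failed primary.
        self.output_enabled = True

sock = ShadowSocket()
sock.receive("logged event")                       # replayed input is consumed
assert sock.send("reply") is False and sock.sent == []
sock.enable_output()                               # failover: shadow takes over
assert sock.send("reply") is True and sock.sent == ["reply"]
```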
- The shadow application instance's network outputs may be activated in the event that the primary application instance at the production site fails and the shadow application instance must take over for the failed primary application instance. When a failure of the primary application instance occurs at the production site, the active
standby computing device 140 detects the loss of the primary application instance. The detection of the loss of the primary application instance may be based on, for example, a failure to receive a heartbeat signal from the production site computing device. Other mechanisms for detecting the primary application instance failure may be used without departing from the spirit and scope of the present invention. - At this point in time, the shadow application instance is at a state corresponding to the last replay of logged events, which will typically occur when the logs are switched, e.g., from a first log file to a second log file. At this point, the second log file, which is continuously and automatically replicated by use of the peer-to-peer copy operation, contains all of the events that have occurred in the primary application instance since the log switch. Thus, the shadow application instance need only replay the events in the second log in order to bring the shadow application instance to a state corresponding to the state of the primary application instance just prior to the failure of the primary application instance. The network outputs, i.e. shadow sockets, may then be enabled such that the shadow application instance may now generate outputs that are sent across the network to client devices. Since the active
standby computing device 140 has the same virtual network address as the primary computing device 102, inputs from client devices may be received by the active standby computing device 140. In this way, the shadow application instance may then take over the functions of the primary application instance without data loss.
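Putting the pieces together, the takeover sequence just described — detect a missed heartbeat, replay the events logged since the last log switch, then enable the shadow's network outputs — might look like the following sketch. The heartbeat mechanism, timeout value, and class names are illustrative assumptions, not the only detection mechanism contemplated above.

```python
class Shadow:
    """Minimal shadow instance: applies replayed events, outputs off."""
    def __init__(self):
        self.applied = []
        self.outputs_enabled = False

    def apply(self, event):
        self.applied.append(event)

class FailoverMonitor:
    """Sketch of failover at the recovery site (illustrative)."""
    def __init__(self, shadow, timeout):
        self.shadow = shadow
        self.timeout = timeout
        self.last_heartbeat = 0.0

    def heartbeat(self, now):
        self.last_heartbeat = now    # primary is alive

    def check(self, now, current_log):
        if now - self.last_heartbeat > self.timeout:
            # Primary lost: replay events logged since the last log switch,
            # then enable outputs so the shadow takes over without data loss.
            for event in current_log:
                self.shadow.apply(event)
            self.shadow.outputs_enabled = True
            return True
        return False

shadow = Shadow()
mon = FailoverMonitor(shadow, timeout=2.0)
mon.heartbeat(now=1.0)
assert mon.check(now=2.5, current_log=["e1", "e2"]) is False  # still alive
assert mon.check(now=5.0, current_log=["e1", "e2"]) is True   # failover
assert shadow.applied == ["e1", "e2"] and shadow.outputs_enabled
```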
standby computing device 140 so as to keep the state of the shadow application instance running on the activestandby computing device 140 consistent with the current state of the primary application instance running on theprimary computing device 102. This automatic and continuous transfer is facilitated by the use of a remote copy operation, such as a peer-to-peer remote copy operation. -
FIG. 3 is an exemplary block diagram illustrating the peer-to-peer remote copy operation in accordance with one illustrative embodiment. In the depicted example, the Metro Mirror™ product is used to perform the peer-to-peer remote copy operation, although the present invention is not limited to using Metro Mirror™ or Global Mirror™. Rather, any mechanism that permits the remote copying of data and metadata to a topologically remotely located storage system may be used without departing from the spirit and scope of the present invention. - Using Global Mirror™ as representative of one illustrative embodiment for performing remote copying of data and metadata, Global Mirror™ allows the shadowing of application system data from one site (referred to as the production site) to a second site (referred to as the recovery site). The logical volumes that hold the data at the production site are referred to as primary volumes and the corresponding logical volumes that hold the mirrored data at the recovery site are referred to as secondary volumes. In one illustrative embodiment, the connection between the primary and the secondary volumes may be provided using fiber channel protocol (FCP) links.
-
FIG. 3 illustrates the sequence of a write operation when operating Global Mirror™ in synchronous mode, i.e. peer-to-peer remote copy-synchronous (PPRC-SYNC). As shown in FIG. 3, in this synchronous type of operation, the updates done to the production site primary volumes 320 are synchronously shadowed onto the secondary volumes 330 at the recovery site. Because this is a synchronous solution, write updates are ensured on both copies (primary and secondary) before the write is considered to be completed for the application running on the computing device 310. - Because in PPRC-SYNC operation the application does not get the “write complete” condition until the update is synchronously done in both the primary and the
secondary volumes 320 and 330, from the application perspective, the data at the secondary volumes 330 is real time data that is always consistent with the data at the primary volumes 320. - One implication of this characteristic is that, in normal PPRC-SYNC operation, dependent writes are applied on the
secondary volumes 330 in the same sequence as they are applied in the primary volumes 320. This is very important from an application consistency perspective at the time of the recovery. PPRC-SYNC can provide continuous data consistency at the recovery site without needing to periodically interrupt the application to build consistency checkpoints. From the application perspective this is a non-disruptive way of always having valid data at the recovery location. - While a synchronous PPRC operation is illustrated in
FIG. 3, it should be appreciated that the mechanisms of the illustrative embodiments may be equally applicable to both synchronous and asynchronous remote copy operations. In an asynchronous remote copy operation, the “write complete” may be returned from the primary volumes 320 prior to the data being committed in the secondary volumes 330. Essentially, with regard to the asynchronous remote copy operations of the illustrative embodiments herein, the logs in the storage system of the topologically remotely located computing device need to be at a data-consistent state with regard to the logs at the primary computing device prior to any replay of log events on the topologically remotely located computing device. - Thus, with the peer-to-peer remote copy operation of the illustrative embodiments, as writes are performed to the logs present on the storage system associated with the
primary computing device 102, corresponding writes are also performed to a corresponding log in a storage system associated with the topologically remotely located computing device 140. This may be done in a synchronous or asynchronous manner. In this way, the logs in both the production site and the recovery site are maintained consistent. - As mentioned above, another alternative to performing such a peer-to-peer remote copy operation of log writes is to use a shared storage system that may be attached to the large
area data network 130, for example. In such an embodiment, both the primary computing device 102 and the topologically remotely located computing device 140 must have access to the shared storage system and the logs maintained on the shared storage system. The primary computing device 102 may write to the logs of the shared storage system, and the topologically remotely located computing device 140 may read from these logs so as to replay log events to bring the shadow application instance into a consistent state with the primary application instance. Such reading and replaying of events may occur when the primary computing device 102 signals a log switch event to the topologically remotely located computing device 140, for example. - Thus, with the illustrative embodiments, an initial checkpoint is generated and used to synchronize the states of a primary application instance and a remotely located shadow application instance. Thereafter, logs are automatically and continuously maintained consistent between the primary application instance location and the remotely located shadow application instance.
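- The log-mirroring behavior described above may be sketched as follows. This is an illustrative sketch only, assuming the synchronous mode in which a write completes only after both copies are updated; the class and method names are assumptions for illustration and are not part of the embodiments themselves.

```python
# Illustrative sketch of synchronous peer-to-peer remote copying of log
# writes: a write is applied to the primary log and mirrored to the
# remote log before "write complete" is returned, so the recovery-site
# log is always consistent with the production-site log.

class MirroredLog:
    """A log whose writes are applied to a primary copy and a remote mirror."""

    def __init__(self):
        self.primary = []   # log at the production site
        self.remote = []    # mirrored log at the recovery site

    def write(self, event):
        # Synchronous mode: both copies record the event before the
        # write is acknowledged to the caller.
        self.primary.append(event)
        self.remote.append(event)   # peer-to-peer remote copy
        return "write complete"

log = MirroredLog()
log.write("txn-1 begin")
log.write("txn-1 commit")
assert log.remote == log.primary  # recovery-site log mirrors production
```

In the asynchronous variant discussed above, the acknowledgment would be returned before the second `append`, at the cost of requiring the remote log to reach a data-consistent state before any replay.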
- Periodically, logs are closed and logging of events is switched to secondary logs such that the events recorded in the initial logs may be used to update the state of the shadow application instance. This replaying of events from the closed log may be performed without interrupting the continuous logging of events. The switching between the initial and secondary logs may be repeated in the opposite direction, and so forth, as many times as necessary so as to facilitate the continued logging of events and updating of application states.
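- The log-switching scheme just described can be sketched as follows: logging alternates between two logs, and when a switch occurs, the closed log's events are replayed into the shadow state while new events continue to accumulate in the other log. All names and structures here are illustrative assumptions, not part of the embodiments.

```python
# Minimal sketch of switching between an initial log ("B") and a
# secondary log ("C"): the closed log is replayed into the shadow state
# without interrupting the continuous logging of new events.

logs = {"B": [], "C": []}       # initial and secondary logs
active = "B"
shadow_state = []               # state of the shadow application instance

def log_event(event):
    logs[active].append(event)

def switch_logs():
    global active
    closed, active = active, ("C" if active == "B" else "B")
    # Replay the closed log; new events already flow to the other log.
    shadow_state.extend(logs[closed])
    logs[closed] = []

log_event("e1"); log_event("e2")
switch_logs()                    # shadow now reflects e1 and e2
log_event("e3")                  # logging continues uninterrupted in C
assert shadow_state == ["e1", "e2"]
assert logs["C"] == ["e3"]
```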
- When a failure of a primary application instance occurs, the state of the shadow application instance may be automatically updated by replaying events from the log that is currently being used to log events. This brings the state of the shadow application instance to a current state of the failed primary application instance. The network outputs of the shadow application instance may then be enabled and the shadow application instance thereby takes over operation for the primary application instance. All of this is done with the shadow application instance running on a topologically and/or geographically remotely located computing device from the primary computing device upon which the primary application instance is running.
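- The failover update described above may be sketched as follows: the events from the currently active log are replayed into the shadow state, and the shadow's network outputs are then enabled. The function and variable names are assumptions for illustration only.

```python
# Illustrative sketch of the failover update: replay the replicated copy
# of the active log to bring the shadow instance to the failed primary's
# current state, then enable the outputs disabled during standby.

def failover(shadow_state, active_log_copy, shadow_outputs):
    # Replay logged events to reach the primary's last known state.
    shadow_state.extend(active_log_copy)
    # Enable the network outputs that were disabled while on standby.
    shadow_outputs["enabled"] = True
    return shadow_state

state = ["e1", "e2"]                     # shadow state as of the last log switch
outputs = {"enabled": False}
failover(state, ["e3", "e4"], outputs)   # replicated copy of the active log
assert state == ["e1", "e2", "e3", "e4"]
assert outputs["enabled"] is True
```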
-
FIG. 4 is an exemplary block diagram illustrating an operation for maintaining continuous consistent application states between two related application instances that are topologically remotely located in accordance with one illustrative embodiment. As shown in FIG. 4, a primary computing device 410 is provided at a production site 450 and an active standby computing device 420 is provided at a recovery site 460. The recovery site 460 is topologically and/or geographically remotely located from the production site 450. The primary computing device 410 has associated storage devices A, B and C that are coupled to the primary computing device 410. Data storage device A, in the depicted example, stores application data for the primary application instance 412 running on the primary computing device 410. Data storage device B stores stateful checkpoint metadata as well as one or more logs of events that occur during the running of the primary application instance 412. Data storage device C also stores one or more logs of events that occur during the running of the primary application instance 412. The logs in data storage device C are secondary logs to which logging of events is switched when an initial log in storage device B is closed. Such switching may occur back and forth between logs in storage devices B and C, as necessary, while the primary application instance is running. - The active
standby computing device 420 is coupled to storage devices D, E and F. Storage devices D, E and F are mirrors of storage devices A, B and C. Thus, storage device D stores application data for the primary application instance 412. Storage device E stores checkpoint metadata for the primary application instance 412 as well as one or more logs of events occurring in the application instance 412. Storage device F stores the secondary logs of events occurring in the application instance 412. - The data stored in data storage devices A, B, and C at the
production site 450 may be transferred to the data storage devices D, E and F at the remotely located recovery site 460 using a peer-to-peer remote copy operation, as described above. In particular, PPRC is used to copy the initial application data for the primary application instance 412 from data storage device A to data storage device D. PPRC is also used to copy initial stateful checkpoint data from data storage device B to data storage device E. Thereafter, PPRC is used to provide automatic and continuous copying of data written to the logs in data storage devices B and C over to data storage devices E and F. - The
primary computing device 410 includes a primary fault tolerance engine 414 that is responsible for performing operations in accordance with the illustrative embodiments with regard to the primary computing device 410. The active standby computing device 420 includes a remote fault tolerance engine 424 that is responsible for performing operations in accordance with the illustrative embodiments with regard to the active standby computing device 420. - In operation, upon initialization of a
primary application instance 412 on the primary computing device 410, for example, or at any other suitable time point at which a stateful checkpoint may be generated, the primary fault tolerance engine 414 generates a checkpoint of the state of the primary application instance 412. This checkpoint comprises a copy of the application data at the checkpoint time and stateful checkpoint metadata at the checkpoint time. The checkpoint application data copy is generated and stored in data storage device A, while the stateful checkpoint metadata is generated and stored in data storage device B. The primary fault tolerance engine 414 may generate the checkpoint using, for example, the MetaCluster™ product as discussed previously. - The primary
fault tolerance engine 414 initiates a transfer of a copy of the checkpoint application data and metadata to the recovery site 460 using a peer-to-peer remote copy (PPRC) operation. In response, a copy of the application data and checkpoint metadata is sent from storage devices A and B to storage devices D and E associated with the recovery site 460. The primary fault tolerance engine 414 may send a message to the active standby computing device 420 to “restart” a shadow application instance 422 of the primary application instance 412. - The remote
fault tolerance engine 424 may initiate a “restart” operation on the shadow application instance 422 in response to the message from the primary fault tolerance engine 414. The restart operation makes use of the copy of the application data and stateful checkpoint metadata to restart the shadow application instance 422 at a state that corresponds to the initial state of the primary application instance 412 specified by the application data and stateful checkpoint metadata. As mentioned above, the remote fault tolerance engine 424 may use the MetaCluster™ product to perform this restart operation. - While the
shadow application instance 422 is being restarted using the application data and stateful checkpoint metadata in data storage devices D and E, the primary application instance 412 is made available to client devices for use. Thus, the primary computing device 410 may receive inputs from client devices via one or more networks and may generate outputs that are sent to the client devices via the one or more networks. - The inputs, processing of the inputs, and generation of outputs result in events occurring in the
primary application instance 412. The primary fault tolerance engine 414 records these events occurring in the primary application instance 412 in a log stored in data storage device B. The logging of these events continues until predetermined criteria are met, at which time the log in data storage device B is closed and logging of events is switched to a secondary log in data storage device C. This same logging is then performed with regard to data storage device C until the predetermined criteria are again met, at which point logging is switched back to a log maintained in data storage device B. This switching back and forth between logs may be performed repeatedly, as necessary, during the running of the primary application instance 412. - As events are written to the logs in data storage devices B and C, the same events are written to corresponding logs in data storage devices E and F using a PPRC operation. As described in
FIG. 3 above, this PPRC operation is performed immediately after writing the event data to the primary volume, e.g., the log in data storage device B or C. Thus, the logs in data storage devices E and F are kept consistent with the current state of the logs in data storage devices B and C in an automatic and continuous manner by use of the PPRC operation. - When predetermined criteria are met, such as the log becoming full, a predetermined amount of time expiring, or the like, the primary
fault tolerance engine 414 may close the log in data storage device B and stop logging events to it. The primary fault tolerance engine 414 switches the logging of events to a secondary log in data storage device C. The primary fault tolerance engine 414 may signal the switch of logs to the remote fault tolerance engine 424, which may then close its own copy of the log in data storage device E. The closing of the log in data storage device E may be followed by the remote fault tolerance engine 424 replaying the events recorded in the closed log so as to bring the state of the shadow application instance 422 up to date as of the time of the log switch. - During this update, events continue to be logged by the primary
fault tolerance engine 414 in the secondary log in data storage device C. These events are written to the secondary log and are also written to the corresponding log in data storage device F by way of the PPRC operation. Thus, no loss in data occurs even when updating the state of the shadow application instance 422. - As shown in
FIG. 4, it should be noted that the input and output sockets of the primary application instance 412 are enabled during the running of the primary application instance 412 on the primary computing device 410. With regard to the shadow application instance 422, the output sockets of the shadow application instance 422 are disabled such that the outputs generated by the shadow application instance 422 are not sent to the client devices. This ensures that there are no conflicts on the network with the primary application instance 412, such as message duplication. - At some point in time, the
primary application instance 412 may fail, and it will be necessary to transfer operations over to the shadow application instance 422 on the topologically and/or geographically remotely located active standby computing device 420. The following is an example illustrating such a failover from the primary application instance 412 to the shadow application instance 422. - It is assumed that while the
primary application instance 412 is running on the primary computing device 410, events occurring in the primary application instance 412 are logged in a first log on data storage device B of the production site 450. When predetermined criteria are met, such as the expiration of a predetermined interval, logging may be switched from the first log on data storage device B to a second log on data storage device C. If a shared storage system is utilized, events in the first log may be automatically applied to the shadow application instance 422 from the shared storage system in response to the log switch event. However, if a shared storage system is not utilized, as depicted in FIG. 4, the first log and second log are automatically replicated to the data storage devices E and F at the recovery site 460. The events in the copy of the first log on data storage device E may be applied to the shadow application instance 422 without having to terminate the PPRC operation on log events being written to the second log in data storage device C. This allows continuous replication of log events on data storage devices E and F. - At some time while log events are being written to the second log in data storage device C, the
primary application instance 412 fails. The loss of the primary application instance 412 at the production site 450 is detected by the remote fault tolerance engine 424, such as by way of a heartbeat signal not being detected, a predetermined time interval elapsing without a log event being replicated from the primary application instance, or the like. - At this time, the state of the
shadow application instance 422 is the state of the primary application instance 412 at the point in time when the log switch event occurred. As a result, the events that were logged in the second log in data storage device C, and replicated to data storage device F, need to be replayed in the shadow application instance 422 in order to bring the shadow application instance 422 up to a current state of the primary application instance 412. The remote fault tolerance engine 424 facilitates this replaying of events from the copy of the second log maintained in data storage device F. - After bringing the shadow application instance up to a current state by replaying the events in the copy of the second log, inputs from client devices may be redirected to the shadow application instance and the output shadow sockets associated with the
shadow application instance 422 may be enabled. Thereafter, the shadow application instance may take over operations for the primary application instance 412. Thus, a topologically and/or geographically remotely located active standby computing device 420 may be used to provide failover for a primary application instance with no data loss. - It should be appreciated that while
FIG. 4 shows separate storage devices A-F for storage of application data, checkpoint metadata, and logs, the illustrative embodiments are not limited to such. Rather, any number of storage devices may be utilized without departing from the spirit and scope of the present invention. For example, the production site 450 may comprise a single storage device upon which application data, checkpoint metadata and various logs may be stored. Similarly, the recovery site 460 may have one or more storage devices for storing copies of this data that is copied over from the production site 450 using the PPRC operations described above. -
FIG. 5 is an exemplary block diagram of the primary operational components of a primary fault tolerance engine in accordance with one illustrative embodiment. As shown in FIG. 5, the primary fault tolerance engine includes a controller 510, a peer-to-peer remote copy module 520, a storage system interface 540, an initial checkpoint generation module 530, a network interface 550, and an application log module 560. The elements 510-560 may be implemented in hardware, software, or any combination of hardware and software. In one illustrative embodiment, the elements 510-560 are implemented as software instructions executed by one or more processing devices. - The
controller 510 is responsible for the overall operation of the primary fault tolerance engine and orchestrates the operation of the other elements 520-560. The peer-to-peer remote copy module 520 is responsible for performing peer-to-peer remote copying of application data, initial checkpoint metadata, and log event data from the production site 450 to the recovery site 460. The storage system interface 540 provides an interface through which data may be written to or read from the storage device(s) associated with the production site 450. The initial checkpoint generation module 530 is responsible for generating initial copies of application data and stateful checkpoint metadata for transfer to the recovery site 460 for initializing a shadow application instance. - The
network interface 550 is responsible for providing an interface through which application data, stateful checkpoint metadata, and log data may be transmitted to the recovery site 460. Messages indicative of initialization of an application instance, log switches, failure of an application instance, and the like may also be transmitted from the primary fault tolerance engine to the remote fault tolerance engine via the network interface 550. - The
application log module 560 is responsible for logging events occurring in an application instance in one or more logs that are maintained in a storage system of the production site. The application log module 560 also performs operations for switching between logs. - In operation, the
controller 510 instructs the initial checkpoint generation module 530 to generate an initial checkpoint for an application instance and store that checkpoint data to the storage system of the production site via the storage system interface 540. The controller 510 then instructs the peer-to-peer remote copy module 520 to copy the checkpoint data to a topologically and/or geographically remotely located recovery site 460. - The
controller 510 then instructs the application log module 560 to begin logging events for the primary application instance and performing a peer-to-peer remote copy of such event data to the recovery site 460 using the peer-to-peer remote copy module 520. The logging of events may be done with regard to one or more logs stored in the storage system via the storage system interface 540. Peer-to-peer remote copy operations may be performed via the network interface 550. The application log module 560 may further send messages to the recovery site 460 indicating log switches via the network interface 550. -
FIG. 6 is an exemplary block diagram of the primary operational components of a remote fault tolerance engine in accordance with one illustrative embodiment. As shown in FIG. 6, the remote fault tolerance engine includes a controller 610, an application log replay module 620, a storage system interface 630, a shadow application failover module 640, and a network interface 650. The elements 610-650 may be implemented in hardware, software, or any combination of hardware and software. In one illustrative embodiment, the elements 610-650 are implemented as software instructions executed by one or more processing devices. - The
controller 610 is responsible for the overall operation of the remote fault tolerance engine and orchestrates the operation of the other elements 620-650. The application log replay module 620 is responsible for replaying log events from logs stored in a storage system associated with the recovery site 460. Such replaying of events may occur when a log switch event occurs or when a primary application instance fails, for example. - The
storage system interface 630 provides an interface through which the remote fault tolerance engine may access checkpoint data and logs stored in a storage system associated with the recovery site 460. The network interface 650 provides an interface through which messages and data may be received from the primary fault tolerance engine via one or more networks. - The shadow
application failover module 640 performs the necessary operations for failing-over the operations of a primary application instance to a shadow application instance using the logs maintained in the storage system associated with the recovery site 460. The shadow application failover module 640 may perform operations to cause the replay of events from a currently active log so as to bring the shadow application instance up to a current state, and then enable shadow sockets of the shadow application instance so that the shadow application instance may take over operations for the failed primary application instance. - In operation, the
controller 610 may receive messages from the primary fault tolerance engine of the production site 450 and may instruct various ones of the elements 620-650 to perform operations in response. For example, upon receiving initial checkpoint data for a shadow application instance at the recovery site 460, the controller 610 may initialize a shadow application instance. Thereafter, updates to logs stored in the storage system associated with the recovery site 460 may be received via the network interface 650 and stored in the storage system via the storage system interface 630. - When the
controller 610 receives a message from the primary fault tolerance engine indicating that a log switch event has occurred, the controller 610 may instruct the application log replay module 620 to perform a replay of events from the previous log so as to bring the shadow application instance up to date as of the time of the log switch event. When the controller 610 receives a message or otherwise detects that the primary application instance has failed, the controller 610 may instruct the shadow application failover module 640 to perform operations for failing-over the operations of the primary application instance to the shadow application instance, as described previously. -
FIGS. 7 and 8 are flowcharts outlining exemplary operations of an illustrative embodiment. It will be understood that each block of the flowchart illustrations, and combinations of blocks in the flowchart illustrations, can be implemented by computer program instructions. These computer program instructions may be provided to a processor or other programmable data processing apparatus to produce a machine, such that the instructions which execute on the processor or other programmable data processing apparatus create means for implementing the functions specified in the flowchart block or blocks. These computer program instructions may also be stored in a computer-readable memory or storage medium that can direct a processor or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory or storage medium produce an article of manufacture including instruction means which implement the functions specified in the flowchart block or blocks. - Accordingly, blocks of the flowchart illustrations support combinations of means for performing the specified functions, combinations of steps for performing the specified functions and program instruction means for performing the specified functions. It will also be understood that each block of the flowchart illustrations, and combinations of blocks in the flowchart illustrations, can be implemented by special purpose hardware-based computer systems which perform the specified functions or steps, or by combinations of special purpose hardware and computer instructions.
-
FIG. 7 is a flowchart outlining an exemplary operation for maintaining a continuous consistent state between two related application instances that are topologically remotely located in accordance with an illustrative embodiment. As shown in FIG. 7, the operation starts with the primary computing device initiating a primary application instance (step 710). A primary fault tolerance engine generates an initial checkpoint of the primary application instance (step 720) and transmits the initial checkpoint data via a peer-to-peer remote copy operation to a recovery site (step 730). - A remote fault tolerance engine receives the checkpoint data and initiates a shadow application instance using the checkpoint data (step 740). The primary fault tolerance engine initiates logging of events in a first log associated with the primary application instance (step 750). The events that are written to the log are automatically and continuously transmitted to the recovery site by the primary fault tolerance engine using a PPRC operation (step 760).
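- Steps 710-740 above may be sketched as a simple checkpoint/restart handshake: the primary engine generates an initial checkpoint (application data plus stateful metadata), the data is copied to the recovery site, and the shadow instance is initiated from it. The structures and names here are illustrative assumptions, not part of the embodiments.

```python
import copy

def generate_checkpoint(app_state):
    # Checkpoint = application data copy + stateful checkpoint metadata.
    return {"data": dict(app_state), "meta": {"checkpoint": "initial"}}

def remote_copy(checkpoint):
    # Stands in for the PPRC transfer of the checkpoint to the recovery site.
    return copy.deepcopy(checkpoint)

def initiate_shadow(checkpoint):
    # The shadow instance starts at exactly the checkpointed state.
    return dict(checkpoint["data"])

primary_state = {"balance": 100}
ckpt = generate_checkpoint(primary_state)   # step 720
copied = remote_copy(ckpt)                  # step 730
shadow = initiate_shadow(copied)            # step 740
assert shadow == primary_state
```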
- The primary fault tolerance engine determines if a switching of logs is to be performed (step 770). If so, the primary fault tolerance engine switches logging of events from the first log to a second log (or vice versa in subsequent switching events) and a switch log event message is sent to the remote fault tolerance engine. The remote fault tolerance engine receives the switch log event message and replays events recorded in the previous log, e.g., the first log (step 780).
- The primary fault tolerance engine determines whether a termination event has occurred (step 790). This termination event may be, for example, the discontinuing of the primary application instance. If a termination event has occurred, the operation is terminated. Otherwise, the operation returns to step 750 and continues to log events using the new event log.
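- The loop formed by steps 750-790 may be sketched as a small control loop: events are logged (and mirrored) until a switch criterion is met, the closed log is replayed, and the loop ends on a termination event. The switch criterion and all names below are illustrative assumptions.

```python
# Sketch of the FIG. 7 logging loop: alternate between two logs,
# replaying each closed log at every switch (here the assumed switch
# criterion is "every two events").

def run_logging_loop(events, switch_every=2):
    logs = {"first": [], "second": []}
    active = "first"
    replayed = []                          # events applied to the shadow
    for i, event in enumerate(events, start=1):
        logs[active].append(event)         # steps 750/760: log and mirror
        if i % switch_every == 0:          # step 770: switch criterion met
            closed, active = active, ("second" if active == "first" else "first")
            replayed.extend(logs[closed])  # step 780: replay the closed log
            logs[closed] = []
    return replayed, logs[active]

replayed, pending = run_logging_loop(["e1", "e2", "e3"])
assert replayed == ["e1", "e2"]            # applied at the log switch
assert pending == ["e3"]                   # awaiting the next switch
```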
-
FIG. 8 is a flowchart outlining an exemplary operation for performing a failover operation from a primary application instance to a shadow application instance in accordance with one illustrative embodiment. As shown in FIG. 8, the operation starts with the detection of a failure of a primary application instance by the remote fault tolerance engine (step 810). The remote fault tolerance engine replays the events logged in the currently active log using the shadow application instance (step 820). The shadow sockets of the shadow application are enabled (step 830) and inputs to the primary application instance are redirected to the shadow application instance (step 840). The operation then terminates. - Thus, the illustrative embodiments provide mechanisms for topologically and/or geographically remote computing systems to be used as active standby computing systems for failover operations. The illustrative embodiments provide mechanisms for the automatic and continuous replication of log events between the primary application instance and the shadow application instance such that the two application instances may be maintained in a consistent state. Upon the occurrence of a failure in the primary application instance, the shadow application instance at the topologically and/or geographically remotely located active standby computing device may be used to take over operations for the failed primary application instance with a simple update of the shadow application state using the consistent copy of logged events stored at the remote location. All of this is done with no loss in data.
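- The failure detection that begins the FIG. 8 operation (step 810) may be sketched as a heartbeat timeout check: the remote engine presumes the primary lost when no heartbeat or replicated log event arrives within a timeout window. The timeout value and function names are assumptions for illustration only.

```python
# Illustrative sketch of heartbeat-based failure detection at the
# recovery site: no heartbeat within the timeout interval means the
# primary application instance is presumed lost.

def primary_failed(last_heartbeat, now, timeout=5.0):
    # A missing heartbeat (or replicated log event) past the timeout
    # window triggers the failover operation.
    return (now - last_heartbeat) > timeout

assert primary_failed(last_heartbeat=100.0, now=110.0) is True
assert primary_failed(last_heartbeat=100.0, now=103.0) is False
```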
- The illustrative embodiments as described above may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment containing both hardware and software elements. In a preferred embodiment, the invention is implemented in software, which includes but is not limited to firmware, resident software, microcode, and the like.
- Furthermore, the illustrative embodiments may take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
- The medium may be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium. Examples of a computer-readable medium include a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk and an optical disk. Current examples of optical disks include compact disk-read only memory (CD-ROM), compact disk-read/write (CD-R/W) and DVD.
- As described previously above with regard to
FIG. 2, a data processing system suitable for storing and/or executing program code will include at least one processor coupled directly or indirectly to memory elements through a system bus. The memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution.
- Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modem and Ethernet cards are just a few of the currently available types of network adapters.
- The description of the illustrative embodiments has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art. The embodiments were chosen and described in order to best explain the principles of the illustrative embodiments of the invention, the practical application, and to enable others of ordinary skill in the art to understand the invention for various illustrative embodiments with various modifications as are suited to the particular use contemplated.
Claims (35)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/403,050 US7613749B2 (en) | 2006-04-12 | 2006-04-12 | System and method for application fault tolerance and recovery using topologically remotely located computing devices |
CNB2007100913664A CN100461122C (en) | 2006-04-12 | 2007-03-30 | System and method for application fault tolerance and recovery |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/403,050 US7613749B2 (en) | 2006-04-12 | 2006-04-12 | System and method for application fault tolerance and recovery using topologically remotely located computing devices |
Publications (2)
Publication Number | Publication Date |
---|---|
US20070244937A1 true US20070244937A1 (en) | 2007-10-18 |
US7613749B2 US7613749B2 (en) | 2009-11-03 |
Family
ID=38606086
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/403,050 Expired - Fee Related US7613749B2 (en) | 2006-04-12 | 2006-04-12 | System and method for application fault tolerance and recovery using topologically remotely located computing devices |
Country Status (2)
Country | Link |
---|---|
US (1) | US7613749B2 (en) |
CN (1) | CN100461122C (en) |
Cited By (67)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070276879A1 (en) * | 2006-05-26 | 2007-11-29 | Rothman Michael A | Sparse checkpoint and rollback |
US20080046552A1 (en) * | 2006-08-18 | 2008-02-21 | Microsoft Corporation | Service resiliency within on-premise products |
US20080059639A1 (en) * | 2006-08-31 | 2008-03-06 | Sap Ag | Systems and methods of migrating sessions between computer systems |
US20080183799A1 (en) * | 2006-06-06 | 2008-07-31 | Norman Bobroff | System and method for collaborative hosting of applications, virtual machines, and data objects |
US20090113233A1 (en) * | 2007-10-31 | 2009-04-30 | Electronic Data Systems Corporation | Testing Disaster Recovery Elements |
US20090133016A1 (en) * | 2007-11-15 | 2009-05-21 | Brown Aaron C | System and Method for Management of an IOV Adapter Through a Virtual Intermediary in an IOV Management Partition |
US20090133028A1 (en) * | 2007-11-15 | 2009-05-21 | Brown Aaron C | System and method for management of an iov adapter through a virtual intermediary in a hypervisor with functional management in an iov management partition |
US20090144731A1 (en) * | 2007-12-03 | 2009-06-04 | Brown Aaron C | System and method for distribution of resources for an i/o virtualized (iov) adapter and management of the adapter through an iov management partition |
US20090150538A1 (en) * | 2007-12-10 | 2009-06-11 | Sun Microsystems, Inc. | Method and system for monitoring virtual wires |
US20090150521A1 (en) * | 2007-12-10 | 2009-06-11 | Sun Microsystems, Inc. | Method and system for creating a virtual network path |
US20090150527A1 (en) * | 2007-12-10 | 2009-06-11 | Sun Microsystems, Inc. | Method and system for reconfiguring a virtual network path |
US20090150529A1 (en) * | 2007-12-10 | 2009-06-11 | Sun Microsystems, Inc. | Method and system for enforcing resource constraints for virtual machines across migration |
US20090150547A1 (en) * | 2007-12-10 | 2009-06-11 | Sun Microsystems, Inc. | Method and system for scaling applications on a blade chassis |
US20090150883A1 (en) * | 2007-12-10 | 2009-06-11 | Sun Microsystems, Inc. | Method and system for controlling network traffic in a blade chassis |
US20090219936A1 (en) * | 2008-02-29 | 2009-09-03 | Sun Microsystems, Inc. | Method and system for offloading network processing |
US20090222567A1 (en) * | 2008-02-29 | 2009-09-03 | Sun Microsystems, Inc. | Method and system for media-based data transfer |
US20090238189A1 (en) * | 2008-03-24 | 2009-09-24 | Sun Microsystems, Inc. | Method and system for classifying network traffic |
US20090271654A1 (en) * | 2008-04-23 | 2009-10-29 | Hitachi, Ltd. | Control method for information processing system, information processing system, and program |
US20090276773A1 (en) * | 2008-05-05 | 2009-11-05 | International Business Machines Corporation | Multi-Root I/O Virtualization Using Separate Management Facilities of Multiple Logical Partitions |
US20090320011A1 (en) * | 2008-06-20 | 2009-12-24 | Vmware, Inc. | Accelerating replayed program execution to support decoupled program analysis |
US20090328073A1 (en) * | 2008-06-30 | 2009-12-31 | Sun Microsystems, Inc. | Method and system for low-overhead data transfer |
US20090327392A1 (en) * | 2008-06-30 | 2009-12-31 | Sun Microsystems, Inc. | Method and system for creating a virtual router in a blade chassis to maintain connectivity |
US20100165874A1 (en) * | 2008-12-30 | 2010-07-01 | International Business Machines Corporation | Differentiating Blade Destination and Traffic Types in a Multi-Root PCIe Environment |
US20100191845A1 (en) * | 2009-01-29 | 2010-07-29 | Vmware, Inc. | Speculative virtual machine resource scheduling |
CN101807985A (en) * | 2010-03-03 | 2010-08-18 | 交通银行股份有限公司 | Datacenter centralization control switching method and system |
US20100250867A1 (en) * | 2009-03-30 | 2010-09-30 | The Boeing Company | Computer architectures using shared storage |
US20100269121A1 (en) * | 2009-04-17 | 2010-10-21 | Accenture Global Services Gmbh | Exchangeable application components |
US20110047413A1 (en) * | 2009-08-20 | 2011-02-24 | Mcgill Robert E | Methods and devices for detecting service failures and maintaining computing services using a resilient intelligent client computer |
EP2372978A1 (en) * | 2010-03-30 | 2011-10-05 | The Boeing Company | Computer architectures using shared storage |
US20120005240A1 (en) * | 2009-03-17 | 2012-01-05 | Nec Corporation | Event database, data management device, data management system, data management program, and data management method |
US20120030178A1 (en) * | 2007-03-12 | 2012-02-02 | Microsoft Corporation | Interfaces for high availability systems and log shipping |
CN102508764A (en) * | 2011-11-04 | 2012-06-20 | 哈尔滨工程大学 | Method for recording event log of node by fault tolerant mobile computing system |
US20120216007A1 (en) * | 2011-02-23 | 2012-08-23 | Red Hat Israel, Ltd. | Page protection ordering for lockless write tracking |
US20130007506A1 (en) * | 2011-07-01 | 2013-01-03 | Microsoft Corporation | Managing recovery virtual machines in clustered environment |
US8621275B1 (en) * | 2010-08-06 | 2013-12-31 | Open Invention Network, Llc | System and method for event-driven live migration of multi-process applications |
US8634415B2 (en) | 2011-02-16 | 2014-01-21 | Oracle International Corporation | Method and system for routing network traffic for a blade server |
US20140078894A1 (en) * | 2012-09-17 | 2014-03-20 | Electronics And Telecommunications Research Institute | Lane fault recovery apparatus and method |
EP2816467A4 (en) * | 2012-03-15 | 2015-03-11 | Huawei Tech Co Ltd | Method and device for checkpoint and restart of container state |
WO2015068097A1 (en) * | 2013-11-05 | 2015-05-14 | Telefonaktiebolaget L M Ericsson (Publ) | Method and apparatus for failure recovery in a machine-to-machine network |
US9069782B2 (en) | 2012-10-01 | 2015-06-30 | The Research Foundation For The State University Of New York | System and method for security and privacy aware virtual machine checkpointing |
US9098462B1 (en) | 2010-09-14 | 2015-08-04 | The Boeing Company | Communications via shared memory |
US20150261628A1 (en) * | 2014-03-12 | 2015-09-17 | Ainsworth Game Technology Limited | Devices and methodologies for implementing redundant backups in nvram reliant environments |
US20150281013A1 (en) * | 2014-03-25 | 2015-10-01 | Fujitsu Limited | Ascertainment method and device |
WO2016063114A1 (en) * | 2014-10-23 | 2016-04-28 | Telefonaktiebolaget L M Ericsson (Publ) | System and method for disaster recovery of cloud applications |
US9489327B2 (en) | 2013-11-05 | 2016-11-08 | Oracle International Corporation | System and method for supporting an efficient packet processing model in a network environment |
US20170031804A1 (en) * | 2015-07-31 | 2017-02-02 | Microsoft Technology Licensing, Llc | Enhanced service validation |
US9767271B2 (en) | 2010-07-15 | 2017-09-19 | The Research Foundation For The State University Of New York | System and method for validating program execution at run-time |
US9767284B2 (en) | 2012-09-14 | 2017-09-19 | The Research Foundation For The State University Of New York | Continuous run-time validation of program execution: a practical approach |
US9779105B1 (en) * | 2014-03-31 | 2017-10-03 | EMC IP Holding Company LLC | Transaction logging using file-system-specific log files |
US9858241B2 (en) | 2013-11-05 | 2018-01-02 | Oracle International Corporation | System and method for supporting optimized buffer utilization for packet processing in a networking device |
US20180095830A1 (en) * | 2016-10-03 | 2018-04-05 | International Business Machines Corporation | Replaying processing of a restarted application |
CN108334415A (en) * | 2017-01-20 | 2018-07-27 | 百度在线网络技术(北京)有限公司 | A kind of fault-tolerance processing method, device, terminal and storage medium |
US10122626B2 (en) | 2015-08-27 | 2018-11-06 | Nicira, Inc. | Self-managed overlay networks |
US10153918B2 (en) | 2015-08-27 | 2018-12-11 | Nicira, Inc. | Joining an application cluster |
US10216587B2 (en) | 2016-10-21 | 2019-02-26 | International Business Machines Corporation | Scalable fault tolerant support in a containerized environment |
US10341020B2 (en) * | 2016-03-17 | 2019-07-02 | Avago Technologies International Sales Pte. Limited | Flexible ethernet logical lane aggregation |
US10462011B2 (en) * | 2015-08-27 | 2019-10-29 | Nicira, Inc. | Accessible application cluster topology |
US10685073B1 (en) * | 2013-12-04 | 2020-06-16 | Google Llc | Selecting textual representations for entity attribute values |
US20200204620A1 (en) * | 2018-12-20 | 2020-06-25 | The Boeing Company | Systems and methods of monitoring software application processes |
US10997034B1 (en) | 2010-08-06 | 2021-05-04 | Open Invention Network Llc | System and method for dynamic transparent consistent application-replication of multi-process multi-threaded applications |
US11012292B2 (en) * | 2013-07-08 | 2021-05-18 | Nicira, Inc. | Unified replication mechanism for fault-tolerance of state |
US11030055B2 (en) * | 2013-03-15 | 2021-06-08 | Amazon Technologies, Inc. | Fast crash recovery for distributed database systems |
US11163633B2 (en) | 2019-04-24 | 2021-11-02 | Bank Of America Corporation | Application fault detection and forecasting |
US11321348B2 (en) | 2009-10-26 | 2022-05-03 | Amazon Technologies, Inc. | Provisioning and managing replicated data instances |
US20220263897A1 (en) * | 2019-09-13 | 2022-08-18 | Pure Storage, Inc. | Replicating Multiple Storage Systems Utilizing Coordinated Snapshots |
US11663094B2 (en) | 2017-11-30 | 2023-05-30 | Hewlett Packard Enterprise Development Lp | Reducing recovery time of an application |
US11706296B2 (en) * | 2015-10-13 | 2023-07-18 | Palantir Technologies Inc. | Fault-tolerant and highly available configuration of distributed services |
Families Citing this family (35)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7793148B2 (en) * | 2007-01-12 | 2010-09-07 | International Business Machines Corporation | Using virtual copies in a failover and failback environment |
JP2008305070A (en) * | 2007-06-06 | 2008-12-18 | Hitachi Communication Technologies Ltd | Information processor and information processor system |
CN101145946B (en) * | 2007-09-17 | 2010-09-01 | 中兴通讯股份有限公司 | A fault tolerance cluster system and method based on message log |
US8019732B2 (en) | 2008-08-08 | 2011-09-13 | Amazon Technologies, Inc. | Managing access of multiple executing programs to non-local block data storage |
CN101477488B (en) * | 2009-01-16 | 2011-03-16 | 哈尔滨工程大学 | Key service system oriented system repentance recovery method and system |
US8205113B2 (en) | 2009-07-14 | 2012-06-19 | Ab Initio Technology Llc | Fault tolerant batch processing |
US8751760B2 (en) * | 2009-10-01 | 2014-06-10 | Dell Products L.P. | Systems and methods for power state transitioning in an information handling system |
US8245083B2 (en) * | 2009-12-24 | 2012-08-14 | At&T Intellectual Property I, L.P. | Systems, methods, and apparatus to debug a network application |
CN101964030A (en) * | 2010-07-19 | 2011-02-02 | 北京兴宇中科科技开发股份有限公司 | Volume stage continuous data protection system supported by consistent point insertion and recovery and method |
US8903782B2 (en) * | 2010-07-27 | 2014-12-02 | Microsoft Corporation | Application instance and query stores |
US8762339B2 (en) * | 2010-11-29 | 2014-06-24 | International Business Machines Corporation | Disaster recovery utilizing collapsible virtualized capacity |
US8713362B2 (en) | 2010-12-01 | 2014-04-29 | International Business Machines Corporation | Obviation of recovery of data store consistency for application I/O errors |
US8694821B2 (en) | 2010-12-03 | 2014-04-08 | International Business Machines Corporation | Generation of standby images of applications |
US8566636B2 (en) | 2011-01-27 | 2013-10-22 | International Business Machines Corporation | Application recovery in a file system |
CN102455951A (en) * | 2011-07-21 | 2012-05-16 | 中标软件有限公司 | Fault tolerance method and system of virtual machines |
CN103186435B (en) * | 2011-12-28 | 2015-11-25 | 英业达股份有限公司 | System mistake disposal route and the server system using the method |
KR20130081552A (en) | 2012-01-09 | 2013-07-17 | 삼성전자주식회사 | Apparatus and method for recovery of fault |
US8918672B2 (en) * | 2012-05-31 | 2014-12-23 | International Business Machines Corporation | Maximizing use of storage in a data replication environment |
US9256463B2 (en) | 2012-06-29 | 2016-02-09 | International Business Machines Corporation | Method and apparatus to replicate stateful virtual machines between clouds |
US9507586B2 (en) | 2012-10-05 | 2016-11-29 | Lenovo Enterprise Solutions (Singapore) Pte. Ltd. | Virtual machine based controller and upgrade mechanism |
US9104607B2 (en) | 2012-10-31 | 2015-08-11 | International Business Machines Corporation | Simulation engine for use in disaster recovery virtualization |
US9983953B2 (en) * | 2012-12-20 | 2018-05-29 | Intel Corporation | Multiple computer system processing write data outside of checkpointing |
CN103927236B (en) * | 2013-01-11 | 2018-01-16 | 深圳市腾讯计算机系统有限公司 | On-line testing method and apparatus |
US10127060B2 (en) * | 2013-08-16 | 2018-11-13 | Intuitive Surgical Operations, Inc. | System and method for replay of data and events provided by heterogeneous devices |
GB2517780A (en) | 2013-09-02 | 2015-03-04 | Ibm | Improved checkpoint and restart |
CN105204977A (en) * | 2014-06-30 | 2015-12-30 | 中兴通讯股份有限公司 | System exception capturing method, main system, shadow system and intelligent equipment |
US10645163B2 (en) * | 2015-10-01 | 2020-05-05 | Hewlett Packard Enterprise Development Lp | Site-aware cluster management |
CN105530120A (en) * | 2015-12-01 | 2016-04-27 | 中国建设银行股份有限公司 | Service processing method, controller and service processing system |
US10228962B2 (en) | 2015-12-09 | 2019-03-12 | Commvault Systems, Inc. | Live synchronization and management of virtual machines across computing and virtualization platforms and using live synchronization to support disaster recovery |
US10387266B2 (en) * | 2015-12-23 | 2019-08-20 | Commvault Systems, Inc. | Application-level live synchronization across computing platforms including synchronizing co-resident applications to disparate standby destinations and selectively synchronizing some applications and not others |
CN106020859A (en) * | 2016-05-04 | 2016-10-12 | 青岛海信移动通信技术股份有限公司 | Terminal application installation method and device |
US10552267B2 (en) * | 2016-09-15 | 2020-02-04 | International Business Machines Corporation | Microcheckpointing with service processor |
US10719244B2 (en) * | 2016-11-18 | 2020-07-21 | International Business Machines Corporation | Multi-mode data replication for data loss risk reduction |
CN110727550B (en) * | 2019-12-18 | 2020-06-12 | 中国银联股份有限公司 | Data replication processing method, data replication processing device, disaster recovery system, disaster recovery equipment and storage medium |
US11327663B2 (en) | 2020-06-09 | 2022-05-10 | Commvault Systems, Inc. | Ensuring the integrity of data storage volumes used in block-level live synchronization operations in a data storage management system |
Citations (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4941087A (en) * | 1986-09-19 | 1990-07-10 | Asea Aktiebolag | System for bumpless changeover between active units and backup units by establishing rollback points and logging write and read operations |
US4945474A (en) * | 1988-04-08 | 1990-07-31 | International Business Machines Corporation | Method for restoring a database after I/O error employing write-ahead logging protocols |
US5274645A (en) * | 1990-03-02 | 1993-12-28 | Micro Technology, Inc. | Disk array system |
US6339793B1 (en) * | 1999-04-06 | 2002-01-15 | International Business Machines Corporation | Read/write data sharing of DASD data, including byte file system data, in a cluster of multiple data processing systems |
US6349357B1 (en) * | 1999-03-04 | 2002-02-19 | Sun Microsystems, Inc. | Storage architecture providing scalable performance through independent control and data transfer paths |
US6397229B1 (en) * | 1998-02-02 | 2002-05-28 | International Business Machines Corporation | Storage-controller-managed outboard incremental backup/restore of data |
US20020163910A1 (en) * | 2001-05-01 | 2002-11-07 | Wisner Steven P. | System and method for providing access to resources using a fabric switch |
US6658590B1 (en) * | 2000-03-30 | 2003-12-02 | Hewlett-Packard Development Company, L.P. | Controller-based transaction logging system for data recovery in a storage area network |
US20040064639A1 (en) * | 2000-03-30 | 2004-04-01 | Sicola Stephen J. | Controller-based remote copy system with logical unit grouping |
US6826613B1 (en) * | 2000-03-15 | 2004-11-30 | 3Com Corporation | Virtually addressing storage devices through a switch |
US20050071380A1 (en) * | 2003-09-29 | 2005-03-31 | Micka William F. | Apparatus and method to coordinate multiple data storage and retrieval systems |
US20050081091A1 (en) * | 2003-09-29 | 2005-04-14 | International Business Machines (Ibm) Corporation | Method, system and article of manufacture for recovery from a failure in a cascading PPRC system |
US20050262483A1 (en) * | 2004-05-05 | 2005-11-24 | Bea Systems, Inc. | System and method for application propagation |
US20060015770A1 (en) * | 2004-07-14 | 2006-01-19 | Jeffrey Dicorpo | Method and system for a failover procedure with a storage system |
US20060108470A1 (en) * | 2004-11-24 | 2006-05-25 | The Boeing Company | Superconducting crawler system for a production line |
US7054960B1 (en) * | 2003-11-18 | 2006-05-30 | Veritas Operating Corporation | System and method for identifying block-level write operations to be transferred to a secondary site during replication |
US7146387B1 (en) * | 2001-12-19 | 2006-12-05 | Emc Corporation | System and method for configuring and performing application backups and restores in diverse environments |
US20070033361A1 (en) * | 2005-08-02 | 2007-02-08 | Abdulvahid Jasmeer K | Apparatus, system, and method for fastcopy target creation |
US20070214196A1 (en) * | 2006-03-08 | 2007-09-13 | International Business Machines | Coordinated federated backup of a distributed application environment |
US7346905B2 (en) * | 2003-06-10 | 2008-03-18 | International Business Machines Corporation | Apparatus and method for maintaining resource integrity without a unified transaction manager in a software environment |
Family Cites Families (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5155678A (en) | 1985-10-29 | 1992-10-13 | International Business Machines Corporation | Data availability in restartable data base system |
JPH0540682A (en) * | 1990-06-08 | 1993-02-19 | Internatl Business Mach Corp <Ibm> | Highly available, fault-tolerant relocation of storage devices having atomicity |
US5737514A (en) * | 1995-11-29 | 1998-04-07 | Texas Micro, Inc. | Remote checkpoint memory system and protocol for fault-tolerant computer system |
US6205449B1 (en) | 1998-03-20 | 2001-03-20 | Lucent Technologies, Inc. | System and method for providing hot spare redundancy and recovery for a very large database management system |
US6092085A (en) | 1998-03-24 | 2000-07-18 | International Business Machines Corporation | Method and system for improved database disaster recovery |
US6163856A (en) | 1998-05-29 | 2000-12-19 | Sun Microsystems, Inc. | Method and apparatus for file system disaster recovery |
US6629263B1 (en) | 1998-11-10 | 2003-09-30 | Hewlett-Packard Company | Fault tolerant network element for a common channel signaling (CCS) system |
JP2001034568A (en) | 1999-07-21 | 2001-02-09 | Fujitsu Ltd | Logical path establishing method, and storage medium |
US8156074B1 (en) | 2000-01-26 | 2012-04-10 | Synchronoss Technologies, Inc. | Data transfer and synchronization system |
US6721901B1 (en) | 2000-02-28 | 2004-04-13 | International Business Machines Corporation | Method and system for recovering mirrored logical data volumes within a data processing system |
FR2820221B1 (en) | 2001-02-01 | 2004-08-20 | Cimai Technology | METHOD AND SYSTEM FOR MANAGING EXECUTABLES WITH SHARED LIBRARIES |
US7143252B2 (en) | 2001-05-10 | 2006-11-28 | Hitachi, Ltd. | Storage apparatus system and method of data backup |
US6978398B2 (en) | 2001-08-15 | 2005-12-20 | International Business Machines Corporation | Method and system for proactively reducing the outage time of a computer system |
FR2843210B1 (en) | 2002-08-02 | 2005-10-14 | Meiosys | METHOD FOR MIGRATION OF CONNECTIONS IN A MULTI-COMPUTER ARCHITECTURE, METHOD FOR PERFORMING OPERATING CONTINUITY USING THE METHOD OF MIGRATION, AND MULTI-COMPUTER SYSTEM THUS EQUIPPED |
FR2843209B1 (en) | 2002-08-02 | 2006-01-06 | Cimai Technology | METHOD FOR REPLICATING SOFTWARE APPLICATION IN MULTI-COMPUTER ARCHITECTURE, METHOD FOR REALIZING OPERATING CONTINUITY USING THIS REPLICATION METHOD, AND MULTI-COMPUTER SYSTEM THUS EQUIPPED |
CA2419883A1 (en) | 2003-02-26 | 2004-08-26 | Ibm Canada Limited - Ibm Canada Limitee | Discriminatory replay of log files during table space recovery in a database management system |
US20050021836A1 (en) | 2003-05-01 | 2005-01-27 | Reed Carl J. | System and method for message processing and routing |
US7237056B2 (en) | 2003-11-17 | 2007-06-26 | Hewlett-Packard Development Company, L.P. | Tape mirror interface |
US7299378B2 (en) | 2004-01-15 | 2007-11-20 | Oracle International Corporation | Geographically distributed clusters |
JP4452533B2 (en) * | 2004-03-19 | 2010-04-21 | 株式会社日立製作所 | System and storage system |
- 2006-04-12 US US11/403,050 patent/US7613749B2/en not_active Expired - Fee Related
- 2007-03-30 CN CNB2007100913664A patent/CN100461122C/en not_active Expired - Fee Related
Cited By (122)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070276879A1 (en) * | 2006-05-26 | 2007-11-29 | Rothman Michael A | Sparse checkpoint and rollback |
US20080183799A1 (en) * | 2006-06-06 | 2008-07-31 | Norman Bobroff | System and method for collaborative hosting of applications, virtual machines, and data objects |
US8549515B2 (en) * | 2006-06-06 | 2013-10-01 | International Business Machines Corporation | System and method for collaborative hosting of applications, virtual machines, and data objects |
US8799446B2 (en) * | 2006-08-18 | 2014-08-05 | Microsoft Corporation | Service resiliency within on-premise products |
US20080046552A1 (en) * | 2006-08-18 | 2008-02-21 | Microsoft Corporation | Service resiliency within on-premise products |
US20080059639A1 (en) * | 2006-08-31 | 2008-03-06 | Sap Ag | Systems and methods of migrating sessions between computer systems |
US7840683B2 (en) * | 2006-08-31 | 2010-11-23 | Sap Ag | Systems and methods of migrating sessions between computer systems |
US8615486B2 (en) * | 2007-03-12 | 2013-12-24 | Microsoft Corporation | Interfaces for high availability systems and log shipping |
US20120030178A1 (en) * | 2007-03-12 | 2012-02-02 | Microsoft Corporation | Interfaces for high availability systems and log shipping |
US20090113233A1 (en) * | 2007-10-31 | 2009-04-30 | Electronic Data Systems Corporation | Testing Disaster Recovery Elements |
US8984326B2 (en) | 2007-10-31 | 2015-03-17 | Hewlett-Packard Development Company, L.P. | Testing disaster recovery elements |
WO2009058427A1 (en) * | 2007-10-31 | 2009-05-07 | Hewlett-Packard Development Company L.P. | Testing disaster recovery elements |
US20090133028A1 (en) * | 2007-11-15 | 2009-05-21 | Brown Aaron C | System and method for management of an iov adapter through a virtual intermediary in a hypervisor with functional management in an iov management partition |
US20090133016A1 (en) * | 2007-11-15 | 2009-05-21 | Brown Aaron C | System and Method for Management of an IOV Adapter Through a Virtual Intermediary in an IOV Management Partition |
US8141092B2 (en) | 2007-11-15 | 2012-03-20 | International Business Machines Corporation | Management of an IOV adapter through a virtual intermediary in a hypervisor with functional management in an IOV management partition |
US8141093B2 (en) | 2007-11-15 | 2012-03-20 | International Business Machines Corporation | Management of an IOV adapter through a virtual intermediary in an IOV management partition |
US20090144731A1 (en) * | 2007-12-03 | 2009-06-04 | Brown Aaron C | System and method for distribution of resources for an i/o virtualized (iov) adapter and management of the adapter through an iov management partition |
US8141094B2 (en) | 2007-12-03 | 2012-03-20 | International Business Machines Corporation | Distribution of resources for I/O virtualized (IOV) adapters and management of the adapters through an IOV management partition via user selection of compatible virtual functions |
US20090150538A1 (en) * | 2007-12-10 | 2009-06-11 | Sun Microsystems, Inc. | Method and system for monitoring virtual wires |
US20090150529A1 (en) * | 2007-12-10 | 2009-06-11 | Sun Microsystems, Inc. | Method and system for enforcing resource constraints for virtual machines across migration |
US7945647B2 (en) | 2007-12-10 | 2011-05-17 | Oracle America, Inc. | Method and system for creating a virtual network path |
US7962587B2 (en) | 2007-12-10 | 2011-06-14 | Oracle America, Inc. | Method and system for enforcing resource constraints for virtual machines across migration |
US20090150521A1 (en) * | 2007-12-10 | 2009-06-11 | Sun Microsystems, Inc. | Method and system for creating a virtual network path |
US20090150883A1 (en) * | 2007-12-10 | 2009-06-11 | Sun Microsystems, Inc. | Method and system for controlling network traffic in a blade chassis |
US20090150547A1 (en) * | 2007-12-10 | 2009-06-11 | Sun Microsystems, Inc. | Method and system for scaling applications on a blade chassis |
US8095661B2 (en) * | 2007-12-10 | 2012-01-10 | Oracle America, Inc. | Method and system for scaling applications on a blade chassis |
US8086739B2 (en) | 2007-12-10 | 2011-12-27 | Oracle America, Inc. | Method and system for monitoring virtual wires |
US8370530B2 (en) | 2007-12-10 | 2013-02-05 | Oracle America, Inc. | Method and system for controlling network traffic in a blade chassis |
US7984123B2 (en) | 2007-12-10 | 2011-07-19 | Oracle America, Inc. | Method and system for reconfiguring a virtual network path |
US20090150527A1 (en) * | 2007-12-10 | 2009-06-11 | Sun Microsystems, Inc. | Method and system for reconfiguring a virtual network path |
US7970951B2 (en) | 2008-02-29 | 2011-06-28 | Oracle America, Inc. | Method and system for media-based data transfer |
US20090219936A1 (en) * | 2008-02-29 | 2009-09-03 | Sun Microsystems, Inc. | Method and system for offloading network processing |
US20090222567A1 (en) * | 2008-02-29 | 2009-09-03 | Sun Microsystems, Inc. | Method and system for media-based data transfer |
US7965714B2 (en) | 2008-02-29 | 2011-06-21 | Oracle America, Inc. | Method and system for offloading network processing |
US7944923B2 (en) | 2008-03-24 | 2011-05-17 | Oracle America, Inc. | Method and system for classifying network traffic |
US20090238189A1 (en) * | 2008-03-24 | 2009-09-24 | Sun Microsystems, Inc. | Method and system for classifying network traffic |
US8074098B2 (en) * | 2008-04-23 | 2011-12-06 | Hitachi, Ltd. | Control method for information processing system, information processing system, and program |
US20120047395A1 (en) * | 2008-04-23 | 2012-02-23 | Masayuki Fukuyama | Control method for information processing system, information processing system, and program |
US8423162B2 (en) * | 2008-04-23 | 2013-04-16 | Hitachi, Ltd. | Control method for information processing system, information processing system, and program |
US20090271654A1 (en) * | 2008-04-23 | 2009-10-29 | Hitachi, Ltd. | Control method for information processing system, information processing system, and program |
US8359415B2 (en) * | 2008-05-05 | 2013-01-22 | International Business Machines Corporation | Multi-root I/O virtualization using separate management facilities of multiple logical partitions |
US20090276773A1 (en) * | 2008-05-05 | 2009-11-05 | International Business Machines Corporation | Multi-Root I/O Virtualization Using Separate Management Facilities of Multiple Logical Partitions |
US10255159B2 (en) | 2008-06-20 | 2019-04-09 | Vmware, Inc. | Decoupling dynamic program analysis from execution in virtual environments |
US8719800B2 (en) | 2008-06-20 | 2014-05-06 | Vmware, Inc. | Accelerating replayed program execution to support decoupled program analysis |
US9058420B2 (en) * | 2008-06-20 | 2015-06-16 | Vmware, Inc. | Synchronous decoupled program analysis in virtual environments |
US9823992B2 (en) | 2008-06-20 | 2017-11-21 | Vmware, Inc. | Decoupling dynamic program analysis from execution in virtual environments |
US20090320011A1 (en) * | 2008-06-20 | 2009-12-24 | Vmware, Inc. | Accelerating replayed program execution to support decoupled program analysis |
US20090320010A1 (en) * | 2008-06-20 | 2009-12-24 | Vmware, Inc. | Synchronous decoupled program analysis in virtual environments |
US20090320009A1 (en) * | 2008-06-20 | 2009-12-24 | Vmware, Inc. | Decoupling dynamic program analysis from execution in virtual environments |
US20090328073A1 (en) * | 2008-06-30 | 2009-12-31 | Sun Microsystems, Inc. | Method and system for low-overhead data transfer |
US7941539B2 (en) | 2008-06-30 | 2011-05-10 | Oracle America, Inc. | Method and system for creating a virtual router in a blade chassis to maintain connectivity |
US20090327392A1 (en) * | 2008-06-30 | 2009-12-31 | Sun Microsystems, Inc. | Method and system for creating a virtual router in a blade chassis to maintain connectivity |
US8739179B2 (en) | 2008-06-30 | 2014-05-27 | Oracle America Inc. | Method and system for low-overhead data transfer |
US8144582B2 (en) | 2008-12-30 | 2012-03-27 | International Business Machines Corporation | Differentiating blade destination and traffic types in a multi-root PCIe environment |
US20100165874A1 (en) * | 2008-12-30 | 2010-07-01 | International Business Machines Corporation | Differentiating Blade Destination and Traffic Types in a Multi-Root PCIe Environment |
US20100191845A1 (en) * | 2009-01-29 | 2010-07-29 | Vmware, Inc. | Speculative virtual machine resource scheduling |
US8019861B2 (en) | 2009-01-29 | 2011-09-13 | Vmware, Inc. | Speculative virtual machine resource scheduling |
US20120005240A1 (en) * | 2009-03-17 | 2012-01-05 | Nec Corporation | Event database, data management device, data management system, data management program, and data management method |
US8601307B2 (en) | 2009-03-30 | 2013-12-03 | The Boeing Company | Computer architectures using shared storage |
US8972515B2 (en) | 2009-03-30 | 2015-03-03 | The Boeing Company | Computer architectures using shared storage |
US9690839B2 (en) | 2009-03-30 | 2017-06-27 | The Boeing Company | Computer architectures using shared storage |
US8601309B2 (en) | 2009-03-30 | 2013-12-03 | The Boeing Company | Computer architectures using shared storage |
US9098562B2 (en) | 2009-03-30 | 2015-08-04 | The Boeing Company | Computer architectures using shared storage |
US8601308B2 (en) | 2009-03-30 | 2013-12-03 | The Boeing Company | Computer architectures using shared storage |
US20100257374A1 (en) * | 2009-03-30 | 2010-10-07 | The Boeing Company | Computer architectures using shared storage |
US20100250867A1 (en) * | 2009-03-30 | 2010-09-30 | The Boeing Company | Computer architectures using shared storage |
US20100269121A1 (en) * | 2009-04-17 | 2010-10-21 | Accenture Global Services Gmbh | Exchangeable application components |
US8949657B2 (en) | 2009-08-20 | 2015-02-03 | Landmark Technology Partners, Inc. | Methods and devices for detecting service failures and maintaining computing services using a resilient intelligent client computer |
US20110047413A1 (en) * | 2009-08-20 | 2011-02-24 | Mcgill Robert E | Methods and devices for detecting service failures and maintaining computing services using a resilient intelligent client computer |
US11321348B2 (en) | 2009-10-26 | 2022-05-03 | Amazon Technologies, Inc. | Provisioning and managing replicated data instances |
US11907254B2 (en) | 2009-10-26 | 2024-02-20 | Amazon Technologies, Inc. | Provisioning and managing replicated data instances |
CN101807985A (en) * | 2010-03-03 | 2010-08-18 | 交通银行股份有限公司 | Datacenter centralization control switching method and system |
EP2372978A1 (en) * | 2010-03-30 | 2011-10-05 | The Boeing Company | Computer architectures using shared storage |
US9767271B2 (en) | 2010-07-15 | 2017-09-19 | The Research Foundation For The State University Of New York | System and method for validating program execution at run-time |
US10997034B1 (en) | 2010-08-06 | 2021-05-04 | Open Invention Network Llc | System and method for dynamic transparent consistent application-replication of multi-process multi-threaded applications |
US8621275B1 (en) * | 2010-08-06 | 2013-12-31 | Open Invention Network, Llc | System and method for event-driven live migration of multi-process applications |
US11099950B1 (en) | 2010-08-06 | 2021-08-24 | Open Invention Network Llc | System and method for event-driven live migration of multi-process applications |
US9098462B1 (en) | 2010-09-14 | 2015-08-04 | The Boeing Company | Communications via shared memory |
US8634415B2 (en) | 2011-02-16 | 2014-01-21 | Oracle International Corporation | Method and system for routing network traffic for a blade server |
US9544232B2 (en) | 2011-02-16 | 2017-01-10 | Oracle International Corporation | System and method for supporting virtualized switch classification tables |
US20120216007A1 (en) * | 2011-02-23 | 2012-08-23 | Red Hat Israel, Ltd. | Page protection ordering for lockless write tracking |
US9990237B2 (en) * | 2011-02-23 | 2018-06-05 | Red Hat Israel, Ltd. | Lockless write tracking |
US9176829B2 (en) * | 2011-07-01 | 2015-11-03 | Microsoft Technology Licensing, Llc | Managing recovery virtual machines in clustered environment |
US20130007506A1 (en) * | 2011-07-01 | 2013-01-03 | Microsoft Corporation | Managing recovery virtual machines in clustered environment |
CN102508764A (en) * | 2011-11-04 | 2012-06-20 | 哈尔滨工程大学 | Method for recording event log of node by fault tolerant mobile computing system |
EP2816467A4 (en) * | 2012-03-15 | 2015-03-11 | Huawei Tech Co Ltd | Method and device for checkpoint and restart of container state |
US9767284B2 (en) | 2012-09-14 | 2017-09-19 | The Research Foundation For The State University Of New York | Continuous run-time validation of program execution: a practical approach |
US20140078894A1 (en) * | 2012-09-17 | 2014-03-20 | Electronics And Telecommunications Research Institute | Lane fault recovery apparatus and method |
US9069782B2 (en) | 2012-10-01 | 2015-06-30 | The Research Foundation For The State University Of New York | System and method for security and privacy aware virtual machine checkpointing |
US9552495B2 (en) | 2012-10-01 | 2017-01-24 | The Research Foundation For The State University Of New York | System and method for security and privacy aware virtual machine checkpointing |
US10324795B2 (en) | 2012-10-01 | 2019-06-18 | The Research Foundation For The State University Of New York | System and method for security and privacy aware virtual machine checkpointing |
US11030055B2 (en) * | 2013-03-15 | 2021-06-08 | Amazon Technologies, Inc. | Fast crash recovery for distributed database systems |
US11012292B2 (en) * | 2013-07-08 | 2021-05-18 | Nicira, Inc. | Unified replication mechanism for fault-tolerance of state |
US9348701B2 (en) | 2013-11-05 | 2016-05-24 | Telefonaktiebolaget Lm Ericsson (Publ) | Method and apparatus for failure recovery in a machine-to-machine network |
US9858241B2 (en) | 2013-11-05 | 2018-01-02 | Oracle International Corporation | System and method for supporting optimized buffer utilization for packet processing in a networking device |
WO2015068097A1 (en) * | 2013-11-05 | 2015-05-14 | Telefonaktiebolaget L M Ericsson (Publ) | Method and apparatus for failure recovery in a machine-to-machine network |
US9489327B2 (en) | 2013-11-05 | 2016-11-08 | Oracle International Corporation | System and method for supporting an efficient packet processing model in a network environment |
US10685073B1 (en) * | 2013-12-04 | 2020-06-16 | Google Llc | Selecting textual representations for entity attribute values |
US20150261628A1 (en) * | 2014-03-12 | 2015-09-17 | Ainsworth Game Technology Limited | Devices and methodologies for implementing redundant backups in nvram reliant environments |
US9720790B2 (en) * | 2014-03-12 | 2017-08-01 | Ainsworth Game Technology Limited | Devices and methodologies for implementing redundant backups in NVRAM reliant environments |
US20150281013A1 (en) * | 2014-03-25 | 2015-10-01 | Fujitsu Limited | Ascertainment method and device |
US9680721B2 (en) * | 2014-03-25 | 2017-06-13 | Fujitsu Limited | Ascertainment method and device |
US9779105B1 (en) * | 2014-03-31 | 2017-10-03 | EMC IP Holding Company LLC | Transaction logging using file-system-specific log files |
WO2016063114A1 (en) * | 2014-10-23 | 2016-04-28 | Telefonaktiebolaget L M Ericsson (Publ) | System and method for disaster recovery of cloud applications |
US10061684B2 (en) * | 2015-07-31 | 2018-08-28 | Microsoft Technology Licensing, Llc | Enhanced service validation |
US20170031804A1 (en) * | 2015-07-31 | 2017-02-02 | Microsoft Technology Licensing, Llc | Enhanced service validation |
US10462011B2 (en) * | 2015-08-27 | 2019-10-29 | Nicira, Inc. | Accessible application cluster topology |
US11206188B2 (en) | 2015-08-27 | 2021-12-21 | Nicira, Inc. | Accessible application cluster topology |
US10122626B2 (en) | 2015-08-27 | 2018-11-06 | Nicira, Inc. | Self-managed overlay networks |
US10153918B2 (en) | 2015-08-27 | 2018-12-11 | Nicira, Inc. | Joining an application cluster |
US11706296B2 (en) * | 2015-10-13 | 2023-07-18 | Palantir Technologies Inc. | Fault-tolerant and highly available configuration of distributed services |
US10341020B2 (en) * | 2016-03-17 | 2019-07-02 | Avago Technologies International Sales Pte. Limited | Flexible ethernet logical lane aggregation |
US10896095B2 (en) | 2016-10-03 | 2021-01-19 | International Business Machines Corporation | Replaying processing of a restarted application |
US10540233B2 (en) * | 2016-10-03 | 2020-01-21 | International Business Machines Corporation | Replaying processing of a restarted application |
US20180095830A1 (en) * | 2016-10-03 | 2018-04-05 | International Business Machines Corporation | Replaying processing of a restarted application |
US10216587B2 (en) | 2016-10-21 | 2019-02-26 | International Business Machines Corporation | Scalable fault tolerant support in a containerized environment |
CN108334415A (en) * | 2017-01-20 | 2018-07-27 | 百度在线网络技术(北京)有限公司 | Fault-tolerance processing method, apparatus, terminal, and storage medium |
US11663094B2 (en) | 2017-11-30 | 2023-05-30 | Hewlett Packard Enterprise Development Lp | Reducing recovery time of an application |
US10924538B2 (en) * | 2018-12-20 | 2021-02-16 | The Boeing Company | Systems and methods of monitoring software application processes |
US20200204620A1 (en) * | 2018-12-20 | 2020-06-25 | The Boeing Company | Systems and methods of monitoring software application processes |
US11163633B2 (en) | 2019-04-24 | 2021-11-02 | Bank Of America Corporation | Application fault detection and forecasting |
US20220263897A1 (en) * | 2019-09-13 | 2022-08-18 | Pure Storage, Inc. | Replicating Multiple Storage Systems Utilizing Coordinated Snapshots |
Also Published As
Publication number | Publication date |
---|---|
CN101055538A (en) | 2007-10-17 |
CN100461122C (en) | 2009-02-11 |
US7613749B2 (en) | 2009-11-03 |
Similar Documents
Publication | Title |
---|---|
US7613749B2 (en) | System and method for application fault tolerance and recovery using topologically remotely located computing devices |
US20070234342A1 (en) | System and method for relocating running applications to topologically remotely located computing systems | |
US5870537A (en) | Concurrent switch to shadowed device for storage controller and device errors | |
EP2118750B1 (en) | Using virtual copies in a failover and failback environment | |
US9823973B1 (en) | Creating consistent snapshots in a virtualized environment | |
US7603581B2 (en) | Remote copying of updates to primary and secondary storage locations subject to a copy relationship | |
JP3655963B2 (en) | Storage controller, data storage system including the same, and dual pair suppression method | |
US8209282B2 (en) | Method, system, and article of manufacture for mirroring data at storage locations | |
US9514012B2 (en) | Tertiary storage unit management in bidirectional data copying | |
EP1700215B1 (en) | Coordinated storage management operations in replication environment | |
US7779291B2 (en) | Four site triangular asynchronous replication | |
US7043665B2 (en) | Method, system, and program for handling a failover to a remote storage location | |
US7032089B1 (en) | Replica synchronization using copy-on-read technique | |
US7308545B1 (en) | Method and system of providing replication | |
US8028192B1 (en) | Method and system for rapid failback of a computer system in a disaster recovery environment | |
US9576040B1 (en) | N-site asynchronous replication | |
US7831550B1 (en) | Propagating results of a volume-changing operation to replicated nodes | |
JP2007310701A (en) | Database system, storage device, initial duplication method, and log application method | |
MXPA06005797A (en) | System and method for failover. | |
JPH07239799A (en) | Method for provision of remote data shadowing and remote data duplex system | |
US7979396B1 (en) | System and method for performing consistent resynchronization between synchronized copies | |
CN111158955B (en) | High-availability system based on volume replication and multi-server data synchronization method | |
CN105824571A (en) | Data seamless migration method and device | |
JP2017167602A (en) | Storage system | |
US9582384B2 (en) | Method and system for data replication |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW YORK Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:FLYNN JR., JOHN THOMAS;HOWIE, MIHAELA;REEL/FRAME:017732/0439 Effective date: 20060411 |
|
FEPP | Fee payment procedure |
Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
REMI | Maintenance fee reminder mailed | ||
FPAY | Fee payment |
Year of fee payment: 4 |
|
SULP | Surcharge for late payment | ||
REMI | Maintenance fee reminder mailed | ||
LAPS | Lapse for failure to pay maintenance fees |
Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.) |
|
STCH | Information on status: patent discontinuation |
Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362 |
|
FP | Lapsed due to failure to pay maintenance fee |
Effective date: 20171103 |