US8271436B2 - Retro-fitting synthetic full copies of data - Google Patents

Info

Publication number
US8271436B2
Authority
US
United States
Prior art keywords
data
servers
copy
log
server
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related, expires
Application number
US11/541,857
Other versions
US20070233756A1 (en)
Inventor
Roy P. D'Souza
T. M. Ravi
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hewlett Packard Enterprise Development LP
Original Assignee
Mimosa Systems Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US11/211,056 (US7778976B2)
Priority claimed from US11/500,809 (US8161318B2)
Priority to US11/541,857 (US8271436B2)
Application filed by Mimosa Systems Inc
Assigned to MIMOSA SYSTEMS, INC.: ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: RAVI, T.M., D'SOUZA, ROY P.
Publication of US20070233756A1
Assigned to SILICON VALLEY BANK (AGENT): SECURITY AGREEMENT. Assignors: MIMOSA SYSTEMS, INC.
Assigned to SILICON VALLEY BANK: SECURITY AGREEMENT. Assignors: MIMOSA SYSTEMS, INC.
Assigned to MIMOSA SYSTEMS, INC.: RELEASE BY SECURED PARTY (SEE DOCUMENT FOR DETAILS). Assignors: SILICON VALLEY BANK
Assigned to MIMOSA SYSTEMS, INC.: RELEASE. Assignors: SILICON VALLEY BANK
Assigned to MIMOSA SYSTEMS, INC.: RELEASE. Assignors: SILICON VALLEY BANK (AGENT)
Publication of US8271436B2
Application granted
Assigned to HEWLETT-PACKARD COMPANY: MERGER (SEE DOCUMENT FOR DETAILS). Assignors: MIMOSA SYSTEMS, INC.
Assigned to HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P.: ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: HEWLETT-PACKARD COMPANY
Assigned to HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP: ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P.
Expired - Fee Related
Adjusted expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2053 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
    • G06F11/2056 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant by mirroring
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1458 Management of the backup or restore process
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
    • G06F11/2023 Failover techniques
    • G06F11/2028 Failover techniques eliminating a faulty processor or activating a spare
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
    • G06F11/2023 Failover techniques
    • G06F11/203 Failover techniques using migration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
    • G06F11/2041 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant with more than one idle spare processing component
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2053 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
    • G06F11/2056 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant by mirroring
    • G06F11/2082 Data synchronisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2097 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements maintaining the standby controller/processing unit updated
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
    • G06F11/2046 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant where the redundant components share persistent storage
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2053 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
    • G06F11/2056 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant by mirroring
    • G06F11/2071 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant by mirroring using a plurality of controllers

Definitions

  • the disclosure herein relates generally to data protection, archival, data management, and information management.
  • Data servers host critical production data in their storage systems.
  • the storage systems are usually required to provide a level of data availability and service availability.
  • Data and service are usually required to be resilient to a variety of failures, which could range from media failures to data center failures. Typically this requirement is addressed in part by a range of data protection schemes that may include tape-based backup of all or some of the production data.
  • FIG. 1 is a block diagram of a data surrogation system, according to an embodiment.
  • FIG. 2 is a block diagram of a data surrogation system that includes a production system with multiple production servers and corresponding databases according to an embodiment.
  • FIG. 3 is a block diagram showing a capture operation, an apply operation, and an extraction operation according to an embodiment.
  • FIG. 4 is a block diagram of backup capture used in shadowing, according to an embodiment.
  • FIG. 5 is a block diagram of snapshot capture used in shadowing, according to an embodiment.
  • FIG. 6 is a block diagram of replication capture used in shadowing, according to an embodiment.
  • FIG. 7 is a block diagram of continuous data protection (CDP) capture used in shadowing, according to an embodiment.
  • FIG. 8 is a block diagram showing generation of an incremental or differential update of log files from a production system, according to an embodiment.
  • FIG. 9 is a block diagram of a system that includes shadowing using retro-fitted log shipping to create synthetic fulls according to an embodiment.
  • FIG. 10 is a block diagram of a process of obtaining and applying log files, according to an embodiment.
  • FIG. 11 is a flow diagram illustrating an embodiment of a shadowing process including applying log files according to an embodiment.
  • FIG. 12 is a flow diagram of a process of shadowing according to another embodiment.
  • FIG. 13 is a block diagram of a utility system architecture having the data surrogation capabilities described herein, according to an embodiment.
  • Embodiments of data surrogation enable a host of open-ended data management applications while minimizing data movement, latencies and post-processing.
  • Embodiments provide protection of data, while storing the data in such a way as to be easily located and accessed.
  • Application-aware one-pass protection is described, including production server database shadowing using log shipping for creation of synthetic full copies (also referred to herein as “synthetic fulls”) of the database, transformation of the copied data from “bulk” form to “brick” form, classification of the data, tiered storage of the data according to the classification, and life-cycle management of the stored data.
  • Another advantage provided by embodiments described herein is the use of less storage space. Significantly less storage space is used to store log files because, in contrast to prior systems that merely store log files, the log files are consumed as they are generated according to various intervals, schedules, events, etc.
  • Embodiments described herein perform shadowing of production server databases, including creation of synthetic fulls by retrofitting log shipping to database systems, including enterprise database systems, or other systems, that do not have log shipping capabilities.
  • the shadowing described herein can be used to integrate log shipping capability with non-relational databases or databases of file system data.
  • Shadowing maintains an off-host copy of up-to-date enterprise production data for purposes that include one or more of protection, archival and analysis. Shadowing optionally leverages lower-level mechanisms such as backup, replication, snapshots, or continuous data protection (CDP) to construct an aggregate system and method for making near real-time production data available to applications in a manner that is non-disruptive to the production host, while at the same time being trusted, scalable and extensible.
  • shadowing includes receiving a copy of original data from the production system, including an initial copy of a production database.
  • Delta data is received from the production system in multiple instances.
  • the delta data includes information of changes to the original data.
  • An updated version of the copy is generated and maintained by applying the delta data as the delta data is received.
  • the delta data includes log files, but embodiments are not so limited.
  • the delta data includes data of an incremental difference, or alternatively, of a differential difference between the original data at different instances.
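As a rough illustration of the apply step described above, the following Python sketch keeps a baseline copy current by applying each delta as it arrives. The `ShadowCopy` class, the key/value data model, the sequence numbering, and the use of `None` to mark a deletion are all assumptions made for this example; the source does not specify a delta format.

```python
from dataclasses import dataclass


@dataclass
class ShadowCopy:
    """Hypothetical off-host copy kept current by applying deltas on arrival."""
    data: dict          # baseline full copy (key -> value)
    applied: int = 0    # sequence number of the last delta applied

    def apply(self, seq: int, changes: dict) -> None:
        # Deltas must be applied in order; a gap would leave the copy stale.
        if seq != self.applied + 1:
            raise ValueError(f"expected delta {self.applied + 1}, got {seq}")
        for key, value in changes.items():
            if value is None:
                self.data.pop(key, None)   # None marks a deletion (assumption)
            else:
                self.data[key] = value
        self.applied = seq


# Usage: start from an initial copy, then apply two deltas as they are received.
shadow = ShadowCopy(data={"msg1": "hello"})
shadow.apply(1, {"msg2": "world"})
shadow.apply(2, {"msg1": None})
```

Because each delta is folded into the copy when it is received, the sketch has no need to retain it afterward, which is in keeping with the storage savings noted for consuming log files as they are generated.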
  • FIG. 1 is a block diagram of a data surrogation system 100, according to an embodiment.
  • Data surrogation as described with reference to different embodiments herein includes systems and methods that enable a range of data management solutions for production servers and enhanced capabilities for production server clients.
  • An example of a production server is any server usually referred to as an enterprise server, but embodiments are not so limited.
  • a Microsoft Exchange™ server is used as one example of a production server.
  • Clients include any client device or application that provides end-user access to production or enterprise servers.
  • An example of a client is Microsoft Outlook™, but the embodiments described herein are not so limited.
  • the system 100 includes a production system and utility system.
  • the production system in an embodiment, includes production data and a production database.
  • An embodiment of a production system includes one or more messaging and collaboration servers (e.g. electronic mail (email) servers) that can be local or distributed through the enterprise and either single-computer or clustered and replicated.
  • An example of an email server is Microsoft Exchange™ Server, but the embodiment is not so limited.
  • Conventional access describes normal interaction between the production clients and production servers. In the case of Microsoft Exchange™ and Outlook™, for example, conventional access may include the MAPI protocol, but other protocols, such as IMAP4 and POP3, are also applicable.
  • the system 100 also includes a utility system.
  • the utility system handles production data after it is produced.
  • the utility system of an embodiment includes one or more data management functions accessible to various data management applications that benefit from access to data shadowed and further processed by the utility system. Data management applications include backup applications, monitoring applications, compliance applications, audit applications, etc.
  • the utility system referred to is intended to encompass the embodiments of data surrogation or shadowing methods and apparatus as disclosed.
  • a production database implies a production server, and a utility database implies a utility server.
  • the utility server is a near-line server including the data surrogation or shadowing methods and apparatus described and claimed herein.
  • Embodiments of the data surrogation or shadowing methods and apparatus include an add-on module that integrates with a near-line server.
  • the near-line server is a NearPoint™ server, available from Mimosa Systems.
  • Shadowing generates shadow data that provides a relationship between the production data on the enterprise production system and the data on the utility system.
  • the utility system stores the shadow data in a shadow database, also referred to as a shadow repository.
  • the utility system can optionally leverage near-line storage to reduce costs.
  • shadowing is a method that maintains a relatively up-to-date copy of production enterprise data in a data surrogate, which in this case includes the shadow database.
  • This data may be optionally translated into multiple alternate formats and augmented with metadata.
  • the production and/or utility systems can be single computers or they may be clustered, replicated and/or distributed systems.
  • the production and/or utility systems can be in the same data center or they can be remote.
  • the primary connectivity between the production system and the utility system is through a local area network (LAN), a metropolitan area network (MAN) or a wide area network (WAN).
  • SAN storage area network
  • clients and servers can be any type and/or combination of processor-based devices.
  • Reference to a system and/or a server in the singular may include multiple instances of that system or server.
  • Couplings between various components of the system embodiments described herein can include wireless couplings, wired couplings, hybrid wired/wireless couplings, and other network coupling types, as appropriate to the host system configuration.
  • the network components and/or couplings between system components can include any type, number, and/or combination of networks and the corresponding network components including, but not limited to, wide area networks (WANs), local area networks (LANs), metropolitan area networks (MANs), proprietary networks, backend networks, and the Internet, to name a few.
  • TCP Transmission Control Protocol
  • IP Internet Protocol
  • iSCSI Internet Small Computer System Interface
  • HT HyperTransport
  • VI Virtual Interface
  • RDMA Remote Direct Memory Access
  • FIG. 2 is a block diagram of a system 200 that includes a production system with multiple production servers and corresponding databases.
  • the production servers are messaging servers, and the databases are messaging databases, but embodiments are not so limited.
  • Production servers can include messaging servers, collaboration servers, portals, or database servers.
  • Production servers host a variety of structured, semi-structured, and unstructured data. These servers may be individual, clustered, replicated, constituents of a grid, virtualized, or any combination or variation.
  • An example that is used for illustration purposes is a Microsoft Exchange™ Server, but the embodiments described herein are not so limited.
  • a utility system includes a shadow repository, as previously described.
  • the shadow repository includes shadow data that is received from one or more of the messaging databases.
  • a capture component obtains a copy of production data, and an application (or “apply”) component keeps the shadow data up-to-date, as further described below.
  • the capture component is configured to reduce disruption of production system operations.
  • the capture component is able to capture the production data in a scalable and high-performance manner, securely and reliably.
  • the data captured may be referred to variously herein as data, production data, the production database, etc.
  • the data captured is a production database file that includes one or more of application data, databases, storage groups, mailbox data, and server data.
  • the capture component supplies data to the shadow repository to keep the shadow copy as up-to-date as possible with high efficiency and low cost.
  • the capture component can include backup, snapshots, replication, and continuous data protection (CDP) methods but is not so limited.
  • the apply component is intrinsic to a data type in an embodiment.
  • the apply component is retro-fitted to work with the particular data type.
  • enterprise applications reside on relational databases.
  • Relatively more capable databases such as Oracle™, DB2™ and Microsoft SQL™ Server offer log shipping mechanisms that facilitate direct re-use for application.
  • relatively less-capable databases and/or other semi-structured or unstructured data do not include log shipping capabilities.
  • Microsoft Exchange™ Server is an example of an enterprise server that resides on a database that does not support log shipping.
  • the shadowing described herein provides log-shipping capability in support of the shadowing of databases and/or other semi-structured or unstructured data.
  • An extraction (or “extract”) component of an embodiment optionally transforms data formats from a relatively dense application format to a format that is directly usable by data management applications.
  • the extract component provides high-performance, scalable, lossless, flexible and extensible data transformational capabilities.
  • the extraction capabilities described herein are not present in systems such as the Microsoft Exchange™ Server.
  • the Microsoft Exchange™ Server provides a messaging application programming interface (MAPI) and protocol that is relatively difficult to deploy on a remote utility or management server, and generally does not meet the performance and scalability requirements of management applications.
  • An indexed object repository includes extracted (or transformed) data objects in an object database, and metadata related to the objects in a metadata database, or “metabase”.
  • object denotes a data item in an application-aware format.
  • An example of an object stored in the object database is an email message body, but there are many other examples.
  • An optional filter provides the data management applications with an API or Web Service capability for tuning or parameterizing the extract process.
  • An optional indexing mechanism operates on the data and metadata in the indexed object repository looking for patterns and relationships. When the indexing mechanism finds relevant information, it enhances the metadata with this new information.
  • the indexing mechanism may be guided by a data management application through the filter.
  • data management applications have API or Web Service access to the aggregate data as it is being semantically indexed.
  • the data management applications can get proactive notifications and callbacks when relevant additional data or metadata has been added to the indexed object repository.
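The indexed object repository, metabase, and notification callbacks described above can be pictured with a small Python sketch. Everything named here (`IndexedObjectRepository`, `subscribe`, `add_object`, `index`, and the keyword-matching rule standing in for pattern discovery) is hypothetical and chosen only for illustration; the source does not define an API.

```python
class IndexedObjectRepository:
    """Hypothetical object database plus metadata database ("metabase")."""

    def __init__(self):
        self.objects = {}     # object id -> extracted object (e.g. a message body)
        self.metabase = {}    # object id -> metadata dictionary
        self.callbacks = []   # functions invoked as callback(object_id, metadata)

    def subscribe(self, callback):
        # A data management application registers for proactive notifications.
        self.callbacks.append(callback)

    def add_object(self, object_id, body, metadata=None):
        self.objects[object_id] = body
        self.metabase[object_id] = dict(metadata or {})

    def index(self, keyword):
        """Tag objects whose body contains keyword, then notify subscribers."""
        for object_id, body in self.objects.items():
            tags = self.metabase[object_id].setdefault("keywords", [])
            if keyword in body and keyword not in tags:
                tags.append(keyword)   # enhance the metadata with the finding
                for callback in self.callbacks:
                    callback(object_id, self.metabase[object_id])


repo = IndexedObjectRepository()
notified = []
repo.subscribe(lambda object_id, metadata: notified.append(object_id))
repo.add_object("m1", "Draft terms for the merger", {"sender": "alice"})
repo.add_object("m2", "Lunch on Friday?", {"sender": "bob"})
repo.index("merger")
```

In this sketch the `keyword` argument plays the role of the filter through which an application guides the indexing mechanism; a real indexing mechanism would look for richer patterns and relationships.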
  • the utility system is actively involved in influencing, guiding, participating in, or extending the function of the production servers. Applications that are part of the utility system can become active or passive participants in the production server workflow through positive or negative feedback loops and augmentation of the production server function to solve existing pain points or improve productivity through value additions.
  • FIG. 2 includes a configuration with three messaging servers and one near-line server. Other deployment variations are possible, including a variable number of homogeneous or heterogeneous production servers, and a complex near-line server that may be clustered, distributed, part of a grid, or virtualized. Although FIG. 2 shows three messaging servers, it is possible to provide equivalent services to multiple, arbitrary homogeneous or heterogeneous servers. Although FIG. 2 shows a single near-line server, it may in actuality be clustered, distributed, replicated, virtualized, and may straddle multiple machines or sites.
  • Embodiments of a shadowing method are described herein with reference to an example host system.
  • the shadowing is described in the context of providing log shipping of the application component for a Microsoft Exchange™ Server as an example, but the shadowing described herein is not limited to the Microsoft Exchange™ Server.
  • FIG. 3 is a block diagram showing a capture component, an apply component, and an extract component under an embodiment.
  • the capture generates or provides a baseline full copy of the production data.
  • This full copy data can be directly passed to an extraction component for converting the dense application format into another format desirable to post-processing entities.
  • An embodiment can optionally include cleansing and/or repairing of the full copy data prior to extraction when the capture component does not provide application consistent data.
  • log files (“logs” 1 and 2 are shown as an example) are shipped from the production system as they are generated, and are applied to the full copy to keep it up-to-date as a shadow copy of the production database.
  • the capture component of shadowing is configured to use one or more data capture capabilities that can include backup, snapshots, replication, and/or continuous data protection.
  • FIG. 4 is a block diagram of backup capture used in shadowing, under an embodiment.
  • the backup capture uses the backup APIs provided by the application running on the production system.
  • the production system is Microsoft Exchange™ Server but is not so limited.
  • the utility system is configured to obtain occasional full backups and frequent incremental or differential backups. Both these mechanisms typically run on a default or administrator-configured schedule.
  • Other enhancements or variations include detecting that new log files have been generated on the production system and pulling a copy over (“dynamic log shipping”), or “tailing” the log files as they are being written on the production system.
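One way to picture "dynamic log shipping" is a polling routine that notices log files present on the production side but absent from the shadow side and copies them over. The sketch below is a minimal illustration under stated assumptions: the function name `ship_new_logs`, the `*.log` naming pattern, and the use of plain directories to stand in for the production and utility systems are all hypothetical and not taken from the source.

```python
import shutil
from pathlib import Path


def ship_new_logs(source_dir: Path, shadow_dir: Path, pattern: str = "*.log") -> list:
    """Copy log files not yet present in shadow_dir; return the names shipped."""
    shadow_dir.mkdir(parents=True, exist_ok=True)
    shipped = []
    # Sorting by name assumes log file names sort in generation order.
    for log in sorted(source_dir.glob(pattern)):
        target = shadow_dir / log.name
        if not target.exists():
            shutil.copy2(log, target)   # pull a copy over, preserving timestamps
            shipped.append(log.name)
    return shipped
```

A scheduler could call this function at whatever interval suits the deployment. Note that the sketch makes no attempt to tell a closed log from one still being written; "tailing" an open log would need a different approach.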
  • FIG. 5 is a block diagram of snapshot capture used in shadowing, under an embodiment.
  • the snapshots of snapshot capture are either crash consistent or application consistent. “Hot split” snapshots, which are obtained by breaking mirrors without application involvement, tend to be crash consistent.
  • An example of an application consistent snapshot mechanism is Microsoft Data Protection Manager™.
  • the snapshots can either be local, which requires the management server to be co-located in the same data center, or the snapshots can be remote.
  • the production and utility systems can be single computers, or they may be clustered, replicated and/or distributed.
  • the transports for control and communication are typically LAN, MAN or WAN.
  • An optional SAN can facilitate efficient data movement.
  • additional mechanisms can be used to validate the snapshots for consistency (and perhaps repeat the process until a reasonably consistent copy is available).
  • the additional mechanisms can cleanse and/or repair the data in order to make it ready for application.
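The validate-and-repeat idea mentioned above can be sketched as a bounded retry loop. The names `capture_validated`, `take_snapshot`, and `is_consistent` are hypothetical, and the consistency check is left as a caller-supplied function because the source does not say how consistency is determined.

```python
def capture_validated(take_snapshot, is_consistent, max_attempts=3):
    """Take snapshots until one passes validation or attempts run out.

    take_snapshot: callable returning a snapshot (any object).
    is_consistent: callable returning True if the snapshot is usable.
    Returns the first consistent snapshot, or None if none was obtained.
    """
    for _ in range(max_attempts):
        snapshot = take_snapshot()
        if is_consistent(snapshot):
            return snapshot
    return None   # caller may fall back to cleansing or repairing the data
```

Returning `None` rather than raising leaves room for the alternative the text describes, in which an inconsistent copy is cleansed or repaired instead of being retaken.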
  • FIG. 6 is a block diagram of replication capture used in shadowing, under an embodiment.
  • the replication can be local within a data center, or it can be remote over a MAN, WAN or SAN.
  • the replication maintains a replica on the utility system that can be used for capture.
  • Conventional replication shares the characteristics of crash consistent mirrors, and the replication can be annotated by an “event stream” that captures points in time that are likely to be application consistent.
  • the production and utility systems can be single computers, or they can be clustered, replicated and/or distributed.
  • the transports for control and communication include LAN, MAN and/or WAN.
  • An optional SAN can facilitate efficient data movement.
  • the capture of production data using replication includes use of replication techniques that capture every relevant write at the source (e.g., the production system) and propagate the captured writes to the target (e.g., the utility system) to be applied to the copy of the data to bring it up-to-date.
  • This replication can be synchronous, asynchronous, or a quasi-synchronous hybrid.
  • the production and utility systems may be single computers, or they may be clustered, replicated or distributed.
  • additional mechanisms can be used to validate the snapshots for consistency and cleanse and/or repair the data in order to make it ready for application.
  • FIG. 7 is a block diagram of CDP capture used in shadowing, under an embodiment.
  • a capture component provides a stream of changes that have occurred on the production system, and provides the ability to move to “any point in time” (APIT).
  • the stream of changes (APIT) of an embodiment is annotated with an event stream that synchronizes with events on the production system.
  • a locator module can be configured to select the most appropriate points in time for use for application.
  • the production and utility systems can be single computers, or they can be clustered, replicated and/or distributed systems.
  • the transports for control and communication include LAN, MAN or WAN.
  • An optional SAN facilitates efficient data movement.
  • FIG. 8 is a block diagram showing generation of an incremental or differential update of log files from the production system, under an embodiment.
  • the updating of log files (also referred to herein as logs or transactional logs) includes adding data from the capture operation to the shadow repository with the previous database and logs.
  • the update of logs includes an apply, or log apply operation (also known as log shipping) that applies the logs to the database to bring it up-to-date.
  • the update of logs can optionally include an extract operation.
  • the extract operation is performed on the data resulting from the log apply operation to transform the resulting data from dense application format to one or more target formats for subsequent consumption by various data management applications.
  • FIG. 9 is a block diagram of a system 900 that includes shadowing using retrofitted log shipping to create synthetic fulls according to an embodiment.
  • System 900 includes a production system that performs write-ahead logging.
  • FIG. 9 will be described with reference to Microsoft Exchange™ as a component of the production system, but embodiments are not so limited.
  • the production system includes a Microsoft Exchange™ server and a Microsoft Exchange™ database, in an embodiment.
  • the production system includes one or more databases, although only one is shown.
  • An application communicates with the production database (which, in the case of Microsoft Exchange™, is called an Exchange database or EDB).
  • When the application detects a change to the database, it performs write-ahead logging to a log file. This includes appending information to the log file, which is much faster than traversing the database structure and updating the database each time there is a change.
  • the information appended to the log file reflects the particular change made to data in the database.
  • a lazy writer takes all of the logged, but not committed, changes to the database and writes them to disk.
  • One reason to use these log files is if the system suddenly crashes, the system can replay the log files when it comes back up, thus recovering all the lost data.
  • Write-ahead logging is usually used for database systems, but other systems may have different ways of handling changes to data.
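  • The write-ahead logging behavior described above can be sketched as follows. This is a minimal illustration, not the Exchange implementation: the `TinyStore` class, the JSON record layout, and the `demo_wal` helper are all hypothetical names introduced here. The sketch appends each change to a log before touching the database, leaves the database update to a later lazy write, and replays the log after a simulated crash.

```python
import json
import os
import tempfile


class TinyStore:
    """Toy key-value store with a write-ahead log (illustrative only)."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}      # stands in for the on-disk database structure
        self.pending = []   # logged changes not yet written to the database

    def put(self, key, value):
        # Append the change to the log first; this is the cheap, sequential write.
        record = {"key": key, "value": value}
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps(record) + "\n")
            log.flush()
            os.fsync(log.fileno())
        self.pending.append(record)

    def lazy_write(self):
        # The "lazy writer": fold logged changes into the database later.
        for record in self.pending:
            self.data[record["key"]] = record["value"]
        self.pending.clear()

    @classmethod
    def recover(cls, log_path):
        # After a crash, replay the log from the start to rebuild state.
        store = cls(log_path)
        if os.path.exists(log_path):
            with open(log_path, encoding="utf-8") as log:
                for line in log:
                    record = json.loads(line)
                    store.data[record["key"]] = record["value"]
        return store


def demo_wal():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "wal.log")
        store = TinyStore(path)
        store.put("inbox/1", "hello")
        store.put("inbox/1", "hello again")
        # Simulate a crash before lazy_write(): only the log survives.
        recovered = TinyStore.recover(path)
        return recovered.data
```

  • In this sketch the changes are never written to `data` by the original store, yet the recovered store still holds them, which is the property that makes log replay useful after a crash.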
  • log shipping is not supported by some systems, including Microsoft Exchange™.
  • the inability to support log shipping introduces significant limitations on data backup operations, data archiving operations, and data discovery operations.
  • third-party applications designed to provide data backup, data archiving and data discovery operations to Microsoft Exchange™ go into the EDB and obtain the bulk version of the database.
  • System 900 further includes a utility system with a shadow repository and an IOR according to an embodiment.
  • the production database is copied from the production system to the shadow database on the utility system.
  • log files are shipped from the production system to the shadow repository as they are generated.
  • the shadow repository in an embodiment also stores STM files. STM files are files in a well-known format for multi-media, typically emails.
  • each time a log file is generated it is received by the utility system and applied to the shadow database according to a retro-fitted log shipping operation.
  • the log files can be batched before applying.
  • Data in the shadow database is extracted to the indexed object repository in an application-aware manner and stored in such a way as to be easily located and accessed, even by data management applications external to the utility system.
  • FIG. 10 is a block diagram of a process of obtaining and applying log files, according to an embodiment.
  • the extensible storage engine (ESE) or “engine” (also referred to as a recovery engine herein), used by Microsoft Exchange™, also known as JET Blue, is an indexed sequential access method (ISAM) data storage technology from Microsoft.
  • the engine allows client applications to store and retrieve data via indexed and sequential access.
  • the engine is invoked by the utility system, directed to the database (EDB in this case) and used to facilitate shadowing, including log shipping, and log application.
  • an EDB header is made to point to a particular log file number as a starting log file number, and the engine is run.
  • the engine goes through each log file and checks for integrity, for example by checking the checksums.
  • the engine begins applying transactions from the log files into the shadow database.
  • the engine moves through the log files in sequence, applying each log file. For example, log files 1 - 4 are shown in FIG. 10 .
  • the database enters a “recovered” state which indicates that the data is ready to be restored to the production database. In the recovered state, no more log files can be applied to the database. This state is referred to as “clean shutdown” state in Microsoft Exchange™. This behavior is an artifact from when tape was the dominant backup storage medium.
  • the EDB automatically enters a state in which no more logs can be applied.
  • By contrast, log files can be applied only while the database is in a backed-up state. This state is referred to as “dirty shutdown” state in Microsoft Exchange™.
  • In order to apply log files at any time, the EDB is allowed to go into clean shutdown state after the last log file (for example, log file 4 ). Then the EDB header is modified to indicate that it is in dirty shutdown state. When the utility system is ready to apply a new set of log files, the EDB will be in dirty shutdown state and the engine will be able to apply the log files. This is referred to as toggling the dirty bit(s) in the appropriate header field of the EDB.
  • the EDB and EDB header are specific to certain embodiments, and are not meant to be limiting. In various embodiments, other systems may use different databases in which there are headers or other structural metadata that can be manipulated to achieve the result of allowing application of log files using the database engine as described.
  • the engine may be any recovery engine employed to recover a database including application of changes made to the database, but not yet applied to the database.
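  • The header manipulation described for FIG. 10 can be modeled with a small sketch. Everything here is an assumption made for illustration: `ShadowDb`, `run_engine`, `toggle_dirty`, and the state strings are hypothetical stand-ins, and no real EDB header layout or ESE call is shown. The point is only the control flow: the engine always ends in the recovered (clean shutdown) state, and the header is then rewritten so that the next batch of logs is accepted.

```python
from dataclasses import dataclass, field

CLEAN = "clean_shutdown"   # recovered: engine refuses further logs
DIRTY = "dirty_shutdown"   # backed-up: engine will apply logs


@dataclass
class ShadowDb:
    state: str = DIRTY
    next_log: int = 1                      # header field: first unapplied log
    applied: list = field(default_factory=list)


def run_engine(db, logs):
    """Stand-in for the recovery engine: apply logs in order, then close out."""
    if db.state != DIRTY:
        raise RuntimeError("database is in clean shutdown; no logs can be applied")
    for number in sorted(logs):
        if number < db.next_log:
            continue  # already applied in an earlier cycle
        if number != db.next_log:
            raise ValueError(f"log {db.next_log} is missing")
        db.applied.append(logs[number])
        db.next_log = number + 1
    db.state = CLEAN  # the engine always finishes in the recovered state


def toggle_dirty(db):
    """Rewrite the header so the engine accepts another batch of logs."""
    db.state = DIRTY


def apply_batch(db, logs):
    run_engine(db, logs)
    toggle_dirty(db)
```

  • Calling `apply_batch` twice with consecutive batches succeeds, whereas calling `run_engine` twice without the toggle fails on the second call, which mirrors the limitation the header change is meant to work around.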
  • FIG. 11 is a flow diagram illustrating an embodiment of a shadowing process including applying log files according to an embodiment.
  • the process starts, and it is determined whether it is the first time the shadowing process has been run.
  • the first time the process has been run may occur when the shadow repository is empty, or when the utility system and/or the shadowing components have just been installed, or when a new repository has been created. If it is the first time the process has been run, a full copy of the production database is acquired. This involves completely copying the production database into the shadow database.
  • an incremental copy is acquired. In order to obtain the incremental copy, it is determined whether there are sufficient un-applied logs present. If sufficient un-applied logs are not present, the process waits for sufficient logs. In one embodiment, this includes going back to the initial starting point. If there are sufficient un-applied logs, it is determined whether the logs are in sequence. If the logs are not in sequence, they cannot be applied, and a full copy of the database is obtained. Alternatively, the production system is accessed specifically to acquire the “missing” log files. Logs must be in sequence because of their nature as multiple transactions that may have interdependencies. In a manner that is analogous to the area of microprocessor instructions, for example, database transactions can be committed or uncommitted.
  • EDB headers are updated. In practice, there are multiple EDBs, so there are multiple EDB headers. The headers are updated to reference the first log file that has not been applied.
  • the database recovery engine in this case the ESE, is invoked. The engine is used to replicate the EDB by applying the log files. The replicated EDB is used for later transformation from bulk-to-brick according to an embodiment to be later described.
  • the EDB headers are updated to indicate dirty shutdown state, and the process returns to the starting point.
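  • One pass of the FIG. 11 loop can be sketched as below. This is a simplified model under stated assumptions: databases are plain dictionaries, a log file is a dictionary of changes, and the `shadow`, `production`, and `min_logs` names are hypothetical. The production side is assumed to expose the number of the log it is currently writing (`next_log`), which the sketch uses to reset the shadow after a full copy. The alternative of fetching only the missing log files is not shown.

```python
def logs_in_sequence(numbers, first_expected):
    """True when numbers are exactly first_expected, first_expected + 1, ..."""
    ordered = sorted(numbers)
    return ordered == list(range(first_expected, first_expected + len(ordered)))


def shadow_cycle(shadow, production, unapplied, min_logs=1):
    """Run one pass of the shadowing loop and return the action taken.

    shadow:     dict with keys "db" (dict or None) and "next_log" (int)
    production: dict with keys "db" (dict) and "next_log" (int)
    unapplied:  dict mapping log number -> dict of changes
    """
    if shadow["db"] is None:
        # First run: acquire a full copy of the production database.
        shadow["db"] = dict(production["db"])
        shadow["next_log"] = production["next_log"]
        return "full_copy"
    if len(unapplied) < min_logs:
        return "wait"  # not enough un-applied logs yet
    if not logs_in_sequence(unapplied, shadow["next_log"]):
        # A gap means the logs cannot be applied: fall back to a full copy.
        shadow["db"] = dict(production["db"])
        shadow["next_log"] = production["next_log"]
        unapplied.clear()
        return "full_copy"
    for number in sorted(unapplied):
        shadow["db"].update(unapplied[number])  # stand-in for the engine applying a log
    shadow["next_log"] += len(unapplied)
    unapplied.clear()
    return "applied"
```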
  • FIG. 11 illustrates an embodiment for a production database system that does not support log shipping.
  • Embodiments are also applicable to other systems, for example file systems.
  • To keep an updated copy of a set of files the process starts by acquiring a set of all the files. Later, all the files in the file system that have changed are obtained, and the previous copy is overwritten. Alternatively, just the differences can be obtained and applied to the previous copy. That is another example of a synthetic full.
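  • The file-system variant can be illustrated with a short sketch in which a full copy is a mapping from path to content. The function name `synthetic_full` and the handling of deleted files are assumptions added for illustration; the text above only describes obtaining differences and applying them to the previous copy.

```python
def synthetic_full(previous, changes, deleted=()):
    """Build a new full copy from the previous full copy plus a set of differences.

    previous: dict mapping path -> content for the last full copy
    changes:  dict mapping path -> content for files added or modified since
    deleted:  iterable of paths removed since the last copy (assumed input)
    """
    current = dict(previous)      # leave the previous copy untouched
    current.update(changes)       # apply only the differences
    for path in deleted:
        current.pop(path, None)
    return current
```

  • The result is a complete copy even though only the differences were transferred, which is what makes it a synthetic full rather than a fresh full copy.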
  • Embodiments of retrofitted log shipping apply to any application data, or unstructured data.
  • log files are retained by the shadowing process, and how long log files are retained depends on whether the log files include any uncommitted transactions. As previously mentioned, each log file could include several transactions and several of the transaction could be outstanding. At some point there is a “begin” transaction, and at another point there is a corresponding “end” transaction. When a “begin” transaction is encountered by the shadowing process, it is bracketed. The brackets are closed when the corresponding “end” transaction is encountered. All of the transactions between the “begin” transaction and a later “end” transaction are saved until it is confirmed that every transaction in the bracketed chain completed successfully. If every transaction did not complete successfully, all of the transactions in the chain are rolled back. Retention of the appropriate log files facilitates rollback. Accordingly, the log files are accumulated, and as they are applied, a check is made for outstanding transactions. If there are no outstanding transactions associated with a log file, the log file is deleted. If there are outstanding transactions associated with the log file, the log file is saved.
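  • The retention rule can be sketched as follows, assuming a deliberately simple record format: each log file is a list of `(kind, txn_id)` pairs where `kind` is `"begin"` or `"end"`. The function name `retained_logs` and this record format are hypothetical. A log file is kept for as long as any transaction with records in it is still outstanding, and may be deleted otherwise.

```python
def retained_logs(log_files):
    """Return the numbers of log files that must be kept after applying all of them.

    log_files: dict mapping log number -> list of (kind, txn_id) records,
    where kind is "begin" or "end".
    """
    open_txns = {}  # txn_id -> set of log numbers the transaction spans
    for number in sorted(log_files):
        for kind, txn in log_files[number]:
            if kind == "begin":
                open_txns[txn] = {number}      # open the bracket
            elif kind == "end":
                open_txns.pop(txn, None)       # close the bracket
        # A transaction that is still open spans this log file as well.
        for spanned in open_txns.values():
            spanned.add(number)
    keep = set()
    for spanned in open_txns.values():
        keep |= spanned
    return sorted(keep)
```

  • For example, if a transaction begins in log 2 and has not ended by log 3, both logs are retained so that a rollback remains possible, while an earlier log whose transactions have all ended can be deleted.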
  • FIG. 12 is a flow diagram of a process of shadowing according to another embodiment, in which a database recovery engine that is part of the production system is directed to a copy of the production data (in this example, the EDB) and used to facilitate shadowing and log shipping.
  • In this example the database recovery engine is part of the “Jet Blue” Exchange™ database engine (an extensible storage engine (ESE)), but embodiments are not so limited.
  • FIG. 12 illustrates an alternative to the method described with reference to FIG. 11 for preventing the EDB from entering a recovered state.
  • FIG. 12 illustrates a continuous log apply process according to which the recovery engine is stalled in order to allow the engine to apply logs multiple times.
  • a production system includes a production database, such as an EDB, a production database application, such as Exchange™, and log files (or “logs”).
  • a utility system includes a shadow database and multiple log files transferred from the production system.
  • a copy of the production data is received by an embodiment of the utility system. Initially, a baseline copy of the entire production database file is received and stored in a shadow repository. As delta data is generated by the production system, the delta data is received by the utility system. Delta data is any data that records changes made to the database file. In an embodiment, the delta data is one or more log files.
  • the log files are shipped to a near line server of the utility system from a remote Exchange™ Server.
  • the frequency of log shipping is pre-defined by a schedule, but the frequency could be determined otherwise, such as by an administrator through a data management application, or the log shipping may be event-driven.
  • the delta data is applied to the copy using the recovery engine.
  • the state of the database being operated on is changed to disallow the further application of log files.
  • the copy is prevented from entering this state by stalling the recovery engine.
  • the recovery engine is unstalled, and the additional log files are applied.
  • a new set of log files is introduced into the shadowing process.
  • One of the log files of the set is replicated and stored.
  • the original copy of the replicated log file is then modified in such a manner as to stall the recovery engine.
  • One example introduces an exception that occurs during access to the modified log file, which is caught and post-processed by the recovery engine application process.
  • the recovery engine is directed to resume applying logs from the most recent log application cycle.
  • the Jet Blue engine may be running as part of a larger aggregate system, it may be running on its own, or it may only have essential components reconstituted so that the effect of the Jet Blue engine log application (e.g., recovery) is achieved. In addition it may be possible to have a replacement system that might replicate the necessary capabilities of the Jet Blue engine in order to accomplish the log application process.
  • the recovery engine applies the logs to the database until it encounters the modified log file, which stalls the Jet Blue engine. This prevents the database from entering a state in which no further logs can be applied.
  • the replicated log file is then substituted for the modified log file.
  • the shadowing process is ready for a subsequent set of log files and a consequent log application cycle.
  • the process described above can be resumed and replayed every time a new set of logs is received from the production system.
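  • The stall-and-resume cycle of FIG. 12 can be modeled with the sketch below. It is illustrative only: `engine_apply` is a stand-in for the recovery engine, the `POISON` marker stands in for whatever modification causes the exception, and choosing the last log file of the set as the one to modify is an assumption (the text says only that one of the log files is replicated and modified).

```python
class StallSignal(Exception):
    """Raised when the engine reads the deliberately modified log file."""


POISON = object()  # hypothetical marker standing in for the modification


def engine_apply(db, logs, start):
    """Stand-in recovery engine: apply logs from `start`; finalize if it reaches the end."""
    number = start
    while number in logs:
        if logs[number] is POISON:
            raise StallSignal(number)
        db["rows"].update(logs[number])
        number += 1
    db["recovered"] = True  # no further logs could be applied after this
    return number


def apply_cycle(db, logs, start):
    """Apply what can be applied now, stalling before the last log of the set."""
    last = max(logs)
    replica = logs[last]          # replicate and store one log file
    logs[last] = POISON           # modify the original so the engine stalls on it
    try:
        engine_apply(db, logs, start)
    except StallSignal as stall:
        resume_at = stall.args[0]
    else:
        raise AssertionError("engine finished; expected it to stall")
    finally:
        logs[last] = replica      # substitute the replica for the modified file
    return resume_at
```

  • Each call returns the log number at which the next cycle should resume, so the held-back log is applied at the start of the following cycle and the database never reaches the recovered state.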
  • the process illustrated in FIG. 12 is described in relationship to Microsoft Exchange™. However, the process is applicable to other messaging and collaboration servers. The process is also extensible to generic applications that use structured, semi-structured, or unstructured data. Though this example shows a production database or server, it is possible to provide equivalent services to multiple homogeneous or heterogeneous databases or servers. Similarly, though this example described a single shadow database, which in an embodiment includes a near line server, in various embodiments, the shadow database may be clustered, distributed, replicated, virtualized, and may straddle multiple machines or sites.
  • FIG. 13 is a block diagram of a utility system architecture having the data surrogation capabilities described herein, according to an embodiment.
  • the utility system includes one or more near-line servers (one is shown for convenience) which communicate with a shadow database, a diff database, and an indexed object repository (IOR) database.
  • the utility system further includes one or more SQL servers.
  • An SQL server is a relational database management system (RDBMS) produced by Microsoft. Its primary query language is Transact-SQL, an implementation of the ANSI/ISO standard Structured Query Language (SQL). Other RDBMSs can also be used. Also, more than one SQL server may be used.
  • the SQL server communicates with an SQL database and a log database that stores log files.
  • the utility system further includes a framework, multiple handlers, and queues (for example, a notification queue and a task queue are shown).
  • the utility system further includes a workflow.
  • the utility system receives a request. Examples of a request include a timer being activated, or a user or administrator making a request.
  • the request manifests itself as a notification, which is placed in the notification queue.
  • the framework grabs the notification from the notification queue and looks it up in the workflow to determine how to handle the particular notification.
  • the framework looks up the workflow and then calls the appropriate handler depending on what it learned from the workflow.
  • the framework places the notification in the task queue.
  • the handler takes the notification from the task queue and proceeds to handle it as appropriate.
  • the framework determines whether the request has been successfully handled, and determines what to do next.
  • the framework looks to the workflow to get the next notification and call the next handler, and the process continues.
  • This architecture allows “hot code load”.
  • the utility system software code including the code related to the data surrogation capabilities described herein, is written in the form of handlers. This is advantageous, especially in the situation of a system in the field, because the system can be easily updated by simply installing one or more new handlers. If there are any issues with a new handler, the new handler can be discarded in favor of the handler it was meant to replace.
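  • The notification, workflow, and handler flow described above can be sketched as a small dispatch loop. The workflow table, the notification names, and the `run_framework` function are hypothetical; the sketch only shows how a framework might look up each notification, pass it through a task queue to a handler, and use the result to decide what to do next.

```python
from collections import deque

# Hypothetical workflow table: notification name -> (handler name, next notification)
WORKFLOW = {
    "logs_arrived": ("apply_logs", "logs_applied"),
    "logs_applied": ("extract", None),
}


def run_framework(notifications, handlers, workflow=WORKFLOW):
    """Drain the notification queue, dispatching each item through the task queue."""
    notification_queue = deque(notifications)
    task_queue = deque()
    trace = []
    while notification_queue:
        note = notification_queue.popleft()
        handler_name, next_note = workflow[note]   # look the notification up in the workflow
        task_queue.append(note)                    # framework places it in the task queue
        ok = handlers[handler_name](task_queue.popleft())  # handler takes it and runs
        trace.append((note, handler_name, ok))
        if ok and next_note is not None:
            notification_queue.append(next_note)   # continue with the next step
    return trace
```

  • Because handlers are looked up by name at dispatch time, replacing an entry in the `handlers` mapping changes behavior without touching the loop, which is one simple way to picture the hot code load property.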
  • log shipping is dynamic, in that log files are transferred to the utility system as they are generated and applied as they are generated. This is in contrast to prior systems in which the log files are accumulated and only applied, for example, in the case of a failure of the production server.
  • Dynamic log shipping and application in various embodiments is event driven or occurs according to a pre-defined schedule. Dynamic log shipping provides a further improvement of the recovery point objective (RPO).
  • the data surrogation or shadowing process receives a notification whenever a new log file is filled up in the production server. The new log file is then transferred to the utility system for subsequent application.
  • the RPO is optimized because in case of a catastrophic failure in Exchange that results in all logs being lost on the production server, the window of data loss is bracketed by the content of a single or partial log file.
  • shadowing includes monitoring. For example, a change to the production data is detected and a notification is issued, causing the notification to be handled. This may be accomplished in a manual manner through user intervention or alternatively through automatic notification.
  • the automatic notification may be event driven or it may be scheduled and batched in some manner.
  • the log transfer process is optional in situations where the shadowing or data surrogation mechanism is co-resident on the production system or server, hence allowing direct access to the production database and log files. This optional transfer may occur over some form of network or equivalent mechanism. This optional process may occur lazily, or eagerly, or in some batched combination.
  • the availability of the shadow database to data management applications may be to the actual data that is being modified by the process, or it may be to a copy of that data, or it may be some combination thereof. This may be available in the form of an API or web service or equivalent.
  • a log file that has been shipped to, or made available to, the data surrogation mechanism is immediately applied to the shadow database in order to bring it up-to-date. This lowers the utility or near-line window since changes that occur on the messaging server become more immediately visible on the near-line server.
  • Other alternatives exist that might include batching the log files and then making decisions regarding batching and lazy application, perhaps for performance optimization of the utility or near-line server.
  • the logs are post-processed before they are applied, for example to filter for relevance, or to filter out undesirable content.
  • log tailing is incorporated into the data surrogation or shadowing process. Dynamic log shipping brings down the RPO to the contents of a single log file, or less. Log tailing may also be used to further reduce the RPO, since the logs are being continually captured as they are written on the production messaging server and then shipped over and applied on the utility or near-line server. According to such embodiments, the modifications that are occurring to the current transaction log are being immediately captured and shipped over to the utility server for application. This could improve the maintenance of the data surrogate from near real-time to real-time. In one example the log files are propagated and applied asynchronously. Other alternatives are possible, such as synchronous application. In addition, rather than apply changes immediately on the utility server, it is possible to batch the changes and apply them lazily.
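  • Log tailing can be pictured with a minimal sketch that tracks a byte offset into the log currently being written and ships only the bytes appended since the last read. The `tail_once` and `demo_tail` names are hypothetical, the local file stands in for the production log, and a real deployment would also need to handle log rollover and partial records, which are not shown.

```python
import os
import tempfile


def tail_once(path, offset):
    """Return bytes appended to `path` since `offset`, plus the new offset."""
    with open(path, "rb") as log:
        log.seek(offset)
        chunk = log.read()
    return chunk, offset + len(chunk)


def demo_tail():
    shipped = []
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "current.log")
        offset = 0
        for record in (b"txn-1\n", b"txn-2\n"):
            with open(path, "ab") as log:      # production side appends a record
                log.write(record)
            chunk, offset = tail_once(path, offset)
            shipped.append(chunk)              # utility side receives only the new bytes
    return shipped
```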
  • the apply process as described herein may run on a schedule, be event driven, or run continuously.
  • the apply process may optionally apply the transactions or the re-constituted logs to the shadow database to bring it up-to-date.
  • data management applications are concurrently able to access a recent copy of the shadow data.
  • the components of the multi-dimensional surrogation described above may include any collection of computing components and devices operating together.
  • the components of the multi-dimensional surrogation can also be components or subsystems within a larger computer system or network.
  • Components of the multi-dimensional surrogation can also be coupled among any number of components (not shown), for example other buses, controllers, memory devices, and data input/output (I/O) devices, in any number of combinations.
  • functions of the multi-dimensional surrogation can be distributed among any number/combination of other processor-based components.
  • the information management of an embodiment includes a method comprising receiving a copy of original data at a first server.
  • the original data of an embodiment is stored at a second server.
  • the method of an embodiment includes receiving delta data at the first server in a plurality of instances.
  • the delta data of an embodiment includes information of changes to the original data.
  • the method of an embodiment includes dynamically generating and maintaining an updated version of the copy at the first server by applying the delta data to the copy as the delta data is received.
  • the generating and maintaining of an embodiment is asynchronous with the receiving.
  • the applying of an embodiment is according to an interval.
  • the interval of an embodiment is based on one or more of time and events at the second server.
  • the delta data of an embodiment includes data of an incremental difference between the original data at a plurality of instances.
  • the delta data of an embodiment includes data of a differential difference between the original data at a plurality of instances.
  • the method of an embodiment comprises controlling the applying using modified information of a component of the first server.
  • the component of an embodiment includes one or more of structural metadata of the copy and a log file of the delta data.
  • the method of an embodiment includes modifying the component.
  • the component of an embodiment is structural metadata of the copy.
  • the modifying of an embodiment comprises detecting a first state of the copy, wherein the first state indicates the delta data has been applied to the copy.
  • the modifying of an embodiment comprises changing the first state to a second state.
  • the second state of an embodiment is a state from which another updated version can be generated by applying additional delta data to the updated version. Changing the first state to the second state of an embodiment includes modifying the structural metadata of the copy.
  • the component of an embodiment is a log file of a plurality of log files.
  • the delta data of an embodiment is a plurality of log files.
  • the applying of an embodiment includes invoking an engine of the second server and the terminating includes stalling the engine.
  • the first server of an embodiment includes a near-line server and the second server includes a messaging and collaboration server.
  • the information management of an embodiment includes a method comprising receiving a plurality of delta data at a first server.
  • the delta data of an embodiment includes information of changes to original data of a second server.
  • the method of an embodiment includes dynamically generating and maintaining an updated version of a copy of the original data at the first server by applying at least one of the plurality of delta data to the copy.
  • the method of an embodiment includes controlling the applying using modified information of a component of the first server.
  • the component of an embodiment includes structural metadata of the copy.
  • the component of an embodiment includes a log file of the delta data.
  • the method of an embodiment includes modifying the component.
  • the component of an embodiment is structural metadata of the copy.
  • the modifying of an embodiment comprises detecting a first state of the copy.
  • the first state of an embodiment indicates the delta data has been applied to the copy.
  • the modifying of an embodiment comprises changing the first state to a second state.
  • the second state of an embodiment is a state from which another updated version can be generated by applying additional delta data to the updated version.
  • Changing the first state to the second state of an embodiment includes modifying the structural metadata of the copy.
  • the additional delta data of an embodiment is received after generating the updated version.
  • the applying of an embodiment includes invoking an engine of the second server.
  • the method of an embodiment includes causing the engine to reference a first unapplied log file of the delta data, wherein the first unapplied log file is a first log file unapplied to the copy.
  • the delta data of an embodiment is a plurality of log files.
  • the component of an embodiment is a log file of the plurality of log files.
  • the terminating of an embodiment comprises replacing the modified log file with the replicated log file.
  • the applying of an embodiment includes invoking an engine of the second server and the terminating includes stalling the engine.
  • the method of an embodiment includes receiving at the first server a copy of the original data from the second server.
  • the copy of an embodiment is a full copy.
  • the copy of an embodiment is an incremental copy.
  • the method of an embodiment includes transferring the updated version to an indexed object repository.
  • the generating of an embodiment is in response to at least one of an automatic trigger, a timer notification, an event notification, a poll, and a request.
  • the automatic trigger of an embodiment includes a trigger automatically initiated in response to at least one pre-specified parameter.
  • the automatic trigger of an embodiment includes content of the updated version.
  • the timer notification of an embodiment includes notifications corresponding to scheduled events including at least one of maintenance operations, user activities, server activities, and data population operations.
  • the event notification of an embodiment includes notifications corresponding to changes to data of the original data.
  • the request of an embodiment includes at least one of access attempts and configuration attempts to the original data by one or more of users of the second server, servers and applications.
  • the first server of an embodiment includes a near-line server.
  • the generating of an embodiment is in near real-time and maintains complete integrity and consistency of the original data.
  • the second server of an embodiment includes a messaging and collaboration server.
  • the original data of an embodiment includes one or more of application data, databases, storage groups, mailbox data, and server data.
  • the method of an embodiment includes maintaining the updated version.
  • the maintaining of an embodiment includes generating another updated version by applying at least one set of log files to the updated version.
  • the at least one set of log files of an embodiment is received later in time than the plurality of log files.
  • the second server of an embodiment includes one or more of local servers, remote servers, database servers, messaging servers, electronic mail servers, instant messaging servers, voice-over Internet Protocol servers, collaboration servers, Exchange Servers, portals, customer relationship management (CRM) servers, enterprise resource planning (ERP) servers, business-to-business servers, and content management servers.
  • the information management of an embodiment includes a method comprising receiving a copy of original data at a first server.
  • the original data is stored at a second server.
  • the method of an embodiment includes receiving a plurality of delta data at the first server.
  • the delta data of an embodiment includes information of changes to the original data.
  • the method of an embodiment includes dynamically generating and maintaining an updated version of the copy at the first server by applying at least one of the plurality of delta data to the copy.
  • the method of an embodiment includes controlling the applying using modified information of a component of the first server.
  • the information management of an embodiment includes a computer readable medium including executable instructions which, when executed in a processing system, support near real-time data shadowing by receiving a plurality of delta data at a first server.
  • the delta data of an embodiment includes information of changes to original data of a second server.
  • the instructions of an embodiment when executed dynamically generate and maintain an updated version of a copy of the original data at the first server by applying at least one of the plurality of delta data to the copy.
  • the instructions of an embodiment when executed control the applying using modified information of a component of the first server.
  • the component of an embodiment includes structural metadata of the copy.
  • the delta data of an embodiment comprises at least one log file, and the component of an embodiment includes one of the at least one log files.
  • the information management of an embodiment includes a system comprising a near-line server coupled to one or more servers that include original data.
  • the system of an embodiment includes a shadowing system coupled to the near-line server and configured to receive a copy of the original data.
  • the shadowing system of an embodiment is configured to receive delta data in a plurality of instances.
  • the delta data of an embodiment includes information of changes to the original data.
  • the shadowing system of an embodiment is configured to dynamically generate and maintain an updated version of the copy at the near-line server by applying the delta data to the copy as the delta data is received.
  • the shadowing system of an embodiment is configured to generate and maintain asynchronously.
  • the delta data of an embodiment includes data of an incremental difference between the original data at a plurality of instances.
  • the delta data of an embodiment includes data of a differential difference between the original data at a plurality of instances.
  • the shadowing system of an embodiment is configured to control the applying using modified information of a component of the near-line server.
  • the component of an embodiment includes one or more of structural metadata of the copy and a log file of the delta data.
  • the shadowing system of an embodiment is configured to modify the component.
  • the component of an embodiment is structural metadata of the copy.
  • the modifying of an embodiment comprises detecting a first state of the copy, wherein the first state indicates the delta data has been applied to the copy.
  • the modifying of an embodiment comprises changing the first state to a second state, wherein the second state is a state from which another updated version can be generated by applying additional delta data to the updated version.
  • Changing the first state to the second state of an embodiment includes modifying the structural metadata of the copy.
  • the delta data of an embodiment is a plurality of log files, wherein the component is a log file of a plurality of log files.
  • the applying of an embodiment includes invoking an engine of the one or more servers and the terminating includes stalling the engine.
  • the one or more servers of an embodiment include a messaging and collaboration server.
  • the information management of an embodiment includes a system comprising a near-line server coupled to one or more servers that include original data.
  • the system of an embodiment includes a shadowing system coupled to the near-line server and configured to receive a copy of the original data.
  • the shadowing system of an embodiment is configured to receive delta data that includes information of changes to the original data.
  • the shadowing system of an embodiment is configured to dynamically generate and maintain an updated version of the copy at the near-line server by applying at least one of the plurality of delta data to the copy as the delta data is received.
  • the shadowing system of an embodiment is configured to control the applying using modified information of a component of the near-line server.
  • the information management of an embodiment includes a system comprising a near-line server coupled to one or more servers.
  • the system of an embodiment includes a shadowing system coupled to the near-line server and configured to receive delta data that describes incremental changes to original data of one or more servers.
  • the shadowing system of an embodiment is configured to dynamically generate and maintain an updated version of a copy of the original data at the near-line server by applying at least one of the plurality of the delta data to the copy.
  • the shadowing system of an embodiment is configured to control the applying using modified information of a component of the near-line server.
  • the component of an embodiment includes structural metadata of the copy.
  • the component of an embodiment includes a log file of the delta data.
  • the shadowing system of an embodiment is configured to modify the component.
  • the component of an embodiment is structural metadata of the copy.
  • the shadowing system of an embodiment, when configured to modify the component, is configured to detect a first state of the copy.
  • the first state of an embodiment indicates the delta data has been applied to the copy.
  • the shadowing system of an embodiment, when configured to modify the component, is configured to change the first state to a second state.
  • the second state of an embodiment is a state from which another updated version can be generated by applying additional delta data to the updated version.
  • Changing the first state to the second state of an embodiment includes modifying the structural metadata of the copy.
  • the additional delta data of an embodiment is received after generating the updated version.
  • the applying of an embodiment includes invoking an engine of the one or more servers.
  • the shadowing system of an embodiment is configured to cause the engine to reference a first unapplied log file of the delta data.
  • the first unapplied log file of an embodiment is a first log file unapplied to the copy.
  • the delta data of an embodiment is a plurality of log files.
  • the component of an embodiment is a log file of the plurality of log files.
  • the applying of an embodiment includes invoking an engine of the second server and the terminating includes stalling the engine.
  • the shadowing system of an embodiment is configured to receive the copy from the one or more servers.
  • the copy of an embodiment is a full copy.
  • the copy of an embodiment is an incremental copy.
  • the shadowing system of an embodiment is configured to transfer the updated version to an indexed object repository.
  • the shadowing system of an embodiment is configured to generate and maintain in response to at least one of an automatic trigger, a timer notification, an event notification, a poll, and a request.
  • the automatic trigger of an embodiment includes a trigger automatically initiated in response to at least one pre-specified parameter.
  • the automatic trigger of an embodiment includes content of the updated version.
  • the timer notification of an embodiment includes notifications corresponding to scheduled events including at least one of maintenance operations, user activities, server activities, and data population operations.
  • the event notification of an embodiment includes notifications corresponding to changes to data of the original data.
  • the request of an embodiment includes at least one of access attempts and configuration attempts to the original data by one or more of users of the second server, servers and applications.
  • the shadowing system of an embodiment is configured to generate and maintain in near real-time with complete integrity and consistency of the original data.
  • the one or more servers of an embodiment include a messaging and collaboration server.
  • the original data of an embodiment includes one or more of application data, databases, storage groups, mailbox data, and server data.
  • the shadowing system of an embodiment is configured to maintain the updated version by generating another updated version by applying at least one set of log files to the updated version, the at least one set of log files received later in time than the delta data.
  • the one or more servers of an embodiment include one or more of local servers, remote servers, database servers, messaging servers, electronic mail servers, instant messaging servers, voice-over Internet Protocol servers, collaboration servers, Exchange Servers, portals, customer relationship management (CRM) servers, enterprise resource planning (ERP) servers, business-to-business servers, and content management servers.
  • aspects of the multi-dimensional surrogation described herein may be implemented as functionality programmed into any of a variety of circuitry, including programmable logic devices (PLDs), such as field programmable gate arrays (FPGAs), programmable array logic (PAL) devices, electrically programmable logic and memory devices and standard cell-based devices, as well as application specific integrated circuits (ASICs).
  • microcontrollers with memory such as electronically erasable programmable read only memory (EEPROM)
  • embedded microprocessors firmware, software, etc.
  • aspects of the multi-dimensional surrogation may be embodied in microprocessors having software-based circuit emulation, discrete logic (sequential and combinatorial), custom devices, fuzzy (neural) logic, quantum devices, and hybrids of any of the above device types.
  • Any underlying device technologies may be provided in a variety of component types, e.g., metal-oxide semiconductor field-effect transistor (MOSFET) technologies like complementary metal-oxide semiconductor (CMOS), bipolar technologies like emitter-coupled logic (ECL), polymer technologies (e.g., silicon-conjugated polymer and metal-conjugated polymer-metal structures), mixed analog and digital, etc.
  • Computer-readable media in which such formatted data and/or instructions may be embodied include, but are not limited to, non-volatile storage media in various forms (e.g., optical, magnetic or semiconductor storage media) and carrier waves that may be used to transfer such formatted data and/or instructions through wireless, optical, or wired signaling media or any combination thereof.
  • Examples of transfers of such formatted data and/or instructions by carrier waves include, but are not limited to, transfers (uploads, downloads, e-mail, etc.) over the Internet and/or other computer networks via one or more data transfer protocols (e.g., HTTP, FTP, SMTP, etc.).
  • a processing entity e.g., one or more processors
  • the words “comprise,” “comprising,” and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is to say, in a sense of “including, but not limited to.” Words using the singular or plural number also include the plural or singular number respectively. Additionally, the words “herein,” “hereunder,” “above,” “below,” and words of similar import refer to this application as a whole and not to any particular portions of this application. When the word “or” is used in reference to a list of two or more items, that word covers all of the following interpretations of the word: any of the items in the list, all of the items in the list and any combination of the items in the list.
  • multi-dimensional surrogation is not intended to be exhaustive or to limit the multi-dimensional surrogation to the precise form disclosed. While specific embodiments of, and examples for, the multi-dimensional surrogation are described herein for illustrative purposes, various equivalent modifications are possible within the scope of the multi-dimensional surrogation, as those skilled in the relevant art will recognize.
  • teachings of the multi-dimensional surrogation provided herein can be applied to other processing systems and methods, not only for the systems and methods described above.
  • the terms used should not be construed to limit the multi-dimensional surrogation and methods to the specific embodiments disclosed in the specification and the claims, but should be construed to include all processing systems that operate under the claims. Accordingly, the multi-dimensional surrogation is not limited by the disclosure, but instead the scope of the multi-dimensional surrogation is to be determined entirely by the claims.

Abstract

Multi-dimensional surrogation systems and methods are provided that generate at least one data surrogate using information of data and numerous data changes received from at least one data source. Embodiments described herein perform shadowing of production server databases, including creation of synthetic fulls by retrofitting log shipping to enterprise database systems, or other systems, that do not have log shipping capabilities.

Description

RELATED APPLICATION
This application is a continuation-in-part of U.S. patent application Ser. No. 11/500,809, filed Aug. 7, 2006, which is a continuation-in-part of U.S. patent application Ser. No. 11/211,056, filed Aug. 23, 2005, now U.S. Pat. No. 7,778,976, which claims the benefit of U.S. Patent Application Number 60/650,556, filed Feb. 7, 2005.
This application is related to the following U.S. patent applications, each of which was filed Aug. 7, 2006: Ser. No. 11/500,864; Ser. No. 11/500,805; Ser. No. 11/500,806; and Ser. No. 11/500,821.
TECHNICAL FIELD
The disclosure herein relates generally to data protection, archival, data management, and information management.
BACKGROUND
Data servers host critical production data in their storage systems. The storage systems are usually required to provide a level of data availability and service availability. Data and service are usually required to be resilient to a variety of failures, which could range from media failures to data center failures. Typically this requirement is addressed in part by a range of data protection schemes that may include tape-based backup of all or some of the production data.
In addition there is typically a need for other servers to concurrently access this same critical production data. These applications include data protection applications, site replication applications, search applications, discovery applications, analysis applications, and monitoring and supervision applications. This need has been addressed by a range of data management schemes, including setting up a specialized analysis server with a replica of the critical production data. Typical data protection and management schemes have some well known limitations. For example, in some cases, direct access to the enterprise server could result in instability and performance load on the enterprise servers. Other limitations are related to the serial and offline nature of traditional tape storage, which makes access to backed-up data time-consuming and inefficient.
While it is theoretically possible to transfer the entire source data on the Production System to the Management System, this is not efficient in practice. Instead, conventional systems and methods create an entire baseline copy of the source data on the Management System, followed by the periodic, or continuous, changes to the data that are occurring on the Production System, and transfer the baseline copy and the changes to the Management System. These changes are then applied to the copy of the data on the Management System, thereby bringing it up-to-date. While some database management systems provide these intrinsic capabilities that are known as “Log Shipping”, log shipping is not available in other databases like non-relational databases or databases of file system data.
INCORPORATION BY REFERENCE
Each publication and patent application mentioned in this specification is herein incorporated by reference in its entirety to the same extent as if each individual publication or patent application was specifically and individually indicated to be incorporated by reference.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram of a data surrogation system, according to an embodiment.
FIG. 2 is a block diagram of a data surrogation system that includes a production system with multiple production servers and corresponding databases according to an embodiment.
FIG. 3 is a block diagram showing a capture operation, an apply operation, and an extraction operation according to an embodiment.
FIG. 4 is a block diagram of backup capture used in shadowing, according to an embodiment.
FIG. 5 is a block diagram of snapshot capture used in shadowing, according to an embodiment.
FIG. 6 is a block diagram of replication capture used in shadowing, according to an embodiment.
FIG. 7 is a block diagram of continuous data protection (CDP) capture used in shadowing, according to an embodiment.
FIG. 8 is a block diagram showing generation of an incremental or differential update of log files from a production system, according to an embodiment.
FIG. 9 is a block diagram of a system that includes shadowing using retro-fitted log shipping to create synthetic fulls according to an embodiment.
FIG. 10 is a block diagram of a process of obtaining and applying log files, according to an embodiment.
FIG. 11 is a flow diagram of a shadowing process that includes applying log files, according to an embodiment.
FIG. 12 is a flow diagram of a process of shadowing according to another embodiment.
FIG. 13 is a block diagram of a utility system architecture having the data surrogation capabilities described herein, according to an embodiment.
DETAILED DESCRIPTION
Multi-dimensional data surrogation and corresponding systems and methods are described herein. Embodiments of data surrogation enable a host of open-ended data management applications while minimizing data movement, latencies and post-processing. Embodiments provide protection of data, while storing the data in such a way as to be easily located and accessed. Application-aware one-pass protection is described, including production server database shadowing using log shipping for creation of synthetic full copies (also referred to herein as “synthetic fulls”) of the database, transformation of the copied data from “bulk” form to “brick” form, classification of the data, tiered storage of the data according to the classification, and life-cycle management of the stored data.
There are many advantages provided by the embodiments described herein as compared to prior systems that do not inherently include log shipping. For example, when performing synthetic fulls, any corruption is exposed right away, because log files are applied as they are received. This is in contrast to typical systems with disk-based or tape-based backup. In a typical system, full copies of the database and incremental updates to the database (in the form of log files) are saved. In the case of a production server failure, the log files must typically all be applied at once. If a corrupted file is encountered, or anything causes the process to fail, it is not possible to access either the “primary” production server or the back-up data.
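The contrast drawn above can be illustrated with a minimal sketch in which each log is checked as it arrives, so a bad log is found at that point rather than during a later restore. The checksum scheme, the `key=value` payload format, and the names `apply_on_arrival` and `make_log` are assumptions made for illustration; the specification does not state how corruption is detected.

```python
import hashlib


def apply_on_arrival(copy: dict, logs: list) -> int:
    """Apply logs one by one; return how many were applied before any failure."""
    applied = 0
    for payload, expected_digest in logs:
        if hashlib.sha256(payload).hexdigest() != expected_digest:
            break  # corruption is found here, at the first bad log
        # Hypothetical payload format: a single "key=value" change.
        key, _, value = payload.decode("utf-8").partition("=")
        copy[key] = value
        applied += 1
    return applied


def make_log(text: str) -> tuple:
    """Build a (payload, digest) pair standing in for a shipped log file."""
    data = text.encode("utf-8")
    return data, hashlib.sha256(data).hexdigest()


good1, good2 = make_log("msg1=hello"), make_log("msg2=world")
corrupt = (b"msg3=garbled", good1[1])  # digest does not match the payload
copy = {}
applied = apply_on_arrival(copy, [good1, corrupt, good2])
```

In this sketch the copy remains usable in its last good state after the bad log is rejected, which is the property the paragraph above attributes to synthetic fulls.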
Another advantage provided by embodiments described herein is the use of less storage space. Significantly less storage space is used to store log files because, in contrast to prior systems that merely store log files, the log files are consumed as they are generated according to various intervals, schedules, events, etc.
Embodiments described herein perform shadowing of production server databases, including creation of synthetic fulls by retrofitting log shipping to database systems, including enterprise database systems, or other systems, that do not have log shipping capabilities. For example, the shadowing described herein can be used to integrate log shipping capability with non-relational databases or databases of file system data.
Shadowing maintains an off-host copy of up-to-date enterprise production data for purposes that include one or more of protection, archival and analysis. Shadowing optionally leverages lower-level mechanisms such as backup, replication, snapshots, or continuous data protection (CDP) to construct an aggregate system and method for making near real-time production data available to applications in a manner that is non-disruptive to the production host, while at the same time being trusted, scalable and extensible.
In an embodiment, shadowing includes receiving a copy of original data from the production system, including an initial copy of a production database. Delta data is received from the production system in multiple instances. The delta data includes information of changes to the original data. An updated version of the copy is generated and maintained by applying the delta data as the delta data is received. In an embodiment, the delta data includes log files, but embodiments are not so limited. The delta data includes data of an incremental difference, or alternatively, of a differential difference between the original data at different instances.
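The receive-and-apply cycle described above might be sketched as follows. This is a simplified illustration rather than the disclosed implementation: the `LogFile` and `ShadowCopy` names, the sequence numbering, and the dictionary representation of the data are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class LogFile:
    """Hypothetical delta record: a sequence number plus key/value changes."""
    sequence: int
    changes: dict


@dataclass
class ShadowCopy:
    """Copy of the original data, kept current by applying deltas on arrival."""
    data: dict
    last_applied: int = 0
    pending: dict = field(default_factory=dict)

    def receive(self, log: LogFile) -> None:
        """Accept one delta and apply every delta that is now in sequence."""
        if log.sequence <= self.last_applied:
            return  # already applied; ignore the duplicate
        self.pending[log.sequence] = log
        # Apply in order; stop at the first gap in the sequence.
        while self.last_applied + 1 in self.pending:
            nxt = self.pending.pop(self.last_applied + 1)
            for key, value in nxt.changes.items():
                if value is None:
                    self.data.pop(key, None)  # None marks a deletion
                else:
                    self.data[key] = value
            self.last_applied = nxt.sequence


shadow = ShadowCopy(data={"msg1": "hello"})
shadow.receive(LogFile(2, {"msg1": None}))     # held: log 1 not yet seen
shadow.receive(LogFile(1, {"msg2": "world"}))  # applies log 1, then log 2
```

Holding out-of-order deltas until the gap is filled is one possible design choice; it keeps the copy consistent with some past state of the original data at every step.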
FIG. 1 is a block diagram of a data surrogation system 100, according to an embodiment. Data surrogation as described with reference to different embodiments herein includes systems and methods that enable a range of data management solutions for production servers and enhanced capabilities for production server clients. An example of a production server is any server usually referred to as an enterprise server, but embodiments are not so limited. For example, a Microsoft Exchange™ server is used as one example of a production server.
Clients include any client device or application that provides end-user access to production or enterprise servers. An example of a client is Microsoft Outlook™ but the embodiments described herein are not so limited.
The system 100 includes a production system and utility system. The production system, in an embodiment, includes production data and a production database. An embodiment of a production system includes one or more messaging and collaboration servers (e.g. electronic mail (email) servers) that can be local or distributed through the enterprise and either single-computer or clustered and replicated. An example of an email server is Microsoft Exchange™ Server but the embodiment is not so limited. Conventional access describes normal interaction between the production clients and production servers. In the case of Microsoft Exchange™ and Outlook™, for example, conventional access may include the MAPI protocol, but other protocols, such as IMAP4 and POP3, are also applicable.
The system 100 also includes a utility system. The utility system handles production data after it is produced. The utility system of an embodiment includes one or more data management functions accessible to various data management applications that benefit from access to data shadowed and further processed by the utility system. Data management applications include backup applications, monitoring applications, compliance applications, audit applications, etc. The utility system referred to is intended to encompass the embodiments of data surrogation, including the shadowing methods and apparatus, as disclosed.
Throughout the disclosure, where a database is shown or described, one or more corresponding servers are implied, even if not shown or described. For example, a production database implies a production server, and a utility database implies a utility server. In various embodiments described herein, the utility server is a near-line server including the data surrogation or shadowing methods and apparatus described and claimed herein. Embodiments of the data surrogation or shadowing methods and apparatus described herein include products available from Mimosa Systems, Inc., of Santa Clara, Calif., including the NearPoint™ for Microsoft® Exchange Server Disaster Recovery Option. Embodiments of the data surrogation or shadowing methods and apparatus include an add-on module that integrates with a near-line server. In an embodiment, the near-line server is a NearPoint™ server, available from Mimosa Systems.
Shadowing generates shadow data that provides a relationship between the production data on the enterprise production system and the data on the utility system. The utility system stores the shadow data in a shadow database, also referred to as a shadow repository. The utility system can optionally leverage near-line storage to reduce costs.
In an embodiment, shadowing is a method that maintains a relatively up-to-date copy of production enterprise data in a data surrogate, which in this case includes the shadow database. This data may be optionally translated into multiple alternate formats and augmented with metadata.
The production and/or utility systems can be single computers or they may be clustered, replicated and/or distributed systems. The production and/or utility systems can be in the same data center or they can be remote. In an embodiment, the primary connectivity between the production system and the utility system is through a local area network (LAN), a metropolitan area network (MAN) or a wide area network (WAN). An optional storage area network (SAN) can be used for the data access and data movement.
As referred to herein, clients and servers can be any type and/or combination of processor-based devices. Reference to a system and/or a server in the singular may include multiple instances of that system or server. Couplings between various components of the system embodiments described herein can include wireless couplings, wired couplings, hybrid wired/wireless couplings, and other network coupling types, as appropriate to the host system configuration. The network components and/or couplings between system components can include any of a type, number, and/or combination of networks and the corresponding network components including, but not limited to, wide area networks (WANs), local area networks (LANs), metropolitan area networks (MANs), proprietary networks, backend networks, and the Internet, to name a few. Use herein of terms like transport, interconnect, or network is inclusive of a conventional Ethernet, a Storage Area Network (SAN), and/or other type of network. The protocols may be inclusive of Transmission Control Protocol (TCP)/Internet Protocol (IP) (TCP/IP) and layered protocols, Internet Small Computer System Interface (SCSI) (iSCSI), Fibre Channel, InfiniBand, HyperTransport (HT), Virtual Interface (VI), Remote Direct Memory Access (RDMA), and a range of other protocols.
FIG. 2 is a block diagram of a system 200 that includes a production system with multiple production servers and corresponding databases. In an embodiment, the production servers are messaging servers, and the databases are messaging databases, but embodiments are not so limited. Production servers can include messaging servers, collaboration servers, portals, or database servers. Production servers host a variety of structured, semi-structured, and unstructured data. These servers may be individual, clustered, replicated, constituents of a grid, virtualized, or any combination or variation. An example that is used for illustration purposes is a Microsoft Exchange™ Server but the embodiments described herein are not so limited.
A utility system includes a shadow repository, as previously described. The shadow repository includes shadow data that is received from one or more of the messaging databases. A capture component obtains a copy of production data, and an application (or “apply”) component keeps the shadow data up-to-date, as further described below.
The capture component is configured to reduce disruption of production system operations. The capture component is able to capture the production data in a scalable and high-performance manner, securely and reliably. The data captured may be referred to variously herein as data, production data, the production database, etc. In general, the data captured is a production database file that includes one or more of application data, databases, storage groups, mailbox data, and server data.
The capture component supplies data to the shadow repository to keep the shadow copy as up-to-date as possible with high efficiency and low cost. The capture component can include backup, snapshots, replication, and continuous data protection (CDP) methods but is not so limited. Various capture components configured for use in an embodiment are described in detail below.
The apply component is intrinsic to a data type in an embodiment. In an alternative embodiment, the apply component is retro-fitted to work with the particular data type. Typically enterprise applications reside on relational databases. Relatively more capable databases such as Oracle™, DB2™ and Microsoft SQL™ Server offer log shipping mechanisms that facilitate direct re-use for application. However relatively less-capable databases and/or other semi-structured or unstructured data do not include log shipping capabilities. Microsoft Exchange™ Server is an example of an enterprise server that resides on a database that does not support log shipping. The shadowing described herein provides log-shipping capability in support of the shadowing of databases and/or other semi-structured or unstructured data.
An extraction (or “extract”) component of an embodiment optionally transforms data formats from a relatively dense application format to a format that is directly usable by data management applications. The extract component provides high-performance, scalable, lossless, flexible and extensible data transformational capabilities. The extraction capabilities described herein are not present in systems such as the Microsoft Exchange™ Server. For example, the Microsoft Exchange™ Server provides a messaging application programming interface (MAPI) and protocol that is relatively difficult to deploy on a remote utility or management server, and generally does not meet the performance and scalability requirements of management applications.
An indexed object repository (IOR) includes extracted (or transformed) data objects in an object database, and metadata related to the objects in a metadata database, or “metabase”. As used herein, object denotes a data item in an application-aware format. An example of an object stored in the object database is an email message body, but there are many other examples.
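As a rough illustration of the two-part structure described above, the following sketch pairs an object store with a separate metadata store. The class name, the in-memory dictionaries, and the `put` and `find` methods are hypothetical and stand in for whatever databases an actual embodiment uses.

```python
from dataclasses import dataclass, field


@dataclass
class IndexedObjectRepository:
    """Toy repository: objects in one store, their metadata in another."""
    objects: dict = field(default_factory=dict)   # object database
    metabase: dict = field(default_factory=dict)  # metadata database

    def put(self, object_id: str, body: str, **metadata) -> None:
        """Store an extracted object and record metadata about it."""
        self.objects[object_id] = body
        self.metabase.setdefault(object_id, {}).update(metadata)

    def find(self, **criteria) -> list:
        """Return ids whose metadata matches every criterion."""
        return sorted(
            oid for oid, meta in self.metabase.items()
            if all(meta.get(k) == v for k, v in criteria.items())
        )


ior = IndexedObjectRepository()
ior.put("m1", "Quarterly numbers attached.", sender="alice", kind="email")
ior.put("m2", "Lunch?", sender="bob", kind="email")
```

Keeping metadata apart from object bodies lets a query such as `ior.find(sender="alice")` run without touching the stored objects themselves.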
An optional filter provides the data management applications with an API or Web Service capability for tuning or parameterizing the extract process.
An optional indexing mechanism operates on the data and metadata in the indexed object repository looking for patterns and relationships. When the indexing mechanism finds relevant information, it enhances the metadata with this new information. Optionally the indexing mechanism may be guided by a data management application through the filter.
In an embodiment, data management applications have API or Web Service access to the aggregate data as it is being semantically indexed. For example, the data management applications can get proactive notifications and callbacks when relevant additional data or metadata has been added to the indexed object repository. In an embodiment, the utility system is actively involved in influencing, guiding, participating in, or extending the function of the production servers. Applications that are part of the utility system can become active or passive participants in the production server workflow through positive or negative feedback loops and augmentation of the production server function to solve existing pain points or improve productivity through value additions.
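The proactive notifications mentioned above could take the form of registered callbacks, sketched here under stated assumptions. The `NotifyingMetabase` class and its `subscribe` and `add_metadata` methods are invented for illustration and do not correspond to any API named in the specification.

```python
from typing import Callable


class NotifyingMetabase:
    """Toy metadata store that calls back subscribers when metadata is added."""

    def __init__(self) -> None:
        self.metadata: dict = {}
        self._subscribers: list = []

    def subscribe(self, wanted_key: str,
                  callback: Callable[[str, str, str], None]) -> None:
        """Register interest in one metadata key."""
        self._subscribers.append((wanted_key, callback))

    def add_metadata(self, object_id: str, key: str, value: str) -> None:
        """Record new metadata, then notify every matching subscriber."""
        self.metadata.setdefault(object_id, {})[key] = value
        for wanted_key, callback in self._subscribers:
            if wanted_key == key:
                callback(object_id, key, value)


alerts = []
metabase = NotifyingMetabase()
metabase.subscribe("topic", lambda oid, key, value: alerts.append((oid, value)))
metabase.add_metadata("m1", "sender", "alice")  # no subscriber for "sender"
metabase.add_metadata("m1", "topic", "merger")  # triggers the callback
```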
The embodiment of FIG. 2 includes a configuration with three messaging servers and one near-line server. Other deployment variations are possible, including a variable number of homogeneous or heterogeneous production servers, and a complex near-line server that may be clustered, distributed, part of a grid, or virtualized. Although FIG. 2 shows three messaging servers, it is possible to provide equivalent services to multiple, arbitrary homogeneous or heterogeneous servers. Although FIG. 2 shows a single near-line server, it may in actuality be clustered, distributed, replicated, virtualized, and may straddle multiple machines or sites.
Embodiments of a shadowing method are described herein with reference to an example host system. The shadowing is described in the context of providing log shipping of the application component for a Microsoft Exchange™ Server as an example, but the shadowing described herein is not limited to the Microsoft Exchange™ Server.
FIG. 3 is a block diagram showing a capture component, an apply component, and an extract component under an embodiment. The capture generates or provides a baseline full copy of the production data. This full copy data can be directly passed to an extraction component for converting the dense application format into another format desirable to post-processing entities. An embodiment can optionally include cleansing and/or repairing of the full copy data prior to extraction when the capture component does not provide application consistent data. In embodiments to be further described below, log files (“logs” 1 and 2 are shown as an example) are shipped from the production system as they are generated, and are applied to the full copy to keep it up-to-date as a shadow copy of the production database.
The capture component of shadowing is configured to use one or more data capture capabilities that can include backup, snapshots, replication, and/or continuous data protection. FIG. 4 is a block diagram of backup capture used in shadowing, under an embodiment. The backup capture uses the backup APIs provided by the application running on the production system. In this example the production system is Microsoft Exchange™ Server but is not so limited. The utility system is configured to obtain occasional full backups and frequent incremental or differential backups. Both these mechanisms typically run on a default or administrator-configured schedule. Other enhancements or variations include the ability to detect that new log files have been generated on the production system and pull a copy over (“dynamic log shipping”), or mechanisms for “tailing” the log files as they are being written on the production system.
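As an illustration of the distinction between incremental and differential backups mentioned above, the following Python sketch models each backup instance as a dictionary snapshot. It relies on the conventional meaning of the terms (an incremental delta is taken against the previous instance, a differential delta against the baseline full copy). The function names are hypothetical, and deletions are ignored for brevity.

```python
_MISSING = object()  # sentinel so that a stored value of None still compares correctly


def changes(old, new):
    """Entries of `new` that were added or changed relative to `old` (deletions ignored)."""
    return {key: value for key, value in new.items() if old.get(key, _MISSING) != value}


def incremental_deltas(snapshots):
    """Each delta records the changes since the immediately preceding instance."""
    return [changes(prev, cur) for prev, cur in zip(snapshots, snapshots[1:])]


def differential_deltas(snapshots):
    """Each delta records the changes since the baseline (first) full copy."""
    baseline = snapshots[0]
    return [changes(baseline, cur) for cur in snapshots[1:]]
```

Under this model, restoring from incremental deltas requires applying every delta in order, whereas restoring from a differential delta requires only the baseline and the latest delta.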
FIG. 5 is a block diagram of snapshot capture used in shadowing, under an embodiment. The snapshots of snapshot capture are either crash consistent or application consistent. Typically “hot split” snapshots that are obtained by breaking mirrors without application involvement tend to be crash consistent. An example of an application consistent snapshot mechanism is Microsoft Data Protection Manager™. The snapshots can either be local, which requires the management server to be co-located in the same data center, or the snapshots can be remote. The production and utility systems can be single computers, or they may be clustered, replicated and/or distributed. The transports for control and communication are typically LAN, MAN or WAN. An optional SAN can facilitate efficient data movement.
For snapshots that are crash consistent, additional mechanisms can be used to validate the snapshots for consistency (and perhaps repeat the process until a reasonably consistent copy is available). The additional mechanisms can cleanse and/or repair the data in order to make it ready for application.
FIG. 6 is a block diagram of replication capture used in shadowing, under an embodiment. The replication can be local within a data center, or it can be remote over a MAN, WAN or SAN. The replication maintains a replica on the utility system that can be used for capture. Conventional replication shares the characteristics of crash consistent mirrors, and the replication can be annotated by an “event stream” that captures points in time that are likely to be application consistent. The production and utility systems can be single computers, or they can be clustered, replicated and/or distributed. The transports for control and communication include LAN, MAN and/or WAN. An optional SAN can facilitate efficient data movement.
The capture of production data using replication includes use of replication techniques that capture every relevant write at the source (e.g., the production system) and propagate the captured writes to the target (e.g., the utility system) to be applied to the copy of the data to bring it up-to-date. This replication can be synchronous, asynchronous, or a quasi-synchronous hybrid. The production and utility systems may be single computers, or they may be clustered, replicated or distributed. As in the case of snapshot capture, additional mechanisms can be used to validate the snapshots for consistency and cleanse and/or repair the data in order to make it ready for application.
FIG. 7 is a block diagram of CDP capture used in shadowing, under an embodiment. A capture component provides a stream of changes that have occurred on the production system, and provides the ability to move to “any point in time” (APIT). The stream of changes (APIT) of an embodiment is annotated with an event stream that synchronizes with events on the production system. A locator module can be configured to select the most appropriate points in time for use for application. The production and utility systems can be single computers, or they can be clustered, replicated and/or distributed systems. The transports for control and communication include LAN, MAN or WAN. An optional SAN facilitates efficient data movement.
FIG. 8 is a block diagram showing generation of an incremental or differential update of log files from the production system, under an embodiment. The updating of log files (also referred to herein as logs or transactional logs) includes adding data from the capture operation to the shadow repository with the previous database and logs. The update of logs includes an apply, or log apply operation (also known as log shipping) that applies the logs to the database to bring it up-to-date.
The update of logs can optionally include an extract operation. The extract operation is performed on the data resulting from the log apply operation to transform the resulting data from dense application format to one or more target formats for subsequent consumption by various data management applications.
FIG. 9 is a block diagram of a system 900 that includes shadowing using retrofitted log shipping to create synthetic fulls according to an embodiment. System 900 includes a production system that performs write-ahead logging. For purposes of illustration, FIG. 9 will be described with reference to Microsoft Exchange™ as a component of the production system, but embodiments are not so limited. The production system includes a Microsoft Exchange™ server and a Microsoft Exchange™ database, in an embodiment. The production system includes one or more databases, although only one is shown.
An application communicates with the production database (which, in the case of Microsoft Exchange™ is called an Exchange database or EDB). When the application detects a change to the database, it performs write-ahead logging to a log file. This includes appending information to the log file, which is much faster than traversing the database structure and updating the database each time there is a change. The information appended to the log file reflects the particular change made to data in the database.
A lazy writer takes all of the logged, but not committed, changes to the database and writes them to disk. One reason to use these log files is that, if the system suddenly crashes, the system can replay the log files when it comes back up, thus recovering all the lost data. Write-ahead logging is usually used for database systems, but other systems may have different ways of handling changes to data.
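The write-ahead logging behavior described above can be sketched in Python as follows. This is a toy in-memory model intended only to show the idea of a fast append, a lazy writer, and replay after a crash. The class and method names are hypothetical and do not correspond to any Exchange™ or ESE interface.

```python
class WalStore:
    """Toy key-value store with a write-ahead log (illustrative only)."""

    def __init__(self):
        self.log = []        # append-only list of (key, value) change records
        self.database = {}   # the slower, structured store
        self.flushed = 0     # number of log records already written to the database

    def write(self, key, value):
        # Fast path: a change is only appended to the log.
        self.log.append((key, value))

    def lazy_flush(self):
        # Lazy writer: push logged-but-unwritten changes into the database.
        for key, value in self.log[self.flushed:]:
            self.database[key] = value
        self.flushed = len(self.log)

    def recover(self):
        # After a crash, replay the whole log to rebuild the database state.
        self.database = {}
        self.flushed = 0
        self.lazy_flush()
```

In this sketch a crash is modeled by discarding `database`; because every change was appended to `log` first, `recover()` can reconstruct the state.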
Another way of using log files in database systems is for creating a mirror database to provide a backup in the event of server loss or site loss. This is referred to variously as log shipping, log-apply, or synthetic fulls. Each of these terms refers to methods that take incremental changes to a production server and apply them to a database copy on a utility server to bring the copy up-to-date. Log shipping is not supported by some systems, including Microsoft Exchange™. The inability to support log shipping introduces significant limitations on data backup operations, data archiving operations, and data discovery operations. For example, conventionally, third-party applications designed to provide data backup, data archiving and data discovery operations to Microsoft Exchange™ (or other systems without log shipping capabilities) go into the EDB and obtain the bulk version of the database. If such an application repeatedly obtains the bulk database without applying the log files, many databases and many log files are accumulated, which becomes very cumbersome. Then, in order to restore data back to Exchange™, all of the accumulated log files must be applied to the EDB at the time of restoration. This makes the recovery time objective (RTO) of such conventional third-party applications very long.
Performing shadowing with synthetic fulls as described herein allows the log files to be consumed as they are generated, resulting in an improved RTO. In addition, because a copy of the current EDB (including applied log files) is available, extraction and transformation to brick form, according to embodiments to be described, becomes possible.
System 900 further includes a utility system with a shadow repository and an IOR according to an embodiment. Initially, the production database is copied from the production system to the shadow database on the utility system. In addition, log files are shipped from the production system to the shadow repository as they are generated. The shadow repository in an embodiment also stores STM files. STM files are files in a well-known format for multi-media, typically emails.
In an embodiment, each time a log file is generated it is received by the utility system and applied to the shadow database according to a retro-fitted log shipping operation. Alternatively, the log files can be batched before applying. Data in the shadow database is extracted to the indexed object repository in an application-aware manner and stored in such a way as to be easily located and accessed, even by data management applications external to the utility system.
FIG. 10 is a block diagram of a process of obtaining and applying log files, according to an embodiment. The extensible storage engine (ESE) or “engine” (also referred to as a recovery engine herein), used by Microsoft Exchange™, also known as JET Blue, is an indexed sequential access method (ISAM) data storage technology from Microsoft. The engine allows client applications to store and retrieve data via indexed and sequential access. In an embodiment for shadowing a production database, the engine is invoked by the utility system, directed to the database (EDB in this case) and used to facilitate shadowing, including log shipping, and log application.
In an embodiment, an EDB header is made to point to a particular log file number as a starting log file number, and the engine is run. The engine goes through each log file and checks for integrity, for example by checking the checksums. The engine begins applying transactions from the log files into the shadow database. The engine moves through the log files in sequence, applying each log file. For example, log files 1-4 are shown in FIG. 10. When the engine finishes applying the last log file (log file 4), the database enters a “recovered” state which indicates that the data is ready to be restored to the production database. In the recovered state, no more log files can be applied to the database. This state is referred to as “clean shutdown” state in Microsoft Exchange™. This behavior is an artifact from when tape was the dominant backup storage medium. For example, if backups are stored to tape and retrieved from tape, there should never be a need to apply log files more than once. Thus, after a one-time application of log files, the EDB automatically enters a state in which no more logs can be applied. Conventionally, when the production database is backed up, it is transferred in “backed-up” state, which is the state in which log files can be applied. This state is referred to as “dirty shutdown” state in Microsoft Exchange™.
According to an embodiment, in order to apply log files at any time, the EDB is allowed to go into clean shutdown state after the last log file (for example, log file 4). Then the EDB header is modified to indicate that it is in dirty shutdown state. When the utility system is ready to apply a new set of log files, the EDB will be in dirty shutdown state and the engine will be able to apply the log files. This is referred to as toggling the dirty bit(s) in the appropriate header field of the EDB. The EDB and EDB header are specific to certain embodiments, and are not meant to be limiting. In various embodiments, other systems may use different databases in which there are headers or other structural metadata that can be manipulated to achieve the result of allowing application of log files using the database engine as described. The engine may be any recovery engine employed to recover a database including application of changes made to the database, but not yet applied to the database.
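The header toggling described above can be sketched in Python as follows. The sketch assumes a simplified in-memory stand-in for the database header and the recovery engine. The names `ShadowDb`, `run_engine`, and `apply_cycle` are hypothetical, and the real EDB header layout and ESE invocation are not modeled.

```python
from dataclasses import dataclass, field

CLEAN = "clean_shutdown"   # recovered state: no more logs can be applied
DIRTY = "dirty_shutdown"   # backed-up state: logs can be applied


@dataclass
class ShadowDb:
    state: str = DIRTY
    next_log: int = 1                      # first log number not yet applied
    applied: list = field(default_factory=list)


def run_engine(db, logs):
    """Stand-in for a recovery engine that refuses to run on a 'clean' database."""
    if db.state != DIRTY:
        raise RuntimeError("clean shutdown state: logs cannot be applied")
    for number in sorted(logs):
        if number >= db.next_log:
            db.applied.append(number)
            db.next_log = number + 1
    db.state = CLEAN                       # the engine leaves the database recovered


def apply_cycle(db, logs):
    """Apply one batch of logs, then toggle the header back to the dirty state."""
    run_engine(db, logs)
    db.state = DIRTY                       # permits a later cycle to apply more logs
```

Calling `apply_cycle` repeatedly with successive batches succeeds, whereas calling `run_engine` twice in a row raises an error on the second call, which mirrors the one-time log application behavior described above.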
FIG. 11 is a flow diagram illustrating an embodiment of a shadowing process including applying log files according to an embodiment. The process starts, and it is determined whether it is the first time the shadowing process has been run. The first time the process has been run may occur when the shadow repository is empty, or when the utility system and/or the shadowing components have just been installed, or when a new repository has been created. If it is the first time the process has been run, a full copy of the production database is acquired. This involves completely copying the production database into the shadow database.
If it is not the first time the process has been run, an incremental copy is acquired. In order to obtain the incremental copy, it is determined whether there are sufficient un-applied logs present. If sufficient un-applied logs are not present, the process waits for sufficient logs. In one embodiment, this includes going back to the initial starting point. If there are sufficient un-applied logs, it is determined whether the logs are in sequence. If the logs are not in sequence, they cannot be applied, and a full copy of the database is obtained. Alternatively, the production system is accessed specifically to acquire the “missing” log files. Logs must be in sequence because of their nature as multiple transactions that may have interdependencies. In a manner that is analogous to the area of microprocessor instructions, for example, database transactions can be committed or uncommitted.
If there are sufficient log files, the appropriate EDB headers are updated. In practice, there are multiple EDBs, so there are multiple EDB headers. The headers are updated to reference the first log file that has not been applied. The database recovery engine, in this case the ESE, is invoked. The engine is used to replicate the EDB by applying the log files. The replicated EDB is used for later transformation from bulk-to-brick according to an embodiment to be later described.
The EDB headers are updated to indicate dirty shutdown state, and the process returns to the starting point.
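The decision points of the FIG. 11 flow can be summarized in a short Python sketch. It assumes that log files are identified by consecutive integers and that "sufficient" means a minimum count. The function name, parameters, and returned labels are hypothetical.

```python
def next_action(repository_empty, next_expected, unapplied, minimum=1):
    """Return which step of the shadowing flow would run next.

    repository_empty: True on the first run, when no shadow copy exists yet.
    next_expected:    number of the first log file not yet applied.
    unapplied:        numbers of the log files received but not yet applied.
    minimum:          how many un-applied logs count as sufficient.
    """
    if repository_empty:
        return "full_copy"
    if len(unapplied) < minimum:
        return "wait"
    ordered = sorted(unapplied)
    expected = list(range(next_expected, next_expected + len(ordered)))
    if ordered != expected:
        # A gap means the logs cannot be applied as they stand.
        return "full_copy_or_fetch_missing"
    return "apply_logs"
```

For example, with `next_expected=5`, the set `[5, 6, 7]` can be applied, while `[5, 7]` cannot because log 6 is missing.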
FIG. 11 illustrates an embodiment for a production database system that does not support log shipping. Embodiments are also applicable to other systems, for example file systems. To keep an updated copy of a set of files, the process starts by acquiring a set of all the files. Later, all the files in the file system that have changed are obtained, and the previous copy is overwritten. Alternatively, just the differences can be obtained and applied to the previous copy. That is another example of a synthetic full. Embodiments of retrofitted log shipping apply to any application data, or unstructured data.
Whether or not log files are retained by the shadowing process, and how long log files are retained, depends on whether the log files include any uncommitted transactions. As previously mentioned, each log file could include several transactions and several of the transactions could be outstanding. At some point there is a “begin” transaction, and at another point there is a corresponding “end” transaction. When a “begin” transaction is encountered by the shadowing process, it is bracketed. The brackets are closed when the corresponding “end” transaction is encountered. All of the transactions between the “begin” transaction and a later “end” transaction are saved until it is confirmed that every transaction in the bracketed chain completed successfully. If every transaction did not complete successfully, all of the transactions in the chain are rolled back. Retention of the appropriate log files facilitates rollback. Accordingly, the log files are accumulated, and as they are applied, a check is made for outstanding transactions. If there are no outstanding transactions associated with a log file, the log file is deleted. If there are outstanding transactions associated with the log file, the log file is saved.
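The retention rule described above can be sketched in Python as follows. The record format is an assumption made for illustration: each log file is a list of `(kind, txn_id)` pairs, where `kind` is `"begin"`, `"end"`, or any other label for an intermediate operation. The function name is hypothetical.

```python
def logs_to_retain(log_files):
    """Return the sorted numbers of log files that still hold outstanding transactions.

    log_files maps a log number to a list of (kind, txn_id) records.
    """
    open_txns = {}  # txn_id -> set of log numbers containing records of that transaction
    for number in sorted(log_files):
        for kind, txn in log_files[number]:
            if kind == "end":
                open_txns.pop(txn, None)                        # bracket closed
            else:
                open_txns.setdefault(txn, set()).add(number)    # bracket open
    keep = set()
    for numbers in open_txns.values():
        keep |= numbers
    return sorted(keep)
```

Any log file not returned by this function has no outstanding transactions associated with it and could be deleted under the rule above.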
FIG. 12 is a flow diagram of a process of shadowing according to another embodiment in which a database recovery engine that is part of the production system is directed to a copy of the production data and used to facilitate shadowing and log shipping. In an example, the database recovery engine is part of the “Jet Blue” Exchange™ database engine (an extensible storage engine (ESE)) and is directed to the EDB, but embodiments are not so limited. FIG. 12 illustrates an alternative to the method described with reference to FIG. 11 for preventing the EDB from entering a recovered state. FIG. 12 illustrates a continuous log apply process according to which the recovery engine is stalled in order to allow the engine to apply logs multiple times.
A production system includes a production database, such as an EDB, a production database application, such as Exchange™, and log files (or “logs”). A utility system includes a shadow database and multiple log files transferred from the production system. A copy of the production data is received by an embodiment of the utility system. Initially, a baseline copy of the entire production database file is received and stored in a shadow repository. As delta data is generated by the production system, the delta data is received by the utility system. Delta data is any data that records changes made to the database file. In an embodiment, the delta data is one or more log files. In an embodiment, the log files are shipped to a near line server of the utility system from a remote Exchange™ Server. In an embodiment, the frequency of log shipping is pre-defined by a schedule, but the frequency could be determined otherwise, such as by an administrator through a data management application, or the log shipping may be event-driven.
The delta data is applied to the copy using the recovery engine. In systems such as Exchange™ that do not have log shipping capability, after logs are applied, the state of the database being operated on is changed to disallow the further application of log files. In an embodiment, the copy is prevented from entering this state by stalling the recovery engine. When additional log files are ready to be applied, the recovery engine is unstalled, and the additional log files are applied.
A new set of log files is introduced into the shadowing process. One of the log files of the set is replicated and stored. The original copy of the replicated log file is then modified in such a manner as to stall the recovery engine. There may be several possible mechanisms for stalling the recovery engine. One example introduces an exception that occurs during access to the modified log file, which is caught and post-processed by the recovery engine application process.
The recovery engine is directed to resume applying logs from the most recent log application cycle. The Jet Blue engine may be running as part of a larger aggregate system, it may be running on its own, or it may only have essential components reconstituted so that the effect of the Jet Blue engine log application (e.g., recovery) is achieved. In addition it may be possible to have a replacement system that might replicate the necessary capabilities of the Jet Blue engine in order to accomplish the log application process.
The recovery engine applies the logs to the database until it encounters the modified log file, which stalls the Jet Blue engine. This prevents the database from entering a state in which no further logs can be applied.
The replicated log file is then substituted for the modified log file. At this point the shadowing process is ready for a subsequent set of log files and a consequent log application cycle. The process described above can be resumed and replayed every time a new set of logs is received from the production system.
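The stall-and-substitute cycle described above can be sketched in Python as follows. The sketch assumes that the modified log file is the last one of the set and models the modification with a `None` placeholder that triggers an exception, which is only one of the possible stalling mechanisms. All names are hypothetical, and the actual recovery engine is not invoked.

```python
class StallSignal(Exception):
    """Raised when the engine reaches the deliberately modified log file."""


def apply_until_stall(database, logs, start):
    """Apply logs in numeric order from `start` until the modified log is reached."""
    number = start
    while number in logs:
        if logs[number] is None:          # None marks the modified log file
            raise StallSignal(number)
        database.extend(logs[number])     # apply the transactions of this log
        number += 1


def stalled_apply_cycle(database, logs, start):
    """Run one log application cycle and return the log number to resume from."""
    last = max(logs)
    replica = logs[last]                  # replicate one log file and store the replica
    logs[last] = None                     # modify the original so that it stalls the engine
    try:
        apply_until_stall(database, logs, start)
    except StallSignal:
        pass                              # expected: the engine stalled before finishing
    finally:
        logs[last] = replica              # substitute the replica for the modified file
    return last
```

Because the engine never finishes the final log of a set, the database in this model never reaches the state in which further logs are refused. The held-back log is applied at the start of the next cycle.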
The process illustrated in FIG. 12 is described in relationship to Microsoft Exchange™. However, the process is applicable to other messaging and collaboration servers. The process is also extensible to generic applications that use structured, semi-structured, or unstructured data. Though this example shows a production database or server, it is possible to provide equivalent services to multiple homogeneous or heterogeneous databases or servers. Similarly, though this example describes a single shadow database, which in an embodiment includes a near line server, in various embodiments, the shadow database may be clustered, distributed, replicated, virtualized, and may straddle multiple machines or sites.
FIG. 13 is a block diagram of a utility system architecture having the data surrogation capabilities described herein, according to an embodiment. The utility system includes one or more near-line servers (one is shown for convenience) which communicate with a shadow database, a diff database, and an indexed object repository (IOR) database. The utility system further includes one or more SQL servers. An SQL server is a relational database management system (RDBMS) produced by Microsoft. Its primary query language is Transact-SQL, an implementation of the ANSI/ISO standard Structured Query Language (SQL). Other RDBMSs can also be used. Also, more than one SQL server may be used. The SQL server communicates with an SQL database and a log database that stores log files.
The utility system further includes a framework, multiple handlers, and queues (for example, a notification queue and a task queue are shown). The utility system further includes a workflow. In an embodiment, the utility system receives a request. Examples of a request include a timer being activated, or a user or administrator making a request. The request manifests itself as a notification, which is placed in the notification queue. The framework grabs the notification from the notification queue and looks it up in the workflow to determine how to handle the particular notification. The framework looks up the workflow and then calls the appropriate handler depending on what it learned from the workflow. The framework places the notification in the task queue. The handler takes the notification from the task queue and proceeds to handle it as appropriate.
The framework determines whether the request has been successfully handled, and determines what to do next. The framework looks to the workflow to get the next notification and call the next handler, and the process continues. This architecture allows “hot code load”. For example, in an embodiment, the utility system software code, including the code related to the data surrogation capabilities described herein, is written in the form of handlers. This is advantageous, especially in the situation of a system in the field, because the system can be easily updated by simply installing one or more new handlers. If there are any issues with a new handler, the new handler can be discarded in favor of the handler it was meant to replace.
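The interaction of the framework, workflow, queues, and handlers described above can be sketched in Python as follows. The workflow representation (a mapping from a notification to a handler name and the next notification) and all names are assumptions made for illustration.

```python
from collections import deque


def run_framework(workflow, handlers, first_notification):
    """Drive notifications through handlers and return a trace of what ran.

    workflow maps a notification to (handler name, next notification or None).
    handlers maps a handler name to a callable that returns True on success.
    """
    notification_queue = deque([first_notification])
    task_queue = deque()
    trace = []
    while notification_queue:
        notification = notification_queue.popleft()
        handler_name, following = workflow[notification]   # look up the workflow
        task_queue.append(notification)                     # framework queues the task
        task = task_queue.popleft()                         # handler takes the task
        succeeded = handlers[handler_name](task)
        trace.append((handler_name, succeeded))
        if succeeded and following is not None:
            notification_queue.append(following)            # continue with the next step
    return trace
```

In this model, "hot code load" corresponds to replacing an entry in the `handlers` mapping between runs. Restoring the previous entry discards the new handler in favor of the one it replaced.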
Many variations of retrofitting synthetic full copies are contemplated to be within the scope of the claimed invention. In various embodiments, log shipping is dynamic, in that log files are transferred to the utility system as they are generated and applied as they are generated. This is in contrast to prior systems in which the log files are accumulated and only applied, for example, in the case of a failure of the production server. Dynamic log shipping and application in various embodiments is event driven or occurs according to a pre-defined schedule. Dynamic log shipping provides a further improvement of the recovery point objective (RPO). In one embodiment of dynamic log shipping, the data surrogation or shadowing process receives a notification whenever a new log file is filled up in the production server. The new log file is then transferred to the utility system for subsequent application. The RPO is optimized because in case of a catastrophic failure in Exchange that results in all logs being lost on the production server, the window of data loss is bracketed by the content of a single or partial log file.
In an embodiment, shadowing includes monitoring. For example, a change to the production data is detected and a notification is issued, causing the notification to be handled. This may be accomplished in a manual manner through user intervention or alternatively through automatic notification. The automatic notification may be event driven or it may be scheduled and batched in some manner.
The log transfer process is optional in situations where the shadowing or data surrogation mechanism is co-resident on the production system or server, hence allowing direct access to the production database and log files. This optional transfer may occur over some form of network or equivalent mechanism. This optional process may occur lazily, or eagerly, or in some batched combination.
In various embodiments, the shadow database may be made available to data management applications as the actual data that is being modified by the process, as a copy of that data, or as some combination thereof. This availability may be in the form of an API, a web service, or an equivalent.
In various embodiments, a log file that has been shipped to, or made available to, the data surrogation mechanism is immediately applied to the shadow database in order to bring it up-to-date. This lowers the utility or near-line window since changes that occur on the messaging server become more immediately visible on the near-line server. Other alternatives exist that might include batching the log files and then making decisions regarding batching and lazy application, perhaps for performance optimization of the utility or near-line server. In other embodiments, the logs are post-processed before they are applied, for example to filter for relevance, or to filter out undesirable content.
In yet other embodiments, log tailing is incorporated into the data surrogation or shadowing process. Dynamic log shipping brings down the RPO to the contents of a single log file, or less. Log tailing may also be used to further reduce the RPO since the logs are being continually captured as they are written on the production messaging server and then shipped over and applied on the utility or near-line server. According to such embodiments, the modifications that are occurring to the current transaction log are being immediately captured and shipped over to the utility server for application. This could improve the maintenance of the data surrogate from near real-time to real-time. In one example the log files are propagated and applied asynchronously. Other alternatives are possible, such as synchronous application. In addition, rather than apply changes immediately on the utility server, it is possible to batch the changes and apply them lazily.
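Log tailing as described above can be sketched in Python as follows. The sketch assumes that the current transaction log is an ordinary append-only file that can be read while it is being written, and it uses a temporary file in place of a production log. The function names are hypothetical.

```python
import os
import tempfile


def tail_once(path, offset):
    """Return (new_bytes, new_offset) for data appended to `path` since `offset`."""
    with open(path, "rb") as handle:
        handle.seek(offset)
        data = handle.read()
    return data, offset + len(data)


def demo():
    """Simulate a log being appended to while a tailer ships each new piece."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    shipped, offset = [], 0
    try:
        for record in (b"txn1;", b"txn2;"):
            with open(path, "ab") as log:
                log.write(record)                     # production side appends a record
            data, offset = tail_once(path, offset)
            shipped.append(data)                      # utility side receives only new bytes
    finally:
        os.remove(path)
    return shipped
```

Each call to `tail_once` ships only the bytes written since the previous call, so the amount of data at risk is bounded by the tailing interval rather than by the size of a whole log file.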
As individual transactions are being written to the write-ahead logs in the production server, they may be captured and transferred over to the near line server on the right and optionally reconstituted in an embodiment. The apply process as described herein may run on a schedule, be event driven, or run continuously. The apply process may optionally apply the transactions or the re-constituted logs to the shadow database to bring it up-to-date. In various embodiments, data management applications are concurrently able to access a recent copy of the shadow data.
The components of the multi-dimensional surrogation described above may include any collection of computing components and devices operating together. The components of the multi-dimensional surrogation can also be components or subsystems within a larger computer system or network. Components of the multi-dimensional surrogation can also be coupled among any number of components (not shown), for example other buses, controllers, memory devices, and data input/output (I/O) devices, in any number of combinations. Further, functions of the multi-dimensional surrogation can be distributed among any number/combination of other processor-based components.
The information management of an embodiment includes a method comprising receiving a copy of original data at a first server. The original data of an embodiment is stored at a second server. The method of an embodiment includes receiving delta data at the first server in a plurality of instances. The delta data of an embodiment includes information of changes to the original data. The method of an embodiment includes dynamically generating and maintaining an updated version of the copy at the first server by applying the delta data to the copy as the delta data is received.
The generating and maintaining of an embodiment is asynchronous with the receiving.
The applying of an embodiment is according to an interval. The interval of an embodiment is based on one or more of time and events at the second server.
The delta data of an embodiment includes data of an incremental difference between the original data at a plurality of instances.
The delta data of an embodiment includes data of a differential difference between the original data at a plurality of instances.
The method of an embodiment comprises controlling the applying using modified information of a component of the first server.
The component of an embodiment includes one or more of structural metadata of the copy and a log file of the delta data.
The method of an embodiment includes modifying the component.
The component of an embodiment is structural metadata of the copy. The modifying of an embodiment comprises detecting a first state of the copy, wherein the first state indicates the delta data has been applied to the copy. The modifying of an embodiment comprises changing the first state to a second state. The second state of an embodiment is a state from which another updated version can be generated by applying additional delta data to the updated version. Changing the first state to the second state of an embodiment includes modifying the structural metadata of the copy.
The component of an embodiment is a log file of a plurality of log files. The delta data of an embodiment is a plurality of log files. The applying of an embodiment includes invoking an engine of the second server and the terminating includes stalling the engine.
The first server of an embodiment includes a near-line server and the second server includes a messaging and collaboration server.
The information management of an embodiment includes a method comprising receiving a plurality of delta data at a first server. The delta data of an embodiment includes information of changes to original data of a second server. The method of an embodiment includes dynamically generating and maintaining an updated version of a copy of the original data at the first server by applying at least one of the plurality of delta data to the copy. The method of an embodiment includes controlling the applying using modified information of a component of the first server.
The component of an embodiment includes structural metadata of the copy.
The component of an embodiment includes a log file of the delta data.
The method of an embodiment includes modifying the component.
The component of an embodiment is structural metadata of the copy. The modifying of an embodiment comprises detecting a first state of the copy. The first state of an embodiment indicates the delta data has been applied to the copy. The modifying of an embodiment comprises changing the first state to a second state. The second state of an embodiment is a state from which another updated version can be generated by applying additional delta data to the updated version. Changing the first state to the second state of an embodiment includes modifying the structural metadata of the copy. The additional delta data of an embodiment is received after generating the updated version.
The applying of an embodiment includes invoking an engine of the second server. The method of an embodiment includes causing the engine to reference a first unapplied log file of the delta data, wherein the first unapplied log file is a first log file unapplied to the copy.
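As a hedged illustration of referencing the first unapplied log file, the sketch below assumes each log file carries an integer sequence number and that the first server tracks the sequence number of the last log applied to the copy. Both assumptions, and the function name `first_unapplied`, are introduced here for illustration.

```python
from typing import Optional, Sequence


def first_unapplied(received: Sequence[int], last_applied: int) -> Optional[int]:
    """Return the lowest received log sequence number not yet applied.

    Returns None when every received log has already been applied.
    """
    pending = [seq for seq in received if seq > last_applied]
    return min(pending) if pending else None
```

An engine invoked with this value as its starting point would resume with the first log that has not yet been applied to the copy, rather than replaying logs already reflected in it.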
The delta data of an embodiment is a plurality of log files. The component of an embodiment is a log file of the plurality of log files. The terminating of an embodiment comprises replacing the modified log file with the replicated log file. The applying of an embodiment includes invoking an engine of the second server and the terminating includes stalling the engine.
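One way to picture the interplay of the replicated log file, the modified log file, and the stalled engine is the sketch below. It assumes the selected log file is the last-received one and that the modification is a hypothetical `valid` flag the engine checks before applying a log; neither the flag nor the `LogFile` type comes from the source.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class LogFile:
    seq: int
    payload: str
    valid: bool = True  # hypothetical marker checked before a log is applied


def apply_until_modified(logs: List[LogFile], copy: List[str]) -> int:
    """Apply logs in sequence and stop at the deliberately modified last log.

    Returns the number of log files applied to `copy`.
    """
    if not logs:
        return 0
    replica = logs[-1]                        # replicated log file (kept aside)
    logs[-1] = replace(replica, valid=False)  # modified log file takes its place
    applied = 0
    for log in logs:
        if not log.valid:                     # engine stalls on the modified log
            break
        copy.append(log.payload)
        applied += 1
    logs[-1] = replica                        # replace modified log with the replica
    return applied
```

After the call, every log except the last has been applied, and the last log is back in its original form so it can be applied once a later log file arrives.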
The method of an embodiment includes receiving at the first server a copy of the original data from the second server. The copy of an embodiment is a full copy. The copy of an embodiment is an incremental copy.
The method of an embodiment includes transferring the updated version to an indexed object repository.
The generating of an embodiment is in response to at least one of an automatic trigger, a timer notification, an event notification, a poll, and a request.
The automatic trigger of an embodiment includes a trigger automatically initiated in response to at least one pre-specified parameter. The automatic trigger of an embodiment includes content of the updated version.
The timer notification of an embodiment includes notifications corresponding to scheduled events including at least one of maintenance operations, user activities, server activities, and data population operations.
The event notification of an embodiment includes notifications corresponding to changes to data of the original data.
The request of an embodiment includes at least one of access attempts and configuration attempts to the original data by one or more of users of the second server, servers and applications.
The first server of an embodiment includes a near-line server.
The generating of an embodiment is in near real-time and maintains complete integrity and consistency of the original data.
The second server of an embodiment includes a messaging and collaboration server.
The original data of an embodiment includes one or more of application data, databases, storage groups, mailbox data, and server data.
The method of an embodiment includes maintaining the updated version. The maintaining of an embodiment includes generating another updated version by applying at least one set of log files to the updated version. The at least one set of log files of an embodiment is received later in time than the plurality of log files.
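The maintaining step can be illustrated with a small sketch in which the copy is modeled as a dictionary and each log file as a list of key/value changes. This data model and the name `apply_log_set` are assumptions made for the example, not a description of an actual log format.

```python
from typing import Dict, Iterable, List, Tuple

Change = Tuple[str, str]  # (key, new value); a hypothetical delta record


def apply_log_set(version: Dict[str, str],
                  log_set: Iterable[List[Change]]) -> Dict[str, str]:
    """Return another updated version by applying one set of log files."""
    updated = dict(version)          # leave the earlier version untouched
    for log_file in log_set:         # log files are applied in arrival order
        for key, value in log_file:
            updated[key] = value
    return updated
```

Calling the function again with a set of log files received later yields the next updated version, which mirrors the repeated generation described above.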
The second server of an embodiment includes one or more of local servers, remote servers, database servers, messaging servers, electronic mail servers, instant messaging servers, voice-over Internet Protocol servers, collaboration servers, Exchange Servers, portals, customer relationship management (CRM) servers, enterprise resource planning (ERP) servers, business-to-business servers, and content management servers.
The information management of an embodiment includes a method comprising receiving a copy of original data at a first server. The original data is stored at a second server. The method of an embodiment includes receiving a plurality of delta data at the first server. The delta data of an embodiment includes information of changes to the original data. The method of an embodiment includes dynamically generating and maintaining an updated version of the copy at the first server by applying at least one of the plurality of delta data to the copy. The method of an embodiment includes controlling the applying using modified information of a component of the first server.
The information management of an embodiment includes a computer readable medium including executable instructions which, when executed in a processing system, support near real-time data shadowing by receiving a plurality of delta data at a first server. The delta data of an embodiment includes information of changes to original data of a second server. The instructions of an embodiment when executed dynamically generate and maintain an updated version of a copy of the original data at the first server by applying at least one of the plurality of delta data to the copy. The instructions of an embodiment when executed control the applying using modified information of a component of the first server.
The component of an embodiment includes structural metadata of the copy.
The delta data of an embodiment comprises at least one log file, and the component of an embodiment includes one of the at least one log files.
The information management of an embodiment includes a system comprising a near-line server coupled to one or more servers that include original data. The system of an embodiment includes a shadowing system coupled to the near-line server and configured to receive a copy of the original data. The shadowing system of an embodiment is configured to receive delta data in a plurality of instances. The delta data of an embodiment includes information of changes to the original data. The shadowing system of an embodiment is configured to dynamically generate and maintain an updated version of the copy at the near-line server by applying the delta data to the copy as the delta data is received.
The shadowing system of an embodiment is configured to generate and maintain asynchronously.
The delta data of an embodiment includes data of an incremental difference between the original data at a plurality of instances.
The delta data of an embodiment includes data of a differential difference between the original data at a plurality of instances.
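The distinction between incremental and differential delta data can be sketched as below, using the common backup sense of the terms: an incremental difference is taken against the immediately preceding instance, a differential difference against a fixed baseline instance. The snapshot model and function names are illustrative assumptions, and deletions are ignored for brevity.

```python
from typing import Dict, List

Snapshot = Dict[str, str]


def diff(old: Snapshot, new: Snapshot) -> Snapshot:
    """Keys whose values were added or changed between two snapshots."""
    return {k: v for k, v in new.items() if old.get(k) != v}


def incremental_deltas(snapshots: List[Snapshot]) -> List[Snapshot]:
    """Each delta is relative to the immediately preceding instance."""
    return [diff(a, b) for a, b in zip(snapshots, snapshots[1:])]


def differential_deltas(snapshots: List[Snapshot]) -> List[Snapshot]:
    """Each delta is relative to the first (baseline) instance."""
    base = snapshots[0]
    return [diff(base, s) for s in snapshots[1:]]
```

With incremental deltas, every delta in the sequence must be applied in order to reach the latest instance; with differential deltas, the baseline plus the most recent delta is sufficient.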
The shadowing system of an embodiment is configured to control the applying using modified information of a component of the near-line server.
The component of an embodiment includes one or more of structural metadata of the copy and a log file of the delta data.
The shadowing system of an embodiment is configured to modify the component.
The component of an embodiment is structural metadata of the copy.
Configured to modify of an embodiment comprises configured to detect a first state of the copy, wherein the first state indicates the delta data has been applied to the copy.
Configured to modify of an embodiment comprises configured to change the first state to a second state, wherein the second state is a state from which another updated version can be generated by applying additional delta data to the updated version.
Changing the first state to the second state of an embodiment includes modifying the structural metadata of the copy.
The delta data of an embodiment is a plurality of log files, wherein the component is a log file of a plurality of log files.
The applying of an embodiment includes invoking an engine of the one or more servers and the terminating includes stalling the engine.
The one or more servers of an embodiment include a messaging and collaboration server.
The information management of an embodiment includes a system comprising a near-line server coupled to one or more servers that include original data. The system of an embodiment includes a shadowing system coupled to the near-line server and configured to receive a copy of the original data. The shadowing system of an embodiment is configured to receive delta data that includes information of changes to the original data. The shadowing system of an embodiment is configured to dynamically generate and maintain an updated version of the copy at the near-line server by applying at least one of the plurality of delta data to the copy as the delta data is received. The shadowing system of an embodiment is configured to control the applying using modified information of a component of the near-line server.
The information management of an embodiment includes a system comprising a near-line server coupled to one or more servers. The system of an embodiment includes a shadowing system coupled to the near-line server and configured to receive delta data that describes incremental changes to original data of one or more servers. The shadowing system of an embodiment is configured to dynamically generate and maintain an updated version of a copy of the original data at the near-line server by applying at least one of the plurality of the delta data to the copy. The shadowing system of an embodiment is configured to control the applying using modified information of a component of the near-line server.
The component of an embodiment includes structural metadata of the copy.
The component of an embodiment includes a log file of the delta data.
The shadowing system of an embodiment is configured to modify the component.
The component of an embodiment is structural metadata of the copy.
Configured to modify of an embodiment comprises configured to detect a first state of the copy. The first state of an embodiment indicates the delta data has been applied to the copy.
Configured to modify of an embodiment comprises configured to change the first state to a second state. The second state of an embodiment is a state from which another updated version can be generated by applying additional delta data to the updated version.
Changing the first state to the second state of an embodiment includes modifying the structural metadata of the copy.
The additional delta data of an embodiment is received after generating the updated version.
The applying of an embodiment includes invoking an engine of the one or more servers.
The shadowing system of an embodiment is configured to cause the engine to reference a first unapplied log file of the delta data. The first unapplied log file of an embodiment is a first log file unapplied to the copy.
The delta data of an embodiment is a plurality of log files. The component of an embodiment is a log file of the plurality of log files.
The applying of an embodiment includes invoking an engine of a second server and the terminating includes stalling the engine.
The shadowing system of an embodiment is configured to receive the copy from the one or more servers.
The copy of an embodiment is a full copy.
The copy of an embodiment is an incremental copy.
The shadowing system of an embodiment is configured to transfer the updated version to an indexed object repository.
The shadowing system of an embodiment is configured to generate and maintain in response to at least one of an automatic trigger, a timer notification, an event notification, a poll, and a request.
The automatic trigger of an embodiment includes a trigger automatically initiated in response to at least one pre-specified parameter.
The automatic trigger of an embodiment includes content of the updated version.
The timer notification of an embodiment includes notifications corresponding to scheduled events including at least one of maintenance operations, user activities, server activities, and data population operations.
The event notification of an embodiment includes notifications corresponding to changes to data of the original data.
The request of an embodiment includes at least one of access attempts and configuration attempts to the original data by one or more of users of a second server, servers and applications.
The shadowing system of an embodiment is configured to generate and maintain in near real-time with complete integrity and consistency of the original data.
The one or more servers of an embodiment include a messaging and collaboration server.
The original data of an embodiment includes one or more of application data, databases, storage groups, mailbox data, and server data.
The shadowing system of an embodiment is configured to maintain the updated version by generating another updated version by applying at least one set of log files to the updated version, the at least one set of log files received later in time than the delta data.
The one or more servers of an embodiment include one or more of local servers, remote servers, database servers, messaging servers, electronic mail servers, instant messaging servers, voice-over Internet Protocol servers, collaboration servers, Exchange Servers, portals, customer relationship management (CRM) servers, enterprise resource planning (ERP) servers, business-to-business servers, and content management servers.
Aspects of the multi-dimensional surrogation described herein may be implemented as functionality programmed into any of a variety of circuitry, including programmable logic devices (PLDs), such as field programmable gate arrays (FPGAs), programmable array logic (PAL) devices, electrically programmable logic and memory devices and standard cell-based devices, as well as application specific integrated circuits (ASICs). Some other possibilities for implementing aspects of the multi-dimensional surrogation include: microcontrollers with memory (such as electronically erasable programmable read only memory (EEPROM)), embedded microprocessors, firmware, software, etc. Furthermore, aspects of the multi-dimensional surrogation may be embodied in microprocessors having software-based circuit emulation, discrete logic (sequential and combinatorial), custom devices, fuzzy (neural) logic, quantum devices, and hybrids of any of the above device types. Any underlying device technologies may be provided in a variety of component types, e.g., metal-oxide semiconductor field-effect transistor (MOSFET) technologies like complementary metal-oxide semiconductor (CMOS), bipolar technologies like emitter-coupled logic (ECL), polymer technologies (e.g., silicon-conjugated polymer and metal-conjugated polymer-metal structures), mixed analog and digital, etc.
It should be noted that the various components of multi-dimensional surrogation disclosed herein may be described using data and/or instructions embodied in various computer-readable media. Computer-readable media in which such formatted data and/or instructions may be embodied include, but are not limited to, non-volatile storage media in various forms (e.g., optical, magnetic or semiconductor storage media) and carrier waves that may be used to transfer such formatted data and/or instructions through wireless, optical, or wired signaling media or any combination thereof. Examples of transfers of such formatted data and/or instructions by carrier waves include, but are not limited to, transfers (uploads, downloads, e-mail, etc.) over the Internet and/or other computer networks via one or more data transfer protocols (e.g., HTTP, FTP, SMTP, etc.). When received within a computer system via one or more computer-readable media, such data and/or instruction-based expressions of the multi-dimensional surrogation may be processed by a processing entity (e.g., one or more processors) within the computer system in conjunction with execution of one or more other computer programs.
Unless the context clearly requires otherwise, throughout the description and the claims, the words “comprise,” “comprising,” and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is to say, in a sense of “including, but not limited to.” Words using the singular or plural number also include the plural or singular number respectively. Additionally, the words “herein,” “hereunder,” “above,” “below,” and words of similar import refer to this application as a whole and not to any particular portions of this application. When the word “or” is used in reference to a list of two or more items, that word covers all of the following interpretations of the word: any of the items in the list, all of the items in the list and any combination of the items in the list.
The above description of illustrated embodiments of the multi-dimensional surrogation is not intended to be exhaustive or to limit the multi-dimensional surrogation to the precise form disclosed. While specific embodiments of, and examples for, the multi-dimensional surrogation are described herein for illustrative purposes, various equivalent modifications are possible within the scope of the multi-dimensional surrogation, as those skilled in the relevant art will recognize. The teachings of the multi-dimensional surrogation provided herein can be applied to other processing systems and methods, not only for the systems and methods described above.
The elements and acts of the various embodiments described above can be combined to provide further embodiments. These and other changes can be made to the multi-dimensional surrogation and methods in light of the above detailed description.
In general, in the following claims, the terms used should not be construed to limit the multi-dimensional surrogation and methods to the specific embodiments disclosed in the specification and the claims, but should be construed to include all processing systems that operate under the claims. Accordingly, the multi-dimensional surrogation is not limited by the disclosure, but instead the scope of the multi-dimensional surrogation is to be determined entirely by the claims.
While certain aspects of the multi-dimensional surrogation are presented below in certain claim forms, the inventors contemplate the various aspects of the multi-dimensional surrogation in any number of claim forms. For example, while only one aspect of the multi-dimensional surrogation is recited as embodied in machine-readable media, other aspects may likewise be embodied in machine-readable media. Accordingly, the inventors reserve the right to add additional claims after filing the application to pursue such additional claim forms for other aspects of the multi-dimensional surrogation.

Claims (46)

1. A system comprising:
a near-line server coupled to one or more servers that include original data; and
a shadowing system coupled to the near-line server and configured to:
receive a copy of the original data;
receive delta data in a plurality of instances, wherein the delta data includes data of an incremental and differential difference between the original data; and
dynamically and continuously generate and maintain an updated version of the copy at the near-line server in near real time, wherein the generating and maintaining comprise the one or more servers writing ahead information to at least one log file in response to changes in state of the original data, the information including the delta data, the one or more servers shipping the at least one log file to the near-line server, and the near-line server receiving and applying the write-ahead logs in maintaining the updated version; and
retrofit log shipping capability for application data and other structured and unstructured data based in part on the use of an override operation of an application state to allow log files to be applied as they are received, the override operation used in part to prevent a component of the near line server from entering the application state in which no further log files can be applied.
2. The system of claim 1, wherein the shadowing system is configured to generate and maintain asynchronously.
3. The system of claim 1, wherein the shadowing system is configured to control the applying using modified information of the component of the near-line server.
4. The system of claim 3, wherein the component includes one or more of structural metadata of the copy and a log file of the delta data.
5. The system of claim 4, wherein the delta data is a plurality of log files, wherein the component is a log file of a plurality of log files.
6. The system of claim 5, wherein the applying includes invoking an engine of the one or more servers and terminating the applying includes stalling the engine.
7. The system of claim 3, wherein the shadowing system is configured to modify the component.
8. The system of claim 7, wherein the component is structural metadata of the copy.
9. The system of claim 7, wherein configured to modify comprises configured to detect a first state of the copy, wherein the first state indicates the delta data has been applied to the copy.
10. The system of claim 9, wherein configured to modify comprises configured to change the first state to a second state, wherein the second state is a state from which another updated version can be generated by applying additional delta data to the updated version.
11. The system of claim 10, wherein changing the first state to the second state includes modifying the structural metadata of the copy.
12. The system of claim 1, wherein the one or more servers include a messaging and collaboration server.
13. A system comprising:
a near-line server coupled to one or more servers; and
a shadowing system coupled to the near-line server and configured to:
receive delta data that describes at least one of an incremental and differential change to original data of one or more servers;
dynamically and continuously generate and maintain an updated version of a copy of the original data at the near-line server, wherein the generating and maintaining comprise the one or more servers writing ahead information to at least one log file in response to changes in state of the original data, the information including the delta data, the one or more servers shipping the at least one log file to the near-line server, and the near-line server receiving and applying the write-ahead logs in maintaining the updated version; and
retro-fit log shipping capability for application data and other structured and unstructured data based in part on the use of an override operation of an application state to allow log files to be applied as they are received, the override operation used in part to prevent a component of the near-line server from entering the application state in which no further log files can be applied; and
control the applying using modified information of the component of the near-line server.
14. The system of claim 13, wherein the component includes structural metadata of the copy.
15. The system of claim 13, wherein the component includes a log file of the delta data.
16. The system of claim 13, wherein the shadowing system is configured to modify the component.
17. The system of claim 16, wherein the component is structural metadata of the copy.
18. The system of claim 16, wherein configured to modify comprises configured to detect a first state of the copy, wherein the first state indicates the delta data has been applied to the copy.
19. The system of claim 18, wherein configured to modify comprises configured to change the first state to a second state, wherein the second state is a state from which another updated version can be generated by applying additional delta data to the updated version.
20. The system of claim 19, wherein changing the first state to the second state includes modifying the structural metadata of the copy.
21. The system of claim 19, wherein the additional delta data is received after generating the updated version.
22. The system of claim 16, wherein the applying includes invoking an engine of the one or more servers.
23. The system of claim 22, wherein the shadowing system is configured to cause the engine to reference a first unapplied log file of the delta data, wherein the first unapplied log file is a first log file unapplied to the copy.
24. The system of claim 16, wherein the delta data is a plurality of log files, wherein the component is a log file of the plurality of log files.
25. The system of claim 24, wherein the shadowing system is configured to identify a selected log file of the plurality of log files and replicate the selected log file to form a replicated log file.
26. The system of claim 25, wherein the selected log file is a last-received log file.
27. The system of claim 25, wherein the shadowing system is configured to generate a modified log file by modifying information of the selected log file.
28. The system of claim 27, wherein the applying comprises:
applying log files of the plurality of log files in sequence; and
terminating the applying in response to encountering the modified log file.
29. The system of claim 28, wherein the terminating comprises replacing the modified log file with the replicated log file.
30. The system of claim 28, wherein the applying includes invoking an engine of a second server and the terminating the applying includes stalling the engine.
31. The system of claim 13, wherein the shadowing system is configured to receive the copy from the one or more servers.
32. The system of claim 31, wherein the copy is a full copy.
33. The system of claim 31, wherein the copy is an incremental copy.
34. The system of claim 13, wherein the shadowing system is configured to transfer the updated version to an indexed object repository.
35. The system of claim 13, wherein the shadowing system is configured to generate and maintain in response to at least one of an automatic trigger, a timer notification, an event notification, a poll, and a request.
36. The system of claim 35, wherein the automatic trigger includes a trigger automatically initiated in response to at least one pre-specified parameter.
37. The system of claim 36, wherein the automatic trigger includes content of the updated version.
38. The system of claim 35, wherein the timer notification includes notifications corresponding to scheduled events including at least one of maintenance operations, user activities, server activities, and data population operations.
39. The system of claim 35, wherein the event notification includes notifications corresponding to changes to data of the original data.
40. The system of claim 35, wherein the request includes at least one of access attempts and configuration attempts to the original data by one or more of users of a second server, servers and applications.
41. The system of claim 13, wherein the shadowing system is configured to generate and maintain the updated version with complete integrity and consistency of the original data.
42. The system of claim 13, wherein the one or more servers include a messaging and collaboration server.
43. The system of claim 13, wherein the original data includes one or more of application data, databases, storage groups, mailbox data, and server data.
44. The system of claim 13, wherein the shadowing system is configured to maintain the updated version by generating another updated version by applying at least one set of log files to the updated version, the at least one set of log files received later in time than the delta data.
45. The system of claim 13, wherein the one or more servers include one or more of local servers, remote servers, database servers, messaging servers, electronic mail servers, instant messaging servers, voice-over Internet Protocol servers, collaboration servers, Exchange Servers, portals, customer relationship management (CRM) servers, enterprise resource planning (ERP) servers, business-to-business servers, and content management servers.
46. A system comprising:
one or more near-line servers coupled to one or more servers; and
a shadowing system coupled to one or more of the near-line servers and configured to:
dynamically and continuously generate and maintain an updated version of application data at the one or more near-line servers, wherein the generating and maintaining comprise the one or more servers writing ahead information to at least one log file, the information comprising delta data that includes at least one of an incremental and differential change in state of the original data, the one or more servers shipping the at least one log file to the one or more near-line servers, and the one or more near-line servers receiving and applying the write-ahead logs in maintaining the updated version; and
retro-fit log shipping capability for application data and other structured and unstructured data based in part on use of an override operation of an application state to allow log files to be applied as they are received, the override operation to prevent a component of the one or more near-line servers from entering the application state in which no further log files can be applied;
replicate one or more of the log files associated with the delta data to form one or more replicated log files; and
generate one or more modified log files by modifying information of one or more of the log files.
US11/541,857 2005-02-07 2006-10-02 Retro-fitting synthetic full copies of data Expired - Fee Related US8271436B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US11/541,857 US8271436B2 (en) 2005-02-07 2006-10-02 Retro-fitting synthetic full copies of data

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US65055605P 2005-02-07 2005-02-07
US11/211,056 US7778976B2 (en) 2005-02-07 2005-08-23 Multi-dimensional surrogates for data management
US11/500,809 US8161318B2 (en) 2005-02-07 2006-08-07 Enterprise service availability through identity preservation
US11/541,857 US8271436B2 (en) 2005-02-07 2006-10-02 Retro-fitting synthetic full copies of data

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US11/500,809 Continuation-In-Part US8161318B2 (en) 2005-02-07 2006-08-07 Enterprise service availability through identity preservation

Publications (2)

Publication Number Publication Date
US20070233756A1 US20070233756A1 (en) 2007-10-04
US8271436B2 true US8271436B2 (en) 2012-09-18

Family

ID=46326223

Family Applications (1)

Application Number Title Priority Date Filing Date
US11/541,857 Expired - Fee Related US8271436B2 (en) 2005-02-07 2006-10-02 Retro-fitting synthetic full copies of data

Country Status (1)

Country Link
US (1) US8271436B2 (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8799206B2 (en) 2005-02-07 2014-08-05 Mimosa Systems, Inc. Dynamic bulk-to-brick transformation of data
US8812433B2 (en) 2005-02-07 2014-08-19 Mimosa Systems, Inc. Dynamic bulk-to-brick transformation of data
US8918366B2 (en) 2005-02-07 2014-12-23 Mimosa Systems, Inc. Synthetic full copies of data and dynamic bulk-to-brick transformation
US9275086B2 (en) 2012-07-20 2016-03-01 Commvault Systems, Inc. Systems and methods for database archiving
US9720787B2 (en) 2013-01-11 2017-08-01 Commvault Systems, Inc. Table level database restore in a data storage system
US9904598B2 (en) 2015-04-21 2018-02-27 Commvault Systems, Inc. Content-independent and database management system-independent synthetic full backup of a database based on snapshot technology
US10108687B2 (en) 2015-01-21 2018-10-23 Commvault Systems, Inc. Database protection using block-level mapping
US11269732B2 (en) 2019-03-12 2022-03-08 Commvault Systems, Inc. Managing structured data in a data storage system
US11321281B2 (en) 2015-01-15 2022-05-03 Commvault Systems, Inc. Managing structured data in a data storage system
US11580173B2 (en) 2017-12-08 2023-02-14 Palantir Technologies Inc. Systems and methods for using linked documents
US11809432B2 (en) 2002-01-14 2023-11-07 Awemane Ltd. Knowledge gathering system based on user's affinity
US11921796B2 (en) 2023-02-13 2024-03-05 Palantir Technologies Inc. Systems and methods for using linked documents

Families Citing this family (109)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2632935C (en) 2005-12-19 2014-02-04 Commvault Systems, Inc. Systems and methods for performing data replication
US7606844B2 (en) 2005-12-19 2009-10-20 Commvault Systems, Inc. System and method for performing replication copy storage operations
US7651593B2 (en) 2005-12-19 2010-01-26 Commvault Systems, Inc. Systems and methods for performing data replication
US8515912B2 (en) 2010-07-15 2013-08-20 Palantir Technologies, Inc. Sharing and deconflicting data changes in a multimaster database system
US8688749B1 (en) 2011-03-31 2014-04-01 Palantir Technologies, Inc. Cross-ontology multi-master replication
US8930331B2 (en) * 2007-02-21 2015-01-06 Palantir Technologies Providing unique views of data based on changes or rules
US8990378B2 (en) * 2007-07-05 2015-03-24 Interwise Ltd. System and method for collection and analysis of server log files
US8554734B1 (en) 2007-07-19 2013-10-08 American Megatrends, Inc. Continuous data protection journaling in data storage systems
US7877553B2 (en) * 2007-08-06 2011-01-25 Microsoft Corporation Sharing volume data via shadow copies using differential areas
US8554719B2 (en) 2007-10-18 2013-10-08 Palantir Technologies, Inc. Resolving database entity information
US8706694B2 (en) * 2008-07-15 2014-04-22 American Megatrends, Inc. Continuous data protection of files stored on a remote storage device
US10747952B2 (en) 2008-09-15 2020-08-18 Palantir Technologies, Inc. Automatic creation and server push of multiple distinct drafts
US8204859B2 (en) * 2008-12-10 2012-06-19 Commvault Systems, Inc. Systems and methods for managing replicated database data
US9495382B2 (en) 2008-12-10 2016-11-15 Commvault Systems, Inc. Systems and methods for performing discrete data replication
US8332678B1 (en) 2009-06-02 2012-12-11 American Megatrends, Inc. Power save mode operation for continuous data protection
US8315977B2 (en) * 2010-02-22 2012-11-20 Netflix, Inc. Data synchronization between a data center environment and a cloud computing environment
US8504515B2 (en) 2010-03-30 2013-08-06 Commvault Systems, Inc. Stubbing systems and methods in a data replication environment
US8364642B1 (en) 2010-07-07 2013-01-29 Palantir Technologies, Inc. Managing disconnected investigations
US9547693B1 (en) 2011-06-23 2017-01-17 Palantir Technologies Inc. Periodic database search manager for multiple data sources
US9092482B2 (en) 2013-03-14 2015-07-28 Palantir Technologies, Inc. Fair scheduling for mixed-query loads
US8799240B2 (en) 2011-06-23 2014-08-05 Palantir Technologies, Inc. System and method for investigating large amounts of data
US9280532B2 (en) 2011-08-02 2016-03-08 Palantir Technologies, Inc. System and method for accessing rich objects via spreadsheets
US8732574B2 (en) 2011-08-25 2014-05-20 Palantir Technologies, Inc. System and method for parameterizing documents for automatic workflow generation
US8504542B2 (en) 2011-09-02 2013-08-06 Palantir Technologies, Inc. Multi-row transactions
US8782004B2 (en) 2012-01-23 2014-07-15 Palantir Technologies, Inc. Cross-ACL multi-master replication
US9081975B2 (en) 2012-10-22 2015-07-14 Palantir Technologies, Inc. Sharing information between nexuses that use different classification schemes for information access control
US9348677B2 (en) 2012-10-22 2016-05-24 Palantir Technologies Inc. System and method for batch evaluation programs
US9501761B2 (en) 2012-11-05 2016-11-22 Palantir Technologies, Inc. System and method for sharing investigation results
US10140664B2 (en) 2013-03-14 2018-11-27 Palantir Technologies Inc. Resolving similar entities from a transaction database
US8903717B2 (en) 2013-03-15 2014-12-02 Palantir Technologies Inc. Method and system for generating a parser and parsing complex data
US9230280B1 (en) 2013-03-15 2016-01-05 Palantir Technologies Inc. Clustering data based on indications of financial malfeasance
US8855999B1 (en) 2013-03-15 2014-10-07 Palantir Technologies Inc. Method and system for generating a parser and parsing complex data
US8909656B2 (en) 2013-03-15 2014-12-09 Palantir Technologies Inc. Filter chains with associated multipath views for exploring large data sets
US8924388B2 (en) 2013-03-15 2014-12-30 Palantir Technologies Inc. Computer-implemented systems and methods for comparing and associating objects
US8868486B2 (en) 2013-03-15 2014-10-21 Palantir Technologies Inc. Time-sensitive cube
US8930897B2 (en) 2013-03-15 2015-01-06 Palantir Technologies Inc. Data integration tool
US10275778B1 (en) 2013-03-15 2019-04-30 Palantir Technologies Inc. Systems and user interfaces for dynamic and interactive investigation based on automatic malfeasance clustering of related data in various data structures
US8886601B1 (en) 2013-06-20 2014-11-11 Palantir Technologies, Inc. System and method for incrementally replicating investigative analysis data
US8601326B1 (en) 2013-07-05 2013-12-03 Palantir Technologies, Inc. Data quality monitors
US8938686B1 (en) 2013-10-03 2015-01-20 Palantir Technologies Inc. Systems and methods for analyzing performance of an entity
US9116975B2 (en) 2013-10-18 2015-08-25 Palantir Technologies Inc. Systems and user interfaces for dynamic and interactive simultaneous querying of multiple data stores
US9569070B1 (en) 2013-11-11 2017-02-14 Palantir Technologies, Inc. Assisting in deconflicting concurrency conflicts
US9105000B1 (en) 2013-12-10 2015-08-11 Palantir Technologies Inc. Aggregating data from a plurality of data sources
US10579647B1 (en) 2013-12-16 2020-03-03 Palantir Technologies Inc. Methods and systems for analyzing entity performance
US9043696B1 (en) 2014-01-03 2015-05-26 Palantir Technologies Inc. Systems and methods for visual definition of data associations
US9009827B1 (en) 2014-02-20 2015-04-14 Palantir Technologies Inc. Security sharing system
US8924429B1 (en) 2014-03-18 2014-12-30 Palantir Technologies Inc. Determining and extracting changed data from a data source
US9836580B2 (en) 2014-03-21 2017-12-05 Palantir Technologies Inc. Provider portal
US9619557B2 (en) 2014-06-30 2017-04-11 Palantir Technologies, Inc. Systems and methods for key phrase characterization of documents
US9535974B1 (en) 2014-06-30 2017-01-03 Palantir Technologies Inc. Systems and methods for identifying key phrase clusters within documents
US9021260B1 (en) 2014-07-03 2015-04-28 Palantir Technologies Inc. Malware data item analysis
US9785773B2 (en) 2014-07-03 2017-10-10 Palantir Technologies Inc. Malware data item analysis
US10572496B1 (en) 2014-07-03 2020-02-25 Palantir Technologies Inc. Distributed workflow system and database with access controls for city resiliency
US9419992B2 (en) 2014-08-13 2016-08-16 Palantir Technologies Inc. Unwanted tunneling alert system
US9454281B2 (en) 2014-09-03 2016-09-27 Palantir Technologies Inc. System for providing dynamic linked panels in user interface
US9483546B2 (en) 2014-12-15 2016-11-01 Palantir Technologies Inc. System and method for associating related records to common entities across multiple lists
US9348920B1 (en) 2014-12-22 2016-05-24 Palantir Technologies Inc. Concept indexing among database of documents using machine learning techniques
US10552994B2 (en) 2014-12-22 2020-02-04 Palantir Technologies Inc. Systems and interactive user interfaces for dynamic retrieval, analysis, and triage of data items
US10362133B1 (en) 2014-12-22 2019-07-23 Palantir Technologies Inc. Communication data processing architecture
US10452651B1 (en) 2014-12-23 2019-10-22 Palantir Technologies Inc. Searching charts
US9817563B1 (en) 2014-12-29 2017-11-14 Palantir Technologies Inc. System and method of generating data points from one or more data stores of data items for chart creation and manipulation
US11302426B1 (en) 2015-01-02 2022-04-12 Palantir Technologies Inc. Unified data interface and system
US10103953B1 (en) 2015-05-12 2018-10-16 Palantir Technologies Inc. Methods and systems for analyzing entity performance
US9672257B2 (en) 2015-06-05 2017-06-06 Palantir Technologies Inc. Time-series data storage and processing database system
US9384203B1 (en) 2015-06-09 2016-07-05 Palantir Technologies Inc. Systems and methods for indexing and aggregating data records
US10628834B1 (en) 2015-06-16 2020-04-21 Palantir Technologies Inc. Fraud lead detection system for efficiently processing database-stored data and automatically generating natural language explanatory information of system results for display in interactive user interfaces
US9407652B1 (en) 2015-06-26 2016-08-02 Palantir Technologies Inc. Network anomaly detection
US9418337B1 (en) 2015-07-21 2016-08-16 Palantir Technologies Inc. Systems and models for data analytics
US9392008B1 (en) 2015-07-23 2016-07-12 Palantir Technologies Inc. Systems and methods for identifying information related to payment card breaches
US10127289B2 (en) 2015-08-19 2018-11-13 Palantir Technologies Inc. Systems and methods for automatic clustering and canonical designation of related data in various data structures
US9537880B1 (en) 2015-08-19 2017-01-03 Palantir Technologies Inc. Anomalous network monitoring, user behavior detection and database system
US10402385B1 (en) 2015-08-27 2019-09-03 Palantir Technologies Inc. Database live reindex
US9984428B2 (en) 2015-09-04 2018-05-29 Palantir Technologies Inc. Systems and methods for structuring data from unstructured electronic data files
US9454564B1 (en) 2015-09-09 2016-09-27 Palantir Technologies Inc. Data integrity checks
US10044745B1 (en) 2015-10-12 2018-08-07 Palantir Technologies, Inc. Systems for computer network security risk assessment including user compromise analysis associated with a network of devices
US9760556B1 (en) 2015-12-11 2017-09-12 Palantir Technologies Inc. Systems and methods for annotating and linking electronic documents
US9514414B1 (en) 2015-12-11 2016-12-06 Palantir Technologies Inc. Systems and methods for identifying and categorizing electronic documents through machine learning
US9542446B1 (en) 2015-12-17 2017-01-10 Palantir Technologies, Inc. Automatic generation of composite datasets based on hierarchical fields
US10621198B1 (en) 2015-12-30 2020-04-14 Palantir Technologies Inc. System and method for secure database replication
US9753935B1 (en) 2016-08-02 2017-09-05 Palantir Technologies Inc. Time-series data storage and processing database system
US11106692B1 (en) 2016-08-04 2021-08-31 Palantir Technologies Inc. Data record resolution and correlation system
US10133588B1 (en) 2016-10-20 2018-11-20 Palantir Technologies Inc. Transforming instructions for collaborative updates
US10318630B1 (en) 2016-11-21 2019-06-11 Palantir Technologies Inc. Analysis of large bodies of textual data
US10884875B2 (en) * 2016-12-15 2021-01-05 Palantir Technologies Inc. Incremental backup of computer data files
US10223099B2 (en) 2016-12-21 2019-03-05 Palantir Technologies Inc. Systems and methods for peer-to-peer build sharing
US10262053B2 (en) 2016-12-22 2019-04-16 Palantir Technologies Inc. Systems and methods for data replication synchronization
US10068002B1 (en) 2017-04-25 2018-09-04 Palantir Technologies Inc. Systems and methods for adaptive data replication
US11074277B1 (en) 2017-05-01 2021-07-27 Palantir Technologies Inc. Secure resolution of canonical entities
US10896097B1 (en) 2017-05-25 2021-01-19 Palantir Technologies Inc. Approaches for backup and restoration of integrated databases
US10430062B2 (en) 2017-05-30 2019-10-01 Palantir Technologies Inc. Systems and methods for geo-fenced dynamic dissemination
GB201708818D0 (en) 2017-06-02 2017-07-19 Palantir Technologies Inc Systems and methods for retrieving and processing data
US11030494B1 (en) 2017-06-15 2021-06-08 Palantir Technologies Inc. Systems and methods for managing data spills
US11334552B2 (en) 2017-07-31 2022-05-17 Palantir Technologies Inc. Lightweight redundancy tool for performing transactions
US10417224B2 (en) 2017-08-14 2019-09-17 Palantir Technologies Inc. Time series database processing system
US10216695B1 (en) 2017-09-21 2019-02-26 Palantir Technologies Inc. Database system for time series data storage, processing, and analysis
US11281726B2 (en) 2017-12-01 2022-03-22 Palantir Technologies Inc. System and methods for faster processor comparisons of visual graph features
US10614069B2 (en) 2017-12-01 2020-04-07 Palantir Technologies Inc. Workflow driven database partitioning
US10235533B1 (en) 2017-12-01 2019-03-19 Palantir Technologies Inc. Multi-user access controls in electronic simultaneously editable document editor
US11016986B2 (en) 2017-12-04 2021-05-25 Palantir Technologies Inc. Query-based time-series data display and processing system
US11061874B1 (en) 2017-12-14 2021-07-13 Palantir Technologies Inc. Systems and methods for resolving entity data across various data structures
US10915542B1 (en) 2017-12-19 2021-02-09 Palantir Technologies Inc. Contextual modification of data sharing constraints in a distributed database system that uses a multi-master replication scheme
US10838987B1 (en) 2017-12-20 2020-11-17 Palantir Technologies Inc. Adaptive and transparent entity screening
GB201807534D0 (en) 2018-05-09 2018-06-20 Palantir Technologies Inc Systems and methods for indexing and searching
US11061542B1 (en) 2018-06-01 2021-07-13 Palantir Technologies Inc. Systems and methods for determining and displaying optimal associations of data items
US10795909B1 (en) 2018-06-14 2020-10-06 Palantir Technologies Inc. Minimized and collapsed resource dependency path
US11561999B2 (en) * 2019-01-31 2023-01-24 Rubrik, Inc. Database recovery time objective optimization with synthetic snapshots
US11042318B2 (en) 2019-07-29 2021-06-22 Commvault Systems, Inc. Block-level data replication
US11372732B2 (en) * 2020-02-25 2022-06-28 Veritas Technologies Llc Systems and methods for agentless and accelerated backup of a database
US11809285B2 (en) 2022-02-09 2023-11-07 Commvault Systems, Inc. Protecting a management database of a data storage management system to meet a recovery point objective (RPO)

Citations (89)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4914586A (en) * 1987-11-06 1990-04-03 Xerox Corporation Garbage collector for hypermedia systems
US5287501A (en) * 1991-07-11 1994-02-15 Digital Equipment Corporation Multilevel transaction recovery in a database system which loss parent transaction undo operation upon commit of child transaction
US5418940A (en) * 1993-08-04 1995-05-23 International Business Machines Corporation Method and means for detecting partial page writes and avoiding initializing new pages on DASD in a transaction management system environment
US5574897A (en) * 1992-09-30 1996-11-12 International Business Machines Corporation System managed logging of objects to speed recovery processing
US5689698A (en) 1995-10-20 1997-11-18 Ncr Corporation Method and apparatus for managing shared data using a data surrogate and obtaining cost parameters from a data dictionary by evaluating a parse tree object
US5706510A (en) * 1996-03-15 1998-01-06 Hewlett-Packard Company Zymbolic history management system
US5721916A (en) * 1994-11-18 1998-02-24 Microsoft Corporation Method and system for shadowing file system structures from multiple types of networks
US5870763A (en) * 1997-03-10 1999-02-09 Microsoft Corporation Database computer system with application recovery and dependency handling read cache
US5933838A (en) * 1997-03-10 1999-08-03 Microsoft Corporation Database computer system with application recovery and recovery log sequence numbers to optimize recovery
US5946698A (en) * 1997-03-10 1999-08-31 Microsoft Corporation Database computer system with application recovery
US5950212A (en) * 1997-04-11 1999-09-07 Oracle Corporation Method and system for workload based group committing for improved performance
US5966706A (en) * 1997-02-19 1999-10-12 At&T Corp Local logging in a distributed database management computer system
US5974563A (en) 1995-10-16 1999-10-26 Network Specialists, Inc. Real time backup system
US6006342A (en) * 1997-12-11 1999-12-21 International Business Machines Corporation Failover and failback system for a direct access storage device
US6014674A (en) * 1996-11-14 2000-01-11 Sybase, Inc. Method for maintaining log compatibility in database systems
US6067550A (en) * 1997-03-10 2000-05-23 Microsoft Corporation Database computer system with application recovery and dependency handling write cache
US6094654A (en) 1996-12-06 2000-07-25 International Business Machines Corporation Data management system for file and database management
US6148410A (en) 1997-09-15 2000-11-14 International Business Machines Corporation Fault tolerant recoverable TCP/IP connection router
US6182086B1 (en) * 1998-03-02 2001-01-30 Microsoft Corporation Client-server computer system with application recovery of server applications and client applications
US6185699B1 (en) * 1998-01-05 2001-02-06 International Business Machines Corporation Method and apparatus providing system availability during DBMS restart recovery
US6226651B1 (en) * 1998-03-27 2001-05-01 International Business Machines Corporation Database disaster remote site recovery
US6240414B1 (en) 1997-09-28 2001-05-29 Eisolutions, Inc. Method of resolving data conflicts in a shared data environment
US6249879B1 (en) 1997-11-11 2001-06-19 Compaq Computer Corp. Root filesystem failover in a single system image environment
US6253230B1 (en) 1998-09-22 2001-06-26 International Business Machines Corporation Distributed scalable device for selecting a server from a server cluster and a switched path to the selected server
US6256773B1 (en) * 1999-08-31 2001-07-03 Accenture Llp System, method and article of manufacture for configuration management in a development architecture framework
US6260129B1 (en) * 1998-09-08 2001-07-10 International Business Machines Corportion Management of fixed pages in memory for input/output operations
US6338146B1 (en) * 1997-09-30 2002-01-08 Compaq Computer Corporation Method and apparatus for fault-tolerant, scalable and non-blocking three-phase flushing for committing database transactions in a cluster of multiprocessors
US20020029207A1 (en) 2000-02-28 2002-03-07 Hyperroll, Inc. Data aggregation server for managing a multi-dimensional database and database management system having data aggregation server integrated therein
US6385706B1 (en) * 1998-12-31 2002-05-07 Emx Corporation Apparatus and methods for copying a logical object to a primary storage device using a map of storage locations
US6397308B1 (en) * 1998-12-31 2002-05-28 Emc Corporation Apparatus and method for differential backup and restoration of data in a computer storage system
US6434568B1 (en) 1999-08-31 2002-08-13 Accenture Llp Information services patterns in a netcentric environment
US20020129096A1 (en) 2001-02-14 2002-09-12 Mansour Peter M. Platform-independent distributed user interface client architecture
US6477580B1 (en) 1999-08-31 2002-11-05 Accenture Llp Self-described stream in a communication services patterns environment
US20020163910A1 (en) 2001-05-01 2002-11-07 Wisner Steven P. System and method for providing access to resources using a fabric switch
US6487561B1 (en) * 1998-12-31 2002-11-26 Emc Corporation Apparatus and methods for copying, backing up, and restoring data using a backup segment size larger than the storage block size
US6490594B1 (en) * 1997-04-04 2002-12-03 Microsoft Corporation Database computer system with application recovery and dependency handling write cache
US20030005350A1 (en) 2001-06-29 2003-01-02 Maarten Koning Failover management system
US20030009295A1 (en) 2001-03-14 2003-01-09 Victor Markowitz System and method for retrieving and using gene expression data from multiple sources
US6516337B1 (en) 1999-10-14 2003-02-04 Arcessa, Inc. Sending to a central indexing site meta data or signatures from objects on a computer network
US6523027B1 (en) 1999-07-30 2003-02-18 Accenture Llp Interfacing servers in a Java based e-commerce architecture
US20030061456A1 (en) * 1998-12-31 2003-03-27 Yuval Ofek Apparatus and methods for copying, backing up and restoring logical objects in a computer storage system by transferring blocks out of order or in parallel
US20030058277A1 (en) 1999-08-31 2003-03-27 Bowman-Amuah Michel K. A view configurer in a presentation services patterns enviroment
US6564215B1 (en) * 1999-12-16 2003-05-13 International Business Machines Corporation Update support in database content management
US6578041B1 (en) * 2000-06-30 2003-06-10 Microsoft Corporation High speed on-line backup when using logical log operations
US20030120458A1 (en) 2001-11-02 2003-06-26 Rao R. Bharat Patient data mining
US6587933B2 (en) * 2001-01-26 2003-07-01 International Business Machines Corporation Method, system, and program for discarding data in a storage system where updates to a primary storage device are shadowed in a secondary storage device
US20030135499A1 (en) 2002-01-14 2003-07-17 Schirmer Andrew Lewis System and method for mining a user's electronic mail messages to determine the user's affinities
US20030159083A1 (en) 2000-09-29 2003-08-21 Fukuhara Keith T. System, method and apparatus for data processing and storage to provide continuous operations independent of device failure or disaster
US20030200272A1 (en) 2002-04-18 2003-10-23 Leon Campise System and method for data collection and update utilizing surrogate e-mail addresses using a server
US20030229900A1 (en) 2002-05-10 2003-12-11 Richard Reisman Method and apparatus for browsing using multiple coordinated device sets
US20030233518A1 (en) 2002-06-12 2003-12-18 Hitachi, Ltd. Method and apparatus for managing replication volumes
US6668253B1 (en) 1999-09-08 2003-12-23 Reynolds & Reynolds Holdings, Inc. Enterprise information management system and methods
US6675205B2 (en) 1999-10-14 2004-01-06 Arcessa, Inc. Peer-to-peer automated anonymous asynchronous file sharing
US6694447B1 (en) * 2000-09-29 2004-02-17 Sun Microsystems, Inc. Apparatus and method for increasing application availability during a disaster fail-back
US20040034808A1 (en) 2002-08-16 2004-02-19 International Business Machines Corporation Method, system, and program for providing a mirror copy of data
US6718361B1 (en) * 2000-04-07 2004-04-06 Network Appliance Inc. Method and apparatus for reliable and scalable distribution of data files in distributed networks
US20040098547A1 (en) * 1998-12-31 2004-05-20 Yuval Ofek Apparatus and methods for transferring, backing up, and restoring data in a computer system
US6748447B1 (en) * 2000-04-07 2004-06-08 Network Appliance, Inc. Method and apparatus for scalable distribution of information in a distributed network
US6766326B1 (en) 2000-07-24 2004-07-20 Resty M Cena Universal storage for dynamic databases
US20040158766A1 (en) 2002-09-09 2004-08-12 John Liccione System and method for application monitoring and automatic disaster recovery for high-availability
US20040215656A1 (en) 2003-04-25 2004-10-28 Marcus Dill Automated data mining runs
US20040225834A1 (en) * 2002-09-16 2004-11-11 Jun Lu Combined stream auxiliary copy system and method
US20040268175A1 (en) 2003-06-11 2004-12-30 Eternal Systems, Inc. Transparent TCP connection failover
US20050010550A1 (en) 2003-05-27 2005-01-13 Potter Charles Mike System and method of modelling of a multi-dimensional data source in an entity-relationship model
US20050044197A1 (en) 2003-08-18 2005-02-24 Sun Microsystems.Inc. Structured methodology and design patterns for web services
US20050071392A1 (en) * 2003-08-05 2005-03-31 Miklos Sandorfi Emulated storage system
US20050108486A1 (en) * 2003-08-05 2005-05-19 Miklos Sandorfi Emulated storage system supporting instant volume restore
US20050193235A1 (en) * 2003-08-05 2005-09-01 Miklos Sandorfi Emulated storage system
US20050198359A1 (en) * 2000-04-07 2005-09-08 Basani Vijay R. Method and apparatus for election of group leaders in a distributed network
US20050228937A1 (en) 2003-11-26 2005-10-13 Veritas Operating Corporation System and method for emulating operating system metadata to provide cross-platform access to storage volumes
US20050246345A1 (en) 2004-04-30 2005-11-03 Lent Arthur F System and method for configuring a storage network utilizing a multi-protocol storage appliance
US20050246510A1 (en) * 2003-11-13 2005-11-03 Retnamma Manoj V System and method for combining data streams in pipelined storage operations in a storage network
US20050257062A1 (en) * 1998-03-11 2005-11-17 Paul Ignatius System and method for providing encryption in pipelined storage operations in a storage network
US20050268145A1 (en) 2004-05-13 2005-12-01 International Business Machines Corporation Methods, apparatus and computer programs for recovery from failures in a computing environment
US20050278391A1 (en) * 2004-05-27 2005-12-15 Spear Gail A Fast reverse restore
US20060064444A1 (en) * 2004-09-22 2006-03-23 Microsoft Corporation Method and system for synthetic backup and restore
US7100195B1 (en) 1999-07-30 2006-08-29 Accenture Llp Managing user information on an e-commerce system
US7103740B1 (en) * 2003-12-31 2006-09-05 Veritas Operating Corporation Backup mechanism for a multi-class file system
US20060218210A1 (en) 2005-03-25 2006-09-28 Joydeep Sarma Apparatus and method for data replication at an intermediate node
US20060248212A1 (en) * 2005-04-01 2006-11-02 Sherer W P Stream control failover utilizing the sharing of state information within a logical group of stream servers
US7177886B2 (en) 2003-02-07 2007-02-13 International Business Machines Corporation Apparatus and method for coordinating logical data replication with highly available data replication
US7197520B1 (en) * 2004-04-14 2007-03-27 Veritas Operating Corporation Two-tier backup mechanism
US7212726B2 (en) 2000-09-15 2007-05-01 International Business Machines Corporation System and method of processing MPEG streams for file index insertion
US20070124341A1 (en) 2003-02-10 2007-05-31 Lango Jason A System and method for restoring data on demand for instant volume restoration
US7266655B1 (en) * 2004-04-29 2007-09-04 Veritas Operating Corporation Synthesized backup set catalog
US7512835B2 (en) 2003-09-29 2009-03-31 International Business Machines Corporation Method, system and article of manufacture for recovery from a failure in a cascading PPRC system
US7523348B2 (en) 2004-09-09 2009-04-21 Microsoft Corporation Method and system for monitoring and managing archive operations
US7568124B2 (en) * 2006-06-02 2009-07-28 Microsoft Corporation Driving data backups with data source tagging
US7725438B1 (en) * 2005-01-31 2010-05-25 Veritas Operating Corporation Method and apparatus for efficiently creating backup files

Patent Citations (111)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4914586A (en) * 1987-11-06 1990-04-03 Xerox Corporation Garbage collector for hypermedia systems
US5287501A (en) * 1991-07-11 1994-02-15 Digital Equipment Corporation Multilevel transaction recovery in a database system which loss parent transaction undo operation upon commit of child transaction
US5574897A (en) * 1992-09-30 1996-11-12 International Business Machines Corporation System managed logging of objects to speed recovery processing
US5625820A (en) * 1992-09-30 1997-04-29 International Business Machines Corporation System managed logging of objects to speed recovery processing
US5418940A (en) * 1993-08-04 1995-05-23 International Business Machines Corporation Method and means for detecting partial page writes and avoiding initializing new pages on DASD in a transaction management system environment
US5721916A (en) * 1994-11-18 1998-02-24 Microsoft Corporation Method and system for shadowing file system structures from multiple types of networks
US5974563A (en) 1995-10-16 1999-10-26 Network Specialists, Inc. Real time backup system
US5689698A (en) 1995-10-20 1997-11-18 Ncr Corporation Method and apparatus for managing shared data using a data surrogate and obtaining cost parameters from a data dictionary by evaluating a parse tree object
US5706510A (en) * 1996-03-15 1998-01-06 Hewlett-Packard Company Zymbolic history management system
US6014674A (en) * 1996-11-14 2000-01-11 Sybase, Inc. Method for maintaining log compatibility in database systems
US6094654A (en) 1996-12-06 2000-07-25 International Business Machines Corporation Data management system for file and database management
US5966706A (en) * 1997-02-19 1999-10-12 At&T Corp Local logging in a distributed database management computer system
US5946698A (en) * 1997-03-10 1999-08-31 Microsoft Corporation Database computer system with application recovery
US6067550A (en) * 1997-03-10 2000-05-23 Microsoft Corporation Database computer system with application recovery and dependency handling write cache
US5933838A (en) * 1997-03-10 1999-08-03 Microsoft Corporation Database computer system with application recovery and recovery log sequence numbers to optimize recovery
US6151607A (en) * 1997-03-10 2000-11-21 Microsoft Corporation Database computer system with application recovery and dependency handling write cache
US6978279B1 (en) * 1997-03-10 2005-12-20 Microsoft Corporation Database computer system using logical logging to extend recovery
US5870763A (en) * 1997-03-10 1999-02-09 Microsoft Corporation Database computer system with application recovery and dependency handling read cache
US6490594B1 (en) * 1997-04-04 2002-12-03 Microsoft Corporation Database computer system with application recovery and dependency handling write cache
US5950212A (en) * 1997-04-11 1999-09-07 Oracle Corporation Method and system for workload based group committing for improved performance
US6148410A (en) 1997-09-15 2000-11-14 International Business Machines Corporation Fault tolerant recoverable TCP/IP connection router
US6240414B1 (en) 1997-09-28 2001-05-29 Eisolutions, Inc. Method of resolving data conflicts in a shared data environment
US6338146B1 (en) * 1997-09-30 2002-01-08 Compaq Computer Corporation Method and apparatus for fault-tolerant, scalable and non-blocking three-phase flushing for committing database transactions in a cluster of multiprocessors
US6249879B1 (en) 1997-11-11 2001-06-19 Compaq Computer Corp. Root filesystem failover in a single system image environment
US6006342A (en) * 1997-12-11 1999-12-21 International Business Machines Corporation Failover and failback system for a direct access storage device
US6185699B1 (en) * 1998-01-05 2001-02-06 International Business Machines Corporation Method and apparatus providing system availability during DBMS restart recovery
US6182086B1 (en) * 1998-03-02 2001-01-30 Microsoft Corporation Client-server computer system with application recovery of server applications and client applications
US20050257062A1 (en) * 1998-03-11 2005-11-17 Paul Ignatius System and method for providing encryption in pipelined storage operations in a storage network
US7277941B2 (en) * 1998-03-11 2007-10-02 Commvault Systems, Inc. System and method for providing encryption in a storage network by storing a secured encryption key with encrypted archive data in an archive storage device
US6226651B1 (en) * 1998-03-27 2001-05-01 International Business Machines Corporation Database disaster remote site recovery
US6260129B1 (en) * 1998-09-08 2001-07-10 International Business Machines Corportion Management of fixed pages in memory for input/output operations
US6253230B1 (en) 1998-09-22 2001-06-26 International Business Machines Corporation Distributed scalable device for selecting a server from a server cluster and a switched path to the selected server
US6397308B1 (en) * 1998-12-31 2002-05-28 Emc Corporation Apparatus and method for differential backup and restoration of data in a computer storage system
US20040098547A1 (en) * 1998-12-31 2004-05-20 Yuval Ofek Apparatus and methods for transferring, backing up, and restoring data in a computer system
US7107395B1 (en) * 1998-12-31 2006-09-12 Emc Corporation Apparatus and methods for operating a computer storage system
US6385706B1 (en) * 1998-12-31 2002-05-07 Emc Corporation Apparatus and methods for copying a logical object to a primary storage device using a map of storage locations
US6487561B1 (en) * 1998-12-31 2002-11-26 Emc Corporation Apparatus and methods for copying, backing up, and restoring data using a backup segment size larger than the storage block size
US20030061456A1 (en) * 1998-12-31 2003-03-27 Yuval Ofek Apparatus and methods for copying, backing up and restoring logical objects in a computer storage system by transferring blocks out of order or in parallel
US6920537B2 (en) * 1998-12-31 2005-07-19 Emc Corporation Apparatus and methods for copying, backing up and restoring logical objects in a computer storage system by transferring blocks out of order or in parallel
US6523027B1 (en) 1999-07-30 2003-02-18 Accenture Llp Interfacing servers in a Java based e-commerce architecture
US7100195B1 (en) 1999-07-30 2006-08-29 Accenture Llp Managing user information on an e-commerce system
US6477580B1 (en) 1999-08-31 2002-11-05 Accenture Llp Self-described stream in a communication services patterns environment
US6256773B1 (en) * 1999-08-31 2001-07-03 Accenture Llp System, method and article of manufacture for configuration management in a development architecture framework
US20030058277A1 (en) 1999-08-31 2003-03-27 Bowman-Amuah Michel K. A view configurer in a presentation services patterns enviroment
US6434568B1 (en) 1999-08-31 2002-08-13 Accenture Llp Information services patterns in a netcentric environment
US6668253B1 (en) 1999-09-08 2003-12-23 Reynolds & Reynolds Holdings, Inc. Enterprise information management system and methods
US6675205B2 (en) 1999-10-14 2004-01-06 Arcessa, Inc. Peer-to-peer automated anonymous asynchronous file sharing
US6516337B1 (en) 1999-10-14 2003-02-04 Arcessa, Inc. Sending to a central indexing site meta data or signatures from objects on a computer network
US20050015466A1 (en) 1999-10-14 2005-01-20 Tripp Gary W. Peer-to-peer automated anonymous asynchronous file sharing
US6564215B1 (en) * 1999-12-16 2003-05-13 International Business Machines Corporation Update support in database content management
US20020029207A1 (en) 2000-02-28 2002-03-07 Hyperroll, Inc. Data aggregation server for managing a multi-dimensional database and database management system having data aggregation server integrated therein
US7346682B2 (en) * 2000-04-07 2008-03-18 Network Appliance, Inc. System for creating and distributing prioritized list of computer nodes selected as participants in a distribution job
US20050198359A1 (en) * 2000-04-07 2005-09-08 Basani Vijay R. Method and apparatus for election of group leaders in a distributed network
US7747741B2 (en) * 2000-04-07 2010-06-29 Net App, Inc. Method and apparatus for dynamic resource discovery and information distribution in a data network
US6718361B1 (en) * 2000-04-07 2004-04-06 Network Appliance Inc. Method and apparatus for reliable and scalable distribution of data files in distributed networks
US6993587B1 (en) * 2000-04-07 2006-01-31 Network Appliance Inc. Method and apparatus for election of group leaders in a distributed network
US6748447B1 (en) * 2000-04-07 2004-06-08 Network Appliance, Inc. Method and apparatus for scalable distribution of information in a distributed network
US7451221B2 (en) * 2000-04-07 2008-11-11 Network Appliance, Inc. Method and apparatus for election of group leaders in a distributed network
US20040215709A1 (en) * 2000-04-07 2004-10-28 Basani Vijay R. Method and apparatus for dynamic resource discovery and information distribution in a data network
US6578041B1 (en) * 2000-06-30 2003-06-10 Microsoft Corporation High speed on-line backup when using logical log operations
US6766326B1 (en) 2000-07-24 2004-07-20 Resty M Cena Universal storage for dynamic databases
US7212726B2 (en) 2000-09-15 2007-05-01 International Business Machines Corporation System and method of processing MPEG streams for file index insertion
US6694447B1 (en) * 2000-09-29 2004-02-17 Sun Microsystems, Inc. Apparatus and method for increasing application availability during a disaster fail-back
US20030159083A1 (en) 2000-09-29 2003-08-21 Fukuhara Keith T. System, method and apparatus for data processing and storage to provide continuous operations independent of device failure or disaster
US6587933B2 (en) * 2001-01-26 2003-07-01 International Business Machines Corporation Method, system, and program for discarding data in a storage system where updates to a primary storage device are shadowed in a secondary storage device
US20020129096A1 (en) 2001-02-14 2002-09-12 Mansour Peter M. Platform-independent distributed user interface client architecture
US20030009295A1 (en) 2001-03-14 2003-01-09 Victor Markowitz System and method for retrieving and using gene expression data from multiple sources
US20020163910A1 (en) 2001-05-01 2002-11-07 Wisner Steven P. System and method for providing access to resources using a fabric switch
US20030005350A1 (en) 2001-06-29 2003-01-02 Maarten Koning Failover management system
US20030120458A1 (en) 2001-11-02 2003-06-26 Rao R. Bharat Patient data mining
US20030135499A1 (en) 2002-01-14 2003-07-17 Schirmer Andrew Lewis System and method for mining a user's electronic mail messages to determine the user's affinities
US20030200272A1 (en) 2002-04-18 2003-10-23 Leon Campise System and method for data collection and update utilizing surrogate e-mail addresses using a server
US20030229900A1 (en) 2002-05-10 2003-12-11 Richard Reisman Method and apparatus for browsing using multiple coordinated device sets
US20040031058A1 (en) 2002-05-10 2004-02-12 Richard Reisman Method and apparatus for browsing using alternative linkbases
US20030233518A1 (en) 2002-06-12 2003-12-18 Hitachi, Ltd. Method and apparatus for managing replication volumes
US20040034808A1 (en) 2002-08-16 2004-02-19 International Business Machines Corporation Method, system, and program for providing a mirror copy of data
US7426652B2 (en) 2002-09-09 2008-09-16 Messageone, Inc. System and method for application monitoring and automatic disaster recovery for high-availability
US20040158766A1 (en) 2002-09-09 2004-08-12 John Liccione System and method for application monitoring and automatic disaster recovery for high-availability
US20040225834A1 (en) * 2002-09-16 2004-11-11 Jun Lu Combined stream auxiliary copy system and method
US7177886B2 (en) 2003-02-07 2007-02-13 International Business Machines Corporation Apparatus and method for coordinating logical data replication with highly available data replication
US20070124341A1 (en) 2003-02-10 2007-05-31 Lango Jason A System and method for restoring data on demand for instant volume restoration
US20040215656A1 (en) 2003-04-25 2004-10-28 Marcus Dill Automated data mining runs
US20050010550A1 (en) 2003-05-27 2005-01-13 Potter Charles Mike System and method of modelling of a multi-dimensional data source in an entity-relationship model
US20040268175A1 (en) 2003-06-11 2004-12-30 Eternal Systems, Inc. Transparent TCP connection failover
US20050193235A1 (en) * 2003-08-05 2005-09-01 Miklos Sandorfi Emulated storage system
US20050071392A1 (en) * 2003-08-05 2005-03-31 Miklos Sandorfi Emulated storage system
US7430647B2 (en) * 2003-08-05 2008-09-30 Sepaton, Inc. Emulated storage system
US20050108486A1 (en) * 2003-08-05 2005-05-19 Miklos Sandorfi Emulated storage system supporting instant volume restore
US7146476B2 (en) * 2003-08-05 2006-12-05 Sepaton, Inc. Emulated storage system
US20050044197A1 (en) 2003-08-18 2005-02-24 Sun Microsystems, Inc. Structured methodology and design patterns for web services
US7512835B2 (en) 2003-09-29 2009-03-31 International Business Machines Corporation Method, system and article of manufacture for recovery from a failure in a cascading PPRC system
US20080091894A1 (en) * 2003-11-13 2008-04-17 Commvault Systems, Inc. Systems and methods for combining data streams in a storage operation
US7315923B2 (en) * 2003-11-13 2008-01-01 Commvault Systems, Inc. System and method for combining data streams in pipelined storage operations in a storage network
US20050246510A1 (en) * 2003-11-13 2005-11-03 Retnamma Manoj V System and method for combining data streams in pipelined storage operations in a storage network
US20050228937A1 (en) 2003-11-26 2005-10-13 Veritas Operating Corporation System and method for emulating operating system metadata to provide cross-platform access to storage volumes
US7103740B1 (en) * 2003-12-31 2006-09-05 Veritas Operating Corporation Backup mechanism for a multi-class file system
US7197520B1 (en) * 2004-04-14 2007-03-27 Veritas Operating Corporation Two-tier backup mechanism
US7266655B1 (en) * 2004-04-29 2007-09-04 Veritas Operating Corporation Synthesized backup set catalog
US20050246345A1 (en) 2004-04-30 2005-11-03 Lent Arthur F System and method for configuring a storage network utilizing a multi-protocol storage appliance
US20050268145A1 (en) 2004-05-13 2005-12-01 International Business Machines Corporation Methods, apparatus and computer programs for recovery from failures in a computing environment
US20050278391A1 (en) * 2004-05-27 2005-12-15 Spear Gail A Fast reverse restore
US7461100B2 (en) * 2004-05-27 2008-12-02 International Business Machines Corporation Method for fast reverse restore
US7523348B2 (en) 2004-09-09 2009-04-21 Microsoft Corporation Method and system for monitoring and managing archive operations
US7756833B2 (en) * 2004-09-22 2010-07-13 Microsoft Corporation Method and system for synthetic backup and restore
US20060064444A1 (en) * 2004-09-22 2006-03-23 Microsoft Corporation Method and system for synthetic backup and restore
US7725438B1 (en) * 2005-01-31 2010-05-25 Veritas Operating Corporation Method and apparatus for efficiently creating backup files
US20060218210A1 (en) 2005-03-25 2006-09-28 Joydeep Sarma Apparatus and method for data replication at an intermediate node
US7721117B2 (en) * 2005-04-01 2010-05-18 Sherer W Paul Stream control failover utilizing an attribute-dependent protection mechanism
US20060248212A1 (en) * 2005-04-01 2006-11-02 Sherer W P Stream control failover utilizing the sharing of state information within a logical group of stream servers
US20060248213A1 (en) * 2005-04-01 2006-11-02 Sherer W P Stream control failover utilizing an attribute-dependent protection mechanism
US7568124B2 (en) * 2006-06-02 2009-07-28 Microsoft Corporation Driving data backups with data source tagging

Non-Patent Citations (32)

* Cited by examiner, † Cited by third party
Title
Adams K. Geographically Distributed System for Catastrophic Recovery. LISA 2002. Nov. 3-8, 2002, pp. 47-64.
Adams, K. 2002. Geographically Distributed System for Catastrophic Recovery. Proceedings of LISA '02: Sixteenth Systems Administration Conference (Nov. 2002), 47-64. *
Form EPA/2906, PCT/US06/002405, "Examination", 5 pgs.
Form EPO/1503, PCT/US06/002405, "Supplementary European Search Report", 3 pgs.
Form EPO/1507S, PCT/US06/002405, "EPO Communication" 1 pg.
Form EPO/P0459, PCT/US06/002405, "Annex to the European Search Report", 1 pg.
Form PCT/ISA/210, PCT/US06/02405, "PCT International Search Report," 2 pgs.
Form PCT/ISA/210, PCT/US06/30927, "PCT International Search Report," 2 pgs.
Form PCT/ISA/210, PCT/US06/30928, "PCT International Search Report," 3 pgs.
Form PCT/ISA/210, PCT/US06/38260, "PCT International Search Report," 2 pgs.
Form PCT/ISA/210, PCT/US06/38291, "PCT International Search Report," 2 pgs.
Form PCT/ISA/220, PCT/US06/02405, "PCT Notification of Transmittal of the International Search Report and the Written Opinion of the International Searching Authority, or the Declaration," 1 pg.
Form PCT/ISA/220, PCT/US06/30927, "PCT Notification of Transmittal of the International Search Report and the Written Opinion of the International Searching Authority, or the Declaration," 1 pg.
Form PCT/ISA/220, PCT/US06/30928, "PCT Notification of Transmittal of the International Search Report and the Written Opinion of the International Searching Authority, or the Declaration," 1 pg.
Form PCT/ISA/220, PCT/US06/38260, "PCT Notification of Transmittal of the International Search Report and the Written Opinion of the International Searching Authority, or the Declaration," 1 pg.
Form PCT/ISA/220, PCT/US06/38291, "PCT Notification of Transmittal of the International Search Report and the Written Opinion of the International Searching Authority, or the Declaration," 1 pg.
Form PCT/ISA/237, PCT/US06/02405, "PCT Written Opinion of the International Searching Authority," 4 pgs.
Form PCT/ISA/237, PCT/US06/30927, "PCT Written Opinion of the International Searching Authority," 7 pgs.
Form PCT/ISA/237, PCT/US06/30928, "PCT Written Opinion of the International Searching Authority," 6 pgs.
Form PCT/ISA/237, PCT/US06/38260, "PCT Written Opinion of the International Searching Authority," 3 pgs.
Form PCT/ISA/237, PCT/US06/38291, "PCT Written Opinion of the International Searching Authority," 3 pgs.
Form/IB/373, PCT/US06/038260, "International Report on Patentability," 1 pg.
Form/IB/373, PCT/US06/038291, "International Report on Patentability," 1 pg.
Gait, J. 1991. Stability, Availability, and Response in Network File Service. IEEE Trans. Softw. Eng. 17, 2 (Feb. 1991), 133-140. DOI= http://dx.doi.org/10.1109/32.67594.
Gait, J. 1991. Stability, Availability, and Response in Network File Service. IEEE Trans. Softw. Eng. 17, 2 (Feb. 1991), 133-140. DOI= http://dx.doi.org/10.1109/32.67594. *
Kupsys, A.; Ekwall R., "Architectural Issues of JMS Compliant Group Communication," Network Computing and Applications, Fourth IEEE International Symposium on Volume, Issue, Jul. 27-29, 2005 pp. 139-148.
Mimosa Systems: Mimosa Architecture Internet, [Online] Oct. 29, 2004, pp. 1-5, XP002519438 Retrieved from the Internet: URL:http://web.archive.org/web/20041029172122/www.mimosasystems.com/arch.htm> [retrieved on Mar. 11, 2009] * p. 4, line 5- line 24 *.
Narasimhan P. et al. "Eternal- A component-based framework for transparent fault-tolerant CORBA". In Software-Practice and Experience. vol. 32, No. 8, pp. 771-788. Jul. 10, 2002.
Rosenblum, M. and Ousterhout, J. K. 1992. The design and implementation of a log-structured file system. ACM Trans. Comput. Syst. 10, 1 (Feb. 1992), 26-52. DOI= http://doi.acm.org/10.1145/146941.146943.
Rosenblum, M. and Ousterhout, J. K. 1992. The design and implementation of a log-structured file system. ACM Trans. Comput. Syst. 10, 1 (Feb. 1992), 26-52. DOI= http://doi.acm.org/10.1145/146941.146943. *
Spurzem B: "Advantages of Mimosa NearPoint for Email Archival" Internet, [Online] Jan. 2005, pp. 1-14, XP002519383 Retrieved from the Internet: URL:http://www.Flexnet.com/Exchange-Email-Archiving-and-Compliance/NearPoint%20E-mail%20Archive%20for%20Exchange.pdf> [retrieved on Mar. 11, 2009] * p. 3, line 1-17 * * p. 9, line 1-p. 11, line 8 * * figure 1 *.
Spurzem B: "Mimosa NearPoint for Microsoft Exchange Server" Internet [Online] Jan. 2005, pp. 1-18, XP002519249 Retrieved from the Internet: URL:http://www.Flexnet.com/Exhange-Email-Archiving-and-Compliance/NearPoint%20Architecture%20for%20Exchange.pdf> [retrieved on Mar. 11, 2009] * p. 5, line 18-line 20 * * p. 6, line 2-line 5 * * p. 6, line 16-line 26 * * p. 7, line 22-line 33 * * p. 8, line 4-line 15 * * p. 8, line 25-line 28 * * figure 2 *.

Cited By (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11809432B2 (en) 2002-01-14 2023-11-07 Awemane Ltd. Knowledge gathering system based on user's affinity
US8812433B2 (en) 2005-02-07 2014-08-19 Mimosa Systems, Inc. Dynamic bulk-to-brick transformation of data
US8918366B2 (en) 2005-02-07 2014-12-23 Mimosa Systems, Inc. Synthetic full copies of data and dynamic bulk-to-brick transformation
US8799206B2 (en) 2005-02-07 2014-08-05 Mimosa Systems, Inc. Dynamic bulk-to-brick transformation of data
US9275086B2 (en) 2012-07-20 2016-03-01 Commvault Systems, Inc. Systems and methods for database archiving
US9659076B2 (en) 2012-07-20 2017-05-23 Commvault Systems, Inc. Systems and methods for database archiving
US9766987B2 (en) 2013-01-11 2017-09-19 Commvault Systems, Inc. Table level database restore in a data storage system
US9846620B2 (en) 2013-01-11 2017-12-19 Commvault Systems, Inc. Table level database restore in a data storage system
US10997038B2 (en) 2013-01-11 2021-05-04 Commvault Systems, Inc. Table level database restore in a data storage system
US11726887B2 (en) 2013-01-11 2023-08-15 Commvault Systems, Inc. Table level database restore in a data storage system
US9720787B2 (en) 2013-01-11 2017-08-01 Commvault Systems, Inc. Table level database restore in a data storage system
US11023334B2 (en) 2013-01-11 2021-06-01 Commvault Systems, Inc. Table level database restore in a data storage system
US11321281B2 (en) 2015-01-15 2022-05-03 Commvault Systems, Inc. Managing structured data in a data storage system
US11436096B2 (en) 2015-01-21 2022-09-06 Commvault Systems, Inc. Object-level database restore
US11042449B2 (en) 2015-01-21 2021-06-22 Commvault Systems, Inc. Database protection using block-level mapping
US10108687B2 (en) 2015-01-21 2018-10-23 Commvault Systems, Inc. Database protection using block-level mapping
US10891199B2 (en) 2015-01-21 2021-01-12 Commvault Systems, Inc. Object-level database restore
US10223212B2 (en) 2015-01-21 2019-03-05 Commvault Systems, Inc. Restoring archived object-level database data
US10223211B2 (en) 2015-01-21 2019-03-05 Commvault Systems, Inc. Object-level database restore
US11030058B2 (en) 2015-01-21 2021-06-08 Commvault Systems, Inc. Restoring archived object-level database data
US11755424B2 (en) 2015-01-21 2023-09-12 Commvault Systems, Inc. Restoring archived object-level database data
US11119865B2 (en) 2015-01-21 2021-09-14 Commvault Systems, Inc. Cross-application database restore
US11630739B2 (en) 2015-01-21 2023-04-18 Commvault Systems, Inc. Database protection using block-level mapping
US10210051B2 (en) 2015-01-21 2019-02-19 Commvault Systems, Inc. Cross-application database restore
US10191819B2 (en) 2015-01-21 2019-01-29 Commvault Systems, Inc. Database protection using block-level mapping
US11573859B2 (en) 2015-04-21 2023-02-07 Commvault Systems, Inc. Content-independent and database management system-independent synthetic full backup of a database based on snapshot technology
US10303550B2 (en) 2015-04-21 2019-05-28 Commvault Systems, Inc. Content-independent and database management system-independent synthetic full backup of a database based on snapshot technology
US9904598B2 (en) 2015-04-21 2018-02-27 Commvault Systems, Inc. Content-independent and database management system-independent synthetic full backup of a database based on snapshot technology
US10860426B2 (en) 2015-04-21 2020-12-08 Commvault Systems, Inc. Content-independent and database management system-independent synthetic full backup of a database based on snapshot technology
US11580173B2 (en) 2017-12-08 2023-02-14 Palantir Technologies Inc. Systems and methods for using linked documents
US11269732B2 (en) 2019-03-12 2022-03-08 Commvault Systems, Inc. Managing structured data in a data storage system
US11816001B2 (en) 2019-03-12 2023-11-14 Commvault Systems, Inc. Managing structured data in a data storage system
US11921796B2 (en) 2023-02-13 2024-03-05 Palantir Technologies Inc. Systems and methods for using linked documents

Also Published As

Publication number Publication date
US20070233756A1 (en) 2007-10-04

Similar Documents

Publication Publication Date Title
US8271436B2 (en) Retro-fitting synthetic full copies of data
EP2052337B1 (en) Retro-fitting synthetic full copies of data
US8918366B2 (en) Synthetic full copies of data and dynamic bulk-to-brick transformation
US8543542B2 (en) Synthetic full copies of data and dynamic bulk-to-brick transformation
US8799206B2 (en) Dynamic bulk-to-brick transformation of data
US8812433B2 (en) Dynamic bulk-to-brick transformation of data
US20070143366A1 (en) Retro-fitting synthetic full copies of data
US8060889B2 (en) Method and system for real-time event journaling to provide enterprise data services
US20030208511A1 (en) Database replication system
US8909604B1 (en) Methods for returning a corrupted database to a known, correct state by selectively using redo and undo operations
US7788521B1 (en) Method and system for virtual on-demand recovery for real-time, continuous data protection
US7680834B1 (en) Method and system for no downtime resychronization for real-time, continuous data protection
US7096392B2 (en) Method and system for automated, no downtime, real-time, continuous data protection
EP1602042B1 (en) Database data recovery system and method
US7519870B1 (en) Method and system for no downtime, initial data upload for real-time, continuous data protection
US11494271B2 (en) Dynamically updating database archive log dependency and backup copy recoverability
US20220121524A1 (en) Identifying database archive log dependency and backup copy recoverability

Legal Events

Date Code Title Description
AS Assignment

Owner name: MIMOSA SYSTEMS, INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:D'SOUZA, ROY P.;RAVI, T.M.;REEL/FRAME:018920/0417;SIGNING DATES FROM 20070207 TO 20070208

AS Assignment

Owner name: SILICON VALLEY BANK, CALIFORNIA

Free format text: SECURITY AGREEMENT;ASSIGNOR:MIMOSA SYSTEMS, INC.;REEL/FRAME:023210/0261

Effective date: 20090828

Owner name: SILICON VALLEY BANK (AGENT), CALIFORNIA

Free format text: SECURITY AGREEMENT;ASSIGNOR:MIMOSA SYSTEMS, INC.;REEL/FRAME:023210/0340

Effective date: 20090828

AS Assignment

Owner name: MIMOSA SYSTEMS, INC., CALIFORNIA

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:SILICON VALLEY BANK;REEL/FRAME:024037/0188

Effective date: 20100219

AS Assignment

Owner name: MIMOSA SYSTEMS, INC., MASSACHUSETTS

Free format text: RELEASE;ASSIGNOR:SILICON VALLEY BANK (AGENT);REEL/FRAME:024933/0817

Effective date: 20100901

Owner name: MIMOSA SYSTEMS, INC., MASSACHUSETTS

Free format text: RELEASE;ASSIGNOR:SILICON VALLEY BANK;REEL/FRAME:024933/0819

Effective date: 20100901

STCF Information on status: patent grant

Free format text: PATENTED CASE

AS Assignment

Owner name: HEWLETT-PACKARD COMPANY, CALIFORNIA

Free format text: MERGER;ASSIGNOR:MIMOSA SYSTEMS, INC.;REEL/FRAME:036002/0899

Effective date: 20141101

AS Assignment

Owner name: HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P., TEXAS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:HEWLETT-PACKARD COMPANY;REEL/FRAME:036737/0587

Effective date: 20150929

AS Assignment

Owner name: HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP, TEXAS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P.;REEL/FRAME:037079/0001

Effective date: 20151027

FPAY Fee payment

Year of fee payment: 4

FEPP Fee payment procedure

Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

LAPS Lapse for failure to pay maintenance fees

Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STCH Information on status: patent discontinuation

Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362

FP Lapsed due to failure to pay maintenance fee

Effective date: 20200918