|Publication number||US20070094312 A1|
|Publication type||Application|
|Application number||US 11/638,253|
|Publication date||Apr 26, 2007|
|Filing date||Dec 13, 2006|
|Priority date||May 7, 2004|
|Also published as||EP1745362A2, EP1745362A4, US8108429, US20050262097, WO2005111788A2, WO2005111788A3|
|Publication number||11638253, 638253, US 2007/0094312 A1, US 2007/094312 A1, US 20070094312 A1, US 20070094312A1, US 2007094312 A1, US 2007094312A1, US-A1-20070094312, US-A1-2007094312, US2007/0094312A1, US2007/094312A1, US20070094312 A1, US20070094312A1, US2007094312 A1, US2007094312A1|
|Original assignee||Asempra Technologies, Inc.|
This application is a continuation-in-part of U.S. Ser. No. 11/123,994, filed May 6, 2005, which application was based on and claimed priority to U.S. Ser. No. 60/569,164, filed May 7, 2004.
This application is related to the following commonly-owned applications:
Ser. No. 10/841,398, filed May 7, 2004, titled “Method and system for automated, no downtime, real-time, continuous data protection,”
Ser. No. 10/842,286, filed May 10, 2004, titled “Method and system for real-time event journaling to provide enterprise data services,”
Ser. No. 10/863,117, filed Jun. 8, 2004, titled “Method and system for no downtime, real-time, continuous data protection,”
Ser. No. 10/862,971, filed Jun. 8, 2004, titled “Method and system for no downtime, resynchronization for real-time, continuous data protection,”
Ser. No. 11/185,313, filed Jul. 20, 2005, titled “Method and system for virtual on-demand recovery for real-time, continuous data protection,” and
Ser. No. 10/943,541, filed Sep. 17, 2004, titled “Method and system for data protection.”
1. Technical Field
The present invention relates generally to enterprise data management.
2. Background of the Related Art
A critical information technology (IT) problem is how to cost-effectively deliver network-wide data protection and rapid data recovery. In 2002, for example, companies spent an estimated $50 B worldwide managing data backup/restore and an estimated $30 B in system downtime costs. The “code red” virus alone cost an estimated $2.8 B in downtime, data loss, and recovery. The reason for these staggering costs is simple: traditional schedule-based tape and in-storage data protection and recovery approaches can no longer keep pace with rapid data growth, geographically distributed operations, and the real-time requirements of 24×7×365 enterprise data centers.
Although many enterprises have embarked on availability and recovery improvement programs, many of these programs have been focused on the redundancy of the infrastructure, not on the data itself. Yet, without data availability, applications cannot be available.
Today's legacy data protection and recovery solutions are highly fragmented across a wide variety of applications, systems, and storage models. The overhead and data management maze that existing approaches bring to the network, storage, tape, and application infrastructure has caused increasing expenditures with little tangible returns for the enterprise. Worse, manual recovery techniques compound the problem with the same issues that cause downtime in the first place—human errors and process issues constitute 80% of unplanned downtime.
As a result, businesses are enduring high costs, high risk, and a constant drag on productivity. A recent survey by Aberdeen highlights IT managers' top data storage problems: managing backup and restore (78%), deploying disaster recovery (80%), and delivering required service levels (60%).
One recently-introduced technique for addressing the complex problem of providing heterogeneous, enterprise-wide data management is illustrated in
As described in co-pending application Ser. No. 10/841,398, the DMS system associates a “host driver” 128 with one or more of the application(s) running in the application servers 116 to transparently and efficiently capture the real-time, continuous history of all (or substantially all) transactions and changes to data associated with such application(s) across the enterprise network. This facilitates real-time, so-called “application aware” protection, with substantially no data loss, to provide continuous data protection and other data services including, without limitation, data distribution, data replication, data copy, data access, and the like. In operation, a given host driver 128 intercepts data events between an application and its primary data storage, and it may also receive data and application events directly from the application and database. The host driver 128 may be embedded in the host application server 116 where the application resides; alternatively, the host driver is embedded in the network on the application data path. By intercepting data through the application, fine grain (but opaque) data is captured to facilitate the data service(s). To this end, and as also illustrated in
As described in co-pending application Ser. No. 11/123,994, each DMS node executes an object runtime environment. This object runtime environment includes an object manager that manages the lifecycle of all the DMS objects during runtime. The object manager creates DMS objects, and the object manager saves them in the shared storage. The objects continually undergo modification as the system protects data in the enterprise's primary storage. In an illustrative embodiment, the system automatically creates a trail of objects called versions; typically, the versions do not actually exist on primary storage, outside of the data management system. The DMS manages the creation, storage, display, recovery to primary storage, deletion (automatic via policy, or manual) and the like, of these versions. The host drivers protect data into the continuous object data store. Using this architecture, data in primary storage can be recovered to any point-in-time.
A data management method is provided for storing a real-time history of a file system, or a component thereof, such as a directory or a file. The real-time history is stored as an object-oriented logical representation comprising at least a set of version metadata objects, and a set of one or more links that associate given objects of the set of version metadata objects. As one or more events occur in the real-time history, the logical representation is restructured dynamically. The logical representation is useful to provide any point-in-time reconstruction of the file system component on an as-needed basis.
The object-oriented logical representation is advantageous as it provides an efficient way to index a real-time history of changing data in a file system. This representation preferably is constructed over a database, such as a relational database, a raw file system, or any other set of one or more storage devices.
The foregoing has outlined some of the more pertinent features of the invention. These features should be construed to be merely illustrative. Many other beneficial results can be attained by applying the disclosed invention in a different manner or by modifying the invention as will be described.
For a more complete understanding of the present invention and the advantages thereof, reference is now made to the following descriptions taken in conjunction with the accompanying drawings, in which:
As described in U.S. Ser. Nos. 10/841,398 and 11/123,994, the DMS provides real time data services, such as continuous data protection, data replication, data distribution, any-point-in-time recovery, and any-point-in-time snapshot. To support these services, preferably the DMS host driver resides in an application host or the network, monitoring and capturing application events and data changes in real time, and then processing and forwarding actual data changes, events, and metadata to a DMS node. The host driver preferably performs delta reduction (e.g., to extract byte level changes), identifies metadata changes such as access control, detects application checkpoint events, and then forwards this information as a stream to a DMS node in a DMS cluster. A DMS cluster is a group of DMS nodes that share a storage module. These nodes work as a cooperative unit. Preferably, they obey a set of access rules such as acquiring locks of different classes, and they honor the access locks of the others so as to perform parallel access to the storage module. These nodes also monitor the health of one another and, when one node fails, the other nodes preferably repair any partially modified or corrupted data that may be caused by the failure, and take over the tasks of the failed node.
The DMS nodes are the entities that provide real-time data services. When providing continuous data protection and data distribution as a subscriber, the nodes take incoming data streams, preferably translate the streams into an object-oriented data structure (or another structure that provides similar data associations and indices), and save the data in a storage module that is referred to herein as an object store. The object store is designed for the purpose of managing real-time continuous history. When providing data replication, data recovery, and generating a snapshot, the DMS node navigates its object store, reconstructs a desired point-in-time data object, and forms outbound data streams that are then delivered to target nodes or host machines. To provide continuous replication, after replicating a point-in-time data object, the DMS node also forwards, to a remote DMS or a remote host server, a continuous redo log of the objects (in the form of a real-time event journal). A goal of the DMS is to store fine grain and real-time data history. Thus, the DMS object store is designed to track fine grain data changes without using excessive storage. The DMS preferably also indexes by time and events all fine grain objects, application checkpoints, and metadata globally across DMS clusters. The events may include any object events such as email arrival, transaction activity, a file open, a file modification, a file close, a directory change, an application checkpoint, a system upgrade, a virus detection (such as from an external network service), a business event tag, or the like.
The DMS nodes create distributed object storage to provide the necessary real-time data management services. The objects created by the DMS nodes are sometimes referred to herein as active objects. The active objects at any moment in time may be dormant in the storage or instantiated by the DMS nodes to handle requests and to perform activities. The details of active objects are discussed in the following sections.
The distributed object store can be built above raw storage devices, a traditional file system, a special purpose file system, a clustered file system, a database, and so on. Preferably, DMS chooses to build the distributed object store over a special purpose file system for storage and access efficiency. The files in the special purpose file system and the active objects in the DMS preferably are all addressed by a (e.g., 128 bit) global unique identifier (GUID). During runtime, a GUID can be de-referenced to a physical address in a storage device. By doing so, this allows the object store to scale beyond a single storage device, such that an object (1) in a device (A) can refer to another object (2) in device (B), e.g., by referring to the GUID of object (2).
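By way of illustration only, the GUID-based addressing described above can be sketched as follows. This is a minimal sketch, not the DMS implementation: the class names (`PhysicalAddress`, `StoredObject`, `ObjectStore`), the device labels, and the byte offsets are all hypothetical, and Python's `uuid.uuid4()` merely stands in for any 128 bit global unique identifier.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PhysicalAddress:
    device: str  # hypothetical device label such as "A" or "B"
    offset: int  # hypothetical byte offset on that device


@dataclass
class StoredObject:
    guid: uuid.UUID
    name: str
    links: list = field(default_factory=list)  # GUIDs of other objects


class ObjectStore:
    """Toy store spanning several devices; all names are illustrative."""

    def __init__(self):
        self._address_of = {}  # GUID -> PhysicalAddress
        self._devices = {}     # device label -> {offset: StoredObject}

    def put(self, obj, device, offset):
        self._devices.setdefault(device, {})[offset] = obj
        self._address_of[obj.guid] = PhysicalAddress(device, offset)

    def locate(self, guid):
        """De-reference a GUID to a physical address at runtime."""
        return self._address_of[guid]

    def dereference(self, guid):
        addr = self.locate(guid)
        return self._devices[addr.device][addr.offset]


store = ObjectStore()
object_2 = StoredObject(uuid.uuid4(), "object 2")
# Object (1) refers to object (2) only by GUID, never by device or offset.
object_1 = StoredObject(uuid.uuid4(), "object 1", links=[object_2.guid])
store.put(object_1, "A", 0)
store.put(object_2, "B", 4096)

linked = store.dereference(object_1.links[0])
print(linked.name, store.locate(linked.guid).device)
```

Because object (1) holds only the GUID of object (2), object (2) could be moved to a different device by updating the address map alone, without touching object (1).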
Preferably, each DMS node executes an object runtime environment. This object runtime environment includes an object manager that manages the lifecycle of all the DMS objects during runtime. The object manager creates DMS objects, namely the active objects, and the object manager saves them in the shared storage. When requested, the object manager loads an existing active object from the storage, and then routes object requests directly to the instantiated active object. Once an active object is created or loaded (instantiated) into the memory, it is responsible for executing requests routed from the object manager. The object manager may perform authentication and/or authorization before allowing any access to an active object. An active object, upon request, may update its internal information, execute an object specific program, and terminate itself from the runtime environment. Both the object manager and the active objects are responsible for acquiring shared locks as necessary so that all the nodes can have parallel access to the same objects. The object manager is also responsible for permanently removing active objects from the shared storage when requested. In a non object-oriented embodiment, an object manager may not be required. Also, while the use of an object manager is preferred, it is also possible to implement the present invention by storing the object information in a different manner while still achieving similar results, and the processes and object functions that change the information may be modified accordingly.
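The object manager lifecycle described above (create, save, load on request, route, remove) can be illustrated with a small sketch. Everything here is an assumption made for illustration: the class and method names are hypothetical, a dictionary stands in for the shared storage, the authorization hook is a plain callable, and the shared locking that the text requires is omitted for brevity.

```python
import uuid


class ActiveObject:
    def __init__(self, guid, properties):
        self.guid = guid
        self.properties = dict(properties)

    def handle(self, request, *args):
        """Execute a request routed from the object manager."""
        if request == "get":
            return self.properties.get(args[0])
        if request == "set":
            self.properties[args[0]] = args[1]
            return None
        raise ValueError(f"unknown request: {request}")


class ObjectManager:
    def __init__(self, authorize=lambda user, guid: True):
        self._storage = {}  # stand-in for shared storage: GUID -> properties
        self._loaded = {}   # active objects currently instantiated in memory
        self._authorize = authorize

    def create(self, properties):
        guid = uuid.uuid4()
        self._loaded[guid] = ActiveObject(guid, properties)
        self._storage[guid] = dict(properties)
        return guid

    def _load(self, guid):
        # Instantiate a dormant object from storage on first use.
        if guid not in self._loaded:
            self._loaded[guid] = ActiveObject(guid, self._storage[guid])
        return self._loaded[guid]

    def route(self, user, guid, request, *args):
        if not self._authorize(user, guid):
            raise PermissionError(f"{user} may not access {guid}")
        obj = self._load(guid)
        result = obj.handle(request, *args)
        self._storage[guid] = dict(obj.properties)  # save updated state
        return result

    def unload(self, guid):
        """Terminate the in-memory instance; the object stays in storage."""
        self._loaded.pop(guid, None)

    def remove(self, guid):
        """Permanently remove the object from storage."""
        self._loaded.pop(guid, None)
        del self._storage[guid]
```

A caller would typically hold only the GUID returned by `create` and send every later request through `route`, so that authorization and persistence happen in one place.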
Preferably, an instance of an active object has a set of properties, with each property having a label and value pair. For example, an active object may have one property labeled “name” with an associated value of “The design of a PC,” and another property labeled “content” whose associated value is a binary blob. A property has a value type definition; for example, the value of the “name” property is a string, and the value of the “content” property is an opaque binary chunk of data.
For example, when DMS protects a file from a server, the DMS active object for the file may have a list of representative properties such as shown below:
ObjectClass: <a 128 bit file object class identifier>
ObjGUID: <a 128 bit unique identifier for this object>
Creator: <a user identifier>
ExternalCreationDateTime: <a time stamp>
DMSCreationDateTime: <a time stamp>
Name: <a string>
ParentObject: <a GUID of a directory object>
ACL: <an object GUID>
Version: <integer or time stamp>
ExternalModifiedDateTime: <a time stamp>
DMSModifiedDateTime: <a time stamp>
DMSTerminationDateTime: <a time stamp>
ModifiedBy: <a user identifier>
Company: <a string>
Department: <a string>
Title: <a string>
Subject: <a string>
Keywords: <a string>
Comments: <a string>
Content: <a random binary blob>
In the context of a traditional file system, preferably all properties besides the “content” property are classified as metadata whereas, in the DMS, preferably all properties including the “content” itself are managed as metadata. Preferably, the DMS active objects store metadata from the protected server as well as metadata generated by the DMS itself. From the point of view of a DMS active object, all the properties are metadata, including the binary content from the external world; binary content is just a specific property type (random access binary blob type).
A property on an active object preferably also has specific attributes such as modifiable, modifiable-internal, read-able, version-able, single-value vs multi-value, inheritable, index, mandatory, replicate-able, and the like. Some object properties, such as ObjectClass, ObjGUID, Creator, ExternalCreationDateTime, and DMSCreationDateTime, do not change once the object is created, while the other properties can be modified. There are also properties, such as Version, DMSModifiedDateTime, and DMSTerminationDateTime, that are not modifiable by any external entity besides the Object Manager and the object itself.
The following is a table of possible property types:
|Property Type||Description|
|Integer||A number|
|String||A text string|
|GUID||Preferably a 128 bit global unique ID across all DMS nodes. This property stores the GUID of another object, which allows objects to be linked.|
|Constant||A set of related unique numbers that represent some specific information|
|Random access binary blob||Binary data for random access|
|Sequential access binary blob||Sequential records|
|Boolean||True/false|
|ComplexType||Combination of one or more of the above|
The following is a table of possible attributes for each property:
|Property attribute||Default||Description|
|Modifiable||True||Property, once created, can be modified by an internal or external request.|
|Modifiable-internal||True||Property, once created, can only be modified by the DMS internally.|
|Read-able||True||Property can be accessed by external request.|
|Version-able||True||Property can be versioned. For example, modification-date-time is a version-able property.|
|Multi-value||False||If Multi-value is True, the property has many values. For example, the children property of a directory object is a multi-value property. If Multi-value is False, the property is a single-value property. Single value is the default for all properties.|
|Inheritable||False||If True, when the object receives a request for the property and the property is not set, the object can request the value from its parent in the object hierarchy. For example, a file object may request a value from its directory object. By default, properties are not inheritable.|
|Index||False||If True, the DMS automatically generates a specific index for the property to accelerate search. Indeed, the entire object structure may be considered an indexing structure; however, by having a specific property index, the system allows for a focused and efficient search on that property. Once indexed, the object can be searched using the index of the property. The indexed properties are name, fingerprint, and the like. By default, a property is not specifically indexed. If a property is not indexed, preferably it is still searchable by an algorithm iterating through all the objects.|
|Replicate-able||True||If Replicate-able is True, the associated property is replicated when the object is replicated.|
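As a non-authoritative illustration of how such property attributes might be enforced, the following sketch models three of them (modifiable, read-able, and inheritable) with the defaults given above. The class names `Property` and `Node`, the method names, and the example labels `retention` and `owner` are hypothetical and are not part of the DMS.

```python
from dataclasses import dataclass


@dataclass
class Property:
    value: object = None
    modifiable: bool = True    # default True, per the attribute table
    readable: bool = True      # default True
    inheritable: bool = False  # default False


class Node:
    """An object in a hierarchy, e.g. a file whose parent is a directory."""

    def __init__(self, parent=None):
        self.parent = parent
        self.props = {}

    def define(self, label, **attributes):
        self.props[label] = Property(**attributes)

    def get(self, label):
        prop = self.props.get(label)
        if prop is None:
            return None
        if not prop.readable:
            raise PermissionError(f"{label} is not read-able")
        # An unset inheritable property is requested from the parent.
        if prop.value is None and prop.inheritable and self.parent is not None:
            return self.parent.get(label)
        return prop.value

    def set_external(self, label, value):
        prop = self.props[label]
        if not prop.modifiable:
            raise PermissionError(f"{label} is not modifiable")
        prop.value = value


directory = Node()
directory.define("retention", value="30d")

file_obj = Node(parent=directory)
file_obj.define("retention", inheritable=True)  # unset: falls back to parent
file_obj.define("owner")                        # unset and not inheritable
file_obj.define("ObjGUID", value="fixed-at-creation", modifiable=False)
```

Here `file_obj.get("retention")` is answered by the directory, while `file_obj.get("owner")` is simply unset because that property is not inheritable.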
To track real-time changes, some object properties are defined as version-able. In the DMS, an object data structure for tracking data history is as shown in
In DMS, preferably all the anchor and version metadata pages are combined together into a variable sized file. If desired, each one of the pages can be stored in a separate file, or in raw storage blocks. When stored in files, each file is also named by GUID. There are page GUID to file GUID mappings, and file GUID to physical address mappings, so that the physical data of an object can be retrieved. An object can be referenced by the GUID of its anchor page, or by the GUID of its version metadata page. When referred to by the GUID of its version metadata page, a point-in-time object is presented.
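The two-level mapping (page GUID to file GUID, then file GUID to physical address) and the difference between referencing an anchor page and a version metadata page can be sketched as below. This is only a sketch under stated assumptions: the `PageStore` class, the dictionary page layout, and in particular the idea that an anchor page carries a `latest` pointer to its newest version page are assumptions made for the example, not details taken from the text.

```python
import uuid


class PageStore:
    def __init__(self):
        self.page_to_file = {}     # page GUID -> file GUID
        self.file_to_address = {}  # file GUID -> (device, offset)
        self.pages = {}            # page GUID -> page contents

    def add_page(self, contents, file_guid, address):
        page_guid = uuid.uuid4()
        self.pages[page_guid] = contents
        self.page_to_file[page_guid] = file_guid
        self.file_to_address[file_guid] = address
        return page_guid

    def physical_address(self, page_guid):
        """Resolve page GUID -> file GUID -> physical address."""
        return self.file_to_address[self.page_to_file[page_guid]]

    def present(self, guid):
        """Present an object given an anchor or version page GUID."""
        page = self.pages[guid]
        if page["kind"] == "anchor":
            # Assumption: the anchor points at the newest version page.
            page = self.pages[page["latest"]]
        return page["properties"]


store = PageStore()
file_guid = uuid.uuid4()  # all pages of the object share one file here
v1 = store.add_page({"kind": "version", "properties": {"Name": "a.txt"}},
                    file_guid, ("A", 0))
v2 = store.add_page({"kind": "version", "properties": {"Name": "b.txt"}},
                    file_guid, ("A", 0))
anchor = store.add_page({"kind": "anchor", "latest": v2}, file_guid, ("A", 0))
```

Referencing `v1` presents the object as it was at that version (a point-in-time object), whereas referencing `anchor` presents the current state under the assumption stated above.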
According to another aspect of the inventive DMS, an active object has a basic set of behaviors and some specific set of behaviors that are schema dependent and may be created specifically for the class definition. The following are examples of interfaces that initiate specific object activities. In particular, a basic set of behaviors that may be initiated by the interface for life cycle management, getting and setting attributes:
The above object interfaces are a representative subset of the actual basic object behaviors of the DMS. They are merely illustrative of the functional behavior of the active objects. If desired, an object class may define its own set of specific behaviors.
In DMS, and as will be described in more detail below, preferably there are many data source active object classes, for example, a directory object, a file object, a database object, and the like.
Active object binary data management is designed for managing history of random access binary blob property type. As shown in
For the active objects to manage the history of a sequential access binary blob, such as database journal activities, a binary page structure of sequentially appended records can be used in the DMS. Records management is designed for managing the sequential access binary blob property type. There are three different types of record management, namely formatted records, unformatted records, and file object associated records. Formatted records are a sequence of well defined records; each record has a specific structure of fields, and each field has a well defined data type and length. A record schema (record structure definition) is defined for the formatted record property type. This type of record can be used to track SQL requests of a database history, or email activities of an email server history. A formatted record can also be used to track real-time events associated with any data object. Unformatted records are a sequence of binary record chunks. In this case, the record chunks may be appended continuously to a binary data page with a header that specifies the length of the record. Alternatively, records can be appended to a binary data page without a header, in which case the boundary of each record chunk is tracked separately. The boundary of unformatted records can be tracked using the formatted record management structure. This type of record can be used to track a sequential binary journal of a database or a sequential journal file of any application. The characteristic of this type of binary journal is that records are always appended to the end and never overwrite previous records. File object associated records are sequences of meta-information containing binary data updates to an associated file object. A file object associated record is used to track the location and length of each modification to a file object. Besides tracking the file modifications, a file object associated record can also be used to track checkpoints that occur with respect to the file.
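Two of the record types described above can be illustrated with a short sketch: unformatted records appended to a binary page behind a length header, and file object associated records that track the location and length of each modification. The 4-byte big-endian header, the `FileUpdate` class, and the replay function are hypothetical choices made for this example only.

```python
import struct
from dataclasses import dataclass

HEADER = struct.Struct(">I")  # assumed 4-byte big-endian length header


def append_record(page: bytearray, payload: bytes) -> int:
    """Append one unformatted record; earlier records are never overwritten."""
    offset = len(page)
    page.extend(HEADER.pack(len(payload)))
    page.extend(payload)
    return offset


def iter_records(page: bytearray):
    """Walk the page, using each header to find the record boundary."""
    pos = 0
    while pos < len(page):
        (length,) = HEADER.unpack_from(page, pos)
        pos += HEADER.size
        yield bytes(page[pos:pos + length])
        pos += length


@dataclass
class FileUpdate:
    """File object associated record: where a modification landed."""
    offset: int
    length: int
    data: bytes


def replay(base: bytes, updates) -> bytes:
    """Apply tracked modifications, in order, to reconstruct file content."""
    content = bytearray(base)
    for u in updates:
        end = u.offset + u.length
        if end > len(content):
            content.extend(b"\x00" * (end - len(content)))
        content[u.offset:end] = u.data
    return bytes(content)


journal = bytearray()
append_record(journal, b"BEGIN")
append_record(journal, b"UPDATE row 7")
append_record(journal, b"COMMIT")

updates = [FileUpdate(offset=6, length=5, data=b"there")]
restored = replay(b"hello world", updates)
```

Replaying only a prefix of the update list yields the file as it stood after those modifications, which is the sense in which such records support point-in-time reconstruction.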
Once an object schema is created, an active object instance can be created from the schema. The active object instance has the defined metadata and behavior. As in any object-oriented system, an object schema may be defined based on another object schema so that metadata and behaviors can be inherited, or so that coded functions can be reused. In DMS, a generic object is clsObject, which defines basic metadata such as name, creation date and time, creator, modification date and time, and so on. It also defines the basic object behavior. Preferably, other object schemas are defined based on clsObject (i.e., they inherit from clsObject). The object inheritance feature is an advantage of the object-oriented embodiment; however, it is not a limitation of the invention.
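The schema inheritance described above maps naturally onto class inheritance. The sketch below reuses the schema names `clsObject` and `clsFile` from the text, but the constructor signatures and the methods `set_name`, `describe`, and `size` are hypothetical and chosen only to show inherited metadata and behavior.

```python
from datetime import datetime, timezone


class clsObject:
    """Generic schema: basic metadata and basic behavior."""

    def __init__(self, name, creator):
        now = datetime.now(timezone.utc)
        self.name = name
        self.creator = creator
        self.created = now
        self.modified = now

    def set_name(self, name):
        self.name = name
        self.modified = datetime.now(timezone.utc)

    def describe(self):
        return f"{type(self).__name__}({self.name!r})"


class clsFile(clsObject):
    """Derived schema: inherits clsObject metadata, adds content."""

    def __init__(self, name, creator, content=b""):
        super().__init__(name, creator)
        self.content = content

    def size(self):
        return len(self.content)


report = clsFile("report.txt", "alice", b"abc")
```

The `clsFile` instance carries the name, creator, and timestamps defined by `clsObject` and can call `set_name` and `describe` without redefining them.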
Thus, to provide real-time data management services, DMS preferably defines a set of data management specific object schemas as shown in
The schema clsDMSSystem is a class for creating a DMS cloud active object 510 that represents the logical network of the entire DMS system (with multiple DMS clusters over multiple regions). Preferably, there is only one instance of clsDMSSystem in a DMS network, as it is the root object instance of the entire DMS network. Preferably, this object is used for tracking DMS regions 512 (each as an instance of a clsRegion schema as described below) and DMS functional groups that own data across regions 516 (each as an instance of a clsGroup schema as described below). The instance typically has a randomly assigned unique identifier. The instance preferably is created automatically by the DMS network when a first cluster is configured, i.e., it is created by a first node. This object instance is populated to all the storage clusters in the entire DMS network. Preferably, there is only one master copy of this object, which is the original copy, the one that was created first. When the properties of the instance change, the properties are populated to all replicas.
The schema clsRegion is a class for creating DMS region active objects 512 that represent and track a DMS cluster network, data network, and server network. Preferably, there is one instance of clsRegion in each physical location. An active object instance of clsRegion is used for tracking all the DMS clusters 530 (each as an instance of a clsCluster schema as described below), repositories 514 (each as an instance of a clsRepository schema as described below), and host servers 528 (each as an instance of a clsHost schema as described below) in the region. Because each region may have multiple storage clusters, the local instance of the clsRegion object is replicated to all the local storage clusters. The GUID of each instance of clsRegion is randomly assigned when created. Preferably, policies are encoded as properties in clsDMSSystem, clsRegion, clsRepository, clsGroup, and clsXXDataSource.
The schema clsRepository is a class for creating a DMS data container 514 for storing protected data sources. A repository instance may have sub-repository instances 514 and/or protected data sources 518. A root repository object that is directly under a region represents a segment of a data network. A repository may be a child of a region or a child of another repository. The child of a region is the root of a DMS data object hierarchy. The repository object provides regional data grouping and policy enforcement. The policies in a repository are executed against all the data sources within the scope of the repository. Alternatively, a separate policy object may be defined and used for storing policies explicitly in the data hierarchy. If policy object instances are used, they can be attached to any one of the data container object instances.
The schema clsXXDataSource is a class for creating data source instances 518. Preferably there are three basic data source schemas, clsFSDataSource, clsDatabaseDataSource, clsCompoundDataSource. If desired, there may be more schemas for application data other than the file system and one or more databases. An active object instance of a clsXXDataSource is a root container for a protected data source where a data source from a host is streamed. An instance of clsFSDataSource contains a file, a directory, or a volume of a file system and its history, while an instance of a clsDatabaseDataSource contains one or more databases and their history from a database server. An instance of a clsCompoundDataSource is a container for multiple data source instances. Unlike a repository that only provides logical containership, a compound data source instance preferably provides activity sequencing and indexing, as well as consistency marking to the real-time activities of its related group of data sources so that group consistency can be maintained.
The class clsFile is a schema for creating object instances for the DMS to store the information of a file 520 from a host server and also to track the history of that file in the host. An instance of a clsFile is similar to a file in a file system, except that an instance captures, indexes and manages file history. In DMS, this object is used for data protection, with each instance of clsFile used to represent an external file in an external host.
The class clsDirectory is a schema for creating object instances for the DMS to store the information of a directory 522 from a host server and also to track the history of that directory in the host. An instance of a directory simply represents a container of files and other sub-directories.
The class clsDatabase is a schema for creating object instances for the DMS to store the information of a database 524 within a database server, and also for tracking and indexing the history and checkpoints of that database in the server. This object is used to provide database protection services. An instance of a clsDatabase represents a continuous consistent image of a database, across a range of time, from an external host server.
The class clsJournalGroup is a schema for creating object instances 526 for the DMS to journal the redo and undo log (journal) activities of a database. The database journal activities may be a sequence of updates to a group of related journal log files, or application level transaction records.
The class clsRecordFile is a schema for creating object instances 527 for the DMS to track sequential journal entries within a journal group.
An active object instance of the clsHost is created whenever a host driver from a new host server first connects to the DMS network. This object allows the DMS to track the data services provided to the information on the host 528. This object also associates the protected data sources in the DMS to the data source on its host server. An instance of clsHost preferably contains information such as the platform of the host, the operating system, the host configuration, data sources that are protected from the host, DMS data sources that are replicated to the host, and the like. The protected or replicated data source properties preferably include the host path, the size of the sources in the host, the activities and statistical information about those data sources, and the GUID of the clsXXDataSource instance.
An active object instance of the clsDMSCluster schema represents a DMS cluster 530 with one or more DMS nodes and the DMS storage. This instance provides statistics and status information of its specific cluster. Typically, there is only one instance per storage cluster; thus the processes (e.g., the object runtime environment) of all the nodes use this instance as shared memory to keep information such as node availability, master election, and the like. Information about a DMS cluster (as instances of a clsDMSCluster), a DMS node (as instances of clsDMSNode), and DMS storage (as instances of clsDMSStorage) may be stored together with the other active objects or may be in a specific volume used exclusively by the cluster manager.
An active object instance of the clsDMSNode schema represents a DMS node 532 within a DMS cluster. This instance provides statistics and status information about the DMS node it represents. Preferably, the object runtime environment of a node is responsible for locating a cluster and joining that cluster. Once joined in a cluster, the runtime environment creates the clsDMSNode instance.
An active object instance of the clsDMSStorage schema represents the storage volumes 534 of a DMS cluster. This instance allows the DMS storage to be configured, and it also provides statistics and status information of the storage volumes.
An active object instance of the clsGroup schema is a data container that also represents a logical group 516 in an organization. This allows a user to map data sources from one or multiple repositories in one or more regions to a functional group of an organization. Its purpose is to enable an administrator or other permitted entity to assign data management policy across multiple regions.
As described above, in the
Similar to an instance of a clsPolicyProfile object, an active object instance of a clsPolicyOverride can be introduced that may also contain a subset of data management policies. When assigned to a data container or data object, the policies in the override object take precedence over the default policies in an assigned policy profile object.
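The precedence rule described above, in which an override's policies win over those of an assigned profile, can be sketched with a layered lookup. The function name `effective_policies` and the policy keys `retention_days` and `replicate` are hypothetical; the text does not enumerate actual policy names.

```python
from collections import ChainMap


def effective_policies(profile, override=None):
    """Merge policies so that override entries take precedence.

    ChainMap searches its mappings left to right, so a key present in
    the override hides the same key in the profile; keys absent from
    the override fall through to the profile defaults.
    """
    return dict(ChainMap(override or {}, profile))


profile = {"retention_days": 30, "replicate": False}  # default policies
override = {"retention_days": 90}                     # subset of policies

merged = effective_policies(profile, override)
```

Because the override holds only a subset of policies, `replicate` is still taken from the profile while `retention_days` comes from the override.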
The DMS object definition discussed above is merely one way of organizing the control, data and functional structure for the DMS to provide real-time data management services. One could easily reorganize the structure to achieve that goal, and the present invention is not limited to any specific organization. Thus, for example, clsRegion may be broken down into multiple hierarchies to represent local lines of business and departments, clsDMSCluster may include nodes and storage so as to eliminate the clsDMSNode and clsDMSStorage definitions, clsJournalGroup may be part of the clsDatabase definition, and so on. Also, as described above, the effect of this object-oriented hierarchy can be realized in different non object-oriented structures, such as a relational database. All of these variants are within the scope of the present invention.
Whenever a DMS host driver is installed into a host server, the host driver reports to the DMS, and this operation results in an instance of host object 528 being created in the DMS. As noted above, preferably a host object 528 contains information about the host server, such as the host OS and platform. Once a host object is created, IT administrators can identify host data to be protected, and then configure the host data to be protected. An IT administrator can also configure DMS protected data to be replicated to a host server. As noted above, the host active object refers to the data source(s) that are protected from its host server or data sources that are replicating to its host (as illustrated by the link between 518 and 528). The host objects in the DMS form an external host server network 506.
A region may have one or more DMS clusters, with all DMS clusters preferably tracked by the DMS via DMS cluster active objects 530. Each cluster has a representation active object that refers to the node active objects 532 and storage active objects 534. The cluster object also contains cluster statistic and status information. A node object 532 contains configuration information, statistics and status about the node. The storage object contains storage volume information, and storage group information. Volume information includes all the raw storage volumes that are provisioned to the DMS. It also includes the DMS partitioning of the raw storage volumes, and the assignment of the partitions to storage groups. In the DMS, a protected data source has its own storage space that is called a storage group. A storage group is an aggregated DMS storage partitions carved out from the volume groups. The cluster, storage, and node objects form a DMS server network 504.
File System Data History
The following section describes representative object schemas defined for protecting a file system according to the present invention. Preferably, and with reference to
In the context of a file system at a host server, all properties except the content are usually known as metadata. As described above, the DMS active objects store metadata from the protected host server as well as metadata generated by the DMS itself. In the DMS, all the properties are metadata, including the binary content from the external world. For a clsFile object, binary content is a “content” property with random access binary blob type.
This is a schema of a file system data source; as noted above, this object preferably serves as a container for the history of a protected file system or folder in a host. It is also a data service entity for managing inbound and outbound traffic for the file system, the policy management entity for its data, and a security guard for any access to the protected data history.
The properties of this object class typically include the configuration of the protected data source. The following table illustrates representative property examples of this object class:
|Properties of clsFSDataSource||Descriptions|
|ID||GUID|
|Name||Name of the data source|
|Parent||GUID of its parent container (a repository object)|
|DateTimeCreated||Timestamp when this data source container is created|
|Owner||The user ID of the creator|
|ACL||Key or GUID to the access control list of this object|
|DataSourceType||File system|
|RuntimeStates||Protecting, replicating, disconnected from host, and the like|
|Status||Active, archived|
|Master||GUID of the original protected data source (if this is a replica)|
|Replicas||GUID of the replicas that need input from this object|
|Host||GUID of the associated host object where the data source resides|
|HostPath||The path name in the host that is protected by this object|
|ProtectedDateTime||Timestamp when protection began|
|ArchivedDateTime||Timestamp when this data source became idle|
|Children||The root directories or files in this container|
|EventTags||List of entries with event and timestamp. The events are at the data source level and may be set by users.|
As can be seen, the above table is a subset of properties used in the DMS; one may add more or remove some of the above properties. For example, policy may be added, and one may not need RuntimeStates. In one embodiment, the properties of this object are not versioned so that the history of the object is not tracked. Alternatively, one can version some of the properties, such as HostPath, ProtectedDateTime, ArchivedDateTime, and Children, such that these configuration changes are recorded in time. When properties are versioned, it is desirable to track the version begin and end timestamps.
The clsDirectory schema is defined for tracking the history of a directory (folder) in a file system. This schema is used for protecting a directory, and it is capable of recovering a directory to any point-in-time in the past.
The properties of this object class preferably include the following:
|Properties of clsDirectory||Descriptions|
|Non-versioned Properties:|
|ID||GUID of this object instance, also GUID of this anchor page|
|DataSourceParent||GUID of its protection data source container|
|DateTimeCreated||Timestamp when this object is created (from the protected host)|
|Creator||The user ID of the creator|
|DateTimeTerminated||Timestamp when this object is deleted (from the protected host)|
|EventTags||List of entries with event and timestamp. The event tags may be set by users for tracking purposes.|
|FirstVersionID||GUID of the first version|
|LatestVersionID||GUID of the latest version|
|VersionCount||Total number of versions|
|Versioned Properties:|
|ID||The version GUID of this object, also GUID of this page|
|AnchorID||The GUID of the anchor page (i.e., the GUID of this object)|
|PreviousVersionID||The version GUID of the previous version|
|NextVersionID||The version GUID of the next version|
|Parent||GUID of parent object which can either be the data source container or another directory|
|Name||Name of the directory|
|DateTimeModified||Timestamp when the version is created (or when the modification occurs)|
|ModifiedBy||ID of a user who modified the directory|
|DateTimeEnded||Timestamp when this version is ended (i.e., a new modification results in another version being created and old version ended)|
|ACL||Key or GUID to the access control list of this object|
|ChildrenCount||Number of children|
|Children||A list of its children which can be sub-directories or files. Each entry is a reference to the version ID of a child.|
These properties are merely illustrative of what a directory active object contains. One may include more properties, such as a full path name, children name, policies, metadata from the host server, and more.
As can be seen, the above table shows the logical information of a directory object. Whenever a new child is added, an existing child is deleted, the directory is moved, the directory is renamed, or the name of one of its children has changed (if the child name is also captured in this object), a new logical version of the directory is created. Whenever a new version is created, preferably the previous version is terminated. The DateTimeModified of a new version must be the same as or later than the DateTimeEnded of the previous version. ModifiedBy indicates the user who made the modification to create the new version. If desired, an event tag may be added; preferably, it is a sequential list of entries each having a timestamp.
This logical layout may map directly into a physical store (e.g., overlaying a file system or a database), but this is not required. With respect to persistent storage, preferably DMS stores multiple logical versions of a directory in one physical store unit. Or, DMS may also combine multiple versions into one directory journal. For example, one physical storage unit of a directory object may contain the initial directory baseline information and all the changes to a directory within a period of time. A directory journal entry may be “version ID, DateTimeModified, add new child, child GUID, . . . ” and so on. Once that is done, a logical version of a directory can be constructed on demand (i.e., upon request) by applying the necessary activities to a baseline directory image. The present invention is not limited to any physical layout of the object in the physical store.
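By way of illustration only, the on-demand construction of a logical directory version from a baseline and a directory journal might be sketched as follows. The names JournalEntry and directory_at, the "add" and "remove" action strings, and the use of integer timestamps are assumptions made for this sketch and are not part of the described system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JournalEntry:
    """One directory journal entry (hypothetical layout)."""
    version: int    # logical version produced by this entry
    modified: int   # DateTimeModified, an integer timestamp for simplicity
    action: str     # "add" or "remove" (assumed action names)
    child: str      # GUID of the affected child


def directory_at(baseline, journal, t):
    """Rebuild the directory as of time t by replaying journal entries.

    Returns (version number, set of child GUIDs). The journal is assumed
    to be ordered by its modification timestamps.
    """
    children = set(baseline)
    version = 1  # the baseline image is treated as version 1
    for entry in journal:
        if entry.modified > t:
            break  # later activity is not part of the requested version
        if entry.action == "add":
            children.add(entry.child)
        elif entry.action == "remove":
            children.discard(entry.child)
        else:
            raise ValueError(f"unknown action: {entry.action}")
        version = entry.version
    return version, children


baseline = {"guid-a"}
journal = [
    JournalEntry(2, 10, "add", "guid-b"),
    JournalEntry(3, 20, "remove", "guid-a"),
]
print(directory_at(baseline, journal, 15))
```

In this sketch only the baseline and the journal are stored; no per-version directory page is materialized until a version is requested.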
In this example, in the first version 612, the directory object only has the version 1 of a subdirectory object, namely, DOO 617. At some point in time, the subdirectory object changed its name to WOO; thus, a version 2 of the subdirectory object was created and is represented by reference number 618. In this example, the directory active object tracks its children's name; thus, version 2 of FOO 613 was created to link to the version 2 of the subdirectory WOO 618. On version 3 of FOO 614, a new file of the name GOO under the directory FOO was created. This is illustrated by reference numeral 619. The links, shown as dotted arrows, are references of the CHILDREN property of the directory object. As can be seen, these links allow active objects to build up object relationships or object hierarchy (e.g., a parent-child relationship). The result is a logical view of the directory data structure. To conserve data storage usage, the directory version pages may be combined into a table or a journal, and the table or journal may be stored in a physical storage unit. For simplicity, it should be appreciated that the above diagram does not show the complete subdirectory and file objects (for example, the anchor pages of the file and subdirectory objects are not shown).
The clsFile schema is defined for tracking the history of a file from a file system. This schema is used for protecting a file, and it allows for the recovery of the protected file to any point-in-time in the past.
The properties of this object class preferably include the following:
|Properties of clsFile||Descriptions|
|Non-versioned Properties:|
|ID||GUID of this object instance, also the GUID of this anchor page|
|DataSourceParent||GUID of its protection data source container|
|DateTimeCreated||Timestamp when this object is created (from the protected host)|
|Creator||The user ID of the creator|
|DateTimeTerminated||Timestamp when this object is deleted (from the protected host)|
|AccessLog||List of entries with timestamp, user id, and access mode.|
|EventTags||List of entries with event and timestamp. The event tags may be set by users for tracking purposes.|
|FirstVersionID||GUID of the first version|
|LatestVersionID||GUID of the latest version|
|VersionCount||Total number of versions|
|Versioned Properties:|
|ID||The version GUID of this object, also GUID of this page|
|AnchorID||The GUID of the anchor page (the ID of this object)|
|PreviousVersionID||The version GUID of the previous version|
|NextVersionID||The version GUID of the next version|
|Parent||GUID of parent object which can either be the data source container or a directory|
|Name||Name of this file at this version|
|DateTimeModified||Timestamp when the version is created (or when the modification occurs)|
|ModifiedBy||ID of a user who modified the file|
|DateTimeEnded||Timestamp when this version is ended (i.e., a new modification results in another version being created and an old version ended)|
|Status||Consistency, DMS checkpoint, suspect|
|ACL||Key or GUID to the access control list of this object|
|Fingerprint||A hash key of the entire content (e.g. MD5, SHA-1, CRC, or the like)|
|Signatures||A sequence of hash keys each generated from a contiguous chunk of the content|
|Content||The sparse index of this version. Sparse index is byte level reference to the binary content. Binary contents are in baseline binary pages and delta pages.|
Additional metadata and information from the original document attributes
The above table is merely illustrative; other properties, such as a full path name, policies, and the like, may be included. As can be seen, the non-versioned properties preferably include a timestamp when the object is created, creator information, identification of an access journal for forensic purposes, and event tags for tracking user events across the time line. The versioned properties preferably include name, modification information, status, ACL, and content. Preferably, a new logical version is created when the name of the document changes, the content of the document changes, the ACL changes, document metadata or attributes change, or when the document is moved. When the document is deleted from the protected data source, the file object at the DMS is terminated (its DateTimeTerminated property is timestamped), and the last version is ended.
In the DMS file object, preferably each version has an associated property called “CONTENT,” and this property is of the type random access binary blob. The binary value of this property type may be stored inside or outside of the metadata page. In this example, the binary data of version 1 is in the binary page 716, which has its own GUID. The changes (deltas) that are made to the file for version 2 are stored as a sequence of forward deltas in the delta page 717. The changes (deltas) of version 3 are appended to the same delta page 717 or another new delta page. A file object may have one or multiple binary pages. The binary pages contain the baseline data. A file object also may have one or multiple delta pages for all its changes. The sparse index refers to the data in both the baseline and the deltas to make up the content for the version. The binary and delta pages may be stored in one physical storage unit; alternatively, the pages may be broken up and stored in multiple physical storage units. As one of ordinary skill will appreciate, the above-described example is simply one way in which DMS structures the binary data. Alternatively, each version may have its own binary pages so that no delta has to be kept. Yet another alternative is to store reverse deltas or multiple baselines at different versions with a combination of reverse and forward deltas.
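By way of illustration only, a sparse index over baseline and forward-delta pages might be sketched as follows. The Extent tuple, the page names, and the functions read_content, slice_index and write_delta are assumptions made for this sketch; the actual sparse index is described in Ser. No. 10/943,541 and may differ. The sketch covers only the forward-delta layout and does not support writes that leave holes past the end of the file.

```python
from typing import NamedTuple


class Extent(NamedTuple):
    """One sparse index entry: a byte range inside a binary or delta page."""
    page: str     # page identifier (short names stand in for GUIDs)
    offset: int   # byte offset inside that page
    length: int   # number of bytes referenced


def read_content(index, pages):
    """Assemble the content of one version from its sparse index."""
    return b"".join(pages[e.page][e.offset:e.offset + e.length] for e in index)


def slice_index(index, lo, hi):
    """Return the extents that cover file bytes [lo, hi)."""
    out, pos = [], 0
    for e in index:
        start, stop = max(lo, pos), min(hi, pos + e.length)
        if start < stop:
            out.append(Extent(e.page, e.offset + (start - pos), stop - start))
        pos += e.length
    return out


def write_delta(index, pages, delta_page, at, data):
    """Append data to a delta page and return the next version's index.

    The previous index is left untouched, so older versions stay readable.
    """
    total = sum(e.length for e in index)
    if at > total:
        raise ValueError("this sketch does not support holes")
    page = pages.setdefault(delta_page, bytearray())
    delta_offset = len(page)
    page += data  # forward delta is appended to the delta page
    return (slice_index(index, 0, at)
            + [Extent(delta_page, delta_offset, len(data))]
            + slice_index(index, at + len(data), total))


pages = {"binary": bytearray(b"hello world")}
v1 = [Extent("binary", 0, 11)]
v2 = write_delta(v1, pages, "delta", 6, b"there")
v3 = write_delta(v2, pages, "delta", 11, b"!")
print(read_content(v1, pages), read_content(v2, pages), read_content(v3, pages))
```

Each version keeps only an index; unchanged bytes continue to be read from the baseline page, which is how the layout avoids storing a full copy per version.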
This file object structure and the associated metadata allow DMS to track a file history, e.g., what information has changed at what time, by whom, through what event, and what meaningful events occurred to this object during the object's lifecycle. To conserve data storage usage, preferably the metadata of the version pages may be combined into a table or a journal, and preferably the table or journal may be stored in a physical storage unit. Also, this structure can be stored over a raw storage device, or by overlaying a file system or a database.
Although not a limitation of the invention, the file object may also optimize storage usage through a sparse index, as more particularly described in Ser. No. 10/943,541, filed Sep. 17, 2004, which application is incorporated herein by reference.
History of Protected File System
In this example, it is assumed that the entire file system is uploaded to the DMS for protection from a host server at T1. This is represented by FSDS 801 (DataSource 1). At T1, there is a version 1 of the directories 811 and 821, and of the files 841 and 851. While this representation refers to a time T1, this time may be a time range or period, as upload of the entire file system typically takes a given amount of time.
At T2, the content of the file by the name “fo1” changed, which is represented by the new object 842. Because this is a content change, the parent “do1” v1 (reference numeral 811) is unaffected.
At T3, the directory by the name “do2” changed its name to “dox” (as now indicated by object 822). Because in this example a directory also tracks the name of its children, this change causes parent “do1” (which has reference to this object) to also generate a new version 812.
At T4, a new file by the name “fo3” is added to “do1” (as now indicated by object 831); thus, a third version of “do1” is generated as indicated by reference 813.
As can be seen, within this DMS file system history, preferably there are version links, parent-child links (i.e., object relationships), as well as the data links to binary and delta pages (852, 843, 844). Therefore, this structure forms a three dimensional (3-D) object store. Navigating this object store enables DMS to identify the history and state of the DataSource 1 at any point-in-time, as will be described in more detail below.
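By way of illustration only, navigating version links and parent-child links to recover the state of a data source at a point in time might be sketched as follows. The sketch uses a simplified subset of the example above, with integers 1 through 4 standing in for T1 through T4. As a simplification, children are referenced here by object name and resolved by timestamp, rather than by version GUID; the names Version, version_at and tree_at are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Version:
    name: str
    modified: int           # DateTimeModified
    ended: Optional[int]    # DateTimeEnded, None while the version is current
    children: tuple = ()    # keys of child objects (directories only)


def version_at(versions, t):
    """Follow the version links to the version that was current at time t."""
    for v in versions:
        if v.modified <= t and (v.ended is None or t < v.ended):
            return v
    return None  # the object did not exist at time t


def tree_at(store, key, t, depth=0):
    """Follow parent-child links to list the hierarchy as of time t."""
    v = version_at(store[key], t)
    if v is None:
        return []
    lines = ["  " * depth + v.name]
    for child in v.children:
        lines += tree_at(store, child, t, depth + 1)
    return lines


store = {
    "do1": [Version("do1", 1, 3, ("fo1", "do2")),
            Version("do1", 3, 4, ("fo1", "do2")),
            Version("do1", 4, None, ("fo1", "do2", "fo3"))],
    "do2": [Version("do2", 1, 3), Version("dox", 3, None)],
    "fo1": [Version("fo1", 1, 2), Version("fo1", 2, None)],
    "fo3": [Version("fo3", 4, None)],
}
print("\n".join(tree_at(store, "do1", 3)))
```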
As mentioned in the above sections, physical structure of the object versions can be organized in any fashion to optimize for the storage usage at the DMS.
As also mentioned, an alternate non object-oriented embodiment may be implemented with techniques that link and index the object versions, object relationships and data changes by applying a schema over a relational database, together with a set of processes (described below) that update the database according to the invention.
File System Events and DMS Operations
The above sections discuss the DMS logical object data structure for file system protection. The following sections discuss the functional behavior of file system protection in more detail.
The present invention is directed to the end-to-end process (e.g., from host driver to active objects in one cluster, or from active objects in one cluster to active objects in another cluster) that changes the DMS logical object store structure when a file system history is captured. In particular, the present invention describes the processes involved in the storing of real-time history of a file system, and the use of the real-time history for any point-in-time recovery for guaranteed consistency. The DMS file system history structure changes are triggered by one or more events from file system, users, or applications.
Thus, as illustrated in
DMS Processes for File System
The following sections describe the DMS (host driver and active object) processes and the associated data history structure changes in response to specific file system events. Illustrations of sample DMS file system history are shown for each case. For illustrative purposes,
Creation of a File or a Directory
The creation of a file or a directory in the DMS file system history structure is initiated by a CREATE event captured by a DMS host driver operating to provide data protection, or by a similar event distributed from one data source to another data source. This function is illustrated in the process flow of
This process begins with the DMS host driver capturing a file or directory CREATE event at step 1071. The host driver then issues an Object Create command to the associated DMS file system data source to create an active object with the following parameters: object class, parent path, object name, creation timestamp, owner, ACL, and other attributes. This is step 1073. An object manager at the DMS handles the request for the DMS file system data source object by opening the parent active object and creating a new version thereof. This is step 1075, and this operation has been described and illustrated above. The DateTimeEnded of the parent's previous version is set to T, and the DateTimeModified of its new version is set to T as well. The object manager also creates a new file or directory object with a first version and sets all the properties as necessary. This is step 1077. The DateTimeCreated property in the anchor of the new object is set to T, as is the DateTimeModified property in the first version. Preferably, the state of the first version of this new object is temporarily set to SUSPECT, as it has not closed yet. Once the object is created, it is linked to its parent, preferably by adding its GUID to the CHILDREN property of its parent's latest version. This also occurs in step 1077. Thereafter, the parent object is closed and the new child handle remains open; this handle is passed over to the host driver, which can issue more updates to the new child. This is step 1079. The host driver keeps the child handle in an open handle list at step 1083. If any failures occur during the above-described steps of the process, the transaction is rolled back at step 1081 and the host driver is notified; the host driver then performs a resynchronization on that object at step 1084.
If the new object is a file object, the host driver forwards data (via a MODIFY event) to the DMS file object handle and then closes the file handle to generate a consistent (State=CONSISTENT) version. Until the file handle is closed, the new object state is SUSPECT, not CONSISTENT.
In the data distribution scenario, a master DMS data source object is forwarding events and a data stream to a target DMS replicated data source object. The process described in the flowchart of
The following is a representative DMS file system object (file and directory) creation algorithm. This algorithm is associated with steps 1075 and 1077 in the process flow diagram described above. Execution of this algorithm produces
Event = CREATE
Create (object A, in directory B, with metadata X, at time T)
Where object A may be a file or a directory. Metadata includes creator, ACL, and so on.
Open parent directory object B, which has n versions.
Create new version (n + 1) for object B, copy all the properties from version n.
Set DateTimeModified (object B, version n + 1) = T, DateTimeEnded (object B, version n + 1) = NULL
Set DateTimeEnded (object B, version n) = T.
Create new object A, create 1 version.
Set properties of A (anchor page and version 1 page) = metadata X.
Set DateTimeCreated (object A, anchor) = T, DateTimeTerminated (object A, anchor) = NULL
Set DateTimeModified (object A, version 1) = T, DateTimeEnded (object A, version 1) = NULL
If object A is file, set State (object A, version 1) = Suspect
Add object (A, version 1) to the property Children (object B, version n + 1)
Set PARENT property of (A, version 1) to refer to (B, version n + 1)
Open parent directory of object B. If B is the root, then the parent is the file system data source object (FSDS). Call this object C.
Change the Children property of C at the latest version to refer to the new version of B*.
Close object B and C (if there is a C)
Object A handle is left open for further updates.
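By way of illustration only, the creation steps listed above might be sketched in memory as follows. The classes Version and ActiveObject and the functions new_version and create are assumptions made for this sketch. As a simplification, a Children entry here refers to the child object itself rather than to a version GUID, so the step that updates the parent of B (object C) is omitted.

```python
from dataclasses import dataclass, field, replace
from typing import Optional


@dataclass
class Version:
    modified: int                 # DateTimeModified
    ended: Optional[int] = None   # DateTimeEnded
    state: Optional[str] = None   # "Suspect" / "Consistent" (files only)
    children: tuple = ()


@dataclass
class ActiveObject:
    name: str
    kind: str                         # "file" or "directory"
    created: int                      # DateTimeCreated (anchor page)
    terminated: Optional[int] = None  # DateTimeTerminated (anchor page)
    versions: list = field(default_factory=list)

    @property
    def latest(self):
        return self.versions[-1]


def new_version(obj, t):
    """End version n at time T and start version n + 1 as a copy of it."""
    prev = obj.latest
    prev.ended = t
    obj.versions.append(replace(prev, modified=t, ended=None))
    return obj.latest


def create(parent, name, kind, t):
    """Handle a CREATE event for object A (name) in directory B (parent)."""
    parent_version = new_version(parent, t)
    child = ActiveObject(name, kind, created=t)
    child.versions.append(
        Version(modified=t, state="Suspect" if kind == "file" else None))
    # Link the new object into the parent's new version only.
    parent_version.children = parent_version.children + (child,)
    return child  # the child handle stays open for further updates


root = ActiveObject("do1", "directory", created=1, versions=[Version(modified=1)])
fo3 = create(root, "fo3", "file", 4)
print(len(root.versions), fo3.latest.state)
```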
The above algorithm is merely representative of the steps that may be used to transform the logical representation of
Modification of a File or a Directory
A file or a directory modification in the DMS object store preferably is represented by the creation of a new object version. The modification of a file system object preferably is broken into three phases, and each phase involves a set of events. The three phases of active object modification cycle are (1) instantiating an active object, (2) modifying the object properties, and (3) freezing the version.
As noted above, phase 1 is instantiation of the target active object, which can either be a CREATE (as in
Phase 2 is the actual modification of the target object, which occurs either in block 1225 or in block 1209. The events that modify object properties preferably include MODIFY (metadata) and WRITE (binary data). There may be many more modification events depending on the application and system to which the DMS is applied. The test at step 1208 determines which event has occurred. A MODIFY event changes file system object metadata (such as access control list (ACL), object title, company name, document statistics and other user defined properties) that is associated with a file system object. The flow sub-diagram at block 1225 describes the process to modify object metadata. In particular, in response to capture of the modification event (steps 1207 and 1208), at step 1226 the host driver issues an object MODIFY event to the target active object (via the opened object handle) at the DMS, which target object then changes the object property in the latest object version at step 1227.
The WRITE event is for file objects only; it represents binary content updates. In this path, when the host driver captures the WRITE event (steps 1207 and 1208), it enters the processing block 1209 in the file modification process. The host driver first decides if it should buffer the changes. This is step 1210. For a database log file, for example, the host driver would not buffer the log entries; however, when dealing with a regular document, changes are buffered. If changes are buffered, then the host driver decides if it is time to forward the changes to the DMS target active file object. This is step 1211. If no data should be forwarded in the meantime, the host driver continues to watch for events at step 1207. Otherwise, the host driver decides if delta reduction should be applied on the changes to extract the actual changed byte range. This is step 1212. Again, for database log entries, delta reduction is not necessary because log entries are appended onto the file. If no delta reduction is necessary, the changed data is packaged with a WRITE event and forwarded to the DMS target active object. This is step 1213. Otherwise, the host driver retrieves the last binary content signatures and, for one or more byte ranges that actually changed (deltas), the host driver calculates and forwards those deltas to the DMS target object. This is done using a host driver-generated (namely, APPLYDELTA) event and occurs at step 1214. The target DMS file object receives both WRITE and APPLYDELTA events, applies second stage delta reduction as necessary to extract the exact bytes changed, saves the deltas or binary data in the binary or delta pages, and then creates or updates the sparse index in the working (last) object version. This is step 1215. If the file has not been closed, the host driver continues to capture more modification events for the file at step 1207; otherwise, the closing process continues, which is step 1222.
If any failure occurs, the host driver abandons all the events associated with the target file and performs resynchronization of the entire object at step 1230. In the preferred embodiment, the host driver uses a FLUSH event as one of the events to decide if buffered data should be forwarded to the DMS without causing the DMS target object to generate a new version.
Phase 3 freezes the changes into an object version. This is block 1221. The possible events that cause a DMS target file or directory active object to freeze the latest object version (and thus to prevent any more changes into that version) are CLOSE, CHECKPOINT, and FORCE-CLOSE. The flow diagram in block 1221 illustrates the handling of these events to freeze a DMS active file or directory object version. A CLOSE is a file system event. When this event is detected by the host driver, the associated file is consistent. A CHECKPOINT may be generated by an application with or without user initiative, or it may be generated by the host driver. A CHECKPOINT event is generated when the associated application or the host driver has taken some appropriate action to ensure that the version to be frozen is consistent from the application's point of view. For example, a database manager periodically flushes its memory and generates an internal CHECKPOINT, during which a set of database files are consistent. The CHECKPOINT event from a database manager can be detected by the host driver, which indicates that the group of files belonging to the particular database is consistent and a DMS version of each of the files should be frozen. In yet another example, an application or the host driver may generate a CHECKPOINT to all the system state files (or even the entire file system) to create a snapshot of all the files all at once (at an exact time). This CHECKPOINT forces all the related files to freeze their latest version all at once. If a CLOSE or CHECKPOINT event is detected by the host driver at step 1208, the host driver checks if there is any buffered data for that object. This is step 1220. If so, the host driver first forces the data to be forwarded to the target DMS object. After that, the host driver forwards the event to the DMS target object. This is step 1222.
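By way of illustration only, a first-stage delta reduction that compares stored content signatures against new content might be sketched as follows. The fixed chunk size, the use of SHA-1 for the signatures, and the function names signatures and changed_ranges are assumptions made for this sketch. Fixed-size chunking does not recognize inserted bytes that shift later content, and file truncation is not handled here.

```python
import hashlib


def signatures(content: bytes, chunk: int) -> list:
    """Hash each contiguous chunk of the content (the Signatures property)."""
    return [hashlib.sha1(content[i:i + chunk]).digest()
            for i in range(0, len(content), chunk)]


def changed_ranges(old_signatures, new_content: bytes, chunk: int) -> list:
    """Return (offset, data) deltas for chunks whose signature changed.

    Adjacent changed chunks are merged into a single delta.
    """
    deltas = []
    for n, sig in enumerate(signatures(new_content, chunk)):
        if n < len(old_signatures) and old_signatures[n] == sig:
            continue  # chunk is unchanged, nothing to forward
        offset = n * chunk
        data = new_content[offset:offset + chunk]
        if deltas and deltas[-1][0] + len(deltas[-1][1]) == offset:
            deltas[-1] = (deltas[-1][0], deltas[-1][1] + data)
        else:
            deltas.append((offset, data))
    return deltas


old = b"aaaabbbbccccdddd"
last_signatures = signatures(old, 4)
print(changed_ranges(last_signatures, b"aaaaXbbbccccddddEE", 4))
```

Only the deltas returned by changed_ranges would need to be forwarded with an APPLYDELTA event in this sketch; a second stage at the target could narrow each delta to the exact bytes that changed.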
Both CLOSE and CHECKPOINT events generate a consistent DMS object version with the State property of the frozen version set to “Consistent.” This is step 1223.
As noted above, another event that freezes a file or a directory object is FORCE-CLOSE. This event may be generated by an application or the host driver itself (step 1207), or it can be generated by the DMS (step 1225) when an error is encountered. The error could be that the host driver connection to the DMS has failed. The FORCE-CLOSE event generated by an application or host driver goes through the same process as the CLOSE and CHECKPOINT events (steps 1220, 1209, 1222, and 1223). FORCE-CLOSE freezes the latest version as it is, in the Suspect state (i.e., the consistency is unknown). This is step 1223. Any DMS object that ends with the Suspect state is eventually resynchronized with its corresponding file system object in the host, which is step 1230.
In one embodiment, the DMS data source object automatically creates a new object version when it handles a CREATE or OPEN event. In this case, step 1223 involves freezing an unmodified object version. An alternate embodiment forces the DMS data source object to create a new object version when it receives the first modification event.
The above described process with respect to the given events is merely illustrative. Other events and/or other handling techniques may be implemented. Thus, e.g., an alternative embodiment may choose to freeze file system objects more frequently (by using events such as FLUSH or possibly the WRITE events with TIMEOUT in step 1221). Another embodiment includes other application and host driver generated events, such as AUTO-SAVE. Also, with alternate embodiments, the process may be somewhat different than as illustrated in
In the data distribution scenario as noted above, a master DMS data source object is forwarding events and a data stream to a target DMS replicated data source object. The process described in
The following is a representative DMS file system object (file and directory) modification algorithm for implementing the object store changes described and illustrated above.
Phase 1: Events = CREATE, OPEN.
Create (object A, in directory B, with metadata X, at time T)
Open (object A, write mode, at time T) (step 1205)
1. Open object A, which has m versions.
2. Create new version (m + 1) for object A, copy all the properties from version m.
3. Set DateTimeModified (object A, version m + 1) = T, DateTimeEnded (object A, version m + 1) = NULL
4. Set DateTimeEnded (object A, version m) = T.
5. If object A is file, set State (object A, version m + 1) = Suspect
6. Open parent of object A, and change the child link to this latest version (m + 1) of A.
Phase 2: Events = WRITE, MODIFY, APPLYDELTA (may repeat multiple times)
If FLUSH is used in phase 2, it simply triggers execution of phase 2 as if there is a zero length WRITE, which forces the accumulated changes to be recorded on DMS.
Modify (object A, property = value) (step 1227)
1. Set property (object A, version m + 1) = value
Write (object A, content = (offset, length, data)) (step 1215)
ApplyDelta (object A, content = delta string)
1. If first version or if there is no need for delta reduction, write to binary page
2. Otherwise, 2nd stage delta reduction, write to delta page
3. Update sparse index of object A, version m + 1
Phase 3: Events = CLOSE, CHECKPOINT, and FORCE-CLOSE. FLUSH may be applied. (Step 1223)
Close (object A)
Checkpoint (object A)
1. If file, set State (object A, version m + 1) = Consistent
2. Close the object A and its parent object
ForceClose (object A)
1. Close object A and its parent object (in this case state remains Suspect).
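By way of illustration only, the three-phase modification cycle listed above might be sketched for a single file object as follows. The classes FileVersion and FileObject and their method names are assumptions made for this sketch. For brevity, each version keeps its own full copy of the content (the alternative in which no delta is kept), and the parent directory link is omitted.

```python
from dataclasses import dataclass, field, replace
from typing import Optional


@dataclass
class FileVersion:
    modified: int                 # DateTimeModified
    ended: Optional[int] = None   # DateTimeEnded
    state: str = "Suspect"
    props: dict = field(default_factory=dict)
    content: bytes = b""


class FileObject:
    def __init__(self, t):
        """CREATE: the first version is Suspect and the handle stays open."""
        self.versions = [FileVersion(modified=t)]
        self.is_open = True

    def open(self, t):
        """Phase 1 (OPEN): end version m and start version m + 1 as Suspect."""
        prev = self.versions[-1]
        prev.ended = t
        self.versions.append(replace(prev, modified=t, ended=None,
                                     state="Suspect", props=dict(prev.props)))
        self.is_open = True

    def _working(self):
        if not self.is_open:
            raise RuntimeError("latest version is frozen")
        return self.versions[-1]

    def modify(self, key, value):
        """Phase 2 (MODIFY): change a metadata property of the latest version."""
        self._working().props[key] = value

    def write(self, offset, data):
        """Phase 2 (WRITE): update binary content of the latest version."""
        v = self._working()
        if offset > len(v.content):
            raise ValueError("this sketch does not support holes")
        buf = bytearray(v.content)
        buf[offset:offset + len(data)] = data
        v.content = bytes(buf)

    def close(self):
        """Phase 3 (CLOSE or CHECKPOINT): freeze a Consistent version."""
        self._working().state = "Consistent"
        self.is_open = False

    def force_close(self):
        """Phase 3 (FORCE-CLOSE): freeze the version; it remains Suspect."""
        self._working()
        self.is_open = False


f = FileObject(t=1)
f.write(0, b"hello")
f.close()
f.open(t=2)
f.write(0, b"J")
f.modify("ACL", "rw")
f.close()
f.open(t=3)
f.write(5, b"!")
f.force_close()
print([(v.content, v.state) for v in f.versions])
```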
Deletion of a File or a Directory
In a data distribution scenario, as described above a master DMS data source object forwards events and the data stream to a target DMS replicated data source object. The process described
In the “fo1” file object, the DateTimeTerminated property in the anchor page (not shown) is set to T4 to indicate that the object does not exist beyond T4. The latest version of “fo1,” which is version 2 (object 942) is ended by its DateTimeEnded property set to T4. As can be seen, preferably the object “fo1” does not get removed physically from the DMS object store when it is deleted. Instead, its history terminates at T4. Preferably, the DMS object store keeps growing its history until a user explicitly prunes the history, either manually or via retention policy, during which older versions of the active objects get dropped off at the tail end.
The following is a representative DMS file system object (file and directory) deletion algorithm for implementing the object store changes described and illustrated above.
Event = DELETE Delete (object A, at time T) (steps 1505, 1507) Where object A may be a file or a directory. Open parent directory object B, which has n version. Create new version (n + 1) for object B, copy all the properties from version n. Set DateTimeModified (object B, version n + 1) = T, DateTimeEnded (object B, version n + 1) = NULL Set DateTimeEnded (object B, version n) = T. Open parent of object B, change the child reference of the latest version to refer to version n + 1 of B. Open object A, which has m version Set DateTimeTerminated of A (anchor page) = T. Set DateTimeEnded (object A, m) = T If object A is directory, traverse all its children. For all its children, open them, set DateTimeTerminated (all its children's anchor page) = T, and set Date TimeEnded (all its children's latest version page) = T. Remove object A from property Children (object B, version n + 1) Close all the objects
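By way of illustration only, the deletion steps listed above might be sketched in memory as follows. The classes Version and ActiveObject and the functions terminate and delete are assumptions made for this sketch. Children are referenced here by object rather than by version GUID, and the traversal of a deleted directory is applied recursively to all descendants, which is one reading of the traversal step.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Version:
    modified: int                 # DateTimeModified
    ended: Optional[int] = None   # DateTimeEnded
    children: tuple = ()


@dataclass
class ActiveObject:
    name: str
    created: int                      # DateTimeCreated (anchor page)
    terminated: Optional[int] = None  # DateTimeTerminated (anchor page)
    versions: list = field(default_factory=list)

    def __post_init__(self):
        if not self.versions:
            self.versions.append(Version(modified=self.created))

    @property
    def latest(self):
        return self.versions[-1]


def terminate(obj, t):
    """End the history of obj, and of everything beneath it, at time T."""
    obj.terminated = t
    obj.latest.ended = t
    for child in obj.latest.children:
        terminate(child, t)


def delete(parent, obj, t):
    """Handle a DELETE event for object A (obj) in directory B (parent)."""
    prev = parent.latest
    prev.ended = t
    # Version n + 1 of the parent no longer lists the deleted object.
    remaining = tuple(c for c in prev.children if c is not obj)
    parent.versions.append(Version(modified=t, children=remaining))
    terminate(obj, t)  # nothing is physically removed from the store


fo1, fo2 = ActiveObject("fo1", 1), ActiveObject("fo2", 1)
do2 = ActiveObject("do2", 1)
do2.latest.children = (fo2,)
do1 = ActiveObject("do1", 1)
do1.latest.children = (fo1, do2)
delete(do1, do2, 4)
print(do2.terminated, fo2.terminated, len(do1.versions))
```

The earlier parent version still lists the deleted directory, so the hierarchy as of any time before T remains recoverable.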
Relocating or renaming a File or a Directory
In this section, renaming a file system object refers to changing the name of the object without moving it out of its directory, while relocating a file system object refers to moving an object from one directory to a different directory; the name of the object may also change during relocation. Relocation is a superset of renaming: one can relocate within the same directory to change an object's name. In either case, the destination name of the rename or relocate operation may already exist and be in use by another object, in which case the original destination object is deleted and the object being renamed or relocated takes over the destination name.
In some file systems, a MOVE event is used both for renaming a file system object (file or directory) and for relocating (moving) a file system object from one directory to another. In other file systems, a RENAME event may be used for both renaming and relocating a file system object. There are also file systems where a MOVE event is used for relocating file system objects while a RENAME event is used for re-labeling (renaming) file system objects. In one embodiment of this invention, MOVE and RENAME events are treated the same; in other words, both events initiate renaming and relocating of file system objects.
As described previously, in the data distribution scenario the host driver in the process flow is replaced by a master DMS file system data source, and the DMS file system data source is a replication data source. Otherwise, the process flow in
The following is a representative rename algorithm for use in the present invention. As noted above, the file system may use a MOVE or RENAME event to rename a file or a directory object.
Event = MOVE or RENAME
Rename (object A, old name, new name, at time T) (Step 1705, 1707)
Where object A may be a file or a directory.
  Open object A, which has m versions. Create a new version (m + 1). Copy all properties of m to m + 1.
  Set DateTimeEnded (object A, version m) = T.
  Set DateTimeModified (object A, version m + 1) = T, DateTimeEnded (object A, version m + 1) = NULL.
  Open parent directory (object B, which has n versions) of object A.
  If object B does not track child name:
    Simply change its child reference to (m + 1).
  If object B does track child name:
    Create new version (n + 1) for object B; copy all the properties from version n.
    Set DateTimeModified (object B, version n + 1) = T, DateTimeEnded (object B, version n + 1) = NULL.
    Set DateTimeEnded (object B, version n) = T.
    Change child reference of version (n + 1) to object A (m + 1).
    Open parent of object B; change the child reference of the latest version to refer to version n + 1 of B.
  Close all the objects.
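By way of illustration only, the rename steps above may be sketched as follows for the case in which the parent directory tracks child names. The names `Version`, `Node`, `bump` and `rename` are hypothetical; the sketch omits the update of the grandparent's child reference and does not terminate an object that already owns the new name.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Version:
    name: str
    modified: float                                # DateTimeModified
    ended: Optional[float] = None                  # DateTimeEnded
    children: dict = field(default_factory=dict)   # child name -> child Node


@dataclass
class Node:
    versions: list

    @property
    def latest(self) -> Version:
        return self.versions[-1]


def bump(node: Node, t: float, **changes) -> Version:
    """End the latest version at T and append a modified copy of it."""
    old = node.latest
    old.ended = t
    new = Version(name=old.name, modified=t, children=dict(old.children))
    for key, value in changes.items():
        setattr(new, key, value)
    node.versions.append(new)
    return new


def rename(parent: Node, old_name: str, new_name: str, t: float) -> Node:
    """Handle a MOVE or RENAME event that only changes a child's name."""
    child = parent.latest.children[old_name]
    bump(child, t, name=new_name)          # object A: version m + 1, new name
    parent_version = bump(parent, t)       # object B: version n + 1
    del parent_version.children[old_name]
    parent_version.children[new_name] = child
    return child
```

Both the old and the new name remain visible in the history: the ended versions of the child and of the parent still carry the old name.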
In the versioned by object path model of
A representative file object relocation algorithm is set forth below:
Event = MOVE or RENAME
RelocateFile (File A, old path, new path, at time T) (Step 1705, 1707)
  Open current parent of A, P1, using the old path. P1 has k versions.
  Create a new version (k + 1) and copy all the properties of k to (k + 1).
  Set DateTimeEnded (P1, version k) = T.
  Set DateTimeModified (P1, version k + 1) = T, DateTimeEnded (P1, version k + 1) = NULL.
  Remove A from CHILDREN property (P1, version k + 1).
  Change CHILDREN property in latest version of parent of P1 to refer to (k + 1).
  If the new parent of A is not the same as its current parent, do the following (otherwise, P2 is the same as P1):
    Open new parent of A, P2, using the new path. P2 has i versions.
    Create a new version (i + 1) and copy all the properties of i to (i + 1).
    Set DateTimeEnded (P2, version i) = T.
    Set DateTimeModified (P2, version i + 1) = T, DateTimeEnded (P2, version i + 1) = NULL.
    Change CHILDREN property in latest version of parent of P2 to refer to (i + 1).
  If an object existed in the new path (call it Z, with r versions), terminate it by setting DateTimeEnded (Z, version r) = T and DateTimeTerminated (Z, anchor) = T. Remove version r of Z from the parent (P1) CHILDREN list.
  If versioned by object instance:
    Open file object A, which has m versions. Create a new version (m + 1). Copy all properties of m to m + 1.
    Set DateTimeEnded (A, version m) = T.
    Set DateTimeModified (A, version m + 1) = T, DateTimeEnded (A, version m + 1) = NULL.
    Set Name (A, version m + 1) if name is to be changed as well.
    Add Object (A, version m + 1) to CHILDREN property of (P2, version i + 1).
    Set PARENT property of (A, version m + 1) to (P2, version i + 1).
  If versioned by object path without connecting to dead object:
    Create new object C with version 1.
    Open file object A, which has m versions.
    Copy properties (object A, version m) to properties (object C, version 1). This copies over the sparse index so that the binary and delta pages can be shared.
    Set DateTimeCreated (object C, anchor) = T, DateTimeTerminated (object C, anchor) = NULL, DateTimeModified (object C, version 1) = T, DateTimeEnded (object C, version 1) = NULL.
    Set Name (C, version 1) if name is to be changed as well.
    Set DateTimeTerminated (object A, anchor) = T, DateTimeEnded (object A, version m) = T.
    Add Object (C, version 1) to CHILDREN property of (P2, version i + 1).
    Set PARENT property of (C, version 1) to (P2, version i + 1).
  If versioned by object path and connecting to dead object:
    Open object C, which is a dead object in the new path with the same final name as object A. Object C has n versions. Create version n + 1.
    Open file object A, which has m versions.
    Copy properties (object A, version m) to properties (object C, version n + 1). This copies over the sparse index so that the binary and delta pages can be shared.
    Set DateTimeTerminated (object C, anchor) = NULL, DateTimeModified (object C, version n + 1) = T, DateTimeEnded (object C, version n + 1) = NULL.
    Set Name (C, version n + 1) if name is to be changed as well.
    Set DateTimeTerminated (object A, anchor) = T, DateTimeEnded (object A, version m) = T.
    Add Object (C, version n + 1) to CHILDREN property of (P2, version i + 1).
    Set PARENT property of (C, version n + 1) to (P2, version i + 1).
  Close all the objects.
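By way of illustration only, the difference between the three branches of the file relocation algorithm may be sketched as follows. The names `Version`, `FileObj`, `relocate_by_instance` and `relocate_by_path` are hypothetical, the parent directory updates (P1 and P2) are omitted, and the `pages` list is merely a stand-in for the sparse index that lets binary and delta pages be shared.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Version:
    name: str
    modified: float                               # DateTimeModified
    ended: Optional[float] = None                 # DateTimeEnded
    pages: list = field(default_factory=list)     # stand-in for the sparse index


@dataclass
class FileObj:
    created: float                                # anchor-page DateTimeCreated
    versions: list
    terminated: Optional[float] = None            # anchor-page DateTimeTerminated


def relocate_by_instance(a: FileObj, new_name: str, t: float) -> FileObj:
    """Versioned by object instance: the same object gains version m + 1."""
    old = a.versions[-1]
    old.ended = t
    a.versions.append(Version(name=new_name, modified=t, pages=old.pages))
    return a


def relocate_by_path(a: FileObj, new_name: str, t: float,
                     dead: Optional[FileObj] = None) -> FileObj:
    """Versioned by object path: A is terminated and its content lives on in C.

    With `dead` given, C is a previously terminated object at the new path
    that is revived; otherwise a brand-new object C is created.
    """
    source = a.versions[-1]
    fresh = Version(name=new_name, modified=t, pages=source.pages)  # shared pages
    if dead is None:
        c = FileObj(created=t, versions=[fresh])
    else:
        c = dead
        c.terminated = None
        c.versions.append(fresh)
    a.terminated = t
    source.ended = t
    return c
```

In every branch the new version refers to the same page list rather than copying it, which mirrors the stated purpose of copying the sparse index.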
The algorithm set forth above shows relocation of an object within a protected file system or folder (i.e., moving from one sub-folder to another sub-folder within a protected file system tree). When an object is relocated from a protected area to an unprotected area (e.g., outside of the protected file system), the handling process is similar to DELETE. This is because the object disappears from the protected area after the event. Conversely, when an object is moved from an unprotected area to a protected area, it is treated as a creation of a new object.
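By way of illustration only, the choice of handling based on whether the source and destination lie inside the protected tree may be sketched as follows. The function names and the POSIX-style paths are hypothetical, and the "IGNORE" result for a move that never touches the protected area is an assumption not stated above.

```python
from pathlib import PurePosixPath


def is_protected(path: str, root: str) -> bool:
    """True if `path` is the protected root itself or lies beneath it."""
    p, r = PurePosixPath(path), PurePosixPath(root)
    return p == r or r in p.parents


def classify_move(old_path: str, new_path: str, root: str) -> str:
    """Decide which handling process applies to a MOVE or RENAME event."""
    src = is_protected(old_path, root)
    dst = is_protected(new_path, root)
    if src and dst:
        return "RELOCATE"   # stays inside the protected tree
    if src:
        return "DELETE"     # object disappears from the protected area
    if dst:
        return "CREATE"     # object appears in the protected area
    return "IGNORE"         # assumption: never touches the protected area
```

Comparing whole path components, rather than string prefixes, avoids treating a sibling such as `/database` as lying inside a protected root `/data`.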
The DMS may also include a temporary file handling process to minimize storage and bandwidth usage, e.g., by not transferring and storing potentially useless temporary files in the file system history. For example, Microsoft Word creates a temporary file when a Word document is modified. The updates are entered into a temporary file; upon a save event, the temporary file is renamed to the original file name. In this case, preferably creation and/or modification of the temporary file may be avoided; thus, preferably, only when the RENAME occurs is the above relocation process applied and object history linked together by pathname.
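By way of illustration only, such temporary file handling may be sketched as a filter over the event stream. The event tuples, the function names and the naming rule used to recognize a temporary file (a leading "~" or a ".tmp" suffix) are assumptions made for this sketch; how the file content is captured when the RENAME occurs is not addressed here.

```python
def is_temp(name: str) -> bool:
    """Hypothetical rule for recognizing an application's temporary file."""
    return name.startswith("~") or name.endswith(".tmp")


def filter_events(events):
    """Yield only the events that should reach the file system history.

    Events are tuples: ("CREATE", name), ("WRITE", name) or
    ("RENAME", old_name, new_name).
    """
    for event in events:
        if event[0] == "RENAME":
            _, old_name, new_name = event
            if is_temp(new_name):
                continue            # still a temporary file after the rename
            yield event             # temporary file takes over a real name
        elif not is_temp(event[1]):
            yield event             # ordinary event on an ordinary file
```

With this filter, creating and writing a temporary file produces no history, while renaming it to a regular file name produces a single RENAME event to which the relocation process can be applied.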
The above algorithm implements relocation of a file object; relocation of a directory object is more complex, as a directory may have children. As with relocating a file, the history management for relocating a directory can be based on model 1 (versioned by object instance) or model 2 (versioned by object path). Also, when versioning by path, one may either connect the relocated object with a dead historical object or leave it unconnected to any historical object.
As can be seen, the versioned by object path model has relatively high processing and storage cost when relocating a directory, although it is much simpler to traverse. The alternative is to have a versioned by object path and versioned by object instance hybrid as shown in
The following is a directory object relocating algorithm.
Event = MOVE or RENAME
RelocateDirectory (directory A, old path, new path, at time T) (Steps 1705, 1707)
  Open current parent of A, P1, using the old path. P1 has k versions.
  Create a new version (k + 1) and copy all the properties of k to (k + 1).
  Set DateTimeEnded (P1, version k) = T.
  Set DateTimeModified (P1, version k + 1) = T, DateTimeEnded (P1, version k + 1) = NULL.
  Remove A from CHILDREN property (P1, version k + 1).
  Change CHILDREN property in latest version of parent of P1 to refer to (k + 1).
  If the new parent of A is the same as the old one, then P2 = P1; otherwise, do the following:
    Open new parent of A, P2, using the new path. P2 has i versions.
    Create a new version (i + 1) and copy all the properties of i to (i + 1).
    Set DateTimeEnded (P2, version i) = T.
    Set DateTimeModified (P2, version i + 1) = T, DateTimeEnded (P2, version i + 1) = NULL.
    Change CHILDREN property in latest version of parent of P2 to refer to (i + 1).
  If an object already owns the new path name, open that object to terminate it. Remove the object version page from P2.
  If versioned by object instance:
    Open directory object A, which has m versions. Create a new version (m + 1). Copy all properties of m to m + 1.
    Set DateTimeEnded (A, version m) = T.
    Set DateTimeModified (A, version m + 1) = T, DateTimeEnded (A, version m + 1) = NULL.
    Set Name (A, version m + 1) if name is to be changed as well.
    Add Object (A, version m + 1) to CHILDREN property of (P2, version i + 1).
    Set PARENT property of A to refer to (P2, version i + 1).
  If versioned by combination of object path and object instance, without connecting to dead object:
    Create new object C with version 1.
    Open directory object A, which has m versions.
    Copy properties (object A, version m) to properties (object C, version 1). This copies over all the children links.
    Set DateTimeCreated (object C, anchor) = T, DateTimeTerminated (object C, anchor) = NULL, DateTimeModified (object C, version 1) = T, DateTimeEnded (object C, version 1) = NULL.
    Set Name (C, version 1) if name is to be changed as well.
    Set DateTimeTerminated (object A, anchor) = T, DateTimeEnded (object A, version m) = T.
    Add Object (C, version 1) to CHILDREN property of (P2, version i + 1).
    Set PARENT property of C to refer to (P2, version i + 1).
  If versioned by object path without connecting to dead object:
    Perform object close for directory A.
    Traverse all the descendants of directory A and terminate each descendant using relocate with the versioned by object path method. For each descendant, terminate the old one, create a new object with version 1, and reset the parent-child link to connect the new child (version 1 of child) to the new parent (version 1 of parent).
  Close all the objects.
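By way of illustration only, the cost difference between the hybrid branch and the pure versioned by object path branch may be sketched as follows. The names `Node`, `relocate_hybrid`, `relocate_by_path` and `count_created` are hypothetical, and each object is reduced to its anchor-level timestamps (its version pages and the parent updates are omitted).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    name: str
    created: float                                  # anchor-page DateTimeCreated
    children: list = field(default_factory=list)
    terminated: Optional[float] = None              # anchor-page DateTimeTerminated


def relocate_hybrid(a: Node, new_name: str, t: float) -> Node:
    """Path and instance hybrid: only the directory itself is replaced."""
    c = Node(new_name, created=t, children=a.children)  # children links shared
    a.terminated = t
    return c


def relocate_by_path(a: Node, new_name: str, t: float) -> Node:
    """Pure path model: every descendant is terminated and recreated."""
    c = Node(new_name, created=t,
             children=[relocate_by_path(ch, ch.name, t) for ch in a.children])
    a.terminated = t
    return c


def count_created(node: Node, t: float) -> int:
    """Number of objects in the tree that were created at time T."""
    return (node.created == t) + sum(count_created(ch, t) for ch in node.children)
```

For a directory holding two files, the hybrid sketch creates one new object while the pure path sketch creates three, which illustrates why the cost of the latter grows with the size of the relocated subtree.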
The above algorithm only illustrates the process of relocating a directory object from one sub-folder to another sub-folder, all within a protected file system. In the event a directory is relocated outside of the protected file system, the process is the same as that for deleting the directory object (i.e., the same process as for a DELETE event). In the event a directory is relocated from an unprotected file system to a protected file system, the process is similar to that triggered by a CREATE event, as has been described above.
The present invention has been described by a set of example data structures, but this is not a limitation of the present invention. Thus, while it has been convenient to explain the various file and directory object algorithms (create, modify, delete, rename and relocate) by reference to a continuous sequence of sample data structures, these algorithms may be used with any object store as the starting or baseline data structure.
As one of ordinary skill in the art will appreciate, the present invention addresses enterprise data protection and data management problems by continuously protecting all data changes and transactions in real time across local and wide area networks. Preferably, and as illustrated in
While the present invention has been described in the context of a method or process, the present invention also relates to apparatus for performing the operations herein. As described above, this apparatus may be specially constructed for the required purposes, or it may comprise a general purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including optical disks, CD-ROMs, and magnetic-optical disks, read-only memories (ROMs), random access memories (RAMs), magnetic or optical cards, or any type of media suitable for storing electronic instructions, each coupled to a computer system bus.
While the above written description also describes a particular order of operations performed by certain embodiments of the invention, it should be understood that such order is exemplary, as alternative embodiments may perform the operations in a different order, combine certain operations, overlap certain operations, or the like. References in the specification to a given embodiment indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic.
Although different data services have been described, it should be appreciated that one advantage of the present invention is that a given DMS appliance can provide one or more such services. It is a particular advantage that a given appliance can provide a consolidated set of data services on behalf of a particular data source. As has been described, this operation is facilitated through the provision of the host driver, which captures, as a data history, an application-aware data stream. The application-aware data stream typically comprises data change(s), events associated with the data change(s), and metadata associated with the data change(s). As has been described, this information is streamed continuously to the appliance (or to a set of cooperating appliances) to facilitate provision by the appliance of the desired service(s).
As noted above, it is not required that the present invention be implemented in an “object-oriented” manner, even though it is one preferred implementation to do so. As described above, the invention may be implemented in a non object-oriented manner, e.g., by overlaying the logical representation over other data storage such as a relational database. Thus, as used herein, an “object” is not limited to a construct that is used solely in an “object-oriented” implementation but should be construed broadly. Thus, in the context of a relational database implementation, an “object” may be represented by a row of data from one or more relational database tables, and an associated “link” may simply be a reference from one row to another row (e.g., a data row address or the like).
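By way of illustration only, a relational overlay of this kind may be sketched with Python's built-in `sqlite3` module. The table names, column names and helper functions are hypothetical; each object anchor and each version is a row, and a link is simply a reference from one row to another.

```python
import sqlite3


def make_store() -> sqlite3.Connection:
    """Create an in-memory store with one table per page type."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE object (
            id INTEGER PRIMARY KEY,
            created REAL NOT NULL,
            terminated REAL
        );
        CREATE TABLE version (
            id INTEGER PRIMARY KEY,
            object_id INTEGER NOT NULL REFERENCES object(id),
            number INTEGER NOT NULL,
            name TEXT NOT NULL,
            modified REAL NOT NULL,
            ended REAL,
            parent_version_id INTEGER REFERENCES version(id)
        );
    """)
    return conn


def create_object(conn, t):
    """Insert an anchor row and return its row id."""
    return conn.execute("INSERT INTO object (created) VALUES (?)", (t,)).lastrowid


def add_version(conn, object_id, name, t, parent_version_id=None):
    """End the current version row at T and insert the next one."""
    conn.execute(
        "UPDATE version SET ended = ? WHERE object_id = ? AND ended IS NULL",
        (t, object_id))
    (number,) = conn.execute(
        "SELECT COALESCE(MAX(number), 0) + 1 FROM version WHERE object_id = ?",
        (object_id,)).fetchone()
    return conn.execute(
        "INSERT INTO version (object_id, number, name, modified, parent_version_id)"
        " VALUES (?, ?, ?, ?, ?)",
        (object_id, number, name, t, parent_version_id)).lastrowid


def version_at(conn, object_id, t):
    """Return (number, name) of the version that was current at time T."""
    return conn.execute(
        "SELECT number, name FROM version WHERE object_id = ?"
        " AND modified <= ? AND (ended IS NULL OR ended > ?)",
        (object_id, t, t)).fetchone()
```

A point-in-time lookup then becomes an ordinary query over the `modified` and `ended` columns, and the `parent_version_id` column plays the role of the link from a child version to its parent version.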
|U.S. classification||1/1, 707/E17.01, 707/999.204|
|International classification||G06F17/30, G06F7/00|
|Cooperative classification||G06F11/3495, G06F2201/86, G06F17/30085|
|23 Jul 2009||AS||Assignment|
Owner name: BAKBONE SOFTWARE, INC.,CALIFORNIA
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:SIM-TANG, SIEW YONG;REEL/FRAME:022994/0627
Effective date: 20090721