WO1998035306A1 - File comparison for data backup and file synchronization - Google Patents
- Publication number
- WO1998035306A1 (application PCT/US1998/002434)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- file
- data
- function
- array
- offset
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1446—Point-in-time backing up or restoration of persistent data
- G06F11/1448—Management of the data involved in backup or backup restore
- G06F11/1451—Management of the data involved in backup or backup restore by selection of backup contents
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/178—Techniques for file synchronisation in file systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2221/00—Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/21—Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/2107—File encryption
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M13/00—Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
- H03M13/03—Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words
- H03M13/05—Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words using block codes, i.e. a predetermined number of check bits joined to a predetermined number of information bits
- H03M13/09—Error detection only, e.g. using cyclic redundancy check [CRC] codes or single parity bit
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99951—File or database maintenance
- Y10S707/99952—Coherency, e.g. same view to multiple users
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99951—File or database maintenance
- Y10S707/99952—Coherency, e.g. same view to multiple users
- Y10S707/99953—Recoverability
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99951—File or database maintenance
- Y10S707/99952—Coherency, e.g. same view to multiple users
- Y10S707/99955—Archiving or backup
Definitions
- CRC Cyclic Redundancy Check
- the file comparison technique is employed to detect similar files having different filenames in order to further facilitate backup operations.
- a previously detected file such as a form letter 60
- the new version 62 of the file is detected and only the revision elements of that new version are saved.
- an "approximate comparison" is made between the new file and the previously stored digital signatures from the other files that have been created on the source system. The approximate comparison detects files that are similar, although not necessarily identical to, the new version of the file.
- the control program maintains a list 64 of signatures for all source system files that have previously been backed up to the target system.
- a Match Count Table 66 is created with one row 68 for each file in the list of signatures. Each row in the Match Count Table is initialized to zero.
- the window 30 for signature computation in the new file is then positioned at the beginning of the file being examined.
- a signature 32 for the characters in the window 30 is computed with the function F.
- the computed signature is then compared against all of the signatures for the first "N" blocks of previously copied files in the list 64, where N is a predetermined integer value. If a match against at least one existing block signature is detected as indicated in step 70, then the row pointer 72 in the Match Count Table is incremented and the window 30 in the new file is moved to begin at the end 73 of the block for which the signature was just computed. If no match is detected at step 70 then the window 30 is advanced by one character as indicated in step 74. If comparison has not advanced beyond the first N blocks for the new file as determined in step 76, flow returns to step 68. Otherwise flow continues to step 78.
- in step 78, the file with the highest count in the Match Count Table is selected and a comparison algorithm is used to determine the set of changes between this file and the new file.
- the set of changes is then backed up along with a pointer to the original backup of the file. In the event of equal counts resulting in "ties" in the Match Count Table one of the files is arbitrarily selected.
- restoration of revision elements is facilitated by a filter routine when compression and encryption are employed.
- the filter routine controls the encryption and compression engines on the source system.
- the compression and encryption engine is restarted with each revision element so that the target system can assemble a representation of the final file for transmission to the source system without decrypting the revision elements.
- Performance is further facilitated in the case where a plurality of relatively small revision elements are present by combining the small revision elements together and compressing the collection of revision elements in a single step.
- the compression engine is not restarted at each revision element, so the compression algorithm has a larger array to work on and hence is more efficient.
- Fig. 4 shows version 0 ("V0") of the file, consisting of 7 blocks, and the revision elements for V1 through V4.
- Fig. 5 shows the complete contents of V4.
- V0 7 blocks
- UV2 4 blocks
- UV3 4 blocks
- UV4 4 blocks
- an original backup and revision elements generated from updates are grouped for compression and encryption. Not all of the updates for a single version are packed together. Rather, the updates are grouped together into units of compression ("chunks").
- the optimal set of chunks to be returned in the illustrated example is: A2B2, C3D3, D4E4, F4G4, whereby only eight blocks are sent for a seven block file, rather than nineteen blocks as in the previously described technique.
- the updates will not always align perfectly and may have arbitrary overlaps.
- File restoration is further illustrated by Tables 1-11.
- a file to be restored is stored in the archive system as a series of updates and an archive.
- the archive contains all of the bytes of the file.
- the updates contain file change information that can be employed to update the previous version of the file to the current version.
- the update algorithm works by creating a list of updates that encompasses all of the bytes of the file. The algorithm first processes the most recent update and then works sequentially backwards through the set of updates, so that if a given byte of the file is already covered by an update, the algorithm will not employ an older update that contains that same byte.
- Table 1 illustrates four versions of an example file.
- File version V1 differs from version V0 in two places.
- version V1 replaced two bytes at address 1 and inserted a byte at address 5.
- the update for V1 is coded as follows:
- a block filter algorithm may be employed to facilitate restoration of a desired version of a file.
- the block filter algorithm is employed to recognize such situations and avoid needless computation.
- the Block Filter Algorithm determines an optimal set of updates. For example, providing file version V3 via the block filter algorithm includes creating an Update List of all of the version archives, ordered by version with the most recent at the head of the list. An empty output file and an empty GetData list are then created.
- the empty GetData list is a list of records that describes the data that is needed from each archive to create the output file. Entries are added to the end of the list and removed from the head of the list. Each record has the following four fields:
- Source Offset: the location within the archive file to get the data from
- Target: the location in the output file that this data will go to
- The following entry is placed at the head of the GetData list:
- Version: most recent version of the file to be retrieved
- Source Offset: 0
- the GetData list is processed by initially removing an entry from the head of the list.
- the entries in the Update List for the specified version are then searched to locate a record in the Update List whose source range overlaps the target range of the GetData entry being processed. "Overlap" indicates that some portion of the range defined by Data Offset+Length defined in the GetData entry overlaps the range defined by the Target+Length in the Update List. If the record found in the Update List is a "Data" record, then the data is copied from that record to the output file. If the record found in the Update List is a "Copy" record, then a new record is added to the end of the GetData list with the version entry decremented.
- Processing of the GetData entry continues by scanning the Update List entries until the entire target range of the GetData entry has been covered with either Data or Copy entries from the Update List. Flow then returns to the initial step at which processing of the next entry from the GetData list begins. Exemplary block filter algorithm execution steps are shown in Tables 5-8. Entry 1 in the GetData List shows a source range of 0-8 and a Version of 3. The Update List is searched for a Version 3 entry with a matching output range. The first such entry is Entry 1. This specifies Data with a length of 1 (P) and it is placed in the output file at location 0, the target location. Entry 1 in the GetData List has been satisfied for source location 0, leaving source locations 1-8 to be satisfied. Entry 1 in the GetData List is processed for a source range of 1-8.
- A match is found in an Update List entry. This is a copy operation, so an entry is added to the GetData List (entry 2) that specifies Version 2, the next lower version, and spans the range of overlap from GetData entry 1 and Update List entry 2, namely 1-3. Entry 1 of the GetData List is then processed for source locations 4-8. This match is found with Update List item 3, a copy operation. GetData entry 3 is added. This satisfies the entire range of Entry 1 from the GetData List. Entry 2 in the GetData List shows a source range of 1-3 and a Version of 2. The Update List is searched for a Version 2 entry with an output range matching this. Such an entry is found at Update List (4). This is a copy operation so a new entry (4) is placed in the GetData List.
- the GetData (3) entry has a source range of 5-9 which maps to an output range of 4-8. Hence the source item at location 6 gets placed in the output file at location 5.
- the GetData (3) source range 7-9 is next found to match Update (6), a copy operation. This yields GetData (6).
- the remaining items in the GetData list are processed in a similar manner until the list is empty. This yields the complete output file without ever processing Data items from the Update List that did not affect the final output file.
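The GetData processing described above can be illustrated with a minimal Python sketch. It is not the patent's implementation: the `Data` and `Copy` record classes, the `restore` function, the `updates` dictionary and the requirement that the caller supplies the output size are all assumptions made for illustration, and the sample versions are invented rather than taken from the patent's tables.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Data:
    """Hypothetical record: literal bytes stored in this version's update."""
    target: int      # offset of the bytes within this version of the file
    payload: bytes


@dataclass
class Copy:
    """Hypothetical record: bytes carried over from the previous version."""
    target: int      # offset within this version of the file
    source: int      # offset within the previous version
    length: int


Record = Union[Data, Copy]


def restore(version: int, updates: Dict[int, List[Record]], size: int) -> bytes:
    """Rebuild `version`, reading only the bytes that reach the output."""
    out = bytearray(size)
    # GetData entries: (version, source offset, length, target in output)
    get_data = deque([(version, 0, size, 0)])
    while get_data:
        ver, src, length, dst = get_data.popleft()
        for rec in updates[ver]:
            rec_len = len(rec.payload) if isinstance(rec, Data) else rec.length
            lo = max(src, rec.target)
            hi = min(src + length, rec.target + rec_len)
            if lo >= hi:
                continue  # record does not overlap the requested range
            out_pos = dst + (lo - src)
            if isinstance(rec, Data):
                start = lo - rec.target
                out[out_pos:out_pos + (hi - lo)] = rec.payload[start:start + (hi - lo)]
            else:
                # Defer to the next lower version for the overlapping part.
                get_data.append((ver - 1, rec.source + (lo - rec.target), hi - lo, out_pos))
    return bytes(out)


# Invented example: V0 is a full archive, V1 and V2 are updates.
updates = {
    0: [Data(0, b"ABCDEFG")],
    1: [Copy(0, 0, 1), Data(1, b"x"), Copy(2, 2, 3), Data(5, b"y"), Copy(6, 5, 2)],
    2: [Data(0, b"P"), Copy(1, 1, 7)],
}
latest = restore(2, updates, 8)
```

In this sketch the byte `B` of V0 is never read when V2 is rebuilt, because a newer Data record already covers that position, which is the saving the block filter is meant to provide.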
Abstract
File comparison employs a single function F (28) to calculate a digital signature (32) from data in a sliding window (58, 56). The digital signature (32) is both incrementally computable and position sensitive. In particular, F is computable without reprocessing each byte in the array when the window is advanced and facilitates detection of such changes as transposed bytes of data. The function F is defined by two qualities. First, for F(A+B), where A is an array, F(A+B) = F(A) + F(B). Second, given a concatenation operator '!' such that '0!A' indicates an array A with 0 inserted before A, the function F has the property that there is a function G such that F(0!A) = G(F(A!0)). Both polynomials and cyclic redundancy checks ('CRC') may be used as that class of function.
Description
TITLE OF THE INVENTION
FILE COMPARISON FOR DATA BACKUP AND FILE SYNCHRONIZATION
CROSS REFERENCE TO RELATED APPLICATIONS
Priority is claimed to U.S. Provisional Patent Application No. 60/037,597 entitled FILE COMPARISON FOR DATA BACKUP AND FILE SYNCHRONIZATION, filed February 11, 1997.
STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT
Not Applicable
BACKGROUND OF THE INVENTION
The present invention is generally related to data backup and file synchronization, and more particularly to comparison of sets of digital data to determine differences therebetween to facilitate data backup and file synchronization.
It is desirable to have the facility to copy files present on a first computer system onto a second computer system over a connection. Such a need arises when synchronizing the contents of an "A" disk system on the first computer system with another disk system on the second computer system. Such a need also arises when using a target system to maintain a second copy of a file from a source system for disaster recovery purposes. However, file copying can be cumbersome and time consuming due to the bandwidth limitations of typical connections.
It is known to reduce file copying requirements, and hence more efficiently utilize bandwidth and reduce backup time, by sending only those portions of files which have changed since the last backup to the target system. However, to implement such a system it is necessary to perform data comparison to locate the changed portions.
A file comparison technique is described in U.S. Patent No. 5,479,654 entitled APPARATUS AND METHOD FOR RECONSTRUCTING A FILE FROM A DIFFERENCE SIGNATURE AND AN ORIGINAL FILE, issued to Squibb. Squibb describes a method of comparing previously stored digital signatures with new digital signatures calculated for data within a sliding window in order to recognize differences between files. In particular, Squibb teaches using two functions to calculate digital signatures. The first function has the attribute of being incremental. The incremental function allows computation of a new digital signature using only the new data entering the window, the old data leaving the window and the old signature, and hence can be calculated relatively quickly. The second function has the attribute of being position sensitive and provides a more unique signature from the data in the window, being sensitive to such changes as transposed bytes.
One combination of functions that Squibb teaches includes an Exclusive-OR ("XOR") function as the incremental function and a Cyclic Redundancy Check ("CRC") function as the position sensitive function. The XOR function of Squibb is incremental, but is not sensitive to such changes as transposed bytes. The CRC function of Squibb is position sensitive, but must include all of the data in the window in the signature calculation. The CRC function of Squibb therefore requires relatively more calculations to compute than the XOR function. In operation the XOR function is employed first. If the comparison between the previously stored signature and the XOR signature indicates a changed data portion then that result is assumed to be correct. If the XOR signature comparison does not indicate a changed data portion then the CRC function is employed to produce a signature that is compared with the previously stored signature. The result from the CRC signature comparison is assumed to be correct. However, a more efficient technique would be desirable.
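The trade-off between the two prior-art functions can be made concrete with a short Python sketch. It is only an illustration under stated assumptions: `zlib.crc32` stands in for whatever CRC Squibb uses, and the helper names `xor_signature`, `xor_slide` and `window_matches` are hypothetical.

```python
import zlib
from functools import reduce


def xor_signature(window: bytes) -> int:
    """XOR of all bytes in the window: cheap, but blind to byte order."""
    return reduce(lambda acc, b: acc ^ b, window, 0)


def xor_slide(sig: int, leaving: int, entering: int) -> int:
    """Incremental update: only the byte leaving and the byte entering are needed."""
    return sig ^ leaving ^ entering


def window_matches(window: bytes, stored_xor: int, stored_crc: int) -> bool:
    """Two-step test: a cheap XOR check first, then a CRC over the whole window."""
    if xor_signature(window) != stored_xor:
        return False  # an XOR mismatch is taken as proof of a change
    # zlib.crc32 is an assumed stand-in; it must reread every byte in the window.
    return zlib.crc32(window) == stored_crc


original = b"abcd"
transposed = b"abdc"
same_xor = xor_signature(original) == xor_signature(transposed)
detected = not window_matches(transposed, xor_signature(original), zlib.crc32(original))
```

Here `same_xor` is true because XOR ignores byte order, so the transposition is only caught by the second, full-window CRC pass.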
BRIEF SUMMARY OF THE INVENTION
In accordance with the present invention, a file comparison routine employs a sliding window and a single function F to generate a digital signature that is both position sensitive and incrementally computable. The function F is defined by two qualities. First, for F(A+B), where A is an array, F(A+B) = F(A) + F(B). Second, given a concatenation operator "!" such that "0!A" indicates an array A with 0 inserted before A, the function F has the property that there is a function G such that F(0!A) = G(F(A!0)). Both polynomials and cyclic redundancy checks ("CRC") may be employed as the function. The function F enables computation of a position sensitive digital signature in a single step using only the new data entering the window, the old data leaving the window and the digital signature computed from the previous window. Hence, a file comparison system in accordance with the present invention offers improved performance because relatively fewer calculations are required and a relatively more unique digital signature is provided.
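One function with both properties is a polynomial evaluated modulo a prime, as in Rabin-Karp string matching. The Python sketch below is an assumption-laden illustration and not the polynomial of the patent's embodiment: `BASE`, `MOD`, `signature` and `slide` are hypothetical names and values chosen for the example.

```python
from typing import Sequence

BASE = 257             # assumed polynomial base (hypothetical choice)
MOD = (1 << 61) - 1    # assumed prime modulus, 2**61 - 1 (hypothetical choice)


def signature(window: Sequence[int]) -> int:
    """Evaluate the window as a polynomial in BASE, reduced modulo MOD."""
    sig = 0
    for value in window:
        sig = (sig * BASE + value) % MOD
    return sig


def slide(sig: int, leaving: int, entering: int, top_power: int) -> int:
    """Advance the window by one element in a single step.

    top_power must equal BASE ** (window_length - 1) % MOD.
    """
    sig = (sig - leaving * top_power) % MOD   # drop the oldest element
    return (sig * BASE + entering) % MOD      # shift and append the newest


data = b"the quick brown fox"
n = 4
top = pow(BASE, n - 1, MOD)
rolling = signature(data[:n])
for i in range(1, len(data) - n + 1):
    rolling = slide(rolling, data[i - 1], data[i + n - 1], top)
```

Because each element is weighted by a different power of `BASE`, swapping two bytes changes the result, and because the sum is linear, the signature of an element-wise sum of two arrays equals the sum of their signatures modulo `MOD`.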
In further accordance with the present invention, previously detected and stored files which have been modified and saved under different file names are detected, and only those portions of the later version of the file which have changed are backed up. To detect such a later version of the already backed up file an "approximate comparison" is made between the file under investigation and the previously stored digital signature lists of other files that have been created on the source system. In one method for approximate comparison a signature is computed for a block of characters in the file under investigation and the signature is compared to previously stored signatures of backed up blocks. The extent of matches is then determined, such as by the number of matching blocks, and the matches are ranked. The previously backed up block corresponding to the highest ranked match is then employed as a baseline for storage of the new file, with only the differing portions of the new file being backed up along with an index to the baseline file.
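The ranking step of the approximate comparison can be sketched in Python as follows. This is a simplified illustration, not the claimed method: `zlib.crc32` stands in for the function F, the block size of 4 is arbitrary, and `block_signatures`, `best_baseline` and the sample file names are hypothetical.

```python
import zlib
from typing import Dict, Optional, Set

BLOCK = 4  # assumed block size, kept tiny for the example


def block_signatures(data: bytes, block: int = BLOCK) -> Set[int]:
    """Signatures of the consecutive, non-overlapping blocks of a backed-up file."""
    return {zlib.crc32(data[i:i + block])
            for i in range(0, len(data) - block + 1, block)}


def best_baseline(new_file: bytes, stored: Dict[str, Set[int]],
                  block: int = BLOCK) -> Optional[str]:
    """Return the stored file sharing the most blocks with new_file, if any."""
    counts = {name: 0 for name in stored}   # one match count per stored file
    pos = 0
    while pos + block <= len(new_file):
        sig = zlib.crc32(new_file[pos:pos + block])
        hit = False
        for name, sigs in stored.items():
            if sig in sigs:
                counts[name] += 1
                hit = True
        # Skip past a matched block; otherwise advance by one character.
        pos += block if hit else 1
    if not counts or max(counts.values()) == 0:
        return None
    return max(counts, key=counts.get)      # ties resolve to the first maximum


stored = {
    "letter_v0": block_signatures(b"Dear Bob, hello!"),
    "other": block_signatures(b"0123456789abcdef"),
}
baseline = best_baseline(b"XDear Bob, hello?", stored)
```

The one-character advance on a miss lets the scan resynchronize after the inserted leading `X`, so three of the four original blocks are still recognized.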
BRIEF DESCRIPTION OF THE DRAWING
The invention will be more fully understood in view of the Detailed Description of the Invention and the Drawing, of which:
Fig. 1 is a block diagram of a data backup system that employs file comparison;
Fig. 2 is a block diagram which illustrates a sliding window comparison method;
Fig. 3 is a block diagram which illustrates approximate comparison of files;
Figs. 4-6 illustrate how compression and encryption affect backup and recovery; and
Fig. 7 is a block diagram which illustrates the block filter routine.
DETAILED DESCRIPTION OF THE INVENTION
U.S. Provisional Patent Application No. 60/037,597 entitled FILE COMPARISON FOR DATA BACKUP AND FILE SYNCHRONIZATION, filed February 11, 1997 is incorporated herein by reference.
Referring to Figs. 1 and 2, a sliding window comparison technique is employed for data backup and file synchronization. The use of sliding windows and digital signatures for file comparison is known in the art and is taught, inter alia, in U.S. Patent No. 5,479,654 entitled APPARATUS AND METHOD FOR RECONSTRUCTING A FILE FROM A DIFFERENCE SIGNATURE AND AN ORIGINAL FILE, issued to Squibb, which is incorporated herein by reference. A file is selected from a storage device 10 in a source system 12 as indicated in step 14. The selected file is compared by file name with files on a storage device 16 in a target system 18 as indicated in step 20. As a performance-enhancing variation, a list of target system files can be maintained on the source system. If no matching file name is located on the target system 18, such as when a new file name 22 is selected in the source system, the entire new file 22 is transmitted from the source system 12 to the target system 18 for backup as indicated in step 24. If a matching file name is located on the target system, the file modification dates associated with the respective matching files are compared as indicated in step 26. Upon locating a previously detected file with an unchanged modification date, no changes in the file are indicated and a new file is selected as indicated in step 14. Upon locating a previously detected file with a changed modification date, a control program 28 operating in the source system 12 identifies at least one contiguous portion of the file which has changed ("revision element"). The identified revision elements, which may vary in size, are transmitted from the source system to the target system, and may be written into the backup copy of the file or separately backed up.
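The file-level decision in steps 20 through 26 can be sketched as follows. This is a minimal illustration, not the patented control program: the function name `backup_action`, the returned labels, and the use of a dictionary of modification times as the source-side list of target files are all assumptions made for the example.

```python
from typing import Dict, Optional

def backup_action(name: str, source_mtime: float,
                  target_index: Dict[str, float]) -> str:
    """Decide how to treat one source file.

    target_index is a hypothetical source-side list of target files,
    mapping file name to the modification date recorded at last backup.
    """
    known_mtime: Optional[float] = target_index.get(name)
    if known_mtime is None:
        return "send_whole_file"         # no matching name on the target (step 24)
    if known_mtime == source_mtime:
        return "skip"                    # unchanged modification date
    return "find_revision_elements"      # changed date: run the window comparison

index = {"report.doc": 100.0, "notes.txt": 200.0}
print(backup_action("new.doc", 50.0, index))      # send_whole_file
print(backup_action("report.doc", 100.0, index))  # skip
print(backup_action("notes.txt", 250.0, index))   # find_revision_elements
```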
The control program employs a sliding window 30 to produce digital signatures 32 from block data portions of the file in the source system to identify the revision elements. Each digital signature is a representation of the data characters within the window 30 when the signature is generated. For the first signature generated from a file, the signature is computed by employing all of the data within the window as indicated in step 34. The computed signature 32 is compared with previously stored digital signatures 36 to produce a result 38 that indicates whether there is a match therebetween as indicated in step 40. If a match is not detected, then the window position is recorded and the revision element is saved as indicated in step 42. If the end of the file has been reached, as determined in step 44, a new file is selected as indicated in step 14. If the end of the file has not been reached, the window 30 is then advanced by one character as indicated in step 46. A digital signature is then computed as indicated in step 48. The digital signature is compared with the previously stored digital signatures to determine if there is a match therebetween as indicated in step 40. If a match is detected in step 40, a match indicator and the match position are recorded as indicated in step 50. If the end of file has not been reached, as determined in step 52, the window is advanced by the number of characters within the window, e.g., one block, as indicated in step 54. When the entire file has been analyzed, a new file is loaded as indicated in step 14.
Calculation of the digital signature is facilitated by employing any one of a class of functions ("F") that have the property of being both incrementally computable when the window 30 is advanced and providing a position sensitive digital signature 32. For this function, F(A+B) = F(A) + F(B). Further, given a concatenation operator "!" such that "0!A" indicates an array A with a 0 inserted before it, the function F also has the property that there is a function G such that F(0!A) = G(F(A!0)). As such, given an array within the window that ends with 0, after shifting each of the bytes therein down by one position and inserting 0 at the beginning of the window, the signature 32 can be computed for this new array from the signature of the old array without reprocessing each of the bytes in the new array. Further, the digital signature produced by the function F is more nearly unique than those produced by known incremental functions, and allows detection of such differences as transposed bytes of data within the window. The digital signature is thus "position sensitive."
The use of the function F provides improved performance in the file comparison system by reducing the number of calculations required to detect differences between sets of data such as files. The function enables computation of a position sensitive digital signature in a single step using only the new data 56 entering the window, the old data 58 leaving the window and the digital signature computed from the previous window. Hence, the performance is improved because fewer calculations are required and a more nearly unique digital signature is provided. In one embodiment the function F is the polynomial:
$F(A_n) = \sum_{i=0}^{w-1} a_{i+n} \cdot 3^{(w-i-1)} \bmod 2^{64}$, for an array A starting at index position n with a window size of w, where $a_i$ is an element of A. If the window is advanced in A, then the new function is computed as follows: $F(A_{n+1}) = 3 \cdot F(A_n) - a_n \cdot 3^{w} + a_{n+w} \bmod 2^{64}$. Computing the new function is generally faster than computing the original $F(A_n)$. Hence, file comparison is facilitated by use of this function.
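The polynomial and its incremental update can be checked with a short sketch. The names `signature` and `roll` are chosen for this example and do not come from the text. The update follows from multiplying F(A_n) by 3, which raises the exponent of every term by one, so the departing element a_n carries a factor of 3^w and is subtracted before the entering element is added.

```python
MOD = 2 ** 64   # modulus given in the text
BASE = 3        # multiplier used in the described embodiment

def signature(data: bytes, n: int, w: int) -> int:
    """Full computation: sum of data[n+i] * 3**(w-i-1) for i in 0..w-1, mod 2**64."""
    total = 0
    for i in range(w):
        total = (total * BASE + data[n + i]) % MOD   # Horner's rule
    return total

def roll(prev_sig: int, leaving: int, entering: int, w: int) -> int:
    """Signature of the window at n+1 from the signature at n."""
    return (BASE * prev_sig - leaving * pow(BASE, w, MOD) + entering) % MOD

data = b"the quick brown fox"
w = 4
sig = signature(data, 0, w)
for n in range(len(data) - w):
    sig = roll(sig, data[n], data[n + w], w)
    assert sig == signature(data, n + 1, w)   # incremental equals full recomputation

# Position sensitivity: transposing two bytes changes the signature.
assert signature(b"ab", 0, 2) != signature(b"ba", 0, 2)
```

Each call to `roll` touches only the byte leaving and the byte entering the window, whereas `signature` touches all w bytes.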
In view of the present disclosure it will now be appreciated by those skilled in the art that functions other than the function illustrated in the above embodiment will provide digital signatures that are both incrementally computable and position sensitive. For example, the general class of functions known as Cyclic Redundancy Check ("CRC") functions, which are polynomials over GF(2), are incremental functions and will provide position sensitive digital signatures.
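The window scan described for Fig. 2 (advance by one character on a miss, by one block on a match) can be sketched with the base-3 polynomial embodiment. The function names and the return format are assumptions made for this example. The sketch treats a signature match as a data match; the text does not say whether a byte-level confirmation follows, so none is shown.

```python
from typing import List, Set, Tuple

MOD, BASE = 2 ** 64, 3

def sig(block: bytes) -> int:
    """Position sensitive polynomial signature of one block."""
    total = 0
    for b in block:
        total = (total * BASE + b) % MOD
    return total

def scan(new: bytes, stored: Set[int], w: int) -> Tuple[List[int], List[Tuple[int, bytes]]]:
    """Return (match positions, revision elements as (offset, bytes))."""
    top = pow(BASE, w, MOD)
    matches: List[int] = []
    revisions: List[Tuple[int, bytes]] = []
    pos, rev_start = 0, 0
    cur = sig(new[:w]) if len(new) >= w else None
    while pos + w <= len(new):
        if cur in stored:
            if rev_start < pos:                      # close the pending revision element
                revisions.append((rev_start, new[rev_start:pos]))
            matches.append(pos)
            pos += w                                 # advance by one block
            rev_start = pos
            cur = sig(new[pos:pos + w]) if pos + w <= len(new) else None
        else:
            if pos + w < len(new):                   # incremental update of the signature
                cur = (BASE * cur - new[pos] * top + new[pos + w]) % MOD
            pos += 1                                 # advance by one character
    if rev_start < len(new):                         # unmatched tail
        revisions.append((rev_start, new[rev_start:]))
    return matches, revisions

stored = {sig(b"AAAA"), sig(b"BBBB"), sig(b"CCCC")}
print(scan(b"AAAAxyBBBBCCCC", stored, 4))   # ([0, 6, 10], [(4, b'xy')])
```

In the usage line, the two inserted bytes are the only revision element; the three original blocks are located again even though two of them have shifted.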
Detection of Similar Files
Referring now to Figs. 1 and 3, in a first alternative embodiment the file comparison technique is employed to detect similar files having different filenames in order to further facilitate backup operations. When a previously detected file, such as a form letter 60, is modified and subsequently saved under a different file name, the new version 62 of the file is detected and only the revision elements of that new version are saved. To detect the new version of the file an "approximate comparison" is made between the new file and the previously stored digital signatures from the other files that have been created on the source system. The approximate comparison detects files that are similar, although not necessarily identical to, the new version of the file.
In one approximate comparison technique the control program maintains a list 64 of signatures for all source system files that have previously been backed up to the target system. A Match Count Table 66 is created with one row 68 for each file in the list of signatures. Each row in the Match Count Table is initialized to zero. The window 30 for signature computation in the new file is then positioned at the beginning of the file being examined.
In an initial step 68 a signature 32 for the characters in the window 30 is computed with the function F. The computed signature is then compared against all of the signatures for the first "N" blocks of previously copied files in the list 64, where N is a predetermined integer value. If a match against at least one existing block signature is detected as indicated in step 70, then the row pointer 72 in the Match Count Table is incremented and the window 30 in the new file is moved to begin at the end 73 of the block for which the signature was just computed. If no match is detected at step 70, then the window 30 is advanced by one character as indicated in step 74. If comparison has not advanced beyond the first N blocks for the new file as determined in step 76, flow returns to step 68. Otherwise flow continues to step 78. As indicated in step 78, the file with the highest count in the Match Count Table is selected and a comparison algorithm is used to determine the set of changes between this file and the new file. The set of changes is then backed up along with a pointer to the original backup of the file. In the event of equal counts resulting in "ties" in the Match Count Table, one of the files is arbitrarily selected.
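The approximate comparison can be sketched as below. The function names, the dictionary layout of the signature list, and the choice to return `None` when nothing matches are assumptions made for this example. For brevity the signature is recomputed at each position rather than updated incrementally.

```python
from typing import Dict, List, Optional

MOD, BASE = 2 ** 64, 3

def sig(block: bytes) -> int:
    total = 0
    for b in block:
        total = (total * BASE + b) % MOD
    return total

def block_signatures(data: bytes, w: int, n_blocks: int) -> List[int]:
    """Signatures of the first n_blocks whole blocks of a stored file."""
    end = min(len(data), n_blocks * w)
    return [sig(data[i:i + w]) for i in range(0, end - w + 1, w)]

def closest_file(new: bytes, stored: Dict[str, List[int]],
                 w: int, n_blocks: int) -> Optional[str]:
    counts = {name: 0 for name in stored}     # Match Count Table, one row per file
    limit = min(len(new), n_blocks * w)       # examine only the first N blocks
    pos = 0
    while pos + w <= limit:
        s = sig(new[pos:pos + w])
        hit = False
        for name, sigs in stored.items():
            if s in sigs:
                counts[name] += 1
                hit = True
        pos += w if hit else 1                # jump a block on a match, else one character
    if not counts or max(counts.values()) == 0:
        return None                           # assumption: no similar file found
    return max(counts, key=counts.get)        # ties resolve to the first row encountered

stored = {
    "letter.doc": block_signatures(b"Dear customer, thank you", 4, 4),
    "memo.txt": block_signatures(b"Quarterly numbers follow", 4, 4),
}
print(closest_file(b"Dear customer, thanks!!!", stored, 4, 4))   # letter.doc
```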
Restore Optimization
Referring to Figs. 1 and 4-7, in a second alternative embodiment, restoration of revision elements is facilitated by a filter routine when compression and encryption are employed. The filter routine controls the encryption and compression engines on the source system. When the detected revision elements are compressed and encrypted for transmission to the target system, the compression and encryption engines are restarted with each revision element so that the target system can assemble a representation of the final file for transmission to the source system without decrypting the revision elements. Hence, it is not necessary for the target system to have knowledge of the encryption key or compression algorithm.
Performance is further facilitated in the case where a plurality of relatively small revision elements are present by combining the small revision elements together and compressing the collection of revision elements in a single step. In particular, the compression engine is not restarted at each revision element, so the compression algorithm has a larger array to work on and hence is more efficient. The problems posed for this technique when encryption is employed are evident in Fig. 4, which shows version 0 ("V0") of the file, consisting of 7 blocks, and the revision elements for V1 through V4. Fig. 5 shows the complete contents of V4. If a request for the restoration of V4 is made, and the target system cannot take the revision elements apart because they are all encrypted together, then the best transmission that can be made is to send V0 (7 blocks), UV2 (4 blocks), UV3 (4 blocks) and UV4 (4 blocks), for a total of 19 blocks to restore a 7 block long file. UV1 is unnecessary because everything in it has been replaced by later revision elements.
Referring now to Fig. 6, an original backup and revision elements generated from updates are grouped for compression and encryption. Not all of the updates for a single version are packed together. Rather, the updates are grouped together into units of compression ("chunks"). The optimal set of chunks to be returned in the illustrated example is: A2B2, C3D3, D4E4, F4G4, whereby only eight blocks are sent for a seven block file, rather than nineteen blocks as in the previously described technique. However, in practice the updates will not always align perfectly and may have arbitrary overlaps.
File restoration is further illustrated by Tables 1-11. A file to be restored is stored in the archive system as a series of updates and an archive. The archive contains all of the bytes of the file. The updates contain file change information that can be employed to update the previous version of the file to the current version. The update algorithm works by creating a list of updates that encompasses all of the bytes of the file. The algorithm first processes the most recent update and then works sequentially backwards through the set of updates, so that if a given byte of the file is already covered by an update, the algorithm will not employ an older update that contains that same byte.
Table 1
Table 1 illustrates four versions of an example file. File version V1 differs from version V0 in two places. In particular, version V1 replaced two bytes at address 1 and inserted a byte at address 5. The update for V1 is coded as follows:
Table 2
The update expressed in Table 2 is interpreted as:
1. Copy 1 byte from the previous file (A) at address 0 to address 0 of the new version (Target) of the file.
2. Place 2 bytes of newly supplied data (LQ) into the new file starting at address 1.
3. Copy 2 bytes from the previous file (DE) at address 3 to address 3 of the new file.
4. Place 1 byte of newly supplied data (M) into the new file at address 5.
5. Copy 5 bytes from the previous file (FGHIJ) to address 6 of new file.
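The five steps above can be replayed with a small sketch. The record layout and the name `apply_update` are assumptions made for this example. The bytes of V0 at addresses 1 and 2 are not stated in the surviving text, so "B" and "C" are used as placeholders, and the source address of the final copy is taken to be 5.

```python
from typing import List, Tuple, Union

Copy = Tuple[str, int, int, int]   # ("copy", source address, length, target address)
Data = Tuple[str, bytes, int]      # ("data", newly supplied bytes, target address)

def apply_update(previous: bytes, update: List[Union[Copy, Data]]) -> bytes:
    """Build the new version from the previous version and one update."""
    out = bytearray()
    for record in update:
        if record[0] == "copy":
            _, src, length, target = record
            chunk = previous[src:src + length]
        else:
            _, chunk, target = record
        if target != len(out):
            raise ValueError("records are assumed to be listed in target order")
        out += chunk
    return bytes(out)

v0 = b"ABCDEFGHIJ"                  # "B" and "C" are placeholder bytes
update_v1 = [
    ("copy", 0, 1, 0),              # 1. copy A
    ("data", b"LQ", 1),             # 2. place LQ at address 1
    ("copy", 3, 2, 3),              # 3. copy DE
    ("data", b"M", 5),              # 4. place M at address 5
    ("copy", 5, 5, 6),              # 5. copy FGHIJ to address 6
]
print(apply_update(v0, update_v1))  # b'ALQDEMFGHIJ'
```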
Given the above described coding method, the other three versions of the updates shown in Table 1 are coded as shown in Tables 3-5 below.
V0
Table 3
V2
Table 5
A block filter algorithm may be employed to facilitate restoration of a desired version of a file. When multiple updates for a file have been archived, intermediate updates may be overwritten by later updates. The block filter algorithm is employed to recognize such situations and avoid needless computation. The Block Filter Algorithm determines an optimal set of updates. For example, providing file version V3 via the block filter algorithm includes creating an Update List of all of the version archives, ordered by version with the most recent at the head of the list. An empty output file and an empty GetData list are then created. The empty GetData list is a list of records that describes the data that is needed from each archive to create the output file. Entries are added to the end of the list and removed from the head of the list. Each record has the following four fields:
Version: the archive version to get the data from
Source Offset: the location within the archive file to get the data from
Length: the length of the data to be copied in this operation
Target: the location in the output file that this data will go to
The following entry is placed at the head of the GetData list:
Version=most recent version of the file to be retrieved
Source Offset=0
Length=length of output file
Target=0
The GetData list is processed by initially removing an entry from the head of the list. The entries in the Update List for the specified version are then searched to locate a record in the Update List whose source range overlaps the target range of the GetData entry being processed. "Overlap" indicates that some portion of the range defined by Data Offset+Length in the GetData entry overlaps the range defined by Target+Length in the Update List. If the record found in the Update List is a "Data" record, then the data is copied from that record to the output file. If the record found in the Update List is a "Copy" record, then a new record is added to the end of the GetData list with the version entry decremented. Processing of the GetData entry continues by scanning the Update List entries until the entire target range of the GetData entry has been covered with either Data or Copy entries from the Update List. Flow then returns to the initial step, at which processing of the next entry from the GetData list begins.
Exemplary block filter algorithm execution steps are shown in Tables 5-8. Entry 1 in the GetData List shows a source range of 0-8 and a Version of 3. The Update List is searched for a Version 3 entry with a matching output range. The first such entry is Entry 1. This specifies Data with a length of 1 (P), which is placed in the output file at location 0, the target location. Entry 1 in the GetData List has been satisfied for source location 0, leaving source locations 1-8 to be satisfied. Entry 1 in the GetData List is processed for a source range of 1-8. A match is found in Update List entry 2. This is a copy operation, so an entry is added to the GetData List (entry 2) that specifies Version 2, the next lower version, and spans the range of overlap from GetData entry 1 and Update List entry 2, namely 1-3. Entry 1 of the GetData List is then processed for source locations 4-8. This match is found with Update List item 3, a copy operation. GetData entry 3 is added. This satisfies the entire range of Entry 1 from the GetData List.
Entry 2 in the GetData List shows a source range of 1-3 and a Version of 2. The Update List is searched for a Version 2 entry with an output range matching this. Such an entry is found at Update List (4). This is a copy operation, so a new entry (4) is placed in the GetData List. Note that because a source range of 1-3 was sought, and the output range from the Update List entry was 0-5, the Data Offset and Length of the new GetData List entry are adjusted to capture only the piece of Update List entry 4 that was needed by GetData List entry 2. Entry 2 from GetData is then completely satisfied.
Entry 3 from GetData shows a source range of 5-9. The first location of this range is matched by Update List (4), which is a copy operation and results in a new entry to GetData (5). This leaves GetData (3) with a source range of 6-9 to be processed. The GetData (3) source range of 6 is matched by Update List (5), which is a Data operation. The data (N) is copied to the output file at location 5. This is because the GetData (3) entry has a source range of 5-9, which maps to an output range of 4-8. Hence the source item at location 6 is placed in the output file at location 5. The GetData (3) source range 7-9 is next found to match Update (6), a copy operation. This yields GetData (6). The remaining items in the GetData list are processed in a similar manner until the list is empty. This will yield the complete output file, without ever having to process Data items in the Update List that did not affect the final output file.
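The GetData processing loop can be sketched as follows. This is an illustration under stated assumptions, not a reproduction of the tabulated example: the `Rec` layout, the function name `restore`, and the contents of version 2 are hypothetical, and each version's records are assumed to tile that version's file without gaps. Version 0 is represented as a single Data record holding the whole archive.

```python
from collections import deque
from typing import Dict, List, NamedTuple

class Rec(NamedTuple):
    kind: str             # "data" or "copy"
    target: int           # offset in this version's file
    length: int
    source: int = 0       # copy only: offset in the previous version
    payload: bytes = b""  # data only: the newly supplied bytes

def restore(updates: Dict[int, List[Rec]], version: int) -> bytes:
    size = sum(r.length for r in updates[version])
    out = bytearray(size)
    # GetData entries: (version, source offset, length, target in output file)
    get_data = deque([(version, 0, size, 0)])
    while get_data:
        ver, off, length, target = get_data.popleft()
        for rec in updates[ver]:
            lo = max(off, rec.target)                      # start of the overlap
            hi = min(off + length, rec.target + rec.length)
            if lo >= hi:
                continue                                   # no overlap with this record
            out_pos = target + (lo - off)
            if rec.kind == "data":
                out[out_pos:out_pos + hi - lo] = rec.payload[lo - rec.target:hi - rec.target]
            else:                                          # defer to the next lower version
                get_data.append((ver - 1, rec.source + (lo - rec.target), hi - lo, out_pos))
    return bytes(out)

updates = {
    0: [Rec("data", 0, 10, payload=b"ABCDEFGHIJ")],
    1: [Rec("copy", 0, 1, source=0), Rec("data", 1, 2, payload=b"LQ"),
        Rec("copy", 3, 2, source=3), Rec("data", 5, 1, payload=b"M"),
        Rec("copy", 6, 5, source=5)],
    2: [Rec("copy", 0, 5, source=0), Rec("data", 5, 1, payload=b"N"),
        Rec("copy", 6, 5, source=6)],                      # hypothetical: N replaces M
}
print(restore(updates, 2))   # b'ALQDENFGHIJ'
```

In this run the Data record holding "M" in version 1 is never read, because version 2 overwrote that byte; only the ranges that reach the final output are followed back through the older versions.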
Update List: Table 6, Table 6a
GetData: Table 7, Table 7a
Output File: Table 8
Having described the preferred embodiments of the invention, other embodiments which incorporate the concepts of the invention will now become apparent to one of skill in the art. Therefore, the invention should not be viewed as limited to the disclosed embodiments but rather should be viewed as limited only by the spirit and scope of the appended claims.
Claims
1. A system for comparing a first data set on a first storage device with a second data set on a second storage device, comprising: a transmission medium for transmitting data between the first storage device and the second storage device; a control program that generates a first digital signature from the first data set and a second digital signature from the second data set, the control program including a function F that incrementally calculates position sensitive first and second digital signatures; and a comparator for determining whether the first digital signature matches the second digital signature.
2. The comparing system of claim 1 wherein polynomials are used to implement the function F.
3. The comparing system of claim 1 wherein a cyclic redundancy check is used to implement the function F.
4. The comparing system of claim 2 wherein the function F is a polynomial: $F(A_n) = \sum_{i=0}^{w-1} a_{i+n} \cdot N^{(w-i-1)} \bmod 2^{64}$, for an array A starting at index position n with a window size of w, where $a_i$ is an element of A.
5. The comparing system of claim 4 wherein, if the window is advanced in A, then a new function is computed as: $F(A_{n+1}) = N \cdot F(A_n) - a_n \cdot N^{w} + a_{n+w} \bmod 2^{64}$.
6. A method for calculating a position sensitive digital signature from data on a first storage medium for comparison with a digital signature that represents data on a second storage medium, comprising the steps of: selecting the data in the first system with a sliding window; applying a function F to the data within the window to generate the position sensitive digital signature; and comparing the position sensitive digital signature with at least one digital signature representing data in the second system.
7. The method of claim 6 including the further step of employing a polynomial to implement the function F.
8. The method of claim 6 including the further step of employing a cyclic redundancy check to implement the function F.
9. The method of claim 7 including the further step of employing the function: $F(A_n) = \sum_{i=0}^{w-1} a_{i+n} \cdot N^{(w-i-1)} \bmod 2^{64}$, for an array A starting at index position n with a window size of w, where $a_i$ is an element of A, to implement the function F.
10. The method of claim 9 including the further step of, if the window is advanced in A, computing a new position sensitive digital signature as: $F(A_{n+1}) = N \cdot F(A_n) - a_n \cdot N^{w} + a_{n+w} \bmod 2^{64}$.
11. A method for providing an incremental backup of a first memory in a second memory where some files have previously been stored in the second memory, comprising the steps of: selecting a file from the first memory for examination; generating a signature from a portion of the file; comparing the generated signature with signatures generated from the previously stored files; determining the closest matching previously stored file, relative to the file under examination, where the stored file and the file under examination have different filenames, by identifying at least some portions of the stored file and the file under examination which are different; and storing the portions of the file under examination identified as being different from the closest matching file in the second memory.
12. The method of claim 11 including the further step of creating a match count table with a row corresponding to each respective previously stored file.
13. The method of claim 12 including the further step of comparing the generated signature with signatures generated for N blocks of the previously stored files, where N is a predetermined integer.
14. A method for determining a minimum set of data compression units for restoring a file from a base copy and a plurality of revision elements, comprising the steps of: selecting the base copy and revision elements required to build the file; sorting the selected base copy and revision elements into a list with the most recently generated revision element at the list head and the base copy at the list tail; reading information for each selected revision element into an array with five columns: chunk, operation, data offset, data length, and target offset; creating an output array with five columns: revision element pointer, chunk, data offset, data length, and target offset; calling a recursive function for the most recent revision, requesting data offset 0 and the final file length, and passing in target offset 0, the recursive function iterating through the array columns and comparing the requested data offset and length with the target offset and data length of each item, and in the case of a match, writing an entry into the output array if the array item is a data operation; sorting the output array by revision element and chunk; and transmitting the array followed by the transmission blocks for each data block.
15. The method of claim 14 wherein FindData operates by iterating through the array items, and comparing the requested data offset and length with the target offset and data length of each item.
16. The method of claim 15 wherein, if a match or partial match is found, either writing an entry into the output array item or calling FindData again for the next list element with the offset equal to the item's data offset minus the target offset of each item plus the requested data offset, and the length equal to the remaining requested length and the item length.
17. The method of claim 16 wherein, if a partial match was found, incrementing the requested data offset and target offset and decrementing the requested length and continuing to iterate.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU61515/98A AU6151598A (en) | 1997-02-11 | 1998-02-10 | File comparison for data backup and file synchronization |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US3759797P | 1997-02-11 | 1997-02-11 | |
US60/037,597 | 1997-02-11 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO1998035306A1 (en) | 1998-08-13 |
Family
ID=21895203
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US1998/002434 WO1998035306A1 (en) | 1997-02-11 | 1998-02-10 | File comparison for data backup and file synchronization |
Country Status (3)
Country | Link |
---|---|
US (1) | US6101507A (en) |
AU (1) | AU6151598A (en) |
WO (1) | WO1998035306A1 (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
AU763524B2 (en) * | 1999-03-02 | 2003-07-24 | Flexera Software Llc | Data file synchronisation |
US6829640B1 (en) * | 1999-12-23 | 2004-12-07 | International Business Machines Corporation | Method and system for creating a byte stream characteristic number suitable for change quantification |
EP1587007A2 (en) | 2004-04-15 | 2005-10-19 | Microsoft Corporation | Efficient algorithm and protocol for remote differential compression |
US7613787B2 (en) | 2004-09-24 | 2009-11-03 | Microsoft Corporation | Efficient algorithm for finding candidate objects for remote differential compression |
US7849462B2 (en) | 2005-01-07 | 2010-12-07 | Microsoft Corporation | Image server |
US8073926B2 (en) | 2005-01-07 | 2011-12-06 | Microsoft Corporation | Virtual machine image server |
US20180121525A1 (en) * | 2016-10-28 | 2018-05-03 | Microsoft Technology Licensing, Llc | Record profiling for dataset sampling |
US11256710B2 (en) | 2016-10-20 | 2022-02-22 | Microsoft Technology Licensing, Llc | String transformation sub-program suggestion |
US11620304B2 (en) | 2016-10-20 | 2023-04-04 | Microsoft Technology Licensing, Llc | Example management for string transformation |
Families Citing this family (110)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6604118B2 (en) * | 1998-07-31 | 2003-08-05 | Network Appliance, Inc. | File system image transfer |
US7174352B2 (en) * | 1993-06-03 | 2007-02-06 | Network Appliance, Inc. | File system image transfer |
US6088515A (en) | 1995-11-13 | 2000-07-11 | Citrix Systems Inc | Method and apparatus for making a hypermedium interactive |
JPH11167510A (en) * | 1997-12-04 | 1999-06-22 | Hitachi Ltd | Replication method, replication tool, and replication server |
US6119244A (en) | 1998-08-25 | 2000-09-12 | Network Appliance, Inc. | Coordinating persistent status information with multiple file servers |
US6411966B1 (en) * | 1998-09-21 | 2002-06-25 | Microsoft Corporation | Method and computer readable medium for DNS dynamic update to minimize client-server and incremental zone transfer traffic |
US6510552B1 (en) * | 1999-01-29 | 2003-01-21 | International Business Machines Corporation | Apparatus for keeping several versions of a file |
US6463427B1 (en) * | 1999-03-16 | 2002-10-08 | Microsoft Corporation | Use of object signature property as a search parameter during synchronization of objects on a computer |
US6901413B1 (en) * | 1999-03-19 | 2005-05-31 | Microsoft Corporation | Removing duplicate objects from an object store |
US6574657B1 (en) | 1999-05-03 | 2003-06-03 | Symantec Corporation | Methods and apparatuses for file synchronization and updating using a signature list |
US6654746B1 (en) | 1999-05-03 | 2003-11-25 | Symantec Corporation | Methods and apparatuses for single-connection file synchronization workgroup file update |
US6526418B1 (en) * | 1999-12-16 | 2003-02-25 | Livevault Corporation | Systems and methods for backing up data files |
CN1411580A (en) * | 2000-01-10 | 2003-04-16 | 连接公司 | Administration of differential backup system in client-server environment |
US7028251B2 (en) * | 2000-03-02 | 2006-04-11 | Iora, Ltd. | System and method for reducing the size of data difference representations |
US7203861B1 (en) | 2000-03-08 | 2007-04-10 | Cablynx, Inc. | Method and system for remotely backing up a computer memory utilizing a global communications network |
US6601216B1 (en) * | 2000-03-31 | 2003-07-29 | Microsoft Corporation | Differential cyclic redundancy check |
US6553388B1 (en) * | 2000-07-20 | 2003-04-22 | International Business Machines Corporation | Database deltas using Cyclic Redundancy Checks |
US8473478B2 (en) | 2000-09-21 | 2013-06-25 | Warren Roach | Automatic real-time file management method and apparatus |
US20030033303A1 (en) * | 2001-08-07 | 2003-02-13 | Brian Collins | System and method for restricting access to secured data |
US20040205587A1 (en) * | 2001-08-07 | 2004-10-14 | Draper Stephen P.W. | System and method for enumerating arbitrary hyperlinked structures in which links may be dynamically calculable |
US9563869B2 (en) * | 2010-09-14 | 2017-02-07 | Zonar Systems, Inc. | Automatic incorporation of vehicle data into documents captured at a vehicle using a mobile computing device |
US20030101167A1 (en) * | 2001-11-29 | 2003-05-29 | International Business Machines Corporation | File maintenance on a computer grid |
US7631184B2 (en) * | 2002-05-14 | 2009-12-08 | Nicholas Ryan | System and method for imposing security on copies of secured items |
US8006280B1 (en) | 2001-12-12 | 2011-08-23 | Hildebrand Hal S | Security system for generating keys from access rules in a decentralized manner and methods therefor |
US7921284B1 (en) | 2001-12-12 | 2011-04-05 | Gary Mark Kinghorn | Method and system for protecting electronic data in enterprise environment |
US8065713B1 (en) | 2001-12-12 | 2011-11-22 | Klimenty Vainstein | System and method for providing multi-location access management to secured items |
US7921450B1 (en) | 2001-12-12 | 2011-04-05 | Klimenty Vainstein | Security system using indirect key generation from access rules and methods therefor |
US10360545B2 (en) | 2001-12-12 | 2019-07-23 | Guardian Data Storage, Llc | Method and apparatus for accessing secured electronic data off-line |
USRE41546E1 (en) | 2001-12-12 | 2010-08-17 | Klimenty Vainstein | Method and system for managing security tiers |
US10033700B2 (en) | 2001-12-12 | 2018-07-24 | Intellectual Ventures I Llc | Dynamic evaluation of access rights |
US7562232B2 (en) | 2001-12-12 | 2009-07-14 | Patrick Zuili | System and method for providing manageability to security information for secured items |
US7681034B1 (en) | 2001-12-12 | 2010-03-16 | Chang-Ping Lee | Method and apparatus for securing electronic data |
US7478418B2 (en) | 2001-12-12 | 2009-01-13 | Guardian Data Storage, Llc | Guaranteed delivery of changes to security policies in a distributed system |
US7380120B1 (en) | 2001-12-12 | 2008-05-27 | Guardian Data Storage, Llc | Secured data format for access control |
US7921288B1 (en) | 2001-12-12 | 2011-04-05 | Hildebrand Hal S | System and method for providing different levels of key security for controlling access to secured items |
US7930756B1 (en) | 2001-12-12 | 2011-04-19 | Crocker Steven Toye | Multi-level cryptographic transformations for securing digital assets |
US7260555B2 (en) | 2001-12-12 | 2007-08-21 | Guardian Data Storage, Llc | Method and architecture for providing pervasive security to digital assets |
US7178033B1 (en) | 2001-12-12 | 2007-02-13 | Pss Systems, Inc. | Method and apparatus for securing digital assets |
US7565683B1 (en) | 2001-12-12 | 2009-07-21 | Weiqing Huang | Method and system for implementing changes to security policies in a distributed security system |
US7950066B1 (en) | 2001-12-21 | 2011-05-24 | Guardian Data Storage, Llc | Method and system for restricting use of a clipboard application |
US8176334B2 (en) | 2002-09-30 | 2012-05-08 | Guardian Data Storage, Llc | Document security system that permits external users to gain access to secured files |
JP4205350B2 (en) * | 2002-02-28 | 2009-01-07 | 富士通株式会社 | DIFFERENTIAL DATA GENERATION METHOD, PROGRAM, RECORDING MEDIUM, AND DEVICE |
US7668901B2 (en) * | 2002-04-15 | 2010-02-23 | Avid Technology, Inc. | Methods and system using a local proxy server to process media data for local area users |
US7748045B2 (en) | 2004-03-30 | 2010-06-29 | Michael Frederick Kenrich | Method and system for providing cryptographic document retention with off-line access |
US20050071657A1 (en) * | 2003-09-30 | 2005-03-31 | Pss Systems, Inc. | Method and system for securing digital assets using time-based security criteria |
US8613102B2 (en) | 2004-03-30 | 2013-12-17 | Intellectual Ventures I Llc | Method and system for providing document retention using cryptography |
US6857001B2 (en) * | 2002-06-07 | 2005-02-15 | Network Appliance, Inc. | Multiple concurrent active file systems |
US7512810B1 (en) | 2002-09-11 | 2009-03-31 | Guardian Data Storage Llc | Method and system for protecting encrypted files transmitted over a network |
US20040068523A1 (en) * | 2002-10-07 | 2004-04-08 | Keith Robert Olan | Method and system for full asynchronous master-to-master file synchronization |
US7836310B1 (en) | 2002-11-01 | 2010-11-16 | Yevgeniy Gutnik | Security system that uses indirect password-based encryption |
US7716312B2 (en) | 2002-11-13 | 2010-05-11 | Avid Technology, Inc. | Method and system for transferring large data files over parallel connections |
US7577838B1 (en) | 2002-12-20 | 2009-08-18 | Alain Rossmann | Hybrid systems for securing digital assets |
US7890990B1 (en) | 2002-12-20 | 2011-02-15 | Klimenty Vainstein | Security system with staging capabilities |
US7055008B2 (en) * | 2003-01-22 | 2006-05-30 | Falconstor Software, Inc. | System and method for backing up data |
US20040179228A1 (en) * | 2003-03-10 | 2004-09-16 | Mccluskey Mark | Indication of image content modification |
GB0305828D0 (en) * | 2003-03-14 | 2003-04-16 | Ibm | Real time xml data update identification |
US7320009B1 (en) | 2003-03-28 | 2008-01-15 | Novell, Inc. | Methods and systems for file replication utilizing differences between versions of files |
EP1623300A2 (en) * | 2003-05-14 | 2006-02-08 | Rhysome, Inc. | Method and system for reducing information latency in a business enterprise |
US8707034B1 (en) | 2003-05-30 | 2014-04-22 | Intellectual Ventures I Llc | Method and system for using remote headers to secure electronic files |
US7730543B1 (en) | 2003-06-30 | 2010-06-01 | Satyajit Nath | Method and system for enabling users of a group shared across multiple file security systems to access secured files |
US7555558B1 (en) | 2003-08-15 | 2009-06-30 | Michael Frederick Kenrich | Method and system for fault-tolerant transfer of files across a network |
US7143117B2 (en) * | 2003-09-25 | 2006-11-28 | International Business Machines Corporation | Method, system, and program for data synchronization by determining whether a first identifier for a portion of data at a first source and a second identifier for a portion of corresponding data at a second source match |
US8127366B2 (en) | 2003-09-30 | 2012-02-28 | Guardian Data Storage, Llc | Method and apparatus for transitioning between states of security policies used to secure electronic documents |
US7703140B2 (en) | 2003-09-30 | 2010-04-20 | Guardian Data Storage, Llc | Method and system for securing digital assets using process-driven security policies |
US7472254B2 (en) * | 2003-10-10 | 2008-12-30 | Iora, Ltd. | Systems and methods for modifying a set of data objects |
US7685384B2 (en) * | 2004-02-06 | 2010-03-23 | Globalscape, Inc. | System and method for replicating files in a computer network |
US20060047855A1 (en) * | 2004-05-13 | 2006-03-02 | Microsoft Corporation | Efficient chunking algorithm |
US20050234961A1 (en) * | 2004-04-16 | 2005-10-20 | Pinnacle Systems, Inc. | Systems and Methods for providing a proxy for a shared file system |
US20050256974A1 (en) * | 2004-05-13 | 2005-11-17 | Microsoft Corporation | Efficient algorithm and protocol for remote differential compression on a remote device |
US20050262167A1 (en) * | 2004-05-13 | 2005-11-24 | Microsoft Corporation | Efficient algorithm and protocol for remote differential compression on a local device |
US20060004890A1 (en) * | 2004-06-10 | 2006-01-05 | International Business Machines Corporation | Methods and systems for providing directory services for file systems |
US7484051B2 (en) * | 2004-06-14 | 2009-01-27 | International Business Machines Corporation | Apparatus, system and method for reliably updating a data group in a read-before-write data replication environment using a comparison file |
US7580959B2 (en) * | 2004-06-14 | 2009-08-25 | International Business Machines Corporation | Apparatus, system, and method for providing efficient disaster recovery storage of data using differencing |
US7707427B1 (en) | 2004-07-19 | 2010-04-27 | Michael Frederick Kenrich | Multi-level file digests |
US8725705B2 (en) | 2004-09-15 | 2014-05-13 | International Business Machines Corporation | Systems and methods for searching of storage data with reduced bandwidth requirements |
US7523098B2 (en) * | 2004-09-15 | 2009-04-21 | International Business Machines Corporation | Systems and methods for efficient data searching, storage and reduction |
US20070094348A1 (en) * | 2005-01-07 | 2007-04-26 | Microsoft Corporation | BITS/RDC integration and BITS enhancements |
US7546322B2 (en) * | 2005-03-09 | 2009-06-09 | International Business Machines Corporation | Generating unique name/version number pairs when names can be re-used |
US7098815B1 (en) | 2005-03-25 | 2006-08-29 | Orbital Data Corporation | Method and apparatus for efficient compression |
US7774320B1 (en) * | 2005-04-01 | 2010-08-10 | Apple Inc. | Verifying integrity of file system data structures |
US20060230349A1 (en) * | 2005-04-06 | 2006-10-12 | Microsoft Corporation | Coalesced per-file device synchronization status |
GB2425623A (en) * | 2005-04-27 | 2006-11-01 | Clearswift Ltd | Tracking marked documents |
ATE504878T1 (en) * | 2005-10-12 | 2011-04-15 | Datacastle Corp | Data backup method and system |
US20070139189A1 (en) * | 2005-12-05 | 2007-06-21 | Helmig Kevin S | Multi-platform monitoring system and method |
US7447854B1 (en) | 2005-12-30 | 2008-11-04 | Vmware, Inc. | Tracking and replicating changes to a virtual disk |
US8041641B1 (en) * | 2006-12-19 | 2011-10-18 | Symantec Operating Corporation | Backup service and appliance with single-instance storage of encrypted data |
US8095976B2 (en) * | 2007-03-15 | 2012-01-10 | Broadcom Corporation | Data excess protection |
US8209540B2 (en) * | 2007-06-28 | 2012-06-26 | Apple Inc. | Incremental secure backup and restore of user settings and data |
US20090064134A1 (en) * | 2007-08-30 | 2009-03-05 | Citrix Systems, Inc. | Systems and methods for creating and executing files |
US8894731B2 (en) * | 2007-10-01 | 2014-11-25 | Saint-Gobain Abrasives, Inc. | Abrasive processing of hard and/or brittle materials |
CN102076462B (en) * | 2008-07-02 | 2013-01-16 | 圣戈班磨料磨具有限公司 | Abrasive slicing tool for electronics industry |
CN101419616A (en) * | 2008-12-10 | 2009-04-29 | 阿里巴巴集团控股有限公司 | Data synchronization method and apparatus |
US8326813B2 (en) * | 2010-01-20 | 2012-12-04 | Siemens Product Lifecycle Management Software, Inc. | System and method for data management |
US8332420B2 (en) * | 2010-01-20 | 2012-12-11 | Siemens Product Lifecycle Management Software Inc. | System and method for performing a database query |
US8290830B2 (en) | 2010-04-07 | 2012-10-16 | Siemens Product Lifecycle Management Software Inc. | System and method for visualization and comparison of physical assets using engineering design data |
US20110296305A1 (en) * | 2010-06-01 | 2011-12-01 | Sony Corporation | Methods and apparatus for media management |
EP2455922B1 (en) | 2010-11-17 | 2018-12-05 | Inside Secure | NFC transaction method and system |
US8566336B2 (en) * | 2011-03-30 | 2013-10-22 | Splunk Inc. | File identification management and tracking |
US9600513B2 (en) | 2011-06-09 | 2017-03-21 | International Business Machines Corporation | Database table comparison |
GB201115083D0 (en) * | 2011-08-31 | 2011-10-19 | Data Connection Ltd | Identifying data items |
US8458228B2 (en) | 2011-09-23 | 2013-06-04 | Siemens Product Lifecycle Management Software Inc. | Occurrence management in product data management systems |
US8533237B2 (en) | 2011-09-23 | 2013-09-10 | Siemens Product Lifecycle Management Software Inc. | Data structure partitioning in product data management systems |
US9122740B2 (en) | 2012-03-13 | 2015-09-01 | Siemens Product Lifecycle Management Software Inc. | Bulk traversal of large data structures |
US9652495B2 (en) | 2012-03-13 | 2017-05-16 | Siemens Product Lifecycle Management Software Inc. | Traversal-free updates in large data structures |
US9256765B2 (en) * | 2012-06-29 | 2016-02-09 | Kip Sign P1 Lp | System and method for identifying software changes |
JP5954738B2 (en) * | 2013-03-19 | 2016-07-20 | International Business Machines Corporation | Computer, system, method and program for performing file backup processing |
US9547657B2 (en) | 2014-02-18 | 2017-01-17 | Black Duck Software, Inc. | Methods and systems for efficient comparison of file sets |
US10256977B2 (en) | 2014-02-18 | 2019-04-09 | Synopsys, Inc. | Methods and systems for efficient representation of file sets |
US11204842B2 (en) * | 2017-11-22 | 2021-12-21 | Acronis International Gmbh | System and method for automating formation and execution of a backup strategy using machine learning |
US11200208B2 (en) * | 2020-01-09 | 2021-12-14 | StreamSets, Inc. | Removing non-deterministic behavior in a change data capture merge |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4897785A (en) * | 1984-09-12 | 1990-01-30 | Bbc Brown, Boveri & Company, Limited | Search system for locating values in a table using address compare circuit |
US5347652A (en) * | 1991-06-26 | 1994-09-13 | International Business Machines Corporation | Method and apparatus for saving and retrieving functional results |
US5428629A (en) * | 1990-11-01 | 1995-06-27 | Motorola, Inc. | Error check code recomputation method time independent of message length |
US5479654A (en) * | 1990-04-26 | 1995-12-26 | Squibb Data Systems, Inc. | Apparatus and method for reconstructing a file from a difference signature and an original file |
Family Cites Families (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5454099A (en) * | 1989-07-25 | 1995-09-26 | International Business Machines Corporation | CPU implemented method for backing up modified data sets in non-volatile store for recovery in the event of CPU failure |
US5276860A (en) * | 1989-12-19 | 1994-01-04 | Epoch Systems, Inc. | Digital data processor with improved backup storage |
US5276867A (en) * | 1989-12-19 | 1994-01-04 | Epoch Systems, Inc. | Digital data storage system with improved data migration |
US5893117A (en) * | 1990-08-17 | 1999-04-06 | Texas Instruments Incorporated | Time-stamped database transaction and version management system |
US5210866A (en) * | 1990-09-12 | 1993-05-11 | Storage Technology Corporation | Incremental disk backup system for a dynamically mapped data storage subsystem |
JPH04181423A (en) * | 1990-11-16 | 1992-06-29 | Fujitsu Ltd | Version control system |
US5214696A (en) * | 1992-02-04 | 1993-05-25 | International Business Machines Corporation | Data processing system and method to produce softcopy book readers which are limited to reading only books published by a specific publisher |
US5263154A (en) * | 1992-04-20 | 1993-11-16 | International Business Machines Corporation | Method and system for incremental time zero backup copying of data |
US5642496A (en) * | 1993-09-23 | 1997-06-24 | Kanfi; Arnon | Method of making a backup copy of a memory over a plurality of copying sessions |
US5446888A (en) * | 1994-01-14 | 1995-08-29 | Pyne; Charles F. | Remote file transfer method and apparatus |
US5668897A (en) * | 1994-03-15 | 1997-09-16 | Stolfo; Salvatore J. | Method and apparatus for imaging, image processing and data compression merge/purge techniques for document image databases |
US5544255A (en) * | 1994-08-31 | 1996-08-06 | Peripheral Vision Limited | Method and system for the capture, storage, transport and authentication of handwritten signatures |
JP2719761B2 (en) * | 1995-02-24 | 1998-02-25 | パイオニア株式会社 | Data storage device |
GB9506501D0 (en) * | 1995-03-30 | 1995-05-17 | Int Computers Ltd | Incremental disk backup |
US5628012A (en) * | 1995-05-16 | 1997-05-06 | Lucent Technologies Inc. | Method and apparatus for querying a database containing disjunctive information |
US5768582A (en) * | 1995-06-07 | 1998-06-16 | International Business Machines Corporation | Computer program product for domained incremental changes storage and retrieval |
JP3856855B2 (en) * | 1995-10-06 | 2006-12-13 | 三菱電機株式会社 | Differential backup method |
US5745906A (en) * | 1995-11-14 | 1998-04-28 | Deltatech Research, Inc. | Method and apparatus for merging delta streams to reconstruct a computer file |
US5893113A (en) * | 1996-04-25 | 1999-04-06 | Navigation Technologies Corporation | Update transactions and method and programming for use thereof for incrementally updating a geographic database |
US5909677A (en) * | 1996-06-18 | 1999-06-01 | Digital Equipment Corporation | Method for determining the resemblance of documents |
US5881292A (en) * | 1996-09-26 | 1999-03-09 | Microsoft Corporation | Dynamic versioning system for multiple users of multi-module software system |
US5802521A (en) * | 1996-10-07 | 1998-09-01 | Oracle Corporation | Method and apparatus for determining distinct cardinality dual hash bitmaps |
US5794254A (en) * | 1996-12-03 | 1998-08-11 | Fairbanks Systems Group | Incremental computer file backup using a two-step comparison of first two characters in the block and a signature with pre-stored character and signature sets |
1998
- 1998-02-10 AU application AU61515/98A (AU6151598A), status: Abandoned (not active)
- 1998-02-10 WO application PCT/US1998/002434 (WO1998035306A1), status: Application Filing (active)
- 1998-02-10 US application US09/021,705 (US6101507A), status: Expired - Lifetime (not active)
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4897785A (en) * | 1984-09-12 | 1990-01-30 | Bbc Brown, Boveri & Company, Limited | Search system for locating values in a table using address compare circuit |
US5479654A (en) * | 1990-04-26 | 1995-12-26 | Squibb Data Systems, Inc. | Apparatus and method for reconstructing a file from a difference signature and an original file |
US5428629A (en) * | 1990-11-01 | 1995-06-27 | Motorola, Inc. | Error check code recomputation method time independent of message length |
US5347652A (en) * | 1991-06-26 | 1994-09-13 | International Business Machines Corporation | Method and apparatus for saving and retrieving functional results |
Non-Patent Citations (2)
Title |
---|
GUTMAN M.: "A method for updating a cyclic redundancy code", IEEE Transactions on Communications, IEEE Service Center, Piscataway, NJ, USA, vol. 40, no. 6, 1 June 1992 (1992-06-01), pages 989-991, XP002912524, ISSN: 0090-6778, DOI: 10.1109/26.142787 * |
RAMABADRAN T. V., et al.: "A tutorial on CRC computations", IEEE Micro, IEEE Service Center, Los Alamitos, CA, US, vol. 8, no. 4, 1 August 1988 (1988-08-01), pages 62-75, XP002912523, ISSN: 0272-1732, DOI: 10.1109/40.7773 * |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
AU763524B2 (en) * | 1999-03-02 | 2003-07-24 | Flexera Software Llc | Data file synchronisation |
US6636872B1 (en) | 1999-03-02 | 2003-10-21 | Managesoft Corporation Limited | Data file synchronization |
US6829640B1 (en) * | 1999-12-23 | 2004-12-07 | International Business Machines Corporation | Method and system for creating a byte stream characteristic number suitable for change quantification |
CN102170455A (en) * | 2004-04-15 | 2011-08-31 | 微软公司 | Method and system for updating objects between a local device and a remote device |
EP1587007A3 (en) * | 2004-04-15 | 2007-04-18 | Microsoft Corporation | Efficient algorithm and protocol for remote differential compression |
US7555531B2 (en) | 2004-04-15 | 2009-06-30 | Microsoft Corporation | Efficient algorithm and protocol for remote differential compression |
CN1684464B (en) * | 2004-04-15 | 2011-07-27 | 微软公司 | Method and system for updating object between local device and remote device |
EP1587007A2 (en) | 2004-04-15 | 2005-10-19 | Microsoft Corporation | Efficient algorithm and protocol for remote differential compression |
US7613787B2 (en) | 2004-09-24 | 2009-11-03 | Microsoft Corporation | Efficient algorithm for finding candidate objects for remote differential compression |
US7849462B2 (en) | 2005-01-07 | 2010-12-07 | Microsoft Corporation | Image server |
US8073926B2 (en) | 2005-01-07 | 2011-12-06 | Microsoft Corporation | Virtual machine image server |
US11256710B2 (en) | 2016-10-20 | 2022-02-22 | Microsoft Technology Licensing, Llc | String transformation sub-program suggestion |
US11620304B2 (en) | 2016-10-20 | 2023-04-04 | Microsoft Technology Licensing, Llc | Example management for string transformation |
US20180121525A1 (en) * | 2016-10-28 | 2018-05-03 | Microsoft Technology Licensing, Llc | Record profiling for dataset sampling |
US10846298B2 (en) * | 2016-10-28 | 2020-11-24 | Microsoft Technology Licensing, Llc | Record profiling for dataset sampling |
Also Published As
Publication number | Publication date |
---|---|
AU6151598A (en) | 1998-08-26 |
US6101507A (en) | 2000-08-08 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US6101507A (en) | File comparison for data backup and file synchronization | |
US9128940B1 (en) | Method and apparatus for performing file-level restoration from a block-based backup file stored on a sequential storage device | |
US7134041B2 (en) | Systems and methods for data backup over a network | |
US7478113B1 (en) | Boundaries | |
US6665815B1 (en) | Physical incremental backup using snapshots | |
US7765160B2 (en) | System and method for backing up data | |
US9904601B2 (en) | Synchronization of storage using comparisons of fingerprints of blocks | |
US6073128A (en) | Method and apparatus for identifying files used to restore a file | |
US20200412525A1 (en) | Blockchain filesystem | |
US8028138B2 (en) | Replication of deduplicated storage system | |
US6049874A (en) | System and method for backing up computer files over a wide area computer network | |
EP1333375B1 (en) | Software patch generator | |
US7506010B2 (en) | Storing and retrieving computer data files using an encrypted network drive file system | |
US5479654A (en) | Apparatus and method for reconstructing a file from a difference signature and an original file | |
CN105009067B (en) | Managing operations on units of stored data | |
EP1962209A2 (en) | Systems and methods for searching and storage of data | |
US9009202B2 (en) | Garbage collection for merged collections | |
US20070220222A1 (en) | Methods and apparatus for modifying a backup data stream including logical partitions of data blocks to be provided to a fixed position delta reduction backup application | |
JPH1153240A (en) | Data backup device and method for computer, and computer-readable recording medium recurred with data backup program | |
US10380141B1 (en) | Fast incremental backup method and system | |
US20110069833A1 (en) | Efficient near-duplicate data identification and ordering via attribute weighting and learning | |
KR102275240B1 (en) | Managing operations on stored data units | |
US8126852B1 (en) | Merged collections | |
JP4768009B2 (en) | How to store less redundant data using a data cluster | |
KR101623508B1 (en) | System and Method for Recovery of Deleted Event Log Files |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | AK | Designated states | Kind code of ref document: A1. Designated state(s): AU CA JP NO NZ |
 | AL | Designated countries for regional patents | Kind code of ref document: A1. Designated state(s): AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE |
 | CFP | Corrected version of a pamphlet front page | Free format text: REVISED ABSTRACT RECEIVED BY THE INTERNATIONAL BUREAU AFTER COMPLETION OF THE TECHNICAL PREPARATIONS FOR INTERNATIONAL PUBLICATION |
 | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | |
 | NENP | Non-entry into the national phase | Ref country code: CA |
 | 122 | EP: PCT application non-entry in European phase | |