WO2001082078A2 - Method and apparatus for maintaining the integrity of configuration data in redundant, fault tolerant network appliances - Google Patents

Method and apparatus for maintaining the integrity of configuration data in redundant, fault tolerant network appliances

Info

Publication number: WO2001082078A2
Authority: WO - WIPO (PCT)
Prior art keywords: network, configuration data, configuration, scm, storage
Application number: PCT/US2001/012861
Other languages: French (fr)
Other versions: WO2001082078A3, WO2001082078A9
Inventor: Ben H. McMillan, Jr.
Original Assignee: Ciprico, Inc.
Application filed by Ciprico, Inc.
Priority to AU2001255523A1
Publication of WO2001082078A2
Publication of WO2001082078A3
Publication of WO2001082078A9

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 41/00 - Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L 41/02 - Standardisation; Integration
    • H04L 41/024 - Standardisation; Integration using relational databases for representation of network management data, e.g. managing via structured query language [SQL]
    • H04L 41/08 - Configuration management of networks or network elements
    • H04L 41/085 - Retrieval of network configuration; Tracking network configuration history
    • H04L 41/0853 - Retrieval of network configuration; Tracking network configuration history by actively collecting configuration information or by backing up configuration information
    • H04L 41/0866 - Checking the configuration
    • H04L 41/0869 - Validating the configuration within one network element
    • H04L 43/00 - Arrangements for monitoring or testing data switching networks
    • H04L 43/08 - Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L 43/0805 - Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters, by checking availability
    • H04L 43/0817 - Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters, by checking availability by checking functioning
    • H04L 43/0823 - Errors, e.g. transmission errors
    • H04L 69/00 - Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L 69/40 - Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass, for recovering from a failure of a protocol instance or entity, e.g. service redundancy protocols, protocol state redundancy or protocol service redirection

Abstract

A method and apparatus for maintaining the integrity of configuration data in redundant, fault tolerant network appliances. The apparatus comprises a pair of network appliances coupled to a network. The appliances interact with one another to detect a failure in one appliance and instantly transition operations from the failed appliance to a functional appliance. Each appliance monitors the status of the other appliance using multiple, redundant communication channels. The configuration data of the network appliances is distributed and stored in a redundant manner in a storage system that is accessible to the network appliances. The configuration data is updated on a regular basis while the network appliances are operational. Upon recovery of a previously failed network appliance, the appliance accesses the stored configuration data such that it may boot up and operate as if a failure never occurred.

Description

METHOD AND APPARATUS FOR MAINTAINING THE INTEGRITY OF CONFIGURATION DATA IN REDUNDANT, FAULT TOLERANT NETWORK APPLIANCES
BACKGROUND OF THE DISCLOSURE
1. Field of the Invention
The invention relates to network appliances and, more particularly, the invention relates to a method and apparatus for maintaining the integrity of configuration data in redundant, fault tolerant network appliances.
2. Description of the Background Art
Data processing and storage systems that are connected to a network to perform task specific operations are known as network appliances. Network appliances may include a general purpose computer that executes particular software to perform a specific network task, such as file server services, domain name services, data storage services, and the like. Because these network appliances have become important to the day-to-day operation of a network, the appliances are generally required to be fault-tolerant. Typically, fault tolerance is accomplished by using redundant appliances, such that, if one appliance becomes disabled, another appliance takes over its duties on the network. However, the process for transferring operations from one appliance to another leads to a loss of network information. For instance, if a pair of redundant data storage units are operating on a network and one unit fails, the second unit needs to immediately perform the duties of the failed unit. However, the delay in transitioning from one storage unit to another may cause a loss of some data. One factor in performing a rapid recovery of network appliances after a failure is to utilize identical configuration data in the recovered appliance as was used prior to the fault. Generally this requires booting the recovered network appliance using default configuration data and then altering the data after the recovered network appliance is operational. Such boot up using default data and then amending the data is time consuming and results in a slow recovery process.
Therefore, a need exists in the art for a method and apparatus for maintaining the integrity of the configuration data to facilitate rapid recovery of failed network appliances.
SUMMARY OF THE INVENTION
The disadvantages associated with the prior art are overcome by the present invention of a method and apparatus for maintaining the integrity of configuration data in redundant, fault tolerant network appliances. The apparatus comprises a pair of network appliances coupled to a network. The appliances interact with one another to detect a failure in one appliance and instantly transition operations from the failed appliance to a functional appliance. Each appliance monitors the status of the other appliance using multiple, redundant communication channels. The configuration data of the network appliances is distributed and stored in a redundant manner in a storage system that is accessible to all the network appliances. The configuration data is updated on a regular basis while the network appliances are operational. Upon recovery of a previously failed network appliance, the appliance accesses the stored configuration data such that it may boot up and operate as if a failure never occurred.
In one embodiment of the invention, the apparatus comprises a pair of storage controller modules (SCMs) that are coupled to a storage pool, i.e., one or more data storage arrays. The storage controller modules are coupled to a host network (or local area network (LAN)). The host network interconnects a plurality of client computers. Each SCM comprises a database management module that maintains a shared replicated configuration database wherein the configuration data for one or more SCMs is stored. The database is stored in a replicated manner within the storage pool. In addition to storing the configuration data in the storage pool, each SCM stores its own configuration data locally on a disk drive within the SCM.
BRIEF DESCRIPTION OF THE DRAWINGS
The teachings of the present invention can be readily understood by considering the following detailed description in conjunction with the accompanying drawings, in which:
FIG. 1 depicts a block diagram of one embodiment of the present invention; FIG. 2 depicts a functional block diagram of the status monitoring system of the pair of storage controller modules; and
FIG. 3 depicts a flow diagram of a configuration transaction process. To facilitate understanding, identical reference numerals have been used, where possible, to designate identical elements that are common to the figures.
DETAILED DESCRIPTION
One embodiment of the invention is a modular, high-performance, highly scalable, highly available, fault tolerant network appliance that is illustratively embodied in a data storage system that maintains a redundant configuration database to facilitate rapid failure recovery.
FIG. 1 depicts a data processing system 50 comprising a plurality of client computers 102, 104, and 106, a host network 130, and a storage system 100. The storage system 100 comprises a plurality of network appliances 108 and 110 and a storage pool 112. The plurality of clients comprise one or more of a network attached storage (NAS) client 102, a direct attached storage (DAS) client 104 and a storage area network (SAN) client 106. The plurality of network appliances 108 and 110 comprise a storage controller module A (SCM A) 108 and storage controller module B (SCM B) 110. The storage pool 112 is coupled to the storage controller modules 108, 110 via a fiber channel network 114. One embodiment of the storage pool 112 comprises a pair of storage arrays 116, 118 that are coupled to the fiber channel network 114 via a pair of fiber channel switches 124, 126 and a communications gateway 120, 122. A tape library 128 is also provided for storage backup. In storage system 100, the DAS client directly accesses the storage pool 112 via the fiber channel network 114, while the SAN client accesses the storage pool 112 via both the LAN 130 and the fiber channel network 114. For example, the SAN client 104 communicates via the LAN with the SCMs 108, 110 to request access to the storage pool 112. The SCMs inform the SAN client 104 where in the storage arrays the requested data is located or where the data from the SAN client is to be stored. The SAN client 104 then directly accesses a storage array using the location information provided by the SCMs. The NAS client 106 only communicates with the storage pool 112 via the SCMs 108, 110. Although a fiber channel network is depicted as one way of connecting the SCMs 108, 110 to the storage pool 112, the connection may be accomplished using any form of data network protocol such as SCSI, HIPPI and the like.
The storage system is a hierarchy of system components that are connected together within the framework established by the system architecture. The major active system level components are:
SCM - Storage Controller Module
SDM - Storage Device Module (Storage Pool)
UPS - Uninterruptible Power Supply
Fibre channel switches, hubs, routers and gateways
The system architecture provides an environment in which each of the storage components that comprise the storage system embodiment of the invention operate and interact to form a cohesive storage system.
The architecture is centered around a pair of SCMs 108 and 110 that provide storage management functions. The SCMs are connected to a host network that allows the network community to access the services offered by the SCMs 108, 110. Each SCM 108, 110 is connected to the same set of networks. This allows one SCM to provide the services of the other SCM in the event that one of the SCMs becomes faulty. Each SCM 108, 110 has access to the entire storage pool 112. The storage pool is logically divided by assigning a particular storage device (array 116 or 118) to one of the SCMs 108, 110. A storage device 116 or 118 is only assigned to one SCM 108 or 110 at a time. Since both SCMs 108, 110 are connected to the entirety of the storage pool 112, the storage devices 116, 118 assigned to a faulted SCM can be accessed by the remaining SCM to provide its services to the network community on behalf of the faulted SCM. The SCMs communicate with one another via the host networks. Since each SCM 108, 110 is connected to the same set of physical networks as the other, they are able to communicate with each other over these same links. These links allow the SCMs to exchange configuration information with each other and synchronize their operation.
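The device-ownership rule described above (each storage device is assigned to exactly one SCM at a time, and the surviving SCM takes over a faulted peer's devices) can be illustrated with a small sketch. This is an illustrative sketch only; the class, method, and device names below are not taken from the patent.

```python
# Illustrative sketch of the storage-pool ownership model described above:
# each storage device is assigned to exactly one SCM at a time, and when an
# SCM faults, the surviving SCM takes over its devices. Names are hypothetical.

class StoragePool:
    def __init__(self):
        # device name -> owning SCM name
        self.assignments = {"array_116": "SCM_A", "array_118": "SCM_B"}

    def devices_owned_by(self, scm):
        return [dev for dev, owner in self.assignments.items() if owner == scm]

    def fail_over(self, faulted_scm, surviving_scm):
        """Reassign every device owned by the faulted SCM to the survivor."""
        for dev in self.devices_owned_by(faulted_scm):
            self.assignments[dev] = surviving_scm

pool = StoragePool()
pool.fail_over("SCM_A", "SCM_B")
assert pool.devices_owned_by("SCM_B") == ["array_116", "array_118"]
```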
The host network 130 is the medium through which the storage system communicates with the clients 104 and 106. The SCMs 108, 110 provide network services such as NFS and HTTP to the clients 104, 106 that reside on the host network 130. The host network 130 runs network protocols through which the various services are offered. These may include TCP/IP, UDP/IP, ARP, SNMP, NFS, CIFS, HTTP, NDMP, and the like.
From an SCM point of view, its front-end interfaces are network ports running file protocols. The back-end interface of each SCM provides channel ports running raw block access protocols.
The SCMs 108, 110 accept network requests from the various clients and process them according to the command issued. The main function of the SCM is to act as a network-attached storage (NAS) device. It therefore communicates with the clients using file protocols such as NFSv2, NFSv3, SMB/CIFS, and HTTP. The SCM converts these file protocol requests into logical block requests suitable for use by a direct-attach storage device.
The storage array on the back-end is a direct-attach disk array controller with RAID and caching technologies. The storage array accepts the logical block requests issued to a logical volume set and converts them into a set of member disk requests suitable for a disk drive.
The redundant SCMs are both connected to the same set of networks. This allows either of the SCMs to respond to the IP address of the other SCM in the event of failure of one of the SCMs. The SCMs support 10BaseT, 100BaseT, and gigabit Ethernet. The SCMs can communicate with each other through a dedicated inter-SCM network 132 as a primary means of inter-SCM communications. This dedicated connection can employ gigabit Ethernet or fiber channel. In the event of the failure of this link 132, the host network 130 may be used as a backup network. The SCMs 108, 110 connect to the storage arrays 116, 118 through parallel differential SCSI (not shown) or a fiber channel network 114. Each SCM 108, 110 may be connected through its own private SCSI connection to one of the ports on the storage array.
The storage arrays 116, 118 provide a high availability mechanism for RAID management. Each of the storage arrays provides a logical volume view of the storage to a respective SCM. The SCM does not have to perform any volume management.
Each SCM 108, 110 comprises a local memory device 150, 154 that contains configuration data 152 and 156 (referred to herein as the local configuration data). This configuration data is also stored in a replicated configuration database 162 containing configuration data 158 and 160. The shared replicated database 162 is considered the primary source of configuration data for either SCM 108, 110. Upon boot up, an SCM, by executing a configuration transaction control module (CTCM) 250, ascertains whether the local configuration data is correct. If the data is incorrect (i.e., it is not the latest version or contains corrupt information), the configuration data is retrieved from the database 162.
The UPS 134 provides a temporary secondary source of AC power in the event the primary source fails.
This allows time for the storage arrays 116, 118 to flush the write-back cache and for the SCMs 108, 110 to perform an orderly shutdown of network services. The UPS is monitored by the SCMs through the serial port or over the host networks using SNMP.

FIG. 2 depicts an embodiment of the invention having the SCMs 108, 110 coupled to the storage arrays 116, 118 via SCSI connections 200. Each storage array 116, 118 comprises an array controller 202, 204 coupled to a disk enclosure 206, 208. The array controllers 202, 204 support RAID techniques to facilitate redundant, fault tolerant storage of data. The SCMs 108, 110 are connected both to the host network 130 and to the array controllers 202, 204. Note that every host network interface card (NIC) 210 connection on one SCM is duplicated on the other. This allows an SCM to assume the IP address of the other on every network in the event of an SCM failure. One of the NICs 212 in each SCM 108, 110 is dedicated for communications between the two SCMs. On the target channel side of the SCM, each SCM 108, 110 is connected to an array controller 202, 204 through its own host SCSI port 214. All volumes in each of the storage arrays 202, 204 are dual-ported through SCSI ports 216 so that access to any volume is available to both SCMs 108, 110.
The SCM 108, 110 is based on a general purpose computer (PC) such as a ProLiant 1850R manufactured by COMPAQ Computer Corporation. This product is a Pentium PC platform mounted in a 3U 19" rack-mount enclosure. The SCM comprises a plurality of network interface cards (NICs) 210, 212, a central processing unit (CPU) 218, a memory unit 220, support circuits 222 and SCSI ports 214. Communication amongst the SCM components is supported by a PCI bus 224. The SCM employs, as a support circuit 222, dual hot-pluggable power supplies with separate AC power connections and contains three fans (one fan resides in each of the two power supplies). The SCM is, for example, based on the Pentium III architecture running at 600 MHz and beyond. The PC has 4 horizontal mount 32-bit 33 MHz PCI slots. As part of the memory (MEM) unit 220, the PC comes equipped with 128 MB of 100 MHz SDRAM standard and is upgradable to 1 GB. A Symbios 53c8xx series chipset resides on the 1850R motherboard that can be used to access the boot drive. The SCM boots off the internal hard drive (also part of the memory unit 220). The internal drive is, for example, a SCSI drive and provides at least 1 GB of storage. The internal boot device must be able to hold the SCSI executable image, a mountable file system with all the configuration files, HTML documentation, and the storage administration application. This information may consume anywhere from 20 to 50 MB of disk space.
In a redundant SCM configuration, the SCMs 108, 110 are identically equipped in at least the external interfaces and the connections to external storage. The memory configuration should also be identical. Temporary differences in configuration can be tolerated provided that the SCM with the greater number of external interfaces is not configured to use them. This exception is permitted since it allows the user to upgrade the storage system without having to shut down the system. As mentioned previously, one network port is designated as the dedicated inter-SCM network. Only SCMs and UPSs are allowed on this network 132. The storage device module (storage pool 112) is an enclosure containing the storage arrays 116 and 118 and provides an environment in which they operate.
One example of a disk array 116, 118 that can be used with the embodiment of the present invention is the Synchronix 2000 manufactured by ECCS, Inc. of Tinton Falls, New Jersey. The Synchronix 2000 provides disk storage, volume management and RAID capability. These functions may also be provided by the SCM through the use of custom PCI I/O cards. Depending on the I/O card configuration, multiple Synchronix 2000 units can be employed in this storage system. In one illustrative implementation of the invention, each of the storage arrays 116, 118 uses 4 PCI slots in a 1 host/3 target configuration, so 6 SCSI target channels 228 are available, allowing six disk array units, each with thirty 50 GB disk drives. As such, the 180 drives provide 9 TB of total storage. Each storage array 116, 118 can utilize RAID techniques through a RAID processor 226 such that data redundancy and disk drive fault tolerance are achieved.
A detailed disclosure of the redundant, fault tolerant system described briefly above is more fully described in U.S. patent application number , filed simultaneously herewith, which is incorporated herein by reference.
FIG. 2 further depicts a block diagram of the configuration database 162 and an associated CTCM 250 that is responsible for maintaining the integrity of the configuration of the storage system. The CTCM 250 is executed on each of the SCMs 108, 110. The module's main responsibility is to maintain the shared replicated configuration database 162 that is stored in part on the two SCMs 108, 110 and the disk arrays 206, 208. The module 250 is not responsible for maintaining the actual contents of the configuration files, but acts as a transaction wrapper around which configuration changes take place. Only the SCM running in MASTER mode is allowed to perform a configuration transaction. The SLAVE mode SCM must run under the direct supervision of the MASTER SCM. Only the MASTER can dictate to the SLAVE when its own timestamp files can be updated. This database module must be invoked any time one of the SCM configuration files is updated. The configuration information is distributed across the SCM memories and the shared disk arrays. Specifically, the configuration information is stored both on the SCM's internal hard drive (referred to as the local configuration database) in its usable format and on the private extent of a selected group of logical storage devices (referred to as the shared replicated configuration database). The shared replicated configuration database is considered the primary source of configuration information. The system is able to ascertain, on powerup, whether the local configuration information is correct. If it is not the latest version or if it is corrupt, the SCM can retrieve a copy of the latest configuration from the shared replicated configuration database and correct the corruption. This is implemented by performing a restore operation of a valid tar archive found on a common storage volume.
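The recovery path just described (local configuration found stale or corrupt, then restored from a tar archive held in the shared replicated configuration database) might look roughly like the following sketch. The paths, the location of the local timestamp file, and the version comparison are assumptions made for illustration; the patent does not specify an implementation.

```python
# Hedged sketch of boot-time recovery of the local configuration database from
# the shared replicated configuration database, restored as a tar archive.
# Paths, version source, and validation are assumptions for illustration only.
import tarfile
from pathlib import Path

LOCAL_CONFIG_DIR = Path("/etc/scm")              # hypothetical local configuration database
SHARED_ARCHIVE = Path("/mnt/shared/config.tar")  # hypothetical copy of the shared replicated archive

def local_version(config_dir):
    """Read the VERSION recorded in the local timestamp file (assumed location)."""
    try:
        text = (config_dir / "timestamp").read_text()
        for line in text.splitlines():
            if line.startswith("VERSION="):
                return int(line.split("=", 1)[1])
    except (OSError, ValueError):
        return None
    return None

def restore_from_shared(archive_path, config_dir):
    """Restore the local configuration by unpacking the shared tar archive."""
    with tarfile.open(archive_path) as tar:
        tar.extractall(path=config_dir)

def ensure_local_config(shared_version):
    ver = local_version(LOCAL_CONFIG_DIR)
    if ver is None or ver < shared_version:      # stale or corrupt local copy
        restore_from_shared(SHARED_ARCHIVE, LOCAL_CONFIG_DIR)
```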
Simply stated, a configuration transaction is composed of many smaller component transactions. A component transaction is begun when a starting timestamp file is written onto the medium to be updated. This transaction is completed when the ending timestamp file has been updated. Therefore, a component transaction is identified as valid by comparing the starting and ending timestamps. If they are identical, the component transaction is complete (referred to as a valid component transaction) and can therefore be used to determine the consistency of the entire configuration transaction. If the two timestamps differ, then the component transaction never completed and is therefore referred to as an invalid component transaction. A component transaction is applied to SCMs or to logical storage devices. By examining the set of all valid component transactions, this module can extract a consistent configuration regardless of whether the original configuration transaction completed or not. More specifically, FIG. 3 depicts a flow diagram of a configuration transaction process 300 performed by the CTCM, which consists of the following steps:
Step 302. Create a starting timestamp file on the MASTER SCM with a new version number and the current date and time.
Step 304. If running in DAC mode, copy the starting timestamp file to the SLAVE SCM.
Step 306. Perform the local configuration change (this updates the local configuration database of both the MASTER and SLAVE SCMs).
Step 308. Create an ending timestamp file on the MASTER SCM.
Step 310. If running in DAC mode, copy the ending timestamp file to the SLAVE SCM.
Step 312. Read the configuration files and deposit them in a temporary tar archive.
Step 314. For each logical storage device, perform the following steps:
Step 316. Write the new Starting Timestamp.
Step 318. If this logical storage device is to store a copy of the configuration information, the Archive Sanity Block is written with a valid Archive Sanity Block and the Tar Archive segment is updated with the tar archived configuration files; otherwise, the Archive Sanity Block is destroyed.
Step 320. The Ending Timestamp is updated.
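A skeleton of this transaction wrapper, following steps 302 through 320, is sketched below. The MASTER, SLAVE, and device objects and their methods (write_starting_timestamp, write_tar_archive, and so on) are hypothetical; only the ordering of operations is taken from the flow described above.

```python
# Sketch of the configuration transaction (steps 302-320). All helper names
# (write_starting_timestamp, make_tar_archive, etc.) are hypothetical; the
# master, slave, and device objects are assumed to expose these operations.
import datetime

def run_configuration_transaction(master, slave, devices, apply_change, dac_mode=True):
    stamp = {
        "VERSION": master.version + 1,
        "TIMEDATE": datetime.datetime.now().strftime("%Y/%m/%d %H/%M/%S"),
    }

    master.write_starting_timestamp(stamp)          # step 302
    if dac_mode:
        slave.write_starting_timestamp(stamp)       # step 304
    apply_change(master, slave)                     # step 306: local configuration change
    master.write_ending_timestamp(stamp)            # step 308
    if dac_mode:
        slave.write_ending_timestamp(stamp)         # step 310

    archive = master.make_tar_archive()             # step 312: tar up the configuration files

    for dev in devices:                             # step 314
        dev.write_starting_timestamp(stamp)         # step 316
        if dev.stores_configuration:                # step 318
            dev.write_archive_sanity_block(valid=True)
            dev.write_tar_archive(archive)
        else:
            dev.destroy_archive_sanity_block()
        dev.write_ending_timestamp(stamp)           # step 320
```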
Notice that this algorithm checks for dual active configuration (DAC) mode explicitly when copying files between the MASTER and SLAVE SCMs. In step 306, the system administration module, or whichever other module updated the configuration, is responsible for updating the local configuration database on the SLAVE SCM. If the MASTER SCM faults during a configuration transaction, the shared replicated configuration database is left in an inconsistent state. The configuration information can either be rolled forward or backward depending on how far through the transaction the SCM was at the time of the fault. The CTCM must be able to recover the configuration information and place the configuration database back into a consistent state regardless of whether the MASTER SCM faulted and the SLAVE SCM took over or the entire storage system simultaneously lost AC power. As a rule of thumb, the CTCM will require at least two valid component transactions, on logical storage devices, in order to roll the transaction forward; otherwise, the transaction is rolled backward. If the entire storage system crashes due to some catastrophic error or environmental condition (e.g., AC power and the UPS both fail within a period of time such that power is not available to the storage system), the configuration information must be brought into a consistent state the next time the storage system is powered on. If the configuration transaction reached a point where at least two logical storage devices completed a component transaction, the configuration transaction can be rolled forward and the transaction completed. This can occur regardless of whether the MASTER SCM was operating at the time the original configuration transaction took place. If a rollback operation needs to be performed, the local configuration stored on the SCM may be inconsistent and will need to be recovered from the shared replicated configuration database.
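The "at least two valid component transactions" rule of thumb might be expressed as in the following sketch, where a component transaction is judged valid when its starting and ending timestamps are identical, as defined earlier. The device interface is assumed.

```python
# Sketch of the roll-forward / rollback decision described above. A component
# transaction on a device is valid when its starting and ending timestamps are
# identical; at least two valid component transactions on logical storage
# devices are required to roll the configuration transaction forward.

def component_transaction_valid(device):
    start = device.read_starting_timestamp()
    end = device.read_ending_timestamp()
    return start is not None and start == end

def recovery_action(devices):
    valid = sum(1 for dev in devices if component_transaction_valid(dev))
    return "roll_forward" if valid >= 2 else "roll_back"
```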
The SLAVE SCM is capable of rolling an uncompleted configuration transaction forward or backward in the event that the MASTER SCM faults during a configuration transaction. The SLAVE SCM performs a transition from SLAVE mode to MASTER mode. During this transition, the software controller notifies the CTCM that an operational mode change is in effect. The CTCM examines the local and shared replicated configuration databases at this time to determine their state. If the configuration database is in an inconsistent state, the CTCM performs a rollback or rollforward operation to bring it back to a consistent state. After this SLAVE to MASTER mode transition has completed, the software controller performs a transition from DAC to DEGRADED mode.

Another case to consider is what happens if drives from another storage system are added to the current storage system while the unit is powered down. The CTCM is capable of distinguishing between this case and the case of an incomplete transaction. In the case of an incomplete transaction, the difference in the version of the configuration data (VERSION) is simply off by 1. In the case of logical storage devices from another system being added, the VERSION will probably be off by some number other than 1 and will probably have a different SYSTEMNAME. This and other scenarios are addressed during configuration initialization.
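A minimal sketch of how the CTCM might distinguish these cases from the SYSTEMNAME and VERSION fields is shown below; the classification labels and any decision thresholds beyond those stated above are assumptions.

```python
# Sketch of how the CTCM might distinguish an interrupted transaction from
# drives imported from another storage system, based on the SYSTEMNAME and
# VERSION fields described in the text. The result labels are assumed.

def classify_mismatch(scm_name, scm_version, device_name, device_version):
    diff = abs(device_version - scm_version)
    if device_name == scm_name and diff == 1:
        return "incomplete_transaction"            # roll forward or backward
    if device_name != scm_name and diff != 1:
        return "foreign_storage_system"            # drives likely from another system
    return "resolve_at_configuration_initialization"  # e.g. a system name change in progress
```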
If the timestamp files residing on the internal hard drive of a SCM do not match, this indicates that the SCM crashed during a configuration transaction. This further indicates that the current configuration of the local SCM may be corrupted and will therefore need to be restored.
If a system name change occurs, the starting timestamp is created using the new system name. If the SCM crashed during this configuration transaction, the starting timestamp contains the new system name whereas the ending timestamp will contain the previous system name. This allows the system to know that a system name change was in effect instead of thinking that some components of the storage system belong to another storage system. In addition, the starting timestamp will have a sequence number one greater than in the ending timestamp.
The system name change applies to logical storage devices as well. In addition, a logical storage device configuration update may be consistent with respect to any one device but inconsistent when looking at the entire pool of storage devices. If the system name does not match between sets of storage devices in the pool, they must differ in sequence number by 1 if they are to be considered part of the same storage unit. If this difference is greater, they are considered to belong to two different storage systems.
If there are two groups of logical storage devices that have different system names and vary in sequence number by more than one, compare their system names against the system name of the SCM. If the SCM matches one of the groups, use that group for the new configuration. If not, apply majority rule and pick the group with the larger number of elements. Any time a configuration transaction occurs, the sequence number is incremented by one and the date/time stamp is set to the date and time at which the transaction was initiated.
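A minimal sketch of the selection rule just described, assuming each group of logical storage devices has already been summarized by its system name, sequence number and member count; the dev_group type and pick_group function are illustrative names only.

#include <stdio.h>
#include <string.h>

/* One group of logical storage devices that agree on a configuration (hypothetical). */
typedef struct {
    char   system_name[64];
    long   sequence;      /* sequence number recorded on the devices */
    size_t member_count;  /* number of devices in the group */
} dev_group;

/* Prefer the group whose system name matches the SCM; otherwise fall
 * back to majority rule and take the larger group. */
static const dev_group *pick_group(const dev_group *a, const dev_group *b,
                                   const char *scm_system_name)
{
    if (strcmp(a->system_name, scm_system_name) == 0) return a;
    if (strcmp(b->system_name, scm_system_name) == 0) return b;
    return a->member_count >= b->member_count ? a : b;
}

int main(void)
{
    dev_group g1 = { "System 1", 1207, 5 };
    dev_group g2 = { "System 7", 418,  3 };
    printf("using configuration from %s\n",
           pick_group(&g1, &g2, "System 1")->system_name);
    return 0;
}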
If the SCM determines that the configuration information stored on the logical storage devices does not belong to itself, it must enter MONITOR mode. This means that the system name information recorded on the logical storage device does not match the system name recorded on the SCM. Only the MASTER SCM is allowed to modify the contents of the configuration database.
One requirement of the initialization process is that the SCM must wait for all configured logical storage devices to boot before continuing. The timestamp is used to determine the time and date of the last configuration transaction as well as who was responsible for it. The timestamp is a set of text lines, formatted as a list of 2-tuples (keyword, value), one per line. Each line is terminated with a CR LF combination. An equal sign is used to delimit the two fields. The keywords appear in a predefined order. This order is depicted in the following example:
THISFILE=SYSTEM CONFIGURATION TIMESTAMP
MAGIC=15238847
SYSTEMNAME=System 1
TIMEDATE=yyyy/mm/dd hh/mm/ss
VERSION=1234
WHO=/admin/group
MAGIC=15238847
The first line, the THISFILE keyword, identifies this timestamp as a "System Configuration Timestamp". The second line contains the MAGIC keyword, whose magic number indicates that this is a configuration transaction timestamp. The magic number of a System Configuration Timestamp file is 15238847. The SYSTEMNAME keyword identifies which storage system this configuration timestamp belongs to. This allows a disk to be prequalified if it already contains a configuration timestamp but belongs to a different storage system, in which case its vote does not apply. The TIMEDATE keyword indicates the time and date, to the second, at which the last configuration transaction was started. The VERSION keyword is an integer that is incremented each time a configuration transaction takes place. It is used to keep track of how many times the configuration has changed since the SCM went into service, so as to determine which of a series of component transactions is the latest. The WHO keyword indicates what was being updated at the time. This can be a software module name or the name of a file that was being updated; it identifies which file was changing at the time of the last update in case the update failed. This object is terminated by another instance of the MAGIC keyword, which must match the first instance. For implementation reasons, it is recommended that the timestamp not exceed 512 bytes. The timestamp can be stored in a file or in a raw 512-byte disk block.
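To illustrate the format described above, the following sketch renders a configuration timestamp into a single 512-byte raw block, with the keywords in the prescribed order, CR LF line termination, and the unused tail of the block zero-filled. The helper name, its signature, and the CONFIG_DISKBLKSZ constant are assumptions made for the example and are not taken from the disclosure.

#include <stdio.h>
#include <string.h>
#include <time.h>

#define CONFIG_DISKBLKSZ 512  /* assumed raw disk block size */

/* Render a System Configuration Timestamp into one raw disk block. */
static int format_timestamp(char block[CONFIG_DISKBLKSZ],
                            const char *system_name, long version,
                            const char *who)
{
    char text[CONFIG_DISKBLKSZ];
    time_t now = time(NULL);
    struct tm *t = localtime(&now);
    int n = snprintf(text, sizeof text,
                     "THISFILE=SYSTEM CONFIGURATION TIMESTAMP\r\n"
                     "MAGIC=15238847\r\n"
                     "SYSTEMNAME=%s\r\n"
                     "TIMEDATE=%04d/%02d/%02d %02d/%02d/%02d\r\n"
                     "VERSION=%ld\r\n"
                     "WHO=%s\r\n"
                     "MAGIC=15238847\r\n",
                     system_name,
                     t->tm_year + 1900, t->tm_mon + 1, t->tm_mday,
                     t->tm_hour, t->tm_min, t->tm_sec,
                     version, who);
    if (n < 0 || n >= CONFIG_DISKBLKSZ)
        return -1;                          /* would exceed the 512-byte limit */

    memset(block, 0x00, CONFIG_DISKBLKSZ);  /* padding bytes must be 0x00 */
    memcpy(block, text, (size_t)n);
    return 0;
}

int main(void)
{
    char block[CONFIG_DISKBLKSZ];
    if (format_timestamp(block, "System 1", 1234, "/admin/group") == 0)
        fputs(block, stdout);
    return 0;
}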
The CTCM will use the private extent on the logical storage device to store the timestamp and the tar archive of the configuration files. This allows a SCM to get the latest configuration information in case its local configuration information is out of date.
In one embodiment of the invention, the data to be stored in the private extent is laid out as follows:
Block Addresses        Use
0 through (n - 4)      Tar Archive
(n - 3)                Archive Sanity Header
(n - 2)                Starting Timestamp
(n - 1)                Ending Timestamp
The two fields, Tar Archive and Archive Sanity Header, are optional. Not all logical storage devices need to have configuration information stored on them. This improves the performance of the system by limiting the extent of the replication. The logical storage devices that are accessible through the first LUN in each active SCSI ID will be used to store the configuration information. The remaining logical storage devices will contain timestamp information only. The Tar Archive contains the tar archive of all the local configuration files and databases. The Archive Sanity Header indicates whether the tar archive is valid or invalid. The Archive Sanity Header has the following format:
typedef struct ctcm_ArchiveSanityHeader_s {
    int MagicNumber;     /* identifies this block as a valid Archive Sanity Header */
    scm_OpMode Owner;    /* which SCM (MASTER or SLAVE) created the archive */
    char Padding[CONFIG_DISKBLKSZ - (sizeof(int) + sizeof(scm_OpMode))];  /* pad to a full disk block */
} ctcm_ArchiveSanityHeader_t;
The MagicNumber field identifies this header as a valid Archive Sanity Header. The Owner field indicates which SCM created this archive. It is set to a value of SCM_OPMODE_MASTER if the configured MASTER wrote it and is set to a value of SCM_OPMODE_SLAVE if the configured SLAVE wrote it. The remaining field pads out the structure to the end of the disk block.
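A minimal validation check for this header might look as follows. The magic value CTCM_ARCHIVE_MAGIC and the scm_OpMode encoding are placeholders, since the disclosure does not specify them; only the structure layout and the MASTER/SLAVE owner values come from the description above.

#include <stdio.h>

#define CONFIG_DISKBLKSZ 512           /* assumed raw disk block size */
#define CTCM_ARCHIVE_MAGIC 0x41435348  /* placeholder magic value */

typedef enum { SCM_OPMODE_MASTER, SCM_OPMODE_SLAVE } scm_OpMode;  /* assumed encoding */

typedef struct ctcm_ArchiveSanityHeader_s {
    int MagicNumber;
    scm_OpMode Owner;
    char Padding[CONFIG_DISKBLKSZ - (sizeof(int) + sizeof(scm_OpMode))];
} ctcm_ArchiveSanityHeader_t;

/* Return nonzero if the header read from block (n - 3) marks a valid tar archive. */
static int ctcm_archive_is_valid(const ctcm_ArchiveSanityHeader_t *hdr)
{
    return hdr->MagicNumber == CTCM_ARCHIVE_MAGIC &&
           (hdr->Owner == SCM_OPMODE_MASTER || hdr->Owner == SCM_OPMODE_SLAVE);
}

int main(void)
{
    ctcm_ArchiveSanityHeader_t hdr = { CTCM_ARCHIVE_MAGIC, SCM_OPMODE_MASTER, { 0 } };
    printf("archive valid: %d\n", ctcm_archive_is_valid(&hdr));
    return 0;
}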
The Starting Timestamp and Ending Timestamp blocks contain an identical copy of the configuration timestamp presented in the previous section. All extra padding bytes beyond the timestamp to the end of the disk block must be set to a value of 0x00.
The information in this private extent is deliberately ordered in this manner to improve the performance of the component transaction.
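For illustration, the block addresses implied by the layout shown in the table above can be computed as follows. The extent_layout structure and layout_for function are hypothetical, and the sketch assumes the private extent consists of n contiguous blocks with n no smaller than 4.

#include <stdio.h>

/* Block addresses of the fields within an n-block private extent (hypothetical names). */
typedef struct {
    unsigned long tar_archive_first;   /* blocks 0 .. n-4: tar archive     */
    unsigned long tar_archive_last;
    unsigned long sanity_header;       /* block n-3: archive sanity header */
    unsigned long starting_timestamp;  /* block n-2: starting timestamp    */
    unsigned long ending_timestamp;    /* block n-1: ending timestamp      */
} extent_layout;

static extent_layout layout_for(unsigned long n_blocks)  /* requires n_blocks >= 4 */
{
    extent_layout l;
    l.tar_archive_first  = 0;
    l.tar_archive_last   = n_blocks - 4;
    l.sanity_header      = n_blocks - 3;
    l.starting_timestamp = n_blocks - 2;
    l.ending_timestamp   = n_blocks - 1;
    return l;
}

int main(void)
{
    extent_layout l = layout_for(1024);
    printf("sanity header at block %lu, ending timestamp at block %lu\n",
           l.sanity_header, l.ending_timestamp);
    return 0;
}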
The concept of taking the local configuration files and creating a tar archive is not entirely straightforward. The problem is that both the MASTER and SLAVE must be able to read and write this Tar Archive and convert it into the local configuration files, depending on whether the reader is the MASTER or the SLAVE. To give an example of the problem, consider that when the MASTER SCM backs up /admin/vfstab, this file cannot simply be restored onto the SLAVE SCM as is. The file has to be moved to /remoteadmin/vfstab, and the /remoteadmin/vfstab file must be moved to /admin/vfstab. Also to be considered is that if the MASTER SCM faults and the SLAVE SCM is running in DEGRADED mode, the SLAVE SCM needs to store the Tar Archive such that the MASTER SCM can read it. An additional problem arises when the SCM is operating in DEGRADED mode, in which case some of the configuration files are actually a composite of the information from both SCMs. This occurs in the /admin/vfstab, /admin/dktab, and /admin/sharetab files.
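The /admin and /remoteadmin swap described above can be pictured as a simple path-remapping step performed while extracting the archive. The sketch below is illustrative only, uses hypothetical names, and ignores the DEGRADED-mode composite files just mentioned.

#include <stdio.h>
#include <string.h>

/* Map an archived configuration path to the path it should be restored to.
 * When the archive was written by the peer SCM, /admin and /remoteadmin
 * swap roles; otherwise the path is restored unchanged. */
static const char *restore_path(const char *archived_path, int from_peer,
                                char *buf, size_t buflen)
{
    if (!from_peer)
        return archived_path;

    if (strncmp(archived_path, "/admin/", 7) == 0)
        snprintf(buf, buflen, "/remoteadmin/%s", archived_path + 7);
    else if (strncmp(archived_path, "/remoteadmin/", 13) == 0)
        snprintf(buf, buflen, "/admin/%s", archived_path + 13);
    else
        return archived_path;
    return buf;
}

int main(void)
{
    char buf[256];
    puts(restore_path("/admin/vfstab", 1, buf, sizeof buf));  /* /remoteadmin/vfstab */
    return 0;
}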
Although various embodiments which incorporate the teachings of the present invention have been shown and described in detail herein, those skilled in the art can readily devise many other varied embodiments that still incorporate these teachings.

Claims

What is claimed is:
1. Apparatus for monitoring the status of multiple devices comprising: a plurality of network appliances; a storage array coupled to said plurality of network appliances; and a distributed configuration database for storing, in a distributed manner, configuration information within said plurality of network appliances and said storage array.
2. The apparatus of claim 1 wherein said network appliances are storage controller modules.
3. The apparatus of claim 1 wherein each of said network appliances comprises a local memory for storing local configuration data.
4. The apparatus of claim 3 wherein said local memory is a disk drive.
5. The apparatus of claim 3 wherein each of said network appliances comprises a database module.
6. The apparatus of claim 5 wherein said database module determines whether said local configuration data is a latest version or corrupt.
7. The apparatus of claim 1 wherein said plurality of network appliances comprise a first network appliance and a second network appliance, where said first network appliance operates in a master mode and said second network appliance operates in a slave mode.
8. The apparatus of claim 7 wherein said first network appliance is capable of altering configuration data in said configuration database, while said second network appliance is not capable of altering configuration data in said configuration database.
9. Apparatus for maintaining the integrity of configuration data in a redundant, fault tolerant system comprising: a first network appliance; a second network appliance; a storage array coupled to said first and second network appliances; and means for updating and making consistent said configuration data that is stored in said first network appliance, second network appliance and said storage array.
10. The apparatus of claim 9 wherein said updating means is a database module.
11. The apparatus of claim 9 further comprising: means for monitoring a timestamp file to determine a validity of said configuration data.
12. A method of ensuring integrity of configuration data in a redundant, fault tolerant system comprising: creating a starting timestamp file; copying the starting timestamp file to a slave network appliance; altering locally stored configuration data; creating an ending timestamp file; copying the altered locally stored configuration data to said slave network appliance; and storing said altered locally stored configuration data on a storage device.
13. The method of claim 12 wherein said storing step further comprises: reading a configuration file containing the altered locally stored configuration data and depositing the file into a tar archive; for each storage device in a plurality of storage devices: writing a starting timestamp; updating a tar archive segment or destroying said archive sanity block; and updating an ending timestamp.
PCT/US2001/012861 2000-04-20 2001-04-20 Method and apparatus for maintaining the integrity of configuration data in redundant, fault tolerant network appliances WO2001082078A2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2001255523A AU2001255523A1 (en) 2000-04-20 2001-04-20 Method and apparatus for maintaining the integrity of configuration data in redundant, fault tolerant network appliances

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US55277600A 2000-04-20 2000-04-20
US09/552,776 2000-04-20

Publications (3)

Publication Number Publication Date
WO2001082078A2 true WO2001082078A2 (en) 2001-11-01
WO2001082078A3 WO2001082078A3 (en) 2002-07-18
WO2001082078A9 WO2001082078A9 (en) 2002-10-17

Family

ID=24206762

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2001/012861 WO2001082078A2 (en) 2000-04-20 2001-04-20 Method and apparatus for maintaining the integrity of configuration data in redundant, fault tolerant network appliances

Country Status (2)

Country Link
AU (1) AU2001255523A1 (en)
WO (1) WO2001082078A2 (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5430866A (en) * 1990-05-11 1995-07-04 International Business Machines Corporation Method and apparatus for deriving mirrored unit state when re-initializing a system
US5617530A (en) * 1991-01-04 1997-04-01 Emc Corporation Storage device array architecture with copyback cache
US5590276A (en) * 1992-01-08 1996-12-31 Emc Corporation Method for synchronizing reserved areas in a redundant storage array
EP0632379A2 (en) * 1993-06-04 1995-01-04 Digital Equipment Corporation Fault tolerant storage controller utilizing tightly coupled dual controller modules
US5696895A (en) * 1995-05-19 1997-12-09 Compaq Computer Corporation Fault tolerant multiple network servers
WO1999017201A1 (en) * 1997-09-30 1999-04-08 Tandem Computers Incorporated A fault tolerant method of maintaining and distributing configuration information in a distributed processing system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
DEKONING R ET AL: "DUAL ACTIVE REDUNDANT CONTROLLERS: THE HIGH ROAD TO PERFORMANCE ANDAVAILABILITY" COMPUTER TECHNOLOGY REVIEW, WESTWORLD PRODUCTION CO. LOS ANGELES, US, vol. 15, no. 3, 1 March 1995 (1995-03-01), page 44,46 XP000498651 ISSN: 0278-9647 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1830515A1 (en) * 2004-12-06 2007-09-05 Huawei Technologies Co., Ltd. A method for transferring the network management configuration information between the element management systems
EP1830515A4 (en) * 2004-12-06 2008-02-27 Huawei Tech Co Ltd A method for transferring the network management configuration information between the element management systems
EP2256990A1 (en) * 2004-12-06 2010-12-01 Huawei Technologies Co., Ltd. A method for transferring the network management configuration information between the element management systems
CN100413256C (en) * 2005-12-22 2008-08-20 中山大学 Equipment failure restoring device and method for digital household network
GB2531404A (en) * 2014-08-14 2016-04-20 Zodiac Aero Electric Electrical distribution system for an aircraft and corresponding control method
GB2531404B (en) * 2014-08-14 2021-02-03 Zodiac Aero Electric Electrical distribution system for an aircraft and corresponding control method
US10958724B2 (en) 2014-08-14 2021-03-23 Zodiac Aero Electric Electrical distribution system for an aircraft and corresponding control method
CN109828868A (en) * 2019-01-04 2019-05-31 新华三技术有限公司成都分公司 Date storage method, device, management equipment and dual-active data-storage system
CN109828868B (en) * 2019-01-04 2023-02-03 新华三技术有限公司成都分公司 Data storage method, device, management equipment and double-active data storage system

Also Published As

Publication number Publication date
AU2001255523A1 (en) 2001-11-07
WO2001082078A3 (en) 2002-07-18
WO2001082078A9 (en) 2002-10-17

Similar Documents

Publication Publication Date Title
US11567674B2 (en) Low overhead resynchronization snapshot creation and utilization
JP4400913B2 (en) Disk array device
US7203732B2 (en) Flexible remote data mirroring
US9679039B1 (en) Continuous protection of data and storage management configuration
US7143307B1 (en) Remote disaster recovery and data migration using virtual appliance migration
US7203796B1 (en) Method and apparatus for synchronous data mirroring
JP4945047B2 (en) Flexible remote data mirroring
US20020069324A1 (en) Scalable storage architecture
US20150113095A1 (en) Flexible remote data mirroring
US20050188248A1 (en) Scalable storage architecture
JP2007086972A (en) Storage system, duplex control method, and program
EP2883147A1 (en) Synchronous local and cross-site failover in clustered storage systems
AU2001265335A1 (en) Flexible remote data mirroring
EP1687721B1 (en) Computer cluster, computer unit and method to control storage access between computer units
US20030200389A1 (en) System and method of cache management for storage controllers
WO2001082078A2 (en) Method and apparatus for maintaining the integrity of configuration data in redundant, fault tolerant network appliances
WO2001082080A9 (en) Network appliance

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
AK Designated states

Kind code of ref document: A3

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A3

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

AK Designated states

Kind code of ref document: C2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: C2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

COP Corrected version of pamphlet

Free format text: PAGES 1/3-3/3, DRAWINGS, REPLACED BY NEW PAGES 1/3-3/3; DUE TO LATE TRANSMITTAL BY THE RECEIVING OFFICE

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP