US20010002480A1 - Method and apparatus for providing centralized intelligent cache between multiple data controlling elements - Google Patents

Method and apparatus for providing centralized intelligent cache between multiple data controlling elements

Info

Publication number
US20010002480A1
Authority
US
United States
Prior art keywords
cache
data
controllers
controller
central
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US08/941,770
Other versions
US6381674B2
Inventor
Rodney A. DeKoning
Bret S. Weber
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NetApp Inc
Original Assignee
LSI Logic Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by LSI Logic Corp
Priority to US08/941,770
Assigned to SYMBIOS, INC. (Assignors: DeKoning, Rodney A.; Weber, Bret S.)
Priority to AU96733/98A
Priority to PCT/US1998/020423
Assigned to LSI LOGIC CORPORATION (Assignor: SYMBIOS, INC.)
Publication of US20010002480A1
Application granted
Publication of US6381674B2
Assigned to NETAPP, INC. (Assignor: LSI LOGIC CORPORATION)
Anticipated expiration
Current legal status: Expired - Lifetime

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/1666 Error detection or correction of the data by redundancy in hardware where the redundant component is memory or memory area
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0866 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches for peripheral storage systems, e.g. disk cache
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0626 Reducing size or complexity of storage systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0655 Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F3/0658 Controller construction arrangements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0683 Plurality of storage devices
    • G06F3/0689 Disk arrays, e.g. RAID, JBOD
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements

Definitions

  • This invention relates generally to caching within a data storage subsystem and in particular to controller element(s) used as intelligent central cache apparatus within multiple redundant controller data storage subsystems.
  • RAID storage subsystems typically utilize a control module that shields the user or host system from the details of managing the redundant array.
  • the controller makes the subsystem appear to the host computer as a single, highly reliable, high capacity disk drive.
  • the RAID controller may distribute the host computer system supplied data across a plurality of the small independent drives with redundancy and error checking information so as to improve subsystem reliability.
  • a portion of data is distributed across a plurality of data disk drives and associated redundancy information is added on an additional drive (often referred to as a parity drive when XOR parity is used for the redundancy information).
  • the related data so distributed across a plurality of drives is often referred to as a stripe.
  • the “write” operation involves both a write of the data to the data disk and an adjustment of the parity information.
  • the parity information adjustment may involve reading other data in the same stripe and writing the newly computed parity for the blocks of the stripe. This imposes a large “write penalty” on RAID systems (RAID levels 3-6), often making them slower than traditional disk systems for typical write I/O operations.
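  • As an illustration only (not part of the patent text), one standard way to adjust XOR parity for a single-block write is the read-modify-write update sketched below in C, where new_parity = old_parity XOR old_data XOR new_data; all names here are illustrative:
        /* Read-modify-write parity update for one block of an XOR-parity stripe.
         * The extra I/Os (read old data, read old parity, write new data,
         * write new parity) are the source of the RAID "write penalty". */
        #include <stddef.h>

        static void xor_into(unsigned char *dst, const unsigned char *src, size_t n)
        {
            for (size_t i = 0; i < n; i++)
                dst[i] ^= src[i];
        }

        void rmw_parity_update(unsigned char *parity,         /* old parity, updated in place    */
                               const unsigned char *old_data, /* block contents before the write */
                               const unsigned char *new_data, /* block contents being written    */
                               size_t block_size)
        {
            xor_into(parity, old_data, block_size); /* remove the old data's contribution */
            xor_into(parity, new_data, block_size); /* add the new data's contribution    */
        }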
  • Known RAID subsystems provide cache memory structures to further improve the performance of the RAID subsystem write operations.
  • the cache memory is associated with the control module such that the storage blocks on the disk array are mapped to blocks in the cache. This mapping is also transparent to the host system. The host system simply requests blocks of data to be read or written and the RAID controller manipulates the disk array and cache memory as required.
  • the controllers gain the advantage of being able to simultaneously handle multiple read and write requests directed to the same volume of data storage.
  • since the control modules may access the same data, they must communicate with one another to assure that their cache memories remain synchronized. Other communications among the cooperating controllers coordinate concurrent access to the common resources. Semaphore locking and related multi-tasking techniques are often utilized for this purpose.
  • the control modules therefore communicate among themselves to maintain synchronization of their respective, independent cache memories. Since many cache operations require the controllers to generate these synchronization signals and messages or semaphore locking and releasing messages, the amount of traffic (also referred to as coordination traffic or cache coordination traffic) generated can be substantial.
  • This coordination traffic imposes a continuing penalty upon the operation of the data storage subsystem by utilizing valuable bandwidth on the interconnection bus as well as processing overhead within the multiple control modules. If not for this overhead imposed by coordination traffic, the data storage subsystem would have more bandwidth and processing power available for I/O processing and would thus operate faster.
  • each control module has its own independent cache memory (also referred to herein as decentralized cache)
  • in a decentralized cache architecture there is significant duplication of the circuits and memory that comprise the cache memory on each control module. This duplication increases the complexity (and therefore the cost of manufacture) of the individual control modules.
  • a decentralized cache architecture subsystem is scaled up by addition of control modules, each with its own duplicated cache memory circuits. This added complexity (and associated costs) therefore makes simple scaling of performance problematic.
  • the present invention solves the above and other problems, and thereby advances the useful arts, by providing an intelligent central cache shared among a plurality of storage controllers in a storage subsystem.
  • An intelligent central cache is a cache cooperatively engaged with the control modules (storage controllers) to provide caching within the storage subsystem.
  • Various functions are performed within the intelligent central cache including storage, generation, and maintenance of cache meta-data, stripe lock functions to enable coordinated sharing of the central cache features, and functions to coordinate cache flush operations among the plurality of attached control modules.
  • a “dumb” (unintelligent) cache, though it may be a centralized resource, is one used merely as a memory bank, typically for myriad purposes within the data storage subsystem.
  • the intelligent cache of the present invention shares with the attached controllers much of the control logic and processing for determining, for example, when, whether, and how to cache data and meta-data in the cache memory.
  • Cache meta-data includes information regarding the type of data stored in the cache including indications that corresponding data is clean or dirty, current or old data, and redundancy (e.g., RAID parity) data or user related data.
  • the intelligent central cache of the present invention generates, stores, and utilizes cache meta-data for making such determinations relating to the operation of the central cache independently of and/or cooperatively with the storage controllers of the subsystem. Furthermore, the intelligent central cache of the present invention coordinates the management of non-volatility in the cache memory by coordinating with the control modules the monitoring of battery backup status, etc.
  • the features of the central cache are made accessible to the plurality of controllers through an application program interface (API) via inter-process communication techniques.
  • the control modules may request, via an API function, that information be inserted or deleted from the cache. Attributes are provided by the requesting controller to identify the type of data to be inserted (e.g., clean or dirty, new or old, user data or parity, etc.). Other API functions are used to request that the central controller read or return identified data to a requesting controller. Attribute data may also be so retrieved. API functions of the intelligent central cache also assist the controllers in performing cache flush operations (such as required in write-back cache management operations). An API function requests of the central cache a map identifying the status of data blocks in particular identified stripes.
  • the requesting control module may then use this map information to determine which data blocks in the identified stripes are to be flushed to disk.
  • Other API functions allow the central cache to perform cache flush operations independent of requests from the attached control modules.
  • Still other API functions provide the low level stripe lock (semaphore management) functions required to coordinate the shared access by control modules to the central cache. Details of exemplary API operations are discussed below.
  • the preferred embodiment of the present invention includes a plurality of control modules interconnected by redundant serial communication media such as redundant Fibre Channel Arbitrated Loops (“FC-AL”).
  • the disk array control modules share access to an intelligent central cache memory (also referred to herein as a caching controller or cache control module).
  • the caching controller is cooperatively engaged with the control modules in the data storage subsystem (also referred to herein as controllers or as host adapters to indicate their primary function within the storage subsystem) to provide intelligent management of the cache.
  • the controllers access the caching controller to perform required caching operations relating to an I/O request processed within the controller.
  • This centralized cache architecture obviates the need to exchange substantial volumes of information between control modules to maintain consistency between their individual caches and to coordinate their shared access to common storage elements, as is taught by co-pending U.S. patent application Ser. No. 08/772,614. Eliminating coordination traffic within the storage subsystem frees the processing power of the several controllers for use in processing of I/O requests. Further, the reduced bandwidth utilization of the interconnecting bus (e.g., FC-AL) allows the previously consumed bandwidth to be used for data storage purposes other than mere overhead communication.
  • the caching controller is a modification of an ordinary control module (host adapter) in the subsystem.
  • the caching controller is simply populated with significant cache memory as compared to the other controllers (host adapters) which are substantially depopulated of cache memory.
  • One skilled in the art will recognize that a limited amount of memory on each host adapter may be used for staging or buffering in communication with the central cache.
  • a multi-tiered cache structure may utilize a small cache on each controller but the large cache is centralized in accordance with the present invention.
  • controllers of the present invention are therefore simplified as compared to those of prior decentralized cache designs wherein each controller has local cache memory. Additional controllers may be added to the subsystem of the present invention to thereby increase I/O processing capability without the added complexity (cost) of duplicative cache memory.
  • the central cache controller of the present invention may be easily scaled to meet the needs of a particular application.
  • an additional cache controller is added in the preferred embodiment to provide redundancy for the centralized cache of the subsystem.
  • the redundant cache controllers communicate via a separate communication link (e.g., an FC-AL link) to maintain mirrored cache synchronization.
  • additional cache controllers may be added to the subsystem of the present invention for purposes of enlarging the central cache capacity. The additional cache controllers cooperate and communicate via the separate communication link isolated to the cache controllers.
  • a first cache controller may perform cache operations for a first segment of the cache (mapped to a particular portion of the disk array) while other cache controllers process other segments of the cache (mapped to other portions of the disk array).
  • Mirrored cache controllers may be added to the subsystem associated with each of the segment cache controllers.
  • It is still another object of the present invention to improve performance in a data storage subsystem having a plurality of storage controllers by providing an intelligent central cache accessible to the plurality of storage controllers.
  • FIG. 1A is a block diagram of a prior art data storage subsystem;
  • FIG. 1B is a block diagram of a prior art data storage subsystem having only generalized system memory and non-centralized data storage controller memory;
  • FIG. 2 is a block diagram of a first embodiment of the present invention, showing an intelligent central cache accessible by multiple controllers;
  • FIG. 3 is a block diagram of a prior art Fibre Channel Loop Architecture data storage subsystem having redundant controllers;
  • FIG. 4 is a block diagram of a preferred embodiment of the present invention, showing a plurality of controllers and caching controllers interconnected by a FC-AL with a plurality of data storage elements;
  • FIG. 5 is a flowchart illustrating the operation of the data storage controllers of the preferred embodiment in performing a host requested write operation;
  • FIG. 6 is a flowchart illustrating the operation of the data storage controllers of the preferred embodiment in performing a host requested read operation;
  • FIG. 7 is a flowchart illustrating the operation of the data storage controllers of the preferred embodiment in performing a cache flush operation;
  • FIG. 8 is a flowchart illustrating the operation of the caching controllers in conjunction with the data storage controllers of the preferred embodiment to perform a cache read operation;
  • FIG. 9 is a flowchart illustrating the operation of the caching controllers in conjunction with the data storage controllers of the preferred embodiment to perform a cache insert operation;
  • FIG. 10 is a flowchart illustrating the operation of the caching controllers in conjunction with the data storage controllers of the preferred embodiment to perform a cache flush operation;
  • FIG. 11 is a flowchart illustrating the operation of the caching controllers in conjunction with the data storage controllers of the preferred embodiment to perform an operation to retrieve a map of status information regarding stripes for flushing by a data storage controller.
  • FIG. 1A is a block diagram of data storage subsystem 102 as known in the prior art having a decentralized cache architecture.
  • the system has a plurality of storage controllers 104 (also referred to as control modules). Each control module 104 has its own local cache memory 106 . Controllers 104 are connected via communication medium 108 to data storage elements 110 . In normal operation, controllers 104 receive I/O requests and process the requests reading or writing as appropriate from or to data storage elements 110 . Each controller 104 utilizes its local cache memory 106 to speed response of common I/O requests.
  • each control module accesses a distinct portion of the storage elements.
  • the control modules do not share simultaneous access to any portion of the storage elements.
  • the control modules are operable independent of one another.
  • each control module of such a plurality of controllers is responsible for one or more logical units (LUNs) of the storage array. No other controller has the ability to simultaneously access those LUNs. Though a redundant or mirrored controller may be present, it is not simultaneously operable to access the LUNs managed by the first control module.
  • the plurality of control modules may simultaneously access common portions of the storage elements to thereby enhance the subsystem performance.
  • the plurality of controllers exchange messages amongst themselves to coordinate the shared access to the storage elements.
  • a plurality of RAID controllers in the subsystem may simultaneously access common LUNs.
  • Each controller may be operating on a separate I/O request associated with the shared LUN.
  • the controllers exchange messages with one another to coordinate the shared access to the common storage elements.
  • cache synchronization messages are required to assure that all controllers which share access to a common portion of the storage elements are aware of the cache contents of other controllers which manipulate the shared storage elements. For example, if one controller completes an I/O operation which results in updates to its local cache memory, it must inform all other controllers of the cache update so that all caches maintain synchronization with respect to cached data not yet flushed to disk. Similarly, when one of the control modules sharing access to a common portion of the storage elements determines that the cached data must be flushed to the storage elements, it must notify the other controllers associated with the shared storage elements to assure that all are aware of the updated state of the storage elements.
  • This coordination message exchange (coordination traffic) imposes significant overhead processing on the control modules (host adapters) and consumes valuable bandwidth on the communication medium interconnecting the subsystem components and thus impairs system performance.
  • FIG. 1B is a block diagram of another prior art data storage subsystem 150 exemplifying the use of a “dumb” central cache by a plurality of controllers.
  • This configuration may represent, for example, a device commonly referred to as a network file server.
  • a network file server is often a general purpose computer system with special software dedicated to the provision of file system services to an attached network of host systems (clients).
  • Such a system has a variety of processors operating on bus 166 and using the same memory, general host memory 160 (the dumb central cache).
  • network controller 152, local host controller 154, file controller 156, storage controller 162, and potentially other controllers 158 all share access to host memory 160 via bus 166.
  • Each controller performs a unique function within the subsystem 150 .
  • network controller 152 manages network connections between the storage subsystem and external host systems
  • file controller 156 manages file system operations within the subsystem 150 to perform file operations requested by external host systems
  • storage controller 162 translates I/O requests generated by, for example, file controller 156 into appropriate lower level signals appropriate to the storage element 164 and its connection bus 168 (e.g., SCSI, IDE, EIDE, etc.).
  • Local host processor 154 guides and coordinates the overall operation of the controllers of subsystem 150 .
  • All the controllers share access to the host memory 160 via bus 166 .
  • the uses of host memory 160 may vary widely.
  • Network controller 152 may use the storage space for network protocol management while file controller 156 may use the storage space for file system management functions.
  • All processors and controllers may use the host memory for initial loading of their operation programs if not also for runtime fetch and execution of those programs.
  • host memory 160 is exemplary of a dumb memory bank used for myriad purposes within the storage subsystem 150 (e.g., a RAMdisk or solid state disk as known in the art). It is not dedicated to the cache storage of data and meta-data relating to I/O requests from attached host systems.
  • Typical systems with an architecture as depicted in FIG. 1B add local cache memory to controllers in the subsystem which require specialized, dedicated caching operations.
  • file controller 156, network controller 152, and storage controller 162 may each have local cache memory used for their specific functions.
  • the central cache (host memory 160 ) provides no specialized functionality for any of the myriad controllers sharing access to it. Rather, it is no more than a “dumb” memory bank in which various controllers may store information for any purpose.
  • FIG. 3 is a block diagram exemplifying another storage subsystem architecture known in the art.
  • Each storage control module 304 includes a local cache memory 306 used exclusively by its corresponding control module 304 .
  • Controllers 304 are connected via redundant FC-AL loops 308 and 310 to data storage elements 312 .
  • data storage elements 312 are disk arrays.
  • Control modules 304 are disk array control modules having RAID management capabilities. Each control module 304 maintains a decentralized cache 306 to aid it in rapid performance of I/O operations. In order to maintain cache synchronization, disk array control modules 304 must continuously signal back and forth to each other. In addition, each disk array control module 304 must carry out all RAID operations individually: configuration of LUNs, calculation of parity data, RAID management of failed devices, etc. As noted above with respect to FIG. 1A, coordination traffic on FC-AL loops 308 and 310 uses valuable processing power of the controllers 304 as well as communication bandwidth which could otherwise be used for performing I/O requests initiated by attached host systems.
  • All prior storage subsystems exemplified by FIGS. 1A, 1B and 3 share certain common problems. As noted above, when a plurality of controllers within such subsystems share access to common storage elements, a large volume of cache coordination message traffic is generated on the interconnection medium thereby reducing available processing power and communication bandwidth for processing of I/O requests between the controllers and the storage elements. In addition, the prior storage subsystems are not easily scaled up for performance enhancement. Since each controller may include a local cache for boosting its individual performance, the incremental cost of adding another controller is increased. Each controller has the added complexity of potentially large cache memory devices and associated glue and custom assist logic circuits (such as RAID parity assist circuits).
  • storage subsystems of the present invention include an intelligent centralized cache (also referred to as a cache controller) which is shared by all controllers in the storage subsystem. Since the cache controller of the present invention is a centralized resource, each controller sharing its function may be simplified by eliminating its local cache memory. Such a simplified controller reduces the incremental cost associated with adding a controller to the subsystem to enhance overall performance.
  • the central cache of the present invention is intelligent in that it includes circuits dedicated to enhancing its specific purpose of caching data destined for storage elements.
  • the intelligent central cache of the present invention preferably includes parity assist (generation and checking) circuits to aid in rapidly performing required parity operations. Centralizing such intelligent assist circuits further reduces the cost and complexity of the RAID controllers in the storage subsystem.
  • the centralized cache of the present invention obviates the need found in the prior art for extensive cache coordination message traffic (such as cache and stripe lock message traffic).
  • the central cache preferably maintains control over the cache on behalf of all controllers in the subsystem.
  • when a redundant (mirrored) or additional cache controller is added to the subsystem, a dedicated communication path is available for the exclusive purpose of inter-cache controller synchronization communication. No bandwidth on the common controller communication medium is required to assure mirrored cache synchronization.
  • a simpler (e.g., lower cost) embodiment may utilize the existing communication paths to avoid the cost of an additional dedicated communication path. Such an embodiment would sacrifice some performance enhancements of the present invention but at a cost and complexity savings.
  • the intelligent central cache of the present invention provides semaphore services for resource locking (stripe locking) to coordinate common access to the disk array by the plurality of control modules.
  • the intelligent cache controller of the present invention also provides cache mirroring features when additional cache controllers are added to the subsystem. As discussed below, multiple cache controllers coordinate their intelligent cache management functions in accordance with the present invention through a separate communication channel. The primary communication channel interconnecting the control modules, the cache controllers, and the storage elements remains unburdened by the requisite coordination traffic for mirrored cache operation. Additional cache modules may also operate in a cooperative manner rather than a mirrored architecture wherein each controller is responsible for cache operations associated with a particular portion of the storage elements' total capacity.
  • FIG. 2 is a block diagram of a first embodiment of data storage subsystem 202 operable in accordance with the methods and structures of the present invention.
  • Controllers 204 access intelligent central cache 206 via communications medium 208 .
  • Controllers 204 and central cache 206 both access storage elements 210 via communication medium 208 .
  • Communication medium 208 may be any of several well known buses used for interconnection of electronic devices including, for example, SCSI, IDE, EIDE, IPI.
  • communication medium 208 may represent any of several serial communication media such as FC-AL or SSA as depicted in FIG. 4 and as discussed below.
  • Intelligent central cache 206 is dedicated to data and meta-data caching in data storage subsystem 202 as distinct from controllers 204 which primarily serve to interface with attached host computer systems.
  • the intelligent central cache 206 eliminates the need for coordination traffic among controllers having local caches thereby freeing processing power within, and communication bandwidth between, controllers 204 thereby improving overall performance of data storage subsystem 202 .
  • Intelligent central cache 206 cooperates with controllers 204 to manage the storage subsystem 202 structure and organization of information on the storage elements 210 . For example, where storage subsystem 202 uses RAID storage management techniques, many RAID management functions specific to the cache are performed within intelligent central cache 206 .
  • RAID cache management functions including parity generation and checking and logical to physical mapping of host request supplied addresses to locations on the array of disk drive storage elements 210 may be performed entirely within the intelligent central cache 206 .
  • the management of the RAID disk array geometry may therefore be off-loaded from the RAID controllers 204 to the intelligent central cache 206 .
  • customized circuits to assist in RAID parity generation and checking can be integrated within intelligent central cache 206 .
  • intelligent central cache 206 maintains cache data and associated cache meta-data. Generation and maintenance of cache meta-data in a decentralized cache architecture requires significant processing within, and communication among, a plurality of controllers sharing access to common storage elements.
  • the intelligent central cache 206 of the present invention centralizes this management function to reduce processing overhead load on the controllers 204 and to reduce communication (coordination traffic) among the controllers 204 .
  • Intelligent central cache 206 can also calculate cache statistical information. Using the cache statistical information, controllers 204 can tune their respective performance in view of statistical data corresponding to the cache usage in the overall subsystem 202 .
  • intelligent central cache 206 is designed as an electronic circuit board substantially identical to that of controller 204 but populated differently at time of manufacture to distinguish their respective function.
  • controller 204 may be depopulated of any RAID parity assist circuits and depopulated of substantially all cache memory and related support circuits.
  • Intelligent central cache 206 is preferably populated with parity assist devices and with a large cache memory for caching of data supplied from controllers 204 and related meta-data generated by the cache management functions operable within intelligent central cache 206 .
  • when a controller 204 prepares units of data to be cached in preparation for future posting to the storage elements 210, it simply transmits the data and a cache request over bus 208 to the intelligent central cache 206.
  • the intelligent central cache 206 places the received data in its cache memory along with any generated meta-data used to manage the cache memory contents.
  • the meta-data may be generated within central cache 206 such as noted above with respect to RAID parity assist or may be supplied by controller 204 as parameters when the data is supplied to central cache 206 .
  • the data supplied to central cache 206 is provided with addresses indicative of the desired disk location of the data on storage elements 210 .
  • central cache 206 determines which other data either in its cache memory or on the storage elements 210 are required for updating associated redundancy information.
  • the meta-data therefore indicates data that is new (currently unposted to the storage elements 210 ) versus old (presently posted to the storage elements 210 and also resident in central cache 206 ).
  • Other meta-data distinguishes parity/redundancy information from data in central cache 206
  • This central cache architecture improves overall subsystem performance by eliminating cache coordination message traffic over bus 208, thereby reducing overhead processing within the controllers 204 and freeing bandwidth on bus 208.
  • Controllers 204 are therefore simpler than prior controllers exemplified as discussed above.
  • the simpler controllers are substantially void of any local cache memory and parity assist circuits.
  • the primary function served by the simpler controller is to provide an interface to attached host systems consistent with the storage management structure (e.g., RAID) of the subsystem.
  • This simpler design permits easier scaling of the subsystem's performance by reducing the costs (complexity) associated with adding additional controllers.
  • additional intelligent central cache devices may be added either to increase the cache size and/or to provide mirrored redundancy of the central cache contents.
  • the plurality of central cache devices communicate among themselves over a dedicated communication medium.
  • FIG. 4 is a block diagram of a preferred embodiment of the present invention representing the best presently known mode of practicing the invention.
  • FIG. 4 shows the data storage subsystem 402 having caching controller 406 dedicated to serving as an intelligent central cache memory and a second caching controller 408 dedicated to serving as a mirror of caching controller 406 .
  • Embodied in each cache controller 406 and 408 is cache memory 410 .
  • Cache controller 408 maintains in its local cache memory 410 a mirrored image of the content of cache memory 410 in cache controller 406 .
  • caching controller 408 is not limited to the role of mirroring cache controller 406 as in the preferred embodiment.
  • Caching controller 408 may also function as an additional intelligent central cache to provide enhanced cache capacity.
  • a first cache controller 406 (with its local cache memory 410 ) may provide intelligent caching services to RAID controllers 404 for a first half of the storage elements 416 while the additional cache controller 408 (with its local cache memory 410 ) provides caching services for the second half of the storage elements 416 .
  • the preferred embodiment interconnects controllers 404 , caching controllers 406 and 408 , and data storage elements 416 via redundant FC-AL media 412 and 414 .
  • caching controller 406 and caching controller 408 have an additional dedicated FC-AL loop 418 which allows communication between them. Coordination traffic between the caching controllers 406 and 408 thus does not utilize any bandwidth on FC-AL loops 412 and 414 , thereby enabling the desired performance increase in data storage subsystem 402 .
  • controllers 404 are substantially identical electronic assemblies to that of cache controllers 406 and 408 but have been largely depopulated of their cache memory and associated circuits.
  • the cache memory function is provided centrally by caching controllers 406 and 408 . Because the caching function is centralized, overhead processing by RAID controllers 404 and communication on FC-AL 412 and 414 relating to cache synchronization is reduced or eliminated to thereby enhance subsystem performance.
  • Data storage elements 416 are preferably disk arrays. Controllers 404 and the caching controllers 406 and 408 cooperate to “map” the host-supplied address to the physical address of the storage elements. The tasks involved in this “mapping” or “translation” are one important part of the RAID management of the disk array 416 . Specifically, controllers 404 receive I/O requests from a host system (not shown) and translate those requests into the proper addressing format used by the caching controllers 406 and 408 . The data supplied in the host requests is mapped into appropriate parameters corresponding to the API operations described below. The cache controllers 406 and 408 then perform the logical to physical mapping required to store the data in cache memory 410 and to later retrieve the data for posting to the storage element 416 .
  • a variety of methods for mapping host supplied request addresses into locations in the central cache may be recognized by those skilled in the art. Each such method may suggest a different distribution of the RAID management between the controllers 404 and the caching controllers 406 and 408.
  • the mapping process which determines how stripes are mapped across a plurality of disk drives may be distributed between the control modules and the central cache.
  • the control modules may be solely responsible for mapping host addresses to RAID level 2 - 5 stripe locations and geometries (i.e., the central cache provides a linear address space for the control modules to access).
  • the central cache may possess exclusive knowledge of the mapping to RAID stripe geometries and distribution of data over the disk array.
  • the parameters supplied to the API functions of the central cache describe the addresses as known to the central cache.
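  • As an illustration only of one such mapping (the patent does not mandate a particular geometry), the following C sketch computes the stripe, drive, and per-drive block for a host logical block address under a rotating-parity RAID-5 style layout; the layout choice and all names are assumptions:
        /* Map a host logical block address to a physical location in an
         * N-drive rotating-parity array.  One common layout among several. */
        struct phys_loc {
            long stripe;          /* stripe (row) number            */
            int  drive;           /* data drive holding the block   */
            long block_on_drive;  /* block offset within that drive */
        };

        struct phys_loc map_lba(long lba, int n_drives, long blocks_per_chunk)
        {
            struct phys_loc p;
            long chunk        = lba / blocks_per_chunk;      /* logical chunk index          */
            long data_per_row = n_drives - 1;                /* one chunk per row is parity  */
            p.stripe          = chunk / data_per_row;
            int parity_drive  = (int)(p.stripe % n_drives);  /* parity rotates across drives */
            int data_index    = (int)(chunk % data_per_row); /* position among data chunks   */
            p.drive           = (data_index >= parity_drive) ? data_index + 1 : data_index;
            p.block_on_drive  = p.stripe * blocks_per_chunk + (lba % blocks_per_chunk);
            return p;
        }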
  • cache controllers 406 and 408 are preferably responsible for RAID management tasks such as parity generation and checking for data supplied from the RAID controllers 404 .
  • the redundancy information, and other cache meta-data, generated and stored within cache memory 410 of cache controllers 406 and 408 is used to assist RAID controllers 404 in their RAID management of storage elements 416 .
  • RAID controllers 404 operable in a cache write-back mode may request the return of all dirty data along with associated redundancy information for posting (flushing) to storage elements 416 .
  • cache controllers 406 and 408 determine which data in their cache is marked as dirty, determine what other data may be related to the dirty data (i.e., other data associated with the same stripe), and generate or retrieve associated redundancy information for return with the dirty data to the requesting RAID controller 404.
  • Cache controller 406 and 408 may, for example, read other related blocks of data from storage elements 416 and/or read old parity data from storage elements 416 in order to generate updated redundancy information.
  • Central cache controllers 406 and 408 therefore retain all information necessary to associate cache blocks with particular stripes of the disk array.
  • the cache meta-data identifies new data (dirty data yet unposted to the storage elements) versus old data (already posted to the storage elements 210 ).
  • central cache controllers 406 and 408 also provide a centralized control point for semaphore allocation, lock, and release to coordinate stripe locking.
  • Stripe locking as taught in co-pending U.S. patent application Ser. No. 08/772,614, enables a plurality of controllers (e.g., 404 ) to share and coordinate access to commonly attached storage elements (e.g., shared access to one or more RAID LUNs).
  • central cache controllers 406 and 408 free computational processing power within controllers 404 and free communication bandwidth on FC-AL 412 and 414. The freed processing power and communication bandwidth are then available for improved processing of host generated I/O requests.
  • cache controllers 406 and 408 may operate in a mirrored operation mode. Cache mirroring operations and communications are also off-loaded from controllers 404 . Rather, cache controllers 406 and 408 communicate directly with one another via a dedicated communication path 418 . Still further as noted above, in the preferred embodiment of FIG. 4, caching controllers 406 and 408 preferably provide centralized cache statistical information such as write over-writes or cache hit rate to controllers 404 (or to a host system not shown). Controllers 404 can use this centralized cache statistical information to tune the performance of the data storage subsystem 402 in view of subsystem wide cache efficiency.
  • the centralized cache of the present invention presents its features to the commonly attached controllers via an API. These API features are then accessed by the controllers using well known inter-process communication techniques applied to a shared communication path. As noted above with respect to FIGS. 2 and 4, the shared communication path may utilize any of several communication media and topologies.
  • a BLOCKLIST is a variable length list of entries each of which describes a particular range of logical blocks in a logical unit (LUN) which are relevant to the central cache operation requested.
  • a STRIPELIST is a variable length list of entries each of which describes a particular range of RAID stripes in a LUN which are relevant to the central cache operation requested.
  • Each BLOCKLIST entry contains substantially the following fields:
        long   LUN       // the logical unit identifier for the desired blocks
        long   st_block  // logical block number of the first block of interest
        long   n_block   // number of contiguous blocks of interest
        parm_t params    // attributes and parameters of the identified blocks
  • Each STRIPELIST entry contains substantially the following fields:
        long   LUN       // the logical unit identifier for the desired stripes
        long   st_stripe // logical stripe number of the first stripe of interest
        long   n_stripe  // number of contiguous stripes of interest
        parm_t params    // clean/dirty, new/old, etc. attributes of data
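  • Purely for illustration, the two entry layouts above can be rendered as C declarations as follows; the patent does not define parm_t or the list container, so the bit-flag typedef and the count-plus-array wrapper are assumptions:
        /* Illustrative C rendering of the BLOCKLIST and STRIPELIST entries. */
        typedef unsigned long parm_t;   /* assumed: attribute bit flags (see the attributes listed below) */

        typedef struct {
            long   LUN;        /* logical unit identifier for the desired blocks        */
            long   st_block;   /* logical block number of the first block of interest   */
            long   n_block;    /* number of contiguous blocks of interest               */
            parm_t params;     /* attributes and parameters of the identified blocks    */
        } BLOCKLIST_ENTRY;

        typedef struct {
            long   LUN;        /* logical unit identifier for the desired stripes       */
            long   st_stripe;  /* logical stripe number of the first stripe of interest */
            long   n_stripe;   /* number of contiguous stripes of interest              */
            parm_t params;     /* clean/dirty, new/old, etc. attributes of data         */
        } STRIPELIST_ENTRY;

        /* A BLOCKLIST or STRIPELIST is a variable length list of such entries;
         * one possible container is a count followed by the entries. */
        typedef struct {
            long             n_entries;
            BLOCKLIST_ENTRY  entry[];   /* flexible array member (C99) */
        } BLOCKLIST;

        typedef struct {
            long             n_entries;
            STRIPELIST_ENTRY entry[];
        } STRIPELIST;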
  • where an API function exchanges data along with its BLOCKLIST or STRIPELIST parameters, the associated block data is transferred over the communication medium following the API request. For example, blocks to be added to the cache are preceded by an appropriate API request to the central cache, then the actual contents of those blocks are transferred to the central cache.
  • data requested from the central cache is returned from the central cache to the requesting controller following execution of the API function within the central cache.
  • Communication protocols and media appropriate to control such multi-point communications are well known in the art.
  • those skilled in the art will readily recognize a variety of error conditions and appropriate recovery techniques therefor. Error status indications are exchanged between the central cache and the controllers as appropriate for the particular API function.
  • Exemplary API functions include:
  • cache_insert(BLOCKLIST blist) //inserts blocks in blist to central cache
  • the specified list of blocks is inserted in the central cache with the parameters and attributes as specified in each block's BLOCKLIST entry. As noted, the actual data to be inserted in the specified blocks of the central cache is transferred following the transfer of the API request.
  • the specified parameters and attributes include:
  • the associated block is either a NEW block in a stripe or an OLD block.
  • the associated block is either a DATA portion of a stripe or a PARITY portion of a stripe.
  • Each sector may be VALID (e.g., contains useful data) or INVALID.
  • Each sector may be DIRTY (e.g., contains data not yet posted to the disk array) or CLEAN.
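  • For illustration, these attributes map naturally onto bit flags in the parm_t field; the flag names and encoding below are assumptions, only the attribute meanings come from the text:
        /* Illustrative attribute flags for the parm_t field of an entry. */
        enum {
            CACHE_ATTR_NEW    = 1 << 0,  /* NEW block in a stripe (clear means OLD)                   */
            CACHE_ATTR_PARITY = 1 << 1,  /* PARITY portion of a stripe (clear means DATA)             */
            CACHE_ATTR_VALID  = 1 << 2,  /* sector contains useful data (clear means INVALID)         */
            CACHE_ATTR_DIRTY  = 1 << 3   /* data not yet posted to the disk array (clear means CLEAN) */
        };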
  • cache_modify(BLOCKLIST blist) //modifies attributes of blocks in blist
  • cache_delete(BLOCKLIST blist) //deletes blocks in blist from central cache
  • cache_read(BLOCKLIST blist) //returns information from the specified blocks
  • cache_xor(BLOCKLIST blist1, BLOCKLIST blist2, ... BLOCKLIST blistN, BLOCKLIST blistdest) //returns the XOR of the specified blocks
  • the central cache retrieves the specified blocks in central cache and computes the XOR parity of those blocks for return to the requesting controller in the supplied destination block list.
  • a variable number of “source” block lists may be supplied to this API function.
  • the first block in the first block list parameter is XOR'd with the first block of the second block list parameter, and with the first block of the third, fourth, etc.
  • the second block in the first block list parameter is XOR'd with the second block in the second block list, and in the third, fourth, etc., until all specified blocks are XOR'd together.
  • the last block list parameter identifies a list in which the XOR results of the specified blocks are returned.
  • this function may be centralized in the central cache to further simplify the structure of the control modules.
  • the control modules need not include special parity assist circuits. Rather, the central cache can provide the requisite functions to all control modules.
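  • A minimal sketch of the block combination that cache_xor is described as performing, operating on raw block buffers (the buffer layout and function name are assumptions; the i-th destination block receives the XOR of the i-th block from every source list):
        #include <stddef.h>
        #include <string.h>

        /* dest[i] = src[0][i] XOR src[1][i] XOR ... XOR src[n_lists-1][i],
         * where each src[l] and dest is an array of n_blocks buffers of
         * block_size bytes. */
        void xor_block_lists(const unsigned char *const *src[], int n_lists,
                             unsigned char **dest, long n_blocks, size_t block_size)
        {
            for (long i = 0; i < n_blocks; i++) {
                memset(dest[i], 0, block_size);
                for (int l = 0; l < n_lists; l++)
                    for (size_t b = 0; b < block_size; b++)
                        dest[i][b] ^= src[l][i][b];
            }
        }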
  • the central cache maintains centralized knowledge regarding the parameters of each block in a stripe.
  • when a controller determines that it must flush “dirty” data from the cache, it may invoke the cache_stripe_map function to retrieve a map of the status (attributes) of each block in each of the specified stripes.
  • the map information regarding the requested stripes is returned to the requesting controller.
  • cache_flush(STRIPELIST slist) //performs a flush of the requested stripes
  • the controllers may perform flushes by requesting a stripe map then requesting a read of the specific blocks to be flushed.
  • the controllers may request that the central cache perform a flush on their behalf.
  • the central cache has centralized information regarding the attributes of each block in the cache.
  • the central cache may have a communication path to the disk array devices as do the controllers. Where such access to the disk drives is provided to the central cache modules, the central cache may locally perform the requested flush operations directly without further intervention by the controllers.
  • the central cache flushes all blocks in the requested stripes which are dirty then alters the attributes of those blocks as required to indicate their new status.
  • the cooperating controllers which share access to common disk drives must coordinate their concurrent access via semaphore locking procedures.
  • the central cache may provide such semaphore lock procedures for use by the cooperating controllers.
  • the requested stripes, if not presently locked, are locked and appropriate status is returned to the requesting controller. If some of the requested stripes are presently locked, a failure status may be returned to the requesting controller.
  • the central controller may queue such requests and coordinate the allocation of locked stripes among the various controllers.
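  • One minimal sketch of the per-stripe lock bookkeeping such a semaphore service might keep (the structure and names are assumptions; the text only requires that requests be granted, refused, or queued and granted later):
        #include <stdbool.h>

        #define MAX_STRIPES 1024
        #define MAX_WAITERS 8

        /* Per-stripe lock record: owner 0 means unlocked; controller ids are > 0.
         * Indexing by stripe % MAX_STRIPES is a simplification. */
        struct stripe_lock {
            int owner;
            int waiters[MAX_WAITERS];   /* queued controller ids, FIFO order */
            int n_waiters;
        };

        static struct stripe_lock locks[MAX_STRIPES];  /* zero-initialized: all unlocked */

        /* Grant the lock immediately if the stripe is free; otherwise queue the
         * request for later granting (returns whether it was granted now). */
        bool stripe_lock_request(long stripe, int controller_id)
        {
            struct stripe_lock *l = &locks[stripe % MAX_STRIPES];
            if (l->owner == 0) {
                l->owner = controller_id;
                return true;
            }
            if (l->n_waiters < MAX_WAITERS)
                l->waiters[l->n_waiters++] = controller_id;
            return false;
        }

        /* Release the lock and hand it to the oldest queued requester, if any. */
        void stripe_lock_release(long stripe)
        {
            struct stripe_lock *l = &locks[stripe % MAX_STRIPES];
            if (l->n_waiters > 0) {
                l->owner = l->waiters[0];
                for (int i = 1; i < l->n_waiters; i++)
                    l->waiters[i - 1] = l->waiters[i];
                l->n_waiters--;
            } else {
                l->owner = 0;
            }
        }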
  • FIGS. 5 - 11 are flowcharts describing the methods of the present invention operable in controllers (e.g., 404 of FIG. 4 and 204 of FIG. 2) in cooperation with cache controllers (e.g., 406 and 408 of FIG. 4 and 206 of FIG. 2) utilizing the above API functions.
  • FIGS. 5 - 7 describe methods operable in storage controllers in accordance with the present invention to perform host initiated write and read requests and to initiate a cache flush operation.
  • FIGS. 8 - 11 are flowcharts describing cooperative methods operable within the caching controllers to perform read and write requests as well as cache flush operations.
  • the flowcharts of FIGS. 5 - 7 are intended as examples of the application of the API functions of the present invention. They are not intended to be exhaustive in demonstrating the use of every API function or every combination of API functions.
  • FIG. 5 illustrates the operation of controllers (e.g., 404 of FIG. 4) in processing host generated I/O write requests in accordance with the present invention.
  • Element 500 is first operable to translate the received write request into appropriately formatted central cache operations required.
  • the request to the central cache passes the host supplied data to the central cache controller along with block addressing information as discussed above.
  • the particular cache addressing structure employed determines the precise processing performed by operation of element 500 .
  • Element 502 is next operable to transfer the translated cache request to the central cache controller (e.g., 406 and 408 ) via the controller communication medium (e.g., 412 and 414 ).
  • the controller may indicate completion of the I/O request to the host computer and thereby complete processing of the received I/O request from the perspective of the attached host computer.
  • the controller invokes the cache_insert API function to request the new data be inserted in the central cache.
  • the BLOCKLIST provided includes the NEW attribute for all blocks so added to the cache.
  • the host computer may then continue with other processing and generation of other I/O requests. Subsequent operation of the controller, discussed below, may determine that the newly posted data in the central cache needs to be flushed to the disk array.
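  • A rough sketch of this write path (FIG. 5) in C; the request structure, attribute flags, and transport helpers are assumptions, while the cache_insert step and the early host completion follow the description above:
        /* FIG. 5 write path: translate the host write into a cache_insert of
         * NEW, dirty blocks, send it to the central cache, then acknowledge
         * the host.  Flushing to the disk array happens later. */
        struct host_write_req { long lun; long start_block; long n_blocks; const void *data; };

        /* assumed helpers for the controller-to-central-cache transport and host interface */
        extern int  send_cache_insert(long lun, long start_block, long n_blocks,
                                      unsigned attrs, const void *data);
        extern void host_complete(struct host_write_req *req, int status);

        #define ATTR_NEW   0x1   /* NEW data for its stripe          */
        #define ATTR_DIRTY 0x2   /* not yet posted to the disk array */
        #define ATTR_VALID 0x4   /* contains useful data             */

        void handle_host_write(struct host_write_req *req)
        {
            /* Element 500: translate the host request into a central cache operation. */
            unsigned attrs = ATTR_NEW | ATTR_DIRTY | ATTR_VALID;

            /* Element 502: issue cache_insert over the controller communication medium. */
            int status = send_cache_insert(req->lun, req->start_block, req->n_blocks,
                                           attrs, req->data);

            /* Completion is reported to the host once the data is cached. */
            host_complete(req, status);
        }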
  • FIG. 6 is a flowchart describing the operation of controllers (e.g., 404 of FIG. 4) in processing host generated I/O read requests in accordance with the present invention.
  • Element 600 is first operable to translate the received read request into appropriately formatted central cache operations required. The request to the central cache passes the block addressing information as discussed above for the data requested by the host read request.
  • Element 616 is then operable to determine if the storage subsystem is operating in a RAID degraded mode due to failure of a drive in the specified LUN. If not, processing continues with element 602 as discussed below. If the subsystem is operating in a degraded mode, element 618 is next operable to translate the host request into appropriate requests for the entire stripe(s) associated with the requested blocks. In particular, the controller requests the stripes by use of the cache_read API function where the BLOCKLIST requests all blocks in the associated stripes. Element 620 then awaits return of the requested information by the central cache controller.
  • Element 622 then performs an XOR parity computation on the returned stripe blocks to generate any missing data blocks due to the failed drive.
  • the XOR parity computation may be performed locally by the controller or may be performed by invoking the cache_xor API function to generate the parity for a list of blocks in the affected stripe(s). As noted above, the latter approach may be preferred if the controllers are simplified to eliminate XOR parity assist circuits while the central cache controller retains this centralized capability on behalf of the control modules. Processing then completes with element 614 returning the requested data, retrieved from the central cache, to the requesting host system.
  • Element 622 therefore represents any such RAID processing to assure reliable, redundant data storage as defined for the selected RAID management technique.
  • element 602 is next operable to transfer the translated cache request to the central cache controller (e.g., 406 and 408 ) via the controller communication medium (e.g., 412 and 414 ).
  • the controller issues a cache_read API function to retrieve the requested blocks of data.
  • Element 604 then awaits return of the requested data (or other status) from the central cache controller.
  • Central cache controller may return one of three possible conditions. First, central cache controller may return the requested data in its entirety. Second, only a portion of the requested data may reside in cache memory of the central cache controller and therefore only that portion of the requested data may be returned. Third, none of the requested data may reside in cache memory and therefore none of the requested data may be returned. A status code indicative of one of these three conditions is returned from central cache controller to the requesting RAID controller.
  • Element 606 is then operable to determine from the returned status code which of the three possible conditions is actually returned. If all requested data resided in the central cache controller's cache memory, then all requested data was returned and processing continues with element 614 to return the data to the host system and to thereby complete processing of the I/O read request. If less than all data was returned from central cache controller, element 608 is next operable to read the additional data from the storage elements. The additional data comprises any requested data not returned from central cache controller.
  • Element 610 is next operable after reading the additional data from disk to determine whether the additional data should be transferred to central cache.
  • Well known storage management techniques may be applied to make the determination as to whether the additional data should be added to the central cache. If so, element 612 is operable in a manner similar to that of element 502 above to transfer the additional data read from the disk array to the central cache. Specifically, the controller issues a cache_insert API request to insert the additional data blocks into the central cache memory.
  • element 614 is operable, as noted above, to return all requested data to the host system and to thereby complete processing of the host system generated I/O read request.
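  • A rough sketch of the non-degraded read path (FIG. 6) in C; the status codes, range structure, and helper names are assumptions, while the full/partial/none cache-hit handling follows the description above:
        /* FIG. 6 read path: ask the central cache for the blocks, read any
         * misses from the disk array, optionally insert them into the cache,
         * then return the data to the host. */
        enum cache_read_status { CACHE_ALL, CACHE_PARTIAL, CACHE_NONE };

        struct blk_range { long lun; long start; long count; };

        /* assumed helpers */
        extern enum cache_read_status send_cache_read(struct blk_range r, void *buf,
                                                      struct blk_range *missing);
        extern void read_from_disks(struct blk_range r, void *buf);
        extern int  worth_caching(struct blk_range r);
        extern void send_cache_insert_clean(struct blk_range r, const void *buf);
        extern void return_to_host(struct blk_range r, const void *buf);

        void handle_host_read(struct blk_range req, void *buf)
        {
            struct blk_range missing = req;

            /* Elements 600/602: translate the request and issue cache_read. */
            enum cache_read_status s = send_cache_read(req, buf, &missing);

            if (s != CACHE_ALL) {
                /* Element 608: read whatever the cache could not supply from disk. */
                read_from_disks(missing, buf);

                /* Elements 610/612: optionally cache_insert the disk data. */
                if (worth_caching(missing))
                    send_cache_insert_clean(missing, buf);
            }

            /* Element 614: return all requested data to the host. */
            return_to_host(req, buf);
        }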
  • FIG. 7 is a flowchart describing the operation of a RAID controller in accordance with the present invention to flush new data (dirty data) from the central cache to the disk array.
  • Well known storage management techniques may be applied to determine when the cache need be flushed.
  • The methods of the present invention are rather directed to techniques to flush a centralized cache shared by a plurality of controllers. Each controller may therefore make independent determinations as to whether and when to flush new data from the central cache to the disk array.
  • The methods and structure of the present invention allow for the intelligent central cache controller(s) to determine independently that the cache memory content should be flushed (posted) to disk.
  • The flowchart of FIG. 7 therefore describes processing within any of the RAID controllers after a determination has been made that the cache data should be flushed to the disk array.
  • Element 700 is first operable to determine whether the cache flush operation should be performed by the RAID controller itself or should be requested of the central cache controllers. This determination may be made based upon present loading of the requesting RAID controller as compared to the central cache controller. If the determination is made that the central cache controller should perform the cache flush, element 702 is operable to generate and transfer a request to central cache controller requesting that it flush all new data for a given stripe list from its cache memory to the disk array. If the local RAID controller is to perform the flush operation, element 708 is next operable to request a stripe lock from the central cache controller for all stripes affected by the flush request. As noted, other well known methods are applied to determine which stripes are to be flushed at a particular time.
  • The controller issues a cache_stripe_lock_req API request for the affected stripes.
  • The central cache controller returns when the lock is granted. If the requested lock cannot be immediately granted, the central cache controller may queue the request and grant it at a later time. In the alternative, the central cache controller may return a failure status (not shown) and allow the controller to determine a strategy for handling the failure.
  • Element 710 is operable to request a stripe map from the central cache controller to identify which blocks in the affected stripes are still marked as “dirty.” Only the central cache retains centralized knowledge of the present state of each block in cache. Other controllers may have previously requested a flush of the affected stripes and therefore blocks thought to be “dirty” by this requesting controller may have been previously posted to the disk array. Specifically, the controller issues a cache_stripe_map API request to obtain this map information. Next, element 712 performs an XOR parity computation to generate updated parity blocks for the affected stripes. As above, this parity computation may be performed locally on the requesting controller or centrally in the central cache controller via a cache_xor API function.
  • Element 704 is next operable to request and retrieve all new (dirty) data from the central cache controller as indicated by the stripe map previously retrieved.
  • Element 704 issues cache_read API requests for the data blocks having dirty data to be posted.
  • Element 706 is then operable to perform the required disk operations to flush the retrieved new data from the central cache to the disk array. Further, element 706 issues an appropriate API request to alter the attributes for the posted blocks.
  • A cache_modify API request is issued to alter parameters for an identified list of blocks. The blocks just posted to disk by the flush operation would be altered to a CLEAN attribute.
  • In the alternative, a cache_delete API request may be issued to remove the flushed blocks from the cache. Element 714 then unlocks the affected stripes.
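  • The local flush path of elements 708 through 714 (together with elements 704 and 706) may be compressed into the following sketch. The routine names and signatures are assumptions; cache_read_dirty, write_blocks_to_disk and cache_modify_clean stand in for the cache_read, disk write and cache_modify operations described above, and STRIPELIST denotes the stripe list structure described in the API section below.
    typedef struct stripelist STRIPELIST;    /* stripe list; see the API section below          */
    typedef struct stripemap  STRIPEMAP;     /* per-block dirty/clean map returned by the cache */

    extern void cache_stripe_lock_req(const STRIPELIST *slist);              /* element 708 */
    extern STRIPEMAP *cache_stripe_map(const STRIPELIST *slist);             /* element 710 */
    extern void cache_xor(const STRIPELIST *slist);                          /* element 712 (may be local instead) */
    extern void cache_read_dirty(const STRIPEMAP *map, void *buf);           /* element 704: cache_read of dirty blocks */
    extern void write_blocks_to_disk(const STRIPEMAP *map, const void *buf); /* element 706: post to the array          */
    extern void cache_modify_clean(const STRIPEMAP *map);                    /* element 706: cache_modify to CLEAN      */
    extern void cache_stripe_unlock(const STRIPELIST *slist);                /* element 714 */

    void controller_flush(const STRIPELIST *dirty_stripes, void *buf)
    {
        cache_stripe_lock_req(dirty_stripes);                 /* serialize access to the stripes   */
        STRIPEMAP *map = cache_stripe_map(dirty_stripes);     /* which blocks are still dirty      */
        cache_xor(dirty_stripes);                             /* update parity for the stripes     */
        cache_read_dirty(map, buf);                           /* retrieve the dirty data           */
        write_blocks_to_disk(map, buf);                       /* post data and parity to the array */
        cache_modify_clean(map);                              /* mark the posted blocks CLEAN      */
        cache_stripe_unlock(dirty_stripes);                   /* release the stripe locks          */
    }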
  • FIGS. 8-11 describe methods of the present invention operable within central cache controllers 406 and 408 in response to API requests generated by RAID controllers 404 as noted above.
  • FIG. 8 describes the operation of the central cache controller in response to a cache_read API request generated by one of the RAID controllers.
  • Such a request may result in all requested data being found in the cache memory and returned, a portion of the requested data being found in the cache memory and returned, or none of the requested data being found in the cache memory.
  • Element 800 first determines whether all requested data presently resides in the cache memory. If so, element 806 is next operable to return all requested data from the cache memory to the requesting RAID controller to thereby complete the read cache data request.
  • If not, element 802 is operable to determine whether disk read operations to retrieve the additional data should be issued locally within the central cache controller or left to the option of the requesting RAID controller. If the additional data will be retrieved from disk by the RAID controller, element 808 is next operable to return that portion of the requested data which was found in the cache memory to thereby complete the read cache data request.
  • Element 810 is next operable if the additional data is to be read from the disk drive locally within the central cache controller. Element 810 determines whether the subsystem is operating in a degraded mode due to failure of a disk in the requested LUN. If not in degraded mode, processing continues with element 804 discussed below. If operating in degraded mode, element 812 is operable to retrieve from the cache the entire stripe associated with each requested block. Element 814 then performs a local parity computation using the parity assist features of the central cache controller to recover any data missing due to the disk failure. Processing then continues with element 806 below.
  • Element 804 reads any additional data required to satisfy the read cache data request.
  • Well known cache management techniques may operate within central cache controller to determine what data, in addition to the requested data, may also be read. For example, other data physically near the requested data (such as the remainder of a track or cylinder) may be read in anticipation of future use. Or, for example, associated parity data may be read from the disk array in anticipation of its use in the near future.
  • Element 806 is then operable in response to reading the additional data to return all data requested by the read cache data request to the requesting RAID controller to thereby complete the request.
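  • One possible shape for this cache_read servicing within the central cache controller is sketched below. The helper routines and the policy test of element 802 are assumptions; the branch structure follows elements 800 through 814 as described above.
    typedef struct blocklist BLOCKLIST;      /* block list; see the API section below */

    extern int  all_blocks_cached(const BLOCKLIST *req);                       /* element 800 */
    extern int  should_read_disk_locally(const BLOCKLIST *req);                /* element 802: policy decision   */
    extern int  lun_is_degraded(const BLOCKLIST *req);                         /* element 810 */
    extern void read_additional_from_disk(const BLOCKLIST *req);               /* element 804 (plus read-ahead)  */
    extern void rebuild_degraded_stripes(const BLOCKLIST *req);                /* elements 812-814: XOR recovery */
    extern void return_cached_blocks(const BLOCKLIST *req, int partial_only);  /* elements 806 / 808             */

    void central_cache_read(const BLOCKLIST *req)
    {
        if (all_blocks_cached(req)) {            /* element 800: full hit                               */
            return_cached_blocks(req, 0);        /* element 806                                         */
            return;
        }
        if (!should_read_disk_locally(req)) {    /* element 802: leave the rest to the RAID controller  */
            return_cached_blocks(req, 1);        /* element 808: return only the cached portion         */
            return;
        }
        if (lun_is_degraded(req))                /* element 810                                         */
            rebuild_degraded_stripes(req);       /* elements 812-814                                    */
        else
            read_additional_from_disk(req);      /* element 804                                         */
        return_cached_blocks(req, 0);            /* element 806: all requested data now present         */
    }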
  • FIG. 9 describes the operation of the central cache controllers 406 and 408 in response to a cache_insert API request from a RAID controller.
  • Element 900 is first operable to lock the stripe(s) associated with the blocks to be inserted. Since the central cache controls the semaphore locking, it performs the lock locally without intervention by or notice to attached controllers. The lock prevents other controllers from accessing the affected blocks until the insert operation is completed. For example, the lock prevents another controller from requesting a cache insert or flush operation.
  • Element 902 then inserts the supplied blocks into the cache memory of the central cache controller in accordance with their specified block numbers and with attributes as indicated by the parameters of the BLOCKLIST entries. Where the blocks contain new data, the new data overwrites any previous data in the cache, whether clean or dirty. Lastly, element 904 unlocks the locked stripes to permit other operations.
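  • A minimal sketch of this insert handling, with assumed helper names, is:
    typedef struct blocklist BLOCKLIST;      /* block list; see the API section below */

    extern void lock_stripes_for(const BLOCKLIST *blist);                /* element 900 */
    extern void store_blocks(const BLOCKLIST *blist, const void *data);  /* element 902 */
    extern void unlock_stripes_for(const BLOCKLIST *blist);              /* element 904 */

    void central_cache_insert(const BLOCKLIST *blist, const void *data)
    {
        lock_stripes_for(blist);     /* exclude other controllers from the affected stripes           */
        store_blocks(blist, data);   /* overwrite prior cache contents; record the supplied attributes */
        unlock_stripes_for(blist);   /* permit other operations on the stripes                        */
    }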
  • FIG. 10 describes the operation of the central cache controller in response to a cache_flush API request from an attached controller.
  • The present invention permits the controllers to perform flushes locally such that each controller performs its own flush operation by use of cache_stripe_map and cache_read API function requests.
  • The central cache controller responds to such requests to supply the data requested by the controller with centralized knowledge of the present status of each block in the central cache memory.
  • Alternatively, the controllers may request that the central cache controller perform the cache flush operation on behalf of the controller.
  • The controller issues a cache_flush API request with a STRIPELIST indicating the stripes that the controller has determined should be flushed.
  • The central cache controller performs the cache flush for the requested stripes but with centralized knowledge as to the present status of each block in the requested stripes. In particular, some of the requested stripes may have been previously flushed by operations requested from other controllers. The central cache controller therefore performs the requested flush in accordance with the present status of each block in the requested stripes.
  • The central cache controller may include background processing which periodically flushes data from the central cache memory to the disk array in response to loading analysis within the central cache controllers.
  • Such background processing, which determines what data to flush and at what time, may simply invoke the processing depicted in FIG. 10 to perform the desired flush operations.
  • Element 1000 is first operable to lock all stripes in the STRIPELIST of the cache_flush API request.
  • Element 1002 locates all new (unposted or dirty) data in the cache memory of the central cache controller for the requested stripes.
  • The central cache controller is the central repository for present status information regarding all blocks in the central cache. It is therefore possible that the requesting controller has requested the flushing of one or more stripes which no longer contain “dirty” data.
  • Element 1004 is therefore operable to unlock any stripes among the requested, locked stripes which no longer contain any dirty data to be flushed.
  • Element 1006 then reads any additional data required for posting of the located data. For example, current data corresponding to other data blocks in a stripe and/or the redundancy information (parity) for a stripe may be required in order to update the parity (redundancy information) for stripes about to be flushed. Or for example, element 1006 may determine that other data, unrelated to the particular stripe to be flushed, could be optimally read at this time in anticipation of future access (e.g., a read-ahead determination made by the controller or by the central cache controller). Element 1008 is then operable to perform any disk operations required to flush the located dirty data and associated parity updates to the disk array.
  • Element 1008 is further operable to update the status of all blocks flushed by the disk operations performed. Those blocks which were marked as “dirty” are now marked as “clean”, no longer in need of flushing. Lastly, element 1010 unlocks the stripes which are now successfully flushed by operation of element 1008.
  • The cache flush method of FIG. 10 may be invoked by request of a RAID controller as noted above or may be invoked by local RAID management intelligence of the central cache controller.
  • A decision to flush the contents of the central cache may be made by one of the plurality of RAID controllers or by the intelligent central cache controller(s) themselves.
  • The operations required to flush the cache content may be performed within the central cache controller or by one of the RAID controllers by retrieval of new data from the central cache.
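  • The flush servicing of elements 1000 through 1010 may be summarized by the sketch below; the routine names are assumptions, while the ordering of operations follows the flowchart description above.
    typedef struct stripelist STRIPELIST;    /* stripe list; see the API section below */

    extern void lock_stripes(const STRIPELIST *slist);                   /* element 1000 */
    extern STRIPELIST *find_dirty_blocks(const STRIPELIST *slist);       /* element 1002 */
    extern void unlock_already_clean(const STRIPELIST *requested,
                                     const STRIPELIST *dirty);           /* element 1004 */
    extern void read_additional_for_update(const STRIPELIST *dirty);     /* element 1006 */
    extern void post_to_disk_and_mark_clean(const STRIPELIST *dirty);    /* element 1008 */
    extern void unlock_stripes(const STRIPELIST *dirty);                 /* element 1010 */

    void central_cache_flush(const STRIPELIST *requested)
    {
        lock_stripes(requested);                            /* element 1000                                         */
        STRIPELIST *dirty = find_dirty_blocks(requested);   /* element 1002: stripes still holding dirty data       */
        unlock_already_clean(requested, dirty);             /* element 1004                                         */
        read_additional_for_update(dirty);                  /* element 1006: peer data / old parity / read-ahead    */
        post_to_disk_and_mark_clean(dirty);                 /* element 1008: write to the array, mark blocks clean  */
        unlock_stripes(dirty);                              /* element 1010                                         */
    }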
  • FIG. 11 describes the operation of the central cache controller in response to cache_stripe_map API requests from a RAID controller.
  • Controllers may perform their own flush operations by requesting dirty data from the central cache for stripes to be flushed.
  • The controllers request information from the central cache controller for stripes believed to contain dirty data.
  • The information consists of a map of each stripe of interest which describes the status of each block in the identified stripes.
  • Element 1100 first locates the requested status information regarding blocks in the stripes identified by the controller's STRIPELIST parameter. Element 1102 then builds the map information into a data structure for return to the requesting controller. Element 1104 then returns the data to the requesting controller.
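  • One possible layout for the returned map, and for the servicing of elements 1100 through 1104, is sketched below. The map structure shown is an assumption, since the description above specifies only that the map describes the status of each block in the identified stripes.
    #include <stdlib.h>

    typedef struct stripelist STRIPELIST;    /* stripe list; see the API section below */

    /* Assumed per-block entry of the returned stripe map. */
    typedef struct stripe_map_entry {
        long LUN;         /* logical unit containing the block            */
        long block;       /* logical block number                         */
        int  dirty;       /* nonzero if the block still requires flushing */
    } STRIPE_MAP_ENTRY;

    typedef struct stripe_map {
        long              n_entries;
        STRIPE_MAP_ENTRY *entry;
    } STRIPE_MAP;

    extern long count_blocks(const STRIPELIST *slist);                             /* element 1100       */
    extern void fill_block_status(const STRIPELIST *slist, STRIPE_MAP_ENTRY *out); /* elements 1100-1102 */
    extern void send_to_controller(const STRIPE_MAP *map);                         /* element 1104       */

    void central_cache_stripe_map(const STRIPELIST *slist)
    {
        STRIPE_MAP map;
        map.n_entries = count_blocks(slist);                             /* element 1100 */
        map.entry = malloc((size_t)map.n_entries * sizeof *map.entry);   /* element 1102 */
        if (map.entry == NULL)
            return;                               /* allocation failure: error path omitted in this sketch */
        fill_block_status(slist, map.entry);
        send_to_controller(&map);                                        /* element 1104 */
        free(map.entry);
    }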

Abstract

Apparatus and methods which allow multiple storage controllers sharing access to common data storage devices in a data storage subsystem to access a centralized intelligent cache. The intelligent central cache provides substantial processing for storage management functions. In particular, the central cache of the present invention performs RAID management functions on behalf of the plurality of storage controllers including, for example, redundancy information (parity) generation and checking as well as RAID geometry (striping) management. The plurality of storage controllers (also referred to herein as RAID controllers) transmit cache requests to the central cache controllers. The central cache controllers perform all operations related to storing supplied data in cache memory as well as posting such cached data to the storage array as required. The storage controllers are significantly simplified because the present invention obviates the need for duplicative local cache memory on each of the plurality of storage controllers. The storage subsystem of the present invention obviates the need for inter-controller communication for purposes of synchronizing local cache contents of the storage controllers. The storage subsystem of the present invention offers improved scalability in that the storage controllers are simplified as compared to those of prior designs. Addition of controllers to enhance subsystem performance is less costly than in prior designs. The central cache controller may include a mirrored cache controller to enhance redundancy of the central cache controller. Communication between the cache controller and its mirror is performed over a dedicated communication link.

Description

    RELATED PATENTS
  • This patent is related to commonly assigned, U.S. patent application Ser. No. 08/772,614 entitled METHODS AND APPARATUS FOR COORDINATING SHARED MULTIPLE RAID CONTROLLER ACCESS TO COMMON STORAGE DEVICES filed Dec. 23, 1996 which is hereby incorporated by reference. [0001]
  • BACKGROUND OF THE INVENTION
  • 1. Field of the Invention [0002]
  • This invention relates generally to caching within a data storage subsystem and in particular to controller element(s) used as intelligent central cache apparatus within multiple redundant controller data storage subsystems. [0003]
  • 2. Discussion of Related Art [0004]
  • Modern mass storage subsystems are continuing to provide increasing storage capacities to fulfill user demands from host computer system applications. Due to this critical reliance on large capacity mass storage, demands for enhanced reliability are also high. Various storage device configurations and geometries are commonly applied to meet the demands for higher storage capacity while maintaining or enhancing reliability of the mass storage subsystems. [0005]
  • A popular solution to these mass storage demands for increased capacity and reliability is the use of multiple smaller storage modules configured in geometries that permit redundancy of stored data to assure data integrity in case of various failures. In many such redundant subsystems, recovery from many common failures can be automated within the storage subsystem itself due to the use of data redundancy, error codes, and so-called “hot spares” (extra storage modules which may be activated to replace a failed, previously active storage module). These subsystems are typically referred to as redundant arrays of inexpensive (or independent) disks (or more commonly referred to by the acronym RAID). The 1987 publication by David A. Patterson, et al., from University of California at Berkeley entitled A Case for Redundant Arrays of Inexpensive Disks (RAID), reviews the fundamental concepts of RAID technology. [0006]
  • RAID storage subsystems typically utilize a control module that shields the user or host system from the details of managing the redundant array. The controller makes the subsystem appear to the host computer as a single, highly reliable, high capacity disk drive. In fact, the RAID controller may distribute the host computer system supplied data across a plurality of the small independent drives with redundancy and error checking information so as to improve subsystem reliability. [0007]
  • In some RAID configurations a portion of data is distributed across a plurality of data disk drives and associated redundancy information is added on an additional drive (often referred to as a parity drive when XOR parity is used for the redundancy information). In such configurations, the related data so distributed across a plurality of drives is often referred to as a stripe. In most RAID architectures, the “write” operation involves both a write of the data to the data disk and also an adjustment of parity information. The parity information adjustment may involve the reading of other data in the same stripe and writing of the newly computed parity for the blocks of the stripe. This imposes a large “write penalty” upon RAID systems (RAID levels 3-6), often making them slower than traditional disk systems in the typical write I/O operation. [0008]
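  • For example, a small write to a single block of such a stripe typically reads the old data block and the old parity block, computes the new parity as the XOR of the old parity, the old data and the new data, and then writes both the new data and the new parity, i.e., four disk operations for one host write. The following fragment is only an illustration of that XOR relationship; the block size is an assumed value.
    #define BLOCK_SIZE 512   /* assumed block size in bytes */

    /* new_parity = old_parity XOR old_data XOR new_data */
    static void update_parity(unsigned char *new_parity,
                              const unsigned char *old_parity,
                              const unsigned char *old_data,
                              const unsigned char *new_data)
    {
        for (int i = 0; i < BLOCK_SIZE; i++)
            new_parity[i] = (unsigned char)(old_parity[i] ^ old_data[i] ^ new_data[i]);
    }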
  • Known RAID subsystems provide cache memory structures to further improve the performance of the RAID subsystem write operations. The cache memory is associated with the control module such that the storage blocks on the disk array are mapped to blocks in the cache. This mapping is also transparent to the host system. The host system simply requests blocks of data to be read or written and the RAID controller manipulates the disk array and cache memory as required. [0009]
  • It is taught in co-pending U.S. patent application Ser. No. 08/772,614 to provide redundant control modules sharing access to common storage modules to improve subsystem performance while reducing the failure rate of the subsystem due to control electronics failures. In such redundant architectures as taught by co-pending U.S. patent application Ser. No. 08/772,614, a plurality of control modules are configured such that they control the same physical array of disk drives. As taught by prior designs, a cache memory module is associated with each of the redundant control modules. Each controller will use its cache during control of the data storage volume which it accesses. [0010]
  • In this configuration, the controllers gain the advantage of being able to simultaneously handle multiple read and write requests directed to the same volume of data storage. However, since the control modules may access the same data, the control modules must communicate with one another to assure that the cache modules are synchronized. Other communications among the cooperating controllers are used to coordinate concurrent access to the common resources. Semaphore locking and related multi-tasking techniques are often utilized for this purpose. The control modules therefore communicate among themselves to maintain synchronization of their respective, independent cache memories. Since many cache operations require the controllers to generate these synchronization signals and messages or semaphore locking and releasing messages, the amount of traffic (also referred to as coordination traffic or cache coordination traffic) generated can be substantial. This coordination traffic imposes a continuing penalty upon the operation of the data storage subsystem by utilizing valuable bandwidth on the interconnection bus as well as processing overhead within the multiple control modules. If not for this overhead imposed by coordination traffic, the data storage subsystem would have more bandwidth and processing power available for I/O processing and would thus operate faster. [0011]
  • In such a configuration wherein each control module has its own independent cache memory (also referred to herein as decentralized cache), there is significant duplication of the circuits and memory that comprise the cache memory on each control module. This duplication increases the complexity (and therefore the cost of manufacture) of the individual control modules. A decentralized cache architecture subsystem is scaled up by addition of control modules, each with its own duplicated cache memory circuits. This added complexity (and associated costs) therefore makes simple scaling of performance problematic. [0012]
  • In view of the above it is clear that a need exists for an improved cache architecture for redundant control module data storage subsystems which improves data storage subsystem performance and scalability while reducing duplication and complexity of known designs. [0013]
  • SUMMARY OF THE INVENTION
  • The present invention solves the above and other problems, and thereby advances the useful arts, by providing an intelligent central cache shared among a plurality of storage controllers in a storage subsystem. An intelligent central cache is a cache cooperatively engaged with the control modules (storage controllers) to provide caching within the storage subsystem. Various functions are performed within the intelligent central cache including storage, generation, and maintenance of cache meta-data, stripe lock functions to enable coordinated sharing of the central cache features, and functions to coordinate cache flush operations among the plurality of attached control modules. [0014]
  • By contrast, a “dumb” (unintelligent) cache, though it may be a centralized resource, is one used merely as a memory bank, typically for myriad purposes within the data storage subsystem. The intelligent cache of the present invention shares with the attached controllers much of the control logic and processing for determining, for example, when, whether, and how to cache data and meta-data in the cache memory. Cache meta-data includes information regarding the type of data stored in the cache including indications that corresponding data is clean or dirty, current or old data, and redundancy (e.g., RAID parity) data or user related data. The intelligent central cache of the present invention generates, stores, and utilizes cache meta-data for making such determinations relating to the operation of the central cache independently of and/or cooperatively with the storage controllers of the subsystem. Furthermore, the intelligent central cache of the present invention coordinates the management of non-volatility in the cache memory by coordinating with the control modules the monitoring of battery backup status, etc. [0015]
  • The features of the central cache are made accessible to the plurality of controllers through an application program interface (API) via inter-process communication techniques. In particular, the control modules may request, via an API function, that information be inserted into or deleted from the cache. Attributes are provided by the requesting controller to identify the type of data to be inserted (e.g., clean or dirty, new or old, user data or parity, etc.). Other API functions are used to request that the central controller read or return identified data to a requesting controller. Attribute data may also be so retrieved. API functions of the intelligent central cache also assist the controllers in performing cache flush operations (such as required in write-back cache management operations). An API function requests of the central cache a map identifying the status of data blocks in particular identified stripes. The requesting control module may then use this map information to determine which data blocks in the identified stripes are to be flushed to disk. Other API functions allow the central cache to perform cache flush operations independent of requests from the attached control modules. Still other API functions provide the low level stripe lock (semaphore management) functions required to coordinate the shared access by control modules to the central cache. Details of exemplary API operations are discussed below. [0016]
  • The preferred embodiment of the present invention includes a plurality of control modules interconnected by redundant serial communication media such as redundant Fibre Channel Arbitrated Loops (“FC-AL”). The disk array control modules share access to an intelligent central cache memory (also referred to herein as a caching controller or cache control module). The caching controller is cooperatively engaged with the control modules in the data storage subsystem (also referred to herein as controllers or as host adapters to indicate their primary function within the storage subsystem) to provide intelligent management of the cache. The controllers access the caching controller to perform required caching operations relating to an I/O request processed within the controller. [0017]
  • This centralized cache architecture obviates the need to exchange substantial volumes of information between control modules to maintain consistency between their individual caches and to coordinate their shared access to common storage elements, as is taught by co-pending U.S. patent application Ser. No. 08/772,614. Eliminating coordination traffic within the storage subsystem frees the processing power of the several controllers for use in processing of I/O requests. Further, the reduced bandwidth utilization of the interconnecting bus (e.g., FC-AL) allows the previously consumed bandwidth to be used for data storage purposes other than mere overhead communication. [0018]
  • The I/O request processing power in a storage subsystem in accordance with the present invention is easily scaled as compared to known systems. In the preferred embodiment of the present invention, the caching controller is a modification of an ordinary control module (host adapter) in the subsystem. The caching controller is simply populated with significant cache memory as compared to the other controllers (host adapters) which are substantially depopulated of cache memory. One skilled in the art will recognize that a limited amount of memory on each host adapter may be used for staging or buffering in communication with the central cache. Or for example, a multi-tiered cache structure may utilize a small cache on each controller but the large cache is centralized in accordance with the present invention. The controllers of the present invention are therefore simplified as compared to those of prior decentralized cache designs wherein each controller has local cache memory. Additional controllers may be added to the subsystem of the present invention to thereby increase I/O processing capability without the added complexity (cost) of duplicative cache memory. [0019]
  • In addition, the central cache controller of the present invention, per se, may be easily scaled to meet the needs of a particular application. First, an additional cache controller is added in the preferred embodiment to provide redundancy for the centralized cache of the subsystem. The redundant cache controllers communicate via a separate communication link (e.g., an FC-AL link) to maintain mirrored cache synchronization. Secondly, additional cache controllers may be added to the subsystem of the present invention for purposes of enlarging the central cache capacity. The additional cache controllers cooperate and communicate via the separate communication link isolated to the cache controllers. A first cache controller may perform cache operations for a first segment of the cache (mapped to a particular portion of the disk array) while other cache controllers process other segments of the cache (mapped to other portions of the disk array). Mirrored cache controllers may be added to the subsystem associated with each of the segment cache controllers. [0020]
  • It is therefore an object of the present invention to improve data storage subsystem performance in a data storage subsystem having a plurality of controllers. [0021]
  • It is another object of the present invention to improve data storage subsystem performance by providing an intelligent central cache within the data storage subsystem. [0022]
  • It is still another object of the present invention to improve performance in a data storage subsystem having a plurality of storage controllers by providing an intelligent central cache accessible to the plurality of storage controllers. [0023]
  • It is a further object of the present invention to reduce the complexity of storage controllers in a data storage subsystem having a plurality of such storage controllers by providing an intelligent central cache shared by all such storage controllers. [0024]
  • It is yet a further object of the present invention to improve the scalability of a data storage subsystem having a plurality of storage controllers by obviating the need for local cache memory on each such storage controller and providing an intelligent central cache shared by all such storage controllers in the subsystem. [0025]
  • The above and other objects, aspects, features and advantages of the present invention will become apparent from the following detailed description and the attached drawings. [0026]
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1A is a block diagram of a prior art data storage subsystem; [0027]
  • FIG. 1B is a block diagram of a prior art data storage subsystem having only generalized system memory and non-centralized data storage controller memory; [0028]
  • FIG. 2 is a block diagram of a first embodiment of the present invention, showing an intelligent central cache accessible by multiple controllers; [0029]
  • FIG. 3 is a block diagram of a prior art Fibre Channel Loop Architecture data storage subsystem having redundant controllers; [0030]
  • FIG. 4 is a block diagram of a preferred embodiment of the present invention, showing a plurality of controllers and caching controllers interconnected by a FC-AL with a plurality of data storage elements; [0031]
  • FIG. 5 is a flowchart illustrating the operation of the data storage controllers of the preferred embodiment in performing a host requested write operation; [0032]
  • FIG. 6 is a flowchart illustrating the operation of the data storage controllers of the preferred embodiment in performing a host requested read operation; [0033]
  • FIG. 7 is a flowchart illustrating the operation of the data storage controllers of the preferred embodiment in performing a cache flush operation; [0034]
  • FIG. 8 is a flowchart illustrating the operation of the caching controllers in conjunction with the data storage controllers of the preferred embodiment to perform a cache read operation; [0035]
  • FIG. 9 is a flowchart illustrating the operation of the caching controllers in conjunction with the data storage controllers of the preferred embodiment to perform a cache insert operation; [0036]
  • FIG. 10 is a flowchart illustrating the operation of the caching controllers in conjunction with the data storage controllers of the preferred embodiment to perform a cache flush operation; and [0037]
  • FIG. 11 is a flowchart illustrating the operation of the caching controllers in conjunction with the data storage controllers of the preferred embodiment to perform an operation to retrieve a map of status information regarding stripes for flushing by a data storage controller. [0038]
  • DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
  • While the invention is susceptible to various modifications and alternative forms, a specific embodiment thereof has been shown by way of example in the drawings and will herein be described in detail. It should be understood, however, that it is not intended to limit the invention to the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention as defined by the appended claims. [0039]
  • Prior Art Storage Subsystems
  • FIG. 1A is a block diagram of data storage subsystem 102 as known in the prior art having a decentralized cache architecture. The system has a plurality of storage controllers 104 (also referred to as control modules). Each control module 104 has its own local cache memory 106. Controllers 104 are connected via communication medium 108 to data storage elements 110. In normal operation, controllers 104 receive I/O requests and process the requests reading or writing as appropriate from or to data storage elements 110. Each controller 104 utilizes its local cache memory 106 to speed response of common I/O requests. [0040]
  • In most known storage subsystems having a plurality of control modules, each control module accesses a distinct portion of the storage elements. The control modules do not share simultaneous access to any portion of the storage elements. In such known systems, the control modules are operable independent of one another. For example, in most RAID storage subsystems, each control module of such a plurality of controllers is responsible for one or more logical units (LUNs) of the storage array. No other controller has the ability to simultaneously access those LUNs. Though a redundant or mirrored controller may be present, it is not simultaneously operable to access the LUNs managed by the first control module. [0041]
  • In other storage subsystems as taught in co-pending U.S. patent application Ser. No. 08/772,614, the plurality of control modules may simultaneously access common portions of the storage elements to thereby enhance the subsystem performance. In such systems, the plurality of controllers exchange messages amongst themselves to coordinate the shared access to the storage elements. For example, in some RAID subsystems, a plurality of RAID controllers in the subsystem may simultaneously access common LUNs. Each controller may be operating on a separate I/O request associated with the shared LUN. As noted in such systems, the controllers exchange messages with one another to coordinate the shared access to the common storage elements. Among the messages are cache synchronization messages required to assure that all controllers which share access to a common portion of the storage elements are aware of the cache contents of other controllers which manipulate the shared storage elements. For example, if one controller completes an I/O operation which results in updates to its local cache memory, it must inform all other controllers of the cache update so that all caches maintain synchronization with respect to cached data not yet flushed to disk. Similarly, when one of the control modules sharing access to a common portion of the storage elements determines that the cached data need be flushed to the storage elements, it must notify other controllers associated with the shared storage elements to assure that all are aware of the updated state of the storage elements. This coordination message exchange (coordination traffic) imposes significant overhead processing on the control modules (host adapters) and consumes valuable bandwidth on the communication medium interconnecting the subsystem components and thus impairs system performance. [0042]
  • For example, in FIG. 1A, significant coordination traffic in communication medium 108 between control modules 104 to maintain synchronization of cache memories 106 consumes available bandwidth on communication medium 108, reducing the bandwidth available for operations between the control modules 104 and the storage elements 110. [0043]
  • FIG. 1B is a block diagram of another prior art data storage subsystem 150 exemplifying the use of a “dumb” central cache by a plurality of controllers. This configuration may represent, for example, a device commonly referred to as a network file server. A network file server is often a general purpose computer system with special software dedicated to the provision of file system services to an attached network of host systems (clients). Such a system has a variety of processors operating on bus 166 and using the same memory, general host memory 160 (dumb central cache). For example, in FIG. 1B, network controller 152, local host controller 154, file controller 156, storage controller 162, and potentially other controllers 158 all share access to host memory 160 via bus 166. Each controller performs a unique function within the subsystem 150. For example, network controller 152 manages network connections between the storage subsystem and external host systems, file controller 156 manages file system operations within the subsystem 150 to perform file operations requested by external host systems, and storage controller 162 translates I/O requests generated by, for example, file controller 156 into lower level signals appropriate to the storage element 164 and its connection bus 168 (e.g., SCSI, IDE, EIDE, etc.). Local host processor 154 guides and coordinates the overall operation of the controllers of subsystem 150. [0044]
  • All the controllers share access to the host memory 160 via bus 166. The uses of host memory 160 may vary widely. Network controller 152 may use the storage space for network protocol management while file controller 156 may use the storage space for file system management functions. All processors and controllers may use the host memory for initial loading of their operation programs if not also for runtime fetch and execution of those programs. In other words, host memory 160 is exemplary of a dumb memory bank used for myriad purposes within the storage subsystem 150 (e.g., a RAMdisk or solid state disk as known in the art). It is not dedicated to the cache storage of data and meta-data relating to I/O requests from attached host systems. [0045]
  • Typical systems with an architecture as depicted in FIG. 1B add local cache memory to controllers in the subsystem which require specialized, dedicated caching operations. For example, file controller 156, network controller 152, and storage controller 162 may each have local cache memory used for their specific functions. The central cache (host memory 160) provides no specialized functionality for any of the myriad controllers sharing access to it. Rather, it is no more than a “dumb” memory bank in which various controllers may store information for any purpose. [0046]
  • FIG. 3 is a block diagram exemplifying another storage subsystem architecture known in the art. Each storage control module 304 includes a local cache memory 306 used exclusively by its corresponding control module 304. Controllers 304 are connected via redundant FC-AL loops 308 and 310 to data storage elements 312. [0047]
  • In this prior art system, data storage elements 312 are disk arrays. Control modules 304 are disk array control modules having RAID management capabilities. Each control module 304 maintains a decentralized cache 306 to aid it in rapid performance of I/O operations. In order to maintain cache synchronization, disk array control modules 304 must continuously signal back and forth to each other. In addition, each disk array control module 304 must carry out all RAID operations individually: configuration of LUNs, calculation of parity data, RAID management of failed devices, etc. As noted above with respect to FIG. 1A, coordination traffic on FC-AL loops 308 and 310 uses valuable processing power of the controllers 304 as well as communication bandwidth which could otherwise be used for performing I/O requests initiated by attached host systems. [0048]
  • All prior storage subsystems exemplified by FIGS. 1A, 1B and 3 share certain common problems. As noted above, when a plurality of controllers within such subsystems share access to common storage elements, a large volume of cache coordination message traffic is generated on the interconnection medium thereby reducing available processing power and communication bandwidth for processing of I/O requests between the controllers and the storage elements. In addition, the prior storage subsystems are not easily scaled up for performance enhancement. Since each controller may include a local cache for boosting its individual performance, the incremental cost of adding another controller is increased. Each controller has the added complexity of potentially large cache memory devices and associated glue and custom assist logic circuits (such as RAID parity assist circuits). [0049]
  • Storage Subsystems of the Present Invention
  • By contrast with prior designs, storage subsystems of the present invention include an intelligent centralized cache (also referred to as a cache controller) which is shared by all controllers in the storage subsystem. Since the cache controller of the present invention is a centralized resource, each controller sharing its function may be simplified by eliminating its local cache memory. Such a simplified controller reduces the incremental cost associated with adding a controller to the subsystem to enhance overall performance. [0050]
  • More importantly, the central cache of the present invention is intelligent in that it includes circuits dedicated to enhancing its specific purpose of caching data destined for storage elements. For example, in a RAID subsystem, the intelligent central cache of the present invention preferably includes parity assist (generation and checking) circuits to aid in rapidly performing required parity operations. Centralizing such intelligent assist circuits further reduces the cost and complexity of the RAID controllers in the storage subsystem. [0051]
  • In addition, the centralized cache of the present invention obviates the need found in the prior art for extensive cache coordination message traffic (such as cache and stripe lock message traffic). The central cache preferably maintains control over the cache on behalf of all controllers in the subsystem. When, as in the preferred embodiment of the present invention, a redundant (mirrored) or additional cache controller is added to the subsystem, a dedicated communication path is available for the exclusive purpose of inter-cache controller synchronization communication. No bandwidth on the common controller communication medium is required to assure mirrored cache synchronization. A simpler (e.g., lower cost) embodiment may utilize the existing communication paths to avoid the cost of an additional dedicated communication path. Such an embodiment would sacrifice some performance enhancements of the present invention but at a cost and complexity savings. [0052]
  • Furthermore, the intelligent central cache of the present invention provides semaphore services for resource locking (stripe locking) to coordinate common access to the disk array by the plurality of control modules. No one of the controllers, as taught in co-pending U.S. patent application Ser. No. 08/772,614, need be designated as a primary controller with respect to a particular shared LUN. Rather, in accordance with the present invention, the intelligent central cache provides such multiple access coordination through semaphore stripe lock features. [0053]
  • The intelligent cache controller of the present invention also provides cache mirroring features when additional cache controllers are added to the subsystem. As discussed below, multiple cache controllers coordinate their intelligent cache management functions in accordance with the present invention through a separate communication channel. The primary communication channel interconnecting the control modules, the cache controllers, and the storage elements remains unburdened by the requisite coordination traffic for mirrored cache operation. Additional cache modules may also operate in a cooperative manner rather than a mirrored architecture wherein each controller is responsible for cache operations associated with a particular portion of the storage elements' total capacity. [0054]
  • FIG. 2 is a block diagram of a first embodiment of data storage subsystem 202 operable in accordance with the methods and structures of the present invention. Controllers 204 access intelligent central cache 206 via communications medium 208. Controllers 204 and central cache 206 both access storage elements 210 via communication medium 208. Communication medium 208 may be any of several well known buses used for interconnection of electronic devices including, for example, SCSI, IDE, EIDE, IPI. In addition, communication medium 208 may represent any of several serial communication media such as FC-AL or SSA as depicted in FIG. 4 and as discussed below. [0055]
  • Intelligent central cache 206 is dedicated to data and meta-data caching in data storage subsystem 202 as distinct from controllers 204 which primarily serve to interface with attached host computer systems. The intelligent central cache 206 eliminates the need for coordination traffic among controllers having local caches, thereby freeing processing power within, and communication bandwidth between, controllers 204 and improving overall performance of data storage subsystem 202. Intelligent central cache 206 cooperates with controllers 204 to manage the storage subsystem 202 structure and organization of information on the storage elements 210. For example, where storage subsystem 202 uses RAID storage management techniques, many RAID management functions specific to the cache are performed within intelligent central cache 206. [0056]
  • RAID cache management functions including parity generation and checking and logical to physical mapping of host request supplied addresses to locations on the array of disk drive storage elements 210 may be performed entirely within the intelligent central cache 206. The management of the RAID disk array geometry may therefore be off-loaded from the RAID controllers 204 to the intelligent central cache 206. Or for example, customized circuits to assist in RAID parity generation and checking can be integrated within intelligent central cache 206. [0057]
  • In particular, intelligent central cache 206 maintains cache data and associated cache meta-data. Generation and maintenance of cache meta-data in a decentralized cache architecture requires significant processing within, and communication among, a plurality of controllers sharing access to common storage elements. The intelligent central cache 206 of the present invention centralizes this management function to reduce processing overhead load on the controllers 204 and to reduce communication (coordination traffic) among the controllers 204. [0058]
  • Intelligent central cache 206 can also calculate cache statistical information. Using the cache statistical information, controllers 204 can tune their respective performance in view of statistical data corresponding to the cache usage in the overall subsystem 202. [0059]
  • Preferably, intelligent central cache 206 is designed as an electronic circuit board substantially identical to that of controller 204 but populated differently at time of manufacture to distinguish their respective function. For example, controller 204 may be depopulated of any RAID parity assist circuits and depopulated of substantially all cache memory and related support circuits. Intelligent central cache 206, by contrast, is preferably populated with parity assist devices and with a large cache memory for caching of data supplied from controllers 204 and related meta-data generated by the cache management functions operable within intelligent central cache 206. [0060]
  • When a controller 204 prepares units of data to be cached in preparation for future posting to the storage elements 210, it simply transmits the data and a cache request over bus 208 to the intelligent central cache 206. The intelligent central cache 206 places the received data in its cache memory along with any generated meta-data used to manage the cache memory contents. The meta-data may be generated within central cache 206 such as noted above with respect to RAID parity assist or may be supplied by controller 204 as parameters when the data is supplied to central cache 206. The data supplied to central cache 206 is provided with addresses indicative of the desired disk location of the data on storage elements 210. In generating related meta-data, central cache 206 determines which other data either in its cache memory or on the storage elements 210 are required for updating associated redundancy information. The meta-data therefore indicates data that is new (currently unposted to the storage elements 210) versus old (presently posted to the storage elements 210 and also resident in central cache 206). Other meta-data distinguishes parity/redundancy information from data in central cache 206. [0061]
  • This central cache architecture improves overall subsystem performance by obviating the need for cache coordination message traffic over bus 208, thereby reducing both overhead processing within the controllers 204 and traffic on bus 208. Controllers 204 are therefore simpler than prior controllers exemplified as discussed above. The simpler controllers are substantially void of any local cache memory and parity assist circuits. The primary function served by the simpler controller is to provide an interface to attached host systems consistent with the storage management structure (e.g., RAID) of the subsystem. This simpler design permits easier scaling of the subsystem's performance by reducing the costs (complexity) associated with adding additional controllers. In like manner, additional intelligent central cache devices may be added either to increase the cache size and/or to provide mirrored redundancy of the central cache contents. As noted below with respect to FIG. 4, when adding cache devices to the central cache, it is preferred that the plurality of central cache devices communicate among themselves over a dedicated communication medium. [0062]
  • FIG. 4 is a block diagram of a preferred embodiment of the present invention representing the best presently known mode of practicing the invention. FIG. 4 shows the data storage subsystem 402 having caching controller 406 dedicated to serving as an intelligent central cache memory and a second caching controller 408 dedicated to serving as a mirror of caching controller 406. Embodied in each cache controller 406 and 408 is cache memory 410. Cache controller 408 maintains in its local cache memory 410 a mirrored image of the content of cache memory 410 in cache controller 406. [0063]
  • Those skilled in the art will note that caching controller 408 is not limited to the role of mirroring cache controller 406 as in the preferred embodiment. Caching controller 408 may also function as an additional intelligent central cache to provide enhanced cache capacity. In such a configuration, a first cache controller (e.g., 406) provides caching services for a first range of the cache memory while a subsequent cache controller (e.g., 408) provides caching services for another portion of the cache memory. For example, a first cache controller 406 (with its local cache memory 410) may provide intelligent caching services to RAID controllers 404 for a first half of the storage elements 416 while the additional cache controller 408 (with its local cache memory 410) provides caching services for the second half of the storage elements 416. [0064]
  • The preferred embodiment interconnects controllers 404, caching controllers 406 and 408, and data storage elements 416 via redundant FC-AL media 412 and 414. Note that caching controller 406 and caching controller 408 have an additional dedicated FC-AL loop 418 which allows communication between them. Coordination traffic between the caching controllers 406 and 408 thus does not utilize any bandwidth on FC-AL loops 412 and 414, thereby enabling the desired performance increase in data storage subsystem 402. [0065]
  • In the preferred embodiment, controllers 404 are substantially identical electronic assemblies to that of cache controllers 406 and 408 but have been largely depopulated of their cache memory and associated circuits. The cache memory function is provided centrally by caching controllers 406 and 408. Because the caching function is centralized, overhead processing by RAID controllers 404 and communication on FC-AL 412 and 414 relating to cache synchronization is reduced or eliminated to thereby enhance subsystem performance. [0066]
  • Data storage elements 416 are preferably disk arrays. Controllers 404 and the caching controllers 406 and 408 cooperate to “map” the host-supplied address to the physical address of the storage elements. The tasks involved in this “mapping” or “translation” are one important part of the RAID management of the disk array 416. Specifically, controllers 404 receive I/O requests from a host system (not shown) and translate those requests into the proper addressing format used by the caching controllers 406 and 408. The data supplied in the host requests is mapped into appropriate parameters corresponding to the API operations described below. The cache controllers 406 and 408 then perform the logical to physical mapping required to store the data in cache memory 410 and to later retrieve the data for posting to the storage element 416. [0067]
  • A variety of alternative modes of mapping host supplied request addresses into locations in the central cache may be recognized by those skilled in the art. Each such method may suggest a different distribution of the RAID management between the controllers 404 and the caching controllers 406 and 408. For example, the mapping process which determines how stripes are mapped across a plurality of disk drives (e.g., in RAID levels 2-5) may be distributed between the control modules and the central cache. A spectrum of distributions is possible. For example, at one extreme, the control modules may be solely responsible for mapping host addresses to RAID level 2-5 stripe locations and geometries (i.e., the central cache provides a linear address space for the control modules to access). Or for example, at another extreme, the central cache may possess exclusive knowledge of the mapping to RAID stripe geometries and distribution of data over the disk array. The parameters supplied to the API functions of the central cache describe the addresses as known to the central cache. [0068]
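  • Purely as an illustration of one point on that spectrum, and not as a limitation, a control module that owns the striping geometry might map a host logical block address to a stripe, a data drive, and a drive-relative block roughly as follows. The constants and the omission of parity rotation are assumptions made only for this sketch.
    /* Illustrative RAID-style mapping; constants are assumed values. */
    #define DATA_DRIVES        4     /* data drives per stripe (parity drive excluded) */
    #define BLOCKS_PER_STRIP  16     /* blocks per strip (the per-drive chunk)         */

    typedef struct {
        long stripe;        /* stripe number within the LUN      */
        long drive;         /* index of the data drive (0-based) */
        long drive_block;   /* block offset on that drive        */
    } raid_location;

    static raid_location map_host_block(long host_block)
    {
        raid_location loc;
        long blocks_per_stripe = (long)DATA_DRIVES * BLOCKS_PER_STRIP;
        long within = host_block % blocks_per_stripe;

        loc.stripe      = host_block / blocks_per_stripe;
        loc.drive       = within / BLOCKS_PER_STRIP;          /* rotation of the parity drive omitted */
        loc.drive_block = loc.stripe * BLOCKS_PER_STRIP + (within % BLOCKS_PER_STRIP);
        return loc;
    }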
  • Regardless of the particular addressing mode (mapping of addresses) used by cache controllers 406 and 408, they are preferably responsible for RAID management tasks such as parity generation and checking for data supplied from the RAID controllers 404. The redundancy information, and other cache meta-data, generated and stored within cache memory 410 of cache controllers 406 and 408 is used to assist RAID controllers 404 in their RAID management of storage elements 416. For example, RAID controllers 404 operable in a cache write-back mode may request the return of all dirty data along with associated redundancy information for posting (flushing) to storage elements 416. In response to such a request, cache controllers 406 and 408 determine which data in the cache is marked as dirty, further determine what other data may be related to the dirty data (i.e., other data associated with the same stripe), and further generate or retrieve associated redundancy information for return with the dirty data to the requesting RAID controller 404. Cache controllers 406 and 408 may, for example, read other related blocks of data from storage elements 416 and/or read old parity data from storage elements 416 in order to generate updated redundancy information. Central cache controllers 406 and 408 therefore retain all information necessary to associate cache blocks with particular stripes of the disk array. Furthermore, the cache meta-data identifies new data (dirty data yet unposted to the storage elements) versus old data (already posted to the storage elements 416). [0069]
  • As noted elsewhere herein, central cache controllers 406 and 408 also provide a centralized control point for semaphore allocation, lock, and release to coordinate stripe locking. Stripe locking, as taught in co-pending U.S. patent application Ser. No. 08/772,614, enables a plurality of controllers (e.g., 404) to share and coordinate access to commonly attached storage elements (e.g., shared access to one or more RAID LUNs). These centralized features provided by the central cache controllers 406 and 408 free resources of the controllers 404 to provide improved overall subsystem throughput. Specifically, the features and services provided by central cache controllers 406 and 408 free computational processing power within controllers 404 and free communication bandwidth on FC-AL 412 and 414. The freed processing power and communication bandwidth is then available for improved processing of host generated I/O requests. [0070]
  • Further, as noted above, cache controllers 406 and 408 may operate in a mirrored operation mode. Cache mirroring operations and communications are also off-loaded from controllers 404; instead, cache controllers 406 and 408 communicate directly with one another via a dedicated communication path 418. Still further, as noted above, in the preferred embodiment of FIG. 4, caching controllers 406 and 408 preferably provide centralized cache statistical information, such as write over-writes or cache hit rate, to controllers 404 (or to a host system not shown). Controllers 404 can use this centralized cache statistical information to tune the performance of the data storage subsystem 402 in view of subsystem wide cache efficiency. [0071]
  • Centralized Cache API
  • As noted above, the centralized cache of the present invention presents its features to the commonly attached controllers via an API. These API features are then accessed by the controllers using well known inter-process communication techniques applied to a shared communication path. As noted above with respect to FIGS. 2 and 4, the shared communication path may utilize any of several communication media and topologies. [0072]
  • The API functions use essentially two data structures for passing of parameters. A BLOCKLIST is a variable length list of entries each of which describes a particular range of logical blocks in a logical unit (LUN) which are relevant to the central cache operation requested. A STRIPELIST is a variable length list of entries each of which describes a particular range of RAID stripes in a LUN which are relevant to the central cache operation requested. [0073]
  • Each BLOCKLIST entry contains substantially the following fields: [0074]
    long LUN // the logical unit identifier for the desired blocks
    long st_block // logical block number of the first block of interest
    long n_block // number of contiguous blocks of interest
    parm_t params // attributes and parameters of the identified blocks
  • Each STRIPELIST entry contains substantially the following fields: [0075]
    long LUN // the logical unit identifier for the desired stripes
    long st_stripe // logical stripe number of the first stripe of interest
    long n_stripe // number of contiguous stripes of interest
    parm_t params // clean/dirty, new/old, etc. attributes of data
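  • For illustration only, these two parameter structures might be declared in C roughly as follows. This is a minimal sketch that assumes a 32-bit parm_t flag type and a simple length-plus-array container for the variable length lists; the actual field widths and encodings are not specified above.

    #include <stddef.h>
    #include <stdint.h>

    typedef uint32_t parm_t;   /* attribute/parameter flags (assumed width and encoding) */

    /* One BLOCKLIST entry: a contiguous range of logical blocks in a LUN. */
    typedef struct blocklist_entry {
        long   LUN;        /* logical unit identifier for the desired blocks        */
        long   st_block;   /* logical block number of the first block of interest   */
        long   n_block;    /* number of contiguous blocks of interest               */
        parm_t params;     /* attributes and parameters of the identified blocks    */
    } blocklist_entry_t;

    /* One STRIPELIST entry: a contiguous range of RAID stripes in a LUN. */
    typedef struct stripelist_entry {
        long   LUN;        /* logical unit identifier for the desired stripes       */
        long   st_stripe;  /* logical stripe number of the first stripe of interest */
        long   n_stripe;   /* number of contiguous stripes of interest              */
        parm_t params;     /* clean/dirty, new/old, etc. attributes of the data     */
    } stripelist_entry_t;

    /* A BLOCKLIST or STRIPELIST is simply a variable length list of such entries. */
    typedef struct blocklist  { size_t n_entries; blocklist_entry_t  *entries; } BLOCKLIST;
    typedef struct stripelist { size_t n_entries; stripelist_entry_t *entries; } STRIPELIST;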
  • Where the API function exchanges data along with the BLOCKLIST or STRIPELIST parameters, the associated block data is transferred over the communication medium following the API request. For example, blocks to be added to the cache are preceded by an appropriate API request to the central cache, and then the actual information in those blocks is transferred to the central cache. In like manner, data requested from the central cache is returned from the central cache to the requesting controller following execution of the API function within the central cache. Communication protocols and media appropriate to control such multi-point communications are well known in the art. In addition, those skilled in the art will readily recognize a variety of error conditions and appropriate recovery techniques therefor. Error status indications are exchanged between the central cache and the controllers as appropriate for the particular API function. Exemplary API functions include: [0076]
  • cache_insert(BLOCKLIST blist) //inserts blocks in blist to central cache [0077]
  • The specified list of blocks is inserted in the central cache with the parameters and attributes as specified in each block's BLOCKLIST entry. As noted, the actual data to be inserted in the specified blocks of the central cache is transferred following the transfer of the API request. The specified parameters and attributes include: [0078]
  • NEW/OLD [0079]
  • The associated block is either a NEW block in a stripe or an OLD block. [0080]
  • DATA/PARITY [0081]
  • The associated block is either a DATA portion of a stripe or a PARITY portion of a stripe. [0082]
  • VALID/INVALID [0083]
  • A bitmap parameter value having a bit for each sector in the associated block. Each sector may be VALID (e.g., contains useful data) or INVALID. [0084]
  • CLEAN/DIRTY [0085]
  • A bitmap parameter value having a bit for each sector in the associated block. Each sector may be DIRTY (e.g., contains data not yet posted to the disk array) or CLEAN. [0086]
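  • As a purely illustrative sketch, the attributes above could be packed into the parm_t field and associated per-block bitmaps as shown below. The bit positions, the assumption of at most 32 sectors per cache block, and the helper functions are illustrative assumptions, not part of the API definition.

    #include <stdint.h>

    typedef uint32_t parm_t;               /* per-range attribute flags (assumed width)        */

    /* Single-bit attributes (bit positions are assumptions). */
    #define PARM_NEW     (1u << 0)         /* set: NEW block in a stripe; clear: OLD           */
    #define PARM_PARITY  (1u << 1)         /* set: PARITY portion of a stripe; clear: DATA     */

    /* Per-sector bitmaps, assuming at most 32 sectors per cache block. */
    typedef struct block_sector_maps {
        uint32_t valid;                    /* bit i set: sector i contains useful data         */
        uint32_t dirty;                    /* bit i set: sector i not yet posted to the disks  */
    } block_sector_maps_t;

    /* Mark sector s of a block as dirty (and therefore also valid). */
    static inline void mark_sector_dirty(block_sector_maps_t *m, unsigned s)
    {
        m->valid |= (1u << s);
        m->dirty |= (1u << s);
    }

    /* A block still needs flushing if any of its sectors remains dirty. */
    static inline int block_needs_flush(const block_sector_maps_t *m)
    {
        return m->dirty != 0;
    }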
  • cache_modify(BLOCKLIST blist) //modifies attributes of blocks in blist [0087]
  • The attributes of the specified list of blocks are altered in accordance with the parameters of the block list entries. [0088]
  • cache_delete(BLOCKLIST blist) //deletes blocks in blist from central cache [0089]
  • The specified list of blocks is removed from the central cache contents. [0090]
  • cache_read(BLOCKLIST blist) //returns information from the specified blocks [0091]
  • The information in the specified list of blocks is retrieved from the central cache memory and returned to the requesting controllers. [0092]
  • cache_xor(BLOCKLIST blist1, BLOCKLIST blist2, . . . BLOCKLIST blistN, BLOCKLIST blistdest) //returns the XOR of the specified blocks [0093]
  • The central cache retrieves the specified blocks in central cache and computes the XOR parity of those blocks for return to the requesting controller in the supplied destination block list. In particular, a variable number of “source” block lists may be supplied to this API function. The first block in the first block list parameter is XOR'd with the first block of the second block list parameter, and with the first block of the third, fourth, etc. Then the second block in the first block list parameter is XOR'd with the second block in the second block list, and with the second block of the third, fourth, etc., until all specified blocks are XOR'd together. The last block list parameter identifies a list in which the XOR results of the specified blocks are returned. As noted above, this function may be centralized in the central cache to further simplify the structure of the control modules. The control modules need not include special parity assist circuits. Rather, the central cache can provide the requisite functions to all control modules. [0094]
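  • The pairing rule described above (all first blocks XOR'd together, then all second blocks, and so on, with the result placed in the destination list) can be sketched in C as follows. The flat buffer layout, the fixed block size, and the function name are assumptions made only for illustration; the actual API operates on BLOCKLIST descriptors referring to cache-resident data.

    #include <stddef.h>
    #include <stdint.h>

    #define BLOCK_SIZE 512   /* assumed cache block size in bytes */

    /*
     * XOR together the corresponding blocks of n_lists source block lists.
     * src[l] points to the data of source list l, laid out as n_blocks
     * contiguous blocks of BLOCK_SIZE bytes; the result for block b is
     * written to the destination buffer at the same offset.
     */
    static void cache_xor_blocks(const uint8_t *const *src, size_t n_lists,
                                 uint8_t *dest, size_t n_blocks)
    {
        for (size_t b = 0; b < n_blocks; b++) {
            uint8_t *d = dest + b * BLOCK_SIZE;
            for (size_t i = 0; i < BLOCK_SIZE; i++)
                d[i] = 0;                                  /* start from zero parity   */
            for (size_t l = 0; l < n_lists; l++) {         /* fold in each source list */
                const uint8_t *s = src[l] + b * BLOCK_SIZE;
                for (size_t i = 0; i < BLOCK_SIZE; i++)
                    d[i] ^= s[i];
            }
        }
    }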
  • cache_stripe_map(STRIPELIST slist) //returns info about the specified stripes [0095]
  • The central cache maintains centralized knowledge regarding the parameters of each block in a stripe. When a controller determines that it must flush the cache contents of “dirty” data, it may invoke this function to retrieve a map of the status (attributes) of each block in each of the specified stripes. The map information regarding the requested stripes is returned to the requesting controller. [0096]
  • cache_flush(STRIPELIST slist) //performs a flush of the requested stripes [0097]
  • As above, the controllers may perform flushes by requesting a stripe map and then requesting a read of the specific blocks to be flushed. In the alternative, the controllers may request that the central cache perform a flush on their behalf. The central cache has centralized information regarding the attributes of each block in the cache. In addition, the central cache may have a communication path to the disk array devices as do the controllers. Where such access to the disk drives is provided to the central cache modules, the central cache may perform the requested flush operations locally without further intervention by the controllers. In response to this API function, the central cache flushes all blocks in the requested stripes which are dirty and then alters the attributes of those blocks as required to indicate their new status. [0098]
  • cache_stripe_lock_req(STRIPELIST slist) //locks the requested stripes [0099]
  • As noted above, the cooperating controllers which share access to common disk drives must coordinate their concurrent access via semaphore locking procedures. The central cache may provide such semaphore lock procedures for use by the cooperating controllers. The requested stripes, if not presently locked, are locked and appropriate status returned to the requesting controller. If some of the requested stripes are presently locked, a failure status may be returned to the requesting controller. In the alternative, the central controller may queue such requests and coordinate the allocation of locked stripes among the various controllers. [0100]
  • cache_stripe_lock_release(STRIPELIST slist) //unlocks the specified stripes [0101]
  • The converse of the lock request API function. Releases the previously locked stripes and returns control to the requesting controller. [0102]
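  • One possible sketch of the central lock bookkeeping behind these two functions is shown below. It is an assumption-laden illustration: the flat per-stripe owner table indexed by stripe number, the all-or-nothing granting, and the choice to let the caller queue or fail a refused request are not mandated by the description above.

    #include <stdbool.h>
    #include <stddef.h>

    #define MAX_STRIPES 1024                /* assumed size of the lock table; stripe     */
                                            /* numbers are assumed to fall below this     */

    /* One entry per stripe: 0 means unlocked, otherwise the id of the owning controller. */
    static int stripe_owner[MAX_STRIPES];

    /*
     * Attempt to lock every stripe in the request on behalf of controller_id.
     * Returns true when all locks are granted; on any conflict the partial
     * locks are rolled back so the central cache may queue or fail the request.
     */
    static bool stripe_lock_try(const long *stripes, size_t n, int controller_id)
    {
        for (size_t i = 0; i < n; i++) {
            if (stripe_owner[stripes[i]] != 0) {            /* already locked elsewhere */
                while (i > 0)                               /* roll back partial locks  */
                    stripe_owner[stripes[--i]] = 0;
                return false;
            }
            stripe_owner[stripes[i]] = controller_id;
        }
        return true;
    }

    /* Converse operation: release previously granted stripe locks. */
    static void stripe_lock_release_all(const long *stripes, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            stripe_owner[stripes[i]] = 0;
    }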
  • Exemplary Centralized Cache Methods
  • FIGS. 5-11 are flowcharts describing the methods of the present invention operable in controllers (e.g., 404 of FIG. 4 and 204 of FIG. 2) in cooperation with cache controllers (e.g., 406 and 408 of FIG. 4 and 206 of FIG. 2) utilizing the above API functions. In particular, FIGS. 5-7 describe methods operable in storage controllers in accordance with the present invention to perform host initiated write and read requests and to initiate a cache flush operation. FIGS. 8-11 are flowcharts describing cooperative methods operable within the caching controllers to perform read and write requests as well as cache flush operations. The flowcharts of FIGS. 5-7 are intended as examples of the application of the API functions of the present invention. They are not intended to be exhaustive in demonstrating the use of every API function or every combination of API functions. [0103]
  • FIG. 5 illustrates the operation of controllers (e.g., 404 of FIG. 4) in processing host generated I/O write requests in accordance with the present invention. Element 500 is first operable to translate the received write request into the appropriately formatted central cache operations required. The request to the central cache passes the host supplied data to the central cache controller along with block addressing information as discussed above. The particular cache addressing structure employed (as noted above) determines the precise processing performed by operation of element 500. [0104]
  • [0105] Element 502 is next operable to transfer the translated cache request to the central cache controller (e.g., 406 and 408) via the controller communication medium (e.g., 412 and 414). Once the request is successfully transferred to the cache controller, the controller may indicate completion of the I/O request to the host computer and thereby complete processing of the received I/O request from the perspective of the attached host computer. In particular, the controller invokes the cache_insert API function to request that the new data be inserted in the central cache. The BLOCKLIST provided includes the NEW attribute for all blocks so added to the cache.
  • The host computer may then continue with other processing and generation of other I/O requests. Subsequent operation of the controller, discussed below, may determine that the newly posted data in the central cache needs to be flushed to the disk array. [0106]
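  • As an illustrative sketch only, the write path of FIG. 5 might be coded along the following lines. The helper routines and the simplified signatures are hypothetical; the essential point is that the controller inserts the host data into the central cache with the NEW attribute and may then acknowledge the host.

    #include <stddef.h>

    typedef struct { size_t n_entries; void *entries; } BLOCKLIST;   /* simplified; see earlier sketch */

    /* Hypothetical helpers assumed for this sketch: */
    extern BLOCKLIST translate_write_to_blocklist(const void *host_req);   /* element 500       */
    extern void      attach_host_data(BLOCKLIST *blist, const void *data);
    extern int       cache_insert(BLOCKLIST blist);                        /* central cache API */
    extern void      ack_host(const void *host_req);

    /* Service a host generated write request (FIG. 5). */
    static int handle_host_write(const void *host_req, const void *host_data)
    {
        /* Element 500: translate host addresses into central cache block
         * addresses, marking every affected block NEW (dirty, unposted). */
        BLOCKLIST blist = translate_write_to_blocklist(host_req);
        attach_host_data(&blist, host_data);

        /* Element 502: send the cache_insert request, then the data, over
         * the controller communication medium. */
        if (cache_insert(blist) != 0)
            return -1;                       /* error recovery is controller specific */

        /* Completion may be reported to the host once the insert succeeds. */
        ack_host(host_req);
        return 0;
    }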
  • FIG. 6 is a flowchart describing the operation of controllers (e.g., 404 of FIG. 4) in processing host generated I/O read requests in accordance with the present invention. Element 600 is first operable to translate the received read request into the appropriately formatted central cache operations required. The request to the central cache passes the block addressing information, as discussed above, for the data requested by the host read request. [0107]
  • [0108] Element 616 is then operable to determine if the storage subsystem is operating in a RAID degraded mode due to failure of a drive in the specified LUN. If not, processing continues with element 602 as discussed below. If the subsystem is operating in a degraded mode, element 618 is next operable to translate the host request into appropriate requests for the entire stripe(s) associated with the requested blocks. In particular, the controller requests the stripes by use of the cache_read API function where the BLOCKLIST requests all blocks in the associated stripes. Element 620 then awaits return of the requested information by the central cache controller.
  • [0109] Element 622 then performs an XOR parity computation on the returned stripe blocks to generate any missing data blocks due to the failed drive. The XOR parity computation may be performed locally by the controller or may be performed by invoking the cache_xor API function to generate the parity for a list of blocks in the affected stripe(s). As noted above, the latter approach may be preferred if the controllers are simplified to eliminate XOR parity assist circuits while the central cache controller retains this centralized capability on behalf of the control modules. Processing then completes with element 614 returning the requested data, retrieved from the central cache, to the requesting host system.
  • Those skilled in the art will recognize that XOR parity computations are associated with particular levels of RAID management. Other RAID management techniques, e.g., RAID level 1 mirroring, do not require parity computation but rather duplicate the newly posted data to a mirror disk. Element 622 therefore represents any such RAID processing to assure reliable, redundant data storage as defined for the selected RAID management technique. [0110]
  • If not in degraded mode, element 602 is next operable to transfer the translated cache request to the central cache controller (e.g., 406 and 408) via the controller communication medium (e.g., 412 and 414). In particular, the controller issues a cache_read API function to retrieve the requested blocks of data. Element 604 then awaits return of the requested data (or other status) from the central cache controller. The central cache controller may return one of three possible conditions. First, the central cache controller may return the requested data in its entirety. Second, only a portion of the requested data may reside in cache memory of the central cache controller and therefore only that portion of the requested data may be returned. Third, none of the requested data may reside in cache memory and therefore none of the requested data may be returned. A status code indicative of one of these three conditions is returned from the central cache controller to the requesting RAID controller. [0111]
  • [0112] Element 606 is then operable to determine from the returned status code which of the three possible conditions was actually returned. If all requested data resided in the central cache controller's cache memory, then all requested data was returned and processing continues with element 614 to return the data to the host system and to thereby complete processing of the I/O read request. If less than all data was returned from the central cache controller, element 608 is next operable to read the additional data from the storage elements. The additional data comprises any requested data not returned from the central cache controller.
  • [0113] Element 610 is next operable after reading the additional data from disk to determine whether the additional data should be transferred to central cache. Well known storage management techniques may be applied to make the determination as to whether the additional data should be added to the central cache. If so, element 612 is operable in a manner similar to that of element 502 above to transfer the additional data read from the disk array to the central cache. Specifically, the controller issues a cache_insert API request to insert the additional data blocks into the central cache memory. Lastly, element 614 is operable, as noted above, to return all requested data to the host system and to thereby complete processing of the host system generated I/O read request.
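  • The non-degraded read path of FIG. 6 and its three possible cache responses might be sketched as follows. The status codes and helper routines are assumptions introduced only to make the control flow concrete; the actual cache_read API returns data over the communication medium rather than through a buffer argument.

    #include <stddef.h>

    typedef struct { size_t n_entries; void *entries; } BLOCKLIST;   /* simplified; see earlier sketch */

    /* Status codes assumed for the three conditions described above. */
    enum cache_read_status { CACHE_HIT_ALL, CACHE_HIT_PARTIAL, CACHE_MISS_ALL };

    /* Hypothetical helpers assumed for this sketch: */
    extern BLOCKLIST translate_read_to_blocklist(const void *host_req);       /* element 600      */
    extern enum cache_read_status cache_read(BLOCKLIST blist, void *buf);     /* elements 602-604 */
    extern BLOCKLIST missing_blocks(BLOCKLIST requested, const void *buf);
    extern void read_from_disks(BLOCKLIST blist, void *buf);                  /* element 608      */
    extern int  worth_caching(BLOCKLIST blist);                               /* element 610      */
    extern int  cache_insert(BLOCKLIST blist);                                /* element 612      */
    extern void return_to_host(const void *host_req, const void *buf);        /* element 614      */

    /* Service a host generated read request (FIG. 6, non-degraded path). */
    static void handle_host_read(const void *host_req, void *buf)
    {
        BLOCKLIST blist = translate_read_to_blocklist(host_req);

        switch (cache_read(blist, buf)) {                   /* element 606: which condition? */
        case CACHE_HIT_ALL:
            break;                                          /* all data came from the cache  */
        case CACHE_HIT_PARTIAL:
        case CACHE_MISS_ALL: {
            BLOCKLIST rest = missing_blocks(blist, buf);    /* data the cache did not return */
            read_from_disks(rest, buf);                     /* element 608                   */
            if (worth_caching(rest))                        /* element 610                   */
                cache_insert(rest);                         /* element 612                   */
            break;
        }
        }
        return_to_host(host_req, buf);                      /* element 614 */
    }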
  • FIG. 7 is a flowchart describing the operation of a RAID controller in accordance with the present invention to flush new data (dirty data) from the central cache to the disk array. Well known storage management techniques may be applied to determine when the cache need be flushed. The methods of the present invention are rather directed to techniques to flush a centralized cache shared by a plurality of controllers. Each controller may therefore make independent determinations as to whether and when to flush new data from the central cache to the disk array. In addition, the methods and structure of the present invention allow for the intelligent central cache controller(s) to determine independently that the cache memory content should be flushed (posted) to disk. The flowchart of FIG. 7 therefore describes processing within any of the RAID controllers after a determination has been made that the cache data should be flushed to the disk array. [0114]
  • [0115] Element 700 is first operable to determine whether the cache flush operation should be performed by the RAID controller itself or should be requested of the central cache controllers. This determination may be made based upon present loading of the requesting RAID controller as compared to the central cache controller. If the determination is made that the central cache controller should perform the cache flush, element 702 is operable to generate and transfer a request to the central cache controller requesting that it flush all new data for a given stripe list from its cache memory to the disk array. If the local RAID controller is to perform the flush operation, element 708 is next operable to request a stripe lock from the central cache controller for all stripes affected by the flush request. As noted, other well known methods are applied to determine which stripes are to be flushed at a particular time. Whichever stripes are to be flushed must be locked to prevent interference from other operations in the shared central cache controllers. Specifically, the controller issues a cache_stripe_lock_req API request for the affected stripes. As noted above, the central cache controller returns when the lock is granted. If the requested lock cannot be immediately granted, the central cache controller may queue the request and grant it at a later time. In the alternative, the central cache controller may return a failure status (not shown) and allow the controller to determine a strategy for handling the failure.
  • Once the requested stripe lock is successfully granted, element 710 is operable to request a stripe map from the central cache controller to identify which blocks in the affected stripes are still marked as “dirty.” Only the central cache retains centralized knowledge of the present state of each block in cache. Other controllers may have previously requested a flush of the affected stripes, and therefore blocks thought to be “dirty” by this requesting controller may have been previously posted to the disk array. Specifically, the controller issues a cache_stripe_map API request to obtain this map information. Next, element 712 performs an XOR parity computation to generate updated parity blocks for the affected stripes. As above, this parity computation may be performed locally on the requesting controller or centrally in the central cache controller via a cache_xor API function. [0116]
  • [0117] Element 704 is next operable to request and retrieve all new (dirty) data from the central cache controller as indicated by the stripe map previously retrieved. In particular, element 704 issues cache_read API requests for the data blocks having dirty data to be posted. Element 706 is then operable to perform the required disk operations to flush the retrieved new data from the central cache to the disk array. Further, element 706 issues an appropriate API request to alter the attributes for the posted blocks. In the preferred embodiment, a cache_modify API request is issued to alter parameters for an identified list of blocks. The blocks just posted to disk by the flush operation would be altered to a CLEAN attribute. Alternatively, a cache_delete API request may be issued to remove the flushed blocks from the cache. Element 714 then unlocks the affected stripes.
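  • The local-flush branch of FIG. 7 (elements 708 through 714) might be sketched as below. The signatures are deliberately simplified relative to the API definitions above (for example, cache_xor here takes a single source list and a destination), and all helper names are assumptions.

    #include <stddef.h>

    typedef struct { size_t n_entries; void *entries; } BLOCKLIST;    /* simplified; see earlier sketches */
    typedef struct { size_t n_entries; void *entries; } STRIPELIST;

    /* Hypothetical, simplified helpers assumed for this sketch: */
    extern int       cache_stripe_lock_req(STRIPELIST slist);            /* element 708               */
    extern BLOCKLIST cache_stripe_map(STRIPELIST slist);                 /* element 710: dirty blocks */
    extern int       cache_xor(BLOCKLIST src, BLOCKLIST parity_dest);    /* element 712 (central XOR) */
    extern int       cache_read(BLOCKLIST blist, void *buf);             /* element 704               */
    extern void      write_to_disks(BLOCKLIST blist, const void *buf);   /* element 706: post to array */
    extern int       cache_modify(BLOCKLIST blist);                      /* element 706: mark CLEAN   */
    extern void      cache_stripe_lock_release(STRIPELIST slist);        /* element 714               */

    /* Flush the dirty contents of the given stripes on behalf of a RAID controller. */
    static void flush_stripes_locally(STRIPELIST slist, BLOCKLIST parity_dest, void *buf)
    {
        if (cache_stripe_lock_req(slist) != 0)      /* lock refused: retry or queue later      */
            return;

        BLOCKLIST dirty = cache_stripe_map(slist);  /* which blocks are still actually dirty   */
        cache_xor(dirty, parity_dest);              /* regenerate parity for the dirty stripes */
        cache_read(dirty, buf);                     /* retrieve the dirty data from the cache  */
        write_to_disks(dirty, buf);                 /* post data and updated parity to disks   */
        cache_modify(dirty);                        /* posted blocks become CLEAN              */

        cache_stripe_lock_release(slist);           /* element 714 */
    }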
  • FIGS. 8-11 describe methods of the present invention operable within central cache controllers 406 and 408 in response to API requests generated by RAID controllers 404 as noted above. [0118]
  • FIG. 8 describes the operation of the central cache controller in response to a cache_read API request generated by one of the RAID controllers. As noted above, such a request may result in all requested data being found in the cache memory and returned, a portion of the requested data being found in the cache memory and returned, or none of the requested data being found in the cache memory. [0119] Element 800 first determines whether all requested data presently resides in the cache memory. If so, element 806 is next operable to return all requested data from the cache memory to the requesting RAID controller to thereby complete the read cache data request.
  • If less than all the requested data is found in cache memory, [0120] element 802 is operable to determine whether disk read operations to retrieve the additional data should be issued locally within the central cache controller or left to the option of the requesting RAID controller. If the additional data will be retrieved from disk by the RAID controller, element 808 is next operable to return that portion of the requested data which was found in the cache memory to thereby complete the read cache data request.
  • [0121] Element 810 is next operable if the additional data is to be read from the disk drive locally within the central cache controller. Element 810 determines whether the subsystem is operating in a degraded mode due to failure of a disk in the requested LUN. If not in degraded mode, processing continues with element 804 discussed below. If operating in degraded mode, element 812 is operable to retrieve from the cache the entire stripe associated with each requested block. Element 814 then performs a local parity computation using the parity assist features of the central cache controller to recover any data missing due to the disk failure. Processing then continues with element 806 below.
  • [0122] Element 804 reads any additional data required to satisfy the requested read cache data request. Well known cache management techniques may operate within central cache controller to determine what data, in addition to the requested data, may also be read. For example, other data physically near the requested data (such as the remainder of a track or cylinder) may be read in anticipation of future use. Or, for example, associated parity data may be read from the disk array in anticipation of its use in the near future.
  • [0123] Element 806 is then operable in response to reading the additional data to return all data requested by the read cache data request to the requesting RAID controller to thereby complete the request.
  • FIG. 9 describes the operation of the central cache controllers 406 and 408 in response to a cache_insert API request from a RAID controller. Element 900 is first operable to lock the stripe(s) associated with the blocks to be inserted. Since the central cache controls the semaphore locking, it performs the lock locally without intervention by or notice to attached controllers. The lock prevents other controllers from accessing the affected blocks until the insert operation is completed. For example, the lock prevents another controller from requesting a cache insert or flush operation. Element 902 then inserts the supplied blocks into the cache memory of the central cache controller in accordance with the specified block numbers and with attributes as indicated by the parameters of the BLOCKLIST entries. Where the blocks contain new data, the new data overwrites any previous data in the cache, whether clean or dirty. Lastly, element 904 unlocks the locked stripes to permit other operations. [0124]
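  • A minimal sketch of this insert handler, assuming hypothetical internal helpers of the central cache controller, is:

    #include <stddef.h>

    typedef struct { size_t n_entries; void *entries; } BLOCKLIST;   /* simplified; see earlier sketch */

    /* Hypothetical internals of the central cache controller: */
    extern void lock_stripes_for(BLOCKLIST blist);                 /* element 900: local semaphore lock */
    extern void store_blocks(BLOCKLIST blist, const void *data);   /* element 902: write cache memory   */
    extern void unlock_stripes_for(BLOCKLIST blist);               /* element 904                       */

    /* Handle a cache_insert request arriving from a RAID controller (FIG. 9). */
    static void central_cache_insert(BLOCKLIST blist, const void *data)
    {
        lock_stripes_for(blist);      /* performed locally; no notice to other controllers  */
        store_blocks(blist, data);    /* new data overwrites prior contents, clean or dirty */
        unlock_stripes_for(blist);    /* permit other operations on the affected stripes    */
    }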
  • FIG. 10 describes the operation of the central cache controller in response to a cache_flush API request from an attached controller. As noted above, the present invention permits the controllers to perform flushes locally such that each controller performs its own flush operation by use of cache_stripe_map and cache_read API function requests. The central cache controller responds to such requests to supply the data requested by the controller with centralized knowledge of the present status of each block in the central cache memory. [0125]
  • In the alternative, the controllers may request that the central cache controller perform the cache flush operation on behalf of the controller. In this case as shown in FIG. 10, the controller issues a cache_flush API request with a STRIPELIST indicating the stripes that the controller has determined should be flushed. The central cache controller performs the cache flush for the requested stripes but with centralized knowledge as to the present status of each block in the requested stripes. In particular, some of the requested stripes may have been previously flushed by operations requested from other controllers. The central cache controller therefore performs the requested flush in accordance with the present status of each block in the requested stripes. [0126]
  • In addition to such controller-directed flush operations, the central cache controller may include background processing which periodically flushes data from the central cache memory to the disk array in response to loading analysis within the central cache controllers. Such background processing which determines what data to flush at what time may simply invoke the processing depicted in FIG. 10 to perform the desired flush operations. [0127]
  • [0128] Element 1000 is first operable to lock all stripes in the STRIPELIST of the cache_flush API request. Element 1002 then locates all new (unposted or dirty) data in the cache memory of the central cache controller for the requested stripes. As noted above, the central controller is the central repository for present status information regarding all blocks in the central cache. It is therefore possible that the controller has requested the flushing of one or more stripes which no longer contain “dirty” data. Element 1004 is therefore operable to unlock any stripes among the requested, locked stripes which no longer contain any dirty data to be flushed.
  • [0129] Element 1006 then reads any additional data required for posting of the located data. For example, current data corresponding to other data blocks in a stripe and/or the redundancy information (parity) for a stripe may be required in order to update the parity (redundancy information) for stripes about to be flushed. Or, for example, element 1006 may determine that other data, unrelated to the particular stripe to be flushed, could be optimally read at this time in anticipation of future access (e.g., a read-ahead determination made by the controller or by the central cache controller). Element 1008 is operable to perform any disk operations required to flush the located dirty data and associated parity updates to the disk array. Element 1008 is further operable to update the status of all blocks flushed by the disk operations performed. Those blocks which were marked as “dirty” blocks are now marked as “clean”, no longer in need of flushing. Lastly, element 1010 unlocks the stripes which are now successfully flushed by operation of element 1008.
  • Those skilled in the art will recognize that the cache flush method of FIG. 10 may be invoked by request of a RAID controller as noted above or may be invoked by local RAID management intelligence of the central cache controller. In other words, in accordance with the present invention, a decision to flush the contents of the central cache may be made by one of the plurality of RAID controllers or by the intelligent central cache controller(s) themselves. Also, as noted here, the operations required to flush the cache content may be performed within the central cache controller or by one of the RAID controllers by retrieval of new data from the central cache. [0130]
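  • A sketch of the flush handler of FIG. 10, usable both for a controller's cache_flush request and for a background flush decision, might look as follows. All helper names are assumptions standing in for internal processing of the central cache controller.

    #include <stddef.h>

    typedef struct { size_t n_entries; void *entries; } BLOCKLIST;    /* simplified; see earlier sketches */
    typedef struct { size_t n_entries; void *entries; } STRIPELIST;

    /* Hypothetical internals of the central cache controller: */
    extern void       lock_stripes(STRIPELIST slist);                      /* element 1000 */
    extern BLOCKLIST  find_dirty_blocks(STRIPELIST slist);                 /* element 1002 */
    extern STRIPELIST stripes_without_dirty_data(STRIPELIST slist);        /* element 1004 */
    extern void       unlock_stripes(STRIPELIST slist);
    extern void       read_peer_data_and_parity(BLOCKLIST dirty);          /* element 1006 */
    extern void       post_to_disks_and_mark_clean(BLOCKLIST dirty);       /* element 1008 */
    extern STRIPELIST stripes_of(BLOCKLIST dirty);

    /* Flush the requested stripes (FIG. 10), whether requested by a controller
     * or initiated by the central cache controller's own background processing. */
    static void central_cache_flush(STRIPELIST slist)
    {
        lock_stripes(slist);                                    /* element 1000 */
        BLOCKLIST dirty = find_dirty_blocks(slist);             /* element 1002 */
        unlock_stripes(stripes_without_dirty_data(slist));      /* element 1004: nothing left to flush */

        read_peer_data_and_parity(dirty);      /* element 1006: old data/parity for parity update */
        post_to_disks_and_mark_clean(dirty);   /* element 1008: write data and parity, mark CLEAN */

        unlock_stripes(stripes_of(dirty));     /* element 1010 */
    }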
  • FIG. 11 describes the operation of the central cache controller in response to cache_stripe_map API requests from a RAID controller. As noted above, controllers may perform their own flush operations by requesting dirty data from the central cache for stripes to be flushed. The controllers request information from the central cache controller for stripes believed to contain dirty data. The information consists of a map of each stripe of interest which describes the status of each block in the identified stripes. [0131]
  • [0132] Element 1100 first locates the requested status information regarding blocks in the stripes identified by the controller's STRIPELIST parameter. Element 1102 then builds the map information into a data structure for return to the requesting controller. Element 1104 then returns the data to the requesting controller.
  • While the invention has been illustrated and described in detail in the drawings and foregoing description, such illustration and description is to be considered as exemplary and not restrictive in character, it being understood that only the preferred embodiment and minor variants thereof have been shown and described and that all changes and modifications that come within the spirit of the invention are desired to be protected. [0133]

Claims (27)

What is claimed is:
1. In a data storage subsystem having a plurality of data storage elements, an apparatus comprising:
a plurality of storage controllers,
an intelligent central cache dedicated to use by the data storage subsystem, said central cache being cooperatively engaged with said plurality of storage controllers to provide management of said plurality of data storage elements, and
a controller communication medium operable for exchange of information among said plurality of storage controllers and said intelligent central cache and said data storage elements.
2. The apparatus of
claim 1
wherein said intelligent central cache is further operable to provide cache statistical information.
3. The apparatus of
claim 1
wherein requests directed to said intelligent central cache are addressed in correspondence with physical placement of data within said intelligent central cache.
4. The apparatus of
claim 1
wherein requests directed to said intelligent central cache are addressed in correspondence with logical block address of said data storage elements.
5. The apparatus of
claim 1
further comprising:
at least one additional intelligent central cache, said controller communications medium being further operable for exchange of information among said at least one additional intelligent central cache and said plurality of storage controllers and said intelligent central cache and said data storage elements.
6. The apparatus of
claim 5
wherein said at least one additional intelligent central cache is operable to mirror data in said intelligent central cache.
7. The apparatus of claim 1 wherein said controller communication medium includes:
a serial communication medium.
8. The apparatus of
claim 7
wherein said serial communication medium includes:
a Fibre Channel Arbitrated Loop.
9. The apparatus of
claim 7
wherein said serial communication medium includes:
a plurality of redundant Fibre Channel Arbitrated Loops.
10. The apparatus of
claim 1
wherein said plurality of storage controllers provide RAID management of said plurality of data storage elements.
11. The apparatus of
claim 10
wherein said intelligent central cache provides RAID management of said plurality of data storage elements in cooperation with said storage controllers.
12. In a data storage subsystem having a plurality of data storage elements, an apparatus comprising:
a plurality of RAID controllers a proper subset of which are cache controllers having cache memory associated therewith;
a controller communication medium operable for exchange of information among said plurality of RAID controllers and said plurality of data storage elements.
13. The apparatus of
claim 12
wherein said proper subset includes at least two of said plurality of RAID controllers.
14. The apparatus of
claim 13
wherein said at least two of said plurality of RAID controllers are operable in a redundant manner such that each mirrors the operation of another.
15. The apparatus of
claim 12
, wherein said controller communication medium includes:
a serial communication medium.
16. The apparatus of
claim 15
, wherein said serial communication medium includes:
a Fibre Channel Arbitrated Loop.
17. The apparatus of
claim 15
, wherein said serial communication medium includes:
a plurality of redundant Fibre Channel Arbitrated Loops.
18. A data storage subsystem comprising:
at least one data storage element,
at least one controller having no cache memory, said at least one controller being operable to read and write data to said at least one data storage element and being further operable to provide cooperative RAID management of said at least one data storage element,
a plurality of caching controllers having caches dedicated to use by the data storage subsystem, said plurality of caching controllers being operable to maintain the cache memory as an intelligent central cache accessible by said at least one controller, said plurality of caching controllers being further operable in write-back mode, said plurality of caching controllers being further operable to provide cooperative RAID management of said at least one data storage element, said plurality of caching controllers being further operable to redundantly protect cached data, and
at least one serial communication medium operable for communication between said at least one data storage element, said at least one controller, and said at least one caching controller.
19. In a storage subsystem having a plurality of storage controllers, an intelligent central cache comprising:
a central cache memory; and
an intelligent cache controller coupled to said central cache memory and coupled to said plurality of storage controllers wherein said central cache controller is adapted to process cache requests received from said plurality of storage controllers and wherein said cache requests include:
requests to insert data into said central cache memory,
requests to delete previously inserted data from said central cache memory, and
requests to retrieve previously inserted data from said cache memory.
20. The intelligent central cache of
claim 19
wherein said data inserted in said central cache memory includes cache meta-data associated with said data supplied by a requesting one of said plurality of storage controllers.
21. The intelligent cache controller of
claim 20
wherein said cache meta-data includes:
indicia of a clean status associated with said data, and
indicia of a dirty status associated with said data.
22. The intelligent cache controller of
claim 21
wherein said cache requests further include:
requests to return information identifying particular portions of said data previously inserted in said central cache memory having a dirty status associated therewith.
23. The intelligent cache controller of
claim 21
wherein said cache requests further include:
requests to flush to disk drives associated with said storage subsystem particular portions of said data previously inserted in said central cache memory having a dirty status associated therewith.
24. The intelligent cache controller of
claim 20
wherein said cache meta-data includes:
indicia of a new status associated with said data, and
indicia of an old status associated with said data.
25. The intelligent cache controller of
claim 20
wherein said cache meta-data includes:
indicia of a parity type associated with said data, and
indicia of a non-parity type associated with said data.
26. The intelligent cache controller of
claim 19
wherein said cache requests further include:
requests to lock for exclusive access particular portions of said data previously inserted in said central cache memory, and
requests to unlock previously locked particular portions of said data previously inserted in said central cache memory.
27. The intelligent cache controller of
claim 21
wherein said cache requests further include:
requests to compute the bitwise XOR of particular portions of said data previously inserted in said central cache memory.
US08/941,770 1997-09-30 1997-09-30 Method and apparatus for providing centralized intelligent cache between multiple data controlling elements Expired - Lifetime US6381674B2 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
US08/941,770 US6381674B2 (en) 1997-09-30 1997-09-30 Method and apparatus for providing centralized intelligent cache between multiple data controlling elements
AU96733/98A AU9673398A (en) 1997-09-30 1998-09-29 Multiple data controllers with centralized cache
PCT/US1998/020423 WO1999017208A1 (en) 1997-09-30 1998-09-29 Multiple data controllers with centralized cache

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US08/941,770 US6381674B2 (en) 1997-09-30 1997-09-30 Method and apparatus for providing centralized intelligent cache between multiple data controlling elements

Publications (2)

Publication Number Publication Date
US20010002480A1 true US20010002480A1 (en) 2001-05-31
US6381674B2 US6381674B2 (en) 2002-04-30

Family

ID=25477036

Family Applications (1)

Application Number Title Priority Date Filing Date
US08/941,770 Expired - Lifetime US6381674B2 (en) 1997-09-30 1997-09-30 Method and apparatus for providing centralized intelligent cache between multiple data controlling elements

Country Status (3)

Country Link
US (1) US6381674B2 (en)
AU (1) AU9673398A (en)
WO (1) WO1999017208A1 (en)

Cited By (61)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6347358B1 (en) * 1998-12-22 2002-02-12 Nec Corporation Disk control unit and disk control method
US20020124137A1 (en) * 2001-01-29 2002-09-05 Ulrich Thomas R. Enhancing disk array performance via variable parity based load balancing
US20020138559A1 (en) * 2001-01-29 2002-09-26 Ulrich Thomas R. Dynamically distributed file system
US20020156974A1 (en) * 2001-01-29 2002-10-24 Ulrich Thomas R. Redundant dynamically distributed file system
US20030200389A1 (en) * 2002-04-18 2003-10-23 Odenwald Louis H. System and method of cache management for storage controllers
US20030204683A1 (en) * 2002-04-30 2003-10-30 Hitachi, Ltd. Method, system, and storage controller for controlling shared memories
US6745310B2 (en) * 2000-12-01 2004-06-01 Yan Chiew Chow Real time local and remote management of data files and directories and method of operating the same
US20040128326A1 (en) * 1999-06-25 2004-07-01 Lecrone Douglas E. Method and apparatus for monitoring update activity in a data storage facility
US6799284B1 (en) * 2001-02-28 2004-09-28 Network Appliance, Inc. Reparity bitmap RAID failure recovery
US20050050273A1 (en) * 2003-08-27 2005-03-03 Horn Robert L. RAID controller architecture with integrated map-and-forward function, virtualization, scalability, and mirror consistency
US20050144381A1 (en) * 2003-12-29 2005-06-30 Corrado Francis R. Method, system, and program for managing data updates
US20050144514A1 (en) * 2001-01-29 2005-06-30 Ulrich Thomas R. Dynamic redistribution of parity groups
US20050216660A1 (en) * 2003-06-19 2005-09-29 Fujitsu Limited RAID apparatus, RAID control method, and RAID control program
US6965979B2 (en) 2003-01-29 2005-11-15 Pillar Data Systems, Inc. Methods and systems of host caching
US6990667B2 (en) 2001-01-29 2006-01-24 Adaptec, Inc. Server-independent object positioning for load balancing drives and servers
US6990547B2 (en) 2001-01-29 2006-01-24 Adaptec, Inc. Replacing file system processors by hot swapping
US20060123200A1 (en) * 2004-12-02 2006-06-08 Fujitsu Limited Storage system, and control method and program thereof
EP1431861A3 (en) * 2002-12-17 2006-07-19 Activcard Ireland Limited Security token sharable data and synchronization cache
US20060236029A1 (en) * 2005-04-15 2006-10-19 Corrado Francis R Power-safe disk storage apparatus, systems, and methods
US20060282700A1 (en) * 2005-06-10 2006-12-14 Cavallo Joseph S RAID write completion apparatus, systems, and methods
US20060288161A1 (en) * 2005-06-17 2006-12-21 Cavallo Joseph S RAID power safe apparatus, systems, and methods
US7236987B1 (en) 2003-02-28 2007-06-26 Sun Microsystems Inc. Systems and methods for providing a storage virtualization environment
US20070180296A1 (en) * 2005-10-07 2007-08-02 Byrne Richard J Back-annotation in storage-device array
US20070180295A1 (en) * 2005-10-07 2007-08-02 Byrne Richard J Virtual profiles for storage-device array encoding/decoding
US20070180297A1 (en) * 2005-10-07 2007-08-02 Byrne Richard J Ping-pong state machine for storage-device array
US20070180298A1 (en) * 2005-10-07 2007-08-02 Byrne Richard J Parity rotation in storage-device array
US7290168B1 (en) 2003-02-28 2007-10-30 Sun Microsystems, Inc. Systems and methods for providing a multi-path network switch system
US20080046647A1 (en) * 2006-08-15 2008-02-21 Katherine Tyldesley Blinick Apparatus, system, and method for integrating multiple raid storage instances within a blade center
US20080109627A1 (en) * 2004-11-10 2008-05-08 Matsushita Electric Industrial Co., Ltd. Nonvolatile Memory Device And Method For Accessing Nonvolatile Memory Device
US7383381B1 (en) 2003-02-28 2008-06-03 Sun Microsystems, Inc. Systems and methods for configuring a storage virtualization environment
US7430568B1 (en) 2003-02-28 2008-09-30 Sun Microsystems, Inc. Systems and methods for providing snapshot capabilities in a storage virtualization environment
US7487152B1 (en) * 2000-05-31 2009-02-03 International Business Machines Corporation Method for efficiently locking resources of a global data repository
US20090077333A1 (en) * 2007-09-18 2009-03-19 Agere Systems Inc. Double degraded array protection in an integrated network attached storage device
US20090172464A1 (en) * 2007-12-30 2009-07-02 Agere Systems Inc. Method and apparatus for repairing uncorrectable drive errors in an integrated network attached storage device
US7770059B1 (en) * 2004-03-26 2010-08-03 Emc Corporation Failure protection in an environment including virtualization of networked storage resources
US7861107B1 (en) * 2006-08-14 2010-12-28 Network Appliance, Inc. Dual access pathways to serially-connected mass data storage units
CN101165666B (en) * 2006-10-17 2011-07-20 国际商业机器公司 Method and device establishing address conversion in data processing system
CN102541471A (en) * 2011-12-28 2012-07-04 创新科软件技术(深圳)有限公司 Storage system with multiple controllers
US20130239124A1 (en) * 2012-01-20 2013-09-12 Mentor Graphics Corporation Event Queue Management For Embedded Systems
US8615678B1 (en) * 2008-06-30 2013-12-24 Emc Corporation Auto-adapting multi-tier cache
US9288077B1 (en) * 2012-09-28 2016-03-15 Emc Corporation Cluster file system with server block cache
CN105511811A (en) * 2015-12-07 2016-04-20 浪潮(北京)电子信息产业有限公司 Method and system for raising throughput of file system
US20160246661A1 (en) * 2015-02-20 2016-08-25 Kai Höfig Analyzing the availability of a system
US20160266802A1 (en) * 2015-03-10 2016-09-15 Kabushiki Kaisha Toshiba Storage device, memory system and method of managing data
US20170097887A1 (en) * 2015-10-02 2017-04-06 Netapp, Inc. Storage Controller Cache Having Reserved Parity Area
CN109298837A (en) * 2018-09-13 2019-02-01 郑州云海信息技术有限公司 A kind of multi-controller caching backup method, device, equipment and readable storage medium storing program for executing
US20190042413A1 (en) * 2018-03-02 2019-02-07 Intel Corporation Method and apparatus to provide predictable read latency for a storage device
US10389342B2 (en) 2017-06-28 2019-08-20 Hewlett Packard Enterprise Development Lp Comparator
US10402113B2 (en) 2014-07-31 2019-09-03 Hewlett Packard Enterprise Development Lp Live migration of data
US10402287B2 (en) * 2015-01-30 2019-09-03 Hewlett Packard Enterprise Development Lp Preventing data corruption and single point of failure in a fault-tolerant memory
US10402261B2 (en) 2015-03-31 2019-09-03 Hewlett Packard Enterprise Development Lp Preventing data corruption and single point of failure in fault-tolerant memory fabrics
US10409681B2 (en) 2015-01-30 2019-09-10 Hewlett Packard Enterprise Development Lp Non-idempotent primitives in fault-tolerant memory
US20190317898A1 (en) * 2018-04-12 2019-10-17 International Business Machines Corporation Using track locks and stride group locks to manage cache operations
US10540109B2 (en) 2014-09-02 2020-01-21 Hewlett Packard Enterprise Development Lp Serializing access to fault tolerant memory
US10594442B2 (en) 2014-10-24 2020-03-17 Hewlett Packard Enterprise Development Lp End-to-end negative acknowledgment
US10664369B2 (en) 2015-01-30 2020-05-26 Hewlett Packard Enterprise Development Lp Determine failed components in fault-tolerant memory
US10761735B2 (en) * 2014-03-17 2020-09-01 Primaryio, Inc. Tier aware caching solution to increase application performance
US10831597B2 (en) 2018-04-27 2020-11-10 International Business Machines Corporation Receiving, at a secondary storage controller, information on modified data from a primary storage controller to use to calculate parity data
US10884849B2 (en) 2018-04-27 2021-01-05 International Business Machines Corporation Mirroring information on modified data from a primary storage controller to a secondary storage controller for the secondary storage controller to use to calculate parity data
US20210090619A1 (en) * 2016-02-09 2021-03-25 Samsung Electronics Co., Ltd. Multi-port memory device and a method of using the same
US11099948B2 (en) * 2018-09-21 2021-08-24 Microsoft Technology Licensing, Llc Persistent storage segment caching for data recovery

Families Citing this family (48)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6591337B1 (en) * 1999-04-05 2003-07-08 Lsi Logic Corporation Method and apparatus for caching objects in a disparate management environment
US6480930B1 (en) * 1999-09-15 2002-11-12 Emc Corporation Mailbox for controlling storage subsystem reconfigurations
JP3951547B2 (en) * 2000-03-24 2007-08-01 株式会社日立製作所 Data sharing method between hosts by replication
US6490659B1 (en) * 2000-03-31 2002-12-03 International Business Machines Corporation Warm start cache recovery in a dual active controller with cache coherency using stripe locks for implied storage volume reservations
US6980510B1 (en) * 2000-09-12 2005-12-27 International Business Machines Corporation Host interface adaptive hub storage system
US6625698B2 (en) * 2000-12-28 2003-09-23 Unisys Corporation Method and apparatus for controlling memory storage locks based on cache line ownership
US6862659B1 (en) * 2001-01-04 2005-03-01 Emc Corporation Utilizing disk cache as part of distributed cache
US7155569B2 (en) * 2001-02-28 2006-12-26 Lsi Logic Corporation Method for raid striped I/O request generation using a shared scatter gather list
US20070022364A1 (en) * 2001-06-14 2007-01-25 Mcbryde Lee Data management architecture
US7472231B1 (en) 2001-09-07 2008-12-30 Netapp, Inc. Storage area network data cache
US20030212859A1 (en) * 2002-05-08 2003-11-13 Ellis Robert W. Arrayed data storage architecture with simultaneous command of multiple storage media
US7275084B2 (en) * 2002-05-28 2007-09-25 Sun Microsystems, Inc. Method, system, and program for managing access to a device
US7069465B2 (en) * 2002-07-26 2006-06-27 International Business Machines Corporation Method and apparatus for reliable failover involving incomplete raid disk writes in a clustering system
EP1546884A4 (en) * 2002-09-16 2007-08-01 Tigi Corp Storage system architectures and multiple caching arrangements
JP2004110503A (en) * 2002-09-19 2004-04-08 Hitachi Ltd Memory control device, memory system, control method for memory control device, channel control part and program
US20050166086A1 (en) * 2002-09-20 2005-07-28 Fujitsu Limited Storage control apparatus, storage control method, and computer product
US6993630B1 (en) * 2002-09-26 2006-01-31 Unisys Corporation Data pre-fetch system and method for a cache memory
US6934810B1 (en) * 2002-09-26 2005-08-23 Unisys Corporation Delayed leaky write system and method for a cache memory
US6973541B1 (en) * 2002-09-26 2005-12-06 Unisys Corporation System and method for initializing memory within a data processing system
US6976128B1 (en) * 2002-09-26 2005-12-13 Unisys Corporation Cache flush system and method
US7017017B2 (en) * 2002-11-08 2006-03-21 Intel Corporation Memory controllers with interleaved mirrored memory modes
US7130229B2 (en) * 2002-11-08 2006-10-31 Intel Corporation Interleaved mirrored memory systems
US7080060B2 (en) * 2003-01-08 2006-07-18 Sbc Properties, L.P. System and method for intelligent data caching
US7827282B2 (en) * 2003-01-08 2010-11-02 At&T Intellectual Property I, L.P. System and method for processing hardware or service usage data
JP4100256B2 (en) * 2003-05-29 2008-06-11 株式会社日立製作所 Communication method and information processing apparatus
DE10327955A1 (en) * 2003-06-20 2005-01-13 Fujitsu Siemens Computers Gmbh A mass storage device and method for operating a mass storage device
US7035952B2 (en) * 2003-09-24 2006-04-25 Hewlett-Packard Development Company, L.P. System having storage subsystems and a link coupling the storage subsystems
US7533181B2 (en) * 2004-02-26 2009-05-12 International Business Machines Corporation Apparatus, system, and method for data access management
JP4402997B2 (en) * 2004-03-26 2010-01-20 株式会社日立製作所 Storage device
US7246258B2 (en) * 2004-04-28 2007-07-17 Lenovo (Singapore) Pte. Ltd. Minimizing resynchronization time after backup system failures in an appliance-based business continuance architecture
JP2006072634A (en) * 2004-09-01 2006-03-16 Hitachi Ltd Disk device
JP4448005B2 (en) * 2004-10-22 2010-04-07 株式会社日立製作所 Storage system
US20060129559A1 (en) * 2004-12-15 2006-06-15 Dell Products L.P. Concurrent access to RAID data in shared storage
US20060236032A1 (en) * 2005-04-13 2006-10-19 Campbell Brian K Data storage system having memory controller with embedded CPU
US7350031B2 (en) * 2005-06-28 2008-03-25 Intel Corporation Mechanism for automatic backups in a mobile system
US20070022250A1 (en) * 2005-07-19 2007-01-25 International Business Machines Corporation System and method of responding to a cache read error with a temporary cache directory column delete
US7788420B2 (en) * 2005-09-22 2010-08-31 Lsi Corporation Address buffer mode switching for varying request sizes
US7558981B2 (en) * 2005-10-18 2009-07-07 Dot Hill Systems Corp. Method and apparatus for mirroring customer data and metadata in paired controllers
US20080005509A1 (en) * 2006-06-30 2008-01-03 International Business Machines Corporation Caching recovery information on a local system to expedite recovery
US8423720B2 (en) * 2007-05-10 2013-04-16 International Business Machines Corporation Computer system, method, cache controller and computer program for caching I/O requests
US20090006804A1 (en) * 2007-06-29 2009-01-01 Seagate Technology Llc Bi-level map structure for sparse allocation of virtual storage
US7827441B1 (en) * 2007-10-30 2010-11-02 Network Appliance, Inc. Disk-less quorum device for a clustered storage system
US8255562B2 (en) * 2008-06-30 2012-08-28 International Business Machines Corporation Adaptive data throttling for storage controllers
US8898674B2 (en) * 2009-12-23 2014-11-25 International Business Machines Corporation Memory databus utilization management system and computer program product
US8954670B1 (en) 2011-04-18 2015-02-10 American Megatrends, Inc. Systems and methods for improved fault tolerance in RAID configurations
US9268644B1 (en) * 2011-04-18 2016-02-23 American Megatrends, Inc. Systems and methods for raid acceleration
US8825724B2 (en) 2012-03-29 2014-09-02 Lsi Corporation File system hinting
US9509747B2 (en) * 2014-01-23 2016-11-29 Dropbox, Inc. Content item synchronization by block

Family Cites Families (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5257367A (en) 1987-06-02 1993-10-26 Cab-Tek, Inc. Data storage system with asynchronous host operating system communication link
US5163131A (en) 1989-09-08 1992-11-10 Auspex Systems, Inc. Parallel i/o network file server architecture
US5249279A (en) 1989-11-03 1993-09-28 Compaq Computer Corporation Method for controlling disk array operations by receiving logical disk requests and translating the requests to multiple physical disk specific commands
US5493668A (en) * 1990-12-14 1996-02-20 International Business Machines Corporation Multiple processor system having software for selecting shared cache entries of an associated castout class for transfer to a DASD with one I/O operation
US5274799A (en) * 1991-01-04 1993-12-28 Array Technology Corporation Storage device array architecture with copyback cache
US5297258A (en) 1991-11-21 1994-03-22 Ast Research, Inc. Data logging for hard disk data storage systems
JP2888401B2 (en) * 1992-08-03 1999-05-10 インターナショナル・ビジネス・マシーンズ・コーポレイション Synchronization method for redundant disk drive arrays
US5423046A (en) 1992-12-17 1995-06-06 International Business Machines Corporation High capacity data storage system using disk array
US5539345A (en) 1992-12-30 1996-07-23 Digital Equipment Corporation Phase detector apparatus
JP3264465B2 (en) * 1993-06-30 2002-03-11 株式会社日立製作所 Storage system
US5548711A (en) * 1993-08-26 1996-08-20 Emc Corporation Method and apparatus for fault tolerant fast writes through buffer dumping
US5504882A (en) 1994-06-20 1996-04-02 International Business Machines Corporation Fault tolerant data storage subsystem employing hierarchically arranged controllers
US5499341A (en) 1994-07-25 1996-03-12 Loral Aerospace Corp. High performance image storage and distribution apparatus having computer bus, high speed bus, ethernet interface, FDDI interface, I/O card, distribution card, and storage units
US5619642A (en) * 1994-12-23 1997-04-08 Emc Corporation Fault tolerant memory system which utilizes data from a shadow memory device upon the detection of erroneous data in a main memory device
US5640506A (en) * 1995-02-15 1997-06-17 Mti Technology Corporation Integrity protection for parity calculation for raid parity cache
EP0787323A1 (en) * 1995-04-18 1997-08-06 International Business Machines Corporation High available error self-recovering shared cache for multiprocessor systems
US5917723A (en) * 1995-05-22 1999-06-29 Lsi Logic Corporation Method and apparatus for transferring data between two devices with reduced microprocessor overhead
US5588110A (en) * 1995-05-23 1996-12-24 Symbios Logic Inc. Method for transferring data between two devices that insures data recovery in the event of a fault
WO1997007464A1 (en) * 1995-08-11 1997-02-27 Siemens Nixdorf Informationssysteme Ag Arrangement for connecting peripheral storage devices
US5826001A (en) * 1995-10-13 1998-10-20 Digital Equipment Corporation Reconstructing data blocks in a raid array data storage system having storage device metadata and raid set metadata
US5761705A (en) * 1996-04-04 1998-06-02 Symbios, Inc. Methods and structure for maintaining cache consistency in a RAID controller having redundant caches
US5884098A (en) * 1996-04-18 1999-03-16 Emc Corporation RAID controller system utilizing front end and back end caching systems including communication path connecting two caching systems and synchronizing allocation of blocks in caching systems
US5778430A (en) * 1996-04-19 1998-07-07 Eccs, Inc. Method and apparatus for computer disk cache management
US5819310A (en) * 1996-05-24 1998-10-06 Emc Corporation Method and apparatus for reading data from mirrored logical volumes on physical disk drives
US5812754A (en) * 1996-09-18 1998-09-22 Silicon Graphics, Inc. Raid system with fibre channel arbitrated loop
US6009481A (en) * 1996-09-30 1999-12-28 Emc Corporation Mass storage system using internal system-level mirroring
US5892913A (en) * 1996-12-02 1999-04-06 International Business Machines Corporation System and method for datastreams employing shared loop architecture multimedia subsystem clusters
US6098149A (en) * 1997-06-13 2000-08-01 Emc Corporation Method and apparatus for extending commands in a cached disk array
US6055603A (en) * 1997-09-18 2000-04-25 Emc Corporation Method and apparatus for performing pre-request operations in a cached disk array storage system
US6041394A (en) * 1997-09-24 2000-03-21 Emc Corporation Disk array write protection at the sub-unit level
US6101589A (en) * 1998-04-01 2000-08-08 International Business Machines Corporation High performance shared cache

Cited By (100)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6347358B1 (en) * 1998-12-22 2002-02-12 Nec Corporation Disk control unit and disk control method
US7516168B2 (en) * 1999-06-25 2009-04-07 Emc Corporation Program for monitoring update activity in a data storage facility
US20040128326A1 (en) * 1999-06-25 2004-07-01 Lecrone Douglas E. Method and apparatus for monitoring update activity in a data storage facility
US7487152B1 (en) * 2000-05-31 2009-02-03 International Business Machines Corporation Method for efficiently locking resources of a global data repository
US6745310B2 (en) * 2000-12-01 2004-06-01 Yan Chiew Chow Real time local and remote management of data files and directories and method of operating the same
US20060031287A1 (en) * 2001-01-29 2006-02-09 Ulrich Thomas R Systems and methods for load balancing drives and servers
US20050144514A1 (en) * 2001-01-29 2005-06-30 Ulrich Thomas R. Dynamic redistribution of parity groups
US20020174295A1 (en) * 2001-01-29 2002-11-21 Ulrich Thomas R. Enhanced file system failure tolerance
US20080126704A1 (en) * 2001-01-29 2008-05-29 Adaptec, Inc. Systems and methods for storing parity groups
US20020124137A1 (en) * 2001-01-29 2002-09-05 Ulrich Thomas R. Enhancing disk array performance via variable parity based load balancing
US20020165942A1 (en) * 2001-01-29 2002-11-07 Ulrich Thomas R. Data path accelerator with variable parity, variable length, and variable extent parity groups
US6754773B2 (en) 2001-01-29 2004-06-22 Snap Appliance, Inc. Data engine with metadata processor
US20020156973A1 (en) * 2001-01-29 2002-10-24 Ulrich Thomas R. Enhanced disk array
US6775792B2 (en) 2001-01-29 2004-08-10 Snap Appliance, Inc. Discrete mapping of parity blocks
US8943513B2 (en) 2001-01-29 2015-01-27 Overland Storage, Inc. Systems and methods for load balancing drives and servers by pushing a copy of a frequently accessed file to another disk drive
US8214590B2 (en) 2001-01-29 2012-07-03 Overland Storage, Inc. Systems and methods for storing parity groups
US6871295B2 (en) * 2001-01-29 2005-03-22 Adaptec, Inc. Dynamic data recovery
US7917695B2 (en) 2001-01-29 2011-03-29 Overland Storage, Inc. Systems and methods for storing parity groups
US20020166079A1 (en) * 2001-01-29 2002-11-07 Ulrich Thomas R. Dynamic data recovery
US8782661B2 (en) 2001-01-29 2014-07-15 Overland Storage, Inc. Systems and methods for load balancing drives and servers
US20020138559A1 (en) * 2001-01-29 2002-09-26 Ulrich Thomas R. Dynamically distributed file system
US20020156974A1 (en) * 2001-01-29 2002-10-24 Ulrich Thomas R. Redundant dynamically distributed file system
US6990667B2 (en) 2001-01-29 2006-01-24 Adaptec, Inc. Server-independent object positioning for load balancing drives and servers
US6990547B2 (en) 2001-01-29 2006-01-24 Adaptec, Inc. Replacing file system processors by hot swapping
US10079878B2 (en) 2001-01-29 2018-09-18 Overland Storage, Inc. Systems and methods for load balancing drives and servers by pushing a copy of a frequently accessed file to another disk drive
US6799284B1 (en) * 2001-02-28 2004-09-28 Network Appliance, Inc. Reparity bitmap RAID failure recovery
US7080197B2 (en) * 2002-04-18 2006-07-18 Lsi Logic Corporation System and method of cache management for storage controllers
US20030200389A1 (en) * 2002-04-18 2003-10-23 Odenwald Louis H. System and method of cache management for storage controllers
US6983349B2 (en) * 2002-04-30 2006-01-03 Hitachi, Ltd. Method, system, and storage controller for controlling shared memories
US20030204683A1 (en) * 2002-04-30 2003-10-30 Hitachi, Ltd. Method, system, and storage controller for controlling shared memories
EP1431861A3 (en) * 2002-12-17 2006-07-19 Activcard Ireland Limited Security token sharable data and synchronization cache
US6965979B2 (en) 2003-01-29 2005-11-15 Pillar Data Systems, Inc. Methods and systems of host caching
US7447939B1 (en) 2003-02-28 2008-11-04 Sun Microsystems, Inc. Systems and methods for performing quiescence in a storage virtualization environment
US8166128B1 (en) 2003-02-28 2012-04-24 Oracle America, Inc. Systems and methods for dynamically updating a virtual volume in a storage virtualization environment
US7430568B1 (en) 2003-02-28 2008-09-30 Sun Microsystems, Inc. Systems and methods for providing snapshot capabilities in a storage virtualization environment
US7383381B1 (en) 2003-02-28 2008-06-03 Sun Microsystems, Inc. Systems and methods for configuring a storage virtualization environment
US7236987B1 (en) 2003-02-28 2007-06-26 Sun Microsystems Inc. Systems and methods for providing a storage virtualization environment
US7290168B1 (en) 2003-02-28 2007-10-30 Sun Microsystems, Inc. Systems and methods for providing a multi-path network switch system
US7610446B2 (en) * 2003-06-19 2009-10-27 Fujitsu Limited RAID apparatus, RAID control method, and RAID control program
US20050216660A1 (en) * 2003-06-19 2005-09-29 Fujitsu Limited RAID apparatus, RAID control method, and RAID control program
US20050050273A1 (en) * 2003-08-27 2005-03-03 Horn Robert L. RAID controller architecture with integrated map-and-forward function, virtualization, scalability, and mirror consistency
US7197599B2 (en) * 2003-12-29 2007-03-27 Intel Corporation Method, system, and program for managing data updates
US20050144381A1 (en) * 2003-12-29 2005-06-30 Corrado Francis R. Method, system, and program for managing data updates
US7770059B1 (en) * 2004-03-26 2010-08-03 Emc Corporation Failure protection in an environment including virtualization of networked storage resources
US20080109627A1 (en) * 2004-11-10 2008-05-08 Matsushita Electric Industrial Co., Ltd. Nonvolatile Memory Device And Method For Accessing Nonvolatile Memory Device
US7320055B2 (en) * 2004-12-02 2008-01-15 Fujitsu Limited Storage system, and control method and program thereof
US20060123200A1 (en) * 2004-12-02 2006-06-08 Fujitsu Limited Storage system, and control method and program thereof
US7779294B2 (en) 2005-04-15 2010-08-17 Intel Corporation Power-safe disk storage apparatus, systems, and methods
US20060236029A1 (en) * 2005-04-15 2006-10-19 Corrado Francis R Power-safe disk storage apparatus, systems, and methods
US20060282700A1 (en) * 2005-06-10 2006-12-14 Cavallo Joseph S RAID write completion apparatus, systems, and methods
US7441146B2 (en) 2005-06-10 2008-10-21 Intel Corporation RAID write completion apparatus, systems, and methods
US7562188B2 (en) 2005-06-17 2009-07-14 Intel Corporation RAID power safe apparatus, systems, and methods
US20060288161A1 (en) * 2005-06-17 2006-12-21 Cavallo Joseph S RAID power safe apparatus, systems, and methods
US20070180296A1 (en) * 2005-10-07 2007-08-02 Byrne Richard J Back-annotation in storage-device array
US7644303B2 (en) 2005-10-07 2010-01-05 Agere Systems Inc. Back-annotation in storage-device array
US7653783B2 (en) 2005-10-07 2010-01-26 Agere Systems Inc. Ping-pong state machine for storage-device array
US7769948B2 (en) 2005-10-07 2010-08-03 Agere Systems Inc. Virtual profiles for storage-device array encoding/decoding
US20070180298A1 (en) * 2005-10-07 2007-08-02 Byrne Richard J Parity rotation in storage-device array
US8291161B2 (en) * 2005-10-07 2012-10-16 Agere Systems Llc Parity rotation in storage-device array
US20070180297A1 (en) * 2005-10-07 2007-08-02 Byrne Richard J Ping-pong state machine for storage-device array
US20070180295A1 (en) * 2005-10-07 2007-08-02 Byrne Richard J Virtual profiles for storage-device array encoding/decoding
US7861107B1 (en) * 2006-08-14 2010-12-28 Network Appliance, Inc. Dual access pathways to serially-connected mass data storage units
US7546415B2 (en) 2006-08-15 2009-06-09 International Business Machines Corporation Apparatus, system, and method for integrating multiple raid storage instances within a blade center
US20080046647A1 (en) * 2006-08-15 2008-02-21 Katherine Tyldesley Blinick Apparatus, system, and method for integrating multiple raid storage instances within a blade center
CN101165666B (en) * 2006-10-17 2011-07-20 International Business Machines Corporation Method and device for establishing address conversion in a data processing system
US20090077333A1 (en) * 2007-09-18 2009-03-19 Agere Systems Inc. Double degraded array protection in an integrated network attached storage device
US7861036B2 (en) 2007-09-18 2010-12-28 Agere Systems Inc. Double degraded array protection in an integrated network attached storage device
US8001417B2 (en) 2007-12-30 2011-08-16 Agere Systems Inc. Method and apparatus for repairing uncorrectable drive errors in an integrated network attached storage device
US20090172464A1 (en) * 2007-12-30 2009-07-02 Agere Systems Inc. Method and apparatus for repairing uncorrectable drive errors in an integrated network attached storage device
US8615678B1 (en) * 2008-06-30 2013-12-24 Emc Corporation Auto-adapting multi-tier cache
US8819478B1 (en) * 2008-06-30 2014-08-26 Emc Corporation Auto-adapting multi-tier cache
CN102541471A (en) * 2011-12-28 2012-07-04 创新科软件技术(深圳)有限公司 Storage system with multiple controllers
US20130239124A1 (en) * 2012-01-20 2013-09-12 Mentor Graphics Corporation Event Queue Management For Embedded Systems
US10157089B2 (en) 2012-01-20 2018-12-18 Mentor Graphics Corporation Event queue management for embedded systems
US9288077B1 (en) * 2012-09-28 2016-03-15 Emc Corporation Cluster file system with server block cache
US10761735B2 (en) * 2014-03-17 2020-09-01 Primaryio, Inc. Tier aware caching solution to increase application performance
US10402113B2 (en) 2014-07-31 2019-09-03 Hewlett Packard Enterprise Development Lp Live migration of data
US11016683B2 (en) 2014-09-02 2021-05-25 Hewlett Packard Enterprise Development Lp Serializing access to fault tolerant memory
US10540109B2 (en) 2014-09-02 2020-01-21 Hewlett Packard Enterprise Development Lp Serializing access to fault tolerant memory
US10594442B2 (en) 2014-10-24 2020-03-17 Hewlett Packard Enterprise Development Lp End-to-end negative acknowledgment
US10664369B2 (en) 2015-01-30 2020-05-26 Hewlett Packard Enterprise Development Lp Determine failed components in fault-tolerant memory
US10402287B2 (en) * 2015-01-30 2019-09-03 Hewlett Packard Enterprise Development Lp Preventing data corruption and single point of failure in a fault-tolerant memory
US10409681B2 (en) 2015-01-30 2019-09-10 Hewlett Packard Enterprise Development Lp Non-idempotent primitives in fault-tolerant memory
US10185612B2 (en) * 2015-02-20 2019-01-22 Siemens Aktiengesellschaft Analyzing the availability of a system
US20160246661A1 (en) * 2015-02-20 2016-08-25 Kai Höfig Analyzing the availability of a system
US20160266802A1 (en) * 2015-03-10 2016-09-15 Kabushiki Kaisha Toshiba Storage device, memory system and method of managing data
US10402261B2 (en) 2015-03-31 2019-09-03 Hewlett Packard Enterprise Development Lp Preventing data corruption and single point of failure in fault-tolerant memory fabrics
US20170097887A1 (en) * 2015-10-02 2017-04-06 Netapp, Inc. Storage Controller Cache Having Reserved Parity Area
CN105511811A (en) * 2015-12-07 2016-04-20 Inspur (Beijing) Electronic Information Industry Co., Ltd. Method and system for improving file system throughput
US20210090619A1 (en) * 2016-02-09 2021-03-25 Samsung Electronics Co., Ltd. Multi-port memory device and a method of using the same
US11837319B2 (en) * 2016-02-09 2023-12-05 Samsung Electronics Co., Ltd. Multi-port memory device and a method of using the same
US10389342B2 (en) 2017-06-28 2019-08-20 Hewlett Packard Enterprise Development Lp Comparator
US20190042413A1 (en) * 2018-03-02 2019-02-07 Intel Corporation Method and apparatus to provide predictable read latency for a storage device
US20190317898A1 (en) * 2018-04-12 2019-10-17 International Business Machines Corporation Using track locks and stride group locks to manage cache operations
CN111837102A (en) * 2018-04-12 2020-10-27 国际商业机器公司 Managing cache operations using track locks and stride group locks
US11151037B2 (en) * 2018-04-12 2021-10-19 International Business Machines Corporation Using track locks and stride group locks to manage cache operations
US10831597B2 (en) 2018-04-27 2020-11-10 International Business Machines Corporation Receiving, at a secondary storage controller, information on modified data from a primary storage controller to use to calculate parity data
US10884849B2 (en) 2018-04-27 2021-01-05 International Business Machines Corporation Mirroring information on modified data from a primary storage controller to a secondary storage controller for the secondary storage controller to use to calculate parity data
CN109298837A (en) * 2018-09-13 2019-02-01 Zhengzhou Yunhai Information Technology Co., Ltd. Multi-controller cache backup method, apparatus, device, and readable storage medium
US11099948B2 (en) * 2018-09-21 2021-08-24 Microsoft Technology Licensing, Llc Persistent storage segment caching for data recovery

Also Published As

Publication number Publication date
WO1999017208A1 (en) 1999-04-08
AU9673398A (en) 1999-04-23
US6381674B2 (en) 2002-04-30

Similar Documents

Publication Publication Date Title
US6381674B2 (en) Method and apparatus for providing centralized intelligent cache between multiple data controlling elements
US6912669B2 (en) Method and apparatus for maintaining cache coherency in a storage system
US5895485A (en) Method and device using a redundant cache for preventing the loss of dirty data
US7600152B2 (en) Configuring cache memory from a storage controller
US6658542B2 (en) Method and system for caching data in a storage system
US6968425B2 (en) Computer systems, disk systems, and method for controlling disk cache
US5459857A (en) Fault tolerant disk array data storage subsystem
US5051887A (en) Maintaining duplex-paired storage devices during gap processing using a dual copy function
US6151659A (en) Distributed raid storage system
US6058489A (en) On-line disk array reconfiguration
US8028191B2 (en) Methods and systems for implementing shared disk array management functions
US7269667B2 (en) Disk array system and method for migrating from one storage system to another
US7849254B2 (en) Create virtual track buffers in NVS using customer segments to maintain newly written data across a power loss
US6272662B1 (en) Distributed storage system using front-end and back-end locking
US7130961B2 (en) Disk controller and method of controlling the cache
US20070094465A1 (en) Mirroring mechanisms for storage area networks and network based virtualization
US20090228651A1 (en) Mirroring Mechanisms For Storage Area Networks and Network Based Virtualization
JP3713788B2 (en) Storage device and storage device system
US6446220B1 (en) Updating data and parity data with and without read caches
CN111722791A (en) Information processing system, storage system, and data transmission method
JP4911198B2 (en) Storage control device, storage system, and storage control method
CN107533537B (en) Storage system, storage method, and non-transitory computer-readable medium
CN111857540A (en) Data access method, device and computer program product
US9244868B2 (en) Leased lock in active-active high availability DAS systems
EP0303856B1 (en) Method and apparatus for maintaining duplex-paired devices by means of a dual copy function

Legal Events

Date Code Title Description
AS Assignment

Owner name: SYMBIOS, INC., COLORADO

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:DEKONING, RODNEY A.;WEBER, BRET S.;REEL/FRAME:009066/0951

Effective date: 19980305

AS Assignment

Owner name: LSI LOGIC CORPORATION, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:SYMBIOS, INC.;REEL/FRAME:009500/0554

Effective date: 19980922

STCF Information on status: patent grant

Free format text: PATENTED CASE

FPAY Fee payment

Year of fee payment: 4

FEPP Fee payment procedure

Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

FPAY Fee payment

Year of fee payment: 8

AS Assignment

Owner name: NETAPP, INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:LSI LOGIC CORPORATION;REEL/FRAME:026661/0205

Effective date: 20110506

FPAY Fee payment

Year of fee payment: 12