WO2007077585A1 - Computer system comprising at least two servers and method of operation - Google Patents

Computer system comprising at least two servers and method of operation Download PDF

Info

Publication number
WO2007077585A1
WO2007077585A1 (PCT application PCT/IT2005/000783)
Authority
WO
WIPO (PCT)
Prior art keywords
computer
storage block
server
configuration
access
Prior art date
Application number
PCT/IT2005/000783
Other languages
French (fr)
Inventor
Baldo Dal Porto
Original Assignee
Elettrodata S.P.A.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Elettrodata S.P.A. filed Critical Elettrodata S.P.A.
Priority to PCT/IT2005/000783 priority Critical patent/WO2007077585A1/en
Publication of WO2007077585A1 publication Critical patent/WO2007077585A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/20Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
    • G06F11/2023Failover techniques
    • G06F11/2025Failover techniques using centralised failover control functionality
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1415Saving, restoring, recovering or retrying at system level
    • G06F11/142Reconfiguring to eliminate the error
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/20Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
    • G06F11/2038Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant with a single idle spare processing component
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/20Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
    • G06F11/2046Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant where the redundant components share persistent storage

Abstract

The invention regards a computer system (100) comprising at least a first server (SRV1) connected to a second server (SRV2), and a storage block (9) of system resources. The system is characterised in that it moreover comprises a management block (7, 7') capable of selectively bringing the system into a first configuration, in which access to the storage block (9) is permitted to the first server and blocked to the second server, and into a second configuration opposite the first, the management block permitting a switching from the first to the second configuration following the detection of an operation anomaly of the first server.

Description

DESCRIPTION
"Computer system comprising at least two servers and method of operation"
[0001] The present invention refers to a computer system of the type comprising at least two computers, in particular two servers, connected so as to form a single entity for an outside user.
[0002] With the increase in importance of information technology infrastructure, the need has also grown for systems capable of guaranteeing a high reliability of both internal and external services, beginning with the servers. The reliability of a system is tied to its capacity to resume, or not interrupt, its normal functioning even following a defect (such as a failure or malfunction) of its components or of components to which it is connected, while ensuring a certain consistency of the data in this situation.
[0003] In the past, the need for high reliability servers always came up against the complexity and cost of the necessary hardware and software, both for the purchase and for its maintenance.
[0004] With reference to conventional single server systems (i.e., those systems which employ a single computer server), over the last few years these have markedly improved in terms of their capacity to withstand several types of malfunction, thanks to the presence of redundant cooling and power supply subsystems, data protection mechanisms based on RAID technology (Redundant Array of Independent Disks), and "intelligent" management systems for memory and processors. The aim has been to limit the damage caused by the failure of the most delicate parts.
[0005] Despite technological progress, however, a single server is still subject to numerous failure or malfunction risks, which may block the supply of services. The most serious limitation is due to the fact that it is not possible to foresee the time necessary to restore operation. Indeed, such time depends, in turn, on the assistance times, which vary in relation to the hour (day/night), the day (workday/holiday) and the distance which the technical assistant must cover. The risk is worsened by the time necessary for finding spare parts, also in consideration of the fact that technology evolves rapidly and often, after a few years, some components are no longer easily found on the market. For this reason, some companies paradoxically decide to purchase two complete servers and to keep one as a spare for emergency situations. Cluster technology permits uniting two or more servers, not necessarily identical, within a single logical entity. Moreover, the complexity (and cost) of a cluster with more than two nodes is such that this configuration will not be taken into consideration here.
[0006] From the physical point of view, all the servers (nodes) which form part of a cluster are united by a dedicated connection (normally a LAN) and may gain access to the same pool of disks (with SCSI, SATA/ATA or Fiber Channel technology). The nodes all have a local, unshared disk, which hosts the operating system and applications, or part of them, while the user data lies in the shared pool.
[0007] Another conventional approach to the problem of searching for a system which offers high reliability is that of cluster technology. Cluster technology foresees the virtualisation of resources such as network names, IP addresses, applications, services and disk space. These virtual units may be active on only one of the nodes of the cluster, and are moved in an entirely automatic manner from one node to the other when a malfunction has occurred or for maintenance operations.
[0008] The users of the services are automatically directed toward the node where they find the corresponding virtual units. There also exist particular applications which function simultaneously on both cluster nodes.
[0009] A cluster resolves the problem of high reliability but not that of "continuous processing": when a node fails, in fact, all activities underway on that node are stopped, with all consequent effects, as well as the connections opened towards the users. The services are moved to another node, where they are activated in the same mode in which they are activated at the start-up of the server. All of the main operating systems offer cluster options, very similar to each other in terms of functionality; of these, the most noted and widespread is that of Microsoft®, which was used for the examples contained in this document.
[0010] A cluster solution permits a considerable increase in quality, from the technological standpoint, with respect to a single server solution, but nevertheless has several critical issues. The cost of the operating system and clustering software is very high, equal to over 5 times the cost of the software for a single server; the cost of configuration, which is a rather delicate operation, is also higher. The management costs are also high, due to the complexity and fragility of the two nodes, in which a single wrong manoeuvre may lead to the loss of all of the user data. The solution, therefore, is not within the capacity of a normal System Administrator.
[0011] In addition to the above mentioned drawback, related to the high cost, the cluster also has a limitation due to the fact that the data of the users and of the applications are placed in a pool of disks directly visible (DAS) from every node through a bus of SCSI, SATA/ATA or Fiber Channel type. The failure of any component connected to this bus may block the entire bus and therefore the entire cluster (single point of failure).
[0012] Furthermore, in a cluster, one disk of the pool is used as a quorum disk and contains all that serves for the correct operation of the cluster itself. Data corruption on this disk causes the blocking of the entire cluster (single point of failure).
[0013] The object of the present invention is that of proposing a computer system which is less costly with respect to conventional cluster systems and which, at the same time, offers a higher reliability than that offered by the above mentioned single server systems.
[0014] The object of the present invention is achieved by a computer system as described in claim 1. Preferred embodiments of the system of the invention are defined in the dependent claims from 2 to 14. Also an object of the present invention is an operation method of a computer system as described in claim 15, with its preferred embodiments defined in the dependent claims from 16 to 22.
[0015] To better understand the invention and appreciate the advantages, some of its exemplifying and not limiting embodiments are described below, making reference to the attached drawings, wherein:
Figure 1 schematically shows an embodiment of the computer system according to the invention comprising two servers;
Figure 2 is a perspective view of said system in assembled configuration;
Figure 3 shows, through a flow diagram and in a schematic manner, several operation steps of said system according to a particular operating situation.
[0016] In figure 1, a computer system 100 is shown in schematic manner, realised in accordance with an example of the invention. The computer system 100 comprises at least a first server SRV1 connected to a second server SRV2 so as to form a single entity for an outside user who uses the system. It is observed that, even if the teachings of the invention are particularly adapted to systems which employ servers, they are also applicable to other types of electronic processors for which the provided reliability may be of interest.
[0017] According to the described example, the first server SRV1 is of per se conventional type and comprises a first motherboard 1 (MTHB), a first control and processing unit 2 (CPU), a first memory 3 (MEM), such as a RAM memory, and a first RAID controller module 4 (CTRL-RAID).
[0018] For example, the motherboard 1 may be an Intel SE7320SP2 board capable of loading a conventional BIOS (Basic Input/Output System) such as, for example, an AMI BIOS. As known, the BIOS is software which determines the operations which the respective computer must carry out without accessing applications residing on a disk.
[0019] As is clear to those skilled in the art, the RAID controller module (where the acronym RAID stands for Redundant Array of Independent Disks) is responsible for the management of the disks, also resolving failure situations by means of reorganisation of the same.
[0020] Furthermore, advantageously, the first server SRV1 includes a first LAN interface module 5 (INTR-LAN) which permits the server itself to access a LAN (Local Area Network) and therefore an outside network such as the Internet.
[0021] In figure 1, a first power supply unit 6 (PWR) is also schematised, such as a power supply connected to the public supply network, in order to provide the server with adequate supply voltage. Such first power supply unit may be controlled by appropriate signals in order to interrupt the supply of the server and, therefore, shut it off.
[0022] The first server SRV1 is also provided with a first electronic management block 7 (ELTR-MNG) which is entrusted, as shall be made clear below, with part of the management of anomalies or failures in the system. Advantageously, the first electronic management block is realised in hardware, for example by means of integrated logic circuits, so that it is more reliable than software modules.
[0023] The second server SRV2 may be realised in an analogous manner to the first server SRV1. Therefore, the second server SRV2 may comprise modules and blocks analogous to the corresponding components of the first server, represented in figure 1 with the same numerical references used for the first server, but with a prime "'". It is observed that the two servers SRV1 and SRV2 are independent from each other and are intended to operate alternately.
[0024] The computer system 100 also includes an interconnection module 8 (INTRC-MD) having a first input port P1 connected to the first server SRV1 (in particular, to the first management block 7), a second port P2 connected to the second server SRV2 (in particular, to the second management block 7') and a third port P3 connected to a storage block 9.
[0025] For example, the interconnection module 8 is a passive component (provided with transmission or bus lines appropriately connected to the aforementioned ports) which makes possible both the access to the storage block 9 by the first server SRV1 or the second server SRV2 and the exchange of signals of different types between the first management block 7 and the second management block 7', and therefore between the two servers.
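To make the role of the interconnection module 8 more concrete, the sketch below models its exclusive routing of the storage block in Python. It is only an illustrative model under assumed names (InterconnectModule, Port, grant_storage and revoke_storage are invented here): the actual module described above is a passive hardware component, not software.

```python
from enum import Enum
from typing import Optional


class Port(Enum):
    """Server-side ports of the interconnection module of figure 1."""
    P1 = "first server SRV1"
    P2 = "second server SRV2"


class InterconnectModule:
    """Toy model of module 8: the storage block (port P3) is routed
    to at most one of the two server ports at any given time."""

    def __init__(self) -> None:
        self.storage_owner: Optional[Port] = None

    def grant_storage(self, port: Port) -> None:
        """Give `port` control of the storage block, refusing if the
        other port currently holds it."""
        if self.storage_owner is not None and self.storage_owner is not port:
            raise RuntimeError(f"storage block held by {self.storage_owner.name}")
        self.storage_owner = port

    def revoke_storage(self, port: Port) -> None:
        """Release the storage block if `port` currently holds it."""
        if self.storage_owner is port:
            self.storage_owner = None


# Usage: SRV1 takes the storage block at start-up; SRV2 is refused
# until the grant to SRV1 is revoked (e.g. after SRV1 is shut off).
module = InterconnectModule()
module.grant_storage(Port.P1)
try:
    module.grant_storage(Port.P2)
except RuntimeError as err:
    print(err)              # storage block held by P1
module.revoke_storage(Port.P1)
module.grant_storage(Port.P2)
```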
[0026] The storage block 9 comprises memory modules (including, for example, hard disks) in which resources of the system 100, i.e. data and software applications, are stored. In the example of figure 1, a pool of eight hard disks HD1-HD8 is schematically shown, capable of storing at least the operating system (in a first hard disk HD1, for example), loadable from each of the two servers, and every other software application or data which may be employed by the servers SRV1 and SRV2 themselves.
[0027] According to one example, the first hard disk HD1 may store the following operating systems: Microsoft Windows Server 2003 Enterprise Edition, Microsoft Windows 2000 Advanced Server, Red Hat Linux Enterprise, SUSE LINUX Enterprise Server and Novell Netware.
[0028] Advantageously, each of the memory modules of the storage block 9 is equipped with a disk-management module EC1-EC8 (such as an electronic circuit which implements a control logic) for the respective hard disk HD1-HD8. For example, the hard disks HD1-HD8 may be in SATA technology (in particular, they are HDU SATA II disks) and, together with the related disk-management module EC1-EC8, they are mounted on a respective tray of the type which permits hot swap.
[0029] It is important to observe that the disk pool HD1-HD8 is not the property of only one of the two servers but is a pool open to both servers, which, however, only one server may access at a time. Moreover, the disk pool HD1-HD8 is optionally provided with a respective redundant power supply unit 10.
[0030] The storage block 9 is connected to the third port P3 of the interconnection module by means of a suitable bus 11 such as, in accordance with the example technology cited above, a bus of SATA type.
[0031] As shown by way of example in figure 2, from the standpoint of the external structure, the computer system 100 may be provided with a single container or chassis 20 in which the first server SRV1, the second server SRV2 and the storage block 9 are housed, as well as an interconnection panel (not shown in figure 2) which carries out the role of the module 8.
[0032] Furthermore, both the first server SRV1 and the second server SRV2 are equipped with a respective application for the identification of anomalies of the servers. In particular, each motherboard 1 and 1' is capable of running a (per se conventional) "watch-dog" application able to detect an anomalous event (such as a failure) of the other server. This type of application may be analogous to that implemented on cluster servers and foresees that one of the two servers periodically sends an inquiry or monitoring signal to the other server. In the case in which no response to the interrogation signal is received, the server which has sent such signal may deduce the presence of an anomaly in the other server.
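A minimal sketch of such a watch-dog loop is given below, in Python for illustration only; the function and parameter names (monitor_peer, send_inquiry, on_anomaly, the period and the miss threshold) are assumptions, since the patent does not specify signalling details or timing.

```python
import time
from typing import Callable


def monitor_peer(send_inquiry: Callable[[], bool],
                 on_anomaly: Callable[[], None],
                 period_s: float = 2.0,
                 max_missed: int = 3) -> None:
    """Minimal watch-dog loop run by the stand-by server.

    send_inquiry() is assumed to send one inquiry signal to the active
    server and return True if a response arrives in time; after
    max_missed consecutive misses the peer is considered failed and
    on_anomaly() is invoked (e.g. to trigger the switch-over).
    """
    missed = 0
    while True:
        if send_inquiry():
            missed = 0          # peer answered: reset the miss counter
        else:
            missed += 1
            if missed >= max_missed:
                on_anomaly()    # anomaly detected (FL-SRV condition)
                return
        time.sleep(period_s)
```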
[0033] The detectable anomalies or failures are of different types and, for example, comprise: non-functioning of the CPU 2 or 2' of the servers, failure of the server SRV1 or SRV2, failure of a hard disk HD1-HD8, failure of the RAID controller module 4 or 4', failure of a power supply unit 6, 6' or 10, operating system hang, connection hang, detachment of the cables, failure of a hub port, or non-functioning of an application definable by the user.
[0034] With reference to figure 3, an operation example of the computer system 100 will now be described.
After a symbolic start step SRT, both the first server SRV1 and the second server SRV2 are on and therefore powered (PWR-ON step, 31) by the units 6 and 6'.
[0035] Following the start-up, both servers SRV1 and SRV2 carry out the loading of the related BIOS (LD-BIOS step, 32). Furthermore, the system 100 is configured such that only one of the two servers, for example the first server SRV1, may access (and take control of) the storage block 9 by means of the interconnection module 8, and therefore access the contents of the hard disks HD1-HD8.
[0036] Therefore, the first server SRV1 accesses the first disk HD1 and loads the operating system (LD-OS step, 33), bringing itself to an active state. In such active state, the first server SRV1 may access all the disks of the storage block 9 to employ all the software applications residing there and make them run on its own control and processing unit 2, in a conventional manner (RUN-SW step, 34).
[0037] On the other hand, the second server SRV2, after having carried out the loading of the BIOS, does not access the storage block 9 and does not carry out the loading of the operating system, therefore bringing itself to a stand-by state.
[0038] The different behaviour of the two servers SRV1 and SRV2 following their start-up (permitted or blocked access to the storage block 9) is preferably imposed by the respective management blocks 7 and 7'.
[0039] During the normal operation of the first server SRV1, the second server SRV2, from the stand-by state, sends (thanks to its own watch-dog application) inquiry signals so as to be able to detect a possible failure situation of the first server SRV1 (monitoring MNRT step, 35). Such inquiry signals are sent from the second server SRV2 to the interconnection module 8 and then to the first server SRV1. Furthermore, each disk-management module EC1-EC8 sends and receives signals through the interconnection module 8 to/from the first server SRV1.
[0040] In the case in which an anomalous event related to the first server SRV1 is detected (FL-SRV step, 36), the system 100 blocks the first server SRV1's access to and control of the storage block 9 and instead permits access to the second server SRV2.
[0041] In particular, the blocking of the first server SRV1's access to the storage block advantageously occurs by shutting off the first server. More in detail, the management block 7 (which received from the first server SRV1 itself a signal informing it of the anomaly), acting on the power supply unit 6, controls its switching so as to interrupt the electric voltage supply to the first server SRV1 (PWR-OFF step, 37).
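The switch-over sequence of steps 36-39 can be summarised by the following Python sketch. It is a schematic illustration, not the patented implementation: the placeholder objects and method names (switch_off, grant_storage_to, reboot_and_load_os, run_resident_applications) are invented for clarity.

```python
def handle_anomaly(power_supply_1, interconnect, server_2) -> None:
    """Sketch of steps 36-39 above, after an anomaly of SRV1 is detected.

    power_supply_1, interconnect and server_2 are placeholder objects
    standing in for unit 6, module 8 and server SRV2; all method names
    are invented for this illustration.
    """
    # PWR-OFF (step 37): management block 7 switches off the supply
    # voltage of SRV1, which also removes its access to the storage block.
    power_supply_1.switch_off()

    # SWTC-SRV2 (step 38): management block 7' routes the storage block
    # to SRV2, which reboots and loads the operating system from disk HD1.
    interconnect.grant_storage_to("SRV2")
    server_2.reboot_and_load_os(boot_disk="HD1")

    # RUN-SW (step 39): SRV2, now active, runs the applications resident
    # on the storage block, working on the data left there by SRV1.
    server_2.run_resident_applications()
```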
[0042] Furthermore, the second server SRV2 is permitted access to the storage block 9 and is made to carry out a reboot, i.e. to load the operating system from the first hard disk HD1, bringing itself to an active state and taking full control of the storage block 9 (switching SWTC-SRV2 step, 38). The management of the step of transferring the storage block to the second server SRV2 is entrusted to the second management block 7'.
[0043] At this point, the second server SRV2, accessing the storage block 9, may read or update the data stored by the first server SRV1 and run the applications which are resident on such block 9 (RUN-SW step, 39) as needed. Hence, at the completion of the reboot carried out by the second server SRV2, the failure situation of the first server SRV1 is resolved, since the computer system 100 is capable of continuing its operation. The example of operation of the described computer system 100 therefore ends with the symbolic end step ED.
[0044] It is important to note that the transfer of the control of the storage block 9 of the system resources
(not of only one specific disk) from one server to another is in clear contrast with the mode in which server clusters operate, which instead foresee the transfer of individual resources from one node to the other.
[0045] It should be noted that both the first server SRV1 and the second server SRV2 are identified by the same IP address. Therefore, the outside user who contacts the system 100 through a client computer is not affected by the procedure of transferring the control of the resources from one server to the other as described above.
[0046] In order to avoid that an undesired failure of one or both management blocks 7 and 7' causes the connection of both servers SRV1 and SRV2 to the storage block 9, it has advantageously been foreseen that the disk-management modules EC1-EC8 operate such that the hard disks HD1-HD8 connect to the servers (through the interconnection module 8) in a selective manner, i.e. they connect to only one of the two servers, in each operating condition.
[0047] It is observed that, even if only two servers are shown in the previous description, the teachings of the present invention are also applicable to more than two servers, wherein one or more of these are normally found in the active state and the others are in the stand-by state (i.e., they have only carried out the loading of the BIOS). Such normally active servers are shut off following the detection of one or more anomalies, in a manner analogous to that described above.
[0048] The computer system 100 is particularly adapted for the following applications: File Server, Application Server, Mail Server, Back End Server, Front End Parallel Processing, Internet Security, VideoSurveillance, Storage Server and CTI (Computer Telephony Integration).
[0049] The invention has considerable advantages with respect to conventional systems and, in particular, to high reliability cluster server systems. In particular, the computer system 100 of the type described above is more economical than normal cluster servers, with performance in terms of response times (depending on the reboot time of the server to be brought into the active state) comparable to that of conventional clusters.
[0050] The lower cost of the computer system of the invention is explained by the fact that it does not require the purchase of software (operating system or application) dedicated to the management of the cluster mode, nor is it necessary to purchase the same software for both servers.
[0051] The Applicant has experimentally verified that the response times of the system 100 after the detection of a failure are approximately 60 seconds. Furthermore, the types of detectable failure are not fewer than those of cluster servers, since the same software applications may be used.
[0052] Another advantage is due to the use of distributed management blocks (such as the blocks 7 and 7' and EC1-EC8), preferably realised in hardware, which ensures a greater reliability with respect to a software management module and does not require the risky use of a quorum disk containing all of the management information of the system, as is instead the case for cluster servers.
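The selective, per-disk connection enforced by the modules EC1-EC8 can be illustrated with the short Python sketch below; the class and its attach/detach interface are hypothetical, chosen only to show the single-owner rule described in paragraph [0046].

```python
from typing import Optional


class DiskManagementModule:
    """Toy model of a disk-management module EC1-EC8: each hard disk is
    attached to at most one server at a time, so even a fault in the
    management blocks cannot expose a disk to both servers at once.
    Class and method names are illustrative only."""

    def __init__(self, disk_id: str) -> None:
        self.disk_id = disk_id
        self.attached_to: Optional[str] = None

    def attach(self, server: str) -> bool:
        """Attach the disk to `server`; refuse if another server holds it."""
        if self.attached_to not in (None, server):
            return False
        self.attached_to = server
        return True

    def detach(self, server: str) -> None:
        """Release the disk if `server` currently holds it."""
        if self.attached_to == server:
            self.attached_to = None


# Eight modules, one per disk HD1-HD8; all follow SRV1 while it is active.
modules = [DiskManagementModule(f"HD{i}") for i in range(1, 9)]
assert all(m.attach("SRV1") for m in modules)
assert not modules[0].attach("SRV2")   # refused while SRV1 holds HD1
```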

Claims

1. Computer system (100) comprising at least a first computer (SRV1) connected to a second computer (SRV2), and a storage block (9) of system resources, characterised in that it moreover comprises a management block (7, 7') capable of selectively bringing the system into a first configuration, in which access to the storage block (9) is permitted to the first computer and blocked to the second computer, and into a second configuration opposite that of the first, the management block permitting a switching from the first to the second configuration following the detection of an operation anomaly in the first computer.
2. System (100) according to claim 1, such that in the first configuration: - the first computer assumes an active state in which it is capable of accessing the storage block so as to employ at least one application residing on the storage block; - the second computer assumes a stand-by state in which it is excluded from the employment of said at least one application residing on the storage block.
3. System (100) according to claim 2, in which, at the detection of an anomaly of the first computer (SRV1), the management block (7, 7') is capable of generating signals for carrying out the following operations: - causing the shutdown of the first computer (SRV1) by blocking its electrical power supply; - bringing the second computer from the stand-by state to a respective active state in which it is capable of accessing the storage block (9) to employ said at least one application.
4. System (100) according to claim 1, moreover comprising an interconnection module (8) to permit access to the storage block (9) by the computers, and in which said management block includes a first subblock (7) associated with the first computer and a second subblock
(7') associated with the second computer, each of said subblocks being capable of sending/receiving signals to/from the respective computer and to each other by means of the interconnection module.
5. System (100) according to claim 4, in which each of said first (7) and second subblock (7') is an electronic circuit realised in hardware.
6. System (100) according to claim 3, configured such that: - to bring the first computer into the active state, the first computer loads the respective BIOS and may access the storage block (9) to load the operating system, while the second computer in the stand-by state has loaded the respective BIOS but is excluded from loading the operating system; - in the transition from the stand-by state to the active state, the second computer is such as to carry out a reboot operation, loading the operating system.
7. System (100) according to claim 1, in which the first and second computer have the same IP address.
8. System (100) according to at least one of the previous claims, in which at each of said computers there resides a respective software application for the detection of anomalies so that the second computer in the stand-by state sends inquiry signals to the first computer in the active state, an anomaly being detected when the first computer in the active state does not send a response signal to said inquiry signal.
9. System (100) according to claim 1, in which said anomaly includes at least one of the following conditions related to each of said computers: failure of a processor module of said computers, failure of the active computer, failure of a RAID control module, operating system hang, connection hang, cable detachment, hub port failure, non functioning of an application.
10. System (100) according to claim 1, in which each of said computers includes: a motherboard (1, 1'), a control and processing unit (2, 2'), a memory (3, 3'), a RAID control module (4, 4'), a LAN interface (5, 5') and a power supply unit (6, 6').
11. System (100) according to claim 1, in which said storage block (9) comprises a plurality of independent hard disks (HD1-HD8), each associated with a related control module (EC1-EC8) adapted to permit access to the related hard disk by said computers in a selective manner.
12. System (100) according to claim 1, in which the system comprises an outer container for housing the first and second computer, the storage block and the interconnection block.
13. System (100) according to at least one of the previous claims, in which said first and second computer are realised by means of a respective server.
14. System (100) according to claim 13, adapted to operate in at least one of the following applications: File Server, Application Server, Mail Server, Back End Server, Front End Parallel Processing, Internet Security, VideoSurveillance, Storage Server and CTI (Computer Telephony Integration).
15. Method of operation of a computer system (100) comprising at least a first computer (SRV1) connected to a second computer (SRV2), and a storage block (9) of system resources; the method includes: - bringing (31, 32, 33) the system into a first configuration in which the first computer (SRV1) may access the storage block and the second computer's access to the storage block is blocked; - detecting (36) an operation anomaly of the first computer; - bringing the system into a second configuration (37, 38) in which the second computer (SRV2) may access the storage block and the first computer's access to the storage block is blocked.
16. Method of operation according to claim 15, in which bringing the system into the first configuration includes: the first computer accessing the storage block and carrying out (33) a loading of the operating system to bring itself to an active state.
17. Method of operation according to claim 16, in which bringing the system into the first configuration moreover includes: the second computer carrying out (32) the loading of a related BIOS system residing on the second computer .
18. Method of operation according to claim 15, comprising: the first computer (34) in the first configuration running software applications residing on said storage block.
19. Method of operation according to claim 15, in which detecting an anomaly comprises: the second computer sending (35) inquiry signals to the first computer.
20. Method of operation according to claim 15, in which bringing the system into a second configuration includes : blocking (37) the power supply of the first computer, in order to shut it off.
21. Method of operation according to claim 20, in which bringing the system into a second configuration moreover includes: the second computer accessing the storage block and carrying out (38) a loading of the operating system to bring itself to a corresponding active state.
22. Method of operation according to at least one of the claims from 15 to 21, in which said first and second computer are realised by means of a respective server.
PCT/IT2005/000783 2005-12-30 2005-12-30 Computer system comprising at least two servers and method of operation WO2007077585A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/IT2005/000783 WO2007077585A1 (en) 2005-12-30 2005-12-30 Computer system comprising at least two servers and method of operation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/IT2005/000783 WO2007077585A1 (en) 2005-12-30 2005-12-30 Computer system comprising at least two servers and method of operation

Publications (1)

Publication Number Publication Date
WO2007077585A1 (en)

Family

ID=37434153

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IT2005/000783 WO2007077585A1 (en) 2005-12-30 2005-12-30 Computer system comprising at least two servers and method of operation

Country Status (1)

Country Link
WO (1) WO2007077585A1 (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5812751A (en) * 1995-05-19 1998-09-22 Compaq Computer Corporation Multi-server fault tolerance using in-band signalling
US6275953B1 (en) * 1997-09-26 2001-08-14 Emc Corporation Recovery from failure of a data processor in a network server
US20020194531A1 (en) * 2001-05-31 2002-12-19 Kenneth Lerman System and method for the use of reset logic in high availability systems
US20040205388A1 (en) * 2003-03-28 2004-10-14 Takayuki Nakano Method for managing computer, apparatus for managing computer, and computer readable medium storing program for managing computer

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2000911A1 (en) * 2007-06-04 2008-12-10 Reven.GE@S.R.L. Computer system comprising at least two computers, and related method for continuous operation of said system
EP2034411A1 (en) * 2007-09-06 2009-03-11 Siemens Aktiengesellschaft Method for replacing an electric device with parametering data with a replacement device
US8793514B2 (en) 2011-03-22 2014-07-29 International Business Machines Corporation Server systems having segregated power circuits for high availability applications
US9329653B2 (en) 2011-03-22 2016-05-03 International Business Machines Corporation Server systems having segregated power circuits for high availability applications
WO2014039424A1 (en) * 2012-09-04 2014-03-13 Violin Memory, Inc. System and method of power control for a high-availability system
US9912191B2 (en) 2012-09-04 2018-03-06 Violin Systems Llc System and method of power control for a high-availability system

Similar Documents

Publication Publication Date Title
US8028193B2 (en) Failover of blade servers in a data center
US6957353B2 (en) System and method for providing minimal power-consuming redundant computing hardware for distributed services
US7600157B2 (en) Recovering from a failed I/O controller in an information handling system
US7203846B2 (en) System and method for intelligent control of power consumption of distributed services during periods of reduced load
JP5523468B2 (en) Active-active failover for direct attached storage systems
CN1770707B (en) Apparatus and method for quorum-based power-down of unresponsive servers in a computer cluster
US7945773B2 (en) Failover of blade servers in a data center
US20080005222A1 (en) System and Method for Server Information Handling System Management Through Local I/O Devices
US7194655B2 (en) Method and system for autonomously rebuilding a failed server and a computer system utilizing the same
US20080162691A1 (en) Blade server management system
US20070220301A1 (en) Remote access control management module
US20050027751A1 (en) Network, storage appliance, and method for externalizing an internal I/O link between a server and a storage controller integrated within the storage appliance chassis
US10846159B2 (en) System and method for managing, resetting and diagnosing failures of a device management bus
US11349733B2 (en) Method and system for automatic detection and alert of changes of computing device components
US20040133771A1 (en) Method and apparatus for updating boot code using a system controller
US20060075292A1 (en) Storage system
US20070118641A1 (en) Securing serial console redirection via serial-over-LAN (SOL)
US8301920B2 (en) Shared power domain dynamic load based power loss detection and notification
US5765034A (en) Fencing system for standard interfaces for storage devices
US20040109406A1 (en) Facilitating communications with clustered servers
US6868479B1 (en) Data storage system having redundant service processors
WO2007077585A1 (en) Computer system comprising at least two servers and method of operation
US20050044207A1 (en) Service processor-based system discovery and configuration
US8984202B2 (en) Multiple host support for remote expansion apparatus
US6622257B1 (en) Computer network with swappable components

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 05850996; Country of ref document: EP; Kind code of ref document: A1)