DE19726088C1

DE19726088C1 - Performance increase method for multi-processor system

Info

Publication number: DE19726088C1
Application number: DE1997126088
Authority: DE
Inventors: Thomas Dr Rer Nat Delica
Original assignee: Wincor Nixdorf International GmbH
Current assignee: Fujitsu Technology Solutions Intellectual Property GmbH
Priority date: 1997-06-19
Filing date: 1997-06-19
Publication date: 1998-11-12
Anticipated expiration: 2017-06-20

Abstract

The method distinguishes between ready-to-run and blocked processes, which are held in separate queues of a multiprocessor system with shared main memory and processor-specific cache memories. Each processor is assigned its own queue for ready-to-run processes, while a central queue is provided for the blocked processes. Blocked processes are further divided into long-term and short-term blocked processes, and the short-term blocked processes are held in separate processor-specific queues.

Description

The invention relates to a method for managing processes in multiprocessor systems according to the preamble of claim 1, and to a correspondingly designed multiprocessor system according to claim 3.

It is generally known that, when a multiprocessor system is expanded, the achievable useful performance does not increase linearly with each additional processor. This has various causes, which can be both software-related and hardware-related. Software-related causes include, for example, insufficient parallelizability of the program flows and the possible waiting times when several instances compete for access to shared resources, because the accesses must be serialized by granting locks. On the hardware side, delays arise despite the use of private cache memories when required data are not present in the cache; this effect is amplified when the assignment between a process to be executed and the executing processor changes frequently.

One of the tasks of the operating system of a multiprocessor system is therefore to ensure optimal use of the system. This concerns in particular the management of the ready-to-run and the blocked processes, and the exploitation of the natural process-processor affinity by assigning a processor to the same process again and again for as long as possible.

The method proven in uniprocessor systems, namely managing blocked and ready-to-run processes in central queues, leads in multiprocessor systems to losses that rise linearly with the number of processors. These losses are caused by the waiting times for the allocation of the locks that serialize the accesses.

To reduce these losses, it has therefore already been proposed to provide, for each processor, a queue for ready-to-run processes protected by its own lock in place of the central queue, while the blocked processes continue to be kept in the central queue. This has the disadvantage, however, that for every process the very frequent state transition from "blocked" to "ready to run" still requires a global lock to be requested.

It is also known to exploit the potential of the natural process-processor affinity, when a central queue is used for the ready-to-run processes, by allocating to a free processor only those processes from the queue that were already allocated to this processor in the recent past. Such a procedure, however, requires a relatively costly search of the queue, which in turn prolongs the time the lock is held and thus increases the losses.

Other known solutions have, in addition to a central queue for ready-to-run processes, an individual queue for each processor for ready-to-run processes with high affinity for that processor; see US 5,261,053. It is also known to provide two separate local queues for each processor, one of which holds the ready-to-run processes while blocked processes are placed in the second; see US 5,506,987.

The object of the invention is to increase the useful performance of a multiprocessor system over the known systems by suitable measures, and in particular to considerably reduce the rise in losses as the number of processors increases.

In the method according to claim 1, this is achieved by distinguishing, among the blocked processes, between a merely short-term blocking and longer-lasting blockings, and by also keeping the short-term blocked processes in processor-specific queues. A process thus automatically remains bound to a processor until it is blocked for a longer period and in all probability loses its affinity for the previous processor. Checking the affinity of a process for a processor that becomes free is therefore unnecessary. In addition, a global lock needs to be requested only at the end of a long-term blocking of a process, and since long-term blockings are rare, the resulting losses are largely negligible.

To avoid unjustified idle states of one of the processors as a result of the processor-specific queues for ready-to-run processes, according to claim 2 the usual allocation from the processor-specific queue is departed from when the idle state outlasts a predetermined period. In this case the idle processor fetches a ready-to-run process from the queue of the processor that is currently most heavily loaded.

A multiprocessor system designed for the mode of operation according to claims 1 and 2, and having the corresponding advantages, results from the features of claim 3.

Details of the invention are described below with reference to an exemplary embodiment shown in the drawing. In detail:

Fig. 1 shows a multiprocessor system with four processors and a schematic representation of the queue structure,

Fig. 2 shows a flowchart explaining the assignment of a process released by a processor to the queues, and Fig. 3 shows a flowchart explaining the allocation of an idle processor to a process.

Fig. 1 shows the queue structure for a multiprocessor system with four processors CPU. According to the invention, each of these processors is assigned two individual queues, LWS-R and LWS-W: an LWS-R for ready-to-run processes and an LWS-W for short-term blocked processes. Long-term blocked processes, by contrast, are kept in a global central queue GWS-W. All queues are protected by locks in a known manner, but only one lock is required for each processor-specific queue pair LWS-R/LWS-W.
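The queue structure of Fig. 1 can be pictured with a minimal Python sketch. The class names `CpuQueues` and `QueueStructure`, and the use of `threading.Lock` and `collections.deque`, are illustrative assumptions and are not taken from the patent; only the queue names LWS-R, LWS-W and GWS-W and the "one lock per local pair" rule come from the description.

```python
import threading
from collections import deque


class CpuQueues:
    """Per-processor queue pair (hypothetical name); one lock guards both."""

    def __init__(self):
        self.lock = threading.Lock()  # single lock for the LWS-R/LWS-W pair
        self.lws_r = deque()          # ready-to-run processes
        self.lws_w = deque()          # short-term blocked processes


class QueueStructure:
    """Layout of Fig. 1: one local pair per CPU plus one global queue."""

    def __init__(self, n_cpus=4):
        self.cpus = [CpuQueues() for _ in range(n_cpus)]
        self.gws_lock = threading.Lock()  # global lock, needed only for GWS-W
        self.gws_w = deque()              # long-term blocked processes


structure = QueueStructure()
```

Because the ready queue and the short-term blocked queue of one processor share a lock, moving a process between them never touches the global lock.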

The assignment of the individual processes to the individual queues is explained with reference to the flowchart of Fig. 2. The starting point is the release of a process by a processor CPU. If the process is ready to run, for example because a time slice has expired or the process has been preempted by one of higher priority, it is placed in the processor-specific local queue LWS-R for ready-to-run processes. If, on the other hand, the process is blocked, the reason for the blocking determines which queue takes the process. If the blocking is short-term because its end is foreseeable soon, for example the end of a disk input/output operation, the process is placed in the local queue LWS-W for blocked processes. If the end of the blocking is not foreseeable, for example when waiting for a message, the blocking is long-term and the process is placed in the global queue GWS-W for blocked processes.
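The decision of Fig. 2 can be sketched as a small classification function. The enum `ReleaseReason` and the function `target_queue` are hypothetical names; the four example reasons are the ones named in the description, and treating every other blocking reason as long-term is an assumption of this sketch.

```python
from enum import Enum, auto


class ReleaseReason(Enum):
    """Why a processor gave up a process (illustrative, not exhaustive)."""
    TIME_SLICE_EXPIRED = auto()  # process is still ready to run
    PREEMPTED = auto()           # displaced by a higher-priority process
    DISK_IO = auto()             # end of blocking is foreseeable soon
    WAIT_FOR_MESSAGE = auto()    # end of blocking is not foreseeable


STILL_READY = {ReleaseReason.TIME_SLICE_EXPIRED, ReleaseReason.PREEMPTED}
SHORT_TERM = {ReleaseReason.DISK_IO}


def target_queue(reason):
    """Return the name of the queue that takes the released process."""
    if reason in STILL_READY:
        return "LWS-R"  # local queue for ready-to-run processes
    if reason in SHORT_TERM:
        return "LWS-W"  # local queue for short-term blocked processes
    return "GWS-W"      # global queue for long-term blocked processes
```

For example, `target_queue(ReleaseReason.DISK_IO)` yields `"LWS-W"`, so the process stays with its processor while the disk transfer completes.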

When the blocking of a process ends, a process from the local queue LWS-W for blocked processes is always placed in the queue LWS-R for ready-to-run processes of the same processor CPU. The process-processor affinity is thus preserved. Processes from the global queue GWS-W, by contrast, are each placed in the queue LWS-R for ready-to-run processes of the least loaded processor CPU.
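The two wake-up paths can be sketched as follows. The functions `wake_local` and `wake_global`, the dictionary layout, and the `load` list of per-processor load figures are assumptions made for illustration; locking is omitted to keep the sketch short.

```python
from collections import deque


def make_cpu():
    """One processor's local queue pair (hypothetical layout)."""
    return {"LWS-R": deque(), "LWS-W": deque()}


def wake_local(cpu, proc):
    """End of a short-term blocking: the process stays on the same CPU."""
    cpu["LWS-W"].remove(proc)
    cpu["LWS-R"].append(proc)


def wake_global(gws_w, cpus, load, proc):
    """End of a long-term blocking: hand the process to the least loaded CPU.

    `load` holds one comparable load figure per CPU (an assumption here).
    Returns the index of the CPU that received the process.
    """
    gws_w.remove(proc)
    target = min(range(len(cpus)), key=lambda i: load[i])
    cpus[target]["LWS-R"].append(proc)
    return target
```

Only `wake_global` would need the global lock in a real system, which is why rare long-term blockings keep the lock contention low.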

When a processor CPU becomes free, it is as a rule allocated a process from its own queue LWS-R for ready-to-run processes, according to a predetermined strategy, for example priority-controlled or on the "first-in-first-out" principle.
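The two example strategies might look like this in Python. The function names and the representation of a process as a `(priority, name)` tuple, with a larger number meaning higher priority, are assumptions of this sketch.

```python
from collections import deque


def dispatch_fifo(lws_r):
    """First-in-first-out: take the oldest entry of the local ready queue."""
    return lws_r.popleft() if lws_r else None


def dispatch_priority(lws_r):
    """Priority-controlled: take the entry with the highest priority.

    Entries are (priority, name) tuples; among equal priorities the
    entry that was queued first is chosen.
    """
    if not lws_r:
        return None
    best = max(lws_r, key=lambda entry: entry[0])
    lws_r.remove(best)
    return best
```

Either function only touches the processor's own LWS-R, so the local lock is sufficient.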

To avoid unjustified idle states of a processor, however, the procedure follows the flowchart of Fig. 3. As soon as a processor determines that its queue LWS-R for ready-to-run processes is empty, monitoring is started for the period T_W. If no new process is made available through its own queue LWS-R within this monitoring period, it is checked whether the corresponding queues LWS-R of the other processors are also empty. If this is the case, a foreign access takes place, i.e. the processor fetches a ready-to-run process from the queue of the most heavily loaded processor.
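The idle handling of Fig. 3 can be sketched as a polling loop. The function `pick_next`, the polling interval, and the `load` list are hypothetical. The sketch assumes that a process can only be fetched from a queue that actually holds one, so empty foreign queues are skipped; locking is again omitted.

```python
import time
from collections import deque


def pick_next(me, ready, load, t_w=0.01, poll=0.001):
    """Choose the next process for processor `me`.

    ready: list of deques, one LWS-R per processor.
    load:  one load figure per processor (higher means busier; assumed).
    t_w:   monitoring period T_W in seconds.
    """
    deadline = time.monotonic() + t_w
    while True:
        if ready[me]:
            return ready[me].popleft()  # normal case: own queue
        if time.monotonic() >= deadline:
            break                       # T_W elapsed without local work
        time.sleep(poll)
    # Foreign access: consider only processors that have ready work.
    candidates = [i for i in range(len(ready)) if i != me and ready[i]]
    if not candidates:
        return None                     # nothing to run anywhere
    victim = max(candidates, key=lambda i: load[i])
    return ready[victim].popleft()
```

Waiting for T_W before the foreign access gives a short-term blocked process the chance to return to its own processor first, which preserves affinity.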

The load of a processor CPU can be measured, for example, by how often per unit of time it has accessed the queue LWS-R of another processor. The smaller the number of these foreign accesses, the more heavily the processor is loaded.
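One way to realize this measure is a sliding-window counter per processor. The class `ForeignAccessMeter`, the window length, and the function `most_loaded` are assumptions for illustration; timestamps are passed in explicitly so the sketch does not depend on a particular clock.

```python
from collections import deque


class ForeignAccessMeter:
    """Counts a processor's foreign accesses within a time window."""

    def __init__(self, window=1.0):
        self.window = window  # length of the "unit of time" in seconds
        self.stamps = deque()

    def record(self, now):
        """Note that this processor accessed another processor's LWS-R."""
        self.stamps.append(now)

    def count(self, now):
        """Number of foreign accesses in the window ending at `now`."""
        while self.stamps and self.stamps[0] <= now - self.window:
            self.stamps.popleft()
        return len(self.stamps)


def most_loaded(meters, now):
    """Fewest foreign accesses means highest load; ties go to the lowest index."""
    return min(range(len(meters)), key=lambda i: meters[i].count(now))
```

A processor that never runs dry never needs a foreign access, so a count of zero marks it as a good source of ready-to-run processes.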

Claims

1. Method for managing processes in multiprocessor systems with shared main memory and processor-specific cache memories, in which a distinction is made between ready-to-run and blocked processes, which are placed in separate queues, and each processor (CPU) is assigned its own queue (LWS-R) for ready-to-run processes, while a central queue (GWS-W) is provided for the blocked processes, characterized in that, among the blocked processes, a distinction is made between long-term and short-term blocked processes, and in that the short-term blocked processes are placed in separate processor-specific queues (LWS-W) for short-term blocked processes.

2. The method according to claim 1, characterized in that each processor (CPU), on becoming free, checks whether its own queue (LWS-R) for ready-to-run processes is empty, in that a processor (CPU) whose own queue (LWS-R) for ready-to-run processes is empty waits until a ready-to-run process from its own queue (LWS-R) is available or a specified period of time (T_W) has elapsed, and in that, if no ready-to-run process has become available within this period (T_W), a ready-to-run process is fetched from the queue (LWS-R) for ready-to-run processes of the most heavily loaded processor (CPU) in the system.

3. Multiprocessor system with shared main memory and processor-specific cache memories, in which a distinction is made between ready-to-run and blocked processes, which are placed in separate queues, and in which each processor (CPU) is assigned its own queue (LWS-R) for ready-to-run processes, while a central queue (GWS-W) is provided for the blocked processes, characterized in that, in addition to the processor-specific queues (LWS-R) for ready-to-run processes, a processor-specific queue (LWS-W) for short-term blocked processes of the associated processor (CPU) is provided for each processor, while the central queue (GWS-W) is provided for longer-term blocked processes.