CN1091274C - System and method for reducing power consumption in electronic circuit - Google Patents

System and method for reducing power consumption in an electronic circuit

Info

Publication number
CN1091274C
CN1091274C CN97117939A
Authority
CN
China
Prior art keywords
circuit
instruction
processor
load
cache
Prior art date
Legal status
Expired - Fee Related
Application number
CN97117939A
Other languages
Chinese (zh)
Other versions
CN1180194A
Inventor
Albert J. Loper
Soummya Mallick
Current Assignee
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date
Filing date
Publication date
Application filed by International Business Machines Corp
Publication of CN1180194A
Application granted
Publication of CN1091274C
Anticipated expiration
Expired - Fee Related (current status)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 1/00 Details not covered by groups G06F 3/00 - G06F 13/00 and G06F 21/00
    • G06F 1/26 Power supply means, e.g. regulation thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 Addressing or allocation; Relocation
    • G06F 12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F 12/0864 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using pseudo-associative means, e.g. set-associative or hashing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 1/00 Details not covered by groups G06F 3/00 - G06F 13/00 and G06F 21/00
    • G06F 1/26 Power supply means, e.g. regulation thereof
    • G06F 1/32 Means for saving power
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 1/00 Details not covered by groups G06F 3/00 - G06F 13/00 and G06F 21/00
    • G06F 1/26 Power supply means, e.g. regulation thereof
    • G06F 1/32 Means for saving power
    • G06F 1/3203 Power management, i.e. event-based initiation of a power-saving mode
    • G06F 1/3234 Power saving characterised by the action undertaken
    • G06F 1/325 Power saving in peripheral device
    • G06F 1/3275 Power saving in memory, e.g. RAM, cache
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F 9/30003 Arrangements for executing specific machine instructions
    • G06F 9/3004 Arrangements for executing specific machine instructions to perform operations on memory
    • G06F 9/30043 LOAD or STORE instructions; Clear instruction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F 9/38 Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F 9/3802 Instruction prefetching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2212/00 Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F 2212/10 Providing a specific technical effect
    • G06F 2212/1028 Power efficiency
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

While a fetch circuit operates in a first power mode, up to N instructions are fetched from a memory during each cycle of the fetch circuit. While the fetch circuit operates in a second power mode, up to M instructions are fetched from the memory during each cycle of the fetch circuit, where M and N are integers and N > M > 0.

Description

System and Method for Reducing Power Consumption in an Electronic Circuit
The present patent application is related to the following copending U.S. patent applications: copending U.S. Patent Application Serial No. 08/726,871, by Loper et al., attorney docket no. AA9-96-071, entitled "System and Method for Reducing Power Consumption in an Electronic Circuit"; copending U.S. Patent Application Serial No. 08/726,395, by Loper et al., attorney docket no. AT9-96-198, entitled "System and Method for Reducing Power Consumption in an Electronic Circuit"; and copending U.S. Patent Application Serial No. 08/726,370, by Loper et al., attorney docket no. AT9-96-199, entitled "System and Method for Reducing Power Consumption in an Electronic Circuit".
Technical field
The present patent application relates generally to electronic circuitry and, more particularly, to a system and method for reducing power consumption in an electronic circuit.
Background Art
In recent years, portable laptop computers have become increasingly popular. For improved portability, such laptop computers are frequently powered by batteries. Preferably, a battery-powered laptop computer operates for a sustained period of time under battery power before the battery is recharged or replaced.
Accordingly, in order to lengthen the period of time during which the electronic circuitry operates before the battery is recharged or replaced, it is important to reduce the power consumed by the laptop computer's electronic circuitry. Toward that goal, some previous techniques remove power from, or stop clock signals to, electronic circuitry in response to failing to sense a particular type of activity during a specified period of time. A shortcoming of these previous "timer" techniques is that, while waiting for the timer to expire, the electronic circuitry can unnecessarily consume excess power even when the electronic circuitry is not performing useful operations.
Thus, a need has arisen for a method and system in which electronic circuitry consumes less power relative to previous techniques.
Summary of the invention
While a load circuit operates in a full power mode, up to N bits of data are loaded from a memory during each cycle of the load circuit, where N is an integer and N > 1. While the load circuit operates in a low power mode, up to M bits of data are loaded from the memory during each cycle of the load circuit, where M is an integer and 0 < M < N.
It is a technical advantage that the electronic circuitry consumes less power than it would under previous techniques.
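Purely as an informal illustration of the summary above (the function names are hypothetical, and the widths of 64 and 32 bits are merely example values matching the exemplary embodiment discussed later), the mode dependent load width can be sketched as:

    /* Illustrative sketch: maximum data bits loaded from memory per cycle
     * of the load circuit, as a function of power mode. Assumes N = 64 and
     * M = 32 purely as example values. */
    #include <stdio.h>

    enum power_mode { FULL_POWER, LOW_POWER };

    static int max_bits_per_cycle(enum power_mode mode)
    {
        return (mode == FULL_POWER) ? 64 : 32;   /* N in full power, M in low power */
    }

    int main(void)
    {
        printf("full power mode: %d bits/cycle\n", max_bits_per_cycle(FULL_POWER));
        printf("low power mode:  %d bits/cycle\n", max_bits_per_cycle(LOW_POWER));
        return 0;
    }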
Brief Description of the Drawings
The exemplary embodiment and its advantages are better understood by referring to the following descriptions and accompanying drawings, in which:
Fig. 1 is a block diagram of a processor system for processing information according to the exemplary embodiment;
Fig. 2 is a block diagram of a sequencer unit of the processor of Fig. 1;
Fig. 3 is a block diagram of an instruction buffer of the sequencer unit of Fig. 2;
Fig. 4 is a conceptual illustration of a reorder buffer of the sequencer unit of Fig. 2;
Fig. 5 is a conceptual illustration of rename buffers of the processor of Fig. 1;
Fig. 6 is a block diagram of an instruction cache of the processor of Fig. 1; and
Fig. 7 is a schematic electrical circuit diagram of a sense amplifier circuit of the instruction cache of Fig. 6.
Detailed Description
An exemplary embodiment and its advantages are better understood by referring to Figs. 1 through 7 of the drawings.
Fig. 1 is a block diagram of a system including a processor 10 for processing information according to the exemplary embodiment. In the exemplary embodiment, processor 10 is a single integrated circuit superscalar microprocessor. Accordingly, as discussed further below, processor 10 includes various units, registers, buffers, memories and other sections, all of which are formed by integrated circuitry. Also, in the exemplary embodiment, processor 10 operates according to reduced instruction set computing ("RISC") techniques. As shown in Fig. 1, a system bus 11 is connected to a bus interface unit ("BIU") 12 of processor 10. BIU 12 controls the transfer of information between processor 10 and system bus 11.
BIU 12 is connected to an instruction cache 14 and to a data cache 16 of processor 10. Instruction cache 14 outputs instructions to a sequencer unit 18. In response to such instructions from instruction cache 14, sequencer unit 18 selectively outputs instructions to other execution circuitry of processor 10.
In addition to sequencer unit 18, in the exemplary embodiment the execution circuitry of processor 10 includes multiple execution units, namely a branch unit 20, a fixed point unit ("FXU") 22, a complex fixed point unit ("CFXU") 26, a load/store unit ("LSU") 28 and a floating point unit ("FPU") 30. FXU 22, CFXU 26 and LSU 28 input their source operand information from general purpose registers ("GPRs") 32 and fixed point rename buffers 34. Moreover, FXU 22 inputs a "carry bit" from a carry bit ("CA") register 42. FXU 22, CFXU 26 and LSU 28 output results (destination operand information) of their operations for storage at selected entries in fixed point rename buffers 34. Also, CFXU 26 inputs and outputs source operand information and destination operand information to and from special purpose registers ("SPRs") 40.
FPU 30 inputs its source operand information from floating point registers ("FPRs") 36 and floating point rename buffers 38. FPU 30 outputs results (destination operand information) of its operation for storage at selected entries in floating point rename buffers 38.
In response to a load instruction, LSU 28 inputs information from data cache 16 and copies such information to selected ones of rename buffers 34 and 38. If such information is not stored in data cache 16, then data cache 16 inputs (through BIU 12 and system bus 11) such information from a system memory 39 connected to system bus 11. Moreover, data cache 16 is able to output (through BIU 12 and system bus 11) information from data cache 16 to system memory 39 connected to system bus 11. In response to a store instruction, LSU 28 inputs information from a selected one of GPRs 32 and FPRs 36 and copies such information to data cache 16.
Sequencer unit 18 inputs and outputs information to and from GPRs 32 and FPRs 36. From sequencer unit 18, branch unit 20 inputs instructions and signals indicating a present state of processor 10. In response to such instructions and signals, branch unit 20 outputs (to sequencer unit 18) signals indicating suitable memory addresses storing a sequence of instructions for execution by processor 10. In response to such signals from branch unit 20, sequencer unit 18 inputs the indicated sequence of instructions from instruction cache 14. If one or more of the sequence of instructions is not stored in instruction cache 14, then instruction cache 14 inputs (through BIU 12 and system bus 11) such instructions from system memory 39 connected to system bus 11.
In response to the instructions input from instruction cache 14, sequencer unit 18 selectively dispatches the instructions to selected ones of execution units 20, 22, 26, 28 and 30. Each execution unit executes one or more instructions of a particular class of instructions. For example, FXU 22 executes a first class of fixed point mathematical operations on source operands, such as addition, subtraction, ANDing, ORing and XORing. CFXU 26 executes a second class of fixed point operations on source operands, such as fixed point multiplication and division. FPU 30 executes floating point operations on source operands, such as floating point multiplication and division.
As information is stored at a selected one of rename buffers 34, such information is associated with a storage location (e.g., one of GPRs 32 or CA register 42) specified by the instruction for which the selected rename buffer is allocated. Information stored at a selected one of rename buffers 34 is copied to its associated one of GPRs 32 (or CA register 42) in response to signals from sequencer unit 18. Sequencer unit 18 directs such copying of information stored at a selected one of rename buffers 34 in response to "completing" the instruction that generated the information. Such copying is called "writeback".
As information is stored at a selected one of rename buffers 38, such information is associated with one of FPRs 36. Information stored at a selected one of rename buffers 38 is copied to its associated one of FPRs 36 in response to signals from sequencer unit 18. Sequencer unit 18 directs such copying of information stored at a selected one of rename buffers 38 in response to "completing" the instruction that generated the information.
Processor 10 achieves high performance by processing multiple instructions simultaneously at various ones of execution units 20, 22, 26, 28 and 30. Accordingly, each instruction is processed as a sequence of stages, each being executable in parallel with stages of other instructions. Such a technique is called "pipelining". In the exemplary embodiment, an instruction is normally processed as six stages, namely fetch, decode, dispatch, execute, completion and writeback.
In the fetch stage, sequencer unit 18 selectively inputs (from instruction cache 14) one or more instructions from one or more memory addresses storing the sequence of instructions, as discussed further above in connection with branch unit 20 and sequencer unit 18.
In the decode stage, sequencer unit 18 decodes up to two fetched instructions.
In the dispatch stage, sequencer unit 18 selectively dispatches up to two decoded instructions to selected (in response to the decoding in the decode stage) ones of execution units 20, 22, 26, 28 and 30 after reserving rename buffer entries for the dispatched instructions' results (destination operand information). In the dispatch stage, operand information is supplied to the selected execution units for the dispatched instructions. Processor 10 dispatches instructions in order of their programmed sequence.
In the execute stage, execution units execute their dispatched instructions and output results (destination operand information) of their operations for storage at selected entries in rename buffers 34 and rename buffers 38 as discussed further above. In this manner, processor 10 is able to execute instructions out of order relative to their programmed sequence.
In the completion stage, sequencer unit 18 indicates that an instruction is "complete". Processor 10 "completes" instructions in order of their programmed sequence.
In the writeback stage, sequencer unit 18 directs the copying of information from rename buffers 34 and 38 to GPRs 32 and FPRs 36, respectively. Sequencer unit 18 directs such copying of information stored at a selected rename buffer. Likewise, in the writeback stage of a particular instruction, processor 10 updates its architectural state in response to the particular instruction. Processor 10 processes the respective "writeback" stages of instructions in order of their programmed sequence. Processor 10 advantageously merges an instruction's completion stage and writeback stage in specified situations.
In the exemplary embodiment, each instruction requires one machine cycle to complete each of the stages of instruction processing. Nevertheless, some instructions (e.g., complex fixed point instructions executed by CFXU 26) may require more than one cycle. Accordingly, a variable delay may occur between a particular instruction's execution and completion stages in response to the variation in time required for completion of preceding instructions.
Processor 10 is implemented and operates according to five power modes, four of which are "power saving" modes of operation. The five power modes are selectively enabled and disabled in response to the states of control bits in a machine state register ("MSR") and in a hardware implementation register; such register bits are part of SPRs 40. Accordingly, such control bits are set and/or cleared in response to CFXU 26 executing move instructions directed to SPRs 40. The five power modes are full power, doze, nap, sleep, and, as a significant aspect of the exemplary embodiment, a low power mode.
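By way of a rough, hypothetical sketch only (the bit positions, the HID0_LPM name and the priority order below are assumptions for illustration and are not taken from the embodiment), the selection of a power mode by control bits can be pictured as:

    /* Hypothetical sketch of power mode selection from control bits in a
     * HID0-like register image. The real selection also involves the MSR
     * and differs in detail; bit positions here are assumed. */
    #include <stdint.h>

    enum power_mode { MODE_FULL_POWER, MODE_DOZE, MODE_NAP, MODE_SLEEP, MODE_LOW_POWER };

    #define HID0_DOZE  (1u << 23)   /* assumed bit positions */
    #define HID0_NAP   (1u << 22)
    #define HID0_SLEEP (1u << 21)
    #define HID0_LPM   (1u << 20)   /* hypothetical low power mode enable bit */

    static enum power_mode selected_mode(uint32_t hid0)
    {
        if (hid0 & HID0_LPM)   return MODE_LOW_POWER;
        if (hid0 & HID0_SLEEP) return MODE_SLEEP;
        if (hid0 & HID0_NAP)   return MODE_NAP;
        if (hid0 & HID0_DOZE)  return MODE_DOZE;
        return MODE_FULL_POWER;
    }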
1. Full power mode. The full power mode is the default power mode of processor 10. In the full power mode, processor 10 is fully powered, and each of its units operates at the full processor clock speed of processor 10. Processor 10 also implements a dynamic power management mode that is selectively enabled and disabled. If dynamic power management is enabled, idle units within processor 10 automatically enter a low power state, without affecting performance, software execution, or external hardware circuitry.
The dynamic power management mode and the full power, doze, nap and sleep power modes are more completely described in the "PowerPC 603e RISC Microprocessor User's Manual", published by the IBM Microelectronics Division (telephone 1-800-PowerPC) of Hopewell Junction, New York, which is hereby incorporated in its entirety by reference herein. In addition, the dynamic power management mode is described in U.S. Patent No. 5,420,808, which is hereby incorporated in its entirety by reference herein. In the exemplary embodiment, processor 10 is an enhanced version of the PowerPC 603e RISC microprocessor available from the IBM Microelectronics Division of Hopewell Junction, New York. Processor 10 is enhanced relative to the PowerPC 603e RISC microprocessor in that processor 10 implements the low power mode, which is a significant aspect of the exemplary embodiment.
2. Doze mode. In the doze mode, all units of processor 10 are disabled except the bus snooping logic of BIU 12, the time base/decrementer registers (not shown in Fig. 1) of processor 10, and the phase locked loop ("PLL") (not shown in Fig. 1) of processor 10. In the doze mode, the PLL of processor 10 remains in a full power state and stays synchronized to the external system clock of system bus 11, so that a return to the full power mode is achieved within only a few processor clock cycles.
From the doze mode, processor 10 returns to the full power mode in response to an external asynchronous interrupt asserted through an interrupt line INT, such that INT supplies a signal having a logic 1 state to processor 10. Likewise, from the doze mode, processor 10 returns to the full power mode in response to a system management interrupt asserted through a system management interrupt line SMI, such that SMI supplies a signal having a logic 1 state to processor 10. Moreover, processor 10 returns to the full power mode from the doze mode in response to a decrementer exception, a hard or soft reset, or a machine check input MACH CHK.
A hard reset occurs in response to a voltage supply node Vdd being switched from a low voltage (e.g., 0 volts) to a predetermined voltage (e.g., 2.5 volts) relative to a voltage reference node GND. It should be noted that, for clarity, Figs. 1-6 do not show the connections of INT, SMI, Vdd and GND to the various circuitry of processor 10. In response to a soft reset, processor 10 returns to the full power mode from any of the power saving modes, as control bits are set and/or cleared in response to CFXU 26 executing suitable move instructions directed to SPRs 40, such move instructions being part of a software reset instruction sequence.
3. Nap mode. Relative to the doze mode, the nap mode further reduces the power consumption of processor 10 by disabling the bus snooping logic of BIU 12, so that only the PLL and the time base/decrementer registers remain in the full power state. Processor 10 returns to the full power mode from the nap mode in response to an external asynchronous interrupt asserted through interrupt line INT, a system management interrupt, a decrementer exception, a hard or soft reset, or a machine check input MACH CHK. As with the doze mode, a return from the nap mode to the full power mode is achieved within only a few clock cycles of processor 10.
4. Sleep mode. In the sleep mode, power consumption is reduced to near minimum by disabling all units of processor 10, after which external logic signals may disable the PLL and the external system clock. Processor 10 returns to the full power mode from the sleep mode in response to the PLL and the external system clock being re-enabled, a minimum period of time then elapsing to allow the PLL to become synchronized to the external system clock, and an interrupt then being asserted through line INT, a system management interrupt, a decrementer exception, a hard or soft reset, or a machine check input MACH CHK.
5. Low power mode. In a significant aspect of the exemplary embodiment, processor 10 enters the low power mode in response to either (1) a hardware event or (2) a software event. In the exemplary embodiment, a hardware event occurs when a transducer 41 outputs a signal having a logic 1 state on a line HPS (hardware power save). Similarly, a software event occurs when SPRs 40 output a signal having a logic 1 state on a line SPS (software power save). SPRs 40 output such a signal on SPS in response to CFXU 26 executing a suitable move to special purpose register ("MTSPR") instruction directed to a predetermined bit of a "HID0" register of SPRs 40.
Transducer 41 includes a thermal sensor that is sensitive to the relative temperature of the integrated circuitry forming processor 10. The hardware event occurs (i.e., transducer 41 outputs a signal having a logic 1 state on HPS) in response to the thermal sensor (of transducer 41) detecting a relative temperature in excess of a threshold temperature. In the exemplary embodiment, the threshold temperature is preselected to be the maximum safe temperature at which processor 10 can operate in the full power mode. Accordingly, if the temperature of processor 10 exceeds the maximum safe full power operating temperature, continued operation of processor 10 in the full power mode might result in damage. Advantageously, such damage is largely avoided by having processor 10 enter the special "power saving" mode of operation in response to the hardware event.
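As a loose behavioral sketch only (the threshold value, the signal polarities and the helper names below are assumptions for illustration), the two entry conditions for the low power mode can be expressed as:

    /* Sketch: the low power mode is entered on a hardware event (thermal
     * sensor above threshold asserts HPS) or a software event (an MTSPR
     * instruction sets a HID0 bit, asserting SPS). Names and the threshold
     * value are assumed for illustration. */
    #include <stdbool.h>

    #define MAX_SAFE_FULL_POWER_TEMP_C 105.0   /* assumed threshold temperature */

    static bool hps_asserted(double die_temp_c)
    {
        return die_temp_c > MAX_SAFE_FULL_POWER_TEMP_C;     /* hardware event */
    }

    static bool sps_asserted(unsigned hid0, unsigned lpm_bit_mask)
    {
        return (hid0 & lpm_bit_mask) != 0;                  /* software event */
    }

    static bool enter_low_power_mode(double die_temp_c, unsigned hid0, unsigned lpm_bit_mask)
    {
        return hps_asserted(die_temp_c) || sps_asserted(hid0, lpm_bit_mask);
    }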
If processor 10 enters the low power mode in response to the hardware event, processor 10 reduces the maximum number of instructions fetched during each cycle of processor 10, so that fewer instructions are dispatched during each cycle of processor 10, as discussed further below in connection with Figs. 2 and 3. In this manner, each execution unit is more likely to be idle, and therefore more likely to advantageously invoke the low power states of the dynamic power management mode (described in U.S. Patent No. 5,420,808). In addition, if processor 10 enters the low power mode in response to the hardware event, processor 10 modifies the operation of LSU 28, as discussed further below in connection with Fig. 5.
By comparison, if processor 10 enters the low power mode in response to the software event, processor 10 (a) reduces the maximum number of instructions fetched during a single cycle of processor 10, as discussed further below in connection with Figs. 2 and 3, (b) modifies the operation of LSU 28, as discussed further below in connection with Fig. 5, and (c) reduces the power consumption of instruction cache 14 and data cache 16 by reducing their respective numbers of "ways", as discussed further below in connection with Fig. 6.
Processor 10 returns from the low power mode to the full power mode in response to neither SPS nor HPS having a logic 1 state. Moreover, if processor 10 entered the low power mode solely in response to the software event (i.e., SPS has a logic 1 state while HPS has a logic 0 state), then processor 10 further returns (from the low power mode) to the full power mode in response to (1) an external asynchronous interrupt asserted through INT, (2) a hard or soft reset, or (3) a machine check input MACH CHK. In an alternative embodiment, if processor 10 entered the low power mode solely in response to the software event, processor 10 further returns (from the low power mode) to the full power mode in response to a system management interrupt asserted through SMI. In such an alternative embodiment, processor 10 returns to the full power mode in response to assertion of SMI in a manner similar to that in which processor 10 returns to the full power mode in response to assertion of INT.
In yet another alternative embodiment, processor 10 also returns to the full power mode in response to a decrementer exception. SPRs 40 include circuitry (not shown in Fig. 1, for clarity) for decrementing a count in response to a clock signal of processor 10. A decrementer exception is generated in response to such a count being decremented to a zero value.
Fig. 1 shows a single SPS line connected to each of instruction cache 14, data cache 16, sequencer unit 18 and LSU 28. Likewise, Fig. 1 shows a single HPS line connected to each of instruction cache 14, data cache 16, sequencer unit 18 and LSU 28. Similarly, Fig. 1 shows a single INT line connected to each of instruction cache 14, data cache 16, sequencer unit 18 and LSU 28.
Fig. 2 is a block diagram of sequencer unit 18. As discussed above, in the fetch stage, if processor 10 (and hence fetch logic 71) is operating in the full power mode, fetch logic 71 selectively requests up to two instructions from instruction cache 14 (during each cycle of processor 10 and fetch logic 71) and stores such instructions in an instruction buffer 70. Accordingly, during a particular cycle of processor 10, sequencer unit 18 requests a variable number of instructions (ranging from 0 through 2) from instruction cache 14, where such variable number depends upon the number of additional instructions that instruction buffer 70 is able to store (i.e., depends upon the number of available buffers within instruction buffer 70).
In the decode stage, if processor 10 (and hence decode logic 72) is operating in the full power mode, decode logic 72 inputs and decodes up to two fetched instructions from instruction buffer 70 (during each cycle of processor 10 and decode logic 72). Accordingly, during a particular cycle of processor 10, decode logic 72 inputs and decodes a variable number of instructions (ranging from 0 through 2) from instruction buffer 70, where such variable number depends upon the number of instructions dispatched by dispatch logic 74 during the particular cycle.
In the dispatch stage, if processor 10 (and hence dispatch logic 74) is operating in the full power mode, dispatch logic 74 selectively dispatches up to two decoded instructions to selected (in response to the decoding in the decode stage) ones of execution units 20, 22, 26, 28 and 30 (during each cycle of processor 10 and dispatch logic 74). Accordingly, during a particular cycle of processor 10, dispatch logic 74 dispatches a variable number of decoded instructions (ranging from 0 through 2) to the execution units, where such variable number depends upon the ability of the execution units to store additional instructions for execution (e.g., depends upon the number of available reservation stations within each execution unit).
By comparison, in the exemplary embodiment, if processor 10 is operating in the low power mode, fetch logic 71 (in response to the logic states of SPS, HPS and INT) requests at most a single instruction (rather than two instructions) from instruction cache 14 during each cycle of processor 10 and stores such an instruction in instruction buffer 70. In that situation, (a) decode logic 72 inputs and decodes (on average) approximately one fetched instruction from instruction buffer 70 during each cycle of processor 10, (b) dispatch logic 74 dispatches (on average) approximately one instruction to a selected one of execution units 20, 22, 26, 28 and 30 (during each cycle of processor 10), and (c) completion logic 80 indicates (on average) approximately one instruction as being "completed" (during each cycle of processor 10), as discussed further below. Accordingly, each execution unit is more likely to be idle (relative to the full power mode), and therefore more likely to advantageously invoke the low power states of the dynamic power management mode (described in U.S. Patent No. 5,420,808).
In an alternative embodiment, if processor 10 is operating in the low power mode, dispatch logic 74 (in response to the logic states of SPS, HPS and INT) dispatches at most one instruction (instead of up to two instructions) to a selected one of execution units 20, 22, 26, 28 and 30 during each cycle of processor 10; this technique of the alternative embodiment substitutes for (but can also supplement) the exemplary embodiment's technique of reducing the maximum number of instructions fetched during a single cycle of processor 10. Accordingly, Fig. 2 shows SPS, HPS and INT connected to fetch logic 71 and to dispatch logic 74.
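The following sketch (hypothetical helper name, not the processor's actual logic equations) illustrates how the per-cycle fetch request count depends on both the free space in instruction buffer 70 and the power mode:

    /* Sketch: the number of instructions fetch logic 71 requests in a given
     * cycle is bounded both by the free instruction buffer slots and by the
     * mode dependent maximum (2 in full power mode, 1 in low power mode). */
    enum power_mode { FULL_POWER, LOW_POWER };

    static int fetch_requests_this_cycle(enum power_mode mode, int free_buffer_slots)
    {
        int mode_max = (mode == FULL_POWER) ? 2 : 1;
        return (free_buffer_slots < mode_max) ? free_buffer_slots : mode_max;
    }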
Fig. 3 is a block diagram of instruction buffer 70. Instruction buffer 70 stores instructions I0 and I1 in dispatch buffers I0 and I1, respectively, of dispatch buffers 56. In the exemplary embodiment, in response to a cycle of processor 10, either instruction I0 alone is dispatched to decode logic 72 (Fig. 2), or instructions I0 and I1 are dispatched together to decode logic 72, or instruction I1 alone is dispatched to decode logic 72. The contents of buffers I0 and I1 are output to decode logic 72 through lines 55a and 55b, respectively.
In the exemplary embodiment, instruction buffer 70 is able to input up to two 32-bit instructions in parallel from instruction cache 14 through a 64-bit bus 50 during a single cycle of processor 10. In response to instructions I0 and I1 both being dispatched to decode logic 72, instruction buffer 70 transfers any previously stored instructions from instruction buffers 54a-b to buffers I0 and I1, respectively. Moreover, in that situation, instruction buffer 70 transfers any previously stored instructions from instruction buffers 52a-b to instruction buffers 54a-b, respectively. Also, in that situation, if processor 10 is operating in the full power mode, instruction buffer 70 inputs up to two 32-bit instructions from instruction cache 14 through 64-bit bus 50 and stores such instructions in the first available (i.e., empty, not storing an instruction) pair of buffers, namely (a) buffers I0 and I1, (b) buffers 54b and 54a, or (c) buffers 52b and 52a, starting from buffer I0.
In response to instruction I0 alone being dispatched to decode logic 72, instruction buffer 70 transfers any previously stored instruction from buffer I1 to buffer I0. Moreover, in that situation, instruction buffer 70 transfers any previously stored instructions from instruction buffer 54a to buffer I1, from instruction buffer 54b to instruction buffer 54a, from instruction buffer 52a to instruction buffer 54b, and from instruction buffer 52b to instruction buffer 52a. Also, in that situation, instruction buffer 70 inputs a single 32-bit instruction from instruction cache 14 through 64-bit bus 50 and stores such an instruction in the first available buffer among buffers I0, I1, 54b, 54a, 52b and 52a, starting from buffer I0.
If processor 10 is operating in the full power mode, instruction buffer 70 is able to input up to two 32-bit instructions in parallel from instruction cache 14 through 64-bit bus 50 during a single cycle of processor 10. In that manner, as an example, because instruction cache 14 is a four-way set-associative cache, at least 256 sense amplifiers (64 bits/way * 4 ways * 1 sense amplifier/bit) of instruction cache 14 are active during such a single cycle of processor 10. The activity of such sense amplifiers contributes to the average power consumption of processor 10.
By comparison, if processor 10 is operating in the low power mode, instruction buffer 70 inputs a single 32-bit instruction from instruction cache 14 through 64-bit bus 50 during a single cycle of processor 10. In that manner, as an example, 128 sense amplifiers (32 bits/way * 4 ways * 1 sense amplifier/bit) of instruction cache 14 are active during each cycle of processor 10 (while the other 128 sense amplifiers of instruction cache 14 are disabled). Advantageously, by activating only 128 sense amplifiers (in the low power mode) instead of 256 sense amplifiers (in the full power mode), the average power consumption of processor 10 is reduced, because only half as many sense amplifiers are active during each cycle of processor 10.
A further reduction is achieved if processor 10 enters the low power mode in response to the software event, because in that situation processor 10 also reduces the number of "ways" of instruction cache 14 and data cache 16, as discussed further below in connection with Fig. 6. For example, if processor 10 reduces the number of "ways" within instruction cache 14 from four ways to two ways in response to the software event, then only 64 sense amplifiers (32 bits/way * 2 ways * 1 sense amplifier/bit) of instruction cache 14 are active during each cycle of processor 10 while processor 10 operates in the low power mode (while the other 192 sense amplifiers of instruction cache 14 are disabled). Advantageously, by activating only 64 sense amplifiers (if processor 10 enters the special "power saving" mode in response to the software event) instead of 256 sense amplifiers (in the full power mode), the average power consumption of processor 10 is reduced, because only one-fourth as many sense amplifiers are active during each cycle of processor 10.
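The active sense amplifier counts quoted above follow directly from the number of bits read per way, the number of enabled ways, and one sense amplifier per bit; a small sketch of that arithmetic (values taken from the examples above) is:

    /* Active sense amplifiers = bits read per way * enabled ways, at one
     * sense amplifier per bit (figures match the examples above). */
    #include <stdio.h>

    static int active_sense_amps(int bits_per_way, int enabled_ways)
    {
        return bits_per_way * enabled_ways;
    }

    int main(void)
    {
        printf("full power, 4 ways:           %d\n", active_sense_amps(64, 4)); /* 256 */
        printf("low power (hw event), 4 ways: %d\n", active_sense_amps(32, 4)); /* 128 */
        printf("low power (sw event), 2 ways: %d\n", active_sense_amps(32, 2)); /*  64 */
        return 0;
    }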
Fig. 4 is a conceptual illustration of a reorder buffer 76 of sequencer unit 18 of the exemplary embodiment. As shown in Fig. 4, reorder buffer 76 has five entries respectively labelled as buffer numbers 0-4. Each entry has five primary fields, namely an "instruction type" field, a "number-of-GPR-destinations" field, a "number-of-FPR-destinations" field, a "finished" field and an "exception" field.
Referring also to Fig. 2, as dispatch logic 74 dispatches an instruction to an execution unit, sequencer unit 18 assigns the dispatched instruction to an associated entry in reorder buffer 76. Sequencer unit 18 assigns (or "associates") entries in reorder buffer 76 to dispatched instructions on a first-in first-out basis and in a rotating manner, such that sequencer unit 18 assigns entry 0, followed sequentially by entries 1-4, and then assigns entry 0 again. As the dispatched instruction is assigned an associated entry in reorder buffer 76, dispatch logic 74 outputs information concerning the dispatched instruction for storage in the various fields and subfields of the associated entry in reorder buffer 76.
For example, in entry 1 of Fig. 4, reorder buffer 76 indicates that the instruction is dispatched to FXU 22. Moreover, entry 1 also indicates that the dispatched instruction has one GPR destination register (such that "number-of-GPR-destinations" = 1), has zero FPR destination registers (such that "number-of-FPR-destinations" = 0), is not yet finished (such that "finished" = 0), and has not yet caused an exception (such that "exception" = 0).
As an execution unit executes a dispatched instruction, the execution unit modifies the instruction's associated entry in reorder buffer 76. More particularly, in response to finishing execution of the dispatched instruction, the execution unit modifies the entry's "finished" field (such that "finished" = 1). If the execution unit encounters an exception during execution of the dispatched instruction, the execution unit modifies the entry's "exception" field (such that "exception" = 1).
Fig. 4 shows an allocation pointer 173 and a completion pointer 175. Processor 10 maintains such pointers for controlling the reading from and writing to reorder buffer 76. Processor 10 maintains allocation pointer 173 to indicate whether an entry in reorder buffer 76 is allocated to (or "associated with") a particular instruction. As shown in Fig. 4, allocation pointer 173 points to reorder buffer entry 3, thereby indicating that reorder buffer entry 3 is the next reorder buffer entry available for allocation to an instruction.
Also, processor 10 maintains completion pointer 175 to indicate (for a reorder buffer entry previously allocated to a particular instruction) whether the particular instruction satisfies the following conditions:
Condition 1 - The execution unit (to which the instruction is dispatched) finishes execution of the instruction;
Condition 2 - No exceptions are encountered in connection with any stage of processing the instruction; and
Condition 3 - Any previously dispatched instruction satisfies Condition 1 and Condition 2.
As shown in Fig. 4, completion pointer 175 points to reorder buffer entry 1, thereby indicating that reorder buffer entry 1 is the next reorder buffer entry capable of satisfying Conditions 1, 2 and 3. Accordingly, "valid" reorder buffer entries can be defined as the reorder buffer entry pointed to by completion pointer 175 and its subsequent reorder buffer entries that precede the reorder buffer entry pointed to by allocation pointer 173.
Referring again to Fig. 2, the entries of reorder buffer 76 are read by completion logic 80 and exception logic 82 of sequencer unit 18. In response to the "exception" fields of reorder buffer 76, exception logic 82 handles exceptions encountered during the execution of dispatched instructions. In response to the "finished" fields and "exception" fields of reorder buffer 76, completion logic 80 outputs signals to dispatch logic 74 and to reorder buffer 76. Through such signals, completion logic 80 indicates "completion" of instructions in order of their programmed sequence. Completion logic 80 indicates "completion" of an instruction if it satisfies the following conditions:
Condition 1 - The execution unit (to which the instruction is dispatched) finishes execution of the instruction (such that "finished" = 1 in the instruction's associated entry in reorder buffer 76);
Condition 2 - No exceptions are encountered in connection with any stage of processing the instruction (such that "exception" = 0 in the instruction's associated entry in reorder buffer 76); and
Condition 3 - Any previously dispatched instruction satisfies Condition 1 and Condition 2.
In response to information in reorder buffer 76, dispatch logic 74 determines a suitable number of additional instructions to be dispatched.
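A compact software model of the five-entry, circularly allocated reorder buffer and its two pointers might look as follows (the struct layout and helper names are illustrative only, not the hardware's):

    /* Sketch of reorder buffer 76: five entries allocated in rotating FIFO
     * order, tracked by an allocation pointer and a completion pointer. */
    #include <stdbool.h>

    #define ROB_ENTRIES 5

    struct rob_entry {
        int  instruction_type;    /* e.g., which execution unit was targeted */
        int  gpr_destinations;    /* "number-of-GPR-destinations" field */
        int  fpr_destinations;    /* "number-of-FPR-destinations" field */
        bool finished;            /* set by the execution unit on finishing */
        bool exception;           /* set if an exception was encountered */
    };

    struct rob {
        struct rob_entry entry[ROB_ENTRIES];
        int allocation_ptr;       /* next entry to allocate (pointer 173) */
        int completion_ptr;       /* next entry that may complete (pointer 175) */
    };

    /* Completes the oldest instruction if it has finished without an
     * exception; because the completion pointer only advances in order,
     * Conditions 1 through 3 are all respected. */
    static bool try_complete_oldest(struct rob *r)
    {
        struct rob_entry *e = &r->entry[r->completion_ptr];
        if (!e->finished || e->exception)
            return false;
        r->completion_ptr = (r->completion_ptr + 1) % ROB_ENTRIES;
        return true;
    }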
Fig. 5 is a conceptual illustration of floating point rename buffers 38. As shown in Fig. 5, rename buffers 38 include four rename buffers respectively labelled as buffer numbers 0-3. Sequencer unit 18 allocates (or "associates") rename buffers 0-3 to dispatched instructions on a first-in first-out basis and in a rotating manner, such that sequencer unit 18 allocates rename buffer 0, followed sequentially by rename buffers 1-3, and then allocates rename buffer 0 again.
Referring to Fig. 5, rename buffer 2 is allocated to store destination operand information for an instruction dispatched by dispatch logic 74 (Fig. 2). Fig. 5 shows an allocation pointer 180, a writeback pointer 182 and a completion pointer 184. Processor 10 maintains such pointers for controlling the reading from and writing to rename buffers 38. Processor 10 maintains allocation pointer 180 to indicate whether a rename buffer is allocated to a particular instruction. As shown in Fig. 5, allocation pointer 180 points to rename buffer 3, thereby indicating that rename buffer 3 is the next rename buffer available for allocation to an instruction.
Also, processor 10 maintains writeback pointer 182 to indicate whether a rename buffer (previously allocated to a particular instruction) is available for reallocation to another instruction. As shown in Fig. 5, writeback pointer 182 points to rename buffer 2, thereby indicating that rename buffer 2 is the next rename buffer from which processor 10 will copy destination operand information (as stored in the rename buffer's "information" field of Fig. 5) to one of FPRs 36 (as specified in the rename buffer's "register number" field of Fig. 5).
Accordingly, processor 10 advances writeback pointer 182 (past the rename buffer previously allocated to a particular instruction) in response to processor 10 copying the particular instruction's result (destination operand information) from the rename buffer for storage in an architectural register. In this manner, processor 10 reserves an allocated rename buffer to store the particular instruction's result (destination operand information) until processor 10 copies the result to an architectural register.
Moreover, processor 10 maintains completion pointer 184 to indicate (for a rename buffer previously allocated to a particular instruction) whether the particular instruction satisfies the following conditions:
Condition 1 - The execution unit (to which the instruction is dispatched) finishes execution of the instruction;
Condition 2 - No exceptions are encountered in connection with any stage of processing the instruction; and
Condition 3 - Any previously dispatched instruction satisfies Condition 1 and Condition 2.
As shown in Fig. 5, completion pointer 184 points to rename buffer 2, thereby indicating that rename buffer 2 is the next rename buffer capable of satisfying Conditions 1, 2 and 3. In the exemplary embodiment, processor 10 maintains completion pointer 184 independently of whether the instruction's result has been copied from the rename buffer for storage in an architectural register.
Accordingly, "rename entries" can be defined as the rename buffer pointed to by completion pointer 184 and its subsequent rename buffers that precede the rename buffer pointed to by allocation pointer 180. "Writeback entries" can be defined as the rename buffer pointed to by writeback pointer 182 and its subsequent rename buffers that precede the rename buffer pointed to by completion pointer 184. The writeback entries store results of instructions that are "complete" but whose results have not yet been copied from the rename buffers to architectural registers, for example because of unavailability of write ports to the architectural registers.
Conceptually, the writeback entries are located between the rename entries and the architectural registers. Advantageously, a result may bypass the writeback entries and be written directly into an architectural register if a writeback port is available at the completion stage. Moreover, like the rename entries, processor 10 operates the writeback entries to output information to an execution unit in response to the execution unit executing an instruction that specifies the architectural register associated with such information.
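A rough pointer model of the four floating point rename buffers (illustrative struct and helper names only) is sketched below; it mirrors the definitions of rename entries and writeback entries given above:

    /* Sketch of rename buffers 38: four entries allocated in rotating FIFO
     * order, tracked by allocation, completion and writeback pointers.
     * Rename entries run from completion_ptr up to allocation_ptr;
     * writeback entries run from writeback_ptr up to completion_ptr. */
    #include <stdint.h>

    #define RENAME_BUFFERS 4
    #define NUM_FPRS       32

    struct rename_buffer {
        uint64_t information;     /* 64-bit result ("information" field) */
        int      register_number; /* associated FPR ("register number" field) */
    };

    struct rename_file {
        struct rename_buffer buf[RENAME_BUFFERS];
        int allocation_ptr;       /* pointer 180: next buffer to allocate */
        int completion_ptr;       /* pointer 184: next buffer that may complete */
        int writeback_ptr;        /* pointer 182: next buffer to write back */
    };

    /* Writeback: copy one result to its architectural FPR and free the
     * rename buffer for reallocation by advancing the writeback pointer. */
    static void writeback_one(struct rename_file *rf, uint64_t fprs[NUM_FPRS])
    {
        struct rename_buffer *b = &rf->buf[rf->writeback_ptr];
        fprs[b->register_number] = b->information;
        rf->writeback_ptr = (rf->writeback_ptr + 1) % RENAME_BUFFERS;
    }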
FPU 30 fully complies with the IEEE 754 standard for both single precision (i.e., 32-bit) operands and double precision (i.e., 64-bit) operands. Accordingly, in order to support double precision operations, the "information" field of each rename buffer of Fig. 5 is 64 bits wide. Referring again to Fig. 1, if processor 10 is operating in the full power mode, LSU 28 (in response to executing a load instruction directed to one of FPRs 36) loads 64 bits of information from data cache 16 into the "information" field of a single one of rename buffers 38 during a single cycle of processor 10 (i.e., 64 bits of information per cycle). In that manner, as an example, because data cache 16 is a four-way set-associative cache, at least 256 sense amplifiers (64 bits/way * 4 ways * 1 sense amplifier/bit) of data cache 16 are active during such a single cycle of processor 10. The activity of such sense amplifiers contributes to the average power consumption of processor 10.
By comparison, if processor 10 is operating in the low power mode, LSU 28 (in response to executing a load instruction directed to one of FPRs 36) loads 64 bits of information from data cache 16 into the "information" field of a single one of rename buffers 38 during two cycles of processor 10 (i.e., 32 bits of information per cycle). In that manner, as an example, 128 sense amplifiers (32 bits/way * 4 ways * 1 sense amplifier/bit) of data cache 16 are active during each of such two cycles of processor 10 (while the other 128 sense amplifiers of data cache 16 are disabled). Advantageously, by activating only 128 sense amplifiers (in the low power mode) instead of 256 sense amplifiers (in the full power mode), the average power consumption of processor 10 is reduced, because only half as many sense amplifiers are active during each cycle of processor 10.
A further reduction is achieved if processor 10 enters the special "power saving" mode in response to the software event, because in that situation processor 10 also reduces the number of "ways" of instruction cache 14 and data cache 16, as discussed further below in connection with Fig. 6. For example, if processor 10 reduces the number of "ways" of data cache 16 from four ways to two ways in response to the software event, then only 64 sense amplifiers (32 bits/way * 2 ways * 1 sense amplifier/bit) of data cache 16 are active (while the other 192 sense amplifiers of data cache 16 are disabled) during each of the aforementioned two cycles of processor 10 while processor 10 operates in the low power mode. Advantageously, by activating only 64 sense amplifiers (when processor 10 enters the special "power saving" mode in response to the software event) instead of 256 sense amplifiers (in the full power mode), the average power consumption of processor 10 is reduced, because only one-fourth as many sense amplifiers are active during each cycle of processor 10.
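As an illustrative sketch only (the interface below is hypothetical and stands in for LSU 28's actual datapath), the mode dependent splitting of a 64-bit floating point load into one or two cache accesses can be pictured as:

    /* Sketch: a 64-bit floating point load fills a rename buffer's
     * "information" field in one 64-bit access in full power mode, or as
     * two 32-bit halves over two cycles in low power mode. */
    #include <stdint.h>

    enum power_mode { FULL_POWER, LOW_POWER };

    /* Stand-in for a data cache read of width_bits starting at addr
     * (big-endian byte order, as on PowerPC). */
    static uint64_t dcache_read(const uint8_t *cache, uint32_t addr, int width_bits)
    {
        uint64_t v = 0;
        for (int i = 0; i < width_bits / 8; i++)
            v = (v << 8) | cache[addr + i];
        return v;
    }

    static uint64_t load_fp_double(enum power_mode mode, const uint8_t *cache,
                                   uint32_t addr, int *cycles_used)
    {
        if (mode == FULL_POWER) {
            *cycles_used = 1;                            /* 64 bits in one cycle */
            return dcache_read(cache, addr, 64);
        }
        *cycles_used = 2;                                /* 32 bits per cycle */
        uint64_t hi = dcache_read(cache, addr, 32);
        uint64_t lo = dcache_read(cache, addr + 4, 32);
        return (hi << 32) | lo;
    }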
Fig. 6 is a block diagram of instruction cache 14. Fig. 6 is likewise representative of data cache 16. Instruction cache 14 and data cache 16 are each a 16-kilobyte, four-way set-associative cache. Instruction cache 14 and data cache 16 are addressed in response to physical (i.e., "real") addresses.
Accordingly, Fig. 6 shows control logic 100, which includes a memory management unit ("MMU") for translating effective addresses into associated physical addresses. For example, the effective addresses are received from fetch logic 71 (Fig. 2) of sequencer unit 18. In the exemplary embodiment, bits 2^0 through 2^11 of an effective address are unchanged by translation into its associated physical address, such that bits 2^0 through 2^11 of the effective address have the same digital logic values as bits 2^0 through 2^11 of the associated physical address.
As shown in Fig. 6, instruction cache 14 (and likewise data cache 16) is logically arranged into 128 congruence classes (i.e., sets). As an example, for instruction cache 14, each set has a respective preassigned associated group of four lines (i.e., four "ways", blocks 0-3) within instruction cache 14. Each line is able to store a respective address tag, respective state bits (e.g., including a "valid" bit), and a respective group of eight words of information. Each word has 4 bytes (i.e., 32 bits).
Thus, block 3 of set 0 is able to store an Address Tag_03, state bits State_03, and words W_030 through W_037. Likewise, each block y of each set x is able to store an Address Tag_xy, state bits State_xy, and words W_xy0 through W_xy7, where x is a variable integer set number ranging from 0 through 127, and y is a variable integer block number ranging from 0 through 3.
A set is specified by bits 2^5 through 2^11 of a physical address. Thus, each set includes multiple addresses, all of which share the same seven physical address bits 2^5 through 2^11. Accordingly, at any single moment, instruction cache 14 stores information for up to four physical addresses belonging to a particular set x, as specified by Address Tag_x0 through Address Tag_x3 stored in the set's associated group of four lines within instruction cache 14.
For example, (a) in block 0 of set 0, instruction cache 14 can store an Address Tag_00 including bits 2^12 through 2^31 of a first address, (b) in block 1 of set 0, instruction cache 14 can store an Address Tag_01 including bits 2^12 through 2^31 of a second address, (c) in block 2 of set 0, instruction cache 14 can store an Address Tag_02 including bits 2^12 through 2^31 of a third address, and (d) in block 3 of set 0, instruction cache 14 can store an Address Tag_03 including bits 2^12 through 2^31 of a fourth address. Accordingly, each Address Tag_xy has 20 bits.
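To make the addressing concrete, the following small sketch shows how a 32-bit physical address splits into tag, set and offset fields under the organization described above (the helper names are illustrative; only the field boundaries come from the text):

    /* Sketch: 32-bit physical address decomposition for a 16 KB, four-way,
     * 128-set cache with 32-byte lines (8 words of 4 bytes each).
     *   bits 2^0..2^4   byte offset within the line (5 bits)
     *   bits 2^5..2^11  set number, 0..127 (7 bits)
     *   bits 2^12..2^31 address tag (20 bits) */
    #include <stdint.h>
    #include <stdio.h>

    static uint32_t line_offset(uint32_t pa) { return pa & 0x1F; }
    static uint32_t set_number(uint32_t pa)  { return (pa >> 5) & 0x7F; }
    static uint32_t address_tag(uint32_t pa) { return pa >> 12; }

    int main(void)
    {
        uint32_t pa = 0x0001ABCDu;   /* arbitrary example address */
        printf("tag = 0x%05X, set = %u, offset = %u\n",
               address_tag(pa), set_number(pa), line_offset(pa));
        return 0;
    }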
Fig. 7 is a schematic electrical circuit diagram of a sense amplifier circuit of instruction cache 14 of Fig. 6, indicated generally at 121. Sense amplifier circuit 121 is representative of the sense amplifiers 104, 106 and 108a-h of Fig. 6. Accordingly, because each Address Tag_xy has 20 bits, sense amplifiers 104 include 80 sense amplifiers (20 sense amplifiers multiplied by 4 blocks 0-3), each substantially identical to circuit 121.
Similarly, because each word W_xyz (where z is an integer word number ranging from 0 through 7) has 32 bits, each of sense amplifiers 108a-h includes a respective group of 128 sense amplifiers (32 sense amplifiers multiplied by 4 blocks 0-3), each substantially identical to circuit 121, so that sense amplifiers 108a-h include a total of 1024 sense amplifiers (128 sense amplifiers multiplied by 8 words 0-7). Likewise, the number of sense amplifiers in sense amplifiers 106 is equal to 4 multiplied by the number of bits in each State_xy, and each such sense amplifier is substantially identical to circuit 121.
Each 20-bit address tag Address Tag_xy is stored by a respective group of 20 dynamic random access memory ("DRAM") cells, each DRAM cell being able to store a single bit of digital information. Similarly, each 32-bit word W_xyz is stored by a respective group of 32 DRAM cells, each DRAM cell being able to store a single bit of digital information. Likewise, the number of DRAM cells storing each State_xy is equal to the number of bits in each State_xy.
The 80 sense amplifiers of sense amplifiers 104 are organized such that (1) each of the 128 groups of 20 DRAM cells representing Address Tag_x0 (where x is the set number) is connected to a first group of 20 sense amplifiers (of sense amplifiers 104), (2) each of the 128 groups of 20 DRAM cells representing Address Tag_x1 is connected to a second group of 20 sense amplifiers (of sense amplifiers 104), (3) each of the 128 groups of 20 DRAM cells representing Address Tag_x2 is connected to a third group of 20 sense amplifiers (of sense amplifiers 104), and (4) each of the 128 groups of 20 DRAM cells representing Address Tag_x3 is connected to a fourth group of 20 sense amplifiers (of sense amplifiers 104).
Accordingly, each sense amplifier of sense amplifiers 104 is connected to an associated family of 128 DRAM cells, each of which stores digital information of bit 2^q of an Address Tag_xy, where (a) q is a constant number from 0 through 19 (i.e., the same for all DRAM cells of the family), (b) x is a variable set number from 0 through 127 (i.e., different for each DRAM cell of the family), and (c) y is a constant block number from 0 through 3.
As shown in Figure 7, each sensor amplifier (circuit 121) has one and enables line.Referring to Fig. 6, enable line 102 and comprise always being that 4 address mark is enabled line (promptly 4 * 1 address mark is enabled line/piece).Every address mark is enabled line on steering logic 100 is connected to a relevant group the four group sensor amplifier 104, and group and gang that wherein should be relevant store Address Tag XyThe DRAM unit of numerical information connect, wherein (a) x is the group number of from 0 to 127 variation, and (b) y is from 0 to 3 a constant piece number.
1024 sensor amplifiers of sensor amplifier 108a-h are compiled into, 256 DRAM unit representing Wx0z of each group are connected with multiplexer 114a through bus 120a respectively by 256 sensor amplifiers of (sensor amplifier 108a-h's) first crowd in (1) 128 group, 256 DRAM unit representing Wx1z of each group are connected with multiplexer 114b through bus 120b respectively by 256 sensor amplifiers of (sensor amplifier 108a-h's) second crowd in (2) 128 groups, 256 DRAM unit representing Wx2z of each group are connected with multiplexer 114C through bus 120C respectively by 256 sensor amplifiers of (sensor amplifier 108a-h's) the 3rd crowd in (3) 128 groups, and 256 sensor amplifiers that 256 DRAM unit representing Wx3z of each group pass through (sensor amplifier 108a-h's) four group in (4) 128 groups are connected with multiplexer 114d through bus 120d respectively.
Thus, each sense amplifier (of sense amplifiers 108a-h) is connected to an associated family of 128 DRAM cells, and this family of 128 DRAM cells stores digital information for bit 2^q of word Wxyz, where (a) q is a constant number from 0 to 31, (b) x is a variable set number from 0 to 127, (c) y is a constant block number from 0 to 3, and (d) z is a constant word number from 0 to 7.
Enable lines 102 include a total of 32 word enable lines (i.e., 4 blocks * 8 words per block * 1 word enable line per word). Each word enable line is connected from control logic 100 to an associated subgroup of the four groups of sense amplifiers 108a-h, where that associated subgroup is connected to the families of DRAM cells storing the digital information of word Wxyz, where (a) x is a variable set number from 0 to 127, (b) y is a constant block number from 0 to 3, and (c) z is a constant word number from 0 to 7.
Likewise, sense amplifiers 106 are organized so that: (1) in each of the 128 sets, the DRAM cells representing State_x0 (where x is the set number) are connected to a first group of sense amplifiers (of sense amplifiers 106); (2) in each of the 128 sets, the DRAM cells representing State_x1 are connected to a second group of sense amplifiers (of sense amplifiers 106); (3) in each of the 128 sets, the DRAM cells representing State_x2 are connected to a third group of sense amplifiers (of sense amplifiers 106); and (4) in each of the 128 sets, the DRAM cells representing State_x3 are connected to a fourth group of sense amplifiers (of sense amplifiers 106).
Thus, each sense amplifier (of sense amplifiers 106) is connected to an associated family of 128 DRAM cells storing digital information for bit 2^q of State_xy, where: (a) q is a constant number, (b) x is a variable set number from 0 to 127, and (c) y is a constant block number from 0 to 3.
Enable lines 102 include a total of 4 state enable lines (i.e., 4 blocks * 1 state enable line per block). Each state enable line is connected from control logic 100 to an associated one of the four groups of sense amplifiers 106, where that associated group is connected to the families of DRAM cells storing the digital information of State_xy, where (a) x is a variable set number from 0 to 127, and (b) y is a constant block number from 0 to 3.
Referring again to FIG. 7, each sense amplifier (circuit 121) inputs different voltages on complementary lines D and D̄ from one of the 128 DRAM cells connected to it, where that DRAM cell is selected according to the logic states of control lines 124, which control logic 100 outputs according to the set number (i.e., according to bits 2^5 through 2^11 of the address discussed above). If ENABLE has a logic 1 state, circuit 121 is energized; otherwise, circuit 121 is not energized. If circuit 121 is not energized, processor 10 consumes less power, and output node OUT has a high-impedance state. If the voltage on D is higher than the voltage on D̄ while circuit 121 is energized, then OUT has a voltage substantially equal to Vdd (i.e., a logic 1 state). By comparison, if the voltage on D is lower than the voltage on D̄ while circuit 121 is energized, then OUT has a voltage substantially equal to GND (i.e., a logic 0 state).
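For illustration only, the C sketch below models the behavior of sense amplifier circuit 121 just described; the type and function names (out_state_t, sense_amp_output) are hypothetical, not part of the patent.

#include <stdbool.h>

typedef enum { OUT_LOW, OUT_HIGH, OUT_HIGH_Z } out_state_t;

/* Behavioral model of one sense amplifier (circuit 121).
 * d and d_bar are the voltages on the complementary bit lines D and D-bar;
 * enable reflects the state of the ENABLE line.                           */
out_state_t sense_amp_output(double d, double d_bar, bool enable)
{
    if (!enable)      /* not energized: less power, output is high impedance */
        return OUT_HIGH_Z;
    if (d > d_bar)    /* energized and D above D-bar: OUT near Vdd (logic 1) */
        return OUT_HIGH;
    return OUT_LOW;   /* energized and D below D-bar: OUT near GND (logic 0) */
}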
Through bus 110, address tag information is transferred between control logic 100 and Address Tag_xy, and state information is transferred between control logic 100 and State_xy. Through buses 116, 118 and 120a-d, instruction information (or data information in the case of data cache 16) is transferred between control logic 100 and Wxyz.
In an example instruction fetch operation, control logic 100 receives an effective address from sequencer unit 18. According to bits 2^5 through 2^11 of the received effective address, control logic 100 determines a particular set x (as discussed above), and control logic 100 inputs information from blocks 0-3 of set x. More particularly, through bus 110, control logic 100 reads the four address tags Address Tag_x0, Address Tag_x1, Address Tag_x2 and Address Tag_x3, together with their respective associated states State_x0, State_x1, State_x2 and State_x3.
In addition, through control lines 122, control logic 100 outputs bits 2^3 and 2^4 of the received effective address to multiplexers 114a-d. According to the logic states of control lines 122, multiplexer 114a outputs to bus 118 a doubleword selected from block 0 of set x. For example, the doubleword is selected from: (a) the DRAM cells representing Wx00 and Wx01, (b) the DRAM cells representing Wx02 and Wx03, (c) the DRAM cells representing Wx04 and Wx05, or (d) the DRAM cells representing Wx06 and Wx07.
Likewise, according to the logic states of control lines 122, multiplexer 114b outputs to bus 118 a doubleword selected from block 1 of set x, multiplexer 114c outputs to bus 118 a doubleword selected from block 2 of set x, and multiplexer 114d outputs to bus 118 a doubleword selected from block 3 of set x. Through bus 118, multiplexer 112 receives all four doublewords from multiplexers 114a-d.
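As a rough illustration of the address-bit usage just described (bits 2^5 through 2^11 selecting the set, bits 2^3 and 2^4 selecting a doubleword within a block, and bit 2^2 selecting a word within that doubleword), the following C helpers extract those fields from a 32-bit effective address; the function names are hypothetical.

#include <stdint.h>

/* Hypothetical decomposition of a 32-bit effective address as used by
 * control logic 100: bit 2^2 selects a word within a doubleword,
 * bits 2^3..2^4 select one of four doublewords within a block, and
 * bits 2^5..2^11 select one of the 128 sets.                         */
static inline unsigned ea_word_select(uint32_t ea)       { return (ea >> 2) & 0x1;  }
static inline unsigned ea_doubleword_select(uint32_t ea) { return (ea >> 3) & 0x3;  }
static inline unsigned ea_set_index(uint32_t ea)         { return (ea >> 5) & 0x7F; }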
The MMU of control logic 100 translates the received effective address into a particular physical address. Control logic 100 compares bits 2^12 through 2^31 of this particular physical address against any valid Address Tag_xy from bus 110. The validity of an Address Tag_xy is indicated by the digital logic value of a "valid" bit in the State_xy associated with that Address Tag_xy. In response to this comparison, if bits 2^12 through 2^31 of the particular physical address match any valid Address Tag_xy, control logic 100 outputs suitable control signals through control lines 126 to multiplexer 112, so that multiplexer 112 outputs (through bus 116 to control logic 100) one of the following: (a) the doubleword from multiplexer 114a if the match is with Address Tag_x0, (b) the doubleword from multiplexer 114b if the match is with Address Tag_x1, (c) the doubleword from multiplexer 114c if the match is with Address Tag_x2, or (d) the doubleword from multiplexer 114d if the match is with Address Tag_x3.
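A minimal sketch of that tag comparison, assuming 20-bit tags taken from physical address bits 2^12 through 2^31 and one "valid" bit per block of the set; the structure and function names here are illustrative only.

#include <stdbool.h>
#include <stdint.h>

#define NUM_WAYS 4

struct way_state {
    uint32_t tag;     /* Address Tag_xy: physical address bits 2^12..2^31 */
    bool     valid;   /* "valid" bit from the associated State_xy         */
};

/* Returns the matching block (0..3) of set x, or -1 on a cache miss.
 * The matching block number selects among multiplexers 114a..114d.   */
int tag_compare(const struct way_state set[NUM_WAYS], uint32_t phys_addr)
{
    uint32_t tag = phys_addr >> 12;          /* bits 2^12 .. 2^31 */
    for (int way = 0; way < NUM_WAYS; way++) {
        if (set[way].valid && set[way].tag == tag)
            return way;
    }
    return -1;                               /* miss */
}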
In that manner, control logic 100 inputs a doubleword from multiplexer 112. If processor 10 is operating in the full power mode, then as part of this example instruction fetch operation, control logic 100 outputs this doubleword from multiplexer 112 to sequencer unit 18.
By comparison, if processor 10 is operating in the special power mode, control logic 100 outputs only a single word (i.e., half of the doubleword) from multiplexer 112 to sequencer unit 18. Control logic 100 selects the single word according to bit 2^2 of the effective address. This is because, if processor 10 operates in the special power mode, instruction buffer 70 (FIG. 3) inputs only a single 32-bit instruction (rather than as many as two 32-bit instructions) from instruction cache 14 through 64-bit bus 50 during a single cycle of processor 10.
In the low power mode, control logic 100 outputs suitable signals on enable lines 102 so that only a selected subgroup of sense amplifiers 108a-h is energized, while the non-selected subgroups of sense amplifiers 108a-h are disabled. The selected subgroup is formed by the sense amplifiers connected to the DRAM cells storing the digital information of word Wxyz, where (a) x is a variable set number from 0 to 127, (b) y is a variable block number from 0 to 3, and (c) z is a constant word number from 0 to 7 selected according to bits 2^2 through 2^4 of the effective address. In this manner, processor 10 energizes fewer sense amplifiers of instruction cache 14 (relative to the full power mode) during each cycle of processor 10, thereby reducing the average power consumption of instruction cache 14 (and hence of processor 10).
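To make that enable-line selection concrete, the sketch below (hypothetical names; C used as pseudocode for hardware behavior) computes which of the 32 word enable lines of enable lines 102 are asserted in the low power mode: only the lines for word number z = bits 2^2 through 2^4 of the effective address, in each of the four blocks.

#include <stdint.h>

#define NUM_BLOCKS      4    /* blocks 0..3              */
#define WORDS_PER_BLOCK 8    /* words 0..7 per block     */

/* Fill enable[block][word] with 1 only for the selected word number z,
 * modeling the low power mode described above; each asserted line
 * energizes one 32-sense-amplifier subgroup. In the full power mode the
 * doubleword pair containing z would be enabled in each block instead.  */
void low_power_word_enables(uint32_t ea,
                            uint8_t enable[NUM_BLOCKS][WORDS_PER_BLOCK])
{
    unsigned z = (ea >> 2) & 0x7;                  /* bits 2^2 .. 2^4 */
    for (unsigned block = 0; block < NUM_BLOCKS; block++)
        for (unsigned word = 0; word < WORDS_PER_BLOCK; word++)
            enable[block][word] = (word == z);     /* 4 of 32 lines asserted */
}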
Power consumption can be reduced even further if processor 10 enters a special "power conservation" mode in response to a software event (i.e., SPS has a logic 1 state), because in that situation control logic 100 reduces the number of "ways" within instruction cache 14 from four to two. Accordingly, while processor 10 operates in the special "power conservation" mode, control logic 100 operates in a manner that ensures Address Tag_x2 and Address Tag_x3 are invalid (as represented by the digital logic values of the "valid" bits in State_x2 and State_x3, respectively) and allows only Address Tag_x0 and Address Tag_x1 to be valid (as represented by the digital logic values of the "valid" bits in State_x0 and State_x1, respectively).
Therefore, for an instruction fetch operation in that situation, 64 sense amplifiers are energized (1 word/way * 4 bytes/word * 8 bits/byte * 2 ways * 1 sense amplifier/bit), instead of the 256 sense amplifiers (2 words/way * 4 bytes/word * 8 bits/byte * 4 ways * 1 sense amplifier/bit) energized as discussed above in connection with FIGS. 2 and 3. This advantageously achieves a significant reduction in the average power consumption of instruction cache 14 (and hence of processor 10).
Likewise, if processor 10 enters the special "power conservation" mode in response to the software event, processor 10 reduces the number of "ways" in data cache 16 from four to two. Therefore, for a floating-point load operation of LSU 28 in that situation, 64 sense amplifiers are energized (1 word/way * 4 bytes/word * 8 bits/byte * 2 ways * 1 sense amplifier/bit), instead of the 256 sense amplifiers (2 words/way * 4 bytes/word * 8 bits/byte * 4 ways * 1 sense amplifier/bit) energized as discussed above in connection with FIG. 5. This advantageously achieves a significant reduction in the average power consumption of data cache 16 (and hence of processor 10).
The software event occurs at the moment when SPS transitions from a logic 0 state to a logic 1 state. SPS transitions to the logic 1 state in response to CFXU 26 executing a first MTSPR instruction directed to a predetermined bit of the "HID0" register of SPRs 40. This first MTSPR instruction specifies the logic 1 state of SPS.
Immediately before the first MTSPR instruction (in order to reduce circuit complexity within processor 10), it is preferable for the software to specify a "synchronize" ("SYNC") instruction followed by an "instruction synchronize" ("ISYNC") instruction. Immediately after the first MTSPR instruction, it is preferable for the software to specify another ISYNC instruction.
As just discussed, if processor 10 enters the special "power conservation" mode in response to the software event (i.e., SPS has a logic 1 state), processor 10 reduces the number of "ways" within instruction cache 14 and data cache 16 from four to two. Accordingly, immediately before the SYNC instruction (which precedes the first MTSPR instruction), it is important for the software to specify "data cache block flush" ("DCBF") instructions and "instruction cache block invalidate" ("ICBI") instructions.
Similarly, SPS transitions to the logic 0 state in response to CFXU 26 executing a second MTSPR instruction directed to the predetermined bit of the "HID0" register of SPRs 40. This second MTSPR instruction specifies the logic 0 state of SPS. Immediately before the second MTSPR instruction, it is preferable for the software to specify a SYNC instruction followed by an ISYNC instruction. Immediately after the second MTSPR instruction, it is preferable for the software to specify another ISYNC instruction.
A DCBF instruction specifies an effective address. In response to the DCBF instruction, if any line of data cache 16 is storing information (e.g., data) for that effective address, processor 10 invalidates that line by clearing the "valid" bit in the State_xy of that line. If the information stored in the invalidated line has been modified relative to the information originally stored at the same physical address of memory 39 (FIG. 1) (translated from the effective address), then, further in response to the DCBF instruction, processor 10 updates memory 39 by copying the modified information from data cache 16 to that same physical address of memory 39. Before SPS transitions to the logic 1 state, it is important for the software to specify enough DCBF instructions to ensure that all lines in blocks 2 and 3 of all 128 sets of data cache 16 are invalid.
An ICBI instruction specifies an effective address. In response to the ICBI instruction, if any line within instruction cache 14 is storing information (e.g., an instruction) for that effective address, processor 10 invalidates that line by clearing the "valid" bit in the State_xy of that line. Before SPS transitions to the logic 1 state, it is important for the software to specify enough ICBI instructions to ensure that all lines in blocks 2 and 3 of all 128 sets of instruction cache 14 are invalid.
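The instruction ordering described above (DCBF and ICBI flushes, then SYNC, ISYNC, the first MTSPR to HID0, and a trailing ISYNC) might be expressed, purely as an illustration, with GCC-style PowerPC inline assembly as below. The patent identifies the SPS bit of HID0 only as "a predetermined bit", so HID0_SPS_BIT, the flush range, and all other names here are hypothetical assumptions; HID0 is SPR 1008 on PowerPC 60x parts.

#include <stdint.h>

#define SPR_HID0     1008u         /* HID0 special purpose register number    */
#define HID0_SPS_BIT 0x00008000u   /* hypothetical position of the SPS bit    */

/* Illustrative entry into the special power-conservation mode. The caller is
 * assumed to pass an effective-address range whose cache lines must be
 * flushed/invalidated so that blocks 2 and 3 of every set are invalid before
 * SPS is set. Lines are 8 words * 4 bytes = 32 bytes.                         */
void enter_power_conservation_mode(uintptr_t start, uintptr_t end)
{
    uint32_t hid0;

    for (uintptr_t ea = start; ea < end; ea += 32) {
        __asm__ volatile("dcbf 0,%0" :: "r"(ea) : "memory"); /* flush data line      */
        __asm__ volatile("icbi 0,%0" :: "r"(ea) : "memory"); /* invalidate instr line */
    }

    __asm__ volatile("sync"  ::: "memory");   /* order all prior accesses   */
    __asm__ volatile("isync" ::: "memory");   /* context synchronize        */

    __asm__ volatile("mfspr %0,%1" : "=r"(hid0) : "i"(SPR_HID0));
    hid0 |= HID0_SPS_BIT;                     /* first MTSPR sets SPS = 1   */
    __asm__ volatile("mtspr %1,%0" :: "r"(hid0), "i"(SPR_HID0));

    __asm__ volatile("isync" ::: "memory");   /* refetch in the new context */
}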
The SYNC instruction provides an ordering function that affects all instructions executed by processor 10. In executing the SYNC instruction, before processor 10 executes any subsequent instruction (i.e., any instruction after the SYNC instruction in the programmed instruction sequence), processor 10 ensures that all preceding instructions (except "touch load" operations and instruction fetches) have completed, or at least have reached a stage at which those preceding instructions (i.e., the instructions before the SYNC instruction in the programmed instruction sequence) can no longer cause an exception.
By the time processor 10 completes the SYNC instruction, processor 10 will have performed, with respect to all other mechanisms that access memory 39, all external accesses initiated by processor 10 before the SYNC instruction. In addition, processor 10 will have completed all load and store buffer/bus activity initiated in response to preceding instructions. Processor 10 delays completion of the SYNC instruction until all preceding "data cache block touch" ("DCBT") and "data cache block touch for store" ("DCBTST") instructions have completed, or at least have passed address translation, without regard to whether those DCBT and DCBTST instructions have completed on system bus 11. The SYNC, DCBT and DCBTST instructions are explained more completely in the "PowerPC 603e RISC Microprocessor User's Manual" referenced hereinabove.
In response to an ISYNC instruction, processor 10 waits until it is able to complete all preceding instructions (i.e., each instruction before the ISYNC instruction in the programmed instruction sequence). Then, processor 10 discards any instructions already fetched, so that subsequent instructions are fetched (or refetched) and executed in the context established by the preceding instructions. Execution of the ISYNC instruction by processor 10 has no effect on other processors or on their caches.
In executing the ISYNC instruction, processor 10 achieves refetch serialization. In this manner, before processor 10 executes any subsequent instruction (i.e., any instruction after the ISYNC instruction in the programmed instruction sequence), processor 10 ensures that: (a) all preceding instructions have completed, or at least have reached a stage at which they can no longer cause an exception; and (b) all preceding memory operations have completed, or at least have completed address translation. The subsequent instructions are then subject to all effects of the preceding instructions. The ISYNC instruction is context synchronizing.
The hardware event occurs at the moment when HPS transitions from a logic 0 state to a logic 1 state. In response to the transition of HPS from the logic 0 state to the logic 1 state, processor 10 performs the actions listed below.
1. Sequencer unit 18 (FIG. 1) cancels any pending instructions in instruction buffer 70 (FIG. 3) that have not yet been dispatched to an execution unit of processor 10.
2. Processor 10 cancels any pending instructions in the execution units (branch unit 20, FXU 22, CFXU 26, LSU 28 and FPU 30), so that those pending instructions are not executed. In this regard, LSU 28 (FIG. 1) cancels any pending store instructions that have not yet resulted in information being stored into data cache 16. For example, in this exemplary embodiment, LSU 28 includes a store queue. Accordingly, LSU 28 cancels any pending store requests in the store queue, so that those pending store requests are not executed.
3. Processor 10 invalidates all entries in rename buffers 34 and 38 (FIG. 1). For example, processor 10 moves writeback pointer 182 (FIG. 5) and completion pointer 184 so that writeback pointer 182 and completion pointer 184 point to the same rename buffer entry pointed to by allocation pointer 180.
4. Sequencer unit 18 (FIG. 2) saves the address of the instruction pointed to by completion pointer 175 (FIG. 4) of reorder buffer 76. Then, processor 10 invalidates all entries in reorder buffer 76 by moving completion pointer 175 so that completion pointer 175 points to the same reorder buffer entry pointed to by allocation pointer 173.
After processor 10 performs the actions listed above, fetch logic 71 (FIG. 2) restarts instruction fetching, beginning at the address saved by sequencer unit 18 as discussed in the preceding paragraph (i.e., in the fourth action).
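The buffer-invalidation steps in actions 3 and 4 amount to collapsing each circular buffer onto its allocation pointer; a schematic C sketch is shown below, with all names hypothetical and the structure a software model of the hardware pointers only.

/* Schematic model of a circular buffer managed by allocation, completion
 * and writeback pointers, as in rename buffers 34/38 and reorder buffer 76.
 * Invalidation moves every pointer onto the allocation pointer, so that no
 * entries remain allocated.                                                 */
struct circular_buffer {
    unsigned alloc;       /* allocation pointer           */
    unsigned complete;    /* completion pointer           */
    unsigned writeback;   /* writeback pointer (if any)   */
};

static void invalidate_all_entries(struct circular_buffer *b)
{
    b->complete  = b->alloc;   /* move completion pointer ...            */
    b->writeback = b->alloc;   /* ... and writeback pointer to allocation */
}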
In an alternative embodiment, regardless of whether processor 10 enters the low power mode in response to the software event or in response to the hardware event, once processor 10 enters the low power mode, processor 10 reduces the number of "ways" of instruction cache 14 and data cache 16 from four to two. In this alternative embodiment, in response to HPS transitioning from the logic 0 state to the logic 1 state:
(1) control logic 100 (FIG. 6) ensures that all lines in blocks 2 and 3 of all 128 sets in instruction cache 14 are invalid, so that the "valid" bits of State_x2 and State_x3 are cleared;
(2) likewise, the control logic of data cache 16 ensures that all lines in blocks 2 and 3 of all 128 sets in data cache 16 are invalid; and
(3) if the information stored in such an invalidated line of data cache 16 has been modified by processor 10 relative to the original information stored at the same physical address of memory 39 (FIG. 1) (translated from the effective address), processor 10 updates memory 39 by copying the modified information from data cache 16 to that same physical address within memory 39.
Control logic 100 applies a least recently used ("LRU") replacement policy when storing new information into instruction cache 14. In this regard, data cache 16 is substantially identical to instruction cache 14. Unlike instruction cache 14, data cache 16 also supports write operations by processor 10 to data cache 16. Processor 10 can perform write operations on a byte, half-word, single-word or doubleword basis. Moreover, processor 10 can complete an entire read-modify-write operation to data cache 16 within a single cycle of processor 10. Data cache 16 selectively operates in either a write-back or a write-through mode, and it implements control of cacheability, write policy and memory coherency on a per-page and per-block basis.
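A small sketch of a least-recently-used replacement policy for one 4-way set, of the kind control logic 100 applies when storing new information; the age-counter scheme and all names are illustrative assumptions, not taken from the patent.

#define WAYS 4

/* age[w] == 0 marks the most recently used way; larger values are older. */
struct lru_set { unsigned age[WAYS]; };

/* Pick the victim way (largest age) for a line fill. */
unsigned lru_victim(const struct lru_set *s)
{
    unsigned victim = 0;
    for (unsigned w = 1; w < WAYS; w++)
        if (s->age[w] > s->age[victim])
            victim = w;
    return victim;
}

/* On an access to way `used`, make it most recent and age the rest. */
void lru_touch(struct lru_set *s, unsigned used)
{
    for (unsigned w = 0; w < WAYS; w++)
        if (w != used && s->age[w] < s->age[used])
            s->age[w]++;      /* only ways younger than `used` grow older */
    s->age[used] = 0;
}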
Although an exemplary embodiment and its advantages have been described in detail above, the description is by way of illustration and example only and is not limiting. Various changes, substitutions and alterations can be made to the exemplary embodiment without departing from the breadth, scope and spirit of the invention.

Claims (20)

1. A method of reducing power consumption in electronic circuitry in a processor system, comprising:
while a load circuit operates in a full power mode, loading at most N data bits from a memory during each cycle of said load circuit, wherein N is an integer and N>1; and
while said load circuit operates in a low power mode, loading at most M data bits from said memory during each cycle of said load circuit, wherein M is an integer and 0<M<N.
2. The method of claim 1, wherein said load circuit enters operation in said low power mode in response to a software event.
3. The method of claim 1, wherein said load circuit enters operation in said low power mode in response to a hardware event.
4. The method of claim 3, wherein said hardware event occurs when a temperature of said load circuit exceeds a threshold temperature.
5. The method of claim 4, wherein said threshold temperature is a maximum safe temperature for operation of said load circuit in said full power mode.
6. The method of claim 1, wherein said memory is a cache memory.
7. The method of claim 6, wherein said cache memory is a data cache.
8. The method of claim 6, further comprising disabling, while said load circuit operates in said low power mode, the groups of sense amplifiers coupled to cells of said cache memory that store data bits other than said M data bits.
9. The method of claim 1, wherein said N data bits are a double-precision floating-point operand.
10. The method of claim 1, wherein said M data bits are a single-precision floating-point operand.
11. Circuitry for reducing power consumption in electronic circuitry in a processor system, comprising:
a load circuit operable:
while said load circuit operates in a full power mode, to load N data bits from a memory during each cycle of said load circuit, wherein N is an integer and N>1; and
while said load circuit operates in a low power mode, to load M data bits from said memory during each cycle of said load circuit, wherein M is an integer and 0<M<N.
12. The circuitry of claim 11, wherein said load circuit enters operation in said low power mode in response to a software event.
13. The circuitry of claim 11, wherein said load circuit enters operation in said low power mode in response to a hardware event.
14. The circuitry of claim 13, wherein said hardware event occurs when a temperature of said load circuit exceeds a threshold temperature.
15. The circuitry of claim 14, wherein said threshold temperature is a maximum safe temperature for operation of said load circuit in said full power mode.
16. The circuitry of claim 11, wherein said memory is a cache memory.
17. The circuitry of claim 16, wherein said cache memory is a data cache.
18. The circuitry of claim 16, further comprising circuitry for disabling, while said load circuit operates in said low power mode, the groups of sense amplifiers coupled to cells of said cache memory that store data bits other than said M data bits.
19. The circuitry of claim 11, wherein said N data bits are a double-precision floating-point operand.
20. The circuitry of claim 11, wherein said M data bits are a single-precision floating-point operand.
CN97117939A 1996-10-04 1997-09-03 System and method for reducing power consumption in electronic circuit Expired - Fee Related CN1091274C (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US72639696A 1996-10-04 1996-10-04
US726396 1996-10-04
US726,396 1996-10-04

Publications (2)

Publication Number Publication Date
CN1180194A CN1180194A (en) 1998-04-29
CN1091274C true CN1091274C (en) 2002-09-18

Family

ID=24918440

Family Applications (1)

Application Number Title Priority Date Filing Date
CN97117939A Expired - Fee Related CN1091274C (en) 1996-10-04 1997-09-03 System and method for reducing power consumption in electronic circuit

Country Status (5)

Country Link
JP (1) JP3048978B2 (en)
KR (1) KR100260865B1 (en)
CN (1) CN1091274C (en)
GB (1) GB2317975B (en)
SG (1) SG64433A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1050801B1 (en) * 1999-05-03 2006-12-13 STMicroelectronics S.A. An instruction supply mechanism
GB2539038B (en) 2015-06-05 2020-12-23 Advanced Risc Mach Ltd Processing pipeline with first and second processing modes having different performance or energy consumption characteristics

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5241680A (en) * 1989-06-12 1993-08-31 Grid Systems Corporation Low-power, standby mode computer
US5420808A (en) * 1993-05-13 1995-05-30 International Business Machines Corporation Circuitry and method for reducing power consumption within an electronic circuit
US5452401A (en) * 1992-03-31 1995-09-19 Seiko Epson Corporation Selective power-down for high performance CPU/system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2297398B (en) * 1995-01-17 1999-11-24 Advanced Risc Mach Ltd Accessing cache memories

Also Published As

Publication number Publication date
KR100260865B1 (en) 2000-07-01
JPH10124203A (en) 1998-05-15
GB2317975B (en) 2001-09-12
CN1180194A (en) 1998-04-29
JP3048978B2 (en) 2000-06-05
KR19980032289A (en) 1998-07-25
GB2317975A (en) 1998-04-08
GB9716260D0 (en) 1997-10-08
SG64433A1 (en) 1999-04-27

Similar Documents

Publication Publication Date Title
CN1099076C (en) System and method for reducing power consumption in electronic circuit
CN1157658C (en) System and method for reducing power consumption in electronic circuit
JP2793488B2 (en) Method and system for dispatching multiple instructions in a single cycle in a superscalar processor system
EP1390835B1 (en) Microprocessor employing a performance throttling mechanism for power management
US8423750B2 (en) Hardware assist thread for increasing code parallelism
US8381004B2 (en) Optimizing energy consumption and application performance in a multi-core multi-threaded processor system
US7657708B2 (en) Methods for reducing data cache access power in a processor using way selection bits
US5694565A (en) Method and device for early deallocation of resources during load/store multiple operations to allow simultaneous dispatch/execution of subsequent instructions
US8145887B2 (en) Enhanced load lookahead prefetch in single threaded mode for a simultaneous multithreaded microprocessor
JP2777535B2 (en) Method and system for indexing allocation of intermediate storage buffers in a superscalar processor system
CN102362257A (en) Tracking deallocated load instructions using a dependence matrix
US7650465B2 (en) Micro tag array having way selection bits for reducing data cache access power
US20080229065A1 (en) Configurable Microprocessor
CN1224871A (en) Method and system for handling multiple store instruction completions in processing system
US20080229058A1 (en) Configurable Microprocessor
CN1091274C (en) System and method for reducing power consumption in electronic circuit
CN1095559C (en) System and method for reducing power consumption in electronic circuit
US6895497B2 (en) Multidispatch CPU integrated circuit having virtualized and modular resources and adjustable dispatch priority
Diefendorff History of the PowerPC architecture
US6988121B1 (en) Efficient implementation of multiprecision arithmetic
Fujiwara et al. A custom processor for the multiprocessor system ASCA
US7827389B2 (en) Enhanced single threaded execution in a simultaneous multithreaded microprocessor
EP4185953A1 (en) Register renaming for power conservation
Marsala et al. PowerPC processors

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20020918