CN101916181A - Microprocessor and manner of execution thereof - Google Patents

Publication number
CN101916181A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2010102603448A
Other languages
Chinese (zh)
Other versions
CN101916181B (en)
Inventor
G. Glenn Henry
Rodney E. Hooker
Colin Eddy
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Via Technologies Inc
Original Assignee
Via Technologies Inc
Priority claimed from US12/781,210 (US8392693B2)
Application filed by Via Technologies Inc
Publication of CN101916181A
Application granted
Publication of CN101916181B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/3004 Arrangements for executing specific machine instructions to perform operations on memory
    • G06F9/30047 Prefetch instructions; cache control instructions

Abstract

Microprocessor and manner of execution thereof. The microprocessor comprises a cache and a grabline instruction. The grabline instruction specifies a memory address that implicates a cache line of the memory. The grabline instruction instructs the microprocessor to initiate a zero-beat read-invalidate transaction on the bus to obtain ownership of the cache line. If the microprocessor determines, when it executes the grabline instruction, that a store to the cache line would cause an exception, the microprocessor foregoes initiating the zero-beat read-invalidate transaction on the bus.

Description

Microprocessor and manner of execution thereof
Technical field
The present invention relates to the instruction set architecture of a microprocessor, and in particular to instructions that store a string of data to memory.
Background technology
Programs commonly use the REP STOS instruction of the x86 instruction set to scrub memory, for example to fill memory with zeros or to write a large amount of identical data to a video buffer. The amount of data to write, specified in the ECX register, may be relatively large, so that many cache lines or even many pages of memory are written. The goal for the processor is to perform the writes as fast as possible. Generally, the memory being written has the write-back memory trait, that is, it is writeable and cacheable. A REP STOS instruction executes faster if the memory region being written hits in the cache than if it misses. This is because the processor must allocate each missing cache line, that is, gain ownership of the cache line and read it from memory into the cache, which is relatively time consuming.
Summary of the invention
In one aspect, the present invention provides a microprocessor coupled to a memory by a bus. The microprocessor comprises a cache and a grabline instruction. The grabline instruction specifies a memory address that implicates a cache line of the memory. The grabline instruction instructs the microprocessor to initiate a zero-beat read-invalidate transaction on the bus to obtain ownership of the cache line. If the microprocessor determines, when it executes the grabline instruction, that a store to the cache line would cause an exception, the microprocessor foregoes initiating the zero-beat read-invalidate transaction on the bus.
In another aspect, the present invention provides a method performed by a microprocessor coupled to a memory by a bus. The method comprises receiving a grabline instruction for execution. The grabline instruction specifies a memory address that implicates a cache line of the memory. The method also comprises determining, in response to receiving the grabline instruction, whether a store to the cache line would cause an exception. The method also comprises initiating a zero-beat read-invalidate transaction on the bus to obtain ownership of the cache line if an access to the cache line would not cause an exception, and foregoing initiating the zero-beat read-invalidate transaction on the bus if an access to the cache line would cause an exception.
Description of drawings
Fig. 1 shows a microprocessor according to an embodiment of the present invention;
Fig. 2 is a flowchart of the operation of the microprocessor of Fig. 1; and
Fig. 3A-Fig. 3D are a flowchart of the operation of the microprocessor of Fig. 1.
[Description of main reference numerals]
Fig. 1:
100~microprocessor; 102~instruction cache;
104~instruction translator;
106~register alias table (RAT);
108~reservation stations;
112~execution units and memory subsystem;
114~retire unit; 116~reorder buffer (ROB);
118~microcode unit; 122~fill queue;
124~data cache;
126~bus interface unit (BIU);
128~control logic; 132~macroinstruction;
134~processor bus; 136~microinstruction;
138~registers (ECX/EDI);
142~fast REP STOS microcode routine;
Fig. 2:
202, 204, 206, 208, 212, 214, 216, 218, 222, 224, 226~steps;
Fig. 3A-Fig. 3D:
302, 304, 306, 308, 312, 314, 316, 318, 322, 324, 326, 328, 332, 334, 336, 338, 342, 344, 346, 348, 352, 354~steps.
Embodiment
To make the above objects, features, and advantages of the present invention more readily apparent, a preferred embodiment is described in detail below with reference to the accompanying drawings.
One way to speed up the REP STOS instruction (also referred to as a repeat string store instruction) is to allocate the cache lines of the memory region before the data is actually stored. However, the present inventors recognized that, for each entire cache line implicated by the string store, the cache line data from system memory is not needed, because the processor will store to the entire cache line. Therefore, rather than performing a normal bus cycle to obtain exclusive ownership of the cache line, the microprocessor 100 (of Fig. 1) performs a zero-beat read-invalidate transaction on the processor bus 134 (of Fig. 1), which is faster because the transaction has no data cycles and requires no actual access to memory. Furthermore, because the microprocessor 100 knows that it will write the entire cache line with data from the REP STOS instruction, it can perform the read-invalidate transaction ahead of the actual stores, so that the data cache 124 (of Fig. 1) already owns the cache line by the time the store instructions reach it.
However, obtaining ownership of a cache line without its data from memory creates the possibility of data corruption and/or a processor hang, so these problems must be solved in order to use the read-invalidate transaction, as described below. For example, because the microprocessor 100 will have only ownership of the cache line, and not its actual data, it must not perform the read-invalidate transaction unless it knows that it will store to the entire cache line.
As another example, the REP STOS instruction has the architectural requirement that it be performed as a loop of individual STOS instructions. Therefore, if a STOS instruction causes an exception, the architectural state must reflect where the exception occurred. In particular, the ECX register must reflect how many loop iterations remain to be performed, and the EDI register must reflect the memory address that caused the exception. These requirements greatly complicate the use of big stores and read-invalidate transactions.
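The architectural requirement above can be illustrated with a small software model. The following is only a sketch, not the patented microcode: the function name `rep_stos`, the `fault_addr` parameter, and the use of a Python `bytearray` as memory are all assumptions made for illustration, and the model assumes the store address always moves forward.

```python
def rep_stos(memory, edi, ecx, value, size=1, fault_addr=None):
    """Model REP STOS as a loop of individual STOS iterations.

    memory     -- bytearray standing in for system memory (illustrative)
    edi, ecx   -- starting destination address and iteration count
    size       -- element size in bytes (1, 2, or 4)
    fault_addr -- hypothetical address at which a store would raise an exception
    Returns (edi, ecx, faulted): the architectural state when the loop stops.
    """
    data = value.to_bytes(size, "little")
    while ecx != 0:
        # If this iteration would touch the faulting address, stop before it.
        # EDI then names the faulting element and ECX the iterations left.
        if fault_addr is not None and edi <= fault_addr < edi + size:
            return edi, ecx, True
        memory[edi:edi + size] = data  # one STOS iteration
        edi += size                    # advance the destination
        ecx -= 1                       # one fewer iteration remaining
    return edi, ecx, False
```

Any faster implementation, including one built from big stores and grabline operations, has to leave ECX and EDI exactly as this simple loop would whenever an exception or interrupt is taken.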
Referring now to Fig. 1, a microprocessor 100 according to an embodiment of the present invention is shown. The microprocessor 100 includes an instruction cache 102 that caches program instructions, such as the x86 REP STOS instruction, which are also referred to as macroinstructions 132. The microprocessor 100 also includes an instruction translator 104 that translates the macroinstructions 132 into microinstructions for execution by a plurality of execution units 112 of the microprocessor 100. When the instruction translator 104 encounters certain complex macroinstructions 132, such as the REP STOS instruction, it transfers control to a microcode unit 118.
The microcode unit 118 includes a microcode read-only memory (ROM) (not shown) that stores microcode routines, made up of microinstructions 136, that implement the macroinstructions 132. In particular, the microcode unit includes a fast microcode routine 142 that implements the REP STOS macroinstruction 132. The microcode routine 142 includes conventional store instructions to store the data specified by the REP STOS instruction. The microcode ROM also includes a routine (not shown) that implements the REP STOS instruction in the conventional manner, that is, without using grabline operations (described below) and without using stores that are larger than the size specified by the REP STOS instruction. The fast microcode routine 142 also includes a special microinstruction 136, referred to as a grabline operation, that instructs the memory subsystem 112 of the microprocessor 100 to direct the bus interface unit 126 of the microprocessor 100 to obtain ownership of the cache line implicated by the memory address specified by the grabline operation, by performing a zero-beat read-invalidate transaction on the bus 134.
The microprocessor 100 also includes a register alias table (RAT) 106, which receives the microinstructions from the instruction translator 104 and the microcode unit 118 in program order, generates instruction dependency information, and dispatches the instructions to a plurality of reservation stations 108. The reservation stations 108 issue the instructions to the execution units 112 of the microprocessor 100 when the instructions are ready to execute. A plurality of registers 138, which include the architectural registers and temporary registers of the microprocessor 100, provide operands to the execution units 112 along with the instructions. In particular, the registers 138 include the ECX register and the EDI register used by the REP STOS instruction.
The execution units and memory subsystem 112 include units commonly found in a microprocessor, such as integer units, floating-point units, SIMD units, load/store units, and branch units (not shown). The memory subsystem 112 includes the data cache 124, a fill queue 122, a bus interface unit (BIU) 126, and control logic 128. The fill queue 122 includes a plurality of entries that hold cache lines received from system memory for allocation into the data cache 124. The operation of the memory subsystem 112 is described in detail below.
The microprocessor 100 also includes a retire unit 114, which retires instructions in program order, as indicated by the position of each instruction in a reorder buffer (ROB) 116 of the microprocessor 100.
Referring now to Fig. 2, a flowchart of the operation of the microprocessor 100 of Fig. 1 is shown. Flow begins at step 202.
At step 202, the instruction translator 104 encounters a large REP STOS instruction and transfers control to the fast REP STOS microcode routine 142. In one embodiment, a "large" REP STOS instruction is one whose ECX value is greater than or equal to 128. In one embodiment, transferring control to the microcode routine 142 implicitly disables the microprocessor 100 from taking interrupts until the microcode routine 142 explicitly allows them. In one embodiment, the microcode routine 142 includes a special micro-operation that enables interrupts to be taken during its execution, and another special micro-operation that disables interrupts from being taken during its execution. Flow proceeds to step 204.
At step 204, the microcode routine 142 performs small stores, beginning at the initial memory location specified by the REP STOS instruction, until it reaches a cache line boundary. The region covered by these small stores leading up to the cache line boundary is referred to herein as the "head". The small stores are byte, word, doubleword, and larger stores of up to half the size of a cache line. In one embodiment, a cache line is 64 bytes. Flow proceeds to step 206.
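As an illustration of step 204, the sketch below breaks the head into small stores for a 64-byte cache line. The text only says that the small stores range from one byte up to half a cache line; the rule used here (the largest naturally aligned power-of-two store that still fits) is an assumption chosen for illustration, as is the function name `head_stores`.

```python
CACHE_LINE = 64  # bytes, per the 64-byte embodiment


def head_stores(addr, total_bytes, max_store=CACHE_LINE // 2):
    """Return a list of (address, size) small stores covering the head.

    The head runs from addr up to the next cache line boundary, or to the
    end of the string if that comes first. The size selection rule is an
    illustrative assumption, not taken from the patent.
    """
    remaining = min((-addr) % CACHE_LINE, total_bytes)
    stores = []
    while remaining:
        size = 1
        # Grow the store while it stays aligned, fits, and is at most
        # half a cache line.
        while (size * 2 <= max_store
               and addr % (size * 2) == 0
               and size * 2 <= remaining):
            size *= 2
        stores.append((addr, size))
        addr += size
        remaining -= size
    return stores
```

For a string starting at 0x1003, this yields stores of 1, 4, 8, 16, and 32 bytes, after which the next store address, 0x1040, is cache-line aligned.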
At step 206, the microcode routine 142 performs N grabline operations ahead of the corresponding stores that will fill the grabbed cache lines. In one embodiment, N is six, which is slightly less than the total number of entries in the fill queue 122. Each grabline operation instructs the memory subsystem 112 to request the bus interface unit 126 to perform a read-invalidate transaction on the processor bus 134 for the entire cache line implicated by the memory address specified by the grabline operation. Advantageously, the read-invalidate transaction on the processor bus 134 (also referred to as a zero-beat read) neither accesses system memory nor transfers data on the processor bus 134. This uses the processor bus 134 more efficiently than a store that misses in the data cache 124 and triggers a cache line fill, since such a store requires a transaction on the processor bus 134 and an access to system memory to read the cache line. However, it requires the microprocessor 100 to guarantee that it fills the cache line with valid data, because the valid data from system memory is never read into the microprocessor 100. Specifically, the microprocessor 100 must guarantee that exceptions and interrupts do not prevent the filling of each grabbed cache line. In addition, each grabline operation is performed sufficiently ahead of the stores that fill its cache line that the time required to perform the associated zero-beat read transaction on the bus 134 can overlap with the execution of other instructions between the grabline operation and the corresponding stores. That is, the microcode routine 142 is designed so that each grabline operation is far enough ahead of its associated stores to create a high likelihood that, by the time the memory subsystem 112 executes the associated stores, the read-invalidate transaction on the bus 134 will have completed and ownership of the implicated cache line will have been obtained, so that the associated stores hit in the data cache 124. In one embodiment, the bus interface unit 126 performs the read-invalidate transaction immediately, that is, the transaction may be performed before the grabline operation is retired by the retire unit 114. Moreover, a flush of the microprocessor 100 pipeline, for example in response to a determination that a branch instruction was mispredicted or that a load operation missed in the data cache 124, does not cancel a grabline operation that was executed speculatively. Therefore, to avoid leaving a grabbed cache line unfilled, the microcode routine 142 must be designed so that grabline operations are never executed speculatively. Flow proceeds to step 208.
At step 208, the microcode routine 142 performs enough big stores (for example, four 16-byte stores) to fill one cache line, and performs one grabline operation. A "big" store is a store that is larger than the individual byte, word, or doubleword size specified by the REP STOS instruction. In one embodiment, a big store is 16 bytes. Note that the stores performed in a given instance of step 208 do not fill the cache line grabbed by the grabline operation of that same instance; rather, they fill a cache line grabbed by an earlier grabline operation. Flow proceeds to decision step 212.
At decision step 212, the microcode routine 142 determines whether only N cache lines remain to be stored to satisfy the REP STOS instruction, where N is the number of cache lines grabbed in advance at step 206. If so, flow proceeds to step 218; otherwise, flow proceeds to decision step 214.
At decision step 214, the microcode routine 142 determines whether it needs to allow interrupts. Architecturally, REP STOS must allow interrupts during its execution. In one embodiment, to satisfy this architectural requirement, the microcode routine 142 allows interrupts each time 46 cache lines of data have been written. In one embodiment, the loop shown in the flowchart through steps 208, 212, and 214 is unrolled in the microcode routine 142 to improve performance. If it is time to allow interrupts, flow proceeds to step 216; otherwise, flow returns to step 208.
At step 216, the microcode routine 142 updates the architectural state of the microprocessor 100 to reflect how many STOS iterations of the REP STOS instruction it has performed. In particular, the microcode routine 142 updates the ECX and EDI values to satisfy the architectural requirement described above. As with exceptions, when an interrupt is taken, the architectural state must reflect where the microprocessor 100 was in the execution of the REP STOS instruction when the interrupt occurred. Unless interrupts are handled carefully, that is, allowed only at controlled times, an interrupt can lead to incorrect architectural state and/or a hang, where the hang is associated with failing to fill a cache line whose ownership was obtained through a zero-beat read-invalidate transaction. After a brief window has been provided in which pending interrupts can be taken, interrupts are disabled again and flow loops back to step 206 to perform additional advance grabline operations.
At step 218, the microcode routine 142 performs big stores to fill the last N cache lines grabbed by the grabline operations at steps 206 and/or 208. Flow proceeds to decision step 222.
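The ordering of grabline operations relative to the stores in steps 206, 208, 212, and 218 can be sketched as a simple schedule generator. This is an illustrative model only: the function name `grabline_schedule` and the tuple representation are assumptions, each `store_line` entry stands for the group of big stores that fills one cache line, and the interrupt windows of steps 214 and 216 are left out.

```python
def grabline_schedule(num_lines, n_ahead=6):
    """Return the program-order sequence of operations for num_lines
    full cache lines, with grablines running n_ahead lines in front.
    """
    ops = []
    prefetched = min(n_ahead, num_lines)
    for line in range(prefetched):            # step 206: N grablines up front
        ops.append(("grabline", line))
    next_grab = prefetched
    for line in range(num_lines):
        ops.append(("store_line", line))      # steps 208 and 218: fill a line
        if next_grab < num_lines:             # step 208: grab one more line
            ops.append(("grabline", next_grab))
            next_grab += 1
    return ops
```

In this model the number of grabbed but unfilled lines never exceeds `n_ahead`, which fits the statement that N is kept slightly below the number of fill queue entries.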
At decision step 222, the microcode routine 142 determines whether any bytes of the REP STOS instruction remain to be stored. If so, flow proceeds to step 224; otherwise, flow proceeds to step 226.
At step 224, the microcode routine 142 performs small stores to finish storing the last bytes of the REP STOS instruction. The region covered by these small stores, which follows the last filled cache line, is referred to herein as the "tail". Flow proceeds to step 226.
At step 226, the microcode routine 142 updates the architectural state (ECX/EDI) to reflect the completion of the REP STOS instruction. Flow ends at step 226.
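Taken together, the flow of Fig. 2 divides the destination region into a head, a run of whole cache lines, and a tail. The arithmetic can be sketched as follows, assuming the 64-byte cache line of the embodiment above; the function name `split_region` is hypothetical.

```python
CACHE_LINE = 64  # bytes, per the 64-byte embodiment


def split_region(addr, nbytes):
    """Split a string store of nbytes starting at addr into
    (head_bytes, full_lines, tail_bytes).

    head_bytes -- bytes before the first cache line boundary (step 204)
    full_lines -- whole cache lines filled by big stores (steps 206-218)
    tail_bytes -- leftover bytes after the last whole line (step 224)
    """
    head = min((-addr) % CACHE_LINE, nbytes)
    full_lines = (nbytes - head) // CACHE_LINE
    tail = nbytes - head - full_lines * CACHE_LINE
    return head, full_lines, tail
```

For example, a 1000-byte store starting at 0x1003 splits into a 61-byte head, 14 whole cache lines, and a 43-byte tail. Only the whole lines are candidates for grabline operations, since those are the lines that will be written in full.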
Referring now to Fig. 3A-Fig. 3D, a flowchart of the operation of the microprocessor 100 of Fig. 1 is shown. Flow begins at step 302.
At step 302, a grabline operation (for example, one performed at step 206 or 208 of Fig. 2) reaches the memory subsystem 112, which examines it. In one embodiment, the memory address specified by a grabline operation performed at step 206 or 208 designates a memory location at or near the end of the cache line. This enables the memory subsystem 112 to check for various exception conditions that may occur at or near the end of a cache line but not near its beginning, such as segment limit violations or breakpoints. Flow proceeds to decision step 304.
At decision step 304, the memory subsystem 112 determines whether an exception condition associated with the grabline operation exists. The exception conditions may include, but are not limited to, the following: a segment limit violation anywhere within the cache line specified by the grabline operation; a page fault on the memory page that contains the cache line; a debug breakpoint; the memory trait of the cache line is unknown (for example, a translation lookaside buffer (TLB) miss); the page is not yet usable by stores (that is, the page table walk that marks the page as dirty has not yet been performed); the memory trait of the cache line is something other than write-back (WB) or write-combine (WC). Because the memory subsystem 112 hardware checks for these conditions, if no exception condition exists (and the memory trait is WB), the microcode routine 142 can advantageously continue at full speed with grabline operations ahead of their corresponding big stores, even across page boundaries. If an exception condition exists, flow proceeds to step 328; otherwise, flow proceeds to decision step 306.
At decision step 306, the memory subsystem 112 determines whether the memory trait of the cache line is write-combine. If so, flow proceeds to step 308; otherwise, flow proceeds to decision step 312.
At step 308, the memory subsystem 112 no-ops the grabline operation. That is, the memory subsystem 112 does not perform the read-invalidate transaction (as at step 316) and does not allocate an entry in the data cache 124 or the fill queue 122 (as at step 314). Furthermore, the memory subsystem 112 does not mark the grabline operation as an exception. Consequently, the subsequent big stores performed at steps 208 and 218 are issued to the write-combine buffers (not shown) of the microprocessor 100, and subsequent grabline operations are likewise no-oped without raising an exception. Thus, a REP STOS instruction to a WC region of memory can enjoy the performance benefit of the big stores even though it does not enjoy the benefit of the grabline operations. Flow ends at step 308.
At decision step 312, the memory subsystem 112 determines whether the memory trait of the cache line is write-back. If so, flow proceeds to step 314; otherwise, flow proceeds to step 328.
At step 314, the memory subsystem 112 allocates an entry in the data cache 124 and an entry in the fill queue 122 for the grabbed cache line. Flow proceeds to step 316.
At step 316, the bus interface unit 126 performs a zero-beat read-invalidate transaction on the bus 134 to obtain exclusive ownership of the cache line without reading any data from system memory. Flow proceeds to step 318.
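The decisions of steps 304 through 316 amount to a three-way dispatch on the exception check and the memory trait. The sketch below is a simplified software model of that dispatch, not the hardware logic: the `Action` enum, the string trait names, and the single `exception_condition` flag (standing in for all the checks of step 304) are assumptions made for illustration.

```python
from enum import Enum


class Action(Enum):
    ZERO_BEAT = "allocate entries, then zero-beat read-invalidate"
    NO_OP = "no-op, not marked as an exception"
    INTERNAL_EXCEPTION = "no-op, marked as an internal exception"


def handle_grabline(memory_trait, exception_condition=False):
    """Decide what the memory subsystem does with one grabline operation.

    memory_trait        -- "WB", "WC", or any other trait name
    exception_condition -- True if any check of step 304 fails
    """
    if exception_condition:        # step 304 -> step 328
        return Action.INTERNAL_EXCEPTION
    if memory_trait == "WC":       # step 306 -> step 308
        return Action.NO_OP
    if memory_trait == "WB":       # step 312 -> steps 314 and 316
        return Action.ZERO_BEAT
    return Action.INTERNAL_EXCEPTION  # step 312 -> step 328
```

Only the write-back case ever reaches the bus, which matches the requirement that ownership without data is taken only for lines the routine is certain to fill.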
At step 318, a store performed at step 208 or 218 of Fig. 2 reaches the memory subsystem 112, and the memory subsystem 112 writes the store data to the cache line allocated in the data cache 124. Flow proceeds to step 322.
At step 322, as the store data of step 318 is written to the data cache 124, the fill queue 122 entry maintains a byte mask of the bytes that have been written with valid store data. The byte mask identifies which bytes are valid for a subsequent load operation that hits the cache line. Flow proceeds to step 324.
At step 324, if a snoop operation hits the fill queue 122 entry associated with the grabline operation, the bus interface unit 126 retries the snoop until the cache line has been fully populated with valid data from the stores. Flow proceeds to step 326.
At step 326, once all bytes of the cache line have been filled with valid data, the fill queue 122 entry deallocates itself, which causes the cache line to finally be retired into the data cache 124. Flow ends at step 326.
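The byte mask bookkeeping of steps 322 through 326 can be modeled with one integer per fill queue entry, one bit per byte of the cache line. This is a sketch under stated assumptions: the class name `FillQueueEntry`, its method names, and the returned strings are hypothetical, the line is assumed to be 64 bytes, and the model tracks only validity, not the store data itself.

```python
CACHE_LINE = 64  # bytes, per the 64-byte embodiment
FULL_MASK = (1 << CACHE_LINE) - 1


class FillQueueEntry:
    """Tracks which bytes of a grabbed cache line hold valid store data."""

    def __init__(self, line_addr):
        self.line_addr = line_addr
        self.valid = 0          # bit i set means byte i has been written
        self.allocated = True

    def store(self, offset, size):
        """Record a store of size bytes at offset within the line (step 322)."""
        assert 0 <= offset and offset + size <= CACHE_LINE
        self.valid |= ((1 << size) - 1) << offset
        if self.valid == FULL_MASK:
            self.allocated = False  # step 326: entry deallocates itself

    def load_is_valid(self, offset, size):
        """True if every byte a load needs has already been written."""
        need = ((1 << size) - 1) << offset
        return (self.valid & need) == need

    def snoop(self):
        """Step 324: retry snoops while the line is only partly filled."""
        return "retry" if self.allocated else "normal"
```

After four 16-byte stores at offsets 0, 16, 32, and 48, the mask is full, the entry is released, and a snoop is no longer retried.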
At step 328, the memory subsystem 112 no-ops the grabline operation. That is, the memory subsystem 112 does not perform the read-invalidate transaction (as at step 316) and does not allocate an entry in the data cache 124 or the fill queue 122 (as at step 314). However, the memory subsystem 112 marks the grabline operation as an exception. Consequently, the instructions that are newer in program order than the excepting grabline operation are flushed, for example newer big stores performed at steps 208 and 218 and newer grabline operations performed at steps 206 and 208. In particular, the memory subsystem 112 marks the excepting grabline operation as an internal exception rather than an architectural exception, as described in detail with respect to steps 332 through 348. Flow proceeds to step 332.
At step 332, the grabline operation becomes ready to retire, and the retire unit 114 detects that it is marked as an exception. The retire unit 114 therefore flushes all instructions newer than the excepting grabline operation and transfers control to an exception handler of the fast REP STOS microcode routine 142. In one embodiment, the memory subsystem 112 sets a bit to indicate that a grabline operation caused the internal exception, and the microcode routine 142 reads the bit to detect this condition. Flow proceeds to step 334.
At step 334, the microcode exception handler detects that the exception was caused by a grabline operation and fills any remaining portion of the "head" (if any) with small stores. In one embodiment, the grabline operations performed at step 206 of Fig. 2 actually precede, in program order, the small stores performed at step 204. Therefore, one of the grabline operations of step 206 may except before the head has been completely filled. Flow proceeds to step 336.
At step 336, the microcode exception handler determines how many cache lines were grabbed by grabline operations that preceded the excepting grabline operation, that is, how many grabbed cache lines have not yet been filled. The microcode exception handler then fills the unfilled cache lines by performing big stores, similar to those of step 208. As discussed above, the memory subsystem 112 and the microcode routine 142 together guarantee that no architectural exception or interrupt is taken (that is, control is not transferred to the operating system) until the architectural state is correct and every cache line grabbed by a grabline operation has been filled. Flow proceeds to step 338.
At step 338, the microcode exception handler "tickles" the cache line that was the target of the grabline operation that caused the internal exception. That is, the microcode exception handler executes an instruction that directs the memory subsystem 112 to perform essentially all the functions associated with a store, without actually writing store data to memory. In particular, the memory subsystem 112 performs all of the exception checks that it performs for a store. For example, the exception checks may include, but are not limited to: segment limit violations, page faults, debug breakpoints, and a non-writeable cache line. If necessary to obtain the page table information (including memory trait information) associated with the cache line, the tickle instruction performs a page table walk. Flow proceeds to decision step 342.
In decision step 342, the memory subsystem 112 determines whether the tickle performed in step 338 caused an architectural exception condition. If so, flow proceeds to step 354; otherwise, flow proceeds to step 344.
In step 344, the microcode exception handler loads the memory trait of the cache line. Flow proceeds to decision step 346.
In decision step 346, the microcode exception handler examines the loaded memory trait. If the memory trait is write-back (WB) or write-combine (WC), flow proceeds to step 352; otherwise, flow proceeds to step 348.
In step 348, the microcode exception handler updates the architectural state (e.g., ECX/EDI) to reflect the last retired store operation, clears the exception, and reverts to slow string store mode. That is, the microcode exception handler transfers control to a microcode routine (not shown) in the microcode unit 118 that executes the REP STOS instruction in the conventional manner: a loop of STOS operations of the size specified by the REP STOS instruction (byte, word, or doubleword, as opposed to the big size), without grabline operations, and with interrupts allowed after each STOS operation. The checks in decision steps 304 and 346 advantageously allow the microprocessor 100 to perform fast string stores while avoiding cached operations to a memory region that has a non-cacheable trait, for which writes of the size specified by the original REP STOS instruction must be performed on the bus 134. For example, a memory-mapped I/O device may be mapped to a location in a memory region with a non-cacheable trait because (1) the programmer wants the stores to actually go out on the bus rather than to the cache memory, and (2) the programmer wants the I/O device to be written with the size specified by the program (e.g., a byte-size write to a byte-size control register on the I/O device) rather than with a larger store. Therefore, if the REP STOS transitions from a cacheable region (in which big stores are permitted) to a non-cacheable region (in which big stores are not permitted), the microprocessor 100 advantageously stops performing cacheable big writes and begins performing non-cacheable small writes. Flow ends at step 348.
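The slow string store mode of step 348 can be illustrated with a minimal Python sketch. It is a software model written for this explanation, not the microcode itself: the function name `slow_rep_stos`, the `interrupt_after` parameter, and the use of a `bytearray` as memory are assumptions introduced here. The sketch issues one store of the instruction-specified size per iteration and keeps the EDI/ECX values valid after every store, so an interrupt can be taken between STOS operations and the instruction resumed afterward.

```python
from typing import Optional, Tuple


def slow_rep_stos(mem: bytearray, edi: int, ecx: int, value: bytes,
                  interrupt_after: Optional[int] = None) -> Tuple[int, int]:
    """Model of slow string store mode (hypothetical helper).

    Stores `value` (1, 2, or 4 bytes) at `edi`, `ecx` times, using one
    element-sized store per iteration. If `interrupt_after` is given,
    the loop stops after that many stores, modeling an interrupt taken
    between STOS operations. Returns the updated (edi, ecx).
    """
    size = len(value)
    done = 0
    while ecx > 0:
        if interrupt_after is not None and done == interrupt_after:
            break  # interrupt point: EDI/ECX already reflect the last retired store
        mem[edi:edi + size] = value  # one store of the specified size
        edi += size
        ecx -= 1
        done += 1
    return edi, ecx
```

Calling the function once with `interrupt_after=2` and then again with the returned `edi` and `ecx` completes the same fill as a single uninterrupted call, which is the property that makes interrupts after each STOS operation safe in this mode.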
In step 352, the microcode exception handler continues in fast string store mode. That is, flow returns to step 206 of Fig. 2. Flow ends at step 352.
In step 354, the architectural exception causes flow to transfer to another exception handler in the microcode routine 142. This handler updates the architectural state, clears the exception condition, and performs a plurality of small store operations similar to those performed by the slow string store code in step 348; one of these small store operations will again cause the architectural exception that the tickle caused. In particular, this exception handler does not allow interrupts, in order to guarantee that data is stored to the memory region between the beginning of the cache line that caused the tickle exception and the location within that cache line that actually causes the exception. When the architectural exception handler is invoked in response to the small store operation, it handles the architectural exception normally. This is acceptable because the microcode exception handler has already updated the architectural state and every outstanding grabbed cache line has been filled. Flow ends at step 354.
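The control flow of steps 336 through 354 can be summarized in a short Python sketch. This is only an illustrative model under stated assumptions: the names `Trait`, `handle_grabline_exception`, and the action strings are hypothetical, and the real handler is microcode operating on hardware state rather than a function returning a list.

```python
from enum import Enum
from typing import List, Tuple


class Trait(Enum):
    WB = "write-back"
    WC = "write-combine"
    UC = "uncacheable"  # stands in for any trait other than WB or WC


def handle_grabline_exception(unfilled_lines: List[int],
                              tickle_faults: bool,
                              trait: Trait) -> Tuple[List[str], str]:
    """Return (actions, outcome) for the flow of steps 336-354 (hypothetical model)."""
    actions = []
    # Step 336: fill every cache line that was grabbed but not yet filled.
    for line in unfilled_lines:
        actions.append(f"big stores fill line {line:#x}")
    # Step 338: tickle the line whose grabline caused the internal exception.
    actions.append("tickle faulting line")
    # Step 342: did the tickle raise an architectural exception condition?
    if tickle_faults:
        # Step 354: small stores, interrupts disabled, until the exception recurs.
        actions += ["update architectural state", "clear exception",
                    "small stores until architectural exception"]
        return actions, "architectural-exception"
    # Steps 344/346: load and examine the memory trait of the line.
    if trait in (Trait.WB, Trait.WC):
        return actions, "fast"  # step 352: back to fast string store mode
    # Step 348: fall back to element-sized stores without grablines.
    actions += ["update architectural state", "clear exception"]
    return actions, "slow"
```

Note that the fill of outstanding grabbed lines always comes first in this model, which mirrors the guarantee that control never reaches system software while a grabbed line is still unfilled.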
As may be observed from Fig. 3 and as discussed above, in order to reap the performance benefit, the grabline operations are performed on the bus 134 well before the corresponding store operations become architectural (i.e., well before the store operations actually retire). If the memory subsystem 112 detects that a grabline would cause an abnormal condition (e.g., an exception or one of the other conditions noted above), the memory subsystem 112 causes the grabline to generate an internal exception. The internal exception enables the microcode exception handler to determine that it was a grabline that caused the exception, so that the handler can execute the special grabline handling code. The memory subsystem 112 generates an internal exception, rather than an architectural exception, so that the outstanding grabbed lines can be filled with stores (e.g., in step 336). Otherwise, a machine hang could occur. Therefore, if the entire REP STOS instruction goes to WB memory regions and no abnormal condition occurs, the embodiments of the microprocessor 100 described herein may advantageously store the entire data string specified by the REP STOS instruction (except for the head and the tail) at effectively the maximum rate at which the processor bus 134 and the memory subsystem can accommodate the grablines and big stores, without slowing down over the length of the string.
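The decision the memory subsystem makes for each grabline can be sketched as follows. This is a hedged model assembled from the conditions recited in this document (a store that would cause an exception, a write-combine trait, a trait that is neither write-combine nor write-back, a translation lookaside buffer miss, and a missing page table walk). The names `LineInfo` and `execute_grabline`, and the order in which the conditions are tested, are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Trait(Enum):
    WB = "write-back"
    WC = "write-combine"
    UC = "uncacheable"  # stands in for any trait other than WB or WC


@dataclass
class LineInfo:
    tlb_hit: bool            # address translation found in the TLB
    page_walk_done: bool     # page table walk already performed for the page
    trait: Trait             # memory trait of the target cache line
    store_would_fault: bool  # a store to the line would cause an exception


def execute_grabline(info: LineInfo) -> Tuple[bool, bool]:
    """Return (start_bus_transaction, raise_internal_exception)."""
    if not info.tlb_hit or not info.page_walk_done:
        return False, True   # translation unknown: forgo and trap to microcode
    if info.store_would_fault:
        return False, True   # forgo the transaction, raise an internal exception
    if info.trait is Trait.WC:
        return False, False  # write-combine: forgo the transaction only
    if info.trait is not Trait.WB:
        return False, True   # neither WC nor WB: forgo and trap to microcode
    return True, False       # WB and clean: zero-beat read-invalidate on the bus
```

Only the last branch touches the bus, so in this model ownership of a cache line is requested only when the later big store is expected to complete without an exception.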
Although embodiments have been described in which grabline operations are used to perform fast string stores (REP STOS), other embodiments are contemplated in which grabline operations are used to perform fast string moves (REP MOVS). That is, the grabline operations are performed sufficiently ahead of the stores associated with the MOVS operations that the stores hit in the cache memory when they are performed, which improves the performance of REP MOVS.
Various embodiments of the present invention have been described herein, but those skilled in the art should understand that they are presented by way of example only, and not limitation. Those skilled in the art may make various changes in form and detail without departing from the spirit of the invention. For example, software can enable the function, fabrication, modeling, simulation, description, and/or testing of the apparatus and methods described in the embodiments. This can be accomplished with general programming languages (e.g., C, C++), hardware description languages (HDL), including Verilog HDL, VHDL, and so on, or other available programming languages. Such software can be disposed in any known computer-usable medium, such as a semiconductor, a magnetic disk, or an optical disc (e.g., CD-ROM, DVD-ROM, and so on). Embodiments of the apparatus and methods of the invention may be included in a semiconductor intellectual property core, such as a microprocessor core (e.g., embodied in HDL), and transformed into hardware in the production of integrated circuits. In addition, the apparatus and methods of the invention may be embodied as a combination of hardware and software. Therefore, the invention should not be limited to the disclosed embodiments, but should be defined in accordance with the appended claims and their equivalents. In particular, the invention may be implemented in a microprocessor device used in a general-purpose computer. Finally, although the invention has been disclosed above by way of preferred embodiments, they are not intended to limit its scope; those skilled in the art may make minor changes and refinements without departing from the spirit and scope of the invention, so the scope of protection of the invention is defined by the appended claims.

Claims (29)

1. A microprocessor coupled to a memory by a bus, the microprocessor comprising:
A cache memory; and
A grabline instruction that specifies a memory address associated with a cache line of the memory, wherein the grabline instruction instructs the microprocessor to initiate a zero-beat read-invalidate transaction on the bus to obtain ownership of the cache line;
Wherein, if the microprocessor determines that a store operation to the cache line would cause an exception, the microprocessor forgoes initiating the zero-beat read-invalidate transaction on the bus when it executes the grabline instruction.
2. The microprocessor of claim 1, further comprising:
A microcode unit implementing an architectural instruction, wherein the architectural instruction instructs the microprocessor to repeatedly store a data string to a plurality of adjacent locations in the memory specified by the architectural instruction;
Wherein the adjacent locations in the memory collectively comprise a plurality of cache lines, and the microcode unit comprises a plurality of grabline instructions that specify a plurality of memory addresses associated with the cache lines; and
Wherein the microcode unit further comprises a plurality of store instructions that fill the cache lines with the data string.
3. The microprocessor of claim 2, wherein the microcode unit is further configured to detect a condition in which the microprocessor flushed one or more of the store instructions before retiring them, such that one or more of the cache lines whose ownership was obtained by the microprocessor via the zero-beat read-invalidate transaction were not filled with the data string by the flushed store instructions.
4. The microprocessor of claim 3, wherein the microcode unit is further configured, in response to detecting the condition, to fill with the data string the one or more cache lines that were not filled with the data string by the flushed store instructions.
5. The microprocessor of claim 4, wherein the microcode unit, in response to detecting the condition, further stores the data string to a head portion of the adjacent locations in the memory specified by the architectural instruction, the head portion comprising the locations beginning at the first location in the memory specified by the architectural instruction up to, but not including, the first of the plurality of cache lines.
6. The microprocessor of claim 3,
Wherein the microprocessor flushed the one or more store instructions in response to one of the grabline instructions indicating that a store to one of the cache lines would cause an exception; and
Wherein the flushed store instructions are newer in program order than the grabline instruction that indicated that the store to the one of the cache lines would cause an exception.
7. The microprocessor of claim 6, wherein, if the store to the one of the cache lines would cause an architectural exception, the microcode unit stores the data string to a plurality of adjacent locations within the one of the cache lines until the architectural exception is generated.
8. The microprocessor of claim 7, wherein the architectural instruction specifies a string size of the data string to be repeatedly stored to the adjacent locations in the memory, and the microcode unit stores the data string to the plurality of adjacent locations within the one of the cache lines via a plurality of string-size store instructions until the architectural exception is generated.
9. The microprocessor of claim 2, wherein the microcode unit is configured such that the grabline instructions are sufficiently ahead in time of the store instructions to create a high likelihood that the microprocessor will have obtained ownership of each cache line before the respective store instruction attempts to fill that cache line with the data string.
10. The microprocessor of claim 2, wherein the architectural instruction specifies a size of the data string to be repeatedly stored to the adjacent locations in the memory, and each of the store instructions writes more bytes than the size of the data string specified by the architectural instruction.
11. The microprocessor of claim 2, wherein the microprocessor further guarantees that, during execution of the architectural instruction, control is not transferred to system software until each of the cache lines whose ownership was obtained by the microprocessor via the zero-beat read-invalidate transaction has been filled with the data string.
12. The microprocessor of claim 1, wherein, if the microprocessor determines that the cache line has a write-combine memory trait, the microprocessor further forgoes initiating the zero-beat read-invalidate transaction on the bus when it executes the grabline instruction.
13. The microprocessor of claim 1, wherein, if the memory trait of the cache line is neither write-combine nor write-back, the microprocessor further forgoes initiating the zero-beat read-invalidate transaction on the bus and generates a non-architectural exception when it executes the grabline instruction.
14. The microprocessor of claim 1, wherein, if the memory address of the grabline instruction misses in a translation lookaside buffer of the microprocessor, the microprocessor further forgoes initiating the zero-beat read-invalidate transaction on the bus and generates a non-architectural exception when it executes the grabline instruction.
15. The microprocessor of claim 1, wherein, if a page table walk has not yet been performed for a memory page that includes the cache line associated with the memory address of the grabline instruction, the microprocessor further forgoes initiating the zero-beat read-invalidate transaction on the bus and generates a non-architectural exception when it executes the grabline instruction.
16. The microprocessor of claim 1, further comprising:
A microcode unit configured to implement an architectural instruction, wherein the architectural instruction instructs the microprocessor to move a data string from a first region of the memory to a second region of the memory, and the second region of the memory collectively comprises a plurality of cache lines;
Wherein the microcode unit comprises a plurality of grabline instructions that specify a plurality of memory addresses associated with the cache lines; and
Wherein the microcode unit further comprises a plurality of store instructions that fill the cache lines with a portion of the data string.
17. An execution method performed by a microprocessor coupled to a memory by a bus, the execution method comprising:
Receiving a grabline instruction for execution, the grabline instruction specifying a memory address associated with a cache line of the memory;
In response to receiving the grabline instruction, determining whether a store operation to the cache line would cause an exception;
If an access to the cache line would not cause an exception, initiating a zero-beat read-invalidate transaction on the bus to obtain ownership of the cache line; and
If the access to the cache line would cause an exception, forgoing initiating the zero-beat read-invalidate transaction on the bus.
18. The execution method of claim 17, further comprising:
Decoding an architectural instruction, wherein the architectural instruction instructs the microprocessor to repeatedly store a data string to a plurality of adjacent locations in the memory specified by the architectural instruction; and
In response to decoding the architectural instruction, executing a microcode unit of the microprocessor, wherein the microcode unit comprises a plurality of grabline instructions that specify a plurality of memory addresses associated with the cache lines;
Wherein the microcode unit further comprises a plurality of store instructions that fill the cache lines with the data string.
19. The execution method of claim 17, further comprising:
Detecting a condition in which the microprocessor flushed one or more of the store instructions before retiring them, such that one or more of the cache lines whose ownership was obtained by the microprocessor via the zero-beat read-invalidate transaction were not filled with the data string by the flushed store instructions.
20. The execution method of claim 19, further comprising:
In response to detecting the condition, filling with the data string the one or more cache lines that were not filled with the data string by the flushed store instructions.
21. The execution method of claim 19, wherein the microprocessor flushed the one or more store instructions in response to one of the grabline instructions indicating that a store to one of the cache lines would cause an exception; and
Wherein the flushed store instructions are newer in program order than the grabline instruction that indicated that the store to the one of the cache lines would cause an exception.
22. The execution method of claim 21, further comprising:
Determining that the store to the one of the cache lines would cause an architectural exception; and
In response to determining that the store to the one of the cache lines would cause the architectural exception, storing the data string to a plurality of adjacent locations within the one of the cache lines until the architectural exception is generated.
23. The execution method of claim 18, wherein the microcode unit is configured such that the grabline instructions are sufficiently ahead in time of the store instructions to create a high likelihood that the microprocessor will have obtained ownership of each cache line before the respective store instruction attempts to fill that cache line with the data string.
24. The execution method of claim 18, further comprising:
Guaranteeing that, during execution of the architectural instruction, control is not transferred to system software until each of the cache lines whose ownership was obtained by the microprocessor via the zero-beat read-invalidate transaction has been filled with the data string.
25. The execution method of claim 17, further comprising:
If the cache line has a write-combine memory trait, forgoing initiating the zero-beat read-invalidate transaction on the bus when executing the grabline instruction.
26. The execution method of claim 17, further comprising:
If the memory trait of the cache line is neither write-combine nor write-back, forgoing initiating the zero-beat read-invalidate transaction on the bus and generating a non-architectural exception when executing the grabline instruction.
27. The execution method of claim 17, further comprising:
If the memory address of the grabline instruction misses in a translation lookaside buffer of the microprocessor, forgoing initiating the zero-beat read-invalidate transaction on the bus and generating a non-architectural exception when executing the grabline instruction.
28. The execution method of claim 17, further comprising:
If a page table walk has not yet been performed for a memory page that includes the cache line associated with the memory address of the grabline instruction, forgoing initiating the zero-beat read-invalidate transaction on the bus and generating a non-architectural exception when executing the grabline instruction.
29. The execution method of claim 17, further comprising:
Decoding an architectural instruction, wherein the architectural instruction instructs the microprocessor to move a data string from a first region of the memory to a second region of the memory, and the second region of the memory collectively comprises a plurality of cache lines; and
In response to decoding the architectural instruction, executing microcode of the microprocessor, wherein the microcode comprises a plurality of grabline instructions that specify a plurality of memory addresses associated with the cache lines;
Wherein the microcode further comprises a plurality of store instructions that fill the cache lines with a portion of the data string.
CN201010260344.8A 2009-08-28 2010-08-20 Microprocessor and execution method thereof Active CN101916181B (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US23791709P 2009-08-28 2009-08-28
US61/237,917 2009-08-28
US12/781,210 US8392693B2 (en) 2009-08-28 2010-05-17 Fast REP STOS using grabline operations
US12/781,210 2010-05-17

Publications (2)

Publication Number Publication Date
CN101916181A true CN101916181A (en) 2010-12-15
CN101916181B CN101916181B (en) 2014-04-23

Family

ID=43323700

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201010260344.8A Active CN101916181B (en) 2009-08-28 2010-08-20 Microprocessor and execution method thereof

Country Status (1)

Country Link
CN (1) CN101916181B (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6212629B1 (en) * 1989-02-24 2001-04-03 Advanced Micro Devices, Inc. Method and apparatus for executing string instructions
US5404473A (en) * 1994-03-01 1995-04-04 Intel Corporation Apparatus and method for handling string operations in a pipelined processor
US6212601B1 (en) * 1996-08-30 2001-04-03 Texas Instruments Incorporated Microprocessor system with block move circuit disposed between cache circuits
CN1225690C (en) * 2003-02-11 2005-11-02 智慧第一公司 Fast fetch allocation and initial apparatus and method

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104049903A (en) * 2013-03-14 2014-09-17 辉达公司 Migration scheme for unified virtual memory system
CN104049904A (en) * 2013-03-14 2014-09-17 辉达公司 Page state directory for managing unified virtual memory
CN104049904B (en) * 2013-03-14 2017-07-14 辉达公司 For the system and method for the page status catalogue for managing unified virtual memory
US11487673B2 (en) 2013-03-14 2022-11-01 Nvidia Corporation Fault buffer for tracking page faults in unified virtual memory system
US11741015B2 (en) 2013-03-14 2023-08-29 Nvidia Corporation Fault buffer for tracking page faults in unified virtual memory system

Also Published As

Publication number Publication date
CN101916181B (en) 2014-04-23

Similar Documents

Publication Publication Date Title
US8392693B2 (en) Fast REP STOS using grabline operations
US5465336A (en) Fetch and store buffer that enables out-of-order execution of memory instructions in a data processing system
US7133969B2 (en) System and method for handling exceptional instructions in a trace cache based processor
US8069336B2 (en) Transitioning from instruction cache to trace cache on label boundaries
JP3753743B2 (en) Method and apparatus for memory data aliasing in advanced processors
EP2336878B1 (en) Method and medium storing instructions for efficient load processing using buffers
US6266768B1 (en) System and method for permitting out-of-order execution of load instructions
US20120117335A1 (en) Load ordering queue
JP3659877B2 (en) Superscaler processing system and method for efficiently preventing errors caused by write-after-write data hazards
CN101984403A (en) Microprocessor and its executing method
US20140375658A1 (en) Processor Core to Graphics Processor Task Scheduling and Execution
US8645588B2 (en) Pipelined serial ring bus
JP4624988B2 (en) System and method for preventing instances of operations executing in a data speculation microprocessor from interrupting operation replay
CN101916181B (en) Microprocessor and execution method thereof
US8799628B2 (en) Early branch determination
US8255602B2 (en) Effective mixing real-time software with a non-real-time operating system
US9400655B2 (en) Technique for freeing renamed registers
US9405545B2 (en) Method and apparatus for cutting senior store latency using store prefetching
US6134645A (en) Instruction completion logic distributed among execution units for improving completion efficiency
US7043626B1 (en) Retaining flag value associated with dead result data in freed rename physical register with an indicator to select set-aside register instead for renaming
JP3621116B2 (en) Conversion memory protector for advanced processors
US7321964B2 (en) Store-to-load forwarding buffer using indexed lookup
US7197630B1 (en) Method and system for changing the executable status of an operation following a branch misprediction without refetching the operation
US6711670B1 (en) System and method for detecting data hazards within an instruction group of a compiled computer program
US7900023B2 (en) Technique to enable store forwarding during long latency instruction execution

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant