CN101916181B - Microprocessor and execution method thereof - Google Patents

Publication number
CN101916181B
Authority
CN
China
Prior art keywords
microprocessor
storage
cache row
cache
line
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201010260344.8A
Other languages
Chinese (zh)
Other versions
CN101916181A (en)
Inventor
G. Glenn Henry
Rodney E. Hooker
Colin Eddy
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Via Technologies Inc
Original Assignee
Via Technologies Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US12/781,210 external-priority patent/US8392693B2/en
Application filed by Via Technologies Inc filed Critical Via Technologies Inc
Publication of CN101916181A publication Critical patent/CN101916181A/en
Application granted granted Critical
Publication of CN101916181B publication Critical patent/CN101916181B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/3004 Arrangements for executing specific machine instructions to perform operations on memory
    • G06F9/30047 Prefetch instructions; cache control instructions

Abstract

A microprocessor and an execution method thereof are disclosed. The microprocessor comprises a cache memory and a grabline instruction. The grabline instruction specifies a memory address that implicates a cache line of the memory, and instructs the microprocessor to initiate a zero-beat read-invalidate transaction on the bus to obtain ownership of the cache line. If, when it executes the grabline instruction, the microprocessor determines that a store to the cache line would generate an exception, the microprocessor foregoes initiating the zero-beat read-invalidate transaction on the bus.

Description

Microprocessor and execution method thereof
Technical field
The present invention relates to the instruction set architecture of a microprocessor, and in particular to instructions that store a data string to memory.
Background
Programs commonly use the REP STOS instruction of the x86 instruction set to scrub memory, for example to fill memory with zeroes or to write a large amount of identical data to a video buffer. When the amount of data to write, specified in register ECX, is relatively large, many cache lines or even many pages of memory are written. The goal is for the processor to perform the writes as quickly as possible. Typically, the memory being written has the write-back memory trait, that is, it is both writable and cacheable. A REP STOS instruction executes faster when the memory area being written hits in the cache memory than when it misses. This is because, on a miss, the processor must allocate the missing cache line, gain ownership of it, and read it from memory into the cache memory, which takes a relatively long time.
Summary of the invention
In one aspect, the invention provides a microprocessor coupled to a memory by a bus. The microprocessor comprises a cache memory and a grabline instruction. The grabline instruction specifies a memory address that implicates a cache line of the memory. The grabline instruction instructs the microprocessor to initiate a zero-beat read-invalidate transaction on the bus to obtain ownership of the cache line. If, when it executes the grabline instruction, the microprocessor determines that a store to the cache line would generate an exception, the microprocessor foregoes initiating the zero-beat read-invalidate transaction on the bus.
In another aspect, the invention provides an execution method performed by a microprocessor that is coupled to a memory by a bus. The method comprises receiving a grabline instruction for execution. The grabline instruction specifies a memory address that implicates a cache line of the memory. The method also comprises determining, in response to receiving the grabline instruction, whether a store to the cache line would generate an exception. The method also comprises initiating a zero-beat read-invalidate transaction on the bus to obtain ownership of the cache line if an access to the cache line would not generate an exception, and foregoing initiating the zero-beat read-invalidate transaction on the bus if an access to the cache line would generate an exception.
Brief description of the drawings
Fig. 1 shows a microprocessor according to an embodiment of the invention;
Fig. 2 is a flowchart of the operation of the microprocessor of Fig. 1; and
Figs. 3A-3D are a flowchart of the operation of the microprocessor of Fig. 1.
[Description of main reference numerals]
Fig. 1:
100~microprocessor; 102~instruction cache;
104~instruction translator;
106~register alias table (RAT);
108~reservation stations;
112~execution units and memory subsystem;
114~retire unit; 116~reorder buffer (ROB);
118~microcode unit; 122~fill queue;
124~data cache;
126~bus interface unit (BIU);
128~control logic; 132~macroinstruction;
136~microinstruction;
134~processor bus; 138~registers (ECX/EDI);
142~fast REP STOS microcode routine.
Fig. 2:
202, 204, 206, 208, 212, 214, 216, 218, 222, 224, 226~steps.
Figs. 3A-3D:
302, 304, 306, 308, 312, 314, 316, 318, 322, 324, 326, 328, 332, 334, 336, 338, 342, 344, 346, 348, 352, 354~steps.
Detailed description
To make the above objects, features, and advantages of the invention more readily understood, a preferred embodiment is described in detail below with reference to the accompanying drawings.
One way to speed up a REP STOS instruction (also referred to herein as a repeat string store instruction) is to allocate the cache lines of the memory region before the data is actually stored to them. However, the present inventors recognized that for each cache line that the string store implicates in its entirety, the cache line's data from system memory is not needed, because the processor is going to store to the entire cache line. Therefore, rather than performing a normal bus cycle to obtain exclusive ownership of the cache line, the microprocessor 100 (of Fig. 1) performs a zero-beat read-invalidate transaction on the processor bus 134 (of Fig. 1), which is faster because the transaction has no data cycles and requires no actual memory access. Furthermore, because the microprocessor 100 knows that it will write the entire cache line with data from the REP STOS instruction, it can perform the read-invalidate transaction well before the actual store, so that the cache line is already owned by the time the store reaches the data cache 124 (of Fig. 1).
However, obtaining ownership of a cache line without its data from memory creates the possibility of data corruption and/or a processor hang, so these problems must be solved in order to use the read-invalidate transaction, as described below. For example, because the microprocessor 100 will own the cache line but will not hold its actual data, it must not perform the read-invalidate transaction unless it knows that it will store to the entire cache line.
As another example, the REP STOS instruction carries the architectural requirement that it behave as a loop of individual STOS instructions. Therefore, if one of the STOS iterations generates an exception, the architectural state must reflect exactly where the exception occurred. In particular, register ECX must reflect how many loop iterations remain to be performed, and register EDI must reflect the memory address that caused the exception. This greatly complicates the use of large stores and read-invalidate transactions.
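The architectural requirement can be illustrated with a minimal Python sketch that models REP STOS as a loop of individual STOS iterations. This is not the patent's microcode: the names `rep_stos`, `PageFault`, and `writable` are hypothetical, memory is modeled as a `bytearray`, and the direction flag is assumed to be clear so that addresses ascend.

```python
class PageFault(Exception):
    """Hypothetical stand-in for any exception raised by a store."""


def rep_stos(mem, writable, ecx, edi, value, size=1):
    """Model REP STOS as ECX individual STOS iterations.

    mem      -- bytearray standing in for memory
    writable -- predicate: may the byte at this address be stored to?
    Returns the final (ecx, edi). On a fault, the exception carries the
    (ecx, edi) pair that the architecture must expose at that point.
    """
    pattern = value.to_bytes(size, "little")
    while ecx > 0:
        if not all(writable(edi + i) for i in range(size)):
            # ECX = iterations still to run, EDI = address that faulted.
            raise PageFault(ecx, edi)
        mem[edi:edi + size] = pattern
        edi += size   # assumes DF = 0, so addresses ascend
        ecx -= 1
    return ecx, edi
```

Any faster implementation, including one built from large stores and early ownership requests, has to leave ECX and EDI exactly as this element-by-element loop would.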
Referring now to Fig. 1, a microprocessor 100 according to an embodiment of the invention is shown. The microprocessor 100 includes an instruction cache 102 that caches program instructions, also referred to as macroinstructions 132, such as the x86 REP STOS instruction. The microprocessor 100 also includes an instruction translator 104 that translates the macroinstructions 132 into microinstructions for execution by the execution units 112 of the microprocessor 100. When the instruction translator 104 encounters certain complex macroinstructions 132, such as REP STOS, it transfers control to a microcode unit 118.
The microcode unit 118 includes a microcode read-only memory (ROM) (not shown) that stores microcode routines made up of microinstructions 136 that implement macroinstructions 132. In particular, the microcode unit includes a fast microcode routine 142 that implements the REP STOS macroinstruction 132. The microcode routine 142 includes conventional store instructions that store the data specified by the REP STOS instruction. The microcode ROM also includes a routine (not shown) that implements the REP STOS instruction in the conventional manner, that is, without using grabline operations (described below) and without using stores that are large relative to the size specified by the REP STOS instruction. The fast microcode routine 142 also includes a special microinstruction 136, referred to as a grabline operation, that instructs the memory subsystem 112 of the microprocessor 100 to direct the bus interface unit 126 to obtain ownership of the cache line implicated by the memory address that the grabline operation specifies, by performing a zero-beat read-invalidate transaction on the bus 134.
The microprocessor 100 also includes a register alias table (RAT) 106 that receives microinstructions from the instruction translator 104 and the microcode unit 118 in program order, generates instruction dependency information, and dispatches the instructions to reservation stations 108. The reservation stations 108 issue instructions to the execution units 112 of the microprocessor 100 when the instructions are ready for execution. Registers 138, which include the architectural registers and temporary registers of the microprocessor 100, supply instruction operands to the execution units 112. In particular, the registers 138 include the ECX and EDI registers used by the REP STOS instruction.
The execution units and memory subsystem 112 include units commonly found in a microprocessor, such as integer units, floating-point units, SIMD units, load/store units, and branch units (not shown). The memory subsystem 112 includes the data cache 124, a fill queue 122, a bus interface unit (BIU) 126, and control logic 128. The fill queue 122 includes a plurality of entries that hold cache lines received from system memory for allocation into the data cache 124. The operation of the memory subsystem 112 is described in detail below.
The microprocessor 100 also includes a retire unit 114 that retires instructions in program order, as indicated by their positions in a reorder buffer (ROB) 116 of the microprocessor 100.
Referring now to Fig. 2, a flowchart of the operation of the microprocessor 100 of Fig. 1 is shown. Flow begins at step 202.
At step 202, the instruction translator 104 encounters a large REP STOS instruction and transfers control to the fast REP STOS microcode routine 142. In one embodiment, a "large" REP STOS instruction is one whose ECX value is greater than or equal to 128. In one embodiment, transferring control to the microcode routine 142 implicitly disables the microprocessor 100 from taking interrupts until the microcode routine 142 explicitly allows them. In one embodiment, the microcode routine 142 includes a special micro-operation that enables interrupts to be taken during its execution and another special micro-operation that disables interrupts from being taken during its execution. Flow proceeds to step 204.
At step 204, the microcode routine 142 performs small stores at the initial memory location specified by the REP STOS instruction until it reaches a cache line boundary. The region covered by these small stores leading up to the cache line boundary is referred to herein as the "head". The small stores are byte, word, and doubleword stores, and stores of up to half the size of a cache line. In one embodiment, a cache line is 64 bytes. Flow proceeds to step 206.
At step 206, the microcode routine 142 performs N grabline operations ahead of the corresponding stores that will fill the grabbed cache lines. In one embodiment, N is 6, which is slightly less than the total number of fill queue 122 entries. Each grabline operation instructs the memory subsystem 112 to request that the bus interface unit 126 perform a read-invalidate transaction on the processor bus 134 for the entire cache line implicated by the memory address that the grabline operation specifies. Advantageously, this read-invalidate transaction (also referred to as a zero-beat read) does not access system memory and transfers no data on the processor bus 134. This makes more efficient use of the processor bus 134 than a store that misses in the data cache 124 and triggers a cache line fill, because such a store requires a transaction on the processor bus 134 and an access to system memory to read the cache line. However, it requires the microprocessor 100 to guarantee that it fills the cache line with valid data, because the valid data from system memory is never read into the microprocessor 100. Specifically, the microprocessor 100 must guarantee that exceptions and interrupts cannot prevent any grabbed cache line from being filled. In addition, each grabline operation is performed sufficiently ahead of the stores that fill its cache line that the time the processor bus 134 needs to perform the associated zero-beat read transaction can overlap with the execution of other instructions between the grabline operation and the corresponding stores. That is, the microcode routine 142 is designed so that each grabline operation sufficiently precedes its associated stores to create a high likelihood that, by the time the memory subsystem 112 performs those stores, the read-invalidate transaction on the bus 134 has already been performed and has obtained ownership of the implicated cache line, so that the associated stores hit in the data cache 124. In one embodiment, the bus interface unit 126 performs the read-invalidate transaction immediately; that is, the transaction may be performed before the grabline operation is retired by the retire unit 114. Consequently, a grabline operation that has been speculatively executed is not cancelled by a pipeline flush of the microprocessor 100, such as one performed in response to a mispredicted branch instruction or a load that misses in the data cache 124. Therefore, to avoid leaving a grabbed cache line unfilled, the microcode routine 142 must be designed so that grabline operations are never speculatively executed. Flow proceeds to step 208.
At step 208, the microcode routine 142 performs enough large stores (for example, four 16-byte stores) to fill one cache line, and performs one grabline operation. A "large" store is a store that is longer than the individual byte, word, or doubleword element specified by the REP STOS instruction. In one embodiment, the large stores are 16 bytes. Note that the stores performed in a given instance of step 208 do not fill the cache line grabbed by the grabline operation of that same instance; rather, they fill a cache line grabbed by a grabline operation performed earlier (at step 206 or in a previous instance of step 208). Flow proceeds to decision step 212.
At decision step 212, the microcode routine 142 determines whether N cache lines remain to be stored to satisfy the REP STOS instruction, where N is the number of cache lines pre-grabbed at step 206. If so, flow proceeds to step 218; otherwise, flow proceeds to decision step 214.
At decision step 214, the microcode routine 142 determines whether it needs to allow interrupts. Architecturally, REP STOS must allow interrupts during its execution. In one embodiment, to satisfy this architectural requirement, the microcode routine 142 allows interrupts each time 46 cache lines have been written. In one embodiment, the loop that the flowchart shows through steps 208, 212, and 214 is unrolled in the microcode routine 142 to improve performance. If it is time to allow interrupts, flow proceeds to step 216; otherwise, flow returns to step 208.
At step 216, the microcode routine 142 updates the architectural state of the microprocessor 100 to reflect how many STOS iterations of the REP STOS instruction it has performed. In particular, the microcode routine 142 updates the ECX and EDI values to satisfy the architectural requirement described above. As with exceptions, when an interrupt is taken the architectural state must reflect where the microprocessor 100 was in its execution of the REP STOS instruction at the time of the interrupt. Unless interrupts are handled carefully and allowed only at controlled times, they can cause incorrect architectural state and/or a hang, where the hang results from failing to fill a cache line whose ownership was obtained by a zero-beat read-invalidate transaction. After providing a brief window in which interrupts may occur, the microcode routine 142 disables interrupts again, and flow loops back to step 206 to perform more pre-grabs.
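The ECX/EDI update of step 216 amounts to simple arithmetic, which the following sketch spells out. It is illustrative only: `state_at_interrupt_window` is a hypothetical name, the direction flag is assumed to be clear, and the routine is assumed to open the window only after a whole number of STOS elements has been stored.

```python
def state_at_interrupt_window(ecx_start, edi_start, elem_size, bytes_stored):
    """Return the (ECX, EDI) the routine must expose before allowing interrupts.

    elem_size    -- operand size of the original STOS (1, 2, or 4 bytes)
    bytes_stored -- bytes written so far by head stores plus large stores
    """
    if bytes_stored % elem_size:
        raise ValueError("window must fall on a STOS element boundary")
    done = bytes_stored // elem_size          # completed STOS iterations
    if done > ecx_start:
        raise ValueError("stored more than the instruction asked for")
    return ecx_start - done, edi_start + bytes_stored
```

For instance, a doubleword REP STOS that started with ECX = 1024 and has filled 46 cache lines of 64 bytes has completed 736 iterations, so the window must expose ECX = 288 and an EDI advanced by 2944 bytes.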
At step 218, the microcode routine 142 performs large stores to fill the last N cache lines grabbed by the grabline operations of step 206 and/or 208. Flow proceeds to decision step 222.
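The ordering of steps 206, 208, 212, and 218 resembles a software pipeline in which ownership requests run N lines ahead of the stores. The sketch below generates that ordering for a run of full cache lines. It is a simplified model, not the microcode: `fast_stos_schedule` is a hypothetical name, cache lines are identified by index, and the interrupt windows of steps 214 and 216 are omitted.

```python
def fast_stos_schedule(num_lines, n_ahead=6):
    """Yield ("grab", i) / ("store", i) in the order steps 206/208/218 issue them.

    num_lines -- full cache lines covered by the REP STOS
    n_ahead   -- N, the number of lines grabbed ahead (6 in one embodiment)
    """
    n = min(n_ahead, num_lines)
    # Step 206: grab the first N lines before any large store.
    for i in range(n):
        yield ("grab", i)
    # Steps 208/212: fill the oldest grabbed line, then grab one more,
    # until only N lines remain to be stored.
    next_grab = n
    for i in range(num_lines - n):
        yield ("store", i)
        yield ("grab", next_grab)
        next_grab += 1
    # Step 218: fill the last N lines, all of which are already grabbed.
    for i in range(num_lines - n, num_lines):
        yield ("store", i)
```

In this model every line is grabbed before it is stored, and the number of grabbed-but-unfilled lines never exceeds N, which is consistent with N being chosen slightly below the number of fill queue entries.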
At decision step 222, the microcode routine 142 determines whether any bytes of the REP STOS instruction remain to be stored. If so, flow proceeds to step 224; otherwise, flow proceeds to step 226.
At step 224, the microcode routine 142 performs small stores to store the last bytes of the REP STOS instruction. The region that follows the last filled cache line and is covered by these small stores is referred to herein as the "tail". Flow proceeds to step 226.
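The division of a string store into head, full cache lines, and tail can be sketched as follows. The function name `split_region` is hypothetical, and the 64-byte line size is the value given for one embodiment.

```python
LINE = 64  # cache line size in bytes (one embodiment)


def split_region(start, nbytes, line=LINE):
    """Split [start, start + nbytes) into (head, full_lines, tail).

    head       -- bytes before the first cache-line boundary (step 204)
    full_lines -- whole cache lines filled by large stores (steps 208/218)
    tail       -- bytes after the last whole line (step 224)
    """
    head = min((-start) % line, nbytes)
    full_lines, tail = divmod(nbytes - head, line)
    return head, full_lines, tail
```

For example, `split_region(0x1010, 200)` gives a 48-byte head, 2 full lines, and a 24-byte tail. Only the full lines are candidates for grabline operations, because only they are guaranteed to be overwritten in their entirety.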
At step 226, the microcode routine 142 updates the architectural state (ECX/EDI) to reflect completion of the REP STOS instruction. Flow ends at step 226.
Referring now to Figs. 3A-3D, a flowchart of the operation of the microprocessor 100 of Fig. 1 is shown. Flow begins at step 302.
At step 302, a grabline operation (such as one performed at step 206 or 208 of Fig. 2) reaches the memory subsystem 112, which examines it. In one embodiment, the memory address specified by the grabline operations performed at steps 206 and 208 designates a location at or near the end of the cache line. This enables the memory subsystem 112 to check for abnormal conditions, such as segment limit violations or breakpoints, that may occur at or near the end of the cache line but not near its beginning. Flow proceeds to decision step 304.
At decision step 304, the memory subsystem 112 determines whether an abnormal condition associated with the grabline operation exists. Abnormal conditions include, but are not limited to: a segment limit violation anywhere within the cache line specified by the grabline operation; a page fault on the memory page that contains the cache line; a debug breakpoint; an unknown memory trait for the cache line (for example, a translation lookaside buffer (TLB) miss); a page that is not yet usable by stores (that is, the page table walk that marks the page dirty has not yet been performed); and a memory trait for the cache line other than write-back (WB) or write-combine (WC). Because the memory subsystem 112 hardware checks for these conditions, if no abnormal condition exists (and the memory trait is WB), the microcode routine 142 can advantageously proceed at full speed, issuing grabline operations ahead of their corresponding large stores even across page boundaries. If an abnormal condition exists, flow proceeds to step 328; otherwise, flow proceeds to decision step 306.
At decision step 306, the memory subsystem 112 determines whether the memory trait of the cache line is write-combine. If so, flow proceeds to step 308; otherwise, flow proceeds to decision step 312.
At step 308, the memory subsystem 112 treats the grabline operation as a no-op. That is, the memory subsystem 112 does not perform the read-invalidate transaction (as at step 316) and does not allocate an entry in the data cache 124 or the fill queue 122 (as at step 314). In addition, the memory subsystem 112 does not mark the grabline operation as an exception. Consequently, the subsequent large stores performed at steps 208 and 218 are issued into the write-combine buffers (not shown) of the microprocessor 100, and subsequent grabline operations are likewise treated as no-ops without generating an exception. Thus, a REP STOS instruction to a WC region of memory can still enjoy the performance benefit of the large stores, even though it does not enjoy the benefit of the grabline operations. Flow ends at step 308.
At decision step 312, the memory subsystem 112 determines whether the memory trait of the cache line is write-back. If so, flow proceeds to step 314; otherwise, flow proceeds to step 328.
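The three decisions at steps 304, 306, and 312 can be summarized as a small classification function. The sketch below is only a restatement of the flowchart in Python: the `Trait` and `Action` enumerations and the `classify_grabline` function are hypothetical names, and `UC` stands in for every memory trait other than WB and WC.

```python
from enum import Enum


class Trait(Enum):
    WB = "write-back"
    WC = "write-combine"
    UC = "uncacheable"      # stands in for every other trait
    UNKNOWN = "unknown"     # e.g. the TLB missed


class Action(Enum):
    ZERO_BEAT = "allocate entries, then zero-beat read-invalidate (314/316)"
    NO_OP = "no-op, no exception (308)"
    INTERNAL_EXCEPTION = "no-op, mark as internal exception (328)"


def classify_grabline(trait, abnormal=False):
    """Map a grabline operation to the action chosen by steps 304/306/312.

    abnormal -- True if step 304 found a segment limit violation, page
                fault, debug breakpoint, or a page not yet marked dirty.
    """
    if abnormal or trait is Trait.UNKNOWN:      # step 304
        return Action.INTERNAL_EXCEPTION
    if trait is Trait.WC:                       # step 306
        return Action.NO_OP
    if trait is Trait.WB:                       # step 312
        return Action.ZERO_BEAT
    return Action.INTERNAL_EXCEPTION            # any other trait
```

Only the write-back case with no abnormal condition ever reaches the bus, which is what keeps an ownership-only transaction from being issued for a line that might not be completely filled.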
At step 314, the memory subsystem 112 allocates an entry in the data cache 124 and an entry in the fill queue 122 for the grabbed cache line. Flow proceeds to step 316.
At step 316, the bus interface unit 126 performs the zero-beat read-invalidate transaction on the bus 134 to obtain exclusive ownership of the cache line without reading any data from system memory. Flow proceeds to step 318.
At step 318, a store performed at step 208 or 218 of Fig. 2 reaches the memory subsystem 112, which writes the store data into the cache line allocated in the data cache 124. Flow proceeds to step 322.
At step 322, as the store data of step 318 is written into the data cache 124, the fill queue 122 entry maintains a byte mask of the bytes for which valid store data has been written. The byte mask identifies which bytes are valid for a subsequent load that hits the cache line. Flow proceeds to step 324.
At step 324, if a snoop hits the fill queue 122 entry associated with the grabline operation, the bus interface unit 126 retries the snoop until the cache line has been fully populated with valid data from the stores. Flow proceeds to step 326.
At step 326, once all bytes of the cache line have been filled with valid data, the fill queue 122 entry deallocates itself, which causes the cache line to be finally retired into the data cache 124. Flow ends at step 326.
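The byte-mask bookkeeping of steps 322 through 326 can be modeled with one integer per fill queue entry. The class below is a hypothetical software model, not the hardware design: `FillQueueEntry` and its method names are invented for illustration, and the snoop result is reduced to the strings "retry" and "proceed".

```python
class FillQueueEntry:
    """Tracks which bytes of a grabbed cache line hold valid store data."""

    def __init__(self, line_size=64):
        self.line_size = line_size
        self.mask = 0            # bit i set => byte i is valid (step 322)

    def store(self, offset, length):
        """Record a store of `length` bytes at `offset` within the line."""
        if offset < 0 or length <= 0 or offset + length > self.line_size:
            raise ValueError("store must lie inside the cache line")
        self.mask |= ((1 << length) - 1) << offset

    def is_full(self):
        """True once every byte is valid; the entry may deallocate (step 326)."""
        return self.mask == (1 << self.line_size) - 1

    def snoop(self):
        """Answer a snoop that hits this entry (step 324)."""
        return "proceed" if self.is_full() else "retry"
```

With four 16-byte stores per line, the entry reports "retry" after the first three stores and becomes full on the fourth, at which point another agent can safely observe the line.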
At step 328, the memory subsystem 112 treats the grabline operation as a no-op. That is, the memory subsystem 112 does not perform the read-invalidate transaction (as at step 316) and does not allocate an entry in the data cache 124 or the fill queue 122 (as at step 314). However, the memory subsystem 112 marks the grabline operation as an exception. Consequently, instructions that are newer in program order than the excepting grabline operation are flushed, such as newer large stores performed at steps 208 and 218 and newer grabline operations performed at steps 206 and 208. Notably, the memory subsystem 112 marks the excepting grabline operation as an internal exception rather than an architectural exception, as described in detail with respect to steps 332 to 348. Flow proceeds to step 332.
At step 332, the grabline operation becomes ready to retire, and the retire unit 114 detects that it is marked as an exception. The retire unit 114 therefore flushes all instructions newer than the excepting grabline operation and transfers control to an exception handler in the fast REP STOS microcode routine 142. In one embodiment, the memory subsystem 112 sets a bit indicating that a grabline operation caused the internal exception, and the microcode routine 142 reads this bit to detect the condition. Flow proceeds to step 334.
At step 334, the microcode exception handler detects that the exception was caused by a grabline operation and fills any remaining portion of the "head" with small stores. In one embodiment, the grabline operations performed at step 206 of Fig. 2 actually precede, in program order, the small stores performed at step 204. Therefore, one of the grabline operations of step 206 may generate an exception before the head has been completely filled. Flow proceeds to step 336.
At step 336, the microcode exception handler determines how many cache lines were successfully grabbed by grabline operations that preceded the excepting one and have not yet been filled. The handler then fills those cache lines by performing large stores similar to those of step 208. As discussed above, the memory subsystem 112 and the microcode routine 142 together guarantee that no architectural exception or interrupt occurs (that is, control is not transferred to the operating system) until the architectural state is correct and every cache line grabbed by a grabline operation has been filled. Flow proceeds to step 338.
At step 338, the microcode exception handler "tickles" the cache line targeted by the grabline operation that caused the internal exception. That is, the handler executes an instruction that directs the memory subsystem 112 to perform essentially all the functions associated with a store without actually writing store data to memory. In particular, the memory subsystem 112 performs all of the exception checks that it performs for a store. These checks include, but are not limited to: a segment limit violation, a page fault, a debug breakpoint, and a non-writeable cache line. The tickle instruction also performs a page table walk, if necessary, to obtain the page table information (including the memory trait) associated with the cache line. Flow proceeds to decision step 342.
At decision step 342, the memory subsystem 112 determines whether the tickle performed at step 338 generated an architectural exception condition. If so, flow proceeds to step 354; otherwise, flow proceeds to step 344.
At step 344, the microcode exception handler loads the memory trait of the cache line. Flow proceeds to decision step 346.
At decision step 346, the microcode exception handler examines the loaded memory trait. If the memory trait is write-back (WB) or write-combine (WC), flow proceeds to step 352; otherwise, flow proceeds to step 348.
In step 348, the microcode exception handling routine updates the architectural state (for example, ECX/EDI) to reflect the last retired store operation, clears the exception, and reverts to slow string store mode. That is, the microcode exception handling routine transfers control to a microcode routine (not shown) in the microcode unit 118 that executes the REP STOS instruction in the conventional manner: it loops over STOS operations of the size specified by the REP STOS instruction (byte, word, or doubleword units, as opposed to larger stores), does not use grabline operations, and allows an interrupt after each STOS operation. The checks in decision steps 304 and 346 advantageously allow the microprocessor 100 to perform fast string stores where they are permitted and, for memory regions that have a non-cacheable trait, to perform writes on the bus 134 of the size specified by the original REP STOS instruction without performing grabline operations. For instance, a memory-mapped I/O device may be mapped to a location in a memory region that has a non-cacheable trait because (1) the programmer actually wants the stores to go out on the bus rather than to the cache memory, and (2) the programmer actually wants the I/O device to see stores of the size specified by the program (for example, a byte-sized write to a byte-sized control register on the I/O device) rather than a larger store operation. Therefore, if the REP STOS crosses from a cacheable region (in which large stores are allowed) into a non-cacheable region (in which large stores are not allowed), the microprocessor 100 advantageously stops performing cacheable, large writes and instead performs non-cacheable, small writes. Flow ends at step 348.
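The decision of step 346 and the fallback of step 348 can be sketched as follows. This is a minimal illustrative model, not the patented microcode: the names `MemTrait`, `choose_store_mode`, and `slow_string_store` are hypothetical, memory is modeled as a `bytearray`, only forward-direction stores are shown, and the `UC` trait is added only as an example of a trait that is neither WB nor WC.

```python
from enum import Enum


class MemTrait(Enum):
    """Memory traits; the text names WB and WC, UC is an illustrative extra."""
    WB = "write-back"
    WC = "write-combine"
    UC = "uncacheable"


def choose_store_mode(trait):
    """Decision step 346: stay in fast mode only for WB or WC memory."""
    return "fast" if trait in (MemTrait.WB, MemTrait.WC) else "slow"


def slow_string_store(memory, start, count, size, value):
    """Step 348 fallback: one store of the program-specified size per iteration.

    Returns the list of (address, size) writes, a stand-in for the writes
    that would appear on the bus. Forward direction only (an assumption).
    """
    pattern = value.to_bytes(size, "little")
    bus_writes = []
    for i in range(count):
        addr = start + i * size
        memory[addr:addr + size] = pattern
        bus_writes.append((addr, size))
        # A real implementation could take an interrupt here, after each STOS.
    return bus_writes
```

In this model, a byte-sized REP STOS to an uncacheable region produces one byte-sized write per iteration, which is what a byte-wide control register on a memory-mapped I/O device would expect to see.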
In step 352, the microcode exception handling routine continues in fast string store mode. That is, flow returns to step 206 of Fig. 2. Flow ends at step 352.
In step 354, the microcode exception handling routine treats the exception as an architectural exception and transfers flow to another exception handling routine in the microcode routine 142. That routine updates the architectural state, clears the exception condition, and performs multiple small store operations, similar to the stores performed by the slow string store code of step 348, until one of these small stores again causes the architectural exception that triggered the handler. In particular, to guarantee that data is stored to the memory locations between the start of the cache line that triggered the exception and the actual location within that cache line that causes the exception, this exception handling routine does not allow interrupts. When the architectural exception handling routine is invoked by one of the small store operations, it handles the architectural exception in the normal manner. This is acceptable because the microcode exception handling routine has already updated the architectural state and filled any cache lines that were grabbed but not yet filled. Flow ends at step 354.
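The small-store loop of step 354 can be illustrated with the sketch below. It is a simplified model under stated assumptions: the function name `small_stores_until_fault` and the `faults_at` predicate (standing in for whatever check raises the architectural exception, such as a limit violation) are hypothetical, and the 64-byte line size is an assumed value that the text does not specify.

```python
def small_stores_until_fault(memory, line_start, size, value, faults_at, line_size=64):
    """Store `value` in `size`-byte units from the start of a cache line.

    Stops at the first store for which `faults_at(addr, size)` is true and
    returns that address, modeling the point where the architectural
    exception would be raised. Returns None if no store in the line faults.
    """
    pattern = value.to_bytes(size, "little")
    for addr in range(line_start, line_start + line_size, size):
        if faults_at(addr, size):
            return addr  # the architectural exception is raised by this store
        memory[addr:addr + size] = pattern
    return None
```

For example, with a hypothetical limit at address 70 and word-sized stores beginning at address 64, the stores at 64, 66, and 68 complete and the store at 70 is the one that faults, so every location before the faulting one holds the string data when the architectural exception is taken.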
As may be observed from Fig. 3 and the discussion above, in order to obtain the performance benefit, the grabline operations are performed on the bus 134 well before the corresponding store operations become architectural (that is, well before the stores actually retire). If the memory subsystem 112 detects that a grabline would cause an abnormal condition (an exception or one of the other conditions indicated above), the memory subsystem 112 causes the grabline to generate an internal exception. The internal exception invokes the microcode exception handling routine, which determines that a grabline caused the exception so that the routine can execute the special grabline handling code. The memory subsystem 112 generates this internal exception, rather than an architectural exception, so that the cache lines that were grabbed but not yet filled can be filled by stores (for example, in step 336). Otherwise, a machine hang could occur. Therefore, if the entire REP STOS instruction goes to a WB memory region and no abnormal conditions occur, embodiments of the microprocessor 100 described herein may advantageously store the entire string specified by the REP STOS instruction (except for the head and tail portions) at effectively the maximum rate that the processor bus 134 and the memory subsystem can accommodate using grabline operations and large stores, without slowing down over the length of the string.
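The checks that a grabline undergoes before any bus transaction can be summarized in the following sketch. It is an interpretation assembled from the description and claims 1 and 12 to 15, not a definitive implementation: `GrablineResult` and `execute_grabline` are hypothetical names, memory traits are plain strings, and treating a WC line as "forgo the transaction without an internal exception" is an assumption drawn from the wording of claims 12 and 13.

```python
from dataclasses import dataclass


@dataclass
class GrablineResult:
    bus_transaction: bool     # zero-beat read-invalidate issued on the bus
    internal_exception: bool  # non-architectural exception handled by microcode


def execute_grabline(trait, tlb_hit, store_would_fault):
    """Model of the pre-checks performed when a grabline executes."""
    if not tlb_hit:
        # TLB miss / page table walk not yet done: no bus transaction.
        return GrablineResult(False, True)
    if store_would_fault:
        # A store to this line would cause an exception: no bus transaction.
        return GrablineResult(False, True)
    if trait == "WC":
        # Assumption: write-combine lines are simply not grabbed.
        return GrablineResult(False, False)
    if trait != "WB":
        # Neither WB nor WC: let microcode fall back to slow stores.
        return GrablineResult(False, True)
    return GrablineResult(True, False)
```

The point of the model is that an abnormal condition never surfaces directly as an architectural exception from the grabline itself; microcode gets control first and can fill the lines that were already grabbed.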
Although embodiments have been described that use grabline operations to perform fast string stores (REP STOS), embodiments are also contemplated that use grabline operations to perform fast string moves (REP MOVS). In these embodiments, the grabline operations are performed sufficiently ahead of the stores associated with the MOVS operations that the stores hit in the cache memory when they are performed, which improves the performance of REP MOVS.
Various embodiments of the present invention have been described herein, but those skilled in the art should understand that these embodiments are presented only as examples and not as limitations. Those skilled in the art can make various changes in form and detail without departing from the spirit of the invention. For example, software can enable the function, fabrication, modeling, simulation, description, and/or testing of the apparatus and methods described in the embodiments of the invention. This can be accomplished using general programming languages (for example, C and C++), hardware description languages (HDL) (including Verilog HDL, VHDL, and so on), or other available programming languages. Such software can be disposed in any known computer-usable medium, such as a semiconductor, a magnetic disk, or an optical disc (for example, CD-ROM or DVD-ROM). Embodiments of the apparatus and methods of the invention may be included in a semiconductor intellectual property core, such as a microprocessor core (embodied in HDL), and transformed into hardware in the production of integrated circuits. In addition, the apparatus and methods of the invention may be embodied as a combination of hardware and software. Therefore, the invention should not be limited to the disclosed embodiments, but should be defined in accordance with the appended claims and their equivalents. In particular, the invention may be implemented in a microprocessor device used in a general-purpose computer. Finally, although the invention has been disclosed above by way of preferred embodiments, they are not intended to limit the scope of the invention. Those skilled in the art may make minor changes and refinements without departing from the spirit and scope of the invention, and the scope of protection of the invention is therefore defined by the appended claims.

Claims (29)

1. A microprocessor, coupled to a memory by a bus, the microprocessor comprising:
a cache memory; and
a grabline instruction that specifies a memory address implicating a cache line of the memory, wherein the grabline instruction instructs the microprocessor to initiate a zero-beat read-invalidate transaction on the bus to obtain ownership of the cache line;
wherein, if the microprocessor determines that a store operation to the cache line would cause an exception, the microprocessor forgoes initiating the zero-beat read-invalidate transaction on the bus when it executes the grabline instruction.
2. The microprocessor as claimed in claim 1, further comprising:
a microcode unit comprising an architectural instruction, wherein the architectural instruction instructs the microprocessor to repeatedly store a data string to a plurality of adjacent locations in the memory specified by the architectural instruction;
wherein the adjacent locations in the memory collectively comprise a plurality of cache lines, and the microcode unit comprises a plurality of grabline instructions that specify a plurality of memory addresses, the memory addresses implicating the cache lines; and
wherein the microcode unit further comprises a plurality of store instructions that fill the cache lines with the data string.
3. The microprocessor as claimed in claim 2, wherein the microcode unit is further configured to detect a condition in which the microprocessor flushed one or more of the store instructions before retiring them, such that one or more of the cache lines whose ownership was obtained by the microprocessor through the zero-beat read-invalidate transaction were not filled with the data string by the flushed store instructions.
4. The microprocessor as claimed in claim 3, wherein, in response to detecting the condition, the microcode unit is further configured to fill with the data string the one or more cache lines that were not filled with the data string by the flushed store instructions.
5. The microprocessor as claimed in claim 4, wherein, in response to detecting the condition, the microcode unit also stores the data string to a head portion of the adjacent locations in the memory specified by the architectural instruction, the head portion comprising the locations in the memory beginning at the first location specified by the architectural instruction up to, but not including, the first cache line of the cache lines.
6. The microprocessor as claimed in claim 3,
wherein the microprocessor flushes the one or more store instructions in response to one of the grabline instructions indicating that a store to a cache line of the cache lines would cause an exception; and
wherein the flushed store instructions are newer in program order than the grabline instruction that indicated that a store to a cache line of the cache lines would cause an exception.
7. The microprocessor as claimed in claim 6, wherein, if the store to the cache line of the cache lines would cause an architectural exception, the microcode unit stores the data string to a plurality of adjacent locations of the cache line until the architectural exception is generated.
8. The microprocessor as claimed in claim 7, wherein the architectural instruction specifies a size of the data string that is repeatedly stored to the adjacent locations in the memory, and the microcode unit stores the data string to the plurality of adjacent locations of the cache line using a plurality of store instructions of the specified size until the architectural exception is generated.
9. The microprocessor as claimed in claim 2, wherein the microcode unit is configured such that the grabline instructions sufficiently precede the store instructions to establish a high likelihood that the microprocessor has obtained ownership of each cache line before the respective store instructions attempt to fill that cache line with the data string.
10. The microprocessor as claimed in claim 2, wherein the architectural instruction specifies a size of the data string that is repeatedly stored to the adjacent locations in the memory, and each of the store instructions writes more bytes than the size of the data string specified by the architectural instruction.
11. The microprocessor as claimed in claim 2, wherein the microprocessor further ensures that, during execution of the architectural instruction, control is not transferred to system software until each of the cache lines whose ownership was obtained by the microprocessor through the zero-beat read-invalidate transaction has been filled with the data string.
12. The microprocessor as claimed in claim 1, wherein, if the microprocessor determines that the cache line has a write-combine memory trait, the microprocessor also forgoes initiating the zero-beat read-invalidate transaction on the bus when it executes the grabline instruction.
13. The microprocessor as claimed in claim 1, wherein, if the memory trait of the cache line is neither write-combine nor write-back, the microprocessor also forgoes initiating the zero-beat read-invalidate transaction on the bus and generates a non-architectural exception when it executes the grabline instruction.
14. The microprocessor as claimed in claim 1, wherein, if the memory address of the grabline instruction misses in a translation lookaside buffer of the microprocessor, the microprocessor also forgoes initiating the zero-beat read-invalidate transaction on the bus and generates a non-architectural exception when it executes the grabline instruction.
15. The microprocessor as claimed in claim 1, wherein, if a page table walk has not yet been performed for the memory page that contains the cache line implicated by the memory address of the grabline instruction, the microprocessor also forgoes initiating the zero-beat read-invalidate transaction on the bus and generates a non-architectural exception when it executes the grabline instruction.
16. The microprocessor as claimed in claim 1, further comprising:
a microcode unit configured to implement an architectural instruction, wherein the architectural instruction instructs the microprocessor to move a data string from a first region of the memory to a second region of the memory, the second region of the memory collectively comprising a plurality of cache lines;
wherein the microcode unit comprises a plurality of grabline instructions that specify a plurality of memory addresses, the memory addresses implicating the cache lines; and
wherein the microcode unit further comprises a plurality of store instructions that fill the cache lines with a portion of the data string.
17. A method performed by a microprocessor that is coupled to a memory by a bus, the method comprising:
receiving a grabline instruction for execution, the grabline instruction specifying a memory address implicating a cache line of the memory;
determining, in response to receiving the grabline instruction, whether a store operation to the cache line would cause an exception;
if an access to the cache line would not cause an exception, initiating a zero-beat read-invalidate transaction on the bus to obtain ownership of the cache line; and
if an access to the cache line would cause an exception, forgoing initiating the zero-beat read-invalidate transaction on the bus.
18. The method as claimed in claim 17, further comprising:
decoding an architectural instruction, wherein the architectural instruction instructs the microprocessor to repeatedly store a data string to a plurality of adjacent locations in the memory specified by the architectural instruction; and
invoking, in response to decoding the architectural instruction, a microcode unit of the microprocessor, wherein the microcode unit comprises a plurality of grabline instructions that specify a plurality of memory addresses, the memory addresses implicating the cache lines;
wherein the microcode unit further comprises a plurality of store instructions that fill the cache lines with the data string.
19. The method as claimed in claim 18, further comprising:
detecting a condition in which the microprocessor flushed one or more of the store instructions before retiring them, such that one or more of the cache lines whose ownership was obtained by the microprocessor through the zero-beat read-invalidate transaction were not filled with the data string by the flushed store instructions.
20. The method as claimed in claim 19, further comprising:
filling with the data string, in response to detecting the condition, the one or more cache lines that were not filled with the data string by the flushed store instructions.
21. The method as claimed in claim 19, wherein the microprocessor flushes the one or more store instructions in response to one of the grabline instructions indicating that a store to a cache line of the cache lines would cause an exception; and
wherein the flushed store instructions are newer in program order than the grabline instruction that indicated that a store to a cache line of the cache lines would cause an exception.
22. The method as claimed in claim 21, further comprising:
determining that the store to the cache line of the cache lines would cause an architectural exception; and
in response to determining that the store to the cache line of the cache lines would cause the architectural exception, storing the data string to a plurality of adjacent locations of the cache line until the architectural exception is generated.
23. The method as claimed in claim 18, wherein the microcode unit is configured such that the grabline instructions sufficiently precede the store instructions to establish a high likelihood that the microprocessor has obtained ownership of each cache line before the respective store instructions attempt to fill that cache line with the data string.
24. The method as claimed in claim 18, further comprising:
ensuring that, during execution of the architectural instruction, control is not transferred to system software until each of the cache lines whose ownership was obtained by the microprocessor through the zero-beat read-invalidate transaction has been filled with the data string.
25. The method as claimed in claim 17, further comprising:
if the cache line has a write-combine memory trait, forgoing initiating the zero-beat read-invalidate transaction on the bus when executing the grabline instruction.
26. The method as claimed in claim 17, further comprising:
if the memory trait of the cache line is neither write-combine nor write-back, forgoing initiating the zero-beat read-invalidate transaction on the bus and generating a non-architectural exception when executing the grabline instruction.
27. The method as claimed in claim 17, further comprising:
if the memory address of the grabline instruction misses in a translation lookaside buffer of the microprocessor, forgoing initiating the zero-beat read-invalidate transaction on the bus and generating a non-architectural exception when executing the grabline instruction.
28. The method as claimed in claim 17, further comprising:
if a page table walk has not yet been performed for the memory page that contains the cache line implicated by the memory address of the grabline instruction, forgoing initiating the zero-beat read-invalidate transaction on the bus and generating a non-architectural exception when executing the grabline instruction.
29. The method as claimed in claim 17, further comprising:
decoding an architectural instruction, wherein the architectural instruction instructs the microprocessor to move a data string from a first region of the memory to a second region of the memory, the second region of the memory collectively comprising a plurality of cache lines; and
executing, in response to decoding the architectural instruction, microcode of the microprocessor, wherein the microcode comprises a plurality of grabline instructions that specify a plurality of memory addresses, the memory addresses implicating the cache lines;
wherein the microcode further comprises a plurality of store instructions that fill the cache lines with a portion of the data string.
CN201010260344.8A 2009-08-28 2010-08-20 Microprocessor and execution method thereof Active CN101916181B (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US23791709P 2009-08-28 2009-08-28
US61/237,917 2009-08-28
US12/781,210 US8392693B2 (en) 2009-08-28 2010-05-17 Fast REP STOS using grabline operations
US12/781,210 2010-05-17

Publications (2)

Publication Number Publication Date
CN101916181A CN101916181A (en) 2010-12-15
CN101916181B true CN101916181B (en) 2014-04-23

Family

ID=43323700

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201010260344.8A Active CN101916181B (en) 2009-08-28 2010-08-20 Microprocessor and execution method thereof

Country Status (1)

Country Link
CN (1) CN101916181B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9767036B2 (en) 2013-03-14 2017-09-19 Nvidia Corporation Page state directory for managing unified virtual memory
DE102013022166B4 (en) * 2013-03-14 2024-04-25 Nvidia Corporation PAGE STATE DIRECTORY FOR MANAGING A UNIFIED VIRTUAL STORAGE
DE102013022169A1 (en) 2013-03-14 2014-09-18 Nvidia Corporation ERROR BUFFER TO TRACK SIDE ERRORS IN A UNIFORM VIRTUAL STORAGE SYSTEM
DE102013022168B4 (en) * 2013-03-14 2023-03-09 Nvidia Corporation Unified Virtual Storage System Migration Scheme

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5404473A (en) * 1994-03-01 1995-04-04 Intel Corporation Apparatus and method for handling string operations in a pipelined processor
US6212629B1 (en) * 1989-02-24 2001-04-03 Advanced Micro Devices, Inc. Method and apparatus for executing string instructions
US6212601B1 (en) * 1996-08-30 2001-04-03 Texas Instruments Incorporated Microprocessor system with block move circuit disposed between cache circuits
CN1225690C (en) * 2003-02-11 2005-11-02 智慧第一公司 Fast fetch allocation and initial apparatus and method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant