US20050278505A1 - Microprocessor architecture including zero impact predictive data pre-fetch mechanism for pipeline data memory
- Publication number
- US20050278505A1, US11/132,447
- Authority
- US
- United States
- Prior art keywords
- fetch
- pipeline
- instruction
- memory
- microprocessor
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F5/00—Methods or arrangements for data conversion without changing the order or content of the data handled
- G06F5/01—Methods or arrangements for data conversion without changing the order or content of the data handled for shifting, e.g. justifying, scaling, normalising
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3861—Recovery, e.g. branch miss-prediction, exception handling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Preventing errors by testing or debugging software
- G06F11/362—Software debugging
- G06F11/3648—Software debugging using additional hardware
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/78—Architectures of general purpose stored program computers comprising a single central processing unit
- G06F15/7867—Architectures of general purpose stored program computers comprising a single central processing unit with reconfigurable architecture
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G06F9/30032—Movement instructions, e.g. MOVE, SHIFT, ROTATE, SHUFFLE
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G06F9/30036—Instructions to perform operations on packed data, e.g. vector, tile or matrix operations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30145—Instruction analysis, e.g. decoding, instruction word fields
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30145—Instruction analysis, e.g. decoding, instruction word fields
- G06F9/30149—Instruction analysis, e.g. decoding, instruction word fields of variable length instructions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30181—Instruction operation extension or modification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/32—Address formation of the next instruction, e.g. by incrementing the instruction counter
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/32—Address formation of the next instruction, e.g. by incrementing the instruction counter
- G06F9/322—Address formation of the next instruction, e.g. by incrementing the instruction counter for non-sequential address
- G06F9/325—Address formation of the next instruction, e.g. by incrementing the instruction counter for non-sequential address for loops, e.g. loop detection or loop counter
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3802—Instruction prefetching
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3802—Instruction prefetching
- G06F9/3804—Instruction prefetching for branches, e.g. hedging, branch folding
- G06F9/3806—Instruction prefetching for branches, e.g. hedging, branch folding using address prediction, e.g. return stack, branch history buffer
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3802—Instruction prefetching
- G06F9/3816—Instruction alignment, e.g. cache line crossing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3836—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
- G06F9/3842—Speculative instruction execution
- G06F9/3844—Speculative instruction execution using dynamic branch prediction, e.g. using branch history tables
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3836—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
- G06F9/3842—Speculative instruction execution
- G06F9/3846—Speculative instruction execution using static prediction, e.g. branch taken strategy
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3885—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3885—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units
- G06F9/3893—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled in tandem, e.g. multiplier-accumulator
- G06F9/3895—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled in tandem, e.g. multiplier-accumulator for complex operations, e.g. multidimensional or interleaved address generators, macros
- G06F9/3897—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled in tandem, e.g. multiplier-accumulator for complex operations, e.g. multidimensional or interleaved address generators, macros with adaptable data path
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Definitions
- This invention relates generally to microprocessor architecture and more specifically to systems and methods for achieving improved performance through a predictive data pre-fetch mechanism for a pipeline data memory, including specifically XY-type data memory.
- Multistage pipeline microprocessor architecture is known in the art.
- A typical microprocessor pipeline consists of several stages of instruction handling hardware, wherein each rising pulse of a clock signal propagates instructions one stage further in the pipeline.
- Although the clock speed dictates the number of pipeline propagations per second, the effective operational speed of the processor depends partially upon the rate at which instructions and operands are transferred between memory and the processor.
- Processors typically employ one or more relatively small cache memories built directly into the processor.
- Cache memory typically is an on-chip random access memory (RAM) used to store a copy of memory data in anticipation of future use by the processor.
- The cache is positioned between the processor and the main memory to intercept calls from the processor to the main memory. Access to cache memory is generally much faster than access to off-chip RAM. When previously accessed data is needed again, it can be retrieved directly from the cache rather than from the relatively slower off-chip RAM.
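- The intercepting behavior described above can be sketched as follows. This is a minimal illustration only: the class name `CachedMemory`, the dictionary-based memory model and the hit/miss counters are assumptions made for the example, and no replacement policy or capacity limit is modeled.

```python
class CachedMemory:
    """Hypothetical model of a cache sitting between a processor and main memory."""

    def __init__(self, main_memory):
        self.main_memory = main_memory  # slow off-chip RAM, modeled as a dict
        self.cache = {}                 # fast on-chip copy of previously used data
        self.hits = 0
        self.misses = 0

    def read(self, address):
        # A previously accessed address is served directly from the cache.
        if address in self.cache:
            self.hits += 1
            return self.cache[address]
        # Otherwise the slower main memory is read and a copy is kept.
        self.misses += 1
        value = self.main_memory[address]
        self.cache[address] = value
        return value


mem = CachedMemory({0x10: 7, 0x14: 9})
first = mem.read(0x10)   # miss: goes to main memory
second = mem.read(0x10)  # hit: served from the cache
```

- In this sketch the second read of address 0x10 never touches the modeled main memory.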
- The microprocessor pipeline advances instructions to subsequent pipeline stages on each clock signal pulse.
- Effective pipeline performance can be slower than that implied by the processor speed. Therefore, simply increasing microprocessor clock speed does not usually provide a corresponding increase in system performance. Accordingly, there is a need for a microprocessor architecture that enhances effective system performance through methods in addition to increased clock speed.
- One such method employs X and Y memory structures in parallel to the microprocessor pipeline.
- The ARCtangent-A4™ and ARCtangent-A5™ lines of embedded microprocessors, designed and licensed by ARC International, Inc. of Hertfordshire, UK (ARC), employ such an XY memory structure.
- XY memory was designed to facilitate executing compound instructions on a RISC architecture processor without interrupting the pipeline.
- XY memory is typically located in parallel to the main processor pipeline, after the instruction decode stage, but prior to the execute stage. After decoding an instruction, source data is fetched from XY memory using address pointers. This source data is then fed to the execution stage.
- The two X and Y memory structures source two operands and receive results in the same cycle.
- Data in the XY memory is indexed via pointers from address generators and supplied to the ARC CPU pipeline for processing by any ARC instruction.
- The memories are software-programmable to provide 32-bit, 16-bit, or dual 16-bit data to the pipeline.
- Various embodiments of the invention may ameliorate or overcome one or more of the shortcomings of conventional microprocessor architecture through a predictively fetched XY memory scheme.
- An XY memory structure is located in parallel to the instruction pipeline.
- A speculative pre-fetching scheme is spread over several sections of the pipeline in order to maintain high processor clock speed.
- Operands are speculatively pre-fetched from X and Y memory before the current instruction has even been decoded.
- The speculative pre-fetching occurs in an alignment stage of the instruction pipeline.
- Speculative address calculation of operands also occurs in the alignment stage of the instruction pipeline.
- The XY memory is accessed in the instruction decode stage based on the speculative address calculation, and the resolution of the predictive pre-fetching occurs in the register file stage of the pipeline. Because the actual decoded instruction is not available until after the decode stage, all pre-fetching is done without explicit knowledge of what the current instruction is while that instruction is being pushed out of the decode stage into the register file stage. Thus, in various embodiments, a comparison is made in the register file stage between the operands specified by the actual instruction and those predictively pre-fetched. The pre-fetched values that match are selected and passed to the execute stage of the instruction pipeline. Therefore, in a microprocessor architecture employing such a scheme, data memory fetches, an arithmetic operation and result write-back can be performed using a single instruction without slowing down the instruction pipeline clock speed or stalling the pipeline, even at high processor clock frequencies.
- At least one exemplary embodiment of the invention may provide a predictive pre-fetch XY memory pipeline for a microprocessor pipeline.
- The predictive pre-fetch XY memory pipeline may comprise a first pre-fetch stage comprising a pre-fetch pointer address register file and X and Y address generators, a second pre-fetch stage comprising X and Y memory structures accessed using address pointers generated in the first pre-fetch stage, and a third data select stage comprising at least one pre-fetch buffer in which speculative operand data and address information are stored.
- At least one additional exemplary embodiment may provide a method of predictively pre-fetching operand address and data information for an instruction pipeline of a microprocessor.
- The method of predictively pre-fetching operand address and data information for an instruction pipeline of a microprocessor according to this embodiment may comprise, prior to decoding a current instruction in the pipeline, accessing a set of registers containing pointers to specific locations in pre-fetch memory structures, fetching operand data information from the specific locations in the pre-fetch memory structures, and storing the pointer and operand data information in at least one pre-fetch buffer.
- The microprocessor architecture may comprise a multi-stage microprocessor pipeline, and a multi-stage pre-fetch memory pipeline in parallel to at least a portion of the instruction pipeline, wherein the pre-fetch pipeline comprises a first stage having a set of registers serving as pointers to specific pre-fetch memory locations, a second stage having pre-fetch memory structures for storing predicted operand address information corresponding to operands in an un-decoded instruction in the microprocessor pipeline, and a third stage comprising at least one pre-fetch buffer, wherein said first, second and third stages, respectively, are parallel to, simultaneous to and in isolation of corresponding stages of the microprocessor pipeline.
- FIG. 1 is a block diagram illustrating a processor core in accordance with at least one exemplary embodiment of this invention;
- FIG. 2 is a block diagram illustrating a portion of an instruction pipeline of a microprocessor core architecture employing an XY memory structure and a typical multi-operand instruction processed by such an instruction pipeline in accordance with a conventional non-speculative XY memory;
- FIG. 3 is an exemplary instruction format for performing a multiply instruction on two operands and a memory write back with a single instruction in accordance with at least one embodiment of this invention;
- FIG. 4 is a block diagram illustrating a microprocessor instruction pipeline architecture including a parallel predictive pre-fetch XY memory pipeline in accordance with at least one embodiment of this invention;
- FIG. 5 is a block diagram illustrating in greater detail the structure and operation of a predictively pre-fetching XY memory pipeline in accordance with at least one embodiment of this invention;
- FIG. 6 is a block diagram illustrating the specific pre-fetch operations in an XY memory structure in accordance with at least one embodiment of this invention; and
- FIG. 7 is a flow chart detailing the steps of a method for predictively pre-fetching instruction operand addresses in accordance with at least one embodiment of this invention.
- FIG. 1 illustrates in block diagram form, an architecture for a microprocessor core 100 and peripheral hardware structure in accordance with at least one exemplary embodiment of this invention.
- The align stage 120 formats the words coming from the fetch stage 110 into the appropriate instructions.
- Instructions are fetched from memory in 32-bit words.
- The entry at that fetch address may contain an aligned 16-bit or 32-bit instruction, an unaligned 16-bit instruction preceded by a portion of a previous instruction, or an unaligned portion of a larger instruction preceded by a portion of a previous instruction, based on the actual instruction address.
- A fetched word may have an instruction fetch address of 0x4, but an actual instruction address of 0x6.
- The 32-bit word fetched from memory is passed to the align stage 120, where it is aligned into a complete instruction.
- This alignment may include discarding superfluous 16-bit instructions or assembling unaligned 32-bit or larger instructions into a single instruction. After the instruction has been completely assembled, the N-bit instruction is forwarded to the decoder 130.
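- The alignment cases described above can be sketched as follows. This is only an illustration under stated assumptions: memory is modeled as a list of 16-bit halfwords indexed by byte address divided by two, the instruction length is passed in as a parameter (how the hardware determines it is not modeled), and the function names `fetch_word` and `align` are hypothetical.

```python
def fetch_word(halfwords, fetch_address):
    """Return the two 16-bit halfwords of the 32-bit word at a 4-byte-aligned address."""
    index = fetch_address // 2
    return halfwords[index], halfwords[index + 1]


def align(halfwords, instruction_address, length_bytes):
    """Assemble one 16-bit or 32-bit instruction as a list of halfwords."""
    fetch_address = instruction_address & ~0x3  # e.g. 0x6 is fetched from 0x4
    first, second = fetch_word(halfwords, fetch_address)
    if instruction_address == fetch_address:
        # Aligned case: the instruction starts at the beginning of the word.
        return [first] if length_bytes == 2 else [first, second]
    # Unaligned case: the first 16 bits belong to a previous instruction.
    parts = [second]
    if length_bytes == 4:
        # The remainder of the instruction is in the next fetched word.
        next_first, _ = fetch_word(halfwords, fetch_address + 4)
        parts.append(next_first)
    return parts


# Byte addresses 0x0 .. 0xB, two bytes per list entry.
halfwords = [0x0000, 0x1111, 0xAAAA, 0xBBBB, 0xCCCC, 0xDDDD]
unaligned_16 = align(halfwords, 0x6, 2)  # [0xBBBB]
unaligned_32 = align(halfwords, 0x6, 4)  # [0xBBBB, 0xCCCC]
```

- With fetch address 0x4 and instruction address 0x6, the halfword 0xAAAA is discarded in both calls.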
- An instruction extension interface 180 is also shown, which permits interfacing of customized processor instructions that complement the standard instruction set architecture of the microprocessor. Interfacing of these customized instructions occurs through a timing registered interface to the various stages of the microprocessor pipeline 100 in order to minimize the effect of critical path loading when attaching customized logic to a pre-existing processor pipeline.
- A custom opcode slot is defined in the extensions instruction interface for the specific custom instruction so that the microprocessor can correctly acknowledge the presence of a custom instruction 182 and extract the source operand addresses that are used to index the register file 142.
- The custom instruction flag interface 184 allows the addition of custom instruction flags that are used by the microprocessor for conditional evaluation, using either the standard condition code evaluators or custom extension condition code evaluators 184, in order to determine whether the instruction is executed based upon the condition evaluation result of the execute stage (EXEC) 150.
- A custom ALU interface 186 permits user-defined arithmetic and logical extension instructions, the results of which are selected in the result select stage (SEL) 160.
- FIG. 2 is a block diagram illustrating a portion of an instruction pipeline of a microprocessor core architecture employing an XY memory structure and a typical multi-operand instruction processed by such an instruction pipeline in accordance with a conventional non-speculative XY memory.
- XY-type data memory is known in the art.
- In a RISC processor, typically only one memory load or store can be effected per pipelined instruction.
- DSP: Digital Signal Processor
- FIG. 2 illustrates such an XY memory implementation.
- An instruction is fetched from memory in the fetch stage 210 and, in the next clock cycle, is passed to the align stage 220.
- The instruction is formatted into proper form. For example, if in the fetch stage 210 a 32-bit word is fetched from memory with the fetch address 0x4, but the actual instruction address is for the 16-bit word having instruction address 0x6, the first 16 bits of the 32-bit word are discarded.
- This properly formatted instruction is then passed to the decode stage 230, where it is decoded into an actual instruction, for example, the decoded instruction 241 shown in FIG. 2.
- This decoded instruction is then passed to the register file stage 240.
- FIG. 2 illustrates the format of such a decoded instruction 241.
- The instruction comprises a name (any arbitrary name used to reference the instruction), the destination address pointer and update mode, the first source address pointer and update mode, and the second source address pointer and update mode.
- In the register file stage 240, the addresses of the source and destination operands are selected from the decoded instruction 241 using the register numbers (windowing registers) as pointers to a set of address registers 242.
- The source addresses are then used to access X memory 243 and Y memory 244.
- The address to use for access needs to be selected, the memory access performed, and the selected data fed to the execution stage 250.
- An alternative approach is to move the XY memory to an earlier stage of the instruction pipeline, ahead of the register file stage, to allow for more cycle time for the data selection. However, doing so may result in the complication that, when XY memory is moved into the decode stage, the windowing register number is not yet decoded before accessing memory.
- The source data is predictively pre-fetched and stored for use in data buffers.
- A comparison may be made to check whether the desired data was already pre-fetched and, if so, the data is simply taken from the pre-fetched data buffer and used. If it has not been pre-fetched, the instruction is stalled and the required data is fetched. In order to reduce the number of instructions that are stalled, it is essential to ensure that data is pre-fetched correctly most of the time. Two schemes may be used to assist in this function. Firstly, a predictable way of using windowing registers may be employed.
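- The comparison between the pre-fetched data and the data actually required can be sketched as follows. This is a simplified illustration, not the circuit of any embodiment: the buffer layout (one entry per pointer name holding an address and its data), the function name `select_operand` and the boolean stall indicator are all assumptions made for the example.

```python
def select_operand(prefetch_buffers, pointer_name, address, memory):
    """Return (value, stalled) for the operand requested by the decoded instruction.

    prefetch_buffers maps a pointer name to {"address": ..., "data": ...}.
    """
    entry = prefetch_buffers.get(pointer_name)
    if entry is not None and entry["address"] == address:
        # The prediction was correct: use the buffered data, no stall.
        return entry["data"], False
    # The prediction was wrong or absent: stall and fetch the required data.
    return memory[address], True


x_memory = {0x20: 111, 0x24: 222}
buffers = {"AX0": {"address": 0x20, "data": 111}}

hit = select_operand(buffers, "AX0", 0x20, x_memory)   # (111, False)
miss = select_operand(buffers, "AX0", 0x24, x_memory)  # (222, True)
```

- In the second call the buffered address does not match the requested address, so the sketch reports a stall and reads the modeled memory directly.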
- FIG. 3 illustrates the format of a compound instruction, such as an instruction that might be used in a DSP application that would require extendible processing functions including XY memory in accordance with various embodiments of this invention.
- The compound instruction 300 consists of four sub-components: the name of the instruction 301, the destination pointer 302, the first operand pointer 303 and the second operand pointer 304.
- The instruction, Muldw, is a dual 16-bit multiply instruction.
- The destination pointer 302 specifies that the result of the calculation instruction is to be written to X memory using the pointer address AX1.
- The label u0 specifies the update mode.
- The source operand pointers 303 and 304 specify that the first operand is to be read from X memory using the pointer address AX0 and updated using update mode u1, and the second operand is to be read from Y memory using the pointer address AY0 and the update mode u0.
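- The four sub-components of the compound instruction can be modeled as a small data structure, as sketched below. Several details are assumptions made purely for illustration: the meaning of the update modes (here u0 leaves the pointer unchanged and u1 post-increments it by one), the use of a plain multiplication as a stand-in for the dual 16-bit multiply, and the names `PointerField`, `CompoundInstruction`, `UPDATE_STEP` and `execute`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PointerField:
    memory: str       # "X" or "Y"
    pointer: str      # e.g. "AX0"
    update_mode: str  # e.g. "u0"


@dataclass(frozen=True)
class CompoundInstruction:
    name: str
    destination: PointerField
    source1: PointerField
    source2: PointerField


# Assumed update modes: u0 = no change, u1 = post-increment by one.
UPDATE_STEP = {"u0": 0, "u1": 1}


def execute(instr, pointers, memories, operation):
    """Read two operands, write the result, then update the pointers."""
    a = memories[instr.source1.memory][pointers[instr.source1.pointer]]
    b = memories[instr.source2.memory][pointers[instr.source2.pointer]]
    destination_address = pointers[instr.destination.pointer]
    memories[instr.destination.memory][destination_address] = operation(a, b)
    for field in (instr.destination, instr.source1, instr.source2):
        pointers[field.pointer] += UPDATE_STEP[field.update_mode]


muldw = CompoundInstruction(
    "Muldw",
    PointerField("X", "AX1", "u0"),
    PointerField("X", "AX0", "u1"),
    PointerField("Y", "AY0", "u0"),
)
pointers = {"AX0": 0, "AX1": 4, "AY0": 0}
memories = {"X": [3, 0, 0, 0, 0, 0], "Y": [5, 0]}
execute(muldw, pointers, memories, lambda a, b: a * b)  # stand-in operation
```

- After the call, the product is stored at the X memory location addressed by AX1, and only AX0 has advanced under the assumed update modes.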
- FIG. 4 is a block diagram illustrating a microprocessor instruction pipeline architecture including a parallel predictive pre-fetch XY memory pipeline in accordance with at least one embodiment of this invention.
- The instruction pipeline comprises seven stages: FCH 401, ALN 402, DEC 403, RF 404, EX 405, SEL 406 and WB 407.
- Each rising pulse of the clock cycle propagates an instruction to the next stage of the instruction pipeline.
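- The stage-by-stage propagation can be sketched as a shift through a list with one slot per stage. The stage names are taken from the description above; the function name `clock` and the list-based pipeline model are assumptions for illustration, and stalls are not modeled.

```python
STAGES = ["FCH", "ALN", "DEC", "RF", "EX", "SEL", "WB"]


def clock(pipeline, incoming=None):
    """Advance every instruction one stage; return the one leaving WB."""
    retired = pipeline[-1]
    pipeline[1:] = pipeline[:-1]  # each instruction moves to the next stage
    pipeline[0] = incoming        # a new instruction enters the fetch stage
    return retired


pipeline = [None] * len(STAGES)
for instruction in ["i0", "i1", "i2"]:
    clock(pipeline, instruction)

# Which instruction occupies which stage after three clock pulses.
snapshot = dict(zip(STAGES, pipeline))
```

- After three pulses in this model, the first instruction occupies DEC while the third occupies FCH.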
- The predictive pre-fetch XY memory pipeline comprises six stages: PF1 412, PF2 413, DSEL 414, P0 415, P1 416 and C 417.
- Speculative pre-fetching may begin in stage PF1 412.
- Pre-fetching does not have to begin at the same time as the instruction fetch 401.
- Pre-fetching can happen much earlier, for example, when a pointer is first set up, or the data may already have been fetched because it was recently used. Pre-fetching can also happen later if the pre-fetched instruction was predicted incorrectly.
- The two stages PF1 412 and PF2 413, prior to the register file stage 404, allow sufficient time for the access address to be selected, the memory access performed, and the selected data to be fed to the execution stage 405.
- FIG. 5 is a block diagram, illustrating in greater detail the structure and operation of a predictively pre-fetching XY memory pipeline in accordance with at least one embodiment of this invention.
- 6 pipeline stages of the predictive pre-fetch XY memory pipeline are illustrated.
- these stages may include the PF 1 500 , PF 2 510 , DSEL (data select) 520 , P 0 530 , P 1 540 and C 550 .
- Stage PF 1 500 which occurs simultaneous to the align stage of the instruction pipeline, includes the pre-fetch shadow pointer address register file 502 and the X and Y address generators (used to update the pointer address) 504 and 506 .
- stage PF 2 includes access to X memory unit 512 and Y memory unit 514 , using the pointers 504 and 506 in stage PF 1 500 .
- in stage DSEL 520 , the data accessed from X memory 512 and Y memory 514 in stage PF 2 510 are written to one of multiple pre-fetch buffers 522 .
- four pre-fetch buffers 522 are illustrated in FIG. 5 . In various embodiments, multiple queue-like pre-fetch buffers will be used.
- each queue may be associated with any pointer, but each pointer is associated with at most one queue.
- the pre-fetched data is reconciled with the pointer of the operands contained in the actual instruction forwarded from the decode stage. If the actual data have been pre-fetched, they are passed to the appropriate execute unit in the execute stage.
- P 0 530 , P 1 540 and C 550 stages are used to continue to pass down the source address and destination address (destination address is selected in DSEL stage) so that when they reach the C 550 stage, they update the actual pointer address registers, and the destination address is also used for writing the results of execution (if required, as specified by the instruction) back to XY memory.
- the address registers in PF 1 500 stage are only shadowing address registers which are predictively updated when required. These values only become committed at the C stage 550 .
- Pre-fetch hazard detection performs the task of matching the addresses used in PF 1 500 and PF 2 510 stages to the destination addresses in DSEL 520 , P 0 530 , P 1 540 , and C 550 stage, so that if there is a write to a location in memory that is to be pre-fetched, the pre-fetch is stalled until, or restarted when, this Read after Write hazard has disappeared.
- a pre-fetch hazard can also occur when there is a write to a location in memory that has already been pre-fetched and stored in the buffer in the DSEL stage. In this case, the item in the buffer is flushed and re-fetched when the write operation is complete.
- FIG. 6 is a block diagram illustrating the specific structure of the pre-fetch logic in an XY memory structure in accordance with at least one embodiment of this invention.
- speculative pre-fetch is performed by accessing a set of registers 610 that serve as pointers pointing to specific locations in the X and Y memories 614 and 612 .
- the data is fetched from the XY memory and then on the next clock pulse, the speculative operand data and address information is stored in pre-fetch buffers 620 .
- matching and select block 622 checks for the pre-fetched addresses. If the required operand addresses from the decoded instruction are in the pre-fetch buffers, they are selected and registered for use in the execution stage.
- the pre-fetch buffers may be one, two, three or more deep such that a first in, first out storing scheme is used. When a data item is read out of one of the pre-fetch buffers 620 , it no longer resides in the buffer. The next data in the FIFO buffer automatically moves to the front of the queue.
- FIG. 7 is a flow chart detailing the steps of a method for predictively pre-fetching instruction operand addresses in accordance with at least one embodiment of this invention.
- the steps of a pre-fetch method as well as the steps of a typical instruction pipeline are illustrated in parallel.
- the individual steps of the pre-fetch method may occur at the same time as the various steps or even before.
- steps of the pre-fetch process occur in isolation of the steps of the instruction pipeline method until matching and selection.
- operation of the pre-fetch method begins in step 700 and proceeds to step 705 where a set of registers is accessed that serve as pointers pointing to specific locations in the X and Y memory structures.
- step 705 may occur simultaneous to a compound instruction entering the fetch stage of the microprocessor's instruction pipeline.
- because the pre-fetch process is not based on any information in the instruction, this may occur before an instruction is fetched in step 707 .
- step 705 may occur after a compound instruction is pre-fetched but prior to decoding.
- a compound instruction is one that performs multiple steps, such as, for example, a memory read, an arithmetic operation and a memory write.
- in step 710 , the X and Y memory structures are accessed at locations specified by the pointers in the pre-fetch registers.
- in step 715 , the data read from the X and Y memory locations are written to pre-fetch buffers.
- in step 720 , the results of the pre-fetch method are matched with the actual decoded instruction in the matching and selection step. Matching and selection is performed to reconcile the addresses of the operands contained in the actual instruction forwarded from the decode stage of the instruction pipeline with the pre-fetched data in the pre-fetch buffers. If the pre-fetched data is correct, operation continues to the appropriate execute unit of the execute pipeline in step 725 depending upon the nature of the instruction, i.e., shift, add, etc. It should be appreciated that if the pre-fetched operand addresses are not correct, a pipeline flush will occur while the actual operands are fetched and injected into the pipeline. Operation of the pre-fetch method terminates after matching and selection.
- steps 700 - 715 are performed in parallel with, and in isolation from, the processor pipeline operations 703 - 720 , so that they do not affect or otherwise delay the processor pipeline operations of fetching, aligning, decoding, register file or execution.
- predictive pre-fetching is an effective means of taking advantage of the benefits of XY memory without impacting the instruction pipeline.
- Processor clock frequency may be maintained at high speeds despite the use of XY memory.
- the XY memory functionality is completely transparent to the applications. Normal instruction pipeline flow and branch prediction are completely unaffected by this XY memory functionality both when it is invoked and when it is not used.
- the auxiliary unit of the execute branch provides an interface for applications to select this extendible functionality.
- operands can be predictively pre-fetched with sufficient accuracy to outweigh the overhead associated with mispredictions and without any impact on the processor pipeline.
Abstract
A microprocessor architecture including a predictive pre-fetch XY memory pipeline in parallel to the processor's pipeline for processing compound instructions with enhanced processor performance through predictive pre-fetch techniques. Instruction operands are predictively pre-fetched from X and Y memory based on the historical use of operands in instructions that target X and Y memory. After the compound instruction is decoded in the pipeline, the pre-fetched operand pointer, address and data are reconciled with the operands contained in the actual instruction. If the actual data has been pre-fetched, it is passed to the appropriate execute unit in the execute stage of the processor pipeline. As a result, if the prediction is correct, the data to use for access can be selected and the data selected fed to the execution stage without any additional processor overhead. This pre-fetch mechanism avoids the need to slow down the clock speed of the processor or insert stalls for each compound instruction when using XY memory.
Description
- This application claims priority to provisional application No. 60/572,238 filed May 19, 2004, entitled “Microprocessor Architecture,” hereby incorporated by reference in its entirety.
- This invention relates generally to microprocessor architecture and more specifically to systems and methods for achieving improved performance through a predictive data pre-fetch mechanism for a pipeline data memory, including specifically XY-type data memory.
- Multistage pipeline microprocessor architecture is known in the art. A typical microprocessor pipeline consists of several stages of instruction handling hardware, wherein each rising pulse of a clock signal propagates instructions one stage further in the pipeline. Although the clock speed dictates the number of pipeline propagations per second, the effective operational speed of the processor is dependent partially upon the rate that instructions and operands are transferred between memory and the processor. For this reason, processors typically employ one or more relatively small cache memories built directly into the processor. Cache memory typically is an on-chip random access memory (RAM) used to store a copy of memory data in anticipation of future use by the processor. Typically, the cache is positioned between the processor and the main memory to intercept calls from the processor to the main memory. Access to cache memory is generally much faster than off-chip RAM. When data is needed that has previously been accessed, it can be retrieved directly from the cache rather than from the relatively slower off-chip RAM.
- Generally, the microprocessor pipeline advances instructions on each clock signal pulse to subsequent pipeline stages. However, effective pipeline performance can be slower than that implied by the processor speed. Therefore, simply increasing microprocessor clock speed does not usually provide a corresponding increase in system performance. Accordingly, there is a need for a microprocessor architecture that enhances effective system performance through methods in addition to increased clock speed.
- One method of doing this has been to employ X and Y memory structures in parallel to the microprocessor pipeline. The ARCtangent-A4™ and ARCtangent-A5™ line of embedded microprocessors designed and licensed by ARC International, Inc. of Hertfordshire, UK, (ARC) employ such an XY memory structure. XY memory was designed to facilitate executing compound instructions on a RISC architecture processor without interrupting the pipeline. XY memory is typically located in parallel to the main processor pipeline, after the instruction decode stage, but prior to the execute stage. After decoding an instruction, source data is fetched from XY memory using address pointers. This source data is then fed to the execution stage. In the exemplary ARC XY architecture the two X and Y memory structures source two operands and receive results in the same cycle. Data in the XY memory is indexed via pointers from address generators and supplied to the ARC CPU pipeline for processing by any ARC instruction. The memories are software-programmable to provide 32-bit, 16-bit, or dual 16-bit data to the pipeline.
- It should be appreciated that the description herein of various advantages and disadvantages associated with known apparatus, methods, and materials is not intended to limit the scope of the invention to their exclusion. Indeed, various embodiments of the invention may include one or more of the known apparatus, methods, and materials without suffering from their disadvantages.
- As background to the techniques discussed herein, the following references are incorporated herein by reference: U.S. Pat. No. 6,862,563 issued Mar. 1, 2005 entitled “Method And Apparatus For Managing The Configuration And Functionality Of A Semiconductor Design” (Hakewill et al.); U.S. Ser. No. 10/423,745 filed Apr. 25, 2003, entitled “Apparatus and Method for Managing Integrated Circuit Designs”; and U.S. Ser. No. 10/651,560 filed Aug. 29, 2003, entitled “Improved Computerized Extension Apparatus and Methods”, all assigned to the assignee of the present invention.
- Various embodiments of the invention may ameliorate or overcome one or more of the shortcomings of conventional microprocessor architecture through a predictively fetched XY memory scheme. In various embodiments, an XY memory structure is located in parallel to the instruction pipeline. In various embodiments, a speculative pre-fetching scheme is spread over several sections of the pipeline in order to maintain high processor clock speed. In order to prevent impact on clock speed, operands are speculatively pre-fetched from X and Y memory before the current instruction has even been decoded. In various exemplary embodiments, the speculative pre-fetching occurs in an alignment stage of the instruction pipeline. In various embodiments, speculative address calculation of operands also occurs in the alignment stage of the instruction pipeline. In various embodiments, the XY memory is accessed in the instruction decode stage based on the speculative address calculation of the pipeline, and the resolution of the predictive pre-fetching occurs in the register file stage of the pipeline. Because the actual decoded instruction is not available in the pipeline until after the decode stage, all pre-fetching is done without explicit knowledge of what the current instruction is while this instruction is being pushed out of the decode stage into the register file stage. Thus, in various embodiments, a comparison is made in the register file stage between the operands specified by the actual instruction and those predictively pre-fetched. The pre-fetched values that match are selected to be passed to the execute stage of the instruction pipeline. Therefore, in a microprocessor architecture employing such a scheme, data memory fetches, arithmetic operation and result write back can be performed using a single instruction without slowing down the instruction pipeline clock speed or stalling the pipeline, even at high processor clock frequencies.
- At least one exemplary embodiment of the invention may provide a predictive pre-fetch XY memory pipeline for a microprocessor pipeline. The predictive pre-fetch XY memory pipeline according to this embodiment may comprise a first pre-fetch stage comprising a pre-fetch pointer address register file and X and Y address generators, a second pre-fetch stage comprising X and Y memory structures accessed using address pointers generated in the first pre-fetch stage, and a third data select stage comprising at least one pre-fetch buffer in which speculative operand data and address information are stored.
- At least one additional exemplary embodiment may provide a method of predictively pre-fetching operand address and data information for an instruction pipeline of a microprocessor. The method of predictively pre-fetching operand address and data information for an instruction pipeline of a microprocessor according to this embodiment may comprise, prior to decoding a current instruction in the pipeline, accessing a set of registers containing pointers to specific locations in pre-fetch memory structures, fetching operand data information from the specific locations in the pre-fetch memory structures, and storing the pointer and operand data information in at least one pre-fetch buffer.
- Yet another exemplary embodiment of this invention may provide a microprocessor architecture. The microprocessor architecture according to this embodiment may comprise a multi-stage microprocessor pipeline, and a multi-stage pre-fetch memory pipeline in parallel to at least a portion of the instruction pipeline, wherein the pre-fetch pipeline comprises a first stage having a set of registers serving as pointers to specific pre-fetch memory locations, a second stage having pre-fetch memory structures for storing predicted operand address information corresponding to operands in an un-decoded instruction in the microprocessor pipeline, and a third stage comprising at least one pre-fetch buffer, wherein said first, second and third stages respectively are parallel to, simultaneous to and in isolation of corresponding stages of the microprocessor pipeline.
- Other aspects and advantages of the invention will become apparent from the following detailed description, taken in conjunction with the accompanying drawings, illustrating by way of example the principles of the invention.
-
FIG. 1 is a block diagram illustrating a processor core in accordance with at least one exemplary embodiment of this invention; -
FIG. 2 is a block diagram illustrating a portion of an instruction pipeline of a microprocessor core architecture employing an XY memory structure and a typical multi-operand instruction processed by such an instruction pipeline in accordance with a conventional non-speculative XY memory; -
FIG. 3 is an exemplary instruction format for performing a multiply instruction on 2 operands and a memory write back with a single instruction in accordance with at least one embodiment of this invention; -
FIG. 4 is a block diagram illustrating a microprocessor instruction pipeline architecture including a parallel predictive pre-fetch XY memory pipeline in accordance with at least one embodiment of this invention; -
FIG. 5 is a block diagram, illustrating in greater detail the structure and operation of a predictively pre-fetching XY memory pipeline in accordance with at least one embodiment of this invention; -
FIG. 6 is a block diagram illustrating the specific pre-fetch operations in an XY memory structure in accordance with at least one embodiment of this invention; and -
FIG. 7 is a flow chart detailing the steps of a method for predictively pre-fetching instruction operand addresses in accordance with at least one embodiment of this invention.
- The following description is intended to convey a thorough understanding of the invention by providing specific embodiments and details involving various aspects of a new and useful microprocessor architecture. It is understood, however, that the invention is not limited to these specific embodiments and details, which are exemplary only. It further is understood that one possessing ordinary skill in the art, in light of known systems and methods, would appreciate the use of the invention for its intended purposes and benefits in any number of alternative embodiments, depending upon specific design and other needs.
- Discussion of the invention will now be made by way of example in reference to the various drawing figures.
FIG. 1 illustrates, in block diagram form, an architecture for a microprocessor core 100 and peripheral hardware structure in accordance with at least one exemplary embodiment of this invention. Several novel features will be apparent from FIG. 1 which distinguish the illustrated microprocessor architecture from that of a conventional microprocessor architecture. Firstly, the exemplary microprocessor architecture of FIG. 1 features a processor core 100 having a seven stage instruction pipeline. However, it should be appreciated that additional pipeline stages may also be present. An align stage 120 is shown in FIG. 1 following the fetch stage 110. Because the microprocessor core 100 shown in FIG. 1 is operable to work with a variable bit-length instruction set, namely, 16-bits, 32-bits, 48-bits or 64-bits, the align stage 120 formats the words coming from the fetch stage 110 into the appropriate instructions. In various exemplary embodiments, instructions are fetched from memory in 32-bit words. Thus, when the fetch stage 110 fetches a 32-bit word at a specified fetch address, the entry at that fetch address may contain an aligned 16-bit or 32-bit instruction, an unaligned 16-bit instruction preceded by a portion of a previous instruction, or an unaligned portion of a larger instruction preceded by a portion of a previous instruction based on the actual instruction address. For example, a fetched word may have an instruction fetch address of 0x4, but an actual instruction address of 0x6. In various exemplary embodiments, the 32-bit word fetched from memory is passed to the align stage 120 where it is aligned into a complete instruction. In various exemplary embodiments, this alignment may include discarding superfluous 16-bit instructions or assembling unaligned 32-bit or larger instructions into a single instruction. After completely assembling the instruction, the N-bit instruction is forwarded to the decoder 130.
- Still referring to
FIG. 1, an instruction extension interface 180 is also shown which permits interface of customized processor instructions that are used to complement the standard instruction set architecture of the microprocessor. Interfacing of these customized instructions occurs through a timing registered interface to the various stages of the microprocessor pipeline 100 in order to minimize the effect of critical path loading when attaching customized logic to a pre-existing processor pipeline. Specifically, a custom opcode slot is defined in the extensions instruction interface for the specific custom instruction in order for the microprocessor to correctly acknowledge the presence of a custom instruction 182 as well as the extraction of the source operand addresses that are used to index the register file 142. The custom instruction flag interface 184 is used to allow the addition of custom instruction flags that are used by the microprocessor for conditional evaluation using either the standard condition code evaluators or custom extension condition code evaluators 184 in order to determine whether the instruction is executed or not based upon the condition evaluation result of the execute stage (EXEC) 150. A custom ALU interface 186 permits user-defined arithmetic and logical extension instructions, the results of which are selected in the result select stage (SEL) 160.
- Referring now to
FIG. 2, a block diagram illustrating a portion of an instruction pipeline of a microprocessor core architecture employing an XY memory structure and a typical multi-operand instruction processed by such an instruction pipeline in accordance with a conventional non-speculative XY memory is illustrated. XY-type data memory is known in the art. Typically, in a RISC processor, only one memory load or store can be effected per pipelined instruction. However, in some cases, in order to accelerate pipeline efficiency, i.e., the number of operations executed per clock, it is desirable to have a single instruction perform multiple operations. For example, a single instruction could perform a memory read, an arithmetic operation and a memory write operation. The ability to decode and execute these kinds of compound instructions is particularly important for achieving high performance in Digital Signal Processor (DSP) operations. DSP operations typically involve repetitive calculations on large data sets; thus, high memory bandwidth is required. By using an XY memory structure, up to 2×32 bits of source data memory read access and 1×32 bits of destination data memory write access per clock cycle are possible, resulting in a very high data memory bandwidth. (For example, a 4.8 Gbyte/s memory bandwidth can be achieved based on three 32-bit accesses, two reads and one write, per instruction in a 400 MHz processor: 3 × 32 bits × 400 MHz = 38.4 Gbit/s, or 4.8 Gbyte/s.)
- In a typical XY memory implementation, data used for XY memory is fetched from memory using addresses that are selected using register numbers decoded from the instruction in the decode stage. This data is then fed back to the execution units in the processor pipeline.
FIG. 2 illustrates such an XY memory implementation. In FIG. 2, an instruction is fetched from memory in the fetch stage 210 and, in the next clock cycle, is passed to the align stage 220. In the align stage 220, the instruction is formatted into proper form. For example, if in the fetch stage 210 a 32-bit word is fetched from memory with the fetch address 0x4, but the actual instruction address is for the 16-bit word having instruction address 0x6, the first 16 bits of the 32-bit word are discarded. This properly formatted instruction is then passed to the decode stage 230, where it is decoded into an actual instruction, for example, the decoded instruction 241 shown in FIG. 2. This decoded instruction is then passed to the register file stage 240.
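As a rough illustration of the alignment example above, the following Python sketch selects the 16-bit instruction at address 0x6 from a 32-bit word fetched at address 0x4. The function name `align_halfword` and the convention that the first 16 bits of the word sit in the upper half of the integer are assumptions made for this example; the text does not specify a bit ordering.

```python
def align_halfword(fetch_addr: int, word: int, instr_addr: int) -> int:
    """Return the 16-bit half of a 32-bit fetched word that starts at instr_addr.

    Assumption: the first 16 bits of the word (those at fetch_addr) occupy
    the upper half of the integer, and the second 16 bits (at fetch_addr + 2)
    occupy the lower half.
    """
    offset = instr_addr - fetch_addr
    if offset == 0:
        return (word >> 16) & 0xFFFF   # instruction starts at the fetch address
    if offset == 2:
        return word & 0xFFFF           # discard the first 16 bits
    raise ValueError("instruction address is outside the fetched word")


# Word fetched at 0x4; the actual instruction lives at 0x6.
instr = align_halfword(0x4, 0xAAAA1234, 0x6)
print(hex(instr))  # 0x1234
```

A real align stage would also have to assemble instructions that span two fetched words, which this sketch leaves out.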
FIG. 2 illustrates the format of such a decoded instruction 241. The instruction is comprised of a name (any arbitrary name used to reference the instruction), the destination address pointer and update mode, the first source address pointer and update mode, and the second source address pointer and update mode. In the register file stage 240, from the decoded instruction 241, the addresses of the source and destination operands are selected using the register numbers (windowing registers) as pointers to a set of address registers 242. The source addresses are then used to access X memory 243 and Y memory 244. Thus, between the decode stage 230 and the execute stage 250, the address to use for access needs to be selected, the memory access performed, and the data selected fed to the execution stage 250. As microprocessor clock speeds increase, it becomes difficult, if not impossible, to perform all these steps in a single clock cycle. As a result, either a decrease in the processor clock frequency must occur to accommodate these extra steps, or multiple clock cycles for each instruction using XY memory must be used, both of which negate or at least reduce the benefits of using XY memory in the first place.
- One method of solving this problem is extending the processor pipeline to add more pipeline stages between the decode and the execution stage. However, extra processor stages are undesirable for several reasons. Firstly, they complicate the architecture of the processor. Secondly, any penalties from incorrect predictions in the branch prediction stage will be increased. Finally, because XY memory functions may only be needed when certain applications are being run on the processor, extra pipeline stages will necessarily be present even when these applications are not being used.
- An alternative approach is to move the XY memory to an earlier stage of the instruction pipeline, ahead of the register file stage, to allow for more cycle time for the data selection. However, doing so may result in the complication that, when XY memory is moved into the decode stage, the windowing register number is not yet decoded before accessing memory.
- Therefore, in accordance with at least one embodiment of this invention, to overcome these problems, the source data is predictively pre-fetched and stored for use in data buffers. When the source data from X or Y memory is required, just before the execution stage, a comparison may be made to check if the desired data was already pre-fetched, and if so, the data is simply taken from the pre-fetched data buffer and used. If it has not been pre-fetched, then the instruction is stalled and the required data is fetched. In order to reduce the number of instructions that are stalled, it is essential to ensure that data is pre-fetched correctly most of the time. Two schemes may be used to assist in this function. Firstly, a predictable way of using windowing registers may be employed. For example, if the same set of N windowing registers are used most of the time, and each pointer address is incremented in a regular way (sequentially as selected by the windowing registers), then the next few data for each of these N windowing registers can be pre-fetched fairly accurately. This reduces the number of prediction failures.
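The first scheme can be sketched in Python as follows, under simplifying assumptions: one queue per pointer, a fixed stride standing in for the regular pointer update, and a miss modeled as a direct fetch that resets the prediction. The class name `PrefetchQueue` and its parameters are hypothetical and do not come from the specification.

```python
from collections import deque


class PrefetchQueue:
    """Hypothetical per-pointer queue of (address, data) pairs fetched ahead of use."""

    def __init__(self, memory, start_addr, stride=1, depth=2):
        self.memory = memory
        self.stride = stride
        self.depth = depth
        self.next_addr = start_addr      # shadow pointer, updated speculatively
        self.entries = deque()
        self.refill()

    def refill(self):
        # Fetch ahead until the queue holds `depth` predicted operands.
        while len(self.entries) < self.depth:
            self.entries.append((self.next_addr, self.memory[self.next_addr]))
            self.next_addr += self.stride

    def read(self, addr):
        """Return (data, hit). A miss models a stall plus a direct fetch."""
        if self.entries and self.entries[0][0] == addr:
            _, data = self.entries.popleft()
            hit = True
        else:
            # Misprediction: discard the queue and restart prediction after addr.
            self.entries.clear()
            self.next_addr = addr + self.stride
            data, hit = self.memory[addr], False
        self.refill()
        return data, hit


x_memory = list(range(100, 116))          # stand-in X memory contents
queue = PrefetchQueue(x_memory, start_addr=0)

# Regular, sequential use of the pointer: every access is a hit.
results = [queue.read(a) for a in (0, 1, 2)]
# An irregular access breaks the pattern and misses once.
results.append(queue.read(8))
results.append(queue.read(9))
print(results)
```

In this model the sequential accesses all hit, the jump to address 8 misses once, and the following access hits again because prediction restarts from the new address.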
- Secondly, by having more prediction data buffers, more predictive fetches can be made in advance, reducing the chance of a prediction miss. Because compound instructions also include updating addresses, these addresses must also be predictively updated. In general, address updates are predictable as long as the user uses the same modifiers along with its associated non-modify mode in a sequence of code and the user sticks to a set of N pointers for an implementation with N pre-fetch data buffers. Since the data is pre-fetched, the pre-fetched data can become outdated due to write-backs to XY memory. In cases such as this, the specific pre-fetch buffer can be flushed and the out-of-date data re-fetched, or, alternatively, data forwarding can be performed to update these buffers.
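The two repair options mentioned above for out-of-date buffers, flush and re-fetch or data forwarding, might look like the following Python sketch. The function `write_back`, the dictionary layout of the buffers and the `forward` flag are illustrative assumptions; only the pointer names AX0 and AY0 are taken from the specification.

```python
def write_back(memory, buffers, addr, value, forward=False):
    """Write value to memory and repair any pre-fetch buffer that held addr.

    buffers maps a pointer name to a list of (address, data) entries.
    Returns the names of the buffers that were repaired.
    """
    memory[addr] = value
    repaired = []
    for name, entries in buffers.items():
        if not any(a == addr for a, _ in entries):
            continue
        if forward:
            # Data forwarding: patch the stale entry in place.
            entries[:] = [(a, value if a == addr else d) for a, d in entries]
        else:
            # Flush the whole buffer, then re-fetch the same addresses.
            addresses = [a for a, _ in entries]
            entries.clear()
            entries.extend((a, memory[a]) for a in addresses)
        repaired.append(name)
    return repaired


memory = {0: 10, 1: 11, 2: 12, 3: 13}
buffers = {
    "AX0": [(0, 10), (1, 11)],
    "AY0": [(2, 12), (3, 13)],
}
print(write_back(memory, buffers, addr=1, value=99))   # ['AX0']
print(buffers["AX0"])                                  # [(0, 10), (1, 99)]
```

Both options leave the buffer with the same contents here; in hardware they would differ in cost, since forwarding avoids a second memory access.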
-
FIG. 3 illustrates the format of a compound instruction, such as an instruction that might be used in a DSP application that would require extendible processing functions including XY memory in accordance with various embodiments of this invention. The compound instruction 300 consists of four sub-components: the name of the instruction 301, the destination pointer 302, the first operand pointer 303 and the second operand pointer 304. In the instruction 300 shown in FIG. 3, the instruction, Muldw, is a dual 16-bit multiply instruction. The destination pointer 302 specifies that the result of the calculation instruction is to be written to X memory using the pointer address AX1. The label u0 specifies the update mode. This is a user-defined address update mode and must be specified before calling the extendible function. The source operand pointers 303 and 304 specify that the first operand is to be read from X memory using the pointer address AX0 and updated using update mode u1, and the second operand is to be read from Y memory using the pointer address AY0 and the update mode u0.
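One way to picture the four sub-components of the Muldw example is as a small record type. The Python sketch below is only a model of the fields named in the text (destination AX1 with mode u0, first source AX0 with mode u1, second source AY0 with mode u0); the class and field names are hypothetical, and no binary encoding or assembler syntax is implied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperandPointer:
    memory: str       # "X" or "Y"
    pointer: str      # pointer address register, e.g. "AX0"
    update_mode: str  # user-defined address update mode, e.g. "u0"


@dataclass(frozen=True)
class CompoundInstruction:
    name: str
    destination: OperandPointer
    source1: OperandPointer
    source2: OperandPointer


muldw = CompoundInstruction(
    name="Muldw",
    destination=OperandPointer("X", "AX1", "u0"),
    source1=OperandPointer("X", "AX0", "u1"),
    source2=OperandPointer("Y", "AY0", "u0"),
)

for role in ("destination", "source1", "source2"):
    op = getattr(muldw, role)
    print(f"{role}: {op.memory} memory via {op.pointer}, update mode {op.update_mode}")
```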
FIG. 4 is a block diagram illustrating a microprocessor instruction pipeline architecture including a parallel predictive pre-fetch XY memory pipeline in accordance with at least one embodiment of this invention. In the example illustrated in FIG. 4, the instruction pipeline is comprised of seven stages: FCH 401, ALN 402, DEC 403, RF 404, EX 405, SEL 406 and WB 407. As stated above, each rising pulse of the clock cycle propagates an instruction to the next stage of the instruction pipeline. In parallel to the instruction pipeline is the predictive pre-fetch XY memory pipeline, comprised of six stages including PF1 412, PF2 413, DSEL 414, P0 415, P1 416 and C 417. It should be appreciated that various embodiments may utilize more or fewer pipeline stages. In various exemplary embodiments, speculative pre-fetching may begin in stage PF1 412. However, in various exemplary embodiments, pre-fetching does not have to begin at the same time as the fetch instruction 401. Pre-fetching can happen much earlier, for example, when a pointer is first set up, or was already fetched because it was recently used. Pre-fetching can also happen later if the pre-fetched instruction was predicted incorrectly. The two previous stages PF1 412 and PF2 413, prior to the register file stage 404, allow sufficient time for the access address to be selected, the memory access performed, and the data selected to be fed to the execution stage 405.
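The nominal relationship between the two pipelines can be laid out as a simple table. The Python sketch below pairs each instruction stage with the pre-fetch stage active in the same cycle, taking from the specification that PF1 runs alongside the align stage and DSEL alongside the register file stage; pairing P0, P1 and C with EX, SEL and WB is an assumption of this sketch, as is the helper name `stage_pairs`.

```python
INSTRUCTION_STAGES = ["FCH", "ALN", "DEC", "RF", "EX", "SEL", "WB"]
PREFETCH_STAGES = ["PF1", "PF2", "DSEL", "P0", "P1", "C"]

# PF1 runs alongside ALN, so the pre-fetch pipeline is offset by one stage.
# Pairing P0, P1 and C with EX, SEL and WB is an assumption of this sketch.
PREFETCH_OFFSET = INSTRUCTION_STAGES.index("ALN")


def stage_pairs():
    """Pair each instruction stage with the pre-fetch stage active in the same cycle."""
    pairs = []
    for cycle, stage in enumerate(INSTRUCTION_STAGES):
        pf_index = cycle - PREFETCH_OFFSET
        pf_stage = PREFETCH_STAGES[pf_index] if pf_index >= 0 else None
        pairs.append((cycle, stage, pf_stage))
    return pairs


for cycle, stage, pf_stage in stage_pairs():
    print(f"cycle {cycle}: {stage:<3} | {pf_stage or '-'}")
```

This shows only the nominal case; as noted above, pre-fetching may begin earlier or later than the corresponding instruction fetch.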
FIG. 5 is a block diagram illustrating in greater detail the structure and operation of a predictively pre-fetching XY memory pipeline in accordance with at least one embodiment of this invention. In FIG. 5, six pipeline stages of the predictive pre-fetch XY memory pipeline are illustrated. As noted above, it should be appreciated that in various embodiments, more or fewer stages may be employed. As stated above in the context of FIG. 4, these stages may include PF1 500, PF2 510, DSEL (data select) 520, P0 530, P1 540 and C 550. Stage PF1 500, which occurs simultaneously with the align stage of the instruction pipeline, includes the pre-fetch shadow pointer address register file 502 and the X and Y address generators (used to update the pointer address) 504 and 506. Next, stage PF2 510 includes access to X memory unit 512 and Y memory unit 514, using the pointers 504 and 506 in stage PF1 500. In stage DSEL 520, the data accessed from X memory 512 and Y memory 514 in stage PF2 510 are written to one of multiple pre-fetch buffers 522. For purposes of example only, four pre-fetch buffers 522 are illustrated in FIG. 5. In various embodiments, multiple queue-like pre-fetch buffers will be used. It should be noted that typically each queue may be associated with any pointer, but each pointer is associated with at most one queue. In the DSEL stage 520, the pre-fetched data is reconciled with the pointer of the operands contained in the actual instruction forwarded from the decode stage. If the actual data have been pre-fetched, they are passed to the appropriate execute unit in the execute stage.
The P0 530, P1 540 and C 550 stages are used to continue to pass down the source address and destination address (the destination address is selected in the DSEL stage) so that, when they reach the C 550 stage, they update the actual pointer address registers, and the destination address is also used for writing the results of execution (if required, as specified by the instruction) back to XY memory. The address registers in the PF1 500 stage are only shadow address registers, which are predictively updated when required. These values only become committed at the C stage 550. Pre-fetch hazard detection performs the task of matching the addresses used in the PF1 500 and PF2 510 stages to the destination addresses in the DSEL 520, P0 530, P1 540 and C 550 stages, so that if there is a write to a location in memory that is to be pre-fetched, the pre-fetch is stalled until, or restarted when, this read-after-write hazard has disappeared. A pre-fetch hazard can also occur when there is a write to a location in memory that has already been pre-fetched and stored in the buffer in the DSEL stage. In this case, the item in the buffer is flushed and re-fetched when the write operation is complete.
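The two hazard cases described above can be expressed as simple address comparisons. The following is a minimal sketch, assuming addresses are plain integers and the buffer is a list of (address, data) pairs; the function names `prefetch_hazard`, `prefetch_action` and `flush_stale` are hypothetical.

```python
def prefetch_hazard(prefetch_addrs, pending_write_addrs):
    """True if an address in PF1/PF2 matches a destination in DSEL/P0/P1/C."""
    return bool(set(prefetch_addrs) & set(pending_write_addrs))


def prefetch_action(prefetch_addrs, pending_write_addrs):
    """Stall the pre-fetch while a read-after-write hazard is present."""
    if prefetch_hazard(prefetch_addrs, pending_write_addrs):
        return "stall"
    return "proceed"


def flush_stale(buffer, written_addr):
    """Flush buffered entries for a location that is being written.

    Returns the addresses that should be re-fetched once the write completes.
    """
    stale = [addr for addr, _ in buffer if addr == written_addr]
    buffer[:] = [(addr, data) for addr, data in buffer if addr != written_addr]
    return stale


if __name__ == "__main__":
    print(prefetch_action([0x10, 0x14], [0x14]))
    buffered = [(0x10, 1), (0x14, 2)]
    print(flush_stale(buffered, 0x14), buffered)
```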
FIG. 6 is a block diagram illustrating the specific structure of the pre-fetch logic in an XY memory structure in accordance with at least one embodiment of this invention. In various exemplary embodiments, in the PF1 stage 605, speculative pre-fetch is performed by accessing a set of registers 610 that serve as pointers pointing to specific locations in the X and Y memories. In the PF2 stage 602, the data is fetched from the XY memory and then, on the next clock pulse, the speculative operand data and address information are stored in pre-fetch buffers 620. While still in the DSEL stage, which also corresponds with the processor's Register File stage 603, matching and select block 622 checks for the pre-fetched addresses. If the required operand addresses from the decoded instruction are in the pre-fetch buffers, they are selected and registered for use in the execution stage. In various exemplary embodiments, the pre-fetch buffers may be one, two, three or more deep such that a first in, first out storing scheme is used. When a data item is read out of one of the pre-fetch buffers 620, it no longer resides in the buffer. The next data in the FIFO buffer automatically moves to the front of the queue.
- Referring now to FIG. 7, a flow chart detailing the steps of a method for predictively pre-fetching instruction operand addresses in accordance with at least one embodiment of this invention is depicted. In FIG. 7, the steps of a pre-fetch method as well as the steps of a typical instruction pipeline are illustrated in parallel. The individual steps of the pre-fetch method may occur at the same time as the various steps of the instruction pipeline, or even before.
- Any correspondence between steps of the pre-fetch process and the instruction pipeline process implied by the figure is merely for ease of illustration. It should be appreciated that the steps of the pre-fetch method occur in isolation from the steps of the instruction pipeline method until matching and selection.
- With continued reference to FIG. 7, operation of the pre-fetch method begins in step 700 and proceeds to step 705, where a set of registers that serve as pointers pointing to specific locations in the X and Y memory structures is accessed. In various embodiments, step 705 may occur simultaneously with a compound instruction entering the fetch stage of the microprocessor's instruction pipeline. However, as noted herein, in various other embodiments, because the actual compound instruction has not yet been decoded, and the pre-fetch process is therefore not based on any information in the instruction, step 705 may occur before an instruction is fetched in step 707. Alternatively, step 705 may occur after a compound instruction is pre-fetched but prior to decoding.
- As used herein, a compound instruction is one that performs multiple steps, such as, for example, a memory read, an arithmetic operation and a memory write.
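As an illustration of the definition above, the following sketch models one hypothetical compound instruction that performs a memory read, an arithmetic operation and a memory write in a single operation. The multiply operation, the function name `compound_multiply` and the choice of X memory as the destination are assumptions for illustration, not taken from the specification.

```python
def compound_multiply(x_mem, y_mem, px, py, pdst):
    """Hypothetical compound instruction: X[pdst] = X[px] * Y[py]."""
    a = x_mem[px]          # memory read of the X operand
    b = y_mem[py]          # memory read of the Y operand
    result = a * b         # arithmetic operation
    x_mem[pdst] = result   # memory write of the result
    return result


if __name__ == "__main__":
    x = [1, 2, 3, 0]
    y = [4, 5, 6]
    print(compound_multiply(x, y, px=1, py=2, pdst=3), x)
```

Because both source operands come from memory, having them already in a pre-fetch buffer when the instruction reaches execution is what removes the memory access from the critical path.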
- With continued reference to the method of FIG. 7, in step 710, the X and Y memory structures are accessed at locations specified by the pointers in the pre-fetch registers.
- Operation of the method then goes to step 715, where the data read from the X and Y memory locations are written to pre-fetch buffers.
- Next, in step 720, the results of the pre-fetch method are matched with the actual decoded instruction in the matching and selection step. Matching and selection is performed to reconcile the addresses of the operands contained in the actual instruction forwarded from the decode stage of the instruction pipeline with the pre-fetched data in the pre-fetch buffers. If the pre-fetched data is correct, operation continues to the appropriate execute unit of the execute pipeline in step 725, depending upon the nature of the instruction, e.g., shift or add. It should be appreciated that if the pre-fetched operand addresses are not correct, a pipeline flush will occur while the actual operands are fetched and injected into the pipeline. Operation of the pre-fetch method terminates after matching and selection. It should be appreciated that, if necessary, that is, if the instruction requires a write operation to XY memory, the results of execution are written back to XY memory. Furthermore, it should be appreciated that because steps 700-715 are performed in parallel with, and in isolation from, the processor pipeline operations 703-720, they do not affect or otherwise delay the processor pipeline operations of fetching, aligning, decoding, register file access or execution.
- As stated above, when performing repetitive functions, such as DSP extension functions where data is repeatedly read from and written to XY memory, predictive pre-fetching is an effective means of taking advantage of the benefits of XY memory without impacting the instruction pipeline. Processor clock frequency may be maintained at high speeds despite the use of XY memory. Also, when applications being run on the microprocessor do not require XY memory, the XY memory functionality is completely transparent to the applications. Normal instruction pipeline flow and branch prediction are completely unaffected by this XY memory functionality both when it is invoked and when it is not used.
The auxiliary unit of the execute branch provides an interface for applications to select this extendible functionality. Therefore, as a result of the above-described microprocessor architecture, with careful use of pointers and their associated update modes, operands can be predictively pre-fetched with sufficient accuracy to outweigh the overhead associated with mispredictions and without any impact on the processor pipeline.
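The method of FIG. 7 and the effect of pointer update modes on prediction accuracy can be sketched as a short simulation. This is a minimal model under stated assumptions: a single pointer with a post-increment stride, a one-entry buffer, and resynchronisation of the shadow pointer to the actual address after a misprediction. The function name `run_prefetch_method` and the resynchronisation policy are hypothetical.

```python
def run_prefetch_method(memory, start, stride, actual_addrs):
    """Walk steps 705-720 for a stream of decoded operand addresses.

    Returns (operands, hits, misses).
    """
    shadow = start
    operands, hits, misses = [], 0, 0
    for actual in actual_addrs:
        predicted = shadow              # step 705: read the pointer register
        data = memory[predicted]        # step 710: access memory at the pointer
        buffered = (predicted, data)    # step 715: write to the pre-fetch buffer
        if buffered[0] == actual:       # step 720: match with decoded instruction
            hits += 1
            operands.append(buffered[1])
            shadow = predicted + stride
        else:
            misses += 1                 # misprediction: fetch the actual operand
            operands.append(memory[actual])
            shadow = actual + stride    # assumed policy: resynchronise pointer
    return operands, hits, misses


if __name__ == "__main__":
    mem = list(range(100, 110))
    print(run_prefetch_method(mem, start=0, stride=1, actual_addrs=[0, 1, 2, 7, 8]))
```

In this model a regular access pattern, as in a DSP loop, hits on every operand, and only a change of pattern costs a misprediction.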
- It should be appreciated that, while the descriptors “X” and “Y” have been used throughout the specification, these terms are purely descriptive and do not imply any specific structure. That is to say, any two-dimensional pre-fetch memory structure can be considered “XY memory.”
- While the foregoing description includes many details and specificities, it is to be understood that these have been included for purposes of explanation only. The embodiments of the present invention are not to be limited in scope by the specific embodiments described herein. For example, although many of the embodiments disclosed herein have been described with reference to particular embodiments, the principles herein are equally applicable to microprocessors in general. Indeed, various modifications of the embodiments of the present inventions, in addition to those described herein, will be apparent to those of ordinary skill in the art from the foregoing description and accompanying drawings. Thus, such modifications are intended to fall within the scope of the following appended claims. Further, although the embodiments of the present inventions have been described herein in the context of a particular implementation in a particular environment for a particular purpose, those of ordinary skill in the art will recognize that its usefulness is not limited thereto and that the embodiments of the present inventions can be beneficially implemented in any number of environments for any number of purposes. Accordingly, the claims set forth below should be construed in view of the full breadth and spirit of the embodiments of the present inventions as disclosed herein.
Claims (20)
1. A microprocessor comprising:
a multistage instruction pipeline; and
a predictive pre-fetch memory pipeline comprising:
a first pre-fetch stage comprising a pre-fetch pointer address register file and memory address generators;
a second pre-fetch stage comprising pre-fetch memory structures accessed using address pointers generated in the first pre-fetch stage; and
a data select stage comprising at least one pre-fetch buffer in which predictive operand address and data information from the pre-fetch memory structures are stored.
2. The microprocessor of claim 1, wherein the pre-fetch memory structures comprise X and Y memory structures storing operand address data.
3. The microprocessor of claim 1, wherein the first and second pre-fetch stages and the data select stage occur in parallel to stages preceding an execute stage of the instruction pipeline.
4. The microprocessor of claim 1, wherein the instruction pipeline comprises align, decode and register file stages, and the first and second pre-fetch stages and the data select stage occur in parallel to the align, decode and register file stages, respectively.
5. The microprocessor of claim 1, wherein the predictive pre-fetch memory pipeline further comprises hardware logic in the data select stage adapted to reconcile actual operand address information contained in an actual decoded instruction with the predictive operand address information.
6. The microprocessor of claim 5, wherein the predictive pre-fetch memory pipeline further comprises hardware logic adapted to pass the predictive operand address information from the pre-fetch buffer to an execute stage of the instruction pipeline if the actual operand address information matches the predictive operand address information.
7. The microprocessor of claim 1, wherein the predictive pre-fetch memory pipeline further comprises a write back structure invoked after the execute stage and being adapted to write the results of execution back to XY memory if the instruction requires a write to at least one of the pre-fetch memory structures.
8. A method of predictively pre-fetching operand address and data information for an instruction pipeline of a microprocessor, the method comprising:
prior to decoding a current instruction in the instruction pipeline, accessing at least one register containing pointers to specific locations in pre-fetch memory structures;
fetching predictive operand data from the specific locations in the pre-fetch memory structures; and
storing the pointer and predictive operand data in at least one pre-fetch buffer.
9. The method according to claim 8, wherein accessing, fetching and storing occur in parallel to, simultaneous to and in isolation of the instruction pipeline.
10. The method according to claim 9, wherein accessing, fetching and storing occur, respectively, in parallel to align, decode and register file stages of the instruction pipeline.
11. The method according to claim 8, further comprising, after decoding the current instruction, reconciling actual operand data contained in the decoded current instruction with the predictive operand data.
12. The method according to claim 8, further comprising decoding the current instruction and passing the pre-fetched predictive operand data to an execute unit of the microprocessor pipeline if the pre-fetched predictive operand data matches actual operand data contained in the current instruction.
13. The method according to claim 8, wherein accessing, fetching and storing are performed on successive clock pulses of the microprocessor.
14. The method according to claim 8, further comprising performing pre-fetch hazard detection.
15. The method according to claim 14, wherein performing pre-fetch hazard detection comprises at least one operation selected from the group consisting of: stalling pre-fetch operation or restarting pre-fetch operation when the read after write hazard has disappeared, if it is determined that there is a read after write hazard characterized by a memory write to a location in memory that is to be pre-fetched; and clearing the pre-fetch buffers if there is a read from a memory location previously pre-fetched.
16. A microprocessor comprising:
a multistage microprocessor pipeline; and
a multistage pre-fetch memory pipeline in parallel to at least a portion of the microprocessor pipeline, wherein the pre-fetch memory pipeline comprises:
a first stage having at least one register serving as pointers to specific pre-fetch memory locations;
a second stage, having pre-fetch memory structures for storing predicted operand address information corresponding to operands in a pre-decoded instruction in the microprocessor pipeline; and
a third stage comprising at least one pre-fetch buffer;
wherein said first, second and third stages, respectively, are parallel to, simultaneous to and in isolation of corresponding stages of the microprocessor pipeline.
17. The microprocessor according to claim 16, wherein the microprocessor pipeline comprises align, decode, and register file stages, and the first, second and third stages of the pre-fetch memory pipeline, respectively, are parallel to the align, decode and register file stages.
18. The microprocessor according to claim 16, further comprising hardware logic in the third stage adapted to reconcile operand address information contained in an actual instruction forwarded from a decode stage of the microprocessor pipeline with the predicted operand address information.
19. The microprocessor according to claim 16, further comprising circuitry adapted to pass the predicted operand address information from the pre-fetch buffer to an execute stage of the microprocessor pipeline if the operand pointer in the actual instruction matches the predicted operand address information.
20. The microprocessor according to claim 16, further comprising post-execute stage hardware logic adapted to write the results of execution back to pre-fetch memory if a decoded instruction specifies a write back to at least one pre-fetch memory structure.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/132,447 US20050278505A1 (en) | 2004-05-19 | 2005-05-19 | Microprocessor architecture including zero impact predictive data pre-fetch mechanism for pipeline data memory |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US57223804P | 2004-05-19 | 2004-05-19 | |
US11/132,447 US20050278505A1 (en) | 2004-05-19 | 2005-05-19 | Microprocessor architecture including zero impact predictive data pre-fetch mechanism for pipeline data memory |
Publications (1)
Publication Number | Publication Date |
---|---|
US20050278505A1 (en) | 2005-12-15 |
Family
ID=35429033
Family Applications (7)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/132,448 Abandoned US20050289323A1 (en) | 2004-05-19 | 2005-05-19 | Barrel shifter for a microprocessor |
US11/132,424 Active 2031-02-12 US8719837B2 (en) | 2004-05-19 | 2005-05-19 | Microprocessor architecture having extendible logic |
US11/132,447 Abandoned US20050278505A1 (en) | 2004-05-19 | 2005-05-19 | Microprocessor architecture including zero impact predictive data pre-fetch mechanism for pipeline data memory |
US11/132,423 Abandoned US20050278513A1 (en) | 2004-05-19 | 2005-05-19 | Systems and methods of dynamic branch prediction in a microprocessor |
US11/132,432 Abandoned US20050273559A1 (en) | 2004-05-19 | 2005-05-19 | Microprocessor architecture including unified cache debug unit |
US11/132,428 Abandoned US20050278517A1 (en) | 2004-05-19 | 2005-05-19 | Systems and methods for performing branch prediction in a variable length instruction set microprocessor |
US14/222,194 Active US9003422B2 (en) | 2004-05-19 | 2014-03-21 | Microprocessor architecture having extendible logic |
Country Status (5)
Country | Link |
---|---|
US (7) | US20050289323A1 (en) |
CN (1) | CN101002169A (en) |
GB (1) | GB2428842A (en) |
TW (1) | TW200602974A (en) |
WO (1) | WO2005114441A2 (en) |
Family Cites Families (189)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4342082A (en) | 1977-01-13 | 1982-07-27 | International Business Machines Corp. | Program instruction mechanism for shortened recursive handling of interruptions |
US4216539A (en) | 1978-05-05 | 1980-08-05 | Zehntel, Inc. | In-circuit digital tester |
US4400773A (en) | 1980-12-31 | 1983-08-23 | International Business Machines Corp. | Independent handling of I/O interrupt requests and associated status information transfers |
JPS63225822A (en) * | 1986-08-11 | 1988-09-20 | Toshiba Corp | Barrel shifter |
US4905178A (en) | 1986-09-19 | 1990-02-27 | Performance Semiconductor Corporation | Fast shifter method and structure |
JPS6398729A (en) | 1986-10-15 | 1988-04-30 | Fujitsu Ltd | Barrel shifter |
US4914622A (en) | 1987-04-17 | 1990-04-03 | Advanced Micro Devices, Inc. | Array-organized bit map with a barrel shifter |
DE3889812T2 (en) | 1987-08-28 | 1994-12-15 | Nec Corp | Data processor with a test structure for multi-position shifters |
JPH01263820A (en) | 1988-04-15 | 1989-10-20 | Hitachi Ltd | Microprocessor |
EP0344347B1 (en) | 1988-06-02 | 1993-12-29 | Deutsche ITT Industries GmbH | Digital signal processing unit |
GB2229832B (en) | 1989-03-30 | 1993-04-07 | Intel Corp | Byte swap instruction for memory format conversion within a microprocessor |
EP0415648B1 (en) * | 1989-08-31 | 1998-05-20 | Canon Kabushiki Kaisha | Image processing apparatus |
JPH03185530A (en) * | 1989-12-14 | 1991-08-13 | Mitsubishi Electric Corp | Data processor |
JPH03248226A (en) | 1990-02-26 | 1991-11-06 | Nec Corp | Microprocessor |
JP2560889B2 (en) * | 1990-05-22 | 1996-12-04 | 日本電気株式会社 | Microprocessor |
CA2045790A1 (en) * | 1990-06-29 | 1991-12-30 | Richard Lee Sites | Branch prediction in high-performance processor |
US5155843A (en) | 1990-06-29 | 1992-10-13 | Digital Equipment Corporation | Error transition mode for multi-processor system |
US5778423A (en) * | 1990-06-29 | 1998-07-07 | Digital Equipment Corporation | Prefetch instruction for improving performance in reduced instruction set processor |
JP2556612B2 (en) * | 1990-08-29 | 1996-11-20 | 日本電気アイシーマイコンシステム株式会社 | Barrel shifter circuit |
US5636363A (en) * | 1991-06-14 | 1997-06-03 | Integrated Device Technology, Inc. | Hardware control structure and method for off-chip monitoring entries of an on-chip cache |
DE69229084T2 (en) * | 1991-07-08 | 1999-10-21 | Canon Kk | Color imaging process, color image reader and color image processing apparatus |
US5450586A (en) * | 1991-08-14 | 1995-09-12 | Hewlett-Packard Company | System for analyzing and debugging embedded software through dynamic and interactive use of code markers |
CA2073516A1 (en) | 1991-11-27 | 1993-05-28 | Peter Michael Kogge | Dynamic multi-mode parallel processor array architecture computer system |
US5485625A (en) | 1992-06-29 | 1996-01-16 | Ford Motor Company | Method and apparatus for monitoring external events during a microprocessor's sleep mode |
US5274770A (en) | 1992-07-29 | 1993-12-28 | Tritech Microelectronics International Pte Ltd. | Flexible register-based I/O microcontroller with single cycle instruction execution |
US5294928A (en) | 1992-08-31 | 1994-03-15 | Microchip Technology Incorporated | A/D converter with zero power mode |
US5333119A (en) | 1992-09-30 | 1994-07-26 | Regents Of The University Of Minnesota | Digital signal processor with delayed-evaluation array multipliers and low-power memory addressing |
US5542074A (en) | 1992-10-22 | 1996-07-30 | Maspar Computer Corporation | Parallel processor system with highly flexible local control capability, including selective inversion of instruction signal and control of bit shift amount |
GB2275119B (en) | 1993-02-03 | 1997-05-14 | Motorola Inc | A cached processor |
US5577217A (en) * | 1993-05-14 | 1996-11-19 | Intel Corporation | Method and apparatus for a branch target buffer with shared branch pattern tables for associated branch predictions |
JPH06332693A (en) | 1993-05-27 | 1994-12-02 | Hitachi Ltd | Issuing system of suspending instruction with time-out function |
US5454117A (en) | 1993-08-25 | 1995-09-26 | Nexgen, Inc. | Configurable branch prediction for a processor performing speculative execution |
US5584031A (en) | 1993-11-09 | 1996-12-10 | Motorola Inc. | System and method for executing a low power delay instruction |
US5590350A (en) | 1993-11-30 | 1996-12-31 | Texas Instruments Incorporated | Three input arithmetic logic unit with mask generator |
US6116768A (en) | 1993-11-30 | 2000-09-12 | Texas Instruments Incorporated | Three input arithmetic logic unit with barrel rotator |
US5509129A (en) | 1993-11-30 | 1996-04-16 | Guttag; Karl M. | Long instruction word controlling plural independent processor operations |
US5590351A (en) | 1994-01-21 | 1996-12-31 | Advanced Micro Devices, Inc. | Superscalar execution unit for sequential instruction pointer updates and segment limit checks |
TW253946B (en) * | 1994-02-04 | 1995-08-11 | Ibm | Data processor with branch prediction and method of operation |
US5517436A (en) | 1994-06-07 | 1996-05-14 | Andreas; David C. | Digital signal processor for audio applications |
US5809293A (en) * | 1994-07-29 | 1998-09-15 | International Business Machines Corporation | System and method for program execution tracing within an integrated processor |
US5566357A (en) | 1994-10-06 | 1996-10-15 | Qualcomm Incorporated | Power reduction in a cellular radiotelephone |
JPH08202469A (en) | 1995-01-30 | 1996-08-09 | Fujitsu Ltd | Microcontroller unit equipped with universal asynchronous transmitting and receiving circuit |
US5600674A (en) | 1995-03-02 | 1997-02-04 | Motorola Inc. | Method and apparatus of an enhanced digital signal processor |
US5655122A (en) | 1995-04-05 | 1997-08-05 | Sequent Computer Systems, Inc. | Optimizing compiler with static prediction of branch probability, branch frequency and function frequency |
US5835753A (en) | 1995-04-12 | 1998-11-10 | Advanced Micro Devices, Inc. | Microprocessor with dynamically extendable pipeline stages and a classifying circuit |
US5659752A (en) * | 1995-06-30 | 1997-08-19 | International Business Machines Corporation | System and method for improving branch prediction in compiled program code |
US5768602A (en) | 1995-08-04 | 1998-06-16 | Apple Computer, Inc. | Sleep mode controller for power management |
US5842004A (en) | 1995-08-04 | 1998-11-24 | Sun Microsystems, Inc. | Method and apparatus for decompression of compressed geometric three-dimensional graphics data |
US5727211A (en) * | 1995-11-09 | 1998-03-10 | Chromatic Research, Inc. | System and method for fast context switching between tasks |
US5774709A (en) | 1995-12-06 | 1998-06-30 | Lsi Logic Corporation | Enhanced branch delay slot handling with single exception program counter |
US5778438A (en) | 1995-12-06 | 1998-07-07 | Intel Corporation | Method and apparatus for maintaining cache coherency in a computer system with a highly pipelined bus and multiple conflicting snoop requests |
JP3663710B2 (en) * | 1996-01-17 | 2005-06-22 | ヤマハ株式会社 | Program generation method and processor interrupt control method |
US5896305A (en) | 1996-02-08 | 1999-04-20 | Texas Instruments Incorporated | Shifter circuit for an arithmetic logic unit in a microprocessor |
JPH09261490A (en) * | 1996-03-22 | 1997-10-03 | Minolta Co Ltd | Image forming device |
US5752014A (en) | 1996-04-29 | 1998-05-12 | International Business Machines Corporation | Automatic selection of branch prediction methodology for subsequent branch instruction based on outcome of previous branch prediction |
US5784636A (en) | 1996-05-28 | 1998-07-21 | National Semiconductor Corporation | Reconfigurable computer architecture for use in signal processing applications |
US20010025337A1 (en) | 1996-06-10 | 2001-09-27 | Frank Worrell | Microprocessor including a mode detector for setting compression mode |
US5826079A (en) | 1996-07-05 | 1998-10-20 | Ncr Corporation | Method for improving the execution efficiency of frequently communicating processes utilizing affinity process scheduling by identifying and assigning the frequently communicating processes to the same processor |
US5805876A (en) * | 1996-09-30 | 1998-09-08 | International Business Machines Corporation | Method and system for reducing average branch resolution time and effective misprediction penalty in a processor |
US5964884A (en) * | 1996-09-30 | 1999-10-12 | Advanced Micro Devices, Inc. | Self-timed pulse control circuit |
US5848264A (en) * | 1996-10-25 | 1998-12-08 | S3 Incorporated | Debug and video queue for multi-processor chip |
GB2320388B (en) | 1996-11-29 | 1999-03-31 | Sony Corp | Image processing apparatus |
US6061521A (en) | 1996-12-02 | 2000-05-09 | Compaq Computer Corp. | Computer having multimedia operations executable as two distinct sets of operations within a single instruction cycle |
US5909572A (en) | 1996-12-02 | 1999-06-01 | Compaq Computer Corp. | System and method for conditionally moving an operand from a source register to a destination register |
KR100236533B1 (en) | 1997-01-16 | 2000-01-15 | 윤종용 | Digital signal processor |
EP0855718A1 (en) | 1997-01-28 | 1998-07-29 | Hewlett-Packard Company | Memory low power mode control |
US6154857A (en) | 1997-04-08 | 2000-11-28 | Advanced Micro Devices, Inc. | Microprocessor-based device incorporating a cache for capturing software performance profiling data |
US6185732B1 (en) | 1997-04-08 | 2001-02-06 | Advanced Micro Devices, Inc. | Software debug port for a microprocessor |
US6584525B1 (en) | 1998-11-19 | 2003-06-24 | Edwin E. Klingman | Adaptation of standard microprocessor architectures via an interface to a configurable subsystem |
US6021500A (en) | 1997-05-07 | 2000-02-01 | Intel Corporation | Processor with sleep and deep sleep modes |
US5950120A (en) | 1997-06-17 | 1999-09-07 | Lsi Logic Corporation | Apparatus and method for shutdown of wireless communications mobile station with multiple clocks |
US5931950A (en) | 1997-06-17 | 1999-08-03 | Pc-Tel, Inc. | Wake-up-on-ring power conservation for host signal processing communication system |
US6035374A (en) | 1997-06-25 | 2000-03-07 | Sun Microsystems, Inc. | Method of executing coded instructions in a multiprocessor having shared execution resources including active, nap, and sleep states in accordance with cache miss latency |
US6088786A (en) | 1997-06-27 | 2000-07-11 | Sun Microsystems, Inc. | Method and system for coupling a stack based processor to register based functional unit |
US5878264A (en) | 1997-07-17 | 1999-03-02 | Sun Microsystems, Inc. | Power sequence controller with wakeup logic for enabling a wakeup interrupt handler procedure |
US6760833B1 (en) | 1997-08-01 | 2004-07-06 | Micron Technology, Inc. | Split embedded DRAM processor |
US6226738B1 (en) | 1997-08-01 | 2001-05-01 | Micron Technology, Inc. | Split embedded DRAM processor |
US6026478A (en) | 1997-08-01 | 2000-02-15 | Micron Technology, Inc. | Split embedded DRAM processor |
JPH1185515A (en) * | 1997-09-10 | 1999-03-30 | Ricoh Co Ltd | Microprocessor |
JPH11143571A (en) | 1997-11-05 | 1999-05-28 | Mitsubishi Electric Corp | Data processor |
US6014743A (en) | 1998-02-05 | 2000-01-11 | Integrated Device Technology, Inc. | Apparatus and method for recording a floating point error pointer in zero cycles |
US6151672A (en) | 1998-02-23 | 2000-11-21 | Hewlett-Packard Company | Methods and apparatus for reducing interference in a branch history table of a microprocessor |
US6374349B2 (en) | 1998-03-19 | 2002-04-16 | Mcfarling Scott | Branch predictor with serially connected predictor stages for improving branch prediction accuracy |
US6289417B1 (en) | 1998-05-18 | 2001-09-11 | Arm Limited | Operand supply to an execution unit |
US6308279B1 (en) | 1998-05-22 | 2001-10-23 | Intel Corporation | Method and apparatus for power mode transition in a multi-thread processor |
JPH11353225A (en) | 1998-05-26 | 1999-12-24 | Internatl Business Mach Corp <Ibm> | Memory that processor addressing gray code system in sequential execution style accesses and method for storing code and data in memory |
US6466333B2 (en) * | 1998-06-26 | 2002-10-15 | Canon Kabushiki Kaisha | Streamlined tetrahedral interpolation |
US20020053015A1 (en) | 1998-07-14 | 2002-05-02 | Sony Corporation And Sony Electronics Inc. | Digital signal processor particularly suited for decoding digital audio |
US6327651B1 (en) | 1998-09-08 | 2001-12-04 | International Business Machines Corporation | Wide shifting in the vector permute unit |
US6253287B1 (en) * | 1998-09-09 | 2001-06-26 | Advanced Micro Devices, Inc. | Using three-dimensional storage to make variable-length instructions appear uniform in two dimensions |
US6240521B1 (en) | 1998-09-10 | 2001-05-29 | International Business Machines Corp. | Sleep mode transition between processors sharing an instruction set and an address space |
US6347379B1 (en) | 1998-09-25 | 2002-02-12 | Intel Corporation | Reducing power consumption of an electronic device |
US6339822B1 (en) * | 1998-10-02 | 2002-01-15 | Advanced Micro Devices, Inc. | Using padded instructions in a block-oriented cache |
US6862563B1 (en) | 1998-10-14 | 2005-03-01 | Arc International | Method and apparatus for managing the configuration and functionality of a semiconductor design |
US6671743B1 (en) * | 1998-11-13 | 2003-12-30 | Creative Technology, Ltd. | Method and system for exposing proprietary APIs in a privileged device driver to an application |
DE69910826T2 (en) * | 1998-11-20 | 2004-06-17 | Altera Corp., San Jose | COMPUTER SYSTEM WITH RECONFIGURABLE PROGRAMMABLE LOGIC DEVICE |
US6189091B1 (en) | 1998-12-02 | 2001-02-13 | Ip First, L.L.C. | Apparatus and method for speculatively updating global history and restoring same on branch misprediction detection |
US6341348B1 (en) * | 1998-12-03 | 2002-01-22 | Sun Microsystems, Inc. | Software branch prediction filtering for a microprocessor |
US6957327B1 (en) | 1998-12-31 | 2005-10-18 | Stmicroelectronics, Inc. | Block-based branch target buffer |
US6826748B1 (en) * | 1999-01-28 | 2004-11-30 | Ati International Srl | Profiling program execution into registers of a computer |
US6477683B1 (en) | 1999-02-05 | 2002-11-05 | Tensilica, Inc. | Automated processor generation system for designing a configurable processor and method for the same |
US6418530B2 (en) * | 1999-02-18 | 2002-07-09 | Hewlett-Packard Company | Hardware/software system for instruction profiling and trace selection using branch history information for branch predictions |
US6499101B1 (en) * | 1999-03-18 | 2002-12-24 | I.P. First L.L.C. | Static branch prediction mechanism for conditional branch instructions |
US6427206B1 (en) * | 1999-05-03 | 2002-07-30 | Intel Corporation | Optimized branch predictions for strongly predicted compiler branches |
US6438700B1 (en) | 1999-05-18 | 2002-08-20 | Koninklijke Philips Electronics N.V. | System and method to reduce power consumption in advanced RISC machine (ARM) based systems |
US6772325B1 (en) | 1999-10-01 | 2004-08-03 | Hitachi, Ltd. | Processor architecture and operation for exploiting improved branch control instruction |
US6546481B1 (en) | 1999-11-05 | 2003-04-08 | Ip - First Llc | Split history tables for branch prediction |
US6571333B1 (en) | 1999-11-05 | 2003-05-27 | Intel Corporation | Initializing a memory controller by executing software in second memory to wakeup a system |
US6909744B2 (en) | 1999-12-09 | 2005-06-21 | Redrock Semiconductor, Inc. | Processor architecture for compression and decompression of video and images |
KR100395763B1 (en) * | 2000-02-01 | 2003-08-25 | 삼성전자주식회사 | A branch predictor for microprocessor having multiple processes |
US6412038B1 (en) | 2000-02-14 | 2002-06-25 | Intel Corporation | Integral modular cache for a processor |
JP2001282548A (en) | 2000-03-29 | 2001-10-12 | Matsushita Electric Ind Co Ltd | Communication equipment and communication method |
US6519696B1 (en) | 2000-03-30 | 2003-02-11 | I.P. First, Llc | Paired register exchange using renaming register map |
US20030070013A1 (en) | 2000-10-27 | 2003-04-10 | Daniel Hansson | Method and apparatus for reducing power consumption in a digital processor |
US6948054B2 (en) * | 2000-11-29 | 2005-09-20 | Lsi Logic Corporation | Simple branch prediction and misprediction recovery method |
TW477954B (en) * | 2000-12-05 | 2002-03-01 | Faraday Tech Corp | Memory data accessing architecture and method for a processor |
US20020073301A1 (en) | 2000-12-07 | 2002-06-13 | International Business Machines Corporation | Hardware for use with compiler generated branch information |
US7139903B2 (en) * | 2000-12-19 | 2006-11-21 | Hewlett-Packard Development Company, L.P. | Conflict free parallel read access to a bank interleaved branch predictor in a processor |
US6877089B2 (en) | 2000-12-27 | 2005-04-05 | International Business Machines Corporation | Branch prediction apparatus and process for restoring replaced branch history for use in future branch predictions for an executing program |
US20020087851A1 (en) * | 2000-12-28 | 2002-07-04 | Matsushita Electric Industrial Co., Ltd. | Microprocessor and an instruction converter |
US8285976B2 (en) | 2000-12-28 | 2012-10-09 | Micron Technology, Inc. | Method and apparatus for predicting branches using a meta predictor |
US6925634B2 (en) * | 2001-01-24 | 2005-08-02 | Texas Instruments Incorporated | Method for maintaining cache coherency in software in a shared memory system |
US7039901B2 (en) * | 2001-01-24 | 2006-05-02 | Texas Instruments Incorporated | Software shared memory bus |
US6823447B2 (en) | 2001-03-01 | 2004-11-23 | International Business Machines Corporation | Software hint to improve the branch target prediction accuracy |
AU2002238325A1 (en) | 2001-03-02 | 2002-09-19 | Atsana Semiconductor Corp. | Data processing apparatus and system and method for controlling memory access |
JP3890910B2 (en) | 2001-03-21 | 2007-03-07 | 株式会社日立製作所 | Instruction execution result prediction device |
US7010558B2 (en) | 2001-04-19 | 2006-03-07 | Arc International | Data processor with enhanced instruction execution and method |
US7165168B2 (en) | 2003-01-14 | 2007-01-16 | Ip-First, Llc | Microprocessor with branch target address cache update queue |
US20020194462A1 (en) * | 2001-05-04 | 2002-12-19 | Ip First Llc | Apparatus and method for selecting one of multiple target addresses stored in a speculative branch target address cache per instruction cache line |
US7200740B2 (en) | 2001-05-04 | 2007-04-03 | Ip-First, Llc | Apparatus and method for speculatively performing a return instruction in a microprocessor |
US6886093B2 (en) * | 2001-05-04 | 2005-04-26 | Ip-First, Llc | Speculative hybrid branch direction predictor |
US20020194461A1 (en) | 2001-05-04 | 2002-12-19 | Ip First Llc | Speculative branch target address cache |
US7165169B2 (en) * | 2001-05-04 | 2007-01-16 | Ip-First, Llc | Speculative branch target address cache with selective override by secondary predictor based on branch instruction type |
GB0112269D0 (en) | 2001-05-21 | 2001-07-11 | Micron Technology Inc | Method and circuit for alignment of floating point significands in a simd array mpp |
GB0112275D0 (en) | 2001-05-21 | 2001-07-11 | Micron Technology Inc | Method and circuit for normalization of floating point significands in a simd array mpp |
JP3805339B2 (en) * | 2001-06-29 | 2006-08-02 | コーニンクレッカ フィリップス エレクトロニクス エヌ ヴィ | Method for predicting branch target, processor, and compiler |
US7162619B2 (en) * | 2001-07-03 | 2007-01-09 | Ip-First, Llc | Apparatus and method for densely packing a branch instruction predicted by a branch target address cache and associated target instructions into a byte-wide instruction buffer |
US7010675B2 (en) | 2001-07-27 | 2006-03-07 | Stmicroelectronics, Inc. | Fetch branch architecture for reducing branch penalty without branch prediction |
US7191445B2 (en) | 2001-08-31 | 2007-03-13 | Texas Instruments Incorporated | Method using embedded real-time analysis components with corresponding real-time operating system software objects |
US6751331B2 (en) | 2001-10-11 | 2004-06-15 | United Global Sourcing Incorporated | Communication headset |
JP2003131902A (en) * | 2001-10-24 | 2003-05-09 | Toshiba Corp | Software debugger, system-level debugger, debug method and debug program |
US7051239B2 (en) | 2001-12-28 | 2006-05-23 | Hewlett-Packard Development Company, L.P. | Method and apparatus for efficiently implementing trace and/or logic analysis mechanisms on a processor chip |
US20030225998A1 (en) | 2002-01-31 | 2003-12-04 | Khan Mohammed Noshad | Configurable data processor with multi-length instruction set architecture |
US7168067B2 (en) * | 2002-02-08 | 2007-01-23 | Agere Systems Inc. | Multiprocessor system with cache-based software breakpoints |
US7181596B2 (en) | 2002-02-12 | 2007-02-20 | Ip-First, Llc | Apparatus and method for extending a microprocessor instruction set |
US7529912B2 (en) | 2002-02-12 | 2009-05-05 | Via Technologies, Inc. | Apparatus and method for instruction-level specification of floating point format |
US7315921B2 (en) | 2002-02-19 | 2008-01-01 | Ip-First, Llc | Apparatus and method for selective memory attribute control |
US7328328B2 (en) | 2002-02-19 | 2008-02-05 | Ip-First, Llc | Non-temporal memory reference control mechanism |
US7546446B2 (en) | 2002-03-08 | 2009-06-09 | Ip-First, Llc | Selective interrupt suppression |
US7395412B2 (en) | 2002-03-08 | 2008-07-01 | Ip-First, Llc | Apparatus and method for extending data modes in a microprocessor |
US7155598B2 (en) | 2002-04-02 | 2006-12-26 | Ip-First, Llc | Apparatus and method for conditional instruction execution |
US7185180B2 (en) | 2002-04-02 | 2007-02-27 | Ip-First, Llc | Apparatus and method for selective control of condition code write back |
US7373483B2 (en) | 2002-04-02 | 2008-05-13 | Ip-First, Llc | Mechanism for extending the number of registers in a microprocessor |
US7302551B2 (en) | 2002-04-02 | 2007-11-27 | Ip-First, Llc | Suppression of store checking |
US7380103B2 (en) | 2002-04-02 | 2008-05-27 | Ip-First, Llc | Apparatus and method for selective control of results write back |
US7380109B2 (en) | 2002-04-15 | 2008-05-27 | Ip-First, Llc | Apparatus and method for providing extended address modes in an existing instruction set for a microprocessor |
US20030204705A1 (en) * | 2002-04-30 | 2003-10-30 | Oldfield William H. | Prediction of branch instructions in a data processing apparatus |
KR100450753B1 (en) | 2002-05-17 | 2004-10-01 | 한국전자통신연구원 | Programmable variable length decoder including interface of CPU processor |
US6938151B2 (en) * | 2002-06-04 | 2005-08-30 | International Business Machines Corporation | Hybrid branch prediction using a global selection counter and a prediction method comparison table |
US7493480B2 (en) | 2002-07-18 | 2009-02-17 | International Business Machines Corporation | Method and apparatus for prefetching branch history information |
US7000095B2 (en) | 2002-09-06 | 2006-02-14 | Mips Technologies, Inc. | Method and apparatus for clearing hazards using jump instructions |
US20050125634A1 (en) * | 2002-10-04 | 2005-06-09 | Fujitsu Limited | Processor and instruction control method |
US6968444B1 (en) | 2002-11-04 | 2005-11-22 | Advanced Micro Devices, Inc. | Microprocessor employing a fixed position dispatch unit |
US7266676B2 (en) | 2003-03-21 | 2007-09-04 | Analog Devices, Inc. | Method and apparatus for branch prediction based on branch targets utilizing tag and data arrays |
US20040193855A1 (en) * | 2003-03-31 | 2004-09-30 | Nicolas Kacevas | System and method for branch prediction access |
US7174444B2 (en) * | 2003-03-31 | 2007-02-06 | Intel Corporation | Preventing a read of a next sequential chunk in branch prediction of a subject chunk |
US7590829B2 (en) | 2003-03-31 | 2009-09-15 | Stretch, Inc. | Extension adapter |
US20040225870A1 (en) | 2003-05-07 | 2004-11-11 | Srinivasan Srikanth T. | Method and apparatus for reducing wrong path execution in a speculative multi-threaded processor |
US7010676B2 (en) * | 2003-05-12 | 2006-03-07 | International Business Machines Corporation | Last iteration loop branch prediction upon counter threshold and resolution upon counter one |
US20040255104A1 (en) * | 2003-06-12 | 2004-12-16 | Intel Corporation | Method and apparatus for recycling candidate branch outcomes after a wrong-path execution in a superscalar processor |
US7668897B2 (en) | 2003-06-16 | 2010-02-23 | Arm Limited | Result partitioning within SIMD data processing systems |
US7783871B2 (en) * | 2003-06-30 | 2010-08-24 | Intel Corporation | Method to remove stale branch predictions for an instruction prior to execution within a microprocessor |
US7373642B2 (en) | 2003-07-29 | 2008-05-13 | Stretch, Inc. | Defining instruction extensions in a standard programming language |
US20050027974A1 (en) * | 2003-07-31 | 2005-02-03 | Oded Lempel | Method and system for conserving resources in an instruction pipeline |
US7133950B2 (en) | 2003-08-19 | 2006-11-07 | Sun Microsystems, Inc. | Request arbitration in multi-core processor |
JP2005078234A (en) * | 2003-08-29 | 2005-03-24 | Renesas Technology Corp | Information processor |
US7237098B2 (en) * | 2003-09-08 | 2007-06-26 | Ip-First, Llc | Apparatus and method for selectively overriding return stack prediction in response to detection of non-standard return sequence |
US20050066305A1 (en) * | 2003-09-22 | 2005-03-24 | Lisanke Robert John | Method and machine for efficient simulation of digital hardware within a software development environment |
KR100980076B1 (en) * | 2003-10-24 | 2010-09-06 | 삼성전자주식회사 | System and method for branch prediction with low-power consumption |
US7363544B2 (en) | 2003-10-30 | 2008-04-22 | International Business Machines Corporation | Program debug method and apparatus |
US7219207B2 (en) | 2003-12-03 | 2007-05-15 | Intel Corporation | Reconfigurable trace cache |
US8069336B2 (en) | 2003-12-03 | 2011-11-29 | Globalfoundries Inc. | Transitioning from instruction cache to trace cache on label boundaries |
US7293164B2 (en) | 2004-01-14 | 2007-11-06 | International Business Machines Corporation | Autonomic method and apparatus for counting branch instructions to generate branch statistics meant to improve branch predictions |
US8607209B2 (en) | 2004-02-04 | 2013-12-10 | Bluerisc Inc. | Energy-focused compiler-assisted branch prediction |
US20050216713A1 (en) * | 2004-03-25 | 2005-09-29 | International Business Machines Corporation | Instruction text controlled selectively stated branches for prediction via a branch target buffer |
US7281120B2 (en) | 2004-03-26 | 2007-10-09 | International Business Machines Corporation | Apparatus and method for decreasing the latency between an instruction cache and a pipeline processor |
US20050223202A1 (en) * | 2004-03-31 | 2005-10-06 | Intel Corporation | Branch prediction in a pipelined processor |
US20060015706A1 (en) * | 2004-06-30 | 2006-01-19 | Chunrong Lai | TLB correlated branch predictor and method for use thereof |
TWI305323B (en) * | 2004-08-23 | 2009-01-11 | Faraday Tech Corp | Method for verification branch prediction mechanisms and readable recording medium for storing program thereof |
2005
- 2005-05-19: US application US11/132,448, published as US20050289323A1 (Abandoned)
- 2005-05-19: US application US11/132,424, published as US8719837B2 (Active)
- 2005-05-19: WO application PCT/US2005/017586, published as WO2005114441A2 (Application Filing)
- 2005-05-19: TW application TW094116302A, published as TW200602974A (status unknown)
- 2005-05-19: US application US11/132,447, published as US20050278505A1 (Abandoned)
- 2005-05-19: GB application GB0622477A, published as GB2428842A (Withdrawn)
- 2005-05-19: US application US11/132,423, published as US20050278513A1 (Abandoned)
- 2005-05-19: US application US11/132,432, published as US20050273559A1 (Abandoned)
- 2005-05-19: US application US11/132,428, published as US20050278517A1 (Abandoned)
- 2005-05-19: CN application CNA2005800215322A, published as CN101002169A (Pending)
2014
- 2014-03-21: US application US14/222,194, published as US9003422B2 (Active)
Patent Citations (38)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4594659A (en) * | 1982-10-13 | 1986-06-10 | Honeywell Information Systems Inc. | Method and apparatus for prefetching instructions for a central execution pipeline unit |
US5148532A (en) * | 1987-12-25 | 1992-09-15 | Hitachi, Ltd. | Pipeline processor with prefetch circuit |
US4926323A (en) * | 1988-03-03 | 1990-05-15 | Advanced Micro Devices, Inc. | Streamlined instruction processor |
US5317701A (en) * | 1990-01-02 | 1994-05-31 | Motorola, Inc. | Method for refilling instruction queue by reading predetermined number of instruction words comprising one or more instructions and determining the actual number of instruction words used |
US6948052B2 (en) * | 1991-07-08 | 2005-09-20 | Seiko Epson Corporation | High-performance, superscalar-based computer system with out-of-order instruction execution |
US5493687A (en) * | 1991-07-08 | 1996-02-20 | Seiko Epson Corporation | RISC microprocessor architecture implementing multiple typed register sets |
US5423011A (en) * | 1992-06-11 | 1995-06-06 | International Business Machines Corporation | Apparatus for initializing branch prediction information |
US5696958A (en) * | 1993-01-11 | 1997-12-09 | Silicon Graphics, Inc. | Method and apparatus for reducing delays following the execution of a branch instruction in an instruction pipeline |
US5642500A (en) * | 1993-11-26 | 1997-06-24 | Fujitsu Limited | Method and apparatus for controlling instruction in pipeline processor |
US6038649A (en) * | 1994-03-14 | 2000-03-14 | Texas Instruments Incorporated | Address generating circuit for block repeat addressing for a pipelined processor |
US5530825A (en) * | 1994-04-15 | 1996-06-25 | Motorola, Inc. | Data processor with branch target address cache and method of operation |
US5692168A (en) * | 1994-10-18 | 1997-11-25 | Cyrix Corporation | Prefetch buffer using flow control bit to identify changes of flow within the code stream |
US5920711A (en) * | 1995-06-02 | 1999-07-06 | Synopsys, Inc. | System for frame-based protocol, graphical capture, synthesis, analysis, and simulation |
US6292879B1 (en) * | 1995-10-25 | 2001-09-18 | Anthony S. Fong | Method and apparatus to specify access control list and cache enabling and cache coherency requirement enabling on individual operands of an instruction of a computer |
US5996071A (en) * | 1995-12-15 | 1999-11-30 | Via-Cyrix, Inc. | Detecting self-modifying code in a pipelined processor with branch processing by comparing latched store address to subsequent target address |
US5909566A (en) * | 1996-12-31 | 1999-06-01 | Texas Instruments Incorporated | Microprocessor circuits, systems, and methods for speculatively executing an instruction using its most recently used data while concurrently prefetching data for the instruction |
US5808876A (en) * | 1997-06-20 | 1998-09-15 | International Business Machines Corporation | Multi-function power distribution system |
US20040068643A1 (en) * | 1997-08-01 | 2004-04-08 | Dowling Eric M. | Method and apparatus for high performance branching in pipelined microsystems |
US6157988A (en) * | 1997-08-01 | 2000-12-05 | Micron Technology, Inc. | Method and apparatus for high performance branching in pipelined microsystems |
US5978909A (en) * | 1997-11-26 | 1999-11-02 | Intel Corporation | System for speculative branch target prediction having a dynamic prediction history buffer and a static prediction history buffer |
US6044458A (en) * | 1997-12-12 | 2000-03-28 | Motorola, Inc. | System for monitoring program flow utilizing fixwords stored sequentially to opcodes |
US6560754B1 (en) * | 1999-05-13 | 2003-05-06 | Arc International Plc | Method and apparatus for jump control in a pipelined processor |
US6622240B1 (en) * | 1999-06-18 | 2003-09-16 | Intrinsity, Inc. | Method and apparatus for pre-branch instruction |
US6550056B1 (en) * | 1999-07-19 | 2003-04-15 | Mitsubishi Denki Kabushiki Kaisha | Source level debugger for debugging source programs |
US6609194B1 (en) * | 1999-11-12 | 2003-08-19 | Ip-First, Llc | Apparatus for performing branch target address calculation based on branch type |
US6681295B1 (en) * | 2000-08-31 | 2004-01-20 | Hewlett-Packard Development Company, L.P. | Fast lane prefetching |
US6718460B1 (en) * | 2000-09-05 | 2004-04-06 | Sun Microsystems, Inc. | Mechanism for error handling in a computer system |
US6963554B1 (en) * | 2000-12-27 | 2005-11-08 | National Semiconductor Corporation | Microwire dynamic sequencer pipeline stall |
US6823444B1 (en) * | 2001-07-03 | 2004-11-23 | Ip-First, Llc | Apparatus and method for selectively accessing disparate instruction buffer stages based on branch target address cache hit and instruction stage wrap |
US6718504B1 (en) * | 2002-06-05 | 2004-04-06 | Arc International | Method and apparatus for implementing a data processor adapted for turbo decoding |
US6774832B1 (en) * | 2003-03-25 | 2004-08-10 | Raytheon Company | Multi-bit output DDS with real time delta sigma modulation look up from memory |
US20050138607A1 (en) * | 2003-12-18 | 2005-06-23 | John Lu | Software-implemented grouping techniques for use in a superscalar data processing system |
US20050204121A1 (en) * | 2004-03-12 | 2005-09-15 | Arm Limited | Prefetching exception vectors |
US20050273559A1 (en) * | 2004-05-19 | 2005-12-08 | Aris Aristodemou | Microprocessor architecture including unified cache debug unit |
US20050278513A1 (en) * | 2004-05-19 | 2005-12-15 | Aris Aristodemou | Systems and methods of dynamic branch prediction in a microprocessor |
US20050278517A1 (en) * | 2004-05-19 | 2005-12-15 | Kar-Lik Wong | Systems and methods for performing branch prediction in a variable length instruction set microprocessor |
US20050289321A1 (en) * | 2004-05-19 | 2005-12-29 | James Hakewill | Microprocessor architecture having extendible logic |
US20050289323A1 (en) * | 2004-05-19 | 2005-12-29 | Kar-Lik Wong | Barrel shifter for a microprocessor |
Cited By (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050289323A1 (en) * | 2004-05-19 | 2005-12-29 | Kar-Lik Wong | Barrel shifter for a microprocessor |
US8719837B2 (en) | 2004-05-19 | 2014-05-06 | Synopsys, Inc. | Microprocessor architecture having extendible logic |
US9003422B2 (en) | 2004-05-19 | 2015-04-07 | Synopsys, Inc. | Microprocessor architecture having extendible logic |
US7971042B2 (en) | 2005-09-28 | 2011-06-28 | Synopsys, Inc. | Microprocessor system and method for instruction-initiated recording and execution of instruction sequences in a dynamically decoupleable extended instruction pipeline |
US20090198905A1 (en) * | 2008-02-01 | 2009-08-06 | Arimilli Ravi K | Techniques for Prediction-Based Indirect Data Prefetching |
US20090198948A1 (en) * | 2008-02-01 | 2009-08-06 | Arimilli Ravi K | Techniques for Data Prefetching Using Indirect Addressing |
US8166277B2 (en) | 2008-02-01 | 2012-04-24 | International Business Machines Corporation | Data prefetching using indirect addressing |
US8209488B2 (en) | 2008-02-01 | 2012-06-26 | International Business Machines Corporation | Techniques for prediction-based indirect data prefetching |
US20190012177A1 (en) * | 2017-07-04 | 2019-01-10 | Arm Limited | Apparatus and method for controlling use of a register cache |
US10732980B2 (en) * | 2017-07-04 | 2020-08-04 | Arm Limited | Apparatus and method for controlling use of a register cache |
US11243880B1 (en) * | 2017-09-15 | 2022-02-08 | Groq, Inc. | Processor architecture |
US11263129B1 (en) | 2017-09-15 | 2022-03-01 | Groq, Inc. | Processor architecture |
US11360934B1 (en) | 2017-09-15 | 2022-06-14 | Groq, Inc. | Tensor streaming processor architecture |
US11645226B1 (en) | 2017-09-15 | 2023-05-09 | Groq, Inc. | Compiler operations for tensor streaming processor |
US11822510B1 (en) | 2017-09-15 | 2023-11-21 | Groq, Inc. | Instruction format and instruction set architecture for tensor streaming processor |
US11868250B1 (en) | 2017-09-15 | 2024-01-09 | Groq, Inc. | Memory design for a processor |
US11875874B2 (en) | 2017-09-15 | 2024-01-16 | Groq, Inc. | Data structures with multiple read ports |
US11868908B2 (en) | 2017-09-21 | 2024-01-09 | Groq, Inc. | Processor compiler for scheduling instructions to reduce execution delay due to dependencies |
US11809514B2 (en) | 2018-11-19 | 2023-11-07 | Groq, Inc. | Expanded kernel generation |
US11868804B1 (en) | 2019-11-18 | 2024-01-09 | Groq, Inc. | Processor instruction dispatch configuration |
US11392535B2 (en) | 2019-11-26 | 2022-07-19 | Groq, Inc. | Loading operands and outputting results from a multi-dimensional array using only a single side |
Also Published As
Publication number | Publication date |
---|---|
CN101002169A (en) | 2007-07-18 |
US20050273559A1 (en) | 2005-12-08 |
WO2005114441A2 (en) | 2005-12-01 |
TW200602974A (en) | 2006-01-16 |
US9003422B2 (en) | 2015-04-07 |
US20050289321A1 (en) | 2005-12-29 |
US20140208087A1 (en) | 2014-07-24 |
GB2428842A (en) | 2007-02-07 |
WO2005114441A3 (en) | 2007-01-18 |
US8719837B2 (en) | 2014-05-06 |
US20050278517A1 (en) | 2005-12-15 |
US20050278513A1 (en) | 2005-12-15 |
GB0622477D0 (en) | 2006-12-20 |
US20050289323A1 (en) | 2005-12-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20050278505A1 (en) | | Microprocessor architecture including zero impact predictive data pre-fetch mechanism for pipeline data memory |
US8069336B2 (en) | | Transitioning from instruction cache to trace cache on label boundaries |
US7836287B2 (en) | | Reducing the fetch time of target instructions of a predicted taken branch instruction |
JP3182740B2 (en) | | A method and system for fetching non-consecutive instructions in a single clock cycle. |
US6880073B2 (en) | | Speculative execution of instructions and processes before completion of preceding barrier operations |
US7257699B2 (en) | | Selective execution of deferred instructions in a processor that supports speculative execution |
US7444501B2 (en) | | Methods and apparatus for recognizing a subroutine call |
JP2002525742A (en) | | Mechanism for transfer from storage to load |
US7877586B2 (en) | | Branch target address cache selectively applying a delayed hit |
US20090049286A1 (en) | | Data processing system, processor and method of data processing having improved branch target address cache |
EP1849061A2 (en) | | Unaligned memory access prediction |
US6260134B1 (en) | | Fixed shift amount variable length instruction stream pre-decoding for start byte determination based on prefix indicating length vector presuming potential start byte |
US7257700B2 (en) | | Avoiding register RAW hazards when returning from speculative execution |
US7143269B2 (en) | | Apparatus and method for killing an instruction after loading the instruction into an instruction queue in a pipelined microprocessor |
EP3171264A1 (en) | | System and method of speculative parallel execution of cache line unaligned load instructions |
JP2003515214A (en) | | Method and apparatus for performing calculations with narrow operands |
CN106557304B (en) | | Instruction fetch unit for predicting the target of a subroutine return instruction |
US5946468A (en) | | Reorder buffer having an improved future file for storing speculative instruction execution results |
US5915110A (en) | | Branch misprediction recovery in a reorder buffer having a future file |
US7865705B2 (en) | | Branch target address cache including address type tag bit |
JP3683439B2 (en) | | Information processing apparatus and method for suppressing branch prediction |
US20090198985A1 (en) | | Data processing system, processor and method of data processing having branch target address cache with hashed indices |
US20050144427A1 (en) | | Processor including branch prediction mechanism for far jump and far call instructions |
US6219784B1 (en) | | Processor with N adders for parallel target addresses calculation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | AS | Assignment | Owner name: ARC INTERNATIONAL, UNITED KINGDOM. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: LIM, SEOW CHUAN; WONG, KAR-LIK; REEL/FRAME: 016933/0023. Effective date: 20050721 |
| | STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |