WO1995022101A1 - Randomly-accessible instruction buffer for microprocessor - Google Patents

Publication number
WO1995022101A1
Authority
WO
WIPO (PCT)
Prior art keywords
instruction
instruction buffer
buffer
bytes
branch
Application number
PCT/US1995/001705
Other languages
French (fr)
Inventor
James A. Kane
Graham B. Whitted, III
Hsiao-Shih Chang
Original Assignee
Meridian Semiconductor, Inc.
Application filed by Meridian Semiconductor, Inc.
Publication of WO1995022101A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3802 Instruction prefetching
    • G06F9/3812 Instruction prefetching with instruction modification, e.g. store into instruction stream
    • G06F9/3804 Instruction prefetching for branches, e.g. hedging, branch folding
    • G06F9/3808 Instruction prefetching for instruction reuse, e.g. trace cache, branch target cache
    • G06F9/381 Loop buffering

Definitions

  • This invention relates to microprocessors.
  • this invention relates to instruction queues and buffers that hold instruction data prior to execution by a microprocessor.
  • Microprocessors commonly use an instruction buffer to queue instruction data prior to execution.
  • the instruction buffer or "queue" is loaded with lines of instruction data (typically comprising multiple instructions per line) that are fetched either from an external memory or a cache memory. Individual instructions are read or shifted out of the instruction queue for execution on a first-in-first-out basis.
  • the instruction buffer thereby reduces the number of cache reads that are required as instructions are executed in sequential order. Since cache reads typically require at least one clock cycle to perform, a minimum of one clock cycle is avoided when an instruction can be read from the instruction queue.
  • the first type is a register-based instruction queue.
  • Register-based instruction queues comprise multiple registers that are connected in a sequential fashion. Instruction data is loaded into the first register of the sequence, and is clocked to the next register in the sequence with each successive load operation. Instruction data is thereby shifted in parallel through the instruction queue until the instruction data exits the instruction queue or the queue is flushed.
  • the second type of prior art instruction queue is a memory based instruction queue.
  • Memory-based queues use a read pointer and a write pointer to address specific memory locations of the queue. Data written to a specific location remains at that location until overwritten (i.e., it is not shifted through the queue).
  • the write pointer is automatically incremented to the next location in the queue when a load is performed, and the read pointer is similarly incremented whenever a read from the queue is performed.
  • the read pointer and the write pointer loop back to the beginning of the queue (i.e., address zero) when incremented beyond the highest queue address. Data is thereby written to and read from the queue on a first-in-first-out basis.
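The memory-based queue described above can be sketched in a few lines of Python. This is only an illustration of the pointer behavior, not circuitry from the patent; the class name `MemoryQueue` and its methods are hypothetical, and full/empty tracking is deliberately left out.

```python
class MemoryQueue:
    """Memory-based FIFO: data stays where it was written; only the pointers move."""

    def __init__(self, size):
        self.cells = [None] * size
        self.read_ptr = 0
        self.write_ptr = 0

    def load(self, value):
        self.cells[self.write_ptr] = value
        # Loop back to address zero when incremented past the highest address.
        self.write_ptr = (self.write_ptr + 1) % len(self.cells)

    def read(self):
        value = self.cells[self.read_ptr]
        self.read_ptr = (self.read_ptr + 1) % len(self.cells)
        return value


q = MemoryQueue(4)
for item in ["a", "b", "c", "d"]:
    q.load(item)
first_two = [q.read(), q.read()]  # items come out in the order they went in
q.load("e")                       # write pointer has wrapped, so location 0 is overwritten
```

Because a location keeps its contents until it is overwritten, an entry that has already been read is still physically present, which is the property the instruction buffer circuit described below relies on.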
  • the present invention relates to an instruction buffer circuit and method that overcomes the above-described limitation in the prior art.
  • instruction buffer will refer to a physical memory array or set of sequentially-connected registers in which fetched instruction data is stored (with the term “buffer” being used rather than “queue” to indicate that instruction data can be accessed out-of-order).
  • instruction buffer circuit will refer to an instruction buffer in combination with control circuitry used for accessing the instruction buffer.
  • branch instruction will refer to any type of macroinstruction that may pass program control to an instruction address that does not immediately follow the subject instruction. Branch instructions include, for example, subroutine calls, conditional jump instructions, and unconditional jump instructions.
  • target branch address will refer to the program address to which control is passed when a branch is taken.
  • target instruction will refer to the instruction that is the target of a branch (i.e., the instruction at the target branch address).
  • the present invention relates to an instruction buffer circuit that can perform a branch to a target instruction in an instruction buffer.
  • the instruction buffer circuit thereby allows program execution to continue without flushing the instruction buffer, and without performing a code-fetch to re-load the instruction buffer.
  • the instruction buffer circuit comprises an instruction buffer.
  • the instruction buffer is preferably in the form of either an addressable memory array or a plurality of sequentially-connected registers.
  • the instruction buffer circuit further comprises a read pointer for selecting a location of the instruction buffer from which to read code data for execution.
  • the instruction buffer circuit further comprises two relative offset circuits.
  • the first relative offset circuit generates a first relative offset value that indicates the number of valid instruction bytes that are currently in the instruction buffer that fall sequentially ahead of the read pointer address.
  • the second relative offset circuit generates a second relative offset value that indicates the number of valid instruction bytes that are currently in the instruction buffer that fall sequentially behind the read pointer address.
  • the instruction buffer circuit further comprises a compare circuit that compares a relative displacement for a relative jump instruction to the first and second relative offset values, to thereby determine whether the target instruction can be read from the instruction buffer.
  • the read pointer is incremented as instruction bytes are read from the instruction buffer for execution, and the instruction buffer acts as a first-in-first-out buffer.
  • the compare circuit compares the relative displacement for the jump instruction to the first and second relative offset values. If the relative displacement is positive, indicating a forward jump in memory, the relative displacement is compared with the first relative offset value, which indicates the number of instruction bytes ahead of the read pointer address. If the positive relative displacement is less than or equal to the first relative offset value, indicating that the target instruction is in the instruction buffer, a flush and corresponding re-load of the instruction buffer is inhibited, and the read-pointer is "bumped" to the instruction buffer location that contains the target instruction. On the following clock cycle, the target instruction is read from the instruction buffer and decoded. Thus, the delay normally associated with having to re-load the instruction buffer is eliminated.
  • If the relative displacement is negative, indicating a backward jump in memory, the relative displacement is compared with the second relative offset value, which indicates the number of instruction bytes that fall behind the read pointer address. If the negative relative displacement is less than or equal in magnitude to the second relative offset value, indicating that the target instruction is in the instruction buffer, a flush and corresponding re-load of the instruction buffer is similarly inhibited, and the read-pointer is "bumped" to the instruction buffer location that contains the target instruction. On the following clock cycle, the target instruction is read from the instruction buffer and decoded.
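The forward and backward comparisons can be summarized as a small predicate. The sketch below is a software paraphrase of the compare circuit, not the circuit itself; the function name `target_in_buffer` is hypothetical, and treating a zero displacement as a forward jump is an assumption that the text does not settle.

```python
def target_in_buffer(disp, bytes_ahead, bytes_behind):
    """Return True if a relative jump of `disp` bytes lands on valid buffered data.

    disp         -- signed relative displacement of the jump
    bytes_ahead  -- first relative offset value (valid bytes ahead of the read pointer)
    bytes_behind -- second relative offset value (valid bytes behind the read pointer)
    """
    if disp >= 0:
        # Forward jump: compare against the bytes ahead (zero handled here by assumption).
        return disp <= bytes_ahead
    # Backward jump: compare the magnitude against the bytes behind.
    return -disp <= bytes_behind


forward_hit = target_in_buffer(10, bytes_ahead=12, bytes_behind=5)
forward_miss = target_in_buffer(13, bytes_ahead=12, bytes_behind=5)
backward_hit = target_in_buffer(-5, bytes_ahead=12, bytes_behind=5)
backward_miss = target_in_buffer(-6, bytes_ahead=12, bytes_behind=5)
```

On a hit the flush and re-load are inhibited and the read pointer is moved to the target; on a miss the buffer is flushed and a fetch is issued.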
  • the compare circuit is temporarily disabled following a write to memory, to thereby ensure that the following jump instruction will cause a code-fetch to be performed. This ensures that an unmodified version of a modified instruction will not be executed from the instruction buffer.
  • Fig. 1 is a simplified block diagram that illustrates an exemplary embodiment of a pipelined microprocessor. The microprocessor shown will be used to describe an instruction buffer circuit in accordance with the present invention.
  • Fig. 2 is a block diagram of one embodiment of an instruction buffer circuit in accordance with the present invention.
  • Fig. 3 is a block diagram of a circuit for generating a BYTES AHEAD relative offset value and a BYTES BEHIND relative offset value for the circuit of Fig. 2.
  • Fig. 4 is a block diagram of a second embodiment of an instruction buffer circuit in accordance with the present invention.
  • Fig. 5 is a block diagram of a circuit for generating a DWORDS AHEAD relative offset value and a DWORDS BEHIND relative offset value for the circuit of Fig. 4.
  • Fig. 1 is a high-level block diagram of a pipelined microprocessor 100 that is connected to an external memory 170.
  • the microprocessor 100 shown is an exemplary embodiment of a microprocessor to which the present invention may be applied, and will be used to describe a preferred embodiment of an instruction buffer circuit in accordance with the present invention. It should be understood that the present invention is equally applicable to microprocessors other than the one that will be described herein. Specific widths of busses of the microprocessor 100 are indicated in Fig. 1 where helpful to understanding the preferred embodiment of the instruction buffer circuit that will be described.
  • the microprocessor 100 includes an execution/addressing unit 110, an instruction control unit (ICU) 120, and a cache/bus unit 130.
  • the ICU 120 has an instruction buffer 124, a 32-bit advanced instruction pointer (AIP) register 122, an instruction decode circuit 126, and a fetch unit 128.
  • the AIP register 122 is connected to the instruction buffer 124 by a bus 123.
  • the instruction buffer 124 is connected to the instruction decode circuit 126 by a bus 125.
  • the cache/bus unit 130 has a cache memory (cache) 132.
  • the execution/addressing unit 110 is connected to the ICU 120 by a jump-address (JMP ADDR) bus 140, a 32-bit immediate operand bus 142, a displacement (DISP) bus 144, and a micro-instruction (μ-INSTRUCTION) bus 146.
  • the execution/addressing unit 110 is connected to the bus/cache unit 130 by a data bus 148 and a 32-bit address bus 152.
  • the ICU 120 is connected to the bus/cache unit 130 by the microinstruction bus 146, a 64-bit code data bus 150, and two data available (DAV) lines 151.
  • the bus/cache unit 130 is connected to an external memory 170 by an address/control (ADDR/CNTRL) bus 160 and a data bus 162.
  • On memory access command cycles, the execution/addressing unit 110 generates an address for performing either a code-fetch (i.e., a read of instruction data) or an operand access (i.e., a read or write of operand data).
  • the address generated by the execution/addressing unit 110 is provided to the bus/cache unit 130 on the 32-bit address bus 152.
  • the ICU 120 provides a corresponding memory access command (i.e., a fetch request or an operand access request) to the bus/cache unit 130 on the microinstruction bus 146.
  • the memory access command is in the form of a microinstruction field that specifies the access type (i.e., code-fetch, operand read, operand write), and certain parameters for performing the access.
  • the bus/cache unit 130 performs the memory access command by accessing the cache 132 and/or performing an access on the external busses 160, 162. For operand accesses, operand data is passed between the bus/cache unit 130 and the execution/addressing unit 110 on the data bus 148. For code-fetches (hereinafter “fetches”), the bus/cache unit 130 returns the requested instruction data on the code data bus 150.
  • the bus/cache unit 130 places either 4 bytes or 8 bytes of code data on the code data bus 150.
  • the data available (DAV) lines 151 indicate on each clock cycle whether 0, 4 or 8 bytes of code data are being transferred, with a high value on DAV[1] indicating a valid 4-byte value on the upper 32 bits of the code data bus 150, and high value on DAV[0] indicating a valid 4-byte value on the lower 32 bits of the code data bus 150.
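As a rough illustration of the DAV encoding, the helper below picks the valid 4-byte halves out of a 64-bit code data value. The function name `valid_dwords` and the choice to return the lower half first are assumptions made for this sketch; the text only states which DAV line qualifies which half of the bus.

```python
def valid_dwords(code_bus, dav):
    """Return the valid 32-bit halves of a 64-bit code data value.

    code_bus -- 64-bit integer as seen on the code data bus
    dav      -- 2-bit value; bit 1 is DAV[1] (upper half), bit 0 is DAV[0] (lower half)
    """
    lower = code_bus & 0xFFFFFFFF
    upper = (code_bus >> 32) & 0xFFFFFFFF
    halves = []
    if dav & 0b01:      # DAV[0] high: lower 32 bits are valid
        halves.append(lower)
    if dav & 0b10:      # DAV[1] high: upper 32 bits are valid
        halves.append(upper)
    return halves


bus_value = 0x1111111122222222
both = valid_dwords(bus_value, 0b11)        # 8 bytes transferred
upper_only = valid_dwords(bus_value, 0b10)  # 4 bytes transferred
none = valid_dwords(bus_value, 0b00)        # 0 bytes transferred
```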
  • Code data returned on the bus 150 is placed in the instruction buffer 124.
  • the instruction buffer 124 holds the code data until it is either overwritten or the instruction buffer 124 is flushed.
  • the least significant five bits of the AIP (advanced instruction pointer) 122 are used as a read pointer for reading instruction data from the instruction buffer 124, as will be described in detail with reference to Fig. 2. Instructions referenced by the AIP 122 are read from the instruction buffer 124 and are passed to the instruction decode circuit 126.
  • the instruction decode circuit 126 decodes individual instructions and generates microinstructions on the bus 146.
  • the microprocessor 100 has a variable-length instruction format, which includes an opcode of either 1 or 2 bytes, an immediate operand data field of 0, 1, 2 or 4 bytes, and a displacement field of 0, 1, 2 or 4 bytes.
  • Circuitry (Fig. 2) of the ICU 120 extracts any immediate operand data and displacement data included within each instruction. Immediate operand data is passed to the execution/addressing unit 110 on the bus 142. Displacement data is passed to the execution/addressing unit 110 on the displacement (DISP) bus 144. The displacement values are used by the execution/addressing unit 110 to generate addresses on the bus 152.
  • Branch short relative instructions are characterized by a one-byte opcode and a one-byte displacement field (DISP[7:0]).
  • This displacement field specifies a relative displacement value that can range from -128 to +127 (with DISP[7] acting as a sign bit and negative numbers being represented in standard two's complement format).
  • This relative displacement value specifies the target branch address relative to the next sequential instruction.
  • a jump (JMP) short relative instruction with DISP[7:0] = 15 (decimal) specifies a jump forward in memory by 15 byte locations relative to the next sequential instruction.
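The displacement arithmetic can be worked through in code. The sketch below sign-extends the one-byte displacement field and adds it to the address of the next sequential instruction; the function names are hypothetical, and the wrap to 32 bits is an assumption based on the 32-bit address bus described above.

```python
def sign_extend8(disp_byte):
    """Interpret DISP[7:0] as a two's complement value in the range -128..+127."""
    return disp_byte - 0x100 if disp_byte & 0x80 else disp_byte


def short_branch_target(next_instr_addr, disp_byte):
    """Target branch address, relative to the next sequential instruction."""
    return (next_instr_addr + sign_extend8(disp_byte)) & 0xFFFFFFFF


forward = short_branch_target(0x00001000, 0x0F)   # DISP = +15
backward = short_branch_target(0x00001000, 0xF0)  # DISP = -16
```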
  • a variety of branch-short-relative-type instructions are included within the instruction set of the microprocessor 100.
  • the microprocessor 100 has a JMP short relative absolute instruction and a jump short relative conditional instruction.
  • the microprocessor 100 also has a loop short relative instruction that causes the microprocessor 100 to decrement a count and then perform a short jump if certain conditions are met.
  • the branch short relative instruction is the only type of branch instruction that can potentially reference a target instruction that is in the instruction buffer 124.
  • the fetch unit 128 controls the generation of fetch requests. Whenever a fetch request is generated, the ICU 120 provides a displacement value on the bus 144. The execution/addressing unit 110 uses this displacement value to generate a fetch address on the bus 152.
  • the fetch unit 128 generates two types of fetch requests.
  • the first type is a pre-fetch request, wherein the next sequential 16-byte line of code in the memory 170 (relative to the line of code currently being executed) is requested.
  • the second type is a branch fetch request.
  • a branch fetch request may be generated if the microprocessor 100 takes a branch (as the result of a call, interrupt, conditional jump instruction, unconditional jump instruction, loop instruction, etc.) that requires execution to begin at a new program address.
  • the bus/cache unit 130 is designed to perform all pre-fetches as 16-byte aligned reads from the memory 170 (or the cache 132). Thus, for example, if the execution/addressing unit 110 issues a pre-fetch address of 00005553 (hex) on the bus 152, the bus/cache unit 130 will return the 16 instruction bytes from 00005550 (hex) to 0000555F (hex) in the memory 170.
  • the bus/cache unit 130 is designed to perform branch fetches in a slightly different manner. If the branch fetch address (i.e., the target branch address) falls in the first doubleword (i.e. four bytes) of a 16-byte line, the bus/cache unit 130 returns the entire 16-byte line.
  • If the branch fetch address falls in the second doubleword of a 16-byte line, the bus/cache unit 130 returns the second, third and fourth doublewords of the line. If the branch fetch address falls in the third doubleword of a 16-byte line, the bus/cache unit 130 returns the third and fourth doublewords of the line. And if the branch fetch address falls in the fourth doubleword of a 16-byte line, the bus/cache unit 130 returns only the fourth doubleword of the line. As will be described in detail, the present invention inhibits the generation of a branch fetch request on a branch short relative instruction if the target instruction can be read from the instruction buffer 124.
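The two fetch alignment rules can be expressed as address arithmetic. The following is a minimal sketch with hypothetical function names; it returns the first and last byte address of the data that would be returned, and it says nothing about how the bus/cache unit 130 actually sequences the transfer.

```python
LINE_BYTES = 16   # size of one line of code
DWORD_BYTES = 4   # size of one doubleword


def prefetch_range(addr):
    """Pre-fetch: always the whole 16-byte aligned line containing addr."""
    start = addr & ~(LINE_BYTES - 1)
    return start, start + LINE_BYTES - 1


def branch_fetch_range(addr):
    """Branch fetch: from the doubleword containing addr to the end of the line."""
    line_start = addr & ~(LINE_BYTES - 1)
    start = addr & ~(DWORD_BYTES - 1)
    return start, line_start + LINE_BYTES - 1


pre = prefetch_range(0x00005553)
first_dword = branch_fetch_range(0x00005553)   # whole line
second_dword = branch_fetch_range(0x00005556)  # doublewords 2, 3 and 4
fourth_dword = branch_fetch_range(0x0000555D)  # doubleword 4 only
```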
  • prior art microprocessors flush the instruction buffer and generate a branch fetch request whenever a program branch is taken, even if the target instruction is in the instruction buffer
  • the present invention solves this problem by including a mechanism to inhibit the generation of an instruction buffer flush and a corresponding branch fetch request when a target instruction can be read from the instruction buffer 124.
  • the microprocessor 100 handles relative branch instructions as follows. During
  • the instruction decode circuit 126 determines whether a branch will be taken as a result of the branch instruction. If the decode circuit 126 determines that a branch will be taken, the AIP 122 is incremented or decremented by adding the displacement (DISP) value for the instruction to the AIP 122. The five least significant bits of the AIP register 122 (which serve as the read pointer) are thereby "bumped" to point to the instruction buffer 124 location from which the target
  • a comparison circuit determines whether the target instruction can be read from the instruction buffer 124 on the following clock cycle. If so, a "hit" signal is generated which inhibits the generation of a flush and corresponding branch fetch. The target instruction is then read from the instruction buffer 124 on the following clock cycle, and execution
  • the ICU 120 flushes the instruction buffer 124
  • the ICU 120 also provides a displacement value to the execution/addressing unit 110 on the displacement bus 144.
  • the execution/addressing unit 110 uses the displacement value to calculate the branch fetch address for performing the branch fetch.
  • the bus/cache unit 130 uses the branch fetch address to perform a fetch to re-load the instruction buffer 124. Once the instruction buffer 124 is re-loaded with the target instruction, execution resumes.
  • the branch address is read from the memory 170 (or cache 132).
  • "Return” and "jump indirect” are examples of instructions that require such a memory access.
  • the execution/addressing unit 110 provides the jump address to the ICU 120 on the jump address (JMP ADDR) bus 140, and the ICU 120 loads the jump address into the AIP 122.
  • a flush and corresponding branch fetch request are always generated for this type of branch instruction.
  • a flush and branch fetch are also generated whenever an absolute jump instruction is executed.
  • the ability of the instruction buffer circuit to effect a jump within the instruction buffer 124 is temporarily disabled following a memory write operation and until the next flush of the instruction buffer 124 occurs. This ensures that the instruction buffer 124 will be flushed and re-loaded upon the first branch that follows a memory write.
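The disable-after-write rule behaves like a one-bit latch. The sketch below models it as a boolean flag; the class name `BufferJumpEnable`, its method names, and the initial enabled state are assumptions made for illustration only.

```python
class BufferJumpEnable:
    """One-bit state: may a short branch be satisfied from the instruction buffer?"""

    def __init__(self):
        self.enabled = True  # assumed starting state, not stated in the text

    def on_memory_write(self):
        # A write may have modified code that is already buffered.
        self.enabled = False

    def on_flush(self):
        # After a flush the buffer is re-loaded, so its contents are current again.
        self.enabled = True

    def allow_hit(self, raw_hit):
        # A compare hit only counts while in-buffer jumps are enabled.
        return raw_hit and self.enabled


latch = BufferJumpEnable()
before_write = latch.allow_hit(True)
latch.on_memory_write()
after_write = latch.allow_hit(True)   # forces a flush and branch fetch instead
latch.on_flush()
after_flush = latch.allow_hit(True)
```

The cost of this conservative choice is one extra re-load after any memory write, in exchange for never executing a stale copy of a modified instruction.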
  • Figs. 2 and 3 illustrate a preferred embodiment of the instruction buffer circuit. The circuit shown in Fig. 2 will initially be described.
  • the instruction buffer 124 is in the form of a 32-byte addressable memory array arranged as four lines of eight bytes each. As will be described with reference to Fig. 4, the instruction buffer 124 may alternatively be in the form of a plurality of registers sequentially connected in parallel.
  • the code data bus 150 is connected as a data input to the instruction buffer 124.
  • the output of a 5-bit write pointer (WR PTR) register 200 is connected as a first address input to the instruction buffer 124 by a three-bit bus 201.
  • the three lines of the bus 201 correspond to WR PTR[4:2] (i.e., the three most significant bits of the write pointer 200).
  • a bus 123 is connected to the output lines of the AIP register 122.
  • the bit lines AIP[4:3] on the bus 123 are connected as a second address input to the instruction buffer 124.
  • the data output of the instruction buffer 124 is connected to a byte barrel shifter 202 by a 64-bit bus 204.
  • the byte barrel shifter 202 rotates the data appearing on the bus 204 to the right by N bytes, where N may range from 0 to 7.
  • the number N is specified by the bit lines AIP[2:0] of the bus 123.
  • the byte barrel shifter 202 can be implemented using a combination of multiplexers.
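A software equivalent of the byte rotation may help make the alignment concrete. The sketch below rotates a 64-bit value right by N bytes; the function name is hypothetical, and the convention that byte 0 is the least significant byte of the integer is an assumption of this model.

```python
MASK64 = (1 << 64) - 1


def rotate_right_bytes(line, n):
    """Rotate a 64-bit value right by n bytes (n taken from AIP[2:0], so 0..7)."""
    shift = 8 * (n & 0b111)
    return ((line >> shift) | (line << (64 - shift))) & MASK64


# Byte i of this test line holds the value i (byte 0 is least significant).
line = 0x0706050403020100
rotated = rotate_right_bytes(line, 3)
unchanged = rotate_right_bytes(line, 0)
```

After rotating by 3, the byte that was at position 3 sits in the least significant position, and the bytes that were shifted out reappear at the top.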
  • the output of the byte barrel shifter 202 is connected by the 64-bit bus 125 to a displacement (DISP) register 206, an immediate (IMMED) register 208, and the instruction decode circuit 126.
  • the outputs of the DISP register 206 and the IMMED register 208 are connected to the execution/addressing unit 110 (Fig. 1) by the busses 144 and 142 respectively.
  • the instruction decode circuit 126 has four outputs that are shown.
  • the first output is the microinstruction bus 146 of Fig. 1.
  • the second output is a branch short (BRSHRT) signal line 212 which becomes active during a decode cycle for a branch short relative instruction if a branch will be taken.
  • the third output is a branch (BR) output line 213 that goes high whenever a branch occurs as the result of something other than a branch short relative instruction. For example, the BR line 213 will go high upon the decode of a branch that results from an absolute jump instruction, a call instruction, a return instruction, or a relative jump instruction that uses a displacement value of two or more bytes.
  • the BR line 213 thus goes
  • the fourth output is a BYTES USED bus 210, that indicates the number of instruction bytes used during the current clock cycle.
  • the BYTES USED bus 210 is connected as a first data input to a multiplexer 216.
  • the displacement (DISP) bus 144 is connected as a second data input to the multiplexer 216.
  • the multiplexer 216 has a control input that is connected to the output of an OR gate 217 by a line 218.
  • the OR gate 217 has a first input connected to the BRSHRT line 212 and a second input connected to the BR line 213.
  • the output of the multiplexer 216 is connected as a first input to a binary adder 220 by an 8-bit bus 222.
  • the bus 123 is connected as the second input to the adder 220.
  • the output of the adder 220 is connected as an input to a register 225 by a bus 224.
  • the output of the register 225 is connected as a first data input to a multiplexer 226.
  • the jump address (JMP ADDR) bus 140 (Fig. 1) is connected as a second data input to the multiplexer 226.
  • the multiplexer 226 has a control input that is connected to a jump address (JA) signal line 227 that becomes active when the execution/addressing unit 110 provides a jump address to the ICU 120 on the JMP ADDR bus 140.
  • the output of the multiplexer 226 is connected as a data input to the AIP register 122 by a bus 229.
  • the lines DISP[7:0] of the bus 144 are connected as a first data input to a comparator 240.
  • the second data input to the comparator 240 is connected to a BYTES BEHIND circuit (Fig. 3) by a 5-bit bus 242.
  • the BYTES BEHIND circuit provides a relative offset value BYTES BEHIND_{T+1} on the bus 242.
  • the value BYTES BEHIND is equal to the number of valid instruction bytes in the instruction buffer 124 that currently fall behind the instruction byte referenced
  • the value BYTES BEHIND_{T+1} is thus equal to the number of valid bytes that will exist behind AIP[4:0] on the following clock cycle T+1.
  • the comparator 240 has an enable input that is connected to the BRSHRT line 212.
  • the lines DISP[7:0] of the bus 144 are also connected as a first data input to a comparator 250.
  • the second data input to the comparator 250 is connected to a BYTES AHEAD circuit (Fig. 3) by a 5-bit bus 252.
  • the bus 252 specifies a value BYTES AHEAD_{T+1}, which is equal to the number of valid instruction bytes that will fall ahead of AIP[4:0] in the instruction buffer 124 on the following clock cycle T+1.
  • the comparator 250 has an enable input that is connected to the BRSHRT line 212.
  • the output of the comparator 240 is connected as a first input to an OR gate 260 by a HITA signal line 246.
  • the output of the comparator 250 is connected as a second input to the OR gate 260 by a HITB signal line 256.
  • the output of the OR gate 260 is connected as a first input to a NAND gate 270 by a line 272.
  • the second input to the NAND gate 270 is connected to the output (Q) of an R-S flip-flop 274 by a line 276.
  • the set (S) input of the R-S flip-flop 274 is connected to the output of an OR gate 291 by a flush/branch-fetch (FLSH/BR-FETCH) signal line 278.
  • the reset (R) input of the flip-flop 274 is connected to a MEM WRITE signal line 279 that goes high whenever a write to the memory 170 (Fig. 1) and/or the cache 132 occurs.
  • the output of the NAND gate 270 is connected as a first input to an AND gate 290 by a HIT signal line 288.
  • the BRSHRT signal line 212 is connected as the second input to the AND gate 290.
  • the output of the AND gate 290 is connected as a first input to an OR gate 294 by a line 292.
  • the FLSH/BR-FETCH signal line 278 is connected to the fetch unit 128 (Fig. 1).
  • the FLSH/BR-FETCH signal line 278 is also connected to the circuit shown in Fig. 3.
  • the FLSH/BR-FETCH signal line 278 is also connected as an input to a flip-flop 298.
  • the output of the flip-flop 298 is connected to a BROUT signal line 299.
  • code data is provided to the instruction buffer 124 on the code data bus 150 either four bytes at-a-time or eight bytes at-a-time. Thus, on a given clock cycle, 0, 4 or 8 bytes may be loaded into the instruction buffer 124. Instructions fetched from byte locations in the memory 170 are loaded into corresponding byte locations of the instruction buffer 124, with the five least significant bits of the memory 170 address specifying the
  • Instructions are loaded with the opcode byte (or bytes) falling at the lowest byte address in the instruction buffer 124. For example, a three-byte instruction loaded into the BYTE0, BYTE1 and BYTE2 locations in LINE0 will have an opcode at the BYTE0 location of LINE0. Instructions can fall on any byte boundary in the instruction buffer 124, and can fall across one or more of the 8-byte lines LINE0-LINE3.
  • the write pointer 200 is a 5-bit register that provides a write address for loading the instruction buffer 124. Loads are performed using WR PTR[4:2] (i.e., write pointer bits 4, 3 and 2) as the write address. All loads of the instruction buffer 124 are thus performed on four-byte boundaries (i.e., all loads start at either BYTE0 or BYTE4 of one of the four lines LINE0-LINE3). As loads are performed, the write pointer 200 is incremented by the number of bytes being loaded (either four or eight), as indicated by the DAV lines 151 (Fig. 1). The write pointer 200 automatically loops back to LINE0 when the 5-bit value is incremented beyond its maximum value of 31 (decimal).
  • Data written to the instruction buffer 124 may be read out until either the data is overwritten or the instruction buffer 124 is flushed. If a branch occurs that causes the instruction buffer 124 to be flushed, the write pointer 200 is automatically loaded with the five least significant bits of the target branch address.
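The write pointer behavior can be modeled with 5-bit modular arithmetic. This is a sketch under stated assumptions: the class name `WritePointer` and its method names are hypothetical, and the incrementing and loading circuitry itself is not shown in the figures.

```python
class WritePointer:
    """5-bit write pointer; bits [4:2] select one of eight 4-byte slots in the buffer."""

    def __init__(self, value=0):
        self.value = value & 0x1F

    def write_address(self):
        return (self.value >> 2) & 0b111  # WR PTR[4:2]

    def advance(self, nbytes):
        if nbytes not in (0, 4, 8):
            raise ValueError("loads transfer 0, 4 or 8 bytes per clock cycle")
        self.value = (self.value + nbytes) & 0x1F  # wraps past 31 back to LINE0

    def reload(self, target_branch_address):
        # After a flush: take the five least significant bits of the target address.
        self.value = target_branch_address & 0x1F


wp = WritePointer(24)          # BYTE0 of LINE3
slot_before = wp.write_address()
wp.advance(8)                  # load a full 8-byte line, pointer wraps
slot_after = wp.write_address()
wp.reload(0x00005556)          # flush caused by a branch to this address
slot_reloaded = wp.write_address()
```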
  • the circuitry for incrementing and loading the write pointer is omitted to simplify the figure.
  • the five least significant bits of the 32-bit AIP register 122 are used as a read pointer for reading instructions from the instruction buffer 124. All reads are performed as 8-byte aligned accesses, with the bits AIP[4:3] used to address one of the four 8-byte lines LINE0-LINE3. Whenever a read is performed, the 8 bytes of instruction data read from the addressed line are passed through the byte barrel shifter 202. The byte barrel shifter 202 rotates the 8 bytes of instruction data such that the opcode (if any) of the next instruction to be executed falls in the right-most (i.e., least significant) byte position on the bus 125.
  • the byte barrel shifter 202 thereby aligns the instructions so that the opcode, immediate, and displacement fields can be extracted. Displacement and immediate fields, if any, are loaded into the registers 206 and 208 respectively. In the preferred embodiment, 32-bit shifters (not shown) are also used to align the displacement and immediate values before the values are loaded into the registers 206, 208. Opcodes (and other fields that require decoding) are passed to the instruction decode circuit 126.
  • On every clock cycle the instruction decode circuit 126 generates a BYTES USED value on the bus 210.
  • the BYTES USED value may range from 0 to 8, and indicates the number of bytes of the 8-byte line that are used during the current clock cycle. For example, if the next instruction to be executed is five bytes long, and the first two instruction bytes fall in the line currently referenced by AIP[4:3], BYTES USED will be 2 for the current clock cycle.
  • the multiplexer control line 218 is low, causing the multiplexer 216 to select the BYTES USED bus 210.
  • the BYTES USED value selected by the multiplexer 216 is added to the current AIP value on the bus 123 by the adder 220.
  • the output of the adder 220 is clocked into the register 225 (clock not shown).
  • AIP_{T+1} = AIP_T + BYTES USED_T. The instruction buffer 124 acts as a first-in-first-out (FIFO) buffer (i.e., a queue) during sequential program execution.
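Sequential advancement of the read pointer can be traced with a short example. The helpers below are hypothetical names for this sketch: one adds BYTES USED to the AIP, and the other splits the five least significant bits into the line select (AIP[4:3]) and the rotate amount (AIP[2:0]). Wrapping the AIP to 32 bits is an assumption based on the register width.

```python
def next_aip(aip, bytes_used):
    """Sequential case: the next AIP is the current AIP plus BYTES USED."""
    return (aip + bytes_used) & 0xFFFFFFFF


def read_pointer_fields(aip):
    """Split AIP[4:0] into (line index AIP[4:3], byte rotate amount AIP[2:0])."""
    read_ptr = aip & 0x1F
    return read_ptr >> 3, read_ptr & 0b111


aip = 0x0000101E
fields_now = read_pointer_fields(aip)    # BYTE6 of LINE3
aip = next_aip(aip, 2)                   # two bytes consumed this cycle
fields_next = read_pointer_fields(aip)   # read pointer has wrapped to BYTE0 of LINE0
```

The full 32-bit AIP keeps counting upward while its low five bits wrap, which is what lets the same register serve as both a program address and a buffer read pointer.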
  • FIFO first-i ⁇ -first-out
  • When a branch instruction that causes a branch is decoded, the instruction decode circuit 126 asserts either the BRSHRT signal line 212 or the BR signal line 213, depending upon the type of branch instruction decoded. If the branch instruction is a relative branch instruction, the DISP bus 144 specifies the relative displacement value for performing the jump. If the branch instruction is an absolute jump
  • the DISP bus 144 specifies the absolute target branch address to which the jump will be performed.
  • the high value on either the BRSHRT line 212 or the BR line 213 causes the multiplexer select line 218 to go high, thereby causing the multiplexer 216 to select the DISP bus 144. If the branch instruction is a relative (long or short) branch instruction, the adder 220 adds the displacement value on the DISP bus
  • the adder 220 passes the displacement value appearing on the bus 222 through to its output without adding it to the current AIP value on the bus 123 (circuitry for implementing the pass-through function of the adder 220 not shown). In either case, the output of the adder 220 is clocked into the register 225. On the following clock cycle the multiplexer 226 selects either the output of the register 225 or the value (if any) on the JMP ADDR bus 140.
  • the jump address (JA) line 227 will be high, causing the multiplexer 226 to select the JMP ADDR bus 140 as the source of the next AIP value. Otherwise, the multiplexer 226 will select the output of the register 225 as the next AIP value.
  • the five least significant bits of the value loaded into the AIP register 122 following a branch specify the instruction buffer 124 address where the target instruction will be read from. If the branch is to an instruction not currently in the instruction buffer 124, the instruction buffer 124 must be flushed and re-loaded before the target instruction can be read from the instruction buffer and executed. If the branch is a relative short branch to an instruction that can be read from the instruction buffer 124 on the following
  • the target instruction is or will be in the instruction buffer 124, and the internal jump mechanism is currently enabled
  • the 8-byte line containing the target instruction is read from the instruction buffer 124 on the following clock cycle and the target instruction is executed.
  • a comparison circuit (comprising the comparators 240 and 250 and the OR gate 260) is used to determine whether or not the target instruction can be read from the instruction buffer 124.
  • The comparators 240 and 250 compare DISP[7:0] to the BYTES BEHIND and BYTES AHEAD values respectively.
  • the comparators 240 and 250 compare DISP[7:0] to BYTES BEHIND T+1 and BYTES AHEAD T+1. The comparisons thus take into account any code data that is loaded into the instruction buffer 124 during the current clock cycle (which may include the target instruction, or may overwrite the target instruction in the instruction buffer 124).
  • the comparator 240 generates a signal HITA on the line 246 according to the following logic equation:
  • HITA = BRSHRT and DISP[7] and (|DISP[7:0]| ≤ BYTES BEHIND_{T+1})
  • the HITA line 246 becomes active if a branch short relative instruction is decoded that has a negative relative displacement value that is less than or equal (in magnitude) to BYTES BEHIND T+1. Since BYTES BEHIND T+1 is equal to the number of valid bytes that will be in the instruction buffer 124 on the following cycle that will fall behind the address AIP[4:0] T+1, the HITA signal line 246 will go high if a backward branch is taken to a target instruction that can be read from the instruction buffer 124.
  • the HITB line 256 becomes active if a branch short relative instruction is decoded that has a positive relative displacement value that is less than or equal to BYTES AHEAD T+1. Since BYTES AHEAD T+1 is equal to the number of valid bytes that will be in the instruction buffer 124 on the following cycle that will fall ahead of the address AIP[4:0] T+1, the HITB signal line 256 will go high if a forward branch is taken to a target instruction that can be read from the instruction buffer 124.
  • the HIT signal 288 will go low if either the HITA or the HITB signal line goes high.
  • the HIT signal line 288 goes low only if a relative short jump is taken to an instruction that can be read from the instruction buffer 124.
  • the output of the AND gate 290 on the line 292 goes high on clock cycles for which a jump short relative instruction that causes a branch is decoded and no hit occurs.
  • the FLSH/BR-FETCH signal line 278 thus goes high if either a non-branch-short-relative branch is decoded, or a branch short relative branch is decoded to a target instruction that cannot be read from the instruction buffer 124.
  • the FLSH/BR-FETCH signal line 278 goes high whenever a branch is taken to a target instruction that cannot be read from the instruction buffer 124.
  • the FLSH/BR-FETCH signal line 278 is connected to the fetch unit 128 (Fig. 1), and initiates a branch fetch to re-load the instruction buffer 124 when high.
  • a high value on the FLSH/BR-FETCH signal line 278 also causes the BYTES AHEAD and BYTES BEHIND offset values to be reset, as discussed below with reference to Fig. 3.
  • a high value on the FLSH/BR-FETCH signal line 278 also generates a flush of the instruction buffer 124.
  • the FLSH/BR-FETCH signal on the line 278 is delayed by one clock cycle by the flip-flop 298 (clock not shown) to produce the branch outside (BROUT) signal on the line 299.
  • the BROUT signal line thus goes high on the execution cycle for any branch instruction that causes a branch outside the instruction buffer 124.
  • the BROUT signal is used for generating the BYTES AHEAD and BYTES BEHIND relative offsets.
  • This circuit has the purpose of preventing branches within the instruction buffer 124 if a memory write has been performed and a flush of the instruction buffer 124 has not been performed since the memory write.
  • the circuit thereby allows the microprocessor 100 to implement code modification, wherein writes are performed to the memory 170 (Fig. 1) to modify individual instructions.
  • the MEM WRITE signal line 279 goes high, causing the output line 276 of the R-S flip-flop to go low.
  • the low value on the line 276 masks any HIT signals appearing on the line 272, thereby preventing a jump within the instruction buffer 124.
  • the first jump instruction to follow the memory write causes the FLSH/BR-FETCH line 278 to go high, even if the target instruction is in the instruction buffer 124.
  • the high value on the FLSH/BR-FETCH line 278 causes a flush of the instruction buffer 124, and causes a branch fetch to be generated.
  • the high level on the FLSH/BR-FETCH line 278 also causes the output of the R-S flip flop 274 to go high, to thereby re-enable jumps within the instruction buffer 124.
  • the circuit comprising the R-S flip-flop 274 and the NAND gate 270 is desirable only if the microprocessor to which the present invention is applied supports code modification. It will further be apparent that the MEM WRITE signal line 279 can be appropriately qualified for certain microprocessor designs to reset the flip-flop 274 only upon certain types of write operations that can affect code. For example, since stack operations are not normally used as a means for modifying code, the MEM WRITE signal line 279 can be qualified such that it only becomes active for non-stack memory writes.
  6. Circuit for Generating Relative Offset Values
  • the BYTES AHEAD T+1 value is generated as the output of a 3-input adder 300.
  • the output lines 252 of the adder 300 are connected as an input to a register 302.
  • One input of the adder 300 is connected to the output of a multiplexer 304 by a bus 306.
  • the multiplexer 304 has a control input that is connected to the branch outside (BROUT) signal line 299.
  • the multiplexer 304 has a first data input that is connected to a two's complement circuit 310 by a bus 312.
  • the two's complement circuit 310 has an input that is connected to a 5-bit bus 314.
  • Bits 3 and 4 of the bus 314 are tied to zero (i.e., tied low). Bits[2:0] of the bus 314 are connected to bits[2:0] of the bus 229 (Fig. 2), which represent AIP[2:0] T+1 (i.e., the three least significant bits of the next AIP value).
  • the multiplexer 304 has a second data input that is connected to the output of the register 302 by the bus 313.
  • a second input to the adder 300 is connected to the output of a multiplexer 318 by a bus 320.
  • the multiplexer 318 has a control input that is connected to the data available lines DAV[0:1] (Fig. 1).
  • the multiplexer 318 has a first data input of zero on a bus 322 (i.e., all lines of the bus 322 are tied to zero).
  • the multiplexer 318 has a second data input of "4" on a bus 324 (i.e., bits[3:0] = 0100 binary on the bus 324).
  • the multiplexer 318 has a third data input of "8" on a bus 326 (i.e., bits[3:0] = 1000 binary on the bus 326).
  • a third input of the adder 300 is connected to the output of a two's complement circuit 330 by a bus 332.
  • the input of the two's complement circuit 330 is connected to the output of a multiplexer 334 by a bus 336.
  • the multiplexer 334 has a first data input of zero on a bus 328.
  • the multiplexer 334 has a second data input that is connected to the BYTES USED bus 210 (Fig. 2).
  • the multiplexer 334 has a third data input that is connected to a DISP[7:0] T-1 bus 337, which provides a value equal to DISP[7:0] (on the bus 144 of Figs. 1 and 2) from the previous clock cycle T-1 (circuitry for delaying DISP[7:0] not shown).
  • the multiplexer 334 has a first control input that is connected to a HIT T-1 signal line 335, which is generated by delaying the HIT signal line 288 (Fig. 2) by one clock cycle (circuitry for delaying the HIT signal not shown).
  • the multiplexer 334 has a second control input connected to the BROUT signal line 299 (Fig. 2).
  • the BYTES BEHIND T+1 value is generated as the output of a 3-input adder 340.
  • the output lines 242 of the adder 340 are connected as an input to a register 342.
  • One input of the adder 340 is connected to the output of the multiplexer 334 by the bus 336.
  • a second input of the adder 340 is connected to the output of a two's complement circuit 346 by a bus 348.
  • the input of the two's complement circuit is connected to the output of a multiplexer 350 by a bus 352.
  • the multiplexer 350 has a first data input of zero on a bus 354, and a second data input of "16" (i.e., 16 decimal) on a bus 356.
  • the multiplexer 350 has a control input that is connected to a PRE FETCH signal line 358 that goes high for one clock cycle when a pre-fetch is initiated by the bus/cache unit 130 (Fig. 1).
  • a third input of the adder 340 is connected to the output of a multiplexer 362 by a bus 364.
  • the multiplexer 362 has a first data input that is connected to a 5-bit bus 368. Bit[4] of the bus 368 is tied to zero (i.e., tied low), and bits[3:0] of the bus 368 are connected to the JMP ADDR[3:0] lines of the bus 140 (Figs. 1 and 2).
  • the multiplexer 362 has a second data input that is connected to the output of the register 342 by a bus 370.
  • the multiplexer 362 has a control input that is connected to the BROUT signal line 299.
  • the register 302 holds the value BYTES AHEAD T, which is the BYTES AHEAD value for the current clock cycle.
  • the output of the adder 300 is clocked into the register 302 (clock not shown) as the new BYTES AHEAD value.
  • the value on the bus 252 represents the BYTES AHEAD value for the following clock cycle.
  • the adder generates the BYTES AHEAD T+1 value by adding the three values on the busses 306, 320 and 332.
  • the multiplexer 304 selects the feedback path 313. As discussed above, the BROUT signal line 299 will become high only if a branch is taken to a target instruction that cannot be retrieved from the instruction buffer 124. Thus, during all other clock cycles (including execution cycles of branch short relative instructions that cause jumps within the instruction buffer 124), the current BYTES AHEAD value is added in as one of the three components for generating the next BYTES AHEAD value.
  • the multiplexer 304 selects the bus 312, which has a value that is the two's complement of the value 0,0,AIP[2:0] T+1 . The value AIP[2:0] T+1 is thereby subtracted from the values appearing on the busses 320 and 332.
  • the data available lines DAV[1:0] 151 specify the number of instruction bytes being loaded into the instruction buffer 124 on the current clock cycle.
  • the multiplexer 318 outputs a zero on the bus 320.
  • the multiplexer 318 outputs the value 4 on the bus 320.
  • When the DAV lines 151 indicate that eight instruction bytes are being loaded into the instruction buffer 124 (i.e., DAV[1:0] = 11 binary), the multiplexer 318 outputs the value 8 on the bus 320. Thus the value on the bus 320 indicates the number of bytes being loaded into the instruction buffer 124 on the current clock cycle.
  • the control lines BROUT 299 and HIT T-1 335 are used to select between the zero bus 328, the BYTES USED bus 210 and the DISP[7:0] T-1 bus 337 according to Table 1.
  • the output of the multiplexer 334 is passed through the two's complement circuit 330 to effect a subtraction of the value selected by the multiplexer 334.
  • signal line 335 is high (inactive), indicating that a hit for a branch short relative instruction did not occur
  • the multiplexer 334 selects the BYTES USED bus 210.
  • the BYTES AHEAD T+1 value is generated by subtracting the number of bytes used from the sum of the current BYTES AHEAD value plus the number of bytes being loaded into the instruction buffer 124.
  • the DISP[7:0] T-1 bus 337 is selected by the multiplexer 334.
  • the relative displacement DISP[7:0] for the branch short relative instruction being executed is subtracted from the sum of BYTES AHEAD T and the number of bytes being loaded into the instruction buffer 124 on the current clock cycle. If the relative displacement DISP[7:0] is positive (indicating a forward jump in the instruction buffer
  • the next BYTES AHEAD value decreases as a result of the jump, indicating that fewer instructions will be ahead of the read pointer AIP[4:0] following the jump. If the relative displacement DISP[7:0] is negative (indicating a backward jump in the instruction buffer 124), the next BYTES AHEAD value increases as a result of the jump, indicating that a greater number of instructions will be ahead of the read pointer AIP[4:0] following the jump.
  • the multiplexer 334 outputs a zero.
  • the multiplexer 304 outputs the two's complement of AIP[2:0] T+1.
  • the new BYTES AHEAD value is generated by subtracting AIP[2:0] T+1 from the number of instruction bytes being loaded into the instruction buffer 124.
  • the BYTES AHEAD value will initially be negative, and will become positive as the corresponding branch fetch is performed. Thus, for example, if a branch outside the instruction buffer 124 is taken to a jump address of xxxxxx02 (hexadecimal; x = "don't care"), and no instruction bytes are loaded during the current cycle, BYTES AHEAD will initially be -2. If 8 bytes of code from the branch fetch are loaded into the instruction buffer 124 on the following clock cycle, the BYTES AHEAD value will be incremented to 6, indicating that six instruction bytes are ahead of the new read pointer value of AIP[4:0] = 00010 (binary).
  • the BYTES BEHIND relative offset is similarly calculated as the summation of three components.
  • the multiplexer 362 selects the current BYTES BEHIND value on the bus 370.
  • the current BYTES BEHIND value is used as a first component of the next BYTES BEHIND value if no branch outside the instruction buffer 124 is currently being executed.
  • the multiplexer 362 selects the bus 368.
  • a second component of the addition is provided as the output of the multiplexer 334 on the bus
  • the value selected by the multiplexer 334 is routed to the adder 340 without being passed through the two's complement circuit 330.
  • BYTES AHEAD is decremented by BYTES USED
  • BYTES BEHIND is incremented by BYTES USED.
  • DISP[7:0] is added to BYTES BEHIND.
  • the third component of the addition is provided as the two's complement of the output of the multiplexer 350.
  • the multiplexer 350 selects the "16" bus 356.
  • 16 (decimal) is subtracted from the current BYTES BEHIND value whenever a pre-fetch is initiated, to indicate that 16 bytes of instruction data in the instruction buffer 124 will be overwritten during the subsequent clock cycles as the pre-fetch is performed.
  • when the PRE-FETCH line 358 is low, the zero bus 354 is selected, resulting in a zero on the bus 348.
  • When the BROUT signal line 299 is high, indicating a branch to a target instruction outside the instruction buffer 124, the PRE-FETCH line 358 will be low (i.e., pre-fetches are blocked when jumps outside the instruction buffer 124 occur). Thus, the next BYTES BEHIND value following the branch will be AIP[3:0] T+1.
  • the new BYTES BEHIND value will be 5, indicating that 5 of the 16 instruction bytes (in the 16-byte line to be fetched from xxxxxx10 to xxxxxx1F, hexadecimal) will fall behind the new read pointer value of AIP[4:0] = 15 (hexadecimal). Since this new BYTES BEHIND value represents the number of bytes that will fall behind the read pointer once the fetch data has been loaded into the instruction buffer 124, logic (not shown) is included to inhibit the BYTES BEHIND comparison until the last clock cycle of the branch fetch.
  • An alternative embodiment of the instruction buffer circuit of Fig. 2 will now be described with reference to Fig. 4.
  • the first variation involves the use of a register-based instruction buffer in place of the randomly-accessible-memory based instruction buffer 124 of Fig. 2.
  • the second variation, which can be made independently from the first variation, is the replacement of the BYTES AHEAD and BYTES BEHIND relative offset values with relative offset values that indicate the numbers of doublewords (i.e., four-byte values) ahead and behind the instruction currently referenced by the AIP.
  • like reference numbers will be used to refer to elements that are functionally similar to the elements shown in Fig. 2.
  • the instruction buffer 124 is shown as a register-based buffer comprising four 8-byte registers 124a, 124b, 124c and 124d.
  • the code data bus 150 is connected to the register 124a.
  • the register 124a is connected to the register 124b by a bus 400.
  • the register 124b is connected to the register 124c by a bus 402.
  • the register 124c is connected to the register 124d by a bus 404.
  • Each least significant byte of the registers 124a-124d is connected as an input to a first 4:1 multiplexer 408a.
  • the second byte of each of the registers 124a-124d is connected as an input to a second 4:1 multiplexer (not shown).
  • the third byte of each of the registers 124a-124d is connected as an input to a third 4:1 multiplexer (not shown).
  • the fourth byte of each of the registers 124a-124d is connected as an input to a fourth 4:1 multiplexer (not shown).
  • the fifth byte of each of the registers 124a-124d is connected as an input to a fifth 4:1 multiplexer (not shown).
  • the sixth byte of each of the registers 124a-124d is connected as an input to a sixth 4:1 multiplexer (not shown).
  • the seventh byte of each of the registers 124a-124d is connected as an input to a seventh 4:1 multiplexer (not shown).
  • the eighth byte of each of the registers 124a-124d is connected as an input to an eighth 4:1 multiplexer 408h.
  • the outputs of the eight 4:1 multiplexers (e.g., 408a and 408h) are each connected to 8 of the bit lines of the 64-bit bus 204.
  • the 64-bit bus is connected to the byte barrel shifter 202, as in Fig. 2.
  • the multiplexers (e.g., 408a and 408h and the other multiplexers, not shown) each have respective control inputs connected to a control logic circuit 409 by respective pairs of select lines (e.g., 410a, 410b and 410h).
  • the control logic circuit 409 has a first input that is connected to a 3-bit DWORDS AHEAD bus 412.
  • the DWORDS AHEAD bus 412 provides a DWORDS (doublewords) AHEAD T+1 value that indicates the number of doublewords that will be in the instruction buffer 124 on the following clock cycle that fall ahead of the doubleword selected by the multiplexers (e.g., 408a and 408h).
  • the control logic circuit 409 has a second input that is connected to the AIP[2:0] lines of the bus 123 (Fig. 2).
  • the control logic circuit 409 has a third input that is connected to the DISP[7:0] lines of the bus 144.
  • the comparator 240 now has a first input that is connected to the DISP[7:2] lines of the DISP bus
  • DWORDS BEHIND bus 416 that provides a DWORDS BEHIND T+1 relative offset value.
  • the DWORDS BEHIND value is equal to the number of doublewords in the instruction buffer 124 that are behind the doubleword selected by the multiplexer 408.
  • DWORDS BEHIND T+1 is the DWORDS BEHIND value for the following clock cycle.
  • the comparator 250 now has a first input that is connected to the DISP[7:2] lines of the bus 144, and a second input connected
  • the instruction buffer circuit of Fig. 4 is otherwise substantially identical to the circuit of Fig. 2.
  • instruction data loaded into the instruction buffer 124 from the code data bus 150 is loaded into the register 124a. With successive load operations the code data is shifted to the next register (124b, 124c and 124d) in sequence. Code data in the register 124d is overwritten in the instruction buffer 124 with the
  • the multiplexers (e.g., 408a and 408h) select eight bytes of contiguous code data from the registers 124a-124d as the source of the code data to be decoded during the current clock cycle.
  • the control logic circuit 409 controls each of the multiplexers (e.g., 408a and 408h) based on the DWORDS AHEAD T+1 , AIP[2:0] and DISP[7:0] values.
  • the control logic circuit 409 keeps track of the current location within the instruction buffer 124 from which instruction data is being read. In the
  • the control logic circuit 409 uses DISP[7:0] value on the bus 144 to determine the registers 124a-124d and the byte locations within the registers from which to perform the next read.
  • the DWORDS AHEAD T+1 input on the bus 412 allows the control logic 409 to keep track of the status of the instruction buffer 124 (empty, full, partially full, etc).
  • control logic circuit 409 can advantageously be designed to read code data during sequential
  • the comparators 240 and 250 of Fig. 4 compare DISP[7:2] to DWORDS AHEAD T+1 and DWORDS BEHIND T+1 to generate the HITA and HITB signals on the lines 246 and 256 respectively according to the following equations:
  • HITB = BRSHRT and (not DISP[7]) and (DISP[7:2] ≤ DWORDS AHEAD_{T+1})
  • the HITA and HITB signal lines 246 and 256 are used to generate the FLSH/BR-FETCH signal on the line 278 and the BROUT signal on the line 299 in the same manner described above for Fig. 2.
  • Fig. 5 illustrates a circuit for generating the DWORDS AHEAD and DWORDS BEHIND values used by the circuit of Fig. 4.
  • the circuit is identical to the circuit of Fig. 3 with the following exceptions:
  • the registers 302 and 342 now hold 3-bit DWORDS AHEAD and DWORDS BEHIND values rather than the BYTES AHEAD and BYTES BEHIND values.
  • the multiplexer 304 now selects between the output of the register 302 and a zero bus 512.
  • the multiplexer 318 now selects between the zero bus 322, a "one" bus 524 and a "two" bus 526, corresponding to the three possible quantities of doublewords that may be loaded into the instruction buffer 124 during a given clock cycle.
  • the multiplexer 334 now has a data input that is connected to the output of a doubleword boundary-cross logic circuit 515 by a bus 510.
  • the circuit 515 has a first input that is connected to the AIP[2:0] lines of the bus 123, and a second input that is connected to the BYTES USED bus 210.
  • the circuit 515 outputs a value that is equal to the number of doubleword boundaries crossed during the current cycle. For example, for a current AIP value of xxxxxx02 (hexadecimal) and a BYTES USED value of 5, the circuit 515 will output a "1" on the bus 510, indicating that a single doubleword boundary is crossed.
  • the multiplexer 334 now has an input DISP[7:2] T-1 on the bus 337.
  • the multiplexer 350 now selects between the zero bus 354 and a "4" bus 556 ("4" corresponding to the number of doublewords fetched whenever a pre-fetch is performed).
  • the multiplexer 362 now selects between a zero bus 528 and the feedback path 370.
  • the operation of the circuit of Fig. 5 is analogous to the operation of the circuit of Fig. 3.
  • the multiplexer 318 is controlled to add either 1 or 2 to the current DWORDS AHEAD value, corresponding to the number of doublewords being loaded into the instruction buffer 124.
  • the multiplexer 334 selects the output of the circuit 515 to decrement DWORDS AHEAD and increment DWORDS BEHIND by the number of doubleword boundaries crossed.
  • the multiplexer 334 selects the bus 337 to subtract DISP[7:2] from the current DWORDS AHEAD value and add DISP[7:2] to the current DWORDS BEHIND value.
  • the multiplexer 350 selects the "4" bus 556 to increment DWORDS BEHIND by four.
  • the multiplexers 304, 318, 334, 350 and 362 all select their respective zero busses 512, 322, 328, 354 and 568 to reset DWORDS AHEAD and DWORDS BEHIND to zero.
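
The offset arithmetic described above for Figs. 3 and 5 can be sketched in Python as follows. This is an illustrative model only, not the circuit itself: the function names, the boolean `hit` (true when a jump within the buffer was detected on the previous cycle, whereas the HIT line in the text is active-low), and the integer-division formula for boundary crossings are all assumptions made for this sketch.

```python
def next_bytes_ahead(bytes_ahead, bytes_loaded, bytes_used, disp, hit, brout, next_aip):
    """Model one clock of the BYTES AHEAD update (three-input adder 300)."""
    if brout:
        # Branch outside the buffer: the feedback value is replaced by
        # -AIP[2:0] of the new read pointer, and nothing else is subtracted.
        return -(next_aip & 0b111) + bytes_loaded
    # A jump inside the buffer subtracts the (signed) displacement;
    # sequential execution subtracts the bytes consumed this cycle.
    consumed = disp if hit else bytes_used
    return bytes_ahead + bytes_loaded - consumed


def dword_boundaries_crossed(aip, bytes_used):
    """Assumed model of circuit 515: 4-byte boundaries passed by the read pointer."""
    start = aip & 0b111                      # AIP[2:0]
    return (start + bytes_used) // 4 - start // 4


# Worked example from the text: branch outside the buffer to address ...02,
# no bytes loaded on that cycle, then 8 bytes loaded on the next cycle.
first = next_bytes_ahead(0, 0, 0, 0, hit=False, brout=True, next_aip=0x02)
second = next_bytes_ahead(first, 8, 0, 0, hit=False, brout=False, next_aip=0x02)
print(first, second)                         # -2 6
print(dword_boundaries_crossed(0x02, 5))     # 1
```

The two printed lines reproduce the examples given in the text: BYTES AHEAD goes from -2 to 6, and a read pointer at ...02 that consumes 5 bytes crosses one doubleword boundary.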

Abstract

An instruction buffer circuit performs jumps within an instruction buffer (124), thereby eliminating the need to re-load the instruction buffer when the target instruction is in the instruction buffer. The instruction buffer circuit uses relative offset pointer registers (302, 342) that indicate the number of instruction bytes that fall in front of and behind the read pointer (122) address in the instruction buffer (124). When a jump relative instruction is executed, the relative displacement for performing the jump is compared to the relative offset values to determine whether the target instruction is in the instruction buffer (124). If the target instruction is in the instruction buffer (124), a flush and corresponding re-load of the instruction buffer is inhibited, and the read pointer (122) is bumped to the target instruction. The target instruction is thereby read from the instruction buffer (124) without the delay normally associated with having to re-load the instruction buffer.

Description

RANDOMLY ACCESSIBLE INSTRUCTION BUFFER FOR MICROPROCESSOR
BACKGROUND OF THE INVENTION
FIELD OF THE INVENTION
This invention relates to microprocessors. In particular, this invention relates to instruction queues and buffers that hold instruction data prior to execution by a microprocessor.
DESCRIPTION OF THE RELATED ART
Microprocessors commonly use an instruction buffer to queue instruction data prior to execution. The instruction buffer or "queue" is loaded with lines of instruction data (typically comprising multiple instructions per line) that are fetched either from an external memory or a cache memory. Individual instructions are read or shifted out of the instruction queue for execution on a first-in-first-out basis. The instruction buffer thereby reduces the number of cache reads that are required as instructions are executed in sequential order. Since cache reads typically require at least one clock cycle to perform, a minimum of one clock cycle is avoided when an instruction can be read from the instruction queue. Also, interference with operand accesses is reduced since instructions can be read from the instruction queue while operand (i.e., non-code data) accesses are performed to cache or external memory. The use of an instruction queue can thus significantly increase the performance of a microprocessor.
Two types of instruction queues are commonly used in existing microprocessor designs. The first type is a register-based instruction queue. Register-based instruction queues comprise multiple registers that are connected in a sequential fashion. Instruction data is loaded into the first register of the sequence, and is clocked to the next register in the sequence with each successive load operation. Instruction data is thereby shifted in parallel through the instruction queue until the instruction data exits the instruction queue or the queue is flushed.
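A minimal Python sketch of the register-based queue just described, for illustration only. The class name `RegisterQueue`, the default depth of four stages, and the use of `None` for an empty stage are assumptions of this sketch rather than details of any particular prior-art design.

```python
class RegisterQueue:
    """Sequentially connected registers; data moves one stage per load."""

    def __init__(self, depth=4):
        self.regs = [None] * depth          # regs[0] is the first register

    def load(self, line):
        """Shift every stage down by one, store `line` first, return what exits."""
        exited = self.regs[-1]
        self.regs = [line] + self.regs[:-1]
        return exited

    def flush(self):
        """Discard all queued instruction data (e.g., on a program branch)."""
        self.regs = [None] * len(self.regs)


q = RegisterQueue(depth=2)
q.load("A")
q.load("B")
print(q.load("C"))   # A  (first in, first out)
print(q.regs)        # ['C', 'B']
```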
The second type of prior art instruction queue is a memory-based instruction queue. Memory-based queues use a read pointer and a write pointer to address specific memory locations of the queue. Data written to a specific location remains at that location until overwritten (i.e., it is not shifted through the queue). The write pointer is automatically incremented to the next location in the queue when a load is performed, and the read pointer is similarly incremented whenever a read from the queue is performed. The read pointer and the write pointer loop back to the beginning of the queue (i.e., address zero) when incremented beyond the highest queue address. Data is thereby written to and read from the queue on a first-in-first-out basis.
Since existing register-based and memory-based instruction queue designs allow instruction data to be accessed only on a first-in-first-out basis, they permit a cache access to be avoided only when instructions are executed in sequential order. When a program branch occurs, the instruction queue is flushed and a code-fetch from the cache or the external memory is performed to re-load the instruction queue. The operation of the execution unit of the microprocessor is thus temporarily suspended whenever a program branch occurs.
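The memory-based queue described above can be modeled in a few lines of Python. This is a hedged sketch: the class name `MemoryQueue` and its fields are hypothetical, and full/empty tracking is deliberately omitted, so a load can overwrite data that has not yet been read.

```python
class MemoryQueue:
    """Fixed memory array addressed by wrapping read and write pointers."""

    def __init__(self, size=4):
        self.mem = [None] * size
        self.rd = 0                          # read pointer
        self.wr = 0                          # write pointer

    def load(self, item):
        """Write at the write pointer; data stays put until overwritten."""
        self.mem[self.wr] = item
        self.wr = (self.wr + 1) % len(self.mem)   # loop back to address zero

    def read(self):
        """Read at the read pointer, then advance it with wrap-around."""
        item = self.mem[self.rd]
        self.rd = (self.rd + 1) % len(self.mem)
        return item


q = MemoryQueue(size=2)
q.load("a")
q.load("b")
print(q.read())      # a
q.load("c")          # write pointer has wrapped; "a" is overwritten
print(q.read(), q.read())   # b c
```

Because only the two pointers move, nothing in this model can reach an entry out of order, which is the limitation the following text addresses.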
Forward and backward program branches to instructions that are relatively close (i.e., within approximately 6 instructions) to the branch instruction tend to occur at a high rate at the machine language level. Thus, depending upon the size of the instruction queue, it is not uncommon to have a program branch to an instruction that is currently in the instruction queue. A significant increase in performance can thus be achieved by providing a mechanism for performing a jump to a target instruction within the instruction queue.
The present invention relates to an instruction buffer circuit and method that overcomes the above-described limitation in the prior art.
Throughout the description that follows, the term "instruction buffer" will refer to a physical memory array or set of sequentially-connected registers in which fetched instruction data is stored (with the term "buffer" being used rather than "queue" to indicate that instruction data can be accessed out-of-order). The term "instruction buffer circuit" will refer to an instruction buffer in combination with control circuitry used for accessing the instruction buffer. The term "branch instruction" will refer to any type of macroinstruction that may pass program control to an instruction address that does not immediately follow the subject instruction. Branch instructions include, for example, subroutine calls, conditional jump instructions, and unconditional jump instructions. "Target branch address" will refer to the program address to which control is passed when a branch is taken. "Target instruction" will refer to the instruction that is the target of a branch (i.e., the instruction at the target branch address).
SUMMARY OF THE INVENTION
The present invention relates to an instruction buffer circuit that can perform a branch to a target instruction in an instruction buffer. The instruction buffer circuit thereby allows program execution to continue without flushing the instruction buffer, and without performing a code-fetch to re-load the instruction buffer.
The instruction buffer circuit comprises an instruction buffer. The instruction buffer is preferably in the form of either an addressable memory array or a plurality of sequentially-connected registers. The instruction buffer circuit further comprises a read pointer for selecting a location of the instruction buffer from which to read code data for execution. The instruction buffer circuit further comprises two relative offset circuits. The first relative offset circuit generates a first relative offset value that indicates the number of valid instruction bytes that are currently in the instruction buffer that fall sequentially ahead of the read pointer address. The second relative offset circuit generates a second relative offset value that indicates the number of valid instruction bytes that are currently in the instruction buffer that fall sequentially behind the read pointer address. The instruction buffer circuit further comprises a compare circuit that compares a relative displacement for a relative jump instruction to the first and second relative offset values, to thereby determine whether the target instruction can be read from the instruction buffer.
During sequential program execution (i.e., when no branches are taken) the read pointer is incremented as instruction bytes are read from the instruction buffer for execution, and the instruction buffer acts as a first-in-first-out buffer.
When a relative branch instruction (i.e., an instruction that uses a relative displacement to specify the target address) that causes a branch is decoded, the compare circuit compares the relative displacement for the jump instruction to the first and second relative offset values. If the relative displacement is positive, indicating a forward jump in memory, the relative displacement is compared with the first relative offset value, which indicates the number of instruction bytes ahead of the read pointer address. If the positive relative displacement is less than or equal to the first relative offset value, indicating that the target instruction is in the instruction buffer, a flush and corresponding re-load of the instruction buffer is inhibited, and the read pointer is "bumped" to the instruction buffer location that contains the target instruction. On the following clock cycle, the target instruction is read from the instruction buffer and decoded. Thus, the delay normally associated with having to re-load the instruction buffer is eliminated.
If the relative displacement is negative, indicating a backward jump in memory, the relative displacement is compared with the second relative offset value, which indicates the number of instruction bytes that fall behind the read pointer address. If the negative relative displacement is less than or equal in magnitude to the second relative offset value, indicating that the target instruction is in the instruction buffer, a flush and corresponding re-load of the instruction buffer is similarly inhibited, and the read pointer is "bumped" to the instruction buffer location that contains the target instruction. On the following clock cycle, the target instruction is read from the instruction buffer and decoded.
To permit self-modifying code, the compare circuit is temporarily disabled following a write to memory, to thereby ensure that the following jump instruction will cause a code-fetch to be performed. This ensures that an unmodified version of a modified instruction will not be executed from the instruction buffer.
BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1 is a simplified block diagram that illustrates an exemplary embodiment of a pipelined microprocessor. The microprocessor shown will be used to describe an instruction buffer circuit in accordance with the present invention.
Fig. 2 is a block diagram of one embodiment of an instruction buffer circuit in accordance with the present invention.
Fig. 3 is a block diagram of a circuit for generating a BYTES AHEAD relative offset value and a BYTES BEHIND relative offset value for the circuit of Fig. 2.
Fig. 4 is a block diagram of a second embodiment of an instruction buffer circuit in accordance with the present invention.
Fig. 5 is a block diagram of a circuit for generating a DWORDS AHEAD relative offset value and a DWORDS BEHIND relative offset value for the circuit of Fig. 4.
In the drawings, like reference numbers indicate identical or functionally similar elements. Additionally, the left-most digit of a reference number identifies the drawing in which the reference number first appears.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
Fig. 1 is a high-level block diagram of a pipelined microprocessor 100 that is connected to an external memory 170. The microprocessor 100 shown is an exemplary embodiment of a microprocessor to which the present invention may be applied, and will be used to describe a preferred embodiment of an instruction buffer circuit in accordance with the present invention. It should be understood that the present invention is equally applicable to microprocessors other than the one that will be described herein. Specific widths of busses of the microprocessor 100 are indicated in Fig. 1 where helpful to understanding the preferred embodiment of the instruction buffer circuit that will be described. Referring to Fig. 1, the microprocessor 100 includes an execution/addressing unit 110, an instruction control unit (ICU) 120, and a cache/bus unit 130. The ICU 120 has an instruction buffer 124, a 32-bit advanced instruction pointer (AIP) register 122, an instruction decode circuit 126, and a fetch unit 128. The AIP register 122 is connected to the instruction buffer 124 by a bus 123. The instruction buffer 124 is connected to the instruction decode circuit 126 by a bus 125. The cache/bus unit 130 has a cache memory (cache) 132.
The execution/addressing unit 110 is connected to the ICU 120 by a jump-address (JMP ADDR) bus 140, a 32-bit immediate operand bus 142, a displacement (DISP) bus 144, and a micro-instruction (μ-INSTRUCTION) bus 146. The execution/addressing unit 110 is connected to the bus/cache unit 130 by a data bus 148 and a 32-bit address bus 152. The ICU 120 is connected to the bus/cache unit 130 by the microinstruction bus 146, a 64-bit code data bus 150, and two data available (DAV) lines 151. The bus/cache unit 130 is connected to an external memory 170 by an address/control (ADDR/CNTRL) bus 160 and a data bus 162.
1. General Operation of Microprocessor
The general operation of the microprocessor 100 will now be described. This description will provide the necessary foundation for describing the preferred embodiment of the instruction buffer 124.
Referring to Fig. 1, on memory access command cycles the execution/addressing unit 110 generates an address for performing either a code-fetch (i.e., a read of instruction data) or an operand access (i.e., a read or write of operand data). The address generated by the execution/addressing unit 110 is provided to the bus/cache unit 130 on the 32-bit address bus 152. The ICU 120 provides a corresponding memory access command (i.e., a fetch request or an operand access request) to the bus/cache unit 130 on the microinstruction bus 146. The memory access command is in the form of a microinstruction field that specifies the access type (i.e., code-fetch, operand read, operand write), and certain parameters for performing the access. The bus/cache unit 130 performs the memory access command by accessing the cache 132 and/or performing an access on the external busses 160, 162. For operand accesses, operand data is passed between the bus/cache unit 130 and the execution/addressing unit 110 on the data bus 148. For code-fetches (hereinafter "fetches"), the bus/cache unit 130 returns the requested instruction data on the code data bus 150.
During each transfer cycle in which code data is returned to the ICU 120, the bus/cache unit 130 places either 4 bytes or 8 bytes of code data on the code data bus 150. The data available (DAV) lines 151 indicate on each clock cycle whether 0, 4 or 8 bytes of code data are being transferred, with a high value on DAV[1] indicating a valid 4-byte value on the upper 32 bits of the code data bus 150, and a high value on DAV[0] indicating a valid 4-byte value on the lower 32 bits of the code data bus 150.
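The DAV encoding described above can be illustrated with a short software model. This is only a sketch: the function name `decode_dav` and the dictionary layout are hypothetical and do not appear in the specification; only the meaning of DAV[1] and DAV[0] is taken from the text.

```python
def decode_dav(dav: int) -> dict:
    """Interpret the two DAV lines for one clock cycle.

    `dav` is a 2-bit value: bit 1 models DAV[1] (upper 32 bits of the
    code data bus valid), bit 0 models DAV[0] (lower 32 bits valid).
    """
    upper_valid = bool(dav & 0b10)
    lower_valid = bool(dav & 0b01)
    return {
        "upper_valid": upper_valid,
        "lower_valid": lower_valid,
        # Each valid half of the 64-bit bus carries 4 bytes.
        "bytes": 4 * (int(upper_valid) + int(lower_valid)),
    }
```

In this model a cycle with both lines high transfers 8 bytes, a cycle with exactly one line high transfers 4 bytes, and a cycle with both lines low transfers none.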
Code data returned on the bus 150 is placed in the instruction buffer 124. The instruction buffer 124 holds the code data until it is either overwritten or the instruction buffer 124 is flushed. The least significant five bits of the AIP (advanced instruction pointer) 122 are used as a read pointer for reading instruction data from the instruction buffer 124, as will be described in detail with reference to Fig. 2. Instructions referenced by the AIP 122 are read from the instruction buffer 124 and are passed to the instruction decode circuit 126. The instruction decode circuit 126 decodes individual instructions and generates microinstructions on the bus 146.
The microprocessor 100 has a variable-length instruction format, which includes an opcode of either 1 or 2 bytes, an immediate operand data field of 0, 1, 2 or 4 bytes, and a displacement field of 0, 1, 2 or 4 bytes. Circuitry (Fig. 2) of the ICU 120 extracts any immediate operand data and displacement data included within each instruction. Immediate operand data is passed to the execution/addressing unit 110 on the bus 142. Displacement data is passed to the execution/addressing unit 110 on the displacement (DISP) bus 144. The displacement values are used by the execution/addressing unit 110 to generate addresses on the bus 152.
Of particular importance to the present invention is a type of instruction that will hereinafter be referred to as a "branch short relative" instruction. Branch short relative instructions are characterized by a one-byte opcode and a one-byte displacement field (DISP[7:0]). This displacement field specifies a relative displacement value that can range from -128 to +127 (with DISP[7] acting as a sign bit and negative numbers being represented in standard two's complement format). This relative displacement value specifies the target branch address relative to the next sequential instruction. For example, a jump (JMP) short relative instruction with DISP[7:0] = 15₁₀ specifies a jump forward in memory by 15 byte locations relative to the next sequential instruction.
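The interpretation of the one-byte displacement can be sketched in software as follows. The function names are hypothetical, and the wrap of the result to 32 bits is an assumption based on the 32-bit width of the AIP register 122.

```python
def sign_extend_disp8(disp: int) -> int:
    """Convert DISP[7:0] (two's complement) to a signed value in -128..+127."""
    disp &= 0xFF
    # DISP[7] is the sign bit.
    return disp - 0x100 if disp & 0x80 else disp


def short_branch_target(next_instr_addr: int, disp: int) -> int:
    """Target branch address, relative to the next sequential instruction.

    The 32-bit wrap-around is an assumption, not stated in the text.
    """
    return (next_instr_addr + sign_extend_disp8(disp)) & 0xFFFFFFFF
```

For the example in the text, a displacement byte of 15 applied to a next-instruction address of 1000₁₆ gives a target of 100F₁₆, while a displacement byte of F0₁₆ (that is, -16) gives 0FF0₁₆.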
A variety of branch-short-relative-type instructions are included within the instruction set of the microprocessor 100. For example, the microprocessor 100 has a JMP short relative absolute instruction and a jump short relative conditional instruction. The microprocessor 100 also has a loop short relative instruction that causes the microprocessor 100 to decrement a count and then perform a short jump if certain conditions are met.
In the preferred embodiment of the microprocessor 100, the branch short relative instruction is the only type of branch instruction that can potentially reference a target instruction that is in the instruction buffer 124.
The fetch unit 128 controls the generation of fetch requests. Whenever a fetch request is generated, the ICU 120 provides a displacement value on the bus 144. The execution/addressing unit 110 uses this displacement value to generate a fetch address on the bus 152.
The fetch unit 128 generates two types of fetch requests. The first type is a pre-fetch request, wherein the next sequential 16-byte line of code in the memory 170 (relative to the line of code currently being executed) is requested. The second type is a branch fetch request. A branch fetch request may be generated if the microprocessor 100 takes a branch (as the result of a call, interrupt, conditional jump instruction, unconditional jump instruction, loop instruction, etc.) that requires execution to begin at a new program address.
The bus/cache unit 130 is designed to perform all pre-fetches as 16-byte aligned reads from the memory 170 (or the cache 132). Thus, for example, if the execution/addressing unit 110 issues a pre-fetch address of 00005553₁₆ on the bus 152, the bus/cache unit 130 will return the 16 instruction bytes from 00005550₁₆ to 0000555F₁₆ in the memory 170. The bus/cache unit 130 is designed to perform branch fetches in a slightly different manner. If the branch fetch address (i.e., the target branch address) falls in the first doubleword (i.e., four bytes) of a 16-byte line, the bus/cache unit 130 returns the entire 16-byte line. If the branch fetch address falls in the second doubleword of a 16-byte line, the bus/cache unit 130 returns the second, third and fourth doublewords of the line. If the branch fetch address falls in the third doubleword of a 16-byte line, the bus/cache unit 130 returns the third and fourth doublewords of the line. And if the branch fetch address falls in the fourth doubleword of a 16-byte line, the bus/cache unit 130 returns only the fourth doubleword of the line.
As will be described in detail, the present invention inhibits the generation of a branch fetch request on a branch short relative instruction if the target instruction can be read from the instruction buffer 124.
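The alignment rules for pre-fetches and branch fetches described above can be summarized by the following sketch, which returns the first and last byte address of the returned code data. The function and constant names are hypothetical and are introduced only for illustration.

```python
LINE_BYTES = 16   # size of one line of code
DWORD_BYTES = 4   # size of one doubleword


def prefetch_range(addr: int) -> tuple:
    """Pre-fetch: the whole 16-byte aligned line containing `addr`."""
    base = addr & ~(LINE_BYTES - 1)
    return base, base + LINE_BYTES - 1


def branch_fetch_range(addr: int) -> tuple:
    """Branch fetch: from the doubleword containing `addr` to the end of the line."""
    line_base = addr & ~(LINE_BYTES - 1)
    start = addr & ~(DWORD_BYTES - 1)
    return start, line_base + LINE_BYTES - 1
```

With the address 00005553₁₆ from the text, `prefetch_range` yields 00005550₁₆ through 0000555F₁₆. A branch fetch to 0000555A₁₆, which lies in the third doubleword, yields 00005558₁₆ through 0000555F₁₆.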
2. Overview of Present Invention
As described above, prior art microprocessors flush the instruction buffer and generate a branch fetch request whenever a program branch is taken, even if the target instruction is in the instruction buffer 124 at the time the branch is taken. Program execution is thereby temporarily suspended while the new fetch data is read from the memory or cache of the system. The present invention solves this problem by including a mechanism to inhibit the generation of an instruction buffer flush and a corresponding branch fetch request when a target instruction can be read from the instruction buffer 124.
Referring to Fig. 1, the microprocessor 100 handles relative branch instructions as follows. During a decode clock cycle for a branch instruction, the instruction decode circuit 126 determines whether a branch will be taken as a result of the branch instruction. If the decode circuit 126 determines that a branch will be taken, the AIP 122 is incremented or decremented by adding the displacement (DISP) value for the instruction to the AIP 122. The five least significant bits of the AIP register 122 (which serve as the read pointer) are thereby "bumped" to point to the instruction buffer 124 location from which the target instruction will be read.
If the branch instruction is a short relative instruction, a comparison circuit (Figs. 2 and 4) determines whether the target instruction can be read from the instruction buffer 124 on the following clock cycle. If so, a "hit" signal is generated which inhibits the generation of a flush and corresponding branch fetch. The target instruction is then read from the instruction buffer 124 on the following clock cycle, and execution continues without delay. The delay heretofore associated with having to re-load the instruction buffer 124 in this event is thereby avoided.
If the target instruction for a branch short relative instruction cannot be read from the instruction buffer 124 on the following cycle, or if the relative branch instruction decoded is not a branch short relative instruction (i.e., does not use a 1-byte relative displacement), the ICU 120 flushes the instruction buffer 124 and generates a branch fetch request. The ICU 120 also provides a displacement value to the execution/addressing unit 110 on the displacement bus 144. The execution/addressing unit 110 uses the displacement value to calculate the branch fetch address for performing the branch fetch. The bus/cache unit 130 uses the branch fetch address to perform a fetch to re-load the instruction buffer 124. Once the instruction buffer 124 is re-loaded with the target instruction, execution resumes.
For certain types of program branches, the branch address, or a value used to calculate the branch address, is read from the memory 170 (or cache 132). "Return" and "jump indirect" are examples of instructions that require such a memory access. For this type of branch instruction the execution/addressing unit 110 provides the jump address to the ICU 120 on the jump address (JMP ADDR) bus 140, and the ICU 120 loads the jump address into the AIP 122. A flush and corresponding branch fetch request are always generated for this type of branch instruction. A flush and branch fetch are also generated whenever an absolute jump instruction is executed.
Since the preferred embodiment of the microprocessor 100 permits self-modifying code, the ability of the instruction buffer circuit to effect a jump within the instruction buffer 124 is temporarily disabled following a memory write operation and until the next flush of the instruction buffer 124 occurs. This ensures that the instruction buffer 124 will be flushed and re-loaded upon the first branch that follows a memory write.
3. Description of the Instruction Buffer Circuit
Figs. 2 and 3 illustrate a preferred embodiment of the instruction buffer circuit. The circuit shown in Fig. 2 will initially be described.
Referring to Fig. 2, the instruction buffer 124 is in the form of a 32-byte addressable memory array arranged as four lines of eight bytes each. As will be described with reference to Fig. 4, the instruction buffer 124 may alternatively be in the form of a plurality of registers sequentially connected in parallel. The code data bus 150 is connected as a data input to the instruction buffer 124. The output of a 5-bit write pointer (WR PTR) register 200 is connected as a first address input to the instruction buffer 124 by a three-bit bus 201. The three lines of the bus 201 correspond to WR PTR[4:2] (i.e., the three most significant bits of the write pointer 200). A bus 123 is connected to the output lines of the AIP register 122. The bit lines AIP[4:3] on the bus 123 are connected as a second address input to the instruction buffer 124.
The data output of the instruction buffer 124 is connected to a byte barrel shifter 202 by a 64-bit bus 204. The byte barrel shifter 202 rotates the data appearing on the bus 204 to the right by N bytes, where N may range from 0 to 7. The number N is specified by the bit lines AIP[2:0] of the bus 123. As will be recognized by one skilled in the art, the byte barrel shifter 202 can be implemented using a combination of multiplexers.
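The rotation performed by the byte barrel shifter 202 can be modeled as shown below. This is a behavioral sketch only, not a description of the multiplexer implementation. It assumes that the 8-byte line is represented as a `bytes` object whose index 0 is the least significant (right-most) byte of the bus; the function name is hypothetical.

```python
def rotate_right_bytes(line: bytes, n: int) -> bytes:
    """Rotate an 8-byte line right by n bytes (n modeled on AIP[2:0]).

    Index 0 is taken to be the least significant byte, so after the
    rotation the byte that was at position n appears at position 0.
    """
    if len(line) != 8:
        raise ValueError("the instruction buffer output is 8 bytes wide")
    n &= 7  # only three bits select the rotation amount
    return line[n:] + line[:n]
```

For example, with AIP[2:0] equal to 3, the byte stored at position 3 of the addressed line is moved to the least significant position, and the three bytes that were below it wrap around to the most significant positions.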
The output of the byte barrel shifter 202 is connected by the 64-bit bus 125 to a displacement (DISP) register 206, an immediate (IMMED) register 208, and the instruction decode circuit 126. The outputs of the DISP register 206 and the IMMED register 208 are connected to the execution/addressing unit 110 (Fig. 1) by the busses 144 and 142 respectively. The instruction decode circuit 126 has four outputs that are shown. The first output is the microinstruction bus 146 of Fig. 1. The second output is a branch short (BRSHRT) signal line 212 which becomes active during a decode cycle for a branch short relative instruction if a branch will be taken. If the branch short relative instruction falls across a line boundary of the instruction buffer 124, the BRSHRT signal line 212 becomes active during the second of two decode cycles. The third output is a branch (BR) output line 213 that goes high whenever a branch occurs as the result of something other than a branch short relative instruction. For example, the BR line 213 will go high upon the decode of a branch that results from an absolute jump instruction, a call instruction, a return instruction, or a relative jump instruction that uses a displacement value of two or more bytes. The BR line 213 thus goes high only for branches to target instructions that cannot (or are highly unlikely to) be found in the instruction buffer 124. The fourth output is a BYTES USED bus 210 that indicates the number of instruction bytes used during the current clock cycle.
The BYTES USED bus 210 is connected as a first data input to a multiplexer 216. The displacement (DISP) bus 144 is connected as a second data input to the multiplexer 216. The multiplexer 216 has a control input that is connected to the output of an OR gate 217 by a line 218. The OR gate 217 has a first input connected to the BRSHRT line 212 and a second input connected to the BR line 213. The output of the multiplexer 216 is connected as a first input to a binary adder 220 by an 8-bit bus 222. The bus 123 is connected as the second input to the adder 220. The output of the adder 220 is connected as an input to a register 225 by a bus 224. The output of the register 225 is connected as a first data input to a multiplexer 226 by a bus 228. The jump address (JMP ADDR) bus 140 (Fig. 1) is connected as a second data input to the multiplexer 226. The multiplexer 226 has a control input that is connected to a jump address (JA) signal line 227 that becomes active when the execution/addressing unit 110 provides a jump address to the ICU 120 on the JMP ADDR bus 140. The output of the multiplexer 226 is connected as a data input to the AIP register 122 by a bus 229.
The lines DISP[7:0] of the bus 144 are connected as a first data input to a comparator 240. The second data input to the comparator 240 is connected to a BYTES BEHIND circuit (Fig. 3) by a 5-bit bus 242. The BYTES BEHIND circuit provides a relative offset value BYTES BEHIND_{T+1} on the bus 242. As will be described in greater detail with reference to Fig. 3, the value BYTES BEHIND is equal to the number of valid instruction bytes in the instruction buffer 124 that currently fall behind the instruction byte referenced by AIP[4:0]. The value BYTES BEHIND_{T+1} is thus equal to the number of valid bytes that will exist behind AIP[4:0] on the following clock cycle T+1. The comparator 240 has an enable input that is connected to the BRSHRT line 212.
The lines DISP[7:0] of the bus 144 are also connected as a first data input to a comparator 250. The second data input to the comparator 250 is connected to a BYTES AHEAD circuit (Fig. 3) by a 5-bit bus 252. The bus 252 specifies a value BYTES AHEAD_{T+1}, which is equal to the number of valid instruction bytes that will fall ahead of AIP[4:0] in the instruction buffer 124 on the following clock cycle T+1. The comparator 250 has an enable input that is connected to the BRSHRT line 212.
The output of the comparator 240 is connected as a first input to an OR gate 260 by a HITA signal line 246. The output of the comparator 250 is connected as a second input to the OR gate 260 by a HITB signal line 256. The output of the OR gate 260 is connected as a first input to a NAND gate 270 by a line 272. The second input to the NAND gate 270 is connected to the output (Q) of an R-S flip-flop 274 by a line 276. The set (S) input of the R-S flip-flop 274 is connected to the output of an OR gate 291 by a flush/branch-fetch (FLSH/BR-FETCH) signal line 278. The reset (R) input of the flip-flop 274 is connected to a MEM WRITE signal line 279 that goes high whenever a write to the memory 170 (Fig. 1) and/or the cache 132 occurs.
The output of the NAND gate 270 is connected as a first input to an AND gate 290 by a HIT signal line 288. The BRSHRT signal line 212 is connected as the second input to the AND gate 290. The output of the AND gate 290 is connected as a first input to an OR gate 294 by a line 292. The BR line 213 (from the instruction decode circuit 126) is connected as a second input to the OR gate 294. The output of the OR gate 294 is connected to the FLSH/BR-FETCH signal line 278. The FLSH/BR-FETCH signal line 278 is connected to the fetch unit 128 (Fig. 1). The FLSH/BR-FETCH signal line 278 is also connected to the circuit shown in Fig. 3. The FLSH/BR-FETCH signal line 278 is also connected as an input to a flip-flop 298. The output of the flip-flop 298 is connected to a BROUT signal line 299.
The operation of the circuit of Fig. 2 will now be described. As noted above, code data is provided to the instruction buffer 124 on the code data bus 150 either four bytes at a time or eight bytes at a time. Thus, on a given clock cycle, 0, 4 or 8 bytes may be loaded into the instruction buffer 124. Instructions fetched from byte locations in the memory 170 are loaded into corresponding byte locations of the instruction buffer 124, with the five least significant bits of the memory 170 address specifying the instruction buffer 124 location. Instructions are loaded with the opcode byte (or bytes) falling at the lowest byte address in the instruction buffer 124. For example, a three-byte instruction loaded into the BYTE0, BYTE1 and BYTE2 locations in LINE0 will have an opcode at the BYTE0 location of LINE0. Instructions can fall on any byte boundary in the instruction buffer 124, and can fall across one or more of the 8-byte lines LINE0-LINE3.
The write pointer 200 (WR PTR) is a 5-bit register that provides a write address for loading the instruction buffer 124. Loads are performed using WR PTR[4:2] (i.e., write pointer bits 4, 3 and 2) as the write address. All loads of the instruction buffer 124 are thus performed on four-byte boundaries (i.e., all loads start at either BYTE0 or BYTE4 of one of the four lines LINE0-LINE3). As loads are performed, the write pointer 200 is incremented by the number of bytes being loaded (either four or eight), as indicated by the DAV lines 151 (Fig. 1). The write pointer 200 automatically loops back to LINE0 when the 5-bit value is incremented beyond its maximum value of 31₁₀. Data written to the instruction buffer 124 may be read out until either the data is overwritten or the instruction buffer 124 is flushed. If a branch occurs that causes the instruction buffer 124 to be flushed, the write pointer 200 is automatically loaded with the five least significant bits of the target branch address. The circuitry for incrementing and loading the write pointer is omitted to simplify the figure.
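The address mapping and write pointer behavior described above can be sketched as follows. All function names are hypothetical. The sketch models only the arithmetic stated in the text (five-bit addressing, four-byte write granularity, and wrap-around past 31) and omits the incrementing and loading circuitry that the figure also omits.

```python
BUFFER_BYTES = 32  # four lines of eight bytes


def buffer_location(mem_addr: int) -> int:
    """Five least significant address bits select the buffer byte location."""
    return mem_addr & 0x1F


def line_and_byte(location: int) -> tuple:
    """Split a 5-bit location into (line 0..3, byte 0..7)."""
    return location >> 3, location & 0x7


def write_address(wr_ptr: int) -> int:
    """WR PTR[4:2]: the doubleword-aligned write address (0..7)."""
    return (wr_ptr >> 2) & 0x7


def advance_write_pointer(wr_ptr: int, bytes_loaded: int) -> int:
    """Increment by 0, 4 or 8 bytes, looping back past location 31."""
    if bytes_loaded not in (0, 4, 8):
        raise ValueError("loads are 0, 4 or 8 bytes per clock cycle")
    return (wr_ptr + bytes_loaded) % BUFFER_BYTES


def write_pointer_after_flush(target_branch_addr: int) -> int:
    """On a flush, WR PTR takes the five LSBs of the target branch address."""
    return target_branch_addr & 0x1F
```

As an illustration, the memory address 00005553₁₆ maps to buffer location 13₁₆, which is BYTE3 of LINE2, and a write pointer value of 28 advanced by an 8-byte load wraps to 4.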
The five least significant bits of the 32-bit AIP register 122 are used as a read pointer for reading instructions from the instruction buffer 124. All reads are performed as 8-byte aligned accesses, with the bits AIP[4:3] used to address one of the four 8-byte lines LINE0-LINE3. Whenever a read is performed, the 8 bytes of instruction data read from the addressed line are passed through the byte barrel shifter 202. The byte barrel shifter 202 rotates the 8 bytes of instruction data such that the opcode (if any) of the next instruction to be executed falls in the right-most (i.e., least significant) byte position on the bus 125. The byte barrel shifter 202 thereby aligns the instructions so that the opcode, immediate, and displacement fields can be extracted. Displacement and immediate fields, if any, are loaded into the registers 206 and 208 respectively. In the preferred embodiment, 32-bit shifters (not shown) are also used to align the displacement and immediate values before the values are loaded into the registers 206, 208. Opcodes (and other fields that require decoding) are passed to the instruction decode circuit 126.
On every clock cycle the instruction decode circuit 126 generates a BYTES USED value on the bus 210. The BYTES USED value may range from 0 to 8, and indicates the number of bytes of the 8-byte line that are used during the current clock cycle. For example, if the next instruction to be executed is five bytes long, and the first two instruction bytes fall in the line currently referenced by AIP[4:3], BYTES USED will be 2 for the current clock cycle.
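The example above can be reproduced with a small sketch. It rests on two assumptions that are not stated in the text: that at most one instruction is decoded per clock cycle, and that all bytes of the addressed line are valid. The function name and its parameters are hypothetical.

```python
def bytes_used(aip: int, remaining_instr_bytes: int) -> int:
    """Bytes consumed from the 8-byte line addressed by AIP[4:3] this cycle.

    Assumes one instruction per cycle (an assumption, not from the text).
    AIP[2:0] gives the position of the next byte within the line.
    """
    bytes_left_in_line = 8 - (aip & 0x7)
    return min(remaining_instr_bytes, bytes_left_in_line)
```

For a five-byte instruction whose first byte sits at position 6 of a line, the model gives a BYTES USED value of 2 on the first cycle and 3 on the next cycle, once the read pointer has advanced to the start of the following line.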
4. Generation of Next AIP Value
During each clock cycle the circuit of Fig. 2 generates a 32-bit value AIP_{T+1} that will be clocked into the AIP register 122 at the end of the clock cycle as the next AIP value. The method used to generate this value depends upon the particular type of instruction decoded (if any) by the instruction decode circuit 126.
On clock cycles for which both the BRSHRT signal line 212 and the BR signal line 213 are low (indicating that no branches will be taken on the immediately-following execution cycle), the multiplexer control line 218 is low, causing the multiplexer 216 to select the BYTES USED bus 210. The BYTES USED value selected by the multiplexer 216 is added to the current AIP value on the bus 123 by the adder 220. The output of the adder 220 is clocked into the register 225 (clock not shown). The jump address (JA) line 227 will be low on the following clock cycle, causing the multiplexer 226 to select the output of the register 225 on the bus 228 as the next AIP value. Thus, during sequential program execution (i.e., execution without program branches), AIP_{T+1} = AIP_T + BYTES USED_T. Note that when the read pointer bits AIP[4:0] exceed the maximum instruction buffer address of 31₁₀, the read pointer automatically loops back to address zero of the instruction buffer 124, causing LINE0 to be read on the following decode cycle. Thus, the instruction buffer 124 acts as a first-in-first-out (FIFO) buffer (i.e., a queue) during sequential program execution.
When a branch instruction that causes a branch is decoded, the instruction decode circuit 126 asserts either the BRSHRT signal line 212 or the BR signal line 213, depending upon the type of branch instruction decoded. If the branch instruction is a relative branch instruction, the DISP bus 144 specifies the relative displacement value for performing the jump. If the branch instruction is an absolute jump instruction, the DISP bus 144 specifies the absolute target branch address to which the jump will be performed.
The high value on either the BRSHRT line 212 or the BR line 213 causes the multiplexer select line 218 to go high, thereby causing the multiplexer 216 to select the DISP bus 144. If the branch instruction is a relative (long or short) branch instruction, the adder 220 adds the displacement value on the DISP bus 144 to the current AIP value on the bus 123 to generate the next AIP value. If the branch instruction is an absolute branch instruction, the adder 220 passes the displacement value appearing on the bus 222 through to its output without adding it to the current AIP value on the bus 123 (circuitry for implementing the pass-through function of the adder 220 not shown). In either case, the output of the adder 220 is clocked into the register 225.
On the following clock cycle the multiplexer 226 selects either the output of the register 225 or the value (if any) on the JMP ADDR bus 140. If a jump address is being provided by the execution/addressing unit 110 on the current clock cycle, the jump address (JA) line 227 will be high, causing the multiplexer 226 to select the JMP ADDR bus 140 as the source of the next AIP value. Otherwise, the multiplexer 226 will select the output of the register 225 as the next AIP value.
The five least significant bits of the value loaded into the AIP register 122 following a branch specify the instruction buffer 124 address where the target instruction will be read from. If the branch is to an instruction not currently in the instruction buffer 124, the instruction buffer 124 must be flushed and re-loaded before the target instruction can be read from the instruction buffer and executed. If the branch is a relative short branch to an instruction that can be read from the instruction buffer 124 on the following clock cycle (i.e., the target instruction is or will be in the instruction buffer 124, and the internal jump mechanism is currently enabled), the 8-byte line containing the target instruction is read from the instruction buffer 124 on the following clock cycle and the target instruction is executed.
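The selection of the next AIP value described in this section can be summarized by the following behavioral sketch. It collapses the register 225 and the two multiplexers into a single function call, so the one-cycle pipeline timing is not modeled. The function signature, the `absolute` flag and the 32-bit wrap are assumptions introduced for illustration; `disp` is expected as an already sign-extended integer.

```python
MASK32 = 0xFFFFFFFF


def next_aip(aip, bytes_used=0, disp=0, brshrt=False, br=False,
             absolute=False, ja=False, jmp_addr=0):
    """Behavioral model of the next-AIP path (mux 216, adder 220, mux 226)."""
    # OR gate 217: either branch signal makes mux 216 select the DISP bus.
    take_disp = brshrt or br
    operand = disp if take_disp else bytes_used
    if take_disp and absolute:
        # Adder pass-through: DISP carries the absolute target address.
        reg_225 = operand & MASK32
    else:
        # Sequential execution or relative branch: add to the current AIP.
        reg_225 = (aip + operand) & MASK32
    # Mux 226: a jump address from the execution/addressing unit wins.
    return (jmp_addr & MASK32) if ja else reg_225


def read_pointer(aip):
    """AIP[4:0], which addresses the 32-byte instruction buffer."""
    return aip & 0x1F
```

In this model, sequential execution from 0000101E₁₆ with a BYTES USED value of 4 gives a next AIP of 00001022₁₆, whose read pointer bits equal 2, illustrating the loop back to LINE0.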
5. Comparison Circuit
Whenever a branch short relative instruction is decoded, a comparison circuit (comprising the comparators 240 and 250 and the OR gate 260) is used to determine whether or not the target instruction can be read from the instruction buffer 124. The comparators 240 and 250 compare DISP[7:0] to the BYTES BEHIND and BYTES AHEAD values respectively. Since it must be determined whether or not the target instruction will be in the instruction buffer 124 during the following clock cycle (i.e., the clock cycle following the decode of the branch short relative instruction), the comparators 240 and 250 compare DISP[7:0] to BYTES BEHIND_{T+1} and BYTES AHEAD_{T+1}. The comparisons thus take into account any code data that is loaded into the instruction buffer 124 during the current clock cycle (which may include the target instruction, or may overwrite the target instruction in the instruction buffer 124). The comparator 240 generates a signal HITA on the line 246 according to the following logic equation:
HITA = BRSHRT and DISP[7] and (|DISP[7:0]| ≤ BYTES BEHIND_{T+1})
Thus, the HITA line 246 becomes active if a branch short relative instruction is decoded that has a negative relative displacement value that is less than or equal (in magnitude) to BYTES BEHIND_{T+1}. Since BYTES BEHIND_{T+1} is equal to the number of valid bytes that will be in the instruction buffer 124 on the following cycle that will fall behind the address AIP[4:0]_{T+1}, the HITA signal line 246 will go high if a backward branch is taken to a target instruction that can be read from the instruction buffer 124.
The comparator 250 generates a logic signal HITB on the line 256 according to the following logic equation:
HITB = BRSHRT and (not DISP[7]) and (DISP[7:0] ≤ BYTES AHEAD_{T+1})
Thus, the HITB line 256 becomes active if a branch short relative instruction is decoded that has a positive relative displacement value that is less than or equal to BYTES AHEAD_{T+1}. Since BYTES AHEAD_{T+1} is equal to the number of valid bytes that will be in the instruction buffer 124 on the following cycle that will fall ahead of the address AIP[4:0]_{T+1}, the HITB signal line 256 will go high if a forward branch is taken to a target instruction that can be read from the instruction buffer 124.
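The two comparisons can be illustrated with a short Python sketch. It follows the prose description above (a negative displacement whose magnitude is less than or equal to the bytes behind for HITA, a positive displacement less than or equal to the bytes ahead for HITB). The function and parameter names are hypothetical, and the offsets for the following clock cycle are passed in as plain integers rather than being produced by the circuit of Fig. 3.

```python
def decode_hit(brshrt: bool, disp8: int,
               bytes_behind_next: int, bytes_ahead_next: int):
    """Return (HITA, HITB) for an 8-bit relative displacement DISP[7:0]."""
    disp8 &= 0xFF
    negative = bool(disp8 & 0x80)                    # DISP[7], the sign bit
    # Magnitude of an 8-bit two's-complement value.
    magnitude = (0x100 - disp8) if negative else disp8
    # Comparator 240: backward branch within the bytes behind the pointer.
    hita = brshrt and negative and magnitude <= bytes_behind_next
    # Comparator 250: forward branch within the bytes ahead of the pointer.
    hitb = brshrt and (not negative) and disp8 <= bytes_ahead_next
    return bool(hita), bool(hitb)
```

With 6 bytes behind and 8 bytes ahead, for example, a displacement of 0xFA (that is, -6) produces HITA, a displacement of 0x08 produces HITB, and a displacement of 0x09 produces neither.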
Assuming for purposes of illustration that the line 276 is currently high, the HIT signal 288 will go low if either the HITA or the HITB signal line goes high. Thus, the HIT signal line 288 goes low only if a relative short jump is taken to an instruction that can be read from the instruction buffer 124. The output of the AND gate 290 on the line 292 goes high on clock cycles for which a jump short relative instruction that causes a branch is decoded and no hit occurs. The FLSH/BR-FETCH signal line 278 thus goes high if either a non-branch-short-relative branch is decoded, or a branch short relative branch is decoded to a target instruction that cannot be read from the instruction buffer 124. Thus, the FLSH/BR-FETCH signal line 278 goes high whenever a branch is taken to a target instruction that cannot be read from the instruction buffer 124. The FLSH/BR-FETCH signal line 278 is connected to the fetch unit 128 (Fig. 1), and initiates a branch fetch to re-load the instruction buffer 124 when high. A high value on the FLSH/BR-FETCH signal line 278 also causes the BYTES AHEAD and BYTES BEHIND offset values to be reset, as discussed below with reference to Fig. 3. A high value on the FLSH/BR-FETCH signal line 278 also generates a flush of the instruction buffer 124.
The FLSH/BR-FETCH signal on the line 278 is delayed by one clock cycle by the flip-flop 298 (clock not shown) to produce the branch outside (BROUT) signal on the line 299. The BROUT signal line thus goes high on the execution cycle for any branch instruction that causes a branch outside the instruction buffer 124. As will be discussed with reference to Fig. 3, the BROUT signal is used for generating the BYTES AHEAD and BYTES BEHIND relative offsets.
The operation of the circuit comprising the R-S flip-flop 274 and the NAND gate 270 will now be described. This circuit has the purpose of preventing branches within the instruction buffer 124 if a memory write has been performed and a flush of the instruction buffer 124 has not been performed since the memory write. The circuit thereby allows the microprocessor 100 to implement code modification, wherein writes are performed to the memory 170 (Fig. 1) to modify individual instructions.
Referring to Fig. 2, when a memory write is performed, the MEM WRITE signal line 279 goes high, causing the output line 276 of the R-S flip-flop to go low. The low value on the line 276 masks any HIT signals appearing on the line 272, thereby preventing a jump within the instruction buffer 124. This guarantees that an instruction modified by the write operation will be executed out of the cache 132 (or the memory 170), and thus eliminates the possibility that the unmodified version of a modified instruction will be executed from the instruction buffer 124. (Note: the preferred embodiment of the microprocessor 100 requires that a jump be performed before a modified instruction is executed).
The first jump instruction to follow the memory write causes the FLSH/BR-FETCH line 278 to go high, even if the target instruction is in the instruction buffer 124. The high value on the FLSH/BR-FETCH line 278 causes a flush of the instruction buffer 124, and causes a branch fetch to be generated. The high level on the FLSH/BR-FETCH line 278 also causes the output of the R-S flip flop 274 to go high, to thereby re-enable jumps within the instruction buffer 124.
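The behavior of the inhibit circuit can be modeled as a small state machine. The Python sketch below is a simplified illustration, not the disclosed logic: the class and method names are hypothetical, the flip-flop is assumed to start in the enabled state, FLSH/BR-FETCH is reduced to "a branch is taken and no hit is signaled", and clock-cycle timing is ignored.

```python
class BranchHitGate:
    """Simplified model of the R-S flip-flop 274 and the NAND gate 270."""

    def __init__(self) -> None:
        # Line 276: high enables jumps within the instruction buffer.
        self.enable_276 = True  # assumed initial state

    def memory_write(self) -> None:
        # MEM WRITE (line 279) drives the line 276 low.
        self.enable_276 = False

    def branch(self, hita: bool, hitb: bool):
        """Process a taken branch; return (HIT line 288, FLSH/BR-FETCH)."""
        hit_272 = hita or hitb                       # OR gate 260
        # NAND gate 270: the HIT line 288 is active low.
        hit_288 = not (hit_272 and self.enable_276)
        flush_278 = hit_288                          # branch taken, no hit
        if flush_278:
            # The flush re-enables jumps within the instruction buffer.
            self.enable_276 = True
        return hit_288, flush_278
```

In this model, a branch whose target is in the buffer returns a low HIT line and no flush. After `memory_write()` the same branch returns a flush, and the next in-buffer branch is again serviced from the buffer.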
As will be apparent to one skilled in the art, use of the circuit comprising the R-S flip-flop 274 and the NAND gate 270 is desirable only if the microprocessor to which the present invention is applied supports code modification. It will further be apparent that the MEM WRITE signal line 279 can be appropriately qualified for certain microprocessor designs to reset the flip-flop 274 only upon certain types of write operations that can affect code. For example, since stack operations are not normally used as a means for modifying code, the MEM WRITE signal line 279 can be qualified such that it only becomes active for non-stack memory writes.
6. Circuit for Generating Relative Offset Values
The circuit for generating BYTES AHEAD_{T+1} and BYTES BEHIND_{T+1} will now be described. Referring to Fig. 3, the BYTES AHEAD_{T+1} value is generated as the output of a 3-input adder 300. The output lines 252 of the adder 300 are connected as an input to a register 302. One input of the adder 300 is connected to the output of a multiplexer 304 by a bus 306. The multiplexer 304 has a control input that is connected to the branch outside (BROUT) signal line 299. The multiplexer 304 has a first data input that is connected to a two's complement circuit 310 by a bus 312. The two's complement circuit 310 has an input that is connected to a 5-bit bus 314. Bits 3 and 4 of the bus 314 are tied to zero (i.e., tied low). Bits [2:0] of the bus 314 are connected to bits [2:0] of the bus 229 (Fig. 2), which represent AIP[2:0]_{T+1} (i.e., the three least significant bits of the next AIP value). The multiplexer 304 has a second data input that is connected to the output of the register 302 by the bus 313.
A second input to the adder 300 is connected to the output of a multiplexer 318 by a bus 320. The multiplexer 318 has a control input that is connected to the data available lines DAV[1:0] (Fig. 1). The multiplexer 318 has a first data input of zero on a bus 322 (i.e., all lines of the bus 322 are tied to zero). The multiplexer 318 has a second data input of "4" on a bus 324 (i.e., bits [3:0] = 0100_2 on the bus 324). The multiplexer 318 has a third data input of "8" on a bus 326 (i.e., bits [3:0] = 1000_2 on the bus 326).
A third input of the adder 300 is connected to the output of a two's complement circuit 330 by a bus 332. The input of the two's complement circuit 330 is connected to the output of a multiplexer 334 by a bus 336. The multiplexer 334 has a first data input of zero on a bus 328. The multiplexer 334 has a second data input that is connected to the BYTES USED bus 210 (Fig. 2). The multiplexer 334 has a third data input that is connected to a DISP[7:0]_{T-1} bus 337, which provides a value equal to DISP[7:0] (on the bus 144 of Figs. 1 and 2) from the previous clock cycle T-1 (circuitry for delaying DISP[7:0] not shown).
The multiplexer 334 has a first control input that is connected to a HIT_{T-1} signal line 335, which is generated by delaying the HIT signal line 288 (Fig. 2) by one clock cycle (circuitry for delaying the HIT signal not shown). The multiplexer 334 has a second control input connected to the BROUT signal line 299 (Fig. 2).
The BYTES BEHIND_{T+1} value is generated as the output of a 3-input adder 340. The output lines 242 of the adder 340 are connected as an input to a register 342. One input of the adder 340 is connected to the output of the multiplexer 334 by the bus 336. A second input of the adder 340 is connected to the output of a two's complement circuit 346 by a bus 348. The input of the two's complement circuit 346 is connected to the output of a multiplexer 350 by a bus 352. The multiplexer 350 has a first data input of zero on a bus 354, and a second data input of "16" (i.e., 16_{10}) on a bus 356. The multiplexer 350 has a control input that is connected to a PRE-FETCH signal line 358 that goes high for one clock cycle when a pre-fetch is initiated by the bus/cache unit 130 (Fig. 1).
A third input of the adder 340 is connected to the output of a multiplexer 362 by a bus 364. The multiplexer 362 has a first data input that is connected to a 5-bit bus 368. Bit [4] of the bus 368 is tied to zero (i.e., tied low), and bits [3:0] of the bus 368 are connected to the JMP ADDR[3:0] lines of the bus 140 (Figs. 1 and 2). The multiplexer 362 has a second data input that is connected to the output of the register 342 by a bus 370. The multiplexer 362 has a control input that is connected to the BROUT signal line 299.
7. Calculation of BYTES AHEAD Offset
The operation of the circuit used to calculate BYTES AHEAD_{T+1} will now be described. Referring to Fig. 3, the register 302 holds the value BYTES AHEAD_T, which is the BYTES AHEAD value for the current clock cycle. At the end of each clock cycle the output of the adder 300 is clocked into the register 302 (clock not shown) as the new BYTES AHEAD value. Thus, the value on the bus 252 represents the BYTES AHEAD value for the following clock cycle. The adder 300 generates the BYTES AHEAD_{T+1} value by adding the three values on the busses 306, 320 and 332.
Referring to the multiplexer 304, when the BROUT signal line 299 is low the multiplexer 304 selects the feedback path 313. As discussed above, the BROUT signal line 299 will become high only if a branch is taken to a target instruction that cannot be retrieved from the instruction buffer 124. Thus, during all other clock cycles (including execution cycles of branch short relative instructions that cause jumps within the instruction buffer 124), the current BYTES AHEAD value is added in as one of the three components for generating the next BYTES AHEAD value. When the BROUT signal line 299 is high, the multiplexer 304 selects the bus 312, which has a value that is the two's complement of the value 0,0,AIP[2:0]_{T+1}. The value AIP[2:0]_{T+1} is thereby subtracted from the values appearing on the busses 320 and 332.
Referring to the multiplexer 318, the data available lines DAV[1:0] 151 specify the number of instruction bytes being loaded into the instruction buffer 124 on the current clock cycle. When the DAV lines 151 indicate that no instruction bytes are being loaded into the instruction buffer 124 (i.e., DAV[1:0] = 00_2), the multiplexer 318 outputs a zero on the bus 320. When the DAV lines 151 indicate that four instruction bytes are being loaded into the instruction buffer 124 (i.e., DAV[1:0] = 10_2 or DAV[1:0] = 01_2), the multiplexer 318 outputs the value 4 on the bus 320. When the DAV lines 151 indicate that eight instruction bytes are being loaded into the instruction buffer 124 (i.e., DAV[1:0] = 11_2), the multiplexer 318 outputs the value 8 on the bus 320. Thus the value on the bus 320 indicates the number of bytes being loaded into the instruction buffer 124 on the current clock cycle.
Referring to the multiplexer 334, the control lines BROUT 299 and HIT_{T-1} 335 are used to select between the zero bus 328, the BYTES USED bus 210 and the DISP[7:0]_{T-1} bus 337 according to Table 1. The output of the multiplexer 334 is passed through the two's complement circuit 330 to effect a subtraction of the value selected by the multiplexer 334.
BROUT    HIT_{T-1}    MUX OUTPUT
0        1            BYTES USED
0        0            DISP[7:0]_{T-1}
1        0            (does not occur)
1        1            0
TABLE 1
Referring to Table 1, if the BROUT signal line 299 is low (inactive) during the current clock cycle (indicating that a branch outside the instruction buffer 124 is not currently being executed) and the HIT_{T-1} signal line 335 is high (inactive), indicating that a hit for a branch short relative instruction did not occur during the previous clock cycle, the multiplexer 334 (MUX) selects the BYTES USED bus 210. Thus, when no branches are being taken, the BYTES AHEAD_{T+1} value is generated by subtracting the number of bytes used from the sum of the current BYTES AHEAD value plus the number of bytes being loaded into the instruction buffer 124.
Referring to Table 1, if the BROUT signal line 299 is low (inactive) and the HIT_{T-1} signal line 335 is low (active) during the current clock cycle (indicating that an instruction buffer 124 hit occurred on the previous clock cycle), the DISP[7:0]_{T-1} bus 337 is selected by the multiplexer 334. Thus, the relative displacement DISP[7:0] for the branch short relative instruction being executed is subtracted from the sum of BYTES AHEAD_T and the number of bytes being loaded into the instruction buffer 124 on the current clock cycle. If the relative displacement DISP[7:0] is positive (indicating a forward jump in the instruction buffer 124), the next BYTES AHEAD value decreases as a result of the jump, indicating that fewer instructions will be ahead of the read pointer AIP[4:0] following the jump. If the relative displacement DISP[7:0] is negative (indicating a backward jump in the instruction buffer 124), the next BYTES AHEAD value increases as a result of the jump, indicating that a greater number of instructions will be ahead of the read pointer AIP[4:0] following the jump.
Referring to Table 1, if the BROUT signal line 299 is high (active) and the HIT_{T-1} signal line 335 is high (inactive) during the current clock cycle, indicating that a branch outside the instruction buffer 124 is being executed, the multiplexer 334 outputs a zero. During the same clock cycle the multiplexer 304 outputs the two's complement of AIP[2:0]_{T+1}. Thus, in the event of a jump outside the instruction buffer 124, the new BYTES AHEAD value is generated by subtracting AIP[2:0]_{T+1} from the number of instruction bytes being loaded into the instruction buffer 124. If no instruction bytes are being loaded into the instruction buffer 124 on the current clock cycle, the BYTES AHEAD value will initially be negative, and will become positive as the corresponding branch fetch is performed. Thus, for example, if a branch outside the instruction buffer 124 is taken to a jump address of xxxxxx02_{16} (x = "don't care"), and no instruction bytes are loaded during the current cycle, BYTES AHEAD will initially be -2. If 8 bytes of code from the branch fetch are loaded into the instruction buffer 124 on the following clock cycle, the BYTES AHEAD value will be incremented to 6, indicating that six instruction bytes are ahead of the new read pointer value of AIP[4:0] = 00010_2.
Note that in Table 1, the combination 1,0 does not occur because BROUT and HIT_{T-1} cannot be active during the same cycle.
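The BYTES AHEAD calculation can be summarized as the sum of three selected terms. The Python sketch below is an illustrative model only: the function and parameter names are hypothetical, ordinary signed integers stand in for the two's complement circuits 310 and 330, and the delayed HIT signal is passed as a boolean that is True when the line is high (inactive).

```python
# Multiplexer 318: DAV[1:0] encoding to the number of bytes being loaded.
DAV_TO_BYTES = {0b00: 0, 0b01: 4, 0b10: 4, 0b11: 8}


def next_bytes_ahead(bytes_ahead: int, dav: int, brout: bool,
                     hit_prev_high: bool, bytes_used: int,
                     disp_prev: int, aip_next: int) -> int:
    """Model of the adder 300: next BYTES AHEAD from three components."""
    # Multiplexer 304: feedback value, or -AIP[2:0] on a branch outside.
    base = -(aip_next & 0x7) if brout else bytes_ahead
    loaded = DAV_TO_BYTES[dav & 0b11]
    # Multiplexer 334, per Table 1.
    if brout:
        if not hit_prev_high:
            raise ValueError("BROUT and a hit cannot be active together")
        consumed = 0
    elif hit_prev_high:
        consumed = bytes_used          # sequential execution
    else:
        consumed = disp_prev           # signed displacement of the jump
    return base + loaded - consumed
```

Reproducing the example above: a branch outside the buffer to an address ending in 02 with no bytes loaded gives `next_bytes_ahead(12, 0b00, True, True, 0, 0, 0x02) == -2`, and loading 8 bytes on the next cycle (with no bytes used) gives `next_bytes_ahead(-2, 0b11, False, True, 0, 0, 0x02) == 6`.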
8. Calculation of BYTES BEHIND Offset
The BYTES BEHIND relative offset is similarly calculated as the summation of three components. Referring to the multiplexer 362, when the BROUT signal line 299 is low the multiplexer 362 selects the current BYTES BEHIND value on the bus 370. Thus, the current BYTES BEHIND value is used as a first component of the next BYTES BEHIND value if no branch outside the instruction buffer 124 is currently being executed. When the BROUT signal line 299 is high, indicating that a branch outside the instruction buffer 124 is being executed, the multiplexer 362 selects the bus 368.
A second component of the addition is provided as the output of the multiplexer 334 on the bus 336. The value selected by the multiplexer 334 is routed to the adder 340 without being passed through the two's complement circuit 330. Thus, on clock cycles for which BYTES AHEAD is decremented by BYTES USED, BYTES BEHIND is incremented by BYTES USED. On clock cycles for which the relative displacement DISP[7:0] is subtracted from BYTES AHEAD, DISP[7:0] is added to BYTES BEHIND.
The third component of the addition is provided as the two's complement of the output of the multiplexer 350. When the PRE-FETCH line 358 is high, indicating that a pre-fetch is being initiated by the bus/cache unit 130 (Fig. 1), the multiplexer 350 selects the "16" bus 356. Thus, 16_{10} is subtracted from the current BYTES BEHIND value whenever a pre-fetch is initiated, to indicate that 16 bytes of instruction data in the instruction buffer 124 will be overwritten during the subsequent clock cycles as the pre-fetch is performed. When the PRE-FETCH line 358 is low, the zero bus 354 is selected, resulting in a zero on the bus 348.
When the BROUT signal line 299 is high, indicating a branch to a target instruction outside the instruction buffer 124, the PRE-FETCH line 358 will be low (i.e., pre-fetches are blocked when jumps outside the instruction buffer 124 occur). Thus, the next BYTES BEHIND value following the branch will be AIP[3:0]_{T+1}. For example, if a jump to a jump address of xxxxxx15_{16} is executed, the new BYTES BEHIND value will be 5, indicating that 5 of the 16 instruction bytes (in the 16-byte line to be fetched from xxxxxx10_{16} to xxxxxx1F_{16}) will fall behind the new read pointer value of AIP[4:0] = 15_{16}. Since this new BYTES BEHIND value represents the number of bytes that will fall behind the read pointer once the fetch data has been loaded into the instruction buffer 124, logic (not shown) is included to inhibit the BYTES BEHIND comparison until the last clock cycle of the branch fetch.
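The BYTES BEHIND calculation can be sketched in the same way. Again this is a hedged illustration rather than the disclosed circuit: the names are hypothetical, signed integers replace the two's complement circuit 346, the delayed HIT signal is a boolean that is True when the line is high (inactive), and the logic that inhibits the comparison during a branch fetch is not modeled.

```python
def next_bytes_behind(bytes_behind: int, brout: bool, hit_prev_high: bool,
                      bytes_used: int, disp_prev: int,
                      pre_fetch: bool, jmp_addr: int) -> int:
    """Model of the adder 340: next BYTES BEHIND from three components."""
    # Multiplexer 362: JMP ADDR[3:0] on a branch outside, else feedback.
    base = (jmp_addr & 0xF) if brout else bytes_behind
    # Multiplexer 334 output, added without complementing.
    if brout:
        advanced = 0
    elif hit_prev_high:
        advanced = bytes_used          # sequential execution
    else:
        advanced = disp_prev           # signed displacement of the jump
    # Multiplexer 350: 16 bytes will be overwritten by a pre-fetch.
    overwritten = 16 if pre_fetch else 0
    return base + advanced - overwritten
```

For the example above, a jump outside the buffer to an address ending in 15 (hexadecimal) gives `next_bytes_behind(20, True, True, 0, 0, False, 0x15) == 5`. A later pre-fetch with 24 bytes behind and 2 bytes used would leave 10 bytes behind.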
9. Alternative Embodiments of Instruction Buffer Circuit
An alternative embodiment of the instruction buffer circuit of Fig. 2 will now be described with reference to Fig. 4. This embodiment illustrates two variations that can be made to the instruction buffer circuit. The first variation involves the use of a register-based instruction buffer in place of the randomly-accessible-memory based instruction buffer 124 of Fig. 2. The second variation, which can be made independently from the first variation, is the replacement of the BYTES AHEAD and BYTES BEHIND relative offset values with relative offset values that indicate the numbers of doublewords (i.e., four-byte values) ahead and behind the instruction currently referenced by the AIP. In describing the embodiment shown in Fig. 4, like reference numbers will be used to refer to elements that are functionally similar to the elements shown in Fig. 2.
Referring to Fig. 4, the instruction buffer 124 is shown as a register-based buffer comprising four 8-byte registers 124a, 124b, 124c and 124d. The code data bus 150 is connected to the register 124a. The register 124a is connected to the register 124b by a bus 400. The register 124b is connected to the register 124c by a bus 402. The register 124c is connected to the register 124d by a bus 404. Each least significant byte of the registers 124a-124d is connected as an input to a first 4:1 multiplexer 408a. The second byte of each of the registers 124a-124d is connected as an input to a second 4:1 multiplexer (not shown). The third byte of each of the registers 124a-124d is connected as an input to a third 4:1 multiplexer (not shown). The fourth byte of each of the registers 124a-124d is connected as an input to a fourth 4:1 multiplexer (not shown). The fifth byte of each of the registers 124a-124d is connected as an input to a fifth 4:1 multiplexer (not shown). The sixth byte of each of the registers 124a-124d is connected as an input to a sixth 4:1 multiplexer (not shown). The seventh byte of each of the registers 124a-124d is connected as an input to a seventh 4:1 multiplexer (not shown). The eighth byte of each of the registers 124a-124d is connected as an input to an eighth 4:1 multiplexer 408h. The outputs of the eight 4:1 multiplexers (e.g., 408a and 408h) are each connected to 8 of the bit lines of the 64-bit bus 204. The 64-bit bus is connected to the byte barrel shifter 202, as in Fig. 2. The multiplexers (e.g., 408a and 408h and the other multiplexers, not shown) each have respective control inputs connected to a control logic circuit 409 by respective pairs of select lines (e.g., 410a, 410b and 410h). The control logic circuit 409 has a first input that is connected to a 3-bit DWORDS AHEAD bus 412.
The DWORDS AHEAD bus 412 provides a DWORDS (doublewords) AHEAD_{T+1} value that indicates the number of doublewords that will be in the instruction buffer 124 on the following clock cycle that fall ahead of the doubleword selected by the multiplexers (e.g., 408a and 408h). The control logic circuit 409 has a second input that is connected to the AIP[2:0] lines of the bus 123 (Fig. 2). The control logic circuit 409 has a third input that is connected to the DISP[7:0] lines of the bus 144.
The comparator 240 now has a first input that is connected to the DISP[7:2] lines of the DISP bus 144, and a second input that is connected to a DWORDS (doublewords) BEHIND bus 416, which provides a DWORDS BEHIND_{T+1} relative offset value. The DWORDS BEHIND value is equal to the number of doublewords in the instruction buffer 124 that are behind the doubleword selected by the multiplexers 408a-408h. Thus, DWORDS BEHIND_{T+1} is the DWORDS BEHIND value for the following clock cycle. The comparator 250 now has a first input that is connected to the DISP[7:2] lines of the bus 144, and a second input connected to the DWORDS AHEAD bus 412. The instruction buffer circuit of Fig. 4 is otherwise substantially identical to the circuit of Fig. 2.
Instruction data loaded into the instruction buffer 124 from the code data bus 150 is loaded into the register 124a. With successive load operations the code data is shifted to the next register (124b, 124c and 124d) in sequence. Code data in the register 124d is overwritten in the instruction buffer 124 with the following load operation. On every clock cycle, the multiplexers (e.g., 408a and 408h) select eight bytes of contiguous code data from the registers 124a-124d as the source of the code data to be decoded during the current clock cycle. The control logic circuit 409 controls each of the multiplexers (e.g., 408a and 408h) based on the DWORDS AHEAD_{T+1}, AIP[2:0] and DISP[7:0] values. The control logic circuit 409 keeps track of the current location within the instruction buffer 124 from which instruction data is being read. In the event of a hit, the control logic circuit 409 uses the DISP[7:0] value on the bus 144 to determine the registers 124a-124d and the byte locations within the registers from which to perform the next read. The DWORDS AHEAD_{T+1} input on the bus 412 allows the control logic 409 to keep track of the status of the instruction buffer 124 (empty, full, partially full, etc.).
The control logic circuit 409 can advantageously be designed to read code data during sequential program execution from the registers 124b and 124c (when possible) once the instruction buffer 124 becomes full, to thereby maintain at least two doublewords ahead and two doublewords behind the current point of execution. This assures that a branch forward by 8 bytes or less or a branch backward by 8 bytes or less can be performed without re-loading the instruction buffer 124 (provided that the comparison circuit is currently enabled). The use of the eight separate multiplexers (e.g., 408a and 408h) advantageously allows code data to be read from the instruction buffer 124 on any byte boundary.
When a branch short relative instruction is decoded, the comparators 240 and 250 of Fig. 4 compare DISP[7:2] to DWORDS BEHIND_{T+1} and DWORDS AHEAD_{T+1}, to generate the HITA and HITB signals on the lines 246 and 256 respectively according to the following equations:
HITA = BRSHRT and DISP[7] and (|DISP[7:2]| ≤ DWORDS BEHIND_{T+1})
HITB = BRSHRT and (not DISP[7]) and (DISP[7:2] ≤ DWORDS AHEAD_{T+1})
These comparisons advantageously result in a reduction in logic over the comparisons performed for the circuit of Fig. 2. The HITA and HITB signal lines 246 and 256 are used to generate the FLSH/BR-FETCH signal on the line 278 and the BROUT signal on the line 299 in the same manner described above for Fig. 2.
Fig. 5 illustrates a circuit for generating the DWORDS AHEAD and DWORDS BEHIND values used by the circuit of Fig. 4. The circuit is identical to the circuit of Fig. 3 with the following exceptions. The registers 302 and 342 now hold 3-bit DWORDS AHEAD and DWORDS BEHIND values rather than the BYTES AHEAD and BYTES BEHIND values. The multiplexer 304 now selects between the output of the register 302 and a zero bus 512. The multiplexer 318 now selects between the zero bus 322, a "one" bus 524 and a "two" bus 526, corresponding to the three possible quantities of doublewords that may be loaded into the instruction buffer 124 during a given clock cycle. The multiplexer 334 now has a data input that is connected to the output of a doubleword boundary-cross logic circuit 515 by a bus 510. The circuit 515 has a first input that is connected to the AIP[2:0] lines of the bus 123, and a second input that is connected to the BYTES USED bus 210. The circuit 515 outputs a value that is equal to the number of doubleword boundaries crossed during the current cycle. For example, for a current AIP value of xxxxxx02_{16} and a BYTES USED value of 5, the circuit 515 will output a "1" on the bus 510, indicating that a single doubleword boundary is crossed. The multiplexer 334 now has an input DISP[7:2]_{T-1} on the bus 337. The multiplexer 350 now selects between the zero bus 354 and a "4" bus 556 ("4" corresponding to the number of doublewords fetched whenever a pre-fetch is performed). The multiplexer 362 now selects between a zero bus 568 and the feedback path 370.
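One plausible reading of the doubleword boundary-cross logic circuit 515 is sketched below in Python. The function name is hypothetical, and the interpretation is an assumption: a boundary is counted each time the read position, advancing from AIP[2:0] by BYTES USED bytes, reaches a multiple of four.

```python
def dword_boundaries_crossed(aip: int, bytes_used: int) -> int:
    """Count 4-byte boundaries reached when advancing by bytes_used."""
    offset = aip & 0x7                 # AIP[2:0]
    # Doubleword index after the advance minus the index before it.
    return (offset + bytes_used) // 4 - offset // 4
```

Under this reading, the example in the text (an AIP ending in 02 and a BYTES USED value of 5) advances the read position from byte 2 to byte 7 and crosses one boundary, so `dword_boundaries_crossed(0x02, 5) == 1`.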
The operation of the circuit of Fig. 5 is analogous to the operation of the circuit of Fig. 3. Whenever an instruction buffer load is performed, the multiplexer 318 is controlled to add either 1 or 2 to the current DWORDS AHEAD value, corresponding to the number of doublewords being loaded into the instruction buffer 124. When an instruction is executed that does not cause a jump, the multiplexer 334 selects the output of the circuit 515 to decrement DWORDS AHEAD and increment DWORDS BEHIND by the number of doubleword boundaries crossed. When a branch within the instruction buffer is executed, the multiplexer 334 selects the bus 337 to subtract DISP[7:2] from the current DWORDS AHEAD value and add DISP[7:2] to the current DWORDS BEHIND value. When a pre-fetch is initiated, the multiplexer 350 selects the "4" bus 556 to decrement DWORDS BEHIND by four. When a branch outside the instruction buffer 124 is executed, the multiplexers 304, 318, 334, 350 and 362 all select their respective zero busses 512, 322, 328, 354 and 568 to reset DWORDS AHEAD and DWORDS BEHIND to zero.
The circuits and methods that have been described for performing a branch within an instruction buffer have been presented by way of example only, and not limitation. Thus, the breadth and scope of the present invention should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.

Claims

WHAT IS CLAIMED IS:
1. An instruction buffer circuit that uses relative offset values to perform a jump within an instruction buffer without performing a fetch, comprising: an instruction buffer; a read pointer that provides an address for reading instruction data from said instruction buffer; a first relative offset circuit that generates a first relative offset value, said first relative offset value indicating a number of instruction bytes in said instruction buffer that are ahead of said address provided by said read pointer; a second relative offset circuit that generates a second relative offset value, said second relative offset value indicating a number of instruction bytes in said instruction buffer that are behind said address provided by said read pointer; a compare circuit that compares a relative displacement value for a branch instruction to said first relative offset value or to said second relative offset value, to thereby determine whether a target instruction for said branch instruction can be read from said instruction buffer; and a modify circuit that modifies said read pointer to point to said target instruction when said target instruction can be read from said instruction buffer.
2. The instruction buffer circuit as defined in Claim 1, wherein said instruction buffer acts as a first-in-first-out instruction buffer during sequential program execution.
3. The instruction buffer circuit as defined in Claim 1, further comprising an inhibit circuit that inhibits said compare circuit after a memory write operation.
4. A method of branching to a target instruction in an instruction buffer to avoid having to re-load said instruction buffer, said instruction buffer having a read pointer that provides an instruction buffer address for reading instruction data from said instruction buffer, said method comprising the steps of: generating a relative offset value that indicates the number of instruction bytes in said instruction buffer that fall ahead of said instruction buffer address provided by said read pointer; comparing said relative offset value to a relative displacement value for a branch instruction to determine whether a target instruction of said branch instruction is in said instruction buffer; and incrementing said read pointer to the instruction buffer address of the target instruction without flushing said instruction buffer when said target instruction is in said instruction buffer to thereby effect a forward branch within said instruction buffer.
5. The method as defined in Claim 4, wherein said step of incrementing said read pointer is performed by adding said relative displacement value to the current read pointer value.
6. The method as defined in Claim 4, further comprising the steps of: generating a second relative offset value that indicates the number of instruction bytes in said instruction buffer that fall behind said instruction buffer address provided by said read pointer; comparing said second relative offset value to said relative displacement value for a branch instruction to determine whether a target instruction of said branch instruction is in said instruction buffer; and decrementing said read pointer to the instruction buffer address of the target instruction when said target instruction is in said instruction buffer to thereby effect a backward branch within said instruction buffer.
7. An instruction buffer circuit for a microprocessor, comprising: an addressable instruction array that comprises a plurality of storage locations that store a plurality of instruction bytes, said storage locations corresponding to sequential address locations in a memory from which said instruction bytes were transferred; a current location pointer that points to one of said storage locations in said instruction array for a current instruction byte; a circuit that generates a first relative value indicator that indicates the number of storage locations in said instruction array that correspond to instruction bytes from sequential address locations in said memory ahead of said current instruction byte; a circuit that generates a second relative value indicator that indicates the number of storage locations in said instruction array that correspond to instruction bytes from sequential address locations in said memory behind said current instruction byte; a comparison circuit that compares a relative displacement value in a branch instruction executed by said microprocessor with said first and second relative value indicators to determine whether a next instruction byte at an address resulting from said branch instruction is currently stored in said instruction array; and a selection circuit that selects a byte from said instruction array as said next instruction byte when said comparison circuit indicates that said next instruction byte is currently stored in said instruction array.
PCT/US1995/001705 1994-02-08 1995-02-08 Randomly-accessible instruction buffer for microprocessor WO1995022101A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US19329694A 1994-02-08 1994-02-08
US08/193,296 1994-02-08

Publications (1)

Publication Number Publication Date
WO1995022101A1 (en) 1995-08-17

Family

ID=22713034

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1995/001705 WO1995022101A1 (en) 1994-02-08 1995-02-08 Randomly-accessible instruction buffer for microprocessor

Country Status (2)

Country Link
TW (1) TW234175B (en)
WO (1) WO1995022101A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1252567A1 (en) * 2000-01-31 2002-10-30 Intel Corporation Method and apparatus for loop buffering digital signal processing instructions
US7249248B2 (en) 2002-11-25 2007-07-24 Intel Corporation Method, apparatus, and system for variable increment multi-index looping operations

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4200927A (en) * 1978-01-03 1980-04-29 International Business Machines Corporation Multi-instruction stream branch processing mechanism
US4363091A (en) * 1978-01-31 1982-12-07 Intel Corporation Extended address, single and multiple bit microprocessor
US4992932A (en) * 1987-12-29 1991-02-12 Fujitsu Limited Data processing device with data buffer control
US5226126A (en) * 1989-02-24 1993-07-06 Nexgen Microsystems Processor having plurality of functional units for orderly retiring outstanding operations based upon its associated tags

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4200927A (en) * 1978-01-03 1980-04-29 International Business Machines Corporation Multi-instruction stream branch processing mechanism
US4363091A (en) * 1978-01-31 1982-12-07 Intel Corporation Extended address, single and multiple bit microprocessor
US4449184A (en) * 1978-01-31 1984-05-15 Intel Corporation Extended address, single and multiple bit microprocessor
US4992932A (en) * 1987-12-29 1991-02-12 Fujitsu Limited Data processing device with data buffer control
US5226126A (en) * 1989-02-24 1993-07-06 Nexgen Microsystems Processor having plurality of functional units for orderly retiring outstanding operations based upon its associated tags

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1252567A1 (en) * 2000-01-31 2002-10-30 Intel Corporation Method and apparatus for loop buffering digital signal processing instructions
EP1252567A4 (en) * 2000-01-31 2005-02-02 Intel Corp Method and apparatus for loop buffering digital signal processing instructions
US7249248B2 (en) 2002-11-25 2007-07-24 Intel Corporation Method, apparatus, and system for variable increment multi-index looping operations

Also Published As

Publication number Publication date
TW234175B (en) 1994-11-11

Similar Documents

Publication Publication Date Title
US5649145A (en) Data processor processing a jump instruction
EP0380850B1 (en) Method and digital computer for preproccessing multiple instructions
EP1116103B1 (en) Mechanism for store-to-load forwarding
US5604909A (en) Apparatus for processing instructions in a computing system
US6085315A (en) Data processing device with loop pipeline
US6237074B1 (en) Tagged prefetch and instruction decoder for variable length instruction set and method of operation
US20200364054A1 (en) Processor subroutine cache
US6865667B2 (en) Data processing system having redirecting circuitry and method therefor
US8171240B1 (en) Misalignment predictor
US7865699B2 (en) Method and apparatus to extend the number of instruction bits in processors with fixed length instructions, in a manner compatible with existing code
KR100385495B1 (en) Processing system with word alignment branch target
US5404471A (en) Method and apparatus for switching address generation modes in CPU having plural address generation modes
WO1995022101A1 (en) Randomly-accessible instruction buffer for microprocessor
US7020769B2 (en) Method and system for processing a loop of instructions
US6192449B1 (en) Apparatus and method for optimizing performance of a cache memory in a data processing system
JP2928879B2 (en) Data processing device
US5345568A (en) Instruction fetch circuit which allows for independent decoding and execution of instructions
EP0771442A1 (en) Instruction memory limit check in microprocessor
KR20000060415A (en) Stack with head stack pointer and tail stack pointer
JPH0769808B2 (en) Data processing device

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): CN JP KR

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH DE DK ES FR GB GR IE IT LU MC NL PT SE

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
122 Ep: pct application non-entry in european phase