WO2005096141A2 - Apparatus and method for asymmetric dual path processing - Google Patents

Apparatus and method for asymmetric dual path processing

Info

Publication number
WO2005096141A2
Authority
WO
WIPO (PCT)
Prior art keywords
instruction
data
instructions
control
Prior art date
Application number
PCT/GB2005/001069
Other languages
French (fr)
Other versions
WO2005096141A3 (en)
Inventor
Simon Knowles
Original Assignee
Icera Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Icera Inc.
Priority to CA002560469A (CA2560469A1)
Priority to EP05729258.3A (EP1735697B1)
Priority to JP2007505614A (JP5744370B2)
Publication of WO2005096141A2
Publication of WO2005096141A3

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30145 Instruction analysis, e.g. decoding, instruction word fields
    • G06F9/30032 Movement instructions, e.g. MOVE, SHIFT, ROTATE, SHUFFLE
    • G06F9/30036 Instructions to perform operations on packed data, e.g. vector, tile or matrix operations
    • G06F9/30127 Register windows
    • G06F9/3013 Organisation of register space, e.g. banked or distributed register file according to data content, e.g. floating-point registers, address registers
    • G06F9/30149 Instruction analysis, e.g. decoding, instruction word fields of variable length instructions
    • G06F9/30181 Instruction operation extension or modification
    • G06F9/3853 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution of compound instructions
    • G06F9/3885 Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units
    • G06F9/3887 Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled by a single instruction for multiple data lanes [SIMD]
    • G06F9/3891 Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled by multiple instructions, e.g. MIMD, decoupled access or execute organised in groups of units sharing resources, e.g. clusters
    • G06F9/3897 Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled in tandem, e.g. multiplier-accumulator for complex operations, e.g. multidimensional or interleaved address generators, macros with adaptable data path

Definitions

  • This invention relates to a computer processor, a method of operating the same, and a
  • Dual execution path processors can operate
  • SIMD single instruction multiple data
  • Typical dual execution path processors use two substantially identical channels, so that each channel handles both control code and datapath code. While
  • a computer processor In one embodiment according to the invention, there is provided a computer processor.
  • the computer processor comprises: a decode unit for decoding instruction packets fetched from a
  • each channel comprising a plurality of functional units, wherein the first processing channel is
  • control register file capable of performing control operations and comprises a control register file having a relatively
  • the second processing channel is capable of performing data processing
  • the decode unit is operable to detect for each instruction
  • the first processing channel may further comprise a
  • the second processing channel may further comprise a fixed data execution unit and a configurable data execution unit.
  • the fixed data execution unit and the configurable data execution unit may both operate according to a single instruction multiple data format.
  • the first and second processing channels may share a load store unit.
  • load store unit may use control information supplied by the first processing channel and data
  • the instruction packets may be all of equal bit length, such as a 64-bit length.
  • the control instructions may be all of a bit length between 18 and 24 bits, such as a 21-bit length.
  • the nature of each instruction in an instruction packet may be selected at least from a control instruction, a data instruction, and a memory access instruction.
  • the bit length of each data instruction may be, for example, 34 bits; and the bit length of each memory access instruction may be, for example, 28 bits.
  • the decode unit may be operable to supply the first processing
  • the decode unit may be operable to supply the second processing channel with at least the data instruction whereby the two instructions are
  • the decode unit may be operable to read the values of a set of
  • packet defines a plurality of instructions of which at least one is a data instruction, the nature of
  • each of the two instructions selected from: a control instruction; a data instruction; and a memory
  • the configurable data execution unit may be capable of executing more than
  • the method comprises: decoding an instruction packet to detect whether the instruction packet defines a plurality of control instructions of equal length or two instructions comprising at least one data instruction, at
  • instruction packets including a first type of instruction packet comprising a plurality of control
  • instruction packet is executed by a dedicated data processing channel, the dedicated control
  • the first processing channel comprises a control register file having a relatively narrower bit width and the second processing channel
  • the method comprises: fetching a sequence of instruction packets from a program memory, all of said instruction packets containing a set of designated bits at predetermined bit locations; decoding each instruction
  • said decoding step including reading the values of said designated bits to determine: a)
  • packet defines a plurality of control instructions or a plurality of instructions of which at least one
  • FIG. 1 is a block diagram of an asymmetric dual execution path computer processor
  • Fig. 2 shows exemplary classes of instructions for the processor of Fig. 1, according to an
  • Fig. 3 is a schematic showing components of a configurable deep execution unit, in
  • FIG. 1 is a block diagram of an asymmetric dual path computer processor, according to an
  • the processor of Fig. 1 divides processing of a single instruction
  • control execution path 102 which is dedicated to processing control code
  • data execution path 103 which is dedicated to
  • control code favors fewer, narrower registers, is difficult to parallelize, is typically (but not exclusively) written in C code or another high-level language, and its code density is
  • execution paths 102 and 103 are dedicated to handling the two different types of code, with each
  • control register file 104 and data register
  • control registers are of narrower width by number of bits (in one example, 32-bits), and the data registers are of wider width (in one example, 64-bits)
  • the processor is therefore asymmetric, in that its two execution paths are
  • the instruction stream 100 is made up of a series of instruction
  • Each instruction packet supplied is decoded by an instruction decode unit 101, which separates control instructions from data instructions, as described further below.
  • execution path 102 handles control-flow operations for the instruction stream, and manages the
  • branch unit 106 and execution unit 107 is in accordance with conventional processor design
  • the data execution path 103 employs SIMD (single instruction multiple data) parallelism,
  • the configurable deep execution unit 110 provides a depth dimension of
  • decoded instruction defines an instruction with either a fixed
  • instruction is a fixed or configurable data processing instruction, and in the case of a configurable
  • instruction further designated bits define configuration information. In dependence on the sub
  • data is supplied to either the fixed or the
  • configuration of an operator is effective to cause an operator (i) to perform a certain type of
  • control switching configurations associated with the data path In a preferred embodiment, at
  • control and data processing instructions can define memory access (load/store) and basic arithmetic operations.
  • the inputs/operands for control operations may be supplied to/from the
  • processing operation can be a vector.
  • circuitry of the configurable data path can be regarded as configurable to perform vector
  • a 64-bit vector input to a data processing operation may include four 16-bit scalar
  • a "vector" is an assembly of scalar operands.
  • Vector arithmetic may be
  • scalar operands may include steering, movement, and
  • a vector operation may have both a scalar and at least one vector as inputs; and output
  • control instructions include instructions dedicated to program flow, and branch and address generation; but not data processing.
  • Data processing instructions include
  • Data processing instructions may operate on multiple data instructions, for example in SIMD processing, or in processing wider, short vectors of data elements.
  • control instructions and data instructions do not overlap; however, a
  • Fig. 2 shows three types of instruction packet for the processor of Fig. 1. Each type of
  • Instruction packet is 64-bits long.
  • Instruction packet 211 is a 3 -scalar type, for dense control code, and includes three 21-bit control instructions (c21).
  • Instruction packets 212 and 213 are LIW (long instruction word) type, for parallel execution of datapath code. In this example each
  • instruction packet 212, 213 includes two instructions but different numbers may be included if
  • Instruction packet 212 includes a 34-bit data instruction (d34) and a 28-bit memory instruction (m28); and is used for parallel execution of data-side arithmetic (the d34 instruction)
  • Instruction packet 213 includes a 34-bit data instruction (d34) and a 21-bit control instruction (c21); and is used for parallel execution of data-side arithmetic (the d34 instruction) with a control-side operation (the c21 instruction), such as a control-side
  • Instruction decode unit 101 of the embodiment of Fig. 1 uses the initial identification bits, or some other designated identification bits at predetermined bit locations, of each instruction
  • initial indicator bit "1" signifies that an instruction packet is of a scalar control instruction type
  • decode unit 101 of Fig. 1 passes the instructions of each packet appropriately to either the control
  • processor of the embodiment of Fig.1 fetches program packets from memory sequentially; and the program packets are executed sequentially.
  • the instructions of packet 211 are executed sequentially, with the 21-bit control instruction at the least significant end of the 64-bit word being executed first, then the next 21-bit control instruction, and then the
  • packet can be executed either sequentially, for packet type 211, or simultaneously, for packet
  • instruction packets of types 212 and 213 are abbreviated as MD and
  • CD-packets respectively (containing one memory and one data instruction; and one control instruction and one data instruction, respectively).
  • Fig. 1 overcomes a number of
  • processors that support a combination of 32-bit standard encoding for data instructions and 16-bit
  • instruction signatures may be any of the following, where C-format, M- format, and D-format signify control, memory access, and data format respectively:
  • the C-format instructions all of the instructions
  • control instructions may
  • Memory instructions may provide
  • registers to data registers; and immediate to register instructions.
  • the processor of Fig. 1 features a
  • the first data path has a fixed SIMD execution unit split into lanes in a similar fashion to conventional SIMD
  • the second data path has a configurable deep execution unit 110. "Deep execution" refers to the ability of a processor to perform multiple consecutive operations on the
  • Deep execution may also be
  • a conventional two-operand addition, which has one result, is not an example of this type of deep execution, because the number of operands is not equal to the number of results; whereas convolution, Fast Fourier Transforms, Trellis/Viterbi encoding, correlators, finite impulse response filters, and other signal
  • processing algorithms are examples of deep execution in accordance with preferred
  • DSP digital signal processing
  • register-mapped general purpose DSPs do not perform deep execution, instead executing
  • Fig. 1 provides a register-mapped general purpose processor that is capable of deep execution of
  • data format instructions contain bit positions
  • the deep execution unit 110 executes instructions
  • Deep execution adds a depth dimension to the parallelism of execution, which is orthogonal to the width dimension offered by the earlier
  • Fig. 3 shows the components of an exemplary configurable deep execution unit 310, in
  • the configurable deep execution unit 110 is part of the data execution path 103, and may therefore be instructed by data-side instructions from the MD and CD-instruction packets 212 and 213 of Fig. 2.
  • an instruction 314 and operands 315 are supplied to the deep execution unit 310 from instruction decode unit 101 and data register file 105 of Fig. 1.
  • a multi-bit configuration code in the instruction 314 is used to access a control map 316, which expands the multi-bit code into a
  • control map 316 may, for example, be embodied as a look-up table, in which different
  • a crossbar interconnect 317 configures a set of operators 318-321 in whatever arrangement is necessary to execute the operator configuration indicated by the multi-
  • the operators may include, for example, a multiply operator 318, an
  • ALU arithmetic logic unit
  • the deep execution unit contains fifteen operators: one multiply operator 318,
  • a second crossbar interconnect 322 which may supply the operands to appropriate
  • the second crossbar interconnect 322 also receives a feedback 324 of intermediate results from the operators 318-321, which may then in turn also be supplied to the
  • interconnect 323 multiplexes the results from the operators 318-321, and outputs a final result
  • control map 316 of the embodiment of Fig. 3 need not necessarily be embodied as a single look-up table, but may be embodied as a series of two or more cascaded look-up tables.
  • up table could point from a given multi-bit instruction code to a second look-up table, thereby
  • the first look-up table could be organized into libraries of
  • operators 319 are pre-configured as ALU
  • operators are pre-configured as state operators; and operators 321 are pre-
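
As an illustration of the 64-bit vector input mentioned above, which may include four 16-bit scalar operands, the following Python sketch models a lane-wise addition on two such vectors. It is a minimal sketch only: the function names, the least-significant-first lane order, and the wrap-around behaviour on overflow are assumptions made for illustration and are not taken from the specification.

```python
MASK16 = 0xFFFF  # one 16-bit lane


def split_lanes(vec64):
    """Split a 64-bit integer into four 16-bit lanes (assumed order: least significant first)."""
    return [(vec64 >> (16 * i)) & MASK16 for i in range(4)]


def join_lanes(lanes):
    """Pack four 16-bit lanes back into one 64-bit integer."""
    out = 0
    for i, lane in enumerate(lanes):
        out |= (lane & MASK16) << (16 * i)
    return out


def simd_add16(a, b):
    """Add two 64-bit vectors lane by lane; each lane wraps modulo 2**16 (an assumption)."""
    sums = [(x + y) & MASK16 for x, y in zip(split_lanes(a), split_lanes(b))]
    return join_lanes(sums)


a = join_lanes([1, 2, 3, 0xFFFF])
b = join_lanes([10, 20, 30, 1])
result = split_lanes(simd_add16(a, b))
```

Because each lane is masked separately, an overflow in one lane does not carry into its neighbour, which is the property that distinguishes a lane-wise operation from a single 64-bit addition.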

Abstract

According to embodiments of the invention, there is disclosed a computer processor architecture; and in particular a computer processor, a method of operating the same, and a computer program product that makes use of an instruction set for the computer. In one embodiment according to the invention, there is provided a computer processor, the processor comprising: a decode unit for decoding instruction packets fetched from a memory holding a sequence of instruction packets; and first and second processing channels, each channel comprising a plurality of functional units, wherein the first processing channel is capable of performing control operations and comprises a control register file having a relatively narrower bit width, and the second processing channel is capable of performing data processing operations at least one input of which is a vector and comprises a data register file having a relatively wider bit width. The decode unit is operable to detect for each instruction packet whether the instruction packet defines (i) a plurality of control instructions to be executed sequentially on the first processing channel or (ii) a plurality of instructions comprising at least one data processing instruction to be executed simultaneously on the second execution channel, and to control the first and second channels in dependence on said detection.

Description

APPARATUS AND METHOD FOR ASYMMETRIC DUAL PATH PROCESSING

TECHNICAL FIELD
This invention relates to a computer processor, a method of operating the same, and a
computer program product comprising an instruction set for the computer.

BACKGROUND
In order to increase the speed of computer processors, prior art architectures have used
dual execution paths for executing instructions. Dual execution path processors can operate
according to a single instruction multiple data (SIMD) principle, using parallelism of operations
to increase processor speed.
However, despite use of dual execution paths and SIMD processing, there is an ongoing
need to increase processor speed. Typical dual execution path processors use two substantially identical channels, so that each channel handles both control code and datapath code. While
known processors support a combination of 32-bit standard encoding and 16-bit "dense"
encoding, such schemes suffer from several disadvantages, including a lack of semantic content
in the few bits available in a 16-bit format.
Furthermore, conventional general purpose digital signal processors are not able to match application specific algorithms for many purposes, including performing specialized operations
such as convolution, Fast Fourier Transforms, Trellis/Viterbi encoding, correlation, finite
impulse response filtering, and other operations.

SUMMARY
In one embodiment according to the invention, there is provided a computer processor.
The computer processor comprises: a decode unit for decoding instruction packets fetched from a
memory holding a sequence of instruction packets; and first and second processing channels,
each channel comprising a plurality of functional units, wherein the first processing channel is
capable of performing control operations and comprises a control register file having a relatively
narrower bit width, and the second processing channel is capable of performing data processing
operations at least one input of which is a vector and comprises a data register file having a
relatively wider bit width; wherein the decode unit is operable to detect for each instruction
packet whether the instruction packet defines (i) a plurality of control instructions to be executed
sequentially on the first processing channel or (ii) a plurality of instructions comprising at least
one data processing instruction to be executed simultaneously on the second execution channel,
and to control the first and second channels in dependence on said detection.
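
The detection performed by the decode unit can be modelled, purely for illustration, by the following Python sketch. A packet that defines only control instructions is issued one instruction per step, while a packet that includes at least one data instruction is issued as a single step. The tags "c", "d" and "m", the mnemonics, and the function name schedule are hypothetical and are not part of the specification.

```python
from typing import List, Tuple

# (nature, mnemonic); nature is "c" (control), "d" (data) or "m" (memory access).
# These tags and all mnemonics below are hypothetical.
Instr = Tuple[str, str]


def schedule(packets: List[List[Instr]]) -> List[List[Instr]]:
    """Return the issue steps; instructions in the same step execute simultaneously."""
    steps: List[List[Instr]] = []
    for packet in packets:
        natures = [nature for nature, _ in packet]
        if all(nature == "c" for nature in natures):
            # Case (i): control instructions are executed sequentially.
            steps.extend([instr] for instr in packet)
        elif "d" in natures:
            # Case (ii): at least one data instruction; issue the packet together.
            steps.append(list(packet))
        else:
            raise ValueError("packet is neither all-control nor contains a data instruction")
    return steps


program = [
    [("c", "cmp"), ("c", "add"), ("c", "br")],
    [("d", "vmul"), ("m", "ld")],
    [("d", "vadd"), ("c", "sub")],
]
steps = schedule(program)
```

In this model the three-instruction control packet occupies three steps and each two-instruction packet occupies one, so the example program takes five steps in total.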
In further related embodiments, the first processing channel may further comprise a
branch unit and a control execution unit. The second processing channel may further comprise a
fixed data execution unit and a configurable data execution unit. The fixed data execution unit
and the configurable data execution unit may both operate according to a single instruction multiple data format. The first and second processing channels may share a load store unit. The
load store unit may use control information supplied by the first processing channel and data
supplied by the second processing channel. The instruction packets may be all of equal bit length, such as a 64-bit length. The control instructions may be all of a bit length between 18 and 24 bits, such as a 21-bit length. The nature of each instruction in an instruction packet may be selected at least from a control instruction, a data instruction, and a memory access instruction. The bit length of each data instruction may be, for example, 34 bits; and the bit length of each
memory access instruction may be, for example, 28 bits.
In further related embodiments, when the decode unit detects that the instruction packet
defines three control instructions, the decode unit may be operable to supply the first processing
channel with the three control instructions whereby the three control instructions are executed
sequentially. Also, when the decode unit detects that the instruction packet defines two
instructions comprising at least one data instruction, the decode unit may be operable to supply the second processing channel with at least the data instruction whereby the two instructions are
executed simultaneously. The decode unit may be operable to read the values of a set of
designated bits at predetermined bit locations in each instruction packet of the sequence, to
determine: a) whether the instruction packet defines a plurality of control instructions or a
plurality of instructions of which at least one is a data instruction; and b) where the instruction
packet defines a plurality of instructions of which at least one is a data instruction, the nature of
each of the two instructions selected from: a control instruction; a data instruction; and a memory
access instruction. The configurable data execution unit may be capable of executing more than
two consecutive operations on the data provided by a single issued instruction before returning a result to a destination register file.
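
The reading of designated bits described above can be sketched in Python using the example bit lengths given (64-bit packets, 21-bit control, 34-bit data and 28-bit memory access instructions). The field layout is an assumption made for illustration only: the text does not fix where the designated bits sit, so this sketch places them at the most significant end, packs the control instructions from the least significant end, and uses hypothetical names throughout.

```python
def bits(word, lo, width):
    """Extract `width` bits of `word` starting at bit position `lo`."""
    return (word >> lo) & ((1 << width) - 1)


def decode_packet(word):
    """Classify a 64-bit packet and split it into instruction fields.

    Assumed layout (not taken from the specification):
      bit 63 == 1         -> three c21 fields at bits 0-20, 21-41, 42-62
      bits 63..62 == 0b01 -> d34 at bits 0-33, m28 at bits 34-61
      bits 63..62 == 0b00 -> d34 at bits 0-33, c21 at bits 34-54
    """
    if not 0 <= word < (1 << 64):
        raise ValueError("packet must fit in 64 bits")
    if bits(word, 63, 1) == 1:
        # a) a plurality of control instructions
        return "control", [("c21", bits(word, 21 * i, 21)) for i in range(3)]
    # b) two instructions, at least one of which is a data instruction
    if bits(word, 62, 1) == 1:
        return "MD", [("d34", bits(word, 0, 34)), ("m28", bits(word, 34, 28))]
    return "CD", [("d34", bits(word, 0, 34)), ("c21", bits(word, 34, 21))]


control_word = (1 << 63) | (3 << 42) | (2 << 21) | 1
md_word = (1 << 62) | (0xABCDEF << 34) | 0x123456789
cd_word = (5 << 34) | 7
```

With these example lengths, three 21-bit control instructions plus one designated bit fill the 64-bit packet exactly, as do a 34-bit data instruction, a 28-bit memory access instruction and two designated bits.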
In another embodiment according to the invention, there is provided a method of
operating a computer processor which comprises first and second processing channels each
comprising a plurality of functional units, wherein the first processing channel comprises a control register file having a relatively narrower bit width and the second processing channel comprises a data register file having a relatively wider bit width. The method comprises: decoding an instruction packet to detect whether the instruction packet defines a plurality of control instructions of equal length or two instructions comprising at least one data instruction, at
least one of which is a vector; when the instruction packet defines a plurality of control
instructions of equal length, supplying the control instructions to the first processing channel whereby the control instructions are executed sequentially; and when the instruction packet
defines a plurality of instructions comprising at least one data instruction, supplying at least the
data instruction to the second processing channel whereby the plurality of instructions are
executed simultaneously.
In another embodiment according to the invention, there is provided a computer program
product comprising program code means which include a sequence of instruction packets, said
instruction packets including a first type of instruction packet comprising a plurality of control
instructions of equal length and a second type of instruction packet comprising a plurality of
instructions including at least one data instruction, wherein the computer program product is
adapted to run on a computer such that the first type of instruction packet is executed by a
dedicated control processing channel, and the at least one data instruction of the second
instruction packet is executed by a dedicated data processing channel, the dedicated control
processing channel having a relatively narrower bit width than the dedicated data processing channel.
In another embodiment according to the invention, there is provided a method of
operating a computer processor which comprises first and second processing channels each
comprising a plurality of functional units, wherein the first processing channel comprises a control register file having a relatively narrower bit width and the second processing channel
comprises a data register file having a relatively wider bit width. The method comprises: fetching a sequence of instruction packets from a program memory, all of said instruction packets containing a set of designated bits at predetermined bit locations; decoding each instruction
packet, said decoding step including reading the values of said designated bits to determine: a)
whether the instruction packet defines a plurality of control instructions or a plurality of
instructions of which at least one is a data instruction; and b) where the instruction packet defines
a plurality of instructions of which at least one is a data instruction, the nature of each of the two
instructions selected at least from: a control instruction; a data instruction; and a memory access
instruction.
In another embodiment according to the invention, there is provided a computer program
product comprising program code means which include a sequence of instruction packets, said
instruction packets including a first type of instruction packet comprising a plurality of control
instructions of substantially equal length and a second type of instruction packet comprising first
and second instructions including at least one data instruction, said instruction packets including
at least one indicator bit at a designated bit location within the instruction packet, wherein the
computer program product is adapted to run on a computer such that said indication bit is
adapted to cooperate with a decode unit of the computer to designate whether: a) the instruction
packet defines a plurality of control instructions or a plurality of instructions of which at least one
is a data instruction; and b) in the case when there is a plurality of instructions comprising at least
one data instruction, the nature of each of the two instructions selected from: a control
instruction; a data instruction; and a memory access instruction.
Additional advantages and novel features of the invention will be set forth in part in the
description which follows, and in part will become apparent to those skilled in the art upon examination of the following and the accompanying drawings; or may be learned by practice of the invention.
BRIEF DESCRIPTION OF THE DRAWINGS
For a better understanding of the present invention, and to show how the same may be
carried into effect, reference will now be made, by way of example only, to the accompanying
drawings, in which:
Fig. 1 is a block diagram of an asymmetric dual execution path computer processor,
according to an embodiment of the invention;
Fig. 2 shows exemplary classes of instructions for the processor of Fig. 1, according to an
embodiment of the invention; and
Fig. 3 is a schematic showing components of a configurable deep execution unit, in
accordance with an embodiment of the invention.
DETAILED DESCRIPTION
Fig. 1 is a block diagram of an asymmetric dual path computer processor, according to an
embodiment of the invention. The processor of Fig. 1 divides processing of a single instruction
stream 100 between two different hardware execution paths: a control execution path 102, which is dedicated to processing control code, and a data execution path 103, which is dedicated to
processing data code. The data widths, operators, and other characteristics of the two execution
paths 102, 103 differ according to the different characteristics of control code and datapath code. Typically, control code favors fewer, narrower registers, is difficult to parallelize, is typically (but not exclusively) written in C code or another high-level language, and its code density is
generally more important than its speed performance. By contrast, datapath code typically favors
a large file of wide registers, is highly parallelizable, is written in assembly language, and its performance is more important than its code density. In the processor of Fig. 1, the two different
execution paths 102 and 103 are dedicated to handling the two different types of code, with each
side having its own architectural register file, such as control register file 104 and data register
file 105, differentiated by width and number of registers; the control registers are of narrower
width, by number of bits (in one example, 32 bits), and the data registers are of wider width (in
one example, 64 bits). The processor is therefore asymmetric, in that its two execution paths have
different bit widths owing to the fact that they each perform different, specialised functions.
In the processor of Fig. 1, the instruction stream 100 is made up of a series of instruction
packets. Each instruction packet supplied is decoded by an instruction decode unit 101, which
separates control instructions from data instructions, as described further below. The control
execution path 102 handles control-flow operations for the instruction stream, and manages the
machine's state registers, using a branch unit 106, an execution unit 107, and a load store unit
108, which in this embodiment is shared with the data execution path 103. Only the control side
of the processor need be visible to a compiler, such as a compiler for the C, C++, or Java language, or another high-level language compiler. Within the control side, the operation of
branch unit 106 and execution unit 107 is in accordance with conventional processor design
known to those of ordinary skill in the art.
The data execution path 103 employs SIMD (single instruction multiple data) parallelism,
in both a fixed execution unit 109 and a configurable deep execution unit 110. As will be described further below, the configurable deep execution unit 110 provides a depth dimension of
processing, to increase work per instruction, in addition to the width dimension used by conventional SIMD processors. If the decoded instruction defines a control instruction it is applied to the appropriate
functional unit on the control execution path of the machine (e.g. branch unit 106, execution unit
107, and load/store unit 108). If the decoded instruction defines an instruction with either a fixed
or configurable data processing operation it is supplied to the data processing execution path.
Within the data instruction part of the instruction packet designated bits indicate whether the
instruction is a fixed or configurable data processing instruction, and in the case of a configurable
instruction further designated bits define configuration information. In dependence on the sub-type
of decoded data processing instruction, data is supplied to either the fixed or the
configurable execution sub-paths of the data processing path of the machine. Herein, "configurable" signifies the ability to select an operator configuration from
amongst a plurality of predefined ("pseudo-static") operator configurations. A pseudo-static
configuration of an operator is effective to cause an operator (i) to perform a certain type of
operation or (ii) to be interconnected with associated elements in a certain manner or (iii) a
combination of (i) or (ii) above. In practice, a selected pseudo-static configuration may
determine the behavior and interconnectivity of many operator elements at a time. It can also
control switching configurations associated with the data path. In a preferred embodiment, at
least some of the plurality of pseudo-static operator configurations are selectable by an operation
code portion of a data processing instruction, as will be illustrated further below. Also in accordance with embodiments herein, a "configurable instruction" allows the performance of
customized operations at the level of multibit values; for example, at the level of four or more bit
multibit values, or at the level of words.
It is pointed out that both control and data processing instructions, performed on their respective different sides of the machine, can define memory access (load/store) and basic arithmetic operations. The inputs/operands for control operations may be supplied to/from the
control register file 104, whereas the data/operands for data processing operations are supplied
to/from the register file 105. In accordance with an embodiment of the invention, at least one input of each data
processing operation can be a vector. In this respect, the configurable operators and/or switching
circuitry of the configurable data path can be regarded as configurable to perform vector
operations by virtue of the nature of operation performed and/or interconnectivity therebetween.
For example, a 64-bit vector input to a data processing operation may include four 16-bit scalar
operands. Herein, a "vector" is an assembly of scalar operands. Vector arithmetic may be
performed on a plurality of scalar operands, and may include steering, movement, and
permutation of scalar elements. Not all operands of a vector operation need be vectors; for
example, a vector operation may have both a scalar and at least one vector as inputs, and output
a result that is either a scalar or a vector.
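As a minimal sketch of the vector notion described above, the following Python models a 64-bit vector as four 16-bit scalar lanes. The helper names (`pack`, `unpack`, `vector_add`, `vector_add_scalar`, `vector_sum`), the lane ordering, and the wrap-around overflow behaviour are illustrative assumptions and are not taken from the specification.

```python
LANE_BITS = 16                      # width of one scalar operand
LANES = 4                           # 4 x 16 bits = one 64-bit vector
LANE_MASK = (1 << LANE_BITS) - 1


def unpack(vec):
    """Split a 64-bit vector into four 16-bit lanes (lane 0 = least significant, assumed)."""
    return [(vec >> (LANE_BITS * i)) & LANE_MASK for i in range(LANES)]


def pack(lanes):
    """Assemble four scalars into one 64-bit vector, truncating each to 16 bits."""
    vec = 0
    for i, lane in enumerate(lanes):
        vec |= (lane & LANE_MASK) << (LANE_BITS * i)
    return vec


def vector_add(a, b):
    """Vector + vector -> vector: lane-wise add, wrapping modulo 2**16 (assumed)."""
    return pack([x + y for x, y in zip(unpack(a), unpack(b))])


def vector_add_scalar(a, s):
    """Vector + scalar -> vector: the scalar is applied to every lane."""
    return pack([x + s for x in unpack(a)])


def vector_sum(a):
    """Vector -> scalar: reduce the four lanes to a single 16-bit result."""
    return sum(unpack(a)) & LANE_MASK
```

The three functions correspond to the three cases mentioned in the text: all-vector inputs, mixed scalar and vector inputs, and a vector input with a scalar result.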
Herein, "control instructions" include instructions dedicated to program flow, and branch and address generation; but not data processing. "Data processing instructions" include
instructions for logical operations, or arithmetic operations for which at least one input is a
vector. Data processing instructions may operate on multiple data instructions, for example in SIMD processing, or in processing wider, short vectors of data elements. The essential functions
of control instructions and data instructions, just mentioned, do not overlap; however, a
commonality is that both types of code have logic and scalar arithmetic capabilities.
Fig. 2 shows three types of instruction packet for the processor of Fig. 1. Each type of
instruction packet is 64 bits long. Instruction packet 211 is a 3-scalar type, for dense control code, and includes three 21-bit control instructions (c21). Instruction packets 212 and 213 are LIW (long instruction word) type, for parallel execution of datapath code. In this example each
instruction packet 212, 213 includes two instructions but different numbers may be included if
desired. Instruction packet 212 includes a 34-bit data instruction (d34) and a 28-bit memory instruction (m28); and is used for parallel execution of data-side arithmetic (the d34 instruction)
with a data-side load-store operation (the m28 instruction). Memory-class instructions (m28) can
be read from, or written to, either the control side or the data side of the processor, using
addresses from the control side. Instruction packet 213 includes a 34-bit data instruction (d34) and a 21-bit control instruction (c21); and is used for parallel execution of data-side arithmetic
(the d34 instruction) with a control-side operation (the c21 instruction), such as a control-side
arithmetic, branching, or load-store operation.
Instruction decode unit 101 of the embodiment of Fig. 1 uses the initial identification bits, or some other designated identification bits at predetermined bit locations, of each instruction
packet to determine which type of packet is being decoded. For example, as shown in Fig. 2, an
initial indicator bit "1" signifies that an instruction packet is of a scalar control instruction type,
with three control instructions; while initial indicator bits "0 1" and "0 0" signify instruction
packets of type 212 and 213, with a data and memory instruction in packet 212 or a data and
control instruction in packet 213. Having decoded the initial bits of each instruction packet, the
decode unit 101 of Fig. 1 passes the instructions of each packet appropriately to either the control
execution path 102 or the data execution path 103, according to the type of instruction packet.
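The packet-type detection just described can be sketched as follows. This is only an illustration under stated assumptions: the "initial" indicator bits are taken to be the most significant bits (63 and 62) of the 64-bit word, and the names `PacketType`, `BITS_USED`, and `classify_packet` are hypothetical. The bit budget uses only the field widths given for Fig. 2.

```python
from enum import Enum


class PacketType(Enum):
    THREE_CONTROL = 211   # indicator "1":   c21 + c21 + c21
    DATA_MEMORY = 212     # indicator "0 1": d34 + m28
    DATA_CONTROL = 213    # indicator "0 0": d34 + c21


# Bit budget per packet type: indicator bits plus instruction bits.
BITS_USED = {
    PacketType.THREE_CONTROL: 1 + 3 * 21,   # 64
    PacketType.DATA_MEMORY: 2 + 34 + 28,    # 64
    PacketType.DATA_CONTROL: 2 + 34 + 21,   # 57; the text does not say how the other 7 bits are used
}


def classify_packet(packet):
    """Return the packet type from the leading indicator bits of a 64-bit word.

    Assumption: the "initial" bits are the most significant bits (63 and 62).
    """
    if not 0 <= packet < (1 << 64):
        raise ValueError("packet must fit in 64 bits")
    if (packet >> 63) & 1:
        return PacketType.THREE_CONTROL
    if (packet >> 62) & 1:
        return PacketType.DATA_MEMORY
    return PacketType.DATA_CONTROL
```

A decoder built this way needs to inspect at most two bits before routing the packet's instructions to the control or data execution path.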
In order to execute the instruction packets of Fig. 2, the instruction decode unit 101 of the
processor of the embodiment of Fig. 1 fetches program packets from memory sequentially; and the program packets are executed sequentially. Within an instruction packet, the instructions of packet 211 are executed sequentially, with the 21-bit control instruction at the least significant end of the 64-bit word being executed first, then the next 21-bit control instruction, and then the
21-bit control instruction at the most-significant end. Within instruction packets 212 and 213,
the instructions can be executed simultaneously (although this need not necessarily be the case,
in embodiments according to the invention). Thus, in the program order of the processor of the
embodiment of Fig. 1, the program packets are executed sequentially; but instructions within a
packet can be executed either sequentially, for packet type 211, or simultaneously, for packet
types 212 and 213. Below, instruction packets of types 212 and 213 are abbreviated as MD and
CD-packets respectively (containing one memory and one data instruction; and one control instruction and one data instruction, respectively).
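The program order described above can be illustrated with a short sketch. It assumes that bit 63 is the indicator bit of a type-211 packet and that the three 21-bit control instructions occupy bits 0 to 62 contiguously; the text fixes only the least-significant-first execution order, so the exact field positions and the function names `control_slots` and `issue_groups` are hypothetical.

```python
C21_BITS = 21
C21_MASK = (1 << C21_BITS) - 1


def control_slots(packet):
    """Return the three c21 fields of a type-211 packet in execution order.

    Assumed layout: bit 63 is the indicator, bits 0-62 hold the three
    instructions; the least significant field is executed first.
    """
    return [(packet >> (C21_BITS * i)) & C21_MASK for i in range(3)]


def issue_groups(packet):
    """Group a packet's instructions into issue steps (one inner list per step)."""
    if (packet >> 63) & 1:
        # Type 211: one control instruction per step, least significant first.
        return [[("c21", field)] for field in control_slots(packet)]
    partner = "m28" if (packet >> 62) & 1 else "c21"
    # Field positions inside MD and CD packets are not given in the text,
    # so only the instruction classes are reported here.
    return [[("d34", None), (partner, None)]]
```

A type-211 packet therefore yields three single-instruction steps, whereas an MD or CD packet yields one step containing both of its instructions.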
In using 21-bit control instructions, the embodiment of Fig. 1 overcomes a number of
disadvantages found in processors having instructions of other lengths, and in particular
processors that support a combination of 32-bit standard encoding for data instructions and 16-bit
"dense" encoding for control code. In such dual 16/32-bit processors, there is a redundancy
arising from the use of dual encodings for each instruction, or the use of two separate decoders
with a means of switching between encoding schemes by branch, fetch address, or other means.
This redundancy is removed by using a single 21-bit length for all control instructions, in
accordance with an embodiment of the invention. Furthermore, use of 21-bit control instructions
removes disadvantages arising from insufficient semantic content in a 16-bit "dense" encoding
scheme. Because of insufficient semantic content, processors using a 16-bit scheme typically
require some mix of design compromises, such as: use of two-operand destructive operations, with corresponding code bloat for copies; use of windowed access to a subset of the register file, with code bloat for spill/fill or window pointer manipulation; or frequent reversion to the 32-bit format, because not all operations can be expressed in the very few available opcode bits in a 16-bit format. These disadvantages are alleviated by use of 21-bit control instructions, in an embodiment of the invention.
A large variety of instructions may be used, in accordance with an embodiment of the invention. For example, instruction signatures may be any of the following, where C-format, M-format, and D-format signify control, memory access, and data format respectively:
[Table of C-format, M-format, and D-format instruction signatures; image not reproduced in this text]
Also in accordance with one embodiment of the invention, the C-format instructions all
provide SISD (single instruction single data) operation, while the M-format and D-format instructions provide either SISD or SIMD operation. For example, control instructions may
provide general arithmetic, comparison, and logical instructions; control flow instructions;
memory loads and store instructions; and others. Data instructions may provide general
arithmetic, shift, logical, and comparison instructions; shuffle, sort, byte extend, and permute
instructions; linear feedback shift register instructions; and, via the configurable deep execution unit 110 (described further below), user-defined instructions. Memory instructions may provide
memory loads and stores; copy selected data registers to control registers; copy broadcast control
registers to data registers; and immediate to register instructions.
In accordance with an embodiment of the invention, the processor of Fig. 1 features a
first, fixed data execution path and a second configurable data execution path. The first data path has a fixed SIMD execution unit split into lanes in a similar fashion to conventional SIMD
processing designs. The second data path has a configurable deep execution unit 110. "Deep
execution" refers to the ability of a processor to perform multiple consecutive operations on the
data provided by a single issued instruction, before returning a result to the register file. One example of deep execution is found in the conventional MAC operation (multiply and
accumulate), which performs two operations (a multiplication and an addition), on data from a
single instruction, and therefore has a depth of order two. Deep execution may also be
characterized by the number of operands input being equal to the number of results output; or, equivalently, the valency-in equals the valency-out. Thus, for example, a conventional two-operand addition, which has one result, is not an example of this type of deep execution, because the number of operands is not equal to the number of results; whereas convolution, Fast Fourier Transforms, Trellis/Viterbi encoding, correlators, finite impulse response filters, and other signal
processing algorithms are examples of deep execution in accordance with preferred
embodiments. Application-specific digital signal processing (DSP) algorithms do perform deep
execution, typically at the bit level and in a memory-mapped fashion. However, conventional
register-mapped general purpose DSPs do not perform deep execution, instead executing
instructions at a depth of order two at most, in the MAC operation. By contrast, the processor of
Fig. 1 provides a register-mapped general purpose processor that is capable of deep execution of
dynamically configurable word-level instructions at orders greater than two. In the
processor of Fig. 1, the nature of the deep execution instruction (the graph of the mathematical
function to be performed) can be adjusted/customised by configuration information in the
instruction itself. In the preferred embodiment, data format instructions contain bit positions
allocated to configuration information. To provide this capability, the deep execution unit 110
has configurable execution resources, which means that operator modes, interconnections, and
constants can be uploaded to suit each application. Deep execution adds a depth dimension to the parallelism of execution, which is orthogonal to the width dimension offered by the earlier
concepts of SIMD and LIW processing; it therefore represents an additional dimension for
increasing work-per-instruction of a general purpose processor.
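To make the notion of execution depth concrete, the sketch below contrasts a conventional multiply-accumulate (depth two) with a hypothetical three-stage chain that has four inputs and four outputs, so that valency-in equals valency-out. The function names, the coefficient values, and the particular stages are assumptions chosen for illustration; the list `EXAMPLE_CONFIG` merely stands in for whatever configuration an instruction would select.

```python
def mac(acc, a, b):
    """Conventional multiply-accumulate: two consecutive operations (depth two)."""
    product = a * b          # operation 1: multiply
    return acc + product     # operation 2: accumulate


def deep_execute(stages, operands):
    """Apply a configured chain of operators to the operands of one instruction.

    Each stage maps a list of values to a list of values. Only the final
    list would be written back to the register file.
    Returns (results, depth), where depth is the number of chained stages.
    """
    values = list(operands)
    for stage in stages:
        values = stage(values)
    return values, len(stages)


# Hypothetical three-stage configuration with four inputs and four outputs.
COEFFS = [1, 2, 3, 4]
EXAMPLE_CONFIG = [
    lambda v: [x * c for x, c in zip(v, COEFFS)],                    # multiply by coefficients
    lambda v: [v[0] + v[1], v[0] - v[1], v[2] + v[3], v[2] - v[3]],  # add/subtract pairs
    lambda v: [x >> 1 for x in v],                                   # arithmetic shift right
]
```

Calling `deep_execute(EXAMPLE_CONFIG, [8, 4, 2, 1])` runs three consecutive operations on the operands of a single instruction and produces one set of results, which is the sense in which depth adds work per instruction independently of SIMD width.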
Fig. 3 shows the components of an exemplary configurable deep execution unit 310, in
accordance with an embodiment of the invention. As shown in Fig. 1, the configurable deep execution unit 110 is part of the data execution path 103, and may therefore be instructed by data-side instructions from the MD and CD-instruction packets 212 and 213 of Fig. 2. In Fig. 3,
an instruction 314 and operands 315 are supplied to the deep execution unit 310 from instruction decode unit 101 and data register file 105 of Fig. 1. A multi-bit configuration code in the instruction 314 is used to access a control map 316, which expands the multi-bit code into a
relatively complex set of configuration signals for configuring operators of the deep execution
unit. The control map 316 may, for example, be embodied as a look-up table, in which different
possible multi-bit codes of the instruction are mapped to different possible operator
configurations of the deep execution unit. Based on the result of consulting the look-up table of
the control map 316, a crossbar interconnect 317 configures a set of operators 318-321 in whatever arrangement is necessary to execute the operator configuration indicated by the multi-bit
instruction code. The operators may include, for example, a multiply operator 318, an
arithmetic logic unit (ALU) operator 319, a state operator 320, or a cross-lane permuter 321. In
one embodiment, the deep execution unit contains fifteen operators: one multiply operator 318,
eight ALU operators 319, four state operators 320, and two cross-lane permuters 321; although
other numbers of operators are possible. The operands 315 supplied to the deep execution unit
may be, for example, two 16-bit operands, four 8-bit operands or a single 32-bit operand; these
are supplied to a second crossbar interconnect 322 which may supply the operands to appropriate
operators 318-321. The second crossbar interconnect 322 also receives a feedback 324 of intermediate results from the operators 318-321, which may then in turn also be supplied to the
appropriate operator 318-321 by the second crossbar interconnect 322. A third crossbar
interconnect 323 multiplexes the results from the operators 318-321, and outputs a final result
325. Various control signals can be used to configure the operators; for example, control map 316 of the embodiment of Fig. 3 need not necessarily be embodied as a single look-up table, but may be embodied as a series of two or more cascaded look-up tables. An entry in the first look-up
table could point from a given multi-bit instruction code to a second look-up table, thereby
reducing the amount of storage required in each look-up table for complex operator configurations. For example, the first look-up table could be organized into libraries of
configuration categories, so that multiple multi-bit instruction codes are grouped together in the
first look-up table with each group pointing to a subsequent look-up table that provides specific
configurations for each multi-bit code of the group. In accordance with the embodiment of Fig. 3, the operators are advantageously pre-
configured into various operator classes. In practice, this is achieved by a strategic level of
hardwiring. An advantage of this approach is that it means fewer predefined configurations need
to be stored, and the control circuitry can be simpler. For example, operators 318 are pre-
configured to be in the class of multiply operators; operators 319 are pre-configured as ALU
operators; operators 320 are pre-configured as state operators; and operators 321 are pre-
configured as cross-lane permuters; and other pre-configured operator classes are possible.
However, even though the classes of operators are pre-configured, there is run-time flexibility for
instructions to be able to arrange at least: (i) connectivity of the operators within each class; (ii)
connectivity with operators from the other classes; (iii) connectivity of any relevant switching
means; for the final arrangement of a specific configuration for implementing a given algorithm.
A skilled reader will appreciate that, while the foregoing has described what is considered to be the best mode and where appropriate other modes of performing the invention, the invention should not be limited to specific apparatus configurations or method steps disclosed in this description of the preferred embodiment. Those skilled in the art will also recognize that the invention has a broad range of applications, and that the embodiments admit of a wide range of different implementations and modifications without departing from the inventive concepts. In particular, exemplary bit widths mentioned herein are not intended to be limiting, nor is the arbitrary selection of bit widths referred to as half words, words, long, etc.
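Returning to the control map 316 of Fig. 3, the cascaded look-up table arrangement described above might be sketched as follows. Everything in this block is hypothetical: the configuration codes, the library names, and the per-class operator counts are invented for illustration, and the description does not specify the contents or encoding of either table. The counts are merely kept within the fifteen operators of the embodiment (one multiply, eight ALU, four state, two permuters).

```python
# First-level table: multi-bit configuration code -> name of a configuration library.
FIRST_LEVEL = {
    0b0000: "filters",
    0b0001: "filters",
    0b0100: "transforms",
}

# Second-level tables: one per library, mapping the same code to operator settings.
# Each setting gives how many operators of each class the configuration uses.
SECOND_LEVEL = {
    "filters": {
        0b0000: {"multiply": 1, "alu": 4, "state": 2, "permute": 0},
        0b0001: {"multiply": 1, "alu": 8, "state": 4, "permute": 0},
    },
    "transforms": {
        0b0100: {"multiply": 1, "alu": 6, "state": 0, "permute": 2},
    },
}


def lookup_configuration(code):
    """Expand a configuration code into operator settings via two cascaded tables."""
    library = FIRST_LEVEL[code]          # raises KeyError for an undefined code
    return SECOND_LEVEL[library][code]
```

Grouping codes into libraries in the first table keeps each second-level table small, which is the storage saving the description attributes to cascading.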

Claims

What is claimed is:
1. A computer processor, the processor comprising: a decode unit for decoding instruction packets fetched from a memory holding a sequence
of instruction packets; and first and second processing channels, each channel comprising a plurality of functional
units, wherein the first processing channel is capable of performing control operations and
comprises a control register file having a relatively narrower bit width, and the second processing
channel is capable of performing data processing operations at least one input of which is a
vector and comprises a data register file having a relatively wider bit width; wherein the decode unit is operable to detect for each instruction packet whether the
instruction packet defines (i) a plurality of control instructions to be executed sequentially on the
first processing channel or (ii) a plurality of instructions comprising at least one data processing
instruction to be executed simultaneously on the second execution channel, and to control the first and second channels in dependence on said detection.
2. A computer processor according to claim 1, wherein the first processing channel further
comprises a branch unit and a control execution unit.
3. A computer processor according to claim 1 or 2, wherein the second processing channel
further comprises a fixed data execution unit and a configurable data execution unit.
4. A computer processor according to claim 3, wherein the fixed data execution unit and the
configurable data execution unit both operate according to a single instruction multiple data
format.
5. A computer processor according to any preceding claim, wherein the first and second
processing channels share a load store unit.
6. A computer processor according to claim 5, wherein the load store unit uses control
information supplied by the first processing channel and data supplied by the second processing
channel.
7. A computer processor according to any preceding claim, wherein the instruction packets are
all of equal bit length.
8. A computer processor according to claim 7, wherein the instruction packets are all of a 64-bit
length.
9. A computer processor according to any preceding claim, wherein the control instructions are
all of a bit length between 18 and 24 bits.
10. A computer processor according to claim 9, wherein the control instructions are all of a 21-
bit length.
11. A computer processor according to claim 7, wherein the nature of each instruction in an
instruction packet is selected at least from a control instruction, a data instruction, and a memory
access instruction.
12. A computer processor according to claim 11, wherein the bit length of each data instruction
is 34 bits.
13. A computer processor according to claim 11, wherein the bit length of each memory access
instruction is 28 bits.
14. A computer processor according to any preceding claim, wherein when the decode unit detects that the instruction packet defines three control instructions, the decode unit is operable to supply
the first processing channel with the three control instructions whereby the three control
instructions are executed sequentially.
15. A computer processor according to any preceding claim, wherein when the decode unit detects that the instruction packet defines two instructions comprising at least one data
instruction, the decode unit is operable to supply the second processing channel with at least the
data instruction whereby the two instructions are executed simultaneously.
16. A computer processor according to any preceding claim, wherein the decode unit is operable to read the values of a set of designated bits at predetermined bit locations in each instruction
packet of the sequence, to determine: a) whether the instruction packet defines a plurality of control instructions or a plurality
of instructions of which at least one is a data instruction; and b) where the instruction packet defines a plurality of instructions of which at least one is a
data instruction, the nature of each of the two instructions selected from: a control instruction; a
data instruction; and a memory access instruction.
17. A computer processor according to any of claims 3-16, wherein the configurable data
execution unit is capable of executing more than two consecutive operations on the data provided
by a single issued instruction before returning a result to a destination register file.
18. A method of operating a computer processor which comprises first and second processing
channels each comprising a plurality of functional units, wherein the first processing channel
comprises a control register file having a relatively narrower bit width and the second processing
channel comprises a data register file having a relatively wider bit width, the method comprising: decoding an instruction packet to detect whether the instruction packet defines a plurality
of control instructions of equal length or two instructions comprising at least one data instruction,
at least one of which is a vector; when the instruction packet defines a plurality of control instructions of equal length, supplying the control instructions to the first processing channel whereby the control instructions
are executed sequentially; and when the instruction packet defines a plurality of instructions comprising at least one data
instruction, supplying at least the data instruction to the second processing channel whereby the plurality of instructions are executed simultaneously.
19. A computer program product comprising program code means which include a sequence of
instruction packets, said instruction packets including a first type of instruction packet comprising a plurality
of control instructions of equal length and a second type of instruction packet comprising a plurality of instructions including at least one data instruction, wherein the computer program product is adapted to run on a computer such that the first
type of instruction packet is executed by a dedicated control processing channel, and the at least
one data instruction of the second instruction packet is executed by a dedicated data processing
channel, the dedicated control processing channel having a relatively narrower bit width than the
dedicated data processing channel.
20. A method of operating a computer processor which comprises first and second processing channels each comprising a plurality of functional units, wherein the first processing channel
comprises a control register file having a relatively narrower bit width and the second processing
channel comprises a data register file having a relatively wider bit width, the method comprising: fetching a sequence of instruction packets from a program memory, all of said instruction
packets containing a set of designated bits at predetermined bit locations; decoding each instruction packet, said decoding step including reading the values of said
designated bits to determine: a) whether the instruction packet defines a plurality of control instructions or a plurality
of instructions of which at least one is a data instruction; and b) where the instruction packet defines a plurality of instructions of which at least one is a
data instruction, the nature of each of the two instructions selected at least from: a control
instruction; a data instruction; and a memory access instruction.
21. A computer program product comprising program code means which include a sequence of
instruction packets, said instruction packets including a first type of instruction packet comprising a plurality
of control instructions of substantially equal length and a second type of instruction packet
comprising first and second instructions including at least one data instruction, said instruction packets including at least one indicator bit at a designated bit location
within the instruction packet, wherein the computer program product is adapted to run on a
computer such that said indication bit is adapted to cooperate with a decode unit of the computer
to designate whether: a) the instruction packet defines a plurality of control instructions or a plurality of
instructions of which at least one is a data instruction; and b) in the case when there is a plurality of instructions comprising at least one data
instruction, the nature of each of the two instructions selected from: a control instruction; a data
instruction; and a memory access instruction.
PCT/GB2005/001069 2004-03-31 2005-03-22 Apparatus and method for asymmetric dual path processing WO2005096141A2 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CA002560469A CA2560469A1 (en) 2004-03-31 2005-03-22 Apparatus and method for asymmetric dual path processing
EP05729258.3A EP1735697B1 (en) 2004-03-31 2005-03-22 Apparatus and method for asymmetric dual path processing
JP2007505614A JP5744370B2 (en) 2004-03-31 2005-03-22 Apparatus and method for asymmetric dual path processing

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US10/813,615 2004-03-31
US10/813,615 US9047094B2 (en) 2004-03-31 2004-03-31 Apparatus and method for separate asymmetric control processing and data path processing in a dual path processor

Publications (2)

Publication Number Publication Date
WO2005096141A2 (en) 2005-10-13
WO2005096141A3 (en) 2006-06-01

Family

ID=34962959

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2005/001069 WO2005096141A2 (en) 2004-03-31 2005-03-22 Apparatus and method for asymmetric dual path processing

Country Status (7)

Country Link
US (2) US9047094B2 (en)
EP (1) EP1735697B1 (en)
JP (1) JP5744370B2 (en)
CN (1) CN100583027C (en)
CA (1) CA2560469A1 (en)
TW (1) TWI384400B (en)
WO (1) WO2005096141A2 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100456230C (en) * 2007-03-19 2009-01-28 中国人民解放军国防科学技术大学 Computing group structure for superlong instruction word and instruction flow multidata stream fusion
DE102011081585A1 (en) 2010-08-27 2012-05-03 Icera Inc. Processor architecture with increased efficiency
EP3488338A4 (en) * 2016-07-22 2020-01-22 Intel Corporation Technologies for adaptive processing of multiple buffers

Families Citing this family (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7949856B2 (en) * 2004-03-31 2011-05-24 Icera Inc. Method and apparatus for separate control processing and data path processing in a dual path processor with a shared load/store unit
US9047094B2 (en) 2004-03-31 2015-06-02 Icera Inc. Apparatus and method for separate asymmetric control processing and data path processing in a dual path processor
US7296129B2 (en) 2004-07-30 2007-11-13 International Business Machines Corporation System, method and storage medium for providing a serialized memory interface with a bus repeater
US7512762B2 (en) 2004-10-29 2009-03-31 International Business Machines Corporation System, method and storage medium for a memory subsystem with positional read data latency
US7299313B2 (en) * 2004-10-29 2007-11-20 International Business Machines Corporation System, method and storage medium for a memory subsystem command interface
US7331010B2 (en) 2004-10-29 2008-02-12 International Business Machines Corporation System, method and storage medium for providing fault detection and correction in a memory subsystem
US7356737B2 (en) * 2004-10-29 2008-04-08 International Business Machines Corporation System, method and storage medium for testing a memory module
US7277988B2 (en) * 2004-10-29 2007-10-02 International Business Machines Corporation System, method and storage medium for providing data caching and data compression in a memory subsystem
US7478259B2 (en) 2005-10-31 2009-01-13 International Business Machines Corporation System, method and storage medium for deriving clocks in a memory system
US7685392B2 (en) 2005-11-28 2010-03-23 International Business Machines Corporation Providing indeterminate read data latency in a memory system
KR100807039B1 (en) 2006-04-07 2008-02-25 주식회사 퓨쳐시스템 Asymmetric multiprocessing system and method thereof
US7669086B2 (en) 2006-08-02 2010-02-23 International Business Machines Corporation Systems and methods for providing collision detection in a memory system
US7870459B2 (en) 2006-10-23 2011-01-11 International Business Machines Corporation High density high reliability memory module with power gating and a fault tolerant address and command bus
US7721140B2 (en) 2007-01-02 2010-05-18 International Business Machines Corporation Systems and methods for improving serviceability of a memory system
US8201069B2 (en) * 2008-07-01 2012-06-12 International Business Machines Corporation Cyclical redundancy code for use in a high-speed serial link
US8139430B2 (en) * 2008-07-01 2012-03-20 International Business Machines Corporation Power-on initialization and test for a cascade interconnect memory system
US20100005335A1 (en) * 2008-07-01 2010-01-07 International Business Machines Corporation Microprocessor interface with dynamic segment sparing and repair
US8082474B2 (en) * 2008-07-01 2011-12-20 International Business Machines Corporation Bit shadowing in a memory system
US8245105B2 (en) * 2008-07-01 2012-08-14 International Business Machines Corporation Cascade interconnect memory system with enhanced reliability
US7895374B2 (en) * 2008-07-01 2011-02-22 International Business Machines Corporation Dynamic segment sparing and repair in a memory system
US8082475B2 (en) * 2008-07-01 2011-12-20 International Business Machines Corporation Enhanced microprocessor interconnect with bit shadowing
US8234540B2 (en) * 2008-07-01 2012-07-31 International Business Machines Corporation Error correcting code protected quasi-static bit communication on a high-speed bus
US8755515B1 (en) 2008-09-29 2014-06-17 Wai Wu Parallel signal processing system and method
US8493979B2 (en) 2008-12-30 2013-07-23 Intel Corporation Single instruction processing of network packets
US7979759B2 (en) * 2009-01-08 2011-07-12 International Business Machines Corporation Test and bring-up of an enhanced cascade interconnect memory system
US20100180154A1 (en) * 2009-01-13 2010-07-15 International Business Machines Corporation Built In Self-Test of Memory Stressor
KR101109304B1 (en) * 2009-04-23 2012-01-31 주식회사 효성 Method for manufacturing cation dyeable polyamide yarn
KR101918464B1 (en) 2011-09-14 2018-11-15 삼성전자 주식회사 A processor and a swizzle pattern providing apparatus based on a swizzled virtual register
CN102508636B (en) * 2011-11-02 2013-12-11 中国人民解放军国防科学技术大学 Program stream control method for vector processor and system
US9501268B2 (en) 2013-12-23 2016-11-22 International Business Machines Corporation Generating SIMD code from code statements that include non-isomorphic code statements
US9983884B2 (en) * 2014-09-26 2018-05-29 Intel Corporation Method and apparatus for SIMD structured branching
CN108874730B (en) * 2018-06-14 2021-06-22 北京理工大学 Data processor and data processing method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5600810A (en) 1994-12-09 1997-02-04 Mitsubishi Electric Information Technology Center America, Inc. Scaleable very long instruction word processor with parallelism matching
US20030154358A1 (en) 2002-02-08 2003-08-14 Samsung Electronics Co., Ltd. Apparatus and method for dispatching very long instruction word having variable length

Family Cites Families (57)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4228498A (en) * 1977-10-12 1980-10-14 Dialog Systems, Inc. Multibus processor for increasing execution speed using a pipeline effect
US5136697A (en) * 1989-06-06 1992-08-04 Advanced Micro Devices, Inc. System for reducing delay for execution subsequent to correctly predicted branch instruction using fetch information stored with each block of instructions in cache
EP0419105B1 (en) 1989-09-21 1997-08-13 Texas Instruments Incorporated Integrated circuit formed on a surface of a semiconductor substrate and method for constructing such an integrated circuit
JPH0412361A (en) 1990-04-28 1992-01-16 Konica Corp Processing method and processing device for photosensitive planographic printing plate
JP2523952B2 (en) 1990-06-29 1996-08-14 松下電器産業株式会社 Thin film forming method and thin film forming apparatus
US5299320A (en) * 1990-09-03 1994-03-29 Matsushita Electric Industrial Co., Ltd. Program control type vector processor for executing a vector pipeline operation for a series of vector data which is in accordance with a vector pipeline
JPH05324430A (en) 1992-05-26 1993-12-07 Toshiba Corp Data processor
US5423051A (en) * 1992-09-24 1995-06-06 International Business Machines Corporation Execution unit with an integrated vector operation capability
US5600801A (en) * 1993-07-15 1997-02-04 Dell Usa, L.P. Multiple function interface device for option card
US6052773A (en) * 1995-02-10 2000-04-18 Massachusetts Institute Of Technology DPGA-coupled microprocessors
US5737631A (en) * 1995-04-05 1998-04-07 Xilinx Inc Reprogrammable instruction set accelerator
JP2931890B2 (en) * 1995-07-12 1999-08-09 三菱電機株式会社 Data processing device
JP3658072B2 (en) 1996-02-07 2005-06-08 株式会社ルネサステクノロジ Data processing apparatus and data processing method
JPH09265397A (en) * 1996-03-29 1997-10-07 Hitachi Ltd Processor for vliw instruction
GB2311882B (en) * 1996-04-04 2000-08-09 Videologic Ltd A data processing management system
US5956518A (en) * 1996-04-11 1999-09-21 Massachusetts Institute Of Technology Intermediate-grain reconfigurable processing device
DE19634031A1 (en) * 1996-08-23 1998-02-26 Siemens Ag Processor with pipelining structure
US6006321A (en) * 1997-06-13 1999-12-21 Malleable Technologies, Inc. Programmable logic datapath that may be used in a field programmable device
US5922065A (en) * 1997-10-13 1999-07-13 Institute For The Development Of Emerging Architectures, L.L.C. Processor utilizing a template field for encoding instruction sequences in a wide-word format
JP3451921B2 (en) 1998-03-30 2003-09-29 松下電器産業株式会社 Processor
EP0953898A3 (en) 1998-04-28 2003-03-26 Matsushita Electric Industrial Co., Ltd. A processor for executing Instructions from memory according to a program counter, and a compiler, an assembler, a linker and a debugger for such a processor
US6226735B1 (en) * 1998-05-08 2001-05-01 Broadcom Method and apparatus for configuring arbitrary sized data paths comprising multiple context processing elements
US6292845B1 (en) * 1998-08-26 2001-09-18 Infineon Technologies North America Corp. Processing unit having independent execution units for parallel execution of instructions of different category with instructions having specific bits indicating instruction size and category respectively
DE19843640A1 (en) * 1998-09-23 2000-03-30 Siemens Ag Procedure for configuring a configurable hardware block
US6553414B1 (en) * 1998-10-02 2003-04-22 Canon Kabushiki Kaisha System used in plural information processing devices for commonly using peripheral device in network
EP1073951A1 (en) * 1999-02-15 2001-02-07 Koninklijke Philips Electronics N.V. Data processor with a configurable functional unit and method using such a data processor
EP1050810A1 (en) * 1999-05-03 2000-11-08 STMicroelectronics SA A computer system comprising multiple functional units
GB2352066B (en) * 1999-07-14 2003-11-05 Element 14 Ltd An instruction set for a computer
AU6864400A (en) * 1999-08-30 2001-03-26 Ip Flex Inc. Control unit and recorded medium
US6526430B1 (en) * 1999-10-04 2003-02-25 Texas Instruments Incorporated Reconfigurable SIMD coprocessor architecture for sum of absolute differences and symmetric filtering (scalable MAC engine for image processing)
US7039790B1 (en) * 1999-11-15 2006-05-02 Texas Instruments Incorporated Very long instruction word microprocessor with execution packet spanning two or more fetch packets with pre-dispatch instruction selection from two latches according to instruction bit
EP1102163A3 (en) 1999-11-15 2005-06-29 Texas Instruments Incorporated Microprocessor with improved instruction set architecture
US6255849B1 (en) * 2000-02-04 2001-07-03 Xilinx, Inc. On-chip self-modification for PLDs
TW516320B (en) * 2000-02-22 2003-01-01 Intervideo Inc Implementation of quantization for SIMD architecture
JP2001306321A (en) 2000-04-19 2001-11-02 Matsushita Electric Ind Co Ltd Processor
US7120781B1 (en) 2000-06-30 2006-10-10 Intel Corporation General purpose register file architecture for aligned simd
JP4651790B2 (en) * 2000-08-29 2011-03-16 株式会社ガイア・システム・ソリューション Data processing device
US20020089348A1 (en) * 2000-10-02 2002-07-11 Martin Langhammer Programmable logic integrated circuit devices including dedicated processor components
US20020174266A1 (en) * 2001-05-18 2002-11-21 Krishna Palem Parameterized application programming interface for reconfigurable computing systems
JP2003005958A (en) * 2001-06-25 2003-01-10 Pacific Design Kk Data processor and method for controlling the same
JP2003099397A (en) 2001-09-21 2003-04-04 Pacific Design Kk Data processing system
US6798239B2 (en) * 2001-09-28 2004-09-28 Xilinx, Inc. Programmable gate array having interconnecting logic to support embedded fixed logic circuitry
JP3785343B2 (en) 2001-10-02 2006-06-14 日本電信電話株式会社 Client server system and data communication method in client server system
JP3779602B2 (en) 2001-11-28 2006-05-31 松下電器産業株式会社 SIMD operation method and SIMD operation device
US7159099B2 (en) * 2002-06-28 2007-01-02 Motorola, Inc. Streaming vector processor with reconfigurable interconnection switch
JP3982353B2 (en) 2002-07-12 2007-09-26 日本電気株式会社 Fault tolerant computer apparatus, resynchronization method and resynchronization program
US7024543B2 (en) * 2002-09-13 2006-04-04 Arm Limited Synchronising pipelines in a data processing apparatus
TW569138B (en) 2002-09-19 2004-01-01 Faraday Tech Corp A method for improving instruction selection efficiency in a DSP/RISC compiler
US7464254B2 (en) * 2003-01-09 2008-12-09 Cisco Technology, Inc. Programmable processor apparatus integrating dedicated search registers and dedicated state machine registers with associated execution hardware to support rapid application of rulesets to data
JP2004217989A (en) 2003-01-14 2004-08-05 Toyota Central Res & Dev Lab Inc Hydrogen storage alloy powder, production method therefor, and hydrogen storage device using the hydrogen storage alloy powder
JP2004309570A (en) 2003-04-02 2004-11-04 Seiko Epson Corp Optical communication module, optical communication equipment and method for manufacturing the module
US7496776B2 (en) * 2003-08-21 2009-02-24 International Business Machines Corporation Power throttling method and apparatus
US7176713B2 (en) * 2004-01-05 2007-02-13 Viciciv Technology Integrated circuits with RAM and ROM fabrication options
US8484441B2 (en) 2004-03-31 2013-07-09 Icera Inc. Apparatus and method for separate asymmetric control processing and data path processing in a configurable dual path processor that supports instructions having different bit widths
US9047094B2 (en) * 2004-03-31 2015-06-02 Icera Inc. Apparatus and method for separate asymmetric control processing and data path processing in a dual path processor
US7949856B2 (en) 2004-03-31 2011-05-24 Icera Inc. Method and apparatus for separate control processing and data path processing in a dual path processor with a shared load/store unit
US8512714B2 (en) 2006-05-22 2013-08-20 Biogasol Ipr Aps Thermoanaerobacter mathranii strain BG1

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100456230C (en) * 2007-03-19 2009-01-28 中国人民解放军国防科学技术大学 Computing group structure for superlong instruction word and instruction flow multidata stream fusion
DE102011081585A1 (en) 2010-08-27 2012-05-03 Icera Inc. Processor architecture with increased efficiency
DE102011081585B4 (en) 2010-08-27 2023-10-12 Nvidia Technology Uk Limited Processor architecture with increased efficiency
EP3488338A4 (en) * 2016-07-22 2020-01-22 Intel Corporation Technologies for adaptive processing of multiple buffers
US10944656B2 (en) 2016-07-22 2021-03-09 Intel Corporation Technologies for adaptive processing of multiple buffers

Also Published As

Publication number Publication date
JP2007531134A (en) 2007-11-01
CN100583027C (en) 2010-01-20
US20150234659A1 (en) 2015-08-20
US20050223196A1 (en) 2005-10-06
TW200540706A (en) 2005-12-16
TWI384400B (en) 2013-02-01
US9047094B2 (en) 2015-06-02
EP1735697A2 (en) 2006-12-27
CA2560469A1 (en) 2005-10-13
JP5744370B2 (en) 2015-07-08
CN1973260A (en) 2007-05-30
EP1735697B1 (en) 2016-07-06
WO2005096141A3 (en) 2006-06-01
US9477475B2 (en) 2016-10-25

Similar Documents

Publication Publication Date Title
US9477475B2 (en) Apparatus and method for asymmetric dual path processing
EP2290526B1 (en) Apparatus and method for control processing in dual path processor
KR100464406B1 (en) Apparatus and method for dispatching very long instruction word with variable length
KR100715055B1 (en) Vliw processor processes commands of different widths
US6438676B1 (en) Distance controlled concatenation of selected portions of elements of packed data
EP1267256A2 (en) Conditional execution of instructions with multiple destinations
US8782376B2 (en) Vector instruction execution to load vector data in registers of plural vector units using offset addressing logic
EP1735699B1 (en) Apparatus and method for dual data path processing
WO2017070675A1 (en) Conditional execution specification of instructions using conditional extension slots in the same execute packet in a vliw processor
EP1267255A2 (en) Conditional branch execution in a processor with multiple data paths
US20060095713A1 (en) Clip-and-pack instruction for processor
KR20070022239A (en) Apparatus and method for asymmetric dual path processing
US20060095714A1 (en) Clip instruction for processor

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
REEP Request for entry into the european phase

Ref document number: 2005729258

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 2005729258

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 2560469

Country of ref document: CA

WWE Wipo information: entry into national phase

Ref document number: 1020067020245

Country of ref document: KR

WWE Wipo information: entry into national phase

Ref document number: 2007505614

Country of ref document: JP

NENP Non-entry into the national phase

Ref country code: DE

WWW Wipo information: withdrawn in national office

Ref document number: DE

WWE Wipo information: entry into national phase

Ref document number: 200580017666.7

Country of ref document: CN

WWP Wipo information: published in national office

Ref document number: 2005729258

Country of ref document: EP

WWP Wipo information: published in national office

Ref document number: 1020067020245

Country of ref document: KR