WO2001061576A2 - Automated processor generation system for designing a configurable processor and method for the same - Google Patents

Publication number
WO2001061576A2
Authority
WO
WIPO (PCT)
Prior art keywords
data
instruction
processor
register
tie
Prior art date
Application number
PCT/US2001/005051
Other languages
French (fr)
Other versions
WO2001061576A3 (en)
Inventor
Albert R. Wang
Richard Ruddell
David W. Goodwin
Earl A. Killian
Nupur Bhattacharyya
Marines P. Medina
Walter D. Lichtenstein
Pavlos Konas
Rangarajan Srinivasan
Christopher M. Songer
Akilesh Parameswar
Dror E. Maydan
Ricardo E. Gonzales
Original Assignee
Tensilica, Inc.
Application filed by Tensilica, Inc.
Priority to KR1020027010522A (patent KR100589744B1)
Priority to JP2001560891A (patent JP4619606B2)
Priority to GB0217221A (patent GB2376546B)
Priority to AU2001238403A (patent AU2001238403A1)
Publication of WO2001061576A2
Publication of WO2001061576A3

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30145 Instruction analysis, e.g. decoding, instruction word fields
    • G06F9/3016 Decoding the operand specifier, e.g. specifier format
    • G06F9/30167 Decoding the operand specifier, e.g. specifier format of immediate specifier, e.g. constants
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/30 Circuit design
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30145 Instruction analysis, e.g. decoding, instruction word fields

Definitions

  • the present invention is directed to computer processors as well as systems and techniques for developing the same, and is more particularly directed to processors which have features configurable at the option of a user and related development systems and techniques.
  • Prior art processors have generally been fairly rigid objects which are difficult to modify or extend.
  • a limited degree of extensibility to processors and their supporting software tools, including the ability to add register-to-register computational instructions and simple state (but not register files) has been provided in certain prior art systems.
  • This limited extensibility was a significant advance in the state of the art; many applications using these improvements see speedups or efficiency improvements of four times or better.
  • the limitations on extensibility of these prior art systems meant that other applications could not be adequately addressed.
  • the need to use the existing core register file, with its fixed 32-bit width registers generally prevents the use of these improvements in applications that require additional precision or replicated functional units where the combined width of the data operands exceeds 32 bits.
  • the core register file often lacks sufficient read or write ports to implement certain instructions. For these reasons, there is a need in the art to support the addition of new register files that are configurable in width and in number of read and write ports.
  • the core instruction set includes such load and store instructions for the core register file, but additional register files require additional load and store instructions.
  • While prior art systems support the addition of processor state, the quantity of that state is typically small. Consequently, there is a need in the art for a larger number of state bits to be easily added to the processor architecture.
  • This state often needs to be context switched by the operating system. Once the quantity of state becomes large, new methods that minimize context switch time are desirable. Such methods have been implemented in prior art processors (e.g., the MIPS R2000 coprocessor enable bits). However, there is a need in the art to extend this further by generating the code sequences and logic automatically from the input specification to support realtime operating systems (RTOSes) and other software which need to know about new state and use it in a timely manner.
  • prior art processors do not allow for sharing of logic between the core processor implementation and instruction extensions.
  • For load and store instruction extensions, it is important that the data cache be shared between the core and the extensions. This is so that stores by newly-configured instructions are seen by loads by the core and vice versa, ensuring cache coherency; separate caches would need special mechanisms to keep them consistent, a possible but undesirable solution.
  • the data cache is one of the larger circuits in the core processor, and sharing it promotes a reduction in the size of the core processor.
  • The addition of register files also makes it desirable to support allocation of high-level language variables to these registers.
  • Prior art processors use the core register file to which prior art compilers already support allocation of user variables. Thus, compiler allocation is expected and should be supported for user-defined register files.
  • To allocate variables to registers a compiler supporting user-defined register files requires knowledge of how to spill, restore, and move such registers in order to implement conventional-compiler functionality.
  • a related but more general limitation of prior art processor systems is the level of compiler support therefor. Often instructions are added to a processor to support new data types appropriate to the application (e.g., many DSP applications require processors implementing saturating arithmetic instead of the more conventional two's complement arithmetic usually supported by processors).
  • a multiply/accumulate operation typically requires two cycles.
  • the multiplier produces the product in carry-save form;
  • the carry-save product and the accumulator are reduced from three values to two values using a single level of carry-save-add, and then added in a carry-propagate-adder.
  • the simplest declaration would be to say that multiply/accumulate instructions take two cycles from any source operand to the destination; however, then it would not be possible to do back-to-back multiply/accumulates into the same accumulator register, since there would be a one-cycle stall because of the two-cycle latency.
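  • The carry-save reduction described above can be sketched in Python. This is a minimal illustration, not the patent's circuit: the 32-bit accumulator width, the function names, and the way the product is split into a carry-save pair are all assumptions made for the example.

```python
MASK = (1 << 32) - 1  # assumed 32-bit accumulator width


def carry_save_add(a, b, c):
    """One level of carry-save add: reduce three values to a (sum, carry) pair."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s & MASK, carry & MASK


def multiply_accumulate(acc, x, y):
    # Stand-in for the multiplier's carry-save output: any two values that
    # add up to the product (here, the even and odd bit positions).
    product = (x * y) & MASK
    p_sum, p_carry = product & 0x55555555, product & 0xAAAAAAAA
    # Three values (p_sum, p_carry, acc) are reduced to two ...
    s, carry = carry_save_add(p_sum, p_carry, acc & MASK)
    # ... and then added in a carry-propagate adder.
    return (s + carry) & MASK
```

  • Because the accumulator only enters at the carry-save-add level, it is needed later than the multiplier operands, which is what makes back-to-back accumulation attractive to express in the schedule.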
  • Reference semantics are one component of instruction set documentation. It is traditional to describe instruction semantics in both English and a more precise notation. English is often ambiguous or error-prone but easier to read. Therefore, it provides the introduction, purpose and a loose definition of an instruction. The more formal definition is useful to have a precise understanding of what the instruction does. One of the purposes of the reference semantics is to serve as this precise definition.
  • Other components include the instruction word, assembler syntax, and text description. Prior art systems have sufficient information in the extension language to generate the instruction word and assembler syntax. With the addition of the reference semantics, only the text description was missing, and there is a need to include the specification of instruction descriptions that can be converted to formatted documentation to produce a conventional ISA description book.
  • HDL Hardware Description Language
  • HAL Hardware Abstraction Layer
  • FIGURES 1 and 2 show control logic associated with a four-stage pipelined extensible register according to a preferred embodiment of the invention
  • FIGURE 3 shows a two-stage pipelined version of the register of FIGs. 1 and 2;
  • FIGURE 4 shows interface signals to a core adder according to the first embodiment;
  • FIGURE 5 shows a prior load aligner and FIGURE 6 shows a load aligner according to the preferred embodiment
  • FIGURE 7 shows a semantic block output interface signal according to the preferred embodiment
  • FIGURES 8(a) - 8(c) show pipeline register optimization according to the preferred embodiment
  • FIGURE 9 shows exception processing in the preferred embodiment
  • FIGURE 10 shows further exception processing in the preferred embodiment
  • FIGURE 11 shows the processing of reference semantic information in the preferred embodiment
  • FIGURE 12 shows automatically-generated instruction documentation according to the preferred embodiment
  • FIGURE 13 shows a TIE verification process according to the preferred embodiment
  • FIGURE 14 shows a cosimulation process in the preferred embodiment.
  • the invention to a degree builds upon the technology described in the Killian et al. and Wilson et al. applications in which the Tensilica Instruction Set Extension (TIE) language and its compiler and other tools are described.
  • a preferred embodiment of the invention extends the TIE language with new constructs and augmented software tools such as compilers and the like which support these constructs.
  • Extended Register Files
  • Register files are a set of N storage locations of B bits each. A field in an instruction selects members of this set as source operand values or destination operand values for the results of the instruction.
  • A register file is designed to support the reading of R of the N members in parallel, and the writing of W of the N members in parallel, so that instructions can have one or more source operands and one or more destination operands and still require only one cycle for register file access.
  • The TIE language construct for declaring a new register file is regfile <rfname> <eltwidth> <entries> <shortname>, where <rfname> is a handle used to refer to the register file in subsequent TIE constructs;
  • <eltwidth> is the width in bits of a register file element ("register");
  • <entries> is the number of elements in the register file;
  • <shortname> is a short prefix (often a single letter) used to create register names for the assembly language. Register names are <shortname> with the register number appended.
  • The regfile construct does not declare the number of read or write ports; such physical implementation details are left to the TIE compiler as will be described in greater detail below, thereby keeping TIE as implementation-independent as possible and maintaining TIE as a high-level specification description.
  • The generated processor will include an additional <eltwidth>*<entries> bits of programmer-visible state along with logic to read and write multiple <eltwidth> values of this state.
  • The logic generation algorithm will be described in greater detail below after other relevant TIE language constructs are described.
  • This construct is the same as described in the Killian et al. application, except that <rfname> may designate a register file declared with regfile in addition to the core register file (named "AR").
  • The <oname> handle is then usable in iclass declarations to describe register file in, out, and inout operands in instructions.
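  • As a small illustration of what a regfile declaration implies, the sketch below computes the amount of programmer-visible state and the assembly register names from the four parameters. The function name and the example values (a register file "gf" with 8-bit elements, 16 entries, and prefix "g") are assumptions chosen for the example, not output of the TIE compiler.

```python
def regfile_summary(rfname, eltwidth, entries, shortname):
    """Derive the visible state size and register names implied by a regfile."""
    return {
        "name": rfname,
        # <eltwidth> * <entries> bits of programmer-visible state
        "state_bits": eltwidth * entries,
        # register names are <shortname> with the register number appended
        "registers": [f"{shortname}{i}" for i in range(entries)],
    }


# Hypothetical declaration: regfile gf 8 16 g
gf = regfile_summary("gf", 8, 16, "g")
```

  • Note that nothing in the parameters mentions read or write ports, which matches the statement that port counts are left to the compiler.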
  • A 16-entry, 8-bit register file is created (each register holds a polynomial over GF(2) modulo the polynomial stored in gfmod), and two instructions are defined that operate on these registers.
  • GFADD8 adds the polynomial in the register specified by the s field of the instruction word (the "gs register") to the polynomial in the register specified by the t field of the instruction word (the "gt register"), and writes the result to the register specified by the r field of the instruction word (the "gr register").
  • GFMULX8 multiplies the polynomial in the gs register by x modulo gfmod and writes the result to the gr register.
  • GFRWMOD8 is for reading and writing the gfmod polynomial register.
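  • The arithmetic performed by these two instructions can be sketched in Python. This is an illustrative model only: it assumes gfmod holds the low 8 bits of the degree-8 modulus polynomial, and the function names simply mirror the instruction names.

```python
def gfadd8(gs, gt):
    """Polynomial addition over GF(2): a bitwise XOR of the coefficients."""
    return (gs ^ gt) & 0xFF


def gfmulx8(gs, gfmod):
    """Multiply by x modulo the polynomial whose low 8 bits are in gfmod."""
    shifted = (gs << 1) & 0xFF
    # If the x^7 coefficient was set, the shift overflows into x^8,
    # which is reduced by XORing in the modulus.
    if gs & 0x80:
        return shifted ^ (gfmod & 0xFF)
    return shifted
```

  • With gfmod set to 0x1B (the modulus used by AES, chosen here only as a familiar test value), gfmulx8(0x57, 0x1B) gives 0xAE and gfmulx8(0xAE, 0x1B) gives 0x47.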
  • TIE is a high-level specification that describes instruction sets at a level familiar to users of instruction sets, and not as low-level as written by implementors of instruction sets (i.e., processor designers).
  • An example of register pipeline control logic generated by the TIE code is shown in FIG. 1.
  • This shows a four stage pipelined register which includes on the left side of the Figure a read data pipe formed by four pipeline registers and their corresponding input multiplexers.
  • Each pair of pipeline registers in the read port delineates the boundaries of the C0 (R), C1 (E), C2 (M), C3 (W) and C4 pipeline stages.
  • The output of each pipeline register, rd0_dataC1 - rd0_dataC4, is provided to the register's datapath interposed between the read and write ports (not shown for simplicity).
  • These outputs, as well as outputs of all later pipeline registers in the read port, are provided as inputs to the next stage multiplexer. Control signal generation for the read port multiplexers is described in detail below.
  • The Figure also shows a write port on the right side of the Figure formed by four pipeline registers and corresponding input multiplexers for the three latest pipeline stages therein.
  • Four signals w0_dataC1 - w0_dataC4 from the register datapath are provided to inputs of corresponding ones of the write port register inputs either directly or via multiplexing with an output wr0_resultC2 - wr0_resultC4 of the previous write port pipeline register.
  • These output signals are multiplexed along with the output of the register file xregfile RF and fed to the C0 stage multiplexer of the read port pipeline.
  • Control signals for the multiplexers in the read and write ports are generated along with a write enable for xregfile RF and a stall signal stall_R using the circuitry of FIG. 2 as will be readily apparent to those skilled in the art when read in conjunction with the discussion of compiler generation of register files below.
  • For ease of understanding, a two-stage register file combining the two-stage versions of the circuits of FIGs. 1 and 2 is shown in FIG. 3.
  • Generating Register Files
  • For each register file declared by a regfile statement, the compiler must produce:
  • the first steps in generating a register file are to determine the number of read and write ports, assign pipeline stages to the ports, and assign operands to the ports. Many algorithms could be used to do these operations, each resulting in different speed and area tradeoffs. The following algorithm is used in the preferred embodiment.
  • the above algorithm will generate three register read ports (one each for the r, s, and t fields of the instruction word), even though no instruction uses more than two GF register file reads at the same time.
  • a 2:1 mux in front of one of the read ports to select between the r and s fields or between the r and t fields.
  • This mux must be controlled by decode logic that distinguishes the GFRWMOD and GFADD instructions. In a complicated example, the logic could be substantial, making the register file read take much longer.
  • the extra area required by the algorithm used in the preferred embodiment can generally be avoided by the instruction set designer arranging the register file access fields of instructions such that the number of different fields used to read each register file is equal to the largest number of reads used by any instruction. This is why operand gt is used instead of gr in the iclass gfr in the above example.
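  • The trade-off discussed above can be made concrete with a short sketch that compares "one read port per distinct instruction field" against "the largest number of reads by any instruction". The per-instruction field lists are assumptions reconstructed from the description of the GF example (GFADD8 reads gs and gt, GFMULX8 reads gs, and the gfmod access reads either gr or gt); the function names are hypothetical.

```python
def ports_one_per_field(reads):
    """Read ports needed when every distinct field gets its own port."""
    return len({field for fields in reads.values() for field in fields})


def max_reads_per_instruction(reads):
    """Lower bound: the most register-file reads made by any one instruction."""
    return max(len(fields) for fields in reads.values())


# Variant in which the gfmod instruction reads through the r field (gr).
with_gr = {"GFADD8": ["s", "t"], "GFMULX8": ["s"], "GFRWMOD8": ["r"]}
# Variant in which it reads through the t field (gt) instead.
with_gt = {"GFADD8": ["s", "t"], "GFMULX8": ["s"], "GFRWMOD8": ["t"]}
```

  • In the first variant the per-field rule yields three ports although two would suffice; reusing the t field brings the two counts into agreement without any muxing of fields.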
  • a possible enhancement to the above algorithm is to track the minimum stage number specified in a schedule statement (explained in greater detail in the "Multi-Cycle Instructions in TIE" section below) for each field. If the minimum stage number is greater than the stage number in which instruction decode is performed, then muxing of fields may be used to reduce the number of read ports. For all fields where the minimum stage number is in the instruction decode stage, a separate port for each field used to read the register file is used.
  • the interface of the register file read and write ports to the processor pipeline will vary according to the core processor's pipeline architecture.
  • the core processor's pipeline always uses the read and write ports in a fixed pipeline stage as shown in U.S. Patent Application Serial Numbers 09/192,395 to Dixit et al. and 09/322,735 to Killian et al., both of which are hereby incorporated by reference, where the read ports are always used before the first stage and the write ports after the last (fourth) stage in a four-stage pipelined register file.
  • Each read port will be read in the earliest stage of any instruction that uses it as a source operand; instructions that use such operands in later stages read the register file early and stage the data along to the specified stage.
  • This staging also includes bypass muxes so that results produced by other instructions after the register file is read are still available.
  • The write occurs in the latest stage of any instruction that uses it as a destination operand or in the instruction commit stage, e.g., the W stage, if that stage comes later.
  • FIG. 1 shows the logic schema for register file read and write ports in the preferred embodiment.
  • The bypass logic is illustrated in FIG. 1 and is accomplished by the muxes on the read-port logic. For example, if an instruction produces a result in stage 3 (wr0_data_C3) and a subsequent instruction needs to use the data in stage 1, the control signals to the first mux on the read-port logic will be set such that the fourth input from the left will be selected. Consequently, in the next clock cycle, the data (rd0_data_C1) is available for the instruction.
  • Interlock Logic
  • the interlock logic is illustrated in FIG. 2.
  • Based on the schedule information, the instruction decoding logic generates a useN signal for each read port and a defN signal for each write port for the instruction about to be issued.
  • useN indicates that the instruction will need its input register operand in stage N.
  • defN indicates that the instruction will produce its result in stage N.
  • The defN signal for an instruction is piped along with the instruction in the pipeline.
  • The stall signal is generated by examining the combination of all the defN and useN signals.
  • The following example illustrates the stall logic for a 4-stage pipelined register file with two read ports (rd0 and rd1) and one write port (wd0).
  • The suffix in the signal name (_Cn) indicates that the signal exists in stage n of the pipeline.
  • wfield() and rfield() are functions to construct a signal name from a simple signal name, a port name, and a stage number.
  • the expression is written in an efficient factored form.
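  • The relationship between def and use stages and the resulting stall can be modelled very simply. The sketch below is a deliberately reduced timing model, not the generated stall expression: it assumes a result defined in stage d becomes available through the bypass in the following cycle, that both instructions address the same register, and that the consumer issues a given number of cycles after the producer. The function name and parameters are hypothetical.

```python
def stall_cycles(def_stage, use_stage, issue_gap):
    """Cycles the consumer must wait for a result from an earlier producer.

    def_stage: stage in which the producer defines the register (defN)
    use_stage: stage in which the consumer needs the register (useN)
    issue_gap: cycles between issuing the producer and the consumer (>= 1)
    """
    latency = def_stage - use_stage + 1
    return max(0, latency - issue_gap)
```

  • Under this model, a multiply/accumulate that uses its accumulator in stage 2 and defines it in stage 2 can issue back to back (no stall), whereas an operand used in stage 1 and defined in stage 2 costs one stall cycle.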
  • Because write port addresses are muxed in the preferred embodiment to reduce the hardware cost associated with each write port, it becomes necessary to have an algorithm for determining which operands use which ports.
  • One criterion for this muxing is to minimize the logic required.
  • the primary logic cost is that of staging data to the write port stages. If all writes occur in the same pipeline stage, there is no difference in this logic cost, but if writes occur in multiple stages, logic may be saved by grouping together destination operands with similar write stages.
  • regfile SR 32 8 s operand sx x {SR[x]} operand sy y {SR[y]} operand sz z {SR[z]} operand su u {SR[u]} operand sv v {SR[v]} iclass i1 {inst1} {out sx, out sy, in su, in sv} iclass i2 {inst2} {out sz, in su, in sv} schedule s1 {inst1} {out sx 8; out sy 3;
  • inst1 produces two results for SR, one in 3 cycles and the other in 8 cycles.
  • inst2 produces one result for SR in 9 cycles. Since inst1 needs two write ports and inst2 needs one write port, register file SR only needs to have two write ports. Let the ports be wr0 and wr1. For inst1, the mapping of operands to write ports is simply sx -> wr0, sy -> wr1
  • Mapping sz to wr0 implies adding one more stage to wr0 (increasing from 8 to 9), and mapping it to wr1 implies adding 6 more stages to wr1 (increasing from 3 to 9).
  • The preferred embodiment uses the following algorithm. For each instruction, sort the operands by stage number in descending order and assign them sequentially to write port 0 through write port n-1. Thus write port 0 will have the longest data chains and write port n-1 the shortest. For instructions with m operands where m is less than n, the operands will be mapped to the first m write ports in the same descending order by stage number.
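  • A minimal Python sketch of this assignment follows. The data layout (a dictionary from instruction to its destination operands and def stages) and the function name are assumptions for illustration; the stage numbers are those of the inst1/inst2 example above.

```python
def assign_write_ports(schedules):
    """Map destination operands to write ports, deepest operand first.

    schedules: {instruction: {operand: def_stage}}
    Returns ({operand: port_index}, [pipeline depth needed per port]).
    """
    n_ports = max(len(operands) for operands in schedules.values())
    mapping, depth = {}, [0] * n_ports
    for operands in schedules.values():
        # Sort this instruction's operands by stage number, descending.
        ordered = sorted(operands.items(), key=lambda kv: kv[1], reverse=True)
        for port, (operand, stage) in enumerate(ordered):
            mapping[operand] = port
            depth[port] = max(depth[port], stage)
    return mapping, depth


mapping, depth = assign_write_ports(
    {"inst1": {"sx": 8, "sy": 3}, "inst2": {"sz": 9}}
)
```

  • For the example, sx and sz share wr0 (depth 9) and sy uses wr1 (depth 3), so only one extra stage is added rather than six.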
  • regfile SR 32 8 s operand sx x {SR[x]} operand sy y {SR[y]} operand sz z {SR[z]} operand su u {SR[u]} operand sv v {SR[v]} operand sw w {SR[w]} iclass i1 {inst1} {out sx, out sy, in su, in sv} iclass i2 {inst2} {out sz, in su, in sv} iclass i3 {inst3} {out sw, in su, in sv} schedule s1 {inst1} {out sx 8; out sy 3;
  • Assigning sw to wr0 would require the pipeline to be active for 9 cycles.
  • the following procedure can be used as the second pass to further improve the write-port assignment for additional cost considerations such as power consumption.
  • No operands of inst1 can be moved because it already uses all the write ports.
  • sz cannot be reassigned to wr1 without increasing the staging cost.
  • sw can be re-assigned from wr0 to wr1 without increasing the staging cost.
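  • One way to picture such a second pass is sketched below. It is an assumed heuristic, not the procedure from the specification: each operand is moved to the highest-numbered port that is not used by another operand of the same instruction and that is already deep enough, so no staging is added. The def stage of sw (2) is a hypothetical value, since the schedule for inst3 is not shown above.

```python
def rebalance(schedules, mapping, depth):
    """Second pass: move operands to shallower ports at no extra staging cost."""
    result = dict(mapping)
    for operands in schedules.values():
        for operand, stage in operands.items():
            # Ports already taken by the instruction's other operands.
            used = {result[o] for o in operands if o != operand}
            # Try shallower (higher-numbered) ports, shallowest first.
            for port in range(len(depth) - 1, result[operand], -1):
                if port not in used and depth[port] >= stage:
                    result[operand] = port
                    break
    return result


schedules = {
    "inst1": {"sx": 8, "sy": 3},
    "inst2": {"sz": 9},
    "inst3": {"sw": 2},  # hypothetical stage for sw
}
first_pass = {"sx": 0, "sy": 1, "sz": 0, "sw": 0}
second_pass = rebalance(schedules, first_pass, depth=[9, 3])
```

  • The outcome matches the three observations above: inst1 keeps both ports, sz stays on wr0, and sw moves to wr1.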
  • TIE load and store instructions are required to provide a means for transferring data to and from TIE register files directly to memory. So they must, by this requirement, share the local memories of the memory (M) stage of the core pipeline, i.e., data cache, Data RAM, Data ROM, etc. In addition to sharing the local memory, it is desirable to share as far as is possible other hardware resources used in core load/store. Sharing of resources yields a more optimum solution in terms of area and timing. As will be described below, the address computation logic and the data alignment logic are two sets of resources that are shared between core and TIE load/store.
  • FIG. 6 shows LSSize 927, MemDataOut<n> 901 and MemDataIn<n> 938.
  • LSSize gives the size of the data reference in bytes (1, 2, 4, 8, or 16 in the preferred embodiment).
  • MemDataOut<n> provides store data from the TIE semantics to the core, and MemDataIn<n> provides load data from the core to the TIE semantics.
  • <n> may be 8, 16, 32, 64, or 128.
  • the interface signals represent inputs to the core address adder as shown in FIG. 4. This address logic is intended for supporting the addressing modes
  • The selection between the two modes is made by the LSIndexed interface signal.
  • The immediate used by the I-form is provided on the VAddrOffset input, and the AR[t] value used by the X-form is provided on the VAddrIndex input.
  • VAddrBase is used to provide AR[s]. While other values than AR[s] and AR[t] could be provided on VAddrBase and VAddrIndex by TIE semantic blocks, providing these values allows logic optimization to significantly simplify the resulting logic, and thus keeps the address generation from being timing-critical. This is because the logic optimization would recognize that the VAddrBase (AR[s]) from TIE logic is the same as the base address of the core and reduce it to the same signal.
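  • The address selection just described can be summarised in a few lines of Python. This is a behavioural sketch only: the 32-bit address width is an assumption, and the parameter names are lower-case versions of the interface signals.

```python
ADDR_MASK = 0xFFFFFFFF  # assumed 32-bit virtual addresses


def virtual_address(vaddr_base, vaddr_offset, vaddr_index, ls_indexed):
    """Model of the shared address adder.

    I-form (ls_indexed false): base + immediate offset
    X-form (ls_indexed true):  base + index register value
    """
    addend = vaddr_index if ls_indexed else vaddr_offset
    return (vaddr_base + addend) & ADDR_MASK
```

  • Only the second adder input is muxed; the base input is the same in both forms, which is what lets logic optimization merge it with the core's own base address.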
  • TIE can benefit from the load and store alignment logic in the core, given certain modifications to this logic. Because alignment requires a large amount of logic to implement, avoiding replication for TIE provides a significant area savings. Moreover, replication could introduce timing critical paths due to the heavy loading it compels the local memory outputs and alignment and data select control signals to drive. In order to implement sharing of the alignment resources, though, the modifications exemplified in FIGS. 5 and 6 are required.
  • TIE load/store requires/provides multiple load/store widths as opposed to the 32 bits of core load/store. This means that all the data paths within the alignment logic must increase in width to match the maximum of the TIE or core data width.
  • TIE load could require a more general alignment function as opposed to the simple right shift required by the core. This means that the alignment logic must perform a superset of the TIE alignment function and the core right shift.
  • FIG. 5 shows prior art core load alignment logic for a three-way set associative data cache 803-805 of 128-bit access width and a parallel data RAM 806.
  • The uncached data input 808 is also chosen to be 128 bits wide for cache refill convenience, and the data RAM access is 32 bits wide because it is accessed only through core load/stores whose maximum width is 32 bits.
  • The primary alignment mechanism used is the 4:1 multiplexer 809-812 followed by a byte-level right shift that also does sign extension 814-819.
  • The amount of the shift is given by the load address 813, 821 and the one-hot decoded coreSize signal 820.
  • The store and data RAM data do not require the 4:1 multiplexer because they are already 32 bits wide.
  • the 32 bit wide aligned data is then selected by a series of subsequent multiplexers 822-833 to yield the final core load data 834.
  • FIG. 6 shows an example of load alignment implementation in this embodiment.
  • The primary difference is that all the load data sources 906-911 are now 128 bits wide to support 128 bit-wide TIE load instructions, and the load alignment result is also 128 bits wide.
  • The alignment itself is done using a byte-level rotator 914-918 followed by a sign extender 921-925.
  • A byte-level rotator is required because in this example the TIE semantics happen to call for data rotation (again, in addition to the simple right shift required by the core load alignment).
  • The amount of the shift or rotate is given by the load address 919 and the one-hot decoded LSSize 927 or coreSize 926 signal.
  • The final output of the load alignment could be used either by the TIE coprocessor (the entire 128-bit width 938, providing all the multiple load widths as specified by LSSize) or by the core (only the least significant 32-bit portion 939, providing the three core load widths, 32/16/8-bit, as specified by coreSize).
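  • The rotate-then-extend structure can be illustrated with a behavioural sketch. It assumes little-endian byte numbering within the 128-bit line and models the data as a Python integer; the function name and parameters are hypothetical, and the real logic is of course a mux network rather than arithmetic.

```python
LINE_BITS = 128
LINE_MASK = (1 << LINE_BITS) - 1


def load_align(data128, vaddr, size_bytes, signed=False):
    """Byte-level rotate right by the address offset, then extract and extend."""
    shift = (vaddr & 15) * 8  # byte offset within the 16-byte line
    rotated = ((data128 >> shift) | (data128 << (LINE_BITS - shift))) & LINE_MASK
    bits = size_bytes * 8
    value = rotated & ((1 << bits) - 1)
    # Sign extension, modelled by returning a negative Python integer.
    if signed and value >> (bits - 1):
        value -= 1 << bits
    return value


# A line whose byte i holds the value i.
line = int.from_bytes(bytes(range(16)), "little")
```

  • A core load would consume only the low 32 bits of the aligned result, while a TIE load can consume up to the full 128 bits; a rotate (unlike a plain right shift) keeps the bytes that wrap around the end of the line.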
  • the core provides the virtual address back to the semantic block in addition to the memory data.
  • the virtual address is sometimes needed for additional processing on the load data.
  • this allows load and store instructions to be defined that modify the registers used to form the virtual address. For example, the "update" modes of the core ISA do
  • The bundled write to the base address register AR[s] avoids a separate increment instruction in many inner loops. This is accomplished in TIE as simply as changing "in" to "inout" and adding an assignment.
  • This example loops over two input arrays (px and py) in which the elements are 8 bytes wide, performs a computation (instl), and stores the result in another array (pz). Three out of seven instructions in this loop were used to advance the base pointers for the load and store instructions.
  • tie_loadiu (tie_storeiu) will calculate the virtual address as p+8, load (store) the memory data, and change p to p+8 in one instruction.
  • The initial subtractions are needed to correct px, py, and pz because the first loads now begin at px+8 and py+8 and the first store at pz+8.
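  • The effect of the update forms on the loop can be modelled as follows. This is a sketch under stated assumptions: memory is a Python dictionary keyed by byte address, elements are 8 bytes apart, and load_update, store_update and run_loop are hypothetical stand-ins for tie_loadiu, tie_storeiu and the surrounding loop.

```python
def load_update(mem, ptr):
    """Update-mode load: address is ptr+8, and the pointer becomes ptr+8."""
    ptr += 8
    return mem[ptr], ptr


def store_update(mem, ptr, value):
    """Update-mode store: address is ptr+8, and the pointer becomes ptr+8."""
    ptr += 8
    mem[ptr] = value
    return ptr


def run_loop(mem, px, py, pz, count, op):
    # Initial correction so the first accesses land on px, py and pz.
    px, py, pz = px - 8, py - 8, pz - 8
    for _ in range(count):
        x, px = load_update(mem, px)
        y, py = load_update(mem, py)
        pz = store_update(mem, pz, op(x, y))
    return mem


memory = {0: 1, 8: 2, 100: 10, 108: 20}
run_loop(memory, 0, 100, 200, 2, lambda a, b: a + b)
```

  • The loop body contains no separate pointer increments; the three subtractions before the loop are the only extra cost.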
  • stage numbers of core signals are fixed by the core pipeline, and are not specified in the schedule declarations.
  • the appropriate values are used, however, in the pipeline insertion algorithm described above.
  • The following adds load and store instructions to the Galois-field arithmetic GF unit example above:
  • opcode LGF8.I r=4'b0000 LSCI
  • opcode SGF8.I r=4'b0001 LSCI
  • opcode LGF8.IU r=4'b0010 LSCI
  • opcode SGF8.
  • schedule gfloadxu {LGF8.XU} {use ars 1; use art 1; def art 1; def gr 2;}
  • module loadalign (out, in, va, vamask, TIEload, L16SI, L16UI, L8UI) ;
  • # rotate is done with 4:1 muxes and one 2:1 mux
  • L16SI; wire vam[3:0] TIEload
  • L16SI ; wire vam[2 : 0] TIEload va & vamask
  • L16SI; wire vam [1:0] TIEload ? va & vamask
  • Loads and stores are typically processed within the processor pipeline using a data cache or a small data RAM. For both cost and correctness, the new load and store instructions must also use this data cache/RAM to maintain the integrity of the cache/RAM data which is processed by both TIE and core instructions. In prior art systems, instructions added to the core did not share logic with the core. The preferred embodiment provides a mechanism for such sharing.
  • out] declares a signal <sname> that interfaces to TIE module <mname>. This signal is <width> bits wide, and is either an input or output to this TIE code according to the last parameter.
  • <mname> is core.
  • The TIE iclass construct is extended to list interface signals used by instructions. Its syntax is iclass <classname>
  • FIG. 7 illustrates the implementation of output interface signal sname by the TIE compiler.
  • sname_semi represents the value of sname produced by the i'th semantic block.
  • iN1 and iN2 are one-bit instruction decode signals, and
  • sname_semi_sel is a signal representing the condition under which the i'th semantic produces sname.
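  • One plausible reading of this arrangement is an AND-OR selection: each semantic block's value is gated by its select condition, and the gated values are ORed together to form sname. The sketch below models that reading only; FIG. 7 is not reproduced here, so the exact gate structure, the function name, and the way the select is derived from the decode bits are assumptions.

```python
def drive_output(width, semantics):
    """Combine per-semantic values of an output interface signal.

    semantics: list of (value, decode_bits) pairs, one per semantic block,
    where decode_bits are the one-bit decode signals (e.g. iN1, iN2) of the
    instructions for which that semantic block produces the signal.
    """
    mask = (1 << width) - 1
    out = 0
    for value, decode_bits in semantics:
        select = any(decode_bits)  # assumed form of the per-semantic select
        if select:
            out |= value & mask
    return out
```

  • Since at most one instruction is decoded at a time, at most one select is active and the OR acts as a multiplexer; when none is active the signal is driven to zero in this model.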
  • Each input interface signal is fed directly to the modules which use the signal.
  • Compiler/OS Support in TIE
  • It may also be necessary for the compiler to move a value from one register to another. For example, the value produced by a function may be returned in register A, and the next instruction may require that the value be used from register B.
  • the compiler can move the value from register A to register B by first storing register A to a temporary memory location, and then loading register B from that memory location. However, it is likely to be more efficient to move the value directly from register A to register B. Thus it is desirable, but not required, that the compiler know how to move a value from one register to another.
  • the save and restore sequences may be more complex than a simple concatenation of the save and restore sequences of the individual registers. In doing the entire register file, there may be opportunity for performance and/or space savings versus the obvious concatenation of the spill instructions. This may also include coprocessor state that is not in a register file.
  • the state of each coprocessor is composed of a variety of different and potentially interdependent components.
  • the instruction sequence used to save and restore these components may depend on the interdependencies.
  • This dependency information can be expressed as a graph. If the graph is cyclic, then the state cannot be successfully saved at an arbitrary point in time. But if the dependency graph is acyclic (a DAG) then there is a way to order the save and restore of the components so that all of the coprocessor's state can be saved and restored at an arbitrary point in time.
  • the TIE compiler uses standard graph construction and analysis algorithms to generate and analyze this dependency information and takes this information into account when generating the save and restore sequence for a given coprocessor.
  • regfile_a has four 32-bit registers and regfile_b has sixteen 128-bit values.
  • The additional state is a bitfield of which registers have been touched, called reg_touched, and a push register to back register 0 of regfile_a, called reg_back.
  • The coprocessor provides the following load and store instructions to save and restore the coprocessor state:
  • reg_b_register loads the register file regfile_b from the address specified by regfile_a's register
  • s32a reg_a_register, reg_a_register: stores the register file regfile_a into the address specified by regfile_a's register
  • reg_a_register loads the register file regfile_a from the address specified by regfile_a's register
  • The DAG for this save state dependency looks like: reg_touched <-- regfile_a, regfile_b, reg_back
  • regfile_a because the save of the registers in regfile_a requires a free register in regfile_a. To get a free register in regfile_a requires that the register's value be moved through reg_back. This destroys the current value of reg_back.
  • regfile_a <-- regfile_b because the store instructions for regfile_b use a register in regfile_a as the address to which to store. This means that regfile_b can only be stored once regfile_a is already stored (actually only one register in regfile_a). This is glossed over for simplicity of the example.
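The ordering step described above can be sketched as a topological sort. This is a minimal illustration in Python, not the TIE compiler's code: the `save_order` helper is hypothetical, and the "must be saved before" constraints are one reading of the prose in this example (reg_touched first, reg_back before regfile_a, regfile_a before regfile_b).

```python
from graphlib import CycleError, TopologicalSorter


def save_order(saved_before):
    """Return an order in which state components can be saved.

    saved_before maps each component to the set of components that must be
    saved before it. A cyclic set of constraints raises ValueError, which
    mirrors the case where the state cannot be saved at an arbitrary time.
    """
    try:
        return list(TopologicalSorter(saved_before).static_order())
    except CycleError as err:
        raise ValueError(f"cyclic save dependency: {err.args[1]}") from err


# Assumed constraints for the example coprocessor (illustrative reading).
deps = {
    "reg_back": {"reg_touched"},
    "regfile_a": {"reg_touched", "reg_back"},
    "regfile_b": {"reg_touched", "regfile_a"},
}
order = save_order(deps)
print(order)
```

With these constraints only one ordering is possible, so the result does not depend on tie-breaking inside the sorter.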
  • Because the preferred embodiment allows the definition of register files whose elements cannot be represented by the built-in types of standard programming languages (e.g., 64+ bits in C or saturating arithmetic as described above), it is necessary to have a mechanism for adding new types to match the defined hardware. Programming language types are also useful for determining to which register files a variable may be allocated.
  • The TIE construct ctype <tname> <size> <alignment> <rfname> creates a programming language type <tname> and declares it to be <size> bits, aligned on an <alignment>-bit boundary in memory, and allocated to <rfname>.
  • The TIE construct proto <pname> { <ospec>, ... } { <tspec>, ... } { <inst> ... } is used to specify instruction sequences that perform various functions that the compiler must know about or to give type information about the operands of intrinsics.
  • <ospec> are operand type specifications
  • <tspec> are temporary register specifications needed by the instruction sequence
  • <inst> are the instructions of the sequence.
  • <ospec> is [in | out | inout] <typename> [*] <oname> where <oname> is an operand name that may be substituted into the instructions (<inst>) of the sequence. <typename> is the type name of the operand (a pointer to that type if the optional asterisk is given).
  • The syntax of temporary register specification <tspec> is <rfname> <oname> where <oname> is an operand name that may be substituted into the instructions (<inst>) of the sequence.
  • <typename> is a type name that identifies the register file from which <oname> should be temporarily allocated for this sequence.
  • The syntax of the instructions in the sequence <inst> is <iname> [ <oname>
  • <inst> sequence should consist of a single instruction.
  • An example might be: proto GFADD8 {out gf8 r, in gf8 s, in gf8 t} {} {GFADD8 r, s, t;}
  • <tspec> may be nonempty.
  • An additional use of proto is to instruct the compiler how to load and store values of programming language types declared using the ctype TIE construct. As discussed earlier, being able to load and store values to and from memory is necessary for the compiler to perform register allocation, and to allow a register file's contents to be saved and restored on a task switch.
  • The <tname>_loadi proto tells the compiler the instruction sequence that should be used to load a value of type <tname> into a register from memory.
  • The <tname>_storei proto tells the compiler the instruction sequence that should be used to store a value of type <tname> from a register into memory.
  • GFADD8I r, s, 0;} would be required input to the preferred embodiment to have the compiler do register allocation of gf8 variables; they would also be required input to generate the task state switch sequence for the gf register file.
  • A final use of proto is to define the allowed conversions between built-in and new types, and between different new types. Conversion prototypes are not required; if, for example, a conversion between new type A and new type B is not specified, the compiler does not allow variables of type A to be converted to variables of type B. For each pair of new or built-in types <t1name> and <t2name> (at most one of which can be a built-in type; this mechanism does not allow specification of a conversion between two built-in types, since that conversion is already defined by the programming language) there can be up to three proto declarations of the form: proto <t1name>_rtor_<t2name> {out <t2name> <x>, in <t1name> <y>} {<tspec>, ...}
  • compilers maintain type information for each program variable and compiler-generated temporary variable.
  • built-in variable types correspond to the high-level-language types (e.g., in C, char, short, int, float, double, etc.).
  • For each built-in type, the compiler must know the name of the type, the size and alignment requirements for the type, and the register file to which values of the type must be allocated. For new types, this information is provided by the ctype language construct. Using the ctype information, the compiler generates an internal type structure to represent that type, and uses that type for program variables and compiler-generated temporaries in a manner identical to that done for built-in types.
  • The prior art GNU C compiler represents types internally using the enumerated type machine_mode. Related types are grouped together in classes, described by the enumerated type mode_class. To support the new types, one skilled in the art can add an enumerator to mode_class to represent the class of types that represent user-defined types, and can add one enumerator to machine_mode for each new type declared using the ctype TIE language construct. For example, assuming the class representing the new types is called MODE_USER, the definition of mode_class in file machmode.h becomes: enum mode_class { MODE_RANDOM, MODE_INT, MODE_FLOAT,
  • Enumerators are added to machine_mode by inserting lines in file machmode.def. Each line defines a new type, its name, its class, and its size (given in 8-bit bytes). Enumerators for user-defined types are named U<n>mode, where <n> is a number between zero and the total number of user-defined types. For example, to add an internal type to represent user-defined type gf8 from the earlier example, the following line is added:
  • the code selector (or code generator) is responsible for substituting a sequence of low-level instructions (corresponding more or less to assembly instructions) for each internally represented instruction.
  • the code selector determines which instruction sequence to substitute by examining the operation performed by the internal instruction, and by the type of the operands to the instruction. For example, an internal instruction representing an add may have as input two values of type int and have as output one value of type int; or may have as input two values of type float and have as output one value of type float. Based on the types of the input and output values, the code selector chooses either the sequence of instructions to perform an integer add or the sequence of instructions to perform a floating-point add.
  • the load, store, move, and conversion proto definitions describe the instruction sequences to substitute for internal instructions that have one or more operands with a user-defined type.
  • The code selector consults the gf8_loadi proto to determine the instruction sequence that should be substituted for that instruction.
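The type-driven selection described above can be pictured as a table lookup keyed by operation and operand type. The following Python sketch is illustrative only: the table layout, the `select` helper, and the register names are assumptions, and the integer and floating-point mnemonics are placeholders; only the GFADD8 form is taken from the proto example in this description.

```python
# (operation, operand type) -> instruction templates. Entries are illustrative.
SELECTION_TABLE = {
    ("add", "int"): ["add {dst}, {a}, {b}"],
    ("add", "float"): ["add.s {dst}, {a}, {b}"],
    ("add", "gf8"): ["GFADD8 {dst}, {a}, {b}"],  # as a proto would describe it
}


def select(op, type_name, **operands):
    """Return the instruction sequence for an internal instruction."""
    try:
        templates = SELECTION_TABLE[(op, type_name)]
    except KeyError:
        raise LookupError(f"no sequence for {op} on type {type_name}") from None
    # Substitute the actual operand names into each template.
    return [template.format(**operands) for template in templates]


print(select("add", "gf8", dst="g0", a="g1", b="g2"))
```

In this picture, each proto supplied for a user-defined type contributes one more entry to the table, so the selector itself needs no special case for new types.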
  • the instructions available in the target processor are described using instruction patterns; see, e.g., Stallman, "Using and Porting GNU CC" (1995) for more information. These instruction patterns describe the instruction, including the number and type of the operands.
  • Each load, store, move, and conversion proto is converted to the instruction pattern expected by the compiler.
  • The gf8_load proto is represented with the following pattern (assuming the gf8 ctype has been mapped to machine_mode enumerator U0mode):
  • Protos that specify a temporary register are converted to an instruction pattern that overwrites or "clobbers" an operand of the appropriate type.
  • the compiler will ensure that the clobbered operand is unused at the location of the instruction, so that the instruction can use it as a temporary.
  • The following load proto for user-defined type tt generates an instruction pattern containing a clobber: proto tt_loadi {out tt x, in tt* y, in immediate z} {char t} {
  • An intrinsic function declaration file is generated that contains definitions of all TIE instructions as functions using GNU asm statements.
  • Each instruction function is qualified with the C volatile property to suppress optimization that could otherwise occur.
  • This method, though safe, prevents certain compiler optimizations where the TIE instructions can be safely re-ordered.
  • The present invention improves the prior art system in two ways. First, only the load and store instructions are declared as volatile, therefore giving the compiler maximum freedom to reorder the instructions during code optimization. Second, instructions using special and user-declared states are declared with an explicit state argument, therefore giving the compiler more accurate information about the side effects of the instructions.
  • The following header file is generated from the TIE compiler to declare all instructions in the GF example as intrinsic functions:
  • Arithmetic instructions such as GFADD8I are not declared as volatile.
  • Load and store instructions such as LGF8_I are declared as volatile.
  • Instructions which read or write processor states such as GFRWMOD8 have one more argument _xt_state to signal the compiler that these instructions have side effects.
  • Prior art systems include register allocation algorithms designed for portability. Portability requires that the compiler support a wide variety of ISAs. Even though these ISAs are not themselves configurable or extensible, a compiler that must target any of them must take a generic approach to register allocation. Thus, prior art systems may allow multiple register allocation, and some may restrict programming language types to certain register files.
  • the prior art GNU C compiler allows any number of register files to be specified by modifying the machine description of the target.
  • One skilled in the art can add support to GCC for one or more new register files by modifying the machine description for the target as described in "Using and Porting GNU CC".
  • For each TIE regfile construct, the compiler is automatically configured to assign values to the registers in that register file.
  • The regfile construct indicates the number of registers in the register file.
  • The TIE ctype construct specifies the register file that values of that type should be assigned to. The compiler uses this information, as well as the number of registers in the register file, when attempting to assign each program value that has a user-defined type.
  • The regfile construct for the gf registers is: regfile gf 8 16 g. This indicates that there are 16 gf registers, each with size 8 bits.
  • The ctype construct for the gf8 type is: ctype gf8 8 8 gf, indicating that values of type gf8 must be assigned to the gf register file.
  • The compiler will allocate all values of type gf8 to the gf register file, which has 16 registers.
  • Prior art systems include instruction scheduling algorithms that reorder instructions to increase performance by reducing pipeline stalls. These algorithms operate by simulating the target processor's pipeline to determine the instruction ordering that results in the fewest number of stall cycles, while satisfying other pipeline constraints such as issue width, and function unit availability.
  • The prior art GNU C compiler simulates the processor's pipeline by determining, for any pair of instructions, the number of stall cycles that would result if one instruction were scheduled immediately after another. Based upon the stall information for each instruction pair, the compiler attempts to find an ordering of instructions that minimizes the total stall cycles. For new TIE instructions, the compiler determines the stall cycles by using information provided by the TIE language schedule construct. To determine the number of stalls that would occur if instruction B is scheduled immediately after instruction A, the compiler compares the pipeline stage for the write of each output operand in A with the pipeline stage for the read of each corresponding input operand in B.
  • The difference in these values indicates the minimum number of cycles that must separate A from B to avoid stalls.
  • A value of one indicates that B can be scheduled immediately after A without stalling; a value of two indicates that scheduling B immediately after A will result in one stall cycle, etc.
  • the maximum stall value over all operands written by A is the number of stall cycles that would result if B were scheduled immediately after A.
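The pairwise stall computation can be sketched as follows. This Python fragment is a hedged illustration: it uses the separation formula max(SA - SB + 1, 0) given with the schedule construct in this description, treats a separation of one as "no stall", and assumes operands are matched by register name. The helper names and the stage numbers in the example are hypothetical.

```python
def separation(def_stage, use_stage):
    """Minimum distance in cycles between producer A and consumer B."""
    return max(def_stage - use_stage + 1, 0)


def stall_cycles(defs_a, uses_b):
    """Stalls incurred if B is scheduled immediately after A.

    defs_a maps each register written by A to the stage where it is defined;
    uses_b maps each register read by B to the stage where it is used.
    """
    shared = defs_a.keys() & uses_b.keys()
    worst = max((separation(defs_a[r], uses_b[r]) for r in shared), default=0)
    # A separation of 1 means "back to back", so one fewer cycle is a stall.
    return max(worst - 1, 0)


# Hypothetical stages: a load defines x3 in stage 2, an add uses x3 in stage 1.
print(stall_cycles({"x3": 2}, {"x3": 1}))
```

Taking the maximum over all shared operands reflects the rule that the slowest dependency determines how long B must wait.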
  • The xt operand in the ALD instruction, x3, is the same as the xa operand in the AADD instructions.
  • the AADD instruction must be scheduled (def xt) -
  • The xt operand in the ALD instruction, x3, is the same as the xb operand in the AADD instructions.
  • AADD x0, x1, x3
Lazy State Switch
  • Adding register files to processors significantly increases the quantity of state that must be saved and restored as part of task switching in a multi-tasking environment as implemented by most real-time operating systems. Because the additional state is often specific to certain computations which are performed in a subset of the tasks, it is undesirable to save and restore this additional state for every task switch because doing so unnecessarily increases the task switch cycle count. This can also be an issue in non-extensible processors, for which a solution exists in the prior art. For example, the MIPS R2000 CPENABLE bits allow for "lazy" switching of coprocessor registers from one task to another. The preferred embodiment allows lazy switching to be applied to the state created via processor extension (the TIE state and regfile declarations).
  • Task A uses cp_0
  • Task A uses cp_1
  • Task A's use of cp_1 causes an exception. This exception sets the valid bit for cp_1.
  • The run-time, seeing that Task B owned cp_1, saves the contents of cp_1 to Task B's stack. It then restores Task A's state to cp_1.
  • Task B uses cp_1
  • Task B's use of cp_1 causes an exception. This exception turns on the valid bit for cp_1.
  • The run-time sees that Task A currently owns cp_1 and saves the current state to Task A's save area. The run-time then restores Task B's state to cp_1.
  • The lazy switch mechanism requires that state be grouped into sets to which access can be enabled or disabled; that access to disabled state cause an exception; that the exception handler can determine which state must be switched; and that the exception handler can save the state to memory, restore it from memory, and re-enable access.
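The requirements above can be illustrated with a small software model of the run-time. This is a sketch under stated assumptions, not the run-time of the preferred embodiment: the class name, the per-task save area keyed by (task, coprocessor), and the use of Python dictionaries to stand in for coprocessor state are all hypothetical.

```python
class LazySwitchRuntime:
    """Toy model of lazy coprocessor state switching."""

    def __init__(self, num_coprocessors):
        self.enabled = [False] * num_coprocessors  # models the enable bits
        self.owner = [None] * num_coprocessors     # task whose state is live
        self.live = [None] * num_coprocessors      # state held in hardware
        self.save_area = {}                        # (task, cp) -> saved state
        self.current = None
        self.exceptions = 0

    def switch_to(self, task):
        """Task switch: no state is copied, only the enable bits change."""
        self.current = task
        self.enabled = [owner == task for owner in self.owner]

    def access(self, cp):
        """Return the live state of cp, trapping first if access is disabled."""
        if not self.enabled[cp]:
            self._exception(cp)
        return self.live[cp]

    def _exception(self, cp):
        self.exceptions += 1
        previous = self.owner[cp]
        if previous is not None and previous != self.current:
            # Save the previous owner's state only now that it is displaced.
            self.save_area[(previous, cp)] = self.live[cp]
        self.live[cp] = self.save_area.get((self.current, cp), {})
        self.owner[cp] = self.current
        self.enabled[cp] = True


rt = LazySwitchRuntime(2)
rt.switch_to("A")
rt.access(1)["x"] = 1
rt.switch_to("B")
rt.access(1)["x"] = 2
rt.switch_to("A")
print(rt.access(1)["x"], rt.exceptions)
```

Coprocessor 0 is never touched in this run, so its state is never saved or restored, which is the saving the lazy scheme is meant to deliver.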
  • The TIE construct coprocessor <cname> <cnumber> { <sname>, ... } declares that the state named by <sname>, ... is a group for the purpose of lazy switching. This grouping is given the name <cname>, and a number <cnumber> in the range 0 to 7. It is an error if any of <sname>, ... are named in more than one coprocessor statement.
  • A list of instructions is created that have <sname> in the in/out/inout list of the iclass.
  • A signal is then created that is the OR of the instruction one-hot decodes for these instructions. This signal is ANDed with the complement of the CPENABLE bit.
  • In the core processor of the preferred embodiment, different exceptions all use the same vector and are distinguished by the code loaded into the EXCCAUSE register by the exception.
  • The core processor has reserved eight cause codes (from 32 to 39) for these exceptions.
  • The TIE compiler adds bit <cnumber> to the CPENABLE register, adds logic to the processor to cause an exception if bit <cnumber> is clear and any instruction accessing <sname>, ... is executed, and adds logic to the processor to load 32+<cnumber> into the EXCCAUSE register when that exception is recognized by the core.
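The generated exception logic amounts to a small Boolean function. The Python sketch below models it in software for illustration; the function name and argument layout are assumptions, while the OR of the decodes, the AND with the complemented CPENABLE bit, and the cause code 32+<cnumber> follow the description above.

```python
COPROCESSOR_CAUSE_BASE = 32  # first of the eight reserved cause codes


def coprocessor_exception(decodes, cpenable, cnumber):
    """Return the EXCCAUSE code for a disabled-coprocessor access, or None.

    decodes:  one-hot decode bits of the instructions that touch the group
    cpenable: integer value of the CPENABLE register
    cnumber:  coprocessor number, 0 to 7
    """
    if not 0 <= cnumber <= 7:
        raise ValueError("coprocessor number must be in the range 0 to 7")
    uses_group = any(decodes)                      # OR of the decodes
    disabled = ((cpenable >> cnumber) & 1) == 0    # complement of the bit
    if uses_group and disabled:
        return COPROCESSOR_CAUSE_BASE + cnumber
    return None


print(coprocessor_exception([0, 1, 0], 0b00000000, 3))
```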
  • compilers for such processors should include algorithms to reorder instructions to minimize pipeline stalls.
  • the first item is typically implemented by processor designers by writing logic that has pipeline registers inserted at carefully chosen locations.
  • The second item is typically implemented by comparing the source operands of an instruction to be issued to all not-yet-computed destination operands in the pipeline, and holding the instruction if there is a match. These three items must be coordinated. If the pipelining of the computational logic does not match the changes to the issue logic, then the processor may produce incorrect results. If reordering to minimize pipeline stalls is inconsistent with pipelining the combinational logic, then sub-optimal performance will result (e.g., scheduling a use of a result before it is ready will result in a pipeline stall). Take the following example:
  • If MUL logic is carried over two cycles but the control logic issues one instruction every cycle, a6 will have incorrect results because a3 does not have the correct value at the time the ADD instruction needs it.
  • The issue logic must know that MUL is pipelined over two stages and stall one cycle before issuing the ADD instruction. Even though stalling the ADD instruction by one cycle results in correct logic, it does not provide optimal performance. By switching the order of the ADD and SUB instructions, it is no longer necessary to stall any instructions in this example, which results in optimal performance. This can only be achieved by appropriate coordination between implementation of MUL logic, implementation of instruction issuing logic, and instruction re-ordering (scheduling).
  • the instruction set simulator of the preferred embodiment uses the same specification of scheduling information in its timing model. This allows application developers using all the features of the preferred embodiment to get good predictions of performance before the hardware is built without running their applications on a slow HDL simulator.
  • The TIE language now includes the declaration schedule <schedulename> { <iname>, ... } in <oname> <stage>;
  • <iname> are the names of instructions; <oname> is an operand or state name, and
  • <stage> is an ordinal denoting a pipeline stage.
  • The def stage numbers used by TIE are one less than the values described in Chapter 10 of the Xtensa™ Instruction Set Architecture (ISA) Reference Manual by Killian and Warthman and thus the separation between instructions is max(SA - SB + 1, 0) instead of max(SA - SB, 0).
  • The TIE compiler as described in the Killian et al. and Wilson et al. applications is extended to insert pipeline registers into the semantic logic specification as follows.
  • A stage number is assigned to every input to the semantic block. Instruction decode signals and immediate operands are assigned implementation-specific numbers (0 in the preferred embodiment).
  • Register source operands, state registers, and interface signals (described below) are assigned stage numbers from the TIE schedule declaration (with an implementation-specific default - 1 in the preferred embodiment).
  • each node of the semantic block is visited in postorder (that is after each of its predecessor nodes has been visited).
  • the stage number of the node NS is the maximum stage number of any of its inputs.
  • For each input whose stage number IS is less than NS, the compiler inserts NS-IS pipeline registers between the input and the node.
  • The output register operands, state registers and interface signals are visited. If the stage number from the semantic block IS is greater than the stage number OS declared in the schedule statement, the input TIE specification is in error. Otherwise, if OS > IS, then OS-IS pipeline registers are inserted before the output.
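The register-insertion pass can be sketched as a single walk over the semantic block. The Python below is an illustration under assumptions: the data layout (dictionaries listed in dependency order), the function name, and the node names t1 to t3 are hypothetical, while the stage numbers follow the schedule example in this description.

```python
def insert_pipeline_registers(input_stage, nodes, outputs):
    """Assign stages to nodes and count pipeline registers per connection.

    input_stage: signal name -> stage number of each semantic-block input
    nodes:       node name -> list of operand names, with every operand
                 listed before the nodes that use it (assumed ordering)
    outputs:     output name -> (driving node, stage declared in schedule)
    Returns (stage of every signal, list of (source, sink, register count)).
    """
    stage = dict(input_stage)
    registers = []
    for node, operands in nodes.items():
        ns = max(stage[op] for op in operands)   # node stage NS
        stage[node] = ns
        for op in operands:
            if stage[op] < ns:                   # input stage IS below NS
                registers.append((op, node, ns - stage[op]))
    for out, (driver, declared) in outputs.items():
        produced = stage[driver]
        if produced > declared:
            raise ValueError(f"{out}: computed in stage {produced}, "
                             f"declared in stage {declared}")
        if declared > produced:
            registers.append((driver, out, declared - produced))
    return stage, registers


inputs = {"ars": 1, "art": 1, "s1": 2, "s2": 2, "s3": 1}
nodes = {"t1": ["ars", "art", "s3"], "t2": ["t1", "s1"], "t3": ["t2", "s2"]}
stage, regs = insert_pipeline_registers(inputs, nodes, {"arr": ("t3", 3)})
print(stage["t3"], regs)
```

Here t1 is computed in stage 1 and consumed in stage 2, so one register is placed between t1 and t2, and one more delays the result to the declared stage 3 of arr.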
  • schedule complex {example} { in ars 1; /* using operand ars in stage 1 */ in art 1; /* using operand art in stage 1 */ in s1 2; /* using state s1 in stage 2 */ in s2 2; /* using state s2 in stage 2 */ in s3 1; /* using state s3 in stage 1 */ out arr 3; /* defining operand arr in stage 3 */ }
  • It can be useful to have a semantic block that uses or defines a register operand in one pipeline stage for one instruction, and in another stage for a different instruction, because the two instructions may share some common logic. Specifying the instructions in two separate semantic blocks would require unnecessary duplication of logic. This is a possible extension in a variation on the preferred embodiment. This capability would be supported by using separate signal names in the semantic block for two operands, e.g., <operand>@<stage> instead of just <operand>. Once this modification is made, the above algorithms operate correctly even in the multi-system environment.
  • a divide instruction may cause an exception when the divisor is zero.
  • The preferred embodiment of the present invention supports this capability from TIE by first declaring the new exception: exception <ename> <exceptioncode> { <exc1>, ... } <string> where <ename> is the name of the instruction and the signal used in semantic blocks to raise it; <exceptioncode> is the value passed to the software exception handler to distinguish this exception from others; <exc1>, etc., are lower-priority exceptions; and <string> is a descriptive string to be used in the documentation.
  • Exception signals may be listed in iclass declarations as described above. With this declaration, a single-bit signal having the exception's name is created within semantic TIE blocks containing the defined instruction, and this signal must be assigned.
  • FIG. 9 shows the logic generated by the TIE compiler to combine exception signals from multiple TIE blocks and to prioritize between exceptions when more than one are signaled by a single instruction.
  • The exception signal may also be given a stage number in the schedule declaration.
  • the core processor processes all exceptions in its M pipeline stage.
  • the stage number specified by the schedule declaration is checked to ensure that it is less than or equal to the stage number of the M-stage, and if not an error is signaled at compile time. If the specified stage number is less than or equal to the stage number of the M-stage, then the stage number of the M-stage is used instead. Thus, the logic of FIG. 9 is evaluated in the M-stage.
  • The exception signal generated by each semantic block is ANDed with the OR of the one-hot instruction decode signals that declare the exception signal in their interface section (this allows the TIE code to only produce a valid exception signal when instructions that raise that exception are executed).
  • all of the exception signals are ORed to produce a single signal indicating that some exception is occurring. This signal is processed by the core as in the prior art.
  • a priority encoder is used to determine which exception code will be written into the core processor's EXCCAUSE register.
  • The list of lower-priority exceptions is used to form a directed graph (if a cycle is detected, it is considered a compile-time error).
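The priority resolution can be sketched in software as a graph ordering followed by a first-match scan. The Python below is illustrative only: the helper names and the exception codes 40 and 41 are hypothetical, and the example relies on the statement in this description that Overflow is lower-priority than Divide by Zero.

```python
from graphlib import CycleError, TopologicalSorter


def priority_order(lower):
    """Order exceptions from highest to lowest priority.

    lower maps each exception name to the exceptions declared as
    lower-priority than it. A cycle is reported as an error.
    """
    graph = {name: set() for name in lower}
    for high, lows in lower.items():
        for low in lows:
            graph.setdefault(low, set()).add(high)  # high must come first
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as err:
        raise ValueError(f"cyclic exception priority: {err.args[1]}") from err


def resolve(raised, codes, order):
    """Model the priority encoder: return (exception flag, cause code)."""
    for name in order:
        if name in raised:
            return True, codes[name]
    return False, None


codes = {"DivideByZero": 40, "Overflow": 41}  # hypothetical codes
order = priority_order({"DivideByZero": ["Overflow"], "Overflow": []})
print(resolve({"Overflow", "DivideByZero"}, codes, order))
```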
  • Exception exc1 can be raised by inst1 in C1 and by inst4 in C3, exc2 by inst2 in C3, and exc3 by inst3 in C2.
  • All exception signals are generated in their declared stages and pipelined forward to the commit stage, at which point the exception cause value is computed by selecting the exception code by the priority of exception signals as specified in the above TIE description.
  • The exception signal Exception and the cause signal ExcCause feed to the core. Once an exception is handled, the core will issue a signal back to TIE logic to kill all the instructions in the pipeline and effectively clear the remaining unhandled exceptions.
  • FIG. 10 shows a circuit described by the code below which has two exceptions and some instructions that generate one exception and one that generates both.
  • Overflow is lower-priority than Divide by Zero (actually both cannot occur at the same time in a divide, so the relative priority is irrelevant).
  • each pictured semantic block generates some subset of the total set of TIE exceptions; thus, exact wirings are input-dependent.
  • Exception outputs are pipelined to the resolution stage by the TIE schedule mechanism.
  • FIG. 10 shows an arrangement in which all TIE exceptions have a single fixed priority relative to all core exceptions.
  • A straightforward extension would allow the TIE exception statement to refer explicitly to various core exceptions.
  • The TIE compiler would then be able to generate a priority encoder that combines TIE and core exceptions.
  • the preferred embodiment allows the specification of two sets of semantics.
  • One set is called the reference semantics.
  • This semantic definition is generally written for clarity to define the expected operation of the instruction.
  • The second set of semantics, implementation semantics, is for hardware implementation.
  • Reference semantics are simple and direct. The semantic description, however, has to concern itself with the implementation efficiency, specifically in this case to share the adders required by the three instructions. To do this, it relies on the mathematical identity that subtracting a number is the same as adding the bit-wise complemented number and a constant of 1. Reference semantics also allow an instruction set to be defined once, via the reference semantics, and then implemented multiple times with different sets of implementation semantics. Having a single ISA definition with multiple implementations is common practice in the industry, though usually the reference semantics are defined only in the ISA documentation instead of formally. The preferred embodiment reverses this typical procedure and defines the reference semantics formally and derives the documentation from the TIE specification, rather than vice versa.
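The adder-sharing identity can be checked numerically. The short Python sketch below assumes a 32-bit datapath for illustration; the function name is hypothetical.

```python
MASK = 0xFFFFFFFF  # assumed 32-bit datapath


def sub_via_adder(a, b):
    """Compute a - b with one adder: a + (bitwise NOT b) + 1, modulo 2**32."""
    return (a + (~b & MASK) + 1) & MASK


print(sub_via_adder(5, 3))
```

Because the subtraction reduces to an addition with an inverted operand and a carry-in of 1, the same adder can serve both add and subtract instructions.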
  • the circuits generated from reference and implementation semantics are in general not equivalent. For a given instruction, only a subset of output signals will be set. For the rest of the output signals, the reference and implementation semantics may choose to assign different values based on cost criteria or ease of description because they are logically "don't cares", i.e., they are unused.
  • the preferred embodiment solves this problem by creating additional logic such that the output signals produced by a particular instruction are unchanged and the rest of output signals are forced to a particular logic value such as 0, as illustrated in FIG. 11.
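The effect of the additional logic can be sketched as a masking step applied to both circuits before comparison. This Python fragment is illustrative; the function name, signal names, and values are hypothetical.

```python
def force_dont_cares(outputs, produced):
    """Keep outputs the instruction produces; force all others to 0."""
    return {name: (value if name in produced else 0)
            for name, value in outputs.items()}


# Hypothetical outputs for one instruction that only defines arr.
reference = {"arr": 7, "s1": 0}
implementation = {"arr": 7, "s1": 123}   # s1 is a don't-care here
produced = {"arr"}
print(force_dont_cares(reference, produced) ==
      force_dont_cares(implementation, produced))
```

After masking, the two descriptions agree on every output, which is what allows them to be compared for equivalence.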
  • A typical instruction set reference manual can include for each instruction its machine code format; its package; its assembler syntax; a synopsis (a one-line text description of the instruction); a full text description of the instruction; and a more precise operational definition of the instruction, as well as additional information such as assembler notes and exceptions associated with the instruction. All of the information necessary to generate the machine code format is already found in the TIE specification since it contains the opcode bits and the operand fields. Similarly, the assembler syntax is derived from the mnemonic and operand names. The TIE reference semantics become the precise definition. Only the synopsis and text description are missing. The preferred embodiment therefore adds constructs to TIE to allow the instruction set designer to specify the synopsis and text description.
  • The TIE package specification has the format package <pname> <string> endpackage <pname>
  • The package name <pname> is associated with all instructions defined between package and endpackage. Packages have other uses than for documentation, as described below.
  • The <string> parameter gives the name of the package for documentation purposes (it may have spaces).
  • The TIE synopsis specification has the format synopsis <iname> <string> where <string> is a short (approximately half a line) description of the instruction. No formatting control is required in this text. This text is typically used for headings in books and additional material in instruction lists.
  • The TIE description specification has the format description <iname> <string> where <string> is a long (usually several paragraphs) string containing text describing the operation of the instruction in English or another natural language. There is a need for text formatting commands in this text.
  • the preferred embodiment implements an HTML-like language (the specification for HTML may be found, e.g., at http://www.w3.org/TR/REC-html40).
  • Two optional documentation strings are supported: assembly_note <iname> <string> implementation_note <iname> <string>
  • These optional specifications provide additional per-instruction text.
  • Like HTML, two sorts of formatting controls are supported: elements and character entities.
  • the intent is to specify the attributes of the data and not its exact appearance. The data will be rendered suitably for the output medium based on its attributes.
  • The character entity &<name> specifies characters not available in ASCII or that should use special rendering.
  • Elements represent HTML-defined entities such as paragraphs, lists, code examples, etc. Quoting from the HTML 4.0 specification, "[e]ach element type declaration describes three parts: a start tag, content, and an end tag. The element's name appears in the start tag (written <ELEMENT-NAME>) and the end tag (written </ELEMENT-NAME>); note the slash before the element name in the end tag."
  • <ELEMENT-NAME>DOCUMENTATION</ELEMENT-NAME> specify a format to be applied to DOCUMENTATION.
  • The end tag (</ELEMENT-NAME>) is never optional.
  • tags block and inline.
  • Block tags specify paragraph-like structure and inline tags are used to specify the formatting of text within those paragraphs.
  • Inline tags may be nested.
  • Block tags may not be nested, except for LI within UL.
  • HTML documentation As part of a program such as the one in Appendix C that assembles an HTML page for each instruction, and an index of instructions.
  • HTML documentation can be used to establish an on-line reference manual for processor users.
  • A program for doing this in the preferred embodiment is written in the Perl programming language and works by creating an index.html file with an HTML table of two columns, one for the mnemonics and one for the synopsis text string. The rows of the table are filled by processing the instructions in sorted order.
  • the instruction mnemonics are HTML-linked to a page created for each instruction.
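The index page described above can be sketched as follows. This is a minimal illustration in Python rather than the Perl of the preferred embodiment; the function name `build_index`, the input shape (a mapping from mnemonic to synopsis), and the per-instruction file naming `<mnemonic>.html` are assumptions made for the example, not details taken from Appendix C.

```python
from html import escape


def build_index(synopses):
    """Return the text of index.html: a two-column table (mnemonic, synopsis).

    `synopses` maps mnemonic -> synopsis string (hypothetical input shape).
    Each mnemonic links to a page named <mnemonic>.html (assumed naming).
    """
    rows = []
    for mnemonic in sorted(synopses):  # instructions are processed in sorted order
        rows.append(
            '<TR><TD><A HREF="{0}.html"><CODE>{0}</CODE></A></TD>'
            "<TD>{1}</TD></TR>".format(escape(mnemonic), escape(synopses[mnemonic]))
        )
    return (
        "<HTML><BODY><TABLE>\n"
        "<TR><TH>Mnemonic</TH><TH>Synopsis</TH></TR>\n"
        + "\n".join(rows)
        + "\n</TABLE></BODY></HTML>\n"
    )


# Synopsis strings taken from the Galois field example; input order is arbitrary.
index_html = build_index({
    "LGF8.X": "Load Galois Field Register Indexed",
    "LGF8.I": "Load Galois Field Register Immediate",
})
```

Sorting inside the generator keeps the index stable regardless of the order in which instructions were declared.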
  • the per-instruction page begins with an HTML level-1 heading ("H1") giving the mnemonic and synopsis.
  • the sections that follow are introduced by HTML level-2 headings ("H2").
  • the first section, labeled "Instruction Word", gives the machine code format represented by an HTML table with one column per bit.
  • Opcode bits ('0' or '1') are inserted in the corresponding table cells.
  • Operand fields are filled in with the field name. Fields that span multiple adjacent bits use the COLSPAN feature of HTML tables to avoid repetition.
  • the bits of the machine code box are numbered using a table row above, and the field widths are given in a row below.
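A rough sketch of the "Instruction Word" table follows. The function name, the argument shapes, and the 8-bit toy layout in the example are hypothetical; only the three-row structure (bit numbers above, machine code box, field widths below) and the use of COLSPAN come from the text.

```python
def instruction_word_table(width, fields, opcode_bits):
    """Build the three-row HTML table for an instruction word.

    fields:      list of (name, msb, lsb) operand fields
    opcode_bits: dict mapping bit position -> '0' or '1'
    Raises ValueError unless every bit is covered exactly once.
    """
    cells = {}  # msb of each cell -> (label, span in bits)
    for name, msb, lsb in fields:
        cells[msb] = (name, msb - lsb + 1)
    for bit, value in opcode_bits.items():
        cells[bit] = (value, 1)
    if sum(span for _, span in cells.values()) != width:
        raise ValueError("fields and opcode bits must cover every bit exactly once")

    numbers = "".join(f"<TD>{b}</TD>" for b in range(width - 1, -1, -1))
    code, widths = [], []
    bit = width - 1
    while bit >= 0:
        if bit not in cells:
            raise ValueError(f"no field or opcode bit starts at bit {bit}")
        label, span = cells[bit]
        attr = f' COLSPAN="{span}"' if span > 1 else ""  # multi-bit fields span columns
        code.append(f"<TD{attr}>{label}</TD>")
        widths.append(f"<TD{attr}>{span}</TD>")
        bit -= span
    return (
        "<TABLE>\n"
        f"<TR>{numbers}</TR>\n"          # bit numbers, row above
        f"<TR>{''.join(code)}</TR>\n"    # machine code box
        f"<TR>{''.join(widths)}</TR>\n"  # field widths, row below
        "</TABLE>\n"
    )


# Toy 8-bit layout (not a real Xtensa encoding): 4-bit field r plus 4 opcode bits.
table = instruction_word_table(8, [("r", 7, 4)], {3: "0", 2: "1", 1: "0", 0: "0"})
```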
  • the second section, labeled "Package", gives the TIE package name that defines the instruction.
  • a simple hash is used to translate the package name from an identifier to the documentation string.
  • the package name itself is output inside of an HTML paragraph block-element ("P").
  • the third section labeled "Assembler Syntax" gives the assembly language format used to code the instruction. This consists of the instruction mnemonic, a space, and then the operand names separated by commas. Register operand names are formed by concatenating the short name of the register file with the field name. Immediate operand names are just the immediate name from TIE.
  • the assembler syntax is output inside of an HTML paragraph block-level element ("P") using an HTML code inline-element ("CODE").
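The assembler syntax rule can be illustrated with a short sketch. The tuple encoding of operands and the function names are assumptions made for the example; the formation rule itself (mnemonic, a space, comma-separated operand names, register names built from the register file short name plus the field name) follows the description above.

```python
def assembler_syntax(mnemonic, operands):
    """Form the assembly-language syntax string for one instruction.

    operands: list of ("register", regfile_short_name, field_name)
              or ("immediate", immediate_name) tuples (hypothetical encoding).
    """
    names = []
    for operand in operands:
        if operand[0] == "register":
            _, regfile_short_name, field = operand
            names.append(regfile_short_name + field)  # e.g. "g" + "r" -> "gr"
        else:
            _, immediate_name = operand
            names.append(immediate_name)              # immediate name taken as-is
    return mnemonic + " " + ", ".join(names)


def assembler_syntax_html(mnemonic, operands):
    """Wrap the syntax in a paragraph block element and a code inline element."""
    return "<P><CODE>" + assembler_syntax(mnemonic, operands) + "</CODE></P>"


gfadd8 = assembler_syntax(
    "GFADD8", [("register", "g", "r"), ("register", "g", "s"), ("register", "g", "t")]
)
gfadd8i_html = assembler_syntax_html(
    "GFADD8I", [("register", "g", "r"), ("register", "g", "s"), ("immediate", "imm4")]
)
```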
  • the fourth section contains the text description, translated from TIE to HTML. Because TIE's formatting codes are similar to HTML's, this translation is fairly simple. The primary need is to translate the INSTREF element into an HTML link to the named instruction.
  • An optional fifth section, labeled "Assembler Note", contains that text translated from TIE to HTML.
  • the sixth section contains a list of exceptions that this instruction can raise. Load and Store instructions automatically have the LoadStoreError exception added to the list by the TIE compiler. Other exceptions are listed if the corresponding exception signal is listed in the signal list section of the instruction's iclass. Exceptions are listed in priority order (the result of the topological sort described above).
  • It is also possible to copy the test case list from the TIE specification, as described below, into the documentation, since this is sometimes useful to the reader.
  • <P><CODE>GFADD8</CODE> performs a 8-bit Galois Field addition of the contents of GF registers <CODE>gs</CODE> and <CODE>gt</CODE> and writes the result to GF register <CODE>gr</CODE>.</P>
  • a development that makes embodiments of the present invention less sensitive to processor configuration options which change program execution characteristics is the ability to define a field as a sub-field of another field. This is in contrast to prior configurable processor systems which restricted the definition of fields to specified parts of instruction words, and did not permit them to be defined as parts of other fields. The ability to define fields as parts of other fields allows the software to in part be independent of the endianness of the configured processor.
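The effect of sub-field definitions can be sketched in a few lines. The helper below is hypothetical and only models bit ranges; the positions of the t field for the two memory orders (inst[7:4] for little endian, inst[17:14] for big endian) are the ones the text gives for the processor core.

```python
def resolve(parent, sub):
    """Map a sub-field range, given relative to its parent field, onto inst bits.

    parent: (msb, lsb) of the parent field within the instruction word
    sub:    (msb, lsb) of the sub-field, relative to the parent's lsb
    """
    p_msb, p_lsb = parent
    s_msb, s_lsb = sub
    if s_msb < s_lsb or p_lsb + s_msb > p_msb:
        raise ValueError("sub-field does not fit inside its parent")
    return (p_lsb + s_msb, p_lsb + s_lsb)


# The core defines t differently for each memory order.
T_FIELD = {"little": (7, 4), "big": (17, 14)}

# A single definition, t[1:0], resolves correctly under either memory order.
t10 = {order: resolve(rng, (1, 0)) for order, rng in T_FIELD.items()}
# t10 == {"little": (5, 4), "big": (15, 14)}
```

Because the sub-field is resolved against whatever the core defines for its parent, the user's definition does not have to be written twice.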
  • in such prior systems, a new field t10 that corresponds to the first two bits of the t field can only be defined with either of the following TIE statements: field t10 inst[5:4] /* for little endian memory order */ or field t10 inst[15:14] /* for big endian memory order */
  • the present invention allows t10 to be defined as follows: field t10 t[1:0] Since t is defined by the processor core to be inst[7:4] for little endian and inst[17:14] for big endian, t10 is now independent of the memory order.
Test Cases
  • the first is to ensure the correctness of the interface between core and TIE blocks and the user-defined states and register files.
  • the second is to verify the correctness of translation of the user semantics into hardware, in other words, the TIE compiler.
  • the first does not depend on the TIE instruction semantics, and it can be derived from the properties of the TIE specification.
  • the TIE compiler generates the ISA description for the user instructions.
  • the diagnostic generator for TIE reads the ISA description of the TIE instructions. This also includes knowledge about the user-specified states and register files. This information is used by the generator to create some meaningful set of diagnostics for the user TIE.
  • the reference semantics provide a method of verification for the implementation semantics.
  • the reference semantics are verified by using them in the target application. As described in the
  • the application is modified by the designer to use the new instructions via intrinsics.
  • the modified application and the instruction definitions are tested together either in the simulator or natively.
  • Native execution is facilitated by the ability of the TIE compiler (as in the prior art) to create conventional programming language (e.g., C) definitions of the intrinsics as functions.
  • the use in the target application is usually the best test of instruction definitions.
  • the designer is unsure if the application covers all of the cases that must be handled by the instruction. This is important if the application may change after the processor is generated, or if new applications will use this processor. In this case, it is desirable to have other ways to test the instruction.
  • the instructions of a processor are usually tested by the running of hand-written diagnostics that execute the instruction with a selected set of source operand values and check the result operands for the expected value. The preferred embodiment automates this process by exploiting the additional information that is available from the TIE specification.
  • the TIE iclass specification lists all of the inputs and outputs of each instruction, whether register file operands, immediates, or processor state registers.
  • <oname> is the name of an operand or state register
  • <value> is the corresponding input value (for in or inout operands or registers in the test in list) or expected value (for out or inout operands, registers, or exception signals in the test out list).
  • the TIE compiler produces a test program in a conventional programming language (e.g., C) that sets the in and inout processor registers to the values in the test in list using the WUR intrinsic and the number declared with the TIE user_register construct described in the Wilson et al. application. It then sets up the in and inout register file operands using the intrinsics specified by the proto declaration for loading registers. Operands in core register files (e.g., the AR's in the preferred embodiment) use built-in language types. Next, the TIE compiler invokes the intrinsic with the operands listed in the order specified by the iclass. Next, the out and inout operands specified in the test out list are read and compared to the given expected values. Finally, the processor registers in the test out list are read using the RUR intrinsic and the register number for the user_register construct, and these values are compared to the given values.
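The flow of such a generated diagnostic can be sketched natively, in the spirit of the intrinsic-emulating functions. Everything named here is hypothetical: `run_test_case`, the dictionary form of the test in and test out lists, and the convention that the emulated intrinsic returns its outputs as a dictionary. Processor state registers (the WUR/RUR steps) are left out of the sketch. The `gfadd8` stand-in relies on the fact that addition in GF(2^8) is a bitwise exclusive-or.

```python
def run_test_case(intrinsic, operand_order, test_in, test_out):
    """Run one test case and return a list of (name, expected, actual) mismatches.

    operand_order: operand names in the order given by the iclass
    test_in:       dict of input values for in/inout operands
    test_out:      dict of expected values for out/inout operands
    """
    # Set up the operands; pure outputs have no input value (None).
    args = [test_in.get(name) for name in operand_order]
    results = intrinsic(*args)  # assumed to return {output name: value}
    mismatches = []
    for name, expected in test_out.items():
        actual = results.get(name)
        if actual != expected:
            mismatches.append((name, expected, actual))
    return mismatches


def gfadd8(gr, gs, gt):
    """Native stand-in for the GFADD8 intrinsic: GF(2^8) addition is XOR."""
    return {"gr": (gs ^ gt) & 0xFF}


passing = run_test_case(gfadd8, ["gr", "gs", "gt"], {"gs": 0x53, "gt": 0xCA}, {"gr": 0x99})
failing = run_test_case(gfadd8, ["gr", "gs", "gt"], {"gs": 0x53, "gt": 0xCA}, {"gr": 0x00})
```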
  • This automatically generated programming language diagnostic may be run either in the instruction set simulator, or on the hardware RTL model or natively using the intrinsic-emulating functions generated by the TIE compiler by translating to the target programming language.
  • GFADD8(gr, gs, gt);
  • the HDL simulator is in many cases too slow to run the application. It is therefore desirable to have a method for extracting tests from the application running natively or in the instruction set simulator.
  • the TIE compiler therefore should have an option to augment its translation of the input semantics to the application programming language with code that writes the input and output operands of instructions to a file.
  • This file can then be post-processed by eliminating duplicates and then using statistical sampling to extract a number of test cases that is reasonable to simulate in the
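The post-processing step (duplicate elimination followed by sampling) might look like the sketch below. The record shape and the function name are assumptions, and uniform random sampling is used as one simple choice; the text does not say which sampling scheme is intended.

```python
import random


def select_test_cases(records, max_cases, seed=0):
    """Drop duplicate operand records, then sample at most `max_cases` of them.

    records: hashable tuples such as (mnemonic, inputs, outputs), as they
             might be logged by the augmented intrinsic functions (assumed shape).
    """
    unique = list(dict.fromkeys(records))  # removes duplicates, keeps first-seen order
    if len(unique) <= max_cases:
        return unique
    rng = random.Random(seed)              # fixed seed keeps the selection reproducible
    return rng.sample(unique, max_cases)


logged = [
    ("GFADD8", (1, 2), (3,)),
    ("GFADD8", (1, 2), (3,)),   # duplicate of the first record
    ("GFADD8", (4, 4), (0,)),
    ("GFADD8", (5, 1), (4,)),
]
picked = select_test_cases(logged, 2)
everything = select_test_cases(logged, 10)
```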
  • a configuration is essentially a list of parts and attributes of the processor core that can be customized by the user through a web-based interface. These processor attributes are referred to as configuration parameters.
  • the complete list of the configuration parameters, along with their default values and the ranges the values can assume, defines the configuration space of the processor core.
  • a concrete instantiation of the processor core that is, an instance of the core in which all the configuration parameters have been assigned concrete values, is a core configuration.
  • both the configuration space and concrete core configurations are represented as text files that list the configuration parameters and their values.
  • tpp provides a handle to the configuration environment enabling the developer to programmatically access the configuration information, as well as easily compute parts of the source code.
  • because the computation is performed in the configuration environment and is thus shared across all configured sources, developing configurable source code is simplified.
  • a PERL library for describing the ISA has been developed.
  • the TIE compiler is run to create the PERL objects for the user-defined instructions and this is added to the core ISA. From there on, all the verification tools query these PERL objects to get the ISA and pipeline information of the user-defined TIE.
  • the TIE compiler generates the following information about the TIE user state and the semantics of the instruction using it:
  • the instruction has two register operands, both of type AR, based on which it is possible to do some random register allocation, or even better, some intelligent register allocation, since the output and input fields are known. It is therefore possible to automatically generate assembly code for this instruction, such as ace $a7, $a13 where a7 and a13 are the s and t fields of the instruction ace generated by a register allocation algorithm that looks at the regfile definition for AR.
  • opcode il281, package : UserDefined, size : 24, load Register Operands :
  • the algorithm used by the diagnostic generator for generating a correct TIE instruction is as follows:
  • the TIE compiler provides this information and it is represented in PERL objects and used by the verification tools. Taking the following example with a user-defined register file and a set of instructions which simply moves data at different stages of the pipeline, note the convention 1: E stage, 2: M stage, 3: W stage:
  • stage 1 Inst il28s: Field t
  • a goal of this section is to generate micro-architectural diagnostics for the TIE logic based on the knowledge of the implementation of the interface between TIE and the core, as well as that of TIE state and register file, if any.
  • the ISA and pipeline description of the TIE itself are used; however, as mentioned earlier, the "correctness" of the implementation of the TIE instruction is not verified in the test directly.
  • a set of MVP diagnostics are generated to test the following aspects of the implementation:
  • control logic in the core/TIE interface
  • Exceptions, interrupts and replay signals are tested by generating tests where every user instruction is killed by a control flow change in the core (e.g., a branch), exception and replay signals.
  • the instruction should be killed in all stages of its execution, right up to the completion stage.
  • TIE instruction controls the stage of TIE instruction execution at which it gets killed.
  • the algorithm for the hazard cases is derived similarly to that of the bypass case described above. There are two instructions that write the same regfile in stages 2 and 3, followed by an instruction that reads it in stage 1. The third instruction stalls for the result of the second write.
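The stall in this hazard case can be worked through with a small model. The model is an assumption, not taken from the text: a result defined in stage d is taken to be usable from the cycle after the producer leaves stage d, an operand used in stage u is needed when the consumer enters stage u, and the producers do not stall each other. Under that model, a consumer issued `distance` slots after a producer waits max(0, d - u - distance + 1) cycles.

```python
def stall_cycles(def_stage, use_stage, distance):
    """Cycles a consumer must stall for one producer (assumed interlock model).

    def_stage: pipeline stage in which the producer's result is defined
    use_stage: pipeline stage in which the consumer reads the operand
    distance:  issue slots between producer and consumer (1 = back to back)
    """
    return max(0, def_stage - use_stage - distance + 1)


def consumer_stall(producers, use_stage):
    """Stall for a consumer with several producers: the worst single dependence.

    producers: list of (def_stage, distance) pairs
    """
    return max((stall_cycles(d, use_stage, k) for d, k in producers), default=0)


# Hazard case from the text: writes in stages 2 and 3, then a read in stage 1.
first_write = stall_cycles(def_stage=2, use_stage=1, distance=2)   # 0 cycles
second_write = stall_cycles(def_stage=3, use_stage=1, distance=1)  # 2 cycles
total = consumer_stall([(2, 2), (3, 1)], use_stage=1)              # 2 cycles
```

In this model the first write imposes no stall at a distance of two, so the wait is entirely due to the second write, which agrees with the behavior described above.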
  • the expected result of the load depends on the load semantics, and although it can be determined for most cases, it may not be possible to do so for all possible semantics, in which case it is necessary to leave the checking to the state and memory compare.
  • Data breakpoints are also tested for TIE load/store instructions in the case where the configuration supports data breakpoints. The details of how the data breakpoints work for TIE instructions can be found in the load/store architecture section.
Random Diagnostic Generators for TIE Instructions
  • Random diagnostics play a major role in the verification of the core ISA, and the microarchitecture of the implementation as well. Random sequences of instructions are likely to hit boundary cases and other scenarios that are unlikely to be covered by a directed test. They also add to the coverage metrics for the design verification. Additional intelligence has been added to these random generators by adding some features.
  • templates of instruction sequences can be created to target specific interesting scenarios.
  • An example of this can be back-to-back stores that fill up the write-buffer, or a zero-overhead loop with a single instruction.
  • Relative probabilities attached to each type of instruction or instruction sequence can decide how often one wants to generate a particular kind of instruction; for example, if a branch instruction has a high relative probability (or weight), the test generated will have more branches.
  • User-controlled parameters can tune the nature of tests generated. For example, command line arguments can control the relative weight of certain instructions, the length of tests, the number of nested function calls, etc.
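The relative-probability feature can be sketched with the Python standard library. The instruction kinds, the default weights, and the function name are illustrative assumptions; in practice the weights and the test length would come from user-controlled parameters such as command line arguments.

```python
import random


def generate_test(weights, length, seed=None):
    """Pick `length` instruction kinds, each chosen in proportion to its weight.

    weights: dict mapping instruction kind (or template name) -> relative weight
    """
    rng = random.Random(seed)
    kinds = list(weights)
    return rng.choices(kinds, weights=[weights[k] for k in kinds], k=length)


# Hypothetical default weights; raising "branch" yields more branches per test.
DEFAULT_WEIGHTS = {"alu": 4, "load": 2, "store": 2, "branch": 1}

test = generate_test(DEFAULT_WEIGHTS, length=20, seed=1)
branch_only = generate_test({"branch": 1, "alu": 0}, length=50, seed=2)
```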
  • the random diagnostic generators can generate user-defined TIE instructions as well.
  • the underlying mechanism is similar to that of the microarchitectural tests.
  • the random generators read the ISA description that includes TIE instructions as well as the core ISA.
  • Valid TIE instructions are constructed by looking at the ISA description of a particular TIE instruction, and employing some register allocation mechanism: foreach operand ( tie_instr->operands() ) { if ( operand is TIE register file ) { do a random register allocation random(0, #entries in register file)
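The register-file branch of the pseudocode above can be sketched as runnable code. The data shapes and the function name are assumptions; the 16-entry AR file with short name "a" is used only as sample data, and immediates and other operand kinds are not handled here.

```python
import random


def random_instruction(mnemonic, operands, regfiles, rng):
    """Build one assembly line with randomly allocated register operands.

    operands: list of (field_name, regfile_name) pairs from the ISA description
    regfiles: dict mapping regfile_name -> (short_name, number_of_entries)
    rng:      a random.Random instance, so that tests are reproducible
    """
    names = []
    for _field, regfile in operands:
        short_name, entries = regfiles[regfile]
        index = rng.randrange(entries)  # uniform over 0 .. entries-1
        names.append(f"${short_name}{index}")
    return f"{mnemonic} " + ", ".join(names)


line = random_instruction(
    "ace", [("s", "AR"), ("t", "AR")], {"AR": ("a", 16)}, random.Random(0)
)
```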
  • the random generators are preferably not accessible by end-users of the configuration system but are employed for internal verification and for a whole range of TIE descriptions such as those described above and further including exhaustive cases of TIE register files of varying widths, such as 8, 16, 32, 64, 128 bits, and states. Additionally, end-users may be given access to the random generators for use in further verification.
Coverage Measurements for TIE Verification
  • As stated above, a goal of this verification effort is to ensure the correctness of the core and TIE interface, the implementation of the user-defined state and register file and associated logic and the correct translation of the TIE instruction into hardware. Some coverage metrics of these areas are necessary.
  • FIG. 13 shows such a general purpose register file, and the implementation in hardware.
  • the figure shows one read port Rd0 and one write port Wd.
  • the naming convention for the signals is:
  • port_name: name of the register file port (Rd0, Rd1, Wd)
  • signal_name: the signal names are: read port: mux: output of mux; data: output of a flip-flop that goes to the datapath unit of TIE. write port: mux: output of a mux; data: output of the datapath unit; result: output of a flip-flop. stage_name: this indicates the stage of the pipeline.
  • the convention here is: C0: R stage, C1: E stage, C2: M stage, C3: W stage
  • the block diagram shows the different bypass paths for these stages.
  • for the read port Rd0, which is read by the datapath in stages 1 and 2 (this was represented as the use of the register file in the previous sections), the following traces or explains the block diagram:
  • Rd0_mux_C0 = select from ( Wd_data_C2: the result produced by the instr last in the pipeline; Wd_data_C1: the result produced by the instr before last in the pipeline; Rd0_data_C0: the current data in the register file )
  • Rd0_mux_C1 = select from ( Wd_data_C2: the result produced by the instr last in the pipeline
  • Rd0_data_C1: the result of the previous stage )
  • the write port Wd, which is written in stages 2 and 3, has a similar bypass path:
  • Wd_result_C3 <= Wd_mux_C2
  • Wd_result_C3 is written to the register file.
  • a goal of the preferred embodiment is to generate a monitor that checks if all the bypass paths in the above block diagram have been exercised.
  • An example bypass path is traced in the dashed path in FIG. 13.
  • the monitor essentially traces the data through the paths, and hence it is necessary to make a very important assumption, which is that the data remains unchanged in the datapath unit of TIE. This means that the following check can be performed:
  • Identity 1: use C1, def C1: which reads the register file in the E stage, and produces the same data in the E stage; and Identity 2: use C1, def C2: which produces data after a cycle delay.
  • the path that is shown in dashed lines in FIG. 13 is generated as a signal list or trace from the above algorithm as: i128_wd_data_C2 -> i128_rd0_mux_C0 -> i128_rd0_data_C1 -> waitcycles1 ->
  • i128 is the register file name.
  • the path to the TIE register file i128 from the top level of Xtensa is prepended to this. Notice that the dashed line from Rd0_data_C1 -> Wd_data_C2 in the datapath in FIG. 13 has been represented as waitcycles1 in the signal trace. A list of such signal traces is generated for all the bypass paths. Based on the signal trace, a small monitor module is generated in Verilog/Vera that checks if this path has been traced. If so, it reports a 1 for this path at the end of the simulation.
  • the state machine generated for the example bypass path is: case ( state)
  • the TIE coder modifies the application to use the new instructions using intrinsics and then either (1) compiles this to machine code and runs the application with the instruction set simulator or (2) compiles to native code and uses the macros and functions output by the TIE compiler to provide intrinsic compatibility.
  • the correctness of the application verifies the correctness of the instruction reference semantics with either of these two options.
  • the translation of the reference semantics is verified by option 2, and the correctness of the extended compiler and simulator is verified by option 1. Additional coverage beyond that provided by the application is obtained by the use of the test case TIE construct to generate tests of specific cases (e.g., unusual or "corner" cases).
  • the implementation semantics may be verified by using a TIE compiler option to translate these instead of the reference semantics using the same methods as above.
  • the implementation semantics and their translation to HDL may also be formally verified similar to the reference semantics by commercial equivalence checking tools working on the translation of each to HDL.
  • Implementation semantics and their translation are also checked by the use of the TIE-specified test cases run in the HDL simulator.
  • the HDL generated by the TIE compiler for the register files, interlock, bypass, core interface, and exceptions is verified by running automatically-generated tests based on the TIE input and using cosimulation to verify the results. These tests use the pipeline specification to exhaustively test all combinations of interlock, bypass, and exceptions.
  • the HAL code generated by the TIE compiler is verified by executing it in the instruction set simulator.
  • the assembler and compiler support for the new instructions is verified by most of the above.
Cosimulation of Processors
  • Co-simulation is the process of running the RTL and the reference model in parallel, and comparing the architecturally visible states defined in the ISA at specified boundaries.
  • cosimulator acts as the synchronizer and the gateway between the RTL simulator, the ISS, and multiple other monitor/checker tasks that are executed in parallel.
  • a diagnostic fails as soon as a mismatch occurs between the RTL and the ISS or when an assertion checker signals a catastrophic event.
  • cosimulation provides easier debugging of failing diagnostics. It causes the simulation to stop at (or near) the cycle where the problem appeared, which significantly reduces debugging time and effort.
  • the ISS is the reference model and the boundaries are defined on instruction retirements and whenever external events occur.
  • the set of architecturally visible states to be compared is configurable.
  • One of the challenges of using cosim with configurable processors is the absence of complete knowledge regarding the process of comparing RTL and ISS. What is known about comparing RTL and ISS is that the comparison needs to occur on instruction retirement boundaries and on occurrences of external events.
  • the processor state that should be compared between RTL and ISS depends on the processor options the user elects to include in her configuration. When a processor option is not included in a specific configuration of the processor core, then the cosim environment should not even attempt to compare the state introduced by the option, since the state is not present in either the RTL or the ISS.
  • the preferred embodiment uses a cosim environment that is configurable and which is customized along with the software and hardware during the processor configuration.
How the Cosim Works with TIE
  • The ability of the user to extend the processor state as well as the instruction set using TIE complicates the cosim process since the cosim environment needs to be developed with no complete prior knowledge of the processor states and instruction set. In the presence of TIE, the cosim environment needs to be able to determine the new processor state that should be compared/validated as well as decide the boundaries at which the new state will be compared between the RTL and ISS. In order for cosim to be able to achieve these two requirements/goals, it requires information regarding the new processor state defined in TIE.
  • the information required by cosim includes the names of the new states, the width of the state elements, the complete RTL hierarchy (path) defining the states, whether the state is defined on reset or not, whether it is an individual state or a register file, and the number of entries when the state is a register file.
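The per-state information listed above maps naturally onto a small record, and a comparison routine can then be driven entirely by such records. The sketch below is an illustration only: the class and function names, the RTL path string, the 16-entry 8-bit `gf` sample, and the policy of skipping entries that are neither defined on reset nor yet written are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StateInfo:
    name: str               # name of the new state
    width: int              # width of each state element, in bits
    rtl_path: str           # RTL hierarchy (path) defining the state
    defined_on_reset: bool  # whether the state is defined on reset
    entries: int = 1        # 1 for an individual state, more for a register file


def compare_state(info, rtl_values, iss_values, written):
    """Compare one state between RTL and ISS at a comparison boundary.

    rtl_values, iss_values: one integer per entry
    written: set of entry indices written since reset; entries of a state that
             is not defined on reset are skipped until written (assumed policy)
    Returns a list of (name, index, rtl_value, iss_value) mismatches.
    """
    mask = (1 << info.width) - 1
    mismatches = []
    for index in range(info.entries):
        if not info.defined_on_reset and index not in written:
            continue
        rtl, iss = rtl_values[index] & mask, iss_values[index] & mask
        if rtl != iss:
            mismatches.append((info.name, index, rtl, iss))
    return mismatches


# Hypothetical descriptor for a Galois field register file.
gf = StateInfo("gf", 8, "top.tie.gf_regfile", defined_on_reset=False, entries=16)
rtl = [0] * 16
iss = [0] * 16
rtl[3], iss[3] = 0x1B, 0x1B  # entry 3 has been written and agrees
rtl[5] = 0xFF                # entry 5 never written: ignored by the comparison
clean = compare_state(gf, rtl, iss, written={3})
iss[3] = 0x1A
dirty = compare_state(gf, rtl, iss, written={3})
```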
  • the information required by cosim is generated from the user's TIE description in three steps.
  • the TIE compiler parses the TIE description and generates an intermediate representation of the states defined in the input file. This intermediate representation is subsequently used by the cosim preprocessor to generate the cosim source code necessary for the verification of the new TIE state.
  • the generated cosim code is integrated with the rest of the cosim framework to produce the cosim environment specific to the given configuration. This is preferably done using tpp to generate code in the Vera™ cosimulation language as implemented in, e.g., the Vera™ System Verifier by Synopsys, Inc. of Mountain View, CA.
  • the following section contains examples of the cosim preprocessor and the generated cosim source code obtained in connection with the Galois field TIE example presented earlier.
  • # TieRegister contains all the TIE register files names # # #
  • Because TIE state can be arbitrarily wide, an interface is needed to register values that are arbitrarily sized, but it is preferred that the interface not be used all the time for performance reasons. Because of this, the registers are partitioned into classes, and the gdb and cosim interfaces are modified so that they can find a class and an index within a class from a single integer code. The socket interface is changed so that arbitrary width values can be transmitted and received. New memory interfaces are added to support wide loads and stores. The initialization of TIE state is generalized to support register files and assignment of registers to coprocessors. Support for simulating pipeline delays associated with access of TIE state is also added. The interface to TIE state is modified to simulate the CPENABLE exception.
  • Design Compiler Synthesis
      gf.v                   Verilog source file
      gf_check.dcsh          Syntax check generated verilog
      gf.dcsh                Top-level Design Compiler synthesis script
      TIE_opt.dc             supporting script
      xmTIE_cons.dc          supporting script
      prim.v                 supporting Verilog source file
  • Verysys Verification
      verysys                subdirectory supporting Verysys verification
      verysys/verify_sem.v   Verilog source generated from semantics
      verysys/verify_ref.v   Verilog source generated from reference
  • Xtensa tool support
      libisa-gf.so           dynamically linked library for xt-gcc
      libiss-gf.so           dynamically linked library for xt-run
      xtensa-gf.h            macro definitions of new instructions
  • opcode GFADD8 op2=4'b0000 CUST0
  • opcode GFMULX8 op2=4'b0001 CUST0
  • opcode GFRWMOD op2=4'b0010 CUST0
  • opcode GFADD8I op2=4'b0100 CUST0
  • schedule gfloadu { LGF8.IU } { use imm8 0; use ars 1; def ars 1; def gt 2; }
  • synopsis LGF8.I "Load Galois Field Register Immediate"
  • synopsis LGF8.IU "Load Galois Field Register Immediate Update"
  • synopsis LGF8.X "Load Galois Field Register Indexed"
  • synopsis LGF8.XU "Load Galois Field Register Indexed Update" synopsis SGF8.
  • <P><CODE>GFADD8</CODE> performs a 8-bit Galois Field addition of the contents of GF registers <CODE>gs</CODE> and <CODE>gt</CODE> and writes the result to GF register <CODE>gr</CODE>.</P>" description GFADD8I
  • <P><CODE>GFADD8I</CODE> performs a 8-bit Galois Field addition of the contents of GF register <CODE>gs</CODE> and a 4-bit immediate from the <CODE>t</CODE> field and writes the result to GF register <CODE>gr</CODE>.
  • </P> description GFMULX8
  • <P><CODE>GFMULX8</CODE> performs a 8-bit Galois Field multiplication of the contents of GF register <CODE>gs</CODE> by <I>x</I> modulo the polynomial in <CODE>gfmod</CODE>. It writes the result to GF register
  • <P><CODE>GFRWMOD</CODE> reads and writes the <CODE>gfmod</CODE> polynomial register.
  • GF register <CODE>gt</CODE> and <CODE>gfmod</CODE> are read these are written to <CODE>gfmod</CODE> and <CODE>gt</CODE>.
  • </P> description LGF8.I
  • V1_t VAddr = {{0}};
  • V1_t VAddrBase = {{0}};
  • V1_t VAddrOffset = {{0}};
  • V1_t VAddrIndex = {{0}};
  • V1_t VAddrIn = {{0}};
  • V1_t LSSize = {{0}};
  • V1_t LSIndexed = {{0}};
  • V4_t MemDataIn128 = {{0,0,0,0}};
  • V2_t MemDataIn64 = {{0,0}};
  • V1_t MemDataIn32 = {{0}};
  • V1_t MemDataIn16 = {{0}};
  • V1_t MemDataIn8 = {{0}};
  • V4_t MemDataOut128 = {{0,0,0,0}};
  • V2_t MemDataOut64 = {{0,0}};
  • V1_t MemDataOut32 = {{0}};
  • V1_t MemDataOut16 = {{0}};
  • V1_t MemDataOut8 = {{0}};
  • V1_t Exception = {{0}};
  • V1_t ExcCause = {{0}};
  • V1_t CPEnable = {{0}};
  • VAddrIn.data[0] = VAddrBase.data[0] + VAddrIndex.data[0]; } else {
  • VAddrIn.data[0] = VAddrBase.data[0] + VAddrOffset.data[0]; } }
  • VAddrIn_get ( ) VAddrIn_get ( ) ;
  • MemDataIn64.data[0] = data[0];
  • MemDataIn64.data[1] = data[1]; } else {
  • MemDataIn64.data[1] = data[3];
  • VAddrIn_get ( ) VAddrIn_get ( ) ;
  • MemDataIn32.data[0] = data[1]; } else {
  • VAddrIn_get () VAddrIn_get () ;
  • MemDataIn16.data[0] = data[3] >> 16; }
  • VAddrIn_get () VAddrIn_get () ;
  • MemDataIn8.data[0] = data[1] >> 24; } else {
  • MemDataIn8.data[0] = data[3] >> 24; } void
  • VAddrIn_get () VAddrIn_get () ;
  • VAddrIn_get () VAddrIn_get () ;
  • VAddrIn_get () VAddrIn_get () ;
  • VAddrIn_get () VAddrIn_get () ;
  • VAddrIn_get () VAddrIn_get () ;
  • WUR0 (v) break; ⁇ default: ⁇ fprintf (stderr, "Error: invalid wur number %d\n", n) ; ⁇ exit(-1) ; ⁇ ⁇ gf ⁇
  • V1_t gr_kill_o = {{0}};
  • V1_t GFADD8 = {{1}};
  • V1_t gr_o; V1_t gs_i; V1_t imm4;
  • V1_t gr_kill_o;
  • V1_t GFMULX8 = {{1}};
  • V1_t gt_kill_o;
  • V1_t GFRWMOD8 = {{1}};
  • V1_t art_i;

Abstract

A system for generating processor hardware supports a language for significant extensions to the processor instruction set, where the designer specifies only the semantics of the new instructions and the system generates other logic. The extension language provides for the addition of processor state, including register files, and instructions that operate on that state. The language also provides for new data types to be added to the compiler to represent the state added. It allows separate specification of reference semantics and instruction implementation, and uses this to automate design verification. In addition, the system generates formatted instruction set documentation from the language specification.

Description

AUTOMATED PROCESSOR GENERATION SYSTEM FOR DESIGNING A CONFIGURABLE
PROCESSOR AND METHOD FOR THE SAME
Background Of The Invention
1. Field of the Invention
The present invention is directed to computer processors as well as systems and techniques for developing the same, and is more particularly directed to processors which have features configurable at the option of a user and related development systems and techniques.
2. Background of the Related Art
Prior art processors have generally been fairly rigid objects which are difficult to modify or extend. A limited degree of extensibility to processors and their supporting software tools, including the ability to add register-to-register computational instructions and simple state (but not register files) has been provided in certain prior art systems. This limited extensibility was a significant advance in the state of the art; many applications using these improvements see speedups or efficiency improvements of four times or better. However, the limitations on extensibility of these prior art systems meant that other applications could not be adequately addressed. In particular, the need to use the existing core register file, with its fixed 32-bit width registers, generally prevents the use of these improvements in applications that require additional precision or replicated functional units where the combined width of the data operands exceeds 32 bits. In addition, the core register file often lacks sufficient read or write ports to implement certain instructions. For these reasons, there is a need in the art to support the addition of new register files that are configurable in width and in number of read and write ports.
With the addition of register files comes the need to transfer data between these files and memory. The core instruction set includes such load and store instructions for the core register file, but additional register files require additional load and store instructions. This is because one of the rationales for extensible register files is to allow them to be sized to required data types and bandwidths. In particular, the width of register file data may be wider than that supported by the rest of the instruction set. Therefore, it is not reasonable to load and store data by transferring the data to the registers provided by the core; it should be possible to load and store values from the new register file directly. Further, although prior art systems support the addition of processor state, the quantity of that state is typically small. Consequently, there is a need in the art for a larger number of state bits to be easily added to the processor architecture. This state often needs to be context switched by the operating system. Once the quantity of state becomes large, new methods that minimize context switch time are desirable. Such methods have been implemented in prior art processors (e.g., the MIPS R2000 coprocessor enable bits). However, there is a need in the art to extend this further by generating the code sequences and logic automatically from the input specification to support realtime operating systems (RTOSes) and other software which need to know about new state and use it in a timely manner.
Further, prior art processors do not allow for sharing of logic between the core processor implementation and instruction extensions. With load and store instruction extensions, it is important that the data cache be shared between the core and the extensions. This is so that stores by newly-configured instructions are seen by loads by the core and vice versa to ensure cache coherency -- separate caches would need special mechanisms to keep them consistent, a possible but undesirable solution. Also, the data cache is one of the larger circuits in the core processor, and sharing it promotes a reduction in the size of the core processor.
The addition of register files also makes it desirable to support allocation of high-level language variables to these registers. Prior art processors use the core register file to which prior art compilers already support allocation of user variables. Thus, compiler allocation is expected and should be supported for user-defined register files. To allocate variables to registers, a compiler supporting user-defined register files requires knowledge of how to spill, restore, and move such registers in order to implement conventional-compiler functionality.
A related but more general limitation of prior art processor systems is the level of compiler support therefor. Often instructions are added to a processor to support new data types appropriate to the application (e.g., many DSP applications require processors implementing saturating arithmetic instead of the more conventional two's complement arithmetic usually supported by processors).
Prior art systems allow instructions supporting new data types to be added, but it is necessary to map these new instructions to existing language data types when writing high-level language code that uses the extensions. In some cases an appropriate built-in data type may not exist.
For example, consider the saturating arithmetic example. As noted above, many DSP algorithms take advantage of arithmetic that saturates at the minimum value on underflow or maximum value on overflow of the number of bits used instead of wrapping, as in traditional two's complement systems. However, there is no C data type that has these semantics -- the C language requires that
    int a;
    int b;
    int c = a + b;
have wrapping semantics. One could write
    int a;
    int b;
    int c = SATADD(a, b);
instead using built-in types with new intrinsic functions, but this is awkward and obscures the algorithm (the writer thinks of the SATADD function simply as +).
On the other hand, adding new data types allows the + operator to function differently with those types -- C already applies it to different operations for integer addition and floating-point addition operations, so the extension is natural. Thus, using new data types saturating addition might be coded as
    dsp16 a;
    dsp16 b;
    dsp16 c = a + b;
where dsp16 defines a saturating data type. Thus, the last line implies a saturating add because both of its operands are saturating data types.
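The difference between the two addition semantics discussed above can be illustrated with a minimal Python sketch. The helper names wrap_add16 and sat_add16 are hypothetical and chosen only for this illustration; they are not part of TIE or of any C library, and a 16-bit signed width is assumed.

```python
# Illustrative sketch only: 16-bit signed wrapping vs. saturating addition.
# The function names and the 16-bit width are assumptions for this example.
INT16_MIN = -(1 << 15)
INT16_MAX = (1 << 15) - 1


def wrap_add16(a, b):
    """Two's complement addition: the sum wraps around modulo 2**16."""
    total = (a + b) & 0xFFFF
    return total - 0x10000 if total & 0x8000 else total


def sat_add16(a, b):
    """Saturating addition: the sum is clamped to the 16-bit signed range."""
    return max(INT16_MIN, min(INT16_MAX, a + b))
```

With these definitions, adding 1 to the largest 16-bit value wraps to the most negative value under wrap_add16 but stays at the maximum under sat_add16, which is the behavior a saturating data type would attach to the + operator.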
Most compilers schedule instructions to minimize pipeline stalls. However, with prior art systems there is no way the instruction specification may be used to extend the compiler's scheduling of data structures. For example, load instructions are pipelined with a two-cycle latency. Thus, if the result of a load is referenced by the instruction immediately following the load, there will be a one-cycle stall because the load is not finished. Thus, the sequence
    load r1, addr1
    store r1, addr2
    load r2, addr3
    store r2, addr4
will have two stall cycles. If the compiler rearranges this to
    load r1, addr1
    load r2, addr3
    store r1, addr2
    store r2, addr4
then the sequence executes with no stall cycles. This is a common optimization technique called instruction scheduling. Prior art instruction scheduling requires tables giving the pipe stages in which instructions use their inputs and produce their outputs, but does not make use of such information for newly-added instructions.
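The stall counts quoted for the two orderings can be reproduced with a small in-order pipeline model. This is only a sketch under simplifying assumptions: one instruction issues per cycle, only loads have a latency greater than one, and only stores read registers. The function name count_stalls and the tuple encoding of instructions are hypothetical.

```python
# Illustrative sketch only: count stall cycles for an in-order, single-issue
# pipeline in which a load's result is usable two cycles after it issues.
LOAD_LATENCY = 2  # assumption taken from the two-cycle load latency in the text


def count_stalls(program):
    """program is a list of (opcode, register) pairs in issue order."""
    ready = {}   # register -> first cycle in which its value may be read
    cycle = 0    # cycle in which the next instruction would issue
    stalls = 0
    for opcode, reg in program:
        if opcode == "store" and ready.get(reg, 0) > cycle:
            # The source register is not ready yet: wait until it is.
            stalls += ready[reg] - cycle
            cycle = ready[reg]
        if opcode == "load":
            ready[reg] = cycle + LOAD_LATENCY
        cycle += 1
    return stalls


original = [("load", "r1"), ("store", "r1"), ("load", "r2"), ("store", "r2")]
scheduled = [("load", "r1"), ("load", "r2"), ("store", "r1"), ("store", "r2")]
```

Under this model the original ordering incurs two stall cycles and the rearranged ordering incurs none.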
Another limitation of the prior art is that the computation portion of added instructions must be implemented in a single cycle of the pipeline. Some computations, such as multiplication of large operands, have a logic delay longer than the typical RISC pipeline stage. The inclusion of such operations using prior art techniques would require that the processor clock rate be reduced to provide more time in which to complete the computation. It would therefore be desirable to support instructions where the computation is spread out over several pipeline stages. In addition to allowing the computation to be performed over multiple cycles, it could be useful to allow operands to be consumed and produced in different pipeline stages.
For example, a multiply/accumulate operation typically requires two cycles. In the first cycle, the multiplier produces the product in carry-save form; in the second cycle the carry-save product and the accumulator are reduced from three values to two values using a single level of carry-save-add, and then added in a carry-propagate-adder. So, the simplest declaration would be to say that multiply/accumulate instructions take two cycles from any source operand to the destination; however, then it would not be possible to do back-to-back multiply/accumulates into the same accumulator register, since there would be a one-cycle stall because of the two-cycle latency. In reality, however, the logic only requires one cycle from accumulator in to accumulator out, so a better approach is just to provide a more powerful description, such as D ← A + B * C being described as taking B and C in stage 1, taking A in stage 2, and producing D in stage 3. Thus, the latency from B or C to D is 3 - 1 = 2, and the latency from A to D is 3 - 2 = 1. With the addition of multi-cycle instructions, it also becomes necessary to generate interlock logic appropriate to the target pipeline for the added instructions. This is because, with one instruction issued per cycle, no latency-one instruction can produce a result that will cause an interlock on the next cycle, because the next instruction is always delayed by one cycle. In general, if instructions can be issued only every K cycles, the latency of those instructions is L cycles, and L ≤ K, then those instructions cannot cause interlocks on their destination operand (instructions can still interlock on their source operands if their source operands were produced by a two-cycle instruction such as a load). If it is possible to have two-cycle newly-configured instructions, there is a need to have following instructions that interlock on the result of the newly-configured instructions.
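The per-operand latencies in the multiply/accumulate example follow directly from the stage numbers, as the following Python sketch shows. The dictionary layout and the function names are assumptions made for illustration, and single-issue operation (one instruction per cycle) is assumed when converting a latency into stall cycles.

```python
# Illustrative sketch only: derive operand-to-result latencies from the
# pipeline stages in which operands are consumed and results are produced.
MAC_USE = {"A": 2, "B": 1, "C": 1}  # stage in which each source is read
MAC_DEF = {"D": 3}                  # stage in which each result is produced


def operand_latencies(uses, defs):
    """Latency from each source to each result = def stage - use stage."""
    return {(src, dst): defs[dst] - uses[src] for src in uses for dst in defs}


def back_to_back_stalls(latency):
    """Stall cycles when the next instruction (one issue per cycle) needs the result."""
    return max(0, latency - 1)
```

With the accumulator read in stage 2, the accumulator-to-result latency is 1, so back-to-back multiply/accumulates into the same accumulator need no stall, whereas a uniform two-cycle description would cost one stall cycle each time.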
Most instruction set architectures have multiple implementations for different processor architectures. Prior art systems combined the specification of the instruction semantics and the implementation logic for instructions; separating the two would allow one set of reference semantics to be used with multiple implementations. Reference semantics are one component of instruction set documentation. It is traditional to describe instruction semantics in both English and a more precise notation. English is often ambiguous or error-prone but easier to read. Therefore, it provides the introduction, purpose and a loose definition of an instruction. The more formal definition is useful to have a precise understanding of what the instruction does. One of the purposes of the reference semantics is to serve as this precise definition. Other components include the instruction word, assembler syntax, and text description. Prior art systems have sufficient information in the extension language to generate the instruction word and assembler syntax. With the addition of the reference semantics, only the text description was missing, and there is a need to include the specification of instruction descriptions that can be converted to formatted documentation to produce a conventional ISA description book.
Processor development techniques including the above features would render design verification methods of the prior art no longer valid due to their increased flexibility and power. In conjunction with the above features, therefore, there is a need to verify the correctness of many aspects of the generated processor, including:
-- the correctness of the input reference instruction semantics;
-- the correctness of the input implementation instruction semantics;
-- the translation by the compiler of instruction semantics to the application programming language;
-- the translation by the instruction semantics compiler to the Hardware Description Language (HDL);
-- the translation by the instruction semantics compiler to the instruction set simulator programming language;
-- the HDL generated by the instruction semantics compiler for the register files, interlock, bypass, core interface, and exceptions;
-- any system function abstraction layers generated during the process, such as the Hardware Abstraction Layer (HAL) code generated by the instruction semantics compiler (see the aforementioned Songer et al. patent application for further details on the HAL); and
-- the intrinsic and data type support in the programming language compiler.
The reference semantics are also used in some of the above.
Finally, all of the new hardware functionality must be supported by the instruction set.
Summary Of The Invention
In view of the above problems of the prior art, it is an object of the present invention to provide a processor development system which allows extensibility of a wide variety of processor features including the addition of new register files that are configurable in width and in number of read and write ports.
It is a further object of the present invention to provide a processor development system which supports the addition of instructions for transferring data between such new register files and memory.
It is another object of the invention to provide a processor development system which supports the sharing of logic between the core processor implementation and instruction extensions, particularly sharing of the data cache between the core and extension instructions.
It is an additional object of the invention to provide a processor development system which supports compiler allocation of high-level language variables to extended register files, including the ability to spill, restore and move such registers.
It is a still further object of the invention to provide a processor development system which supports instructions where computation is spread out over several pipeline stages.
It is another object of the invention to provide a processor development system which allows operands to be consumed and produced in different pipeline stages.
It is an even further object of the invention to provide a processor development system which supports the generation of interlock logic appropriate to the target pipeline for added multi-cycle instructions.
It is yet an additional object of the invention to provide a processor development system which uses instruction specifications to extend its compiler's scheduling of data structures to minimize pipeline stalls.
It is still another object of the invention to support specification of instruction semantics and implementation logic for instructions to allow one set of reference semantics to be used with multiple instruction implementations.
It is another object of the invention to provide a processor development system which can make use of the specification of instruction descriptions for conversion to formatted documentation.
It is yet another object of the invention to provide a processor development system which is able to verify a wide range of extensible features of processor design.
It is still a further object of the invention to provide a processor development system which can generate code sequences and logic for minimal time context switching automatically from the input specification.
It is yet another object of the invention to provide a processor development system including an instruction set simulator which can support a wide variety of extensible functions as described above.
Brief Description Of The Drawings
These and other objects, features, and advantages of the invention are better understood by reading the following detailed description of the preferred embodiment, taken in conjunction with the accompanying drawings, in which:
FIGURES 1 and 2 show control logic associated with a four-stage pipelined extensible register according to a preferred embodiment of the invention;
FIGURE 3 shows a two-stage pipelined version of the register of FIGs. 1 and 2;
FIGURE 4 shows interface signals to a core adder according to the first embodiment;
FIGURE 5 shows a prior load aligner and FIGURE 6 shows a load aligner according to the preferred embodiment;
FIGURE 7 shows a semantic block output interface signal according to the preferred embodiment;
FIGURES 8(a) - 8(c) show pipeline register optimization according to the preferred embodiment;
FIGURE 9 shows exception processing in the preferred embodiment;
FIGURE 10 shows further exception processing in the preferred embodiment;
FIGURE 11 shows the processing of reference semantic information in the preferred embodiment;
FIGURE 12 shows automatically-generated instruction documentation according to the preferred embodiment;
FIGURE 13 shows a TIE verification process according to the preferred embodiment; and
FIGURE 14 shows a cosimulation process in the preferred embodiment.
Detailed Description Of Presently Preferred Exemplary Embodiments
The invention to a degree builds upon the technology described in the Killian et al. and Wilson et al. applications in which the Tensilica Instruction Set Extension (TIE) language and its compiler and other tools are described. A preferred embodiment of the invention extends the TIE language with new constructs and augmented software tools such as compilers and the like which support these constructs.
Extended Register Files
One type of new functionality provided by the preferred embodiment is support for register files. In existing processor art, a register file is a set of N storage locations of B bits each. A field in an instruction selects members of this set as source operand values or destination operand values for the results of the instruction. Typically a register file is designed to support the reading of R of the N members in parallel, and the writing of W of N members in parallel, so that instructions can have one or more source operands and one or more destination operands and still require only one cycle for register file access.
The TIE language construct for declaring a new register file is
    regfile <rfname> <eltwidth> <entries> <shortname>
where <rfname> is a handle used to refer to the register file in subsequent TIE constructs;
<eltwidth> is the width in bits of a register file element ("register");
<entries> is the number of elements in the register file; and
<shortname> is a short prefix (often a single letter) used to create register names for the assembly language. Register names are <shortname> with the register number appended.
The regfile construct does not declare the number of read or write ports; such physical implementation details are left to the TIE compiler as will be described in greater detail below, thereby keeping TIE as implementation-independent as possible and maintaining TIE as a high-level specification description.
As a result of the regfile declaration, the generated processor will include an additional <eltwidth> * <entries> bits of programmer-visible state along with logic to read and write multiple <eltwidth> values of this state. The logic generation algorithm will be described in greater detail below after other relevant TIE language constructs are described.
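As a minimal sketch of the bookkeeping implied by a regfile declaration, the following Python fragment splits a declaration into its four fields and derives the amount of programmer-visible state and the assembly register names. The parser is hypothetical and handles only the single-line form shown above; it is not the TIE compiler's own front end.

```python
# Illustrative sketch only: derive state size and register names from a
# one-line regfile declaration. Names and structure are assumptions.
from typing import NamedTuple


class RegFile(NamedTuple):
    rfname: str
    eltwidth: int   # width in bits of one register
    entries: int    # number of registers
    shortname: str  # assembly-language prefix


def parse_regfile(decl):
    keyword, rfname, eltwidth, entries, shortname = decl.split()
    if keyword != "regfile":
        raise ValueError("not a regfile declaration: " + decl)
    return RegFile(rfname, int(eltwidth), int(entries), shortname)


def state_bits(rf):
    """Programmer-visible state added by the register file, in bits."""
    return rf.eltwidth * rf.entries


def register_names(rf):
    """Assembly names: the short prefix with the register number appended."""
    return [rf.shortname + str(i) for i in range(rf.entries)]
```

For a declaration such as "regfile gf 8 16 g", this yields 128 bits of state and the register names g0 through g15.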
The TIE construct
    operand <oname> <fieldname> { <rfname>[<fieldname>] }
declares <oname> as a handle for reading or writing register file <rfname> elements designated by field <fieldname> of the instruction word. This construct is the same as described in the Killian et al. application, except that <rfname> may designate a register file declared with regfile in addition to the core register file (named "AR"). As described in the Killian et al. application, the <oname> handle is then usable in iclass declarations to describe register file in, out, and inout operands in instructions.
As an example, the TIE specification
    opcode GFADD8 op2=4'b0000 CUST0
    opcode GFMULX8 op2=4'b0001 CUST0
    opcode GFRWMOD8 op2=4'b0010 CUST0
    state gfmod 8
    user_register 0 { gfmod }
    regfile gf 8 16 g
    operand gr r { gf[r] }
    operand gs s { gf[s] }
    operand gt t { gf[t] }
    iclass gfrrr { GFADD8 } {out gr, in gs, in gt} {} {}
    iclass gfrr { GFMULX8 } {out gr, in gs} {in gfmod} {}
    iclass gfr { GFRWMOD8 } {inout gt} {inout gfmod} {}
    semantic gf1 { GFADD8 } {
      assign gr = gs ^ gt;
    }
    semantic gf2 { GFMULX8 } {
      assign gr = gs[7] ? ({gs[6:0],1'b0} ^ gfmod) : {gs[6:0],1'b0};
    }
    semantic gf3 { GFRWMOD8 } {
      wire [7:0] t1 = gt;
      wire [7:0] t2 = gfmod;
      assign gfmod = t1;
      assign gt = t2;
    }
implements a simplified Galois-field arithmetic unit on an 8-bit data value (an entire set of TIE files for implementing this example may be found in Appendix A). A 16-entry, 8-bit register file is created (each register holds a polynomial over GF(2) modulo the polynomial stored in gfmod), and two instructions are defined that operate on these registers. GFADD8 adds the polynomial in the register specified by the s field of the instruction word (the "gs register") to the polynomial in the register specified by the t field of the instruction word (the "gt register"), and writes the result to the register specified by the r field of the instruction word (the "gr register"). GFMULX8 multiplies the polynomial in the gs register by x modulo gfmod and writes the result to the gr register. GFRWMOD8 is for reading and writing the gfmod polynomial register.
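The behavior of the three instructions can be mirrored by a small software reference model. The Python sketch below follows the semantics described in the text (addition over GF(2) is bitwise exclusive-or; multiplication by x is a left shift followed by a conditional reduction with gfmod). The function names are hypothetical, and the modulus value 0x1B used in the usage note is an assumed example, not a value taken from the specification.

```python
# Illustrative sketch only: a software model of the three Galois-field
# instructions operating on 8-bit values.
def gfadd8(gs, gt):
    """GFADD8: polynomial addition over GF(2) is bitwise XOR."""
    return (gs ^ gt) & 0xFF


def gfmulx8(gs, gfmod):
    """GFMULX8: multiply by x, reducing with gfmod when bit 7 shifts out."""
    shifted = (gs << 1) & 0xFF          # corresponds to {gs[6:0], 1'b0}
    return (shifted ^ gfmod) & 0xFF if gs & 0x80 else shifted


def gfrwmod8(gt, gfmod):
    """GFRWMOD8: exchange gt and gfmod; returns (new_gt, new_gfmod)."""
    return gfmod, gt
```

For instance, with gfmod assumed to hold 0x1B, gfmulx8(0xAE, 0x1B) shifts 0xAE to 0x5C and reduces it to 0x47, while gfmulx8(0x57, 0x1B) needs no reduction and gives 0xAE.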
The logic generated from this simple TIE code is more complicated as it requires control logic to handle the assignment of various operations to different pipeline stages. TIE is a high-level specification that describes instruction sets at a level familiar to users of instruction sets, and not as low-level as written by implementors of instruction sets (i.e., processor designers).
An example of register pipeline control logic generated by the TIE code is shown in FIG. 1. This shows a four stage pipelined register which includes on the left side of the Figure a read data pipe formed by four pipeline registers and their corresponding input multiplexers. Starting from the top, each pair of pipeline registers in the read port delineate the boundaries of the C0 (R), C1 (E), C2 (M), C3 (W) and C4 pipeline stages. The output of each pipeline register, rd0_dataC1 - rd0_dataC4, is provided to the register's datapath interposed between the read and write ports (not shown for simplicity). These outputs, as well as outputs of all later pipeline registers in the read port, are provided as inputs to the next stage multiplexer. Control signal generation for the read port multiplexers is described in detail below.
The Figure also shows a write port on the right side of the Figure formed by four pipeline registers and corresponding input multiplexers for the three latest pipeline stages therein. Four signals w0_dataC1 - w0_dataC4 from the register datapath are provided to corresponding ones of the write port register inputs either directly or via multiplexing with an output wr0_resultC2 - wr0_resultC4 of the previous write port pipeline register. These output signals are multiplexed along with the output of the register file xregfile RF and fed to the C0 stage multiplexer of the read port pipeline.
Control signals for the multiplexers in the read and write ports are generated along with a write enable for xregfile RF and a stall signal stall_R using the circuitry of FIG. 2 as will be readily apparent to those skilled in the art when read in conjunction with the discussion of compiler generation of register files below.
For ease of understanding, a two-stage register file combining the two-stage versions of the circuits of FIGs. 1 and 2 is shown in FIG. 3.
Generating Register Files
For each register file declared by a regfile statement, the compiler must produce:
-- the register file storage cells;
-- the read ports;
-- the write ports;
-- source operand interlock logic;
-- source operand bypass logic; and
-- destination operand write logic.
Read and Write Ports
The first steps in generating a register file are to determine the number of read and write ports, assign pipeline stages to the ports, and assign operands to the ports. Many algorithms could be used to do these operations, each resulting in different speed and area tradeoffs. The following algorithm is used in the preferred embodiment.
For each field used to select a source operand from the register file, a read port is generated. In some cases this will generate more read ports than necessary, but it generally produces a faster register read because it allows the register reads to begin in parallel with instruction decode. Consider the previous Galois-field arithmetic example where
    iclass gfr { GFRWMOD8 } {inout gt} {inout gfmod} {}
has been changed to
    iclass gfr { GFRWMOD8 } {inout gr} {inout gfmod} {}
The above algorithm will generate three register read ports (one each for the r, s, and t fields of the instruction word), even though no instruction uses more than two GF register file reads at the same time. However, if only two read ports are generated, then it is necessary to have a 2:1 mux in front of one of the read ports to select between the r and s fields or between the r and t fields. This mux must be controlled by decode logic that distinguishes the GFRWMOD8 and GFADD8 instructions. In a complicated example, the logic could be substantial, making the register file read take much longer. The extra area required by the algorithm used in the preferred embodiment can generally be avoided by the instruction set designer arranging the register file access fields of instructions such that the number of different fields used to read each register file is equal to the largest number of reads used by any instruction. This is why operand gt is used instead of gr in the iclass gfr in the above example.
A possible enhancement to the above algorithm is to track the minimum stage number specified in a schedule statement (explained in greater detail in the "Multi-Cycle Instructions in TIE" section below) for each field. If the minimum stage number is greater than the stage number in which instruction decode is performed, then muxing of fields may be used to reduce the number of read ports. For all fields where the minimum stage number is in the instruction decode stage, a separate port for each field used to read the register file is used.
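The basic rule of one read port per instruction-word field used as a source can be sketched as follows. The data layout (a mapping from iclass name to a list of direction and field pairs) and the function name are assumptions for illustration; the preferred embodiment's actual implementation is the perl program of Appendix B.

```python
# Illustrative sketch only: one read port per distinct field that selects a
# source operand ("in" or "inout") of the register file.
def read_port_fields(iclasses):
    """iclasses maps an iclass name to a list of (direction, field) pairs."""
    fields = []
    for operands in iclasses.values():
        for direction, field in operands:
            if direction in ("in", "inout") and field not in fields:
                fields.append(field)
    return fields


# Galois-field example with GFRWMOD8 using operand gt (field t) ...
gf_original = {
    "gfrrr": [("out", "r"), ("in", "s"), ("in", "t")],
    "gfrr": [("out", "r"), ("in", "s")],
    "gfr": [("inout", "t")],
}
# ... and with GFRWMOD8 changed to use operand gr (field r).
gf_changed = dict(gf_original, gfr=[("inout", "r")])
```

Under this rule the original arrangement needs two read ports (fields s and t), while the changed arrangement needs three (fields s, t, and r).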
Consider the following example:
    regfile SR 32 4 r
    operand sx x { SR[x] }
    operand sy y { SR[y] }
    operand sz z { SR[z] }
    operand su u { SR[u] }
    operand sv v { SR[v] }
    iclass stu {inst1} {out sz, in sx, in sy, in su}
    iclass stv {inst2} {out sz, in sx, in sy, in sv}
    schedule stu {inst1} {
      in sx 1; in sy 1; in su 2; out sz 2;
    }
    schedule stv {inst2} {
      in sx 1; in sy 1; in sv 2; out sz 2;
    }
where there are four input operands of the register file SR: sx, sy, su, and sv. According to the schedule information, su and sv are both used in the second pipeline stage and therefore can be mapped to a single read port without impacting the cycle time. Consequently, there is no need to create four read ports of the SR register file. In this case, let the address signals of the three read ports be read_addr_0, read_addr_1, and read_addr_2; then the logic for the three addresses will be
    read_addr_0 = x;
    read_addr_1 = y;
    read_addr_2 = inst1 ? u : v;
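The schedule-based refinement can be sketched in Python as follows. Two points are assumptions made for this illustration rather than statements from the specification: that schedule stage 1 corresponds to the instruction decode stage, and that the number of shared (muxed) ports equals the largest number of late-stage fields read by any single instruction. The function name plan_read_ports is hypothetical.

```python
# Illustrative sketch only: dedicate a read port to every field first used in
# the decode stage and share ports among fields first used later.
DECODE_STAGE = 1  # assumption: schedule stage 1 is the decode stage


def plan_read_ports(schedules, decode_stage=DECODE_STAGE):
    """schedules maps instruction -> {field: use stage} for source operands.

    Returns (fields with a dedicated port, number of shared ports).
    """
    min_stage = {}
    for uses in schedules.values():
        for field, stage in uses.items():
            min_stage[field] = min(stage, min_stage.get(field, stage))
    dedicated = sorted(f for f, s in min_stage.items() if s <= decode_stage)
    shared = max(
        (sum(1 for f in uses if min_stage[f] > decode_stage)
         for uses in schedules.values()),
        default=0,
    )
    return dedicated, shared


sr_schedules = {
    "inst1": {"x": 1, "y": 1, "u": 2},
    "inst2": {"x": 1, "y": 1, "v": 2},
}
```

For the SR example this gives dedicated ports for fields x and y plus one shared port for u and v, three ports in total, matching read_addr_0 through read_addr_2.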
Write ports are less time-critical. Even a very short pipeline would read the register file in cycle 0, perform a calculation in cycle 1, and write the register file in cycle 2. Thus there is plenty of time in which to decode and mux between all the fields used to write the register file. A more critical timing path is interlocking; after reading the register file in cycle 0, it is necessary to know what register file is being written at the beginning of cycle 1 so that a following instruction reading the register file can be stalled if necessary. However, generally one cycle is sufficient time in which to decode and mux the destination register fields, and so this algorithm saves area without affecting speed.
The interface of the register file read and write ports to the processor pipeline will vary according to the core processor's pipeline architecture. In the preferred embodiment, the core processor's pipeline always uses the read and write ports in a fixed pipeline stage as shown in U.S. Patent Application Serial Numbers 09/192,395 to Dixit et al. and 09/322,735 to Killian et al., both of which are hereby incorporated by reference, where the read ports are always used before the first stage and the write ports after the last (fourth) stage in a four-stage pipelined register file. Each read port will be read in the earliest stage of any instruction that uses it as a source operand; instructions that use such operands in later stages read the register file early and stage the data along to the specified stage. This staging also includes bypass muxes so that elements produced by other instructions after the register file is read are still available. For write ports, the write occurs in the latest stage of any instruction that uses it as a destination operand or in the instruction commit stage, e.g., the W stage, if that stage comes later. FIG. 1 shows the logic schema for register file read and write ports in the preferred embodiment.
Bypass Logic
The bypass logic is illustrated in FIG. 1 and is accomplished by the muxes on the read-port logic. For example, if an instruction produces a result in stage 3 (wr0_data_C3) and a subsequent instruction needs to use the data in stage 1, the control signals to the first mux on the read-port logic will be set such that the fourth input from the left will be selected. Consequently, in the next clock cycle, the data (rd0_data_C1) is available for the instruction.
Interlock Logic
The interlock logic is illustrated in FIG. 2. Based on the schedule information, the instruction decoding logic generates a useN signal for each read port and a defN signal for each write port for the instruction about to be issued. useN indicates that the instruction will need its input register operand in stage N. defN indicates that the instruction will produce its result in stage N. Furthermore, the defN signal for an instruction is piped along with the instruction in the pipeline. The stall signal is generated by examining the combination of all the defN and useN signals. The following example illustrates the stall logic for a 4-stage pipelined register file with two read ports (rd0 and rd1) and one write port (wr0). The suffix in the signal name (_Cn) indicates that the signal exists in stage n of the pipeline.
Thus,
    assign Stall_R =
      ((wr0_addr_C1 == rd0_addr_C0) & (
        (rd0_use1_C0 & (wr0_def2_C1 | wr0_ns_def3_C1 | wr0_ns_def4_C1)) |
        (rd0_use2_C0 & (wr0_def3_C1 | wr0_ns_def4_C1)) |
        (rd0_use3_C0 & (wr0_def4_C1)))) |
      ((wr0_addr_C2 == rd0_addr_C0) & (
        (rd0_use1_C0 & (wr0_def3_C2 | wr0_ns_def4_C2)) |
        (rd0_use2_C0 & (wr0_def4_C2)))) |
      ((wr0_addr_C3 == rd0_addr_C0) & (
        (rd0_use1_C0 & (wr0_def4_C3)))) |
      ((wr0_addr_C1 == rd1_addr_C0) & (
        (rd1_use1_C0 & (wr0_def2_C1 | wr0_ns_def3_C1 | wr0_ns_def4_C1)) |
        (rd1_use2_C0 & (wr0_def3_C1 | wr0_ns_def4_C1)) |
        (rd1_use3_C0 & (wr0_def4_C1)))) |
      ((wr0_addr_C2 == rd1_addr_C0) & (
        (rd1_use1_C0 & (wr0_def3_C2 | wr0_ns_def4_C2)) |
        (rd1_use2_C0 & (wr0_def4_C2)))) |
      ((wr0_addr_C3 == rd1_addr_C0) & (
        (rd1_use1_C0 & (wr0_def4_C3))));
The following perl code is used in the preferred embodiment to develop stall codes. wfield() and rfield() are functions to construct a signal name from a simple signal name, a port name, and a stage number. The expression is written in an efficient factored form.
    print " assign Stall_R =\n";
    foreach $write_port (@{$rf->{WRITE_PORT}}) {
      foreach $read_port (@{$rf->{READ_PORT}}) {
        for ($s = 1; $s <= $write_port->{MAX_DEF}-1; $s++) {
          my($waddr) = wfield("addr", $write_port, $s);
          my($raddr) = rfield("addr", $read_port, 0);
          print " (($waddr == $raddr) & (\n";
          for ($i = 1; $i <= $write_port->{MAX_DEF} - $s; $i++) {
            my($use) = rfield("use$i", $read_port, 0);
            print " ($use & (";
            for ($j = $i+$s; $j <= $write_port->{MAX_DEF}; $j++) {
              my($ns_def) = wfield("ns_def$j", $write_port, $s);
              print "$ns_def";
              if ($j != $write_port->{MAX_DEF}) {
                print " | ";
              }
            }
            print "))";
            if ($i == $write_port->{MAX_DEF} - $s) {
              print ")) |\n";
            } else {
              print " |\n";
            }
          }
        }
      }
    }
    print " 1'b0;\n";
    print "\n";
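The condition encoded by the nested loops above can be restated behaviorally: an instruction about to issue must stall if an older instruction currently in stage s writes the same register and will define it in a stage j with j >= i + s, where i is the stage in which the new instruction uses the operand. The Python sketch below models only that inequality for a single read port and a single write port; the function name, the dictionary layout, and the omission of the def/ns_def naming distinction are assumptions made for illustration.

```python
# Illustrative sketch only: behavioral model of the stall condition for one
# read port and one write port, derived from the generator's loop bounds.
MAX_DEF = 4  # latest stage in which the write port may define a result


def must_stall(read_addr, use_stage, in_flight, max_def=MAX_DEF):
    """in_flight maps pipeline stage s -> (write_addr, def_stage)."""
    for s, (write_addr, def_stage) in in_flight.items():
        if not 1 <= s <= max_def - 1:
            continue  # only stages C1 .. C(max_def - 1) are examined
        if write_addr == read_addr and def_stage >= use_stage + s:
            return True
    return False
```

For example, a producer in stage C1 that defines register 3 in stage 2 stalls a consumer that uses register 3 in stage 1, but not a consumer that uses it in stage 2.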
Write Logic
Because write port addresses are muxed in the preferred embodiment to reduce the hardware cost associated with each write port, it becomes necessary to have an algorithm for determining which operands use which ports. One criterion for this muxing is to minimize the logic required. In the target pipeline, the primary logic cost is that of staging data to the write port stages. If all writes occur in the same pipeline stage, there is no difference in this logic cost, but if writes occur in multiple stages, logic may be saved by grouping together destination operands with similar write stages.
Consider the following example:
    regfile SR 32 8 s
    operand sx x { SR[x] }
    operand sy y { SR[y] }
    operand sz z { SR[z] }
    operand su u { SR[u] }
    operand sv v { SR[v] }
    iclass i1 {inst1} {out sx, out sy, in su, in sv}
    iclass i2 {inst2} {out sz, in su, in sv}
    schedule s1 {inst1} {
      out sx 8;
      out sy 3;
    }
    schedule s2 {inst2} {
      out sz 9;
    }
Here, inst1 produces two results for SR, one in 3 cycles and the other in 8 cycles. inst2 produces one result for SR in 9 cycles. Since inst1 needs two write ports and inst2 needs one write port, register file SR only needs to have two write ports. Let the ports be wr0 and wr1. For inst1, the mapping of operands to write ports is simply
    sx -> wr0
    sy -> wr1
This implies that wr0 needs to have 8 stages and wr1 3 stages. For inst2, there is a choice of either
    sz -> wr0
or
    sz -> wr1
However, the two choices have different logic cost. Mapping sz to wr0 implies adding one more stage to wr0 (increasing from 8 to 9), and mapping it to wr1 implies adding 6 more stages to wr1 (increasing from 3 to 9).
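The comparison of the two choices amounts to counting how many staging registers each port would have to gain. A minimal Python sketch, with hypothetical function names and with port depths represented as a plain dictionary, is:

```python
# Illustrative sketch only: extra staging depth needed to map an operand that
# is written in a given stage onto a port of a given existing depth.
def added_stages(port_depth, operand_stage):
    return max(0, operand_stage - port_depth)


def cheapest_port(depths, operand_stage):
    """Pick the port whose staging chain grows the least."""
    return min(depths, key=lambda port: added_stages(depths[port], operand_stage))


sr_depths = {"wr0": 8, "wr1": 3}  # depths implied by inst1 in the example
```

For sz, written in stage 9, the added depth is 1 on wr0 and 6 on wr1, so wr0 is the cheaper choice.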
The preferred embodiment uses the following algorithm. For each instruction, sort the operands by stage number in descending order and assign them sequentially to write port 0 through write port n-1. Thus write port 0 will have the longest data chains and write port n-1 the shortest. For instructions with m operands where m is less than n, the operands will be mapped to the first m write ports in the same descending order by stage number. The following example is used to illustrate the write-port assignment process:
    regfile SR 32 8 s
    operand sx x { SR[x] }
    operand sy y { SR[y] }
    operand sz z { SR[z] }
    operand su u { SR[u] }
    operand sv v { SR[v] }
    operand sw w { SR[w] }
    iclass i1 {inst1} {out sx, out sy, in su, in sv}
    iclass i2 {inst2} {out sz, in su, in sv}
    iclass i3 {inst3} {out sw, in su, in sv}
    schedule s1 {inst1} {
      out sx 8;
      out sy 3;
    }
    schedule s2 {inst2} {
      out sz 9;
    }
    schedule s3 {inst3} {
      out sw 2;
    }
This process would yield the following assignments:
    for inst1, sx -> wr0
               sy -> wr1
    for inst2, sz -> wr0
    for inst3, sw -> wr0
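The first-pass assignment described above can be sketched in a few lines of Python. The input layout (a mapping from instruction to its destination operands and their write stages) and the function names are assumptions made for this illustration.

```python
# Illustrative sketch only: first-pass write-port assignment. For each
# instruction, destination operands are sorted by write stage in descending
# order and mapped to write ports 0, 1, 2, ... in that order.
def assign_write_ports(schedules):
    """schedules maps instruction -> {operand: write stage}."""
    assignment = {}
    for inst, outs in schedules.items():
        ordered = sorted(outs, key=lambda op: outs[op], reverse=True)
        assignment[inst] = {op: port for port, op in enumerate(ordered)}
    return assignment


def port_depths(schedules, assignment):
    """Number of stages each port needs: the latest stage mapped onto it."""
    depths = {}
    for inst, ports in assignment.items():
        for op, port in ports.items():
            depths[port] = max(depths.get(port, 0), schedules[inst][op])
    return depths


sr_writes = {
    "inst1": {"sx": 8, "sy": 3},
    "inst2": {"sz": 9},
    "inst3": {"sw": 2},
}
```

Applied to the example, this reproduces the assignments listed above and gives port depths of 9 stages for wr0 and 3 stages for wr1.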
Even though the above write-port assignment procedure minimizes the data staging cost, it can be further refined to optimize other cost criteria such as power consumption. In the above example, sw of inst3 can be mapped to wr1 without increasing the staging cost at all. Moreover, doing so provides an opportunity to power down the pipeline after the data is written into register file SR at the end of stage 2, whereas assigning sw to wr0 would require the pipeline to be active for 9 cycles.
The following procedure can be used as a second pass to further improve the write-port assignment for additional cost considerations such as power consumption. For each instruction with m operands where m < n, and for each operand in reverse order, move the assignment of the operand to a new write port i, where i is as large as possible without increasing the staging cost. To illustrate this procedure using the previous example, no operands of inst1 can be moved because it already uses all the write ports. For inst2, sz cannot be reassigned to wr1 without increasing the staging cost. For inst3, sw can be reassigned from wr0 to wr1 without increasing the staging cost.
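By way of illustration only, the two-pass write-port assignment described above may be sketched in Python as follows. The function name, the dictionary representation of the schedules, and the choice to treat the port depths found in the first pass as the staging-cost limit for the second pass are assumptions made for this sketch; they are not taken from Appendix B.

```python
def assign_write_ports(schedules):
    """schedules maps instruction -> {operand: stage}. Returns (assignment, depths)."""
    n = max(len(ops) for ops in schedules.values())  # number of write ports
    assignment = {}
    depths = [0] * n  # pipeline depth required of each write port

    # Pass 1: operands sorted by stage, descending, go to ports 0..m-1.
    for inst, ops in schedules.items():
        ordered = sorted(ops, key=ops.get, reverse=True)
        assignment[inst] = {op: port for port, op in enumerate(ordered)}
        for port, op in enumerate(ordered):
            depths[port] = max(depths[port], ops[op])

    # Pass 2 (assumed cost rule): an operand may move to a higher-numbered,
    # unused port only if that port is already at least as deep as its stage.
    for inst, ops in schedules.items():
        if len(ops) >= n:
            continue  # instruction already uses every port
        mapping = assignment[inst]
        for op in sorted(ops, key=ops.get):  # reverse order: shortest first
            used = set(mapping.values()) - {mapping[op]}
            for port in range(n - 1, mapping[op], -1):
                if port not in used and ops[op] <= depths[port]:
                    mapping[op] = port
                    break
    return assignment, depths


example = {
    "inst1": {"sx": 8, "sy": 3},
    "inst2": {"sz": 9},
    "inst3": {"sw": 2},
}
ports, depths = assign_write_ports(example)
print(ports, depths)
```

For the example schedules, the sketch leaves sz on wr0 and moves sw to wr1, matching the refinement described in the text.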
Many variations on the algorithms for assignment of register file read and write ports are possible. For example, in some circumstances it may be appropriate to provide more ports than strictly required to minimize data staging to consume less power. It is also possible to provide fewer ports than required by some instructions to further reduce the hardware cost associated with read and write ports; for read ports this would mean taking multiple cycles to read the register operands, and for write ports this would mean buffering some register writes to wait for a cycle where a write port is unused. Another possibility is to allow the TIE code to specify the register file read and write port assignments to handle cases for which the automatic algorithms give undesirable results.
The above concepts of extended register implementation are used in the code of Appendix B, a Perl program which generates an N-read, M-write, B-bit, S-entry register file.
Load/Store Instructions
As described in Background of the Related Art, TIE load and store instructions are required to provide a means for transferring data between TIE register files and memory directly. They must therefore share the local memories of the memory (M) stage of the core pipeline, i.e., data cache, Data RAM, Data ROM, etc. In addition to sharing the local memory, it is desirable to share, as far as possible, the other hardware resources used in core load/store. Sharing of resources yields a better solution in terms of area and timing. As will be described below, the address computation logic and the data alignment logic are two sets of resources that are shared between core and TIE load/store.
The following interface signals are required to implement TIE load/store in the preferred embodiment.
    interface VAddrOffset 32 core out
    interface VAddrBase 32 core out
    interface VAddrIndex 32 core out
    interface LSIndexed 1 core out
    interface LSSize 5 core out
    interface MemDataOut<n> <n> core out
    interface VAddrIn 32 core in
    interface MemDataIn<n> <n> core in
Most of these signals are illustrated in FIG. 4; FIG. 6 shows LSSize 927, MemDataOut<n> 901 and MemDataIn<n> 938. LSSize gives the size of the data reference in bytes (1, 2, 4, 8, or 16 in the preferred embodiment). MemDataOut<n> provides store data from the TIE semantics to the core, and MemDataIn<n> provides load data from the core to the TIE semantics. In the preferred embodiment <n> may be 8, 16, 32, 64, or 128.
In computing the memory address of the TIE load/store, it is possible to share the address adder in cases where the format of the TIE load and store instructions matches that of the core. Duplicating the address adder would be wasteful and would introduce additional delay in the address calculation path. The interface signals represent inputs to the core address adder as shown in FIG. 4. This address logic is intended for supporting the addressing modes
    I    AR[s] + immediate
    X    AR[s] + AR[t]
The selection between the two modes is made by the LSIndexed interface signal. The immediate used by the I-form is provided on the VAddrOffset input, and the AR[t] value used by the X-form is provided on the VAddrIndex input. VAddrBase is used to provide AR[s]. While values other than AR[s] and AR[t] could be provided on VAddrBase and VAddrIndex by TIE semantic blocks, providing these values allows logic optimization to significantly simplify the resulting logic, and thus keeps the address generation from being timing-critical. This is because the logic optimization would recognize that the VAddrBase (AR[s]) from TIE logic is the same as the base address of the core and reduce it to the same signal.
TIE can benefit from the load and store alignment logic in the core, given certain modifications to this logic. Because alignment requires a large amount of logic to implement, avoiding replication for TIE provides a significant area savings. Moreover, replication could introduce timing-critical paths due to the heavy loading it compels the local memory outputs and the alignment and data select control signals to drive. In order to implement sharing of the alignment resources, though, the modifications exemplified in FIGS. 5 and 6 are required.
These modifications firstly relate to the fact that TIE load/store requires/provides multiple load/store widths as opposed to the 32 bits of core load/store. This means that all the data paths within the alignment logic must increase in width to match the maximum of the TIE or core data width. Secondly, TIE load could require a more general alignment function as opposed to the simple right shift required by the core. This means that the alignment logic must perform a superset of the TIE alignment function and the core right shift.
FIG. 5 shows prior art core load alignment logic for a three-way set associative data cache 803-805 of 128-bit access width and a parallel data RAM 806. In this example, the uncached data input 808 is also chosen to be 128 bits wide for cache refill convenience, and the data RAM access is 32 bits wide because it is accessed only through core load/stores whose maximum width is 32 bits. There is also a 32-bit wide store data input 807 used when stored data must be bypassed to a subsequent load.
The primary alignment mechanism used is the 4:1 multiplexer 809-812 followed by a byte-level right shift that also does sign extension 814-819. The amount of the shift is given by the load address 813, 821 and the one-hot decoded coreSize signal 820. The store and data RAM data do not require the 4:1 multiplexer because they are already 32 bits wide. The 32-bit wide aligned data is then selected by a series of subsequent multiplexers 822-833 to yield the final core load data 834.
FIG. 6 shows an example of load alignment implementation in this embodiment. The primary difference is that all the load data sources 906-911 are now 128 bits wide to support 128-bit wide TIE load instructions, and the load alignment result is also 128 bits wide. In this example, the alignment itself is done using a byte-level rotator 914-918 followed by a sign extender 921-925. A byte-level rotator is required because in this example the TIE semantics happen to call for data rotation (again, in addition to the simple right shift required by the core load alignment). The amount of the shift or rotate is given by the load address 919 and the one-hot decoded LSSize 927 or coreSize 926 signal. The final output of the load alignment could be used either by the TIE coprocessor (the entire 128-bit width 938, providing all the multiple load widths as specified by LSSize) or by the core (only the least significant 32-bit portion 939, providing the three core load widths 32/16/8-bit as specified by coreSize).
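As a non-authoritative behavioral sketch of the rotate-then-extend alignment described above, the following Python function models a byte-level rotator followed by the core's sign/zero extension of the least significant 32 bits. The function name and argument names are hypothetical, integers stand in for the 128-bit data path, and the handling of the low address bits assumes that 32-bit core loads ignore address bits 1:0 and 16-bit core loads ignore bit 0.

```python
def load_align(data, va, width_bits=128, tie_load=False, vamask=0,
               l16si=False, l16ui=False, l8ui=False):
    """Rotate `data` right by whole bytes, then extend the low 32 bits."""
    nbytes = width_bits // 8
    l8or16 = l8ui or l16ui or l16si
    if tie_load:
        vam = va & vamask            # TIE load: rotate amount chosen by mask
    else:
        vam = va & (nbytes - 1)
        if not l8or16:
            vam &= ~0b10             # 32-bit load ignores address bit 1
        if not l8ui:
            vam &= ~0b01             # 16/32-bit loads ignore address bit 0

    # Byte-level rotate right within the access width.
    shift = 8 * vam
    mask = (1 << width_bits) - 1
    rot = ((data >> shift) | (data << (width_bits - shift))) & mask

    # Sign/zero extension applied to the least significant 32 bits.
    upper = (rot >> 32) << 32
    byte0 = rot & 0xFF
    byte1 = 0 if l8ui else (rot >> 8) & 0xFF
    if l8or16:
        half1 = 0xFFFF if (l16si and (rot >> 15) & 1) else 0
    else:
        half1 = (rot >> 16) & 0xFFFF
    return upper | (half1 << 16) | (byte1 << 8) | byte0


# Memory line whose byte at offset i holds the value i.
line = int.from_bytes(bytes(range(16)), "little")
core_word = load_align(line, 4) & 0xFFFFFFFF   # 32-bit core load at offset 4
tie_value = load_align(line, 8, tie_load=True, vamask=0xF)
```

The core would consume only the low 32 bits of the result, while a TIE load with none of the core size flags set receives the full rotated value.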
The core provides the virtual address back to the semantic block in addition to the memory data. The virtual address is sometimes needed for additional processing on the load data. In addition, this allows load and store instructions to be defined that modify the registers used to form the virtual address. For example, the "update" modes of the core ISA do
    IU    vAddr <- AR[s] + offset
          AR[s] <- vAddr
    XU    vAddr <- AR[s] + AR[t]
          AR[s] <- vAddr
The bundled write to the base address register AR [ s ] avoids a separate increment instruction in many inner loops. This is accomplished in TIE as simply as changing "in" to "inout" and adding an assignment.
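The four address forms (I, X, IU and XU) can be summarized by the following minimal Python sketch. It is an illustration under stated assumptions rather than the core's implementation: the function name is hypothetical, the AR register file is modeled as a list, and addresses are assumed to wrap at 32 bits.

```python
MASK32 = 0xFFFFFFFF

def effective_address(ar, s, t, imm, ls_indexed, update):
    """Return vAddr; in the update forms also write vAddr back to AR[s]."""
    base = ar[s]                          # value driven on VAddrBase
    addend = ar[t] if ls_indexed else imm # VAddrIndex (X-form) or VAddrOffset (I-form)
    vaddr = (base + addend) & MASK32
    if update:
        ar[s] = vaddr                     # IU/XU: AR[s] <- vAddr
    return vaddr


ar = [0, 0, 0x1000, 0x20]
a_i = effective_address(ar, 2, 3, 8, ls_indexed=False, update=False)  # I-form
a_xu = effective_address(ar, 2, 3, 8, ls_indexed=True, update=True)   # XU-form
```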
To understand the benefit of a bundled write to the base address register, first consider a software loop which does not use this feature:
    for (i = 0; i < n; i++) {
        x = tie_loadi(px, 0);
        y = tie_loadi(py, 0);
        z = inst1(x, y);
        tie_storei(z, pz, 0);
        px = px + 8;
        py = py + 8;
        pz = pz + 8;
    }
This example loops over two input arrays (px and py) in which the elements are 8 bytes wide, performs a computation (instl), and stores the result in another array (pz). Three out of seven instructions in this loop were used to advance the base pointers for the load and store instructions.
Using the bundled write load and store instructions, the example would be made much more efficient as illustrated in the following code:
    px = px - 8;
    py = py - 8;
    pz = pz - 8;
    for (i = 0; i < n; i++) {
        x = tie_loadiu(px, 8);
        y = tie_loadiu(py, 8);
        z = inst1(x, y);
        tie_storeiu(z, pz, 8);
    }
Now, tie_loadiu (tie_storeiu) will calculate the virtual address as p+8, load (store) the memory data, and change p to p+8 in one instruction. The initial subtractions are needed to correct px, py, and pz because the first loads now begin at px+8 and py+8, and the first store at pz+8.
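The equivalence of the two loops can be checked with a small Python sketch that records the address touched on each iteration. The helper names are hypothetical and only the pointer arithmetic is modeled, not the memory accesses themselves.

```python
def addresses_plain(p, n, step=8):
    """Addresses used by tie_loadi(p, 0) followed by p = p + step."""
    touched = []
    for _ in range(n):
        touched.append(p + 0)   # I-form access at p + 0
        p = p + step            # separate pointer-increment instruction
    return touched


def addresses_update(p, n, step=8):
    """Addresses used by tie_loadiu(p, step) after the initial p = p - step."""
    p = p - step                # one-time correction before the loop
    touched = []
    for _ in range(n):
        p = p + step            # IU-form: vAddr = p + step, then p <- vAddr
        touched.append(p)
    return touched


same = addresses_plain(0x100, 4) == addresses_update(0x100, 4)
```

Both variants visit the same addresses, while the update form removes the three pointer increments from the seven-instruction loop body.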
The stage numbers of core signals, such as the load/store interface described here, are fixed by the core pipeline, and are not specified in the schedule declarations. The appropriate values are used, however, in the pipeline insertion algorithm described above. For example, the following adds load and store instructions to the Galois-field arithmetic GF unit example above:
    opcode LGF8.I  r=4'b0000 LSCI
    opcode SGF8.I  r=4'b0001 LSCI
    opcode LGF8.IU r=4'b0010 LSCI
    opcode SGF8.IU r=4'b0011 LSCI
    opcode LGF8.X  op2=4'b0000 LSCX
    opcode SGF8.X  op2=4'b0001 LSCX
    opcode LGF8.XU op2=4'b0010 LSCX
    opcode SGF8.XU op2=4'b0011 LSCX

    interface VAddrOffset 12 core out
    interface VAddrBase 32 core out
    interface VAddrIndex 32 core out
    interface LSIndexed 1 core out
    interface LSSize 5 core out
    interface MemDataIn8 8 core in
    interface VAddrIn 32 core in
    interface MemDataOut8 8 core out

    iclass gfloadi { LGF8.I } { out gt, in ars, in imm8 } {} {
        out LSSize, out LSIndexed, out VAddrOffset, out VAddrBase, in MemDataIn8 }
    iclass gfstorei { SGF8.I } { in gt, in ars, in imm8 } {} {
        out LSSize, out LSIndexed, out VAddrOffset, out VAddrBase, out MemDataOut8 }
    iclass gfloadiu { LGF8.IU } { out gt, inout ars, in imm8 } {} {
        out LSSize, out LSIndexed, out VAddrOffset, out VAddrBase, in MemDataIn8, in VAddrIn }
    iclass gfstoreiu { SGF8.IU } { in gt, inout ars, in imm8 } {} {
        out LSSize, out LSIndexed, out VAddrOffset, out VAddrBase, out MemDataOut8, in VAddrIn }
    iclass gfloadx { LGF8.X } { out gr, in ars, in art } {} {
        out LSSize, out LSIndexed, out VAddrIndex, out VAddrBase, in MemDataIn8 }
    iclass gfstorex { SGF8.X } { in gr, in ars, in art } {} {
        out LSSize, out LSIndexed, out VAddrIndex, out VAddrBase, out MemDataOut8 }
    iclass gfloadxu { LGF8.XU } { out gr, inout ars, in art } {} {
        out LSSize, out LSIndexed, out VAddrIndex, out VAddrBase, in MemDataIn8, in VAddrIn }
    iclass gfstorexu { SGF8.XU } { in gr, inout ars, in art } {} {
        out LSSize, out LSIndexed, out VAddrIndex, out VAddrBase, out MemDataOut8, in VAddrIn }

    semantic lgf { LGF8.I, LGF8.IU, LGF8.X, LGF8.XU } {
        assign LSIndexed = LGF8.X | LGF8.XU;
        assign LSSize = 1;
        assign VAddrBase = ars;
        assign VAddrIndex = art;
        assign VAddrOffset = imm8;
        assign gt = MemDataIn8;
        assign gr = MemDataIn8;
        assign ars = VAddrIn;
    }
    semantic sgf { SGF8.I, SGF8.IU, SGF8.X, SGF8.XU } {
        assign LSIndexed = SGF8.X | SGF8.XU;
        assign LSSize = 1;
        assign VAddrBase = ars;
        assign VAddrIndex = art;
        assign VAddrOffset = imm8;
        assign MemDataOut8 = SGF8.X | SGF8.XU ? gr : gt;
        assign ars = VAddrIn;
    }

    schedule gfload { LGF8.I } {
        use imm8 0;
        use ars 1;
        def gt 2;
    }
    schedule gfloadu { LGF8.IU } {
        use imm8 0;
        use ars 1;
        def ars 1;
        def gt 2;
    }
    schedule gfloadx { LGF8.X } {
        use ars 1;
        use art 1;
        def gr 2;
    }
    schedule gfloadxu { LGF8.XU } {
        use ars 1;
        use art 1;
        def ars 1;
        def gr 2;
    }
Here is a tpp input for producing a load aligner for the invention:
module loadalign (out, in, va, vamask, TIEload, L16SI, L16UI, L8UI);
; use Utilities;
; my $bits = $pr->dcache->accessBits;
; my $bytes = $bits >> 3;
; my $mux = log2($bytes);
    output out[`$bits-1`:0];
    input in[`$bits-1`:0];
    input va[`$mux-1`:0];
    input vamask[`$mux-1`:0];
    input TIEload;
    input L16SI;
    input L16UI;
    input L8UI;
    wire L8or16 = L8UI|L16UI|L16SI;
    wire vam[`$mux-1`:0] = TIEload
        ? va & vamask
        : {va[`$mux-1`:2],va[1]&L8or16,va[0]&L8UI};
; sub rot {
;   my ($bits, $n, $step, $in, $out, $sel) = @_;
;   my @muxin = map($_ == 0
;                   ? $in
;                   : '{'.$in.'['.($_*$step-1).':0],'.$in.'['.($bits-1).':'.($_*$step).']}',
;                   0..($n-1));
    xtmux`$n`e #`$bits` (`$out`,
        `join(",\n\t\t", @muxin)`,
        `$sel`);
; }
; my $in = 'input';
; if ($mux & 1) {
;   # rotate is done with 4:1 muxes and one 2:1 mux
;   # combine the last 2:1 mux with the sign extend
;   for (my $i = $mux - 2; $i >= 1; $i -= 2) {
;     my $out = 't' . ($temp++);
    wire [`$bits-1`:0] `$out`;
;     rot($bits, 4, 8 * (1 << $i), $in, $out, 'vam['.($i+1).':'.$i.']');
;     $in = $out;
;   }
;   if ($bits > 32) {
    xtmux2e #`$bits - 32` (output[`$bits-1`:32],
        `$in`[`$bits-1`:32],
        {`$in`[7:0],`$in`[`$bits-1`:40]},
        vam[0]);
;   }
    xtmux4e #16 (output[31:16],
        `$in`[31:16],
;   if ($bits > 32) {
        `$in`[39:24]},
;   } else {
        {`$in`[7:0],`$in`[31:24]},
;   }
        {16{`$in`[15] & L16SI}},
        16'b0, // should never happen because vam[0]
               // is forced 0 if L8or16 is set
        {L8or16,vam[0]});
    xtmux4e #8 (output[15:8],
        `$in`[15:8],
        `$in`[23:16],
        8'b0,
        8'b0,
        {L8UI,vam[0]});
    xtmux2e #8 (output[7:0],
        `$in`[7:0],
        `$in`[15:8],
        vam[0]);
; } else {
;   # rotate is all done in 4:1 muxes,
;   # so sign extend must be done in separate 2:1
;   for (my $i = $mux - 2; $i >= 0; $i -= 2) {
;     my $out = 't' . ($temp++);
    wire [`$bits-1`:0] `$out`;
;     rot($bits, 4, 8 * (1 << $i), $in, $out, 'vam['.($i+1).':'.$i.']');
;     $in = $out;
;   }
    [The final "assign out = ..." statement of this branch appears only as an image in the source (Figure imgf000024_0001); the surviving fragments are "...5] & L16SI}} : `$in`[16:32]," and "...UI}},".]
; }
endmodule loadalign
Here is the output for width 128:
module loadalign (out, in, va, vamask, TIEload, L16SI, L16UI, L8UI);
    output out[127:0];
    input in[127:0];
    input va[3:0];
    input vamask[3:0];
    input TIEload;
    input L16SI;
    input L16UI;
    input L8UI;
    wire L8or16 = L8UI|L16UI|L16SI;
    wire vam[3:0] = TIEload
        ? va & vamask
        : {va[3:2],va[1]&L8or16,va[0]&L8UI};
    wire [127:0] t0;
    xtmux4e #128 (t0,
        input,
        {input[31:0],input[127:32]},
        {input[63:0],input[127:64]},
        {input[95:0],input[127:96]},
        vam[3:2]);
    wire [127:0] t1;
    xtmux4e #128 (t1,
        t0,
        {t0[7:0],t0[127:8]},
        {t0[15:0],t0[127:16]},
        {t0[23:0],t0[127:24]},
        vam[1:0]);
    assign out = {
        t1[127:32],
        L8or16 ? {16{t1[15] & L16SI}} : t1[16:32],
        t1[15:8] &~ {8{L8UI}},
        t1[7:0] };
endmodule loadalign
Here is the output for width 64:
module loadalign (out, in, va, vamask, TIEload, L16SI, L16UI, L8UI);
    output out[63:0];
    input in[63:0];
    input va[2:0];
    input vamask[2:0];
    input TIEload;
    input L16SI;
    input L16UI;
    input L8UI;
    wire L8or16 = L8UI|L16UI|L16SI;
    wire vam[2:0] = TIEload
        ? va & vamask
        : {va[2:2],va[1]&L8or16,va[0]&L8UI};
    wire [63:0] t0;
    xtmux4e #64 (t0,
        input,
        {input[15:0],input[63:16]},
        {input[31:0],input[63:32]},
        {input[47:0],input[63:48]},
        vam[2:1]);
    xtmux2e #32 (output[63:32],
        t0[63:32],
        {t0[7:0],t0[63:40]},
        vam[0]);
    xtmux4e #16 (output[31:16],
        t0[31:16],
        t0[39:24]},
        {16{t0[15] & L16SI}},
        16'b0, // should never happen because vam[0]
               // is forced 0 if L8or16 is set
        {L8or16,vam[0]});
    xtmux4e #8 (output[15:8],
        t0[15:8],
        t0[23:16],
        8'b0,
        8'b0,
        {L8UI,vam[0]});
    xtmux2e #8 (output[7:0],
        t0[7:0],
        t0[15:8],
        vam[0]);
endmodule loadalign
Here is the output for width 32:
module loadalign (out, in, va, vamask, TIEload, L16SI, L16UI, L8UI);
    output out[31:0];
    input in[31:0];
    input va[1:0];
    input vamask[1:0];
    input TIEload;
    input L16SI;
    input L16UI;
    input L8UI;
    wire L8or16 = L8UI|L16UI|L16SI;
    wire vam[1:0] = TIEload
        ? va & vamask
        : {va[1:2],va[1]&L8or16,va[0]&L8UI};
    wire [31:0] t0;
    xtmux4e #32 (t0,
        input,
        {input[7:0],input[31:8]},
        {input[15:0],input[31:16]},
        {input[23:0],input[31:24]},
        vam[1:0]);
    assign out = {
        L8or16 ? {16{t0[15] & L16SI}} : t0[16:32],
        t0[15:8] &~ {8{L8UI}},
        t0[7:0] };
endmodule loadalign
Interface to Core
Loads and stores are typically processed within the processor pipeline using a data cache or a small data RAM. For both cost and correctness, the new load and store instructions must also use this data cache/RAM to maintain the integrity of the cache/RAM data which is processed by both TIE and core instructions. In prior art systems, instructions added to the core did not share logic with the core. The preferred embodiment provides a mechanism for such sharing.
The TIE construct
    interface <sname> <width> <mname> [in|out]
declares a signal <sname> that interfaces to TIE module <mname>. This signal is <width> bits wide, and is either an input or output to this TIE code according to the last parameter. For interfacing to the core, <mname> is core. The TIE iclass construct is extended to list interface signals used by instructions. Its syntax is
    iclass <classname>
        { <iname>, ... }
        { <operandspec>, ... }
        { <statespec>, ... }
        { <interfacespec>, ... }
where <interfacespec> is either in <sname> or out <sname>, and <sname> is either an interface signal name or an exception signal name declared in an exception statement. Exception signal names may only be used as outputs, not as inputs. Likewise, the schedule construct is extended to allow interface signal names to be given pipeline stage numbers using "in" (for inputs) or "out" (for outputs).
Each output interface signal from a semantic block is ANDed with the OR of the one-hot instruction decode signals of the instructions with that output listed in the interface section of their iclass. The ANDed interface signals from all the semantic blocks are then ORed together to form the output signal to the core. FIG. 7 illustrates the implementation of output interface signal sname by the TIE compiler. sname_semI represents the value of sname produced by the i'th semantic block. iN1 and iN2 are one-bit instruction decode signals, and sname_semI_sel is a signal representing the condition under which the i'th semantic produces sname. Each input interface signal is fed directly to the modules which use the signal.
Compiler/OS Support in TIE
So far, TIE constructs have allowed state and instructions to be defined, but have not provided any indication of how these instructions should be used automatically by software. In prior systems, all uses of the instructions were referenced via intrinsics written into the application; hence, the compiler needed only to map the intrinsics onto instructions and did not need to know how to use the instructions themselves. With the addition of user-definable register files, it becomes desirable for the compiler to allocate program variables to elements of the register file. During register allocation, the compiler attempts to assign program values to the registers contained in the register file(s). At certain locations in a program, it may not be possible for all values to be assigned to registers. At these locations, one or more values must be moved to memory. To move a value from a register to memory requires a store, and to move a value from memory to a register requires a load. Thus, at a minimum the compiler must know how to load a value from memory into a register, and how to store a value from a register into memory.
During register allocation, it may also be necessary for the compiler to move a value from one register to another. For example, the value produced by a function may be returned in register A, and the next instruction may require that the value be used from register B. The compiler can move the value from register A to register B by first storing register A to a temporary memory location, and then loading register B from that memory location. However, it is likely to be more efficient to move the value directly from register A to register B. Thus it is desirable, but not required, that the compiler know how to move a value from one register to another. The save and restore sequences may be more complex than a simple concatenation of the save and restore sequences of the individual registers. In doing the entire register file, there may be opportunity for performance and/or space savings versus the obvious concatenation of the spill instructions. This may also include coprocessor state that is not in a register file.
The state of each coprocessor is composed of a variety of different and potentially interdependent components. The instruction sequence used to save and restore these components may depend on the interdependencies.
This dependency information can be expressed as a graph. If the graph is cyclic, then the state cannot be successfully saved at an arbitrary point in time. But if the dependency graph is acyclic (a DAG) then there is a way to order the save and restore of the components so that all of the coprocessor's state can be saved and restored at an arbitrary point in time.
The TIE compiler uses standard graph construction and analysis algorithms to generate and analyze this dependency information and takes this information into account when generating the save and restore sequence for a given coprocessor.
For example, consider a coprocessor that has two register files, regfile_a and regfile_b. regfile_a has four 32-bit registers and regfile_b has sixteen 128-bit values. The additional state is a bitfield recording which registers have been touched, called reg_touched, and a push register to back register 0 of regfile_a, called reg_back. The coprocessor provides the following load and store instructions to save and restore the coprocessor state:
    rur/wur -- for access to reg_touched and reg_back
    push_a -- copies regfile_a register 0 into reg_back
    pop_a -- copies regfile_a register 0 from reg_back
    s128b reg_a_register, reg_b_register -- stores the register file regfile_b into the address specified by regfile_a's register
    l128b reg_a_register, reg_b_register -- loads the register file regfile_b from the address specified by regfile_a's register
    s32a reg_a_register, reg_a_register -- stores the register file regfile_a into the address specified by regfile_a's register
    l32a reg_a_register, reg_a_register -- loads the register file regfile_a from the address specified by regfile_a's register
In this case, the DAG for this save state dependency looks like:
    reg_touched <-- regfile_a, regfile_b, reg_back
because the TIE for this coprocessor makes it so that reg_touched will change any time regfile_a, regfile_b or reg_back are touched.
    regfile_a <-- reg_back
because the save of the registers in regfile_a requires a free register in regfile_a. To get a free register in regfile_a requires that the register's value be moved through reg_back. This destroys the current value of reg_back.
    regfile_a <-- regfile_b
because the store instructions for regfile_b use a register in regfile_a as the address to which to store. This means that regfile_b can only be stored once regfile_a is already stored (actually, only one register in regfile_a needs to be stored first; this is glossed over for simplicity of the example).
So the save sequence makes sure that the state is saved in an appropriate order. In this case that order is:
    reg_touched, reg_back, regfile_a, regfile_b
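As one possible illustration of the "standard graph construction and analysis algorithms" mentioned above, the save order for this example can be derived with a topological sort. The sketch below uses Python's standard graphlib module; the function name and the encoding of the constraints as "must be saved before" sets (read off the prose explanation of each dependency) are assumptions of this sketch, not the TIE compiler's internal representation.

```python
from graphlib import TopologicalSorter, CycleError


def save_order(saved_before):
    """saved_before maps a component to the set of components that must be
    saved before it. Returns a valid save order, or None if the graph is cyclic."""
    try:
        return list(TopologicalSorter(saved_before).static_order())
    except CycleError:
        return None  # state cannot be saved at an arbitrary point in time


constraints = {
    # saving regfile_a destroys reg_back, so reg_back goes first
    "reg_back": {"reg_touched"},
    "regfile_a": {"reg_touched", "reg_back"},
    # storing regfile_b needs an address register freed from regfile_a
    "regfile_b": {"reg_touched", "regfile_a"},
}
print(save_order(constraints))
```

Because the constraints here form a single chain, the order is unique and agrees with the sequence given in the text; a cyclic set of constraints yields None in this sketch.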
In addition, because the preferred embodiment allows the definition of register files whose elements cannot be represented by the built-in types of standard programming languages (e.g., 64+ bits in C or saturating arithmetic as described above), it is necessary to have a mechanism for adding new types to match the defined hardware. Programming language types are also useful for determining to which register files a variable may be allocated.
For example, it is common in many ISAs to map integer values to one register file and floating point values to another because integer computation instructions only take their operands in the integer register file, and floating point instructions only take their operands in the floating point register file. Given the ability to create new data types, it is desirable to have a mechanism to specify allowed conversions between the built-in types and the new types, and between different new types. For example, in the C programming language conversions are allowed between char type variables and short type variables (by sign or zero-extending the char type).
The TIE construct
    ctype <tname> <size> <alignment> <rfname>
creates a programming language type <tname> and declares it to be <size> bits, aligned on an <alignment> bit boundary in memory, and allocated to <rfname>.
For example, continuing with the Galois-field arithmetic GF unit, the statement
    ctype gf8 8 8 gf
declares a new type (for the C programming language in the preferred embodiment) named "gf8" that has 8-bit values aligned on 8-bit memory boundaries, and these values are register allocated to the "gf" register file as needed.
The TIE construct proto <pname> { <ospec> , . . . } { <tspec> , . . . } { <inst> . . . } is used to specify instruction sequences that perform various functions that the compiler must know about or to give type information about the operands of intrinsics. <ospec> are operand type specifications, <tspec> are temporary register specifications needed by the instruction sequence, and <inst> are the instructions of the sequence.
The syntax of <ospec> is
    [in | out | inout] <typename> [*] <oname>
where <oname> is an operand name that may be substituted into the instructions (<inst>) of the sequence. <typename> is the type name of the operand (a pointer to that type if the optional asterisk is given).
The syntax of temporary register specification <tspec> is <rfname> <oname> where <oname> is an operand name that may be substituted into the instructions (<inst>) of the sequence. <typename> is a type name that identifies the register file from which <oname> should be temporarily allocated for this sequence.
The syntax of the instructions in the sequence <inst> is <iname> [ <oname> | <literal>] , . . . ; where <iname> is the instruction name, <oname> is an operand name declared in either <ospec> or <tspec>, and <literal> is a constant or string that is used unchanged by the compiler when generating the instruction sequence specified by the proto.
One use of proto is simply to associate types with instruction operands for the purpose of defining intrinsics. In this case <pname> is the instruction name; <ospec> matches the iclass operand specification (except that typenames are added); the <tspec> list should be empty; and the <inst> sequence should consist of a single instruction. An example might be:
    proto GFADD8 {out gf8 r, in gf8 s, in gf8 t} {} {
        GFADD8 r, s, t;
    }
Another use of proto is to define multi-instruction intrinsics. Here <tspec> may be nonempty. Example:
    proto GFADDXSQ8 {out gf8 r, in gf8 s} {gf8 tmp} {
        GFMULX8 tmp, s;
        GFMULX8 r, tmp;
    }
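The way operand names are substituted into the instructions of a proto may be sketched as follows. This is a hypothetical Python model: the dictionary layout, the function name expand_proto, and the register names gf2, gf5 and gf7 are invented for illustration, and literals are assumed to be any token that is not a declared operand or temporary name.

```python
def expand_proto(proto, actuals, temp_regs):
    """Substitute actual operands and allocated temporaries into a proto body."""
    binding = dict(zip(proto["operands"], actuals))
    binding.update(zip(proto["temps"], temp_regs))
    lines = []
    for iname, args in proto["insts"]:
        # Names declared in <ospec>/<tspec> are replaced; literals pass through.
        resolved = [binding.get(arg, arg) for arg in args]
        lines.append(f"{iname} {', '.join(resolved)}")
    return lines


gfaddxsq8 = {
    "operands": ["r", "s"],          # from {out gf8 r, in gf8 s}
    "temps": ["tmp"],                # from {gf8 tmp}
    "insts": [("GFMULX8", ["tmp", "s"]), ("GFMULX8", ["r", "tmp"])],
}
print(expand_proto(gfaddxsq8, ["gf2", "gf5"], ["gf7"]))
```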
An additional use of proto is to instruct the compiler how to load and store values of programming language types declared using the ctype TIE construct. As discussed earlier, being able to load and store values to and from memory is necessary for the compiler to perform register allocation, and to allow a register file's contents to be saved and restored on a task switch.
For each ctype <tname> declaration, there must be proto declarations of the form
    proto <tname>_loadi
        { out <tname> <x>, in <tname>* <y>, in immediate <z> }
        { <tspec>, ... }
    {
        <inst>...   // sequence of instructions that loads
                    // register <x> from the address <y>+<z>
    }
    proto <tname>_storei
        { in <tname> <x>, in <tname>* <y>, in immediate <z> }
        { <tspec>, ... }
    {
        <inst>...   // sequence of instructions that stores
                    // register <x> to the address <y>+<z>
    }
The <tname>_loadi proto tells the compiler the instruction sequence that should be used to load a value of type <tname> into a register from memory. The <tname>_storei proto tells the compiler the instruction sequence that should be used to store a value of type <tname> from a register into memory.
As described earlier, it is desirable that the compiler know how to move a value from one register to another. As with loads and stores, proto is used to instruct the compiler how to move values between registers. For each ctype <tname> declaration, there may be a proto declaration of the form
    proto <tname>_move
        { out <tname> <x>, in <tname> <y> }
        { <tspec>, ... }
    {
        <inst>...   // sequence of instructions that moves
                    // register <y> to register <x>
    }
For example, continuing with the Galois-field arithmetic GF unit, the proto declarations:
    proto gf8_loadi {out gf8 t, in gf8* s, in immediate o} {} {
        LGF8.I t, s, o;
    }
    proto gf8_storei {in gf8 t, in gf8* s, in immediate o} {} {
        SGF8.I t, s, o;
    }
    proto gf8_move {out gf8 r, in gf8 s} {} {
        GFADD8I r, s, 0;
    }
would be required input to the preferred embodiment to have the compiler do register allocation of gf8 variables; they would also be required input to generate the task state switch sequence for the gf register file.
A final use of proto is to define the allowed conversions between built-in and new types, and between different new types. Conversion prototypes are not required; if, for example, a conversion between new type A and new type B is not specified, the compiler does not allow variables of type A to be converted to variables of type B. For each pair of new or built-in types <t1name> and <t2name> (at most one of which can be a built-in type; this mechanism does not allow specification of a conversion between two built-in types, since that conversion is already defined by the programming language) there can be up to three proto declarations of the form:
    proto <t1name>_rtor_<t2name>
        { out <t2name> <x>, in <t1name> <y> }
        { <tspec>, ... }
    {
        <inst>...   // sequence of instructions that converts
                    // type <t1name> in register <y> to type
                    // <t2name> in register <x>
    }
    proto <t1name>_rtom_<t2name>
        { in <t1name> <x>, in <t2name>* <y>, in immediate <z> }
        { <tspec>, ... }
    {
        <inst>...   // sequence of instructions that stores
                    // type <t1name> in register <x> as
                    // type <t2name> at the address <y>+<z>
    }
    proto <t1name>_mtor_<t2name>
        { out <t2name> <x>, in <t1name>* <y>, in immediate <z> }
        { <tspec>, ... }
    {
        <inst>...   // sequence of instructions that loads
                    // type <t1name> from the address <y>+<z>
                    // as type <t2name> into register <x>
    }
For example, continuing with the Galois-field arithmetic GF unit, the proto declarations:
    proto gf8_rtom_char {in gf8 t, in char* s, in immediate o} {} {
        SGF8.I t, s, o;
    }
    proto char_mtor_gf8 {out gf8 t, in char* s, in immediate o} {} {
        LGF8.I t, s, o;
    }
would allow conversions between variables of type char in memory and variables of type gf8 in registers. With these protos, the following example shows how two vectors of chars can be added using the GFADD intrinsic:
    void gfadd_vector (char *char_vector0, char *char_vector1, int size)
    {
        for (int i = 0; i < size; i++) {
            gf8 p0 = char_vector0[i];
            gf8 p1 = char_vector1[i];
            gf8 res = GFADD(p0, p1);
            char_vector0[i] = res;
        }
    }
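The rule that a conversion is permitted only when a matching proto has been declared can be illustrated by the following Python sketch. The function name and the use of a plain set of declared proto names are assumptions made for illustration; only the naming scheme <t1name>_rtor_<t2name>, <t1name>_rtom_<t2name> and <t1name>_mtor_<t2name> comes from the text.

```python
CONVERSION_KINDS = ("rtor", "rtom", "mtor")  # register/memory source and destination


def find_conversion(declared_protos, src_type, dst_type, kind):
    """Return the name of the conversion proto to use, or None if not allowed."""
    if kind not in CONVERSION_KINDS:
        raise ValueError(f"unknown conversion kind: {kind}")
    name = f"{src_type}_{kind}_{dst_type}"
    return name if name in declared_protos else None


declared = {"gf8_rtom_char", "char_mtor_gf8"}
load_proto = find_conversion(declared, "char", "gf8", "mtor")   # gf8 p0 = char_vector0[i]
store_proto = find_conversion(declared, "gf8", "char", "rtom")  # char_vector0[i] = res
missing = find_conversion(declared, "gf8", "int", "rtor")       # no such proto declared
```

In this model, the initialization of p0 and p1 in gfadd_vector would select char_mtor_gf8 and the final assignment would select gf8_rtom_char, while a conversion with no declared proto is rejected.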
In prior art systems (e.g., the GNU C compiler), compilers maintain type information for each program variable and compiler-generated temporary variable. These built-in variable types correspond to the high-level-language types (e.g., in C, char, short, int, float, double, etc.). For each built-in type, the compiler must know the name of the type, the size and alignment requirements for the type, and the register file to which values of the type must be allocated. For new types, this information is provided by the ctype language construct. Using the ctype information, the compiler generates an internal type structure to represent that type, and uses that type for program variables and compiler-generated temporaries in a manner identical to that done for built-in types.
The prior art GNU C compiler represents types internally using the enumerated type machine_mode. Related types are grouped together in classes, described by the enumerated type mode_class. To support the new types, one skilled in the art can add an enumerator to mode_class to represent the class of types that represent user-defined types, and can add one enumerator to machine_mode for each new type declared using the ctype TIE language construct. For example, assuming the class representing the new types is called MODE_USER, the definition of mode_class in file machmode.h becomes:

enum mode_class { MODE_RANDOM, MODE_INT, MODE_FLOAT,
    MODE_PARTIAL_INT, MODE_CC, MODE_COMPLEX_INT, MODE_COMPLEX_FLOAT, MODE_USER, MAX_MODE_CLASS };
Enumerators are added to machine_mode by inserting lines in file machmode.def. Each line defines a new type, its name, its class, and its size (given in 8-bit bytes). Enumerators for user-defined types are named U<n>mode, where <n> is a number between zero and the total number of user-defined types. For example, to add an internal type to represent user-defined type gf8 from the earlier example, the following line is added:

DEF_MACHMODE (U0mode, "U0", MODE_USER, 1, 1, VOIDmode)

One skilled in the art can then modify the analysis and optimization applied by the GNU C compiler to perform correctly on types of the MODE_USER class.
In prior art compilers, the code selector (or code generator) is responsible for substituting a sequence of low-level instructions (corresponding more or less to assembly instructions) for each internally represented instruction. The code selector determines which instruction sequence to substitute by examining the operation performed by the internal instruction and the types of the operands to the instruction. For example, an internal instruction representing an add may have as input two values of type int and have as output one value of type int; or may have as input two values of type float and have as output one value of type float. Based on the types of the input and output values, the code selector chooses either the sequence of instructions to perform an integer add or the sequence of instructions to perform a floating-point add. For user-defined types, the load, store, move, and conversion proto definitions describe the instruction sequences to substitute for internal instructions that have one or more operands with a user-defined type. Continuing with the Galois-field arithmetic GF unit example, if the internal instruction represents a load of a gf8 value, the code selector consults the gf8_loadi proto to determine the instruction sequence that should be substituted for that instruction.

In the prior art GNU C compiler, the instructions available in the target processor are described using instruction patterns; see, e.g., Stallman, "Using and Porting GNU CC" (1995) for more information. These instruction patterns describe the instruction, including the number and type of the operands. To support user-defined types in the compiler, each load, store, move, and conversion proto is converted to the instruction pattern expected by the compiler. For example, the gf8_load proto is represented with the following pattern (assuming the gf8 ctype has been mapped to machine_mode enumerator U0mode):
(define_insn ""
  [(set (match_operand:U0 0 "register_operand" "v")
        (match_operand:U0 1 "memory_operand" "U"))]
  ""
  "LGF8.I\t%0, %1")
Protos that specify a temporary register are converted to an instruction pattern that overwrites or "clobbers" an operand of the appropriate type. The compiler will ensure that the clobbered operand is unused at the location of the instruction, so that the instruction can use it as a temporary. For example, the following load proto for user-defined type tt generates an instruction pattern containing a clobber:

proto tt_loadi { out tt x, in tt* y, in immediate z } { char t } {
    L8UI t, y, z;
    MVTT x, t;
}

(define_insn ""
  [(parallel [(set (match_operand:U0 0 "register_operand" "v")
                   (match_operand:U0 1 "memory_operand" "U"))
              (clobber (match_operand:U0 2 "register_operand" "a"))])]
  ""
  "L8UI\t%2, %1\nMVTT\t%0, %2")
Intrinsic Function Declaration
In the Killian et al. application, an intrinsic function declaration file is generated that contains definitions of all TIE instructions as functions using GNU asm statements. In particular, each instruction function is qualified with the C volatile property to suppress optimization that could otherwise occur. This method, though safe, prevents certain compiler optimizations where the TIE instructions can be safely re-ordered. The present invention improves the prior art system in two ways. First, only the load and store instructions are declared as volatile, therefore giving the compiler maximum freedom to reorder the instructions during code optimization. In the second improvement, instructions using special and user-declared states are declared with an explicit state argument, therefore giving the compiler more accurate information about the side effects of the instructions. The following header file is generated from the TIE compiler to declare all instructions in the GF example as intrinsic functions:
/* Do not modify. This is automatically generated. */
typedef int gf8 __attribute__ ((user ("gf8")));

#define GFADD8_ASM(gr, gs, gt) { \
    asm ("gfadd8 %0,%1,%2" : "=v" (gr) : "v" (gs), "v" (gt)); \
}
#define GFADD8I_ASM(gr, gs, imm4) { \
    asm ("gfadd8i %0,%1,%2" : "=v" (gr) : "v" (gs), "i" (imm4)); \
}
#define GFMULX8_ASM(gr, gs) { \
    register int _xt_state asm ("state"); \
    asm ("gfmulx8 %1,%2" : "+t" (_xt_state), "=v" (gr) : "v" (gs)); \
}
#define GFRWMOD8_ASM(gt) { \
    register int _xt_state asm ("state"); \
    asm ("gfrwmod8 %1" : "+t" (_xt_state), "=v" (gt) : "1" (gt)); \
}
#define LGF8_I_ASM(gt, ars, imm8) { \
    asm volatile ("lgf8_i %0,%1,%2" : "=v" (gt) : "a" (ars), "i" (imm8)); \
}
#define SGF8_I_ASM(gt, ars, imm8) { \
    asm volatile ("sgf8_i %0,%1,%2" : : "v" (gt), "a" (ars), "i" (imm8)); \
}
#define LGF8_IU_ASM(gt, ars, imm8) { \
    asm volatile ("lgf8_iu %0,%1,%3" : \
        "=v" (gt), "=a" (ars) : "1" (ars), "i" (imm8)); \
}
#define SGF8_IU_ASM(gt, ars, imm8) { \
    asm volatile ("sgf8_iu %1,%0,%3" : \
        "=a" (ars) : "v" (gt), "0" (ars), "i" (imm8)); \
}
#define LGF8_X_ASM(gr, ars, art) { \
    asm volatile ("lgf8_x %0,%1,%2" : \
        "=v" (gr) : "a" (ars), "a" (art)); \
}
#define SGF8_X_ASM(gr, ars, art) { \
    asm volatile ("sgf8_x %0,%1,%2" : : \
        "v" (gr), "a" (ars), "a" (art)); \
}
#define LGF8_XU_ASM(gr, ars, art) { \
    asm volatile ("lgf8_xu %0,%1,%3" : \
        "=v" (gr), "=a" (ars) : "1" (ars), "a" (art)); \
}
#define SGF8_XU_ASM(gr, ars, art) { \
    asm volatile ("sgf8_xu %1,%0,%3" : \
        "=a" (ars) : "v" (gr), "0" (ars), "a" (art)); \
}
In the above sample output, arithmetic instructions such as GFADD8I are not declared as volatile. Load and store instructions such as LGF8_I are declared as volatile. Instructions which read or write processor states, such as GFRWMOD8, have one more argument, _xt_state, to signal the compiler that these instructions have side effects.

Register Allocation
Prior art systems (e.g., the GNU C compiler) include register allocation algorithms designed for portability. Portability requires that the compiler support a wide variety of ISAs. Even though these ISAs are not themselves configurable or extensible, a compiler that must target any of them must take a generic approach to register allocation. Thus, prior art systems may allow multiple register allocation, and some may restrict programming language types to certain register files.
The prior art GNU C compiler allows any number of register files to be specified by modifying the machine description of the target. One skilled in the art can add support to GCC for one or more new register files by modifying the machine description for the target as described in "Using and Porting GNU CC".
For each TIE regfile construct, the compiler is automatically configured to assign values to the registers in that register file. The regfile construct indicates the number of registers in the register file. As described above, the TIE ctype construct specifies the register file that values of that type should be assigned to. The compiler uses this information, as well as the number of registers in the register file, when attempting to assign each program value that has a user-defined type. Continuing with the Galois-field arithmetic GF unit example, the regfile construct for the gf registers is:

regfile gf 8 16 g

This indicates that there are 16 gf registers, each with size 8 bits. The ctype construct for the gf8 type is:

ctype gf8 8 8 gf

indicating that values of type gf8 must be assigned to the gf register file. Thus, the compiler will allocate all values of type gf8 to the gf register file, which has 16 registers.

Instruction Scheduling
Prior art systems (e.g., the GNU C compiler) include instruction scheduling algorithms that reorder instructions to increase performance by reducing pipeline stalls. These algorithms operate by simulating the target processor's pipeline to determine the instruction ordering that results in the fewest number of stall cycles, while satisfying other pipeline constraints such as issue width, and function unit availability.
The prior art GNU C compiler simulates the processor's pipeline by determining, for any pair of instructions, the number of stall cycles that would result if one instruction were scheduled immediately after another. Based upon the stall information for each instruction pair, the compiler attempts to find an ordering of instructions that minimizes the total stall cycles. For new TIE instructions, the compiler determines the stall cycles by using information provided by the TIE language schedule construct. To determine the number of stalls that would occur if instruction B is scheduled immediately after instruction A, the compiler compares the pipeline stage for the write of each output operand in A with the pipeline stage for the read of each corresponding input operand in B. For each operand, the difference in these values, plus one (because of the schedule construct's semantics for defined operand pipeline stage values), indicates the minimum number of cycles that must separate A from B to avoid stalls. A value of one indicates that B can be scheduled immediately after A without stalling, a value of two indicates that scheduling B immediately after A will result in one stall cycle, etc. The maximum stall value over all operands written by A is the number of stall cycles that would result if B were scheduled immediately after A.
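The stall computation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function and dictionary names are hypothetical, and the stage values are the ones given for ALD and AADD in the example schedule constructs that follow in the text.

```python
def min_separation(def_stage, use_stage):
    """Minimum number of cycles that must separate producer A from consumer B."""
    return max(def_stage - use_stage + 1, 0)


def stall_cycles(producer_defs, consumer_uses, shared):
    """Stall cycles incurred if the consumer issues immediately after the producer.

    producer_defs: operand name -> stage in which the producer writes it
    consumer_uses: operand name -> stage in which the consumer reads it
    shared: (producer_operand, consumer_operand) pairs naming the same register
    """
    worst = 1  # a separation of 1 means back-to-back issue with no stall
    for produced, consumed in shared:
        sep = min_separation(producer_defs[produced], consumer_uses[consumed])
        worst = max(worst, sep)
    return worst - 1


# Stage values from the aload/aadd schedule constructs in the text.
ALD_DEFS = {"xt": 2}
AADD_USES = {"xa": 1, "xb": 2}

stall_via_xa = stall_cycles(ALD_DEFS, AADD_USES, [("xt", "xa")])
stall_via_xb = stall_cycles(ALD_DEFS, AADD_USES, [("xt", "xb")])
```

Taking the maximum over all shared operands mirrors the rule that the worst operand determines the stall count for the instruction pair.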
Consider the following example scheduling constructs:

schedule aload { ALD } {
    use imm8 0;
    use ars 1;
    def xt 2;
}
schedule aadd { AADD } {
    use xa 1;
    use xb 2;
    def xc 2;
}
In the following code sequence, the xt operand in the ALD instruction, x3, is the same as the xa operand in the AADD instruction. Thus, the AADD instruction must be scheduled (def xt) - (use xa) + 1 = 2 - 1 + 1 = 2 cycles after the ALD to avoid stalling. If AADD is scheduled immediately after ALD, then there is a one-cycle stall.

ALD x3, a0, 0
AADD x0, x3, x1
In the following code sequence, the xt operand in the ALD instruction, x3, is the same as the xb operand in the AADD instruction. Thus, the AADD instruction must be scheduled (def xt) - (use xb) + 1 = 2 - 2 + 1 = 1 cycle after the ALD to avoid stalling. In this case, if AADD is scheduled immediately after ALD, there is no stall.

ALD x3, a0, 0
AADD x0, x1, x3

Lazy State Switch

Adding register files to processors significantly increases the quantity of state that must be saved and restored as part of task switching in a multi-tasking environment as implemented by most real-time operating systems. Because the additional state is often specific to certain computations which are performed in a subset of the tasks, it is undesirable to save and restore this additional state for every task switch because doing so unnecessarily increases the task switch cycle count. This can also be an issue in non-extensible processors for which a solution exists in the prior art. For example, the MIPS R2000 CPENABLE bits allow for "lazy" switching of coprocessor registers from one task to another. The preferred embodiment allows lazy switching to be applied to the state created via processor extension (the TIE state and regfile declarations).
This is one of the most complex of the save and restore operations. It is complex for several reasons: it is happening at a point in time delayed from the context switch; the run-time must manage the validity of each coprocessor file; and the core itself is changing the validity of the coprocessors as exceptions occur.
To show how this can be handled, assume there is a system with two tasks, A and B. There also are two coprocessor registers, cp_0 and cp_1. The state of the system consists of the valid bits that are kept by the core and the register file owner records that are kept by the run-time. Consider, then, the sequence of events shown in TABLE I below. In this example, coprocessor state is assumed to be stored at the base of the stack of each task.
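[TABLE I: the first rows appear only as images (imgf000039_0001, imgf000040_0001) in the source and are not reproduced here; the remaining rows follow.]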
recognize that A's state is already loaded into cp_0 and avoid the restore at that point. The exception would have set the valid bit.
Task A uses cp_0
A B Because A's state is already in cp_0, the run time has already set the valid bit on the context switch. Since the valid bit is set, no exception occurs and no action must be taken by the run-time.
Task A uses cp_l
A A Task A's use of cp_1 causes an exception. This exception sets the valid bit for cp_1. The run-time, seeing that Task B owned cp_1, saves the contents of cp_1 to Task B's stack. It then restores Task A's state to cp_1.
Task B swaps in
A All of the valid bits owned by Task A are turned off. There are no coprocessors owned by Task B and so no valid bits are turned on.
Task B uses cp_l
B Task B's use of cp_1 causes an exception. This exception turns on the valid bit for cp_1. The run-time sees that Task A currently owns cp_1 and saves the current state to Task A's save area. The run-time then restores Task B's state to cp_1.
Processing continues.
TABLE I
The lazy switch mechanism requires that: state be grouped into sets to which access can be enabled or disabled; access to disabled state cause an exception; the exception handler be able to determine which state must be switched; and the exception handler be able to save the state to memory, restore it from memory, and re-enable access.
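The bookkeeping behind TABLE I can be illustrated with a small Python model. It is a sketch under stated assumptions rather than the run-time of the preferred embodiment: the class, its method names, and the opening events that give Task A ownership of cp_0 and Task B ownership of cp_1 are hypothetical, since the first rows of the table are not available in text form.

```python
class LazySwitchRuntime:
    """Toy model of lazy coprocessor state switching (all names illustrative)."""

    def __init__(self, coprocessors):
        self.valid = {cp: False for cp in coprocessors}  # enable bits kept by the core
        self.owner = {cp: None for cp in coprocessors}   # owner records kept by the run-time
        self.current = None
        self.log = []                                    # save/restore actions performed

    def switch_to(self, task):
        """Context switch: enable only coprocessors that already hold this task's state."""
        self.current = task
        for cp in self.valid:
            self.valid[cp] = self.owner[cp] == task

    def use(self, cp):
        """The current task touches cp; returns True if a disabled exception is taken."""
        if self.valid[cp]:
            return False
        previous = self.owner[cp]
        if previous is not None:
            self.log.append(("save", cp, previous))      # save the old owner's state
        self.log.append(("restore", cp, self.current))   # load the current task's state
        self.owner[cp] = self.current
        self.valid[cp] = True
        return True


rt = LazySwitchRuntime(["cp_0", "cp_1"])
rt.switch_to("A")
rt.use("cp_0")                  # assumed setup: A becomes owner of cp_0
rt.switch_to("B")
rt.use("cp_1")                  # assumed setup: B becomes owner of cp_1
rt.switch_to("A")               # Task A swaps in
a_uses_cp0 = rt.use("cp_0")     # no exception, state already loaded
a_uses_cp1 = rt.use("cp_1")     # exception: save B's state, restore A's
rt.switch_to("B")               # Task B swaps in, owns nothing
b_uses_cp1 = rt.use("cp_1")     # exception: save A's state, restore B's
```

The save and restore work is deferred to the first use after a switch, so a task that never touches a coprocessor never pays for its state.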
In the preferred embodiment, the TIE construct

coprocessor <cname> <cnumber> { <sname>, ... }

declares that the state named by <sname>, ... is a group for the purpose of lazy switching. This grouping is given the name <cname>, and a number <cnumber> in the range 0 to 7. It is an error if any of <sname>, ... are named in more than one coprocessor statement. Given the above construct, a list is created of the instructions that have <sname> in the in/out/inout list of the iclass. A signal is then created that is the OR of the instruction one-hot decodes for these instructions. This signal is ANDed with the complement of the CPENABLE bit. These signals generated for each coprocessor are then combined with the TIE source code generated exceptions described in greater detail below in the Exceptions section. All coprocessor disabled exceptions have higher priority than any exceptions from the TIE source code. Between the coprocessor disabled exceptions, the lowest number exception has priority.
In the core processor of the preferred embodiment, different exceptions all use the same vector and are distinguished by the code loaded into the EXCCAUSE register by the exception. The core processor has reserved eight cause codes (from 32 to 39) for these exceptions. In response to the coprocessor statement, the TIE compiler adds bit <cnumber> to the CPENABLE register, adds logic to the processor to cause an exception if bit <cnumber> is clear and any instruction accessing <sname>, ... is executed, and adds logic to the processor to load 32+<cnumber> into the EXCCAUSE register when that exception is recognized by the core.
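The coprocessor-disabled logic just described (OR of the one-hot decodes, AND with the complement of the CPENABLE bit, lowest number first, cause code 32+<cnumber>) can be modeled behaviorally as follows. The function name, the argument layout, and the instruction names in the sample data are assumptions made for illustration only.

```python
COPROCESSOR_CAUSE_BASE = 32  # cause codes 32 to 39 are reserved, per the text


def coprocessor_disabled_cause(cpenable, decodes, users):
    """Return the EXCCAUSE value for a coprocessor-disabled exception, or None.

    cpenable: integer whose bit n is set when coprocessor n is enabled
    decodes:  instruction name -> one-hot decode signal (bool)
    users:    coprocessor number -> set of instructions that access its state
    """
    for cnumber in sorted(users):  # the lowest-numbered exception has priority
        accessed = any(decodes.get(inst, False) for inst in users[cnumber])
        enabled = (cpenable >> cnumber) & 1
        if accessed and not enabled:
            return COPROCESSOR_CAUSE_BASE + cnumber
    return None


# Hypothetical grouping: coprocessor 0 holds the gf state, coprocessor 3 some other state.
USERS = {0: {"GFADD8", "LGF8_I"}, 3: {"OTHER_OP"}}

cause_disabled = coprocessor_disabled_cause(0b0000, {"GFADD8": True}, USERS)
cause_enabled = coprocessor_disabled_cause(0b0001, {"GFADD8": True}, USERS)
cause_cp3 = coprocessor_disabled_cause(0b0001, {"OTHER_OP": True}, USERS)
```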
Multi-Cycle Instructions in TIE
In the prior processor art, instructions that require multiple cycles of computation require additional logic to pipeline the combinatorial logic of the computation and to prevent instructions that depend on not-yet-computed results from issuing. In addition, compilers for such processors should include algorithms to reorder instructions to minimize pipeline stalls.
The first item is typically implemented by processor designers by writing logic that has pipeline registers inserted at carefully chosen locations. The second item is typically implemented by comparing the source operands of an instruction to be issued to all not-yet-computed destination operands in the pipeline, and holding the instruction if there is a match. These three items must be coordinated. If the pipelining of the computational logic does not match the changes to the issue logic, then the processor may produce incorrect results. If reordering to minimize pipeline stalls is inconsistent with pipelining the combinational logic, then sub-optimal performance will result (e.g., scheduling a use of a result before it is ready will result in a pipeline stall). Take the following example:
MUL a3, a4, a5 /* a3 = a4 * a5, a 2-cycle instruction */
ADD a6, a3, a7 /* a6 = a3 + a7, a single cycle instruction */
SUB a2, a0, a1 /* a2 = a0 - a1, a single cycle instruction */
If MUL logic is carried over two cycles but the control logic issues one instruction every cycle, a6 will have incorrect results because a3 does not have the correct value at the time the ADD instruction needs it. To be correct, the issue logic must know that MUL is pipelined over two stages and stall one cycle before issuing the ADD instruction. Even though stalling the ADD instruction by one cycle results in correct logic, it does not provide optimal performance. By switching the order of the ADD and SUB instructions, it is no longer necessary to stall any instruction in this example, which results in optimal performance. This can only be achieved by appropriate coordination between implementation of MUL logic, implementation of instruction issuing logic, and instruction re-ordering (scheduling).
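The effect of reordering in the MUL/ADD/SUB example can be checked with a deliberately simple in-order issue model. The model is an assumption made for illustration (one instruction issued per cycle, a result readable a fixed number of cycles after issue), and the function name is hypothetical. It is not the issue logic of the preferred embodiment.

```python
def count_stalls(program, latency):
    """Count stall cycles for an in-order, single-issue pipeline.

    program: list of (opcode, destination, sources) tuples
    latency: opcode -> cycles from issue until the result can be read
    """
    ready = {}   # register -> first cycle in which its new value can be read
    cycle = 0    # cycle in which the next instruction would issue with no stall
    stalls = 0
    for opcode, dest, sources in program:
        issue = max([cycle] + [ready.get(reg, 0) for reg in sources])
        stalls += issue - cycle           # cycles spent waiting for operands
        ready[dest] = issue + latency[opcode]
        cycle = issue + 1
    return stalls


LATENCY = {"MUL": 2, "ADD": 1, "SUB": 1}
MUL = ("MUL", "a3", ("a4", "a5"))
ADD = ("ADD", "a6", ("a3", "a7"))
SUB = ("SUB", "a2", ("a0", "a1"))

stalls_original = count_stalls([MUL, ADD, SUB], LATENCY)
stalls_reordered = count_stalls([MUL, SUB, ADD], LATENCY)
```

In this model the original order stalls the ADD for one cycle, while placing the independent SUB between MUL and ADD hides the multiplier latency.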
In prior art systems, these three items (pipeline logic, pipeline stalling and instruction rescheduling) are often implemented separately, making coordination more difficult and increasing design verification requirements. The preferred embodiment of the present invention provides a method of specifying the information required for these features once, and implementing the three items in the processor generator from that specification.
In addition, the instruction set simulator of the preferred embodiment uses the same specification of scheduling information in its timing model. This allows application developers using all the features of the preferred embodiment to get good predictions of performance before the hardware is built without running their applications on a slow HDL simulator.
Chapter 10 of the Xtensa™ Instruction Set Architecture (ISA) Reference Manual by Killian and Warthman, incorporated herein by reference, discloses a method of describing pipeline hardware that has been used to model the performance of processor pipelines and which has been used in the prior art for minimizing pipeline stalls. In the preferred embodiment, however, this description is additionally used for the first two items above.
In particular, the TIE language now includes the declaration

schedule <schedulename> { <iname>, ... } {
    in <oname> <stage>;
    out <oname> <stage>;
}

where <iname> are the names of instructions; <oname> is an operand or state name; and <stage> is an ordinal denoting a pipeline stage.
The def stage numbers used by TIE are one less than the values described in Chapter 10 of the Xtensa™ Instruction Set Architecture (ISA) Reference Manual by Killian and Warthman, and thus the separation between instructions is max(SA - SB + 1, 0) instead of max(SA - SB, 0). Based on this specification, the TIE compiler as described in the Killian et al. and Wilson et al. applications is extended to insert pipeline registers into the semantic logic specification as follows. A stage number is assigned to every input to the semantic block. Instruction decode signals and immediate operands are assigned implementation-specific numbers (0 in the preferred embodiment). Register source operands, state registers, and interface signals (described below) are assigned stage numbers from the TIE schedule declaration (with an implementation-specific default, 1 in the preferred embodiment). Next, each node of the semantic block is visited in postorder (that is, after each of its predecessor nodes has been visited). The stage number of the node, NS, is the maximum stage number of any of its inputs. For each input with a stage number IS < NS, the compiler inserts NS-IS pipeline registers between the input and the node. Finally, the output register operands, state registers and interface signals are visited. If the stage number from the semantic block, IS, is greater than the stage number OS declared in the schedule statement, the input TIE specification is in error. Otherwise, if OS > IS, then OS-IS pipeline registers are inserted before the output.
This process is illustrated with the following example:

state s1 1
state s2 32
state s3 32
iclass complex {example} {out arr, in ars, in art} {in s1, in s2, in s3}
semantic complex {example} {
    wire [31:0] temp1 = s1 ? ars : art;
    wire [31:0] temp2 = s2 - temp1;
    assign arr = s3 + temp2;
}
schedule complex {example} {
    in ars 1;   /* using operand ars in stage 1 */
    in art 1;   /* using operand art in stage 1 */
    in s1 2;    /* using state s1 in stage 2 */
    in s2 2;    /* using state s2 in stage 2 */
    in s3 1;    /* using state s3 in stage 1 */
    out arr 3;  /* defining operand arr in stage 3 */
}
This example specifies that the instruction "example" uses operands ars, art and state s3 in stage 1 and states s1 and s2 in stage 2. It produces result operand arr in stage 3. For this description, the above register-insertion procedure would produce the circuit in FIG. 8(a). The NS of node "?" is 2 because the maximum input stage is 2. Because the IS of ars and art are 1, one register is inserted at the respective inputs of node "?". Similarly at node "+", the s3 input is delayed by one stage to match the other input. Finally, the output of node "+" is delayed by one stage before being assigned to arr. If in the schedule description of the above example arr is declared as "out arr 1", the pipeline insertion procedure would produce the circuit in FIG. 8(b). Since the NS of node "+" is 2 and the OS of arr is 1, the procedure would issue an error message since the input schedule requirement is unsatisfiable.
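The register-insertion procedure and the walk-through above can be reproduced with a short Python sketch. The data layout (a postorder list of nodes, each named after the signal it drives) and all identifiers other than the operand and state names from the example are assumptions made for illustration.

```python
class ScheduleError(Exception):
    """Raised when a declared output stage is earlier than the logic allows."""


def insert_pipeline_registers(input_stage, nodes, outputs):
    """Return {(source, sink): register_count} for a semantic block.

    input_stage: input signal -> stage number from the schedule declaration
    nodes:       list of (node_output, [node_inputs]) in postorder
    outputs:     output operand -> (driving signal, declared stage OS)
    """
    stage = dict(input_stage)
    registers = {}
    for node, inputs in nodes:
        ns = max(stage[sig] for sig in inputs)      # NS: latest input stage
        for sig in inputs:
            if stage[sig] < ns:
                registers[(sig, node)] = ns - stage[sig]
        stage[node] = ns
    for name, (driver, declared) in outputs.items():
        produced = stage[driver]                     # IS of the output
        if produced > declared:
            raise ScheduleError(f"{name}: ready in stage {produced}, declared {declared}")
        if declared > produced:
            registers[(driver, name)] = declared - produced
    return registers


INPUTS = {"ars": 1, "art": 1, "s1": 2, "s2": 2, "s3": 1}
NODES = [
    ("temp1", ["s1", "ars", "art"]),   # node "?"
    ("temp2", ["s2", "temp1"]),        # node "-"
    ("sum", ["s3", "temp2"]),          # node "+"
]

regs_stage3 = insert_pipeline_registers(INPUTS, NODES, {"arr": ("sum", 3)})

try:
    insert_pipeline_registers(INPUTS, NODES, {"arr": ("sum", 1)})
    unsatisfiable = False
except ScheduleError:
    unsatisfiable = True
```

With "out arr 3" the sketch places one register on each of ars, art, and s3 and one before arr, matching the description of FIG. 8(a); with "out arr 1" it reports the unsatisfiable schedule.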
The above algorithm correctly inserts pipeline registers as necessary, but the placement of these registers is far from optimal. It is necessary to use a pipeline register optimization algorithm, such as found in Synopsys' DesignCompiler, after initial insertion to generate acceptable logic for synthesis. This is typically done by moving registers across combinational logic to balance the logic delays on both sides of the registers. Using the above example, the register optimization would produce a circuit such as the one in FIG. 8(c) in which the register at the output of node "+" is moved to the inputs in order to balance the delay and reduce the cycle time.
In some cases, it may be desirable to have a semantic block that uses or defines a register operand in one pipeline stage for one instruction, and in another stage for a different instruction because the two instructions may share some common logic. Specifying the instructions in two separate semantic blocks would require unnecessary duplication of logic. This is a possible extension in a variation on the preferred embodiment. This capability would be supported by using separate signal names in the semantic block for two operands, e.g., <operand>@<stage> instead of just <operand>. Once this modification is made, the above algorithms operate correctly even in the multi-system environment.
For example, suppose one wants to have the following two instructions

inst1: arr = ars + art
inst2: arr = ars + art + s1

where for some reason s1 must be a stage 1 input and the cycle time requirement is such that there is only time to perform one addition in a cycle. Using the above-mentioned extension, the semantic description would look like

semantic two {inst1, inst2} {
    wire [31:0] temp = ars + (inst1 ? art : s1);
    assign arr = temp;
    assign arr@2 = temp + art@2;
}
By describing two instructions in a single semantic block with the extended signal names ars@2 and art@2, the two instructions can be implemented with only two adders instead of the three that would be needed had the two instructions been described in two separate semantic blocks.

Exceptions
Most processors have some mechanism for instructions to conditionally cause an exception instead of completing. For example, a divide instruction may cause an exception when the divisor is zero. The preferred embodiment of the present invention supports this capability from TIE by first declaring the new exception

exception <ename> <exceptioncode> { <exc1>, ... } <string>

where <ename> is the name of the instruction and the signal used in semantic blocks to raise it; <exceptioncode> is the value passed to the software exception handler to distinguish this exception from others; <exc1>, etc., are lower-priority exceptions; and <string> is a descriptive string to be used in the documentation.
Once declared, exception signals may be listed in iclass declarations as described above. With this declaration, a single-bit signal having the exception's name is created within semantic TIE blocks containing the defined instruction, and this signal must be assigned. FIG. 9 shows the logic generated by the TIE compiler to combine exception signals from multiple TIE blocks and to prioritize between exceptions when more than one is signaled by a single instruction.
The exception signal may also be given a stage number in the schedule declaration. However, in the preferred embodiment, the core processor processes all exceptions in its M pipeline stage. For this implementation, the stage number specified by the schedule declaration is checked to ensure that it is less than or equal to the stage number of the M-stage, and if not an error is signaled at compile time. If the specified stage number is less than or equal to the stage number of the M-stage, then the stage number of the M-stage is used instead. Thus, the logic of FIG. 9 is evaluated in the M-stage.
As shown in FIG. 9, the exception signal generated by each semantic block is ANDed with the OR of the one-hot instruction decode signals that declare the exception signal in their interface section (this allows the TIE code to only produce a valid exception signal when instructions that raise that exception are executed). Next, all of the exception signals are ORed to produce a single signal indicating that some exception is occurring. This signal is processed by the core as in the prior art.
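The gating and combining steps described for FIG. 9 can be written as a small behavioral model. It is a simplification under stated assumptions: one raw signal per exception rather than one per semantic block, hypothetical function and argument names, and an assumed mapping from exceptions to the instructions that declare them.

```python
def combine_exceptions(raw_signals, decodes, raisers):
    """Qualify raw exception signals and OR them into one 'exception' signal.

    raw_signals: exception name -> raw signal from the semantic logic (bool)
    decodes:     instruction name -> one-hot decode signal (bool)
    raisers:     exception name -> instructions that declare the exception
    """
    gated = {}
    for exc, raw in raw_signals.items():
        declared = any(decodes.get(inst, False) for inst in raisers[exc])
        gated[exc] = bool(raw and declared)   # raw AND (OR of decodes)
    return gated, any(gated.values())


# Assumed declaration mapping for illustration.
RAISERS = {"Overflow": {"ADDO", "SUBO", "MULO", "DIVO"}, "DivZero": {"DIVO"}}

# While ADDO executes, the divide logic may still drive DivZero with a
# meaningless value; the decode qualification masks it.
gated_add, any_add = combine_exceptions(
    {"Overflow": False, "DivZero": True}, {"ADDO": True}, RAISERS)

gated_div, any_div = combine_exceptions(
    {"Overflow": False, "DivZero": True}, {"DIVO": True}, RAISERS)
```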
Finally, a priority encoder is used to determine which exception code will be written into the core processor's EXCCAUSE register. The list of lower priority exceptions is used to form a directed graph (if a cycle is detected, it is considered a compile-time error). A topological sort of the [...]
condition"
exception <exc3> <exccode3> {exc2} "High level exception condition"
schedule s1 {inst1} { def exc1 1; }
schedule s2 {inst2} { def exc2 3; }
schedule s3 {inst3} { def exc3 2; }
schedule s4 {inst4} { def exc1 3; }
In this case, exception exc1 can be raised by inst1 in C1 and by inst4 in C3, exc2 by inst2 in C3, and exc3 by inst3 in C2. In this embodiment, all exception signals are generated in their declared stages and pipelined forward to the commit stage, at which point the exception cause value is computed by selecting the exception code by the priority of exception signals as specified in the above TIE description. The exception signal Exception and the cause signal ExcCause feed to the core. Once an exception is handled, the core will issue a signal back to the TIE logic to kill all the instructions in the pipeline and effectively clear the remaining unhandled exceptions.
As another example, FIG. 10 shows a circuit described by the code below which has two exceptions and some instructions that generate one exception and one that generates both. In this example, Overflow is lower-priority than Divide by Zero (actually both cannot occur at the same time in a divide, so the relative priority is irrelevant). In the Figure, it should be noted that each pictured semantic block generates some subset of the total set of TIE exceptions; thus, exact wirings are input-dependent. Further, in the semantic blocks, exception outputs are pipelined to the resolution stage by the TIE schedule mechanism.
exception Overflow 40 {} "Integer Overflow"
exception DivZero 41 { Overflow } "Integer Divide by Zero"
iclass ov { ADDO, SUBO, MULO, DIVO } { out arr, ars, art } { out Overflow }
reference ADDO {
    wire [32:0] t = {ars[31], ars} + {art[31], art};
    assign Overflow = t[32] != t[31];
    assign arr = t[31:0];
}
reference SUBO {
    wire [32:0] t = {ars[31], ars} - {art[31], art};
    assign Overflow = t[32] != t[31];
    assign arr = t[31:0];
}
reference MULO {
    wire [63:0] t = {{32{ars[31]}}, ars} * {{32{art[31]}}, art};
    assign Overflow = t[63:32] != {32{t[31]}};
    assign arr = t[31:0];
}
semantic { ADDO, SUBO } {
    wire [32:0] t = {ars[31], ars} + ({art[31], art} ^ {33{SUBO}}) + SUBO;
    assign Overflow = t[32] != t[31];
    assign arr = t[31:0];
}
semantic { DIVO } {
    assign DivZero = art == 32'b0;
    assign Overflow = (ars == 32'h80000000) & (art == 32'hffffffff);
    assign arr = ...;
}
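The priority resolution for the Overflow and DivZero declarations above can be sketched as follows. This is an illustrative model, not the generated hardware: the function names are hypothetical, and a depth-first topological sort is used here as one reasonable way to order exceptions from their lower-priority lists and to detect cycles.

```python
def priority_order(lower):
    """Order exceptions from highest to lowest priority.

    lower: exception name -> list of exceptions declared lower priority.
    Raises ValueError if the declarations contain a cycle.
    """
    order, state = [], {}

    def visit(exc):
        if state.get(exc) == "done":
            return
        if state.get(exc) == "active":
            raise ValueError("cycle in exception priorities")
        state[exc] = "active"
        for below in lower.get(exc, []):
            visit(below)
        state[exc] = "done"
        order.append(exc)           # appended after everything below it

    for exc in lower:
        visit(exc)
    return order[::-1]              # reverse postorder: highest priority first


def select_cause(signals, codes, order):
    """Return the code of the highest-priority asserted exception, or None."""
    for exc in order:
        if signals.get(exc, False):
            return codes[exc]
    return None


LOWER = {"Overflow": [], "DivZero": ["Overflow"]}
CODES = {"Overflow": 40, "DivZero": 41}

ORDER = priority_order(LOWER)
cause_both = select_cause({"Overflow": True, "DivZero": True}, CODES, ORDER)
cause_overflow = select_cause({"Overflow": True, "DivZero": False}, CODES, ORDER)

try:
    priority_order({"X": ["Y"], "Y": ["X"]})
    cycle_detected = False
except ValueError:
    cycle_detected = True
```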
FIG. 10 shows an arrangement in which all TIE exceptions have a single fixed priority relative to all core exceptions. A straightforward extension would allow the TIE exception statement to refer explicitly to various core exceptions. The TIE compiler would then be able to generate a priority encoder that combines TIE and core exceptions.

Reference Semantics
Systems such as those described in the Killian et al. and Wilson et al. applications have a single semantic definition of each instruction. This semantic definition was used for generating both the hardware and the software representing the instruction. Such systems allowed multiple instructions to be defined together, differentiated by the one-hot instruction decode input signals (e.g., so Add and Subtract instructions can share an adder). Use of this feature is necessary to generate efficient hardware. With the increasing complexity of instructions that can be defined with the preferred embodiment, an efficient set of implementation semantics becomes more difficult to read, write, verify and understand. They also become more tuned for pipelining and less abstract. This is because the description has to take into account pipeline effect and create signals where the pipeline registers can be moved.
For example, given a floating-point implementation in TIE, one would probably write different code for targeting a 2-cycle floating-point add operation as opposed to a 3 or 4-cycle floating-point add operation. It is less abstract because programmers often optimize code to generate fewer gates at the expense of clarity. For example, one might write

assign x = y * 3;

in reference semantics (quite clear), but

assign x = y + {y[30:0], 1'b0};

in implementation semantics because software development tools don't handle the multiply by a constant case as well as can be done manually, or the like.
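That the two forms of the multiply-by-three assignment agree for 32-bit values can be checked directly. The sketch below assumes 32-bit wrap-around arithmetic for both expressions; the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def times3_reference(y):
    """Reference form: x = y * 3, truncated to 32 bits."""
    return (y * 3) & MASK32


def times3_shift_add(y):
    """Implementation form: x = y + {y[30:0], 1'b0}, truncated to 32 bits."""
    doubled = (y << 1) & MASK32   # drops bit 31 of y and appends a zero bit
    return (y + doubled) & MASK32


SAMPLES = [0, 1, 2, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF, 0x12345678]
forms_agree = all(times3_reference(y) == times3_shift_add(y) for y in SAMPLES)
```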
As another example, to describe a multiply-accumulate instruction in a reference, it is as simple as

acc = a * b + acc;

But in the semantic description, one has to take into account that this instruction has to be implemented over two pipeline stages. A skilled hardware designer will know that a partial result of a * b needs to be computed using a carry-save-adder tree in the first stage, and the final result of adding the two partial results with acc is computed in the second stage. Finally, implementation semantics become slower when translated to simulation software because the correspondence to the native machine instruction is lost. Using the previous instruction, the reference description can be simulated using two instructions. Simulating the semantic description in this case would take hundreds of instructions.
For the above reasons the preferred embodiment allows the specification of two sets of semantics. One set is called the reference semantics. There is one reference semantic per instruction, and there is no sharing of semantics between instructions. This semantic definition is generally written for clarity to define the expected operation of the instruction. The second set of semantics, implementation semantics, is for hardware implementation. These semantics retain the features of prior art systems to allow hardware to be shared by multiple instructions and will generally be written at a lower level with gate-level synthesis in mind.
This can be illustrated with a simple TIE example that defines three instructions ADD, SUB and NEG as follows:
    iclass rrr {ADD, SUB} {out arr, in ars, in art}
    iclass rr {NEG} {out arr, in ars}
    reference ADD {
        assign arr = ars + art;
    }
    reference SUB {
        assign arr = ars - art;
    }
    reference NEG {
        assign arr = -ars;
    }
    semantic alu {ADD, SUB, NEG} {
        wire [31:0] l, r;
        assign l = SUB ? ~art : NEG ? ~ars : art;
        assign c = (SUB | NEG) ? 1 : 0;
        assign r = NEG ? 0 : ars;
        assign arr = l + r + c;
    }
The reference descriptions are simple and direct. The semantic description, however, has to concern itself with the implementation efficiency, specifically in this case to share the adders required by the three instructions. To do this, it relies on the mathematical identity that subtracting a number is the same as adding the bit-wise complemented number and a constant of 1. Reference semantics also allow an instruction set to be defined once, via the reference semantics, and then implemented multiple times with different sets of implementation semantics. Having a single ISA definition with multiple implementations is common practice in the industry, though usually the reference semantics are defined only in the ISA documentation instead of formally. The preferred embodiment reverses this typical procedure and defines the reference semantics formally and derives the documentation from the TIE specification, rather than vice versa.
Having separate reference and implementation semantics creates a need to verify their equivalence. In prior art systems, with the reference semantics in documentation, equivalence is checked by a human reading the documentation and writing tests to verify equivalence. This procedure is time consuming, and with the reference semantics specified in a precise language, it is possible to use logic equivalence tools to compare the reference semantics to the implementation semantics. The preferred embodiment automates this process by generating the necessary inputs to equivalence checking tools in two different ways, one for checking the equivalence of reference and implementation semantics for a particular instruction and one for checking that the entire circuit implemented using reference semantics is equivalent to that implemented using implementation semantics. The first method helps to debug the implementation semantic descriptions. The second method verifies the design as a whole including not only the logic specified by the semantics but also the glue logic for combining all the semantics.
The circuits generated from reference and implementation semantics are in general not equivalent. For a given instruction, only a subset of output signals will be set. For the rest of the output signals, the reference and implementation semantics may choose to assign different values based on cost criteria or ease of description because they are logically "don't cares", i.e., they are unused. The preferred embodiment solves this problem by creating additional logic such that the output signals produced by a particular instruction are unchanged and the rest of the output signals are forced to a particular logic value such as 0, as illustrated in FIG. 11. This Figure shows that each output signal x generated by the reference description (x_ref) and each generated by the semantic description (x_impl) is ANDed with another signal ignore_x such that when x is not part of an instruction output, it is forced to 0, thereby avoiding false negative results from the equivalence checking tools. From the iclass statement, we know the set of instructions which set x; therefore, ignore_x is simply the logical OR of instructions not setting x.
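The masking of don't-care outputs can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the table SETS, the instruction name WRACC, the output names and the function names are hypothetical, and the sketch assumes that an output is kept when the active instruction sets it and forced to 0 otherwise (i.e., it gates each output with the complement of the ignore signal).

```python
# Hypothetical sketch: mask don't-care outputs before comparing two descriptions.

# Outputs set by each instruction, as would be derived from iclass statements.
# The instruction and output names here are made up for illustration.
SETS = {"ADD": {"arr"}, "SUB": {"arr"}, "WRACC": {"accum"}}

def ignore(output, active):
    """Return 1 when the active instruction does not set `output`."""
    return 0 if output in SETS[active] else 1

def masked(value, output, active):
    """Force an output to 0 when it is a don't-care for the active instruction."""
    return 0 if ignore(output, active) else value

def equivalent(ref_outputs, impl_outputs, active):
    """Compare two output dictionaries after masking the don't-care outputs."""
    return all(
        masked(ref_outputs[o], o, active) == masked(impl_outputs[o], o, active)
        for o in ref_outputs
    )
```

With this masking, two descriptions that differ only in outputs the active instruction does not set compare as equal, while a difference in an output that the instruction does set is still reported.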
Built-in Modules
Certain commonly-used computations have no language-defined operators. However, using other language constructs is either very tedious to describe or very hard to implement efficiently. TIE provides the built-in operators shown in TABLE II below for some of these computations.
TABLE II (table image imgf000051_0001 not reproduced)
As an example, the following description shares an adder between ADD and SUB instructions:
    assign arr = TIEadd(ars, SUB ? ~art : art, SUB);
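The adder sharing relies on the identity stated earlier, namely that subtracting a number is the same as adding its bit-wise complement plus 1. A minimal Python sketch of that identity follows. It assumes a 32-bit datapath and assumes that the built-in adder takes two operands and a carry-in; the function names tie_add and shared_alu are hypothetical models, not part of TIE.

```python
MASK = 0xFFFFFFFF  # assumed 32-bit datapath

def tie_add(a, b, cin):
    """Model of a 32-bit adder with carry-in (assumed behaviour of the built-in adder)."""
    return (a + b + cin) & MASK

def shared_alu(ars, art, sub):
    """One adder serves ADD and SUB: SUB adds the complement of art with carry-in 1."""
    operand = (~art & MASK) if sub else art
    return tie_add(ars, operand, 1 if sub else 0)
```

For instance, shared_alu(7, 5, False) models ADD and shared_alu(7, 5, True) models SUB on the same adder; the decode signal selects both the second operand and the carry-in.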
The following semantic description adds four numbers using a carry-save adder (CSA) array followed by a full adder:
    wire [31:0] s1, c1, s2, c2;
    assign {s1, c1} = TIEcsa(d1, d2, d3);
    assign {s2, c2} = TIEcsa(c1 << 1, s1, d4);
    assign sum = (c2 << 1) + s2;
The advantage of using built-in modules such as these is that the TIE compiler can recognize the built-in modules and use a module generator to derive more efficient implementations for them.
Documentation
The reference semantics also are one important element of the instruction set documentation. A typical instruction set reference manual, an exemplary page of which is shown in FIG. 12, can include for each instruction its machine code format; its package; its assembler syntax; a synopsis (a one-line text description of the instruction); a full text description of the instruction; and a more precise operational definition of the instruction, as well as additional information such as assembler notes and exceptions associated with the instruction. All of the information necessary to generate the machine code format is already found in the TIE specification since it contains the opcode bits and the operand fields. Similarly, the assembler syntax is derived from the mnemonic and operand names. The TIE reference semantics become the precise definition. Only the synopsis and text description are missing. The preferred embodiment therefore adds constructs to TIE to allow the instruction set designer to specify the synopsis and text description. The TIE package specification has the format
    package <pname> <string>
    endpackage <pname>
The package name <pname> is associated with all instructions defined between package and endpackage. Packages have other uses than for documentation, as described below. The <string> parameter gives the name of the package for documentation purposes (it may have spaces).
The TIE synopsis specification has the format
    synopsis <iname> <string>
where <string> is a short (approximately half a line) description of the instruction. No formatting control is required in this text. This text is typically used for headings in books and additional material in instruction lists.
The TIE description specification has the format
    description <iname> <string>
where <string> is a long (usually several paragraphs) string containing text describing the operation of the instruction in English or another natural language. There is a need for text formatting commands in this text. The preferred embodiment implements an HTML-like language (the specification for HTML may be found, e.g., at http://www.w3.org/TR/REC-html40). In addition, two optional documentation strings are supported:
    assembly_note <iname> <string>
    implementation_note <iname> <string>
These optional specifications provide additional per-instruction text.
Like HTML, two sorts of formatting controls are supported: elements and character entities. The intent is to specify the attributes of the data and not its exact appearance. The data will be rendered suitably for the output medium based on its attributes. The character entity &<name>; specifies characters not available in ASCII or that should use special rendering. Elements represent HTML-defined entities such as paragraphs, lists, code examples, etc. Quoting from the HTML 4.0 specification, "[e]ach element type declaration describes three parts: a start tag, content, and an end tag. The element's name appears in the start tag (written <ELEMENT-NAME>) and the end tag (written </ELEMENT-NAME>); note the slash before the element name in the end tag."
In other words, <ELEMENT-NAME>DOCUMENTATION</ELEMENT-NAME> specifies a format to be applied to DOCUMENTATION. Unlike HTML, the end tag (</ELEMENT-NAME>) is never optional. There are two kinds of tags: block and inline. Block tags specify paragraph-like structure and inline tags are used to specify the formatting of text within those paragraphs. Inline tags may be nested. Block tags may not be nested, except for LI within UL.
These constructs are easily translated to HTML to create HTML documentation as part of a program such as the one in Appendix C that assembles an HTML page for each instruction, and an index of instructions. Such HTML documentation can be used to establish an on-line reference manual for processor users. A program for doing this in the preferred embodiment is written in the Perl programming language and works by creating an index.html file with an HTML table of two columns, one for the mnemonics and one for the synopsis text string. The rows of the table are filled by processing the instructions in sorted order. The instruction mnemonics are HTML-linked to a page created for each instruction.
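The index generation just described can be sketched as follows. The sketch is written in Python rather than Perl and is not the program of Appendix C; the function name build_index, the dictionary input and the per-instruction file name pattern (mnemonic followed by .html) are assumptions made only for illustration.

```python
from html import escape

def build_index(instructions):
    """Return index.html text with one table row per instruction.

    `instructions` maps a mnemonic to its synopsis string. Rows are emitted
    in sorted mnemonic order, and each mnemonic links to its own page
    (assumed here to be named <mnemonic>.html).
    """
    rows = []
    for mnemonic in sorted(instructions):
        name = escape(mnemonic)
        synopsis = escape(instructions[mnemonic])
        rows.append(
            f'<tr><td><a href="{name}.html">{name}</a></td>'
            f"<td>{synopsis}</td></tr>"
        )
    return "<html><body><table>\n" + "\n".join(rows) + "\n</table></body></html>\n"
```

Escaping the synopsis text keeps characters such as & and < from being interpreted as markup in the generated table.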
The per-instruction page begins with an HTML level-1 heading ("H1") giving the mnemonic and synopsis. Next, various sections are introduced by fixed names in HTML level-2 headings ("H2"). The first section, labeled "Instruction Word", gives the machine code format represented by an HTML table with one column per bit. Opcode bits ('0' or '1') are inserted in the corresponding table cells. Operand fields are filled in with the field name. Fields that span multiple adjacent bits use the COLSPAN feature of HTML tables to avoid repetition. The bits of the machine code box are numbered using a table row above, and the field widths are given in a row below.
The second section, labeled "Package", gives the TIE package name that defines the instruction. A simple hash is used to translate the package name from an identifier to the documentation string. The package name itself is output inside of an HTML paragraph block-element ("P").
The third section, labeled "Assembler Syntax", gives the assembly language format used to code the instruction. This consists of the instruction mnemonic, a space, and then the operand names separated by commas. Register operand names are formed by concatenating the short name of the register file with the field name. Immediate operand names are just the immediate name from TIE. The assembler syntax is output inside of an HTML paragraph block-level element ("P") using an HTML code inline-element ("CODE"). The code inline-element renders the text in a fixed width font that resembles the way programming language code is usually rendered.
The fourth section, labeled "Description", contains the text description, translated from TIE to HTML. Because TIE's formatting codes are similar to HTML's, this translation is fairly simple. The primary need is to translate the INSTREF element into an HTML link to the named instruction. An optional fifth section, labeled "Assembler Note", contains that text translated from TIE to HTML.
The sixth section, labeled "Exceptions", contains a list of exceptions that this instruction can raise. Load and Store instructions automatically have the LoadStoreError exception added to the list by the TIE compiler. Other exceptions are listed if the corresponding exception signal is listed in the signal list section of the instruction's iclass. Exceptions are listed in priority order (the result of the topological sort described above).
An optional seventh section, labeled "Implementation Notes", contains that text translated from TIE to HTML.
It is possible to also copy the test case list from the TIE specification as described below into the documentation since this is sometimes useful to the reader.
An example of the documentation for a processor instruction is given below.
<html>
<head>
<title>GFADD8 - Galois Field 8-bit Add</title>
</head>
<body>
<h1>GFADD8 &#8212; Galois Field 8-bit Add</h1>
<h2>Instruction Word</h2>
<table frame="void" rules="groups" cellspacing=0 cellpadding=0>
<colgroup colspan=8><col width=28><col width=28><col width=28><col width=28><col width=28><col width=28><col width=28><col width=28>
<colgroup colspan=4><col width=28><col width=28><col width=28><col width=28>
<colgroup colspan=4><col width=28><col width=28><col width=28><col width=28>
<colgroup colspan=4><col width=28><col width=28><col width=28><col width=28>
<colgroup colspan=4><col width=28><col width=28><col width=28><col width=28>
<thead>
<tr>
<td width=28 align="center"><small>23</small></td>
<td width=28 align="center"></td>
<td width=28 align="center"></td>
<td width=28 align="center"></td>
<td width=28 align="center"></td>
<td width=28 align="center"></td>
<td width=28 align="center"></td>
<td width=28 align="center"><small>16</small></td>
<td width=28 align="center"><small>15</small></td>
<td width=28 align="center"></td>
<td width=28 align="center"></td>
<td width=28 align="center"><small>12</small></td>
<td width=28 align="center"><small>11</small></td>
<td width=28 align="center"></td>
<td width=28 align="center"></td>
<td width=28 align="center"><small>8</small></td>
<td width=28 align="center"><small>7</small></td>
<td width=28 align="center"></td>
<td width=28 align="center"></td>
<td width=28 align="center"><small>4</small></td>
<td width=28 align="center"><small>3</small></td>
<td width=28 align="center"></td>
<td width=28 align="center"></td>
<td width=28 align="center"><small>0</small></td>
</tr>
</thead>
<tbody>
<tr>
<td width=28 align="center" bgcolor="#FFF0F5">0</td>
<td width=28 align="center" bgcolor="#FFF0F5">0</td>
<td width=28 align="center" bgcolor="#FFF0F5">0</td>
<td width=28 align="center" bgcolor="#FFF0F5">0</td>
<td width=28 align="center" bgcolor="#FFF0F5">0</td>
<td width=28 align="center" bgcolor="#FFF0F5">1</td>
<td width=28 align="center" bgcolor="#FFF0F5">1</td>
<td width=28 align="center" bgcolor="#FFF0F5">0</td>
<td colspan=4 width=112 align="center" bgcolor="#FFE4E1">r</td>
<td colspan=4 width=112 align="center" bgcolor="#FFE4E1">s</td>
<td colspan=4 width=112 align="center" bgcolor="#FFE4E1">t</td>
<td width=28 align="center" bgcolor="#FFF0F5">0</td>
<td width=28 align="center" bgcolor="#FFF0F5">0</td>
<td width=28 align="center" bgcolor="#FFF0F5">0</td>
<td width=28 align="center" bgcolor="#FFF0F5">0</td>
</tr>
</tbody>
<tfoot>
<tr>
<td colspan=8 width=224 align="center"><small>8</small></td>
<td colspan=4 width=112 align="center"><small>4</small></td>
<td colspan=4 width=112 align="center"><small>4</small></td>
<td colspan=4 width=112 align="center"><small>4</small></td>
<td colspan=4 width=112 align="center"><small>4</small></td>
</tr>
</tfoot>
</table>
<h2>Package</h2>
<p>
</p>
<h2>Assembler Syntax</h2>
<p>
<code>GFADD8 gr, gs, gt</code>
</p>
<h2>Description</h2>
<P><CODE>GFADD8</CODE> performs a 8-bit Galois Field addition of the contents of GF registers <CODE>gs</CODE> and <CODE>gt</CODE> and writes the result to GF register <CODE>gr</CODE>.</P>
<h2>Operation</h2>
<pre>
gr = gs ^ gt;
</pre>
<h2>Exceptions</h2>
<p>
None
</p>
</body>
</html>
Although HTML has been used as the documentation formatting language in the preferred embodiment, those skilled in the art will recognize that other equivalent specification languages, such as the Adobe FrameMaker MIF format, may also be used.
Sub-fields
A development that makes embodiments of the present invention less sensitive to processor configuration options which change program execution characteristics is the ability to define a field as a sub-field of another field. This is in contrast to prior configurable processor systems which restricted the definition of fields to specified parts of instruction words, and did not permit them to be defined as parts of other fields. The ability to define fields as parts of other fields allows the software to in part be independent of the endianness of the configured processor.
For example, in prior systems a new field t10 that corresponds to the first two bits of the t field can only be defined with either of the following TIE statements:
    field t10 inst[5:4] /* for little endian memory order */
or
    field t10 inst[15:14] /* for big endian memory order */
Under this arrangement it is not possible to define t10 independent of the memory order. By permitting the use of sub-fields, the present invention allows t10 to be defined as follows:
    field t10 t[1:0]
Since t is defined by the processor core to be inst[7:4] for little endian and inst[17:14] for big endian, t10 is now independent of the memory order.
Test Cases
There are two aspects of the verification of user-specified TIE. The first is to ensure the correctness of the interface between core and TIE blocks and the user-defined states and register files. The second is to verify the correctness of translation of the user semantics into hardware, in other words, the TIE compiler. The first does not depend on the TIE instruction semantics, and it can be derived from the properties of the TIE specification.
It is not possible to write any directed predetermined tests or diagnostics for the user-specified TIE. This problem is approached by deriving the tests from the user TIE specification at the same time the hardware and software for the TIE is generated. The TIE compiler generates the ISA description for the user instructions. The diagnostic generator for TIE reads the ISA description of the TIE instructions. This also includes knowledge about the user-specified states and register files. This information is used by the generator to create some meaningful set of diagnostics for the user TIE.
The reference semantics provide a method of verification for the implementation semantics. The reference semantics are verified by using them in the target application. As described in the Killian et al. and Wilson et al. applications, the application is modified by the designer to use the new instructions via intrinsics. The modified application and the instruction definitions are tested together either in the simulator or natively. Native execution is facilitated by the ability of the TIE compiler (as in the prior art) to create conventional programming language (e.g., C) definitions of the intrinsics as functions. The use in the target application is usually the best test of instruction definitions.
The correctness of the TIE compiler generating C code is checked by this process, but the translation of TIE code to HDL is not, unless the application is also run in the HDL simulator. However, HDL simulators are generally too slow to do this for many applications. It is therefore desirable to have some other way to test the correctness of the TIE compiler's translation of the input semantics to HDL.
Also, it may be that the designer is unsure if the application covers all of the cases that must be handled by the instruction. This is important if the application may change after the processor is generated, or if new applications will use this processor. In this case, it is desirable to have other ways to test the instruction. In prior art systems, the instructions of a processor are usually tested by the running of hand-written diagnostics that execute the instruction with a selected set of source operand values and check the result operands for the expected value. The preferred embodiment automates this process by exploiting the additional information that is available from the TIE specification.
The TIE iclass specification lists all of the inputs and outputs of each instruction, whether register file operands, immediates, or processor state registers. The TIE construct
    test <iname> {
        in { <oname> => <value>, ... }
        out { <oname> => <value>, ... }
        in { <oname> => <value>, ... }
        out { <oname> => <value>, ... }
    }
provides a list of source operand values and expected results for instruction <iname>. Here <oname> is the name of an operand or state register, and <value> is the corresponding input value (for in or inout operands or registers in the test in list) or expected value (for out or inout operands, registers, or exception signals in the test out list).
The TIE compiler produces a test program in a conventional programming language (e.g., C) that sets the in and inout processor registers to the values in the test in list using the WUR intrinsic and the number declared with the TIE user_register construct described in the Wilson et al. application. It then sets up the in and inout register file operands using the intrinsics specified by the proto declaration for loading registers. Operands in core register files (e.g., the AR's in the preferred embodiment) use built-in language types. Next, the TIE compiler invokes the intrinsic with the operands listed in the order specified by the iclass. Next, the out and inout operands specified in the test out list are read and compared to the given expected values. Finally, the processor registers in the test out list are read using the RUR intrinsic and the register number for the user_register construct, and these values are compared to the given values.
This automatically generated programming language diagnostic may be run either in the instruction set simulator, or on the hardware RTL model or natively using the intrinsic-emulating functions generated by the TIE compiler by translating to the target programming language. As an example, the specification
    test GFADD8 {
        in { gs => 8'xFF, gt => 8'xA5 }
        out { gr => 8'x5A }
    }
    test GFMULX8 {
        in { gs => 8'xFF, gfmod => 8'xA5 }
        out { gr => 8'x5B }
    }
generates the C diagnostic
    unsigned char GFADD8_0[1] = { 255 };
    unsigned char GFADD8_1[1] = { 165 };
    unsigned char GFADD8_2[1] = { 90 };
    unsigned char GFMULX8_0[1] = { 255 };
    unsigned char GFMULX8_1[1] = { 165 };
    unsigned char GFMULX8_2[1] = { 91 };
    int main (int argc, char *argv[])
    {
        int i;
        for (i = 0; i < 1; i += 1) {
            gf gr;
            gf gs;
            gf gt;
            unsigned char t0;
            LGF8_I (gs, &GFADD8_0[i], 0);
            LGF8_I (gt, &GFADD8_1[i], 0);
            GFADD8 (gr, gs, gt);
            SGF8_I (gr, &t0, 0);
            if (t0 != GFADD8_2[i])
                fail();
        }
        for (i = 0; i < 1; i += 1) {
            gf gr;
            gf gs;
            unsigned char t0;
            LGF8_I (gs, &GFMULX8_0[i], 0);
            WUR (GFMULX8_1[i], 0);
            GFMULX8 (gr, gs);
            SGF8_I (gr, &t0, 0);
            if (t0 != GFMULX8_2[i])
                fail();
        }
        return 0;
    }
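The first step of this translation, turning the operand values of a test specification into the data arrays of the diagnostic, can be sketched in Python. This is an illustrative sketch only and not the TIE compiler's code: the function names are hypothetical, the value syntax is taken to be a bit width, the characters 'x and hexadecimal digits as in the example above, and every operand is assumed to fit in an unsigned char.

```python
def parse_value(text):
    """Convert a sized hex literal such as 8'xFF to an integer."""
    width, digits = text.split("'x")
    value = int(digits, 16)
    if value >= (1 << int(width)):
        raise ValueError(f"{text} does not fit in {width} bits")
    return value

def data_arrays(iname, inputs, outputs):
    """Emit one C array per operand value: inputs first, then expected outputs.

    `inputs` and `outputs` map operand names to sized hex literals, in the
    order in which they appear in the test specification.
    """
    values = list(inputs.values()) + list(outputs.values())
    return [
        f"unsigned char {iname}_{index}[1] = {{ {parse_value(text)} }};"
        for index, text in enumerate(values)
    ]
```

Calling data_arrays("GFADD8", {"gs": "8'xFF", "gt": "8'xA5"}, {"gr": "8'x5A"}) yields the three GFADD8 array declarations of the example.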
Automatic Sampling of Test Vectors to Produce Test Cases
In cases where running the application is sufficient for testing the correctness of the input instruction semantics, it is still desirable to have test cases for running in the HDL simulator to test the TIE translation of the input semantics. The HDL simulator is in many cases too slow to run the application. It is therefore desirable to have a method for extracting tests from the application running natively or in the instruction set simulator.
The TIE compiler therefore should have an option to augment its translation of the input semantics to the application programming language with code that writes the input and output operands of instructions to a file. This file can then be post-processed by eliminating duplicates and then using statistical sampling to extract a number of test cases that is reasonable to simulate in the HDL simulator. These records can then be converted to the TIE test construct described above so that its implementation may be leveraged for the rest of the process. The motivation behind using this methodology of generating architectural and microarchitectural tests is to provide a systematic verification process for implementation of the user TIE. This is very important because the user's application may not be sufficient for testing the microarchitecture of the TIE implementation. To generate such diagnostics from the TIE description, we employ a method that derives the necessary information from the ISA description and pipeline information produced by the TIE compiler. This scheme is described below.
ISA Description of the TIE Instructions
In order to be able to configure the processor core according to the user's requirements a configuration is used. A configuration is essentially a list of parts and attributes of the processor core that can be customized by the user through a web-based interface. These processor attributes are referred to as configuration parameters. The complete list of the configuration parameters along with their default values and the ranges the values can assume define the configuration space of the processor core. A concrete instantiation of the processor core, that is, an instance of the core in which all the configuration parameters have been assigned concrete values, is a core configuration. Currently, both the configuration space and concrete core configurations are represented as text files that list the configuration parameters and their values. Even though a flat list of all the configuration parameters and their values enumerated in a text file has the advantage of being easily human readable, it complicates the process of configuring the individual pieces of hardware and software. For that reason, a set of tools has been developed that reads the configuration information and creates an object-oriented representation of the various parts of the processor and the values of the configuration parameters. The tools and the representation of configurations are collectively known as the configuration environment or configuration database.
During the configuration of the software and hardware, tpp provides a handle to the configuration environment enabling the developer to programmatically access the configuration information, as well as easily compute parts of the source code. In addition, since the computation is performed in the configuration environment and, thus, it is shared across all configured sources, developing configurable source code is simplified.
A PERL library for describing the ISA has been developed. For TIE, the TIE compiler is run to create the PERL objects for the user-defined instructions and this is added to the core ISA. From there on, all the verification tools query these PERL objects to get the ISA and pipeline information of the user-defined TIE.
The following example illustrates how this is done. Starting with a simple TIE description,
    opcode acc op2=0 CUST0
    state accum 32
    user_register 100 accum
    iclass acc {acc} {in ars, in art} {inout accum}
    reference acc {
        assign accum = accum + ars + art;
    }
The TIE compiler generates the following information about the TIE user state and the semantic of the instruction using it:
    State accum mapped to user register: 100, bits 31:0
    opcode: acc, package: UserDefined, size: 20,
    Register Operands:
        Name: as: input, regfile: AR, shortname: a, size: 32 bits, entries: 64
        Name: at: input, regfile: AR, shortname: a, size: 32 bits, entries: 64
From the above information, it is possible to generate the assembly code for the TIE instruction acc. It is known that the instruction has two register operands, both of type AR, based on which it is possible to do some random register allocation, or even better, some intelligent register allocation, since the output and input fields are known. It is therefore possible to automatically generate assembly code for this instruction, such as
    acc $a7, $a13
where a7 and a13 are the s and t fields of the instruction acc generated by a register allocation algorithm that looks at the regfile definition for AR. Some more examples of the ISA description of the TIE instructions:
    opcode: i128l, package: UserDefined, size: 24, load
    Register Operands:
        Name: i128t: output, regfile: i128, shortname: i128, size: 128 bits, entries: 16
        Name: as: input, regfile: AR, shortname: a, size: 32 bits, entries: 64
    Immediate Operands:
        Name: offset128: bits 8, Table: [0 16 32 48 ....]

    opcode: wur0, package: UserDefined, size: 24,
    Register Operands:
        Name: at: input, regfile: AR, shortname: a, size: 32 bits, entries: 64

    opcode: i128s, package: UserDefined, size: 24, store
    Register Operands:
        Name: i128t: input, regfile: i128, shortname: i128, size: 128 bits, entries: 16
        Name: as: input, regfile: AR, shortname: a, size: 32 bits, entries: 64
    Immediate Operands:
        Name: offset128: bits 8, shift 0, Table: [0 16 32 ....]
Since it isn't possible to derive enough information about the expected result of the instruction, it is not possible to check the correctness of the TIE semantics. For example, it is not possible to check if the result of the acc instruction is correct in the test. However, if the hardware produced the wrong result in the state accum, this would be detected by the cosimulation mechanism that compares all user state and register files between the RTL and ISS at all instruction boundaries, as will be described in greater detail in another section. The following sections use some PERL-like pseudo code to express algorithms. The diagnostic generators are mostly PERL-based programs.
The algorithm used by the diagnostic generator for generating a correct TIE instruction is as follows:
subroutine gen_tie_instr ( tie_inst, address_reg, index_reg )
{
    // address_reg is a core register
    // containing a valid address in case
    // the TIE instruction does a load/store,
    // same for the index register, if the
    // load/store is an indexed load
    foreach operand ( tie_inst->regoperands() ) {
        fid = operand->field();
        reg = &register_allocate(tie_inst, operand);
        if ( (isLoad(tie_inst) || isStore(tie_inst)) && operand->name() eq 'as' ) {
            // override with valid address
            reg = address_reg;
        }
        if ( (isLoad(tie_inst) || isStore(tie_inst)) && operand->name() eq 'at' ) {
            reg = index_reg;
        }
        push( operand_list, reg );
    }
    foreach operand ( tie_inst->immoperands() ) {
        // specification of immediate operand
        // as a table of values or a range
        range = operand->range();
        table = operand->table();
        legal = tie_inst->legals(operand->field()->name());
        if ( legal ) {
            imm = legal[ random index ];
        } elsif ( range ) {
            imm = random value between range.lo and range.hi;
        } elsif ( table ) {
            imm = table[ random index ];
        }
        push( operand_list, imm );
    }
}

subroutine register_allocate ( tie_inst, register_operand )
{
    name = register_operand->shortname();
    numentries = register_operand->entries();
    legalrange = tie_inst->legals(register_operand->field()->name());
    if ( legalrange ) {
        register_num = legalrange[ random index ];
    } else {
        register_num = random(0, numentries-1);
    }
    return concatenate( name, register_num );
}
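For readers who prefer executable code, the operand selection in the pseudo code above can be sketched in Python. This is a simplified sketch and not the diagnostic generator itself: the function names and argument layout are hypothetical, the ISA objects are replaced by plain arguments, and the load/store override of the address and index registers is omitted.

```python
import random

def register_allocate(shortname, entries, legal=None, rng=random):
    """Pick a register name: from the legal list if one exists, else any entry."""
    number = rng.choice(legal) if legal else rng.randrange(entries)
    return f"{shortname}{number}"

def pick_immediate(legal=None, value_range=None, table=None, rng=random):
    """Pick an immediate with the same priority as the pseudo code:
    legal values first, then a range, then a table of values."""
    if legal:
        return rng.choice(legal)
    if value_range:
        lo, hi = value_range
        return rng.randint(lo, hi)  # inclusive on both ends
    if table:
        return rng.choice(table)
    raise ValueError("no source of immediate values")
```

Passing a seeded random.Random instance as rng makes a generated diagnostic reproducible, which is convenient when a failing test has to be regenerated.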
Also, before it is possible to start executing TIE instructions, it is necessary to initialize the TIE state and register files. This is done in the following way:
subroutine initTieState ( address_reg, data_reg )
{
    // Iterate over all state and get the value
    // for each user register that
    // the states are mapped to
    states = ( tie->states(), map($_->states(), tie->coprocessors()) );
    foreach state ( states ) {
        UserRegMask{state->userReg} = getMask;
    }
    foreach ureg ( keys of the hashtable UserRegMask ) {
        mask the data register with the mask value
        do a WUR to the ureg
    }
    // Initialize register files by loading from a
    // valid memory location
    regfiles = ( tie->regfiles(), map($_->regfiles(), tie->coprocessors()) );
    foreach regf ( regfiles ) {
        for ( i=0; i < regf->entries(); i++ ) {
            generate the load instruction or instruction sequence using the
            address_reg that has the valid address to load index i of
            register file regf.
        }
    }
}
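The state initialization step above can be sketched in Python as follows. The pseudo code does not spell out how getMask computes its value, so this sketch assumes that each state occupies a contiguous bit range of a user register (as in the "bits 31:0" mapping shown earlier) and that the mask for a user register is the OR of the bit ranges of all states mapped to it. The function names and the tuple layout are hypothetical.

```python
def user_register_masks(states):
    """Build one mask per user register.

    `states` is an iterable of (user_register, high_bit, low_bit) tuples,
    one per state. The mask of a user register covers every bit that is
    occupied by some state mapped to it.
    """
    masks = {}
    for ureg, hi, lo in states:
        field = ((1 << (hi - lo + 1)) - 1) << lo
        masks[ureg] = masks.get(ureg, 0) | field
    return masks

def init_sequence(states, data):
    """Return the (user_register, value) pairs to be written with WUR."""
    masks = user_register_masks(states)
    return [(ureg, data & mask) for ureg, mask in sorted(masks.items())]
```

Masking the data value keeps the write from setting bits of a user register that no state occupies.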
Pipeline Information for TIE
To generate microarchitectural diagnostics that test the bypass and interlock logic in TIE, pipeline information of the TIE instructions is needed. This provides knowledge of the stages at which resources such as registers and states are read and written by a TIE instruction. Once again, the TIE compiler provides this information and it is represented in PERL objects and used by the verification tools. Taking the following example with a user-defined register file and a set of instructions which simply move data at different stages of the pipeline, note the convention 1: E stage, 2: M stage, 3: W stage:
    regfile i128 128 16 i128
    operand i128s s {i128[s]}
    operand i128t t {i128[t]}
    operand i128r r {i128[r]}
    opcode I128L r=0 LSCI
    opcode I128S r=1 LSCI
    opcode I128AND op2=0 CUST0
    schedule load {I128L} { def i128t 2; }
This translates to the following in the PERL database:
Regfile i128 width 128 entries 16 instructions:
    Writes:
        stage 2: Inst i128and: Field r
        stage 3: Inst i128l: Field t
    Reads:
        stage 1: Inst i128s: Field t
                 Inst i128and: Field s
                 Inst i128and: Field t
One can see how this information is used to generate diagnostics in the next section.
Microarchitectural Tests for TIE
A goal of this section is to generate microarchitectural diagnostics for the TIE logic based on knowledge of the implementation of the interface between TIE and the core, as well as that of the TIE state and register file, if any. The ISA and pipeline description of the TIE itself are used; however, as mentioned earlier, the "correctness" of the implementation of a TIE instruction is not verified in the test directly.
A set of MVP diagnostics is generated to test the following aspects of the implementation:
-- control logic in the core/TIE interface; and
-- implementation of user state and register files, including loads/stores and bypass and interlock logic.
Control Signals Between Core and TIE
Exceptions, interrupts and replay signals are tested by generating tests where every user instruction is killed by a control flow change in the core (e.g., a branch), by an exception, and by a replay signal. The instruction should be killed in all stages of its execution, right up to the completion stage.
The algorithm to generate these tests simply iterates over all TIE opcodes in the ISA description generated by the TIE compiler and constructs each of the following cases:
Case a) TIE instruction killed by a change of flow:
foreach tie_opcode ( tie_opcode_list )
    branch instr ( branch taken )
    tie_opcode
end // foreach
Case b) TIE instruction killed by an exception:
foreach tie_opcode ( tie_opcode_list )
    for ( stage=0; stage < completion stage of tie_opcode; stage++ )
        syscall or break instr (that generates an exception)
        <stage> number of nops
        tie_opcode
    end // for
end // foreach
As can be seen, the number of no-ops between the instruction generating the exception and the TIE instruction controls the stage of TIE instruction execution at which it gets killed.
Case c) TIE instruction replayed in the pipeline:
foreach tie_opcode ( tie_opcode_list )
    isync instr
    tie_opcode
end
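The three cases can be enumerated mechanically. The following Python sketch mirrors the loops above; the function name `kill_tests`, the test labels, and the placeholder mnemonics `branch_taken`, `syscall`, `nop` and `isync` are assumptions used only for illustration.

```python
def kill_tests(opcodes, completion_stage):
    """Yield (label, instruction list) for cases a, b and c.

    `completion_stage` maps each opcode to its completion stage number.
    """
    for op in opcodes:
        # Case a: a taken branch precedes the TIE instruction.
        yield (f"{op}_flow", ["branch_taken", op])
        # Case b: an excepting instruction, then `stage` nops, then the opcode.
        for stage in range(completion_stage[op]):
            yield (f"{op}_exc_{stage}", ["syscall"] + ["nop"] * stage + [op])
        # Case c: isync followed by the opcode (the replay case in the text).
        yield (f"{op}_replay", ["isync", op])
```

For an opcode whose completion stage is 2, this yields four sequences: one change-of-flow test, two exception tests (with zero and one nop), and one replay test.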
Bypass Logic For User State And Register File:
These tests exercise the bypass logic for the TIE state and register file by "pairing" instructions that write and read them. The test ensures that there are no stalls on account of instruction and data fetch and then (if the configuration permits) checks the cycle count register before and after the instruction sequence to look for any unnecessary stalls, flagging them as errors. The algorithm is as follows: Generate a list of [instr, field] for all read/write stages to a particular register file or state. Determine the maximum completion stage for this state/regfile. Then pair up the write and read instructions, varying the number of nops in between up to the maximum completion stage.
foreach regf ( tie->regfiles() ) {
    // list of the stages at which regf is read
    // possibly (1,2)
    readstages = getReadStages( regf );
    // list of stages at which regf is written
    // possibly (2,3)
    writestages = getDefStages( regf );
    foreach wstage ( writestages ) {
        writelist = Generate list of [instr, field] pairs that write regf in stage wstage
        max_nops = maximum_completion_stage for regf - wstage;
        foreach rstage ( readstages ) {
            readlist = Generate list of [instr, field] pairs that read regf in stage rstage
            foreach write_instr ( writelist ) {
                foreach read_instr ( readlist ) {
                    for ( i=0; i < max_nops; i++ ) {
                        stalls = (wstage-rstage-1) if ( wstage > rstage ) else 0;
                        ccount_before = read cycle count
                        write_instr
                        i nops
                        read_instr
                        ccount_after = read cycle count
                        if ( ( ccount_after - ccount_before ) != ( stalls + nops + 3 ) )
                            ERROR;
                    }
                }
            }
        }
    }
}
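The cycle-count check in the pseudocode reduces to a small formula. The Python sketch below restates it; the function names and the `overhead` parameter (the constant 3 in the pseudocode) are illustrative assumptions.

```python
def expected_stalls(wstage, rstage):
    """Stall count from the pseudocode: wstage - rstage - 1 when wstage > rstage."""
    return wstage - rstage - 1 if wstage > rstage else 0

def expected_cycles(wstage, rstage, nops, overhead=3):
    """Cycle-count difference the test compares against: stalls + nops + overhead."""
    return expected_stalls(wstage, rstage) + nops + overhead
```

For a write in stage 3 read in stage 1 with no nops, `expected_cycles(3, 1, 0)` gives 4; for a write in stage 2 read in stage 1, it gives 3 with no nops and 4 with one nop.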
It is necessary to guarantee that there are no I$ and D$ misses by executing the instruction sequence twice. In the second iteration, a cycle count check is done. The expected number of cycles depends on the read/write stages and nops. Some example cases for the example above are:
# (i128l field t stage 3) -> (i128and Field s Stage 1),
# nops=0, stall 1 cycle
Test_11:
    rsr     $a3, 234        <-- read cycle count before
    i128l   $i1280, $a10, 0
    i128and $i1285, $i1280, $i12811
    rsr     $a4, 234        <-- cycle count after
    addi    $a3, $a3, 4
    beq     a4, a3, PASS_11
    j       FAIL
PASS_11:

# (i128and field r stage 2) -> (i128and Field s Stage 1),
# nops=0, stall 0 cycles
Test_12:
    rsr     $a3, 234
    i128and $i1280, $i1288, $i1284
    i128and $i1286, $i1280, $i1285
    rsr     $a4, 234
    addi    $a3, $a3, 3
    beq     a4, a3, PASS_12
    j       FAIL
PASS_12:

# (i128and field r stage 2) -> (i128and Field s Stage 1),
# nops=1, stall 0 cycles
Test_13:
    rsr     $a3, 234
    i128and $i1280, $i1288, $i1284
    nop.n
    i128and $i1286, $i1280, $i1285
    rsr     $a4, 234
    addi    $a3, $a3, 4
    beq     a4, a3, PASS_13
    j       FAIL
PASS_13:
Interlocks and hazards
This tests for correct stalls in the case of read-after-write, write-after-write and (possibly) write-after-read hazard cases.
The algorithm for the hazard cases is derived similarly to that of the bypass case described above. There are two instructions that write the same regfile in stages 2 and 3, followed by an instruction that reads it in stage 1. The third instruction stalls for the result of the second write.
# (Inst i128and r 2) ->
# (Inst i128l t 3) ->
# (Inst i128and s 1)
Test_1:
    rsr     $a3, 234
    i128and $i1280, $i1289, $i1281
    i128l   $i1280, $a5, 0
    i128and $i12815, $i1280, $i12813
    rsr     $a4, 234
    addi    $a3, $a3, 5
    beq     a4, a3, PASS_1
    j       FAIL
PASS_1:
Loads/Stores
Loads and stores to all register files are tested comprehensively for all aligned and misaligned addresses using the following algorithm:
foreach regf ( tie->regfiles() ) {
    PIFbytes = PIFWidth >> 3;    // bytes
    PIFwords = PIFbytes >> 2;    // words ( eg 4 for 128 bit )
    regfw = regf->size() >> 5;
    for ( k=0; k < PIFbytes; k++ ) {
        load_address = PIFWidth-aligned address + k;
        store_address = PIFWidth-aligned address + k;
        * initialize memory
        * store known data into load address
        * store a default value to the store address
        for ( i=0; i < PIFwords; i++ ) {
            * store data_word to load_address + i
            * store default_word to store_address + i
        }
        * do the load from load address
        * do the store to store address
        expected_result = expected_tie_load_result( load_address, data );
        for ( i=0; i < PIFwords; i++ ) {
            result = load a word from store_address + i
            if ( i < regfw ) {
                check result == expected_result
            } else {
                check result == default_word
            }
        }
    }
}
The expected result of the load depends on the load semantics, and although it can be determined for most cases, it may not be possible to do so for all possible semantics, in which case it is necessary to leave the checking to the state and memory compare.
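The address arithmetic in the load/store algorithm can be sketched in Python as below. The function name `load_store_plan` and the returned dictionary layout are assumptions for illustration; only the shifts and the per-byte offset loop come from the pseudocode.

```python
def load_store_plan(pif_width_bits, regfile_bits, aligned_base):
    """Compute the word counts and the byte-offset addresses to test."""
    pif_bytes = pif_width_bits >> 3          # interface width in bytes
    if aligned_base % pif_bytes:
        raise ValueError("base address must be PIF-width aligned")
    return {
        "pif_words": pif_bytes >> 2,         # 32-bit words per interface width
        "regf_words": regfile_bits >> 5,     # 32-bit words per register
        # One test address per byte offset k, covering aligned and misaligned cases.
        "addresses": [aligned_base + k for k in range(pif_bytes)],
    }
```

For a 128-bit interface and a 128-bit register file, this gives 4 words each and 16 test addresses starting at the aligned base.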
Data breakpoints are also tested for TIE load/store instructions in the case where the configuration supports data breakpoints. The details of how the data breakpoints work for TIE instructions can be found in the load/store architecture section. The diagnostics generated test the data breakpoints for all possible combinations of the data break address register, the control mask register and the virtual address for the load/store.
foreach regf ( tie->regfiles() ) {
    regfw = regf->size() >> 5;
    write dbreak register with an address aligned to regfw
    foreach mask ( set of masks for regfw ) {
        * write dbreak control mask
        * set address register based on mask and dbreak address
        * do a load/store to regf that takes a data breakpoint exception
        * check if exception was taken
    }
}
Data breakpoints that match will cause a debug exception. The debug exception handlers for the above test update a counter that is checked to ensure that the exception was indeed taken. In addition to this, more complex cases are also constructed where the load/store with data breakpoint coincides with overflow/underflow exceptions (for register windowing) to ensure the correct priority of such exceptions.
Random Diagnostic Generators for TIE Instructions
Random diagnostics play a major role in the verification of the core ISA and of the microarchitecture of the implementation. Random sequences of instructions are likely to hit boundary cases and other scenarios that are unlikely to be covered by a directed test. They also add to the coverage metrics for the design verification. Additional intelligence has been added to these random generators through several features. For example, templates of instruction sequences can be created to target specific interesting scenarios, such as back-to-back stores that fill up the write buffer, or a zero-overhead loop with a single instruction. Relative probabilities attached to each type of instruction or instruction sequence decide how often a particular kind of instruction is generated; for example, if a branch instruction has a high relative probability (or weight), the test generated will have more branches. User-controlled parameters can tune the nature of the tests generated. For example, command line arguments can control the relative weight of certain instructions, the length of tests, the number of nested function calls, etc. The random diagnostic generators can generate user-defined TIE instructions as well.
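Selection by relative weight can be sketched with Python's standard `random.choices`. The function name `pick_instructions` and the weight dictionary are illustrative assumptions, not the generators' actual interface.

```python
import random

def pick_instructions(weights, count, rng=random):
    """Draw `count` instruction kinds, each with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=count)
```

Raising the weight of `"branch"` relative to the other entries makes branches appear more often in the generated test, and a weight of 0 excludes an instruction kind entirely.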
The underlying mechanism is similar to that of the microarchitectural tests. The random generators read the ISA description that includes TIE instructions as well as the core ISA. Valid TIE instructions are constructed by looking at the ISA description of a particular TIE instruction and employing some register allocation mechanism:
foreach operand ( tie_instr->operands() ) {
    if ( operand is TIE register file ) {
        do a random register allocation
        random( 0, #entries in register file )
    } elsif ( operand is a core register file ) {
        if ( this is a load/store instr ) {
            this is the address register for the load/store operation.
            Find a core register that can be written, and write a valid address
        } else {
            random core register
        }
    } elsif immediate field {
        generate a random immediate value based on the
        instruction's immediate table or range
    }
}
The random generators are preferably not accessible by end-users of the configuration system but are employed for internal verification and for a whole range of TIE descriptions such as those described above, further including exhaustive cases of TIE register files of varying widths, such as 8, 16, 32, 64 and 128 bits, and states. Additionally, end-users may be given access to the random generators for use in further verification.
Coverage Measurements for TIE Verification
As stated above, a goal of this verification effort is to ensure the correctness of the core and TIE interface, the implementation of the user-defined state and register file and associated logic, and the correct translation of the TIE instruction into hardware. Some coverage metrics of these areas are necessary.
This is not meant to refer to basic design coverage of the RTL generated by the TIE compiler, but rather to functional coverage in the areas mentioned. Although it is extremely hard to make such coverage assessments for TIE, ways have been developed to generate some functional coverage modules that run along with the RTL and report some coverage measures. One important area, for example, is all the bypass paths between the TIE register files and states. The diagnostics generated to test bypass should cover all possible bypass paths, but the goal is to have an independent confirmation of that in RTL. To do so, some Verilog/VERA modules are automatically generated from the TIE description and the pipeline information. These modules run during RTL simulation to report which bypass paths were covered.
Taking the example of the 128-bit register file i128 already seen in the previous sections, FIG. 13 shows such a general purpose register file and its implementation in hardware. The figure shows one read port Rd0 and one write port Wd. Typically, there are two read ports and one write port for the register file. The naming convention for the signals is:
<port_name>_<signal_name>_<stage_name>
where
port_name: name of the register file port (Rd0, Rd1, Wd)
signal_name: the signal names are:
    read port: mux: output of mux; data: output of a flip-flop that goes to the datapath unit of TIE
    write port: mux: output of a mux; data: output of the datapath unit; result: output of a flip-flop
stage_name: this indicates the stage of the pipeline.
As stated in a previous section, the convention here is: C0: R stage, C1: E stage, C2: M stage, C3: W stage
For the sake of simplicity, the following discussion restricts all TIE instructions to write the register file no later than the end of the M-stage.
The block diagram shows the different bypass paths for these stages. For the read port Rd0, which is read by the datapath in stages 1 and 2 (this was represented as the use of the register file in the previous sections), the following traces or explains the block diagram:
Stage C0:
Rd0_mux_C0 = select from (
    Wd_data_C2 : the result produced by the instr last in the pipeline
    Wd_data_C1 : the result produced by the instr before last in the pipeline
    Rd0_data_C0 : the current data in the register file
)
Stage C1:
Rd0_data_C1 <= Rd0_mux_C0    where <= implies after a clock cycle
Rd0_mux_C1 = select from (
    Wd_data_C2 : the result produced by the instr last in the pipeline
    Rd0_data_C1 : the result of the previous stage
)
Stage C2:
Rd0_data_C2 <= Rd0_mux_C1
The write port Wd, which is written in stages 2 and 3, has a similar bypass path:
Stage C2:
Wd_result_C2 <= Wd_mux_C1 = Wd_data_C1 (the only source for the write port in stage C1 is the output of the instruction in E stage)
Wd_mux_C2 = select from (
    Wd_result_C2
    Wd_data_C2 : result of the current instr in M stage
)
Stage C3:
Wd_result_C3 <= Wd_mux_C2
Wd_result_C3 is written to the register file.
Coverage of Bypass Paths
A goal of the preferred embodiment is to generate a monitor that checks whether all the bypass paths in the above block diagram have been exercised. An example bypass path is traced by the dashed path in FIG. 13. The monitor essentially traces the data through the paths, and hence it is necessary to make a very important assumption, which is that the data remains unchanged in the datapath unit of TIE. This means that the following check can be performed:
Wd_data_C1 == Rd0_data_C1
with the assumption that a TIE instruction that reads data in the E stage (C1) and produces the output data in the E stage leaves the data unchanged. This is of course untrue for any real TIE instruction. However, for the sake of testing, some "identity" instructions are introduced in the user TIE (to be eliminated when generating real hardware). These instructions, solely for testing, essentially copy data. In this example, two identity instructions are obtained:
Identity 1: use C1, def C1: which reads the register file in the E stage, and produces the same data in the E stage; and
Identity 2: use C1, def C2: which produces data after a cycle delay.
Having described the premises of the monitor generation, the algorithm for generating a Vera module that tests whether all the bypass paths were exercised will now be described. Once again, the information generated by the TIE compiler is used and the signal name convention stated above is followed.
foreach regf ( list of register files ) {
    foreach writeport ( writeports of regf ) {
        foreach writestage ( list of stages writeport is written ) {
            foreach readport ( readports of regf ) {
                foreach readstage ( list of stages readport is read ) {
                    skip if writestage < readstage
                    generate_the_signal_list( regf->name,
                        writeport->name, writestage,
                        readport->name, readstage,
                        list_of_write_stages_for_writeport )
                } // readstage
            } // readport
        } // writestage
    } // writeport
} // regf
The workings of the subroutine that generates the signal list are omitted for the sake of simplicity, but will be apparent to those skilled in the art. One important note is how the datapath is represented in the list of signals. If the datapath has a write stage > read stage (for example, the Identity 2 instruction above), the number of cycles spent in the datapath unit (which is up to one, in accordance with our restriction to two-cycle TIE instructions for this discourse) is simply added.
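The nested enumeration above, including the skip condition, can be sketched in Python. The function name `bypass_pairs` and the dictionary layout used to describe ports and stages are assumptions for illustration; the signal-list generation itself is not modeled.

```python
def bypass_pairs(regfiles):
    """Yield (regfile, write port, write stage, read port, read stage) tuples.

    `regfiles` maps a register file name to
    {"write": {port: [stages]}, "read": {port: [stages]}}.
    """
    for name, ports in regfiles.items():
        for wport, wstages in ports["write"].items():
            for ws in wstages:
                for rport, rstages in ports["read"].items():
                    for rs in rstages:
                        if ws < rs:
                            # Mirrors "skip if writestage < readstage".
                            continue
                        yield (name, wport, ws, rport, rs)
```

Each yielded tuple corresponds to one call of the signal-list generator in the pseudocode.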
The path that is shown in dashed lines in FIG. 13 is generated as a signal list or trace from the above algorithm as:
i128_wd_data_C2 ->
i128_rd0_mux_C0 ->
i128_rd0_data_C1 ->
waitcycles1 ->
i128_wd_data_C2 ->
i128_wd_mux_C2 ->
i128_wd_result_C3
where i128 is the register file name. The path to the TIE register file i128 from the top level of Xtensa is prepended to this. Notice that the dashed line from Rd0_data_C1 -> Wd_data_C2 in the datapath in FIG. 13 has been represented as waitcycles 1 in the signal trace. A list of such signal traces is generated for all the bypass paths. Based on the signal trace, a small monitor module is generated in Verilog/Vera that checks whether this path has been traced. If so, it reports a 1 for this path at the end of the simulation. Each monitor is essentially a small state machine that is generated by the following algorithm:
a) Determine the number of states in the state machine:
   number of states = number of stages (from E) in signal trace + number of cycles in the datapath
b) Group the signals according to state
c) Generate code:
state = 0;
foreach state ( states in FSM ) {
    if ( last state in list ) {
        * reset state
        * set flag to 1 for covered
    } else {
        if ( signals in this state ) {
            generate if expression to advance to next state
        } else {
            advance to next state
        }
    }
}
The state machine generated for the example bypass path is:
case ( state )
{
    0 :
    {
        if (<hierarchy>.i128_rd0_mux_C0 ==
            <hierarchy>.i128_wd_data_C2) {
            state = 1;
        }
    }
    1 :
    {
        if (<hierarchy>.i128_rd0_data_C1 ==
            <hierarchy>.i128_rd0_mux_C0) {
            state = 2;
        }
    }
    2 :
    {
        state = 3; // waitcycles 1
    }
    3 :
    {
        if (<hierarchy>.i128_wd_result_C3 ==
            <hierarchy>.i128_wd_mux_C2) {
            state = 0;
            result_flag = 1'b1;
        }
    }
}
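The behavior of such a monitor can be modeled in a few lines of Python. This is a simplified software model, not the generated Verilog/Vera: the function name `path_covered`, the representation of a trace as a list of signal-name pairs, and the use of `None` for a wait cycle are all assumptions.

```python
def path_covered(trace_steps, samples):
    """Return True once every step of the trace has been satisfied in order.

    trace_steps: one entry per state; a (sig_a, sig_b) pair that must compare
                 equal for the state to advance, or None for a wait cycle.
    samples:     iterable of dicts mapping signal name -> value, one per clock.
    """
    state = 0
    for values in samples:
        step = trace_steps[state]
        # A wait cycle always advances; otherwise both signals must match.
        if step is None or values[step[0]] == values[step[1]]:
            state += 1
            if state == len(trace_steps):
                return True  # corresponds to setting the coverage flag
    return False
```

As in the generated state machine, a failed comparison leaves the monitor in its current state rather than resetting it.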
Verification Summary
To test the correctness of the input reference instruction semantics, the TIE coder modifies the application to use the new instructions via intrinsics and then either (1) compiles this to machine code and runs the application with the instruction set simulator or (2) compiles to native code and uses the macros and functions output by the TIE compiler to provide intrinsic compatibility. The correctness of the application verifies the correctness of the instruction reference semantics with either of these two options. The translation of the reference semantics is verified by option 2, and the correctness of the extended compiler and simulator is verified by option 1. Additional coverage beyond that provided by the application is obtained by using the test case TIE construct to generate tests of specific cases (e.g., unusual or "corner" cases).
The implementation semantics may be verified by using a TIE compiler option to translate these instead of the reference semantics, using the same methods as above. The implementation semantics and their translation to HDL may also be formally verified, similarly to the reference semantics, by commercial equivalence checking tools working on the translation of each to HDL. Implementation semantics and their translation are also checked by the use of the TIE-specified test cases run in the HDL simulator. The HDL generated by the TIE compiler for the register files, interlock, bypass, core interface, and exceptions is verified by running automatically-generated tests based on the TIE input and using cosimulation to verify the results. These tests use the pipeline specification to exhaustively test all combinations of interlock, bypass, and exceptions. The HAL code generated by the TIE compiler is verified by executing it in the instruction set simulator. The assembler and compiler support for the new instructions is verified by most of the above.
Cosimulation of Processors
Co-simulation is the process of running the RTL and the reference model in parallel, and comparing the architecturally visible states defined in the ISA at specified boundaries.
The cosimulator (hereinafter "cosim") acts as the synchronizer and the gateway between the RTL simulator, the ISS, and multiple other monitor/checker tasks that are executed in parallel. A diagnostic fails as soon as a mismatch occurs between the RTL and the ISS or when an assertion checker signals a catastrophic event. There are several advantages of using cosimulation. First, it provides easier debugging of failing diagnostics. It causes the simulation to stop at (or near) the cycle where the problem appeared, which significantly reduces debugging time and effort.
Second, it provides more state checking. It allows observability of the processor state throughout the program execution, thereby signaling those cases that create erroneous intermediate results while producing a correct final result.
Finally, with cosimulation there is no need for self-checking. Random diagnostics can be run and checked.
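The comparison at each boundary can be sketched as follows. This Python model assumes that the RTL and the reference model each yield one snapshot of architecturally visible state per boundary; the function name `first_mismatch` and the dictionary-based snapshots are illustrative assumptions.

```python
def first_mismatch(rtl_states, iss_states, compared):
    """Return (boundary index, state name, rtl value, iss value) or None.

    rtl_states / iss_states: sequences of dicts, one snapshot per boundary.
    compared: names of the architecturally visible states to check.
    """
    for i, (rtl, iss) in enumerate(zip(rtl_states, iss_states)):
        for name in compared:
            if rtl[name] != iss[name]:
                # Stop at the first divergence, as a failing diagnostic would.
                return (i, name, rtl[name], iss[name])
    return None
```

Because the check runs at every boundary, an erroneous intermediate value is reported even if the final results happen to agree.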
In the preferred embodiment, the ISS is the reference model and the boundaries are defined on instruction retirements and whenever external events occur. The set of architecturally visible states to be compared is configurable. One of the challenges of using cosim with configurable processors is the absence of complete knowledge regarding the process of comparing RTL and ISS. What is known about comparing RTL and ISS is that the comparison needs to occur on instruction retirement boundaries and on occurrences of external events. However, the processor state that should be compared between RTL and ISS depends on the processor options the user elects to include in her configuration. When a processor option is not included in a specific configuration of the processor core, the cosim environment should not even attempt to compare the state introduced by the option, since the state is not present in either the RTL or the ISS. Thus, the preferred embodiment uses a cosim environment that is configurable and which is customized along with the software and hardware during the processor configuration.
How the Cosim Works with TIE
The ability of the user to extend the processor state as well as the instruction set using TIE complicates the cosim process, since the cosim environment needs to be developed with no complete prior knowledge of the processor states and instruction set. In the presence of TIE, the cosim environment needs to be able to determine the new processor state that should be compared/validated as well as decide the boundaries at which the new state will be compared between the RTL and ISS. In order for cosim to be able to achieve these two goals, it requires information regarding the new processor state defined in TIE.
The information required by cosim includes the names of the new states, the width of the state elements, the complete RTL hierarchy (path) defining the states, whether the state is defined on reset or not, whether it is an individual state or a register file, and the number of entries when the state is a register file.
The information required by cosim is generated from the user's TIE description in three steps. First, as shown in FIG. 14, the TIE compiler parses the TIE description and generates an intermediate representation of the states defined in the input file. This intermediate representation is subsequently used by the cosim preprocessor to generate the cosim source code necessary for the verification of the new TIE state. Finally, the generated cosim code is integrated with the rest of the cosim framework to produce the cosim environment specific to the given configuration. This is preferably done using tpp to generate code in the Vera™ cosimulation language as implemented in, e.g., the Vera™ System Verifier by Synopsys, Inc. of Mountain View, CA.
The following section contains examples of the cosim preprocessor and the generated cosim source code obtained in connection with the Galois field TIE example presented earlier.
CosimInfo.pm
# ----------------------------------------------------------- #
# CosimInfo.pm creates arrays which contain state and          #
# register files information for TIE and the core.             #
# ----------------------------------------------------------- #
@CosimInfo::EXPORT = qw(
    @RegisterFiles @SpecialRegister @IntrType @TieState
    @TieRegister @AllRegFiles @AllSpecialRegs);

# ----------------------------------------------------------- #
# For a given configuration:                                   #
#   SpecialRegister contains all the core                      #
#     special registers' names                                 #
#   RegisterFiles contains all the core                        #
#     register files names                                     #
# ----------------------------------------------------------- #
@SpecialRegister = map (CoreState($_, 1),
    grep ($_->name ne 'MEM', $isa->state));
@RegisterFiles = map (CoreState($_, 0),
    grep ($_->name ne 'MEM', $isa->state));

# ----------------------------------------------------------- #
# For a given tie description:                                 #
#   TieState contains all the TIE states names                 #
#   TieRegister contains all the TIE register files names      #
# ----------------------------------------------------------- #
@TieState = map (TieState($_, 1),
    $pr->tie()->allStates());
@TieRegister = map (TieState($_, 0),
    $pr->tie()->allStates());

@AllRegFiles = (@RegisterFiles, @TieRegister);
@AllSpecialRegs = (@SpecialRegister, @TieState);

# ----------------------------------------------------------- #
# TieState subroutine reads the TIE state and register         #
# information from the configuration data base.                #
# ----------------------------------------------------------- #
sub TieState {
    my ($state, $tieState) = @_;
    my $name = $state->name();
    my $entries = $state->entries();
    my $width = $state->width();
    my $undefonreset = !($state->initialized());
    my $regfile = $state->isRegFile();
    if ($tieState) {
        return if ($regfile);
        [$name, $width == 1 ? 1 : $width, $undefonreset];
    } else {
        return if (!$regfile);
        [$name, $width == 1 ? 1 : $width, $entries];
    }
}
Cosim source code (Tie register file comparison):
; foreach (@TieRegister) {
;   my ($regName, $regWidth, $regEntries) = @$_;
;   for($i = 0; $i < $regEntries; $i++) {
;     $tn = $regName . $i;
    iss_`$tn` = $iss_read_register_bitvec(`$i`+`$regName`_start);
    if (rtl_`$tn`[index] != iss_`$tn`) {
        printf("Cosim @ cycle %0d PC %h:\n\tRTL != ISS TIE Reg File `$tn` %h %h\n\n",
               current_cycle, rtl_spreg_pc[index], rtl_`$tn`[index], iss_`$tn`);
    }
;   }
; }
Cosim output program (Tie register file comparison):
iss_gf0 = $iss_read_register_bitvec(0+gf_start);
if (rtl_gf0[index] != iss_gf0) {
    printf("Cosim @ cycle %0d PC %h:\n\tRTL != ISS TIE Reg File gf0 %h %h\n\n",
           current_cycle, rtl_spreg_pc[index], rtl_gf0[index], iss_gf0);
}
...
iss_gf15 = $iss_read_register_bitvec(15+gf_start);
if (rtl_gf15[index] != iss_gf15) {
    printf("Cosim @ cycle %0d PC %h:\n\tRTL != ISS TIE Reg File gf15 %h %h\n\n",
           current_cycle, rtl_spreg_pc[index], rtl_gf15[index], iss_gf15);
}
Cosim source code (Tie State comparison):
; foreach (@TieState) {
;   ($sreg) = @$_;
    // Checking Special Register `$sreg`
    iss_`$sreg` = $iss_read_register_bitvec(`$sreg`_map);
    if (rtl_spreg_`$sreg`[index] != iss_`$sreg`) {
        iss_`$sreg` = $iss_read_register_bitvec(`$sreg`_map);
        printf("Cosim @ cycle %0d PC %h:\n\tRTL != ISS at TIE State `$sreg` %0h %0h\n\n",
               current_cycle, rtl_spreg_pc[index], rtl_spreg_`$sreg`[index], iss_`$sreg`);
    }
; }
Cosim output program (Tie State comparison):
// Checking Special Register gfmod
iss_gfmod = $iss_read_register_bitvec(gfmod_map);
if (rtl_spreg_gfmod[index] != iss_gfmod) {
    iss_gfmod = $iss_read_register_bitvec(gfmod_map);
    printf("Cosim @ cycle %0d PC %h:\n\tRTL != ISS at TIE State gfmod %0h %0h\n\n",
           current_cycle, rtl_spreg_pc[index], rtl_spreg_gfmod[index], iss_gfmod);
}
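The expansion performed by the preprocessor for a register file can be imitated in Python to show how one comparison is emitted per entry. This is only a sketch: the function name `gen_regfile_checks` is an assumption, the report body is abbreviated to a comment, and the real flow uses tpp templates rather than Python.

```python
def gen_regfile_checks(name, entries):
    """Return the generated comparison lines for each entry of a register file."""
    lines = []
    for i in range(entries):
        tn = f"{name}{i}"  # per-entry name, e.g. gf0 .. gf15
        lines.append(f"iss_{tn} = $iss_read_register_bitvec({i}+{name}_start);")
        lines.append(f"if (rtl_{tn}[index] != iss_{tn}) {{ /* report mismatch */ }}")
    return lines
```

For the `gf` register file with 16 entries, the first generated line reads `iss_gf0 = $iss_read_register_bitvec(0+gf_start);` and the read for the last entry uses `15+gf_start`.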
Thus, in summary, to adapt the simulator described in the Killian et al. application to work in the preferred embodiment, a number of changes must be made, primarily having to do with generalizations to state. Because TIE state can be arbitrarily wide, an interface to register values of arbitrary size is needed, but it is preferred that this interface not be used all the time, for performance reasons. Because of this, the registers are partitioned into classes, and the gdb and cosim interfaces are modified so that they can find a class and an index within a class from a single integer code. The socket interface is changed so that arbitrary-width values can be transmitted and received. New memory interfaces are added to support wide loads and stores. The initialization of TIE state is generalized to support register files and assignment of registers to coprocessors. Support for simulating pipeline delays associated with access of TIE state is also added. The interface to TIE state is modified to simulate the CPENABLE exception.
Summary
In summary, then, the major new TIE constructions discussed above, the generated files which they affect and their general purposes are given below in TABLE III.
TABLE III (Figures imgf000080_0001 and imgf000081_0001; table images not reproduced here)
The present invention has been described above in connection with a preferred embodiment thereof; however, this has been done for purposes of illustration only, and the invention is not so limited. Indeed, variations of the invention will be readily apparent to those skilled in the art and also fall within the scope of the invention.
APPENDIX A
README.gf
Notation: <dir> - path to this directory
This is a brief list of the files in this directory.
Miscellaneous:
    README.gf                 this file
    gf.tie                    a copy of the source tie file
    default-params            default param file to configure software tools
    gf-params                 param file to configure software tools for gf.tie
Native C Support:
    cstub-gf.c                functions for the new instructions
    cstub-gf-ref.c            functions generated from "reference"
    BR.h                      support for BR register file
Design Compiler Synthesis:
    gf.v                      Verilog source file
    gf_check.dcsh             Syntax check generated verilog
    gf.dcsh                   Top-level Design Compiler synthesis script
    Xtensa_cons_generic.dc    supporting script
    Xtensa_prim.dc            supporting script
    TIE_opt.dc                supporting script
    xmTIE_cons.dc             supporting script
    prim.v                    supporting Verilog source file
Verysys Verification:
    verysys                   subdirectory supporting Verysys verification
    verysys/verify_sem.v      Verilog source generated from semantics
    verysys/verify_ref.v      Verilog source generated from reference
Xtensa tool support:
    libisa-gf.so              dynamically linked library for xt-gcc
    libiss-gf.so              dynamically linked library for xt-run
    xtensa-gf.h               macro definitions of new instructions
Unknown:
    gf_test.v
To compile your application in native mode:
- include cstub-gf.c in your application
- compile your application using your native c compiler (e.g., gcc)
The new TIE instructions are replaced with equivalent C code.
If you add "-DTIE_DEBUG" to the C compile, the function names for the translated TIE instructions will be prefixed with "TIE_". Using this method, you can check the TIE description against hand-written C functions for the new instructions. Refer to the application note for more details.
To compile your application for Xtensa:
- add "--xtensa-params=<dir>" to the command line or set the environment variable "XTENSA_PARAMS=<dir>; export XTENSA_PARAMS"
- compile your application using xt-gcc
To estimate the impact of your TIE description on Xtensa speed:
- Setup your shell environment to run Synopsys Design Compiler
- Modify gf .dcsh to fill in your technology information
- Run dc_shell with script gf.dcsh, e.g.,
"dc_shell -f gf.dcsh >& dc.out &"
- Inspect the synthesis results. Look in report section of the output file "dc.out". If there is any timing violation, the Xtensa speed will be impacted, roughly by the violation amount. The area report section will give you the area of your tie instruction block.
To compare reference designs against semantic designs using Verysys :
- "cd <dir>/verysys; make"
Notes for v1.5 user:
- cstub-gf.c can be included for xt-gcc compiles; it will be ignored
Note for v1.1 user:
- no need to regenerate this development kit from the Web.
- no need to include <machine/Customer.h> anymore
gf.tie

opcode GFADD8 op2=4'b0000 CUST0
opcode GFMULX8 op2=4'b0001 CUST0
opcode GFRWMOD8 op2=4'b0010 CUST0
opcode GFADD8I op2=4'b0100 CUST0
opcode LGF8.I r=4'b0000 LSCI
opcode SGF8.I r=4'b0001 LSCI
opcode LGF8.IU r=4'b0010 LSCI
opcode SGF8.IU r=4'b0011 LSCI
opcode LGF8.X op2=4'b0000 LSCX
opcode SGF8.X op2=4'b0001 LSCX
opcode LGF8.XU op2=4'b0010 LSCX
opcode SGF8.XU op2=4'b0011 LSCX
state gfmod 8
user_register 0 { gfmod }
regfile gf 8 16 g
operand gr r { gf[r] }
operand gs s { gf[s] }
operand gt t { gf[t] }
operand imm4 t { t } { imm4 }
interface VAddr 32 core out
interface LSSize 5 core out
interface MemDataIn8 8 core in
interface MemDataOut8 8 core out
iclass gfrrr { GFADD8 } {out gr, in gs, in gt} {} {}
iclass gfrri { GFADD8I } {out gr, in gs, in imm4} {} {}
iclass gfrr { GFMULX8 } {out gr, in gs} {in gfmod} {}
iclass gfr { GFRWMOD8 } {inout gt} {inout gfmod} {}
iclass gfloadi { LGF8.I } { out gt, in ars, in imm8} {} { out LSSize, out VAddr, in MemDataIn8 }
iclass gfstorei { SGF8.I } { in gt, in ars, in imm8} {} { out LSSize, out VAddr, out MemDataOut8 }
iclass gfloadiu { LGF8.IU } { out gt, inout ars, in imm8} {} { out LSSize, out VAddr, in MemDataIn8 }
iclass gfstoreiu { SGF8.IU } { in gt, inout ars, in imm8} {} { out LSSize, out VAddr, out MemDataOut8 }
iclass gfloadx { LGF8.X } { out gr, in ars, in art} {} { out LSSize, out VAddr, in MemDataIn8 }
iclass gfstorex { SGF8.X } { in gr, in ars, in art} {} { out LSSize, out VAddr, out MemDataOut8 }
iclass gfloadxu { LGF8.XU } { out gr, inout ars, in art} {} { out LSSize, out VAddr, in MemDataIn8 }
iclass gfstorexu { SGF8.XU } { in gr, inout ars, in art} {} { out LSSize, out VAddr, out MemDataOut8 }
semantic gf1 { GFADD8 } {
  assign gr = gs ^ gt;
}
semantic gf4 { GFADD8I } {
  assign gr = gs ^ imm4;
}
semantic gf2 { GFMULX8 } {
  assign gr = gs[7] ? ({gs[6:0], 1'b0} ^ gfmod) : {gs[6:0], 1'b0};
}
semantic gf3 { GFRWMOD8 } {
  wire [7:0] t1 = gt;
  wire [7:0] t2 = gfmod;
  assign gfmod = t1;
  assign gt = t2;
}
semantic lgf { LGF8.I, LGF8.IU, LGF8.X, LGF8.XU } {
  wire indexed = LGF8.X | LGF8.XU;
  assign LSSize = 1;
  assign VAddr = ars + (indexed ? art : imm8);
  assign gt = MemDataIn8;
  assign gr = MemDataIn8;
  assign ars = VAddr;
}
semantic sgf { SGF8.I, SGF8.IU, SGF8.X, SGF8.XU } {
  wire indexed = SGF8.X | SGF8.XU;
  assign LSSize = 1;
  assign VAddr = ars + (indexed ? art : imm8);
  assign MemDataOut8 = SGF8.X | SGF8.XU ? gr : gt;
  assign ars = VAddr;
}
reference GFADD8 {
  assign gr = gs ^ gt;
}
reference GFADD8I {
  assign gr = gs ^ imm4;
}
reference GFMULX8 {
  assign gr = gs[7] ? ({gs[6:0], 1'b0} ^ gfmod) : {gs[6:0], 1'b0};
}
reference GFRWMOD8 {
  wire [7:0] t1 = gt;
  wire [7:0] t2 = gfmod;
  assign gfmod = t1;
  assign gt = t2;
}
reference LGF8.I {
  assign LSSize = 1;
  assign VAddr = ars + imm8;
  assign gt = MemDataIn8;
}
reference LGF8.IU {
  assign LSSize = 1;
  assign VAddr = ars + imm8;
  assign gt = MemDataIn8;
  assign ars = VAddr;
}
reference LGF8.X {
  assign LSSize = 1;
  assign VAddr = ars + art;
  assign gr = MemDataIn8;
  assign ars = VAddr;
}
reference LGF8.XU {
  assign LSSize = 1;
  assign VAddr = ars + art;
  assign gr = MemDataIn8;
  assign ars = VAddr;
}
reference SGF8.I {
  assign LSSize = 1;
  assign VAddr = ars + imm8;
  assign MemDataOut8 = gt;
}
reference SGF8.IU {
  assign LSSize = 1;
  assign VAddr = ars + imm8;
  assign MemDataOut8 = gt;
  assign ars = VAddr;
}
reference SGF8.X {
  assign LSSize = 1;
  assign VAddr = ars + art;
  assign MemDataOut8 = gr;
}
reference SGF8.XU {
  assign LSSize = 1;
  assign VAddr = ars + art;
  assign MemDataOut8 = gr;
  assign ars = VAddr;
}
ctype gf8 8 8 gf
proto gf8_loadi {out gf8 t, in gf8* s, in immediate o} {} {
  LGF8.I t, s, o;
}
proto gf8_storei {in gf8 t, in gf8* s, in immediate o} {} {
  SGF8.I t, s, o;
}
proto gf8_move {in gf8 r, in gf8 s} {} {
  GFADD8 r, s, 0;
}
schedule gfload { LGF8.I } {
  use imm8 0;
  use ars 1;
  def gt 2;
}
schedule gfloadu { LGF8.IU } {
  use imm8 0;
  use ars 1;
  def ars 1;
  def gt 2;
}
schedule gfloadx { LGF8.X } {
  use ars 1;
  use art 1;
  def gr 2;
}
schedule gfloadxu { LGF8.XU } {
  use ars 1;
  use art 1;
  def art 1;
  def gr 2;
}
synopsis GFADD8 "Galois Field 8-bit Add"
synopsis GFADD8I "Galois Field 8-bit Add Immediate"
synopsis GFMULX8 "Galois Field 8-bit Multiply by X"
synopsis GFRWMOD8 "Read/Write Galois Field Polynomial"
synopsis LGF8.I "Load Galois Field Register Immediate"
synopsis LGF8.IU "Load Galois Field Register Immediate Update"
synopsis LGF8.X "Load Galois Field Register Indexed"
synopsis LGF8.XU "Load Galois Field Register Indexed Update"
synopsis SGF8.I "Store Galois Field Register Immediate"
synopsis SGF8.IU "Store Galois Field Register Immediate Update"
synopsis SGF8.X "Store Galois Field Register Indexed"
synopsis SGF8.XU "Store Galois Field Register Indexed Update"
description GFADD8
"<P><CODE>GFADD8</CODE> performs an 8-bit Galois Field addition of the contents of GF registers <CODE>gs</CODE> and <CODE>gt</CODE> and writes the result to GF register <CODE>gr</CODE>.</P>"
description GFADD8I
"<P><CODE>GFADD8I</CODE> performs an 8-bit Galois Field addition of the contents of GF register <CODE>gs</CODE> and a 4-bit immediate from the <CODE>t</CODE> field and writes the result to GF register <CODE>gr</CODE>.</P>"
description GFMULX8
"<P><CODE>GFMULX8</CODE> performs an 8-bit Galois Field multiplication of the contents of GF register <CODE>gs</CODE> by <I>x</I> modulo the polynomial in <CODE>gfmod</CODE>. It writes the result to GF register <CODE>gr</CODE>.</P>"
description GFRWMOD8
"<P><CODE>GFRWMOD8</CODE> reads and writes the <CODE>gfmod</CODE> polynomial register. GF register <CODE>gt</CODE> and <CODE>gfmod</CODE> are read, and their values are then written to <CODE>gfmod</CODE> and <CODE>gt</CODE>, respectively.</P>"
description LGF8.I
"<P>
</P>"
description LGF8.IU
"<P>
</P>"
description LGF8.X
"<P>
</P>"
description LGF8.XU
"<P>
</P>"
description SGF8.I
"<P>
</P>"
description SGF8.IU
"<P>
</P>"
description SGF8.X
"<P>
</P>"
description SGF8.XU
"<P>
</P>"
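The arithmetic defined by the GFADD8 and GFMULX8 semantics in gf.tie can be sketched in plain C as shown here. This is a minimal model, not the generated stub: the function names are hypothetical, `gfmod` is passed as a parameter instead of being held in a state register, and `gf_mul8` is an extra helper (not an instruction in gf.tie) that shows how a full field multiply could be built from the two primitives. The polynomial value 0x1b used in the example is the low byte of the AES field polynomial and is an assumption chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* GFADD8: addition in GF(2^8) is a bitwise XOR. */
uint8_t gf_add8(uint8_t gs, uint8_t gt) {
    return (uint8_t)(gs ^ gt);
}

/* GFMULX8: multiply by x. Shift left by one; if bit 7 of the input was
   set, reduce by XORing in the low 8 bits of the field polynomial. */
uint8_t gf_mulx8(uint8_t gs, uint8_t gfmod) {
    uint8_t shifted = (uint8_t)(gs << 1);
    return (gs & 0x80) ? (uint8_t)(shifted ^ gfmod) : shifted;
}

/* Hypothetical helper: general multiply by shift-and-add, using only
   the two primitives above. */
uint8_t gf_mul8(uint8_t a, uint8_t b, uint8_t gfmod) {
    uint8_t acc = 0;
    for (int i = 0; i < 8; i++) {
        if (b & 1) {
            acc = gf_add8(acc, a);   /* add the current multiple of a */
        }
        a = gf_mulx8(a, gfmod);      /* a <- a * x */
        b >>= 1;
    }
    return acc;
}

/* Usage with the AES polynomial x^8 + x^4 + x^3 + x + 1 (low byte 0x1b). */
void gf_example(void) {
    assert(gf_mulx8(0x57, 0x1b) == 0xae);
    assert(gf_mul8(0x57, 0x83, 0x1b) == 0xc1);
}
```

Keeping the polynomial in a separate register, as gf.tie does with `gfmod`, lets the same two instructions serve any degree-8 field polynomial.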
default-params

isa-tie-dll=lib-i686-Linux/libisa-gf.so
iss-tie-dll=lib-i686-Linux/libiss-gf.so
cc-tie-dll=lib-i686-Linux/libcc-gf.so
xtensa-tie-header=xtensa-gf.h

gf-params

isa-tie-dll=lib-i686-Linux/libisa-gf.so
iss-tie-dll=lib-i686-Linux/libiss-gf.so
cc-tie-dll=lib-i686-Linux/libcc-gf.so
xtensa-tie-header=xtensa-gf.h

cstub-gf.c
#ifndef XTENSA
#ifdef TIE_DEBUG
#define gf8_loadi TIE_gf8_loadi
#define gf8_storei TIE_gf8_storei
#define gf8_move TIE_gf8_move
#define GFADD8 TIE_GFADD8
#define GFADD8I TIE_GFADD8I
#define GFMULX8 TIE_GFMULX8
#define GFRWMOD8 TIE_GFRWMOD8
#define LGF8_I TIE_LGF8_I
#define SGF8_I TIE_SGF8_I
#define LGF8_IU TIE_LGF8_IU
#define SGF8_IU TIE_SGF8_IU
#define LGF8_X TIE_LGF8_X
#define SGF8_X TIE_SGF8_X
#define LGF8_XU TIE_LGF8_XU
#define SGF8_XU TIE_SGF8_XU
#define RUR0 TIE_RUR0
#define WUR0 TIE_WUR0
#endif
#include <stdio.h>
#include <stdlib.h> /* for exit() */
#define LittleEndian 0
#define BigEndian 1
#define PIFReadDataBits 128
#define PIFWriteDataBits 128
#define IsaMemoryOrder LittleEndian
#include "BR.h"
#include "LS.h"
#define BPW 32
#define WINDEX(_n) ((_n) / BPW)
#define BINDEX(_n) ((_n) % BPW)
typedef unsigned char Vb_t;
typedef unsigned short Vs_t;
typedef struct V1_s {unsigned data[1];} V1_t;
typedef struct V2_s {unsigned data[2];} V2_t;
typedef struct V4_s {unsigned data[4];} V4_t;
typedef Vb_t gf8;
static int tie_load_instruction = 0;
void
TieMemRead(unsigned *data, unsigned addr)
{
  unsigned char *mem;
  unsigned modulus, bytes, offset;
  int t, b0, b1, b2, b3;
  bytes = PIFReadDataBits / 8;
  modulus = bytes - 1;
  mem = (unsigned char *) (addr & ~modulus);
  offset = (unsigned char *) addr - mem;
  if (IsaMemoryOrder == LittleEndian) {
    for (t = 0; t < bytes/sizeof(int); t++) {
      b0 = mem[(offset++) & modulus];
      b1 = mem[(offset++) & modulus];
      b2 = mem[(offset++) & modulus];
      b3 = mem[(offset++) & modulus];
      data[t] = (b3 << 24) | (b2 << 16) | (b1 << 8) | b0;
    }
  } else {
    for (t = bytes/sizeof(int) - 1; t >= 0; t--) {
      b3 = mem[(offset++) & modulus];
      b2 = mem[(offset++) & modulus];
      b1 = mem[(offset++) & modulus];
      b0 = mem[(offset++) & modulus];
      data[t] = (b3 << 24) | (b2 << 16) | (b1 << 8) | b0;
    }
  }
}
void
TieMemWrite(unsigned addr, unsigned bytes, unsigned *data)
{
  unsigned char *mem;
  unsigned modulus, offset, w;
  int t;
  if (PIFWriteDataBits < bytes * 8) {
    fprintf(stderr, "Error: not configured to write %d bytes\n", bytes);
    exit(1);
  }
  modulus = bytes - 1;
  mem = (unsigned char *) (addr & ~modulus);
  if (IsaMemoryOrder == LittleEndian) {
    if (bytes == 1) {
      mem[0] = data[0] & 0xff;
    } else if (bytes == 2) {
      mem[0] = data[0] & 0xff;
      mem[1] = (data[0] >> 8) & 0xff;
    } else {
      offset = 0;
      for (t = 0; t < bytes/sizeof(int); t++) {
        w = data[t];
        mem[offset++] = w & 255;
        mem[offset++] = (w >> 8) & 255;
        mem[offset++] = (w >> 16) & 255;
        mem[offset++] = (w >> 24) & 255;
      }
    }
  } else {
    if (bytes == 1) {
      mem[0] = data[0] & 0xff;
    } else if (bytes == 2) {
      mem[1] = data[0] & 0xff;
      mem[0] = (data[0] >> 8) & 0xff;
    } else {
      offset = 0;
      for (t = bytes/sizeof(int) - 1; t >= 0; t--) {
        w = data[t];
        mem[offset++] = (w >> 24) & 255;
        mem[offset++] = (w >> 16) & 255;
        mem[offset++] = (w >> 8) & 255;
        mem[offset++] = w & 255;
      }
    }
  }
}
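The byte ordering handled by TieMemRead and TieMemWrite above can be illustrated for a single 32-bit word. This is a simplified sketch under stated assumptions: it works on a caller-supplied byte buffer instead of a raw address, it ignores the PIF width and alignment handling of the stub, and the names `byte_order`, `store_word`, `load_word` and `byte_order_example` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical selector standing in for IsaMemoryOrder. */
enum byte_order { BO_LITTLE, BO_BIG };

/* Bit position of byte i (0..3) of a word in the given order. */
static int byte_shift(int i, enum byte_order order) {
    return (order == BO_LITTLE) ? 8 * i : 8 * (3 - i);
}

/* Store a 32-bit word into four consecutive bytes. */
void store_word(uint8_t *mem, uint32_t w, enum byte_order order) {
    for (int i = 0; i < 4; i++) {
        mem[i] = (uint8_t)((w >> byte_shift(i, order)) & 0xff);
    }
}

/* Reassemble a 32-bit word from four consecutive bytes. */
uint32_t load_word(const uint8_t *mem, enum byte_order order) {
    uint32_t w = 0;
    for (int i = 0; i < 4; i++) {
        w |= (uint32_t)mem[i] << byte_shift(i, order);
    }
    return w;
}

/* Usage: the same word lands in memory in opposite byte orders. */
void byte_order_example(void) {
    uint8_t buf[4];
    store_word(buf, 0x11223344u, BO_LITTLE);
    assert(buf[0] == 0x44 && buf[3] == 0x11);
    store_word(buf, 0x11223344u, BO_BIG);
    assert(buf[0] == 0x11 && buf[3] == 0x44);
    assert(load_word(buf, BO_BIG) == 0x11223344u);
}
```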
#define GetState(_s, _n) _s = _n
#define SetState(_n, _s) _n = _s
V1_t STATE_gfmod;
V1_t VAddr = {{0}};
V1_t VAddrBase = {{0}};
V1_t VAddrOffset = {{0}};
V1_t VAddrIndex = {{0}};
V1_t VAddrIn = {{0}};
V1_t LSSize = {{0}};
V1_t LSIndexed = {{0}};
V4_t MemDataIn128 = {{0,0,0,0}};
V2_t MemDataIn64 = {{0,0}};
V1_t MemDataIn32 = {{0}};
V1_t MemDataIn16 = {{0}};
V1_t MemDataIn8 = {{0}};
V4_t MemDataOut128 = {{0,0,0,0}};
V2_t MemDataOut64 = {{0,0}};
V1_t MemDataOut32 = {{0}};
V1_t MemDataOut16 = {{0}};
V1_t MemDataOut8 = {{0}};
V1_t Exception = {{0}};
V1_t ExcCause = {{0}};
V1_t CPEnable = {{0}};
void
VAddrIn_get(void)
{
  if (LSIndexed.data[0] != 0) {
    VAddrIn.data[0] = VAddrBase.data[0] + VAddrIndex.data[0];
  } else {
    VAddrIn.data[0] = VAddrBase.data[0] + VAddrOffset.data[0];
  }
}
void
MemDataIn128_get(void)
{
  unsigned data[4];
  if ((!tie_load_instruction) || (LSSize.data[0] != 16)) {
    return;
  }
  if (PIFReadDataBits < 128) {
    fprintf(stderr, "Error: not configured to read 16 bytes\n");
    exit(-1);
  }
  VAddrIn_get();
  TieMemRead(&data[0], VAddrIn.data[0]);
  MemDataIn128.data[0] = data[0];
  MemDataIn128.data[1] = data[1];
  MemDataIn128.data[2] = data[2];
  MemDataIn128.data[3] = data[3];
}
void
MemDataIn64_get(void)
{
  unsigned data[4];
  if ((!tie_load_instruction) || (LSSize.data[0] != 8)) {
    return;
  }
  if (PIFReadDataBits < 64) {
    fprintf(stderr, "Error: not configured to read 8 bytes\n");
    exit(-1);
  }
  VAddrIn_get();
  TieMemRead(&data[0], VAddrIn.data[0]);
  if (IsaMemoryOrder == LittleEndian) {
    MemDataIn64.data[0] = data[0];
    MemDataIn64.data[1] = data[1];
  } else if (PIFReadDataBits == 64) {
    MemDataIn64.data[0] = data[0];
    MemDataIn64.data[1] = data[1];
  } else {
    MemDataIn64.data[0] = data[2];
    MemDataIn64.data[1] = data[3];
  }
}
void
MemDataIn32_get(void)
{
  unsigned data[4];
  if ((!tie_load_instruction) || (LSSize.data[0] != 4)) {
    return;
  }
  if (PIFReadDataBits < 32) {
    fprintf(stderr, "Error: not configured to read 4 bytes\n");
    exit(-1);
  }
  VAddrIn_get();
  TieMemRead(&data[0], VAddrIn.data[0]);
  if (IsaMemoryOrder == LittleEndian) {
    MemDataIn32.data[0] = data[0];
  } else if (PIFReadDataBits == 32) {
    MemDataIn32.data[0] = data[0];
  } else if (PIFReadDataBits == 64) {
    MemDataIn32.data[0] = data[1];
  } else {
    MemDataIn32.data[0] = data[3];
  }
}
void
MemDataIn16_get(void)
{
  unsigned data[4];
  if ((!tie_load_instruction) || (LSSize.data[0] != 2)) {
    return;
  }
  if (PIFReadDataBits < 16) {
    fprintf(stderr, "Error: not configured to read 2 bytes\n");
    exit(-1);
  }
  VAddrIn_get();
  TieMemRead(&data[0], VAddrIn.data[0]);
  if (IsaMemoryOrder == LittleEndian) {
    MemDataIn16.data[0] = data[0] & 0xffff;
  } else if (PIFReadDataBits == 32) {
    MemDataIn16.data[0] = data[0] >> 16;
  } else if (PIFReadDataBits == 64) {
    MemDataIn16.data[0] = data[1] >> 16;
  } else {
    MemDataIn16.data[0] = data[3] >> 16;
  }
}
void
MemDataIn8_get(void)
{
  unsigned data[4];
  if ((!tie_load_instruction) || (LSSize.data[0] != 1)) {
    return;
  }
  if (PIFReadDataBits < 8) {
    fprintf(stderr, "Error: not configured to read 1 byte\n");
    exit(-1);
  }
  VAddrIn_get();
  TieMemRead(&data[0], VAddrIn.data[0]);
  if (IsaMemoryOrder == LittleEndian) {
    MemDataIn8.data[0] = data[0] & 0xff;
  } else if (PIFReadDataBits == 32) {
    MemDataIn8.data[0] = data[0] >> 24;
  } else if (PIFReadDataBits == 64) {
    MemDataIn8.data[0] = data[1] >> 24;
  } else {
    MemDataIn8.data[0] = data[3] >> 24;
  }
}
void
MemDataOut128_set(void)
{
  if (LSSize.data[0] != 16) {
    return;
  }
  VAddrIn_get();
  TieMemWrite(VAddrIn.data[0] & ~0xf, 16, &MemDataOut128.data[0]);
}
void
MemDataOut64_set(void)
{
  if (LSSize.data[0] != 8) {
    return;
  }
  VAddrIn_get();
  TieMemWrite(VAddrIn.data[0] & ~0x7, 8, &MemDataOut64.data[0]);
}
void
MemDataOut32_set(void)
{
  if (LSSize.data[0] != 4) {
    return;
  }
  VAddrIn_get();
  TieMemWrite(VAddrIn.data[0] & ~0x3, 4, &MemDataOut32.data[0]);
}
void
MemDataOut16_set(void)
{
  if (LSSize.data[0] != 2) {
    return;
  }
  VAddrIn_get();
  TieMemWrite(VAddrIn.data[0] & ~0x1, 2, &MemDataOut16.data[0]);
}
void
MemDataOut8_set(void)
{
  if (LSSize.data[0] != 1) {
    return;
  }
  VAddrIn_get();
  TieMemWrite(VAddrIn.data[0], 1, &MemDataOut8.data[0]);
}
void
Exception_set (void)
{
/* Exception handling is not supported in native mode */
}
void
CPEnable_get (void)
{
  CPEnable.data[0] = 0xff; /* always enabled in native C mode */
}
#define RUR(n) ({ \
  int v; \
  switch (n) { \
  case 0: \
    v = RUR0(); break; \
  default: \
    fprintf(stderr, "Error: invalid rur number %d\n", n); \
    exit(-1); \
  } \
  v; \
})
#define WUR(v, n) \
  switch (n) { \
  case 0: \
    WUR0(v); break; \
  default: \
    fprintf(stderr, "Error: invalid wur number %d\n", n); \
    exit(-1); \
  }
gf8
GFADD8(gf8 gs_, gf8 gt_)
{
  /* operand variables */
  V1_t gr_o;
  V1_t gs_i;
  V1_t gt_i;
  /* unused operand variables */
  /* operand kill variables */
  V1_t gr_kill_o = {{0}};
  /* one-hot instruction signals */
  V1_t GFADD8 = {{1}};
  /* state variables */
  /* local wire variables */
  /* initialize in/inout operands */
  gs_i.data[0] = gs_;
  gt_i.data[0] = gt_;
  tie_load_instruction = 0;
  /* semantic statements */
  gr_o.data[0] = (gs_i.data[0] ^ gt_i.data[0]) & 0xff;
  gr_kill_o.data[0] = (0 & GFADD8.data[0]) & 0x1;
  /* write-back inout operands */
  /* return the output operand */
  return gr_o.data[0];
}
gf8
GFADD8I(gf8 gs_, int imm4_)
{
  /* operand variables */
  V1_t gr_o;
  V1_t gs_i;
  V1_t imm4;
  /* unused operand variables */
  /* operand kill variables */
  V1_t gr_kill_o = {{0}};
  /* one-hot instruction signals */
  V1_t GFADD8I = {{1}};
  /* state variables */
  /* local wire variables */
  /* initialize in/inout operands */
  gs_i.data[0] = gs_;
  imm4.data[0] = imm4_;
  tie_load_instruction = 0;
  /* semantic statements */
  gr_o.data[0] = (gs_i.data[0] ^ imm4.data[0]) & 0xff;
  gr_kill_o.data[0] = (0 & GFADD8I.data[0]) & 0x1;
  /* write-back inout operands */
  /* return the output operand */
  return gr_o.data[0];
}
gf8
GFMULX8(gf8 gs_)
{
  /* operand variables */
  V1_t gr_o;
  V1_t gs_i;
  /* unused operand variables */
  /* operand kill variables */
  V1_t gr_kill_o = {{0}};
  /* one-hot instruction signals */
  V1_t GFMULX8 = {{1}};
  /* state variables */
  V1_t gfmod_ps;
  /* local wire variables */
  V1_t tmp5;
  V1_t tmp4;
  V1_t tmp3;
  V1_t tmp2;
  V1_t tmp1;
  V1_t tmp0;
  /* get input state values */
  GetState(gfmod_ps, STATE_gfmod);
  /* initialize in/inout operands */
  gs_i.data[0] = gs_;
  tie_load_instruction = 0;
  /* semantic statements */
  tmp0.data[0] = (((gs_i.data[0] << 24) >> 31)) & 0x1;
  tmp1.data[0] = ((gs_i.data[0] & 0x7f)) & 0x7f;
  tmp2.data[0] = ((tmp1.data[0] << 1) | 0) & 0xff;
  tmp3.data[0] = (tmp2.data[0] ^ gfmod_ps.data[0]) & 0xff;
  tmp4.data[0] = ((gs_i.data[0] & 0x7f)) & 0x7f;
  tmp5.data[0] = ((tmp4.data[0] << 1) | 0) & 0xff;
  gr_o.data[0] = ((tmp0.data[0]) ? tmp3.data[0] : tmp5.data[0]) & 0xff;
  gr_kill_o.data[0] = (0 & GFMULX8.data[0]) & 0x1;
  /* write-back inout operands */
  /* return the output operand */
  return gr_o.data[0];
}
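The generated GFMULX8 stub above extracts bit 7 of its operand with a left shift followed by a right shift on a 32-bit unsigned value, which is how a Verilog bit select such as gs[7] is translated. The sketch below shows that idiom next to the more familiar shift-and-mask form and generalizes it to an arbitrary bit range. The function names are hypothetical, and the code assumes 32-bit unsigned arithmetic via `uint32_t`.

```c
#include <assert.h>
#include <stdint.h>

/* Form used in the stub: move bit 7 up to bit 31, then down to bit 0. */
uint32_t bit7_shift_pair(uint32_t x) {
    return ((x << 24) >> 31) & 0x1u;
}

/* Equivalent direct form. */
uint32_t bit7_direct(uint32_t x) {
    return (x >> 7) & 0x1u;
}

/* General extraction of bits [hi:lo], with 0 <= lo <= hi <= 31.
   The left shift discards bits above hi; the logical right shift
   discards bits below lo and right-aligns the field. */
uint32_t extract_bits(uint32_t x, unsigned hi, unsigned lo) {
    assert(lo <= hi && hi <= 31);
    return (x << (31 - hi)) >> (31 - hi + lo);
}
```

The shift-pair form only works as intended on unsigned operands, where the right shift is logical; on a signed value the result of the right shift would be implementation-defined.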
#define GFRWMOD8(gt) \
  GFRWMOD8_func(&(gt))
void
GFRWMOD8_func(gf8 *gt_)
{
  /* operand variables */
  V1_t gt_o;
  V1_t gt_i;
  /* unused operand variables */
  /* operand kill variables */
  V1_t gt_kill_o = {{0}};
  /* one-hot instruction signals */
  V1_t GFRWMOD8 = {{1}};
  /* state variables */
  V1_t gfmod_ps;
  V1_t gfmod_ns;
  V1_t gfmod_kill_ns;
  /* local wire variables */
  V1_t t1;
  V1_t t2;
  /* get input state values */
  GetState(gfmod_ps, STATE_gfmod);
  /* initialize in/inout operands */
  gt_i.data[0] = *gt_;
  tie_load_instruction = 0;
  /* semantic statements */
  t1.data[0] = gt_i.data[0] & 0xff;
  t2.data[0] = gfmod_ps.data[0] & 0xff;
  gfmod_ns.data[0] = t1.data[0] & 0xff;
  gt_o.data[0] = t2.data[0] & 0xff;
  gfmod_kill_ns.data[0] = (0 & GFRWMOD8.data[0]) & 0x1;
  gt_kill_o.data[0] = (0 & GFRWMOD8.data[0]) & 0x1;
  /* write-back inout operands */
  if (!gt_kill_o.data[0]) *gt_ = gt_o.data[0];
  /* update out/inout states */
  if (!gfmod_kill_ns.data[0]) SetState(STATE_gfmod, gfmod_ns);
}
gf8
LGF8_I(unsigned ars_, int imm8_)
{
  /* operand variables */
  V1_t gt_o;
  V1_t ars_i;
  V1_t imm8;
  /* unused operand variables */
  V1_t ars_o;
  V1_t gr_o;
  V1_t art_i = {{0}};
  /* operand kill variables */
  V1_t gt_kill_o = {{0}};
  V1_t ars_kill_o = {{0}};
  V1_t gr_kill_o = {{0}};
  /* one-hot instruction signals */
  V1_t LGF8_I = {{1}};
  V1_t LGF8_IU = {{0}};
  V1_t LGF8_X = {{0}};
  V1_t LGF8_XU = {{0}};
  /* state variables */
  /* local wire variables */
  V1_t tmp2;
  V1_t tmp1;
  V1_t tmp0;
  V1_t indexed;
  /* initialize in/inout operands */
  ars_i = *((V1_t *) &ars_);
  imm8.data[0] = imm8_;
  tie_load_instruction = 1;
  /* semantic statements */
  indexed.data[0] = (LGF8_X.data[0] | LGF8_XU.data[0]) & 0x1;
  LSSize.data[0] = 0x1 & 0x1f;
  VAddrBase.data[0] = ars_i.data[0];
  LSIndexed.data[0] = indexed.data[0] & 0x1;
  VAddrOffset.data[0] = imm8.data[0];
  VAddrIndex.data[0] = art_i.data[0];
  MemDataIn8_get();
  gt_o.data[0] = MemDataIn8.data[0] & 0xff;
  MemDataIn8_get();
  gr_o.data[0] = MemDataIn8.data[0] & 0xff;
  VAddrIn_get();
  ars_o.data[0] = VAddrIn.data[0];
  tmp0.data[0] = (LGF8_I.data[0] | LGF8_IU.data[0]) & 0x1;
  gt_kill_o.data[0] = (0 & tmp0.data[0]) & 0x1;
  tmp1.data[0] = (LGF8_IU.data[0] | LGF8_XU.data[0]) & 0x1;
  ars_kill_o.data[0] = (0 & tmp1.data[0]) & 0x1;
  tmp2.data[0] = (LGF8_X.data[0] | LGF8_XU.data[0]) & 0x1;
  gr_kill_o.data[0] = (0 & tmp2.data[0]) & 0x1;
  /* write-back inout operands */
  /* update output interface signals */
  /* return the output operand */
  return gt_o.data[0];
}
#define LGF8_IU(ars, imm8) \
  LGF8_IU_func(&(ars), imm8)
gf8
LGF8_IU_func(unsigned *ars_, int imm8_)
{
  /* operand variables */
  V1_t gt_o;
  V1_t ars_o;
  V1_t ars_i;
  V1_t imm8;
  /* unused operand variables */
  V1_t gr_o;
  V1_t art_i = {{0}};
  /* operand kill variables */
  V1_t gt_kill_o = {{0}};
  V1_t ars_kill_o = {{0}};
  V1_t gr_kill_o = {{0}};
  /* one-hot instruction signals */
  V1_t LGF8_I = {{0}};
  V1_t LGF8_IU = {{1}};
  V1_t LGF8_X = {{0}};
  V1_t LGF8_XU = {{0}};
  /* state variables */
  /* local wire variables */
  V1_t tmp2;
  V1_t tmp1;
  V1_t tmp0;
  V1_t indexed;
  /* initialize in/inout operands */
  ars_i = *((V1_t *) ars_);
  imm8.data[0] = imm8_;
  tie_load_instruction = 1;
  /* semantic statements */
  indexed.data[0] = (LGF8_X.data[0] | LGF8_XU.data[0]) & 0x1;
  LSSize.data[0] = 0x1 & 0x1f;
  VAddrBase.data[0] = ars_i.data[0];
  LSIndexed.data[0] = indexed.data[0] & 0x1;
  VAddrOffset.data[0] = imm8.data[0];
  VAddrIndex.data[0] = art_i.data[0];
  MemDataIn8_get();
  gt_o.data[0] = MemDataIn8.data[0] & 0xff;
  MemDataIn8_get();
  gr_o.data[0] = MemDataIn8.data[0] & 0xff;
  VAddrIn_get();
  ars_o.data[0] = VAddrIn.data[0];
  tmp0.data[0] = (LGF8_I.data[0] | LGF8_IU.data[0]) & 0x1;
  gt_kill_o.data[0] = (0 & tmp0.data[0]) & 0x1;
  tmp1.data[0] = (LGF8_IU.data[0] | LGF8_XU.data[0]) & 0x1;
  ars_kill_o.data[0] = (0 & tmp1.data[0]) & 0x1;
  tmp2.data[0] = (LGF8_X.data[0] | LGF8_XU.data[0]) & 0x1;
  gr_kill_o.data[0] = (0 & tmp2.data[0]) & 0x1;
  /* write-back inout operands */
  if (!ars_kill_o.data[0]) *ars_ = *((unsigned *) &ars_o);
  /* update output interface signals */
  /* return the output operand */
  return gt_o.data[0];
}
gf8
LGF8_X(unsigned ars_, unsigned art_)
{
  /* operand variables */
  V1_t gr_o;
  V1_t ars_i;
  V1_t art_i;
  /* unused operand variables */
  V1_t gt_o;
  V1_t ars_o;
  V1_t imm8 = {{0}};
  /* operand kill variables */
  V1_t gt_kill_o = {{0}};
  V1_t ars_kill_o = {{0}};
  V1_t gr_kill_o = {{0}};
  /* one-hot instruction signals */
  V1_t LGF8_I = {{0}};
  V1_t LGF8_IU = {{0}};
  V1_t LGF8_X = {{1}};
  V1_t LGF8_XU = {{0}};
  /* state variables */
  /* local wire variables */
  V1_t tmp2;
  V1_t tmp1;
  V1_t tmp0;
  V1_t indexed;
  /* initialize in/inout operands */
  ars_i = *((V1_t *) &ars_);
  art_i = *((V1_t *) &art_);
  tie_load_instruction = 1;
  /* semantic statements */
  indexed.data[0] = (LGF8_X.data[0] | LGF8_XU.data[0]) & 0x1;
  LSSize.data[0] = 0x1 & 0x1f;
  VAddrBase.data[0] = ars_i.data[0];
  LSIndexed.data[0] = indexed.data[0] & 0x1;
  VAddrOffset.data[0] = imm8.data[0];
  VAddrIndex.data[0] = art_i.data[0];
  MemDataIn8_get();
  gt_o.data[0] = MemDataIn8.data[0] & 0xff;
  MemDataIn8_get();
  gr_o.data[0] = MemDataIn8.data[0] & 0xff;
  VAddrIn_get();
  ars_o.data[0] = VAddrIn.data[0];
  tmp0.data[0] = (LGF8_I.data[0] | LGF8_IU.data[0]) & 0x1;
  gt_kill_o.data[0] = (0 & tmp0.data[0]) & 0x1;
  tmp1.data[0] = (LGF8_IU.data[0] | LGF8_XU.data[0]) & 0x1;
  ars_kill_o.data[0] = (0 & tmp1.data[0]) & 0x1;
  tmp2.data[0] = (LGF8_X.data[0] | LGF8_XU.data[0]) & 0x1;
  gr_kill_o.data[0] = (0 & tmp2.data[0]) & 0x1;
  /* write-back inout operands */
  /* update output interface signals */
  /* return the output operand */
  return gr_o.data[0];
}
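The load stubs above all evaluate the same shared semantic and differ only in which one-hot instruction signal is set, which in turn selects indexed versus immediate addressing and whether the base register is written back. A condensed sketch of that behavior follows. It is an illustration only: memory is modeled as a bounds-checked byte array instead of native addresses, and `load_result`, `gf_load` and `gf_load_example` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Result of one load: the byte read and the base register value
   after the instruction completes. */
struct load_result {
    uint8_t  value;
    uint32_t next_base;
};

/* indexed: nonzero for the .X/.XU forms (address = base + index),
            zero for the .I/.IU forms (address = base + offset).
   update:  nonzero for the .IU/.XU forms (base <- address). */
struct load_result gf_load(const uint8_t *mem, uint32_t mem_size,
                           uint32_t base, uint32_t index, int32_t offset,
                           int indexed, int update) {
    struct load_result r;
    uint32_t vaddr = base + (indexed ? index : (uint32_t)offset);
    assert(vaddr < mem_size);          /* model-only bounds check */
    r.value = mem[vaddr];
    r.next_base = update ? vaddr : base;
    return r;
}

/* Usage: an immediate load leaves the base alone; an update form
   advances it to the accessed address. */
void gf_load_example(void) {
    const uint8_t mem[4] = {10, 20, 30, 40};
    struct load_result a = gf_load(mem, 4, 1, 0, 2, 0, 0);  /* like LGF8.I  */
    struct load_result b = gf_load(mem, 4, 1, 0, 1, 0, 1);  /* like LGF8.IU */
    assert(a.value == 40 && a.next_base == 1);
    assert(b.value == 30 && b.next_base == 2);
}
```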
#define LGF8_XU(ars, art) \
  LGF8_XU_func(&(ars), art)
gf8
LGF8_XU_func(unsigned *ars_, unsigned art_)
{
  /* operand variables */
  V1_t gr_o;
  V1_t ars_o;
  V1_t ars_i;
  V1_t art_i;
  /* unused operand variables */
  V1_t gt_o;
  V1_t imm8 = {{0}};
  /* operand kill variables */
  V1_t gt_kill_o = {{0}};
  V1_t ars_kill_o = {{0}};
  V1_t gr_kill_o = {{0}};
  /* one-hot instruction signals */
  V1_t LGF8_I = {{0}};
  V1_t LGF8_IU = {{0}};
  V1_t LGF8_X = {{0}};
  V1_t LGF8_XU = {{1}};
  /* state variables */
  /* local wire variables */
  V1_t tmp2;
  V1_t tmp1;
  V1_t tmp0;
  V1_t indexed;
  /* initialize in/inout operands */
  ars_i = *((V1_t *) ars_);
  art_i = *((V1_t *) &art_);
  tie_load_instruction = 1;
  /* semantic statements */
  indexed.data[0] = (LGF8_X.data[0] | LGF8_XU.data[0]) & 0x1;
  LSSize.data[0] = 0x1 & 0x1f;
  VAddrBase.data[0] = ars_i.data[0];
  LSIndexed.data[0] = indexed.data[0] & 0x1;
  VAddrOffset.data[0] = imm8.data[0];
  VAddrIndex.data[0] = art_i.data[0];
  MemDataIn8_get();
  gt_o.data[0] = MemDataIn8.data[0] & 0xff;
  MemDataIn8_get();
  gr_o.data[0] = MemDataIn8.data[0] & 0xff;
  VAddrIn_get();
  ars_o.data[0] = VAddrIn.data[0];
  tmp0.data[0] = (LGF8_I.data[0] | LGF8_IU.data[0]) & 0x1;
  gt_kill_o.data[0] = (0 & tmp0.data[0]) & 0x1;
  tmp1.data[0] = (LGF8_IU.data[0] | LGF8_XU.data[0]) & 0x1;
  ars_kill_o.data[0] = (0 & tmp1.data[0]) & 0x1;
  tmp2.data[0] = (LGF8_X.data[0] | LGF8_XU.data[0]) & 0x1;
  gr_kill_o.data[0] = (0 & tmp2.data[0]) & 0x1;
  /* write-back inout operands */
  if (!ars_kill_o.data[0]) *ars_ = *((unsigned *) &ars_o);
  /* update output interface signals */
  /* return the output operand */
  return gr_o.data[0];
}
void
SGF8_I(gf8 gt_, unsigned ars_, int imm8_)
{
  /* operand variables */
  V1_t gt_i;
  V1_t ars_i;
  V1_t imm8;
  /* unused operand variables */
  V1_t ars_o;
  V1_t gr_i = {{0}};
  V1_t art_i = {{0}};
  /* operand kill variables */
  V1_t ars_kill_o = {{0}};
  /* one-hot instruction signals */
  V1_t SGF8_IU = {{0}};
  V1_t SGF8_X = {{0}};
  V1_t SGF8_XU = {{0}};
  /* state variables */
  /* local wire variables */
  V1_t tmp1;
  V1_t tmp0;
  V1_t indexed;
  /* initialize in/inout operands */
  gt_i.data[0] = gt_;
  ars_i = *((V1_t *) &ars_);
  imm8.data[0] = imm8_;
  tie_load_instruction = 0;
  /* semantic statements */
  indexed.data[0] = (SGF8_X.data[0] | SGF8_XU.data[0]) & 0x1;
  LSSize.data[0] = 0x1 & 0x1f;
  VAddrBase.data[0] = ars_i.data[0];
  LSIndexed.data[0] = indexed.data[0] & 0x1;
  VAddrOffset.data[0] = imm8.data[0];
  VAddrIndex.data[0] = art_i.data[0];
  tmp0.data[0] = (SGF8_X.data[0] | SGF8_XU.data[0]) & 0x1;
  MemDataOut8.data[0] = ((tmp0.data[0]) ? gr_i.data[0] : gt_i.data[0]) & 0xff;
  VAddrIn_get();
  ars_o.data[0] = VAddrIn.data[0];
  tmp1.data[0] = (SGF8_IU.data[0] | SGF8_XU.data[0]) & 0x1;
  ars_kill_o.data[0] = (0 & tmp1.data[0]) & 0x1;
  /* write-back inout operands */
  /* update output interface signals */
  MemDataOut8_set();
}
#define SGF8_IU(gt, ars, imm8) \
  SGF8_IU_func(gt, &(ars), imm8)
void
SGF8_IU_func(gf8 gt_, unsigned *ars_, int imm8_)
{
  /* operand variables */
  V1_t gt_i;
  V1_t ars_o;
  V1_t ars_i;
  V1_t imm8;
  /* unused operand variables */
  V1_t gr_i = {{0}};
  V1_t art_i = {{0}};
  /* operand kill variables */
  V1_t ars_kill_o = {{0}};
  /* one-hot instruction signals */
  V1_t SGF8_IU = {{1}};
  V1_t SGF8_X = {{0}};
  V1_t SGF8_XU = {{0}};
  /* state variables */
  /* local wire variables */
  V1_t tmp1;
  V1_t tmp0;
  V1_t indexed;
  /* initialize in/inout operands */
  gt_i.data[0] = gt_;
  ars_i = *((V1_t *) ars_);
  imm8.data[0] = imm8_;
  tie_load_instruction = 0;
  /* semantic statements */
  indexed.data[0] = (SGF8_X.data[0] | SGF8_XU.data[0]) & 0x1;
  LSSize.data[0] = 0x1 & 0x1f;
  VAddrBase.data[0] = ars_i.data[0];
  LSIndexed.data[0] = indexed.data[0] & 0x1;
  VAddrOffset.data[0] = imm8.data[0];
  VAddrIndex.data[0] = art_i.data[0];
  tmp0.data[0] = (SGF8_X.data[0] | SGF8_XU.data[0]) & 0x1;
  MemDataOut8.data[0] = ((tmp0.data[0]) ? gr_i.data[0] : gt_i.data[0]) & 0xff;
  VAddrIn_get();
  ars_o.data[0] = VAddrIn.data[0];
  tmp1.data[0] = (SGF8_IU.data[0] | SGF8_XU.data[0]) & 0x1;
  ars_kill_o.data[0] = (0 & tmp1.data[0]) & 0x1;
  /* write-back inout operands */
  if (!ars_kill_o.data[0]) *ars_ = *((unsigned *) &ars_o);
  /* update output interface signals */
  MemDataOut8_set();
}
void
SGF8_X(gf8 gr_, unsigned ars_, unsigned art_)
{
  /* operand variables */
  V1_t gr_i;
  V1_t ars_i;
  V1_t art_i;
  /* unused operand variables */
  V1_t gt_i = {{0}};
  V1_t ars_o;
  V1_t imm8 = {{0}};
  /* operand kill variables */
  V1_t ars_kill_o = {{0}};
  /* one-hot instruction signals */
  V1_t SGF8_IU = {{0}};
  V1_t SGF8_X = {{1}};
  V1_t SGF8_XU = {{0}};
  /* state variables */
  /* local wire variables */
  V1_t tmp1;
  V1_t tmp0;
  V1_t indexed;
  /* initialize in/inout operands */
  gr_i.data[0] = gr_;
  ars_i = *((V1_t *) &ars_);
  art_i = *((V1_t *) &art_);
  tie_load_instruction = 0;
  /* semantic statements */
  indexed.data[0] = (SGF8_X.data[0] | SGF8_XU.data[0]) & 0x1;
  LSSize.data[0] = 0x1 & 0x1f;
  VAddrBase.data[0] = ars_i.data[0];
  LSIndexed.data[0] = indexed.data[0] & 0x1;
  VAddrOffset.data[0] = imm8.data[0];
  VAddrIndex.data[0] = art_i.data[0];
  tmp0.data[0] = (SGF8_X.data[0] | SGF8_XU.data[0]) & 0x1;
  MemDataOut8.data[0] = ((tmp0.data[0]) ? gr_i.data[0] : gt_i.data[0]) & 0xff;
  VAddrIn_get();
  ars_o.data[0] = VAddrIn.data[0];
  tmp1.data[0] = (SGF8_IU.data[0] | SGF8_XU.data[0]) & 0x1;
  ars_kill_o.data[0] = (0 & tmp1.data[0]) & 0x1;
  /* write-back inout operands */
  /* update output interface signals */
  MemDataOut8_set();
}
#define SGF8_XU(gr, ars, art) \
  SGF8_XU_func(gr, &(ars), art)
void
SGF8_XU_func(gf8 gr_, unsigned *ars_, unsigned art_)
{
  /* operand variables */
  V1_t gr_i;
  V1_t ars_o;
  V1_t ars_i;
  V1_t art_i;
  /* unused operand variables */
  V1_t gt_i = {{0}};
  V1_t imm8 = {{0}};
  /* operand kill variables */
  V1_t ars_kill_o = {{0}};
  /* one-hot instruction signals */
  V1_t SGF8_IU = {{0}};
  V1_t SGF8_X = {{0}};
  V1_t SGF8_XU = {{1}};
  /* state variables */
  /* local wire variables */
  V1_t tmp1;
  V1_t tmp0;
  V1_t indexed;
  /* initialize in/inout operands */
  gr_i.data[0] = gr_;
  ars_i = *((V1_t *) ars_);
  art_i = *((V1_t *) &art_);
  tie_load_instruction = 0;
  /* semantic statements */
  indexed.data[0] = (SGF8_X.data[0] | SGF8_XU.data[0]) & 0x1;
  LSSize.data[0] = 0x1 & 0x1f;
  VAddrBase.data[0] = ars_i.data[0];
  LSIndexed.data[0] = indexed.data[0] & 0x1;
  VAddrOffset.data[0] = imm8.data[0];
  VAddrIndex.data[0] = art_i.data[0];
  tmp0.data[0] = (SGF8_X.data[0] | SGF8_XU.data[0]) & 0x1;
  MemDataOut8.data[0] = ((tmp0.data[0]) ? gr_i.data[0] : gt_i.data[0]) & 0xff;
  VAddrIn_get();
  ars_o.data[0] = VAddrIn.data[0];
  tmp1.data[0] = (SGF8_IU.data[0] | SGF8_XU.data[0]) & 0x1;
  ars_kill_o.data[0] = (0 & tmp1.data[0]) & 0x1;
  /* write-back inout operands */
  if (!ars_kill_o.data[0]) *ars_ = *((unsigned *) &ars_o);
  /* update output interface signals */
  MemDataOut8_set();
}
unsigned
RUR0()
{
  /* operand variables */
  V1_t arr_o;
  /* unused operand variables */
  /* operand kill variables */
  V1_t arr_kill_o = {{0}};
  /* one-hot instruction signals */
  V1_t RUR0 = {{1}};
  /* state variables */
  V1_t gfmod_ps;
  /* local wire variables */
  /* get input state values */
  GetState(gfmod_ps, STATE_gfmod);
  /* initialize in/inout operands */
  tie_load_instruction = 0;
  /* semantic statements */
  arr_o.data[0] = gfmod_ps.data[0];
  arr_kill_o.data[0] = (0 & RUR0.data[0]) & 0x1;
  /* write-back inout operands */
  /* return the output operand */
  return *((unsigned *) &arr_o);
}
void
WUR0(unsigned art_)
{
  /* operand variables */
  V1_t art_i;
  /* unused operand variables */
  /* operand kill variables */
  /* one-hot instruction signals */
  V1_t WUR0 = {{1}};
  /* state variables */
  V1_t gfmod_ns;
  V1_t gfmod_kill_ns;
  /* local wire variables */
  V1_t tmp0;
  /* initialize in/inout operands */
  art_i = *((V1_t *) &art_);
  tie_load_instruction = 0;
  /* semantic statements */
  tmp0.data[0] = ((art_i.data[0] & 0xff)) & 0xff;
  gfmod_ns.data[0] = (tmp0.data[0]) & 0xff;
  gfmod_kill_ns.data[0] = (0 & WUR0.data[0]) & 0x1;
  /* write-back inout operands */
  /* update out/inout states */
  if (!gfmod_kill_ns.data[0]) SetState(STATE_gfmod, gfmod_ns);
}
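The RUR and WUR macros in this stub dispatch on the user-register number, and RUR is written as a GNU C statement expression, which is a compiler extension. A portable alternative can be sketched with ordinary functions, as below. This is an assumption-laden illustration, not part of the generated stub: `state_gfmod` stands in for STATE_gfmod, and the lower-case function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for the 8-bit gfmod state register. */
static unsigned state_gfmod;

/* User register 0 maps to gfmod; writes keep only the low 8 bits. */
unsigned rur0(void) { return state_gfmod; }
void wur0(unsigned v) { state_gfmod = v & 0xff; }

/* Read user register n; unknown numbers are a fatal error. */
unsigned rur(int n) {
    switch (n) {
    case 0:
        return rur0();
    default:
        fprintf(stderr, "Error: invalid rur number %d\n", n);
        exit(-1);
    }
}

/* Write v to user register n; unknown numbers are a fatal error. */
void wur(unsigned v, int n) {
    switch (n) {
    case 0:
        wur0(v);
        break;
    default:
        fprintf(stderr, "Error: invalid wur number %d\n", n);
        exit(-1);
    }
}

/* Usage: the value read back is truncated to the register width. */
void user_register_example(void) {
    wur(0x11b, 0);
    assert(rur(0) == 0x1b);
}
```

Functions also avoid a hazard of the WUR macro form, which expands to a bare switch statement and so cannot be used safely as the body of an unbraced if/else.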
#define gf8_loadi(_s, o) ({ \
  gf8 t; \
  gf8 *s = _s; \
  gf8 LGF8_I_return; \
  LGF8_I_return = LGF8_I(*((unsigned *)&(s)), *((int *)&(o))); \
  t = *((gf8 *)&LGF8_I_return); \
  t; \
})
#define gf8_storei(_t, _s, o) ({ \
  gf8 t = _t; \
  gf8 *s = _s; \
  SGF8_I(*((gf8 *)&(t)), *((unsigned *)&(s)), *((int *)&(o))); \
})
[Figure imgf000105_0001: the gf8_move macro definition is partly replaced by an image in the source; only the following fragments survive]
#def { \
(*((gf8 *)&(s)), *((gf8 *)&(0))); \
eturn); \
})
#ifdef TIE_DEBUG
#undef gf8_loadi
#undef gf8_storei
#undef gf8_move
#undef GFADD8
#undef GFADD8I
#undef GFMULX8
#undef GFRWMOD8
#undef LGF8_I
#undef SGF8_I
#undef LGF8_IU
#undef SGF8_IU
#undef LGF8_X
#undef SGF8_X
#undef LGF8_XU
#undef SGF8_XU
#undef RUR0
#undef WUR0
#endif
#endif
cstub-gf-ref.c
#ifndef XTENSA
#ifdef TIE_DEBUG
#define gf8_loadi TIE_gf8_loadi
#define gf8_storei TIE_gf8_storei
#define gf8_move TIE_gf8_move
#define GFADD8 TIE_GFADD8
#define GFADD8I TIE_GFADD8I
#define GFMULX8 TIE_GFMULX8
#define GFRWMOD8 TIE_GFRWMOD8
#define LGF8_I TIE_LGF8_I
#define SGF8_I TIE_SGF8_I
#define LGF8_IU TIE_LGF8_IU
#define SGF8_IU TIE_SGF8_IU
#define LGF8_X TIE_LGF8_X
#define SGF8_X TIE_SGF8_X
#define LGF8_XU TIE_LGF8_XU
#define SGF8_XU TIE_SGF8_XU
#define RUR0 TIE_RUR0
#define WUR0 TIE_WUR0
#endif
#include <stdio.h>
#include <stdlib.h> /* for exit() */
#define LittleEndian 0
#define BigEndian 1
#define PIFReadDataBits 128
#define PIFWriteDataBits 128
#define IsaMemoryOrder LittleEndian
#include "BR.h"
#include "LS.h"
#define BPW 32
#define WINDEX(_n) ((_n) / BPW)
#define BINDEX(_n) ((_n) % BPW)
typedef unsigned char Vb_t;
typedef unsigned short Vs_t;
typedef struct V1_s {unsigned data[1];} V1_t;
typedef struct V2_s {unsigned data[2];} V2_t;
typedef struct V4_s {unsigned data[4];} V4_t;
typedef Vb_t gf8;
static int tie_load_instruction = 0;
void
TieMemRead(unsigned *data, unsigned addr)
{
  unsigned char *mem;
  unsigned modulus, bytes, offset;
  int t, b0, b1, b2, b3;
  bytes = PIFReadDataBits / 8;
  modulus = bytes - 1;
  mem = (unsigned char *) (addr & ~modulus);
  offset = (unsigned char *) addr - mem;
  if (IsaMemoryOrder == LittleEndian) {
    for (t = 0; t < bytes/sizeof(int); t++) {
      b0 = mem[(offset++) & modulus];
      b1 = mem[(offset++) & modulus];
      b2 = mem[(offset++) & modulus];
      b3 = mem[(offset++) & modulus];
      data[t] = (b3 << 24) | (b2 << 16) | (b1 << 8) | b0;
    }
  } else {
    for (t = bytes/sizeof(int) - 1; t >= 0; t--) {
      b3 = mem[(offset++) & modulus];
      b2 = mem[(offset++) & modulus];
      b1 = mem[(offset++) & modulus];
      b0 = mem[(offset++) & modulus];
      data[t] = (b3 << 24) | (b2 << 16) | (b1 << 8) | b0;
    }
  }
}
void
TieMemWrite(unsigned addr, unsigned bytes, unsigned *data)
{
    unsigned char *mem;
    unsigned modulus, offset, w;
    int t;
    if (PIFWriteDataBits < bytes * 8) {
        fprintf(stderr, "Error: not configured to write %d bytes\n", bytes);
        exit(1);
    }
    modulus = bytes - 1;
    /* align the base pointer to the size of the store */
    mem = (unsigned char *) (addr & ~modulus);
    if (IsaMemoryOrder == LittleEndian) {
        if (bytes == 1) {
            mem[0] = data[0] & 0xff;
        } else if (bytes == 2) {
            mem[0] = data[0] & 0xff;
            mem[1] = (data[0] >> 8) & 0xff;
        } else {
            offset = 0;
            for (t = 0; t < bytes/sizeof(int); t++) {
                w = data[t];
                mem[offset++] = w & 255;
                mem[offset++] = (w >> 8) & 255;
                mem[offset++] = (w >> 16) & 255;
                mem[offset++] = (w >> 24) & 255;
            }
        }
    } else {
        if (bytes == 1) {
            mem[0] = data[0] & 0xff;
        } else if (bytes == 2) {
            mem[1] = data[0] & 0xff;
            mem[0] = (data[0] >> 8) & 0xff;
        } else {
            offset = 0;
            /* walk the words from most to least significant, including word 0 */
            for (t = bytes/sizeof(int) - 1; t >= 0; t--) {
                w = data[t];
                mem[offset++] = (w >> 24) & 255;
                mem[offset++] = (w >> 16) & 255;
                mem[offset++] = (w >> 8) & 255;
                mem[offset++] = w & 255;
            }
        }
    }
}
#define GetState(_s, _n) _s = _n
#define SetState(_n, _s) _n = _s
V1_t STATE_gfmod;
V1_t VAddr = {{0}};
V1_t VAddrBase = {{0}};
V1_t VAddrOffset = {{0}};
V1_t VAddrIndex = {{0}};
V1_t VAddrIn = {{0}};
V1_t LSSize = {{0}};
V1_t LSIndexed = {{0}};
V4_t MemDataIn128 = {{0,0,0,0}};
V2_t MemDataIn64 = {{0,0}};
V1_t MemDataIn32 = {{0}};
V1_t MemDataIn16 = {{0}};
V1_t MemDataIn8 = {{0}};
V4_t MemDataOut128 = {{0,0,0,0}};
V2_t MemDataOut64 = {{0,0}};
V1_t MemDataOut32 = {{0}};
V1_t MemDataOut16 = {{0}};
V1_t MemDataOut8 = {{0}};
V1_t Exception = {{0}};
V1_t ExcCause = {{0}};
V1_t CPEnable = {{0}};
void
VAddrIn_get(void)
{
    if (LSIndexed.data[0] != 0) {
        VAddrIn.data[0] = VAddrBase.data[0] + VAddrIndex.data[0];
    } else {
        VAddrIn.data[0] = VAddrBase.data[0] + VAddrOffset.data[0];
    }
}
void
MemDataIn128_get(void)
{
    unsigned data[4];
    if ((!tie_load_instruction) || (LSSize.data[0] != 16)) {
        return;
    }
    if (PIFReadDataBits < 128) {
        fprintf(stderr, "Error: not configured to read 16 bytes\n");
        exit(-1);
    }
    VAddrIn_get();
    TieMemRead(&data[0], VAddrIn.data[0]);
    MemDataIn128.data[0] = data[0];
    MemDataIn128.data[1] = data[1];
    MemDataIn128.data[2] = data[2];
    MemDataIn128.data[3] = data[3];
}
void
MemDataIn64_get(void)
{
    unsigned data[4];
    if ((!tie_load_instruction) || (LSSize.data[0] != 8)) {
        return;
    }
    if (PIFReadDataBits < 64) {
        fprintf(stderr, "Error: not configured to read 8 bytes\n");
        exit(-1);
    }
    VAddrIn_get();
    TieMemRead(&data[0], VAddrIn.data[0]);
    if (IsaMemoryOrder == LittleEndian) {
        MemDataIn64.data[0] = data[0];
        MemDataIn64.data[1] = data[1];
    } else if (PIFReadDataBits == 64) {
        MemDataIn64.data[0] = data[0];
        MemDataIn64.data[1] = data[1];
    } else {
        MemDataIn64.data[0] = data[2];
        MemDataIn64.data[1] = data[3];
    }
}
void
MemDataIn32_get(void)
{
    unsigned data[4];
    if ((!tie_load_instruction) || (LSSize.data[0] != 4)) {
        return;
    }
    if (PIFReadDataBits < 32) {
        fprintf(stderr, "Error: not configured to read 4 bytes\n");
        exit(-1);
    }
    VAddrIn_get();
    TieMemRead(&data[0], VAddrIn.data[0]);
    if (IsaMemoryOrder == LittleEndian) {
        MemDataIn32.data[0] = data[0];
    } else if (PIFReadDataBits == 32) {
        MemDataIn32.data[0] = data[0];
    } else if (PIFReadDataBits == 64) {
        MemDataIn32.data[0] = data[1];
    } else {
        MemDataIn32.data[0] = data[3];
    }
}
void
MemDataIn16_get(void)
{
    unsigned data[4];
    if ((!tie_load_instruction) || (LSSize.data[0] != 2)) {
        return;
    }
    if (PIFReadDataBits < 16) {
        fprintf(stderr, "Error: not configured to read 2 bytes\n");
        exit(-1);
    }
    VAddrIn_get();
    TieMemRead(&data[0], VAddrIn.data[0]);
    if (IsaMemoryOrder == LittleEndian) {
        MemDataIn16.data[0] = data[0] & 0xffff;
    } else if (PIFReadDataBits == 32) {
        MemDataIn16.data[0] = data[0] >> 16;
    } else if (PIFReadDataBits == 64) {
        MemDataIn16.data[0] = data[1] >> 16;
    } else {
        MemDataIn16.data[0] = data[3] >> 16;
    }
}
void
MemDataIn8_get(void)
{
    unsigned data[4];
    if ((!tie_load_instruction) || (LSSize.data[0] != 1)) {
        return;
    }
    if (PIFReadDataBits < 8) {
        fprintf(stderr, "Error: not configured to read 1 byte\n");
        exit(-1);
    }
    VAddrIn_get();
    TieMemRead(&data[0], VAddrIn.data[0]);
    if (IsaMemoryOrder == LittleEndian) {
        MemDataIn8.data[0] = data[0] & 0xff;
    } else if (PIFReadDataBits == 32) {
        MemDataIn8.data[0] = data[0] >> 24;
    } else if (PIFReadDataBits == 64) {
        MemDataIn8.data[0] = data[1] >> 24;
    } else {
        MemDataIn8.data[0] = data[3] >> 24;
    }
}
void
MemDataOut128_set(void)
{
    if (LSSize.data[0] != 16) {
        return;
    }
    VAddrIn_get();
    TieMemWrite(VAddrIn.data[0] & ~0xf, 16, &MemDataOut128.data[0]);
}
void
MemDataOut64_set(void)
{
    if (LSSize.data[0] != 8) {
        return;
    }
    VAddrIn_get();
    TieMemWrite(VAddrIn.data[0] & ~0x7, 8, &MemDataOut64.data[0]);
}
void
MemDataOut32_set(void)
{
    if (LSSize.data[0] != 4) {
        return;
    }
    VAddrIn_get();
    TieMemWrite(VAddrIn.data[0] & ~0x3, 4, &MemDataOut32.data[0]);
}
void
MemDataOut16_set(void)
{
    if (LSSize.data[0] != 2) {
        return;
    }
    VAddrIn_get();
    TieMemWrite(VAddrIn.data[0] & ~0x1, 2, &MemDataOut16.data[0]);
}
void
MemDataOut8_set(void)
{
    if (LSSize.data[0] != 1) {
        return;
    }
    VAddrIn_get();
    TieMemWrite(VAddrIn.data[0], 1, &MemDataOut8.data[0]);
}
void
Exception_set (void)
{
/* Exception handling is not supported in native mode */
}
void
CPEnable_get (void)
{
    CPEnable.data[0] = 0xff; /* always enabled in native C mode */
}
#define RUR(n) ({ \
    int v; \
    switch (n) { \
    case 0: \
        v = RUR0(); break; \
    default: \
        fprintf(stderr, "Error: invalid rur number %d\n", n); \
        exit(-1); \
    } \
    v; \
})
#define WUR(v, n) \
    switch (n) { \
    case 0: \
        WUR0(v); break; \
    default: \
        fprintf(stderr, "Error: invalid wur number %d\n", n); \
        exit(-1); \
    }
gf8
GFADD8(gf8 gs_, gf8 gt_)
{
    /* operand variables */
    V1_t gr_o;
    V1_t gs_i;
    V1_t gt_i;
    /* unused operand variables */
    /* operand kill variables */
    V1_t gr_kill_o = {{0}};
    /* one-hot instruction signals */
    V1_t GFADD8 = {{1}};
    /* state variables */
    /* local wire variables */
    /* initialize in/inout operands */
    gs_i.data[0] = gs_;
    gt_i.data[0] = gt_;
    tie_load_instruction = 0;
    /* semantic statements */
    gr_o.data[0] = (gs_i.data[0] ^ gt_i.data[0]) & 0xff;
    gr_kill_o.data[0] = (0 & GFADD8.data[0]) & 0x1;
    /* write-back inout operands */
    /* return the output operand */
    return gr_o.data[0];
}
gf8
GFADD8I(gf8 gs_, int imm4_)
{
    /* operand variables */
    V1_t gr_o;
    V1_t gs_i;
    V1_t imm4;
    /* unused operand variables */
    /* operand kill variables */
    V1_t gr_kill_o = {{0}};
    /* one-hot instruction signals */
    V1_t GFADD8I = {{1}};
    /* state variables */
    /* local wire variables */
    /* initialize in/inout operands */
    gs_i.data[0] = gs_;
    imm4.data[0] = imm4_;
    tie_load_instruction = 0;
    /* semantic statements */
    gr_o.data[0] = (gs_i.data[0] ^ imm4.data[0]) & 0xff;
    gr_kill_o.data[0] = (0 & GFADD8I.data[0]) & 0x1;
    /* write-back inout operands */
    /* return the output operand */
    return gr_o.data[0];
}
gf8
GFMULX8(gf8 gs_)
{
    /* operand variables */
    V1_t gr_o;
    V1_t gs_i;
    /* unused operand variables */
    /* operand kill variables */
    V1_t gr_kill_o = {{0}};
    /* one-hot instruction signals */
    V1_t GFMULX8 = {{1}};
    /* state variables */
    V1_t gfmod_ps;
    /* local wire variables */
    V1_t tmp5;
    V1_t tmp4;
    V1_t tmp3;
    V1_t tmp2;
    V1_t tmp1;
    V1_t tmp0;
    /* get input state values */
    GetState(gfmod_ps, STATE_gfmod);
    /* initialize in/inout operands */
    gs_i.data[0] = gs_;
    tie_load_instruction = 0;
    /* semantic statements */
    tmp0.data[0] = (((gs_i.data[0] << 24) >> 31)) & 0x1;
    tmp1.data[0] = ((gs_i.data[0] & 0x7f)) & 0x7f;
    tmp2.data[0] = ((tmp1.data[0] << 1)|0) & 0xff;
    tmp3.data[0] = (tmp2.data[0] ^ gfmod_ps.data[0]) & 0xff;
    tmp4.data[0] = ((gs_i.data[0] & 0x7f)) & 0x7f;
    tmp5.data[0] = ((tmp4.data[0] << 1)|0) & 0xff;
    gr_o.data[0] = ((tmp0.data[0]) ? tmp3.data[0] : tmp5.data[0]) & 0xff;
    gr_kill_o.data[0] = (0 & GFMULX8.data[0]) & 0x1;
    /* write-back inout operands */
    /* return the output operand */
    return gr_o.data[0];
}
#define GFRWMOD8(gt) \
    GFRWMOD8_func(&(gt))
void
GFRWMOD8_func(gf8 *gt_)
{
    /* operand variables */
    V1_t gt_o;
    V1_t gt_i;
    /* unused operand variables */
    /* operand kill variables */
    V1_t gt_kill_o = {{0}};
    /* one-hot instruction signals */
    V1_t GFRWMOD8 = {{1}};
    /* state variables */
    V1_t gfmod_ps;
    V1_t gfmod_ns;
    V1_t gfmod_kill_ns;
    /* local wire variables */
    V1_t t1;
    V1_t t2;
    /* get input state values */
    GetState(gfmod_ps, STATE_gfmod);
    /* initialize in/inout operands */
    gt_i.data[0] = *gt_;
    tie_load_instruction = 0;
    /* semantic statements */
    t1.data[0] = gt_i.data[0] & 0xff;
    t2.data[0] = gfmod_ps.data[0] & 0xff;
    gfmod_ns.data[0] = t1.data[0] & 0xff;
    gt_o.data[0] = t2.data[0] & 0xff;
    gfmod_kill_ns.data[0] = (0 & GFRWMOD8.data[0]) & 0x1;
    gt_kill_o.data[0] = (0 & GFRWMOD8.data[0]) & 0x1;
    /* write-back inout operands */
    if (!gt_kill_o.data[0]) *gt_ = gt_o.data[0];
    /* update out/inout states */
    if (!gfmod_kill_ns.data[0]) SetState(STATE_gfmod, gfmod_ns);
}
gf8
LGF8_I(unsigned ars_, int imm8_)
{
    /* operand variables */
    V1_t gt_o;
    V1_t ars_i;
    V1_t imm8;
    /* unused operand variables */
    /* operand kill variables */
    V1_t gt_kill_o = {{0}};
    /* one-hot instruction signals */
    V1_t LGF8_I = {{1}};
    /* state variables */
    /* local wire variables */
    /* initialize in/inout operands */
    ars_i = *((V1_t *) &ars_);
    imm8.data[0] = imm8_;
    tie_load_instruction = 1;
    /* semantic statements */
    LSSize.data[0] = 0x1 & 0x1f;
    VAddrBase.data[0] = ars_i.data[0];
    LSIndexed.data[0] = 0 & 0x1;
    VAddrOffset.data[0] = imm8.data[0];
    MemDataIn8_get();
    gt_o.data[0] = MemDataIn8.data[0] & 0xff;
    gt_kill_o.data[0] = (0 & LGF8_I.data[0]) & 0x1;
    /* write-back inout operands */
    /* update output interface signals */
    /* return the output operand */
    return gt_o.data[0];
}
#define LGF8_IU(ars, imm8) \
    LGF8_IU_func(&(ars), imm8)
gf8
LGF8_IU_func(unsigned *ars_, int imm8_)
{
    /* operand variables */
    V1_t gt_o;
    V1_t ars_o;
    V1_t ars_i;
    V1_t imm8;
    /* unused operand variables */
    /* operand kill variables */
    V1_t gt_kill_o = {{0}};
    V1_t ars_kill_o = {{0}};
    /* one-hot instruction signals */
    V1_t LGF8_IU = {{1}};
    /* state variables */
    /* local wire variables */
    /* initialize in/inout operands */
    ars_i = *((V1_t *) ars_);
    imm8.data[0] = imm8_;
    tie_load_instruction = 1;
    /* semantic statements */
    LSSize.data[0] = 0x1 & 0x1f;
    VAddrBase.data[0] = ars_i.data[0];
    LSIndexed.data[0] = 0 & 0x1;
    VAddrOffset.data[0] = imm8.data[0];
    MemDataIn8_get();
    gt_o.data[0] = MemDataIn8.data[0] & 0xff;
    VAddrIn_get();
    ars_o.data[0] = VAddrIn.data[0];
    gt_kill_o.data[0] = (0 & LGF8_IU.data[0]) & 0x1;
    ars_kill_o.data[0] = (0 & LGF8_IU.data[0]) & 0x1;
    /* write-back inout operands */
    if (!ars_kill_o.data[0]) *ars_ = *((unsigned *) &ars_o);
    /* update output interface signals */
    /* return the output operand */
    return gt_o.data[0];
}
gf8
LGF8_X(unsigned ars_, unsigned art_)
{
    /* operand variables */
    V1_t gr_o;
    V1_t ars_i;
    V1_t art_i;
    /* unused operand variables */
    /* operand kill variables */
    V1_t gr_kill_o = {{0}};
    /* one-hot instruction signals */
    V1_t LGF8_X = {{1}};
    /* state variables */
    /* local wire variables */
    /* initialize in/inout operands */
    ars_i = *((V1_t *) &ars_);
    art_i = *((V1_t *) &art_);
    tie_load_instruction = 1;
    /* semantic statements */
    LSSize.data[0] = 0x1 & 0x1f;
    VAddrBase.data[0] = ars_i.data[0];
    LSIndexed.data[0] = 0x1 & 0x1;
    VAddrIndex.data[0] = art_i.data[0];
    MemDataIn8_get();
    gr_o.data[0] = MemDataIn8.data[0] & 0xff;
    /* no address write-back: ars is an input-only operand of the non-updating form */
    VAddrIn_get();
    gr_kill_o.data[0] = (0 & LGF8_X.data[0]) & 0x1;
    /* write-back inout operands */
    /* update output interface signals */
    /* return the output operand */
    return gr_o.data[0];
}
#define LGF8_XU(ars, art) \
    LGF8_XU_func(&(ars), art)
gf8
LGF8_XU_func(unsigned *ars_, unsigned art_)
{
    /* operand variables */
    V1_t gr_o;
    V1_t ars_o;
    V1_t ars_i;
    V1_t art_i;
    /* unused operand variables */
    /* operand kill variables */
    V1_t gr_kill_o = {{0}};
    V1_t ars_kill_o = {{0}};
    /* one-hot instruction signals */
    V1_t LGF8_XU = {{1}};
    /* state variables */
    /* local wire variables */
    /* initialize in/inout operands */
    ars_i = *((V1_t *) ars_);
    art_i = *((V1_t *) &art_);
    tie_load_instruction = 1;
    /* semantic statements */
    LSSize.data[0] = 0x1 & 0x1f;
    VAddrBase.data[0] = ars_i.data[0];
    LSIndexed.data[0] = 0x1 & 0x1;
    VAddrIndex.data[0] = art_i.data[0];
    MemDataIn8_get();
    gr_o.data[0] = MemDataIn8.data[0] & 0xff;
    VAddrIn_get();
    ars_o.data[0] = VAddrIn.data[0];
    gr_kill_o.data[0] = (0 & LGF8_XU.data[0]) & 0x1;
    ars_kill_o.data[0] = (0 & LGF8_XU.data[0]) & 0x1;
    /* write-back inout operands */
    if (!ars_kill_o.data[0]) *ars_ = *((unsigned *) &ars_o);
    /* update output interface signals */
    /* return the output operand */
    return gr_o.data[0];
}
void
SGF8_I(gf8 gt_, unsigned ars_, int imm8_)
{
    /* operand variables */
    V1_t gt_i;
    V1_t ars_i;
    V1_t imm8;
    /* unused operand variables */
    /* operand kill variables */
    /* one-hot instruction signals */
    /* state variables */
    /* local wire variables */
    /* initialize in/inout operands */
    gt_i.data[0] = gt_;
    ars_i = *((V1_t *) &ars_);
    imm8.data[0] = imm8_;
    tie_load_instruction = 0;
    /* semantic statements */
    LSSize.data[0] = 0x1 & 0x1f;
    VAddrBase.data[0] = ars_i.data[0];
    LSIndexed.data[0] = 0 & 0x1;
    VAddrOffset.data[0] = imm8.data[0];
    MemDataOut8.data[0] = gt_i.data[0] & 0xff;
    /* write-back inout operands */
    /* update output interface signals */
    MemDataOut8_set();
}
#define SGF8_IU(gt, ars, imm8) \
    SGF8_IU_func(gt, &(ars), imm8)
void
SGF8_IU_func(gf8 gt_, unsigned *ars_, int imm8_)
{
    /* operand variables */
    V1_t gt_i;
    V1_t ars_o;
    V1_t ars_i;
    V1_t imm8;
    /* unused operand variables */
    /* operand kill variables */
    V1_t ars_kill_o = {{0}};
    /* one-hot instruction signals */
    V1_t SGF8_IU = {{1}};
    /* state variables */
    /* local wire variables */
    /* initialize in/inout operands */
    gt_i.data[0] = gt_;
    ars_i = *((V1_t *) ars_);
    imm8.data[0] = imm8_;
    tie_load_instruction = 0;
    /* semantic statements */
    LSSize.data[0] = 0x1 & 0x1f;
    VAddrBase.data[0] = ars_i.data[0];
    LSIndexed.data[0] = 0 & 0x1;
    VAddrOffset.data[0] = imm8.data[0];
    MemDataOut8.data[0] = gt_i.data[0] & 0xff;
    VAddrIn_get();
    ars_o.data[0] = VAddrIn.data[0];
    ars_kill_o.data[0] = (0 & SGF8_IU.data[0]) & 0x1;
    /* write-back inout operands */
    if (!ars_kill_o.data[0]) *ars_ = *((unsigned *) &ars_o);
    /* update output interface signals */
    MemDataOut8_set();
}
void
SGF8_X(gf8 gr_, unsigned ars_, unsigned art_)
{
    /* operand variables */
    V1_t gr_i;
    V1_t ars_i;
    V1_t art_i;
    /* unused operand variables */
    /* operand kill variables */
    /* one-hot instruction signals */
    /* state variables */
    /* local wire variables */
    /* initialize in/inout operands */
    gr_i.data[0] = gr_;
    ars_i = *((V1_t *) &ars_);
    art_i = *((V1_t *) &art_);
    tie_load_instruction = 0;
    /* semantic statements */
    LSSize.data[0] = 0x1 & 0x1f;
    VAddrBase.data[0] = ars_i.data[0];
    LSIndexed.data[0] = 0x1 & 0x1;
    VAddrIndex.data[0] = art_i.data[0];
    MemDataOut8.data[0] = gr_i.data[0] & 0xff;
    /* write-back inout operands */
    /* update output interface signals */
    MemDataOut8_set();
}
#define SGF8_XU(gr, ars, art) \
    SGF8_XU_func(gr, &(ars), art)
void
SGF8_XU_func(gf8 gr_, unsigned *ars_, unsigned art_)
{
    /* operand variables */
    V1_t gr_i;
    V1_t ars_o;
    V1_t ars_i;
    V1_t art_i;
    /* unused operand variables */
    /* operand kill variables */
    V1_t ars_kill_o = {{0}};
    /* one-hot instruction signals */
    V1_t SGF8_XU = {{1}};
    /* state variables */
    /* local wire variables */
    /* initialize in/inout operands */
    gr_i.data[0] = gr_;
    ars_i = *((V1_t *) ars_);
    art_i = *((V1_t *) &art_);
    tie_load_instruction = 0;
    /* semantic statements */
    LSSize.data[0] = 0x1 & 0x1f;
    VAddrBase.data[0] = ars_i.data[0];
    LSIndexed.data[0] = 0x1 & 0x1;
    VAddrIndex.data[0] = art_i.data[0];
    MemDataOut8.data[0] = gr_i.data[0] & 0xff;
    VAddrIn_get();
    ars_o.data[0] = VAddrIn.data[0];
    ars_kill_o.data[0] = (0 & SGF8_XU.data[0]) & 0x1;
    /* write-back inout operands */
    if (!ars_kill_o.data[0]) *ars_ = *((unsigned *) &ars_o);
    /* update output interface signals */
    MemDataOut8_set();
}
unsigned
RUR0()
{
    /* operand variables */
    V1_t arr_o;
    /* unused operand variables */
    /* operand kill variables */
    V1_t arr_kill_o = {{0}};
    /* one-hot instruction signals */
    V1_t RUR0 = {{1}};
    /* state variables */
    V1_t gfmod_ps;
    /* local wire variables */
    /* get input state values */
    GetState(gfmod_ps, STATE_gfmod);
    /* initialize in/inout operands */
    tie_load_instruction = 0;
    /* semantic statements */
    arr_o.data[0] = gfmod_ps.data[0];
    arr_kill_o.data[0] = (0 & RUR0.data[0]) & 0x1;
    /* write-back inout operands */
    /* return the output operand */
    return *((unsigned *) &arr_o);
}
void
WUR0(unsigned art_)
{
    /* operand variables */
    V1_t art_i;
    /* unused operand variables */
    /* operand kill variables */
    /* one-hot instruction signals */
    V1_t WUR0 = {{1}};
    /* state variables */
    V1_t gfmod_ns;
    V1_t gfmod_kill_ns;
    /* local wire variables */
    V1_t tmp0;
    /* initialize in/inout operands */
    art_i = *((V1_t *) &art_);
    tie_load_instruction = 0;
    /* semantic statements */
    tmp0.data[0] = ((art_i.data[0] & 0xff)) & 0xff;
    gfmod_ns.data[0] = (tmp0.data[0]) & 0xff;
    gfmod_kill_ns.data[0] = (0 & WUR0.data[0]) & 0x1;
    /* write-back inout operands */
    /* update out/inout states */
    if (!gfmod_kill_ns.data[0]) SetState(STATE_gfmod, gfmod_ns);
}
#define gf8_loadi(_s, o) ({ \
    gf8 t; \
    gf8 *s = _s; \
    gf8 LGF8_I_return; \
    LGF8_I_return = LGF8_I(*((unsigned *)&(s)), *((int *)&(o))); \
    t = *((gf8 *)&LGF8_I_return); \
    t; \
})
#define gf8_storei(_t, _s, o) ({ \
    gf8 t = _t; \
    gf8 *s = _s; \
    SGF8_I(*((gf8 *)&(t)), *((unsigned *)&(s)), *((int *)&(o))); \
})
#define gf8_move(_r, _s) ({ \
    gf8 r = _r; \
    gf8 s = _s; \
    gf8 GFADD8_return; \
    GFADD8_return = GFADD8(*((gf8 *)&(s)), *((gf8 *)&(0))); \
    r = *((gf8 *)&GFADD8_return); \
})
#ifdef TIE_DEBUG
#undef gf8_loadi
#undef gf8_storei
#undef gf8_move
#undef GFADD8
#undef GFADD8I
#undef GFMULX8
#undef GFRWMOD8
#undef LGF8_I
#undef SGF8_I
#undef LGF8_IU
#undef SGF8_IU
#undef LGF8_X
#undef SGF8_X
#undef LGF8_XU
#undef SGF8_XU
#undef RUR0
#undef WUR0
#endif
#endif
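The reference functions above implement two field primitives: `GFADD8` is an XOR, and `GFMULX8` shifts the operand left by one bit and XORs in the `gfmod` state when the top bit was set. The sketch below restates those two semantics compactly and shows how a full GF(2^8) multiply could be composed from them. The names `gf_add`, `gf_mulx`, `gf_mul` and `gf_modulus` are illustrative only, and the modulus value 0x1b (the low byte of the AES polynomial) is an assumption; the listing leaves the modulus to be written at run time through `GFRWMOD8` or `WUR0`.

```c
#include <stdint.h>

/* Hypothetical stand-in for the gfmod state register.
 * 0x1b is an assumed value (AES polynomial); the listing does not fix one. */
static uint8_t gf_modulus = 0x1b;

/* Same semantics as GFADD8: addition in GF(2^8) is XOR. */
static uint8_t gf_add(uint8_t a, uint8_t b)
{
    return (uint8_t)(a ^ b);
}

/* Same semantics as GFMULX8: multiply by x, reduce if bit 7 was set. */
static uint8_t gf_mulx(uint8_t a)
{
    uint8_t shifted = (uint8_t)((a & 0x7f) << 1);
    return (a & 0x80) ? (uint8_t)(shifted ^ gf_modulus) : shifted;
}

/* Shift-and-add multiply composed from the two primitives (not in the listing). */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t acc = 0;
    while (b != 0) {
        if (b & 1) {
            acc = gf_add(acc, a);   /* accumulate the current multiple of a */
        }
        a = gf_mulx(a);             /* advance to a * x */
        b >>= 1;
    }
    return acc;
}
```

With the assumed modulus, `gf_mul(0x57, 0x83)` evaluates to `0xc1`, which matches the worked multiplication example in the AES specification.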
BR.h
/*
 * Copyright 1999-2000 Tensilica Inc.
 * These coded instructions, statements, and computer programs are
 * Confidential Proprietary Information of Tensilica Inc. and may not be
 * disclosed to third parties or copied in any form, in whole or in part,
 * without the prior written consent of Tensilica Inc. */
#ifndef BR_HEADER
#define BR_HEADER
#ifndef XTENSA
typedef unsigned char xtbool;
typedef unsigned char xtbool2;
typedef unsigned char xtbool4;
typedef unsigned char xtbool8;
typedef unsigned short xtbool16;
xtbool
XT_ANDB(xtbool bs, xtbool bt)
{
    return 0x1 & (bs & bt);
}
xtbool
XT_ANDBC(xtbool bs, xtbool bt)
{
    return 0x1 & (bs & !bt);
}
xtbool
XT_ORB(xtbool bs, xtbool bt)
{
    return 0x1 & (bs | bt);
}
xtbool
XT_ORBC(xtbool bs, xtbool bt)
{
    return 0x1 & (bs | !bt);
}
xtbool
XT_XORB(xtbool bs, xtbool bt)
{
    return 0x1 & (bs ^ bt);
}
xtbool
XT_ANY4(xtbool4 bs4)
{
    return (bs4 & 0xf) != 0;
}
xtbool
XT_ALL4(xtbool4 bs4)
{
    return (bs4 & 0xf) == 0xf;
}
xtbool
XT_ANY8(xtbool8 bs8)
{
    /* mask covers all eight boolean bits */
    return (bs8 & 0xff) != 0;
}
xtbool
XT_ALL8(xtbool8 bs8)
{
    /* mask covers all eight boolean bits */
    return (bs8 & 0xff) == 0xff;
}
#endif /* XTENSA */
#endif /* BR_HEADER */
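The `XT_ANY4`/`XT_ALL4` and `XT_ANY8`/`XT_ALL8` helpers in BR.h all follow one pattern: mask the packed boolean group to its width, then compare against zero or against the full mask. A minimal sketch of that pattern with the width as a parameter follows. The names `bool_any` and `bool_all` are hypothetical and not part of BR.h, and the sketch assumes a group width between 1 and 31 bits.

```c
/* Build a mask with the low n bits set (assumes 1 <= n <= 31). */
static unsigned low_mask(unsigned n)
{
    return (1u << n) - 1u;
}

/* True if any of the low n boolean bits is set. */
static int bool_any(unsigned bits, unsigned n)
{
    return (bits & low_mask(n)) != 0;
}

/* True only if every one of the low n boolean bits is set. */
static int bool_all(unsigned bits, unsigned n)
{
    unsigned mask = low_mask(n);
    return (bits & mask) == mask;
}
```

Under this formulation the 4-wide helpers correspond to `n = 4` (mask `0xf`) and the 8-wide helpers to `n = 8` (mask `0xff`).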
gf.v
module xmTIE_gf_Regfile(rd0_data_C1, rd0_addr_C0, rd0_width8_C0, rd0_use1_C0,
    rd1_data_C1, rd1_addr_C0, rd1_width8_C0, rd1_use1_C0,
    rd2_data_C1, rd2_addr_C0, rd2_width8_C0, rd2_use1_C0,
    wd_addr_C0, wd_width8_C0, wd_def1_C0, wd_def2_C0,
    wd_data8_C1, wd_data8_C2, wd_wen_C1, wd_wen_C2,
    Kill_E, KillPipe_W, Stall_R, clk);
output [7:0] rd0_data_C1;
input [3:0] rd0_addr_C0;
input rd0_width8_C0;
input rd0_use1_C0;
output [7:0] rd1_data_C1;
input [3:0] rd1_addr_C0;
input rd1_width8_C0;
input rd1_use1_C0;
output [7:0] rd2_data_C1;
input [3:0] rd2_addr_C0;
input rd2_width8_C0;
input rd2_use1_C0;
input [3:0] wd_addr_C0;
input wd_width8_C0;
input wd_def1_C0;
input wd_def2_C0;
input [7:0] wd_data8_C1;
input [7:0] wd_data8_C2;
input wd_wen_C1;
input wd_wen_C2;
input Kill_E;
input KillPipe_W;
output Stall_R;
input clk;
/***********************************************************************
READ PORT rd0
***********************************************************************/
// compute the address mask
wire rd0_addr_mask_C0 = 1'd0;
// masked address pipeline
wire rd0_maddr_C0 = 1'd0;
// bank-qualified use
wire rd0_use1_bank0_C0 = (rd0_use1_C0 & (rd0_maddr_C0 == (1'd0 & rd0_addr_mask_C0)));
// alignment mux for use 1
wire [7:0] rd0_data_bank0_C1;
assign rd0_data_C1[7:0] = rd0_data_bank0_C1;
/***********************************************************************
READ PORT rd1
***********************************************************************/
// compute the address mask
wire rd1_addr_mask_C0 = 1'd0;
// masked address pipeline
wire rd1_maddr_C0 = 1'd0;
// bank-qualified use
wire rd1_use1_bank0_C0 = (rd1_use1_C0 & (rd1_maddr_C0 == (1'd0 & rd1_addr_mask_C0)));
// alignment mux for use 1
wire [7:0] rd1_data_bank0_C1;
assign rd1_data_C1[7:0] = rd1_data_bank0_C1;
/***********************************************************************
READ PORT rd2
***********************************************************************/
// compute the address mask
wire rd2_addr_mask_C0 = 1'd0;
// masked address pipeline
wire rd2_maddr_C0 = 1'd0;
// bank-qualified use
wire rd2_use1_bank0_C0 = (rd2_use1_C0 & (rd2_maddr_C0 == (1'd0 & rd2_addr_mask_C0)));
// alignment mux for use 1
wire [7:0] rd2_data_bank0_C1;
assign rd2_data_C1[7:0] = rd2_data_bank0_C1;
/***********************************************************************
WRITE PORT wd
***********************************************************************/
// compute the address mask
wire wd_addr_mask_C0 = 1'd0;
// bank-qualified write def for port wd
wire wd_def1_bank0_C0 = (wd_def1_C0 & ((wd_addr_C0 & wd_addr_mask_C0) == (1'd0 & wd_addr_mask_C0)));
wire wd_def2_bank0_C0 = (wd_def2_C0 & ((wd_addr_C0 & wd_addr_mask_C0) == (1'd0 & wd_addr_mask_C0)));
// write mux for def 1
wire [7:0] wd_wdata_C1;
assign wd_wdata_C1 = {1{wd_data8_C1[7:0]}};
// write mux for def 2
wire [7:0] wd_wdata_C2;
assign wd_wdata_C2 = {1{wd_data8_C2[7:0]}};
wire Stall_R0;
/***********************************************************************
PIPELINED BANK
***********************************************************************/
xmTIE_gf_Regfile_bank TIE_gf_Regfile_bank0(rd0_data_bank0_C1,
    rd0_addr_C0[3:0], rd0_use1_bank0_C0, rd1_data_bank0_C1, rd1_addr_C0[3:0],
    rd1_use1_bank0_C0, rd2_data_bank0_C1, rd2_addr_C0[3:0], rd2_use1_bank0_C0,
    wd_addr_C0[3:0], wd_def1_bank0_C0, wd_def2_bank0_C0, wd_wdata_C1[7:0],
    wd_wdata_C2[7:0], wd_wen_C1, wd_wen_C2, Kill_E, KillPipe_W, Stall_R0, clk);
assign Stall_R = Stall_R0 | 1'b0;
endmodule
module xmTIE_gf_Regfile_bank(rd0_data_C1, rd0_addr_C0, rd0_use1_C0,
    rd1_data_C1, rd1_addr_C0, rd1_use1_C0, rd2_data_C1, rd2_addr_C0,
    rd2_use1_C0, wd_addr_C0, wd_def1_C0, wd_def2_C0, wd_data_C1, wd_data_C2,
    wd_wen_C1, wd_wen_C2, Kill_E, KillPipe_W, Stall_R, clk);
output [7:0] rd0_data_C1;
input [3:0] rd0_addr_C0;
input rd0_use1_C0;
output [7:0] rd1_data_C1;
input [3:0] rd1_addr_C0;
input rd1_use1_C0;
output [7:0] rd2_data_C1;
input [3:0] rd2_addr_C0;
input rd2_use1_C0;
input [3:0] wd_addr_C0;
input wd_def1_C0;
input wd_def2_C0;
input [7:0] wd_data_C1;
input [7:0] wd_data_C2;
input wd_wen_C1;
input wd_wen_C2;
input Kill_E;
input KillPipe_W;
output Stall_R;
input clk;
wire rd0_use2_C0 = 1'd0;
wire rd1_use2_C0 = 1'd0;
wire rd2_use2_C0 = 1'd0;
wire kill_C0 = KillPipe_W;
wire kill_C1 = KillPipe_W | Kill_E;
wire kill_C2 = KillPipe_W;
wire kill_C3 = KillPipe_W;
// write definition pipeline
wire wd_ns_def1_C0 = wd_def1_C0 & 1'b1 & ~kill_C0;
wire wd_def1_C1;
xtdelay1 #(1) iwd_def1_C1(wd_def1_C1, wd_ns_def1_C0, clk);
wire wd_ns_def2_C0 = wd_def2_C0 & 1'b1 & ~kill_C0;
wire wd_def2_C1;
xtdelay1 #(1) iwd_def2_C1(wd_def2_C1, wd_ns_def2_C0, clk);
wire wd_ns_def2_C1 = wd_def2_C1 & wd_wen_C1 & ~kill_C1;
wire wd_def2_C2;
xtdelay1 #(1) iwd_def2_C2(wd_def2_C2, wd_ns_def2_C1, clk);
// write enable pipeline
wire wd_we_C2;
wire wd_we_C3;
wire wd_ns_we_C1 = (1'd0 | (wd_def1_C1 & wd_wen_C1)) & ~kill_C1;
wire wd_ns_we_C2 = (wd_we_C2 | (wd_def2_C2 & wd_wen_C2)) & ~kill_C2;
wire wd_ns_we_C3 = (wd_we_C3 | (1'd0 & 1'd0)) & ~kill_C3;
xtdelay1 #(1) iwd_we_C2(wd_we_C2, wd_ns_we_C1, clk);
xtdelay1 #(1) iwd_we_C3(wd_we_C3, wd_ns_we_C2, clk);
// write address pipeline
wire [3:0] wd_addr_C1;
wire [3:0] wd_addr_C2;
wire [3:0] wd_addr_C3;
xtdelay1 #(4) iwd_addr_C1(wd_addr_C1, wd_addr_C0, clk);
xtdelay1 #(4) iwd_addr_C2(wd_addr_C2, wd_addr_C1, clk);
xtdelay1 #(4) iwd_addr_C3(wd_addr_C3, wd_addr_C2, clk);
// write data pipeline
wire [7:0] wd_result_C2;
wire [7:0] wd_result_C3;
wire [7:0] wd_mux_C1 = wd_data_C1;
wire [7:0] wd_mux_C2 = wd_def2_C2 ? wd_data_C2 : wd_result_C2;
xtdelay1 #(8) iwd_result_C2(wd_result_C2, wd_mux_C1, clk);
xtdelay1 #(8) iwd_result_C3(wd_result_C3, wd_mux_C2, clk);
wire [7:0] rd0_data_C0;
wire [7:0] rd1_data_C0;
wire [7:0] rd2_data_C0;
// Read bypass controls for port rd0
wire bypass_data_rd0_C0_wd_C1 = (wd_addr_C1 == rd0_addr_C0) & wd_def1_C1 & wd_wen_C1 & ~kill_C1;
wire bypass_data_rd0_C0_wd_C2 = (wd_addr_C2 == rd0_addr_C0) & wd_def2_C2 & wd_wen_C2 & ~kill_C2;
wire bypass_result_rd0_C0_wd_C2 = (wd_addr_C2 == rd0_addr_C0) & wd_we_C2 & ~kill_C2;
wire bypass_result_rd0_C0_wd_C3 = (wd_addr_C3 == rd0_addr_C0) & wd_we_C3 & ~kill_C3;
// Read bypass for port rd0 use 1
wire [7:0] rd0_mux_result_C0;
xtmux3p #(8) m0(rd0_mux_result_C0, wd_result_C2, wd_result_C3, rd0_data_C0,
    bypass_result_rd0_C0_wd_C2, bypass_result_rd0_C0_wd_C3);
wire [7:0] rd0_mux_C0;
wire [1:0] rd0_mux_C0_sel =
    bypass_data_rd0_C0_wd_C1 ? 2'd1 :
    bypass_data_rd0_C0_wd_C2 ? 2'd2 :
    bypass_result_rd0_C0_wd_C2 ? 2'd0 :
    bypass_result_rd0_C0_wd_C3 ? 2'd0 :
    2'd0;
xtmux3e #(8) m1(rd0_mux_C0, rd0_mux_result_C0, wd_data_C1, wd_data_C2, rd0_mux_C0_sel);
xtdelay1 #(8) ird0_data_C1(rd0_data_C1, rd0_mux_C0, clk);
// Read bypass controls for port rd1
wire bypass_data_rd1_C0_wd_C1 = (wd_addr_C1 == rd1_addr_C0) & wd_def1_C1 & wd_wen_C1 & ~kill_C1;
wire bypass_data_rd1_C0_wd_C2 = (wd_addr_C2 == rd1_addr_C0) & wd_def2_C2 & wd_wen_C2 & ~kill_C2;
wire bypass_result_rd1_C0_wd_C2 = (wd_addr_C2 == rd1_addr_C0) & wd_we_C2 & ~kill_C2;
wire bypass_result_rd1_C0_wd_C3 = (wd_addr_C3 == rd1_addr_C0) & wd_we_C3 & ~kill_C3;
// Read bypass for port rd1 use 1
wire [7:0] rd1_mux_result_C0;
xtmux3p #(8) m2(rd1_mux_result_C0, wd_result_C2, wd_result_C3, rd1_data_C0,
    bypass_result_rd1_C0_wd_C2, bypass_result_rd1_C0_wd_C3);
wire [7:0] rd1_mux_C0;
wire [1:0] rd1_mux_C0_sel =
    bypass_data_rd1_C0_wd_C1 ? 2'd1 :
    bypass_data_rd1_C0_wd_C2 ? 2'd2 :
    bypass_result_rd1_C0_wd_C2 ? 2'd0 :
    bypass_result_rd1_C0_wd_C3 ? 2'd0 :
    2'd0;
xtmux3e #(8) m3(rd1_mux_C0, rd1_mux_result_C0, wd_data_C1, wd_data_C2, rd1_mux_C0_sel);
xtdelay1 #(8) ird1_data_C1(rd1_data_C1, rd1_mux_C0, clk);
// Read bypass controls for port rd2
wire bypass_data_rd2_C0_wd_C1 = (wd_addr_C1 == rd2_addr_C0) & wd_def1_C1 & wd_wen_C1 & ~kill_C1;
wire bypass_data_rd2_C0_wd_C2 = (wd_addr_C2 == rd2_addr_C0) & wd_def2_C2 & wd_wen_C2 & ~kill_C2;
wire bypass_result_rd2_C0_wd_C2 = (wd_addr_C2 == rd2_addr_C0) & wd_we_C2 & ~kill_C2;
wire bypass_result_rd2_C0_wd_C3 = (wd_addr_C3 == rd2_addr_C0) & wd_we_C3 & ~kill_C3;
// Read bypass for port rd2 use 1
wire [7:0] rd2_mux_result_C0;
xtmux3p #(8) m4(rd2_mux_result_C0, wd_result_C2, wd_result_C3, rd2_data_C0,
    bypass_result_rd2_C0_wd_C2, bypass_result_rd2_C0_wd_C3);
wire [7:0] rd2_mux_C0;
wire [1:0] rd2_mux_C0_sel =
    bypass_data_rd2_C0_wd_C1 ? 2'd1 :
    bypass_data_rd2_C0_wd_C2 ? 2'd2 :
    bypass_result_rd2_C0_wd_C2 ? 2'd0 :
    bypass_result_rd2_C0_wd_C3 ? 2'd0 :
    2'd0;
xtmux3e #(8) m5(rd2_mux_C0, rd2_mux_result_C0, wd_data_C1, wd_data_C2, rd2_mux_C0_sel);
xtdelay1 #(8) ird2_data_C1(rd2_data_C1, rd2_mux_C0, clk);
assign Stall_R =
    ((wd_addr_C1 == rd0_addr_C0) & (
        (rd0_use1_C0 & (wd_ns_def2_C1)))) |
    ((wd_addr_C1 == rd1_addr_C0) & (
        (rd1_use1_C0 & (wd_ns_def2_C1)))) |
    ((wd_addr_C1 == rd2_addr_C0) & (
        (rd2_use1_C0 & (wd_ns_def2_C1)))) |
    1'b0;
// register file core
xtregfile_3R1W_16 #(8) icore(rd0_data_C0, rd0_addr_C0, rd1_data_C0, rd1_addr_C0,
    rd2_data_C0, rd2_addr_C0, wd_result_C3, wd_addr_C3, wd_ns_we_C3, clk);
endmodule
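The bank module above resolves each read port by preferring the youngest in-flight write to the same register over the stored value, and it raises `Stall_R` when the needed value will not exist until a later stage. The following C model sketches that selection order for one read port. It assumes that `xtmux3p` is a priority multiplexer that picks `wd_result_C2` before `wd_result_C3` before the core data; that primitive is not defined in this section, so the ordering of the last three cases is an assumption. The type and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical bundle of the four bypass conditions computed for a read port. */
typedef struct {
    int data_c1;    /* bypass_data_*_wd_C1: writer in C1 produces data now   */
    int data_c2;    /* bypass_data_*_wd_C2: writer in C2 produces data now   */
    int result_c2;  /* bypass_result_*_wd_C2: earlier result held in C2      */
    int result_c3;  /* bypass_result_*_wd_C3: earlier result held in C3      */
} bypass_t;

/* Select the value seen by a read port, youngest matching writer first. */
static uint8_t read_port(bypass_t b,
                         uint8_t wd_data_c1, uint8_t wd_data_c2,
                         uint8_t wd_result_c2, uint8_t wd_result_c3,
                         uint8_t core_data)
{
    if (b.data_c1)   return wd_data_c1;    /* sel == 2'd1 */
    if (b.data_c2)   return wd_data_c2;    /* sel == 2'd2 */
    if (b.result_c2) return wd_result_c2;  /* assumed xtmux3p priority */
    if (b.result_c3) return wd_result_c3;
    return core_data;                      /* no write in flight */
}

/* Interlock term for one port: the reader in C0 uses a register whose
 * writer in C1 does not define its result until the second write stage. */
static int stall_term(unsigned wd_addr_c1, unsigned rd_addr_c0,
                      int rd_use1_c0, int wd_ns_def2_c1)
{
    return (wd_addr_c1 == rd_addr_c0) && rd_use1_c0 && wd_ns_def2_c1;
}
```

`Stall_R` in the listing is the OR of this term over the three read ports.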
module xmTIE_gfmod_State(ps_data_C1, ps_width8_C0, ps_use1_C0, ns_width8_C0,
    ns_def1_C0, ns_data8_C1, ns_wen_C1, Kill_E, KillPipe_W, Stall_R, clk);
output [7:0] ps_data_C1;
input ps_width8_C0;
input ps_use1_C0;
input ns_width8_C0;
input ns_def1_C0;
input [7:0] ns_data8_C1;
input ns_wen_C1;
input Kill_E;
input KillPipe_W;
output Stall_R;
input clk;
wire ps_addr_C0 = 1'd0;
wire ns_addr_C0 = 1'd0;
wire ns_wen_C2 = 1'd1;
/***********************************************************************
READ PORT ps
***********************************************************************/
// compute the address mask
wire ps_addr_mask_C0 = 1'd0;
// masked address pipeline
wire ps_maddr_C0 = 1'd0;
// bank-qualified use
wire ps_use1_bank0_C0 = (ps_use1_C0 & (ps_maddr_C0 == (1'd0 & ps_addr_mask_C0)));
// alignment mux for use 1
wire [7:0] ps_data_bank0_C1;
assign ps_data_C1[7:0] = ps_data_bank0_C1;
/***********************************************************************
WRITE PORT ns
***********************************************************************/
// compute the address mask
wire ns_addr_mask_C0 = 1'd0;
// bank-qualified write def for port ns
wire ns_def1_bank0_C0 = (ns_def1_C0 & ((ns_addr_C0 & ns_addr_mask_C0) == (1'd0 & ns_addr_mask_C0)));
// write mux for def 1
wire [7:0] ns_wdata_C1;
assign ns_wdata_C1 = {1{ns_data8_C1[7:0]}};
wire Stall_R0;
/***********************************************************************
PIPELINED BANK
***********************************************************************/
xmTIE_gfmod_State_bank TIE_gfmod_State_bank0(ps_data_bank0_C1,
    ps_use1_bank0_C0, ns_def1_bank0_C0, ns_wdata_C1[7:0], ns_wen_C1,
    ns_wen_C2, Kill_E, KillPipe_W, Stall_R0, clk);
assign Stall_R = Stall_R0 | 1'b0;
endmodule
module xmTIE_gfmod_State_bank(ps_data_Cl, ps_usel_C0, ns_defl_C0, ns -data Cl, ns_wen_Cl, ns_wen_C2, Kill_Ξ, KillPipe_W, Stall_R, elk); output [7:0] ps_data_Cl; input ps_usel_C0 ; input ns_defl_C0; input [7:0] ns_data_ Cl; input ns_wen_Cl; input ns_wen_C2 ,- input Kill_E; input KillPipe_W; output Stall_R; input elk; wire ps_addr_C0 = 1 ' do wire ps_use2_C0 = 1'dO wire ns_addr_C0 = 1 ' do wire ns_def2_C0 = 1'dO wire [7:0] ns data C2 = 0; wire kill_C0 KillPipe_W; wire kill_Cl KillPipe_W Kill E; wire kill_C2 KillPipe_W; wire kill C3 KillPipe_W;
// write definition pipeline
wire ns_ns_def1_C0 = ns_def1_C0 & 1'b1 & ~kill_C0;
wire ns_def1_C1;
xtdelay1 #(1) ins_def1_C1 (ns_def1_C1, ns_ns_def1_C0, clk);
wire ns_ns_def2_C0 = 1'd0;
wire ns_def2_C1 = 1'd0;
wire ns_ns_def2_C1 = 1'd0;
wire ns_def2_C2 = 1'd0;
// write enable pipeline
wire ns_we_C2;
wire ns_we_C3;
wire ns_ns_we_C1 = (1'd0 | (ns_def1_C1 & ns_wen_C1)) & ~kill_C1;
wire ns_ns_we_C2 = (ns_we_C2 | (ns_def2_C2 & ns_wen_C2)) & ~kill_C2;
wire ns_ns_we_C3 = (ns_we_C3 | (1'd0 & 1'd0)) & ~kill_C3;
xtdelay1 #(1) ins_we_C2 (ns_we_C2, ns_ns_we_C1, clk);
xtdelay1 #(1) ins_we_C3 (ns_we_C3, ns_ns_we_C2, clk);
// write address pipeline
wire ns_addr_C1;
wire ns_addr_C2;
wire ns_addr_C3;
assign ns_addr_C1 = 1'd0;
assign ns_addr_C2 = 1'd0;
assign ns_addr_C3 = 1'd0;
// write data pipeline
wire [7:0] ns_result_C2;
wire [7:0] ns_result_C3;
wire [7:0] ns_mux_C1 = ns_data_C1;
wire [7:0] ns_mux_C2 = ns_def2_C2 ? ns_data_C2 : ns_result_C2;
xtdelay1 #(8) ins_result_C2 (ns_result_C2, ns_mux_C1, clk);
xtdelay1 #(8) ins_result_C3 (ns_result_C3, ns_mux_C2, clk);
wire [7:0] ps_data_C0;
// Read bypass controls for port ps
wire bypass_data_ps_C0_ns_C1 = (ns_addr_C1 == ps_addr_C0) & ns_def1_C1 & ns_wen_C1 & ~kill_C1;
wire bypass_result_ps_C0_ns_C2 = (ns_addr_C2 == ps_addr_C0) & ns_we_C2 & ~kill_C2;
wire bypass_result_ps_C0_ns_C3 = (ns_addr_C3 == ps_addr_C0) & ns_we_C3 & ~kill_C3;
// Read bypass for port ps use 1
wire [7:0] ps_mux_result_C0;
xtmux3p #(8) m6 (ps_mux_result_C0, ns_result_C2, ns_result_C3, ps_data_C0, bypass_result_ps_C0_ns_C2, bypass_result_ps_C0_ns_C3);
wire [7:0] ps_mux_C0;
wire [0:0] ps_mux_C0_sel =
    bypass_data_ps_C0_ns_C1 ? 1'd1 :
    bypass_result_ps_C0_ns_C2 ? 1'd0 :
    bypass_result_ps_C0_ns_C3 ? 1'd0 :
    1'd0;
xtmux2e #(8) m7 (ps_mux_C0, ps_mux_result_C0, ns_data_C1, ps_mux_C0_sel);
xtdelay1 #(8) ips_data_C1 (ps_data_C1, ps_mux_C0, clk);
assign Stall_R =
    ((ns_addr_C1 == ps_addr_C0) & (
        (ps_use1_C0 & (ns_ns_def2_C1)))) | 1'b0;
// register file core
xtregfile_1R1W_1 #(8) icore (ps_data_C0, ns_result_C3, ns_ns_we_C3, clk);
endmodule
module xmTIE_decoder (
GFADD8,
GFADD8I,
GFMULX8,
GFRWMOD8,
LGF8_I,
SGF8_I,
LGF8_IU,
SGF8_IU,
LGF8_X,
SGF8_X,
LGF8_XU,
SGF8_XU,
RUR0,
WUR0,
imm4,
imm8,
art_use,
art_def,
ars_use,
ars_def,
arr_use,
arr_def,
br_use,
br_def,
bs_use,
bs_def,
bt_use,
// [Figures imgf000131_0001 and imgf000131_0002: the rest of the port list and the first output declarations survive only as images in the source.]
output [7:0] imm8;
output art_use;
output art_def;
output ars_use;
output ars_def;
output arr_use;
output arr_def;
output br_use;
output br_def;
output bs_use;
output bs_def;
output bt_use;
output bt_def;
output bs4_use;
output bs4_def;
output bs8_use;
output bs8_def;
output gr_use;
output gr_def;
output gs_use;
output gs_def;
output gt_use;
output gt_def;
output gfmod_use1;
output gfmod_def1;
output AR_rd0_use1;
output AR_rd0_width32;
output AR_rd1_use1;
output AR_rd1_width32;
output AR_wd_def1;
output AR_wd_width32;
output [3:0] gf_rd0_addr;
output gf_rd0_use1;
output gf_rd0_width8;
output [3:0] gf_rd1_addr;
output gf_rd1_use1;
output gf_rd1_width8;
output [3:0] gf_rd2_addr;
output gf_rd2_use1;
output gf_rd2_width8;
output [3:0] gf_wd_addr;
output gf_wd_def2;
output gf_wd_def1;
output gf_wd_width8;
output gf1_semantic;
output gf4_semantic;
output gf2_semantic;
output gf3_semantic;
output lgf_semantic;
output sgf_semantic;
output RUR0_semantic;
output WUR0_semantic;
output load_instruction;
output store_instruction;
output TIE_Inst;
input [23:0] Inst;
wire [3:0] op2 = {Inst[23:20]};
wire [3:0] op1 = {Inst[19:16]};
wire [3:0] op0 = {Inst[3:0]};
wire QRST = (op0==4'b0000);
wire CUST0 = (op1==4'b0110) & QRST;
assign GFADD8 = (op2==4'b0000) & CUST0;
assign GFADD8I = (op2==4'b0100) & CUST0;
assign GFMULX8 = (op2==4'b0001) & CUST0;
assign GFRWMOD8 = (op2==4'b0010) & CUST0;
wire [3:0] r = {Inst[15:12]};
wire LSCI = (op0==4'b0011);
assign LGF8_I = (r==4'b0000) & LSCI;
assign SGF8_I = (r==4'b0001) & LSCI;
assign LGF8_IU = (r==4'b0010) & LSCI;
assign SGF8_IU = (r==4'b0011) & LSCI;
wire LSCX = (op1==4'b1000) & QRST;
assign LGF8_X = (op2==4'b0000) & LSCX;
assign SGF8_X = (op2==4'b0001) & LSCX;
assign LGF8_XU = (op2==4'b0010) & LSCX;
assign SGF8_XU = (op2==4'b0011) & LSCX;
wire [3:0] s = {Inst[11:8]};
wire [3:0] t = {Inst[7:4]};
wire [7:0] st = {s,t};
wire RST3 = (op1==4'b0011) & QRST;
wire RUR = (op2==4'b1110) & RST3;
assign RUR0 = (st==8'b00000000) & RUR;
wire [7:0] sr = {r,s};
wire WUR = (op2==4'b1111) & RST3;
assign WUR0 = (sr==8'b00000000) & WUR;
assign gfmod_use1 = GFMULX8 | GFRWMOD8 | RUR0 | 1'b0;
assign gfmod_def1 = GFRWMOD8 | WUR0 | 1'b0;
assign AR_rd0_use1 = 1'b0
    | LGF8_I
    | SGF8_I
    | LGF8_IU
    | SGF8_IU
    | LGF8_X
    | SGF8_X
    | LGF8_XU
    | SGF8_XU;
assign AR_rd0_width32 = 1'b0;
assign AR_rd1_use1 = 1'b0
    | LGF8_X
    | SGF8_X
    | LGF8_XU
    | SGF8_XU
    | WUR0;
assign AR_rd1_width32 = 1'b0;
assign AR_wd_def1 = 1'b0
    | LGF8_IU
    | SGF8_IU
    | LGF8_XU
    | SGF8_XU
    | RUR0;
assign AR_wd_width32 = 1'b0;
assign gf_rd0_use1 = 1'b0
    | GFADD8
    | GFADD8I
    | GFMULX8;
assign gf_rd0_width8 = 1'b0;
assign gf_rd1_use1 = 1'b0
    | GFADD8
    | GFRWMOD8
    | SGF8_I
    | SGF8_IU;
assign gf_rd1_width8 = 1'b0;
assign gf_rd2_use1 = 1'b0
    | SGF8_X
    | SGF8_XU;
assign gf_rd2_width8 = 1'b0;
assign gf_wd_def2 = 1'b0
    | LGF8_I
    | LGF8_IU
    | LGF8_X
    | LGF8_XU;
assign gf_wd_def1 = 1'b0
    | GFADD8
    | GFADD8I
    | GFMULX8
    | GFRWMOD8;
assign gf_wd_width8 = 1'b0;
assign art_def = 1'b0;
assign art_use = LGF8_X | SGF8_X | LGF8_XU | SGF8_XU | WUR0 | 1'b0;
assign ars_def = LGF8_IU | SGF8_IU | LGF8_XU | SGF8_XU | 1'b0;
assign ars_use = LGF8_I | SGF8_I | LGF8_IU | SGF8_IU | LGF8_X | SGF8_X |
    LGF8_XU | SGF8_XU | 1'b0;
assign arr_def = RUR0 | 1'b0;
assign arr_use = 1'b0;
assign br_def = 1'b0;
assign br_use = 1'b0;
assign bs_def = 1'b0;
assign bs_use = 1'b0;
assign bt_def = 1'b0;
assign bt_use = 1'b0;
assign bs4_def = 1'b0;
assign bs4_use = 1'b0;
assign bs8_def = 1'b0;
assign bs8_use = 1'b0;
assign gr_def = GFADD8 | GFADD8I | GFMULX8 | LGF8_X | LGF8_XU | 1'b0;
assign gr_use = SGF8_X | SGF8_XU | 1'b0;
assign gs_def = 1'b0;
assign gs_use = GFADD8 | GFADD8I | GFMULX8 | 1'b0;
assign gt_def = GFRWMOD8 | LGF8_I | LGF8_IU | 1'b0;
assign gt_use = GFADD8 | GFRWMOD8 | SGF8_I | SGF8_IU | 1'b0;
wire [3:0] gr_addr = r;
wire [3:0] gs_addr = s;
wire [3:0] gt_addr = t;
assign gf_wd_addr = 4'b0
    | {4{gr_def}} & gr_addr
    | {4{gt_def}} & gt_addr;
assign gf_rd0_addr = gs_addr;
assign gf_rd1_addr = gt_addr;
assign gf_rd2_addr = gr_addr;
assign gf1_semantic = GFADD8 | 1'b0;
assign gf4_semantic = GFADD8I | 1'b0;
assign gf2_semantic = GFMULX8 | 1'b0;
assign gf3_semantic = GFRWMOD8 | 1'b0;
assign lgf_semantic = LGF8_I | LGF8_IU | LGF8_X | LGF8_XU | 1'b0;
assign sgf_semantic = SGF8_I | SGF8_IU | SGF8_X | SGF8_XU | 1'b0;
assign RUR0_semantic = RUR0 | 1'b0;
assign WUR0_semantic = WUR0 | 1'b0;
assign imm4 = t;
wire [7:0] imm8 = {Inst[23:16]};
assign load_instruction = 1'b0
    | LGF8_I
    | LGF8_IU
    | LGF8_X
    | LGF8_XU;
assign store_instruction = 1'b0
    | SGF8_I
    | SGF8_IU
    | SGF8_X
    | SGF8_XU;
assign TIE_Inst = 1'b0
    | GFADD8
    | GFADD8I
    | GFMULX8
    | GFRWMOD8
    | LGF8_I
    | SGF8_I
    | LGF8_IU
    | SGF8_IU
    | LGF8_X
    | SGF8_X
    | LGF8_XU
    | SGF8_XU
    | RUR0
    | WUR0;
endmodule
module xmTIE_gf1 (
GFADD8_C0,
gr_o_C1,
gr_kill_C1,
gs_i_C1,
gt_i_C1,
clk
);
input GFADD8_C0;
output [7:0] gr_o_C1;
output gr_kill_C1;
input [7:0] gs_i_C1;
input [7:0] gt_i_C1;
input clk;
assign gr_o_C1 = (gs_i_C1) ^ (gt_i_C1);
wire GFADD8_C1;
xtdelay1 #(1) iGFADD8_C1 (.xtin(GFADD8_C0), .xtout(GFADD8_C1), .clk(clk));
assign gr_kill_C1 = (1'b0) & (GFADD8_C1);
endmodule
module xmTIE_gf4 (
GFADD8I_C0,
gr_o_C1,
gr_kill_C1,
gs_i_C1,
imm4_C0,
clk
);
input GFADD8I_C0;
output [7:0] gr_o_C1;
output gr_kill_C1;
input [7:0] gs_i_C1;
input [31:0] imm4_C0;
input clk;
wire [31:0] imm4_C1;
xtdelay1 #(32) iimm4_C1 (.xtin(imm4_C0), .xtout(imm4_C1), .clk(clk));
assign gr_o_C1 = (gs_i_C1) ^ (imm4_C1);
wire GFADD8I_C1;
xtdelay1 #(1) iGFADD8I_C1 (.xtin(GFADD8I_C0), .xtout(GFADD8I_C1), .clk(clk));
assign gr_kill_C1 = (1'b0) & (GFADD8I_C1);
endmodule
module xmTIE_gf2 (
GFMULX8_C0,
gr_o_C1,
gr_kill_C1,
gs_i_C1,
gfmod_ps_C1,
clk
);
input GFMULX8_C0;
output [7:0] gr_o_C1;
output gr_kill_C1;
input [7:0] gs_i_C1;
input [7:0] gfmod_ps_C1;
input clk;
assign gr_o_C1 = (gs_i_C1[7]) ? (({gs_i_C1[6:0], 1'b0}) ^ (gfmod_ps_C1)) :
    ({gs_i_C1[6:0], 1'b0});
wire GFMULX8_C1;
xtdelay1 #(1) iGFMULX8_C1 (.xtin(GFMULX8_C0), .xtout(GFMULX8_C1), .clk(clk));
assign gr_kill_C1 = (1'b0) & (GFMULX8_C1);
endmodule
module xmTIE_gf3 (
GFRWMOD8_C0,
gt_i_C1,
gt_o_C1,
gt_kill_C1,
gfmod_ps_C1,
gfmod_ns_C1,
gfmod_kill_C1,
clk
);
input GFRWMOD8_C0;
input [7:0] gt_i_C1;
output [7:0] gt_o_C1;
output gt_kill_C1;
input [7:0] gfmod_ps_C1;
output [7:0] gfmod_ns_C1;
output gfmod_kill_C1;
input clk;
wire [7:0] t1_C1;
assign t1_C1 = gt_i_C1;
wire [7:0] t2_C1;
assign t2_C1 = gfmod_ps_C1;
assign gfmod_ns_C1 = t1_C1;
assign gt_o_C1 = t2_C1;
wire GFRWMOD8_C1;
xtdelay1 #(1) iGFRWMOD8_C1 (.xtin(GFRWMOD8_C0), .xtout(GFRWMOD8_C1), .clk(clk));
assign gfmod_kill_C1 = (1'b0) & (GFRWMOD8_C1);
assign gt_kill_C1 = (1'b0) & (GFRWMOD8_C1);
endmodule
module xmTIE_lgf (
LGF8_I_C0,
LGF8_IU_C0,
LGF8_X_C0,
LGF8_XU_C0,
gt_o_C2,
gt_kill_C2,
ars_i_C1,
ars_o_C1,
ars_kill_C1,
imm8_C0,
gr_o_C2,
gr_kill_C2,
art_i_C1,
MemDataIn8_C2,
VAddrIn_C1,
LSSize_C0,
VAddrBase_C1,
VAddrIndex_C1,
VAddrOffset_C0,
LSIndexed_C0,
clk
);
input LGF8_I_C0;
input LGF8_IU_C0;
input LGF8_X_C0;
input LGF8_XU_C0;
output [7:0] gt_o_C2;
output gt_kill_C2;
input [31:0] ars_i_C1;
output [31:0] ars_o_C1;
output ars_kill_C1;
input [7:0] imm8_C0;
output [7:0] gr_o_C2;
output gr_kill_C2;
input [31:0] art_i_C1;
input [7:0] MemDataIn8_C2;
input [31:0] VAddrIn_C1;
output [4:0] LSSize_C0;
output [31:0] VAddrBase_C1;
output [31:0] VAddrIndex_C1;
output [31:0] VAddrOffset_C0;
output LSIndexed_C0;
input clk;
wire indexed_C0;
assign indexed_C0 = (LGF8_X_C0) | (LGF8_XU_C0);
assign LSSize_C0 = 32'h1;
assign VAddrBase_C1 = ars_i_C1;
assign LSIndexed_C0 = indexed_C0;
assign VAddrOffset_C0 = imm8_C0;
assign VAddrIndex_C1 = art_i_C1;
assign gt_o_C2 = MemDataIn8_C2;
assign gr_o_C2 = MemDataIn8_C2;
assign ars_o_C1 = VAddrIn_C1;
wire LGF8_I_C2;
xtdelay2 #(1) iLGF8_I_C2 (.xtin(LGF8_I_C0), .xtout(LGF8_I_C2), .clk(clk));
wire LGF8_IU_C2;
xtdelay2 #(1) iLGF8_IU_C2 (.xtin(LGF8_IU_C0), .xtout(LGF8_IU_C2), .clk(clk));
assign gt_kill_C2 = (1'b0) & ((LGF8_I_C2) | (LGF8_IU_C2));
wire LGF8_IU_C1;
xtdelay1 #(1) iLGF8_IU_C1 (.xtin(LGF8_IU_C0), .xtout(LGF8_IU_C1), .clk(clk));
wire LGF8_XU_C1;
xtdelay1 #(1) iLGF8_XU_C1 (.xtin(LGF8_XU_C0), .xtout(LGF8_XU_C1), .clk(clk));
assign ars_kill_C1 = (1'b0) & ((LGF8_IU_C1) | (LGF8_XU_C1));
wire LGF8_X_C2;
xtdelay2 #(1) iLGF8_X_C2 (.xtin(LGF8_X_C0), .xtout(LGF8_X_C2), .clk(clk));
wire LGF8_XU_C2;
xtdelay2 #(1) iLGF8_XU_C2 (.xtin(LGF8_XU_C0), .xtout(LGF8_XU_C2), .clk(clk));
assign gr_kill_C2 = (1'b0) & ((LGF8_X_C2) | (LGF8_XU_C2));
endmodule
module xmTIE_sgf (
SGF8_I_C0,
SGF8_IU_C0,
SGF8_X_C0,
SGF8_XU_C0,
gt_i_C1,
ars_i_C1,
ars_o_C1,
ars_kill_C1,
imm8_C0,
gr_i_C1,
art_i_C1,
VAddrIn_C1,
LSSize_C0,
MemDataOut8_C1,
VAddrBase_C1,
VAddrIndex_C1,
VAddrOffset_C0,
LSIndexed_C0,
clk
);
input SGF8_I_C0;
input SGF8_IU_C0;
input SGF8_X_C0;
input SGF8_XU_C0;
input [7:0] gt_i_C1;
input [31:0] ars_i_C1;
output [31:0] ars_o_C1;
output ars_kill_C1;
input [7:0] imm8_C0;
input [7:0] gr_i_C1;
input [31:0] art_i_C1;
input [31:0] VAddrIn_C1;
output [4:0] LSSize_C0;
output [7:0] MemDataOut8_C1;
output [31:0] VAddrBase_C1;
output [31:0] VAddrIndex_C1;
output [31:0] VAddrOffset_C0;
output LSIndexed_C0;
input clk;
wire indexed_C0;
assign indexed_C0 = (SGF8_X_C0) | (SGF8_XU_C0);
assign LSSize_C0 = 32'h1;
assign VAddrBase_C1 = ars_i_C1;
assign LSIndexed_C0 = indexed_C0;
assign VAddrOffset_C0 = imm8_C0;
assign VAddrIndex_C1 = art_i_C1;
wire SGF8_X_C1;
xtdelay1 #(1) iSGF8_X_C1 (.xtin(SGF8_X_C0), .xtout(SGF8_X_C1), .clk(clk));
wire SGF8_XU_C1;
xtdelay1 #(1) iSGF8_XU_C1 (.xtin(SGF8_XU_C0), .xtout(SGF8_XU_C1), .clk(clk));
assign MemDataOut8_C1 = ((SGF8_X_C1) | (SGF8_XU_C1)) ? (gr_i_C1) :
    (gt_i_C1);
assign ars_o_C1 = VAddrIn_C1;
wire SGF8_IU_C1;
xtdelay1 #(1) iSGF8_IU_C1 (.xtin(SGF8_IU_C0), .xtout(SGF8_IU_C1), .clk(clk));
assign ars_kill_C1 = (1'b0) & ((SGF8_IU_C1) | (SGF8_XU_C1));
endmodule
module xmTIE_RUR0 (
RUR0_C0,
arr_o_C1,
arr_kill_C1,
gfmod_ps_C1,
clk
);
input RUR0_C0;
output [31:0] arr_o_C1;
output arr_kill_C1;
input [7:0] gfmod_ps_C1;
input clk;
assign arr_o_C1 = {gfmod_ps_C1};
wire RUR0_C1;
xtdelay1 #(1) iRUR0_C1 (.xtin(RUR0_C0), .xtout(RUR0_C1), .clk(clk));
assign arr_kill_C1 = (1'b0) & (RUR0_C1);
endmodule
module xmTIE_WUR0 (
WUR0_C0,
art_i_C1,
gfmod_ns_C1,
gfmod_kill_C1,
clk
);
input WUR0_C0;
input [31:0] art_i_C1;
output [7:0] gfmod_ns_C1;
output gfmod_kill_C1;
input clk;
assign gfmod_ns_C1 = {art_i_C1[7:0]};
wire WUR0_C1;
xtdelay1 #(1) iWUR0_C1 (.xtin(WUR0_C0), .xtout(WUR0_C1), .clk(clk));
assign gfmod_kill_C1 = (1'b0) & (WUR0_C1);
endmodule
module xmTIE (
TIE_inst_R,
TIE_asRead_R,
TIE_atRead_R,
TIE_atWrite_R,
TIE_arWrite_R,
TIE_asWrite_R,
TIE_aWriteM_R,
TIE_aDataKill_E,
TIE_aWriteData_E,
TIE_aDataKill_M,
TIE_aWriteData_M,
TIE_Load_R,
TIE_Store_R,
TIE_LSSize_R,
TIE_LSIndexed_R,
TIE_LSOffset_R,
TIE_MemLoadData_M,
TIE_MemStoreData8_E ,
TIE_MemStoreData16_E,
TIE_MemStoreData32_E,
TIE_MemStoreData64_E,
TIE_MemStoreData128_E,
TIE_Stall_R,
TIE_Exception_E ,
TIE_ExcCause_E ,
TIE_bsRead_R,
TIE_btRead_R,
TIE_btWrite_R,
TIE_brWrite_R,
TIE_bsWrite_R,
TIE_bsReadSize_R,
TIE_btReadSize_R,
TIE_bWriteSize_R,
TIE_bsReadData_E ,
TIE_btReadData_E,
TIE_bWriteData1_E,
TIE_bWriteData2_E ,
TIE_bWriteData4_E,
TIE_bWriteData8_E ,
TIE_bWriteData16_E,
TIE_bDataKill_E,
CPEnable,
Instr_R,
SBus_E ,
TBus_E,
MemOpAddr_E ,
Kill_E,
Except_W,
Replay_W,
G1WCLK,
Reset
);
output TIE_inst_R;
output TIE_asRead_R;
output TIE_atRead_R;
output TIE_atWrite_R;
output TIE_arWrite_R;
output TIE_asWrite_R;
output TIE_aWriteM_R;
output TIE_aDataKill_E;
output [31:0] TIE_aWriteData_E;
output TIE_aDataKill_M;
output [31:0] TIE_aWriteData_M;
output TIE_Load_R;
output TIE_Store_R;
output [4:0] TIE_LSSize_R;
output TIE_LSIndexed_R;
output [31:0] TIE_LSOffset_R;
input [127:0] TIE_MemLoadData_M;
output [7:0] TIE_MemStoreData8_E;
output [15:0] TIE_MemStoreData16_E;
output [31:0] TIE_MemStoreData32_E;
output [63:0] TIE_MemStoreData64_E;
output [127:0] TIE_MemStoreData128_E;
output TIE_Stall_R;
output TIE_Exception_E;
output [5:0] TIE_ExcCause_E;
output TIE_bsRead_R;
output TIE_btRead_R;
output TIE_btWrite_R;
output TIE_brWrite_R;
output TIE_bsWrite_R;
output [4:0] TIE_bsReadSize_R;
output [4:0] TIE_btReadSize_R;
output [4:0] TIE_bWriteSize_R;
input [15:0] TIE_bsReadData_E;
input [15:0] TIE_btReadData_E;
output TIE_bWriteData1_E;
output [1:0] TIE_bWriteData2_E;
output [3:0] TIE_bWriteData4_E;
output [7:0] TIE_bWriteData8_E;
output [15:0] TIE_bWriteData16_E;
output TIE_bDataKill_E;
input [7:0] CPEnable;
input [23:0] Instr_R;
input [31:0] SBus_E;
input [31:0] TBus_E;
input [31:0] MemOpAddr_E;
input Kill_E;
input Except_W;
input Replay_W;
input G1WCLK;
input Reset;
// unused signals
wire TMode = 0;
// control signals
wire KillPipe_W;
wire clk;
// decoded signals
wire GFADD8_C0;
wire GFADD8I_C0;
wire GFMULX8_C0;
wire GFRWMOD8_C0;
wire LGF8_I_C0;
wire SGF8_I_C0;
wire LGF8_IU_C0;
wire SGF8_IU_C0;
wire LGF8_X_C0;
wire SGF8_X_C0;
wire LGF8_XU_C0;
wire SGF8_XU_C0;
wire RUR0_C0;
wire WUR0_C0;
wire [31:0] imm4_C0;
wire [7:0] imm8_C0;
wire art_use_C0;
wire art_def_C0;
wire ars_use_C0;
wire ars_def_C0;
wire arr_use_C0;
// [Figure imgf000142_0001: the declarations between arr_use_C0 and gf_rd0_addr_C0 survive only as an image in the source.]
wire [3:0] gf_rd0_addr_C0;
wire gf_rd0_use1_C0;
wire gf_rd0_width8_C0;
wire [3:0] gf_rd1_addr_C0;
wire gf_rd1_use1_C0;
wire gf_rd1_width8_C0;
wire [3:0] gf_rd2_addr_C0;
wire gf_rd2_use1_C0;
wire gf_rd2_width8_C0;
wire [3:0] gf_wd_addr_C0;
wire gf_wd_def2_C0;
wire gf_wd_def1_C0;
wire gf_wd_width8_C0;
wire gf1_semantic_C0;
wire gf4_semantic_C0;
wire gf2_semantic_C0;
wire gf3_semantic_C0;
wire lgf_semantic_C0;
wire sgf_semantic_C0;
wire RUR0_semantic_C0;
wire WUR0_semantic_C0;
wire load_instruction_C0;
wire store_instruction_C0;
wire TIE_Inst_C0;
wire [23:0] Inst_C0;
// state data, write-enable and stall signals
wire [7:0] gfmod_ps_C1;
wire [7:0] gfmod_ns_C1;
wire gfmod_kill_C1;
wire gfmod_Stall_C1;
// register data, write-enable and stall signals
wire [31:0] AR_rd0_data_C1;
wire [31:0] AR_rd1_data_C1;
wire [31:0] AR_wd_data32_C1;
wire AR_wd_kill_C1;
wire [7:0] gf_rd0_data_C1;
wire [7:0] gf_rd1_data_C1;
wire [7:0] gf_rd2_data_C1;
wire [7:0] gf_wd_data8_C2;
wire gf_wd_kill_C2;
wire [7:0] gf_wd_data8_C1;
wire gf_wd_kill_C1;
wire gf_Stall_C1;
// operands
wire [31:0] art_i_C1;
wire [31:0] art_o_C1;
wire art_kill_C1;
wire [31:0] ars_i_C1;
wire [31:0] ars_o_C1;
wire ars_kill_C1;
wire [31:0] arr_o_C1;
wire arr_kill_C1;
wire [7:0] gr_i_C1;
wire [7:0] gr_o_C2;
wire gr_kill_C2;
wire [7:0] gr_o_C1;
wire gr_kill_C1;
wire [7:0] gs_i_C1;
wire [7:0] gt_i_C1;
wire [7:0] gt_o_C2;
wire gt_kill_C2;
wire [7:0] gt_o_C1;
wire gt_kill_C1;
// output state of semantic gf1
// output interface of semantic gf1
// output operand of semantic gf1
wire [7:0] gf1_gr_o_C1;
wire gf1_gr_kill_C1;
// output state of semantic gf4
// output interface of semantic gf4
// output operand of semantic gf4
wire [7:0] gf4_gr_o_C1;
wire gf4_gr_kill_C1;
// output state of semantic gf2
// output interface of semantic gf2
// output operand of semantic gf2
wire [7:0] gf2_gr_o_C1;
wire gf2_gr_kill_C1;
// output state of semantic gf3
wire [7:0] gf3_gfmod_ns_C1;
wire gf3_gfmod_kill_C1;
// output interface of semantic gf3
// output operand of semantic gf3
wire [7:0] gf3_gt_o_C1;
wire gf3_gt_kill_C1;
// output state of semantic lgf
// output interface of semantic lgf
wire [4:0] lgf_LSSize_C0;
wire [31:0] lgf_VAddrBase_C1;
wire [31:0] lgf_VAddrIndex_C1;
wire [31:0] lgf_VAddrOffset_C0;
wire lgf_LSIndexed_C0;
// output operand of semantic lgf
wire [7:0] lgf_gt_o_C2;
wire lgf_gt_kill_C2;
wire [31:0] lgf_ars_o_C1;
wire lgf_ars_kill_C1;
wire [7:0] lgf_gr_o_C2;
wire lgf_gr_kill_C2;
// output state of semantic sgf
// output interface of semantic sgf
wire [4:0] sgf_LSSize_C0;
wire [7:0] sgf_MemDataOut8_C1;
wire [31:0] sgf_VAddrBase_C1;
wire [31:0] sgf_VAddrIndex_C1;
wire [31:0] sgf_VAddrOffset_C0;
wire sgf_LSIndexed_C0;
// output operand of semantic sgf
wire [31:0] sgf_ars_o_C1;
wire sgf_ars_kill_C1;
// output state of semantic RUR0
// output interface of semantic RUR0
// output operand of semantic RUR0
wire [31:0] RUR0_arr_o_C1;
wire RUR0_arr_kill_C1;
// output state of semantic WUR0
wire [7:0] WUR0_gfmod_ns_C1;
wire WUR0_gfmod_kill_C1;
// output interface of semantic WUR0
// output operand of semantic WUR0
// TIE-defined interface signals
wire [31:0] VAddr_C1;
wire [31:0] VAddrBase_C1;
wire [31:0] VAddrOffset_C0;
wire [31:0] VAddrIndex_C1;
wire [31:0] VAddrIn_C1;
wire [4:0] LSSize_C0;
wire LSIndexed_C0;
wire [127:0] MemDataIn128_C2;
wire [63:0] MemDataIn64_C2;
wire [31:0] MemDataIn32_C2;
wire [15:0] MemDataIn16_C2;
wire [7:0] MemDataIn8_C2;
wire [127:0] MemDataOut128_C1;
wire [63:0] MemDataOut64_C1;
wire [31:0] MemDataOut32_C1;
wire [15:0] MemDataOut16_C1;
wire [7:0] MemDataOut8_C1;
wire Exception_C1;
wire [5:0] ExcCause_C1;
wire [7:0] CPEnable_C1;
xtflop #(1) reset (localReset, Reset, G1WCLK);
xmTIE_decoder TIE_decoder (
.GFADD8(GFADD8_C0),
.GFADD8I(GFADD8I_C0),
.GFMULX8(GFMULX8_C0),
.GFRWMOD8(GFRWMOD8_C0),
.LGF8_I(LGF8_I_C0),
.SGF8_I(SGF8_I_C0),
.LGF8_IU(LGF8_IU_C0),
.SGF8_IU(SGF8_IU_C0),
.LGF8_X(LGF8_X_C0),
.SGF8_X(SGF8_X_C0),
.LGF8_XU(LGF8_XU_C0),
.SGF8_XU(SGF8_XU_C0),
.RUR0(RUR0_C0),
.WUR0(WUR0_C0),
.imm4(imm4_C0),
.imm8(imm8_C0),
.art_use(art_use_C0),
.art_def(art_def_C0),
.ars_use(ars_use_C0),
.ars_def(ars_def_C0),
.arr_use(arr_use_C0),
.arr_def(arr_def_C0),
.br_use(br_use_C0),
.br_def(br_def_C0),
.bs_use(bs_use_C0),
.bs_def(bs_def_C0),
.bt_use(bt_use_C0),
.bt_def(bt_def_C0),
.bs4_use(bs4_use_C0),
.bs4_def(bs4_def_C0),
.bs8_use(bs8_use_C0),
.bs8_def(bs8_def_C0),
.gr_use(gr_use_C0),
.gr_def(gr_def_C0),
.gs_use(gs_use_C0),
.gs_def(gs_def_C0),
.gt_use(gt_use_C0),
.gt_def(gt_def_C0),
.gfmod_use1(gfmod_use1_C0),
.gfmod_def1(gfmod_def1_C0),
.AR_rd0_use1(AR_rd0_use1_C0),
.AR_rd0_width32(AR_rd0_width32_C0),
.AR_rd1_use1(AR_rd1_use1_C0),
.AR_rd1_width32(AR_rd1_width32_C0),
.AR_wd_def1(AR_wd_def1_C0),
.AR_wd_width32(AR_wd_width32_C0),
.gf_rd0_addr(gf_rd0_addr_C0),
.gf_rd0_use1(gf_rd0_use1_C0),
.gf_rd0_width8(gf_rd0_width8_C0),
.gf_rd1_addr(gf_rd1_addr_C0),
.gf_rd1_use1(gf_rd1_use1_C0),
.gf_rd1_width8(gf_rd1_width8_C0),
.gf_rd2_addr(gf_rd2_addr_C0),
.gf_rd2_use1(gf_rd2_use1_C0),
.gf_rd2_width8(gf_rd2_width8_C0),
.gf_wd_addr(gf_wd_addr_C0),
.gf_wd_def2(gf_wd_def2_C0),
.gf_wd_def1(gf_wd_def1_C0),
.gf_wd_width8(gf_wd_width8_C0),
.gf1_semantic(gf1_semantic_C0),
.gf4_semantic(gf4_semantic_C0),
.gf2_semantic(gf2_semantic_C0),
.gf3_semantic(gf3_semantic_C0),
.lgf_semantic(lgf_semantic_C0),
.sgf_semantic(sgf_semantic_C0),
.RUR0_semantic(RUR0_semantic_C0),
.WUR0_semantic(WUR0_semantic_C0),
.load_instruction(load_instruction_C0),
.store_instruction(store_instruction_C0),
.TIE_Inst(TIE_Inst_C0),
.Inst(Inst_C0)
);
xmTIE_gf1 TIE_gf1 (
.GFADD8_C0(GFADD8_C0),
.gr_o_C1(gf1_gr_o_C1),
.gr_kill_C1(gf1_gr_kill_C1),
.gs_i_C1(gs_i_C1),
.gt_i_C1(gt_i_C1),
.clk(clk));
xmTIE_gf4 TIE_gf4 (
.GFADD8I_C0(GFADD8I_C0),
.gr_o_C1(gf4_gr_o_C1),
.gr_kill_C1(gf4_gr_kill_C1),
.gs_i_C1(gs_i_C1),
.imm4_C0(imm4_C0),
.clk(clk));
xmTIE_gf2 TIE_gf2 (
.GFMULX8_C0(GFMULX8_C0),
.gr_o_C1(gf2_gr_o_C1),
.gr_kill_C1(gf2_gr_kill_C1),
.gs_i_C1(gs_i_C1),
.gfmod_ps_C1(gfmod_ps_C1),
.clk(clk));
xmTIE_gf3 TIE_gf3 (
.GFRWMOD8_C0(GFRWMOD8_C0),
.gt_i_C1(gt_i_C1),
.gt_o_C1(gf3_gt_o_C1),
.gt_kill_C1(gf3_gt_kill_C1),
.gfmod_ps_C1(gfmod_ps_C1),
.gfmod_ns_C1(gf3_gfmod_ns_C1),
.gfmod_kill_C1(gf3_gfmod_kill_C1),
.clk(clk));
xmTIE_lgf TIE_lgf (
.LGF8_I_C0(LGF8_I_C0),
.LGF8_IU_C0(LGF8_IU_C0),
.LGF8_X_C0(LGF8_X_C0),
.LGF8_XU_C0(LGF8_XU_C0),
.gt_o_C2(lgf_gt_o_C2),
.gt_kill_C2(lgf_gt_kill_C2),
.ars_i_C1(ars_i_C1),
.ars_o_C1(lgf_ars_o_C1),
.ars_kill_C1(lgf_ars_kill_C1),
.imm8_C0(imm8_C0),
.gr_o_C2(lgf_gr_o_C2),
.gr_kill_C2(lgf_gr_kill_C2),
.art_i_C1(art_i_C1),
.MemDataIn8_C2(MemDataIn8_C2),
.VAddrIn_C1(VAddrIn_C1),
.LSSize_C0(lgf_LSSize_C0),
.VAddrBase_C1(lgf_VAddrBase_C1),
.VAddrIndex_C1(lgf_VAddrIndex_C1),
.VAddrOffset_C0(lgf_VAddrOffset_C0),
.LSIndexed_C0(lgf_LSIndexed_C0),
.clk(clk));
xmTIE_sgf TIE_sgf (
.SGF8_I_C0(SGF8_I_C0),
.SGF8_IU_C0(SGF8_IU_C0),
.SGF8_X_C0(SGF8_X_C0),
.SGF8_XU_C0(SGF8_XU_C0),
.gt_i_C1(gt_i_C1),
.ars_i_C1(ars_i_C1),
.ars_o_C1(sgf_ars_o_C1),
.ars_kill_C1(sgf_ars_kill_C1),
.imm8_C0(imm8_C0),
.gr_i_C1(gr_i_C1),
.art_i_C1(art_i_C1),
.VAddrIn_C1(VAddrIn_C1),
.LSSize_C0(sgf_LSSize_C0),
.MemDataOut8_C1(sgf_MemDataOut8_C1),
.VAddrBase_C1(sgf_VAddrBase_C1),
.VAddrIndex_C1(sgf_VAddrIndex_C1),
.VAddrOffset_C0(sgf_VAddrOffset_C0),
.LSIndexed_C0(sgf_LSIndexed_C0),
.clk(clk));
xmTIE_RUR0 TIE_RUR0 (
.RUR0_C0(RUR0_C0),
.arr_o_C1(RUR0_arr_o_C1),
.arr_kill_C1(RUR0_arr_kill_C1),
.gfmod_ps_C1(gfmod_ps_C1),
.clk(clk));
xmTIE_WUR0 TIE_WUR0 (
.WUR0_C0(WUR0_C0),
.art_i_C1(art_i_C1),
.gfmod_ns_C1(WUR0_gfmod_ns_C1),
.gfmod_kill_C1(WUR0_gfmod_kill_C1),
.clk(clk));
xmTIE_gfmod_State TIE_gfmod_State (
.ps_width8_C0(1'b1),
.ps_use1_C0(gfmod_use1_C0),
.ps_data_C1(gfmod_ps_C1),
.ns_width8_C0(1'b1),
.ns_def1_C0(gfmod_def1_C0),
.ns_data8_C1(gfmod_ns_C1),
.ns_wen_C1(~gfmod_kill_C1),
.Kill_E(Kill_E),
.KillPipe_W(KillPipe_W),
.Stall_R(gfmod_Stall_C1),
.clk(clk));
xmTIE_gf_Regfile TIE_gf_Regfile (
.rd0_addr_C0(gf_rd0_addr_C0),
.rd0_use1_C0(gf_rd0_use1_C0),
.rd0_data_C1(gf_rd0_data_C1),
.rd0_width8_C0(gf_rd0_width8_C0),
.rd1_addr_C0(gf_rd1_addr_C0),
.rd1_use1_C0(gf_rd1_use1_C0),
.rd1_data_C1(gf_rd1_data_C1),
.rd1_width8_C0(gf_rd1_width8_C0),
.rd2_addr_C0(gf_rd2_addr_C0),
.rd2_use1_C0(gf_rd2_use1_C0),
.rd2_data_C1(gf_rd2_data_C1),
.rd2_width8_C0(gf_rd2_width8_C0),
.wd_addr_C0(gf_wd_addr_C0),
.wd_def2_C0(gf_wd_def2_C0),
.wd_wen_C2(~gf_wd_kill_C2),
.wd_data8_C2(gf_wd_data8_C2),
.wd_def1_C0(gf_wd_def1_C0),
.wd_wen_C1(~gf_wd_kill_C1),
.wd_data8_C1(gf_wd_data8_C1),
.wd_width8_C0(gf_wd_width8_C0),
.Kill_E(Kill_E),
.KillPipe_W(KillPipe_W),
.Stall_R(gf_Stall_C1),
.clk(clk));
// Stall logic
assign TIE_Stall_R = 1'b0
    | gf_Stall_C1
    | gfmod_Stall_C1;
// pipeline semantic select signals to each stage
wire lgf_semantic_C1;
xtdelay1 #(1) ilgf_semantic_C1 (.xtin(lgf_semantic_C0), .xtout(lgf_semantic_C1), .clk(clk));
wire sgf_semantic_C1;
xtdelay1 #(1) isgf_semantic_C1 (.xtin(sgf_semantic_C0), .xtout(sgf_semantic_C1), .clk(clk));
wire gf3_semantic_C1;
xtdelay1 #(1) igf3_semantic_C1 (.xtin(gf3_semantic_C0), .xtout(gf3_semantic_C1), .clk(clk));
wire WUR0_semantic_C1;
xtdelay1 #(1) iWUR0_semantic_C1 (.xtin(WUR0_semantic_C0), .xtout(WUR0_semantic_C1), .clk(clk));
wire RUR0_semantic_C1;
xtdelay1 #(1) iRUR0_semantic_C1 (.xtin(RUR0_semantic_C0), .xtout(RUR0_semantic_C1), .clk(clk));
wire lgf_semantic_C2;
xtdelay2 #(1) ilgf_semantic_C2 (.xtin(lgf_semantic_C0), .xtout(lgf_semantic_C2), .clk(clk));
wire gf1_semantic_C1;
xtdelay1 #(1) igf1_semantic_C1 (.xtin(gf1_semantic_C0), .xtout(gf1_semantic_C1), .clk(clk));
wire gf4_semantic_C1;
xtdelay1 #(1) igf4_semantic_C1 (.xtin(gf4_semantic_C0), .xtout(gf4_semantic_C1), .clk(clk));
wire gf2_semantic_C1;
xtdelay1 #(1) igf2_semantic_C1 (.xtin(gf2_semantic_C0), .xtout(gf2_semantic_C1), .clk(clk));
// combine output interface signals from all semantics
assign VAddr_C1 = 32'b0;
assign VAddrBase_C1 = 32'b0
    | (lgf_VAddrBase_C1 & {32{lgf_semantic_C1}})
    | (sgf_VAddrBase_C1 & {32{sgf_semantic_C1}});
assign VAddrOffset_C0 = 32'b0
    | (lgf_VAddrOffset_C0 & {32{lgf_semantic_C0}})
    | (sgf_VAddrOffset_C0 & {32{sgf_semantic_C0}});
assign VAddrIndex_C1 = 32'b0
    | (lgf_VAddrIndex_C1 & {32{lgf_semantic_C1}})
    | (sgf_VAddrIndex_C1 & {32{sgf_semantic_C1}});
assign LSSize_C0 = 5'b0
    | (lgf_LSSize_C0 & {5{lgf_semantic_C0}})
    | (sgf_LSSize_C0 & {5{sgf_semantic_C0}});
assign LSIndexed_C0 = 1'b0
    | (lgf_LSIndexed_C0 & lgf_semantic_C0)
    | (sgf_LSIndexed_C0 & sgf_semantic_C0);
assign MemDataOut128_C1 = 128'b0;
assign MemDataOut64_C1 = 64'b0;
assign MemDataOut32_C1 = 32'b0;
assign MemDataOut16_C1 = 16'b0;
assign MemDataOut8_C1 = 8'b0
    | (sgf_MemDataOut8_C1 & {8{sgf_semantic_C1}});
assign Exception_C1 = 1'b0;
assign ExcCause_C1 = 6'b0;
// combine output state signals from all semantics
assign gfmod_ns_C1 = 8'b0
    | (gf3_gfmod_ns_C1 & {8{gf3_semantic_C1}})
    | (WUR0_gfmod_ns_C1 & {8{WUR0_semantic_C1}});
assign gfmod_kill_C1 = 1'b0
    | (gf3_gfmod_kill_C1 & gf3_semantic_C1)
    | (WUR0_gfmod_kill_C1 & WUR0_semantic_C1);
// combine output operand signals from all semantics
assign art_o_C1 = 32'b0;
assign art_kill_C1 = 1'b0;
assign ars_o_C1 = 32'b0
    | (lgf_ars_o_C1 & {32{lgf_semantic_C1}})
    | (sgf_ars_o_C1 & {32{sgf_semantic_C1}});
assign ars_kill_C1 = 1'b0
    | (lgf_ars_kill_C1 & lgf_semantic_C1)
    | (sgf_ars_kill_C1 & sgf_semantic_C1);
assign arr_o_C1 = 32'b0
    | (RUR0_arr_o_C1 & {32{RUR0_semantic_C1}});
assign arr_kill_C1 = 1'b0
    | (RUR0_arr_kill_C1 & RUR0_semantic_C1);
assign gr_o_C2 = 8'b0
    | (lgf_gr_o_C2 & {8{lgf_semantic_C2}});
assign gr_kill_C2 = 1'b0
    | (lgf_gr_kill_C2 & lgf_semantic_C2);
assign gr_o_C1 = 8'b0
    | (gf1_gr_o_C1 & {8{gf1_semantic_C1}})
    | (gf4_gr_o_C1 & {8{gf4_semantic_C1}})
    | (gf2_gr_o_C1 & {8{gf2_semantic_C1}});
assign gr_kill_C1 = 1'b0
    | (gf1_gr_kill_C1 & gf1_semantic_C1)
    | (gf4_gr_kill_C1 & gf4_semantic_C1)
    | (gf2_gr_kill_C1 & gf2_semantic_C1);
assign gt_o_C2 = 8'b0
    | (lgf_gt_o_C2 & {8{lgf_semantic_C2}});
assign gt_kill_C2 = 1'b0
    | (lgf_gt_kill_C2 & lgf_semantic_C2);
assign gt_o_C1 = 8'b0
    | (gf3_gt_o_C1 & {8{gf3_semantic_C1}});
assign gt_kill_C1 = 1'b0
    | (gf3_gt_kill_C1 & gf3_semantic_C1);
// output operand to write port mapping logic
assign AR_wd_data32_C1 = ars_o_C1 | arr_o_C1 | 32'b0;
assign AR_wd_kill_C1 = ars_kill_C1 | arr_kill_C1 | 1'b0;
assign gf_wd_data8_C2 = gt_o_C2 | gr_o_C2 | 8'b0;
assign gf_wd_kill_C2 = gt_kill_C2 | gr_kill_C2 | 1'b0;
assign gf_wd_data8_C1 = gr_o_C1 | gt_o_C1 | 8'b0;
assign gf_wd_kill_C1 = gr_kill_C1 | gt_kill_C1 | 1'b0;
// read port to input operand mapping logic
assign ars_i_C1 = AR_rd0_data_C1;
assign art_i_C1 = AR_rd1_data_C1;
assign gs_i_C1 = gf_rd0_data_C1;
assign gt_i_C1 = gf_rd1_data_C1;
assign gr_i_C1 = gf_rd2_data_C1;
// clock and instructions
assign clk = G1WCLK;
assign Inst_C0 = Instr_R;
assign TIE_inst_R = TIE_Inst_C0;
// AR-related signals to/from core
assign TIE_asRead_R = ars_use_C0;
assign TIE_atRead_R = art_use_C0;
assign TIE_atWrite_R = art_def_C0;
assign TIE_arWrite_R = arr_def_C0;
assign TIE_asWrite_R = ars_def_C0;
assign TIE_aWriteM_R = 0;
assign TIE_aWriteData_E = AR_wd_data32_C1;
assign TIE_aWriteData_M = 0;
assign TIE_aDataKill_E = AR_wd_kill_C1;
assign TIE_aDataKill_M = 0;
assign AR_rd0_data_C1 = SBus_E;
assign AR_rd1_data_C1 = TBus_E;
// BR-related signals to/from core
assign TIE_bsRead_R = 1'b0 | bs_use_C0 | bs4_use_C0 | bs8_use_C0;
assign TIE_btRead_R = 1'b0 | bt_use_C0;
assign TIE_btWrite_R = 1'b0 | bt_def_C0;
assign TIE_bsWrite_R = 1'b0 | bs_def_C0 | bs4_def_C0 | bs8_def_C0;
assign TIE_brWrite_R = 1'b0 | br_def_C0;
assign TIE_bWriteData16_E = 0;
assign TIE_bWriteData8_E = 0;
assign TIE_bWriteData4_E = 0;
assign TIE_bWriteData2_E = 0;
assign TIE_bWriteData1_E = 0;
assign TIE_bDataKill_E = 0;
assign TIE_bWriteSize_R = {1'b0, 1'b0, 1'b0, 1'b0, 1'b0};
assign TIE_bsReadSize_R = {1'b0, 1'b0, 1'b0, 1'b0, 1'b0};
assign TIE_btReadSize_R = {1'b0, 1'b0, 1'b0, 1'b0, 1'b0};
// Load/store signals to/from core
assign TIE_Load_R = load_instruction_C0;
assign TIE_Store_R = store_instruction_C0;
assign TIE_LSSize_R = LSSize_C0;
assign TIE_LSIndexed_R = LSIndexed_C0;
assign TIE_LSOffset_R = VAddrOffset_C0;
assign TIE_MemStoreData128_E = MemDataOut128_C1;
assign TIE_MemStoreData64_E = MemDataOut64_C1;
assign TIE_MemStoreData32_E = MemDataOut32_C1;
assign TIE_MemStoreData16_E = MemDataOut16_C1;
assign TIE_MemStoreData8_E = MemDataOut8_C1;
assign MemDataIn128_C2 = TIE_MemLoadData_M;
assign MemDataIn64_C2 = TIE_MemLoadData_M;
assign MemDataIn32_C2 = TIE_MemLoadData_M;
assign MemDataIn16_C2 = TIE_MemLoadData_M;
assign MemDataIn8_C2 = TIE_MemLoadData_M;
assign VAddrIn_C1 = MemOpAddr_E;
// CPEnable and control signals to/from core
assign CPEnable_C1 = CPEnable;
assign TIE_Exception_E = Exception_C1;
assign TIE_ExcCause_E = ExcCause_C1;
assign KillPipe_W = Except_W | Replay_W;
endmodule
module xtdelay1(xtout, xtin, clk);
parameter size = 1;
output [size-1:0] xtout;
input [size-1:0] xtin;
input clk;
wire [size-1:0] t0;
xtflop #(size) i0(t0, xtin, clk);
assign xtout = t0;
endmodule
module xtdelay2(xtout, xtin, clk);
parameter size = 1;
output [size-1:0] xtout;
input [size-1:0] xtin;
input clk;
wire [size-1:0] t0;
xtflop #(size) i0(t0, xtin, clk);
wire [size-1:0] t1;
xtflop #(size) i1(t1, t0, clk);
assign xtout = t1;
endmodule
module xtmux3p(o, d0, d1, d2, s0, s1);
parameter size = 1;
output [size-1:0] o;
input [size-1:0] d0, d1, d2;
input s0, s1;
wire [1:0] s = s0 ? 0 : s1 ? 1 : 2;
xtmux3e #(size) i0(o, d0, d1, d2, s);
endmodule
module xtregfile_1R1W_1(rd0_data, wr0_data, wr0_we, clk);
parameter size=32, addr_size=0;
output [size-1:0] rd0_data;
input [size-1:0] wr0_data;
input wr0_we;
input clk;
wire wr0_addr = 0;
wire word0_we = wr0_we & (wr0_addr == 0);
wire [size-1:0] word0;
xtenflop #(size) iword0(word0, wr0_data, word0_we, clk);
assign rd0_data = word0;
endmodule
module xtregfile_3R1W_16(rd0_data, rd0_addr, rd1_data, rd1_addr, rd2_data, rd2_addr, wr0_data, wr0_addr, wr0_we, clk);
parameter size=32, addr_size=4;
output [size-1:0] rd0_data;
input [addr_size-1:0] rd0_addr;
output [size-1:0] rd1_data;
input [addr_size-1:0] rd1_addr;
output [size-1:0] rd2_data;
input [addr_size-1:0] rd2_addr;
input [size-1:0] wr0_data;
input [addr_size-1:0] wr0_addr;
input wr0_we;
input clk;
wire [size-1:0] wr0_ndata;
xtnflop #(size) iwr0_ndata(wr0_ndata, wr0_data, clk);
wire word0_we = wr0_we & (wr0_addr == 0);
wire [size-1:0] word0;
wire gclk0;
xtclock_gate_nor xt_clock_gate_nor0(gclk0, clk, ~word0_we);
xtRFlatch #(size) iword0(word0, wr0_ndata, gclk0);
wire word1_we = wr0_we & (wr0_addr == 1);
wire [size-1:0] word1;
wire gclk1;
xtclock_gate_nor xt_clock_gate_nor1(gclk1, clk, ~word1_we);
xtRFlatch #(size) iword1(word1, wr0_ndata, gclk1);
wire word2_we = wr0_we & (wr0_addr == 2);
wire [size-1:0] word2;
wire gclk2;
xtclock_gate_nor xt_clock_gate_nor2(gclk2, clk, ~word2_we);
xtRFlatch #(size) iword2(word2, wr0_ndata, gclk2);
wire word3_we = wr0_we & (wr0_addr == 3);
wire [size-1:0] word3;
wire gclk3;
xtclock_gate_nor xt_clock_gate_nor3(gclk3, clk, ~word3_we);
xtRFlatch #(size) iword3(word3, wr0_ndata, gclk3);
wire word4_we = wr0_we & (wr0_addr == 4);
wire [size-1:0] word4;
wire gclk4;
xtclock_gate_nor xt_clock_gate_nor4(gclk4, clk, ~word4_we);
xtRFlatch #(size) iword4(word4, wr0_ndata, gclk4);
wire word5_we = wr0_we & (wr0_addr == 5);
wire [size-1:0] word5;
wire gclk5;
xtclock_gate_nor xt_clock_gate_nor5(gclk5, clk, ~word5_we);
xtRFlatch #(size) iword5(word5, wr0_ndata, gclk5);
wire word6_we = wr0_we & (wr0_addr == 6);
wire [size-1:0] word6;
wire gclk6;
xtclock_gate_nor xt_clock_gate_nor6(gclk6, clk, ~word6_we);
xtRFlatch #(size) iword6(word6, wr0_ndata, gclk6);
wire word7_we = wr0_we & (wr0_addr == 7);
wire [size-1:0] word7;
wire gclk7;
xtclock_gate_nor xt_clock_gate_nor7(gclk7, clk, ~word7_we);
xtRFlatch #(size) iword7(word7, wr0_ndata, gclk7);
wire word8_we = wr0_we & (wr0_addr == 8);
wire [size-1:0] word8;
wire gclk8;
xtclock_gate_nor xt_clock_gate_nor8(gclk8, clk, ~word8_we);
xtRFlatch #(size) iword8(word8, wr0_ndata, gclk8);
wire word9_we = wr0_we & (wr0_addr == 9);
wire [size-1:0] word9;
wire gclk9;
xtclock_gate_nor xt_clock_gate_nor9(gclk9, clk, ~word9_we);
xtRFlatch #(size) iword9(word9, wr0_ndata, gclk9);
wire word10_we = wr0_we & (wr0_addr == 10);
wire [size-1:0] word10;
wire gclk10;
xtclock_gate_nor xt_clock_gate_nor10(gclk10, clk, ~word10_we);
xtRFlatch #(size) iword10(word10, wr0_ndata, gclk10);
wire word11_we = wr0_we & (wr0_addr == 11);
wire [size-1:0] word11;
wire gclk11;
xtclock_gate_nor xt_clock_gate_nor11(gclk11, clk, ~word11_we);
xtRFlatch #(size) iword11(word11, wr0_ndata, gclk11);
wire word12_we = wr0_we & (wr0_addr == 12);
wire [size-1:0] word12;
wire gclk12;
xtclock_gate_nor xt_clock_gate_nor12(gclk12, clk, ~word12_we);
xtRFlatch #(size) iword12(word12, wr0_ndata, gclk12);
wire word13_we = wr0_we & (wr0_addr == 13);
wire [size-1:0] word13;
wire gclk13;
xtclock_gate_nor xt_clock_gate_nor13(gclk13, clk, ~word13_we);
xtRFlatch #(size) iword13(word13, wr0_ndata, gclk13);
wire word14_we = wr0_we & (wr0_addr == 14);
wire [size-1:0] word14;
wire gclk14;
xtclock_gate_nor xt_clock_gate_nor14(gclk14, clk, ~word14_we);
xtRFlatch #(size) iword14(word14, wr0_ndata, gclk14);
wire word15_we = wr0_we & (wr0_addr == 15);
wire [size-1:0] word15;
wire gclk15;
xtclock_gate_nor xt_clock_gate_nor15(gclk15, clk, ~word15_we);
xtRFlatch #(size) iword15(word15, wr0_ndata, gclk15);
xtmux16e #(size) rd0(rd0_data, word0, word1, word2, word3, word4, word5, word6, word7, word8, word9, word10, word11, word12, word13, word14, word15, rd0_addr);
xtmux16e #(size) rd1(rd1_data, word0, word1, word2, word3, word4, word5, word6, word7, word8, word9, word10, word11, word12, word13, word14, word15, rd1_addr);
xtmux16e #(size) rd2(rd2_data, word0, word1, word2, word3, word4, word5, word6, word7, word8, word9, word10, word11, word12, word13, word14, word15, rd2_addr);
endmodule

module xtmux16e(o, d0, d1, d2, d3, d4, d5, d6, d7, d8, d9, d10, d11, d12, d13, d14, d15, s);
parameter size = 1;
output [size-1:0] o;
input [size-1:0] d0, d1, d2, d3, d4, d5, d6, d7, d8, d9, d10, d11, d12, d13, d14, d15;
input [3:0] s;
wire [size-1:0] t0;
xtmux4e #(size) i0(t0, d0, d1, d2, d3, {s[1], s[0]});
wire [size-1:0] t1;
xtmux4e #(size) i1(t1, d4, d5, d6, d7, {s[1], s[0]});
wire [size-1:0] t2;
xtmux4e #(size) i2(t2, d8, d9, d10, d11, {s[1], s[0]});
wire [size-1:0] t3;
xtmux4e #(size) i3(t3, d12, d13, d14, d15, {s[1], s[0]});
wire [size-1:0] t4;
xtmux4e #(size) i4(t4, t0, t1, t2, t3, {s[3], s[2]});
assign o = t4;
endmodule

module xtRFenlatch(xtRFenlatchout, xtin, xten, clk);
parameter size = 32;
output [size-1:0] xtRFenlatchout;
input [size-1:0] xtin;
input xten;
input clk;
reg [size-1:0] xtRFenlatchout;
always @(clk or xten or xtin or xtRFenlatchout) begin
  if (clk) begin
    xtRFenlatchout <= #1 (xten) ? xtin : xtRFenlatchout;
  end
end
endmodule

module xtRFlatch(xtRFlatchout, xtin, clk);
parameter size = 32;
output [size-1:0] xtRFlatchout;
input [size-1:0] xtin;
input clk;
reg [size-1:0] xtRFlatchout;
always @(clk or xtin) begin
  if (clk) begin
    xtRFlatchout <= #1 xtin;
  end
end
endmodule

module xtadd(xtout, a, b);
parameter size = 32;
output [size-1:0] xtout;
input [size-1:0] a;
input [size-1:0] b;
assign xtout = a + b;
endmodule

module xtaddc(sum, carry, a, b, c);
parameter size = 32;
output [size-1:0] sum;
output carry;
input [size-1:0] a;
input [size-1:0] b;
input c;
wire junk;
assign {carry, sum, junk} = {a,c} + {b,c};
endmodule

module xtaddcin(xtout, a, b, c);
parameter size = 32;
output [size-1:0] xtout;
input [size-1:0] a;
input [size-1:0] b;
input c;
assign xtout = ({a,c} + {b,c}) >> 1;
endmodule

module xtaddcout(sum, carry, a, b);
parameter size = 1;
output [size-1:0] sum;
output carry;
input [size-1:0] a;
input [size-1:0] b;
assign {carry, sum} = a + b;
endmodule

module xtbooth(out, cin, a, b, sign, negate);
parameter size = 16;
output [size+1:0] out;
output cin;
input [size-1:0] a;
input [2:0] b;
input sign, negate;
wire ase = sign & a[size-1];
wire [size+1:0] ax1 = {ase, ase, a};
wire [size+1:0] ax2 = {ase, a, 1'd0};
wire one = b[1] ^ b[0];
wire two = b[2] ? ~b[1] & ~b[0] : b[1] & b[0];
wire cin = negate ? (~b[2] & (b[1] | b[0])) : (b[2] & ~(b[1] & b[0]));
assign out = {size+2{cin}} ^ (ax1&{size+2{one}} | ax2&{size+2{two}});
endmodule

module xtclock_gate_nor(xtout, xtin1, xtin2);
output xtout;
input xtin1, xtin2;
assign xtout = ~(xtin1 || xtin2);
endmodule

module xtclock_gate_or(xtout, xtin1, xtin2);
output xtout;
input xtin1, xtin2;
assign xtout = (xtin1 || xtin2);
endmodule

module xtcsa(sum, carry, a, b, c);
parameter size = 1;
output [size-1:0] sum;
output [size-1:0] carry;
input [size-1:0] a;
input [size-1:0] b;
input [size-1:0] c;
assign sum = a ^ b ^ c;
assign carry = (a & b) | (b & c) | (c & a);
endmodule

module xtenflop(xtout, xtin, en, clk);
parameter size = 32;
output [size-1:0] xtout;
input [size-1:0] xtin;
input en;
input clk;
reg [size-1:0] tmp;
assign xtout = tmp;
always @(posedge clk) begin
  if (en) tmp <= #1 xtin;
end
endmodule

module xtfa(sum, carry, a, b, c);
output sum, carry;
input a, b, c;
assign sum = a ^ b ^ c;
assign carry = a & b | a & c | b & c;
endmodule

module xtflop(xtout, xtin, clk);
parameter size = 32;
output [size-1:0] xtout;
input [size-1:0] xtin;
input clk;
reg [size-1:0] tmp;
assign xtout = tmp;
always @(posedge clk) begin
  tmp <= #1 xtin;
end
endmodule

module xtha(sum, carry, a, b);
output sum, carry;
input a, b;
assign sum = a ^ b;
assign carry = a & b;
endmodule

module xtinc(xtout, a);
parameter size = 32;
output [size-1:0] xtout;
input [size-1:0] a;
assign xtout = a + 1;
endmodule

module xtmux2e(xtout, a, b, sel);
parameter size = 32;
output [size-1:0] xtout;
input [size-1:0] a;
input [size-1:0] b;
input sel;
assign xtout = (~sel) ? a : b;
endmodule

module xtmux3e(xtout, a, b, c, sel);
parameter size = 32;
output [size-1:0] xtout;
input [size-1:0] a;
input [size-1:0] b;
input [size-1:0] c;
input [1:0] sel;
reg [size-1:0] xtout;
always @(a or b or c or sel) begin
  xtout = sel[1] ? c : (sel[0] ? b : a);
end
endmodule

module xtmux4e(xtout, a, b, c, d, sel);
parameter size = 32;
output [size-1:0] xtout;
input [size-1:0] a;
input [size-1:0] b;
input [size-1:0] c;
input [size-1:0] d;
input [1:0] sel;
reg [size-1:0] xtout;
// synopsys infer_mux "xtmux4e"
always @(sel or a or b or c or d) begin : xtmux4e
  case (sel) // synopsys parallel_case full_case
    2'b00: xtout = a;
    2'b01: xtout = b;
    2'b10: xtout = c;
    2'b11: xtout = d;
    default: xtout = {size{1'bx}};
  endcase // case (sel)
end // always @ (sel or a or b or c or d)
endmodule

module xtnflop(xtout, xtin, clk);
parameter size = 32;
output [size-1:0] xtout;
input [size-1:0] xtin;
input clk;
reg [size-1:0] tmp;
assign xtout = tmp;
always @(negedge clk) begin
  tmp <= #1 xtin;
end // always @ (negedge clk)
endmodule

module xtscflop(xtout, xtin, clrb, clk); // sync clear ff
parameter size = 32;
output [size-1:0] xtout;
input [size-1:0] xtin;
input clrb;
input clk;
reg [size-1:0] tmp;
assign xtout = tmp;
always @(posedge clk) begin
  if (!clrb) tmp <= 0;
  else tmp <= #1 xtin;
end
endmodule

module xtscenflop(xtout, xtin, en, clrb, clk); // sync clear
parameter size = 32;
output [size-1:0] xtout;
input [size-1:0] xtin;
input en;
input clrb;
input clk;
reg [size-1:0] tmp;
assign xtout = tmp;
always @(posedge clk) begin
  if (!clrb) tmp <= 0;
  else if (en) tmp <= #1 xtin;
end
endmodule
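Two of the arithmetic primitives in the listing above rely on small bit-level identities that are easy to check in software. The following is a minimal Python sketch, not part of the original listing: `xtaddcin` folds the carry-in into the adder by appending it as an extra low-order bit of both operands, and `xtcsa` reduces three operands to a sum word and a carry word. The function names mirror the Verilog modules; the integer-with-mask modelling of bit vectors is an assumption made for illustration.

```python
def xtaddcin(a, b, c, size=32):
    """Model of xtaddcin: ({a,c} + {b,c}) >> 1, truncated to `size` bits."""
    mask = (1 << size) - 1
    wide_a = ((a & mask) << 1) | (c & 1)      # {a, c}
    wide_b = ((b & mask) << 1) | (c & 1)      # {b, c}
    # The two appended LSBs add up to 2*c, i.e. a carry of c into bit 0 of a + b.
    total = (wide_a + wide_b) & ((1 << (size + 1)) - 1)
    return (total >> 1) & mask


def xtcsa(a, b, c, size=1):
    """Model of xtcsa: bitwise 3:2 compression into (sum, carry) words."""
    mask = (1 << size) - 1
    s = (a ^ b ^ c) & mask
    carry = ((a & b) | (b & c) | (c & a)) & mask
    return s, carry
```

In this model `xtaddcin(a, b, c)` equals `(a + b + c)` modulo `2**size`, and the carry-save outputs satisfy `sum + 2*carry == a + b + c`, which is why a row of such cells can feed a single final adder.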
gf_check.dcsh
/*
* Copyright 1999-2000 Tensilica Inc.
* These coded instructions, statements, and computer programs are
* Confidential Proprietary Information of Tensilica Inc. and may not be
* disclosed to third parties or copied in any form, in whole or in part,
* without the prior written consent of Tensilica Inc. */
/*=======================================================================
  Generic setup
 */
hdlin_auto_save_templates = true
define_design_lib WORK -path workdir
define_name_rules no_slash -restrict "/" -replacement_char "_"
verilogout_no_tri = true
verbose_messages = false
sh mkdir -p workdir
sh date
sh hostname

/*=======================================================================
  Read and elaborate the design
 */
/* foreach (F, {"gf.v", "gf_FF.v", "gf_tlt.v"}) {
 */
foreach (F, {"gf.v"}) {
  read -f verilog "/home/earl/tensilica/test/gf/gf.out/" + F
  /*
  remove_design find(design, "xtha") >/dev/null
  remove_design find(design, "xtfa") >/dev/null
  remove_design find(design, "xtmux4b") >/dev/null
  read -f verilog "/home/earl/tensilica/test/gf/gf.out/prim.v"
  */
  /* elaborate xmTIE
  */
  current_design xmTIE
  link
  ungroup -all -flatten
  check_design
  remove_design find(design, "*")
}
quit
gf.dcsh
/*
* Copyright 1999-2000 Tensilica Inc.
* These coded instructions, statements, and computer programs are
* Confidential Proprietary Information of Tensilica Inc. and may not be
* disclosed to third parties or copied in any form, in whole or in part,
* without the prior written consent of Tensilica Inc. */
/*=======================================================================
  Generic setup
 */
hdlin_auto_save_templates = true
define_design_lib WORK -path workdir
define_name_rules no_slash -restrict "/" -replacement_char "_"
verilogout_no_tri = true
verbose_messages = false
sh mkdir -p workdir
sh date
sh hostname

/*=======================================================================
  Library-specific parameters

  Most are self-explanatory. Examples for each are shown.

  LIB_MAP_ONLY is the set of gates that receive the "set_map_only" attribute in Design Compiler. Typically this should be all 3:1 and 4:1 muxes and all half-adders and full-adders.

  LIB_SCAN_FLOP is the set of flops that must not be used for sequential mapping because they represent scan flops in the library.

  LIB_DONT_USE selects target gates in the library that must not be used.
 */
synthetic_library = {standard.sldb}
search_path = search_path + { "/cad/artisan/Phantom/synopsys/acb872" }
target_library = slow.db
link_library = {"*"} + target_library + synthetic_library
CLOCK_PERIOD = 6.67 /* target clock period */
CLOCK_SKEW = .35 /* estimated clock skew */
CRITICAL_RANGE = .8 /* keep off-critical paths tight */
BOUNDARY_LOAD = slow/INVX1/A /* typical load */
DRIVE_CELL = DFFX4 /* typical drive cell name */
DRIVE_PIN = Q /* typical drive pin name */
DRIVE_PIN_FROM = CK /* typical drive from pin name */
OPERATING_CONDITION = slow /* operating conditions */
WIRE_LOAD = TSMC32K_Aggresive /* wire-load model */
LIB_MAP_ONLY = {slow/MX4*, slow/MXI4*, slow/ADDF*, slow/ADDH*}
LIB_SCAN_FLOP = {slow/SDFF*, slow/SEDFF*}
LIB_DONT_USE = {slow/ADDFX4*} + LIB_SCAN_FLOP
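The constraint files later in this listing express every I/O delay as a fraction of CLOCK_PERIOD. As a worked illustration, the Python sketch below turns the example CLOCK_PERIOD of 6.67 into absolute budgets. The fractions and port patterns are copied from the scripts; the dictionary keys and helper names are hypothetical labels chosen for this example, and the time unit (nanoseconds) is an assumption about the target library.

```python
CLOCK_PERIOD = 6.67   # example value from this script; unit assumed to be ns

# Fractions of CLOCK_PERIOD used by the generic and TIE-specific constraints.
# Keys are illustrative labels, not names used by the scripts.
DELAY_FRACTIONS = {
    "default_input": 0.20,        # all_inputs() minus clock ports
    "default_output": 0.20,       # all_outputs()
    "mem_op_addr_input": 0.50,    # MemOpAddr_E
    "mem_load_data_input": 0.60,  # TIE_MemLoadData_M, MemDataIn*
    "regfile_read_output": 0.95,  # rd*_data_C*, ps_data_C*
    "regfile_write_input": 0.90,  # wd*_data*_C*, wr*_data*_C*, ns_data*_C*
    "regfile_ctrl_input": 0.35,   # wd*_wen_C*, Kill*, wr*_addr, wr*_we
}


def delay_budget(kind, period=CLOCK_PERIOD):
    """Absolute delay reserved outside the block for the given port class."""
    return DELAY_FRACTIONS[kind] * period


def clock_frequency_mhz(period=CLOCK_PERIOD):
    """Clock frequency in MHz, assuming the period is given in ns."""
    return 1000.0 / period


def debug_clock_period(period=CLOCK_PERIOD):
    """TClockDR is created with four times the main clock period."""
    return 4 * period
```

With the example period, a default input is assumed to arrive about 1.33 time units after the clock edge, a register file read result may use 95% of the cycle outside the block, and the target corresponds to roughly 150 MHz if the unit is nanoseconds.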
/*=======================================================================
  Design-specific parameters

  TIE_DESIGN is the name of the top-level design for optimization. Typically it is "xmTIE", the root of the TIE logic. However, it can be set to any lower-level design (e.g., any single semantic block such as xmTIE_myblock) to optimize just that semantic block logic.

  TIE_RETIME enables "optimize_registers" for retiming a TIE pipelined design. It can be set to 0, 1 or 2. If 0, no retiming is done. If 1, retiming of semantic block logic is done. If 2, a more aggressive retiming is done which includes the control and bypass logic in the register files. Retiming requires a Design Compiler Ultra license.

  TIE_MAP_EFFORT controls the Design Compiler effort level on the final pass of incremental compiles.

  AREA_IS_PRIORITY tweaks the optimization script to try for a minimum area design. Use it only when timing constraints are very loose.
 */
TIE_DESIGN = xmTIE
TIE_RETIME = 0
TIE_MAP_EFFORT = medium
AREA_IS_PRIORITY = 0
/*=======================================================================
  Configure the synthetic library
 */
read standard.sldb
set_dont_use standard.sldb/*/rpl
remove_attribute standard.sldb/*cmp*/rpl dont_use

/*=======================================================================
  Read and elaborate the design
 */
read -f verilog "/home/earl/tensilica/test/gf/gf.out/gf.v"
remove_design find(design, "xtha") >/dev/null
remove_design find(design, "xtfa") >/dev/null
remove_design find(design, "xtmux4b") >/dev/null
read -f verilog "/home/earl/tensilica/test/gf/gf.out/prim.v"
elaborate TIE_DESIGN
current_design TIE_DESIGN
link

/*=======================================================================
  Optimize
 */
/*
Copyright (c) 1997-2000 Tensilica Inc.
These coded instructions, statements, and computer programs are Confidential Proprietary Information of Tensilica Inc. and may not be disclosed to third parties or copied in any form, in whole or in part, without the prior written consent of Tensilica Inc.
Title: Synthesis script for Tensilica primitives
Created: Fri Nov 12, 1999
;# Author: Richard Rudell
;# <rudell@tensilica.com> Description:
The Design Compiler "current_design" is relevant when this script is run.
A hierarchical search from the current design finds the set of primitives.
TENSILICA_SOURCE/Hardware/scripts/syn/Xtensa_cons_generic.dc sets the constraints on the primitives.
The primitives are ungrouped when they are optimized. Most primitives are optimized with a CLOCK_PERIOD of 0 and a CLOCK_SKEW of 0 (i.e., min-delay). Some are mapped with the real constraints. Not all primitives are optimized.
The primitives are ordered so that primitives which contain other primitives as instances are optimized later in the flow. The order is hardwired.
XTADD and XTMUL give better results when mapped "incremental". A primitive that still contains a lot of generic logic when it is mapped usually gives worse results when mapped incremental.
prim.v contains special synthesis versions of xtmux3e, xtmux4e, and xtcsa. These designs contain cells of xtmux3e_1024, xtmux4e_1024, and xtcsa_1024, which in turn instantiate 1,024 xtmux3b, xtmux4b, and xtfa cells. It is important that these designs are ungrouped and optimized to remove the many nets with no fanout. This trick is used to ensure that efficient cells from the library are used, regardless of the width of the primitive.
Single-bit versions of xtmux3b, xtmux4b, xtfa and xtha are premapped in the hope of getting single cells from the library if they exist. Note that this is pretty much guaranteed for xtmux4b, xtfa, and xtha, as they are instantiated in "prim.v" as GTECH components.
Revision History:
Nov 1999: Rewrite to specialize it for some primitives
Nov 1998: Original version */
XTVERBOSE = 0
XTCURRENT_DESIGN = current_design
XTCLOCK_PERIOD = CLOCK_PERIOD
XTCLOCK_SKEW = CLOCK_SKEW
LAST_TIME = time()
/* configure the library */
read target_library
set_map_only LIB_MAP_ONLY + {gtech/GTECH_ADD_ABC, gtech/GTECH_ADD_AB, gtech/GTECH_MUX4} true
if (LIB_DONT_USE != {}) {
  set_dont_use LIB_DONT_USE
}
current_design XTCURRENT_DESIGN
XTGATE = find (design, "xtmux*b", -hier) + find (design, "xtfa", -hier) + find (design, "xtha", -hier) >/dev/null
XTCLOCKGATE = find(design, "xtclock_gate*" , -hier) >/dev/null
XTRFLATCH = find (design, "xtRF*latch*" , -hier) >/dev/null
XTMUX2 = find (design, "xtmux2_size*" , -hier) + find (design,
"xtmux2e_size*" , -hier) + find (design, "xtmux2p_size*" , -hier) >/dev/null
XTMUX3 = find (design, "xtmux3_size*" , -hier) + find (design,
"xtmux3e_size*" , -hier) + find(design, "xtmux3p_size*" , -hier) >/dev/null
XTMUX4 = find (design, "xtmux4_size*" , -hier) + find (design,
"xtmux4e_size*" , -hier) + find (design, "xtmux4p_size*" , -hier) >/dev/null
XTBOOTH = find (design, "xtbooth*", -hier) >/dev/null
XTADD = find(design, "xtinc*", -hier) + find(design, "xtadd*", -hier) + find (design, "xtcsa_size*" , -hier) + find (design, "xtrelational*" , -hier)
>/dev/null
XTMUL = find (design, "xtmul*", -hier) + find (design, "xtmac*", -hier)
>/dev/null
XTREGFILE = find (design, "xtregfile*", -hier) >/dev/null
/* set the compilation order */
XTPRIM = XTGATE + XTCLOCKGATE + XTRFLATCH + XTMUX2 + XTMUX3 + XTMUX4 +
XTBOOTH + XTADD + XTMUL + XTREGFILE
/* set compile options */
XTFLATTEN = { }
XTSTRUCTURE = {}
XTDONT_TOUCH = XTCLOCKGATE + XTREGFILE
XTINCREMENTAL = XTADD + XTMUL + XTREGFILE
XTAREA = XTCLOCKGATE + XTRFLATCH
XTRELAXED = XTREGFILE
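The lists above decide how each primitive is constrained and compiled in the premap loop that follows. The dcsh idiom `({D} - LIST) == {}` is a membership test: removing LIST from the one-element list `{D}` leaves nothing exactly when D is in LIST. A minimal Python sketch of that selection logic is given here for illustration; the function names and the returned dictionary layout are hypothetical, while the three-way choice (area only, real constraints, or a zero clock period) follows the script.

```python
def in_list(design, design_list):
    """dcsh idiom ({D} - LIST) == {}: true when D is a member of LIST."""
    return len({design} - set(design_list)) == 0


def premap_settings(design, xtarea, xtrelaxed, xtincremental,
                    clock_period, clock_skew):
    """Return the constraint style and compile mode chosen for one primitive."""
    if in_list(design, xtarea):
        # Clock gates and register file latches: set_max_area 0, no timing goal.
        constraints = {"style": "area"}
    elif in_list(design, xtrelaxed):
        # Register files keep the real clock period and skew.
        constraints = {"style": "normal",
                       "clock_period": clock_period,
                       "clock_skew": clock_skew}
    else:
        # Everything else is over-constrained to get minimum delay.
        constraints = {"style": "min_delay",
                       "clock_period": 0,
                       "clock_skew": 0}
    constraints["incremental"] = in_list(design, xtincremental)
    return constraints
```

For example, with `xtrelaxed = ["xtregfile_3R1W_16"]` the register file is compiled against the real 6.67 period, while an adder such as `xtadd_size32` (a hypothetical elaborated name) would be compiled against a period of 0 and mapped incrementally.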
/*=======================================================================
  Premap the primitives
 */
if (XTFLATTEN != {}) {
  set_flatten true -design XTFLATTEN
}
if (XTPRIM - XTSTRUCTURE != {}) {
  set_structure false -design XTPRIM - XTSTRUCTURE
}
if (XTDONT_TOUCH != {}) {
  set_dont_touch XTDONT_TOUCH true
}
foreach (D, XTPRIM) {
  echo "Primitive map " + D
  current_design D
  echo "Ungrouping " + D
  ungroup -all -flatten >/dev/null
  echo "Constraining " + D
  if (({D} - XTAREA) == {}) {
    echo D + " : Area optimization"
    set_max_area 0
  } else {
    if (({D} - XTRELAXED) == {}) {
      /* normal constraints */
      CLOCK_PERIOD = XTCLOCK_PERIOD
      CLOCK_SKEW = XTCLOCK_SKEW
    } else {
      /* overconstrain all other primitives */
      CLOCK_PERIOD = 0
      CLOCK_SKEW = 0
    }
    echo D + " : Clock period is " + CLOCK_PERIOD + " and clock skew is " + CLOCK_SKEW
/*
Copyright (c) 1997-2000 Tensilica Inc.
These coded instructions, statements, and computer programs are Confidential Proprietary Information of Tensilica Inc. and may not be disclosed to third parties or copied in any form, in whole or in part, without the prior written consent of Tensilica Inc.
Title: Generic Design Compiler Constraints
Created: November, 1998
;# Author: Richard Rudell ;# <rudell@tensilica.com>
Description:
Revision History:
Nov 1999: Changed multicycle paths for RFLATCH into a set_disable_timing on the latches instead
Nov 1998 : Original version
*/
/* ==================== clocks ==================== */
CLOCK_PORT = find(port, "CLK") + find(port, "G*CLK") + find(port, "clk") >/dev/null
if (CLOCK_PORT == {}) {
  create_clock -name CLK -period CLOCK_PERIOD
} else {
  CLOCK_PORT = filter(CLOCK_PORT, "@port_direction == in") >/dev/null
  create_clock CLOCK_PORT -name CLK -period CLOCK_PERIOD
}
set_dont_touch_network find(clock, "*")
set_fix_hold find(clock, "*")
set_clock_skew -ideal -uncertainty CLOCK_SKEW find(clock, "*")
DEBUG_CLOCK_PORT = find(port, "TClockDR") >/dev/null
if (DEBUG_CLOCK_PORT != {}) {
  create_clock DEBUG_CLOCK_PORT -name TClockDR -period 4 * CLOCK_PERIOD
}
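The clock section above looks for ports named CLK, G*CLK or clk, keeps only the input ports among them, and falls back to a virtual clock when the design has no matching port at all (for example, a purely combinational primitive). The following Python sketch models that lookup for illustration. It assumes ports are given as (name, direction) pairs and uses shell-style wildcard matching from the standard library as a stand-in for the Design Compiler `find` command; the function name is hypothetical.

```python
from fnmatch import fnmatchcase

CLOCK_PATTERNS = ("CLK", "G*CLK", "clk")


def find_clock_ports(ports):
    """Return the input clock ports, or None when a virtual clock is needed.

    ports: iterable of (name, direction) pairs, direction being "in" or "out".
    """
    matched = [(name, direction) for name, direction in ports
               if any(fnmatchcase(name, pat) for pat in CLOCK_PATTERNS)]
    if not matched:
        # Corresponds to: create_clock -name CLK -period CLOCK_PERIOD
        return None
    # Corresponds to: filter(CLOCK_PORT, "@port_direction == in")
    return [name for name, direction in matched if direction == "in"]
```

A return value of `None` corresponds to the branch that creates a clock with no source port, so that the input and output delays can still be expressed relative to CLK.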
/* ==================== i/o delays, loads, drives ==================== */
set_input_delay .20 * CLOCK_PERIOD -clock CLK all_inputs() - CLOCK_PORT - DEBUG_CLOCK_PORT
set_output_delay .20 * CLOCK_PERIOD -clock CLK all_outputs()
set_load {4 * load_of(BOUNDARY_LOAD)} all_outputs()
set_driving_cell -cell DRIVE_CELL -pin DRIVE_PIN -from_pin DRIVE_PIN_FROM all_inputs() - CLOCK_PORT - DEBUG_CLOCK_PORT >/dev/null
/* ==================== Miscellaneous ==================== */ set_operating_conditions OPERATING_CONDITION
/* BACKWARD COMPATIBILITY ISSUE: set_wire_load_model DOES NOT work with DC98.08 */
/* set_wire_load_model -name WIRE_LOAD */
set_wire_load WIRE_LOAD
set_critical_range CRITICAL_RANGE current_design
/* ==================== Clock Gating Checks ==================== */
set_clock_gating_check -setup CLOCK_SKEW -hold CLOCK_SKEW current_design
/* ==================== Disable latch timing ==================== */
/* the if prevents RFLATCH from being printed */
if (FOOBAR == FOOBAR) {
  RFLATCH = find(cell, "*xtRF*latchout*", -hier) >/dev/null
  if (RFLATCH != {}) {
    echo disabling timing through the latches
    set_disable_timing RFLATCH
  }
}
/* ==================== False paths ==================== */
/*
if (DEBUG_CLOCK_PORT != {}) {
  set_false_path -from TClockDR -to CLK
  set_false_path -from CLK -to TClockDR
}
*/
    if (({D} - XTREGFILE) == {}) {
      set_input_delay .35 * CLOCK_PERIOD -clock CLK find(port, "wr*_addr") >/dev/null
      set_input_delay .35 * CLOCK_PERIOD -clock CLK find(port, "wr*_we")
    }
  }
  echo "Optimizing " + D
  if (({D} - XTINCREMENTAL) == {}) {
    compile -map_effort low -ungroup_all -no_design_rule -incremental
  } else {
    compile -map_effort low -ungroup_all -no_design_rule
  }
  if (XTVERBOSE) {
    echo "Reporting " + D
    report_constraint
    report_timing
    report_area
    report_reference
    ELAPSE_TIME = time() - LAST_TIME
    LAST_TIME = time()
    echo D + " elapse time is " + ELAPSE_TIME
    echo D + " total time is " + time()
    echo D + " memory is " + mem()
  }
}
echo "Prim total time is " + time()
echo "Prim memory is " + mem()
remove_design find(design, "xtmux3e_1024") >/dev/null
remove_design find(design, "xtmux4e_1024") >/dev/null
remove_design find(design, "xtcsa_1024") >/dev/null
current_design XTCURRENT_DESIGN
CLOCK_PERIOD = XTCLOCK_PERIOD
CLOCK_SKEW = XTCLOCK_SKEW
/*
Copyright (c) 1997-2000 Tensilica Inc.
These coded instructions, statements, and computer programs are Confidential Proprietary Information of Tensilica Inc. and may not be disclosed to third parties or copied in any form, in whole or in part, without the prior written consent of Tensilica Inc.
Title: Synthesis script for TIE Coprocessors
Created: Fri Nov 12, 1999
;# Author: Richard Rudell ;# <rudell@tensilica.com>
Description:
Controls Design Compiler for optimizing TIE Coprocessors.
Set TIE_DESIGN to TIE to optimize the TIE module, or set it to the Verilog name of a semantic block (e.g., TIE_vec_mac) to optimize just that module.
Set TIE_RETIME to 1 to perform retiming ("optimize_registers"). All of the TIE logic except for the pipelined register files will be retimed.
If TIE_RETIME is 2, only the register file cores will not be retimed. This allows retiming of the pipeline logic within the register files, but is more taxing on the Design Compiler retiming algorithm.
TIE_MAP_EFFORT is one of {low, medium, high} for the final optimization.
The steps are as follows :
- group the top-level logic into a design (TIE_toplogic)
- set compile options
- optimize the design for each top-level cell (low effort)
- TIE_RETIME: regroup the top-level design for retiming
- optimize top-level design (using TIE_MAP_EFFORT)
- TIE_RETIME: retime the top-level design
- optimize top-level design (using TIE_MAP_EFFORT)
- fix design rules
Revision History:
Nov 1999: Original version
*/
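The header comment above describes three retiming modes and a fixed sequence of steps. The Python sketch below restates them for illustration and is not part of the original script. The variable names follow the script (TIE_REGFILE, TIE_XTREGFILE); the function names are hypothetical, and treating only the steps prefixed with "TIE_RETIME:" as conditional is an assumption drawn from how the step list is written.

```python
def designs_excluded_from_retiming(tie_retime, tie_regfile, tie_xtregfile):
    """Designs kept as untouched subdesigns while the rest is retimed.

    tie_retime == 0: no retiming at all (returns None).
    tie_retime == 1: the pipelined TIE register files and state are kept.
    tie_retime == 2: only the register file cores are kept, so the control
                     and bypass logic around them is retimed as well.
    """
    if tie_retime == 0:
        return None
    if tie_retime == 2:
        return list(tie_xtregfile)
    return list(tie_regfile)


def synthesis_steps(tie_retime):
    """Ordered step names, with the TIE_RETIME-only steps left out when off."""
    steps = [
        ("group top-level logic into TIE_toplogic", False),
        ("set compile options", False),
        ("premap each top-level cell (low effort)", False),
        ("regroup top-level design for retiming", True),
        ("optimize top-level design (TIE_MAP_EFFORT)", False),
        ("retime top-level design", True),
        ("optimize top-level design (TIE_MAP_EFFORT)", False),
        ("fix design rules", False),
    ]
    return [name for name, retime_only in steps if tie_retime or not retime_only]
```

With `tie_retime = 0` the sequence has six steps; with 1 or 2 it has all eight, and the only difference between the two modes is which designs are protected from `optimize_registers`.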
/*=======================================================================
  Group the TIE top-level logic into a subdesign
 */
current_design TIE_DESIGN
if (TIE_UNGROUP != {}) {
  /* remove some cells */
  ungroup TIE_UNGROUP -flatten
}
if (TIE_DESIGN == "xmTIE") {
  /* group the top-level random logic into a subdesign */
  TIE_CELL_LIST = find(cell, "TIE_*") >/dev/null
  group -design_name xmTIE_toplogic -cell_name TIE_toplogic -except TIE_CELL_LIST
}

/*=======================================================================
  Find the top-level cells and their designs
 */
current_design TIE_DESIGN
if (TIE_DESIGN == "xmTIE") {
  TIE_CELL_LIST = find(cell, "TIE_*") >/dev/null
  TIE_DESIGN_LIST = {}
} else {
  TIE_CELL_LIST = {}
  TIE_DESIGN_LIST = TIE_DESIGN
}
foreach (C, TIE_CELL_LIST) {
  TIE_DESIGN_LIST = TIE_DESIGN_LIST + find(design, "xm" + C)
}
TIE_REGFILE = find(design, "xmTIE*_Regfile", -hier) + find(design, "xmTIE*_State", -hier) >/dev/null
TIE_XTREGFILE = find(design, "xtregfile*", -hier) >/dev/null
TIE_DECODER = find(design, "xmTIE_decoder", -hier) >/dev/null
/*=======================================================================
Set optimization controls.
*/
TIE_FLATTEN = TIE_DECODER /* always flatten decoder */
if (AREA_IS_PRIORITY) {
  TIE_STRUCTURE = TIE_DESIGN_LIST
} else {
  TIE_STRUCTURE = TIE_DECODER /* always structure decoder */
}
if (TIE_FLATTEN != {}) {
  set_flatten true -effort medium -design TIE_FLATTEN
}
if (TIE_DESIGN_LIST - TIE_STRUCTURE != {}) {
  set_structure false -design TIE_DESIGN_LIST - TIE_STRUCTURE
}

/*=======================================================================
  Premap the hierarchical designs
 */
LAST_TIME = time()
foreach (D, TIE_DESIGN_LIST) {
  echo "Premapping " + D
  current_design D
  echo "Ungrouping " + D
  ungroup -all -flatten
  echo "Constraining " + D
  set_resource_allocation none
  set_resource_implementation area_only
/*
Copyright (c) 1997-2000 Tensilica Inc.
These coded instructions, statements, and computer programs are Confidential Proprietary Information of Tensilica Inc. and may not be disclosed to third parties or copied in any form, in whole or in part, without the prior written consent of Tensilica Inc.
Title: Generic Design Compiler Constraints
Created: November, 1998
;# Author: Richard Rudell ;# <rudell@tensilica.com>
Description:
Revision History:
Nov 1999 : Changed multicycle paths for RFLATCH into a set_disable_timing on the latches instead
Nov 1998 : Original version
*/
/* ==================== Clocks ==================== */
CLOCK_PORT = find(port, "CLK") + find(port, "G*CLK") + find(port, "clk") >/dev/null
if (CLOCK_PORT == {}) {
  create_clock -name CLK -period CLOCK_PERIOD
} else {
  CLOCK_PORT = filter(CLOCK_PORT, "@port_direction == in") >/dev/null
  create_clock CLOCK_PORT -name CLK -period CLOCK_PERIOD
}
set_dont_touch_network find(clock, "*")
set_fix_hold find(clock, "*")
set_clock_skew -ideal -uncertainty CLOCK_SKEW find(clock, "*")
DEBUG_CLOCK_PORT = find(port, "TClockDR") >/dev/null
if (DEBUG_CLOCK_PORT != {}) {
  create_clock DEBUG_CLOCK_PORT -name TClockDR -period 4 * CLOCK_PERIOD
}
/* ==================== i/o delays, loads, drives ==================== */
set_input_delay .20 * CLOCK_PERIOD -clock CLK all_inputs() - CLOCK_PORT - DEBUG_CLOCK_PORT
set_output_delay .20 * CLOCK_PERIOD -clock CLK all_outputs()
set_load {4 * load_of(BOUNDARY_LOAD)} all_outputs()
set_driving_cell -cell DRIVE_CELL -pin DRIVE_PIN -from_pin DRIVE_PIN_FROM all_inputs() - CLOCK_PORT - DEBUG_CLOCK_PORT >/dev/null
/* ==================== Miscellaneous ==================== */ set_operating_conditions OPERATING_CONDITION
/* BACKWARD COMPATIBILITY ISSUE: set_wire_load_model DOES NOT work with
DC98.08 */
/* set_wire_load_model -name WIRE_LOAD */
set_wire_load WIRE_LOAD
set_critical_range CRITICAL_RANGE current_design
/* ==================== Clock Gating Checks ==================== */
set_clock_gating_check -setup CLOCK_SKEW -hold CLOCK_SKEW current_design
/* ==================== Disable latch timing ==================== */
/* the if prevents RFLATCH from being printed */
if (FOOBAR == FOOBAR) {
  RFLATCH = find(cell, "*xtRF*latchout*", -hier) >/dev/null
  if (RFLATCH != {}) {
    echo disabling timing through the latches
    set_disable_timing RFLATCH
  }
}
/* ==================== False paths ==================== */
/*
if (DEBUG_CLOCK_PORT != {}) {
  set_false_path -from TClockDR -to CLK
  set_false_path -from CLK -to TClockDR
}
*/
if (FOOBAR == FOOBAR) {
X = find(port, "MemOpAddr_E") >/dev/null
if (X != {}) {
  echo setting input delay for TIE memory interface
  set_input_delay .50 * CLOCK_PERIOD -clock CLK X
}
X = find(port, "TIE_MemLoadData_M") + find(port, "MemDataIn*") >/dev/null
if (X != {}) {
  echo setting input delay for TIE memory interface
  set_input_delay .60 * CLOCK_PERIOD -clock CLK X
}
/* constraints for TIE register files and TIE state */
X = find(port, "rd*_data_C*") + find(port, "ps_data_C*") >/dev/null
if (X != {}) {
  echo setting output delay for TIE register file
  set_output_delay .95 * CLOCK_PERIOD -clock CLK X
}
X = find (port, "wd*_data*_C*") + find (port, "wr*_data*_C*") + find (port, "ns_data*_C*") >/dev/null if (X != {}) { echo setting input delay for TIE register file set_input_delay .90 * CLOCK_PERIOD -clock CLK X
}
X = find (port, "wd*_wen_C*") + find (port, "Kill*") >/dev/null if (X != {}) {
  X = filter(X, "@port_direction == in") >/dev/null
  if (X != {}) {
    echo setting input delay for TIE register file controls
    set_input_delay .35 * CLOCK_PERIOD -clock CLK X
  }
}
}
  if (TIE_RETIME) {
    set_critical_range CLOCK_PERIOD current_design
  }
  echo "Optimizing " + D
  compile -map_effort low -ungroup_all -no_design_rule
  echo "Reporting " + D
  report_constraint
  report_timing
  report_area
  report_reference
  ELAPSE_TIME = time() - LAST_TIME
  LAST_TIME = time()
  echo D + " elapse time is " + ELAPSE_TIME
  echo D + " total time is " + time()
  echo D + " memory is " + mem()
}
echo "Premap total time is " + time()
echo "Premap memory is " + mem()

/*=======================================================================
  Report on the top level
 */
current_design TIE_DESIGN
/*
Copyright (c) 1997-2000 Tensilica Inc.
These coded instructions, statements, and computer programs are Confidential Proprietary Information of Tensilica Inc. and may not be disclosed to third parties or copied in any form, in whole or in part, without the prior written consent of Tensilica Inc.
Title: Generic Design Compiler Constraints
Created: November, 1998
;# Author: Richard Rudell ;# <rudell@tensilica.com>
Description:
Revision History:
Nov 1999: Changed multicycle paths for RFLATCH into a set_disable_timing on the latches instead
Nov 1998 : Original version
*/
/* ==================== clocks ==================== */
CLOCK_PORT = find(port, "CLK") + find(port, "G*CLK") + find(port, "clk") >/dev/null
if (CLOCK_PORT == {}) {
  create_clock -name CLK -period CLOCK_PERIOD
} else {
  CLOCK_PORT = filter(CLOCK_PORT, "@port_direction == in") >/dev/null
  create_clock CLOCK_PORT -name CLK -period CLOCK_PERIOD
}
set_dont_touch_network find(clock, "*")
set_fix_hold find(clock, "*")
set_clock_skew -ideal -uncertainty CLOCK_SKEW find(clock, "*")
DEBUG_CLOCK_PORT = find(port, "TClockDR") >/dev/null
if (DEBUG_CLOCK_PORT != {}) {
  create_clock DEBUG_CLOCK_PORT -name TClockDR -period 4 * CLOCK_PERIOD
}
/* ==================== i/o delays, loads, drives ==================== */
set_input_delay .20 * CLOCK_PERIOD -clock CLK all_inputs() - CLOCK_PORT - DEBUG_CLOCK_PORT
set_output_delay .20 * CLOCK_PERIOD -clock CLK all_outputs()
set_load {4 * load_of(BOUNDARY_LOAD)} all_outputs()
set_driving_cell -cell DRIVE_CELL -pin DRIVE_PIN -from_pin DRIVE_PIN_FROM all_inputs() - CLOCK_PORT - DEBUG_CLOCK_PORT >/dev/null
/* ==================== Miscellaneous ==================== */ set_operating_conditions OPERATING_CONDITION
/* BACKWARD COMPATIBILITY ISSUE: set_wire_load_model DOES NOT work with DC98.08 */
/* set_wire_load_model -name WIRE_LOAD */
set_wire_load WIRE_LOAD
set_critical_range CRITICAL_RANGE current_design
/* ==================== Clock Gating Checks ==================== */
set_clock_gating_check -setup CLOCK_SKEW -hold CLOCK_SKEW current_design
/* ==================== Disable latch timing ==================== */
/* the if prevents RFLATCH from being printed */ if (FOOBAR == FOOBAR) {
RFLATCH = find (cell, "*xtRF*latchout*" , -hier) >/dev/null if (RFLATCH != {}) { echo disabling timing through the latches set_disable_timing RFLATCH } }
/* ==================== False paths ==================== */
/* if (DEBUG_CLOCK_PORT != {}) { set_false_path -from TClockDR -to CLK set_false_path -from CLK -to TClockDR
}
*/ if (FOOBAR == FOOBAR) {
X = find (port, "MemOpAddr_E") >/dev/null if (X != {}) { echo setting input delay for TIE memory interface set_input_delay .50 * CLOCK_PERIOD -clock CLK X
}
X = find(port, "TIE_MemLoadData_M") + find(port, "MemDataIn*") >/dev/null
if (X != {}) {
  echo setting input delay for TIE memory interface
  set_input_delay .60 * CLOCK_PERIOD -clock CLK X
}
/* constraints for TIE register files and TIE state */
X = find (port, "rd*_data_C*") + find (port, "ps_data_C*") >/dev/null if (X != {}) { echo setting output delay for TIE register file set_output_delay .95 * CLOCK_PERIOD -clock CLK X
}
X = find (port, "wd*_data*_C*") + find (port, "wr*_data*_C*") + find (port, "ns_data*_C*") >/dev/null if (X != {}) { echo setting input delay for TIE register file set_input_delay .90 * CLOCK_PERIOD -clock CLK X
}
X = find (port, "wd*_wen_C*") + find (port, "Kill*") >/dev/null if (X != {}) {
X = filter (X, "@port_direction == in") >/dev/null if (X • = {}) { echo setting input delay for TIE register file controls set_input_delay .35 * CLOCK_PERIOD -clock CLK X } } } report_constraint report_timing report_area report_reference
/*===============
Prepare design for retiming: keep the register files as subdesigns, and group everything else into "datapath". Also, set a very high critical range so that all paths are made fast.
*/
current_design TIE_DESIGN
if (TIE_RETIME) {
    set_critical_range CLOCK_PERIOD current_design
    if (TIE_RETIME == 2) {
        TIE_KEEP_DESIGN = TIE_XTREGFILE
    } else {
        TIE_KEEP_DESIGN = TIE_REGFILE
    }
    list TIE_KEEP_DESIGN
    if (TIE_KEEP_DESIGN == {}) {
        TIE_RETIME_DESIGN = TIE_DESIGN
        ungroup -all -flatten
    } else {
        TIE_RETIME_DESIGN = "xmTIE_datapath"
        set_dont_touch TIE_KEEP_DESIGN true
        ungroup -all -flatten
        set_dont_touch TIE_KEEP_DESIGN false
        if (TIE_RETIME == 2) {
            TIE_KEEP_CELL = find(cell, "*icore")
        } else {
            TIE_KEEP_CELL = find(cell, "TIE*_Regfile") + find(cell, "TIE*_State")
        }
        group -design TIE_RETIME_DESIGN -cell TIE_RETIME_DESIGN -except TIE_KEEP_CELL
        list TIE_KEEP_CELL
    }
}
/*===============
Pass 1
*/
current_design TIE_DESIGN
if (TIE_XTREGFILE != {}) {
    set_dont_touch TIE_XTREGFILE false
}
if (TIE_DESIGN == "xmTIE") {
    compile_no_new_cells_at_top_level = true
}
uniquify
compile -incremental -map_effort TIE_MAP_EFFORT -no_design_rule -boundary_optimization
report_constraint
report_timing
report_area
report_reference
ELAPSE_TIME = time() - LAST_TIME
LAST_TIME = time()
echo "pass1 elapse time is " + ELAPSE_TIME
echo "pass1 total time is " + time()
echo "pass1 memory is " + mem()
/*===============
Retime
*/
current_design TIE_DESIGN
if (TIE_RETIME) {
    if (TIE_RETIME_DESIGN != TIE_DESIGN) {
        characterize TIE_RETIME_DESIGN
        current_design TIE_RETIME_DESIGN
        set_wire_load WIRE_LOAD
    }
    optimize_registers -check_design -print_critical_loop -no_incremental_map
    current_design TIE_DESIGN
    set_critical_range CRITICAL_RANGE current_design
}
/*===============
Pass 2 (add area constraint)
*/
current_design TIE_DESIGN
set_max_area 0
compile -incremental -map_effort TIE_MAP_EFFORT -no_design_rule -boundary_optimization
report_constraint
report_timing
report_area
report_reference
ELAPSE_TIME = time() - LAST_TIME
LAST_TIME = time()
echo "pass2 elapse time is " + ELAPSE_TIME
echo "pass2 total time is " + time()
echo "pass2 memory is " + mem()
/*===============
Pass 3 (Design Rules)
*/
current_design TIE_DESIGN
compile -incremental -map_effort TIE_MAP_EFFORT -only_design_rule -boundary_optimization
report_constraint
report_timing
report_area
report_reference
ELAPSE_TIME = time() - LAST_TIME
LAST_TIME = time()
echo "pass3 elapse time is " + ELAPSE_TIME
echo "pass3 total time is " + time()
echo "pass3 memory is " + mem()
/*===============
Write it out
*/
current_design TIE_DESIGN
write -o TIE_DESIGN + ".db" -hier
/*=============================================================
Final hierarchical area/timing report
*/
current_design TIE_DESIGN
X = find(cell, "TIE_*") + find(cell, "icore") >/dev/null
if (X != {}) {
    characterize X
}
current_design TIE_DESIGN
report_hierarchy > TIE_DESIGN + ".report"
foreach (D, TIE_DESIGN + find(design, "*", -hier)) {
    echo "Final report " + D
    current_design D
    report_constraint >> TIE_DESIGN + ".report"
    report_timing >> TIE_DESIGN + ".report"
    report_area >> TIE_DESIGN + ".report"
    report_reference >> TIE_DESIGN + ".report"
}
echo "xmTIE elapse time is " + time() >> TIE_DESIGN + ".report"
echo "xmTIE memory is " + mem() >> TIE_DESIGN + ".report"
sh rm -rf workdir
echo "xmTIE total time is " + time()
echo "xmTIE memory is " + mem()
quit
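The TIE interface constraints in the synthesis script above budget each group of ports as a fraction of the clock period. The following Python sketch is only an illustrative model of that bookkeeping, not part of the patent's tool flow: the table `TIE_IO_BUDGETS` and the helper `io_delay` are hypothetical names, the spelling `MemDataIn*` is an assumed reading of the listing, shell-style matching via `fnmatchcase` stands in for the `find(port, ...)` wildcards, and the `@port_direction == in` filter on the control ports is not modeled.

```python
from fnmatch import fnmatchcase

# (kind of delay, fraction of CLOCK_PERIOD, port-name patterns), transcribed
# from the script. The table and its name are illustrative assumptions.
TIE_IO_BUDGETS = [
    ("input", 0.50, ["MemOpAddr_E"]),
    ("input", 0.60, ["TIE_MemLoadData_M", "MemDataIn*"]),  # spelling assumed
    ("output", 0.95, ["rd*_data_C*", "ps_data_C*"]),
    ("input", 0.90, ["wd*_data*_C*", "wr*_data*_C*", "ns_data*_C*"]),
    ("input", 0.35, ["wd*_wen_C*", "Kill*"]),
]


def io_delay(port, clock_period):
    """Return (kind, delay) for a port name, or None if no rule matches.

    Rules are scanned in script order and the last match wins, mirroring a
    later constraint command overriding an earlier one on the same port.
    """
    result = None
    for kind, fraction, patterns in TIE_IO_BUDGETS:
        if any(fnmatchcase(port, pattern) for pattern in patterns):
            result = (kind, fraction * clock_period)
    return result


if __name__ == "__main__":
    for name in ["MemOpAddr_E", "rd0_data_C1", "wd_wen_C2", "CLK"]:
        print(name, io_delay(name, 10.0))
```

Read this way, the register-file read data is allowed to leave the block very late in the cycle (95% of the period), while write-enable and kill controls must arrive early (35%).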
pr-L .v
//
// Copyright (c) 1997-2000 Tensilica Inc.
//
// These coded instructions, statements, and computer programs
// are Confidential Proprietary Information of Tensilica Inc.
// and may not be disclosed to third parties or copied in any
// form, in whole or in part, without the prior written
// consent of Tensilica Inc.
// +
//
// Title: Base Synthesis Primitives
//
// Created: Tue Sep 28 16:59:24 1999
//
//
// Description:
//
// Revision History:
//
module xtmux3e(xtout, a, b, c, sel);
parameter size = 32;
output [size-1:0] xtout;
input [size-1:0] a, b, c;
input [1:0] sel;
wire [1023:0] tmp;
wire [1023:0] fa;
wire [1023:0] fb;
wire [1023:0] fc;
assign fa[1023:size] = {(1024 - size){1'b0}};
assign fa[size-1:0] = a;
assign fb[1023:size] = {(1024 - size){1'b0}};
assign fb[size-1:0] = b;
assign fc[1023:size] = {(1024 - size){1'b0}};
assign fc[size-1:0] = c;
xtmux3e_1024 i(tmp, fa, fb, fc, sel);
assign xtout = tmp;
endmodule

module xtmux3b(xtout, a, b, c, sel);
output xtout;
input a, b, c;
input [1:0] sel;
// synopsys infer_mux "xtmux3b"
assign xtout = sel[1] ? c : (sel[0] ? b : a);
endmodule

module xtmux4e(xtout, a, b, c, d, sel);
parameter size = 32;
output [size-1:0] xtout;
input [size-1:0] a, b, c, d;
input [1:0] sel;
wire [1023:0] tmp;
wire [1023:0] fa;
wire [1023:0] fb;
wire [1023:0] fc;
wire [1023:0] fd;
assign fa[1023:size] = {(1024 - size){1'b0}};
assign fa[size-1:0] = a;
assign fb[1023:size] = {(1024 - size){1'b0}};
assign fb[size-1:0] = b;
assign fc[1023:size] = {(1024 - size){1'b0}};
assign fc[size-1:0] = c;
assign fd[1023:size] = {(1024 - size){1'b0}};
assign fd[size-1:0] = d;
xtmux4e_1024 i(tmp, fa, fb, fc, fd, sel);
assign xtout = tmp;
endmodule

module xtmux4(xtout, a, b, c, d, sel);
output xtout;
input a, b, c, d;
input [1:0] sel;
GTECH_MUX4 i(.D0(a), .D1(b), .D2(c), .D3(d), .A(sel[0]), .B(sel[1]), .Z(xtout));
endmodule

module xtcsa(sum, carry, a, b, c);
parameter size = 32;
output [size-1:0] sum, carry;
input [size-1:0] a, b, c;
wire [1023:0] tmp1, tmp2;
wire [1023:0] fa;
wire [1023:0] fb;
wire [1023:0] fc;
assign fa[1023:size] = {(1024 - size){1'b0}};
assign fa[size-1:0] = a;
assign fb[1023:size] = {(1024 - size){1'b0}};
assign fb[size-1:0] = b;
assign fc[1023:size] = {(1024 - size){1'b0}};
assign fc[size-1:0] = c;
xtcsa_1024 i(tmp1, tmp2, fa, fb, fc);
assign sum = tmp1;
assign carry = tmp2;
endmodule
module xtfa(sum, carry, a, b, c);
output sum, carry;
input a, b, c;
GTECH_ADD_ABC i(a, b, c, sum, carry);
endmodule
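As a behavioral aid for the carry-save adder `xtcsa` and the one-bit full adder `xtfa` listed above, the following Python sketch models what such primitives compute. It is a sketch under stated assumptions rather than the patent's implementation: the body of `xtcsa_1024` is not shown in this section, so the model assumes one full adder per bit position with the carry vector left unshifted (the carry out of bit i is stored at index i and therefore carries weight 2). The function names are hypothetical.

```python
def full_adder(a, b, c):
    """One-bit full adder on bits a, b, c: returns (sum, carry)."""
    s = a ^ b ^ c
    carry = (a & b) | (a & c) | (b & c)  # majority of the three inputs
    return s, carry


def carry_save_add(a, b, c, size=32):
    """Reduce three size-bit operands to a (sum, carry) pair of vectors.

    Assumption: bit i of each result is the full-adder output for bit i of
    the operands, with no shift applied to the carry vector.
    """
    mask = (1 << size) - 1
    a, b, c = a & mask, b & mask, c & mask
    s = a ^ b ^ c
    carry = (a & b) | (a & c) | (b & c)
    return s, carry


if __name__ == "__main__":
    s, cy = carry_save_add(0xFFFFFFFF, 1, 7)
    # The operand total is recovered once the carry vector is weighted by 2.
    print(s + 2 * cy == 0xFFFFFFFF + 1 + 7)
```

Under that assumption, a downstream adder has to shift the carry vector left by one position before the final carry-propagate addition.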
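module xtha(sum, carry, a, b);
output sum, carry;
input a, b;
GTECH_ADD_AB i(a, b, sum, carry);
endmodule

module xtmux3e_1024(xtout, a, b, c, sel);
output [1023:0] xtout;
input [1023:0] a, b, c;
input [1:0] sel;
xtmux3b i0(.xtout(xtout[0]), .a(a[0]), .b(b[0]), .c(c[0]), .sel(sel));
xtmux3b i1(.xtout(xtout[1]), .a(a[1]), .b(b[1]), .c(c[1]), .sel(sel));
xtmux3b i2(.xtout(xtout[2]), .a(a[2]), .b(b[2]), .c(c[2]), .sel(sel));
xtmux3b i3(.xtout(xtout[3]), .a(a[3]), .b(b[3]), .c(c[3]), .sel(sel));
xtmux3b i4(.xtout(xtout[4]), .a(a[4]), .b(b[4]), .c(c[4]), .sel(sel));
xtmux3b i5(.xtout(xtout[5]), .a(a[5]), .b(b[5]), .c(c[5]), .sel(sel));
xtmux3b i6(.xtout(xtout[6]), .a(a[6]), .b(b[6]), .c(c[6]), .sel(sel));
xtmux3b i7(.xtout(xtout[7]), .a(a[7]), .b(b[7]), .c(c[7]), .sel(sel));
xtmux3b i8(.xtout(xtout[8]), .a(a[8]), .b(b[8]), .c(c[8]), .sel(sel));
xtmux3b i9(.xtout(xtout[9]), .a(a[9]), .b(b[9]), .c(c[9]), .sel(sel));
xtmux3b i10(.xtout(xtout[10]), .a(a[10]), .b(b[10]), .c(c[10]), .sel(sel));
xtmux3b i11(.xtout(xtout[11]), .a(a[11]), .b(b[11]), .c(c[11]), .sel(sel));
xtmux3b i12(.xtout(xtout[12]), .a(a[12]), .b(b[12]), .c(c[12]), .sel(sel));
xtmux3b i13(.xtout(xtout[13]), .a(a[13]), .b(b[13]), .c(c[13]), .sel(sel));
xtmux3b i14(.xtout(xtout[14]), .a(a[14]), .b(b[14]), .c(c[14]), .sel(sel));
xtmux3b i15(.xtout(xtout[15]), .a(a[15]), .b(b[15]), .c(c[15]), .sel(sel));
xtmux3b i16(.xtout(xtout[16]), .a(a[16]), .b(b[16]), .c(c[16]), .sel(sel));
xtmux3b i17(.xtout(xtout[17]), .a(a[17]), .b(b[17]), .c(c[17]), .sel(sel));
xtmux3b i18(.xtout(xtout[18]), .a(a[18]), .b(b[18]), .c(c[18]), .sel(sel));
xtmux3b i19(.xtout(xtout[19]), .a(a[19]), .b(b[19]), .c(c[19]), .sel(sel));
xtmux3b i20(.xtout(xtout[20]), .a(a[20]), .b(b[20]), .c(c[20]), .sel(sel));
xtmux3b i21(.xtout(xtout[21]), .a(a[21]), .b(b[21]), .c(c[21]), .sel(sel));
xtmux3b i22(.xtout(xtout[22]), .a(a[22]), .b(b[22]), .c(c[22]), .sel(sel));
xtmux3b i23(.xtout(xtout[23]), .a(a[23]), .b(b[23]), .c(c[23]), .sel(sel));
xtmux3b i24(.xtout(xtout[24]), .a(a[24]), .b(b[24]), .c(c[24]), .sel(sel));
xtmux3b i25(.xtout(xtout[25]), .a(a[25]), .b(b[25]), .c(c[25]), .sel(sel));
xtmux3b i26(.xtout(xtout[26]), .a(a[26]), .b(b[26]), .c(c[26]), .sel(sel));
xtmux3b i27(.xtout(xtout[27]), .a(a[27]), .b(b[27]), .c(c[27]), .sel(sel));
xtmux3b i28(.xtout(xtout[28]), .a(a[28]), .b(b[28]), .c(c[28]), .sel(sel));
xtmux3b i29(.xtout(xtout[29]), .a(a[29]), .b(b[29]), .c(c[29]), .sel(sel));
xtmux3b i30(.xtout(xtout[30]), .a(a[30]), .b(b[30]), .c(c[30]), .sel(sel));
xtmux3b i31(.xtout(xtout[31]), .a(a[31]), .b(b[31]), .c(c[31]), .sel(sel));
xtmux3b i32(.xtout(xtout[32]), .a(a[32]), .b(b[32]), .c(c[32]), .sel(sel));
xtmux3b i33(.xtout(xtout[33]), .a(a[33]), .b(b[33]), .c(c[33]), .sel(sel));
xtmux3b i34(.xtout(xtout[34]), .a(a[34]), .b(b[34]), .c(c[34]), .sel(sel));
xtmux3b i35(.xtout(xtout[35]), .a(a[35]), .b(b[35]), .c(c[35]), .sel(sel));
xtmux3b i36(.xtout(xtout[36]), .a(a[36]), .b(b[36]), .c(c[36]), .sel(sel));
xtmux3b i37(.xtout(xtout[37]), .a(a[37]), .b(b[37]), .c(c[37]), .sel(sel));
xtmux3b i38(.xtout(xtout[38]), .a(a[38]), .b(b[38]), .c(c[38]), .sel(sel));
xtmux3b i39(.xtout(xtout[39]), .a(a[39]), .b(b[39]), .c(c[39]), .sel(sel));
xtmux3b i40(.xtout(xtout[40]), .a(a[40]), .b(b[40]), .c(c[40]), .sel(sel));
xtmux3b i41(.xtout(xtout[41]), .a(a[41]), .b(b[41]), .c(c[41]), .sel(sel));
xtmux3b i42(.xtout(xtout[42]), .a(a[42]), .b(b[42]), .c(c[42]), .sel(sel));
xtmux3b i43(.xtout(xtout[43]), .a(a[43]), .b(b[43]), .c(c[43]), .sel(sel));
xtmux3b i44(.xtout(xtout[44]), .a(a[44]), .b(b[44]), .c(c[44]), .sel(sel));
xtmux3b i45(.xtout(xtout[45]), .a(a[45]), .b(b[45]), .c(c[45]), .sel(sel));
xtmux3b i46(.xtout(xtout[46]), .a(a[46]), .b(b[46]), .c(c[46]), .sel(sel));
xtmux3b i47(.xtout(xtout[47]), .a(a[47]), .b(b[47]), .c(c[47]), .sel(sel));
xtmux3b i48(.xtout(xtout[48]), .a(a[48]), .b(b[48]), .c(c[48]), .sel(sel));
xtmux3b i49(.xtout(xtout[49]), .a(a[49]), .b(b[49]), .c(c[49]), .sel(sel));
xtmux3b i50(.xtout(xtout[50]), .a(a[50]), .b(b[50]), .c(c[50]), .sel(sel));
xtmux3b i51(.xtout(xtout[51]), .a(a[51]), .b(b[51]), .c(c[51]), .sel(sel));
xtmux3b i52(.xtout(xtout[52]), .a(a[52]), .b(b[52]), .c(c[52]), .sel(sel));
xtmux3b i53(.xtout(xtout[53]), .a(a[53]), .b(b[53]), .c(c[53]), .sel(sel));
xtmux3b i54(.xtout(xtout[54]), .a(a[54]), .b(b[54]), .c(c[54]), .sel(sel));
xtmux3b i55(.xtout(xtout[55]), .a(a[55]), .b(b[55]), .c(c[55]), .sel(sel));
xtmux3b i56(.xtout(xtout[56]), .a(a[56]), .b(b[56]), .c(c[56]), .sel(sel));
xtmux3b i57(.xtout(xtout[57]), .a(a[57]), .b(b[57]), .c(c[57]), .sel(sel));
xtmux3b i58(.xtout(xtout[58]), .a(a[58]), .b(b[58]), .c(c[58]), .sel(sel));
xtmux3b i59(.xtout(xtout[59]), .a(a[59]), .b(b[59]), .c(c[59]), .sel(sel));
xtmux3b i60(.xtout(xtout[60]), .a(a[60]), .b(b[60]), .c(c[60]), .sel(sel));
xtmux3b i61(.xtout(xtout[61]), .a(a[61]), .b(b[61]), .c(c[61]), .sel(sel));
xtmux3b i62(.xtout(xtout[62]), .a(a[62]), .b(b[62]), .c(c[62]), .sel(sel));
xtmux3b i63(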
sel(sel));
xtmux3b i64 (sel));
xtmux3b i65 (sel));
xtmux3b i66 (sel));
xtmux3b i67 (sel));
xtmux3b i68 (sel));
xtmux3b i69 (sel));
xtmux3b i70 (sel));
xtmux3b i71 (sel));
xtmux3b i72 (sel));
xtmux3b i73 (sel));
xtmux3b i74 (sel));
xtmux3b i75 (sel));
xtmux3b i76 (sel));
xtmux3b i77 (sel));
xtmux3b i78 (sel));
xtmux3b i79 (sel));
xtmux3b i80 (sel));
xtmux3b i81 (sel));
xtmux3b i82 (sel));
xtmux3b i83 (sel));
xtmux3b i84 (sel));
xtmux3b i85 (sel));
xtmux3b i86 (sel));
xtmux3b i87 (sel));
xtmux3b i88 (sel));
xtmux3b i89 (sel));
xtmux3b i90 (sel));
xtmux3b i91 (sel));
xtmux3b i92 (sel));
xtmux3b i93
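(sel));
xtmux3b i94(.xtout(xtout[94]), .a(a[94]), .b(b[94]), .c(c[94]), .sel(sel));
xtmux3b i95(.xtout(xtout[95]), .a(a[95]), .b(b[95]), .c(c[95]), .sel(sel));
xtmux3b i96(.xtout(xtout[96]), .a(a[96]), .b(b[96]), .c(c[96]), .sel(sel));
xtmux3b i97(.xtout(xtout[97]), .a(a[97]), .b(b[97]), .c(c[97]), .sel(sel));
xtmux3b i98(.xtout(xtout[98]), .a(a[98]), .b(b[98]), .c(c[98]), .sel(sel));
xtmux3b i99(.xtout(xtout[99]), .a(a[99]), .b(b[99]), .c(c[99]), .sel(sel));
xtmux3b i100(.xtout(xtout[100]), .a(a[100]), .b(b[100]), .c(c[100]), .sel(sel));
xtmux3b i101(.xtout(xtout[101]), .a(a[101]), .b(b[101]), .c(c[101]), .sel(sel));
xtmux3b i102(.xtout(xtout[102]), .a(a[102]), .b(b[102]), .c(c[102]), .sel(sel));
xtmux3b i103(.xtout(xtout[103]), .a(a[103]), .b(b[103]), .c(c[103]), .sel(sel));
xtmux3b i104(.xtout(xtout[104]), .a(a[104]), .b(b[104]), .c(c[104]), .sel(sel));
xtmux3b i105(.xtout(xtout[105]), .a(a[105]), .b(b[105]), .c(c[105]), .sel(sel));
xtmux3b i106(.xtout(xtout[106]), .a(a[106]), .b(b[106]), .c(c[106]), .sel(sel));
xtmux3b i107(.xtout(xtout[107]), .a(a[107]), .b(b[107]), .c(c[107]), .sel(sel));
xtmux3b i108(.xtout(xtout[108]), .a(a[108]), .b(b[108]), .c(c[108]), .sel(sel));
xtmux3b i109(.xtout(xtout[109]), .a(a[109]), .b(b[109]), .c(c[109]), .sel(sel));
xtmux3b i110(.xtout(xtout[110]), .a(a[110]), .b(b[110]), .c(c[110]), .sel(sel));
xtmux3b i111(.xtout(xtout[111]), .a(a[111]), .b(b[111]), .c(c[111]), .sel(sel));
xtmux3b i112(.xtout(xtout[112]), .a(a[112]), .b(b[112]), .c(c[112]), .sel(sel));
xtmux3b i113(.xtout(xtout[113]), .a(a[113]), .b(b[113]), .c(c[113]), .sel(sel));
xtmux3b i114(.xtout(xtout[114]), .a(a[114]), .b(b[114]), .c(c[114]), .sel(sel));
xtmux3b i115(.xtout(xtout[115]), .a(a[115]), .b(b[115]), .c(c[115]), .sel(sel));
xtmux3b i116(.xtout(xtout[116]), .a(a[116]), .b(b[116]), .c(c[116]), .sel(sel));
xtmux3b i117(.xtout(xtout[117]), .a(a[117]), .b(b[117]), .c(c[117]), .sel(sel));
xtmux3b i118(.xtout(xtout[118]), .a(a[118]), .b(b[118]), .c(c[118]), .sel(sel));
xtmux3b i119(.xtout(xtout[119]), .a(a[119]), .b(b[119]), .c(c[119]), .sel(sel));
xtmux3b i120(.xtout(xtout[120]), .a(a[120]), .b(b[120]), .c(c[120]), .sel(sel));
xtmux3b i121(.xtout(xtout[121]), .a(a[121]), .b(b[121]), .c(c[121]), .sel(sel));
xtmux3b i122(.xtout(xtout[122]), .a(a[122]), .b(b[122]), .c(c[122]), .sel(sel));
xtmux3b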
[123] .b(b 123] 123] sel (sel) ) ,- xtmux3b il24 ( .b(b[124] sel (sel) ) ; xtmux3b il25 ( .b(b[125] sel (sel) ) ; xtmux3b il26( .b(b[126] sel (sel) ) ; xtmux3b il27( .b(b[127] sel (sel) ) ; xtmux3b il28 ( .b(b[128] sel (sel) ) ; xtmux3b il29( .b(b[129] sel (sel) ) ; xtmux3b il30( .b(b[130] sel (sel) ) ; xtmux3b il31( .b(b[131] sel (sel) ) ; xtmux3b 1132 ( .b(b[132] sel (sel) ) ; xtmux3b 1133 ( .b(b[133] sel (sel) ) ; xtmux3b 1134 ( .b(b[134] sel (sel) ) ; xtmux3b il35( .b(b[135] sel (sel) ) ; xtmux3b il36( .b(b[136] sel (sel) ) ; xtmux3b il37 ( .b(b[137] sel (sel) ) ; xtmux3b il38 ( .b(b[13δ] sel (sel) ) ; xtmux3b il39( • b(b[139] sel (sel) ) ; xtmux3b il40( .b(b[140] sel (sel) ) ; xtmux3b il41( .b(b[141] sel (sel) ) ; xtmux3b il42 ( .b(b[142] sel (sel) ) ; xtmu 3b il43 ( .b(b[143] sel (sel) ) ; xtmux3b il44( .b(b[144] sel (sel) ) ; xtmux3b il45 ( .b(b[145] sel (sel) ) ; xtmux3b il46( .b(b[146] sel (sel) ) ; xtmux3b il47( .b(b[147] sel (sel) ) ; xtmux3b il48 ( .b(b[148] sel (sel).) ; xtmux3b il49( .b(b[149] sel (sel) ) ; xtmux3b il50( .b(b[150] sel (sel) ) ; xtmύx3b il51( .b(b[151] sel (sel) ) ; xtmux3b il52 ( .b(b[152] sel (sel) ) ; xtmux3b il53(
.b(b[153]
sel (sel) ) ; xtmux3b il54 ( sel (sel) ) ; xtmux3b il55( sel (sel) ) ,- xtmux3b il56( sel (sel) ) ; xtmux3b il57( sel (sel) ) ; xtmux3b il58 ( sel (sel) ) ; xtmux3b il59( sel (sel) ) ; xtmux3b il60 ( sel (sel) ) ; xtmux3b il61( sel (sel) ) ; xtmux3b il62-( sel (sel) ) ; xtmux3b il63 ( sel (sel) ) ; xtmux3b 1164 ( sel (sel) ) ; xtmux3b il65 ( sel (sel) ) ; xtmux3b il66( sel (sel) ) , xtmux3b il67 ( sel (sel) ) ; xtmux3b il68 ( sel (sel) ) ; xtmux3b il69( sel (sel) ) ; xtmux3b il70( sel (sel) ) ; xtmux3b il71( sel (sel) ) ; xtmux3b il72 ( sel (sel) ) ; xtmux3b il73 ( sel (sel) ) ; xtmux3b il74 ( sel (sel) ) ; xtmux3b il75( sel (sel) ) ; xtmux3b il76( sel (sel) ) ; xtmux3b il77 ( sel (sel) ) ; xtmux3b il78 ( sel (sel) ) ; xtmux3b il79( sel (sel) ) ; xtmux3b il80 ( sel (sel) ) ; xtmux3b il81( sel (sel) ) ; xtmux3b il82 ( sel (sel) ) ; xtmux3b il83 (
sel (sel) ) ; xtmux3b il84 ( sel (sel) ) ; xtmux3b il85( sel (sel) ) ; xtmux3b il86( sel (sel) ) ; xtmux3b il87 ( sel (sel) ) ; xtmux3b il88 ( sel (sel) ) ; xtmux3b il89( sel (sel) ) ; xtmux3b il90 ( sel (sel) ) ; xtmux3b il91( sel (sel) ) ; xtmux3b il92 ( sel (sel) ) ; xtmux3b il93 ( sel (sel) ) ; xtmux3b il94 ( sel (sel) ) ; xtmux3b 1195 ( sel (sel) ) ; xtmux3b il96 ( sel (sel) ) ; xtmux3b il97 ( sel (sel) ) ; xtmux3b il98 ( sel (sel) ) ; xtmux3b il99( sel (sel) ) ; xtmux3b i200 ( sel (sel) ) ; xtmux3b i201 ( sel (sel) ) ; xtmux3b i202 ( sel (sel) ) ; xtmux3b i203 ( sel (sel) ) ; xtmux3b 1204 ( sel (sel) ) ; xtmux3b i205 ( sel (sel) ) ; xtmux3b i206 ( sel (sel) ) ; xtmux3b i207 ( sel (sel) ) ; xtmux3b i208 ( sel (sel) ) ,- xtmux3b i209( sel (sel) ) ; xtmux3b 1210 ( sel (sel) ) ; xtmύx3b i211( sel (sel) ) ; xtmux3b i212 ( sel (sel) ) ,- xtmux3b i213 (
sel (sel) ) ; xtmux3b i214 ( sel (sel) ) ; xtmux3b i215 ( sel (sel) ) ; xtmux3b i216( sel (sel) ) ; xtmux3b i217 ( sel (sel) ) ; xtmux3b i21δ ( sel (sel) ) ; xtmux3b 1219 ( sel (sel) ) ; xtmux3b i220 ( sel (sel) ) ; xtmux3b i221( sel (sel) ) ; xtmux3b i222 ( sel (sel) ) ; xtmux3b i223 ( sel (sel) ) ; xtmux3b i224 ( sel (sel) ) ; xtmux3b i225 ( sel (sel) ) ; xtmux3b i226 ( sel (sel) ) ; xtmux3b i227 ( sel (sel) ) ; xtmux3b i228 ( sel (sel) ) ; xtmux3b i229( sel (sel) ) ; xtmux3b i230 ( sel (sel) ) ; xtmux3b i231( sel (sel) ) ; xtmux3b 1232 ( sel (sel) ) ; xtmux3b i233 ( sel (sel) ) ; xtmux3b i234 ( sel (sel) ) ; xtmux3b 1235 ( sel (sel) ) ; xtmux3b i236( sel (sel) ) ; xtmux3b i237 ( sel (sel) ) ; xtmux3b i238 ( sel (sel) ) ; xtmux3b i239( sel (sel) ) ; xtmux3b i240 ( sel (sel) ) ; xtmux3b i241( sel (sel) ) ; xtmux3b i242 ( sel (sel) ) ; xtmux3b i243 (
sel (sel) ) ; xtmux3b i244 ( sel (sel) ) ; xtmux3b i245( sel (sel) ) ; xtmux3b ±246 ( sel (sel) ) ; xtmux3b i247 ( sel (sel) ) ; xtmux3b i248 ( sel (sel) ) ; xtmux3b i249( sel (sel) ) ; xtmux3b i250( sel (sel) ) ; xtmux3b i251( sel (sel) ) ; xtmux3b i252 ( sel (sel) ) ; xtmux3b i253 ( sel (sel) ) ,- xtmux3b i254 ( sel (sel) ) ; xtmux3b 1255 ( sel (sel) ) ; xtmux3b i256( sel (sel) ) ; xtmux3b i257( sel (sel) ) ; xtmux3b i258 ( sel (sel) ) ; xtmux3b i259( sel (sel) ) ; xtmux3b i260( sel (sel) ) ; xtmux3b i261( sel (sel) ) ; xtmux3b i262 ( sel (sel) ) ; xtmux3b i263 ( sel (sel) ) ; xtmux3b i264 { sel (sel) ) ; xtmux3b i265( sel (sel) ) ; xtmux3b i266( sel (sel) ) ; xtmux3b i267( sel (sel) ) ; xtmux3b i268 ( sel (sel) ) ; . xtmux3b i269( sel (sel) ) ; xtmux3b i270( sel (sel) ) ,- xtmux3b i271( sel (sel) ) ; xtmux3b i272( sel (sel) ) ; xtmux3b i273 ( sel (sel) ) ; xtmux3b 1274 ( sel (sel) ) ; xtmux3b i275( sel (sel) ) ,- xtmux3b i276( sel (sel) ) ; xtmux3b i277 ( sel (sel) ) ; xtmux3b i278 ( sel (sel) ) ; xtmux3b i279( sel (sel) ) ; xtmux3b i280( sel (sel) ) ; xtmux3b i281( sel (sel) ) ; xtmux3b i282 ( sel (sel) ) ; xtmux3b i283 ( sel (sel) ) ; xtmux3b 1284 ( sel (sel) ) ; xtmux3b i285 ( sel (sel) ) ; xtmux3b i286( sel (sel) ) ; xtmux3b 1287 ( sel (sel) ) ; xtmux3b i288( sel (sel) ) ; xtmux3b i289( sel (sel) ) ; xtmux3b i290{ sel (sel) ) ; xtmux3b i291( sel (sel) ) ,- xtmux3b 1292 ( sel (sel) ) ; xtmux3b i293 ( sel (sel) ) ; xtmux3b i294 ( sel (sel) ) ,- xtmux3b i295 ( sel (sel) ) ; xtmux3b i296( sel (sel) ) ; xtmux3b 1297 ( sel (sel) ) ,- xtmux3b i298( sel (sel) ) ; xtmux3b 1299 ( sel (sel) ) ; xtmux3b i300 ( sel (sel) ) ; xtmύx3b 1301 ( sel (sel) ) ; xtmux3b i302 ( sel (sel) ) ; xtmux3b i303 (
sel (sel) ) ,- xtmux3b i304 (sel) ) ; xtmux3b i305 (sel) ) ; xtmux3b i306 (sel) ) ; xtmux3b i307 (sel) ) ; xtmux3b i308 (sel) ) ; xtmux3b i309 (sel) ) ; xtmux3b i310 (sel) ) ; xtmux3b i311 (sel) ) ; xtmux3b i312 (sel) ) ; xtmux3b i313 (sel) ) ; xtmux3b i314 (sel) ) ; xtmux3b i315 (sel) ) ; xtmux3b i316 (sel) ) ; xtmux3b i317 (sel) ) ; xtmux3b i318 (sel) ) ; xtmux3b i319 (sel) ) ; xtmux3b i320 (sel) ) ; xtmux3b i321 (sel) ) ; xtmux3b i322 (sel) ) ; xtmux3b i323 (sel) ) ; xtmux3b i324 (sel) ) ; xtmux3b 1325 (sel) ) ; xtmux3b i326 (sel) ) ; xtmux3b 1327 (sel) ) ; xtmux3b i328 (sel) .) ; xtmux3b i329 (sel) ) ; xtmux3b i330 (sel) ) ; xtmύx3b i331 (sel) ) ; xtmux3b i332 (sel) ) ; xtmux3b i333
(sel) ) ; xtmux3b 1334 .sel (sel) ) ; xtmux3b i335 .sel (sel) ) ; xtmux3b i336 . sel (sel) ) ; xtmux3b 1337 . sel (sel) ) ; xtmux3b i338 . sel (sel) ) ; xtmux3b i339 . sel (sel) ) ; xtmux3b 1340 .sel (sel) ) ; xtmux3b 13 1 .sel (sel) ) ; xtmux3b 1342 . sel (sel) ) ; xtmux3b i343 . sel (sel) ) ,- xtmux3b i344 . sel (sel) ) ; xtmux3b i345 .sel (sel) ) ; xtmux3b i346 . sel (sel) ) ; xtmux3b i347 . sel (sel) ) ; xtmux3b i348 . sel (sel) ) ; xtmux3b i349 . sel (sel) ) ; xtmux3b 1350 . sel (sel) ) ; xtmux3b i351 . sel (sel) ) ; xtmux3b i352 .sel (sel) ) ; xtmux3b 1353 .sel (sel) ) ; xtmux3b i354 . sel (sel) ) ; xtmux3b i355 .sel (sel) ) ; xtmux3b i356 .sel (sel) ) ; xtmux3b i357 .sel (sel) ) ; xtmux3b i35δ . sel (sel)); xtmux3b i359 .sel (sel) ) ; xtmux3b i360 , sel (sel) ) ; xtmύx3b i361 .sel (sel) ) ; xtmux3b i362 , sel (sel) ) ; xtmux3b i363
, sel (sel) ) ; xtmux3b i364( sel (sel) ) ; xtmux3b i365 ( sel (sel) ) ; xtmux3b i366( sel (sel) ) ; xtmux3b i367( sel (sel) ) ; xtmux3b i368 ( sel (sel) ) ; xtmux3b i369( sel (sel) ) ; xtmux3b i370 ( sel (sel) ) ; xtmux3b i371( sel (sel) ) ; xtmux3b i372 ( sel (sel) ) ; xtmux3b i373 ( sel (sel) ) ; xtmux3b i374( sel (sel) ) ; xtmux3b i375 ( sel (sel) ) ; xtmux3b 1376( sel (sel) ) ; xtmux3b i377 ( sel (sel) ) ; xtmux3b i376( sel (sel) ) ; xtmux3b i379( sel (sel) ) ; xtmux3b i3δ0 ( sel (sel) ) ; xtmux3b i3δl( sel (sel) ) ; xtmux3b i382 ( sel (sel) ) ; xtmux3b i383 ( sel (sel) ) ; xtmux3b i384 ( sel (sel) ) ; xtmux3b i385( sel (sel) ) ; xtmux3b i3δ6( sel (sel) ) ; xtmux3b i387( sel (sel) ) ; xtmux3b i388 ( sel (sel) ) ; xtmux3b i389( sel (sel) ) ; xtmux3b i390( sel (sel) ) ; xtmϋx3b i391( sel (sel) ) ; xtmux3b i392 ( sel (sel) ) ; xtmux3b i393 (
sel (sel) ) ,- xtmux3b i394( sel (sel) ) ; xtmux3b i395( sel (sel) ) ,- xtmux3b i396( sel (sel) ) ; xtmux3b i397( sel (sel) ) ; xtmux3b i398 ( sel (sel) ) ; xtmux3b i399( sel (sel) ) ; xtmux3b i400 ( sel (sel) ) ; xtmux3b i401( sel (sel) ) ; xtmux3b i402 ( sel (sel) ) ; xtmux3b i403 ( sel (sel) ) ; xtmux3b i404( sel (sel) ) ; xtmux3b i405 ( sel (sel) ) ; xtmux3b i406( sel (sel) ) ; xtmux3b i407 ( sel (sel) ) ,- xtmux3b i408( sel (sel) ) ; xtmux3b i409( sel (sel) ) ; xtmux3b i410 ( sel (sel) ) ; xtmux3b i411( sel (sel) ) ; xtmux3b i412 ( sel (sel) ) ; xtmux3b i413 ( sel (sel) ) ; xtmux3b i414 ( sel (sel) ) ; xtmux3b i415( sel (sel) ) ; xtmux3b i416( sel (sel) ) ; xtmux3b i417( sel (sel) ) ; xtmux3b 1418 ( sel (sel) ) ; xtmux3b i419( sel (sel) ) ; xtmux3b i420 ( sel (sel) ) ; xtmux3b 1421 ( sel (sel) ) ; xtmux3b i422 ( sel (sel) ) ; xtmux3b i423 (
sel (sel) ) ,- xtmux3b 1424 .sel (sel) ) ; xtmux3b i425 . sel (sel) ) ; xtmux3b i426 . sel (sel) ) ; xtmux3b i427 . sel (sel) ) ; xtmux3b 1428 , sel (sel) ) ; xtmux3b i429 , sel (sel) ) ; xtmux3b i430 . sel (sel) ) ; xtmux3b i431 . sel (sel) ) ; xtmux3b 1432 . sel (sel) ) ; xtmux3b i433 . sel (sel) ) ; xtmux3b i434 . sel (sel) ) ; xtmux3b i435 .sel (sel) ) ; xtmux3b i436 . sel (sel) ) ; xtmux3b i437 . sel (sel) ) ; xtmux3b i438 . sel (sel) ) ; xtmux3b i439 . sel (sel) ) ; xtmux3b i440 .sel (sel) ) ; xtmu 3b i441 . sel (sel) ) ; xtmux3b i442 . sel (sel) ) ; xtmux3b i443 . sel (sel) ) ; xtmux3b 1444 .sel (sel) ) ; xtmux3b i445 .sel (sel) ) ; xtmux3b i446 . sel (sel) ) ; xtmux3b i447 . sel (sel) ) ; xtmux3b i448 .sel (sel) ) ; xtmux3b i449 .sel (sel) ) ; xtmux3b i450 .sel (sel) ) ; xtmύx3b i451 .sel (sel) ) ; xtmux3b i452 .sel (sel) ) ; xtmux3b i453
sel (sel) ) ; xtmux3b i454 ( sel (sel) ) ; xtmux3b i455 ( sel (sel) ) ; xtmux3b i456 ( sel (sel) ) ,- xtmux3b i457 ( sel (sel) ) ; xtmux3b i458 ( sel (sel) ) ; xtmux3b i459 ( sel (sel) ) ; xtmux3b i460 ( sel (sel) ) ; xtmux3b 1461 ( sel (sel) ) ; xtmux3b i462 ( sel (sel) ) ; xtmux3b i463 ( sel (sel) ) ,- xtmux3b i464 ( sel (sel) ) ,- xtmux3b i465 ( sel (sel) ) ; xtmux3b i466( sel (sel) ) ; xtmux3b i467( sel (sel) ) ; xtmux3b i468 ( sel (sel) ) ; xtmux3b i469( sel (sel) ) ; xtmux3b i470 ( sel (sel) ) ; xtmux3b i471( sel (sel) ) ,- xtmux3b i472 { sel (sel) ) ; xtmux3b i473 ( sel (sel) ) ; xtmux3b 1474 ( sel (sel) ) ; xtmux3b i475 ( sel (sel) ) ; xtmux3b i476 ( sel (sel) ) ; xtmux3b i477 ( sel (sel) ) ; xtmux3b i478 ( sel (sel) ) ; xtmux3b i479( sel (sel) ) ; xtmux3b i480( sel (sel) ) ; xtmux3b i481( sel (sel) ) ; xtmux3b i482 ( sel (sel) ) ; xtmux3b i483 (
sel (sel) ) ; xtmux3b i4δ4 ( sel (sel) ) ,- xtmux3b i4δ5( sel (sel) ) ; xtmux3b i486( sel (sel) ) ; xtmux3b i487 ( sel (sel) ) ; xtmux3b i48δ( sel (sel) ) ; xtmux3b i4δ9{ sel (sel) ) ; xtmux3b i490 ( sel (sel) ) ; xtmux3b i491( sel (sel) ) ; xtmux3b i492 ( sel (sel) ) ; xtmux3b i493 ( sel (sel) ) ; xtmux3b i494 ( sel (sel) ) ; xtmux3b i495 ( sel (sel) ) ; xtmux3b i496 ( sel (sel) ) ; xtmux3b 1497 ( sel (sel) ) ; xtmux3b i49δ ( sel (sel) ) ; xtmux3b i499 ( sel (sel) ) ; xtmux3b i500 ( sel (sel) ) ; xtmux3b i501( sel (sel) ) ; xtmux3b i502 ( sel (sel) ) ; xtmux3b i503 ( sel (sel) ) ; xtmux3b i504 ( sel (sel) ) ; xtmux3b 1505 ( sel (sel) ) ; xtmux3b i506( sel (sel) ) ; xtmux3b i507( sel (sel) ) ; xtmux3b i508 ( sel (sel) ) ; xtmux3b i509( sel (sel) ) ; xtmux3b 1510 ( sel (sel) ) ; xtmux3b i511{ sel (sel) ) ; xtmux3b i512 ( sel (sel) ) ; xtmux3b i513 (
sel (sel) ) ; xtmux3b i514 ( sel (sel) ) ; xtmux3b i515 ( sel (sel) ) ; xtmux3b i516 ( sel (sel) ) ; xtmux3b i517 ( sel (sel) ) ; xtmux3b 1518 ( sel (sel) ) ; xtmux3b i519 ( sel (sel) ) ; xtmux3b i520 ( sel (sel) ) ; xtmux3b i521( sel (sel) ) ; xtmux3b i522 ( sel (sel) ) ; xtmux3b i523 ( sel (sel) ) ; xtmux3b 1524 ( sel (sel) ) ; xtmux3b i525 ( sel (sel) ) ; xtmux3b i526 ( sel (sel) ) ; xtmux3b i527 ( sel (sel) ) ; xtmux3b i528 ( sel (sel) ) ; xtmux3b 1529 ( sel (sel) ) ; xtmux3b i530 ( sel (sel) ) ; xtmux3b i531( sel (sel) ) ; xtmux3b i532 ( sel (sel) ) ; xtmux3b i533 ( sel (sel) ) ; xtmux3b i534 ( sel (sel) ) ; xtmux3b i535 ( sel (sel) ) ; xtmux3b i536( sel (sel) ) ; xtmux3b i537 ( sel (sel) ) ; xtmux3b i538 ( sel (sel) ) ; xtmux3b i539( sel (sel) ) ; xtmux3b i540 ( sel (sel) ) ; xtmύx3b i541( sel (sel) ) ; xtmux3b i542 ( sel (sel) ) ; xtmux3b 1543 (
sel (sel) ) ; xtmux3b i544 ( sel (sel) ) ; xtmux3b i545( sel (sel) ) ; xtmux3b i546( sel (sel) ) ; xtmux3b 1547 ( sel (sel) ) ; xtmux3b i548( sel (sel) ) ; xtmux3b i549( sel (sel) ) ; xtmux3b i550( sel (sel) ) ; xtmux3b i551( sel (sel) ) ; xtmux3b i552 ( sel (sel) ) ; xtmux3b i553 ( sel (sel) ) ; xtmux3b i554 ( sel (sel) ) ; xtmux3b i555 ( sel (sel) ) ; xtmux3b i556( sel (sel) ) ; xtmux3b 1557 ( sel (sel) ) ; xtmux3b i558( sel (sel) ) ; xtmux3b i559( sel (sel) ) ; xtmux3b i560( sel (sel) ) ; xtmux3b i561( sel (sel) ) ; xtmux3b i562 ( sel (sel) ) ; xtmux3b i563 { sel (sel) ) ; xtmux3b i564( sel (sel) ) ; xtmux3b i565 ( sel (sel) ) ; xtmux3b i566( sel (sel) ) ; xtmux3b i567( sel (sel) ) ; xtmux3b i568 ( sel (sel) ) ; xtmux3b i569( sel (sel) ) ; xtmux3b 1570 ( sel (sel.) ) ; xtm'ux3b i571( sel (sel) ) ,- xtmux3b 1572 ( sel (sel) ) ; xtmux3b i573 (
sel (sel) ) ; xtmux3b i574 ( sel (sel) ) ; xtmux3b i575 ( sel (sel) ) ; xtmux3b i576( sel (sel) ) ; xtmux3b i577 ( sel (sel) ) ; xtmux3b i578 ( sel (sel) ) ; xtmux3b i579( sel (sel) ) ; xtmux3b i580 ( sel (sel) ) ; xtmux3b i581( sel (sel) ) ; xtmux3b i582 ( sel (sel) ) ,- xtmux3b 1583 ( sel (sel) ) ; xtmux3b i584 ( sel (sel) ) ; xtmux3b i585 ( sel (sel) ) ,- xtmux3b i586( sel (sel) ) ; xtmux3b 1587 ( sel (sel) ) ; xtmux3b i588( sel (sel) ) ; xtmux3b i589( sel (sel) ) ; xtmux3b i590 ( sel (sel) ) ; xtmux3b i591( sel (sel) ) ; xtmux3b i592 ( sel (sel) ) ; xtmux3b i593 ( sel (sel) ) ; xtmux3b i594 ( sel (sel).) ; xtmux3b i595( sel (sel) ) ; xtmux3b i596 ( sel (sel) ) ; xtmux3b i597 ( sel (sel) ) ; xtmux3b i598 ( sel (sel) ) ; xtmux3b i599( sel (sel) ) ; xtmux3b 1600 ( sel (sel) ) ; xtmύx'3b i601( sel (sel) ) ; xtmux3b i602 ( sel (sel) ) ; xtmux3b i603 (
sel (sel) ) ; xtmux3b i604 ( sel (sel) ) ; xtmux3b i605( sel (sel) ) ; xtmux3b i606( sel (sel) ) ; xtmux3b i607 ( sel (sel) ) ; xtmux3b 1608 ( sel (sel) ) ; xtmux3b i609( sel (sel) ) ; xtmux3b i610( sel (sel) ) ; xtmux3b i611( sel (sel) ) ; xtmux3b i612 ( sel (sel) ) ; xtmux3b i613 ( sel (sel) ) ; xtmux3b i614 ( sel (sel) ) ; xtmux3b i615( sel (sel) ) ; xtmux3b i616( sel (sel) ) ; xtmux3b 1617 ( sel (sel) ) ; xtmux3b i618( sel (sel) ) ; xtmux3b 1619 ( sel (sel) ) ; xtmux3b i620( sel (sel) ) ; xtmux3b i621( sel (sel) ) ; xtmux3b i622 ( sel (sel) ) ; xtmux3b i623 ( sel (sel) ) ; xtmux3b i624( sel (sel) ) ; xtmux3b 1625 ( sel (sel) ) ; xtmux3b i626 ( sel (sel) ) ; xtmux3b i627 ( sel (sel) ) ; xtmux3b 1628 ( sel (sel) ) ; xtmux3b i629( sel (sel) ) ; xtmux3b i630 ( sel (sel) ) ; xtmux3b i631( sel (sel) ) ; xtmux3b 1632 ( sel (sel) ) ; xtmux3b i633 (
sel (sel) ) ; xtmux3b 1634 ( sel (sel) ) ; xtmux3b i635 ( sel (sel) ) ; xtmux3b i636( sel (sel) ) ; xtmux3b i637 ( sel (sel) ) ; xtmux3b i638( sel (sel) ) ; xtmux3b i639( sel (sel) ) ; xtmux3b i640 ( sel (sel) ) ; xtmux3b i641( sel (sel) ) ; xtmux3b 1642 ( sel (sel) ) ; xtmux3b i643 ( sel (sel) ) ; xtmux3b i644 ( sel (sel) ) ; xtmux3b i645 ( sel (sel) ) ; xtmux3b i646 ( sel (sel) ) ; xtmux3b i647 ( sel (sel) ) ; xtmu 3b 1648 ( sel (sel) ) ; xtmux3b i649( sel (sel) ) ; xtmux3b i650 ( sel (sel) ) ; xtmux3b 1651 ( sel (sel) ) ; xtmux3b 1652 ( sel (sel) ) ; xtmux3b 1653 ( sel (sel) ) ; xtmux3b 1654 ( sel (sel) ) ; xtmux3b i655 ( sel (sel) ) ; xtmux3b i656( sel (sel) ) ; xtmux3b i657( sel (sel) ) ; xtmux3b i658( sel (sel) ) ; xtmux3b i659( sel (sel) ) ; xtmux3b i660( sel (sel) ) ; xtmux3b i661( sel (sel) ) ; xtmux3b i662 ( sel (sel) ) ; xtmux3b i663 (
sel (sel) ) ; xtmux3b i664 ( sel (sel) ) ; xtmux3b 1665 ( sel (sel) ) ; xtmux3b i666( sel (sel) ) ; xtmux3b 1667 ( sel (sel) ) ; xtmux3b 1668 ( sel (sel) ) ; xtmux3b i669( sel (sel) ) ; xtmux3b i670 ( sel (sel) ) ; xtmux3b i671( sel (sel) ) ; xtmux3b 1672 ( sel (sel) ) ; xtmux3b i673 ( sel (sel) ) ; xtmux3b 1674 ( sel (sel) ) ; xtmux3b i675 ( sel (sel) ) ; xtmux3b i676( sel (sel) ) ; xtmux3b i677 ( sel (sel) ) ; xtmux3b i678 ( sel (sel) ) ; xtmux3b i679( sel (sel) ) ; xtmux3b i680 ( sel (sel) ) ; xtmux3b i681( sel (sel) ) ; xtmux3b i682 ( sel (sel) ) ; xtmux3b i683 ( sel (sel) ) ; xtmux3b i684 ( sel (sel) ) ; xtmux3b i685 ( sel (sel) ) ; xtmux3b i686( sel (sel) ) ; xtmux3b i687 ( sel (sel) ) ; xtmux3b i688 ( sel (sel) ) ; xtmux3b i689( sel (sel) ) ; xtmux3b i690 ( sel (sel) ) ; xtmux3b i691( sel (sel) ) ; xtmux3b i692 ( sel (sel) ) ; xtmux3b i693 (
sel (sel) ) ; xtmux3b i694 ( sel (sel) ) ; xtmux3b i695 ( sel (sel) ) ; xtmux3b 1696 ( sel (sel) ) ; xtmux3b i697 ( sel (sel) ) ; xtmux3b i69δ ( sel (sel) ) ; xtmux3b i699( sel (sel) ) ; xtmux3b i700 ( sel (sel) ) ; xtmux3b i701( sel (sel) ) ; xtmux3b i702 ( sel (sel) ) ; xtmux3b i703 ( sel (sel) ) ; xtmux3b i704( sel (sel) ) ; xtmux3b i705 ( sel (sel) ) ; xtmux3b i706( sel (sel) ) ; xtmux3b i707( sel (sel) ) ; xtmux3b i70δ ( sel (sel) ) ; xtmux3b i709( sel (sel) ) ; xtmux3b i710 ( sel (sel) ) ; xtmux3b i711( sel (sel) ) ; xtmux3b i712 ( sel (sel) ) ; xtmux3b i713 ( sel (sel) ) ; xtmux3b i714 ( sel (sel) ) ; xtmux3b i715 ( sel (sel) ) ; xtmux3b i716( sel (sel) ) ; xtmux3b i717( sel (sel) ) ; xtmux3b i71δ ( sel (sel) ) ; xtmux3b i719( sel (sel) ) ; xtmux3b 1720 ( sel (sel) ) ; xtmux3b i721( sel (sel) ) ; xtmux3b i722 ( sel (sel) ) ; xtmux3b i723 (
sel (sel) ) ; xtmux3b i724 ( sel (sel) ) ; xtmux3b i725 ( sel (sel) ) ; xtmux3b i726( sel (sel) ) ; xtmux3b i727 ( sel (sel) ) ; xtmux3b i728 ( sel (sel) ) ; xtmux3b i729( sel (sel) ) ; xtmux3b i730 ( sel (sel) ) ; xtmux3b i731( sel (sel) ) ; xtmux3b i732 ( sel (sel) ) ; xtmux3b i733 ( sel (sel) ) ; xtmux3b 1734 ( sel (sel) ) ; xtmux3b i735 ( sel (sel) ) ; xtmux3b i736( sel (sel) ) ; xtmux3b 1737 ( sel (sel) ) ; xtmux3b i738 ( sel (sel) ) ; xtmux3b i739( sel (sel) ) ; xtmux3b i740( sel (sel) ) ; xtmux3b i741( sel (sel) ) ; xtmux3b i742 ( sel (sel) ) ; xtmux3b 1743 ( sel (sel) ) ; xtmux3b i744 ( sel (sel) ) ; xtmux3b i745 ( sel (sel) ) ; xtmux3b i746( sel (sel) ) ; xtmux3b i747( sel (sel) ) ; xtmux3b i74δ ( sel (sel) ) ; xtmux3b i749( sel (sel) ) ; xtmux3b i750 ( sel (sel) ) ; xtmux3b i751( sel (sel) ) ; xtmux3b i752 ( sel (sel) ) ; xtmux3b i753 (
sel (sel) ) ; xtmux3b i754 (( .Xtout (xtout [754] sel (sel) ) ; xtmux3b i755 (( .xtout (xtout [755] sel (sel) ) ; xtmux3b i756(( .xtout (xtout [756] sel (sel) ) ; xtmux3b i757(( -xtout (xtout [757] sel (sel) ) ; xtmux3b i758 (( .xtout (xtout [758] sel (sel) ) ; xtmux3b i759((.xtout (xtout [759] sel (sel) ) ; xtmux3b i760 (( .xtout (xtout [760] sel (sel) ) ; xtmux3b i761(( .xtout (xtout [761] sel (sel) ) ; xtmux3b i762 (( -xtout (xtout [762] sel (sel) ) ; xtmux3b 1763 (( .xtout (xtout [763] sel (sel) ) ; xtmux3b i764(( .xtout (xtout [764] sel (sel) ) ; xtmux3b i765 (( .xtout (xtout [765] sel (sel) ) ; xtmux3b i766(( .xtout (xtout [766] sel (sel) ) ; xtmux3b i767(( .xtout (xtout [767] sel (sel) ) ; xtmux3b i76δ (( .xtout (xtout [768] sel (sel) ) ; xtmux3b i769(( .xtout (xtout [769] sel (sel) ) ; xtmux3b i770 (( .xtout (xtout [770] sel (sel) ) ; xtmux3b i771(( .xtout (xtout [771] sel (sel) ) ; xtmux3b i772 (( .xtout (xtout [772] sel (sel) ) ; xtmux3b i773 (( .xtout (xtout [773] sel (sel) ) ; xtmux3b i774(( .xtout (xtout [774] sel (sel) ) ; xtmux3b i775(( .xtout (xtout [775] sel (sel) ) ; xtmux3b i776(( .xtout (xtout [776] sel (sel) ) ; xtmux3b 1777 (( .xtout (xtout [777] sel (sel) ) ; xtmux3b i778 (( .xtout (xtout [778] sel (sel) ) ; xtmux3b i779(( .xtout (xtout [779] sel (sel) ) ; xtmux3b i780 (( .xtout (xtout [780] sel (sel) ) ; xtmux3b i781(( .xtout (xtout [781] sel (sel) ) ; xtmux3b i782 (( .xtout (xtout [782] sel (sel) ) ; xtmux3b i763 ((.xtout (xtout [783]
.sel(sel)); xtmux3b i784 .b(b[784]
.sel(sel)); xtmux3b i785 .b(b[785]
.sel(sel)); xtmux3b i786 .b(b[786]
.sel(sel)); xtmux3b i787 .b(b[787]
.sel(sel)); xtmux3b i788 .b(b[788]
.sel(sel)); xtmux3b i789 .b(b[789]
.sel(sel)); xtmux3b i790 .b(b[790]
.sel(sel)); xtmux3b i791 .b(b[791]
.sel(sel)); xtmux3b i792 .b(b[792]
.sel(sel)); xtmux3b i793 .b(b[793]
.sel(sel)); xtmux3b i794 .b(b[794]
.sel(sel)); xtmux3b i795 .b(b[795]
.sel(sel)); xtmux3b i796 .b(b[796]
.sel(sel)); xtmux3b i797 .b(b[797]
.sel(sel)); xtmux3b i798 .b(b[798]
.sel(sel)); xtmux3b i799 .b(b[799]
.sel(sel)); xtmux3b i800 .b(b[800]
.sel(sel)); xtmux3b i801 .b(b[801]
.sel(sel)); xtmux3b i802 .b(b[802]
.sel(sel)); xtmux3b i803 .b(b[803]
.sel(sel)); xtmux3b i804 .b(b[804]
.sel(sel)); xtmux3b i805 .b(b[805]
.sel(sel)); xtmux3b i806 .b(b[806]
.sel(sel)); xtmux3b i807 .b(b[807]
.sel(sel)); xtmux3b i808 .b(b[808]
.sel(sel)); xtmux3b i809 .b(b[809]
.sel(sel)); xtmux3b i810 .b(b[810]
.sel(sel)); xtmux3b i811 .b(b[811]
.sel(sel)); xtmux3b i812 .b(b[812]
.sel(sel)); xtmux3b i813
Figure imgf000205_0001
.b(b[813]
Figure imgf000205_0002
.sel (sel) ) ; xtmux3b iδl4 ( .b(b[814] sel (sel) ) ; xtmux3b i815( -b(b[815] sel (sel) ) ; xtmux3b i816( .b(b[δl6] sel (sel) ) ; xtmux3b i817 ( .b(b[δl7] sel (sel) ) ; xtmux3b i818( .b(b[818] sel (sel) ) ; xtmux3b i819( .b(b[819] sel (sel) ) ; xtmux3b i820 ( .b(b[820] sel (sel) ) ; xtmux3b i821( .b(b[821] sel (sel) ) ; xtmux3b i822 ( .b(b[822] sel (sel) ) ; xtmux3b i823 ( .b(b[823] sel (sel) ) ; xtmux3b i824( .b(b[824] sel (sel) ) ; xtmux3b 1825 ( .b(b[825] sel (sel) ) ; xtmux3b i826( .b(b[826] sel (sel) ) ; xtmux3b i827( .b(b[827] sel (sel) ) ; xtmux3b i828 ( -b(b[828] sel (sel) ) ; xtmux3b i829( .b(b[829] sel (sel) ) ; xtmux3b 1830 ( .b(b[830] sel (sel) ) ; xtmux3b i831( .b(b[831] sel (sel) ) ; xtmux3b i832 ( .b(b[832] sel (sel) ) ; xtmux3b i833 ( .b(b[833] sel (sel) ) ; xtmux3b i834 ( .b(b[834] sel (sel) ) ; xtmux3b i835 ( .b(b[835] sel (sel) ) ; xtmux3b i836( .b(b[836] sel (sel) ) ; xtmux3b i837{ .b(b[837] sel (sel) ) ; xtmux3b i838( .b(b[838] sel (sel) ) ; xtmux3b i839( .b(b[839] sel (sel) ) ; xtmux3b i840 ( .b(b[840] sel (sel) ) ; xtmux3b i841( .b(b[841] sel (sel) ) ; xtmux3b i842 ( .b(b[842] sel (sel) ) ; xtmux3b iδ43 (
.b(b[843]
sel (sel) ) ; xtmux3b .b(b[844] (sel) ) ; xtmux3b .b(b[845] (sel) ) ; xtmux3b .b(b[846] (sel) ) ; xtmux3b .b(b[847] (sel) ) ; xtmux3b .b(b[84δ] (sel) ) ; xtmux3b .b(b[849] (sel) ) ; xtmux3b .b(b[850] (sel) ) ; xtmux3b .b(b[851] (sel) ) ; xtmux3b .b(b[852] (sel) ) ; xtmux3b .b(b[853] (sel) ) ; xtmux3b .b(b[854] (sel) ) ; xtmux3b .b(b[855] (sel) ) ; xtmux3b .b(b[856] (sel) ) ; xtmux3b -b(b[857] (sel) ) ; xtmux3b -b(b[858] (sel) ) ; xtmux3b -b(b[δ59] (sel) ) ; xtmux3b -b(b[δ60] (sel) ) ; xtmux3b -b(b[861] (sel) ) ; xtmux3b .b(b[862] (sel) ) ; xtmux3b .b(b[863] (sel) ) ; xtmux3b .b(b[864] (sel) ) ; xtmux3b .b(b[865] (sel) ) ; xtmux3b .b(b[866] (sel) ) ; xtmux3b .b(b[867] (sel) ) ; xtmux3b .b(b[868] (sel) ) ; xtmux3b .b(b[869] (sel) ) ; xtmux3b .b(b[870] (sel) ) ; xtmux3b .b(b[871] (sel) ) ; xtmux3b .b(b[872] (sel) ) ; xtmux3b
.b(b[873]
(sel) ) ; xtmux3b i874 ( .b(b[874] sel (sel) ) ; xtmux3b 1875 ( .b(b[875] sel (sel) ) ; xtmux3b i876( .b(b[876] sel (sel) ) ; xtmux3b i877( .b(b[877] sel (sel) ) ; xtmux3b i878 ( .b(b[878] sel (sel) ) ; xtmux3b i879( .b(b[879] sel (sel) ) ; xtmux3b i880( .b(b[880] sel (sel) ) ; xtmux3b i881( .b(b[881] sel (sel) ) ; xtmux3b i882 ( .b(b[882] sel (sel) ) ; xtmux3b 1883 ( .b(b[883] sel (sel) ) ; xtmux3b i884( .b(b[864] sel (sel) ) ; xtmux3b 1885 ( .b(b[885] sel (sel) ) ; xtmux3b i886( .b(b[886] sel (sel) ) ; xtmux3b iδ87 ( .b(b[887] sel (sel) ) ; xtmux3b 1888 ( .b(b[88δ] sel (sel) ) ; xtmux3b i889( .b(b[869] sel (sel) ) ; xtmux3b i890( .b(b[890] sel (sel) ) ; xtmux3b 1891 ( .b(b[891] sel (sel) ) ; xtmux3b i892 ( .b(b[892] sel (sel) ) ; xtmux3b i893 ( .b(b[893] sel (sel) ) ; xtmux3b i894( .b(b[894] sel (sel) ) ; xtmux3b i895( .b(b[895] sel (sel) ) ; xtmux3b i896( .b(b[896] sel (sel) ) ; xtmux3b i897( .b(b[897] sel (sel) ) ; xtmux3b i898( .b(b[898] sel (sel) ) ; xtmux3b i899( .b(b[899] sel (sel) ) ; xtmux3b i900 ( .b(b[900] sel (sel) ) ; xtmux3b i901( .b(b[901] sel (sel) ) ; xtmux3b i902 ( .b(b[902] sel (sel) ) ; xtmux3b i903 (
.b(b[903]
sel (sel) ) ; xtmux3b 1904 ( sel (sel) ) ; xtmux3 i905( sel (sel) ) ; xtmux3b i906( sel (sel) ) ; xtmux3b 1907 ( sel (sel) ) ; xtmux3b i908 ( sel (sel) ) ; xtmux3b i909( sel (sel) ) ; xtmux3b i910 ( sel (sel) ) ; xtmux3b i911( sel (sel) ) ; xtmux3b 1912 ( sel (sel) ) ; xtmux3b i913 ( sel (sel) ) ; xtmux3b i914 ( sel (sel) ) ; xtmux3b 1915 ( sel (sel) ) ; xtmux3b i916( sel (sel) ) ; xtmux3b i917 ( sel (sel) ) ; xtmux3b i918 ( sel (sel) ) ; xtmux3b i919( sel (sel) ) ,- xtmux3b i920( sel (sel) ) ; xtmux3b i921( sel (sel) ) ; xtmux3b 1922 ( sel (sel) ) ; xtmux3b 1923 ( sel (sel) ) ; xtmux3b i924 ( sel (sel) ) ; xtmux3b i925( sel (sel) ) ; xtmux3b i926( sel (sel) ) ; xtmux3b i927 ( sel (sel) ) ; xtmux3b i928 ( sel (sel) ) ; xtmux3b i929( sel (sel) ) ; xtmux3b 1930 ( sel (sel) ) ; xtmux3b i931( sel (sel) ) ; xtmux3b i932 ( sel (sel) ) ; xtmux3b i933 (
sel (sel) ) ; xtmux3b i934 ( sel (sel) ) ; xtmux3b i935( sel (sel) ) ; xtmux3b i936( sel (sel) ) ; xtmux3b i937( sel (sel) ) ; xtmux3b i93δ( sel (sel) ) ; xtmux3b i939( sel (sel) ) ; xtmux3b i940 ( sel (sel) ) ; xtmux3b i941( sel (sel) ) ; xtmux3b i942 ( sel (sel) ) ; xtmux3b i943 ( sel (sel) ) ; xtmux3b 1944 ( sel (sel) ) ; xtmux3b i945( sel (sel) ) ; xtmux3b i946( sel (sel) ) ; xtmux3b i947 ( sel (sel) ) ; xtmux3b i94δ ( sel (sel) ) ; xtmux3b i949( sel (sel) ) ; xtmux3b i950( sel (sel) ) ,- xtmux3b i951( sel (sel) ) ; xtmux3b i952 ( sel (sel) ) ; xtmux3b i953 ( sel (sel) ) ; xtmux3b i954( sel (sel) ) ; xtmux3b i955( sel (sel) ) ; xtmux3b i956( sel (sel) ) ; xtmux3b i957 ( sel (sel) ) ; xtmux3b i958 ( sel (sel) ) ; xtmux3b i959( sel (sel) ) ; xtmux3b i960 ( sel (sel) ) ; xtmux3b i961( sel (sel) ) ; xtmux3b i962 ( sel (sel) ) ; xtmux3b i963 (
sel (sel) ) ; xtmux3b i964 ( sel (sel) ) ; xtmux3b i965 ( sel (sel) ) ; xtmux3b i966( sel (sel) ) ; xtmux3b i967 ( sel (sel) ) ; xtmux3b i968 ( sel (sel) ) ; xtmux3b i969( sel (sel) ) ; xtmux3b i970 ( sel (sel) ) ; xtmux3b i971( sel (sel) ) ; xtmux3b 1972 ( sel (sel) ) ; xtmux3b i973 ( sel (sel) ) ; xtmux3b i974 ( sel (sel) ) ; xtmux3b i975 ( sel (sel) ) ; xtmux3b i976( sel (sel) ) ; xtmux3b i977( sel (sel) ) ; xtmux3b i978 ( sel (sel) ) ; xtmux3b i979( sel (sel) ) ; xtmux3b i980 ( sel (sel) ) ; xtmux3b i981( sel (sel) ) ; xtmux3b i9δ2 ( sel (sel) ) ; xtmux3b i983 ( sel (sel) ) ; xtmux3b i984 ( sel (sel) ) ; xtmux3b i985 ( sel (sel) ) ; xtmux3b i986( sel (sel) ) ; xtmux3b i987 ( sel (sel) ) ; xtmux3b i988 ( sel (sel) ) ; xtmux3b 1989 ( sel (sel) ) ; xtmux3b i990 ( sel (sel) ) ; xtmux3b i991( sel (sel) ) ; xtmux3b i992 ( sel (sel) ) ; xtmux3b 1993 (
.sel(sel));
xtmux3b i994(.xtout(xtout[994]), .a(a[994]), .b(b[994]), .c(c[994]), .sel(sel));
xtmux3b i995(.xtout(xtout[995]), .a(a[995]), .b(b[995]), .c(c[995]), .sel(sel));
xtmux3b i996(.xtout(xtout[996]), .a(a[996]), .b(b[996]), .c(c[996]), .sel(sel));
xtmux3b i997(.xtout(xtout[997]), .a(a[997]), .b(b[997]), .c(c[997]), .sel(sel));
xtmux3b i998(.xtout(xtout[998]), .a(a[998]), .b(b[998]), .c(c[998]), .sel(sel));
xtmux3b i999(.xtout(xtout[999]), .a(a[999]), .b(b[999]), .c(c[999]), .sel(sel));
xtmux3b i1000(.xtout(xtout[1000]), .a(a[1000]), .b(b[1000]), .c(c[1000]), .sel(sel));
xtmux3b i1001(.xtout(xtout[1001]), .a(a[1001]), .b(b[1001]), .c(c[1001]), .sel(sel));
xtmux3b i1002(.xtout(xtout[1002]), .a(a[1002]), .b(b[1002]), .c(c[1002]), .sel(sel));
xtmux3b i1003(.xtout(xtout[1003]), .a(a[1003]), .b(b[1003]), .c(c[1003]), .sel(sel));
xtmux3b i1004(.xtout(xtout[1004]), .a(a[1004]), .b(b[1004]), .c(c[1004]), .sel(sel));
xtmux3b i1005(.xtout(xtout[1005]), .a(a[1005]), .b(b[1005]), .c(c[1005]), .sel(sel));
xtmux3b i1006(.xtout(xtout[1006]), .a(a[1006]), .b(b[1006]), .c(c[1006]), .sel(sel));
xtmux3b i1007(.xtout(xtout[1007]), .a(a[1007]), .b(b[1007]), .c(c[1007]), .sel(sel));
xtmux3b i1008(.xtout(xtout[1008]), .a(a[1008]), .b(b[1008]), .c(c[1008]), .sel(sel));
xtmux3b i1009(.xtout(xtout[1009]), .a(a[1009]), .b(b[1009]), .c(c[1009]), .sel(sel));
xtmux3b i1010(.xtout(xtout[1010]), .a(a[1010]), .b(b[1010]), .c(c[1010]), .sel(sel));
xtmux3b i1011(.xtout(xtout[1011]), .a(a[1011]), .b(b[1011]), .c(c[1011]), .sel(sel));
xtmux3b i1012(.xtout(xtout[1012]), .a(a[1012]), .b(b[1012]), .c(c[1012]), .sel(sel));
xtmux3b i1013(.xtout(xtout[1013]), .a(a[1013]), .b(b[1013]), .c(c[1013]), .sel(sel));
xtmux3b i1014(.xtout(xtout[1014]), .a(a[1014]), .b(b[1014]), .c(c[1014]), .sel(sel));
xtmux3b i1015(.xtout(xtout[1015]), .a(a[1015]), .b(b[1015]), .c(c[1015]), .sel(sel));
xtmux3b i1016(.xtout(xtout[1016]), .a(a[1016]), .b(b[1016]), .c(c[1016]), .sel(sel));
xtmux3b i1017(.xtout(xtout[1017]), .a(a[1017]), .b(b[1017]), .c(c[1017]), .sel(sel));
xtmux3b i1018(.xtout(xtout[1018]), .a(a[1018]), .b(b[1018]), .c(c[1018]), .sel(sel));
xtmux3b i1019(.xtout(xtout[1019]), .a(a[1019]), .b(b[1019]), .c(c[1019]), .sel(sel));
xtmux3b i1020(.xtout(xtout[1020]), .a(a[1020]), .b(b[1020]), .c(c[1020]), .sel(sel));
xtmux3b i1021(.xtout(xtout[1021]), .a(a[1021]), .b(b[1021]), .c(c[1021]), .sel(sel));
xtmux3b i1022(.xtout(xtout[1022]), .a(a[1022]), .b(b[1022]), .c(c[1022]), .sel(sel));
xtmux3b i1023(.xtout(xtout[1023]), .a(a[1023]), .b(b[1023]), .c(c[1023]), .sel(sel));
endmodule

module xtmux4e_1024(xtout, a, b, c, d, sel);
output [1023:0] xtout;
input [1023:0] a, b, c, d;
input [1:0] sel;
xtmux4b i0(.xtout(xtout[0]), .a(a[0]), .b(b[0]), .c(c[0]), .d(d[0]), .sel(sel));
xtmux4b i1(.xtout(xtout[1]), .a(a[1]), .b(b[1]), .c(c[1]), .d(d[1]), .sel(sel));
xtmux4b i2(.xtout(xtout[2]), .a(a[2]), .b(b[2]), .c(c[2]), .d(d[2]), .sel(sel));
xtmux4b i3(.xtout(xtout[3]), .a(a[3]), .b(b[3]), .c(c[3]), .d(d[3]), .sel(sel));
xtmux4b i4(.xtout(xtout[4]), .a(a[4]), .b(b[4]), .c(c[4]), .d(d[4]), .sel(sel));
xtmux4b i5(.xtout(xtout[5]), .a(a[5]), .b(b[5]), .c(c[5]), .d(d[5]), .sel(sel));
xtmux4b i6(.xtout(xtout[6]), .a(a[6]), .b(b[6]), .c(c[6]), .d(d[6]), .sel(sel));
xtmux4b i7(.xtout(xtout[7]), .a(a[7]), .b(b[7]), .c(c[7]), .d(d[7]), .sel(sel));
xtmux4b i8(.xtout(xtout[8]), .a(a[8]), .b(b[8]), .c(c[8]), .d(d[8]), .sel(sel));
xtmux4b i9(.xtout(xtout[9]), .a(a[9]), .b(b[9]), .c(c[9]), .d(d[9]), .sel(sel));
xtmux4b i10(.xtout(xtout[10]), .a(a[10]), .b(b[10]), .c(c[10]), .d(d[10]), .sel(sel));
xtmux4b i11(.xtout(xtout[11]), .a(a[11]), .b(b[11]), .c(c[11]), .d(d[11]), .sel(sel));
xtmux4b i12(.xtout(xtout[12]), .a(a[12]), .b(b[12]), .c(c[12]), .d(d[12]), .sel(sel));
xtmux4b i13(.xtout(xtout[13]), .a(a[13]), .b(b[13]), .c(c[13]), .d(d[13]), .sel(sel));
xtmux4b i14(.xtout(xtout[14]), .a(a[14]), .b(b[14]), .c(c[14]), .d(d[14]), .sel(sel));
xtmux4b i15(.xtout(xtout[15]), .a(a[15]), .b(b[15]), .c(c[15]), .d(d[15]), .sel(sel));
xtmux4b i16(.xtout(xtout[16]), .a(a[16]), .b(b[16]), .c(c[16]), .d(d[16]), .sel(sel));
xtmux4b i17(.xtout(xtout[17]), .a(a[17]), .b(b[17]), .c(c[17]), .d(d[17]), .sel(sel));
xtmux4b i18(.xtout(xtout[18]), .a(a[18]), .b(b[18]), .c(c[18]), .d(d[18]), .sel(sel));
xtmux4b i19(.xtout(xtout[19]), .a(a[19]), .b(b[19]), .c(c[19]), .d(d[19]), .sel(sel));
xtmux4b i20(.xtout(xtout[20]), .a(a[20]), .b(b[20]), .c(c[20]), .d(d[20]), .sel(sel));
xtmux4b i21(.xtout(xtout[21]), .a(a[21]), .b(b[21]), .c(c[21]), .d(d[21]), .sel(sel));
xtmux4b i22(.xtout(xtout[22]), .a(a[22]), .b(b[22]), .c(c[22]), .d(d[22]), .sel(sel));
xtmux4b i23(.xtout(xtout[23]), .a(a[23]), .b(b[23]), .c(c[23]), .d(d[23]), .sel(sel));
xtmux4b i24(.xtout(xtout[24]), .a(a[24]), .b(b[24]), .c(c[24]), .d(d[24]), .sel(sel));
xtmux4b i25(.xtout(xtout[25]), .a(a[25]), .b(b[25]), .c(c[25]), .d(d[25]), .sel(sel));
xtmux4b i26(.xtout(xtout[26]), .a(a[26]), .b(b[26]), .c(c[26]),
.d(d[26]), . sel (sel)); xtmux4b (d[27]), xtmux4b (d[28]) , xtmux4b (d[29]); xtmux4b (d[30]) , xtmux4b (d[31]), xtmux4b (d[32]), xtmux4b (d[33]), xtmux4b (d[34]), xtmux4b (d[35]); xtmux4b (d[36]), xtmux4b (d[37]), xtmux4b (d[38]), xtmux4b (d[39]), xtmux4b (d[40]), xtmux4b (d[41]), xtmux4b (d[42]), xtmux4b (d[43]), xtmux4b (d[44]), xtmux4b (d[45]), xtmux4b (d[46]), xtmux4b (d[47]), xtmux4b (d[48]), xtmux4b (d[49]), xtmux4b (d[50]), xtmux4b (d[51]), xtmux4b (d[52 ), xtmux4b (d[53]), xtmux4b (d[54]), xtmux4b (d[55]), xtmux4b
(d[56]),
xtmux4b
• d(d[57]), xtmux4b
■ d(d[58]), xtmux4b
• d(d[59]), xtmux4b
• d(d[60]) , xtmux4b
• d(d[61]), xtmux4b
• d(d[62]), xtmux4b .d(d[63]) , xtmux4b
• d(d[64]), xtmux4b
• d(d[65]), xtmux4b
• d(d[66]), xtmux4b .d(d[67]) , xtmux4b
• d(d[68]), xtmux4b
• d(d[69]), xtmux4b
• d(d[70]) , xtmux4b .d(d[71]), xtmux4b
• d(d[72]), xtmux4b
• d(d[73]), xtmux4b
• d(d[74]), xtmux4b
• d(d[75]), xtmux4b
• d(d[76]), xtmux4b *d(d[77]), xtmux4b
■ d(d[78]) , xtmux4b
• d(d[79]), xtmux4b *d(d[80]), xtmux4b *d(d[81]), xtmux4b *d(d[82]), xtmux4b •d(d[δ3]), xtmux4b
• d(d[δ4]), xtmux4b *d(d[δ5]), xtmux4b
*d(d[δ6]),
xtmux4b i87(.xtout(xtout[87]), .a(a[87]), .b(b[87]), .c(c[87]), .d(d[87]), .sel(sel));
xtmux4b i88(.xtout(xtout[88]), .a(a[88]), .b(b[88]), .c(c[88]), .d(d[88]), .sel(sel));
xtmux4b i89(.xtout(xtout[89]), .a(a[89]), .b(b[89]), .c(c[89]), .d(d[89]), .sel(sel));
xtmux4b i90(.xtout(xtout[90]), .a(a[90]), .b(b[90]), .c(c[90]), .d(d[90]), .sel(sel));
xtmux4b i91(.xtout(xtout[91]), .a(a[91]), .b(b[91]), .c(c[91]), .d(d[91]), .sel(sel));
xtmux4b i92(.xtout(xtout[92]), .a(a[92]), .b(b[92]), .c(c[92]), .d(d[92]), .sel(sel));
xtmux4b i93(.xtout(xtout[93]), .a(a[93]), .b(b[93]), .c(c[93]), .d(d[93]), .sel(sel));
xtmux4b i94(.xtout(xtout[94]), .a(a[94]), .b(b[94]), .c(c[94]), .d(d[94]), .sel(sel));
xtmux4b i95(.xtout(xtout[95]), .a(a[95]), .b(b[95]), .c(c[95]), .d(d[95]), .sel(sel));
xtmux4b i96(.xtout(xtout[96]), .a(a[96]), .b(b[96]), .c(c[96]), .d(d[96]), .sel(sel));
xtmux4b i97(.xtout(xtout[97]), .a(a[97]), .b(b[97]), .c(c[97]), .d(d[97]), .sel(sel));
xtmux4b i98(.xtout(xtout[98]), .a(a[98]), .b(b[98]), .c(c[98]), .d(d[98]), .sel(sel));
xtmux4b i99(.xtout(xtout[99]), .a(a[99]), .b(b[99]), .c(c[99]), .d(d[99]), .sel(sel));
xtmux4b i100(.xtout(xtout[100]), .a(a[100]), .b(b[100]), .c(c[100]), .d(d[100]), .sel(sel));
xtmux4b i101(.xtout(xtout[101]), .a(a[101]), .b(b[101]), .c(c[101]), .d(d[101]), .sel(sel));
xtmux4b i102(.xtout(xtout[102]), .a(a[102]), .b(b[102]), .c(c[102]), .d(d[102]), .sel(sel));
xtmux4b i103(.xtout(xtout[103]), .a(a[103]), .b(b[103]), .c(c[103]), .d(d[103]), .sel(sel));
xtmux4b i104(.xtout(xtout[104]), .a(a[104]), .b(b[104]), .c(c[104]), .d(d[104]), .sel(sel));
xtmux4b i105(.xtout(xtout[105]), .a(a[105]), .b(b[105]), .c(c[105]), .d(d[105]), .sel(sel));
xtmux4b i106(.xtout(xtout[106]), .a(a[106]), .b(b[106]), .c(c[106]), .d(d[106]), .sel(sel));
xtmux4b i107(.xtout(xtout[107]), .a(a[107]), .b(b[107]), .c(c[107]), .d(d[107]), .sel(sel));
xtmux4b i108(.xtout(xtout[108]), .a(a[108]), .b(b[108]), .c(c[108]), .d(d[108]), .sel(sel));
xtmux4b i109(.xtout(xtout[109]), .a(a[109]), .b(b[109]), .c(c[109]), .d(d[109]), .sel(sel));
xtmux4b i110(.xtout(xtout[110]), .a(a[110]), .b(b[110]), .c(c[110]), .d(d[110]), .sel(sel));
xtmux4b i111(.xtout(xtout[111]), .a(a[111]), .b(b[111]), .c(c[111]), .d(d[111]), .sel(sel));
xtmux4b i112(.xtout(xtout[112]), .a(a[112]), .b(b[112]), .c(c[112]), .d(d[112]), .sel(sel));
xtmux4b i113(.xtout(xtout[113]), .a(a[113]), .b(b[113]), .c(c[113]), .d(d[113]), .sel(sel));
xtmux4b i114(.xtout(xtout[114]), .a(a[114]), .b(b[114]), .c(c[114]), .d(d[114]), .sel(sel));
xtmux4b i115(.xtout(xtout[115]), .a(a[115]), .b(b[115]), .c(c[115]), .d(d[115]), .sel(sel));
xtmux4b i116(.xtout(xtout[116]),
.b(b[116]
16] *d(d[116]), .sel (sel) ) ; xtmux4b 1117 . xtout d(d[117]), . sel sel) ) ; xtmux4b 1118 . tout d(d[118]), .sel (sel)); xtmux4b ill9 . xtout d(d[119]), .sel sel) ) ; xtmux4b il20 . xtout d(d[120]) , . sel 'sel)); xtmux4b il21 . xtout d(d[121]), .sel sel) ) ,- xtmux4b il22 .xtout d(d[122] ) , . sel sel) ) ; xtmux4b 1123 .xtout d(d[123]), . sel sel) ) ; xtmux4b H24 .xtout d(d[124]), .sel .sel)); xtmux4b il25 . xtout d(d[125]) , . sel sel)); xtmux4b il26 . xtout d(d[126]), .sel sel) ) ; xtmux4b il27 .xtout d(d[127]), . sel sel) ) ; xtmux4b il28 . xtout d(d[128]) , . sel (sel)); xtmux4b il29 . xtout d(d[129]), .sel (sel)); xtmux4b il30 ( .xtout d(d[130]), .sel (sel)); xtmux4b 1131 . xtout d(d[131] ) , . sel (sel)); xtmux4b 1132 ' . xtout d(d[132]) , .sel sel) ) ; xtmux4b H33 .xtout d(d[133]), .sel sel) ) ; xtmux4b il34 . xtout d(d[134]), .sel sel) ) ; xtmux4b il35 . xtout d(d[135]), . sel sel) ) ; xtmux4b il36 . tout d(d[136]) , . sel sel) ) ; xtmux4b il37 .xtout d(d[137]), .sel sel) ) ; xtmux4b il38 . xtout d(d[138]), .sel sel) ) ; xtmux4b il39 . xtout d(d[139]), .sel sel) ) ; xtmux4b il40 . tout d(d[140]), . sel sel) ) ; xtmux4b il41 .xtout d(d[141]), .sel sel)); xtmux4b il42 .xtout d(d[142]), .sel sel) ) ; xtmux4b il43 .xtout d(d[143]), .sel sel) ) ; xtmux4b il44 . tout d(d[144] ) , . sel sel) ) ; xtmux4b il45 . xtout d(d[145]), . sel sel) ) ; xtmux4b H46< . tout
d(d[146]) , . sel sel)); xtmux4b .d(d[147]) , xtmux4b .d(d[148]), xtmux4b
• d(d[149]), xtmux4b
.d(d[150]) , xtmux4b
• d(d[151]), xtmux4b .d(d[152]) , xtmux4b .d(d[153]), xtmux4b •d(d[154]), xtmux4b .d(d[155]) , xtmux4b .d(d[156]), xtmux4b •d(d[157]), xtmux4b •d(d[158]), xtmux4b
• d(d[159]), xtmux4b .d(d[160]), xtmux4b -d(d[161]), xtmux4b -d(d[162]), xtmux4b -d(d[163]), xtmux4b -d(d[164]), xtmux4b .d(d[165]), xtmux4b .d(d[166]), xtmux4b .d(d[167]), xtmux4b .d(d[168]), xtmux4b •d(d[169]), xtmux4b
• d(d[170]), xtmux4b .d(d[171]), xtmux4b .d(d[172]), xtmux4b .d(d[173]), xtmux4b .d(d[174]), xtmux4b •d(d[175]), xtmux4b
• d(d[176]),
xtmux4b .d(d[177]), xtmux4b •d(d[178]), xtmux4b
• d(d[179]), xtmux4b .d(d[180]), xtmux4b ■ d(d[181]), xtmux4b .d(d[182]), xtmux4b .d(d[183]), xtmux4b •d(d[184]), xtmux4b .d(d[lδ5] ) , xtmux4b .d(d[186]), xtmux4b •d(d[187]), xtmux4b .d(d[18δ]), xtmux4b
• d(d[189]), xtmux4b .d(d[190]), xtmux4b -d(d[191]), xtmux4b •d(d[192]), xtmux4b .d(d[193]), xtmux4b .d(d[194]), xtmux4b
• d(d[195]), xtmux4b •d(d[196]), xtmux4b •d(d[197]), xtmux4b
• d(d[198]), xtmux4b •d(d[199]), xtmux4b .d(d[200]) , xtmux4b .d(d[201]), xtmux4b .d(d[202]) , xtmux4b .d(d[203]) , xtmux4b .d(d[204]), xtmux4b .d(d[205]), xtmux4b
.d(d[206]),
xtmux4b i207 .xtout .d(d[207]) , . sel sel) ) ,- xtmux4b i20β . tout .d(d[208] ) , . sel sel) ) ; xtmux4b i209 .xtout .d(d[209]), . sel sel) ) ; xtmux4b i210 . tout .d(d[210]), . sel sel)); xtmux4b 1211 .xtout .d(d[211]), .sel sel) ) ; xtmux4b i212 .xtout .d(d[212]), . sel sel) ) ,- xtmux4b i213 . tout
• d(d[213]), . sel sel) ) ; xtmux4b i214 .xtout .d(d[2l4]), . sel sel) ) ; xtmux4b 1215 . tout .d(d[215]), . sel sel) ) ; xtmux4b i216 . xtout .d(d[216]), . sel sel)); xtmux4b i217 .xtout .d(d[2l7]), . sel sel) ) ; xtmux4b i21δ . tout .d(d[21β]), . sel sel)); xtmux4b 1219 . tout .d(d[219]), . sel sel) ) ; xtmux4b i220 .xtout .d(d[220] ) , . sel sel) ) ; xtmux4b 1221 . tout .d(d[221]), . sel sel)),- xtmux4b i222 .xtout .d(d[222] ) , . sel sel)); xtmux4b i223 -xtout .d(d[223]), . sel sel) ) ; xtmux4b i224 -xtout .d(d[224]), . sel sel) ) ; xtmux4b i225 .xtout .d(d[225]), . sel sel) ) ; xtmux4b i226 . tout .d(d[226] ) , . sel sel)); xtmux4b i227 . xtout .d(d[227]), .sel sel) ) ; xtmux4b i228 .xtout .d(d[22δ] ) , .sel sel) ) ; xtmux4b i229 .xtout .d(d[229]), . sel sel) ) ; xtmux4b i230 . xtout .d(d[230]), . sel sel) ) ; xtmux4b i231 . tout
• d(d[231]), . sel sel) ) ; xtmux4b i232 .xtout
.d(d[232]), .sel sel) ) ; xtmux4b i233 .xtout
• d(d[233]), . sel sel) ) ; xtmux4b i234 . xtout
■d(d[234]), . sel sel) ) ; xtmux4b i235 . xtout
• d(d[235]), . sel sel) ) ; xtmux4b i236 .xtout
• d(d[236]), . sel sel) ) ; xtmux4b 1237 .xtout
(d[237]), . sel (sel)); xtmux4b i238 .xtout
(d[238]), .sel [eel)) ; xtmux4b 1239 .xtout
(d[239]), .sel (sel) ) ; xtmux4b i240 ( .xtout
(d[240]), . sel [sel)); xtmux4b i241 . tout
(d[241]) , . sel (sel)),- xtmux4b 1242 .xtout
(d[242]), . sel sel) ) ; xtmux4b i243 .xtout
(d[243]) , .sel (sel)); xtmux4b i244 . xtout
(d[244]), . sel (sel)); xtmux4b i245 .xtout
(d[245]) , .sel (sel)) ; xtmux4b i246 ( . xtout
(d[246]), . sel 'sel)); xtmux4b i247 . xtout
(d[247]), .sel (sel) ) ,- xtmux4b i248 .xtout
(d[248]), .sel 'sel)) ; xtmux4b i249 .xtout
(d[249]) , .sel (sel) ) ; xtmux4b i250 .xtout
(d[250]) , .sel (sel)); xtmux4b i251 ' .xtout
(d[251]), . sel sel) ) ; xtmux4b i252 .xtout
(d[252]) , .sel (sel) ) ; xtmux4b i253 ' .xtout
(d[253]), .sel sel) ) ; xtmux4b i254 . tout
(d[254]), . sel sel)); xtmux4b i255 i .xtout
(d[255]), . sel 'sel)); xtmux4b i256 .xtout
(d[256]) , .sel sel) ) ; xtmux4b 1257 .xtout
(d[257]), .sel sel)); xtmux4b i25δ .xtout
(d[258]) , . sel sel) ) ; xtmux4b i259 . xtout
(d[259]), .sel sel) ) ; xtmux4b i260 . tout
(d[260]), .sel sel) ) ; xtmux4b 1261 . xtout
(d[261]), .sel sel) ) ,- xtmux4b i262 .xtout
(d[262]). . sel sel) ) ; xtmux4b i263 . xtout
(d[263])7 .sel sel) ) ; xtmux4b i264 ( .xtout
(d[264]) , .sel ( sel) ) ; xtmux4b i265 .xtout
(d[265]), .sel sel)); xtmux4b i266 .xtout
(d[266]) , .sel sel) ) ; xtmux4b .d(d[267]), xtmux4b .d(d[268] ) , xtmux4b .d(d[269] ) , xtmux4b .d(d[270]), xtmux4b .d(d[271]), xtmux4b .d(d[272]), xtmux4b
• d(d[273]), xtmux4b .d(d[274]), xtmux4b .d(d[275]), xtmux4b .d(d[276]), xtmux4b .d(d[277]) , xtmux4b .d(d[278]), xtmux4b
• d(d[279]), xtmux4b .d(d[280] ) , xtmux4b -d(d[281]), xtmux4b .d(d[282] ) , xtmux4b .d(d[283] ) , xtmux4b .d(d[284]) , xtmux4b .d(d[285] ) , xtmux4b .d(d[286] ) , xtmux4b .d(d[287]), xtmux4b .d(d[288] ) , xtmux4b .d(d[289]), xtmux4b .d(d[290]), xtmux4b
• d(d[291]), xtmux4b .d(d[292]), xtmux4b .d(d[293]) , xtmux4b
• d(d[294]), xtmux4b
• d(d[295]), xtmux4b
• d(d[296]),
xtmux4b
• d(d[297]), xtmux4b .d(d[29δ]) , xtmux4b .d(d[299]) , xtmux4b .d(d[300]) , xtmux4b .d(d[301]), xtmux4b .d(d[302]) , xtmux4b
• d(d[303]), xtmux4b
• d(d[304]), xtmux4b .d(d[305]) , xtmux4b .d(d[306] ) , xtmux4b
■ d(d[307]), xtmux4b .d(d[308] ) , xtmux4b .d(d[309]), xtmux4b
■ d(d[310]), xtmux4b .d(d[311]), xtmux4b .d(d[312]) , xtmux4b -d(d[313]), xtmux4b -d(d[314]), xtmux4b -d(d[315]), xtmux4b .d(d[3l6]), xtmux4b
• d(d[317]), xtmux4b
• d(d[318]), xtmux4b
• d(d[319]), xtmux4b .d(d[320] ) , xtmux4b .d(d[321]), xtmux4b .d(d[322]) , xtmux4b .d(d[323]), xtmux4b •d(d[324]), xtmux4b .d(d[325]), xtmux4b .d(d[326]),
xtmux4b *d(d[327]), xtmux4b .d(d[328]), xtmux4b *d(d[329]), xtmux4b .d(d[330]) , xtmux4b *d(d[331]), xtmux4b *d(d[332]), xtmux4b .d(d[333] ) , xtmux4b *d(d[334]), xtmux4b .d(d[335] ) , xtmux4b *d(d[336]), xtmux4b
• d(d[337]) , xtmux4b
.d(d[338]), xtmux4b
• d(d[339]), xtmux4b
■ d(d[340]), xtmux4b .d(d[341]), xtmux4b .d(d[342]), xtmux4b
• d(d[343]), xtmux4b .d(d[344]) , xtmux4b .d(d[345]), xtmux4b
• d(d[346]), xtmux4b
■ d(d[347]), xtmux4b
.d(d[348]), xtmux4b
• d(d[349]) , xtmux4b .d(d[350]), xtmux4b .d(d[351]), xtmux4b .d(d[352]), xtmux4b
■ d(d[353]) , xtmux4b .d(d[354]), xtmux4b .d(d[355] ) , xtmux4b
.d(d[356]),
xtmux4b
• d(d[357]), xtmux4b .d(d[358]), xtmux4b •d(d[359]), xtmux4b .d(d[360]), xtmux4b •d(d[361]), xtmux4b .d(d[362]), xtmux4b
■ d(d[363]), xtmux4b .d(d[364]), xtmux4b .d(d[365]), xtmux4b .d(d[366]), xtmux4b
• d(d[367]), xtmux4b .d(d[368]), xtmux4b .d(d[369]), xtmux4b
■ d(d[370]), xtmux4b .d(d[371]), xtmux4b .d(d[372]), xtmux4b
• d(d[373]), xtmux4b
• d(d[374]), xtmux4b
.d(d[375]), xtmux4b
• d(d[376]), xtmux4b
• d(d[377]), xtmux4b
• d(d[378]), xtmux4b
• d(d[379]), xtmux4b .d(d[380]), xtmux4b .d(d[381]), xtmux4b .d(d[382] ) , xtmux4b
• d(d[383]), xtmux4b
• d(d[3δ4]), xtmux4b
.d(d[3δ5]), xtmux4b
• d(d[386]),
xtmux4b (d[387]), xtmux4b (d[388]); xtmux4b (d[389]), xtmux4b (d[390]), xtmux4b (d[391]), xtmux4b
.d (d[392])( xtmux4b (d[393]), xtmux4b (d[394]); xtmux4b (d[395]); xtmux4b (d[396]), xtmux4b
.d (d[397]), xtmux4b (d[398]), xtmux4b (d[399]), xtmux4b (d[400]), xtmux4b (d[401]), xtmux4b (d[402]), xtmux4b (d[403]) , xtmux4b
.d (d[404]) , xtmux4b (d[405]), xtmux4b (d[406]), xtmux4b (d[407]), xtmux4b (d[408]), xtmux4b (d[409]), xtmux4b
,d (d[410]), xtmux4b (d[411]), xtmux4b (d[412]), xtmux4b (d[413]), xtmux4b (d[414]), xtmux4b (d[415]), xtmux4b
(d[416]),
xtmux4b (d[417]) , xtmux4b (d[418]), xtmux4b (d[419]) , xtmux4b (d[420]) , xtmux4b (d[421]), xtmux4b (d[422]), xtmux4b (d[423]), xtmux4b (d[424]), xtmux4b (d[425]), xtmux4b (d[426]) , xtmux4b (d[427]), xtmux4b (d[428]), xtmux4b (d[429]) , xtmux4b (d[430]) , xtmux4b (d[431]), xtmux4b (d[432]), xtmux4b (d[433]), xtmux4b (d[434]), xtmux4b (d[435]), xtmux4b (d[436]), xtmux4b (d[437]), xtmux4b (d[438]), xtmux4b (d[439]), xtmux4b (d[440]) , xtmux4b (d[441]), xtmux4b (d[442]), xtmux4b (d[443]), xtmux4b (d[444]), xtmux4b (d[445]), xtmux4b (d[446] ) ,
xtmux4b .b(b[447]
•d(d[447]), xtmux4b .b(b[448]
■ d(d[448]), xtmux4b .b(b[449]
• d(d[449]), xtmux4b .b(b[450]
• d(d[4503), xtmux4b .b(b[451] .d(d[451]), xtmux4b .b(b[452] .d(d[452]), xtmux4b .b(b[453]
• d(d[453]), xtmux4b .b(b[454]
• d(d[454]), xtmux4b .b(b[455] .d(d[455]), xtmux4b .b(b[456] .d(d[456]), xtmux4b .b(b[457] •d(d[457]), xtmux4b .b(b[458] .d(d[458]), xtmux4b .b(b[459]
• d(d[459]), xtmux4b .b(b[460] .d(d[460]) , xtmux4b .b(b[461] -d(d[461]), xtmux4b .b(b[462] -d(d[462]), xtmux4b .b(b[463] -d(d[463]), xtmux4b .b(b[464] -d(d[464]), xtmux4b .b(b[465]
• d(d[465]), xtmux4b .b(b[466]
.d(d[466]), xtmux4b .b(b[467]
• d(d[467]), xtmux4b .b(b[468]
.d(d[468]), xtmux4b .b(b[469]
• d(d[469]), xtmux4b .b(b[470] •d(d[470]), xtmux4b .b(b[471] .d(d[471]), xtmux4b .b(b[472]
• d(d[472]), xtmux4b .b(b[473]
• d(d[473]), xtmux4b .b(b[474] .d(d[474]), xtmux4b .b(b[475] .d(d[475]), xtmux4b .b(b[476]
.d(d[476]) ,
xtmux4b
.d(d[477]) , xtmux4b
• d(d[478]), xtmux4b
• d(d[479]), xtmux4b .d(d[480]), xtmux4b .d(d[481]) , xtmux4b .d(d[482]), xtmux4b .d(d[483]), xtmux4b •d(d[484]), xtmux4b .d(d[485]), xtmux4b .d(d[486]), xtmux4b •d(d[487]), xtmux4b .d(d[488]), xtmux4b
• d(d[489]), xtmux4b .d(d[490]), xtmux4b .d(d[491]), xtmux4b •d(d[492]), xtmux4b ■ d(d[493]), xtmux4b
• d(d[494]), xtmux4b
• d(d[495]), xtmux4b
• d(d[496]), xtmux4b
• d(d[497]), xtmux4b
• d(d[498]), xtmux4b .d(d[499]), xtmux4b .d(d[500]), xtmux4b .d(d[501]), xtmux4b .d(d[502] ) , xtmux4b .d(d[503]), xtmux4b .d(d[504]), xtmux4b .d(d[505]), xtmux4b
*d(d[506]),
xtmux4b 1507 .xtout
• d(d[507]), . sel sel) ) ; xtmux4b i508 .xtout
.d(d[508]), . sel sel)) ,- xtmux4b i509 .xtout
■ d(d[509]), . sel sel) ) ,- xtmux4b i510 .xtout
■ d(d[510]), . sel sel) ) ,- xtmux4b i511 .xtout
• d(d[511]), .sel sel) ) ; xtmux4b i512 . xtout .d(d[512]), . sel sel) ) ; xtmux4b i513 .xtout .d(d[513]), . sel sel) ) ; xtmux4b i514 . xtout •d(d[514]), . sel sel) ) ; xtmux4b i515 .xtout -d(d[515]), . sel sel) ) ,- xtmux4b i516 . xtout
• d(d[5163), . sel sel)); xtmux4b i517 .xtout
.d(d[517]), .sel sel)); xtmux4b i51β . xtout
■ d(d[518]), . sel sel) ) ; xtmux4b i519 .xtout •d(d[519]), . sel sel) ) ; xtmux4b i520 .xtout .d(d[520]), . sel sel) ) ; xtmux4b i521 .xtout .d(d[521]), .sel sel) ) ; xtmux4b i522 .xtout -d(d[522]), . sel sel) ) ; xtmux4b 1523 .xtout .d(d[523]), . sel sel) ) ; xtmux4b i524 .xtout .d(d[524]), . sel sel) ) ; xtmux4b i525 .xtout
• d(d[525]), . sel sel)); xtmux4b i526 .xtout .d(d[526]), . sel sel) ) ; xtmux4b i527 .xtout .d(d[527]), . sel sel) ) ; xtmux4b i528 .xtout .d(d[528]) , . sel sel) ) ; xtmux4b i529 .xtout .d(d[529]), . sel sel) ) ; xtmux4b i530 .xtout .d(d[530]), . sel sel) ) ; xtmux4b 1531 .xtout
• d(d[531]), . sel sel) ) ; xtmux4b i532 . tout *d(d[532]), . sel sel) ) ; xtmux4b i533 .xtout .d(d[533]), . sel sel) ) ; xtmux4b i534 .xtout *d(d[534]), . sel sel) ) ; xtmux4b i535 .xtout ■d(d[535]), .sel sel) ) ; xtmux4b i536 . tout
.d(d[536]), . sel sel) ) ; xtmux4b
• d(d[537]), xtmux4b .d(d[538] ) , xtmux4b .d(d[539]), xtmux4b .d(d[540]), xtmux4b
• d(d[541]), xtmux4b *d(d[542]), xtmux4b .d(d[543]), xtmux4b .d(d[544]), xtmux4b •d(d[545]), xtmux4b .d(d[546]), xtmux4b .d(d[547]), xtmux4b
• d(d[548]), xtmux4b
• d(d[549]), xtmux4b
.d(d[550] ) , xtmux4b
• d(d[551]), xtmux4b .d(d[552]) , xtmux4b .d(d[553] ) , xtmux4b .d(d[554]), xtmux4b .d(d[555]) , xtmux4b
• d(d[5563), xtmux4b .d(d[557]), xtmux4b .d(d[558]) , xtmux4b .d(d[559]), xtmux4b .d(d[560]), xtmux4b .d(d[561]), xtmux4b .d(d[562]), xtmux4b .d(d[563]) , xtmux4b .d(d[564]), xtmux4b .d(d[565]) , xtmux4b
,d(d[5663) ,
xtmux4b
• d(d[567]), xtmux4b
.d(d[568]) , xtmux4b
• d(d[569]), xtmux4b .d(d[570]), xtmux4b -d(d[571]), xtmux4b •d(d[572]), xtmux4b .d(d[573]), xtmux4b .d(d[574]), xtmux4b .d(d[575]), xtmux4b .d(d[576]) , xtmux4b
■ d(d[577]), xtmux4b
.d(d[578]), xtmux4b
• d(d[579]), xtmux4b
.d(d[580]), xtmux4b
■ d(d[581]) , xtmux4b .d(d[582]), xtmux4b .d(d[583]), xtmux4b .d(d[584]) , xtmux4b .d(d[585]), xtmux4b .d(d[5δ6]), xtmux4b
• d(d[587]) , xtmux4b .d(d[588] ) , xtmux4b .d(d[589]), xtmux4b
• d(d[590]), xtmux4b •d(d[591]), xtmux4b .d(d[592]), xtmux4b .d(d[593]), xtmux4b *d(d[594]), xtmux4b *d(d[595]), xtmux4b
,d(d[596]),
xtmux4b *d(d[5973), xtmux4b *d(d[59β]), xtmux4b *d(d[599]), xtmux4b .d(d[600] ) , xtmux4b .d(d[601]), xtmux4b .d(d[602] ) , xtmux4b .d(d[603]), xtmux4b .d(d[604]), xtmux4b .d(d[605]), xtmux4b .d(d[606]) , xtmux4b .d(d[607]), xtmux4b .d(d[608] ) , xtmux4b .d(d[609]), xtmux4b .d(d[610]), xtmux4b
• d(d[611]), xtmux4b
.d(d[612]), xtmux4b
• d(d[613]), xtmux4b
• d(d[614]), xtmux4b .d(d[615]), xtmux4b .d(d[616]), xtmux4b •d(d[617]), xtmux4b .d(d[618]), xtmux4b
• d(d[619]), xtmux4b .d(d[620]) , xtmux4b .d(d[621]), xtmux4b .d(d[622] ) , xtmux4b .d(d[623]), xtmux4b .d(d[624] ) , xtmux4b .d(d[625]), xtmux4b
.d(d[626]),
xtmux4b .b(b[627] .d(d[627]), xtmux4b .b(b[628] .d(d[628] ) , xtmux4b .b(b[629] .d(d[629]), xtmux4b .b(b[630] .d(d[630]), xtmux4b .b(b[631] .d(d[631]), xtmux4b .b(b[632] .d(d[632]), xtmux4b .b(b[633] •d(d[633]), xtmux4b .b(b[634]
• d(d[634]), xtmux4b .b(b[635] .d(d[635]), xtmux4b .b(b[636] .d(d[636]), xtmux4b .b(b[637]
• d(d[637]), xtmux4b .b(b[638] .d(d[638]), xtmux4b .b(b[639] -d(d[639]), xtmux4b .b(b[640] .d(d[640]), xtmux4b .b(b[641]
• d(d[641]), xtmux4b .b(b[642] -d(d[642]), xtmux4b .b(b[643] •d(d[643]), xtmux4b .b(b[644] .d(d[644]), xtmux4b .b(b[645]
• d(d[645]), xtmux4b .b(b[646] .d(d[646]), xtmux4b .b(b[647] •d(d[647]), xtmux4b -b(b[648] .d(d[648]), xtmux4b .b(b[649] •d(d[649]), xtmux4b .b(b[650] .d(d[650]), xtmux4b .b(b[651] .d(d[651]), xtmux4b .b(b[652] .d(d[652]), xtmux4b .b(b[653] •d(d[653]), xtmux4b .b(b[654] .d(d[654]), xtmux4b .b(b[655] ■d(d[655]), xtmux4b .b(b[656]
.d(d[656]),
xtmux4b .d(d[657]) , xtmux4b .d(d[658]), xtmux4b .d(d[659]), xtmux4b .d(d[660] ) , xtmux4b .d(d[661]) , xtmux4b .d(d[662] ) , xtmux4b .d(d[663]), xtmux4b .d(d[664]) , xtmux4b .d(d[665]) , xtmux4b .d(d[666]) , xtmux4b .d(d[667]) , xtmux4b .d(d[66δ] ) , xtmux4b .d(d[6693), xtmux4b .d(d[670]), xtmux4b .d(d[671]) , xtmux4b .d(d[672]) , xtmux4b .d(d[673]), xtmux4b .d(d[674]) , xtmux4b • d(d[675]), xtmux4b .d(d[676]), xtmux4b •d(d[677]), xtmux4b .d(d[67δ]) , xtmux4b .d(d[679]), xtmux4b .d(d[6δ0] ) , xtmux4b .d(d[681]), xtmux4b .d(d[682] ) , xtmux4b .d(d[683]) , xtmux4b •d(d[684]), xtmux4b *d(d[685]) , xtmux4b .d(d[686]) ,
xtmux4b ±687 . tout d(d[687]), . sel sel) ) ,- xtmux4b i688 . xtout d(d[688] ) , .sel sel) ) ; xtmux4b i689 . xtout d(d[669]), .sel sel) ) ; xtmux4b i690 .xtout d(d[690]), .sel .sel)) ; xtmux4b i691 . xtout d(d[691]), . sel .sel)) ; xtmux4b i692 . xtout d(d[692]), . sel (sel) ) ; xtmux4b i693 .xtout d(d[693]), . sel sel) ) ; xtmux4b i694 .xtout d(d[694]), .sel sel) ) ; xtmux4b i695 . tout d(d[695]), .sel sel) ) ; xtmux4b i696 . xtout d(d[696]), .sel sel)); xtmux4b i697 . xtout d(d[697]), . sel sel) ) ; xtmux4b i698 . xtout d(d[698]), . sel sel) ) ; xtmux4b i699 .xtout d(d[699]), . sel sel) ) ; xtmux4b i700 .xtout d(d[700]), .sel sel)) ,- xtmux4b i701 .xtout d(d[701]), .sel sel) ) ; xtmux4b i702 .xtout d(d[702]), .sel .sel) ) ; xtmux4b 1703 . xtout d(d[703] ) , .sel sel)); xtmux4b 1704 . xtout d(d[704]), .sel (sel)); xtmux4b i705 . xtout d(d[705]), .sel sel) ) ; xtmux4b 1706 . xtout d(d[706]), .sel sel) ) ; xtmux4b i707 . xtout d(d[707]), .sel sel)); xtmux4b i708 . xtout d(d[708]), .sel sel)); xtmux4b i709 . xtout d(d[709]), .sel sel)); xtmux4b i710 . xtout d(d[710]), . sel sel) ) ; xtmux4b i711 . xtout d(d[711]), . sel sel) ) ; xtmux4b i712 . xtout d(d[712]), . sel sel) ) ; xtmux4b i713 . xtout d(d[713]), .sel sel) ) ; xtmux4b i714 . xtout d(d[714]), .sel sel) ) ; xtmux4b i715( . xtout d(d[715]), .sel sel)); xtmux4b i716( . xtout
Figure imgf000236_0001
d(d[716]), .sel sel) ) ; xtmux4b
.d(d[717]), xtmux4b
• d(d[718]), xtmux4b
• d(d[719]), xtmux4b .d(d[720]) , xtmux4b .d(d[721]) , xtmux4b .d(d[722]), xtmux4b •d(d[723]), xtmux4b .d(d[724]) , xtmux4b
• d(d[725]), xtmux4b .d(d[726]), xtmux4b .d(d[727]), xtmux4b .d(d[728]), xtmux4b .d(d[729]), xtmux4b .d(d[730]) , xtmux4b -d(d[731]), xtmux4b
• d(d[732]), xtmux4b
• d(d[733]), xtmux4b
• d(d[734]), xtmux4b
•d(d[735]), xtmux4b
• d(d[736]), xtmux4b
• d(d[737]), xtmux4b
• d(d[738]), xtmux4b
• d(d[739]), xtmux4b
• d(d[740]), xtmux4b
• d(d[741]), xtmux4b
• d(d[742]), xtmux4b .d(d[743]), xtmux4b •d(d[744]), xtmux4b •d(d[745]), xtmux4b
Figure imgf000237_0002
• d(d[746]),
Figure imgf000237_0001
xtmux4b ■ d(d[747]), xtmux4b .d(d[748]), xtmux4b
• d(d[749]), xtmux4b
• d(d[750]), xtmux4b -d(d[751]), xtmux4b -d(d[752]), xtmux4b -d(d[753]), xtmux4b -d(d[754]), xtmux4b
• d(d[755]), xtmux4b .d(d[756]), xtmux4b -d(d[757]), xtmux4b .d(d[758]) , xtmux4b
• d(d[759]), xtmux4b
.d(d[760]), xtmux4b
• d(d[761]), xtmux4b
.d(d[762]), xtmux4b
• d(d[763]), xtmux4b •d(d[764]), xtmux4b ■d(d[765]), xtmux4b .d(d[766]), xtmux4b
• d(d[767]), xtmux4b .d(d[768]), xtmux4b *d(d[769]), xtmux4b •d(d[770]), xtmux4b
• d(d[771]), xtmux4b
■d(d[772]), xtmux4b
• d(d[773]), xtmux4b
• d(d[774]), xtmux4b
*d(d[775]), xtmux4b
Figure imgf000238_0002
• d(d[776]),
Figure imgf000238_0001
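xtmux4b i777(.xtout(xtout[777]), .a(a[777]), .b(b[777]), .c(c[777]), .d(d[777]), .sel(sel));
xtmux4b i778(.xtout(xtout[778]), .a(a[778]), .b(b[778]), .c(c[778]), .d(d[778]), .sel(sel));
xtmux4b i779(.xtout(xtout[779]), .a(a[779]), .b(b[779]), .c(c[779]), .d(d[779]), .sel(sel));
xtmux4b i780(.xtout(xtout[780]), .a(a[780]), .b(b[780]), .c(c[780]), .d(d[780]), .sel(sel));
xtmux4b i781(.xtout(xtout[781]), .a(a[781]), .b(b[781]), .c(c[781]), .d(d[781]), .sel(sel));
xtmux4b i782(.xtout(xtout[782]), .a(a[782]), .b(b[782]), .c(c[782]), .d(d[782]), .sel(sel));
xtmux4b i783(.xtout(xtout[783]), .a(a[783]), .b(b[783]), .c(c[783]), .d(d[783]), .sel(sel));
xtmux4b i784(.xtout(xtout[784]), .a(a[784]), .b(b[784]), .c(c[784]), .d(d[784]), .sel(sel));
xtmux4b i785(.xtout(xtout[785]), .a(a[785]), .b(b[785]), .c(c[785]), .d(d[785]), .sel(sel));
xtmux4b i786(.xtout(xtout[786]), .a(a[786]), .b(b[786]), .c(c[786]), .d(d[786]), .sel(sel));
xtmux4b i787(.xtout(xtout[787]), .a(a[787]), .b(b[787]), .c(c[787]), .d(d[787]), .sel(sel));
xtmux4b i788(.xtout(xtout[788]), .a(a[788]), .b(b[788]), .c(c[788]), .d(d[788]), .sel(sel));
xtmux4b i789(.xtout(xtout[789]), .a(a[789]), .b(b[789]), .c(c[789]), .d(d[789]), .sel(sel));
xtmux4b i790(.xtout(xtout[790]), .a(a[790]), .b(b[790]), .c(c[790]), .d(d[790]), .sel(sel));
xtmux4b i791(.xtout(xtout[791]), .a(a[791]), .b(b[791]), .c(c[791]), .d(d[791]), .sel(sel));
xtmux4b i792(.xtout(xtout[792]), .a(a[792]), .b(b[792]), .c(c[792]), .d(d[792]), .sel(sel));
xtmux4b i793(.xtout(xtout[793]), .a(a[793]), .b(b[793]), .c(c[793]), .d(d[793]), .sel(sel));
xtmux4b i794(.xtout(xtout[794]), .a(a[794]), .b(b[794]), .c(c[794]), .d(d[794]), .sel(sel));
xtmux4b i795(.xtout(xtout[795]), .a(a[795]), .b(b[795]), .c(c[795]), .d(d[795]), .sel(sel));
xtmux4b i796(.xtout(xtout[796]), .a(a[796]), .b(b[796]), .c(c[796]), .d(d[796]), .sel(sel));
xtmux4b i797(.xtout(xtout[797]), .a(a[797]), .b(b[797]), .c(c[797]), .d(d[797]), .sel(sel));
xtmux4b i798(.xtout(xtout[798]), .a(a[798]), .b(b[798]), .c(c[798]), .d(d[798]), .sel(sel));
xtmux4b i799(.xtout(xtout[799]), .a(a[799]), .b(b[799]), .c(c[799]), .d(d[799]), .sel(sel));
xtmux4b i800(.xtout(xtout[800]), .a(a[800]), .b(b[800]), .c(c[800]), .d(d[800]), .sel(sel));
xtmux4b i801(.xtout(xtout[801]), .a(a[801]), .b(b[801]), .c(c[801]), .d(d[801]), .sel(sel));
xtmux4b i802(.xtout(xtout[802]), .a(a[802]), .b(b[802]), .c(c[802]), .d(d[802]), .sel(sel));
xtmux4b i803(.xtout(xtout[803]), .a(a[803]), .b(b[803]), .c(c[803]), .d(d[803]), .sel(sel));
xtmux4b i804(.xtout(xtout[804]), .a(a[804]), .b(b[804]), .c(c[804]), .d(d[804]), .sel(sel));
xtmux4b i805(.xtout(xtout[805]), .a(a[805]), .b(b[805]), .c(c[805]), .d(d[805]), .sel(sel));
xtmux4b i806(.xtout(xtout[806]), .a(a[806]), .b(b[806]), .c(c[806]),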
Figure imgf000239_0001
Figure imgf000239_0002
.d(d[δ06]), .sel sel) ) ; xtmux4b i807 . tout
,d(d [807]), . sel sel) ) ; xtmux4b i808 .xtout
.d(d [808]), .sel sel) ) ; xtmux4b i809 .xtout .d(d [809]) , .sel sel) ) ; xtmux4b 1810 .xtout .d(d [810]), . sel sel)); xtmux4b i811 .xtout .d(d [δll]), .sel sel) ) ; xtmux4b 1812 . xtout .d(d [δl2]), . sel sel) ) ; xtmux4b i813 .xtout .d(d [δl3]), .sel sel) ) ; xtmux4b i814 .xtout .d(d [814]) , .sel sel) ) ; xtmux4b i815 .xtout .d(d [815]), . sel sel)),- xtmux4b 1816 .xtout .d(d [816]), .sel sel) ) ; xtmux4b i817 .xtout .d(d [817]) , . sel sel) ) ; xtmux4b i818 . tout .d(d [818]), . sel sel) ) ; xtmux4b 1819 .xtout .d(d [819]), .sel sel) ) ; xtmux4b i820 . tout .d(d [820]) , . sel sel) ) ; xtmux4b i821 .xtout .d(d [821]), . sel sel)); xtmux4b i822 .xtout .d(d [822]), .sel sel) ) ,- xtmux4b 1823 .xtout .d(d [823]), . sel sel) ) ; xtmux4b 1824 . xtout .d(d [824]) , . sel sel) ) ; xtmux4b i825 .xtout .d(d [825]) , . sel sel)); xtmux4b i826 .xtout .d(d [826]), . sel sel) ) ,- xtmux4b i827 .xtout .d(d [827]), . sel sel) ) ; xtmux4b i828 .xtout .d(d [82δ]), . sel sel) ) ; xtmux4b i829 . xtout d(d [829]), . sel sel) ) ; xtmux4b i830 .xtout d(d [830]) , . sel sel) ) ; xtmux4b i831 . tout d(d [831]), .sel sel)); xtmux4b i832 .xtout d(d [832]), . sel sel) ) ; xtmux4b i833 . xtout d(d [833]), . sel sel) ) ; xtmux4b i834 .xtout d(d [834]), . sel sel)); xtmux4b i835 .xtout d(d [835]), . sel sel) ) ; xtmux4b i836 .xtout
Figure imgf000240_0001
d(d [836]), . sel sel) ) ; 240 xtmux4b .d(d[867]) , xtmux4b .d(d[868] ) , xtmux4b .d(d[869]), xtmux4b .d(d[δ70]), xtmux4b .d(d[871]), xtmux4b .d(d[8723), xtmux4b •d(d[8733), xtmux4b
• d(d[8743), xtmux4b .d(d[δ753), xtmux4b .d(d[δ76]), xtmux4b •d(d[877]), xtmux4b .d(d[878]), xtmux4b -d(d[879]), xtmux4b .d(d[880] ) , xtmux4b .d(d[881]), xtmux4b .d(d[882] ) , xtmux4b .d(d[883]), xtmux4b .d(d[884]), xtmux4b .d(d[δδ53), xtmux4b .d(d[886] ) , xtmux4b .d(d[8873), xtmux4b .d(d[888] ) , xtmux4b .d(d[8δ9]), xtmux4b .d(d[δ90]), xtmux4b *d(d[891]), xtmux4b .d(d[8923), xtmux4b *d(d[893]), xtmux4b
• d(d[894]), xtmux4b
• d(d[δ95]), xtmux4b
Figure imgf000241_0002
*d(d[8963),
Figure imgf000241_0001
xtmux4b i837(.xtout(xtout[837]), .a(a[837]), .b(b[837]), .c(c[837]),
• d(d[837]), . sel sel)); xtmux4b i838 . tout .d(d[838]), .sel sel) ) ; xtmux4b i839 . xtout .d(d[839]), .sel sel) ) ; xtmux4b i840 .xtout .d(d[840]), .sel sel) ) ; xtmux4b i841 .xtout .d(d[841]), . sel sel) ) ; xtmux4b i842 . xtout .d(d[δ42]), . sel sel) ) ; xtmux4b i843 .xtout .d(d[843]) , . sel sel) ) ; xtmux4b i844 . xtout -d(d[844]), . sel sel) ) ; xtmux4b i845 . xtout .d(d[845]), .sel sel) ) ; xtmux4b i846 . xtout .d(d[846]), . sel sel) ) ; xtmux4b i847 . tout -d(d[847]), . sel sel) ) ; xtmux4b i848 . xtout .d(d[848]), .sel sel) ) ; xtmux4b i849 . tout
• d(d[849]), . sel sel) ) ; xtmux4b i850 .xtout .d(d[850]), .sel sel) ) ; xtmux4b i851 . tout -d(d[δ51]), . sel sel) ) ; xtmux4b 1852 . xtout -d(d[δ52]), .sel sel) ) ; xtmux4b i853 . tout -d(d[853]), . sel sel) ) ; xtmux4b 1854 .xtout .d(d[854]) , . sel sel)); xtmux4b i855 .xtout -d(d[8553), . sel sel) ) ; xtmux4b 1856 . tout -d(d[856]), . sel sel) ) ; xtmux4b i857 . xtout .d(d[857]), .sel sel) ) ; xtmux4b 1858 . tout .d(d[858])f . sel sel) ) ; xtmux4b i859 .xtout
• d(d[δ59]), .sel sel) ) ; xtmux4b i860 . xtout .d(d[860] ) , . sel sel) ) ; xtmux4b i861 .xtout .d(d[861]), .sel sel) ) ; xtmux4b i862 . tout .d(d[862]) , . sel sel) ) ; xtmux4b i863 . tout .d(d[863]) , . sel sel) ) ; xtmux4b i864 . xtout .d(d[864]), .sel sel) ) ; xtmux4b i865 . xtout
• d(d[865]), . sel sel) ) ; xtmux4b i866 . xtout
Figure imgf000242_0001
Figure imgf000242_0002
.d(d[866]), . sel sel) ) ,- xtmux4b *d(d[8973), xtmux4b .d(d[8983), xtmux4b •d(d[δ993), xtmux4b .d(d[9003), xtmux4b *d(d[90l3), xtmux4b .d(d[902]) , xtmux4b •d(d[903]), xtmux4b •d(d[9043), xtmux4b
• d(d[9053), xtmux4b .d(d[906]), xtmux4b •d(d[9073), xtmux4b .d(d[908]), xtmux4b ■ d(d[9093), xtmux4b .d(d[910]), xtmux4b .d(d[911]), xtmux4b -d(d[912]), xtmux4b •d(d[913]), xtmux4b .d(d[9143), xtmux4b
• d(d[9153), xtmux4b .d(d[916]), xtmux4b .d(d[917]), xtmux4b
• d(d[918]), xtmux4b
• d(d[919]), xtmux4b .d(d[920]), xtmux4b *d(d[921]), xtmux4b .d(d[922]), xtmux4b *d(d[923]), xtmux4b .d(d[924]), xtmux4b .d(d[925]), xtmux4b
Figure imgf000243_0002
.d(d[926]),
Figure imgf000243_0001
xtmux4b .b(b[927] .c [927]
.d(d[927]) , xtmux4b .b(b[928] i [928]
• d(d[928]), xtmux4b .b(b[929] [929]
• d(d[929]) , xtmux4b .b(b[930] :[930] .d(d[930]), xtmux4b .b(b[931] :[931] ■ d(d[931]), xtmux4b .b(b[932] :[932] .d(d[932]), xtmux4b .b(b[9333 :[933] .d(d[933]) , xtmux4b .b(b[934] ![934] •d(d[934]), xtmux4b .b(b[9353 :[935] .d(d[935] ) , xtmux4b .b(b[936] :[936] .d(d[936]), xtmux4b .b(b[937] :[937] •d(d[937]), xtmux4b .b(b[938] :[938] .d(d[938]), xtmux4b .b(b[939] ![9393
• d(d[939]), xtmux4b .b(b[940] :[940]
•d(d[940]), xtmux4b .b(b[941] :[9413
• d(d[941]) , xtmux4b .b(b[942] ![942] .d(d[942]) , xtmux4b .b(b[943] :[943] .d(d[943]) , xtmux4b .b(b[944] ![944] .d(d[944] ) , xtmux4b .b(b[945] :[945] .d(d[945]), xtmux4b .b(b[946] :[946] .d(d[946]), xtmux4b .b(b[947] :[947]
• d(d[947]), xtmux4b .b(b[948] :[948]
• d(d[948]), xtmux4b .b(b[949] :[949]
• d(d[949]), xtmux4b .b(b[950] [950] .d(d[950]), xtmux4b .b(b[951] :[951] •d(d[951]), xtmux4b .b(b[952] [952] .d(d[952]) , xtmux4b .b(b[953] :[953] •d(d[953]), xtmux4b .b(b[954] [954] *d(d[954]), xtmux4b .b(b[955] .c [955] .d(d[955]) , xtmux4b .b(b[956] :[956] .d(d[956]),
Figure imgf000244_0001
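xtmux4b i957(.xtout(xtout[957]), .a(a[957]), .b(b[957]), .c(c[957]), .d(d[957]), .sel(sel));
xtmux4b i958(.xtout(xtout[958]), .a(a[958]), .b(b[958]), .c(c[958]), .d(d[958]), .sel(sel));
xtmux4b i959(.xtout(xtout[959]), .a(a[959]), .b(b[959]), .c(c[959]), .d(d[959]), .sel(sel));
xtmux4b i960(.xtout(xtout[960]), .a(a[960]), .b(b[960]), .c(c[960]), .d(d[960]), .sel(sel));
xtmux4b i961(.xtout(xtout[961]), .a(a[961]), .b(b[961]), .c(c[961]), .d(d[961]), .sel(sel));
xtmux4b i962(.xtout(xtout[962]), .a(a[962]), .b(b[962]), .c(c[962]), .d(d[962]), .sel(sel));
xtmux4b i963(.xtout(xtout[963]), .a(a[963]), .b(b[963]), .c(c[963]), .d(d[963]), .sel(sel));
xtmux4b i964(.xtout(xtout[964]), .a(a[964]), .b(b[964]), .c(c[964]), .d(d[964]), .sel(sel));
xtmux4b i965(.xtout(xtout[965]), .a(a[965]), .b(b[965]), .c(c[965]), .d(d[965]), .sel(sel));
xtmux4b i966(.xtout(xtout[966]), .a(a[966]), .b(b[966]), .c(c[966]), .d(d[966]), .sel(sel));
xtmux4b i967(.xtout(xtout[967]), .a(a[967]), .b(b[967]), .c(c[967]), .d(d[967]), .sel(sel));
xtmux4b i968(.xtout(xtout[968]), .a(a[968]), .b(b[968]), .c(c[968]), .d(d[968]), .sel(sel));
xtmux4b i969(.xtout(xtout[969]), .a(a[969]), .b(b[969]), .c(c[969]), .d(d[969]), .sel(sel));
xtmux4b i970(.xtout(xtout[970]), .a(a[970]), .b(b[970]), .c(c[970]), .d(d[970]), .sel(sel));
xtmux4b i971(.xtout(xtout[971]), .a(a[971]), .b(b[971]), .c(c[971]), .d(d[971]), .sel(sel));
xtmux4b i972(.xtout(xtout[972]), .a(a[972]), .b(b[972]), .c(c[972]), .d(d[972]), .sel(sel));
xtmux4b i973(.xtout(xtout[973]), .a(a[973]), .b(b[973]), .c(c[973]), .d(d[973]), .sel(sel));
xtmux4b i974(.xtout(xtout[974]), .a(a[974]), .b(b[974]), .c(c[974]), .d(d[974]), .sel(sel));
xtmux4b i975(.xtout(xtout[975]), .a(a[975]), .b(b[975]), .c(c[975]), .d(d[975]), .sel(sel));
xtmux4b i976(.xtout(xtout[976]), .a(a[976]), .b(b[976]), .c(c[976]), .d(d[976]), .sel(sel));
xtmux4b i977(.xtout(xtout[977]), .a(a[977]), .b(b[977]), .c(c[977]), .d(d[977]), .sel(sel));
xtmux4b i978(.xtout(xtout[978]), .a(a[978]), .b(b[978]), .c(c[978]), .d(d[978]), .sel(sel));
xtmux4b i979(.xtout(xtout[979]), .a(a[979]), .b(b[979]), .c(c[979]), .d(d[979]), .sel(sel));
xtmux4b i980(.xtout(xtout[980]), .a(a[980]), .b(b[980]), .c(c[980]), .d(d[980]), .sel(sel));
xtmux4b i981(.xtout(xtout[981]), .a(a[981]), .b(b[981]), .c(c[981]), .d(d[981]), .sel(sel));
xtmux4b i982(.xtout(xtout[982]), .a(a[982]), .b(b[982]), .c(c[982]), .d(d[982]), .sel(sel));
xtmux4b i983(.xtout(xtout[983]), .a(a[983]), .b(b[983]), .c(c[983]), .d(d[983]), .sel(sel));
xtmux4b i984(.xtout(xtout[984]), .a(a[984]), .b(b[984]), .c(c[984]), .d(d[984]), .sel(sel));
xtmux4b i985(.xtout(xtout[985]), .a(a[985]), .b(b[985]), .c(c[985]), .d(d[985]), .sel(sel));
xtmux4b i986(.xtout(xtout[986]), .a(a[986]), .b(b[986]), .c(c[986]),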
Figure imgf000245_0001
Figure imgf000245_0002
d(d[986]), .sel sel) ) ; [987] [988] [989] [990] [991] [992] [993] c[994] c[995] c[996] c[997] c [998] c[999]
Figure imgf000246_0001
xtmux4b i1017(.xtout(xtout[1017]), .a(a[1017]), .b(b[1017]), .c(c[1017]), .d(d[1017]), .sel(sel));
xtmux4b i1018(.xtout(xtout[1018]), .a(a[1018]), .b(b[1018]), .c(c[1018]), .d(d[1018]), .sel(sel));
xtmux4b i1019(.xtout(xtout[1019]), .a(a[1019]), .b(b[1019]), .c(c[1019]), .d(d[1019]), .sel(sel));
xtmux4b i1020(.xtout(xtout[1020]), .a(a[1020]), .b(b[1020]), .c(c[1020]), .d(d[1020]), .sel(sel));
xtmux4b i1021(.xtout(xtout[1021]), .a(a[1021]), .b(b[1021]), .c(c[1021]), .d(d[1021]), .sel(sel));
xtmux4b i1022(.xtout(xtout[1022]), .a(a[1022]), .b(b[1022]), .c(c[1022]), .d(d[1022]), .sel(sel));
xtmux4b i1023(.xtout(xtout[1023]), .a(a[1023]), .b(b[1023]), .c(c[1023]), .d(d[1023]), .sel(sel));
endmodule

module xtcsa_1024(sum, carry, a, b, c);
output [1023:0] sum, carry;
input [1023:0] a, b, c;
xtfa i0(.sum(sum[0]), .carry(carry[0]), .a(a[0]), .b(b[0]), .c(c[0]));
xtfa i1(.sum(sum[1]), .carry(carry[1]), .a(a[1]), .b(b[1]), .c(c[1]));
xtfa i2(.sum(sum[2]), .carry(carry[2]), .a(a[2]), .b(b[2]), .c(c[2]));
xtfa i3(.sum(sum[3]), .carry(carry[3]), .a(a[3]), .b(b[3]), .c(c[3]));
xtfa i4(.sum(sum[4]), .carry(carry[4]), .a(a[4]), .b(b[4]), .c(c[4]));
xtfa i5(.sum(sum[5]), .carry(carry[5]), .a(a[5]), .b(b[5]), .c(c[5]));
xtfa i6(.sum(sum[6]), .carry(carry[6]), .a(a[6]), .b(b[6]), .c(c[6]));
xtfa i7(.sum(sum[7]), .carry(carry[7]), .a(a[7]), .b(b[7]), .c(c[7]));
xtfa i8(.sum(sum[8]), .carry(carry[8]), .a(a[8]), .b(b[8]), .c(c[8]));
xtfa i9(.sum(sum[9]), .carry(carry[9]), .a(a[9]), .b(b[9]), .c(c[9]));
xtfa i10(.sum(sum[10]), .carry(carry[10]), .a(a[10]), .b(b[10]), .c(c[10]));
xtfa i11(.sum(sum[11]), .carry(carry[11]), .a(a[11]), .b(b[11]), .c(c[11]));
xtfa i12(.sum(sum[12]), .carry(carry[12]), .a(a[12]), .b(b[12]), .c(c[12]));
xtfa i13(.sum(sum[13]), .carry(carry[13]), .a(a[13]), .b(b[13]), .c(c[13]));
xtfa i14(.sum(sum[14]), .carry(carry[14]), .a(a[14]), .b(b[14]), .c(c[14]));
xtfa i15(.sum(sum[15]), .carry(carry[15]), .a(a[15]), .b(b[15]), .c(c[15]));
xtfa i16(.sum(sum[16]), .carry(carry[16]), .a(a[16]), .b(b[16]), .c(c[16]));
xtfa i17(.sum(sum[17]), .carry(carry[17]), .a(a[17]), .b(b[17]), .c(c[17]));
xtfa i18(.sum(sum[18]), .carry(carry[18]), .a(a[18]), .b(b[18]), .c(c[18]));
xtfa i19(.sum(sum[19]), .carry(carry[19]), .a(a[19]), .b(b[19]), .c(c[19]));
xtfa i20(.sum(sum[20]), .carry(carry[20]), .a(a[20]), .b(b[20]), .c(c[20]));
xtfa i21(.sum(sum[21]), .carry(carry[21]), .a(a[21]), .b(b[21]), .c(c[21]));
xtfa i22(.sum(sum[22]), .carry(carry[22]), .a(a[22]), .b(b[22]), .c(c[22]));
xtfa i23(.sum(sum[23]), .carry(carry[23]), .a(a[23]), .b(b[23]), .c(c[23]));
xtfa i24(.sum(sum[24]), .carry(carry[24]), .a(a[24]), .b(b[24]), .c(c[24]));
xtfa i25(.sum(sum[25]), .carry(carry[25]), .a(a[25]), .b(b[25]),
Figure imgf000247_0001
.c(c[25])) ; . carry (carry [26] [26] .b(b[26]) , . carry (carry [27] [27] .b(b[27]) , . carry (carry [28] [28] .b(b[28]) , . carry (carry [29] [29] .b(b[29]) , . carry (carry [30] [30] .b(b[30] ) , . carry (carry [31] [31] ■ b(b[31]) , . carry(carry [32] .[32] .b(b[32]) , . carry (carry [33] [33] .b(b[33]) , . carry (carry [34] .[34] .b(b[34]). . carry (carry [35] [35] .b(b[35]) , . carry (carry [36] .[36] .b(b[36]) , . carry (carry [37] .a [37] ,b(b[37]) , . carry (carry [38] .[38] .b(b[38] ) , . carry(carry [39] .[39] .b(b[39] ) , . carry (carry [40] .[40] .b(b[40]) , . carry (carry [41] .[41] .b(b[41] ) , . carry (carry [42] .[42] .b(b[423) , . carry (carry [43] .[43] .b(b[43]) , . carry (carry [44] .[44] .b(b[44]) , . carry(carry [45] [45] .b(b[45] ) , . carry (carry [46] .[46] .b(b[46]) , . carry (carry [47] .[47] .b(b[47] ) , . carry (carry [48] .[48] .b(b[48]) , . carry (carry [49] .[49] .b(b[49] ) , . carry (carry [50] .[50] .b(b[50]) , . carry (carry [51] .[51] .b(b[51]) , . carry (carry [52] [52] .b(b[523) , .carry (carry [53] .[53] .b(b[53]) , . carry (carry [54] .[54] .b(b[54]) , . carry (carry [55] .[55] .b(b[55] ) ,
Figure imgf000248_0001
[56] [56] .b(b[56]) , [57] [57] .b(b[57]) , [58] .[58] .b(b[58]) , [59] [59] .b(b[593) , [60] [60] .b(b[60]) , [613 [61] .b(b[61]) , [62] [62] .b(b[62]) , [63] .[63] .b(b[633) , [64] .[64] .b(b[64]) , [65] .[65] .b(b[65]) , [66] [66] .b(b[663) , [67] .a .[67] .b(b[67]) , [68] .[68] .b(b[68l) , [69] .[69] .b(b[69]) , [70] [70] .b(b[703) , [71] .[71] .b(b[71]) , [72] [72] .b(b[723) , [73] .[73] .b(b[73]) , [74] [74] .b(b[74]) , [75] [75] .b(b[753) , [76] .a [76] .b(b[76]) , [77] [77] .b(b[773) , [78] [78] .b(b[78]) , [79] [79] .b(b[79]) , [80] [80] .b(b[δ0]) , [81] [81] .b(b[81]) , [82] [82] .b(b[82]) , [83] [83] .b(b[83]) , [84] [84] .b(b[84]) ,
Figure imgf000249_0002
[85] [85] .b(b[85]) ,
Figure imgf000249_0001
sum (sum [δ6] . carry (carry [86] .a(a[8 sum (sum [87] . carry (carry [87] .a(a[87 sum (sum [88] . carry (carry [88] .a(a[88 sum (sum [89] . carry (carry [89] • a(a[8 sum (sum [90] . carry (carry [90] .a(a[9 sum (sum [91] . carry (carry [91] .a(a[9
-C sum (sum [92] . carry (carry [92] .a(a[92 sum (sum [93] . carry (carry [93] .a(a[93 sum (sum [94] . carry (carry [94] .a(a[94 sum (sum [95] . carry (carry [95] .a(a[95 sum (sum [96] . carry (carry [96] .a(a[96 sum (sum [97] . carry (carry [97] .a(a[97 sum (sum [98] . carry (carry [98] .a(a[98 sum (sum [99] . carry (carry [99] .a(a[9 .sum (sum [100 , .carry (carry [100] ) 1003 . sum (sum [101 , .carry (carry [101] ) 01] .sum (sum [102 , .carry (carry [102] ) 102] .sum (sum [103 , .carry (carry [103] ) 03] .sum (sum [104 , .carry (carry [104] ) 04] . sum (sum [105 , .carry (carry [105] ) 05] . sum (sum [106 , .carry (carry [106] ) 06] .sum (sum [107 , .carry (carry [107] ) 07] .sum (sum [108 , .carry (carry [108] ) 08] . sum (sum [109 , .carry (carry [109] ) 09] .sum (su [110 , .carry (carry [110] ) 10] .sum (sum [111 , .carry (carry [111] ) 11] .su (sum [112 , .carry (carr [112] ) 12] .sum (sum [113 , .carry (carry [113] ) 13] .sum (sum [114 , .carry (carry [114] ) 14] .sum (sum [115 , . carry (carry [115] )
Figure imgf000250_0002
15]
Figure imgf000250_0001
-C
.c
Figure imgf000251_0001
Figure imgf000251_0002
C[145])) ; xtfa 1146 . carry (carry [146] :(c[146])); xtfa il47 . carry (carry [147] :(c[147])) ; xtfa il48 . carry (carry [148] :(C[148])); xtfa il49 . carry (carry [149] :(C[149])); xtfa 1150 . carry (carry [150] :(c[150])) ; xtfa il51 . carry (carry [151] :(c[151])); xtfa il52 . carry (carry [152] xtfa il53 . carry (carry [153] :(c[153])); xtfa il54 . carry (carry [154] :(c[154])); xtfa il55 . carry (carry [155] :(C[15S])); xtfa 1156 .carry (carry [156] :(c[156])); xtfa il57 . carry (carry [157] :(C[157])); xtfa 1158 . carry (carry [158] :(c[158])); xtfa il59 . carry (carry [159] xtfa 1160 .carry (carry [160] :(c[160])); xtfa il61 . carry (carry [161] :(c[161])); xtfa il62 . carry (carry [162] :(c[162])); xtfa il63 . carry (carry [163] :(c[163])); xtfa il64 .carry (carry [164] :(c[164])); xtfa il65 . carry (carry [165] :(c[165])); xtfa il66 . carry (carry [166] :(c[166])); xtfa il67 .carry (carry [167] :(c[167])); xtfa il68 .carry (carry [168] *(c[168])); xtfa H69 . carry (carry [169] :(c[169])); xtfa il70 . carry (carry [170] :(c[170])); xtfa 1171 . carry (carry [171] :(c[171])); xtfa il72 . carry (carry [172] :(c[172])); xtfa il73 .carry(carry [173] !(c[173])); xtfa il74 . carry (carry [174] :(c[174])); xtfa il75
Figure imgf000252_0001
. carry (carry [175]
Figure imgf000252_0002
ι(c[175])); xtfa il76 (c[176] )); xtfa il77 (c[177] )); xtfa il78 (c[178] )); xtfa 1179 (c[179] ) ); xtfa il80 (c[180] )); xtfa il81 (c[181] )); xtfa il82 (c[182] )>; xtfa il83 (c[183] )); xtfa il84 (c[184] )); xtfa il85 (c[185] )); xtfa il86 (c[186] ) ); xtfa il87 (c[187] )); xtfa il88 (c[188] )) ; xtfa il89 (c[189] )); xtfa il90 (c[190] ) ); xtfa il91 (c[191] ) ); xtfa 1192 (c[192] )); xtfa il93 (c[193] )> ; xtfa il94 (c[194] )); xtfa 1195 (c[195] )); xtfa il96 (c[196] )); xtfa 1197 (c[197] )>; xtfa 1198 (c[198] )) ; xtfa il99 (c[1993 xtfa i200 (c[200] )); xtfa i201 (c[201] )); xtfa i202 (c[202] )) ; xtfa i203 (c[2033 )); xtfa 1204 (c[2043 )); xtfa i205
Figure imgf000253_0001
Figure imgf000253_0002
(c[205] ) ); sum[2063 . carry ( carry [206] sum [207] . carry (carry [207] sum[208] . carry ( carry [208] sum [209] . carry (carry [209] sum[210] . carry (carry [210] sum [211] . carry (carry [211] sum [212] . carry ( carry [212] sum [213] . carry ( carry [213] sum [214] . carry (carry [214] sum [215] . carry (carry [215] sum [216] . carry (carry [216] um[217] . carry (carry [217] um[218] .carry (carry [218] um [219] . carry (carry [219] sum[220] . carry (carry [220] um[221] . carry (carry [221] um [222] . carry (carry [222] um[223] . carry ( carry [223] um [224] . carry (carry [224] um[225] . carry ( carry [225] um[226] .carry (carry [226] um [227] . carry (carry [227] um [228] . carry ( carry [228] um [229] . carry (carry [229] um [230] . carry (carry [230] um [231] . carry (carry [231] um[232] . carry (carry [232] um [233] . carry (carry [233] um [2343 . carry (carry [234] um [235] . carry (carry [235]
Figure imgf000254_0002
Figure imgf000254_0001
xtfa i236 (c[236])) ; xtfa i237 (c[237])) ; xtfa i238 (c[238])) ; xtfa i239 (c[239])); xtfa i240 (c[240])) ; xtfa i241 (c[241])); xtfa i242 (c[242])); xtfa 1243 (c[243])); xtfa i244 (C[244])) ;
Xtfa i245 (c[245])) ; xtfa i246 (c[246])); xtfa i247 (c[247])) ; xtfa i24δ (c[248])) ; xtfa i249 (c[249])) ; xtfa i250 (c[250])) ; xtfa i251 (C[251])); xtfa i252 (c[252])) ; xtfa i253 (c[253])); xtfa i254 (C[254])) ; xtfa i255 (c[255])) ; xtfa i256 (c[256])) ; xtfa i257 (c[257])); xtfa i258 (c[258])) ; xtfa 1259 (c[259])) ; xtfa i260 (c[260])) ; xtfa i261 (c[261])); xtfa i262 (c[262])); xtfa i263 (c[263])); xtfa 1264 (c[264])); xtfa i265
Figure imgf000255_0001
Figure imgf000255_0002
(c[265])); .sum (sum [266] .carry (carry [266] . sum (sum [267] . carry (carry [267] .sum (sum [268] .carry (carry [268]
. . sum (sum [269] . carry (carry [269] .sum (sum [270] .carry (carry [270] .sum (sum [2713 . carry (carry [271] .sum (su [2723 . carry ( carry [272] .sum (sum [273] . carr ( carry [273] . sum (sum [274] . carry (carry [274] .sum (su [275] .carry (carry [275] .sum (sum [276] . carry (carry [276] .sum (sum [277] . carry (carry [277] .sum (sum [278] . carry ( carry [278] .sum (sum [279] .carry (carry [279] .sum (sum [280] . carry (carry [280] .sum (sum [281] . carry (carry [281] . sum (sum [282] .carry (carry [282] .sum(sum[283] .carry (carry [283] .sum (su [284] . carry (carry [284] .sum (sum [285] . carry (carry [285] .sum (sum [286] .carry (carry [286] .sum (sum [287] . carry (carry [287] . sum (sum [288] . carry ( carry [286] .sum (sum [289] .carry (carry [289] .su (sum [290] .carry (carry [290] .sum (sum [291] . carry (carry [291] .sum (su [292] . carry (carry [292 ] sum (sum [293] .carr (carry [293]
-C sum (sum [294] . carry ( carry [294] sum (sum [295] •carry (carry [295]
Figure imgf000256_0002
Figure imgf000256_0001
sum [296]) , . carry (carry [296
:(c[296] xtfa sum [297] ) , . carry (carry [297 :(c[297] xtfa sum [298] ) , .carry (carry [29 8] :(c[298] xtfa sum [299]) , . carry (carry [299] :(c[299] xtfa sum [300] ) , . carry ( carry [300] :(c [300] xtfa sum [301] ) , .carry (carr [301] :(c[301] xtfa sum [302] ) , . carry ( carry [3023 :(c[302] xtfa sum [303] ) , .carry (carry [303] :(c [303] xtfa sum [304] ) , . carry (carry [304] :(c[304] xtfa sum [305]) , . carry ( carry [305] :(c[305] xtfa sum [306] ) , . carry ( (carry [306] :(c[306] xtfa sum [307] ) , . carry ( (carry [307] :(c[307] xtfa sum [308] ) , . carry ( Icarry [308] :(c[308] xtfa um [309] ) , . carry ( Icarry [309] :(c[309] xtfa um [310]) , . carry ( Icarry [310] :(c [310] xtfa sum [311] ) , . carry ( Icarry [311] :(c[311] xtfa um [312]) , . carry ( (carry [312] :(c [312] xtfa um [313] ) , . carry ( (carry [313] :(c[313] xtfa um [314]) , . carry ( (carry [314] :(c[314] xtfa um [315]) , . carry ( carry [315] :(c[315] xtfa um [316]) , . carry (carry [316] :(c[316] xtfa um [317]) , . carry ( carry [317] :(c[317] xtfa um [318] ) , . carry (carry [318] :(c[318] xtfa um [319]) , . carry (carry [319] xtfa um [320] ) , . carry (carry [320] :(c[320] xtfa um [321] ) , . carry (carry [321] !(c[321] xtfa um [322] ) , . carry ( carry [322] :(c[322] xtfa um [323]) , . carry (carry [323] :(c [323] xtfa um [324]) , . carry (carry [324] :(c[324] xtfa
Figure imgf000257_0001
um [325]) , . carry ( carry [325]
Figure imgf000257_0002
:(c[325] .b(b[326] ) ,
.b(b[327] ) ,
.b(b[328] ) ,
.b(b[329] ) ,
.b(b[330] ) ,
.b(b[331] ) ,
.b(b[332] ) ,
.b(b[333]) ,
.b(b[334] ) ,
.b(b[335] ) ,
.b(b[336] ) ,
.b(b[337] ) ,
.b(b[338] ) ,
.b(b[339] ) ,
.b(b[340] ) ,
.b(b[341] ) ,
.b(b[342] ) ,
.b(b[343] ) ,
.b(b[344] ) ,
.b(b[3453) ,
.b(b[346] ) ,
.b(b[347] ) ,
.b(b[348] ) ,
.b(b[349] ) ,
.b(b[350] ) ,
.b(b[351] ) ,
.b(b[352] ) ,
.b(b[353]) ,
.b(b[354]) ,
Figure imgf000258_0001
Figure imgf000258_0002
.b(b[355] ) ,
.c(c[355]));
xtfa i356(.sum(sum[356]), .carry(carry[356]), .a(a[356]), .b(b[356]), .c(c[356]));
xtfa i357(.sum(sum[357]), .carry(carry[357]), .a(a[357]), .b(b[357]), .c(c[357]));
xtfa i358(.sum(sum[358]), .carry(carry[358]), .a(a[358]), .b(b[358]), .c(c[358]));
xtfa i359(.sum(sum[359]), .carry(carry[359]), .a(a[359]), .b(b[359]), .c(c[359]));
xtfa i360(.sum(sum[360]), .carry(carry[360]), .a(a[360]), .b(b[360]), .c(c[360]));
xtfa i361(.sum(sum[361]), .carry(carry[361]), .a(a[361]), .b(b[361]), .c(c[361]));
xtfa i362(.sum(sum[362]), .carry(carry[362]), .a(a[362]), .b(b[362]), .c(c[362]));
xtfa i363(.sum(sum[363]), .carry(carry[363]), .a(a[363]), .b(b[363]), .c(c[363]));
xtfa i364(.sum(sum[364]), .carry(carry[364]), .a(a[364]), .b(b[364]), .c(c[364]));
xtfa i365(.sum(sum[365]), .carry(carry[365]), .a(a[365]), .b(b[365]), .c(c[365]));
xtfa i366(.sum(sum[366]), .carry(carry[366]), .a(a[366]), .b(b[366]), .c(c[366]));
xtfa i367(.sum(sum[367]), .carry(carry[367]), .a(a[367]), .b(b[367]), .c(c[367]));
xtfa i368(.sum(sum[368]), .carry(carry[368]), .a(a[368]), .b(b[368]), .c(c[368]));
xtfa i369(.sum(sum[369]), .carry(carry[369]), .a(a[369]), .b(b[369]), .c(c[369]));
xtfa i370(.sum(sum[370]), .carry(carry[370]), .a(a[370]), .b(b[370]), .c(c[370]));
xtfa i371(.sum(sum[371]), .carry(carry[371]), .a(a[371]), .b(b[371]), .c(c[371]));
xtfa i372(.sum(sum[372]), .carry(carry[372]), .a(a[372]), .b(b[372]), .c(c[372]));
xtfa i373(.sum(sum[373]), .carry(carry[373]), .a(a[373]), .b(b[373]), .c(c[373]));
xtfa i374(.sum(sum[374]), .carry(carry[374]), .a(a[374]), .b(b[374]), .c(c[374]));
xtfa i375(.sum(sum[375]), .carry(carry[375]), .a(a[375]), .b(b[375]), .c(c[375]));
xtfa i376(.sum(sum[376]), .carry(carry[376]), .a(a[376]), .b(b[376]), .c(c[376]));
xtfa i377(.sum(sum[377]), .carry(carry[377]), .a(a[377]), .b(b[377]), .c(c[377]));
xtfa i378(.sum(sum[378]), .carry(carry[378]), .a(a[378]), .b(b[378]), .c(c[378]));
xtfa i379(.sum(sum[379]), .carry(carry[379]), .a(a[379]), .b(b[379]), .c(c[379]));
xtfa i380(.sum(sum[380]), .carry(carry[380]), .a(a[380]), .b(b[380]), .c(c[380]));
xtfa i381(.sum(sum[381]), .carry(carry[381]), .a(a[381]), .b(b[381]), .c(c[381]));
xtfa i382(.sum(sum[382]), .carry(carry[382]), .a(a[382]), .b(b[382]), .c(c[382]));
xtfa i383(.sum(sum[383]), .carry(carry[383]), .a(a[383]), .b(b[383]), .c(c[383]));
xtfa i384(.sum(sum[384]), .carry(carry[384]), .a(a[384]), .b(b[384]), .c(c[384]));
xtfa i385(.sum(sum[385]), .carry(carry[385]), .a(a[385]),
Figure imgf000259_0001
.b(b[385] (=[385])); sum[3δ6] . carry (carry [386] um[387] . carry (carry [387] sum[388] . carry (carry [388] sum[389] . carry (carry [389] sum[390] .carry (carry [390] sum [391] .carry (carry [391] sum[392] . carry ( carry [392] sum [393 ] . carry ( carry [393] sum [394] . carry (carry [394] um [395] . carry (carry [395] sum[396] . carry (carry [396] sum [397] . carry (carry [397] um[398] . carry (carry [398] um[399] . carry (carry [399]
Figure imgf000260_0001
um [400] . carry (carry [400] sum[401] .carry (carry [401] um[402] . carry (carry [402] um[403] .carry (carry [403] um [404] . carry (carry [404] um [405] . carry (carry [405] um [406] . carry (carry [406] um [407] . carry ( carry [407] um[408] . carry (carry [408] um[409] .carry (carry [409] um[410] . carry (carry [410] um [411] . carry (carry [411] um [412] . carry ( carry [412] um [413] . carry (carry [413] um [414] . carry (carry [414] um [415] . carry (carry [415]
Figure imgf000260_0003
Figure imgf000260_0002
xtfa i416 .carry (carry [416] (c[416] )); xtfa i417 . carry (carry [417] (c[417] )); xtfa i418 .carry (carry [41δ] (c[418] )); xtfa i419 . carry (carry [419] (c[419] )); xtfa i420 . carry (carry [420] (c[420] )); xtfa i421 .carry (carry [421] (c[421] )); xtfa 1422 . carry (carry [422] (c[422] )); xtfa i423 . carry (carry [423] (c[423] )); xtfa i424 .carry (carry [424] (c[424] )); xtfa i425 .carry (carry [425] (c[4253 )); xtfa i426 . carry (carry [426] (c[426] )); xtfa i427 .carry (carry [427] (c[4273 )); xtfa i428 . carry (carry [42δ] (c[428] )); xtfa i429 . carry (carry [429] (c[429] )); xtfa i430 .carry (carry [430] (c[4303 )); xtfa i431 .carry (carry [431] (c[431] )); xtfa i432 . carry (carry [432] (c[4323 )); xtfa i433 .carry (carry [433] (c[433] )); xtfa 1434 . carry (carry [434] (c[434] )); xtfa 1435 . carry ( carry [435] (c[435] )); xtfa i436 .carry (carry [436] (c[436] )); xtfa i437 . carry (carry [437] (c[437] )); xtfa i438 . carry (carry [43δ] (c[438] )); xtfa i439 . carry (carry [439] (c[439] )); xtfa i440 . carry ( carry [440] (c[440] )); xtfa i441 .carry (carry [441] (c[441] )); xtfa i442 . carry ( carry [442] (c[442] )); xtfa i443 . carry (carry [443] (c[443] )); xtfa 1444 . carry (carry [444 ] (c[444] )); xtfa i445
Figure imgf000261_0001
. carry (carry [445]
Figure imgf000261_0002
.c(c[445]));
xtfa i446(.sum(sum[446]), .carry(carry[446]), .a(a[446]), .b(b[446]), .c(c[446]));
xtfa i447(.sum(sum[447]), .carry(carry[447]), .a(a[447]), .b(b[447]), .c(c[447]));
xtfa i448(.sum(sum[448]), .carry(carry[448]), .a(a[448]), .b(b[448]), .c(c[448]));
xtfa i449(.sum(sum[449]), .carry(carry[449]), .a(a[449]), .b(b[449]), .c(c[449]));
xtfa i450(.sum(sum[450]), .carry(carry[450]), .a(a[450]), .b(b[450]), .c(c[450]));
xtfa i451(.sum(sum[451]), .carry(carry[451]), .a(a[451]), .b(b[451]), .c(c[451]));
xtfa i452(.sum(sum[452]), .carry(carry[452]), .a(a[452]), .b(b[452]), .c(c[452]));
xtfa i453(.sum(sum[453]), .carry(carry[453]), .a(a[453]), .b(b[453]), .c(c[453]));
xtfa i454(.sum(sum[454]), .carry(carry[454]), .a(a[454]), .b(b[454]), .c(c[454]));
xtfa i455(.sum(sum[455]), .carry(carry[455]), .a(a[455]), .b(b[455]), .c(c[455]));
xtfa i456(.sum(sum[456]), .carry(carry[456]), .a(a[456]), .b(b[456]), .c(c[456]));
xtfa i457(.sum(sum[457]), .carry(carry[457]), .a(a[457]), .b(b[457]), .c(c[457]));
xtfa i458(.sum(sum[458]), .carry(carry[458]), .a(a[458]), .b(b[458]), .c(c[458]));
xtfa i459(.sum(sum[459]), .carry(carry[459]), .a(a[459]), .b(b[459]), .c(c[459]));
xtfa i460(.sum(sum[460]), .carry(carry[460]), .a(a[460]), .b(b[460]), .c(c[460]));
xtfa i461(.sum(sum[461]), .carry(carry[461]), .a(a[461]), .b(b[461]), .c(c[461]));
xtfa i462(.sum(sum[462]), .carry(carry[462]), .a(a[462]), .b(b[462]), .c(c[462]));
xtfa i463(.sum(sum[463]), .carry(carry[463]), .a(a[463]), .b(b[463]), .c(c[463]));
xtfa i464(.sum(sum[464]), .carry(carry[464]), .a(a[464]), .b(b[464]), .c(c[464]));
xtfa i465(.sum(sum[465]), .carry(carry[465]), .a(a[465]), .b(b[465]), .c(c[465]));
xtfa i466(.sum(sum[466]), .carry(carry[466]), .a(a[466]), .b(b[466]), .c(c[466]));
xtfa i467(.sum(sum[467]), .carry(carry[467]), .a(a[467]), .b(b[467]), .c(c[467]));
xtfa i468(.sum(sum[468]), .carry(carry[468]), .a(a[468]), .b(b[468]), .c(c[468]));
xtfa i469(.sum(sum[469]), .carry(carry[469]), .a(a[469]), .b(b[469]), .c(c[469]));
xtfa i470(.sum(sum[470]), .carry(carry[470]), .a(a[470]), .b(b[470]), .c(c[470]));
xtfa i471(.sum(sum[471]), .carry(carry[471]), .a(a[471]), .b(b[471]), .c(c[471]));
xtfa i472(.sum(sum[472]), .carry(carry[472]), .a(a[472]), .b(b[472]), .c(c[472]));
xtfa i473(.sum(sum[473]), .carry(carry[473]), .a(a[473]), .b(b[473]), .c(c[473]));
xtfa i474(.sum(sum[474]), .carry(carry[474]), .a(a[474]), .b(b[474]), .c(c[474]));
xtfa i475(.sum(sum[475]), .carry(carry[475]), .a(a[475]), .b(b[475]),
Figure imgf000262_0001
C 1 C[475])); xtfa 1476 . sum (sum [476] carry (carry [476] (c[476])); xtfa i477 .sum (sum[477] carry (carry [477] {=[477])); xtfa i478 . sum (sum[478] carry (carry [478] (c[478])); xtfa i479 .sum (sum[479] carry (carry [479] (c[479])) ; xtfa i480 .sum (sum [480] carry (carry [480] (c[4δ0])); xtfa i461 .sum (sum[4δl] carry (carry [481] (c[481])); xtfa i4δ2 . sum (sum[482] carry (carry [482] (c[482])) ; xtfa 1483 .sum (sum[483] carry (carry [483] (c[483])); xtfa i484 . sum (sum [484] carry (carry [484] (C[484])) ; xtfa i485 . sum (sum[485] carry (carry [485] (c[485])); xtfa i486 .sum (sum [486] . carry (carry [486] (c[486])) ; xtfa i487 .sum (sum[487] . carry (carry [487] (■=[487])); xtfa i486 .sum (sum [488] . carry (carry [488] (c[488])); xtfa i489 . sum (sum [489] . carry (carry [489] (C[489])); xtfa i490 . sum (sum[490] . carry (carry [490] (=[490])); xtfa i491 . sum (sum [491] . carry (carry [491] (=[491])) ; xtfa i492 . sum (sum [492] . carry (carry [492] (=[492])) ; xtfa i493 .sum (sum [493] . carry (carry [493] (c[493])); xtfa i494 .sum (sum [494] . carry (carry [494] (c[494])) ; xtfa i495 .sum (sum [495] . carry (carry [495] (=[495])) ; xtfa i496 .sum (sum[496] .carry (carry [496] (c[496])); xtfa i497 . sum (sum [497] . carry (carry [497] (C[497])); xtfa i49δ .sum (sum [498] . carry (carry [498] (=[498])); xtfa i499 . sum (sum [499] .carry (carry [499] (c[499])) ; xtfa i500 .sum (sum[500] .carry (carry [500] (c[500])); xtfa 1501 .sum (sum [501] .carry (carry [501]
(■=[501])); xtfa i502 . sum (sum [502] . carry (carry [502] (=[502])); xtfa i503 .sum (sum [503] . carry (carry [503] (•=[503])); xtfa i504 .sum (sum [504] . carry (carry [504] (c[504])); xtfa i505 .sum (sum [505] . carry (carry [505]
Figure imgf000263_0001
(=[505])); xtfa i506 . carry (carry [506] (c[506])) ; xtfa i507 . carry (carry [507] (■=[507])); xtfa i508 . carry (carry [508] (=[508])); xtfa i509 .carry (carry [509] (c[509])); xtfa i510 .carry (carry [510] (=[510])); xtfa 1511 . carry (carry [511] (c[511])); xtfa i512 . carry (carry [512] (c[512])); xtfa i513 .carry (carry [513] (=[513])) ; xtfa i514 . carry (carry [514] (■=[514])); xtfa 1515 . carry (carry [515] (c[515])); xtfa i516 . carry (carry [516] (c[516])) ; xtfa 1517 . carry (carry [517] (=[517])); xtfa i51δ . carry (carry [518] (c[518])) ; xtfa i519 . carr (carry [519] (c[519])); xtfa i520 . carry (carry [520] (c[520])); xtfa i521 .carry (carry [521] (c[521])); xtfa i522 . carry (carry [522] (=[522])); xtfa 1523 .carry (carry [523] (c[523])) ; xtfa i524 . carry (carry [524] (c[524])) ; xtfa i525 . carry (carry [525] (c[525])); xtfa i526 . carry (carry [526] (c[526])) ; xtfa i527 .carry (carry [527] (c[527])) ; xtfa i52δ . carry (carry [528] (C[528])) ; xtfa 1529 .carry (carry [529] (■=[529])); xtfa i530 . carry (carry [530] (c[530])); xtfa 1531 . carry (carry [531] (c[531])) ; xtfa i532 . carry (carry [532] (c[532])) ; xtfa i533 . carry (carry [533] (c[533])) ; xtfa i534 . carry (carry [534] (c[534])); xtfa i535
Figure imgf000264_0001
. carry (carry [535]
Figure imgf000264_0002
(c[535])); xtfa 1536 (=[536])); xtfa i537 (•=[537])); xtfa i538 (c[538])); xtfa i539 (=[539])); xtfa i540 (=[540])) ; xtfa 1541 (c[541])); xtfa i542 (c[5423)); xtfa i543 (c[5433)) ; xtfa i544 (C[544])); xtfa i545 (•=[545])); xtfa i546 (=[546])) ; xtfa 1547 (=[547])); xtfa i54δ (■=[548])); xtfa i549 (c[549])) ; xtfa i550 (c[550])); xtfa 1551 (c[551])); xtfa i552 (c[552])) ; xtfa 1553 (c[553])); xtfa 1554 (C[554])); xtfa i555 (C[555])); xtfa i556 (c[556])); xtfa i557 (c[557])) ; xtfa i558 (c[558])); xtfa 1559 (c[559])); xtfa i560 (c[560])); xtfa i561 (•=[561])); xtfa i562 (c[562])) ; xtfa i563 (c[563])); xtfa i564 (C[564])); xtfa 1565
(=[565])); xtfa 1566 .sum (su [566] .carry (carry [566 ι(c[566])) ; xtfa i567 .sum (sum [567] .carry (carry [567 :(c[567])); xtfa i568 .sum(sum[56δ] . carry ( carry [56 δ !(C[568])); xtfa i569 .sum (sum [569] . carry (carry [569 .(=[569])); xtfa i570 .sum (sum [570] . carry (carry [570] !(c[570])); xtfa i571 .sum (sum [571] .carry (carry [571] (=[571])) ; xtfa i572 .sum (sum [572] . carry (carry [572] (c[572])); xtfa i573 .sum (sum [573] .carry (carry [573] (■=[573])); xtfa i574 .sum (sum [574] . carry (carry [574] (=[574])); xtfa i575 .sum (sum [575] . carry (carry [575] (■=[575])); xtfa i576 .sum (sum [576] .carry (carry [576] (C[576])); xtfa i577 .sum (sum [577] .carry (carry [577] (c[577])); xtfa i578 .sum (sum [578] .carry (carry [578] (c[57δ])) ; xtfa i579 .sum (sum [579] . carry (carry [579] (c[579])); xtfa i580 .sum (sum [580] .carry (carry [580] (c[5δ0])) ; xtfa 1581 .sum (sum [581] . carry (carry [581] (c[581]));
, xtfa i582 .sum (sum [582] .carry (carry [582] (c[582])); xtfa i583 .sum (sum [583] . carry (carry [583] (c[583])) ; xtfa i5δ4 .sum (sum [584] . carry (carry [584] (=[584])); xtfa i585 .su (sum [585] . carry ( carry [585] (c[585])); xtfa i586 .sum (sum [586] .carry (carry [586] (c[586])) ; xtfa i587 .sum (sum [587] .carry (carry [587] (■=[587])); xtfa i588 .sum (sum [588] . carry (carry [588] (c[588])); xtfa i589 .su (sum [589] . carry (carry [589] (c[589])); xtfa i590 .sum (su [590] .carry (carry [590] (c[590])); xtfa 1591 .sum (sum [591] . carry (carry [591] (=[591])); xtfa i592 .su (sum [592] . carry (carry [592 ] (c[592])) ; xtfa i593 .sum(sum[593] . carry (carry [5933 (c[593])); xtfa i594 .sum (sum [594] . carry (carry [5943 (c[594])) ; xtfa i595 .sum (sum [595] . carry (carry [595]
(•=[595])); xtfa sum [596] carry (carry [59 '6] :(c[596 xtfa sum [597] carry (carry [597] :(c[597] xtfa sum[59δ] carry (carry [598] :(c[598 xtfa sum[5993 carr (carry [599] :(c [599] xtfa sum[6003 carry (carry [600] :(c[600] xtfa sum[601] carr (carry [601] :(c[601] xtfa sum[602] carry (carry [602] :(c[602] xtfa sum[6033 carry (carry [603] :(c[603] xtfa sum [604] carry (carry [604] :(c[604] xtfa um[605] carry (carry [605] :(c[605] xtfa sum[606] carry (carry [606] ;(c[606] xtfa um [607] . carry carry [607] :(c[607] xtfa sum[60δ] . carry carry [608] :(c[608] xtfa um [609] . carry carry [609] :(c[609] xtfa um[610] . carry carry [610] :(c[610] xtfa um [611] . carry carr [611] :(c [611] xtfa um [612] . carry carry [612] :(c[612] xtfa um[613] . carry carry [613] :(c[613] xtfa um [614] . carry carry [614] :(c[614] xtfa um [615] . carry (carry [615] :(c[615] xtfa um[616] . carry (carry [616] :(c[616] xtfa um[617] . carry ( carr [617] :(c[617] xtfa um [618] . carry (carry [618] :(c[618] xtfa um[619] . carry (carry [619] :(c[619] xtfa um[620] . carry ( carry [620] :(c[620] xtfa um [621] .carry (carr [621] :(c[621] xtfa um[622] . carry (carry [622] !(c[622] xtfa um[623] . carry (carr [623] :(c[623] xtfa um [624] . carry (carry [624] ι(c[624] xtfa
um[625] .carry (carry [625]
(c[625] xtfa i626 .sum (sum [626] carry (carry [626] (c[626] )); xtfa i627 .sum (sum [627] carry (carry [627] (c[627] )); xtfa 1628 . sum (sum [6283 carry (carry [628] (c[628] )); xtfa i629 .sum (sum[629] carry (carry [629] (c[629] )); xtfa i630 .sum (sum[630] carry ( carry [630] (c[630] )); xtfa i631 . sum (sum[631] carry (carry [631] (c[631] )); xtfa i632 .sum (sum[632] carry ( carry [632] (c[632] )); xtfa i633 . sum (sum[6333 carry (carry [633] (c[633] )); xtfa 1634 .sum (sum [634] carry ( carry [634] (c[634] )); xtfa 1635 .sum (sum [635] . carry (carry [635] (c[635] )); xtfa i636 .sum (sum[636] . carry (carry [636] (c[636] )); xtfa i637 .sum (sum[637] . carry (carry [637] (c[637] )); xtfa 1638 . sum (sum[638] . carry (carry [638] (c[638] )); xtfa i639 .sum (sum[639] . carry (carry [639] (c[639] )); xtfa i640 .sum (sum [640] . carry (carr [640] (c[640] )); xtfa 1641 . sum (sum[641] . carry (carry [641] (c[641] )); xtfa i642 .sum (sum[642] . carry (carry [642] (c[642] )); xtfa i643 .sum (sum [643] . carry (carry [643] (c[643] )); xtfa 1644 .sum (sum [644] . carry (carry [644] (c[644] )); xtfa i645 . sum (sum [645] carry (carry [645] (c[645] )); xtfa i646 sum (sum[646] carry (carry [646] (c[646] )); xtfa i647 .sum (sum [647] carry (carry [647] (c[647] )); xtfa i648 sum (su [648] carry ( carry [648] (c[648] )); xtfa i649 sum (sum [649] carry (carry [649] (c[649] xtfa 1650 sum (sum [650] carry (carry [650] (c[650] )); xtfa 1651 sum (sum [651] carry (carry [651] (c[651] )); xtfa 1652 sum (sum [652] carry ( carry [ 652 ] (c[652] )); xtfa i653 sum (sum[653] carry (carry [653] (c[653] )); xtfa i654 sum (sum[654] carry (carry [654] (c[654] )); xtfa i655 sum (sum [655] carry (carry [655]
(c[655] )); xtfa i656 (=[656])); xtfa i657 (■=[657])); xtfa i658 (c[658])); xtfa i659 (c[659])); xtfa 1660 (c[660])) ; xtfa i661 (c[661])); xtfa i662 (c[662])) ; xtfa i663 (■=[663])); xtfa i664 (=[664])); xtfa i665 (=[665])); xtfa i666 (=[666])); xtfa i667 (c[667])); xtfa i668 (c[668] )) ; xtfa i669 (=[669])); xtfa i670 (c[670])); xtfa i671 (c[671])) ; xtfa i672 (•=[672])); xtfa i673 (c[673])); xtfa i674 (c[674] )) ; xtfa i675 (c[675])) ; xtfa i676 (c[676])); xtfa i677 (•=[677])); xtfa i678 (■=[678])); xtfa i679 (•=[679])); xtfa- i680 (c[680])); xtfa i681 (c[681])); xtfa i6δ2 (c[682])); xtfa i683 (•=[683])); xtfa i6δ4 (c[684])); xtfa i6δ5
(c[685])); xtfa i686 ( . sum (sum [686] . carry (carry [686] c (c[686])); xtfa i6δ7 ( .sum (sum [687] .carry (carry [687] c (c[687])) ; xtfa i688 ( . sum (sum [688] . carry ( carry [688] c (C[688])) ; xtfa i689( . sum (sum [689] .carry (carry [689] c (c[689])); xtfa i690 ( . sum(sum[690] . carry (carry [690] c (c[690])); xtfa 1691 ( . sum (sum [691] . carry (carry [691] c (c[691])); xtfa i692 ( .sum (sum [692] . carry (carry [692] c (c[692])) ; xtfa i693 ( . sum (sum [693] . carry ( carr [ 693 ] c (c[693])) ; xtfa i694 ( . sum (sum [694] . carry (carry [694] c (c[694])) ; xtfa 1695 ( .sum(sum[695] . carry ( carry [695] c (c[695])) ; xtfa i696 ( . sum (sum [696] .carry (carry [696] c [■=[696])); xtfa i697 ( .sum (sum [697] .carry (carry [697] c [c[697])) ; xtfa i698 ( . sum (sum [698] .carry (carry [698] c (c[698])); xtfa i699 ( . sum (sum [699] . carry (carry [699] c (c[699])) ; xtfa i700 ( .sum(sum[700] .carry (carry [700] c (c[700])) ; xtfa i701 ( . sum (sum [701] .carry (carry [701] c 'c[701])) ; xtfa i702 ( .sum (sum [702] . carry (carry [702 ] c .C[702])); xtfa i703 ( .sum (sum [703] . carry (carry [703] c 'c[703])) ;
Xtfa i704 ( .sum (sum [704] . carry (carry [704] c C[704])) ; xtfa 1705 ( .sum (sum [705] .carry (carry [705] c C[705])) ; xtfa i706 ( .sum (sum [706] . carry ( carry [706] c = [706])) ; xtfa i707 ( .sum (sum [707] . carry (carry [707] c c[707])) ; xtfa 1708 ( .sum (sum [708] .carry (carry [708] c C[708])); xtfa i709 ( .sum (sum [709] . carry (carry [709]
C 1 C[709])); xtfa i710 ( .sum (sum [710] . carry (carry [710]
C 1 •=[710])); xtfa i711 ( .sum (sum [711] . carry (carry [711]
C I •=[711])); xtfa i712 ( .sum (sum [712] . carry (carry [712 ]
C 1 ■=[712])); xtfa i713 ( .sum (sum [713] . carry (carry [713]
C 1 c[713])); xtfa i714 ( .sum(sum[714] .carry (carry [714]
C 1 C[714])); xtfa i715 ( .sum (sum [715] .carry (carry [715]
C 1 C[715])); . sum (sum [716] . carr (carr [716 J .sum (sum [717] . carry (carry [717] . sum (su [718] . carry (carry [718] .sum (sum [719] .carry (carry [719] .sum (sum [720] . carry ( carry [720] .sum (sum [721] . carry (carry [721] .sum (sum [722] . carry (carry [722] .sum (su [723] . carry (carry [723] .sum (sum [724] .carry (carry [724] .sum (sum [725] . carry (carry [725] .sum (sum [726] . carry (carry [726] .sum (sum [727] .carry (carry [727] .sum (sum [728] . carry (carry [728] .sum (sum [729] . carry (carry [729] .sum (sum [730] . carry ( carr [730] . sum (sum [731] . carry (carry [731] .sum (sum [732] . carry (carry [732] .sum (sum [733] .carry (carry [733] .sum (sum [734] . carry (carry [734] .sum (sum [735] . carry (carry [735] .sum (sum [736] . carry (carry [736] .sum (sum [737] . carry (carry [737] . sum (sum [738] . carry (carry [738] .sum (sum [739] . carry (carry [739] .sum (sum [740] . carry (carry [740] .sum (sum [741] . carry (carry [741] sum (sum [742] . carry (carry [742] sum (sum [743] . carry (carry [743] sum (sum [744] . carry (carry [744] sum (sum [745] .carry (carry [745]
xtfa i746 (c[746])); xtfa i747 (c[747])) ;
Xtfa i748 (=[748])); xtfa i749 (•=[749])); xtfa i750 (=[750])); xtfa i751 (•=[751])); xtfa i752 (c[752])); xtfa i753 (C[753])) ; xtfa i754 (c[754])); xtfa i755 (c[755]));
Xtfa 1756 (■=[756])); xtfa i757 (=[757])); tfa i758 (c[758])) ; xtfa i759 (c[759])) ; tfa i760 (c[760])) ; xtfa i761 (=[761])) ; xtfa 1762 (C[762])) ;
Xtfa i763 (C[763])) ; xtfa i764 (c[764])) ; tfa i765 (c[765])) ; xtfa i766 (C[766])) ; xtfa i767 (■=[767])); xtfa i768 (c[768])) ; xtfa i769 (c[769])) ; xtfa i770 (=[770])); xtfa i771 (C[771])) ; xtfa i772 (C[772])) ; xtfa i773 (C[773])) ; xtfa i774 (c[774])) ;
Xtfa i775
(c[775])); xtfa sum[776] . carry (carry [776] (c[776] xtfa sum [777] . carry (carry [777] (c[777] xtfa sum[778] . carry (carry [778] (c[778] xtfa sum[779] . carry (carry [779] (c[779] xtfa sum [780] . carry (carry [780] (c[780] xtfa su [781] . carry (carry [781] (c[78l3 xtfa sum[782] . carry ( carry [782] (c[782] xtfa sum[783] . carry (carry [783] (c[783] xtfa sum[784] . carry (carry [784] (c[7δ4] xtfa sum[785] .carry (carry [785] (c[785] xtfa sum[7δ6] . carry (carry [786] (c[786] xtfa sum[787] . carry (carry [787] (c[7873 xtfa sum[78δ] .carry (carry [78δ] (c[78δ] xtfa sum [789] . carry (carry [769] (c[789] xtfa su [790] . carry ( carry [790] (c[790] xtfa sum[791] . carry (carry [791] (c[791] xtfa u [792] . carry (carry [792] (c[792] xtfa um [793] . carry (carry [793 ] (c[793] xtfa sum [794] . carry (carry [794] (=[794] xtfa um[795] . carry ( carry [795] (c[795] xtfa um [796] . carry (carry [796] (c[7963 xtf um [797] . carry (carry [797] (c[797] xtfa um[798] .carry (carry [798] (c[798] xtfa um[799] . carry (carry [799] (c[799] xtf um[δ00] . carry (carry [800] (c[800] xtfa um[δ01] . carry (carry [801] (c[80l] xtfa um[802] . carry ( carry [802] (c[802] xtfa um[803] . carry (carry [803 ] (c[803] xtfa um [804] . carry (carry [804] (=[804] xtfa
um[805] . carry (carry [805]
(c[805] )); xtfa i806 .sum (sum[806] . carry (carry [806] :(c[806])); xtfa iδ07 .sum (sum [807] .carry (carry [807] :(c[807])); xtfa i808 .sum (sum[808] .carry (carry [808] :(c[808])) ; xtfa i809 .sum (sum[8093 .carry (carry [809] :(c[809])) ; xtfa 1810 .sum (sum [810] . carry (carry [810] ;(c[810])); xtfa iδll , sum (sum [811] . carry (carry [811] J(c[811])); xtfa i812 .sum (sum[812] .carry (carry [812] :(c[812])) ; xtfa i813 . sum (sum [813] . carry (carry [813] :(C[813])) ; xtfa 1814 . sum (sum [814] .carry (carry [814] :(c[8143)) ; xtfa i815 . sum (sum[815] . carry (carry [815] :(c[815])) ; xtfa i816 .sum (sum [816] . carry (carr [816 :(c[816])) ; xtfa i817 . sum (sum [817] . carry (carry [817] :(c[817])) ; xtfa i818 . sum (sum[818] . carry (carry [816] :(c[818])) ; xtfa i819 , sum (sum[819] . carry (carry [819] :(c[819])) ; xtfa i820 . sum (sum[820] .carry (carr [820] :(c[820])) ; xtfa i821 (sum [821] .carry (carry [821] :(c[821] )); xtfa i822 .sum (sum [822] . carry (carry [822] :(c[822] )) ; xtfa i823 sum (sum [823] . carry (carry [823] :(c[823])); xtfa i824 sum (sum [824] .carry (carry [824] :(c[824])) ; xtfa i825 sum (sum[825] . carry (carry [825] :(c[825])) ; xtfa i826 sum (sum[826] . carry (carry [826] :(c[826])) ; xtfa 1827 sum (sum[827] . carry (carry [827] :(c[827])) ; xtfa i828 sum (sum[828] . carry (carry [828] *(c[828])); xtfa i829 sum (sum[829] . carry (carry [829] ι(c[829])); xtfa iδ30 sum (sum[830] . carry (carry [830] !(c[830])); xtfa i831 .sum (sum [831] . carry (carry [831] *(c[831])); xtfa i832 sum (sum [832] . carry (carry [832 ] :(c[832])); xtfa i833 sum(sum[833] . carry (carry [833] :(c[833])); xtfa i834 sum (sum [834] . carry (carry [834] ;(c[834])) ; xtfa i835 sum (sum[835] . carry (carry [835]
(c[835])); xtfa i836 .b(b[636] ) , (c[836] )); xtfa iδ37 .b(b[837] ) , (c[837] )); xtfa i838 .b(b[83δ]) , (c[838] )); xtfa i839 .b(b[839]) , (c[839] )); xtfa i840 .b(b[840] ) , (c[840] )); xtfa 1841 .b(b[841]) , (c[841] )); xtfa iδ42 .b(b[842] ) , (c[842] )); xtfa i843 .b(b[843]) , (c[843] )); xtfa iβ44 .b(b[844] ) , (c[844] )); xtfa i845 -b(b[845] ) , (c[845] )); xtfa iδ46 .b(b[846] ) , (c[δ46] )); xtfa i847 .b(b[847] ) , (c[847] )); xtfa i848 .b(b[848] ) , (c[848] )); xtfa i849 -b(b[849] ) , (c[849] )); xtfa i850 .b(b[850]) , (c[850] )); xtfa 1851 .b(b[851]) , (c[δ51] )); xtfa iδ52 .b(b[852] ) , (c[δ52] )); xtfa i853 .b(b[853]) , (c[853] )); xtfa i854 .b(b[854] ) , (c[854] )); xtfa i855 .b(b[855] ) , (c[855] )); xtfa i856 .b(b[856] ) , (c[δ56] )); xtfa i857 .b(b[857]) , (c[857] xtfa i858 .b(b[858]) , (c[858] )); xtfa i859 .b(b[859]) , (c[δ59] )); xtfa i860 .b(b[860] ) , (c[860] )); xtfa iδ61 .b(b[861]) , (c[861] )); xtfa iδ62 .b(b[862] ) , (c[862] )); xtfa i863 .b(b[863] ) , (c[863] )); xtfa iδ64 .b(b[864]) , (c[864] )); xtfa iδ65
.b(b[865]) , (c[865] )); sum[866] . carry (carry [866] sum[867] .carry (carry [867] sum [868] . carry (carry [868] sum[869] . carry (carry [869] sum[870] .carry (carry [870] sum[871] . carry (carry [871] sum[δ72] . carry (carry [872] sum[δ73] . carry (carry [873 ] sum [874] . carry (carry [874] sum[875] . carry (carry [875] sum[876] .carry (carry [876] sum [877] .carry (carry [877] sum[878] .carry (carry [878] sum [879] .carry (carry [879] um[880] .carry (carry [880] um[8δl] . carry (carry [881] um [882] .carry (carry [882] um[8δ3] . carry (carry [8δ3] um [884] .carry (carry [884] um[885] . carry (carry [885] um [886] . carry (carry [8δ6] um[8δ7] . carry (carry [887] um [6 δδ ] .carry (carry [888] um[δ89] .carry (carry [889] um[890] .carry (carry [890] um [891] .carry (carry [891] um[892] . carry (carry [892] um[893] . carry (carry [893] um [894] .carry (carry [894] um [895] .carry (carry [895]
xtfa iδ96 .sum (sum [896] . carry (carry [δ96] '(=[896])) ; xtfa iδ97 .sum (sum[897] . carry (carry [δ97] (C[δ97])) ; xtfa iδ98 .sum (sum[896] . carry (carry [89δ] (c[89δ])) ; xtfa iδ99 .sum (sum[899] . carry (carry [δ99] (c[899])) ; xtfa i900 . sum (sum[900] .carry (carry [900] (c[900])) ; xtfa i901 .sum (sum [901] . carry (carry [901] (c[901])) ; xtfa i902 . sum (sum[902] . carry (carry [902] (c[902])) ; xtfa i903 .sum (sum[903] .carry (carry [903] {=[903])); xtfa i904 . sum (sum[904] . carry (carry [904] (c[904])) ; xtfa i905 . sum(sum[905] . carry (carry [905] (c[905])); xtfa i906 . sum (sum[906] . carry (carry [906] (c[906])) ; xtfa i907 . sum (sum[907] . carry (carry [907] (c[907])) ; xtfa i908 .sum (sum [908] . carry (carry [90δ]
-C (c[908])); xtfa i909 . sum (sum[909] . carry (carry [909] (=[909])) ; xtfa 1910 .sum (sum [910] .carry (carry [910] (=[910])) ; xtfa 1911 . sum(sum[911] . carry (carry [911] (c[911])); xtfa i912 .sum (sum [912] . carry (carry [912] (c[912])) ; xtfa i913 . sum (sum[913] .carry (carry [913] (=[913])); xtfa 1914 . sum (sum [914] .carry (carry [914] (=[914])); xtfa 1915 . sum (sum [915] .carry (carry [915] (=[915])); xtfa i916 . sum (sum[916] . carry (carry [916] (c[916])) ; xtfa 1917 . sum (sum [917] . carry (carry [917] (c[917])) ; xtfa 1918 .sum (sum[918] . carry (carry [91δ] (=[918])); xtfa 1919 . sum (sum [919] . carry (carry [919] (■=[919])) ; xtfa i920 . sum(sum [920] . carry (carry [920] (c[920])); xtfa i921 . sum (sum[921] . carry (carry [921] (c[921])) ; xtfa i922 . sum (sum [922] . carry (carry [922] (C[922])) ; xtfa i923 .sum(sum [923] . carry (carry [923] (=[923])); xtfa i924 . sum (sum [924] .carry (carry [924] (=[924])); xtfa 1925 . sum (sum [925] . carry (carry [925]
(=[925])); xtfa i926 .sum (sum [926] . carry (carry [926] (c[926])) ; xtfa i927 .sum (sum [927] . carr (carr [927] (•=[927])); xtfa 1928 . sum (sum[928] . carry (carry [928] (c[928])) ; xtfa 1929 .sum (sum [929] . carry (carry [929] (c[929])) ; xtfa i930 . sum (sum[930] . carry (carry [930] (C[930])); xtfa i931 . sum (sum [931] . carry (carry [931] (c[931])); xtfa 1932 . sum (sum [932] .carry (carry [932] (=[932])) ; xtfa 1933 . sum (sum[933] . carry (carry [933] (C[933])); xtfa 1934 . sum (sum [934] .carry (carry [934] (c[934])) ; xtfa i935 . sum (sum[935] . carry (carry [935] (■=[935])); xtfa i936 . sum (sum[936] .carry (carry [936] (c[936])); xtfa 1937 . sum (sum[937] . carry (carry [937] (c[937])) ; xtfa i938 . sum (sum[93δ] .carry (carry [938] (c[938])); xtfa i939 .sum (sum[939] . carry (carry [939] (c[939])) ; xtfa i940 . sum (sum[940] .carry (carry [940] (■=[940])); xtfa i941 . sum (sum[941] . carry (carry [941] (c[941])) ; xtfa i942 .sum (sum [942] . carry (carry [942] (c[942])); xtfa i943 .sum (sum [943] . carry (carry [943] (c[943])) ; xtfa i944 . sum (sum[944] .carry (carry [944] (c[944])) ; xtfa i945 . sum (sum[945] . carry (carry [945] (c[945])); xtfa i946 .sum (sum [946] . carry (carry [946] (c[946])); xtfa i947 .sum (sum [947] . carry (carry [947] (c[947])); xtfa i94δ .sum (sum[94δ] .carry (carry [948] (c[948])); xtfa i949 .sum (sum[949] . carr (carry [949] (c[949])); xtfa i950 .sum (sum[950] .carry (carry [950] (•=[950])); xtfa i951 .sum (sum [951] . carry (carry [951] (c[951])); xtfa i952 .sum (sum[952] .carry (carry [952] (c[952])); xtfa i953 . sum (sum[953] .carry(carry [953] (c[953])); xtfa i954 . sum (sum[954] .carr (carr [954] (•=[954])); xtfa i955 . sum (sum[955] . carry (carry [955]
(c[955])); xtfa i956 .sum (su [956] .carry (carry [956 (■=[956])); xtfa i957 .sum (sum [957] . carry (carry [957 (=[957])) ; xtfa i958 . sum (sum [958] -carry (carr [958 (■=[958])); xtfa i959 .sum (sum [959] . carry (carry [959 (c[959])) ; xtfa i960 . sum (su [960] . carry (carry [960 (c[960])) ; xtfa i961 .sum(sum[961] . carry (carry [961] (c[961])) ; xtfa i962 . sum (sum [962] . carry (carry [962] (c[962])) ; xtfa i963 . sum (sum [963] . carry (carry [963 ] (c[963])) ; xtfa i964 . sum (su [964] . carry (carry [964] (c[964])) ; xtfa i965 . sum (sum [965] . carry (carry [965 ] (c[965])); xtfa i966 . sum(sum[966] . carry (carry [966] (c[966])) ; xtfa i967 . sum (su [967] . carry (carry [967] (c[967])) ; xtfa i96δ .sum (sum [968] . carry (carry [968] (C[968])) ; xtfa i969 . sum (sum [969] . carry (carr [969] (c[969])) ; xtfa i970 . sum(sum[970] -carry (carry [970] (c[970])) ; xtfa i97l . sum (su [971] -carry (carry [971] (c[971])) ; xtfa i972 . sum (sum [972] -carry (carry [972] (=[972])) ; xtfa i973 . sum (sum [973] . carry (carry [973 ] (c[973])); xtfa i974 .sum (sum [974] -carry (carry [974] (=[974])) ; xtfa i975 .sum (sum [975] . carry (carry [975] (c[975])) ; xtfa i976 . sum (sum [976] . carry (carry [976] (c[976])) ; xtfa i977 .sum (sum [977] . carry (carry [977] (c[977])>; xtfa 1978 .sum(sum[978] . carry (carry [978] (c[978])); xtfa i979 .sum (sum [979] . carry (carry [979] (C[979])) ; xtfa i980 .sum (sum [980] . carry ( carry [980] (=[980])) ; xtfa i981 .sum (sum [981] . carry (carry [981] (c[981])); xtfa i982 .sum (sum [982] . carry (carry [9δ2] (•=[982])); xtfa i983 .sum (su [983] . carry (carry [983] (c[983])); xtfa i9δ4 . sum (sum [984] . carry (carry [984] (c[984])); xtfa i9δ5 . sum (sum [985] . carry (carry [985]
(c[985]));
xtfa i986(.sum(sum[986]), .carry(carry[986]), .a(a[986]), .b(b[986]), .c(c[986]));
xtfa i987(.sum(sum[987]), .carry(carry[987]), .a(a[987]), .b(b[987]), .c(c[987]));
xtfa i988(.sum(sum[988]), .carry(carry[988]), .a(a[988]), .b(b[988]), .c(c[988]));
xtfa i989(.sum(sum[989]), .carry(carry[989]), .a(a[989]), .b(b[989]), .c(c[989]));
xtfa i990(.sum(sum[990]), .carry(carry[990]), .a(a[990]), .b(b[990]), .c(c[990]));
xtfa i991(.sum(sum[991]), .carry(carry[991]), .a(a[991]), .b(b[991]), .c(c[991]));
xtfa i992(.sum(sum[992]), .carry(carry[992]), .a(a[992]), .b(b[992]), .c(c[992]));
xtfa i993(.sum(sum[993]), .carry(carry[993]), .a(a[993]), .b(b[993]), .c(c[993]));
xtfa i994(.sum(sum[994]), .carry(carry[994]), .a(a[994]), .b(b[994]), .c(c[994]));
xtfa i995(.sum(sum[995]), .carry(carry[995]), .a(a[995]), .b(b[995]), .c(c[995]));
xtfa i996(.sum(sum[996]), .carry(carry[996]), .a(a[996]), .b(b[996]), .c(c[996]));
xtfa i997(.sum(sum[997]), .carry(carry[997]), .a(a[997]), .b(b[997]), .c(c[997]));
xtfa i998(.sum(sum[998]), .carry(carry[998]), .a(a[998]), .b(b[998]), .c(c[998]));
xtfa i999(.sum(sum[999]), .carry(carry[999]), .a(a[999]), .b(b[999]), .c(c[999]));
xtfa i1000(.sum(sum[1000]), .carry(carry[1000]), .a(a[1000]), .b(b[1000]), .c(c[1000]));
xtfa i1001(.sum(sum[1001]), .carry(carry[1001]), .a(a[1001]), .b(b[1001]), .c(c[1001]));
xtfa i1002(.sum(sum[1002]), .carry(carry[1002]), .a(a[1002]), .b(b[1002]), .c(c[1002]));
xtfa i1003(.sum(sum[1003]), .carry(carry[1003]), .a(a[1003]), .b(b[1003]), .c(c[1003]));
xtfa i1004(.sum(sum[1004]), .carry(carry[1004]), .a(a[1004]), .b(b[1004]), .c(c[1004]));
xtfa i1005(.sum(sum[1005]), .carry(carry[1005]), .a(a[1005]), .b(b[1005]), .c(c[1005]));
xtfa i1006(.sum(sum[1006]), .carry(carry[1006]), .a(a[1006]), .b(b[1006]), .c(c[1006]));
xtfa i1007(.sum(sum[1007]), .carry(carry[1007]), .a(a[1007]), .b(b[1007]), .c(c[1007]));
xtfa i1008(.sum(sum[1008]), .carry(carry[1008]), .a(a[1008]), .b(b[1008]), .c(c[1008]));
xtfa i1009(.sum(sum[1009]), .carry(carry[1009]), .a(a[1009]), .b(b[1009]), .c(c[1009]));
xtfa i1010(.sum(sum[1010]), .carry(carry[1010]), .a(a[1010]), .b(b[1010]), .c(c[1010]));
xtfa i1011(.sum(sum[1011]), .carry(carry[1011]), .a(a[1011]), .b(b[1011]), .c(c[1011]));
xtfa i1012(.sum(sum[1012]), .carry(carry[1012]), .a(a[1012]), .b(b[1012]), .c(c[1012]));
xtfa i1013(.sum(sum[1013]), .carry(carry[1013]), .a(a[1013]), .b(b[1013]), .c(c[1013]));
xtfa i1014(.sum(sum[1014]), .carry(carry[1014]), .a(a[1014]), .b(b[1014]), .c(c[1014]));
xtfa i1015(.sum(sum[1015]), .carry(carry[1015]), .a(a[1015]), .b(b[1015]), .c(c[1015]));
xtfa i1016(.sum(sum[1016]), .carry(carry[1016]), .a(a[1016]), .b(b[1016]), .c(c[1016]));
xtfa i1017(.sum(sum[1017]), .carry(carry[1017]), .a(a[1017]), .b(b[1017]), .c(c[1017]));
xtfa i1018(.sum(sum[1018]), .carry(carry[1018]), .a(a[1018]), .b(b[1018]), .c(c[1018]));
xtfa i1019(.sum(sum[1019]), .carry(carry[1019]), .a(a[1019]), .b(b[1019]), .c(c[1019]));
xtfa i1020(.sum(sum[1020]), .carry(carry[1020]), .a(a[1020]), .b(b[1020]), .c(c[1020]));
xtfa i1021(.sum(sum[1021]), .carry(carry[1021]), .a(a[1021]), .b(b[1021]), .c(c[1021]));
xtfa i1022(.sum(sum[1022]), .carry(carry[1022]), .a(a[1022]), .b(b[1022]), .c(c[1022]));
xtfa i1023(.sum(sum[1023]), .carry(carry[1023]), .a(a[1023]), .b(b[1023]), .c(c[1023]));
endmodule
// Local Variables: ***
// mode: verilog ***
// End: ***
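The array above instantiates one `xtfa` cell per bit, each with `.sum`, `.carry`, `.a`, `.b`, and `.c` connections. The listing here does not show the cell itself, so the following is only a minimal sketch of what such a cell might look like, assuming it is an ordinary one-bit full adder. The module names `xtfa_model` and `xtfa_model_tb` are hypothetical stand-ins and are not taken from the source.

```verilog
// Hypothetical stand-in for the xtfa cell. The assumption that xtfa is a
// plain full adder is inferred only from its port names in the listing.
module xtfa_model(sum, carry, a, b, c);
    output sum;
    output carry;
    input a;
    input b;
    input c;

    // Sum bit: parity of the three inputs.
    assign sum = a ^ b ^ c;
    // Carry bit: set when at least two of the three inputs are 1.
    assign carry = (a & b) | (a & c) | (b & c);
endmodule

// Self-checking testbench: exhaustively compares {carry, sum} against
// the arithmetic sum of the three input bits.
module xtfa_model_tb;
    reg [2:0] abc;
    wire sum;
    wire carry;
    integer i;

    xtfa_model dut(.sum(sum), .carry(carry), .a(abc[2]), .b(abc[1]), .c(abc[0]));

    initial begin
        for (i = 0; i < 8; i = i + 1) begin
            abc = i;
            #1;
            if ({carry, sum} !== (abc[2] + abc[1] + abc[0]))
                $display("FAIL: abc=%b sum=%b carry=%b", abc, sum, carry);
        end
        $display("xtfa_model check complete");
        $finish;
    end
endmodule
```

Because every bit position has its own independent cell, the array as a whole reduces three operand vectors to a sum vector and a carry vector without propagating carries between bits.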
verysys/verify_sem.v

module xmTIE_gf_Regfile(rd0_data_C1, rd0_addr_C0, rd0_width8_C0, rd0_use1_C0,
    rd1_data_C1, rd1_addr_C0, rd1_width8_C0, rd1_use1_C0,
    rd2_data_C1, rd2_addr_C0, rd2_width8_C0, rd2_use1_C0,
    wd_addr_C0, wd_width8_C0, wd_def1_C0, wd_def2_C0,
    wd_data8_C1, wd_data8_C2, wd_wen_C1, wd_wen_C2,
    Kill_E, KillPipe_W, Stall_R, clk);
    output [7:0] rd0_data_C1;
    input [3:0] rd0_addr_C0;
    input rd0_width8_C0;
    input rd0_use1_C0;
    output [7:0] rd1_data_C1;
    input [3:0] rd1_addr_C0;
    input rd1_width8_C0;
    input rd1_use1_C0;
    output [7:0] rd2_data_C1;
    input [3:0] rd2_addr_C0;
    input rd2_width8_C0;
    input rd2_use1_C0;
    input [3:0] wd_addr_C0;
    input wd_width8_C0;
    input wd_def1_C0;
    input wd_def2_C0;
    input [7:0] wd_data8_C1;
    input [7:0] wd_data8_C2;
    input wd_wen_C1;
    input wd_wen_C2;
    input Kill_E;
    input KillPipe_W;
    output Stall_R;
    input clk;

    /***********************************************************************
                               READ PORT rd0
     ***********************************************************************/
    // compute the address mask
    wire rd0_addr_mask_C0 = 1'd0;

    // masked address pipeline
    wire rd0_maddr_C0 = 1'd0;

    // bank-qualified use
    wire rd0_use1_bank0_C0 = (rd0_use1_C0 & (rd0_maddr_C0 == (1'd0 & rd0_addr_mask_C0)));

    // alignment mux for use 1
    wire [7:0] rd0_data_bank0_C1;
    assign rd0_data_C1[7:0] = rd0_data_bank0_C1;

    /***********************************************************************
                               READ PORT rd1
     ***********************************************************************/
    // compute the address mask
    wire rd1_addr_mask_C0 = 1'd0;

    // masked address pipeline
    wire rd1_maddr_C0 = 1'd0;

    // bank-qualified use
    wire rd1_use1_bank0_C0 = (rd1_use1_C0 & (rd1_maddr_C0 == (1'd0 & rd1_addr_mask_C0)));

    // alignment mux for use 1
    wire [7:0] rd1_data_bank0_C1;
    assign rd1_data_C1[7:0] = rd1_data_bank0_C1;

    /***********************************************************************
                               READ PORT rd2
     ***********************************************************************/
    // compute the address mask
    wire rd2_addr_mask_C0 = 1'd0;

    // masked address pipeline
    wire rd2_maddr_C0 = 1'd0;

    // bank-qualified use
    wire rd2_use1_bank0_C0 = (rd2_use1_C0 & (rd2_maddr_C0 == (1'd0 & rd2_addr_mask_C0)));

    // alignment mux for use 1
    wire [7:0] rd2_data_bank0_C1;
    assign rd2_data_C1[7:0] = rd2_data_bank0_C1;

    /***********************************************************************
                               WRITE PORT wd
     ***********************************************************************/
    // compute the address mask
    wire wd_addr_mask_C0 = 1'd0;

    // bank-qualified write def for port wd
    wire wd_def1_bank0_C0 = (wd_def1_C0 & ((wd_addr_C0 & wd_addr_mask_C0) == (1'd0 & wd_addr_mask_C0)));
    wire wd_def2_bank0_C0 = (wd_def2_C0 & ((wd_addr_C0 & wd_addr_mask_C0) == (1'd0 & wd_addr_mask_C0)));

    // write mux for def 1
    wire [7:0] wd_wdata_C1;
    assign wd_wdata_C1 = {1{wd_data8_C1[7:0]}};

    // write mux for def 2
    wire [7:0] wd_wdata_C2;
    assign wd_wdata_C2 = {1{wd_data8_C2[7:0]}};

    wire Stall_R0;

    /***********************************************************************
                               PIPELINED BANK
     ***********************************************************************/
    xmTIE_gf_Regfile_bank TIE_gf_Regfile_bank0(rd0_data_bank0_C1,
        rd0_addr_C0[3:0], rd0_use1_bank0_C0, rd1_data_bank0_C1,
        rd1_addr_C0[3:0], rd1_use1_bank0_C0, rd2_data_bank0_C1,
        rd2_addr_C0[3:0], rd2_use1_bank0_C0, wd_addr_C0[3:0],
        wd_def1_bank0_C0, wd_def2_bank0_C0, wd_wdata_C1[7:0], wd_wdata_C2[7:0],
        wd_wen_C1, wd_wen_C2, Kill_E, KillPipe_W, Stall_R0, clk);

    assign Stall_R = Stall_R0 | 1'b0;
endmodule
module xmTIE_gf_Regfile_bank(rd0_data_C1, rd0_addr_C0, rd0_use1_C0,
    rd1_data_C1, rd1_addr_C0, rd1_use1_C0,
    rd2_data_C1, rd2_addr_C0, rd2_use1_C0,
    wd_addr_C0, wd_def1_C0, wd_def2_C0, wd_data_C1, wd_data_C2,
    wd_wen_C1, wd_wen_C2, Kill_E, KillPipe_W, Stall_R, clk);
    output [7:0] rd0_data_C1;
    input [3:0] rd0_addr_C0;
    input rd0_use1_C0;
    output [7:0] rd1_data_C1;
    input [3:0] rd1_addr_C0;
    input rd1_use1_C0;
    output [7:0] rd2_data_C1;
    input [3:0] rd2_addr_C0;
    input rd2_use1_C0;
    input [3:0] wd_addr_C0;
    input wd_def1_C0;
    input wd_def2_C0;
    input [7:0] wd_data_C1;
    input [7:0] wd_data_C2;
    input wd_wen_C1;
    input wd_wen_C2;
    input Kill_E;
    input KillPipe_W;
    output Stall_R;
    input clk;

    wire rd0_use2_C0 = 1'd0;
    wire rd1_use2_C0 = 1'd0;
    wire rd2_use2_C0 = 1'd0;
    wire kill_C0 = KillPipe_W;
    wire kill_C1 = KillPipe_W | Kill_E;
    wire kill_C2 = KillPipe_W;
    wire kill_C3 = KillPipe_W;

    // write definition pipeline
    wire wd_ns_def1_C0 = wd_def1_C0 & 1'b1 & ~kill_C0;
    wire wd_def1_C1;
    xtdelay1 #(1) iwd_def1_C1(wd_def1_C1, wd_ns_def1_C0, clk);
    wire wd_ns_def2_C0 = wd_def2_C0 & 1'b1 & ~kill_C0;
    wire wd_def2_C1;
    xtdelay1 #(1) iwd_def2_C1(wd_def2_C1, wd_ns_def2_C0, clk);
    wire wd_ns_def2_C1 = wd_def2_C1 & wd_wen_C1 & ~kill_C1;
    wire wd_def2_C2;
    xtdelay1 #(1) iwd_def2_C2(wd_def2_C2, wd_ns_def2_C1, clk);

    // write enable pipeline
    wire wd_we_C2;
    wire wd_we_C3;
    wire wd_ns_we_C1 = (1'd0 | (wd_def1_C1 & wd_wen_C1)) & ~kill_C1;
    wire wd_ns_we_C2 = (wd_we_C2 | (wd_def2_C2 & wd_wen_C2)) & ~kill_C2;
    wire wd_ns_we_C3 = (wd_we_C3 | (1'd0 & 1'd0)) & ~kill_C3;
    xtdelay1 #(1) iwd_we_C2(wd_we_C2, wd_ns_we_C1, clk);
    xtdelay1 #(1) iwd_we_C3(wd_we_C3, wd_ns_we_C2, clk);

    // write address pipeline
    wire [3:0] wd_addr_C1;
    wire [3:0] wd_addr_C2;
    wire [3:0] wd_addr_C3;
    xtdelay1 #(4) iwd_addr_C1(wd_addr_C1, wd_addr_C0, clk);
    xtdelay1 #(4) iwd_addr_C2(wd_addr_C2, wd_addr_C1, clk);
    xtdelay1 #(4) iwd_addr_C3(wd_addr_C3, wd_addr_C2, clk);

    // write data pipeline
    wire [7:0] wd_result_C2;
    wire [7:0] wd_result_C3;
    wire [7:0] wd_mux_C1 = wd_data_C1;
    wire [7:0] wd_mux_C2 = wd_def2_C2 ? wd_data_C2 : wd_result_C2;
    xtdelay1 #(8) iwd_result_C2(wd_result_C2, wd_mux_C1, clk);
    xtdelay1 #(8) iwd_result_C3(wd_result_C3, wd_mux_C2, clk);

    wire [7:0] rd0_data_C0;
    wire [7:0] rd1_data_C0;
    wire [7:0] rd2_data_C0;
    xtdelay1 #(8) ird0_data_C1(rd0_data_C1, rd0_data_C0, clk);
    xtdelay1 #(8) ird1_data_C1(rd1_data_C1, rd1_data_C0, clk);
    xtdelay1 #(8) ird2_data_C1(rd2_data_C1, rd2_data_C0, clk);

    assign Stall_R =
        ((wd_addr_C1 == rd0_addr_C0) & (
            (rd0_use1_C0 & (wd_ns_def2_C1)))) |
        ((wd_addr_C1 == rd1_addr_C0) & (
            (rd1_use1_C0 & (wd_ns_def2_C1)))) |
        ((wd_addr_C1 == rd2_addr_C0) & (
            (rd2_use1_C0 & (wd_ns_def2_C1)))) |
        1'b0;

    // verification register file replacement
    wire [7:0] xwd_verify;
    xtenflop #(8) wd_verify(xwd_verify, wd_result_C3, wd_ns_we_C3, clk);
    xtflop #(8) rd0_verify(rd0_data_C0, xwd_verify, clk);
    xtflop #(8) rd1_verify(rd1_data_C0, xwd_verify, clk);
    xtflop #(8) rd2_verify(rd2_data_C0, xwd_verify, clk);
endmodule
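The bank module above relies on the primitives `xtdelay1`, `xtflop`, and `xtenflop`, which are not defined in this part of the listing. As a reading aid only, here is a minimal sketch of how such primitives might behave. It assumes the positional port order `(out, in, clk)` and `(out, in, enable, clk)` seen in the instantiations and a single positional width parameter. The module names with the `_model` suffix and the port names `xtout`, `xtin`, and `en` are hypothetical.

```verilog
// Hypothetical stand-in for xtdelay1 (and, under the same assumption,
// xtflop): a one-cycle delay register of parameterized width.
module xtdelay1_model(xtout, xtin, clk);
    parameter size = 1;
    output [size-1:0] xtout;
    input  [size-1:0] xtin;
    input  clk;
    reg    [size-1:0] xtout;

    // Capture the input on every rising clock edge.
    always @(posedge clk)
        xtout <= xtin;
endmodule

// Hypothetical stand-in for xtenflop: a register that only updates
// when the enable input is high, and otherwise holds its value.
module xtenflop_model(xtout, xtin, en, clk);
    parameter size = 1;
    output [size-1:0] xtout;
    input  [size-1:0] xtin;
    input  en;
    input  clk;
    reg    [size-1:0] xtout;

    always @(posedge clk)
        if (en)
            xtout <= xtin;
endmodule
```

Under these assumptions, an instantiation such as `xtdelay1 #(4) iwd_addr_C1(wd_addr_C1, wd_addr_C0, clk)` moves the 4-bit write address from stage C0 to stage C1, and the `xtenflop` in the verification block latches `wd_result_C3` only when `wd_ns_we_C3` is asserted.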
module xmTIE_gfmod_State(ps_data_C1, ps_width8_C0, ps_use1_C0,
    ns_width8_C0, ns_def1_C0, ns_data8_C1, ns_wen_C1,
    Kill_E, KillPipe_W, Stall_R, clk);
    output [7:0] ps_data_C1;
    input ps_width8_C0;
    input ps_use1_C0;
    input ns_width8_C0;
    input ns_def1_C0;
    input [7:0] ns_data8_C1;
    input ns_wen_C1;
    input Kill_E;
    input KillPipe_W;
    output Stall_R;
    input clk;

    wire ps_addr_C0 = 1'd0;
    wire ns_addr_C0 = 1'd0;
    wire ns_wen_C2 = 1'd1;

    /***********************************************************************
                               READ PORT ps
     ***********************************************************************/
    // compute the address mask
    wire ps_addr_mask_C0 = 1'd0;

    // masked address pipeline
    wire ps_maddr_C0 = 1'd0;

    // bank-qualified use
    wire ps_use1_bank0_C0 = (ps_use1_C0 & (ps_maddr_C0 == (1'd0 & ps_addr_mask_C0)));

    // alignment mux for use 1
    wire [7:0] ps_data_bank0_C1;
    assign ps_data_C1[7:0] = ps_data_bank0_C1;

    /***********************************************************************
                               WRITE PORT ns
     ***********************************************************************/
    // compute the address mask
    wire ns_addr_mask_C0 = 1'd0;

    // bank-qualified write def for port ns
    wire ns_def1_bank0_C0 = (ns_def1_C0 & ((ns_addr_C0 & ns_addr_mask_C0) == (1'd0 & ns_addr_mask_C0)));

    // write mux for def 1
    wire [7:0] ns_wdata_C1;
    assign ns_wdata_C1 = {1{ns_data8_C1[7:0]}};

    wire Stall_R0;

    /***********************************************************************
                               PIPELINED BANK
     ***********************************************************************/
    xmTIE_gfmod_State_bank TIE_gfmod_State_bank0(ps_data_bank0_C1,
        ps_use1_bank0_C0, ns_def1_bank0_C0, ns_wdata_C1[7:0],
        ns_wen_C1, ns_wen_C2, Kill_E, KillPipe_W, Stall_R0, clk);

    assign Stall_R = Stall_R0 | 1'b0;
endmodule
module xmTIE_gfmod_State_bank(ps_data_C1, ps_use1_C0, ns_def1_C0, ns_data_C1,
    ns_wen_C1, ns_wen_C2, Kill_E, KillPipe_W, Stall_R, clk);
    output [7:0] ps_data_C1;
    input ps_use1_C0;
    input ns_def1_C0;
    input [7:0] ns_data_C1;
    input ns_wen_C1;
    input ns_wen_C2;
    input Kill_E;
    input KillPipe_W;
    output Stall_R;
    input clk;

    wire ps_addr_C0 = 1'd0;
    wire ps_use2_C0 = 1'd0;
    wire ns_addr_C0 = 1'd0;
    wire ns_def2_C0 = 1'd0;
    wire [7:0] ns_data_C2 = 0;
    wire kill_C0 = KillPipe_W;
    wire kill_C1 = KillPipe_W | Kill_E;
    wire kill_C2 = KillPipe_W;
    wire kill_C3 = KillPipe_W;

    // write definition pipeline
    wire ns_ns_def1_C0 = ns_def1_C0 & 1'b1 & ~kill_C0;
    wire ns_def1_C1;
    xtdelay1 #(1) ins_def1_C1(ns_def1_C1, ns_ns_def1_C0, clk);
    wire ns_ns_def2_C0 = 1'd0;
    wire ns_def2_C1 = 1'd0;
    wire ns_ns_def2_C1 = 1'd0;
    wire ns_def2_C2 = 1'd0;

    // write enable pipeline
    wire ns_we_C2;
    wire ns_we_C3;
    wire ns_ns_we_C1 = (1'd0 | (ns_def1_C1 & ns_wen_C1)) & ~kill_C1;
    wire ns_ns_we_C2 = (ns_we_C2 | (ns_def2_C2 & ns_wen_C2)) & ~kill_C2;
    wire ns_ns_we_C3 = (ns_we_C3 | (1'd0 & 1'd0)) & ~kill_C3;
    xtdelay1 #(1) ins_we_C2(ns_we_C2, ns_ns_we_C1, clk);
    xtdelay1 #(1) ins_we_C3(ns_we_C3, ns_ns_we_C2, clk);

    // write address pipeline
    wire ns_addr_C1;
    wire ns_addr_C2;
    wire ns_addr_C3;
    assign ns_addr_C1 = 1'd0;
    assign ns_addr_C2 = 1'd0;
    assign ns_addr_C3 = 1'd0;

    // write data pipeline
    wire [7:0] ns_result_C2;
    wire [7:0] ns_result_C3;
    wire [7:0] ns_mux_C1 = ns_data_C1;
    wire [7:0] ns_mux_C2 = ns_def2_C2 ? ns_data_C2 : ns_result_C2;
    xtdelay1 #(8) ins_result_C2(ns_result_C2, ns_mux_C1, clk);
    xtdelay1 #(8) ins_result_C3(ns_result_C3, ns_mux_C2, clk);

    wire [7:0] ps_data_C0;
    xtdelay1 #(8) ips_data_C1(ps_data_C1, ps_data_C0, clk);

    assign Stall_R =
        ((ns_addr_C1 == ps_addr_C0) & (
            (ps_use1_C0 & (ns_ns_def2_C1)))) |
        1'b0;

    // verification register file replacement
    wire [7:0] xns_verify;
    xtenflop #(8) ns_verify(xns_verify, ns_result_C3, ns_ns_we_C3, clk);
    xtflop #(8) ps_verify(ps_data_C0, xns_verify, clk);
endmodule
module xmTIE_decoder (
GFADD8,
GFADD8I,
GFMULX8,
GFRWMOD8,
LGF8_I,
SGF8_I,
LGF8_IU,
SGF8_IU,
LGF8_X,
SGF8_X,
LGF8_XU,
SGF8_XU,
Figure imgf000288_0001
output gf4_semantic;
output gf2_semantic;
output gf3_semantic;
output lgf_semantic;
output sgf_semantic;
output RUR0_semantic;
output WUR0_semantic;
output load_instruction;
output store_instruction;
output TIE_Inst;
input [23:0] Inst;
wire [3:0] op2 = {Inst[23:20]};
wire [3:0] op1 = {Inst[19:16]};
wire [3:0] op0 = {Inst[3:0]};
wire QRST = (op0==4'b0000);
wire CUST0 = (op1==4'b0110) & QRST;
assign GFADD8 = (op2==4'b0000) & CUST0;
assign GFADD8I = (op2==4'b0100) & CUST0;
assign GFMULX8 = (op2==4'b0001) & CUST0;
assign GFRWMOD8 = (op2==4'b0010) & CUST0;
wire [3:0] r = {Inst[15:12]};
wire LSCI = (op0==4'b0011);
assign LGF8_I = (r==4'b0000) & LSCI;
assign SGF8_I = (r==4'b0001) & LSCI;
assign LGF8_IU = (r==4'b0010) & LSCI;
assign SGF8_IU = (r==4'b0011) & LSCI;
wire LSCX = (op1==4'b1000) & QRST;
assign LGF8_X = (op2==4'b0000) & LSCX;
assign SGF8_X = (op2==4'b0001) & LSCX;
assign LGF8_XU = (op2==4'b0010) & LSCX;
assign SGF8_XU = (op2==4'b0011) & LSCX;
wire [3:0] s = {Inst[11:8]};
wire [3:0] t = {Inst[7:4]};
wire [7:0] st = {s,t};
wire RST3 = (op1==4'b0011) & QRST;
wire RUR = (op2==4'b1110) & RST3;
assign RUR0 = (st==8'b00000000) & RUR;
wire [7:0] sr = {r,s};
wire WUR = (op2==4'b1111) & RST3;
assign WUR0 = (sr==8'b00000000) & WUR;
assign gfmod_use1 = GFMULX8 | GFRWMOD8 | RUR0 | 1'b0;
assign gfmod_def1 = GFRWMOD8 | WUR0 | 1'b0;
assign AR_rd0_use1 = 1'b0 | LGF8_I | SGF8_I | LGF8_IU | SGF8_IU | LGF8_X | SGF8_X | LGF8_XU | SGF8_XU;
assign AR_rd0_width32 = 1'b0;
assign AR_rd1_use1 = 1'b0 | LGF8_X | SGF8_X | LGF8_XU | SGF8_XU | WUR0;
assign AR_rd1_width32 = 1'b0;
assign AR_wd_def1 = 1'b0 | LGF8_IU | SGF8_IU | LGF8_XU | SGF8_XU | RUR0;
assign AR_wd_width32 = 1'b0;
assign gf_rd0_use1 = 1'b0 | GFADD8 | GFADD8I | GFMULX8;
assign gf_rd0_width8 = 1'b0;
assign gf_rd1_use1 = 1'b0 | GFADD8 | GFRWMOD8 | SGF8_I | SGF8_IU;
assign gf_rd1_width8 = 1'b0;
assign gf_rd2_use1 = 1'b0 | SGF8_X | SGF8_XU;
assign gf_rd2_width8 = 1'b0;
assign gf_wd_def2 = 1'b0 | LGF8_I | LGF8_IU | LGF8_X | LGF8_XU;
assign gf_wd_def1 = 1'b0 | GFADD8 | GFADD8I | GFMULX8 | GFRWMOD8;
assign gf_wd_width8 = 1'b0;
assign art_def = 1'b0;
assign art_use = LGF8_X | SGF8_X | LGF8_XU | SGF8_XU | WUR0 | 1'b0;
assign ars_def = LGF8_IU | SGF8_IU | LGF8_XU | SGF8_XU | 1'b0;
assign ars_use = LGF8_I | SGF8_I | LGF8_IU | SGF8_IU | LGF8_X | SGF8_X | LGF8_XU | SGF8_XU | 1'b0;
assign arr_def = RUR0 | 1'b0;
assign arr_use = 1'b0;
assign br_def = 1'b0;
assign br_use = 1'b0;
assign bs_def = 1'b0;
assign bs_use = 1'b0;
assign bt_def = 1'b0;
assign bt_use = 1'b0;
assign bs4_def = 1'b0;
assign bs4_use = 1'b0;
assign bs8_def = 1'b0;
assign bs8_use = 1'b0;
assign gr_def = GFADD8 | GFADD8I | GFMULX8 | LGF8_X | LGF8_XU | 1'b0;
assign gr_use = SGF8_X | SGF8_XU | 1'b0;
assign gs_def = 1'b0;
assign gs_use = GFADD8 | GFADD8I | GFMULX8 | 1'b0;
assign gt_def = GFRWMOD8 | LGF8_I | LGF8_IU | 1'b0;
assign gt_use = GFADD8 | GFRWMOD8 | SGF8_I | SGF8_IU | 1'b0;
wire [3:0] gr_addr = r;
wire [3:0] gs_addr = s;
wire [3:0] gt_addr = t;
assign gf_wd_addr = 4'b0 | {4{gr_def}} & gr_addr | {4{gt_def}} & gt_addr;
assign gf_rd0_addr = gs_addr;
assign gf_rd1_addr = gt_addr;
assign gf_rd2_addr = gr_addr;
assign gf1_semantic = GFADD8 | 1'b0;
assign gf4_semantic = GFADD8I | 1'b0;
assign gf2_semantic = GFMULX8 | 1'b0;
assign gf3_semantic = GFRWMOD8 | 1'b0;
assign lgf_semantic = LGF8_I | LGF8_IU | LGF8_X | LGF8_XU | 1'b0;
assign sgf_semantic = SGF8_I | SGF8_IU | SGF8_X | SGF8_XU | 1'b0;
assign RUR0_semantic = RUR0 | 1'b0;
assign WUR0_semantic = WUR0 | 1'b0;
assign imm4 = t;
wire [7:0] imm8 = {Inst[23:16]};
assign load_instruction = 1'b0 | LGF8_I | LGF8_IU | LGF8_X | LGF8_XU;
assign store_instruction = 1'b0 | SGF8_I | SGF8_IU | SGF8_X | SGF8_XU;
assign TIE_Inst = 1'b0 | GFADD8 | GFADD8I | GFMULX8 | GFRWMOD8 | LGF8_I | SGF8_I | LGF8_IU | SGF8_IU | LGF8_X | SGF8_X | LGF8_XU | SGF8_XU | RUR0 | WUR0;
endmodule
module xmTIE_gf1 (GFADD8_C0, gr_o_C1, gr_kill_C1, gs_i_C1, gt_i_C1, clk
);
input GFADD8_C0;
output [7:0] gr_o_C1;
output gr_kill_C1;
input [7:0] gs_i_C1;
input [7:0] gt_i_C1;
input clk;
assign gr_o_C1 = (gs_i_C1) ^ (gt_i_C1);
wire GFADD8_C1;
xtdelay1 #(1) iGFADD8_C1(.xtin(GFADD8_C0), .xtout(GFADD8_C1), .clk(clk));
assign gr_kill_C1 = (1'b0) & (GFADD8_C1);
endmodule
module xmTIE_gf4 (
GFADD8I_C0, gr_o_C1, gr_kill_C1, gs_i_C1, imm4_C0, clk
);
input GFADD8I_C0;
output [7:0] gr_o_C1;
output gr_kill_C1;
input [7:0] gs_i_C1;
input [31:0] imm4_C0;
input clk;
wire [31:0] imm4_C1;
xtdelay1 #(32) iimm4_C1(.xtin(imm4_C0), .xtout(imm4_C1), .clk(clk));
assign gr_o_C1 = (gs_i_C1) ^ (imm4_C1);
wire GFADD8I_C1;
xtdelay1 #(1) iGFADD8I_C1(.xtin(GFADD8I_C0), .xtout(GFADD8I_C1), .clk(clk));
assign gr_kill_C1 = (1'b0) & (GFADD8I_C1);
endmodule
module xmTIE_gf2 (GFMULX8_C0, gr_o_C1, gr_kill_C1, gs_i_C1, gfmod_ps_C1, clk
: 0], 1'b0}) (gfmod_ps_C1))
.xtout(GFMULX8_C1),
Figure imgf000293_0001
module xmTIE_gf3 (
GFRWMOD8_C0, gt_i_C1, gt_o_C1, gt_kill_C1, gfmod_ps_C1, gfmod_ns_C1, gfmod_kill_C1, clk
);
input GFRWMOD8_C0;
input [7:0] gt_i_C1;
output [7:0] gt_o_C1;
output gt_kill_C1;
input [7:0] gfmod_ps_C1;
output [7:0] gfmod_ns_C1;
output gfmod_kill_C1;
input clk;
wire [7:0] t1_C1;
assign t1_C1 = gt_i_C1;
wire [7:0] t2_C1;
assign t2_C1 = gfmod_ps_C1;
assign gfmod_ns_C1 = t1_C1;
assign gt_o_C1 = t2_C1;
wire GFRWMOD8_C1;
xtdelay1 #(1) iGFRWMOD8_C1(.xtin(GFRWMOD8_C0), .xtout(GFRWMOD8_C1), .clk(clk));
assign gfmod_kill_C1 = (1'b0) & (GFRWMOD8_C1);
assign gt_kill_C1 = (1'b0) & (GFRWMOD8_C1);
endmodule
module xmTIE_lgf (
LGF8_I_C0,
LGF8_IU_C0,
LGF8_X_C0,
LGF8_XU_C0,
gt_o_C2, gt_kill_C2, ars_i_C1, ars_o_C1, ars_kill_C1, imm8_C0, gr_o_C2, gr_kill_C2, art_i_C1,
MemDataIn8_C2,
VAddrIn_C1,
LSSize_C0,
VAddrBase_C1,
VAddrIndex_C1,
VAddrOffset_C0,
LSIndexed_C0, clk
);
input LGF8_I_C0;
input LGF8_IU_C0;
input LGF8_X_C0;
input LGF8_XU_C0;
output [7:0] gt_o_C2;
output gt_kill_C2;
input [31:0] ars_i_C1;
output [31:0] ars_o_C1;
output ars_kill_C1;
input [7:0] imm8_C0;
output [7:0] gr_o_C2;
output gr_kill_C2;
input [31:0] art_i_C1;
input [7:0] MemDataIn8_C2;
input [31:0] VAddrIn_C1;
output [4:0] LSSize_C0;
output [31:0] VAddrBase_C1;
output [31:0] VAddrIndex_C1;
output [31:0] VAddrOffset_C0;
output LSIndexed_C0;
input clk;
wire indexed_C0;
assign indexed_C0 = (LGF8_X_C0) | (LGF8_XU_C0);
assign LSSize_C0 = 32'h1;
assign VAddrBase_C1 = ars_i_C1;
assign LSIndexed_C0 = indexed_C0;
assign VAddrOffset_C0 = imm8_C0;
assign VAddrIndex_C1 = art_i_C1;
assign gt_o_C2 = MemDataIn8_C2;
assign gr_o_C2 = MemDataIn8_C2;
assign ars_o_C1 = VAddrIn_C1;
wire LGF8_I_C2;
xtdelay2 #(1) iLGF8_I_C2(.xtin(LGF8_I_C0), .xtout(LGF8_I_C2), .clk(clk));
wire LGF8_IU_C2;
xtdelay2 #(1) iLGF8_IU_C2(.xtin(LGF8_IU_C0), .xtout(LGF8_IU_C2), .clk(clk));
assign gt_kill_C2 = (1'b0) & ((LGF8_I_C2) | (LGF8_IU_C2));
wire LGF8_IU_C1;
xtdelay1 #(1) iLGF8_IU_C1(.xtin(LGF8_IU_C0), .xtout(LGF8_IU_C1), .clk(clk));
wire LGF8_XU_C1;
xtdelay1 #(1) iLGF8_XU_C1(.xtin(LGF8_XU_C0), .xtout(LGF8_XU_C1), .clk(clk));
assign ars_kill_C1 = (1'b0) & ((LGF8_IU_C1) | (LGF8_XU_C1));
wire LGF8_X_C2;
xtdelay2 #(1) iLGF8_X_C2(.xtin(LGF8_X_C0), .xtout(LGF8_X_C2), .clk(clk));
wire LGF8_XU_C2;
xtdelay2 #(1) iLGF8_XU_C2(.xtin(LGF8_XU_C0), .xtout(LGF8_XU_C2), .clk(clk));
assign gr_kill_C2 = (1'b0) & ((LGF8_X_C2) | (LGF8_XU_C2));
endmodule
module xmTIE_sgf (
SGF8_I_C0,
SGF8_IU_C0,
SGF8_X_C0,
SGF8_XU_C0,
gt_i_C1, ars_i_C1, ars_o_C1, ars_kill_C1, imm8_C0, gr_i_C1, art_i_C1,
VAddrIn_C1,
LSSize_C0,
MemDataOut8_C1,
VAddrBase_C1,
VAddrIndex_C1,
VAddrOffset_C0,
LSIndexed_C0, clk
);
input SGF8_I_C0;
input SGF8_IU_C0;
input SGF8_X_C0;
input SGF8_XU_C0;
input [7:0] gt_i_C1;
input [31:0] ars_i_C1;
output [31:0] ars_o_C1;
output ars_kill_C1;
input [7:0] imm8_C0;
input [7:0] gr_i_C1;
input [31:0] art_i_C1;
input [31:0] VAddrIn_C1;
output [4:0] LSSize_C0;
output [7:0] MemDataOut8_C1;
output [31:0] VAddrBase_C1;
output [31:0] VAddrIndex_C1;
output [31:0] VAddrOffset_C0;
output LSIndexed_C0;
input clk;
wire indexed_C0;
assign indexed_C0 = (SGF8_X_C0) | (SGF8_XU_C0);
assign LSSize_C0 = 32'h1;
assign VAddrBase_C1 = ars_i_C1;
assign LSIndexed_C0 = indexed_C0;
assign VAddrOffset_C0 = imm8_C0;
assign VAddrIndex_C1 = art_i_C1;
wire SGF8_X_C1;
xtdelay1 #(1) iSGF8_X_C1(.xtin(SGF8_X_C0), .xtout(SGF8_X_C1), .clk(clk));
wire SGF8_XU_C1;
xtdelay1 #(1) iSGF8_XU_C1(.xtin(SGF8_XU_C0), .xtout(SGF8_XU_C1), .clk(clk));
assign MemDataOut8_C1 = ((SGF8_X_C1) | (SGF8_XU_C1)) ? (gr_i_C1) : (gt_i_C1);
assign ars_o_C1 = VAddrIn_C1;
wire SGF8_IU_C1;
xtdelay1 #(1) iSGF8_IU_C1(.xtin(SGF8_IU_C0), .xtout(SGF8_IU_C1), .clk(clk));
assign ars_kill_C1 = (1'b0) & ((SGF8_IU_C1) | (SGF8_XU_C1));
endmodule
module xmTIE_RUR0 (
RUR0_C0, arr_o_C1, arr_kill_C1, gfmod_ps_C1, clk
);
input RUR0_C0;
output [31:0] arr_o_C1;
output arr_kill_C1;
input [7:0] gfmod_ps_C1;
input clk;
assign arr_o_C1 = {gfmod_ps_C1};
wire RUR0_C1;
xtdelay1 #(1) iRUR0_C1(.xtin(RUR0_C0), .xtout(RUR0_C1), .clk(clk));
assign arr_kill_C1 = (1'b0) & (RUR0_C1);
endmodule
module xmTIE_WUR0 (
WUR0_C0, art_i_C1, gfmod_ns_C1, gfmod_kill_C1, clk
);
input WUR0_C0;
input [31:0] art_i_C1;
output [7:0] gfmod_ns_C1;
output gfmod_kill_C1;
input clk;
assign gfmod_ns_C1 = {art_i_C1[7:0]};
wire WUR0_C1;
xtdelay1 #(1) iWUR0_C1(.xtin(WUR0_C0), .xtout(WUR0_C1), .clk(clk));
assign gfmod_kill_C1 = (1'b0) & (WUR0_C1);
endmodule
module xmTIE (
TIE_inst_R,
TIE_asRead_R,
TIE_atRead_R,
TIE_atWrite_R,
TIE_arWrite_R,
TIE_asWrite_R,
TIE_aWriteM_R,
TIE_aDataKill_E,
TIE_aWriteData_E,
TIE_aDataKill_M,
TIE_aWriteData_M,
TIE_Load_R,
TIE_Store_R,
TIE_LSSize_R,
TIE_LSIndexed_R,
TIE_LSOffset_R,
TIE_MemLoadData_M,
TIE_MemStoreData8_E,
TIE_MemStoreData16_E,
TIE_MemStoreData32_E,
TIE_MemStoreData64_E,
TIE_MemStoreData128_E,
TIE_Stall_R,
TIE_Exception_E,
TIE_ExcCause_E,
TIE_bsRead_R,
TIE_btRead_R,
TIE_btWrite_R,
TIE_brWrite_R,
TIE_bsWrite_R,
TIE_bsReadSize_R,
TIE_btReadSize_R,
TIE_bWriteSize_R,
TIE_bsReadData_E,
TIE_btReadData_E,
TIE_bWriteData1_E,
TIE_bWriteData2_E,
TIE_bWriteData4_E,
TIE_bWriteData8_E,
TIE_bWriteData16_E,
TIE_bDataKill_E,
CPEnable,
Instr_R,
SBus_E,
TBus_E,
MemOpAddr_E,
Kill_E,
Except_W,
Replay_W, GIWCLK, Reset
);
output TIE_inst_R;
output TIE_asRead_R;
output TIE_atRead_R;
output TIE_atWrite_R;
output TIE_arWrite_R;
output TIE_asWrite_R;
output TIE_aWriteM_R;
output TIE_aDataKill_E;
output [31:0] TIE_aWriteData_E;
output TIE_aDataKill_M;
output [31:0] TIE_aWriteData_M;
output TIE_Load_R;
output TIE_Store_R;
output [4:0] TIE_LSSize_R;
output TIE_LSIndexed_R;
output [31:0] TIE_LSOffset_R;
input [127:0] TIE_MemLoadData_M;
output [7:0] TIE_MemStoreData8_E;
output [15:0] TIE_MemStoreData16_E;
output [31:0] TIE_MemStoreData32_E;
output [63:0] TIE_MemStoreData64_E;
output [127:0] TIE_MemStoreData128_E;
output TIE_Stall_R;
output TIE_Exception_E;
output [5:0] TIE_ExcCause_E;
output TIE_bsRead_R;
output TIE_btRead_R;
output TIE_btWrite_R;
output TIE_brWrite_R;
output TIE_bsWrite_R;
output [4:0] TIE_bsReadSize_R;
output [4:0] TIE_btReadSize_R;
output [4:0] TIE_bWriteSize_R;
input [15:0] TIE_bsReadData_E;
input [15:0] TIE_btReadData_E;
output TIE_bWriteData1_E;
output [1:0] TIE_bWriteData2_E;
output [3:0] TIE_bWriteData4_E;
output [7:0] TIE_bWriteData8_E;
output [15:0] TIE_bWriteData16_E;
output TIE_bDataKill_E;
input [7:0] CPEnable;
input [23:0] Instr_R;
input [31:0] SBus_E;
input [31:0] TBus_E;
input [31:0] MemOpAddr_E;
input Kill_E;
input Except_W;
input Replay_W;
input GIWCLK;
input Reset;
// unused signals wire TMode = 0 ;
// control signals
wire KillPipe_W;
wire clk;
// decoded signals
wire GFADD8_C0; wire GFADD8I_C0; wire GFMULX8_C0; wire GFRWMOD8_C0;
wire LGF8_I_C0; wire SGF8_I_C0; wire LGF8_IU_C0; wire SGF8_IU_C0;
wire LGF8_X_C0; wire SGF8_X_C0; wire LGF8_XU_C0; wire SGF8_XU_C0;
wire RUR0_C0; wire WUR0_C0;
wire [31:0] imm4_C0;
wire [7:0] imm8_C0;
wire art_use_C0; wire art_def_C0;
wire ars_use_C0; wire ars_def_C0;
wire arr_use_C0; wire arr_def_C0;
wire br_use_C0; wire br_def_C0;
wire bs_use_C0; wire bs_def_C0;
wire bt_use_C0; wire bt_def_C0;
wire bs4_use_C0; wire bs4_def_C0;
wire bs8_use_C0; wire bs8_def_C0;
wire gr_use_C0; wire gr_def_C0;
wire gs_use_C0; wire gs_def_C0;
wire gt_use_C0; wire gt_def_C0;
wire gfmod_use1_C0;
wire gfmod_def1_C0;
wire AR_rd0_use1_C0;
wire AR_rd0_width32_C0;
wire AR_rd1_use1_C0;
wire AR_rd1_width32_C0;
wire AR_wd_def1_C0;
wire AR_wd_width32_C0;
wire [3:0] gf_rd0_addr_C0;
wire gf_rd0_use1_C0;
wire gf_rd0_width8_C0;
wire [3:0] gf_rd1_addr_C0;
wire gf_rd1_use1_C0;
wire gf_rd1_width8_C0;
wire [3:0] gf_rd2_addr_C0;
wire gf_rd2_use1_C0;
wire gf_rd2_width8_C0;
wire [3:0] gf_wd_addr_C0;
wire gf_wd_def2_C0;
wire gf_wd_def1_C0;
wire gf_wd_width8_C0;
wire gf1_semantic_C0;
wire gf4_semantic_C0;
wire gf2_semantic_C0;
wire gf3_semantic_C0;
wire lgf_semantic_C0;
wire sgf_semantic_C0;
wire RUR0_semantic_C0;
wire WUR0_semantic_C0;
wire load_instruction_C0;
wire store_instruction_C0;
wire TIE_Inst_C0;
wire [23:0] Inst_C0;
// state data, write-enable and stall signals
wire [7:0] gfmod_ps_C1;
wire [7:0] gfmod_ns_C1;
wire gfmod_kill_C1;
wire gfmod_Stall_C1;
// register data, write-enable and stall signals
wire [31:0] AR_rd0_data_C1;
wire [31:0] AR_rd1_data_C1;
wire [31:0] AR_wd_data32_C1;
wire AR_wd_kill_C1;
wire [7:0] gf_rd0_data_C1;
wire [7:0] gf_rd1_data_C1;
wire [7:0] gf_rd2_data_C1;
wire [7:0] gf_wd_data8_C2;
wire gf_wd_kill_C2;
wire [7:0] gf_wd_data8_C1;
wire gf_wd_kill_C1;
wire gf_Stall_C1;
// operands
wire [31:0] art_i_C1;
wire [31:0] art_o_C1;
wire art_kill_C1;
wire [31:0] ars_i_C1;
wire [31:0] ars_o_C1;
wire ars_kill_C1;
wire [31:0] arr_o_C1;
wire arr_kill_C1;
wire [7:0] gr_i_C1;
wire [7:0] gr_o_C2;
wire gr_kill_C2;
wire [7:0] gr_o_C1;
wire gr_kill_C1;
wire [7:0] gs_i_C1;
wire [7:0] gt_i_C1;
wire [7:0] gt_o_C2;
wire gt_kill_C2;
wire [7:0] gt_o_C1;
wire gt_kill_C1;
// output state of semantic gf1
// output interface of semantic gf1
// output operand of semantic gf1
wire [7:0] gf1_gr_o_C1;
wire gf1_gr_kill_C1;
// output state of semantic gf4
// output interface of semantic gf4
// output operand of semantic gf4
wire [7:0] gf4_gr_o_C1;
wire gf4_gr_kill_C1;
// output state of semantic gf2
// output interface of semantic gf2
// output operand of semantic gf2
wire [7:0] gf2_gr_o_C1;
wire gf2_gr_kill_C1;
// output state of semantic gf3
wire [7:0] gf3_gfmod_ns_C1;
wire gf3_gfmod_kill_C1;
// output interface of semantic gf3
// output operand of semantic gf3
wire [7:0] gf3_gt_o_C1;
wire gf3_gt_kill_C1;
// output state of semantic lgf
// output interface of semantic lgf
wire [4:0] lgf_LSSize_C0;
wire [31:0] lgf_VAddrBase_C1;
wire [31:0] lgf_VAddrIndex_C1;
wire [31:0] lgf_VAddrOffset_C0;
wire lgf_LSIndexed_C0;
// output operand of semantic lgf
wire [7:0] lgf_gt_o_C2;
wire lgf_gt_kill_C2;
wire [31:0] lgf_ars_o_C1;
wire lgf_ars_kill_C1;
wire [7:0] lgf_gr_o_C2;
wire lgf_gr_kill_C2;
// output state of semantic sgf
// output interface of semantic sgf
wire [4:0] sgf_LSSize_C0;
wire [7:0] sgf_MemDataOut8_C1;
wire [31:0] sgf_VAddrBase_C1;
wire [31:0] sgf_VAddrIndex_C1;
wire [31:0] sgf_VAddrOffset_C0;
wire sgf_LSIndexed_C0;
// output operand of semantic sgf
wire [31:0] sgf_ars_o_C1;
wire sgf_ars_kill_C1;
// output state of semantic RUR0
// output interface of semantic RUR0
// output operand of semantic RUR0
wire [31:0] RUR0_arr_o_C1;
wire RUR0_arr_kill_C1;
// output state of semantic WUR0
wire [7:0] WUR0_gfmod_ns_C1;
wire WUR0_gfmod_kill_C1;
// output interface of semantic WUR0
// output operand of semantic WUR0
// TIE-defined interface signals
wire [31:0] VAddr_C1;
wire [31:0] VAddrBase_C1;
wire [31:0] VAddrOffset_C0;
wire [31:0] VAddrIndex_C1;
wire [31:0] VAddrIn_C1;
wire [4:0] LSSize_C0;
wire LSIndexed_C0;
wire [127:0] MemDataIn128_C2;
wire [63:0] MemDataIn64_C2;
wire [31:0] MemDataIn32_C2;
wire [15:0] MemDataIn16_C2;
wire [7:0] MemDataIn8_C2;
wire [127:0] MemDataOut128_C1;
wire [63:0] MemDataOut64_C1;
wire [31:0] MemDataOut32_C1;
wire [15:0] MemDataOut16_C1;
wire [7:0] MemDataOut8_C1;
wire Exception_C1;
wire [5:0] ExcCause_C1;
wire [7:0] CPEnable_C1;
xtflop #(1) reset(localReset, Reset, GIWCLK);
xmTIE_decoder TIE_decoder (
.GFADD8(GFADD8_C0),
.GFADD8I(GFADD8I_C0),
.GFMULX8(GFMULX8_C0),
.GFRWMOD8(GFRWMOD8_C0),
.LGF8_I(LGF8_I_C0),
.SGF8_I(SGF8_I_C0),
.LGF8_IU(LGF8_IU_C0),
.SGF8_IU(SGF8_IU_C0),
.LGF8_X(LGF8_X_C0),
.SGF8_X(SGF8_X_C0),
.LGF8_XU(LGF8_XU_C0),
.SGF8_XU(SGF8_XU_C0),
.RUR0(RUR0_C0),
.WUR0(WUR0_C0),
.imm4(imm4_C0),
.imm8(imm8_C0),
.art_use(art_use_C0),
.art_def(art_def_C0),
.ars_use(ars_use_C0),
.ars_def(ars_def_C0),
.arr_use(arr_use_C0),
.arr_def(arr_def_C0),
.br_use(br_use_C0),
.br_def(br_def_C0),
.bs_use(bs_use_C0),
.bs_def(bs_def_C0),
.bt_use(bt_use_C0),
.bt_def(bt_def_C0),
.bs4_use(bs4_use_C0),
.bs4_def(bs4_def_C0),
.bs8_use(bs8_use_C0),
.bs8_def(bs8_def_C0),
.gr_use(gr_use_C0),
.gr_def(gr_def_C0),
.gs_use(gs_use_C0),
.gs_def(gs_def_C0),
.gt_use(gt_use_C0),
.gt_def(gt_def_C0),
.gfmod_use1(gfmod_use1_C0),
.gfmod_def1(gfmod_def1_C0),
.AR_rd0_use1(AR_rd0_use1_C0),
.AR_rd0_width32(AR_rd0_width32_C0),
.AR_rd1_use1(AR_rd1_use1_C0),
.AR_rd1_width32(AR_rd1_width32_C0),
.AR_wd_def1(AR_wd_def1_C0),
.AR_wd_width32(AR_wd_width32_C0),
.gf_rd0_addr(gf_rd0_addr_C0),
.gf_rd0_use1(gf_rd0_use1_C0),
.gf_rd0_width8(gf_rd0_width8_C0),
.gf_rd1_addr(gf_rd1_addr_C0),
.gf_rd1_use1(gf_rd1_use1_C0),
.gf_rd1_width8(gf_rd1_width8_C0),
.gf_rd2_addr(gf_rd2_addr_C0),
.gf_rd2_use1(gf_rd2_use1_C0),
.gf_rd2_width8(gf_rd2_width8_C0),
.gf_wd_addr(gf_wd_addr_C0),
.gf_wd_def2(gf_wd_def2_C0),
.gf_wd_def1(gf_wd_def1_C0),
.gf_wd_width8(gf_wd_width8_C0),
.gf1_semantic(gf1_semantic_C0),
.gf4_semantic(gf4_semantic_C0),
.gf2_semantic(gf2_semantic_C0),
.gf3_semantic(gf3_semantic_C0),
.lgf_semantic(lgf_semantic_C0),
.sgf_semantic(sgf_semantic_C0),
.RUR0_semantic(RUR0_semantic_C0),
.WUR0_semantic(WUR0_semantic_C0),
.load_instruction(load_instruction_C0),
.store_instruction(store_instruction_C0),
.TIE_Inst(TIE_Inst_C0),
.Inst(Inst_C0)
);
xmTIE_gf1 TIE_gf1(
.GFADD8_C0(GFADD8_C0), .gr_o_C1(gf1_gr_o_C1), .gr_kill_C1(gf1_gr_kill_C1), .gs_i_C1(gs_i_C1), .gt_i_C1(gt_i_C1), .clk(clk));
xmTIE_gf4 TIE_gf4(
.GFADD8I_C0(GFADD8I_C0), .gr_o_C1(gf4_gr_o_C1), .gr_kill_C1(gf4_gr_kill_C1), .gs_i_C1(gs_i_C1), .imm4_C0(imm4_C0), .clk(clk));
xmTIE_gf2 TIE_gf2(
.GFMULX8_C0(GFMULX8_C0), .gr_o_C1(gf2_gr_o_C1), .gr_kill_C1(gf2_gr_kill_C1), .gs_i_C1(gs_i_C1), .gfmod_ps_C1(gfmod_ps_C1), .clk(clk));
xmTIE_gf3 TIE_gf3(
.GFRWMOD8_C0(GFRWMOD8_C0), .gt_i_C1(gt_i_C1), .gt_o_C1(gf3_gt_o_C1), .gt_kill_C1(gf3_gt_kill_C1), .gfmod_ps_C1(gfmod_ps_C1), .gfmod_ns_C1(gf3_gfmod_ns_C1), .gfmod_kill_C1(gf3_gfmod_kill_C1), .clk(clk));
xmTIE_lgf TIE_lgf(
.LGF8_I_C0(LGF8_I_C0), .LGF8_IU_C0(LGF8_IU_C0), .LGF8_X_C0(LGF8_X_C0), .LGF8_XU_C0(LGF8_XU_C0),
.gt_o_C2(lgf_gt_o_C2), .gt_kill_C2(lgf_gt_kill_C2),
.ars_i_C1(ars_i_C1), .ars_o_C1(lgf_ars_o_C1), .ars_kill_C1(lgf_ars_kill_C1),
.imm8_C0(imm8_C0),
.gr_o_C2(lgf_gr_o_C2), .gr_kill_C2(lgf_gr_kill_C2),
.art_i_C1(art_i_C1),
.MemDataIn8_C2(MemDataIn8_C2), .VAddrIn_C1(VAddrIn_C1),
.LSSize_C0(lgf_LSSize_C0), .VAddrBase_C1(lgf_VAddrBase_C1), .VAddrIndex_C1(lgf_VAddrIndex_C1), .VAddrOffset_C0(lgf_VAddrOffset_C0), .LSIndexed_C0(lgf_LSIndexed_C0),
.clk(clk));
xmTIE_sgf TIE_sgf(
.SGF8_I_C0(SGF8_I_C0), .SGF8_IU_C0(SGF8_IU_C0), .SGF8_X_C0(SGF8_X_C0), .SGF8_XU_C0(SGF8_XU_C0),
.gt_i_C1(gt_i_C1),
.ars_i_C1(ars_i_C1), .ars_o_C1(sgf_ars_o_C1), .ars_kill_C1(sgf_ars_kill_C1),
.imm8_C0(imm8_C0), .gr_i_C1(gr_i_C1), .art_i_C1(art_i_C1),
.VAddrIn_C1(VAddrIn_C1),
.LSSize_C0(sgf_LSSize_C0), .MemDataOut8_C1(sgf_MemDataOut8_C1), .VAddrBase_C1(sgf_VAddrBase_C1), .VAddrIndex_C1(sgf_VAddrIndex_C1), .VAddrOffset_C0(sgf_VAddrOffset_C0), .LSIndexed_C0(sgf_LSIndexed_C0),
.clk(clk));
xmTIE_RUR0 TIE_RUR0(
.RUR0_C0(RUR0_C0),
.arr_o_C1(RUR0_arr_o_C1),
.arr_kill_C1(RUR0_arr_kill_C1),
.gfmod_ps_C1(gfmod_ps_C1),
.clk(clk));
xmTIE_WUR0 TIE_WUR0(
.WUR0_C0(WUR0_C0), .art_i_C1(art_i_C1), .gfmod_ns_C1(WUR0_gfmod_ns_C1), .gfmod_kill_C1(WUR0_gfmod_kill_C1), .clk(clk));
xmTIE_gfmod_State TIE_gfmod_State(
.ps_width8_C0(1'b1), .ps_use1_C0(gfmod_use1_C0), .ps_data_C1(gfmod_ps_C1),
.ns_width8_C0(1'b1), .ns_def1_C0(gfmod_def1_C0), .ns_data8_C1(gfmod_ns_C1), .ns_wen_C1(~gfmod_kill_C1),
.Kill_E(Kill_E), .KillPipe_W(KillPipe_W), .Stall_R(gfmod_Stall_C1), .clk(clk)
);
xmTIE_gf_Regfile TIE_gf_Regfile(
.rd0_addr_C0(gf_rd0_addr_C0), .rd0_use1_C0(gf_rd0_use1_C0), .rd0_data_C1(gf_rd0_data_C1), .rd0_width8_C0(gf_rd0_width8_C0),
.rd1_addr_C0(gf_rd1_addr_C0), .rd1_use1_C0(gf_rd1_use1_C0), .rd1_data_C1(gf_rd1_data_C1), .rd1_width8_C0(gf_rd1_width8_C0),
.rd2_addr_C0(gf_rd2_addr_C0), .rd2_use1_C0(gf_rd2_use1_C0), .rd2_data_C1(gf_rd2_data_C1), .rd2_width8_C0(gf_rd2_width8_C0),
.wd_addr_C0(gf_wd_addr_C0),
.wd_def2_C0(gf_wd_def2_C0), .wd_wen_C2(~gf_wd_kill_C2), .wd_data8_C2(gf_wd_data8_C2),
.wd_def1_C0(gf_wd_def1_C0), .wd_wen_C1(~gf_wd_kill_C1), .wd_data8_C1(gf_wd_data8_C1),
.wd_width8_C0(gf_wd_width8_C0),
.Kill_E(Kill_E), .KillPipe_W(KillPipe_W), .Stall_R(gf_Stall_C1), .clk(clk)
);
// Stall logic
assign TIE_Stall_R = 1'b0 | gf_Stall_C1 | gfmod_Stall_C1;
// pipeline semantic select signals to each stage
wire lgf_semantic_C1;
xtdelay1 #(1) ilgf_semantic_C1(.xtin(lgf_semantic_C0), .xtout(lgf_semantic_C1), .clk(clk));
wire sgf_semantic_C1;
xtdelay1 #(1) isgf_semantic_C1(.xtin(sgf_semantic_C0), .xtout(sgf_semantic_C1), .clk(clk));
wire gf3_semantic_C1;
xtdelay1 #(1) igf3_semantic_C1(.xtin(gf3_semantic_C0), .xtout(gf3_semantic_C1), .clk(clk));
wire WUR0_semantic_C1;
xtdelay1 #(1) iWUR0_semantic_C1(.xtin(WUR0_semantic_C0), .xtout(WUR0_semantic_C1), .clk(clk));
wire RUR0_semantic_C1;
xtdelay1 #(1) iRUR0_semantic_C1(.xtin(RUR0_semantic_C0), .xtout(RUR0_semantic_C1), .clk(clk));
wire lgf_semantic_C2;
xtdelay2 #(1) ilgf_semantic_C2(.xtin(lgf_semantic_C0), .xtout(lgf_semantic_C2), .clk(clk));
wire gf1_semantic_C1;
xtdelay1 #(1) igf1_semantic_C1(.xtin(gf1_semantic_C0), .xtout(gf1_semantic_C1), .clk(clk));
wire gf4_semantic_C1;
xtdelay1 #(1) igf4_semantic_C1(.xtin(gf4_semantic_C0), .xtout(gf4_semantic_C1), .clk(clk));
wire gf2_semantic_C1;
xtdelay1 #(1) igf2_semantic_C1(.xtin(gf2_semantic_C0), .xtout(gf2_semantic_C1), .clk(clk));
// combine output interface signals from all semantics
assign VAddr_C1 = 32'b0;
assign VAddrBase_C1 = 32'b0 | (lgf_VAddrBase_C1 & {32{lgf_semantic_C1}}) | (sgf_VAddrBase_C1 & {32{sgf_semantic_C1}});
assign VAddrOffset_C0 = 32'b0 | (lgf_VAddrOffset_C0 & {32{lgf_semantic_C0}}) | (sgf_VAddrOffset_C0 & {32{sgf_semantic_C0}});
assign VAddrIndex_C1 = 32'b0 | (lgf_VAddrIndex_C1 & {32{lgf_semantic_C1}}) | (sgf_VAddrIndex_C1 & {32{sgf_semantic_C1}});
assign LSSize_C0 = 5'b0 | (lgf_LSSize_C0 & {5{lgf_semantic_C0}}) | (sgf_LSSize_C0 & {5{sgf_semantic_C0}});
assign LSIndexed_C0 = 1'b0 | (lgf_LSIndexed_C0 & lgf_semantic_C0) | (sgf_LSIndexed_C0 & sgf_semantic_C0);
assign MemDataOut128_C1 = 128'b0;
assign MemDataOut64_C1 = 64'b0;
assign MemDataOut32_C1 = 32'b0;
assign MemDataOut16_C1 = 16'b0;
assign MemDataOut8_C1 = 8'b0 | (sgf_MemDataOut8_C1 & {8{sgf_semantic_C1}});
assign Exception_C1 = 1'b0;
assign ExcCause_C1 = 6'b0;
// combine output state signals from all semantics
assign gfmod_ns_C1 = 8'b0 | (gf3_gfmod_ns_C1 & {8{gf3_semantic_C1}}) | (WUR0_gfmod_ns_C1 & {8{WUR0_semantic_C1}});
assign gfmod_kill_C1 = 1'b0 | (gf3_gfmod_kill_C1 & gf3_semantic_C1) | (WUR0_gfmod_kill_C1 & WUR0_semantic_C1);
// combine output operand signals from all semantics
assign art_o_C1 = 32'b0;
assign art_kill_C1 = 1'b0;
assign ars_o_C1 = 32'b0 | (lgf_ars_o_C1 & {32{lgf_semantic_C1}}) | (sgf_ars_o_C1 & {32{sgf_semantic_C1}});
assign ars_kill_C1 = 1'b0 | (lgf_ars_kill_C1 & lgf_semantic_C1) | (sgf_ars_kill_C1 & sgf_semantic_C1);
assign arr_o_C1 = 32'b0 | (RUR0_arr_o_C1 & {32{RUR0_semantic_C1}});
assign arr_kill_C1 = 1'b0 | (RUR0_arr_kill_C1 & RUR0_semantic_C1);
assign gr_o_C2 = 8'b0 | (lgf_gr_o_C2 & {8{lgf_semantic_C2}});
assign gr_kill_C2 = 1'b0 | (lgf_gr_kill_C2 & lgf_semantic_C2);
assign gr_o_C1 = 8'b0 | (gf1_gr_o_C1 & {8{gf1_semantic_C1}}) | (gf4_gr_o_C1 & {8{gf4_semantic_C1}}) | (gf2_gr_o_C1 & {8{gf2_semantic_C1}});
assign gr_kill_C1 = 1'b0 | (gf1_gr_kill_C1 & gf1_semantic_C1) | (gf4_gr_kill_C1 & gf4_semantic_C1) | (gf2_gr_kill_C1 & gf2_semantic_C1);
assign gt_o_C2 = 8'b0 | (lgf_gt_o_C2 & {8{lgf_semantic_C2}});
assign gt_kill_C2 = 1'b0 | (lgf_gt_kill_C2 & lgf_semantic_C2);
assign gt_o_C1 = 8'b0 | (gf3_gt_o_C1 & {8{gf3_semantic_C1}});
assign gt_kill_C1 = 1'b0 | (gf3_gt_kill_C1 & gf3_semantic_C1);
// output operand to write port mapping logic
assign AR_wd_data32_C1 = ars_o_C1 | arr_o_C1 | 32'b0;
assign AR_wd_kill_C1 = ars_kill_C1 | arr_kill_C1 | 1'b0;
assign gf_wd_data8_C2 = gt_o_C2 | gr_o_C2 | 8'b0;
assign gf_wd_kill_C2 = gt_kill_C2 | gr_kill_C2 | 1'b0;
assign gf_wd_data8_C1 = gr_o_C1 | gt_o_C1 | 8'b0;
assign gf_wd_kill_C1 = gr_kill_C1 | gt_kill_C1 | 1'b0;
// read port to input operand mapping logic
assign ars_i_C1 = AR_rd0_data_C1;
assign art_i_C1 = AR_rd1_data_C1;
assign gs_i_C1 = gf_rd0_data_C1;
assign gt_i_C1 = gf_rd1_data_C1;
assign gr_i_C1 = gf_rd2_data_C1;
// logic to support verification.
wire ignore_TIE_aWriteData_E = ~(AR_wd_def1_C0 & (TIE_arWrite_R | TIE_asWrite_R | TIE_atWrite_R) & ~TIE_aDataKill_E);
wire ignore_TIE_aWriteData_M = ~(1'b0 & (TIE_arWrite_R | TIE_asWrite_R | TIE_atWrite_R) & ~TIE_aDataKill_M);
wire ignore_TIE_bWriteData_E = (~TIE_btWrite_R & ~TIE_btWrite_R) | TIE_bDataKill_E;
wire ignore_TIE_bWriteData16_E = ignore_TIE_bWriteData_E;
wire ignore_TIE_bWriteData8_E = ignore_TIE_bWriteData_E;
wire ignore_TIE_bWriteData4_E = ignore_TIE_bWriteData_E;
wire ignore_TIE_bWriteData2_E = ignore_TIE_bWriteData_E;
wire ignore_TIE_bWriteData1_E = ignore_TIE_bWriteData_E;
wire ignore_TIE_LSSize_R = ~TIE_Load_R & ~TIE_Store_R;
wire ignore_TIE_LSIndexed_R = ~TIE_Load_R & ~TIE_Store_R;
wire ignore_TIE_LSOffset_R = ~TIE_Load_R & ~TIE_Store_R | TIE_LSIndexed_R;
wire ignore_TIE_MemStoreData128_E = (TIE_LSSize_R != 5'b10000) | ~TIE_Store_R;
wire ignore_TIE_MemStoreData64_E = (TIE_LSSize_R != 5'b01000) | ~TIE_Store_R;
wire ignore_TIE_MemStoreData32_E = (TIE_LSSize_R != 5'b00100) | ~TIE_Store_R;
wire ignore_TIE_MemStoreData16_E = (TIE_LSSize_R != 5'b00010) | ~TIE_Store_R;
wire ignore_TIE_MemStoreData8_E = (TIE_LSSize_R != 5'b00001) | ~TIE_Store_R;
// clock and instructions
assign clk = GIWCLK;
assign Inst_C0 = Instr_R;
assign TIE_inst_R = TIE_Inst_C0;
// AR-related signals to/from core
assign TIE_asRead_R = ars_use_C0;
assign TIE_atRead_R = art_use_C0;
assign TIE_atWrite_R = art_def_C0;
assign TIE_arWrite_R = arr_def_C0;
assign TIE_asWrite_R = ars_def_C0;
assign TIE_aWriteM_R = 0;
assign TIE_aWriteData_E = ignore_TIE_aWriteData_E ? 0 : AR_wd_data32_C1;
assign TIE_aWriteData_M = ignore_TIE_aWriteData_M ? 0 : 0;
assign TIE_aDataKill_E = AR_wd_kill_C1;
assign TIE_aDataKill_M = 0;
assign AR_rd0_data_C1 = SBus_E;
assign AR_rd1_data_C1 = TBus_E;
// BR-related signals to/from core
assign TIE_bsRead_R = 1'b0 | bs_use_C0 | bs4_use_C0 | bs8_use_C0;
assign TIE_btRead_R = 1'b0 | bt_use_C0;
assign TIE_btWrite_R = 1'b0 | bt_def_C0;
assign TIE_bsWrite_R = 1'b0 | bs_def_C0 | bs4_def_C0 | bs8_def_C0;
assign TIE_brWrite_R = 1'b0 | br_def_C0;
assign TIE_bWriteData16_E = ignore_TIE_bWriteData16_E ? 0 : 0;
assign TIE_bWriteData8_E = ignore_TIE_bWriteData8_E ? 0 : 0;
assign TIE_bWriteData4_E = ignore_TIE_bWriteData4_E ? 0 : 0;
assign TIE_bWriteData2_E = ignore_TIE_bWriteData2_E ? 0 : 0;
assign TIE_bWriteData1_E = ignore_TIE_bWriteData1_E ? 0 : 0;
assign TIE_bDataKill_E = 0;
assign TIE_bWriteSize_R = {1'b0, 1'b0, 1'b0, 1'b0, 1'b0};
assign TIE_bsReadSize_R = {1'b0, 1'b0, 1'b0, 1'b0, 1'b0};
assign TIE_btReadSize_R = {1'b0, 1'b0, 1'b0, 1'b0, 1'b0};
// Load/store signals to/from core
assign TIE_Load_R = load_instruction_C0;
assign TIE_Store_R = store_instruction_C0;
assign TIE_LSSize_R = ignore_TIE_LSSize_R ? 0 : LSSize_C0;
assign TIE_LSIndexed_R = ignore_TIE_LSIndexed_R ? 0 : LSIndexed_C0;
assign TIE_LSOffset_R = ignore_TIE_LSOffset_R ? 0 : VAddrOffset_C0;
assign TIE_MemStoreData128_E = ignore_TIE_MemStoreData128_E ? 0 : MemDataOut128_C1;
assign TIE_MemStoreData64_E = ignore_TIE_MemStoreData64_E ? 0 : MemDataOut64_C1;
assign TIE_MemStoreData32_E = ignore_TIE_MemStoreData32_E ? 0 : MemDataOut32_C1;
assign TIE_MemStoreData16_E = ignore_TIE_MemStoreData16_E ? 0 : MemDataOut16_C1;
assign TIE_MemStoreData8_E = ignore_TIE_MemStoreData8_E ? 0 : MemDataOut8_C1;
assign MemDataIn128_C2 = TIE_MemLoadData_M;
assign MemDataIn64_C2 = TIE_MemLoadData_M;
assign MemDataIn32_C2 = TIE_MemLoadData_M;
assign MemDataIn16_C2 = TIE_MemLoadData_M;
assign MemDataIn8_C2 = TIE_MemLoadData_M;
assign VAddrIn_C1 = MemOpAddr_E;
// CPEnable and control signals to/from core
assign CPEnable_C1 = CPEnable;
assign TIE_Exception_E = Exception_C1;
assign TIE_ExcCause_E = ExcCause_C1;
assign KillPipe_W = Except_W | Replay_W;
endmodule

module xtdelay1(xtout, xtin, clk);
parameter size = 1;
output [size-1:0] xtout;
input [size-1:0] xtin;
input clk;
assign xtout = xtin;
endmodule

module xtdelay2(xtout, xtin, clk);
parameter size = 1;
output [size-1:0] xtout;
input [size-1:0] xtin;
input clk;
assign xtout = xtin;
endmodule

module xtRFenlatch(xtRFenlatchout, xtin, xten, clk);
parameter size = 32;
output [size-1:0] xtRFenlatchout;
input [size-1:0] xtin;
input xten;
input clk;
reg [size-1:0] xtRFenlatchout;
always @(clk or xten or xtin or xtRFenlatchout) begin
  if (clk) begin
    xtRFenlatchout <= #1 (xten) ? xtin : xtRFenlatchout;
  end
end
endmodule

module xtRFlatch(xtRFlatchout, xtin, clk);
parameter size = 32;
output [size-1:0] xtRFlatchout;
input [size-1:0] xtin;
input clk;
reg [size-1:0] xtRFlatchout;
always @(clk or xtin) begin
  if (clk) begin
    xtRFlatchout <= #1 xtin;
  end
end
endmodule

module xtadd(xtout, a, b);
parameter size = 32;
output [size-1:0] xtout;
input [size-1:0] a;
input [size-1:0] b;
assign xtout = a + b;
endmodule

module xtaddc(sum, carry, a, b, c);
parameter size = 32;
output [size-1:0] sum;
output carry;
input [size-1:0] a;
input [size-1:0] b;
input c;
wire junk;
assign {carry, sum, junk} = {a,c} + {b,c};
endmodule

module xtaddcin(xtout, a, b, c);
parameter size = 32;
output [size-1:0] xtout;
input [size-1:0] a;
input [size-1:0] b;
input c;
assign xtout = ({a,c} + {b,c}) >> 1;
endmodule

module xtaddcout(sum, carry, a, b);
parameter size = 1;
output [size-1:0] sum;
output carry;
input [size-1:0] a;
input [size-1:0] b;
assign {carry, sum} = a + b;
endmodule

module xtbooth(out, cin, a, b, sign, negate);
parameter size = 16;
output [size+1:0] out;
output cin;
input [size-1:0] a;
input [2:0] b;
input sign, negate;
wire ase = sign & a[size-1];
wire [size+1:0] ax1 = {ase, ase, a};
wire [size+1:0] ax2 = {ase, a, 1'd0};
wire one = b[1] ^ b[0];
wire two = b[2] ? ~b[1] & ~b[0] : b[1] & b[0];
wire cin = negate ? (~b[2] & (b[1] | b[0])) : (b[2] & ~(b[1] & b[0]));
assign out = {size+2{cin}} ^ (ax1&{size+2{one}} | ax2&{size+2{two}});
endmodule

module xtclock_gate_nor(xtout, xtin1, xtin2);
output xtout;
input xtin1, xtin2;
assign xtout = ~(xtin1 || xtin2);
endmodule

module xtclock_gate_or(xtout, xtin1, xtin2);
output xtout;
input xtin1, xtin2;
assign xtout = (xtin1 || xtin2);
endmodule

module xtcsa(sum, carry, a, b, c);
parameter size = 1;
output [size-1:0] sum;
output [size-1:0] carry;
input [size-1:0] a;
input [size-1:0] b;
input [size-1:0] c;
assign sum = a ^ b ^ c;
assign carry = (a & b) | (b & c) | (c & a);
endmodule

module xtenflop(xtout, xtin, en, clk);
parameter size = 32;
output [size-1:0] xtout;
input [size-1:0] xtin;
input en;
input clk;
reg [size-1:0] tmp;
assign xtout = tmp;
always @(posedge clk) begin
  if (en) tmp <= #1 xtin;
end
endmodule

module xtfa(sum, carry, a, b, c);
output sum, carry;
input a, b, c;
assign sum = a ^ b ^ c;
assign carry = a & b | a & c | b & c;
endmodule

module xtflop(xtout, xtin, clk);
parameter size = 32;
output [size-1:0] xtout;
input [size-1:0] xtin;
input clk;
reg [size-1:0] tmp;
assign xtout = tmp;
always @(posedge clk) begin
  tmp <= #1 xtin;
end
endmodule

module xtha(sum, carry, a, b);
output sum, carry;
input a, b;
assign sum = a ^ b;
assign carry = a & b;
endmodule

module xtinc(xtout, a);
parameter size = 32;
output [size-1:0] xtout;
input [size-1:0] a;
assign xtout = a + 1;
endmodule

module xtmux2e(xtout, a, b, sel);
parameter size = 32;
output [size-1:0] xtout;
input [size-1:0] a;
input [size-1:0] b;
input sel;
assign xtout = (~sel) ? a : b;
endmodule

module xtmux3e(xtout, a, b, c, sel);
parameter size = 32;
output [size-1:0] xtout;
input [size-1:0] a;
input [size-1:0] b;
input [size-1:0] c;
input [1:0] sel;
reg [size-1:0] xtout;
always @(a or b or c or sel) begin
  xtout = sel[1] ? c : (sel[0] ? b : a);
end
endmodule

module xtmux4e(xtout, a, b, c, d, sel);
parameter size = 32;
output [size-1:0] xtout;
input [size-1:0] a;
input [size-1:0] b;
input [size-1:0] c;
input [size-1:0] d;
input [1:0] sel;
reg [size-1:0] xtout;
// synopsys infer_mux "xtmux4e"
always @(sel or a or b or c or d) begin : xtmux4e
  case (sel) // synopsys parallel_case full_case
    2'b00: xtout = a;
    2'b01: xtout = b;
    2'b10: xtout = c;
    2'b11: xtout = d;
    default: xtout = {size{1'bx}};
  endcase // case (sel)
end // always @ (sel or a or b or c or d)
endmodule

module xtnflop(xtout, xtin, clk);
parameter size = 32;
output [size-1:0] xtout;
input [size-1:0] xtin;
input clk;
reg [size-1:0] tmp;
assign xtout = tmp;
always @(negedge clk) begin
  tmp <= #1 xtin;
end // always @ (negedge clk)
endmodule

module xtscflop(xtout, xtin, clrb, clk); // sync clear ff
parameter size = 32;
output [size-1:0] xtout;
input [size-1:0] xtin;
input clrb;
input clk;
reg [size-1:0] tmp;
assign xtout = tmp;
always @(posedge clk) begin
  if (!clrb) tmp <= 0;
  else tmp <= #1 xtin;
end
endmodule

module xtscenflop(xtout, xtin, en, clrb, clk); // sync clear
parameter size = 32;
output [size-1:0] xtout;
input [size-1:0] xtin;
input en;
input clrb;
input clk;
reg [size-1:0] tmp;
assign xtout = tmp;
always @(posedge clk) begin
  if (!clrb) tmp <= 0;
  else if (en) tmp <= #1 xtin;
end
endmodule
verysys/verify ref .

module xmTIE_gf_Regfile(rd0_data_C1, rd0_addr_C0, rd0_width8_C0, rd0_use1_C0,
  rd1_data_C1, rd1_addr_C0, rd1_width8_C0, rd1_use1_C0,
  rd2_data_C1, rd2_addr_C0, rd2_width8_C0, rd2_use1_C0,
  wd_addr_C0, wd_width8_C0, wd_def1_C0, wd_def2_C0,
  wd_data8_C1, wd_data8_C2, wd_wen_C1, wd_wen_C2,
  Kill_E, KillPipe_W, Stall_R, clk);
output [7:0] rd0_data_C1;
input [3:0] rd0_addr_C0;
input rd0_width8_C0;
input rd0_use1_C0;
output [7:0] rd1_data_C1;
input [3:0] rd1_addr_C0;
input rd1_width8_C0;
input rd1_use1_C0;
output [7:0] rd2_data_C1;
input [3:0] rd2_addr_C0;
input rd2_width8_C0;
input rd2_use1_C0;
input [3:0] wd_addr_C0;
input wd_width8_C0;
input wd_def1_C0;
input wd_def2_C0;
input [7:0] wd_data8_C1;
input [7:0] wd_data8_C2;
input wd_wen_C1;
input wd_wen_C2;
input Kill_E;
input KillPipe_W;
output Stall_R;
input clk;

/***********************************************************************
READ PORT rd0
***********************************************************************/
// compute the address mask
wire rd0_addr_mask_C0 = 1'd0;
// masked address pipeline
wire rd0_maddr_C0 = 1'd0;
// bank-qualified use
wire rd0_use1_bank0_C0 = (rd0_use1_C0 & (rd0_maddr_C0 == (1'd0 & rd0_addr_mask_C0)));
// alignment mux for use 1
wire [7:0] rd0_data_bank0_C1;
assign rd0_data_C1[7:0] = rd0_data_bank0_C1;

/***********************************************************************
READ PORT rd1
***********************************************************************/
// compute the address mask
wire rd1_addr_mask_C0 = 1'd0;
// masked address pipeline
wire rd1_maddr_C0 = 1'd0;
// bank-qualified use
wire rd1_use1_bank0_C0 = (rd1_use1_C0 & (rd1_maddr_C0 == (1'd0 & rd1_addr_mask_C0)));
// alignment mux for use 1
wire [7:0] rd1_data_bank0_C1;
assign rd1_data_C1[7:0] = rd1_data_bank0_C1;

/***********************************************************************
READ PORT rd2
***********************************************************************/
// compute the address mask
wire rd2_addr_mask_C0 = 1'd0;
// masked address pipeline
wire rd2_maddr_C0 = 1'd0;
// bank-qualified use
wire rd2_use1_bank0_C0 = (rd2_use1_C0 & (rd2_maddr_C0 == (1'd0 & rd2_addr_mask_C0)));
// alignment mux for use 1
wire [7:0] rd2_data_bank0_C1;
assign rd2_data_C1[7:0] = rd2_data_bank0_C1;

/***********************************************************************
WRITE PORT wd
***********************************************************************/
// compute the address mask
wire wd_addr_mask_C0 = 1'd0;
// bank-qualified write def for port wd
wire wd_def1_bank0_C0 = (wd_def1_C0 & ((wd_addr_C0 & wd_addr_mask_C0) == (1'd0 & wd_addr_mask_C0)));
wire wd_def2_bank0_C0 = (wd_def2_C0 & ((wd_addr_C0 & wd_addr_mask_C0) == (1'd0 & wd_addr_mask_C0)));
// write mux for def 1
wire [7:0] wd_wdata_C1;
assign wd_wdata_C1 = {1{wd_data8_C1[7:0]}};
// write mux for def 2
wire [7:0] wd_wdata_C2;
assign wd_wdata_C2 = {1{wd_data8_C2[7:0]}};
wire Stall_R0;

/***********************************************************************
PIPELINED BANK
***********************************************************************/
xmTIE_gf_Regfile_bank TIE_gf_Regfile_bank0(rd0_data_bank0_C1, rd0_addr_C0[3:0], rd0_use1_bank0_C0,
  rd1_data_bank0_C1, rd1_addr_C0[3:0], rd1_use1_bank0_C0,
  rd2_data_bank0_C1, rd2_addr_C0[3:0], rd2_use1_bank0_C0,
  wd_addr_C0[3:0], wd_def1_bank0_C0, wd_def2_bank0_C0,
  wd_wdata_C1[7:0], wd_wdata_C2[7:0], wd_wen_C1, wd_wen_C2,
  Kill_E, KillPipe_W, Stall_R0, clk);
assign Stall_R = Stall_R0 | 1'b0;
endmodule
module xmTIE_gf_Regfile_bank(rd0_data_C1, rd0_addr_C0, rd0_use1_C0,
  rd1_data_C1, rd1_addr_C0, rd1_use1_C0,
  rd2_data_C1, rd2_addr_C0, rd2_use1_C0,
  wd_addr_C0, wd_def1_C0, wd_def2_C0, wd_data_C1, wd_data_C2,
  wd_wen_C1, wd_wen_C2, Kill_E, KillPipe_W, Stall_R, clk);
output [7:0] rd0_data_C1;
input [3:0] rd0_addr_C0;
input rd0_use1_C0;
output [7:0] rd1_data_C1;
input [3:0] rd1_addr_C0;
input rd1_use1_C0;
output [7:0] rd2_data_C1;
input [3:0] rd2_addr_C0;
input rd2_use1_C0;
input [3:0] wd_addr_C0;
input wd_def1_C0;
input wd_def2_C0;
input [7:0] wd_data_C1;
input [7:0] wd_data_C2;
input wd_wen_C1;
input wd_wen_C2;
input Kill_E;
input KillPipe_W;
output Stall_R;
input clk;
wire rd0_use2_C0 = 1'd0;
wire rd1_use2_C0 = 1'd0;
wire rd2_use2_C0 = 1'd0;
wire kill_C0 = KillPipe_W;
wire kill_C1 = KillPipe_W | Kill_E;
wire kill_C2 = KillPipe_W;
wire kill_C3 = KillPipe_W;
// write definition pipeline
wire wd_ns_def1_C0 = wd_def1_C0 & 1'b1 & ~kill_C0;
wire wd_def1_C1;
xtdelay1 #(1) iwd_def1_C1(wd_def1_C1, wd_ns_def1_C0, clk);
wire wd_ns_def2_C0 = wd_def2_C0 & 1'b1 & ~kill_C0;
wire wd_def2_C1;
xtdelay1 #(1) iwd_def2_C1(wd_def2_C1, wd_ns_def2_C0, clk);
wire wd_ns_def2_C1 = wd_def2_C1 & wd_wen_C1 & ~kill_C1;
wire wd_def2_C2;
xtdelay1 #(1) iwd_def2_C2(wd_def2_C2, wd_ns_def2_C1, clk);
// write enable pipeline
wire wd_we_C2;
wire wd_we_C3;
wire wd_ns_we_C1 = (1'd0 | (wd_def1_C1 & wd_wen_C1)) & ~kill_C1;
wire wd_ns_we_C2 = (wd_we_C2 | (wd_def2_C2 & wd_wen_C2)) & ~kill_C2;
wire wd_ns_we_C3 = (wd_we_C3 | (1'd0 & 1'd0)) & ~kill_C3;
xtdelay1 #(1) iwd_we_C2(wd_we_C2, wd_ns_we_C1, clk);
xtdelay1 #(1) iwd_we_C3(wd_we_C3, wd_ns_we_C2, clk);
// write address pipeline
wire [3:0] wd_addr_C1;
wire [3:0] wd_addr_C2;
wire [3:0] wd_addr_C3;
xtdelay1 #(4) iwd_addr_C1(wd_addr_C1, wd_addr_C0, clk);
xtdelay1 #(4) iwd_addr_C2(wd_addr_C2, wd_addr_C1, clk);
xtdelay1 #(4) iwd_addr_C3(wd_addr_C3, wd_addr_C2, clk);
// write data pipeline
wire [7:0] wd_result_C2;
wire [7:0] wd_result_C3;
wire [7:0] wd_mux_C1 = wd_data_C1;
wire [7:0] wd_mux_C2 = wd_def2_C2 ? wd_data_C2 : wd_result_C2;
xtdelay1 #(8) iwd_result_C2(wd_result_C2, wd_mux_C1, clk);
xtdelay1 #(8) iwd_result_C3(wd_result_C3, wd_mux_C2, clk);
wire [7:0] rd0_data_C0;
wire [7:0] rd1_data_C0;
wire [7:0] rd2_data_C0;
xtdelay1 #(8) ird0_data_C1(rd0_data_C1, rd0_data_C0, clk);
xtdelay1 #(8) ird1_data_C1(rd1_data_C1, rd1_data_C0, clk);
xtdelay1 #(8) ird2_data_C1(rd2_data_C1, rd2_data_C0, clk);
assign Stall_R =
  ((wd_addr_C1 == rd0_addr_C0) & (
    (rd0_use1_C0 & (wd_ns_def2_C1)))) |
  ((wd_addr_C1 == rd1_addr_C0) & (
    (rd1_use1_C0 & (wd_ns_def2_C1)))) |
  ((wd_addr_C1 == rd2_addr_C0) & (
    (rd2_use1_C0 & (wd_ns_def2_C1)))) |
  1'b0;
// verification register file replacement
wire [7:0] xwd_verify;
xtenflop #(8) wd_verify(xwd_verify, wd_result_C3, wd_ns_we_C3, clk);
xtflop #(8) rd0_verify(rd0_data_C0, xwd_verify, clk);
xtflop #(8) rd1_verify(rd1_data_C0, xwd_verify, clk);
xtflop #(8) rd2_verify(rd2_data_C0, xwd_verify, clk);
endmodule
module xmTIE_gfmod_State(ps_data_C1, ps_width8_C0, ps_use1_C0,
  ns_width8_C0, ns_def1_C0, ns_data8_C1, ns_wen_C1,
  Kill_E, KillPipe_W, Stall_R, clk);
output [7:0] ps_data_C1;
input ps_width8_C0;
input ps_use1_C0;
input ns_width8_C0;
input ns_def1_C0;
input [7:0] ns_data8_C1;
input ns_wen_C1;
input Kill_E;
input KillPipe_W;
output Stall_R;
input clk;
wire ps_addr_C0 = 1'd0;
wire ns_addr_C0 = 1'd0;
wire ns_wen_C2 = 1'd1;

/***********************************************************************
READ PORT ps
***********************************************************************/
// compute the address mask
wire ps_addr_mask_C0 = 1'd0;
// masked address pipeline
wire ps_maddr_C0 = 1'd0;
// bank-qualified use
wire ps_use1_bank0_C0 = (ps_use1_C0 & (ps_maddr_C0 == (1'd0 & ps_addr_mask_C0)));
// alignment mux for use 1
wire [7:0] ps_data_bank0_C1;
assign ps_data_C1[7:0] = ps_data_bank0_C1;

/***********************************************************************
WRITE PORT ns
***********************************************************************/
// compute the address mask
wire ns_addr_mask_C0 = 1'd0;
// bank-qualified write def for port ns
wire ns_def1_bank0_C0 = (ns_def1_C0 & ((ns_addr_C0 & ns_addr_mask_C0) == (1'd0 & ns_addr_mask_C0)));
// write mux for def 1
wire [7:0] ns_wdata_C1;
assign ns_wdata_C1 = {1{ns_data8_C1[7:0]}};
wire Stall_R0;

/***********************************************************************
PIPELINED BANK
***********************************************************************/
xmTIE_gfmod_State_bank TIE_gfmod_State_bank0(ps_data_bank0_C1, ps_use1_bank0_C0,
  ns_def1_bank0_C0, ns_wdata_C1[7:0], ns_wen_C1, ns_wen_C2,
  Kill_E, KillPipe_W, Stall_R0, clk);
assign Stall_R = Stall_R0 | 1'b0;
endmodule
module xmTIE_gfmod_State_bank(ps_data_C1, ps_use1_C0, ns_def1_C0, ns_data_C1,
  ns_wen_C1, ns_wen_C2, Kill_E, KillPipe_W, Stall_R, clk);
output [7:0] ps_data_C1;
input ps_use1_C0;
input ns_def1_C0;
input [7:0] ns_data_C1;
input ns_wen_C1;
input ns_wen_C2;
input Kill_E;
input KillPipe_W;
output Stall_R;
input clk;
wire ps_addr_C0 = 1'd0;
wire ps_use2_C0 = 1'd0;
wire ns_addr_C0 = 1'd0;
wire ns_def2_C0 = 1'd0;
wire [7:0] ns_data_C2 = 0;
wire kill_C0 = KillPipe_W;
wire kill_C1 = KillPipe_W | Kill_E;
wire kill_C2 = KillPipe_W;
wire kill_C3 = KillPipe_W;
// write definition pipeline
wire ns_ns_def1_C0 = ns_def1_C0 & 1'b1 & ~kill_C0;
wire ns_def1_C1;
xtdelay1 #(1) ins_def1_C1(ns_def1_C1, ns_ns_def1_C0, clk);
wire ns_ns_def2_C0 = 1'd0;
wire ns_def2_C1 = 1'd0;
wire ns_ns_def2_C1 = 1'd0;
wire ns_def2_C2 = 1'd0;
// write enable pipeline
wire ns_we_C2;
wire ns_we_C3;
wire ns_ns_we_C1 = (1'd0 | (ns_def1_C1 & ns_wen_C1)) & ~kill_C1;
wire ns_ns_we_C2 = (ns_we_C2 | (ns_def2_C2 & ns_wen_C2)) & ~kill_C2;
wire ns_ns_we_C3 = (ns_we_C3 | (1'd0 & 1'd0)) & ~kill_C3;
xtdelay1 #(1) ins_we_C2(ns_we_C2, ns_ns_we_C1, clk);
xtdelay1 #(1) ins_we_C3(ns_we_C3, ns_ns_we_C2, clk);
// write address pipeline
wire ns_addr_C1;
wire ns_addr_C2;
wire ns_addr_C3;
assign ns_addr_C1 = 1'd0;
assign ns_addr_C2 = 1'd0;
assign ns_addr_C3 = 1'd0;
// write data pipeline
wire [7:0] ns_result_C2;
wire [7:0] ns_result_C3;
wire [7:0] ns_mux_C1 = ns_data_C1;
wire [7:0] ns_mux_C2 = ns_def2_C2 ? ns_data_C2 : ns_result_C2;
xtdelay1 #(8) ins_result_C2(ns_result_C2, ns_mux_C1, clk);
xtdelay1 #(8) ins_result_C3(ns_result_C3, ns_mux_C2, clk);
wire [7:0] ps_data_C0;
xtdelay1 #(8) ips_data_C1(ps_data_C1, ps_data_C0, clk);
assign Stall_R =
  ((ns_addr_C1 == ps_addr_C0) & (
    (ps_use1_C0 & (ns_ns_def2_C1)))) |
  1'b0;
// verification register file replacement
wire [7:0] xns_verify;
xtenflop #(8) ns_verify(xns_verify, ns_result_C3, ns_ns_we_C3, clk);
xtflop #(8) ps_verify(ps_data_C0, xns_verify, clk);
endmodule
module xmTIE_decoder (
GFADD8 ,
GFADD8I,
GFMULX8 ,
GFRWMOD8 ,
LGF8_I,
SGF8_I,
LGF8_IU,
SGF8_IU,
LGF8_X,
SGF8_X,
LGF8_XU,
SGF8_XU,
Figure imgf000321_0001
WUR0_semantic, load_instruction, store_instruction,
TIE_Inst,
Inst
);
output GFADD8;
output GFADD8I;
output GFMULX8;
output GFRWMOD8;
output LGF8_I;
output SGF8_I;
output LGF8_IU;
output SGF8_IU;
output LGF8_X;
output SGF8_X;
output LGF8_XU;
output SGF8_XU;
output RUR0;
output WUR0;
output [31:0] imm4;
output [7:0] imm8;
output art_use;
output art_def;
output ars_use;
output ars_def;
output arr_use;
output arr_def;
output br_use;
output br_def;
output bs_use;
output bs_def;
output bt_use;
output bt_def;
output bs4_use;
output bs4_def;
output bs8_use;
output bs8_def;
output gr_use;
output gr_def;
output gs_use;
output gs_def;
output gt_use;
output gt_def;
output gfmod_use1;
output gfmod_def1;
output AR_rd0_use1;
output AR_rd0_width32;
output AR_rd1_use1;
output AR_rd1_width32;
output AR_wd_def1;
output AR_wd_width32;
output [3:0] gf_rd0_addr;
output gf_rd0_use1;
output gf_rd0_width8;
output [3:0] gf_rd1_addr;
output gf_rd1_use1;
output gf_rd1_width8;
output [3:0] gf_rd2_addr;
output gf_rd2_use1;
output gf_rd2_width8;
output [3:0] gf_wd_addr;
output gf_wd_def2;
output gf_wd_def1;
output gf_wd_width8;
output GFADD8_semantic;
output GFADD8I_semantic;
output GFMULX8_semantic;
output GFRWMOD8_semantic;
output LGF8_I_semantic;
output LGF8_IU_semantic;
output LGF8_X_semantic;
output LGF8_XU_semantic;
output SGF8_I_semantic;
output SGF8_IU_semantic;
output SGF8_X_semantic;
output SGF8_XU_semantic;
output RUR0_semantic;
output WUR0_semantic;
output load_instruction;
output store_instruction;
output TIE_Inst;
input [23:0] Inst;
wire [3:0] op2 = {Inst[23:20]};
wire [3:0] op1 = {Inst[19:16]};
wire [3:0] op0 = {Inst[3:0]};
wire QRST = (op0==4'b0000);
wire CUST0 = (op1==4'b0110) & QRST;
assign GFADD8 = (op2==4'b0000) & CUST0;
assign GFADD8I = (op2==4'b0100) & CUST0;
assign GFMULX8 = (op2==4'b0001) & CUST0;
assign GFRWMOD8 = (op2==4'b0010) & CUST0;
wire [3:0] r = {Inst[15:12]};
wire LSCI = (op0==4'b0011);
assign LGF8_I = (r==4'b0000) & LSCI;
assign SGF8_I = (r==4'b0001) & LSCI;
assign LGF8_IU = (r==4'b0010) & LSCI;
assign SGF8_IU = (r==4'b0011) & LSCI;
wire LSCX = (op1==4'b1000) & QRST;
assign LGF8_X = (op2==4'b0000) & LSCX;
OOl) & LSCX; OOlO) & LSCX; 0011) & LSCX; }; ; & QRST; & RST3; OOOO) & RUR; & RST3 ; OOOO) & WUR; 8 | GFRWMOD8 RURO I'bO; D8 | WURO | .1 bO;
Figure imgf000323_0001
LGF8_XU | SGF8_XU | WURO | I'bO; | LGF8_XU | SGF8_XU | I'bO; LGF8_IU | SGF8_IU | LGF8_X | SGF8_X
Figure imgf000324_0001
assign arr_use = 1'b0;
assign br_def = 1'b0;
assign br_use = 1'b0;
assign bs_def = 1'b0;
assign bs_use = 1'b0;
assign bt_def = 1'b0;
assign bt_use = 1'b0;
assign bs4_def = 1'b0;
assign bs4_use = 1'b0;
assign bs8_def = 1'b0;
assign bs8_use = 1'b0;
assign gr_def = GFADD8 | GFADD8I | GFMULX8 | LGF8_X | LGF8_XU | 1'b0;
assign gr_use = SGF8_X | SGF8_XU | 1'b0;
assign gs_def = 1'b0;
assign gs_use = GFADD8 | GFADD8I | GFMULX8 | 1'b0;
assign gt_def = GFRWMOD8 | LGF8_I | LGF8_IU | 1'b0;
assign gt_use = GFADD8 | GFRWMOD8 | SGF8_I | SGF8_IU | 1'b0;
wire [3:0] gr_addr = r;
wire [3:0] gs_addr = s;
wire [3:0] gt_addr = t;
assign gf_wd_addr = 4'b0
  | {4{gr_def}} & gr_addr
  | {4{gt_def}} & gt_addr;
assign gf_rd0_addr = gs_addr;
assign gf_rd1_addr = gt_addr;
assign gf_rd2_addr = gr_addr;
assign GFADD8_semantic = GFADD8 | 1'b0;
assign GFADD8I_semantic = GFADD8I | 1'b0;
assign GFMULX8_semantic = GFMULX8 | 1'b0;
assign GFRWMOD8_semantic = GFRWMOD8 | 1'b0;
assign LGF8_I_semantic = LGF8_I | 1'b0;
assign LGF8_IU_semantic = LGF8_IU | 1'b0;
assign LGF8_X_semantic = LGF8_X | 1'b0;
assign LGF8_XU_semantic = LGF8_XU | 1'b0;
assign SGF8_I_semantic = SGF8_I | 1'b0;
assign SGF8_IU_semantic = SGF8_IU | 1'b0;
assign SGF8_X_semantic = SGF8_X | 1'b0;
assign SGF8_XU_semantic = SGF8_XU | 1'b0;
assign RUR0_semantic = RUR0 | 1'b0;
assign WUR0_semantic = WUR0 | 1'b0;
assign imm4 = t;
wire [7:0] imm8 = {Inst[23:16]};
assign load_instruction = 1'b0
  | LGF8_I
  | LGF8_IU
  | LGF8_X
  | LGF8_XU;
assign store_instruction = 1'b0
  | SGF8_I
  | SGF8_IU
  | SGF8_X
  | SGF8_XU;
assign TIE_Inst = 1'b0
  | GFADD8
  | GFADD8I
  | GFMULX8
  | GFRWMOD8
  | LGF8_I
  | SGF8_I
  | LGF8_IU
  | SGF8_IU
  | LGF8_X
  | SGF8_X
  | LGF8_XU
  | SGF8_XU
  | RUR0
  | WUR0;
endmodule

module xmTIE_GFADD8(
GFADD8_C0, gr_o_C1, gr_kill_C1, gs_i_C1, gt_i_C1, clk
);
input GFADD8_C0;
output [7:0] gr_o_C1;
output gr_kill_C1;
input [7:0] gs_i_C1;
input [7:0] gt_i_C1;
input clk;
assign gr_o_C1 = (gs_i_C1) ^ (gt_i_C1);
wire GFADD8_C1;
xtdelay1 #(1) iGFADD8_C1(.xtin(GFADD8_C0), .xtout(GFADD8_C1), .clk(clk));
assign gr_kill_C1 = (1'b0) & (GFADD8_C1);
endmodule

module xmTIE_GFADD8I(
GFADD8I_C0, gr_o_C1, gr_kill_C1, gs_i_C1, imm4_C0, clk
);
input GFADD8I_C0;
output [7:0] gr_o_C1;
output gr_kill_C1;
input [7:0] gs_i_C1;
input [31:0] imm4_C0;
input clk;
wire [31:0] imm4_C1;
xtdelay1 #(32) iimm4_C1(.xtin(imm4_C0), .xtout(imm4_C1), .clk(clk));
assign gr_o_C1 = (gs_i_C1) ^ (imm4_C1);
wire GFADD8I_C1;
xtdelay1 #(1) iGFADD8I_C1(.xtin(GFADD8I_C0), .xtout(GFADD8I_C1), .clk(clk));
assign gr_kill_C1 = (1'b0) & (GFADD8I_C1);
endmodule

module xmTIE_GFMULX8(
GFMULX8_C0, gr_o_C1, gr_kill_C1, gs_i_C1, gfmod_ps_C1, clk
);
input GFMULX8_C0;
output [7:0] gr_o_C1;
output gr_kill_C1;
input [7:0] gs_i_C1;
input [7:0] gfmod_ps_C1;
input clk;
assign gr_o_C1 = (gs_i_C1[7]) ? (({gs_i_C1[6:0], 1'b0}) ^ (gfmod_ps_C1)) : ({gs_i_C1[6:0], 1'b0});
wire GFMULX8_C1;
xtdelay1 #(1) iGFMULX8_C1(.xtin(GFMULX8_C0), .xtout(GFMULX8_C1), .clk(clk));
assign gr_kill_C1 = (1'b0) & (GFMULX8_C1);
endmodule

module xmTIE_GFRWMOD8(
GFRWMOD8_C0, gt_i_C1, gt_o_C1, gt_kill_C1, gfmod_ps_C1, gfmod_ns_C1, gfmod_kill_C1, clk
);
input GFRWMOD8_C0;
input [7:0] gt_i_C1;
output [7:0] gt_o_C1;
output gt_kill_C1;
input [7:0] gfmod_ps_C1;
output [7:0] gfmod_ns_C1;
output gfmod_kill_C1;
input clk;
wire [7:0] t1_C1;
assign t1_C1 = gt_i_C1;
wire [7:0] t2_C1;
assign t2_C1 = gfmod_ps_C1;
assign gfmod_ns_C1 = t1_C1;
assign gt_o_C1 = t2_C1;
wire GFRWMOD8_C1;
xtdelay1 #(1) iGFRWMOD8_C1(.xtin(GFRWMOD8_C0), .xtout(GFRWMOD8_C1), .clk(clk));
assign gfmod_kill_C1 = (1'b0) & (GFRWMOD8_C1);
assign gt_kill_C1 = (1'b0) & (GFRWMOD8_C1);
endmodule

module xmTIE_LGF8_I(
LGF8_I_C0, gt_o_C2, gt_kill_C2, ars_i_C1, imm8_C0,
MemDataIn8_C2,
LSSize_C0,
VAddrBase_C1,
VAddrOffset_C0,
LSIndexed_C0, clk
);
input LGF8_I_C0;
output [7:0] gt_o_C2;
output gt_kill_C2;
input [31:0] ars_i_C1;
input [7:0] imm8_C0;
input [7:0] MemDataIn8_C2;
output [4:0] LSSize_C0;
output [31:0] VAddrBase_C1;
output [31:0] VAddrOffset_C0;
output LSIndexed_C0;
input clk;
assign LSSize_C0 = 32'h1;
assign VAddrBase_C1 = ars_i_C1;
assign LSIndexed_C0 = 1'b0;
assign VAddrOffset_C0 = imm8_C0;
assign gt_o_C2 = MemDataIn8_C2;
wire LGF8_I_C2;
xtdelay2 #(1) iLGF8_I_C2(.xtin(LGF8_I_C0), .xtout(LGF8_I_C2), .clk(clk));
assign gt_kill_C2 = (1'b0) & (LGF8_I_C2);
endmodule

module xmTIE_LGF8_IU(
LGF8_IU_C0, gt_o_C2, gt_kill_C2, ars_i_C1, ars_o_C1, ars_kill_C1, imm8_C0,
MemDataIn8_C2,
VAddrIn_C1,
LSSize_C0,
VAddrBase_C1,
VAddrOffset_C0,
LSIndexed_C0, clk
);
input LGF8_IU_C0;
output [7:0] gt_o_C2;
output gt_kill_C2;
input [31:0] ars_i_C1;
output [31:0] ars_o_C1;
output ars_kill_C1;
input [7:0] imm8_C0;
input [7:0] MemDataIn8_C2;
input [31:0] VAddrIn_C1;
output [4:0] LSSize_C0;
output [31:0] VAddrBase_C1;
output [31:0] VAddrOffset_C0;
output LSIndexed_C0;
input clk;
assign LSSize_C0 = 32'h1;
assign VAddrBase_C1 = ars_i_C1;
assign LSIndexed_C0 = 1'b0;
assign VAddrOffset_C0 = imm8_C0;
assign gt_o_C2 = MemDataIn8_C2;
assign ars_o_C1 = VAddrIn_C1;
wire LGF8_IU_C2;
xtdelay2 #(1) iLGF8_IU_C2(.xtin(LGF8_IU_C0), .xtout(LGF8_IU_C2), .clk(clk));
assign gt_kill_C2 = (1'b0) & (LGF8_IU_C2);
wire LGF8_IU_C1;
xtdelay1 #(1) iLGF8_IU_C1(.xtin(LGF8_IU_C0), .xtout(LGF8_IU_C1), .clk(clk));
assign ars_kill_C1 = (1'b0) & (LGF8_IU_C1);
endmodule

module xmTIE_LGF8_X(
LGF8_X_C0, gr_o_C2, gr_kill_C2, ars_i_C1, art_i_C1,
MemDataIn8_C2,
VAddrIn_C1,
LSSize_C0,
VAddrBase_C1,
VAddrIndex_C1,
LSIndexed_C0, clk
);
input LGF8_X_C0;
output [7:0] gr_o_C2;
output gr_kill_C2;
input [31:0] ars_i_C1;
input [31:0] art_i_C1;
input [7:0] MemDataIn8_C2;
input [31:0] VAddrIn_C1;
output [4:0] LSSize_C0;
output [31:0] VAddrBase_C1;
output [31:0] VAddrIndex_C1;
output LSIndexed_C0;
input clk;
assign LSSize_C0 = 32'h1;
assign VAddrBase_C1 = ars_i_C1;
assign LSIndexed_C0 = 1'b1;
assign VAddrIndex_C1 = art_i_C1;
assign gr_o_C2 = MemDataIn8_C2;
assign ars_o_C1 = VAddrIn_C1;
wire LGF8_X_C2;
xtdelay2 #(1) iLGF8_X_C2(.xtin(LGF8_X_C0), .xtout(LGF8_X_C2), .clk(clk));
assign gr_kill_C2 = (1'b0) & (LGF8_X_C2);
endmodule

module xmTIE_LGF8_XU(
LGF8_XU_C0, gr_o_C2, gr_kill_C2, ars_i_C1, ars_o_C1, ars_kill_C1, art_i_C1,
MemDataIn8_C2,
VAddrIn_C1,
LSSize_C0,
VAddrBase_C1,
VAddrIndex_C1,
LSIndexed_C0, clk
);
input LGF8_XU_C0;
output [7:0] gr_o_C2;
output gr_kill_C2;
input [31:0] ars_i_C1;
output [31:0] ars_o_C1;
output ars_kill_C1;
input [31:0] art_i_C1;
input [7:0] MemDataIn8_C2;
input [31:0] VAddrIn_C1;
output [4:0] LSSize_C0;
output [31:0] VAddrBase_C1;
output [31:0] VAddrIndex_C1;
output LSIndexed_C0;
input clk;
assign LSSize_C0 = 32'h1;
assign VAddrBase_C1 = ars_i_C1;
assign LSIndexed_C0 = 1'b1;
assign VAddrIndex_C1 = art_i_C1;
assign gr_o_C2 = MemDataIn8_C2;
assign ars_o_C1 = VAddrIn_C1;
wire LGF8_XU_C2;
xtdelay2 #(1) iLGF8_XU_C2(.xtin(LGF8_XU_C0), .xtout(LGF8_XU_C2), .clk(clk));
assign gr_kill_C2 = (1'b0) & (LGF8_XU_C2);
wire LGF8_XU_C1;
xtdelay1 #(1) iLGF8_XU_C1(.xtin(LGF8_XU_C0), .xtout(LGF8_XU_C1), .clk(clk));
assign ars_kill_C1 = (1'b0) & (LGF8_XU_C1);
endmodule

module xmTIE_SGF8_I(
SGF8_I_C0, gt_i_C1, ars_i_C1, imm8_C0,
LSSize_C0,
MemDataOut8_C1,
VAddrBase_C1,
VAddrOffset_C0,
LSIndexed_C0, clk
);
input SGF8_I_C0;
input [7:0] gt_i_C1;
input [31:0] ars_i_C1;
input [7:0] imm8_C0;
output [4:0] LSSize_C0;
output [7:0] MemDataOut8_C1;
output [31:0] VAddrBase_C1;
output [31:0] VAddrOffset_C0;
output LSIndexed_C0;
input clk;
assign LSSize_C0 = 32'h1;
assign VAddrBase_C1 = ars_i_C1;
assign LSIndexed_C0 = 1'b0;
assign VAddrOffset_C0 = imm8_C0;
assign MemDataOut8_C1 = gt_i_C1;
endmodule

module xmTIE_SGF8_IU(
SGF8_IU_C0, gt_i_C1, ars_i_C1, ars_o_C1, ars_kill_C1, imm8_C0,
VAddrIn_C1,
LSSize_C0,
MemDataOut8_C1,
VAddrBase_C1,
VAddrOffset_C0,
LSIndexed_C0, clk
);
input SGF8_IU_C0;
input [7:0] gt_i_C1;
input [31:0] ars_i_C1;
output [31:0] ars_o_C1;
output ars_kill_C1;
input [7:0] imm8_C0;
input [31:0] VAddrIn_C1;
output [4:0] LSSize_C0;
output [7:0] MemDataOut8_C1;
output [31:0] VAddrBase_C1;
output [31:0] VAddrOffset_C0;
output LSIndexed_C0;
input clk;
assign LSSize_C0 = 32'h1;
assign VAddrBase_C1 = ars_i_C1;
assign LSIndexed_C0 = 1'b0;
assign VAddrOffset_C0 = imm8_C0;
assign MemDataOut8_C1 = gt_i_C1;
assign ars_o_C1 = VAddrIn_C1;
wire SGF8_IU_C1;
xtdelay1 #(1) iSGF8_IU_C1(.xtin(SGF8_IU_C0), .xtout(SGF8_IU_C1), .clk(clk));
assign ars_kill_C1 = (1'b0) & (SGF8_IU_C1);
endmodule

module xmTIE_SGF8_X(
SGF8_X_C0, gr_i_C1, ars_i_C1, art_i_C1,
LSSize_C0,
MemDataOut8_C1,
VAddrBase_C1,
VAddrIndex_C1,
LSIndexed_C0, clk
);
input SGF8_X_C0;
input [7:0] gr_i_C1;
input [31:0] ars_i_C1;
input [31:0] art_i_C1;
output [4:0] LSSize_C0;
output [7:0] MemDataOut8_C1;
output [31:0] VAddrBase_C1;
output [31:0] VAddrIndex_C1;
output LSIndexed_C0;
input clk;
assign LSSize_C0 = 32'h1;
assign VAddrBase_C1 = ars_i_C1;
assign LSIndexed_C0 = 1'b1;
assign VAddrIndex_C1 = art_i_C1;
assign MemDataOut8_C1 = gr_i_C1;
endmodule

module xmTIE_SGF8_XU(
SGF8_XU_C0, gr_i_C1, ars_i_C1, ars_o_C1, ars_kill_C1, art_i_C1,
VAddrIn_C1,
LSSize_C0,
MemDataOut8_C1,
VAddrBase_C1,
VAddrIndex_C1,
LSIndexed_C0, clk
);
input SGF8_XU_C0;
input [7:0] gr_i_C1;
input [31:0] ars_i_C1;
output [31:0] ars_o_C1;
output ars_kill_C1;
input [31:0] art_i_C1;
input [31:0] VAddrIn_C1;
output [4:0] LSSize_C0;
output [7:0] MemDataOut8_C1;
output [31:0] VAddrBase_C1;
output [31:0] VAddrIndex_C1;
output LSIndexed_C0;
input clk;
assign LSSize_C0 = 32'h1;
assign VAddrBase_C1 = ars_i_C1;
assign LSIndexed_C0 = 1'b1;
assign VAddrIndex_C1 = art_i_C1;
assign MemDataOut8_C1 = gr_i_C1;
assign ars_o_C1 = VAddrIn_C1;
wire SGF8_XU_C1;
xtdelay1 #(1) iSGF8_XU_C1(.xtin(SGF8_XU_C0), .xtout(SGF8_XU_C1), .clk(clk));
assign ars_kill_C1 = (1'b0) & (SGF8_XU_C1);
endmodule

module xmTIE_RUR0(
RUR0_C0, arr_o_C1, arr_kill_C1, gfmod_ps_C1, clk
);
input RUR0_C0;
output [31:0] arr_o_C1;
output arr_kill_C1;
input [7:0] gfmod_ps_C1;
input clk;
assign arr_o_C1 = {gfmod_ps_C1};
wire RUR0_C1;
xtdelay1 #(1) iRUR0_C1(.xtin(RUR0_C0), .xtout(RUR0_C1), .clk(clk));
assign arr_kill_C1 = (1'b0) & (RUR0_C1);
endmodule

module xmTIE_WUR0(
WUR0_C0, art_i_C1, gfmod_ns_C1, gfmod_kill_C1,
Figure imgf000333_0001
Kill_E, Except_W, Replay_W, GIWCLK, Reset
);
output TIE_inst_R;
output TIE_asRead_R;
output TIE_atRead_R;
output TIE_atWrite_R;
output TIE_arWrite_R;
output TIE_asWrite_R;
output TIE_aWriteM_R;
output TIE_aDataKill_E;
output [31:0] TIE_aWriteData_E;
output TIE_aDataKill_M;
output [31:0] TIE_aWriteData_M;
output TIE_Load_R;
output TIE_Store_R;
output [4:0] TIE_LSSize_R;
output TIE_LSIndexed_R;
output [31:0] TIE_LSOffset_R;
input [127:0] TIE_MemLoadData_M;
output [7:0] TIE_MemStoreData8_E;
output [15:0] TIE_MemStoreData16_E;
output [31:0] TIE_MemStoreData32_E;
output [63:0] TIE_MemStoreData64_E;
output [127:0] TIE_MemStoreData128_E;
output TIE_Stall_R;
output TIE_Exception_E;
output [5:0] TIE_ExcCause_E;
output TIE_bsRead_R;
output TIE_btRead_R;
output TIE_btWrite_R;
output TIE_brWrite_R;
output TIE_bsWrite_R;
output [4:0] TIE_bsReadSize_R;
output [4:0] TIE_btReadSize_R;
output [4:0] TIE_bWriteSize_R;
input [15:0] TIE_bsReadData_E;
input [15:0] TIE_btReadData_E;
output TIE_bWriteData1_E;
output [1:0] TIE_bWriteData2_E;
output [3:0] TIE_bWriteData4_E;
output [7:0] TIE_bWriteData8_E;
output [15:0] TIE_bWriteData16_E;
output TIE_bDataKill_E;
input [7:0] CPEnable;
input [23:0] Instr_R;
input [31:0] SBus_E;
input [31:0] TBus_E;
input [31:0] MemOpAddr_E;
input Kill_E;
input Except_W;
input Replay_W;
input GIWCLK;
input Reset;
// unused signals
wire TMode = 0;
// control signals
wire KillPipe_W;
wire clk;
// decoded signals
wire GFADD8_C0;
wire GFADD8I_C0;
wire GFMULX8_C0;
wire GFRWMOD8_C0;
wire LGF8_I_C0;
wire SGF8_I_C0;
wire LGF8_IU_C0;
wire SGF8_IU_C0;
wire LGF8_X_C0;
wire SGF8_X_C0;
wire LGF8_XU_C0;
wire SGF8_XU_C0;
wire RUR0_C0;
wire WUR0_C0;
wire [31:0] imm4_C0;
wire [7:0] imm8_C0;
wire art_use_C0;
wire art_def_C0;
wire ars_use_C0;
wire ars_def_C0;
wire arr_use_C0;
wire arr_def_C0;
wire br_use_C0;
wire br_def_C0;
wire bs_use_C0;
wire bs_def_C0;
wire bt_use_C0;
wire bt_def_C0;
wire bs4_use_C0;
0;
0 ;
Figure imgf000335_0001
0; wire gf_rd2_usel_C0 ; wire gf_rd2_width8_C0; wire [3:0] gf_wd_addr_C0 ; wire gf_wd_def2_C0; wire gf_wd_def1_C0; wire gf_wd_width8_C0 ,- wire GFADD8_semantic_C0; wire GFADD8I_semantic_C0; wire GFMULX8_semantic_C0; wire GFRWMOD8_semantic_C0 ; wire LGF8_I_semantic_C0; wire LGF8_IU_semantic_C0; wire LGF8_X_semantic_C0 ; wire LGF8_XU_semantic_C0 ; wire SGF8_I_semantic_C0; wire SGF8_IU_semantic_C0 ; wire SGF8JX_semantic_C0; wire SGF8JXU_semantic_C0 ; wire RU 0_semantic_C0 ; wire WUR0_semantic_C0 ; wire load_instruction_C0; wire store_instruction_CO; wire TIE_Inst_C0; wire [23:0] Inst CO;
// state data, write-enable and stall signals
wire [7:0] gfmod_ps_C1;
wire [7:0] gfmod_ns_C1;
wire gfmod_kill_C1;
wire gfmod_Stall_C1;
// register data, write-enable and stall signals
wire [31:0] AR_rd0_data_C1;
wire [31:0] AR_rd1_data_C1;
wire [31:0] AR_wd_data32_C1;
wire AR_wd_kill_C1;
wire [7:0] gf_rd0_data_C1;
wire [7:0] gf_rd1_data_C1;
wire [7:0] gf_rd2_data_C1;
wire [7:0] gf_wd_data8_C2;
wire gf_wd_kill_C2;
wire [7:0] gf_wd_data8_C1;
wire gf_wd_kill_C1;
wire gf_Stall_C1;
// operands
wire [31:0] art_i_C1;
wire [31:0] art_o_C1;
wire art_kill_C1;
wire [31:0] ars_i_C1;
wire [31:0] ars_o_C1;
wire ars_kill_C1;
wire [31:0] arr_o_C1;
wire arr_kill_C1;
wire [7:0] gr_i_C1;
wire [7:0] gr_o_C2;
wire gr_kill_C2;
wire [7:0] gr_o_C1;
wire gr_kill_C1;
wire [7:0] gs_i_C1;
wire [7:0] gt_i_C1;
wire [7:0] gt_o_C2;
wire gt_kill_C2;
wire [7:0] gt_o_C1;
wire gt_kill_C1;
// output state of semantic GFADD8
// output interface of semantic GFADD8
// output operand of semantic GFADD8
wire [7:0] GFADD8_gr_o_C1;
wire GFADD8_gr_kill_C1;
// output state of semantic GFADD8I
// output interface of semantic GFADD8I
// output operand of semantic GFADD8I
wire [7:0] GFADD8I_gr_o_C1;
wire GFADD8I_gr_kill_C1;
// output state of semantic GFMULX8
// output interface of semantic GFMULX8
// output operand of semantic GFMULX8
wire [7:0] GFMULX8_gr_o_C1;
wire GFMULX8_gr_kill_C1;
// output state of semantic GFRWMOD8
wire [7:0] GFRWMOD8_gfmod_ns_C1;
wire GFRWMOD8_gfmod_kill_C1;
// output interface of semantic GFRWMOD8
// output operand of semantic GFRWMOD8
wire [7:0] GFRWMOD8_gt_o_C1;
wire GFRWMOD8_gt_kill_C1;
// output state of semantic LGF8_I
// output interface of semantic LGF8_I
wire [4:0] LGF8_I_LSSize_C0;
wire [31:0] LGF8_I_VAddrBase_C1;
wire [31:0] LGF8_I_VAddrOffset_C0;
wire LGF8_I_LSIndexed_C0;
// output operand of semantic LGF8_I
wire [7:0] LGF8_I_gt_o_C2;
wire LGF8_I_gt_kill_C2;
// output state of semantic LGF8_IU
// output interface of semantic LGF8_IU
wire [4:0] LGF8_IU_LSSize_C0;
wire [31:0] LGF8_IU_VAddrBase_C1;
wire [31:0] LGF8_IU_VAddrOffset_C0;
wire LGF8_IU_LSIndexed_C0;
// output operand of semantic LGF8_IU
wire [7:0] LGF8_IU_gt_o_C2;
wire LGF8_IU_gt_kill_C2;
wire [31:0] LGF8_IU_ars_o_C1;
wire LGF8_IU_ars_kill_C1;
// output state of semantic LGF8_X
// output interface of semantic LGF8_X
wire [4:0] LGF8_X_LSSize_C0;
wire [31:0] LGF8_X_VAddrBase_C1;
wire [31:0] LGF8_X_VAddrIndex_C1;
wire LGF8_X_LSIndexed_C0;
// output operand of semantic LGF8_X
wire [7:0] LGF8_X_gr_o_C2;
wire LGF8_X_gr_kill_C2;
// output state of semantic LGF8_XU
// output interface of semantic LGF8_XU
wire [4:0] LGF8_XU_LSSize_C0;
wire [31:0] LGF8_XU_VAddrBase_C1;
wire [31:0] LGF8_XU_VAddrIndex_C1;
wire LGF8_XU_LSIndexed_C0;
// output operand of semantic LGF8_XU
wire [7:0] LGF8_XU_gr_o_C2;
wire LGF8_XU_gr_kill_C2;
wire [31:0] LGF8_XU_ars_o_C1;
wire LGF8_XU_ars_kill_C1;
// output state of semantic SGF8_I
// output interface of semantic SGF8_I
wire [4:0] SGF8_I_LSSize_C0;
wire [7:0] SGF8_I_MemDataOut8_C1;
wire [31:0] SGF8_I_VAddrBase_C1;
wire [31:0] SGF8_I_VAddrOffset_C0;
wire SGF8_I_LSIndexed_C0;
// output operand of semantic SGF8_I
// output state of semantic SGF8_IU
// output interface of semantic SGF8_IU
wire [4:0] SGF8_IU_LSSize_C0;
wire [7:0] SGF8_IU_MemDataOut8_C1;
wire [31:0] SGF8_IU_VAddrBase_C1;
wire [31:0] SGF8_IU_VAddrOffset_C0;
wire SGF8_IU_LSIndexed_C0;
// output operand of semantic SGF8_IU
wire [31:0] SGF8_IU_ars_o_C1;
wire SGF8_IU_ars_kill_C1;
// output state of semantic SGF8_X
// output interface of semantic SGF8_X
wire [4:0] SGF8_X_LSSize_C0;
wire [7:0] SGF8_X_MemDataOut8_C1;
wire [31:0] SGF8_X_VAddrBase_C1;
wire [31:0] SGF8_X_VAddrIndex_C1;
wire SGF8_X_LSIndexed_C0;
// output operand of semantic SGF8_X
// output state of semantic SGF8_XU
// output interface of semantic SGF8_XU
wire [4:0] SGF8_XU_LSSize_C0;
wire [7:0] SGF8_XU_MemDataOut8_C1;
wire [31:0] SGF8_XU_VAddrBase_C1;
wire [31:0] SGF8_XU_VAddrIndex_C1;
wire SGF8_XU_LSIndexed_C0;
// output operand of semantic SGF8_XU
wire [31:0] SGF8_XU_ars_o_C1;
wire SGF8_XU_ars_kill_C1;
// output state of semantic RUR0
// output interface of semantic RUR0
// output operand of semantic RUR0
wire [31:0] RUR0_arr_o_C1;
wire RUR0_arr_kill_C1;
// output state of semantic WUR0
wire [7:0] WUR0_gfmod_ns_C1;
wire WUR0_gfmod_kill_C1;
// output interface of semantic WUR0
// output operand of semantic WUR0
// TIE-defined interface signals
wire [31:0] VAddr_C1;
wire [31:0] VAddrBase_C1;
wire [31:0] VAddrOffset_C0;
wire [31:0] VAddrIndex_C1;
wire [31:0] VAddrIn_C1;
wire [4:0] LSSize_C0;
wire LSIndexed_C0;
wire [127:0] MemDataIn128_C2;
wire [63:0] MemDataIn64_C2;
wire [31:0] MemDataIn32_C2;
wire [15:0] MemDataIn16_C2;
wire [7:0] MemDataIn8_C2;
wire [127:0] MemDataOut128_C1;
wire [63:0] MemDataOut64_C1;
wire [31:0] MemDataOut32_C1;
wire [15:0] MemDataOut16_C1;
wire [7:0] MemDataOut8_C1;
wire Exception_C1;
wire [5:0] ExcCause_C1;
wire [7:0] CPEnable_C1;
xtflop #(1) reset(localReset, Reset, GIWCLK);
xmTIE_decoder TIE_decoder (
.GFADD8(GFADD8_C0),
.GFADD8I(GFADD8I_C0),
.GFMULX8(GFMULX8_C0),
.GFRWMOD8(GFRWMOD8_C0),
.LGF8_I(LGF8_I_C0),
.SGF8_I(SGF8_I_C0),
.LGF8_IU(LGF8_IU_C0),
.SGF8_IU(SGF8_IU_C0),
.LGF8_X(LGF8_X_C0),
.SGF8_X(SGF8_X_C0),
.LGF8_XU(LGF8_XU_C0),
.SGF8_XU(SGF8_XU_C0),
.RUR0(RUR0_C0),
.WUR0(WUR0_C0),
. imm4 (imm4_C0) ,
. imm8 (imm8_C0) ,
.art_use (art_use_C0) ,
.art_def (art_def_C0) ,
.ars_use (ars_use_C0) ,
.ars_def (ars_def_C0) ,
.arr_use (arr_use_C0) ,
.arr_def (arr_def_C0) ,
.br_use (br_use_C0) ,
.br_def (br_def_C0) ,
.bs_use (bs_use_C0) ,
.bs_def (bs_def_C0) ,
.bt_use (bt_use_C0) ,
.bt_def (bt_def_C0) ,
.bs4_use (bs4_use_C0) ,
.bs4_def (bs4_def_C0) ,
.bs8_use (bs8_use_C0) ,
.bs8_def (bs8_def_C0) ,
.gr_use (gr_use_C0) ,
.gr_def (gr_def_C0) ,
. gs_use (gs_use_C0) ,
.gs_def (gs_def_C0) ,
.gt_use (gt_use_C0) ,
.gt_def (gt_def_C0) ,
.gfmod_use1(gfmod_use1_C0),
.gfmod_def1(gfmod_def1_C0),
.AR_rd0_use1(AR_rd0_use1_C0),
.AR_rd0_width32(AR_rd0_width32_C0),
.AR_rd1_use1(AR_rd1_use1_C0),
.AR_rd1_width32(AR_rd1_width32_C0),
.AR_wd_def1(AR_wd_def1_C0),
.AR_wd_width32(AR_wd_width32_C0),
.gf_rd0_addr(gf_rd0_addr_C0),
.gf_rd0_use1(gf_rd0_use1_C0),
.gf_rd0_width8(gf_rd0_width8_C0),
.gf_rd1_addr(gf_rd1_addr_C0),
.gf_rd1_use1(gf_rd1_use1_C0),
.gf_rd1_width8(gf_rd1_width8_C0),
.gf_rd2_addr(gf_rd2_addr_C0),
.gf_rd2_use1(gf_rd2_use1_C0),
.gf_rd2_width8(gf_rd2_width8_C0),
.gf_wd_addr(gf_wd_addr_C0),
.gf_wd_def2(gf_wd_def2_C0),
.gf_wd_def1(gf_wd_def1_C0),
.gf_wd_width8(gf_wd_width8_C0),
.GFADD8_semantic(GFADD8_semantic_C0),
.GFADD8I_semantic(GFADD8I_semantic_C0),
.GFMULX8_semantic(GFMULX8_semantic_C0),
.GFRWMOD8_semantic(GFRWMOD8_semantic_C0),
.LGF8_I_semantic(LGF8_I_semantic_C0),
.LGF8_IU_semantic(LGF8_IU_semantic_C0),
.LGF8_X_semantic(LGF8_X_semantic_C0),
.LGF8_XU_semantic(LGF8_XU_semantic_C0),
.SGF8_I_semantic(SGF8_I_semantic_C0),
.SGF8_IU_semantic(SGF8_IU_semantic_C0),
.SGF8_X_semantic(SGF8_X_semantic_C0),
.SGF8_XU_semantic(SGF8_XU_semantic_C0),
.RUR0_semantic(RUR0_semantic_C0),
.WUR0_semantic(WUR0_semantic_C0),
.load_instruction(load_instruction_C0),
.store_instruction(store_instruction_C0),
.TIE_Inst(TIE_Inst_C0),
.Inst(Inst_C0)
); xmTIE_GFADD8 TIE_GFADD8 (
.GFADD8_C0 (GFADD8_C0) , .gr_o_Cl(GFADD8_gr_o_Cl) , .grjH_ill_cl (GFADD8_gr_kill_Cl) , .gs_i_Cl (gs_i_Cl) , •gt_i_Cl(gt_i_Cl) , .elk (elk) ) ; xmTIE_GFADD8I TIE_GFADD8I (
.GFADD8I_C0 (GFADD8I_C0) , .gr_o_Cl(GFADD8I_gr_o_Cl) , .gr_kill_Cl (GFADD8I_gr_kill_Cl) , .gs_i_Cl (gs_i_Cl) , . imm4_C0 (imm4_C0) , .elk (elk)) ; xmTIE_GFMULX8 TIE_GFMULX8 (
.GFMULX8_C0 (GFMULX8_C0) , .gr_o_Cl(GFMULX8_gr_o_Cl) , .gr_kill_Cl (GFMULX8_gr_kill_Cl) , .gs_i_Cl (gs_i_Cl) , . gfmod_ps_Cl (gfmod_ps_Cl) , .elk (elk) ) ; xmTIE_GFRWMOD8 TIE_GFRWMOD8 (
. GFRWMOD8_C0 (GFRWMOD8_C0) , -gt_i_Cl(gt_i_Cl) , .gt_o_Cl(GFRWMOD8_gt_o_Cl) , .gtJill_Cl (GFRWMOD8_gt_kill_Cl) , . gfmod__ps_Cl (gfmod_ps_Cl) , .gfmod_ns_Cl (GFRWMOD8_gfmod_ns_Cl) , . gfmod_kill_Cl (GFRWMOD8_gfmodJill_Cl) .elk (elk) ) ,- xmTIE_LGF8_I TIE_LGF8_I (
.LGF8_I_C0 (LGF8_I_C0) ,
.gtJo_C2 (LGF8_I_gt_o_C2) ,
.gt_kill_C2 (LGF8_I_gt_kill_C2) ,
. ars_i_Cl (ars_i_Cl) ,
. imm8_C0 (imm8_C0) ,
. MemDataIn8_C2 (MemDataIn8_C2) ,
.LSSize_C0 (LGF8_I_LSSize_C0) , .VAddrBasejCl (LGF8_I_VAddrBase_Cl) , .VAddrOffset_C0 (LGF8_I_VAddrOffset_C0) , .LSIndexed_C0 (LGF8_I_LSIndexed_C0) , .elk (elk)) ; xmTIE_LGF8_IU TIE_LGF8_IU (
.LGF8_IU_C0 (LGF8_IU_C0) ,
.gt_0_C2 (LGF8_IU_gt_0_C2) ,
.gt_kill__C2 (LGF8_IU_gt_kill_C2) ,
. ars_i_Cl (ars_i_Cl) ,
.ars_o_Cl (LGF8_IU_ars_o_Cl) ,
.ars_kill_Cl (LGF8_IU_ars_kill_Cl) ,
.imm8_C0 (imm8_C0) ,
.MemDataIn8_C2 (MemDataIn8_C2) ,
.VAddrInj_:i (VAddrIn_Cl) ,
.LSSize_C0 (LGF8_IU_LSSize_C0) ,
.VAddrBaseJCI (LGF8_IU_VAddrBase_Cl) ,
.VAddrOffset_C0 (LGF8_IU_VAddrOffset_C0) ,
.LSIndexed_C0 (LGF8_IU_LSIndexed_C0) ,
.elk (elk)) ; xmTIE_LGF8JX TIE_LGF8_X(
.LGF8J_C0 (LGF8JX_C0) , . gr_o_C2 (LGF8_X_gr_o_C2) , .grJill_C2 (LGF8_X_grJill_C2) , .ars_i_Cl (ars_i_Cl) , .art_i_Cl (art_i_Cl) , .MemDataIn8_C2 (MemDataIn8_C2) , .VAddrln Cl ( AddrIn_Cl) , .LSSize_C0 (LGF8JX_LSSize_C0) , .VAddrBase_Cl (LGF8_X_VAddrBase_Cl) , . VAddrIndex_Cl (LGF8_X_VAddrIndex_Cl) , .LSIndexed_C0 (LGF8_X_LSIndexed__C0) , .elk (elk) ) ; xmTIE_LGF8JXU TIE_LGF8_XU(
.LGF8JXU_C0 (LGF8_XU_C0) ,
.gr_o_C2 (LGF8_XU_gr_o_C2) ,
.gr_kill_C2 (LGF8_XU_gr_kill_C2) ,
.ars_i_Cl (ars_i__Cl) ,
.ars_o_Cl (LGF8_XU_ars_o_Cl) ,
.ars_kill_Cl (LGF8_XU_ars_kill_Cl) ,
.art_i_Cl (art_i_Cl) ,
.MemDataIn8_C2 (MemDataIn8_C2) ,
.VAddrIn_Cl (VAddrIn_Cl) ,
.LSSize_C0 (LGF8_XU_LSSize_C0) ,
. VAddrBase_Cl (LGF8_XU_VAddrBase_Cl) ,
.VAddrIndex_Cl (LGF8_XU_VAddrIndex_Cl) ,
.LSIndexed_C0 (LGF8_XU_LSIndexed_C0) ,
.elk (elk)) ,- xmTIE_SGF8_I TIE_SGF8_I (
.SGF8_I_C0 (SGF8_I_C0) , -gt_i_Cl(gt_i_Cl) , .ars_i_Cl (ars_i_Cl) , .imm8_C0 (imm8_C0) , .LSSize_C0 (SGF8_I_LSSize_C0) , .MemDataOut8_Cl (SGF8_IJMemDataθut8_Cl) , .VAddrBase_Cl (SGF8_I_VAddrBase_Cl) , . VAddrOffset_C0 (SGF8_I_VAddrOffset_C0) , .LSIndexedjCO (SGF8_I_LSIndexed_C0) , .elk (elk)) ,- xmTIE_SGF8_IU TIE_SGF8_IU(
.SGF8_IU_C0 (SGF8_IU_C0) ,
.gt_i_Cl(gt_i_Cl) ,
.ars_i_Cl (ars_i_Cl) ,
. ars_o_Cl (SGF8_IU_ars_o_Cl) ,
.ars_kill_Cl (SGF8_IU_ars_kill_Cl) ,
. imm8_C0 (imm8_C0) ,
.VAddrIn_Cl (VAddrIn_Cl) ,
.LSSize_C0 (SGF8_IU_LSSize_C0) ,
.MemDataOut8_Cl (SGF8_IUJMemDataOut8_Cl) ,
.VAddrBase_Cl (SGF8_IU_VAddrBase_Cl) ,
. VAddrOffset_C0 (SGF8_IU_VAddrOffset_C0) ,
.LSIndexed_C0 (SGF8_IU_LSIndexed_C0) ,
.elk (elk)) ,- xmTIE_SGF8_X TIΞ_SGF8_X (
.SGF8_X_C0 (SGF8_X_C0) , .gr_i_Cl (gr_i_Cl) , .ars_i_Cl (ars_i_Cl) , .art_i_Cl (art_i_Cl) , .LSSize_C0 (SGF8_X_LSSize_C0) , .MemDataOut8_Cl (SGF8_X_MemDataOut8_Cl) , .VAddrBase_Cl (SGF8_X_VAddrB se_Cl) , .VAddrIndex_Cl (SGF8_X_VAddrIndex_Cl) , .LSIndexed_C0 (SGF8JX_LSIndexed_C0) , .elk (elk) ) ,- xmTIE_SGF8_XU TIE_SGF8_XU(
.SGF8_XU_C0 (SGF8_XU_C0) , . gr_i_Cl (gr_i_Cl) , .ars_i_Cl (ars_i_Cl) , .ars_o_Cl (SGF8_XU_ars_o_Cl) , .ars_kill_Cl (SGF8_XU_arsJill_Cl) , .art_i_Cl (art_i_Cl) , .VAddrIn_Cl(VAddrIn_Cl) , .LSSize_C0 (SGF8_XU_LSSize_C0) , .MemDataOut8_Cl (SGF8_XUJMemDataOut8_Cl) , .VAddrBase_Cl (SGF8_XU_VAddrBase_Cl) , .VAddrIndex_Cl (SGF8_XU_VAddrIndex_Cl) , .LSIndexed_C0 (SGF8_XU_LSIndexed_C0) , .elk (elk)) ; xmTIEJRURO TIEJRURO (
.RUR0_C0 (RUR0_C0) , .arr_o_Cl (RURO_arr_o_Cl) , .arr_kill_Cl (RURO_arrJill_Cl) , . gfmod_ps_Cl (gfmod_ps_Cl) , .elk (elk)) ; xmTIE_WURO TIE_WUR0 (
.WUR0_C0 (WUR0_C0) , .art_i_Cl (art_i_Cl) , .gfmod_ns_Cl (WURO_gfmod_ns_Cl) , .gfmod_kill_Cl (WURO_gfmodJill__Cl) , .elk (elk) ) ; xmTIE_gfmod_State TIE_gfmod_State ( .ps_width8_C0 (1 'bl) , .ps_usel_C0 (gfmod_usel_C0) , .ps_data_Cl (gfmod_ps_Cl) , .ns_widt 8_C0 (l'bl) , .ns_def1_C0 (gfmod_def1_C0) , .ns_data8_Cl (gfmod_ns_Cl) , .ns_wen_Cl (~gfmod_kill_Cl) , . Killja (KillJΞ) , .KillPipeJW (KillPipeJW) , . StallJR (gfmod__Stall_Cl) , .clk(clk)
); xmTIE_gfJRegfile TIE_gfJRegfile (
.rdO_addr_CO (gf_rdO_addr_CO) ,
.rdO_usel_CO (gf_rdO_usel_CO) ,
.rdO_data_Cl (gf_rdO_data_Cl) ,
. rd0_width8_C0 (gf_rd0_width8_C0 ) ,
.rdl_addr_C0 (gf_rdl_addr_CO) ,
. rdl_usel_CO (gf_rdl_usel_CO) ,
. rdl_data_Cl (gf_rdl_data_Cl) ,
.rdl_width8_C0 (gf_rdl_width8_CO) ,
.rd2_addr_C0 (gf_rd2_addr_C0) ,
.rd2_usel_C0 (gf_rd2_usel_C0) ,
.rd2_data_Cl (gf_rd2_data_Cl) ,
.rd2_width8_C0 (gf_rd2_width8_C0) ,
.wd_addr_CO (gf_wd_addr_CO) ,
.wd_def2_CO (gf_wd_def2_C0) ,
.wd_wen_C2 (~gf__wd_kill_C2) ,
.wd_data8_C2 (gf_wd_data8_C2) ,
.wd_defl_C0 (gf_wd_def1_C0) ,
.wd_wen_Cl (~gf_wd_kill_Cl) ,
.wd_data8_Cl (gf_wd_data8_Cl) ,
.wd_width8_C0 (gf_wd_width8_C0) ,
.KillJΞ (KillJE) ,
.KillPipeJW (KillPipeJW) ,
. StallJR (gf_Stall_Cl) ,
.clk(clk) );
// Stall logic
assign TIE_Stall_R = 1'b0
    | gf_Stall_C1
    | gfmod_Stall_C1;
// pipeline semantic select signals to each stage
wire LGF8_I_semantic_C1;
xtdelay1 #(1) iLGF8_I_semantic_C1(.xtin(LGF8_I_semantic_C0), .xtout(LGF8_I_semantic_C1), .clk(clk));
wire LGF8_IU_semantic_C1;
xtdelay1 #(1) iLGF8_IU_semantic_C1(.xtin(LGF8_IU_semantic_C0), .xtout(LGF8_IU_semantic_C1), .clk(clk));
wire LGF8_X_semantic_C1;
xtdelay1 #(1) iLGF8_X_semantic_C1(.xtin(LGF8_X_semantic_C0), .xtout(LGF8_X_semantic_C1), .clk(clk));
wire LGF8_XU_semantic_C1;
xtdelay1 #(1) iLGF8_XU_semantic_C1(.xtin(LGF8_XU_semantic_C0), .xtout(LGF8_XU_semantic_C1), .clk(clk));
wire SGF8_I_semantic_C1;
xtdelay1 #(1) iSGF8_I_semantic_C1(.xtin(SGF8_I_semantic_C0), .xtout(SGF8_I_semantic_C1), .clk(clk));
wire SGF8_IU_semantic_C1;
xtdelay1 #(1) iSGF8_IU_semantic_C1(.xtin(SGF8_IU_semantic_C0), .xtout(SGF8_IU_semantic_C1), .clk(clk));
wire SGF8_X_semantic_C1;
xtdelay1 #(1) iSGF8_X_semantic_C1(.xtin(SGF8_X_semantic_C0), .xtout(SGF8_X_semantic_C1), .clk(clk));
wire SGF8_XU_semantic_C1;
xtdelay1 #(1) iSGF8_XU_semantic_C1(.xtin(SGF8_XU_semantic_C0), .xtout(SGF8_XU_semantic_C1), .clk(clk));
wire GFRWMOD8_semantic_C1;
xtdelay1 #(1) iGFRWMOD8_semantic_C1(.xtin(GFRWMOD8_semantic_C0), .xtout(GFRWMOD8_semantic_C1), .clk(clk));
wire WUR0_semantic_C1;
xtdelay1 #(1) iWUR0_semantic_C1(.xtin(WUR0_semantic_C0), .xtout(WUR0_semantic_C1), .clk(clk));
wire RUR0_semantic_C1;
xtdelay1 #(1) iRUR0_semantic_C1(.xtin(RUR0_semantic_C0), .xtout(RUR0_semantic_C1), .clk(clk));
wire LGF8_X_semantic_C2;
xtdelay2 #(1) iLGF8_X_semantic_C2(.xtin(LGF8_X_semantic_C0), .xtout(LGF8_X_semantic_C2), .clk(clk));
wire LGF8_XU_semantic_C2;
xtdelay2 #(1) iLGF8_XU_semantic_C2(.xtin(LGF8_XU_semantic_C0), .xtout(LGF8_XU_semantic_C2), .clk(clk));
wire GFADD8_semantic_C1;
xtdelay1 #(1) iGFADD8_semantic_C1(.xtin(GFADD8_semantic_C0), .xtout(GFADD8_semantic_C1), .clk(clk));
wire GFADD8I_semantic_C1;
xtdelay1 #(1) iGFADD8I_semantic_C1(.xtin(GFADD8I_semantic_C0), .xtout(GFADD8I_semantic_C1), .clk(clk));
wire GFMULX8_semantic_C1;
xtdelay1 #(1) iGFMULX8_semantic_C1(.xtin(GFMULX8_semantic_C0), .xtout(GFMULX8_semantic_C1), .clk(clk));
wire LGF8_I_semantic_C2;
xtdelay2 #(1) iLGF8_I_semantic_C2(.xtin(LGF8_I_semantic_C0), .xtout(LGF8_I_semantic_C2), .clk(clk));
wire LGF8_IU_semantic_C2;
xtdelay2 #(1) iLGF8_IU_semantic_C2(.xtin(LGF8_IU_semantic_C0), .xtout(LGF8_IU_semantic_C2), .clk(clk));
// combine output interface signals from all semantics
assign VAddr_C1 = 32'b0;
assign VAddrBase_C1 = 32'b0
    | (LGF8_I_VAddrBase_C1 & {32{LGF8_I_semantic_C1}})
    | (LGF8_IU_VAddrBase_C1 & {32{LGF8_IU_semantic_C1}})
    | (LGF8_X_VAddrBase_C1 & {32{LGF8_X_semantic_C1}})
    | (LGF8_XU_VAddrBase_C1 & {32{LGF8_XU_semantic_C1}})
    | (SGF8_I_VAddrBase_C1 & {32{SGF8_I_semantic_C1}})
    | (SGF8_IU_VAddrBase_C1 & {32{SGF8_IU_semantic_C1}})
    | (SGF8_X_VAddrBase_C1 & {32{SGF8_X_semantic_C1}})
    | (SGF8_XU_VAddrBase_C1 & {32{SGF8_XU_semantic_C1}});
assign VAddrOffset_C0 = 32'b0
    | (LGF8_I_VAddrOffset_C0 & {32{LGF8_I_semantic_C0}})
    | (LGF8_IU_VAddrOffset_C0 & {32{LGF8_IU_semantic_C0}})
    | (SGF8_I_VAddrOffset_C0 & {32{SGF8_I_semantic_C0}})
    | (SGF8_IU_VAddrOffset_C0 & {32{SGF8_IU_semantic_C0}});
assign VAddrIndex_C1 = 32'b0
    | (LGF8_X_VAddrIndex_C1 & {32{LGF8_X_semantic_C1}})
    | (LGF8_XU_VAddrIndex_C1 & {32{LGF8_XU_semantic_C1}})
    | (SGF8_X_VAddrIndex_C1 & {32{SGF8_X_semantic_C1}})
    | (SGF8_XU_VAddrIndex_C1 & {32{SGF8_XU_semantic_C1}});
assign LSSize_C0 = 5'b0
    | (LGF8_I_LSSize_C0 & {5{LGF8_I_semantic_C0}})
    | (LGF8_IU_LSSize_C0 & {5{LGF8_IU_semantic_C0}})
    | (LGF8_X_LSSize_C0 & {5{LGF8_X_semantic_C0}})
    | (LGF8_XU_LSSize_C0 & {5{LGF8_XU_semantic_C0}})
    | (SGF8_I_LSSize_C0 & {5{SGF8_I_semantic_C0}})
    | (SGF8_IU_LSSize_C0 & {5{SGF8_IU_semantic_C0}})
    | (SGF8_X_LSSize_C0 & {5{SGF8_X_semantic_C0}})
    | (SGF8_XU_LSSize_C0 & {5{SGF8_XU_semantic_C0}});
assign LSIndexed_C0 = 1'b0
    | (LGF8_I_LSIndexed_C0 & LGF8_I_semantic_C0)
    | (LGF8_IU_LSIndexed_C0 & LGF8_IU_semantic_C0)
    | (LGF8_X_LSIndexed_C0 & LGF8_X_semantic_C0)
    | (LGF8_XU_LSIndexed_C0 & LGF8_XU_semantic_C0)
    | (SGF8_I_LSIndexed_C0 & SGF8_I_semantic_C0)
    | (SGF8_IU_LSIndexed_C0 & SGF8_IU_semantic_C0)
    | (SGF8_X_LSIndexed_C0 & SGF8_X_semantic_C0)
    | (SGF8_XU_LSIndexed_C0 & SGF8_XU_semantic_C0);
assign MemDataOut128_C1 = 128'b0;
assign MemDataOut64_C1 = 64'b0;
assign MemDataOut32_C1 = 32'b0;
assign MemDataOut16_C1 = 16'b0;
assign MemDataOut8_C1 = 8'b0
    | (SGF8_I_MemDataOut8_C1 & {8{SGF8_I_semantic_C1}})
    | (SGF8_IU_MemDataOut8_C1 & {8{SGF8_IU_semantic_C1}})
    | (SGF8_X_MemDataOut8_C1 & {8{SGF8_X_semantic_C1}})
    | (SGF8_XU_MemDataOut8_C1 & {8{SGF8_XU_semantic_C1}});
assign Exception_C1 = 1'b0;
assign ExcCause_C1 = 6'b0;
// combine output state signals from all semantics
assign gfmod_ns_C1 = 8'b0
    | (GFRWMOD8_gfmod_ns_C1 & {8{GFRWMOD8_semantic_C1}})
    | (WUR0_gfmod_ns_C1 & {8{WUR0_semantic_C1}});
assign gfmod_kill_C1 = 1'b0
    | (GFRWMOD8_gfmod_kill_C1 & GFRWMOD8_semantic_C1)
    | (WUR0_gfmod_kill_C1 & WUR0_semantic_C1);
// combine output operand signals from all semantics
assign art_o_C1 = 32'b0;
assign art_kill_C1 = 1'b0;
assign ars_o_C1 = 32'b0
    | (LGF8_IU_ars_o_C1 & {32{LGF8_IU_semantic_C1}})
    | (LGF8_XU_ars_o_C1 & {32{LGF8_XU_semantic_C1}})
    | (SGF8_IU_ars_o_C1 & {32{SGF8_IU_semantic_C1}})
    | (SGF8_XU_ars_o_C1 & {32{SGF8_XU_semantic_C1}});
assign ars_kill_C1 = 1'b0
    | (LGF8_IU_ars_kill_C1 & LGF8_IU_semantic_C1)
    | (LGF8_XU_ars_kill_C1 & LGF8_XU_semantic_C1)
    | (SGF8_IU_ars_kill_C1 & SGF8_IU_semantic_C1)
    | (SGF8_XU_ars_kill_C1 & SGF8_XU_semantic_C1);
assign arr_o_C1 = 32'b0
    | (RUR0_arr_o_C1 & {32{RUR0_semantic_C1}});
assign arr_kill_C1 = 1'b0
    | (RUR0_arr_kill_C1 & RUR0_semantic_C1);
assign gr_o_C2 = 8'b0
    | (LGF8_X_gr_o_C2 & {8{LGF8_X_semantic_C2}})
    | (LGF8_XU_gr_o_C2 & {8{LGF8_XU_semantic_C2}});
assign gr_kill_C2 = 1'b0
    | (LGF8_X_gr_kill_C2 & LGF8_X_semantic_C2)
    | (LGF8_XU_gr_kill_C2 & LGF8_XU_semantic_C2);
assign gr_o_C1 = 8'b0
    | (GFADD8_gr_o_C1 & {8{GFADD8_semantic_C1}})
    | (GFADD8I_gr_o_C1 & {8{GFADD8I_semantic_C1}})
    | (GFMULX8_gr_o_C1 & {8{GFMULX8_semantic_C1}});
assign gr_kill_C1 = 1'b0
    | (GFADD8_gr_kill_C1 & GFADD8_semantic_C1)
    | (GFADD8I_gr_kill_C1 & GFADD8I_semantic_C1)
    | (GFMULX8_gr_kill_C1 & GFMULX8_semantic_C1);
assign gt_o_C2 = 8'b0
    | (LGF8_I_gt_o_C2 & {8{LGF8_I_semantic_C2}})
    | (LGF8_IU_gt_o_C2 & {8{LGF8_IU_semantic_C2}});
assign gt_kill_C2 = 1'b0
    | (LGF8_I_gt_kill_C2 & LGF8_I_semantic_C2)
    | (LGF8_IU_gt_kill_C2 & LGF8_IU_semantic_C2);
assign gt_o_C1 = 8'b0
    | (GFRWMOD8_gt_o_C1 & {8{GFRWMOD8_semantic_C1}});
assign gt_kill_C1 = 1'b0
    | (GFRWMOD8_gt_kill_C1 & GFRWMOD8_semantic_C1);
// output operand to write port mapping logic
assign AR_wd_data32_C1 = ars_o_C1 | arr_o_C1 | 32'b0;
assign AR_wd_kill_C1 = ars_kill_C1 | arr_kill_C1 | 1'b0;
assign gf_wd_data8_C2 = gt_o_C2 | gr_o_C2 | 8'b0;
assign gf_wd_kill_C2 = gt_kill_C2 | gr_kill_C2 | 1'b0;
assign gf_wd_data8_C1 = gr_o_C1 | gt_o_C1 | 8'b0;
assign gf_wd_kill_C1 = gr_kill_C1 | gt_kill_C1 | 1'b0;
// read port to input operand mapping logic
assign ars_i_C1 = AR_rd0_data_C1;
assign art_i_C1 = AR_rd1_data_C1;
assign gs_i_C1 = gf_rd0_data_C1;
assign gt_i_C1 = gf_rd1_data_C1;
assign gr_i_C1 = gf_rd2_data_C1;
// logic to support verification
wire ignore_TIE_aWriteData_E = ~(AR_wd_def1_C0 & (TIE_arWrite_R | TIE_asWrite_R | TIE_atWrite_R) & ~TIE_aDataKill_E);
wire ignore_TIE_aWriteData_M = ~(1'b0 & (TIE_arWrite_R | TIE_asWrite_R | TIE_atWrite_R) & ~TIE_aDataKill_M);
wire ignore_TIE_bWriteData_E = (~TIE_btWrite_R & ~TIE_btWrite_R) | TIE_bDataKill_E;
wire ignore_TIE_bWriteData16_E = ignore_TIE_bWriteData_E;
wire ignore_TIE_bWriteData8_E = ignore_TIE_bWriteData_E;
wire ignore_TIE_bWriteData4_E = ignore_TIE_bWriteData_E;
wire ignore_TIE_bWriteData2_E = ignore_TIE_bWriteData_E;
wire ignore_TIE_bWriteData1_E = ignore_TIE_bWriteData_E;
wire ignore_TIE_LSSize_R = ~TIE_Load_R & ~TIE_Store_R;
wire ignore_TIE_LSIndexed_R = ~TIE_Load_R & ~TIE_Store_R;
wire ignore_TIE_LSOffset_R = ~TIE_Load_R & ~TIE_Store_R | TIE_LSIndexed_R;
wire ignore_TIE_MemStoreData128_E = (TIE_LSSize_R != 5'b10000) | ~TIE_Store_R;
wire ignore_TIE_MemStoreData64_E = (TIE_LSSize_R != 5'b01000) | ~TIE_Store_R;
wire ignore_TIE_MemStoreData32_E = (TIE_LSSize_R != 5'b00100) | ~TIE_Store_R;
wire ignore_TIE_MemStoreData16_E = (TIE_LSSize_R != 5'b00010) | ~TIE_Store_R;
wire ignore_TIE_MemStoreData8_E = (TIE_LSSize_R != 5'b00001) | ~TIE_Store_R;
// clock and instructions
assign clk = GIWCLK;
assign Inst_C0 = Instr_R;
assign TIE_inst_R = TIE_Inst_C0;
// AR-related signals to/from core
assign TIE_asRead_R = ars_use_C0;
assign TIE_atRead_R = art_use_C0;
assign TIE_atWrite_R = art_def_C0;
assign TIE_arWrite_R = arr_def_C0;
assign TIE_asWrite_R = ars_def_C0;
assign TIE_aWriteM_R = 0;
assign TIE_aWriteData_E = ignore_TIE_aWriteData_E ? 0 : AR_wd_data32_C1;
assign TIE_aWriteData_M = ignore_TIE_aWriteData_M ? 0 : 0;
assign TIE_aDataKill_E = AR_wd_kill_C1;
assign TIE_aDataKill_M = 0;
assign AR_rd0_data_C1 = SBus_E;
assign AR_rd1_data_C1 = TBus_E;
// BR-related signals to/from core assign TIEJbsReadJR = I'bO bs_use_C0 bs4 use CO I bs8 use CO; assign TIEJotReadJR = I'bO bt_use_C0; assign TIEjotWriteJR = I'bO bt_def_C0 ; assign TIEJosWriteJR = I'bO bs_def_C0 I bs4_def_C0 | bs8_def_C0; assign TIEJorWriteJR = I'bO br def CO; assign TIE_bWriteDatal6JΞ = ignore_TIE_bWriteDatal6jE ? C assign TIE_bWriteData8jE = ignore_TIE_bWriteData8JΞ ? 0 assign TIE_bWriteData4JΞ = ignore_TIE_bWriteData4JE ? 0 assign TIE_bWriteData2jE = ignore_TIE_bWriteData2jE ? 0 assign TIE_bWriteDataljE = ignore_TIE_bWriteDataljE ? 0 assign TIE_bDataKilljE = 0; assign TIE_bWriteSizejR = {l'bO, I'bO, I'bO, I'bO, l'bθ}; assign TIE_bsReadSize_R = {l'bO, I'bO, I'bO, I'bO, I'bO}; assign TIEJbtReadSize _R = {l'bO, I'bO, I'bO, I'bO, I'bO};
// Load/store signals to/from core
assign TIE_Load_R = load_instruction_C0;
assign TIE_Store_R = store_instruction_C0;
assign TIE_LSSize_R = ignore_TIE_LSSize_R ? 0 : LSSize_C0;
assign TIE_LSIndexed_R = ignore_TIE_LSIndexed_R ? 0 : LSIndexed_C0;
assign TIE_LSOffset_R = ignore_TIE_LSOffset_R ? 0 : VAddrOffset_C0;
assign TIE_MemStoreData128_E = ignore_TIE_MemStoreData128_E ? 0 : MemDataOut128_C1;
assign TIE_MemStoreData64_E = ignore_TIE_MemStoreData64_E ? 0 : MemDataOut64_C1;
assign TIE_MemStoreData32_E = ignore_TIE_MemStoreData32_E ? 0 : MemDataOut32_C1;
assign TIE_MemStoreData16_E = ignore_TIE_MemStoreData16_E ? 0 : MemDataOut16_C1;
assign TIE_MemStoreData8_E = ignore_TIE_MemStoreData8_E ? 0 : MemDataOut8_C1;
assign MemDataIn128_C2 = TIE_MemLoadData_M;
assign MemDataIn64_C2 = TIE_MemLoadData_M;
assign MemDataIn32_C2 = TIE_MemLoadData_M;
assign MemDataIn16_C2 = TIE_MemLoadData_M;
assign MemDataIn8_C2 = TIE_MemLoadData_M;
assign VAddrIn_C1 = MemOpAddr_E;
// CPEnable and control signals to/from core assign CPEnable_Cl = CPEnable; assign TIEJΞxceptionJΞ = Exception_Cl; assign TIEJΞxcCauseJΞ = ExcCause_Cl; assign KillPipeJW = Except_W | ReplayJW; endmodule module xtdelayl (xtout, xtin, elk); parameter size = 1; output [size-1 :0] xtout; input [size-1 :0] xtin; input clk; assign xtout = xtin; endmodule module xtdelay2 (xtout, xtin, elk) ; parameter size = 1; output [size-1:0] xtout; input [size-1 :0] xtin; input elk; assign xtout = xtin; endmodule module xtRFenlatch (xtRFenlatchout, xtin,xten, elk) ; parameter size = 32; output [size-1 :0] xtRFenlatchout; input [size-1 :0] xtin; input xten; input elk; reg [size-1 :0] xtRFenlatchout; always @(clk or xten or xtin or xtRFenlatchout) begin if (elk) begin xtRFenlatchout <= #1 (xten) ? xtin : xtRFenlatchout; end end endmodule module xtRFlatch (xtRFlatchout,xtin, elk) ; parameter size = 32; output [size-1 :0] xtRFlatchout; input [size-1 :0] xtin; input elk; reg [size-1 :0] xtRFlatchout; always tS(clk or xtin) begin if (elk) begin xtRFlatchout <= #1 xtin; end end endmodule module xtadd (xtout, a, b) ; parameter size = 32; output [size-1 :0] xtout; input [size-1 :0] a; input [size-1 :0] b; assign xtout = a + b; endmodule module xtaddc(sum, carry, a, b, c) ; parameter size = 32; output [size-1 :0] sum; output carry; input [size-1 :0] a; input [size-1 :0] b; input c; wire junk; assign {carry, sum, junk} = {a,c}-+ {b,c}; endmodule module xtaddcin (xtout, a, b, c) ,- parameter size = 32; output [size-1 :0] xtout; input [size-1 :0] a; input [size-1 :0] b; input c,- assign xtout = ({a,c} + {b,c}) >> 1; endmodule module xtaddcout (sum, carry, a, b) ; parameter size = 1; output [size-1 :0] sum; output carry; input [size-1 :0] a; input [size-1 :0] b; assign {carry, sum} = a + b; endmodule module xtbooth(out, cin, a, b, sign, negate); parameter size = 16; output [size+l:0] out; output cin; input [size-1 :0] a; input [2:0] b; input sign, negate; wire ase = sign & a [size-1]; wire 
[size+l:0] axl = {ase, ase, a}; wire [size+l:0] ax2 = {ase, a, 1'dθ}; wire one = b [1] Λ b[0],- wire two = b[2] ? ~b[l] & -b[0] .- b[l] & b[0]; wire cin = negate ? (~b[2] & (b [1] | b[0])) : (b[2] & ~ (b [1] & b[0])); assign out = {size+2{cin} } A (axl&{size+2{one} } | ax2&{size+2{two} }) ; endmodule module xtclock_gate_nor (xtout , tinl , xtin2 ) ; output xtout; input xtinl, xtin2; assign xtout = - (xtinl xtin2) endmodule module xtclock_gate_or (xtout, tinl,xtin2) ; output xtout; input xtinl, tin2 ; assign xtout = (xtinl xtin2) endmodule module xtcsa (sum, carry, a, b, c) ; parameter size = 1; output [size-1 :0] sum; output [size-1 :0] carry; input [size-1 :0] a; input [size-1 :0] b; input [size-1 :0] c; assign sum = a Λ b * c; assign carry = (a & b) | (b & c) (c Sc a) endmodule module xtenflop (xtout, xtin, en, elk) ,- parameter size = 32; output [size-1 :0] xtout; input [size-1 :0] xtin; input en; input elk; reg [size-1 :0] tmp; assign xtout = tmp; always ® (posedge elk) begin if (en) tmp <= #1 xtin; end endmodule module xtfa (sum, carry, a, b, c) ; output sum, carry; input a, b, c; assign sum = a A b A c; assign carry = a & b | a & c b & c; endmodule module xtflop (xtout, xtin, elk) ; parameter size = 32; output [size-1 :0] xtout; input [size-1 :0] xtin; input elk; reg [size-1 :0] tmp; assign xtout = tmp; always @ (posedge elk) begin tmp <= #1 xtin; end endmodule module xtha (sum, carry, a, b) ; output sum, carry; input a, b; assign sum = a Λ b; assign carry = a & b; endmodule module xtinc (xtout, a); parameter size = 32; output [size-1 : 0] xtout ; input [size-1 : 0] a; assign xtout = a + 1 ; endmodule module xtmux2e (xtout, a, b, sel); parameter size = 32; output [size-1 :0] xtout; input [size-1.-0] a; input [size-1 :0] b; input sel; assign xtout = (-sel) ? a : b; endmodule module xtmux3e (xtout, a, b, c, sel); parameter size = 32; output [size-1 :0] xtout,- input [size-1 :0] a; input [size-1 :0] b; input [size-1 :0] c; input [l-.O] sel; reg [size-1 :0] xtout;
always @(a or b or c or sel) begin
    xtout = sel[1] ? c : (sel[0] ? b : a);
end
endmodule
module xtmux4e(xtout, a, b, c, d, sel);
parameter size = 32;
output [size-1:0] xtout;
input [size-1:0] a;
input [size-1:0] b;
input [size-1:0] c;
input [size-1:0] d;
input [1:0] sel;
reg [size-1:0] xtout;
// synopsys infer_mux "xtmux4e"
always @(sel or a or b or c or d) begin : xtmux4e
    case (sel) // synopsys parallel_case full_case
    2'b00: xtout = a;
    2'b01: xtout = b;
    2'b10: xtout = c;
    2'b11: xtout = d;
    default: xtout = {size{1'bx}};
    endcase // case (sel)
end // always @ (sel or a or b or c or d)
endmodule
module xtnflop(xtout, xtin, clk);
parameter size = 32;
output [size-1:0] xtout;
input [size-1:0] xtin;
input clk;
reg [size-1:0] tmp;
assign xtout = tmp;
always @(negedge clk) begin
    tmp <= #1 xtin;
end // always @ (negedge clk)
endmodule
module xtscflop(xtout, xtin, clrb, clk); // sync clear ff
parameter size = 32;
output [size-1:0] xtout;
input [size-1:0] xtin;
input clrb;
input clk;
reg [size-1:0] tmp;
assign xtout = tmp;
always @(posedge clk) begin
    if (!clrb) tmp <= 0;
    else tmp <= #1 xtin;
end
endmodule
module xtscenflop(xtout, xtin, en, clrb, clk); // sync clear
parameter size = 32;
output [size-1:0] xtout;
input [size-1:0] xtin;
input en;
input clrb;
input clk;
reg [size-1:0] tmp;
assign xtout = tmp;
always @(posedge clk) begin
    if (!clrb) tmp <= 0;
    else if (en) tmp <= #1 xtin;
end
endmodule
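Two of the arithmetic helper modules in the listing above rest on bit-level identities that are easy to check in software. xtaddcin folds a carry-in into a single adder by appending the carry bit to both operands and dropping the low bit of the result, and xtcsa reduces three addends to a sum word and a carry word without propagating carries. The following C sketch models both on 32-bit values. The function names are hypothetical, and the fixed 32-bit width stands in for the Verilog size parameter.

```c
#include <assert.h>
#include <stdint.h>

/* Model of xtaddcin: ({a,c} + {b,c}) >> 1 equals a + b + c (mod 2^32).
   Appending c below both operands makes the two low bits produce the
   carry-in to bit 0 of the real sum. */
uint32_t add_with_carry_in(uint32_t a, uint32_t b, unsigned c)
{
    uint64_t ac = ((uint64_t)a << 1) | (c & 1u);
    uint64_t bc = ((uint64_t)b << 1) | (c & 1u);
    return (uint32_t)((ac + bc) >> 1);
}

/* Model of xtcsa: one full adder per bit position, no carry chain. */
void carry_save_add(uint32_t a, uint32_t b, uint32_t c,
                    uint32_t *sum, uint32_t *carry)
{
    *sum = a ^ b ^ c;
    *carry = (a & b) | (b & c) | (c & a);
}

/* Check both identities for one set of inputs. */
void check_adder_identities(uint32_t a, uint32_t b, uint32_t c)
{
    uint32_t sum, carry;

    assert(add_with_carry_in(a, b, 0) == (uint32_t)(a + b));
    assert(add_with_carry_in(a, b, 1) == (uint32_t)(a + b + 1u));

    carry_save_add(a, b, c, &sum, &carry);
    /* The carry word has weight 2, so it is shifted before the final add. */
    assert((uint32_t)(sum + (carry << 1)) == (uint32_t)(a + b + c));
}
```

Calling check_adder_identities with a few boundary values (all zeros, all ones, a set top bit) exercises the wrap-around cases that a hardware adder of the same width would also see.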
xtensa-gf.h
#ifndef XTENSA_NO_INTRINSICS
#ifdef __XTENSA__
/* Do not modify. This is automatically generated.*/
typedef int gf8 __attribute__ ((user ("gf8")));
#define GFADD8_ASM(gr, gs, gt) { \
    asm ("gfadd8 %0,%1,%2" : "=v" (gr) : "v" (gs), "v" (gt)); \
}
#define GFADD8(gs, gt) ({ \
    gf8 _gr; \
    gf8 _gs = gs; \
    gf8 _gt = gt; \
    GFADD8_ASM(_gr, _gs, _gt); \
    _gr; \
})
#define GFADD8I_ASM(gr, gs, imm4) { \
    asm ("gfadd8i %0,%1,%2" : "=v" (gr) : "v" (gs), "i" (imm4)); \
}
#define GFADD8I(gs, imm4) ({ \
    gf8 _gr; \
    gf8 _gs = gs; \
    GFADD8I_ASM(_gr, _gs, imm4); \
    _gr; \
})
#define GFMULX8_ASM(gr, gs) { \
    register int _xt_state asm ("state"); \
    asm ("gfmulx8 %1,%2" : "+t" (_xt_state), "=v" (gr) : "v" (gs)); \
}
#define GFMULX8(gs) ({ \
    gf8 _gr; \
    gf8 _gs = gs; \
    GFMULX8_ASM(_gr, _gs); \
    _gr; \
})
#define GFRWMOD8_ASM(gt) { \
    register int _xt_state asm ("state"); \
    asm ("gfrwmod8 %1" : "+t" (_xt_state), "=v" (gt) : "1" (gt)); \
}
#define GFRWMOD8(gt) ({ \
    gf8 _gt = gt; \
    GFRWMOD8_ASM(_gt); \
    gt = _gt; \
})
#define LGF8_I_ASM(gt, ars, imm8) { \
    asm volatile ("lgf8_i %0,%1,%2" : "=v" (gt) : "a" (ars), "i" (imm8)); \
}
#define LGF8_I(ars, imm8) ({ \
    gf8 _gt; \
    const unsigned _ars = ars; \
    LGF8_I_ASM(_gt, _ars, imm8); \
    _gt; \
})
#define SGF8_I_ASM(gt, ars, imm8) { \
    asm volatile ("sgf8_i %0,%1,%2" : : "v" (gt), "a" (ars), "i" (imm8)); \
}
#define SGF8_I(gt, ars, imm8) ({ \
    gf8 _gt = gt; \
    unsigned _ars = ars; \
    SGF8_I_ASM(_gt, _ars, imm8); \
})
#define LGF8_IU_ASM(gt, ars, imm8) { \
    asm volatile ("lgf8_iu %0,%1,%3" : "=v" (gt), "=a" (ars) : "1" (ars), "i" (imm8)); \
}
#define LGF8_IU(ars, imm8) ({ \
    gf8 _gt; \
    unsigned _ars = ars; \
    LGF8_IU_ASM(_gt, _ars, imm8); \
    ars = _ars; \
    _gt; \
})
#define SGF8_IU_ASM(gt, ars, imm8) { \
    asm volatile ("sgf8_iu %1,%0,%3" : "=a" (ars) : "v" (gt), "0" (ars), "i" (imm8)); \
}
#define SGF8_IU(gt, ars, imm8) ({ \
    gf8 _gt = gt; \
    unsigned _ars = ars; \
    SGF8_IU_ASM(_gt, _ars, imm8); \
    ars = _ars; \
})
#define LGF8_X_ASM(gr, ars, art) { \
    asm volatile ("lgf8_x %0,%1,%2" : "=v" (gr) : "a" (ars), "a" (art)); \
}
#define LGF8_X(ars, art) ({ \
    gf8 _gr; \
    const unsigned _ars = ars; \
    unsigned _art = art; \
    LGF8_X_ASM(_gr, _ars, _art); \
    _gr; \
})
#define SGF8_X_ASM(gr, ars, art) { \
    asm volatile ("sgf8_x %0,%1,%2" : : "v" (gr), "a" (ars), "a" (art)); \
}
#define SGF8_X(gr, ars, art) ({ \
    gf8 _gr = gr; \
    unsigned _ars = ars; \
    unsigned _art = art; \
    SGF8_X_ASM(_gr, _ars, _art); \
})
#define LGF8_XU_ASM(gr, ars, art) { \
    asm volatile ("lgf8_xu %0,%1,%3" : "=v" (gr), "=a" (ars) : "1" (ars), "a" (art)); \
}
#define LGF8_XU(ars, art) ({ \
    gf8 _gr; \
    unsigned _ars = ars; \
    unsigned _art = art; \
    LGF8_XU_ASM(_gr, _ars, _art); \
    ars = _ars; \
    _gr; \
})
#define SGF8_XU_ASM(gr, ars, art) { \
    asm volatile ("sgf8_xu %1,%0,%3" : "=a" (ars) : "v" (gr), "0" (ars), "a" (art)); \
}
#define SGF8_XU(gr, ars, art) ({ \
    gf8 _gr = gr; \
    unsigned _ars = ars; \
    unsigned _art = art; \
    SGF8_XU_ASM(_gr, _ars, _art); \
    ars = _ars; \
})
#define RUR0_ASM(arr) { \
    register int _xt_state asm ("state"); \
    asm ("rur0 %1" : "+t" (_xt_state), "=a" (arr) : ); \
}
#define RUR0() ({ \
    unsigned _arr; \
    RUR0_ASM(_arr); \
    _arr; \
})
#define WUR0_ASM(art) { \
    register int _xt_state asm ("state"); \
    asm ("wur0 %1" : "+t" (_xt_state) : "a" (art)); \
}
#define WUR0(art) ({ \
    unsigned _art = art; \
    WUR0_ASM(_art); \
})
#define gf8_loadi(_s, o) ({ \
    gf8 t; \
    gf8 *s = _s; \
    LGF8_I_ASM(t, s, o); \
    t; \
})
#define gf8_storei(_t, _s, o) ({ \
    gf8 t = _t; \
    gf8 *s = _s; \
    SGF8_I_ASM(t, s, o); \
})
#define gf8_move(_r, _s) ({ \
    gf8 r = _r; \
    gf8 s = _s; \
    GFADD8_ASM(r, s, 0); \
})
#define RUR(n) ({ \
    int v; \
    register int _xt_state asm ("state"); \
    asm ("rur %1, %2" : "+t" (_xt_state), "=a" (v) : "i" (n)); \
    v; \
})
#define WUR(v, n) ({ \
    register int _xt_state asm ("state"); \
    asm ("wur %1, %2" : "+t" (_xt_state) : "a" (v), "i" (n)); \
})
#endif
#endif
APPENDIX B
#!/usr/xtensa/tools/bin/perl -w
use Getopt::Long;
use strict;

$main::inline_mux_count = 0;
sub inline_mux {
    my($data, $sel, $width, $out, $style, $code) = @_;
    my($i, $n, $n1, $module, $inst, $d, $fail, @data, @data_uniq);
    $n = @$data;
    if ($style eq "encoded") {
        $module = "xtmux${n}e";
        $fail = 0;
    } elsif ($style eq "priority") {
        $fail = scalar(@$data) != scalar(@$sel) + 1;
        $module = "xtmux${n}p";
    } elsif ($style eq "selector") {
        $fail = scalar(@$data) != scalar(@$sel);
        $module = "xtmux${n}";
    } else {
        die "inline_mux: bad style $style";
    }
    if ($fail) {
        die "inline_mux: data / selection mismatch for $style $n";
    }
    if ($n == 0) {
        print " assign $out = 0;\n";
    } elsif ($n == 1) {
        print " assign $out = " . (shift @$data) . ";\n";
    } else {
        @data_uniq = uniq(@$data);
        $n1 = @data_uniq;
        if ($style eq "priority" && ($n1 != $n || defined $code)) {
            if (! defined $code) {
                for($i = 0; $i < $n1; $i++) {
                    $code->{$data_uniq[$i]} = $i;
                }
            }
            @data = sort { $code->{$a} <=> $code->{$b} } @data_uniq;
            print " wire [" . (ceil_log2($n1)-1) . ":0] ${out}_sel =\n";
            for($i = 0; $i < $n-1; $i++) {
                print " $sel->[$i] ? $code->{$data->[$i]} :\n";
            }
            print " $code->{$data->[$n-1]};\n";
            inline_mux(\@data, "${out}_sel", $width, $out, "encoded");
        } else {
            # drop an instance of the mux
            $inst = $main::inline_mux_count++;
            print " $module #($width) m$inst ($out";
            print map(", $_", @$data);
            if ($style eq "priority" || $style eq "selector") {
                print map(", $_", @$sel);
                print ");\n";
            } else {
                print ", $sel);\n";
            }
        }
    }
}

# min of a list
sub min {
    my($min, $v);
    $min = $_[0];
    foreach $v (@_) {
        $min = $v < $min ? $v : $min;
    }
    return $min;
}

# max of a list
sub max {
    my($max, $v);
    $max = $_[0];
    foreach $v (@_) {
        $max = $v > $max ? $v : $max;
    }
    return $max;
}

# ceil(log2(x))
sub ceil_log2 {
    my($x) = @_;
    my($n);
    for($n = 0, $x -= 1; $x > 0; $x >>= 1, $n++) {
    }
    return $n;
}

# 2^x
sub pow2 {
    my($x) = @_;
    return 1 << $x;
}

# uniqify an array
sub uniq {
    my(%seen);
    return grep(! $seen{$_}++, @_);
}
# difference between two arrays
sub diff {
    my($aref, $bref) = @_;
    my(%hash);
    grep($hash{$_} = 1, @$bref);
    return grep(! defined $hash{$_}, @$aref);
}

sub wfield {
    my($name, $port, $stage) = @_;
    $name = "$port->{NAME}_$name";
    return $stage >= 0 ? "${name}_C$stage" : $name;
}

sub rfield {
    my($name, $port, $stage) = @_;
    $name = "$port->{NAME}_$name";
    return $stage >= 0 ? "${name}_C$stage" : $name;
}

sub write_def {
    my($write_port, $stage) = @_;
    return grep($_ == $stage, @{$write_port->{DEF}});
}

sub read_use {
    my($read_port, $stage) = @_;
    return grep($_ == $stage, @{$read_port->{USE}});
}
sub init_print_break {
    my($indent) = @_;
    $main::col = 0;
    $main::indent = $indent;
}

sub print_break {
    my($d) = @_;
    if ($main::col + length($d) + 1 >= 85) {
        $main::col = 4;
        print ("\n" . (' ' x $main::indent));
    }
    print "$d";
    $main::col += length($d) + 1;
}
sub doc {
    my($a) = <<'END_OF_DOCUMENTATION';

The pipelined register file instantiates a number of pipelined register file banks, each of which contains a register file core.

The core is a simple multiple-read port multiple-write port register file. The address size is $rf->{ADDR_SIZE} (lg2 $rf->{MIN_HEIGHT}) and its declaration is $rf->{ADDR_DECL}. The data size is $rf->{DECL_SIZE} ($rf->{MIN_WIDTH}) and its declaration is $rf->{DECL_DECL}.

Multiple banks are used to support multiple widths for read and write ports.

We build NUM_BANK ($rf->MAX_WIDTH / $rf->MIN_WIDTH) pipelined register banks, each of which has MIN_HEIGHT words and MIN_WIDTH bits in each word. Each width must be a power of 2 multiple of the minimum width; in particular, NUM_BANK must also be a power of 2.

A final read alignment mux looks at the low-order address bits and the read-width mask to mux the correct data onto the output. This splits the address into HI_ADDR_SIZE and LO_ADDR_SIZE fields. The high order bits go directly to the register file core; the low address bits are fed to the alignment mux. The read output is always MAX_WIDTH in size and smaller data values are shifted to the LSB of the output word.

As a concrete example, consider a register file of size 1024 bits (32x32) with read widths of 32 and 128.

    NUM_BANK     = 4
    MIN_HEIGHT   = 32
    MIN_WIDTH    = 32
    MAX_HEIGHT   = 8
    MAX_WIDTH    = 128
    ADDR_SIZE    = 5
    ADDR_DECL    = [4:0]
    WORD_SIZE    = 32
    WORD_DECL    = [31:0]
    HI_ADDR_SIZE = 3
    LO_ADDR_SIZE = 2

The read mask is:
    11 to read width 32
    10 to read width 64 (not used in this case)
    00 to read width 128

END_OF_DOCUMENTATION
    return $a;
}
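The bank count and read-width masks in the documentation string above can be checked with a few lines of arithmetic. The following is a minimal Python sketch, not part of the original Perl listing: the function names `ceil_log2` and `bank_parameters` and the returned dictionary layout are illustrative assumptions, and the low-order address size is taken here to be log2 of the bank count, which agrees with the LO_ADDR_SIZE = 2 of the example.

```python
def ceil_log2(x):
    """Smallest n with 2**n >= x, for x >= 1 (same result as the listing's ceil_log2)."""
    return (x - 1).bit_length()


def bank_parameters(widths):
    """Hypothetical helper: derive bank count, low address size and width masks.

    `widths` is the list of read/write port widths in bits. Every width must be
    a power-of-2 multiple of the smallest width, as the documentation requires.
    """
    min_width, max_width = min(widths), max(widths)
    for w in widths:
        n = w // min_width
        if w % min_width != 0 or n != 1 << ceil_log2(n):
            raise ValueError("width %d not valid multiple of %d" % (w, min_width))

    num_bank = max_width // min_width
    lo_addr_size = ceil_log2(num_bank)  # assumption: log2 of the bank count

    # One mask per power-of-2 width between the minimum and the maximum:
    # mask = ~(w / MIN_WIDTH - 1) & ((1 << LO_ADDR_SIZE) - 1)
    masks = {}
    w = min_width
    while w <= max_width:
        mask = ~(w // min_width - 1) & ((1 << lo_addr_size) - 1)
        masks[w] = format(mask, "0%db" % lo_addr_size)
        w *= 2

    return {"num_bank": num_bank, "lo_addr_size": lo_addr_size, "masks": masks}


if __name__ == "__main__":
    # The 1024-bit example with read widths 32 and 128.
    print(bank_parameters([32, 128]))
```

For the widths 32 and 128 this gives four banks, two low-order address bits, and the masks 11, 10 and 00 for widths 32, 64 and 128, matching the read-mask table in the documentation string.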
sub derive_constants {
    my($rf) = @_;
    my($read_port, $write_port, $n, $w, @width);

    # determine parameters for register file banks
    foreach $read_port (@{$rf->{READ_PORT}}) {
        push(@width, @{$read_port->{WIDTH}});
    }
    foreach $write_port (@{$rf->{WRITE_PORT}}) {
        push(@width, @{$write_port->{WIDTH}});
    }
    @width = sort {$a <=> $b} (&uniq(@width));
    $rf->{MIN_WIDTH} = $width[0];
    $rf->{MAX_WIDTH} = $width[$#width];
    $rf->{MIN_HEIGHT} = $rf->{SIZE} / $rf->{MAX_WIDTH};
    $rf->{MAX_HEIGHT} = $rf->{SIZE} / $rf->{MIN_WIDTH};
    $rf->{NUM_BANK} = $rf->{MAX_WIDTH} / $rf->{MIN_WIDTH};
    foreach $w (@width) {
        $n = $w / $rf->{MIN_WIDTH};
        if ($n != pow2(ceil_log2($n))) {
            die "width $w not valid multiple of $rf->{MIN_WIDTH}\n";
        }
    }
    # register file core parameters
    $rf->{ADDR_SIZE} = ceil_log2($rf->{MIN_HEIGHT});
    $rf->{ADDR_DECL} = $rf->{ADDR_SIZE} > 0 ? "[" . ($rf->{ADDR_SIZE}-1) . ":0]" : "";
    $rf->{WORD_SIZE} = $rf->{MIN_WIDTH};
    $rf->{WORD_DECL} = $rf->{WORD_SIZE} > 0 ? "[" . ($rf->{WORD_SIZE}-1) . ":0]" : "";
    $rf->{HI_ADDR_SIZE} = ceil_log2($rf->{MAX_HEIGHT});
    $rf->{LO_ADDR_SIZE} = $rf->{HI_ADDR_SIZE} - $rf->{ADDR_SIZE};
    $rf->{FULL_WORD_SIZE} = $rf->{MAX_WIDTH};
    $rf->{FULL_WORD_DECL} = $rf->{FULL_WORD_SIZE} > 0 ? "[" . ($rf->{FULL_WORD_SIZE}-1) . ":0]" : "";
    $rf->{MAX_LATENCY} = 0;
    foreach $write_port (@{$rf->{WRITE_PORT}}) {
        my(@def) = sort(&uniq(@{$write_port->{DEF}}));
        $write_port->{DEF} = \@def;
        $write_port->{MAX_DEF} = &max(2, @{$write_port->{DEF}});
        $write_port->{MAX_WIDTH} = max(@{$write_port->{WIDTH}});
        $rf->{MAX_LATENCY} = max($rf->{MAX_LATENCY}, $write_port->{MAX_DEF});
    }
    foreach $read_port (@{$rf->{READ_PORT}}) {
        my(@use) = sort(&uniq(@{$read_port->{USE}}));
        $read_port->{USE} = \@use;
        $read_port->{MIN_USE} = min(@{$read_port->{USE}});
        $read_port->{MAX_USE} = max(@{$read_port->{USE}});
        $read_port->{MAX_WIDTH} = max(@{$read_port->{WIDTH}});
    }
    $rf->{NUM_TEST_VECTOR} = $rf->{NUM_TEST_VECTOR} || 1000;
    $rf->{USE_LATCHES} = $rf->{USE_LATCHES} || 1;
    $rf->{TEST_TRANSPARENT_LATCHES} = $rf->{TEST_TRANSPARENT_LATCHES};
    if ($rf->{TRANSPARENT_LATCH_MODE}) {
        # an old name for it
        $rf->{TEST_TRANSPARENT_LATCHES} = 1;
    }
    $rf->{DESIGN_PREFIX} = $rf->{DESIGN_PREFIX} || "";
}
sub write_regfile {
    my($rf) = @_;
    my($lo_addr_decl, @iolist, $s, $i, $j, $h, $l, $w);
    my(@defer, $read_port, $write_port);

    $lo_addr_decl = $rf->{LO_ADDR_SIZE} > 0 ? "[" . ($rf->{LO_ADDR_SIZE}-1) . ":0]" : "";
    init_print_break(2);
    print_break("module $rf->{DESIGN_PREFIX}$rf->{NAME}(");
    foreach $read_port (@{$rf->{READ_PORT}}) {
        foreach $s (@{$read_port->{USE}}) {
            my($data) = rfield("data", $read_port, $s);
            my($decl) = "[" . ($read_port->{MAX_WIDTH} - 1) . ":0]";
            print_break("$data, ");
            push(@iolist, " output $decl $data;\n");
        }
        # don't need an address for a single word register file
        if ($rf->{HI_ADDR_SIZE} > 0) {
            my($addr) = rfield("addr", $read_port, 0);
            my($decl) = "[" . ($rf->{HI_ADDR_SIZE} - 1) . ":0]";
            print_break("$addr, ");
            push(@iolist, " input $decl $addr;\n");
        } else {
            my($addr) = rfield("addr", $read_port, 0);
            push(@defer, " wire $addr = 0;\n");
        }
        foreach $w (@{$read_port->{WIDTH}}) {
            my($width) = rfield("width$w", $read_port, 0);
            print_break("$width, ");
            push(@iolist, " input $width;\n");
        }
        foreach $s (@{$read_port->{USE}}) {
            my($use) = rfield("use$s", $read_port, 0);
            print_break("$use, ");
            push(@iolist, " input $use;\n");
        }
    }
    foreach $write_port (@{$rf->{WRITE_PORT}}) {
        # don't need an address for a single word register file
        if ($rf->{HI_ADDR_SIZE} > 0) {
            my($addr) = wfield("addr", $write_port, 0);
            my($decl) = "[" . ($rf->{HI_ADDR_SIZE} - 1) . ":0]";
            print_break("$addr, ");
            push(@iolist, " input $decl $addr;\n");
        } else {
            my($addr) = rfield("addr", $write_port, 0);
            push(@defer, " wire $addr = 0;\n");
        }
        foreach $w (@{$write_port->{WIDTH}}) {
            my($width) = rfield("width$w", $write_port, 0);
            print_break("$width, ");
            push(@iolist, " input $width;\n");
        }
        foreach $s (@{$write_port->{DEF}}) {
            my($def) = wfield("def$s", $write_port, 0);
            print_break("$def, ");
            push(@iolist, " input $def;\n");
        }
        foreach $w (@{$write_port->{WIDTH}}) {
            foreach $s (@{$write_port->{DEF}}) {
                my($data) = wfield("data$w", $write_port, $s);
                my($decl) = "[" . ($w - 1) . ":0]";
                print_break("$data, ");
                push(@iolist, " input $decl $data;\n");
            }
        }
        foreach $s (1 .. $write_port->{MAX_DEF}) {
            my($wen) = wfield("wen", $write_port, $s);
            if ($s > &max(@{$write_port->{DEF}})) {
                push(@defer, " wire $wen = 1'd1;\n");
            } else {
                print_break("$wen, ");
                push(@iolist, " input $wen;\n");
            }
        }
    }
    print_break("Kill_E, ");
    push(@iolist, " input Kill_E;\n");
    print_break("KillPipe_W, ");
    push(@iolist, " input KillPipe_W;\n");
    print_break("Stall_R, ");
    push(@iolist, " output Stall_R;\n");
    if ($rf->{USE_LATCHES} && $rf->{TEST_TRANSPARENT_LATCHES}) {
        print_break("TMode, ");
        push(@iolist, "input TMode;\n");
    }
    print_break("clk);\n");
    push(@iolist, " input clk;\n");
    print join('', @iolist);
    print "\n";
    print join('', @defer);
    print "\n";
    foreach $read_port (@{$rf->{READ_PORT}}) {
        print " /*" . ('*' x 70) . "\n";
        print " READ PORT $read_port->{NAME}\n";
        print " *" . ('*' x 70) . "/\n";
        if ($rf->{LO_ADDR_SIZE} > 0) {
            my(@data, @sel);
            foreach $w (@{$read_port->{WIDTH}}) {
                my($width) = rfield("width$w", $read_port, 0);
                my($mask) = ~($w / $rf->{MIN_WIDTH} - 1) & ((1 << $rf->{LO_ADDR_SIZE}) - 1);
                push(@data, $rf->{LO_ADDR_SIZE} . "'d" . $mask);
                push(@sel, $width);
            }
            my($addr_mask) = rfield("addr_mask", $read_port, 0);
            print " wire $lo_addr_decl $addr_mask;\n";
            inline_mux(\@data, \@sel, $rf->{LO_ADDR_SIZE}, $addr_mask, "selector");
        } else {
            my($addr_mask) = rfield("addr_mask", $read_port, 0);
            print " wire $addr_mask = 0;\n";
        }
        print "\n";
        print " // masked address pipeline\n";
        if ($rf->{LO_ADDR_SIZE} > 0) {
            my($addr) = rfield("addr", $read_port, 0);
            my($maddr) = rfield("maddr", $read_port, 0);
            my($addr_mask) = rfield("addr_mask", $read_port, 0);
            print " wire $lo_addr_decl $maddr = $addr & $addr_mask;\n";
            for($s = 1; $s <= $read_port->{MAX_USE}; $s++) {
                my($maddr) = rfield("maddr", $read_port, $s);
                print " wire $lo_addr_decl $maddr;\n";
            }
            for($s = 1; $s <= $read_port->{MAX_USE}; $s++) {
                my($maddr) = rfield("maddr", $read_port, $s-1);
                my($maddr1) = rfield("maddr", $read_port, $s);
                print " xtdelay1 #($rf->{LO_ADDR_SIZE}) i$maddr1($maddr1, $maddr, clk);\n";
            }
        } else {
            my($maddr) = rfield("maddr", $read_port, 0);
            print " wire $maddr = 0;\n";
        }
        print "\n";
        print " // bank-qualified use\n";
        foreach $s (@{$read_port->{USE}}) {
            foreach $i (0 .. $rf->{NUM_BANK}-1) {
                my($use) = rfield("use$s", $read_port, 0);
                my($maddr) = rfield("maddr", $read_port, 0);
                my($addr_mask) = rfield("addr_mask", $read_port, 0);
                my($use_banki) = rfield("use$s" . "_bank$i", $read_port, 0);
                print " wire $use_banki = ($use & ($maddr == ($i & $addr_mask)));\n";
            }
        }
        print "\n";
        # determine which banks need to be muxed into which output ports
        my(@align);
        for($i = 0; $i < $rf->{NUM_BANK}; $i++) {
            $align[$i] = [ ];
        }
        for($w = 1; $w <= $rf->{NUM_BANK}; $w *= 2) {
            # does this port need this read-width?
            if (grep($_ == $w * $rf->{MIN_WIDTH}, @{$read_port->{WIDTH}})) {
                for($j = 0; $j < $rf->{NUM_BANK}; $j += $w) {
                    for($i = 0; $i < $w; $i++) {
                        push(@{$align[$i]}, $i+$j);
                    }
                }
            }
        }
        for($i = 0; $i < $rf->{NUM_BANK}; $i++) {
            @{$align[$i]} = sort {$a <=> $b} (&uniq(@{$align[$i]}));
        }
        # print STDOUT "Read table\n";
        # for($i = 0; $i < $rf->{NUM_BANK}; $i++) {
        #     print STDOUT " set $i: " . join(' ', @{$align[$i]}) . "\n";
        # }
        foreach $s (@{$read_port->{USE}}) {
            print " // alignment mux for use $s\n";
            for($i = 0; $i < $rf->{NUM_BANK}; $i++) {
                my($data_banki) = rfield("data_bank$i", $read_port, $s);
                print " wire $rf->{WORD_DECL} $data_banki;\n";
            }
            for($i = 0; $i < $rf->{NUM_BANK}; $i++) {
                my(@data);
                foreach $j (@{$align[$i]}) {
                    my($data_bankj) = rfield("data_bank$j", $read_port, $s);
                    push(@data, $data_bankj);
                }
                $h = $rf->{LO_ADDR_SIZE} - 1;
                $l = $rf->{LO_ADDR_SIZE} - ceil_log2($#data + 1);
                my($sel) = rfield("maddr", $read_port, $s) . "[$h:$l]";
                $h = $rf->{MIN_WIDTH} * ($i+1) - 1;
                $l = $rf->{MIN_WIDTH} * $i;
                my($data) = rfield("data", $read_port, $s) . "[$h:$l]";
                my($prefix) = rfield("align$i", $read_port, $s);
                if (@data > 0) {
                    inline_mux(\@data, $sel, $rf->{WORD_SIZE}, $data, "encoded");
                }
            }
            print "\n";
        }
    }
    print "\n";
    foreach $write_port (@{$rf->{WRITE_PORT}}) {
        print " /*" . ('*' x 70) . "\n";
        print " WRITE PORT $write_port->{NAME}\n";
        print " *" . ('*' x 70) . "/\n";
        if ($rf->{LO_ADDR_SIZE} > 0) {
            my(@data, @sel);
            foreach $w (@{$write_port->{WIDTH}}) {
                my($width) = wfield("width$w", $write_port, 0);
                my($mask) = ~($w / $rf->{MIN_WIDTH} - 1) & ((1 << $rf->{LO_ADDR_SIZE}) - 1);
                push(@data, $rf->{LO_ADDR_SIZE} . "'d" . $mask);
                push(@sel, $width);
            }
            my($addr_mask) = wfield("addr_mask", $write_port, 0);
            print " wire $lo_addr_decl $addr_mask;\n";
            inline_mux(\@data, \@sel, $rf->{LO_ADDR_SIZE}, $addr_mask, "selector");
        } else {
            my($addr_mask) = wfield("addr_mask", $write_port, 0);
            print " wire $addr_mask = 0;\n";
        }
        print "\n";
        if (@{$write_port->{WIDTH}} > 1) {
            print " // width pipeline\n";
            foreach $w (@{$write_port->{WIDTH}}) {
                for($s = 1; $s <= $write_port->{MAX_DEF}; $s++) {
                    my($width) = wfield("width$w", $write_port, $s);
                    print " wire $width;\n";
                }
                for($s = 1; $s <= $write_port->{MAX_DEF}; $s++) {
                    my($width) = wfield("width$w", $write_port, $s-1);
                    my($width1) = wfield("width$w", $write_port, $s);
                    print " xtdelay1 #(1) i$width1($width1, $width, clk);\n";
                }
            }
            print "\n";
        }
        print " // bank-qualified write def for port $write_port->{NAME}\n";
        foreach $s (@{$write_port->{DEF}}) {
            foreach $i (0 .. $rf->{NUM_BANK}-1) {
                my($def) = wfield("def$s", $write_port, 0);
                my($addr) = wfield("addr", $write_port, 0);
                my($addr_mask) = wfield("addr_mask", $write_port, 0);
                my($def_banki) = wfield("def$s" . "_bank$i", $write_port, 0);
                print " wire $def_banki = ($def & (($addr & $addr_mask) == ($i & $addr_mask)));\n";
            }
        }
        print "\n";
        foreach $s (@{$write_port->{DEF}}) {
            my(@data, @sel);
            print " // write mux for def $s\n";
            my($wdata) = wfield("wdata", $write_port, $s);
            foreach $w (@{$write_port->{WIDTH}}) {
                $i = $rf->{MAX_WIDTH} / $w;
                my($width) = wfield("width$w", $write_port, $s);
                my($data) = wfield("data$w", $write_port, $s);
                $data = "{" . $i . "{$data" . "[" . ($w-1) . ":0]}}";
                push(@data, $data);
                push(@sel, $width);
            }
            print " wire $rf->{FULL_WORD_DECL} $wdata;\n";
            inline_mux(\@data, \@sel, $rf->{FULL_WORD_SIZE}, $wdata, "selector");
            print "\n";
        }
        print "\n";
    }
    # drop n copies of the pipelined regfile
    print " /*" . ('*' x 70) . "\n";
    print " PIPELINED BANK\n";
    print " *" . ('*' x 70) . "/\n";
    for($i = 0; $i < $rf->{NUM_BANK}; $i++) {
        init_print_break(8);
        print_break(" $rf->{DESIGN_PREFIX}$rf->{NAME}_bank $rf->{NAME}_bank$i(");
        foreach $read_port (@{$rf->{READ_PORT}}) {
            foreach $s (@{$read_port->{USE}}) {
                my($data_banki) = rfield("data_bank$i", $read_port, $s);
                print_break("$data_banki, ");
            }
            # don't need an address for a single word register file
            if ($rf->{ADDR_SIZE} > 0) {
                my($addr) = rfield("addr", $read_port, 0);
                my($decl) = "[" . ($rf->{HI_ADDR_SIZE}-1) . ":$rf->{LO_ADDR_SIZE}]";
                print_break("$addr$decl, ");
            }
            foreach $s (@{$read_port->{USE}}) {
                my($use_banki) = rfield("use$s" . "_bank$i", $read_port, 0);
                print_break("$use_banki, ");
            }
        }
        foreach $write_port (@{$rf->{WRITE_PORT}}) {
            if ($rf->{ADDR_SIZE} > 0) {
                my($addr) = wfield("addr", $write_port, 0);
                my($decl) = "[" . ($rf->{HI_ADDR_SIZE}-1) . ":$rf->{LO_ADDR_SIZE}]";
                print_break("$addr$decl, ");
            }
            foreach $s (@{$write_port->{DEF}}) {
                my($def_banki) = wfield("def$s" . "_bank$i", $write_port, 0);
                print_break("$def_banki, ");
            }
            foreach $s (@{$write_port->{DEF}}) {
                my($wdata) = wfield("wdata", $write_port, $s);
                $h = $rf->{MIN_WIDTH} * ($i+1) - 1;
                $l = $rf->{MIN_WIDTH} * $i;
                print_break("$wdata" . "[$h:$l], ");
            }
            foreach $s (1 .. $write_port->{MAX_DEF}) {
                my($wen) = wfield("wen", $write_port, $s);
                print_break("$wen, ");
            }
        }
        print_break("Kill_E, ");
        print_break("KillPipe_W, ");
        print_break("Stall_R$i, ");
        if ($rf->{USE_LATCHES} && $rf->{TEST_TRANSPARENT_LATCHES}) {
            print_break("TMode, ");
        }
        print_break("clk);\n");
        print "\n";
    }
    print " assign Stall_R =";
    for($i = 0; $i < $rf->{NUM_BANK}; $i++) {
        print " Stall_R$i |";
    }
    print " 1'b0;\n";
    print "\n";
    print "endmodule\n";
}
sub write_regfile_bank {
    my($rf) = @_;
    my(@defer, @iolist, $s, $s1, $rs, $ws, $i, $j, $read_port, $write_port, $result);
    init_print_break(2);
    print_break("module $rf->{DESIGN_PREFIX}$rf->{NAME}_bank(");

    # read port I/O list
    foreach $read_port (@{$rf->{READ_PORT}}) {
        foreach $s (@{$read_port->{USE}}) {
            my($data) = rfield("data", $read_port, $s);
            print_break("$data, ");
            push(@iolist, " output $rf->{WORD_DECL} $data;\n");
        }
        # don't need an address for a single word register file
        my($addr) = rfield("addr", $read_port, 0);
        if ($rf->{ADDR_SIZE} > 0) {
            print_break("$addr, ");
            push(@iolist, " input $rf->{ADDR_DECL} $addr;\n");
        } else {
            push(@defer, " wire $rf->{ADDR_DECL} $addr = 0;\n");
        }
        foreach $s (1 .. $rf->{MAX_LATENCY}) {
            my($use) = rfield("use$s", $read_port, 0);
            if (read_use($read_port, $s)) {
                print_break("$use, ");
                push(@iolist, " input $use;\n");
            } else {
                push(@defer, " wire $use = 0;\n");
            }
        }
    }
    # write port I/O list
    foreach $write_port (@{$rf->{WRITE_PORT}}) {
        my($addr) = wfield("addr", $write_port, 0);
        if ($rf->{ADDR_SIZE} > 0) {
            print_break("$addr, ");
            push(@iolist, " input $rf->{ADDR_DECL} $addr;\n");
        } else {
            push(@defer, " wire $rf->{ADDR_DECL} $addr = 0;\n");
        }
        foreach $s (1 .. $write_port->{MAX_DEF}) {
            my($def) = wfield("def$s", $write_port, 0);
            if (write_def($write_port, $s)) {
                print_break("$def, ");
                push(@iolist, " input $def;\n");
            } else {
                push(@defer, " wire $def = 0;\n");
            }
        }
        foreach $s (1 .. $write_port->{MAX_DEF}) {
            my($data) = wfield("data", $write_port, $s);
            if (write_def($write_port, $s)) {
                print_break("$data, ");
                push(@iolist, " input $rf->{WORD_DECL} $data;\n");
            } else {
                push(@defer, " wire $rf->{WORD_DECL} $data = 0;\n");
            }
        }
        foreach $s (1 .. $write_port->{MAX_DEF}) {
            my($wen) = wfield("wen", $write_port, $s);
            print_break("$wen, ");
            push(@iolist, " input $wen;\n");
        }
    }
    print_break("Kill_E, ");
    push(@iolist, " input Kill_E;\n");
    print_break("KillPipe_W, ");
    push(@iolist, " input KillPipe_W;\n");
    print_break("Stall_R, ");
    push(@iolist, " output Stall_R;\n");
    if ($rf->{USE_LATCHES} && $rf->{TEST_TRANSPARENT_LATCHES}) {
        print_break("TMode, ");
        push(@iolist, "input TMode;\n");
    }
    print_break("clk);\n");
    push(@iolist, " input clk;\n");
    print join('', @iolist);
    print "\n";
    print join('', @defer);
    print "\n";
    for($s = 0; $s <= $rf->{MAX_LATENCY}+1; $s++) {
        # can't kill after commit point which is C3
        my($kill) = "kill_C$s";
        my($value) =
            $s == 1 ? "KillPipe_W | Kill_E" :
            $s <= 3 ? "KillPipe_W" :
            "1'b0";
        print " wire $kill = $value;\n";
    }
    print "\n";
    ###########################################################################
    # Write-port information
    ###########################################################################
    foreach $write_port (@{$rf->{WRITE_PORT}}) {
        # write definition pipeline
        print " // write definition pipeline\n";
        for($s = 1; $s <= $write_port->{MAX_DEF}; $s++) {
            for($i = $s; $i <= $write_port->{MAX_DEF}; $i++) {
                my($wen) = $s == 1 ? "1" : wfield("wen", $write_port, $s-1);
                my($def) = wfield("def$i", $write_port, $s-1);
                my($ns_def) = wfield("ns_def$i", $write_port, $s-1);
                my($def1) = wfield("def$i", $write_port, $s);
                my($kill) = "kill_C" . ($s-1);
                if (write_def($write_port, $i)) {
                    print " wire $ns_def = $def & $wen & ~$kill;\n";
                    print " xtdelay1 #(1) i$def1($def1, $ns_def, clk);\n";
                } else {
                    print " wire $ns_def = 0;\n";
                    print " wire $def1 = 0;\n";
                }
            }
        }
        print "\n";
        # write enable pipeline
        print " // write enable pipeline\n";
        for($s = 1; $s <= $write_port->{MAX_DEF}; $s++) {
            my $we1 = wfield("we", $write_port, $s+1);
            print " wire $we1;\n";
        }
        for($s = 1; $s <= $write_port->{MAX_DEF}+1; $s++) {
            my $first = $s == 1;
            my $last = $s == $write_port->{MAX_DEF} + 1;
            my $we = $first ? "1'd0" : wfield("we", $write_port, $s);
            my $def = $last ? "1'd0" : wfield("def$s", $write_port, $s);
            my $wen = $last ? "1'd0" : wfield("wen", $write_port, $s);
            my $kill = "kill_C$s";
            my $ns_we = wfield("ns_we", $write_port, $s);
            print " wire $ns_we = ($we | ($def & $wen)) & ~$kill;\n";
        }
        for($s = 1; $s <= $write_port->{MAX_DEF}; $s++) {
            my $ns_we = wfield("ns_we", $write_port, $s);
            my $we1 = wfield("we", $write_port, $s+1);
            print " xtdelay1 #(1) i$we1($we1, $ns_we, clk);\n";
        }
        print "\n";
        # Write address pipeline
        print " // write address pipeline\n";
        for($s = 1; $s <= $write_port->{MAX_DEF}+1; $s++) {
            my $addr = wfield("addr", $write_port, $s);
            print " wire $rf->{ADDR_DECL} $addr;\n";
        }
        for($s = 1; $s <= $write_port->{MAX_DEF}+1; $s++) {
            my $addr = wfield("addr", $write_port, $s-1);
            my $addr1 = wfield("addr", $write_port, $s);
            if ($rf->{ADDR_SIZE} == 0) {
                print " assign $addr1 = 0;\n";
            } else {
                print " xtdelay1 #($rf->{ADDR_SIZE}) i$addr1($addr1, $addr, clk);\n";
            }
        }
        print "\n";
        # Write data pipeline
        print " // write data pipeline\n";
        for($s = 1; $s <= $write_port->{MAX_DEF}; $s++) {
            my $result1 = wfield("result", $write_port, $s+1);
            print " wire $rf->{WORD_DECL} $result1;\n";
        }
        for($s = 1; $s <= $write_port->{MAX_DEF}+1; $s++) {
            my $result = wfield("result", $write_port, $s);
            my $data = wfield("data", $write_port, $s);
            my $sel = wfield("def$s", $write_port, $s);
            my $mux = wfield("mux", $write_port, $s);
            if ($s == 1) {
                print " wire $rf->{WORD_DECL} $mux = $data;\n";
            } elsif ($s == $write_port->{MAX_DEF}+1) {
                print " wire $rf->{WORD_DECL} $mux = $result;\n";
            } else {
                print " wire $rf->{WORD_DECL} $mux = $sel ? $data : $result;\n";
                # print " xtmux2e #($rf->{WORD_SIZE}) i$mux($mux, $result, $data, $sel);\n";
            }
        }
        for($s = 1; $s <= $write_port->{MAX_DEF}; $s++) {
            my $mux = wfield("mux", $write_port, $s);
            my $result1 = wfield("result", $write_port, $s+1);
            print " xtdelay1 #($rf->{WORD_SIZE}) i$result1($result1, $mux, clk);\n";
        }
        print "\n";
    }

    ###########################################################################
    # Read-port information
    ###########################################################################
    foreach $read_port (@{$rf->{READ_PORT}}) {
        # need to declare read data which aren't ports
        for($s = $read_port->{MIN_USE} - 1; $s <= $read_port->{MAX_USE}; $s++) {
            if (! read_use($read_port, $s)) {
                my($data) = rfield("data", $read_port, $s);
                print " wire $rf->{WORD_DECL} $data;\n";
            }
        }
    }
    print "\n";
    foreach $read_port (@{$rf->{READ_PORT}}) {
        if ($read_port->{MAX_USE} >= 2) {
            print " // read address pipeline for port $read_port->{NAME}\n";
            for($s = 1; $s <= $read_port->{MAX_USE}-1; $s++) {
                my $addr1 = rfield("addr", $read_port, $s);
                print " wire $rf->{ADDR_DECL} $addr1;\n";
            }
            for($s = 1; $s <= $read_port->{MAX_USE}-1; $s++) {
                my $addr = rfield("addr", $read_port, $s-1);
                my $addr1 = rfield("addr", $read_port, $s);
                if ($rf->{ADDR_SIZE} == 0) {
                    print " assign $addr1 = 0;\n";
                } else {
                    print " xtdelay1 #($rf->{ADDR_SIZE}) i$addr1($addr1, $addr, clk);\n";
                }
            }
            print "\n";
        }
        $rs = <<DOCUMENTATION;

Bypass logic generation is somewhat tricky. For the first use (typically use1) the data comes from
    (a) write data coming from the datapath (wr0_data_Ci, i=1..n)
    (b) data stored in the write pipeline (wr0_result_Ci, i=2..n+1)
    (c) the register file (rd0_data_C0)

For later uses (e.g., use 2) the data comes from
    (a) write data coming from the datapath (wr0_data_Ci, i=2..n)
    (b) the read pipeline previous stage (rd0_data_C{i-1})

To avoid WAW hazards, there is a defined priority on this data. Consider a use 1,2,3,4 read pipe and a def 1,2,3,4 write pipe. The priority order for use 1 is: wr0_data_C1, wr0_data_C2, wr0_result_C2, wr0_data_C3, wr0_result_C3, wr0_data_C4, wr0_result_C4, wr0_result_C5, register file.

The priority order for use 2 is similar, except for all places where the write pipeline would be used, we use the previous stage read pipeline instead. This is because the data stored in the write pipeline has already been bypassed into the read pipeline earlier.

Hence, the unique sources are wr0_data_C2, wr0_data_C3, wr0_data_C4, rd0_data_C1 with a priority order of: wr0_data_C1, wr0_data_C2, rd0_data_C1, wr0_data_C3, rd0_data_C1, wr0_data_C4, rd0_data_C1, rd0_data_C1, rd0_data_C1.

Because all of the write pipeline data is available very early, we build a special mux for the first stage bypass. We first mux together all of the stored data in the write pipe with the read data from the register file. Then we mux together all of the data coming from the datapath. Finally, we select between these two.

DOCUMENTATION
        if ($main::verify) {
            for($rs = $read_port->{MIN_USE}-1; $rs <= $read_port->{MAX_USE}-1; $rs++) {
                my $rdata = rfield("data", $read_port, $rs);
                my $rdata1 = rfield("data", $read_port, $rs+1);
                print " xtdelay1 #($rf->{WORD_SIZE}) i$rdata1($rdata1, $rdata, clk);\n";
            }
            print "\n";
        } else {
            print " // Read bypass controls for port $read_port->{NAME}\n";
            # bypass the data being defined in stage $ws
            for($rs = $read_port->{MIN_USE}-1; $rs <= $read_port->{MAX_USE}-1; $rs++) {
                for($ws = $rs+1; $ws <= $rf->{MAX_LATENCY}+1; $ws++) {
                    foreach $write_port (@{$rf->{WRITE_PORT}}) {
                        if (write_def($write_port, $ws)) {
                            my $waddr = wfield("addr", $write_port, $ws);
                            my $raddr = rfield("addr", $read_port, $rs);
                            my $def = wfield("def$ws", $write_port, $ws);
                            my $wen = wfield("wen", $write_port, $ws);
                            my $kill = "kill_C$ws";
                            my $bypass = "bypass_data_$read_port->{NAME}_C$rs\_$write_port->{NAME}_C$ws";
                            print " wire $bypass = ($waddr == $raddr) & $def & $wen & ~$kill;\n";
                        }
                    }
                }
            }
            # bypass the old data in the write pipeline in stage $ws
            for($rs = $read_port->{MIN_USE}-1; $rs <= $read_port->{MAX_USE}-1; $rs++) {
                for($ws = $rs+1; $ws <= $rf->{MAX_LATENCY}+1; $ws++) {
                    foreach $write_port (@{$rf->{WRITE_PORT}}) {
                        if ($ws > 1 && $rs <= $write_port->{MAX_DEF}+1) {
                            my $waddr = wfield("addr", $write_port, $ws);
                            my $raddr = rfield("addr", $read_port, $rs);
                            my $we = wfield("we", $write_port, $ws);
                            my $kill = "kill_C$ws";
                            my $bypass = "bypass_result_$read_port->{NAME}_C$rs\_$write_port->{NAME}_C$ws";
                            print " wire $bypass = ($waddr == $raddr) & $we & ~$kill;\n";
                        }
                    }
                }
            }
            print "\n";
            for($rs = $read_port->{MIN_USE}-1; $rs <= $read_port->{MAX_USE}-1; $rs++) {
                my $mux = rfield("mux", $read_port, $rs);
                my $mux_result = rfield("mux_result", $read_port, $rs);
                my $rdata = rfield("data", $read_port, $rs);
                my $rdata1 = rfield("data", $read_port, $rs+1);
                print " // Read bypass for port $read_port->{NAME} use " . ($rs+1) . "\n";
                if ($rs == $read_port->{MIN_USE} - 1) {
                    my(@data, @sel);
                    # bypass the results from the write pipeline(s)
                    for($ws = $rs+1; $ws <= $rf->{MAX_LATENCY}+1; $ws++) {
                        foreach $write_port (@{$rf->{WRITE_PORT}}) {
                            if ($ws > 1 && $rs <= $write_port->{MAX_DEF}+1) {
                                my $result = wfield("result", $write_port, $ws);
                                my $bypass = "bypass_result_$read_port->{NAME}_C$rs\_$write_port->{NAME}_C$ws";
                                push(@data, $result);
                                push(@sel, $bypass);
                            }
                        }
                    }
                    # lowest priority is data from register file
                    push(@data, $rdata);
                    print " wire $rf->{WORD_DECL} $mux_result;\n";
                    inline_mux(\@data, \@sel, $rf->{WORD_SIZE}, $mux_result, "priority");
                    $rdata = $mux_result;
                }
                # choose binary encoding for the data bypass mux
                # order stage 2 last, read data first
                my(@data, @sel, $ncode, %code);
                $ncode = 0;
                $code{$rdata} = $ncode++;
                for($ws = $rs+1; $ws <= $rf->{MAX_LATENCY}+1; $ws++) {
                    foreach $write_port (@{$rf->{WRITE_PORT}}) {
                        if ($rs <= $write_port->{MAX_DEF}+1) {
                            if ($ws != 2 && write_def($write_port, $ws)) {
                                my $wdata = wfield("data", $write_port, $ws);
                                $code{$wdata} = $ncode++;
                            }
                        }
                    }
                }
                foreach $write_port (@{$rf->{WRITE_PORT}}) {
                    if (write_def($write_port, 2)) {
                        my $wdata = wfield("data", $write_port, 2);
                        $code{$wdata} = $ncode++;
                    }
                }
                # build the priority-encoded bypass mux
                for($ws = $rs+1; $ws <= $rf->{MAX_LATENCY}+1; $ws++) {
                    foreach $write_port (@{$rf->{WRITE_PORT}}) {
                        if ($rs <= $write_port->{MAX_DEF}+1) {
                            if (write_def($write_port, $ws)) {
                                my $wdata = wfield("data", $write_port, $ws);
                                my $bypass = "bypass_data_$read_port->{NAME}_C$rs\_$write_port->{NAME}_C$ws";
                                push(@data, $wdata);
                                push(@sel, $bypass);
                            }
                            if ($ws > 1) {
                                my $bypass = "bypass_result_$read_port->{NAME}_C$rs\_$write_port->{NAME}_C$ws";
                                push(@data, $rdata);
                                push(@sel, $bypass);
                            }
                        }
                    }
                }
                push(@data, $rdata);
                print " wire $rf->{WORD_DECL} $mux;\n";
                inline_mux(\@data, \@sel, $rf->{WORD_SIZE}, $mux, "priority", \%code);
                print " xtdelay1 #($rf->{WORD_SIZE}) i$rdata1($rdata1, $mux, clk);\n";
                print "\n";
            }
        }
    }
    print " assign Stall_R =\n";
    foreach $write_port (@{$rf->{WRITE_PORT}}) {
        foreach $read_port (@{$rf->{READ_PORT}}) {
            for($s = 1; $s <= $write_port->{MAX_DEF}-1; $s++) {
                my($waddr) = wfield("addr", $write_port, $s);
                my($raddr) = rfield("addr", $read_port, 0);
                print " (($waddr == $raddr) & (\n";
                for($i = 1; $i <= $write_port->{MAX_DEF} - $s; $i++) {
                    my($use) = rfield("use$i", $read_port, 0);
                    print " ($use & (";
                    for($j = $i+$s; $j <= $write_port->{MAX_DEF}; $j++) {
                        my($ns_def) = wfield("ns_def$j", $write_port, $s);
                        print " $ns_def";
                        if ($j != $write_port->{MAX_DEF}) {
                            print " | ";
                        }
                    }
                    print "))";
                    if ($i == $write_port->{MAX_DEF} - $s) {
                        print ")) |\n";
                    } else {
                        print " |\n";
                    }
                }
            }
        }
    }
    print " 1'b0;\n";
    print "\n";
    ###########################################################################
    # Drop the core-cell
    ###########################################################################
    if ($main::verify) {
        print " // verification register file core -- hack\n";
        my $last;
        foreach $write_port (@{$rf->{WRITE_PORT}}) {
            my $data = wfield("result", $write_port, $write_port->{MAX_DEF}+1);
            my $we = wfield("ns_we", $write_port, $write_port->{MAX_DEF}+1);
            my $tmp = wfield("tmp", $write_port, $write_port->{MAX_DEF}+1);
            print " wire $rf->{WORD_DECL} $tmp;\n";
            print " xtenflop #($rf->{WORD_SIZE}) x$tmp($tmp, $data, $we, clk);\n";
            $last = $tmp;
        }
        foreach $read_port (@{$rf->{READ_PORT}}) {
            my $data = rfield("data", $read_port, $read_port->{MIN_USE}-1);
            print " xtflop #($rf->{WORD_SIZE}) x$data($data, $last, clk);\n";
        }
    } else {
        print " // register file core\n";
        init_print_break(8);
        my $r = @{$rf->{READ_PORT}};
        my $w = @{$rf->{WRITE_PORT}};
        my $n = $rf->{MIN_HEIGHT};
        my $module = "xtregfile_${r}R${w}W_${n}";
        if (! $rf->{USE_LATCHES}) {
            $module .= "_FF";
        }
        print_break(" $module #($rf->{WORD_SIZE}) icore(");
        foreach $read_port (@{$rf->{READ_PORT}}) {
            my $data = rfield("data", $read_port, $read_port->{MIN_USE} - 1);
            print_break("$data, ");
            if ($rf->{ADDR_SIZE} > 0) {
                my $addr = rfield("addr", $read_port, $read_port->{MIN_USE} - 1);
                print_break("$addr, ");
            }
        }
        foreach $write_port (@{$rf->{WRITE_PORT}}) {
            my $data = wfield("result", $write_port, $write_port->{MAX_DEF}+1);
            print_break("$data, ");
            if ($rf->{ADDR_SIZE} > 0) {
                my $addr = wfield("addr", $write_port, $write_port->{MAX_DEF}+1);
                print_break("$addr, ");
            }
            my $we = wfield("ns_we", $write_port, $write_port->{MAX_DEF}+1);
            print_break("$we, ");
        }
        if ($rf->{USE_LATCHES} && $rf->{TEST_TRANSPARENT_LATCHES}) {
            print_break("TMode, ");
        }
        print_break("clk);\n");
    }
    print "endmodule\n";
}
sub set_def {
    my($rf) = @_;
    my($def, $s, $w, $read_port, $write_port, $field, $width, $data_size, $addr_size);

    # $def->{Kill_E} = { SIZE => 1, DIR => "in", DEFAULT => "0" };
    $def->{KillPipe_W} = { SIZE => 1, DIR => "in", DEFAULT => "0" };
    $def->{Stall_R} = { SIZE => 1, DIR => "out" };

    foreach $read_port (@{$rf->{READ_PORT}}) {
        $data_size = $read_port->{MAX_WIDTH};
        $addr_size = $rf->{HI_ADDR_SIZE};
        $field = rfield("addr", $read_port, 0);
        $def->{$field} = { SIZE => $addr_size, DIR => "in", DEFAULT => "x" };
        foreach $s (@{$read_port->{USE}}) {
            $field = rfield("use$s", $read_port, 0);
            $def->{$field} = { SIZE => 1, DIR => "in", DEFAULT => "0" };
            $field = rfield("data", $read_port, $s);
            $def->{$field} = { SIZE => $data_size, DIR => "out" };
        }
        foreach $width (@{$read_port->{WIDTH}}) {
            $field = rfield("width$width", $read_port, 0);
            $def->{$field} = { SIZE => 1, DIR => "in", DEFAULT => "0" };
        }
    }

    foreach $write_port (@{$rf->{WRITE_PORT}}) {
        $data_size = $write_port->{MAX_WIDTH};
        $addr_size = $rf->{HI_ADDR_SIZE};
        $field = wfield("addr", $write_port, 0);
        $def->{$field} = { SIZE => $addr_size, DIR => "in", DEFAULT => "x" };
        foreach $s (@{$write_port->{DEF}}) {
            $field = wfield("def$s", $write_port, 0);
            $def->{$field} = { SIZE => 1, DIR => "in", DEFAULT => "0" };
            foreach $w (@{$write_port->{WIDTH}}) {
                $field = wfield("data$w", $write_port, $s);
                $def->{$field} = { SIZE => $data_size, DIR => "in", DEFAULT => "x" };
            }
        }
        foreach $s (1 .. $write_port->{MAX_DEF}) {
            if ($s <= &max(@{$write_port->{DEF}})) {
                $field = wfield("wen", $write_port, $s);
                $def->{$field} = { SIZE => 1, DIR => "in", DEFAULT => "x" };
            }
        }
        foreach $width (@{$write_port->{WIDTH}}) {
            $field = rfield("width$width", $write_port, 0);
            $def->{$field} = { SIZE => 1, DIR => "in", DEFAULT => "0" };
        }
    }
    return $def;
}
sub regfile_stall_write {
    my($rf, $time, $addr, $width) = @_;
    my($i);
    for($i = 0; $i < $width / $rf->{MIN_WIDTH}; $i++) {
        $main::regfile_stall->{$time}->{$addr + $i} = 1;
    }
}

sub regfile_stall_read {
    my($rf, $time, $addr, $width) = @_;
    my($i, $stall);
    $stall = 0;
    for($i = 0; $i < $width / $rf->{MIN_WIDTH}; $i++) {
        $stall |= defined $main::regfile_stall->{$time}->{$addr + $i};
    }
    return $stall;
}

sub regfile_write {
    my($rf, $time, $addr, $data, $width) = @_;
    my($i);
    for($i = 0; $i < $width / $rf->{MIN_WIDTH}; $i++) {
        $main::regfile->{$time}->{$addr + $i} =
            ($data >> ($i * $rf->{MIN_WIDTH})) & ((1 << $rf->{MIN_WIDTH}) - 1);
    }
}

sub regfile_read {
    my($rf, $time, $addr, $width) = @_;
    my($t, $out_value, $i);
    $out_value = 0;
    for($i = 0; $i < $width / $rf->{MIN_WIDTH}; $i++) {
        my($value);
        for($t = $time; $t >= 0; $t--) {
            $value = $main::regfile->{$t}->{$addr + $i};
            if (defined $value) {
                last;
            }
        }
        if (! defined $value) {
            die "regfile_read: time=$time addr=$addr value undefined";
        }
        $out_value |= $value << ($i * $rf->{MIN_WIDTH});
    }
    return $out_value;
}
sub init_field {
    my($rf, $time) = @_;
    my($field, $default, $size, $dir, $value);
    foreach $field (keys(%{$rf->{SIGNALS}})) {
        my($info) = $rf->{SIGNALS}->{$field};
        $default = $info->{DEFAULT};
        $size = $info->{SIZE};
        $dir = $info->{DIR};
        if ($dir eq "in" && $size > 0) {
            if ($default eq "0") {
                $value = 0;
            } elsif ($default eq "x") {
                $value = $size . "'b" . ('x' x $size);
            } else {
                die "Bad init field in $field\n";
            }
            add_field($rf, $time, $field, $value);
        } elsif ($field eq "Stall_R") {
            add_field($rf, $time, $field, "1:0");
        }
    }
}

sub add_field {
    my($rf, $time, $field, $value) = @_;
    my($info) = $rf->{SIGNALS}->{$field};
    die "add_field: field \"$field\" not found" if ! defined $info;
    return if $info->{SIZE} == 0;
    if (! defined $main::vector->{$time}) {
        $main::vector->{$time} = { };
        init_field($rf, $time);
    }
    $main::vector->{$time}->{$field} = $value;
}
sub make_view_pipeline_register_cell {
    my($rf) = @_;
    my(@iolist, $read_port, $s, $w, $write_port, $module);

    foreach $read_port (@{$rf->{READ_PORT}}) {
        foreach $s (@{$read_port->{USE}}) {
            my($data) = rfield("data", $read_port, $s);
            my($decl) = "[" . ($read_port->{MAX_WIDTH} - 1) . ":0]";
            push(@iolist, "$data, ");
            print TEST " wire $decl $data;\n";
        }
        # don't need an address for a single word register file
        if ($rf->{HI_ADDR_SIZE} > 0) {
            my($addr) = rfield("addr", $read_port, 0);
            my($decl) = "[" . ($rf->{HI_ADDR_SIZE} - 1) . ":0]";
            push(@iolist, "$addr, ");
            print TEST " reg $decl $addr;\n";
        }
        foreach $w (@{$read_port->{WIDTH}}) {
            my($width) = rfield("width$w", $read_port, 0);
            push(@iolist, "$width, ");
            print TEST " reg $width;\n";
        }
        foreach $s (@{$read_port->{USE}}) {
            my($use) = rfield("use$s", $read_port, 0);
            push(@iolist, "$use, ");
            print TEST " reg $use;\n";
        }
    }

    foreach $write_port (@{$rf->{WRITE_PORT}}) {
        # don't need an address for a single word register file
        if ($rf->{HI_ADDR_SIZE} > 0) {
            my($addr) = wfield("addr", $write_port, 0);
            my($decl) = "[" . ($rf->{HI_ADDR_SIZE} - 1) . ":0]";
            push(@iolist, "$addr, ");
            print TEST " reg $decl $addr;\n";
        }
        foreach $w (@{$write_port->{WIDTH}}) {
            my($width) = rfield("width$w", $write_port, 0);
            push(@iolist, "$width, ");
            print TEST " reg $width;\n";
        }
        foreach $s (@{$write_port->{DEF}}) {
            my($def) = wfield("def$s", $write_port, 0);
            push(@iolist, "$def, ");
            print TEST " reg $def;\n";
        }
        foreach $w (@{$write_port->{WIDTH}}) {
            foreach $s (@{$write_port->{DEF}}) {
                my($data) = wfield("data$w", $write_port, $s);
                my($decl) = "[" . ($w - 1) . ":0]";
                push(@iolist, "$data, ");
                print TEST " reg $decl $data;\n";
            }
        }
        foreach $s (1 .. $write_port->{MAX_DEF}) {
            if ($s <= &max(@{$write_port->{DEF}})) {
                my($wen) = wfield("wen", $write_port, $s);
                push(@iolist, "$wen, ");
                print TEST " reg $wen;\n";
            }
        }
    }

    push(@iolist, "Kill_E, ");
    print TEST "// reg Kill_E;\n";
    push(@iolist, "KillPipe_W, ");
    print TEST " reg KillPipe_W;\n";
    push(@iolist, "Stall_R, ");
    print TEST " wire Stall_R;\n";
    push(@iolist, "clk);\n");
    print TEST " reg clk;\n";
    print TEST " $rf->{NAME} i0(";
    print TEST join('', @iolist);
    print TEST "\n";
}
sub print_vector {
    my($rf) = @_;
    my($time, $size, $value, $width, $last_value, $mask, $addr, $dir, $field);
    my($max_time) = max(keys(%$main::vector));

    print TEST "module driver;\n";
    make_view_pipeline_register_cell($rf);
    print TEST " initial begin\n";
    print TEST " #2;\n";
    for($time = 0; $time <= $max_time; $time++) {
        print TEST "\n";
        print TEST "\n";
        print TEST " // time: $time\n";
        foreach $field (sort(keys(%{$main::vector->{$time}}))) {
            $dir = $rf->{SIGNALS}->{$field}->{DIR};
            next if $dir ne "in";
            $value = $main::vector->{$time}->{$field};
            $last_value = $main::vector->{$time-1}->{$field};
            if ($time == 0 || ! defined $last_value || $value ne $last_value) {
                print TEST " $field = $value;\n";
            }
        }
        print TEST " #5;\n";
        if (defined $main::print_vector{$time}) {
            print TEST " \$display(\"$main::print_vector{$time}\");\n";
        }
        foreach $field (sort(keys(%{$main::vector->{$time}}))) {
            $dir = $rf->{SIGNALS}->{$field}->{DIR};
            next if $dir ne "out";
            ($width, $value) = split(':', $main::vector->{$time}->{$field});
            if ($field ne "Stall_R") {
                $field = $field . "[" . ($width - 1) . ":0]";
            }
            print TEST " if ($field != $value) begin\n";
            print TEST " \$display(\"FAIL! %d $field %d $value\", \$time, $field);\n";
            print TEST " end\n";
        }
        print TEST " #5;\n";
    }
    print TEST " end\n";
    print TEST "xtflop #(1) dummy(Kill_E, Stall_R, clk);\n\n";
    print TEST "initial begin clk = 1; end\n";
    print TEST "always begin #5 clk = ~clk; end\n\n";
    print TEST "always begin #(" . ($max_time*10) . "*10) \$finish; end\n";
    print TEST "endmodule\n";
}

$main::try_kill = 0;
$main::time = 0;
$main::nop_count = 0;

sub inst {
    my($rf, $arg, $kill) = @_;
    my($write_port, $read_port, $write_port_num, $read_port_num);
    my($i, $arg_print, $op, $field, @operand, $stall, $port, $addr, $width, $data, $def, $use, $time);

    $time = $main::time++;
    @operand = split(' ', $arg);
    $arg_print = " ";

    # check for stall on any read port
    $stall = 0;
    foreach $op (@operand) {
        next if substr($op, 0, 1) eq ">";
        ($port, $addr, $use, $width) = split('-', $op);
        $stall |= regfile_stall_read($rf, $time + $use, $addr, $width);
    }
    if ($stall) {
        add_field($rf, $time, "Stall_R", "1:1");
    }

    # if there is no stall, try a random killpipe when this
    # instruction reaches W
    if ($main::try_kill && $kill && $stall == 0 && int(rand(20)) == 0) {
        $main::nop_count = 4;
        add_field($rf, $time + 3, "KillPipe_W", 1);
        $arg_print .= sprintf("%-10s ", "Kill!");
    }

    # process the read(s)
    foreach $op (@operand) {
        next if substr($op, 0, 1) eq ">";
        ($port, $addr, $use, $width) = split('-', $op);
        $read_port = $rf->{READ_PORT}->[$port];
        $field = rfield("addr", $read_port, 0);
        add_field($rf, $time, $field, $addr);
        $field = rfield("use$use", $read_port, 0);
        add_field($rf, $time, $field, 1);
        $field = rfield("width$width", $read_port, 0);
        add_field($rf, $time, $field, 1);
        $data = regfile_read($rf, $time, $addr, $width);
        if (! $stall) {
            $field = rfield("data", $read_port, $use);
            add_field($rf, $time + $use, $field, "$width:$data");
        }
        $arg_print .= sprintf("%-20s ", "$op=$data ");
    }

    # process the write(s)
    foreach $op (@operand) {
        next if substr($op, 0, 1) ne ">";
        ($port, $addr, $def, $width) = split('-', substr($op, 1));
        $write_port = $rf->{WRITE_PORT}->[$port];
        $field = wfield("addr", $write_port, 0);
        add_field($rf, $time, $field, $addr);
        $field = wfield("def$def", $write_port, 0);
        add_field($rf, $time, $field, 1);
        $field = wfield("width$width", $write_port, 0);
        add_field($rf, $time, $field, 1);
        $field = wfield("data$width", $write_port, $def);
        $data = int(rand(pow2($width)));
        add_field($rf, $time + $def, $field, $data);
        if (! $stall) {
            for($i = 1; $i <= $def; $i++) {
                $field = wfield("wen", $write_port, $i);
                add_field($rf, $time + $i, $field, 1);
                regfile_stall_write($rf, $time + $i, $addr, $width);
            }
            if ($main::nop_count == 0) {
                regfile_write($rf, $time, $addr, $data, $width);
            }
        }
        $arg_print .= sprintf("%-20s ", "$op=$data ");
    }
    if ($main::nop_count > 0) {
        $main::nop_count--;
    }
    $main::print_vector{$time} = sprintf("%4d: %d %s", $time, $stall, $arg_print);

    # replay the instruction on a stall
    if ($stall) {
        inst($rf, $arg, $kill);
    }
}
sub test_view_pipeline_regfile {
    my($rf) = @_;
    my($i, $num, $port, $addr, $use, $def, $width, $op, $read_port, $write_port);

    # write each address using max write-width, min def, min port #
    $write_port = $rf->{WRITE_PORT}->[0];
    $width = $write_port->{WIDTH}->[$#{@{$write_port->{WIDTH}}}];
    for($addr = 0; $addr < $rf->{SIZE} / $width; $addr++) {
        $a = $addr * $width / $rf->{MIN_WIDTH};
        $def = @{$write_port->{DEF}}[0];
        $op = ">0-$a-$def-$width";
        inst($rf, $op, 0);
    }

    # flush the pipeline
    for($i = 0; $i < 10; $i++) {
        inst($rf, "", 0);
    }

    # read each address using each read-width, each use, each port
    $port = 0;
    foreach $read_port (@{$rf->{READ_PORT}}) {
        foreach $use (@{$read_port->{USE}}) {
            foreach $width (@{$read_port->{WIDTH}}) {
                for($addr = 0; $addr < $rf->{SIZE} / $width; $addr++) {
                    $a = $addr * $width / $rf->{MIN_WIDTH};
                    $op = "$port-$a-$use-$width";
                    inst($rf, $op, 0);
                }
            }
        }
        $port++;
    }

    while ($main::time < $rf->{NUM_TEST_VECTOR} - 10) {
        $op = "";
        for($port = 0; $port < @{$rf->{READ_PORT}}; $port++) {
            if (int(rand(8)) != 0) {
                $read_port = @{$rf->{READ_PORT}}[$port];
                $num = @{$read_port->{WIDTH}};
                $width = $read_port->{WIDTH}->[int(rand($num))];
                $addr = int(rand($rf->{SIZE} / $width)) * $width / $rf->{MIN_WIDTH};
                $num = @{$read_port->{USE}};
                $use = $read_port->{USE}->[int(rand($num))];
                $op .= " $port-$addr-$use-$width";
            }
        }
        for($port = 0; $port < @{$rf->{WRITE_PORT}}; $port++) {
            if (int(rand(8)) != 0) {
                $write_port = @{$rf->{WRITE_PORT}}[$port];
                $num = @{$write_port->{WIDTH}};
                $width = $write_port->{WIDTH}->[int(rand($num))];
                $addr = int(rand($rf->{SIZE} / $width)) * $width / $rf->{MIN_WIDTH};
                $num = @{$write_port->{DEF}};
                $def = $write_port->{DEF}->[int(rand($num))];
                $op .= " >$port-$addr-$def-$width";
            }
        }
        inst($rf, $op, 1);
    }

    # flush the pipeline
    for($i = 0; $i < 10; $i++) {
        inst($rf, "", 0);
    }
    print_vector($rf);
}

my($rf, $rf_all, $i, $TEST, $usage, $ret);
srand 1;

# default values
$main::verify = 0;
$usage = <<EOF;
usage: $0 [options]
EOF

# parse the command line
$ret = GetOptions(
    "verify" => \$main::verify,
);
if (! $ret) {
    print "$usage";
    exit 1;
}

$rf_all = '$rf_all = [ ' . join(' ', <>) . '];';
eval($rf_all) || die "Syntax error in input description";
for($i = 0; $i < @$rf_all; $i++) {
    $rf = $rf_all->[$i];
    derive_constants($rf);
    $rf->{SIGNALS} = set_def($rf);
    write_regfile($rf);
    print "\n\n";
    write_regfile_bank($rf);
    print "\n\n";
    if (defined $rf->{TEST_FILENAME}) {
        $TEST = $rf->{TEST_FILENAME};
        open(TEST, ">$TEST") || die "Can't open $TEST: $!";
        test_view_pipeline_regfile($rf);
        close TEST;
    }
}
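The reference model used by the test generator above (regfile_write and regfile_read) can be illustrated in isolation. The following is only a minimal sketch, not the listing's own code: the names model_write and model_read, the lexical %regfile hash, and the fixed $MIN_WIDTH of 8 bits are assumptions made for the example. It shows the two ideas the listing relies on: a wide write is split into words of the minimum width, and a read at a given time returns, for each word, the most recent value written at or before that time.

```perl
use strict;
use warnings;

my $MIN_WIDTH = 8;   # assumed narrowest access width in bits (hypothetical value)
my %regfile;         # $regfile{$time}{$word_addr} = value of one MIN_WIDTH-bit word

# Record a write of $width bits at $time, split into MIN_WIDTH-bit words.
sub model_write {
    my ($time, $addr, $data, $width) = @_;
    for my $i (0 .. $width / $MIN_WIDTH - 1) {
        $regfile{$time}{$addr + $i} =
            ($data >> ($i * $MIN_WIDTH)) & ((1 << $MIN_WIDTH) - 1);
    }
}

# Read $width bits as seen at $time: for every word, search backwards in
# time for the latest write, then reassemble the words into one value.
sub model_read {
    my ($time, $addr, $width) = @_;
    my $out = 0;
    for my $i (0 .. $width / $MIN_WIDTH - 1) {
        my $value;
        for (my $t = $time; $t >= 0; $t--) {
            $value = $regfile{$t}{$addr + $i};
            last if defined $value;
        }
        die "model_read: time=$time addr=$addr value undefined"
            unless defined $value;
        $out |= $value << ($i * $MIN_WIDTH);
    }
    return $out;
}

model_write(0, 2, 0xABCD, 16);   # 16-bit write covering words 2 and 3
model_write(3, 2, 0x11, 8);      # later 8-bit write to word 2 only
printf "%04X\n", model_read(1, 2, 16);   # prints ABCD
printf "%04X\n", model_read(5, 2, 16);   # prints AB11
```

Keeping one hash per time step, rather than a single current-state array, is what lets the generator compute the expected read data for any cycle after the fact, at the cost of a linear search back through time on every read.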
APPENDIX C
#!/usr/xtensa/tools/bin/perl -w
# Generate ISA documentation from TIE files.
# $Id: GenISAHTM ,v 1.6 2000/01/06 00:53:16 earl Exp $
# Copyright 1999-2000 Tensilica Inc.
# These coded instructions, statements, and computer programs are
# Confidential Proprietary Information of Tensilica Inc. and may not be
# disclosed to third parties or copied in any form, in whole or in part,
# without the prior written consent of Tensilica Inc.

package Xtensa::GetISAHTML;

# Imports
# Use this to find library files
use lib $ENV{'TENSILICA_TOOLS'} . '/lib';
#use lib '@xtools@/lib';

# Perl library modules
use strict;
use Getopt::Long;

# Other modules
use HTML;
use Xtensa::TargetISA;
use Xtensa::GenISA;

# Program
# Prevent use strict errors for our global variables
use vars qw(%idiom);

{
  $::myname = 'GetISAHTML';

  # command line
  my $htmldir = undef;
  my $tpp = undef;
  die("Usage is: $::myname -htmldir dir [options...] file\n")
    unless &GetOptions("htmldir=s" => \$htmldir,
                       "tpp=s" => \$tpp)
      && defined($htmldir)
      && @ARGV == 1;
  if (! -d $htmldir) {
    mkdir($htmldir, 0777)
      || die("$::myname: $!, creating $htmldir.\n");
  }
  $htmldir .= '/' unless $htmldir =~ m|/$|; # ready for catenating filenames
  my ($isafile) = @ARGV;
  my $isa = new Xtensa::TargetISA($tpp, $isafile);
  GenISAHTML($htmldir, $isa);
}

sub GenISAHTML {
  my ($htmldir, $isa) = @_;
  my $indexfh = new FileHandle($htmldir . 'index.html', '>');
  die("$::myname: $!, opening ${htmldir}index.html for output.\n")
    unless $indexfh;
  my $index = new HTML($indexfh, 2);
  $index->hbegin('Instructions');
  $index->block('h1', 'Instructions');
  $index->block('h2', 'Alphabetic by mnemonic');
  $index->bblock('table');
  $index->bblock('thead');
  $index->bblock('tr');
  $index->block('th', 'Mnemonic', attribute('align', 'left'));
  $index->block('th', 'Synopsis', attribute('align', 'left'));
  $index->eblock('tr');
  $index->eblock('thead');
  $index->bblock('tbody');
  my $inst;
  foreach $inst (sort {isort($a->mnemonic(), $b->mnemonic())}
                 $isa->instructions()) {
    my $instname = uc($inst->mnemonic());
    my $synopsis = $inst->synopsis();
    if (!defined($synopsis)) {
      print STDERR ("$::myname: No synopsis for $instname\n");
      $synopsis = '';
    }
    $_ = $instname;
    s/\.//g;
    my $instfile = $_ . '.html';
    $index->bblock('tr');
    $index->bblock('th', attribute('align', 'left'));
    $index->link($instfile, $instname);
    $index->eblock('th');
    $index->block('td', $synopsis, attribute('align', 'left'));
    $index->eblock('tr');
    my $instfh = new FileHandle($htmldir . $instfile, '>');
    die("$::myname: $!, opening $htmldir$instfile for output.\n")
      unless $instfh;
    my $html = new HTML($instfh, 2);
    $html->hbegin("$instname - $synopsis");
    instdoc($html, $isa, $inst);
    $html->hend();
    $html->close();
    $instfh->close();
  } # foreach inst
  $index->eblock('tbody');
  $index->eblock('table');
  $index->hend();
  $index->close();
  $indexfh->close();
}
# Generate the instruction word box
sub instbox {
  my ($html, $isa, $inst, $caption) = @_;
  my $instname = uc($inst->mnemonic());
  my $maxinstlen = $isa->maxsize();
  my $cellwidth = sprintf("%.0f", 720 / $maxinstlen) - 2;
  my $iv = $inst->value();
  my $im = $inst->mask();
  my $il = $inst->size();
  my $pad = $maxinstlen - $il;
  my @fields = ('') x $il;
  push(@fields, "\n"); # something to force a mismatch
  my $oper;
  foreach $oper ($inst->operands()) {
    my $field = $oper->field();
    my $fieldname = $field->name();
    my $b;
    foreach $b ($field->bitlist()) {
      $fields[$b] = $fieldname;
    }
  }
  $html->bblock('table', attribute('frame', 'void'),
                attribute('rules', 'groups'),
                attribute('cellspacing', 0),
                attribute('cellpadding', 0));
  if (defined($caption) && $caption ne '') {
    $html->inline('caption', $caption);
  }

  # column groups
  my $repeat;
  foreach $repeat (1 .. $pad) {
    $html->empty('col', attribute('width', $cellwidth));
  }
  my $j = $il-1;
  my $i;
  for ($i = $il-2; $i >= 0; $i -= 1) {
    if ($fields[$i] ne $fields[$i+1]) {
      $html->empty('colgroup', attribute('colspan', $j - $i));
      foreach $repeat (1 .. ($j - $i)) {
        $html->empty('col', attribute('width', $cellwidth));
      }
      $j = $i;
    }
  }
  $html->empty('colgroup', attribute('colspan', $j + 1));
  foreach $repeat (1 .. ($j + 1)) {
    $html->empty('col', attribute('width', $cellwidth));
  }

  # bit numbers
  $html->bblock('thead');
  $html->bblock('tr');
  foreach $repeat (1 .. $pad) {
    $html->block('td', '', attribute('width', $cellwidth));
  }
  for ($i = $il-1; $i >= 0; $i -= 1) {
    if ($fields[$i] ne $fields[$i+1]
        || $i == 0
        || $fields[$i] ne $fields[$i-1]) {
      $html->bblock('td', '', attribute('width', $cellwidth),
                    attribute('align', 'center'));
      $html->inline('small', $i);
      $html->eblock('td');
    } else {
      $html->block('td', '', attribute('width', $cellwidth),
                   attribute('align', 'center'));
    }
  }
  $html->eblock('tr');
  $html->eblock('thead');

  # fields
  $html->bblock('tbody');
  $html->bblock('tr');
  if ($pad != 0) {
    $html->block('td', '', attribute('colspan', $pad),
                 attribute('width', $pad * $cellwidth));
  }
  $j = $il-1;
  for ($i = $il-1; $i >= 0; $i -= 1) {
    if ($i != $j && $fields[$i] ne $fields[$i+1]) {
      $html->block('td', $fields[$i+1],
                   attribute('colspan', $j - $i),
                   attribute('width', ($j - $i) * $cellwidth),
                   attribute('align', 'center'),
                   attribute('bgcolor', '#FFE4E1'));
      $j = $i;
    }
    if ($fields[$i] eq '') {
      $b = ($iv >> $i) & 1;
      $html->block('td', $b,
                   attribute('width', $cellwidth),
                   attribute('align', 'center'),
                   attribute('bgcolor', '#FFF0F5'));
      $j = $i - 1;
    }
  }
  if ($j != -1) {
    $html->block('td', $fields[0],
                 attribute('colspan', $j + 1),
                 attribute('width', ($j + 1) * $cellwidth),
                 attribute('align', 'center'));
  }
  $html->eblock('tr');
  $html->eblock('tbody');

  # field widths
  $html->bblock('tfoot');
  $html->bblock('tr');
  if ($pad != 0) {
    $html->block('td', '', attribute('colspan', $pad),
                 attribute('width', $pad * $cellwidth));
  }
  $j = $il-1;
  for ($i = $il-2; $i >= 0; $i -= 1) {
    if ($fields[$i] ne $fields[$i+1]) {
      $html->bblock('td', attribute('colspan', $j - $i),
                    attribute('width', ($j - $i) * $cellwidth),
                    attribute('align', 'center'));
      $html->inline('small', $j - $i);
      $html->eblock('td');
      $j = $i;
    }
  }
  $html->bblock('td', attribute('colspan', $j + 1),
                attribute('width', ($j + 1) * $cellwidth),
                attribute('align', 'center'));
  $html->inline('small', $j+1);
  $html->eblock('td');
  $html->eblock('tr');
  $html->eblock('tfoot');
  $html->eblock('table');
} # instbox
# Generate documentation for instruction $inst to HTML object $html.
sub instdoc {
  my ($html, $isa, $inst) = @_;
  my $instname = uc($inst->mnemonic());
  my $synopsis = $inst->synopsis();
  if (!defined($synopsis)) {
    print STDERR ("$::myname: No synopsis for $instname\n");
    $synopsis = '';
  }
  $html->block('h1', "$instname &#8212; $synopsis");
  $html->block('h2', 'Instruction Word');
  instbox($html, $isa, $inst);
  $html->block('h2', 'Package');
  if ($idiom{$instname}) {
    $_ = 'Assembler Macro';
  } else {
    $_ = $inst->package();
    s/:.$//;
    my $pkglong = $pkglong{$_};
    if (defined($pkglong)) {
      $_ = $pkglong;
    } else {
      tr/a-z/A-Z/;
    }
  }
  $html->block('p', $_);
  $html->block('h2', 'Assembler Syntax');
  $html->bblock('p');
  my @iasm = map($_->name, $inst->operands);
  $html->inline('code', @iasm == 0 ? $instname
                        : ($instname . ' ' . join(', ', @iasm)));
  $html->eblock('p');
  $html->block('h2', 'Description');
  my $idesc = $inst->description();
  if (!defined($idesc)) {
    print STDERR "$::myname: No description for $instname.\n";
  } else {
    $idesc =~ s|<INSTREF>([A-Z.]+?)</INSTREF>|insthref($1)|gei;
    $html->iprint($idesc);
  }
  my $iasmnote = $inst->asmnote();
  if (defined($iasmnote)) {
    $iasmnote =~ s|<INSTREF>([A-Z.]+?)</INSTREF>|insthref($1)|gei;
    $html->block('h2', 'Assembler Note');
    $html->iprint($iasmnote);
  }
  $html->block('h2', 'Operation');
  my $tiesem = $inst->tiesemantics();
  if (defined($tiesem)) {
    $html->pre($tiesem);
  } else {
    $html->bblock('p');
    $html->binline('code');
    $html->iprint('x &#8592; y');
    $html->inline('sub', '7');
    $html->inline('sup', '8');
    $html->iprint(' || y');
    $html->einline('code');
    $html->eblock('p');
  }
  $html->block('h2', 'Exceptions');
  {
    my @exceptions = $inst->exceptions();
    if (@exceptions != 0) {
      $html->bblock('ul');
      my $e;
      foreach $e (@exceptions) {
        my $ename = $e->name();
        my $elong = $exclong{$ename};
        $elong = $ename unless defined($elong);
        $html->block('li', $elong);
      }
      $html->eblock('ul');
    } else {
      $html->block('p', 'None');
    }
  }
  my $iimpnote = $inst->impnote();
  if (defined($iimpnote)) {
    $iimpnote =~ s|<INSTREF>([A-Z.]+?)</INSTREF>|insthref($1)|gei;
    $html->block('h2', 'Implementation Note');
    $html->iprint($iimpnote);
  }
} # instdoc

# Return HTML fragment for referencing another instruction
sub insthref {
  my ($inst) = @_;
  $_ = $inst;
  s/\.//g;
  '<CODE><A HREF="' . $_ . '.html">' . $inst . '</A></CODE>';
}
# Local Variables:
# mode:perl
# perl-indent-level: 2
# cperl-indent-level: 2
# End:
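The cross-reference rewriting performed in instdoc can be tried on its own. The sketch below is a stand-alone illustration under stated assumptions: the sample sentence and the mnemonics in it are invented for the example, and insthref is written to strip dots from the mnemonic, matching how the index loop derives each instruction's file name. Note that the character class `[A-Z.]` in the pattern admits only letters and dots, so a mnemonic containing digits would be left unexpanded by this substitution.

```perl
use strict;
use warnings;

# Build a link to another instruction's page; the page file name is the
# mnemonic with dots removed (e.g. ADD.N -> ADDN.html).
sub insthref {
    my ($inst) = @_;
    (my $file = $inst) =~ s/\.//g;
    return '<CODE><A HREF="' . $file . '.html">' . $inst . '</A></CODE>';
}

# Hypothetical description text containing two <INSTREF> markers.
my $text = 'See <INSTREF>ADD.N</INSTREF> and <INSTREF>MOVI</INSTREF>.';

# /e evaluates the replacement as code, /g rewrites every marker,
# /i makes the tag and mnemonic match case-insensitively.
$text =~ s|<INSTREF>([A-Z.]+?)</INSTREF>|insthref($1)|gei;
print "$text\n";
```

Using a copy of the mnemonic for the file name, as here, avoids clobbering `$_`, which the original listing uses as scratch space.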
# Stuff common to GenISAHTML and GenISAMIF
# $Id: GenISA.pm,v 1.7 1999/12/19 08:10:38 earl Exp $
# Copyright 1999-2000 Tensilica Inc.
# These coded instructions, statements, and computer programs are
# Confidential Proprietary Information of Tensilica Inc. and may not be
# disclosed to third parties or copied in any form, in whole or in part,
# without the prior written consent of Tensilica Inc.

package Xtensa::GenISA;

# Exports
use Exporter ();
@Xtensa::GenISA::ISA = qw(Exporter);
@Xtensa::GenISA::EXPORT = qw(%pkglong %pkgchapter %exclong &isort &generated);
@Xtensa::GenISA::EXPORT_OK = @Xtensa::GenISA::EXPORT;
%Xtensa::GenISA::EXPORT_TAGS = ();

# Imports
# Perl library modules
use strict;

# Module body begins here
# Prevent use strict errors for our global variables
use vars qw(%pkglong %pkgchapter %exclong);

%pkglong = (
  '32bitdiv' => '32-bit Integer Divide',
  '32bitmul' => '32-bit Integer Multiply',
  'athens' => 'Xtensa VI',
  'booleans' => 'Coprocessor Option',
  'coprocessor' => 'Coprocessor Option',
  'core' => 'Core Architecture',
  'datacache' => 'Data Cache',
  'debug' => 'Debug Option',
  'density' => 'Code Density Option',
  'exception' => 'Exception Option',
  'fp' => 'Floating Point',
  'instcache' => 'Instruction Cache',
  'interrupt' => 'Interrupt Option',
  'mac16' => 'MAC16 Option',
  'misc' => 'Miscellaneous',
  'mul16' => 'Mul16 Option',
  'regwin' => 'Windowed Registers Option',
  'spec' => 'Speculation Option',
  'sync' => 'Multiprocessor Synchronization Option',
  'timer' => 'Timer Option',
  'vectorinteger' => 'Vector Integer Coprocessor'
);

%pkgchapter = (
  '32bitdiv' => 'ch5',
  '32bitmul' => 'ch5',
  'athens' => 'ch5',
  'booleans' => 'ch7',
  'coprocessor' => 'ch7',
  'core' => 'ch5',
  'datacache' => 'ch5',
  'debug' => 'ch5',
  'density' => 'ch5',
  'exception' => 'ch5',
  'fp' => 'ch6',
  'instcache' => 'ch5',
  'interrupt' => 'ch5',
  'mac16' => 'ch6',
  'misc' => 'ch5',
  'mul16' => 'ch5',
  'regwin' => 'ch5',
  'spec' => 'ch5',
  'sync' => 'ch5',
  'timer' => 'ch5',
  'vectorinteger' => 'vec'
);

%exclong = (
  'SystemCall' => 'System Call',
  'LoadStoreError' => 'Load Store Error',
  'Floatingpoint' => 'Floating Point Exception',
  'InstructionFetchError' => 'Instruction Fetch Error',
  'IntegerDivideByZero' => 'Integer Divide by Zero'
);

# Instruction name sort
sub isort {
  my ($am, $bm) = @_;
  if ($am =~ /^([A-Za-z]+)(\d+)(.*)$/) {
    my ($a1, $a2, $a3) = ($1, $2, $3);
    if ($bm =~ /^([A-Za-z]+)(\d+)(.*)$/) {
      my ($b1, $b2, $b3) = ($1, $2, $3);
      return ($a1 cmp $b1) || ($a2 <=> $b2) || ($a3 cmp $b3);
    }
  }
  $am cmp $bm;
}

# Generated output file comment
sub generated {
  my ($handle, $cstart, $cend, @files) = @_;
  my $date;
  chomp($date = `date`);
  $handle->print($cstart, ' This file is automatically generated -- DO NOT EDIT', $cend, "\n");
  $handle->print($cstart, ' Generated from ', join(' ', @files), $cend, "\n");
  $handle->print($cstart, ' by ', $::myname, ' on ', $date, $cend, "\n");
}

1;

# Local Variables:
# mode:perl
# cperl-indent-level: 2
# perl-indent-level: 2
# End:
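The effect of the isort comparator is easiest to see on a few mnemonics. The sketch below reproduces the comparator as a stand-alone function and applies it to a small list; the mnemonic list is chosen only for illustration. When both names have the form letters, digits, remainder, the digits are compared numerically, so an 8-bit variant sorts before a 16-bit one; otherwise the comparison falls back to plain string order.

```perl
use strict;
use warnings;

# Compare two mnemonics: alphabetic prefix first, then the embedded
# number numerically, then the remaining suffix as a string.
sub isort {
    my ($am, $bm) = @_;
    if ($am =~ /^([A-Za-z]+)(\d+)(.*)$/) {
        my ($a1, $a2, $a3) = ($1, $2, $3);
        if ($bm =~ /^([A-Za-z]+)(\d+)(.*)$/) {
            my ($b1, $b2, $b3) = ($1, $2, $3);
            return ($a1 cmp $b1) || ($a2 <=> $b2) || ($a3 cmp $b3);
        }
    }
    return $am cmp $bm;   # fallback: plain string comparison
}

my @names  = qw(L32I ADD L8UI MUL16U L16UI MUL16S);
my @sorted = sort { isort($a, $b) } @names;
print join(' ', @sorted), "\n";   # prints ADD L8UI L16UI L32I MUL16S MUL16U
```

A plain `sort` would place L16UI and L32I ahead of L8UI, because the character "1" sorts before "8"; the numeric comparison of the middle group is what keeps related instructions in width order in the generated index.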

Claims

WHAT IS CLAIMED IS:
1. A system for designing a configurable processor, the system comprising: hardware generation means for, based on a configuration specification including a predetermined portion and a user-defined portion, generating a description of a hardware implementation of the processor; and software generation means for, based on the configuration specification, generating software development tools specific to the hardware implementation; wherein the hardware generation means is for, based on the user-defined portion of the configuration specification, including a user-defined register file in the description of the hardware implementation of the processor; and the software generation means is for, based on the user-defined portion, including software related to the user-defined processor register file in the software development tools.
2. The system of claim 1, wherein the software related to the user-defined processor register file includes an instruction for accessing elements in the register file according to a field of the instruction.
3. The system of claim 2, wherein the hardware generation means is for generating at least part of the description of the hardware implementation in a register transfer level hardware description language.
4. The system of claim 1, wherein the configuration specification defines the register file using a statement specifying the width of elements in the register file.
5. The system of claim 1, wherein the configuration specification defines the register file using a statement specifying the number of elements in the register file.
6. The system of claim 1, wherein the hardware generation means is for determining a number of at least one of read ports and write ports of the register file independently of the configuration specification.
7. The system of claim 6, wherein the hardware generation means is for determining a number of read ports based on scheduling information in the configuration specification.
8. The system of claim 1, wherein the hardware generation means is for generating, as part of the processor hardware implementation description, a description of logic to assign write ports of the user-defined register file to instruction operands to minimize data staging costs.
9. The system of claim 1, wherein the hardware generation means is for generating pipeline logic for accessing the register file.
10. The system of claim 9, wherein read ports for the user-defined register file are read in the earliest stage of any instruction that uses them as a source operand.
11. The system of claim 9, wherein write ports for the user-defined register file are written in the latest stage of any instruction that uses it as a destination operand or in an instruction commit stage if later.
12. The system of claim 1, wherein the hardware generation means is for generating, as part of the hardware implementation of the processor, logic to provide a read port for the register file for each field, within an instruction accessing the register file, used to select a source operand from the register file.
13. The system of claim 1, wherein the hardware generation means is for generating, as part of the hardware implementation of the processor, bypass logic for accessing the register file.
14. The system of claim 13, wherein the hardware generation means is for generating the interlock logic for a given pipeline of the processor described by the configuration specification based on instruction operand and state usage descriptions in the configuration specification.
15. The system of claim 1, wherein the hardware generation means is for generating, as part of the hardware implementation of the processor, interlock logic for accessing the register file.
16. The system of claim 15, wherein the hardware generation means is for generating the interlock logic based on scheduling information in the configuration specification.
17. The system of claim 15, wherein the hardware generation means is for generating the interlock logic for a given pipeline of the processor described by the configuration specification based on instruction operand and state usage descriptions in the configuration specification.
18. The system of claim 1, wherein the hardware generation means is for generating the processor hardware implementation description to use at least one portion of processor logic described by the predetermined portion of the configuration specification to support access of the user-defined register file.
19. The system of claim 18, wherein the at least one portion of processor logic includes address computation logic.
20. The system of claim 19, wherein the address computation logic includes address adder logic.
21. The system of claim 19, wherein the at least one portion of processor logic includes data alignment logic shared between the predetermined and user-defined portions.
22. The system of claim 19, wherein the at least one portion of processor logic is a data memory.
23. The system of claim 1, wherein the user-defined portion of the configuration specification includes a description of an instruction which conditionally writes to the user-defined register file.
24. The system of claim 1, wherein the software generation means is for generating, as part of the software relating to the user-defined register file, diagnostic tests for design verification and manufacturing of the processor based on the configuration specification.
25. The system of claim 1, wherein: the configuration specification includes both reference and implementation semantics for instructions of the processor; and the reference semantics can be used to verify design correctness of the implementation semantics.
26. The system of claim 1, wherein: the processor instruction set description language includes instruction test cases; and the software generation means is for generating diagnostics for the test cases.
27. The system of claim 1, wherein the software generation means is for automatically generating test vectors by sampling operands to instructions in the processor instruction set description language while running an application.
28. The system of claim 1, wherein the software generation means is for generating at least a portion of an operating system as part of the software relating to user-defined states and register files.
29. The system of claim 28, wherein the generated portion of the operating system includes save and restore sequences for processor state.
30. The system of claim 29, wherein the save and restore sequences are generated with respect to interdependencies of component states and are valid for those interdependencies.
31. The system of claim 28, wherein the operating system is capable of saving less than an entirety of processor state during task switching.
32. The system of claim 28, wherein: the user-defined portion of the configuration specification defines a software data type not found in the predetermined portion of the configuration specification; and the compiler supports the software data type.
33. The system of claim 1, wherein the software generation means is for generating at least one of a compiler, a linker, a simulator and a debugger as part of the software relating to the user-defined register file.
34. The system of claim 1, wherein: the software generation means is for generating a compiler as part of the software relating to the user-defined register file; and the compiler is capable of allocating program variables to registers in the user-specified register file.
35. The system of claim 34, wherein the compiler is further capable of loading a value from memory into a register in the user-defined register file, and storing a value in a register of the user-defined register file into memory.
36. The system of claim 34, wherein the compiler is further capable of moving a value from one register in a user-defined register file to another register in a user-defined register file.
37. The system of claim 34, wherein the compiler is for using scheduling information in the configuration specification to determine stall cycles of instructions in the software generated by the software generation means which access the user-defined register file.
38. The system of claim 1, wherein the software generation means is for automatically generating a monitor to check for coverage of bypass paths.
39. A system for designing a configurable processor, the system comprising: hardware generation means for, based on a configuration specification including a predetermined portion and a user-defined portion, generating a description of a hardware implementation of the processor; and software generation means for, based on the configuration specification, generating software development tools specific to the hardware implementation; wherein the configuration specification includes a statement specifying scheduling information of instructions used in the software development tools; the hardware generation means is for, based on the configuration specification, generating a description of at least one of pipeline logic, pipeline stalling logic and instruction rescheduling logic.
40. The system of claim 39, wherein the scheduling information includes a statement that an operand of an instruction enters a pipeline of the processor at a given stage.
41. The system of claim 39, wherein the scheduling information includes a statement that an operation of an instruction exits a pipeline of the processor at a given stage.
42. The system of claim 39, wherein: the software generated by the software generation means includes a compiler which uses instructions described in the user-defined portion of the configuration specification; and the compiler uses the scheduling information during instruction scheduling to schedule the instructions described in the user-defined portion of the configuration specification.
43. The system of claim 39, wherein the configuration specification includes a description of an instruction which requires a plurality of processor cycles to be processed.
44. The system of claim 43, wherein: the configuration specification includes a description of an instruction's semantics which is independent of a target pipeline of the processor; and the hardware generation means is for generating as part of the processor hardware implementation a pipeline based on a pipeline description separate from the instruction semantics.
45. A system for designing a configurable processor, the system comprising: hardware generation means for, based on a configuration specification including a predetermined portion and a user-defined portion, generating a description of a hardware implementation of the processor; software generation means for, based on the configuration specification, generating software development tools specific to the hardware implementation; and document generation means for generating documentation of a processor instruction set described by the configuration specification based on the configuration specification.
46. The system of claim 45, wherein the document generation means is for using reference semantics of instructions defined in the configuration specification to generate the processor instruction set documentation.
47. The system of claim 45, wherein: the user-defined portion of the configuration specification contains reference semantics of an instruction defined therein and a user-defined specification of at least one of a synopsis and a text description for the user-defined instruction; and the document generation means is for using the at least one of the synopsis and the text description to generate documentation of the processor instruction set.
48. A system for designing a configurable processor, the system comprising: hardware generation means for, based on a configuration specification including a predetermined portion and a user-defined portion, generating a description of a hardware implementation of the processor; and software generation means for, based on the configuration specification, generating software development tools specific to the hardware implementation; wherein the configuration specification includes a specification of a processor exception and when a processor instruction raises the exception; and the hardware generation means is for generating hardware supporting that exception as part of the processor hardware implementation.
49. A processor simulation system comprising: hardware simulation means for executing a hardware description of an extensible processor; software simulation means for executing a software reference model of the extensible processor; and cosimulation means for operating the hardware simulation means and the software simulation means and comparing results of simulations therefrom to establish correspondence between the hardware description of the extensible processor and the software reference model of the extensible processor.
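The cosimulation arrangement recited in claim 49 can be illustrated with a minimal sketch. It is not the implementation described in the specification: the two-instruction ISA, the register list, and the names reference_step, hardware_step and cosimulate are all hypothetical, and a plain Python function stands in for a simulator executing a hardware description. The sketch only shows the comparison step, in which both models execute the same instruction and their architectural state is checked for correspondence.

```python
MASK32 = 0xFFFFFFFF  # assumed 32-bit registers (illustrative only)


def reference_step(regs, instr):
    """Hypothetical software reference model: direct arithmetic semantics."""
    op, dest, src1, src2 = instr
    if op == "add":
        regs[dest] = (regs[src1] + regs[src2]) & MASK32
    elif op == "sub":
        regs[dest] = (regs[src1] - regs[src2]) & MASK32
    else:
        raise ValueError("unknown opcode: " + op)


def hardware_step(regs, instr):
    """Hypothetical stand-in for a hardware simulation.

    Subtraction is done the way an adder datapath might do it:
    invert the second operand and add with a carry-in of 1.
    """
    op, dest, src1, src2 = instr
    if op not in ("add", "sub"):
        raise ValueError("unknown opcode: " + op)
    operand = regs[src2] if op == "add" else (~regs[src2] & MASK32)
    carry_in = 0 if op == "add" else 1
    regs[dest] = (regs[src1] + operand + carry_in) & MASK32


def cosimulate(program, initial_regs,
               hw_step=hardware_step, ref_step=reference_step):
    """Run both models in lockstep and compare register state.

    Returns None if the models agree after every instruction,
    otherwise the index of the first instruction where they differ.
    """
    hw_regs = list(initial_regs)
    ref_regs = list(initial_regs)
    for index, instr in enumerate(program):
        hw_step(hw_regs, instr)
        ref_step(ref_regs, instr)
        if hw_regs != ref_regs:
            return index
    return None


if __name__ == "__main__":
    program = [("add", 2, 0, 1), ("sub", 3, 0, 1)]
    mismatch = cosimulate(program, [5, 7, 0, 0])
    print("models agree" if mismatch is None else f"mismatch at {mismatch}")
```

Comparing state after every instruction, rather than only at the end of the program, is a design choice of this sketch: it reports the first instruction at which the two models diverge.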
PCT/US2001/005051 2000-02-17 2001-02-15 Automated processor generation system for designing a configurable processor and method for the same WO2001061576A2 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
KR1020027010522A KR100589744B1 (en) 2000-02-17 2001-02-15 Automated processor generation system for designing a configurable processor and method for the same
JP2001560891A JP4619606B2 (en) 2000-02-17 2001-02-15 Automated processor generation system and method for designing a configurable processor
GB0217221A GB2376546B (en) 2000-02-17 2001-02-15 Automated processor generation system for designing a configurable processor and method for the same
AU2001238403A AU2001238403A1 (en) 2000-02-17 2001-02-15 Automated processor generation system for designing a configurable processor and method for the same

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US09/506,502 2000-02-17
US09/506,502 US7036106B1 (en) 2000-02-17 2000-02-17 Automated processor generation system for designing a configurable processor and method for the same

Publications (2)

Publication Number Publication Date
WO2001061576A2 (en) 2001-08-23
WO2001061576A3 (en) 2003-03-27

Family

ID=24014856

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2001/005051 WO2001061576A2 (en) 2000-02-17 2001-02-15 Automated processor generation system for designing a configurable processor and method for the same

Country Status (8)

Country Link
US (4) US7036106B1 (en)
JP (1) JP4619606B2 (en)
KR (1) KR100589744B1 (en)
CN (1) CN1288585C (en)
AU (1) AU2001238403A1 (en)
GB (1) GB2376546B (en)
TW (1) TW571206B (en)
WO (1) WO2001061576A2 (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6941548B2 (en) 2001-10-16 2005-09-06 Tensilica, Inc. Automatic instruction set architecture generation
US7219212B1 (en) 2002-05-13 2007-05-15 Tensilica, Inc. Load/store operation of memory misaligned vector data using alignment register storing realigned data portion for combining with remaining portion
WO2007092260A1 (en) * 2006-02-02 2007-08-16 Microsoft Corporation Software support for dynamically extensible processors
CN100338568C (en) * 2002-04-26 2007-09-19 株式会社东芝 Generating method for developing environment in development on-chip system and media for storing the same program
US7346881B2 (en) 2002-05-13 2008-03-18 Tensilica, Inc. Method and apparatus for adding advanced instructions in an extensible processor architecture
US7937559B1 (en) 2002-05-13 2011-05-03 Tensilica, Inc. System and method for generating a configurable processor supporting a user-defined plurality of instruction sizes
CN102567149A (en) * 2010-12-09 2012-07-11 上海华虹集成电路有限责任公司 SOC (system on chip) verifying method

Families Citing this family (107)

Publication number Priority date Publication date Assignee Title
JP2003505753A (en) 1999-06-10 2003-02-12 ペーアーツェーテー インフォルマツィオーンステヒノロギー ゲゼルシャフト ミット ベシュレンクテル ハフツング Sequence division method in cell structure
AU2001243463A1 (en) * 2000-03-10 2001-09-24 Arc International Plc Memory interface and method of interfacing between functional entities
FR2811093A1 (en) * 2000-06-30 2002-01-04 St Microelectronics Sa DEVICE AND METHOD FOR EVALUATING ALGORITHMS
DE10034869A1 (en) * 2000-07-18 2002-02-07 Siemens Ag Method for automatically obtaining a functional sequence of processes and tools for this
US8058899B2 (en) 2000-10-06 2011-11-15 Martin Vorbach Logic cell array and bus system
GB0028079D0 (en) * 2000-11-17 2001-01-03 Imperial College System and method
US9411532B2 (en) 2001-09-07 2016-08-09 Pact Xpp Technologies Ag Methods and systems for transferring data between a processing device and external devices
US9552047B2 (en) 2001-03-05 2017-01-24 Pact Xpp Technologies Ag Multiprocessor having runtime adjustable clock and clock dependent power supply
US9250908B2 (en) 2001-03-05 2016-02-02 Pact Xpp Technologies Ag Multi-processor bus and cache interconnection system
US9436631B2 (en) 2001-03-05 2016-09-06 Pact Xpp Technologies Ag Chip including memory element storing higher level memory data on a page by page basis
US10031733B2 (en) 2001-06-20 2018-07-24 Scientia Sol Mentis Ag Method for processing data
DE10305584A1 (en) * 2002-02-04 2003-08-07 Arthrex Inc Endoscopic instrument, provided with gripping device suitable for gripping surgical thread and precise positioning of knot
DE10205523A1 (en) * 2002-02-08 2003-08-28 Systemonic Ag Method for providing a design, test and development environment and a system for executing the method
US9170812B2 (en) 2002-03-21 2015-10-27 Pact Xpp Technologies Ag Data processing system having integrated pipelined array data processor
JP2003316838A (en) * 2002-04-19 2003-11-07 Nec Electronics Corp Design method for system lsi and storage medium with the method stored therein
GB0215033D0 (en) * 2002-06-28 2002-08-07 Critical Blue Ltd Instruction set translation method
GB0215034D0 (en) * 2002-06-28 2002-08-07 Critical Blue Ltd Architecture generation method
FR2843214B1 (en) * 2002-07-30 2008-07-04 Bull Sa METHOD FOR FUNCTIONALLY CHECKING AN INTEGRATED CIRCUIT MODEL TO CONSTITUTE A VERIFICATION PLATFORM, EMULATOR EQUIPMENT AND VERIFICATION PLATFORM.
EP1537486A1 (en) 2002-09-06 2005-06-08 PACT XPP Technologies AG Reconfigurable sequencer structure
US7228531B1 (en) * 2003-02-03 2007-06-05 Altera Corporation Methods and apparatus for optimizing a processor core on a programmable chip
US7305391B2 (en) * 2003-02-07 2007-12-04 Safenet, Inc. System and method for determining the start of a match of a regular expression
US7194705B1 (en) * 2003-03-14 2007-03-20 Xilinx, Inc. Simulation of integrated circuitry within a high-level modeling system using hardware description language circuit descriptions
US8612992B2 (en) * 2003-04-09 2013-12-17 Jaluna Sa Operating systems
DE60323811D1 (en) * 2003-04-09 2008-11-13 Jaluna S A operating systems
EP1503286B1 (en) * 2003-07-30 2014-09-03 Jaluna SA Multiple operating system networking
KR20070005917A (en) * 2003-09-30 2007-01-10 쟈루나 에스에이 Operating systems
US7290174B1 (en) * 2003-12-03 2007-10-30 Altera Corporation Methods and apparatus for generating test instruction sequences
US7770147B1 (en) * 2004-03-08 2010-08-03 Adaptec, Inc. Automatic generators for verilog programming
US20050216900A1 (en) * 2004-03-29 2005-09-29 Xiaohua Shi Instruction scheduling
US7398492B2 (en) * 2004-06-03 2008-07-08 Lsi Corporation Rules and directives for validating correct data used in the design of semiconductor products
US7404156B2 (en) * 2004-06-03 2008-07-22 Lsi Corporation Language and templates for use in the design of semiconductor products
US7334201B1 (en) * 2004-07-02 2008-02-19 Tensilica, Inc. Method and apparatus to measure hardware cost of adding complex instruction extensions to a processor
US7324106B1 (en) * 2004-07-27 2008-01-29 Nvidia Corporation Translation of register-combiner state into shader microcode
US7386825B2 (en) 2004-07-29 2008-06-10 International Business Machines Corporation Method, system and program product supporting presentation of a simulated or hardware system including configuration entities
US7389490B2 (en) * 2004-07-29 2008-06-17 International Business Machines Corporation Method, system and program product for providing a configuration specification language supporting selective presentation of configuration entities
WO2006046711A1 (en) * 2004-10-28 2006-05-04 Ipflex Inc. Data processing device having reconfigurable logic circuit
CN100389419C (en) * 2004-12-11 2008-05-21 鸿富锦精密工业(深圳)有限公司 System and method for system setting file memory
US7712081B2 (en) * 2005-01-19 2010-05-04 International Business Machines Corporation Using code motion and write and read delays to increase the probability of bug detection in concurrent systems
US7664928B1 (en) * 2005-01-19 2010-02-16 Tensilica, Inc. Method and apparatus for providing user-defined interfaces for a configurable processor
US7386814B1 (en) 2005-02-10 2008-06-10 Xilinx, Inc. Translation of high-level circuit design blocks into hardware description language
JP4342464B2 (en) * 2005-03-29 2009-10-14 富士通マイクロエレクトロニクス株式会社 Microcontroller
JP2009505171A (en) * 2005-06-27 2009-02-05 イコア コーポレイション Method for specifying a stateful transaction-oriented system and apparatus for flexible mapping to structurally configurable in-memory processing of semiconductor devices
DE102005041312A1 (en) * 2005-08-31 2007-03-15 Advanced Micro Devices, Inc., Sunnyvale Memory access to virtual target device
US7523434B1 (en) 2005-09-23 2009-04-21 Xilinx, Inc. Interfacing with a dynamically configurable arithmetic unit
US20070074078A1 (en) * 2005-09-23 2007-03-29 Potts Matthew P Test replication through revision control linking
US7478356B1 (en) * 2005-09-30 2009-01-13 Xilinx, Inc. Timing driven logic block configuration
US7366998B1 (en) 2005-11-08 2008-04-29 Xilinx, Inc. Efficient communication of data between blocks in a high level modeling system
EP1974265A1 (en) * 2006-01-18 2008-10-01 PACT XPP Technologies AG Hardware definition method
JP2007272797A (en) * 2006-03-31 2007-10-18 Toshiba Corp Pipeline high order synthesis system and method
US7827517B1 (en) * 2006-05-19 2010-11-02 Altera Corporation Automated register definition, builder and integration framework
JP4707191B2 (en) * 2006-09-26 2011-06-22 富士通株式会社 Verification support program, recording medium storing the program, verification support apparatus, and verification support method
JP5218063B2 (en) * 2006-11-21 2013-06-26 日本電気株式会社 Instruction opcode generation system
US7529909B2 (en) * 2006-12-28 2009-05-05 Microsoft Corporation Security verified reconfiguration of execution datapath in extensible microcomputer
US7971132B2 (en) * 2007-01-05 2011-06-28 Dialogic Corporation Universal multimedia engine and method for producing the same
JP2008176453A (en) * 2007-01-17 2008-07-31 Nec Electronics Corp Simulation device
US8726241B1 (en) * 2007-06-06 2014-05-13 Rockwell Collins, Inc. Method and system for the development of high-assurance computing elements
US7913203B1 (en) 2007-11-23 2011-03-22 Altera Corporation Method and apparatus for designing a system on multiple field programmable gate array device types
US7873934B1 (en) 2007-11-23 2011-01-18 Altera Corporation Method and apparatus for implementing carry chains on field programmable gate array devices
US8176406B2 (en) * 2008-03-19 2012-05-08 International Business Machines Corporation Hard error detection
US7974967B2 (en) * 2008-04-15 2011-07-05 Sap Ag Hybrid database system using runtime reconfigurable hardware
CN102144167B (en) * 2008-08-20 2014-03-12 国立大学法人九州工业大学 Generating device and generating method
US8136063B2 (en) * 2008-11-14 2012-03-13 Synopsys, Inc. Unfolding algorithm in multirate system folding
US8843862B2 (en) * 2008-12-16 2014-09-23 Synopsys, Inc. Method and apparatus for creating and changing logic representations in a logic design using arithmetic flexibility of numeric formats for data
US8127262B1 (en) * 2008-12-18 2012-02-28 Xilinx, Inc. Communicating state data between stages of pipelined packet processor
KR101553652B1 (en) * 2009-02-18 2015-09-16 삼성전자 주식회사 Apparatus and method for compiling instruction for heterogeneous processor
WO2011028116A2 (en) * 2009-09-04 2011-03-10 Silicon Hive B.V. Method and apparatus and record carrier
US8156459B1 (en) * 2009-11-10 2012-04-10 Xilinx, Inc. Detecting differences between high level block diagram models
US8548798B2 (en) * 2010-02-26 2013-10-01 International Business Machines Corporation Representations for graphical user interfaces of operators, data types, and data values in a plurality of natural languages
US8972953B2 (en) * 2010-04-16 2015-03-03 Salesforce.Com, Inc. Methods and systems for internally debugging code in an on-demand service environment
US8839214B2 (en) * 2010-06-30 2014-09-16 Microsoft Corporation Indexable type transformations
US8385340B1 (en) * 2010-08-17 2013-02-26 Xilinx, Inc. Pipeline of a packet processor programmed to concurrently perform operations
US8358653B1 (en) * 2010-08-17 2013-01-22 Xilinx, Inc. Generating a pipeline of a packet processor from a parsing tree
JP2012099035A (en) * 2010-11-05 2012-05-24 Fujitsu Ltd Operation verification method for processor, operation verification device for processor and operation verification program for processor
US8392866B2 (en) * 2010-12-20 2013-03-05 International Business Machines Corporation Task-based multi-process design synthesis with notification of transform signatures
US8341565B2 (en) 2010-12-20 2012-12-25 International Business Machines Corporation Task-based multi-process design synthesis with reproducible transforms
US8407652B2 (en) 2010-12-20 2013-03-26 International Business Machines Corporation Task-based multi-process design synthesis
US8423343B2 (en) * 2011-01-24 2013-04-16 National Tsing Hua University High-parallelism synchronization approach for multi-core instruction-set simulation
US8707266B2 (en) * 2011-03-21 2014-04-22 Cisco Technology, Inc. Command line interface robustness testing
US8520428B2 (en) * 2011-03-25 2013-08-27 Intel Corporation Combined data level-shifter and DE-skewer
CN102521011B (en) * 2011-11-18 2014-08-06 华为技术有限公司 Simulator generation method and simulator generation device
TWI505636B (en) * 2012-04-10 2015-10-21 Univ Lunghwa Sci & Technology Finite impulse response filter having optimum multi-stage sampling rate and method for manufacturing the same
CN103543983B (en) * 2012-07-11 2016-08-24 世意法(北京)半导体研发有限责任公司 For improving the novel data access method of the FIR operating characteristics in balance throughput data path architecture
GB2508233A (en) 2012-11-27 2014-05-28 Ibm Verifying logic design of a processor with an instruction pipeline by comparing the output from first and second instances of the design
US9696998B2 (en) * 2013-08-29 2017-07-04 Advanced Micro Devices, Inc. Programmable substitutions for microcode
US9811335B1 (en) 2013-10-14 2017-11-07 Quicklogic Corporation Assigning operational codes to lists of values of control signals selected from a processor design based on end-user software
US9336072B2 (en) * 2014-02-07 2016-05-10 Ralph Moore Event group extensions, systems, and methods
US9660650B1 (en) * 2014-03-13 2017-05-23 Altera Corporation Integrated circuits with improved register circuitry
US9268597B2 (en) 2014-04-01 2016-02-23 Google Inc. Incremental parallel processing of data
CN106664441A (en) * 2014-07-07 2017-05-10 汤姆逊许可公司 Enhancing video content according to metadata
CN105279062A (en) * 2014-07-24 2016-01-27 上海华虹集成电路有限责任公司 Method for adjusting random weight
US9250900B1 (en) 2014-10-01 2016-02-02 Cadence Design Systems, Inc. Method, system, and computer program product for implementing a microprocessor with a customizable register file bypass network
US10528443B2 (en) * 2015-01-30 2020-01-07 Samsung Electronics Co., Ltd. Validation of multiprocessor hardware component
US9507891B1 (en) * 2015-05-29 2016-11-29 International Business Machines Corporation Automating a microarchitecture design exploration environment
US10642617B2 (en) * 2015-12-08 2020-05-05 Via Alliance Semiconductor Co., Ltd. Processor with an expandable instruction set architecture for dynamically configuring execution resources
US9542290B1 (en) 2016-01-29 2017-01-10 International Business Machines Corporation Replicating test case data into a cache with non-naturally aligned data boundaries
CN105912264A (en) * 2016-04-01 2016-08-31 浪潮电子信息产业股份有限公司 Method and system for upgrading hard disk expander and hard disk expander
US10169180B2 (en) 2016-05-11 2019-01-01 International Business Machines Corporation Replicating test code and test data into a cache with non-naturally aligned data boundaries
US10055320B2 (en) 2016-07-12 2018-08-21 International Business Machines Corporation Replicating test case data into a cache and cache inhibited memory
US10223225B2 (en) 2016-11-07 2019-03-05 International Business Machines Corporation Testing speculative instruction execution with test cases placed in memory segments with non-naturally aligned data boundaries
US10261878B2 (en) 2017-03-14 2019-04-16 International Business Machines Corporation Stress testing a processor memory with a link stack
CN108920232B (en) * 2018-06-20 2021-06-22 维沃移动通信有限公司 Target object processing method and terminal equipment
CN109101239B (en) * 2018-08-30 2021-09-14 杭州电子科技大学 Standard answer generation method of online Verilog code automatic judgment system
CN111814093A (en) * 2019-04-12 2020-10-23 杭州中天微系统有限公司 Multiply-accumulate instruction processing method and device
CN111523283B (en) * 2020-04-16 2023-05-26 北京百度网讯科技有限公司 Method and device for verifying processor, electronic equipment and storage medium
US11662988B2 (en) * 2020-09-29 2023-05-30 Shenzhen GOODIX Technology Co., Ltd. Compiler for RISC processor having specialized registers
TWI783310B (en) * 2020-11-26 2022-11-11 華邦電子股份有限公司 Counting method and counting device
CN113392603B (en) * 2021-08-16 2022-02-18 北京芯愿景软件技术股份有限公司 RTL code generation method and device of gate level circuit and electronic equipment

Citations (2)

Publication number Priority date Publication date Assignee Title
EP0743599A2 (en) * 1995-05-15 1996-11-20 Interuniversitair Micro-Elektronica Centrum Vzw Method of generating code for programmable processor, code generator and application thereof
US5896521A (en) * 1996-03-15 1999-04-20 Mitsubishi Denki Kabushiki Kaisha Processor synthesis system and processor synthesis method

Family Cites Families (43)

Publication number Priority date Publication date Assignee Title
US5287511A (en) * 1988-07-11 1994-02-15 Star Semiconductor Corporation Architectures and methods for dividing processing tasks into tasks for a programmable real time signal processor and tasks for a decision making microprocessor interfacing therewith
US5623418A (en) 1990-04-06 1997-04-22 Lsi Logic Corporation System and method for creating and validating structural description of electronic system
US5555201A (en) 1990-04-06 1996-09-10 Lsi Logic Corporation Method and system for creating and validating low level description of electronic design from higher level, behavior-oriented description, including interactive system for hierarchical display of control and dataflow information
US5867399A (en) 1990-04-06 1999-02-02 Lsi Logic Corporation System and method for creating and validating structural description of electronic system from higher-level and behavior-oriented description
US5544067A (en) 1990-04-06 1996-08-06 Lsi Logic Corporation Method and system for creating, deriving and validating structural description of electronic system from higher level, behavior-oriented description, including interactive schematic design and simulation
US5572437A (en) * 1990-04-06 1996-11-05 Lsi Logic Corporation Method and system for creating and verifying structural logic model of electronic design from behavioral description, including generation of logic and timing models
US5613098A (en) 1991-03-07 1997-03-18 Digital Equipment Corporation Testing and debugging new Y architecture code on existing X architecture system by using an environment manager to switch between direct X code execution and simulated Y code execution
US5361373A (en) 1992-12-11 1994-11-01 Gilson Kent L Integrated circuit computing device comprising a dynamically configurable gate array having a microprocessor and reconfigurable instruction execution means and method therefor
US5748979A (en) 1995-04-05 1998-05-05 Xilinx Inc Reprogrammable instruction set accelerator using a plurality of programmable execution units and an instruction page table
ATE257611T1 (en) 1995-10-23 2004-01-15 Imec Inter Uni Micro Electr DESIGN SYSTEM AND METHODS FOR COMBINED DESIGN OF HARDWARE AND SOFTWARE
US5696956A (en) 1995-11-08 1997-12-09 Digital Equipment Corporation Dynamically programmable reduced instruction set computer with programmable processor loading on program number field and program number register contents
US5819064A (en) 1995-11-08 1998-10-06 President And Fellows Of Harvard College Hardware extraction technique for programmable reduced instruction set computers
US6035123A (en) 1995-11-08 2000-03-07 Digital Equipment Corporation Determining hardware complexity of software operations
US5887169A (en) 1996-03-15 1999-03-23 Compaq Computer Corporation Method and apparatus for providing dynamic entry points into a software layer
US5857106A (en) 1996-05-31 1999-01-05 Hewlett-Packard Company Runtime processor detection and installation of highly tuned processor specific routines
US5748875A (en) 1996-06-12 1998-05-05 Simpod, Inc. Digital logic simulation/emulation system
US6031992A (en) 1996-07-05 2000-02-29 Transmeta Corporation Combining hardware and software to provide an improved microprocessor
US5693956A (en) * 1996-07-29 1997-12-02 Motorola Inverted oleds on hard plastic substrate
US5832205A (en) 1996-08-20 1998-11-03 Transmeta Corporation Memory controller for a microprocessor for detecting a failure of speculation on the physical nature of a component being addressed
US5889990A (en) 1996-11-05 1999-03-30 Sun Microsystems, Inc. Information appliance software architecture with replaceable service module providing abstraction function between system library and platform specific OS
US6006022A (en) 1996-11-15 1999-12-21 Microsystem Synthesis, Inc. Cross-linked development and deployment apparatus and method
US6028996A (en) 1997-03-18 2000-02-22 Ati Technologies, Inc. Method and apparatus for virtualizing system operation
US6075938A (en) 1997-06-10 2000-06-13 The Board Of Trustees Of The Leland Stanford Junior University Virtual machine monitors for scalable multiprocessors
US6058466A (en) * 1997-06-24 2000-05-02 Sun Microsystems, Inc. System for allocation of execution resources amongst multiple executing processes
US6321323B1 (en) 1997-06-27 2001-11-20 Sun Microsystems, Inc. System and method for executing platform-independent code on a co-processor
US5995736A (en) 1997-07-24 1999-11-30 Ati Technologies, Inc. Method and system for automatically modelling registers for integrated circuit design
US6078736A (en) 1997-08-28 2000-06-20 Xilinx, Inc. Method of designing FPGAs for dynamically reconfigurable computing
US6269409B1 (en) 1997-09-02 2001-07-31 Lsi Logic Corporation Method and apparatus for concurrent execution of operating systems
US5999730A (en) 1997-10-27 1999-12-07 Phoenix Technologies Limited Generation of firmware code using a graphic representation
US6230307B1 (en) 1998-01-26 2001-05-08 Xilinx, Inc. System and method for programming the hardware of field programmable gate arrays (FPGAs) and related reconfiguration resources as if they were software by creating hardware objects
US6052524A (en) 1998-05-14 2000-04-18 Software Development Systems, Inc. System and method for simulation of integrated hardware and software components
US6496847B1 (en) 1998-05-15 2002-12-17 Vmware, Inc. System and method for virtualizing computer systems
US6275893B1 (en) 1998-09-14 2001-08-14 Compaq Computer Corporation Method and apparatus for providing seamless hooking and intercepting of selected kernel and HAL exported entry points in an operating system
EP0992916A1 (en) * 1998-10-06 2000-04-12 Texas Instruments Inc. Digital signal processor
US6216216B1 (en) 1998-10-07 2001-04-10 Compaq Computer Corporation Method and apparatus for providing processor partitioning on a multiprocessor machine
US6282633B1 (en) 1998-11-13 2001-08-28 Tensilica, Inc. High data density RISC processor
US6477697B1 (en) * 1999-02-05 2002-11-05 Tensilica, Inc. Adding complex instruction extensions defined in a standardized language to a microprocessor design to produce a configurable definition of a target instruction set, and hdl description of circuitry necessary to implement the instruction set, and development and verification tools for the instruction set
US6477683B1 (en) 1999-02-05 2002-11-05 Tensilica, Inc. Automated processor generation system for designing a configurable processor and method for the same
US6295571B1 (en) 1999-03-19 2001-09-25 Times N Systems, Inc. Shared memory apparatus and method for multiprocessor systems
US6385757B1 (en) * 1999-08-20 2002-05-07 Hewlett-Packard Company Auto design of VLIW processors
US6640238B1 (en) * 1999-08-31 2003-10-28 Accenture Llp Activity component in a presentation services patterns environment
US6415379B1 (en) 1999-10-13 2002-07-02 Transmeta Corporation Method and apparatus for maintaining context while executing translated instructions
US6615167B1 (en) 2000-01-31 2003-09-02 International Business Machines Corporation Processor-independent system-on-chip verification for embedded processor systems

Patent Citations (3)

Publication number Priority date Publication date Assignee Title
EP0743599A2 (en) * 1995-05-15 1996-11-20 Interuniversitair Micro-Elektronica Centrum Vzw Method of generating code for programmable processor, code generator and application thereof
US5918035A (en) * 1995-05-15 1999-06-29 Imec Vzw Method for processor modeling in code generation and instruction set simulation
US5896521A (en) * 1996-03-15 1999-04-20 Mitsubishi Denki Kabushiki Kaisha Processor synthesis system and processor synthesis method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
FAUTH A ET AL: "Describing instruction set processors using nML" EUROPEAN DESIGN AND TEST CONFERENCE, 1995. ED&TC 1995, PROCEEDINGS. PARIS, FRANCE 6-9 MARCH 1995, LOS ALAMITOS, CA, USA,IEEE COMPUT. SOC, US, 6 March 1995 (1995-03-06), pages 503-507, XP010147975 ISBN: 0-8186-7039-8 *
FAUTH A ET AL: "Generation of hardware machine models from instruction set descriptions" VLSI SIGNAL PROCESSING, VI, 1993., [WORKSHOP ON] VELDHOVEN, NETHERLANDS 20-22 OCT. 1993, NEW YORK, NY, USA,IEEE, 20 October 1993 (1993-10-20), pages 242-250, XP010140403 ISBN: 0-7803-0996-0 *
HARTOOG M R ET AL: "Generation Of Software Tools From Processor Descriptions For Hardware/software Codesign" PROCEEDINGS OF THE DESIGN AUTOMATION CONFERENCE. ANAHEIM, JUNE 9 - 13, 1997, NEW YORK, ACM, US, vol. CONF. 34, 9 June 1997 (1997-06-09), pages 303-306, XP010227598 ISBN: 0-7803-4093-0 *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6941548B2 (en) 2001-10-16 2005-09-06 Tensilica, Inc. Automatic instruction set architecture generation
GB2398144B (en) * 2001-10-16 2005-10-12 Tensilica Inc Automatic instruction set architecture generation
US7971197B2 (en) 2001-10-16 2011-06-28 Tensilica, Inc. Automatic instruction set architecture generation
CN100338568C (en) * 2002-04-26 2007-09-19 株式会社东芝 Generating method for developing environment in development on-chip system and media for storing the same program
EP1357485A3 (en) * 2002-04-26 2008-10-22 Kabushiki Kaisha Toshiba Method of generating development environment for developing system LSI and medium which stores program therefor
US7219212B1 (en) 2002-05-13 2007-05-15 Tensilica, Inc. Load/store operation of memory misaligned vector data using alignment register storing realigned data portion for combining with remaining portion
US7346881B2 (en) 2002-05-13 2008-03-18 Tensilica, Inc. Method and apparatus for adding advanced instructions in an extensible processor architecture
US7376812B1 (en) 2002-05-13 2008-05-20 Tensilica, Inc. Vector co-processor for configurable and extensible processor architecture
US7937559B1 (en) 2002-05-13 2011-05-03 Tensilica, Inc. System and method for generating a configurable processor supporting a user-defined plurality of instruction sizes
WO2007092260A1 (en) * 2006-02-02 2007-08-16 Microsoft Corporation Software support for dynamically extensible processors
US7757224B2 (en) 2006-02-02 2010-07-13 Microsoft Corporation Software support for dynamically extensible processors
CN102567149A (en) * 2010-12-09 2012-07-11 上海华虹集成电路有限责任公司 SOC (system on chip) verifying method

Also Published As

Publication number Publication date
US7437700B2 (en) 2008-10-14
CN1436335A (en) 2003-08-13
KR20030016226A (en) 2003-02-26
GB2376546B (en) 2004-08-04
TW571206B (en) 2004-01-11
US20090172630A1 (en) 2009-07-02
GB2376546A (en) 2002-12-18
US7036106B1 (en) 2006-04-25
GB0217221D0 (en) 2002-09-04
JP4619606B2 (en) 2011-01-26
US8161432B2 (en) 2012-04-17
WO2001061576A3 (en) 2003-03-27
CN1288585C (en) 2006-12-06
US20060101369A1 (en) 2006-05-11
JP2004502990A (en) 2004-01-29
KR100589744B1 (en) 2006-06-15
AU2001238403A1 (en) 2001-08-27
US9582278B2 (en) 2017-02-28
US20090177876A1 (en) 2009-07-09

Similar Documents

Publication Publication Date Title
WO2001061576A2 (en) Automated processor generation system for designing a configurable processor and method for the same
Marwedel et al. Code generation for embedded processors
Sun et al. Custom-instruction synthesis for extensible-processor platforms
US5854929A (en) Method of generating code for programmable processors, code generator and application thereof
KR100775547B1 (en) Automated processor generation system for designing a configurable processor and method for the same
Nurmi Processor design: system-on-chip computing for ASICs and FPGAs
EP0743599A2 (en) Method of generating code for programmable processor, code generator and application thereof
Chattopadhyay et al. LISA: A uniform ADL for embedded processor modeling, implementation, and software toolsuite generation
Mantovani et al. HL5: a 32-bit RISC-V processor designed with high-level synthesis
Rykunov Design of asynchronous microprocessor for power proportionality
Basu et al. High level synthesis from Sim-nML processor models
Zaretsky et al. Overview of the FREEDOM compiler for mapping DSP software to FPGAs
Buchholz et al. Behavioral emulation of synthesized RT-level descriptions using VLIW architectures
Hirvonen et al. AEx: Automated customization of exposed datapath soft-cores
Guo et al. Automation of IP core interface generation for reconfigurable computing
Strauch An aspect and transaction oriented programming, design and verification language (PDVL)
Lewis et al. CADRE: an asynchronous embedded DSP for mobile phone applications
Jaggar A Performance Study of the Acorn RISC Machine
Urban et al. Compiler-Centred Microprocessor Design (CoMet)-From C-Code to a VHDL Model of an ASIP
Öberg Synthesis of VLIW Accelerators from Formal Descriptions in a Real-Time Multi-Core Environment
Eissa Modeling of a multi-core microblaze system at RTL and TLM abstraction levels in systemC
Järvelä Vector operation support for transport triggered architectures
Lawrence et al. INCA: a next-generation architecture for simulation
Radetzki et al. Modeling of a Multi core MicroBlaze System at TL and TLM Abstraction Levels in SystemC
CN1382280B (en) Automatic processor generation system for designing a configurable processor and method thereof

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

ENP Entry into the national phase in:

Ref country code: GB

Ref document number: 0217221

Kind code of ref document: A

Free format text: PCT FILING DATE = 20010215

Format of ref document f/p: F

121 EP: The EPO has been informed by WIPO that EP was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101)
WWE WIPO information: entry into national phase

Ref document number: 1020027010522

Country of ref document: KR

ENP Entry into the national phase in:

Ref country code: JP

Ref document number: 2001 560891

Kind code of ref document: A

Format of ref document f/p: F

WWE WIPO information: entry into national phase

Ref document number: 018052401

Country of ref document: CN

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

WWP WIPO information: published in national office

Ref document number: 1020027010522

Country of ref document: KR

122 EP: PCT application non-entry in European phase
WWG WIPO information: grant in national office

Ref document number: 1020027010522

Country of ref document: KR

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101)