CA1288170C - Multinode reconfigurable pipeline computer - Google Patents

Multinode reconfigurable pipeline computer

Info

Publication number
CA1288170C
Authority
CA
Canada
Prior art keywords
output
inputs
group
masnet
programmable processor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CA000551833A
Other languages
French (fr)
Inventor
Daniel M. Nosenchuck
Michael G. Littman
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Princeton University
Original Assignee
Princeton University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/57 Arithmetic logic units [ALU], i.e. arrangements or devices for performing two or more of the operations covered by groups G06F7/483 – G06F7/556 or for performing logical operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/80 Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
    • G06F15/8053 Vector processors
    • G06F15/8092 Array of vector units
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline, look ahead

Abstract

MULTINODE RECONFIGURABLE PIPELINE COMPUTER

Inventors: Daniel M. Nosenchuck and Michael G. Littman

A multinode parallel-processing computer comprises a plurality of interconnected, large-capacity nodes, each including a reconfigurable pipeline of functional units such as Integer Arithmetic Logic Processors, Floating Point Arithmetic Processors, Special Purpose Processors, etc. The reconfigurable pipeline of each node is connected to a multiplane memory by a Memory-ALU Switch NETwork (MASNET). The reconfigurable pipeline includes three (3) basic substructures formed from functional units which have been found to be sufficient to perform the bulk of all calculations. The MASNET controls the flow of signals from the memory planes to the reconfigurable pipeline and vice versa. The nodes are connectable together by an internode data router (hyperspace router) so as to form a hypercube configuration. The capability of the nodes to conditionally configure the pipeline at each tick of the clock, without requiring a pipeline flush, permits many powerful algorithms to be implemented directly.

Description

BACKGROUND OF THE INVENTION
1. Field of the Invention
The invention relates to a computer formed of many nodes in which each of the nodes includes a reconfigurable, many-function ALU pipeline connected to multiple, independent memory planes through a multi-function memory-ALU network switch (MASNET) and the multiple nodes are connected in a hypercube topology.
2. Description of Related Art
The computer of the present invention is both a parallel and a pipelined machine. The prior art does disclose in certain limited contexts the concept of parallelism and pipelining. See, for example, U.S. Patent No. 4,589,067. However, the internal architecture of the present invention is unique in that it allows for most, if not all, of the computer building blocks being simultaneously active. U.S. Patent No. 4,589,067 is typical of the prior art in that it describes a vector processor based upon a dynamically reconfigurable ALU pipeline. This processor is similar to a single functional unit of the present invention's reconfigurable pipeline. In one sense the pipeline of the present invention's node is thus a pipeline of pipelines. Other structures that possibly merit comparison with the present invention are the Systolic Array by Kung, the MIT Data-Flow Concept and the concept of other parallel architectures.
The Systolic Array concept by H.T. Kung of Carnegie Mellon University involves data which is "pumped" (i.e. flows) through the computer as "waves". Unlike the present invention, the Systolic Array system is comprised of homogeneous building blocks where each building block performs a given operation. In the Systolic Array computer, as data flows through, the interconnection between identical building blocks remains fixed during a computation. At best, the configuration cannot be changed until all data is processed by the Systolic Array. In the present invention, by contrast, the interconnection between building blocks can be changed at any time, even when data is passing through the pipeline (i.e. dynamic reconfiguration of interconnects). The present invention is also distinct from the Systolic Array concept in that each building block (i.e. functional unit) of the node pipeline of the present invention can perform a different operation from its neighbors (e.g. functional unit 1 - floating point multiply; functional unit 2 - integer minus; functional unit 3 - logical compare, etc.). In addition, during the course of computation, each building block of the present invention can assume different functionalities (i.e. reconfiguration of functionality).
The MIT Data-Flow computer is comprised of a network of hardware-invoked instructions that may be connected in a pipeline arrangement. The instruction processing is asynchronous to the "data-flow". Each data word is appended with a field of token bits which determines the routing of the data to the appropriate data instruction units. Each instruction unit has a data queue for each operand input. The instruction does not "fire" (i.e. execute) until all operands are present. The present invention includes the concept of data flowing through a pipeline network of hardware functional units that perform operations on data (e.g. act as instructions that process data). However, by contrast, the present invention does not function in an asynchronous mode. Instead, data is fetched from memory and is routed by a switch (MASNET) to pipelined instruction units through the centralized control of a very high speed microsequencing unit. This synchronous control sequence is in sharp contrast to the asynchronous distributed data routing invoked by the Data-Flow architecture.
Moreover, the present invention, unlike the Data-Flow Machine, has no token field (i.e. a data field that guides the data to the appropriate functional unit) nor do the functional units have queues (i.e. buffers that hold operands, instructions, or results). The Data-Flow Machine has functional units waiting for data. The present invention has functional units that are continuously active. The control of the pipeline of the present invention is achieved by a central controller, referred to as a microsequencer, whereas the Data-Flow Machine uses distributed control. The present invention also has the ability to reconfigure itself based upon internal flow of data using the TAG field, a feature not found in Data-Flow machines. Furthermore, the Data-Flow computer does not effectively perform series of like or dissimilar computations on continuous streams of vector data (i.e. a single functional operation on all data flowing through the pipeline). In contrast the present invention performs this operation quite naturally.
There are two other principal differences between the parallel architecture of the present invention and other parallel architectures. First, each node of the present invention involves a unique memory/processor design (structure). Other parallel architectures involve existing stand-alone computer architectures augmented for interconnection with neighboring nodes. Second, other general multiple-processors/parallel computers use a central processing unit to oversee and control interprocessor communications so that local processing is suspended during global communications. The nodes of the present invention, by contrast, use an interprocessor router and cache memory which allows for communications without disturbing local processing of data.
The following U.S. Patents discuss programmable or reconfigurable pipeline processors: 3,787,673; 3,875,391; 3,990,732; 3,978,452; 4,161,036; 4,225,920; 4,228,497; 4,307,447; 4,454,489; 4,467,409; and 4,482,953. A useful discussion of the history of both programmable and non-programmable pipeline processors is found in columns 1 through 4 of U.S. Patent No. 4,594,655. In addition, another relevant discussion of the early efforts to microprogram pipeline computers is found in the article entitled PROGRAMMING OF PIPELINED PROCESSORS by Peter M. Kogge from the March 1977 edition of COMPUTER ARCHITECTURE pages 63-69.
Lastly, the following U.S. Patents are cited for their general discussion of pipelined processors: ~,051,55~; 4,101,960; 4,174,514; 4,244,019; 4,270,1~1; 4,363,094; ~,438,4~4; 4,442,49~; 4,454,578; 4,491,020; 4,498,134 and 4,507,728.
SUMMARY OF THE INVENTION
Briefly described, the present invention uses a small number (e.g. 128) of powerful nodes operating concurrently. The individual nodes need not be, but could be, synchronized. By limiting the number of nodes, the total communications and related hardware and software that is required to solve any given problem is kept to a manageable level, while at the same time using to advantage the gain in speed and capacity that is inherent with concurrency. In addition, the interprocessor communications between nodes of the present invention that do occur do not interrupt the local processing of data within the node. These features provide for a very efficient means of processing large amounts of data rapidly. Each node of the present invention is comparable in speed and performance to Class VI supercomputers (e.g. Cray 2, Cyber 205, etc.). Within a given node the computer uses many (e.g. 30) functional units (e.g. floating point arithmetic processors, integer arithmetic/logic processors, special-purpose processors, etc.) organized in a synchronous, dynamically-reconfigurable pipeline such that most, if not all, of the functional units are active during each clock cycle of a given node. This architectural design serves to minimize the storage of intermediate results in memory and assures that the sustained speed of a typical calculation is close to the peak speed of the machine. This, for example, is not the case with existing Class VI supercomputers where the actual sustained speed for a given computation is much less than the peak speed of the machine. In addition, the invention further provides for flexible and general interconnection between the multiple planes of memory, the dynamically reconfigurable pipeline, and the interprocessor data routers.
Each node of the present invention includes a reconfigurable arithmetic/logic unit (ALU), a multiplane memory and a memory-ALU
network (MASNET) switch for routing data between the memory planes and the reconfigurable ALU. Each node also includes a microsequencer and a microcontroller for directing the timing and nature of the computations within each node. Communication between nodes is controlled by a plurality of hyperspace routers.
A front end computer associated with significant off-line mass storage provides the input instructions to the multi-node computer. The preferred connection topology of the node is that of a boolean hypercube.
The reconfigurable ALU pipeline within each node preferably comprises pipeline processing elements including floating-point processors, integer/logic processors and special-purpose elements. The processing elements are wired into substructures that are known to appear frequently in many user applications.
Three hardwired substructures appear frequently within the reconfigurable ALU pipeline. One substructure comprises a two element unit, another comprises a three-element unit and the last substructure comprises a one-element unit. The three-element substructure is found typically twice as frequently as the two element substructure and the two element substructure is found typically twice as frequently as the one element substructure.
The judicious use of those substructures helps to reduce the complexity of the switching network employed to control the configuration of the ALU pipeline.
The invention will be further understood by reference to the following drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 illustrates an embodiment of the multinode computer arranged in a two-dimension nearest-neighbor grid which is a subset of the boolean hypercube.
Figure 2 is a schematic diagram of an individual node illustrating the memory/MASNET/ALU circuit interconnections.
Figure 3 is a schematic diagram illustrating the layout of one memory plane within a single node such as illustrated in Figure 2.

Figure 4 illustrates two typical substructures formed from five arithmetic/logic units as might be found within the reconfigurable ALU pipeline of each node.
Figure 5A illustrates a typical ALU pipeline organization and the switching network (FLONET) which allows for a change in configuration of the substructures.
Figure 5B illustrates a preferred embodiment of the interconnection of a FLONET to a grouping of the three common substructures in a reconfigurable ALU pipeline.
Figure 6 is a schematic diagram of a 32-register x n-bit, memory/ALU network switch (MASNET) and internode communications unit where the blocks represent six port register files.
Figure 7 is a schematic diagram of a 2 x 2 MASNET which illustrates how the input data stream can source two output data streams with a relative shift of "p" elements.
Figure 8 is a schematic diagram of an 8-node hypercube showing the relationship of the hyperspace routers to the MASNET units of each node.
DETAILED DESCRIPTION OF THE INVENTION
During the course of this description, like numbers will be used to identify like elements according to the different figures which illustrate the invention.
The computer 10 according to the preferred embodiment of the invention illustrated in Figure 1 includes a plurality of multiple memory/computational units referred to as nodes 12.

Computer 10 is of the parallel-processing variety capable of performing arithmetic and logical operations with high vector and scalar efficiency and speed. Such a device is capable of solving a wide range of computational problems. Each node 12 is connected via drop-line network 18 to a front end computer 16 that provides a host environment suitable for multi-user program development, multinode initialization and operation, and off-line data manipulation. Front-end computer 16 is connected to an off-line mass storage unit 20 by interconnection 22. Each node 12 is also connected to adjacent nodes by internode connections 14. For purposes of clarity and illustration, only 2~ nodes 12 are illustrated with simple internode links 14 in Figure 1. However, it will be appreciated that the nodes 12 can be connected in a general hypercube configuration and that the invention may comprise fewer or more than 128 nodes as the application requires. Rather than interconnect a large number of relatively slow microprocessors, as is done with other prior art parallel computers, the present invention incorporates a relatively small number of interconnected, large-capacity, high-speed powerful nodes 12. According to the preferred embodiment of the present invention, the configuration typically consists of between 1 and 128 nodes 12. This approach limits the number of physical and logical interconnects 14 between nodes 12. The preferred connection topology is that of a boolean hypercube. Each of the nodes 12 of the computer 10 is comparable to a class VI supercomputer in processing speed and capacity.
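The boolean hypercube topology mentioned above can be illustrated with a short sketch. It assumes the conventional labelling in which each node carries a d-bit binary index and two nodes are linked when their indices differ in exactly one bit; the function names are hypothetical and do not come from the patent.

```python
def hypercube_neighbors(node: int, d: int) -> list[int]:
    """Return the d nodes adjacent to `node` in a boolean hypercube of order d."""
    if not 0 <= node < (1 << d):
        raise ValueError("node index out of range for this hypercube order")
    # Flipping one bit of the label at a time yields each neighbor in turn.
    return [node ^ (1 << bit) for bit in range(d)]


def link_count(d: int) -> int:
    """Total internode links: 2**d nodes with d links each, every link shared by two nodes."""
    return d * (1 << d) // 2
```

Under this labelling a 128-node machine (d = 7) needs only 7 links per node, which is the sense in which a small number of powerful nodes keeps the interconnect manageable.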
The details of a typical individual node 12 are illustrated in Figure 2. Each node 12, which is the building block of the computer 10, is comprised of five (5) basic elements, namely: (1) a reconfigurable ALU pipeline 24 having many (e.g. 9 or more) high-performance and special-purpose elements 62; (2) a group 28 of independent memory planes 30; (3) a non-blocking multiple-input and multiple-output switching MASNET (Memory/ALU Switch network) 26; (4) a microsequencer 40; and (5) a microcontroller 42. Figure 2 illustrates such a node 12 which includes 8 memory planes 30 connected to a reconfigurable pipeline 24 by memory-ALU network switch (MASNET) 26. As used in this description the terms "processing element", "functional unit", "programmable processors" and "building blocks" refer to arithmetic/logic units 62 which comprise either floating point arithmetic processors, integer/arithmetic/logic processors, special-purpose processors or a combination of the foregoing.
Microsequencer 40 is connected via lines 46 to memory 28, MASNET 26 and reconfigurable ALU pipeline 24 respectively. Similarly, microcontroller 42 is connected to the same elements via lines 44. Microsequencer 40 governs the clocking of data between and within the various elements and serves to define data pathways and the configuration of pipeline 24 for each clock tick of the node 12. In a typical operation, a new set of operands is presented to the pipeline 24 and a new set of results is derived from the pipeline 24 on every clock of the node 12.
Microsequencer 40 is responsible for the selection of the microcode that defines the configuration of pipeline 24, MASNET 26 and memory planes 30. In typical operation, the addresses increase sequentially in each clock period from a specific start address until a specified end address is reached. The address ramp is repeated continually until an end-of-computation interrupt flag is issued. The actual memory address used by a given plane 30 of memory 28 may differ from the microsequencer 40 address depending upon the addressing mode selected. (See discussion concerning memory planes below).
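The address ramp described above can be sketched as a generator that emits one address per clock. This is only an illustration: the name `address_ramp` and the modelling of the end-of-computation interrupt as a callable polled once per clock are assumptions, not details from the patent.

```python
from typing import Callable, Iterator


def address_ramp(start: int, end: int,
                 end_of_computation: Callable[[], bool]) -> Iterator[int]:
    """Yield one microsequencer address per clock, ramping start..end repeatedly."""
    addr = start
    # The flag is polled before every clock; the ramp restarts after `end`.
    while not end_of_computation():
        yield addr
        addr = start if addr == end else addr + 1


# Example: ramp over addresses 4..6 and raise the flag after seven clocks.
clock = iter(range(100))
trace = list(address_ramp(4, 6, lambda: next(clock) >= 7))
```

In this sketch each yielded address would index the microcode word that configures the pipeline, MASNET and memory planes for that clock.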
Microcontroller 42, also referred to as a node manager, is used to initialize and provide verification of the various parts of the node 12. For a given computation, after the initial set up, control is passed to the microsequencer 40 which takes over until the computation is complete. In principle, microcontroller 42 does not need to be active during the time that computations are being performed although in a typical operation the microcontroller 42 would be monitoring the progress of the computation and preparing unused parts of the computer for the next computation.
In addition to the five basic elements which constitute a minimal node 12, each node 12 may be expanded to include local mass storage units, graphic processors, pre- and post-processors, auxiliary data routers, and the like. Each node 12 is operable in a stand-alone mode because the node manager 42 is a stand-alone microcomputer. However, in the normal case the node 12 would be programmed from the front-end computer 16.
The layout of a single memory plane 30 is schematically illustrated in Figure 3. Memory planes 30 are of high capacity and are capable of sourcing (reading) or sinking (writing) a data word in a clock of the machine 10. Each memory plane 30 can be enabled for read-only, write-only or read/write operations. The memory planes 30 support three possible addressing modes, namely: (1) direct, (2) translate and (3) computed. With all three modes, the working address is prefetched by prefetch address register 5~ on the previous cycle of the computer 10. In the direct mode, the address from the microsequencer address bus 46 is used to select the memory element of interest. In the translate mode, the microsequencer address is used to look up the actual address in a large memory table of addresses. This large table of addresses is stored in a separate memory unit referred to as the translate memory bank or table 50. The translate table 50 can be used to generate an arbitrary scan pattern through main memory bank 54. It can also be used to protect certain designated memory elements from ever being over-written. The computed address mode allows the pipeline 24 to define the address of the next sourced or sinked data word.
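The three addressing modes can be sketched as a single address-selection function. The names `Mode` and `working_address`, and the representation of the translate table as a Python list, are hypothetical choices made for illustration; the patent does not specify how the hardware encodes the mode.

```python
from enum import Enum


class Mode(Enum):
    DIRECT = "direct"
    TRANSLATE = "translate"
    COMPUTED = "computed"


def working_address(mode, sequencer_addr, translate_table=None, pipeline_addr=None):
    """Return the address a memory plane would use on the next clock."""
    if mode is Mode.DIRECT:
        # The microsequencer address selects the memory element directly.
        return sequencer_addr
    if mode is Mode.TRANSLATE:
        # The microsequencer address indexes a table holding the actual address.
        return translate_table[sequencer_addr]
    if mode is Mode.COMPUTED:
        # The pipeline itself supplies the address of the next word.
        return pipeline_addr
    raise ValueError(f"unknown addressing mode: {mode!r}")


# A translate table that scans a 4-word bank in reverse order.
reverse_scan = [3, 2, 1, 0]
scan = [working_address(Mode.TRANSLATE, a, translate_table=reverse_scan)
        for a in range(4)]
```

A table that simply never contains a given address would keep that memory element from being written in translate mode, which is one way to read the protection remark above.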
Reconfigurable pipeline 24 is formed of various processing elements shown as units 62 in Figure 4 and a switch network shown as FLONET 70 in Figures 5A and 5B (FLONET is an abbreviation for Functional and Logical Organization NETwork). Three (3) permanently hardwired substructures or units 62, 64 or 66 are connected to FLONET 70. FLONET 70 reconfigures the wiring of the pipelined substructures 62, 64 and 66 illustrated collectively as 68 in Figure 5A and 69 in Figure 5B. The specialized reconfigurable interconnection is achieved by electronic switches so that new configurations can be defined within a clock period of the node 12. An example of high level data processing in a specific situation is shown in Figure 4. The pipeline processing elements include floating-point arithmetic processors (e.g. AMD 29325, Weitek 1032/1033), integer arithmetic/logic units 62 (e.g. AMD 29332), and special-purpose elements such as vector regeneration units and convergence checkers. A useful discussion related to the foregoing special-purpose elements can be found in an article entitled "Two-Dimensional, Non steady Viscous Flow Simulation on the Navier Stokes Computer MiniNode", J. Sci. Comput., Vol. 1, No. 1 (1986) by D.M. Nosenchuck, M.G. Littman and W. Flannery. Processing elements 62 are wired together in three (3) distinct substructures 62, 64 and 66 that have been found to appear frequently in many user application programs.
Two of the most commonly used substructures 64 and 66 are shown by the elements enclosed in dotted lines in Figure 4. Substructure 64 comprises three ALU units 62 having four inputs and one output. Two ALU units 62 accept the four inputs in two pairs of twos. The outputs of the first two ALU units 62 form the inputs to the third ALU unit 62. Each of the three ALU units 62 is capable of performing floating point and integer addition, subtraction, multiplication and division, logical AND, OR, NOT, exclusive OR, mask, shift, and compare functions with a logical register file used to store constants. Substructure 66 comprises two arithmetic/logic units 62 and is adapted to provide three inputs and one output. One of the two arithmetic/logic units 62 accepts two inputs and produces one output that forms one input to the second arithmetic/logic unit 62. The other input to the second arithmetic/logic unit 62 comes directly from the outside. The single output of substructure 66 comes from the second arithmetic/logic unit 62. Accordingly, substructure 66 comprises a three input and one output device. The third and last most common substructure is an individual arithmetic/logic unit 62 standing alone, i.e. two inputs and one output. Substructures 62, 64 and 66 are permanently hardwired into those respective configurations; however, the reconfiguration among those units is controlled by FLONET 70. A simplified FLONET 70 is schematically represented in Figure 5A. For simplicity, two three-element substructures 64, two two-element substructures 66 and two one-element substructures 62 are illustrated. This results in a twelve-functional unit, high-level reconfigurable pipeline 24.
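The wiring of the three substructures can be sketched as function composition, with each functional unit modelled as a two-input operation. The helper names and the order in which the external input enters the second unit of the two-element substructure are assumptions made for illustration.

```python
import operator
from typing import Callable

Op = Callable[[float, float], float]


def three_element(op_a: Op, op_b: Op, op_c: Op):
    """Substructure 64: two units take the four inputs in pairs; a third combines their outputs."""
    return lambda w, x, y, z: op_c(op_a(w, x), op_b(y, z))


def two_element(op_a: Op, op_b: Op):
    """Substructure 66: the first unit's output and one external input feed the second unit."""
    return lambda x, y, z: op_b(op_a(x, y), z)


def one_element(op: Op):
    """A single functional unit 62 standing alone: two inputs, one output."""
    return lambda x, y: op(x, y)


# Selecting an operation per unit plays the role of reconfiguring functionality.
sum_of_products = three_element(operator.mul, operator.mul, operator.add)
multiply_add = two_element(operator.mul, operator.add)
```

For example, `sum_of_products(1, 2, 3, 4)` evaluates 1*2 + 3*4 and `multiply_add(2, 3, 4)` evaluates 2*3 + 4, two patterns of the kind that recur in numerical kernels.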
Figure 5B illustrates an optimal layout of a FLONET interconnect. According to the preferred embodiment of the invention 10, the optimal ratio between the three-element substructures 64 and the two-element substructures 66 is in the range of 1.5 to 2.0 to 1.0 (1.5-2.0:1). Likewise the optimal ratio between the two-element substructures 66 and the single-element substructures 62 is approximately 2 to 1 (2:1). Accordingly, Figure 5B illustrates the optimal scenario which includes eight three-element substructures 64, four two-element substructures 66 and two single-element substructures 62. The number of three-element substructures 64 could vary between 6 and 8 according to the embodiment illustrated in Figure 5B. The preferred ratios just described are approximate and might vary slightly from application to application. However, it has been found that the foregoing ratios do provide very close to optimal results.
According to the preferred embodiment of the invention the grouping 69 of substructures 62, 64 and 66 in Figure 5B has the functional units, or building blocks, 62 organized in the following manner: each of the three functional units 62 (i.e. programmable processors) in the eight substructures 64 would be floating point processors like the AMD 29325; two of substructures 66 would have each of their two functional units 62 in the form of floating point processors like the AMD 29325 whereas the remaining two substructures 66 would have integer/logic processors like the AMD 29332; lastly one of the remaining single functional units 62 would be a floating point processor like the AMD 29325 and the other remaining single functional unit 62 would be an integer logic processor like the AMD 29332. Alternatively, it is also possible to pair processors to form a hybrid functional unit 62. For example, a floating point processor like the AMD 29325 could be paired in a manner known to those of ordinary skill in the art with an integer logic processor like the AMD 29332 so that the functional unit 62 can alternate between floating point and integer/logic. It is also possible to use a single many-function processor (floating point arithmetic, integer arithmetic/logic) like the Weitek 3332 to achieve the same result.
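As a quick check on the preferred grouping, the processor counts implied by the description can be tallied. The table below is transcribed from the paragraph above; the variable and function names are illustrative only.

```python
from collections import Counter

# Each entry: (units per substructure, number of substructures, processor kind)
LAYOUT = [
    (3, 8, "float"),    # eight three-element substructures 64
    (2, 2, "float"),    # two of the two-element substructures 66
    (2, 2, "integer"),  # the remaining two substructures 66
    (1, 1, "float"),    # one single functional unit 62
    (1, 1, "integer"),  # the other single functional unit 62
]


def count_units(layout):
    """Total functional units per processor kind."""
    totals = Counter()
    for size, count, kind in layout:
        totals[kind] += size * count
    return totals


totals = count_units(LAYOUT)
```

The tally comes to 29 floating point units and 5 integer/logic units, 34 in all, which is in line with the "many (e.g. 30) functional units" figure given in the summary.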
The details of a MASNET 26 (Memory ALU Switch NETwork) are shown in detail with sixteen inputs and sixteen outputs in Figure 6. MASNET 26 is made up of register files 72 (e.g. Weitek 1066) that are cross connected in a Benes switching network arrangement and pipelined so as to make the connection of any input to any output non-blocking. The MASNET 26 illustrated in Figure 6 is a sixteen-by-sixteen (16 x 16) circuit. The fact that each register file 72 has local memory also means that by using the MASNET 26 it is possible to reorder data as it flows through the network. This feature can be used, for example, to create two data streams from a common source in which one is delayed with respect to the other by several elements. The formation of multiple data streams from a common source is also a feature of MASNET 26. Figure 7 illustrates more explicitly how a 2 x 2 MASNET (i.e. a single register file 72) can achieve both of these simple tasks.
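The two tasks attributed to a single register file, duplicating a stream and delaying one copy by p elements, can be sketched with a small circular buffer. This is a software analogy only: the function name is hypothetical, and the default of 32 registers is taken from the Figure 6 description of a 32-register file.

```python
def shifted_streams(source, p, registers=32):
    """Yield (current, delayed) pairs where the second stream lags the first by p elements."""
    if not 0 <= p < registers:
        raise ValueError("shift must fit within the register file")
    regfile = [None] * registers
    for clock, word in enumerate(source):
        # One write port stores the incoming word on every clock.
        regfile[clock % registers] = word
        if clock >= p:
            # Two read ports source the same stream at addresses p apart.
            yield word, regfile[(clock - p) % registers]


pairs = list(shifted_streams(range(6), 2))
```

Pairing an element with one that arrived p clocks earlier is the access pattern a finite-difference stencil needs, which may be why the patent singles it out.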
MASNET 26 is used also for internode communications in that it routes data words corresponding to the nodal boundaries to bordering nodes 12 through hyperspace routers 80. This routing is achieved as the data flows through the MASNET 26 without the introduction of any additional delays. Likewise, the hyperspace router 80 of a given node 12 can inject needed boundary point values into the data stream as they are needed without the introduction of any delays. A more detailed discussion of internode communications follows.
The global topology of the multinode computer 10 is that of a hypercube. The hypercube represents a compromise between the time required for arbitrary internode communications, and the number of physical interconnections between nodes 12. Two addressing modes support internode data communications, namely: (1) global addressing and (2) explicit boundary-point definition, or BPD. Global addressing is simply extended addressing, where an address specifies the node/memory-plane/offset of the data. From a software standpoint, the address is treated as a simple linear address whose range extends across all nodes in the computer 10. Internode communications is handled by software and is entirely transparent to the programmer if default arbitration and communications-lock parameters are chosen. BPD involves the explicit definition of boundary points, their source, and all destination addresses. Whenever BPD data is generated, it is immediately routed to BPD caches 82 in the destination nodes 12 as illustrated in Figure 8. Local addressing and BPD may be intermixed. The main advantage of global addressing over BPD is software simplicity, although BPD has the capability of eliminating most internode communications overhead by pre-communicating boundary-point data before they are requested by other nodes.
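Global addressing as a node/memory-plane/offset triple packed into one linear address can be sketched as follows. The field widths are assumptions: 7 node bits and 3 plane bits match the 128-node, 8-plane configuration described in the text, while the 20-bit offset is purely hypothetical, as are the function names.

```python
NODE_BITS = 7     # 128 nodes, per the preferred configuration
PLANE_BITS = 3    # 8 memory planes per node, per Figure 2
OFFSET_BITS = 20  # hypothetical plane depth, not stated in the source


def make_global_address(node: int, plane: int, offset: int) -> int:
    """Pack node/plane/offset into a single linear address."""
    if not (0 <= node < 1 << NODE_BITS and 0 <= plane < 1 << PLANE_BITS
            and 0 <= offset < 1 << OFFSET_BITS):
        raise ValueError("field out of range")
    return (node << (PLANE_BITS + OFFSET_BITS)) | (plane << OFFSET_BITS) | offset


def split_global_address(addr: int) -> tuple[int, int, int]:
    """Recover (node, plane, offset) from a linear address."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    plane = (addr >> OFFSET_BITS) & ((1 << PLANE_BITS) - 1)
    node = addr >> (PLANE_BITS + OFFSET_BITS)
    if node >= 1 << NODE_BITS:
        raise ValueError("address beyond the last node")
    return node, plane, offset
```

With this layout, consecutive linear addresses run through one plane, then the next plane of the same node, then the next node, so software can treat the whole machine as one address range.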
Data are physically routed between nodes 12 using local switching networks attached to each node 12. The local switching networks previously referred to as hyperspace routers 80 are illustrated in Figure 8. Hyperspace routers 80 are non-blocking permutation networks with a topology similar to the Benes network. For a multinode class computer of order d (i.e., NN = 2^d, NN = number of nodes), the hyperspace router permits d+1 inputs which include d neighboring nodes 12 plus one additional input for the host node 12. The data are self-routing in that the destination address, carried with the data, is used to establish hyperspace router switch states. An eight node system is illustrated in Figure 8. In this example, d = 3, and each hyperspace router 80 has a 4 x 4 network with a delay of three minor clocks. For 3 < d < 8 where small d is an integer, an 8 x 8 router 80 is required, with d = 7 providing complete switch utilization. Since the hyperspace router 80 must be configured for ln2 d + 1 inputs, optimal hardware performance is given by a computer array having the size of
NN = 2^(2^n - 1), n = 0, 1, 2, 3
Configurations of 1, 2, 8, 128, ... nodes fully utilize the hyperspace routers 80. Multinode computer configurations with non-integer ln2 d are also supported, except the hyperspace router 80 is scaled up to the next integral dimension. The implications of this are not severe, in that aside from the penalty of additional switch hardware, a slightly greater amount of storage is required for the permutation tables. The node stores these tables in a high-speed look up table. The length of the table is (d+1)!. When the computer grows beyond 128 nodes, the hyperspace router increases to a 16 x 16 switch. Since the look-up tables become prohibitively large, the permutation routing is then accomplished by bit-slice hardware which is somewhat slower than the look-up tables. These considerations have established 128 nodes as the initial, preferred computer configuration.
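The sizing argument for the hyperspace router can be checked numerically. The sketch below assumes that the switch dimension is the smallest power of two with at least d + 1 inputs and that the permutation table length is (d + 1)!, which is how the text appears to read; the function names are illustrative.

```python
import math


def router_ports(d: int) -> int:
    """Smallest power-of-two switch dimension offering at least d + 1 inputs."""
    return 1 << d.bit_length()  # d.bit_length() == ceil(log2(d + 1)) for d >= 0


def permutation_table_length(d: int) -> int:
    """Number of permutations of the d + 1 router inputs."""
    return math.factorial(d + 1)


def fully_utilized_sizes(max_n: int) -> list[int]:
    """Node counts NN = 2**(2**n - 1) for which d + 1 is itself a power of two."""
    return [2 ** (2 ** n - 1) for n in range(max_n + 1)]
```

For d = 7 the table holds 8! = 40,320 entries, whereas a fully used 16 x 16 switch would call for 16! (roughly 2 x 10^13), which illustrates why the text falls back to bit-slice hardware beyond 128 nodes.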
Data transmission between nodes 12 occurs over fiber-optic cables in byte-serial format at a duplex rate of 1 Gbyte/second. This rate provides approximately two orders-of-magnitude head room for occasional burst transmissions and also for future computer expansion. Each node 12 has a 1 Mword boundary-point write-through cache which, in the absence of host-node requests for cache bus cycles, is continuously updated by the hyperspace router 80. Thus, current boundary data are maintained physically and logically close to the ALU pipeline inputs.
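The self-routing behavior described above for the hyperspace routers 80, in which the destination address carried with the data establishes the switch state, can be illustrated with a minimal Python sketch. The specification does not state the routing rule; the sketch assumes conventional dimension-order routing on a boolean hypercube, and `HOST_PORT`, `next_port` and `route` are hypothetical names introduced only for illustration.

```python
HOST_PORT = -1  # hypothetical label for the router port that serves the host node


def next_port(current: int, destination: int) -> int:
    """Pick the hypercube dimension to forward along, or HOST_PORT on arrival.

    Assumed rule (not taken from the specification): correct the lowest-order
    address bit in which the current node and the destination differ.
    """
    diff = current ^ destination
    if diff == 0:
        return HOST_PORT
    return (diff & -diff).bit_length() - 1  # index of the lowest set bit


def route(source: int, destination: int) -> list:
    """List the node addresses visited from source to destination."""
    path = [source]
    node = source
    while True:
        port = next_port(node, destination)
        if port == HOST_PORT:
            return path
        node ^= 1 << port  # cross the link along the chosen dimension
        path.append(node)


if __name__ == "__main__":
    # Eight-node system (d = 3): node 0 sends to node 7.
    print(route(0, 7))
```

In an eight-node system each hop changes one address bit, so no transfer needs more than three hops under this assumed rule.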
While the invention has been described with reference to the preferred embodiment thereof, it will be appreciated that various modifications can be made to the parts and methods that comprise the invention without departing from the spirit and scope thereof.

Claims (24)

1. A multi-node, parallel processing computer apparatus, comprising:
a plurality of nodes each including an internal memory and a reconfigurable arithmetic logic (ALU) pipeline unit and a memory/ALU/switch network (MASNET) for transferring data from said internal memory through said MASNET to said reconfigurable ALU pipeline unit and from said reconfigurable ALU pipeline unit through said MASNET to said internal memory, said reconfigurable ALU pipeline unit further including a first group of programmable processors permanently connected together in a first configuration having four (4) inputs and one (1) output and a second group of programmable processors permanently connected together in a second configuration different from said first configuration, said second group having three (3) inputs and one (1) output, and an ALU pipeline configuration switching network means (FLONET) for selectively connecting said first and second groups to each other, and sequencer means for providing instructions to said FLONET once a clock cycle;
and router means for routing data between said nodes, wherein said reconfigurable ALU pipeline unit selectively performs different computations according to instructions from said sequencer means once a clock cycle.
2. A reconfigurable computer apparatus comprising:
a first group of programmable processors permanently connected together in a first configuration having four (4) inputs and one (1) output, said first group including a first programmable processor having at least two (2) inputs and at least one (1) output; a second programmable processor having at least two (2) inputs and at least one (1) output; and, a third programmable processor having two (2) inputs permanently connected to the outputs of said first and second programmable processors, said third programmable processor also having an output, such that the four inputs of said first group comprise the inputs of said first and second programmable processors and the output of said first group comprises the output of said third programmable processor;
a second group of programmable processors permanently connected together in a second configuration different from said first configuration, said second group having three (3) inputs and one (1) output and including a fourth programmable processor having two (2) inputs and one (1) output; and, a fifth programmable processor having two (2) inputs and one (1) output, one of said inputs of said fifth programmable processor being permanently connected to the output of said fourth programmable processor, such that the three (3) inputs of said second group comprise the two (2) inputs to said fourth programmable processor and the input to said fifth programmable processor not connected to the output of said fourth programmable processor, and the output of said second group comprising the output of said fifth programmable processor;
a third group of programmable processors comprising individual processors having two (2) inputs and one (1) output;
switching means (FLONET) for selectively connecting said first, second and third groups together; and, sequencer means for providing instructions to said FLONET once a clock cycle, wherein said apparatus selectively performs different computations according to instructions from said sequencer means once a clock cycle.
3. A reconfigurable computer apparatus including arithmetic/logic units (ALU), said apparatus comprising:
at least a first substructure including three (3) ALU units permanently connected together in a first configuration having four (4) inputs and one (1) output;
at least a second substructure including two (2) ALU units permanently connected together in a second configuration having three (3) inputs and one (1) output;
at least a third substructure including at least one individual ALU unit having two (2) inputs and one (1) output;
switching means for selectively connecting said first, second and third substructures together; and sequencer means for providing instructions to said switching means, wherein said apparatus selectively performs computations according to instructions from said sequencer means.
4. A node apparatus for use in a multi-node, parallel processing system, said node apparatus comprising:
an internal memory including a plurality of memory planes;
a dynamically reconfigurable arithmetic logic (ALU) pipeline means for performing computations, including a plurality of ALUs at least three of which are permanently connected to each other;
an ALU pipeline configuration switching network means (FLONET) for selectively connecting groups of ALUs in said dynamically reconfigurable arithmetic logic pipeline means together;
a memory/ALU/switch network (MASNET) for transferring data from the memory planes of said internal memory through said MASNET to said dynamically reconfigurable ALU pipeline means and from said dynamically reconfigurable ALU pipeline means through said MASNET to said internal memory; and, sequencer means for providing instructions to said FLONET, wherein said dynamically reconfigurable ALU pipeline means selectively performs different computations according to instructions from said sequencer means.
5. The apparatus of claim 1 wherein said first group of programmable processors comprises:
a first programmable processor having at least two (2) inputs and at least one (1) output;
a second programmable processor having at least two (2) inputs and at least one (1) output; and a third programmable processor having two (2) inputs permanently connected to the outputs of said first and said second programmable processors, said third programmable processor also having an output, wherein the inputs to said first group comprise the inputs of said first and second programmable processors and the output of said first group comprises the output of said third programmable processor.
6. The apparatus of claim 5 wherein said second group of programmable processors comprise:
a fourth programmable processor having at least two (2) inputs and at least one (1) output; and, a fifth programmable processor having two (2) inputs and one (1) output, one of said inputs of said fifth programmable processor being permanently connected to the output of said fourth programmable processor, wherein the inputs of said second group comprise the two inputs to said fourth programmable processor and the one input to said fifth programmable processor not connected to the output of said fourth programmable processor, and the output of said second group comprises the output of said fifth programmable processor.
7. The apparatus of claim 6 wherein said reconfigurable ALU pipeline unit further comprises:
a third group of programmable processors comprising individual programmable processors connected to said FLONET for selective connection with said first and second groups of programmable processors.
8. The apparatus of claim 7 wherein the ratio of said first group of programmable processors with respect to said second group of programmable processors in a given reconfigurable ALU pipeline unit is approximately in the range of 1.5-2.0 to 1.0.
9. The apparatus of claim 8 wherein the ratio of said second group of programmable processors to said third group of programmable processors is approximately 2.0 to 1.0.
10. The apparatus of claim 9 wherein said internal memory comprises a plurality of memory planes.
11. The apparatus of claim 10 wherein each memory plane comprises:
a main memory bank;
an address multiplexer for transmitting data to and from said main memory bank;
a prefetch address register connected between said main memory bank and said address multiplexer; and a translate table means connected to said address multiplexer for scanning said assembly bank in a random access manner.
12. The apparatus of claim 11 wherein said sequencer means further comprises:
microsequencer means connected to said internal memory, MASNET and reconfigurable ALU pipeline unit for governing the clocking of data between said internal memory, MASNET and said reconfigurable ALU pipeline unit.
13. The apparatus of claim 12 wherein each node further comprises:
a microcontroller connected to said internal memory, MASNET and said reconfigurable ALU pipeline unit for initializing and verifying the status of said internal memory, MASNET and reconfigurable ALU pipeline.
14. The apparatus of claim 13 wherein said MASNET comprises:
a plurality of register files cross connected in a Benes switching network arrangement and pipelined so as to make the connection of any input to any output non-blocking.
15. The apparatus of claim 14 further comprising:
boundary-point definition (BPD) cache means connected between said router means and said MASNET for routing BPD data to specific destination nodes, wherein said apparatus supports both global addressing and explicit BPD addressing modes.
16. The apparatus of claim 15 further comprising:
a front end computer for feeding data and instructions to said nodes; and, off-line mass storage means connectable to said front end computer.
17. The apparatus of claim 16 wherein said nodes are connected together in the topology of a boolean hypercube and vary in number in the range of from 1 to 128.
18. The apparatus of claim 2 further comprising:
an internal memory; and, a memory-ALU switch network means (MASNET) for transferring data from said internal memory through said MASNET to said switching means and for transferring data from said switching means through said MASNET to said internal memory.
19. The apparatus of claim 18 wherein said sequencer means further comprises:
microsequencer means connected to said internal memory, MASNET and switching means for governing the clocking of data between said internal memory, MASNET and switching means.
20. The apparatus of claim 19 further comprising:
microcontroller means connected to said internal memory, MASNET and switching means for initializing and verifying the status of said internal memory, MASNET and switching means.
21. The apparatus of claim 2 wherein at least some of said processors comprise floating point arithmetic processors.
22. The apparatus of claim 2 wherein at least some of said processors comprise integer arithmetic logic processors.
23. The apparatus of claim 2 wherein the ratio of said first group of programmable processors with respect to said second group of programmable processors is approximately in the range of 1.5-2.0 to 1.0.
24. The apparatus of claim 2 wherein the ratio of said second group of programmable processors to said third group of programmable processors is approximately 2.0 to 1.0.
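The permanently connected groups recited in claims 2 and 3 can be pictured with a minimal Python sketch. It is a software analogy only, not the claimed hardware: the names `triplet`, `doublet` and `singlet`, the choice of arithmetic operations, and the ordering of the inputs are assumptions made for illustration, and the final line merely stands in for a connection that the switching means would establish.

```python
import operator
from typing import Callable

BinaryOp = Callable[[float, float], float]  # a two-input, one-output processor


def triplet(p1: BinaryOp, p2: BinaryOp, p3: BinaryOp) -> Callable[..., float]:
    """First group: three processors, four inputs, one output.

    The outputs of p1 and p2 are wired to the two inputs of p3.
    """
    def run(a: float, b: float, c: float, d: float) -> float:
        return p3(p1(a, b), p2(c, d))
    return run


def doublet(p4: BinaryOp, p5: BinaryOp) -> Callable[..., float]:
    """Second group: two processors, three inputs, one output.

    The output of p4 feeds one input of p5; the other input of p5 is external.
    """
    def run(a: float, b: float, c: float) -> float:
        return p5(p4(a, b), c)
    return run


# Hypothetical programming of each processor for one clock cycle.
sum_of_products = triplet(operator.mul, operator.mul, operator.add)  # a*b + c*d
scaled_sum = doublet(operator.add, operator.mul)                     # (a + b) * c
singlet: BinaryOp = operator.sub                                     # third group

if __name__ == "__main__":
    # Stand-in for a switched connection: both group outputs feed the singlet.
    print(singlet(sum_of_products(1, 2, 3, 4), scaled_sum(1, 2, 3)))
```

Reprogramming the individual operations, or feeding the groups into one another in a different order, changes the computation without altering the fixed wiring inside each group, which is the property the claims attribute to the sequencer and switching means.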
CA000551833A 1986-11-14 1987-11-13 Multinode reconfigurable pipeline computer Expired - Fee Related CA1288170C (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US931,549 1986-11-14
US06/931,549 US4811214A (en) 1986-11-14 1986-11-14 Multinode reconfigurable pipeline computer

Publications (1)

Publication Number Publication Date
CA1288170C true CA1288170C (en) 1991-08-27

Family

ID=25460953

Family Applications (1)

Application Number Title Priority Date Filing Date
CA000551833A Expired - Fee Related CA1288170C (en) 1986-11-14 1987-11-13 Multinode reconfigurable pipeline computer

Country Status (9)

Country Link
US (1) US4811214A (en)
EP (1) EP0268435B1 (en)
JP (1) JPS63147258A (en)
AU (1) AU599428B2 (en)
CA (1) CA1288170C (en)
DE (1) DE3751235T2 (en)
DK (1) DK595887A (en)
ES (1) ES2070825T3 (en)
NO (1) NO874742L (en)

Families Citing this family (176)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0576749B1 (en) * 1992-06-30 1999-06-02 Discovision Associates Data pipeline system
US5093920A (en) * 1987-06-25 1992-03-03 At&T Bell Laboratories Programmable processing elements interconnected by a communication network including field operation unit for performing field operations
US5008882A (en) * 1987-08-17 1991-04-16 California Institute Of Technology Method and apparatus for eliminating unsuccessful tries in a search tree
US5148547A (en) * 1988-04-08 1992-09-15 Thinking Machines Corporation Method and apparatus for interfacing bit-serial parallel processors to a coprocessor
US5005120A (en) * 1988-07-29 1991-04-02 Lsi Logic Corporation Compensating time delay in filtering signals of multi-dimensional reconvigurable array processors
US5452231A (en) * 1988-10-05 1995-09-19 Quickturn Design Systems, Inc. Hierarchically connected reconfigurable logic assembly
US5241635A (en) * 1988-11-18 1993-08-31 Massachusetts Institute Of Technology Tagged token data processing system with operand matching in activation frames
US5109353A (en) * 1988-12-02 1992-04-28 Quickturn Systems, Incorporated Apparatus for emulation of electronic hardware system
US5329470A (en) * 1988-12-02 1994-07-12 Quickturn Systems, Inc. Reconfigurable hardware emulation system
NL8803079A (en) * 1988-12-16 1990-07-16 Philips Nv LINKING NETWORK FOR A DATA PROCESSOR, PROVIDED WITH A SERIAL SWITCHING WITH AT LEAST A RECONFIGURABLE SWITCHING MATRIX AND AT LEAST A BATTERY OF SILOS AND A DATA PROCESSOR FITTED WITH SUCH A LINKING NETWORK.
US5280620A (en) * 1988-12-16 1994-01-18 U.S. Philips Corporation Coupling network for a data processor, including a series connection of a cross-bar switch and an array of silos
US5099421A (en) * 1988-12-30 1992-03-24 International Business Machine Corporation Variable length pipe operations sequencing
US5155820A (en) * 1989-02-21 1992-10-13 Gibson Glenn A Instruction format with designations for operand lengths of byte, half word, word, or double word encoded in address bits
US5353243A (en) * 1989-05-31 1994-10-04 Synopsys Inc. Hardware modeling system and method of use
US5369593A (en) 1989-05-31 1994-11-29 Synopsys Inc. System for and method of connecting a hardware modeling element to a hardware modeling system
US5345578A (en) * 1989-06-30 1994-09-06 Digital Equipment Corporation Competitive snoopy caching for large-scale multiprocessors
EP0420339A3 (en) * 1989-09-29 1992-06-03 N.V. Philips' Gloeilampenfabrieken Multi-plane random access memory system
US5450557A (en) * 1989-11-07 1995-09-12 Loral Aerospace Corp. Single-chip self-configurable parallel processor
WO1991010198A1 (en) * 1990-01-05 1991-07-11 Maspar Computer Corporation Router chip with quad-crossbar and hyperbar personalities
US5193202A (en) * 1990-05-29 1993-03-09 Wavetracer, Inc. Processor array with relocated operand physical address generator capable of data transfer to distant physical processor for each virtual processor while simulating dimensionally larger array processor
US5133073A (en) * 1990-05-29 1992-07-21 Wavetracer, Inc. Processor array of N-dimensions which is physically reconfigurable into N-1
US5157785A (en) * 1990-05-29 1992-10-20 Wavetracer, Inc. Process cell for an n-dimensional processor array having a single input element with 2n data inputs, memory, and full function arithmetic logic unit
US5313645A (en) * 1991-05-13 1994-05-17 International Business Machines Corporation Method for interconnecting and system of interconnected processing elements by controlling network density
GB9027186D0 (en) * 1990-12-14 1991-02-06 Int Computers Ltd Data processing network
GB9027663D0 (en) * 1990-12-20 1991-02-13 Sandoz Ltd Light-stabilizing compositions
JPH06507990A (en) * 1991-05-24 1994-09-08 ブリティッシュ・テクノロジー・グループ・ユーエスエイ・インコーポレーテッド Optimizing compiler for computers
CN1042677C (en) * 1991-10-07 1999-03-24 中国人民解放军国防大学 Parallel computer structure and its usage
GB2263565B (en) * 1992-01-23 1995-08-30 Intel Corp Microprocessor with apparatus for parallel execution of instructions
US5809270A (en) * 1992-06-30 1998-09-15 Discovision Associates Inverse quantizer
US7095783B1 (en) 1992-06-30 2006-08-22 Discovision Associates Multistandard video decoder and decompression system for processing encoded bit streams including start codes and methods relating thereto
US6435737B1 (en) * 1992-06-30 2002-08-20 Discovision Associates Data pipeline system and data encoding method
US6330665B1 (en) 1992-06-30 2001-12-11 Discovision Associates Video parser
US6079009A (en) * 1992-06-30 2000-06-20 Discovision Associates Coding standard token in a system compromising a plurality of pipeline stages
US5603012A (en) * 1992-06-30 1997-02-11 Discovision Associates Start code detector
US6112017A (en) * 1992-06-30 2000-08-29 Discovision Associates Pipeline processing machine having a plurality of reconfigurable processing stages interconnected by a two-wire interface bus
US6067417A (en) * 1992-06-30 2000-05-23 Discovision Associates Picture start token
US6047112A (en) * 1992-06-30 2000-04-04 Discovision Associates Technique for initiating processing of a data stream of encoded video information
US5768561A (en) * 1992-06-30 1998-06-16 Discovision Associates Tokens-based adaptive video processing arrangement
US5315701A (en) * 1992-08-07 1994-05-24 International Business Machines Corporation Method and system for processing graphics data streams utilizing scalable processing nodes
JPH08506198A (en) * 1993-01-22 1996-07-02 ユニバーシティ コーポレイション フォーアトモスフェリック リサーチ Multi-pipeline multi-processor system
JPH06325005A (en) * 1993-05-14 1994-11-25 Fujitsu Ltd Reconstructible torus network system
JPH0713945A (en) * 1993-06-16 1995-01-17 Nippon Sheet Glass Co Ltd Bus structure of multiprocessor system with separated arithmetic processing part and control/storage part
US5805914A (en) * 1993-06-24 1998-09-08 Discovision Associates Data pipeline system and data encoding method
US5861894A (en) * 1993-06-24 1999-01-19 Discovision Associates Buffer manager
US5680583A (en) * 1994-02-16 1997-10-21 Arkos Design, Inc. Method and apparatus for a trace buffer in an emulation system
JP3308770B2 (en) * 1994-07-22 2002-07-29 三菱電機株式会社 Information processing apparatus and calculation method in information processing apparatus
US6217234B1 (en) * 1994-07-29 2001-04-17 Discovision Associates Apparatus and method for processing data with an arithmetic unit
US5699536A (en) * 1995-04-13 1997-12-16 International Business Machines Corporation Computer processing system employing dynamic instruction formatting
US5794062A (en) * 1995-04-17 1998-08-11 Ricoh Company Ltd. System and method for dynamically reconfigurable computing using a processing unit having changeable internal hardware organization
US5943242A (en) * 1995-11-17 1999-08-24 Pact Gmbh Dynamically reconfigurable data processing system
US7266725B2 (en) 2001-09-03 2007-09-04 Pact Xpp Technologies Ag Method for debugging reconfigurable architectures
US5841967A (en) * 1996-10-17 1998-11-24 Quickturn Design Systems, Inc. Method and apparatus for design verification using emulation and simulation
DE19651075A1 (en) 1996-12-09 1998-06-10 Pact Inf Tech Gmbh Unit for processing numerical and logical operations, for use in processors (CPU's), multi-computer systems, data flow processors (DFP's), digital signal processors (DSP's) or the like
US6338106B1 (en) 1996-12-20 2002-01-08 Pact Gmbh I/O and memory bus system for DFPS and units with two or multi-dimensional programmable cell architectures
DE19654595A1 (en) * 1996-12-20 1998-07-02 Pact Inf Tech Gmbh I0 and memory bus system for DFPs as well as building blocks with two- or multi-dimensional programmable cell structures
DE19654593A1 (en) * 1996-12-20 1998-07-02 Pact Inf Tech Gmbh Reconfiguration procedure for programmable blocks at runtime
DE19654846A1 (en) * 1996-12-27 1998-07-09 Pact Inf Tech Gmbh Process for the independent dynamic reloading of data flow processors (DFPs) as well as modules with two- or multi-dimensional programmable cell structures (FPGAs, DPGAs, etc.)
EP1329816B1 (en) * 1996-12-27 2011-06-22 Richter, Thomas Method for automatic dynamic unloading of data flow processors (dfp) as well as modules with bidimensional or multidimensional programmable cell structures (fpgas, dpgas or the like)
DE19704044A1 (en) * 1997-02-04 1998-08-13 Pact Inf Tech Gmbh Address generation with systems having programmable modules
DE19704728A1 (en) * 1997-02-08 1998-08-13 Pact Inf Tech Gmbh Method for self-synchronization of configurable elements of a programmable module
US6542998B1 (en) 1997-02-08 2003-04-01 Pact Gmbh Method of self-synchronization of configurable elements of a programmable module
DE19704742A1 (en) 1997-02-11 1998-09-24 Pact Inf Tech Gmbh Internal bus system for DFPs, as well as modules with two- or multi-dimensional programmable cell structures, for coping with large amounts of data with high networking effort
US6134516A (en) * 1997-05-02 2000-10-17 Axis Systems, Inc. Simulation server system and method
US6389379B1 (en) 1997-05-02 2002-05-14 Axis Systems, Inc. Converification system and method
US6421251B1 (en) 1997-05-02 2002-07-16 Axis Systems Inc Array board interconnect system and method
US6026230A (en) * 1997-05-02 2000-02-15 Axis Systems, Inc. Memory simulation system and method
US6009256A (en) * 1997-05-02 1999-12-28 Axis Systems, Inc. Simulation/emulation system and method
US6321366B1 (en) 1997-05-02 2001-11-20 Axis Systems, Inc. Timing-insensitive glitch-free logic system and method
US5960191A (en) 1997-05-30 1999-09-28 Quickturn Design Systems, Inc. Emulation system with time-multiplexed interconnect
US5970240A (en) * 1997-06-25 1999-10-19 Quickturn Design Systems, Inc. Method and apparatus for configurable memory emulation
US8686549B2 (en) 2001-09-03 2014-04-01 Martin Vorbach Reconfigurable elements
US6101181A (en) * 1997-11-17 2000-08-08 Cray Research Inc. Virtual channel assignment in large torus systems
US5970232A (en) * 1997-11-17 1999-10-19 Cray Research, Inc. Router table lookup mechanism
US6085303A (en) * 1997-11-17 2000-07-04 Cray Research, Inc. Seralized race-free virtual barrier network
US6230252B1 (en) * 1997-11-17 2001-05-08 Silicon Graphics, Inc. Hybrid hypercube/torus architecture
DE19861088A1 (en) 1997-12-22 2000-02-10 Pact Inf Tech Gmbh Repairing integrated circuits by replacing subassemblies with substitutes
DE19807872A1 (en) 1998-02-25 1999-08-26 Pact Inf Tech Gmbh Method of managing configuration data in data flow processors
US6205537B1 (en) * 1998-07-16 2001-03-20 University Of Rochester Mechanism for dynamically adapting the complexity of a microprocessor
US6216174B1 (en) 1998-09-29 2001-04-10 Silicon Graphics, Inc. System and method for fast barrier synchronization
JP2003505753A (en) 1999-06-10 2003-02-12 ペーアーツェーテー インフォルマツィオーンステヒノロギー ゲゼルシャフト ミット ベシュレンクテル ハフツング Sequence division method in cell structure
US6674720B1 (en) 1999-09-29 2004-01-06 Silicon Graphics, Inc. Age-based network arbitration system and method
US6751698B1 (en) 1999-09-29 2004-06-15 Silicon Graphics, Inc. Multiprocessor node controller circuit and method
DE50115584D1 (en) * 2000-06-13 2010-09-16 Krass Maren PIPELINE CT PROTOCOLS AND COMMUNICATION
ATE437476T1 (en) 2000-10-06 2009-08-15 Pact Xpp Technologies Ag CELL ARRANGEMENT WITH SEGMENTED INTERCELL STRUCTURE
US8058899B2 (en) 2000-10-06 2011-11-15 Martin Vorbach Logic cell array and bus system
US20040015899A1 (en) * 2000-10-06 2004-01-22 Frank May Method for processing data
US6990555B2 (en) * 2001-01-09 2006-01-24 Pact Xpp Technologies Ag Method of hierarchical caching of configuration data having dataflow processors and modules having two- or multidimensional programmable cell structure (FPGAs, DPGAs, etc.)
US7210129B2 (en) 2001-08-16 2007-04-24 Pact Xpp Technologies Ag Method for translating programs for reconfigurable architectures
US9037807B2 (en) * 2001-03-05 2015-05-19 Pact Xpp Technologies Ag Processor arrangement on a chip including data processing, memory, and interface elements
US7444531B2 (en) * 2001-03-05 2008-10-28 Pact Xpp Technologies Ag Methods and devices for treating and processing data
US7844796B2 (en) * 2001-03-05 2010-11-30 Martin Vorbach Data processing device and method
US7581076B2 (en) 2001-03-05 2009-08-25 Pact Xpp Technologies Ag Methods and devices for treating and/or processing data
US20090300262A1 (en) * 2001-03-05 2009-12-03 Martin Vorbach Methods and devices for treating and/or processing data
US20090210653A1 (en) * 2001-03-05 2009-08-20 Pact Xpp Technologies Ag Method and device for treating and processing data
US7400668B2 (en) * 2001-03-22 2008-07-15 Qst Holdings, Llc Method and system for implementing a system acquisition function for use with a communication device
US7249242B2 (en) 2002-10-28 2007-07-24 Nvidia Corporation Input pipeline registers for a node in an adaptive computing engine
US7489779B2 (en) 2001-03-22 2009-02-10 Qstholdings, Llc Hardware implementation of the secure hash standard
US8843928B2 (en) 2010-01-21 2014-09-23 Qst Holdings, Llc Method and apparatus for a general-purpose, multiple-core system for implementing stream-based computations
US6836839B2 (en) 2001-03-22 2004-12-28 Quicksilver Technology, Inc. Adaptive integrated circuitry with heterogeneous and reconfigurable matrices of diverse and adaptive computational units having fixed, application specific computational elements
US7752419B1 (en) 2001-03-22 2010-07-06 Qst Holdings, Llc Method and system for managing hardware resources to implement system functions using an adaptive computing architecture
US7653710B2 (en) 2002-06-25 2010-01-26 Qst Holdings, Llc. Hardware task manager
US7962716B2 (en) * 2001-03-22 2011-06-14 Qst Holdings, Inc. Adaptive integrated circuitry with heterogeneous and reconfigurable matrices of diverse and adaptive computational units having fixed, application specific computational elements
US6577678B2 (en) 2001-05-08 2003-06-10 Quicksilver Technology Method and system for reconfigurable channel coding
AU2002347560A1 (en) * 2001-06-20 2003-01-02 Pact Xpp Technologies Ag Data processing method
US7996827B2 (en) 2001-08-16 2011-08-09 Martin Vorbach Method for the translation of programs for reconfigurable architectures
US7434191B2 (en) * 2001-09-03 2008-10-07 Pact Xpp Technologies Ag Router
US8686475B2 (en) 2001-09-19 2014-04-01 Pact Xpp Technologies Ag Reconfigurable elements
US7376811B2 (en) * 2001-11-06 2008-05-20 Netxen, Inc. Method and apparatus for performing computations and operations on data using data steering
US7046635B2 (en) 2001-11-28 2006-05-16 Quicksilver Technology, Inc. System for authorizing functionality in adaptable hardware devices
US6986021B2 (en) 2001-11-30 2006-01-10 Quick Silver Technology, Inc. Apparatus, method, system and executable module for configuration and operation of adaptive integrated circuitry having fixed, application specific computational elements
US8412915B2 (en) * 2001-11-30 2013-04-02 Altera Corporation Apparatus, system and method for configuration of adaptive integrated circuitry having heterogeneous computational elements
US7602740B2 (en) * 2001-12-10 2009-10-13 Qst Holdings, Inc. System for adapting device standards after manufacture
US7215701B2 (en) 2001-12-12 2007-05-08 Sharad Sambhwani Low I/O bandwidth method and system for implementing detection and identification of scrambling codes
US7088825B2 (en) * 2001-12-12 2006-08-08 Quicksilver Technology, Inc. Low I/O bandwidth method and system for implementing detection and identification of scrambling codes
US7231508B2 (en) * 2001-12-13 2007-06-12 Quicksilver Technologies Configurable finite state machine for operation of microinstruction providing execution enable control value
US7577822B2 (en) * 2001-12-14 2009-08-18 Pact Xpp Technologies Ag Parallel task operation in processor and reconfigurable coprocessor configured based on information in link list including termination information for synchronization
US7403981B2 (en) * 2002-01-04 2008-07-22 Quicksilver Technology, Inc. Apparatus and method for adaptive multimedia reception and transmission in communication environments
AU2003214046A1 (en) * 2002-01-18 2003-09-09 Pact Xpp Technologies Ag Method and device for partitioning large computer programs
WO2003060747A2 (en) * 2002-01-19 2003-07-24 Pact Xpp Technologies Ag Reconfigurable processor
AU2003214003A1 (en) 2002-02-18 2003-09-09 Pact Xpp Technologies Ag Bus systems and method for reconfiguration
WO2003081454A2 (en) * 2002-03-21 2003-10-02 Pact Xpp Technologies Ag Method and device for data processing
US8914590B2 (en) 2002-08-07 2014-12-16 Pact Xpp Technologies Ag Data processing method and device
US7493375B2 (en) 2002-04-29 2009-02-17 Qst Holding, Llc Storage and delivery of device features
US7328414B1 (en) * 2003-05-13 2008-02-05 Qst Holdings, Llc Method and system for creating and programming an adaptive computing engine
US7660984B1 (en) 2003-05-13 2010-02-09 Quicksilver Technology Method and system for achieving individualized protected space in an operating system
US7471643B2 (en) * 2002-07-01 2008-12-30 Panasonic Corporation Loosely-biased heterogeneous reconfigurable arrays
US7461234B2 (en) * 2002-07-01 2008-12-02 Panasonic Corporation Loosely-biased heterogeneous reconfigurable arrays
AU2003252157A1 (en) * 2002-07-23 2004-02-09 Gatechange Technologies, Inc. Interconnect structure for electrical devices
AU2003256699A1 (en) * 2002-07-23 2004-02-09 Gatechange Technologies, Inc. Self-configuring processing element
US20040019765A1 (en) * 2002-07-23 2004-01-29 Klein Robert C. Pipelined reconfigurable dynamic instruction set processor
US7657861B2 (en) * 2002-08-07 2010-02-02 Pact Xpp Technologies Ag Method and device for processing data
WO2005010632A2 (en) * 2003-06-17 2005-02-03 Pact Xpp Technologies Ag Data processing device and method
AU2003286131A1 (en) * 2002-08-07 2004-03-19 Pact Xpp Technologies Ag Method and device for processing data
US8108656B2 (en) 2002-08-29 2012-01-31 Qst Holdings, Llc Task definition for specifying resource requirements
EP1537486A1 (en) 2002-09-06 2005-06-08 PACT XPP Technologies AG Reconfigurable sequencer structure
WO2004027648A1 (en) 2002-09-18 2004-04-01 Netezza Corporation Intelligent storage device controller
US7937591B1 (en) 2002-10-25 2011-05-03 Qst Holdings, Llc Method and system for providing a device which can be adapted on an ongoing basis
AU2003287317B2 (en) * 2002-10-31 2010-03-11 Lockheed Martin Corporation Pipeline accelerator having multiple pipeline units and related computing machine and method
US7478031B2 (en) 2002-11-07 2009-01-13 Qst Holdings, Llc Method, system and program for developing and scheduling adaptive integrated circuity and corresponding control or configuration information
US8276135B2 (en) 2002-11-07 2012-09-25 Qst Holdings Llc Profiling of software and circuit designs utilizing data operation analyses
US7225301B2 (en) 2002-11-22 2007-05-29 Quicksilver Technologies External memory controller node
DE112004000026D2 (en) * 2003-04-04 2006-06-14 Pact Xpp Technologies Ag Method and device for data processing
JP2006526227A (en) * 2003-05-23 2006-11-16 ワシントン ユニヴァーシティー Intelligent data storage and processing using FPGA devices
US7609297B2 (en) * 2003-06-25 2009-10-27 Qst Holdings, Inc. Configurable hardware based digital imaging apparatus
WO2006082091A2 (en) * 2005-02-07 2006-08-10 Pact Xpp Technologies Ag Low latency massive parallel data processing device
US7379424B1 (en) 2003-08-18 2008-05-27 Cray Inc. Systems and methods for routing packets in multiprocessor computer systems
US7200837B2 (en) * 2003-08-21 2007-04-03 Qst Holdings, Llc System, method and software for static and dynamic programming and configuration of an adaptive computing architecture
EP1676208A2 (en) * 2003-08-28 2006-07-05 PACT XPP Technologies AG Data processing device and method
US7343362B1 (en) * 2003-10-07 2008-03-11 United States Of America As Represented By The Secretary Of The Army Low complexity classification from a single unattended ground sensor node
US8782654B2 (en) 2004-03-13 2014-07-15 Adaptive Computing Enterprises, Inc. Co-allocating a reservation spanning different compute resources types
US20070266388A1 (en) 2004-06-18 2007-11-15 Cluster Resources, Inc. System and method for providing advanced reservations in a compute environment
US8176490B1 (en) 2004-08-20 2012-05-08 Adaptive Computing Enterprises, Inc. System and method of interfacing a workload manager and scheduler with an identity manager
CA2586763C (en) 2004-11-08 2013-12-17 Cluster Resources, Inc. System and method of providing system jobs within a compute environment
EP1859378A2 (en) 2005-03-03 2007-11-28 Washington University Method and apparatus for performing biosequence similarity searching
US9075657B2 (en) 2005-04-07 2015-07-07 Adaptive Computing Enterprises, Inc. On-demand access to compute resources
US8863143B2 (en) 2006-03-16 2014-10-14 Adaptive Computing Enterprises, Inc. System and method for managing a hybrid compute environment
WO2006112980A2 (en) 2005-03-16 2006-10-26 Cluster Resources, Inc. Reserving resources in an on-demand compute environment from a local compute environment
US9231886B2 (en) 2005-03-16 2016-01-05 Adaptive Computing Enterprises, Inc. Simple integration of an on-demand compute environment
US9015324B2 (en) 2005-03-16 2015-04-21 Adaptive Computing Enterprises, Inc. System and method of brokering cloud computing resources
US8782120B2 (en) 2005-04-07 2014-07-15 Adaptive Computing Enterprises, Inc. Elastic management of compute resources between a web server and an on-demand compute environment
US7281942B2 (en) * 2005-11-18 2007-10-16 Ideal Industries, Inc. Releasable wire connector
EP1974265A1 (en) 2006-01-18 2008-10-01 PACT XPP Technologies AG Hardware definition method
US7660793B2 (en) 2006-11-13 2010-02-09 Exegy Incorporated Method and system for high performance integration, processing and searching of structured and unstructured data using coprocessors
US8041773B2 (en) 2007-09-24 2011-10-18 The Research Foundation Of State University Of New York Automatic clustering for self-organizing grids
US20100272811A1 (en) * 2008-07-23 2010-10-28 Alkermes, Inc. Complex of trospium and pharmaceutical compositions thereof
US11720290B2 (en) 2009-10-30 2023-08-08 Iii Holdings 2, Llc Memcached server functionality in a cluster of data processing nodes
US10877695B2 (en) 2009-10-30 2020-12-29 Iii Holdings 2, Llc Memcached server functionality in a cluster of data processing nodes
CN102122275A (en) * 2010-01-08 2011-07-13 Shanghai Xinhao Microelectronics Co., Ltd. Configurable processor
US8589867B2 (en) 2010-06-18 2013-11-19 Microsoft Corporation Compiler-generated invocation stubs for data parallel programming model
US20110314256A1 (en) * 2010-06-18 2011-12-22 Microsoft Corporation Data Parallel Programming Model
JP6045505B2 (en) 2010-12-09 2016-12-14 IP Reservoir, LLC Method and apparatus for managing orders in a financial market
US10121196B2 (en) 2012-03-27 2018-11-06 Ip Reservoir, Llc Offload processing of data packets containing financial market data
US11436672B2 (en) 2012-03-27 2022-09-06 Exegy Incorporated Intelligent switch for processing financial market data
WO2015035320A1 (en) * 2013-09-06 2015-03-12 Huawei Technologies Co., Ltd. System and method for an asynchronous processor with a hierarchical token system
GB2535547B (en) * 2015-04-21 2017-01-11 Adaptive Array Systems Ltd Data processor
US11074213B2 (en) * 2019-06-29 2021-07-27 Intel Corporation Apparatuses, methods, and systems for vector processor architecture having an array of identical circuit blocks

Family Cites Families (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3787673A (en) * 1972-04-28 1974-01-22 Texas Instruments Inc Pipelined high speed arithmetic unit
US3875391A (en) * 1973-11-02 1975-04-01 Raytheon Co Pipeline signal processor
US3978452A (en) * 1974-02-28 1976-08-31 Burroughs Corporation System and method for concurrent and pipeline processing employing a data driven network
CH587180A5 (en) * 1974-09-26 1977-04-29 Knotex Maschinenbau Gmbh
US4051551A (en) * 1976-05-03 1977-09-27 Burroughs Corporation Multidimensional parallel access computer memory system
US4174514A (en) * 1976-11-15 1979-11-13 Environmental Research Institute Of Michigan Parallel partitioned serial neighborhood processors
US4101960A (en) * 1977-03-29 1978-07-18 Burroughs Corporation Scientific processor
US4161036A (en) * 1977-11-08 1979-07-10 United States Of America, Director National Security Agency Method and apparatus for random and sequential accessing in dynamic memories
US4228497A (en) * 1977-11-17 1980-10-14 Burroughs Corporation Template micromemory structure for a pipelined microprogrammable data processing system
US4363094A (en) * 1977-12-29 1982-12-07 M/A-COM DDC, Inc. Communications processor
US4244019A (en) * 1978-06-29 1981-01-06 Amdahl Corporation Data processing system including a program-executing secondary system controlling a program-executing primary system
JPS6024985B2 (en) * 1978-08-31 1985-06-15 Fujitsu Ltd Data processing method
US4225920A (en) * 1978-09-11 1980-09-30 Burroughs Corporation Operator independent template control architecture
US4247892A (en) * 1978-10-12 1981-01-27 Lawrence Patrick N Arrays of machines such as computers
US4307447A (en) * 1979-06-19 1981-12-22 Gould Inc. Programmable controller
CA1174370A (en) * 1980-05-19 1984-09-11 Hidekazu Matsumoto Data processing unit with pipelined operands
US4482953A (en) * 1980-05-30 1984-11-13 Fairchild Camera & Instrument Corporation Computer with console addressable PLA storing control microcode and microinstructions for self-test of internal registers and ALU
IT1131598B (en) * 1980-07-16 1986-06-25 Telettra Lab Telefon CAVITY FOR MICROWAVES STABLE IN TEMPERATURE
US4467409A (en) * 1980-08-05 1984-08-21 Burroughs Corporation Flexible computer architecture using arrays of standardized microprocessors customized for pipeline and parallel operations
JPS57155666A (en) * 1981-03-20 1982-09-25 Fujitsu Ltd Instruction controlling system of vector processor
US4442498A (en) * 1981-04-23 1984-04-10 Josh Rosen Arithmetic unit for use in data processing systems
US4438494A (en) * 1981-08-25 1984-03-20 Intel Corporation Apparatus of fault-handling in a multiprocessing system
JPS5883257A (en) * 1981-11-13 1983-05-19 Noritoshi Nakabachi Ultrasonic microscope
US4498134A (en) * 1982-01-26 1985-02-05 Hughes Aircraft Company Segregator functional plane for use in a modular array processor
US4612628A (en) * 1983-02-14 1986-09-16 Data General Corp. Floating-point unit constructed of identical modules
US4594655A (en) * 1983-03-14 1986-06-10 International Business Machines Corporation (k)-Instructions-at-a-time pipelined processor for parallel execution of inherently sequential instructions
US4589067A (en) * 1983-05-27 1986-05-13 Analogic Corporation Full floating point vector processor with dynamically configurable multifunction pipelined ALU
US4621339A (en) * 1983-06-13 1986-11-04 Duke University SIMD machine using cube connected cycles network architecture for vector processing
JPS6057467A (en) * 1983-09-09 1985-04-03 Nec Corp Vector data processor
JPH0642237B2 (en) * 1983-12-28 1994-06-01 Hitachi, Ltd. Parallel processor
JPS6113379A (en) * 1984-06-28 1986-01-21 Fujitsu Ltd Image processor
US4761755A (en) * 1984-07-11 1988-08-02 Prime Computer, Inc. Data processing system and method having an improved arithmetic unit
JPH07113884B2 (en) * 1985-12-28 1995-12-06 Toshiba Corporation Logic circuit

Also Published As

Publication number Publication date
EP0268435A3 (en) 1990-12-05
DK595887A (en) 1988-05-15
AU599428B2 (en) 1990-07-19
EP0268435B1 (en) 1995-04-12
ES2070825T3 (en) 1995-06-16
NO874742L (en) 1988-05-16
AU7982287A (en) 1988-05-19
DE3751235D1 (en) 1995-05-18
DK595887D0 (en) 1987-11-13
NO874742D0 (en) 1987-11-13
JPS63147258A (en) 1988-06-20
DE3751235T2 (en) 1995-08-24
EP0268435A2 (en) 1988-05-25
US4811214A (en) 1989-03-07

Similar Documents

Publication Publication Date Title
CA1288170C (en) Multinode reconfigurable pipeline computer
US6339819B1 (en) Multiprocessor with each processor element accessing operands in loaded input buffer and forwarding results to FIFO output buffer
EP0726532B1 (en) Array processor communication architecture with broadcast instructions
US4617625A (en) Vector processor
US5175863A (en) Signal data processing system having independently, simultaneously operable alu and macu
US5410727A (en) Input/output system for a massively parallel, single instruction, multiple data (SIMD) computer providing for the simultaneous transfer of data between a host computer input/output system and all SIMD memory devices
US6167502A (en) Method and apparatus for manifold array processing
US6510510B1 (en) Digital signal processor having distributed register file
US5689722A (en) Multipipeline multiprocessor system
AU2001245761A1 (en) Enhanced memory algorithmic processor architecture for multiprocessor computer systems
US5175862A (en) Method and apparatus for a special purpose arithmetic boolean unit
US5243699A (en) Input/output system for parallel processing arrays
US6446190B1 (en) Register file indexing methods and apparatus for providing indirect control of register addressing in a VLIW processor
US4837676A (en) MIMD instruction flow computer architecture
JPH0228864A (en) Multiprocessor subsystem
EP0532700A4 (en) Multi-dimensional processor system and processor array with massively parallel input/output
Lin et al. Reconfigurable buses with shift switching: Concepts and applications
JPH0228721A (en) Processing system
US20030221086A1 (en) Configurable stream processor apparatus and methods
Gottlieb et al. Clustered programmable-reconfigurable processors
Nosenchuck et al. Multinode reconfigurable pipeline computer
USRE41012E1 (en) Register file indexing methods and apparatus for providing indirect control of register addressing in a VLIW processor
Schwartz et al. The optimal synchronous cyclo-static array: a multiprocessor supercomputer for digital signal processing
Nakkar et al. Dynamically programmable cache
Mzoughi et al. Very high speed vectorial processors using serial multiport memory as data memory

Legal Events

Date Code Title Description
MKLA Lapsed