US20010052106A1 - Method for determining an optimized memory organization of a digital device - Google Patents

Method for determining an optimized memory organization of a digital device

Info

Publication number
US20010052106A1
Authority
US
United States
Prior art keywords
cycle budget
block
scheduling
blocks
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US09/823,409
Other versions
US6449747B2 (en)
Inventor
Sven Wuytack
Francky Catthoor
Hugo De Man
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to US09/823,409
Publication of US20010052106A1
Application granted
Publication of US6449747B2
Anticipated expiration
Expired - Fee Related

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N19/423 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation characterised by memory arrangements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/30 Circuit design
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/43 Checking; Contextual analysis
    • G06F8/433 Dependency analysis; Data or control flow analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/44 Encoding
    • G06F8/443 Optimisation
    • G06F8/4441 Reducing the execution time required by the program code
    • G06F8/4442 Reducing the number of cache misses; Data prefetching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2119/00 Details relating to the type or aim of the analysis or the optimisation
    • G06F2119/06 Power analysis or power optimisation
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00 Data processing: database and file management or data structures
    • Y10S707/99941 Database schema or data structure
    • Y10S707/99943 Generating database or data structure, e.g. via user interface

Definitions

  • The invention relates to methods for designing essentially digital devices.
  • An essentially digital device comprises at least a memory organization (a number of memories with their sizes and an interconnection pattern) and registers. Such a memory organization is determined during the design process of said digital device.
  • The operation of an essentially digital system can essentially be described as a set of data access operations or instructions on data structures or variables that are stored in said memories.
  • Techniques for register allocation starting from a fully scheduled flow graph (thus using ordered data access operations or instructions as input) have been presented. Said allocation techniques are scalar oriented. Many of these techniques construct a scalar conflict or compatibility graph and solve the problem using graph coloring or clique partitioning. This conflict graph is fully determined by the schedule, which is fixed beforehand. This means that no effort is spent to come up with an optimal conflict graph, and thus the potential optimization by reconsidering the schedule is not exploited. Moreover, only register allocation is addressed and not memories.
  • the Improved Force Directed Scheduling [W. Verhaegh, P. Lippens, E. Aarts, J. Korst, J. van Meerbergen, A. van der Werf, Improved force-directed scheduling in high-throughput digital signal processing, IEEE Transactions on CAD and Systems, Vol.14, No.8, Aug. 1995.] shows a method wherein scheduling intervals are gradually reduced until the desired result is obtained.
  • the cost function used to determine which scheduling interval has to be reduced at each iteration only takes the number of parallel data accesses to reduce the required memory bandwidth into account.
  • (I)FDS does not take into account which data is being accessed. Balancing the number of simultaneous data accesses is a local optimization which can be very bad globally. In IFDS all data is treated equally, although in practice some simultaneous data accesses are more expensive in terms of memory cost than others. Also, the required number of memories cannot be estimated accurately by looking only locally, as is done in IFDS, because all conflicts have to be considered for this.
  • One aspect of the invention includes a method of determining optimized scheduling intervals and optimized access conflicts useful for determining an optimized memory organization of an essentially digital device, the method comprising: determining an initial scheduling of the data access instructions for a plurality of disjunct blocks, wherein each of the blocks include part of the data access instructions, and wherein at least one of the blocks is executed a plurality of times that is defined by an iteration count, deriving from the initial scheduling an initial block cycle budget for each block, while the overall cycle budget for performance of the digital device is larger than a predetermined overall cycle budget, repeating the method comprising: (a) for substantially all of the blocks, performing the method comprising: temporarily reducing a block cycle budget for a selected block by a predetermined amount; determining optimized scheduling intervals of the data access instructions such that the performance of the digital device is guaranteed to be within the block cycle budgets, wherein determining the optimized scheduling intervals comprises optimizing access conflicts with respect to an evaluation criterion related to the memory cost of the digital device; computing the overall cycle
  • Another aspect of the invention includes a method of determining a cost-cycle budget curve for an essentially digital device that is represented by a digital representation describing the functionality of the digital device, the representation comprising data access instructions on basic groups of scalar signals, the data access instructions having scheduling intervals, the representation comprising a plurality of disjunct blocks, each block including part of the data access instructions, at least one block being executed at least a plurality of times that is indicated by an iteration count, the method comprising: generating a cost-cycle budget curve that compares the cost of a memory organization of the digital device versus the cycle budget, wherein the cost-cycle budget curve is incrementally generated.
  • Another aspect of the invention includes a method of determining an optimized memory organization of an essentially digital device represented by a representation describing the functionality of the digital device, the representation comprising data access instructions on basic groups of scalar signals, the data access instructions having scheduling intervals, the representation comprising a plurality of disjunct blocks, each block including part of the data access instructions, at least one block being executed at least a plurality of times, the optimized memory organization being such that execution of the functionality with the digital device is guaranteed to be within a predetermined overall cycle budget, the method comprising: determining block cycle budgets while optimizing scheduling intervals.
  • Yet another aspect of the invention includes a method of determining an optimized memory organization of an essentially digital device represented by a representation describing the functionality of the digital device, the representation comprising data access instructions on basic groups of scalar signals, the data access instructions having scheduling intervals, the representation comprising a plurality of disjunct blocks, each block including part of the data access instructions, at least one block being executed at least a plurality of times, the optimized memory organization being such that the performance of the digital device is guaranteed to be within a predetermined overall cycle budget, the method comprising: (a) determining initial block cycle budgets such that no access conflict exists between the data access instructions within each block; (b) temporarily reducing, in an iterative process, a block cycle budget by a predetermined amount, and computing the access conflict cost and overall cycle budget reduction of such reduced block cycle budget; (c) reducing the block cycle budget for at least one selected block; and (d) returning to act (b) if the overall cycle budget for execution of the functionality with the digital device is larger than the predetermined overall
  • Yet another aspect of the invention includes a method of optimizing the scheduling of data instructions, the method comprising: determining a scheduling of data instructions for a plurality of blocks; determining a cycle budget for substantially all of the blocks, wherein the cycle budget is determined based at least in part upon the determined initial scheduling of a block; identifying one of the blocks for a cycle budget reduction; reducing the cycle budget of the identified block; modifying the scheduling of at least one of the blocks; calculating a cumulative cycle budget for the blocks; and repeating the identifying, reducing and modifying, wherein the modifying includes modifying a modified scheduling until the cumulative cycle budget for substantially all of the blocks satisfies a predetermined cycle budget.
  • Yet another aspect of the invention includes a system for optimizing the scheduling of data instructions, the system comprising: means for determining a scheduling of data instructions for a plurality of blocks; means for determining a cycle budget for substantially all of the blocks, wherein the cycle budget is determined based at least in part upon the determined initial scheduling of a block; means for identifying one of the blocks for a cycle budget reduction; means for reducing the cycle budget of the identified block; and means for modifying the scheduling and cycle budget of at least one of the blocks until a cumulative cycle budget for substantially all of the blocks satisfies a predetermined cycle budget.
  • Yet another aspect of the invention includes a digital device having an optimized memory organization, wherein the design of the memory organization is generated by the method comprising: determining an initial scheduling of data instructions for a plurality of blocks, wherein the data instructions are to be executed by the digital device; determining a cycle budget for substantially each of the blocks, wherein the cycle budget is determined based at least in part upon the determined initial scheduling of the block; identifying one of the blocks for a cycle budget reduction; reducing the cycle budget of the identified block; modifying the scheduling of at least one of the blocks; repeating the identifying, reducing and modifying acts, wherein the modifying act includes modifying a modified scheduling, until a cumulative cycle budget for substantially all of the blocks satisfies a predetermined cycle budget, and wherein the modified scheduling is used to define the design of a memory organization for the digital device.
  • FIG. 1 shows at the left side code representing the functionality of a digital system to be designed.
  • the overall conflict graph for each of said schedules and a memory organization of said digital device being compatible with said graph are shown in row two and three respectively.
  • FIG. 2 shows at the left side code representing the functionality of a digital system to be designed.
  • In the local optimization column an optimization method is shown which optimizes each of the blocks in said code separately.
  • In the global optimization column an optimization method is shown which optimizes the blocks all together.
  • FIG. 3 shows source code type description of the invented method for performing a global optimization of the storage bandwidth optimization.
  • FIG. 4 again shows code at the left side.
  • The right side shows how the cycle budget distribution over the blocks changes while optimizing.
  • From left to right the results of increasingly optimized schedulings are shown in terms of cycles, approaching the target cycle budget.
  • FIG. 5 shows a cost, here energy, cycle budget curve being obtained while executing the method in an incremental way.
  • FIG. 6 shows the cycle budget reduction while optimizing and the relation with the cost figures (being actual and estimated power consumption here).
  • FIG. 7 b shows a flow graph, indicating the storage of previously determined information for use in next iterations.
  • FIG. 7 a shows a cost-cycle budget curve and the incremental approach, determining a point of the curve, based on information obtained while determining a previous point with a larger cycle budget.
  • FIG. 8 a shows a flow chart of an approach wherein a preprocessing for block cycle distribution is done.
  • FIG. 8 b shows a flow chart showing the incremental approach.
  • FIG. 8 c shows another flow chart of the incremental approach, explicitly showing the information flow from one optimization (storing) to the next optimization (loading) and the generation of points on the Pareto curve.
  • FIG. 9 is a flow chart illustrating a process of optimizing a memory organization.
  • the invention presents a method and a design system incorporating said method for designing essentially digital devices with a hierarchical flow graph representation, said method incorporating partly a design method, suited for designing essentially digital devices with a flat flow graph representation, the latter design method being disclosed in U.S. patent application Ser. No. 09360140, herein fully incorporated by reference.
  • Devices with a hierarchical flow graph representation are defined as devices with a functional representation containing loops and/or function calls.
  • A memory organization for an essentially digital device is obtained starting from a system description.
  • A system description can be a source level description but is not limited thereto.
  • The hierarchical nature of the digital device representation results in a storage bandwidth optimization step that produces a memory cost versus cycle budget curve, from which a memory organization can be selected, for instance by the system designer, as indicated in FIG. 9.
  • the main goal of the storage bandwidth optimization inspired design methods, both for devices with a flat- and hierarchical graph representation, is to find which basic groups, being groups of scalar signals, such as arrays, must be stored in different memories, being part of the memory organization of said digital system under design, in order to meet the cycle budget, e.g. resulting in completion of the functionality to be executed by said digital device within a predetermined amount of time.
  • FIG. 1 shows a source type description wherein arrays A, B and C are accessed, meaning that values are read from said arrays (denoted R(A)) and/or stored in said arrays (denoted W(A)).
  • The method works at the level of basic groups, meaning that the arrays are evaluated as a whole.
  • The individual signals defined by the array elements are not considered separately, meaning that accesses to A[1] and A[2] are both considered as accesses to A.
  • Splitting of arrays is of course possible, but the main concept is that groups of signals such as array elements are considered.
  • Once the basic groups are defined and a scheduling of the data access instructions, indicating in which order they will be executed, is given, one can determine by inspection of that schedule whether access conflicts exist.
  • Although simultaneous accesses can be beneficial from the point of view of execution duration, as a shorter execution time is obtained, there is a memory cost involved.
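The inspection of a schedule for access conflicts can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the representation of a schedule as a list of cycles, the tuple encoding of accesses, and the function name `access_conflicts` are all hypothetical and do not come from the patent text. Two basic groups conflict when they are accessed in the same cycle; an array accessed twice in one cycle yields a self-conflict.

```python
from itertools import combinations

def access_conflicts(schedule):
    """Return the set of conflict edges implied by a schedule.

    `schedule` is a list of cycles; each cycle is a list of
    (kind, basic_group) accesses, e.g. ("R", "A") for R(A).
    """
    conflicts = set()
    for cycle in schedule:
        # Only the basic group matters: A[1] and A[2] both count as A.
        groups = sorted(group for _, group in cycle)
        conflicts.update(combinations(groups, 2))
    return conflicts

# Hypothetical three-cycle schedule (not taken from FIG. 1).
schedule = [
    [("R", "A"), ("R", "B")],   # A and B accessed simultaneously
    [("W", "C")],               # single access, no conflict
    [("R", "A"), ("W", "A")],   # A accessed twice: self-conflict
]
conflicts = access_conflicts(schedule)
```

In this encoding a self-conflict such as `("A", "A")` would correspond to the need for a multi-port memory, while an edge between two different groups means they should not share a single-port memory.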
  • the invention relates to a method for determining an optimized memory organization of an essentially digital device represented by a representation describing the functionality of said digital device, said representation comprising data access instructions on basic groups, being groups of scalar signals, said data access instructions having scheduling intervals, said optimized memory organization being such that execution of said functionality with said digital device is guaranteed to be within a predetermined overall cycle budget.
  • the flat flow graph design method disclosed in U.S. Ser. No. 09360140 dealt with a single (flat) flow graph within the given time constraints.
  • The flow graph is for instance extracted from a C input description. Internally, the method constructively generates a (partial) memory access ordering steered by a sophisticated cost model which incorporates global tradeoffs and access conflicts over the entire algorithm code.
  • The cost can include memory size/area and the cost thereof, power consumption of the device (mainly the memory related power consumption), latency, and any other cost factor for which estimations can be made from a high level description.
  • the memory related power consumption cost model is typically based on memory size, access frequency and other high level estimates (e.g., possibilities for array size reduction).
  • ECG Extended Conflict Graph
  • A method for determining an optimized memory organization for essentially digital systems with loops or function calls, hence having a hierarchical (non-flat) flow graph representation, is now presented. Said method aims at avoiding the destruction of hardware or code reuse possibilities present in the original specification while performing a storage bandwidth optimization step. Furthermore, said method has a complexity far lower than the complexity that would be obtained when applying the method dedicated to systems without loops or function calls to a specification with unrolled loops and inlined function calls.
  • the application code can be considered to be partitioned in blocks corresponding to function bodies, loop bodies and conditional branches. Each statement of the code belongs to one and only one block.
  • a block can be defined as a code part which contains a flat flow graph. It can contain multiple basic blocks and condition scopes. Even the function hierarchy can be adapted to the needs of ordering freedom.
  • The terms loop body and block coincide; every loop body is defined as a separate block, and the body of a nested loop is part of another block. It is assumed that at least one of said disjunct blocks, each block including part of said data access instructions, is executed a plurality of times, as indicated by the iteration count associated with each block.
  • A program, representing the functionality of a digital system to be designed, is given at the left side in code type format. It contains 3 consecutive loops (for constructs), defining 3 blocks (within the scope of the for-loops), to be ordered in 500 cycles (hence defining the overall cycle budget given for executing said digital device functionality).
  • The table of FIG. 1 shows three different solutions, each with a different cycle budget distribution over said blocks and a good ordering/schedule matching that distribution.
  • the resulting conflict graph and cheapest memory architecture is given in the last two rows.
  • the second solution (loop-i 2 cycles, loop-j 1 cycle and loop-k 2 cycles) is the cheapest solution.
  • the very poor third distribution even forces a dual-ported memory (due to assignment of one cycle for loop-i).
  • The illustration here is based on a simplified example to show the problem. In real-life applications the problem is much more difficult.
  • The number of block iterations will generally not be equal for each block.
  • The impact of one block on the global elapsed time can be much bigger than that of another block. Hence, there is more freedom in the search space.
  • FIG. 1 indicates that different cycle budget distributions can have an enormous effect on the conflict graph and the cost of the memory organization.
  • A global optimization over all blocks is needed to obtain the globally optimal conflict graph. Ordering the memory accesses on a block per block basis will result in a poor global result. The local solution of one block will typically not match the (local) solutions found in other blocks. Together, the local optima will then sum up to an expensive global solution, as shown in FIG. 2. Solving the problem locally per block will lead to different conflict graphs.
  • The total application has one memory architecture and therefore only one global conflict graph. The memory architecture cannot change from one block to the next. Therefore, all the (local) block conflict graphs must be added into one single global conflict graph. A typical conflict mismatch is shown in the left side of FIG. 2. A fully connected global graph is the result, requiring four memories.
  • FIG. 2 thus compares a local optimization method, which optimizes each of the blocks in said code separately and leads to a completely covered conflict graph and an expensive memory organization, with the invented global method, which optimizes the blocks all together and leads to conflict re-use, hence a conflict graph with fewer conflicts and a cheaper memory organization.
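The effect of adding local conflict graphs into one global graph can be illustrated with a small Python sketch. The local graphs below are hypothetical and are not the ones drawn in FIG. 2; the helper names are likewise assumptions. The memory assignment uses a plain greedy graph coloring, which gives an upper bound on the number of single-port memories rather than the method of the patent.

```python
def merge_graphs(local_graphs):
    """Global conflict graph = union of all block (local) conflict graphs."""
    return set().union(*local_graphs)

def assign_memories(groups, conflicts):
    """Greedy coloring: conflicting basic groups get different memories."""
    neighbours = {g: set() for g in groups}
    for a, b in conflicts:
        neighbours[a].add(b)
        neighbours[b].add(a)
    memory_of = {}
    # Handle the most constrained groups first (stable for equal degree).
    for g in sorted(groups, key=lambda g: -len(neighbours[g])):
        used = {memory_of[n] for n in neighbours[g] if n in memory_of}
        memory_of[g] = next(m for m in range(len(groups)) if m not in used)
    return memory_of

groups = ["A", "B", "C", "D"]

# Hypothetical local optimization: every block picks different conflicts.
local = [{("A", "B"), ("C", "D")},
         {("A", "C"), ("B", "D")},
         {("A", "D"), ("B", "C")}]
# Hypothetical global optimization: the blocks re-use the same conflicts.
reused = [{("A", "B"), ("C", "D")}] * 3

memories_local = len(set(assign_memories(groups, merge_graphs(local)).values()))
memories_global = len(set(assign_memories(groups, merge_graphs(reused)).values()))
```

With these inputs the mismatched local graphs add up to a fully connected graph on four groups, so four memories are needed, whereas the re-used conflicts can be served by two.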
  • a method for determining an optimized memory organization of an essentially digital device represented by an appropriate representation determines block cycle budgets while optimizing said scheduling intervals. Hence no preprocessing step for block cycle budget distribution (as shown in FIG. 8 a, ( 750 )) is needed. Instead during calculation for each overall cycle budget obtained one can assign for each block the related cycle budget selected in ( 700 ). Note that the decision making ( 700 ) on which block the cycle budget reduction will be applied and hence using the related scheduling for said block as determined in ( 600 ) is performed after optimization of each block.
  • For substantially all said blocks, said combined or interleaved block cycle budget determination and optimization of said scheduling intervals exploit the same global conflict graph to be optimized.
  • One selects a block, denoted the selected block.
  • Said block cycle budgets are fixed at that moment, except for the block cycle budget of the selected block, which is reduced, while access conflicts are optimized within a single global conflict graph with respect to an evaluation criterion related to the global memory cost of said digital device.
  • The method suited for digital systems with a hierarchical flow graph representation, generating a storage cycle budget distribution over multiple blocks, incorporates a method suited for digital systems with an essentially flat flow graph representation, more in particular by placing said latter method in a loop, iterating over substantially all the blocks. As motivated before, this as such could lead to a poor global optimum. Therefore a special way of iterating over said blocks is performed. One starts with initially large block cycle budgets, the value of each block cycle budget for instance being such that no access conflict exists between the data access instructions within one block. In this example one derives said initial block cycle budgets from said no-access-conflict condition.
  • The total cycle budget, being defined by said block cycle budgets and the iteration counts of said blocks, will usually be larger than the predetermined overall cycle budget wanted. Said iteration over said blocks does not first determine the minimal block cycle budget of a first block and thereafter that of another block. Instead, a limited and temporary, hence not yet approved, block cycle budget reduction of essentially all said blocks is examined, meaning that its influence on the access conflict cost is determined; then the cycle budget reduction is applied for at least one of said blocks, and the procedure is started all over again.
  • In a method for determining an optimized memory organization of an essentially digital device represented by an appropriate representation, the method comprises the steps of determining initial block cycle budgets, for instance such that no access conflict exists between the data access instructions within each block ( 550 ), and, as long as the overall cycle budget for execution of said functionality with said digital device is larger than said predetermined overall cycle budget, (substep 1 ) iteratively performing over substantially all said blocks a temporary reduction of the selected block cycle budget by a predetermined amount ( 600 )( 610 ) and computing the access conflict cost and overall cycle budget reduction, and (substep 2 ) applying said block cycle budget reduction for at least one selected block ( 700 ) and returning to substep 1 .
  • Note that ( 775 ) in FIG. 8 b can in an embodiment of the invention be the block ( 210 ) of FIG. 7 b.
  • The initialization step ( 550 ) determines for each block an initial cycle budget, denoted the first cycle budget.
  • The process of reducing the cycle budget of one block is done for essentially each block via iteration ( 610 ). So for each block a second cycle budget and an associated cost are obtained.
  • By comparing in step ( 700 ) the reduction of the overall cycle budget gained by the block cycle budget reduction with the conflict cost increase, one determines for which block the cycle budget reduction is approved. For that block and only that block (in a single block embodiment) the second cycle budget becomes the first cycle budget, while the others remain at their first cycle budget. So one performs an update of a single block cycle budget.
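The loop described by steps ( 550 ), ( 600 ), ( 610 ) and ( 700 ) can be sketched as follows. This is a minimal illustration under several assumptions, not the patented implementation: `schedule_block` is a hypothetical greedy stand-in for the flat-graph optimizer that ignores data dependencies, the memory cost is approximated by the number of edges in the global conflict graph, the selection criterion is the ratio of cost increase to cycle gain, and the block data are invented.

```python
from itertools import combinations

def conflicts_of(slots):
    """Conflict edges of a schedule: basic groups sharing a cycle."""
    edges = set()
    for slot in slots:
        edges.update(combinations(sorted(slot), 2))
    return edges

def schedule_block(accesses, budget, known):
    """Hypothetical flat-graph stand-in: merge cycles until the block
    fits `budget`, preferring conflicts that already exist (re-use)."""
    slots = [[a] for a in accesses]          # sequential, conflict free
    while len(slots) > budget:
        have = known | conflicts_of(slots)
        i, j = min(combinations(range(len(slots)), 2),
                   key=lambda p: len(conflicts_of([slots[p[0]] + slots[p[1]]]) - have))
        slots[i] = slots[i] + slots[j]
        del slots[j]
    return slots

def distribute_budgets(blocks, target, step=1):
    """blocks maps name -> (accesses, iteration_count)."""
    scheds = {n: [[a] for a in acc] for n, (acc, _) in blocks.items()}
    budgets = {n: len(acc) for n, (acc, _) in blocks.items()}   # step (550)

    def overall():
        return sum(budgets[n] * iters for n, (_, iters) in blocks.items())

    def graph(schedules):
        return set().union(*(conflicts_of(s) for s in schedules))

    while overall() > target:
        current = graph(scheds.values())
        candidates = []
        for name, (acc, iters) in blocks.items():               # iteration (610)
            if budgets[name] - step < 1:
                continue
            others = graph(s for n, s in scheds.items() if n != name)
            trial = schedule_block(acc, budgets[name] - step, others)  # step (600)
            cost_increase = len(others | conflicts_of(trial)) - len(current)
            candidates.append((cost_increase / (step * iters), name, trial))
        if not candidates:
            raise ValueError("target cycle budget cannot be met")
        _, name, trial = min(candidates, key=lambda c: c[0])    # step (700)
        budgets[name] -= step
        scheds[name] = trial
    return budgets, graph(scheds.values())

# Invented example: three loops with different iteration counts.
blocks = {"loop_i": (["A", "B", "C"], 10),
          "loop_j": (["A", "B"], 20),
          "loop_k": (["B", "C", "D"], 5)}
budgets, global_graph = distribute_budgets(blocks, target=60)
```

In this invented example the sequential start costs 85 cycles. The first approved reduction goes to `loop_j` (largest cycle gain per added conflict), and the second goes to `loop_i`, which can re-use the A-B conflict already present in the global graph at no extra cost.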
  • FIG. 3 shows a source code description of the method for performing a global optimization of the storage bandwidth optimization approach.
  • the global optimization is also an incremental approach starting off with a too large target budget, computing the associated minimal cost and reducing said target budget until the cycle budget requirement is met.
  • A so-called flat-graph-like optimizer, used for optimization of flat code (hence code without blocks), can be used.
  • This flat graph solver is not used with a fixed cycle budget distribution over said blocks. Instead, cycle budgets for each of said blocks are generated while optimizing. It is important to notice that there is a gain/cost analysis for essentially all blocks, and only thereafter a decision for typically one block.
  • FIG. 4 indicates that after a first iteration ( 80 ) the cycle budget of block ( 11 ) is reduced from 5 to 4 (resulting, due to the 5 iterations, in an overall cycle budget for that block of 20) while the cycle budgets of the other blocks are unchanged. In the next two iterations the cycle budget of block ( 13 ) is reduced.
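The arithmetic behind this first step can be written out explicitly. As a sketch, using symbols that do not appear in the original text ($n_k$ for the iteration count of block $k$, $b_k$ for its block cycle budget, and $B$ for the overall cycle budget):

```latex
B = \sum_{k} n_k \, b_k ,
\qquad
\text{block (11): } n = 5,\quad b: 5 \to 4
\;\Rightarrow\; n\,b: 25 \to 20 ,
\qquad
\Delta B = 5 \cdot (5 - 4) = 5 .
```

Starting from the 95-cycle non-conflicting schedule described for FIG. 4, this single reduction would therefore bring the overall cycle budget down to 90 cycles.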
  • the method step ( 600 ) of FIG. 8 b is performed by executing a flat-graph type optimizer ( 90 ) (as shown in the FIG. 3 code).
  • the design method is an incremental approach, reducing the global cycle budget every step until the target cycle budget is met. Initially, for instance the memory access ordering is sequential. Therefore every block can be ordered without any conflicts. During the iteration over the blocks, the cycle budget is made smaller for every individual block. Gradually, more conflicts will have to be introduced. The storage bandwidth optimization approach decides which block(s) are reduced in local cycle budget and so which conflicts are added globally. Finally, after multiple steps of decreasing the budgets for the blocks, the global cycle budget is met and a global conflict graph is produced.
  • the cost-cycle budget curve ( 100 ) in FIG. 7 a is generated in such a way that the computation of point ( 120 ) on said curve is based on a previously computed point ( 110 ) on said curve.
  • "Based on" means that part of the computations done for point ( 110 ) are stored and re-used for computing point ( 120 ). Due to the nature of the storage bandwidth optimization problem, said previously computed point preferably has a larger cycle budget.
  • said previously determined point ( 110 ) can be an initial point or just a point along the curve.
  • FIG. 7 b shows a flow chart with an initial computation part ( 200 ) and a computation part ( 210 ) over which an iteration ( 300 ) is performed.
  • Said iteration is however not a mere repetition of said computation part ( 210 ). While executing said computation part, information is stored ( 410 ) on storage means ( 400 ) and said information is re-used ( 420 ) in the next iteration. Said information can be the final ordering of a block's memory accesses obtained while executing said computation part ( 210 ), for instance for re-initialization of the next iteration, or information on already selected access conflicts for re-use purposes in the conflict cost model, or indications of which blocks are re-scheduled during such execution. Note that within said iteration loop a decision ( 220 ) is needed on whether another point of the curve must be computed or not. One possibility is that the circuit designer inputs a range for the cycle budget; another is that the method itself generates a flag indicating that it is of no use to go to lower cycle budgets as no feasible solutions exist.
  • Said cost-cycle budget curve is an optimized cost-cycle budget curve.
  • One does not present just any cost for a cycle budget but the lowest possible cost for that cycle budget.
  • Said cost-cycle budget curve is an optimal cost-cycle budget curve.
  • said cost-cycle budget is a Pareto optimal curve, meaning that no lower cost is possible for a given cycle budget and for a specified cost no smaller cycle budget is possible.
  • the presented optimization approach contains heuristics and hence only an approximation ( 100 ) of a Pareto optimal curve ( 500 ) is obtained, hence the terminology near Pareto optimal cost-cycle budget curve.
  • the proposed method initializes with a sequential ordering. Every memory access has its own time slot. A block containing X memory accesses will have an initial cycle budget of X cycles (assuming one cycle per memory access). Due to this type of ordering, the global conflict graph will not contain any conflicts. In the successive steps of the algorithm the global budget will shrink. In every step, (at least) one of the blocks will be reduced in length. The reduction of the block is by a predetermined amount of at least one cycle. Said predetermined amount can for instance be selected by the system designer depending on a trade-off between design speed (completion of the method) and accuracy. A large predetermined amount will increase the design speed but reduce the accuracy.
  • Said predetermined amount can also be selected by the computation system itself based on estimations of the effect of the related block cycle reduction, estimation being determined with easier, more approximate methods than the method step itself, of course.
  • FIG. 4 shows at the left side code ( 10 ) describing the digital system.
  • three blocks ( 11 )( 12 )( 13 ) located each within a loop (for constructs) are recognized.
  • a non-conflicting, for instance sequential, scheduling (60) as indicated in FIG. 4, with an overall cycle budget of 95 cycles and a scheduling of the instructions of block (11) in 5 cycles, resulting, due to the 5 iterations, in an overall cycle budget for that block of 25.
  • step (775, FIG. 8b) (51, FIG. 3). Said step is repeated until the overall cycle budget is equal to or lower than the target cycle budget, here 75 cycles, as specified by the condition (52, FIG. 3) and the cycle budget condition in FIG. 8b.
  • an empty set of re-usable access conflicts is determined for at least one of said blocks, and then, while performing said step of optimizing access conflicts, one takes into account the re-usable access conflicts within said sets of re-usable access conflicts not related to the selected block, being the block whose block cycle budget reduction is currently under investigation.
  • the set of re-usable access conflicts of said selected block is updated.
  • scheduling intervals of blocks from which access conflicts within said blocks' sets of re-usable access conflicts are not re-used are not modified during said step of optimizing access conflicts. Note that deciding not to re-use an access conflict can easily be done by comparing the conflict graphs of the two related blocks. When an access conflict connects at least one basic group which is not accessed by one of said related blocks, then said access conflict is not re-usable and hence not re-used.
  • the ordering freedom is limited.
  • the returned ordering freedom to a block is based on the final ordering of the previous step.
  • the memory access is scheduled between the ASAP and ALAP time. Both the ASAP and ALAP are put close to the location of the previous ordering. This is a first aspect of the incremental nature of the method.
  • the determining of optimized scheduling intervals steps are thus initialized with scheduling intervals substantially near but a bit larger than the previously determined scheduling intervals, determined in an earlier iteration.
  • the algorithm is sped up further by reusing ordering results which did not change. Although much ordering information is discarded in a step, this does not mean it is useless. By keeping track of which blocks have to be rescheduled, the tool execution time can be decreased drastically. This happens especially in large applications containing many independent blocks. This keeping track, and hence storage of said change/non-change information, is a second aspect of the incremental nature of the method.
  • FIG. 8c shows the optimized memory organization determination method, comprising the steps of initial scheduling and deriving of an initial block cycle budget (1550) (which can be access conflict free in an embodiment of the invention), temporarily reducing a block cycle budget, determining optimized scheduling intervals and computing the overall cycle budget (1600).
  • Said step ( 1600 ) is done in a loop, such that this is executed for substantially all blocks.
  • the method further comprises the step of (finally) reducing the block cycle budget (1700).
  • the steps 1550, 1600, 1700 are done in another loop, as long as the overall cycle budget is larger than the predetermined overall cycle budget.
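The incremental reduction described in the items above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: the block names, iteration counts and the per-block tables `needed` (which conflicts a block requires at a given block cycle budget) are hypothetical stand-ins for the result of the real scheduling-interval optimization, and the selection rule (added conflicts per global cycle saved) is one simple choice for a criterion based on memory cost and overall cycle budget reduction. The totals are chosen to match the 95-cycle start and 75-cycle target mentioned for FIG. 4.

```python
def global_budget(budgets, iters):
    """Overall cycle budget: block budget times iteration count, summed over blocks."""
    return sum(budgets[b] * iters[b] for b in budgets)


def global_conflicts(budgets, needed):
    """Union of the conflicts each block needs at its current budget."""
    merged = set()
    for block, cycles in budgets.items():
        merged |= needed[block][cycles]
    return merged


def reduce_budgets(iters, needed, target, step=1):
    # Start from the largest tabulated budget per block (sequential, conflict free).
    budgets = {block: max(table) for block, table in needed.items()}
    while global_budget(budgets, iters) > target:
        current = global_conflicts(budgets, needed)
        candidates = []
        for block in budgets:
            trial = dict(budgets)
            trial[block] -= step          # temporary reduction of one block
            if trial[block] not in needed[block]:
                continue                  # this block cannot shrink further
            added = len(global_conflicts(trial, needed) - current)
            saved = step * iters[block]   # global cycles gained by this reduction
            candidates.append((added / saved, block))
        if not candidates:
            raise ValueError("target cycle budget cannot be met")
        _, chosen = min(candidates, key=lambda c: c[0])
        budgets[chosen] -= step           # final reduction for this step
    return budgets, global_conflicts(budgets, needed)


# Hypothetical data: iteration counts and, per block, the conflicts needed at each budget.
iters = {"b11": 5, "b12": 10, "b13": 10}
needed = {
    "b11": {5: set(), 4: {"A-B"}, 3: {"A-B", "A-C"}},
    "b12": {4: set(), 3: {"A-B"}, 2: {"A-B", "B-C"}},
    "b13": {3: set(), 2: {"C-D"}},
}
budgets, conflicts = reduce_budgets(iters, needed, target=75)
print(budgets, sorted(conflicts), global_budget(budgets, iters))
```

With these made-up tables the second step reduces block b11 at no extra cost, because the conflict A-B it needs was already introduced by block b12. This is the conflict re-use that a global view makes possible.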

Abstract

A system and method for determining optimized scheduling intervals and optimized access conflicts and for determining an optimized memory organization of an essentially digital device. The system includes an optimizer for determining an optimized scheduling of the data access instructions for a plurality of disjunct code blocks, wherein each of the code blocks includes part of the data access instructions. The system performs an iterative process of successively reducing the cycle budget for selected blocks and modifying the scheduling of the selected blocks until a cumulative cycle budget for all of the blocks is met.

Description

    FIELD OF THE INVENTION
  • The invention relates to methods for designing essentially digital devices. [0001]
  • BACKGROUND OF THE INVENTION
  • An essentially digital device comprises at least a memory organization (a number of memories with their sizes and an interconnection pattern) and registers. Such a memory organization is determined during the design process of said digital device. The operation of an essentially digital system can essentially be described as a set of data access operations or instructions on data structures or variables, being stored in said memories. [0002]
  • In [L. Stok, Data path synthesis, Integration, the VLSI Journal, Vol.18, pp.1-71, June 1994.] register allocation techniques, starting from a fully scheduled flow graph (thus ordered data access operations or instructions are used as input), are presented. Said allocation techniques are scalar oriented. Many of these techniques construct a scalar conflict or compatibility graph and solve the problem using graph coloring or clique partitioning. This conflict graph is fully determined by the schedule, which is fixed beforehand. This means that no effort is spent to come up with an optimal conflict graph and thus the potential optimization by reconsidering the schedule is not exploited. Moreover only register allocation is addressed and not memories. [0003]
  • In the less explored domain of memory allocation and assignment for hardware systems, the current techniques start from a given schedule [L. Ramachandran, D. Gajski, V. Chaiyakul, An algorithm for array variable clustering, Proceedings European Design and Test Conference, pp.262-266, Paris, Mar. 1994.], [P. Lippens, J. van Meerbergen, W. Verhaegh, A. van der Werf, Allocation of multiport memories for hierarchical data streams, Proceedings IEEE International Conference on Computer-Aided Design, pp.728-735, Santa Clara, Nov. 1993.], [O. Sentieys, D. Chillet, J. P. Diguet, J. Philippe, Memory module selection for high-level synthesis, Proceedings IEEE workshop on VLSI signal processing, Monterey Calif., Oct. 1996.] or perform first a bandwidth estimation step [F. Balasa, F. Catthoor, H. DeMan, Dataflow-driven memory allocation for multi-dimensional processing systems, Proceedings IEEE International Conference on Computer Aided Design, San Jose, Calif., Nov. 1994.] which is a kind of crude ordering that does not really optimize the conflict graph either. These techniques have to operate on groups of signals instead of on scalars to keep the complexity acceptable. [0004]
  • In the parallel compiler domain [M. Al-Mouhamed, S. Seiden, A heuristic storage for minimizing access time of arbitrary data patterns, IEEE Transactions on Parallel and Distributed Systems, Vol.8, No.4, pp.441-447, Apr. 1997.] proposes a technique to partition arrays into groups of data that have to be assigned to different memories such that they can be accessed simultaneously for an SIMD architecture. They combine the constraints of a number of given access patterns into a single linear address transformation that calculates for every data element the memory in which it should be stored to minimize the total access time. This technique avoids the allocation of multi-port memories for storing data with self-conflicts, by explicitly splitting arrays into smaller arrays that can be assigned to single-port memories. However, said method does not exploit all optimization opportunities, for instance by rescheduling data access instructions. [0005]
  • [S. Pinter, Register allocation with instruction scheduling: a new approach, ACM SIGPLAN Notices, Vol.28, pp.248-257, June 1993.] optimizes a conflict graph in the context of scalar register allocation by removing weighted edges in a coloring problem prior to scheduling. However, the conflicts in their initial conflict graph are determined by the sequential ordering of the input code. Also this idea was not applied to groups of scalars. [0006]
  • The Improved Force Directed Scheduling (IFDS) [W. Verhaegh, P. Lippens, E. Aarts, J. Korst, J. van Meerbergen, A. van der Werf, Improved force-directed scheduling in high-throughput digital signal processing, IEEE Transactions on CAD and Systems, Vol.14, No.8, Aug. 1995.] shows a method wherein scheduling intervals are gradually reduced until the desired result is obtained. The cost function used to determine which scheduling interval has to be reduced at each iteration only takes the number of parallel data accesses into account, in order to reduce the required memory bandwidth. (I)FDS does not take into account which data is being accessed. Balancing the number of simultaneous data accesses is a local optimization which can be very bad globally. In IFDS all data is treated equally, although in practice some simultaneous data accesses are more expensive in terms of memory cost than others. Also the required number of memories cannot be estimated accurately by looking locally only, as is done in IFDS, because all conflicts have to be considered for this. [0007]
  • SUMMARY OF THE INVENTION
  • One aspect of the invention includes a method of determining optimized scheduling intervals and optimized access conflicts useful for determining an optimized memory organization of an essentially digital device, the method comprising: determining an initial scheduling of the data access instructions for a plurality of disjunct blocks, wherein each of the blocks include part of the data access instructions, and wherein at least one of the blocks is executed a plurality of times that is defined by an iteration count, deriving from the initial scheduling an initial block cycle budget for each block, while the overall cycle budget for performance of the digital device is larger than a predetermined overall cycle budget, repeating the method comprising: (a) for substantially all of the blocks, performing the method comprising: temporarily reducing a block cycle budget for a selected block by a predetermined amount; determining optimized scheduling intervals of the data access instructions such that the performance of the digital device is guaranteed to be within the block cycle budgets, wherein determining the optimized scheduling intervals comprises optimizing access conflicts with respect to an evaluation criterion related to the memory cost of the digital device; computing the overall cycle budget resulting from the optimized scheduling intervals; and (b) reducing the block cycle budget for at least one selected block, the selection of the block being based at least in part upon the memory cost and an overall cycle budget reduction. [0008]
  • Another aspect of the invention includes a method of determining a cost-cycle budget curve for an essentially digital device that is represented by a digital representation describing the functionality of the digital device, the representation comprising data access instructions on basic groups of scalar signals, the data access instructions having scheduling intervals, the representation comprising a plurality of disjunct blocks, each block including part of the data access instructions, at least one block being executed at least a plurality of times that is indicated by an iteration count, the method comprising: generating a cost-cycle budget curve that compares the cost of a memory organization of the digital device versus the cycle budget, wherein the cost-cycle budget curve is incrementally generated. [0009]
  • Another aspect of the invention includes a method of determining an optimized memory organization of an essentially digital device represented by a representation describing the functionality of the digital device, the representation comprising data access instructions on basic groups of scalar signals, the data access instructions having scheduling intervals, the representation comprising a plurality of disjunct blocks, each block including part of the data access instructions, at least one block being executed at least a plurality of times, the optimized memory organization being such that execution of the functionality with the digital device is guaranteed to be within a predetermined overall cycle budget, the method comprising: determining block cycle budgets while optimizing scheduling intervals. [0010]
  • Yet another aspect of the invention includes a method of determining an optimized memory organization of an essentially digital device represented by a representation describing the functionality of the digital device, the representation comprising data access instructions on basic groups of scalar signals, the data access instructions having scheduling intervals, the representation comprising a plurality of disjunct blocks, each block including part of the data access instructions, at least one block being executed at least a plurality of times, the optimized memory organization being such that the performance of the digital device is guaranteed to be within a predetermined overall cycle budget, the method comprising: (a) determining initial block cycle budgets such that no access conflict exists between the data access instructions within each block; (b) temporarily reducing, in an iterative process, a block cycle budget by a predetermined amount, and computing the access conflict cost and overall cycle budget reduction of such reduced block cycle budget; (c) reducing the block cycle budget for at least one selected block; and (d) returning to act (b) if the overall cycle budget for execution of the functionality with the digital device is larger than the predetermined overall cycle budget. [0011]
  • Yet another aspect of the invention includes a method optimizing the scheduling of data instructions, the method comprising: determining a scheduling of data instructions for a plurality of blocks; determining a cycle budget for substantially all of the blocks, wherein the cycle budget is determined based at least in part upon the determined initial scheduling of a block; identifying one of the blocks for a cycle budget reduction; reducing the cycle budget of the identified block; modifying the scheduling of at least one of the blocks; calculating a cumulative cycle budget for the blocks; and repeating the identifying, reducing and modifying, wherein the modifying includes modifying a modified scheduling until the cumulative cycle budget for substantially all of the blocks satisfies a predetermined cycle budget. [0012]
  • Yet another aspect of the invention includes a system for optimizing the scheduling of data instructions, the system comprising: means for determining a scheduling of data instructions for a plurality of blocks; means for determining a cycle budget for substantially all of the blocks, wherein the cycle budget is determined based at least in part upon the determined initial scheduling of a block; means for identifying one of the blocks for a cycle budget reduction; means for reducing the cycle budget of the identified block; and means for modifying the scheduling and cycle budget of at least one of the blocks until a cumulative cycle budget for substantially all of the blocks satisfies a predetermined cycle budget. [0013]
  • Yet another aspect of the invention includes a digital device having an optimized memory organization, wherein the design of the memory organization is generated by the method comprising: determining an initial scheduling of data instructions for a plurality of blocks, wherein the data instructions are to be executed by the digital device; determining a cycle budget for substantially each of the blocks, wherein the cycle budget is determined based at least in part upon the determined initial scheduling of the block; identifying one of the blocks for a cycle budget reduction; reducing the cycle budget of the identified block; modifying the scheduling of at least one of the blocks; repeating the identifying, reducing and modifying acts, wherein the modifying act includes modifying a modified scheduling, until a cumulative cycle budget for substantially all of the blocks satisfies a predetermined cycle budget, and wherein the modified scheduling is used to define the design of a memory organization for the digital device.[0014]
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 shows at the left side code representing the functionality of a digital system to be designed. The overall conflict graph for each of said schedules and a memory organization of said digital device being compatible with said graph are shown in row two and three respectively. [0015]
  • FIG. 2 shows at the left side code representing the functionality of a digital system to be designed. The local optimization column shows an optimization method which optimizes each of the blocks in said code separately. The global optimization column shows an optimization method which optimizes the blocks all together. [0016]
  • FIG. 3 shows source code type description of the invented method for performing a global optimization of the storage bandwidth optimization. [0017]
  • FIG. 4 shows at the left side again code. The right side shows how the cycle budget distribution over the blocks changes while optimizing. From left to right the results of more optimized schedulings are shown in terms of cycles, approaching the target cycle budget. [0018]
  • FIG. 5 shows a cost, here energy, cycle budget curve being obtained while executing the method in an incremental way. [0019]
  • FIG. 6 shows the cycle budget reduction while optimizing and the relation with the cost figures (being actual and estimated power consumption here). [0020]
  • FIG. 7b shows a flow graph, indicating the storage of previously determined information for use in next iterations. [0021]
  • FIG. 7a shows a cost-cycle budget curve and the incremental approach, determining a point of the curve based on information obtained while determining a previous point with a larger cycle budget. [0022]
  • FIG. 8a shows a flow chart of an approach wherein a preprocessing for block cycle distribution is done. [0023]
  • FIG. 8b shows a flow chart showing the incremental approach. [0024]
  • FIG. 8c shows another flow chart showing the incremental approach, explicitly showing the information flow from one optimization (storing) to a next optimization (loading) and the generation of points on the Pareto curve. [0025]
  • FIG. 9 is a flow chart illustrating a process of optimizing a memory organization. [0026]
  • DETAILED DESCRIPTION OF THE INVENTION
  • The invention presents a method and a design system incorporating said method for designing essentially digital devices with a hierarchical flow graph representation, said method incorporating partly a design method, suited for designing essentially digital devices with a flat flow graph representation, the latter design method being disclosed in U.S. patent application Ser. No. 09360140, herein fully incorporated by reference. Devices with a hierarchical flow graph representation are defined as devices with a functional representation containing loops and/or function calls. [0027]
  • With the invention a memory organization for an essentially digital device is obtained starting from a system description. Such a system description can be a source level description but is not limited thereto. The hierarchical nature of the digital device representation results in a storage bandwidth optimization step, producing a memory cost versus cycle budget curve, from which a memory organization can be selected, for instance by the system designer, as indicated in FIG. 9. [0028]
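Selecting a memory organization from a memory cost versus cycle budget curve can be illustrated with a small Python sketch. The data points below are invented for illustration, and the helper names `pareto_front` and `cheapest_within` are hypothetical. The first function keeps only points for which no other point has both a cycle budget and a cost that are at least as low, and the second picks the lowest-cost point that still fits a given cycle budget.

```python
def pareto_front(points):
    """points: list of (cycle_budget, cost). Keep the points that are not dominated."""
    front = []
    for p in points:
        dominated = any(
            q != p and q[0] <= p[0] and q[1] <= p[1]
            for q in points
        )
        if not dominated:
            front.append(p)
    return sorted(set(front))


def cheapest_within(front, max_cycles):
    """Lowest-cost point whose cycle budget does not exceed max_cycles, or None."""
    feasible = [p for p in front if p[0] <= max_cycles]
    return min(feasible, key=lambda p: p[1]) if feasible else None


# Invented (cycle budget, relative memory cost) pairs, e.g. from successive iterations.
points = [(95, 1.0), (90, 1.0), (85, 1.4), (80, 1.4), (75, 2.1), (78, 2.5)]
front = pareto_front(points)
choice = cheapest_within(front, max_cycles=82)
print(front, choice)
```

Because the described method contains heuristics, a curve produced this way would only approximate the true Pareto curve.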
  • The main goal of the storage bandwidth optimization inspired design methods, both for devices with a flat and with a hierarchical graph representation, is to find which basic groups, being groups of scalar signals such as arrays, must be stored in different memories, being part of the memory organization of said digital system under design, in order to meet the cycle budget, e.g. resulting in completion of the functionality to be executed by said digital device within a predetermined amount of time. Note in FIG. 1 a source type description wherein arrays A, B and C are accessed, meaning being read (denoted R(A)) and/or having values stored in them (denoted W(A)). The method works at the level of basic groups, meaning that the arrays are evaluated as a whole. The individual signals defined by the array elements are not considered separately, meaning that accesses to A[1] and A[2] are both considered as an access to A. Splitting of arrays is of course possible, but the main concept is that groups of signals such as array elements are considered. When the basic groups are defined and a scheduling of the data access instructions is given, meaning an indication of the order in which they will be executed, one can determine by inspection of that schedule whether access conflicts exist. There is an access conflict between two basic groups, indicated by a line or arrow in a conflict graph (second row in FIG. 1), when simultaneous accesses to said basic groups are found in the schedule. Although such simultaneous accesses can be beneficial from the point of view of duration of execution, as a shorter execution time is obtained, there is a memory cost involved. Indeed, when a memory constellation or organization is selected wherein said basic groups can be stored when using a certain schedule, it is observed that conflicting basic groups must be assigned to separate memories. Comparing the first and second columns of FIG. 1 shows this.
Note in column 3 a self-conflict for basic group A, resulting in a two-port memory wherein A is stored. The data access possibilities are indicated by an arrow in row 3. A single double-sided arrow indicates a single-port memory with read/write possibilities. A single-sided arrow indicates a port with either a read or a write possibility; hence two single-sided arrows indicate a dual-port memory with one read port and one write port. Ordering the memory accesses, meaning determining scheduling intervals, being time intervals, for the data access instructions on basic groups, within a certain number of memory cycles determines the required memory bandwidth. Note that with memory is meant any type of storage means used within digital systems or devices. [0029]
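Deriving a conflict graph by inspecting a schedule can be sketched as follows. This is a minimal Python illustration, assuming a schedule is given as a list of cycles and each cycle lists the basic groups accessed in it; the function name and data layout are assumptions, not part of the original text. Two accesses in the same cycle yield a conflict edge, and two accesses to the same basic group in one cycle yield a self-conflict, represented here by a one-element set.

```python
from itertools import combinations


def conflict_graph(schedule):
    """schedule: list of cycles, each a list of accessed basic groups (e.g. "A").

    Returns a set of frozensets. A two-element frozenset is a conflict between
    two basic groups; a one-element frozenset is a self-conflict.
    """
    conflicts = set()
    for accesses in schedule:
        for g1, g2 in combinations(accesses, 2):
            conflicts.add(frozenset((g1, g2)))
    return conflicts


sequential = [["A"], ["B"], ["C"]]        # one access per cycle
parallel = [["A", "B"], ["C"]]            # A and B accessed simultaneously
self_conflict = [["A", "A"], ["B"]]       # two accesses to A in one cycle

print(conflict_graph(sequential))
print(conflict_graph(parallel))
print(conflict_graph(self_conflict))
```

The sequential schedule gives an empty graph, the second schedule requires A and B to sit in separate memories, and the third requires a multi-port memory for A.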
  • If two memory accesses are ordered in the same memory cycle, they are in conflict and parallelism is required to perform both accesses. These simultaneous memory accesses can be done in two different memories or, if they access the same array, in a dual-port memory. Thus, the conflict constrains the signal to memory assignment and incurs a certain cost. A tradeoff has to be made between the performance gain of every conflict and the cost it incurs. On the one hand, storing every array in a separate memory is optimal for speed and seemingly also for power. But having many memories is very costly in area, interconnect and complexity and, due to the routing overhead, in the end also in power. Due to the presence of the real time constraints and the complex control and data dependencies, a difficult tradeoff has to be made. [0030]
  • The invention relates to a method for determining an optimized memory organization of an essentially digital device represented by a representation describing the functionality of said digital device, said representation comprising data access instructions on basic groups, being groups of scalar signals, said data access instructions having scheduling intervals, said optimized memory organization being such that execution of said functionality with said digital device is guaranteed to be within a predetermined overall cycle budget. [0031]
  • Note that the concept of memory cycles does not have to equal the (data-path) clock nor the maximum access frequency of the memory, it is only used to describe the relative ordering of memory accesses. Every memory access is assumed to take up an integer number of abstract cycles. A memory access can only take place in a cycle after any access on which it depends. The found ordering does not necessarily have to match exactly with the final schedule (complete scheduling, including data-path related issues). It is produced to make sure that it is possible to meet the real time constraints with the derived memory architecture. Hence the scheduling obtained with the method can also be denoted a partial ordering or partial scheduling, meaning that said scheduling intervals are typically larger than the amount of cycles needed for performing the memory access. [0032]
  • The flat flow graph design method disclosed in U.S. Ser. No. 09360140 dealt with a single (flat) flow graph within the given time constraints. The flow graph is for instance extracted from a C input description. Internally, it constructively generates a (partial) memory access ordering steered by a sophisticated cost model which incorporates global tradeoffs and access conflicts over the entire algorithm code. The cost can include memory size/area and the cost thereof, power consumption of the device and mainly the memory related power consumption, latency and any other cost factor for which estimations can be made from a high level description. The memory related power consumption cost model is typically based on memory size, access frequency and other high level estimates (e.g., possibilities for array size reduction). The technique used for this is to order the memory accesses in a given cycle budget by iteratively reducing the intervals of every memory access (starting with ASAP and ALAP). The interval reductions are driven by the probability and cost of the potential conflicts between the accesses. An Extended Conflict Graph (ECG) is generated which follows from the memory access ordering. It contains the conflicting arrays which are accessed simultaneously. These arrays need to be stored in different memories. Note that many possible orderings (and also schedules) are compatible with a given ECG. A consolidation of the memory organization is needed in the subsequent memory allocation and assignment step. [0033]
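The starting intervals mentioned above (ASAP and ALAP) can be illustrated with a short Python sketch. It assumes one abstract cycle per memory access, cycles numbered from 0, and a list of accesses already given in dependency order; the function name and the example accesses are hypothetical. The cost-driven interval reduction itself is not shown, only how the cycle budget bounds the initial scheduling interval of each access.

```python
def asap_alap(accesses, deps, budget):
    """Initial scheduling interval (asap, alap) per access.

    accesses: names in topological (dependency) order
    deps:     dict mapping an access to the accesses it depends on
    budget:   number of cycles available; cycles are 0 .. budget - 1
    """
    asap = {}
    for a in accesses:
        # An access can start one cycle after its latest predecessor.
        asap[a] = max((asap[p] + 1 for p in deps.get(a, [])), default=0)
    alap = {a: budget - 1 for a in accesses}
    for a in reversed(accesses):
        for p in deps.get(a, []):
            # A predecessor must finish at least one cycle before its successor.
            alap[p] = min(alap[p], alap[a] - 1)
    if any(asap[a] > alap[a] for a in accesses):
        raise ValueError("cycle budget too small for the dependency chain")
    return {a: (asap[a], alap[a]) for a in accesses}


accesses = ["R(A)", "R(B)", "W(C)"]
deps = {"W(C)": ["R(A)", "R(B)"]}          # C is written from values read from A and B

loose = asap_alap(accesses, deps, budget=3)
tight = asap_alap(accesses, deps, budget=2)
print(loose)
print(tight)
```

With a budget of three cycles both reads may still be placed in different cycles. With a budget of two cycles both reads are forced into cycle 0, so a conflict between A and B becomes unavoidable.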
  • A method for determining an optimized memory organization for essentially digital systems with loops or function calls, hence having a hierarchical (non-flat) flow graph representation, is now presented. Said method aims at avoiding the destruction of hardware or code reuse possibilities present in the original specification while performing a storage bandwidth optimization step. Further, said method provides an approach with a complexity far lower than the complexity that would be obtained when applying the method dedicated to systems without loops or function calls to a specification with unrolled loops and inlined function calls. [0034]
  • The application code can be considered to be partitioned in blocks corresponding to function bodies, loop bodies and conditional branches. Each statement of the code belongs to one and only one block. A block can be defined as a code part which contains a flat flow graph. It can contain multiple basic blocks and condition scopes. Even the function hierarchy can be adapted to the needs of ordering freedom. In practice, the terms loop body and block coincide; all loop bodies are defined as separate blocks and the body of a nested loop is part of another block. It is assumed that at least one of said disjunct blocks, each block including part of said data access instructions, is executed a plurality of times, as indicated by the iteration count associated with each block. [0035]
  • A method is presented for determining an optimized memory organization of an essentially digital device represented by a data access instruction representation describing the functionality of said digital device, said representation comprising a plurality of disjunct blocks, each block including part of the data access instructions, at least one block being executed a plurality of times, as indicated by said block's iteration count. [0036]
  • Because the number of iterations of the blocks is mostly different, the storage bandwidth optimization problem grows in two directions. First, the distribution of the global cycle count over the blocks needs to be found within the timing constraints and while optimizing a cost function. Second, a single memory architecture must satisfy the constraints of all blocks, and therefore the global (common or shared for essentially all blocks) extended conflict graph cost must be minimized. [0037]
  • In U.S. Ser. No. 09360140 the distribution of the global cycle count over the blocks was performed as a preprocessing step (750), as shown in FIG. 8a, before the extended conflict graph cost was minimized. Combining all the conflicts of the locally optimized blocks in the global conflict graph can lead in certain cases to a poor result, for instance in cases where reuse of the same conflict over different blocks is essential. The method now presented is intended to improve storage bandwidth optimization for said cases without the need for a preprocessing distribution step and still with acceptable computational complexity. [0038]
  • It is assumed that an overall throughput constraint (maximal or average) is put forward for the entire application. For instance, in a video application the timing constraint is 40 ms to arrive at 25 frames per second. Sometimes, additional timing constraints are given. [0039]
  • The distribution of the cycles over the blocks is crucial. A wrong distribution produces an overly expensive memory architecture, because the memory accesses cannot be ordered well in some blocks while cheap cycles remain available in other blocks. Every single block affects the global cost of the memory subsystem. Therefore, if the cycle budget of one block is too tight (while there is space in other blocks), it causes additional cost. The global cycle budget can be distributed over the blocks in many different ways, as shown in FIG. 1. The left side of the figure gives a program, in code-type format, representing the functionality of a digital system to be designed. It contains 3 consecutive loops (for constructs), defining 3 blocks (within the scope of each for-loop), to be ordered in 500 cycles (hence defining the overall cycle budget given for executing said digital device functionality). The table of FIG. 1 shows three different distributions, each with a different cycle budget distribution over said blocks and a good ordering/schedule matching the distribution. The overall conflict graph for each of said schedules and a memory organization of said digital device compatible with said graph are shown in rows two and three respectively. The resulting conflict graph and cheapest memory architecture are given in the last two rows. Obviously, the second solution (loop-i 2 cycles, loop-j 1 cycle and loop-k 2 cycles) is the cheapest solution. The very poor third distribution even forces a dual-ported memory (due to the assignment of one cycle to loop-i). The illustration here is based on a simplified example to show the problem. In real-life applications the problem is much more difficult. First, because more signals, accesses and blocks are involved, the number of possible distributions increases. Second, the number of block iterations will not be equal for each block, so the impact of one block on the global elapsed time can be much bigger than that of another block. Hence, there is more freedom in the search space. FIG. 1 indicates that different cycle budget distributions can have an enormous effect on the conflict graph and the cost of the memory organization. [0040]
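  • As a non-limiting illustration, the relation between a block cycle budget distribution and the overall cycle budget can be sketched as follows. The function name and all iteration counts are assumptions made for this sketch (FIG. 1 is not reproduced here); with 100 iterations per loop the second distribution exactly fills the 500-cycle budget.

```python
def overall_cycle_budget(block_budgets, iteration_counts):
    """Overall budget = sum over blocks of (cycles per iteration * iterations)."""
    return sum(b * n for b, n in zip(block_budgets, iteration_counts))

# Assumed iteration counts (not given in the text): 100 for each of the 3 loops.
equal_iterations = [100, 100, 100]
second_distribution = [2, 1, 2]  # loop-i, loop-j, loop-k cycles per iteration
total = overall_cycle_budget(second_distribution, equal_iterations)

# With unequal (hypothetical) iteration counts, removing one cycle from a
# frequently executed block saves far more than removing it from a rare one.
unequal_iterations = [10, 1000, 10]
saving_loop_i = (overall_cycle_budget([2, 1, 2], unequal_iterations)
                 - overall_cycle_budget([1, 1, 2], unequal_iterations))
saving_loop_j = (overall_cycle_budget([2, 2, 2], unequal_iterations)
                 - overall_cycle_budget([2, 1, 2], unequal_iterations))
```

  • The second part of the sketch reflects the remark that blocks with different iteration counts weigh differently on the global elapsed time.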
  • A global optimization over all blocks is needed to obtain the globally optimal conflict graph. Ordering the memory accesses on a block-per-block basis will result in a poor global result. The local solution of one block will typically not match the (local) solutions found in other blocks. Together, the local optima will then sum up to an expensive global solution, as shown in FIG. 2. Solving the problem locally per block will lead to different conflict graphs. The total application has one memory architecture and therefore only one global conflict graph. The memory architecture cannot change from one block to the next. Therefore, all the block (local) conflict graphs must be added into one single global conflict graph. A typical conflict mismatch is shown on the left side of FIG. 2. A fully connected global graph is the result, requiring four memories. Ordering the memory accesses with a global view can potentially “reuse” conflicts over different blocks. When the different blocks use the same conflicts, as shown on the right-hand side of FIG. 2, the global conflict graph and the resulting memory architecture are much cheaper. Again, this simple example shows the essence of the problem; the real problem is much more complex. [0041]
  • FIG. 2 thus compares a local optimization method, which optimizes each of the blocks in said code separately, leading to a completely covered conflict graph and an expensive memory organization, with the invented global method, which optimizes the blocks all together, leading to conflict re-use and hence a conflict graph with fewer conflicts and a cheaper memory organization. [0042]
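  • As a non-limiting illustration of conflict re-use, a conflict graph can be modeled as a set of unordered pairs of basic groups, the global graph being the union of the per-block graphs. This representation, the group names A to D and the per-block conflicts below are assumptions made for this sketch and are not taken from FIG. 2.

```python
def edge(a, b):
    """An access conflict between two basic groups, as an unordered pair."""
    return frozenset((a, b))

def global_conflict_graph(local_graphs):
    """Union of the per-block conflict graphs."""
    merged = set()
    for graph in local_graphs:
        merged |= graph
    return merged

# Local optimization: every block picks different conflicts.
local = [
    {edge("A", "B"), edge("C", "D")},
    {edge("A", "C"), edge("B", "D")},
    {edge("A", "D"), edge("B", "C")},
]
# Global optimization: the blocks re-use the same conflicts where possible.
reused = [
    {edge("A", "B"), edge("C", "D")},
    {edge("A", "B"), edge("C", "D")},
    {edge("A", "B")},
]
local_edges = len(global_conflict_graph(local))    # fully connected on 4 groups
reused_edges = len(global_conflict_graph(reused))
```

  • In this sketch the mismatching local choices fill all six possible edges between four basic groups, whereas the re-using choices leave only two conflicts in the global graph.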
  • A method is presented for determining an optimized memory organization of an essentially digital device represented by an appropriate representation, wherein the method determines block cycle budgets while optimizing said scheduling intervals. Hence no preprocessing step for block cycle budget distribution (as shown in FIG. 8a, (750)) is needed. Instead, during calculation, for each overall cycle budget obtained one can assign to each block the related cycle budget selected in (700). Note that the decision making (700) on which block the cycle budget reduction is applied, and hence the use of the related scheduling for said block as determined in (600), is performed after optimization of each block. [0043]
  • Moreover, for substantially all said blocks, said combined or interleaved block cycle budget determination and optimization of said scheduling intervals exploit the same global conflict graph to be optimized. Thus, for substantially all said blocks, one selects a block, denoted the selected block, and determines optimized scheduling intervals of said data access instructions such that execution of said functionality with said digital device is guaranteed to be within all the block cycle budgets. Said block cycle budgets are fixed at that moment, except for the block cycle budget of the selected block, which is reduced. Said determining optimizes access conflicts within a single global conflict graph with respect to an evaluation criterion related to the global memory cost of said digital device. [0044]
  • The method suited for digital systems with a hierarchical flow graph representation, generating a storage cycle budget distribution over multiple blocks, incorporates a method suited for digital systems with an essentially flat flow graph representation, more in particular by placing said latter method in a loop iterating over substantially all the blocks. As motivated before, this as such could lead to a poor global optimum. Therefore a special way of iterating over said blocks is performed. One starts with initially large block cycle budgets, the value of said block cycle budgets for instance being such that no access conflict exists between the data access instructions within one block. In this example one derives said initial block cycle budgets from said no-access-conflict condition. The total cycle budget, being defined by said block cycle budgets and the iteration counts of said blocks, will usually be larger than the predetermined overall cycle budget wanted. Said iteration over said blocks is not such that first the minimal block cycle budget of a first block is determined and thereafter that of another block. Instead a limited and temporary, hence not yet approved, block cycle budget reduction of essentially all said blocks is examined, meaning that its influence on the access conflict cost is determined; then the cycle budget reduction is applied for at least one of said blocks, and the procedure is started all over again. [0045]
  • A method is presented, as shown in FIG. 8b, for determining an optimized memory organization of an essentially digital device represented by an appropriate representation. The method comprises the steps of determining initial block cycle budgets, for instance such that no access conflict exists between the data access instructions within each block (550), and, as long as the overall cycle budget for execution of said functionality with said digital device is larger than said predetermined overall cycle budget: (substep 1) iteratively performing, over substantially all said blocks, a temporary reduction of the selected block cycle budget by a predetermined amount (600)(610) and computing the access conflict cost and overall cycle budget reduction; (substep 2) applying said block cycle budget reduction for at least one selected block (700) and returning to substep 1. Note that (775) in FIG. 8b can in an embodiment of the invention be the block (210) of FIG. 7b. [0046]
  • It must be understood that the initialization step (550) determines for each block a first initial cycle budget, denoted the 1st cycle budget. In step (600) the cycle budget of one block is reduced; hence one obtains a 2nd cycle budget lower than the 1st cycle budget and an associated cost due to that reduction. The cycle budgets of the other blocks, which are not under consideration in that step, are not reduced. The process of reducing the cycle budget of one block is done for essentially each block via iteration (610). So for each block a 2nd cycle budget and an associated cost are obtained. By comparing in step (700) the gain, being the reduction of the overall cycle budget caused by the block cycle budget reduction, with the conflict cost increase, one determines for which block the cycle budget reduction is approved. For that block and only that block (in a single block embodiment) the 2nd cycle budget becomes the 1st cycle budget, while the others remain at their 1st cycle budget. So one performs an update of a single block cycle budget. [0047]
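  • As a non-limiting illustration, the loop of steps (550), (600), (610) and (700) in a single block embodiment can be sketched as follows. The callable conflict_cost is a hypothetical stand-in for the flat-graph optimizer, and the selection score (cycles gained divided by conflict cost increase) is only one assumed way of comparing gain and cost; all names and numbers are chosen for this sketch.

```python
def distribute_budget(initial_budgets, iterations, target, conflict_cost, step=1):
    """Greedy block cycle budget reduction, one approved block per pass."""
    budgets = list(initial_budgets)

    def overall(bs):
        return sum(b * n for b, n in zip(bs, iterations))

    while overall(budgets) > target:
        base_cost = conflict_cost(budgets)
        best = None  # (score, block index)
        for i, b in enumerate(budgets):
            if b - step < 1:
                continue  # this block cannot be reduced any further
            trial = budgets[:i] + [b - step] + budgets[i + 1:]  # temporary reduction
            gain = overall(budgets) - overall(trial)            # cycles saved
            increase = conflict_cost(trial) - base_cost         # extra conflict cost
            score = gain / increase if increase > 0 else float("inf")
            if best is None or score > best[0]:
                best = (score, i)
        if best is None:
            raise ValueError("target cycle budget is not reachable")
        budgets[best[1]] -= step  # approve the reduction for one block only
    return budgets

def toy_cost(budgets, initial=(4, 3, 2), weights=(1, 5, 1)):
    """Hypothetical cost: each removed cycle costs the weight of its block."""
    return sum(w * (i - b) for w, i, b in zip(weights, initial, budgets))

result = distribute_budget([4, 3, 2], iterations=[2, 10, 5],
                           target=30, conflict_cost=toy_cost)
```

  • All non-approved trial reductions are simply dropped at the end of each pass in this sketch, which corresponds to the greedy traversal.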
  • FIG. 3 shows a source code description of the method for performing a global optimization in the storage bandwidth optimization approach. The global optimization is also an incremental approach, starting off with a too large target budget, computing the associated minimal cost and reducing said target budget until the cycle budget requirement is met. Note that for optimization of hierarchical code, hence code with blocks due to loops or function calls, a so-called flat-graph optimizer, used for optimization of flat code, hence code without blocks, can be used. However, this flat graph solver is not used with a fixed cycle budget distribution over said blocks. Instead, cycle budgets for each of said blocks are generated while optimizing. It is important to notice that there is a gain/cost analysis for essentially all blocks and only thereafter a decision for typically one block. [0048]
  • Note that although the method steps (53) with the method step (51) are done for essentially all blocks, there is actually an update for fewer blocks (one or more) based on the gain/cost analysis (700, FIG. 8b). FIG. 4 indicates that after a first iteration (80) the cycle budget of block (11) is reduced from 5 to 4 (resulting, due to the 5 iterations, in an overall cycle budget for that block of 20) while the cycle budgets of the other blocks are unchanged. In the next two iterations the cycle budget of block (13) is reduced. The method step (600) of FIG. 8b is performed by executing a flat-graph type optimizer (90) (as shown in the FIG. 3 code). [0049]
  • The design method is an incremental approach, reducing the global cycle budget every step until the target cycle budget is met. Initially, for instance the memory access ordering is sequential. Therefore every block can be ordered without any conflicts. During the iteration over the blocks, the cycle budget is made smaller for every individual block. Gradually, more conflicts will have to be introduced. The storage bandwidth optimization approach decides which block(s) are reduced in local cycle budget and so which conflicts are added globally. Finally, after multiple steps of decreasing the budgets for the blocks, the global cycle budget is met and a global conflict graph is produced. [0050]
  • By incremental is meant that the cost-cycle budget curve (100) in FIG. 7a is generated in such a way that the computation of point (120) on said curve is based on a previously computed point (110) on said curve. By based on is meant that part of the computations done for point (110) are stored and re-used for computing point (120). Due to the nature of the storage bandwidth optimization problem, said previously computed point preferably has a large cycle budget. Note that said previously determined point (110) can be an initial point or just a point along the curve. FIG. 7b shows a flow chart comprising an initial computation part (200) and a computation part (210) over which an iteration (300) is performed. Said iteration is however not a mere repetition of said computation part (210). While executing said computation part, information is stored (410) on storage means (400) and said information is re-used (420) in the next iteration. Said information can be the final ordering of a block's memory accesses obtained during execution of said computation part (210), for instance for re-initialization of the next iteration, or information on already selected access conflicts for re-use purposes in the conflict cost model, or indications on which blocks are re-scheduled during such execution. Note that within said iteration loop a decision (220) is needed on whether another point of the curve must be computed or not. One possibility is that the circuit designer inputs a range for the cycle budget; another is that the method itself generates a flag indicating that it is of no use to go to lower cycle budgets as no feasible solutions exist. [0051]
  • Note that said cost-cycle budget curve is an optimized cost-cycle budget curve. One does not present just any cost for a cycle budget but the lowest possible cost for that cycle budget. Hence said cost-cycle budget curve is an optimal cost-cycle budget curve. One can even state that said curve is a Pareto optimal curve, meaning that no lower cost is possible for a given cycle budget and no smaller cycle budget is possible for a specified cost. Naturally the presented optimization approach contains heuristics and hence only an approximation (100) of a Pareto optimal curve (500) is obtained, hence the terminology near Pareto optimal cost-cycle budget curve. [0052]
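  • As a non-limiting illustration of Pareto optimality, the following sketch filters a list of (cycle budget, cost) points down to the non-dominated ones. The function name and the sample points are assumptions made for this sketch; they are not results of the method.

```python
def pareto_front(points):
    """Keep the (cycle_budget, cost) points not dominated by any other point.

    A point is dominated if another point is no worse in both cycle budget
    and cost and strictly better in at least one of them.
    """
    front = []
    for budget, cost in sorted(set(points)):  # ascending budget, then cost
        # A larger budget is only worth keeping if it buys a strictly lower cost.
        if not front or cost < front[-1][1]:
            front.append((budget, cost))
    return front

# Hypothetical candidate points (cycle budget, memory cost).
candidates = [(95, 0), (90, 1), (85, 1), (85, 3), (75, 4), (80, 4)]
front = pareto_front(candidates)
```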
  • As an example, the proposed method initializes with a sequential ordering. Every memory access has its own time slot. A block containing X memory accesses will have an initial cycle budget of X cycles (assuming one cycle per memory access). Due to this type of ordering, the global conflict graph will not contain any conflicts. In the successive steps of the algorithm the global budget will shrink. Every step, (at least) one of the blocks will reduce in length. The reduction of the block is by a predetermined amount of at least one cycle. Said predetermined amount can for instance be selected by the system designer depending on a trade-off between design speed (completion of the method) and accuracy. A large predetermined amount will increase the design speed but reduce the accuracy. Said predetermined amount can also be selected by the computation system itself based on estimations of the effect of the related block cycle reduction, said estimations being determined with easier, more approximate methods than the method step itself. A combination of designer selection and steering by the computation system itself, such as selection by said designer of a range within which said computation system can choose the block cycle reduction step, is also possible. [0053]
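  • As a non-limiting illustration of the sequential initialization, the following sketch gives every access its own cycle and lists the conflicts of a schedule as the pairs of basic groups accessed in the same cycle. The data layout, the function names and the example block are assumptions made for this sketch.

```python
from itertools import combinations

def sequential_schedule(accesses):
    """Give every memory access its own time slot (one cycle per access)."""
    return [(group, slot) for slot, group in enumerate(accesses)]

def conflicts(schedule):
    """Pairs of basic groups accessed in the same cycle.

    Modeling assumption: a pair of identical groups stands for a self conflict.
    """
    found = set()
    for (g1, s1), (g2, s2) in combinations(schedule, 2):
        if s1 == s2:
            found.add(tuple(sorted((g1, g2))))
    return found

block = ["A", "B", "A", "C"]       # hypothetical accesses to basic groups
initial = sequential_schedule(block)
initial_budget = len(block)        # X accesses -> X cycles
# The same block squeezed into 2 cycles (a hypothetical ordering).
compressed = [("A", 0), ("B", 0), ("A", 1), ("C", 1)]
```

  • The sequential schedule has no conflicts, whereas the compressed ordering introduces conflicts between the groups that now share a cycle.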
  • However, depending on the number of iterations of the block concerned, the impact on the global budget can be much bigger. The global conflict cost change is calculated for the case of the block reduction, but the reduction is not approved yet. Only the block(s) having the biggest gain (based on a change in total cycle budget and/or a change in conflict cost) are actually reduced in size. All the other ordering results in this step are discarded. The cycle budget reduction is continued until the target cycle budget is reached. Due to the block cycle budget reduction, additional conflicts arise. Note that this basic algorithm is greedy, since only a single path is explored to reach the target cycle budget. The traversal of the cycle search space can be made less greedy, however. At every step, multiple reduction possibilities exist. Instead of discarding non-selected block ordering information (as proposed above), these can be selected and explored further. Many of the branches will be equivalent to the already found greedy solutions. Due to this property, the exploration will not explode but extra solutions can still be found. Moreover, a lower bound can also cut off some of the potential branching paths. Note however that, due to the heuristics in the flat graph solver, the solution may be different even though the distribution of cycles over the blocks is equal. In this way, the longer the tool runs, the more and (maybe) better solutions can be found. [0054]
  • FIG. 4 shows at the left side code (10) describing the digital system. Within said example code three blocks (11)(12)(13), each located within a loop (for constructs), are recognized. When executing the first step of the method, (50) in FIG. 3, (550) in FIG. 8b, one obtains for each of said three blocks a non-conflicting, for instance sequential, scheduling (60) as indicated in FIG. 4, with an overall cycle budget of 95 cycles and a scheduling of the instructions of block (11) in 5 cycles, resulting, due to the 5 iterations, in an overall cycle budget for that block of 25. Then one performs the step (775, FIG. 8b) (51, FIG. 3). Said step is repeated until the overall cycle budget is equal to or lower than the target cycle budget, here 75 cycles, as specified by the condition (52, FIG. 3) and the cycle budget condition in FIG. 8b. [0055]
  • To further improve the global result and to avoid a long execution time and unstable behavior, two new inputs can be entered to the flat-graph scheduler. First of all, a list of reusable conflicts can be specified. The internal cost function is adapted to (re)use conflicts which are already used by other blocks if possible. [0056]
  • Before starting said repeating of said substeps, an empty set of re-usable access conflicts is determined for at least one of said blocks. Then, while performing said step of optimizing access conflicts, one takes into account the re-usable access conflicts within said sets of re-usable access conflicts not related to the selected block, being the block whose block cycle budget reduction is currently under investigation. Finally, in substep 2 the set of re-usable access conflicts of said selected block is updated. In a further embodiment, scheduling intervals of blocks from which access conflicts within said block's set of re-usable access conflicts are not re-used are not modified during said step of optimizing access conflicts. Note that whether an access conflict is re-usable can easily be determined by comparing the conflict graphs of the two related blocks. When an access conflict connects at least one basic group which is not accessed by one of said related blocks, then said access conflict is not re-usable and hence not re-used. [0057]
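  • As a non-limiting illustration of the re-usability test, the following sketch keeps only those conflicts of other blocks whose two basic groups are both accessed by the selected block. Representing a conflict as a frozenset of two group names, as well as the names themselves, is an assumption made for this sketch.

```python
def reusable_conflicts(candidate_conflicts, groups_accessed_by_block):
    """Conflicts of other blocks that the selected block could re-use.

    A conflict is a frozenset of two basic groups. It is re-usable only if
    the selected block accesses both of them.
    """
    return {c for c in candidate_conflicts if c <= groups_accessed_by_block}

# Hypothetical conflicts already present in other blocks.
other_blocks_conflicts = {frozenset(("A", "B")), frozenset(("B", "D"))}
# Hypothetical basic groups accessed by the selected block.
selected_block_groups = {"A", "B", "C"}
usable = reusable_conflicts(other_blocks_conflicts, selected_block_groups)
```

  • The conflict between B and D is dropped here because the selected block never accesses basic group D.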
  • Second, the ordering freedom is limited. The ordering freedom returned to a block is based on the final ordering of the previous step. The memory access is scheduled between the ASAP and ALAP time. Both the ASAP and ALAP are put close to the location of the previous ordering. This is a first aspect of the incremental nature of the method. The steps of determining optimized scheduling intervals are thus initialized with scheduling intervals substantially near, but a bit larger than, the scheduling intervals determined in an earlier iteration. [0058]
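  • As a non-limiting illustration, limiting the ordering freedom around the previous ordering can be sketched as follows, ASAP and ALAP being the as-soon-as-possible and as-late-as-possible times. The slack parameter, the slot numbering from 0 and the clamping to the reduced budget are assumptions made for this sketch, not details given in the text.

```python
def scheduling_window(previous_slot, new_budget, slack=1):
    """ASAP/ALAP bounds for one access, centred on its previous slot.

    Slots are numbered 0 .. new_budget - 1. The window is the previous slot
    widened by `slack` cycles on each side and clipped to the new budget.
    """
    last = new_budget - 1
    centre = min(previous_slot, last)  # previous slot may fall outside the new budget
    asap = max(0, centre - slack)
    alap = min(last, centre + slack)
    return asap, alap

# An access that sat in slot 3 of a block whose budget is reduced to 4 cycles.
window = scheduling_window(previous_slot=3, new_budget=4)
```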
  • The algorithm is sped up further by reusing ordering results which did not change. Although much ordering information is discarded in a step, this does not mean it is useless. By keeping track of which blocks have to be rescheduled, the tool execution time can be decreased drastically. This happens especially in large applications containing many independent blocks. This keeping track, and hence storage of said change/non-change information, is a second aspect of the incremental nature of the method. [0059]
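  • As a non-limiting illustration of keeping track of which blocks have to be rescheduled, the following sketch stores ordering results keyed by the inputs that are assumed to determine them (block, cycle budget and set of re-usable conflicts). The class, the key choice and the toy scheduler are all hypothetical.

```python
class ScheduleCache:
    """Remembers block orderings so that unchanged blocks are not rescheduled."""

    def __init__(self, schedule_block):
        self._schedule_block = schedule_block  # hypothetical scheduler callback
        self._results = {}
        self.scheduler_calls = 0

    def get(self, block, budget, reusable):
        key = (block, budget, frozenset(reusable))
        if key not in self._results:  # inputs changed: the block is rescheduled
            self.scheduler_calls += 1
            self._results[key] = self._schedule_block(block, budget, reusable)
        return self._results[key]

def toy_scheduler(block, budget, reusable):
    """Stand-in for the flat-graph scheduler; returns a dummy ordering."""
    return list(range(budget))

cache = ScheduleCache(toy_scheduler)
cache.get("block11", 4, {("A", "B")})
cache.get("block11", 4, {("A", "B")})  # same inputs: served from the cache
cache.get("block13", 3, {("A", "B")})
```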
  • FIG. 8c shows the optimized memory organization determination method, comprising the steps of initial scheduling and deriving of an initial block cycle budget (1550) (which can be access conflict free in an embodiment of the invention), temporarily reducing a block cycle budget, determining optimized scheduling intervals and computing the overall cycle budget (1600). Said step (1600) is done in a loop, such that it is executed for substantially all blocks. The method further comprises the step of (finally) reducing the block cycle budget (1700). The steps 1550, 1600, 1700 are done in another loop, as long as the overall cycle budget is larger than the predetermined overall cycle budget. [0060]
  • Due to the incremental nature, the steps in this last iteration loop can exploit information from a previous iteration step, by storing information of a current iteration and loading it for the next optimization. Each iteration of said second loop generates a pair of numbers, more in particular a memory cost and a cycle budget, defining a point on the Pareto curve. [0061]
  • The following references are each incorporated, in their entirety, by reference: [Erik Brockmeyer, Arnout Vandecappelle, Sven Wuytack, Francky Catthoor “Low power storage cycle budget distribution tool support for hierarchical graphs” 13th international symposium on system synthesis (ISSS) Madrid, Spain, Sep. 20-22, 2000.], [Erik Brockmeyer, Arnout Vandecappelle, Francky Catthoor “Systematic Cycle budget versus System Power Trade-off: a New Perspective on System Exploration of Real-time Data-dominated Applications” International Symposium on Low Power Electronics and Design (ISLPED) pp. 137-142, Rapallo, Italy, Aug. 2000.]. [0062]

Claims (36)

What is claimed is:
1. A method of determining optimized scheduling intervals and optimized access conflicts useful for determining an optimized memory organization of an essentially digital device, the method comprising:
determining an initial scheduling of the data access instructions for a plurality of disjunct blocks, wherein each of the blocks include part of the data access instructions, and wherein at least one of the blocks is executed a plurality of times that is defined by an iteration count;
deriving from the initial scheduling an initial block cycle budget for each block;
while the overall cycle budget for performance of the digital device is larger than a predetermined overall cycle budget, repeating the method comprising:
(a) for substantially all of the blocks, performing the method comprising:
temporarily reducing a block cycle budget for a selected block by a predetermined amount;
determining optimized scheduling intervals of the data access instructions such that the performance of the digital device is guaranteed to be within the block cycle budgets, wherein determining the optimized scheduling intervals comprises optimizing access conflicts with respect to an evaluation criterion related to the memory cost of the digital device;
computing the overall cycle budget resulting from the optimized scheduling intervals; and
(b) reducing the block cycle budget for at least one selected block, the selection of the block being based at least in part upon the memory cost and an overall cycle budget reduction.
2. The method of claim 1, wherein the initial scheduling is such that no access conflict exists between the data access instructions within each block.
3. The method of claim 1, wherein the initial scheduling comprises a sequential ordering.
4. The method of claim 1, wherein the predetermined amount is at least 1.
5. The method of claim 1, additionally comprising:
before repeating the acts (a) and (b), determining an empty set of re-usable access conflicts for at least one of the blocks;
while optimizing access conflicts, utilizing re-usable access conflicts within the sets of re-usable access conflicts that are not related to the selected block; and
updating the set of re-usable access conflicts of the selected block in act (b).
6. The method of claim 5, wherein scheduling intervals of blocks from which access conflicts within the set of re-usable access conflict are not re-used are not modified while optimizing access conflicts.
7. The method of claim 1, additionally comprising selecting an optimized memory organization.
8. A method of determining a cost-cycle budget curve for an essentially digital device that is represented by a digital representation describing the functionality of the digital device, the representation comprising data access instructions on basic groups of scalar signals, the data access instructions having scheduling intervals, the representation comprising a plurality of disjunct blocks, each block including part of the data access instructions, at least one block being executed at least a plurality of times that is indicated by an iteration count, the method comprising:
generating a cost-cycle budget curve that compares the cost of a memory organization of the digital device versus the cycle budget, wherein the cost-cycle budget curve is incrementally generated.
9. The method of claim 8, wherein the cost-cycle budget curve is near Pareto-optimal.
10. The method of claim 8, wherein generating a cost-cycle budget curve comprises:
determining an initial scheduling of the data access instructions such that between the data access instructions within each block no access conflict exists in each block and deriving therefrom for each block an initial block cycle budget;
computing a first overall cycle budget resulting from the initial schedule and a first memory organization in accordance with the initial scheduling of the data access instructions, the first overall cycle budget and the cost of the digital device with the first memory organization defining a first point on the cost-cycle budget curve;
while the overall cycle budget for execution of the functionality with the digital device is reducible, repeating the acts:
(a) for substantially all the blocks, performing the method comprising:
temporarily reducing the block cycle budget by a predetermined amount;
determining optimized scheduling intervals of the data access instructions such that execution of the functionality with the digital device is guaranteed to be within block cycle budgets, the determining of the optimized scheduling intervals comprising optimizing access conflicts with respect to an evaluation criterion that is related to the cost; and
computing an overall cycle budget resulting from the optimized scheduling, wherein computing utilizes at least in part the iteration count of the blocks;
(b) reducing the block cycle budget for at least one selected block; and
(c) adding the overall cycle budget and related cost to the cost-cycle budget curve.
11. The method of claim 10, wherein the initial scheduling is a sequential ordering.
12. The method of claim 10, wherein the predetermined amount is at least one.
13. The method of claim 10, additionally comprising:
before repeating the acts (a), (b), and (c), determining an empty set of re-usable access conflicts for at least one of the blocks;
while optimizing access conflicts, utilizing re-usable access conflicts within the sets of re-usable access conflicts not related to the selected block; and
updating the set of re-usable access conflicts of the selected block in act (b).
14. The method of claim 13, wherein scheduling intervals of blocks from which access conflicts within the blocks set of re-usable access conflict are not re-used and are not modified while optimizing access conflicts.
15. The method of claim 8, wherein the cost is selected from the group comprising: memory area/size cost, memory related power consumption of the digital device, latency of the digital device, area cost of the digital device, and combinations thereof.
16. The method of claim 8, wherein generating the cost-cycle budget curve initially generates high cycle budgets and progressively generates lower cycle budgets, wherein generating the lower cycle budget exploits the results from high cycle budget computations.
17. The method of claim 16, wherein the exploitation of higher cycle budget computations in lower cycle budget computations comprises re-using access conflicts.
18. A method of determining an optimized memory organization of an essentially digital device, wherein an optimized memory organization is determined based on the cost-cycle budget curve generated by the method of claim 8.
19. A method of determining an optimized memory organization of an essentially digital device represented by a representation describing the functionality of the digital device, the representation comprising data access instructions on basic groups of scalar signals, the data access instructions having scheduling intervals, the representation comprising a plurality of disjunct blocks, each block including part of the data access instructions, at least one block being executed at least a plurality of times, the optimized memory organization being such that execution of the functionality with the digital device is guaranteed to be within a predetermined overall cycle budget, the method comprising:
determining block cycle budgets while optimizing scheduling intervals.
20. The method of claim 19, wherein determining the block cycle budget while optimizing the scheduling interval comprises, interleaving of determining block cycle budgets and optimizing scheduling intervals.
21. The method of claim 19, wherein the block cycle budgets are determined substantially simultaneously while the scheduling intervals are optimized.
22. The method of claim 19, wherein optimizing scheduling intervals comprises optimizing a global conflict graph that is shared by each of the blocks.
23. A method of determining an optimized memory organization of an essentially digital device represented by a representation describing the functionality of the digital device, the representation comprising data access instructions on basic groups of scalar signals, the data access instructions having scheduling intervals, the representation comprising a plurality of disjunct blocks, each block including part of the data access instructions, at least one block being executed at least a plurality of times, the optimized memory organization being such that the performance of the digital device is guaranteed to be within a predetermined overall cycle budget, the method comprising:
(a) determining initial block cycle budgets such that no access conflict exists between the data access instructions within each block;
(b) temporarily reducing, in an iterative process, a block cycle budget by a predetermined amount, and computing the access conflict cost and overall cycle budget reduction of such reduced block cycle budget;
(c) reducing the block cycle budget for at least one selected block; and
(d) returning to act (b) if the overall cycle budget for execution of the functionality with the digital device is larger than the predetermined overall cycle budget.
24. A method optimizing the scheduling of data instructions, the method comprising:
determining a scheduling of data instructions for a plurality of blocks;
determining a cycle budget for substantially all of the blocks, wherein the cycle budget is determined based at least in part upon the determined initial scheduling of a block;
identifying one of the blocks for a cycle budget reduction;
reducing the cycle budget of the identified block;
modifying the scheduling of at least one of the blocks;
calculating a cumulative cycle budget for the blocks; and
repeating the identifying, reducing and modifying, wherein the modifying includes modifying a modified scheduling until the cumulative cycle budget for substantially all of the blocks satisfies a predetermined cycle budget.
25. The method of claim 24, wherein the reducing the cycle budget of the identified block only modifies the scheduling of the identified block.
26. The method of claim 24, wherein the reducing the cycle budget of the identified block modifies the scheduling of instructions for substantially all of the blocks.
27. The method of claim 24, wherein the initial scheduling is determined such that no access conflict exists in any of the blocks.
28. The method of claim 24, wherein the modifying the scheduling of at least one of the blocks increases the amount of access conflicts in at least one block when compared with the initial scheduling.
29. The method of claim 24, wherein the modifying the scheduling of at least one of the blocks results in an equal amount or more access conflicts when compared to the initial scheduling.
30. A system for optimizing the scheduling of data instructions, the system comprising:
means for determining a scheduling of data instructions for a plurality of blocks;
means for determining a cycle budget for substantially all of the blocks, wherein the cycle budget is determined based at least in part upon the determined initial scheduling of a block;
means for identifying one of the blocks for a cycle budget reduction;
means for reducing the cycle budget of the identified block; and
means for modifying the scheduling and cycle budget of at least one of the blocks until a cumulative cycle budget for substantially all of the blocks satisfies a predetermined cycle budget.
31. The system of claim 30, wherein the means for reducing only modifies the scheduling of the identified block.
32. The system of claim 30, wherein the means for reducing modifies the scheduling of instructions for substantially each of the blocks.
33. The system of claim 30, wherein the means for determining an initial scheduling enforces that no access conflict exists in any of the blocks.
34. The system of claim 30, wherein the means for modifying the scheduling increases the amount of access conflicts in at least one block when compared with the initial scheduling.
35. The system of claim 30, wherein the means for modifying scheduling results in an equal amount or more access conflicts when compared with the initial scheduling.
36. A digital device having an optimized memory organization, wherein the design of the memory organization is generated by the method comprising:
determining an initial scheduling of data instructions for a plurality of blocks, wherein the data instructions are to be executed by the digital device;
determining a cycle budget for substantially each of the blocks, wherein the cycle budget is determined based at least in part upon the determined initial scheduling of the block;
identifying one of the blocks for a cycle budget reduction;
reducing the cycle budget of the identified block;
modifying the scheduling of at least one of the blocks;
repeating the identifying, reducing and modifying acts, wherein the modifying act includes modifying a modified scheduling, until a cumulative cycle budget for substantially all of the blocks satisfies a predetermined cycle budget, and wherein the modified scheduling is used to define the design of a memory organization for the digital device.
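The identify, reduce, modify and repeat acts recited above can be sketched as a simple loop. This is an illustration under stated assumptions, not the claimed implementation: each block is reduced to a count of data accesses, the initial scheduling places one access per cycle, rescheduling spreads accesses evenly over the remaining cycles, and the block to tighten is chosen greedily as the one whose reduction adds the fewest conflicts. The names `min_conflicts` and `distribute_cycle_budget`, and the greedy selection rule, are hypothetical. The sketch also does not weight blocks by how often they execute.

```python
def min_conflicts(num_accesses, budget):
    """Fewest same-cycle access pairs when accesses are spread evenly."""
    q, r = divmod(num_accesses, budget)
    # r cycles hold q + 1 accesses, the remaining budget - r cycles hold q
    return r * (q + 1) * q // 2 + (budget - r) * q * (q - 1) // 2


def distribute_cycle_budget(accesses_per_block, target):
    """Tighten per-block budgets until their sum meets `target`."""
    # Initial scheduling: one access per cycle, so no access conflicts.
    budgets = dict(accesses_per_block)
    while sum(budgets.values()) > target:
        candidates = [b for b, cycles in budgets.items() if cycles > 1]
        if not candidates:
            raise ValueError("target cycle budget cannot be met")

        def added_conflicts(block):
            n = accesses_per_block[block]
            return (min_conflicts(n, budgets[block] - 1)
                    - min_conflicts(n, budgets[block]))

        # Identify the block whose reduction is cheapest (assumed heuristic),
        # then reduce its budget; rescheduling is implied by min_conflicts.
        block = min(candidates, key=added_conflicts)
        budgets[block] -= 1
    return budgets


result = distribute_cycle_budget({"loop_a": 4, "loop_b": 2}, target=4)
print(result)
```

The resulting per-block budgets, together with the conflicts they imply, would then be the input for deciding how accessed data is assigned to memories.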
US09/823,409 1998-07-24 2001-03-30 Method for determining an optimized memory organization of a digital device Expired - Fee Related US6449747B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US09/823,409 US6449747B2 (en) 1998-07-24 2001-03-30 Method for determining an optimized memory organization of a digital device

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US9412498P 1998-07-24 1998-07-24
US09/360,140 US6421809B1 (en) 1998-07-24 1999-07-23 Method for determining a storage bandwidth optimized memory organization of an essentially digital device
US09/823,409 US6449747B2 (en) 1998-07-24 2001-03-30 Method for determining an optimized memory organization of a digital device

Related Parent Applications (1)

Application Number Relation Publication Number Priority Date Filing Date Title
US09/360,140 Continuation US6421809B1 (en) 1998-07-24 1999-07-23 Method for determining a storage bandwidth optimized memory organization of an essentially digital device

Publications (2)

Publication Number Publication Date
US20010052106A1 (en) 2001-12-13
US6449747B2 (en) 2002-09-10

Family ID=22243214

Family Applications (3)

Application Number Status Publication Number Priority Date Filing Date Title
US09/360,042 Expired - Fee Related US6609088B1 (en) 1998-07-24 1999-07-23 Method for determining an optimized memory organization of a digital device
US09/360,140 Expired - Lifetime US6421809B1 (en) 1998-07-24 1999-07-23 Method for determining a storage bandwidth optimized memory organization of an essentially digital device
US09/823,409 Expired - Fee Related US6449747B2 (en) 1998-07-24 2001-03-30 Method for determining an optimized memory organization of a digital device

Country Status (2)

Country Link
US (3) US6609088B1 (en)
EP (2) EP0974898A3 (en)

Families Citing this family (99)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7266725B2 (en) 2001-09-03 2007-09-04 Pact Xpp Technologies Ag Method for debugging reconfigurable architectures
DE19651075A1 (en) 1996-12-09 1998-06-10 Pact Inf Tech Gmbh Unit for processing numerical and logical operations, for use in processors (CPU's), multi-computer systems, data flow processors (DFP's), digital signal processors (DSP's) or the like
DE19654595A1 (en) 1996-12-20 1998-07-02 Pact Inf Tech Gmbh I0 and memory bus system for DFPs as well as building blocks with two- or multi-dimensional programmable cell structures
EP1329816B1 (en) 1996-12-27 2011-06-22 Richter, Thomas Method for automatic dynamic unloading of data flow processors (dfp) as well as modules with bidimensional or multidimensional programmable cell structures (fpgas, dpgas or the like)
US6542998B1 (en) 1997-02-08 2003-04-01 Pact Gmbh Method of self-synchronization of configurable elements of a programmable module
US8686549B2 (en) 2001-09-03 2014-04-01 Martin Vorbach Reconfigurable elements
DE19861088A1 (en) 1997-12-22 2000-02-10 Pact Inf Tech Gmbh Repairing integrated circuits by replacing subassemblies with substitutes
JP2003505753A (en) 1999-06-10 2003-02-12 ペーアーツェーテー インフォルマツィオーンステヒノロギー ゲゼルシャフト ミット ベシュレンクテル ハフツング Sequence division method in cell structure
US7212201B1 (en) 1999-09-23 2007-05-01 New York University Method and apparatus for segmenting an image in order to locate a part thereof
JP3722351B2 (en) * 2000-02-18 2005-11-30 シャープ株式会社 High level synthesis method and recording medium used for the implementation
US7010788B1 (en) * 2000-05-19 2006-03-07 Hewlett-Packard Development Company, L.P. System for computing the optimal static schedule using the stored task execution costs with recent schedule execution costs
DE50115584D1 (en) 2000-06-13 2010-09-16 Krass Maren PIPELINE CT PROTOCOLS AND COMMUNICATION
US7343594B1 (en) 2000-08-07 2008-03-11 Altera Corporation Software-to-hardware compiler with symbol set inference analysis
EP1356400A2 (en) 2000-08-07 2003-10-29 Altera Corporation Inter-device communication interface
US8058899B2 (en) 2000-10-06 2011-11-15 Martin Vorbach Logic cell array and bus system
US6865527B2 (en) * 2000-12-18 2005-03-08 Hewlett-Packard Development Company, L.P. Method and apparatus for computing data storage assignments
US7444531B2 (en) 2001-03-05 2008-10-28 Pact Xpp Technologies Ag Methods and devices for treating and processing data
US7844796B2 (en) 2001-03-05 2010-11-30 Martin Vorbach Data processing device and method
US9037807B2 (en) 2001-03-05 2015-05-19 Pact Xpp Technologies Ag Processor arrangement on a chip including data processing, memory, and interface elements
US7249242B2 (en) 2002-10-28 2007-07-24 Nvidia Corporation Input pipeline registers for a node in an adaptive computing engine
US7752419B1 (en) 2001-03-22 2010-07-06 Qst Holdings, Llc Method and system for managing hardware resources to implement system functions using an adaptive computing architecture
US7653710B2 (en) 2002-06-25 2010-01-26 Qst Holdings, Llc. Hardware task manager
US8843928B2 (en) 2010-01-21 2014-09-23 Qst Holdings, Llc Method and apparatus for a general-purpose, multiple-core system for implementing stream-based computations
US6836839B2 (en) 2001-03-22 2004-12-28 Quicksilver Technology, Inc. Adaptive integrated circuitry with heterogeneous and reconfigurable matrices of diverse and adaptive computational units having fixed, application specific computational elements
US7962716B2 (en) 2001-03-22 2011-06-14 Qst Holdings, Inc. Adaptive integrated circuitry with heterogeneous and reconfigurable matrices of diverse and adaptive computational units having fixed, application specific computational elements
US6889275B2 (en) * 2001-04-23 2005-05-03 Interuniversitaire Micro-Elektronica Centrum (Imec Vzw) Resource interconnection patterns in a customized memory organization context
US6577678B2 (en) 2001-05-08 2003-06-10 Quicksilver Technology Method and system for reconfigurable channel coding
WO2002103551A1 (en) * 2001-06-15 2002-12-27 Cadence Design Systems, Inc. Enhancing mergeability of datapaths and reducing datapath widths by rebalancing data flow topology
AU2002347560A1 (en) * 2001-06-20 2003-01-02 Pact Xpp Technologies Ag Data processing method
US7996827B2 (en) * 2001-08-16 2011-08-09 Martin Vorbach Method for the translation of programs for reconfigurable architectures
US7434191B2 (en) 2001-09-03 2008-10-07 Pact Xpp Technologies Ag Router
US8686475B2 (en) 2001-09-19 2014-04-01 Pact Xpp Technologies Ag Reconfigurable elements
US7046635B2 (en) 2001-11-28 2006-05-16 Quicksilver Technology, Inc. System for authorizing functionality in adaptable hardware devices
US8412915B2 (en) 2001-11-30 2013-04-02 Altera Corporation Apparatus, system and method for configuration of adaptive integrated circuitry having heterogeneous computational elements
US6986021B2 (en) 2001-11-30 2006-01-10 Quick Silver Technology, Inc. Apparatus, method, system and executable module for configuration and operation of adaptive integrated circuitry having fixed, application specific computational elements
US20030108012A1 (en) * 2001-12-12 2003-06-12 Quicksilver Technology, Inc. Method and system for detecting and identifying scrambling codes
US7215701B2 (en) 2001-12-12 2007-05-08 Sharad Sambhwani Low I/O bandwidth method and system for implementing detection and identification of scrambling codes
US7403981B2 (en) 2002-01-04 2008-07-22 Quicksilver Technology, Inc. Apparatus and method for adaptive multimedia reception and transmission in communication environments
WO2003060747A2 (en) 2002-01-19 2003-07-24 Pact Xpp Technologies Ag Reconfigurable processor
AU2003214003A1 (en) 2002-02-18 2003-09-09 Pact Xpp Technologies Ag Bus systems and method for reconfiguration
US8914590B2 (en) 2002-08-07 2014-12-16 Pact Xpp Technologies Ag Data processing method and device
US20030212770A1 (en) * 2002-05-10 2003-11-13 Sreekrishna Kotnur System and method of controlling software components
US7328414B1 (en) 2003-05-13 2008-02-05 Qst Holdings, Llc Method and system for creating and programming an adaptive computing engine
US7660984B1 (en) 2003-05-13 2010-02-09 Quicksilver Technology Method and system for achieving individualized protected space in an operating system
US7093255B1 (en) * 2002-05-31 2006-08-15 Quicksilver Technology, Inc. Method for estimating cost when placing operations within a modulo scheduler when scheduling for processors with a large number of function units or reconfigurable data paths
US20040006667A1 (en) * 2002-06-21 2004-01-08 Bik Aart J.C. Apparatus and method for implementing adjacent, non-unit stride memory access patterns utilizing SIMD instructions
AU2003286131A1 (en) 2002-08-07 2004-03-19 Pact Xpp Technologies Ag Method and device for processing data
US7657861B2 (en) 2002-08-07 2010-02-02 Pact Xpp Technologies Ag Method and device for processing data
US6952821B2 (en) * 2002-08-19 2005-10-04 Hewlett-Packard Development Company, L.P. Method and system for memory management optimization
US8108656B2 (en) 2002-08-29 2012-01-31 Qst Holdings, Llc Task definition for specifying resource requirements
EP1537486A1 (en) 2002-09-06 2005-06-08 PACT XPP Technologies AG Reconfigurable sequencer structure
US7937591B1 (en) 2002-10-25 2011-05-03 Qst Holdings, Llc Method and system for providing a device which can be adapted on an ongoing basis
US7484079B2 (en) * 2002-10-31 2009-01-27 Hewlett-Packard Development Company, L.P. Pipeline stage initialization via task frame accessed by a memory pointer propagated among the pipeline stages
US7107199B2 (en) * 2002-10-31 2006-09-12 Hewlett-Packard Development Company, L.P. Method and system for the design of pipelines of processors
US8276135B2 (en) 2002-11-07 2012-09-25 Qst Holdings Llc Profiling of software and circuit designs utilizing data operation analyses
US7225301B2 (en) 2002-11-22 2007-05-29 Quicksilver Technologies External memory controller node
US7356805B2 (en) * 2003-01-02 2008-04-08 University Of Rochester Temporal affinity analysis using reuse signatures
EP1676208A2 (en) 2003-08-28 2006-07-05 PACT XPP Technologies AG Data processing device and method
US7127560B2 (en) * 2003-10-14 2006-10-24 International Business Machines Corporation Method of dynamically controlling cache size
JP2005173648A (en) * 2003-12-05 2005-06-30 Matsushita Electric Ind Co Ltd Method and device for high-level synthesis
WO2005089350A2 (en) * 2004-03-16 2005-09-29 Mark Pomponio Custom database system and method of building the same
US20100011018A1 (en) * 2004-03-16 2010-01-14 Vision Genesis, Inc. Custom database system and method of building the same
US7353491B2 (en) * 2004-05-28 2008-04-01 Peter Pius Gutberlet Optimization of memory accesses in a circuit design
US20060075157A1 (en) * 2004-09-28 2006-04-06 Paul Marchal Programmable memory interfacing device for use in active memory management
US7681187B2 (en) * 2005-03-31 2010-03-16 Nvidia Corporation Method and apparatus for register allocation in presence of hardware constraints
US8473934B2 (en) 2005-07-15 2013-06-25 Imec Method for mapping applications on a multiprocessor platform/system
EP1974265A1 (en) 2006-01-18 2008-10-01 PACT XPP Technologies AG Hardware definition method
US7693257B2 (en) * 2006-06-29 2010-04-06 Accuray Incorporated Treatment delivery optimization
US8365113B1 (en) * 2007-01-10 2013-01-29 Cadence Design Systems, Inc. Flow methodology for single pass parallel hierarchical timing closure of integrated circuit designs
US8977995B1 (en) * 2007-01-10 2015-03-10 Cadence Design Systems, Inc. Timing budgeting of nested partitions for hierarchical integrated circuit designs
US20080182021A1 (en) * 2007-01-31 2008-07-31 Simka Harsono S Continuous ultra-thin copper film formed using a low thermal budget
US7685181B2 (en) * 2007-02-26 2010-03-23 International Business Machines Corporation Method and system for utilizing a hierarchical bitmap structure to provide a fast and reliable mechanism to represent large deleted data sets in relational databases
US8122442B2 (en) * 2008-01-31 2012-02-21 Oracle America, Inc. Method and system for array optimization
US8755515B1 (en) 2008-09-29 2014-06-17 Wai Wu Parallel signal processing system and method
US8099693B2 (en) * 2008-11-04 2012-01-17 Cadence Design Systems, Inc. Methods, systems, and computer program product for parallelizing tasks in processing an electronic circuit design
US8656332B2 (en) * 2009-02-26 2014-02-18 International Business Machines Corporation Automated critical area allocation in a physical synthesized hierarchical design
JP2011081457A (en) * 2009-10-02 2011-04-21 Sony Corp Information processing apparatus and method
US9681455B2 (en) * 2010-01-28 2017-06-13 Alcatel Lucent Methods for reducing interference in a communication system
WO2013048413A1 (en) * 2011-09-29 2013-04-04 Intel Corporation Cache and/or socket sensitive multi-processor cores breadth-first traversal
JP5687603B2 (en) 2011-11-09 2015-03-18 株式会社東芝 Program conversion apparatus, program conversion method, and conversion program
US8959469B2 (en) 2012-02-09 2015-02-17 Altera Corporation Configuring a programmable device using high-level language
US10354886B2 (en) 2013-02-22 2019-07-16 Synopsys, Inc. Hybrid evolutionary algorithm for triple-patterning
DE112014003741T5 (en) * 2013-08-15 2016-05-25 Synopsys, Inc. Detect and display a remediation guide for multi-structuring
US9747407B2 (en) 2014-02-20 2017-08-29 Synopsys, Inc. Categorized stitching guidance for triple-patterning technology
US9710590B2 (en) * 2014-12-31 2017-07-18 Arteris, Inc. Estimation of chip floorplan activity distribution
US20160202909A1 (en) * 2015-01-14 2016-07-14 College Of William And Mary I/o scheduling method using read prioritization to reduce application delay
US9871895B2 (en) * 2015-04-24 2018-01-16 Google Llc Apparatus and methods for optimizing dirty memory pages in embedded devices
WO2017033336A1 (en) * 2015-08-27 2017-03-02 三菱電機株式会社 Circuit design assistance device and circuit design assistance program
US10372037B2 (en) 2015-10-30 2019-08-06 Synopsys, Inc. Constructing fill shapes for double-patterning technology
US10395001B2 (en) 2015-11-25 2019-08-27 Synopsys, Inc. Multiple patterning layout decomposition considering complex coloring rules
US9495141B1 (en) * 2015-12-01 2016-11-15 International Business Machines Corporation Expanding inline function calls in nested inlining scenarios
US10311195B2 (en) 2016-01-15 2019-06-04 Synopsys, Inc. Incremental multi-patterning validation
EP3493160A1 (en) * 2016-07-29 2019-06-05 Sony Corporation Image processing device and image processing method
US10467195B2 (en) 2016-09-06 2019-11-05 Samsung Electronics Co., Ltd. Adaptive caching replacement manager with dynamic updating granulates and partitions for shared flash-based storage system
US10455045B2 (en) 2016-09-06 2019-10-22 Samsung Electronics Co., Ltd. Automatic data replica manager in distributed caching and data processing systems
JP6761182B2 (en) * 2017-03-14 2020-09-23 富士通株式会社 Information processing equipment, information processing methods and programs
US10768970B2 (en) 2017-12-29 2020-09-08 Virtual Instruments Corporation System and method of flow source discovery
US20200142746A1 (en) * 2017-12-29 2020-05-07 Virtual Instruments Corporation Methods and system for throttling analytics processing
US11262989B2 (en) * 2019-08-05 2022-03-01 Advanced Micro Devices, Inc. Automatic generation of efficient vector code with low overhead in a time-efficient manner independent of vector width

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5202975A (en) * 1990-06-11 1993-04-13 Supercomputer Systems Limited Partnership Method for optimizing instruction scheduling for a processor having multiple functional resources
US5327561A (en) * 1991-09-20 1994-07-05 International Business Machines Corporation System and method for solving monotone information propagation problems
JP3544214B2 (en) * 1992-04-29 2004-07-21 サン・マイクロシステムズ・インコーポレイテッド Method and system for monitoring processor status
US6064819A (en) * 1993-12-08 2000-05-16 Imec Control flow and memory management optimization
US5742814A (en) * 1995-11-01 1998-04-21 Imec Vzw Background memory allocation for multi-dimensional signal processing
US5664193A (en) * 1995-11-17 1997-09-02 Sun Microsystems, Inc. Method and apparatus for automatic selection of the load latency to be used in modulo scheduling in an optimizing compiler
US5978509A (en) * 1996-10-23 1999-11-02 Texas Instruments Incorporated Low power video decoder system with block-based motion compensation
US5930510A (en) * 1996-11-19 1999-07-27 Sun Microsystems, Inc. Method and apparatus for an improved code optimizer for pipelined computers
DE69804708T2 (en) * 1997-03-29 2002-11-14 Imec Vzw Method and device for size optimization of storage units
US6151705A (en) * 1997-10-30 2000-11-21 Hewlett-Packard Company Efficient use of the base register auto-increment feature of memory access instructions

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2005066764A3 (en) * 2003-12-29 2006-01-26 Intel Corp Data layout mechanism to reduce hardware resource conflicts
WO2005066764A2 (en) * 2003-12-29 2005-07-21 Intel Corporation Data layout mechanism to reduce hardware resource conflicts
US20050149916A1 (en) * 2003-12-29 2005-07-07 Tatiana Shpeisman Data layout mechanism to reduce hardware resource conflicts
US8578312B2 (en) 2004-03-30 2013-11-05 Imec Method and apparatus for designing and manufacturing electronic circuits subject to leakage problems caused by temperature variations and/or aging
US20060253204A1 (en) * 2004-03-30 2006-11-09 Antonis Papanikolaou Method and apparatus for designing and manufacturing electronic circuits subject to leakage problems caused by temperature variations and/or ageing
US8578319B2 (en) * 2004-03-30 2013-11-05 Imec Method and apparatus for designing and manufacturing electronic circuits subject to process variations
US20050235232A1 (en) * 2004-03-30 2005-10-20 Antonis Papanikolaou Method and apparatus for designing and manufacturing electronic circuits subject to process variations
EP1583009A1 (en) * 2004-03-30 2005-10-05 Interuniversitair Micro-Elektronica Centrum Method and apparatus for designing and manufacturing electronic circuits subject to process variations
US8291400B1 (en) * 2007-02-07 2012-10-16 Tilera Corporation Communication scheduling for parallel processing architectures
US20110191758A1 (en) * 2010-01-29 2011-08-04 Michael Scharf Optimized Memory Allocator By Analyzing Runtime Statistics
US8181131B2 (en) 2010-04-30 2012-05-15 International Business Machines Corporation Enhanced analysis of array-based netlists via reparameterization
US8478574B2 (en) 2010-04-30 2013-07-02 International Business Machines Corporation Tracking array data contents across three-valued read and write operations
US8566764B2 (en) 2010-04-30 2013-10-22 International Business Machines Corporation Enhanced analysis of array-based netlists via phase abstraction
US8146034B2 (en) 2010-04-30 2012-03-27 International Business Machines Corporation Efficient Redundancy Identification, Redundancy Removal, and Sequential Equivalence Checking within Designs Including Memory Arrays.
US8307313B2 (en) 2010-05-07 2012-11-06 International Business Machines Corporation Minimizing memory array representations for enhanced synthesis and verification
US8336016B2 (en) 2010-05-07 2012-12-18 International Business Machines Corporation Eliminating, coalescing, or bypassing ports in memory array representations
US8291359B2 (en) * 2010-05-07 2012-10-16 International Business Machines Corporation Array concatenation in an integrated circuit design
US9128844B2 (en) * 2012-12-14 2015-09-08 International Business Machines Corporation Enhancing analytics performance using distributed multi-tiering
US20150046913A1 (en) * 2013-07-09 2015-02-12 International Business Machines Corporation Data splitting for multi-instantiated objects
US9311065B2 (en) * 2013-07-09 2016-04-12 International Business Machines Corporation Data splitting for multi-instantiated objects
CN113742080A (en) * 2020-09-10 2021-12-03 吕戈 Efficient construction method and device for immutable object execution environment

Also Published As

Publication number Publication date
EP0974898A2 (en) 2000-01-26
EP0974906A3 (en) 2008-12-24
US6421809B1 (en) 2002-07-16
EP0974898A3 (en) 2008-12-24
US6609088B1 (en) 2003-08-19
EP0974906A2 (en) 2000-01-26
US6449747B2 (en) 2002-09-10

Legal Events

Date Code Title Description
CC Certificate of correction
FPAY Fee payment. Year of fee payment: 4
FEPP Fee payment procedure. Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY. Free format text: PAYER NUMBER DE-ASSIGNED (ORIGINAL EVENT CODE: RMPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY
FPAY Fee payment. Year of fee payment: 8
REMI Maintenance fee reminder mailed
LAPS Lapse for failure to pay maintenance fees
STCH Information on status: patent discontinuation. Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362
FP Lapsed due to failure to pay maintenance fee. Effective date: 20140910