WO1999038085A1 - Method and apparatus for enforcing ordered execution of reads and writes across a memory interface - Google Patents
- Publication number
- WO1999038085A1 (PCT/US1999/001387)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- memory
- requests
- processor
- interface
- reordering
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30076—Arrangements for executing specific machine instructions to perform miscellaneous control operations, e.g. NOP
- G06F9/30087—Synchronisation or serialisation instructions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/14—Handling requests for interconnection or transfer
- G06F13/16—Handling requests for interconnection or transfer for access to memory bus
- G06F13/1605—Handling requests for interconnection or transfer for access to memory bus based on arbitration
- G06F13/161—Handling requests for interconnection or transfer for access to memory bus based on arbitration with latency improvement
- G06F13/1621—Handling requests for interconnection or transfer for access to memory bus based on arbitration with latency improvement by maintaining request order
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3824—Operand accessing
- G06F9/3834—Maintaining memory consistency
Definitions
- the present invention relates to read/write interfaces between processors and memories. More generally, it relates to interfaces between clients of a memory mapped resource and that resource. In a particular embodiment, the invention provides a solution to the problem of efficiently using the interface while still ensuring that reads and writes are performed in proper sequence when a particular sequence is required.
- Memory refers to a memory system, which may include data paths, controller chips, buffers, queues, and memory chips. While this disclosure describes the problems and solutions in data storage memory, it should be understood that the problems and solutions can in many cases be generalized to memory-mapped circuits which perform more than just storage of data (e.g., memory-mapped I/O, memory-mapped compute devices).
- a “memory location” (or simply “a location”) is an individually addressable unit of the memory that holds data (or transports the data to and/or from an I/O device or a compute device).
- a "client” is a central processing unit (CPU) , processor, I/O controller or other device which uses the services provided by the memory system.
- a "request” is an action performed by a client in using the services of a memory system.
- a "read request” (or simply “a read”) is a request from a client to the memory requesting the contents of a memory location; the read request is accompanied by the address of the memory location to be read.
- a "write request” (or simply “a write”) is a request from a client to the memory requesting that the memory place a write value into a write memory location; the write request is accompanied by the write value and the address of the write memory location.
- An “acknowledgment” (or simply “an ack”) is an indication returned by the memory to the client indicating that a request has been satisfied; an acknowledgment to a read request includes the data read from the specified memory location.
- Pending reads is the set of read requests which are pending; a read request is “pending” from the time it is accepted by the memory until the memory issues an ack.
- Pending writes, analogous to pending reads, is the set of write requests which are pending; a write request is “pending” from the time it is accepted by the memory until the memory issues an acknowledgment.
- concurrency When building memory systems for large computers, one feature which provides for high performance is concurrency, wherein more than one memory operation is in progress at the same time.
- concurrency One limitation on concurrency is that a CPU, or other client, requires memory consistency. A memory appears consistent when a "read" of a memory location returns a value most recently "written” in that location. In some systems with concurrency, reads and writes are reordered into an optimized execution order to achieve higher performance, however this may lead to loss of consistency.
- Consistency is easy to implement if memory requests are always processed in exactly the same order as they are issued by the client. Preserving the order exactly, however, is often not possible in high-performance memory designs, which may need to reorder requests to speed up processing. For example, the system requirements might be such that read requests must be completed faster than write requests because pending read requests hold up processing until the read data is returned. However the reordering of requests is done, it must not violate the consistency that is inherent in the one-request-at-a-time memory model described above.
- One set of reordering constraints is as follows:
- Rule 1 A read of location X followed by a write of location X cannot be reordered with respect to each other.
- Rule A is often implemented by adding "store buffers" to the processor.
- Rule B is almost never implemented because its performance advantage is very slight. Nonetheless, Rules A and B give some insight into what can be done at the processor to increase concurrency and thus improve performance while still maintaining consistency.
- MEMBAR A MEMBAR instruction provides a way for a programmer to enforce an order on the reads and writes issued by a client.
- MEMBAR instructions are interspersed in the instruction code executed by a processor. When a processor is executing instructions and encounters a MEMBAR instruction, it holds up further read and write operations until the operations which preceded the MEMBAR instruction have completed.
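The stall-until-drained behavior can be sketched in a few lines. This is an illustrative model, not the patent's implementation; the names (`Client`, `issue`, `ack`, `membar`) are hypothetical.

```python
# Hypothetical sketch: a client tracks pending requests and a MEMBAR
# releases only once every previously issued request has been acknowledged.

class Client:
    def __init__(self):
        self.pending = set()  # requests issued but not yet acknowledged

    def issue(self, request_id):
        # a read or write leaves the processor and becomes pending
        self.pending.add(request_id)

    def ack(self, request_id):
        # the memory's acknowledgment removes the request from the pending set
        self.pending.discard(request_id)

    def membar(self):
        # a real processor would stall here; this model just reports
        # whether the barrier would allow execution to proceed
        return len(self.pending) == 0

c = Client()
c.issue("read A")
c.issue("write B")
assert not c.membar()   # barrier blocks: two requests still pending
c.ack("read A")
c.ack("write B")
assert c.membar()       # all acks received, barrier releases
```

The cost this model makes visible is the one the description criticizes: while `membar()` is false, the interface sits idle even if it has spare bandwidth.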
- Sproull-Sutherland discloses a method of determining whether the read and write operations have been completed (that patent/application is commonly assigned to the assignee of the present application and is incorporated herein by reference for all purposes).
- SUN SPARC-V9 manual that reference explains how the ordering constraints are enforced by the processor. There, given a first operation and a second operation, if the second operation must not be performed before the first operation, the execution unit delays the submission of the second operation to the memory until the first operation is no longer pending.
- the client maintains its record of pending reads and writes by noting (a) when it issues each new request and (b) when each request is eventually acknowledged, signifying that the request is no longer pending.
- if the processor holds up an operation instead of using the bandwidth-limited interface whenever the interface is available, performance may be lost, as extra time would be needed to send the held-up request and the critical path involving that request would be lengthened.
- Fenwick appears to show how barrier instructions operate in the context of the Alpha 21164 microprocessor, built by Digital Equipment Corporation.
- a barrier instruction MB or "memory barrier”
- the MB instruction is reported off-chip, and may be used at the interface between the microprocessor and the memory bus, but the MB instructions do not apparently pass over the memory bus.
- the MB information is not needed beyond the bus.
- a similar instruction is used in the memory interface of most microprocessors (for example, waiting for all pending memory transactions to complete before allowing any new memory requests to be issued) , but the interface circuitry is commonly provided on the microprocessor chip itself.
- processor-memory interface which allows the processor to enforce execution order of concurrently submitted operations, even when multiple operations required to be ordered are submitted to the memory which may reorder operations for its own purposes.
- a memory interface is provided between a processor and a memory which is capable of multiple concurrent transactions or accesses.
- the interface between the processor and the memory carries read and write operations as well as "barrier" operations, where a barrier operation signals the non-reorderability of operations.
- the barrier operations are used in connection with resolved regions and unresolved regions of a processor system's architecture.
- the unresolved region is a region wherein operations may be reordered for efficiency or other reasons and the resolved region is a region wherein the operations are in a fixed order from which they cannot be reordered.
- reordering constraints can survive the travel through the unresolved region so that any necessary reordering between the unresolved region and the resolved region can occur. Since the unresolved region extends into the memory, it is possible for the memory to perform optimization reordering or other reordering of operations. Once the operations reach the boundary between the unresolved region and the resolved region, the operations are reordered, as needed, to comply with the constraints of the barrier operations.
- the memory interface is an interface to one or more memory mapped input/output (I/O) devices or computational devices.
- memory operations are initiated by more than one processor.
- processor-memory boundary While the exact location of the processor-memory boundary might not be clear, the present invention is useful wherever reordering dictated by the memory system is being performed, as opposed to reordering dictated only by the processor.
- FIG. 1 is a block diagram of a processing system according to the present invention.
- FIG. 2 is a block diagram of a multiple processor processing system according to the present invention.
- FIG. 3 shows an example of a request stream as might be used in the present invention.
- FIG. 4 shows a variation of the request stream of FIG. 3 as might be used in a dual-path request stream system.
- FIG. 5 is a block diagram of a banked memory system according to the present invention.
- a client specifying the sequence must be able to also specify when a portion of the sequence must be handled in the order specified and not reordered.
- a processor might specify a sequence of memory requests. The memory requests are executed at a memory, after passing through processor logic and buffers, a processor-memory bus (which might be shared with more than one processor and/or more than one memory), and memory interface circuits interposed between the bus and the memory. Any of these intermediate elements might be adapted to reorder the sequence of memory requests.
- a memory interface circuit includes a paging unit for loading and unloading pages of memory from a slow, large memory to a fast, core memory, where all memory requests happen within the core memory.
- the memory interface circuit might reorder memory requests so that all the requests to be done within one page of memory are done at once to reduce the number of page swaps required to fulfill all of the memory requests.
- the order of the requests can be determined at any point in the path of these memory requests from a processor to a memory. However, if a system does reorder requests, there are some points in the path where the order of the memory requests is not necessarily resolved or resolvable. The collection of these points is referred to herein as the "unresolved region" of the path, and an "unresolved" pathway is a pathway, such as a network or a bus, which carries requests in an unresolved order. When a processor must be able to specify a particular order of handling, at some point the unresolved region must end. Beyond that point is referred to herein as the "resolved region" of the path.
- operations are in a fixed order from which they will not be further reordered.
- FIGS. 1-2 show processor systems in which an order-enforcing system according to the present invention might be used.
- FIG. 1 shows a processor system 100 comprising a processor 102, a memory subsystem 104, an I/O subsystem 106 and a compute subsystem 108. Each of these components is coupled via a communications link 110. Each of the three subsystems is memory-mapped, i.e., processor 102 interfaces with the subsystem using addressed read and write requests as if the subsystem were an addressable memory. Communications link 110 could be a memory bus, a network, or the like.
- Memory subsystem 104 is shown comprising a memory interface circuit 120 and a memory array 122.
- processor 102 sends a read request or a write request over communications link 110 to memory subsystem 104, it is received and handled by memory interface circuit 120 and interface circuit 120 handles the storage and retrieval of data in the specified locations of memory array 122.
- I/O subsystem 106 is shown comprising a memory-mapped I/O interface circuit 130 and the I/O devices are shown generally as 132.
- Interface circuit 130 receives and handles requests from processor 102 in much the same way as interface circuit 120 handles memory requests.
- Memory-mapped I/O is not the focus of this description and is well known, so many details are omitted here for brevity.
- Compute subsystem 108 is shown comprising a memory-mapped compute interface circuit 140 and a compute device 142.
- interface circuit 140 receives and handles requests from processor 102 over communications link 110.
- Interface circuit 140 converts requests, which are formatted as requests to a particular memory location, into messages to and from compute device 142 according to a predetermined memory map and convention.
- compute device 142 might be a floating point processor and, by convention, a read request from a particular memory address might be a request for the results of a floating point operation, while a write request to a particular memory address might supply an operand used in a floating point operation.
- the reordering enforcement system according to the present invention is not limited to bus-based systems.
- I/O subsystem 106 and compute subsystem 108 are not as important as the understanding that ordering of requests can be important in these subsystems. For example, if processor 102 sends a write request to I/O subsystem 106 to configure an external I/O device (such as initializing a serial communications circuit) then processor 102 sends a read request to gather data from that I/O device, those requests should appear at the I/O device in the order required by processor 102.
- if either interface circuit 140 or communications link 110 reorders the requests for its own internal efficiency, it should return those requests to the relative order in which processor 102 sent them.
- the request stream shows read requests, write requests and barrier requests.
- a barrier is sent to the memory subsystem to signal that the subsystem should not reorder requests across the barrier, i.e., that all requests received prior to the barrier must be handled before any request received after the barrier.
- the barrier requests are indicated by the label "MEMIBAR" which is short for "memory interface barrier.” MEMIBAR requests should not be confused with MEMBAR instructions, which are instructions inserted into a program to control the operation of a processor. By contrast, the MEMIBAR requests are sent from the processor to the memory subsystem to enforce ordering.
- requests are shown being sent to a memory subsystem, in order, from request 1 to request 14.
- read requests include an address
- write requests include an address and the data to be written (as the actual data is not relevant here, it is shown in FIG. 3(a) as "xx").
- requests 5 and 11 are barriers, and therefore the memory subsystem is free to reorder requests 1-4 among themselves, 6-10 among themselves and 12-14 among themselves.
- the memory subsystem might otherwise group requests dealing with one page (requests 1, 4 and 6) to perform them before a page swap and group the remainder of the requests to perform them after the page swap.
- request 5 is a barrier
- request 6 cannot be reordered for execution before the page swap because that would require it to be executed before requests 2-3.
- the barrier at request 5 ensures that request 6 (a write to address 309F) does not get reordered relative to request 4 (a read from address 309F) . This is necessary to ensure that the correct, pre-write, value is returned for the read request.
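The segmenting effect of unaddressed barriers can be sketched as follows. This is a hedged illustration of the idea, not the patent's circuitry; `reorder_within_barriers` and `page_of` are invented names, and grouping by page stands in for whatever optimization the memory subsystem actually performs.

```python
# Illustrative sketch: a memory subsystem may reorder requests freely within
# barrier-delimited segments (here, grouping requests by page to reduce page
# swaps), but nothing may cross a MEMIBAR.

def reorder_within_barriers(stream, page_of):
    """stream: list of ('read'/'write'/'barrier', addr) tuples.
    page_of: maps an address to its page.  Requests in each segment are
    grouped by page; barriers stay fixed and no request crosses one."""
    out, segment = [], []
    for op in stream:
        if op[0] == "barrier":
            # sorted() is stable, so same-page requests keep their order
            out.extend(sorted(segment, key=lambda r: page_of(r[1])))
            out.append(op)
            segment = []
        else:
            segment.append(op)
    out.extend(sorted(segment, key=lambda r: page_of(r[1])))
    return out

stream = [("read", 0x3090), ("write", 0x4000), ("read", 0x309F),
          ("barrier", None), ("write", 0x309F)]
optimized = reorder_within_barriers(stream, page_of=lambda a: a >> 12)
# the write to 0x309F stays after the barrier, so it cannot pass the read
assert optimized.index(("write", 0x309F)) > optimized.index(("barrier", None))
```

As in the FIG. 3(a) discussion, the barrier keeps the read of 0x309F ahead of the later write to 0x309F, at the cost of also pinning every unrelated request in place.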
- FIG. 3(b) shows an example of an alternate form of a request stream, wherein the over-restrictiveness can be avoided.
- the barrier requests include an address to indicate the requests for which the barrier applies.
- request 5 (“MEMIBAR 309F")
- that barrier constrains only the relative reordering of requests which deal with address 309F, namely requests 4 and 6.
- the memory subsystem can reorder requests 4 and 6 relative to requests 2 and 3 for more efficient paging.
- request 14 can be reordered relative to requests 10-13, thereby allowing two read requests to be handled with a single read, as might occur when two processors are reading the same memory address.
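The rule an addressed barrier imposes can be stated as a tiny predicate. This is an assumed sketch of the FIG. 3(b) idea, not a circuit from the patent; `may_cross` is a hypothetical name.

```python
# Hedged sketch of addressed barriers: a "MEMIBAR addr" constrains only
# requests to that address, so unrelated requests remain free to move
# across it for paging or other optimizations.

def may_cross(request_addr, barrier_addr):
    """True if a request may be reordered across a barrier.
    barrier_addr = None models the FIG. 3(a) form, which blocks everything."""
    if barrier_addr is None:
        return False          # unaddressed barrier: nothing may cross
    return request_addr != barrier_addr

assert not may_cross(0x309F, 0x309F)  # requests 4 and 6 stay ordered
assert may_cross(0x4002, 0x309F)      # unrelated traffic may move freely
assert not may_cross(0x4002, None)    # an unaddressed barrier blocks all
```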
- barrier requests include addresses
- excess of barrier requests e.g., a multitude of consecutive barrier requests, one per address being constrained by a barrier
- the processor need not introduce barrier requests to enforce ordering of requests when one of the requests has already been acknowledged by the memory system. For example, the writes to location 3108 that appear on lines 3 and 13 must be ordered, but this example assumes that by the time the request on line 13 is issued, the request on line 3 has been acknowledged. In the Sproull-Sutherland dual-path memory, a more complex barrier procedure is needed.
- the client a processor, in this example
- HB half barrier
- the client retains a record of pending reads and pending writes, and checks a new request before sending it to the memory. If there is a possible conflict, the client first sends the HB markers and then issues the new request.
- the memory system obeys the following rules:
- Rule M3 requires that the paths be synchronized by HB markers. One way to do this is, when one path (read or write) processes an HB marker, the memory must hold that path up until the other path (write or read, respectively) reaches an HB marker. Intermediate elements which handle requests, but which are elements that need not serialize memory accesses, need not hold up for HB markers. Thus, Rule M3 should be applied only to elements which must serialize requests, such as the read/write interface at a memory chip.
- read requests, write requests and HB markers travel through the memory system, they eventually come to a "memory chip" itself.
- reads and writes may be traveling in separate paths, much like a two-ported memory (i.e., having separate read and write ports) . These ports may be designed with a "recursive interface" as described by Sproull-Sutherland. Inside this memory chip, the read and write paths finally meet, both potentially accessing the same memory location. To avoid consistency problems, Rule M3 is enforced there.
- HB markers are inserted into the memory system by the client and those markers meet at the memory chip, where they are used to synchronize the read and write channels.
- the memory system may then apply arbitrary policies to requests, give priority to reads, reorder writes with respect to each other (e.g., to take advantage of fast "page mode" on memory chips), etc.
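The Rule M3 rendezvous can be modeled as two queues that stall at HB markers. This is an assumed sketch, not the patent's hardware; `drain` is an invented name, and the read-first policy inside it is just one of the arbitrary policies mentioned above (here, giving priority to reads).

```python
# Assumed sketch of Rule M3: where the separate read and write paths finally
# meet, a path that reaches an HB marker stalls until the other path reaches
# its matching HB marker; the marker pair is then consumed together.

from collections import deque

def drain(read_path, write_path):
    """Interleave two request paths, synchronizing at 'HB' markers.
    Returns the order in which requests reach the memory cells."""
    reads, writes = deque(read_path), deque(write_path)
    order = []
    while reads or writes:
        r_blocked = bool(reads) and reads[0] == "HB"
        w_blocked = bool(writes) and writes[0] == "HB"
        if r_blocked and w_blocked:
            # matching half barriers meet: consume both and continue
            reads.popleft()
            writes.popleft()
        elif reads and not r_blocked:
            order.append(reads.popleft())   # policy: prefer reads
        elif writes and not w_blocked:
            order.append(writes.popleft())
        else:
            # one path is stalled at an HB with the other path empty
            break
    return order

order = drain(["R 309F", "HB", "R 3108"], ["W 4000", "HB", "W 309F"])
# everything before the HB pair executes before anything after it
assert order.index("W 309F") > order.index("R 309F")
```

Note that only the element that must serialize requests needs this stall; as the text says, intermediate elements may pass HB markers straight through.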
- FIG. 4 shows the half barriers with addresses, as is the case in FIG. 3(b), but the dual-path memory system could also be implemented without half barrier addresses, as is the case in FIG. 3(a).
- the HB markers would be more powerful than necessary to establish the required ordering constraints if they did not include addresses. If an HB marker must be inserted before a read of location X or before a write of location X, it is because there is a pending read or write request for location X that might conflict. In such cases, the memory subsystem need only guarantee that pending requests for location X are not reordered with respect to the marker. Therefore, if the address X is attached to the HB marker, potential conflicts can be avoided without excessive restraint.
- address bits to use in the marker might vary depending on the configuration of the memory system or the characteristics of the client.
- the subset might be the "low order" bits of the address, or the "high order” bits of the address. Those skilled in the art of memory design will recognize that these are only examples and that other subsets of address bits could be used.
- the objective in associating full or partial addresses with a marker is to reduce the frequency with which markers must be introduced into the memory system. The reason is that markers will prevent the memory system from reordering memory requests to achieve maximum performance.
- bank With full or partial address tags, order enforcement can also be used with "banked” memory.
- Large memories are often composed of memory banks, i.e., each bank is responsible for a range of memory locations.
- the memory system has some form of “distributor” that accepts memory requests and distributes them to the proper bank, i.e., the bank that contains the memory location specified in the read or write request.
- One special form of bank structure is known as "interleaving" in which low-order address bits select a memory bank. For example, in a two-bank system, even addresses are located in one bank and odd addresses in another.
- the distributor delivers memory requests according to which bank contains the addressed location. It must also deliver markers. Markers without addresses must be delivered to every bank, because the memory system cannot know which requests are being prevented from reordering. Thus, for example, in a two-bank system, when the distributor receives a marker along the read path, it must send a marker to each of the two banks along their respective read paths. However, if a marker contains an address, it is necessary to forward that marker only to the one bank that contains that address. Note that partial address tags can be used with banked memory to the same effect, so long as the tag identifies the bank.
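The routing decision can be sketched for a two-bank interleaved memory, where the low-order address bit selects the bank. This is an illustrative model under that assumption; `distribute` is an invented name and the single marker path stands in for the separate read/write marker paths of the dual-path design.

```python
# Illustrative sketch of an interleaved distributor: addr % n_banks selects
# the bank; an addressed marker goes only to the owning bank, while an
# unaddressed marker must be broadcast to every bank.

def distribute(requests, n_banks=2):
    """requests: list of (kind, addr) where kind is 'read', 'write',
    or 'marker'; a marker's addr may be None (unaddressed)."""
    banks = [[] for _ in range(n_banks)]
    for kind, addr in requests:
        if kind == "marker" and addr is None:
            for bank in banks:   # cannot tell which requests it guards
                bank.append((kind, addr))
        else:
            banks[addr % n_banks].append((kind, addr))  # interleaved select
    return banks

banks = distribute([("read", 0x10), ("marker", 0x10),
                    ("write", 0x11), ("marker", None)])
assert ("marker", 0x10) in banks[0] and ("marker", 0x10) not in banks[1]
assert ("marker", None) in banks[0] and ("marker", None) in banks[1]
```

The two asserts show the payoff of addressed markers: the bank that does not own address 0x10 never stalls for that marker.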
- While FIGS. 3-4 show operations on single memory addresses, it should be understood that the amount of memory processed as part of a particular request is not fixed, but can vary.
- a memory system might be configured to handle several sizes of requests, such as a single word read, a cache line read (e.g., 16 words in a cache line), single word write, cache line write, or writing selected bytes within a word.
- the "address" of the read or write request is the address of the first word of what may be a multi-word request. This may be important in deciding whether a barrier that contains an address (first word address) can be reordered with respect to another request.
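With multi-word requests, the conflict test becomes a range check rather than an equality check. This is a hypothetical sketch of that point; `barrier_applies` is an invented name, and word-granular addressing is assumed.

```python
# Hypothetical sketch: an addressed barrier applies to a multi-word request
# if the barrier's address falls anywhere inside the request's address
# range, not only when it equals the request's first-word address.

def barrier_applies(req_addr, req_words, barrier_addr):
    """True if a barrier tagged with barrier_addr constrains a request
    covering req_words words starting at req_addr."""
    return req_addr <= barrier_addr < req_addr + req_words

assert barrier_applies(0x3090, 16, 0x309F)      # 16-word line covers 0x309F
assert not barrier_applies(0x3090, 1, 0x309F)   # single-word read does not
```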
- FIG. 5 is a block diagram of a banked memory subsystem 500 illustrating these points.
- Banked memory subsystem 500 is shown with an interface circuit 501 coupling subsystem 500 and a processor and a distributor 502 which routes memory requests to the appropriate memory bank.
- Two bank memories 503 are shown, but it should be understood that the memory can be divided into more than two banks.
- read requests and write requests travel along separate paths, namely read path 504 and write path 506.
- Distributor 502 examines the address of each request and, in this example, routes requests with odd addresses to bank memory 503(1) using a bank read path 508(1) and a bank write path 510(1) and routes requests with even addresses to bank memory 503(2) using a bank read path 508(2) and a bank write path 510(2). Reordering for memory optimization might occur at interface circuit 501, at distributor 502, or at the inputs of bank memories 503. The flow of barrier requests will now be described.
- interface circuit 501 sends one half barrier along path 504 and one half barrier along path 506. As explained above, this will allow banked memory subsystem 500 to prevent reordering of read and write requests relative to each other even though they travel along separate paths.
- the half barriers are detected by distributor 502
- the half barrier from read path 504 is sent along bank read paths 508 and the half barrier from write path 506 is sent along bank write paths 510. Since a half barrier received on one of bank paths 508 or 510 will hold up memory accesses until the matching half barrier arrives, the broadcasting of half barriers to all bank memories 503 might be overly restrictive.
- the barriers can include addresses as described above, or can include enough of a partial address so that distributor 502 can identify the bank to which the barrier applies. If such addresses or partial addresses are included, distributor 502 can selectively route the half barriers to only the bank memory containing the address for which the barrier applies.
- distributor 502 does not need to hold up for half barrier markers to synchronize, but will send along half barrier markers and read or write requests as received.
- a processor has an interface to a distributed memory having both local memory and "remote" memory, where the remote memory is connected to the processor by a high speed network, a bus extension, or the like.
- Half barrier markers will be sent to the remote memories just as they are sent to banks in the example shown in FIG. 5.
- the systems described above implement marker (half barrier) synchronization at the memory chip interface, other variations are possible. All that is required is that the read and write paths synchronize at some point in their processing, beyond which no reordering is permitted (i.e., there is a nonzero "resolved” region). Such a synchronization point is possible at many different points in a memory system.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU23361/99A AU2336199A (en) | 1998-01-23 | 1999-01-21 | Method and apparatus for enforcing ordered execution of reads and writes across a memory interface |
JP2000528921A JP2002510079A (en) | 1998-01-23 | 1999-01-21 | Method and apparatus for forcing ordered execution of reads and writes between memory interfaces |
EP99903307A EP1047996A1 (en) | 1998-01-23 | 1999-01-21 | Method and apparatus for enforcing ordered execution of reads and writes across a memory interface |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US09/012,882 | 1998-01-23 | ||
US09/012,882 US6038646A (en) | 1998-01-23 | 1998-01-23 | Method and apparatus for enforcing ordered execution of reads and writes across a memory interface |
Publications (1)
Publication Number | Publication Date |
---|---|
WO1999038085A1 (en) | 1999-07-29 |
Family
ID=21757199
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US1999/001387 WO1999038085A1 (en) | 1998-01-23 | 1999-01-21 | Method and apparatus for enforcing ordered execution of reads and writes across a memory interface |
Country Status (5)
Country | Link |
---|---|
US (1) | US6038646A (en) |
EP (1) | EP1047996A1 (en) |
JP (1) | JP2002510079A (en) |
AU (1) | AU2336199A (en) |
WO (1) | WO1999038085A1 (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2001033363A2 (en) * | 1999-11-02 | 2001-05-10 | Siemens Aktiengesellschaft | Bus system for simultaneous handling of various memory access procedures with a system-on-chip solution |
WO2008151101A1 (en) | 2007-06-01 | 2008-12-11 | Qualcomm Incorporated | Device directed memory barriers |
WO2011045555A1 (en) * | 2009-10-13 | 2011-04-21 | Arm Limited | Reduced latency barrier transaction requests in interconnects |
Families Citing this family (45)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140325175A1 (en) * | 2013-04-29 | 2014-10-30 | Pact Xpp Technologies Ag | Pipeline configuration protocol and configuration unit communication |
US6757791B1 (en) * | 1999-03-30 | 2004-06-29 | Cisco Technology, Inc. | Method and apparatus for reordering packet data units in storage queues for reading and writing memory |
US6256713B1 (en) * | 1999-04-29 | 2001-07-03 | International Business Machines Corporation | Bus optimization with read/write coherence including ordering responsive to collisions |
US8230411B1 (en) | 1999-06-10 | 2012-07-24 | Martin Vorbach | Method for interleaving a program over a plurality of cells |
US6678810B1 (en) | 1999-12-30 | 2004-01-13 | Intel Corporation | MFENCE and LFENCE micro-architectural implementation method and system |
US6988154B2 (en) | 2000-03-10 | 2006-01-17 | Arc International | Memory interface and method of interfacing between functional entities |
US6963967B1 (en) * | 2000-06-06 | 2005-11-08 | International Business Machines Corporation | System and method for enabling weak consistent storage advantage to a firmly consistent storage architecture |
US6826619B1 (en) | 2000-08-21 | 2004-11-30 | Intel Corporation | Method and apparatus for preventing starvation in a multi-node architecture |
US6487643B1 (en) | 2000-09-29 | 2002-11-26 | Intel Corporation | Method and apparatus for preventing starvation in a multi-node architecture |
US8058899B2 (en) | 2000-10-06 | 2011-11-15 | Martin Vorbach | Logic cell array and bus system |
US6772298B2 (en) | 2000-12-20 | 2004-08-03 | Intel Corporation | Method and apparatus for invalidating a cache line without data return in a multi-node architecture |
US6791412B2 (en) * | 2000-12-28 | 2004-09-14 | Intel Corporation | Differential amplifier output stage |
US7234029B2 (en) * | 2000-12-28 | 2007-06-19 | Intel Corporation | Method and apparatus for reducing memory latency in a cache coherent multi-node architecture |
US20020087775A1 (en) * | 2000-12-29 | 2002-07-04 | Looi Lily P. | Apparatus and method for interrupt delivery |
US20020087766A1 (en) * | 2000-12-29 | 2002-07-04 | Akhilesh Kumar | Method and apparatus to implement a locked-bus transaction |
US6721918B2 (en) | 2000-12-29 | 2004-04-13 | Intel Corporation | Method and apparatus for encoding a bus to minimize simultaneous switching outputs effect |
US9436631B2 (en) | 2001-03-05 | 2016-09-06 | Pact Xpp Technologies Ag | Chip including memory element storing higher level memory data on a page by page basis |
US9552047B2 (en) | 2001-03-05 | 2017-01-24 | Pact Xpp Technologies Ag | Multiprocessor having runtime adjustable clock and clock dependent power supply |
US9250908B2 (en) | 2001-03-05 | 2016-02-02 | Pact Xpp Technologies Ag | Multi-processor bus and cache interconnection system |
US9411532B2 (en) | 2001-09-07 | 2016-08-09 | Pact Xpp Technologies Ag | Methods and systems for transferring data between a processing device and external devices |
WO2002093365A1 (en) | 2001-05-11 | 2002-11-21 | Sospita As | Sequence numbering mechanism to ensure execution order integrity of inter-dependent smart card applications |
US10031733B2 (en) | 2001-06-20 | 2018-07-24 | Scientia Sol Mentis Ag | Method for processing data |
US9170812B2 (en) | 2002-03-21 | 2015-10-27 | Pact Xpp Technologies Ag | Data processing system having integrated pipelined array data processor |
US7394284B2 (en) | 2002-09-06 | 2008-07-01 | Pact Xpp Technologies Ag | Reconfigurable sequencer structure |
US7814488B1 (en) * | 2002-09-24 | 2010-10-12 | Oracle America, Inc. | Quickly reacquirable locks |
US7360069B2 (en) * | 2004-01-13 | 2008-04-15 | Hewlett-Packard Development Company, L.P. | Systems and methods for executing across at least one memory barrier employing speculative fills |
US7243200B2 (en) * | 2004-07-15 | 2007-07-10 | International Business Machines Corporation | Establishing command order in an out of order DMA command queue |
JP4327081B2 (en) * | 2004-12-28 | 2009-09-09 | Kyocera Mita Corporation | Memory access control circuit |
US7613886B2 (en) * | 2005-02-08 | 2009-11-03 | Sony Computer Entertainment Inc. | Methods and apparatus for synchronizing data access to a local memory in a multi-processor system |
US7617343B2 (en) * | 2005-03-02 | 2009-11-10 | Qualcomm Incorporated | Scalable bus structure |
US9026744B2 (en) * | 2005-03-23 | 2015-05-05 | Qualcomm Incorporated | Enforcing strongly-ordered requests in a weakly-ordered processing system |
US7500045B2 (en) | 2005-03-23 | 2009-03-03 | Qualcomm Incorporated | Minimizing memory barriers when enforcing strongly-ordered requests in a weakly-ordered processing system |
US7574565B2 (en) * | 2006-01-13 | 2009-08-11 | Hitachi Global Storage Technologies Netherlands B.V. | Transforming flush queue command to memory barrier command in disk drive |
US7917676B2 (en) * | 2006-03-10 | 2011-03-29 | Qualcomm Incorporated | Efficient execution of memory barrier bus commands with order constrained memory accesses |
US7818306B2 (en) * | 2006-03-24 | 2010-10-19 | International Business Machines Corporation | Read-copy-update (RCU) operations with reduced memory barrier usage |
US7783817B2 (en) * | 2006-08-31 | 2010-08-24 | Qualcomm Incorporated | Method and apparatus for conditional broadcast of barrier operations |
US8108584B2 (en) * | 2008-10-15 | 2012-01-31 | Intel Corporation | Use of completer knowledge of memory region ordering requirements to modify transaction attributes |
US8055816B2 (en) | 2009-04-09 | 2011-11-08 | Micron Technology, Inc. | Memory controllers, memory systems, solid state drives and methods for processing a number of commands |
US8417912B2 (en) | 2010-09-03 | 2013-04-09 | International Business Machines Corporation | Management of low-paging space conditions in an operating system |
US8782356B2 (en) * | 2011-12-09 | 2014-07-15 | Qualcomm Incorporated | Auto-ordering of strongly ordered, device, and exclusive transactions across multiple memory regions |
US9021228B2 (en) | 2013-02-01 | 2015-04-28 | International Business Machines Corporation | Managing out-of-order memory command execution from multiple queues while maintaining data coherency |
US9594713B2 (en) * | 2014-09-12 | 2017-03-14 | Qualcomm Incorporated | Bridging strongly ordered write transactions to devices in weakly ordered domains, and related apparatuses, methods, and computer-readable media |
US9946492B2 (en) * | 2015-10-30 | 2018-04-17 | Arm Limited | Controlling persistent writes to non-volatile memory based on persist buffer data and a persist barrier within a sequence of program instructions |
US11409530B2 (en) | 2018-08-16 | 2022-08-09 | Arm Limited | System, method and apparatus for executing instructions |
TWI773959B (en) | 2019-01-31 | 2022-08-11 | International Business Machines Corporation | Data processing system, method and computer program product for handling an input/output store instruction |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0679993A2 (en) * | 1994-04-28 | 1995-11-02 | Hewlett-Packard Company | A computer apparatus having special instructions to force ordered load and store operations |
WO1996030838A1 (en) * | 1995-03-31 | 1996-10-03 | Samsung Electronics Co., Ltd. | Memory controller which executes read and write commands out of order |
US5666506A (en) * | 1994-10-24 | 1997-09-09 | International Business Machines Corporation | Apparatus to dynamically control the out-of-order execution of load/store instructions in a processor capable of dispatching, issuing and executing multiple instructions in a single processor cycle |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5222237A (en) * | 1988-02-02 | 1993-06-22 | Thinking Machines Corporation | Apparatus for aligning the operation of a plurality of processors |
US6088768A (en) * | 1993-12-28 | 2000-07-11 | International Business Machines Corporation | Method and system for maintaining cache coherence in a multiprocessor-multicache environment having unordered communication |
US5666494A (en) * | 1995-03-31 | 1997-09-09 | Samsung Electronics Co., Ltd. | Queue management mechanism which allows entries to be processed in any order |
1998
- 1998-01-23 US US09/012,882 patent/US6038646A/en not_active Expired - Lifetime

1999
- 1999-01-21 AU AU23361/99A patent/AU2336199A/en not_active Abandoned
- 1999-01-21 WO PCT/US1999/001387 patent/WO1999038085A1/en not_active Application Discontinuation
- 1999-01-21 JP JP2000528921A patent/JP2002510079A/en active Pending
- 1999-01-21 EP EP99903307A patent/EP1047996A1/en not_active Ceased
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2001033363A2 (en) * | 1999-11-02 | 2001-05-10 | Siemens Aktiengesellschaft | Bus system for simultaneous handling of various memory access procedures with a system-on-chip solution |
WO2001033363A3 (en) * | 1999-11-02 | 2001-12-13 | Siemens Ag | Bus system for simultaneous handling of various memory access procedures with a system-on-chip solution |
WO2008151101A1 (en) | 2007-06-01 | 2008-12-11 | Qualcomm Incorporated | Device directed memory barriers |
US7984202B2 (en) * | 2007-06-01 | 2011-07-19 | Qualcomm Incorporated | Device directed memory barriers |
KR101149622B1 (en) * | 2007-06-01 | 2012-05-29 | 콸콤 인코포레이티드 | Device directed memory barriers |
EP2600254A1 (en) * | 2007-06-01 | 2013-06-05 | Qualcomm Incorporated | Device directed memory barriers |
JP2013242876A (en) * | 2007-06-01 | 2013-12-05 | Qualcomm Inc | Device-directed memory barriers |
WO2011045555A1 (en) * | 2009-10-13 | 2011-04-21 | Arm Limited | Reduced latency barrier transaction requests in interconnects |
US8607006B2 (en) | 2009-10-13 | 2013-12-10 | Arm Limited | Barrier transactions in interconnects |
US8856408B2 (en) | 2009-10-13 | 2014-10-07 | Arm Limited | Reduced latency barrier transaction requests in interconnects |
US9477623B2 (en) | 2009-10-13 | 2016-10-25 | Arm Limited | Barrier transactions in interconnects |
Also Published As
Publication number | Publication date |
---|---|
JP2002510079A (en) | 2002-04-02 |
EP1047996A1 (en) | 2000-11-02 |
AU2336199A (en) | 1999-08-09 |
US6038646A (en) | 2000-03-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US6038646A (en) | Method and apparatus for enforcing ordered execution of reads and writes across a memory interface | |
US6816947B1 (en) | System and method for memory arbitration | |
US5398325A (en) | Methods and apparatus for improving cache consistency using a single copy of a cache tag memory in multiple processor computer systems | |
US6920516B2 (en) | Anti-starvation interrupt protocol | |
US6643747B2 (en) | Processing requests to efficiently access a limited bandwidth storage area | |
KR20000022712A (en) | Non-uniform memory access(numa) data processing system that speculatively issues requests on a node interconnect | |
US20030014593A1 (en) | Incremental tag build for hierarchical memory architecture | |
US6014721A (en) | Method and system for transferring data between buses having differing ordering policies | |
JP2001117859A (en) | Bus controller | |
US5659707A (en) | Transfer labeling mechanism for multiple outstanding read requests on a split transaction bus | |
US6546465B1 (en) | Chaining directory reads and writes to reduce DRAM bandwidth in a directory based CC-NUMA protocol | |
CN115033184A (en) | Memory access processing device and method, processor, chip, board card and electronic equipment | |
US5655102A (en) | System and method for piggybacking of read responses on a shared memory multiprocessor bus | |
US6347349B1 (en) | System for determining whether a subsequent transaction may be allowed or must be allowed or must not be allowed to bypass a preceding transaction | |
JPH0628247 (en) | Dynamically rearranged memory bank queue |
US20070005865A1 (en) | Enforcing global ordering using an inter-queue ordering mechanism | |
US6836823B2 (en) | Bandwidth enhancement for uncached devices | |
US5895496A (en) | System for and method of efficiently controlling memory accesses in a multiprocessor computer system |
US7406554B1 (en) | Queue circuit and method for memory arbitration employing same | |
CN100573489C (en) | DMAC issue mechanism via streaming ID method | |
US7073004B2 (en) | Method and data processing system for microprocessor communication in a cluster-based multi-processor network | |
US20140136796A1 (en) | Arithmetic processing device and method for controlling the same | |
US20030014592A1 (en) | Elimination of vertical bus queueing within a hierarchical memory architecture | |
JP2002024007A (en) | Processor system | |
USRE38514E1 (en) | System for and method of efficiently controlling memory accesses in a multiprocessor computer system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A1 Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG US UZ VN YU ZW |
AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): GH GM KE LS MW SD SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | ||
NENP | Non-entry into the national phase |
Ref country code: KR |
ENP | Entry into the national phase |
Ref country code: JP Ref document number: 2000 528921 Kind code of ref document: A Format of ref document f/p: F |
WWE | Wipo information: entry into national phase |
Ref document number: 1999903307 Country of ref document: EP |
WWP | Wipo information: published in national office |
Ref document number: 1999903307 Country of ref document: EP |
REG | Reference to national code |
Ref country code: DE Ref legal event code: 8642 |
WWR | Wipo information: refused in national office |
Ref document number: 1999903307 Country of ref document: EP |
WWW | Wipo information: withdrawn in national office |
Ref document number: 1999903307 Country of ref document: EP |