WO2008014413A2 - Cross-threaded memory device and system - Google Patents

Cross-threaded memory device and system

Info

Publication number
WO2008014413A2
Authority
WO
WIPO (PCT)
Prior art keywords
memory
interfaces
data
devices
control
Prior art date
Application number
PCT/US2007/074513
Other languages
French (fr)
Other versions
WO2008014413A3 (en)
Inventor
Frederick A. Ware
Kishore Kasamsetty
Lawrence Lai
Wayne Fang
Liang Peng
Original Assignee
Rambus Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US11/460,582 external-priority patent/US7769942B2/en
Application filed by Rambus Inc. filed Critical Rambus Inc.
Publication of WO2008014413A2 publication Critical patent/WO2008014413A2/en
Publication of WO2008014413A3 publication Critical patent/WO2008014413A3/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14Handling requests for interconnection or transfer
    • G06F13/16Handling requests for interconnection or transfer for access to memory bus
    • G06F13/1668Details of memory controller
    • G06F13/1684Details of memory controller using multiple buses
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14Handling requests for interconnection or transfer
    • G06F13/16Handling requests for interconnection or transfer for access to memory bus
    • G06F13/1605Handling requests for interconnection or transfer for access to memory bus based on arbitration
    • G06F13/1652Handling requests for interconnection or transfer for access to memory bus based on arbitration in a multiprocessor architecture

Definitions

  • TECHNICAL FIELD [0002] The disclosure herein relates to data storage and retrieval systems.
  • Figure 1 illustrates an embodiment of a cross-threaded memory system;
  • Figure 2 illustrates the timing of a round-robin memory access scheme that may be applied within the cross-threaded memory system of Figure 1;
  • Figure 3 illustrates a more specific embodiment of a cross-threaded memory system in which buffer devices and memory devices are disposed within multi-chip-package memory subsystems;
  • Figure 4 illustrates an exemplary layout of the cross-threaded memory system of Figure 3, with memory subsystems disposed in a central region of a printed circuit board between central processing units or other memory access requestors;
  • Figure 5 is an exemplary timing diagram for a memory read operation carried out within the cross-threaded memory system of Figure 3;
  • Figure 6 is an exemplary timing diagram for a memory write operation carried out within the cross-threaded memory system of Figure 3;
  • Figure 7 illustrates an embodiment of an address buffer that may be used to implement the address buffer depicted in Figure 3;
  • Figure 8 illustrates an embodiment of a data buffer that may be used to implement the data buffers depicted in Figure 3;
  • Figure 9 illustrates an exemplary timing arrangement for a memory read operation within a cross-threaded memory system that includes the address buffer shown in Figure 7 and data buffers as shown in Figure 8;
  • Figure 10 illustrates an exemplary timing arrangement for a memory write operation within a cross-threaded memory system that includes the address buffer shown in Figure 7 and data buffers as shown in Figure 8;
  • Figure 11 illustrates an exemplary arrangement of memory access queues within the central processing units of Figure 3 and their relation to memory banks within memory devices of the memory subsystems;
  • Figure 12 illustrates an embodiment of a memory device having on-die integrated circuitry to enable multiple access requestors to simultaneously access respective storage resources;
  • Figure 13 illustrates an embodiment of a memory module having multiple cross-threading memory devices coupled in a paired multi-drop configuration;
  • Figure 14 illustrates an embodiment of a cross-threading memory device having two additional pairs of storage banks to provide a total of four sets of storage banks that may be independently and simultaneously accessed via two sets of four data input/output ports and four control ports;
  • Figure 15 illustrates an embodiment of a cross-threading memory device having conductive interconnects to couple counterpart data input/output ports together;
  • Figure 16 illustrates physical and logical views of a cross-threading memory device according to an embodiment having two separate instances of the cross-threading architecture depicted in Figure 12;
  • Figure 17 illustrates an embodiment of a memory system 810 in which individual cross-threading memory devices communicate via chip-to-chip interfaces to enable increased cross-threading operation.
  • a memory subsystem having one or more integrated-circuit (IC) devices that enable multiple memory access requestors to concurrently access a set of shared memory devices is disclosed in various embodiments.
  • each such IC device referred to herein as a buffer IC or buffer device, may include circuitry to switchably couple any one of the memory access requestors to any one of the memory devices and to concurrently couple each of the other memory access requestors to others of the memory devices in accordance with a channel select signal.
  • all the memory access requestors may concurrently access the collective memory devices during a given switching interval, with each requestor accessing a respective one of the memory devices.
  • the channel select signal may be changed to establish a different switched connection between requestors and memory devices for the subsequent switching interval.
  • the channel select signal may be stepped through a repeating sequence of values so that each of the memory access requestors is provided with time-multiplexed access to each of the memory devices in round-robin fashion.
  • multiple graphics controllers may be operated in parallel to carry out pipelined graphics processing operations using a shared memory structure and without requiring the controllers to become idle or otherwise wait while other controllers finish accessing a shared memory device.
  • the concurrent accesses to the various memory devices by different controllers are referred to herein as cross-threads, and the overall memory system formed by the multiple controllers, one or more buffer devices and memory devices is referred to herein as a cross-threaded memory system.
  • each of the buffer devices may include multiple control interfaces and multiple memory interfaces.
  • each of the control interfaces may be coupled to a respective memory access requestor and each of the memory interfaces may be coupled to a respective memory device.
  • each of the memory access requestors may be a graphics controller or processor and may be implemented on a dedicated integrated circuit die or on a die that may include one or more other graphics controllers, and each of the memory devices may be an integrated circuit die or group of integrated circuit dice.
  • the integrated circuit dice on which the memory devices and buffer devices are formed may be disposed within a multiple-die IC package, including, without limitation, a system-in-package (SIP), package-in-package (PIP) or package-on-package (POP) arrangement.
  • circuitry for enabling multiple concurrent accesses to different storage resources within a given memory device is provided in the integrated circuit memory device itself.
  • memory devices having multiple control and data interfaces for coupling to respective memory access requestors are disclosed in various embodiments. While the individual storage resources (e.g., storage banks and/or storage sub-banks) are single-ported in that only one access is carried out in the storage resource over a given interval, respective access paths to different storage resources are provided to enable different memory access requestors to concurrently read and/or write data to respective storage resources.
  • Figure 1 illustrates an embodiment of a cross-threaded memory system 100 that may include multiple memory access requestors 101A-101D, buffer devices 103₁-103₄ and memory devices 105W-105Z.
  • the memory access requestors may be special or general purpose processors, such as microprocessors, graphics processors, graphics controllers, microcontrollers and the like, or more task-specific devices such as direct-memory-access (DMA) controllers, application-specific integrated circuits (ASICs), or any other type of memory access requestor, including combinations of different types of memory access requestors.
  • each of the buffer devices 103 may be implemented in a respective integrated circuit die, though two or more (or all) of the buffer devices may be combined within a single integrated circuit die. Also, as discussed in further detail below, the buffer devices 103, memory devices 105 and/or memory access requestors 101 may be combined in a multi-chip package including, without limitation, a system-in-package (SIP), package-on-package (POP), package-in-package (PIP) or the like.
  • Each of the buffer devices 103 may include multiple control interfaces 115 (designated A-D) each coupled to a respective one of the requestors 101A-101D via an n-conductor signal path 102, and also multiple memory interfaces 117 each coupled to a respective one of the memory devices 105W-105Z via an m-conductor signaling path 104.
  • Each of the n-conductor signal paths 102 is used to convey control and address information, as well as data, associated with each memory transaction.
  • Each of the m-conductor signaling paths 104 is similarly used to convey data and control/address information associated with each memory transaction.
  • control-side signaling paths 102 may be each formed by one or more signaling links (which may each include a single conductor in a single-ended signaling arrangement or two conductors in a differential signaling arrangement) that are fewer in number, but operated at higher signaling rate, than the signaling links which form the memory-side signaling paths 104 (i.e., the signaling paths between the buffer ICs 103 and the memory devices 105), thus enabling narrower but faster control-side signaling paths 102 to match the bandwidth of wider, but slower memory-side signaling paths 104.
  • Each of the buffer devices 103 may additionally include a switching circuit 119 or multiplexing circuit disposed between the control interfaces 115 and memory interfaces 117 to enable flexible, switched interconnection of the control interfaces 115 and memory interfaces 117.
  • the switching circuit 119 may couple any one of the control interfaces 115 exclusively to any one of the memory interfaces 117, and concurrently (i.e., at least partly overlapping in time) couple each of the other control interfaces exclusively to another of the memory interfaces.
  • control interfaces A, B, C and D may be switchably coupled to memory interfaces W, X, Y and Z, respectively, in response to a first state of the channel select signal, while in a subsequent interval, the channel select signal may be changed so that control interfaces A, B, C and D are switchably coupled to memory interfaces X, Y, Z and W, respectively.
  • though memory devices 105 may be implemented using virtually any type of storage technology, in the embodiment of Figure 1 and other embodiments described below, the memory devices 105 may be dynamic random access memory (DRAM) devices (including, for example and without limitation, DRAM devices of various data rates (SDR, DDR, etc.), graphics memory devices (e.g., GDDR), XDR memory devices, micro-threading memory devices, for example as described in U.S. Patent Application Publication No. US2006/0117155 A1, and so forth) having multiple storage banks (referred to herein simply as "banks") and that exhibit a minimum time delay (tRR) between successive accesses to rows within different banks and a minimum time delay (tRC) between successive accesses to different rows within the same bank.
  • a minimum time delay may also be imposed between successive accesses to different columns of data within an activated row, where an activated row is one whose contents have been retrieved from an address selected row of DRAM storage cells and latched within a bank of sense amplifiers.
  • each of the four memory devices 105W-105Z may include a memory core formed by four address-selectable memory banks 107P-107S (the banks being designated P, Q, R and S) and control logic 110 to store data within and retrieve data from the memory core in response to memory access commands.
  • control logic 110 may include multiple data I/O ports coupled to respective memory-side data paths 104 and thus may receive slices of data (via each data I/O port) that collectively form a write data word to be stored in a memory write transaction or to output slices of data that collectively form a read data word in a memory read transaction.
  • One or more separate control ports may be provided within each memory device 105 for receipt of control information (e.g., commands or requests indicating the requested operation and, at least in the case of a memory read or write, one or more address values that specify the bank, row and/or column location to which the operation is directed), or the control information may be time-multiplexed onto one or more of the data paths 104 and received via the data I/O ports.
  • the control logic 110 may activate an address-specified row of storage cells within an address-specified bank (i.e., in an activate or activation operation), if the row has not already been activated, then may retrieve read data through one or more read accesses directed to address-specified column locations within the activated row of an address-specified bank.
  • the read data may be output to the buffer devices 103₁-103₄ in respective slices (i.e., portions of the entire read data word) via data paths 104, and the buffer devices 103₁-103₄, in turn, may forward the read data to a selected one of memory access requestors 101A-101D via switching circuits 119 and controller interfaces 115.
  • control logic 110 may also activate an address-specified row of storage cells within an address-specified bank, if not already activated, then may perform one or more write accesses directed to address-specified column locations within the activated row for an address-specified bank to store a write data word received via data paths 104.
  • Figure 2 illustrates the timing of a round-robin memory access scheme that may be applied within the cross-threaded memory system 100 of Figure 1.
  • a two-bit channel select signal ("Channel Select") may be provided to each of the buffer devices 103 and may be repeatedly stepped through states '00', '01', '10' and '11' in respective tRC intervals 126₁-126₄.
  • each of the buffer devices 103 may couple control interface 115-A (i.e., interface A within control interfaces 115) to memory interface 117-W (i.e., interface W within memory interfaces 117) during interval 126₁ so that each of the four data I/O ports within memory 105W may be switchably coupled to requestor 101A via a respective one of the buffer devices 103₁-103₄.
  • memory device 105W may be accessed (i.e., through each of its four data I/O ports in parallel) by memory access requestor 101A during each of four tRR intervals that make up tRC interval 126₁, as indicated by the designation 'A', 'A', 'A', 'A' in the 'Memory W' access sequence of Figure 2.
  • memory access requestor 101B may be switchably coupled to memory device 105X via control interfaces 115-B and memory interfaces 117-X within the four buffer devices 103₁-103₄; memory access requestor 101C may be switchably coupled to memory device 105Y via control interfaces 115-C and memory interfaces 117-Y; and memory access requestor 101D may be switchably coupled to memory device 105Z via control interfaces 115-D and memory interfaces 117-Z.
  • the channel select signal may be changed (i.e., stepped or sequenced) to state '01' to switchably couple memory access requestors 101A, B, C and D to memory devices 105Z, W, X and Y, respectively.
  • the channel select signal may be changed to state '10' to switchably couple memory access requestors 101A, B, C and D to memory devices 105Y, Z, W and X, respectively, and in a final switching interval (tRC interval 126₄) before the channel select signal rolls over to repeat the channel selection sequence, the channel select signal may be changed to state '11' to couple memory access requestors 101A, B, C and D to memory devices 105X, Y, Z and W, respectively.
  • each of the four memory access requestors 101A-101D may access each of the four memory devices 105W-105Z during a respective tRC interval and, thus, the total time to sequence through each possible interconnection pattern is 4*tRC (where '*' denotes multiplication), a time interval referred to herein as a switch-pattern cycle time.
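The four channel-select states listed above amount to rotating the requestor-to-device coupling by one position per tRC interval. The following is a minimal sketch of that rotation, not circuitry from the disclosure; the function names and the use of integer indices for the requestors and devices are assumptions made for illustration.

```python
from __future__ import annotations

# Illustrative model of the Figure 2 coupling pattern; all names are hypothetical.
REQUESTORS = ["A", "B", "C", "D"]   # memory access requestors 101A-101D
DEVICES = ["W", "X", "Y", "Z"]      # memory devices 105W-105Z


def coupled_device(requestor: str, channel_select: int) -> str:
    """Return the device coupled to `requestor` in a given channel-select state."""
    r = REQUESTORS.index(requestor)
    # Each step of the channel select moves every requestor back by one device.
    return DEVICES[(r - channel_select) % len(DEVICES)]


def coupling_table() -> dict[int, dict[str, str]]:
    """Map each channel-select state (0..3) to its requestor -> device coupling."""
    return {
        state: {req: coupled_device(req, state) for req in REQUESTORS}
        for state in range(len(DEVICES))
    }


if __name__ == "__main__":
    for state, coupling in coupling_table().items():
        print(f"{state:02b}", coupling)
```

Because the mapping is a rotation, every state couples each requestor to a different device, which is the exclusivity property the switching circuit is described as providing.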
  • a bank-select value (or bank address) may be sequenced through each of four possible bank selection values during each switching interval 126 (i.e., each tRC interval) to enable each memory access requestor 101 to access each memory bank 107 of the selected memory device 105 in a respective tRR interval 127.
  • memory access requestor 101A may be enabled to access memory banks 107P, 107Q, 107R and 107S, respectively, within memory device 105W, and memory access requestors 101B, 101C and 101D are likewise (and concurrently) enabled to access memory banks 107P, 107Q, 107R and 107S within memory devices 105X, 105Y and 105Z, respectively.
  • Other bank selection sequences may be applied in alternative embodiments, particularly where more or fewer banks 107 are provided within each memory device 105.
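Combining the channel-select rotation with the bank sequence gives a 16-slot schedule per requestor over one switch-pattern cycle. The sketch below enumerates that schedule under the four-requestor, four-device, four-bank arrangement of Figure 1; the helper name `access_schedule` and the string encodings are illustrative assumptions.

```python
from __future__ import annotations

from itertools import product

REQUESTORS = "ABCD"   # requestors 101A-101D
DEVICES = "WXYZ"      # memory devices 105W-105Z
BANKS = "PQRS"        # banks 107P-107S


def access_schedule(requestor: str) -> list[tuple[str, str]]:
    """List the (device, bank) visited in each tRR slot of one 4*tRC cycle."""
    r = REQUESTORS.index(requestor)
    schedule = []
    # Outer loop: channel-select state (one per tRC); inner loop: tRR slot.
    for state, slot in product(range(4), range(4)):
        device = DEVICES[(r - state) % 4]   # coupling per the Figure 2 states
        bank = BANKS[slot]                  # bank address stepped P, Q, R, S
        schedule.append((device, bank))
    return schedule


if __name__ == "__main__":
    for req in REQUESTORS:
        print(req, access_schedule(req))
```

In this model each requestor reaches all sixteen device/bank pairs once per cycle, and a given bank of a given device is revisited every four tRR slots, which is consistent with four tRR intervals making up one tRC interval.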
  • although each of the multi-bank memory devices 105 has been described as being implemented by a single IC, multiple memory ICs may be accessed as a unit, referred to herein as a memory rank, with each memory device within the memory rank contributing a respective subset of the data I/O ports that form the total collection of data I/O ports shown for a given memory device 105.
  • Figure 3 illustrates a more specific embodiment of a cross-threaded memory system 200 in which buffer devices (205I-205L and 206) and memory devices 207W-207Z may be disposed within multi-chip-package memory subsystems 203₁-203₄.
  • each multi-chip package memory subsystem 203 is depicted and described as a system-in-package (SIP) arrangement (i.e., multiple die within a single integrated circuit package).
  • the multi-chip package memory subsystems 203 may alternatively be, for example and without limitation, a system-on-chip (SOC), package-in-package (PIP, an arrangement in which two or more IC packages are included within a larger IC package) or package-on-package (POP, an arrangement in which one or more IC packages are mounted or otherwise disposed on another IC package).
  • the memory access requestors are depicted and described as central processing units (CPUs) 201A-201D, though virtually any device or system of devices capable of initiating memory access requests, either in response to programmed control or requests or commands from another device, may alternatively be used to implement one or more of the CPUs 201. Further, for purposes of example only, a specific number of CPUs 201, memory subsystems 203 and memory devices/buffer devices (207, 205, 206) per memory subsystem 203 are shown. More or fewer CPUs, memory subsystems, memory devices and/or buffer devices may be provided in alternative embodiments.
  • each memory subsystem 203 may include a set of four multi-bank memory devices 207 (four-bank memory devices in this example), a set of data buffer devices 205I-205L (data buffers) and an address buffer device 206 (address buffer).
  • Each memory device 207 may include a control logic circuit 211 having a data interface 212 and a command/address (CA) interface 214, with the data interface 212 including four data input/output (I/O) ports (DQ0-DQ3) coupled to data buffers 205I-205L, respectively, via data paths 216, and the CA interface 214 coupled to the address buffer 206 via CA path 218.
  • the memory devices 207 may be synchronous double-data rate (DDR) DRAM devices that respond to commands and addresses received at CA interface 214, by outputting read and receiving write data via data interface 212.
  • timing information e.g., clocking information to time receipt of incoming command/address values and to provide a timing reference within the synchronous DRAM device, and strobe signals to time inbound and outbound data transfer
  • control information e.g., clock enable, chip select
  • each of the CPUs 201A-201D may include multiple memory access queues 221 (memory queues, for short) numbered 1-4, with each of the memory queues 221 coupled to a respective one of the memory subsystems 203₁-203₄ via a set of control-side data paths 222 and a control-side command/address (CA) path 224.
  • each of the data paths 222 and CA paths 224 may be implemented by a single-bit differential, point-to-point signaling link that may be operated at a signaling rate that is an integer multiple of the signaling rate applied across the memory-side data paths and address path.
  • each of the five control-side signaling links coupled to a given memory queue 221 may operate at 2 Gigabits per second (2 Gb/s), while the memory-side data paths 216 are operated at 0.2 Gb/s and the memory-side CA path 218 may operate at 0.1 Gb/s.
  • each of the five buffer devices (205I-205L and 206) within a memory subsystem 203 may include multiple control interfaces 234 coupled respectively to CPUs 201A-201D, multiple memory interfaces 236 coupled respectively to the constituent memory devices 207 of the memory subsystem, and switching circuitry 235 to enable concurrent and exclusive coupling between the control interfaces 234 and memory interfaces 236 as necessary to provide switched access to each of the memory devices by each of the CPUs.
  • there may be four memory interfaces 236 (designated W-Z, and thus referred to herein as 236-W, 236-X, 236-Y and 236-Z) coupled respectively to the four memory devices 207W-207Z, and four control interfaces 234 (designated A-D and referred to herein as 234-A, 234-B, 234-C and 234-D) coupled respectively to the four CPUs 201A-201D.
  • the number of memory interfaces 236 and/or control interfaces 234 may change with the number of memory devices and/or CPUs (or other memory access requestors).
  • Figure 4 illustrates an exemplary layout of the cross-threaded memory system 200 of Figure 3, with memory subsystems 203₁-203₄ disposed in a central region of a printed circuit board 250 between CPUs 201A-201D.
  • the memory subsystems may be SIPs (SIP1-SIP4) each having a substrate 255 with memory devices 207W-207Z mounted thereto.
  • the data buffers 205I-205L may be mounted on the memory devices 207W-207Z, respectively, and the address buffer 206 may be disposed centrally on the substrate 255 between the memory devices 207.
  • Each of the CPUs 201A-201D is coupled to each of the SIP memory subsystems 203₁-203₄ by a respective set of five point-to-point links 202 operated, for example, at 2 Gb/s.
  • the memory subsystems 203 are depicted as mounted on their sides but may alternatively be disposed face-down or face-up on the printed circuit board 250.
  • the printed circuit board 250 itself may be a daughterboard having an interconnection structure (e.g., edge connector) for insertion within a socket of a larger circuit board or backplane, or may be a main board within a data processing system such as a gaming console, workstation, etc.
  • more or fewer CPUs 201 and/or memory subsystems 203 may be provided in alternative embodiments, and the memory subsystems 203 may have more or fewer constituent buffer devices (205, 206) and/or memory devices 207 and may be implemented by structures other than system-in-package.
  • Figure 5 is an exemplary timing diagram for a memory read operation carried out within the cross-threaded memory system 200 of Figure 3, showing in particular the control information and data conveyed between memory queue 221-1 ("Queue 1") of CPU 201A and memory device 207W ("Memory W") of memory subsystem 203₁.
  • the tRC interval may be 80 nanoseconds (80ns), and the tRR interval may be 20ns.
  • This timing arrangement may permit a total of 40 bits of information (2 bits/ns) to be transferred via each of the five single-bit 2 Gb/s links (i.e., 5x1-bit) between Queue 1 and SIP1 in each tRR interval.
  • an activation command may be conveyed via the control-side command/address link (designated "Queue 1: CA" in Figure 5) in the 20-bit (i.e., 10ns) interval that constitutes the first half of tRR interval 271₁.
  • the address buffer 206 may include circuitry to deserialize (i.e., convert to parallel form) the incoming serial command/address bit stream to form an activation control word 274 ("ACT") that includes the address of a row to be activated (i.e., a 13-bit row address value, "13xA,” in this example), and a corresponding row-activation command encoded into signals WE, CAS and RAS.
  • the row-activation control word 274 may be output onto the memory-side command/address path (designated "Memory W: CA" in Figure 5) at 0.1 Gb/s (e.g., at single-data rate with respect to a 100 MHz clock signal) and thus at a command path bit time (tCABIT) of 10ns that spans the second half of tRR interval 271₁.
  • because only sixteen bits of information are conveyed via the memory-side CA path per 10ns interval, versus 20 bits via the control-side CA path, additional bandwidth may be available on the control-side CA path (8 bits per tRR interval or 4 bits per command/address transfer) and may be used to convey error information and/or to support error handling protocols as discussed below.
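The spare-bandwidth figures quoted above follow from the example rates and intervals. The short calculation below reproduces them as a sanity check; the constant and function names are illustrative, and exact rational arithmetic is used only to avoid floating-point rounding.

```python
from fractions import Fraction

# Example figures taken from the text; identifiers are hypothetical.
T_RR_NS = 20                          # tRR interval
TRANSFER_NS = 10                      # one command/address transfer (half of tRR)
CONTROL_RATE = Fraction(2)            # control-side link: 2 Gb/s = 2 bits/ns
MEM_CA_RATE = Fraction(1, 10)         # memory-side CA: 0.1 Gb/s per signal line
MEM_CA_WIDTH = 16                     # 13 address bits plus WE, CAS and RAS


def control_bits(ns: int) -> int:
    """Bits carried by one serial control-side link in `ns` nanoseconds."""
    return int(CONTROL_RATE * ns)


def memory_ca_bits(ns: int) -> int:
    """Bits carried by the parallel memory-side CA path in `ns` nanoseconds."""
    return int(MEM_CA_RATE * MEM_CA_WIDTH * ns)


spare_per_transfer = control_bits(TRANSFER_NS) - memory_ca_bits(TRANSFER_NS)
spare_per_trr = control_bits(T_RR_NS) - memory_ca_bits(T_RR_NS)

if __name__ == "__main__":
    print("bits per link per tRR:", control_bits(T_RR_NS))
    print("spare CA bits per transfer:", spare_per_transfer)
    print("spare CA bits per tRR:", spare_per_trr)
```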
  • a column read command may be conveyed to the address buffer via the control-side CA path.
  • the address buffer may convert the serial bit stream in which the column read command is conveyed into a sixteen-bit column-read control word 276 that includes a three-bit column-read code (signaled by the encoding of WE, CAS and RAS signals) and a 13-bit column address.
  • the control word 276 is output to memory device 207W (i.e., as shown in Figure 4) during the first half of tRR interval 271₂.
  • the correspondence between the command/address information conveyed via the control-side CA path and the memory-side CA path is shown in Figure 5 by the lightly shaded activation command and darker shaded column read command.
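The deserialization step performed by the address buffer can be pictured as splitting each 20-bit serial frame into a 3-bit command code, a 13-bit address and 4 leftover bits. The sketch below is one possible model: the source gives the field widths but not the bit ordering, so the MSB-first layout (command, then address, then spare bits) and all identifiers here are assumptions.

```python
from __future__ import annotations

from typing import NamedTuple

FRAME_BITS = 20   # serial bits per command/address transfer on the control side
ADDR_BITS = 13    # row or column address width
SPARE_BITS = 4    # control-side bits not forwarded to the memory device


class ControlWord(NamedTuple):
    command: int  # 3-bit code carried on WE, CAS and RAS
    address: int  # 13-bit row or column address
    spare: int    # 4 leftover bits (e.g., available for error information)


def deserialize(frame: list[int]) -> ControlWord:
    """Convert a 20-bit serial frame (assumed MSB first) into parallel fields."""
    if len(frame) != FRAME_BITS or any(bit not in (0, 1) for bit in frame):
        raise ValueError("frame must contain exactly 20 bits")
    value = 0
    for bit in frame:
        value = (value << 1) | bit
    spare = value & 0xF
    address = (value >> SPARE_BITS) & 0x1FFF
    command = (value >> (SPARE_BITS + ADDR_BITS)) & 0x7
    return ControlWord(command, address, spare)


def serialize(word: ControlWord) -> list[int]:
    """Inverse of deserialize, using the same assumed field layout."""
    value = (word.command << (SPARE_BITS + ADDR_BITS)) | (word.address << SPARE_BITS) | word.spare
    return [(value >> i) & 1 for i in range(FRAME_BITS - 1, -1, -1)]
```

A round trip such as `deserialize(serialize(ControlWord(0b011, 0x1ABC, 0b1010)))` returns the original fields, which is a convenient check that the assumed layout is self-consistent.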
  • Memory device 207W may respond to the activation control word 274 by activating the address-specified row of memory cells within a selected memory bank, thus making the contents of the row available for read and write access in subsequent column operations.
  • the bank address may be stepped through a predetermined sequence of values in successive tRR intervals 271 and thus may be generated within the address buffer (e.g., by a modulo counter), within one or more of the CPUs 201 shown in Figure 3, or within another integrated circuit device, not shown in Figure 3.
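The modulo counter mentioned above as one way to generate the bank address could behave as in the following sketch. The class name, the `step` method and the default of four banks are illustrative assumptions; the source does not specify where the counter is reset or how it is clocked.

```python
class BankCounter:
    """Hypothetical modulo counter that steps the bank address once per tRR interval."""

    def __init__(self, num_banks: int = 4) -> None:
        if num_banks < 1:
            raise ValueError("num_banks must be positive")
        self.num_banks = num_banks
        self.value = 0

    def step(self) -> int:
        """Return the bank address for the current tRR interval, then advance."""
        current = self.value
        self.value = (self.value + 1) % self.num_banks
        return current


if __name__ == "__main__":
    counter = BankCounter()
    print([counter.step() for _ in range(8)])
```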
  • memory device 207W may perform a column read operation at the column address specified in association with the column-read control word 276 (and in the bank specified by the sequenced bank address) to retrieve a data word 280 that is output to the data buffers 205 starting a predetermined time, tCAC, after the column-read control word 276 has been received. More specifically, the data word 280 may be output during four successive 5ns data-bit intervals (i.e., tDQBIT, 0.2 Gb/s) within tRR interval 271₃, and in respective slices via the four byte-wide data lanes, DQ0-DQ3, that constitute the 32-bit data path coupled to memory device 207W.
  • the signals output via each of the data lanes may include eight data bits ("8xQ”) and may be accompanied by a differential data strobe signal (“2xDQS”) that is used to time sampling of the read data within the data buffers 205.
  • a total of 128 bits of read data are output from memory device W in response to the column read command, with four bytes being output via respective memory-side byte lanes in each of four consecutive 5 ns data-bit intervals.
  • the read data may be output in more serial form (282) from the data buffers 205 to CPU 201A where it is buffered in memory queue 221-1.
  • each of the data buffers 205I-205L may output a set of eight data bits, 8xQ, along with error bits EW and ER in each 5ns interval of tRR interval 271₄ and via a respective one of control-side data links DQ0-DQ3.
  • each data buffer may output 32 bits of data and eight error bits over tRR interval 271₄, with the data buffers collectively returning 128 bits of data and 32 bits of error information to CPU 201A in response to the activation and column read commands issued in tRR interval 271₁.
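The read-return totals above can be checked against the capacity of a 2 Gb/s control-side link over one 20 ns tRR interval. The following tally is a worked check only; the identifiers are illustrative and the figures are the example values from the text.

```python
from __future__ import annotations

from fractions import Fraction

DATA_BUFFERS = 4                      # data buffers 205I-205L, one per byte lane
BEATS_PER_TRR = 4                     # four 5 ns data-bit intervals per tRR
DATA_BITS_PER_BEAT = 8                # "8xQ"
ERROR_BITS_PER_BEAT = 2               # EW and ER
LINK_RATE_BITS_PER_NS = Fraction(2)   # 2 Gb/s control-side link
T_RR_NS = 20


def read_return_budget() -> dict[str, int]:
    """Tally data and error bits returned to the requestor in one tRR interval."""
    data_per_buffer = BEATS_PER_TRR * DATA_BITS_PER_BEAT
    error_per_buffer = BEATS_PER_TRR * ERROR_BITS_PER_BEAT
    return {
        "data_per_buffer": data_per_buffer,
        "error_per_buffer": error_per_buffer,
        "data_total": DATA_BUFFERS * data_per_buffer,
        "error_total": DATA_BUFFERS * error_per_buffer,
        "link_capacity": int(LINK_RATE_BITS_PER_NS * T_RR_NS),
    }


if __name__ == "__main__":
    print(read_return_budget())
```

The 32 data bits and 8 error bits per buffer together fill the 40-bit capacity of each control-side data link, so the error bits occupy the headroom left by the narrower-but-faster link.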
  • the error read bit, ER, included with each read data byte may be generated by an error-bit generator (e.g., a parity bit generator) within one of the data buffers 205 based on the corresponding read data byte.
  • the error write bit, EW, may be generated based on one or more write data bytes received within the data buffer in prior write transactions.
  • Figure 6 is an exemplary timing diagram for a memory write operation carried out within the cross-threaded memory system 200 of Figure 3 and, like Figure 5, shows in particular the control information and data conveyed between memory queue 221-1 (Queue 1) of CPU 201A and memory device 207W of memory subsystem 203₁.
  • the tRC interval may be 80ns, and the tRR interval 20ns, thus permitting a total of 40 bits of information (2 bits/ns) to be transferred via each of the 2 Gb/s links between Queue 1 and memory subsystem 203₁ in each tRR interval.
  • an activation command may be conveyed via the control-side CA link in the 20-bit interval that constitutes the first half of tRR interval 311₁, and may be deserialized by the address buffer to generate a row-activation control word 320 ("ACT") that may include the address of the row to be activated (13xA), and a corresponding row-activation command in a 3-bit command code (encoded within WE, CAS and RAS signals).
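As one concrete reading of the parity-bit example given for ER, the sketch below computes a parity bit over a read data byte and appends it, together with EW, to the eight data bits sent in one 5 ns interval. The parity polarity, the bit ordering within the 10-bit group and the placeholder default for EW are assumptions; the source states only that ER may come from a parity bit generator.

```python
from __future__ import annotations


def parity_bit(byte: int) -> int:
    """Return 1 if `byte` has an odd number of 1 bits (assumed polarity)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must be in the range 0..255")
    return bin(byte).count("1") % 2


def frame_read_byte(byte: int, ew: int = 0) -> list[int]:
    """Build the 10-bit group for one 5 ns interval: 8 data bits, then EW, then ER.

    The ordering is hypothetical; `ew` stands in for an error write bit
    derived from earlier write transactions.
    """
    er = parity_bit(byte)
    data_bits = [(byte >> i) & 1 for i in range(7, -1, -1)]
    return data_bits + [ew, er]


if __name__ == "__main__":
    print(frame_read_byte(0b10110000))
```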
  • ACT row-activation control word 320
  • the row-activation control word 320 may be output onto the memory-side CA path at 0.1 Gb/s ("ACT") at a command path bit time (tc AB i ⁇ ) of 10ns and thus spans the second half of tRR interval 31 I 1 . Because only sixteen bits of information are conveyed via the memory-side CA path per 10ns interval versus 20-bits via the control-side CA path, additional bandwidth may be available on the control-side CA path and may be used convey error information and/or to support error handling protocols.
  • a column write command is conveyed to the address buffer 206 via the control-side CA path.
  • the address buffer 206 converts the serialized write command into a 16-bit column write control word 322 ("WR") that includes a three-bit column-write code (encoded within the WE, CAS and RAS signals) and a 13-bit column address.
  • the correspondence between the command/address information conveyed via the control-side CA path and the memory-side CA path is shown by light grey shading for row-activation control word 320 and dark grey shading for column write control word WR 322.
  • Memory device 207W may respond to the row-activation control word ACT 320 by activating the address-specified row of memory cells within a selected memory bank, thus making the contents of the row available for read and write access in subsequent column operations.
  • the bank address may be stepped through a predetermined sequence of values in successive tRR intervals and thus may be generated within address buffer 206 (e.g., by a modulo counter), within one or more of the CPUs 201, or within another integrated circuit device.
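  • by way of illustration only, the modulo-counter style of bank sequencing mentioned above may be sketched as follows. The bank labels P, Q, R and S follow the description; the function name `bank_sequence`, the start value and the strictly ascending order are assumptions made for this sketch, and other predetermined sequences may equally be used.

```python
from itertools import islice

BANKS = ("P", "Q", "R", "S")  # bank labels used in the description


def bank_sequence(start: int = 0):
    """Yield one bank label per tRR interval, wrapping like a modulo-4 counter.

    The ascending P, Q, R, S order is an assumption; the description only
    requires a predetermined sequence.
    """
    count = start % len(BANKS)
    while True:
        yield BANKS[count]
        count = (count + 1) % len(BANKS)


# Bank selected in each of six successive tRR intervals.
print(list(islice(bank_sequence(), 6)))
```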
  • write data may be transferred from the data buffers 205I-205L to memory device 207W for storage therein at the column address specified within the column write control word (and in the bank specified by the sequenced bank address), thus effecting a column write operation.
  • the write data may be output from the data buffers 205 to memory device 207W during four successive 5ns data-bit intervals (i.e., tDQBit, 0.2Gb/s) within tRR interval 311_3, and in respective slices via the four byte-wide data lanes, DQ0-DQ3, that constitute the 32-bit data path coupled to memory device 207W.
  • the signals transmitted to the memory device 207W may be counterparts to those transmitted by the memory device 207W during a memory read, and thus include eight data bits (8xQ) accompanied by a differential data strobe signal (2xDQS).
  • a total of 128 bits of write data may be transmitted to memory device W in conjunction with the column write command, with four bytes being output via respective byte lanes DQ0-DQ3 in each of four consecutive 5 ns data-bit intervals.
  • the memory device 207W may store the write data at the address-specified column location of the address-specified bank to conclude the memory write operation.
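  • as a non-limiting sketch of the slicing just described, 128 bits of write data may be viewed as sixteen bytes spread over four byte lanes and four 5ns data-bit intervals. The function name `slice_write_data` and the particular byte-to-lane ordering are assumptions for illustration; the description does not fix the ordering.

```python
def slice_write_data(data: bytes, lanes: int = 4, beats: int = 4) -> list[list[int]]:
    """Return beats[beat][lane]: the byte driven on each lane in each interval.

    Assumed ordering: consecutive bytes fill lanes DQ0..DQ3 of one interval
    before moving on to the next interval.
    """
    if len(data) != lanes * beats:
        raise ValueError("expected exactly lanes * beats bytes")
    return [list(data[b * lanes:(b + 1) * lanes]) for b in range(beats)]


beats = slice_write_data(bytes(range(16)))
for interval, lane_bytes in enumerate(beats):
    print(f"interval {interval}: DQ0-DQ3 = {lane_bytes}")
```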
  • Figure 7 illustrates an embodiment of an address buffer 350 that may be used to implement the address buffer 206 of Figure 3.
  • the address buffer 350 may include four conversion circuits 351_1-351_4, each having a high-speed serial control interface 352 to receive serialized command/address signals (ADR_A-ADR_D) from a respective memory access requestor (e.g., a respective one of CPUs 201A-201D in Figure 3), and a memory interface 375 to output command/address information in parallel to a respective memory device (e.g., a respective one of memory devices 207W-207Z in Figure 3).
  • each control interface 352 may be a single-link differential interface having a differential receiver 353 to sample an incoming signal at 2Gb/s.
  • Single-ended signaling interfaces may be provided in alternative embodiments.
  • a relatively low-frequency clock signal referred to herein as a framing signal 370 ("Frame") may be supplied to the address buffer 350 (and to each of the corresponding data buffers as described below) to provide a frequency reference and to frame transmission of related groups of signals.
  • the framing signal 370 may be a 100 MHz clock having a rising edge at the start of each half tRR interval, and thus frames 20-bit transmissions on the 2 Gb/s control-side data and command/address paths, two-bit transmissions on the 0.2 Gb/s memory-side data paths, and single-bit transmissions on the 0.1 Gb/s memory-side command/address paths.
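  • the framing relationship above follows from dividing each signaling rate by the frame rate, as the following sketch shows. The rates are the example values given in the description; the dictionary keys are labels chosen for this illustration only.

```python
FRAME_HZ = 100e6  # example framing-signal frequency from the description

# Example signaling rates, in bits per second per signal line.
rates_bps = {
    "control-side link": 2e9,
    "memory-side data line": 0.2e9,
    "memory-side CA line": 0.1e9,
}

# Bits conveyed on one line during one 10 ns frame (half of a 20 ns tRR interval).
bits_per_frame = {name: round(rate / FRAME_HZ) for name, rate in rates_bps.items()}

for name, bits in bits_per_frame.items():
    print(f"{name}: {bits} bit(s) per frame")
```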
  • the address buffer 350 (and corresponding data buffers) may include clocking circuitry (e.g., phase-locked-loop or delay-locked-loop circuitry and corresponding phase-adjust circuitry) to generate 2Gb/s control-side timing signals having desired phase offsets relative to the framing signal 370 or another reference.
  • the address buffer 350 may similarly include clock synthesis circuitry to generate timing signals (e.g., clock signal, CK, and write data strobe DQS) that are output to the memory devices to time reception of command/address and write data signals, and to enable the memory devices to generate read data timing signals (e.g., read data strobe, DQS).
  • the incoming 2Gb/s command/address signal, ADR_A, is sampled and deserialized (i.e., converted to parallel form) by receiver 353 to generate a 10-bit parallel command/address value 354 (PA_A) every 5 ns (i.e., at 0.2Gb/s).
  • each command/address value 354 includes eight bits of command/address information and an error-check bit (e.g., a parity bit), and is supplied to an error detection circuit 355 and also to an input port of a four-port multiplexer 357_1 (or other selector circuit).
  • the error detection circuit 355 generates an error-check bit based on the corresponding command/address byte and compares the generated error-check bit with the received error-check bit to generate an error indication 380 (ERA_A) having a high or low state (signaling error or no error) according to whether the error-check bits match.
  • Counterpart address conversion circuits 351_2-351_4 simultaneously generate error indications ERA_B, ERA_C and ERA_D, so that four error indications 380 are generated during each 5ns command/address reception interval.
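  • one possible form of the error-check comparison is sketched below, purely as an example. Even parity over the eight command/address bits is assumed here, since the description names a parity bit only as one option; the function names `parity_bit` and `error_indication` are hypothetical.

```python
def parity_bit(byte: int) -> int:
    """Even-parity bit over an 8-bit value (assumed check scheme)."""
    return bin(byte & 0xFF).count("1") & 1


def error_indication(ca_byte: int, received_check: int) -> int:
    """Return 1 (error) when the regenerated check bit differs from the
    received check bit, else 0 (no error)."""
    return int(parity_bit(ca_byte) != (received_check & 1))


ca_byte = 0b10110000                      # three bits set, so parity is 1
print(error_indication(ca_byte, 1))       # check bits match
print(error_indication(ca_byte, 0))       # check bits differ
```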
  • Channel multiplexer 357_1 outputs either command/address value PA_A (354) or one of the three command/address values PA_B-PA_D from counterpart conversion circuits 351, as a selected command/address value 360, depending on the state of a channel select signal 356.
  • Each of the channel multiplexers 357_2-357_4 within the counterpart conversion circuits 351_2-351_4 is coupled to receive the PA_A-PA_D values at respective input ports in an interconnection order that yields the following selection of command/address values (360) for the four possible values of a two-bit channel select signal 356:
  • the selected command/address value 360 is supplied to a delay circuit 359 which introduces a selectable delay in accordance with a delay select value 358.
  • the delay circuit 359 is implemented by a shift register in which the selected command/address value 360 is shifted forward from tail to head in response to a shift-enable signal (e.g., in response to the 2Gb/s sampling clock signal or a phase-shifted and/or frequency-divided version thereof), with the total number of storage stages from tail-to-head being selected to achieve a desired delay between receipt of an incoming serialized command/address value at control interface 352, and output of a final command code and address value at memory interface 375.
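  • a behavioral sketch of such a selectable-depth shift register is given below for illustration. The class name `DelayLine`, the initial fill value and the software model itself are assumptions; an actual delay circuit would be implemented in hardware storage stages.

```python
from collections import deque


class DelayLine:
    """Shift-register model: a value entering the tail emerges from the head
    after `stages` shift-enable events."""

    def __init__(self, stages: int, fill: int = 0):
        if stages < 1:
            raise ValueError("at least one storage stage is required")
        # maxlen makes append() discard the head element automatically.
        self._regs = deque([fill] * stages, maxlen=stages)

    def shift(self, value: int) -> int:
        head = self._regs[0]       # value leaving the head stage
        self._regs.append(value)   # new value enters at the tail
        return head


line = DelayLine(stages=3)         # stage count plays the role of the delay select
print([line.shift(v) for v in [1, 2, 3, 4, 5]])
```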
  • the resulting delayed command/address value 362 is supplied to a 2:1 deserializing circuit 361 which converts each successive pair of delayed, 10-bit command/address values 362 (each value 362 received at 0.2Gb/s) to a final 20-bit command/address value 364, with the resulting sequence of final command/address values 364 being output at 0.1Gb/s. As shown, within each 20-bit command/address value, four bits are unused, and the remaining 16 bits are output via memory interface 375.
  • command transmitter 365 outputs a 3-bit command encoded into signals WEw, RASw and CASw (the 'W' subscript denoting that the command is directed to Memory W), and address transmitter 367 outputs a corresponding 13-bit address value, Aw[12:0].
  • Counterpart conversion circuits 351_2-351_4 concurrently output 3-bit command codes and 13-bit address values directed to memory devices X, Y and Z.
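  • the 2:1 deserialization and the split into a 3-bit command and 13-bit address may be modeled as below. The bit layout is entirely an assumption made for this sketch (payload byte in bits [7:0] of each 10-bit value, first value carrying the upper payload byte, command in the top three payload bits); the description states only that four of the twenty bits are unused.

```python
def deserialize_pair(first: int, second: int) -> int:
    """Combine two successive 10-bit values into one 20-bit value."""
    return ((first & 0x3FF) << 10) | (second & 0x3FF)


def split_control_word(word20: int) -> tuple[int, int]:
    """Return (command, address) under the assumed layout.

    Bits [9:8] of each 10-bit half are treated as the unused/non-payload bits.
    """
    upper = (word20 >> 10) & 0xFF
    lower = word20 & 0xFF
    payload = (upper << 8) | lower          # 16 bits sent to the memory device
    return payload >> 13, payload & 0x1FFF  # 3-bit command, 13-bit address


word = deserialize_pair(0x3AA, 0x2BC)       # top two bits of each half are ignored
command, address = split_control_word(word)
print(bin(command), hex(address))
```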
  • a set of configuration signals 374 may be provided to the address buffer 350 to control various functions (e.g., establishing termination impedance, signaling calibration, etc.) and operating modes therein.
  • the address buffer 350 includes circuitry to support operation as either an address buffer as described above and in reference to address buffer 206 of Figure 3, or a data buffer as described below and in reference to data buffer 205 of Figure 3.
  • a given buffer device may be programmed to operate as either an address buffer or a data buffer, thus avoiding the need to fabricate separate integrated circuit devices.
  • Other configurable aspects of the device may include error detection policies, delay ranges, signal fan-out, signals driven on otherwise unused portions of the 20-bit output bandwidth, and so forth.
  • the configuration signals may also be used to select timing calibration modes during which phase offsets between reference and internal clock signals (or strobe signals or other timing signals) are established.
  • Figure 8 illustrates an embodiment of a data buffer 400 that may be used to implement data buffers 205I-205L of Figure 3.
  • Data buffer 400 includes four conversion circuits 401_1-401_4, each having a high-speed serial interface 402 to support serialized read and write data transfer to/from a respective memory access requestor (e.g., a respective one of four CPUs 201A-201D in Figure 3), and a lower-speed parallel-I/O memory interface 432 to support parallel read and write data transfer to/from a respective one of memory devices W-Z (e.g., memory devices 207W-207Z in Figure 3).
  • each high-speed serial interface 402 may include a single-link, differential signal receiver to sample an incoming serial data signal at 2Gb/s.
  • the framing signal 370 provides a frequency reference and frames transmission of related groups of signals as described in reference to Figure 7.
  • the framing signal 370 may be a 100 MHz clock signal having a rising edge at the start of each half tRR interval, and thus frames 20-bit transmissions over the control-side signal link coupled to interface 402, and two-bit transmissions on each memory-side data line coupled to interface 432.
  • the data buffer 400 may include clocking circuitry (e.g., locked-loop circuitry and corresponding timing adjustment circuitry) to generate 2Gb/s control-side timing signals having desired phase offsets relative to the framing signal 370, as well as clock synthesis circuitry to generate timing signals (e.g., strobe signals and clock signals having a desired phase relationship to the framing signal 370) that are output to the memory devices W-Z to time reception of address and write data (e.g., clock signal, CK, and write data strobe DQS) therein, and to enable the memory devices to generate read data timing signals (e.g., read data strobe, DQS).
  • write data delivered in the incoming 2Gb/s data signal, D_A, may be sampled and deserialized by receiver 403 to generate a 10-bit parallel data value 404 (PD_A) every 5 ns (i.e., at 0.2Gb/s).
  • each data value 404 may include a write data byte (i.e., 8 bits of write data), a data mask bit that indicates whether the write data value is to be written within the selected memory device, and an error-check bit generated by the memory access requestor based on the write data byte and mask bit.
  • Data value 404 may be supplied to an error detection circuit 405 and also to an input port of channel multiplexer 407_1 (or other selector circuit).
  • the error detection circuit 405 re-generates an error-check bit based on the write data byte and data mask bit, and compares the re-generated error-check bit with the received error-check bit to generate a write-data error indication 412 (ERW_A) having a high or low state (signaling error or no error) according to whether the error-check bits match.
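  • as a hedged illustration of the 10-bit write data value and its check, the sketch below packs a data byte, a mask bit and a check bit and then re-generates the check on receipt. The field positions (data in bits [7:0], mask in bit 8, check in bit 9) and the use of even parity over byte plus mask are assumptions; the helper names are hypothetical.

```python
def check_bit(data_byte: int, mask: int) -> int:
    """Even parity over the nine protected bits (data byte plus mask bit)."""
    return bin(((mask & 1) << 8) | (data_byte & 0xFF)).count("1") & 1


def pack_write_value(data_byte: int, mask: int) -> int:
    """Requestor side: build the 10-bit value under the assumed layout."""
    return (check_bit(data_byte, mask) << 9) | ((mask & 1) << 8) | (data_byte & 0xFF)


def write_error_indication(value10: int) -> int:
    """Buffer side: 1 if the re-generated check bit mismatches, else 0."""
    data_byte = value10 & 0xFF
    mask = (value10 >> 8) & 1
    received = (value10 >> 9) & 1
    return int(check_bit(data_byte, mask) != received)


value = pack_write_value(0x5A, mask=1)
print(write_error_indication(value))          # intact value
print(write_error_indication(value ^ 0x004))  # one data bit corrupted in transit
```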
  • the write-data error indication 412 may be supplied to an error generator circuit 433 along with the address-error indication 380, ERA_A, generated by counterpart address conversion circuit 351_1 of Figure 7.
  • the other conversion circuits 401_2-401_4 may generate write-data error indications 412 (ERW_B, ERW_C and ERW_D) simultaneously with conversion circuit 401_1 (i.e., so that four error indications are generated within the data buffer 400 during each 5ns interval), and may include counterpart error generator circuits 433 to process corresponding write-data error indications 412 (i.e., ERW_B-ERW_D) as well as the address-error indications 380 (i.e., ERA_B-ERA_D) from respective ones of address conversion circuits 351_2-351_4.
  • error generator circuit 433 generates a read-data error indication (ERR_A) based on read data received from the memory-side data interface and packs the read error information, write-data error indication and address-error indication into a parallel read-data value 420 (PQ_A) to be returned to the memory access requestor as part of a data read operation.
  • the channel multiplexer 407_1 outputs either write data value PD_A (404) or one of the three write data values PD_B-PD_D from counterpart data conversion circuits 401_2-401_4, as a selected write data value 408, depending on the state of channel select signal 356.
  • Each of the channel multiplexers 407_2-407_4 within the counterpart conversion circuits 401_2-401_4 may be coupled to receive the PD_A-PD_D values (404) at respective input ports in an interconnection order that yields the following selection of write data values (408) for the four possible values of a two-bit channel select signal 356:
  • the selected write data value 408 may be supplied to a delay circuit 409 which introduces a selectable delay in accordance with a delay select value 434 (which may be the same as or different from delay select value 358 of Figure 7).
  • the resulting delayed write data value 410 may be output at 0.2Gb/s via memory interface 432. More specifically, the write-data byte (DQw) is output by data transmitter 411 and write data mask bit (DMw) is output by mask transmitter 413, with one of the ten bits of the write data value 410 being unused.
  • a strobe generator 417 is provided to generate a data strobe signal (DQS) that is output in a desired phase relationship with the write data and mask bit (note that the data strobe signal may be differential or single-ended, depending upon the application).
  • the data strobe signal may be aligned with mid-points of data eyes to establish a desired, quadrature sampling point, and may transition in synchronism with each successive write-data/mask output, thereby cycling at a maximum frequency of 100 MHz (toggling at 200 MHz).
  • conversion circuit 401_1 may include a clock transmitter 419 and clock-enable transmitter 421 to output, respectively, a differential clock signal (CKw) and corresponding clock-enable signal (CKEw), thereby providing a master clock signal to the memory device that may be used to synchronize internal operations and time reception of selected signals therein (e.g., command and address signals).
  • the frame signal 370 may be output as the clock signal (e.g., at 100 MHz), though a phase-adjust circuit may be provided to establish a desired phasing between the clock signal, CK, and write data signals.
  • Circuitry may also be provided to deassert the clock-enable signal, CKE, if no transactions are directed to the corresponding memory device, thus disabling clocking of the memory device and saving power.
  • a bank address transmitter 423 may be provided to transmit bank address signals, BAw, to the memory device based on the incoming bank address signal BA[1:0] 372.
  • the bank address 372 may be sequenced through a predetermined pattern by a memory access requestor (e.g. one of the CPUs 201 of Figure 3) or other device to enable round-robin or other sequential access to each of the storage banks within the corresponding memory device.
  • the same set of clock, clock-enable and bank address signals may be provided to each of the memory devices within a given memory subsystem, so that the signal transmitters 419, 421 and 423 within conversion circuit 401_1 may be used to supply the clock, clock-enable and bank-address signals to each memory device.
  • the clock, clock-enable and bank-address transmitters within the other conversion circuits 401_2-401_4 and within other data buffers 400 may be left unconnected or may be omitted altogether.
  • each conversion circuit 401 may include transmitters 419, 421 and 423 to drive the clock, clock-enable and bank address signals to a respective one of the memory devices (W-Z) within a memory subsystem, in which case the corresponding signal transmitters may still be left unconnected (or omitted altogether) and the signal transmitters within the other three data buffers 400 used to drive clock, clock-enable and bank address signals to the remaining three memory devices.
  • a subset of the conversion circuits 401 within a given data buffer 400 may drive clock, clock-enable and bank-address signals to respective subsets of the memory devices (e.g., two of the conversion circuits 401 may each drive clock, clock-enable and bank address signals to a respective pair of memory devices).
  • read data is received within conversion circuits 401_1-401_4 via respective byte-wide data paths (i.e., DQw, as shown, and DQx-DQz, not specifically labeled) and sampled in receiver circuits 431 (i.e., one byte-wide receiver 431 per conversion circuit 401) in response to a data strobe signal (DQS) output from the memory device via the differential DQS signal link.
  • the resulting read data byte 440 is forwarded to error generator circuit 433, which generates an error-check bit (e.g., a parity bit based on the read data byte 440) to be returned to the memory access requestor along with information that indicates, based on error indications 380 and 412, whether an error has occurred within a previously received write data byte or command/address value.
  • An error-identifier encoding scheme may be used to indicate the specific write data byte and/or command/address value (i.e., within a sequence of prior write data bytes or command/address values) in which the error was detected. Embodiments of such an error-identifier encoding scheme are described, for example and without limitation, in U.S. Patent Application No. 11/330,524, filed January 11, 2006 and entitled "Unidirectional Error Code Transfer for a Bidirectional Link." U.S. application No. 11/330,524 is hereby incorporated by reference.
  • the error generator 433 outputs a 10-bit read-data value 420 (PQ_A), which may be supplied to an input port of channel multiplexer 435.
  • the read-data value 420 may include the read data byte received from the corresponding memory device, the error-check bit generated based on the read data byte, and an error-indication bit that forms part of a sequence of error-indication bits within the above-mentioned error-identification scheme (i.e., identifying write-data errors and/or command/address errors).
  • Read values PQ_B-PQ_D from the other conversion circuits 401_2-401_4 may be received at the remaining input ports of the channel multiplexer 435 to enable read data to be returned from any of memory devices W-Z to the memory access requestor coupled to data conversion circuit 401_1.
  • Each of the channel multiplexers 435 within the counterpart conversion circuits 401_2-401_4 may be coupled to receive the PQ_A-PQ_D values (420) at respective input ports in an interconnection order that yields the following selection of read data values (448) for the four possible values of a two-bit channel select signal 356 (note that a separate channel select signal may be provided to control the read data path):
  • Channel multiplexer 435 outputs the selected read-data value 448 to delay circuit 437 in accordance with the channel select signal 356, and the delay circuit 437 delays the selected read-data value 448 by some time interval as generally described in reference to Figure 7 (e.g., the time interval indicated by the delay select value 434 or a different delay select value).
  • a sequence of delayed read-data values 450 is output from the delay circuit 437 at 0.2Gb/s and provided to a serializing output driver 439 which outputs the read data and error information included therewith via high-speed serial interface 402 at 2Gb/s.
  • Figure 9 illustrates an exemplary timing arrangement for a memory read operation within a cross-threaded memory system that includes the address buffer 350 shown in Figure 7 and data buffers 400 as shown in Figure 8.
  • a pair of 20-bit serial command/address values 501 and 502 are output via the serial, high-speed command/address link between a first control queue of CPU A and a corresponding conversion circuit 351 within address buffer 350 (designated "CPUA:1-ADR").
  • Serial command/address buffer 350 converts each of the serial command/address values 501, 502 into a respective parallel 13-bit address value and corresponding 3-bit command value and outputs the parallel address and command values via memory-side address lines A[12:0] and command lines (WE, CAS, RAS), respectively. More specifically, the serial command/address value 501 is output, in parallel form, as an activation command (ACT) and corresponding row address (ROW) as shown at 505, and serial command/address value 502 is output as a column-read command (READ) and corresponding column address (COL) as shown at 506.
  • a clock signal, CK (e.g., the frame signal or a clock signal derived from the frame signal), is output from at least one of the data buffers 400 along with a clock-enable signal (CKE) and rotating bank address (BA).
  • the bank address may be sequenced (e.g., rotated) between bank selection values, P, Q, R, S, in successive tRR intervals.
  • the clock signal is transmitted in rising-edge alignment with the activation and column-read commands so that the falling edge of the clock signal (or phase adjusted version thereof) may be used to trigger sampling of the command and address signals at the memory device.
  • the phase relationship of CK and the command and address signals may be shifted from that shown.
  • the time delay (tRCD) between receipt of the activation command 505 and the column-read command 506 is one clock cycle.
  • the time delay (tCAC) between receipt of the column-read command 506 and the output of read data on the memory-side data path is also one clock cycle.
  • Different timing delays may apply in different embodiments.
  • read data is output via the 32-bit data interface of the selected memory device, with each of four data bytes being output to a respective data buffer 400 via a byte-wide data lane (DQ0[7:0]-DQ3[7:0]).
  • four slices of read data are routed back to the memory access requestor via four data buffers 400, respectively (e.g., via data buffers D_I-D_L as described in reference to Figure 3).
  • the bit time on each data line (tDQBit) is 5ns in this example, thus effecting a double data rate transfer as a different set of data bits is transmitted during each half-cycle of the clock signal, CK.
  • Other data rates may be applied in alternative embodiments or different operating modes.
  • a data strobe signal DQS may be output along with each byte and may be edge-aligned with the read data as shown (with the data receiver within the data buffer having timing delay circuitry to establish a quadrature sampling offset relative to the edge-aligned strobe) or may be quadrature aligned with the read data.
  • the data mask signal line, which may be viewed as completing the data lane for each of lanes DQ0-DQ3, may remain unused during memory read operations.
  • the data buffers may output the read data to the appropriate control queue within the memory access requestor along with the above-described error information. More specifically, each of the data buffers 400 (e.g., buffers D_I-D_L as shown in Figure 3) may output two 20-bit serial read data bursts 526 in succession via a respective one of the control-side data links (designated CPU(A:0)-D_I through CPU(A:1)-D_L in Figure 9) to effect a 40-bit transmission per data buffer and 160 bits in the aggregate.
  • each 20-bit serial read data burst 526 includes the two bytes 522 output from the memory device during the corresponding portion of the prior tRR interval, as well as an error-check bit (ER) per read data byte, and an error bit (EW) that may be used as part of an error signaling protocol to identify errors detected in preceding write-data or command/address transfers.
  • the 160 bits transferred via the high-speed serial links include the 128 bits of read data output from the memory device, and 32 bits of error information.
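  • the bit accounting for a column read may be checked with the short calculation below, offered only as an illustration. The counts come from the example in the description (four data buffers, two bursts each, two bytes per burst, one ER and one EW bit per byte); the variable names are chosen for this sketch.

```python
DATA_BUFFERS = 4          # data buffers 400 serving the selected memory device
BURSTS_PER_BUFFER = 2     # two 20-bit serial bursts per buffer per read
BYTES_PER_BURST = 2       # read data bytes carried in each burst
ERROR_BITS_PER_BYTE = 2   # one ER bit and one EW bit accompany each byte

bits_per_burst = BYTES_PER_BURST * (8 + ERROR_BITS_PER_BYTE)
total_bytes = DATA_BUFFERS * BURSTS_PER_BUFFER * BYTES_PER_BURST
data_bits = total_bytes * 8
error_bits = total_bytes * ERROR_BITS_PER_BYTE
total_bits = DATA_BUFFERS * BURSTS_PER_BUFFER * bits_per_burst

print(bits_per_burst, data_bits, error_bits, total_bits)
```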
  • Figure 10 illustrates an exemplary timing arrangement for a memory write operation within a cross-threaded memory system that includes the address buffer 350 shown in Figure 7 and data buffers 400 as shown in Figure 8.
  • the memory write operation may be initiated by a pair of 20-bit serial command/address values 551 and 552 transmitted via the high-speed serial command/address link CPUA:1-ADR.
  • Address buffer 350 may convert each of the serial command/address values 551, 552 into a respective parallel 13-bit address value and corresponding 3-bit command value and output the parallel address and command values via memory-side address lines A[12:0] and command lines (WE, CAS, RAS), respectively.
  • serial command/address value 551 may be output, in parallel form, as an activation command (ACT) and corresponding row address (ROW) as shown at 555, and serial command/address value 552, transmitted in the following tRR interval, may be output as a column write command (WRITE) and corresponding column address (COL) as shown at 556.
  • a clock signal (CK) may be output from at least one of the data buffers 400 along with a clock-enable signal (CKE) and rotating bank address (BA).
  • the time delay (tRCD) between receipt of the activation command 555 (ACT) and the column write command 556 (WRITE) is one clock cycle.
  • write data may be output from the CPUA control queue to each of four data buffers via respective high-speed serial data links CPU(A:1)-D_I through CPU(A:1)-D_L.
  • the write data output via each link may include two 20-bit data bursts (560) per tRR interval, with each 20-bit data burst 560 including two write data bytes, two data mask bits and two error-check bits; one data mask bit and one error-check bit per data byte.
  • four write data bytes, four data mask bits and four error-check bits may be transmitted to each of the four data buffers per tRR interval, thus effecting a total transfer of 128 write data bits (16 bytes), 16 data mask bits and 16 error-check bits, for a total of 160 bits per column write operation.
  • the time delay between receipt of the activation command and the column write command, tRCD, may be one clock cycle.
  • the time delay between receipt of the column write command and write data output on the memory-side data path, tCWD, may also be one clock cycle (different timing delays may apply in different embodiments).
  • each of the data buffers may output a sequence of four write data bytes to the selected memory device via a respective one of data lanes DQ0-DQ3, with each 20-bit write data value 560 being output in a successive pair of byte-wide data transfers 562.
  • a data strobe signal, DQS may be output in either quadrature or edge alignment with the write data (quadrature alignment is shown in Figure 10) via the data strobe line, and a data mask value is output via the data mask line.
  • a total of four bytes (32 bits) and four corresponding data mask bits may be provided to the selected memory device via respective data lanes, with a total of 16 bytes (128 bits) and 16 data mask bits being provided per column write operation.
  • Figure 11 illustrates an exemplary arrangement of memory access queues within the CPUs 201A-201D of Figure 3 and their relation to memory banks P-S within memory devices 207W-207Z of memory subsystems 203_1-203_4.
  • each of the CPUs 201 may include four queue arrays 600_1-600_4, one for each of the memory subsystems 203, with each queue array 600 including four columns of control queues that correspond to the memory devices 207W-207Z within the corresponding memory subsystem 203, and four rows of control queues that correspond to banks P, Q, R and S within the individual memory devices 207.
  • queue array 600_1 within each of the CPUs 201A-201D includes a control queue 605 at column three and row three (i.e., starting from leftmost column 1 and topmost row 1) that corresponds to the third bank (R) within the third memory device (Y) of memory subsystem 203_1.
  • queue array 600_4 within each of the CPUs includes a control queue 607 at column four, row one that corresponds to the first bank (P) within the fourth memory device (Z) of memory subsystem 203_4. Note that a similar queue arrangement may be implemented with other types of memory access requestors.
  • the address values associated with the memory access requests are parsed to determine which memory subsystem 203, memory device 207, and memory bank 209 is to be accessed to carry out the request, and the appropriate command, address and data are queued therein.
  • write data may be queued along with the memory address and transferred to the target memory subsystem, memory device and memory bank in queued order.
  • the returned read data may be queued in an outbound queue (e.g., part of or associated with the control queue which sourced the corresponding memory read command) or similar structure for return to an external requestor or other circuitry (e.g., core processing circuitry) within the host device.
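  • the address parsing that selects a control queue may be sketched as follows, without limitation. The description does not state which address bits select the memory subsystem, device and bank, so the six-bit field layout used here, along with the names `QueueCoord` and `parse_address`, is purely hypothetical; indices are zero-based, so subsystem 203_1 is index 0 and bank P is index 0.

```python
from typing import NamedTuple


class QueueCoord(NamedTuple):
    subsystem: int  # 0..3, selects queue array 600_1..600_4
    device: int     # 0..3, selects column (memory devices W..Z)
    bank: int       # 0..3, selects row (banks P..S)


def parse_address(addr: int) -> QueueCoord:
    """Split the assumed low-order address fields into queue coordinates."""
    return QueueCoord((addr >> 4) & 0x3, (addr >> 2) & 0x3, addr & 0x3)


# One control queue per (subsystem, device, bank) combination: 4 x 4 x 4 = 64.
queues = {QueueCoord(s, d, b): [] for s in range(4) for d in range(4) for b in range(4)}

coord = parse_address(0b001010)  # subsystem 203_1, device Y, bank R (as for queue 605)
queues[coord].append(("READ", 0b001010))
print(coord, len(queues))
```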
  • Figure 12 illustrates physical and logical views (631, 633) of a memory device 630 according to an embodiment that may be used in a cross-threaded memory system and that includes on-die integrated circuitry to enable multiple access requestors to simultaneously access respective storage resources. That is, instead of providing separate integrated circuit buffer devices as, for example, in the embodiment of Figure 1, circuitry for performing the signal conversion and multiplexing (i.e., switching) functions described above may be provided on the integrated circuit die that includes the core storage array (or arrays) and access control circuitry.
  • two sets of storage banks 635W and 635X form the core storage arrays of memory device 630 and are partitioned into lateral sets of sub-banks along a symmetry line 639. More specifically, the individual sets of sub-banks 637_W0 and 637_W1 on opposite sides of the symmetry line 639 collectively form storage banks 635W (the "W" storage banks), and individual sets of sub-banks 637_X0 and 637_X1 collectively form storage banks 635X (the "X" storage banks).
  • the memory device 630 additionally includes a data interface 641 and control interface 645.
  • the data interface 641 is partitioned along symmetry line 639 into a pair of lateral data interfaces 643_0 and 643_1.
  • Each lateral data interface 643 includes a set of data input/output (I/O) ports (dA-dD) for connection to a respective memory access requestor (not shown), so that the memory device 630 supports direct (or indirect) connection to as many as four memory access requestors (e.g., CPUs, memory controllers and so forth as described above). More or fewer data I/O ports may be provided within data interface 641 in alternative embodiments, thus permitting more or fewer connections to memory access requestors.
  • each of the data I/O ports, dA-dD, within a given lateral data interface 643 includes a set of eight differential transceivers to receive write data and output read data in bytes (i.e., 8-bit values) via respective differential data links (i.e., 8 DQ pairs).
  • Single-ended transceivers may be used to send and receive signals via single-ended signaling links in alternative embodiments.
  • each of the data I/O ports within lateral data interface 643_0 is coupled to a multiplexing circuit 649_0, which responds to a channel-select signal (not shown) to switchably couple one of the data I/O ports, dA-dD, to sub-bank set 637_W0 via internal data path 650_0 and another of the data I/O ports to sub-bank set 637_X0 via internal data path 651_0.
  • multiplexing circuit 649_1 similarly responds to the channel-select signal to switchably couple one of data I/O ports dA-dD to sub-bank set 637_W1 via internal data path 650_1 and another of the data I/O ports to sub-bank set 637_X1 via internal data path 651_1.
  • any one of the four data I/O ports in a given lateral data interface 643 may be switchably coupled to an internal data path 650 to access the 637_W sub-banks during a given interval, and any other of the four data I/O ports may be switchably coupled to internal data path 651 to access the 637_X sub-banks during that same interval.
  • the width of the individual internal data paths 650_0, 650_1, 651_0 and 651_1 corresponds to the width of the data I/O port (i.e., 8 bits wide in this example), though serializing circuitry may be provided at the interface to the sets of sub-banks (and/or within the data I/O ports themselves) for converting a wider read data word retrieved from a selected sub-bank into a sequence of byte-sized read data values (and conversely deserializing a sequence of byte-sized write data values to form a wider write data word for storage in the selected sub-bank).
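The serializing and deserializing conversion mentioned above can be illustrated with a short sketch. The 32-bit core word width and the most-significant-byte-first ordering are assumptions chosen for the example; the source states only that a wider word is converted to and from byte-sized values.

```python
def serialize(word, word_bytes):
    """Split a wide read-data word into byte-sized values.
    Most-significant byte first is an assumed ordering."""
    return [(word >> (8 * i)) & 0xFF for i in reversed(range(word_bytes))]

def deserialize(byte_values):
    """Reassemble byte-sized write-data values into one wide word."""
    word = 0
    for value in byte_values:
        word = (word << 8) | (value & 0xFF)
    return word

# A hypothetical 32-bit core word leaves the 8-bit port as four bytes.
read_bytes = serialize(0x11223344, 4)
print([hex(b) for b in read_bytes])
print(hex(deserialize(read_bytes)))
```

The round trip returns the original word, which is the property the interface circuitry would need to preserve regardless of the actual core width.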
  • control interface 645 is organized in generally the same manner as the lateral data interfaces 643, and includes four control ports, cA-cD, each for coupling to a respective memory access requestor (more or fewer control ports may be provided in alternative embodiments) and a multiplexer (“cMux”) for switchably coupling one of the control ports, and thus the corresponding memory access requestor, to a selected one of access control logic circuits 651W and 651X for the W and X storage banks, respectively.
  • one of the memory access requestors may be switchably coupled to access control logic 651W, while any other one of the memory access requestors is simultaneously switchably coupled to the access control logic 651X, thus permitting two memory access requestors to concurrently (i.e., at least partly overlapping in time) issue memory access commands or requests to the access control logic circuits 651W and 651X, and thereby initiate independent, concurrent memory accesses in the W and X storage banks.
  • In logical view 633, it can be seen that the counterpart 8-bit data I/O ports (dA-dD) within lateral data interfaces 643_0 and 643_1 collectively form respective 16-bit data I/O ports (dA-dD) within the overall data interface 641, which are coupled via multiplexer 649 (a logical representation of the two multiplexers 649_0 and 649_1 shown in physical view 631) to storage bank sets 635W and 635X via respective 16-bit internal data paths 650 and 651.
  • the control interface is omitted from logical view 633 to avoid obscuring the data interconnection arrangement.
  • each of the multiple data I/O ports and control ports may be used to access a separate storage resource during a given time interval.
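A minimal model of the Figure 12 data multiplexing is sketched below, assuming only what the text states: in any interval, one port is coupled to the W storage banks and a different port to the X storage banks. The function name `couple` and the dictionary representation are hypothetical, and the encoding of the channel-select signal is not modeled because the source does not give it.

```python
from itertools import permutations

PORTS = ("dA", "dB", "dC", "dD")

def couple(port_for_w, port_for_x):
    """Return which port drives each set of storage banks for one interval."""
    for port in (port_for_w, port_for_x):
        if port not in PORTS:
            raise ValueError(f"unknown port {port!r}")
    if port_for_w == port_for_x:
        # The text couples "one" port to W and "another" port to X.
        raise ValueError("W and X must be coupled to different ports")
    return {"W": port_for_w, "X": port_for_x}

# Enumerate every legal pairing of two distinct ports with W and X.
ALL_COUPLINGS = [couple(w, x) for w, x in permutations(PORTS, 2)]
print(len(ALL_COUPLINGS))
print(couple("dA", "dC"))
```

With four ports and two sets of storage banks there are twelve distinct couplings, any one of which may be in effect during a given interval.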
  • Although the cross-threading memory device 630 has multiple memory access interfaces, each of the storage resources (i.e., storage banks or sub-banks) itself is single-ported in the embodiment of Figure 12, having a single set of bit lines for data storage and retrieval, and thus supporting access by only one memory access requestor at a time (i.e., a single-port storage cell performs only one transaction at a particular time).
  • Such an arrangement is in contrast to a multi-port storage array that has two or more sets of bit lines coupled to the array of storage elements to support two simultaneous accesses to the same storage array. More specifically, a multi-port storage cell (i.e., constituent element of a multi-port storage array) typically needs a word line and a bit line per port. Such an arrangement consumes area and is typically only used for register arrays and other storage applications that need a limited number of storage cells.
  • the memory device of Figure 12 and other embodiments herein may be applied within memory systems that utilize storage cells with a single access port (one access is occurring at any point in time) in which there is a single word line and a single bit line (examples of such memories include, without limitation, commodity DDR, GDDR and XDR memories in which concurrent operations are generally possible only because there are multiple independent banks capable of staggered (pipelined) operation). If the bit line is differential, then there may be two conductors forming the bit line. In some memory components there may be a separate read and write word line for a storage cell, but they are generally not utilized simultaneously.
  • a bank array (such as array W in Figure 12) formed with single-port storage cells may perform more than one transaction at a time, but only one transaction phase (activate, read, write, precharge) is occurring in the bank array at a particular time (i.e., the transactions are pipelined). Also, with two or more bank arrays (W and X in Figure 12) it is possible to perform the same phase of two different transactions simultaneously within the memory component (i.e., operations may be "micro-threaded").
  • Figure 13 illustrates an embodiment of a memory module 670 having multiple cross-threading memory devices 630_0-630_3 (also referred to as Mem0-Mem3) coupled in a paired multi-drop configuration. That is, even-numbered memory devices 630_0 and 630_2 (each of which may be implemented generally as described in reference to Figure 12) are coupled in a multi-drop arrangement to a first set of module data paths 671_AE-671_DE, while odd-numbered memory devices 630_1 and 630_3 are coupled in a multi-drop arrangement to a second set of module data paths 671_AO-671_DO.
  • each memory access requestor may simultaneously access storage resources within a selected pair of cross-threading memory devices 630.
  • a round-robin sequence of memory accesses directed to storage banks W, X, Y and Z may be carried out via data I/O ports A, B, C and D (and corresponding control ports) as follows:
  • memory devices 630_2 and 630_3 are disposed on the backside of the module substrate 675 (i.e., on an opposite face of the module substrate from memory devices 630_0 and 630_1) and form a separately selectable rank of memory devices (i.e., a rank being a group of memory devices that may be selected and/or enabled as a unit to output read data or receive write data) from memory devices 630_0 and 630_1.
  • any number of additional pairs of cross-threading memory devices 630 may be provided and connected to the access requestors AR1-AR4 (or to more or fewer access requestors) to provide increased data transfer width.
  • memory devices 630_0 and 630_1 may be concurrently accessed via respective 16-bit data I/O paths 671_AE and 671_AO by access requestor AR1, thus enabling 32-bit read and write data transfer to the W or X storage banks (i.e., in simultaneous 16-bit accesses to constituent sub-banks). Similar 32-bit read and write transfers may be carried out simultaneously in the alternate storage banks (X or W) of memory devices 630_0 and 630_1, and in the Y and Z sub-banks of memory devices 630_2 and 630_3.
  • the effective read/write data width may be increased to 64-bits, 96-bits, 128-bits and so forth.
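The width scaling described above is simple arithmetic, sketched here under the assumption that each concurrently accessed device contributes one 16-bit data I/O port to a given requestor. The helper name `effective_width_bits` is hypothetical.

```python
PORT_WIDTH_BITS = 16  # per-device data I/O port width in Figures 12 and 13

def effective_width_bits(devices_accessed):
    """Read/write data width seen by one requestor when it accesses
    the given number of devices concurrently."""
    if devices_accessed < 1:
        raise ValueError("at least one device is required")
    return PORT_WIDTH_BITS * devices_accessed

# One pair of devices, then two, three and four pairs.
print([effective_width_bits(2 * pairs) for pairs in (1, 2, 3, 4)])
```

One pair of devices yields the 32-bit transfer described for Figure 13, and each additional pair adds another 32 bits, giving 64, 96 and 128 bits.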
  • Although 16-bit data I/O ports are illustrated in the embodiments of Figures 12 and 13, narrower or wider data I/O ports may be provided in alternative embodiments, thereby also permitting smaller or larger read/write data widths with respect to individual memory access requestors.
  • more or fewer data I/O ports may be provided in each cross-threaded memory device 630 to support connection to, and/or simultaneous access by, more or fewer memory access requestors.
  • Figure 14 illustrates physical and logical views (691 and 693) of a cross-threading memory device 690 according to an embodiment in which two additional pairs of storage banks, 695Y and 695Z, are provided for a total of four sets of storage banks (695W, 695X, 695Y and 695Z) that may be independently and simultaneously accessed via two sets of four data I/O ports (dA-dD) and four control ports (cA-cD).
  • W and X storage sub-banks on either side of symmetry line 698 are disposed in an interleaved arrangement and coupled to the multiplexers 701_0 and 701_1 via respective internal data paths 702_W0 and 702_X0 (and 702_W1 and 702_X1), and the Y and Z storage sub-banks, 697_Y0/697_Z0 and 697_Y1/697_Z1, are similarly interleaved and coupled to the multiplexers 701_0 and 701_1 via respective internal data paths 702_Y0 and 702_Z0 (and 702_Y1 and 702_Z1).
  • the storage sub-banks 697 may be disposed in a non-interleaved arrangement, for example, with W sub-banks and Y sub-banks disposed together nearest the multiplexers 701, and the X sub-banks and Z sub-banks disposed together further from the multiplexers 701.
  • each of multiplexers 701_0 and 701_1 may include circuitry for establishing two independent 8-bit data-path interconnections (e.g., via 2x8 DQ pairs) between first and second data I/O ports (i.e., any two of data I/O ports, dA-dD) and address-selected sub-banks within the W, X, Y or Z sets of storage banks 697.
  • control multiplexer (“cMux”) includes circuitry for establishing four independent control-path interconnections (instead of just two as in the embodiment of Figure 12) between the four control ports (cA-cD) and access control logic 705_0/705_1 for the W, X, Y and Z storage banks.
  • the cross-threading memory device 690 effectively provides simultaneous access to four sets of storage banks (W-Z) via respective 16-bit data paths, with multiplexer 701 (a logical representation of the function collectively performed by physical-view multiplexers 701_0 and 701_1) switchably coupling each of data I/O ports dA-dD (logical representations of the pairs of data I/O ports dA-dD within physical view 691) to a storage bank within a respective set of the W-Z storage banks.
  • the control path includes four differential request (RQ) pairs to establish a 4-bit wide RQ path (one per memory access requestor), though additional request lines per memory access requestor may be provided in alternative embodiments.
  • Figure 15 illustrates physical and logical views (731, 733) of a cross-threading memory device 730 according to an embodiment that includes conductive interconnects 739 to couple the counterpart data I/O ports dA-dD on either side of symmetry line 734 together, and thus enable the storage sub-banks on either side of the symmetry line 734 to be accessed independently of counter-part sub-banks on the opposite side of the symmetry line.
  • cross-threading memory device 730 is depicted as including four sets of storage banks 735W-735Z (each having an 8-bit data interface) instead of two sets of storage banks (W and X) each having (collectively) a 16-bit data interface as in the embodiment of Figure 12.
  • the data multiplexers 737_0 and 737_1 may each be modified to provide, individually, for two simultaneous connections between first and second data I/O ports (any two of data I/O ports dA-dD) and the adjacent sets of storage banks (i.e., storage banks W and X in the case of multiplexer 737_0, and storage banks Y and Z in the case of multiplexer 737_1).
  • the control multiplexer, cMux, is constructed generally as described in reference to Figure 14 to enable simultaneous receipt and simultaneous execution of four independent memory access commands from respective access requestors and thus may include four-way switched paths between the control interfaces, cA-cD, and access control logic 741_0 (for the W and X storage banks) and 741_1 (for the Y and Z storage banks).
  • the cross-threading memory device 730 includes four sets of storage banks (W, X, Y and Z), each independently accessible by each of four data I/O ports (and therefore four memory access requestors) via respective 8-bit internal data paths 743W-743Z and 4x8-bit multiplexer 737 (a logical representation of the two 2x8-bit multiplexers 737_0 and 737_1 shown in physical view 731).
  • cross-threading memory device 730 includes a 32-bit wide data interface in the aggregate (i.e., formed by four 8-bit data interfaces) and a 4-bit wide request interface, not specifically shown in logical view 733.
  • the request interface may include additional links to provide additional request bandwidth as necessary to convey address and control information to the cross-threading memory device 730.
  • conductive interconnects 739 are formed by integrated-circuit metal layers (or other conductive structures) that serve to permanently wire counterpart data I/O ports on either side of symmetry line 734 together.
  • the conductive interconnects 739 may be switched (e.g., through one or more transistor switches such as pass gates or the like), thereby enabling the counterpart data I/O ports on opposite sides of symmetry line 734 to be switchably coupled or decoupled from one another and thus effect the logical architecture shown at 733 or, when decoupled, the logical architecture shown at 633 in Figure 12.
  • the conductive interconnects 739 may be formed by interconnections external to memory device 730 (i.e., off-chip or at least external to the integrated circuit package), for example through circuit board trace interconnection or other external interconnection and thus enable memory device 730 to be used in either of the configurations shown in Figures 12 and 15.
  • Figure 16 illustrates physical and logical views (781, 783) of a cross-threading memory device 780 according to an embodiment having two separate instances (785_0, 785_1) of the cross-threading architecture of Figure 12 disposed on a single die and in which like data I/O ports on either side of symmetry line 789 are coupled together via on-die conductive interconnects 791 (e.g., metal layer interconnects) or external interconnects.
  • With 16-bit wide interfaces to the individual sets of storage banks formed by counterpart sub-banks on either side of symmetry line 790, as described in reference to Figure 12, four different sets of storage banks, W, X, Y and Z, may be accessed simultaneously by four different memory access requestors.
  • each of four sets of storage banks may be accessed by a respective 16-bit internal data path 795W-795Z (each being a logical representation of the two 8-bit internal data paths between respective sets of storage sub-banks on either side of symmetry line 790), thereby establishing a 64-bit (or 64 DQ pair in a differential signaling embodiment) aggregate internal data path.
  • each of four memory access requestors may simultaneously access a respective one of the sets of storage banks via a respective one of data I/O ports dA-dD (and corresponding control ports cA-cD, not shown in logical view 783) and multiplexer 797.
  • Figure 17 illustrates an embodiment of a memory system 810 in which individual cross-threading memory devices communicate via chip-to-chip interfaces to enable increased cross-threading operation.
  • data multiplexer 820m within memory device 811_0m includes an output coupled to an input of counterpart data multiplexer 820n within memory device 811_0n, and multiplexer 820n likewise has an output coupled to an input of multiplexer 820m.
  • each multiplexer 820m/820n may switchably connect internal data paths 825_0 and 825_1 to one of three input sources: one of the two local data I/O ports (designated 'dA' and 'dB' (815A, 815B) in memory device 811_0m and 'dC' and 'dD' (815C, 815D) in memory device 811_0n) or a remote data I/O port selected by the counterpart multiplexer.
  • a select signal 827 is provided to the multiplexers 820 in each memory device pair to control the data I/O-to-core connection as shown in the table at 830.
  • data I/O port dA within memory device 811_0m is switchably coupled via multiplexer 820m to an address-selected one of storage banks W (817W);
  • data I/O port dC within memory device 811_0n is switchably coupled via multiplexers 820n and 820m (i.e., by n-to-m path, "N") to an address-selected one of storage banks X (817X);
  • data I/O port dB is switchably coupled via multiplexers 820m and 820n (i.e., by m-to-n path, "M") to an address-selected one of storage banks Z (817Z);
  • data I/O port dD within memory device 811_0n is switchably coupled via multiplexer 820n to an address-selected one of storage banks Y (817Y).
  • As the select signal 827 is stepped through a sequence of values (e.g., from 00 to 01 to 10 to 11 as shown), the switched interconnections between data I/O ports and sets of storage banks are likewise switched, thus enabling round-robin access to each of the four sets of storage resources by each of four memory access requestors in a respective time interval.
  • When the select signal 827 is transitioned to a '01' value, data I/O ports dA, dB, dC and dD are switchably coupled to storage bank sets X, Y, W and Z, respectively.
  • When select signal 827 is transitioned to '10', data I/O ports dA, dB, dC and dD are switchably coupled to storage bank sets Z, W, Y and X, respectively, and then, when the select signal 827 is transitioned to '11', to Y, X, Z and W, respectively.
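The port-to-bank connections listed above can be collected into a lookup table and checked for the round-robin property. In this sketch the '01', '10' and '11' rows are taken directly from the text, while treating the first-listed connections (dA to W, dB to Z, dC to X, dD to Y) as the '00' state is an inference from the stated stepping order. The names `SELECT_TABLE` and `is_round_robin` are hypothetical.

```python
# Port-to-bank-set connections for each value of select signal 827.
SELECT_TABLE = {
    0b00: {"dA": "W", "dB": "Z", "dC": "X", "dD": "Y"},  # inferred as '00'
    0b01: {"dA": "X", "dB": "Y", "dC": "W", "dD": "Z"},
    0b10: {"dA": "Z", "dB": "W", "dC": "Y", "dD": "X"},
    0b11: {"dA": "Y", "dB": "X", "dC": "Z", "dD": "W"},
}

def is_round_robin(table):
    """Check that no two ports share a bank set in any interval and that
    every port reaches every bank set exactly once over the full sequence."""
    bank_sets = set("WXYZ")
    ports = ("dA", "dB", "dC", "dD")
    for mapping in table.values():
        if set(mapping.values()) != bank_sets:
            return False
    for port in ports:
        if {mapping[port] for mapping in table.values()} != bank_sets:
            return False
    return True

print(is_round_robin(SELECT_TABLE))
print(SELECT_TABLE[0b10]["dB"])
```

The check passes because the table forms a Latin square: each row and each column contains W, X, Y and Z exactly once, which is what allows four requestors to proceed concurrently without conflict.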
  • overall device latency may be increased relative to embodiments described above, but throughput may be maintained.
  • the select signal 827 may be generated by one or more of the access requestors (depicted as controllers 201A-201D in Figure 17, though virtually any type of memory access requestor/controller may be used), or may be generated by another control device, or even within one or more of the memory devices themselves.
  • control ports within each of the memory devices 811 may also include inter-coupled control multiplexers to enable control information to be passed between paired memory devices.
  • memory access queue 0 within each of controllers 201A-201D is coupled to a respective one of the four I/O ports A-D within memory device pair 811_0m/811_0n.
  • Memory access queue 1 within each of controllers 201A-201D is likewise coupled to a respective I/O port within memory device pair 811_1m/811_1n, and memory access queues 2 and 3 within each of controllers 201A-201D are likewise coupled to respective sets of I/O ports within memory device pairs 811_2m/811_2n and 811_3m/811_3n.
  • particular numbers of memory devices, memory access requestors, I/O ports per memory device and/or memory access queues per memory access requestor may be increased or decreased according to application demands.
  • each of the signaling links coupled between a given data I/O port or control port may be single-ended or differential and may include any number of constituent signaling links, including a single-bit signaling link.
  • the data rate over a given signaling path between a memory access requestor and memory device may be faster or slower than that of the corresponding data and/or control path within the memory device, thus enabling internal data paths to be wider or narrower and internal data transfer rates to be slower or faster than data path widths/data transfer rates on the requestor-to-memory-device signaling paths.
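The trade-off between path width and transfer rate can be sketched numerically. The example assumes, as an illustration only, that total bandwidth is to be the same on both sides of the interface; the link counts and rates are hypothetical figures, not values from the source.

```python
def matched_internal_width(ext_width_bits, ext_rate_mbps, int_rate_mbps):
    """Internal path width needed to carry the same total bandwidth as the
    external signaling path when the internal per-bit rate differs."""
    total_mbps = ext_width_bits * ext_rate_mbps
    if total_mbps % int_rate_mbps:
        raise ValueError("rates do not divide into a whole number of bits")
    return total_mbps // int_rate_mbps

# Hypothetical figures: 8 external links at 3200 Mb/s each, with an
# internal data path that runs at 400 Mb/s per bit.
print(matched_internal_width(8, 3200, 400))
```

Under these assumed numbers, a slower internal path must be eight times wider than the external one, which is the situation in which serializing and deserializing circuitry is placed between the two.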
  • An integrated circuit (IC) device comprising: a plurality of control interfaces to receive information relating to respective memory access requests from respective requestor IC devices; a plurality of memory interfaces to convey the information relating to memory access requests to respective memory IC devices; and switch circuitry coupled to the plurality of control interfaces and the plurality of memory interfaces to switchably couple any one of the control interfaces to any one of the memory interfaces.
  • the IC device of clause 1 further comprising signal conversion circuitry to convert between a first number of signals conveyed via the plurality of control interfaces and a second number of signals conveyed via the plurality of memory interfaces, the second number being larger than the first number.
  • each signal of the first number of signals is received or transmitted at or above a first signaling rate and wherein each signal of the second number of signals is received or transmitted at or below a signaling rate that is lower than the first signaling rate.
  • the IC device of clause 3 further comprising configuration circuitry coupled to the signal conversion circuitry to control the conversion between the first number of signals and the second number of signals according to a configuration value.
  • control information includes address information that specifies a storage location within one of the memory IC devices at which data is to be stored or from which data is to be retrieved.
  • the conversion circuitry includes deserializing circuitry to convert bits conveyed serially in one of the first signals to a parallel series of bits that constitute at least a portion of the second signals.
  • each of the control interfaces comprises circuitry to receive a respective one of the first number of signals via a high-speed signaling link.
  • each of the memory interfaces is configured to connect to an industry-standard memory component.
  • a method of operation within an integrated circuit (IC) device comprising: receiving information relating to a plurality of concurrent memory access operations via respective control interfaces; switchably coupling each of the control interfaces to a respective one of a plurality of memory interfaces in a first interconnection pattern during a first interval; and outputting the information relating to the plurality of concurrent memory access operations via the plurality of memory interfaces, respectively, according to the first interconnection pattern.
  • receiving information relating to a plurality of concurrent memory access operations via respective control interfaces comprises receiving a plurality of address values that indicate respective memory locations to be accessed in the plurality of concurrent memory access operations.
  • receiving information relating to a plurality of concurrent memory access operations via respective control interfaces comprises receiving a plurality of sets of write data to be stored within respective memory devices in the plurality of concurrent memory access operations.
  • outputting the information relating to the plurality of concurrent memory access operations via the plurality of memory interfaces comprises outputting the information relating to the plurality of concurrent memory access operations to respective memory IC devices coupled to the plurality of memory interfaces.
  • a system comprising: a plurality of memory IC devices; and a first buffer integrated circuit (IC) device having a plurality of memory interfaces coupled respectively to the plurality of memory IC devices, control interfaces to couple to respective requestor IC devices, and switching circuitry to couple each of the control interfaces concurrently to a respective one of the memory interfaces in accordance with a selection value.
  • each of the memory IC devices comprises a first data interface coupled to a respective one of the memory interfaces of the first buffer IC device, and a second data interface coupled to a respective one of the memory interfaces of the second buffer IC device.
  • each of the requestor IC devices comprises a first output node coupled to a respective one of the control interfaces of the first buffer IC device; and a second output node coupled to a respective one of the control interfaces of the second buffer IC device.
  • each of the requestor IC devices comprises a first data input/output (I/O) node coupled to a respective one of the control interfaces of the first buffer IC device to convey read data retrieved from, or write data to be stored within, one of the memory IC devices, and wherein each of the requestor IC devices further comprises a control node coupled to a respective one of the control interfaces of the second buffer IC device to convey an address value that specifies a storage location within the one of the memory IC devices from which the read data is to be retrieved or at which the write data is to be stored.
  • An integrated circuit (IC) device comprising: means for receiving information relating to respective memory access requests from respective requestor IC devices; means for conveying the information relating to memory access requests to respective memory IC devices; and means for enabling any one of the control interfaces to be switchably and exclusively coupled to any one of the memory interfaces concurrently with switched and exclusive coupling of the other control interfaces to the other memory interfaces.
  • Computer-readable media having information embodied therein that includes a description of an integrated circuit (IC) device, the information including descriptions of: a plurality of control interfaces to receive information relating to respective memory access requests from respective requestor IC devices; a plurality of memory interfaces to convey the information relating to memory access requests to respective memory IC devices; and
  • switch circuitry coupled to the plurality of control interfaces and the plurality of memory interfaces to enable any one of the control interfaces to be switchably and exclusively coupled to any one of the memory interfaces concurrently with switched and exclusive coupling of the other control interfaces to the other memory interfaces.
  • a method of operation within an integrated-circuit (IC) memory device comprising: during a first interval, concurrently accessing a first storage location within a first memory array via a first external signaling interface and a second storage location within a second memory array via a second external signaling interface; and during a second interval, concurrently accessing a third storage location within the first memory array via the second external signaling interface and a fourth storage location within the second memory array via the first external signaling interface.
  • transferring data between the first memory array and the first external signaling interface comprises transferring data between the first memory array and the first external signaling interface via a multiplexer circuit.
  • switching connections within the multiplexer circuit comprises transitioning a select signal from a first state to a second state after the first interval, the select signal being supplied to the multiplexer circuit to control switched connections between inputs and outputs of the multiplexer circuit.
  • An integrated-circuit (IC) memory device comprising: first and second external signaling interfaces; first and second memory arrays; and a multiplexer coupled to the first and second external signaling interfaces and to the first and second memory arrays, the multiplexer enabling concurrent access to the first and second memory arrays via the first and second external signaling interfaces, respectively, during a first interval, and enabling concurrent access to the first and second memory arrays via the second and first external signaling interfaces, respectively, during a second time interval.
  • the first memory array comprises a plurality of storage banks, each of the storage banks including a plurality of rows of storage cells.
  • the first external signaling interface comprises output circuitry to output data onto an external signaling path, the output circuitry being switchably coupled, via the multiplexer, to the first memory array during the first time interval, and switchably coupled, via the multiplexer, to the second memory array during the second time interval.
  • first external signaling interface further comprises receive circuitry to receive data via the external signaling path, the receive circuitry being switchably coupled, via the multiplexer, to the first memory during the first time interval, and switchably coupled, via the multiplexer, to the second memory array during the second time interval.
  • the multiplexer comprises an input to receive a select signal, the select signal being in a first state during the first interval to switchably connect, via the multiplexer, the first memory array to the first external signaling interface and the second memory array to the second external signaling interface, and the select signal being in a second state during the second interval to switchably connect, via the multiplexer, the first memory array to the second external signaling interface and the second memory array to the first external signaling interface.
  • the memory device of clause 38 further comprising first and second memory control interfaces to receive first and second memory access commands, the first memory access command including an address value that indicates a first storage location to be accessed within the first memory array via the first external signaling interface, and the second memory access command including an address value that indicates a second storage location to be accessed within the second memory array via the first external signaling interface.
  • An integrated-circuit (IC) memory device comprising: first and second memory arrays; first and second external signaling interfaces; means for concurrently accessing, during a first interval, a first storage location within a first memory array via a first external signaling interface and a second storage location within a second memory array via a second external signaling interface; and means for concurrently accessing, during a second interval, a third storage location within the first memory array via the second external signaling interface and a fourth storage location within the second memory array via the first external signaling interface.
  • IC integrated-circuit
  • Computer-readable media having information embodied therein that includes a description of an integrated-circuit (IC) memory device, the information including descriptions of: first and second external signaling interfaces; first and second memory arrays; and a multiplexer coupled to the first and second external signaling interfaces and to the first and second memory arrays, the multiplexer enabling concurrent access to the first and second memory arrays via the first and second external signaling interfaces, respectively, during a first interval, and enabling concurrent access to the first and second memory arrays via the second and first external signaling interfaces, respectively, during a second time interval.
  • circuits disclosed herein may be described using computer aided design tools and expressed (or represented), as data and/or instructions embodied in various computer-readable media, in terms of their behavioral, register transfer, logic component, transistor, layout geometries, and/or other characteristics. Formats of files and other objects in which such circuit expressions may be implemented include, but are not limited to, formats supporting behavioral languages such as C, Verilog, and VHDL, formats supporting register level description languages like RTL, and formats supporting geometry description languages such as GDSII, GDSIII, GDSIV, CIF, MEBES and any other suitable formats and languages.
  • Computer-readable media in which such formatted data and/or instructions may be embodied include, but are not limited to, non-volatile storage media in various forms (e.g., optical, magnetic or semiconductor storage media) and carrier waves that may be used to transfer such formatted data and/or instructions through wireless, optical, or wired signaling media or any combination thereof.
  • Examples of transfers of such formatted data and/or instructions by carrier waves include, but are not limited to, transfers (uploads, downloads, e-mail, etc.) over the Internet and/or other computer networks via one or more data transfer protocols (e.g., HTTP, FTP, SMTP, etc.).
  • Such data and/or instruction-based expressions of the above described circuits may be processed by a processing entity (e.g., one or more processors) within the computer system in conjunction with execution of one or more other computer programs including, without limitation, net-list generation programs, place and route programs and the like, to generate a representation or image of a physical manifestation of such circuits.
  • Such representation or image may thereafter be used in device fabrication, for example, by enabling generation of one or more masks that are used to form various components of the circuits in a device fabrication process.
  • circuit elements or circuit blocks shown or described as multi-conductor signal links may alternatively be single-conductor signal links, and single conductor signal links may alternatively be multi-conductor signal links.
  • Signals and signaling paths shown or described as being single-ended may also be differential, and vice-versa.
  • signals described or depicted as having active-high or active-low logic levels may have opposite logic levels in alternative embodiments.
  • MOS metal oxide semiconductor
  • a signal is said to be “asserted” when the signal is driven to a low or high logic state (or charged to a high logic state or discharged to a low logic state) to indicate a particular condition.
  • a signal is said to be “deasserted” to indicate that the signal is driven (or charged or discharged) to a state other than the asserted state (including a high or low logic state, or the floating state that may occur when the signal driving circuit is transitioned to a high impedance condition, such as an open drain or open collector condition).
  • a signal driving circuit is said to "output" a signal to a signal receiving circuit when the signal driving circuit asserts (or deasserts, if explicitly stated or indicated by context) the signal on a signal line coupled between the signal driving and signal receiving circuits.
  • a signal line is said to be “activated” when a signal is asserted on the signal line, and “deactivated” when the signal is deasserted.
  • the prefix symbol "/" attached to signal names indicates that the signal is an active low signal (i.e., the asserted state is a logic low state).
  • a line over a signal name e.g.,
  • Integrated circuit device "programming" may include, for example and without limitation, loading a control value into a register or other storage circuit within the device in response to a host instruction and thus controlling an operational aspect of the device, establishing a device configuration or controlling an operational aspect of the device through a one-time programming operation (e.g., blowing fuses within a configuration circuit during device production), and/or connecting one or more selected pins or other contact structures of the device to reference voltage lines (also referred to as strapping) to establish a particular device configuration or operational aspect of the device.
  • exemplary is used to express an example, not a preference or requirement.

Abstract

Within an integrated-circuit (IC) memory device, and during a first interval, a first storage location within a first memory array and a second storage location within a second memory array are concurrently accessed via first and second external signaling interfaces, respectively. During a second interval, a third storage location within the first memory array and a fourth storage location within the second memory array are concurrently accessed via the second and first external signaling interfaces, respectively.

Description

CROSS-THREADED MEMORY DEVICE AND SYSTEM
Inventor(s): Frederick A. Ware, Kishore Kasamsetty, Lawrence Lai, Wayne Fang, Liang Peng
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims priority to U.S. Patent Application No. 11/460,582, filed July 27, 2006 entitled "CROSS-THREADED MEMORY SYSTEM" and to Provisional U.S. Patent Application No. 60/870,824, filed December 19, 2006 entitled "CROSS-THREADED MEMORY DEVICE". Both of the aforementioned applications are hereby incorporated by reference in their entirety.
TECHNICAL FIELD
[0002] The disclosure herein relates to data storage and retrieval systems.
BACKGROUND
[0003] Memory bandwidth is a key factor in the performance of modern gaming systems and has increased with each new generation largely through increases in signaling rate and input/output (I/O) pins. Unfortunately, pin count and signaling rate are beginning to approach physical limits so that further increases must overcome difficult challenges and will likely be unable to keep pace with the increased memory bandwidth demanded by next-generation systems.
[0004] One alternative to increasing pin count or signaling rate is to add additional graphics controllers to achieve increased parallel processing within a graphics pipeline. Unfortunately, many of the data structures that need to be accessed to carry out the functions within the graphics pipeline tend to be shared so that, even if multiple graphics controllers are provided, a performance penalty is typically incurred each time two controllers contend for a shared data structure, as one of the controllers generally must wait for the other to finish accessing the memory in which the shared data structure is stored.
BRIEF DESCRIPTION OF THE DRAWINGS
[0005] The disclosure herein is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar elements and in which:
Figure 1 illustrates an embodiment of a cross-threaded memory system;
Figure 2 illustrates the timing of a round-robin memory access scheme that may be applied within the cross-threaded memory system of Figure 1;
Figure 3 illustrates a more specific embodiment of a cross-threaded memory system in which buffer devices and memory devices are disposed within multi-chip-package memory subsystems;
Figure 4 illustrates an exemplary layout of the cross-threaded memory system of Figure 3, with memory subsystems disposed in a central region of a printed circuit board between central processing units or other memory access requestors;
Figure 5 is an exemplary timing diagram for a memory read operation carried out within the cross-threaded memory system of Figure 3;
Figure 6 is an exemplary timing diagram for a memory write operation carried out within the cross-threaded memory system of Figure 3;
Figure 7 illustrates an embodiment of an address buffer that may be used to implement the address buffer depicted in Figure 3;
Figure 8 illustrates an embodiment of a data buffer that may be used to implement the data buffers depicted in Figure 3;
Figure 9 illustrates an exemplary timing arrangement for a memory read operation within a cross-threaded memory system that includes the address buffer shown in Figure 7 and data buffers as shown in Figure 8;
Figure 10 illustrates an exemplary timing arrangement for a memory write operation within a cross-threaded memory system that includes the address buffer shown in Figure 7 and data buffers as shown in Figure 8;
Figure 11 illustrates an exemplary arrangement of memory access queues within the central processing units of Figure 3 and their relation to memory banks within memory devices of the memory subsystems;
Figure 12 illustrates an embodiment of a memory device having on-die integrated circuitry to enable multiple access requestors to simultaneously access respective storage resources;
Figure 13 illustrates an embodiment of a memory module having multiple cross-threading memory devices coupled in a paired multi-drop configuration;
Figure 14 illustrates an embodiment of a cross-threading memory device having two additional pairs of storage banks to provide a total of four sets of storage banks that may be independently and simultaneously accessed via two sets of four data input/output ports and four control ports;
Figure 15 illustrates an embodiment of a cross-threading memory device having conductive interconnects to couple counterpart data input/output ports together;
Figure 16 illustrates physical and logical views of a cross-threading memory device according to an embodiment having two separate instances of the cross-threading architecture depicted in Figure 12; and
Figure 17 illustrates an embodiment of a memory system 810 in which individual cross-threading memory devices communicate via chip-to-chip interfaces to enable increased cross-threading operation.
DETAILED DESCRIPTION
[0006] A memory subsystem having one or more integrated-circuit (IC) devices that enable multiple memory access requestors to concurrently access a set of shared memory devices is disclosed in various embodiments. In one embodiment, each such IC device, referred to herein as a buffer IC or buffer device, may include circuitry to switchably couple any one of the memory access requestors to any one of the memory devices and to concurrently couple each of the other memory access requestors to others of the memory devices in accordance with a channel select signal. By this arrangement, all the memory access requestors may concurrently access the collective memory devices during a given switching interval, with each requestor accessing a respective one of the memory devices. At the conclusion of the switching interval, the channel select signal may be changed to establish a different switched connection between requestors and memory devices for the subsequent switching interval. In one embodiment, for example, the channel select signal may be stepped through a repeating sequence of values so that each of the memory access requestors is provided with time-multiplexed access to each of the memory devices in round-robin fashion. By this operation, for example, multiple graphics controllers may be operated in parallel to carry out pipelined graphics processing operations using a shared memory structure and without requiring the controllers to become idle or otherwise wait while other controllers finish accessing a shared memory device. Viewing each sequence of accesses from a given controller to a given memory device as a memory access thread, the concurrent accesses to the various memory devices by different controllers are referred to herein as cross-threads, and the overall memory system formed by the multiple controllers, one or more buffer devices and memory devices is referred to herein as a cross-threaded memory system.
[0007] In one embodiment, each of the buffer devices may include multiple control interfaces and multiple memory interfaces. When configured in a data processing system such as a gaming console or other memory-intensive system, each of the control interfaces may be coupled to a respective memory access requestor and each of the memory interfaces may be coupled to a respective memory device. More specifically, in a particular graphics processing embodiment, each of the memory access requestors may be a graphics controller or processor and may be implemented on a dedicated integrated circuit die or on a die that may include one or more other graphics controllers, and each of the memory devices may be an integrated circuit die or group of integrated circuit dice. Further, the integrated circuit dice on which the memory devices and buffer devices are formed may be disposed within a multiple-die IC package, including, without limitation, a system-in-package (SIP), package-in-package (PIP) or package-on-package (POP) arrangement.
[0008] In another embodiment, circuitry for enabling multiple concurrent accesses to different storage resources within a given memory device is provided in the integrated circuit memory device itself. For example, memory devices having multiple control and data interfaces for coupling to respective memory access requestors are disclosed in various embodiments. While the individual storage resources (e.g., storage banks and/or storage sub-banks) are single-ported in that only one access is carried out in the storage resource over a given interval, respective access paths to different storage resources are provided to enable different memory access requestors to concurrently read and/or write data to respective storage resources.
[0009] Figure 1 illustrates an embodiment of a cross-threaded memory system 100 that may include multiple memory access requestors 101A-101D, buffer devices 103₁-103₄ and memory devices 105W-105Z. The memory access requestors (collectively, 101) may be special or general purpose processors, such as microprocessors, graphics processors, graphics controllers, microcontrollers and the like, or more task-specific devices such as direct-memory-access (DMA) controllers, application-specific integrated circuits (ASICs), or any other type of memory access requestor, including combinations of different types of memory access requestors. In the embodiment shown, each of the buffer devices 103 may be implemented in a respective integrated circuit die, though two or more (or all) of the buffer devices may be combined within a single integrated circuit die. Also, as discussed in further detail below, the buffer devices 103, memory devices 105 and/or memory access requestors 101 may be combined in a multi-chip package including, without limitation, a system-in-package (SIP), package-on-package (POP), package-in-package (PIP) or the like.
[0010] Each of the buffer devices 103 may include multiple control interfaces 115 (designated A-D) each coupled to a respective one of the requestors 101A-101D via an n-conductor signal path 102, and also multiple memory interfaces 117 each coupled to a respective one of the memory devices 105W-105Z via an m-conductor signaling path 104. Each of the n-conductor signal paths 102 is used to convey control and address information, as well as data, associated with each memory transaction. Each of the m-conductor signaling paths 104 is similarly used to convey data and control/address information associated with each memory transaction. In one embodiment, the control-side signaling paths 102 (i.e., the signaling paths between the buffer ICs 103 and the memory access requestors 101) may be each formed by one or more signaling links (which may each include a single conductor in a single-ended signaling arrangement or two conductors in a differential signaling arrangement) that are fewer in number, but operated at higher signaling rate, than the signaling links which form the memory-side signaling paths 104 (i.e., the signaling paths between the buffer ICs 103 and the memory devices 105), thus enabling narrower but faster control-side signaling paths 102 to match the bandwidth of wider, but slower memory-side signaling paths 104. The path width (i.e., number of constituent links within a given signaling path) and signaling rate relationship may be reversed in alternative embodiments (i.e., narrower but faster memory-side signaling path), or may be substantially balanced. Also, the bandwidth of the control-side and memory-side signaling paths may not exactly match, thus providing headroom to convey error information or other signaling control and/or system control information in otherwise unused bandwidth.
[0011] Each of the buffer devices 103 may additionally include a switching circuit 119 or multiplexing circuit disposed between the control interfaces 115 and memory interfaces 117 to enable flexible, switched interconnection of the control interfaces 115 and memory interfaces 117. More specifically, depending on the state of a channel select signal (not specifically shown in Figure 1), the switching circuit 119 may couple any one of the control interfaces 115 exclusively to any one of the memory interfaces 117, and concurrently (i.e., at least partly overlapping in time) couple each of the other control interfaces exclusively to another of the memory interfaces. For example, during a first switching interval, individual control interfaces A, B, C and D (i.e., within control interfaces 115) may be switchably coupled to memory interfaces W, X, Y and Z, respectively, in response to a first state of the channel select signal, while in a subsequent interval, the channel select signal may be changed so that control interfaces A, B, C and D are switchably coupled to memory interfaces X, Y, Z and W, respectively. Other interconnection patterns are possible and, as discussed below, when the channel select signal is sequenced through a repeating pattern in which each control interface is coupled one-after-another to each of the memory interfaces, concurrent, round-robin access to each of the memory devices 105W-105Z may be provided to each of the memory access requestors 101A-101D, thereby providing each memory access requestor 101 with complete and continuous access to the shared memory formed by memory devices 105.
[0012] Though memory devices 105 may be implemented using virtually any type of storage technology, in the embodiment of Figure 1 and other embodiments described below, the memory devices 105 may be dynamic random access memory (DRAM) devices (including, for example and without limitation, DRAM devices of various data rates (SDR, DDR, etc.), graphics memory devices (e.g., GDDR), XDR memory devices, micro-threading memory devices, for example as described in U.S. Patent Application Publication No. US2006/0117155 A1, and so forth) having multiple storage banks (referred to herein simply as "banks") and that exhibit a minimum time delay (tRR) between successive accesses to rows within different banks and a minimum time delay (tRC) between successive accesses to different rows within the same bank. A minimum time delay (tCC) may also be imposed between successive accesses to different columns of data within an activated row, where an activated row is one whose contents have been retrieved from an address-selected row of DRAM storage cells and latched within a bank of sense amplifiers. In the particular embodiment of Figure 1, each of the four memory devices 105W-105Z may include a memory core formed by four address-selectable memory banks 107P-107S (the banks being designated P, Q, R and S) and control logic 110 to store data within and retrieve data from the memory core in response to memory access commands. In the particular embodiment shown, the control logic 110 may include multiple data I/O ports coupled to respective memory-side data paths 104 and thus may receive slices of data (via each data I/O port) that collectively form a write data word to be stored in a memory write transaction, or may output slices of data that collectively form a read data word in a memory read transaction.
One or more separate control ports may be provided within each memory device 105 for receipt of control information (e.g., commands or requests indicating the requested operation and, at least in the case of a memory read or write, one or more address values that specify the bank, row and/or column location to which the operation is directed), or the control information may be time-multiplexed onto one or more of the data paths 104 and received via the data I/O ports. In a memory read operation, the control logic 110 may activate an address-specified row of storage cells within an address-specified bank (i.e., in an activate or activation operation), if the row has not already been activated, then may retrieve read data through one or more read accesses directed to address-specified column locations within the activated row of an address-specified bank. The read data may be output to the buffer devices 103₁-103₄ in respective slices (i.e., portions of the entire read data word) via data paths 104, and the buffer devices 103₁-103₄, in turn, may forward the read data to a selected one of memory access requestors 101A-101D via switching circuits 119 and control interfaces 115. In a memory write operation, the control logic 110 may also activate an address-specified row of storage cells within an address-specified bank, if not already activated, then may perform one or more write accesses directed to address-specified column locations within the activated row for an address-specified bank to store a write data word received via data paths 104.
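The activate-if-needed behavior described for read and write operations can be sketched as a small behavioral model. This is only an illustration under simplifying assumptions: the class name `Bank`, its methods and the `activations` counter are hypothetical, and the tRR, tRC and tCC timing constraints are not modeled.

```python
class Bank:
    """Hypothetical model of one single-ported bank with at most one activated row."""

    def __init__(self, rows, cols):
        self.cells = [[0] * cols for _ in range(rows)]
        self.open_row = None   # row currently latched in the sense amplifiers
        self.activations = 0   # number of activate operations performed

    def _ensure_active(self, row):
        # Activate only if the addressed row has not already been activated.
        if self.open_row != row:
            self.open_row = row
            self.activations += 1

    def read(self, row, col):
        self._ensure_active(row)
        return self.cells[row][col]

    def write(self, row, col, value):
        self._ensure_active(row)
        self.cells[row][col] = value


bank = Bank(rows=4, cols=8)
bank.write(2, 5, 0xAB)    # activates row 2, then writes column 5
value = bank.read(2, 5)   # row 2 is already activated: no new activation
other = bank.read(3, 0)   # a different row requires a new activation
```

In this sketch the second access reuses the activated row, so only two activations occur across the three accesses.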
[0013] Figure 2 illustrates the timing of a round-robin memory access scheme that may be applied within the cross-threaded memory system 100 of Figure 1. A two-bit channel select signal ("Channel Select") may be provided to each of the buffer devices 103 and may be repeatedly stepped through states '00', '01', '10' and '11' in respective tRC intervals, 126₁-126₄. By this arrangement, each of the buffer devices 103 may couple control interface 115-A (i.e., interface A within control interfaces 115) to memory interface 117-W (i.e., interface W within memory interfaces 117) during interval 126₁ so that each of the four data I/O ports within memory 105W may be switchably coupled to requestor 101A via a respective one of the buffer devices 103₁-103₄. Consequently, memory device 105W may be accessed (i.e., through each of its four data I/O ports in parallel) by memory access requestor 101A during each of four tRR intervals that make up tRC interval 126₁ as indicated by the designation 'A', 'A', 'A', 'A' in the 'Memory W' access sequence of Figure 2. During the same tRC interval (126₁), memory access requestor 101B may be switchably coupled to memory device 105X via control interfaces 115-B and memory interfaces 117-X within the four buffer devices 103₁-103₄; memory access requestor 101C may be switchably coupled to memory device 105Y via control interfaces 115-C and memory interfaces 117-Y; and memory access requestor 101D may be switchably coupled to memory device 105Z via control interfaces 115-D and memory interfaces 117-Z. In the subsequent switching interval (i.e., tRC interval 126₂), the channel select signal may be changed (i.e., stepped or sequenced) to state '01' to switchably couple memory access requestors 101A, B, C and D to memory devices 105Z, W, X and Y, respectively.
In the following switching interval (tRC interval 126₃), the channel select signal may be changed to state '10' to switchably couple memory access requestors 101A, B, C and D to memory devices 105Y, Z, W and X, respectively, and in a final switching interval (tRC interval 126₄) before the channel select signal rolls over to repeat the channel selection sequence, the channel select signal may be changed to state '11' to couple memory access requestors 101A, B, C and D to memory devices 105X, Y, Z and W, respectively.
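The four interconnection patterns described for Figure 2 can be reproduced with a short sketch. The modular-arithmetic rule and the function names below are assumptions chosen to match the patterns listed above; they are not taken from the description, which specifies only the resulting couplings.

```python
REQUESTORS = ["A", "B", "C", "D"]
MEMORIES = ["W", "X", "Y", "Z"]


def coupled_memory(requestor, channel_select):
    """Memory device coupled to a requestor for a given 2-bit channel select state."""
    r = REQUESTORS.index(requestor)
    # Assumed encoding: each step of the select value rotates the coupling by one device.
    return MEMORIES[(r - channel_select) % len(MEMORIES)]


def switch_pattern(channel_select):
    """Full requestor-to-memory coupling for one switching interval."""
    return {req: coupled_memory(req, channel_select) for req in REQUESTORS}


schedule = [switch_pattern(state) for state in range(4)]
for state, pattern in enumerate(schedule):
    print(f"{state:02b}", pattern)
```

Each pattern is a one-to-one coupling, and over the four states every requestor is coupled to every memory device exactly once, which is the round-robin property the text relies on.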
[0014] In the particular embodiment of Figures 1 and 2, four different channel select values may be applied to enable each of the four memory access requestors 101A-101D to access the four memory devices 105W-105Z during a respective tRC interval and, thus, the total time to sequence through each possible interconnection pattern is 4*tRC (where '*' denotes multiplication), a time interval referred to herein as a switch-pattern cycle time.
[0015] Still referring to Figure 2, in one embodiment, a bank-select value (or bank address) may be sequenced through each of four possible bank selection values during each switching interval 126 (i.e., each tRC interval) to enable each memory access requestor 101 to access each memory bank 107 of the selected memory device 105 in a respective tRR interval 127. Thus, during the four tRR intervals that constitute switching interval 126₁, memory access requestor 101A may be enabled to access memory banks 107P, 107Q, 107R and 107S, respectively, within memory device 105W, and memory access requestors 101B, 101C and 101D are likewise (and concurrently) enabled to access memory banks 107P, 107Q, 107R and 107S within memory devices 105X, 105Y and 105Z, respectively. Other bank selection sequences may be applied in alternative embodiments, particularly where more or fewer banks 107 are provided within each memory device 105. Also, while each of the multi-bank memory devices 105 has been described as being implemented by a single IC, multiple memory ICs may be accessed as a unit, referred to herein as a memory rank, with each memory device within the memory rank contributing a respective subset of the data I/O ports that form the total collection of data I/O ports shown for a given memory device 105.
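The nesting of bank selection inside each switching interval, and the resulting switch-pattern cycle time, can be sketched as follows. The tRR value of 20 ns is an assumption borrowed from the exemplary timing the description gives for Figure 5, and the function and variable names are hypothetical.

```python
BANKS = ["P", "Q", "R", "S"]
# Memory devices coupled to requestor A in tRC intervals 1 to 4, per the Figure 2 description.
A_SEQUENCE = ["W", "Z", "Y", "X"]


def access_slots(memory_sequence, banks, t_rr_ns):
    """List (start_ns, memory, bank) for every tRR slot of one switch-pattern cycle."""
    slots = []
    for interval, memory in enumerate(memory_sequence):
        for position, bank in enumerate(banks):
            start_ns = (interval * len(banks) + position) * t_rr_ns
            slots.append((start_ns, memory, bank))
    return slots


T_RR_NS = 20                                         # assumed exemplary value
slots_a = access_slots(A_SEQUENCE, BANKS, T_RR_NS)
t_rc_ns = len(BANKS) * T_RR_NS                       # four tRR intervals per tRC interval
switch_pattern_cycle_ns = len(A_SEQUENCE) * t_rc_ns  # 4 * tRC
```

Under these assumptions requestor A has 16 access slots per cycle, one per bank of each memory device, and the cycle repeats every 4*tRC.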
[0016] Figure 3 illustrates a more specific embodiment of a cross-threaded memory system 200 in which buffer devices (205I-205L and 206) and memory devices 207W-207Z may be disposed within multi-chip-package memory subsystems 203₁-203₄. In the particular embodiment of Figure 3 and in other embodiments described below, each multi-chip package memory subsystem 203 is depicted and described as a system-in-package (SIP) arrangement (i.e., multiple die within a single integrated circuit package). In all such cases, the multi-chip package memory subsystems 203 may alternatively be, for example and without limitation, a system-on-chip (SOC), package-in-package (PIP, an arrangement in which two or more IC packages are included within a larger IC package), or package-on-package (POP, an arrangement in which one or more IC packages are mounted or otherwise disposed on another IC package). Also, in the embodiment of Figure 3 and other embodiments described below, the memory access requestors are depicted and described as central processing units (CPUs) 201A-201D, though virtually any device or system of devices capable of initiating memory access requests, either in response to programmed control or requests or commands from another device, may alternatively be used to implement one or more of the CPUs 201. Further, for purposes of example only, a specific number of CPUs 201, memory subsystems 203 and memory devices/buffer devices (207, 205, 206) per memory subsystem 203 are shown. More or fewer CPUs, memory subsystems, memory devices and/or buffer devices may be provided in alternative embodiments.
[0017] In one embodiment, shown in the Figure 3 detail view of memory subsystem 203₁ (i.e., SIP1), each memory subsystem 203 may include a set of four multi-bank memory devices 207 (four-bank memory devices in this example), a set of data buffer devices 205I-205L (data buffers) and an address buffer device 206 (address buffer).
Each memory device 207 may include a control logic circuit 211 having a data interface 212 and a command/address (CA) interface 214, with the data interface 212 including four data input/output (I/O) ports (DQ0-DQ3) coupled to data buffers 205I-205L, respectively, via data paths 216, and the CA interface 214 coupled to the address buffer 206 via CA path 218. For purposes of example, the memory devices 207 may be synchronous double-data rate (DDR) DRAM devices that respond to commands and addresses received at CA interface 214 by outputting read data and receiving write data via data interface 212. As discussed further below, timing information (e.g., clocking information to time receipt of incoming command/address values and to provide a timing reference within the synchronous DRAM device, and strobe signals to time inbound and outbound data transfer) as well as other control information (e.g., clock enable, chip select) and the like may also be conveyed via the CA path 218 and/or the data paths 216.
[0018] In one embodiment, each of the CPUs 201A-201D may include multiple memory access queues 221 (memory queues, for short) numbered 1-4, with each of the memory queues 221 coupled to a respective one of the memory subsystems 203₁-203₄ via a set of control-side data paths 222 and a control-side command/address (CA) path 224. Further, in the particular embodiment shown, each of the data paths 222 and CA paths 224 may be implemented by a single-bit differential, point-to-point signaling link that may be operated at a signaling rate that is an integer multiple of the signaling rate applied across the memory-side data paths and address path. For example, in one implementation, each of the five control-side signaling links coupled to a given memory queue 221 may operate at 2 Gigabits per second (2 Gb/s), while the memory-side data paths 216 are operated at 0.2 Gb/s and the memory-side CA path 218 may operate at 0.1 Gb/s.
These exemplary signaling rates and path widths are carried forward in further embodiments described below, but may be different in alternative embodiments.
[0019] As in the embodiment of Figure 1, each of the five buffer devices (205I-205L and 206) within a memory subsystem 203 may include multiple control interfaces 234 coupled respectively to CPUs 201A-201D, multiple memory interfaces 236 coupled respectively to the constituent memory devices 207 of the memory subsystem, and switching circuitry 235 to enable concurrent and exclusive coupling between the control interfaces 234 and memory interfaces 236 as necessary to provide switched access to each of the memory devices by each of the CPUs. In the particular example shown, there may be four memory interfaces 236 (designated W-Z, and thus referred to herein as 236-W, 236-X, 236-Y and 236-Z) coupled respectively to the four memory devices 207W-207Z, and four control interfaces 234 (designated A-D and referred to herein as 234-A, 234-B, 234-C and 234-D) coupled respectively to the four CPUs 201A-201D. The number of memory interfaces 236 and/or control interfaces 234 may change with the number of memory devices and/or CPUs (or other memory access requestors).
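The exemplary signaling rates given for Figure 3 (2 Gb/s control-side links, 0.2 Gb/s memory-side data paths, 0.1 Gb/s memory-side CA path) imply how wide a memory-side path would need to be to match one control-side link. The sketch below computes that width; the resulting conductor counts are inferred from the rate ratios rather than quoted from the description, and the function name is hypothetical.

```python
def conductors_to_match(control_rate_mbps, memory_rate_mbps, control_links=1):
    """Memory-side conductors needed to carry the bandwidth of the control-side links."""
    total_mbps = control_rate_mbps * control_links
    # Ceiling division, since a fractional conductor is not possible.
    return -(-total_mbps // memory_rate_mbps)


# Rates expressed in Mb/s so the arithmetic stays in integers.
data_path_width = conductors_to_match(2000, 200)  # per memory-side data path
ca_path_width = conductors_to_match(2000, 100)    # memory-side CA path
```

This is the narrower-but-faster versus wider-but-slower trade described for the control-side and memory-side signaling paths; an actual design might add conductors for timing or control signals.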
[0020] Figure 4 illustrates an exemplary layout of the cross-threaded memory system 200 of Figure 3, with memory subsystems 203₁-203₄ disposed in a central region of a printed circuit board 250 between CPUs 201A-201D. In the particular embodiment shown, the memory subsystems may be SIPs (SIP1-SIP4) each having a substrate 255 with memory devices 207W-207Z mounted thereto. The data buffers 205I-205L may be mounted on the memory devices 207W-207Z, respectively, and the address buffer 206 may be disposed centrally on the substrate 255 between the memory devices 207. Each of the CPUs 201A-201D is coupled to each of the SIP memory subsystems 203₁-203₄ by a respective set of five point-to-point links 202 operated, for example, at 2Gb/s. The memory subsystems 203 are depicted as mounted on their sides but may alternatively be disposed face-down or face-up on the printed circuit board 250. The printed circuit board 250 itself may be a daughterboard having an interconnection structure (e.g., edge connector) for insertion within a socket of a larger circuit board or backplane, or may be a main board within a data processing system such as a gaming console, workstation, etc. As discussed above, more or fewer CPUs 201 and/or memory subsystems 203 may be provided in alternative embodiments, and the memory subsystems 203 may have more or fewer constituent buffer devices (205, 206) and/or memory devices 207 and may be implemented by structures other than system-in-package.
[0021] Figure 5 is an exemplary timing diagram for a memory read operation carried out within the cross-threaded memory system 200 of Figure 3, showing in particular the control information and data conveyed between memory queue 221-1 ("Queue 1") of CPU A and memory device 207W ("Memory W") of memory subsystem 203₁. In the particular embodiment shown, the tRC interval may be 80 nanoseconds (80ns), and the tRR interval may be 20ns. This timing arrangement may permit a total of 40 bits of information (2 bits/ns) to be transferred via each of the five single-bit 2Gb/s links (i.e., 5x1-bit) between Queue 1 and SIP1 in each tRR interval. More specifically, at the start of a memory read transaction, an activation command may be conveyed via the control-side command/address link (designated "Queue 1: CA" in Figure 5) in the 20-bit (i.e., 10ns) interval that constitutes the first half of tRR interval 271₁. As described in further detail below, the address buffer 206 may include circuitry to deserialize (i.e., convert to parallel form) the incoming serial command/address bit stream to form an activation control word 274 ("ACT") that includes the address of a row to be activated (i.e., a 13-bit row address value, "13xA," in this example), and a corresponding row-activation command encoded into signals WE, CAS and RAS. In the embodiment shown, the row-activation control word 274 may be output onto the memory-side command/address path (designated "Memory W: CA" in Figure 5) at 0.1 Gb/s (e.g., at single-data rate with respect to a 100 MHz clock signal) and thus at a command path bit time (tCABIT) of 10ns that spans the second half of tRR interval 271₁. Because only sixteen bits of information are conveyed via the memory-side CA path per 10ns interval, versus 20 bits via the control-side CA path, additional bandwidth may be available on the control-side CA path (8 bits per tRR interval or 4 bits per command/address transfer) and may be used to convey error information and/or to support error handling protocols as discussed below.
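By way of illustration only, the link budget described above may be checked with a few lines of arithmetic. The following Python sketch uses the example figures from this description (2 Gb/s control-side links, a 20 ns tRR interval, 16-bit memory-side control words); the variable names are hypothetical and are not part of any disclosed implementation.

```python
# Example figures from the description; all names are illustrative only.
CONTROL_LINK_GBPS = 2        # 2 Gb/s equals 2 bits per nanosecond per link
T_RR_NS = 20                 # tRR interval
CA_TRANSFER_NS = 10          # one command/address transfer (half a tRR interval)
MEMORY_CA_WORD_BITS = 16     # 3-bit command code plus 13-bit address

# Bits carried by one control-side link in one tRR interval.
bits_per_link_per_trr = CONTROL_LINK_GBPS * T_RR_NS

# Bits carried by the control-side CA link in one command/address transfer.
control_ca_bits_per_transfer = CONTROL_LINK_GBPS * CA_TRANSFER_NS

# Control-side bits left over after the 16-bit control word is accounted for.
spare_bits_per_transfer = control_ca_bits_per_transfer - MEMORY_CA_WORD_BITS
spare_bits_per_trr = spare_bits_per_transfer * (T_RR_NS // CA_TRANSFER_NS)

print(bits_per_link_per_trr, control_ca_bits_per_transfer,
      spare_bits_per_transfer, spare_bits_per_trr)
```

Under these example figures the spare control-side bandwidth works out to 4 bits per transfer and 8 bits per tRR interval, matching the values stated above.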
[0022] During the second half of tRR interval 271₁, while the activate command and corresponding address are conveyed to memory device W via the memory-side CA path, a column read command may be conveyed to the address buffer via the control-side CA path. As with the activate command/address, the address buffer may convert the serial bit stream in which the column read command is conveyed into a sixteen-bit column-read control word 276 that includes a three-bit column-read code (signaled by the encoding of WE, CAS and RAS signals) and a 13-bit column address. The control word 276 is output to memory device 207W (i.e., as shown in Figure 4) during the first half of tRR interval 271₂. The correspondence between the command/address information conveyed via the control-side CA path and the memory-side CA path is shown in Figure 5 by the lightly shaded activation command and darker shaded column read command.
[0023] Memory device 207W may respond to the activation control word 274 by activating the address-specified row of memory cells within a selected memory bank, thus making the contents of the row available for read and write access in subsequent column operations. As discussed above in reference to Figure 2, the bank address may be stepped through a predetermined sequence of values in successive tRR intervals 271 and thus may be generated within the address buffer (e.g., by a modulo counter), within one or more of the CPUs 201 shown in Figure 3, or within another integrated circuit device, not shown in Figure 3. In any case, after the row activation is completed, memory device 207W may perform a column read operation at the column address specified in association with the column-read control word 276 (and in the bank specified by the sequenced bank address) to retrieve a data word 280 that is output to the data buffers 205 starting a predetermined time, tCAC, after the column-read control word 276 has been received. More specifically, the data word 280 may be output during four successive 5ns data-bit intervals (i.e., tDQBIT, 0.2Gb/s) within tRR interval 271₃, and in respective slices via the four byte-wide data lanes, DQ0-DQ3, that constitute the 32-bit data path coupled to memory device 207W. In one embodiment, the signals output via each of the data lanes may include eight data bits ("8xQ") and may be accompanied by a differential data strobe signal ("2xDQS") that is used to time sampling of the read data within the data buffers 205. Thus, a total of 128 bits of read data are output from memory device W in response to the column read command, with four bytes being output via respective memory-side byte lanes in each of four consecutive 5 ns data-bit intervals.
In the tRR interval immediately following output of read data word 280 from memory device 207 (i.e., tRR interval 271₄), the read data may be output in more serial form (282) from the data buffers 205 to CPU 201A where it is buffered in memory queue 221-1. As shown, each of the data buffers 205I-205L may output a set of eight data bits, 8xQ, along with error bits EW and ER in each 5ns interval of tRR interval 271₄ and via a respective one of control-side data links DQ0-DQ3. Thus, each data buffer may output 32 bits of data and eight error bits over tRR interval 271₄, with the data buffers collectively returning 128 bits of data and 32 bits of error information to CPU 201A in response to the activation and column read commands issued in tRR interval 271₁. As discussed in further detail below, the error read bit, ER, included with each read data byte may be generated by an error-bit generator (e.g., a parity bit generator) within one of the data buffers 205 based on the corresponding read data byte. The error write bit, EW, may be generated based on one or more write data bytes received within the data buffer in prior write transactions.
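For purposes of illustration, the sequence of transfers in the read operation described above may be tabulated as follows. This Python sketch assumes the exemplary 20 ns tRR interval and measures time from the start of the first tRR interval; the `Phase` record and `read_timeline` function are hypothetical names introduced only for this example.

```python
from typing import NamedTuple


class Phase(NamedTuple):
    start_ns: int
    end_ns: int
    path: str
    activity: str


T_RR_NS = 20
HALF_NS = T_RR_NS // 2


def read_timeline() -> list[Phase]:
    """Phases of one read, timed from the start of the first tRR interval."""
    return [
        # First tRR interval: activate command in, ACT word out, read command in.
        Phase(0, HALF_NS, "control-side CA", "serial activate command"),
        Phase(HALF_NS, T_RR_NS, "memory-side CA", "ACT control word"),
        Phase(HALF_NS, T_RR_NS, "control-side CA", "serial column-read command"),
        # Second tRR interval, first half: column-read word to the memory device.
        Phase(T_RR_NS, T_RR_NS + HALF_NS, "memory-side CA", "column-read control word"),
        # Third tRR interval: 128-bit data word from the memory device.
        Phase(2 * T_RR_NS, 3 * T_RR_NS, "memory-side DQ", "read data word"),
        # Fourth tRR interval: serialized data and error bits to the requestor.
        Phase(3 * T_RR_NS, 4 * T_RR_NS, "control-side DQ", "serial read data and error bits"),
    ]


for phase in read_timeline():
    print(f"{phase.start_ns:3d}-{phase.end_ns:3d} ns  {phase.path:16s} {phase.activity}")
```

The overlap of the second and third entries reflects that the column-read command may arrive on the control side while the activation control word is still being driven on the memory side.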
[0024] Figure 6 is an exemplary timing diagram for a memory write operation carried out within the cross-threaded memory system 200 of Figure 3 and, like Figure 5, shows in particular the control information and data conveyed between memory queue 221-1 (Queue 1) of CPU 201A and memory device 207W of memory subsystem 203₁. As in Figure 5, the tRC interval may be 80ns, and the tRR interval 20ns, thus permitting a total of 40 bits of information (2 bits/ns) to be transferred via each of the 2Gb/s links between Queue 1 and memory subsystem 203₁ in each tRR interval. At the start of a memory write transaction, an activation command may be conveyed via the control-side CA link in the 20-bit interval that constitutes the first half of tRR interval 311₁, and may be deserialized by the address buffer to generate a row-activation control word 320 ("ACT") that may include the address of the row to be activated (13xA), and a corresponding row-activation command in a 3-bit command code (encoded within WE, CAS and RAS signals). As in Figure 5, the row-activation control word 320 may be output onto the memory-side CA path at 0.1 Gb/s ("ACT") at a command path bit time (tCABIT) of 10ns and thus spans the second half of tRR interval 311₁. Because only sixteen bits of information are conveyed via the memory-side CA path per 10ns interval versus 20 bits via the control-side CA path, additional bandwidth may be available on the control-side CA path and may be used to convey error information and/or to support error handling protocols.
[0025] During the second half of tRR interval 311₁, while the activation control word 320 is conveyed to memory device 207W via the memory-side CA path, a column write command is conveyed to the address buffer 206 via the control-side CA path. As with the activate command/address, the address buffer 206 converts the serialized write command into a 16-bit column write control word 322 ("WR") that includes a three-bit column-write code (encoded within the WE, CAS and RAS signals) and a 13-bit column address. The correspondence between the command/address information conveyed via the control-side CA path and the memory-side CA path is shown by light grey shading for row-activation control word 320 and dark grey shading for column write control word WR 322.
[0026] Memory device 207W may respond to the row-activation control word ACT 320 by activating the address-specified row of memory cells within a selected memory bank, thus making the contents of the row available for read and write access in subsequent column operations. As discussed above, the bank address may be stepped through a predetermined sequence of values in successive tRR intervals and thus may be generated within address buffer 206 (e.g., by a modulo counter), within one or more of the CPUs 201, or within another integrated circuit device. In any case, after the row activation is completed and a predetermined time, tCAC, after the column write control word 322 has been received, write data may be transferred from the data buffers 205I-205L to memory device 207W for storage therein at the column address specified within the column write control word (and in the bank specified by the sequenced bank address), thus effecting a column write operation. As shown, the write data may be output from the data buffers 205 to memory device 207W during four successive 5ns data-bit intervals (i.e., tDQBIT, 0.2Gb/s) within tRR interval 311₃, and in respective slices via the four byte-wide data lanes, DQ0-DQ3, that constitute the 32-bit data path coupled to memory device 207W. In one embodiment, the signals transmitted to the memory device 207W may be counterparts to those transmitted by the memory device 207W during a memory read, and thus include eight data bits (8xQ) accompanied by a differential data strobe signal (2xDQS). Accordingly, a total of 128 bits of write data may be transmitted to memory device W in conjunction with the column write command, with four bytes being output via respective byte lanes DQ0-DQ3 in each of four consecutive 5 ns data-bit intervals. The memory device 207W may store the write data at the address-specified column location of the address-specified bank to conclude the memory write operation.
[0027] Figure 7 illustrates an embodiment of an address buffer 350 that may be used to implement the address buffer 206 of Figure 3. As shown, the address buffer 350 may include four conversion circuits 351₁-351₄, each having a high-speed serial control interface 352 to receive serialized command/address signals (ADRA-ADRD) from a respective memory access requestor (e.g., a respective one of CPUs 201A-201D in Figure 3), and a memory interface 375 to output command/address information in parallel to a respective memory device (e.g., a respective one of memory devices 207W-207Z in Figure 3). Following the timing and path-width examples described in reference to Figures 3-5, each control interface 352 may be a single-link differential interface having a differential receiver 353 to sample an incoming signal at 2Gb/s. Single-ended signaling interfaces may be provided in alternative embodiments. In one embodiment, a relatively low-frequency clock signal referred to herein as a framing signal 370 ("Frame") may be supplied to the address buffer 350 (and to each of the corresponding data buffers as described below) to provide a frequency reference and to frame transmission of related groups of signals. For example, in one embodiment, the framing signal 370 may be a 100 MHz clock having a rising edge at the start of each half tRR interval, and thus frames 20-bit transmissions on the 2 Gb/s control-side data and command/address paths, two-bit transmissions on the 0.2 Gb/s memory-side data paths, and single-bit transmissions on the 0.1 Gb/s memory-side command/address paths. The address buffer 350 (and corresponding data buffers) may include clocking circuitry (e.g., phase-locked-loop or delay-locked-loop circuitry and corresponding phase-adjust circuitry) to generate 2Gb/s control-side timing signals having desired phase offsets relative to the framing signal 370 or another reference.
The address buffer 350 (and corresponding data buffers) may similarly include clock synthesis circuitry to generate timing signals (e.g., clock signal, CK, and write data strobe DQS) that are output to the memory devices to time reception of command/address and write data signals, and to enable the memory devices to generate read data timing signals (e.g., read data strobe, DQS).
[0028] Referring to address conversion circuit 351₁, which is representative of the operation of counterpart address conversion circuits 351₂-351₄, the incoming 2Gb/s command/address signal, ADRA, is sampled and deserialized (i.e., converted to parallel form) by receiver 353 to generate a 10-bit parallel command/address value 354 (PAA) every 5 ns (i.e., at 0.2Gb/s). In one embodiment, each command/address value 354 includes eight bits of command/address information and an error-check bit (e.g., a parity bit), and is supplied to an error detection circuit 355 and also to an input port of a four-port multiplexer 357₁ (or other selector circuit). The error detection circuit 355 generates an error-check bit based on the corresponding command/address byte and compares the generated error-check bit with the received error-check bit to generate an error indication 380 (ERAA) having a high or low state (signaling error or no error) according to whether the error-check bits match. Counterpart address conversion circuits 351₂-351₄ simultaneously generate error indications, ERAB, ERAC and ERAD, so that four error indications 380 are generated during each 5ns command/address reception interval.
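The error check described above may be sketched, by way of example and without limitation, as follows. The description does not specify the error-check code or the bit positions within the 10-bit value, so this Python sketch assumes even parity and a hypothetical layout (bits 7..0 carry the command/address byte, bit 8 carries the error-check bit, bit 9 is unused); the function names are likewise illustrative.

```python
def parity_bit(byte: int) -> int:
    """Even-parity bit (assumed scheme): 1 when the byte holds an odd number of 1s."""
    return bin(byte & 0xFF).count("1") & 1


def check_ca_value(value10: int) -> bool:
    """Return True (error) when the regenerated and received check bits differ.

    ASSUMED layout: bits 7..0 = command/address byte, bit 8 = error-check bit,
    bit 9 = unused.
    """
    payload = value10 & 0xFF
    received_check = (value10 >> 8) & 1
    return parity_bit(payload) != received_check


# A value as the requestor might send it, and the same value with one bit flipped.
good = 0b1011_0010 | (parity_bit(0b1011_0010) << 8)
bad = good ^ 0b0000_0100

print(check_ca_value(good), check_ca_value(bad))
```

A single-bit parity code of this kind detects any odd number of flipped bits in the byte; other error-check codes may be substituted without changing the compare-and-flag structure.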
[0029] Channel multiplexer 357₁ outputs either command/address value PAA (354) or one of the three command/address values PAB-PAD from counterpart conversion circuits 351, as a selected command/address value 360, depending on the state of a channel select signal 356. Each of the channel multiplexers 357₂-357₄ within the counterpart conversion circuits 351₂-351₄ is coupled to receive the PAA-PAD values at respective input ports in an interconnection order that yields the following selection of command/address values (360) for the four possible values of a two-bit channel select signal 356:
[Table 1: table image not reproduced in this text]
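Because the contents of Table 1 are not reproduced in this text, the following Python sketch does not attempt to restate the table. It assumes, purely for illustration, a rotating interconnection order, and shows how any such order can give each memory device exactly one requestor for every value of the two-bit channel select signal. The `route` function and the rotation itself are hypothetical.

```python
REQUESTORS = ["A", "B", "C", "D"]   # control-interface side
MEMORIES = ["W", "X", "Y", "Z"]     # memory-interface side


def route(channel_select: int) -> dict[str, str]:
    """Map each memory device to the requestor selected by its multiplexer.

    ASSUMPTION: a rotating interconnection order. The actual order is defined
    by Table 1, which is not reproduced in this text.
    """
    return {
        MEMORIES[i]: REQUESTORS[(i + channel_select) % 4]
        for i in range(4)
    }


for select in range(4):
    mapping = route(select)
    # Exclusive coupling: no requestor is routed to two memory devices at once.
    assert sorted(mapping.values()) == REQUESTORS
    print(select, mapping)
```

Any interconnection order that forms a permutation for each select value would satisfy the same check; the rotation is simply one convenient choice.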
[0030] Still referring to representative conversion circuit 351₁, the selected command/address value 360 is supplied to a delay circuit 359 which introduces a selectable delay in accordance with a delay select value 358. For example, in one embodiment, the delay circuit 359 is implemented by a shift register in which the selected command/address value 360 is shifted forward from tail to head in response to a shift-enable signal (e.g., in response to the 2Gb/s sampling clock signal or a phase-shifted and/or frequency-divided version thereof), with the total number of storage stages from tail-to-head being selected to achieve a desired delay between receipt of an incoming serialized command/address value at control interface 352, and output of a final command code and address value at memory interface 375. After passing through the delay circuit 359 (which may alternatively be disposed in advance of the channel multiplexer 357₁), the resulting delayed command/address value 362 is supplied to a 2:1 deserializing circuit 361 which converts each successive pair of delayed, 10-bit command/address values 362 (each value 362 received at 0.2Gb/s) to a final 20-bit command/address value 364, with the resulting sequence of final command/address values 364 being output at 0.1Gb/s. As shown, within each 20-bit command/address value, four bits are unused, and the remaining 16 bits are output via memory interface 375. More specifically, command transmitter 365 outputs a 3-bit command encoded into signals WEw, RASw and CASw (the 'W' subscript denoting that the command is directed to Memory W), and address transmitter 367 outputs a corresponding 13-bit address value, Aw[12:0]. Counterpart conversion circuits 351₂-351₄ concurrently output 3-bit command codes and 13-bit address values directed to memory devices X, Y and Z.
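The delay and 2:1 deserialization stages described above may be modeled, as a minimal sketch only, in the following Python example. The description does not state which four of the 20 bits are unused or how the command and address fields are ordered, so the layout below (upper two bits of each 10-bit half unused, command code in the top three bits of the resulting 16-bit word) is an assumption, as are the `DelayLine`, `combine_pair` and `split_fields` names.

```python
from collections import deque


class DelayLine:
    """Shift register with a selectable number of stages (hypothetical model)."""

    def __init__(self, depth: int):
        self.stages = deque([0] * depth)

    def shift(self, value: int) -> int:
        self.stages.append(value)       # new value enters at the tail
        return self.stages.popleft()    # oldest value leaves at the head


def combine_pair(first10: int, second10: int) -> int:
    """2:1 deserialization: two successive 10-bit values form one 20-bit value."""
    return ((first10 & 0x3FF) << 10) | (second10 & 0x3FF)


def split_fields(value20: int) -> tuple[int, int]:
    """Return (3-bit command code, 13-bit address).

    ASSUMED layout: the low 8 bits of each 10-bit half carry information and
    the upper 2 bits of each half are the four unused bits.
    """
    high_byte = (value20 >> 10) & 0xFF
    low_byte = value20 & 0xFF
    word16 = (high_byte << 8) | low_byte
    return word16 >> 13, word16 & 0x1FFF


# Two 10-bit values pass through a two-stage delay, then are paired and split.
delay = DelayLine(depth=2)
delayed = [delay.shift(v) for v in (0x6A, 0xBC, 0x00, 0x00)]
command, address = split_fields(combine_pair(delayed[2], delayed[3]))
print(delayed, command, hex(address))
```

Changing `depth` models a different delay select value: the same pair of values emerges later or earlier, while the pairing and field extraction are unaffected.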
[0031] Still referring to Figure 7, a set of configuration signals 374 (Config[2:0]) may be provided to the address buffer 350 to control various functions (e.g., establishing termination impedance, signaling calibration, etc.) and operating modes therein. For example, in one embodiment, the address buffer 350 includes circuitry to support operation as either an address buffer as described above and in reference to address buffer 206 of Figure 3, or a data buffer as described below and in reference to data buffer 205 of Figure 3. In this way, a given buffer device may be programmed to operate as either an address buffer or a data buffer, thus avoiding the need to fabricate separate integrated circuit devices. Other configurable aspects of the device may include error detection policies, delay ranges, signal fan-out, signals driven on otherwise unused portions of the 20-bit output bandwidth, and so forth. The configuration signals may also be used to select timing calibration modes during which phase offsets between reference and internal clock signals (or strobe signals or other timing signals) are established.
[0032] Figure 8 illustrates an embodiment of a data buffer 400 that may be used to implement data buffers 205I-205L of Figure 3. Data buffer 400 includes four conversion circuits 401₁-401₄, each having a high-speed serial interface 402 to support serialized read and write data transfer to/from a respective memory access requestor (e.g., a respective one of four CPUs 201A-201D in Figure 3), and a lower-speed parallel-I/O memory interface 432 to support parallel read and write data transfer to/from a respective one of memory devices W-Z (e.g., memory devices 207W-207Z in Figure 3). Following the timing and path-width examples described in reference to Figures 3-5, each high-speed serial interface 402 may include a single-link, differential signal receiver to sample an incoming serial data signal at 2Gb/s.
The framing signal 370 provides a frequency reference and frames transmission of related groups of signals as described in reference to Figure 7. In the embodiment of Figure 8, and corresponding timing diagrams described below, the framing signal 370 may be a 100 MHz clock signal having a rising edge at the start of each half tRR interval, and thus frames 20-bit transmissions over the control-side signal link coupled to interface 402, and two-bit transmissions on each memory-side data line coupled to interface 432. As with the address buffer of Figure 7, the data buffer 400 may include clocking circuitry (e.g., locked-loop circuitry and corresponding timing adjustment circuitry) to generate 2Gb/s control-side timing signals having desired phase offsets relative to the framing signal 370, as well as clock synthesis circuitry to generate timing signals (e.g., strobe signals and clock signals having a desired phase relationship to the framing signal 370) that are output to the memory devices W-Z to time reception of address and write data (e.g., clock signal, CK, and write data strobe DQS) therein, and to enable the memory devices to generate read data timing signals (e.g., read data strobe, DQS).
[0033] Referring to conversion circuit 401₁, which is representative of the operation of counterpart conversion circuits 401₂-401₄, write data delivered in the incoming 2Gb/s data signal, DA, may be sampled and deserialized by receiver 403 to generate a 10-bit parallel data value 404 (PDA) every 5 ns (i.e., at 0.2Gb/s). In one embodiment, each data value 404 may include a write data byte (i.e., 8 bits of write data), a data mask bit that indicates whether the write data value is to be written within the selected memory device, and an error-check bit generated by the memory access requestor based on the write data byte and mask bit. Data value 404 may be supplied to an error detection circuit 405 and also to an input port of channel multiplexer 407₁ (or other selector circuit). The error detection circuit 405 re-generates an error-check bit based on the write data byte and data mask bit, and compares the re-generated error-check bit with the received error-check bit to generate a write-data error indication 412 (ERWA) having a high or low state (signaling error or no error) according to whether the error-check bits match. The write-data error indication 412 may be supplied to an error generator circuit 433 along with the address-error indicator 380, ERAA, generated by counterpart address conversion circuit 351₁ of Figure 7. The other conversion circuits 401₂-401₄ may generate write-data error indications 412, ERWB, ERWC and ERWD, simultaneously with conversion circuit 401₁ (i.e., so that four error indications are generated within the data buffer 400 during each 5ns interval), and may include counterpart error generator circuits 433 to process corresponding write-data error indications 412 (i.e., ERWB-ERWD) as well as the address-error indications 380 (i.e., ERAB-ERAD) from a respective one of address conversion circuits 351₂-351₄.
As discussed below, error generator circuit 433 generates a read-data error indication (ERRA) based on read data received from the memory-side data interface and packs the read error information, write-data error indication and address-error indication into a parallel read-data value 420 (PQA) to be returned to the memory access requestor as part of a data read operation.
[0034] The channel multiplexer 407₁ outputs either write data value PDA (404) or one of the three write data values PDB-PDD from counterpart data conversion circuits 401₂-401₄, as a selected write data value 408, depending on the state of channel select signal 356. Each of the channel multiplexers 407₂-407₄ within the counterpart conversion circuits 401₂-401₄ may be coupled to receive the PDA-PDD values (404) at respective input ports in an interconnection order that yields the following selection of write data values (408) for the four possible values of a two-bit channel select signal 356:
[Table 2: table image not reproduced in this text]
[0035] As with the selected command/address value 360 of Figure 7, the selected write data value 408 may be supplied to a delay circuit 409 which introduces a selectable delay in accordance with a delay select value 434 (which may be the same as or different from delay select value 358 of Figure 7). After passing through the delay circuit 409 (which may alternatively be disposed in advance of the multiplexer 407₁), the resulting delayed write data value 410 may be output at 0.2Gb/s via memory interface 432. More specifically, the write-data byte (DQw) is output by data transmitter 411 and the write data mask bit (DMw) is output by mask transmitter 413, with one of the ten bits of the write data value 410 being unused. In one embodiment, a strobe generator 417 is provided to generate a data strobe signal (DQS) that is output in a desired phase relationship with the write data and mask bit (note that the data strobe signal may be differential or single-ended, depending upon the application). For example, in one implementation, the data strobe signal may be aligned with mid-points of data eyes to establish a desired, quadrature sampling point, and may transition in synchronism with each successive write-data/mask output, thereby cycling at a maximum frequency of 100 MHz (toggling at 200 MHz).
[0036] In the embodiment of Figure 8, conversion circuit 401₁ may include a clock transmitter 419 and clock-enable transmitter 421 to output, respectively, a differential clock signal (CKw) and corresponding clock-enable signal (CKEw), thereby providing a master clock signal to the memory device that may be used to synchronize internal operations and time reception of selected signals therein (e.g., command and address signals). In one embodiment, the frame signal 370 may be output as the clock signal (e.g., at 100 MHz), though a phase-adjust circuit may be provided to establish a desired phasing between the clock signal, CK, and write data signals. Circuitry may also be provided to deassert the clock-enable signal, CKE, if no transactions are directed to the corresponding memory device, thus disabling clocking of the memory device and saving power. A bank address transmitter 423 may be provided to transmit bank address signals, BAw, to the memory device based on the incoming bank address signal BA[1:0] 372. As discussed, the bank address 372 may be sequenced through a predetermined pattern by a memory access requestor (e.g., one of the CPUs 201 of Figure 3) or other device to enable round-robin or other sequential access to each of the storage banks within the corresponding memory device.
[0037] Referring to Figure 8 and Figure 3, it should be noted that the same set of clock, clock-enable and bank address signals (collectively 438) may be provided to each of the memory devices within a given memory subsystem, and therefore that the signal transmitters 419, 421 and 423 within conversion circuit 401₁ may be used to supply the clock, clock-enable and bank-address signals to each memory device. In such an arrangement, the clock, clock-enable and bank-address transmitters within the other conversion circuits 401₂-401₄ and within other data buffers 400 may be left unconnected or may be omitted altogether. Alternatively, each conversion circuit 401 may include transmitters 419, 421 and 423 to drive the clock, clock-enable and bank address signals to a respective one of the memory devices (W-Z) within a memory subsystem, in which case the corresponding signal transmitters may still be left unconnected (or omitted altogether) and the signal transmitters within the other three data buffers 400 used to drive clock, clock-enable and bank address signals to the remaining three memory devices. In yet another alternative embodiment, a subset of the conversion circuits 401 within a given data buffer 400 may drive clock, clock-enable and bank-address signals to respective subsets of the memory devices (e.g., two of the conversion circuits 401 may each drive clock, clock-enable and bank address signals to a respective pair of memory devices).
[0038] During a memory read operation, read data is received within conversion circuits 401₁-401₄ via respective byte-wide data paths (i.e., DQw, as shown, and DQx-DQz, not specifically labeled) and sampled in receiver circuits 431 (i.e., one byte-wide receiver 431 per conversion circuit 401) in response to a data strobe signal (DQS) output from the memory device via the differential DQS signal link.
The resulting read data byte 440 is forwarded to error generator circuit 433, which generates an error-check bit (e.g., a parity bit based on the read data byte 440) to be returned to the memory access requestor along with information that indicates, based on error indications 380 and 412, whether an error has occurred within a previously received write data byte or command/address value. An error-identifier encoding scheme may be used to indicate the specific write data byte and/or command/address value (i.e., within a sequence of prior write data bytes or command/address values) in which the error was detected. Embodiments of such error-identifier encoding scheme are described, for example and without limitation, in U.S. Patent Application No. 11/330,524, filed January 11, 2006 and entitled "Unidirectional Error Code Transfer for a Bidirectional Link." U.S. application No. 11/330,524 is hereby incorporated by reference.
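As one hedged illustration of the error generator described above, the following Python sketch packs a read data byte, a regenerated error-check bit and a single error-indication bit into a 10-bit value and serializes it for the control-side link. The bit positions, the even-parity code, the most-significant-bit-first ordering and the function names are all assumptions made for this example; the error-identifier encoding of the incorporated application is not modeled.

```python
def parity_bit(byte: int) -> int:
    """Even-parity bit (assumed scheme): 1 when the byte holds an odd number of 1s."""
    return bin(byte & 0xFF).count("1") & 1


def pack_read_value(read_byte: int, error_indication: int) -> int:
    """Build a 10-bit read-data value.

    ASSUMED layout: bits 7..0 = read data byte, bit 8 = error-check bit
    generated from that byte, bit 9 = error-indication bit reporting on
    earlier write-data or command/address transfers.
    """
    check = parity_bit(read_byte)
    return (read_byte & 0xFF) | (check << 8) | ((error_indication & 1) << 9)


def serialize(value10: int) -> list[int]:
    """Ten bits for the serial link, most significant bit first (assumed order)."""
    return [(value10 >> bit) & 1 for bit in range(9, -1, -1)]


value = pack_read_value(0x0F, error_indication=1)
print(hex(value), serialize(value))
```

At one 10-bit value per 5 ns, the serialized stream occupies the full 2 Gb/s rate of a single control-side data link.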
[0039] Continuing with the read data path within the embodiment of Figure 8, the error generator 433 outputs a 10-bit read-data value 420 (PQA), which may be supplied to an input port of channel multiplexer 435. In one embodiment, the read-data value 420 may include the read data byte received from the corresponding memory device, the error-check bit generated based on the read data byte, and an error-indication bit that forms part of a sequence of error-indication bits within the above-mentioned error-identification scheme (i.e., identifying write-data errors and/or command/address errors). Read values PQB-PQD from the other conversion circuits 401₂-401₄ may be received at the remaining input ports of the channel multiplexer 435 to enable read data to be returned from any of memory devices W-Z to the memory access requestor coupled to data conversion circuit 401₁. Each of the channel multiplexers 435 within the counterpart conversion circuits 401₂-401₄ may be coupled to receive the PQA-PQD values (420) at respective input ports in an interconnection order that yields the following selection of read data values (448) for the four possible values of a two-bit channel select signal 356 (note that a separate channel select signal may be provided to control the read data path):
[Table 3: table image not reproduced in this text]
[0040] Channel multiplexer 435 outputs the selected read-data value 448 to delay circuit 437 in accordance with the channel select signal 356, and the delay circuit 437 delays the selected read-data value 448 by some time interval as generally described in reference to Figure 7 (e.g., the time interval indicated by the delay select value 434 or a different delay select value). By this operation, a sequence of delayed read-data values 450 is output from the delay circuit 437 at 0.2Gb/s and provided to a serializing output driver 439 which outputs the read data and error information included therewith via high-speed serial interface 402 at 2Gb/s.
[0041] Figure 9 illustrates an exemplary timing arrangement for a memory read operation within a cross-threaded memory system that includes the address buffer 350 shown in Figure 7 and data buffers 400 as shown in Figure 8. Initially, a pair of 20-bit serial command/address values 501 and 502 are output via the serial, high-speed command/address link between a first control queue of CPU A and a corresponding conversion circuit 351 within address buffer 350 (designated "CPUA: 1-ADR"). Address buffer 350 converts each of the serial command/address values 501, 502 into a respective parallel 13-bit address value and corresponding 3-bit command value and outputs the parallel address and command values via memory-side address lines A[12:0] and command lines (WE, CAS, RAS), respectively. More specifically, the serial command/address value 501 is output, in parallel form, as an activation command (ACT) and corresponding row address (ROW) as shown at 505, and serial command/address value 502 is output as a column-read command (READ) and corresponding column address (COL) as shown at 506.
As described in reference to Figure 8, a clock signal "CK±" (e.g., the frame signal or a clock signal derived from the frame signal), is output from at least one of the data buffers 400 along with a clock-enable signal (CKE), and rotating bank address (BA). As discussed, the bank address may be sequenced (e.g., rotated) between bank selection values, P, Q, R, S, in successive tRR intervals. As shown, the clock signal is transmitted in rising-edge alignment with the activation and column-read commands so that the falling edge of the clock signal (or phase adjusted version thereof) may be used to trigger sampling of the command and address signals at the memory device. In other embodiments the phase relationship of CK and the command and address signals may be shifted from that shown. In the timing arrangement of Figure 9, the time delay (tRCD) between receipt of the activation command 505 and the column-read command 506 is one clock cycle, and the time delay (tCAC) between receipt of the column-read command 506 and the output of read data on the memory-side data path, is also one clock cycle. Different timing delays may apply in different embodiments.
[0042] Still referring to Figure 9, read data is output via the 32-bit data interface of the selected memory device, with each of four data bytes being output to a respective data buffer 400 via a byte-wide data lane (DQ0[7:0]-DQ3[7:0]). By this operation, four slices of read data are routed back to the memory access requestor via four data buffers 400, respectively (e.g., via data buffers DI-DL as described in reference to Figure 3). As shown, the bit time on each data line (tDQBIT) is 5ns in this example, thus effecting a double data rate transfer as a different set of data bits is transmitted during each half-cycle of the clock signal, CK. Other data rates may be applied in alternative embodiments or different operating modes.
[0043] In one embodiment, the overall data transfer takes place over a 20ns tRR interval, and thus includes four successive byte-wide data transfers (i.e., burst length = 4 bytes) per data lane for a total of 128 bits of data (16 data bytes) per column read. A data strobe signal DQS may be output along with each byte and may be edge-aligned with the read data as shown (with the data receiver within the data buffer having timing delay circuitry to establish a quadrature sampling offset relative to the edge-aligned strobe) or may be quadrature-aligned with the read data. The data mask signal line, which may be viewed as completing the data lane for each of lanes DQ0-DQ3, may remain unused during memory read operations.
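The read-transfer figures quoted for Figure 9 can be checked with a few lines of arithmetic. The following Python sketch uses only the example values stated above (5ns bit time, 20ns tRR interval, four byte-wide lanes); the variable names are illustrative and do not appear in the figures.

```python
# Example values from the timing arrangement of Figure 9; names are illustrative.
T_DQ_BIT_NS = 5       # bit time on each memory-side data line
T_RR_NS = 20          # interval over which one column read is transferred
LANES = 4             # data lanes DQ0-DQ3, one per data buffer
LANE_WIDTH_BITS = 8   # each lane is byte-wide

# Number of successive byte-wide transfers per lane within one tRR interval.
transfers_per_lane = T_RR_NS // T_DQ_BIT_NS

# Total read data returned per column read across all lanes.
bits_per_column_read = LANES * LANE_WIDTH_BITS * transfers_per_lane

# Double data rate: one transfer per half-cycle, so CK spans two bit times.
clock_period_ns = 2 * T_DQ_BIT_NS

print(transfers_per_lane, bits_per_column_read, clock_period_ns)
```

Changing the bit time or the tRR interval in the sketch shows how other data rates, which the description expressly permits, would alter the burst length and the amount of data returned per column read.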
[0044] In the tRR interval that follows transmission of the read data from the memory device to data buffers 400, the data buffers may output the read data to the appropriate control queue within the memory access requestor along with the above-described error information. More specifically, each of the data buffers 400 (e.g., buffers DI-DL as shown in Figure 3) may output two 20-bit serial read data bursts 526 in succession via a respective one of the control-side data links (designated CPU(A:0)-DI through CPU(A:1)-DL in Figure 9) to effect a 40-bit transmission per data buffer and 160 bits in the aggregate. As shown, each 20-bit serial read data burst 526 includes the two bytes 522 output from the memory device during the corresponding portion of the prior tRR interval, as well as an error-check bit (ER) per read data byte, and an error bit (EW) that may be used as part of an error signaling protocol to identify errors detected in preceding write-data or command/address transfers. Accordingly, the 160 bits transferred via the high-speed serial links include the 128 bits of read data output from the memory device, and 32 bits of error information.
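The bit accounting of the read return path may be illustrated as follows. This Python sketch packs two 10-bit read-data values (data byte, ER bit, EW bit) into one 20-bit burst and totals the bits returned per column read. The field order within each 10-bit value and the order of the two values within a burst are assumptions made for illustration only; the description does not specify them, and the function names are hypothetical.

```python
def make_read_value(data_byte: int, er: int, ew: int) -> int:
    """Build one 10-bit read-data value.

    Assumed layout (not specified in the description):
    bits 9..2 = data byte, bit 1 = ER, bit 0 = EW.
    """
    if not (0 <= data_byte <= 0xFF) or er not in (0, 1) or ew not in (0, 1):
        raise ValueError("field out of range")
    return (data_byte << 2) | (er << 1) | ew


def make_read_burst(first: int, second: int) -> int:
    """Concatenate two 10-bit values into one 20-bit burst (order assumed)."""
    return (first << 10) | second


BUFFERS = 4             # data buffers DI-DL
BURSTS_PER_BUFFER = 2   # two 20-bit bursts per tRR interval
VALUES_PER_BURST = 2    # two 10-bit read-data values per burst

total_bits = BUFFERS * BURSTS_PER_BUFFER * VALUES_PER_BURST * 10
data_bits = BUFFERS * BURSTS_PER_BUFFER * VALUES_PER_BURST * 8
error_bits = total_bits - data_bits

burst = make_read_burst(make_read_value(0xA5, 1, 0),
                        make_read_value(0x3C, 0, 1))
print(total_bits, data_bits, error_bits, f"{burst:020b}")
```

Under these assumptions the totals agree with the 160-bit aggregate stated above: 128 bits of read data plus 32 bits of error information.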
[0045] Figure 10 illustrates an exemplary timing arrangement for a memory write operation within a cross-threaded memory system that includes the address buffer 350 shown in Figure 7 and data buffers 400 as shown in Figure 8. The memory write operation may be initiated by a pair of 20-bit serial command/address values 551 and 552 transmitted via the high-speed serial command/address link CPUA:1-ADR. Address buffer 350 may convert each of the serial command/address values 551, 552 into a respective parallel 13-bit address value and corresponding 3-bit command value and output the parallel address and command values via memory-side address lines A[12:0] and command lines (WE, CAS, RAS), respectively. More specifically, the serial command/address value 551 may be output, in parallel form, as an activation command (ACT) and corresponding row address (ROW) as shown at 555, and serial command/address value 552, transmitted in the following tRR interval, may be output as a column write command (WRITE) and corresponding column address (COL) as shown at 556. As discussed in reference to Figures 8 and 9, a clock signal (CK±) may be output from at least one of the data buffers 400 along with a clock-enable signal (CKE), and rotating bank address (BA). As in the timing arrangement of Figure 9, the time delay (tRCD) between receipt of the activation command 555 (ACT) and the column write command 556 (WRITE) is one clock cycle.
[0046] In the tRR interval immediately following transmission of the serial command/address values 551 and 552 to the address buffer, write data may be output from the CPUA control queue to each of four data buffers via respective high-speed serial data links CPU(A:1)-DI through CPU(A:1)-DL.
In one embodiment, the write data output via each link may include two 20-bit data bursts (560) per tRR interval, with each 20-bit data burst 560 including two write data bytes, two data mask bits and two error-check bits; one data mask bit and one error-check bit per data byte. By this operation, four write data bytes, four data mask bits and four error-check bits may be transmitted to each of the four data buffers per tRR interval, thus effecting a total transfer of 128 write data bits (16 bytes), 16 data mask bits and 16 error-check bits, for a total of 160 bits per column write operation.
[0047] Following the example in Figure 9, the time delay between receipt of the activation command and the column write command, tRCD, may be one clock cycle, and the time delay between receipt of the column write command and write data output on the memory-side data path, tCWD, may also be one clock cycle (different timing delays may apply in different embodiments). Accordingly, during the tRR interval that follows write data transmission from the memory access requestor to the data buffers, each of the data buffers may output a sequence of four write data bytes to the selected memory device via a respective one of data lanes DQ0-DQ3, with each 20-bit write data value 560 being output in a successive pair of byte-wide data transfers 562. A data strobe signal, DQS, may be output in either quadrature or edge alignment with the write data (quadrature alignment is shown in Figure 10) via the data strobe line, and a data mask value is output via the data mask line. Thus, a total of four bytes (32 bits) and four corresponding data mask bits may be provided to the selected memory device via respective data lanes, with a total of 16 bytes (128 bits) and 16 data mask bits being provided per column write operation.
[0048] Figure 11 illustrates an exemplary arrangement of memory access queues within the CPUs 201A-201D of Figure 3 and their relation to memory banks P-S within memory devices 207W-207Z of memory subsystems 2031-2034. As shown, each of the CPUs 201 may include four queue arrays 6001-6004, one for each of the memory subsystems 203, with each queue array 600 including four columns of control queues that correspond to the memory devices 207W-207Z within the corresponding memory subsystem 203, and four rows of control queues that correspond to banks P, Q, R and S within the individual memory devices 207. Thus, for example, queue array 6001 within each of the CPUs 201A-201D includes a control queue 605 at column three and row three (i.e., starting from leftmost column 1 and topmost row 1) that corresponds to the third bank (R) within the third memory device (Y) of memory subsystem 2031. As another example, queue array 6004 within each of the CPUs includes a control queue 607 at column four, row one that corresponds to the first bank (P) within the fourth memory device (Z) of memory subsystem 2034. Note that a similar queue arrangement may be implemented with other types of memory access requestors. In one embodiment, as memory access requests are received (or generated, for example as part of program execution), the address values associated with the memory access requests are parsed to determine which memory subsystem 203, memory device 207, and memory bank 209 is to be accessed to carry out the request, and the appropriate command, address and data are queued therein. In the case of a memory write operation, write data may be queued along with the memory address and transferred to the target memory subsystem, memory device and memory bank in queued order.
In a memory read operation, the returned read data may be queued in an outbound queue (e.g., part of or associated with the control queue which sourced the corresponding memory read command) or similar structure for return to an external requestor or other circuitry (e.g., core processing circuitry) within the host device.
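The queue selection described in reference to Figure 11 may be sketched as follows. In this Python example an address is parsed into subsystem, device and bank indices, and the request is placed in the matching control queue. The address bit layout (two bits each for bank, device and subsystem, lowest bits first) and the names `parse_address` and `Requestor` are hypothetical; the description states only that address values are parsed to select the queue.

```python
from collections import deque

SUBSYSTEMS, DEVICES, BANKS = 4, 4, 4
DEVICE_NAMES = "WXYZ"   # queue-array columns
BANK_NAMES = "PQRS"     # queue-array rows


def parse_address(addr: int):
    """Split an address into (subsystem, device, bank, remainder).

    Hypothetical layout: bits 1..0 = bank, bits 3..2 = device,
    bits 5..4 = subsystem, remaining bits = row/column address.
    """
    bank = addr & 0x3
    device = (addr >> 2) & 0x3
    subsystem = (addr >> 4) & 0x3
    return subsystem, device, bank, addr >> 6


class Requestor:
    """One memory access requestor holding four 4x4 queue arrays."""

    def __init__(self):
        # Indexed as queues[subsystem][bank row][device column].
        self.queues = [[[deque() for _ in range(DEVICES)]
                        for _ in range(BANKS)]
                       for _ in range(SUBSYSTEMS)]

    def enqueue(self, command: str, addr: int, data=None):
        subsystem, device, bank, rest = parse_address(addr)
        self.queues[subsystem][bank][device].append((command, rest, data))
        return subsystem, device, bank


cpu = Requestor()
# Subsystem index 0, device Y (index 2), bank R (index 2): the position of
# control queue 605 (column three, row three of the first queue array).
target = cpu.enqueue("WRITE", (0 << 4) | (2 << 2) | 2, data=b"\x00")
print(target, DEVICE_NAMES[target[1]], BANK_NAMES[target[2]])
```

Keeping one queue per bank of each device lets requests bound for different banks be issued in the rotating bank order described above without reordering within any single queue.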
[0049] Figure 12 illustrates physical and logical views (631, 633) of a memory device 630 according to an embodiment that may be used in a cross-threaded memory system and that includes on-die integrated circuitry to enable multiple access requestors to simultaneously access respective storage resources. That is, instead of providing separate integrated circuit buffer devices as, for example, in the embodiment of Figure 1, circuitry for performing the signal conversion and multiplexing (i.e., switching) functions described above may be provided on the integrated circuit die that includes the core storage array (or arrays) and access control circuitry.
[0050] Referring first to physical view 631, two sets of storage banks 635W and 635X form the core storage arrays of memory device 630 and are partitioned into lateral sets of sub-banks along a symmetry line 639. More specifically, the individual sets of sub-banks 637W0 and 637W1 on opposite sides of the symmetry line 639 collectively form storage banks 635W (the "W" storage banks), and individual sets of sub-banks 637X0 and 637X1 collectively form storage banks 635X (the "X" storage banks). The memory device 630 additionally includes a data interface 641 and control interface 645. The data interface 641 is partitioned along symmetry line 639 into a pair of lateral data interfaces 6430 and 6431. Each lateral data interface 643 includes a set of data input/output (I/O) ports (dA-dD) for connection to a respective memory access requestor (not shown), so that the memory device 630 supports direct (or indirect) connection to as many as four memory access requestors (e.g., CPUs, memory controllers and so forth as described above). More or fewer data I/O ports may be provided within data interface 641 in alternative embodiments, thus permitting more or fewer connections to memory access requestors. Also, in the particular embodiment shown, each of the data I/O ports, dA-dD, within a given lateral data interface 643 includes a set of eight differential transceivers to receive write data and output read data in bytes (i.e., 8-bit values) via respective differential data links (i.e., 8 DQ pairs). Single-ended transceivers may be used to send and receive signals via single-ended signaling links in alternative embodiments.
[0051] Still referring to Figure 12, each of the data I/O ports within lateral data interface 6430 is coupled to a multiplexing circuit 6490 which responds to a channel-select signal (not shown) to switchably couple one of the data I/O ports, dA-dD, to sub-bank set 637W0 via internal data path 6500 and another of the data I/O ports to sub-bank set 637X0 via internal data path 6510. Within lateral data interface 6431, multiplexing circuit 6491 similarly responds to the channel-select signal to switchably couple one of data I/O ports dA-dD to sub-bank set 637W1 via internal data path 6501 and another of the data I/O ports to sub-bank set 637X1 via internal data path 6511. By this arrangement, any one of the four data I/O ports in a given lateral data interface 643 may be switchably coupled to an internal data path 650 to access the 637W sub-banks during a given interval, and any other of the four data I/O ports may be switchably coupled to internal data path 651 to access the 637X sub-banks during that same interval. In the particular embodiment shown, the width of the individual internal data paths 6500, 6501, 6510 and 6511 corresponds to the width of the data I/O port (i.e., 8 bits wide in this example), though serializing circuitry may be provided at the interface to the sets of sub-banks (and/or within the data I/O ports themselves) for converting a wider read data word retrieved from a selected sub-bank into a sequence of byte-sized read data values (and conversely deserializing a sequence of byte-sized write data values to form a wider write data word for storage in the selected sub-bank).
[0052] Still referring to the physical view (631) of memory device 630, the control interface 645 is organized in generally the same manner as the lateral data interfaces 643, and includes four control ports, cA-cD, each for coupling to a respective memory access requestor (more or fewer control ports may be provided in alternative embodiments) and a multiplexer ("cMux") for switchably coupling one of the control ports, and thus the corresponding memory access requestor, to a selected one of access control logic circuits 651W and 651X for the W and X storage banks, respectively. By this arrangement, one of the memory access requestors may be switchably coupled to access control logic 651W, while any other one of the memory access requestors is simultaneously switchably coupled to the access control logic 651X, thus permitting two memory access requestors to concurrently (i.e., at least partly overlapping in time) issue memory access commands or requests to the access control logic circuits 651W and 651X, and thereby initiate independent, concurrent memory accesses in the W and X storage banks.
[0053] Referring to logical view 633, it can be seen that the counterpart 8-bit data I/O ports (dA-dD) within lateral data interfaces 6430 and 6431 collectively form respective 16-bit data I/O ports (dA-dD) within the overall data interface 641 which are coupled via multiplexer 649 (a logical representation of the two multiplexers 6490 and 6491 shown in physical view 631) to storage bank sets 635W and 635X via respective 16-bit internal data paths 650 and 651. The control interface is omitted from logical view 633 to avoid obscuring the data interconnection arrangement.
[0054] Reflecting on the memory device of Figure 12, referred to herein as a cross-threading memory device, it can be seen that each of the multiple data I/O ports and control ports may be used to access a separate storage resource during a given time interval. However, while the cross-threading memory device 630 has multiple memory access interfaces, each of the storage resources (i.e., storage banks or sub-banks) itself is single-ported in the embodiment of Figure 12, having a single set of bit lines for data storage and retrieval, and thus supporting access by only one memory access requestor at a time (i.e., a single-port storage cell performs only one transaction at a particular time). Such an arrangement is in contrast to a multi-port storage array that has two or more sets of bit lines coupled to the array of storage elements to support two simultaneous accesses to the same storage array. More specifically, a multi-port storage cell (i.e., constituent element of a multi-port storage array) typically needs a word line and a bit line per port. Such an arrangement consumes area and is typically only used for register arrays and other storage applications that need a limited number of storage cells. By contrast, the memory device of Figure 12 and other embodiments herein may be applied within memory systems that utilize storage cells with a single access port (one access is occurring at any point in time) in which there is a single word line and a single bit line (examples of such memories include, without limitation, commodity DDR, GDDR and XDR memories in which concurrent operations are generally possible only because there are multiple independent banks capable of staggered (pipelined) operation). If the bit line is differential, then there may be two conductors forming the bit line. In some memory components there may be a separate read and write word line for a storage cell, but they are generally not utilized simultaneously. 
Note that a bank array (such as array W in Figure 12) formed with single-port storage cells may perform more than one transaction at a time, but only one transaction phase (activate, read, write, precharge) is occurring in the bank array at a particular time (i.e., the transactions are pipelined). Also, with two or more bank arrays (W and X in Figure 12) it is possible to perform the same phase of two different transactions simultaneously within the memory component (i.e., operations may be "micro-threaded").
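The distinction drawn above between multiple access interfaces and single-ported storage resources may be illustrated with a toy model. In the following Python sketch, any data I/O port may be coupled to either bank set in a given interval, but each bank set accepts only one port at a time. The class and method names are hypothetical and the model ignores timing, control ports and sub-banks.

```python
class CrossThreadingDevice:
    """Toy model of the Figure 12 device: ports dA-dD, bank sets W and X.

    Each bank set is single-ported, so only one data I/O port may be
    coupled to it during a given interval.
    """

    PORTS = ("dA", "dB", "dC", "dD")
    BANK_SETS = ("W", "X")

    def __init__(self):
        self.connections = {}  # bank set -> port, for the current interval

    def connect(self, port: str, bank_set: str) -> None:
        if port not in self.PORTS or bank_set not in self.BANK_SETS:
            raise ValueError("unknown port or bank set")
        if bank_set in self.connections:
            raise RuntimeError(f"bank set {bank_set} already in use")
        if port in self.connections.values():
            raise RuntimeError(f"port {port} already coupled")
        self.connections[bank_set] = port

    def next_interval(self) -> None:
        """Release all couplings so the multiplexers can be re-selected."""
        self.connections.clear()


device = CrossThreadingDevice()
device.connect("dA", "W")   # one requestor accesses the W banks...
device.connect("dC", "X")   # ...while another concurrently accesses X
try:
    device.connect("dB", "W")
    conflict = False
except RuntimeError:
    conflict = True         # W is single-ported, so the request is refused
print(device.connections, conflict)
```

A multi-port storage array would, by contrast, allow the third `connect` call to succeed, at the area cost of an additional word line and bit line per port noted above.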
[0055] Figure 13 illustrates an embodiment of a memory module 670 having multiple cross-threading memory devices 6300-6303 (also referred to as Mem0-Mem3) coupled in a paired multi-drop configuration. That is, even numbered memory devices 6300 and 6302 (each of which may be implemented generally as described in reference to Figure 12) are coupled in a multi-drop arrangement to a first set of module data paths 671AE-671DE, while odd numbered memory devices 6301 and 6303 are coupled in a multi-drop arrangement to a second set of module data paths 671AO-671DO. By this arrangement and by coupling each of multiple access requestors, AR1-AR4, to a respective one of module data paths 671AE-671DE and to a respective one of module data paths 671AO-671DO as shown, one memory device in each multi-drop pair may be accessed by a first pair of memory access requestors via a first pair of data I/O ports (and corresponding control ports, not shown), and the other memory device in each multi-drop pair may be accessed by a second pair of memory access requestors via a second pair of data I/O ports. Consequently, in a system having four memory access requestors as shown in Figure 13, each memory access requestor may simultaneously access storage resources within a selected pair of cross-threading memory devices 630. In the particular embodiment of Figure 13, for example, a round-robin sequence of memory accesses directed to storage banks W, X, Y and Z may be carried out via data I/O ports A, B, C and D (and corresponding control ports) as follows:
Table 4 ("sb" = storage bank and "port" = data I/O port)
[0056] In one embodiment, memory devices 6302 and 6303 are disposed on the backside of the module substrate 675 (i.e., on an opposite face of the module substrate from memory devices 6300 and 6301) and form a separately selectable rank of memory devices (i.e., a rank being a group of memory devices that may be selected and/or enabled as a unit to output read data or receive write data) from memory devices 6300 and 6301. Also, while two pairs of cross-threading memory devices 630 are shown, any number of additional pairs of cross-threading memory devices 630 may be provided and connected to the access requestors AR1-AR4 (or to more or fewer access requestors) to provide increased data transfer width. For example, in the embodiment of Figure 13, memory devices 6300 and 6301 may be concurrently accessed via respective 16-bit data I/O paths 671AE and 671AO by access requestor AR1, thus enabling 32-bit read and write data transfer to the W or X storage banks (i.e., in simultaneous 16-bit accesses to constituent sub-banks). Similar 32-bit read and write transfers may be carried out simultaneously in the alternate storage banks (X or W) of memory devices 6300 and 6301, and in the Y and Z sub-banks of memory devices 6302 and 6303. By increasing the number of paired memory devices to four, six, eight, etc., the effective read/write data width may be increased to 64 bits, 96 bits, 128 bits and so forth. Also, while 16 bits per data I/O port are illustrated in the embodiments of Figures 12 and 13, narrower or wider data I/O ports may be provided in alternative embodiments, thereby also permitting smaller or larger read/write data widths with respect to individual memory access requestors. Further, as discussed above, more or fewer data I/O ports may be provided in each cross-threading memory device 630 to support connection to, and/or simultaneous access by, more or fewer memory access requestors.
[0057] Figure 14 illustrates physical and logical views (691 and 693) of a cross-threading memory device 690 according to an embodiment in which two additional pairs of storage banks, 695Y and 695Z, are provided for a total of four sets of storage banks (695W, 695X, 695Y and 695Z) that may be independently and simultaneously accessed via two sets of four data I/O ports (dA-dD) and four control ports (cA-cD). In the particular embodiment shown, W and X storage sub-banks on either side of symmetry line 698 (i.e., W sub-banks 697W0 and 697W1 and X sub-banks 697X0 and 697X1) are disposed in an interleaved arrangement and coupled to the multiplexers 7010 and 7011 via respective internal data paths 702W0 and 702X0 (and 702W1 and 702X1), and the Y and Z storage sub-banks, 697Y0/697Z0 and 697Y1/697Z1, are similarly interleaved and coupled to the multiplexers 7010 and 7011 via respective internal data paths 702Y0 and 702Z0 (and 702Y1 and 702Z1). In alternative embodiments, the storage sub-banks 697 may be disposed in a non-interleaved arrangement, for example, with W sub-banks and Y sub-banks disposed together nearest the multiplexers 701, and the X sub-banks and Z sub-banks disposed together further from the multiplexers 701. Also, to enable simultaneous access to all four of the storage banks, 695W, 695X, 695Y and 695Z, each of multiplexers 7010 and 7011 may include circuitry for establishing two independent 8-bit data-path interconnections (e.g., via 2x8 DQ pairs) between first and second data I/O ports (i.e., any two of data I/O ports, dA-dD) and address-selected sub-banks within the W, X, Y or Z sets of storage banks 697. By this arrangement, four different memory access requestors may concurrently access the four different storage banks 695W-695Z during a given time interval; two via multiplexer 7010, and two via multiplexer 7011. Similarly, control multiplexer ("cMux") includes circuitry for establishing four independent control-path interconnections (instead of just two as in the embodiment of Figure 12) between the four control ports (cA-cD) and access control logic 7050/7051 for the W, X, Y and Z storage banks.
Thus, as shown in logical view 693, the cross-threading memory device 690 effectively provides simultaneous access to four sets of storage banks (W-Z) via respective 16-bit data paths, with multiplexer 701 (a logical representation of the function collectively performed by physical-view multiplexers 7010 and 7011) switchably coupling each of data I/O ports dA-dD (logical representations of the pairs of data I/O ports dA-dD within physical view 691) to a storage bank within a respective set of the W-Z storage banks. Thus, the aggregate data path width of the cross-threading memory device 690 in the exemplary embodiment shown is 4x16 = 64 bits (or 64 DQ pairs in a differential signaling embodiment). Also, in the embodiment shown, the control path includes four differential request (RQ) pairs to establish a 4-bit wide RQ path (one per memory access requestor), though additional request lines per memory access requestor may be provided in alternative embodiments.
[0058] Figure 15 illustrates physical and logical views (731, 733) of a cross-threading memory device 730 according to an embodiment that includes conductive interconnects 739 to couple the counterpart data I/O ports dA-dD on either side of symmetry line 734 together, and thus enable the storage sub-banks on either side of the symmetry line 734 to be accessed independently of counterpart sub-banks on the opposite side of the symmetry line. By this arrangement, the counterpart sets of sub-banks 637W0/637W1 and 637X0/637X1 within the embodiment of Figure 12 are effectively converted to independently accessible sets of storage banks, W, X, Y and Z. Accordingly, cross-threading memory device 730 is depicted as including four sets of storage banks 735W-735Z (each having an 8-bit data interface) instead of two sets of storage banks (W and X) each having (collectively) a 16-bit data interface as in the embodiment of Figure 12.
Also, as in the embodiment of Figure 14, the data multiplexers 7370 and 7371 may each be modified to provide, individually, for two simultaneous connections between first and second data I/O ports (any two of data I/O ports dA-dD) and the adjacent sets of storage banks (i.e., storage banks W and X in the case of multiplexer 7370, and storage banks Y and Z in the case of multiplexer 7371). The control multiplexer, cMux, is constructed generally as described in reference to Figure 14 to enable simultaneous receipt and simultaneous execution of four independent memory access commands from respective access requestors and thus may include four-way switched paths between the control interfaces, cA-cD, and access control logic 7410 (for the W and X storage banks) and 7411 (for the Y and Z storage banks). Referring to logical view 733, the cross-threading memory device 730 includes four sets of storage banks (W, X, Y and Z), each independently accessible by each of four data I/O ports (and therefore four memory access requestors) via respective 8-bit internal data paths 743X-743Z and 4x8-bit multiplexer 737 (a logical representation of the two 2x8-bit multiplexers 7370 and 7371 shown in physical view 731). Thus, cross-threading memory device 730 includes a 32-bit wide data interface in the aggregate (i.e., formed by four 8-bit data interfaces) and a 4-bit wide request interface, not specifically shown in logical view 733. As discussed above, instead of a single link per memory access requestor, the request interface may include additional links to provide additional request bandwidth as necessary to convey address and control information to the cross-threading memory device 730.
[0059] Still referring to Figure 15, in one embodiment, conductive interconnects 739 are formed by integrated-circuit metal layers (or other conductive structures) that serve to permanently wire counterpart data I/O ports on either side of symmetry line 734 together.
Alternatively, the conductive interconnects 739 may be switched (e.g., through one or more transistor switches such as pass gates or the like), thereby enabling the counterpart data I/O ports on opposite sides of symmetry line 734 to be switchably coupled or decoupled from one another and thus effect the logical architecture shown at 733 or, when decoupled, the logical architecture shown at 633 in Figure 12. Also, the conductive interconnects 739 may be formed by interconnections external to memory device 730 (i.e., off-chip or at least external to the integrated circuit package), for example through circuit board trace interconnection or other external interconnection and thus enable memory device 730 to be used in either of the configurations shown in Figures 12 and 15.
[0060] Figure 16 illustrates physical and logical views (781, 783) of a cross-threading memory device 780 according to an embodiment having two separate instances (7850, 7851) of the cross-threading architecture of Figure 12 disposed on a single die and in which like data I/O ports on either side of symmetry line 789 are coupled together via on-die conductive interconnects 791 (e.g., metal layer interconnects) or external interconnects. By this arrangement, 16-bit wide interfaces to the individual sets of storage banks (formed by counterpart sub-banks on either side of symmetry line 790 as described in reference to Figure 12) are maintained, yet four different sets of storage banks, W, X, Y and Z, may be accessed simultaneously by four different memory access requestors. That is, as shown in logical view 783, each of four sets of storage banks (W, X, Y, Z) may be accessed by a respective 16-bit internal data path 795W-795Z (each being a logical representation of the two 8-bit internal data paths between respective sets of storage sub-banks on either side of symmetry line 790), thereby establishing a 64-bit (or 64 DQ pair in a differential signaling embodiment) aggregate internal data path. Also, as in the logical view 693 of memory device 690 (Figure 14), each of four memory access requestors may simultaneously access a respective one of the sets of storage banks via a respective one of data I/O ports dA-dD (and corresponding control ports cA-cD, not shown in logical view 783) and multiplexer 797.
[0061] Figure 17 illustrates an embodiment of a memory system 810 in which individual cross-threading memory devices communicate via chip-to-chip interfaces to enable increased cross-threading operation. Referring to the detail view of memory devices 8110m and 8110n, for example, data multiplexer 820m within memory device 8110m includes an output coupled to an input of counterpart data multiplexer 820n within memory device 8110n, and multiplexer 820n likewise has an output coupled to an input of multiplexer 820m. By this arrangement, each multiplexer 820m/820n may switchably connect internal data paths 8250 and 8251 to one of three input sources: one of the two local data I/O ports (designated 'dA' and 'dB' (815A, 815B) in memory device 8110m and 'dC' and 'dD' (815C, 815D) in memory device 8110n) or a remote data I/O port selected by the counterpart multiplexer. In one embodiment, a select signal 827 is provided to the multiplexers 820 in each memory device pair to control the data I/O-to-core connection as shown in the table at 830. Thus, when the select signal 827 is in an initial state, '00', data I/O port dA within memory device 8110m is switchably coupled via multiplexer 820m to an address-selected one of storage banks W (817W); data I/O port dC within memory device 8110n is switchably coupled via multiplexers 820n and 820m (i.e., by n-to-m path, "N") to an address-selected one of storage banks X (817X); data I/O port dB is switchably coupled via multiplexers 820m and 820n (i.e., by m-to-n path, "M") to an address-selected one of storage banks Z (817Z); and data I/O port dD within memory device 8110n is switchably coupled via multiplexer 820n to an address-selected one of storage banks Y (817Y).
As the select signal 827 is stepped through a sequence of values (e.g., from 00 to 01 to 10 to 11 as shown), the switched interconnections between data I/O ports and sets of storage banks are likewise switched, thus enabling round-robin access to each of the four sets of storage resources by each of four memory access requestors in a respective time interval. For example, when the select signal 827 is transitioned to a '01' value, data I/O ports dA, dB, dC and dD are switchably coupled to storage bank sets X, Y, W and Z, respectively. Thereafter, when the select signal 827 is transitioned to '10', data I/O ports dA, dB, dC and dD are switchably coupled to storage bank sets Z, W, Y and X, respectively, and then, when the select signal 827 is transitioned to '11', to Y, X, Z and W, respectively. Depending on the time required for the chip-to-chip signaling used to establish doubly-multiplexed interconnection paths through multiplexers 820m and 820n, overall device latency may be increased relative to embodiments described above, but throughput may be maintained. The select signal 827 may be generated by one or more of the access requestors (depicted as controllers 201A-201D in Figure 17, though virtually any type of memory access requestor/controller may be used), or may be generated by another control device, or even within one or more of the memory devices themselves. Although not specifically shown, control ports within each of the memory devices 811 may also include inter-coupled control multiplexers to enable control information to be passed between paired memory devices.
[0062] Still referring to Figure 17, only a portion of the interconnections between memory access queues (0-3) within controllers 201A-201D (i.e., memory access requestors) and pairs of memory devices 811 are shown. For example, memory access queue 0 within each of controllers 201A-201D is coupled to a respective one of the four I/O ports A-D within memory device pair 8110m/8110n.
Memory access queue 1 within each of controllers 201 A-201D is likewise coupled to a respective I/O port within memory device pair 811 lm/811 In, and memory access queues 2 and 3 within each of controllers 201A-201D are likewise coupled to respective sets of I/O ports within memory device pairs 8112m/8112n and 8113m/8113n. As in all embodiments described herein, particular numbers of memory devices, memory access requestors, I/O ports per memory device and/or memory access queues per memory access requestor may be increased or decreased according to application demands. Also, each of the signaling links coupled between a given data I/O port or control port (the control ports and their interconnections to the memory access requestors are not specifically shown in Figure 17) may be single-ended or differential and may include any number of constituent signaling links, including a single-bit signaling link. Further, the data rate over a given signaling path between a memory access requestor and memory device may be faster or slower than corresponding data and/or control path within the memory device, thus enabling internal data paths to be wider or narrower and internal data transfer rates to be slower or faster than data path widths/data transfer rates on the requestor-to-memory-device signaling paths.
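The select-signal sequencing described for Figure 17 can be illustrated with a short Python sketch. The port-to-bank-set assignments are taken from the four select states recited in the text; the placement of bank sets W and X in the "m" device and Y and Z in the "n" device is an inference from the '00' state description (which ports reach which banks locally or via the M/N chip-to-chip paths), and the names SELECT_TABLE, route and is_round_robin are hypothetical, introduced only for illustration.

```python
# Port-to-bank-set coupling for each value of select signal 827,
# transcribed from the four states recited in the description.
SELECT_TABLE = {
    0b00: {"dA": "W", "dB": "Z", "dC": "X", "dD": "Y"},
    0b01: {"dA": "X", "dB": "Y", "dC": "W", "dD": "Z"},
    0b10: {"dA": "Z", "dB": "W", "dC": "Y", "dD": "X"},
    0b11: {"dA": "Y", "dB": "X", "dC": "Z", "dD": "W"},
}

# Ports dA/dB belong to the "m" device and dC/dD to the "n" device.
PORT_DEVICE = {"dA": "m", "dB": "m", "dC": "n", "dD": "n"}

# Assumption inferred from the '00' state: W and X sit behind
# multiplexer 820m, Y and Z behind multiplexer 820n.
BANK_DEVICE = {"W": "m", "X": "m", "Y": "n", "Z": "n"}


def route(select, port):
    """Return (bank_set, path) for a port under a given select value.

    path is "local" when port and bank set share a device, "M" for the
    m-to-n chip-to-chip path, and "N" for the n-to-m path.
    """
    bank = SELECT_TABLE[select][port]
    src, dst = PORT_DEVICE[port], BANK_DEVICE[bank]
    if src == dst:
        return bank, "local"
    return bank, ("M" if src == "m" else "N")


def is_round_robin(table):
    """Check that every state couples each port to a distinct bank set
    and that every port reaches every bank set across the sequence."""
    ports = sorted(next(iter(table.values())))
    banks = {b for mapping in table.values() for b in mapping.values()}
    exclusive = all(set(m.values()) == banks for m in table.values())
    covers_all = all({table[s][p] for s in table} == banks for p in ports)
    return exclusive and covers_all


if __name__ == "__main__":
    for select in sorted(SELECT_TABLE):
        row = {p: route(select, p) for p in sorted(PORT_DEVICE)}
        print(format(select, "02b"), row)
    print("round robin:", is_round_robin(SELECT_TABLE))
```

Under the stated placement assumption, every select state uses each chip-to-chip direction (M and N) exactly once, which is consistent with a single doubly-multiplexed path in each direction between the paired devices.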
[0063] Aspects of various embodiments disclosed herein are set forth, for example and without limitation, in the following numbered clauses:
1. An integrated circuit (IC) device comprising: a plurality of control interfaces to receive information relating to respective memory access requests from respective requestor IC devices; a plurality of memory interfaces to convey the information relating to memory access requests to respective memory IC devices; and switch circuitry coupled to the plurality of control interfaces and the plurality of memory interfaces to switchably couple any one of the control interfaces to any one of the memory interfaces.
2. The IC device of clause 1 further comprising signal conversion circuitry to convert between a first number of signals conveyed via the plurality of control interfaces and a second number of signals conveyed via the plurality of memory interfaces, the second number being larger than the first number.
3. The IC device of clause 2 wherein each signal of the first number of signals is received or transmitted at or above a first signaling rate and wherein each signal of the second number of signals is received or transmitted at or below a signaling rate that is lower than the first signaling rate.
4. The IC device of clause 3 further comprising configuration circuitry coupled to the signal conversion circuitry to control the conversion between the first number of signals and the second number of signals according to a configuration value.
5. The IC device of clause 4 wherein the configuration value indicates whether the first number of signals convey (1) control information for a data storage or retrieval operation or (2) data stored or retrieved in a data storage or retrieval operation.
6. The IC device of clause 4 wherein the control information includes address information that specifies a storage location within one of the memory IC devices at which data is to be stored or from which data is to be retrieved.
7. The IC device of clause 3 wherein the conversion circuitry includes deserializing circuitry to convert bits conveyed serially in one of the first signals to a parallel series of bits that constitute at least a portion of the second signals.
8. The IC device of clause 2 wherein error detection information is conveyed within the first number of signals.
9. The IC device of clause 2 wherein each of the control interfaces comprises circuitry to receive a respective one of the first number of signals via a high-speed signaling link.
10. The IC device of clause 2 wherein each of the memory interfaces is configured to connect to an industry-standard memory component.
11. A method of operation within an integrated circuit (IC) device, the method comprising: receiving information relating to a plurality of concurrent memory access operations via respective control interfaces; switchably coupling each of the control interfaces to a respective one of a plurality of memory interfaces in a first interconnection pattern during a first interval; outputting the information relating to the plurality of concurrent memory access operations via the plurality of memory interfaces, respectively, according to the first interconnection pattern.
12. The method of clause 11 wherein receiving information relating to a plurality of concurrent memory access operations via respective control interfaces comprises receiving a plurality of address values that indicate respective memory locations to be accessed in the plurality of concurrent memory access operations.
13. The method of clause 11 wherein receiving information relating to a plurality of concurrent memory access operations via respective control interfaces comprises receiving a plurality of sets of write data to be stored within respective memory devices in the plurality of concurrent memory access operations.
14. The method of clause 11 further comprising switchably coupling each of the control interfaces to a respective one of the plurality of memory interfaces in a second interconnection pattern during a second interval, wherein each of the control interfaces is coupled to a different one of the memory interfaces during the second interval than during the first interval.
15. The method of clause 11 further comprising switchably coupling the control interfaces to the plurality of memory interfaces, respectively, in a repeating sequence of different interconnection patterns, and wherein the first interconnection pattern during the first interval comprises one interconnection pattern within the sequence of different interconnection patterns.
16. The method of clause 11 wherein outputting the information relating to the plurality of concurrent memory access operations via the plurality of memory interfaces comprises outputting the information relating to the plurality of concurrent memory access operations to respective memory IC devices coupled to the plurality of memory interfaces.
17. A system comprising: a plurality of memory IC devices; and a first buffer integrated circuit (IC) device having a plurality of memory interfaces coupled respectively to the plurality of memory IC devices, control interfaces to couple to respective requestor IC devices, and switching circuitry to couple each of the control interfaces concurrently to a respective one of the memory interfaces in accordance with a selection value.
18. The system of clause 17 comprising an input to receive, as the selection value, a repeating sequence of values that enable shared, time-multiplexed access to each of the memory IC devices by each of the requestor IC devices.
19. The system of clause 18 further comprising the requestor IC devices, and wherein circuitry to generate the selection value is included within at least one of the plurality of requestor IC devices.
20. The system of clause 17 further comprising a second buffer IC device having memory interfaces coupled respectively to the plurality of memory IC devices, control interfaces to couple respectively to the requestor IC devices, and switching circuitry to couple each of the control interfaces concurrently to a respective one of the memory interfaces in accordance with the selection value, and wherein the plurality of requestor IC devices are coupled respectively to the control interfaces of the second buffer IC and the plurality of memory IC devices are coupled respectively to the memory interfaces of the second buffer IC.
21. The system of clause 20 wherein each of the memory IC devices comprises a first data interface coupled to a respective one of the memory interfaces of the first buffer IC device, and a second data interface coupled to a respective one of the memory interfaces of the second buffer IC device.
22. The system of clause 20 further comprising the requestor IC devices, and wherein each of the requestor IC devices comprises a first output node coupled to a respective one of the control interfaces of the first buffer IC device; and a second output node coupled to a respective one of the control interfaces of the second buffer IC device.
23. The system of clause 20 further comprising the requestor IC devices, and wherein each of the requestor IC devices comprises a first data input/output (I/O) node coupled to a respective one of the control interfaces of the first buffer IC device to convey read data retrieved from, or write data to be stored within, one of the memory IC devices, and wherein each of the requestor IC devices further comprises a control node coupled to a respective one of the control interfaces of the second buffer IC device to convey an address value that specifies a storage location within the one of the memory IC devices from which the read data is to be retrieved or at which the write data is to be stored.
24. The system of clause 23 wherein the first buffer IC device and second IC device are interchangeable and each include configuration circuitry to control operation as an address buffer according to a configuration setting.
25. The system of clause 23 wherein the first buffer IC device and second IC device are interchangeable and each include configuration circuitry to control operation as a data buffer according to a configuration setting.
26. The system of clause 17 wherein the plurality of memory IC devices and the first buffer IC device are included within at least one of a system-in-package, package-on-package or package-in-package.
27. An integrated circuit (IC) device comprising: means for receiving information relating to respective memory access requests from respective requestor IC devices; means for conveying the information relating to memory access requests to respective memory IC devices; and means for enabling any one of the control interfaces to be switchably and exclusively coupled to any one of the memory interfaces concurrently with switched and exclusive coupling of the other control interfaces to the other memory interfaces.
28. Computer-readable media having information embodied therein that includes a description of an integrated circuit (IC) device, the information including descriptions of: a plurality of control interfaces to receive information relating to respective memory access requests from respective requestor IC devices; a plurality of memory interfaces to convey the information relating to memory access requests to respective memory IC devices; and switch circuitry coupled to the plurality of control interfaces and the plurality of memory interfaces to enable any one of the control interfaces to be switchably and exclusively coupled to any one of the memory interfaces concurrently with switched and exclusive coupling of the other control interfaces to the other memory interfaces.
29. A method of operation within an integrated-circuit (IC) memory device, the method comprising: during a first interval, concurrently accessing a first storage location within a first memory array via a first external signaling interface and a second storage location within a second memory array via a second external signaling interface; and during a second interval, concurrently accessing a third storage location within the first memory array via the second external signaling interface and a fourth storage location within the second memory array via the first external signaling interface.
30. The method of clause 29 wherein concurrently accessing a first storage location within a first memory array and a second storage location within a second memory array comprises concurrently transferring data between the first memory array and the first external signaling interface and between the second memory array and the second external signaling interface.
31. The method of clause 30 wherein transferring data between the first memory array and the first external signaling interface comprises transferring data between the first memory array and the first external signaling interface via a multiplexer circuit.
32. The method of clause 31 further comprising switching connections within the multiplexer circuit after the first interval to couple the first memory array to the second external signaling path, and wherein accessing the third storage location within the first memory array via the second external interface comprises transferring data between the first memory array and the second external signaling interface via the multiplexer circuit.
33. The method of clause 32 wherein switching connections within the multiplexer circuit comprises transitioning a select signal from a first state to a second state after the first interval, the select signal being supplied to the multiplexer circuit to control switched connections between inputs and outputs of the multiplexer circuit.
34. The method of clause 29 further comprising receiving first and second memory access commands via first and second memory control interfaces, respectively, the first memory access command including an address value that indicates the first storage location and the second memory access command including an address value that indicates the second storage location.
35. The method of clause 34 wherein at least one of the first and second memory access commands is a memory read command.
36. The method of clause 34 wherein at least one of the first and second memory access commands is a memory write command and wherein concurrently accessing a first storage location within a first memory array and a second storage location within a second memory array comprises receiving write data and write mask signals via at least one of the first and second external signaling interfaces.
37. The method of clause 34 further comprising receiving third and fourth memory access commands via the first and second memory control interfaces, respectively, the third memory access command including an address value that indicates the third storage location and the fourth memory access command including an address value that indicates the fourth storage location.
38. An integrated-circuit (IC) memory device comprising: first and second external signaling interfaces; first and second memory arrays; and a multiplexer coupled to the first and second external signaling interfaces and to the first and second memory arrays, the multiplexer enabling concurrent access to the first and second memory arrays via the first and second external signaling interfaces, respectively, during a first interval, and enabling concurrent access to the first and second memory arrays via the second and first external signaling interfaces, respectively, during a second time interval.
39. The memory device of clause 38 wherein the first memory array comprises a plurality of storage banks, each of the storage banks including a plurality of rows of storage cells.
40. The memory device of clause 39 wherein the plurality of rows of storage cells comprise dynamic random access memory (DRAM) cells.
41. The memory device of clause 38 wherein the first external signaling interface comprises output circuitry to output data onto an external signaling path, the output circuitry being switchably coupled, via the multiplexer, to the first memory array during the first time interval, and switchably coupled, via the multiplexer, to the second memory array during the second time interval.
42. The memory device of clause 38 wherein the first external signaling interface further comprises receive circuitry to receive data via the external signaling path, the receive circuitry being switchably coupled, via the multiplexer, to the first memory during the first time interval, and switchably coupled, via the multiplexer, to the second memory array during the second time interval.
43. The memory device of clause 38 wherein the multiplexer comprises an input to receive a select signal, the select signal being in a first state during the first interval to switchably connect, via the multiplexer, the first memory array to the first external signaling interface and the second memory array to the second external signaling interface, and the select signal being in a second state during the second interval to switchably connect, via the multiplexer, the first memory array to the second external signaling interface and the second memory array to the first external signaling interface.
44. The memory device of clause 43 further comprising an external signaling input to receive the select signal.
45. The memory device of clause 38 wherein the first memory array comprises a single access port and wherein the second memory array comprises a single access port.
46. The memory device of clause 38 further comprising first and second memory control interfaces to receive first and second memory access commands, the first memory access command including an address value that indicates a first storage location to be accessed within the first memory array via the first external signaling interface, and the second memory access command including an address value that indicates a second storage location to be accessed within the second memory array via the first external signaling interface.
47. An integrated-circuit (IC) memory device comprising: first and second memory arrays; first and second external signaling interfaces; means for concurrently accessing, during a first interval, a first storage location within a first memory array via a first external signaling interface and a second storage location within a second memory array via a second external signaling interface; and means for concurrently accessing, during a second interval, a third storage location within the first memory array via the second external signaling interface and a fourth storage location within the second memory array via the first external signaling interface.
48. Computer-readable media having information embodied therein that includes a description of an integrated-circuit (IC) memory device, the information including descriptions of: first and second external signaling interfaces; first and second memory arrays; and a multiplexer coupled to the first and second external signaling interfaces and to the first and second memory arrays, the multiplexer enabling concurrent access to the first and second memory arrays via the first and second external signaling interfaces, respectively, during a first interval, and enabling concurrent access to the first and second memory arrays via the second and first external signaling interfaces, respectively, during a second time interval.
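As an informal illustration of the buffer IC behavior recited in clauses 7 and 11 through 15, the following Python sketch models switch circuitry that steps through a repeating sequence of interconnection patterns, together with a simple deserializer. The rotation rule (control interface i coupled to memory interface (i + step) mod n) is an assumption chosen for simplicity; the clauses do not prescribe any particular pattern, and the function names are hypothetical.

```python
def interconnection_pattern(step, n):
    """Assumed pattern for interval `step`: control interface i is coupled
    to memory interface (i + step) % n, so each interval is a one-to-one
    coupling and the sequence repeats every n intervals."""
    return [(i + step) % n for i in range(n)]


def forward(requests, step):
    """Route one request per control interface to the memory interfaces
    according to the pattern in effect during interval `step`.

    requests[i] is the information received on control interface i; the
    returned list is indexed by memory interface.
    """
    n = len(requests)
    outputs = [None] * n
    for ctrl, mem in enumerate(interconnection_pattern(step, n)):
        outputs[mem] = requests[ctrl]
    return outputs


def deserialize(bits, width):
    """Convert a serial bit sequence into parallel words of `width` bits,
    in the spirit of the deserializing circuitry of clause 7."""
    if len(bits) % width != 0:
        raise ValueError("bit count must be a multiple of the word width")
    return [bits[i:i + width] for i in range(0, len(bits), width)]


if __name__ == "__main__":
    reqs = ["req-A", "req-B", "req-C", "req-D"]
    for step in range(4):
        print(step, forward(reqs, step))
    print(deserialize([1, 0, 1, 1, 0, 0, 1, 0], 4))
```

With four control interfaces, each requestor is coupled to a different memory interface in every interval and to all four over one full sequence, which corresponds to the shared, time-multiplexed access described in clause 18.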
[0064] It should be noted that the various circuits disclosed herein may be described using computer aided design tools and expressed (or represented), as data and/or instructions embodied in various computer-readable media, in terms of their behavioral, register transfer, logic component, transistor, layout geometries, and/or other characteristics. Formats of files and other objects in which such circuit expressions may be implemented include, but are not limited to, formats supporting behavioral languages such as C, Verilog, and VHDL, formats supporting register level description languages like RTL, and formats supporting geometry description languages such as GDSII, GDSIII, GDSIV, CIF, MEBES and any other suitable formats and languages. Computer-readable media in which such formatted data and/or instructions may be embodied include, but are not limited to, non-volatile storage media in various forms (e.g., optical, magnetic or semiconductor storage media) and carrier waves that may be used to transfer such formatted data and/or instructions through wireless, optical, or wired signaling media or any combination thereof. Examples of transfers of such formatted data and/or instructions by carrier waves include, but are not limited to, transfers (uploads, downloads, e-mail, etc.) over the Internet and/or other computer networks via one or more data transfer protocols (e.g., HTTP, FTP, SMTP, etc.).
[0065] When received within a computer system via one or more computer-readable media, such data and/or instruction-based expressions of the above described circuits may be processed by a processing entity (e.g., one or more processors) within the computer system in conjunction with execution of one or more other computer programs including, without limitation, net-list generation programs, place and route programs and the like, to generate a representation or image of a physical manifestation of such circuits. Such representation or image may thereafter be used in device fabrication, for example, by enabling generation of one or more masks that are used to form various components of the circuits in a device fabrication process.
[0066] In the foregoing description and in the accompanying drawings, specific terminology and drawing symbols have been set forth to provide a thorough understanding of the present invention. In some instances, the terminology and symbols may imply specific details that are not required to practice the invention. For example, any of the specific numbers of bits, signal path widths, signaling or operating frequencies, component circuits or devices and the like may be different from those described above in alternative embodiments. Also, the interconnection between circuit elements or circuit blocks shown or described as multi-conductor signal links may alternatively be single-conductor signal links, and single conductor signal links may alternatively be multi-conductor signal links. Signals and signaling paths shown or described as being single-ended may also be differential, and vice-versa. Similarly, signals described or depicted as having active-high or active-low logic levels may have opposite logic levels in alternative embodiments.
Component circuitry within integrated circuit devices may be implemented using metal oxide semiconductor (MOS) technology, bipolar technology or any other technology in which logical and analog circuits may be implemented. With respect to terminology, a signal is said to be "asserted" when the signal is driven to a low or high logic state (or charged to a high logic state or discharged to a low logic state) to indicate a particular condition. Conversely, a signal is said to be "deasserted" to indicate that the signal is driven (or charged or discharged) to a state other than the asserted state (including a high or low logic state, or the floating state that may occur when the signal driving circuit is transitioned to a high impedance condition, such as an open drain or open collector condition). A signal driving circuit is said to "output" a signal to a signal receiving circuit when the signal driving circuit asserts (or deasserts, if explicitly stated or indicated by context) the signal on a signal line coupled between the signal driving and signal receiving circuits. A signal line is said to be "activated" when a signal is asserted on the signal line, and "deactivated" when the signal is deasserted. Additionally, the prefix symbol "/" attached to signal names indicates that the signal is an active low signal (i.e., the asserted state is a logic low state). A line over a signal name (e.g., '<signal name>' with an overline) is also used to indicate an active low signal. The term "coupled" is used herein to express a direct connection as well as a connection through one or more intervening circuits or structures. Integrated circuit device "programming" may include, for example and without limitation, loading a control value into a register or other storage circuit within the device in response to a host instruction and thus controlling an operational aspect of the device, establishing a device configuration or controlling an operational aspect of the device through a one-time programming operation (e.g., blowing fuses within a configuration circuit during device production), and/or connecting one or more selected pins or other contact structures of the device to reference voltage lines (also referred to as strapping) to establish a particular device configuration or operational aspect of the device. The term "exemplary" is used to express an example, not a preference or requirement.
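The asserted/deasserted terminology above can be made concrete with a minimal Python sketch. It assumes, for illustration only, that a signal name without the "/" prefix is active high, and it represents the floating (high-impedance) state as None; the Signal class and the example signal names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Signal:
    """A named signal with a logic level of 0, 1, or None (floating)."""
    name: str
    level: Optional[int]

    @property
    def active_low(self) -> bool:
        # The "/" prefix marks an active low signal.
        return self.name.startswith("/")

    @property
    def asserted(self) -> bool:
        # Asserted means driven to the state that indicates the condition:
        # logic low for active low signals, logic high otherwise (assumed).
        return self.level == (0 if self.active_low else 1)

    @property
    def deasserted(self) -> bool:
        # Any other state, including the floating state, is deasserted.
        return not self.asserted


if __name__ == "__main__":
    for sig in (Signal("/CS", 0), Signal("/CS", 1),
                Signal("/CS", None), Signal("EN", 1)):
        print(sig.name, sig.level, "asserted" if sig.asserted else "deasserted")
```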
[0067] While the invention has been described with reference to specific embodiments thereof, it will be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention. For example, features or aspects of any of the embodiments may be applied, at least where practicable, in combination with any other of the embodiments or in place of counterpart features or aspects thereof. Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense.

Claims

CLAIMS
What is claimed is:
1. An integrated circuit (IC) device comprising: a plurality of control interfaces to receive information relating to respective memory access requests from respective requestor IC devices; a plurality of memory interfaces to convey the information relating to memory access requests to respective memory IC devices; and switch circuitry coupled to the plurality of control interfaces and the plurality of memory interfaces to switchably couple any one of the control interfaces to any one of the memory interfaces.
2. The IC device of claim 1 further comprising signal conversion circuitry to convert between a first number of signals conveyed via the plurality of control interfaces and a second number of signals conveyed via the plurality of memory interfaces, the second number being larger than the first number.
3. The IC device of claim 2 wherein each signal of the first number of signals is received or transmitted at or above a first signaling rate and wherein each signal of the second number of signals is received or transmitted at or below a signaling rate that is lower than the first signaling rate.
4. The IC device of claim 3 further comprising configuration circuitry coupled to the signal conversion circuitry to control the conversion between the first number of signals and the second number of signals according to a configuration value.
5. The IC device of claim 4 wherein the configuration value indicates whether the first number of signals convey (1) control information for a data storage or retrieval operation or (2) data stored or retrieved in a data storage or retrieval operation.
6. The IC device of claim 4 wherein the control information includes address information that specifies a storage location within one of the memory IC devices at which data is to be stored or from which data is to be retrieved.
7. The IC device of claim 3 wherein the conversion circuitry includes deserializing circuitry to convert bits conveyed serially in one of the first signals to a parallel series of bits that constitute at least a portion of the second signals.
8. The IC device of claim 2 wherein error detection information is conveyed within the first number of signals.
9. The IC device of claim 2 wherein each of the control interfaces comprises circuitry to receive a respective one of the first number of signals via a high-speed signaling link.
10. The IC device of claim 2 wherein each of the memory interfaces is configured to connect to an industry-standard memory component.
11. A method of operation within an integrated circuit (IC) device, the method comprising: receiving information relating to a plurality of concurrent memory access operations via respective control interfaces; switchably coupling each of the control interfaces to a respective one of a plurality of memory interfaces in a first interconnection pattern during a first interval; outputting the information relating to the plurality of concurrent memory access operations via the plurality of memory interfaces, respectively, according to the first interconnection pattern.
12. The method of claim 11 wherein receiving information relating to a plurality of concurrent memory access operations via respective control interfaces comprises receiving a plurality of address values that indicate respective memory locations to be accessed in the plurality of concurrent memory access operations.
13. The method of claim 11 wherein receiving information relating to a plurality of concurrent memory access operations via respective control interfaces comprises receiving a plurality of sets of write data to be stored within respective memory devices in the plurality of concurrent memory access operations.
14. The method of claim 11 further comprising switchably coupling each of the control interfaces to a respective one of the plurality of memory interfaces in a second interconnection pattern during a second interval, wherein each of the control interfaces is coupled to a different one of the memory interfaces during the second interval than during the first interval.
15. The method of claim 11 further comprising switchably coupling the control interfaces to the plurality of memory interfaces, respectively, in a repeating sequence of different interconnection patterns, and wherein the first interconnection pattern during the first interval comprises one interconnection pattern within the sequence of different interconnection patterns.
16. The method of claim 11 wherein outputting the information relating to the plurality of concurrent memory access operations via the plurality of memory interfaces comprises outputting the information relating to the plurality of concurrent memory access operations to respective memory IC devices coupled to the plurality of memory interfaces.
17. A system comprising: a plurality of memory IC devices; and a first buffer integrated circuit (IC) device having a plurality of memory interfaces coupled respectively to the plurality of memory IC devices, control interfaces to couple to respective requestor IC devices, and switching circuitry to couple each of the control interfaces concurrently to a respective one of the memory interfaces in accordance with a selection value.
18. The system of claim 17 comprising an input to receive, as the selection value, a repeating sequence of values that enable shared, time-multiplexed access to each of the memory IC devices by each of the requestor IC devices.
19. The system of claim 18 further comprising the requestor IC devices, and wherein circuitry to generate the selection value is included within at least one of the plurality of requestor IC devices.
20. The system of claim 17 further comprising a second buffer IC device having memory interfaces coupled respectively to the plurality of memory IC devices, control interfaces to couple respectively to the requestor IC devices, and switching circuitry to couple each of the control interfaces concurrently to a respective one of the memory interfaces in accordance with the selection value, and wherein the plurality of requestor IC devices are coupled respectively to the control interfaces of the second buffer IC and the plurality of memory IC devices are coupled respectively to the memory interfaces of the second buffer IC.
21. The system of claim 20 wherein each of the memory IC devices comprises a first data interface coupled to a respective one of the memory interfaces of the first buffer IC device, and a second data interface coupled to a respective one of the memory interfaces of the second buffer IC device.
22. The system of claim 20 further comprising the requestor IC devices, and wherein each of the requestor IC devices comprises a first output node coupled to a respective one of the control interfaces of the first buffer IC device; and a second output node coupled to a respective one of the control interfaces of the second buffer IC device.
23. The system of claim 20 further comprising the requestor IC devices, and wherein each of the requestor IC devices comprises a first data input/output (I/O) node coupled to a respective one of the control interfaces of the first buffer IC device to convey read data retrieved from, or write data to be stored within, one of the memory IC devices, and wherein each of the requestor IC devices further comprises a control node coupled to a respective one of the control interfaces of the second buffer IC device to convey an address value that specifies a storage location within the one of the memory IC devices from which the read data is to be retrieved or at which the write data is to be stored.
24. The system of claim 23 wherein the first buffer IC device and second IC device are interchangeable and each include configuration circuitry to control operation as an address buffer according to a configuration setting.
25. The system of claim 23 wherein the first buffer IC device and second IC device are interchangeable and each include configuration circuitry to control operation as a data buffer according to a configuration setting.
26. The system of claim 17 wherein the plurality of memory IC devices and the first buffer IC device are included within at least one of a system-in-package, package-on-package or package-in-package.
27. An integrated circuit (IC) device comprising: means for receiving information relating to respective memory access requests from respective requestor IC devices; means for conveying the information relating to memory access requests to respective memory IC devices; and means for enabling any one of the control interfaces to be switchably and exclusively coupled to any one of the memory interfaces concurrently with switched and exclusive coupling of the other control interfaces to the other memory interfaces.
28. Computer-readable media having information embodied therein that includes a description of an integrated circuit (IC) device, the information including descriptions of: a plurality of control interfaces to receive information relating to respective memory access requests from respective requestor IC devices; a plurality of memory interfaces to convey the information relating to memory access requests to respective memory IC devices; and switch circuitry coupled to the plurality of control interfaces and the plurality of memory interfaces to enable any one of the control interfaces to be switchably and exclusively coupled to any one of the memory interfaces concurrently with switched and exclusive coupling of the other control interfaces to the other memory interfaces.
29. A method of operation within an integrated-circuit (IC) memory device, the method comprising: during a first interval, concurrently accessing a first storage location within a first memory array via a first external signaling interface and a second storage location within a second memory array via a second external signaling interface; and during a second interval, concurrently accessing a third storage location within the first memory array via the second external signaling interface and a fourth storage location within the second memory array via the first external signaling interface.
30. The method of claim 29 wherein concurrently accessing a first storage location within a first memory array and a second storage location within a second memory array comprises concurrently transferring data between the first memory array and the first external signaling interface and between the second memory array and the second external signaling interface.
31. The method of claim 30 wherein transferring data between the first memory array and the first external signaling interface comprises transferring data between the first memory array and the first external signaling interface via a multiplexer circuit.
32. The method of claim 31 further comprising switching connections within the multiplexer circuit after the first interval to couple the first memory array to the second external signaling path, and wherein accessing the third storage location within the first memory array via the second external interface comprises transferring data between the first memory array and the second external signaling interface via the multiplexer circuit.
33. The method of claim 32 wherein switching connections within the multiplexer circuit comprises transitioning a select signal from a first state to a second state after the first interval, the select signal being supplied to the multiplexer circuit to control switched connections between inputs and outputs of the multiplexer circuit.
34. The method of claim 29 further comprising receiving first and second memory access commands via first and second memory control interfaces, respectively, the first memory access command including an address value that indicates the first storage location and the second memory access command including an address value that indicates the second storage location.
35. The method of claim 34 wherein at least one of the first and second memory access commands is a memory read command.
36. The method of claim 34 wherein at least one of the first and second memory access commands is a memory write command and wherein concurrently accessing a first storage location within a first memory array and a second storage location within a second memory array comprises receiving write data and write mask signals via at least one of the first and second external signaling interfaces.
37. The method of claim 34 further comprising receiving third and fourth memory access commands via the first and second memory control interfaces, respectively, the third memory access command including an address value that indicates the third storage location and the fourth memory access command including an address value that indicates the fourth storage location.
38. An integrated-circuit (IC) memory device comprising: first and second external signaling interfaces; first and second memory arrays; and a multiplexer coupled to the first and second external signaling interfaces and to the first and second memory arrays, the multiplexer enabling concurrent access to the first and second memory arrays via the first and second external signaling interfaces, respectively, during a first interval, and enabling concurrent access to the first and second memory arrays via the second and first external signaling interfaces, respectively, during a second time interval.
39. The memory device of claim 38 wherein the first memory array comprises a plurality of storage banks, each of the storage banks including a plurality of rows of storage cells.
40. The memory device of claim 39 wherein the plurality of rows of storage cells comprise dynamic random access memory (DRAM) cells.
41. The memory device of claim 38 wherein the first external signaling interface comprises output circuitry to output data onto an external signaling path, the output circuitry being switchably coupled, via the multiplexer, to the first memory array during the first time interval, and switchably coupled, via the multiplexer, to the second memory array during the second time interval.
42. The memory device of claim 38 wherein the first external signaling interface further comprises receive circuitry to receive data via the external signaling path, the receive circuitry being switchably coupled, via the multiplexer, to the first memory during the first time interval, and switchably coupled, via the multiplexer, to the second memory array during the second time interval.
43. The memory device of claim 38 wherein the multiplexer comprises an input to receive a select signal, the select signal being in a first state during the first interval to switchably connect, via the multiplexer, the first memory array to the first external signaling interface and the second memory array to the second external signaling interface, and the select signal being in a second state during the second interval to switchably connect, via the multiplexer, the first memory array to the second external signaling interface and the second memory array to the first external signaling interface.
44. The memory device of claim 43 further comprising an external signaling input to receive the select signal.
45. The memory device of claim 38 wherein the first memory array comprises a single access port and wherein the second memory array comprises a single access port.
46. The memory device of claim 38 further comprising first and second memory control interfaces to receive first and second memory access commands, the first memory access command including an address value that indicates a first storage location to be accessed within the first memory array via the first external signaling interface, and the second memory access command including an address value that indicates a second storage location to be accessed within the second memory array via the first external signaling interface.
47. An integrated-circuit (IC) memory device comprising: first and second memory arrays; first and second external signaling interfaces; means for concurrently accessing, during a first interval, a first storage location within a first memory array via a first external signaling interface and a second storage location within a second memory array via a second external signaling interface; and means for concurrently accessing, during a second interval, a third storage location within the first memory array via the second external signaling interface and a fourth storage location within the second memory array via the first external signaling interface.
48. Computer-readable media having information embodied therein that includes a description of an integrated-circuit (IC) memory device, the information including descriptions of: first and second external signaling interfaces; first and second memory arrays; and a multiplexer coupled to the first and second external signaling interfaces and to the first and second memory arrays, the multiplexer enabling concurrent access to the first and second memory arrays via the first and second external signaling interfaces, respectively, during a first interval, and enabling concurrent access to the first and second memory arrays via the second and first external signaling interfaces, respectively, during a second time interval.
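The cross-threaded access recited in claims 29, 38 and 43 can be illustrated with a small behavioral model. The following is a minimal sketch only, not the claimed circuit: the class name `CrossThreadedMemory`, its methods, and the use of dictionaries as memory arrays are all hypothetical, and the select signal is assumed to be a single bit where state 0 connects interface i to array i and state 1 connects interface i to the other array.

```python
from dataclasses import dataclass, field


@dataclass
class CrossThreadedMemory:
    """Toy model (hypothetical): two single-port arrays, two external
    signaling interfaces, and a 2x2 multiplexer driven by a select signal."""

    # Each memory array is modeled as an address -> data dictionary.
    arrays: list = field(default_factory=lambda: [{}, {}])
    # Assumed encoding: 0 = straight-through, 1 = crossed connections.
    select: int = 0

    def array_for(self, interface: int) -> int:
        # With select == 0, interface i reaches array i.
        # With select == 1, interface i reaches array 1 - i.
        return interface ^ self.select

    def write(self, interface: int, address: int, data: str) -> None:
        self.arrays[self.array_for(interface)][address] = data

    def read(self, interface: int, address: int):
        return self.arrays[self.array_for(interface)].get(address)


mem = CrossThreadedMemory()

# First interval: interface 0 accesses array 0, interface 1 accesses array 1.
mem.write(0, 0x10, "a")
mem.write(1, 0x20, "b")

# Transition the select signal from its first state to its second state.
mem.select = 1

# Second interval: interface 0 now reaches array 1, interface 1 reaches array 0.
r0 = mem.read(0, 0x20)
r1 = mem.read(1, 0x10)
```

In this sketch each interface always maps to exactly one array for a given select state, so the two interfaces never contend for the same single-port array within an interval.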
PCT/US2007/074513 2006-07-27 2007-07-26 Cross-threaded memory device and system WO2008014413A2 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US11/460,582 2006-07-27
US11/460,582 US7769942B2 (en) 2006-07-27 2006-07-27 Cross-threaded memory system
US87082406P 2006-12-19 2006-12-19
US60/870,824 2006-12-19

Publications (2)

Publication Number Publication Date
WO2008014413A2 WO2008014413A2 (en) 2008-01-31
WO2008014413A3 WO2008014413A3 (en) 2008-05-29

Family

ID=38982356

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2007/074513 WO2008014413A2 (en) 2006-07-27 2007-07-26 Cross-threaded memory device and system

Country Status (1)

Country Link
WO (1) WO2008014413A2 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1995005635A1 (en) * 1993-08-19 1995-02-23 Mmc Networks, Inc. Multiple-port shared memory interface and associated method
US20050021884A1 (en) * 2003-07-22 2005-01-27 Jeddeloh Joseph M. Apparatus and method for direct memory access in a hub-based memory system

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011163229A1 (en) * 2010-06-25 2011-12-29 Qualcomm Incorporated Multi-channel multi-port memory
US8380940B2 (en) 2010-06-25 2013-02-19 Qualcomm Incorporated Multi-channel multi-port memory
CN102959530A (en) * 2010-06-25 2013-03-06 高通股份有限公司 Multi-channel multi-port memory
JP2013534010A (en) * 2010-06-25 2013-08-29 クアルコム,インコーポレイテッド Multi-channel multi-port memory
CN110299157A (en) * 2013-11-11 2019-10-01 拉姆伯斯公司 Use the mass-storage system of standard controller component

Also Published As

Publication number Publication date
WO2008014413A3 (en) 2008-05-29

Similar Documents

Publication Publication Date Title
US11194749B2 (en) Cross-threaded memory system
US20200194052A1 (en) Memory System Topologies Including A Buffer Device And An Integrated Circuit Memory Device
US7818497B2 (en) Buffered memory module supporting two independent memory channels
US7840748B2 (en) Buffered memory module with multiple memory device data interface ports supporting double the memory capacity
US7865674B2 (en) System for enhancing the memory bandwidth available through a memory module
US7899983B2 (en) Buffered memory module supporting double the memory device data width in the same physical space as a conventional memory module
US7640386B2 (en) Systems and methods for providing memory modules with multiple hub devices
KR100201057B1 (en) Integrated circuit i/o using a high performance bus interface
US7171534B2 (en) System and method for multi-modal memory controller system operation
KR101364348B1 (en) Memory system and method using stacked memory device dice, and system using the memory system
US8194085B2 (en) Apparatus, system, and method for graphics memory hub
US20100005218A1 (en) Enhanced cascade interconnected memory system
US8019919B2 (en) Method for enhancing the memory bandwidth available through a memory module
US20140068169A1 (en) Independent Threading Of Memory Devices Disposed On Memory Modules
US7965530B2 (en) Memory modules and memory systems having the same
WO2017172287A2 (en) Read delivery for memory subsystem with narrow bandwidth repeater channel
WO2017172286A1 (en) Write delivery for memory subsystem with narrow bandwidth repeater channel
US20230410890A1 (en) Memory System Topologies Including A Memory Die Stack
CN110633230A (en) High bandwidth DIMM
US20220413768A1 (en) Memory module with double data rate command and data interfaces supporting two-channel and four-channel modes
WO2008014413A2 (en) Cross-threaded memory device and system
US20090319748A1 (en) Memory system and memory device
US20220358072A1 (en) Memory module adapter card with multiplexer circuitry
US20230044892A1 (en) Multi-channel memory module
US20230393740A1 (en) Four way pseudo split die dynamic random access memory (dram) architecture

Legal Events

Date Code Title Description
NENP Non-entry into the national phase in: Ref country code: DE
NENP Non-entry into the national phase in: Ref country code: RU
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 07840537; Country of ref document: EP; Kind code of ref document: A2
122 EP: PCT application non-entry in European phase. Ref document number: 07840537; Country of ref document: EP; Kind code of ref document: A2