WO2017019054A1 - Systems and methods facilitating multi-word atomic operation support for system on chip environments - Google Patents

Publication number
WO2017019054A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
instruction
memory
width
operand
Prior art date
Application number
PCT/US2015/042595
Other languages
French (fr)
Inventor
Millind Mittal
Original Assignee
Applied Micro Circuits Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Applied Micro Circuits Corporation
Priority to PCT/US2015/042595
Publication of WO2017019054A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30: Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38: Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3824: Operand accessing
    • G06F9/3834: Maintaining memory consistency

Definitions

  • This disclosure relates to systems-on-chips (SoCs), systems, and methods facilitating multi-word atomic operation support for SoCs.
  • SoCs typically integrate several components of a computer on a single chip substrate. Specifically, SoCs integrate analog, mixed-signal, digital and/or radio frequency circuitry on a single chip substrate, and can increase processing power by using multiple processors and an on-chip interconnection.
  • CPU: central processing unit
  • Atomicity in execution of operations is desirable as consumers of data may read an intermediate, erroneous value of a non-atomic operation if reading is performed during execution.
  • a method involves receiving, at a processor, an instruction associated with a calling process; and determining a first memory width associated with an operator of the instruction and a width of at least one operand of the instruction.
  • a computer method implemented in an SoC involves receiving information indicative of an instruction associated with a calling process; and determining a first memory width associated with execution of the instruction.
  • a system on chip comprises a central processing unit configured to execute an instruction associated with a calling process; and an atomic engine component.
  • the atomic engine component is coupled to the central processing unit and configured to: receive the instruction; and determine a first memory width associated with execution of the instruction, based on an operator of the instruction and a width of at least one operand of the instruction.
  • One or more embodiments can advantageously provide multi-word atomic operation support for system memory and/or for SoC memory. For example, multi-word atomic operation can be facilitated for tables in SoC memory.
  • an "atomic" operation is a CPU instruction that executes in a single CPU cycle and/or a CPU instruction for which an operation will complete execution without being interrupted by the actions of another thread.
  • One or more of the embodiments described herein can be employed in or to provide any number of different systems including, but not limited to, data center computers, cloud computing systems, embedded communication processors, enterprise servers (e.g., multiple CPU server systems) or the like.
  • FIG. 1 is a block diagram illustrating an embodiment of an SoC for which multi-word atomic operation support can be facilitated.
  • FIG. 2 is a block diagram illustrating an embodiment of an interface system between CPUs and an atomic engine (AE) component of an SoC facilitating multi-word atomic operation support.
  • FIG. 3 is a block diagram illustrating an embodiment of a work message facilitating multi-word atomic operation support in an SoC.
  • FIG. 4 is a block diagram illustrating an embodiment of a completion message facilitating multi-word atomic operations support in an SoC.
  • FIG. 5 illustrates a flow diagram of an embodiment of a method facilitating multi-word atomic operation support in an SoC.
  • FIG. 6 illustrates a block diagram of an example electronic computing environment that can be implemented to facilitate multi-word atomic operation support in an SoC.
  • the term "multi-word" atomic operation support means atomic operation support employing a memory location that has a width greater than one word (i.e., 16 bits or 2 bytes). Some embodiments described herein can provide atomic operation support employing a memory location that has a width less than or equal to one word.
  • FIG. 1 is a simplified subset of select components of an SoC, shown merely for providing context for the embodiments described herein. In various embodiments, alternative or additional components can be included in SoC 100.
  • an SoC can be or include server and/or general processor functionality.
  • SoC 100 includes one or more CPUs 110, 112, 114, SoC memory 116, graphics processing unit (GPU) 118, radio component 120, caches 122, 124, 126, memory controller 128 and/or input/output (I/O) bridge 102.
  • I/O: input/output
  • one or more of CPUs 110, 112, 114, SoC memory 116, GPU 118, radio component 120, caches 122, 124, 126, memory controller 128 and/or I/O bridge 102 can be electrically and/or communicatively coupled to one another to facilitate multi-word atomic operation on SoC 100.
  • CPUs 110, 112, 114 can be communicatively coupled to respective caches 122, 124, 126 and/or SoC memory 116.
  • Caches 122, 124, 126 can store data duplicating one or more values of data stored in SoC memory 116 in various embodiments.
  • SoC memory 116 can be any number of different types of memory including, but not limited to, read only memory (ROM), random access memory (RAM), flash memory and/or electrically erasable programmable read only memory (EEPROM).
  • SoC memory 116 can be a computer-readable storage medium storing instructions, computer code and/or functions executable by CPUs 110, 112, 114.
  • SoC memory 116 can store instructions, computer code and/or functions executable by CPUs 110, 112, 114 described herein to facilitate multi-word atomic operation support.
  • Memory controller 128 includes circuitry that manages and/or controls the flow of data to and/or from SoC memory 116.
  • memory controller 128 includes logic for reading from and/or writing to SoC memory 116.
  • CPUs 110, 112, 114 can include circuitry configured to fetch data from respective caches 122, 124, 126 and/or SoC memory 116, and perform one or more arithmetic or logical operations on the fetched data.
  • each CPU has a corresponding cache, while in other embodiments, a subset of CPUs have a corresponding cache.
  • CPUs 110, 112, 114 can be a processor designed by ARM Holdings or a processor having x86 architecture.
  • one or more of CPUs 110, 112, 114 can be 64-bit server on chip processors designed by ARM Holdings configured to provide server functionality via SoC 100.
  • SoC 100 can serve data to one or more clients.
  • SoC 100 can be or be included in data center computers, cloud computing systems, embedded communication processors, enterprise servers (e.g., multiple CPU server systems) or the like.
  • Radio component 120 can include circuitry configured to transmit and/or receive radio frequency (RF) signals to and/or from SoC 100.
  • radio component 120 can operate according to any number of different telecommunication protocols for communication of voice, video and/or data traffic.
  • radio component 120 can operate according to Wireless Fidelity (Wi-Fi), 4G Long-Term Evolution (LTE) and/or BLUETOOTH® protocols.
  • GPU 118 can include circuitry to process graphics information and/or create visual images for output to a display component of a device associated with SoC 100.
  • I/O Bridge 102 can include circuitry facilitating communication between the CPU and/or one or more components on SoC 100. In some embodiments, I/O Bridge 102 can also include circuitry facilitating communication between SoC 100 and one or more peripheral components communicatively coupled to SoC 100.
  • I/O Bridge 102 includes a Northbridge component (not shown) that facilitates communication between CPUs 110, 112, 114 and one or more other components of SoC 100.
  • I/O Bridge 102 can also include circuitry providing a Southbridge component (not shown) that facilitates I/O functionality between SoC 100 and one or more peripheral components that can be communicatively coupled to SoC 100.
  • I/O Bridge 102 includes AE component 104.
  • AE component 104 includes circuitry that can perform one or more operations to facilitate multi-word atomic support for SoC 100.
  • Input queue 106 and output queue 108 can be communicatively coupled to AE component 104 and/or any number of components of SoC 100 to facilitate processing by AE component 104 for provisioning of the multi-word atomic operation support on SoC 100.
  • AE component 104 will be described in greater detail with reference to FIGs. 1, 2, 3, 4 and 5.
  • AE component 104 can include circuitry for receiving, via interface 132, data 130 associated with a calling process. Data 130 can be output to interface 132 from one of CPUs 110, 112, 114. In some embodiments, data 130 can include information indicative of an instruction associated with a calling process.
  • FIG. 2 is a block diagram illustrating an embodiment of an interface system between CPUs and an AE component of an SoC facilitating multi-word atomic operation support. Repetitive description of like elements employed in respective embodiments of systems and/or apparatus described herein is omitted for sake of brevity.
  • An I/O agent of AE component 104 can receive data 130 for processing.
  • eight input/output queue pairs can be provided between CPUs 110, 112, 114 and AE component 104.
  • any number of input/output queue pairs can be provided between CPUs 110, 112, 114 and AE component 104.
  • more input/output queue pairs can be provided to provide atomic operation support for SoCs facilitating data-intensive functionality.
  • AE component 104 can identify one or more operators and one or more operands of the instruction. For example, AE component 104 can evaluate the information indicative of the instruction and determine the number and/or type of operands embodied in the instruction and/or the number or width of the operands embodied in the instruction.
  • the instruction can include a Fetch-and-Add operator and an operand having a value indicative of amount to be added to an existing value in a designated memory location.
  • AE component 104 can place information indicative of the instruction, information indicative of one or more operators and/or information indicative of one or more operands in input queue 106.
  • input queue 106 is a First in, First out (FIFO) queue.
  • in embodiments in which input queue 106 is a FIFO queue, instructions can be processed in the order in which they were received, with the oldest instruction received being processed before the more-recently received instructions.
  • input queue 106 can be a work queue that can receive one or more scheduled tasks/services associated with the instruction.
  • one or more portions of the instruction received by AE component 104 can include the format or content shown as work message 300.
  • the interface between CPU 110 and AE component 104 can be AtomicUpdateCmd Message based.
  • work message 300 can be 32 bytes long while, in other embodiments, work message 300 can be 64 bytes long.
  • Work message 300 can include a number of different fields, which can be in any order and/or can be adjacent or non-adjacent one another.
  • the embodiment of work message 300 shown includes message format field 302, opcode field 304, operand field 306 and message sequence number (MSN) field 308.
  • Message format field 302 can be provided in a byte of information of work message 300.
  • message format field can be the first byte of information in work message 300, and can describe the layout (e.g., number of bytes, etc.) of work message 300.
  • Opcode field 304 can include information specifying one or more operations to be performed as part of the instruction. The operations can be extracted from work message 300 by AE component 104 and identified as operations to be performed via the instruction.
  • opcode field 304 includes a Fetch-and-Add CPU instruction 310, an Update CPU instruction 312 and a Compare-and-Swap CPU instruction 314.
  • one or more of the instructions can be provided in opcode field 304 and/or one or more different instructions can be provided in opcode field 304.
  • instructions indicated in opcode field 304 can be atomic instructions.
  • Fetch-and-Add CPU instruction 310 is an atomic instruction that can increment the contents of a memory location.
  • the memory location can be specified by an address and the Fetch-and-Add CPU instruction 310 can atomically modify the information at the memory location.
  • a Fetch-and-Add CPU instruction 310 increments the value at the memory location by an amount indicated by the Fetch-and-Add operand 316 within operand field 306.
  • the Fetch-and-Add CPU instruction 310 can be 16 bytes in embodiments in which the work message format 300 is 32 bytes long.
  • the 16-byte Fetch-and-Add CPU instruction 310 can include a Fetch-and-Add 0 CPU instruction (not shown), which can be employed for an atomic read, while an Ld SIMD can be employed for a 128-bit read from a memory location.
  • Ld represents a load instruction
  • SIMD represents a single instruction multiple data operation.
  • opcode field 304 can also include an Update CPU instruction 312 and a Compare-and-Swap CPU instruction 314.
  • Update CPU instruction 312 can update information at a memory location with a new value indicated by the information stored at Update operand 318.
  • Compare-and-Swap CPU instruction 314 can be an atomic instruction that can compare the contents of a memory location to a given value and, only if the value and the contents of the memory location are the same, store a third value into the memory location.
  • Operand field 306 can include a Fetch-and-Add CPU operand 316, Update operand 318 and/or Compare-and-Swap CPU operand 320.
  • the operand can be one or more values employed in executing the corresponding CPU instruction indicated by opcode 304.
  • Fetch-and-Add operand can be 16 bytes long with a 42 bit pointer in some embodiments.
  • the Update operand 318 can be from five bytes to 48 bytes in some embodiments. In some embodiments, Update operand 318 can be larger if necessary.
  • the start pointer can be 41 bytes long in some embodiments.
  • the Compare-and-Swap operand 320 can be the value with which the value in the memory location is compared and/or the new value that is provided in the memory location if the values are the same.
  • the Compare-and-Swap operand 320 can be from five bytes to 48 bytes (or larger if necessary).
  • the start pointer can be 41 bytes long and the swap data can be up to 48 bytes long.
  • the pointer to the compare data can be 41 bytes long.
  • MSN field 308 can include information indicative of a sequence number of the message.
  • MSN field 308 can be 15 bytes long in some embodiments.
  • the information stored at MSN field 308 can be employed to determine the position of work message 300 in a set of work messages in input queue 106.
  • the I/O agent of AE component 104 can reserve (or place a hold) and/or lock a memory location.
  • the AE component 104 can determine the width of the memory location based on the width of one or more of the operands in some embodiments. For example, AE component can select a width for the memory location that is greater than or equal to the width of the operand.
  • AE component 104 can evaluate the one or more operators (e.g., addition operator associated with Fetch-and-Add CPU instruction) for the instruction and the width of one or more of the operands (e.g., Fetch-and-Add operand 316) and determine a width of a memory location for reservation by AE component 104.
  • AE component 104 can reserve and lock a memory location of one or more of cache 122, 124, 126 and/or SoC memory 116 (or one or more tables of SoC memory 116) having a width that is greater than or equal to the width of one or more of the operands.
  • AE component 104 can reserve and lock a memory location of cache 122, 124, 126 and/or SoC memory 116 having a width that is greater than or equal to the width of the result of execution of the instruction.
  • AE component 104 can determine the width of the memory location based on the width of one or more of the operands and at least one operator of the instruction. For example, AE component can select a width for the memory location that is greater than or equal to the width of the result of applying the operator to one or more operands.
  • AE component 104 can determine the width of the memory location based on the width of the result of executing the instruction. For example, AE component can select a width for the memory location that is greater than or equal to the width of the result of executing the instruction.
  • the memory location determined by AE component 104 for reservation and/or locking has a width greater than one word, equal to one word or less than one word. Accordingly, a memory location having a width of multiple word lengths can be reserved to facilitate atomic operation support on the SoC (e.g., SoC 100).
  • AE component 104 can apply the one or more operators to the one or more operands of the instruction and output the result to output queue 108.
  • AE component 104 can execute the instruction and output the result to output queue 108.
  • the result can be output for collection by the calling process associated with the instruction.
  • the result can be output in the form of (e.g., including one or more of the fields of) completion message 400.
  • FIG. 4 is a block diagram illustrating an embodiment of a completion message facilitating multi-word atomic operation support in an SoC. Repetitive description of like elements employed in respective embodiments of systems and/or apparatus described herein is omitted for sake of brevity.
  • completion message 400 can include MSN field 402, status field 404 and/or return value for Fetch-and-Add CPU instruction 406.
  • the MSN field 402 includes information indicative of the message number.
  • Completion message format 400 can be 64 bytes long in some embodiments to provide efficiency for the memory interface.
  • One or more of the embodiments described herein can extend the typical architecture, which employs eight-byte counters, to counters of 16 bytes up to 64 bytes in various embodiments.
  • embodiments described herein can atomically read, modify and write data spanning a substantial number of bytes to facilitate data sharing operations.
  • an atomic update can be performed on 128-bit counters.
  • an atomic update can be performed on a forwarding or access control list (ACL) table where there is a set of fields that have to be updated atomically.
  • the ACL table can be a table in the memory that includes a list of permissions associated with an object.
  • the ACL can indicate which users or processes are granted access to a particular object and/or the operations that can be performed on a particular object.
  • one or more interrupt wires 202 or one or more interrupt messages can be employed to transmit notification from AE component 104 to a processor (e.g., one or more of CPUs 110, 112, 114) that one or more responses have been placed in an output queue (e.g., one or more of output queues 108, 206).
  • one or more embodiments described herein can allow memory controller 128 to maintain access granularity of 64 bytes while cache coherency is also maintained at a granularity of 64 bytes. Any one of CPUs 110, 112, 114 reading any part of 64 bytes of information from respective caches 122, 124, 126 goes through a coherence protocol for that line. In various different embodiments, any number of different coherence protocols can be employed for the embodiments described herein. If any cache has the line in modified state, the line is provided by the cache and not memory controller 128. Similar to I/O Bridge 102, AE component 104 can perform cache coherency protocol for coherent requests to SoC memory 116 and/or system memory (not shown).
  • FIG. 5 illustrates a flow diagram of an embodiment of a method facilitating multi-word atomic operation support in an SoC.
  • method 500 can include receiving, at a processor, an instruction associated with a calling process.
  • the instruction can be or be included in work message 300.
  • the instruction can include one or more operators and one or more operands.
  • the information indicative of the instruction can be stored in a FIFO input queue prior to the determination at 504 of method 500.
  • method 500 can include determining a first memory width associated with an operator of the instruction and a width of at least one operand of the instruction. For example, from the input queue, the I/O agent of the AE component can identify the operator and the widths of the one or more operands necessary for execution of the instruction. A memory location having a second memory width can then be reserved. In some embodiments, the second memory width is substantially equal to or greater than the first memory width. In some embodiments, the second memory width is substantially equal to or greater than a word width. The result of applying the operator to the operand can be output for collection by the calling process. For example, with reference to FIGs.
  • AE component 104 can apply the operator (e.g., indicated in opcode fields 310, 312, 314) to the one or more operands (e.g., indicated in operand fields 316, 318, 320) and place the result of the application of the operator to the one or more operands in an output queue for collection by the calling process.
  • FIG. 6 illustrates a block diagram of an example electronic computing environment that can be implemented to facilitate multi-word atomic operation support in an SoC.
  • FIG. 6 illustrates an example of a computing system environment 600 in which some aspects of the disclosed subject matter can be implemented, although the computing system environment 600 is one example of a computing environment for a device and is not intended to suggest any limitation as to the scope of use or functionality of the disclosed subject matter.
  • With reference to FIG. 6, an exemplary device for implementing the disclosed subject matter includes a general-purpose computing device in the form of a computer 610.
  • Components of computer 610 may include a processing unit 620, a memory 630, and a system bus 690 that couples various system components including the system memory to the processing unit 620.
  • Computer 610 typically includes a variety of computer readable media.
  • the memory 630 may include computer storage media in the form of volatile and/or nonvolatile memory such as ROM and/or RAM.
  • Memory 630 typically also contains data and/or program modules that are immediately accessible to and/or presently being operated on by processing unit 620.
  • the computer 610 may also include other removable/non-removable,
  • the computer 610 can operate in a networked or distributed environment using logical connections to one or more other remote computers, such as remote computer 670, which can in turn have media capabilities different from device 610.
  • the logical connections depicted in FIG. 6 include a network 680, such as a local area network (LAN) or a wide area network (WAN), but can also include other networks/buses, either wired or wireless.
  • LAN: local area network
  • WAN: wide area network
  • Computer-readable media can include hardware media, software media, non-transitory media, or transport media.
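
The work-message and completion-message flow described in the definitions above (FIGs. 3 and 4) can be sketched roughly as follows. This is a minimal single-threaded software model, not the disclosed hardware: the class names, the `address` key, the string opcodes, the "ok"/"failed" status values, the default 128-bit width, and the choice to return the prior value for every opcode are all illustrative assumptions. Only the three operations, the FIFO ordering, and the MSN, status and return-value fields come from the text.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class WorkMessage:
    msn: int          # message sequence number (MSN field)
    opcode: str       # "fetch_add", "update" or "compare_swap" (hypothetical encoding)
    address: int      # hypothetical key naming the target memory location
    operand: int      # addend, new value, or swap value
    compare: int = 0  # expected value, used only by compare_swap


@dataclass
class CompletionMessage:
    msn: int
    status: str        # "ok" or "failed" (hypothetical status encoding)
    return_value: int  # value held at the location before the operation


class AtomicEngineModel:
    """Software stand-in for the AE component; one message is handled at a time."""

    def __init__(self, width_bits=128):
        self.mask = (1 << width_bits) - 1  # values wrap at the reserved width
        self.memory = {}
        self.input_queue = deque()         # FIFO work queue
        self.output_queue = deque()        # completions for the calling process

    def submit(self, msg):
        self.input_queue.append(msg)

    def process_all(self):
        while self.input_queue:
            msg = self.input_queue.popleft()  # oldest message first
            old = self.memory.get(msg.address, 0)
            status = "ok"
            if msg.opcode == "fetch_add":
                self.memory[msg.address] = (old + msg.operand) & self.mask
            elif msg.opcode == "update":
                self.memory[msg.address] = msg.operand & self.mask
            elif msg.opcode == "compare_swap":
                if old == msg.compare:
                    self.memory[msg.address] = msg.operand & self.mask
                else:
                    status = "failed"
            else:
                status = "failed"
            self.output_queue.append(CompletionMessage(msg.msn, status, old))


engine = AtomicEngineModel()
engine.submit(WorkMessage(msn=1, opcode="fetch_add", address=0x40, operand=1 << 64))
engine.submit(WorkMessage(msn=2, opcode="fetch_add", address=0x40, operand=0))  # atomic read
engine.submit(WorkMessage(msn=3, opcode="compare_swap", address=0x40,
                          operand=7, compare=1 << 64))
engine.process_all()
results = list(engine.output_queue)
```

In this model atomicity follows trivially from processing one message at a time; the second message shows the Fetch-and-Add with a zero operand used as an atomic read, as the text suggests.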

Abstract

Systems and methods that facilitate multi-word atomic operation support for systems on chip are described. One method involves: receiving an instruction associated with a calling process, and determining a first memory width associated with execution of the instruction based on an operator of the instruction and a width of at least one operand of the instruction. The instruction can be associated with an atomic operation.

Description

SYSTEMS AND METHODS FACILITATING MULTI-WORD ATOMIC OPERATION SUPPORT FOR SYSTEM ON CHIP ENVIRONMENTS
TECHNICAL FIELD
[0001] This disclosure relates to systems-on-chips (SoCs), systems, and methods facilitating multi-word atomic operation support for SoCs.
BACKGROUND
[0002] Advancements in computing technology and a need for greater data management have led to an increase in fabrication of SoC integrated circuits. SoCs typically integrate several components of a computer on a single chip substrate. Specifically, SoCs integrate analog, mixed-signal, digital and/or radio frequency circuitry on a single chip substrate, and can increase processing power by using multiple processors and an on-chip interconnection.
[0003] Different types of central processing unit (CPU) instructions can be executed within the SoC architecture. Atomicity in execution of operations is desirable as consumers of data may read an intermediate, erroneous value of a non-atomic operation if reading is performed during execution.
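
The intermediate-value hazard described in paragraph [0003] can be illustrated with a small sketch. It models a 128-bit value stored as two 64-bit halves that are written one after the other; the class and method names are hypothetical, and the interleaving of writer and reader is simulated deterministically rather than with real threads.

```python
MASK64 = (1 << 64) - 1


class TwoWordCounter:
    """A 128-bit value held as two separately written 64-bit halves (illustrative model)."""

    def __init__(self, value=0):
        self.low = value & MASK64
        self.high = (value >> 64) & MASK64

    def read(self):
        return (self.high << 64) | self.low

    def write_low(self, value):
        self.low = value & MASK64

    def write_high(self, value):
        self.high = (value >> 64) & MASK64


old = MASK64      # low half all ones, high half zero
new = old + 1     # the carry clears the low half and sets the high half to 1

counter = TwoWordCounter(old)
counter.write_low(new)    # first step of the non-atomic update
torn = counter.read()     # a reader arriving between the two steps
counter.write_high(new)   # second step of the non-atomic update
final = counter.read()
```

Here `torn` is neither the old nor the new value, which is the kind of erroneous intermediate read that an atomic multi-word update is meant to rule out.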
SUMMARY
[0004] In an embodiment, a method involves receiving, at a processor, an instruction associated with a calling process; and determining a first memory width associated with an operator of the instruction and a width of at least one operand of the instruction.
[0005] In another embodiment, a computer method implemented in an SoC is provided. The computer method involves receiving information indicative of an instruction associated with a calling process; and determining a first memory width associated with execution of the instruction.
[0006] In another embodiment, a system on chip comprises a central processing unit configured to execute an instruction associated with a calling process; and an atomic engine component. The atomic engine component is coupled to the central processing unit and configured to: receive the instruction; and determine a first memory width associated with execution of the instruction, based on an operator of the instruction and a width of at least one operand of the instruction.
[0007] One or more embodiments can advantageously provide multi-word atomic operation support for system memory and/or for SoC memory. For example, multi-word atomic operation can be facilitated for tables in SoC memory. As used herein, an "atomic" operation is a CPU instruction that executes in a single CPU cycle and/or a CPU instruction for which an operation will complete execution without being interrupted by the actions of another thread. One or more of the embodiments described herein can be employed in or to provide any number of different systems including, but not limited to, data center computers, cloud computing systems, embedded communication processors, enterprise servers (e.g., multiple CPU server systems) or the like.
BRIEF DESCRIPTION OF THE DRAWINGS
[0008] FIG. 1 is a block diagram illustrating an embodiment of an SoC for which multi-word atomic operation support can be facilitated.
[0009] FIG. 2 is a block diagram illustrating an embodiment of an interface system between CPUs and an atomic engine (AE) component of an SoC facilitating multi-word atomic operation support.
[0010] FIG. 3 is a block diagram illustrating an embodiment of a work message facilitating multi-word atomic operation support in an SoC.
[0011] FIG. 4 is a block diagram illustrating an embodiment of a completion message facilitating multi-word atomic operations support in an SoC.
[0012] FIG. 5 illustrates a flow diagram of an embodiment of a method facilitating multi-word atomic operation support in an SoC.
[0013] FIG. 6 illustrates a block diagram of an example electronic computing environment that can be implemented to facilitate multi-word atomic operation support in an SoC.
DETAILED DESCRIPTION
[0014] FIG. 1 is a block diagram illustrating an embodiment of an SoC for which multi-word atomic operation support can be facilitated. As used herein, the term "multi-word" atomic operation support means atomic operation support employing a memory location that has a width greater than one word (i.e., 16 bits or 2 bytes). Some embodiments described herein can provide atomic operation support employing a memory location that has a width less than or equal to one word. FIG. 1 is a simplified subset of select components of an SoC, shown merely for providing context for the embodiments described herein. In various embodiments, alternative or additional components can be included in SoC 100. As used herein, an SoC can be or include server and/or general processor functionality.
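
The "multi-word" definition in paragraph [0014] can be made concrete with a short sketch. It uses the 16-bit word stated in that paragraph; the function names, and the idea of rounding a width up to a whole number of words, are illustrative assumptions rather than anything taken from the disclosure.

```python
WORD_BITS = 16  # one word, as defined in paragraph [0014]


def is_multi_word(width_bits: int) -> bool:
    """True when a memory location is wider than one word."""
    return width_bits > WORD_BITS


def words_needed(width_bits: int) -> int:
    """Smallest whole number of words that covers width_bits (ceiling division)."""
    if width_bits <= 0:
        raise ValueError("width must be positive")
    return -(-width_bits // WORD_BITS)


counter_bits = 128                       # e.g. a 128-bit counter
multi = is_multi_word(counter_bits)      # wider than one 16-bit word
n_words = words_needed(counter_bits)     # number of 16-bit words spanned
```

Under this definition a 128-bit counter spans eight words and is multi-word, while a location of exactly 16 bits is not.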
[0015] As shown, SoC 100 includes one or more CPUs 110, 112, 114, SoC memory 116, graphics processing unit (GPU) 118, radio component 120, caches 122, 124, 126, memory controller 128 and/or input/output (I/O) bridge 102. There is no particular limit to the number of CPUs. In various embodiments, one or more of CPUs 110, 112, 114, SoC memory 116, GPU 118, radio component 120, caches 122, 124, 126, memory controller 128 and/or I/O bridge 102 can be electrically and/or communicatively coupled to one another to facilitate multi-word atomic operation on SoC 100.
[0016] CPUs 110, 112, 114 can be communicatively coupled to respective caches 122, 124, 126 and/or SoC memory 116. Caches 122, 124, 126 can store data duplicating one or more values of data stored in SoC memory 116 in various embodiments.
[0017] SoC memory 116 can be any number of different types of memory including, but not limited to, read only memory (ROM), random access memory (RAM), flash memory and/or electrically erasable programmable read only memory (EEPROM). In some embodiments, SoC memory 116 can be a computer-readable storage medium storing instructions, computer code and/or functions executable by CPUs 110, 112, 114. For example, SoC memory 116 can store instructions, computer code and/or functions executable by CPUs 110, 112, 114 described herein to facilitate multi-word atomic operation support. Memory controller 128 includes circuitry that manages and/or controls the flow of data to and/or from SoC memory 116. For example, memory controller 128 includes logic for reading from and/or writing to SoC memory 116.
[0018] CPUs 110, 112, 114 can include circuitry configured to fetch data from respective caches 122, 124, 126 and/or SoC memory 116, and perform one or more arithmetic or logical operations on the fetched data. In some embodiments, each CPU has a corresponding cache, while in other embodiments, a subset of CPUs have a corresponding cache. In various embodiments, CPUs 110, 112, 114 can be a processor designed by ARM Holdings or a processor having x86 architecture. In one embodiment, for example, one or more of CPUs 110, 112, 114 can be 64-bit server on chip processors designed by ARM Holdings configured to provide server functionality via SoC 100. For example, in some embodiments, SoC 100 can serve data to one or more clients. In other examples, SoC 100 can be or be included in data center computers, cloud computing systems, embedded communication processors, enterprise servers (e.g., multiple CPU server systems) or the like.
[0019] Radio component 120 can include circuitry configured to transmit and/or receive radio frequency (RF) signals to and/or from SoC 100. In various embodiments, radio component 120 can operate according to any number of different telecommunication protocols for communication of voice, video and/or data traffic. For example, radio component 120 can operate according to Wireless Fidelity (Wi-Fi), 4G Long-Term Evolution (LTE) and/or BLUETOOTH® protocols. GPU 118 can include circuitry to process graphics information and/or create visual images for output to a display component of a device associated with SoC 100.
[0020] I/O Bridge 102 can include circuitry facilitating communication between the CPU and/or one or more components on SoC 100. In some embodiments, I/O Bridge 102 can also include circuitry facilitating communication between SoC 100 and one or more peripheral components communicatively coupled to SoC 100. In some embodiments, for example, I/O Bridge 102 includes a Northbridge component (not shown) that facilitates communication between CPUs 110, 112, 114 and one or more other components of SoC 100. I/O Bridge 102 can also include circuitry providing a Southbridge component (not shown) that facilitates I/O functionality between SoC 100 and one or more peripheral components that can be communicatively coupled to SoC 100.
[0021] Also shown in FIG. 1, I/O Bridge 102 includes AE component 104. AE component 104 includes circuitry that can perform one or more operations to facilitate multi-word atomic support for SoC 100. Input queue 106 and output queue 108 can be communicatively coupled to AE component 104 and/or any number of components of SoC 100 to facilitate processing by AE component 104 for provisioning of the multi-word atomic operation support on SoC 100.
[0022] AE component 104 will be described in greater detail with reference to FIGs. 1, 2, 3, 4 and 5. AE component 104 can include circuitry for receiving, via interface 132, data 130 associated with a calling process. Data 130 can be output to interface 132 from one of CPUs 110, 112, 114. In some embodiments, data 130 can include information indicative of an instruction associated with a calling process.
[0023] FIG. 2 is a block diagram illustrating an embodiment of an interface system between CPUs and an AE component of an SoC facilitating multi-word atomic operation support. Repetitive description of like elements employed in respective embodiments of systems and/or apparatus described herein is omitted for sake of brevity.
[0024] An I/O agent of AE component 104 can receive data 130 for processing. In some embodiments, eight input/output queue pairs can be provided between CPUs 110, 112, 114 and AE component 104. In other embodiments, any number of input/output queue pairs can be provided between CPUs 110, 112, 114 and AE component 104. For example, more input/output queue pairs can be provided to provide atomic operation support for SoCs facilitating data-intensive functionality.
[0025] After receipt of data 130 (e.g., the instruction associated with the calling process), AE component 104 can identify one or more operators and one or more operands of the instruction. For example, AE component 104 can evaluate the information indicative of the instruction and determine the number and/or type of operands embodied in the instruction and/or the number or width of the operands embodied in the instruction. By way of example, the instruction can include a Fetch-and-Add operator and an operand having a value indicative of amount to be added to an existing value in a designated memory location.
[0026] In some embodiments, AE component 104 can place information indicative of the instruction, information indicative of one or more operators and/or information indicative of one or more operands in input queue 106. In some embodiments, input queue 106 is a First in, First out (FIFO) queue. In these embodiments, instructions can be processed in the order in which they were received, with the oldest instruction being processed before more recently received instructions. In some embodiments, input queue 106 can be a work queue that can receive one or more scheduled tasks/services associated with the instruction.
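The FIFO ordering described above can be illustrated with a minimal software sketch. The class name `InputQueue` and the string instruction names are assumptions made for illustration only; the text does not describe a software implementation of input queue 106.

```python
from collections import deque


class InputQueue:
    """Hypothetical stand-in for input queue 106: strict first in, first out."""

    def __init__(self):
        self._items = deque()

    def enqueue(self, instruction):
        # Newly received instructions join the tail of the queue.
        self._items.append(instruction)

    def dequeue(self):
        # The oldest instruction is removed from the head and processed first.
        return self._items.popleft()

    def __len__(self):
        return len(self._items)


queue = InputQueue()
for name in ("fetch_and_add", "update", "compare_and_swap"):
    queue.enqueue(name)

# Drain the queue; the order of processing matches the order of receipt.
processed = [queue.dequeue() for _ in range(len(queue))]
```

In this sketch, `processed` lists the instructions in the order in which they were enqueued.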
[0027] An example of the information output to input queue 106 by AE component 104 can be as shown in FIG. 3. FIG. 3 is a block diagram illustrating an embodiment of a format for a work message facilitating multi-word atomic operation support in an SoC. Repetitive description of like elements employed in respective embodiments of systems and/or apparatus described herein is omitted for sake of brevity.
[0028] In FIG. 3, one or more portions of the instruction received by AE component 104 can include the format or content shown as work message 300. In some embodiments, the interface between CPU 110 and AE component 104 can be AtomicUpdateCmd Message based.
[0029] In some embodiments, work message 300 can be 32 bytes long while, in other embodiments, work message 300 can be 64 bytes long. Work message 300 can include a number of different fields, which can be in any order and/or can be adjacent or non-adjacent one another. The embodiment of work message 300 shown includes message format field 302, opcode field 304, operand field 306 and message sequence number (MSN) field 308.
[0030] Message format field 302 can be provided in a byte of information of work message 300. For example, message format field 302 can be the first byte of information in work message 300, and can describe the layout (e.g., number of bytes, etc.) of work message 300.
[0031] Opcode field 304 can include information specifying one or more operations to be performed as part of the instruction. The operations can be extracted from work message 300 by AE component 104 and identified as operations to be performed via the instruction. In the example shown, opcode field 304 includes a Fetch-and-Add CPU instruction 310, an Update CPU instruction 312 and a Compare-and-Swap CPU instruction 314. In other embodiments, one or more of the instructions can be provided in opcode field 304 and/or one or more different instructions can be provided in opcode field 304. In various embodiments, instructions indicated in opcode field 304 can be atomic instructions.
[0032] Fetch-and-Add CPU instruction 310 is an atomic instruction that can increment the contents of a memory location. For example, the memory location can be specified by an address and the Fetch-and-Add CPU instruction 310 can atomically modify the information at the memory location. A Fetch-and-Add CPU instruction 310 increments the value at the memory location by an amount indicated by the Fetch-and-Add operand 316 within operand field 306.
[0033] In some embodiments, the Fetch-and-Add CPU instruction 310 can be 16 bytes in embodiments in which the work message format 300 is 32 bytes long. The 16-byte Fetch-and-Add CPU instruction 310 can include a Fetch-and-Add 0 CPU instruction (not shown), which can be employed for an atomic read, while an Ld SIMD can be employed for a 128-bit read from a memory location. As used herein, "Ld" represents a load instruction and "SIMD" represents a single instruction multiple data operation.
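As a minimal software sketch of the Fetch-and-Add behavior described above, the following model guards a 16-byte (128-bit) value with a lock so that the read, the add and the write occur as one step. The class name `AtomicLocation`, the use of a software lock and the wrap-around at the location width are assumptions for illustration; the text does not specify overflow behavior or a software implementation.

```python
import threading

WIDTH_BYTES = 16  # 128-bit location, matching the 16-byte example in the text


class AtomicLocation:
    """Hypothetical model of one reserved and locked memory location."""

    def __init__(self, width_bytes=WIDTH_BYTES, initial=0):
        self._mask = (1 << (8 * width_bytes)) - 1
        self._value = initial & self._mask
        self._lock = threading.Lock()

    def fetch_and_add(self, amount):
        # Read, modify and write happen under one lock, so no other caller
        # can observe or change the value in between.
        with self._lock:
            old = self._value
            # Wrapping at the location width is an assumption of this sketch.
            self._value = (old + amount) & self._mask
            return old


counter = AtomicLocation(initial=2**64)  # a value that does not fit in 8 bytes
previous = counter.fetch_and_add(5)      # returns the value before the add
snapshot = counter.fetch_and_add(0)      # adding 0 acts as an atomic read
```

Adding zero returns the current value without changing it, which is how a Fetch-and-Add 0 instruction can serve as an atomic read.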
[0034] In some embodiments, as shown, opcode field 304 can also include an Update CPU instruction 312 and a Compare-and-Swap CPU instruction 314. Update CPU instruction 312 can update information at a memory location with a new value indicated by the information stored at Update operand 318.
[0035] Compare-and-Swap CPU instruction 314 can be an atomic instruction that can compare the contents of a memory location to a given value and, only if the value and the contents of the memory location are the same, store a third value into the memory location.
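The Compare-and-Swap semantics described above can be sketched in software as follows. This is an illustrative model only: the class name `SwapLocation`, the use of byte strings and the software lock are assumptions and are not taken from the text.

```python
import threading


class SwapLocation:
    """Hypothetical model of a multi-byte location updated by compare-and-swap."""

    def __init__(self, initial: bytes):
        self._data = bytes(initial)
        self._lock = threading.Lock()

    def compare_and_swap(self, expected: bytes, new: bytes) -> bool:
        # The comparison and the store happen under one lock, so the
        # location cannot change between the two steps.
        with self._lock:
            if self._data != expected:
                return False  # contents differ: nothing is stored
            self._data = bytes(new)
            return True

    def read(self) -> bytes:
        with self._lock:
            return self._data


entry = SwapLocation(b"\x01" * 48)  # 48 bytes, the upper size given for swap data
ok = entry.compare_and_swap(b"\x01" * 48, b"\x02" * 48)     # matches, so it swaps
stale = entry.compare_and_swap(b"\x01" * 48, b"\x03" * 48)  # no longer matches
```

The second call fails because the first call already replaced the contents, so the stale expected value no longer matches.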
[0036] Operand field 306 can include a Fetch-and-Add CPU operand 316, Update operand 318 and/or Compare-and-Swap CPU operand 320. The operand can be one or more values employed in executing the corresponding CPU instruction indicated by opcode 304.
[0037] In some embodiments, Fetch-and-Add operand 316 can be 16 bytes long with a 42 bit pointer. Update operand 318 can be from five bytes to 48 bytes long and, in some embodiments, can be larger if necessary. The start pointer can be 41 bytes long in some embodiments.
[0038] In some embodiments, the Compare-and-Swap operand 320 can be the value with which the value in the memory location is compared and/or the new value that is provided in the memory location if the values are the same. The Compare-and-Swap operand 320 can be from five bytes to 48 bytes long (or larger if necessary). The start pointer can be 41 bytes long and the swap data can be up to 48 bytes long. The pointer to the compare data can be 41 bytes long.
[0039] MSN field 308 can include information indicative of a sequence number of the message. MSN field 308 can be 15 bytes long in some embodiments. In some embodiments, the information stored at MSN field 308 can be employed to determine the position of work message 300 in a set of work messages in input queue 106.
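To make the work message fields concrete, the following sketch packs a message format field, an opcode field, an MSN field and an operand field into a 64-byte message. The byte layout, the field order, the one-byte opcode and the numeric opcode values are all assumptions chosen for illustration; the text states only that the fields can be in any order and gives some example field sizes.

```python
import struct

# Assumed layout (not specified in the text): 1-byte message format,
# 1-byte opcode, 15-byte MSN, 47-byte operand area = 64 bytes in total.
LAYOUT = struct.Struct("<BB15s47s")
OPCODES = {"fetch_and_add": 1, "update": 2, "compare_and_swap": 3}  # hypothetical codes
OPCODE_NAMES = {code: name for name, code in OPCODES.items()}


def pack_work_message(message_format, opcode, msn, operand):
    """Serialize one work message into exactly 64 bytes."""
    if len(operand) > 47:
        raise ValueError("operand does not fit in the assumed 47-byte area")
    # struct pads the operand with zero bytes up to 47 bytes.
    return LAYOUT.pack(message_format, OPCODES[opcode],
                       msn.to_bytes(15, "little"), operand)


def unpack_work_message(raw):
    """Recover the fields; the operand comes back zero-padded to 47 bytes."""
    message_format, code, msn_raw, operand = LAYOUT.unpack(raw)
    return {
        "format": message_format,
        "opcode": OPCODE_NAMES[code],
        "msn": int.from_bytes(msn_raw, "little"),
        "operand": operand,
    }


raw = pack_work_message(1, "fetch_and_add", msn=7,
                        operand=(5).to_bytes(16, "little"))
fields = unpack_work_message(raw)
```

A receiver under these assumptions would read the first 16 bytes of the operand area as the Fetch-and-Add amount and use the MSN to order the message within the input queue.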
[0040] As a function of the determined width, the I/O agent of AE component 104 can reserve (or place a hold) and/or lock a memory location. The AE component 104 can determine the width of the memory location based on the width of one or more of the operands in some embodiments. For example, AE component can select a width for the memory location that is greater than or equal to the width of the operand.
[0041] For example, AE component 104 can evaluate the one or more operators (e.g., addition operator associated with Fetch-and-Add CPU instruction) for the instruction and the width of one or more of the operands (e.g., Fetch-and-Add operand 316) and determine a width of a memory location for reservation by AE component 104. In some embodiments, AE component 104 can reserve and lock a memory location of one or more of cache 122, 124, 126 and/or SoC memory 116 (or one or more tables of SoC memory 116) having a width that is greater than or equal to the width of one or more of the operands. In other embodiments, AE component 104 can reserve and lock a memory location of cache 122, 124, 126 and/or SoC memory 116 having a width that is greater than or equal to the width of the result of execution of the instruction.
[0042] In some embodiments, AE component 104 can determine the width of the memory location based on the width of one or more of the operands and at least one operator of the instruction. For example, AE component can select a width for the memory location that is greater than or equal to the width of the result of applying the operator to one or more operands.
[0043] In some embodiments, AE component 104 can determine the width of the memory location based on the width of the result of executing the instruction. For example, AE component can select a width for the memory location that is greater than or equal to the width of the result of executing the instruction.
[0044] In various different embodiments, the width of the memory location determined by AE component 104 for reservation and/or locking is greater than one word, equal to one word or less than one word. Accordingly, a memory location having a width of multiple word lengths can be reserved to facilitate atomic operation support on the SoC (e.g., SoC 100).
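One way to picture the width determination described above is the following sketch, which derives a required width from the operator and operand widths and then selects a reserved width that is at least as large. The specific policy shown (taking the widest operand and rounding up to a whole number of 2-byte words) and the function names are assumptions; the text requires only that the reserved width be greater than or equal to the relevant operand or result width in the embodiments described.

```python
WORD_BYTES = 2  # the text defines one word as 16 bits (2 bytes)
KNOWN_OPERATORS = {"fetch_and_add", "update", "compare_and_swap"}


def required_width(operator, operand_widths):
    """Bytes needed to execute the instruction (assumed policy: widest operand)."""
    if operator not in KNOWN_OPERATORS:
        raise ValueError(f"unknown operator: {operator}")
    if not operand_widths:
        raise ValueError("at least one operand width is required")
    return max(operand_widths)


def reserved_width(required):
    """Round the required width up to a whole number of words (assumed policy)."""
    words = -(-required // WORD_BYTES)  # ceiling division
    return words * WORD_BYTES


def is_multi_word(width):
    """True when the location is wider than one word."""
    return width > WORD_BYTES


needed = required_width("fetch_and_add", [16])  # one 16-byte operand
reserved = reserved_width(needed)
```

Under this policy a 16-byte Fetch-and-Add operand leads to a 16-byte, multi-word reservation, while a 1-byte operand would lead to a single-word reservation.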
[0045] AE component 104 can apply the one or more operators to the one or more operands of the instruction and output the result to output queue 108. In some embodiments, AE component 104 can execute the instruction and output the result to output queue 108. In either embodiment, the result can be output for collection by the calling process associated with the instruction.
[0046] In some embodiments, the result can be output in the form of (e.g., including one or more of the fields of) completion message 400. FIG. 4 is a block diagram illustrating an embodiment of a completion message facilitating multi-word atomic operation support in an SoC. Repetitive description of like elements employed in respective embodiments of systems and/or apparatus described herein is omitted for sake of brevity.
[0047] In FIG. 4, completion message 400 can include MSN field 402, status field 404 and/or return value for Fetch-and-Add CPU instruction 406. In some embodiments, the MSN field 402 includes information indicative of the message number. Completion message format 400 can be 64 bytes long in some embodiments to provide efficiency for the memory interface.
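The completion path can be sketched as follows, with a completion record carrying the MSN, a status and the Fetch-and-Add return value, and a calling process collecting it from the output queue. The class name `CompletionMessage`, the string status value and the MSN check on collection are assumptions for illustration and are not described in the text.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class CompletionMessage:
    """Hypothetical software view of completion message 400."""
    msn: int           # message sequence number of the originating work message
    status: str        # outcome of the operation (values are assumed)
    return_value: int  # value fetched by Fetch-and-Add before the add


output_queue = deque()  # stand-in for output queue 108


def complete(msn, old_value):
    # The engine side places the result in the output queue.
    output_queue.append(CompletionMessage(msn, "ok", old_value))


def collect(expected_msn):
    # The calling process takes the oldest completion and checks that it
    # answers the request it is waiting on.
    message = output_queue.popleft()
    if message.msn != expected_msn:
        raise LookupError("completion does not match the expected MSN")
    return message


complete(msn=7, old_value=10)
result = collect(expected_msn=7)
```

Matching on the MSN lets the calling process pair each completion with the work message that produced it.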
[0048] One or more of the embodiments described herein can extend the typical architecture, which employs eight-byte counters, to counters of 16 bytes to 64 bytes in various embodiments. By way of example, embodiments described herein can atomically read, modify and write a substantial number of bytes of data to facilitate data sharing operations.
[0049] In some embodiments, for example, an atomic update can be performed on 128 bit counters. In some embodiments, an atomic update can be performed on a forwarding table or an access control list (ACL) table in which a set of fields has to be updated atomically. The ACL table can be a table in the memory that includes a list of permissions associated with an object. The ACL can indicate which users or processes are granted access to a particular object and/or the operations that can be performed on a particular object.
[0050] In FIG. 2, one or more interrupt wires 202 or one or more interrupt messages can be employed to transmit notification from AE component 104 to a processor (e.g., one or more of CPUs 110, 112, 114) that one or more responses have been placed in an output queue (e.g., one or more of output queues 108, 206).
[0051] Referring to FIGs. 1 and 2, one or more embodiments described herein can allow memory controller 128 to maintain access granularity of 64 bytes while cache coherency is also maintained at a granularity of 64 bytes. Any one of CPUs 110, 112, 114 reading any part of 64 bytes of information from respective caches 122, 124, 126 goes through a coherence protocol for that line. In various different embodiments, any number of different coherence protocols can be employed for the embodiments described herein. If any cache has the line in modified state, the line is provided by the cache and not memory controller 128. Similar to I/O Bridge 102, AE component 104 can perform cache coherency protocol for coherent requests to SoC memory 116 and/or system memory (not shown).
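The rule that a modified line is supplied by a cache rather than by the memory controller can be sketched as below. The dictionary-based caches, the state name "modified" and the function names are assumptions for illustration; the text notes that any number of different coherence protocols can be employed.

```python
LINE_BYTES = 64  # access and coherency granularity given in the text


def line_address(address):
    """Base address of the 64-byte line that contains the given address."""
    return address - address % LINE_BYTES


def read_line(address, caches, memory):
    """Return (data, source) for the line, preferring a modified cached copy."""
    base = line_address(address)
    for cache in caches:
        entry = cache.get(base)
        if entry is not None and entry["state"] == "modified":
            # A cache holding the line in modified state supplies it.
            return entry["data"], "cache"
    # Otherwise the line comes from memory via the memory controller.
    return memory[base], "memory"


caches = [{}, {64: {"state": "modified", "data": b"new"}}]
memory = {0: b"zero", 64: b"old"}

hit = read_line(100, caches, memory)  # address 100 lies in the line at 64
miss = read_line(10, caches, memory)  # address 10 lies in the line at 0
```

A read of any byte within a line resolves to the whole 64-byte line, so both lookups operate on line base addresses rather than on the raw addresses.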
[0052] FIG. 5 illustrates a flow diagram of an embodiment of a method facilitating multi-word atomic operation support in an SoC. As shown, at 502, method 500 can include receiving, at a processor, information indicative of an instruction associated with a calling process. For example, with reference to FIG. 3, the instruction can be or be included in work message 300. The instruction can include one or more operators and one or more operands. In some embodiments, the information indicative of the instruction can be stored in a FIFO input queue prior to the determination at 504 of method 500.
[0053] At 504, method 500 can include determining a first memory width associated with an operator of the instruction and a width of at least one operand of the instruction. For example, from the input queue, the I/O agent of the AE component can identify the operator and the widths of the one or more operands necessary for execution of the instruction. A memory location having a second memory width can then be reserved. In some embodiments, the second memory width is substantially equal to or greater than the first memory width. In some embodiments, the second memory width is substantially equal to or greater than a word width. The result of applying the operator to the operand can be output for collection by the calling process. For example, with reference to FIGs. 3 and 4, AE component 104 can apply the operator (e.g., indicated in opcode fields 310, 312, 314) to the one or more operands (e.g., indicated in operand fields 316, 318, 320) and place the result of the application of the operator to the one or more operands in an output queue for collection by the calling process.
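The steps of method 500 can be sketched end to end as follows: instructions are taken from a FIFO input queue, a first width is determined, a location of at least that width is locked, the operator is applied and the result is placed in an output queue. The tuple format of an instruction, the dictionary used as memory, the single shared lock and the support for Fetch-and-Add only are assumptions of this sketch and not features stated in the text.

```python
import threading
from collections import deque

WORD_BYTES = 2  # one word is 16 bits in the text


def run_method(instructions, memory):
    """Process (operator, address, operand, operand_width) tuples in FIFO order."""
    input_queue = deque(instructions)  # 502: received instructions
    output_queue = deque()
    lock = threading.Lock()            # stands in for reserving and locking

    while input_queue:
        operator, address, operand, operand_width = input_queue.popleft()
        if operator != "fetch_and_add":
            raise ValueError("this sketch only models fetch_and_add")

        # 504: first memory width from the operator and the operand width
        # (assumed here to equal the operand width for fetch_and_add).
        first_width = operand_width
        # The second (reserved) width is at least the first width.
        second_width = max(first_width, WORD_BYTES)
        mask = (1 << (8 * second_width)) - 1

        with lock:
            old = memory.get(address, 0)
            memory[address] = (old + operand) & mask

        # The result is output for collection by the calling process.
        output_queue.append((address, old))

    return output_queue


memory = {0x40: 10}
results = run_method(
    [("fetch_and_add", 0x40, 5, 16), ("fetch_and_add", 0x40, 1, 16)],
    memory,
)
```

Because the queue is drained in order, the second instruction observes the value written by the first.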
[0054] The techniques described herein can be applied to any device and/or network in which multi-word atomic operation support is desirable in a multiprocessor system. It is to be understood that handheld, portable and other computing devices and computing objects of all kinds are contemplated for use in connection with the embodiments, e.g., anywhere that a device may wish to implement power management for a multiprocessor system. Accordingly, the general purpose remote computer described below in FIG. 6 is one example, and the disclosed subject matter can be implemented with any client having network/bus interoperability and interaction.
[0055] FIG. 6 illustrates a block diagram of an example electronic computing environment that can be implemented to facilitate multi-word atomic operation support in a SoC. FIG. 6 illustrates an example of a computing system environment 600 in which some aspects of the disclosed subject matter can be implemented, although the computing system environment 600 is one example of a computing environment for a device and is not intended to suggest any limitation as to the scope of use or functionality of the disclosed subject matter.
[0056] In FIG. 6, an exemplary device for implementing the disclosed subject matter includes a general-purpose computing device in the form of a computer 610. Components of computer 610 may include a processing unit 620, a memory 630, and a system bus 690 that couples various system components including the system memory to the processing unit 620.
[0057] Computer 610 typically includes a variety of computer readable media. The memory 630 may include computer storage media in the form of volatile and/or nonvolatile memory such as ROM and/or RAM. Memory 630 typically also contains data and/or program modules that are immediately accessible to and/or presently being operated on by processing unit 620. The computer 610 may also include other removable/non-removable, volatile/nonvolatile computer storage media.
[0058] The computer 610 can operate in a networked or distributed environment using logical connections to one or more other remote computer(s), such as remote computer 670, which can in turn have media capabilities different from device 610. The logical connections depicted in FIG. 6 include a network 680, such as a local area network (LAN) or a wide area network (WAN), but can also include other networks/buses, either wired or wireless.
[0059] The disclosed subject matter can be implemented as a method, apparatus, or article of manufacture using typical manufacturing, programming or engineering techniques to produce hardware, firmware, software, or any suitable combination thereof to control an electronic device to implement the disclosed subject matter. Computer-readable media can include hardware media, software media, non-transitory media, or transport media.

Claims

What is claimed is:
1. A method, comprising:
receiving, at a processor, data associated with a calling process; and
determining a first memory width associated with execution of the data based on an operator of the data and a width of at least one operand of the data.
2. The method of claim 1, further comprising:
reserving and locking a memory location comprising a second memory width that is substantially equal to or greater than the first memory width;
outputting a result of applying the operator to the at least one operand for collection by the calling process; and
storing the data in a First in, First out input queue prior to the determining.
3. The method of claim 1, wherein the data comprises a message comprising: a first field identifying the operator, a second field identifying the operand, and a third field identifying a sequence number for the message.
4. The method of claim 1, wherein the data is associated with an atomic operation, and further comprising:
evaluating information indicative of the data and determining at least one of a quantity of operands embodied in the data or a type of operands embodied in the data.
5. A system on chip, comprising:
a central processing unit configured to execute data associated with a calling process; and
an atomic engine component coupled to the central processing unit and configured to:
receive the data; and
determine a first memory width associated with execution of the data based on an operator of the data and a width of at least one operand of the data.
6. The system on chip of claim 5, wherein the atomic engine component is further configured to:
reserve and lock a memory location comprising a second memory width that is substantially equal to or greater than the first memory width; and
output a result of applying the operator to the at least one operand for collection by the calling process.
7. The system on chip of claim 5, wherein the data comprises a message comprising: a first field identifying the operator, a second field identifying the operand, and a third field identifying a sequence number for the message.
8. A computer method implemented in a system on chip, comprising:
receiving information indicative of an instruction associated with a calling process; and
determining a first memory width associated with execution of the instruction based on an operator of the instruction and a width of at least one operand of the instruction.
9. The computer method of claim 8, further comprising one of:
reserving and locking a memory location having a second memory width that is less than the first memory width; or
reserving and locking a memory location having a second memory width that is approximately equal to or greater than a word length.
10. The computer method of claim 9, further comprising:
outputting the instruction to an input queue prior to the reserving and locking, wherein the input queue is a work input queue or a First in, First out input queue.
PCT/US2015/042595 2015-07-29 2015-07-29 Systems and methods facilitating multi-word atomic operation support for system on chip environments WO2017019054A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/US2015/042595 WO2017019054A1 (en) 2015-07-29 2015-07-29 Systems and methods facilitating multi-word atomic operation support for system on chip environments

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2015/042595 WO2017019054A1 (en) 2015-07-29 2015-07-29 Systems and methods facilitating multi-word atomic operation support for system on chip environments

Publications (1)

Publication Number Publication Date
WO2017019054A1 true WO2017019054A1 (en) 2017-02-02

Family

ID=57884808

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2015/042595 WO2017019054A1 (en) 2015-07-29 2015-07-29 Systems and methods facilitating multi-word atomic operation support for system on chip environments

Country Status (1)

Country Link
WO (1) WO2017019054A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11620222B2 (en) 2020-08-28 2023-04-04 Samsung Electronics Co., Ltd. Methods and apparatus for atomic operations with multiple processing paths

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040042251A1 (en) * 2002-08-30 2004-03-04 Hee Bok Kang Nonvolatile ferroelectric memory control device
US20090086737A1 (en) * 2007-09-29 2009-04-02 Applied Micro Circuits Corporation System-on-chip communication manager
WO2011065618A1 (en) * 2009-11-26 2011-06-03 서울대학교 산학협력단 Network-on-chip system including active memory processor
US20140089259A1 (en) * 2011-06-01 2014-03-27 Huawei Technologies Co., Ltd. Operation method and apparatus for data storage system
US20140156907A1 (en) * 2012-12-05 2014-06-05 Douglas A. Palmer Smart memory

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15899845

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 25.05.2018)

122 Ep: pct application non-entry in european phase

Ref document number: 15899845

Country of ref document: EP

Kind code of ref document: A1