US20060179196A1 - Priority registers for biasing access to shared resources - Google Patents

Priority registers for biasing access to shared resources

Info

Publication number
US20060179196A1
Authority
US
United States
Prior art keywords
register
priority
processor core
die
shared
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US11/051,148
Other versions
US7380038B2
Inventor
Jan Gray
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Microsoft Technology Licensing LLC
Original Assignee
Microsoft Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Microsoft Corp
Priority to US11/051,148
Assigned to MICROSOFT CORPORATION. Assignment of assignors interest (see document for details). Assignors: GRAY, JAN STEPHEN
Publication of US20060179196A1
Application granted
Publication of US7380038B2
Assigned to MICROSOFT TECHNOLOGY LICENSING, LLC. Assignment of assignors interest (see document for details). Assignors: MICROSOFT CORPORATION
Legal status: Expired - Fee Related
Adjusted expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14Handling requests for interconnection or transfer
    • G06F13/16Handling requests for interconnection or transfer for access to memory bus
    • G06F13/1605Handling requests for interconnection or transfer for access to memory bus based on arbitration
    • G06F13/1652Handling requests for interconnection or transfer for access to memory bus based on arbitration in a multiprocessor architecture
    • G06F13/1663Access to shared memory

Definitions

  • the present invention relates generally to multicore processors. More particularly, the present invention relates to priority registers for biasing allocation of or access to shared resources, and still more particularly, to registers that store priority values which bias an arbiter performing arbitration among a plurality of processor cores competing for the shared resources.
  • transistors have been getting smaller and faster.
  • a typical consumer microprocessor can contain over 100 million transistors subsisting on a die no bigger than a hundred square millimeters.
  • it can handle clock speeds in the range of 3 GHz.
  • Another concern in current microprocessors is the relatively slow connection between the processor and main memory.
  • a typical processor runs several hundred times faster than information can be fetched from memory, so that a processor waits an eternity, relatively speaking, for data to arrive from memory.
  • One solution to these problems is to exploit parallelism by dividing a processing chip into multiple cores. For example, a hypothetical notebook processor might have eight cores, where a program customized for such a chip could present many threads of execution, each running simultaneously on a different core.
  • each core will have its own local resources, such as register files, branch predictors, and local caches, and it will also share resources with other cores, such as on-die L3 caches, memory channels, and possibly shared functional units.
  • shared resources may need to be arbitrated not only in a fair and neutral way, as in the case of balanced parallel software codes, but also in a biased manner, as when some cores are running main user computation threads while other cores are running lower priority or “housekeeping” threads. It would be advantageous to provide a mechanism for software to manage, influence, or bias arbitration of such shared resources among a plurality of cores running threads of differing performance requirements.
  • processor cores located on a die typically have their own local computing resources, but they may also share resources with other processor cores on the die.
  • a first priority register is provided, where the first priority register corresponds to a first processor core on the die.
  • the die may contain a second processor core which has its own corresponding second priority register. Values are stored in both the first priority register and the second priority register. These values are then used to bias the access of the two processor cores to shared resources, so that the first processor core accesses shared resources at the expense of the second processor core, or vice versa.
  • the priority registers bias an arbiter allocating access to shared resources.
  • the priority registers tag shared resource access signals emanating from processor cores. Threads running on the processor cores with higher tagged values receive proportionately more access to shared resources than threads with lower tagged values. For example, one thread could receive 75% of the shared resources while a lower priority thread would receive the remaining 25% of shared resources. Values stored in the priority registers can be constant or can be relative to other priority registers.
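  • The proportional biasing described above can be sketched in software. The following is a minimal, hypothetical model (the function name and the smooth weighted round-robin policy are illustrative assumptions, not taken from the patent): each core's requests carry its priority-register value as a tag, and the arbiter hands out grant slots in proportion to the tag values.

```python
def weighted_grants(tags, total_slots):
    """Distribute grant slots among cores in proportion to their
    priority-register tag values (smooth weighted round-robin)."""
    credits = {core: 0 for core in tags}   # running credit per core
    weight_sum = sum(tags.values())
    schedule = []
    for _ in range(total_slots):
        # Every core accrues credit equal to its priority tag...
        for core, tag in tags.items():
            credits[core] += tag
        # ...and the core with the most credit wins this grant slot.
        winner = max(credits, key=credits.get)
        credits[winner] -= weight_sum
        schedule.append(winner)
    return schedule

# A 3:1 tag ratio yields the 75%/25% split mentioned above.
sched = weighted_grants({"coreA": 3, "coreB": 1}, 100)
```

With tags 3 and 1, coreA receives 75 of 100 grant slots, so the lower-priority core is biased against but never starved.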
  • the operating system sets the values in the priority registers, which can be privileged read and write registers.
  • the values can be updated based on a variety of events, such as context switches.
  • changing software conditions, such as process or thread reprioritization and rescheduling, are thus reflected in hardware resource arbitration decisions, which in turn ensures higher perceived system performance, because higher priority tasks receive preferential access to shared resources.
  • FIG. 1A provides a schematic diagram of an exemplary networked or distributed computing environment
  • FIG. 1B provides a brief general description of a suitable computing device in connection with which the invention may be implemented
  • FIG. 2 illustrates competition among processor cores for shared resources, where the shared resources may be located on the die or off the die;
  • FIG. 3 illustrates shared resource arbitration among processor cores with priority registers, where the priority registers bias an arbiter
  • FIG. 4A illustrates a high level view of a multi-core system, where an arbiter provides access to shared resources
  • FIG. 4B illustrates a detailed view of a multi-core system, where individual thread contexts are assigned priority registers
  • FIG. 5 illustrates typical values assigned to priority registers, and how these values determine shared resource allocation
  • FIG. 6 illustrates the setting of values in priority registers using software, specifically, the operating system, in order to bias shared resource allocation
  • FIG. 7 illustrates an exemplary implementation of the shared resource allocation process.
  • a networked computing environment is set forth as illustrated in FIG. 1A .
  • This networked computing environment is an extension of the basic computing environment illustrated in FIG. 1B , which is suitable to implement the software and/or hardware techniques associated with the invention.
  • FIGS. 2 through 7 illustrate various aspects of the invention.
  • FIG. 2 illustrates processor cores competing for shared resources, whether they are local shared resources or off-die shared resources.
  • FIG. 3 focuses on the arbitration process among the various processors. Such an arbitration process is biased by the values stored in the priority registers.
  • FIGS. 4A and 4B present a high-level and a detailed view, respectively, of certain aspects of the invention.
  • FIG. 4A illustrates how an arbiter arbitrates access to shared resources between two processor cores.
  • FIG. 4B gives a more detailed illustration of a multi-threaded system, with multiple thread contexts, where each thread is tagged by some priority register value.
  • FIG. 5 demonstrates how priority register values bias the arbitration process performed by an arbiter. Thus, specific values are given and the priority relationship between various threads is determined.
  • FIG. 6 illustrates an operating system that can set the priority register values based on context switches or on apparent need.
  • FIG. 7 illustrates an exemplary implementation of the resource allocation process, where the operating system sets the priority register values and the arbiter arbitrates the available shared resources.
  • FIG. 1A provides a schematic diagram of an exemplary networked or distributed computing environment 100 a .
  • the distributed computing environment 100 a comprises computing objects 10 a , 10 b , etc. and computing objects or devices 110 a , 110 b , 110 c , etc. These objects may comprise programs, methods, data stores, programmable logic, etc.
  • the objects may comprise portions of the same or different devices such as PDAs, televisions, MP3 players, personal computers, etc.
  • Each object can communicate with another object by way of the communications network 14 .
  • This network may itself comprise other computing objects and computing devices that provide services to the system of FIG. 1A , and may itself represent multiple interconnected networks.
  • each object 10 a , 10 b , etc. or 110 a , 110 b , 110 c , etc. may contain an application that might make use of an API, or other object, software, firmware and/or hardware, to request use of the processes used to implement the object persistence methods of the present invention.
  • the physical environment depicted may show the connected devices as computers, such illustration is merely exemplary and the physical environment may alternatively be depicted or described comprising various digital devices such as PDAs, televisions, MP3 players, etc., software objects such as interfaces, COM objects and the like.
  • computing systems may be connected together by wired or wireless systems, by local networks or widely distributed networks.
  • networks are coupled to the Internet, which provides the infrastructure for widely distributed computing and encompasses many different networks. Any of the infrastructures may be used for exemplary communications made incident to the present invention.
  • the Internet commonly refers to the collection of networks and gateways that utilize the TCP/IP suite of protocols, which are well-known in the art of computer networking.
  • TCP/IP is an acronym for “Transmission Control Protocol/Internet Protocol.”
  • the Internet can be described as a system of geographically distributed remote computer networks interconnected by computers executing networking protocols that allow users to interact and share information over the network(s). Because of such wide-spread information sharing, remote networks such as the Internet have thus far generally evolved into an open system for which developers can design software applications for performing specialized operations or services, essentially without restriction.
  • the network infrastructure enables a host of network topologies such as client/server, peer-to-peer, or hybrid architectures.
  • the “client” is a member of a class or group that uses the services of another class or group to which it is not related.
  • a client is a process, i.e., roughly a set of instructions or tasks, that requests a service provided by another program.
  • the client process utilizes the requested service without having to “know” any working details about the other program or the service itself.
  • a client/server architecture particularly a networked system
  • a client is usually a computer that accesses shared network resources provided by another computer, e.g., a server.
  • computers 110 a , 110 b , etc. can be thought of as clients and computer 10 a , 10 b , etc. can be thought of as servers, although any computer could be considered a client, a server, or both, depending on the circumstances. Any of these computing devices may be processing data in a manner that implicates the object persistence techniques of the invention.
  • a server is typically a remote computer system accessible over a remote or local network, such as the Internet.
  • the client process may be active in a first computer system, and the server process may be active in a second computer system, communicating with one another over a communications medium, thus providing distributed functionality and allowing multiple clients to take advantage of the information-gathering capabilities of the server.
  • Any software objects utilized pursuant to the persistence mechanism of the invention may be distributed across multiple computing devices.
  • Client(s) and server(s) may communicate with one another utilizing the functionality provided by a protocol layer.
  • HTTP: HyperText Transfer Protocol
  • WWW: World Wide Web
  • a computer network address such as an Internet Protocol (IP) address or other reference such as a Universal Resource Locator (URL) can be used to identify the server or client computers to each other.
  • Communication can be provided over any available communications medium.
  • FIG. 1A illustrates an exemplary networked or distributed environment 100 a , with a server in communication with client computers via a network/bus, in which the present invention may be employed.
  • the network/bus 14 may be a LAN, WAN, intranet, the Internet, or some other network medium, with a number of client or remote computing devices 110 a , 110 b , 110 c , 110 d , 110 e , etc., such as a portable computer, handheld computer, thin client, networked appliance, or other device, such as a VCR, TV, oven, light, heater and the like in accordance with the present invention. It is thus contemplated that the present invention may apply to any computing device in connection with which it is desirable to maintain a persisted object.
  • the servers 10 a , 10 b , etc. can be servers with which the clients 110 a , 110 b , 110 c , 110 d , 110 e , etc. communicate via any of a number of known protocols such as HTTP.
  • Servers 10 a , 10 b , etc. may also serve as clients 110 a , 110 b , 110 c , 110 d , 110 e , etc., as may be characteristic of a distributed computing environment 100 a.
  • Communications may be wired or wireless, where appropriate.
  • Client devices 110 a , 110 b , 110 c , 110 d , 110 e , etc. may or may not communicate via communications network/bus 14 , and may have independent communications associated therewith. For example, in the case of a TV or VCR, there may or may not be a networked aspect to the control thereof.
  • Any computer 10 a , 10 b , 110 a , 110 b , etc. may be responsible for the maintenance and updating of a database, memory, or other storage element 20 for storing data processed according to the invention.
  • the present invention can be utilized in a computer network environment 100 a having client computers 110 a , 110 b , etc. that can access and interact with a computer network/bus 14 and server computers 10 a , 10 b , etc. that may interact with client computers 110 a , 110 b , etc. and other like devices, and databases 20 .
  • FIG. 1B and the following discussion are intended to provide a brief general description of a suitable computing device in connection with which the invention may be implemented.
  • any of the client and server computers or devices illustrated in FIG. 1B may take this form.
  • handheld, portable and other computing devices and computing objects of all kinds are contemplated for use in connection with the present invention, i.e., anywhere from which data may be generated, processed, received and/or transmitted in a computing environment.
  • although a general purpose computer is described below, this is but one example, and the present invention may be implemented with a thin client having network/bus interoperability and interaction.
  • the present invention may be implemented in an environment of networked hosted services in which very little or minimal client resources are implicated, e.g., a networked environment in which the client device serves merely as an interface to the network/bus, such as an object placed in an appliance.
  • the invention can be implemented via an operating system, for use by a developer of services for a device or object, and/or included within application or server software that operates in accordance with the invention.
  • Software may be described in the general context of computer-executable instructions, such as program modules, being executed by one or more computers, such as client workstations, servers or other devices.
  • program modules include routines, programs, objects, components, data structures and the like that perform particular tasks or implement particular abstract data types.
  • the functionality of the program modules may be combined or distributed as desired in various embodiments.
  • the invention may be practiced with other computer system configurations and protocols.
  • Such configurations include personal computers (PCs), automated teller machines, server computers, hand-held or laptop devices, multi-processor systems, microprocessor-based systems, programmable consumer electronics, network PCs, appliances, lights, environmental control elements, minicomputers, mainframe computers, and the like.
  • FIG. 1B thus illustrates an example of a suitable computing system environment 100 b in which the invention may be implemented, although as made clear above, the computing system environment 100 b is only one example of a suitable computing environment and is not intended to suggest any limitation as to the scope of use or functionality of the invention. Neither should the computing environment 100 b be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in the exemplary operating environment 100 b.
  • an exemplary system for implementing the invention includes a general purpose computing device in the form of a computer 110 .
  • Components of computer 110 may include, but are not limited to, a processing unit 120 , a system memory 130 , and a system bus 121 that couples various system components including the system memory to the processing unit 120 .
  • the system bus 121 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures.
  • such architectures include Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus (also known as Mezzanine bus).
  • Computer 110 typically includes a variety of computer readable media.
  • Computer readable media can be any available media that can be accessed by computer 110 and includes both volatile and nonvolatile media, removable and non-removable media.
  • Computer readable media may comprise computer storage media and communication media.
  • Computer storage media include both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data.
  • Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CDROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computer 110 .
  • Communication media typically embody computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and include any information delivery media.
  • modulated data signal means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal.
  • communication media include wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer readable media.
  • the system memory 130 includes computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) 131 and random access memory (RAM) 132 .
  • BIOS: basic input/output system
  • RAM 132 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processing unit 120 .
  • FIG. 1B illustrates operating system 134 , application programs 135 , other program modules 136 , and program data 137 .
  • the computer 110 may also include other removable/non-removable, volatile/nonvolatile computer storage media.
  • FIG. 1B illustrates a hard disk drive 141 that reads from or writes to non-removable, nonvolatile magnetic media, a magnetic disk drive 151 that reads from or writes to a removable, nonvolatile magnetic disk 152 , and an optical disk drive 155 that reads from or writes to a removable, nonvolatile optical disk 156 , such as a CD-RW, DVD-RW or other optical media.
  • removable/non-removable, volatile/nonvolatile computer storage media that can be used in the exemplary operating environment include, but are not limited to, magnetic tape cassettes, flash memory cards, digital versatile disks, digital video tape, solid state RAM, solid state ROM and the like.
  • the hard disk drive 141 is typically connected to the system bus 121 through a non-removable memory interface such as interface 140
  • magnetic disk drive 151 and optical disk drive 155 are typically connected to the system bus 121 by a removable memory interface, such as interface 150 .
  • hard disk drive 141 is illustrated as storing operating system 144 , application programs 145 , other program modules 146 and program data 147 . Note that these components can either be the same as or different from operating system 134 , application programs 135 , other program modules 136 and program data 137 . Operating system 144 , application programs 145 , other program modules 146 and program data 147 are given different numbers here to illustrate that, at a minimum, they are different copies.
  • a user may enter commands and information into the computer 110 through input devices such as a keyboard 162 and pointing device 161 , such as a mouse, trackball or touch pad.
  • Other input devices may include a microphone, joystick, game pad, satellite dish, scanner, or the like.
  • a graphics interface 182 may also be connected to the system bus 121 .
  • One or more graphics processing units (GPUs) 184 may communicate with graphics interface 182 .
  • a monitor 191 or other type of display device is also connected to the system bus 121 via an interface, such as a video interface 190 , which may in turn communicate with video memory 186 .
  • computers may also include other peripheral output devices such as speakers 197 and printer 196 , which may be connected through an output peripheral interface 195 .
  • the computer 110 may operate in a networked or distributed environment using logical connections to one or more remote computers, such as a remote computer 180 .
  • the remote computer 180 may be a personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the computer 110 , although only a memory storage device 181 has been illustrated in FIG. 1B .
  • the logical connections depicted in FIG. 1B include a local area network (LAN) 171 and a wide area network (WAN) 173 , but may also include other networks/buses.
  • Such networking environments are commonplace in homes, offices, enterprise-wide computer networks, intranets and the Internet.
  • When used in a LAN networking environment, the computer 110 is connected to the LAN 171 through a network interface or adapter 170 .
  • When used in a WAN networking environment, the computer 110 typically includes a modem 172 or other means for establishing communications over the WAN 173 , such as the Internet.
  • the modem 172 which may be internal or external, may be connected to the system bus 121 via the user input interface 160 , or other appropriate mechanism.
  • program modules depicted relative to the computer 110 may be stored in the remote memory storage device.
  • FIG. 1B illustrates remote application programs 185 as residing on memory device 181 . It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers may be used.
  • the priority register seeks to bias arbitration for shared resources in favor of one processor core at the expense of another processor core.
  • each core's priority register value is applied, or “tagged,” to the shared resource access signals emanating from that core.
  • the shared resource's arbiter may compare these priority tag values to determine which processor core should receive access to the shared resource.
  • the operating system can set a per core priority value in the register based on context switches or whenever an adjustment is proper. The notion is not to starve lower priority requests but rather to bias higher priority requests in such a way that they receive, say, 75% of the available resources, and the lower requests receive 25%.
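  • This operating-system behavior can be sketched as follows (the register interface is assumed for illustration; the dictionary merely stands in for the privileged per-core registers described here):

```python
PRIORITY_REGISTERS = {}  # stand-in for privileged per-core priority registers

def context_switch(core_id, incoming_thread):
    """Model the OS writing the incoming thread's priority into the
    priority register of the core it is dispatched onto."""
    PRIORITY_REGISTERS[core_id] = incoming_thread["priority"]

# A main user thread gets a 3:1 bias over a housekeeping thread,
# approximating the 75%/25% split described above.
context_switch(0, {"name": "ui_thread", "priority": 3})
context_switch(1, {"name": "housekeeping", "priority": 1})
```

Because the registers are privileged, only the operating system can perform these writes; user code influences them indirectly through thread priorities.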
  • FIG. 2 illustrates one aspect of the invention, where processor cores are located on a common die and have to compete for shared resources 200 .
  • a first processor core A 202 and a second processor core B 204 constitute a dual-core processor design stored on a die 201 .
  • Numerous processor cores can be stored on the die 201 , but for simplicity only two are shown.
  • this aspect of the invention also applies to a multiple dice architecture, where a plurality of processor cores are stored on each die, and multiple dice are operatively connected.
  • each processor core has its own local resources and also shares non-local resources.
  • processor core A 202 has its local resources A 206 but also shares local shared resources 210 .
  • processor core B 204 has its own local resources B 208 but shares with processor core A 202 the shared resources 210 .
  • resources are not limited to on-die (on-chip) resources, namely local resources A 206 , local resources B 208 , and local shared resources 210 , but also include off-die (off-chip) shared resources 212 .
  • Typical local resources, such as resources A 206 and B 208 , include register files, translation lookaside buffers (TLBs), branch predictors, local instruction caches (I-caches), data caches (D-caches), and the like.
  • Typical on-die shared resources, such as local shared resources 210 , include shared on-die L3 caches, the individual cache lines they manage, external memory controllers (also known as external memory channels), shared functional units, and so on.
  • Typical off-die shared resources include external caches and network interface controllers.
  • processor core A 202 can potentially compete with processor core B 204 for the on-chip shared resources 210 and off-die shared resources 212 . Such competition must be arbitrated or scheduled if multiple processor cores are to work in the most productive and efficient manner.
  • FIG. 3 illustrates, in one aspect of the invention, how processor core resource arbitration on a die 301 is implemented 300 .
  • An arbiter 303 takes input from at least two priority registers, in this case, priority register A 306 and priority register B 308 , respectively.
  • Individual register inputs represent some values, either constant or relative values, that let the arbiter 303 know which processor core has higher priority.
  • Each register is associated with a processor core.
  • register A 306 is associated with processor core A 302
  • register B 308 is associated with processor core B 304 .
  • the arbiter 303 decides which processor core will have access to the local shared resources 310 or to the off-die shared resources 312 .
  • the arbiter 303 allocates resources based on the register inputs, namely, register A 306 and register B 308 values.
  • register A 306 has a higher priority value than register B 308
  • processor core A 302 will have priority over processor core B 304 to access the local shared resources 310 or the off-die shared resources 312 .
  • register B 308 has higher priority than register A 306 , then the reverse is true. This is just one exemplary way to distribute resources, and devices equivalent to an arbiter can be used.
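  • The comparison the arbiter performs can be sketched as follows (the function and variable names are illustrative assumptions, not the patent's terminology): among the requesting cores, the one whose priority register holds the larger value is granted access.

```python
def arbitrate(requesting_cores, priority_registers):
    """Grant the shared resource to the requesting core whose
    priority-register value is highest."""
    return max(requesting_cores, key=lambda core: priority_registers[core])

# Register A holds the higher value, so core A wins the grant.
registers = {"coreA": 2, "coreB": 1}
winner = arbitrate(["coreA", "coreB"], registers)
```

If register B instead held the higher value, the same comparison would grant access to processor core B.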
  • each resource has its own arbiter.
  • a local shared resource has an arbiter and an off-die resource has an arbiter.
  • Each arbiter can sit on top of its respective resource and evaluate priority values that are tagged to resource access signals emanating from processor cores sharing a resource.
  • the off-die resources 312 can have their own arbiter (not shown). More specifically, each resource can have its own arbiter.
  • the illustrated off-die resources 312 are an abstraction and in fact may comprise any number of individual off-die resources, each of which can have its own arbiter.
  • the arbiter 303 with priority register A 306 and priority register B 308 affords biased resource arbitration. This means that the arbitration process is no longer necessarily fair and neutral but rather favors one processor core over another processor core. As mentioned above, this biasing is accomplished through the priority registers, where the registers could be privileged read and write registers that can be loaded with priorities.
  • FIG. 4A illustrates a high-level view of an aspect of the present invention 400 a , and it should be compared to FIG. 4B , as it is described below.
  • the priority registers 406 and 408 tag resource request signals emanating from the corresponding processor cores A 402 and B 404 , respectively, with certain priority values. Based on these values, the arbiter 410 decides the amount of access processor core A 402 will have and the amount of access that processor core B 404 will have to the available shared resources 412 .
  • the amount of access can be biased such that, for example, processor core A 402 will have access 75% of the time and processor core B 404 will have access 25% of the time.
  • at any given instant, however, it may be that processor core A 402 will have 100% of access to the shared resources 412 or that processor core B 404 will have 100% of access to the shared resources 412 .
  • over a period of time, processor core A 402 will have had 75% of the accesses to the shared resources 412 and processor core B 404 will have had 25% of the accesses to the shared resources 412 .
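The 75%/25% split described above can be illustrated with a small sketch, assuming a deterministic credit scheme. The patent does not prescribe any particular arbitration algorithm; the 3:1 weighting, algorithm, and names here are illustrative.

```python
# Illustrative credit-based sketch of biased arbitration: at any single
# contention exactly one core is granted access (100% at that instant),
# but over many contentions the grants settle at the weight ratio, e.g.
# 3:1 for a 75%/25% split. The algorithm and names are assumptions.

def biased_grants(weight_a: int, weight_b: int, n_contentions: int) -> dict:
    """Count grants to each core over a series of contended accesses."""
    credits = {"core_a": 0, "core_b": 0}
    grants = {"core_a": 0, "core_b": 0}
    for _ in range(n_contentions):
        credits["core_a"] += weight_a           # each core earns credit per round
        credits["core_b"] += weight_b
        winner = max(credits, key=credits.get)  # most accumulated credit wins
        credits[winner] = 0                     # the winner spends its credit
        grants[winner] += 1
    return grants

print(biased_grants(3, 1, 100))   # → {'core_a': 75, 'core_b': 25}
```

Note that neither core is starved: the lower-weight core still accumulates credit and periodically wins a grant.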
  • priority values in the registers are not fixed and can be changed by the operating system according to need.
  • an operating system may host a number of processes. Each process may consist of several (software) threads that typically share the process's resources, including address space, but which have distinct copies of a processor's “architected state,” e.g., program counter and register file, as well as some memory designated for a subroutine call stack and other software “per-thread state.”
  • One responsibility of the operating system is to schedule these processes' software threads—any given thread may be assigned to a given processor core.
  • a given core may contain a plurality of thread contexts. Each thread context may store one thread's architected state (as described above).
  • a given process's threads may be scheduled to run on several processor cores, or on several thread contexts (i.e. hardware threads) within a single processor core in a chip multiprocessor.
  • what FIG. 4B illustrates, especially in contrast to FIG. 4A , is processor cores with a plurality of hardware thread contexts 400 b .
  • a processor core hosts one software thread at a time.
  • a multithreaded core may host a multitude of software threads via its multitude of hardware thread contexts.
  • processor core A 402 has three thread contexts, namely, thread context A 403 , thread context B 405 , and thread context C 407 .
  • Each of these thread contexts has its corresponding priority register, register A 415 , register B 417 , and register C 419 , respectively.
  • Resource access requests that result from computations on each thread context are tagged with a software thread's priority register value.
  • priority registers may be set as follows: { A 1 : 10 ; A 2 : 20 ; B 1 : 15 ; B 2 : 25 }.
  • processor core A 402 running thread A 2 issues a request R 1 (A) with priority PR(A 2 ) while simultaneously processor core B 404 running thread B 2 issues a request R 1 (B) with priority PR(B 2 ) to the same (now contended) resource R 1
  • R 1 's arbiter ARB(R 1 ) compares PR(A 2 ) and PR(B 2 ) to help determine, with bias, which request ought to be granted and which denied.
  • PR(A 2 ) is less than PR(B 2 ) and so there is a greater likelihood that ARB(R 1 ) will grant access to request R 1 (B) that came from core B running thread B 2 .
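The biased, rather than absolute, nature of this comparison can be sketched as a weighted random choice. This modeling choice is an assumption (the text only says the comparison helps determine the outcome "with bias"); the priority values follow the example above, and all names are illustrative.

```python
# Sketch of the tagged-request comparison at ARB(R1). Priority values follow
# the example in the text: {A1: 10, A2: 20, B1: 15, B2: 25}. The patent
# describes a bias (a greater likelihood of a grant), modeled here as a
# weighted random choice; the modeling choice and names are assumptions.

import random

PRIORITY = {"A1": 10, "A2": 20, "B1": 15, "B2": 25}

def arbitrate_tagged(thread_x: str, thread_y: str, rng: random.Random) -> str:
    """Grant the contended resource with probability proportional to the tags."""
    px, py = PRIORITY[thread_x], PRIORITY[thread_y]
    return thread_x if rng.random() < px / (px + py) else thread_y

rng = random.Random(0)  # fixed seed so the bias is reproducible
wins_b2 = sum(arbitrate_tagged("A2", "B2", rng) == "B2" for _ in range(10_000))
# PR(A2) = 20 < PR(B2) = 25, so B2 is expected to win roughly 25/45 ≈ 56%
# of the contended accesses; A2 still wins the remainder (no starvation).
```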
  • thread A 506 has a priority of 1, which is stored in its corresponding register 518 ; thread B 508 has an assigned priority of 2 stored in its corresponding register 520 ; and, thread C 510 has an assigned priority of 4 stored in its corresponding register 522 .
  • when a given access to the shared resources 512 is contended, thread C 510 will enjoy, for example, more favorable arbiter outcomes (i.e., granted accesses) than thread A 506 . Over a series of 70 contended accesses, for example, there might be 40 grants to thread C 510 versus 20 grants to thread B 508 versus 10 grants to thread A 506 . This is just one example. A difference in priority register values may be weighted differently by different arbiters over different shared resources.
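The 40/20/10 grant distribution above can be reproduced with a proportional-share sketch, here a stride-style scheduler. This is an assumption about mechanism; the patent does not specify how an arbiter weights priority values, and the thread names are illustrative.

```python
# Proportional-share sketch (stride-style): each thread's pass value advances
# by big/priority on every grant, and the smallest pass wins the next
# contention. With priorities {A: 1, B: 2, C: 4}, 70 contended accesses
# yield 10/20/40 grants, matching the example in the text.

def contended_grants(priorities: dict, n: int) -> dict:
    """Grant n contended accesses in proportion to priority register values."""
    big = 1
    for p in priorities.values():
        big *= p                              # common multiple keeps strides integral
    stride = {t: big // p for t, p in priorities.items()}
    passes = dict(stride)                     # each thread starts one stride in
    grants = {t: 0 for t in priorities}
    for _ in range(n):
        winner = min(passes, key=passes.get)  # smallest pass value is granted
        passes[winner] += stride[winner]      # higher priority => smaller stride => more grants
        grants[winner] += 1
    return grants

print(contended_grants({"thread_a": 1, "thread_b": 2, "thread_c": 4}, 70))
# → {'thread_a': 10, 'thread_b': 20, 'thread_c': 40}
```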
  • the arbiter 503 can decide access to the shared resources 512 not only among the threads on processor core A 502 but also among the threads on processor core B 504 , or among both. For instance, just as in the example given above with respect to processor core A 502 , access to the shared resources 512 can be determined for thread D 512 with an assigned priority of 3, thread E 514 with an assigned priority of 7, and thread F 516 with an assigned priority of 11. Additionally, the arbiter 503 can decide among all six threads 506 , 508 , 510 , 512 , 514 , and 516 as to the priority of each with respect to the others.
  • a priority register may be a vector of priority registers, with different values for different resource categories.
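Such a vector of priorities might be sketched as follows; the resource-category names and values are illustrative assumptions, not the patent's.

```python
# Sketch of a priority register organized as a vector, holding a separate
# priority value per shared-resource category. The categories and values
# are illustrative assumptions.

priority_vector = {
    "l3_cache": 20,        # favor on-die cache bandwidth for this thread
    "memory_channel": 5,   # lower priority for off-die memory accesses
    "functional_unit": 10,
}

def tag_for(category: str) -> int:
    """Priority tag attached to a request against the given resource category."""
    return priority_vector[category]

print(tag_for("l3_cache"), tag_for("memory_channel"))   # → 20 5
```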
  • in FIG. 6 , software is used to bias priority registers 600 .
  • the system illustrated in FIG. 6 is similar to the system illustrated in FIG. 4A : FIG. 6 shows a processor core A 602 , a corresponding priority register A 606 , an arbiter 603 determining access to shared resources 608 , and a processor core B 604 competing with processor core A 602 for the shared resources 608 , where processor core B 604 has its corresponding priority register B 610 .
  • An operating system 612 sets priority register A 606 to a certain value and priority register B 610 to a certain value.
  • each such value may be a constant or a value relative to other threads' priorities. As just described, these values will subsequently help bias outcomes amongst contended shared hardware resources as the threads issue work against those resources.
  • the operating system 612 determines each scheduled thread's priority register value as a function of operating system notions of process and thread priority, themselves determined to optimize interactive response times, throughput, power management, or other concerns.
  • user application software hosted by the operating system may issue an operating system call to adjust its thread(s) priorities. Additionally, in some other aspects, user application software may be permitted to read or modify its thread's priority register directly.
  • priority biasing by the operating system will typically not starve lower priority threads, but rather will bias higher priority threads toward a disproportionate ratio of access to the shared resources. For example, given two threads, the higher priority thread might get 60% of accesses to the contended resources and the lower priority thread might get the remaining 40%. But the lower priority thread would not be starved to the point that it would get no use of the shared resources, although such a scenario is not precluded (depending upon the design of the priority register values and of each shared resource arbiter).
  • FIG. 7 displays a simple implementation process of the invention 700 .
  • an operating system schedules, that is, determines, assignments of runnable software threads to cores or hardware thread contexts within cores. Then, at step 704 , the operating system determines what hardware priority register values to assign to each such thread.
  • the operating system performs context switching, which may suspend some of the currently scheduled threads. It then initializes each core or hardware thread context with that software thread's thread state, including its program counter, register file, and its hardware thread priority register.
  • each thread runs for an interval of time, i.e., a time slice, perhaps for a few milliseconds or a few million processor core cycles.
  • each processor core or each thread within each core may issue many thousands of resource requests to on-chip resources.
  • Each such request is automatically accompanied, i.e., tagged, with that thread's priority register value, which is a mechanism implemented in hardware.
  • at step 710 , when accesses to a shared resource happen to be contended, the priority tags of the agents that initiated the accesses are compared, and this may bias the arbitration outcome.
  • at step 712 , after some period of time, in response to some event (time slice expiration, interrupt, system call, or other event), the operating system may reprioritize its software thread priorities and may then update the hardware thread priority registers to reflect these new priorities. Alternatively, after some period of time, the operating system may elect to reschedule its runnable software threads, assigning a different set of software threads to hardware cores or thread contexts. In the end, this process returns to step 702 and begins again.
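The scheduling portion of this flow can be sketched as a single pass; the class and field names below are illustrative assumptions, not the patent's interfaces.

```python
# Sketch of steps 702-708 of FIG. 7: the OS assigns software threads to
# hardware contexts, loads each context's priority register from the
# thread's OS-level priority, and each context's resource requests then
# carry that register's value as a tag. All names are assumptions.

from dataclasses import dataclass

@dataclass
class HardwareContext:
    priority_register: int = 0

@dataclass
class SoftwareThread:
    name: str
    os_priority: int

def schedule_quantum(threads, contexts):
    """One pass: assign threads, load priority registers, tag requests."""
    assignment = list(zip(contexts, threads))        # step 702: schedule threads
    for ctx, thread in assignment:
        ctx.priority_register = thread.os_priority   # steps 704-706: load registers
    return [                                         # step 708: requests carry the tag
        (thread.name, ctx.priority_register)
        for ctx, thread in assignment
    ]

contexts = [HardwareContext(), HardwareContext()]
threads = [SoftwareThread("ui", 20), SoftwareThread("housekeeping", 5)]
print(schedule_quantum(threads, contexts))  # → [('ui', 20), ('housekeeping', 5)]
```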

Abstract

A priority register is provided for each of the multiple processor cores of a chip multiprocessor, where the priority registers store values that are used to bias resources available to the multiple processor cores. Even though such multiple processor cores have their own local resources, they must compete for shared resources. These shared resources may be stored on the chip or off the chip. The priority register biases the arbitration process that arbitrates access to or ongoing use of the shared resources based on the values stored in the priority registers. The way it accomplishes such biasing is by tagging operations issued from the multiple processor cores with the priority values, and then comparing the values within each arbiter of the shared resources.

Description

    COPYRIGHT NOTICE AND PERMISSION
  • A portion of the disclosure of this patent document may contain material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all copyright rights whatsoever. The following notice shall apply to this document: Copyright© 2004, Microsoft Corp.
  • FIELD OF THE INVENTION
  • The present invention relates generally to multicore processors. More particularly, the present invention relates to priority registers for biasing allocation of or access to shared resources, and still more particularly, to registers that store priority values which bias an arbiter performing arbitration among a plurality of processor cores competing for the shared resources.
  • BACKGROUND
  • Over the last several decades, transistors have been getting smaller and faster. For example, today a typical consumer microprocessor can contain over 100 million transistors subsisting on a die no bigger than a hundred square millimeters. At the same time, it can handle clock speeds in the range of 3 GHz.
  • With this increasing miniaturization, heat and power budgets are becoming more crucial. The peak power consumption of a microprocessor has soared to well over 100 watts in recent times as chipmakers have increased clock speeds, and thermal densities in excess of 100 W/cm2 are approaching practical limits.
  • Another concern in current microprocessors is the relatively slow connection between the processor and main memory. A typical processor runs several hundred times faster than information can be fetched from memory, so that a processor waits an eternity, relatively speaking, for data to arrive from memory.
  • One way to resolve this problem is to employ on-chip memory caches and instruction-level parallelism to keep the processors busy on one set of instructions while other instructions are waiting for data to arrive. However, even this instruction-level parallelism is approaching its limits, because an exponential growth in transistors and power is required to achieve a modest improvement in instruction-level parallelism.
  • One solution to these problems is to exploit parallelism by dividing a processing chip into multiple cores. For example, a hypothetical notebook processor might have eight cores, where a program customized for such a chip could present many threads of execution, each running simultaneously on a different core.
  • In such a multicore system, each core will have its own local resources, such as register files, branch predictors, and local caches, and it will also share resources with other cores, such as on-die L3 caches, memory channels, and possibly shared functional units. Such shared resources may need to be arbitrated not only in a fair and neutral way, as in the case of balanced parallel software codes, but also in a biased manner, as when some cores are running main user computation threads while other cores are running lower priority or “housekeeping” threads. It would be advantageous to provide a mechanism for software to manage, influence, or bias arbitration of such shared resources among a plurality of cores running threads of differing performance requirements.
  • SUMMARY OF THE INVENTION
  • In a multi-core system, processor cores located on a die typically have their own local computing resources, but they may also share resources with other processor cores on the die. In one aspect of the invention, a first priority register is provided, where the first priority register corresponds to a first processor core on the die. The die may contain a second processor core which has its own corresponding second priority register. Values are stored in both the first priority register and the second priority register. These values are then used to bias the access of the two processor cores to shared resources, so that the first processor core accesses shared resources at the expense of the second processor core, or vice versa.
  • In another aspect of the invention, the priority registers bias an arbiter allocating access to shared resources. Specifically, the priority registers tag shared resource access signals emanating from processor cores. Threads running on the processor cores with higher tagged values receive proportionately more access to shared resources than threads with lower tagged values. For example, one thread could receive 75% of the shared resources while a lower priority thread would receive the remaining 25% of shared resources. Values stored in the priority registers can be constant or can be relative to other priority registers.
  • In another aspect of the invention, the operating system sets the values in the priority registers, which can be privileged read and write registers. The values can be updated based on a variety of events, such as context switches. Thus, changing software conditions (such as process or thread reprioritization and rescheduling) can directly influence hardware resource arbitration decisions, which in turn ensures higher perceived system performance, because higher priority tasks receive preferential access to shared resources.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The foregoing summary, as well as the following detailed description of the invention, is better understood when read in conjunction with the appended drawings. In order to illustrate the invention, exemplary embodiments are shown depicting various aspects of the invention. However, the invention is not limited to the specific systems and methods disclosed. The following figures are included:
  • FIG. 1A provides a schematic diagram of an exemplary networked or distributed computing environment;
  • FIG. 1B provides a brief general description of a suitable computing device in connection with which the invention may be implemented;
  • FIG. 2 illustrates competition among processor cores for shared resources, where the shared resources may be located on the die or off the die;
  • FIG. 3 illustrates shared resource arbitration among processor cores with priority registers, where the priority registers bias an arbiter;
  • FIG. 4A illustrates a high level view of a multi-core system, where an arbiter provides access to shared resources;
  • FIG. 4B illustrates a detailed view of a multi-core system, where individual thread contexts are assigned priority registers;
  • FIG. 5 illustrates typical values assigned to priority registers, and how these values determine shared resource allocation;
  • FIG. 6 illustrates the setting of values in priority registers using software, specifically, the operating system, in order to bias shared resource allocation; and
  • FIG. 7 illustrates an exemplary implementation of the shared resource allocation process.
  • DETAILED DESCRIPTION OF ILLUSTRATIVE ASPECTS OF THE INVENTION
  • Certain specific details are set forth in the following description and figures to provide a thorough understanding of various aspects of the invention. Certain well-known details often associated with computing and software technology are not set forth in the following disclosure, however, to avoid unnecessarily obscuring the various aspects of the invention. Further, those of ordinary skill in the relevant art will understand that they can practice other aspects of the invention without one or more of the details described below. Finally, while various methods are described with reference to steps and sequences in the following disclosure, the description as such is for providing a clear implementation of aspects of the invention, and the steps and sequences of steps should not be taken as required to practice this invention.
  • The following detailed description will generally reflect the summary of the invention, as set forth above, further explaining and expanding the definitions of the various aspects of the invention as necessary. The first part of the description is dedicated to exploring the exemplary computing and network environments. To this end, a networked computing environment is set forth as illustrated in FIG. 1A. This networked computing environment is an extension of the basic computing environment illustrated in FIG. 1B, which is suitable to implement the software and/or hardware techniques associated with the invention.
  • The second part of the description is dedicated to exploring priority registers for biasing shared resource arbitration. To this end, FIGS. 2 through 7 illustrate various aspects of the invention. FIG. 2 illustrates processor cores competing for shared resources, whether they are local shared resources or off-die shared resources. FIG. 3 focuses on the arbitration process among the various processors. Such an arbitration process is biased by the values stored in the priority registers. FIGS. 4A and 4B present a high-level and a detailed view, respectively, of certain aspects of the invention. FIG. 4A illustrates how an arbiter arbitrates between two processor cores access to shared resources. FIG. 4B gives a more detailed illustration of a multi-threaded system, with multiple thread contexts, where each thread is tagged by some priority register value. FIG. 5 demonstrates how priority register values bias the arbitration process performed by an arbiter. Thus, specific values are given and the priority relationship between various threads is determined. In one interesting aspect of the invention, FIG. 6 illustrates an operating system that can set the priority register values based on context switches or on apparent need. Finally, FIG. 7 illustrates an exemplary implementation of the resource allocation process, where the operating system sets the priority register values and the arbiter arbitrates the available shared resources.
  • Exemplary Computing Environment
  • FIG. 1A provides a schematic diagram of an exemplary networked or distributed computing environment 100 a. The distributed computing environment 100 a comprises computing objects 10 a, 10 b, etc. and computing objects or devices 110 a, 110 b, 110 c, etc. These objects may comprise programs, methods, data stores, programmable logic, etc. The objects may comprise portions of the same or different devices such as PDAs, televisions, MP3 players, personal computers, etc. Each object can communicate with another object by way of the communications network 14. This network may itself comprise other computing objects and computing devices that provide services to the system of FIG. 1A, and may itself represent multiple interconnected networks. In accordance with an aspect of the invention, each object 10 a, 10 b, etc. or 110 a, 110 b, 110 c, etc. may contain an application that might make use of an API, or other object, software, firmware and/or hardware, to request use of the processes used to implement the object persistence methods of the present invention.
  • It can also be appreciated that an object, such as 110 c, may be hosted on another computing device 10 a, 10 b, etc. or 110 a, 110 b, etc. Thus, although the physical environment depicted may show the connected devices as computers, such illustration is merely exemplary and the physical environment may alternatively be depicted or described comprising various digital devices such as PDAs, televisions, MP3 players, etc., software objects such as interfaces, COM objects and the like.
  • There are a variety of systems, components, and network configurations that support distributed computing environments. For example, computing systems may be connected together by wired or wireless systems, by local networks or widely distributed networks. Currently, many of the networks are coupled to the Internet, which provides the infrastructure for widely distributed computing and encompasses many different networks. Any of the infrastructures may be used for exemplary communications made incident to the present invention.
  • The Internet commonly refers to the collection of networks and gateways that utilize the TCP/IP suite of protocols, which are well-known in the art of computer networking. TCP/IP is an acronym for “Transmission Control Protocol/Internet Protocol.” The Internet can be described as a system of geographically distributed remote computer networks interconnected by computers executing networking protocols that allow users to interact and share information over the network(s). Because of such wide-spread information sharing, remote networks such as the Internet have thus far generally evolved into an open system for which developers can design software applications for performing specialized operations or services, essentially without restriction.
  • Thus, the network infrastructure enables a host of network topologies such as client/server, peer-to-peer, or hybrid architectures. The “client” is a member of a class or group that uses the services of another class or group to which it is not related. Thus, in computing, a client is a process, i.e., roughly a set of instructions or tasks, that requests a service provided by another program. The client process utilizes the requested service without having to “know” any working details about the other program or the service itself. In a client/server architecture, particularly a networked system, a client is usually a computer that accesses shared network resources provided by another computer, e.g., a server. In the example of FIG. 1A, computers 110 a, 110 b, etc. can be thought of as clients and computer 10 a, 10 b, etc. can be thought of as servers, although any computer could be considered a client, a server, or both, depending on the circumstances. Any of these computing devices may be processing data in a manner that implicates the object persistence techniques of the invention.
  • A server is typically a remote computer system accessible over a remote or local network, such as the Internet. The client process may be active in a first computer system, and the server process may be active in a second computer system, communicating with one another over a communications medium, thus providing distributed functionality and allowing multiple clients to take advantage of the information-gathering capabilities of the server. Any software objects utilized pursuant to the persistence mechanism of the invention may be distributed across multiple computing devices.
  • Client(s) and server(s) may communicate with one another utilizing the functionality provided by a protocol layer. For example, HyperText Transfer Protocol (HTTP) is a common protocol that is used in conjunction with the World Wide Web (WWW), or “the Web.” Typically, a computer network address such as an Internet Protocol (IP) address or other reference such as a Universal Resource Locator (URL) can be used to identify the server or client computers to each other. The network address can be referred to as a URL address. Communication can be provided over any available communications medium.
  • Thus, FIG. 1A illustrates an exemplary networked or distributed environment 100 a, with a server in communication with client computers via a network/bus, in which the present invention may be employed. The network/bus 14 may be a LAN, WAN, intranet, the Internet, or some other network medium, with a number of client or remote computing devices 110 a, 110 b, 110 c, 110 d, 110 e, etc., such as a portable computer, handheld computer, thin client, networked appliance, or other device, such as a VCR, TV, oven, light, heater and the like in accordance with the present invention. It is thus contemplated that the present invention may apply to any computing device in connection with which it is desirable to maintain a persisted object.
  • In a network environment 100 a in which the communications network/bus 14 is the Internet, for example, the servers 10 a, 10 b, etc. can be servers with which the clients 110 a, 110 b, 110 c, 110 d, 110 e, etc. communicate via any of a number of known protocols such as HTTP. Servers 10 a, 10 b, etc. may also serve as clients 110 a, 110 b, 110 c, 110 d, 110 e, etc., as may be characteristic of a distributed computing environment 100 a.
  • Communications may be wired or wireless, where appropriate. Client devices 110 a, 110 b, 110 c, 110 d, 110 e, etc. may or may not communicate via communications network/bus 14, and may have independent communications associated therewith. For example, in the case of a TV or VCR, there may or may not be a networked aspect to the control thereof. Each client computer 110 a, 110 b, 110 c, 110 d, 110 e, etc. and server computer 10 a, 10 b, etc. may be equipped with various application program modules or objects 135 and with connections or access to various types of storage elements or objects, across which files or data streams may be stored or to which portion(s) of files or data streams may be downloaded, transmitted or migrated. Any computer 10 a, 10 b, 110 a, 110 b, etc. may be responsible for the maintenance and updating of a database, memory, or other storage element 20 for storing data processed according to the invention. Thus, the present invention can be utilized in a computer network environment 100 a having client computers 110 a, 110 b, etc. that can access and interact with a computer network/bus 14 and server computers 10 a, 10 b, etc. that may interact with client computers 110 a, 110 b, etc. and other like devices, and databases 20.
  • FIG. 1B and the following discussion are intended to provide a brief general description of a suitable computing device in connection with which the invention may be implemented. For example, any of the client and server computers or devices illustrated in FIG. 1B may take this form. It should be understood, however, that handheld, portable and other computing devices and computing objects of all kinds are contemplated for use in connection with the present invention, i.e., anywhere from which data may be generated, processed, received and/or transmitted in a computing environment. While a general purpose computer is described below, this is but one example, and the present invention may be implemented with a thin client having network/bus interoperability and interaction. Thus, the present invention may be implemented in an environment of networked hosted services in which very little or minimal client resources are implicated, e.g., a networked environment in which the client device serves merely as an interface to the network/bus, such as an object placed in an appliance. In essence, anywhere that data may be stored or from which data may be retrieved or transmitted to another computer is a desirable, or suitable, environment for operation of the object persistence methods of the invention.
  • Although not required, the invention can be implemented via an operating system, for use by a developer of services for a device or object, and/or included within application or server software that operates in accordance with the invention. Software may be described in the general context of computer-executable instructions, such as program modules, being executed by one or more computers, such as client workstations, servers or other devices. Generally, program modules include routines, programs, objects, components, data structures and the like that perform particular tasks or implement particular abstract data types. Typically, the functionality of the program modules may be combined or distributed as desired in various embodiments. Moreover, the invention may be practiced with other computer system configurations and protocols. Other well known computing systems, environments, and/or configurations that may be suitable for use with the invention include, but are not limited to, personal computers (PCs), automated teller machines, server computers, hand-held or laptop devices, multi-processor systems, microprocessor-based systems, programmable consumer electronics, network PCs, appliances, lights, environmental control elements, minicomputers, mainframe computers and the like.
  • FIG. 1B thus illustrates an example of a suitable computing system environment 100 b in which the invention may be implemented, although as made clear above, the computing system environment 100 b is only one example of a suitable computing environment and is not intended to suggest any limitation as to the scope of use or functionality of the invention. Neither should the computing environment 100 b be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in the exemplary operating environment 100 b.
  • With reference to FIG. 1B, an exemplary system for implementing the invention includes a general purpose computing device in the form of a computer 110. Components of computer 110 may include, but are not limited to, a processing unit 120, a system memory 130, and a system bus 121 that couples various system components including the system memory to the processing unit 120. The system bus 121 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus (also known as Mezzanine bus).
  • Computer 110 typically includes a variety of computer readable media. Computer readable media can be any available media that can be accessed by computer 110 and includes both volatile and nonvolatile media, removable and non-removable media. By way of example, and not limitation, computer readable media may comprise computer storage media and communication media. Computer storage media include both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CDROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computer 110. Communication media typically embody computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and include any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media include wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer readable media.
  • The system memory 130 includes computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) 131 and random access memory (RAM) 132. A basic input/output system 133 (BIOS), containing the basic routines that help to transfer information between elements within computer 110, such as during start-up, is typically stored in ROM 131. RAM 132 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processing unit 120. By way of example, and not limitation, FIG. 1B illustrates operating system 134, application programs 135, other program modules 136, and program data 137.
  • The computer 110 may also include other removable/non-removable, volatile/nonvolatile computer storage media. By way of example only, FIG. 1B illustrates a hard disk drive 141 that reads from or writes to non-removable, nonvolatile magnetic media, a magnetic disk drive 151 that reads from or writes to a removable, nonvolatile magnetic disk 152, and an optical disk drive 155 that reads from or writes to a removable, nonvolatile optical disk 156, such as a CD-RW, DVD-RW or other optical media. Other removable/non-removable, volatile/nonvolatile computer storage media that can be used in the exemplary operating environment include, but are not limited to, magnetic tape cassettes, flash memory cards, digital versatile disks, digital video tape, solid state RAM, solid state ROM and the like. The hard disk drive 141 is typically connected to the system bus 121 through a non-removable memory interface such as interface 140, and magnetic disk drive 151 and optical disk drive 155 are typically connected to the system bus 121 by a removable memory interface, such as interface 150.
  • The drives and their associated computer storage media discussed above and illustrated in FIG. 1B provide storage of computer readable instructions, data structures, program modules and other data for the computer 110. In FIG. 1B, for example, hard disk drive 141 is illustrated as storing operating system 144, application programs 145, other program modules 146 and program data 147. Note that these components can either be the same as or different from operating system 134, application programs 135, other program modules 136 and program data 137. Operating system 144, application programs 145, other program modules 146 and program data 147 are given different numbers here to illustrate that, at a minimum, they are different copies. A user may enter commands and information into the computer 110 through input devices such as a keyboard 162 and pointing device 161, such as a mouse, trackball or touch pad. Other input devices (not shown) may include a microphone, joystick, game pad, satellite dish, scanner, or the like. These and other input devices are often connected to the processing unit 120 through a user input interface 160 that is coupled to the system bus 121, but may be connected by other interface and bus structures, such as a parallel port, game port or a universal serial bus (USB). A graphics interface 182 may also be connected to the system bus 121. One or more graphics processing units (GPUs) 184 may communicate with graphics interface 182. A monitor 191 or other type of display device is also connected to the system bus 121 via an interface, such as a video interface 190, which may in turn communicate with video memory 186. In addition to monitor 191, computers may also include other peripheral output devices such as speakers 197 and printer 196, which may be connected through an output peripheral interface 195.
  • The computer 110 may operate in a networked or distributed environment using logical connections to one or more remote computers, such as a remote computer 180. The remote computer 180 may be a personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the computer 110, although only a memory storage device 181 has been illustrated in FIG. 1B. The logical connections depicted in FIG. 1B include a local area network (LAN) 171 and a wide area network (WAN) 173, but may also include other networks/buses. Such networking environments are commonplace in homes, offices, enterprise-wide computer networks, intranets and the Internet.
  • When used in a LAN networking environment, the computer 110 is connected to the LAN 171 through a network interface or adapter 170. When used in a WAN networking environment, the computer 110 typically includes a modem 172 or other means for establishing communications over the WAN 173, such as the Internet. The modem 172, which may be internal or external, may be connected to the system bus 121 via the user input interface 160, or other appropriate mechanism. In a networked environment, program modules depicted relative to the computer 110, or portions thereof, may be stored in the remote memory storage device. By way of example, and not limitation, FIG. 1B illustrates remote application programs 185 as residing on memory device 181. It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers may be used.
  • Aspects of the Priority Register
  • The priority register seeks to bias arbitration for shared resources in favor of one processor core at the expense of another processor core. Specifically, in a multi-core architecture, each core's priority register value is applied, or “tagged,” to the shared resource access signals emanating from that core. In the event multiple cores contend for a shared resource, the shared resource's arbiter may compare these priority tag values to determine which processor core should receive access to the shared resource. The operating system can set a per core priority value in the register based on context switches or whenever an adjustment is proper. The notion is not to starve lower priority requests but rather to bias higher priority requests in such a way that they receive, say, 75% of the available resources, and the lower requests receive 25%.
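  • By way of illustration and not limitation, the tagging of access signals can be modeled in a few lines of Python. The `Core` and `AccessRequest` names are hypothetical; in the invention the copying of the register value onto each outgoing request is performed in hardware, not software:

```python
from dataclasses import dataclass

# Hypothetical model of a shared resource access signal: every request
# a core emits carries that core's current priority register value.
@dataclass(frozen=True)
class AccessRequest:
    core_id: str
    address: int
    priority_tag: int  # copied from the issuing core's priority register

class Core:
    def __init__(self, core_id, priority=0):
        self.core_id = core_id
        # Set by the operating system, e.g. on a context switch.
        self.priority_register = priority

    def issue(self, address):
        # Tagging happens in hardware; here it is just a field copy.
        return AccessRequest(self.core_id, address, self.priority_register)

core_a = Core("A", priority=20)
req = core_a.issue(0x1000)
```

The arbiter for a contended resource would then compare the `priority_tag` fields of the competing requests.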
  • FIG. 2 illustrates one aspect of the invention, where processor cores are located on a common die and have to compete for shared resources 200. In FIG. 2, a first processor core A 202 and a second processor core B 204 constitute a dual-core processor design stored on a die 201. Numerous processor cores can be stored on the die 201, but for simplicity only two are shown. Likewise, this aspect of the invention also applies to a multiple dice architecture, where a plurality of processor cores are stored on each die, and multiple dice are operatively connected.
  • In FIG. 2, each processor core has its own local resources and also shares non-local resources. Specifically, processor core A 202 has its local resources A 206 but also shares local shared resources 210. Similarly, processor core B 204 has its own local resources B 208 but shares with processor core A 202 the shared resources 210. Moreover, resources are not limited to on-die (on-chip) resources, namely local resources A 206, local resources B 208, and local shared resources 210, but also include off-die (off-chip) shared resources 212. Typical local resources, such as resources A 206 and B 208, include register files, translation lookaside buffers (TLBs), branch predictors, local instruction caches (I-caches), data caches (D-caches), and the like. Typical on-die shared resources, such as local shared resources 210, include shared on-die L3 caches, the individual cache lines they manage, external memory controllers (also known as external memory channels), shared functional units, and so on. Typical off-die shared resources include external caches and network interface controllers.
  • While the local resources are dedicated to individual processor cores, the shared resources are not so dedicated, and thus multiple processor cores can compete for such shared resources. In FIG. 2, processor core A 202 can potentially compete with processor core B 204 for the shared on-chip shared resources 210 and off-die shared resources 212. Such competition must be arbitrated or scheduled if multiple processor cores are to work in the most productive and efficient manner.
  • In existing computer systems, resource contention among competing processor cores is typically arbitrated in a fair and neutral way, which is not inappropriate for balanced parallel software code. However, there are cases where it is desired that the arbitration be biased in such a way that one processor core running user computations at a higher priority be afforded more resources than one running computations at a lower priority. Referring to FIG. 2, it may be more desirable in some cases that processor core A 202 receive more of the shared resources 210 than processor core B 204, since processor core A 202 is running higher priority computations. In short, it is desirable to bias the arbitration of a myriad of resources in favor of one processor core at the expense of another processor core.
  • Biasing the arbitration of resources can be accomplished through priority registers. FIG. 3 illustrates, in one aspect of the invention, how processor core resource arbitration on a die 301 is implemented 300. An arbiter 303 takes input from at least two priority registers, in this case, priority register A 306 and priority register B 308, respectively. Individual register inputs represent some values, either constant or relative values, that let the arbiter 303 know which processor core has higher priority. Each register is associated with a processor core. For example, register A 306 is associated with processor core A 302, and register B 308 is associated with processor core B 304. There is one priority register dedicated to each processor core in the illustrated aspect of the invention. However, the registers do not have to be so limited, as for example, multiple registers could be assigned to each process running multiple threads.
  • The arbiter 303 decides which processor core will have access to the local shared resources 310 or to the off-die shared resources 312. In one aspect of the invention, the arbiter 303 allocates resources based on the register inputs, namely, register A 306 and register B 308 values. Thus, if register A 306 has a higher priority value than register B 308, processor core A 302 will have priority over processor core B 304 to access the local shared resources 310 or the off-die shared resources 312. If, on the other hand, register B 308 has higher priority than register A 306, then the reverse is true. This is just one exemplary way to distribute resources, and equivalent devices to an arbiter can be used.
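  • A minimal sketch of this comparison policy, assuming each request arrives as a (core, register value) pair. Real arbiters are combinational hardware, and the strict highest-value-wins rule shown here is only one possible policy:

```python
def arbitrate(request_a, request_b):
    """Grant whichever request carries the higher priority register value.

    Each request is a (core_id, priority_value) pair; ties go to the
    first requester in this simplified sketch.
    """
    core_a, value_a = request_a
    core_b, value_b = request_b
    return core_b if value_b > value_a else core_a

winner = arbitrate(("A", 7), ("B", 3))  # register A holds the higher value
```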
  • Additionally, in another aspect of the invention, each resource has its own arbiter. Thus, a local shared resource has an arbiter and an off-die resource has an arbiter. Each arbiter can sit on top of its respective resource and evaluate priority values that are tagged to resource access signals emanating from processor cores sharing a resource. Thus, in FIG. 3, even though for simplicity's sake only one arbiter 303 is shown, namely, the arbiter 303 for the local shared on-die resources 310, the off-die resources 312 can have their own arbiter (not shown). More specifically, each resource can have its own arbiter. Thus, the illustrated off-die resources 312 are an abstraction and in fact may comprise any number of individual off-die resources, each of which can have its own arbiter.
  • As mentioned above, existing computer systems disburse resources among competing processors in a fair and neutral way. In contrast to the fair and neutral scheme, the arbiter 303 with priority register A 306 and priority register B 308 affords biased resource arbitration. This means that the arbitration process is no longer necessarily fair and neutral but rather favors one processor core over another processor core. As mentioned above, this biasing is accomplished through the priority registers, where the registers could be privileged read and write registers that can be loaded with priorities.
  • FIG. 4A illustrates a high-level view of an aspect of the present invention 400 a, and it should be compared with FIG. 4B, described below. There is a priority register A 406 within processor core A 402 and a priority register B 408 within processor core B 404. The priority registers 406 and 408 tag resource request signals emanating from the corresponding processor cores A 402 and B 404, respectively, with certain priority values. Based on these values, the arbiter 410 decides the amount of access processor core A 402 will have and the amount of access that processor core B 404 will have to the available shared resources 412. The amount of access can be biased such that, for example, processor core A 402 will have access 75% of the time and processor core B 404 will have access 25% of the time. However, based on any one given outcome of resource arbitration, it might be the case that processor core A 402 will have 100% of access to the shared resources 412 or that processor core B 404 will have 100% of access to the shared resources 412. But over numerous such contended-access arbitrations, processor core A 402 will have had 75% of the shared resources 412 and processor core B 404 will have had 25% of the shared resources 412. Moreover, it should be noted that such priority values in the registers are not fixed and can be changed by the operating system according to need.
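  • The 75%/25% long-run behavior can be simulated with a lottery-style arbiter. This is an assumed policy used for illustration only; the invention does not prescribe how an arbiter converts tag values into grant ratios:

```python
import random

def lottery_arbitrate(tag_a, tag_b, rng):
    """Each contended access is granted wholly to one core, with the
    winner drawn in proportion to the two priority tags."""
    return "A" if rng.random() < tag_a / (tag_a + tag_b) else "B"

rng = random.Random(42)  # fixed seed for a reproducible simulation
outcomes = [lottery_arbitrate(75, 25, rng) for _ in range(10_000)]
share_a = outcomes.count("A") / len(outcomes)
# Any single arbitration is all-or-nothing, but over many contended
# accesses core A converges toward 75% of the grants.
```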
  • By way of background, an operating system may host a number of processes. Each process may consist of several (software) threads that typically share the process's resources, including address space, but which have distinct copies of a processor's “architected state”, e.g. program counter and register file, as well as some memory designated for a subroutine call stack, and other software “per-thread state.” One responsibility of the operating system is to schedule these processes' software threads—any given thread may be assigned to a given processor core. In some computer architectures, a given core may contain a plurality of thread contexts. Each thread context may store one thread's architected state (as described above). There may or may not be a correlation between the software threads belonging to one operating system-managed software process and the processor cores or thread contexts within processor cores: a given process's threads may be scheduled to run on several processor cores, or on several thread contexts (i.e. hardware threads) within a single processor core in a chip multiprocessor.
  • What FIG. 4B illustrates, especially in contrast to FIG. 4A, is processor cores with a plurality of hardware thread contexts 400 b. Typically, a processor core hosts one software thread at a time. But, a multithreaded core may host a multitude of software threads via its multitude of hardware thread contexts. Thus, processor core A 402 has three thread contexts, namely, thread context A 403, thread context B 405, and thread context C 407. Each of these thread contexts has its corresponding priority register, register A 415, register B 417, and register C 419, respectively. Resource access requests that result from computations on each thread context are tagged with a software thread's priority register value.
  • For example, when two threads A1 and A2 (not shown in FIG. 4B) are scheduled to two thread contexts, for example, thread context A 403 and thread context B 405, in processor core A 402, and two threads B1 and B2 are scheduled to two thread contexts, for example, thread context D 409 and thread context E 411, in processor core B 404, their priority registers may be set as follows: {A1: 10; A2: 20; B1: 15; B2: 25}. When processor core A 402 running thread A2 issues a request R1(A) with priority PR(A2) while simultaneously processor core B 404 running thread B2 issues a request R1(B) with priority PR(B2) to the same (now contended) resource R1, R1's arbiter ARB(R1) compares PR(A2) and PR(B2) to help determine, with bias, which request ought to be granted and which denied. Here PR(A2) is less than PR(B2) and so there is a greater likelihood that ARB(R1) will grant access to request R1(B) that came from core B running thread B2.
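  • The example above can be expressed directly. The proportional grant probability below is an assumption adopted for concreteness; the invention only requires that ARB(R1) compare PR(A2) and PR(B2) and bias the outcome:

```python
# Priority register values for the four scheduled threads, as in the text.
priority_registers = {"A1": 10, "A2": 20, "B1": 15, "B2": 25}

def grant_probability(tag, other_tag):
    """One plausible biased policy: grant probability proportional to
    the requesting thread's tag relative to its competitor's tag."""
    return tag / (tag + other_tag)

# Thread A2 on core A and thread B2 on core B contend for resource R1.
p_b2 = grant_probability(priority_registers["B2"], priority_registers["A2"])
p_a2 = grant_probability(priority_registers["A2"], priority_registers["B2"])
# PR(A2) = 20 < PR(B2) = 25, so B2's request is the likelier grant.
```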
  • Priority to shared resources based on priority register values is explicitly illustrated in FIG. 5 to supplement FIG. 4B discussed directly above. In processor core A 502, thread A 506 has a priority of 1, which is stored in its corresponding register 518; thread B 508 has an assigned priority of 2 stored in its corresponding register 520; and, thread C 510 has an assigned priority of 4 stored in its corresponding register 522.
  • Thus, when a given access to shared resource 512 is contended, thread C 510 will enjoy, for example, more favorable arbiter outcomes (i.e. granted access) than thread A 506. Over a series of 70 contended accesses, for example, there might be 40 grants to thread C 510 versus 20 grants to thread B 508 versus 10 grants to thread A 506. This is just one example. A difference in priority register values may be weighted differently by different arbiters over different shared resources. Given two priority value tags PR1 and PR2, one arbiter might choose to always grant the request to PR1 whenever PR1>PR2; another may use a more sophisticated approach which takes into account PR1, PR2, and history of past arbitration decisions to provide enough fairness that PR2 is not starved for access indefinitely. The values of 1, 2, and 4 stored in priority registers 518, 520, and 522, respectively, are just illustrative of the kinds of values that may be used, and are not limited to being multiples of each other or to any absolute values. Rather, the values may reflect any kind of relationship of access times of the threads that is desired.
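  • The 40/20/10 split over 70 contended accesses, with no thread starved indefinitely, is exactly what a credit-based (weighted round-robin) arbiter produces. The scheme below is one assumed implementation; as noted above, the invention leaves the fairness mechanism to each arbiter's design:

```python
def credit_arbiter(tags, rounds):
    """Each round, every contender earns credit equal to its priority
    register value; the contender with the most credit wins the access
    and pays back the per-round total.  Grant counts converge to the
    ratio of the tag values, and no contender is starved indefinitely."""
    credit = {t: 0 for t in tags}
    grants = {t: 0 for t in tags}
    per_round = sum(tags.values())
    for _ in range(rounds):
        for t, value in tags.items():
            credit[t] += value
        winner = max(credit, key=credit.get)  # ties go to insertion order
        credit[winner] -= per_round
        grants[winner] += 1
    return grants

# Threads A, B, C with register values 1, 2, 4, over 70 contended accesses:
grants = credit_arbiter({"A": 1, "B": 2, "C": 4}, 70)
```

Over the 70 rounds this yields exactly 10, 20, and 40 grants respectively, matching the example in the text.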
  • Moreover, the arbiter 503 can decide access to the shared resources 512 not only among the threads on processor core A 502 but also among the threads on processor core B 504, or among all of them together. For instance, just as in the example given above with respect to processor core A 502, the access to the shared resources 512 can be determined for thread D 512 with an assigned priority of 3, thread E 514 with an assigned priority of 7, and thread F 516 with an assigned priority of 11. Additionally, the arbiter 503 can decide among all six threads 506, 508, 510, 512, 514, and 516 as to the priority of each with respect to the other. Thus, for example, thread C 510 with a stored value of 4 will receive less access to the shared resources 512 than thread F 516 with a stored value of 11. Although discrete atomic values are considered here, in full generality, a priority register may be a vector of priority registers, with different values for different resource categories.
  • Next, in another aspect of the invention, in FIG. 6, software is used to bias priority registers 600. The system illustrated in FIG. 6 is similar to the system illustrated in FIG. 4A, where FIG. 6 shows a processor core A 602, a corresponding priority register A 606, an arbiter 603 determining access to shared resources 608, and a processor core B 604 competing with processor core A 602 for the shared resources 608, where processor core B 604 has its corresponding priority register B 610.
  • An operating system 612 sets priority register A 606 to a certain value and priority register B 610 to a certain value. This value, as mentioned with reference to FIG. 5, is either a constant value or a value relative to other threads' priorities. As just described, these values will subsequently help bias outcomes amongst contended shared hardware resources as the threads issue work against these resources. The operating system 612 determines each scheduled thread's priority register value as a function of operating system notions of process and thread priority, themselves determined to optimize interactive response times, throughput, power management, or other concerns.
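  • A sketch of how an operating system might derive a register value from its own notion of thread priority, assuming a 5-bit hardware register. The register width, the boost term, and the clamping policy are all hypothetical:

```python
REGISTER_BITS = 5  # assumed width of the hardware priority register

def register_value_for(os_priority, interactive_boost=0):
    """Clamp the OS-level priority (plus any boost granted for
    interactive response) into the range the hardware register can
    hold; the OS would write this value on each context switch."""
    value = os_priority + interactive_boost
    return max(0, min(value, (1 << REGISTER_BITS) - 1))
```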
  • Additionally, in some aspects of the invention, user application software hosted by the operating system may issue an operating system call to adjust its thread(s) priorities. Additionally, in some other aspects, user application software may be permitted to read or modify its thread's priority register directly.
  • To reiterate, priority biasing by the operating system will typically not starve lower priority threads, but rather will bias higher priority threads toward a disproportionate ratio of access to the shared resources. For example, given two threads, the higher priority thread might get 60% of accesses to the contended resources and the lower priority thread might get the remaining 40% of accesses. But the lower priority thread would not be starved to the point that it would not get any use of the shared resources—although such a scenario is not precluded (depending upon the design of priority register values and upon the design of each shared resource arbiter).
  • FIG. 7 displays a simple implementation process of the invention 700. At step 702, an operating system schedules, that is, determines, assignments of runnable software threads to cores or hardware thread contexts within cores. Then, at step 704, the operating system determines what hardware priority register values to assign to each such thread.
  • At step 706, the operating system performs context switching, which may suspend some of the currently scheduled threads. It then initializes each core or hardware thread context with that software thread's thread state, including its program counter, register file, and its hardware thread priority register.
  • At step 708, each thread runs for an interval of time, i.e., a time slice, perhaps for a few milliseconds or a few million processor core cycles. During this time, each processor core or each thread within each core may issue many thousands of resource requests to on-chip resources. Each such request is automatically accompanied, i.e., tagged, with that thread's priority register value, which is a mechanism implemented in hardware.
  • At step 710, when access to a shared resource happens to be contended, the priority tags of the agents that initiated the access are compared, and this may bias the arbitration outcome. At step 712, after some period of time, in response to some event (time slice expiration, interrupt, system call, or other event), the operating system may reprioritize its software thread priorities and may then update the hardware thread priority registers to reflect these new priorities. Alternatively, after some period of time, the operating system may elect to reschedule its runnable software threads, assigning a different set of software threads to hardware cores or thread contexts. In the end, this process returns to step 702 and begins again.
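  • The interval covered by steps 708 through 710 can be sketched as a single function: every request is tagged with its thread's register value, and each contended access is resolved with a bias toward the higher tag. The lottery policy and the concrete numbers are assumptions for illustration only:

```python
import random

def run_time_slice(threads, registers, contended_accesses, rng):
    """Simulate one time slice: each contended access is granted to one
    thread, drawn in proportion to the competing priority tags."""
    grants = {t: 0 for t in threads}
    total = sum(registers[t] for t in threads)
    for _ in range(contended_accesses):
        pick = rng.uniform(0, total)
        for t in threads:
            pick -= registers[t]
            if pick <= 0:
                grants[t] += 1
                break
    return grants

rng = random.Random(7)
registers = {"T1": 30, "T2": 10}          # step 704: OS-assigned values
grants = run_time_slice(["T1", "T2"], registers, 1000, rng)
# Step 712: on a later event the OS may rewrite the registers, e.g.
# registers["T2"] = 30, before the next slice begins.
```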
  • While the present invention has been described in connection with the preferred aspects, as illustrated in the various figures, it is understood that other similar aspects may be used or modifications and additions may be made to the described aspects for performing the same function of the present invention without deviating therefrom. For example, a priority register storing priority values was described, where these values subsequently tag resource access signals emanating from the processor cores to bias shared resource arbitration. However, other equivalent devices to the priority register are also contemplated by the teachings herein. Therefore, the present invention should not be limited to any single aspect, but rather construed in breadth and scope in accordance with the appended claims.

Claims (20)

1. A multi-core computer system, comprising:
a first processor core on a die and a second processor core on the die;
a first priority register associated with the first processor core and a second priority register associated with the second processor core; and
an arbiter circuit operatively coupled to the first processor core and the second processor core, wherein the arbiter circuit allocates access to a shared resource available to the first processor core and the second processor core based on a first value stored in the first priority register and a second value stored in the second priority register.
2. The computer system according to claim 1, wherein the first processor core on the die is operatively coupled to a third processor core on a different die.
3. The computer system according to claim 1, wherein the shared resource is located on the die.
4. The computer system according to claim 3, wherein the shared resource includes at least one of L3 caches, external memory controllers, and shared functional units.
5. The computer system according to claim 1, wherein the shared resource is located off the die.
6. The computer system according to claim 5, wherein the shared resource includes at least one of external caches and network interface controllers.
7. The computer system according to claim 1, wherein at least one of the first priority register and the second priority register is a privileged read and write register.
8. The computer system according to claim 1, further comprising a computer operating system, wherein at least one of the first priority value and the second priority value is set by the computer operating system.
9. The computer system according to claim 8, wherein the operating system sets the first priority value and the second priority value based on a context switch or an apparent need.
10. The computer system according to claim 1, wherein at least the first processor core has a first thread context and a second thread context, wherein the first priority register is associated with the first thread context, and wherein a third priority register is associated with the second thread context.
11. A method of biasing shared resource arbitration, comprising:
setting a priority value to a first register, wherein the priority value is tagged to an operation of a first processor residing on a die;
setting a priority value to a second register, wherein the priority value of the second register is tagged to an operation of a second processor residing on the die; and
biasing arbitration to a resource shared by the first processor and the second processor, wherein the biasing is based on the comparison of the priority value of the first register to the priority value of the second register.
12. The method of biasing shared resource arbitration according to claim 11, further comprising setting at least one of the priority value of the first register and the priority value of the second register with a new value.
13. The method of biasing shared resource arbitration according to claim 11, wherein the arbitration is performed by an arbiter.
14. The method of biasing shared resource arbitration according to claim 11, wherein the setting of the first register priority value and the setting of the second register priority value is performed by an operating system.
15. The method of biasing shared resource arbitration according to claim 14, wherein the operating system sets the first register priority value and the second register priority value based on a context switch or an apparent need.
16. The method of biasing shared resource arbitration according to claim 11, wherein the shared resources are accessible on the die.
17. The method of biasing shared resource arbitration according to claim 16, wherein the shared resources accessible on the die include L3 caches, external memory controllers, and shared functional units.
18. The method of biasing shared resource arbitration according to claim 11, wherein the shared resources are accessible off the die.
19. The method of biasing shared resource arbitration according to claim 18, wherein the shared resources accessible off the die include at least one of external caches and network interface controllers.
20. The method of biasing shared resource arbitration according to claim 11, wherein at least the first processor core has a first thread context and a second thread context, wherein the first priority register is associated with the first thread context, and wherein a third priority register is associated with the second thread context.
US11/051,148 2005-02-04 2005-02-04 Priority registers for biasing access to shared resources Expired - Fee Related US7380038B2 (en)

Publications (2)

Publication Number Publication Date
US20060179196A1 2006-08-10
US7380038B2 US7380038B2 (en) 2008-05-27

US8413153B2 (en) * 2009-06-12 2013-04-02 Freescale Semiconductor Inc. Methods and systems for sharing common job information
US8984198B2 (en) * 2009-07-21 2015-03-17 Microchip Technology Incorporated Data space arbiter
US20110055482A1 (en) * 2009-08-28 2011-03-03 Broadcom Corporation Shared cache reservation
US10698859B2 (en) 2009-09-18 2020-06-30 The Board Of Regents Of The University Of Texas System Data multicasting with router replication and target instruction identification in a distributed multi-core processing architecture
WO2011159309A1 (en) 2010-06-18 2011-12-22 The Board Of Regents Of The University Of Texas System Combined branch target and predicate prediction
US8949836B2 (en) * 2011-04-01 2015-02-03 International Business Machines Corporation Transferring architected state between cores
US8977795B1 (en) * 2011-10-27 2015-03-10 Marvell International Ltd. Method and apparatus for preventing multiple threads of a processor from accessing, in parallel, predetermined sections of source code
KR101810468B1 (en) * 2011-11-28 2017-12-19 엘지전자 주식회사 Mobile terminal and control method thereof
US9104478B2 (en) 2012-06-15 2015-08-11 Freescale Semiconductor, Inc. System and method for improved job processing of a number of jobs belonging to communication streams within a data processor
US9286118B2 (en) 2012-06-15 2016-03-15 Freescale Semiconductor, Inc. System and method for improved job processing to reduce contention for shared resources
US9141168B2 (en) 2012-08-17 2015-09-22 Hewlett-Packard Development Company L.P. Operation mode of processor
US9632977B2 (en) 2013-03-13 2017-04-25 Nxp Usa, Inc. System and method for ordering packet transfers in a data processor
US9792252B2 (en) 2013-05-31 2017-10-17 Microsoft Technology Licensing, Llc Incorporating a spatial array into one or more programmable processor cores
US9940136B2 (en) 2015-06-26 2018-04-10 Microsoft Technology Licensing, Llc Reuse of decoded instructions
US10409599B2 (en) 2015-06-26 2019-09-10 Microsoft Technology Licensing, Llc Decoding information about a group of instructions including a size of the group of instructions
US10169044B2 (en) 2015-06-26 2019-01-01 Microsoft Technology Licensing, Llc Processing an encoding format field to interpret header information regarding a group of instructions
US9952867B2 (en) 2015-06-26 2018-04-24 Microsoft Technology Licensing, Llc Mapping instruction blocks based on block size
US9946548B2 (en) 2015-06-26 2018-04-17 Microsoft Technology Licensing, Llc Age-based management of instruction blocks in a processor instruction window
US9720693B2 (en) 2015-06-26 2017-08-01 Microsoft Technology Licensing, Llc Bulk allocation of instruction blocks to a processor instruction window
US10191747B2 (en) 2015-06-26 2019-01-29 Microsoft Technology Licensing, Llc Locking operand values for groups of instructions executed atomically
US10175988B2 (en) 2015-06-26 2019-01-08 Microsoft Technology Licensing, Llc Explicit instruction scheduler state information for a processor
US11204871B2 (en) * 2015-06-30 2021-12-21 Advanced Micro Devices, Inc. System performance management using prioritized compute units
US9886081B2 (en) 2015-09-16 2018-02-06 Qualcomm Incorporated Managing power-down modes
US10776115B2 (en) 2015-09-19 2020-09-15 Microsoft Technology Licensing, Llc Debug support for block-based processor
US10768936B2 (en) 2015-09-19 2020-09-08 Microsoft Technology Licensing, Llc Block-based processor including topology and control registers to indicate resource sharing and size of logical processor
US11126433B2 (en) 2015-09-19 2021-09-21 Microsoft Technology Licensing, Llc Block-based processor core composition register
US10180840B2 (en) 2015-09-19 2019-01-15 Microsoft Technology Licensing, Llc Dynamic generation of null instructions
US11681531B2 (en) 2015-09-19 2023-06-20 Microsoft Technology Licensing, Llc Generation and use of memory access instruction order encodings
US10936316B2 (en) 2015-09-19 2021-03-02 Microsoft Technology Licensing, Llc Dense read encoding for dataflow ISA
US10095519B2 (en) 2015-09-19 2018-10-09 Microsoft Technology Licensing, Llc Instruction block address register
US10061584B2 (en) 2015-09-19 2018-08-28 Microsoft Technology Licensing, Llc Store nullification in the target field
US11016770B2 (en) 2015-09-19 2021-05-25 Microsoft Technology Licensing, Llc Distinct system registers for logical processors
US10719321B2 (en) 2015-09-19 2020-07-21 Microsoft Technology Licensing, Llc Prefetching instruction blocks
US10678544B2 (en) 2015-09-19 2020-06-09 Microsoft Technology Licensing, Llc Initiating instruction block execution using a register access instruction
US10871967B2 (en) 2015-09-19 2020-12-22 Microsoft Technology Licensing, Llc Register read/write ordering
US10452399B2 (en) 2015-09-19 2019-10-22 Microsoft Technology Licensing, Llc Broadcast channel architectures for block-based processors
US20170083327A1 (en) 2015-09-19 2017-03-23 Microsoft Technology Licensing, Llc Implicit program order
US10198263B2 (en) 2015-09-19 2019-02-05 Microsoft Technology Licensing, Llc Write nullification
US10031756B2 (en) 2015-09-19 2018-07-24 Microsoft Technology Licensing, Llc Multi-nullification
US11106467B2 (en) 2016-04-28 2021-08-31 Microsoft Technology Licensing, Llc Incremental scheduler for out-of-order block ISA processors
US11531552B2 (en) 2017-02-06 2022-12-20 Microsoft Technology Licensing, Llc Executing multiple programs simultaneously on a processor core
US10963379B2 (en) 2018-01-30 2021-03-30 Microsoft Technology Licensing, Llc Coupling wide memory interface to wide write back paths
US10824429B2 (en) 2018-09-19 2020-11-03 Microsoft Technology Licensing, Llc Commit logic and precise exceptions in explicit dataflow graph execution architectures

Patent Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5339443A (en) * 1991-11-19 1994-08-16 Sun Microsystems, Inc. Arbitrating multiprocessor accesses to shared resources
US5870560A (en) * 1995-12-29 1999-02-09 Bull Hn Information Systems Italia S.P.A. Arbitraion unit with round-robin priority, particularly for multiprocessor systems with syncronous symmetrical processors
US5884051A (en) * 1997-06-13 1999-03-16 International Business Machines Corporation System, methods and computer program products for flexibly controlling bus access based on fixed and dynamic priorities
US6279066B1 (en) * 1997-11-14 2001-08-21 Agere Systems Guardian Corp. System for negotiating access to a shared resource by arbitration logic in a shared resource negotiator
US20020062427A1 (en) * 2000-08-21 2002-05-23 Gerard Chauvel Priority arbitration based on current task and MMU
US20020083251A1 (en) * 2000-08-21 2002-06-27 Gerard Chauvel Task based priority arbitration
US6684280B2 (en) * 2000-08-21 2004-01-27 Texas Instruments Incorporated Task based priority arbitration
US20020057711A1 (en) * 2000-11-15 2002-05-16 Nguyen Duy Q. External bus arbitration technique for multicore DSP device
US7006521B2 (en) * 2000-11-15 2006-02-28 Texas Instruments Inc. External bus arbitration technique for multicore DSP device
US6910212B2 (en) * 2000-12-04 2005-06-21 International Business Machines Corporation System and method for improved complex storage locks
US20020078119A1 (en) * 2000-12-04 2002-06-20 International Business Machines Corporation System and method for improved complex storage locks
US20020116438A1 (en) * 2000-12-22 2002-08-22 Steven Tu Method and apparatus for shared resource management in a multiprocessing system
US20060136925A1 (en) * 2000-12-22 2006-06-22 Steven Tu Method and apparatus for shared resource management in a multiprocessing system
US20030145144A1 (en) * 2002-01-30 2003-07-31 International Business Machines Corporation N-way pseudo cross-bar using discrete processor local busses
US7051133B2 (en) * 2002-11-25 2006-05-23 Renesas Technology Corp. Arbitration circuit and data processing system
US20060037020A1 (en) * 2004-08-12 2006-02-16 International Business Machines Corporation Scheduling threads in a multiprocessor computer
US20060064695A1 (en) * 2004-09-23 2006-03-23 Burns David W Thread livelock unit

Cited By (61)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060155903A1 (en) * 2005-01-13 2006-07-13 Matsushita Electric Industrial Co., Ltd. Resource management device
US8732368B1 (en) 2005-02-17 2014-05-20 Hewlett-Packard Development Company, L.P. Control system for resource selection between or among conjoined-cores
US9003168B1 (en) * 2005-02-17 2015-04-07 Hewlett-Packard Development Company, L. P. Control system for resource selection between or among conjoined-cores
US20070067531A1 (en) * 2005-08-22 2007-03-22 Pasi Kolinummi Multi-master interconnect arbitration with time division priority circulation and programmable bandwidth/latency allocation
US20080022283A1 (en) * 2006-07-19 2008-01-24 International Business Machines Corporation Quality of service scheduling for simultaneous multi-threaded processors
US20080229321A1 (en) * 2006-07-19 2008-09-18 International Business Machines Corporation Quality of service scheduling for simultaneous multi-threaded processors
US8869153B2 (en) 2006-07-19 2014-10-21 International Business Machines Corporation Quality of service scheduling for simultaneous multi-threaded processors
US20080071947A1 (en) * 2006-09-14 2008-03-20 Fischer Matthew L Method of balancing I/O device interrupt service loading in a computer system
US9032127B2 (en) * 2006-09-14 2015-05-12 Hewlett-Packard Development Company, L.P. Method of balancing I/O device interrupt service loading in a computer system
US20090070560A1 (en) * 2006-09-26 2009-03-12 Dan Meng Method and Apparatus for Accelerating the Access of a Multi-Core System to Critical Resources
WO2008043295A1 (en) * 2006-09-26 2008-04-17 Hangzhou H3C Technologies Co., Ltd. Method and device for increasing speed of accessing critical resource by multi-core system
US8190857B2 (en) 2006-09-26 2012-05-29 Hangzhou H3C Technologies, Co., Ltd Deleting a shared resource node after reserving its identifier in delete pending queue until deletion condition is met to allow continued access for currently accessing processor
US7698528B2 (en) 2007-06-28 2010-04-13 Microsoft Corporation Shared memory pool allocation during media rendering
WO2009006016A3 (en) * 2007-06-28 2009-04-23 Microsoft Corp Digital data management using shared memory pool
US20090006771A1 (en) * 2007-06-28 2009-01-01 Microsoft Corporation Digital data management using shared memory pool
US20090100220A1 (en) * 2007-10-15 2009-04-16 Elpida Memory, Inc. Memory system, control method thereof and computer system
US8245232B2 (en) 2007-11-27 2012-08-14 Microsoft Corporation Software-configurable and stall-time fair memory access scheduling mechanism for shared memory systems
US20090138670A1 (en) * 2007-11-27 2009-05-28 Microsoft Corporation software-configurable and stall-time fair memory access scheduling mechanism for shared memory systems
US7844779B2 (en) 2007-12-13 2010-11-30 International Business Machines Corporation Method and system for intelligent and dynamic cache replacement management based on efficient use of cache for individual processor core
US20090157970A1 (en) * 2007-12-13 2009-06-18 International Business Machines Corporation Method and system for intelligent and dynamic cache replacement management based on efficient use of cache for individual processor core
US20090217280A1 (en) * 2008-02-21 2009-08-27 Honeywell International Inc. Shared-Resource Time Partitioning in a Multi-Core System
US20090249352A1 (en) * 2008-03-25 2009-10-01 Hohensee Paul H Resource Utilization Monitor
US8683483B2 (en) * 2008-03-25 2014-03-25 Oracle America, Inc. Resource utilization monitor
US8904394B2 (en) * 2009-06-04 2014-12-02 International Business Machines Corporation System and method for controlling heat dissipation through service level agreement analysis by modifying scheduled processing jobs
US20100313203A1 (en) * 2009-06-04 2010-12-09 International Business Machines Corporation System and method to control heat dissipitation through service level analysis
US10606643B2 (en) 2009-06-04 2020-03-31 International Business Machines Corporation System and method to control heat dissipation through service level analysis
US10592284B2 (en) 2009-06-04 2020-03-17 International Business Machines Corporation System and method to control heat dissipation through service level analysis
US10073717B2 (en) 2009-06-04 2018-09-11 International Business Machines Corporation System and method to control heat dissipation through service level analysis
US10073716B2 (en) 2009-06-04 2018-09-11 International Business Machines Corporation System and method to control heat dissipation through service level analysis
US9219657B2 (en) 2009-06-04 2015-12-22 International Business Machines Corporation System and method to control heat dissipation through service level analysis
US9442767B2 (en) 2009-06-04 2016-09-13 International Business Machines Corporation System and method to control heat dissipation through service level analysis
US9442768B2 (en) 2009-06-04 2016-09-13 International Business Machines Corporation System and method to control heat dissipation through service level analysis
US20130174173A1 (en) * 2009-08-11 2013-07-04 Clarion Co., Ltd. Data processor and data processing method
US9176771B2 (en) * 2009-08-11 2015-11-03 Clarion Co., Ltd. Priority scheduling of threads for applications sharing peripheral devices
US20130132708A1 (en) * 2010-07-27 2013-05-23 Fujitsu Limited Multi-core processor system, computer product, and control method
US20140052961A1 (en) * 2011-02-17 2014-02-20 Martin Vorbach Parallel memory systems
US10031888B2 (en) * 2011-02-17 2018-07-24 Hyperion Core, Inc. Parallel memory systems
WO2012145416A1 (en) * 2011-04-20 2012-10-26 Marvell World Trade Ltd. Variable length arbitration
US9507742B2 (en) 2011-04-20 2016-11-29 Marvell World Trade Ltd. Variable length arbitration
CN103620568A (en) * 2011-04-20 2014-03-05 马维尔国际贸易有限公司 Variable length arbitration
US9104485B1 (en) 2011-10-28 2015-08-11 Amazon Technologies, Inc. CPU sharing techniques
US8935699B1 (en) * 2011-10-28 2015-01-13 Amazon Technologies, Inc. CPU sharing techniques
US9934195B2 (en) 2011-12-21 2018-04-03 Mediatek Sweden Ab Shared resource digital signal processors
US20130318268A1 (en) * 2012-05-22 2013-11-28 Xockets IP, LLC Offloading of computation for rack level servers and corresponding methods and systems
US20140165196A1 (en) * 2012-05-22 2014-06-12 Xockets IP, LLC Efficient packet handling, redirection, and inspection using offload processors
US20130318280A1 (en) * 2012-05-22 2013-11-28 Xockets IP, LLC Offloading of computation for rack level servers and corresponding methods and systems
US20150074378A1 (en) * 2013-09-06 2015-03-12 Futurewei Technologies, Inc. System and Method for an Asynchronous Processor with Heterogeneous Processors
US10133578B2 (en) * 2013-09-06 2018-11-20 Huawei Technologies Co., Ltd. System and method for an asynchronous processor with heterogeneous processors
US20170046202A1 (en) * 2014-04-30 2017-02-16 Huawei Technologies Co.,Ltd. Computer, control device, and data processing method
US10572309B2 (en) * 2014-04-30 2020-02-25 Huawei Technologies Co., Ltd. Computer system, and method for processing multiple application programs
US11507420B2 (en) 2015-06-11 2022-11-22 Honeywell International Inc. Systems and methods for scheduling tasks using sliding time windows
US10346168B2 (en) 2015-06-26 2019-07-09 Microsoft Technology Licensing, Llc Decoupled processor instruction window and operand buffer
US10409606B2 (en) 2015-06-26 2019-09-10 Microsoft Technology Licensing, Llc Verifying branch targets
US11755484B2 (en) 2015-06-26 2023-09-12 Microsoft Technology Licensing, Llc Instruction block allocation
US10908955B2 (en) 2018-03-22 2021-02-02 Honeywell International Inc. Systems and methods for variable rate limiting of shared resource access
CN109542625A (en) * 2018-11-29 2019-03-29 郑州云海信息技术有限公司 A kind of storage resource control method, device and electronic equipment
US11210104B1 (en) * 2020-09-11 2021-12-28 Apple Inc. Coprocessor context priority
US20220083343A1 (en) * 2020-09-11 2022-03-17 Apple Inc. Coprocessor Context Priority
US11768690B2 (en) * 2020-09-11 2023-09-26 Apple Inc. Coprocessor context priority
US20220206862A1 (en) * 2020-12-25 2022-06-30 Intel Corporation Autonomous and extensible resource control based on software priority hint
US11341071B1 (en) * 2021-04-20 2022-05-24 Dell Products L.P. Arbitrating serial bus access

Also Published As

Publication number Publication date
US7380038B2 (en) 2008-05-27

Similar Documents

Publication Publication Date Title
US7380038B2 (en) Priority registers for biasing access to shared resources
US8443373B2 (en) Efficient utilization of idle resources in a resource manager
US7036123B2 (en) System using fair-share scheduling technique to schedule processes within each processor set based on the number of shares assigned to each process group
TWI407373B (en) Resource management in a multicore architecture
US6272517B1 (en) Method and apparatus for sharing a time quantum
Reda et al. Rein: Taming tail latency in key-value stores via multiget scheduling
US7487317B1 (en) Cache-aware scheduling for a chip multithreading processor
Pyarali et al. Techniques for enhancing real-time CORBA quality of service
US10013264B2 (en) Affinity of virtual processor dispatching
US20070079021A1 (en) Selective I/O prioritization by system process/thread and foreground window identification
US20080229319A1 (en) Global Resource Allocation Control
JP2010044784A (en) Scheduling request in system
CN109917705B (en) Multi-task scheduling method
US11422858B2 (en) Linked workload-processor-resource-schedule/processing-system—operating-parameter workload performance system
Minaeva et al. Scalable and efficient configuration of time-division multiplexed resources
Singh et al. A non-database operations aware priority ceiling protocol for hard real-time database systems
KR20140111834A (en) Method and system for scheduling computing
CN115408117A (en) Coroutine operation method and device, computer equipment and storage medium
Etsion et al. Process prioritization using output production: scheduling for multimedia
Kang et al. Priority-driven spatial resource sharing scheduling for embedded graphics processing units
KR100757791B1 (en) Shared resource arbitration method and apparatus
KR100651722B1 (en) Method of configuring Linux kernel for supporting real time performance and test method for supporting real time performance
Lagerstrom et al. PScheD Political scheduling on the CRAY T3E
CN108009074B (en) Multi-core system real-time evaluation method based on model and dynamic analysis
JPWO2011114533A1 (en) Multi-core processor system, control program, and control method

Legal Events

Date Code Title Description
AS Assignment

Owner name: MICROSOFT CORPORATION, WASHINGTON

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:GRAY, JAN STEPHEN;REEL/FRAME:015727/0186

Effective date: 20050204

STCF Information on status: patent grant

Free format text: PATENTED CASE

FPAY Fee payment

Year of fee payment: 4

AS Assignment

Owner name: MICROSOFT TECHNOLOGY LICENSING, LLC, WASHINGTON

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:MICROSOFT CORPORATION;REEL/FRAME:034543/0001

Effective date: 20141014

FPAY Fee payment

Year of fee payment: 8

FEPP Fee payment procedure

Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

LAPS Lapse for failure to pay maintenance fees

Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STCH Information on status: patent discontinuation

Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362

FP Lapsed due to failure to pay maintenance fee

Effective date: 20200527