US20130226986A1 - Hardware computing system with extended processing and method of operation thereof - Google Patents
- Publication number
- US20130226986A1 (application Ser. No. 13/846,187)
- Authority
- US
- United States
- Prior art keywords
- command
- application manager
- remote application
- bus
- remote
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogram communication
- G06F9/545—Interprogram communication where tasks reside in different layers, e.g. user- and kernel-space
Definitions
- the present invention relates generally to a hardware computing system, and more particularly to a system for accelerating application execution.
- Operating systems in computers enable the computers to communicate with external resources for execution of commands related to an application.
- the operating system typically handles direct control of items associated with computer usage including keyboard, display, disk storage, network facilities, printers, modems, etc.
- the operating system in a computer is typically designed to cause a general purpose central processing unit (“CPU”) to perform tasks including the managing of local and network file systems, memory, peripheral device drivers, and processes including application processes.
- the throughput of devices external to the CPU is subject to the limitations imposed by the CPU when the operating system places responsibility for managing these devices on the CPU.
- reliability of the overall software-hardware system, including the CPU, running the operating system, in association with the devices will depend, among other things, on the operating system. Owing to the inherent complexity of the operating system, unforeseen conditions may arise which may undermine stability of the overall software-hardware system.
- the present invention provides a method of operation of a hardware computing system including: transferring a command to a remote application manager through a network bus; executing, by a programmable execution array of the remote application manager, the command; identifying, with a routing logic module, a subsequent command to be executed by an alternate remote application manager; and providing a response through the network bus for the command.
- the present invention provides a hardware computing system including: a network bus for transferring a command; a remote application manager coupled to the network bus for receiving the command; a programmable execution array coupled to the remote application manager configured for executing the command; a routing logic module in the remote application manager configured to determine a subsequent command to be executed by an alternate remote application manager; and a general purpose central processing unit coupled to the network bus configured to receive a response for the command from the remote application manager, the alternate remote application manager, or a combination thereof.
- FIG. 1 is a block diagram of a hardware computing system in an embodiment of the present invention.
- FIG. 2 is a block diagram of the application manager of FIG. 1 .
- FIG. 3 is a block diagram of the distributed processing array of FIG. 1 .
- FIG. 4 is a detailed block diagram of the command processor assembly of FIG. 2 .
- FIG. 5 is a detailed block diagram of the command interpreter of FIG. 4 .
- FIG. 6 is a detailed block diagram of the application manager of FIG. 1 .
- FIG. 7 is a detailed block diagram of the service request module of FIG. 4 .
- FIG. 8 is a detailed block diagram of a response routing logic.
- FIG. 9 is a flow chart of a method of operation of a hardware computing system in a further embodiment of the present invention.
- the term “application” refers herein to a sequence of software commands grouped in order to complete a desired process.
- processing includes decoding of software commands, loading of registers, accessing peripherals, and/or accessing memory in executing an application.
- the term “software application” refers herein to a machine language program, compiled to operate in the general purpose central processing unit, comprising a list of executable commands that are recognized by the general purpose central processing unit.
- the term “execute” refers herein to perform a mathematical operation, a logical operation, storage access operation, or a combination thereof, as required by a command of the software application.
- the term “chained commands” refers herein to a series of commands that are serially and individually executed to complete a single task.
- the chained commands are treated as a single command call from an application software, which can include a command and 1 through N subsequent commands where N is an integer.
- the chained commands are responded to with a single completion status.
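The chaining semantics defined above can be sketched in code. This is an illustrative sketch only, not part of the patent disclosure; the function and variable names are hypothetical. It shows a command plus N subsequent commands executed serially and individually, with the whole chain answered by a single completion status.

```python
# Illustrative sketch of chained-command semantics: the commands run
# serially, each feeding its result to the next, but the caller sees
# one command call and one completion status. All names are hypothetical.

def execute_chained(commands, execute_one):
    """Run a chain of commands in order; return a single completion status.

    `commands` is the initial command plus 1 through N subsequent commands.
    `execute_one` executes a single command and returns (ok, result);
    each result feeds the next command in the chain.
    """
    result = None
    for command in commands:
        ok, result = execute_one(command, result)
        if not ok:
            return ("failed", result)   # one status for the whole chain
    return ("complete", result)


# Example: a four-command chain computing ((2 + 3) * 4) - 5.
ops = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b,
       "sub": lambda a, b: a - b}

def run(cmd, prev):
    op, operand = cmd
    value = operand if prev is None else ops[op](prev, operand)
    return True, value

status, value = execute_chained(
    [("add", 2), ("add", 3), ("mul", 4), ("sub", 5)], run)
```

From the application software's point of view, only the final `("complete", 15)` pair is visible, matching the single completion status described above.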
- FIG. 1 therein is shown a block diagram of a hardware computing system 100 in an embodiment of the present invention.
- the block diagram of the hardware computing system 100 depicts a peripheral controller 102, which can be an integrated circuit for communicating with peripheral devices such as disk drives, tape drives, communication devices, printers, scanners, or the like, coupled to a general purpose central processing unit 104.
- the term “general purpose central processing unit” refers herein to any micro-processor or processor group that is intended to execute software instructions for operation of a software application.
- a memory device 106 can be coupled to the general purpose central processing unit 104 for storing operation results and retrieving instructions or operation input data required by a software application 107 .
- the memory device 106 can include registers, dynamic random access memory (DRAM), static random access memory (SRAM), non-volatile random access memory (NVRAM), or the like. It is understood that the software application 107 can enter the hardware computing system 100 from the memory device 106 or the peripheral controller 102 . It is also understood that the software application 107 can be transferred from the peripheral controller 102 to the memory device 106 at an initiation of the software application 107 .
- An application manager 108 can be coupled to each of the peripheral controller 102 , the general purpose central processing unit 104 , and the memory device 106 .
- the application manager 108 can configure a programmable execution array 110 in order to select which of the configured commands can be executed by the programmable execution array 110 .
- the application manager 108 can maintain a command configuration table that can be used to supplement or replace commands configured in the programmable execution array 110 .
- the programmable execution array 110 can include a plurality of field programmable gate arrays and supporting memory that are capable of being configured to support the software application 107 .
- any application commands or command strings for the programmable execution array 110 can be configured by a system developer (not shown) to include an array of commands that can be executed in the programmable execution array 110 rather than by the software application 107 .
- the programmable execution array 110 can execute the commands for the software application 107 one to two orders of magnitude faster than is possible with the general purpose central processing unit 104 .
- the performance benefit provided by the application manager 108 can be customized to support specific commands or applications in order to provide the optimum performance benefit when the application manager 108 is invoked.
- the application manager 108 can receive a command call from the software application 107 and activate the programmable execution array 110 when the programmable execution array 110 is configured to support the current command call.
- the application manager 108 maintains a list of commands that can be supported by the possible configurations of the programmable execution array 110 . If the programmable execution array 110 can be configured to execute the command required by the software application 107 , the software application 107 can pause the general purpose central processing unit 104 in order to allow the operation of the command by the programmable execution array 110 .
- the application manager 108 can reconfigure the programmable execution array 110 if a different command must be implemented for execution of the command call.
- the application manager 108 can provide a status, through a command execution interface 112 , in order to allow the software application 107 to activate a fixed delay or sleep function in the general purpose central processing unit 104 .
- the general purpose central processing unit 104 will resume execution after the delay.
- the programmable execution array 110 can be reconfigured and execute the command call provided by the software application 107 during the delay of the general purpose central processing unit 104 .
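The dispatch decision described in the preceding paragraphs can be sketched as a small state machine. This sketch is an assumption for illustration, not the patented implementation; the class and method names are invented. A command call is executed immediately if the array is configured for it, triggers a reconfiguration during the CPU's fixed delay if the array could support it, and otherwise falls back to software execution.

```python
# Hypothetical sketch of the application manager's dispatch decision:
# execute in the array, reconfigure the array during a CPU delay, or
# hand the call back to the CPU. Names and structure are assumptions.

class ApplicationManager:
    def __init__(self, configured, configurable):
        self.configured = set(configured)      # commands loaded in the array
        self.configurable = set(configurable)  # commands the array could load

    def dispatch(self, command):
        if command in self.configured:
            return "execute_in_array"
        if command in self.configurable:
            # Signal the CPU (via the command execution interface) to
            # sleep for a fixed delay, then reconfigure during the delay.
            self.configured.add(command)
            return "reconfigure_then_execute"
        return "software_execution"            # fall back to the CPU


mgr = ApplicationManager(configured={"fft"}, configurable={"fft", "crc32"})
```

Note that after one reconfiguration pass, a repeated command is served directly from the array, which is the source of the acceleration claimed for repeated command calls.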
- while the command execution interface 112 is shown as a direct connection between the general purpose central processing unit 104 and the application manager 108, it is understood that the command execution interface 112 can be implemented as a bus status, serial communication packet, exception indicator, an interrupt, or a status exchange sequence.
- the command execution interface 112 is intended to allow communication between the application manager 108 and the software application 107 executing on the general purpose central processing unit 104 .
- the application manager 108 can access the command execution interface 112 in order to pause or bypass the execution of the command call by the general purpose central processing unit 104 . If the application manager 108 is able to execute the command, it can retrieve the command parameters through a memory bus 114 .
- the application manager 108 and the programmable execution array 110 can be paused between the command calls in the flow of the software application 107 . It is understood that while the memory bus 114 is shown as two busses, the memory bus 114 can be a single bus having the general purpose central processing unit 104 and the application manager 108 as balanced connections.
- the programmable execution array 110 can store the results of the execution of the command in the memory device 106 upon completion of a command call from the software application 107 .
- the general purpose central processing unit 104 will skip the command and wait for the application manager 108 to complete the execution of the command call. It is understood that in most cases the application manager 108 can complete the execution of the command before the general purpose central processing unit 104 can detect the command and the application manager 108 can complete a number of the commands before the general purpose central processing unit 104 is ready for its next command.
- a peripheral control bus 116 provides a communication path to the storage and communication devices coupled to the peripheral controller 102 .
- the application manager 108 can utilize the peripheral controller 102 to complete command operations that require file transfers to or from any attached peripheral devices (not shown).
- the command call from the software application 107 can cause the application manager 108 to activate a network bus 118 for transfer of the command call to a distributed processing array 120.
- the distributed processing array 120 can include a remote application manager 122 coupled to a remote programmable execution array 124 .
- the network bus 118 can include an intra-system bus, such as a peripheral controller interface (PCI) bus or PCI-express bus, an inter-system bus such as serial attached small computer system interface (SAS) or serial advanced technology attachment (SATA), or an extra-system interface, such as Ethernet, the Internet, an optical interface, or a combination thereof.
- the network bus 118 can be configured to pass all or portions of the command call from the software application 107 to the remote application manager 122 . While only one of the remote application manager 122 is shown for simplicity, any number of the remote application manager 122 can be configured to communicate through the network bus 118 .
- the remote application manager 122 can process the command call through the remote programmable execution array 124 to which it is coupled.
- the combination of the remote application manager 122 and the remote programmable execution array 124 can execute the command call from the software application 107 or portions thereof.
- the remote application manager 122 can produce an intermediate result that is passed to the application manager 108 for further processing.
- This configuration enables complete subroutine execution by the hardware computing system 100 .
- if an exception occurs during remote execution, a remote exception line 126 can alert the application manager 108 to the condition.
- the application manager 108 would then revert the operation back to the software application 107 for completion by the general purpose central processing unit 104 as a recovery. This mechanism, while slower, can complete the command execution without additional intervention being required.
- the role of the application manager 108 in this configuration can be that of a resource manager with extended resources beyond the local hardware system.
- the application manager 108 can be pre-configured to identify the instances of the remote application manager 122 that are allocated to its use. Each of the remote application manager 122 can then allocate the remote programmable execution array 124 that it manages. It is understood that the remote programmable execution array 124 can include one or more of the field programmable gate arrays and supporting memory that are capable of being configured to support the software application 107 .
- the configuration of the hardware computing system 100 can be predetermined by the system designer (not shown) to rely on specific instances of the remote application manager 122 within the distributed processing array 120 .
- the configuration can be verified by the application manager 108 during system initialization of the hardware computing system 100 .
- multiple instances of the remote application manager 122 can be allocated to support the application manager 108 during execution of the command call of the application software 107 .
- Each of the remote application manager 122 can manage the remote programmable execution array 124 , to which it is attached, in the same fashion that the application manager 108 manages the programmable execution array 110 .
- the hardware computing system 100 can be configured to utilize a primary instance of the remote application manager 122 and have a contingency for a fail-over version of the remote application manager 122 available to complete a pending operation for the command call of the application software 107 , where the command call can be a single command or a chained command.
- while the application manager 108 is shown as managing the distributed processing array 120, it could also be an instance of the distributed processing array 120 that is dedicated to a specific instance of the application software 107.
- the memory bus 114 coupled to the application manager 108 can be coupled directly to the network bus 118 which allows the application software 107 to process command calls in parallel through the distributed processing array 120 .
- if the application manager 108 determines that the programmable execution array 110 is not capable of being configured to execute the command required by the software application 107 and no instance of the remote programmable execution array 124 is available, the application manager 108 can communicate through the command execution interface 112 to the software application 107, which can enable the general purpose central processing unit 104 to execute the command call through software execution.
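The fallback order described above can be condensed into a selection routine. This is a minimal sketch under assumed names, not the disclosed logic: try the local programmable execution array first, then any available remote array, and only then return the command call to the general purpose CPU for software execution.

```python
# Illustrative sketch of executor selection: local array, then remote
# arrays, then CPU software execution as the last resort.

def select_executor(command, local_supports, remotes):
    """Pick where a command call runs.

    `local_supports` — commands the local array can be configured for;
    `remotes` — mapping of remote manager name -> (available, supported set).
    """
    if command in local_supports:
        return "local_array"
    for name, (available, supported) in remotes.items():
        if available and command in supported:
            return name
    return "cpu_software"       # no hardware resource can take the call


# Hypothetical pool: remote_a is offline, remote_b can take "matmul".
remotes = {"remote_a": (False, {"matmul"}), "remote_b": (True, {"matmul"})}
```

The ordering reflects the document's preference: hardware execution wherever possible, with software execution as the slower but always-available recovery path.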
- This hardware execution of the commands by the application manager 108 can be adjusted by re-configuring the programmable execution array 110 or accessing the distributed processing array 120 to identify the appropriate resource allocation.
- the execution time of the hardware computing system 100 can be accelerated by providing more commands, that can be accommodated by the application manager 108 , than will fit within the programmable execution array 110 in a single configuration.
- the hardware computing system 100 can support multiple threads of the command call of the application software 107 as well as parallel processing and chaining of the command calls of the application software 107 .
- the resources of the distributed processing array 120 can be located in the same system as the general purpose central processing unit 104 , in an adjacent structure separate from the general purpose central processing unit 104 , or regionally remote from the general purpose central processing unit 104 .
- This allocation of the remote application manager 122 can provide acceleration of the command call of the application software 107 to multiple instances of the general purpose central processing unit 104 in order to fully utilize the flexibility and speed of the application manager 108 and the remote application manager 122 .
- the block diagram of the application manager 108 of FIG. 2 depicts a command processor assembly 202, which can be implemented in a complex programmable logic device (CPLD).
- the command processor assembly 202 can include a command processor 204 that receives a command stream through an embedded memory controller 206 from the memory bus 114 or the network bus 118 .
- the command processor 204 can determine if the command can be executed without the assistance of the general purpose central processing unit 104 of FIG. 1 .
- the command processor 204 can access the embedded memory controller 206 , coupled to a configuration memory 208 , through an embedded memory bus 209 in order to determine whether the command can be executed by the application manager 108 .
- the configuration memory 208 can be any volatile memory such as a random access memory (RAM) or a non-volatile memory such as a flash memory.
- the configuration memory 208 can be written by the embedded memory controller 206 to hold the circuit configurations that can be loaded into a configurable logic device 210 , such as a field programmable gate array (FPGA).
- the command processor 204 can maintain the current configuration of the programmable execution array 110 and if necessary, can alter the configuration by accessing a field programmable gate array (FPGA) interface module 212 .
- the programmable execution array 110 can be coupled to the FPGA interface module 212 , which maintains the configuration and percent utilization of the programmable execution array 110 .
- the receipt of a command call from the application software 107 of FIG. 1 can activate the command processor 204 to configure the programmable execution array 110 or a remote programmable execution array 124 of FIG. 1 through the network bus 118 . Any exception that is detected in the remote programmable execution array 124 can be conveyed to the command processor 204 through the remote exception line 126 .
- the command processor 204 can initially determine whether the programmable execution array 110 is currently configured to execute the command that is presented on the memory bus 114 by accessing the configuration memory 208 through the embedded memory controller 206 .
- the command processor 204 can also maintain pointers to a pre-configured instance of the remote programmable execution array 124 managed by the remote application manager 122 of FIG. 1 that is allocated to the command processor 204 .
- the command processor 204 can update the current state and configuration of the programmable execution array 110 through the FPGA interface module 212 . It is understood that the number of configuration images that are maintained in the configuration memory 208 can represent more logic than is able to be loaded in the configurable logic device 210 at one time. By monitoring the usage statistics of the configuration images, the command processor 204 can manage the loading of the configuration images to the programmable execution array 110 in order to increase the percentage of utilization of the application manager 108 . The same process can be performed by the remote application manager 122 for the remote programmable execution array 124 . The command processor 204 can manage the remote application manager 122 in order to tune the execution of command call of the application software 107 including processing intermediate results by the programmable execution array 110 or the remote programmable execution array 124 to continue processing of the command call to completion.
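The image-management behavior described above — more configuration images in the configuration memory than the configurable logic device can hold at once, with usage statistics steering which images stay loaded — resembles a usage-counted cache. The following sketch is one illustrative policy under assumed names, not the patented mechanism.

```python
# Hypothetical usage-counted cache for configuration images: when the
# device is full, evict the least-used loaded image to make room.

class ConfigImageLoader:
    def __init__(self, capacity):
        self.capacity = capacity   # images the device can hold at once
        self.loaded = {}           # image name -> usage count

    def request(self, image):
        """Ensure `image` is loaded; return True if a (re)load was needed."""
        if image in self.loaded:
            self.loaded[image] += 1
            return False
        if len(self.loaded) >= self.capacity:
            # Evict the least-used image, per the usage statistics.
            victim = min(self.loaded, key=self.loaded.get)
            del self.loaded[victim]
        self.loaded[image] = 1
        return True


loader = ConfigImageLoader(capacity=2)
```

Keeping frequently requested images resident avoids reconfiguration delays and so raises the utilization percentage of the application manager, as the paragraph above describes.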
- the command processor 204 can take control of the command by activating a status in the command execution interface 112 .
- the command processor 204 can then retrieve the command parameters and transfer the command parameters through a command traffic bus 214 or the network bus 118 .
- the command processor 204 activates the FPGA interface module 212 to manage an FPGA control bus 216 during the command parameter transfer and any reconfiguration processes.
- the command processor 204 can manipulate the configuration through the FPGA interface module 212 and the embedded memory controller 206 .
- the embedded memory controller 206 can address an instance of the configuration memory 208 in order to provide configuration patterns on a configuration bus 218 .
- the embedded memory controller 206 can drive a memory control bus 220, coupled to the configuration memory 208, to provide address and control lines for selecting the configuration patterns that are provided to the configurable logic device 210.
- the FPGA interface module 212 can be coupled to multiple instances of the configurable logic device 210 within the programmable execution array 110 .
- the command processor 204 can detect any conditions that can cause erroneous operations, such as a configuration time-out, an image loading error, a checksum error, activation of the remote exception line 126, or the like. If a failure condition is detected by the command processor 204, the embedded memory controller 206, the FPGA interface module 212, or a combination thereof, the command processor assembly 202 can activate a command process error driver 222. The activation of the command process error driver 222 can cause the general purpose central processing unit 104 to execute the command call that was pending during the command set-up by the command processor assembly 202 and detection of the failure condition.
- the command processor assembly 202 can be coupled to the peripheral control bus 116 for accessing storage and communication devices managed by the peripheral controller 102 of FIG. 1 .
- a command queue module 224 can manage a series of commands that can be executed by the programmable execution array 110 , the remote programmable execution array 124 , or a combination thereof.
- the command queue module 224 can allow the command processor 204 to hold a command in reserve while the configurable logic device 210 is reconfigured to execute the reserved command. It is understood that command queue module 224 can defer execution of commands for any of the configurable logic device 210 in the programmable execution array 110 or the remote programmable execution array 124 .
- the command queue module 224 can allow the transfer of the command to be interrupted and later completed without causing additional retry delay.
- the command queue module 224 can also coordinate the execution of chained commands or manage intermediate results of a single command execution.
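The deferral behavior described above can be sketched as a simple holding queue. The class and method names are illustrative assumptions, not the disclosed design: a command is held in reserve while its target logic device is being reconfigured, then released for execution once the device is ready.

```python
# Hypothetical sketch of the command queue module's hold/release cycle.

from collections import deque

class CommandQueue:
    def __init__(self):
        self.pending = deque()

    def hold(self, command):
        """Reserve a command while its target device is reconfigured."""
        self.pending.append(command)

    def release_ready(self, is_ready):
        """Return the commands whose target device is now configured."""
        ready, still_pending = [], deque()
        while self.pending:
            command = self.pending.popleft()
            (ready if is_ready(command) else still_pending).append(command)
        self.pending = still_pending
        return ready


q = CommandQueue()
```

Because held commands stay queued rather than being retried from scratch, an interrupted transfer can be completed later without the additional retry delay mentioned above.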
- the command queue module 224 can manage results of a double precision mathematical operation executed by instances of the remote programmable execution array 124 or pass the intermediate results of a chained command to the programmable execution array 110 for completion.
- the command processor 204 and the command queue module 224 can process the series of commands as a single command call from the application software 107 .
- the command queue module 224 can benefit a multi-threaded operation of the general purpose central processing unit 104 by allowing the programmable execution array 110 to complete the command from the interrupted thread and hold the results until the interrupted thread is restored.
- the management of the command queue module 224 can provide a watch-dog timer and queue monitoring that prevent the retrieval of incorrect information by the general purpose central processing unit 104 while switching operational threads, whether in multi-threaded operation or when the general purpose central processing unit 104 has multiple operational cores.
- the command processor 204 and the command queue module 224 can be configured to manage chained commands executed by the programmable execution array 110 or the remote programmable execution array 124 without interrupting the general purpose central processing unit 104 .
- the command processor 204 can be targeted to support a specific set of the command calls of the software application 107 or to support specific sets of commands that are inefficient when executed by the general purpose central processing unit 104.
- the command processor 204 can be configured to execute specific commands in the programmable execution array 110 or the remote programmable execution array 124 and coordinate the results without support from the general purpose central processing unit 104 .
- the performance of the hardware computing system 100 of FIG. 1 can be measured to be greater than twice that of the general purpose central processing unit 104 alone. It is understood that the logical connections within the command processor assembly 202 are not shown for clarity and brevity.
- FIG. 3 therein is shown a block diagram of the distributed processing array 120 .
- the block diagram of the distributed processing array 120 depicts a remote application manager 302 coupled to the configuration memory 208 and the configurable logic device 210.
- only one each of the configuration memory 208 and the configurable logic device 210 is shown, but it is understood that the programmable execution array 110 includes more than one of each of the configuration memory 208 and the configurable logic device 210.
- the remote application manager 302 can provide the remote exception line 126 for alerting the application manager 108 of FIG. 1 that an exception in the set-up or execution of a command call of the application software 107 of FIG. 1 has occurred.
- a remote status bus 304 can activate an alternate remote application manager 306 that can be configured to complete the execution of the command call of the application software 107 should the exception be detected.
- the alternate remote application manager 306 can provide a redundant execution path to increase the reliability of the hardware computing system 100 of FIG. 1 .
- the remote status bus 304 can be used to convey intermediate solutions between the remote application manager 302 and the alternate remote application manager 306 for chained commands or operations where information carries over from the remote application manager 302 to the alternate remote application manager 306 or vice versa. While the remote status bus 304 is shown to couple the remote application manager 302 to the alternate remote application manager 306, it is understood that the remote status bus 304 can be coupled to more than one of the alternate remote application manager 306 and can be configured as a star, ring, mesh, toroidal mesh, hypercube, tree, or the like.
- the remote status bus 304 can be any communication structure including a wired bus, a serial link, an optical link, a wireless link, or an electro-magnetic link.
- the remote application manager 302 and the alternate remote application manager 306 can be commonly joined to the network bus 118 for exchange of command parameters, intermediate results, status, or configuration information with the application manager 108 .
- the network bus 118 is shown in a star configuration, but it is understood that the network bus 118 can be configured as a star, ring, mesh, toroidal mesh, hypercube, tree, or the like.
- the network bus 118 can be any communication structure including a wired bus, a serial link, an optical link, a wireless link, or an electro-magnetic link.
- the remote application manager 302 and the alternate remote application manager 306 can be configured to support different instances of the application manager 108 or the same instance of the application manager 108 . It is understood that while the remote application manager 302 and the alternate remote application manager 306 are shown together, they do not have to be co-located.
- the network bus 118 and the remote status bus 304 can extend between boards, systems, or regions without modifying their operation.
- the configuration of the distributed processing array 120 can provide redundant operation between the remote application manager 302 and the alternate remote application manager 306 to provide fail-over operation without intervention of the general purpose central processing unit 104 of FIG. 1 . It has also been discovered that the remote application manager 302 and the alternate remote application manager 306 can be configured to support chained commands where the remote application manager 302 can pass intermediate results to the alternate remote application manager 306 through the remote status bus 304 .
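The fail-over path just described can be sketched compactly. This is an assumed illustration, not the disclosed circuit: a primary remote application manager attempts the command, and if an exception is raised the alternate manager completes the pending operation, with no intervention by the general purpose CPU.

```python
# Hypothetical sketch of fail-over between a remote application manager
# and its alternate; the exception stands in for the remote exception
# line described in the text.

def execute_with_failover(command, primary, alternate):
    """Try `primary`; on exception, let `alternate` finish the command."""
    try:
        return primary(command), "primary"
    except Exception:
        # The remote exception path redirects the pending operation to
        # the alternate remote application manager.
        return alternate(command), "alternate"


def failing_primary(cmd):
    raise RuntimeError("configuration time-out")

result, path = execute_with_failover(
    "crc32", failing_primary, lambda c: c.upper())
```

The caller receives a result either way; which manager produced it is invisible to the software application, which is the point of the redundancy.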
- any number of the remote application manager 302 and the alternate remote application manager 306 can be joined by the remote status bus 304 in the distributed processing array 120 without adding power or space burden to the hardware computing system 100 .
- the combination of the remote application manager 302 and the alternate remote application manager 306 can be joined by the remote status bus 304 to form the remote application manager 122 .
- the combination of the instances of the programmable execution array 110 can be coupled to the remote application manager 122 to form the remote programmable execution array 124 .
- the distributed processing array 120 can provide additional bandwidth capability to the hardware computing system 100 without a major modification to the system within which it resides.
- FIG. 4 therein is shown a detailed block diagram of the command processor assembly 202 of FIG. 2 .
- the detailed block diagram of the command processor assembly 202 depicts a command interpreter 402 having an FPGA control module 404 and a command processing unit 406 .
- a service request module 408 such as a command request queue mechanism can be coupled to the memory bus 114 and the network bus 118 .
- the service request module 408 can maintain the session number associated with the command and pass the command to the command interpreter 402 .
- the service request module 408 can maintain a queue for each of the possible commands that are supported by the programmable execution array 110 of FIG. 1 .
- the general purpose central processing unit 104 of FIG. 1 can read the status of the queues within the service request module 408 in order to enable command chaining and pipelining of consecutive command strings.
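The per-command queues and their CPU-readable status can be sketched as follows; this is a minimal software model with illustrative names, not the patented queue mechanism itself:

```python
from collections import deque

# Illustrative model: a service request module keeping one queue per supported
# command, with queue depths readable by the CPU to enable chaining/pipelining.
class ServiceRequestModule:
    def __init__(self, supported_commands):
        self.queues = {cmd: deque() for cmd in supported_commands}

    def enqueue(self, command, session, params):
        self.queues[command].append((session, params))

    def depths(self):
        # The CPU can poll these depths to decide whether to chain or pipeline.
        return {cmd: len(q) for cmd, q in self.queues.items()}

srm = ServiceRequestModule(["fft", "crc32"])
srm.enqueue("fft", session=1, params=[1, 2, 3])
srm.enqueue("fft", session=2, params=[4, 5, 6])
print(srm.depths())  # {'fft': 2, 'crc32': 0}
```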
- the service request module 408 can be coupled to the command interpreter 402 by a session identification bus 409 .
- the session identification bus 409 allows the command interpreter 402 to monitor the operation of commands that can be queued in the service request module 408 .
- the service request module 408 can provide a command stream 410 to a selector 412 .
- the selector 412 can direct the command stream 410 through an FPGA data stream 413 and a memory data stream 414 .
- the FPGA data stream 413 can provide configuration image data as well as the command input data required to execute a configured command.
- An FPGA program control bus 416 can be sourced from the command interpreter 402 and can be managed by the FPGA control module 404 .
- the memory data stream 414 can provide FPGA configuration data options to be stored in the configuration memory 208 of FIG. 2 .
- the memory data stream 414 can also be used to retrieve the FPGA configuration data used to re-configure the configurable logic device 210 of FIG. 2 .
- a memory control function 418 can be coupled to the command interpreter 402 for managing the embedded memory bus 209 .
- the memory control function 418 can source the memory control bus 422 , which in coordination with the memory data stream 414 forms the embedded memory bus 209 .
- An embedded table memory 424 can be coupled to the command interpreter 402 for maintaining current configuration data and statistical usage data.
- the command traffic bus 214 includes the FPGA data stream 413 , the FPGA program control bus 416 , and an FPGA response bus 428 .
- the coordination of the FPGA data stream 413 and the FPGA program control bus 416 is under control of the FPGA control module 404 .
- the command interpreter 402 can monitor the integrity of the transfers between the service request module 408 and the command traffic bus 214 or the embedded memory bus 209 . If the command interpreter 402 detects an error in the transfers, the command process error driver 222 will be asserted.
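One simple way to picture this integrity monitoring is a checksum compared across the transfer, with an error signal asserted on mismatch. The sketch below is a hedged software analogy; the names and the checksum choice are illustrative, not taken from the disclosure:

```python
# Hypothetical sketch: verify a transfer with a simple additive checksum and
# assert an error signal (standing in for the command process error driver)
# when the received data does not match what was sent.
def checksum(data):
    return sum(data) & 0xFF

def transfer(sent, received, error_signal):
    if checksum(received) != checksum(sent):
        error_signal["asserted"] = True   # corrupted transfer detected
        return None
    return received

error = {"asserted": False}
good = transfer([1, 2, 3], [1, 2, 3], error)   # intact transfer
bad = transfer([1, 2, 3], [1, 2, 9], error)    # corrupted transfer
print(error["asserted"])  # True
```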
- the FPGA response bus 428 can be coupled to a routing logic module 430 .
- the routing logic module 430 can be initialized by a table initialize line 432 from the command interpreter 402 .
- the tables in the routing logic module 430 can include a source list, a destination list, and a count for managing the execution of multiple commands in a chained commands operation or a command queued operation.
- the routing logic module 430 can be coupled to a service response queue module 434 in order to maintain a response queue for each of the possible commands that are supported by the programmable execution array 110 .
- the routing logic module 430 can store the session number status for each of the operations completed by the programmable execution array 110 as well as the source, destination, and count for each of the command calls executed by the programmable execution array 110 .
- the output of the routing logic module 430 can be coupled to the service request module 408 , by a response queue bus 436 , in order to convey pending status and results to the memory bus 114 or the network bus 118 .
- the combination of the service request module 408 , the programmable execution array 110 , and the routing logic module 430 can allow the software application 107 of FIG. 1 to execute multiple consecutive commands, multiple commands from different threads, or chained commands while reducing the burden on the general purpose central processing unit 104 .
- the reduction in processing requirements on the general purpose central processing unit 104 can increase the system performance by greater than a factor of 2.
- FIG. 5 therein is shown a detailed block diagram of the command interpreter 402 of FIG. 4 .
- the detailed block diagram of the command interpreter 402 depicts a command control module 502 having an FPGA interface controller 504 .
- the FPGA interface controller 504 can include state machines and check logic that provides full integrity during the loading of characterization images and the command input and results.
- the command control module 502 also has a table memory controller 506 and an application ID look-up module 508 .
- the table memory controller 506 can source a memory interface bus 510 that provides enable, address, and read/write control to the memory control function 418 of FIG. 4 .
- An FPGA parallel loader 512 is managed by the FPGA interface controller 504 .
- the FPGA parallel loader 512 can provide the FPGA program control bus 416 while also performing error checking and timing control.
- the application ID look-up module 508 can receive the output of a request activity vector module 514 that processes the session identification bus 409 .
- the command control module 502 can verify the rate and alignment of the request activity vector module 514 during processing.
- An FPGA utilization vector 516 can maintain the utilization vector and percent utilization of the programmable execution array 110 of FIG. 2 as an aid during reconfiguration of the configurable logic device 210 of FIG. 2 .
- a per partition usage register 518 can monitor the usage statistics of all of the commands partitioned in the programmable execution array 110 of FIG. 1 .
- the usage statistics maintained in the per partition usage register 518 can provide session number statistics to track command execution within the programmable execution array 110 .
- the usage statistics can guide the reconfiguration of the configurable logic device 210 by matching the session number of an executable thread for the general purpose central processing unit 104 of FIG. 1 .
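The way usage statistics can guide reconfiguration may be sketched as per-partition counters whose least-used entry becomes the reconfiguration candidate. This Python model is an assumption-laden illustration; the partition names and the least-used policy are hypothetical:

```python
from collections import Counter

# Hedged sketch: per-partition usage counters like those the text describes,
# used to pick the least-used partition as a reconfiguration candidate.
class UsageRegister:
    def __init__(self, partitions):
        self.counts = Counter({p: 0 for p in partitions})

    def record(self, partition):
        self.counts[partition] += 1

    def reconfiguration_candidate(self):
        # The least-used partition is the cheapest one to replace.
        return min(self.counts, key=self.counts.get)

usage = UsageRegister(["fft", "aes", "crc"])
for cmd in ["fft", "fft", "aes"]:
    usage.record(cmd)
print(usage.reconfiguration_candidate())  # crc
```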
- FIG. 6 therein is shown a detailed block diagram of the application manager 108 of FIG. 1 .
- the detailed block diagram of the application manager 108 depicts the command processor assembly 202 having the command execution interface 112 and the memory bus 114 forming a system interface side of the command processor assembly 202 .
- An execution side of the command processor assembly 202 can include the embedded memory bus 209 and the command traffic bus 214 .
- the embedded memory bus 209 can couple to each instance of the configuration memory 208 in the programmable execution array 110 .
- the execution cell 602 also includes one instance of the configurable logic device 210 which is coupled to the command traffic bus 214 . More than one of the execution cell 602 can be included in the programmable execution array 110 .
- an increased number of the execution cell 602 can require additional instances of the embedded memory bus 209 and the command traffic bus 214 . It is further understood that the implementation of the additional instances of the embedded memory bus 209 and the command traffic bus 214 can allow concurrent execution of commands in the separate instances of the execution cell 602 . Some command execution can require configuring multiple instances of the execution cell 602 to execute a single command, such as a double precision mathematical operation.
- the configuration of the programmable execution array 110 can provide support for concurrent execution, by the execution cell 602 , of multiple queued commands from the software application 107 of FIG. 1 . It has further been discovered that by carrying a session number as a label through the entire operation, the commands and results can be queued in the service request module 408 and the routing logic module 430 without losing the context of the results. The burden, on the general purpose central processing unit 104 of FIG. 1 , can be reduced because of a reduction in execution requirements and interrupt processing associated with the software application 107 .
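The role of the session number as a context label can be illustrated with a small software model: even when commands complete out of order, the label restores each result to its caller. The model below is an illustrative assumption, not the disclosed mechanism:

```python
import random

# Minimal sketch: a session number travels with each command and its result,
# so out-of-order completion can still be matched to the right caller.
def execute_out_of_order(commands):
    # commands: list of (session, payload); hardware may finish in any order.
    results = [(session, payload * 2) for session, payload in commands]
    random.shuffle(results)            # completion order is arbitrary
    return dict(results)               # the session label restores context

results = execute_out_of_order([(1, 10), (2, 20), (3, 30)])
print(results[2])  # the session label recovers the result for session 2: 40
```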
- FIG. 7 therein is shown a detailed block diagram of the service request module 408 of FIG. 4 .
- the detailed block diagram of the service request module 408 depicts an ingress control module 702 coupled to the memory bus 114 and/or the network bus 118 .
- the ingress control module 702 can support bidirectional communication to allow command delivery and polling by the software application 107 of FIG. 1 .
- the service request module 408 can be instantiated in a remote application manager 302 of FIG. 3 or the application manager 108 of FIG. 1 .
- the ingress control module 702 can drive several instances of a command queue bus 704 for transferring a command 706 , a subsequent command 707 , or a combination thereof to a queuing register 708 .
- the queuing register 708 can be implemented as a volatile memory, non-volatile memory, or register file to store the command 706 for deferred operation, interrupted transfer, or diagnostic operations. It is understood that the command 706 can include all of the input parameters necessary to execute the command 706 within an instance of the execution cell 602 of FIG. 6 .
- a command queuing array 709 is formed by a number of the queuing register 708 coupled to the ingress control module 702 .
- a controller bus 710 can couple the ingress control module 702 to a command interpreter interface 712 , such as a framer for synchronizing the communication between the ingress control module 702 and the command interpreter 402 of FIG. 4 .
- the command interpreter interface 712 can provide the session identification bus 409 to the command interpreter 402 .
- the command interpreter interface 712 can provide the session number information to the command interpreter 402 for monitoring the status and availability of the execution cell 602 required to execute the command 706 .
- the queuing register 708 can provide an FPGA instruction bus 714 that is coupled to a command stateful multiplexer 716 .
- the command stateful multiplexer 716 can have a state tracking module 718 that monitors the session numbers of the command 706 associated with each of the queuing register 708 .
- the ingress control module 702 can determine when all of the parameters, for the command 706 , are assembled within the queuing register 708 .
- the queuing register 708 is capable of supporting multiple of the command 706 or parts of the command 706 .
- the state tracking module 718 can allow the command stateful multiplexer 716 to manage a multi-threaded operation of the software application 107 of FIG. 1 as well as queuing of the command 706 for deferred processing.
- the command stateful multiplexer 716 can be controlled by the combination of the state tracking module 718 and the FPGA instruction bus 714 .
- a signal is sent to the command stateful multiplexer 716 that causes the session identification and the command 706 to be sent through the command stream 410 to the execution cell 602 that is slated to execute the command 706 .
- the ingress control module 702 can establish a timeout for any partially loaded command 706 . If for any reason the command 706 is not fully loaded in the queuing register 708 , an error can be detected and the command interpreter 402 can be alerted.
- the command interpreter 402 can assemble an error response that can be communicated to the general purpose central processing unit 104 of FIG. 1 through the command process error driver 222 .
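The parameter-assembly and timeout behavior described above can be sketched as follows. The class, the parameter names, and the deadline policy are hypothetical; this is only a software analogy for the queuing register 708:

```python
import time

# Illustrative model: a queuing register that collects a command's parameters
# and flags an error if the command is not fully loaded before a deadline.
class QueuingRegister:
    def __init__(self, expected_params, timeout_s):
        self.expected = expected_params
        self.deadline = time.monotonic() + timeout_s
        self.params = {}

    def load(self, name, value):
        self.params[name] = value

    def status(self):
        if len(self.params) == len(self.expected):
            return "ready"    # all parameters assembled: dispatch the command
        if time.monotonic() > self.deadline:
            return "error"    # partially loaded past the timeout: alert
        return "loading"

reg = QueuingRegister(expected_params=["op", "a", "b"], timeout_s=0.01)
reg.load("op", "add")
reg.load("a", 3)
time.sleep(0.02)              # deadline passes with parameter "b" missing
print(reg.status())  # error
```

In the text, the "error" outcome corresponds to alerting the command interpreter 402, which then assembles an error response for the central processing unit.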
- the service request module 408 has been shown having six of the queuing register 708 as an example of the implementation without limiting the number of the queuing register 708 . It is further understood that the software application 107 can communicate with the service request module 408 by polling or by establishing an interrupt through the command interpreter 402 .
- the management of the queuing registers 708 can provide a rapid execution path for the command 706 from the software application 107 . Since each of the queuing registers 708 can monitor multiple instances of the command 706 it is possible to link successive operations, chained commands, and perform whole sub-routines without interrupting the progress of the general purpose central processing unit 104 . The accelerated execution of the software application 107 can dramatically increase the performance of the general purpose central processing unit 104 .
- FIG. 8 therein is shown a detailed block diagram of a response routing logic 801 .
- the detailed block diagram of the response routing logic 801 depicts the FPGA response bus 428 coupled to the routing logic module 430 .
- the routing logic module 430 is subsequently coupled to the service response queue module 434 for response destination management.
- the service response queue module 434 can include an FPGA bus interface 802 .
- the FPGA bus interface 802 can direct the FPGA response bus 428 to a response queuing register 804 in a response queuing array 805 through a session selected bus 806 .
- the response queuing array 805 can match the operation of the command queuing array 709 of FIG. 7 in order to align the correct instance of a response 808 on the response queue bus 436 .
- the response 808 can include status and data associated with the execution of the command 706 of FIG. 7 by the execution cell 602 of FIG. 6 .
- the response 808 can be provided through a result and status bus 810 to a result stateful multiplexer 812 .
- the result stateful multiplexer 812 can include a response session tracking module 814 for determining which of the response queuing register 804 in the response queuing array 805 can contain the response 808 for a specific session identification.
- a response steering bus 816 can couple the FPGA bus interface 802 to the result stateful multiplexer 812 .
- the result stateful multiplexer 812 can convey the response 808 from the response queuing register 804 to the FPGA bus interface 802 for transfer through a response activity bus 818 .
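Selecting a queued response by session identification, as the result stateful multiplexer 812 is described doing, can be sketched like this (names and queue layout are illustrative assumptions):

```python
# Hypothetical sketch: select and retire the queued response that matches a
# requested session identification.
class ResultMultiplexer:
    def __init__(self):
        self.queued = []   # list of (session, response) pairs

    def push(self, session, response):
        self.queued.append((session, response))

    def select(self, session):
        for i, (s, response) in enumerate(self.queued):
            if s == session:
                del self.queued[i]     # deliver and retire the response
                return response
        return None                    # no completed response for this session

mux = ResultMultiplexer()
mux.push(5, "checksum=0xBEEF")
mux.push(9, "fft done")
print(mux.select(9))  # fft done
```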
- the result stateful multiplexer 812 can provide conditional interrupts to the general purpose central processing unit 104 to notify of completion of the command 706 , an error status, or exception condition.
- the service response queue module 434 has been shown having six of the response queuing register 804 as an example of the implementation without limiting the number of the response queuing register 804 . It is further understood that the software application 107 can communicate with the routing logic module 430 by polling or by establishing an interrupt through the memory bus 114 and the service request module 408 .
- the management of the response queuing register 804 can provide a rapid execution path for the response 808 to the command 706 from the software application 107 including providing the response 808 correctly aligned with a queued command 706 . Since each of the response queuing register 804 can monitor multiple instances of the response 808 it is possible to link successive operations and perform whole sub-routines without interrupting the progress of the general purpose central processing unit 104 . The accelerated execution of the software application 107 can dramatically increase the performance of the general purpose central processing unit 104 .
- the routing logic module 430 can include a source/destination table 822 that can be coupled to a counter 820 .
- the table initialize line 432 can cause the source/destination table 822 to be loaded with source and destination information for each command in a set of chained commands including a count of how many of the commands are in the chain.
- the source/destination table 822 and the counter 820 can work to define which commands can be loaded into the remote application manager 122 of FIG. 2 .
- the completion of the command 706 can cause the source/destination table 822 to match the source identification of the alternate remote application manager 306 in order to determine which of the alternate remote application manager 306 should receive the intermediate results of the command 706 in order to execute the next command in the chained commands.
- a key feature of the routing logic module 430 is that it can receive an input through the FPGA response bus 428 from the routing logic module 430 in the alternate remote application manager 306 of FIG. 3 . The routing logic module 430 receiving the input can interrogate the source/destination table 822 in order to determine which instance of the alternate remote application manager 306 is in line to receive the intermediate results and when the chained commands are complete.
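The counter and source/destination table driving this hand-off can be modeled in a few lines. The table layout and manager names below are illustrative assumptions, not the claimed structure:

```python
# Illustrative model of the chained-command hand-off: a source/destination
# table, loaded at initialization, names the next manager for each step, and
# a count tells the routing logic when the chain is complete.
def run_chain(table, first_dst, value, managers):
    dst, remaining = first_dst, len(table)
    while remaining:
        value = managers[dst](value)   # execute this step's command
        dst = table.get(dst)           # who receives the intermediate result
        remaining -= 1                 # counter tracks chain completion
    return value

managers = {"A": lambda v: v + 1, "B": lambda v: v * 3}
table = {"A": "B", "B": None}              # A's result routes to B; B ends the chain
print(run_chain(table, "A", 5, managers))  # (5+1)*3 = 18
```

As in the text, only the final step produces the single completion status; intermediate results never return to the central processing unit.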
- the method 900 includes: transferring a command to a remote application manager through a network bus in a block 902; executing, by a programmable execution array of the remote application manager, the command in a block 904; identifying, with a routing logic module, a subsequent command to be executed by an alternate remote application manager in a block 906; and providing a response through the network bus for the command in a block 908.
- the resulting method, process, apparatus, device, product, and/or system is straightforward, cost-effective, uncomplicated, highly versatile, accurate, sensitive, and effective, and can be implemented by adapting known components for ready, efficient, and economical manufacturing, application, and utilization.
- Another important aspect of the present invention is that it valuably supports and services the historical trend of reducing costs, simplifying systems, and increasing performance.
Abstract
A method of operation of a hardware computing system includes: transferring a command to a remote application manager through a network bus; executing, by a programmable execution array of the remote application manager, the command; identifying, with a routing logic module, a subsequent command to be executed by an alternate remote application manager; and providing a response through the network bus for the command.
Description
- This application claims the benefit of U.S. Provisional Patent Application Ser. No. 61/612,882 filed Mar. 19, 2012, and the subject matter thereof is incorporated herein by reference in its entirety.
- The present invention relates generally to a hardware computing system, and more particularly to a system for accelerating application execution.
- Operating systems in computers enable the computers to communicate with external resources for execution of commands related to an application. The operating system typically handles direct control of items associated with computer usage including keyboard, display, disk storage, network facilities, printers, modems, etc. The operating system in a computer is typically designed to cause a general purpose central processing unit (“CPU”) to perform tasks including the managing of local and network file systems, memory, peripheral device drivers, and processes including application processes.
- Placing responsibility for all of these functions on the CPU imposes significant processing burdens on it, particularly when the operating system is sophisticated, as, for example, in the case of Windows NT™, Unix™, and NetWare™. The CPU is called upon to perform housekeeping tasks for the system and installed software. The continuous update of software and the maintenance tasks of the compute hardware can relegate the execution of an application to a very low priority. The more burden that is placed on the CPU to run tasks other than those associated with applications, the less CPU time is available to run applications with the result that performance of the applications may be degraded.
- In addition, the throughput of devices external to the CPU is subject to the limitations imposed by the CPU when the operating system places responsibility for managing these devices on the CPU. Furthermore, reliability of the overall software-hardware system, including the CPU, running the operating system, in association with the devices, will depend, among other things, on the operating system. Owing to the inherent complexity of the operating system, unforeseen conditions may arise which may undermine stability of the overall software-hardware system.
- Thus, a need still remains for a hardware computing system with extended processing. In view of the performance and power limitations imposed on general purpose central processing units, it is increasingly critical that answers be found to these problems. In view of the ever-increasing commercial competitive pressures, along with growing consumer expectations and the diminishing opportunities for meaningful product differentiation in the marketplace, it is critical that answers be found for these problems. Additionally, the need to reduce costs, improve efficiencies and performance, and meet competitive pressures adds an even greater urgency to the critical necessity for finding answers to these problems.
- Solutions to these problems have been long sought but prior developments have not taught or suggested any solutions and, thus, solutions to these problems have long eluded those skilled in the art.
- The present invention provides a method of operation of a hardware computing system including: transferring a command to a remote application manager through a network bus; executing, by a programmable execution array of the remote application manager, the command; identifying, with a routing logic module, a subsequent command to be executed by an alternate remote application manager; and providing a response through the network bus for the command.
- The present invention provides a hardware computing system including: a network bus for transferring a command; a remote application manager coupled to the network bus for receiving the command; a programmable execution array coupled to the remote application manager configured for executing the command; a routing logic module in the remote application manager configured to determine a subsequent command to be executed by an alternate remote application manager; and a general purpose central processing unit coupled to the network bus configured to receive a response for the command from the remote application manager, the alternate remote application manager, or a combination thereof.
- Certain embodiments of the invention have other steps or elements in addition to or in place of those mentioned above. The steps or elements will become apparent to those skilled in the art from a reading of the following detailed description when taken with reference to the accompanying drawings.
- FIG. 1 is a block diagram of a hardware computing system in an embodiment of the present invention.
- FIG. 2 is a block diagram of the application manager of FIG. 1 .
- FIG. 3 is a block diagram of the distributed processing array of FIG. 1 .
- FIG. 4 is a detailed block diagram of the command processor assembly of FIG. 2 .
- FIG. 5 is a detailed block diagram of the command interpreter of FIG. 4 .
- FIG. 6 is a detailed block diagram of the application manager of FIG. 1 .
- FIG. 7 is a detailed block diagram of the service request module of FIG. 4 .
- FIG. 8 is a detailed block diagram of a response routing logic.
- FIG. 9 is a flow chart of a method of operation of a hardware computing system in a further embodiment of the present invention.
- The following embodiments are described in sufficient detail to enable those skilled in the art to make and use the invention. It is to be understood that other embodiments would be evident based on the present disclosure, and that system, process, or mechanical changes may be made without departing from the scope of the present invention.
- In the following description, numerous specific details are given to provide a thorough understanding of the invention. However, it will be apparent that the invention may be practiced without these specific details. In order to avoid obscuring the present invention, some well-known circuits, system configurations, and process steps are not disclosed in detail.
- The drawings showing embodiments of the system are semi-diagrammatic and not to scale and, particularly, some of the dimensions are for the clarity of presentation and are shown exaggerated in the drawing FIGs. Similarly, although the views in the drawings for ease of description generally show similar orientations, this depiction in the FIGs. is arbitrary for the most part. Generally, the invention can be operated in any orientation.
- The same numbers are used in all the drawing FIGs. to relate to the same elements. The embodiments have been numbered first embodiment, second embodiment, etc. as a matter of descriptive convenience and are not intended to have any other significance or provide limitations for the present invention.
- The term “application” refers herein to a sequence of software commands grouped in order to complete a desired process. The term “processing” as used herein includes decoding of software commands, loading of registers, accessing peripherals, and/or accessing memory in executing an application.
- The term “software application” refers herein to a machine language program, compiled to operate in the general purpose central processing unit, comprising a list of executable commands that are recognized by the general purpose central processing unit.
- The term “execute” refers herein to perform a mathematical operation, a logical operation, storage access operation, or a combination thereof, as required by a command of the software application. The term “chained commands” as used herein refers to a series of commands that are serially and individually executed to complete a single task. The chained commands are treated as a single command call from an application software, which can include a command and 1 through N subsequent commands where N is an integer. The chained commands are responded to with a single completion status.
- Referring now to
FIG. 1 , therein is shown a block diagram of ahardware computing system 100 in an embodiment of the present invention. The block diagram of thehardware computing system 100 depicts aperipheral controller 102, can be an integrated circuit for communicating with peripheral devices such as disk drives, tape drives, communication devices, printers, scanners, or the like, coupled to a general purposecentral processing unit 104. The term “general purpose central processing unit” refers herein to any micro-processor or processor group that is intended to execute software instructions for operation of a software application. Amemory device 106 can be coupled to the general purposecentral processing unit 104 for storing operation results and retrieving instructions or operation input data required by asoftware application 107. Thememory device 106 can include registers, dynamic random access memory (DRAM), static random access memory (SRAM), non-volatile random access memory (NVRAM), or the like. It is understood that thesoftware application 107 can enter thehardware computing system 100 from thememory device 106 or theperipheral controller 102. It is also understood that thesoftware application 107 can be transferred from theperipheral controller 102 to thememory device 106 at an initiation of thesoftware application 107. - An
application manager 108 can be coupled to each of theperipheral controller 102, the general purposecentral processing unit 104, and thememory device 106. Theapplication manager 108 can configure aprogrammable execution array 110 in order to select which of the configured commands can be executed by theprogrammable execution array 110. Theapplication manager 108 can maintain a command configuration table that can be used to supplement or replace commands configured in theprogrammable execution array 110. Theprogrammable execution array 110 can include a plurality of field programmable gate arrays and supporting memory that are capable of being configured to support thesoftware application 107. - It is understood that any application commands or command strings for the
programmable execution array 110 can be configured by a system developer (not shown) to include an array of commands that can be executed in theprogrammable execution array 110 rather than by thesoftware application 107. Theprogrammable execution array 110 can execute the commands for thesoftware application 107 one to two orders of magnitude faster than is possible with the general purposecentral processing unit 104. The performance benefit provided by theapplication manager 108 can be customized to support specific commands or applications in order to provide the optimum performance benefit when theapplication manager 108 is invoked. - The
application manager 108 can receive a command call from thesoftware application 107 and activate theprogrammable execution array 110 when theprogrammable execution array 110 is configured to support the current command call. Theapplication manager 108 maintains a list of commands that can be supported by the possible configurations of theprogrammable execution array 110. If theprogrammable execution array 110 can be configured to execute the command required by thesoftware application 107, thesoftware application 107 can pause the general purposecentral processing unit 104 in order to allow the operation of the command by theprogrammable execution array 110. - The
application manager 108 can reconfigure theprogrammable execution array 110 if a different command must be implemented for execution of the command call. Theapplication manager 108 can provide a status, through acommand execution interface 112, in order to allow thesoftware application 107 to activate a fixed delay or sleep function in the general purposecentral processing unit 104. The general purposecentral processing unit 104 will resume execution after the delay. Theprogrammable execution array 110 can be reconfigured and execute the command call provided by thesoftware application 107 during the delay of the general purposecentral processing unit 104. - While the
command execution interface 112 is shown as a direct connection between the general purposecentral processing unit 104 and theapplication manager 108, it is understood that thecommand execution interface 112 can be implemented as a bus status, serial communication packet, exception indicator, an interrupt, or status exchange sequence. Thecommand execution interface 112 is intended to allow communication between theapplication manager 108 and thesoftware application 107 executing on the general purposecentral processing unit 104. Theapplication manager 108 can access thecommand execution interface 112 in order to pause or bypass the execution of the command call by the general purposecentral processing unit 104. If theapplication manager 108 is able to execute the command, it can retrieve the command parameters through amemory bus 114. Theapplication manager 108 and theprogrammable execution array 110 can be paused between the command calls in the flow of thesoftware application 107. It is understood that while thememory bus 114 is shown as two busses, thememory bus 114 can be a single bus having the general purposecentral processing unit 104 and theapplication manager 108 as balanced connections. Theprogrammable execution array 110 can store the results of the execution of the command in thememory device 106 upon completion of a command call from thesoftware application 107. - If the
command execution interface 112 is set to indicate the application manager 108 will execute the command, the general purpose central processing unit 104 will skip the command and wait for the application manager 108 to complete the execution of the command call. It is understood that in most cases the application manager 108 can complete the execution of the command before the general purpose central processing unit 104 can detect the command, and the application manager 108 can complete a number of the commands before the general purpose central processing unit 104 is ready for its next command. - A
peripheral control bus 116 provides a communication path to the storage and communication devices coupled to the peripheral controller 102. The application manager 108 can utilize the peripheral controller 102 to complete command operations that require file transfers to or from any attached peripheral devices (not shown). - The command call from the
software application 107 can cause the application manager 108 to activate a network bus 118 for transfer of the command call to a distributed processing array 120. The distributed processing array 120 can include a remote application manager 122 coupled to a remote programmable execution array 124. - The
network bus 118 can include an intra-system bus, such as a peripheral controller interface (PCI) bus or PCI-express bus, an inter-system bus, such as serial attached small computer system interface (SAS) or serial advanced technology attachment (SATA), or an extra-system interface, such as Ethernet, the Internet, an optical interface, or a combination thereof. The network bus 118 can be configured to pass all or portions of the command call from the software application 107 to the remote application manager 122. While only one of the remote application manager 122 is shown for simplicity, any number of the remote application manager 122 can be configured to communicate through the network bus 118. - The
remote application manager 122 can process the command call through the remote programmable execution array 124 to which it is coupled. The combination of the remote application manager 122 and the remote programmable execution array 124 can execute the command call from the software application 107 or portions thereof. In some instances the remote application manager 122 can produce an intermediate result that is passed to the application manager 108 for further processing. - This configuration enables complete subroutine execution by the
hardware computing system 100 through distributed processing of a single command or a chain of the commands that does not require intervention by the general purpose central processing unit 104. In the event an exception occurs during the processing of the command call by the remote application manager 122, a remote exception line 126 can alert the application manager 108 to the condition. The application manager 108 would then revert the operation back to the software application 107 for completion by the general purpose central processing unit 104 as a recovery. This mechanism, while slower, can complete the command execution without additional intervention being required. - The role of the
application manager 108 in this configuration can be that of a resource manager with extended resources beyond the local hardware system. The application manager 108 can be pre-configured to identify the instances of the remote application manager 122 that are allocated to its use. Each of the remote application manager 122 can then allocate the remote programmable execution array 124 that it manages. It is understood that the remote programmable execution array 124 can include one or more of the field programmable gate arrays and supporting memory that are capable of being configured to support the software application 107. - The configuration of the
hardware computing system 100 can be predetermined by the system designer (not shown) to rely on specific instances of the remote application manager 122 within the distributed processing array 120. The configuration can be verified by the application manager 108 during system initialization of the hardware computing system 100. In the distributed processing array 120, multiple instances of the remote application manager 122 can be allocated to support the application manager 108 during execution of the command call of the application software 107. Each of the remote application manager 122 can manage the remote programmable execution array 124, to which it is attached, in the same fashion that the application manager 108 manages the programmable execution array 110. - It is understood that the
hardware computing system 100 can be configured to utilize a primary instance of the remote application manager 122 and have a contingency for a fail-over version of the remote application manager 122 available to complete a pending operation for the command call of the application software 107, where the command call can be a single command or a chained command. It is further understood that while the application manager 108 is shown as managing the distributed processing array 120, it could also be an instance of the distributed processing array 120 that is dedicated to a specific instance of the application software 107. In some system applications of the hardware computing system 100, the memory bus 114 coupled to the application manager 108 can be coupled directly to the network bus 118, which allows the application software 107 to process command calls in parallel through the distributed processing array 120. - When the
application manager 108 determines that the programmable execution array 110 is not capable of being configured to execute the command required by the software application 107 and none of the remote programmable execution array 124 are available, the application manager 108 can communicate through the command execution interface 112 to the software application 107, which can enable the general purpose central processing unit 104 to execute the command call through software execution. This hardware execution of the commands by the application manager 108 can be adjusted by re-configuring the programmable execution array 110 or accessing the distributed processing array 120 to identify the appropriate resource allocation. The execution time of the hardware computing system 100 can be accelerated by providing more commands that can be accommodated by the application manager 108 than will fit within the programmable execution array 110 in a single configuration. It is understood that additional configurations can be established in the programmable execution array 110 by the application manager 108 or accessed through the network bus 118 for execution, of single commands or chained commands of the command call of the application software 107, by the distributed processing array 120. - It has been discovered that the
hardware computing system 100 can support multiple threads of the command call of the application software 107 as well as parallel processing and chaining of the command calls of the application software 107. The resources of the distributed processing array 120 can be located in the same system as the general purpose central processing unit 104, in an adjacent structure separate from the general purpose central processing unit 104, or regionally remote from the general purpose central processing unit 104. This allocation of the remote application manager 122 can provide acceleration of the command call of the application software 107 to multiple instances of the general purpose central processing unit 104 in order to fully utilize the flexibility and speed of the application manager 108 and the remote application manager 122. - Referring now to
FIG. 2, therein is shown a block diagram of the application manager 201. The block diagram of the application manager 201 depicts a command processor assembly 202, which can be implemented in a complex programmable logic device (CPLD). The command processor assembly 202 can include a command processor 204 that receives a command stream through an embedded memory controller 206 from the memory bus 114 or the network bus 118. The command processor 204 can determine if the command can be executed without the assistance of the general purpose central processing unit 104 of FIG. 1. - The
command processor 204 can access the embedded memory controller 206, coupled to a configuration memory 208, through an embedded memory bus 209 in order to determine whether the command can be executed by the application manager 108. The configuration memory 208 can be any volatile memory, such as a random access memory (RAM), or a non-volatile memory, such as a flash memory. The configuration memory 208 can be written by the embedded memory controller 206 to hold the circuit configurations that can be loaded into a configurable logic device 210, such as a field programmable gate array (FPGA). - The
command processor 204 can maintain the current configuration of the programmable execution array 110 and, if necessary, can alter the configuration by accessing a field programmable gate array (FPGA) interface module 212. The programmable execution array 110 can be coupled to the FPGA interface module 212, which maintains the configuration and percent utilization of the programmable execution array 110. By way of an example, only one of the configuration memory 208 and one of the configurable logic device 210 are shown, but it is understood that the programmable execution array 110 includes more than one each of the configuration memory 208 and the configurable logic device 210. - The receipt of a command call from the
application software 107 of FIG. 1 can activate the command processor 204 to configure the programmable execution array 110 or a remote programmable execution array 124 of FIG. 1 through the network bus 118. Any exception that is detected in the remote programmable execution array 124 can be conveyed to the command processor 204 through the remote exception line 126. - The
command processor 204 can initially determine whether the programmable execution array 110 is currently configured to execute the command that is presented on the memory bus 114 by accessing the configuration memory 208 through the embedded memory controller 206. The command processor 204 can also maintain pointers to a pre-configured instance of the remote programmable execution array 124, managed by the remote application manager 122 of FIG. 1, that is allocated to the command processor 204. - If it is determined that the
programmable execution array 110 is not appropriately configured to execute the command, the command processor 204 can update the current state and configuration of the programmable execution array 110 through the FPGA interface module 212. It is understood that the number of configuration images that are maintained in the configuration memory 208 can represent more logic than is able to be loaded in the configurable logic device 210 at one time. By monitoring the usage statistics of the configuration images, the command processor 204 can manage the loading of the configuration images to the programmable execution array 110 in order to increase the percentage of utilization of the application manager 108. The same process can be performed by the remote application manager 122 for the remote programmable execution array 124. The command processor 204 can manage the remote application manager 122 in order to tune the execution of the command call of the application software 107, including processing intermediate results by the programmable execution array 110 or the remote programmable execution array 124 to continue processing of the command call to completion. - When the
command processor 204 determines that the command can be executed by the configuration within the programmable execution array 110 or the remote programmable execution array 124, the command processor 204 can take control of the command by activating a status in the command execution interface 112. The command processor 204 can then retrieve the command parameters and transfer the command parameters through a command traffic bus 214 or the network bus 118. The command processor 204 activates the FPGA interface module 212 to manage an FPGA control bus 216 during the command parameter transfer and any reconfiguration processes. - In order to reconfigure the
programmable execution array 110, the command processor 204 can manipulate the configuration through the FPGA interface module 212 and the embedded memory controller 206. The embedded memory controller 206 can address an instance of the configuration memory 208 in order to provide configuration patterns on a configuration bus 218. The embedded memory controller 206 can drive a memory control bus 220, coupled to the configuration memory 208, to provide address and control lines for selecting the configuration patterns that are provided to the configurable logic device 210. It is understood that the FPGA interface module 212 can be coupled to multiple instances of the configurable logic device 210 within the programmable execution array 110. - The
command processor 204 can detect any conditions that can cause erroneous operations, such as a configuration time-out, image loading error, checksum error, activation of the remote exception line 126, or the like. If a failure condition is detected by the command processor 204, the embedded memory controller 206, the FPGA interface module 212, or a combination thereof, the command processor assembly 202 can activate a command process error driver 222. The activation of the command process error driver 222 can cause the general purpose central processing unit 104 to execute the command call that was pending during the command set-up by the command processor assembly 202 and detection of the failure condition. The command processor assembly 202 can be coupled to the peripheral control bus 116 for accessing storage and communication devices managed by the peripheral controller 102 of FIG. 1. - A
command queue module 224 can manage a series of commands that can be executed by the programmable execution array 110, the remote programmable execution array 124, or a combination thereof. The command queue module 224 can allow the command processor 204 to hold a command in reserve while the configurable logic device 210 is reconfigured to execute the reserved command. It is understood that the command queue module 224 can defer execution of commands for any of the configurable logic device 210 in the programmable execution array 110 or the remote programmable execution array 124. - It is understood that the communication of a command from the general purpose
central processing unit 104 can be interrupted by a higher priority task. The command queue module 224 can allow the transfer of the command to be interrupted and later completed without causing additional retry delay. The command queue module 224 can also coordinate the execution of chained commands or manage intermediate results of a single command execution. - By way of an example, the command queue module 224 can manage results of a double precision mathematical operation executed by instances of the remote
programmable execution array 124 or pass the intermediate results of a chained command to the programmable execution array 110 for completion. By managing the intermediate results of the chained command, the command processor 204 and the command queue module 224 can process the series of commands as a single command call from the application software 107. - The
command queue module 224 can benefit a multi-threaded operation of the general purpose central processing unit 104 by allowing the programmable execution array 110 to complete the command from the interrupted thread and hold the results until the interrupted thread is restored. The management of the command queue module 224 can provide a watch-dog timer and queue monitoring that prevents the retrieval of incorrect information by the general purpose central processing unit 104 while switching operational threads, as performed by the multi-threaded operation or the general purpose central processing unit 104 having multiple operational cores. - It has been discovered that the
command processor 204 and the command queue module 224 can be configured to manage chained commands executed by the programmable execution array 110 or the remote programmable execution array 124 without interrupting the general purpose central processing unit 104. The command processor 204 can be targeted to support a specific set of the command calls of the software application 107 or to support specific sets of commands that are inefficient when executed by the general purpose central processing unit 104. The command processor 204 can be configured to execute specific commands in the programmable execution array 110 or the remote programmable execution array 124 and coordinate the results without support from the general purpose central processing unit 104. The performance of the hardware computing system 100 of FIG. 1 can be measured to be greater than twice that of the general purpose central processing unit 104 alone. It is understood that the logical connections within the command processor assembly 202 are not shown for clarity and brevity. - Referring now to
FIG. 3, therein is shown a block diagram of the distributed processing array 120. The block diagram of the distributed processing array 120 depicts a remote application manager 302 coupled to the configuration memory 208 and the configurable logic device 210. By way of an example, only one of the configuration memory 208 and one of the configurable logic device 210 are shown, but it is understood that the programmable execution array 110 includes more than one each of the configuration memory 208 and the configurable logic device 210. - The
remote application manager 302 can provide the remote exception line 126 for alerting the application manager 108 of FIG. 1 that an exception in the set-up or execution of a command call of the application software 107 of FIG. 1 has occurred. A remote status bus 304 can activate an alternate remote application manager 306 that can be configured to complete the execution of the command call of the application software 107 should the exception be detected. The alternate remote application manager 306 can provide a redundant execution path to increase the reliability of the hardware computing system 100 of FIG. 1. - The
remote status bus 304 can be used to convey intermediate solutions between the remote application manager 302 and the alternate remote application manager 306 for chained commands or operations where information carries over from the remote application manager 302 to the alternate remote application manager 306 or vice versa. While the remote status bus 304 is shown to couple the remote application manager 302 to the alternate remote application manager 306, it is understood that the remote status bus 304 can be coupled to more than one of the alternate remote application manager 306 and can be configured as a star, ring, mesh, toroidal mesh, hypercube, tree, or the like. The remote status bus 304 can be any communication structure including a wired bus, a serial link, an optical link, a wireless link, or an electro-magnetic link. - The
remote application manager 302 and the alternate remote application manager 306 can be commonly joined to the network bus 118 for exchange of command parameters, intermediate results, status, or configuration information with the application manager 108. The network bus 118 is shown in a star configuration, but it is understood that the network bus 118 can be configured as a star, ring, mesh, toroidal mesh, hypercube, tree, or the like. The network bus 118 can be any communication structure including a wired bus, a serial link, an optical link, a wireless link, or an electro-magnetic link. - The
remote application manager 302 and the alternate remote application manager 306 can be configured to support different instances of the application manager 108 or the same instance of the application manager 108. It is understood that while the remote application manager 302 and the alternate remote application manager 306 are shown together, they do not have to be co-located. The network bus 118 and the remote status bus 304 can extend between boards, systems, or regions without modifying their operation. - It has been discovered that the configuration of the distributed
processing array 120 can provide redundant operation between the remote application manager 302 and the alternate remote application manager 306 to provide fail-over operation without intervention of the general purpose central processing unit 104 of FIG. 1. It has also been discovered that the remote application manager 302 and the alternate remote application manager 306 can be configured to support chained commands, where the remote application manager 302 can pass intermediate results to the alternate remote application manager 306 through the remote status bus 304. - It is understood that any number of the
remote application manager 302 and the alternate remote application manager 306 can be joined by the remote status bus 304 in the distributed processing array 120 without adding power or space burden to the hardware computing system 100. The combination of the remote application manager 302 and the alternate remote application manager 306 can be joined by the remote status bus 304 to form the remote application manager 122. The combination of the instances of the programmable execution array 110 can be coupled to the remote application manager 122 to form the remote programmable execution array 124. The distributed processing array 120 can provide additional bandwidth capability to the hardware computing system 100 without a major modification to the system within which it resides. - Referring now to
FIG. 4, therein is shown a detailed block diagram of the command processor assembly 202 of FIG. 2. The detailed block diagram of the command processor assembly 202 depicts a command interpreter 402 having an FPGA control module 404 and a command processing unit 406. A service request module 408, such as a command request queue mechanism, can be coupled to the memory bus 114 and the network bus 118. The service request module 408 can maintain the session number associated with the command and pass the command to the command interpreter 402. - The
service request module 408 can maintain a queue for each of the possible commands that are supported by the programmable execution array 110 of FIG. 1. The general purpose central processing unit 104 of FIG. 1 can read the status of the queues within the service request module 408 in order to enable command chaining and pipelining of consecutive command strings. The service request module 408 can be coupled to the command interpreter 402 by a session identification bus 409. The session identification bus 409 allows the command interpreter 402 to monitor the operation of commands that can be queued in the service request module 408. The service request module 408 can provide a command stream 410 to a selector 412. The selector 412 can direct the command stream 410 through an FPGA data stream 413 and a memory data stream 414. - The
FPGA data stream 413 can provide configuration image data as well as the command input data required to execute a configured command. An FPGA program control bus 416 can be sourced from the command interpreter 402 and can be managed by the FPGA control module 404. - The
memory data stream 414 can provide FPGA configuration data options to be stored in the configuration memory 208 of FIG. 2. The memory data stream 414 can also be used to retrieve the FPGA configuration data used to re-configure the configurable logic device 210 of FIG. 2. - A
memory control function 418 can be coupled to the command interpreter 402 for managing the embedded memory bus 209. The memory control function 418 can source the memory control bus 422, which in coordination with the memory data stream 414 forms the embedded memory bus 209. An embedded table memory 424 can be coupled to the command interpreter 402 for maintaining current configuration data and statistical usage data. - The
command traffic bus 214 includes the FPGA data stream 413, the FPGA program control bus 416, and an FPGA response bus 428. The coordination of the FPGA data stream 413 and the FPGA program control bus 416 is under control of the FPGA control module 404. The command interpreter 402 can monitor the integrity of the transfers between the service request module 408 and the command traffic bus 214 or the embedded memory bus 209. If the command interpreter 402 detects an error in the transfers, the command process error driver 222 will be asserted. - The
FPGA response bus 428 can be coupled to a routing logic module 430. The routing logic module 430 can be initialized by a table initialize line 432 from the command interpreter 402. The tables in the routing logic module 430 can include a source list, a destination list, and a count for managing the execution of multiple commands in a chained command operation or a command queued operation. - The
routing logic module 430 can be coupled to a service response queue module 434 in order to maintain a response queue for each of the possible commands that are supported by the programmable execution array 110. The routing logic module 430 can store the session number status for each of the operations completed by the programmable execution array 110 as well as the source, destination, and count for each of the command calls executed by the programmable execution array 110. The output of the routing logic module 430 can be coupled to the service request module 408, by a response queue bus 436, in order to convey pending status and results to the memory bus 114 or the network bus 118. - It is understood that the implementation details provided are a possible implementation of the present invention. It is further understood that other implementations may be possible and are included by the description in this specification.
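The session-number bookkeeping described above — a result stored under the session number that labels its command, and released only to the matching requester — can be sketched in software as follows. This is a minimal illustrative model, not the patented hardware; the class and method names are hypothetical and chosen only for this example.

```python
# Illustrative sketch (not the patented implementation): results are
# queued under the session number carried with each command, so a
# queued result can only be retrieved by the session that issued it.

class SessionResultQueue:
    def __init__(self):
        self._pending = {}  # session number -> completed result

    def complete(self, session, result):
        """Record a result under its session number label."""
        self._pending[session] = result

    def fetch(self, session):
        """Return the result for this session, or None if the session
        number does not match any queued result (still pending, or a
        request from the wrong operational thread)."""
        return self._pending.pop(session, None)
```

A mismatched session number simply yields nothing, which loosely models how the queue monitoring keeps a thread switch from picking up another thread's result.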
- It has been discovered that the combination of the
service request module 408, the programmable execution array 110, and the routing logic module 430 can allow the software application 107 of FIG. 1 to execute multiple consecutive commands, multiple commands from different threads, or chained commands while reducing the burden on the general purpose central processing unit 104. The reduction in processing requirements on the general purpose central processing unit 104 can increase the system performance by greater than a factor of 2. - Referring now to
FIG. 5, therein is shown a detailed block diagram of the command interpreter 402 of FIG. 4. The detailed block diagram of the command interpreter 402 depicts a command control module 502 having an FPGA interface controller 504. The FPGA interface controller 504 can include state machines and check logic that provides full integrity during the loading of characterization images and the command input and results. - The
command control module 502 also has a table memory controller 506 and an application ID look-up module 508. The table memory controller 506 can source a memory interface bus 510 that provides enable, address, and read/write control to the memory control function 418 of FIG. 4. - An FPGA
parallel loader 512 is managed by the FPGA interface controller 504. The FPGA parallel loader 512 can provide the FPGA program control bus 416 while also performing error checking and timing control. - The application ID look-up
module 508 can receive the output of a request activity vector module 514 that processes the command stream 409. The command control module 502 can verify the rate and alignment of the request activity vector module 514 during processing. An FPGA utilization vector 516 can maintain the utilization vector and percent utilization of the programmable execution array 110 of FIG. 2 as an aid during reconfiguration of the configurable logic device 210 of FIG. 2. - A per
partition usage register 518 can monitor the usage statistics of all of the commands partitioned in the programmable execution array 110 of FIG. 1. The usage statistics maintained in the per partition usage register 518 can provide session number statistics to track command execution within the programmable execution array 110. The usage statistics can guide the reconfiguration of the configurable logic device 210 by matching the session number of an executable thread for the general purpose central processing unit 104 of FIG. 1. - Referring now to
FIG. 6, therein is shown a detailed block diagram of the application manager 108 of FIG. 1. The detailed block diagram of the application manager 108 depicts the command processor assembly 202 having the command execution interface 112 and the memory bus 114 forming a system interface side of the command processor assembly 202. - An execution side of the
command processor assembly 202 can include the embedded memory bus 209 and the command traffic bus 214. The embedded memory bus 209 can couple to each instance of the configuration memory 208 in the programmable execution array 110. There is one instance of the configuration memory 208 coupled to one instance of the configurable logic device 210 by the configuration bus 218 included in an execution cell 602. The execution cell 602 also includes one instance of the configurable logic device 210, which is coupled to the command traffic bus 214. More than one of the execution cell 602 can be included in the programmable execution array 110. - It is understood that an increased number of the
execution cell 602 can require additional instances of the embedded memory bus 209 and the command traffic bus 214. It is further understood that the implementation of the additional instances of the embedded memory bus 209 and the command traffic bus 214 can allow concurrent execution of commands in the separate instances of the execution cell 602. Some command execution can require configuring multiple instances of the execution cell 602 to execute a single command, such as a double precision mathematical operation. - It has been discovered that the configuration of the
programmable execution array 110 can provide support for concurrent execution, by the execution cell 602, of multiple queued commands from the software application 107 of FIG. 1. It has further been discovered that by carrying a session number as a label through the entire operation, the commands and results can be queued in the service request module 408 and the routing logic module 430 without losing the context of the results. The burden on the general purpose central processing unit 104 of FIG. 1 can be reduced because of a reduction in execution requirements and interrupt processing associated with the software application 107. - Referring now to
FIG. 7, therein is shown a detailed block diagram of the service request module 408 of FIG. 4. The detailed block diagram of the service request module 408 depicts an ingress control module 702 coupled to the memory bus 114 and/or the network bus 118. The ingress control module 702 can support bidirectional communication to allow command delivery and polling by the software application 107 of FIG. 1. The service request module 408 can be instantiated in a remote application manager 302 of FIG. 3 or the application manager 108 of FIG. 1. - The
ingress control module 702 can drive several instances of a command queue bus 704 for transferring a command 706, a subsequent command 707, or a combination thereof to a queuing register 708. The queuing register 708 can be implemented as a volatile memory, non-volatile memory, or register file to store the command 706 for deferred operation, interrupted transfer, or diagnostic operations. It is understood that the command 706 can include all of the input parameters necessary to execute the command 706 within an instance of the execution cell 602 of FIG. 6. A command queuing array 709 is formed by a number of the queuing register 708 coupled to the ingress control module 702. - A
controller bus 710 can couple the ingress control module 702 to a command interpreter interface 712, such as a framer for synchronizing the communication between the ingress control module 702 and the command interpreter 402 of FIG. 4. The command interpreter interface 712 can provide the session identification bus 409 to the command interpreter 402. The command interpreter interface 712 can provide the session number information to the command interpreter 402 for monitoring the status and availability of the execution cell 602 required to execute the command 706. - The queuing
register 708 can provide an FPGA instruction bus 714 that is coupled to a command stateful multiplexer 716. The command stateful multiplexer 716 can have a state tracking module 718 that monitors the session numbers of the command 706 associated with each of the queuing register 708. The ingress control module 702 can determine when all of the parameters, for the command 706, are assembled within the queuing register 708. The queuing register 708 is capable of supporting multiple of the command 706 or parts of the command 706. The state tracking module 718 can allow the command stateful multiplexer 716 to manage a multi-threaded operation of the software application 107 of FIG. 1 as well as queuing of the command 706 for deferred processing. - The
command stateful multiplexer 716 can be controlled by the combination of the state tracking module 718 and the FPGA instruction bus 714. When the ingress control module 702 determines that the command 706 is ready for execution, a signal is sent to the command stateful multiplexer 716 that causes the session identification and the command 706 to be sent through the command stream 410 to the execution cell 602 that is slated to execute the command 706. - The
ingress control module 702 can establish a timeout for any partially loaded command 706. If for any reason the command 706 is not fully loaded in the queuing register 708, an error can be detected and the command interpreter 402 can be alerted. The command interpreter 402 can assemble an error response that can be communicated to the general purpose central processing unit 104 of FIG. 1 through the command process error driver 222. - It is understood that the
service request module 408 has been shown having six of the queuing register 708 as an example of the implementation without limiting the number of the queuing register 708. It is further understood that the software application 107 can communicate with the service request module 408 by polling or by establishing an interrupt through the command interpreter 402. - It has been discovered that the management of the queuing registers 708 can provide a rapid execution path for the
command 706 from the software application 107. Since each of the queuing registers 708 can monitor multiple instances of the command 706, it is possible to link successive operations, chain commands, and perform whole sub-routines without interrupting the progress of the general purpose central processing unit 104. The accelerated execution of the software application 107 can dramatically increase the performance of the general purpose central processing unit 104. - Referring now to
FIG. 8, therein is shown a detailed block diagram of a response routing logic 801. The detailed block diagram of the response routing logic 801 depicts the FPGA response bus 428 coupled to the routing logic module 430. The routing logic module 430 is subsequently coupled to the service response queue module 434 for response destination management. - The service
response queue module 434 can include an FPGA bus interface 802. The FPGA bus interface 802 can direct the FPGA response bus 428 to a response queuing register 804 in a response queuing array 805 through a session selected bus 806. The response queuing array 805 can match the operation of the command queuing array 709 of FIG. 7 in order to align the correct instance of a response 808 on the response queue bus 436. - The
response 808 can include status and data associated with the execution of the command 706 of FIG. 7 by the execution cell 602 of FIG. 6. The response 808 can be provided through a result and status bus 810 to a result stateful multiplexer 812. The result stateful multiplexer 812 can include a response session tracking module 814 for determining which of the response queuing register 804 in the response queuing array 805 can contain the response 808 for a specific session identification. - A
response steering bus 816 can couple the FPGA bus interface 802 to the result stateful multiplexer 812. The result stateful multiplexer 812 can convey the response 808 from the response queuing register 804 to the FPGA bus interface 802 for transfer through a response activity bus 818. The result stateful multiplexer 812 can provide conditional interrupts to the general purpose central processing unit 104 to provide notification of completion of the command 706, an error status, or an exception condition. - It is understood that the service
response queue module 434 has been shown having six of the response queuing register 804 as an example of the implementation without limiting the number of the response queuing register 804. It is further understood that the software application 107 can communicate with the routing logic module 430 by polling or by establishing an interrupt through the memory bus 114 and the service request module 408. - It has been discovered that the management of the
response queuing register 804 can provide a rapid execution path for the response 808 to the command 706 from the software application 107, including providing the response 808 correctly aligned with a queued command 706. Since each of the response queuing register 804 can monitor multiple instances of the response 808, it is possible to link successive operations and perform whole sub-routines without interrupting the progress of the general purpose central processing unit 104. The accelerated execution of the software application 107 can dramatically increase the performance of the general purpose central processing unit 104. - The
routing logic module 430 can include a source/destination table 822 that can be coupled to a counter 820. The table initialize line 432 can cause the source/destination table 822 to be loaded with source and destination information for each command in a set of chained commands, including a count of how many of the commands are in the chain. The source/destination table 822 and the counter 820 can work together to define which commands can be loaded into the remote application manager 122 of FIG. 2. The completion of the command 706 can cause the source/destination table 822 to match the source identification of the alternate remote application manager 306 in order to determine which of the alternate remote application manager 306 should receive the intermediate results of the command 706 in order to execute the next command in the chained commands. As each of the commands 706 in the chained commands is executed, the intermediate results are passed to the next instance of the alternate remote application manager 306 and the value in the counter 820 is decremented; when the value reaches zero, the chain of the commands 706 is complete. The final results can be indicated by the value stored when the counter 820 reaches the zero count. Those results can be passed to the application manager 108 of FIG. 1 through the network bus 118. A single instance of the remote application manager 302 can monitor the results and respond to the software application 107 through the network bus 118 and the memory bus 114. - A key feature of the
routing logic module 430 is that it can receive an input through the FPGA response bus 428 from the routing logic module 430 in the alternate remote application manager 306 of FIG. 3. The routing logic module 430 receiving the input can interrogate the source/destination table 822 in order to determine which instance of the alternate remote application manager 306 is in line to receive the intermediate results and when the chained commands are complete. - Referring now to
FIG. 9, therein is shown a flow chart of a method 900 of operation of a hardware computing system in a further embodiment of the present invention. The method 900 includes: transferring a command to a remote application manager through a network bus in a block 902; transferring a command from the command stream by an application manager in a block 904; identifying, with a routing logic module, a subsequent command to be executed by an alternate remote application manager in a block 906; and providing a response through the network bus for the command in a block 908. - The resulting method, process, apparatus, device, product, and/or system is straightforward, cost-effective, uncomplicated, highly versatile, accurate, sensitive, and effective, and can be implemented by adapting known components for ready, efficient, and economical manufacturing, application, and utilization.
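As an illustrative aside (not part of the patent disclosure), the session-labeled queuing and response alignment described for the service request module 408 and the response routing logic 801 can be sketched as a minimal software model. All class and method names below are hypothetical, chosen only to mirror the roles of the described hardware modules.

```python
# Hypothetical software model of session-labeled command queuing (FIG. 7)
# and session-aligned response routing (FIG. 8). Not an implementation of
# the claimed hardware; names and structure are assumptions for illustration.
from collections import OrderedDict

class ServiceRequestModule:
    """Accumulates command parameters per session until a command is complete."""
    def __init__(self, num_registers=6):
        self.num_registers = num_registers   # six queuing registers shown as an example
        self.queues = OrderedDict()          # session id -> accumulated parameters

    def load(self, session_id, param, last=False):
        """Ingress control: accumulate parameters; return True when fully loaded."""
        if session_id not in self.queues and len(self.queues) >= self.num_registers:
            raise RuntimeError("no free queuing register")
        self.queues.setdefault(session_id, []).append(param)
        return last

    def dispatch(self, session_id):
        """Stateful multiplexer: emit the complete command with its session label."""
        return session_id, self.queues.pop(session_id)

class ResponseRoutingLogic:
    """Aligns each response with its queued command by session id."""
    def __init__(self):
        self.responses = {}

    def post(self, session_id, result):
        self.responses[session_id] = result

    def poll(self, session_id):
        return self.responses.pop(session_id, None)

srm = ServiceRequestModule()
rrl = ResponseRoutingLogic()
srm.load(7, "opcode=add")
if srm.load(7, "operands=(2, 3)", last=True):
    sid, command = srm.dispatch(7)   # session label travels with the command
    rrl.post(sid, 5)                 # execution-cell result, tagged by session
print(rrl.poll(7))                   # -> 5
```

Because the session number travels with both the command and the result, the model preserves the context of each result even when multiple commands are outstanding, matching the multi-threaded operation attributed to the state tracking module 718.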
- Another important aspect of the present invention is that it valuably supports and services the historical trend of reducing costs, simplifying systems, and increasing performance.
- These and other valuable aspects of the present invention consequently further the state of the technology to at least the next level.
- While the invention has been described in conjunction with a specific best mode, it is to be understood that many alternatives, modifications, and variations will be apparent to those skilled in the art in light of the foregoing description. Accordingly, it is intended to embrace all such alternatives, modifications, and variations that fall within the scope of the included claims. All matters hitherto set forth herein or shown in the accompanying drawings are to be interpreted in an illustrative and non-limiting sense.
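As a further illustrative aside (not part of the claimed subject matter), the chained-command routing described for the routing logic module 430, with its source/destination table 822 and decrementing counter 820, can be modeled in a few lines. The function and parameter names are hypothetical.

```python
# Hypothetical model of chained-command routing: a source/destination table
# holds one (source, destination) pair per chained command, and a counter
# initialized to the chain length decrements as each command completes.
# Reaching zero indicates the chain is complete and yields the final result.

def run_chain(table, execute):
    """table: list of (source, destination) pairs, one per chained command.
    execute: callable(step_index, value) -> intermediate result.
    Returns the final result once the counter reaches zero."""
    counter = len(table)                 # count of commands in the chain
    value = None
    for step, (source, destination) in enumerate(table):
        value = execute(step, value)     # run on the manager named by `source`
        counter -= 1                     # decrement as each command completes
        if counter == 0:
            return value                 # chain complete: final result
        # otherwise the intermediate result is passed on to `destination`
    return value

# Example: three chained commands, each adding 10 to the prior result.
result = run_chain(
    table=[("mgr0", "mgr1"), ("mgr1", "mgr2"), ("mgr2", "mgr0")],
    execute=lambda step, value: (value or 0) + 10,
)
print(result)  # -> 30
```

In this sketch the table plays the role of the source/destination table 822 (deciding which alternate remote application manager receives each intermediate result) and the loop counter plays the role of the counter 820.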
Claims (20)
1. A method of operation of a hardware computing system comprising:
transferring a command to a remote application manager through a network bus;
executing, by a programmable execution array of the remote application manager, the command;
identifying, with a routing logic module, a subsequent command to be executed by an alternate remote application manager; and
providing a response through the network bus for the command.
2. The method as claimed in claim 1 wherein transferring the command to the remote application manager includes:
receiving, by an ingress control module, the command;
transferring to a queuing register the parameters of the command; and
driving an FPGA instruction bus with the parameters of the command.
3. The method as claimed in claim 1 wherein executing, by the programmable execution array, includes:
receiving, by a command interpreter interface, parameters for the command; and
initializing a state tracking module with a session number from the command interpreter interface.
4. The method as claimed in claim 1 wherein identifying, with the routing logic module, the subsequent command to be executed by the alternate remote application manager, includes:
configuring a programmable logic device of the alternate remote application manager for the subsequent command; and
decrementing a counter in the routing logic module of the remote application manager when the programmable logic device of the alternate remote application manager completes the subsequent command.
5. The method as claimed in claim 1 further comprising accessing a remote status bus between the remote application manager and the alternate remote application manager for passing an interim result of the command.
6. A method of operation of a hardware computing system comprising:
transferring a command to a remote application manager through a network bus including loading a queuing register;
loading a programmable execution array, of the remote application manager, with the command including monitoring by an ingress control module for parameters of the command;
identifying, with a routing logic module, a subsequent command to be executed by an alternate remote application manager; and
providing a response through the network bus for the command if a counter decrements to zero.
7. The method as claimed in claim 6 wherein transferring the command to the remote application manager includes:
receiving, by an ingress control module, the command including transferring the command through a memory bus;
transferring to a queuing register the parameters of the command including transferring the content of the memory bus to the network bus; and
driving an FPGA instruction bus with the parameters of the command including activating a remote exception line if an exception occurs in the remote application manager.
8. The method as claimed in claim 6 wherein executing, by the programmable execution array, includes:
receiving, by a command interpreter interface, parameters for the command including transferring from an application manager; and
initializing a state tracking module with a session number from the command interpreter interface including responding to a response queuing register in the application manager.
9. The method as claimed in claim 6 wherein identifying, with the routing logic module, the subsequent command to be executed by the alternate remote application manager, includes:
configuring a programmable logic device of the alternate remote application manager for the subsequent command including passing an interim result by a remote status bus; and
wherein:
decrementing the counter in the routing logic module of the remote application manager when the programmable logic device of the alternate remote application manager completes the subsequent command including driving an FPGA response bus by the alternate remote application manager.
10. The method as claimed in claim 6 further comprising accessing a remote status bus between the remote application manager and the alternate remote application manager for passing an interim result of the command including passing a response to an application manager through the network bus.
11. A hardware computing system comprising:
a network bus for transferring a command;
a remote application manager coupled to the network bus for receiving the command;
a programmable execution array coupled to the remote application manager configured for executing the command;
a routing logic module in the remote application manager configured to determine a subsequent command to be executed by an alternate remote application manager; and
a general purpose central processing unit coupled to the network bus configured to receive a response to the command from the remote application manager, the alternate remote application manager, or a combination thereof.
12. The system as claimed in claim 11 wherein the remote application manager coupled to the network bus, includes:
an ingress control module configured to receive the command;
a queuing register configured to receive the parameters of the command; and
an FPGA instruction bus propagates the parameters of the command.
13. The system as claimed in claim 11 wherein the programmable execution array, includes:
a command interpreter interface, configured to receive the parameters for the command; and
a state tracking module initialized with a session number from the command interpreter interface.
14. The system as claimed in claim 11 wherein the alternate remote application manager configured to identify, with the routing logic module, the subsequent command to be executed, includes:
a programmable logic device of the alternate remote application manager configured for the subsequent command; and
a counter in the routing logic module of the remote application manager decremented when the programmable logic device of the alternate remote application manager completes the subsequent command.
15. The system as claimed in claim 11 further comprising a remote status bus between the remote application manager and the alternate remote application manager accessed to pass an interim result of the command.
16. The system as claimed in claim 11 further comprising:
a queuing register in the remote application manager; and
an ingress control module monitored for parameters of the command.
17. The system as claimed in claim 16 wherein the remote application manager coupled to the network bus, includes:
an ingress control module configured to receive the command includes a memory bus configured to transfer the command;
a queuing register configured to receive the parameters of the command by the memory bus coupled to the network bus; and
an FPGA instruction bus propagates the parameters of the command includes a remote exception line activated if an exception occurs in the remote application manager.
18. The system as claimed in claim 16 wherein the programmable execution array, includes:
a command interpreter interface, configured to receive the parameters for the command includes an application manager coupled to the network bus; and
a state tracking module initialized with a session number from the command interpreter interface includes a response queuing register in the remote application manager.
19. The system as claimed in claim 16 wherein the alternate remote application manager configured to identify, with the routing logic module, the subsequent command to be executed, includes:
a programmable logic device of the alternate remote application manager configured for the subsequent command includes a remote status bus between the remote application manager and the alternate remote application manager; and
a counter in the routing logic module of the remote application manager decremented when the programmable logic device of the alternate remote application manager completes the subsequent command includes an FPGA response bus driven by the alternate remote application manager.
20. The system as claimed in claim 16 further comprising a remote status bus between the remote application manager and the alternate remote application manager accessed to pass an interim result of the command includes an application manager received a response through the network bus.
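As a final illustrative aside (outside the claims), the ingress-control timeout for a partially loaded command, described in the specification with reference to the ingress control module 702 and the command interpreter 402, can be sketched as follows. Class and method names are hypothetical.

```python
# Hypothetical sketch of the ingress-control timeout: if a command's
# parameters are not fully assembled before a deadline, an error is raised
# toward the command interpreter instead of dispatching a partial command.
import time

class PartialCommandTimeout(Exception):
    """Alert surfaced to the command interpreter / command process error driver."""

class IngressControl:
    def __init__(self, timeout_s=0.05):
        self.timeout_s = timeout_s
        self.pending = {}                 # session id -> (deadline, params)

    def begin(self, session_id):
        self.pending[session_id] = (time.monotonic() + self.timeout_s, [])

    def add_param(self, session_id, param):
        deadline, params = self.pending[session_id]
        if time.monotonic() > deadline:
            del self.pending[session_id]  # abandon the partial command
            raise PartialCommandTimeout(f"session {session_id} not fully loaded")
        params.append(param)

    def finish(self, session_id):
        deadline, params = self.pending.pop(session_id)
        if time.monotonic() > deadline:
            raise PartialCommandTimeout(f"session {session_id} not fully loaded")
        return params                     # fully loaded command, ready to dispatch
```

A monotonic clock is used for the deadline so that wall-clock adjustments cannot spuriously expire or extend a pending command.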
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/846,187 US20130226986A1 (en) | 2012-03-19 | 2013-03-18 | Hardware computing system with extended processing and method of operation thereof |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201261612882P | 2012-03-19 | 2012-03-19 | |
US13/846,187 US20130226986A1 (en) | 2012-03-19 | 2013-03-18 | Hardware computing system with extended processing and method of operation thereof |
Publications (1)
Publication Number | Publication Date |
---|---|
US20130226986A1 true US20130226986A1 (en) | 2013-08-29 |
Family
ID=48611638
Family Applications (3)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/764,154 Active 2033-06-14 US9055069B2 (en) | 2012-03-19 | 2013-02-11 | Hardware computing system with software mediation and method of operation thereof |
US13/789,649 Active 2033-08-21 US9055070B2 (en) | 2012-03-19 | 2013-03-07 | Hardware computing system with extended calculation and method of operation thereof |
US13/846,187 Abandoned US20130226986A1 (en) | 2012-03-19 | 2013-03-18 | Hardware computing system with extended processing and method of operation thereof |
Family Applications Before (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/764,154 Active 2033-06-14 US9055069B2 (en) | 2012-03-19 | 2013-02-11 | Hardware computing system with software mediation and method of operation thereof |
US13/789,649 Active 2033-08-21 US9055070B2 (en) | 2012-03-19 | 2013-03-07 | Hardware computing system with extended calculation and method of operation thereof |
Country Status (2)
Country | Link |
---|---|
US (3) | US9055069B2 (en) |
WO (3) | WO2013142446A1 (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2012154612A1 (en) * | 2011-05-06 | 2012-11-15 | Xcelemor, Inc. | Computing system with hardware scheduled reconfiguration mechanism and method of operation thereof |
CN105573789B (en) * | 2015-09-07 | 2017-08-08 | 武汉精测电子技术股份有限公司 | The many image upgrade loading methods of FPGA and device based on soft-core processor |
US10320390B1 (en) | 2016-11-17 | 2019-06-11 | X Development Llc | Field programmable gate array including coupled lookup tables |
US20180143860A1 (en) * | 2016-11-22 | 2018-05-24 | Intel Corporation | Methods and apparatus for programmable integrated circuit coprocessor sector management |
US11416422B2 (en) | 2019-09-17 | 2022-08-16 | Micron Technology, Inc. | Memory chip having an integrated data mover |
US11163490B2 (en) * | 2019-09-17 | 2021-11-02 | Micron Technology, Inc. | Programmable engine for data movement |
US11397694B2 (en) | 2019-09-17 | 2022-07-26 | Micron Technology, Inc. | Memory chip connecting a system on a chip and an accelerator chip |
US20220321403A1 (en) * | 2021-04-02 | 2022-10-06 | Nokia Solutions And Networks Oy | Programmable network segmentation for multi-tenant fpgas in cloud infrastructures |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4228496A (en) * | 1976-09-07 | 1980-10-14 | Tandem Computers Incorporated | Multiprocessor system |
US20030212870A1 (en) * | 2002-05-08 | 2003-11-13 | Nowakowski Steven Edmund | Method and apparatus for mirroring data stored in a mass storage system |
US20050149937A1 (en) * | 2003-12-19 | 2005-07-07 | Stmicroelectronics, Inc. | Accelerator for multi-processing system and method |
US20070143759A1 (en) * | 2005-12-15 | 2007-06-21 | Aysel Ozgur | Scheduling and partitioning tasks via architecture-aware feedback information |
US20090327610A1 (en) * | 2005-11-04 | 2009-12-31 | Commissariat A L'energie Atomique | Method and System for Conducting Intensive Multitask and Multiflow Calculation in Real-Time |
US20100058036A1 (en) * | 2008-08-29 | 2010-03-04 | International Business Machines Corporation | Distributed Acceleration Devices Management for Streams Processing |
US7761643B1 (en) * | 2004-08-27 | 2010-07-20 | Xilinx, Inc. | Network media access controller embedded in an integrated circuit host interface |
US20110202924A1 (en) * | 2010-02-17 | 2011-08-18 | Microsoft Corporation | Asynchronous Task Execution |
US20120020250A1 (en) * | 2010-05-18 | 2012-01-26 | Lsi Corporation | Shared task parameters in a scheduler of a network processor |
Family Cites Families (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101332840B1 (en) * | 2012-01-05 | 2013-11-27 | 서울대학교산학협력단 | Cluster system, Host node, Computing node, and application execution method based on parallel computing framework |
US5555201A (en) | 1990-04-06 | 1996-09-10 | Lsi Logic Corporation | Method and system for creating and validating low level description of electronic design from higher level, behavior-oriented description, including interactive system for hierarchical display of control and dataflow information |
WO1996013902A1 (en) * | 1994-11-01 | 1996-05-09 | Virtual Machine Works, Inc. | Programmable multiplexing input/output port |
US6628653B1 (en) * | 1998-06-04 | 2003-09-30 | Nortel Networks Limited | Programmable packet switching device |
US7823131B2 (en) | 2001-06-29 | 2010-10-26 | Mentor Graphics Corporation | Debugger for a hardware-implemented operating system |
CN1605058A (en) | 2001-10-16 | 2005-04-06 | 捷豹逻辑股份有限公司 | Interface architecture for embedded field programmable gate array cores |
WO2004027576A2 (en) | 2002-09-18 | 2004-04-01 | Netezza Corporation | Asymmetric data streaming architecture having autonomous and asynchronous job processing unit |
US7418575B2 (en) | 2003-07-29 | 2008-08-26 | Stretch, Inc. | Long instruction word processing with instruction extensions |
US7934005B2 (en) * | 2003-09-08 | 2011-04-26 | Koolspan, Inc. | Subnet box |
US8397214B2 (en) * | 2004-05-14 | 2013-03-12 | National Instruments Corporation | Generating a hardware description for a programmable hardware element based on a graphical program including multiple physical domains |
US7594226B2 (en) * | 2004-08-16 | 2009-09-22 | National Instruments Corporation | Implementation of packet-based communications in a reconfigurable hardware element |
US7299339B2 (en) * | 2004-08-30 | 2007-11-20 | The Boeing Company | Super-reconfigurable fabric architecture (SURFA): a multi-FPGA parallel processing architecture for COTS hybrid computing framework |
US7822243B2 (en) * | 2005-06-02 | 2010-10-26 | Siemens Aktiengesellschaft | Method and device for image reconstruction |
US8127113B1 (en) | 2006-12-01 | 2012-02-28 | Synopsys, Inc. | Generating hardware accelerators and processor offloads |
US7991909B1 (en) * | 2007-03-27 | 2011-08-02 | Xilinx, Inc. | Method and apparatus for communication between a processor and processing elements in an integrated circuit |
US8059670B2 (en) | 2007-08-01 | 2011-11-15 | Texas Instruments Incorporated | Hardware queue management with distributed linking information |
US8176466B2 (en) | 2007-10-01 | 2012-05-08 | Adobe Systems Incorporated | System and method for generating an application fragment |
CA2637132A1 (en) | 2008-07-09 | 2010-01-09 | Scott Tripp | Moulding for building exterior |
US8014295B2 (en) * | 2009-07-14 | 2011-09-06 | Ixia | Parallel packet processor with session active checker |
US8364946B2 (en) | 2010-03-22 | 2013-01-29 | Ishebabi Harold | Reconfigurable computing system and method of developing application for deployment on the same |
KR20120117151A (en) | 2011-04-14 | 2012-10-24 | 삼성전자주식회사 | Apparatus and method for generating performing virtual machine vm migration process in device |
US8910177B2 (en) | 2011-04-14 | 2014-12-09 | Advanced Micro Devices, Inc. | Dynamic mapping of logical cores |
- 2013
- 2013-02-11 US US13/764,154 patent/US9055069B2/en active Active
- 2013-03-07 US US13/789,649 patent/US9055070B2/en active Active
- 2013-03-18 WO PCT/US2013/032867 patent/WO2013142446A1/en active Application Filing
- 2013-03-18 WO PCT/US2013/032868 patent/WO2013142447A1/en active Application Filing
- 2013-03-18 US US13/846,187 patent/US20130226986A1/en not_active Abandoned
- 2013-03-18 WO PCT/US2013/032866 patent/WO2013142445A1/en active Application Filing
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140215012A1 (en) * | 2010-11-22 | 2014-07-31 | Samsung Electronics Co., Ltd. | Method and apparatus for executing application of mobile device |
US9215271B2 (en) * | 2010-11-22 | 2015-12-15 | Samsung Electronics Co., Ltd | Method and apparatus for executing application of mobile device |
US10853070B1 (en) * | 2017-10-23 | 2020-12-01 | Habana Labs Ltd. | Processor suspension buffer and instruction queue |
Also Published As
Publication number | Publication date |
---|---|
WO2013142446A1 (en) | 2013-09-26 |
US20130160031A1 (en) | 2013-06-20 |
US9055069B2 (en) | 2015-06-09 |
WO2013142447A1 (en) | 2013-09-26 |
US20130191854A1 (en) | 2013-07-25 |
WO2013142445A1 (en) | 2013-09-26 |
US9055070B2 (en) | 2015-06-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20130226986A1 (en) | Hardware computing system with extended processing and method of operation thereof | |
JP7313381B2 (en) | Embedded scheduling of hardware resources for hardware acceleration | |
US11941434B2 (en) | Task processing method, processing apparatus, and computer system | |
US8874681B2 (en) | Remote direct memory access (‘RDMA’) in a parallel computer | |
US9317318B2 (en) | Virtual machine monitor configured to support latency sensitive virtual machines | |
US8850262B2 (en) | Inter-processor failure detection and recovery | |
US8898517B2 (en) | Handling a failed processor of a multiprocessor information handling system | |
US9720676B2 (en) | Implementing updates to source code executing on a plurality of compute nodes | |
US20150269640A1 (en) | Capacity Upgrade on Demand Automation | |
US9979799B2 (en) | Impersonating a specific physical hardware configuration on a standard server | |
US8688831B2 (en) | Managing workload distribution among a plurality of compute nodes | |
US8447912B2 (en) | Paging memory from random access memory to backing storage in a parallel computer | |
US9977730B2 (en) | System and method for optimizing system memory and input/output operations memory | |
US11951999B2 (en) | Control unit for vehicle and error management method thereof | |
US20110173422A1 (en) | Pause processor hardware thread until pin | |
US8151028B2 (en) | Information processing apparatus and control method thereof | |
US8336055B2 (en) | Determining the status of virtual storage in the first memory within the first operating system and reserving resources for use by augmenting operating system | |
US11048523B2 (en) | Enabling software sensor power operation requests via baseboard management controller (BMC) | |
US20230024607A1 (en) | System-on-chip for sharing graphics processing unit that supports multimaster, and method for operating graphics processing unit | |
US20230342234A1 (en) | System management mode (smm) error handler | |
US11966750B2 (en) | System-on-chip management controller | |
US20230230101A1 (en) | Method for validating a product portfolio | |
US20240031263A1 (en) | Methods and apparatus to improve management operations of a cloud computing environment | |
JP2000267946A (en) | General purpose computer device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: XCELEMOR, INC., CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:ZIEVERS, PETER J.;REEL/FRAME:030035/0084 Effective date: 20130318 |
|
STCB | Information on status: application discontinuation |
Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |