SYSTEM AND METHOD FOR EFFECTIVELY UTILIZING A CACHE MEMORY IN AN ELECTRONIC DEVICE
BACKGROUND SECTION
1. Field of the Invention
This invention relates generally to techniques for implementing memory devices, and relates more particularly to a system and method for effectively utilizing a cache memory in an electronic device.
2. Description of the Background Art
Implementing effective methods for performing various data transfer operations in electronic devices is a significant consideration for designers and manufacturers of contemporary electronic devices. For example, an electronic device may advantageously communicate with other electronic devices in an electronic interconnect to share data to thereby substantially increase the capabilities and versatility of individual devices in the electronic interconnect. In certain instances, an electronic interconnect may be implemented in a home environment to enable flexible and beneficial sharing of data and device resources between various consumer electronic devices, such as personal computers, digital video disc (DVD) devices, digital set-top boxes for digital broadcasting, enhanced television sets, and audio reproduction systems.
Effectively managing data transfer operations in an interconnect of electronic devices may create substantial challenges for designers of electronic devices. For example, enhanced demands for increased device functionality and performance may require more system processing power and require additional resources. An increase in processing or hardware requirements may also result in a corresponding detrimental economic impact due to increased production costs and operational inefficiencies.
Interconnect size is also a factor that affects data transfer operations in an electronic device. Communications in an electronic interconnect typically become more complex as the number of individual devices or nodes increases. Assume that a particular device on an electronic interconnect is defined as a local device with local software elements, and other devices on the electronic interconnect are defined as remote devices with remote software elements. Accordingly, a local software module on the local device may need to transfer data to various remote software elements on remote devices across the electronic interconnect. However, successfully managing a substantial number of electronic devices across an interconnect may provide significant benefits to a system user.
Furthermore, enhanced device capability to perform various advanced processing tasks may provide additional benefits to a system user, but may also place increased demands on the control and management of the various devices in the electronic interconnect. For example, an enhanced electronic interconnect that effectively accesses, processes, and displays digital television programming may benefit from efficient interconnect communication techniques because of the large amount and complexity of the digital data involved. Due to growing demands on system processor resources and substantially increasing data magnitudes, it is apparent that developing new and effective methods for performing data transfer operations is a matter of importance for the related electronic technologies. Therefore, for all the foregoing reasons, implementing effective methods for performing data transfers in electronic devices remains a significant consideration for designers, manufacturers, and users of contemporary electronic devices.
SUMMARY

In accordance with the present invention, a system and method are disclosed for effectively utilizing cache memory in an electronic device. In one embodiment, initially, a processor sequentially executes program instructions of a device application. In certain instances, the foregoing program instructions may include one or more isochronous load instructions that instruct the processor to load time-sensitive isochronous data from a memory into a specific corresponding mapped location of a local cache.
In accordance with certain embodiments, the processor may advantageously instruct the cache to create a marker for inclusion in a particular storage segment to indicate that information stored therein includes special information, such as isochronous data. In certain embodiments, the marker may prevent the cache from removing the marked isochronous data without the prior occurrence of predetermined rollout exception events.
For example, in one embodiment, if a target location in the cache currently comprises a segment that includes initial isochronous data designated by a marker, and if another isochronous load instruction creates a conflict by mapping subsequent isochronous data from the source memory to the same target location in the cache, then the processor preferably may rollout the initial isochronous data to permit the subsequent isochronous data from the source memory to be marked and loaded into that particular segment of the cache.
In addition, the device application may instruct the processor to rollout a selectable marked segment of the cache in response to various changes of status in the host electronic device. For example, if an isochronous process is aborted, then the corresponding isochronous data may no longer be required in the cache, and the device application may advantageously issue a rollout command to thereby optimize performance of the cache. Similarly, when a particular isochronous process is completed, a rollout exception may be provided in which the device application issues a rollout command to empty a corresponding marked segment of the cache.
For example, in the case where a segment includes sixty-four bytes of isochronous data, the device application may then advantageously be notified when the final or sixty-fourth byte of isochronous data is accessed from the marked segment and utilized. In response, the device application may then issue a rollout command to return the cached isochronous data to a corresponding location in the source memory.
In certain embodiments, if the device application performs a time- sensitive isochronous process that requires precise deterministic behavior, then the device application may advantageously issue various types of isochronous prefetch load instructions to facilitate efficient and timely completion of the isochronous process. For example, if the device application knows that a certain block of isochronous data must be moved to the cache, then the device application may issue an isochronous prefetch load instruction in advance to notify the source memory to transfer all or part of the foregoing block of isochronous data, rather than sending individual isochronous load instructions for each line of the block of isochronous data.
Use of isochronous prefetch load instructions may thus result in a more efficient and timely isochronous data transfer because the processor need not wait to complete the transfer of an individual line of isochronous data from the source memory before beginning the transfer of a subsequent line of isochronous data. The processor, source memory, and cache may execute the foregoing isochronous prefetch load instruction using any appropriate and effective technique that ensures that the transfer of any given portion of the isochronous data occurs prior to the designated time for utilizing that given portion of the isochronous data. The present invention thus advantageously provides effective and efficient techniques for utilizing a cache memory in an electronic device.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram for one embodiment of an electronic interconnect, in accordance with the present invention;
FIG. 2 is a block diagram for one embodiment of an exemplary device of FIG. 1, in accordance with the present invention;
FIG. 3 is a diagram for one embodiment of the memory of FIG. 2, in accordance with the present invention;
FIG. 4 is a diagram for one embodiment of the cache of FIG. 2, in accordance with the present invention;
FIG. 5 is a diagram for one embodiment of a segment of the cache of FIG. 4, in accordance with the present invention;
FIG. 6 is a block diagram illustrating a procedure for effectively utilizing a cache, in accordance with one embodiment of the present invention; and
FIGS. 7A and 7B are a flowchart of method steps for effectively utilizing a cache, in accordance with one embodiment of the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

The present invention relates to an improvement in electronic devices. The following description is presented to enable one of ordinary skill in the art to make and use the invention and is provided in the context of a patent application and its requirements. Various modifications to the preferred embodiment will be readily apparent to those skilled in the art, and the generic principles herein may be applied to other embodiments. Thus, the present invention is not intended to be limited to the embodiment shown, but is to be accorded the widest scope consistent with the principles and features described herein.
The present invention comprises a system and method for effectively utilizing cache memory in an electronic device, and includes a processor that operates in response to a software program to insert a detectable marker in isochronous data that is stored into the cache memory. The marker may then be utilized to identify the isochronous data as special information that is protected from removal from the cache device without the occurrence of a predetermined rollout exception event.
Referring now to FIG. 1, a block diagram for one embodiment of an electronic interconnect 110 is shown, according to the present invention. In the FIG. 1 embodiment, interconnect 110 preferably comprises, but is not limited to, a number of electronic devices 112 (device A 112(a), device B 112(b), root device 114, device C 112(c), device D 112(d), and device E 112(e)).
In alternate embodiments, electronic interconnect 110 may readily be configured to include various other devices 112 or components that function in addition to, or instead of, those discussed in conjunction with the FIG. 1 embodiment. In alternate embodiments, interconnect 110 may readily be connected and configured in any other appropriate and suitable manner.
In the FIG. 1 embodiment, devices 112 of interconnect 110 may be implemented as any type of electronic device, including, but not limited to, personal computers, printers, digital video disc devices, television sets, audio systems, video cassette recorders, and set-top boxes for digital broadcasting.
In the FIG. 1 embodiment, devices 112 preferably communicate with one another using a bus link 132. Bus link 132 preferably includes path 132(a), path 132(b), path 132(c), path 132(d), and path 132(e). For example, in one embodiment, device B 112(b) is coupled to device A 112(a) via path 132(a), and to root device 114 via path 132(b). Similarly, root device 114 is coupled to device C 112(c) via path 132(c), and to device D 112(d) via path 132(d). In addition, device D 112(d) is coupled to device E 112(e) via path 132(e). In the FIG. 1 embodiment, bus link 132 is preferably implemented using the IEEE Std 1394-1995 Standard for a High Performance Serial Bus, which is hereby incorporated by reference. However, in alternate embodiments, interconnect 110 may readily communicate and function using various other interconnect methodologies which are equally within the scope of the present invention.

In the FIG. 1 embodiment, each device in electronic interconnect 110 may preferably communicate with any other device within interconnect 110. For example, device E 112(e) may communicate with device B 112(b) by transmitting transfer data via path 132(e) to device D 112(d), which then may transmit the transfer data via path 132(d) to root device 114. In response, root device 114 then may transmit the transfer data to device B 112(b) via path 132(b). In the FIG. 1 embodiment, root device 114 preferably provides a master cycle start signal to synchronize isochronous processes for devices 112 in interconnect 110. In other embodiments of interconnect 110, any one of the interconnect devices 112 may be designated as the root device or cycle master.
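The hop-by-hop routing illustrated above (device E 112(e) reaching device B 112(b) through device D 112(d) and root device 114) amounts to walking up the FIG. 1 tree and back down. The following Python sketch is purely illustrative; the helper names are assumptions, with only the topology taken from FIG. 1:

```python
# Topology of the FIG. 1 interconnect: each entry maps a device to its
# parent on the tree formed by paths 132(a) through 132(e).
LINKS = {"A": "B", "B": "root", "C": "root", "D": "root", "E": "D"}

def path_to_root(device):
    """List the devices visited walking from `device` up to the root."""
    hops = [device]
    while hops[-1] != "root":
        hops.append(LINKS[hops[-1]])
    return hops

def route(src, dst):
    """Route via the lowest device common to both paths to the root."""
    up, down = path_to_root(src), path_to_root(dst)
    common = next(node for node in up if node in down)
    return up[:up.index(common)] + down[down.index(common)::-1]

print(route("E", "B"))  # ['E', 'D', 'root', 'B'], matching the example
```

The printed path matches the transfer described in the text: E to D via path 132(e), D to root via path 132(d), and root to B via path 132(b).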
Referring now to FIG. 2, a block diagram for one embodiment of an exemplary device 112 from interconnect 110 is shown, in accordance with the present invention. Device 112 preferably includes, but is not limited to, a processor 212, an input/output (I/O) interface 214, a memory 216, a device bus 226, and a bus interface 220. Processor 212, I/O interface 214, memory 216, and bus interface 220 preferably are each coupled to, and communicate via, common device bus 226.
In the FIG. 2 embodiment, processor 212 may be implemented as any appropriate multipurpose microprocessor device. Memory 216 may be implemented as one or more appropriate storage devices, including, but not limited to, read-only memory, random-access memory, and various types of non-volatile memory, such as floppy disc devices or hard disc devices. I/O interface 214 preferably may provide an interface for communications with various compatible sources and/or destinations.
In accordance with the present invention, bus interface 220 preferably provides an interface between device 112 and interconnect 110. In the FIG. 2 embodiment, bus interface 220 preferably communicates with other devices 112 on interconnect 110 via bus link 132. Bus interface 220 also preferably communicates with processor 212, I/O interface 214, and memory 216 via common device bus 226.
In the FIG. 2 embodiment, device 112 preferably includes the capability to perform various tasks that involve isochronous data and isochronous processes. Isochronous data typically includes information that is time-sensitive, and therefore requires deterministic transfer operations to guarantee delivery of the isochronous data in a timely manner. For example, video data that is intended for immediate display must arrive at the appropriate destination in a timely manner in order to prevent jitter or breakup of the corresponding image during display. To achieve this goal, device 112 preferably performs isochronous and other types of processing in segments of time called "cycles".
Scheduling of isochronous processes typically requires a finite time period that is sometimes referred to as "overhead". As the cycle time period is reduced, the overhead becomes a more significant factor because of the reduced amount of time remaining to perform the actual isochronous transfer. In the FIG. 2 embodiment, the cycle time period may be in the proximity of 125 microseconds, with a cycle frequency of approximately eight kilohertz.
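The two figures quoted above are mutually consistent, since cycle frequency is the reciprocal of cycle period. A one-line check (Python is used here purely for illustration):

```python
cycle_period_s = 125e-6                    # 125-microsecond cycle period
cycle_frequency_hz = 1.0 / cycle_period_s  # frequency is 1 / period
print(cycle_frequency_hz)                  # the cycle rate in hertz: 8 kHz
```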
In the FIG. 2 embodiment, processor 212 preferably includes cache 230 which processor 212 may utilize to locally store information from memory
216 for rapid and convenient local access. In alternate embodiments, cache 230 may be implemented in any other appropriate location and manner. The functionality and configuration of cache 230 are further discussed below in conjunction with FIGS. 4 through 7.
Referring now to FIG. 3, a diagram for one embodiment of the FIG. 2 memory 216 is shown, in accordance with the present invention. In the FIG. 3 embodiment, memory 216 preferably includes, but is not limited to, device software 312, isochronous data 314, and non-isochronous data 316. In alternate embodiments, memory 216 may readily include various other components in addition to, or instead of, those that are discussed in conjunction with the FIG. 3 embodiment.
In the FIG. 3 embodiment, device software 312 includes software instructions that are preferably executed by processor 212 for performing various functions and operations by device 112. The particular nature and functionality of device software 312 preferably varies depending upon factors such as the type and purpose of the corresponding host device 112. Device software 312 may include various instructions that cause processor 212 to transfer portions of isochronous data 314 and/or non-isochronous data 316 bi-directionally between memory 216 and cache 230, in accordance with the present invention. The operation and utilization of device software 312 are further discussed below in conjunction with FIGS. 6 and 7.
Referring now to FIG. 4, a diagram for one embodiment of the FIG. 2 cache 230 is shown, in accordance with the present invention. In the FIG. 4 embodiment, cache 230 preferably includes a location 1 (512(a)) through a location N (512(d)). In the FIG. 4 embodiment, cache 230 may preferably be implemented using a four-way associativity technique in which each location 512(a) through 512(d) preferably includes four separate segments into which processor 212 may selectively load information from address locations in memory 216.
Therefore, location 1 (512(a)) preferably includes segments 514(a1), 514(a2), 514(a3) and 514(a4). Similarly, location 2 (512(b)) preferably includes segments 514(b1), 514(b2), 514(b3) and 514(b4), and location 3 (512(c)) preferably includes segments 514(c1), 514(c2), 514(c3) and 514(c4). Finally, location N (512(d)) preferably includes segments 514(d1), 514(d2), 514(d3) and 514(d4). In alternate embodiments, cache 230 may readily be configured to include various components, locations, and/or segments in addition to, or instead of, those shown in the FIG. 4 embodiment. For example, in various alternate embodiments, each location 512 of cache 230 may include any desired number of storage segments 514.
During operation of the FIG. 4 embodiment, processor 212 may preferably utilize the four-way associativity technique for mapping and storing information from various address locations of memory 216 into cache 230. Memory 216 typically possesses a substantially larger storage capacity than the relatively smaller storage capacity of cache 230. Therefore, multiple storage location addresses from memory 216 may be mapped to the same location 512 of cache 230. However, each location 512 of cache 230 preferably includes a plurality of storage segments 514 to permit multiple memory locations from memory 216 to be stored into one location 512 of cache 230.
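The mapping described above can be sketched minimally as follows: many memory addresses fold onto one cache location, and each location holds up to four segments. The number of locations is an assumption for illustration only; the patent fixes the four-way structure (FIG. 4) and, per FIG. 5, the sixty-four-byte segment size:

```python
NUM_LOCATIONS = 128   # number of cache locations 512 -- an assumed size
SEGMENT_BYTES = 64    # bytes per segment 514, per FIG. 5
WAYS = 4              # segments per location, per FIG. 4

def cache_location(address):
    """Map a memory 216 address to the index of its cache location 512."""
    return (address // SEGMENT_BYTES) % NUM_LOCATIONS

# Two distant addresses fold onto the same location; the four segments
# at that location let several such lines be cached simultaneously.
stride = NUM_LOCATIONS * SEGMENT_BYTES
print(cache_location(0x0000) == cache_location(stride))  # True
```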
In the FIG. 4 embodiment, a problem may arise when all segments 514 of a given location 512 of cache 230 already contain data from memory 216, and processor 212 requires additional storage capacity at that location 512 to perform a time-critical isochronous process that includes transferring isochronous data 314 into cache 230 from memory 216.
The present invention advantageously includes a technique for increasing deterministic performance of isochronous processes by supporting priority storage of isochronous data into cache 230. In accordance with the present invention, processor 212 may therefore mark a specific segment 514 of cache 230 to indicate that the contents of the marked segment 514 contains special information (such as isochronous data 314) that should not be removed or "rolled out" (returned to memory 216) to make room for other
data unless certain specific exception conditions exist. The marking of a segment 514 and the identification of exception conditions for permitting a rollout are further discussed below in conjunction with FIGS. 5 through 7. Cache architectures and techniques are further discussed in IEEE Std 1596-1992, which is entitled "IEEE Standard for Scalable Coherent Interface (SCI)," and which is hereby incorporated by reference.
Referring now to FIG. 5, a diagram for one embodiment of a segment 514 of the FIG. 4 cache 230 is shown, in accordance with the present invention. In the FIG. 5 embodiment, segment 514 preferably includes the capacity to store sixty-four bytes of information from memory 216. However, in alternate embodiments, segment 514 may be implemented to store any desired amount or type of information from any appropriate source.
The FIG. 5 embodiment preferably includes a marker 520 to indicate that segment 514 has been assigned a special status. In the FIG. 5 embodiment, marker 520 preferably indicates that segment 514 includes time-sensitive information that is required for the successful and timely performance of an isochronous process. In the FIG. 5 embodiment, marker 520 may include a digital "bit" that processor 212 preferably sets to a binary value of one to mark the corresponding segment 514 as isochronous information. However, in alternate embodiments, segment 514 may likewise be marked using any other effective technique in order to indicate any desired and appropriate status condition.
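The per-segment state implied by FIG. 5 can be sketched as sixty-four bytes of storage plus a single marker bit, together with the protection rule that marked contents roll out only on an exception. The class and helper below are illustrative assumptions; the patent describes the marker semantics, not a data structure:

```python
from dataclasses import dataclass, field

@dataclass
class Segment:
    """One sixty-four-byte segment 514 with its marker 520."""
    data: bytearray = field(default_factory=lambda: bytearray(64))
    marker: int = 0   # 1 = contents are isochronous and protected

def may_roll_out(segment, rollout_exception=False):
    """Unmarked segments may always be rolled out; marked segments only
    after a predetermined rollout exception event has occurred."""
    return segment.marker == 0 or rollout_exception

seg = Segment()
seg.marker = 1                                  # processor marks the segment
print(may_roll_out(seg))                        # False: protected
print(may_roll_out(seg, rollout_exception=True))  # True: exception permits it
```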
Referring now to FIG. 6, a block diagram illustrating a procedure for utilizing a cache 230 is shown, in accordance with one embodiment of the present invention. In the FIG. 6 embodiment, processor 212 initially begins to sequentially access and execute program instructions of device application 312 via path 612. In certain instances, the foregoing program instructions of device application 312 may include one or more isochronous load instructions that direct processor 212 to load time-sensitive isochronous data 314 from memory 216 into a specific corresponding mapped location 512 of
cache 230 via path 616. In the FIG. 6 embodiment, one address location of memory 216 preferably may be stored in a single segment 514 of the mapped location 512 of cache 230.
In the FIG. 6 embodiment, in accordance with the present invention, processor 212 may advantageously instruct cache 230 to create a marker 520 for inclusion in that particular storage segment 514 of cache 230 to indicate that the information stored therein includes isochronous data 314. In some embodiments, marker 520 may simply prevent cache 230 from rolling out the isochronous data 314 in the absence of specific instructions from processor 212.
Under certain conditions, if an excessive number of segments 514 become designated as special by marker 520, then the performance of cache 230 may become degraded or impaired because of a significantly reduced number of available storage segments 514 in cache 230. To address this issue, in certain embodiments of the present invention, a number of rollout exceptions may be implemented to optimize the performance of cache 230.
For example, in one embodiment, if a target location 512 in cache 230 currently comprises a segment 514 that includes initial isochronous data designated by an initial marker 520, and if another isochronous load instruction creates a conflict by mapping subsequent isochronous data 314 from memory 216 to the same target location 512 in cache 230, then processor 212 preferably may rollout the initial isochronous data to permit the subsequent isochronous data from memory 216 to be marked with a marker 520 and loaded into that particular segment 514. In addition, under certain conditions related to another rollout exception, device application 312 may instruct processor 212 to rollout a selectable marked segment 514 of cache 230 in response to various changes of status in device 112. For example, if an isochronous process is aborted, then the corresponding isochronous data may no longer be required in cache 230, and device application 312 may advantageously issue a rollout command to thereby optimize performance of cache 230.
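The conflict rollout exception just described can be sketched as follows. The class structure and the callback for returning rolled-out data to memory 216 are illustrative assumptions; only the victim-selection order follows the text:

```python
class CacheLocation:
    """One cache location 512 with a fixed number of segments 514."""

    def __init__(self, ways=4):
        # each entry: (tag, data, marked) or None if the segment is vacant
        self.segments = [None] * ways

    def load_isochronous(self, tag, data, rollout_fn):
        """Load and mark isochronous data, rolling out a victim on conflict."""
        # use a vacant segment if one exists
        for i, seg in enumerate(self.segments):
            if seg is None:
                self.segments[i] = (tag, data, True)
                return
        # conflict: roll out a marked segment so the new isochronous
        # data can be marked and loaded in its place
        for i, seg in enumerate(self.segments):
            if seg[2]:
                rollout_fn(seg)             # return contents to memory 216
                self.segments[i] = (tag, data, True)
                return
        # otherwise roll out any (unmarked) segment
        rollout_fn(self.segments[0])
        self.segments[0] = (tag, data, True)

# Usage: a two-way location fills up, then a third load forces a rollout.
loc = CacheLocation(ways=2)
rolled = []
loc.load_isochronous("A", b"a" * 64, rolled.append)
loc.load_isochronous("B", b"b" * 64, rolled.append)
loc.load_isochronous("C", b"c" * 64, rolled.append)  # conflict rolls out "A"
print(rolled[0][0])  # 'A'
```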
Similarly, when a particular isochronous process is completed, a rollout exception may be provided in which device application 312 issues a rollout command to empty a corresponding marked segment 514 of cache 230. For example, in the case where segment 514 includes sixty-four bytes of isochronous data, device application 312 may advantageously determine when the final or sixty-fourth byte of isochronous data has been accessed and used from the marked segment 514. In response, device application 312 may then issue a rollout command to return the cached isochronous data to a corresponding location in memory 216.

In the FIG. 6 embodiment, if device application 312 performs a time-sensitive isochronous process that requires precise deterministic behavior, then device application 312 may advantageously issue various types of isochronous prefetch load instructions to facilitate efficient and successful completion of the isochronous process. For example, if device application 312 knows that a certain block of isochronous data 314 must be moved to cache 230, then device application 312 may issue an isochronous prefetch load instruction in advance to notify memory 216 to transfer all or part of the foregoing block of isochronous data 314, rather than sending individual isochronous load instructions for each individual line of the block of isochronous data 314.
Use of isochronous prefetch load instructions may thus result in a more efficient and timely isochronous data transfer because processor 212 need not wait to complete the transfer of a line of isochronous data from memory 216 before beginning the transfer of a subsequent line of isochronous data from memory 216. Processor 212, memory 216, and cache 230 may execute the foregoing isochronous prefetch load instruction using any appropriate and effective technique that ensures that the transfer of a given portion of the isochronous data occurs prior to the designated time for processing or utilizing that given portion of the isochronous data. For example, device application 312 may provide isochronous "hints" which a compiler program may translate into corresponding isochronous prefetch load instructions. Alternately, device application 312 may include
various prefetch parameters for calculating isochronous prefetch load instructions, or device application 312 may provide specific isochronous prefetch load instructions to processor 212 in appropriate predetermined situations.
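The saving that the prefetch approach offers can be sketched by counting the requests each pattern issues for the same block. The instruction mnemonics below are illustrative assumptions, not names taken from the patent:

```python
SEGMENT_BYTES = 64  # bytes per line/segment, per FIG. 5

def line_loads(block_start, block_len):
    """One blocking load request per sixty-four-byte line (slower)."""
    return [("ISO_LOAD", addr)
            for addr in range(block_start, block_start + block_len,
                              SEGMENT_BYTES)]

def prefetch_load(block_start, block_len):
    """A single prefetch request covering the whole block."""
    return [("ISO_PREFETCH_LOAD", block_start, block_len)]

# A 512-byte block takes eight separate line loads, or one prefetch
# that lets memory stream the lines back-to-back.
print(len(line_loads(0x1000, 512)))     # 8
print(len(prefetch_load(0x1000, 512)))  # 1
```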
Referring now to FIG. 7A, an initial portion of a flowchart of method steps for utilizing a cache 230 is shown, in accordance with one embodiment of the present invention. Initially, the FIG. 7A method steps illustrate an embodiment in which a specific target location 512 in cache 230 has no vacant segments 514 for storing additional information from memory 216. Processor 212 may therefore be required to perform a rollout procedure in order to empty a segment 514 and load the additional information from memory 216.
The FIG. 7A and 7B embodiment is presented to illustrate certain principles and aspects of the present invention. However, in alternate embodiments, the present invention may readily be implemented by utilizing various steps and techniques in addition to, or instead of, those disclosed in conjunction with the FIG. 7A and 7B embodiment. Furthermore, in alternate embodiments, the FIG. 7A and 7B method steps may similarly occur in various sequences other than that discussed in conjunction with the FIG. 7A and 7B embodiment.
In the FIG. 7A embodiment, initially, in step 720, processor 212 preferably receives a program instruction from a software program (such as device application 312), and responsively determines the type of the received program instruction. If the received instruction type is a load or store data instruction, then, in step 724, processor 212 determines whether the particular data specified in a load data instruction is already in cache 230.
If the particular data specified in the load data instruction is already in cache 230, then the FIG. 7A process advances to flowchart 7B. However, if the data specified in the load data instruction of step 720 is not already in cache 230, then, in step 728, processor 212 determines whether the instruction (step 720) is an isochronous load or store instruction. If the
instruction is not an isochronous load or store instruction, then, in step 732, processor 212 preferably rolls out an unmarked segment 514 in the target location 512 of cache 230. In step 736, processor 212 then fetches and loads the transfer data from memory 216 into an appropriate segment 514 of cache 230, and the FIG. 7A process advances to FIG. 7B.
However, if the instruction is an isochronous load or store instruction, then, in step 740, processor 212 preferably determines whether the isochronous data 314 (to be transferred between memory 216 and cache 230) is mapped to a target location 512 of cache 230 that includes a marked segment 514 (as designated by a marker 520). If the transfer data is not mapped to a target location 512 of cache 230 that includes a marked segment 514, then, in step 742, processor 212 preferably rolls out any segment 514 in the target location 512 of cache 230. The FIG. 7A process then advances to step 748.
However, in foregoing step 740, if the isochronous data for transfer is mapped to a target location 512 of cache 230 that includes a marked segment 514, then, in step 744, processor 212 preferably rolls out the information in the marked segment 514 to create a vacant target segment 514. Next, in step 748, processor 212 preferably fetches and loads the particular isochronous data from memory 216 into the vacant target segment 514 in cache 230. In step 752, processor 212 may also advantageously mark the foregoing target segment 514 with a marker 520 to indicate its special status.
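The miss-handling steps just described (steps 728 through 752 of FIG. 7A) can be condensed into a small decision function. The action names are illustrative assumptions; the patent defines the behavior, not an implementation:

```python
def handle_cache_miss(is_isochronous, target_has_marked_segment):
    """Return the action sequence FIG. 7A prescribes for a cache miss."""
    actions = []
    if not is_isochronous:
        actions.append("rollout_unmarked_segment")    # step 732
        actions.append("fetch_and_load")              # step 736
    else:
        if target_has_marked_segment:
            actions.append("rollout_marked_segment")  # step 744
        else:
            actions.append("rollout_any_segment")     # step 742
        actions.append("fetch_and_load")              # step 748
        actions.append("mark_segment")                # step 752
    return actions

print(handle_cache_miss(True, True))
# ['rollout_marked_segment', 'fetch_and_load', 'mark_segment']
```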
In foregoing step 720, if the instruction type comprises a "flush" instruction that designates a particular flushable marked segment 514 in cache 230, then, in step 776, processor 212 preferably rolls out the information in that particular flushable marked segment 514 in cache 230 to allow free access for other data transfer operations. The FIG. 7A process then advances to FIG. 7B. In foregoing step 720, if the instruction type comprises any instruction other than a "load data" instruction or a "flush" instruction, then the FIG. 7A process advances to step "B" of the FIG. 7B flowchart.
Referring now to FIG. 7B, a final portion of a flowchart of method steps for utilizing a cache 230 is shown, in accordance with one embodiment of the present invention. In the FIG. 7B flowchart, initially, in step 764, processor 212 preferably executes any "other" program instruction that may be necessary as a result of foregoing step 720 of FIG. 7A. Then, in step 768, processor 212 preferably determines whether all information has been accessed and utilized in any of one or more finished segments 514 in cache 230 that are marked with marker 520.
As discussed above in conjunction with FIG. 6, in some embodiments, processor 212 may determine whether all information in a finished segment 514 has been used by monitoring whether the final storage location or address has been accessed and utilized from a marked segment 514. If all information has not been used in a segment 514 of cache 230, then the FIG. 7B process preferably advances to step 756. However, if all information has been used in a finished segment 514 of cache 230, then, in step 772, processor 212 preferably rolls out the information in that particular finished segment 514 of cache 230 to allow access to the finished segment 514 by various other data transfer operations.
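The finished-segment check of steps 768 and 772 can be sketched with simple bookkeeping over a marked segment's sixty-four bytes. Whether the processor tracks every byte or only the final address is left open by the text; the version below tracks every byte as an illustrative assumption:

```python
SEGMENT_BYTES = 64  # capacity of a segment 514, per FIG. 5

class MarkedSegment:
    """Tracks which bytes of a marked segment have been consumed."""

    def __init__(self):
        self.used = [False] * SEGMENT_BYTES

    def access(self, offset):
        self.used[offset] = True

    def finished(self):
        # all bytes accessed -> eligible for rollout (step 772)
        return all(self.used)

seg = MarkedSegment()
for offset in range(SEGMENT_BYTES - 1):
    seg.access(offset)
print(seg.finished())            # False: final byte not yet used
seg.access(SEGMENT_BYTES - 1)
print(seg.finished())            # True: segment may now be rolled out
```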
Next, in step 756, processor 212 preferably performs an update procedure on a program counter. Finally, in step 760, processor 212 preferably fetches the next program instruction from the software program (such as device application 312), and the FIG. 7B process returns to the foregoing step 720 of FIG. 7A to analyze another program instruction.
The invention has been explained above with reference to a preferred embodiment. Other embodiments will be apparent to those skilled in the art in light of this disclosure. For example, the present invention may readily be implemented using configurations and techniques other than those described in the preferred embodiment above. Additionally, the present invention may effectively be used in conjunction with systems other than the one described above as the preferred embodiment. Therefore, these and other variations
upon the preferred embodiments are intended to be covered by the present invention, which is limited only by the appended claims.