US20050188278A1 - System software to self-migrate from a faulty memory location to a safe memory location - Google Patents
System software to self-migrate from a faulty memory location to a safe memory location
- Publication number
- US20050188278A1 (application US10/748,502)
- Authority
- US
- United States
- Prior art keywords
- memory
- system software
- computer system
- safe
- faulty
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/08—Error detection or correction by redundancy in data representation, e.g. by using checking codes
- G06F11/10—Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
- G06F11/1008—Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices
- G06F11/1048—Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices using arrangements adapted for a specific error detection or correction feature
- G06F11/106—Correcting systematically all correctable errors, i.e. scrubbing
Abstract
A method and system to provide system software to self-migrate from a faulty memory location to a safe memory location. A faulty portion of memory in a system software memory region of a computer system is detected, the faulty portion having stored a system software component. The system software component is relocated from the faulty portion of memory to a safe portion of memory.
Description
- 1. Field of Invention
- The field of invention relates generally to computer systems and, more specifically but not exclusively, relates to system software to self-migrate from a faulty memory location to a safe memory location.
- 2. Background Information
- In a typical PC architecture, the initialization and configuration of the computer system by the Basic Input/Output System (BIOS) is commonly referred to as the pre-boot phase. The pre-boot phase is generally defined as the firmware that runs between the processor reset and the first instruction of the Operating System (OS) loader. At the start of a pre-boot, it is up to the code in the firmware to initialize the system to the point that an operating system loaded off of media, such as a hard disk, can take over. The start of the OS load begins the period commonly referred to as OS runtime. During OS runtime, the firmware acts as an interface between software and hardware components of a computer system. As computer systems have become more sophisticated, the operational environment between the application and OS levels and the hardware level is generally referred to as the firmware or the firmware environment.
- When a computer system starts up, system software is loaded into memory. Usually, system software is loaded once when the computer is booted and is not removed from memory until the system is shut down. In contrast, user applications are designed and implemented so that they may be loaded and torn down numerous times during a single on/off cycle of the computer system. Thus, if the memory location of a user application is faulty, a simple solution is to re-start the application in a different memory location. However, system software generally cannot be moved to a different memory location without resetting the entire computer system.
- Today's system software does not have the ability to self-relocate without restarting the computer system. The system software may be able to mark a region of memory as “bad” and keep the information in persistent storage (e.g., flash, CMOS, etc.) so that the next time the system starts, these faulty memory areas will be avoided when loading the system software. However, for systems that rarely reboot, such as a server, errors may grow within a region of memory until finally a complete failure occurs. Also, while scrubbing the failed memory area may reduce some memory errors, repeatedly scrubbing a faulty region wastes resources and creates overhead that reduces system performance.
- Non-limiting and non-exhaustive embodiments of the present invention are described with reference to the following figures, wherein like reference numerals refer to like parts throughout the various views unless otherwise specified.
- FIG. 1A is a block diagram illustrating one embodiment of a memory of a computer system in accordance with the teachings of the present invention.
- FIG. 1B is a flowchart illustrating one embodiment of the logic and operations for system software to self-migrate from a faulty memory location to a safe memory location in accordance with the teachings of the present invention.
- FIG. 1C is a block diagram illustrating one embodiment of system software to self-migrate from a faulty memory location to a safe memory location in accordance with the teachings of the present invention.
- FIG. 2 is a flowchart illustrating one embodiment of the logic and operations for system software to self-migrate from a faulty memory location to a safe memory location in accordance with the teachings of the present invention.
- FIG. 3 is a block diagram illustrating one embodiment of system software to self-migrate from a faulty memory location to a safe memory location in accordance with the teachings of the present invention.
- FIG. 4 is a block diagram illustrating one embodiment of system software to self-migrate from a faulty memory location to a safe memory location in accordance with the teachings of the present invention.
- FIG. 5 is a block diagram illustrating one embodiment of a computer system in accordance with the teachings of the present invention.
- Embodiments of a method and system to provide system software to self-migrate from a faulty memory location to a safe memory location are described herein. In the following description, numerous specific details are set forth, such as embodiments pertaining to the Extensible Firmware Interface (EFI) framework standard, to provide a thorough understanding of embodiments of the invention. One skilled in the relevant art will recognize, however, that the invention can be practiced without one or more of the specific details, or with other methods, components, materials, etc. In other instances, well-known structures, materials, or operations are not shown or described in detail to avoid obscuring aspects of the invention.
- Reference throughout this specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of the phrases “in one embodiment” or “in an embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
- In one embodiment of the present invention, firmware of a computer system operates in accordance with an extensible firmware framework known as the Extensible Firmware Interface (EFI) (EFI Specification, Version 1.10, Dec. 1, 2002, available at http://developer.intel.com/technology/efi.) EFI is a public industry specification that describes an abstract programmatic interface between platform firmware and shrink-wrap operating systems or other custom application environments. The EFI framework standard includes provisions for extending BIOS functionality beyond that provided by the BIOS code stored in a platform's BIOS device (e.g., flash memory.) More particularly, EFI enables firmware, in the form of firmware modules and drivers, to be loaded from a variety of different resources, including primary and secondary flash devices, option ROMs (Read-Only Memory), various persistent storage devices (e.g., hard disks, CD-ROM (Compact Disk-Read Only Memory), etc.), and from one or more computer systems over a computer network.
- FIG. 1A illustrates a memory 100 at OS runtime of a computer system according to an embodiment of the present invention. Memory 100 has stored system software 102, an operating system 104, OS drivers 106, and user applications 108. System software 102 may include, but is not limited to, EFI components, such as EFI Runtime Drivers, Portable Executable and Common Object File Format (PE/COFF) images, System Management Mode (SMM) components, or the like.
- Generally, system software includes instructions and data loaded during the pre-boot phase that persist into operating system runtime. The system software is not under the control of the operating system. In one embodiment, the system software is loaded from a firmware device during pre-boot. The operating system may not even be aware of system software that is loaded into memory. In one embodiment, during pre-boot, the firmware allocates a system software memory region for its own use and tags this portion of memory as reserved and thus not useable by the operating system.
- It will be understood that embodiments of the invention are not limited to the memory layout as shown in FIG. 1A. Also, for simplicity, each section of memory, such as operating system 104, is shown as contiguous, but it will be understood that each space may include non-contiguous portions of memory 100.
- FIG. 1B shows a flowchart 150 that illustrates an embodiment of the invention to migrate system software from a faulty memory location to a safe memory location. Beginning in a block 152, the computer system is reset and initialized. Boot instructions stored in the computer system firmware are loaded into memory and executed. In one embodiment, the system boot instructions will begin initializing the platform by conducting a Power-On Self-Test (POST) routine. During the pre-boot phase, hardware devices such as a processor, the chipset, and memory of the computer system are initialized. Also, during initialization some system software may be loaded into memory.
- Continuing in a block 154, a memory error detector is set. In one embodiment, the memory error detector includes an error correction code (ECC.) ECC generally refers to various methods to detect errors in transmitted or stored data and, in some cases, to correct them. Proceeding to a block 156, the target OS of the system is booted. In a block 158, during OS runtime, an error is detected in a portion of memory storing system software. The memory address of the faulty portion is determined, as depicted at a block 160. The logic continues to a block 162 where the system software is relocated from the faulty portion to a safe portion of memory. In a block 164, the error portion is marked as unusable. The former location of the system software may be logged to a System Error Log (SEL) or Baseboard Management Controller (BMC) for later analysis.
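By way of illustration only, the runtime portion of flowchart 150 (blocks 158 through 164) can be modeled in a few lines of Python. This is a minimal sketch and not the disclosed implementation: the `Region` and `SystemSoftwareMemoryManager` classes, the first-fit search, the `bytearray` standing in for physical memory, and all addresses are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    """A contiguous range of the (simulated) system software memory region."""
    start: int
    size: int

    @property
    def end(self):
        return self.start + self.size

    def contains(self, addr):
        return self.start <= addr < self.end


@dataclass
class SystemSoftwareMemoryManager:
    """Hypothetical tracker of component locations and unusable ranges."""
    region: Region
    components: dict = field(default_factory=dict)  # name -> Region
    unusable: list = field(default_factory=list)    # faulty Regions
    error_log: list = field(default_factory=list)   # (name, old start)

    def find_component(self, addr):
        # Block 160: map the failing address to the component stored there.
        for name, where in self.components.items():
            if where.contains(addr):
                return name
        return None

    def find_safe_region(self, size):
        # First-fit search that skips live components and unusable ranges.
        busy = sorted(list(self.components.values()) + self.unusable,
                      key=lambda r: r.start)
        cursor = self.region.start
        for r in busy:
            if r.start - cursor >= size:
                return Region(cursor, size)
            cursor = max(cursor, r.end)
        if self.region.end - cursor >= size:
            return Region(cursor, size)
        return None

    def handle_memory_error(self, memory, addr):
        name = self.find_component(addr)
        if name is None:
            return None                      # no component at this address
        old = self.components[name]
        new = self.find_safe_region(old.size)
        if new is None:
            raise MemoryError("no safe portion of memory available")
        # Block 162: copy the component and update its tracked location.
        memory[new.start:new.end] = memory[old.start:old.end]
        self.components[name] = new
        # Block 164: mark the faulty portion unusable and log the old location.
        self.unusable.append(old)
        self.error_log.append((name, old.start))
        return new


# Toy layout: two drivers packed at the start of a 256-byte region.
memory = bytearray(0x100)
manager = SystemSoftwareMemoryManager(
    Region(0x00, 0x100),
    {"driver_a": Region(0x00, 0x20), "driver_b": Region(0x20, 0x20)},
)
memory[0x20:0x40] = b"B" * 0x20
new_home = manager.handle_memory_error(memory, 0x25)  # error inside driver_b
```

Because the old range is added to the unusable list, a later search for free space never hands the faulty range back out, which mirrors the purpose of marking the error portion in block 164.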
- FIG. 1C is a block diagram illustrating one embodiment of relocating a system software component 170 in accordance with the teachings of the present invention. System software component 170 is a portion of the system software 102 loaded in memory 100. In one embodiment, such a system software component includes a Portable Executable and Common Object File Format (PE/COFF) executable image. A system software memory manager 172 is used to track the location of system software components in memory. As shown in FIG. 1C, system software memory manager 172 is updated to indicate the new location of system software component 170 when component 170 is relocated from a faulty memory location to a safe memory location of memory 100.
- Migrating away from faulty memory regions increases system reliability and reduces performance overhead. It is important to migrate away from memory areas that generate Single-Bit Errors (SBEs) because too many SBEs may lead to a Multi-Bit Error (MBE.) Generally, an SBE includes a single bit of data being incorrect when reading an entire byte (or word.) An MBE includes more than one bit in a complete byte being incorrect. Usually, an MBE is not correctable, so the data or code that was stored in that region of memory is lost. Also, numerous SBEs create performance overhead because of the need for constant scrubbing and logging of errors.
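The distinction between a correctable SBE and an uncorrectable MBE can be seen with a small single-error-correcting, double-error-detecting (SECDED) code. The sketch below is an illustration under stated assumptions: it uses an extended Hamming code over a 4-bit value, which is a textbook construction chosen for brevity and is not asserted to be the code used by any particular memory controller. The function names are hypothetical.

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit extended Hamming codeword.

    Index 0 holds the overall parity bit; indices 1..7 are the classic
    Hamming positions (parity at 1, 2, 4 and data at 3, 5, 6, 7).
    """
    code = [0] * 8
    code[3] = (nibble >> 0) & 1
    code[5] = (nibble >> 1) & 1
    code[6] = (nibble >> 2) & 1
    code[7] = (nibble >> 3) & 1
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2
    return code


def decode(code):
    """Return (status, value); value is None when the word is uncorrectable."""
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos          # XOR of the positions of all set bits
    overall = sum(code) % 2          # 0 when the overall parity is consistent
    fixed = list(code)
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Single-bit error: the syndrome names the flipped position
        # (position 0 is the overall parity bit itself).
        fixed[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with consistent overall parity: two bits flipped.
        return "uncorrectable", None
    value = fixed[3] | fixed[5] << 1 | fixed[6] << 2 | fixed[7] << 3
    return status, value


def scrub(words):
    """Read every word and write back a clean copy; count the corrections."""
    corrected = 0
    for i, word in enumerate(words):
        status, value = decode(word)
        if status == "corrected":
            words[i] = encode(value)
            corrected += 1
    return corrected


clean = encode(0b1011)
sbe = list(clean)
sbe[5] ^= 1                          # one flipped bit: an SBE
mbe = list(clean)
mbe[5] ^= 1
mbe[6] ^= 1                          # two flipped bits: an MBE
```

In this model, a region that keeps producing SBEs forces `scrub` to keep rewriting it, and a second flip in the same word before the next scrub turns a recoverable word into a lost one.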
- It will be appreciated that embodiments of the present invention allow for system software to perform self-healing actions independent of the operating system. Instructions and data in memory under control of firmware may be allocated and migrated without the firmware consulting the operating system. Moreover, this migration of system software is done dynamically to prevent system down-time that would be caused if the system had to be re-booted after a system software migration. The system software itself detects errors and performs the relocation of a system software component.
- FIG. 2 shows a flowchart 200 that illustrates an embodiment of the invention to provide system software to migrate from a faulty memory location to a safe memory location utilizing the System Management Mode (SMM) of an Intel Architecture 32-bit processor (IA32 processor.)
- SMM is a special mode for handling system wide functions and is intended for use only by system firmware, and not by an OS or an application. When SMM is invoked through a System Management Interrupt (SMI), the processor saves the current state of the processor and switches to a separate operating environment contained in System Management Random Access Memory (SMRAM). While in SMM, the processor executes SMI handler code to perform operations. When the SMI handler has completed its operations, it executes a resume instruction. This instruction causes the processor to reload the saved state of the processor, switch back to protected or real mode, and resume executing the interrupted application or OS tasks.
- Starting in a block 202, the computer system is reset and initialized. Proceeding to a block 204, the error correction code is set. Continuing in a block 206, the SMM core is loaded. Proceeding to a block 208, drivers are loaded. Such drivers include, but are not limited to, an SMM driver. SMI Handlers may also be loaded into SMRAM during pre-boot. In an EFI-compliant system, boot service and runtime service drivers are loaded into conventional memory (i.e., memory outside of SMRAM.) The boot service drivers are unloaded when the target OS is booted, while the runtime drivers continue into OS runtime.
- The logic proceeds to a block 210 where the target OS is booted. In a block 212, the OS executes. As the OS executes, the ECC monitors the memory for errors. When an ECC error is detected, as depicted in a block 214, an SMI is generated, as shown in a block 216. The SMI interrupts OS runtime and puts the computer system into SMM. In one embodiment, the error is a single bit error in memory.
- Proceeding to block 218, the memory address of the error is determined. In a decision block 220, the logic determines if the memory error is in the portion of memory containing SMRAM. If the answer to decision block 220 is no, then the logic proceeds to a block 228 to scrub the memory region with the SBE.
- Memory scrubbing is often used to correct memory errors, and involves reading memory and writing back to it. Generally, this duty is automatically handled via a system's chipset (e.g., memory controller) and/or built-in functionality provided by a memory component (e.g., a Dynamic Random Access Memory (DRAM) Dual In-line Memory Module (DIMM)). However, in some instances in which the memory controller or built-in functionality is less sophisticated, this task must be performed by software through a service handler.
- If the answer to decision block 220 is yes, then the logic proceeds to a block 222 to determine if the address of the error is within an SMM area of SMRAM. In one embodiment, the SMM core keeps a system software memory manager having pointers to SMM components within SMRAM. Each pointer is checked to determine if its associated SMM component is at an address having the error. If the error is not within an SMM component, then the error is in an unused portion of SMRAM and the answer to decision block 222 is no. In one embodiment, the SMM core may manage a queue of pointers to SMM Drivers and a queue of pointers to SMI Handlers.
- If the answer to decision block 222 is no, then the logic proceeds to block 228 to scrub the region. After the region is scrubbed, the logic proceeds to block 230 to handle any additional SMIs and then to block 212 to resume executing the OS.
- If the answer to decision block 222 is yes, then the logic proceeds to a block 224 to relocate the system software to a safe location of memory. Embodiments of relocating the system software are described below. Proceeding to a block 226, the error portion of memory is marked as unusable. Marking the error portion as unusable ensures that the system does not accidentally migrate into the error portion at a later time. The logic then proceeds to block 230 to handle any other SMIs, and then back to block 212 to continue executing the OS.
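The two decisions of flowchart 200 (decision blocks 220 and 222) reduce to a pair of range checks. The following Python sketch is illustrative only; the function name `classify_ecc_error`, the `Action` enumeration, and the SMRAM base, size, and component addresses are hypothetical values chosen for the example and are not taken from the specification.

```python
from enum import Enum


class Action(Enum):
    SCRUB = "scrub"        # block 228
    RELOCATE = "relocate"  # blocks 224 and 226


def classify_ecc_error(error_addr, smram_base, smram_size, components):
    """Decide how an SMI handler might respond to an ECC error address.

    components maps a component name to a (start, size) pair inside SMRAM,
    standing in for the pointers kept by the SMM core.
    """
    # Decision block 220: is the error inside SMRAM at all?
    if not (smram_base <= error_addr < smram_base + smram_size):
        return Action.SCRUB, None
    # Decision block 222: does the address fall inside an SMM component?
    for name, (start, size) in components.items():
        if start <= error_addr < start + size:
            return Action.RELOCATE, name
    # Inside SMRAM but in an unused portion: scrubbing is sufficient.
    return Action.SCRUB, None


# Hypothetical layout used only for this example.
SMRAM_BASE, SMRAM_SIZE = 0xA0000, 0x20000
COMPONENTS = {
    "smm_core": (0xA0000, 0x4000),
    "smm_driver": (0xA4000, 0x2000),
}

outside = classify_ecc_error(0x1000, SMRAM_BASE, SMRAM_SIZE, COMPONENTS)
in_driver = classify_ecc_error(0xA4100, SMRAM_BASE, SMRAM_SIZE, COMPONENTS)
unused = classify_ecc_error(0xB0000, SMRAM_BASE, SMRAM_SIZE, COMPONENTS)
```

Only the second case leads to relocation; the other two return to the OS after a scrub, as in the path through blocks 228, 230, and 212.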
- FIG. 3 illustrates an embodiment of the invention to migrate system software from a faulty memory location to a safe memory location in an IA32 computer system employing legacy system software. Memory 300 of a computer system includes SMRAM 301. SMRAM 301 includes an SMM core 302, SMM Drivers 304-305, and SMI Handler 308. The SMM core 302 includes pointers to the SMM Drivers 304-305. Each SMM Driver 304-305 includes code for processing an SMI to a hardware device of the computer system during SMM.
- While in SMM, the processor executes code and stores data in the SMRAM space. The actual physical location of the SMRAM may be in system memory or in a separate memory device. The SMRAM space is mapped to the physical address space of the processor that can be up to 4 Gigabytes (GB) in size. SMRAM may be allocated various portions of memory including, but not limited to, 512 Kilobytes (KB), 1 Megabyte (MB), 8 MBs, or the like. The processor uses SMRAM to save the state of the processor and to store SMM related code and data. SMRAM may also store system management information and Original Equipment Manufacturer (OEM) specific information.
- SMRAM begins at a base physical address called SMBASE as shown at 306 in FIG. 3. Usually, the default base address of SMBASE is 30000H. SMI requests use the SMBASE as a starting point to process an SMI.
- A Global Descriptor Table (GDT) describes system segments such as SMM. A Code Segment Descriptor (CSD) is associated with each system segment of the GDT. Usually, a CSD is 8 bytes long and includes the segment's base address, size, and other information. An offset is added to the segment base address to produce a 32-bit linear address. If paging is disabled, then the linear address is interpreted as a physical address. If paging is enabled, then the linear address is interpreted as a virtual address and mapped to a physical address using page tables.
- In legacy SMM, SMM code is linked at address 0 of memory and the CSD base is set to map the address 0 to the base of SMRAM. In one embodiment, relocating the system software may include moving the contents of SMRAM to another portion of memory and resetting the SMBASE. In FIG. 3, SMRAM 301 is moved to another portion of memory. The SMBASE 306 is then reset to new location SMBASE 312 to establish a new base for SMRAM. The GDT and CSDs are updated accordingly.
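Why moving SMRAM and resetting the base suffices for code linked at address 0 follows from the address arithmetic above. The Python sketch below is a simplified model, assuming paging is disabled so that the linear address is the physical address; the helper names, the toy memory size, and the base addresses are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def linear_address(segment_base, offset):
    """Segment base plus offset, truncated to a 32-bit linear address."""
    return (segment_base + offset) & MASK32


def move_smram(memory, old_base, new_base, size):
    """Copy the SMRAM contents and return the new base for the descriptor."""
    memory[new_base:new_base + size] = memory[old_base:old_base + size]
    return new_base


# Toy physical memory; the handler sits at link-time offset 0x100.
memory = bytearray(0x1000)
OLD_BASE, NEW_BASE, SMRAM_SIZE = 0x200, 0x800, 0x200
HANDLER_OFFSET = 0x100
memory[linear_address(OLD_BASE, HANDLER_OFFSET)] = 0xAA  # marker byte

before = linear_address(OLD_BASE, HANDLER_OFFSET)
segment_base = move_smram(memory, OLD_BASE, NEW_BASE, SMRAM_SIZE)
after = linear_address(segment_base, HANDLER_OFFSET)
```

The offset used by the code never changes; only the base stored in the descriptor does, so the same offset resolves to the copied byte at the new location.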
- FIG. 4 illustrates an embodiment of the invention to migrate system software from a faulty memory location to a safe memory location in an IA32 computer system having system software compliant with the EFI framework. In one embodiment, the memory is executed in a physical addressing mode.
- The SMBASE for EFI compliant systems is established during pre-boot as follows. The SMBASE is a register in each CPU. The CPU starts with the SMBASE set to 0x38000 (segment 0x3000, offset 0x8000.) The permissible address ranges for the platform's SMRAM implementation are ascertained and allocated. After the address range has been allocated, the initial address for the SMRAM is relocated from the default address (0x38000) to the ascertained platform address. This region of SMRAM is protected by the chipset. In one embodiment, the SMRAM is relocated to a position below 4 Gigabytes of physical memory.
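The default address quoted above follows from real-mode segment arithmetic, in which a segment value is shifted left by four bits and added to the offset. The short Python check below illustrates that arithmetic together with the "below 4 Gigabytes" constraint; the function names and the candidate range are hypothetical and serve only as an example.

```python
FOUR_GB = 1 << 32


def real_mode_physical(segment, offset):
    """Real-mode address formation: segment * 16 + offset."""
    return (segment << 4) + offset


def fits_below_4gb(base, size):
    """True when the whole candidate SMRAM range lies below 4 GB."""
    return 0 <= base and base + size <= FOUR_GB


default_address = real_mode_physical(0x3000, 0x8000)

# A hypothetical 1 MB platform range chosen for the example.
candidate_ok = fits_below_4gb(0x7FF00000, 0x100000)
candidate_bad = fits_below_4gb(0xFFFF0000, 0x100000)
```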
- In one embodiment, relocating the system software may include moving at least a portion of SMRAM within the SMRAM address space. SMRAM 400 includes software components of an SMM core 402, SMM Drivers 404-406, and SMI Handlers 408-409. In FIG. 4, SMM Driver 405 is at address Top of Segment (TSEG) 255 MB+x. An error 412 is detected in SMM Driver 405, so SMM Driver 405 is migrated within SMRAM 400 to position TSEG 255 MB+y.
- In one embodiment, the SMM Core maintains a system software memory manager (SSMM) 403 to map available SMRAM space and to map locations of system software components. The system software uses the system software memory manager 403 to find an available memory region for migration of the SMM Driver 405. The pointer to SMM Driver 405 in the SMM Core 402 is updated with the new location of SMM Driver 405. The old location of SMM Driver 405 is marked as unusable.
- In another embodiment, the entire SMRAM 400 is relocated to another location in memory, similar to that described above in conjunction with FIG. 3. The new locations of the system software components are updated in the system software memory manager 403.
- In one implementation of EFI, Portable Executable and Common Object File Format (PE/COFF) executable images are used (PE/COFF Specification, Version 6.0, February 1999, available at http://www.microsoft.com/whdc/hwdev/hardware/pecoff.mspx) for various system software components. The PE/COFF images can be relocated to the new, safer memory location without having to manage the GDT and/or paging as described above in connection with FIG. 3. In the embodiment of FIG. 4, SMM Core 402, SMM Drivers 404-406, and SMI Handlers 408-409 are PE/COFF executable images.
- Embodiments of the present invention may be implemented on a 64-bit processor, such as the Intel® Itanium® family of processors. Itanium® processors employ a Platform Management Interrupt (PMI.) The handling of an SMI with an IA32 processor and a PMI with an Itanium® family processor involve similar processes. In general, the operations and logic as shown in the flowcharts of FIG. 1B and FIG. 2 may be applied in analogous manner to an Itanium® processor.
- Itanium® firmware includes a System Abstraction Layer (SAL), Processor Abstraction Layer (PAL), and an EFI Layer. The SAL is a firmware layer that isolates operating system and other higher-level software from implementation differences in the platform. The PAL provides a firmware abstraction between the processor hardware and system software and platform firmware, so as to maintain a single software interface for multiple implementations of the processor hardware.
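Returning to the PE/COFF images discussed in connection with FIG. 4, the reason such an image can be moved without touching the GDT or page tables is that it can be rebased: each absolute address stored in the image is adjusted by the distance the image moved. The Python sketch below illustrates only that idea. It does not parse real PE/COFF headers, and the `fixup_offsets` list is a hypothetical stand-in for the image's base relocation information; the toy image and base addresses are likewise assumptions.

```python
import struct


def rebase_image(image, fixup_offsets, old_base, new_base):
    """Return a copy of image with each listed 32-bit address adjusted.

    fixup_offsets lists byte offsets of little-endian 32-bit absolute
    addresses that were computed for old_base.
    """
    delta = new_base - old_base
    out = bytearray(image)
    for off in fixup_offsets:
        (value,) = struct.unpack_from("<I", out, off)
        struct.pack_into("<I", out, off, (value + delta) & 0xFFFFFFFF)
    return bytes(out)


# Toy image: 4 bytes of "code", one absolute pointer, 8 bytes of "data".
OLD_BASE, NEW_BASE = 0x00100000, 0x00250000
image = b"\x90\x90\x90\x90" + struct.pack("<I", OLD_BASE + 0x08) + b"DATADATA"
moved = rebase_image(image, [4], OLD_BASE, NEW_BASE)
(pointer,) = struct.unpack_from("<I", moved, 4)
```

After rebasing, the embedded pointer refers to the same data at its new address, while bytes that are not listed as fixups are left unchanged.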
- PAL-based interrupts are serviced by PAL firmware, system firmware, or the operating system. One type of interrupt is a Platform Management Interrupt (PMI.) A PMI is a platform management request to perform functions such as platform error handling, memory scrubbing, or power management.
- PMIs occur during instruction processing causing the flow of control to be passed to the PAL PMI Handler. In the process, system state information is saved in the interrupt registers by the processor hardware and the processor starts to execute instructions from the PAL. The PAL will either handle the PMI if it is a PAL-related PMI or transition to the SAL PMI code if the PMI is a SAL related PMI. Upon completion of the processing, the interrupted processor state is restored and the execution of the interrupted instruction is resumed.
- Some differences between Itanium® and IA32 processors are noted as follows. First, Itanium® processors do not enter a special CPU mode upon activation of a PMI signal. Instead, Itanium® processors provide a mechanism to bring a handler into the processor to handle a PMI event. Second, instead of maintaining an SMRAM area, Itanium® processors use a Firmware Reserved region in memory for storing system software. Firmware Reserved memory includes a portion of memory that holds firmware components similar to those discussed above with reference to the SMRAM used for IA32 processors. The system software stored in the Firmware Reserved area is maintained in OS runtime memory and does not have hardware protection as with SMRAM. In an EFI-compliant system, an EFI Runtime memory region may be reserved for use by EFI components.
- Relocating system software in a computer system having an Itanium® processor is similar to that described above in conjunction with FIG. 4. When an error is detected in a region of memory having a system software component, that system software component may be relocated to an available portion of memory. Pointers to the component are updated appropriately and the former location is marked as unusable. In one embodiment, since the system software is located in OS runtime memory space, the OS may also be made aware that the former location of the system software component is unusable memory space.
- Generally, components within control of the computer system firmware may be relocated as described herein. Components that are within SMRAM, Firmware Reserved memory, or the like, may be relocated. In embodiments of an EFI-compliant system, EFI runtime drivers, as well as other EFI components that survive into OS runtime, may also be relocated as described herein.
- FIG. 5 is an illustration of one embodiment of an example computer system 500 on which embodiments of the present invention may be implemented. Computer system 500 includes a processor 502 coupled to a bus 506. Memory 504, storage 512, non-volatile storage 505, display controller 508, input/output controller 516 and modem or network interface 514 are also coupled to bus 506. The computer system 500 interfaces to external systems through the modem or network interface 514. This interface 514 may be an analog modem, Integrated Services Digital Network (ISDN) modem, cable modem, Digital Subscriber Line (DSL) modem, a T-1 line interface, a T-3 line interface, token ring interface, satellite transmission interface, or other interfaces for coupling a computer system to other computer systems. A carrier wave signal 523 is received/transmitted by modem or network interface 514 to communicate with computer system 500. In the embodiment illustrated in FIG. 5, carrier wave signal 523 is used to interface computer system 500 with a computer network 524, such as a local area network (LAN), wide area network (WAN), or the Internet. In one embodiment, computer network 524 is further coupled to a remote computer (not shown), such that computer system 500 and the remote computer can communicate.
- Processor 502 may be a conventional microprocessor including, but not limited to, an Intel Corporation x86, Pentium®, or Itanium® family microprocessor, a Motorola family microprocessor, or the like. Memory 504 may include, but is not limited to, Dynamic Random Access Memory (DRAM), Static Random Access Memory (SRAM), Synchronous Dynamic Random Access Memory (SDRAM), Rambus Dynamic Random Access Memory (RDRAM), or the like. Display controller 508 controls in a conventional manner a display 510, which in one embodiment may be a cathode ray tube (CRT), a liquid crystal display (LCD), an active matrix display, or the like. An input/output device 518 coupled to input/output controller 516 may be a keyboard, disk drive, printer, scanner and other input and output devices, including a mouse, trackball, trackpad, joystick, or other pointing device.
- The computer system 500 also includes non-volatile storage 505 on which firmware and/or data may be stored. Non-volatile storage devices include, but are not limited to, Read-Only Memory (ROM), Flash memory, Erasable Programmable Read Only Memory (EPROM), Electrically Erasable Programmable Read Only Memory (EEPROM), or the like.
- Storage 512 in one embodiment may be a magnetic hard disk, an optical disk, or another form of storage for large amounts of data. Some data may be written by a direct memory access process into memory 504 during execution of software in computer system 500. It is appreciated that software may reside in storage 512, memory 504, non-volatile storage 505 or may be transmitted or received via modem or network interface 514.
- For the purposes of the specification, a machine-readable medium includes any mechanism that provides (i.e., stores and/or transmits) information in a form readable or accessible by a machine (e.g., a computer, network device, personal digital assistant, manufacturing tool, any device with a set of one or more processors, etc.). For example, a machine-readable medium includes, but is not limited to, recordable/non-recordable media (e.g., a read only memory (ROM), a random access memory (RAM), a magnetic disk storage media, an optical storage media, a flash memory device, etc.). In addition, a machine-readable medium can include propagated signals such as electrical, optical, acoustical or other form of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.).
- It will be appreciated that computer system 500 is one example of many possible computer systems that have different architectures. For example, computer systems that utilize the Microsoft Windows® operating system in combination with Intel microprocessors often have multiple buses, one of which may be considered a peripheral bus. Workstation computers may also be considered as computer systems that may be used with the present invention. Workstation computers may not include a hard disk or other mass storage, and the executable programs are loaded from a corded or wireless network connection into memory 504 for execution by processor 502. In addition, handheld or palmtop computers, which are sometimes referred to as personal digital assistants (PDAs), may also be considered as computer systems that may be used with the present invention. As with workstation computers, handheld computers may not include a hard disk or other mass storage, and the executable programs are loaded from a corded or wireless network connection into memory 504 for execution by processor 502. A typical computer system will usually include at least a processor 502, memory 504, and a bus 506 coupling memory 504 to processor 502.
- It will also be appreciated that in one embodiment, computer system 500 is controlled by operating system software. For example, one embodiment of the present invention utilizes Microsoft Windows® as the operating system for computer system 500. In other embodiments, other operating systems that may also be used with computer system 500 include, but are not limited to, the Apple Macintosh operating system, the Linux operating system, the Microsoft Windows CE® operating system, the Unix operating system, the 3Com Palm operating system, or the like.
- The above description of illustrated embodiments of the invention, including what is described in the Abstract, is not intended to be exhaustive or to limit the invention to the precise forms disclosed. While specific embodiments of, and examples for, the invention are described herein for illustrative purposes, various equivalent modifications are possible within the scope of the invention, as those skilled in the relevant art will recognize.
- These modifications can be made to the invention in light of the above detailed description. The terms used in the following claims should not be construed to limit the invention to the specific embodiments disclosed in the specification and the claims. Rather, the scope of the invention is to be determined by the following claims, which are to be construed in accordance with established doctrines of claim interpretation.
Claims (22)
1. A method, comprising:
detecting a faulty portion of memory in a computer system, the faulty portion having stored a system software component in a system software memory region of memory; and
relocating the system software component from the faulty portion of memory to a safe portion of memory.
2. The method of claim 1 wherein the system software component includes instructions loaded from a firmware device during a pre-boot phase of the computer system that persist into an operating system runtime of the computer system.
3. The method of claim 1 wherein relocating the system software component comprises:
finding the safe portion of memory within the system software memory region;
moving the system software component to the safe portion of memory; and
updating a system software memory manager to indicate the system software component is located at the safe portion of memory.
4. The method of claim 1 wherein relocating the system software component comprises:
finding the safe portion of memory within the memory of the computer system;
moving the system software memory region to the safe portion of memory; and
resetting a base address for the system software memory region.
5. The method of claim 1 wherein the system software memory region comprises System Management Random Access Memory (SMRAM).
6. The method of claim 1 wherein the system software memory region comprises a firmware reserved region of memory of the computer system.
7. The method of claim 1 , further comprising setting a memory error detector during a pre-boot phase of the computer system.
8. The method of claim 1 , further comprising determining a memory address of the faulty portion.
9. The method of claim 1 , further comprising marking the faulty portion as unusable.
10. An article of manufacture comprising:
a machine-readable medium including a plurality of instructions which when executed perform operations comprising:
detecting a faulty portion in a system software memory region of a computer system during an operating system runtime of the computer system, the system software memory region having stored system software for the computer system; and
relocating the system software from the faulty portion to a safe portion of memory of the computer system during operating system runtime.
11. The article of manufacture of claim 10 wherein relocating the system software comprises:
finding the safe portion of memory;
moving a portion of system software to the safe portion of memory; and
indicating the portion of system software is located at the safe portion of memory.
12. An article of manufacture of claim 11 wherein indicating the portion of system software is located at the safe portion of memory comprises updating a system software memory manager for the system software memory region to indicate the portion of system software is at the safe portion of memory.
13. The article of manufacture of claim 11 wherein the portion of system software comprises an executable image in accordance with a Portable Executable and Common Object File Format (PE/COFF).
14. The article of manufacture of claim 10 wherein the system software memory region comprises a System Management Random Access Memory (SMRAM) region.
15. The article of manufacture of claim 10 wherein the system software memory region comprises a firmware reserved region, wherein firmware of the computer system to operate in accordance with an Extensible Firmware Interface (EFI) framework standard.
16. The article of manufacture of claim 10 wherein execution of the plurality of instructions further perform operations comprising marking the faulty portion of the system software memory region as unusable after relocating the system software.
17. A computer system, comprising:
a processor;
a memory device operatively coupled to the processor; and
at least one flash device operatively coupled to the processor, the at least one flash device including firmware instructions which when executed by the processor perform operations comprising:
detecting a faulty portion of the memory device during an operating system runtime of the computer system, the faulty portion of the memory device having stored a system software component for the computer system;
determining a location of the faulty portion; and
relocating the system software component from the faulty portion to a safe portion of the memory device during operating system runtime.
18. The computer system of claim 17 wherein relocating the system software component comprises:
finding the safe portion of the memory device;
moving the system software component to the safe portion; and
updating a system software memory manager to indicate that the system software component is located at the safe portion.
19. The computer system of claim 17 wherein the system software component includes an executable image in accordance with a Portable Executable and Common Object File Format (PE/COFF).
20. The computer system of claim 17 wherein the system software component is stored in a System Management Random Access Memory (SMRAM) region of the memory device.
21. The computer system of claim 17 wherein the system software component is stored in a firmware reserved region of the memory device.
22. The computer system of claim 17 wherein the firmware instructions to operate in accordance with an Extensible Firmware Interface (EFI) framework standard.
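As an informal illustration only, and not part of the claims, the relocation steps recited in claims 3 and 9 (find a safe portion, move the component, update the memory manager, mark the faulty portion unusable) might be sketched as follows. This is a toy model under stated assumptions: the names `Component`, `RegionManager`, `find_safe`, and `relocate` are hypothetical, the memory region is simulated with a `bytearray`, the search is a simple first-fit scan, and the component's whole former range is treated as the faulty portion. Real firmware would operate on physical addresses inside SMRAM or a firmware reserved region.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    name: str
    base: int   # offset of the component inside the simulated region
    size: int   # size in bytes


@dataclass
class RegionManager:
    """Hypothetical stand-in for a system software memory manager."""
    memory: bytearray
    components: dict = field(default_factory=dict)  # name -> Component
    unusable: list = field(default_factory=list)    # (start, end) ranges marked bad

    def load(self, name, base, payload):
        # Store the payload and record where the component lives.
        self.memory[base:base + len(payload)] = payload
        self.components[name] = Component(name, base, len(payload))

    @staticmethod
    def _overlaps(start, end, other_start, other_end):
        return start < other_end and other_start < end

    def find_safe(self, size):
        """First-fit search for a range that avoids live components and bad ranges."""
        busy = [(c.base, c.base + c.size) for c in self.components.values()]
        busy += self.unusable
        for start in range(0, len(self.memory) - size + 1):
            if not any(self._overlaps(start, start + size, s, e) for s, e in busy):
                return start
        return None

    def relocate(self, faulty_addr):
        """Move the component covering faulty_addr; return its new base or None."""
        victim = next((c for c in self.components.values()
                       if c.base <= faulty_addr < c.base + c.size), None)
        if victim is None:
            return None  # no tracked component occupies the faulty address
        new_base = self.find_safe(victim.size)
        if new_base is None:
            return None  # no safe portion large enough was found
        old_base = victim.base
        # Move the component image to the safe portion.
        self.memory[new_base:new_base + victim.size] = \
            self.memory[old_base:old_base + victim.size]
        # Update the manager so later lookups use the new location.
        victim.base = new_base
        # Mark the former range unusable (assumption: whole range is faulty).
        self.unusable.append((old_base, old_base + victim.size))
        return new_base


mgr = RegionManager(memory=bytearray(64))
mgr.load("handler_a", 0, b"AAAAAAAA")
mgr.load("handler_b", 8, b"BBBBBBBB")
new_base = mgr.relocate(faulty_addr=10)  # address 10 falls inside handler_b
print(new_base, mgr.unusable)
```

In this sketch the faulty range stays excluded from later `find_safe` searches, which loosely mirrors marking the faulty portion as unusable after relocation. The variant of claim 4, which moves the whole region and resets its base address, is not modeled here.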
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/748,502 US7321990B2 (en) | 2003-12-30 | 2003-12-30 | System software to self-migrate from a faulty memory location to a safe memory location |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/748,502 US7321990B2 (en) | 2003-12-30 | 2003-12-30 | System software to self-migrate from a faulty memory location to a safe memory location |
Publications (2)
Publication Number | Publication Date |
---|---|
US20050188278A1 (en) | 2005-08-25 |
US7321990B2 (en) | 2008-01-22 |
Family
ID=34860690
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/748,502, Expired - Fee Related, US7321990B2 (en) | 2003-12-30 | 2003-12-30 | System software to self-migrate from a faulty memory location to a safe memory location |
Country Status (1)
Country | Link |
---|---|
US (1) | US7321990B2 (en) |
Cited By (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040255053A1 (en) * | 2003-06-16 | 2004-12-16 | Kang Byung-Suk | Information processing device and method for controlling the same |
US20050257040A1 (en) * | 2004-05-12 | 2005-11-17 | Samsung Electronics Co., Ltd. | Information processing system and method of controlling the same |
US20060253620A1 (en) * | 2005-05-06 | 2006-11-09 | Kang Byung-Suk | Data structure of flash memory having system area with variable size in which data can be updated, USB memory device having the flash memory, and method of controlling the system area |
US20060291304A1 (en) * | 2005-06-23 | 2006-12-28 | Rothman Michael A | Method for enhanced block management |
US20070074068A1 (en) * | 2005-09-28 | 2007-03-29 | Lite-On Technology Corporation | Method for protecting backup data of a computer system from damage |
US20070168754A1 (en) * | 2005-12-19 | 2007-07-19 | Xiv Ltd. | Method and apparatus for ensuring writing integrity in mass storage systems |
US20070168717A1 (en) * | 2005-12-13 | 2007-07-19 | Lung-Chiao Chang | Method of Data Protection for Computers |
US20080022154A1 (en) * | 2005-03-24 | 2008-01-24 | Fujitsu Limited | Information processing device |
US20080147945A1 (en) * | 2006-09-28 | 2008-06-19 | Zimmer Vincent J | Parallel memory migration |
US20080183945A1 (en) * | 2007-01-31 | 2008-07-31 | Hughes Nathan J | Firmware relocation |
US20080288714A1 (en) * | 2007-05-15 | 2008-11-20 | Sandisk Il Ltd | File storage in a computer system with diverse storage media |
US20080313401A1 (en) * | 2006-12-20 | 2008-12-18 | Byung Suk Kang | Device for Processing Information and Working Method Thereof |
US20090063836A1 (en) * | 2007-08-31 | 2009-03-05 | Rothman Michael A | Extended fault resilience for a platform |
US20090313444A1 (en) * | 2006-06-28 | 2009-12-17 | Seiko Epson Corporation | Semiconductor storage apparatus managing system, semiconductor storage apparatus, host apparatus, program and method of managing semiconductor storage apparatus |
US20100169554A1 (en) * | 2008-12-25 | 2010-07-01 | Fujitsu Limited | Terminal apparatus |
US8024496B2 (en) | 2009-04-10 | 2011-09-20 | International Business Machines Corporation | Enhanced memory migration descriptor format and method |
US20120023364A1 (en) * | 2010-07-26 | 2012-01-26 | Swanson Robert C | Methods and apparatus to protect segments of memory |
US20130046491A1 (en) * | 2011-08-17 | 2013-02-21 | Seagate Technology Llc | In-line analyzer for wavelet based defect scanning |
US20140052948A1 (en) * | 2011-07-28 | 2014-02-20 | Huawei Technologies Co., Ltd. | Method and device for implementing memory migration |
DE102007046947B4 (en) * | 2006-09-29 | 2017-10-12 | Dell Products L.P. | System and method for managing system management interrupts in a multi-processor computer system |
KR20210044194A (en) * | 2020-05-29 | 2021-04-22 | 베이징 바이두 넷컴 사이언스 앤 테크놀로지 코., 엘티디. | Memory fault processing method and device, electronic equipment and storage medium |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4157536B2 (en) * | 2005-03-29 | 2008-10-01 | 富士通株式会社 | Program execution device, program execution method, and service providing program |
US20070088988A1 (en) * | 2005-10-14 | 2007-04-19 | Dell Products L.P. | System and method for logging recoverable errors |
US8166338B2 (en) * | 2009-06-04 | 2012-04-24 | International Business Machines Corporation | Reliable exception handling in a computer system |
US10007579B2 (en) | 2016-03-11 | 2018-06-26 | Microsoft Technology Licensing, Llc | Memory backup management in computing systems |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5592616A (en) * | 1995-06-07 | 1997-01-07 | Dell Usa, Lp | Method for performing efficient memory testing on large memory arrays using test code executed from cache memory |
US5638532A (en) * | 1994-12-06 | 1997-06-10 | Digital Equipment Corporation | Apparatus and method for accessing SMRAM in a computer based upon a processor employing system management mode |
US5862314A (en) * | 1996-11-01 | 1999-01-19 | Micron Electronics, Inc. | System and method for remapping defective memory locations |
US6189111B1 (en) * | 1997-03-28 | 2001-02-13 | Tandem Computers Incorporated | Resource harvesting in scalable, fault tolerant, single system image clusters |
US6240531B1 (en) * | 1997-09-30 | 2001-05-29 | Networks Associates Inc. | System and method for computer operating system protection |
US20020010876A1 (en) * | 1998-10-30 | 2002-01-24 | Peter Kliegelhofer | Method for operating memory devices for storing data |
US6343338B1 (en) * | 1997-04-01 | 2002-01-29 | Microsoft Corporation | System and method for synchronizing disparate processing modes and for controlling access to shared resources |
US20020169979A1 (en) * | 2001-05-11 | 2002-11-14 | Zimmer Vincent J. | Hardened extensible firmware framework |
US20030093579A1 (en) * | 2001-11-15 | 2003-05-15 | Zimmer Vincent J. | Method and system for concurrent handler execution in an SMI and PMI-based dispatch-execution framework |
US20030140271A1 (en) * | 2002-01-22 | 2003-07-24 | Dell Products L.P. | System and method for recovering from memory errors |
US20030154392A1 (en) * | 2002-02-11 | 2003-08-14 | Lewis Timothy A. | Secure system firmware using interrupt generation on attempts to modify shadow RAM attributes |
US20030177129A1 (en) * | 2002-03-04 | 2003-09-18 | Barry Bond | Extensible loader |
Patent Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5638532A (en) * | 1994-12-06 | 1997-06-10 | Digital Equipment Corporation | Apparatus and method for accessing SMRAM in a computer based upon a processor employing system management mode |
US5592616A (en) * | 1995-06-07 | 1997-01-07 | Dell Usa, Lp | Method for performing efficient memory testing on large memory arrays using test code executed from cache memory |
US5862314A (en) * | 1996-11-01 | 1999-01-19 | Micron Electronics, Inc. | System and method for remapping defective memory locations |
US6189111B1 (en) * | 1997-03-28 | 2001-02-13 | Tandem Computers Incorporated | Resource harvesting in scalable, fault tolerant, single system image clusters |
US6343338B1 (en) * | 1997-04-01 | 2002-01-29 | Microsoft Corporation | System and method for synchronizing disparate processing modes and for controlling access to shared resources |
US6240531B1 (en) * | 1997-09-30 | 2001-05-29 | Networks Associates Inc. | System and method for computer operating system protection |
US20020010876A1 (en) * | 1998-10-30 | 2002-01-24 | Peter Kliegelhofer | Method for operating memory devices for storing data |
US20020169979A1 (en) * | 2001-05-11 | 2002-11-14 | Zimmer Vincent J. | Hardened extensible firmware framework |
US20020169951A1 (en) * | 2001-05-11 | 2002-11-14 | Zimmer Vincent J. | SMM loader and execution mechanism for component software for multiple architectures |
US20030093579A1 (en) * | 2001-11-15 | 2003-05-15 | Zimmer Vincent J. | Method and system for concurrent handler execution in an SMI and PMI-based dispatch-execution framework |
US20030140271A1 (en) * | 2002-01-22 | 2003-07-24 | Dell Products L.P. | System and method for recovering from memory errors |
US20030154392A1 (en) * | 2002-02-11 | 2003-08-14 | Lewis Timothy A. | Secure system firmware using interrupt generation on attempts to modify shadow RAM attributes |
US20030177129A1 (en) * | 2002-03-04 | 2003-09-18 | Barry Bond | Extensible loader |
Cited By (43)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040255053A1 (en) * | 2003-06-16 | 2004-12-16 | Kang Byung-Suk | Information processing device and method for controlling the same |
US20050257040A1 (en) * | 2004-05-12 | 2005-11-17 | Samsung Electronics Co., Ltd. | Information processing system and method of controlling the same |
US7337254B2 (en) * | 2004-05-12 | 2008-02-26 | Samsung Electronics Co., Ltd. | Information processing system and method of controlling the same |
US20080022154A1 (en) * | 2005-03-24 | 2008-01-24 | Fujitsu Limited | Information processing device |
US8527806B2 (en) * | 2005-03-24 | 2013-09-03 | Fujitsu Limited | Information processing device and memory anomaly monitoring method |
US20060253620A1 (en) * | 2005-05-06 | 2006-11-09 | Kang Byung-Suk | Data structure of flash memory having system area with variable size in which data can be updated, USB memory device having the flash memory, and method of controlling the system area |
US7352621B2 (en) | 2005-06-23 | 2008-04-01 | Intel Corporation | Method for enhanced block management |
US20060291304A1 (en) * | 2005-06-23 | 2006-12-28 | Rothman Michael A | Method for enhanced block management |
US20070074068A1 (en) * | 2005-09-28 | 2007-03-29 | Lite-On Technology Corporation | Method for protecting backup data of a computer system from damage |
US7707454B2 (en) * | 2005-09-28 | 2010-04-27 | Lite-On Technology Corporation | Method for protecting backup data of a computer system from damage |
US20070168717A1 (en) * | 2005-12-13 | 2007-07-19 | Lung-Chiao Chang | Method of Data Protection for Computers |
US20070168754A1 (en) * | 2005-12-19 | 2007-07-19 | Xiv Ltd. | Method and apparatus for ensuring writing integrity in mass storage systems |
US20090313444A1 (en) * | 2006-06-28 | 2009-12-17 | Seiko Epson Corporation | Semiconductor storage apparatus managing system, semiconductor storage apparatus, host apparatus, program and method of managing semiconductor storage apparatus |
US7962807B2 (en) * | 2006-06-28 | 2011-06-14 | Seiko Epson Corporation | Semiconductor storage apparatus managing system, semiconductor storage apparatus, host apparatus, program and method of managing semiconductor storage apparatus |
US9384039B2 (en) * | 2006-09-28 | 2016-07-05 | Intel Corporation | Parallel memory migration |
US7941624B2 (en) * | 2006-09-28 | 2011-05-10 | Intel Corporation | Parallel memory migration |
US20110213942A1 (en) * | 2006-09-28 | 2011-09-01 | Zimmer Vincent J | Parallel memory migration |
US20080147945A1 (en) * | 2006-09-28 | 2008-06-19 | Zimmer Vincent J | Parallel memory migration |
DE102007046947B4 (en) * | 2006-09-29 | 2017-10-12 | Dell Products L.P. | System and method for managing system management interrupts in a multi-processor computer system |
US20100095079A1 (en) * | 2006-12-20 | 2010-04-15 | Byung Suk Kang | Device for processing information and working method thereof |
US20080313401A1 (en) * | 2006-12-20 | 2008-12-18 | Byung Suk Kang | Device for Processing Information and Working Method Thereof |
US8065500B2 (en) * | 2006-12-20 | 2011-11-22 | Lg Electronics Inc. | Device for processing information and working method thereof |
US7797504B2 (en) * | 2006-12-20 | 2010-09-14 | Lg Electronics Inc. | Device for processing information based on stored identifiers and a working method therof. |
US20080183945A1 (en) * | 2007-01-31 | 2008-07-31 | Hughes Nathan J | Firmware relocation |
US20080288714A1 (en) * | 2007-05-15 | 2008-11-20 | Sandisk Il Ltd | File storage in a computer system with diverse storage media |
US7899987B2 (en) * | 2007-05-15 | 2011-03-01 | Sandisk Il Ltd. | File storage in a computer system with diverse storage media |
US7831858B2 (en) * | 2007-08-31 | 2010-11-09 | Intel Corporation | Extended fault resilience for a platform |
US20090063836A1 (en) * | 2007-08-31 | 2009-03-05 | Rothman Michael A | Extended fault resilience for a platform |
US20100169554A1 (en) * | 2008-12-25 | 2010-07-01 | Fujitsu Limited | Terminal apparatus |
US8190813B2 (en) * | 2008-12-25 | 2012-05-29 | Fujitsu Limited | Terminal apparatus with restricted non-volatile storage medium |
US8024496B2 (en) | 2009-04-10 | 2011-09-20 | International Business Machines Corporation | Enhanced memory migration descriptor format and method |
US20120023364A1 (en) * | 2010-07-26 | 2012-01-26 | Swanson Robert C | Methods and apparatus to protect segments of memory |
AU2011286271B2 (en) * | 2010-07-26 | 2014-08-07 | Intel Corporation | Methods and apparatus to protect segments of memory |
KR101473119B1 (en) * | 2010-07-26 | 2014-12-15 | 인텔 코오퍼레이션 | Methods and apparatus to protect segments of memory |
US9063836B2 (en) * | 2010-07-26 | 2015-06-23 | Intel Corporation | Methods and apparatus to protect segments of memory |
EP2598997A4 (en) * | 2010-07-26 | 2015-08-05 | Intel Corp | Methods and apparatus to protect segments of memory |
CN103140841A (en) * | 2010-07-26 | 2013-06-05 | 英特尔公司 | Methods and apparatus to protect segments of memory |
US20140052948A1 (en) * | 2011-07-28 | 2014-02-20 | Huawei Technologies Co., Ltd. | Method and device for implementing memory migration |
US9600202B2 (en) * | 2011-07-28 | 2017-03-21 | Huawei Technologies Co., Ltd. | Method and device for implementing memory migration |
US20130046491A1 (en) * | 2011-08-17 | 2013-02-21 | Seagate Technology Llc | In-line analyzer for wavelet based defect scanning |
US9915944B2 (en) * | 2011-08-17 | 2018-03-13 | Seagate Technology Llc | In-line analyzer for wavelet based defect scanning |
KR20210044194A (en) * | 2020-05-29 | 2021-04-22 | 베이징 바이두 넷컴 사이언스 앤 테크놀로지 코., 엘티디. | Memory fault processing method and device, electronic equipment and storage medium |
KR102488882B1 (en) * | 2020-05-29 | 2023-01-17 | 베이징 바이두 넷컴 사이언스 앤 테크놀로지 코., 엘티디. | Memory fault processing method and device, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
US7321990B2 (en) | 2008-01-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7321990B2 (en) | System software to self-migrate from a faulty memory location to a safe memory location | |
US20050149711A1 (en) | Method and system for firmware-based run time exception filtering | |
US9372754B2 (en) | Restoring from a legacy OS environment to a UEFI pre-boot environment | |
US7260848B2 (en) | Hardened extensible firmware framework | |
US7020738B2 (en) | Method for resolving address space conflicts between a virtual machine monitor and a guest operating system | |
US7934209B2 (en) | Method for firmware variable storage with eager compression, fail-safe extraction and restart time compression scan | |
US7917689B2 (en) | Methods and apparatuses for nonvolatile memory wear leveling | |
US8522236B2 (en) | Method and system for establishing a robust virtualized environment | |
US6775728B2 (en) | Method and system for concurrent handler execution in an SMI and PMI-based dispatch-execution framework | |
US7269768B2 (en) | Method and system to provide debugging of a computer system from firmware | |
EP1854006B1 (en) | Method and system for preserving dump data upon a crash of the operating system | |
US7146512B2 (en) | Method of activating management mode through a network for monitoring a hardware entity and transmitting the monitored information through the network | |
US7451298B2 (en) | Processing exceptions from 64-bit application program executing in 64-bit processor with 32-bit OS kernel by switching to 32-bit processor mode | |
US20050114639A1 (en) | Hardened extensible firmware framework to support system management mode operations using 64-bit extended memory mode processors | |
US9483782B2 (en) | Automating capacity upgrade on demand | |
KR20080027207A (en) | High integrity firmware | |
US8539214B1 (en) | Execution of a program module within both a PEI phase and a DXE phase of an EFI firmware | |
US7512719B1 (en) | Sharing a dynamically located memory block between components executing in different processor modes in an extensible firmware interface environment | |
US7840792B2 (en) | Utilizing hand-off blocks in system management mode to allow independent initialization of SMBASE between PEI and DXE phases | |
US7831858B2 (en) | Extended fault resilience for a platform | |
US7484083B1 (en) | Method, apparatus, and computer-readable medium for utilizing BIOS boot specification compliant devices within an extensible firmware interface environment | |
US20050071624A1 (en) | Providing a self-describing media for a computer system | |
US8949868B2 (en) | Methods, systems and computer program products for dynamic linkage | |
US8516233B2 (en) | Method for setting a boot list to disks with multiple boot logical volumes |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: INTEL CORPORATION, CALIFORNIA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:ZIMMER, VINCENT J.;ROTHMAN, MICHAEL A.;REEL/FRAME:014860/0011;SIGNING DATES FROM 20031226 TO 20031229 |
FPAY | Fee payment | Year of fee payment: 4 |
REMI | Maintenance fee reminder mailed | |
LAPS | Lapse for failure to pay maintenance fees | |
STCH | Information on status: patent discontinuation | Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362 |
FP | Expired due to failure to pay maintenance fee | Effective date: 20160122 |