US20090128550A1 - Computing system supporting parallel 3D graphics processes based on the division of objects in 3D scenes - Google Patents

Publication number
US20090128550A1
Authority
US
United States
Prior art keywords
graphics
gpu
parallel
rendering
color
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US12/231,295
Inventor
Reuven Bakalash
Yaniv Leviathan
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lucid Information Technology Ltd
Google LLC
Original Assignee
Lucid Information Technology Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from PCT/IL2004/001069 external-priority patent/WO2005050557A2/en
Priority claimed from US11/340,402 external-priority patent/US7812844B2/en
Priority claimed from US11/648,160 external-priority patent/US8497865B2/en
Priority claimed from PCT/IB2007/003464 external-priority patent/WO2008004135A2/en
Priority claimed from US11/655,735 external-priority patent/US8085273B2/en
Priority claimed from US11/789,039 external-priority patent/US20070291040A1/en
Priority claimed from US11/897,536 external-priority patent/US7961194B2/en
Priority to US12/231,295 priority Critical patent/US20090128550A1/en
Application filed by Lucid Information Technology Ltd filed Critical Lucid Information Technology Ltd
Assigned to LUCID INFORMATION TECHNOLOGY, LTD. reassignment LUCID INFORMATION TECHNOLOGY, LTD. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: BAKALASH, REUVEN, LEVIATHAN, YANIV
Publication of US20090128550A1 publication Critical patent/US20090128550A1/en
Assigned to GOOGLE LLC reassignment GOOGLE LLC ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: LUCIDLOGIX TECHNOLOGY LTD.

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/20 Processor architectures; Processor configuration, e.g. pipelining
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5044 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering hardware capabilities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/005 General purpose rendering architectures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/50 Indexing scheme relating to G06F9/50
    • G06F2209/501 Performance criteria
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2210/00 Indexing scheme for image generation or computer graphics
    • G06T2210/52 Parallel processing
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09G ARRANGEMENTS OR CIRCUITS FOR CONTROL OF INDICATING DEVICES USING STATIC MEANS TO PRESENT VARIABLE INFORMATION
    • G09G5/00 Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators
    • G09G5/14 Display of multiple viewports
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09G ARRANGEMENTS OR CIRCUITS FOR CONTROL OF INDICATING DEVICES USING STATIC MEANS TO PRESENT VARIABLE INFORMATION
    • G09G5/00 Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators
    • G09G5/36 Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators characterised by the display of a graphic pattern, e.g. using an all-points-addressable [APA] memory
    • G09G5/363 Graphics controllers
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09G ARRANGEMENTS OR CIRCUITS FOR CONTROL OF INDICATING DEVICES USING STATIC MEANS TO PRESENT VARIABLE INFORMATION
    • G09G5/00 Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators
    • G09G5/36 Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators characterised by the display of a graphic pattern, e.g. using an all-points-addressable [APA] memory
    • G09G5/39 Control of the bit-mapped memory
    • G09G5/395 Arrangements specially adapted for transferring the contents of the bit-mapped memory to the screen
    • G09G5/397 Arrangements specially adapted for transferring the contents of two or more bit-mapped memories to the screen simultaneously, e.g. for mixing or overlay

Definitions

  • the present invention relates generally to the field of 3D computer graphics rendering, and more particularly, to ways of and means for improving the performance of parallel graphics processes running on 3D parallel graphics processing systems supporting the decomposition of 3D scene objects among its multiple graphics processing pipelines (GPPLs).
  • FIG. 1 discloses diverse kinds of PC-level computing systems embodying different types of parallel graphics rendering subsystems (PGRSs) with graphics processing pipelines (GPPLs) generally illustrated in FIG. 1 .
  • the multi-pipeline architecture of such systems can be realized using GPU-based GPPLs of classical design, as shown in FIG. 1B, or alternatively, using more advanced GPU-based GPPLs, compliant with the DirectX 10 standard, as shown in FIG. 1C.
  • the multi-pipeline architecture of such systems can be realized using multi-core CPU based GPPLs as shown in FIG. 1D.
  • such graphics-based computing systems support multiple modes of graphics rendering parallelism across their GPPLs, including time, image and object division modes, which can be adaptively and dynamically switched into operation during the run-time of any graphics application running on the host computing system. While each mode of parallel operation has its advantages, as described in U.S. application Ser. No. 11/897,536 filed Aug. 30, 2007, supra, the object division mode of parallel operation is particularly helpful during the running of interactive gaming applications because this mode has the potential of resolving many bottleneck conflicts which naturally accompany such demanding applications.
  • objects within a 3D scene are (i) automatically decomposed based on specified criteria, and assigned/designated to particular GPUs, and (ii) distributed to the assigned/designated GPUs, so that the GPUs can render partial images of the 3D scene, based on the assigned/designated objects distributed thereto during parallel rendering operations, and ultimately, so that these partial image fragments can be re-composited in a final color frame buffer (FB) of the primary GPPL, for display on one or more visual display devices.
  • the pixel depth or z values of objects' images within the 3D scene must be analyzed/compared, during each image frame, against (i) the pixel depth values of other objects' images (which may be occluding a particular object during rendering), as well as (ii) the rear or background clipping plane represented within the 3D scene.
  • This depth-based image recomposition process is illustrated in FIGS. 1E1, 1E2 and 1E3, which illustrate how local depth maps of objects assigned to particular GPUs are constructed within each GPU, and are used during the recomposition of partial image fragments generated within the color frame buffer of each GPU during the final stage of the object-division (OD) based image recomposition process.
  • FIG. 1E1 shows a simple scene, as handled by conventional prior art Object Division, comprising three objects, A, B and C.
  • An exemplary decomposition of this scene can be done by sending object A for rendering to GPU 1, and objects B and C to GPU 2.
  • FIGS. 1E2 and 1E3 show the color and Z buffers created by the prior art Object Division method. From the given View Point, object B (of GPU 2) is obstructed by object A (of GPU 1). The Z-buffers of GPU 1 and GPU 2 each hold a local depth map, constructed only from the objects designated to that GPU. Each GPU is unaware of objects rendered by the other GPU; therefore such objects are not reflected in its Z-buffer.
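The prior art recomposition step described above can be sketched as follows. This is a minimal illustration only, not the patent's implementation: the buffers are reduced to a single four-pixel scanline, smaller depth values are assumed to mean "nearer to the viewer", and the names `composite_by_depth`, `BACKGROUND` and `FAR` are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]            # RGB color (assumed representation)
BACKGROUND: Pixel = (0, 0, 0)
FAR = float("inf")                      # assumed depth of a pixel no object covers


def composite_by_depth(color1: List[Pixel], z1: List[float],
                       color2: List[Pixel], z2: List[float]) -> List[Pixel]:
    """Merge two partial images by comparing local depth maps pixel by pixel.

    Each GPU only knows the depths of its own objects, so the comparison
    has to happen here, at recomposition time, for every pixel.
    """
    merged = []
    for c1, d1, c2, d2 in zip(color1, z1, color2, z2):
        merged.append(c1 if d1 <= d2 else c2)   # keep the nearer fragment
    return merged


# Scene of FIG. 1E1, flattened to one scanline (illustrative values only).
A, B, C = (255, 0, 0), (0, 255, 0), (0, 0, 255)
gpu1_color, gpu1_z = [BACKGROUND, A, A, BACKGROUND], [FAR, 1.0, 1.0, FAR]
gpu2_color, gpu2_z = [BACKGROUND, B, B, C], [FAR, 5.0, 5.0, 3.0]

final = composite_by_depth(gpu1_color, gpu1_z, gpu2_color, gpu2_z)
```

Note that GPU 2 fully draws object B even though every one of its pixels is later discarded in favor of object A; this is the redundant work that the depth comparison cannot avoid.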
  • object-division mode of graphics parallelism has a number of important advantages over the other methods of parallel graphics rendering, for example: (i) responsiveness to user interface inputs; (ii) parallelization of the entire 3D graphics pipeline including the vertex as well as pixel parts thereof; (iii) the reduction of CPU-GPU transfer load; and (iv) the reduction of GPU memory requirements.
  • the OD mode of graphics parallelism suffers from a number of inherent shortcomings and drawbacks.
  • the object-division mode of parallelism requires a complex and intensive process of merging a plurality of partial image fragments buffered in color frame buffers (FBs), utilizing depth-based information stored in the Z buffers of the GPUs, involving depth-based comparisons on a pixel-by-pixel basis, resulting in substantial time delays, significant bandwidth consumption, and high hardware costs.
  • objects being rendered at each GPU that are obstructed by objects rendered by other GPUs are processed for rendering (i.e. drawn) as if these objects were visible.
  • although these redundant portions are eliminated during the final image re-composition process, using depth-based comparisons, such redundant processing operations greatly decrease the efficiency of the object-division mode of parallelism.
  • When the anti-aliasing (AA) mode is operating during the object-division mode of parallelism, each GPU performs the correct anti-aliasing of its image fragments. However, some objects that are anti-aliased with their current background will become extrinsic to their new background when composed into the final image.
  • a primary object of the present invention is to provide a new and improved method of and apparatus for practicing parallel 3D graphics processes in modern multiple-GPU based computer graphics systems, based on the division of objects in 3D scenes, among multiple graphics processing pipelines (GPPLs), while avoiding the shortcomings and drawbacks associated with prior art apparatus and methodologies.
  • Another object of the present invention is to provide a novel parallel graphics processing system (PGPS) embodied within a host computing system having (i) host memory space (HMS) for storing one or more graphics-based applications and a graphics library for generating graphics commands and data (GCAD) during the run-time (i.e. execution) of the graphics-based application, (ii) one or more CPUs for executing said graphics-based applications, and (iii) a display device for displaying images containing graphics during the execution of said graphics-based applications.
  • Another object of the present invention is to provide improved PC-level computing systems and architectures employing the parallel graphics processing technique of the present invention.
  • Another object of the present invention is to provide a parallel graphics processing subsystem supporting object division based parallelism among its GPPLs (e.g. GPU-based GPPLs), and performing pixel depth value comparison within each GPU using a common global depth map (GDM) during the pixel rendering process, in contrast to conventional approaches involving the use of Z-buffer comparisons during the final phase of image recomposition.
  • Another object of the present invention is to provide a novel method of parallel graphics processing based on object division parallelism among a plurality of GPPLs, and employing a global depth map (GDM), created by the graphics application, for use in z-depth tests during the pixel rendering process, and eliminating the shortcoming of z-buffer comparisons of all GPUs in regular object division.
  • Another object of the present invention is to provide a method of recompositing partial complementary-type images within multiple GPPLs.
  • Another object of the present invention is to provide a method of generating partial complementary-type images within multiple GPPLs.
  • Another object of the present invention is to provide a method of generating global depth maps (GDMs) within multiple GPPLs.
  • Another object of the present invention is to provide a method of generating global depth maps (GDMs) within multiple GPPLs using GDMs created during a first GDM pass of a multi-pass parallel graphics processing method.
  • Another object of the present invention is to provide a method of generating global depth maps (GDMs) within multiple GPPLs during a color-based pixel rendering process.
  • Another object of the present invention is to provide a method of providing global depth maps (GDMs) within multiple GPPLs, generated during a graphics application.
  • Another object of the present invention is to provide a method of generating images using a depthless image recomposition process within multiple GPPLs.
  • Another object of the present invention is to provide a novel Z-buffering mechanism for use in compositing a 3D scene in a 3D parallel graphics rendering system, comprising a (color) frame buffer (memory) having a color value for each pixel, and a z-buffer with the same number of entries for storing a z-value for each pixel in the frame buffer; wherein the z-buffer is initialized to zero, representing the z-value at the back clipping plane of the 3D scene, wherein the frame buffer is initialized to the background color, and wherein the largest value that can be stored in the z-buffer represents the z value of the front clipping plane.
  • Another object of the present invention is to provide such a novel Z-buffering mechanism wherein polygons composing the 3D scene are scan converted into the frame buffer in an arbitrary order, and wherein during the scan-conversion process, if the polygon being scan converted at point (x,y) is no farther from the viewer than the point whose color and depth are currently in the buffers, then the color and depth values of the new point are used to replace the old color and depth values stored at the point (x,y).
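The Z-buffering rule stated in the two paragraphs above can be sketched as below. It follows the convention given in the text (zero is the back clipping plane, larger z values are nearer the viewer); the function names `make_buffers` and `plot` and the list-of-lists buffer layout are assumptions made for illustration.

```python
from typing import List, Tuple

Color = Tuple[int, int, int]


def make_buffers(width: int, height: int, background: Color):
    """Create a frame buffer filled with the background color and a
    z-buffer of the same size initialized to zero (the back clipping plane)."""
    frame = [[background for _ in range(width)] for _ in range(height)]
    zbuf = [[0 for _ in range(width)] for _ in range(height)]
    return frame, zbuf


def plot(frame: List[List[Color]], zbuf: List[List[int]],
         x: int, y: int, z: int, color: Color) -> bool:
    """Write one scan-converted point if it is no farther from the viewer
    than what is already stored at (x, y). Larger z means nearer."""
    if z >= zbuf[y][x]:
        zbuf[y][x] = z
        frame[y][x] = color
        return True
    return False


frame, zbuf = make_buffers(2, 1, background=(10, 10, 10))
plot(frame, zbuf, 0, 0, z=5, color=(255, 0, 0))   # nearer than the back plane: written
plot(frame, zbuf, 0, 0, z=3, color=(0, 0, 255))   # farther than z=5: rejected
```

Because the test only compares against the value already stored, polygons can arrive in any order and the nearest surface still wins at every pixel.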
  • Another object of the present invention is to provide a 3D parallel graphics rendering system which creates a global depth map (GDM) within each GPU in cases where such a global depth map is not provided by the graphics application, for use as a depth reference during Z-tests conducted throughout the graphics application, thereby eliminating object overdrawing and other shortcomings and drawbacks associated with conventional object division based parallel graphics rendering processes.
  • Another object of the present invention is to provide a 3D parallel graphics rendering system which creates and uses a global depth map (GDM) within each GPU, for the purpose of testing the z-depth values of all objects in the 3D scene, thereby eliminating the shortcomings and drawbacks associated with using z-buffer comparisons from all GPUs, as performed in prior art object division based pixel rendering processes.
  • Another object of the present invention is to provide a 3D parallel graphics rendering system which supports object division based parallelism among the GPPLs while providing an anti-aliasing process that is substantially free from the artifacts generated when using prior art object division based pixel rendering processes.
  • Another object of the present invention is to utilize a Global Depth Map created by the application, e.g. during a special Ambient Light Pass, as a Z-test reference, enabling the Depthless Image Recomposition Process.
  • Another object of the present invention is to provide a method of generating complementary-type partial images in each GPPL using the GDM and the object division based parallel rendering process.
  • Another object of the present invention is to provide a depthless image recomposition process for object division parallelism, creating a complete image frame of the 3D scene, and eliminating the need to compare the depth values of all GPUs as part of the compositing process.
  • Another object of the present invention is to provide an improved object division method free of anti-aliasing artifacts, in contrast to the prior art object division method.
  • Another object of the present invention is to create an improved object division method free of the overdrawing effect, greatly increasing efficiency over prior art object division parallelism.
  • FIG. 1A is a graphical representation of a PC-level based multi-GPPL parallel graphics rendering platform of the type disclosed in Applicants' U.S. application Ser. No. 11/897,536 filed Aug. 30, 2007, showing multi-CPUs, system memory, a system interface, and a plurality of GPPLs, with a display interface driving one or more graphics display screens;
  • FIG. 1B is a schematic representation of a plurality of GPU-based graphics processing pipelines (GPPLs), such as in nVidia's GeForce 7700 graphics subsystem, that can be employed in the multi-GPPL graphics rendering platform of FIG. 1A ;
  • FIG. 1C is a schematic representation of a plurality of advanced GPU-based graphics processing pipelines (GPPLs), such as in nVidia's GeForce 8800 GTX graphics subsystem, that can be employed in the multi-GPPL graphics rendering platform of FIG. 1A ;
  • FIG. 1D is a schematic representation of a plurality of multicore-based graphics processing pipelines (GPPLs) that can be employed in the multi-GPPL graphics rendering platform of FIG. 1A ;
  • FIG. 1 E 1 is a graphical illustration of a 3D scene modeled within a dual-GPU embodiment of the parallel graphics processing system of FIG. 1A , operating in a classic object division (OD) mode of operation, wherein dual GPUs (GPU 1 and GPU 2 ) are provided, and three objects A, B and C are shown against a rectangular background frame, wherein cylindrical object B is occluded/obstructed by the cubic object A along the indicated view point within the coordinate reference system X-Y-Z, wherein the 3D scene is decomposed within the 3D dual-GPU based parallel graphics rendering system such that object A is assigned to GPU 1 while objects B and C are assigned to GPU 2 , and wherein partial images of the 3D scene are rendered in the GPUs and stored in the Color Buffers, and finally recomposited within GPU 1 using pixel depth information maintained within the Z buffers of the GPUs;
  • FIG. 1 E 2 is a schematic representation of the Color Buffer and Z (Depth) Buffer associated with GPU 1 employed in the dual-GPU embodiment of the parallel graphics rendering system of FIG. 1A operating in a classic Object Division Mode of operation, wherein the Color Buffer holds color values for the pixels of object A computed locally by GPU 1 , while the Z Buffer holds a local depth (z value) map for the pixels of object A also computed locally by GPU 1 ;
  • FIG. 1E3 is a schematic representation of the Color Buffer and Z (Depth) Buffer associated with GPU 2 employed in the dual-GPU embodiment of the parallel graphics rendering system of FIG. 1A operating in a classic Object Division Mode of operation, wherein the Color Buffer holds color values for the pixels of objects B and C computed locally by GPU 2, while the Z Buffer holds a local depth (z value) map for the pixels of objects B and C, also computed locally by GPU 2;
  • FIG. 2A is a graphical illustration of a 3D scene modeled within a dual-GPU embodiment of the parallel graphics processing system of FIG. 2C, carrying out a method of Depthless Image Recomposition (DIR) according to the present invention based on an object division (OD) mode of parallel graphics processing operation, wherein dual GPUs (GPU 1 and GPU 2) are provided, and three objects A, B and C are shown against a rectangular background frame, wherein cylindrical object B is occluded/obstructed by the cubic object A along the indicated view point within the coordinate reference system X-Y-Z, wherein the 3D scene is decomposed within the 3D dual-GPU based parallel graphics rendering system such that object A is assigned to GPU 1 while objects B and C are assigned to GPU 2, and wherein partial complementary-type images of the 3D scene are rendered in the GPUs and stored in the Color Buffers, and finally recomposited within GPU 1 without using the global depth map (GDM) maintained within the Z buffers of the GPUs;
  • FIG. 2B is a high-level flow chart illustrating a generalized embodiment of the method of parallel graphics processing according to the present invention, comprising the steps of (a) providing a Global Depth Map (GDM) to each GPPL, for each image frame in the 3D scene to be generated, for use in rendering partial images of the 3D scene along a specified viewing direction, (b) generating complementary-type partial images in each GPPL, using the GDM and the object division based parallel rendering process according to the present invention, and (c) recompositing a complete image frame of the 3D scene using the depthless image recomposition (DIR) process of the present invention illustrated in FIGS. 3 B 1 and 3 B 2 (i.e. without the use of depth comparison);
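Steps (a) and (b) of the generalized method can be sketched as follows. This is a hedged illustration, not the patent's implementation: the scene is flattened to a four-pixel scanline of pre-rasterized fragments, smaller depth is assumed to mean nearer, and non-assigned objects are written in black (the convention the document describes for its second embodiment). The names `build_gdm`, `render_partial`, `Fragment`, `BLACK` and `FAR` are hypothetical.

```python
from typing import List, Tuple

Color = Tuple[int, int, int]
# A fragment is (pixel index, depth, color, GPU the object is assigned to).
Fragment = Tuple[int, float, Color, int]

BLACK: Color = (0, 0, 0)
FAR = float("inf")


def build_gdm(fragments: List[Fragment], num_pixels: int) -> List[float]:
    """Step (a): a depth-only pass over ALL objects, run identically on
    every GPU, leaves the nearest depth per pixel (the global depth map)."""
    gdm = [FAR] * num_pixels
    for px, depth, _color, _owner in fragments:
        if depth < gdm[px]:
            gdm[px] = depth
    return gdm


def render_partial(fragments: List[Fragment], gdm: List[float],
                   gpu_id: int) -> List[Color]:
    """Step (b): Z-test every fragment against the GDM. Visible fragments
    of assigned objects are written in color, visible fragments of
    non-assigned objects in black; occluded fragments are discarded."""
    image = [BLACK] * len(gdm)
    for px, depth, color, owner in fragments:
        if depth <= gdm[px]:
            image[px] = color if owner == gpu_id else BLACK
    return image


A, B, C = (255, 0, 0), (0, 255, 0), (0, 0, 255)
scene = [(1, 1.0, A, 1), (2, 1.0, A, 1),     # object A, assigned to GPU 1
         (1, 5.0, B, 2), (2, 5.0, B, 2),     # object B, hidden behind A
         (3, 3.0, C, 2)]                     # object C, assigned to GPU 2
gdm = build_gdm(scene, 4)
partial1 = render_partial(scene, gdm, gpu_id=1)
partial2 = render_partial(scene, gdm, gpu_id=2)
```

In this sketch every pixel is colored in at most one partial image, which is what allows step (c) to combine the images without consulting any depth values.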
  • FIG. 2C is a schematic representation illustrating the three primary stages of the generalized method of the present invention carried out on a dual-GPU embodiment of the parallel graphics processing system of the present invention, operating in an object division (OD) mode of operation according to the present invention, wherein each GPPL includes (i) a GPU having a geometry subsystem, a rasterizer, and a pixel subsystem with a pixel shader and raster operators including a Z test operator, and (ii) video memory supporting a Z (depth) Buffer and a Color Buffer, and wherein (a) the first stage involves providing a Global Depth Map (GDM) to the Z buffer of each GPPL, by transmitting graphics commands and data to all GPPLs, (b) the second stage involves generating complementary-type partial images within the color buffer of each GPPL using the GDM and the Z Test Filter, and transmitting graphics commands and data to only assigned GPPLs, and (c) the third stage involves recompositing a complete image frame within the primary
  • FIG. 2 D 1 is a schematic representation of the complementary-type partial image generation process of the present invention carried out within GPU 1 of the dual-GPU embodiment of the parallel graphics rendering system of FIG. 2C , wherein a Global Depth Map (GDM) is generated within the Z Buffer for all objects within the 3D scene (showing three different depth values namely the background having the highest depth ( 2415 ), wherein object A is closest to the viewer, has the lowest depth value ( 2416 ), its pixels have passed the Z-test and their depth values are written to the Z Buffer of GPU 1 , wherein object C ( 2414 ) has a middle depth value, its pixels have passed the z-test and their depth values are written to the Z buffer of GPU 1 , wherein object B has the deepest depth values, its pixels have all failed the z-test and their depth values have been replaced by the depth values of its occluding object A ( 2416 ) written in the Z Buffer in GPU 1 , and wherein a color-based complementary-type partial image is generated within the
  • FIG. 2 D 2 is a schematic representation of the complementary-type partial image generation process of the present invention carried out within GPU 2 of the dual-GPU embodiment of the parallel graphics rendering system of FIG. 2C , wherein a Global Depth Map (GDM) is generated within the Z Buffer for all objects within the 3D scene (showing three different depth values namely, the background having the highest depth ( 2415 ), wherein the Z Buffer holds the Global Depth Map ( 242 ) identical to those depth values in the Z Buffer of GPU 1 ( 2411 ), and wherein a color-based complementary-type partial image is generated within the Color Buffer of GPU 2 by recompositing (i) the pixels of non-assigned objects A rendered/drawn without color (i.e.
  • FIG. 2 D 3 is a schematic representation illustrating the depthless method of image recomposition according to the principles of the present invention, carried out within the dual-GPU embodiment of the parallel graphics rendering system shown in FIG. 2C , wherein partial complementary images generated and buffered within GPPL 1 and GPPL 2 are recomposited (i.e. combined) by merging, in puzzle-like manner, to form a full color image frame of the 3D scene, without using any depth value information stored in the Z buffers of these GPPLs;
  • FIG. 2 E 1 is a schematic representation illustrating the depthless method of image recomposition according to the principles of the present invention, carried out within an eight-GPU embodiment of the parallel graphics rendering system, wherein (1) during the first level of hierarchical image merging involves four sub-stages of image merging, namely, (i) the partial complementary image generated and buffered within GPPL 1 is merged with the partial complementary image generated and buffered within GPPL 2 without using any depth value information stored in the Z buffers of these GPPLs, (ii) the partial complementary image generated and buffered within GPPL 3 is merged with the partial complementary image generated and buffered within GPPL 4 without using any depth value information stored in the Z buffers of these GPPLs, (iii) the partial complementary image generated and buffered within GPPL 5 is merged with the partial complementary image generated and buffered within GPPL 6 without using any depth value information stored in the Z buffers of these GPPLs, and (iv) partial complementary image generated and buffered within
  • FIG. 2E2 is a flow chart illustrating the primary steps of the depthless method of recompositing image frames of a 3D scene from partial complementary images, carried out over n hierarchical levels or stages using depthless complementary image merging operations, wherein at each (n−1)th level, pairs of source and target partial complementary images are merged into a target complementary image, for use at the nth level of processing, according to the principles of the present invention;
  • FIG. 2 E 3 is a flow chart illustrating the complementary image merging process carried out between a pair of partial complementary images buffered in the color buffers of a pair of GPPLs, wherein the addition of all pixels of source image and target images occurs within the target GPPL using its pixel shader processor running the shader merge code, and wherein the image merge result may become the source image for the next hierarchical step in the multi-level complementary image merging process of the present invention;
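The hierarchical, depthless merge described for FIGS. 2E2 and 2E3 can be sketched as below. This is an assumption-laden illustration in plain Python rather than shader code: `merge_pair` stands in for the shader merge code running on the target GPPL, and `merge_hierarchically` stands in for the multi-level pairing; both names are hypothetical.

```python
from typing import List, Tuple

Color = Tuple[int, int, int]
Image = List[Color]


def merge_pair(source: Image, target: Image) -> Image:
    """Add source and target pixels channel by channel. With complementary
    images at most one of the two pixels is non-black, so the sum simply
    selects it; the clamp to 255 is only a safety net."""
    return [tuple(min(255, s + t) for s, t in zip(sp, tp))
            for sp, tp in zip(source, target)]


def merge_hierarchically(images: List[Image]) -> Image:
    """Merge images pairwise, level by level, until one image remains.
    Each merge result becomes an input of the next level."""
    if not images:
        raise ValueError("at least one partial image is required")
    level = list(images)
    while len(level) > 1:
        merged = [merge_pair(level[i], level[i + 1])
                  for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:              # an unpaired image moves up unchanged
            merged.append(level[-1])
        level = merged
    return level[0]


K = (0, 0, 0)
partials = [[(255, 0, 0), K, K, K],
            [K, (0, 255, 0), K, K],
            [K, K, (0, 0, 255), K],
            [K, K, K, (9, 9, 9)]]
full_frame = merge_hierarchically(partials)
```

With eight partial images this loop runs three levels (four merges, then two, then one), and no depth value is read at any level.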
  • FIG. 3A1 is a high-level flow chart illustrating a first illustrative embodiment of the method of parallel graphics processing according to the present invention, comprising the steps of (a) during the first special rendering pass (i.e. GDM Creation Pass), generating a global depth map (GDM) within each GPPL, by broadcasting graphics commands and data to all GPPLs equally for pixel depth (z) testing, (b) during subsequent passes, generating complementary-type partial images in each GPPL using the GDM and the object-division based parallel rendering process according to the present invention, and (c) after the final pass, recompositing a complete image frame of the 3D scene using the depthless complementary image recomposition process of the present invention illustrated in FIGS. 2D3, 2E1, 2E2 and 2E3;
  • FIG. 3A2 is a schematic representation illustrating the three primary stages of the first illustrative embodiment of the method of the present invention, carried out on a dual-GPU embodiment of the parallel graphics processing system of the present invention, wherein (a) the first stage involves, during the special rendering pass (i.e. GDM creating pass), providing a Global Depth Map (GDM) to the Z buffer of each GPPL, involving the transmission of graphics commands and data to all GPPLs for all objects in the frame of the 3D scene to be rendered, (b) the second stage involves, for subsequent passes, generating complementary-type partial images within the color buffer of each GPPL using the GDM and the Z Test Filter, and transmitting graphics commands and data to only assigned GPPLs, and (c) the third stage involves recompositing a complete image frame within the primary GPPL, from the complementary-type partial images stored in the color buffers of GPPL 1 and GPPL 2, using the depthless recomposition process of the present invention;
  • FIG. 3 A 3 is a graphical representation of a Hash Table ( 3112 ) in which each entry holds the state of a primitive, which is not assigned to any GPU, for tracking the appearance of object primitives during the first phase of the method of FIG. 3 A 4 , and a Current State Buffer ( 4111 ) for storing a draw command;
  • FIG. 3 A 4 is a flowchart illustrating the steps performed during the first illustrative embodiment of the method of parallel graphics processing according to the present invention depicted in FIG. 3 A 1 , with the pixels of objects assigned to a GPPL being normally rendered in color within the GPPL;
  • FIG. 3 B 1 is a high-level flow chart illustrating a second illustrative embodiment of the method of parallel graphics processing according to the present invention, comprising the steps of (a) during a first special rendering pass (i.e. GDM Creation Pass), (i) generating a global depth map (GDM) within each GPPL, by broadcasting graphics commands and data for all objects to all GPPLs, (ii) rendering without color (i.e.
  • FIG. 3 B 2 is a schematic representation illustrating the three primary stages of the second illustrative embodiment of the method of the present invention, carried out on a dual-GPU embodiment of the parallel graphics processing system of the present invention, wherein (a) the first stage involves, during a first special pass (i.e. GDM creating pass), (i) generating a global depth map (GDM) within each GPPL, by broadcasting graphics commands and data for all objects to all GPPLs, (ii) rendering without color (i.e.
  • the second stage involves generating, for subsequent passes, complementary-type partial images within the color buffer of each GPPL using the GDM and the Z Test Filter, and transmitting graphics commands and data to only assigned GPPLs
  • the third stage involves, after the final pass, recompositing a complete image frame within the primary GPPL, from the complementary-type partial images stored in the color buffers of GPPL 1 and GPPL 2 , using the depthless recomposition process of the present invention
  • FIG. 3 B 3 is a graphical representation of a Hash Table ( 3112 ) in which each entry holds the state of a primitive, which is not assigned to any GPU, for tracking the appearance of object primitives during the first stage of the methods of FIGS. 3 B 4 A and 3 B 4 B, and a Current State Buffer ( 4111 ) for storing a draw command;
  • FIGS. 3 B 4 A and 3 B 4 B are flowcharts illustrating the steps performed during the second illustrative embodiment of the method of parallel graphics processing according to the present invention depicted in FIG. 3 B 1 , wherein the pixels of objects assigned to a GPPL are normally rendered in color within the GPPL while pixels of objects not assigned to a GPPL are rendered colorlessly (i.e. in black);
  • FIG. 4A is a third illustrative embodiment of the method of parallel graphics processing according to the present invention, comprising the steps of (a) during each pass of the multi-pass method, (i) generating global depth map (GDM) values for each detoured object transmitted to each GPPL, (ii) rendering without color (i.e. in black) the pixels of objects sent to non-assigned GPPLs, and (iii) rendering in color the pixels of all objects sent to assigned-GPPLs, thereby generating complementary-type partial images in each GPPL, and (b) after the final pass, recompositing a complete image frame of the 3D scene using the depthless complementary image recomposition process of the present invention illustrated in FIGS. 3 D 3 , 2 E 1 , 2 E 2 and 3 E 3 ;
  • FIG. 4B is a schematic representation illustrating the two primary stages of the third illustrative embodiment of the method of the present invention, carried out on a dual-GPU embodiment of the parallel graphics processing system of the present invention, wherein (a) the first stage involves (i) during each pass of the multi-pass method, generating global depth map (GDM) values for each detoured object transmitted to each GPPL, (ii) rendering without color (i.e.
  • the second stage involves, after the final pass, recompositing a complete image frame within the primary GPPL, from the complementary-type partial images stored in the color buffers of GPPL 1 and GPPL 2 , using the depthless recomposition process of the present invention;
  • FIG. 4C is a graphical representation of a Hash Table ( 5112 ) in which each entry holds the state of a primitive, which is not assigned to any GPU, for tracking the appearance of object primitives during the first stage of the methods of FIGS. 4 D 1 and 4 D 2 , and a Current State Buffer ( 5111 ) for storing a draw command;
  • FIGS. 4 D 1 and 4 D 2 are flowcharts illustrating the steps performed during the third illustrative embodiment of the method of parallel graphics processing according to the present invention depicted in FIG. 4A , wherein the pixels of objects assigned to a GPPL are normally rendered in color within the GPPL while pixels of objects not assigned to a GPPL are rendered colorlessly (i.e. in black);
  • FIG. 5A is a fourth illustrative embodiment of the method of parallel graphics processing according to the present invention, comprising the steps of (a) during a first special Ambient Light Pass of the multi-pass method, generating a global depth map (GDM) within each GPPL by broadcasting all objects to all GPPLs for depth map creation in the Z buffers and colorless image creation within the color buffers, (b) during subsequent passes, generating complementary-type partial images in each GPPL using the GDM and the object-division based parallel rendering process according to the present invention (i.e. rendering without color (i.e.
  • FIG. 5B is a schematic representation illustrating the three primary stages of the fourth illustrative embodiment of the method of the present invention, carried out on a dual-GPU embodiment of the parallel graphics processing system of the present invention, wherein (a) the first stage involves, during a first special Ambient Light Pass of the multi-pass method, generating a global depth map (GDM) within each GPPL by broadcasting all objects to all GPPLs for depth map creation in the Z buffers and colorless image creation within the color buffers, (b) the second stage involves, during subsequent passes, generating complementary-type partial images in each GPPL using the GDM and the object-division based parallel rendering process according to the present invention (i.e. rendering without color (i.e.
  • the third stage involves, after the final pass, recompositing a complete image frame within the primary GPPL, from the complementary-type partial images stored in the color buffers of GPPL 1 and GPPL 2 , using the depthless recomposition process of the present invention;
  • FIG. 5C is a graphical representation of a Hash Table in which each entry holds the state of a primitive, which is not assigned to any GPU, for tracking the appearance of object primitives during the first stage of the methods of FIGS. 5 D 1 and 5 D 2 , and a Current State Buffer ( 5111 ) for storing a draw command;
  • FIGS. 5 D 1 and 5 D 2 are flowcharts illustrating the steps performed during the fourth illustrative embodiment of the method of parallel graphics processing according to the present invention depicted in FIG. 5A , wherein the pixels of objects assigned to a GPPL are normally rendered in color within the GPPL while pixels of objects not assigned to a GPPL are rendered colorlessly (i.e. in black);
  • FIG. 6A is a schematic representation of a PC-based host computing system of the present invention (a) embodying an illustrative embodiment of the parallel 3D graphics processing system (PGPS) of the present invention illustrated throughout FIGS. 2A through 5D , and (b) comprising (i) a parallel mode control module (PMCM), (ii) a parallel graphics processing subsystem for supporting the parallelization stages of decomposition, distribution and re-composition implemented using a decomposition module, a distribution module and a re-composition module, respectively, and (iii) a plurality of either GPU and/or CPU based graphics processing pipelines (GPPLs) operated in a parallel manner under the control of the PMCM;
  • FIG. 6 B 1 is a schematic representation of the subcomponents of a first illustrative embodiment of a GPU-based graphics processing pipeline (GPPL) that can be employed in the PGPS of the present invention depicted in FIG. 6A , shown comprising (i) a video memory structure supporting a frame buffer (FB) including stencil, depth and color buffers, and (ii) a graphics processing unit (GPU) supporting (1) a geometry subsystem having an input assembler and a vertex shader, (2) a set up engine, and (3) a pixel subsystem including a pixel shader receiving pixel data from the frame buffer and raster operators operating on pixel data in the frame buffers;
  • FIG. 6 B 2 is a schematic representation of the subcomponents of a second illustrative embodiment of a GPU-based graphics processing pipeline (GPPL) that can be employed in the PGPS of the present invention depicted in FIG. 6A , shown comprising (i) a video memory structure supporting a frame buffer (FB) including stencil, depth and color buffers, and (ii) a graphics processing unit (GPU) supporting (1) a geometry subsystem having an input assembler, a vertex shader and a geometry shader, (2) a rasterizer, and (3) a pixel subsystem including a pixel shader receiving pixel data from the frame buffer and raster operators operating on pixel data in the frame buffers;
  • FIG. 6 B 3 is a schematic representation of the subcomponents of an illustrative embodiment of a CPU-based graphics processing pipeline that can be employed in the PGPS of the present invention depicted in FIG. 6A , and shown comprising (i) a video memory structure supporting a frame buffer including stencil, depth and color buffers, and (ii) a graphics processing pipeline realized by one cell of a multi-core CPU chip, consisting of 16 in-order SIMD processors, and further including a GPU-specific extension, namely, a texture sampler that loads texture maps from memory, filters them for level-of-detail, and feeds them to the pixel processing portion of the pipeline;
  • FIG. 6C is a schematic representation illustrating the pipelined structure of the parallel graphics processing system (PGPS) of the present invention shown driving a plurality of GPPLs, wherein the decomposition module supports the scanning of commands, the control of commands, the tracking of objects, the balancing of loads, and the assignment of objects to GPPLs, wherein the distribution module supports transmission of graphics data (e.g. FB data, commands, textures, geometric data and other data) in various modes including CPU-to/from-GPU, inter-GPPL, broadcast, hub-to/from-CPU, and hub-to/from-GPPL, and wherein the re-composition module supports the merging of partial image fragments in the Color Buffers of the GPPLs in a variety of ways, in accordance with the principles of the present invention (e.g. merge color frame buffers without z buffers, merge color buffers using stencil assisted processing, and other modes of partial image merging);
  • FIGS. 7 A 1 A and 7 A 1 B are flowcharts illustrating in which modules of the parallel graphics processing system of FIG. 6A , the primary steps of the method of FIG. 3 A 4 are implemented;
  • FIG. 7 A 2 is a schematic representation illustrating the three primary stages of the first illustrative embodiment of the method of the present invention, carried out on a dual-GPU embodiment of the parallel graphics processing system of the present invention, wherein the Decomposition and Distribution Modules are shown implemented within the host memory space (HMS), whereas the Rendering and Recomposition Modules are implemented by the GPUs;
  • FIGS. 7 B 1 A and 7 B 1 B are flowcharts illustrating in which modules of the parallel graphics processing system of FIG. 6A , the primary steps of the methods of FIGS. 3 B 4 A and 3 B 4 B are implemented;
  • FIG. 7 B 2 is a schematic representation illustrating the three primary stages of the second illustrative embodiment of the method of the present invention, carried out on a dual-GPU embodiment of the parallel graphics processing system of the present invention, wherein the Decomposition and Distribution Modules are shown implemented within the host memory space (HMS), whereas the Rendering and Recomposition Modules are implemented by the GPUs;
  • FIGS. 7 C 1 A and 7 C 1 B are flowcharts illustrating in which modules of the parallel graphics processing system of FIG. 6A , the primary steps of the methods of FIGS. 4 D 1 and 4 D 2 are implemented;
  • FIG. 7 C 2 is a schematic representation illustrating the two primary stages of the third illustrative embodiment of the method of the present invention, carried out on a dual-GPU embodiment of the parallel graphics processing system of the present invention, wherein the Decomposition and Distribution Modules are shown implemented within the host memory space (HMS), whereas the Rendering and Recomposition Modules are implemented by the GPUs;
  • FIGS. 7 D 1 A and 7 D 1 B are flowcharts illustrating in which modules of the parallel graphics processing system of FIG. 6A , the primary steps of the methods of FIGS. 5 D 1 and 5 D 2 are implemented;
  • FIG. 7 D 2 is a schematic representation illustrating the three primary stages of the fourth illustrative embodiment of the method of the present invention, carried out on a dual-GPU embodiment of the parallel graphics processing system of the present invention, wherein the Decomposition and Distribution Modules are shown implemented within the host memory space (HMS), whereas the Rendering and Recomposition Modules are implemented by the GPUs;
  • FIG. 8A is a schematic representation of a first illustrative embodiment of the PGPS of the present invention embodied in a PC-level computing system, showing (i) that the Parallel Mode Control Module (PMCM) and the Decomposition and Distribution Modules of the Parallel Graphics Rendering Subsystem reside as a software package in the Host or CPU Memory Space (HMS) while multiple GPUs on external GPU cards are connected to a North bridge circuit, implement the Rendering and Recomposition Modules, and are driven in a parallelized manner under the control of the PMCM, (ii) the Decomposition Module divides (i.e.
  • the Distribution Module uses the North bridge circuit to distribute graphic commands and data (GCAD) to the external GPUs
  • the Rendering Module generates complementary-type partial color images according to a multi-pass parallel graphics processing method of the present invention
  • the Recomposition Module uses inter-GPU communication transport to transfer the pixel data of the complementary-type partial images among the GPUs during the image recomposition stages
  • the complementary-type partial color images are recomposited using the depthless image merging process of the present invention so as to generate a complete image frame of the 3D scene for display on the display device, connected to an external graphics card via a PCI-express interface;
  • FIG. 8B is a schematic representation of a second illustrative embodiment of the PGPS of the present invention embodied in a PC-level computing system, showing (i) that the Parallel Mode Control Module (PMCM) and the Decomposition and Distribution Modules of the Parallel Graphics Rendering Subsystem reside as a software package in the Host or CPU Memory Space (HMS) while the Rendering and Recomposition Modules are realized across multiple GPUs connected to a bridge circuit (having an internal IGD) as well as on external graphic cards connected to the North memory bridge chip and driven in a parallelized manner under the control of the PMCM, (ii) the Decomposition Module divides (i.e.
  • the Distribution Module uses the bridge chip to distribute the graphic commands and data (GCAD) to the multiple GPUs located on the external graphics cards
  • the Rendering Module generates complementary-type partial color images according to a multi-pass parallel graphics processing method of the present invention
  • the Recomposition Module uses inter-GPU communication transport to transfer the pixel data of the complementary-type partial images among the GPUs during the image recomposition stages
  • the complementary-type partial color images are recomposited using the depthless image merging process of the present invention so as to generate a complete image frame of the 3D scene for display on the display device, connected to one of the external graphics cards or the IGD;
  • FIG. 8C is a schematic representation of a third illustrative embodiment of the PGPS of the present invention embodied in a PC-level computing system, showing (i) that the Parallel Mode Control Module (PMCM) 400 and the Decomposition and Distribution Modules of the Parallel Graphics Rendering Subsystem reside as a software package in the Host Memory Space (HMS) while a single GPU is supported on a CPU/GPU fusion-architecture processor die (alongside the CPU), one or more GPUs are supported on an external graphic card connected to a bridge circuit and driven in a parallelized manner under the control of the PMCM, and the Rendering and Recomposition Modules are realized across the GPUs on the graphics card, (ii) the Decomposition Module divides (i.e.
  • the Distribution Module uses the memory controller (controlling the HMS) and the interconnect network (e.g. crossbar switch) within the CPU/GPU processor chip to distribute graphic commands and data to the multiple GPUs on the CPU/GPU die chip and on the external graphics cards,
  • the Rendering Module generates complementary-type partial color images according to a multi-pass parallel graphics processing method of the present invention
  • the Recomposition Module uses inter-GPU communication transport on the graphics card, as well as memory controller and interconnect (e.g. crossbar switch) within the CPU/GPU processor chip, to transfer the pixel data of the complementary-type partial images among the GPUs during the image recomposition stages, and finally (vi) the complementary-type partial color images are recomposited using the depthless image merging process of the present invention so as to generate a complete image frame of the 3D scene for display on the display device, connected to the external graphics card via a PCI-express interface connected to the bridge circuit;
  • FIG. 8 D 1 is a schematic representation of a fourth illustrative embodiment of the PGPS of the present invention embodied in a PC-level computing system, showing (i) that the Parallelization Mode Control Module (PMCM) and the Decomposition and Distribution Modules of the Parallel Graphics Rendering Subsystem reside as a software package in the Host Memory Space (HMS) while a first cluster of CPU cores on a multi-core CPU chip function as a CPU and a second cluster of CPU cores are used to implement a plurality of multi-core graphics pipelines (GPPLs) (i.e.
  • the Decomposition Module divides (i.e. splits up) the stream of graphic commands and data (GCAD) according to the required parallelization mode
  • the Distribution Module uses the bridge circuit and interconnect network within the multi-core CPU chip to distribute graphic commands and data (GCAD) to the multi-core graphic pipelines implemented on the multi-core CPU chip
  • the Rendering Module generates complementary-type partial color images according to a multi-pass parallel graphics processing method of the present invention
  • the Recomposition Module uses inter-GPU communication transport as well as the bridge and interconnect network within the multi-core CPU chip to transfer the pixel data of the complementary-type partial images among the GPPLs during the image recomposition stages, and finally (vi) the complementary-type partial color images are recomposited using the depthless image merging process of the present invention so as to generate
  • FIG. 8 D 2 is a schematic representation of a fifth illustrative embodiment of the PGPS of the present invention embodied in a PC-level computing system, showing (i) that the Parallelization Mode Control Module (PMCM) and the Decomposition and Distribution Modules of the Parallel Graphics Rendering Subsystem reside as a software package in the Host or CPU Memory Space (HMS) while a first cluster of CPU cores on the multi-core CPU chips on external graphics cards function as GPPLs and implement the Re-composition Module across a plurality of the GPPLs whereas a second cluster of CPU cores function as GPPLs and implement the Rendering Module, (ii) the Decomposition Module divides (i.e.
  • the Distribution Module uses the North bridge circuit and interconnect networks within the multi-core CPU chips (on the external cards) to distribute graphic commands and data (GCAD) to the multi-core graphic pipelines implemented thereon,
  • the Rendering Module generates complementary-type partial color images according to a multi-pass parallel graphics processing method of the present invention
  • the Recomposition Module uses interconnect networks within the multi-core CPU chips to transfer the pixel data of the complementary-type partial images among the GPPLs during the image recomposition stages, and finally (vi) the complementary-type partial color images are recomposited using the depthless image merging process of the present invention so as to generate a complete image frame of the 3D scene for display on the display device connected to the primary GPPL, via a display interface;
  • FIG. 8E is a schematic representation of a sixth illustrative embodiment of the PGPS of the present invention embodied in a PC-level computing system, showing (i) that the Parallel Mode Control Module (PMCM) and the Decomposition Submodule No. 1 reside as a software package in the Host or CPU Memory Space (HMS) while the Decomposition Submodule No. 2 and Distribution Module are realized within a single graphics hub device (e.g.
  • the Decomposition Submodule No. 1 transfers graphic commands and data (GCAD) to the Decomposition Submodule No. 2 via the bridge circuit
  • the Decomposition Submodule No. 2 divides (i.e.
  • the Distribution Module distributes graphic commands and data (GCAD) to the external GPUs
  • the Rendering Module generates complementary-type partial color images according to a multi-pass parallel graphics processing method of the present invention
  • the Recomposition Module uses inter-GPU communication transport to transfer the pixel data of the complementary-type partial images among the GPUs during the image recomposition stages
  • the complementary-type partial color images are recomposited using the depthless image merging process of the present invention so as to generate a complete image frame of the 3D scene for display on the display device connected to the primary GPU on the graphical display card;
  • FIG. 8F is a schematic representation of a seventh illustrative embodiment of the PGPS of the present invention embodied in a PC-level computing system, showing (i) that the Parallel Mode Control Module (PMCM), including the Distribution Management Submodule, and the Decomposition Module reside as a software package in the Host Memory Space (HMS) of the host computing system, while the Distribution Module and interconnect transport are realized within a single graphics hub device (e.g. chip) that is connected to the bridge circuit of the host computing system and a cluster of external GPUs implementing the Rendering and Recomposition Modules, and that all of the GPUs are driven in a parallelized manner under the control of the PMCM, (ii) the Decomposition Module divides (i.e.
  • the Distribution Management Module within the PMCM distributes the graphic commands and data (GCAD) to the external GPUs via the bridge circuit and interconnect transport mechanism,
  • the Rendering Module generates complementary-type partial color images according to a multi-pass parallel graphics processing method of the present invention
  • the Recomposition Module uses inter-GPU communication transport to transfer the pixel data of the complementary-type partial images among the GPUs during the image recomposition stages, and finally (vi) the complementary-type partial color images are recomposited using the depthless image merging process of the present invention so as to generate a complete image frame of the 3D scene for display on the display device connected to the primary GPU on the graphical display card(s);
  • FIG. 8G is a schematic representation of an eighth illustrative embodiment of the PGPS of the present invention embodied in a PC-level computing system, showing (i) that the Parallel Mode Control Module (PMCM) and the Decomposition Submodule No. 1 reside as a software package in the Host Memory Space (HMS) while the Decomposition Submodule No. 2 and the Distribution Module are realized (as a graphics hub) within a bridge circuit on the motherboard within the host computing system, with the Rendering Module and the Recomposition Module being implemented by a plurality of GPUs driven in a parallelized manner under the control of the PMCM, (ii) the Decomposition Submodule No. 1 transfers graphics commands and data (GCAD) to the Decomposition Submodule No.
  • the Decomposition Submodule No. 2 divides (i.e. splits up) the stream of graphic commands and data (GCAD) according to the parallelization mode, (iv) the Distribution Module distributes the graphic commands and data (GCAD) to the internal GPU and external GPUs, (v) the Rendering Module generates complementary-type partial color images according to a multi-pass parallel graphics processing method of the present invention, (vi) the Recomposition Module uses inter-GPU communication transport to transfer the pixel data of the complementary-type partial images among the GPUs during the image recomposition stages, and finally (vii) the complementary-type partial color images are recomposited using the depthless image merging process of the present invention so as to generate a complete image frame of the 3D scene for display on the display device connected to the external graphics card connected to the hybrid CPU/GPU chip via a PCI-express interface;
  • FIG. 8H is a schematic representation of a ninth illustrative embodiment of the PGPS of the present invention embodied in a PC-level computing system, showing (i) that the Parallel Mode Control Module (PMCM) and the Decomposition Submodule No. 1 reside as a software package in the Host Memory Space (HMS) while the Decomposition Submodule No.
  • the Distribution Module is realized (as a graphics hub) on the die of a hybrid CPU/GPU fusion-architecture chip within the host computing system and having a single GPU driven with one or more GPUs on an external graphics card (connected to the CPU/GPU chip) in a parallelized manner under the control of the PMCM, and GPUs on the external graphics card are used to implement the Recomposition Module, (ii) the Decomposition Submodule No. 1 transfers graphics commands and data (GCAD) to the Decomposition Submodule No. 2, (iii) the Decomposition Submodule No. 2 divides (i.e.
  • the Distribution Module distributes the graphic commands and data (GCAD) to the internal GPU and external GPUs
  • the Rendering Module generates complementary-type partial color images according to a multi-pass parallel graphics processing method of the present invention
  • the Recomposition Module uses inter-GPU communication transport to transfer the pixel data of the complementary-type partial images among the GPUs during the image recomposition stages
  • the complementary-type partial color images are recomposited using the depthless image merging process of the present invention so as to generate a complete image frame of the 3D scene for display on the display device connected to the external graphics card connected to the hybrid CPU/GPU chip via a PCI-express interface;
  • FIG. 8I is a schematic representation of a tenth illustrative embodiment of the PGPS of the present invention embodied in a game console system, showing (i) that the Parallel Mode Control Module (PMCM) and the Decomposition Submodule No. 1 are realized as a software package within the Host Memory Space (HMS), while the Decomposition Submodule No. 2 and the Distribution Module are realized as a graphics hub semiconductor chip within the game console system, and the Rendering and Recomposition Modules are implemented by multiple GPPLs supported on the game console board and driven in a parallelized manner under the control of the PMCM, (ii) the Decomposition Submodule No. 1 transfers graphics commands and data (GCAD) to the Decomposition Submodule No.
  • the Decomposition Submodule No. 2 divides (i.e. splits up) the stream of graphic commands and data (GCAD) according to the parallelization mode, (iv) the Distribution Module distributes the graphic commands and data (GCAD) to the multiple GPUs, (v) the Rendering Module generates complementary-type partial color images according to a multi-pass parallel graphics processing method of the present invention, (vi) the Recomposition Module uses inter-GPU communication transport to transfer the pixel data of the complementary-type partial images among the GPUs during the image recomposition stages, and finally (vii) the complementary-type partial color images are recomposited using the depthless image merging process of the present invention so as to generate a complete image frame of the 3D scene for display on the display device connected to the primary GPU via an analog display interface.
  • Referring to FIGS. 3A through 8I in the accompanying Drawings, the various illustrative embodiments of the 3D parallel graphics rendering system (PGPS) and 3D parallel graphics rendering process (PGRP) of the present invention will now be described in great technical detail, wherein like elements will be indicated using like reference numerals.
  • one aspect of the present invention teaches a new way of and means for recompositing images of 3D scenes, represented within a 3D parallel graphics rendering subsystem (PGPS) supporting object division parallelism among its multiple graphics processing pipelines (GPPLs), but without performing pixel z-depth comparisons, which are otherwise required by prior art systems and processes.
  • the performance of 3D graphics rendering processes and subsystems can be significantly improved using the principles of the present invention.
  • the image recomposition method of the present invention can be practiced within any parallel graphics processing system (PGPS) having multiple GPPLs driven in an object division mode of parallelism, or hybrid mode of parallel operation employing a combination of object and image/screen division techniques and/or principles.
  • the method of the present invention is embodied within a system employing an object-division mode of parallelism.
  • the image recomposition method and apparatus of the present invention can be practiced within conventional computing platforms (e.g. PCs, laptops, servers, etc.) as well as silicon level graphics systems (e.g. graphics system on chip (SOC) implementations, integrated graphics device IGD implementations, and hybrid CPU/GPU die implementations).
  • the generalized embodiment of the method of parallel graphics processing according to the present invention comprises several steps, namely: (a) at the first step 221 , providing a Global Depth Map (GDM) to each GPPL, for each image frame in the 3D scene 231 to be generated, for use in rendering partial images of the 3D scene along a specified viewing direction; (b) at the second step 222 , generating complementary-type partial images in each GPPL, using the GDM and the object division based parallel rendering process according to the present invention; and (c) at the third step 223 , recompositing a complete image frame of the 3D scene using the depthless image recomposition (DIR) process of the present invention illustrated in FIGS. 3 B 1 and 3 B 2 (i.e. without the use of depth comparison).
  • the method of parallel graphics processing can involve, and will typically employ, multiple parallel graphics rendering passes, so that most, but not necessarily all, illustrative embodiments of the method of the present invention will be “multi-pass” in character and nature.
  • steps (a) and (b) can be integrated within a single pass of a multi-pass method of parallel graphics processing, while steps (b) and (c) can be carried out in subsequent passes of the multi-pass parallel graphics rendering process of the present invention. These different embodiments are described in FIGS. 3 A 1 through 5 D.
  • depth values of all objects of the scene must be imported to each GPU, and stored in its Z buffer. This is done as follows: while all objects originally designated to the GPU are drawn normally, the other GPU's objects are brought to the GPU for their depth values only, ignoring their color and texture. Thus, while the Z-buffer is being updated with each imported object's depth values, only a black silhouette of the imported object is drawn in the color buffer, as indicated in FIG. 2 D 1 .
  • This method of “black rendering” is inexpensive because it avoids altogether the heavy processing associated with shading, texturing, and other pixel processing required during normal drawing operations with color values.
  • each object that is designated/assigned to a particular GPU must also be imported to other GPUs for “black rendering” purposes, i.e. so as to update the Global Depth Map GDM being stored in the Z buffers of other non-designated GPUs.
  • Those pixels of a “black rendered” object that have passed the Z test are drawn in the color buffers as a black silhouette of the object.
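  • By way of illustration only, the “black rendering” operation described above can be sketched in a few lines of Python operating on a single scanline. The function name `render_on_gpu`, the scanline representation, and the object extents, depths and intensities are assumptions introduced for this sketch; they do not appear in the specification. Each object is Z-tested on every GPU, but its color is written only on the GPU to which it is assigned, so that both GPUs end with the same depth values and with complementary color buffers.

```python
FAR = float("inf")  # depth of an empty Z-buffer entry (assumed convention)

def render_on_gpu(objects, assigned, width):
    """Render one scanline on a single GPU.

    objects:  name -> (first_x, last_x, depth, color), hypothetical layout
    assigned: names of the objects designated to this GPU
    """
    z_buffer = [FAR] * width
    color_buffer = [0.0] * width              # 0.0 stands for black
    for name, (first, last, depth, color) in objects.items():
        for x in range(first, last + 1):
            if depth < z_buffer[x]:           # pixel passes the Z test
                z_buffer[x] = depth
                # Assigned objects are drawn in color; imported objects
                # only leave a black silhouette.
                color_buffer[x] = color if name in assigned else 0.0
    return z_buffer, color_buffer

# Object A partly occludes object B; object C stands alone.
scene = {"A": (0, 2, 1.0, 0.9), "B": (2, 3, 3.0, 0.5), "C": (5, 5, 2.0, 0.7)}
z1, c1 = render_on_gpu(scene, {"A"}, 6)        # GPU 1 is assigned object A
z2, c2 = render_on_gpu(scene, {"B", "C"}, 6)   # GPU 2 is assigned B and C
merged = [a + b for a, b in zip(c1, c2)]       # no depth comparison needed
```

  In this sketch the two Z-buffers are identical, and at every position at most one of the two color buffers holds a non-black value, which is why a plain addition reconstructs the full scanline.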
  • In FIG. 2B , a 3D scene is shown modeled within a dual-GPU embodiment of the parallel graphics processing system of FIG. 2C , adapted to carry out a method of Depthless Image Recomposition (DIR) according to the present invention, based on an object division (OD) mode of parallel graphics processing operation.
  • three objects A, B and C are shown against a rectangular background frame.
  • cylindrical object B is occluded/obstructed by the cubic object A along the indicated view point within the coordinate reference system X-Y-Z.
  • the 3D scene is decomposed within the 3D dual-GPU based parallel graphics rendering system such that object A is assigned to GPU 1 while objects B and C are assigned to GPU 2 .
  • the partial complementary-type images of the 3D scene are rendered in the GPU 1 and GPU 2 and stored in their respective Color Buffers, and finally recomposited within GPU 1 without using the global depth map (GDM) maintained within the Z Buffers of the GPUs.
  • GPPL 1 includes (i) a GPU 1 having a geometry subsystem, a rasterizer, and a pixel subsystem with a pixel shader and raster operators including a Z test operator, and (ii) video memory supporting a Z (depth) Buffer and a Color Buffer.
  • GPPL 2 includes (i) a GPU 2 having a geometry subsystem, a rasterizer, and a pixel subsystem with a pixel shader and raster operators including a Z test operator, and (ii) video memory supporting a Z (depth) Buffer and a Color Buffer.
  • the first stage involves providing a Global Depth Map (GDM) to the Z buffer of each GPU, by transmitting graphics commands and data to all GPPLs.
  • the second stage involves generating complementary-type partial images within the color buffer of each GPU using the GDM and the Z Test Filter, and transmitting graphics commands and data to only assigned GPPLs.
  • the third stage involves recompositing a complete image frame within the primary GPU, from the complementary-type partial images stored in the color buffers, using the depthless recomposition process of the present invention.
  • the image recompositing stage ( 233 ) is performed after all the intra and inter-GPU Z-tests have been completed, making the final comparison of Z-buffers needless. Therefore, for the case of dual GPUs (i.e. GPU 1 and GPU 2 ), the recompositing process of the present invention involves only merging the color Frame Buffers of GPU 1 and GPU 2 , and no depth comparison operations are involved.
  • the depthless image recomposition process will be described below with reference to FIGS. 2 D 1 through 2 E 3 .
  • In FIGS. 2 D 1 through 2 D 3 , the Complementary-Type Partial Image Generation Process of the present invention is graphically illustrated in connection with the dual-GPU embodiment of the parallel graphics rendering system of the illustrative embodiment, supporting GPU 1 and GPU 2 .
  • FIG. 2 D 1 illustrates the complementary-type partial image generation process of the present invention carried out within GPU 1 of the dual-GPU embodiment of the parallel graphics rendering system of FIG. 2C .
  • the object A is closest to the viewer, has the lowest depth value ( 2416 ), and its pixels have passed the Z-test and their depth values are written to the Z Buffer of GPU 1 .
  • Object C ( 2414 ) has a middle depth value, its pixels have passed the z-test filter, and their depth values are written to the Z buffer of GPU 1 .
  • object B has the deepest depth values, its pixels have all failed the z-test and their depth values have been replaced by the depth values of its occluding object A ( 2416 ) written in the Z Buffer in GPU 1 .
  • a color-based complementary-type partial image is generated within the Color Buffer of GPU 1 by recompositing (i) the pixels of assigned object A rendered/drawn in color, (ii) the pixels of non-assigned object C drawn without color (i.e. black), and (iii) the pixels of non-assigned object B which are overwritten by the color pixels of the assigned occluding object A, which is closer to the viewer than object B.
  • FIG. 2 D 2 illustrates the complementary-type partial image generation process of the present invention carried out within GPU 2 of the dual-GPU embodiment of the parallel graphics rendering system of FIG. 2C .
  • the Z Buffer in GPU 2 holds a Global Depth Map ( 2422 ) which is identical to those depth values of the GDM held in the Z Buffer of GPU 1 ( 2411 ).
  • a color-based complementary-type partial image is generated within the Color Buffer of GPU 2 by recompositing (i) the pixels of non-assigned objects A rendered/drawn without color (i.e.
  • the depthless method of image recomposition is carried out within the dual-GPU embodiment of the parallel graphics rendering system shown in FIG. 2C , by simply combining, in puzzle-like manner, through merging, the partial complementary images generated and buffered within GPPL 1 and GPPL 2 so as to form, in the Color Frame Buffer of GPU 1 (i.e. the primary GPU), a full color image frame of the 3D scene, without using any depth value information stored in the Z buffers of these GPUs.
  • the depthless image recomposition process according to the present invention involves performing a hierarchical complementary recomposition process, as illustrated in FIGS. 2 E 1 , 2 E 2 and 2 E 3 , described below.
  • In FIG. 2 E 1 , the depthless method of image recomposition according to the principles of the present invention is shown carried out, in a hierarchical manner, within an eight-GPPL (e.g. 8-GPU) embodiment of the parallel graphics processing system.
  • the process is performed hierarchically in log₂ n merging steps, where n is the number of GPPLs employed in the parallel graphics processing platform.
  • the partial complementary color images in the Color Buffers of pairs of GPPLs (identified as source GPPL and target GPPL) are merged without the use of any depth value information. Therefore, there are no depth (Z) buffers involved in the depthless image recomposition process according to the principles of the present invention.
  • the source and target images, buffered in the source and target GPPLs are complementary-type images, in accordance with the principles of the present invention, i.e. at a given x,y position in an image, at most only one GPPL can hold a non zero pixel value (i.e. the visible pixel) which has survived the z-test against the GDM stored in the Z Buffers of all GPPLs.
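  • The complementary property stated above (at a given x,y position, at most one GPPL holds a non-zero pixel value) can be expressed as a simple check. The following Python sketch is illustrative only; the function name `is_complementary` and the representation of each partial image as a list of rows of scalar pixel values are assumptions made for this example.

```python
def is_complementary(images):
    """Return True if, at every x,y position, at most one of the
    partial images holds a non-zero (i.e. visible) pixel value."""
    height, width = len(images[0]), len(images[0][0])
    for y in range(height):
        for x in range(width):
            holders = sum(1 for img in images if img[y][x] != 0)
            if holders > 1:
                return False      # two GPPLs claim the same pixel
    return True

image_a = [[5, 0], [0, 0]]
image_b = [[0, 7], [0, 0]]
image_c = [[5, 0], [0, 1]]        # overlaps image_a at position (0, 0)
```

  When the check holds, adding the images pixel by pixel cannot mix the colors of two different objects, which is what makes the merge safe without any depth information.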
  • the following operations are performed: (i) the partial complementary image generated and buffered within GPPL 1 is merged with the partial complementary image generated and buffered within GPPL 2 without using any depth value information stored in the Z buffers of these GPPLs; (ii) the partial complementary image generated and buffered within GPPL 3 is merged with the partial complementary image generated and buffered within GPPL 4 without using any depth value information stored in the Z buffers of these GPPLs; (iii) the partial complementary image generated and buffered within GPPL 5 is merged with the partial complementary image generated and buffered within GPPL 6 without using any depth value information stored in the Z buffers of these GPPLs, and (iv) partial complementary image generated and buffered within GPPL 7 is merged with the partial complementary image generated and buffered within GPPL 8 without using any depth value information stored in the Z buffers of these GPPLs.
  • the following operations are performed: (i) the partial complementary image recomposited and buffered within GPPL 2 is merged with the partial complementary image generated and buffered within GPPL 4 without using any depth value information stored in the Z buffers of these GPPLs; and (ii) the partial complementary image recomposited and buffered within GPPL 6 is merged with the partial complementary image generated and buffered within GPPL 8 without using any depth value information stored in the Z buffers of these GPPLs.
  • the partial complementary image recomposited and buffered within GPPL 4 is merged with the partial complementary image generated and buffered within GPPL 8 (the primary GPPL) without using any depth value information stored in the Z buffers of these GPPLs, so as to generate a complete color image frame of the 3D scene within GPPL 8 , without using any depth value information stored in the Z buffers of these GPPLs.
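  • The pairing of source and target GPPLs at each hierarchical level, as enumerated above for eight GPPLs, can be generated programmatically. The Python sketch below is illustrative only: the function name `merge_schedule` is hypothetical, and the sketch assumes that the number of GPPLs is a power of two and that the highest-numbered GPPL acts as the final target, as in the eight-GPPL example.

```python
def merge_schedule(n):
    """Return the merge levels for n GPPLs (numbered 1..n).

    Each level is a list of (source, target) pairs; the image held by
    the source GPPL is merged into the image held by the target GPPL.
    """
    if n < 1 or n & (n - 1):
        raise ValueError("this sketch assumes n is a power of two")
    levels = []
    stride = 1
    while stride < n:
        # Targets are every (2 * stride)-th GPPL; each target receives
        # the image held by the GPPL located `stride` positions before it.
        level = [(target - stride, target)
                 for target in range(2 * stride, n + 1, 2 * stride)]
        levels.append(level)
        stride *= 2
    return levels

schedule = merge_schedule(8)
```

  For eight GPPLs this yields three levels, namely (1,2), (3,4), (5,6), (7,8), then (2,4), (6,8), and finally (4,8), matching the sequence of merges described above.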
  • In FIG. 2 E 2 , a generalized method of depthless recompositing of image frames of a 3D scene from partial complementary images is described using a parallel graphics processing platform having n GPPLs, wherein image merging occurs at log₂ n hierarchical levels or stages. At each level, pairs of source and target images are merged into a target image ( 25230 ). In general, the process can be carried out over n hierarchical levels of depthless complementary image merging operations, wherein at each (n−1)th level, pairs of source and target partial complementary images are merged into a target complementary image, for subsequent use at the nth level of processing, according to the principles of the present invention.
  • the first step of the method involves the system commencing partial complementary image merge processing at the first hierarchical level.
  • FIG. 2 E 3 illustrates the complementary image merging process carried out between a pair of partial complementary images buffered in the color buffers of a single pair of GPPLs.
  • the addition of all pixel values in the source image (tex 2 ) and the target image (tex 1 ) occurs within the target GPPL using its pixel shader processor running the shader merge code ( 25346 ).
  • the image merge result (tex 1 ) may become the source image (tex 2 ) for the next hierarchical step in the multi-level complementary image merging process of the present invention.
  • partial complementary-type color images are rendered in the target and source GPPLs, according to the principles of the present invention, and stored in the color Frame Buffer of the GPPLs.
  • the partial complementary-type color images are copied from the color Frame Buffer in the target and source GPPLs, into their respective texture memory, and indicated as “tex 1 ” and “tex 2 ” images, respectively.
  • the Shader's merge code (program) is downloaded and run using “tex 1 ” and “tex 2 ” images, and performs the operations indicated at Blocks 25347 through 25350 , which will be described below.
  • the program determines whether or not all of the x,y locations of the image have been recomposited, and if not, then the process returns to Block 25347 and repeats the pixel merging process for the next x,y image frame location. If all x,y locations in the image frame have been processed (i.e. merged), then the program moves the merged image tex 1 to the color buffer in the primary GPPL, and the process is completed for the particular image frame being generated for display.
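  • For illustration, the per-pixel loop of the merge program can be sketched in Python as follows. The individual operations of Blocks 25347 through 25350 are not reproduced here; the sketch only assumes, consistent with the addition of source and target pixel values described above, that each x,y location of tex 1 receives the sum of the tex 1 and tex 2 values. The function name `merge_pair` and the use of RGB tuples are assumptions for this example, and an actual embodiment would run on the pixel shader of the target GPPL rather than on the CPU.

```python
def merge_pair(tex1, tex2):
    """Merge source image tex2 into target image tex1, in place.

    Both images are lists of rows of (r, g, b) tuples and are assumed
    to be complementary, so at most one of the two pixels at any
    x,y location is non-black.
    """
    height, width = len(tex1), len(tex1[0])
    for y in range(height):
        for x in range(width):
            # Channel-wise addition; no depth value is consulted.
            tex1[y][x] = tuple(a + b for a, b in zip(tex1[y][x], tex2[y][x]))
    return tex1

target = [[(255, 0, 0), (0, 0, 0)]]   # tex 1, held by the target GPPL
source = [[(0, 0, 0), (0, 128, 0)]]   # tex 2, held by the source GPPL
result = merge_pair(target, source)
```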
  • the Special GDM-Creation Pass version wherein all Z values are distributed to all GPUs during a special single first pass
  • Global Depth Map Creation Pass performed at the beginning of each frame, so as to generate a GDM for the image frame, stored within the Z buffer of each GPPL; (ii) a second method “the Special GDM-Creation Pass, with color rendering of detoured objects in selected GPU,” which is a variation of the ‘GDM Creation Pass’ method described above, wherein the difference is that the Global Depth Pass includes also normal color rendering of each detoured object in selected GPU, in addition to the updating of the Global Depth Map (GDM) in all GPUs.
  • a third method called the “Regular Course GDM Creation version”, wherein the Z values of each object are distributed to their designated/assigned GPUs during the regular course of normal rendering in a graphics application; and
  • a fourth method called the Application Provided GDM version, wherein the graphics application generates a GDM for its own purposes, e.g. for Shadow Volumes, and provides the GDM to the GPPLs for use in graphics rendering operations in accordance with the principles of the present invention.
  • In FIG. 3 A 1 , a first illustrative embodiment of the method of parallel graphics processing according to the present invention is shown and described as comprising three primary steps, indicated at Blocks 3111 , 3112 and 3113 in FIG. 3 A 1 .
  • a global depth map (GDM) is generated within each GPPL, by a process involving the broadcasting of graphics commands and data to all GPPLs equally, for pixel depth (z) testing.
  • This first special rendering pass occurs once, for each image frame to be rendered, during the multi-pass graphics rendering method of the present invention.
  • the first special GDM creation pass indicated at Block 3111 in FIG. 3 A 1 employs an object tracking mechanism comprising a current state buffer 4111 , 5111 , and a hash table of states ( 4112 , 5112 ), illustrated in FIGS.
  • the current state buffer is used to hold the current state, and is updated by draw commands and state commands.
  • the Hash table of states is used to register the first appearance of all objects (i.e. each entry in the hash table is considered a full state of an object).
  • a complementary-type partial image is generated in each GPPL using the GDM and the object-division based parallel rendering process according to the present invention.
  • FIG. 3 A 2 illustrates the graphics pipeline activity along three primary stages of the first illustrative embodiment of the method of the present invention, carried out on a dual-GPU embodiment of the parallel graphics processing system of the present invention.
  • the GDM creation pass is performed and the GDM is provided within the Z buffer of each GPU.
  • multiple rendering passes are performed and the partial complementary-type color images are generated in the Color Buffers of the GPUs.
  • depthless compositing is performed.
  • decomposition of objects and load balancing are controlled by a software based decomposition module residing in the host, as will be described in greater detail hereinafter.
  • the following example considers the case of a parallel graphics processing system employing only two GPUs; however, it can be extended to any number of GPUs.
  • the first stage at Block 3121 involves, during the special rendering pass (i.e. GDM creating pass), providing a Global Depth Map (GDM) to the Z buffer of each GPPL.
  • graphics commands and data are transmitted to all GPPLs (i.e. equally broadcasted to both GPUs) for all objects in the frame of the 3D scene to be rendered, as indicated by the broken-line arrows.
  • the goal is performing the Z-test on all objects and populating the Z-buffer ( 3126 ) in each GPU without any drawing into the color buffer.
  • the final result of this pass is the GDM stored in Z-buffer, as a reference for all Z-tests in the subsequent passes.
  • each entry of the Hash Table ( 3132 ) in FIG. 3 A 3 holds the state of a primitive (object), which is not assigned to any GPU, for tracking the “appearance(s)” of object primitives.
  • the Current State Buffer ( 3131 ) is provided for storing a draw command.
  • a primitive object is a group of one or more primitive graphics elements, drawn by a single draw call.
  • a primitive graphics element generally refers to a basic shape, such as point, line, or triangle.
  • the appearance of the object is defined by the state of the object, that includes information on its vertex array, index array, vertex shader parameters, pixel shader parameters, transformation matrix, skinning transformation matrix, and state parameters (e.g. RenderState-blending related, SamplerState-filter, etc.).
  • the entire state defines the exact appearance of the object in the scene. For example, the same character (e.g. soldier), geometrically defined by given vertex and index buffers, can appear in a graphics game several times in various locations and forms by just modifying its transformation matrix, i.e. modifying its state.
  • the state of an object is shaped by two commands: the State command, and the Draw Primitive command.
  • the current state of an object is an accumulation of these two commands.
  • the appearance of an object in the stream of geometric data is considered as a first appearance (or debut), only if this exact state did not occur (i.e. happen) before in the system.
  • An additional appearance of an object is considered a successive appearance if, and only if, it appears in exactly the same state as it had before.
  • a modified state creates another first appearance of object.
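  • The distinction between first and successive appearances can be illustrated with a small Python sketch of the current state buffer and the hash table of states. The class name `ObjectTracker`, its method names, and the representation of a state as key/value pairs are assumptions made for this example; the specification does not prescribe a particular data layout.

```python
class ObjectTracker:
    """Tracks object appearances using a current state buffer and a
    hash table of previously seen full states (hypothetical layout)."""

    def __init__(self):
        self.current_state = {}     # accumulated by State and Draw commands
        self.hash_table = set()     # one entry per full object state

    def state_command(self, key, value):
        """Update one state parameter (e.g. the transformation matrix)."""
        self.current_state[key] = value

    def draw_command(self, primitive):
        """Return True if this draw is a first appearance (debut)."""
        self.current_state["primitive"] = primitive
        snapshot = frozenset(self.current_state.items())
        if snapshot in self.hash_table:
            return False            # successive appearance: same exact state
        self.hash_table.add(snapshot)
        return True

tracker = ObjectTracker()
tracker.state_command("transform", "T1")
first = tracker.draw_command("soldier")     # debut of soldier in state T1
repeat = tracker.draw_command("soldier")    # same state: successive appearance
tracker.state_command("transform", "T2")
moved = tracker.draw_command("soldier")     # modified state: another debut
```

  As in the soldier example above, changing only the transformation matrix yields a new full state and therefore another first appearance.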
  • This first pass creates global depth maps (GDMs) in all GPUs by delivering the depth value of each object to the Z buffers.
  • the depth value of an object is registered in the global depth map (GDM) for only the first appearance of an object. Therefore, during this first GDM creation pass where no color rendering occurs (i.e. writing into the Color FB is disabled), all draw commands are scanned for the first appearance of each object, which is represented by the current State Buffer ( 4111 ). While the State Buffer is being registered in an entry of the Hash Table ( 4112 ), the object is sent to all GPUs for Z-testing and updating of the Global Depth Map in Z-buffers. Upon completion of this pass of objects' debuts, all GPUs hold the global depth map. The successive passes then proceed according to the original application's schedule.
  • the second stage involves, during subsequent passes, generating complementary-type partial images within the color buffer of each GPPL.
  • This step involves using the GDM and the Z Test Filter in each GPU, and transmitting graphics commands and data to only assigned GPPLs, as indicated by the solid-line arrows.
  • the scene is decomposed between GPUs. The exact decomposition of objects may change from pass to pass, according to dynamic load balance considerations.
  • Each GPU renders incoming objects ( 3128 ) into its Color Buffer ( 3130 ), using the graphics commands and data associated with the objects, while z-testing the pixel depth values of each object against the GDM stored in the Z-buffer 3129 .
  • the third or last phase involves recompositing a complete image frame within the primary GPPL (i.e. GPU 1 ), from the complementary-type partial images stored in the color buffers of GPPL 1 and GPPL 2 , using the depthless recomposition process of the present invention, described hereinabove.
  • This depthless recompositing process involves moving the complementary partial image in the secondary color buffer, into the primary color buffer of GPU 1 and merging these partial images in accordance with the principles of the present invention, and then displaying the partial image fragments.
  • FIG. 3 A 4 illustrates the first illustrative embodiment of the method of parallel graphics processing according to the present invention depicted in FIG. 3 A 1 .
  • the single specialized GDM creation pass is carried out in the Block 3121 .
  • the pixels of objects, assigned to a GPPL are normally rendered in color within the GPPL by the steps indicated at Blocks 3122 .
  • the partial color complementary-type images are recomposited within the primary GPU, then the fully composited image within the primary GPU is displayed on the display device.
  • the extra pass is performed so as to create the GDM in all GPUs.
  • the first Block 3121 in FIG. 3 A 1 is realized by Blocks 31401 through 31412 .
  • the pass starts by initializing the color buffers with black color values and scanning all the graphics commands for the frame to be rendered from the 3D scene, from a specified viewing direction.
  • the CPU analyzes the stream of commands associated with the image frame to be rendered, and when the end of the command stream is detected, the process moves to the multi-pass rendering stage 3122 , and while the end of the command stream is not detected, then the process proceeds to Block 31403 .
  • if a ‘State command’ is encountered at Block 4203 , it is used to update ( 4204 ) the current state buffer ( 4111 ).
  • the current state of the object is updated in the state buffer at Block 31406 , and the Hash Table is scanned at Block 31407 for the appearance of the object.
  • the object can be found in the Hash Table only if this is not its first appearance. In this case, the object is abandoned and the command stream examination resumes.
  • if the current state is not in the Hash Table, the object's state in the Hash Table is updated at Block 31408 .
  • the “Disable Write” command is generated to the Color Frame Buffer (FB), and at Block 31410 , the Disable Write command is sent to all GPUs.
  • the Draw Primitive Command is broadcasted to all GPUs, and then at Block 31412 , the object is colorlessly rendered in all GPUs (i.e. in black, which was the initialized color set at Block 31401 ).
  • the result is an update of object's depth in the Global Depth Map in all GPUs, while the color Frame Buffer remains clear.
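  • The GDM creation pass described above can be summarized in the following Python sketch, given by way of illustration only. The function name `gdm_creation_pass`, the scanline buffers, and the encoding of each draw as a (state, first_x, last_x, depth) tuple are assumptions for this example. Each debut object is broadcast to all GPUs for Z-testing with color writes disabled, while repeated states are abandoned.

```python
FAR = float("inf")  # depth of an empty Z-buffer entry (assumed convention)

def gdm_creation_pass(draws, num_gpus, width):
    """Build an identical Global Depth Map in the Z-buffer of every GPU.

    draws: sequence of (state, first_x, last_x, depth) tuples, where
           `state` stands for the full state of the drawn object.
    """
    z_buffers = [[FAR] * width for _ in range(num_gpus)]
    color_buffers = [[0.0] * width for _ in range(num_gpus)]  # cleared to black
    hash_table = set()
    for state, first, last, depth in draws:
        if state in hash_table:
            continue                    # not a debut: object is abandoned
        hash_table.add(state)
        for z_buffer in z_buffers:      # broadcast to all GPUs
            for x in range(first, last + 1):
                if depth < z_buffer[x]:
                    z_buffer[x] = depth # color writes stay disabled
    return z_buffers, color_buffers

draws = [("A@T1", 0, 2, 1.0), ("B@T1", 2, 3, 3.0), ("A@T1", 0, 2, 1.0)]
z_buffers, color_buffers = gdm_creation_pass(draws, 2, 5)
```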
  • the stream of commands is now scanned from the beginning, as indicated at Block 31415 .
  • objects are distributed among GPUs based on any possible scheme of load balance.
  • For every Draw Command, a load balance is calculated at Block 4216 , and a GPU is chosen for the object.
  • the object is normally rendered in that GPU. The above sequence repeats for any number of passes required to render the frame.
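  • Since any load balancing scheme may be used, the following Python sketch shows just one possible choice, a greedy least-loaded assignment. It is an assumption made for illustration and is not the scheme of the specification; the function name `assign_draws` and the per-draw cost estimates are likewise hypothetical.

```python
def assign_draws(draw_costs, num_gpus):
    """Assign each draw command to the currently least-loaded GPU.

    draw_costs: estimated rendering cost of each draw command
                (a hypothetical metric, e.g. a vertex count).
    Returns (assignment, loads), where assignment[i] is the index of
    the GPU chosen for draw i.
    """
    loads = [0] * num_gpus
    assignment = []
    for cost in draw_costs:
        gpu = loads.index(min(loads))   # ties go to the lowest GPU index
        loads[gpu] += cost
        assignment.append(gpu)
    return assignment, loads

assignment, loads = assign_draws([5, 3, 2, 4], 2)
```

  Because the choice is made per Draw Command, the decomposition of objects among GPUs can differ from pass to pass, as noted above.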
  • the next step, at Block 31418 involves making hierarchical merges of the partial complementary-type images in all GPU color buffers.
  • the recomposition process starts from partial merges among GPUs, in a hierarchical way, and finishes with a final merge in the primary GPU.
  • the final merge of partial complementary images occurs in the color buffer of the primary GPU.
  • transparent objects e.g. flames
  • overlays e.g. scores in computer games
  • FIG. 3 B 1 is a high-level flow chart illustrating a second illustrative embodiment of the method of parallel graphics processing according to the present invention.
  • This method is a variation of the ‘GDM Creation Pass’ method described above, wherein the difference is that the Global Depth Pass includes also normal color rendering of each detoured object in selected GPU, in addition to the updating of the Global Depth Map (GDM) in all GPUs.
  • a first special rendering pass involves (i) generating a global depth map (GDM) within each GPPL, by broadcasting graphics commands and data for all objects to all GPPLs, (ii) rendering without color (i.e. in black) the pixels of objects sent to non-assigned GPPLs, and (iii) rendering in color the pixels of all objects sent to assigned-GPPLs.
  • Block 3 B 1 the method continues by generating complementary-type partial images in each GPPL using the GDM and the object-division based parallel rendering process according to the present invention.
  • the method concludes by recompositing a complete image frame of the 3D scene using the depthless complementary image recomposition process of the present invention, illustrated in FIGS. 2 D 3 , 2 E 1 , 2 E 2 and 2 E 3 .
  • FIG. 3 B 2 is a schematic representation illustrating the three primary stages of the second illustrative embodiment of the multi-pass parallel graphics processing method of present invention, carried out on a dual-GPU embodiment of the parallel graphics processing system of the present invention.
  • the following specification addresses the case of using only two GPUs; however, it is understood that the method can be practiced on a parallel graphics processing system supporting any number of GPUs.
  • decomposition of objects and load balancing are controlled by the software based Decomposition module residing in the host system.
  • the first stage indicated at Block 3221 involves, during a first special pass (i.e. GDM creating pass), (i) generating a global depth map (GDM) within each GPPL, by broadcasting graphics commands and data for all objects to all GPPLs indicated by solid-line and broken-line arrows, (ii) rendering without color (i.e. in black) the pixels of objects sent to non-assigned GPPLs indicated by the dotted-line arrows, and (iii) rendering in color the pixels of all objects sent to assigned-GPPLs indicated by solid-line arrows.
  • the objects are separated into two classes: objects that are assigned to the GPU indicated by black arrows, and objects that are not assigned to the GPU indicated by broken-line arrows (i.e. assigned to other GPUs).
  • the Z-test is performed equally on both classes of objects, while drawing to the color buffer is done selectively.
  • Z-buffer is populated by z-tested depth values of all fragments, for assigned objects as well as non-assigned objects.
  • the partial image fragments of assigned objects are drawn in color within the color buffer, whereas partial image fragments of non-assigned objects are drawn without color (black).
  • the final result of this rendering pass is that (i) the Z Buffer of each GPU holds a GDM in its final state, whereas (ii) the Color Buffer of each GPU holds a complementary-type partial color image in its preliminary state.
  • the second stage indicated at Block 3222 involves performing multiple rendering passes, wherein during each subsequent rendering pass, a complementary-type partial image is generated within the color buffer of each GPPL using the GDM and the Z Test Filter, with graphics commands and data being transmitted to only assigned GPPLs, as indicated by solid-line arrows.
  • the objects of the scene are decomposed between GPUs.
  • the exact decomposition of objects may change from rendering pass to rendering pass, according to dynamic load balance considerations.
  • Each GPU renders its incoming objects into color buffer, while performing z-test against the GDM in its Z-buffer.
  • the third stage indicated at Block 3223 is a stage of depthless recomposition, wherein, after the final rendering pass, a complete image frame is recomposited within the primary GPPL.
  • This stage is performed using the complementary-type partial images stored in the color buffers of GPPL 1 and GPPL 2 , and the depthless recomposition process of the present invention.
  • all of the images in the frame buffers of the GPUs are scanned, pixel by pixel, and at each x,y coordinate, the color values of all GPUs are summed up, and the result PXL final (x,y) (from GPU 2 ) is moved to the x,y location of the final image in the primary image buffer (i.e. GPU 1 ).
  • the final image is completed when all pixels are scanned.
  • the addition of all pixels of source (tex 2 ) and target (tex 1 ) images occurs in the target GPPL (i.e. GPU 1 ), by means of its pixel shader processor, running the shader's merge code.
  • the merge result remains in the target GPPL, which may become a source for the next hierarchical step.
  • GPU 1 is the target GPPL, and the composited image in its color buffer is moved to the display device for display.
  • FIGS. 3 B 4 A and 3 B 4 B illustrate the steps performed during the second illustrative embodiment of the method of parallel graphics processing according to the present invention depicted in FIG. 3 B 1 .
  • the pixels of objects assigned to a GPPL are normally rendered in color within the GPPL, while pixels of objects not assigned to a GPPL are rendered colorlessly (i.e. in black).
  • FIGS. 3 B 4 A and 3 B 4 B differ from the method of FIG. 3 A 4 , in that during the GDM Creation Pass, indicated at Block 3221 , detoured objects are normally rendered in color in the color buffer of the selected GPU, in addition to the Global Depth Map (GDM) being updated in the Z buffers of all GPUs. These differences will become more apparent hereinafter.
  • the extra pass is performed so as to create the GDM in all GPUs.
  • the first Block 3121 in FIG. 3 A 1 is realized by Blocks 31401 through 31412 .
  • the pass starts by (initializing the color buffers with colorless values and) scanning all the graphics commands for the frame to be rendered from the 3D scene, from a specified viewing direction.
  • the CPU analyzes the stream of commands associated with the image frame to be rendered, and when the end of the command stream is detected, the process moves to the multi-pass rendering stage 3222 , and while the end of the command stream is not detected, then the process proceeds to Block 31403 .
  • when a ‘State command’ is encountered at Block 4203 , it is used to update ( 4204 ) the current state buffer ( 4111 ).
  • the current state of the object is updated in the state buffer at Block 31406 , and the Hash Table is scanned at Block 31407 for the appearance of the object.
  • the object can be found in the Hash Table only if this is not its first appearance. In this case, the object is abandoned and the command stream examination resumes. If the current state is not in the Hash Table, the object's state in the Hash Table is updated at Block 31408 . Then, at Block 31409 , the load balance among the GPUs is calculated, the GPU selected,
  • upon updating the Hash Table at Block 32408 , a GPU is chosen according to any selected load balance scheme.
  • the object is marked in the ‘Drawn’ list of debut objects ( 4309 ). This list helps to eliminate redundant drawing of objects that have already been drawn for the first time during the Global Depth Pass. A marked object will be cleared from the list in successive passes, the first time it is called for rendering. This call will be skipped while its entry in the list is cleaned up. The object is then sent for normal color rendering to the designated GPU ( 4310 ).
  • the next step of the method involves broadcasting the object to the rest of GPUs for Global Depth Map (GDM) update in Z buffers, and for drawing visible pixels in black into the color frame buffers.
  • the current pixel shader program is adapted in these GPUs, for the alpha status of drawn object. Namely, whether the object is to be drawn with transparencies (alpha) or without. Therefore, according to the status of an object's alpha test, determined at Block 4314 , there are two possible modifications which are made to the pixel shader: (i) a modification of the pixel shader for an opaque (i.e.
  • the stream of commands is then scanned from the beginning for successive rendering passes.
  • the end of drawing passes is determined by determining when the end of the graphics command stream occurs.
  • a search in the ‘list of debut objects’ is performed at Block 32423 , by determining whether the object is marked in the “Drawn” List. If the object is found in the List, then the entry in the List is cleared at Block 32422 , rendering is skipped, and the next Draw command in line is handled. Otherwise, at Block 32424 , a GPU is chosen according to load balance considerations.
  • the object's commands are sent to the designated GPU for normal color rendering, and then the object is normally rendered in that GPU. The above sequence repeats for any number of passes required to render the frame.
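  • The handling of the “Drawn” List in a successive rendering pass can be sketched as follows. This is an illustration under stated assumptions, not the disclosed implementation: draw commands are modeled as hypothetical (object id, cost) pairs, and the least-loaded-GPU rule stands in for whatever load balance scheme is actually selected.

```python
def run_successive_pass(draw_commands, drawn, load):
    """Process one pass; return the (object_id, gpu) pairs actually rendered.

    drawn: set of objects already rendered in color during the GDM pass.
    load:  per-GPU accumulated cost, updated in place.
    """
    rendered = []
    for obj_id, cost in draw_commands:
        if obj_id in drawn:
            # First call after the GDM pass: clear the entry and skip the draw.
            drawn.discard(obj_id)
            continue
        # Assumed load balance scheme: pick the least-loaded GPU.
        gpu = min(range(len(load)), key=load.__getitem__)
        load[gpu] += cost
        rendered.append((obj_id, gpu))
    return rendered


drawn = {"teapot"}          # drawn once already during the GDM creation pass
load = [0, 0]               # two GPUs, both idle
commands = [("teapot", 5), ("floor", 3), ("teapot", 5)]
rendered = run_successive_pass(commands, drawn, load)
```

  • In this example the first "teapot" call is skipped and its entry is cleared, so the later call for the same object is rendered normally.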
  • the next step, at Block 32430 , involves making hierarchical merges of the partial complementary-type images in all GPU color buffers.
  • the recomposition process starts with partial merges among GPUs, in a hierarchical way, and finishes with a final merge in the primary GPU.
  • the final merge of partial complementary images occurs in the color buffer of the primary GPU.
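  • One way to picture the hierarchical merge is as a pairwise tree reduction that ends in the primary GPU. The sketch below is an assumption about the merge order, which the text does not fix: buffers are modeled as flat lists of single-channel pixel values, index 0 stands for the primary GPU, and `hierarchical_merge` is a hypothetical name.

```python
def add_images(target, source):
    """Add a source color buffer into a target color buffer, pixel by pixel."""
    return [t + s for t, s in zip(target, source)]


def hierarchical_merge(buffers):
    """Merge all buffers into buffer 0 in log2(n) rounds of pairwise merges.

    Returns the final image and the list of (source, target) merge steps.
    """
    buffers = list(buffers)
    steps = []
    stride = 1
    while stride < len(buffers):
        for target in range(0, len(buffers) - stride, 2 * stride):
            source = target + stride
            buffers[target] = add_images(buffers[target], buffers[source])
            steps.append((source, target))
        stride *= 2
    return buffers[0], steps


# Four GPUs, each holding one non-black pixel of a 4-pixel image.
partials = [[9, 0, 0, 0], [0, 7, 0, 0], [0, 0, 5, 0], [0, 0, 0, 3]]
final, steps = hierarchical_merge(partials)
```

  • With four GPUs the sketch merges GPU 1 into GPU 0 and GPU 3 into GPU 2 first, and then merges GPU 2 into the primary GPU, so each merge target can become a source at the next level.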
  • FIG. 4A describes the third illustrative embodiment of the multi-pass parallel graphics processing method of the present invention, carried out on a parallel graphics processing system of the present invention.
  • the Global Depth Maps are generated in each of the GPUs during the regular course of a graphics application, instead of during an extra special pass (i.e. processing step) performed during the beginning of image frame processing.
  • a complete image frame of the 3D scene is recomposited using the depthless complementary image recomposition process illustrated in FIGS. 2 D 3 , 2 E 1 , 2 E 2 and 3 E 3 .
  • FIG. 4B illustrates the two primary stages of the third illustrative embodiment of the multi-pass parallel graphics processing method of the present invention, carried out on a dual-GPU embodiment of the parallel graphics processing system of the present invention.
  • the first multi-pass rendering stage involves (i) during each pass of the multi-pass method, generating global depth map (GDM) values for each detoured object transmitted to each GPPL indicated by solid-line arrows, (ii) rendering without color (i.e. in black) the pixels of objects sent to non-assigned GPPLs indicated by dotted-line arrows, and (iii) rendering in color the pixels of all objects sent to assigned-GPPLs indicated by broken-line arrows.
  • re-compositing of objects and load balancing are controlled by a software-based Decomposition module residing in the host.
  • This stage of multi-pass rendering includes the generation of the GDM as part of its regular multi-pass rendering process.
  • Decomposed geometric data is sent to assigned or designated GPUs.
  • any detoured object is also sent once to all other GPUs so that the object contributes its z value share to the GDM under development within the Z buffer.
  • the GDM is generated as follows: GDM values are generated for each debut object and transmitted to each GPU, while: (i) for objects sent to non-assigned GPUs indicated by broken-line arrows, their pixels are normally z-tested, their depth values are stored in the z-buffer, and their fragments are rendered colorlessly in the color buffer; and (ii) for objects sent to assigned-GPUs indicated by solid-line arrows, their pixels are normally z-tested, their depth values are stored in the z-buffer and their fragments are rendered in color.
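  • The rule above (z-test everywhere, color only in the assigned GPU) can be sketched on a one-dimensional scanline. This is a simplified model under stated assumptions: fragments are hypothetical (x, depth, color) triples, colors are single integers with 0 meaning black, a smaller depth is nearer, and every object is sent to every GPU.

```python
FAR = float("inf")
BLACK = 0


def render_with_inline_gdm(objects, assignment, num_gpus, width):
    """Render every object in every GPU; only the assigned GPU writes color."""
    z_buffers = [[FAR] * width for _ in range(num_gpus)]
    color_buffers = [[BLACK] * width for _ in range(num_gpus)]
    for name, fragments in objects:
        for gpu in range(num_gpus):
            for x, depth, color in fragments:
                if depth < z_buffers[gpu][x]:        # normal z-test
                    z_buffers[gpu][x] = depth        # contributes to the GDM
                    assigned = assignment[name] == gpu
                    color_buffers[gpu][x] = color if assigned else BLACK
    return z_buffers, color_buffers


objects = [
    ("box", [(1, 2, 8), (2, 2, 8)]),    # farther object, assigned to GPU 1
    ("wall", [(0, 1, 5), (1, 1, 5)]),   # nearer object, assigned to GPU 0
]
assignment = {"box": 1, "wall": 0}
z_buffers, color_buffers = render_with_inline_gdm(objects, assignment, 2, 3)
merged = [sum(px) for px in zip(*color_buffers)]
```

  • In the example, the wall overwrites the box's pixel at x = 1 with black in GPU 1, so the two color buffers end up complementary and can be summed without consulting depth.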
  • the illustrative embodiment employs only two GPUs, although it is understood that any number of GPUs can be supported on the parallel graphics processing platform of the present invention.
  • the second depthless compositing stage involves, after the final pass, recompositing a complete image frame within the primary GPPL, from the complementary-type partial images stored in the color buffers of GPPL 1 and GPPL 2 , using the depthless recomposition process of the present invention. Thereafter, the completely composited image is moved from the primary GPU to the display device for display.
  • FIGS. 4 D 1 and 4 D 2 illustrate the steps performed during the third illustrative embodiment of the method of parallel graphics processing according to the present invention depicted in FIG. 4A .
  • the pixels of objects assigned to a GPPL are normally rendered in color within the GPPL, while pixels of objects not assigned to a GPPL are rendered colorlessly (i.e. in black), and contributions to the global depth map (GDM) for the frame are generated for each detoured object transmitted to each GPPL.
  • the current state of an object is kept updated in a “current state” buffer 431 .
  • this buffer is copied into the Hash Table 432 .
  • the state record of each object in the algorithm of FIG. 4C also includes the GPU number/index. The object is designated/assigned to the GPU for processing based on load balancing considerations, for all rendering passes.
  • the process begins by scanning the graphics commands in a given frame of a 3D scene to be rendered for display.
  • determination of the end of the graphics command stream for the frame is monitored.
  • process control moves to Block 422 , involving partial image recompositing, in accordance with the principles of the present invention.
  • the monitoring of state commands occurs at Block 4403
  • the monitoring of Primitive Draw commands occurs at Block 4405
  • current object state updating occurs at Block 4406
  • current list updating occurs at Block 4404 .
  • the Current State Buffer 431 is updated by two classes of commands: State commands and Draw Primitive commands.
  • the detection of a State command is followed by updating the State Buffer.
  • the Draw Primitive command initiates the process of drawing a primitive.
  • the primitive must be examined for its debut appearance. This is done by scanning the Hash Table 432 for the current state of a Draw Command (of an object). If the state of the object is found in the Hash Table, along with the designation of its GPU, this means that this object has appeared before for processing, and has been rendered within the GPUs according to the principles of the present invention. In such a case, the load balance is updated (as distinguished from the calculation of load balance, which is done only for the debut of an object), and the object is sent to its designated GPU for rendering. Otherwise, a GPU is selected according to load balance considerations, the Hash Table is updated, and the object (draw command) is sent to the designated GPU for normal/regular rendering.
  • every incoming “Draw Primitive” command for an object is subject to the “first appearance test” which involves the matching of the Current State Buffer 431 to the Hash Table 432 , illustrated in FIG. 4C . If a match is found to exist therebetween at Block 4407 , then the object is sent to the designated GPU, and load balancing is updated at Block 4408 .
  • at Block 4424 , load balance calculations are performed and the GPU is selected/designated, and at Block 4425 , the Hash Table is updated by creating a new entry for the Current State Buffer in the Hash Table.
  • the Draw Command for the object is sent to the selected designated/assigned GPU, for normal color rendering.
  • the Draw Command for the object is simultaneously broadcast to all other non-designated/assigned GPUs for (i) updating the global depth map (GDM) values in their Z buffers, and (ii) drawing black pixels of the object's silhouette in the color frame buffers (FBs) of these GPUs, by performing: alpha testing as indicated at Block 4428 ; required pixel shader modification as indicated at Blocks 4427 and 4429 ; black rendering in the color buffers of the rest of the GPUs, using the Draw Primitive command as indicated at Block 4430 ; rendering the object in the rest of the GPUs as indicated at Block 4431 ; and restoring the pixel shaders in each GPU to their original state as indicated at Block 4432 for normal/regular rendering at Block 4426 .
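  • The “first appearance test” and the resulting dispatch decisions can be sketched with a dictionary standing in for the Hash Table. This is a hedged illustration only: the state key, the per-command cost, the least-loaded selection rule and the `dispatch` function are all assumptions introduced here, not details taken from the disclosure.

```python
def dispatch(draw_commands, num_gpus):
    """Classify each draw command as a debut or a repeat and pick its GPU."""
    hash_table = {}              # current-state key -> designated GPU
    load = [0] * num_gpus        # accumulated cost per GPU
    log = []
    for state, cost in draw_commands:
        gpu = hash_table.get(state)
        if gpu is None:
            # Debut: calculate load balance, select a GPU, record it.
            gpu = min(range(num_gpus), key=load.__getitem__)
            hash_table[state] = gpu
            # The other GPUs receive the object for depth and black rendering.
            others = [g for g in range(num_gpus) if g != gpu]
            log.append(("debut", state, gpu, others))
        else:
            # Repeat: reuse the designated GPU, no broadcast needed.
            log.append(("repeat", state, gpu, []))
        load[gpu] += cost        # load balance is updated in both cases
    return hash_table, load, log


commands = [("A", 4), ("B", 1), ("A", 4), ("C", 2)]
hash_table, load, log = dispatch(commands, num_gpus=2)
```

  • Only debut objects trigger a broadcast in this sketch, which mirrors the statement that an object contributes its depth values to the other GPUs once.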
  • the recomposition process is described in detail. Notably, the process is identical to the process described above at Blocks 32430 through 32433 . Specifically, Block 4420 involves making hierarchical merges of the partial complementary-type images in all GPU color buffers. For a number of GPUs greater than two, the recomposition process starts with partial merges among GPUs, in a hierarchical way, and finishes with a final merge in the primary GPU. At Block 4421 , the final merge of partial complementary images occurs in the color buffer of the primary GPU. Then at Block 4422 , transparent objects (e.g. flames) and overlays (e.g. scores in computer games) are rendered in the primary GPU on top of the composited color buffer, by the graphics-based application. Finally, at Block 4423 , the image is moved out to the display unit.
  • FIG. 5A describes a fourth illustrative embodiment of the multi-pass method of parallel graphics processing according to the present invention.
  • This illustrative embodiment of the method of the present invention is based on taking advantage of the GDM generated by a graphics application, and is intended to work only in a graphics application that generates a GDM for its own purposes at the beginning of each frame (e.g. for Shadow Volumes based graphics applications originally intended for single GPU-based systems).
  • the first pass (termed the Ambient Light pass) generates a depth map in the Z-buffer for all image fragments that are visible from the view point. All visible fragments in the color buffer are homogeneously dim colored during this pass.
  • this depth map which was originally intended for single GPU, is simultaneously generated in all GPUs, and used as a GDM according to the principles of the present invention.
  • the homogeneously dim color buffer serves to prevent the obstructed object from appearing in the image: the obstructing objects of all GPUs are drawn as a colorless silhouette of the object in the color FB, as described hereinabove.
  • the details regarding the Application Provided GDM algorithm of the present invention are described in the flowcharts of FIGS. 5 D 1 and 5 D 2 .
  • a global depth map is generated within each GPPL by broadcasting all objects to all GPPLs for depth map creation in the Z buffers and colorless image creation within the color buffers.
  • complementary-type partial images are generated in each GPPL using the GDM and the object-division based parallel rendering process according to the present invention (i.e. rendering without color (i.e. in black) the pixels of objects sent to non-assigned GPPLs, and rendering in color the pixels of all objects sent to assigned-GPPLs).
  • a complete image frame of the 3D scene is recomposited using the depthless complementary image recomposition process of the present invention illustrated in FIGS. 2 D 3 , 2 E 1 , 2 E 2 and 3 E 3 .
  • FIG. 5B is a schematic representation illustrating the three primary stages of the fourth illustrative embodiment of the method of the present invention, carried out on a dual-GPU embodiment of the parallel graphics processing system of the present invention.
  • de-compositing of objects and load balancing are controlled by the software-based Decomposition module residing in the host.
  • the following embodiment considers a graphics processing platform having only two GPUs. However, it is understood that any number of GPUs may be supported on the platform to carry out the method.
  • a global depth map is generated within each GPPL by broadcasting all objects to all GPPLs for depth map creation in the Z buffers and colorless image creation within the color buffers.
  • the objects are rendered, the pixels are z-tested and their depth values are stored into the z-buffer creating the GDM.
  • Color buffers are disabled (or alternatively rendered colorlessly, depending on application).
  • complementary-type partial images are generated in each GPPL using the GDM and the object-division based parallel rendering process according to the present invention (i.e. rendering without color (i.e. in black) the pixels of objects sent to non-assigned GPPLs, and rendering in color the pixels of all objects sent to assigned-GPPLs, indicated in solid-line arrows).
  • the scene is decomposed between GPUs, and each GPU is delivered its assigned objects. The exact decomposition of objects may change from pass to pass, according to dynamic load balance considerations.
  • Each GPU renders its objects into its color buffer, while performing a z-test against the GDM in its Z-buffer.
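  • When the GDM already exists before color rendering starts, each GPU only needs its own objects. The sketch below models that two-step flow on a one-dimensional scanline; it assumes hypothetical (x, depth, color) fragments, single-integer colors with 0 as black, and a less-or-equal depth comparison so that a fragment passes against its own depth value already stored in the GDM.

```python
FAR = float("inf")


def build_gdm(objects, width):
    """Depth-only pass over all objects: keep the nearest depth per pixel."""
    gdm = [FAR] * width
    for _, fragments in objects:
        for x, depth, _ in fragments:
            gdm[x] = min(gdm[x], depth)
    return gdm


def render_assigned(objects, assignment, gpu, gdm):
    """Color pass for one GPU: draw only its objects, z-tested against the GDM."""
    color = [0] * len(gdm)
    for name, fragments in objects:
        if assignment[name] != gpu:
            continue                      # not delivered to this GPU
        for x, depth, c in fragments:
            if depth <= gdm[x]:           # visible in the complete scene
                color[x] = c
    return color


objects = [
    ("box", [(1, 2, 8), (2, 2, 8)]),
    ("wall", [(0, 1, 5), (1, 1, 5)]),
]
assignment = {"box": 1, "wall": 0}
gdm = build_gdm(objects, width=3)
partial0 = render_assigned(objects, assignment, 0, gdm)
partial1 = render_assigned(objects, assignment, 1, gdm)
```

  • The box fragment at x = 1 fails the test in GPU 1 because the GDM holds the nearer wall depth, so that pixel stays black there even though GPU 1 never receives the wall during the color pass.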
  • a complete image frame of the 3D scene is recomposited by merging the complementary-type partial images stored in the color buffers of GPPL 1 and GPPL 2 using the depthless complementary image recomposition process of the present invention illustrated in FIGS. 2 D 3 , 2 E 1 , 2 E 2 and 3 E 3 . Thereafter, the complete image in the primary GPU is displayed on the display device.
  • FIGS. 5 D 1 and 5 D 2 are flowcharts illustrating the steps performed during the fourth illustrative embodiment of the method of parallel graphics processing according to the present invention depicted in FIG. 5A , wherein the pixels of objects assigned to a GPPL are normally rendered in color within the GPPL while pixels of objects not assigned to a GPPL are rendered colorlessly (i.e. in black).
  • the Ambient Light pass is made in all GPUs, generating a GDM in the z-buffers, and “black” rendering objects in the color buffers.
  • Blocks 5402 through 5417 constitute the light source pass, repeating for all light sources of the scene. For simplicity, only one occluder is considered per each light source.
  • the number of light sources in the 3D scene is monitored, and when all light sources have been rendered, then process control moves to Block 5420 in Stage 523 .
  • in the Shadow Volume algorithm, for the next light source and each occluding object (“occluder”), a shadow volume is calculated.
  • the front face and back face of the shadow volume are compared to the GDM to generate a shadow volume stencil, which is registered in the stencil buffer.
  • the command stream is scanned looking for state commands for updating state buffer, and for draw commands at Block 5409 .
  • the current state buffer is updated.
  • the presence of the current state is checked in the Hash Table. If the current state is not present in the Hash Table, then the load balance is calculated and the GPU is selected for the draw command.
  • the Hash Table is updated, and at Block 5416 , the Draw Command is sent to the designated GPU for normal rendering.
  • a debut object must be assigned to a GPU in accordance with load balance considerations at Block 5314 , registered in the Hash Table as indicated at Block 5315 , and sent for rendering in the designated GPU as indicated at Blocks 5316 and 5317 .
  • if at Block 5411 the current state is in the Hash Table (i.e. the object is a repeat object), then the designated GPU is looked up in the Hash Table at Block 5412 , and at Block 5413 , the Draw Command is sent to the designated GPU; the process then advances to Block 5417 , where the object is rendered in the designated GPU, and then returns to Block 5406 .
  • when Block 5402 determines that all light rendering passes are completed, the hierarchical merge of color buffers in the GPUs is performed at Block 5420 in Stage 523 .
  • the final/complete image frame is composited in the primary GPU.
  • overlays and transparent objects are rendered in the primary GPU, and at Block 5423 , the final image in the primary GPU is displayed on the display device.
  • the parallel 3D graphics processing system and method of the present invention can be practiced in diverse kinds of computing and micro-computing environments in which 3D graphics support is required or desired.
  • Referring to FIGS. 6A through 6C , the parallel graphics processing system (PGPS) of the present invention will now be described in greater detail.
  • In FIG. 6A , there is shown a PC-based host computing system embodying an illustrative embodiment of the parallel 3D graphics processing system (PGPS) platform of the present invention, illustrated throughout FIGS. 2A through 5D .
  • the PGPS comprises: (i) a Parallel Mode Control Module (PMCM); (ii) a Parallel Processing Subsystem for supporting the parallelization stages of decomposition, distribution and re-composition implemented using a Decomposition Module, a Distribution Module and a Re-Composition Module, respectively; and (iii) a plurality of either GPU and/or CPU based graphics processing pipelines (GPPLs) operated in a parallel manner under the control of the PMCM.
  • the PMCM further comprises an OS-GPU interface (I/F) and Utilities; Merge Management Module; Distribution Management Module; Distributed Graphics Function Control; and Hub Control, as described in greater detail in U.S. application Ser. No. 11/897,536 filed Aug. 30, 2007, incorporated herein by reference.
  • the Decomposition Module further comprises a Load Balance Submodule, and a Division Submodule
  • the Distribution Module comprises a Distribution Management Submodule and an Interconnect Network.
  • the Rendering Module comprises the plurality of GPPLs
  • the Re-Composition Module comprises the Pixel Shader, the Shader Program Memory and the Video Memory (e.g. Z Buffer and Color Buffers) within each of the GPPLs cooperating over the Interconnect Network.
  • In FIG. 6 B 1 , a first illustrative embodiment of a GPU-based graphics processing pipeline (GPPL) is shown for use in the PGPS of the present invention depicted in FIG. 6A .
  • the GPPL comprises: (i) a video memory structure supporting a frame buffer (FB) including stencil, depth and color buffers, and (ii) a graphics processing unit (GPU) supporting (1) a geometry subsystem having an input assembler and a vertex shader, (2) a set up engine, and (3) a pixel subsystem including a pixel shader receiving pixel data from the frame buffer and raster operators operating on pixel data in the frame buffers.
  • the GPPL comprises (i) a video memory structure supporting a frame buffer (FB) including stencil, depth and color buffers, and (ii) a graphics processing unit (GPU) supporting (1) a geometry subsystem having an input assembler, a vertex shader and a geometry shader, (2) a rasterizer, and (3) a pixel subsystem including a pixel shader receiving pixel data from the frame buffer and raster operators operating on pixel data in the frame buffers.
  • In FIG. 6 B 3 , an illustrative embodiment of a CPU-based graphics processing pipeline (GPPL) is shown for use in the PGPS of the present invention depicted in FIG. 6A .
  • the GPPL comprises (i) a video memory structure supporting a frame buffer including stencil, depth and color buffers, and (ii) a graphics processing pipeline realized by one cell of a multi-core CPU chip, consisting of 16 in-order SIMD processors, and further including a GPU-specific extension, namely, a texture sampler that loads texture maps from memory, filters them for level-of-detail, and feeds to pixel processing portion of the pipeline.
  • the pipelined structure of the parallel graphics processing system (PGPS) of the present invention is shown driving a plurality of GPPLs.
  • the Decomposition Module supports the scanning of commands, the control of commands, the tracking of objects, the balancing of loads, and the assignment of objects to GPPLs.
  • the Distribution Module supports transmission of graphics data (e.g. FB data, commands, textures, geometric data and other data) in various modes including CPU-to/from-GPU, inter-GPPL, broadcast, hub-to/from-CPU, and hub-to/from-GPPL.
  • the Re-composition Module supports the merging of partial image fragments in the Color Buffers of the GPPLs in a variety of ways, in accordance with the principles of the present invention (e.g. merge color frame buffers without z buffers, merge color buffers using stencil assisted processing, and other modes of partial image merging).
  • FIGS. 2A through 5D can be practiced using diverse types of parallel computing platforms supporting a plurality or clusters of GPPLs, realized in many possible ways.
  • the four illustrative embodiments of the parallel graphics processing method of the present invention, illustrated in FIGS. 3 A 1 through 5 D will now be shown implemented using the architecture provided by the PGPS of the present invention shown in FIG. 6A , in which particular modules (e.g. Decomposition Module, Distribution Module, Rendering Module or Recomposition Module) are used to perform or carry out different stages and/or steps in each such parallel graphics processing method.
  • the modules in the system of FIG. 6A perform the following method steps: (i) the Decomposition Module carries out Blocks 3140 through 31409 and Blocks 31414 and 31416 in the methods of FIGS. 7 A 1 A and 7 A 1 B; (ii) the Distribution Module carries out Blocks 31410 through 31411 and Blocks 31417 and 31418 in the methods of FIGS. 7 A 1 A and 7 A 1 B; (iii) the Rendering Module carries out Blocks 31412 and Blocks 31420 through 31421 in the methods of FIGS. 7 A 1 A and 7 A 1 B; and (iv) the Recomposition Module carries out Block 31419 in FIGS. 7 A 1 A and 7 A 1 B.
  • In FIG. 7 A 2 , the Decomposition and Distribution Modules are shown implemented within the host memory space (HMS), whereas the Rendering and Recomposition Modules are implemented by the GPUs.
  • the modules in the system of FIG. 6A perform the following method steps: (i) the Decomposition Module carries out Blocks 32401 through 32409 , Blocks 32415 through 32416 , Block 32413 , and Blocks 32419 through 32424 in the methods of FIGS. 7 B 1 A and 7 B 1 B; (ii) the Distribution Module carries out Blocks 32425 through 32430 in the methods of FIGS.
  • In FIG. 7 B 2 , the Decomposition and Distribution Modules are shown implemented within the host memory space (HMS), whereas the Rendering and Recomposition Modules are implemented by the GPUs.
  • the modules in the system of FIG. 6A perform the following method steps: (i) the Decomposition Module carries out Blocks 4401 through 4408 , Blocks 4420 and 4421 , Blocks 4424 and 4425 , and Blocks 4427 through 4429 in the methods of FIGS. 7 C 1 A and 7 C 1 B; (ii) the Distribution Module carries out Blocks 4409 , 4426 and 4430 in the methods of FIGS. 7 C 1 A and 7 C 1 B; (iii) the Rendering Module carries out Block 4410 , Blocks 4422 and 4423 and Block 4431 in the methods of FIGS. 7 C 1 A and 7 C 1 B; and (iv) the Recomposition Module carries out Block 4432 in FIGS. 7 C 1 A and 7 C 1 B.
  • In FIG. 7 C 2 , the Decomposition and Distribution Modules are shown implemented within the host memory space (HMS), whereas the Rendering and Recomposition Modules are implemented by the GPUs.
  • the modules in the system of FIG. 6A perform the following method steps: (i) the Decomposition Module carries out Blocks 5405 through 5412 , and Blocks 5414 and 5415 in the methods of FIGS. 7 D 1 A and 7 D 1 B; (ii) the Distribution Module carries out Blocks 5413 and 5416 in the methods of FIGS. 7 D 1 A and 7 D 1 B; (iii) the Rendering Module carries out Blocks 5401 through 5404 , and Blocks 5417 , 5422 and 5423 in the methods of FIGS. 7 D 1 A and 7 D 1 B; and (iv) the Recomposition Module carries out Blocks 5420 and 5421 in FIGS. 7 D 1 A and 7 D 1 B.
  • In FIG. 7 D 2 , the Decomposition and Distribution Modules are shown implemented within the host memory space (HMS), whereas the Rendering and Recomposition Modules are implemented by the GPUs.
  • FIG. 8A shows a first illustrative embodiment of the PGPS of the present invention embodied within a host computing system capable of parallelizing the operation of multiple graphics processing pipelines (GPPLs).
  • the computing system comprises: CPU memory space for storing one or more graphics-based applications, and a graphics library for generating a stream of graphics commands and data (GCAD) during the execution of the graphics-based applications; one or more CPUs, in communication with the memory space, for (i) executing the graphics-based applications, (ii) generating the stream of graphics commands and data, and (iii) segmenting the stream of graphics commands into frames for rendering pixel-based images of a 3D scene generated by the graphics-based application, and wherein objects within the 3D scene are generated by processing the frames of graphics commands and data along the stream; a Parallel Graphics Processing Subsystem (PGPS) supporting an object-division mode of parallel operation including at least four stages, namely, decomposition, distribution, rendering, and recomposition.
  • the Parallel Graphics Processing Subsystem includes a Decomposition Module for supporting the decomposition stage of parallel operation, a Distribution Module for supporting the distribution stage of parallel operation; a Rendering Module for supporting the rendering stage of parallel operation, and a Recomposition Module for supporting the recomposition stage of parallel operation.
  • the Parallel Graphics Processing Subsystem also includes: (i) a plurality of graphic processing pipelines (GPPLs), including a primary GPPL, wherein each GPPL includes a color frame buffer and Z depth buffer; and (ii) a parallel mode control module (PMCM) for automatically controlling the object-division mode of parallel operation during the run-time of the graphics-based application, during which the GPPLs are driven in a parallelized manner.
  • the Parallel Mode Control Module (PMCM) 8201 and the Decomposition Module 8202 and Distribution Module 8203 of the Parallel Graphics Processing Subsystem reside as a software package in the Host Memory Space (HMS) 8200 of the CPU 8210 .
  • the Vendor's GPU drivers 8223 also reside on HMS 8200 , along with the Graphics Applications 8221 , and the Standard Graphics Library 8222 .
  • the multiple GPUs on external GPU cards (i) are connected to a North bridge circuit on a motherboard, (ii) implement the Rendering and Recomposition Modules, and (iii) are driven in a parallelized manner under the control of the PMCM.
  • the Decomposition Module divides (i.e. splits up) the stream of graphic commands and data (GCAD) according to the parallelization mode that is implemented using an embodiment of the parallel multi-pass graphics processing method of the present invention, which may be selected from the group of processes illustrated in FIG. 7A , 7 B, 7 C or 7 D.
  • the Distribution Module uses the North bridge circuit to distribute graphic commands and data (GCAD) to the external GPUs.
  • the Rendering Module generates complementary-type partial color images according to the parallel multi-pass graphics processing method of the present invention being used, e.g. as illustrated in FIGS. 7A through 7D .
  • the Recomposition Module uses inter-GPU communication transport (e.g.
  • complementary-type partial color images are recomposited using the depthless image merging process of the present invention, described in great detail above, so as to generate a complete image frame of the 3D scene for display on the display device, connected to an external graphics card via a PCI-express interface.
  • FIG. 8B shows a second illustrative embodiment of the PGPS of the present invention embodied within a host computing system capable of parallelizing the operation of multiple graphics processing pipelines (GPPLs).
  • the computing system comprises: CPU memory space for storing one or more graphics-based applications, and a graphics library for generating a stream of graphics commands and data (GCAD) during the execution of the graphics-based applications; one or more CPUs, in communication with the memory space, for (i) executing the graphics-based applications, (ii) generating the stream of graphics commands and data, and (iii) segmenting the stream of graphics commands into frames for rendering pixel-based images of a 3D scene generated by the graphics-based application, and wherein objects within the 3D scene are generated by processing the frames of graphics commands and data along the stream; a Parallel Graphics Processing Subsystem (PGPS) supporting an object-division mode of parallel operation including at least four stages, namely, decomposition, distribution, rendering, and recomposition.
  • the Parallel Graphics Processing Subsystem includes a Decomposition Module for supporting the decomposition stage of parallel operation, a Distribution Module for supporting the distribution stage of parallel operation; a Rendering Module for supporting the rendering stage of parallel operation, and a Recomposition Module for supporting the recomposition stage of parallel operation.
  • the Parallel Graphics Processing Subsystem also includes: (i) a plurality of graphic processing pipelines (GPPLs), including a primary GPPL, wherein each GPPL includes a color frame buffer and Z depth buffer; and (ii) a parallel mode control module (PMCM) for automatically controlling the object-division mode of parallel operation during the run-time of the graphics-based application, during which the GPPLs are driven in a parallelized manner.
  • the Parallel Mode Control Module (PMCM) 8201, the Decomposition Module 8202 and the Distribution Module 8203 of the Parallel Graphics Processing Subsystem reside as a software package in the Host Memory Space (HMS) 8200 of the CPU.
  • the Vendor's GPU drivers 8223 reside on HMS 8200, along with the Graphics Applications 8221 and the Standard Graphics Library 8222.
  • the Rendering and Recomposition Modules are realized across multiple GPUs connected to a bridge circuit on a motherboard (and having an internal IGD) and driven in a parallelized manner under the control of the PMCM.
  • the Decomposition Module divides (i.e. splits up) the stream of graphic commands and data (GCAD) according to the parallelization mode that is implemented using an embodiment of the parallel multi-pass graphics processing method of the present invention, which may be selected from the group of processes illustrated in FIGS. 7A, 7B, 7C or 7D.
  • the Distribution Module uses the North bridge chip to distribute the graphic commands and data (GCAD) to the multiple GPUs located on the external graphics cards.
  • the Rendering Module generates complementary-type partial color images according to the multi-pass parallel graphics processing method of the present invention being used, e.g. as illustrated in FIGS. 7A through 7D.
  • the Recomposition Module uses inter-GPU communication transport to transfer the pixel data of the complementary-type partial images among the GPUs during the image recomposition stages. Finally, the complementary-type partial color images are recomposited using the depthless image merging process of the present invention so as to generate a complete image frame of the 3D scene for display on the display device, connected to one of the external graphics cards or the IGD.
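The transfer of partial-image pixel data among the GPUs over several recomposition stages can be pictured as a pairwise reduction. The sketch below is an assumption made for illustration only: the text does not state the transfer pattern, and the names `merge_pair` and `recompose_in_stages`, as well as the choice of GPU 0 as the primary GPU, are hypothetical. At each stage, half of the remaining GPUs send their partial image to a partner, which adds it to its own without any depth comparison.

```python
def merge_pair(a, b):
    """Depthless merge of two partial images: per-pixel, per-channel sum."""
    return [[tuple(ca + cb for ca, cb in zip(pa, pb))
             for pa, pb in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


def recompose_in_stages(partials):
    """Merge partial images pairwise until one image remains on GPU 0.

    Returns (final_image, transfers), where transfers is a list of
    (stage, source_gpu, destination_gpu) tuples in the order performed.
    """
    if not partials:
        raise ValueError("at least one partial image is required")
    images = dict(enumerate(partials))  # gpu index -> image it currently holds
    transfers = []
    n = len(partials)
    stride, stage = 1, 0
    while stride < n:
        for dst in range(0, n, 2 * stride):
            src = dst + stride
            if src < n:
                images[dst] = merge_pair(images[dst], images.pop(src))
                transfers.append((stage, src, dst))
        stride *= 2
        stage += 1
    return images[0], transfers
```

With N GPUs this pattern needs about log2(N) stages instead of N - 1 sequential transfers into the primary GPU; whether that trade-off matters depends on the inter-GPU transport actually available.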
  • FIG. 8C shows a third illustrative embodiment of the PGPS of the present invention embodied within a host computing system capable of parallelizing the operation of multiple graphics processing pipelines (GPPLs).
  • the computing system comprises: CPU memory space for storing one or more graphics-based applications, and a graphics library for generating a stream of graphics commands and data (GCAD) during the execution of the graphics-based applications; one or more CPUs, in communication with the memory space, for (i) executing the graphics-based applications, (ii) generating the stream of graphics commands and data, and (iii) segmenting the stream of graphics commands into frames for rendering pixel-based images of a 3D scene generated by the graphics-based application, and wherein objects within the 3D scene are generated by processing the frames of graphics commands and data along the stream; a Parallel Graphics Processing Subsystem (PGPS) supporting an object-division mode of parallel operation including at least four stages, namely, decomposition, distribution, rendering, and recomposition.
  • the Parallel Graphics Processing Subsystem includes a Decomposition Module for supporting the decomposition stage of parallel operation, a Distribution Module for supporting the distribution stage of parallel operation; a Rendering Module for supporting the rendering stage of parallel operation, and a Recomposition Module for supporting the recomposition stage of parallel operation.
  • the Parallel Graphics Processing Subsystem also includes: (i) a plurality of graphic processing pipelines (GPPLs), including a primary GPPL, wherein each GPPL includes a color frame buffer and Z depth buffer; and (ii) a parallel mode control module (PMCM) for automatically controlling the object-division mode of parallel operation during the run-time of the graphics-based application, during which the GPPLs are driven in a parallelized manner.
  • the Parallel Mode Control Module (PMCM) 8201, the Decomposition Module 8202 and the Distribution Module 8203 of the Parallel Graphics Processing Subsystem reside as a software package in the Host Memory Space (HMS) 8200.
  • the Vendor's GPU drivers 8223 also reside on HMS 8200, along with the Graphics Applications 8221 and the Standard Graphics Library 8222.
  • a single GPU is supported on a CPU/GPU fusion-architecture processor die (alongside the CPU), and one or more GPUs are supported on one or more external graphic cards connected to a bridge circuit, and driven in a parallelized manner under the control of the PMCM.
  • the Rendering and Recomposition Modules are realized across the GPUs on the graphics card(s).
  • the Decomposition Module divides (i.e. splits up) the stream of graphic commands and data (GCAD) according to the parallelization mode that is implemented using an embodiment of the parallel multi-pass graphics processing method of the present invention, which may be selected from the group of processes illustrated in FIGS. 7A, 7B, 7C or 7D.
  • the Distribution Module uses the memory controller (controlling the HMS) and the interconnect network (e.g. crossbar switch) within the CPU/GPU processor chip to distribute graphic commands and data to the multiple GPUs on the CPU/GPU die chip and on the external graphics cards.
  • the Rendering Module generates complementary-type partial color images according to the multi-pass parallel graphics processing method of the present invention being used, e.g. as illustrated in FIGS.
  • the Recomposition Module uses inter-GPU communication transport on the graphics card, as well as memory controller and interconnect (e.g. crossbar switch) within the CPU/GPU processor chip, to transfer the pixel data of the complementary-type partial images among the GPUs during the image recomposition stages.
  • the complementary-type partial color images are recomposited using the depthless image merging process of the present invention so as to generate a complete image frame of the 3D scene for display on the display device, connected to the external graphics card via a PCI-express interface connected to the bridge circuit.
  • FIG. 8D1 shows a fourth illustrative embodiment of the PGPS of the present invention, embodied within a host computing system capable of parallelizing the operation of multiple graphics processing pipelines (GPPLs).
  • the computing system comprises: CPU memory space for storing one or more graphics-based applications, and a graphics library for generating a stream of graphics commands and data (GCAD) during the execution of the graphics-based applications; one or more CPUs, in communication with the memory space, for (i) executing the graphics-based applications, (ii) generating the stream of graphics commands and data, and (iii) segmenting the stream of graphics commands into frames for rendering pixel-based images of a 3D scene generated by the graphics-based application, and wherein objects within the 3D scene are generated by processing the frames of graphics commands and data along the stream; a Parallel Graphics Processing Subsystem (PGPS) supporting an object-division mode of parallel operation including at least three stages, namely, decomposition, distribution and recomposition.
  • the Parallel Graphics Processing Subsystem includes a Decomposition Module for supporting the decomposition stage of parallel operation, a Distribution Module for supporting the distribution stage of parallel operation; a Rendering Module for supporting the rendering stage of parallel operation, and a Recomposition Module for supporting the recomposition stage of parallel operation.
  • the Parallel Graphics Processing Subsystem also includes: (i) a plurality of graphic processing pipelines (GPPLs), including a primary GPPL, wherein each GPPL includes a color frame buffer and Z depth buffer; and (ii) a parallel mode control module (PMCM) for automatically controlling the object-division mode of parallel operation during the run-time of the graphics-based application, during which the GPPLs are driven in a parallelized manner.
  • the Parallel Mode Control Module (PMCM) 8201, the Decomposition Module 8202 and the Distribution Module 8203 of the Parallel Graphics Processing Subsystem reside as a software package in the Host Memory Space (HMS) 8200.
  • the Vendor's GPU drivers 8223 reside on HMS 8200, along with the Graphics Applications 8221 and the Standard Graphics Library 8222.
  • a first cluster of the CPU cores on a multi-core CPU chip functions as the CPU.
  • a second cluster of the CPU cores functions as a plurality of multi-core graphics pipelines (GPPLs).
  • the Rendering Module and the Re-composition Module are realized across a plurality of the GPUs on the external graphics cards.
  • Some of the GPPLs implemented by the CPU cores may participate in the implementation of the Rendering and/or Recomposition Modules.
  • the Decomposition Module divides (i.e. splits up) the stream of graphic commands and data (GCAD) according to the parallelization mode that is implemented using an embodiment of the parallel multi-pass graphics processing method of the present invention, which may be selected from the group of processes illustrated in FIGS. 7A, 7B, 7C or 7D.
  • the Distribution Module uses the bridge circuit and interconnect network within the multi-core CPU chip to distribute graphic commands and data (GCAD) to the multi-core graphic pipelines implemented on the multi-core CPU chip, as well as the GPUs on the external graphics cards.
  • the Rendering Module generates complementary-type partial color images according to the multi-pass parallel graphics processing method of the present invention being used, e.g. as illustrated in FIGS. 7A through 7D.
  • the Recomposition Module uses inter-GPU communication transport as well as the bridge and interconnect network within the multi-core CPU chip to transfer the pixel data of the complementary-type partial images among the GPPLs during the image recomposition stages.
  • the complementary-type partial color images are recomposited using the depthless image merging process of the present invention so as to generate a complete image frame of the 3D scene for display on the display device connected to the primary GPPL (i.e. GPU) via a display interface.
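Since this embodiment mixes GPUs with pipelines built from CPU cores, it may help to model a GPPL independently of the hardware behind it. The following is a rough data-structure sketch under that reading. `PipelineKind`, `GPPL` and `primary_of` are hypothetical names, and the buffer layout (nested lists, black background, infinite initial depth) is an illustrative assumption rather than anything specified in the text.

```python
from dataclasses import dataclass, field
from enum import Enum


class PipelineKind(Enum):
    GPU = "gpu"              # a GPU on an external graphics card
    CPU_CORES = "cpu-cores"  # a cluster of CPU cores acting as a graphics pipeline


@dataclass
class GPPL:
    """A graphics processing pipeline with its own color frame buffer and Z depth buffer."""
    name: str
    kind: PipelineKind
    width: int
    height: int
    primary: bool = False
    color_buffer: list = field(init=False, repr=False, default_factory=list)
    z_buffer: list = field(init=False, repr=False, default_factory=list)

    def __post_init__(self):
        # Every pipeline gets private buffers, whatever hardware implements it.
        self.color_buffer = [[(0, 0, 0)] * self.width for _ in range(self.height)]
        self.z_buffer = [[float("inf")] * self.width for _ in range(self.height)]


def primary_of(pipelines):
    """Return the single primary GPPL, i.e. the one that drives the display."""
    primaries = [p for p in pipelines if p.primary]
    if len(primaries) != 1:
        raise ValueError("exactly one primary GPPL is expected")
    return primaries[0]
```

A configuration for this embodiment might then list one primary GPU-backed GPPL alongside several CPU-core GPPLs, with the final merged frame written to the primary pipeline's color buffer.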
  • FIG. 8D2 shows a fifth illustrative embodiment of the PGPS of the present invention, embodied within a host computing system capable of parallelizing the operation of multiple graphics processing pipelines (GPPLs).
  • the computing system comprises: CPU memory space for storing one or more graphics-based applications, and a graphics library for generating a stream of graphics commands and data (GCAD) during the execution of the graphics-based applications; one or more CPUs, in communication with the memory space, for (i) executing the graphics-based applications, (ii) generating the stream of graphics commands and data, and (iii) segmenting the stream of graphics commands into frames for rendering pixel-based images of a 3D scene generated by the graphics-based application, and wherein objects within the 3D scene are generated by processing the frames of graphics commands and data along the stream; a Parallel Graphics Processing Subsystem (PGPS) supporting an object-division mode of parallel operation including at least three stages, namely, decomposition, distribution and recomposition.
  • the Parallel Graphics Processing Subsystem includes a Decomposition Module for supporting the decomposition stage of parallel operation, a Distribution Module for supporting the distribution stage of parallel operation; a Rendering Module for supporting the rendering stage of parallel operation, and a Recomposition Module for supporting the recomposition stage of parallel operation.
  • the Parallel Graphics Processing Subsystem also includes: (i) a plurality of graphic processing pipelines (GPPLs), including a primary GPPL, wherein each GPPL includes a color frame buffer and Z depth buffer; and (ii) a parallel mode control module (PMCM) for automatically controlling the object-division mode of parallel operation during the run-time of the graphics-based application, during which the GPPLs are driven in a parallelized manner.
  • the Parallel Mode Control Module (PMCM) and the Decomposition and Distribution Modules of the Parallel Graphics Processing Subsystem reside as a software package in the Host Memory Space (HMS) of the CPU on the motherboard.
  • the Vendor's GPU drivers also reside on HMS, along with the Graphics Applications, and the Standard Graphics Library.
  • a first cluster of CPU cores on the multi-core CPU chips on the external graphics cards functions as GPPLs and implements the Re-composition Module across a plurality of the GPPLs.
  • a second cluster of CPU cores functions as GPPLs and implements the Rendering Module.
  • the Decomposition Module divides (i.e. splits up) the stream of graphic commands and data (GCAD) according to the required parallelization mode.
  • the Distribution Module uses the North bridge circuit and interconnect networks within the multi-core CPU chips (on the external cards) to distribute graphic commands and data (GCAD) to the multi-core graphic pipelines implemented thereon.
  • the Rendering Module generates complementary-type partial color images according to a multi-pass parallel graphics processing method of the present invention.
  • the Recomposition Module uses interconnect networks within the multi-core CPU chips to transfer the pixel data of the complementary-type partial images among the GPPLs during the image recomposition stages.
  • the complementary-type partial color images are recomposited using the depthless image merging process of the present invention so as to generate a complete image frame of the 3D scene for display on the display device connected to the primary GPPL, via a display interface.
  • FIG. 8E shows a sixth illustrative embodiment of the MMPGRS of the present invention, embodied within a host computing system capable of parallelizing the operation of multiple graphics processing pipelines (GPPLs).
  • the computing system comprises: CPU memory space for storing one or more graphics-based applications, and a graphics library for generating a stream of graphics commands and data (GCAD) during the execution of the graphics-based applications; one or more CPUs, in communication with the memory space, for (i) executing the graphics-based applications, (ii) generating the stream of graphics commands and data, and (iii) segmenting the stream of graphics commands into frames for rendering pixel-based images of a 3D scene generated by the graphics-based application, and wherein objects within the 3D scene are generated by processing the frames of graphics commands and data along the stream; a Parallel Graphics Processing Subsystem (PGPS) supporting an object-division mode of parallel operation including at least three stages, namely, decomposition, distribution and recomposition.
  • the Parallel Graphics Processing Subsystem includes a Decomposition Module for supporting the decomposition stage of parallel operation, a Distribution Module for supporting the distribution stage of parallel operation; a Rendering Module for supporting the rendering stage of parallel operation, and a Recomposition Module for supporting the recomposition stage of parallel operation.
  • the Parallel Graphics Processing Subsystem also includes: (i) a plurality of graphic processing pipelines (GPPLs), including a primary GPPL, wherein each GPPL includes a color frame buffer and Z depth buffer; and (ii) a parallel mode control module (PMCM) for automatically controlling the object-division mode of parallel operation during the run-time of the graphics-based application, during which the GPPLs are driven in a parallelized manner.
  • the Parallel Mode Control Module (PMCM) and the Decomposition Submodule No. 1 reside as a software package in the Host or CPU Memory Space (HMS).
  • the Vendor's GPU drivers also reside on HMS, along with the Graphics Applications, and the Standard Graphics Library.
  • the Decomposition Submodule No. 2 and Distribution Module are realized within a single graphics hub device (e.g. chip) that is connected to (i) the bridge circuit on the motherboard, via a PCI-express interface, and (ii) a cluster of external GPUs via the interconnect network within the graphics hub chip.
  • the GPUs are used to implement the Rendering and Recomposition Modules and are driven in a parallelized manner under the control of the PMCM.
  • the Decomposition Submodule No. 1 transfers graphic commands and data (GCAD) to the Decomposition Submodule No. 2 via the bridge circuit.
  • the Decomposition Submodule No. 2 divides (i.e. splits up) the stream of graphic commands and data (GCAD) according to the parallelization mode that is implemented using an embodiment of the parallel multi-pass graphics processing method of the present invention, which may be selected from the group of processes illustrated in FIGS. 7A, 7B, 7C or 7D.
  • the Distribution Module distributes graphic commands and data (GCAD) to the external GPUs.
  • the Rendering Module generates complementary-type partial color images according to the multi-pass parallel graphics processing method of the present invention being used, e.g. as illustrated in FIGS. 7A through 7D.
  • the Recomposition Module uses inter-GPU communication transport to transfer the pixel data of the complementary-type partial images among the GPUs during the image recomposition stages. Finally, the complementary-type partial color images are recomposited using the depthless image merging process of the present invention so as to generate a complete image frame of the 3D scene for display on the display device connected to the primary GPU on the graphical display card.
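The split of the decomposition work between the host and the graphics hub can be sketched as two cooperating objects. This is a simplified illustration under stated assumptions: the class names `HostSubmodule1` and `HubSubmodule2` are hypothetical, a command is modeled as a `(kind, payload)` tuple, draw commands are assigned to GPUs round-robin, and every non-draw (state) command is broadcast to all GPUs. None of these policies is taken from the text, which only says that Submodule No. 1 forwards the GCAD stream and that Submodule No. 2 divides it.

```python
from collections import deque


class HubSubmodule2:
    """Stands in for Decomposition Submodule No. 2 plus the Distribution Module in the hub."""

    def __init__(self, num_gpus):
        self.gpu_queues = [deque() for _ in range(num_gpus)]
        self._next = 0  # round-robin cursor over the GPUs (assumed policy)

    def receive(self, frame_commands):
        for kind, payload in frame_commands:
            if kind == "draw":
                # Object division: each draw call goes to exactly one GPU.
                target = self._next % len(self.gpu_queues)
                self.gpu_queues[target].append((kind, payload))
                self._next += 1
            else:
                # Assumption: state changes must be seen by every GPU.
                for queue in self.gpu_queues:
                    queue.append((kind, payload))


class HostSubmodule1:
    """Stands in for Decomposition Submodule No. 1 running in host memory space."""

    def __init__(self, hub):
        self.hub = hub

    def submit_frame(self, frame_commands):
        # In the described system this hop crosses the bridge circuit; here it is a call.
        self.hub.receive(list(frame_commands))
```

Keeping the division logic in the hub object means the host side only forwards whole frames, which mirrors the way this embodiment moves per-GPU splitting off the CPU.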
  • FIG. 8F shows a seventh illustrative embodiment of the PGPS of the present invention, embodied within a host computing system capable of parallelizing the operation of multiple graphics processing pipelines (GPPLs).
  • the computing system comprises: CPU memory space for storing one or more graphics-based applications, and a graphics library for generating a stream of graphics commands and data (GCAD) during the execution of the graphics-based applications; one or more CPUs, in communication with the memory space, for (i) executing the graphics-based applications, (ii) generating the stream of graphics commands and data, and (iii) segmenting the stream of graphics commands into frames for rendering pixel-based images of a 3D scene generated by the graphics-based application, and wherein objects within the 3D scene are generated by processing the frames of graphics commands and data along the stream; a Parallel Graphics Processing Subsystem (PGPS) supporting an object-division mode of parallel operation including at least four stages, namely, decomposition, distribution, rendering, and recomposition.
  • the Parallel Graphics Processing Subsystem includes a Decomposition Module for supporting the decomposition stage of parallel operation, a Distribution Module for supporting the distribution stage of parallel operation; a Rendering Module for supporting the rendering stage of parallel operation, and a Recomposition Module for supporting the recomposition stage of parallel operation.
  • the Parallel Graphics Processing Subsystem also includes: (i) a plurality of graphic processing pipelines (GPPLs), including a primary GPPL, wherein each GPPL includes a color frame buffer and Z depth buffer; and (ii) a parallel mode control module (PMCM) for automatically controlling the object-division mode of parallel operation during the run-time of the graphics-based application, during which the GPPLs are driven in a parallelized manner.
  • the Parallel Mode Control Module (PMCM) (including the Distribution Management Submodule) and the Decomposition Module reside as a software package in the Host Memory Space (HMS) of the host computing system.
  • the Vendor's GPU drivers also reside on HMS, along with the Graphics Applications, and the Standard Graphics Library.
  • the Distribution Module and its interconnect transport are realized within a single “reduced” graphics hub device (e.g. chip) that is connected to the bridge circuit of the host computing system, and a cluster of external GPUs implementing the Rendering and Recomposition Modules, and are driven in a parallelized manner under the control of the PMCM.
  • the Decomposition Module divides (i.e. splits up) the stream of graphic commands and data (GCAD) according to the parallelization mode that is implemented using an embodiment of the parallel multi-pass graphics processing method of the present invention, which may be selected from the group of processes illustrated in FIGS. 7A, 7B, 7C or 7D.
  • the Distribution Management Submodule within the PMCM distributes the graphic commands and data (GCAD) to the external GPUs via the bridge circuit and interconnect transport mechanism.
  • the Rendering Module generates complementary-type partial color images according to the multi-pass parallel graphics processing method of the present invention being used, e.g. as illustrated in FIGS. 7A through 7D.
  • the Recomposition Module uses inter-GPU communication transport to transfer the pixel data of the complementary-type partial images among the GPUs during the image recomposition stages. Finally, the complementary-type partial color images are recomposited using the depthless image merging process of the present invention so as to generate a complete image frame of the 3D scene for display on the display device connected to the primary GPU on the graphical display card(s).
  • FIG. 8G shows an eighth illustrative embodiment of the PGPS of the present invention, embodied within a host computing system capable of parallelizing the operation of multiple graphics processing pipelines (GPPLs).
  • the computing system comprises: CPU memory space for storing one or more graphics-based applications, and a graphics library for generating a stream of graphics commands and data (GCAD) during the execution of the graphics-based applications; one or more CPUs, in communication with the memory space, for (i) executing the graphics-based applications, (ii) generating the stream of graphics commands and data, and (iii) segmenting the stream of graphics commands into frames for rendering pixel-based images of a 3D scene generated by the graphics-based application, and wherein objects within the 3D scene are generated by processing the frames of graphics commands and data along the stream; a Parallel Graphics Processing Subsystem (PGPS) supporting an object-division mode of parallel operation including at least four stages, namely, decomposition, distribution, rendering, and recomposition.
  • the Parallel Graphics Processing Subsystem includes a Decomposition Module for supporting the decomposition stage of parallel operation, a Distribution Module for supporting the distribution stage of parallel operation; a Rendering Module for supporting the rendering stage of parallel operation, and a Recomposition Module for supporting the recomposition stage of parallel operation.
  • the Parallel Graphics Processing Subsystem also includes: (i) a plurality of graphic processing pipelines (GPPLs), including a primary GPPL, wherein each GPPL includes a color frame buffer and Z depth buffer; and (ii) a parallel mode control module (PMCM) for automatically controlling the object-division mode of parallel operation during the run-time of the graphics-based application, during which the GPPLs are driven in a parallelized manner.
  • the Parallel Mode Control Module (PMCM) and the Decomposition Submodule No. 1 reside as a software package in the Host Memory Space (HMS).
  • the Vendor's GPU drivers also reside on HMS, along with the Graphics Applications, and the Standard Graphics Library.
  • the Decomposition Submodule No. 2 and the Distribution Module are realized (as a graphics hub) within a bridge circuit on the motherboard within the host computing system.
  • the Rendering Module and the Recomposition Module are implemented by a plurality of GPUs which are driven in a parallelized manner under the control of the PMCM.
  • the Decomposition Submodule No. 1 transfers graphics commands and data (GCAD) to the Decomposition Submodule No. 2.
  • the Decomposition Submodule No. 2 divides (i.e. splits up) the stream of graphic commands and data (GCAD) according to the parallelization mode that is implemented using an embodiment of the parallel multi-pass graphics processing method of the present invention, which may be selected from the group of processes illustrated in FIGS. 7A, 7B, 7C or 7D.
  • the Distribution Module distributes the graphic commands and data (GCAD) to the internal GPU and external GPUs.
  • the Rendering Module generates complementary-type partial color images according to the multi-pass parallel graphics processing method of the present invention being used, e.g. as illustrated in FIGS. 7A through 7D.
  • the Recomposition Module uses inter-GPU communication transport to transfer the pixel data of the complementary-type partial images among the GPUs during the image recomposition stages. Finally, the complementary-type partial color images are recomposited using the depthless image merging process of the present invention so as to generate a complete image frame of the 3D scene for display on the display device connected to the external graphics card connected to the hybrid CPU/GPU chip via a PCI-express interface.
  • FIG. 8H shows a ninth illustrative embodiment of the PGPS of the present invention, embodied within a host computing system capable of parallelizing the operation of multiple graphics processing pipelines (GPPLs).
  • the computing system comprises: CPU memory space for storing one or more graphics-based applications, and a graphics library for generating a stream of graphics commands and data (GCAD) during the execution of the graphics-based applications; one or more CPUs, in communication with the memory space, for (i) executing the graphics-based applications, (ii) generating the stream of graphics commands and data, and (iii) segmenting the stream of graphics commands into frames for rendering pixel-based images of a 3D scene generated by the graphics-based application, and wherein objects within the 3D scene are generated by processing the frames of graphics commands and data along the stream; a Parallel Graphics Processing Subsystem (PGPS) supporting an object-division mode of parallel operation including at least four stages, namely, decomposition, distribution, rendering, and recomposition.
  • the Parallel Graphics Processing Subsystem includes a Decomposition Module for supporting the decomposition stage of parallel operation, a Distribution Module for supporting the distribution stage of parallel operation; a Rendering Module for supporting the rendering stage of parallel operation, and a Recomposition Module for supporting the recomposition stage of parallel operation.
  • the Parallel Graphics Processing Subsystem also includes: (i) a plurality of graphic processing pipelines (GPPLs), including a primary GPPL, wherein each GPPL includes a color frame buffer and Z depth buffer; and (ii) a parallel mode control module (PMCM) for automatically controlling the object-division mode of parallel operation during the run-time of the graphics-based application, during which the GPPLs are driven in a parallelized manner.
  • the Parallel Mode Control Module (PMCM) and the Decomposition Submodule No. 1 reside as a software package in the Host Memory Space (HMS).
  • the Vendor's GPU drivers also reside on HMS, along with the Graphics Applications, and the Standard Graphics Library.
  • the Decomposition Submodule No. 2 and the Distribution Module are realized (as a graphics hub) on the processor die of a hybrid CPU/GPU fusion-architecture chip on the motherboard, and having one or more GPUs driven with one or more GPUs on external graphics card(s) (connected to the CPU/GPU chip via the interconnect) in a parallelized manner under the control of the PMCM.
  • the GPUs on the external graphics card are used to implement the Rendering and Recomposition Modules.
  • the GPUs within the hybrid chip may assist in implementing the Rendering and/or Recomposition Modules.
  • the Decomposition Submodule No. 1 transfers graphics commands and data (GCAD) to the Decomposition Submodule No. 2.
  • the Decomposition Submodule No. 2 divides (i.e. splits up) the stream of graphic commands and data (GCAD) according to the parallelization mode that is implemented using an embodiment of the parallel multi-pass graphics processing method of the present invention, which may be selected from the group of processes illustrated in FIGS. 7A, 7B, 7C or 7D.
  • the Distribution Module distributes the graphic commands and data (GCAD) to the internal GPU and external GPUs.
  • the Rendering Module generates complementary-type partial color images according to the multi-pass parallel graphics processing method of the present invention being used, e.g. as illustrated in FIGS. 7A through 7D.
  • the Recomposition Module uses inter-GPU communication transport to transfer the pixel data of the complementary-type partial images among the GPUs during the image recomposition stages. Finally, the complementary-type partial color images are recomposited using the depthless image merging process of the present invention so as to generate a complete image frame of the 3D scene for display on the display device connected to the primary GPU on the external graphics card connected to the hybrid CPU/GPU chip via a PCI-express interface.
  • FIG. 8I shows a tenth illustrative embodiment of the PGPS of the present invention, embodied within a game console system capable of parallelizing the operation of multiple graphics processing pipelines (GPPLs).
  • the game console system comprises: CPU memory space for storing one or more graphics-based applications, and a graphics library for generating a stream of graphics commands and data (GCAD) during the execution of the graphics-based applications; a multi-core CPU chip with multiple CPU-cores, in communication with the memory space, for (i) executing the graphics-based applications, (ii) generating the stream of graphics commands and data, and (iii) segmenting the stream of graphics commands into frames for rendering pixel-based images of a 3D scene generated by the graphics-based application, and wherein objects within the 3D scene are generated by processing the frames of graphics commands and data along the stream; a Parallel Graphics Processing Subsystem (PGPS) supporting an object-division mode of parallel operation including at least four stages, namely, decomposition, distribution, rendering, and recomposition.
  • the Parallel Graphics Processing Subsystem includes a Decomposition Module for supporting the decomposition stage of parallel operation, a Distribution Module for supporting the distribution stage of parallel operation; a Rendering Module for supporting the rendering stage of parallel operation, and a Recomposition Module for supporting the recomposition stage of parallel operation.
  • the Parallel Graphics Processing Subsystem also includes: (i) a graphics hub with an interconnect network, (ii) a plurality of graphic processing pipelines (GPPLs), including a primary GPPL, wherein each GPPL includes a color frame buffer and Z depth buffer; and (iii) a parallel mode control module (PMCM) for automatically controlling the object-division mode of parallel operation during the run-time of the graphics-based application, during which the GPPLs are driven in a parallelized manner.
  • the Parallel Mode Control Module (PMCM) and the Decomposition Submodule No. 1 are realized as a software package within the Host Memory Space (HMS).
  • the Vendor's GPU drivers also reside on HMS, along with the Graphics Applications, and the Standard Graphics Library.
  • the Decomposition Submodule No. 2 and the Distribution Module are realized as a graphics hub semiconductor chip within the game console system, whereas the Rendering and Recomposition Modules are implemented by multiple GPPLs supported on the game console board and driven in a parallelized manner under the control of the PMCM.
  • the Decomposition Submodule No. 1 transfers graphics commands and data (GCAD) to the Decomposition Submodule No. 2, via the memory controller on the multi-core CPU chip and the interconnect in the graphics hub chip of the present invention.
  • the Decomposition Submodule No. 2 divides (i.e. splits up) the stream of graphic commands and data (GCAD) according to the parallelization mode that is implemented using an embodiment of the parallel multi-pass graphics processing method of the present invention, which may be selected from the group of processes illustrated in FIGS. 7A, 7B, 7C or 7D.
  • the Distribution Module distributes the graphic commands and data (GCAD) to the multiple GPUs.
  • the Rendering Module generates complementary-type partial color images according to the multi-pass parallel graphics processing method of the present invention being used, e.g. as illustrated in FIGS. 7A through 7D.
  • the Recomposition Module uses inter-GPU communication transport to transfer the pixel data of the complementary-type partial images among the GPUs during the image recomposition stages.
  • the complementary-type partial color images are recomposited using the depthless image merging process of the present invention so as to generate a complete image frame of the 3D scene for display on the display device connected to the primary GPU via an analog display interface.
  • the depthless image recomposition process of the present invention is simple and inexpensive to implement. It also offers a number of advantages over the recomposition methods associated with “classical modes” of object division, which are based on depth comparison and therefore carry high processing requirements, high bandwidth requirements, and the additional cost of recompositing hardware.
  • the depthless image recomposition process of the present invention does not involve any depth comparison, and merges the partial complementary images in the color buffers using a simple depth-less puzzle-like merging operation.
  • the method of the present invention eliminates obstructed objects in the early stages of multi-pass rendering operations. The more passes there are, the greater the aggregate savings.
  • the anti-aliasing in a GPU is based on processing the edge pixels against their background, even though this background may turn out to be hidden and be replaced by the background of another GPU in the final image.
  • the result in classical modes of object division is an incorrect image at the “stitched” boundaries.
  • the method of the present invention eliminates the hidden background during rendering process at each GPU, and pixels are always anti-aliased against their final background.
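  • The depthless, puzzle-like merging operation described above can be illustrated with a minimal sketch. It assumes that each partial image is a flat list of integer color codes, that 0 stands for a black (not drawn) pixel, and that merging is a per-pixel addition; the function names, the color codes and the sequential folding order are hypothetical and are not taken from the specification.

```python
# Sketch only: names and the "0 means black / not drawn" convention are assumptions.
BLACK = 0


def merge_complementary(target, source):
    """Merge two complementary partial images by per-pixel addition."""
    if len(target) != len(source):
        raise ValueError("images must have the same number of pixels")
    merged = []
    for t, s in zip(target, source):
        # Complementary images never both colour the same pixel,
        # so no depth comparison is needed to decide which one wins.
        if t != BLACK and s != BLACK:
            raise ValueError("images are not complementary")
        merged.append(t + s)
    return merged


def recompose(partials):
    """Fold all partial images into a copy of the first (primary) one."""
    frame = list(partials[0])
    for partial in partials[1:]:
        frame = merge_complementary(frame, partial)
    return frame


# One 6-pixel scanline per GPU; 7, 8 and 9 are hypothetical colour codes.
gpu1 = [7, 7, 0, 0, 0, 0]
gpu2 = [0, 0, 0, 8, 8, 0]
gpu3 = [0, 0, 9, 0, 0, 0]
frame = recompose([gpu1, gpu2, gpu3])
print(frame)
```

  • Because every pixel is colored in at most one partial image, the merge never consults a Z buffer; only color data needs to be transferred between GPUs.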

Abstract

A computing system supporting parallel 3D graphics processes based on the division of objects in 3D scenes. The computing system includes (i) a CPU memory space for storing one or more graphics-based applications and a graphics library for generating graphics commands and data (GCAD) during the run-time of the graphics-based applications, (ii) one or more CPUs for executing the graphics-based applications, and (iii) a parallel graphics processing system (PGPS) having multiple graphics processing pipelines (GPPLs), supporting object division based parallelism among the GPPLs, and performing pixel depth value comparison within each GPPL using a common global depth map (GDM) during pixel rendering processing.

Description

    CROSS-REFERENCE TO RELATED CASES
  • The present application is Continuation of U.S. application Ser. No. 12/077,072 filed Mar. 14, 2008; which is a Continuation-in-Part (CIP) of the following Applications: U.S. application Ser. No. 11/897,536 filed Aug. 30, 2007; International Application Serial No. PCT/US07/26466 filed Dec. 28, 2007; U.S. application Ser. No. 11/789,039 filed Apr. 23, 2007; U.S. application Ser. No. 11/655,735 filed Jan. 18, 2007, International Application Serial No. PCT/IB07/03464 filed Jan. 18, 2007; which is based on Provisional Application Ser. No. 60/759,608 filed Jan. 18, 2006; U.S. application Ser. No. 11/648,160 filed Dec. 31, 2006; U.S. application Ser. No. 11/386,454 filed Mar. 22, 2006; U.S. application Ser. No. 11/340,402 filed Jan. 25, 2006; which is based on Provisional Application Ser. No. 60/647,146 filed Jan. 25, 2005; International Application Serial No. PCT/IB06/01529 filed Jan. 25, 2006; U.S. application Ser. No. 10/579,682 filed May 17, 2006, which is a National Stage Entry of International Application Serial No. PCT/IL2004/001069 filed Nov. 19, 2004, which is based on Provisional Application Ser. No. 60/523,084 filed Nov. 19, 2003; each said Patent Application being commonly owned by Lucid Information Technology, Ltd., and being incorporated herein by reference as if set forth fully herein.
  • BACKGROUND OF INVENTION
  • 1. Field of Invention
  • The present invention relates generally to the field of 3D computer graphics rendering, and more particularly, to ways of and means for improving the performance of parallel graphics processes running on 3D parallel graphics processing systems supporting the decomposition of 3D scene objects among its multiple graphics processing pipelines (GPPLs).
  • 2. Brief Description of the State of Knowledge in the Art
  • Applicants' U.S. application Ser. No. 11/897,536 filed Aug. 30, 2007, incorporated herein by reference, in its entirety, discloses diverse kinds of PC-level computing systems embodying different types of parallel graphics rendering subsystems (PGRSs) with graphics processing pipelines (GPPLs) generally illustrated in FIG. 1A. The multi-pipeline architecture of such systems can be realized using GPU-based GPPLs of classical design, as shown in FIG. 1B, or alternatively, using more advanced GPU-based GPPLs, compliant with the DirectX 10 standard, as shown in FIG. 1C. Alternatively, the multi-pipeline architecture of such systems can be realized using multi-core CPU based GPPLs as shown in FIG. 1D.
  • In general, such graphics-based computing systems support multiple modes of graphics rendering parallelism across their GPPLs, including time, image and object division modes, which can be adaptively and dynamically switched into operation during the run-time of any graphics application running on the host computing system. While each mode of parallel operation has its advantages, as described in U.S. application Ser. No. 11/897,536 filed Aug. 30, 2007, supra, the object division mode of parallel operation is particularly helpful during the running of interactive gaming applications because this mode has the potential of resolving many bottleneck conflicts which naturally accompany such demanding applications.
  • During the object division mode of parallel operation, supported on a parallel graphics rendering system, for example, of the type disclosed in Applicants' U.S. application Ser. No. 11/897,536 filed Aug. 30, 2007, objects within a 3D scene (i.e. graphics data and commands representative thereof) are (i) automatically decomposed based on specified criteria, and assigned/designated to particular GPUs, and (ii) distributed to the assigned/designated GPUs, so that the GPUs can render partial images of the 3D scene, based on the assigned/designated objects distributed thereto during parallel rendering operations, and ultimately, so that these partial image fragments can be re-composited in a final color frame buffer (FB) of the primary GPPL, for display on one or more visual display devices.
  • During conventional image recomposition processes, supported on parallel graphics rendering platforms operating in the object-division mode of parallelism, the pixel depth or z values of objects' images within the 3D scene must be analyzed/compared, during each image frame, against (i) the pixel depth values of other objects' images (which may be occluding a particular object during rendering), as well as (ii) the rear or background clipping plane represented within the 3D scene. This depth-based image recomposition process is illustrated in FIGS. 1E1, 1E2 and 1E3, which show how local depth maps of objects assigned to particular GPUs are constructed within each GPU, and are used during the recomposition of partial image fragments generated within the color frame buffer of each GPU during the final stage of the object-division (OD) based image recomposition process.
  • FIG. 1E1 shows a simple scene, as processed by conventional prior art Object Division, comprising three objects, A, B and C. An exemplary decomposition of this scene can be done by sending object A for rendering to GPU 1, and objects B and C to GPU 2. FIGS. 1E2 and 1E3 show the color and Z buffers created by the prior art Object Division method. From the given View Point, object B (of GPU 2) is obstructed by object A (of GPU 1). The Z-buffers of GPU 1 and GPU 2 each hold a local depth map, constructed only from the objects designated to that GPU. Each GPU is unaware of objects rendered by the other GPU; therefore such objects are not reflected in its Z-buffer.
  • Clearly, use of the object-division mode of graphics parallelism has a number of important advantages over the other methods of parallel graphics rendering, for example: (i) responsiveness to user interface inputs; (ii) parallelization of the entire 3D graphics pipeline including the vertex as well as pixel parts thereof; (iii) the reduction of CPU-GPU transfer load; and (iv) the reduction of GPU memory requirements. However, the OD mode of graphics parallelism suffers from a number of inherent shortcomings and drawbacks.
  • In particular, the object-division mode of parallelism requires a complex and intensive process of merging a plurality of partial image fragments buffered in color frame buffers (FBs), utilizing depth-based information stored in the Z buffers of the GPUs and involving depth-based comparisons on a pixel-by-pixel basis, resulting in substantial time delays, significant bandwidth consumption, and high hardware costs.
  • Also, objects being rendered at each GPU, that are obstructed by objects rendered by other GPUs, are processed for rendering (i.e. drawn) as if these objects were visible. Although these redundant portions are eliminated during the final image re-composition process, using depth-based comparisons, such redundant processing operations greatly decrease the efficiency of the object-division mode of parallelism.
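  • The prior art behaviour described above (local depth maps, pixel-by-pixel depth comparison during recomposition, and redundant drawing of obstructed objects) can be illustrated with a minimal sketch. The one-dimensional scanline, the object tuples, the function names and the convention that a smaller z value is closer to the viewer are all assumptions made for illustration only.

```python
FAR = float("inf")   # depth of an empty pixel (assumed: smaller z is closer)
BACKGROUND = "."


def render_local(objects, width):
    """Render the objects assigned to one GPU using only its local Z buffer."""
    color = [BACKGROUND] * width
    depth = [FAR] * width
    for name, z, start, end in objects:
        for x in range(start, end):
            if z < depth[x]:
                color[x], depth[x] = name, z
    return color, depth


def depth_composite(buffers, width):
    """Prior-art style recomposition: per-pixel depth comparison across GPUs."""
    out = []
    for x in range(width):
        best_color, best_z = BACKGROUND, FAR
        for color, depth in buffers:
            if depth[x] < best_z:
                best_color, best_z = color[x], depth[x]
        out.append(best_color)
    return out


# Scene from the text: A (GPU 1) obstructs B (GPU 2); C is unobstructed.
WIDTH = 8
gpu1 = render_local([("A", 1.0, 1, 4)], WIDTH)
gpu2 = render_local([("B", 3.0, 2, 4), ("C", 2.0, 5, 7)], WIDTH)
final = depth_composite([gpu1, gpu2], WIDTH)

# Pixels of B that GPU 2 drew in full but that never reach the final image.
wasted = sum(1 for x in range(WIDTH) if gpu2[0][x] == "B" and final[x] != "B")
print("".join(final), wasted)
```

  • In this sketch GPU 2 cannot know that object B is hidden, so it draws B and the work is discarded only when both Z buffers are compared during recomposition.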
  • When the anti-aliasing (AA) mode is operating during the object-division mode of parallelism, each GPU performs the correct anti-aliasing of its image fragments. However, some objects that are anti-aliased with their current background will become extrinsic to their new background when composed into the final image.
  • In many graphics applications, there are different lighting sources, and multi-pass applications must render the same scene geometry several times (i.e. passes), typically for different lighting calculations. Thus, the final color of a pixel of an object will be determined by blending together the results of all of the partial rendering passes. When using the object-division mode of parallelism, this multi-pass rendering increases the complexity of the image re-composition process due to the additional dependency on the stencil buffer, which operates on top of the Z buffer within each GPU.
  • In view, therefore, of the above, there is a great need in the art for an improved method of and apparatus for carrying out parallel 3D graphics processing, while avoiding the shortcomings and drawbacks of the prior art apparatus and methodologies.
  • OBJECTS AND SUMMARY OF THE PRESENT INVENTION
  • Accordingly, a primary object of the present invention is to provide a new and improved method of and apparatus for practicing parallel 3D graphics processes in modern multiple-GPU based computer graphics systems, based on the division of objects in 3D scenes, among multiple graphics processing pipelines (GPPLs), while avoiding the shortcomings and drawbacks associated with prior art apparatus and methodologies.
  • Another object of the present invention is to provide a novel parallel graphics processing system (PGPS) embodied within a host computing system having (i) host memory space (HMS) for storing one or more graphics-based applications and a graphics library for generating graphics commands and data (GCAD) during the run-time (i.e. execution) of the graphics-based application, (ii) one or more CPUs for executing said graphics-based applications, and (iii) a display device for displaying images containing graphics during the execution of said graphics-based applications.
  • Another object of the present invention is to provide improved PC-level computing systems and architectures employing the parallel graphics processing technique of the present invention.
  • Another object of the present invention is to provide a parallel graphics processing subsystem supporting object division based parallelism among its GPPLs (e.g. GPU-based GPPLs), and performing pixel depth value comparison within each GPU using a common global depth map (GDM) during the pixel rendering process, in contrast to conventional approaches involving the use of Z-buffer comparisons during the final phase of image recomposition.
  • Another object of the present invention is to provide a novel method of parallel graphics processing based on object division parallelism among a plurality of GPPLs, and employing a global depth map (GDM), created by the graphics application, for use in z-depth tests during the pixel rendering process, and eliminating the shortcoming of z-buffer comparisons of all GPUs in regular object division.
  • Another object of the present invention is to provide a method of recompositing partial complementary-type images within multiple GPPLs.
  • Another object of the present invention is to provide a method of generating partial complementary-type images within multiple GPPLs.
  • Another object of the present invention is to provide a method of generating global depth maps (GDMs) within multiple GPPLs.
  • Another object of the present invention is to provide a method of generating global depth maps (GDMs) within multiple GPPLs using GDMs created during a first GDM pass of a multi-pass parallel graphics processing method.
  • Another object of the present invention is to provide a method of generating global depth maps (GDMs) within multiple GPPLs during a color-based pixel rendering process.
  • Another object of the present invention is to provide a method of providing global depth maps (GDMs) within multiple GPPLs, generated during a graphics application.
  • Another object of the present invention is to provide a method of generating images using a depthless image recomposition process within multiple GPPLs.
  • Another object of the present invention is to provide a novel Z-buffering mechanism for use in compositing a 3D scene in a 3D parallel graphics rendering system, comprising a (color) frame buffer (memory) having a color value for each pixel, and a z-buffer with the same number of entries for storing a z-value for each pixel in the frame buffer; wherein the z-buffer is initialized to zero, representing the z-value at the back clipping plane of the 3D scene, wherein the frame buffer is initialized to the background color, and wherein the largest value that can be stored in the z-buffer represents the z value of the front clipping plane.
  • Another object of the present invention is to provide such a novel Z-buffering mechanism wherein polygons composing the 3D scene are scan converted into the frame buffer in an arbitrary order, and wherein during the scan-conversion process, if the polygon being scan converted at point (x,y) is no farther from the viewer than the point whose color and depth are currently in the buffers, then the color and depth values of the new point are used to replace the old color and depth values stored at the point (x,y).
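  • The Z-buffering rule stated in the two preceding paragraphs can be illustrated with a minimal sketch: the z-buffer starts at zero (back clipping plane), the frame buffer starts at the background color, and a scan-converted point replaces the stored values when it is no farther from the viewer. The class name, the method name and the 8-bit z range are assumptions made for illustration only.

```python
Z_BACK = 0      # z-buffer initial value: the back clipping plane, per the text
Z_FRONT = 255   # assumed 8-bit range; the largest value is the front clipping plane


class ZBufferedFrame:
    """Colour frame buffer plus a z-buffer with one entry per pixel."""

    def __init__(self, width, height, background):
        self.color = [[background] * width for _ in range(height)]
        self.z = [[Z_BACK] * width for _ in range(height)]

    def plot(self, x, y, z, color):
        """Write a scan-converted point if it is no farther than the stored one."""
        if not Z_BACK <= z <= Z_FRONT:
            return False                  # outside the clipping planes
        if z >= self.z[y][x]:             # larger z means closer to the viewer
            self.z[y][x] = z
            self.color[y][x] = color
            return True
        return False


# Points may arrive in arbitrary order; the nearest one ends up in the buffers.
frame = ZBufferedFrame(4, 1, "bg")
first = frame.plot(1, 0, 100, "red")     # nearer than the back plane: written
second = frame.plot(1, 0, 50, "blue")    # farther than red: rejected
third = frame.plot(1, 0, 100, "green")   # equal depth is "no farther": written
```

  • Note that under this convention a larger z value is closer to the viewer, which is the opposite of the more common smaller-is-closer depth test.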
  • Another object of the present invention is to provide a 3D parallel graphics rendering system which creates a global depth map (GDM) within each GPU in cases where such a global depth map is not provided by the graphics application, for use as a depth reference during Z-tests conducted throughout the graphics application, thereby eliminating object overdrawing and other shortcomings and drawbacks associated with conventional object division based parallel graphics rendering processes.
  • Another object of the present invention is to provide a 3D parallel graphics rendering system which creates and uses a global depth map (GDM) within each GPU, for the purpose of testing the z-depth values of all objects in the 3D scene, thereby eliminating the shortcomings and drawbacks associated with using z-buffer comparisons from all GPUs, as performed in prior art object division based pixel rendering processes.
  • Another object of the present invention is to provide a 3D parallel graphics rendering system which supports object division based parallelism among the GPPLs while providing an anti-aliasing process that is substantially free from the artifacts generated when using prior art object division based pixel rendering processes.
  • Another object of the present invention is to utilize a Global Depth Map created by the application, e.g. during a special Ambient Light Pass, as a Z-test reference, thereby enabling the Depthless Image Recomposition Process.
  • Another object of the present invention is to provide a method of generating complementary-type partial images in each GPPL using the GDM and the object division based parallel rendering process.
  • Another object of the present invention is to provide a depthless image recomposition process for object division parallelism, creating a complete image frame of the 3D scene, and eliminating the need to compare depth values of all GPUs as part of the compositing process.
  • Another object of the present invention is to provide an improved object division method free of anti-aliasing artifacts, in contrast to the prior art object division method.
  • Another object of the present invention is to create an improved object division method free of the overdrawing effect, greatly increasing efficiency over prior art object division parallelism.
  • These and other objects of the present invention will become apparent hereinafter and in the claims to invention.
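  • The use of a global depth map (GDM) to generate complementary-type partial images, as named in several of the objects above, can be illustrated with a minimal sketch. It assumes a depth-only pass in which every GPU receives all objects, followed by a color pass in which each GPU receives only its assigned objects and Z-tests them against the GDM. The scanline model, the tuples, the function names and the smaller-is-closer depth convention are hypothetical.

```python
FAR = float("inf")   # assumed convention here: smaller depth is closer
BLACK = 0            # colour of every pixel a GPU does not own


def build_gdm(scene, width):
    """Depth-only pass over ALL objects; every GPU ends up with the same map."""
    gdm = [FAR] * width
    for _color, z, start, end in scene:
        for x in range(start, end):
            gdm[x] = min(gdm[x], z)
    return gdm


def render_partial(assigned, gdm):
    """Colour pass over assigned objects only, Z-tested against the GDM."""
    image = [BLACK] * len(gdm)
    for color, z, start, end in assigned:
        for x in range(start, end):
            if z <= gdm[x]:      # passes only where the object is globally visible
                image[x] = color
    return image


# (colour, depth, first pixel, one past last pixel); values are hypothetical.
A = (1, 1.0, 1, 4)   # assigned to GPU 1, closest to the viewer
B = (2, 3.0, 2, 4)   # assigned to GPU 2, fully obstructed by A
C = (4, 2.0, 6, 8)   # assigned to GPU 2, unobstructed
gdm = build_gdm([A, B, C], 8)
gpu1 = render_partial([A], gdm)
gpu2 = render_partial([B, C], gdm)
print(gpu1, gpu2)
```

  • Object B fails the Z-test on GPU 2 and is never colored, so the two partial images do not overlap and can be combined without any depth comparison. The sketch does not handle two objects on different GPUs with exactly equal depth at the same pixel.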
  • BRIEF DESCRIPTION OF DRAWINGS OF PRESENT INVENTION
  • For a more complete understanding of how to practice the Objects of the Present Invention, the following Detailed Description of the Illustrative Embodiments can be read in conjunction with the accompanying Drawings, briefly described below:
  • FIG. 1A is a graphical representation of a PC-level based multi-GPPL parallel graphics rendering platform of the type disclosed in Applicants' U.S. application Ser. No. 11/897,536 filed Aug. 30, 2007, showing multi-CPUs, system memory, a system interface, and a plurality of GPPLs, with a display interface driving one or more graphics display screens;
  • FIG. 1B is a schematic representation of a plurality of GPU-based graphics processing pipelines (GPPLs), such as in nVidia's GeForce 7700 graphics subsystem, that can be employed in the multi-GPPL graphics rendering platform of FIG. 1A;
  • FIG. 1C is a schematic representation of a plurality of advanced GPU-based graphics processing pipelines (GPPLs), such as in nVidia's GeForce 8800 GTX graphics subsystem, that can be employed in the multi-GPPL graphics rendering platform of FIG. 1A;
  • FIG. 1D is a schematic representation of a plurality of multicore-based graphics processing pipelines (GPPLs) that can be employed in the multi-GPPL graphics rendering platform of FIG. 1A;
  • FIG. 1E1 is a graphical illustration of a 3D scene modeled within a dual-GPU embodiment of the parallel graphics processing system of FIG. 1A, operating in a classic object division (OD) mode of operation, wherein dual GPUs (GPU1 and GPU2) are provided, and three objects A, B and C are shown against a rectangular background frame, wherein cylindrical object B is occluded/obstructed by the cubic object A along the indicated view point within the coordinate reference system X-Y-Z, wherein the 3D scene is decomposed within the 3D dual-GPU based parallel graphics rendering system such that object A is assigned to GPU 1 while objects B and C are assigned to GPU2, and wherein partial images of the 3D scene are rendered in the GPUs and stored in the Color Buffers, and finally recomposited within GPU1 using pixel depth information maintained within the Z buffers of the GPUs;
  • FIG. 1E2 is a schematic representation of the Color Buffer and Z (Depth) Buffer associated with GPU1 employed in the dual-GPU embodiment of the parallel graphics rendering system of FIG. 1A operating in a classic Object Division Mode of operation, wherein the Color Buffer holds color values for the pixels of object A computed locally by GPU1, while the Z Buffer holds a local depth (z value) map for the pixels of object A also computed locally by GPU1;
  • FIG. 1E3 is a schematic representation of Color Buffer and Z (Depth) Buffer associated with GPU2 employed in dual-GPU embodiment of the parallel graphics rendering system of FIG. 1A operating in a classic Object Division Mode of operation, wherein the Color Buffer holds color values for the pixels of objects B and C computed locally by GPU2, while the Z Buffer holds a local depth (z value) map for the pixels of objects B and C, also computed locally by GPU2;
  • FIG. 2A is a graphical illustration of a 3D scene modeled within a dual-GPU embodiment of the parallel graphics processing system of FIG. 2C, carrying out a method of Depthless Image Recomposition (DIR) according to the present invention based on an object division (OD) mode of parallel graphics processing operation, wherein dual GPUs (GPU1 and GPU2) are provided, and three objects A, B and C are shown against a rectangular background frame, wherein cylindrical object B is occluded/obstructed by the cubic object A along the indicated view point within the coordinate reference system X-Y-Z, wherein the 3D scene is decomposed within the 3D dual-GPU based parallel graphics rendering system such that object A is assigned to GPU 1 while objects B and C are assigned to GPU2, and wherein partial complementary-type images of the 3D scene are rendered in the GPUs and stored in the Color Buffers, and finally recomposited within GPU1 without using the global depth map (GDM) maintained within the Z buffers of the GPUs;
  • FIG. 2B is a high-level flow chart illustrating a generalized embodiment of the method of parallel graphics processing according to the present invention, comprising the steps of (a) providing a Global Depth Map (GDM) to each GPPL, for each image frame in the 3D scene to be generated, for use in rendering partial images of the 3D scene along a specified viewing direction, (b) generating complementary-type partial images in each GPPL, using the GDM and the object division based parallel rendering process according to the present invention, and (c) recompositing a complete image frame of the 3D scene using the depthless image recomposition (DIR) process of the present invention illustrated in FIGS. 3B1 and 3B2 (i.e. without the use of depth comparison);
  • FIG. 2C is a schematic representation illustrating the three primary stages of the generalized method of the present invention carried out on a dual-GPU embodiment of the parallel graphics processing system of the present invention, operating in an object division (OD) mode of operation according to the present invention, wherein each GPPL includes (i) a GPU having a geometry subsystem, a rasterizer, and a pixel subsystem with a pixel shader and raster operators including a Z test operator, and (ii) video memory supporting a Z (depth) Buffer and a Color Buffer, and wherein (a) the first stage involves providing a Global Depth Map (GDM) to the Z buffer of each GPPL, by transmitting graphics commands and data to all GPPLs, (b) the second stage involves generating complementary-type partial images within the color buffer of each GPPL using the GDM and the Z Test Filter, and transmitting graphics commands and data to only assigned GPPLs, and (c) the third stage involves recompositing a complete image frame within the primary GPPL, from the complementary-type partial images stored in the color buffers, using the depthless recomposition process of the present invention;
  • FIG. 2D1 is a schematic representation of the complementary-type partial image generation process of the present invention carried out within GPU1 of the dual-GPU embodiment of the parallel graphics rendering system of FIG. 2C, wherein a Global Depth Map (GDM) is generated within the Z Buffer for all objects within the 3D scene (showing three different depth values, including the background having the highest depth (2415)), wherein object A, being closest to the viewer, has the lowest depth value (2416), its pixels have passed the Z-test and their depth values are written to the Z Buffer of GPU1, wherein object C (2414) has a middle depth value, its pixels have passed the z-test and their depth values are written to the Z buffer of GPU1, wherein object B has the deepest depth values, its pixels have all failed the z-test and their depth values have been replaced by the depth values of its occluding object A (2416) written in the Z Buffer in GPU1, and wherein a color-based complementary-type partial image is generated within the Color Buffer of GPU1 by recompositing (i) the pixels of assigned object A rendered/drawn in color, (ii) the pixels of non-assigned object C drawn without color (i.e. black), and (iii) the pixels of non-assigned object B which are overwritten by the color pixels of the assigned occluding object A, which is closer to the viewer than object B;
  • FIG. 2D2 is a schematic representation of the complementary-type partial image generation process of the present invention carried out within GPU2 of the dual-GPU embodiment of the parallel graphics rendering system of FIG. 2C, wherein a Global Depth Map (GDM) is generated within the Z Buffer for all objects within the 3D scene (showing three different depth values, including the background having the highest depth (2415)), wherein the Z Buffer holds the Global Depth Map (242) identical to those depth values in the Z Buffer of GPU1 (2411), and wherein a color-based complementary-type partial image is generated within the Color Buffer of GPU2 by recompositing (i) the pixels of non-assigned object A rendered/drawn without color (i.e. black), (ii) the pixels of assigned object C drawn with color, and (iii) the pixels of assigned object B which are overwritten by the colorless (i.e. black) values of non-assigned object A, which is closer to the viewer than object B;
  • FIG. 2D3 is a schematic representation illustrating the depthless method of image recomposition according to the principles of the present invention, carried out within the dual-GPU embodiment of the parallel graphics rendering system shown in FIG. 2C, wherein partial complementary images generated and buffered within GPPL1 and GPPL2 are recomposited (i.e. combined) by merging, in puzzle-like manner, to form a full color image frame of the 3D scene, without using any depth value information stored in the Z buffers of these GPPLs;
  • FIG. 2E1 is a schematic representation illustrating the depthless method of image recomposition according to the principles of the present invention, carried out within an eight-GPU embodiment of the parallel graphics rendering system, wherein (1) the first level of hierarchical image merging involves four sub-stages of image merging, namely, (i) the partial complementary image generated and buffered within GPPL1 is merged with the partial complementary image generated and buffered within GPPL2 without using any depth value information stored in the Z buffers of these GPPLs, (ii) the partial complementary image generated and buffered within GPPL3 is merged with the partial complementary image generated and buffered within GPPL4 without using any depth value information stored in the Z buffers of these GPPLs, (iii) the partial complementary image generated and buffered within GPPL5 is merged with the partial complementary image generated and buffered within GPPL6 without using any depth value information stored in the Z buffers of these GPPLs, and (iv) the partial complementary image generated and buffered within GPPL7 is merged with the partial complementary image generated and buffered within GPPL8 without using any depth value information stored in the Z buffers of these GPPLs, wherein (2) during the second level of hierarchical image merging, (i) the partial complementary image recomposited and buffered within GPPL2 is merged with the partial complementary image generated and buffered within GPPL4 without using any depth value information stored in the Z buffers of these GPPLs, and (ii) the partial complementary image recomposited and buffered within GPPL6 is merged with the partial complementary image generated and buffered within GPPL8 without using any depth value information stored in the Z buffers of these GPPLs, and wherein (3) during the third level of hierarchical image merging, the partial complementary image recomposited and buffered within GPPL4 is merged with the partial complementary image generated and buffered within GPPL8 (the primary GPPL) without using any depth value information stored in the Z buffers of these GPPLs, so as to generate a complete color image frame of the 3D scene within GPPL8, without using any depth value information stored in the Z buffers of these GPPLs;
  • FIG. 2E2 is a flow chart illustrating the primary steps of the depthless method of recompositing image frames of a 3D scene from partial complementary images, carried out over n hierarchical levels or stages of using depthless complementary image merging operations, wherein at each (n−1)th level, pairs of source and target partial complementary images are merged into a target complementary image, for use at the nth level of processing, according to the principles of the present invention;
  • FIG. 2E3 is a flow chart illustrating the complementary image merging process carried out between a pair of partial complementary images buffered in the color buffers of a pair of GPPLs, wherein the addition of all pixels of the source and target images occurs within the target GPPL using its pixel shader processor running the shader merge code, and wherein the image merge result may become the source image for the next hierarchical step in the multi-level complementary image merging process of the present invention;
  • FIG. 3A1 is a high-level flow chart illustrating a first illustrative embodiment of the method of parallel graphics processing according to the present invention, comprising the steps of (a) during the first special rendering pass (i.e. GDM Creation Pass), generating a global depth map (GDM) within each GPPL, by broadcasting graphics commands and data to all GPPLs equally for pixel depth (z) testing, (b) during subsequent passes, generating complementary-type partial images in each GPPL using the GDM and the object-division based parallel rendering process according to the present invention, and (c) after the final pass, recompositing a complete image frame of the 3D scene using the depthless complementary image recomposition process of the present invention illustrated in FIGS. 3D3, 2E1, 2E2 and 3E3;
  • FIG. 3A2 is a schematic representation illustrating the three primary stages of the first illustrative embodiment of the method of the present invention, carried out on a dual-GPU embodiment of the parallel graphics processing system of the present invention, wherein (a) the first stage involves, during the special rendering pass (i.e. GDM Creation Pass), providing a Global Depth Map (GDM) to the Z buffer of each GPPL, involving the transmission of graphics commands and data to all GPPLs for all objects in the frame of the 3D scene to be rendered, (b) the second stage involves, for subsequent passes, generating complementary-type partial images within the color buffer of each GPPL using the GDM and the Z Test Filter, and transmitting graphics commands and data to only assigned GPPLs, and (c) the third stage involves recompositing a complete image frame within the primary GPPL, from the complementary-type partial images stored in the color buffers of GPPL1 and GPPL2, using the depthless recomposition process of the present invention;
  • FIG. 3A3 is a graphical representation of a Hash Table (3112) in which each entry holds the state of a primitive, which is not assigned to any GPU, for tracking the appearance of object primitives during the first phase of the method of FIG. 3A4, and a Current State Buffer (4111) for storing a draw command;
  • FIG. 3A4 is a flowchart illustrating the steps performed during the first illustrative embodiment of the method of parallel graphics processing according to the present invention depicted in FIG. 3A1, with the pixels of objects assigned to a GPPL being normally rendered in color within the GPPL;
  • FIG. 3B1 is a high-level flow chart illustrating a second illustrative embodiment of the method of parallel graphics processing according to the present invention, comprising the steps of (a) during a first special rendering pass (i.e. GDM Creation Pass), (i) generating a global depth map (GDM) within each GPPL, by broadcasting graphics commands and data for all objects to all GPPLs, (ii) rendering without color (i.e. in black) the pixels of objects sent to non-assigned GPPLs, and (iii) rendering in color the pixels of all objects sent to assigned-GPPLs, (b) during subsequent passes, generating complementary-type partial images in each GPPL using the GDM and the object-division based parallel rendering process according to the present invention, and (c) recompositing a complete image frame of the 3D scene using the depthless complementary image recomposition process of the present invention illustrated in FIGS. 3D3, 2E1, 2E2 and 3E3;
  • FIG. 3B2 is a schematic representation illustrating the three primary stages of the second illustrative embodiment of the method of the present invention, carried out on a dual-GPU embodiment of the parallel graphics processing system of the present invention, wherein (a) the first stage involves, during a first special pass (i.e. GDM Creation Pass), (i) generating a global depth map (GDM) within each GPPL, by broadcasting graphics commands and data for all objects to all GPPLs, (ii) rendering without color (i.e. in black) the pixels of objects sent to non-assigned GPPLs, and (iii) rendering in color the pixels of all objects sent to assigned-GPPLs, (b) the second stage involves generating, for subsequent passes, complementary-type partial images within the color buffer of each GPPL using the GDM and the Z Test Filter, and transmitting graphics commands and data to only assigned GPPLs, and (c) the third stage involves, after the final pass, recompositing a complete image frame within the primary GPPL, from the complementary-type partial images stored in the color buffers of GPPL1 and GPPL2, using the depthless recomposition process of the present invention;
  • FIG. 3B3 is a graphical representation of a Hash Table (3112) in which each entry holds the state of a primitive, which is not assigned to any GPU, for tracking the appearance of object primitives during the first stage of the methods of FIGS. 3B4A and 3B4B, and a Current State Buffer (4111) for storing a draw command;
  • FIGS. 3B4A and 3B4B are flowcharts illustrating the steps performed during the second illustrative embodiment of the method of parallel graphics processing according to the present invention depicted in FIG. 3B1, wherein the pixels of objects assigned to a GPPL are normally rendered in color within the GPPL while pixels of objects not assigned to a GPPL are rendered colorlessly (i.e. in black);
  • FIG. 4A is a third illustrative embodiment of the method of parallel graphics processing according to the present invention, comprising the steps of (a) during each pass of the multi-pass method, (i) generating global depth map (GDM) values for each debuted object transmitted to each GPPL, (ii) rendering without color (i.e. in black) the pixels of objects sent to non-assigned GPPLs, and (iii) rendering in color the pixels of all objects sent to assigned-GPPLs, thereby generating complementary-type partial images in each GPPL, and (b) after the final pass, recompositing a complete image frame of the 3D scene using the depthless complementary image recomposition process of the present invention illustrated in FIGS. 3D3, 2E1, 2E2 and 3E3;
  • FIG. 4B is a schematic representation illustrating the two primary stages of the third illustrative embodiment of the method of the present invention, carried out on a dual-GPU embodiment of the parallel graphics processing system of the present invention, wherein (a) the first stage involves, during each pass of the multi-pass method, (i) generating global depth map (GDM) values for each debuted object transmitted to each GPPL, (ii) rendering without color (i.e. in black) the pixels of objects sent to non-assigned GPPLs, and (iii) rendering in color the pixels of all objects sent to assigned-GPPLs, and (b) the second stage involves, after the final pass, recompositing a complete image frame within the primary GPPL, from the complementary-type partial images stored in the color buffers of GPPL1 and GPPL2, using the depthless recomposition process of the present invention;
  • FIG. 4C is a graphical representation of a Hash Table (5112) in which each entry holds the state of a primitive, which is not assigned to any GPU, for tracking the appearance of object primitives during the first stage of the methods of FIGS. 4D1 and 4D2, and a Current State Buffer (5111) for storing a draw command;
  • FIGS. 4D1 and 4D2 are flowcharts illustrating the steps performed during the third illustrative embodiment of the method of parallel graphics processing according to the present invention depicted in FIG. 4A, wherein the pixels of objects assigned to a GPPL are normally rendered in color within the GPPL while pixels of objects not assigned to a GPPL are rendered colorlessly (i.e. in black);
  • FIG. 5A is a fourth illustrative embodiment of the method of parallel graphics processing according to the present invention, comprising the steps of (a) during a first special Ambient Light Pass of the multi-pass method, generating a global depth map (GDM) within each GPPL by broadcasting all objects to all GPPLs for depth map creation in the Z buffers and colorless image creation within the color buffers, (b) during subsequent passes, generating complementary-type partial images in each GPPL using the GDM and the object-division based parallel rendering process according to the present invention (i.e. rendering without color (i.e. in black) the pixels of objects sent to non-assigned GPPLs, and rendering in color the pixels of all objects sent to assigned-GPPLs), and (c) after the final pass, recompositing a complete image frame of the 3D scene using the depthless complementary image recomposition process of the present invention illustrated in FIGS. 3D3, 2E1, 2E2 and 3E3;
  • FIG. 5B is a schematic representation illustrating the three primary stages of the fourth illustrative embodiment of the method of the present invention, carried out on a dual-GPU embodiment of the parallel graphics processing system of the present invention, wherein (a) the first stage involves, during a first special Ambient Light Pass of the multi-pass method, generating a global depth map (GDM) within each GPPL by broadcasting all objects to all GPPLs for depth map creation in the Z buffers and colorless image creation within the color buffers, (b) the second stage involves, during subsequent passes, generating complementary-type partial images in each GPPL using the GDM and the object-division based parallel rendering process according to the present invention (i.e. rendering without color (i.e. in black) the pixels of objects sent to non-assigned GPPLs, and rendering in color the pixels of all objects sent to assigned-GPPLs), and (c) the third stage involves, after the final pass, recompositing a complete image frame within the primary GPPL, from the complementary-type partial images stored in the color buffers of GPPL1 and GPPL2, using the depthless recomposition process of the present invention;
  • FIG. 5C is a graphical representation of a Hash Table in which each entry holds the state of a primitive, which is not assigned to any GPU, for tracking the appearance of object primitives during the first stage of the methods of FIGS. 5D1 and 5D2, and a Current State Buffer (5111) for storing a draw command;
  • FIGS. 5D1 and 5D2 are flowcharts illustrating the steps performed during the fourth illustrative embodiment of the method of parallel graphics processing according to the present invention depicted in FIG. 5A, wherein the pixels of objects assigned to a GPPL are normally rendered in color within the GPPL while pixels of objects not assigned to a GPPL are rendered colorlessly (i.e. in black);
  • FIG. 6A is a schematic representation of a PC-based host computing system of the present invention (a) embodying an illustrative embodiment of the parallel 3D graphics processing system (PGPS) of the present invention illustrated throughout FIGS. 2A through 5D, and (b) comprising (i) a parallel mode control module (PMCM), (ii) a parallel graphics processing subsystem for supporting the parallelization stages of decomposition, distribution and re-composition implemented using a decomposition module, a distribution module and a re-composition module, respectively, and (iii) a plurality of GPU and/or CPU based graphics processing pipelines (GPPLs) operated in a parallel manner under the control of the PMCM;
  • FIG. 6B1 is a schematic representation of the subcomponents of a first illustrative embodiment of a GPU-based graphics processing pipeline (GPPL) that can be employed in the PGPS of the present invention depicted in FIG. 6A, shown comprising (i) a video memory structure supporting a frame buffer (FB) including stencil, depth and color buffers, and (ii) a graphics processing unit (GPU) supporting (1) a geometry subsystem having an input assembler and a vertex shader, (2) a set up engine, and (3) a pixel subsystem including a pixel shader receiving pixel data from the frame buffer and raster operators operating on pixel data in the frame buffers;
  • FIG. 6B2 is a schematic representation of the subcomponents of a second illustrative embodiment of a GPU-based graphics processing pipeline (GPPL) that can be employed in the PGPS of the present invention depicted in FIG. 6A, shown comprising (i) a video memory structure supporting a frame buffer (FB) including stencil, depth and color buffers, and (ii) a graphics processing unit (GPU) supporting (1) a geometry subsystem having an input assembler, a vertex shader and a geometry shader, (2) a rasterizer, and (3) a pixel subsystem including a pixel shader receiving pixel data from the frame buffer and raster operators operating on pixel data in the frame buffers;
  • FIG. 6B3 is a schematic representation of the subcomponents of an illustrative embodiment of a CPU-based graphics processing pipeline that can be employed in the PGPS of the present invention depicted in FIG. 6A, and shown comprising (i) a video memory structure supporting a frame buffer including stencil, depth and color buffers, and (ii) a graphics processing pipeline realized by one cell of a multi-core CPU chip, consisting of 16 in-order SIMD processors, and further including a GPU-specific extension, namely, a texture sampler that loads texture maps from memory, filters them for level-of-detail, and feeds them to the pixel processing portion of the pipeline;
  • FIG. 6C is a schematic representation illustrating the pipelined structure of the parallel graphics processing system (PGPS) of the present invention shown driving a plurality of GPPLs, wherein the decomposition module supports the scanning of commands, the control of commands, the tracking of objects, the balancing of loads, and the assignment of objects to GPPLs, wherein the distribution module supports transmission of graphics data (e.g. FB data, commands, textures, geometric data and other data) in various modes including CPU-to/from-GPU, inter-GPPL, broadcast, hub-to/from-CPU and hub-to/from-GPPL, and wherein the re-composition module supports the merging of partial image fragments in the Color Buffers of the GPPLs in a variety of ways, in accordance with the principles of the present invention (e.g. merge color frame buffers without z buffers, merge color buffers using stencil assisted processing, and other modes of partial image merging);
  • FIGS. 7A1A and 7A1B are flowcharts illustrating in which modules of the parallel graphics processing system of FIG. 6A, the primary steps of the method of FIG. 3A4 are implemented;
  • FIG. 7A2 is a schematic representation illustrating the three primary stages of the first illustrative embodiment of the method of the present invention, carried out on a dual-GPU embodiment of the parallel graphics processing system of the present invention, wherein the Decomposition and Distribution Modules are shown implemented within the host memory space (HMS), whereas the Rendering and Recomposition Modules are implemented by the GPUs;
  • FIGS. 7B1A and 7B1B are flowcharts illustrating in which modules of the parallel graphics processing system of FIG. 6A, the primary steps of the methods of FIGS. 3B4A and 3B4B are implemented;
  • FIG. 7B2 is a schematic representation illustrating the three primary stages of the second illustrative embodiment of the method of the present invention, carried out on a dual-GPU embodiment of the parallel graphics processing system of the present invention, wherein the Decomposition and Distribution Modules are shown implemented within the host memory space (HMS), whereas the Rendering and Recomposition Modules are implemented by the GPUs;
  • FIGS. 7C1A and 7C1B are flowcharts illustrating in which modules of the parallel graphics processing system of FIG. 6A, the primary steps of the methods of FIGS. 4D1 and 4D2 are implemented;
  • FIG. 7C2 is a schematic representation illustrating the two primary stages of the third illustrative embodiment of the method of the present invention, carried out on a dual-GPU embodiment of the parallel graphics processing system of the present invention, wherein the Decomposition and Distribution Modules are shown implemented within the host memory space (HMS), whereas the Rendering and Recomposition Modules are implemented by the GPUs;
  • FIGS. 7D1A and 7D1B are flowcharts illustrating in which modules of the parallel graphics processing system of FIG. 6A, the primary steps of the methods of FIGS. 5D1 and 5D2 are implemented;
  • FIG. 7D2 is a schematic representation illustrating the three primary stages of the fourth illustrative embodiment of the method of the present invention, carried out on a dual-GPU embodiment of the parallel graphics processing system of the present invention, wherein the Decomposition and Distribution Modules are shown implemented within the host memory space (HMS), whereas the Rendering and Recomposition Modules are implemented by the GPUs;
  • FIG. 8A is a schematic representation of a first illustrative embodiment of the PGPS of the present invention embodied in a PC-level computing system, showing (i) that the Parallel Mode Control Module (PMCM) and the Decomposition and Distribution Modules of the Parallel Graphics Rendering Subsystem reside as a software package in the Host or CPU Memory Space (HMS) while multiple GPUs on external GPU cards are connected to a North bridge circuit, implement the Rendering and Recomposition Modules, and are driven in a parallelized manner under the control of the PMCM, (ii) the Decomposition Module divides (i.e. splits up) the stream of graphic commands and data (GCAD) according to the parallelization mode, (iii) the Distribution Module uses the North bridge circuit to distribute graphic commands and data (GCAD) to the external GPUs, (iv) the Rendering Module generates complementary-type partial color images according to a multi-pass parallel graphics processing method of the present invention, (v) the Recomposition Module uses inter-GPU communication transport to transfer the pixel data of the complementary-type partial images among the GPUs during the image recomposition stages, and finally (vi) the complementary-type partial color images are recomposited using the depthless image merging process of the present invention so as to generate a complete image frame of the 3D scene for display on the display device, connected to an external graphics card via a PCI-express interface;
  • FIG. 8B is a schematic representation of a second illustrative embodiment of the PGPS of the present invention embodied in a PC-level computing system, showing (i) that the Parallel Mode Control Module (PMCM) and the Decomposition and Distribution Modules of the Parallel Graphics Rendering Subsystem reside as a software package in the Host or CPU Memory Space (HMS) while the Rendering and Recomposition Modules are realized across multiple GPUs connected to a bridge circuit (having an internal IGD) as well as on external graphic cards connected to the North memory bridge chip and driven in a parallelized manner under the control of the PMCM, (ii) the Decomposition Module divides (i.e. splits up) the stream of graphic commands and data (GCAD) according to the required parallelization mode, (iii) the Distribution Module uses the bridge chip to distribute the graphic commands and data (GCAD) to the multiple GPUs located on the external graphics cards, (iv) the Rendering Module generates complementary-type partial color images according to a multi-pass parallel graphics processing method of the present invention, (v) the Recomposition Module uses inter-GPU communication transport to transfer the pixel data of the complementary-type partial images among the GPUs during the image recomposition stages, and finally (vi) the complementary-type partial color images are recomposited using the depthless image merging process of the present invention so as to generate a complete image frame of the 3D scene for display on the display device, connected to one of the external graphics cards or the IGD;
  • FIG. 8C is a schematic representation of a third illustrative embodiment of the PGPS of the present invention embodied in a PC-level computing system, showing (i) that the Parallel Mode Control Module (PMCM) 400 and the Decomposition and Distribution Modules of the Parallel Graphics Rendering Subsystem reside as a software package in the Host Memory Space (HMS) while a single GPU is supported on a CPU/GPU fusion-architecture processor die (alongside the CPU), one or more GPUs are supported on an external graphic card connected to a bridge circuit and driven in a parallelized manner under the control of the PMCM, and the Rendering and Recomposition Modules are realized across the GPUs on the graphics card, (ii) the Decomposition Module divides (i.e. splits up) the stream of graphic commands and data (GCAD) according to the required parallelization mode, (iii) the Distribution Module uses the memory controller (controlling the HMS) and the interconnect network (e.g. crossbar switch) within the CPU/GPU processor chip to distribute graphic commands and data to the multiple GPUs on the CPU/GPU die chip and on the external graphics cards, (iv) the Rendering Module generates complementary-type partial color images according to a multi-pass parallel graphics processing method of the present invention, (v) the Recomposition Module uses inter-GPU communication transport on the graphics card, as well as the memory controller and interconnect (e.g. crossbar switch) within the CPU/GPU processor chip, to transfer the pixel data of the complementary-type partial images among the GPUs during the image recomposition stages, and finally (vi) the complementary-type partial color images are recomposited using the depthless image merging process of the present invention so as to generate a complete image frame of the 3D scene for display on the display device, connected to the external graphics card via a PCI-express interface connected to the bridge circuit;
  • FIG. 8D1 is a schematic representation of a fourth illustrative embodiment of the PGPS of the present invention embodied in a PC-level computing system, showing (i) that the Parallelization Mode Control Module (PMCM) and the Decomposition and Distribution Modules of the Parallel Graphics Rendering Subsystem reside as a software package in the Host Memory Space (HMS) while a first cluster of CPU cores on a multi-core CPU chip functions as a CPU and a second cluster of CPU cores is used to implement a plurality of multi-core graphics pipelines (GPPLs) (i.e. of Rendering Module) which are parallelized under the control of the PMCM, with the Re-composition Module being realized across a plurality of the GPPLs, (ii) the Decomposition Module divides (i.e. splits up) the stream of graphic commands and data (GCAD) according to the required parallelization mode, (iii) the Distribution Module uses the bridge circuit and interconnect network within the multi-core CPU chip to distribute graphic commands and data (GCAD) to the multi-core graphic pipelines implemented on the multi-core CPU chip, (iv) the Rendering Module generates complementary-type partial color images according to a multi-pass parallel graphics processing method of the present invention, (v) the Recomposition Module uses inter-GPU communication transport as well as the bridge and interconnect network within the multi-core CPU chip to transfer the pixel data of the complementary-type partial images among the GPPLs during the image recomposition stages, and finally (vi) the complementary-type partial color images are recomposited using the depthless image merging process of the present invention so as to generate a complete image frame of the 3D scene for display on the display device connected to the primary GPPL (e.g. GPU) via a display interface;
  • FIG. 8D2 is a schematic representation of a fifth illustrative embodiment of the PGPS of the present invention embodied in a PC-level computing system, showing (i) that the Parallelization Mode Control Module (PMCM) and the Decomposition and Distribution Modules of the Parallel Graphics Rendering Subsystem reside as a software package in the Host or CPU Memory Space (HMS) while a first cluster of CPU cores on the multi-core CPU chips on external graphics cards function as GPPLs and implement the Re-composition Module across a plurality of the GPPLs whereas a second cluster of CPU cores function as GPPLs and implement the Rendering Module, (ii) the Decomposition Module divides (i.e. splits up) the stream of graphic commands and data (GCAD) according to the required parallelization mode, (iii) the Distribution Module uses the North bridge circuit and interconnect networks within the multi-core CPU chips (on the external cards) to distribute graphic commands and data (GCAD) to the multi-core graphic pipelines implemented thereon, (iv) the Rendering Module generates complementary-type partial color images according to a multi-pass parallel graphics processing method of the present invention, (v) the Recomposition Module uses interconnect networks within the multi-core CPU chips to transfer the pixel data of the complementary-type partial images among the GPPLs during the image recomposition stages, and finally (vi) the complementary-type partial color images are recomposited using the depthless image merging process of the present invention so as to generate a complete image frame of the 3D scene for display on the display device connected to the primary GPPL, via a display interface;
  • FIG. 8E is a schematic representation of a sixth illustrative embodiment of the PGPS of the present invention embodied in a PC-level computing system, showing (i) that the Parallel Mode Control Module (PMCM) and the Decomposition Submodule No. 1 reside as a software package in the Host or CPU Memory Space (HMS) while the Decomposition Submodule No. 2 and Distribution Module are realized within a single graphics hub device (e.g. chip) that is connected to the bridge circuit of the host computing system via a PCI-express interface and to a cluster of external GPUs via an interconnect, with the GPUs implementing the Rendering and Recomposition Modules and being driven in a parallelized manner under the control of the PMCM, (ii) the Decomposition Submodule No. 1 transfers graphic commands and data (GCAD) to the Decomposition Submodule No. 2 via the bridge circuit, (iii) the Decomposition Submodule No. 2 divides (i.e. splits up) the stream of graphic commands and data (GCAD) according to the required parallelization mode, (iv) the Distribution Module distributes graphic commands and data (GCAD) to the external GPUs, (v) the Rendering Module generates complementary-type partial color images according to a multi-pass parallel graphics processing method of the present invention, (vi) the Recomposition Module uses inter-GPU communication transport to transfer the pixel data of the complementary-type partial images among the GPUs during the image recomposition stages, and finally (vii) the complementary-type partial color images are recomposited using the depthless image merging process of the present invention so as to generate a complete image frame of the 3D scene for display on the display device connected to the primary GPU on the graphical display card;
  • FIG. 8F is a schematic representation of a seventh illustrative embodiment of the PGPS of the present invention embodied in a PC-level computing system, showing (i) that the Parallel Mode Control Module (PMCM), including the Distribution Management Submodule, and the Decomposition Module reside as a software package in the Host Memory Space (HMS) of the host computing system, while the Distribution Module and interconnect transport are realized within a single graphics hub device (e.g. chip) that is connected to the bridge circuit of the host computing system and a cluster of external GPUs implementing the Rendering and Recomposition Modules, and that all of the GPUs are driven in a parallelized manner under the control of the PMCM, (ii) the Decomposition Module divides (i.e. splits up) the stream of graphic commands and data (GCAD) according to the required parallelization mode, (iii) the Distribution Management Module within the PMCM distributes the graphic commands and data (GCAD) to the external GPUs via the bridge circuit and interconnect transport mechanism, (iv) the Rendering Module generates complementary-type partial color images according to a multi-pass parallel graphics processing method of the present invention, (v) the Recomposition Module uses inter-GPU communication transport to transfer the pixel data of the complementary-type partial images among the GPUs during the image recomposition stages, and finally (vi) the complementary-type partial color images are recomposited using the depthless image merging process of the present invention so as to generate a complete image frame of the 3D scene for display on the display device connected to the primary GPU on the graphical display card(s);
  • FIG. 8G is a schematic representation of an eighth illustrative embodiment of the PGPS of the present invention embodied in a PC-level computing system, showing (i) that the Parallel Mode Control Module (PMCM) and the Decomposition Submodule No. 1 reside as a software package in the Host Memory Space (HMS) while the Decomposition Submodule No. 2 and the Distribution Module are realized (as a graphics hub) within a bridge circuit on the motherboard within the host computing system, with the Rendering Module and the Recomposition Module being implemented by a plurality of GPUs driven in a parallelized manner under the control of the PMCM, (ii) the Decomposition Submodule No. 1 transfers graphics commands and data (GCAD) to the Decomposition Submodule No. 2, (iii) the Decomposition Submodule No. 2 divides (i.e. splits up) the stream of graphic commands and data (GCAD) according to the parallelization mode, (iv) the Distribution Module distributes the graphic commands and data (GCAD) to the internal GPU and external GPUs, (v) the Rendering Module generates complementary-type partial color images according to a multi-pass parallel graphics processing method of the present invention, (vi) the Recomposition Module uses inter-GPU communication transport to transfer the pixel data of the complementary-type partial images among the GPUs during the image recomposition stages, and finally (vii) the complementary-type partial color images are recomposited using the depthless image merging process of the present invention so as to generate a complete image frame of the 3D scene for display on the display device connected to the external graphics card connected to the hybrid CPU/GPU chip via a PCI-express interface;
  • FIG. 8H is a schematic representation of a ninth illustrative embodiment of the PGPS of the present invention embodied in a PC-level computing system, showing (i) that the Parallel Mode Control Module (PMCM) and the Decomposition Submodule No. 1 reside as a software package in the Host Memory Space (HMS) while the Decomposition Submodule No. 2 and the Distribution Module are realized (as a graphics hub) on the die of a hybrid CPU/GPU fusion-architecture chip within the host computing system and having a single GPU driven with one or more GPUs on an external graphics card (connected to the CPU/GPU chip) in a parallelized manner under the control of the PMCM, and GPUs on the external graphics card are used to implement the Recomposition Module, (ii) the Decomposition Submodule No. 1 transfers graphics commands and data (GCAD) to the Decomposition Submodule No. 2, (iii) the Decomposition Submodule No. 2 divides (i.e. splits up) the stream of graphic commands and data (GCAD) according to the parallelization mode, (iv) the Distribution Module distributes the graphic commands and data (GCAD) to the internal GPU and external GPUs, (v) the Rendering Module generates complementary-type partial color images according to a multi-pass parallel graphics processing method of the present invention, (vi) the Recomposition Module uses inter-GPU communication transport to transfer the pixel data of the complementary-type partial images among the GPUs during the image recomposition stages, and finally (vii) the complementary-type partial color images are recomposited using the depthless image merging process of the present invention so as to generate a complete image frame of the 3D scene for display on the display device connected to the external graphics card connected to the hybrid CPU/GPU chip via a PCI-express interface;
  • FIG. 8I is a schematic representation of a tenth illustrative embodiment of the PGPS of the present invention embodied in a game console system, showing (i) that the Parallel Mode Control Module (PMCM) and the Decomposition Submodule No. 1 are realized as a software package within the Host Memory Space (HMS), while the Decomposition Submodule No. 2 and the Distribution Module are realized as a graphics hub semiconductor chip within the game console system, and the Rendering and Recomposition Modules are implemented by multiple GPPLs supported on the game console board and driven in a parallelized manner under the control of the PMCM, (ii) the Decomposition Submodule No. 1 transfers graphics commands and data (GCAD) to the Decomposition Submodule No. 2, via the memory controller on the multi-core CPU chip and the interconnect in the graphics hub chip of the present invention, (iii) the Decomposition Submodule No. 2 divides (i.e. splits up) the stream of graphic commands and data (GCAD) according to the parallelization mode, (iv) the Distribution Module distributes the graphic commands and data (GCAD) to the multiple GPUs, (v) the Rendering Module generates complementary-type partial color images according to a multi-pass parallel graphics processing method of the present invention, (vi) the Recomposition Module uses inter-GPU communication transport to transfer the pixel data of the complementary-type partial images among the GPUs during the image recomposition stages, and finally (vii) the complementary-type partial color images are recomposited using the depthless image merging process of the present invention so as to generate a complete image frame of the 3D scene for display on the display device connected to the primary GPU via an analog display interface.
  • DETAILED DESCRIPTION OF THE ILLUSTRATIVE EMBODIMENTS OF THE PRESENT INVENTION
  • Referring now to FIGS. 3A through 8I in the accompanying Drawings, the various illustrative embodiments of the 3D parallel graphics rendering system (PGPS) and 3D parallel graphics rendering process (PGRP) of the present invention will now be described in great technical detail, wherein like elements will be indicated using like reference numerals.
  • In general, one aspect of the present invention teaches a new way of and means for recompositing images of 3D scenes, represented within a 3D parallel graphics rendering subsystem (PGPS) supporting object division parallelism among its multiple graphics processing pipelines (GPPLs), but without performing pixel z-depth comparisons, which are otherwise required by prior art systems and processes. By virtue of the depthless image recompositing technique of the present invention, the performance of 3D graphics rendering processes and subsystems can be significantly improved using the principles of the present invention.
  • In general, the image recomposition method of the present invention can be practiced within any parallel graphics processing system (PGPS) having multiple GPPLs driven in an object division mode of parallelism, or a hybrid mode of parallel operation employing a combination of object and image/screen division techniques and/or principles. In the illustrative embodiment, the method of the present invention is embodied within a system employing an object-division mode of parallelism.
  • The image recomposition method and apparatus of the present invention can be practiced within conventional computing platforms (e.g. PCs, laptops, servers, etc.) as well as silicon level graphics systems (e.g. graphics system on chip (SOC) implementations, integrated graphics device IGD implementations, and hybrid CPU/GPU die implementations).
  • Generalized Embodiment of the Method of Parallel Graphics Processing According to the Present Invention
  • As indicated in FIG. 2A, the generalized embodiment of the method of parallel graphics processing according to the present invention comprises several steps, namely: (a) at the first step 221, providing a Global Depth Map (GDM) to each GPPL, for each image frame in the 3D scene to be generated, for use in rendering partial images of the 3D scene along a specified viewing direction; (b) at the second step 222, generating complementary-type partial images in each GPPL, using the GDM and the object division based parallel rendering process according to the present invention; and (c) at the third step 223, recompositing a complete image frame of the 3D scene using the depthless image recomposition (DIR) process of the present invention illustrated in FIGS. 3B1 and 3B2 (i.e. without the use of depth comparison).
  • In practice, the method of parallel graphics processing can involve, and will typically employ, multiple parallel graphics rendering passes, so that most, but not necessarily all, illustrative embodiments of the method of the present invention will be “multi-pass” in character and nature. Also, in some illustrative embodiments of the present invention, steps (a) and (b) can be integrated within a single pass of a multi-pass method of parallel graphics processing, while steps (b) and (c) can be carried out in subsequent passes of the multi-pass parallel graphics rendering process of the present invention. These different embodiments are described in FIGS. 3A1 through 5D.
  • In order to create the Global Depth Map (GDM) in all GPUs, depth values of all objects of the scene must be imported to each GPU, and stored in its Z buffer. This is done as follows: while all objects originally designated to the GPU are drawn normally, the other GPU's objects are brought to the GPU for their depth values only, ignoring their color and texture. Thus while the Z-buffer is being updated for object's depth, only a black silhouette of the imported object is drawn in the color buffer, as indicated in FIG. 2D1. This method of “black rendering” is inexpensive because it avoids altogether the heavy processing associated with shading, texturing, and other pixel processing required during normal drawing operations with color values.
  • According to the recomposition method of the present invention, each object that is designated/assigned to a particular GPU must also be imported to the other GPUs for “black rendering” purposes, i.e. so as to update the Global Depth Map (GDM) being stored in the Z buffers of the other, non-designated GPUs. Those pixels of a “black rendered” object that have passed the Z test are drawn in the color buffers as a black silhouette of the object. For multi-pass applications, when an object may be rendered several times (using several rendering passes), only its first appearance is used for updating the Global Depth Map (GDM) for a given image frame generated from a 3D scene. The graphics data associated with all additional appearances of the object in successive passes will be sent only to the designated GPU, and not to the non-designated GPUs. Therefore the system of the present invention provides a mechanism for tracking an object throughout all of its successive appearances.
  • Dual-GPU Embodiment of the Parallel Graphics Processing System of the Present Invention Carrying Out a Method of Depthless Image Recomposition (DIR) Based on an Object Division (OD) Mode of Parallel Graphics Processing Operation
  • In FIG. 2B, a 3D scene is shown modeled within a dual-GPU embodiment of the parallel graphics processing system of FIG. 2C, adapted to carry out a method of Depthless Image Recomposition (DIR) according to the present invention, based on an object division (OD) mode of parallel graphics processing operation. As shown, dual GPUs (GPU1 and GPU2) are provided, and three objects A, B and C are shown against a rectangular background frame. As shown, cylindrical object B is occluded/obstructed by the cubic object A along the indicated view point within the coordinate reference system X-Y-Z. The 3D scene is decomposed within the 3D dual-GPU based parallel graphics rendering system such that object A is assigned to GPU1 while objects B and C are assigned to GPU2. The partial complementary-type images of the 3D scene are rendered in GPU1 and GPU2 and stored in their respective Color Buffers, and finally recomposited within GPU1 without using the global depth map (GDM) maintained within the Z Buffers of the GPUs.
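  • The “black rendering” idea described above can be illustrated with a minimal software sketch. It is not the GPU implementation of the present invention; it simulates two GPUs rendering a single scanline, and the function name `render_gpu`, the object tuples, the `FAR` depth constant and the scanline layout are all assumptions introduced purely for illustration. Assigned objects write their color, imported objects write only depth and leave a black (zero) silhouette.

```python
FAR = float("inf")  # assumed depth value of an empty pixel (illustrative convention)

def render_gpu(objects, assigned, width):
    """Render one simulated GPU's partial image over a 1-D scanline.

    objects:  dict name -> (start, end, depth, color); pixels cover [start, end)
    assigned: set of object names that this GPU draws in color
    """
    z_buffer = [FAR] * width
    color_buffer = [0] * width              # color buffer initialized to black
    for name, (start, end, depth, color) in objects.items():
        for x in range(start, end):
            if depth < z_buffer[x]:         # Z test: the nearer fragment wins
                z_buffer[x] = depth
                # assigned objects are shaded; imported ones leave a black silhouette
                color_buffer[x] = color if name in assigned else 0
    return z_buffer, color_buffer

# Hypothetical scene: A is nearest and partly occludes B; C stands alone.
scene = {
    "A": (0, 4, 1.0, 10),
    "B": (2, 6, 3.0, 20),
    "C": (7, 9, 2.0, 30),
}
z1, c1 = render_gpu(scene, {"A"}, 10)        # GPU1 is assigned object A
z2, c2 = render_gpu(scene, {"B", "C"}, 10)   # GPU2 is assigned objects B and C
```

  • In this sketch both simulated Z buffers end up identical, and at every pixel at most one of the two color buffers is non-zero, which is the complementary property that the later merging step relies on.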
  • Referring to FIG. 2C, the three primary stages of the generalized method of the present invention are illustrated being carried out on the dual-GPU embodiment of the parallel graphics processing system of the present invention, operating in an object division (OD) mode of operation according to the present invention. In this embodiment, GPPL1 includes (i) a GPU1 having a geometry subsystem, a rasterizer, and a pixel subsystem with a pixel shader and raster operators including a Z test operator, and (ii) video memory supporting a Z (depth) Buffer and a Color Buffer. Also, GPPL2 includes (i) a GPU2 having a geometry subsystem, a rasterizer, and a pixel subsystem with a pixel shader and raster operators including a Z test operator, and (ii) video memory supporting a Z (depth) Buffer and a Color Buffer.
  • As illustrated at Block 231, the first stage involves providing a Global Depth Map (GDM) to the Z buffer of each GPU, by transmitting graphics commands and data to all GPPLs.
  • As illustrated at Block 232, the second stage involves generating complementary-type partial images within the color buffer of each GPU using the GDM and the Z Test Filter, and transmitting graphics commands and data to only assigned GPPLs.
  • As illustrated at Block 233, the third stage involves recompositing a complete image frame within the primary GPU, from the complementary-type partial images stored in the color buffers, using the depthless recomposition process of the present invention. Notably, the image recompositing stage (233) is performed after all the intra- and inter-GPU Z-tests have been completed, making a final comparison of Z-buffers unnecessary. Therefore, for the case of dual GPUs (i.e. GPU1 and GPU2), the recompositing process of the present invention involves only merging the color Frame Buffers of GPU1 and GPU2, and no depth comparison operations are involved. The depthless image recomposition process will be described below with reference to FIGS. 2D1 through 2E3.
  • Complementary-Type Partial Image Generation Process of the Present Invention Carried Out within the GPUS of the Dual-GPU Embodiment of the Parallel Graphics Rendering System of the Illustrative Embodiment
  • Referring now to FIGS. 2D1 through 2D3, the Complementary-Type Partial Image Generation Process of the present invention is graphically illustrated in connection with the dual-GPU embodiment of the parallel graphics rendering system of the illustrative embodiment, supporting GPU1 and GPU2.
  • FIG. 2D1 illustrates the complementary-type partial image generation process of the present invention carried out within GPU1 of the dual-GPU embodiment of the parallel graphics rendering system of FIG. 2C. During this stage, a Global Depth Map (GDM) is generated within the Z Buffer for all objects within the 3D scene, showing three different depth values, with the background having the highest depth (2415). As shown, the object A is closest to the viewer and has the lowest depth value (2416); its pixels have passed the Z-test and their depth values are written to the Z Buffer of GPU1. Object C (2414) has a middle depth value; its pixels have passed the Z-test filter, and their depth values are written to the Z Buffer of GPU1. Also, object B has the deepest depth values of the three objects; its pixels have all failed the Z-test and their depth values have been replaced by the depth values of its occluding object A (2416) written in the Z Buffer in GPU1. As shown, a color-based complementary-type partial image is generated within the Color Buffer of GPU1 by recompositing (i) the pixels of assigned object A rendered/drawn in color, (ii) the pixels of non-assigned object C drawn without color (i.e. black), and (iii) the pixels of non-assigned object B which are overwritten by the color pixels of the assigned occluding object A, which is closer to the viewer than object B.
  • FIG. 2D2 illustrates the complementary-type partial image generation process of the present invention carried out within GPU2 of the dual-GPU embodiment of the parallel graphics rendering system of FIG. 2C. As shown, the Z Buffer in GPU2 holds a Global Depth Map (2422) whose depth values are identical to those of the GDM held in the Z Buffer of GPU1 (2411). Notably, as will be shown in the illustrative embodiments, there are different methods of implementing the GDM within depth buffers of the GPPLs within any given parallel graphics processing platform. Also, as shown, a color-based complementary-type partial image is generated within the Color Buffer of GPU2 by recompositing (i) the pixels of non-assigned object A rendered/drawn without color (i.e. black), (ii) the pixels of assigned object C drawn with color, and (iii) the pixels of assigned object B which are overwritten by the colorless (i.e. black) values of non-assigned object A, which is closer to the viewer than object B.
  • As shown in FIG. 2D3, the depthless method of image recomposition according to the principles of the present invention is carried out within the dual-GPU embodiment of the parallel graphics rendering system shown in FIG. 2C, by simply combining, in a puzzle-like manner, through merging, the partial complementary images generated and buffered within GPPL1 and GPPL2 so as to form, in the Color Frame Buffer of GPU1 (i.e. the primary GPU), a full color image frame of the 3D scene, without using any depth value information stored in the Z buffers of these GPUs. In parallel graphics rendering systems employing more than two GPUs, the depthless image recomposition process according to the present invention involves performing a hierarchical complementary recomposition process, as illustrated in FIGS. 2E1, 2E2 and 2E3, described below.
  • The Depthless Method of Image Recomposition According to the Principles of the Present Invention Carried Out within an Eight-GPU Embodiment of the Parallel Graphics Processing System of the Present Invention
  • In FIG. 2E1, the depthless method of image recomposition according to the principles of the present invention is shown carried out, in a hierarchical manner, within an eight-GPPL (e.g. 8-GPU) embodiment of the parallel graphics processing system. The process is performed hierarchically in $\log_2 n$ merging steps, where n is the number of GPPLs employed in the parallel graphics processing platform. At each stage of the hierarchical process, the partial complementary color images in the Color Buffers of pairs of GPPLs (identified as source GPPL and target GPPL) are merged without the use of any depth value information. Therefore, there are no depth (Z) buffers involved in the depthless image recomposition process according to the principles of the present invention.
  • In the exemplary case of 8 GPPLs (e.g. GPUs) illustrated in FIG. 2E1, there are three (3) hierarchical levels of merge (i.e. $\log_2 8 = 3$). In the highest level of the hierarchy, the final image ends up in the primary GPPL (i.e. GPU). At all levels, the source and target images, buffered in the source and target GPPLs, are complementary-type images, in accordance with the principles of the present invention, i.e. at a given x,y position in an image, at most one GPPL can hold a non-zero pixel value (i.e. the visible pixel) which has survived the z-test against the GDM stored in the Z Buffers of all GPPLs. All other GPPLs hold $\mathrm{PXL}(x,y) = 0$. At this juncture, it will be appropriate to describe this hierarchical depthless image recomposition process in greater detail below with reference to the three-tier hierarchical example set forth in FIGS. 2E1 through 2E3.
  • As shown in FIG. 2E1, during the first level of hierarchical image merging, the following operations are performed: (i) the partial complementary image generated and buffered within GPPL1 is merged with the partial complementary image generated and buffered within GPPL2 without using any depth value information stored in the Z buffers of these GPPLs; (ii) the partial complementary image generated and buffered within GPPL3 is merged with the partial complementary image generated and buffered within GPPL4 without using any depth value information stored in the Z buffers of these GPPLs; (iii) the partial complementary image generated and buffered within GPPL5 is merged with the partial complementary image generated and buffered within GPPL6 without using any depth value information stored in the Z buffers of these GPPLs; and (iv) the partial complementary image generated and buffered within GPPL7 is merged with the partial complementary image generated and buffered within GPPL8 without using any depth value information stored in the Z buffers of these GPPLs.
  • During the second level of hierarchical image merging, the following operations are performed: (i) the partial complementary image recomposited and buffered within GPPL2 is merged with the partial complementary image generated and buffered within GPPL4 without using any depth value information stored in the Z buffers of these GPPLs; and (ii) the partial complementary image recomposited and buffered within GPPL6 is merged with the partial complementary image generated and buffered within GPPL8 without using any depth value information stored in the Z buffers of these GPPLs.
  • During the third level of hierarchical image merging, the partial complementary image recomposited and buffered within GPPL4 is merged with the partial complementary image generated and buffered within GPPL8 (the primary GPPL) without using any depth value information stored in the Z buffers of these GPPLs, so as to generate a complete color image frame of the 3D scene within GPPL 8, without using any depth value information stored in the Z buffers of these GPPLs.
  • The Depthless Method of Recompositing Image Frames of a 3D Scene from Partial Complementary Images, Carried Out Over N Hierarchical Levels or Stages of Using Depthless Complementary Image Merging Operations
  • In FIG. 2E2, a generalized method of depthless recompositing of image frames of a 3D scene from partial complementary images is described using a parallel graphics processing platform having n GPPLs, wherein image merging occurs at $\log_2 n$ hierarchical levels or stages. At each level, pairs of source and target images are merged into a target image (25230). In general, the process can be carried out over n hierarchical levels of depthless complementary image merging operations, wherein at each (n−1)th level, pairs of source and target partial complementary images are merged into a target complementary image, for subsequent use at the nth level of processing, according to the principles of the present invention.
  • As indicated at Block 25232, the first step of the method involves the system commencing partial complementary image merge processing at the first hierarchical level.
  • At Block 25230, for each pair of source and target images, the system employs the process illustrated at FIG. 2E3 to calculate: $\mathrm{image}_{\mathrm{target}} = \mathrm{image}_{\mathrm{source}} + \mathrm{image}_{\mathrm{target}}$.
  • At Block 25234, the system determines whether or not the last hierarchical level is completed. If the last level is not completed, then at Block 25233, the system moves up or increments the recomposition hierarchy and returns to Block 25230 and performs the same operation for each pair of source and target images, namely: $\mathrm{image}_{\mathrm{target}} = \mathrm{image}_{\mathrm{source}} + \mathrm{image}_{\mathrm{target}}$. If at Block 25235, the system determines that the last level is completed, then at Block 25233, the system determines that the final image frame recomposition result is stored in the Color Buffer in the primary GPPL, ready for rendering transparent objects of the scene in a single GPPL, and subsequent display on the display devices supported by the system.
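  • The hierarchical loop described above can be sketched in software as follows. This is only an illustrative model, assuming that n is a power of two and that each partial image is a flat list of pixel values; the function names `merge` and `hierarchical_merge` and the returned pairing schedule are hypothetical and not part of the described system. The pairing rule is chosen so that, for eight GPPLs, it reproduces the source/target pairs of the eight-GPPL example (1 into 2, 3 into 4, 5 into 6, 7 into 8, then 2 into 4 and 6 into 8, then 4 into 8).

```python
def merge(source, target):
    """Depthless merge: per-pixel addition of two complementary images."""
    return [s + t for s, t in zip(source, target)]

def hierarchical_merge(images):
    """Merge n = 2**k partial images in k levels.

    Returns (final_image, schedule), where schedule lists, per level,
    the (source, target) pairs using 1-based GPPL numbers.
    """
    n = len(images)
    assert n > 0 and n & (n - 1) == 0, "this sketch assumes n is a power of two"
    buffers = list(images)
    schedule = []
    stride = 1
    while stride < n:                        # one iteration per hierarchical level
        level = []
        for target in range(2 * stride - 1, n, 2 * stride):
            source = target - stride
            # image_target = image_source + image_target
            buffers[target] = merge(buffers[source], buffers[target])
            level.append((source + 1, target + 1))
        schedule.append(level)
        stride *= 2
    return buffers[n - 1], schedule          # result sits in the last (primary) GPPL

# Hypothetical input: GPPL i holds the single visible pixel i of an 8-pixel image.
partials = [[(i + 1) if x == i else 0 for x in range(8)] for i in range(8)]
final_image, schedule = hierarchical_merge(partials)
```

  • Because the inputs are complementary, plain addition suffices at every level, and the number of levels equals $\log_2 n$.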
  • The Complementary Image Merging Process of the Present Invention Carried Out Between a Pair of Partial Complementary Images Buffered in the Color Buffers of a Pair of GPPLs
  • FIG. 2E3 illustrates the complementary image merging process carried out between a pair of partial complementary images buffered in the color buffers of a single pair of GPPLs. The addition of all pixel values in the source image (tex2) and the target image (tex1) occurs within the target GPPL using its pixel shader processor running the shader merge code (25346). Notably, in hierarchical processes, the image merge result (tex1) may become the source image (tex2) for the next hierarchical step in the multi-level complementary image merging process of the present invention.
  • As indicated at Blocks 25342 and 25343, partial complementary-type color images are rendered in the target and source GPPLs, according to the principles of the present invention, and stored in the color Frame Buffer of the GPPLs.
  • As indicated at Blocks 25344 and 25345, the partial complementary-type color images are copied from the color Frame Buffer in the target and source GPPLs, into their respective texture memory, and indicated as “tex1” and “tex2” images, respectively.
  • As indicated at Block 25346 in the target GPPL, the Shader's merge code (program) is downloaded and run using “tex1” and “tex2” images, and performs the operations indicated at Blocks 25347 through 25350, which will be described below.
  • As indicated at Block 25347, the merge code program analyzes the next x,y location in the “tex1” and “tex2” images, and at Block 25348, for each set of corresponding x,y values in these tex1 and tex2 images, the merge code program makes a new pixel value according to the formula: $\mathrm{PXL}_{tex}(x,y) = \mathrm{PXL}_{tex1}(x,y) + \mathrm{PXL}_{tex2}(x,y)$.
  • At Block 25349, the program determines whether or not all of the x,y locations of the image have been recomposited, and if not, then the process returns to Block 25347 and repeats the pixel merging process for the next x,y image frame location. If all x,y locations in the image frame have been processed (i.e. merged), then the program moves the merged image tex1 to the color buffer in the primary GPPL, and the process is completed for the particular image frame being generated for display.
  • There are various ways of and means for practicing the method of parallel graphics processing according to the present invention, illustrated in FIGS. 2A through 2E3. Four illustrative embodiments of the method and system of the present invention will be described in detail below. Thereafter, various system architectures for implementing the method and system of the present invention will be specified in great detail.
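  • The per-pixel merge performed by the shader merge code can be modeled on the CPU as shown below. This is a sketch under stated assumptions, not shader code: textures are assumed to be 2D lists of RGB tuples, and the function name `merge_textures` and its optional `check` flag (which rejects inputs that are not complementary) are additions made only for illustration.

```python
def merge_textures(tex1, tex2, check=True):
    """Return a new image with PXL(x,y) = PXL_tex1(x,y) + PXL_tex2(x,y) per channel.

    tex1, tex2: equally sized 2-D lists of (r, g, b) tuples.
    check:      if True, raise when both pixels at a location are non-black,
                since the inputs would then not be complementary.
    """
    merged = []
    for row1, row2 in zip(tex1, tex2):
        out_row = []
        for p1, p2 in zip(row1, row2):
            if check and any(p1) and any(p2):
                raise ValueError("images are not complementary at this pixel")
            out_row.append(tuple(a + b for a, b in zip(p1, p2)))
        merged.append(out_row)
    return merged

# Hypothetical 2x1 textures: each one holds a different visible pixel.
tex1 = [[(255, 0, 0), (0, 0, 0)]]
tex2 = [[(0, 0, 0), (0, 0, 255)]]
merged = merge_textures(tex1, tex2)
```

  • No depth values appear anywhere in this routine; the optional check only documents the precondition that the earlier Z-tested rendering is expected to guarantee.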
  • Overview on Different Methods of Implementing Global Depth Maps (GDMs) within the GPPLs of Parallel Graphics Processing Systems
  • In accordance with the principles of the present invention, there are four illustrative methods of providing a global depth map (GDM) to the Z Buffer of the GPPLs of a parallel graphics processing system, namely: (i) a first method called “the Special GDM-Creation Pass version”, wherein all Z values are distributed to all GPUs during a special single first pass (i.e. “Global Depth Map Creation Pass”) performed at the beginning of each frame, so as to generate a GDM for the image frame, stored within the Z buffer of each GPPL; (ii) a second method called “the Special GDM-Creation Pass, with color rendering of debuted objects in selected GPU,” which is a variation of the ‘GDM Creation Pass’ method described above, wherein the difference is that the Global Depth Pass also includes normal color rendering of each debuted object in a selected GPU, in addition to the updating of the Global Depth Map (GDM) in all GPUs;
  • (iii) a third method called the “Regular Course GDM Creation version”, wherein the Z values of each object are distributed to their designated/assigned GPUs during the regular course of normal rendering in a graphics application; and (iv) a fourth method called the “Application Provided GDM version”, wherein the graphics application generates a GDM for its own purposes, e.g. for Shadow Volumes, and provides the GDM to the GPPLs for use in graphics rendering operations in accordance with the principles of the present invention.
  • First Illustrative Embodiment of the Method of Parallel Graphics Processing According to the Present Invention
  • In FIG. 3A1, a first illustrative embodiment of the method of parallel graphics processing according to the present invention is shown and described as comprising three primary steps, indicated at Blocks 3111, 3112 and 3113 in FIG. 3A1.
  • As indicated at Block 3111, during the first special rendering pass (i.e. the GDM Creation Pass) 3111, a global depth map (GDM) is generated within each GPPL, by a process involving the broadcasting of graphics commands and data to all GPPLs equally, for pixel depth (z) testing. This first special rendering pass occurs once, for each image frame to be rendered, during the multi-pass graphics rendering method of the present invention. As will be described in greater detail hereinafter, the first special GDM creation pass indicated at Block 3111 in FIG. 3A1 employs an object tracking mechanism comprising a current state buffer 4111, 5111, and a hash table of states (4112, 5112), illustrated in FIGS. 3A3(a) and 3A3(b). The current state buffer is used to hold the current state, and is updated by draw commands and state commands. The Hash table of states is used to register the first appearance of all objects (i.e. each entry in the hash table is considered a full state of an object).
  • As indicated at Block 3112 in FIG. 3A1, during subsequent passes, a complementary-type partial image is generated in each GPPL using the GDM and the object-division based parallel rendering process according to the present invention.
  • As indicated at Block 3113, after the final pass, a complete image frame of the 3D scene is recomposited using the depthless complementary image recomposition process of the present invention illustrated in FIGS. 2D3, 2E1, 2E2 and 2E3.
  • The First Illustrative Embodiment of the Method of the Present Invention, Carried Out on a Dual-GPU Embodiment of the Parallel Graphics Processing System of the Present Invention
  • FIG. 3A2 illustrates the graphics pipeline activity along three primary stages of the first illustrative embodiment of the method of the present invention, carried out on a dual-GPU embodiment of the parallel graphics processing system of the present invention. During the first stage indicated at Block 3121, the GDM creation pass is performed and the GDM is provided within the Z buffer of each GPU. During the second stage indicated at Block 3122, multiple rendering passes are performed and the partial complementary-type color images are generated in the Color Buffers of the GPUs. Then, during the third stage indicated at Block 3123, depthless compositing is performed. During this stage, decomposition of objects and load balancing are controlled by a software based decomposition module residing in the host, as will be described in greater detail hereinafter. For simplicity of explication, the following example considers the case of a parallel graphics processing system employing only two GPUs; however, it can be extended to any number of GPUs.
  • As shown, the first stage at Block 3121 involves, during the special rendering pass (i.e. GDM creation pass), providing a Global Depth Map (GDM) to the Z buffer of each GPPL. During this first phase of the method, graphics commands and data are transmitted to all GPPLs (i.e. equally broadcast to both GPUs) for all objects in the frame of the 3D scene to be rendered, as indicated by the broken-line arrows. The goal is to perform the Z-test on all objects and populate the Z-buffer (3126) in each GPU without any drawing into the color buffer. The final result of this pass is the GDM stored in the Z-buffer, as a reference for all Z-tests in the subsequent passes.
  • When carrying out the method of the present invention, each entry of the Hash Table (3132) in FIG. 3A3 holds the state of a primitive (object), which is not assigned to any GPU, for tracking the “appearance(s)” of object primitives. The Current State Buffer (3131) is provided for storing a draw command.
  • As used herein, the term primitive object, or simply “object”, refers to a group of one or more primitive graphics elements, drawn by a single draw call. A primitive graphics element generally refers to a basic shape, such as a point, line, or triangle. The appearance of the object is defined by the state of the object, which includes information on its vertex array, index array, vertex shader parameters, pixel shader parameters, transformation matrix, skinning transformation matrix, and state parameters (e.g. RenderState-blending related, SamplerState-filter, etc.). The entire state defines the exact appearance of the object in the scene. For example, the same character (e.g. soldier), geometrically defined by given vertex and index buffers, can appear in a graphics game several times in various locations and forms by just modifying its transformation matrix, i.e. modifying its state.
  • The state of an object is shaped by two commands: the State command, and the Draw Primitive command. The current state of an object is an accumulation of these two commands. The appearance of an object in the stream of geometric data is considered a first appearance (or debut) only if this exact state did not occur (i.e. happen) before in the system. An additional appearance of an object is considered a successive appearance if, and only if, it appears in exactly the same state as it had before. A modified state creates another first appearance of the object.
  • This first pass creates global depth maps (GDMs) in all GPUs by delivering the depth value of each object to the Z buffers. The depth value of an object is registered in the global depth map (GDM) for only the first appearance of an object. Therefore, during this first GDM creation pass, where no color rendering occurs (i.e. writing into the Color FB is disabled), all draw commands are scanned for the first appearance of each object, which is represented by the current State Buffer (4111). While the State Buffer is being registered in an entry of the Hash Table (4112), the object is sent to all GPUs for Z-testing and updating of the Global Depth Map in the Z-buffers. Upon completion of this pass of objects' debuts, all GPUs hold the global depth map. The successive passes keep behaving according to the original application's schedule.
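  • The distinction between a first appearance and a successive appearance can be sketched with a small tracker that keeps a current state buffer and a hash table of registered states. This is a hedged illustration only: the class name `ObjectTracker`, its method names, and the representation of a state as a dictionary of hashable values are assumptions, not the data structures of the described system.

```python
class ObjectTracker:
    """Tracks object states to tell a debut from a successive appearance."""

    def __init__(self):
        self.current_state = {}   # models the current state buffer
        self.seen_states = set()  # models the hash table of registered states

    def state_command(self, key, value):
        """Accumulate a State command into the current state."""
        self.current_state[key] = value

    def draw_primitive(self, geometry_id):
        """Accumulate a Draw Primitive command; return True on a first appearance."""
        self.current_state["geometry"] = geometry_id
        snapshot = frozenset(self.current_state.items())  # the full, exact state
        if snapshot in self.seen_states:
            return False          # same exact state seen before: successive appearance
        self.seen_states.add(snapshot)
        return True               # new exact state: first appearance (debut)

# Hypothetical usage: the same soldier geometry drawn under two transforms.
tracker = ObjectTracker()
tracker.state_command("transform", (1, 0, 0))
first = tracker.draw_primitive("soldier")    # debut
again = tracker.draw_primitive("soldier")    # identical state: successive
tracker.state_command("transform", (0, 1, 0))
moved = tracker.draw_primitive("soldier")    # modified state: another debut
```

  • Hashing the whole state snapshot mirrors the rule that only an exactly identical state counts as a repeat; any modified parameter yields a new entry.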
  • As indicated at Block 3122, the second stage involves, during subsequent passes, generating complementary-type partial images within the color buffer of each GPPL. This step involves using the GDM and the Z Test Filter in each GPU, and transmitting graphics commands and data to only assigned GPPLs, as indicated by the solid-line arrows. During such subsequent rendering passes, the scene is decomposed between GPUs. The exact decomposition of objects may change from pass to pass, according to dynamic load balance considerations. Each GPU renders incoming objects (3128) into its Color Buffer (3130), using the graphics commands and data associated with the objects, while z-testing the pixel depth values of each object against the GDM stored in the Z-buffer 3129.
  • As indicated at Block 3123, the third or last phase involves recompositing a complete image frame within the primary GPPL (i.e. GPU1), from the complementary-type partial images stored in the color buffers of GPPL1 and GPPL2, using the depthless recomposition process of the present invention, described hereinabove. This depthless recompositing process involves moving the complementary partial image in the secondary color buffer, into the primary color buffer of GPU1 and merging these partial images in accordance with the principles of the present invention, and then displaying the partial image fragments.
  • The First Illustrative Embodiment of the Method of Parallel Graphics Processing According to the Present Invention
  • FIG. 3A4 illustrates the first illustrative embodiment of the method of parallel graphics processing according to the present invention depicted in FIG. 3A1. In this illustrative embodiment, the single specialized GDM creation pass is carried out in Block 3121. Thereafter, during subsequent passes, the pixels of objects assigned to a GPPL are normally rendered in color within the GPPL by the steps indicated at Block 3122. Then, by way of the steps within Block 3123, the partial color complementary-type images are recomposited within the primary GPU, and the fully composited image within the primary GPU is displayed on the display device. The details of this process will now be described with reference to FIG. 3A4.
  • During the Blocks 31401 through 31412, the extra pass, called Global Depth Pass, is performed so as to create the GDM in all GPUs. As indicated in FIG. 3A4, the first Block 3121 in FIG. 3A1 is realized by Blocks 31401 through 31412.
  • As indicated at Block 31401, the pass starts by initializing the color buffers with black color values and scanning all the graphics commands for the frame to be rendered from the 3D scene, from a specified viewing direction. At Block 31402, the CPU analyzes the stream of commands associated with the image frame to be rendered. When the end of the command stream is detected, the process moves to the multi-pass rendering stage 3122; as long as the end of the command stream is not detected, the process proceeds to Block 31403. When a ‘State command’ is encountered at Block 4203, it is used to update (4204) the current state buffer (4111). When a ‘Draw primitive’ command is found at Block 4205, the current state of the object is updated in the state buffer at Block 31406, and the Hash Table is scanned at Block 31407 for the appearance of the object. The object can be found in the Hash Table only if this is not its first appearance. In this case, the object is abandoned and the command stream examination resumes. If the current state is not in the Hash Table, the object's state in the Hash Table is updated at Block 31408. Then, at Block 31409, the “Disable Write” command is generated for the Color Frame Buffer (FB), and at Block 31410, the Disable Write command is sent to all GPUs. Then, at Block 31411, the Draw Primitive Command is broadcast to all GPUs, and at Block 31412, the object is colorlessly rendered in all GPUs (i.e. in black, which was the initialized color set at Block 31401). The result is an update of the object's depth in the Global Depth Map in all GPUs, while the color Frame Buffer remains clear.
  • Upon completion of the Global Depth Pass, all the Z-buffers hold a complete Global Depth Map (GDM) based on depth values of all objects in the frame. From this point forward during the method, the GDM is used as a common reference for depth or z value testing.
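  • The first-appearance test of the Global Depth Pass can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the specification: the `Gpu` class, the tuple-based command encoding and the function name `global_depth_pass` are hypothetical, and a Python set stands in for the Hash Table.

```python
from dataclasses import dataclass, field


@dataclass
class Gpu:
    """Hypothetical stand-in for one GPU; records draws that updated its Z-buffer."""
    depth_only_draws: list = field(default_factory=list)


def global_depth_pass(commands, gpus):
    """Scan the command stream once and broadcast each debut object to all GPUs.

    `commands` is an assumed encoding: ("state", {...}) or ("draw", object_name).
    Color writes are taken to be disabled for every draw issued here.
    """
    current_state = {}   # plays the role of the current State Buffer
    seen_states = set()  # plays the role of the Hash Table
    for kind, payload in commands:
        if kind == "state":
            current_state.update(payload)
        elif kind == "draw":
            current_state["object"] = payload
            key = frozenset(current_state.items())
            if key in seen_states:
                continue  # not a first appearance: the object is abandoned
            seen_states.add(key)
            for gpu in gpus:  # broadcast for Z-testing only
                gpu.depth_only_draws.append(payload)
    return seen_states


commands = [
    ("state", {"texture": 1}), ("draw", "A"), ("draw", "A"),
    ("state", {"texture": 2}), ("draw", "B"), ("draw", "B"),
]
gpus = [Gpu(), Gpu()]
registered = global_depth_pass(commands, gpus)
```

  • In this sketch each repeated draw with an unchanged state is skipped, so every GPU receives each debut object exactly once.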
  • When the end of the command stream is detected at Block 31402, the stream of commands is scanned again from the beginning, as indicated at Block 31415. Upon detection of a drawing command at 4215, objects are distributed among GPUs based on any possible load balancing scheme. At this step, there is no need to check the Hash Table. For every Draw Command, a load balance is calculated at Block 4216, and a GPU is chosen for the object. Finally, at Block 4218, the object is normally rendered in that GPU. The above sequence repeats for any number of passes required to render the frame.
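  • The specification leaves the load balancing scheme open. As one possible illustration only, the sketch below assigns each draw command to the currently least-loaded GPU; the greedy policy, the per-object cost estimate and the name `distribute_draws` are assumptions made for this example.

```python
def distribute_draws(draws, num_gpus):
    """Assign each (object, estimated_cost) pair to the least-loaded GPU.

    Greedy least-loaded selection is an assumed policy; the text allows any
    load balancing scheme. Ties go to the lowest GPU index.
    """
    load = [0] * num_gpus
    assignment = []
    for obj, cost in draws:
        gpu = min(range(num_gpus), key=load.__getitem__)
        load[gpu] += cost
        assignment.append((obj, gpu))
    return assignment, load


assignment, load = distribute_draws([("a", 5), ("b", 3), ("c", 2), ("d", 4)], 2)
```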
  • The next step, at Block 31418, involves making hierarchical merges of the partial complementary-type images in all GPU color buffers. For a number of GPUs greater than two, the recomposition process starts with partial merges among GPUs, performed in a hierarchical way, and finishes with a final merge in the primary GPU. Specifically, at Block 31419, the final merge of partial complementary images occurs in the color buffer of the primary GPU. Then at Block 31420, transparent objects (e.g. flames) and overlays (e.g. scores in computer games) are rendered in the primary GPU on top of the composited color buffer, by the graphics-based application. Finally, at Block 31421, the image is moved out to the display unit.
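  • A hierarchical merge of this kind can be sketched as a pairwise reduction that ends in the primary GPU (index 0). The sketch assumes that merging two complementary partial images is a per-pixel addition, as in the depthless recomposition process of the specification, and represents each color buffer as a flat list of numbers; the function names are hypothetical.

```python
def merge_pair(target, source):
    """Add the source partial image into the target, pixel by pixel."""
    return [t + s for t, s in zip(target, source)]


def hierarchical_merge(buffers):
    """Reduce all color buffers into buffer 0 by pairwise merges.

    Step 1 merges neighbors (0<-1, 2<-3, ...), step 2 merges the results
    (0<-2, 4<-6, ...), and so on until only the primary buffer remains.
    """
    buffers = list(buffers)
    stride = 1
    while stride < len(buffers):
        for i in range(0, len(buffers) - stride, 2 * stride):
            buffers[i] = merge_pair(buffers[i], buffers[i + stride])
        stride *= 2
    return buffers[0]


final = hierarchical_merge([[1, 0, 0], [0, 2, 0], [0, 0, 3]])
```

  • With N buffers the reduction takes about log2(N) merge steps, and the merges within one step are independent of each other.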
  • Second Illustrative Embodiment of the Method of Parallel Graphics Processing According to the Present Invention
  • FIG. 3B1 is a high-level flow chart illustrating a second illustrative embodiment of the method of parallel graphics processing according to the present invention. This method is a variation of the ‘GDM Creation Pass’ method described above, wherein the difference is that the Global Depth Pass includes also normal color rendering of each debuted object in selected GPU, in addition to the updating of the Global Depth Map (GDM) in all GPUs.
  • As indicated at Block 3211, a first special rendering pass (i.e. GDM Creation Pass) involves (i) generating a global depth map (GDM) within each GPPL, by broadcasting graphics commands and data for all objects to all GPPLs, (ii) rendering without color (i.e. in black) the pixels of objects sent to non-assigned GPPLs, and (iii) rendering in color the pixels of all objects sent to assigned-GPPLs.
  • As indicated at Block 3212, during subsequent passes, the method continues by generating complementary-type partial images in each GPPL using the GDM and the object-division based parallel rendering process according to the present invention.
  • As indicated at Block 3213, after the final rendering pass, the method concludes by recompositing a complete image frame of the 3D scene using the depthless complementary image recomposition process of the present invention, illustrated in FIGS. 2D3, 2E1, 2E2 and 3E3.
  • Second Illustrative Embodiment of the Method of the Present Invention Carried Out on a Dual-GPU Embodiment of the Parallel Graphics Processing System of the Present Invention
  • FIG. 3B2 is a schematic representation illustrating the three primary stages of the second illustrative embodiment of the multi-pass parallel graphics processing method of the present invention, carried out on a dual-GPU embodiment of the parallel graphics processing system of the present invention. For clarity of illustration, the following specification addresses the case of using only two GPUs; however, it is understood that the method can be practiced on a parallel graphics processing system supporting any number of GPUs. Within the system, de-compositing of objects and load balancing are controlled by the software-based Decomposition module residing in the host system.
  • The first stage indicated at Block 3221 involves, during a first special pass (i.e. GDM creating pass), (i) generating a global depth map (GDM) within each GPPL, by broadcasting graphics commands and data for all objects to all GPPLs indicated by solid-line and broken-line arrows, (ii) rendering without color (i.e. in black) the pixels of objects sent to non-assigned GPPLs indicated by the dotted-line arrows, and (iii) rendering in color the pixels of all objects sent to assigned-GPPLs indicated by solid-line arrows.
  • During the first special GDM creation pass represented at Block 3221 in FIG. 3B2, all objects of the scene are delivered to each GPU. During this stage, the objects are separated into two classes: objects that are assigned to the GPU, indicated by black arrows, and objects that are not assigned to the GPU (i.e. assigned to other GPUs), indicated by broken-line arrows. The Z-test is performed equally on both classes of objects, while drawing to the color buffer is done selectively. The Z-buffer is populated by z-tested depth values of all fragments, for assigned objects as well as non-assigned objects. The partial image fragments of assigned objects are drawn in color within the color buffer, whereas partial image fragments of non-assigned objects are drawn without color (black). The final result of this rendering pass is that (i) the Z Buffer of each GPU holds a GDM in its final state, whereas (ii) the Color Buffer of each GPU holds a complementary-type partial color image in its preliminary state.
  • The second stage indicated at Block 3222 involves performing multiple rendering passes, wherein during each subsequent rendering pass, a complementary-type partial image is generated within the color buffer of each GPPL using the GDM and the Z Test Filter, and graphics commands and data are transmitted to only assigned GPPLs, indicated by solid-line arrows. During each such rendering pass, the objects of the scene are decomposed between GPUs. The exact decomposition of objects may change from rendering pass to rendering pass, according to dynamic load balance considerations. Each GPU renders its incoming objects into its color buffer, while performing the z-test against the GDM in its Z-buffer.
  • The third stage indicated at Block 3223 is a stage of depthless recomposition, wherein, after the final rendering pass, a complete image frame is recomposited within the primary GPPL. This stage is performed using the complementary-type partial images stored in the color buffers of GPPL1 and GPPL2, and the depthless recomposition process of the present invention. During the recomposition process, all of the images in the frame buffers of the GPUs are scanned, pixel by pixel, and at each x,y coordinate, the color values of all GPUs are summed up and the result PXLfinal(x,y) (from GPU2) is moved to the x,y location of the final image in the primary image buffer (i.e. GPU1). The final image is completed when all pixels are scanned. In a merging process involving only a single pair of GPPLs, as illustrated in FIG. 3B2, the addition of all pixels of the source (tex2) and target (tex1) images occurs in the target GPPL (i.e. GPU1), by means of its pixel shader processor, running the shader's merge code. The merge result remains in the target GPPL, which may become a source for the next hierarchical step. In the case of the 2 GPU platform, GPU1 is the target GPPL, and the composited image in its color buffer is moved to the display device for display.
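  • The reason a plain per-pixel sum suffices can be seen in a small simulation. The sketch below is illustrative only and rests on several assumptions: images are one-dimensional rows of pixels, colors are single integers with 0 standing for black, each object has one constant depth, and the names `render_gpu` and `recomposite` are hypothetical. Every GPU z-tests all objects, but writes color only for its assigned objects and black for the others.

```python
FAR = float("inf")


def render_gpu(objects, assigned, width):
    """Render one GPU's complementary partial image.

    `objects` is a list of (name, depth, color, covered_pixels). All objects
    update the Z-buffer; only assigned objects contribute color, while
    non-assigned objects overwrite visible pixels with black (0).
    """
    zbuf = [FAR] * width
    color = [0] * width
    for name, depth, col, pixels in objects:
        for x in pixels:
            if depth < zbuf[x]:  # z-test passes: fragment is nearer
                zbuf[x] = depth
                color[x] = col if name in assigned else 0
    return color


def recomposite(color_buffers):
    """Depthless recomposition: sum the color values at each pixel."""
    return [sum(px) for px in zip(*color_buffers)]


scene = [("far", 5.0, 3, [1, 2, 3]), ("near", 1.0, 7, [0, 1, 2])]
gpu1 = render_gpu(scene, {"near"}, 4)
gpu2 = render_gpu(scene, {"far"}, 4)
merged = recomposite([gpu1, gpu2])
single_gpu = render_gpu(scene, {"near", "far"}, 4)
```

  • Because each visible pixel is colored in exactly one GPU and black in the others, the sum reproduces the image a single GPU would have rendered, without any depth comparison at merge time.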
  • The Second Illustrative Embodiment of the Method of Parallel Graphics Processing According to the Present Invention
  • FIGS. 3B4A and 3B4B illustrate the steps performed during the second illustrative embodiment of the method of parallel graphics processing according to the present invention depicted in FIG. 3B1. In this illustrative embodiment, the pixels of objects assigned to a GPPL are normally rendered in color within the GPPL, while pixels of objects not assigned to a GPPL are rendered colorlessly (i.e. in black).
  • The methods of FIGS. 3B4A and 3B4B differ from the method of FIG. 3A4, in that during the GDM Creation Pass, indicated at Block 3221, debuted objects are normally rendered in color in the color buffer of the selected GPU, in addition to the Global Depth Map (GDM) being updated in the Z buffers of all GPUs. These differences will become more apparent hereinafter.
  • As indicated at Block 32401 in FIGS. 3B4A and 3B4B, during the Blocks 31401 through 31412, the extra pass, called Global Depth Pass, is performed so as to create the GDM in all GPUs. As indicated in FIG. 3A4, the first Block 3121 in FIG. 3A1 is realized by Blocks 31401 through 31412.
  • As indicated at Block 31401, the pass starts by (initializing the color buffers with colorless values and) scanning all the graphics commands for the frame to be rendered from the 3D scene, from a specified viewing direction. At Block 31402, the CPU analyzes the stream of commands associated with the image frame to be rendered. When the end of the command stream is detected, the process moves to the multi-pass rendering stage 3222; as long as the end of the command stream is not detected, the process proceeds to Block 31403. When a ‘State command’ is encountered at Block 4203, it is used to update (4204) the current state buffer (4111). When a ‘Draw primitive’ command is found at Block 4205, the current state of the object is updated in the state buffer at Block 31406, and the Hash Table is scanned at Block 31407 for the appearance of the object. The object can be found in the Hash Table only if this is not its first appearance. In this case, the object is abandoned and the command stream examination resumes. If the current state is not in the Hash Table, the object's state in the Hash Table is updated at Block 31408. Then, at Block 31409, the load balance among the GPUs is calculated, the GPU selected,
  • Upon updating the Hash Table at Block 32408, a GPU is chosen according to any selected load balance scheme. In addition, the object is marked in the ‘Drawn’ list of debut objects (4309). This list helps to eliminate redundant drawing of objects that have already been drawn for the first time during the Global Depth Pass. A marked object is cleared from the list in successive passes, the first time it is called for rendering: this call is skipped while its entry in the list is cleaned up. The object is then sent for normal color rendering to the designated GPU (4310).
  • As indicated at Block 32414, the next step of the method involves broadcasting the object to the rest of the GPUs for the Global Depth Map (GDM) update in their Z buffers, and for drawing visible pixels in black into their color frame buffers. For this purpose, the current pixel shader program is adapted in these GPUs to the alpha status of the drawn object, namely, whether the object is to be drawn with transparencies (alpha) or without. Therefore, according to the status of an object's alpha test, determined at Block 4314, there are two possible modifications which are made to the pixel shader: (i) a modification of the pixel shader for an opaque (i.e. black) object, indicated at Block 32415; and (ii) a modification of the pixel shader for a semi-transparent object, indicated at Block 32416. After modification of the pixel shaders in the GPUs, the draw command for the object is broadcast to all GPUs (except the designated GPU), for the purpose of black rendering, as indicated at Block 3211. Then, as indicated at Block 32413, the original shaders in the GPUs are restored for regular color rendering.
  • Upon completion of the Global Depth Pass, all the Z-buffers in GPUs hold a complete Global Depth Map (GDM) for the image frame, based on depth values of all objects in the frame. This map is used as a common reference for depth tests performed in all successive rendering passes carried out in the multi-pass stage indicated at Block 3222.
  • As indicated at Block 32419, the stream of commands is then scanned from the beginning for successive rendering passes. At Block 32420, the end of the drawing passes is determined by detecting the end of the graphics command stream. When a Draw command is encountered at Block 32421, a search in the ‘list of debuted objects’ is performed at Block 32423, by determining whether the object is marked in the “Drawn” List. If the object is found in the List, then the entry in the List is cleared at Block 32422, rendering is skipped, and the next Draw command in line is handled. Otherwise, at Block 32424, a GPU is chosen according to load balance considerations. At Block 32425, the object's commands are sent to the designated GPU for normal color rendering, and the object is then normally rendered in that GPU. The above sequence repeats for any number of passes required to render the frame.
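  • The role of the “Drawn” List in the successive passes can be sketched as follows. This is a hypothetical illustration: objects are identified by name, a Python set stands in for the list of debuted objects, and `choose_gpu` represents whatever load balancing scheme is in use.

```python
def render_pass(draws, drawn_list, choose_gpu):
    """Return the (object, gpu) pairs actually rendered in this pass.

    An object marked in `drawn_list` was already rendered in color during the
    Global Depth Pass, so its first call here is skipped and its entry cleared.
    Later calls for the same object are rendered normally.
    """
    rendered = []
    for obj in draws:
        if obj in drawn_list:
            drawn_list.discard(obj)  # clear the entry and skip this one call
            continue
        rendered.append((obj, choose_gpu(obj)))
    return rendered


drawn = {"A", "B"}  # objects that debuted during the Global Depth Pass
rendered = render_pass(["A", "C", "A", "B"], drawn, lambda obj: 0)
```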
  • The next step, at Block 32430, involves making hierarchical merges of the partial complementary-type images in all GPU color buffers. For a number of GPUs greater than two, the recomposition process starts with partial merges among GPUs, performed in a hierarchical way, and finishes with a final merge in the primary GPU. Specifically, at Block 32431, the final merge of partial complementary images occurs in the color buffer of the primary GPU. Then at Block 32432, transparent objects (e.g. flames) and overlays (e.g. scores in computer games) are rendered in the primary GPU on top of the composited color buffer, by the graphics-based application. Finally, at Block 32433, the image is moved out to the display unit.
  • Third Illustrative Embodiment of the Method of Parallel Graphics Processing According to the Present Invention
  • FIG. 4A describes the third illustrative embodiment of the multi-pass parallel graphics processing method of the present invention, carried out on a parallel graphics processing system of the present invention.
  • As indicated at Block 411, the Global Depth Maps (GDMs) are generated in each of the GPUs during the regular course of a graphics application, instead of during an extra special pass (i.e. processing step) performed during the beginning of image frame processing. During each pass of the multi-pass rendering method, (i) global depth map (GDM) values are generated for each debuted object transmitted to each GPPL, (ii) the pixels of objects sent to non-assigned GPPLs are rendered without color (i.e. in black), and (iii) the pixels of all objects sent to assigned-GPPLs are rendered in color, thereby generating complementary-type partial images in each GPPL using the GDM.
  • As indicated at Block 412, after the final pass, a complete image frame of the 3D scene is recomposited using the depthless complementary image recomposition process illustrated in FIGS. 2D3, 2E1, 2E2 and 3E3.
  • The Method of the Present Invention Carried Out on a Dual-GPU Embodiment of the Parallel Graphics Processing System of the Present Invention
  • FIG. 4B illustrates the two primary stages of the third illustrative embodiment of the multi-pass parallel graphics processing method of the present invention, carried out on a dual-GPU embodiment of the parallel graphics processing system of the present invention.
  • As indicated at Block 421, the first multi-pass rendering stage involves (i) during each pass of the multi-pass method, generating global depth map (GDM) values for each debuted object transmitted to each GPPL indicated by solid-line arrows, (ii) rendering without color (i.e. in black) the pixels of objects sent to non-assigned GPPLs indicated in dotted-line arrows, and (iii), rendering in color the pixels of all objects sent to assigned-GPPLs indicated by broken-line arrows. In the illustrative embodiments, re-compositing of objects and load balancing are controlled by a software-based Decomposition module residing in the host.
  • This stage of multi-pass rendering includes the generation of the GDM as part of its regular multi-pass rendering process. Decomposed geometric data is sent to assigned or designated GPUs. However, any debuted object is also sent once to all other GPUs so that the object contributes its z value share to the GDM under development within the Z buffer. In each GPU, the GDM is generated as follows. GDM values are generated for each debuted object and transmitted to each GPU, while: (i) for objects sent to non-assigned GPUs, indicated by broken-line arrows, their pixels are normally z-tested and their depth values are stored in the z-buffer, while their fragments are rendered colorlessly in the color buffer; and (ii) for objects sent to assigned GPUs, indicated by solid-line arrows, their pixels are normally z-tested, their depth values are stored in the z-buffer and their fragments are rendered in color. For simplicity of explication, the illustrative embodiment employs only two GPUs, although it is understood that any number of GPUs can be supported on the parallel graphics processing platform of the present invention.
  • As indicated at Block 422, the second depthless compositing stage involves, after the final pass, recompositing a complete image frame within the primary GPPL, from the complementary-type partial images stored in the color buffers of GPPL1 and GPPL2, using the depthless recomposition process of the present invention. Thereafter, the completely composite image is moved from the primary GPU to the display device for display.
  • Third Illustrative Embodiment of the Method of Parallel Graphics Processing According to the Present Invention Depicted in FIG. 4A
  • FIGS. 4D1 and 4D2 illustrate the steps performed during the third illustrative embodiment of the method of parallel graphics processing according to the present invention depicted in FIG. 4A. In this illustrative embodiment, the pixels of objects assigned to a GPPL are normally rendered in color within the GPPL, while pixels of objects not assigned to a GPPL are rendered colorlessly (i.e. in black), and the contributions to the global depth map (GDM) for the frame are generated for each debuted object transmitted to each GPPL.
  • As illustrated in FIG. 4C, the current state of an object is kept updated in a “current state” buffer 431. At the debut of an object, this buffer is copied into the Hash Table 432. However, in contrast to the case of Global Depth Map Creation (GDM) Pass algorithm of FIG. 3A2, the state record of each object in the algorithm of FIG. 4C also includes the GPU number/index. The object is designated/assigned to the GPU for processing based on load balancing considerations, for all rendering passes.
  • As indicated at Block 4401, the process begins by scanning the graphics commands in a given frame of a 3D scene to be rendered for display. At Block 4402, determination of the end of the graphics command stream for the frame is monitored. When the end of the command stream is detected, process control moves to Block 422, involving partial image recompositing, in accordance with the principles of the present invention. The monitoring of state commands occurs at Block 4403, and the monitoring of Primitive Draw commands occurs at Block 4405, while current object state updating occurs at Block 4406, and current list updating occurs at Block 4404.
  • As indicated at Block 4406, the Current State Buffer 431 is updated by two classes of commands: State commands and Draw Primitive commands. The detection of a State command is followed by updating the State Buffer. The Draw Primitive command initiates the process of drawing a primitive. First, the primitive must be examined for its debut appearance. This is done by scanning the Hash Table 432 for the current state of a Draw Command (of an object). If the state of the object is found in the Hash Table, along with the designation of its GPU, this means that this object has appeared before for processing, and has been rendered within the GPUs according to the principles of the present invention. In such a case, the load balance is updated (as distinguished from the calculation of load balance, which is done only for the debut of an object), and the object is sent to its designated GPU for rendering. Otherwise, a GPU is selected according to load balance considerations, the Hash Table is updated, and the object (draw command) is sent to the designated GPU for normal/regular rendering.
  • As indicated at Block 4407, every incoming “Draw Primitive” command for an object is subject to the “first appearance test” which involves the matching of the Current State Buffer 431 to the Hash Table 432, illustrated in FIG. 4C. If a match is found to exist therebetween at Block 4407, then the object is sent to the designated GPU, and load balancing is updated at Block 4408.
  • However, if a match is not found to exist therebetween at Block 4407, then at Block 4424, load balance calculations are performed and the GPU is selected/designated, and at Block 4425, the Hash Table is updated by creating a new entry for the Current State Buffer in the Hash Table. At Block 4426, the Draw Command for the object is sent to the selected designated/assigned GPU, for normal color rendering. At the same time, the Draw Command for the object is broadcast to all other non-designated/assigned GPUs for (i) updating the global depth map (GDM) values in their Z buffers, and (ii) drawing black pixels of the object's silhouette in the color frame buffers (FBs) of these GPUs, by performing: alpha testing as indicated at Block 4428; required pixel shader modification as indicated at Blocks 4427 and 4429; black rendering in the color buffers of the rest of the GPUs, using the Draw Primitive command as indicated at Block 4430; rendering the object in the rest of the GPUs as indicated at Block 4431; and restoring the pixel shaders in each GPU to their original state as indicated at Block 4432 for normal/regular rendering at Block 4426.
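  • The two branches of the first appearance test can be sketched together. The sketch is a simplified illustration under stated assumptions: a dictionary mapping a state key to a GPU index stands in for the Hash Table with its GPU number/index field, the load is tracked as one number per GPU, the debut GPU is chosen greedily as the least loaded one, and `dispatch_draw` is a hypothetical name.

```python
def dispatch_draw(state_key, cost, hash_table, load):
    """Route one Draw Primitive command.

    Returns (designated_gpu, other_gpus), where `other_gpus` lists the GPUs
    that receive the black, depth-updating broadcast. The list is empty for a
    repeat object, which goes only to its designated GPU.
    """
    if state_key in hash_table:
        gpu = hash_table[state_key]  # repeat object: reuse the designation
        load[gpu] += cost            # load balance is updated, not recalculated
        return gpu, []
    # Debut object: select a GPU (assumed policy: least loaded) and register it.
    gpu = min(range(len(load)), key=load.__getitem__)
    hash_table[state_key] = gpu
    load[gpu] += cost
    others = [g for g in range(len(load)) if g != gpu]
    return gpu, others


table, load = {}, [0, 0, 0]
first_a = dispatch_draw("A", 4, table, load)
first_b = dispatch_draw("B", 1, table, load)
repeat_a = dispatch_draw("A", 4, table, load)
```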
  • At Blocks 4420 through 4423, the recomposition process is described in detail. Notably, the process is identical to the process described above at Blocks 32430 through 32433. Specifically, Block 4420 involves making hierarchical merges of the partial complementary-type images in all GPU color buffers. For a number of GPUs greater than two, the recomposition process starts with partial merges among GPUs, performed in a hierarchical way, and finishes with a final merge in the primary GPU. At Block 4421, the final merge of partial complementary images occurs in the color buffer of the primary GPU. Then at Block 4422, transparent objects (e.g. flames) and overlays (e.g. scores in computer games) are rendered in the primary GPU on top of the composited color buffer, by the graphics-based application. Finally, at Block 4423, the image is moved out to the display unit.
  • Fourth Illustrative Embodiment of the Method of Parallel Graphics Processing According to the Present Invention
  • FIG. 5A describes a fourth illustrative embodiment of the multi-pass method of parallel graphics processing according to the present invention. This illustrative embodiment of the method of the present invention is based on taking advantage of the GDM generated by a graphics application, and is intended to work only in a graphics application that generates a GDM for its own purposes at the beginning of each frame (e.g. for Shadow Volumes based graphics applications originally intended for single GPU-based systems). In each frame of Shadow Volume based graphics applications, the first pass (termed the Ambient Light pass) generates a depth map in the Z-buffer for all image fragments that are visible from the view point. All visible fragments in the color buffer are homogeneously dim colored during this pass. According to the parallelization strategy of the present invention, this depth map, which was originally intended for a single GPU, is simultaneously generated in all GPUs, and used as a GDM according to the principles of the present invention. Moreover, the homogeneously dim color buffer serves to prevent the obstructed object from appearing in the image: the obstructing objects of all GPUs are drawn as a colorless silhouette of the object in the color FB, as described hereinabove. The details regarding the Application Provided GDM algorithm of the present invention are described in the flowcharts of FIGS. 5D1 and 5D2.
  • As indicated at Block 511 in FIG. 5A, during a first special Ambient Light Pass of the multi-pass method, a global depth map (GDM) is generated within each GPPL by broadcasting all objects to all GPPLs for depth map creation in the Z buffers and colorless image creation within the color buffers.
  • As indicated at Block 512 in FIG. 5A, during subsequent passes, complementary-type partial images are generated in each GPPL using the GDM and the object-division based parallel rendering process according to the present invention (i.e. rendering without color (i.e. in black) the pixels of objects sent to non-assigned GPPLs, and rendering in color the pixels of all objects sent to assigned-GPPLs).
  • As indicated at Block 513 in FIG. 5A, after the final pass, a complete image frame of the 3D scene is recomposited using the depthless complementary image recomposition process of the present invention illustrated in FIGS. 2D3, 2E1, 2E2 and 3E3.
  • Fourth Illustrative Embodiment of the Method of the Present Invention, Carried Out on a Dual-GPU Embodiment Of The Parallel Graphics Processing System Of The Present Invention
  • FIG. 5B is a schematic representation illustrating the three primary stages of the fourth illustrative embodiment of the method of the present invention, carried out on a dual-GPU embodiment of the parallel graphics processing system of the present invention. As shown, de-compositing of objects and load balancing are controlled by the software-based Decomposition module residing in the host. For simplicity of illustration, the following embodiment considers a graphics processing platform having only two GPUs. However, it is understood that any number of GPUs may be supported on the platform to carry out the method.
  • As indicated at Block 521 in FIG. 5B, during a first special Ambient Light Pass of the multi-pass method, as part of Shadow Volume algorithm, a global depth map (GDM) is generated within each GPPL by broadcasting all objects to all GPPLs for depth map creation in the Z buffers and colorless image creation within the color buffers. Within each GPU, the objects are rendered, the pixels are z-tested and their depth values are stored into the z-buffer creating the GDM. Color buffers are disabled (or alternatively rendered colorlessly, depending on application).
  • As indicated at Block 522 in FIG. 5B, during subsequent rendering passes, complementary-type partial images are generated in each GPPL using the GDM and the object-division based parallel rendering process according to the present invention (i.e. rendering without color (i.e. in black) the pixels of objects sent to non-assigned GPPLs, and rendering in color the pixels of all objects sent to assigned-GPPLs, indicated in solid-line arrows). During such subsequent rendering passes, the scene is decomposed between GPUs, and each GPU is delivered its assigned objects. The exact decomposition of objects may change from pass to pass, according to dynamic load balance considerations. Each GPU renders its objects into color buffer, while performing z-test against the GDM in Z-buffer.
  • As indicated at Block 523 in FIG. 5B, after the final pass, a complete image frame of the 3D scene is recomposited by merging the complementary-type partial images stored in the color buffers of GPPL1 and GPPL2 using the depthless complementary image recomposition process of the present invention illustrated in FIGS. 3D3, 2E1, 2E2 and 3E3. Thereafter, the complete image in the primary GPU is displayed on the display device.
  • Fourth Illustrative Embodiment of the Method of Parallel Graphics Processing According to the Present Invention Depicted in FIG. 5A
  • FIGS. 5D1 and 5D2 are flowcharts illustrating the steps performed during the fourth illustrative embodiment of the method of parallel graphics processing according to the present invention depicted in FIG. 5A, wherein the pixels of objects assigned to a GPPL are normally rendered in color within the GPPL while pixels of objects not assigned to a GPPL are rendered colorlessly (i.e. in black).
  • As indicated at Block 5401 in FIGS. 5D1 and 5D2, the Ambient Light pass is made in all GPUs, generating a GDM in the z-buffers, and “black” rendering objects in the color buffers.
  • Blocks 5402 through 5417 constitute the light source pass, repeating for all light sources of the scene. For simplicity, only one occluder is considered per each light source. At Block 5402, the number of light sources in the 3D scene is monitored, and when all light sources have been rendered, then process control moves to Block 5420 in Stage 523.
  • As indicated at Block 5403, according to the prior art Shadow Volume algorithm, for the next light source, a shadow volume is calculated for each occluding object (“occluder”).
  • As indicated at Block 5404, front face and back face of the shadow volume are compared to the GDM to generate a shadow volume stencil, which is registered in the stencil buffer.
  • From Blocks 5405 through 5417, all objects of the scene are rendered for shadows in accordance with the stencil.
  • As indicated at Block 5408, the command stream is scanned looking for state commands for updating state buffer, and for draw commands at Block 5409.
  • At Block 5410, upon the occurrence of a draw command of an object, the current state buffer is updated.
  • At Block 5411, the presence of the current state is checked in the Hash Table. If the current state is not present in the Hash Table, then the load balance is calculated and a GPU is selected for the draw command.
  • At Block 5415, the Hash Table is updated, and at Block 5416, the Draw Command is sent to the designated GPU for normal rendering. A debut object must be assigned to a GPU in accordance with load balance considerations at Block 5414, registered in the Hash Table as indicated at Block 5415, and sent for rendering in the designated GPU as indicated at Blocks 5416 and 5417.
  • If at Block 5411 the current state is in the Hash Table (i.e. the object is a repeat object), then the GPU allocated to the object is looked up in the Hash Table at Block 5412, and at Block 5413, the Draw Command is sent to the designated GPU. Process control then advances to Block 5417, where the object is rendered in the designated GPU, and then returns to Block 5406.
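  • The handling of debut and repeat objects described above can be sketched as a small dispatcher. This is a minimal illustration under stated assumptions: the class name `ObjectDispatcher`, the representation of a state as a dictionary, the per-command `cost` value and the "least accumulated cost" load-balance rule are all hypothetical and are not specified in the text.

```python
# Illustrative sketch only; the load metric and state encoding are assumptions.
class ObjectDispatcher:
    def __init__(self, num_gpus):
        self.load = [0] * num_gpus   # assumed load metric: accumulated cost per GPU
        self.table = {}              # hash table: state key -> designated GPU index

    def dispatch(self, state, cost=1):
        """Return the GPU index that should receive the draw command."""
        key = tuple(sorted(state.items()))   # hashable key for the current state
        gpu = self.table.get(key)
        if gpu is None:
            # Debut object: pick the least-loaded GPU and register it.
            gpu = min(range(len(self.load)), key=self.load.__getitem__)
            self.table[key] = gpu
        # Repeat object: the GPU recorded in the table is reused as-is.
        self.load[gpu] += cost
        return gpu
```

  • Keeping a repeat object on its previously designated GPU means that the state already established there can be reused, while new states are steered toward the least-loaded GPU.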
  • When Block 5402 determines that all light rendering passes are completed, then the hierarchical merge of color buffers in the GPUs is performed at Block 5420 in Stage 523.
  • As indicated at Block 5421, the final/complete image frame is composited in the primary GPU.
  • At Block 5422, overlays and transparent objects are rendered in the primary GPU, and at Block 5423, the final image in the primary GPU is displayed on the display device.
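  • The hierarchical merge of color buffers into the primary GPU at Blocks 5420 and 5421 can be pictured as a pairwise tree reduction. The sketch below is an assumption-laden illustration: the function name `hierarchical_merge`, the use of buffer index 0 as the primary GPU, and the per-pixel addition used as the merge rule (valid only if non-assigned pixels are zero and at most one buffer is non-zero per pixel) are not taken from the text.

```python
# Illustrative sketch only; merge rule and buffer layout are assumptions.
def hierarchical_merge(color_buffers):
    """Merge per-GPU color buffers pairwise until buffer 0 (the assumed
    primary GPU) holds the complete image. No depth data is consulted."""
    bufs = [list(b) for b in color_buffers]
    stride = 1
    while stride < len(bufs):
        # At each level, buffer i absorbs buffer i + stride.
        for i in range(0, len(bufs) - stride, 2 * stride):
            bufs[i] = [a + b for a, b in zip(bufs[i], bufs[i + stride])]
        stride *= 2
    return bufs[0]
```

  • With N buffers this takes about log2(N) merge levels, and the merges within one level are independent of each other.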
  • PC-Based Host Computing System of the Present Invention Embodying an Illustrative Embodiment of the Parallel 3D Graphics Processing System (PGPS) of the Present Invention
  • The parallel 3D graphics processing system and method of the present invention can be practiced in diverse kinds of computing and micro-computing environments in which 3D graphics support is required or desired. Referring to FIGS. 6A through 6C, the parallel graphics processing system (PGPS) of the present invention will now be described in greater detail.
  • In FIG. 6A, there is shown a PC-based host computing system embodying an illustrative embodiment of the parallel 3D graphics processing system (PGPS) platform of the present invention, illustrated throughout FIGS. 2A through 5D. As shown, the PGPS comprises: (i) a Parallel Mode Control Module (PMCM); (ii) a Parallel Processing Subsystem for supporting the parallelization stages of decomposition, distribution and re-composition implemented using a Decomposition Module, a Distribution Module and a Re-Composition Module, respectively; and (iii) a plurality of GPU-based and/or CPU-based graphics processing pipelines (GPPLs) operated in a parallel manner under the control of the PMCM.
  • As shown, the PMCM further comprises an OS-GPU interface (I/F) and Utilities; Merge Management Module; Distribution Management Module; Distributed Graphics Function Control; and Hub Control, as described in greater detail in U.S. application Ser. No. 11/897,536 filed Aug. 30, 2007, incorporated herein by reference.
  • As shown, the Decomposition Module further comprises a Load Balance Submodule, and a Division Submodule, whereas the Distribution Module comprises a Distribution Management Submodule and an Interconnect Network.
  • Also, the Rendering Module comprises the plurality of GPPLs, whereas the Re-Composition Module comprises the Pixel Shader, the Shader Program Memory and the Video Memory (e.g. Z Buffer and Color Buffers) within each of the GPPLs cooperating over the Interconnect Network.
  • In FIG. 6B1, a first illustrative embodiment of a GPU-based graphics processing pipeline (GPPL) is shown for use in the PGPS of the present invention depicted in FIG. 6A. As shown, the GPPL comprises: (i) a video memory structure supporting a frame buffer (FB) including stencil, depth and color buffers, and (ii) a graphics processing unit (GPU) supporting (1) a geometry subsystem having an input assembler and a vertex shader, (2) a set up engine, and (3) a pixel subsystem including a pixel shader receiving pixel data from the frame buffer and raster operators operating on pixel data in the frame buffers.
  • In FIG. 6B2, a second illustrative embodiment of a GPU-based graphics processing pipeline (GPPL) is shown for use in the PGPS of the present invention depicted in FIG. 6A. As shown, the GPPL comprises: (i) a video memory structure supporting a frame buffer (FB) including stencil, depth and color buffers, and (ii) a graphics processing unit (GPU) supporting (1) a geometry subsystem having an input assembler, a vertex shader and a geometry shader, (2) a rasterizer, and (3) a pixel subsystem including a pixel shader receiving pixel data from the frame buffer and raster operators operating on pixel data in the frame buffers.
  • In FIG. 6B3, an illustrative embodiment of a CPU-based graphics processing pipeline (GPPL) is shown for use in the PGPS of the present invention depicted in FIG. 6A. As shown, the GPPL comprises: (i) a video memory structure supporting a frame buffer including stencil, depth and color buffers, and (ii) a graphics processing pipeline realized by one cell of a multi-core CPU chip, consisting of 16 in-order SIMD processors, and further including a GPU-specific extension, namely, a texture sampler that loads texture maps from memory, filters them for level-of-detail, and feeds them to the pixel processing portion of the pipeline.
  • In FIG. 6C, the pipelined structure of the parallel graphics processing system (PGPS) of the present invention is shown driving a plurality of GPPLs. As shown, the Decomposition Module supports the scanning of commands, the control of commands, the tracking of objects, the balancing of loads, and the assignment of objects to GPPLs. The Distribution Module supports transmission of graphics data (e.g. FB data, commands, textures, geometric data and other data) in various modes including CPU-to/from-GPU, inter-GPPL, broadcast, hub-to/from-CPU, and hub-to/from-GPPL. The Re-composition Module supports the merging of partial image fragments in the Color Buffers of the GPPLs in a variety of ways, in accordance with the principles of the present invention (e.g. merge color frame buffers without z buffers, merge color buffers using stencil assisted processing, and other modes of partial image merging).
  • Using the Parallel Graphics Processing System of the Present Invention to Implement the Various Embodiments of the Method of Parallel Graphics Processing According to the Principles of the Present Invention
  • The parallel graphics processing methods of the present invention, illustrated in FIGS. 2A through 5D can be practiced using diverse types of parallel computing platforms supporting a plurality or clusters of GPPLs, realized in many possible ways. However, for purposes of illustration, the four illustrative embodiments of the parallel graphics processing method of the present invention, illustrated in FIGS. 3A1 through 5D, will now be shown implemented using the architecture provided by the PGPS of the present invention shown in FIG. 6A, in which particular modules (e.g. Decomposition Module, Distribution Module, Rendering Module or Recomposition Module) are used to perform or carry out different stages and/or steps in each such parallel graphics processing method.
  • As shown in the flowcharts of FIGS. 7A1A and 7A1B, which correspond to the flow chart of FIG. 3A4, the modules in the system of FIG. 6A perform the following method steps: (i) the Decomposition Module carries out Blocks 3140 through 31409 and Blocks 31414 and 31416 in the methods of FIGS. 7A1A and 7A1B; (ii) the Distribution Module carries out Blocks 31410 through 31411 and Blocks 31417 and 31418 in the methods of FIGS. 7A1A and 7A1B; (iii) the Rendering Module carries out Block 31412 and Blocks 31420 through 31421 in the methods of FIGS. 7A1A and 7A1B; and (iv) the Recomposition Module carries out Block 31419 in FIGS. 7A1A and 7A1B.
  • In FIG. 7A2, the Decomposition and Distribution Modules are shown implemented within the host memory space (HMS), whereas the Rendering and Recomposition Modules are implemented by the GPUs.
  • As shown in the flowcharts of FIGS. 7B1A and 7B1B, which correspond to the flow charts of FIGS. 3B4A and 3B4B, the modules in the system of FIG. 6A perform the following method steps: (i) the Decomposition Module carries out Blocks 32401 through 32409, Blocks 32415 through 32416, Block 32413, and Blocks 32419 through 32424 in the methods of FIGS. 7B1A and 7B1B; (ii) the Distribution Module carries out Blocks 32425 through 32430 in the methods of FIGS. 7B1A and 7B1B; (iii) the Rendering Module carries out Blocks 32410 and 32411, and Blocks 32432 and 32433 in the methods of FIGS. 7B1A and 7B1B; and (iv) the Recomposition Module carries out Block 32431 in FIGS. 7B1A and 7B1B.
  • In FIG. 7B2, the Decomposition and Distribution Modules are shown implemented within the host memory space (HMS), whereas the Rendering and Recomposition Modules are implemented by the GPUs.
  • As shown in the flowcharts of FIGS. 7C1A and 7C1B, which correspond to the flow chart of FIG. 3A4, the modules in the system of FIG. 6A perform the following method steps: (i) the Decomposition Module carries out Blocks 4401 through 4408, Blocks 4420 and 4421, Blocks 4424 and 4425, and Blocks 4427 through 4429 in the methods of FIGS. 7C1A and 7C1B; (ii) the Distribution Module carries out Blocks 4409, 4426 and 4430 in the methods of FIGS. 7C1A and 7C1B; (iii) the Rendering Module carries out Block 4410, Blocks 4422 and 4423, and Block 4431 in the methods of FIGS. 7C1A and 7C1B; and (iv) the Recomposition Module carries out Block 4432 in FIGS. 7C1A and 7C1B.
  • In FIG. 7C2, the Decomposition and Distribution Modules are shown implemented within the host memory space (HMS), whereas the Rendering and Recomposition Modules are implemented by the GPUs.
  • As shown in the flowcharts of FIGS. 7D1A and 7D1B, which correspond to the flowcharts of FIGS. 5D1 and 5D2, the modules in the system of FIG. 6A perform the following method steps: (i) the Decomposition Module carries out Blocks 5405 through 5412, and Blocks 5414 and 5415 in the methods of FIGS. 7D1A and 7D1B; (ii) the Distribution Module carries out Blocks 5413 and 5416 in the methods of FIGS. 7D1A and 7D1B; (iii) the Rendering Module carries out Blocks 5401 through 5404, Block 5417, and Blocks 5422 and 5423 in the methods of FIGS. 7D1A and 7D1B; and (iv) the Recomposition Module carries out Blocks 5420 and 5421 in FIGS. 7D1A and 7D1B.
  • In FIG. 7D2, the Decomposition and Distribution Modules are shown implemented within the host memory space (HMS), whereas the Rendering and Recomposition Modules are implemented by the GPUs.
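  • The assignment of flowchart blocks to modules stated for FIGS. 7D1A and 7D1B can be captured as a lookup table, which makes it easy to check that no block is claimed by two modules. The block numbers below are those listed in the text for the fourth embodiment; the names `MODULE_BLOCKS_7D`, `build_block_index` and `BLOCK_TO_MODULE` are hypothetical.

```python
# Illustrative sketch only; block numbers come from the text, names are assumptions.
MODULE_BLOCKS_7D = {
    "Decomposition": list(range(5405, 5413)) + [5414, 5415],
    "Distribution": [5413, 5416],
    "Rendering": list(range(5401, 5405)) + [5417, 5422, 5423],
    "Recomposition": [5420, 5421],
}

def build_block_index(module_blocks):
    """Invert the module -> blocks table, rejecting blocks claimed twice."""
    index = {}
    for module, blocks in module_blocks.items():
        for block in blocks:
            if block in index:
                raise ValueError(
                    f"Block {block} claimed by {index[block]} and {module}")
            index[block] = module
    return index

BLOCK_TO_MODULE = build_block_index(MODULE_BLOCKS_7D)
```

  • For example, `BLOCK_TO_MODULE[5413]` identifies the module responsible for sending the Draw Command of a repeat object to its designated GPU.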
  • The First Illustrative Embodiment of the Parallel Graphic Processing System (PGPS) of the Present Invention
  • FIG. 8A shows a first illustrative embodiment of the PGPS of the present invention embodied within a host computing system capable of parallelizing the operation of multiple graphics processing pipelines (GPPLs). In general, the computing system comprises: CPU memory space for storing one or more graphics-based applications, and a graphics library for generating a stream of graphics commands and data (GCAD) during the execution of the graphics-based applications; one or more CPUs, in communication with the memory space, for (i) executing the graphics-based applications, (ii) generating the stream of graphics commands and data, and (iii) segmenting the stream of graphics commands into frames for rendering pixel-based images of a 3D scene generated by the graphics-based application, and wherein objects within the 3D scene are generated by processing the frames of graphics commands and data along the stream; and a Parallel Graphics Processing Subsystem (PGPS) supporting an object-division mode of parallel operation including at least four stages, namely, decomposition, distribution, rendering, and recomposition. The Parallel Graphics Processing Subsystem (PGPS) includes a Decomposition Module for supporting the decomposition stage of parallel operation, a Distribution Module for supporting the distribution stage of parallel operation, a Rendering Module for supporting the rendering stage of parallel operation, and a Recomposition Module for supporting the recomposition stage of parallel operation.
  • As shown, the Parallel Graphics Processing Subsystem also includes: (i) a plurality of graphic processing pipelines (GPPLs), including a primary GPPL, wherein each GPPL includes a color frame buffer and Z depth buffer; and (ii) a parallel mode control module (PMCM) for automatically controlling the object-division mode of parallel operation during the run-time of the graphics-based application, during which the GPPLs are driven in a parallelized manner.
  • As shown in FIG. 8A, the Parallel Mode Control Module (PMCM) 8201, the Decomposition Module 8202 and the Distribution Module 8203 of the Parallel Graphics Processing Subsystem reside as a software package in the Host Memory Space (HMS) 8200 of the CPU 8210. The Vendor's GPU drivers 8223 also reside on HMS 8200, along with the Graphics Applications 8221 and the Standard Graphics Library 8222. As shown, the multiple GPUs on external GPU cards (i) are connected to a North bridge circuit on a motherboard, (ii) implement the Rendering and Recomposition Modules, and (iii) are driven in a parallelized manner under the control of the PMCM.
  • During system operation, the Decomposition Module divides (i.e. splits up) the stream of graphic commands and data (GCAD) according to the parallelization mode that is implemented using an embodiment of the parallel multi-pass graphics processing method of the present invention, which may be selected from the group of processes illustrated in FIG. 7A, 7B, 7C or 7D. The Distribution Module uses the North bridge circuit to distribute graphic commands and data (GCAD) to the external GPUs. The Rendering Module generates complementary-type partial color images according to the parallel multi-pass graphics processing method of the present invention being used, e.g. as illustrated in FIGS. 7A through 7D. The Recomposition Module uses inter-GPU communication transport (e.g. via an Interconnect Network) to transfer the pixel data of the complementary-type partial images among the GPUs during the image recomposition stages. Finally, complementary-type partial color images are recomposited using the depthless image merging process of the present invention, described in great detail above, so as to generate a complete image frame of the 3D scene for display on the display device, connected to an external graphics card via a PCI-express interface.
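  • The segmentation of the graphics command stream into frames, which precedes the decomposition stage in this embodiment, can be sketched as follows. This is a minimal illustration: the function name `segment_frames`, the representation of a command as a tuple whose first element is its kind, and the use of a `"present"` command as the frame terminator are assumptions made for the example, since the text does not specify how frame boundaries are detected.

```python
# Illustrative sketch only; the frame-terminating command is an assumption.
def segment_frames(commands, frame_end="present"):
    """Split a stream of graphics commands into per-frame lists.

    Returns (complete_frames, pending), where pending holds commands seen
    after the last frame terminator."""
    frames, current = [], []
    for cmd in commands:
        current.append(cmd)
        if cmd[0] == frame_end:
            frames.append(current)
            current = []
    return frames, current
```

  • Each complete frame returned here would then be handed to the Decomposition Module, which divides its draw commands among the GPUs according to the selected parallelization mode.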
  • The Second Illustrative Embodiment of the Parallel Graphic Processing System (PGPS) of the Present Invention
  • FIG. 8B shows a second illustrative embodiment of the PGPS of the present invention embodied within a host computing system capable of parallelizing the operation of multiple graphics processing pipelines (GPPLs). In general, the computing system comprises: CPU memory space for storing one or more graphics-based applications, and a graphics library for generating a stream of graphics commands and data (GCAD) during the execution of the graphics-based applications; one or more CPUs, in communication with the memory space, for (i) executing the graphics-based applications, (ii) generating the stream of graphics commands and data, and (iii) segmenting the stream of graphics commands into frames for rendering pixel-based images of a 3D scene generated by the graphics-based application, and wherein objects within the 3D scene are generated by processing the frames of graphics commands and data along the stream; and a Parallel Graphics Processing Subsystem (PGPS) supporting an object-division mode of parallel operation including at least four stages, namely, decomposition, distribution, rendering, and recomposition. The Parallel Graphics Processing Subsystem (PGPS) includes a Decomposition Module for supporting the decomposition stage of parallel operation, a Distribution Module for supporting the distribution stage of parallel operation, a Rendering Module for supporting the rendering stage of parallel operation, and a Recomposition Module for supporting the recomposition stage of parallel operation.
  • As shown, the Parallel Graphics Processing Subsystem also includes: (i) a plurality of graphic processing pipelines (GPPLs), including a primary GPPL, wherein each GPPL includes a color frame buffer and Z depth buffer; and (ii) a parallel mode control module (PMCM) for automatically controlling the object-division mode of parallel operation during the run-time of the graphics-based application, during which the GPPLs are driven in a parallelized manner.
  • As shown in FIG. 8B, the Parallel Mode Control Module (PMCM) 8201, the Decomposition Module 8202 and the Distribution Module 8203 of the Parallel Graphics Processing Subsystem reside as a software package in the Host Memory Space (HMS) 8200 of the CPU. Also, the Vendor's GPU drivers 8223 reside on HMS 8200, along with the Graphics Applications 8221 and the Standard Graphics Library 8222. As shown, the Rendering and Recomposition Modules are realized across multiple GPUs connected to a bridge circuit on a motherboard (and having an internal IGD) and driven in a parallelized manner under the control of the PMCM.
  • During system operation, the Decomposition Module divides (i.e. splits up) the stream of graphic commands and data (GCAD) according to the parallelization mode that is implemented using an embodiment of the parallel multi-pass graphics processing method of the present invention, which may be selected from the group of processes illustrated in FIG. 7A, 7B, 7C or 7D. The Distribution Module uses the North bridge chip to distribute the graphic commands and data (GCAD) to the multiple GPUs located on the external graphics cards. The Rendering Module generates complementary-type partial color images according to the multi-pass parallel graphics processing method of the present invention being used, e.g. as illustrated in FIGS. 7A through 7D. The Recomposition Module uses inter-GPU communication transport to transfer the pixel data of the complementary-type partial images among the GPUs during the image recomposition stages. Finally, the complementary-type partial color images are recomposited using the depthless image merging process of the present invention so as to generate a complete image frame of the 3D scene for display on the display device, connected to one of the external graphics cards or the IGD.
  • The Third Illustrative Embodiment of the Parallel Graphic Processing System (PGPS) of the Present Invention
  • FIG. 8C shows a third illustrative embodiment of the PGPS of the present invention embodied within a host computing system capable of parallelizing the operation of multiple graphics processing pipelines (GPPLs). In general, the computing system comprises: CPU memory space for storing one or more graphics-based applications, and a graphics library for generating a stream of graphics commands and data (GCAD) during the execution of the graphics-based applications; one or more CPUs, in communication with the memory space, for (i) executing the graphics-based applications, (ii) generating the stream of graphics commands and data, and (iii) segmenting the stream of graphics commands into frames for rendering pixel-based images of a 3D scene generated by the graphics-based application, and wherein objects within the 3D scene are generated by processing the frames of graphics commands and data along the stream; and a Parallel Graphics Processing Subsystem (PGPS) supporting an object-division mode of parallel operation including at least four stages, namely, decomposition, distribution, rendering, and recomposition. The Parallel Graphics Processing Subsystem (PGPS) includes a Decomposition Module for supporting the decomposition stage of parallel operation, a Distribution Module for supporting the distribution stage of parallel operation, a Rendering Module for supporting the rendering stage of parallel operation, and a Recomposition Module for supporting the recomposition stage of parallel operation.
  • As shown, the Parallel Graphics Processing Subsystem also includes: (i) a plurality of graphic processing pipelines (GPPLs), including a primary GPPL, wherein each GPPL includes a color frame buffer and Z depth buffer; and (ii) a parallel mode control module (PMCM) for automatically controlling the object-division mode of parallel operation during the run-time of the graphics-based application, during which the GPPLs are driven in a parallelized manner.
  • As shown in FIG. 8C, the Parallel Mode Control Module (PMCM) 8201, the Decomposition Module 8202 and the Distribution Module 8203 of the Parallel Graphics Processing Subsystem reside as a software package in the Host Memory Space (HMS) 8200. The Vendor's GPU drivers 8223 also reside on HMS 8200, along with the Graphics Applications 8221, and the Standard Graphics Library 8222. As shown, a single GPU is supported on a CPU/GPU fusion-architecture processor die (alongside the CPU), and one or more GPUs are supported on one or more external graphic cards connected to a bridge circuit, and driven in a parallelized manner under the control of the PMCM. The Rendering and Recomposition Modules are realized across the GPUs on the graphics card(s).
  • During system operation, the Decomposition Module divides (i.e. splits up) the stream of graphic commands and data (GCAD) according to the parallelization mode that is implemented using an embodiment of the parallel multi-pass graphics processing method of the present invention, which may be selected from the group of processes illustrated in FIG. 7A, 7B, 7C or 7D. The Distribution Module uses the memory controller (controlling the HMS) and the interconnect network (e.g. crossbar switch) within the CPU/GPU processor chip to distribute graphic commands and data to the multiple GPUs on the CPU/GPU die chip and on the external graphics cards. The Rendering Module generates complementary-type partial color images according to the multi-pass parallel graphics processing method of the present invention being used, e.g. as illustrated in FIGS. 7A through 7D. The Recomposition Module uses inter-GPU communication transport on the graphics card, as well as memory controller and interconnect (e.g. crossbar switch) within the CPU/GPU processor chip, to transfer the pixel data of the complementary-type partial images among the GPUs during the image recomposition stages. Finally, the complementary-type partial color images are recomposited using the depthless image merging process of the present invention so as to generate a complete image frame of the 3D scene for display on the display device, connected to the external graphics card via a PCI-express interface connected to the bridge circuit.
  • The Fourth Illustrative Embodiment of the Parallel Graphic Processing System (PGPS) of the Present Invention
  • FIG. 8D1 shows a fourth illustrative embodiment of the PGPS of the present invention, embodied within a host computing system capable of parallelizing the operation of multiple graphics processing pipelines (GPPLs). In general, the computing system comprises: CPU memory space for storing one or more graphics-based applications, and a graphics library for generating a stream of graphics commands and data (GCAD) during the execution of the graphics-based applications; one or more CPUs, in communication with the memory space, for (i) executing the graphics-based applications, (ii) generating the stream of graphics commands and data, and (iii) segmenting the stream of graphics commands into frames for rendering pixel-based images of a 3D scene generated by the graphics-based application, and wherein objects within the 3D scene are generated by processing the frames of graphics commands and data along the stream; and a Parallel Graphics Processing Subsystem (PGPS) supporting an object-division mode of parallel operation including at least four stages, namely, decomposition, distribution, rendering, and recomposition. The Parallel Graphics Processing Subsystem (PGPS) includes a Decomposition Module for supporting the decomposition stage of parallel operation, a Distribution Module for supporting the distribution stage of parallel operation, a Rendering Module for supporting the rendering stage of parallel operation, and a Recomposition Module for supporting the recomposition stage of parallel operation.
  • As shown, the Parallel Graphics Processing Subsystem also includes: (i) a plurality of graphic processing pipelines (GPPLs), including a primary GPPL, wherein each GPPL includes a color frame buffer and Z depth buffer; and (ii) a parallel mode control module (PMCM) for automatically controlling the object-division mode of parallel operation during the run-time of the graphics-based application, during which the GPPLs are driven in a parallelized manner.
  • As shown in FIG. 8D1, the Parallelization Mode Control Module (PMCM) 8201, the Decomposition Module 8202 and Distribution Module 8203 of the Parallel Graphics Processing Subsystem reside as a software package in the Host Memory Space (HMS) 8200. Also, the Vendor's GPU drivers 8223 reside on HMS 8200, along with the Graphics Applications 8221, and the Standard Graphics Library 8222. As shown, a first cluster of the CPU cores on a multi-core CPU chip function as the CPU, while a second cluster of the CPU cores function as a plurality of multi-core graphics pipelines (GPPLs). As shown, the Rendering Module and the Re-composition Module are realized across a plurality of the GPUs on the external graphics cards. Some of the GPPLs implemented by the CPU cores may participate in the implementation of the Rendering and/or Recomposition Modules.
  • During system operation, the Decomposition Module divides (i.e. splits up) the stream of graphic commands and data (GCAD) according to the parallelization mode that is implemented using an embodiment of the parallel multi-pass graphics processing method of the present invention, which may be selected from the group of processes illustrated in FIG. 7A, 7B, 7C or 7D. The Distribution Module uses the bridge circuit and interconnect network within the multi-core CPU chip to distribute graphic commands and data (GCAD) to the multi-core graphic pipelines implemented on the multi-core CPU chip, as well as the GPUs on the external graphics cards. The Rendering Module generates complementary-type partial color images according to the multi-pass parallel graphics processing method of the present invention being used, e.g. as illustrated in FIGS. 7A through 7D. The Recomposition Module uses inter-GPU communication transport as well as the bridge and interconnect network within the multi-core CPU chip to transfer the pixel data of the complementary-type partial images among the GPPLs during the image recomposition stages. Finally, the complementary-type partial color images are recomposited using the depthless image merging process of the present invention so as to generate a complete image frame of the 3D scene for display on the display device connected to the primary GPPL (i.e. GPU) via a display interface.
  • The Fifth Illustrative Embodiment of the Parallel Graphic Processing System (PGPS) of the Present Invention
  • FIG. 8D2 shows a fifth illustrative embodiment of the PGPS of the present invention, embodied within a host computing system capable of parallelizing the operation of multiple graphics processing pipelines (GPPLs). In general, the computing system comprises: CPU memory space for storing one or more graphics-based applications, and a graphics library for generating a stream of graphics commands and data (GCAD) during the execution of the graphics-based applications; one or more CPUs, in communication with the memory space, for (i) executing the graphics-based applications, (ii) generating the stream of graphics commands and data, and (iii) segmenting the stream of graphics commands into frames for rendering pixel-based images of a 3D scene generated by the graphics-based application, and wherein objects within the 3D scene are generated by processing the frames of graphics commands and data along the stream; and a Parallel Graphics Processing Subsystem (PGPS) supporting an object-division mode of parallel operation including at least four stages, namely, decomposition, distribution, rendering, and recomposition. The Parallel Graphics Processing Subsystem (PGPS) includes a Decomposition Module for supporting the decomposition stage of parallel operation, a Distribution Module for supporting the distribution stage of parallel operation, a Rendering Module for supporting the rendering stage of parallel operation, and a Recomposition Module for supporting the recomposition stage of parallel operation.
  • As shown, the Parallel Graphics Processing Subsystem also includes: (i) a plurality of graphic processing pipelines (GPPLs), including a primary GPPL, wherein each GPPL includes a color frame buffer and Z depth buffer; and (ii) a parallel mode control module (PMCM) for automatically controlling the object-division mode of parallel operation during the run-time of the graphics-based application, during which the GPPLs are driven in a parallelized manner.
  • As shown in FIG. 8D2, the Parallelization Mode Control Module (PMCM) and the Decomposition and Distribution Modules of the Parallel Graphics Processing Subsystem reside as a software package in the Host Memory Space (HMS) of the CPU on the motherboard. The Vendor's GPU drivers also reside on HMS, along with the Graphics Applications and the Standard Graphics Library. As shown, a first cluster of CPU cores on the multi-core CPU chips on external graphics cards function as GPPLs and implement the Re-composition Module across a plurality of the GPPLs, whereas a second cluster of CPU cores function as GPPLs and implement the Rendering Module.
  • During system operation, the Decomposition Module divides (i.e. splits up) the stream of graphic commands and data (GCAD) according to the required parallelization mode. The Distribution Module uses the North bridge circuit and interconnect networks within the multi-core CPU chips (on the external cards) to distribute graphic commands and data (GCAD) to the multi-core graphic pipelines implemented thereon. The Rendering Module generates complementary-type partial color images according to a multi-pass parallel graphics processing method of the present invention. The Recomposition Module uses interconnect networks within the multi-core CPU chips to transfer the pixel data of the complementary-type partial images among the GPPLs during the image recomposition stages. Finally, the complementary-type partial color images are recomposited using the depthless image merging process of the present invention so as to generate a complete image frame of the 3D scene for display on the display device connected to the primary GPPL, via a display interface.
  • The Sixth Illustrative Embodiment of the Parallel Graphic Processing System (PGPS) of the Present Invention
  • FIG. 8E shows a sixth illustrative embodiment of the PGPS of the present invention, embodied within a host computing system capable of parallelizing the operation of multiple graphics processing pipelines (GPPLs). In general, the computing system comprises: CPU memory space for storing one or more graphics-based applications, and a graphics library for generating a stream of graphics commands and data (GCAD) during the execution of the graphics-based applications; one or more CPUs, in communication with the memory space, for (i) executing the graphics-based applications, (ii) generating the stream of graphics commands and data, and (iii) segmenting the stream of graphics commands into frames for rendering pixel-based images of a 3D scene generated by the graphics-based application, and wherein objects within the 3D scene are generated by processing the frames of graphics commands and data along the stream; and a Parallel Graphics Processing Subsystem (PGPS) supporting an object-division mode of parallel operation including at least four stages, namely, decomposition, distribution, rendering, and recomposition. The Parallel Graphics Processing Subsystem (PGPS) includes a Decomposition Module for supporting the decomposition stage of parallel operation, a Distribution Module for supporting the distribution stage of parallel operation, a Rendering Module for supporting the rendering stage of parallel operation, and a Recomposition Module for supporting the recomposition stage of parallel operation.
  • As shown, the Parallel Graphics Processing Subsystem also includes: (i) a plurality of graphic processing pipelines (GPPLs), including a primary GPPL, wherein each GPPL includes a color frame buffer and Z depth buffer; and (ii) a parallel mode control module (PMCM) for automatically controlling the object-division mode of parallel operation during the run-time of the graphics-based application, during which the GPPLs are driven in a parallelized manner.
  • As shown in FIG. 8E, the Parallel Mode Control Module (PMCM) and the Decomposition Submodule No. 1 reside as a software package in the Host or CPU Memory Space (HMS). The Vendor's GPU drivers also reside on HMS, along with the Graphics Applications, and the Standard Graphics Library. As shown, the Decomposition Submodule No. 2 and Distribution Module (including a distribution management submodule and interconnect network) are realized within a single graphics hub device (e.g. chip) that is connected to (i) the bridge circuit on the motherboard, via a PCI-express interface, and (ii) a cluster of external GPUs via the interconnect network within the graphics hub chip. The GPUs are used to implement the Rendering Module and Recomposition Modules and are driven in a parallelized manner under the control of the PMCM.
  • During system operation, the Decomposition Submodule No. 1 transfers graphic commands and data (GCAD) to the Decomposition Submodule No. 2 via the bridge circuit. The Decomposition Submodule No. 2 divides (i.e. splits up) the stream of graphic commands and data (GCAD) according to the parallelization mode that is implemented using an embodiment of the parallel multi-pass graphics processing method of the present invention, which may be selected from the group of processes illustrated in FIG. 7A, 7B, 7C or 7D. The Distribution Module distributes graphic commands and data (GCAD) to the external GPUs. The Rendering Module generates complementary-type partial color images according to the multi-pass parallel graphics processing method of the present invention being used, e.g. as illustrated in FIGS. 7A through 7D. The Recomposition Module uses inter-GPU communication transport to transfer the pixel data of the complementary-type partial images among the GPUs during the image recomposition stages. Finally, the complementary-type partial color images are recomposited using the depthless image merging process of the present invention so as to generate a complete image frame of the 3D scene for display on the display device connected to the primary GPU on the graphical display card.
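  • By way of illustration only, the object-division decomposition described above can be sketched as a load-balancing assignment of scene objects to GPUs. The sketch below is a hypothetical example and not the decomposition logic of any disclosed embodiment: the function name `assign_objects`, the per-object cost estimate (e.g. a polygon count), and the greedy "largest object to least-loaded GPU" policy are all assumptions made for the example.

```python
import heapq


def assign_objects(objects, num_gpus):
    """Assign (name, cost) objects to GPUs, balancing the total cost.

    Hypothetical policy: visit objects from most to least expensive and
    give each one to the GPU that currently carries the lightest load.
    """
    # Min-heap of (accumulated load, gpu index).
    heap = [(0, gpu) for gpu in range(num_gpus)]
    heapq.heapify(heap)
    assignment = {gpu: [] for gpu in range(num_gpus)}
    for name, cost in sorted(objects, key=lambda obj: obj[1], reverse=True):
        load, gpu = heapq.heappop(heap)
        assignment[gpu].append(name)
        heapq.heappush(heap, (load + cost, gpu))
    return assignment


# Illustrative scene: object names and costs are invented for the example.
scene = [("terrain", 900), ("castle", 600), ("tree", 300), ("knight", 250)]
plan = assign_objects(scene, 2)
```

  • In this sketch each GPU would then receive only the graphics commands and data belonging to the objects assigned to it, which is the essence of an object-division mode; a real decomposition module would also have to track state-changing commands that every GPU must see.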
  • The Seventh Illustrative Embodiment of the Parallel Graphic Processing System (PGPS) of the Present Invention
  • FIG. 8F shows a seventh illustrative embodiment of the PGPS of the present invention, embodied within a host computing system capable of parallelizing the operation of multiple graphics processing pipelines (GPPLs). In general, the computing system comprises: CPU memory space for storing one or more graphics-based applications, and a graphics library for generating a stream of graphics commands and data (GCAD) during the execution of the graphics-based applications; one or more CPUs, in communication with the memory space, for (i) executing the graphics-based applications, (ii) generating the stream of graphics commands and data, and (iii) segmenting the stream of graphics commands into frames for rendering pixel-based images of a 3D scene generated by the graphics-based application, and wherein objects within the 3D scene are generated by processing the frames of graphics commands and data along the stream; and a Parallel Graphics Processing Subsystem (PGPS) supporting an object-division mode of parallel operation including at least four stages, namely, decomposition, distribution, rendering, and recomposition. The Parallel Graphics Processing Subsystem (PGPS) includes a Decomposition Module for supporting the decomposition stage of parallel operation, a Distribution Module for supporting the distribution stage of parallel operation, a Rendering Module for supporting the rendering stage of parallel operation, and a Recomposition Module for supporting the recomposition stage of parallel operation.
  • As shown, the Parallel Graphics Processing Subsystem also includes: (i) a plurality of graphic processing pipelines (GPPLs), including a primary GPPL, wherein each GPPL includes a color frame buffer and Z depth buffer; and (ii) a parallel mode control module (PMCM) for automatically controlling the object-division mode of parallel operation during the run-time of the graphics-based application, during which the GPPLs are driven in a parallelized manner.
  • As shown in FIG. 8F, the Parallel Mode Control Module (PMCM) (including the Distribution Management Submodule) and the Decomposition Module reside as a software package in the Host Memory Space (HMS) of the host computing system. The Vendor's GPU drivers also reside on HMS, along with the Graphics Applications, and the Standard Graphics Library. As shown, the Distribution Module and its interconnect transport are realized within a single “reduced” graphics hub device (e.g. chip) that is connected to the bridge circuit of the host computing system and to a cluster of external GPUs, which implement the Rendering and Recomposition Modules and are driven in a parallelized manner under the control of the PMCM.
  • During system operation, the Decomposition Module divides (i.e. splits up) the stream of graphic commands and data (GCAD) according to the parallelization mode that is implemented using an embodiment of the parallel multi-pass graphics processing method of the present invention, which may be selected from the group of processes illustrated in FIG. 7A, 7B, 7C or 7D. The Distribution Management Module within the PMCM distributes the graphic commands and data (GCAD) to the external GPUs via the bridge circuit and interconnect transport mechanism. The Rendering Module generates complementary-type partial color images according to the multi-pass parallel graphics processing method of the present invention being used, e.g. as illustrated in FIGS. 7A through 7D. The Recomposition Module uses inter-GPU communication transport to transfer the pixel data of the complementary-type partial images among the GPUs during the image recomposition stages. Finally, the complementary-type partial color images are recomposited using the depthless image merging process of the present invention so as to generate a complete image frame of the 3D scene for display on the display device connected to the primary GPU on the graphical display card(s).
  • The Eighth Illustrative Embodiment of the Parallel Graphic Processing System (PGPS) of the Present Invention
  • FIG. 8G shows an eighth illustrative embodiment of the PGPS of the present invention, embodied within a host computing system capable of parallelizing the operation of multiple graphics processing pipelines (GPPLs). In general, the computing system comprises: CPU memory space for storing one or more graphics-based applications, and a graphics library for generating a stream of graphics commands and data (GCAD) during the execution of the graphics-based applications; one or more CPUs, in communication with the memory space, for (i) executing the graphics-based applications, (ii) generating the stream of graphics commands and data, and (iii) segmenting the stream of graphics commands into frames for rendering pixel-based images of a 3D scene generated by the graphics-based application, and wherein objects within the 3D scene are generated by processing the frames of graphics commands and data along the stream; and a Parallel Graphics Processing Subsystem (PGPS) supporting an object-division mode of parallel operation including at least four stages, namely, decomposition, distribution, rendering, and recomposition. The Parallel Graphics Processing Subsystem (PGPS) includes a Decomposition Module for supporting the decomposition stage of parallel operation, a Distribution Module for supporting the distribution stage of parallel operation, a Rendering Module for supporting the rendering stage of parallel operation, and a Recomposition Module for supporting the recomposition stage of parallel operation.
  • As shown, the Parallel Graphics Processing Subsystem also includes: (i) a plurality of graphic processing pipelines (GPPLs), including a primary GPPL, wherein each GPPL includes a color frame buffer and Z depth buffer; and (ii) a parallel mode control module (PMCM) for automatically controlling the object-division mode of parallel operation during the run-time of the graphics-based application, during which the GPPLs are driven in a parallelized manner.
  • As shown in FIG. 8G, the Parallel Mode Control Module (PMCM) and the Decomposition Submodule No. 1 reside as a software package in the Host Memory Space (HMS). The Vendor's GPU drivers also reside on HMS, along with the Graphics Applications, and the Standard Graphics Library. As shown, the Decomposition Submodule No. 2 and the Distribution Module are realized (as a graphics hub) within a bridge circuit on the motherboard within the host computing system. The Rendering Module and the Recomposition Module are implemented by a plurality of GPUs which are driven in a parallelized manner under the control of the PMCM.
  • During system operation, the Decomposition Submodule No. 1 transfers graphics commands and data (GCAD) to the Decomposition Submodule No. 2. The Decomposition Submodule No. 2 divides (i.e. splits up) the stream of graphic commands and data (GCAD) according to the parallelization mode that is implemented using an embodiment of the parallel multi-pass graphics processing method of the present invention, which may be selected from the group of processes illustrated in FIG. 7A, 7B, 7C or 7D. The Distribution Module distributes the graphic commands and data (GCAD) to the internal GPU and external GPUs. The Rendering Module generates complementary-type partial color images according to the multi-pass parallel graphics processing method of the present invention being used, e.g. as illustrated in FIGS. 7A through 7D. The Recomposition Module uses inter-GPU communication transport to transfer the pixel data of the complementary-type partial images among the GPUs during the image recomposition stages. Finally, the complementary-type partial color images are recomposited using the depthless image merging process of the present invention so as to generate a complete image frame of the 3D scene for display on the display device connected to the external graphics card connected to the hybrid CPU/GPU chip via a PCI-express interface.
  • The Ninth Illustrative Embodiment of the Parallel Graphic Processing System (PGPS) of the Present Invention
  • FIG. 8H shows a ninth illustrative embodiment of the PGPS of the present invention, embodied within a host computing system capable of parallelizing the operation of multiple graphics processing pipelines (GPPLs). In general, the computing system comprises: CPU memory space for storing one or more graphics-based applications, and a graphics library for generating a stream of graphics commands and data (GCAD) during the execution of the graphics-based applications; one or more CPUs, in communication with the memory space, for (i) executing the graphics-based applications, (ii) generating the stream of graphics commands and data, and (iii) segmenting the stream of graphics commands into frames for rendering pixel-based images of a 3D scene generated by the graphics-based application, and wherein objects within the 3D scene are generated by processing the frames of graphics commands and data along the stream; and a Parallel Graphics Processing Subsystem (PGPS) supporting an object-division mode of parallel operation including at least four stages, namely, decomposition, distribution, rendering, and recomposition. The Parallel Graphics Processing Subsystem (PGPS) includes a Decomposition Module for supporting the decomposition stage of parallel operation, a Distribution Module for supporting the distribution stage of parallel operation, a Rendering Module for supporting the rendering stage of parallel operation, and a Recomposition Module for supporting the recomposition stage of parallel operation.
  • As shown, the Parallel Graphics Processing Subsystem also includes: (i) a plurality of graphic processing pipelines (GPPLs), including a primary GPPL, wherein each GPPL includes a color frame buffer and Z depth buffer; and (ii) a parallel mode control module (PMCM) for automatically controlling the object-division mode of parallel operation during the run-time of the graphics-based application, during which the GPPLs are driven in a parallelized manner.
  • As shown in FIG. 8H, the Parallel Mode Control Module (PMCM) and the Decomposition Submodule No. 1 reside as a software package in the Host Memory Space (HMS). The Vendor's GPU drivers also reside on HMS, along with the Graphics Applications, and the Standard Graphics Library. As shown, the Decomposition Submodule No. 2 and the Distribution Module are realized (as a graphics hub) on the processor die of a hybrid CPU/GPU fusion-architecture chip on the motherboard, the chip having one or more GPUs that are driven, together with one or more GPUs on external graphics card(s) (connected to the CPU/GPU chip via the interconnect), in a parallelized manner under the control of the PMCM. The GPUs on the external graphics card are used to implement the Rendering and Recomposition Modules. In some embodiments, the GPUs within the hybrid chip may assist in implementing the Rendering and/or Recomposition Modules.
  • During system operation, the Decomposition Submodule No. 1 transfers graphics commands and data (GCAD) to the Decomposition Submodule No. 2. The Decomposition Submodule No. 2 divides (i.e. splits up) the stream of graphic commands and data (GCAD) according to the parallelization mode that is implemented using an embodiment of the parallel multi-pass graphics processing method of the present invention, which may be selected from the group of processes illustrated in FIG. 7A, 7B, 7C or 7D. The Distribution Module distributes the graphic commands and data (GCAD) to the internal GPU and external GPUs. The Rendering Module generates complementary-type partial color images according to the multi-pass parallel graphics processing method of the present invention being used, e.g. as illustrated in FIGS. 7A through 7D. The Recomposition Module uses inter-GPU communication transport to transfer the pixel data of the complementary-type partial images among the GPUs during the image recomposition stages. Finally, the complementary-type partial color images are recomposited using the depthless image merging process of the present invention so as to generate a complete image frame of the 3D scene for display on the display device connected to the primary GPU on the external graphics card connected to the hybrid CPU/GPU chip via a PCI-express interface.
  • The Tenth Illustrative Embodiment of the Parallel Graphic Processing System (PGPS) of the Present Invention
  • FIG. 8I shows a tenth illustrative embodiment of the PGPS of the present invention, embodied within a game console system capable of parallelizing the operation of multiple graphics processing pipelines (GPPLs). In general, the game console system comprises: CPU memory space for storing one or more graphics-based applications, and a graphics library for generating a stream of graphics commands and data (GCAD) during the execution of the graphics-based applications; a multi-core CPU chip with multiple CPU-cores, in communication with the memory space, for (i) executing the graphics-based applications, (ii) generating the stream of graphics commands and data, and (iii) segmenting the stream of graphics commands into frames for rendering pixel-based images of a 3D scene generated by the graphics-based application, and wherein objects within the 3D scene are generated by processing the frames of graphics commands and data along the stream; and a Parallel Graphics Processing Subsystem (PGPS) supporting an object-division mode of parallel operation including at least four stages, namely, decomposition, distribution, rendering, and recomposition. The Parallel Graphics Processing Subsystem (PGPS) includes a Decomposition Module for supporting the decomposition stage of parallel operation, a Distribution Module for supporting the distribution stage of parallel operation, a Rendering Module for supporting the rendering stage of parallel operation, and a Recomposition Module for supporting the recomposition stage of parallel operation.
  • As shown, the Parallel Graphics Processing Subsystem also includes: (i) a graphics hub with an interconnect network, (ii) a plurality of graphic processing pipelines (GPPLs), including a primary GPPL, wherein each GPPL includes a color frame buffer and Z depth buffer; and (iii) a parallel mode control module (PMCM) for automatically controlling the object-division mode of parallel operation during the run-time of the graphics-based application, during which the GPPLs are driven in a parallelized manner.
  • As shown in FIG. 8I, the Parallel Mode Control Module (PMCM) and the Decomposition Submodule No. 1 are realized as a software package within the Host Memory Space (HMS). The Vendor's GPU drivers also reside on HMS, along with the Graphics Applications, and the Standard Graphics Library. As shown, the Decomposition Submodule No. 2 and the Distribution Module are realized as a graphics hub semiconductor chip within the game console system, whereas the Rendering and Recomposition Modules are implemented by multiple GPPLs supported on the game console board and driven in a parallelized manner under the control of the PMCM.
  • During system operation, the Decomposition Submodule No. 1 transfers graphics commands and data (GCAD) to the Decomposition Submodule No. 2, via the memory controller on the multi-core CPU chip and the interconnect in the graphics hub chip of the present invention. The Decomposition Submodule No. 2 divides (i.e. splits up) the stream of graphic commands and data (GCAD) according to the parallelization mode that is implemented using an embodiment of the parallel multi-pass graphics processing method of the present invention, which may be selected from the group of processes illustrated in FIG. 7A, 7B, 7C or 7D. The Distribution Module distributes the graphic commands and data (GCAD) to the multiple GPUs. The Rendering Module generates complementary-type partial color images according to the multi-pass parallel graphics processing method of the present invention being used, e.g. as illustrated in FIGS. 7A through 7D. The Recomposition Module uses inter-GPU communication transport to transfer the pixel data of the complementary-type partial images among the GPUs during the image recomposition stages. Finally, the complementary-type partial color images are recomposited using the depthless image merging process of the present invention so as to generate a complete image frame of the 3D scene for display on the display device connected to the primary GPU via an analog display interface.
  • ADVANTAGES OF THE PRESENT INVENTION
  • The depthless image recomposition process of the present invention is simple and inexpensive to implement. It also offers a number of advantages over the recomposition methods associated with “classical modes” of object division, which are based on depth comparison and entail high processing requirements, high bandwidth requirements, and the additional cost of recompositing hardware.
  • In contrast, the depthless image recomposition process of the present invention does not involve any depth comparison, and merges the partial complementary images in the color buffers using a simple depth-less puzzle-like merging operation.
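  • By way of illustration only, the puzzle-like merging operation can be sketched as follows. The sketch assumes that the partial images are complementary in the sense that, at any pixel, at most one GPU holds a non-black color and every other GPU holds black; the function name `merge_depthless` and the nested-list image representation are hypothetical choices made for the example, not part of the disclosed embodiments.

```python
BLACK = (0, 0, 0)


def merge_depthless(partial_images):
    """Merge complementary partial color images without any depth test.

    Each image is a list of rows of (r, g, b) tuples. Because the images
    are assumed complementary, a non-black pixel is simply copied into
    the result; no Z values are read or compared.
    """
    height = len(partial_images[0])
    width = len(partial_images[0][0])
    merged = [[BLACK for _ in range(width)] for _ in range(height)]
    for image in partial_images:
        for y in range(height):
            for x in range(width):
                pixel = image[y][x]
                if pixel != BLACK:
                    merged[y][x] = pixel
    return merged


# Two 2x2 partial color buffers, one per GPU (colors are illustrative).
RED, BLUE = (255, 0, 0), (0, 0, 255)
gpu0 = [[RED, BLACK], [BLACK, BLACK]]
gpu1 = [[BLACK, BLUE], [BLACK, BLUE]]
final_image = merge_depthless([gpu0, gpu1])
```

  • Only the color buffers are transferred and combined in this sketch, which is why such a merge needs no Z-buffer traffic between GPUs. A pixel that is legitimately black in the final image is handled correctly as well, since merging black into black leaves it black.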
  • In classical modes of object division, the hidden objects are processed for rendering as if they were visible. This processing redundancy greatly decreases the parallelism efficiency. In the present invention the overdraw effect is completely eliminated by means of the Global Depth Map (GDM) materialized at each GPU.
  • The method of the present invention eliminates obstructed objects in the early stages of multi-pass rendering operations. The more passes, the greater the aggregate savings.
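  • By way of illustration only, the role of the Global Depth Map can be sketched as a two-pass procedure. This is a simplified model under stated assumptions, not the multi-pass method of FIGS. 7A through 7D: each GPU is modeled as a list of hypothetical `(x, y, depth, color)` fragments, smaller depth means nearer to the viewer, the GDM is taken to be the per-pixel minimum over all local depth buffers, and no two GPUs are assumed to produce the same nearest depth at a pixel.

```python
INF = float("inf")
BLACK = (0, 0, 0)


def depth_pass(fragments, width, height):
    """Pass 1: build one GPU's local depth buffer (nearest depth wins)."""
    zbuf = [[INF] * width for _ in range(height)]
    for x, y, z, _color in fragments:
        if z < zbuf[y][x]:
            zbuf[y][x] = z
    return zbuf


def global_depth_map(zbuffers):
    """Combine local depth buffers into one map: per-pixel minimum."""
    height, width = len(zbuffers[0]), len(zbuffers[0][0])
    return [[min(zb[y][x] for zb in zbuffers) for x in range(width)]
            for y in range(height)]


def color_pass(fragments, gdm):
    """Pass 2: shade only fragments that are globally visible.

    Returns the partial color image and the number of shaded fragments.
    Pixels owned by another GPU stay black.
    """
    height, width = len(gdm), len(gdm[0])
    color = [[BLACK] * width for _ in range(height)]
    shaded = 0
    for x, y, z, c in fragments:
        if z == gdm[y][x]:
            color[y][x] = c
            shaded += 1
    return color, shaded


# A 2x1 frame with two GPUs whose objects overlap at both pixels.
RED, BLUE = (255, 0, 0), (0, 0, 255)
frags0 = [(0, 0, 0.5, RED), (1, 0, 0.9, RED)]
frags1 = [(0, 0, 0.7, BLUE), (1, 0, 0.2, BLUE)]
gdm = global_depth_map([depth_pass(f, 2, 1) for f in (frags0, frags1)])
image0, shaded0 = color_pass(frags0, gdm)
image1, shaded1 = color_pass(frags1, gdm)
```

  • In this toy frame four fragments are submitted but only two are shaded, one per pixel, and the two resulting partial images are complementary, so they can be merged without any further depth comparison.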
  • In classical modes of object division, the anti-aliasing in a GPU is based on processing the edge pixels against their background, while this background might turn out to be hidden, and be replaced by the background of another GPU in the final image. The result in classical modes of object division is an incorrect image at the “stitched” boundaries. In marked contrast, the method of the present invention eliminates the hidden background during the rendering process at each GPU, and pixels are always anti-aliased against their final background.
  • While Applicants have disclosed the parallel graphics processing methods in connection with object-based modes of parallel operation, it is understood, however, that the methods of the present invention can be practiced in hybrid environments, in which object-based modes are nested within image-based modes, as in the case of hybrid parallel graphics processing systems. Also, it is understood that these alternative methods can be based on novel ways of dividing and/or quantizing: (i) objects and/or scenery being graphically rendered; (ii) the graphical display screen (on which graphical images of the rendered object/scenery are projected); (iii) temporal aspects of the graphical rendering process; (iv) the illumination sources used during the graphical rendering process using parallel computational operations; as well as (v) various hybrid combinations of these components of the 3D graphical rendering process.
  • While the principles of the present invention have been illustrated in parallel graphics processing platforms supporting a single mode of parallel operation, it is understood that the object-division mode of the present invention can be practiced in multi-mode PGS architectures, as disclosed in U.S. application Ser. No. 11/897,53 filed Aug. 30, 2007, and other system architectures, including hybrid system architectures.
  • It is understood that the parallel graphics processing technology employed in computer graphics systems of the illustrative embodiments may be modified in a variety of ways which will become readily apparent to those skilled in the art having the benefit of the novel teachings disclosed herein. All such modifications and variations of the illustrative embodiments thereof shall be deemed to be within the scope and spirit of the present invention as defined by the Claims to Invention appended hereto.

Claims (12)

1. A computing system supporting parallel 3D graphics processes based on the division of objects in 3D scenes, said computing system comprising:
CPU memory space for storing one or more graphics-based applications and a graphics library for generating graphics commands and data (GCAD) during the run-time of the graphics-based applications;
one or more CPUs for executing said graphics-based applications; and
a parallel graphics processing system (PGPS) having multiple graphics processing pipelines (GPPLs), supporting object division based parallelism among said GPPLs, and performing pixel depth value comparison within each GPPL using a common global depth map (GDM) during pixel rendering processing.
2. The computing system of claim 1, wherein said parallel graphics processing system further includes:
(i) a decomposition module for supporting the decomposition stage of parallel operation;
(ii) a distribution module for supporting the distribution stage of parallel operation;
(iii) a recomposition module for supporting the recomposition stage of parallel operation; and
(iv) a rendering module for supporting the rendering stage of parallel operation.
3. The computing system of claim 2, wherein during operation,
(i) said decomposition module divides the stream of graphic commands and data (GCAD) according to said object-division mode of parallel operation;
(ii) said distribution module distributes graphic commands and data (GCAD) to said GPPLs;
(iii) said rendering module generates complementary-type partial color images according to a parallel multi-pass graphics processing method; and
(iv) said recomposition module uses inter-GPU communication to transfer the pixel data of said complementary-type partial images among said GPPLs, and a depthless image merging process to generate said complete color image of the 3D scene for display on said display device.
4. The computing system of claim 1, wherein each said GPPL is a GPU-based graphics processing pipeline which comprises (i) a video memory structure supporting a frame buffer (FB) including stencil, depth and color buffers, and (ii) a graphics processing unit (GPU) supporting (1) a geometry subsystem having an input assembler and a vertex shader, (2) a set up engine, and (3) a pixel subsystem including a pixel shader receiving pixel data from the frame buffer and raster operators operating on pixel data in the frame buffers.
5. The computing system of claim 1, wherein each said GPPL is a GPU-based graphics processing pipeline which comprises (i) a video memory structure supporting a frame buffer (FB) including stencil, depth and color buffers, and (ii) a graphics processing unit (GPU) supporting (1) a geometry subsystem having an input assembler, a vertex shader and a geometry shader, (2) a rasterizer, and (3) a pixel subsystem including a pixel shader receiving pixel data from the frame buffer and raster operators operating on pixel data in the frame buffers.
6. The computing system of claim 1, wherein each said GPPL is a CPU-based graphics processing pipeline which comprises (i) a video memory structure supporting a frame buffer including stencil, depth and color buffers, and (ii) a graphics processing pipeline realized by a cell of a multi-core CPU chip, including a plurality of in-order SIMD processors, and optionally, a GPPL-specific extension, namely, a texture sampler that loads texture maps from memory, filters them for level-of-detail, and feeds to pixel processing portion of the pipeline.
7. The computing system of claim 2, wherein the decomposition module supports the scanning of commands, the control of commands, the tracking of objects, the balancing of loads, and the assignment of objects to said GPPLs.
8. The computing system of claim 2, wherein said distribution module supports transmission of graphics data in various modes including CPU-to/from-GPPL, inter-GPPL, broadcast, hub-to/from-CPU, and hub-to/from-GPPL.
9. The computing system of claim 1 wherein said graphics data includes data selected from the group consisting of FB data, commands, textures, geometric data and other data.
10. The computing system of claim 2, wherein said recomposition module supports a variety of modes for the merging of partial complementary-type images in the color frame buffers of said GPPLs.
11. The computing system of claim 2, wherein said variety of modes of merging said partial complementary-type images includes: merging color frame buffers without z buffers, and merging color buffers using stencil-assisted processing.
12. The computing system of claim 1, which further comprises a display device for displaying images containing graphics during the execution of said graphics-based applications.
US12/231,295 2003-11-19 2008-08-29 Computing system supporting parallel 3D graphics processes based on the division of objects in 3D scenes Abandoned US20090128550A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US12/231,295 US20090128550A1 (en) 2003-11-19 2008-08-29 Computing system supporting parallel 3D graphics processes based on the division of objects in 3D scenes

Applications Claiming Priority (16)

Application Number Priority Date Filing Date Title
US52308403P 2003-11-19 2003-11-19
PCT/IL2004/001069 WO2005050557A2 (en) 2003-11-19 2004-11-19 Method and system for multiple 3-d graphic pipeline over a pc bus
US64714605P 2005-01-25 2005-01-25
US75960806P 2006-01-18 2006-01-18
US11/340,402 US7812844B2 (en) 2004-01-28 2006-01-25 PC-based computing system employing a silicon chip having a routing unit and a control unit for parallelizing multiple GPU-driven pipeline cores according to the object division mode of parallel operation during the running of a graphics application
PCT/IB2006/001529 WO2006117683A2 (en) 2005-01-25 2006-01-25 Graphics processing and display system employing multiple graphics cores on a silicon chip of monolithic construction
US11/386,454 US7834880B2 (en) 2004-01-28 2006-03-22 Graphics processing and display system employing multiple graphics cores on a silicon chip of monolithic construction
US11/648,160 US8497865B2 (en) 2006-12-31 2006-12-31 Parallel graphics system employing multiple graphics processing pipelines with multiple graphics processing units (GPUS) and supporting an object division mode of parallel graphics processing using programmable pixel or vertex processing resources provided with the GPUS
PCT/IB2007/003464 WO2008004135A2 (en) 2006-01-18 2007-01-18 Multi-mode parallel graphics rendering system employing real-time automatic scene profiling and mode control
US11/655,735 US8085273B2 (en) 2003-11-19 2007-01-18 Multi-mode parallel graphics rendering system employing real-time automatic scene profiling and mode control
US57968207A 2007-03-23 2007-03-23
US11/789,039 US20070291040A1 (en) 2005-01-25 2007-04-23 Multi-mode parallel graphics rendering system supporting dynamic profiling of graphics-based applications and automatic control of parallel modes of operation
US11/897,536 US7961194B2 (en) 2003-11-19 2007-08-30 Method of controlling in real time the switching of modes of parallel operation of a multi-mode parallel graphics processing subsystem embodied within a host computing system
PCT/US2007/026466 WO2008082641A2 (en) 2006-12-31 2007-12-28 Multi-mode parallel graphics processing systems and methods
US12/077,072 US20090027383A1 (en) 2003-11-19 2008-03-14 Computing system parallelizing the operation of multiple graphics processing pipelines (GPPLs) and supporting depth-less based image recomposition
US12/231,295 US20090128550A1 (en) 2003-11-19 2008-08-29 Computing system supporting parallel 3D graphics processes based on the division of objects in 3D scenes

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US12/077,072 Continuation US20090027383A1 (en) 2003-11-19 2008-03-14 Computing system parallelizing the operation of multiple graphics processing pipelines (GPPLs) and supporting depth-less based image recomposition

Publications (1)

Publication Number Publication Date
US20090128550A1 true US20090128550A1 (en) 2009-05-21

Family

ID=46331857

Family Applications (5)

Application Number Title Priority Date Filing Date
US12/077,072 Abandoned US20090027383A1 (en) 2003-11-19 2008-03-14 Computing system parallelizing the operation of multiple graphics processing pipelines (GPPLs) and supporting depth-less based image recomposition
US12/231,295 Abandoned US20090128550A1 (en) 2003-11-19 2008-08-29 Computing system supporting parallel 3D graphics processes based on the division of objects in 3D scenes
US12/231,304 Active 2025-09-01 US8284207B2 (en) 2003-11-19 2008-08-29 Method of generating digital images of objects in 3D scenes while eliminating object overdrawing within the multiple graphics processing pipeline (GPPLS) of a parallel graphics processing system generating partial color-based complementary-type images along the viewing direction using black pixel rendering and subsequent recompositing operations
US12/231,296 Abandoned US20090179894A1 (en) 2003-11-19 2008-08-29 Computing system capable of parallelizing the operation of multiple graphics processing pipelines (GPPLS)
US13/646,710 Abandoned US20130120410A1 (en) 2003-11-19 2012-10-07 Multi-pass method of generating an image frame of a 3d scene using an object-division based parallel graphics rendering process

Family Applications Before (1)

Application Number Status Publication Number Priority Date Filing Date Title
US12/077,072 Abandoned US20090027383A1 (en) 2003-11-19 2008-03-14 Computing system parallelizing the operation of multiple graphics processing pipelines (GPPLs) and supporting depth-less based image recomposition

Family Applications After (3)

Application Number Status Publication Number Priority Date Filing Date Title
US12/231,304 Active 2025-09-01 US8284207B2 (en) 2003-11-19 2008-08-29 Method of generating digital images of objects in 3D scenes while eliminating object overdrawing within the multiple graphics processing pipeline (GPPLS) of a parallel graphics processing system generating partial color-based complementary-type images along the viewing direction using black pixel rendering and subsequent recompositing operations
US12/231,296 Abandoned US20090179894A1 (en) 2003-11-19 2008-08-29 Computing system capable of parallelizing the operation of multiple graphics processing pipelines (GPPLS)
US13/646,710 Abandoned US20130120410A1 (en) 2003-11-19 2012-10-07 Multi-pass method of generating an image frame of a 3d scene using an object-division based parallel graphics rendering process

Country Status (1)

Country Link
US (5) US20090027383A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9153193B2 (en) 2011-09-09 2015-10-06 Microsoft Technology Licensing, Llc Primitive rendering using a single primitive type
CN110428506A (en) * 2019-08-09 2019-11-08 成都景中教育软件有限公司 A kind of dynamic geometry 3-D graphic cutting implementation method based on parameter

Families Citing this family (48)

Publication number Priority date Publication date Assignee Title
US7854550B2 (en) * 2008-01-04 2010-12-21 Aviton Care Limited Intelligent illumination thermometer
EP2098994A1 (en) * 2008-03-04 2009-09-09 Agfa HealthCare NV System for real-time volume rendering on thin clients via a render server
FR2940566B1 (en) * 2008-12-18 2011-03-18 Electricite De France METHOD AND DEVICE FOR SECURE TRANSFER OF DIGITAL DATA
US8379024B2 (en) * 2009-02-18 2013-02-19 Autodesk, Inc. Modular shader architecture and method for computerized image rendering
US8416238B2 (en) * 2009-02-18 2013-04-09 Autodesk, Inc. Modular shader architecture and method for computerized image rendering
US8368694B2 (en) * 2009-06-04 2013-02-05 Autodesk, Inc Efficient rendering of multiple frame buffers with independent ray-tracing parameters
US8810592B2 (en) * 2009-10-09 2014-08-19 Nvidia Corporation Vertex attribute buffer for inline immediate attributes and constants
US9041719B2 (en) * 2009-12-03 2015-05-26 Nvidia Corporation Method and system for transparently directing graphics processing to a graphical processing unit (GPU) of a multi-GPU system
US9053562B1 (en) 2010-06-24 2015-06-09 Gregory S. Rabin Two dimensional to three dimensional moving image converter
JP5835942B2 (en) * 2010-06-25 2015-12-24 キヤノン株式会社 Image processing apparatus, control method thereof, and program
US8614716B2 (en) 2010-10-01 2013-12-24 Apple Inc. Recording a command stream with a rich encoding format for capture and playback of graphics content
US8527239B2 (en) * 2010-10-01 2013-09-03 Apple Inc. Automatic detection of performance bottlenecks in a graphics system
KR20120069364A (en) * 2010-12-20 2012-06-28 삼성전자주식회사 Apparatus and method of processing the frame for considering processing capability and power consumption in multicore environment
US8786619B2 (en) * 2011-02-25 2014-07-22 Adobe Systems Incorporated Parallelized definition and display of content in a scripting environment
JP5155462B2 (en) * 2011-08-17 2013-03-06 株式会社スクウェア・エニックス・ホールディングス VIDEO DISTRIBUTION SERVER, VIDEO REPRODUCTION DEVICE, CONTROL METHOD, PROGRAM, AND RECORDING MEDIUM
US20130148947A1 (en) * 2011-12-13 2013-06-13 Ati Technologies Ulc Video player with multiple grpahics processors
US20130155048A1 (en) * 2011-12-14 2013-06-20 Advanced Micro Devices, Inc. Three-dimensional graphics construction and user interface
JP6061262B2 (en) * 2012-02-16 2017-01-18 任天堂株式会社 GAME SYSTEM, GAME CONTROL METHOD, GAME DEVICE, AND GAME PROGRAM
JP5307958B1 (en) 2012-02-23 2013-10-02 株式会社スクウェア・エニックス・ホールディングス VIDEO DISTRIBUTION SERVER, VIDEO REPRODUCTION DEVICE, CONTROL METHOD, PROGRAM, AND RECORDING MEDIUM
US10127082B2 (en) * 2012-04-05 2018-11-13 Electronic Arts Inc. Distributed realization of digital content
EP2674916B1 (en) * 2012-04-12 2018-06-27 Square Enix Holdings Co., Ltd. Moving image distribution server, moving image playback device, control method, program, and recording medium
US9873045B2 (en) 2012-05-25 2018-01-23 Electronic Arts, Inc. Systems and methods for a unified game experience
US9736456B1 (en) * 2012-09-28 2017-08-15 Pixelworks, Inc. Two dimensional to three dimensional video conversion
US9741154B2 (en) * 2012-11-21 2017-08-22 Intel Corporation Recording the results of visibility tests at the input geometry object granularity
US9424620B2 (en) * 2012-12-29 2016-08-23 Intel Corporation Identification of GPU phase to determine GPU scalability during runtime
US9992021B1 (en) 2013-03-14 2018-06-05 GoTenna, Inc. System and method for private and point-to-point communication between computing devices
US9086916B2 (en) * 2013-05-15 2015-07-21 Advanced Micro Devices, Inc. Architecture for efficient computation of heterogeneous workloads
US9367347B1 (en) * 2013-06-17 2016-06-14 Marvell International, Ltd. Systems and methods for command execution order control in electronic systems
KR102124395B1 (en) * 2013-08-12 2020-06-18 삼성전자주식회사 Graphics processing apparatus and method thereof
KR102066659B1 (en) * 2013-08-13 2020-01-15 삼성전자 주식회사 A graphic processing unit, a graphic processing system including the same, and a method of operating the same
US20150242988A1 (en) * 2014-02-22 2015-08-27 Nvidia Corporation Methods of eliminating redundant rendering of frames
US9645916B2 (en) 2014-05-30 2017-05-09 Apple Inc. Performance testing for blocks of code
CN104035751B (en) * 2014-06-20 2016-10-12 深圳市腾讯计算机系统有限公司 Data parallel processing method based on multi-graphics processor and device
US10057082B2 (en) * 2014-12-22 2018-08-21 Ebay Inc. Systems and methods for implementing event-flow programs
US10617955B1 (en) * 2015-03-24 2020-04-14 Amazon Technologies, Inc. Testing and delivery of game design assets in a service provider environment
KR102500836B1 (en) * 2016-09-27 2023-02-16 한화테크윈 주식회사 Method and apparatus for processing wide angle image
US10798162B2 (en) * 2017-08-28 2020-10-06 Texas Instruments Incorporated Cluster system with fail-safe fallback mechanism
US10589171B1 (en) 2018-03-23 2020-03-17 Electronic Arts Inc. User interface rendering and post processing during video game streaming
US10537799B1 (en) 2018-03-23 2020-01-21 Electronic Arts Inc. User interface rendering and post processing during video game streaming
US10987579B1 (en) 2018-03-28 2021-04-27 Electronic Arts Inc. 2.5D graphics rendering system
US10922780B2 (en) * 2018-04-10 2021-02-16 Graphisoft Se Method to distribute the drawing calculation of architectural data elements between multiple threads
US10650482B1 (en) * 2018-11-09 2020-05-12 Adobe Inc. Parallel rendering engine
CN109712063B (en) * 2018-12-12 2023-03-14 中国航空工业集团公司西安航空计算技术研究所 Plane clipping circuit of graphic processor
US10918938B2 (en) 2019-03-29 2021-02-16 Electronic Arts Inc. Dynamic streaming video game client
US11514549B2 (en) 2020-02-03 2022-11-29 Sony Interactive Entertainment Inc. System and method for efficient multi-GPU rendering of geometry by generating information in one rendering phase for use in another rendering phase
US11263718B2 (en) * 2020-02-03 2022-03-01 Sony Interactive Entertainment Inc. System and method for efficient multi-GPU rendering of geometry by pretesting against in interleaved screen regions before rendering
US11508110B2 (en) * 2020-02-03 2022-11-22 Sony Interactive Entertainment Inc. System and method for efficient multi-GPU rendering of geometry by performing geometry analysis before rendering
US11170461B2 (en) * 2020-02-03 2021-11-09 Sony Interactive Entertainment Inc. System and method for efficient multi-GPU rendering of geometry by performing geometry analysis while rendering

Citations (89)

Publication number Priority date Publication date Assignee Title
US5740464A (en) * 1995-05-15 1998-04-14 Nvidia Corporation Architecture for providing input/output operations in a computer system
US5745762A (en) * 1994-12-15 1998-04-28 International Business Machines Corporation Advanced graphics driver architecture supporting multiple system emulations
US5754866A (en) * 1995-05-08 1998-05-19 Nvidia Corporation Delayed interrupts with a FIFO in an improved input/output architecture
US5758182A (en) * 1995-05-15 1998-05-26 Nvidia Corporation DMA controller translates virtual I/O device address received directly from application program command to physical i/o device address of I/O device on device bus
US5757385A (en) * 1994-07-21 1998-05-26 International Business Machines Corporation Method and apparatus for managing multiprocessor graphical workload distribution
US5909595A (en) * 1995-05-15 1999-06-01 Nvidia Corporation Method of controlling I/O routing by setting connecting context for utilizing I/O processing elements within a computer system to produce multimedia effects
US6169553B1 (en) * 1997-07-02 2001-01-02 Ati Technologies, Inc. Method and apparatus for rendering a three-dimensional scene having shadowing
US6181352B1 (en) * 1999-03-22 2001-01-30 Nvidia Corporation Graphics pipeline selectively providing multiple pixels or multiple textures
US6184908B1 (en) * 1998-04-27 2001-02-06 Ati Technologies, Inc. Method and apparatus for co-processing video graphics data
US6188412B1 (en) * 1998-08-28 2001-02-13 Ati Technologies, Inc. Method and apparatus for performing setup operations in a video graphics system
US6191800B1 (en) * 1998-08-11 2001-02-20 International Business Machines Corporation Dynamic balancing of graphics workloads using a tiling strategy
US6201545B1 (en) * 1997-09-23 2001-03-13 Ati Technologies, Inc. Method and apparatus for generating sub pixel masks in a three dimensional graphic processing system
US6212617B1 (en) * 1998-05-13 2001-04-03 Microsoft Corporation Parallel processing method and system using a lazy parallel data type to reduce inter-processor communication
US6212261B1 (en) * 1996-08-14 2001-04-03 Nortel Networks Limited Internet-based telephone call manager
US6337686B2 (en) * 1998-01-07 2002-01-08 Ati Technologies Inc. Method and apparatus for line anti-aliasing
US20020015055A1 (en) * 2000-07-18 2002-02-07 Silicon Graphics, Inc. Method and system for presenting three-dimensional computer graphics images using multiple graphics processing units
US6352479B1 (en) * 1999-08-31 2002-03-05 Nvidia U.S. Investment Company Interactive gaming server and online community forum
US6362825B1 (en) * 1999-01-19 2002-03-26 Hewlett-Packard Company Real-time combination of adjacent identical primitive data sets in a graphics call sequence
US20020059302A1 (en) * 2000-10-10 2002-05-16 Hitoshi Ebihara Data communication system and method, computer program, and recording medium
US20030020720A1 (en) * 1999-12-06 2003-01-30 Nvidia Corporation Method, apparatus and article of manufacture for a sequencer in a transform/lighting module capable of processing multiple independent execution threads
US20030034975A1 (en) * 1999-12-06 2003-02-20 Nvidia Corporation Lighting system and method for a graphics processor
US6529198B1 (en) * 1999-03-16 2003-03-04 Nec Corporation Parallel rendering device
US6532525B1 (en) * 2000-09-29 2003-03-11 Ati Technologies, Inc. Method and apparatus for accessing memory
US6532013B1 (en) * 2000-05-31 2003-03-11 Nvidia Corporation System, method and article of manufacture for pixel shaders for programmable shading
US6535209B1 (en) * 1999-03-17 2003-03-18 Nvidia Us Investments Co. Data stream splitting and storage in graphics data processing
US6542971B1 (en) * 2001-04-23 2003-04-01 Nvidia Corporation Memory access system and method employing an auxiliary buffer
US6557065B1 (en) * 1999-12-20 2003-04-29 Intel Corporation CPU expandability bus
US20030080959A1 (en) * 2001-10-29 2003-05-01 Ati Technologies, Inc. System, Method, and apparatus for early culling
US20030103054A1 (en) * 1999-12-06 2003-06-05 Nvidia Corporation Integrated graphics processing unit with antialiasing
US6578068B1 (en) * 1999-08-31 2003-06-10 Accenture Llp Load balancer in environment services patterns
US6577309B2 (en) * 1999-12-06 2003-06-10 Nvidia Corporation System and method for a graphics processing framework embodied utilizing a single semiconductor platform
US6577320B1 (en) * 1999-03-22 2003-06-10 Nvidia Corporation Method and apparatus for processing multiple types of pixel component representations including processes of premultiplication, postmultiplication, and colorkeying/chromakeying
US20030112246A1 (en) * 1999-12-06 2003-06-19 Nvidia Corporation Blending system and method in an integrated computer graphics pipeline
US20030117971A1 (en) * 2001-12-21 2003-06-26 Celoxica Ltd. System, method, and article of manufacture for profiling an executable hardware model using calls to profiling functions
US6677953B1 (en) * 2001-11-08 2004-01-13 Nvidia Corporation Hardware viewport system and method for use in a graphics pipeline
US20040012600A1 (en) * 2002-03-22 2004-01-22 Deering Michael F. Scalable high performance 3d graphics
US6683614B2 (en) * 2001-12-21 2004-01-27 Hewlett-Packard Development Company, L.P. System and method for automatically configuring graphics pipelines by tracking a region of interest in a computer graphical display system
US6691180B2 (en) * 1998-04-17 2004-02-10 Nvidia Corporation Apparatus for accelerating the rendering of images
US6690372B2 (en) * 2000-05-31 2004-02-10 Nvidia Corporation System, method and article of manufacture for shadow mapping
US20040036159A1 (en) * 2002-08-23 2004-02-26 Ati Technologies, Inc. Integrated circuit having memory disposed thereon and method of making thereof
US6700583B2 (en) * 2001-05-14 2004-03-02 Ati Technologies, Inc. Configurable buffer for multipass applications
US6704025B1 (en) * 2001-08-31 2004-03-09 Nvidia Corporation System and method for dual-depth shadow-mapping
US6725457B1 (en) * 2000-05-17 2004-04-20 Nvidia Corporation Semaphore enhancement to improve system performance
US6724394B1 (en) * 2000-05-31 2004-04-20 Nvidia Corporation Programmable pixel shading architecture
US6728820B1 (en) * 2000-05-26 2004-04-27 Ati International Srl Method of configuring, controlling, and accessing a bridge and apparatus therefor
US6731298B1 (en) * 2000-10-02 2004-05-04 Nvidia Corporation System, method and article of manufacture for z-texture mapping
US6734861B1 (en) * 2000-05-31 2004-05-11 Nvidia Corporation System, method and article of manufacture for an interlock module in a computer graphics processing pipeline
US6741243B2 (en) * 2000-05-01 2004-05-25 Broadcom Corporation Method and system for reducing overflows in a computer graphics system
US6744433B1 (en) * 2001-08-31 2004-06-01 Nvidia Corporation System and method for using and collecting information from a plurality of depth layers
US6753878B1 (en) * 1999-03-08 2004-06-22 Hewlett-Packard Development Company, L.P. Parallel pipelined merge engines
US6842180B1 (en) * 2000-09-20 2005-01-11 Intel Corporation Opportunistic sharing of graphics resources to enhance CPU performance in an integrated microprocessor
US6844879B2 (en) * 2001-07-19 2005-01-18 Nec Corporation Drawing apparatus
US6856320B1 (en) * 1997-11-25 2005-02-15 Nvidia U.S. Investment Company Demand-based memory system for graphics applications
US20050041031A1 (en) * 2003-08-18 2005-02-24 Nvidia Corporation Adaptive load balancing in a multi-processor graphics processing system
US6864984B2 (en) * 2000-03-16 2005-03-08 Fuji Photo Film Co., Ltd. Measuring method and apparatus using attenuation in total reflection
US6864893B2 (en) * 2002-07-19 2005-03-08 Nvidia Corporation Method and apparatus for modifying depth values using pixel programs
US6870540B1 (en) * 1999-12-06 2005-03-22 Nvidia Corporation System, method and computer program product for a programmable pixel processing model with instruction set
US6876362B1 (en) * 2002-07-10 2005-04-05 Nvidia Corporation Omnidirectional shadow texture mapping
US20050081115A1 (en) * 2003-09-26 2005-04-14 Ati Technologies, Inc. Method and apparatus for monitoring and resetting a co-processor
US6885376B2 (en) * 2002-12-30 2005-04-26 Silicon Graphics, Inc. System, method, and computer program product for near-real time load balancing across multiple rendering pipelines
US6894689B1 (en) * 1998-07-22 2005-05-17 Nvidia Corporation Occlusion culling method and apparatus for graphics systems
US6894687B1 (en) * 2001-06-08 2005-05-17 Nvidia Corporation System, method and computer program product for vertex attribute aliasing in a graphics pipeline
US6900810B1 (en) * 2003-04-10 2005-05-31 Nvidia Corporation User programmable geometry engine
US20050122330A1 (en) * 2003-11-14 2005-06-09 Microsoft Corporation Systems and methods for downloading algorithmic elements to a coprocessor and corresponding techniques
US6982718B2 (en) * 2001-06-08 2006-01-03 Nvidia Corporation System, method and computer program product for programmable fragment processing in a graphics pipeline
US20060005178A1 (en) * 2004-07-02 2006-01-05 Nvidia Corporation Optimized chaining of vertex and fragment programs
US6985152B2 (en) * 2004-04-23 2006-01-10 Nvidia Corporation Point-to-point bus bridging without a bridge controller
US6989840B1 (en) * 2001-08-31 2006-01-24 Nvidia Corporation Order-independent transparency rendering system and method
US6995767B1 (en) * 2003-07-31 2006-02-07 Nvidia Corporation Trilinear optimization for texture filtering
US7002588B1 (en) * 1999-12-06 2006-02-21 Nvidia Corporation System, method and computer program product for branching during programmable vertex processing
US20060055695A1 (en) * 2004-09-13 2006-03-16 Nvidia Corporation Increased scalability in the fragment shading pipeline
US20060059494A1 (en) * 2004-09-16 2006-03-16 Nvidia Corporation Load balancing
US7015915B1 (en) * 2003-08-12 2006-03-21 Nvidia Corporation Programming multiple chips from a command buffer
US7023437B1 (en) * 1998-07-22 2006-04-04 Nvidia Corporation System and method for accelerating graphics processing using a post-geometry data stream during multiple-pass rendering
US7027972B1 (en) * 2001-01-24 2006-04-11 Ati Technologies, Inc. System for collecting and analyzing graphics data and method thereof
US7038678B2 (en) * 2003-05-21 2006-05-02 Nvidia Corporation Dependent texture shadow antialiasing
US7038685B1 (en) * 2003-06-30 2006-05-02 Nvidia Corporation Programmable graphics processor for multithreaded execution of programs
US7038692B1 (en) * 1998-04-07 2006-05-02 Nvidia Corporation Method and apparatus for providing a vertex cache
US20060101218A1 (en) * 2004-11-11 2006-05-11 Nvidia Corporation Memory controller-adaptive 1T/2T timing control
US7053901B2 (en) * 2003-12-11 2006-05-30 Nvidia Corporation System and method for accelerating a special purpose processor
US20060119607A1 (en) * 2004-02-27 2006-06-08 Nvidia Corporation Register based queuing for texture requests
US7224359B1 (en) * 2002-02-01 2007-05-29 Nvidia Corporation Depth clamping system and method in a hardware graphics pipeline
US20080007559A1 (en) * 2006-06-30 2008-01-10 Nokia Corporation Apparatus, method and a computer program product for providing a unified graphics pipeline for stereoscopic rendering
US7325086B2 (en) * 2005-12-15 2008-01-29 Via Technologies, Inc. Method and system for multiple GPU support
US7324547B1 (en) * 2002-12-13 2008-01-29 Nvidia Corporation Internet protocol (IP) router residing in a processor chipset
US7324111B2 (en) * 2004-04-09 2008-01-29 Nvidia Corporation Method and apparatus for routing graphics processing signals to a stand-alone module
US7372465B1 (en) * 2004-12-17 2008-05-13 Nvidia Corporation Scalable graphics processing for remote display
US7477256B1 (en) * 2004-11-17 2009-01-13 Nvidia Corporation Connecting graphics adapters for scalable performance
US7525547B1 (en) * 2003-08-12 2009-04-28 Nvidia Corporation Programming multiple chips from a command buffer to process multiple images

Family Cites Families (109)

Publication number Priority date Publication date Assignee Title
CA2073516A1 (en) 1991-11-27 1993-05-28 Peter Michael Kogge Dynamic multi-mode parallel processor array architecture computer system
JP3199205B2 (en) 1993-11-19 2001-08-13 株式会社日立製作所 Parallel processing unit
US5687357A (en) 1995-04-14 1997-11-11 Nvidia Corporation Register array for utilizing burst mode transfer on local bus
US5794016A (en) 1995-12-11 1998-08-11 Dynamic Pictures, Inc. Parallel-processor graphics architecture
KR100269106B1 (en) 1996-03-21 2000-11-01 윤종용 Multiprocessor graphics system
US6118462A (en) 1997-07-01 2000-09-12 Memtrax Llc Computer system controller having internal memory and external memory control
US6496187B1 (en) 1998-02-17 2002-12-17 Sun Microsystems, Inc. Graphics system configured to perform parallel sample to pixel calculation
US6473089B1 (en) * 1998-03-02 2002-10-29 Ati Technologies, Inc. Method and apparatus for a video graphics circuit having parallel pixel processing
US6259460B1 (en) 1998-03-26 2001-07-10 Silicon Graphics, Inc. Method for efficient handling of texture cache misses by recirculation
US6477687B1 (en) 1998-06-01 2002-11-05 Nvidia U.S. Investment Company Method of embedding RAMS and other macrocells in the core of an integrated circuit chip
US6636215B1 (en) 1998-07-22 2003-10-21 Nvidia Corporation Hardware-assisted z-pyramid creation for host-based occlusion culling
US6415345B1 (en) 1998-08-03 2002-07-02 Ati Technologies Bus mastering interface control system for transferring multistream data over a host bus
US6476807B1 (en) * 1998-08-20 2002-11-05 Apple Computer, Inc. Method and apparatus for performing conservative hidden surface removal in a graphics processor with deferred shading
US6492987B1 (en) 1998-08-27 2002-12-10 Ati Technologies, Inc. Method and apparatus for processing object elements that are being rendered
US6292200B1 (en) 1998-10-23 2001-09-18 Silicon Graphics, Inc. Apparatus and method for utilizing multiple rendering pipes for a single 3-D display
US6288418B1 (en) 1999-03-19 2001-09-11 Nvidia Corporation Multiuse input/output connector arrangement for graphics accelerator integrated circuit
US6333744B1 (en) 1999-03-22 2001-12-25 Nvidia Corporation Graphics pipeline including combiner stages
DE19917092A1 (en) 1999-04-15 2000-10-26 Sp3D Chip Design Gmbh Accelerated method for grid forming of graphic basic element in order beginning with graphic base element instruction data to produce pixel data for graphic base element
US6442656B1 (en) 1999-08-18 2002-08-27 Ati Technologies Srl Method and apparatus for interfacing memory with a bus
US6657635B1 (en) 1999-09-03 2003-12-02 Nvidia Corporation Binning flush in graphics data processing
US7050055B2 (en) 1999-12-06 2006-05-23 Nvidia Corporation Single semiconductor graphics platform with blending and fog capabilities
US6473086B1 (en) 1999-12-09 2002-10-29 Ati International Srl Method and apparatus for graphics processing using parallel graphics processors
US6760031B1 (en) 1999-12-31 2004-07-06 Intel Corporation Upgrading an integrated graphics subsystem
US6975319B1 (en) * 2000-03-24 2005-12-13 Nvidia Corporation System, method and article of manufacture for calculating a level of detail (LOD) during computer graphics processing
US6831652B1 (en) * 2000-03-24 2004-12-14 Ati International, Srl Method and system for storing graphics data
US6633296B1 (en) 2000-05-26 2003-10-14 Ati International Srl Apparatus for providing data to a plurality of graphics processors and method thereof
US6789154B1 (en) 2000-05-26 2004-09-07 Ati International, Srl Apparatus and method for transmitting data
US6662257B1 (en) 2000-05-26 2003-12-09 Ati International Srl Multiple device bridge apparatus and method thereof
US6670958B1 (en) 2000-05-26 2003-12-30 Ati International, Srl Method and apparatus for routing data to multiple graphics devices
US6664963B1 (en) 2000-05-31 2003-12-16 Nvidia Corporation System, method and computer program product for programmable shading using pixel shaders
US6593923B1 (en) 2000-05-31 2003-07-15 Nvidia Corporation System, method and article of manufacture for shadow mapping
US6801202B2 (en) 2000-06-29 2004-10-05 Sun Microsystems, Inc. Graphics system configured to parallel-process graphics data using multiple pipelines
US6959110B1 (en) * 2000-08-17 2005-10-25 Nvidia Corporation Multi-mode texture compression algorithm
US7116331B1 (en) 2000-08-23 2006-10-03 Intel Corporation Memory controller hub interface
US6502173B1 (en) 2000-09-29 2002-12-31 Ati Technologies, Inc. System for accessing memory and method therefore
US6828980B1 (en) * 2000-10-02 2004-12-07 Nvidia Corporation System, method and computer program product for z-texture mapping
US6961057B1 (en) * 2000-10-12 2005-11-01 Nvidia Corporation Method and apparatus for managing and accessing depth data in a computer graphics system
US6362997B1 (en) 2000-10-16 2002-03-26 Nvidia Memory system for use on a circuit board in which the number of loads are minimized
US6636212B1 (en) 2000-11-14 2003-10-21 Nvidia Corporation Method and apparatus for determining visibility of groups of pixels
US6778181B1 (en) 2000-12-07 2004-08-17 Nvidia Corporation Graphics processing system having a virtual texturing array
US7358974B2 (en) 2001-01-29 2008-04-15 Silicon Graphics, Inc. Method and system for minimizing an amount of data needed to test data against subarea boundaries in spatially composited digital video
US6888580B2 (en) 2001-02-27 2005-05-03 Ati Technologies Inc. Integrated single and dual television tuner having improved fine tuning
US7130316B2 (en) * 2001-04-11 2006-10-31 Ati Technologies, Inc. System for frame based audio synchronization and method thereof
US6664960B2 (en) 2001-05-10 2003-12-16 Ati Technologies Inc. Apparatus for processing non-planar video graphics primitives and associated method of operation
WO2002101497A2 (en) 2001-06-08 2002-12-19 Nvidia Corporation System, method and computer program product for programmable fragment processing in a graphics pipeline
US6828987B2 (en) * 2001-08-07 2004-12-07 Ati Technologies, Inc. Method and apparatus for processing video and graphics data
US6778189B1 (en) 2001-08-24 2004-08-17 Nvidia Corporation Two-sided stencil testing system and method
US6947047B1 (en) * 2001-09-20 2005-09-20 Nvidia Corporation Method and system for programmable pipelined graphics processing with branching instructions
US6938176B1 (en) * 2001-10-05 2005-08-30 Nvidia Corporation Method and apparatus for power management of graphics processors and subsystems that allow the subsystems to respond to accesses when subsystems are idle
US7091971B2 (en) * 2001-10-29 2006-08-15 Ati Technologies, Inc. System, method, and apparatus for multi-level hierarchical Z buffering
US7012610B2 (en) 2002-01-04 2006-03-14 Ati Technologies, Inc. Portable device for providing dual display and method thereof
US6829689B1 (en) * 2002-02-12 2004-12-07 Nvidia Corporation Method and system for memory access arbitration for minimizing read/write turnaround penalties
US6947865B1 (en) * 2002-02-15 2005-09-20 Nvidia Corporation Method and system for dynamic power supply voltage adjustment for a semiconductor integrated circuit device
US6933943B2 (en) 2002-02-27 2005-08-23 Hewlett-Packard Development Company, L.P. Distributed resource architecture and system
US6700580B2 (en) 2002-03-01 2004-03-02 Hewlett-Packard Development Company, L.P. System and method utilizing multiple pipelines to render graphical data
US6853380B2 (en) 2002-03-04 2005-02-08 Hewlett-Packard Development Company, L.P. Graphical display system and method
US20030171907A1 (en) 2002-03-06 2003-09-11 Shay Gal-On Methods and Apparatus for Optimizing Applications on Configurable Processors
US7009605B2 (en) 2002-03-20 2006-03-07 Nvidia Corporation System, method and computer program product for generating a shader program
US20030212735A1 (en) 2002-05-13 2003-11-13 Nvidia Corporation Method and apparatus for providing an integrated network of processors
US20040153778A1 (en) 2002-06-12 2004-08-05 Ati Technologies, Inc. Method, system and software for configuring a graphics processing communication mode
US6980209B1 (en) * 2002-06-14 2005-12-27 Nvidia Corporation Method and system for scalable, dataflow-based, programmable processing of graphics data
US6812927B1 (en) * 2002-06-18 2004-11-02 Nvidia Corporation System and method for avoiding depth clears using a stencil buffer
US6797998B2 (en) * 2002-07-16 2004-09-28 Nvidia Corporation Multi-configuration GPU interface device
US6954204B2 (en) * 2002-07-18 2005-10-11 Nvidia Corporation Programmable graphics system and method using flexible, high-precision data formats
US6825843B2 (en) * 2002-07-18 2004-11-30 Nvidia Corporation Method and apparatus for loop and branch instructions in a programmable graphics pipeline
US6952206B1 (en) * 2002-08-12 2005-10-04 Nvidia Corporation Graphics application program interface system and method for accelerating graphics processing
US6779069B1 (en) 2002-09-04 2004-08-17 Nvidia Corporation Computer system with source-synchronous digital link
CA2514296A1 (en) 2003-01-28 2004-08-19 Lucid Information Technology Ltd. Method and system for compositing three-dimensional graphics images using associative decision mechanism
US7145565B2 (en) * 2003-02-27 2006-12-05 Nvidia Corporation Depth bounds testing
US6911983B2 (en) 2003-03-12 2005-06-28 Nvidia Corporation Double-buffering of pixel data using copy-on-write semantics
US7129909B1 (en) * 2003-04-09 2006-10-31 Nvidia Corporation Method and system using compressed display mode list
US6940515B1 (en) * 2003-04-10 2005-09-06 Nvidia Corporation User programmable primitive engine
US7483031B2 (en) 2003-04-17 2009-01-27 Nvidia Corporation Method for synchronizing graphics processing units
US7120816B2 (en) * 2003-04-17 2006-10-10 Nvidia Corporation Method for testing synchronization and connection status of a graphics processing unit module
US7068278B1 (en) * 2003-04-17 2006-06-27 Nvidia Corporation Synchronized graphics processing units
US7119808B2 (en) * 2003-07-15 2006-10-10 Alienware Labs Corp. Multiple parallel processor computer graphics system
US6956579B1 (en) 2003-08-18 2005-10-18 Nvidia Corporation Private addressing in a multi-processor graphics processing system
US7388581B1 (en) 2003-08-28 2008-06-17 Nvidia Corporation Asynchronous conditional graphics rendering
US7015914B1 (en) * 2003-12-10 2006-03-21 Nvidia Corporation Multiple data buffers for processing graphics data
US7248261B1 (en) * 2003-12-15 2007-07-24 Nvidia Corporation Method and apparatus to accelerate rendering of shadow effects for computer-generated images
JP3879002B2 (en) 2003-12-26 2007-02-07 国立大学法人宇都宮大学 Self-optimizing arithmetic unit
US6975325B2 (en) 2004-01-23 2005-12-13 Ati Technologies Inc. Method and apparatus for graphics processing using state and shader management
US7259606B2 (en) 2004-01-27 2007-08-21 Nvidia Corporation Data sampling clock edge placement training for high speed GPU-memory interface
US7483034B2 (en) 2004-02-25 2009-01-27 Siemens Medical Solutions Usa, Inc. System and method for GPU-based 3D nonrigid registration
US7289125B2 (en) * 2004-02-27 2007-10-30 Nvidia Corporation Graphics device clustering with PCI-express
US20050275760A1 (en) 2004-03-02 2005-12-15 Nvidia Corporation Modifying a rasterized surface, such as by trimming
US7978194B2 (en) 2004-03-02 2011-07-12 Ati Technologies Ulc Method and apparatus for hierarchical Z buffering and stenciling
US20050195186A1 (en) 2004-03-02 2005-09-08 Ati Technologies Inc. Method and apparatus for object based visibility culling
US7315912B2 (en) 2004-04-01 2008-01-01 Nvidia Corporation Deadlock avoidance in a bus fabric
US7336284B2 (en) 2004-04-08 2008-02-26 Ati Technologies Inc. Two level cache memory architecture
US20050237329A1 (en) 2004-04-27 2005-10-27 Nvidia Corporation GPU rendering to system memory
US7738045B2 (en) 2004-05-03 2010-06-15 Broadcom Corporation Film-mode (3:2/2:2 Pulldown) detector, method and video device
US7079156B1 (en) 2004-05-14 2006-07-18 Nvidia Corporation Method and system for implementing multiple high precision and low precision interpolators for a graphics pipeline
US8066515B2 (en) 2004-11-17 2011-11-29 Nvidia Corporation Multiple graphics adapter connection systems
US7598958B1 (en) 2004-11-17 2009-10-06 Nvidia Corporation Multi-chip graphics processing unit apparatus, system, and method
US7275123B2 (en) 2004-12-06 2007-09-25 Nvidia Corporation Method and apparatus for providing peer-to-peer data transfer within a computing environment
US7451259B2 (en) 2004-12-06 2008-11-11 Nvidia Corporation Method and apparatus for providing peer-to-peer data transfer within a computing environment
US20060156399A1 (en) 2004-12-30 2006-07-13 Parmar Pankaj N System and method for implementing network security using a sequestered partition
US7924281B2 (en) 2005-03-09 2011-04-12 Ati Technologies Ulc System and method for determining illumination of a pixel by shadow planes
US7796095B2 (en) 2005-03-18 2010-09-14 Ati Technologies Ulc Display specific image processing in an integrated circuit
US7568056B2 (en) 2005-03-28 2009-07-28 Nvidia Corporation Host bus adapter that interfaces with host computer bus to multiple types of storage devices
US7681187B2 (en) 2005-03-31 2010-03-16 Nvidia Corporation Method and apparatus for register allocation in presence of hardware constraints
US20080143731A1 (en) 2005-05-24 2008-06-19 Jeffrey Cheng Video rendering across a high speed peripheral interconnect bus
US7817155B2 (en) 2005-05-24 2010-10-19 Ati Technologies Inc. Master/slave graphics adapter arrangement
US7539801B2 (en) 2005-05-27 2009-05-26 Ati Technologies Ulc Computing device with flexibly configurable expansion slots, and method of operation
US20060282604A1 (en) 2005-05-27 2006-12-14 Ati Technologies, Inc. Methods and apparatus for processing graphics data using multiple processing circuits
US7728841B1 (en) 2005-12-19 2010-06-01 Nvidia Corporation Coherent shader output for multiple targets
US7768517B2 (en) 2006-02-21 2010-08-03 Nvidia Corporation Asymmetric multi-GPU processing
US7872648B2 (en) * 2007-06-14 2011-01-18 Microsoft Corporation Random-access vector graphics

Patent Citations (99)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5757385A (en) * 1994-07-21 1998-05-26 International Business Machines Corporation Method and apparatus for managing multiprocessor graphical workload distribution
US5745762A (en) * 1994-12-15 1998-04-28 International Business Machines Corporation Advanced graphics driver architecture supporting multiple system emulations
US5754866A (en) * 1995-05-08 1998-05-19 Nvidia Corporation Delayed interrupts with a FIFO in an improved input/output architecture
US5758182A (en) * 1995-05-15 1998-05-26 Nvidia Corporation DMA controller translates virtual I/O device address received directly from application program command to physical i/o device address of I/O device on device bus
US5909595A (en) * 1995-05-15 1999-06-01 Nvidia Corporation Method of controlling I/O routing by setting connecting context for utilizing I/O processing elements within a computer system to produce multimedia effects
US5740464A (en) * 1995-05-15 1998-04-14 Nvidia Corporation Architecture for providing input/output operations in a computer system
US6212261B1 (en) * 1996-08-14 2001-04-03 Nortel Networks Limited Internet-based telephone call manager
US6169553B1 (en) * 1997-07-02 2001-01-02 Ati Technologies, Inc. Method and apparatus for rendering a three-dimensional scene having shadowing
US6201545B1 (en) * 1997-09-23 2001-03-13 Ati Technologies, Inc. Method and apparatus for generating sub pixel masks in a three dimensional graphic processing system
US7170515B1 (en) * 1997-11-25 2007-01-30 Nvidia Corporation Rendering pipeline
US6856320B1 (en) * 1997-11-25 2005-02-15 Nvidia U.S. Investment Company Demand-based memory system for graphics applications
US6337686B2 (en) * 1998-01-07 2002-01-08 Ati Technologies Inc. Method and apparatus for line anti-aliasing
US7038692B1 (en) * 1998-04-07 2006-05-02 Nvidia Corporation Method and apparatus for providing a vertex cache
US6691180B2 (en) * 1998-04-17 2004-02-10 Nvidia Corporation Apparatus for accelerating the rendering of images
US6184908B1 (en) * 1998-04-27 2001-02-06 Ati Technologies, Inc. Method and apparatus for co-processing video graphics data
US6212617B1 (en) * 1998-05-13 2001-04-03 Microsoft Corporation Parallel processing method and system using a lazy parallel data type to reduce inter-processor communication
US7170513B1 (en) * 1998-07-22 2007-01-30 Nvidia Corporation System and method for display list occlusion branching
US6894689B1 (en) * 1998-07-22 2005-05-17 Nvidia Corporation Occlusion culling method and apparatus for graphics systems
US7023437B1 (en) * 1998-07-22 2006-04-04 Nvidia Corporation System and method for accelerating graphics processing using a post-geometry data stream during multiple-pass rendering
US6191800B1 (en) * 1998-08-11 2001-02-20 International Business Machines Corporation Dynamic balancing of graphics workloads using a tiling strategy
US6188412B1 (en) * 1998-08-28 2001-02-13 Ati Technologies, Inc. Method and apparatus for performing setup operations in a video graphics system
US6362825B1 (en) * 1999-01-19 2002-03-26 Hewlett-Packard Company Real-time combination of adjacent identical primitive data sets in a graphics call sequence
US6753878B1 (en) * 1999-03-08 2004-06-22 Hewlett-Packard Development Company, L.P. Parallel pipelined merge engines
US6529198B1 (en) * 1999-03-16 2003-03-04 Nec Corporation Parallel rendering device
US6535209B1 (en) * 1999-03-17 2003-03-18 Nvidia Us Investments Co. Data stream splitting and storage in graphics data processing
US6181352B1 (en) * 1999-03-22 2001-01-30 Nvidia Corporation Graphics pipeline selectively providing multiple pixels or multiple textures
US6577320B1 (en) * 1999-03-22 2003-06-10 Nvidia Corporation Method and apparatus for processing multiple types of pixel component representations including processes of premultiplication, postmultiplication, and colorkeying/chromakeying
US6578068B1 (en) * 1999-08-31 2003-06-10 Accenture Llp Load balancer in environment services patterns
US6352479B1 (en) * 1999-08-31 2002-03-05 Nvidia U.S. Investment Company Interactive gaming server and online community forum
US20030112245A1 (en) * 1999-12-06 2003-06-19 Nvidia Corporation Single semiconductor graphics platform
US6734874B2 (en) * 1999-12-06 2004-05-11 Nvidia Corporation Graphics processing unit with transform module capable of handling scalars and vectors
US6577309B2 (en) * 1999-12-06 2003-06-10 Nvidia Corporation System and method for a graphics processing framework embodied utilizing a single semiconductor platform
US7002588B1 (en) * 1999-12-06 2006-02-21 Nvidia Corporation System, method and computer program product for branching during programmable vertex processing
US20030112246A1 (en) * 1999-12-06 2003-06-19 Nvidia Corporation Blending system and method in an integrated computer graphics pipeline
US6992667B2 (en) * 1999-12-06 2006-01-31 Nvidia Corporation Single semiconductor graphics platform system and method with skinning, swizzling and masking capabilities
US20030038808A1 (en) * 1999-12-06 2003-02-27 Nvidia Corporation Method, apparatus and article of manufacture for a sequencer in a transform/lighting module capable of processing multiple independent execution threads
US6870540B1 (en) * 1999-12-06 2005-03-22 Nvidia Corporation System, method and computer program product for a programmable pixel processing model with instruction set
US20030103054A1 (en) * 1999-12-06 2003-06-05 Nvidia Corporation Integrated graphics processing unit with antialiasing
US20030034975A1 (en) * 1999-12-06 2003-02-20 Nvidia Corporation Lighting system and method for a graphics processor
US20030020720A1 (en) * 1999-12-06 2003-01-30 Nvidia Corporation Method, apparatus and article of manufacture for a sequencer in a transform/lighting module capable of processing multiple independent execution threads
US7051139B2 (en) * 1999-12-20 2006-05-23 Intel Corporation CPU expandability bus
US6557065B1 (en) * 1999-12-20 2003-04-29 Intel Corporation CPU expandability bus
US6864984B2 (en) * 2000-03-16 2005-03-08 Fuji Photo Film Co., Ltd. Measuring method and apparatus using attenuation in total reflection
US6741243B2 (en) * 2000-05-01 2004-05-25 Broadcom Corporation Method and system for reducing overflows in a computer graphics system
US6725457B1 (en) * 2000-05-17 2004-04-20 Nvidia Corporation Semaphore enhancement to improve system performance
US6728820B1 (en) * 2000-05-26 2004-04-27 Ati International Srl Method of configuring, controlling, and accessing a bridge and apparatus therefor
US6734861B1 (en) * 2000-05-31 2004-05-11 Nvidia Corporation System, method and article of manufacture for an interlock module in a computer graphics processing pipeline
US6724394B1 (en) * 2000-05-31 2004-04-20 Nvidia Corporation Programmable pixel shading architecture
US6532013B1 (en) * 2000-05-31 2003-03-11 Nvidia Corporation System, method and article of manufacture for pixel shaders for programmable shading
US6690372B2 (en) * 2000-05-31 2004-02-10 Nvidia Corporation System, method and article of manufacture for shadow mapping
US20020015055A1 (en) * 2000-07-18 2002-02-07 Silicon Graphics, Inc. Method and system for presenting three-dimensional computer graphics images using multiple graphics processing units
US6842180B1 (en) * 2000-09-20 2005-01-11 Intel Corporation Opportunistic sharing of graphics resources to enhance CPU performance in an integrated microprocessor
US6532525B1 (en) * 2000-09-29 2003-03-11 Ati Technologies, Inc. Method and apparatus for accessing memory
US6731298B1 (en) * 2000-10-02 2004-05-04 Nvidia Corporation System, method and article of manufacture for z-texture mapping
US20020059302A1 (en) * 2000-10-10 2002-05-16 Hitoshi Ebihara Data communication system and method, computer program, and recording medium
US7027972B1 (en) * 2001-01-24 2006-04-11 Ati Technologies, Inc. System for collecting and analyzing graphics data and method thereof
US6542971B1 (en) * 2001-04-23 2003-04-01 Nvidia Corporation Memory access system and method employing an auxiliary buffer
US6700583B2 (en) * 2001-05-14 2004-03-02 Ati Technologies, Inc. Configurable buffer for multipass applications
US6894687B1 (en) * 2001-06-08 2005-05-17 Nvidia Corporation System, method and computer program product for vertex attribute aliasing in a graphics pipeline
US6982718B2 (en) * 2001-06-08 2006-01-03 Nvidia Corporation System, method and computer program product for programmable fragment processing in a graphics pipeline
US6844879B2 (en) * 2001-07-19 2005-01-18 Nec Corporation Drawing apparatus
US6744433B1 (en) * 2001-08-31 2004-06-01 Nvidia Corporation System and method for using and collecting information from a plurality of depth layers
US6989840B1 (en) * 2001-08-31 2006-01-24 Nvidia Corporation Order-independent transparency rendering system and method
US6704025B1 (en) * 2001-08-31 2004-03-09 Nvidia Corporation System and method for dual-depth shadow-mapping
US20030080959A1 (en) * 2001-10-29 2003-05-01 Ati Technologies, Inc. System, Method, and apparatus for early culling
US6999076B2 (en) * 2001-10-29 2006-02-14 Ati Technologies, Inc. System, method, and apparatus for early culling
US6677953B1 (en) * 2001-11-08 2004-01-13 Nvidia Corporation Hardware viewport system and method for use in a graphics pipeline
US20030117971A1 (en) * 2001-12-21 2003-06-26 Celoxica Ltd. System, method, and article of manufacture for profiling an executable hardware model using calls to profiling functions
US6683614B2 (en) * 2001-12-21 2004-01-27 Hewlett-Packard Development Company, L.P. System and method for automatically configuring graphics pipelines by tracking a region of interest in a computer graphical display system
US7224359B1 (en) * 2002-02-01 2007-05-29 Nvidia Corporation Depth clamping system and method in a hardware graphics pipeline
US20040012600A1 (en) * 2002-03-22 2004-01-22 Deering Michael F. Scalable high performance 3d graphics
US6876362B1 (en) * 2002-07-10 2005-04-05 Nvidia Corporation Omnidirectional shadow texture mapping
US6864893B2 (en) * 2002-07-19 2005-03-08 Nvidia Corporation Method and apparatus for modifying depth values using pixel programs
US20040036159A1 (en) * 2002-08-23 2004-02-26 Ati Technologies, Inc. Integrated circuit having memory disposed thereon and method of making thereof
US7324547B1 (en) * 2002-12-13 2008-01-29 Nvidia Corporation Internet protocol (IP) router residing in a processor chipset
US6885376B2 (en) * 2002-12-30 2005-04-26 Silicon Graphics, Inc. System, method, and computer program product for near-real time load balancing across multiple rendering pipelines
US6900810B1 (en) * 2003-04-10 2005-05-31 Nvidia Corporation User programmable geometry engine
US7038678B2 (en) * 2003-05-21 2006-05-02 Nvidia Corporation Dependent texture shadow antialiasing
US7038685B1 (en) * 2003-06-30 2006-05-02 Nvidia Corporation Programmable graphics processor for multithreaded execution of programs
US6995767B1 (en) * 2003-07-31 2006-02-07 Nvidia Corporation Trilinear optimization for texture filtering
US7525547B1 (en) * 2003-08-12 2009-04-28 Nvidia Corporation Programming multiple chips from a command buffer to process multiple images
US7015915B1 (en) * 2003-08-12 2006-03-21 Nvidia Corporation Programming multiple chips from a command buffer
US20060114260A1 (en) * 2003-08-12 2006-06-01 Nvidia Corporation Programming multiple chips from a command buffer
US20050041031A1 (en) * 2003-08-18 2005-02-24 Nvidia Corporation Adaptive load balancing in a multi-processor graphics processing system
US20050081115A1 (en) * 2003-09-26 2005-04-14 Ati Technologies, Inc. Method and apparatus for monitoring and resetting a co-processor
US20050122330A1 (en) * 2003-11-14 2005-06-09 Microsoft Corporation Systems and methods for downloading algorithmic elements to a coprocessor and corresponding techniques
US7053901B2 (en) * 2003-12-11 2006-05-30 Nvidia Corporation System and method for accelerating a special purpose processor
US20060119607A1 (en) * 2004-02-27 2006-06-08 Nvidia Corporation Register based queuing for texture requests
US7324111B2 (en) * 2004-04-09 2008-01-29 Nvidia Corporation Method and apparatus for routing graphics processing signals to a stand-alone module
US20060028478A1 (en) * 2004-04-23 2006-02-09 Nvidia Corporation Point-to-point bus bridging without a bridge controller
US6985152B2 (en) * 2004-04-23 2006-01-10 Nvidia Corporation Point-to-point bus bridging without a bridge controller
US20060005178A1 (en) * 2004-07-02 2006-01-05 Nvidia Corporation Optimized chaining of vertex and fragment programs
US20060055695A1 (en) * 2004-09-13 2006-03-16 Nvidia Corporation Increased scalability in the fragment shading pipeline
US20060059494A1 (en) * 2004-09-16 2006-03-16 Nvidia Corporation Load balancing
US20060101218A1 (en) * 2004-11-11 2006-05-11 Nvidia Corporation Memory controller-adaptive 1T/2T timing control
US7477256B1 (en) * 2004-11-17 2009-01-13 Nvidia Corporation Connecting graphics adapters for scalable performance
US7372465B1 (en) * 2004-12-17 2008-05-13 Nvidia Corporation Scalable graphics processing for remote display
US7325086B2 (en) * 2005-12-15 2008-01-29 Via Technologies, Inc. Method and system for multiple GPU support
US20080007559A1 (en) * 2006-06-30 2008-01-10 Nokia Corporation Apparatus, method and a computer program product for providing a unified graphics pipeline for stereoscopic rendering

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9153193B2 (en) 2011-09-09 2015-10-06 Microsoft Technology Licensing, Llc Primitive rendering using a single primitive type
CN110428506A (en) * 2019-08-09 2019-11-08 成都景中教育软件有限公司 A kind of dynamic geometry 3-D graphic cutting implementation method based on parameter

Also Published As

Publication number Publication date
US20090179894A1 (en) 2009-07-16
US20090128551A1 (en) 2009-05-21
US8284207B2 (en) 2012-10-09
US20090027383A1 (en) 2009-01-29
US20130120410A1 (en) 2013-05-16

Similar Documents

Publication Publication Date Title
US8284207B2 (en) Method of generating digital images of objects in 3D scenes while eliminating object overdrawing within the multiple graphics processing pipeline (GPPLS) of a parallel graphics processing system generating partial color-based complementary-type images along the viewing direction using black pixel rendering and subsequent recompositing operations
JP5345226B2 (en) Graphics processor parallel array architecture
US8074224B1 (en) Managing state information for a multi-threaded processor
US8077174B2 (en) Hierarchical processor array
US8319784B2 (en) Fast reconfiguration of graphics pipeline state
US7463261B1 (en) Three-dimensional image compositing on a GPU utilizing multiple transformations
JP3657518B2 (en) Graphics processor with deferred shading
US7663621B1 (en) Cylindrical wrapping using shader hardware
US7737982B2 (en) Method and system for minimizing an amount of data needed to test data against subarea boundaries in spatially composited digital video
US7522171B1 (en) On-the-fly reordering of 32-bit per component texture images in a multi-cycle data transfer
US7499051B1 (en) GPU assisted 3D compositing
WO2016028482A1 (en) Render target command reordering in graphics processing
US7525547B1 (en) Programming multiple chips from a command buffer to process multiple images
US20090135190A1 (en) Multimode parallel graphics rendering systems and methods supporting task-object division
US7747842B1 (en) Configurable output buffer ganging for a parallel processor
US7484076B1 (en) Executing an SIMD instruction requiring P operations on an execution unit that performs Q operations at a time (Q<P)
EP1255227A1 (en) Vertices index processor
US7404056B1 (en) Virtual copying scheme for creating multiple versions of state information
US20030164823A1 (en) 3D graphics accelerator architecture
US7593971B1 (en) Configurable state table for managing multiple versions of state information
US20220245751A1 (en) Graphics processing systems
US7825936B1 (en) Method and system for texture instruction demotion optimization
JP2003288610A (en) Device and method of image processing

Legal Events

Date Code Title Description
AS Assignment
Owner name: LUCID INFORMATION TECHNOLOGY, LTD., ISRAEL
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:BAKALASH, REUVEN;LEVIATHAN, YANIV;REEL/FRAME:021573/0965
Effective date: 2008-05-13

STCB Information on status: application discontinuation
Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION

AS Assignment
Owner name: GOOGLE LLC, CALIFORNIA
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:LUCIDLOGIX TECHNOLOGY LTD.;REEL/FRAME:046361/0169
Effective date: 2018-01-31