There are six, says Imagination Technologies.
With ray tracing becoming increasingly important for a wide range of graphics applications, Imagination Technologies has developed a Ray Tracing Level System to give developers and OEMs an insight into the capability of solutions for ray tracing acceleration available now and in the future.
The System identifies increasingly advanced stages of ray tracing acceleration, across a range of architectures, not just Imagination’s PowerVR Ray Tracing, with each level providing higher performance and better hardware utilization. This translates to greater real-world ray tracing performance with better efficiency for more complex effects and higher resolutions. Referring to the System, companies looking to deploy, or develop on, ray-tracing solutions will be able to confidently understand the market and find the right technology to meet their needs.
The Ray Tracing Level System consists of six levels, with capabilities and requirements described as follows:
- Level 0: Legacy solutions
- Level 1: Software on traditional GPUs
- Level 2: Ray/box and ray/tri-testers in hardware
- Level 3: Bounding Volume Hierarchy (BVH) processing in hardware
- Level 4: BVH processing and coherency sorting in hardware
- Level 5: Coherent BVH processing with Scene Hierarchy Generation (SHG) in hardware
As has been pointed out before and elsewhere, ray tracing is not a new subject or a new computing technique. The following is a bit of an expansion on the six levels proposed by Imagination.
Level 0. There have been many ambitious Level 0 attempts, but all unfortunately failed, and yet new designs with custom APIs continue to be announced. The biggest reason for failure was the discontinuity with how traditional GPUs process data. Part of the failure has been trying to create a new paradigm. Without continuity, a completely new and incompatible ecosystem is imposed, which rules out evolutionary adoption. Imagination Technologies’ OpenRL was the first attempt to have a link with standard 3D APIs such as OpenGL.
Level 1. Ray tracing has been treated as an app and runs on conventional processors, x86 being the most common. Such a software solution ensures continuity with the existing ecosystem. Compute/shader paths are used to execute ray tracing functionality. However, because a scene can have so many rays running simultaneously, a 2-, 4-, or even 16-core CPU will have difficulty with performance due to the computational load. For a realtime experience, one must use many tricks, hacks, and shortcuts as well as limit the resolution.
An example is Adshir’s LocalRay, where the secondary rays are handled a priori in coherent beams. This improves not only parallelism and performance but cache usage as well. It is not limited in resolution or usage and requires no tricks.
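To put the computational load of a Level 1 software approach in perspective, here is a back-of-the-envelope sketch in Python. The resolution, frame rate, and rays-per-pixel values are illustrative assumptions rather than figures from Imagination, and `rays_per_second` is a hypothetical helper.

```python
def rays_per_second(width, height, fps, rays_per_pixel):
    """Total rays that must be traced each second for the given settings."""
    return width * height * fps * rays_per_pixel


# Illustrative settings only (assumed): 1080p, 60 fps, and 4 rays per pixel
# (for example one primary ray plus a few shadow or reflection rays).
budget = rays_per_second(1920, 1080, 60, 4)
print(f"{budget:,} rays per second")  # 497,664,000 rays per second
```

Even with these modest assumptions the budget is roughly half a billion rays per second, and each ray may require many box and triangle tests.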
Level 2. Ray-box and ray-triangle tests can be implemented with standard fused multiply-add operations on GPUs, but this repeated operation is expensive in cycles, power, and area. A Level 2 solution offloads a large part of the ray tracing job to dedicated tester hardware, improving efficiency.
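For readers who have not seen them, the two tests that a Level 2 design moves into hardware can be sketched in a few lines of Python. This is a minimal illustration using the well-known slab test for boxes and the Möller-Trumbore test for triangles; it is not a description of any vendor's tester, and the function names are assumptions made for this example.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def ray_box_hit(origin, direction, box_min, box_max):
    """Slab test: True if the ray hits the axis-aligned box at some t >= 0."""
    t_near, t_far = 0.0, float("inf")
    for axis in range(3):
        o, d = origin[axis], direction[axis]
        if abs(d) < 1e-12:
            # Ray is parallel to this slab: it must already lie inside it.
            if o < box_min[axis] or o > box_max[axis]:
                return False
            continue
        t0 = (box_min[axis] - o) / d
        t1 = (box_max[axis] - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True

def ray_triangle_hit(origin, direction, v0, v1, v2, eps=1e-9):
    """Moller-Trumbore test: return the hit distance t, or None on a miss."""
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:
        return None  # ray is parallel to the triangle's plane
    inv_det = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv_det
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv_det
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv_det
    return t if t > eps else None

ray_o, ray_d = (0.25, 0.25, 0.0), (0.0, 0.0, 1.0)
print(ray_box_hit(ray_o, ray_d, (-1, -1, 4), (1, 1, 6)))                 # True
print(ray_triangle_hit(ray_o, ray_d, (0, 0, 5), (1, 0, 5), (0, 1, 5)))   # 5.0
```

Both routines are little more than multiplies, adds, and comparisons, which is why they map well onto fixed-function units when they have to run billions of times per second.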
Level 3. Bounding Volume Hierarchy (BVH) processing provides a more extensive offloading of data flow management to hardware. A BVH cuts down the amount of ray testing needed through a hierarchical testing system, thus making realtime ray tracing possible. Tracing a ray through the acceleration structure is much more complicated than just ray-box and ray-triangle testing. Complex and dynamic data flow is required, where each box test step decides what happens next, e.g., more hierarchical box tests and/or triangle tests. There are significant opportunities to streamline this process by moving the full BVH tree walk into hardware. Doing so can improve execution efficiency, bandwidth, and caching efficiency, enabling the next level of ray tracing acceleration.
Siliconart’s new MIMD ray tracing hardware design can be thought of as a Level 3 device because its RayCore MC handles static and dynamic BVH structures to find the hit points of rays.
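The data-dependent control flow of a Level 3 tree walk can be illustrated with a small Python sketch. It assumes a simple hypothetical `Node` layout (inner nodes hold children, leaves hold primitive ids) and a stack-based traversal; real hardware traversal units are organized very differently, so treat this only as a model of the decisions involved.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    box_min: tuple
    box_max: tuple
    children: list = field(default_factory=list)  # inner node: child Nodes
    prims: list = field(default_factory=list)     # leaf node: primitive ids

def ray_box_hit(origin, direction, box_min, box_max):
    """Slab test against an axis-aligned box, for t >= 0."""
    t_near, t_far = 0.0, float("inf")
    for axis in range(3):
        o, d = origin[axis], direction[axis]
        if abs(d) < 1e-12:
            if o < box_min[axis] or o > box_max[axis]:
                return False
            continue
        t0, t1 = (box_min[axis] - o) / d, (box_max[axis] - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True

def traverse(root, origin, direction):
    """Return (candidate primitive ids, number of box tests performed)."""
    candidates, box_tests = [], 0
    stack = [root]
    while stack:
        node = stack.pop()
        box_tests += 1
        if not ray_box_hit(origin, direction, node.box_min, node.box_max):
            continue                         # the whole subtree is culled
        if node.children:
            stack.extend(node.children)      # more hierarchical box tests
        else:
            candidates.extend(node.prims)    # hand off to triangle tests
    return candidates, box_tests

left = Node((0, 0, 0), (4, 1, 1), prims=[0, 1])
right = Node((6, 0, 0), (10, 1, 1), prims=[2, 3])
root = Node((0, 0, 0), (10, 1, 1), children=[left, right])

print(traverse(root, (8.0, 0.5, -1.0), (0.0, 0.0, 1.0)))  # ([2, 3], 3)
```

Only the primitives in the right-hand leaf are passed on for triangle testing; the left subtree is rejected with a single box test. Each loop iteration depends on the outcome of the previous test, which is the kind of irregular data flow that is awkward on shader cores and a natural target for dedicated hardware.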
Level 4. BVH processing with coherency sorting in hardware can increase the processing and bandwidth efficiency of ray tracing. Ray tracing struggles with coherency because bouncing rays generate ever more divergence in ray directions. Each ray needs to walk through the BVH structure, and if each ray follows a different path, the result is very poor memory access efficiency and caching. Because divergent rays also hit different objects, this clashes with the SIMD nature of all modern GPU architectures: different ray hits mean different shaders. A hardware coherency sorting engine can enable this fourth level of efficiency. Adding coherency sorting across the rays in flight helps with SIMD and BVH memory access efficiency for higher real-world ray rate utilization.
This type of hardware coherency engine is similar in spirit to Imagination Technologies’ tile-based deferred rendering (TBDR), which uses a unique sorting block to ensure coherent processing of pixels; the coherency engine does the same for rays.
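The idea behind coherency sorting can be sketched in software. The toy Python example below bins rays by the sign of each direction component (eight octants) and then emits fixed-size batches, so that rays processed together tend to walk similar parts of the BVH. The binning key, the function names, and the batch size are assumptions chosen for illustration; Imagination has not published its sorting scheme in this form, and a hardware engine would likely use richer criteria.

```python
from collections import defaultdict

def direction_octant(direction):
    """3-bit key with one sign bit per axis, giving 8 direction bins."""
    dx, dy, dz = direction
    return int(dx < 0) | (int(dy < 0) << 1) | (int(dz < 0) << 2)

def sort_into_coherent_batches(rays, batch_size):
    """Group ray ids by direction octant, then cut each bin into batches.

    rays: list of (origin, direction) tuples.
    Returns a list of (octant_key, [ray ids]) batches.
    """
    bins = defaultdict(list)
    for ray_id, (_origin, direction) in enumerate(rays):
        bins[direction_octant(direction)].append(ray_id)

    batches = []
    for key in sorted(bins):
        ids = bins[key]
        for start in range(0, len(ids), batch_size):
            batches.append((key, ids[start:start + batch_size]))
    return batches

rays = [
    ((0, 0, 0), (1, 1, 1)),     # octant 0
    ((0, 0, 0), (-1, 1, 1)),    # octant 1
    ((0, 0, 0), (1, 1, 1)),     # octant 0
    ((0, 0, 0), (-1, -1, -1)),  # octant 7
    ((0, 0, 0), (-1, 2, 3)),    # octant 1
]
print(sort_into_coherent_batches(rays, batch_size=2))
# [(0, [0, 2]), (1, [1, 4]), (7, [3])]
```

Each batch now contains rays heading in broadly the same direction, which is the property that improves SIMD utilization and cache behavior during the BVH walk.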
Level 5. Full acceleration of ray tracing processing in hardware. Building an efficient BVH structure is complex and expensive. It can be done on the CPU and/or the GPU using a variety of algorithms and approaches. However, achieving optimal Level 5 efficiency calls for a dedicated hardware solution. A hardware BVH builder enables much higher performance with high efficiency for very detailed dynamic 3D scenes. When this capability is added to a hardware design below Level 4, it can be recognized as a plus level, e.g., a Level 2 Plus solution. Brute-force ray tracing is not realistic, runs out of steam rapidly, and is simply not compatible with mobile phone budgets. However, Imagination Technologies says it has offered Level 5 ray tracing solutions since 2014 with the PowerVR GR6500. The company says its PowerVR ray tracing architectural licenses are available and already backed by more than 200 granted patents.
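To show what a scene hierarchy generator has to produce, here is a minimal top-down BVH builder in Python. It uses a simple median split along the longest axis, which is one of many possible build strategies and is not claimed to be the one used in PowerVR hardware; `build_bvh`, `bounds`, and the dictionary-based node layout are hypothetical names chosen for this sketch.

```python
def bounds(boxes):
    """Smallest axis-aligned box enclosing all (box_min, box_max) pairs."""
    mins = tuple(min(b[0][a] for b in boxes) for a in range(3))
    maxs = tuple(max(b[1][a] for b in boxes) for a in range(3))
    return mins, maxs

def build_bvh(prims, leaf_size=2):
    """Build a BVH from a list of (prim_id, (box_min, box_max)) entries."""
    box_min, box_max = bounds([box for _, box in prims])
    if len(prims) <= leaf_size:
        return {"box": (box_min, box_max), "prims": [pid for pid, _ in prims]}

    # Split along the longest axis of the node's box, at the median centroid.
    axis = max(range(3), key=lambda a: box_max[a] - box_min[a])
    ordered = sorted(prims, key=lambda p: p[1][0][axis] + p[1][1][axis])
    mid = len(ordered) // 2
    return {
        "box": (box_min, box_max),
        "left": build_bvh(ordered[:mid], leaf_size),
        "right": build_bvh(ordered[mid:], leaf_size),
    }

scene = [
    (0, ((0, 0, 0), (1, 1, 1))),
    (1, ((2, 0, 0), (3, 1, 1))),
    (2, ((8, 0, 0), (9, 1, 1))),
    (3, ((6, 0, 0), (7, 1, 1))),
]
tree = build_bvh(scene)
print(tree["box"])             # ((0, 0, 0), (9, 1, 1))
print(tree["left"]["prims"])   # [0, 1]
print(tree["right"]["prims"])  # [3, 2]
```

Every level of the recursion sorts and scans the primitives, and a dynamic scene may need the structure rebuilt or refitted every frame, which is why the build step itself becomes a candidate for dedicated hardware.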
What do we think?
Ray tracing is used for optical and RF lens design, audio speaker and microphone design, and almost anything else that can be expressed as a wave. For visual effects, it provides a physically accurate representation of the scene if it is allowed to run long enough, incorporates all the environmental factors (global illumination, light sources, color, etc.), and has accurate material values and an accurate 3D model to paint. It’s devilishly and insidiously complex, and that’s one of the reasons the computational workload is so high. The book of tricks and compromises to get an almost ray-traced object within a scene (that may or may not be ray traced) is a thick one, and pages are being added to it every day. And all that is just for a single static scene.
When you add camera, character, and object movement, the workload goes up by at least the cube. Then increase the display resolution and color depth, and a supercomputer begins to break out in a sweat.
The above levels are hardware-focused and do a good job of segmenting the various degrees of ray tracing acceleration. AIBs and OEMs might adopt the system and then categorize their Imagination products just as Intel does with its four levels of Core i processors. But it’s hard to envision AIBs and OEMs doing this for the market leader, Nvidia. This is because Imagination’s construct of RT levels doesn’t take into account software support, which has a great impact on the overall experience, and that raises the question of experience. In the end, the consumer isn’t going to care about how many transistors are dedicated to, say, full-scene path tracing. What they will care about is the quality and performance of what they’re looking at.
In addition, there are post-processing features such as Nvidia’s AI-based denoiser (544 additional cores in the RTX 2080 Ti chip). This is not actually part of the ray tracing engine; it is post-processing. The denoising and super-resolution system could also be applied to traditional rendering. This type of approach reduces the workload but does not increase the efficiency of the tracing of rays itself. It can be seen as a separate feature or function that can be mapped onto dedicated processing blocks like the Nvidia AI accelerator or the PowerVR NNA unit, or onto the GPU ALUs (AMD). Equally, the approach here can be neural network-based (Nvidia’s focus), at the cost of complex, per-app training, or it can be a more traditional image processing approach, e.g., traditional temporal and spatial filters.
However, it must be recognized that ray tracing isn’t only hardware and, in fact, became a hardware issue in the late 1980s. Ray tracing is a math operation, and therefore may be more of a software issue, as Adshir points out. In reality, it’s hardware and software, and the work in those two areas advances interdependently but unsynchronized over time. The levels shouldn’t be viewed as supersets of each other, which is illustrated by Nvidia’s and Siliconart’s solutions: for example, BVH processing can be done in software with one hardware solution and in hardware with another, depending upon the designer’s choice or perhaps the process technology available at the time.
What most suppliers, of hardware and software alike, will tell you is that the user’s experience is the measure of accomplishment. The sampling and reconstruction needed for realtime path-traced light transport are tightly coupled with API evolution, as well as architectural advances. And to judge one solution over another strictly on a hardware taxonomy would be a mistake: it’s a platform, an ecosystem that is highly integrated and constantly in flux.
Nonetheless, Imagination’s hierarchy is a decent first attempt at categorizing (albeit basically) the level of hardware acceleration offered by a particular solution.