Benchmarking ray tracing

Ray tracing, as everyone knows, is a simple algorithm that can consume all of a processor's capacity. But how much a given processor is consumed is a question with no general answer, because it depends on the scene and, of course, on the processor itself. So approximations have to be made and parameters fixed to get a consistent comparison, and it is then left to the buyer to extrapolate the results to his or her situation.
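
To get a feel for why fixing those parameters matters, consider a rough ray count for a single frame. The sketch below (in Python) uses an illustrative resolution, sample count, and branching factor; the numbers are assumptions chosen for illustration, not figures from any benchmark.

```python
# Back-of-the-envelope ray count for one frame (illustrative assumptions, not a benchmark).
width, height = 1920, 1080        # image resolution
samples_per_pixel = 16            # anti-aliasing / noise-reduction samples
bounces = 4                       # levels of secondary rays traced
rays_per_bounce = 2               # e.g., one reflection ray plus one shadow ray (scene dependent)

primary_rays = width * height * samples_per_pixel
secondary_rays = primary_rays * sum(rays_per_bounce ** b for b in range(1, bounces + 1))
total_rays = primary_rays + secondary_rays

print(f"primary rays:   {primary_rays:,}")     # ~33 million
print(f"secondary rays: {secondary_rays:,}")   # ~995 million
print(f"total rays:     {total_rays:,}")       # ~1 billion rays for this single frame
```

Halve the samples or add a bounce and the total swings by hundreds of millions of rays, which is why a benchmark must pin these values down before any two processors can be compared.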

Ray tracing is done on three platforms, and soon a fourth: supercomputers, servers, and workstations, and it has been demonstrated on tablets. Non-geometric ray tracing is also run on supercomputers in field simulations ranging from optical analysis to nuclear explosions and fusion reactions.

At the workstation, server and supercomputing levels, the Standard Performance Evaluation Corporation (SPEC) has offered benchmarks based on professional applications since 1988.[1] SPEC is a non-profit corporation whose membership is open to any company or organization that is willing to support the group’s goals (and pay dues). Originally a bunch of people from hardware vendors devising CPU metrics, SPEC has evolved into an umbrella organization encompassing diverse areas of interest: Cloud, CPU, Graphics and Workstation Performance, Handheld, High-Performance Computing, Java Client/Server, Mail Servers, Storage, Power, Virtualization.

SPEC does not have a benchmark that focuses solely on ray tracing because all SPEC benchmarks are based on applications, not specific functionality within applications. There are many tests within those application-based benchmarks (especially those from the SPEC Graphics and Workstation Performance Group—SPEC/GWPG) that test ray tracing functionality, but the performance measurement is related to how an application performs as a whole on the system being tested. These types of tests more accurately reflect what a user would experience in the real world when running a professional application.

SPEC/GWPG produces benchmarks that work on top of actual applications (SPECapc) and ones that are based primarily on traces of applications (SPECviewperf for graphics performance, and SPECworkstation for comprehensive workstation performance). Members contributing to benchmark development include AMD, Dell, Fujitsu, HP, Intel, Lenovo, and Nvidia.

“Tribute to Myrna Loy” by Ive (2008). The figure is Vicky 4.1 from DAZ. The author, Ive, created it with Blender by using all images of her that he could find as a reference. Rendered with POV-Ray beta 25 using 7 light sources (and the “area_illumination” feature).

In 2018 SPEC released SPECworkstation 3, comprising more than 30 workloads containing nearly 140 tests to exercise CPU, graphics, I/O and memory bandwidth. The workloads are divided by application categories that include media and entertainment (3D animation, rendering), product development (CAD/CAM/CAE), life sciences (medical, molecular), financial services, energy (oil and gas), general operations, and GPU compute.

A scene from the updated LuxRender workload.

Accurately representing GPU performance for a wide range of professional applications poses a unique set of challenges for benchmark developers such as SPEC/GWPG. Applications behave very differently, so producing a benchmark that measures a variety of application behaviors and runs in a reasonable amount of time presents difficulties.

Even within a given application, different models and modes can produce very different GPU behavior, so ensuring sufficient test coverage is a key to producing a comprehensive performance picture.

Another major consideration is recognizing the differences between CPU and GPU performance measurement. Generally speaking, the CPU has an architecture with many complexities that allow it to execute a wide variety of codes quickly. The GPU, on the other hand, is purpose-built to execute pretty much the same set of operations on many pieces of data, such as shading every pixel on the screen with the same set of operations.
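
A minimal sketch of that contrast, using NumPy vectorization as a stand-in for the GPU's many parallel lanes; the shading arithmetic here is a made-up placeholder, not any benchmark's actual workload:

```python
import numpy as np

# A toy "shading" operation: identical arithmetic applied to every pixel.
pixels = np.random.rand(270, 480)   # small grayscale image keeps the pure-Python loop quick

# CPU-style view: a general-purpose core visits the pixels one at a time, relying on
# its architectural complexity (caches, branch prediction, out-of-order execution)
# to make each individual step fast.
serial = np.empty_like(pixels)
for y in range(pixels.shape[0]):
    for x in range(pixels.shape[1]):
        serial[y, x] = min(max(pixels[y, x] * 0.8 + 0.1, 0.0), 1.0)

# GPU-style view: the same operation issued across all pixels at once; NumPy's
# vectorization stands in here for thousands of GPU lanes running in lockstep.
parallel = np.clip(pixels * 0.8 + 0.1, 0.0, 1.0)

assert np.allclose(serial, parallel)   # both paths compute the same image
```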

The SPECworkstation 3 suite for measuring GPU compute performance includes three workloads. The ray-tracing test uses LuxMark, an OpenCL benchmark built on the new LuxCore physically based renderer, to render a chrome sphere resting on a grid of numbers in a beach scene.

SPEC also offers viewsets within its SPECviewperf 13 and SPECworkstation 3 benchmarks that include ray-tracing functionality based on real-world application traces. For example, the Maya-05 viewset was created from traces of the graphics workload generated by the Maya 2017 application from Autodesk.

The viewset includes numerous rendering modes supported by the application, including shaded mode, ambient occlusion, multi-sample antialiasing, and transparency. All tests are rendered using Viewport 2.0.

One thing to consider in benchmarking ray tracing performance is that it doesn't happen in a vacuum. Even in a SPEC test that is predominantly centered on ray tracing, there is a lot else going on that affects performance, including application overhead, housekeeping, and implementation peculiarities. These need to be accounted for if a performance measurement is to be representative of what happens in the real world.
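
One way to make that visible is to time the whole run alongside the rendering phase alone. In the sketch below the functions are hypothetical placeholders for an application's setup, render, and housekeeping phases, with sleeps standing in for real work:

```python
import time

# Hypothetical placeholders for the phases of an application-level benchmark run.
def load_scene():       # application overhead: parsing, building acceleration structures
    time.sleep(0.5)

def trace_rays():       # the ray-tracing work itself
    time.sleep(2.0)

def write_results():    # housekeeping: image output, logging, cleanup
    time.sleep(0.3)

start_total = time.perf_counter()
load_scene()
start_trace = time.perf_counter()
trace_rays()
end_trace = time.perf_counter()
write_results()
end_total = time.perf_counter()

print(f"ray tracing only: {end_trace - start_trace:.2f} s")
print(f"whole run:        {end_total - start_total:.2f} s")  # what the user actually waits for
```

The gap between the two numbers is exactly the overhead an application-level benchmark captures and a kernel-only measurement misses.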

In addition to benchmarks from consortiums such as SPEC, some ray tracing software suppliers offer their own benchmark programs. For example, Chaos Group has a V-Ray benchmark. The V-Ray Benchmark is a free stand-alone application to help users test how fast their hardware renders. The benchmark includes two test scenes, one for GPUs and another for CPUs, depending on the processor type you’d like to measure. V-Ray Benchmark does not require a V-Ray license to run.

One launches the application and runs the tests. After the tests are complete, the user can share the results online and see how his or her hardware compares to others at benchmark.chaosgroup.com. The company recommends noting any special hardware modifications, such as water cooling or overclocking.

V-Ray benchmark tests. (Source: Chaos Group)

Chaos Group says users looking to benchmark a render farm or cloud can use the command-line interface to test without a GUI. V-Ray Benchmark runs on Windows, Mac OS, and Linux.
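
As a rough sketch of what headless farm testing might look like, the snippet below runs a benchmark command on each render node over SSH and records the elapsed time. The host names are hypothetical and the command string is a placeholder, not V-Ray Benchmark's actual CLI syntax, which is documented by Chaos Group.

```python
import subprocess
import time

# Placeholder command; substitute the real V-Ray Benchmark CLI invocation here.
# The name and flags below are hypothetical, used only to illustrate the pattern.
BENCHMARK_CMD = "render_benchmark --mode cpu --no-gui"

render_nodes = ["node01", "node02", "node03"]   # hypothetical farm host names

for host in render_nodes:
    start = time.perf_counter()
    result = subprocess.run(["ssh", host, BENCHMARK_CMD],
                            capture_output=True, text=True)
    elapsed = time.perf_counter() - start
    status = "ok" if result.returncode == 0 else "failed"
    print(f"{host}: {status} in {elapsed:.1f} s")
```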

For PCs, the leading benchmark supplier is UL's Futuremark team. Finland-based Futuremark has been making PC graphics benchmarks since 1997 [2] and in 2018 announced its ray-tracing benchmark, 3DMark Port Royal, the first dedicated realtime ray tracing benchmark for gamers. One can use Port Royal to test and compare the realtime ray tracing performance of any graphics AIB that supports Microsoft DirectX Raytracing.

Realtime ray tracing promises to bring new levels of realism to in-game graphics. (Source: UL)

Port Royal uses DirectX Raytracing to enhance reflections, shadows, and other effects that are difficult to achieve with traditional rendering techniques.

As well as benchmarking performance, 3DMark Port Royal is a realistic and practical example of what to expect from ray tracing in upcoming games—ray tracing effects running in realtime at reasonable frame rates at 2560 × 1440 resolution.

3DMark Port Royal was developed with input from AMD, Intel, Nvidia, and other leading technology companies. UL worked especially closely with Microsoft to create an implementation of the DirectX Raytracing API.

Port Royal will run on any graphics AIB with drivers that support DirectX Raytracing. As with any new technology, there are limited options for early adopters, but more AIBs are expected to get DirectX Raytracing support.

Summary

Benchmarking will always be a challenge. There are two classes of benchmarks: synthetic (simulated) and application-based. SPEC uses application-based benchmarks and UL uses synthetic ones. The workload or script of a benchmark is always subject to criticism, especially by suppliers whose products don't do well in the tests. The complaint is that the script (of actions in the application-based tests) or the simulation (in the synthetic tests) doesn't reflect real-world workloads or usage. That is correct to a degree. However, SPEC benchmarks either run on top of actual applications or are developed from traces of applications performing the same work as in the real world. Also, the organizations developing these benchmarks have been doing this work, and only this work, for over two decades, longer than some of their critics have been alive, and with that much accumulated experience they can be considered experts.

[1] https://www.spec.org/spec/

[2] https://en.wikipedia.org/wiki/Futuremark