Why application-based benchmarks matter

Hierarchy of benchmarking methodologies explained. 

By Bob Cramblitt

An article last year in Jon Peddie’s Tech Watch provided an overview of SPEC/GWPG benchmarks, recounting the struggle in the early days of computer graphics between application-based and synthetic benchmarks for measuring performance.

For the vast majority who are not in the day-to-day trenches of performance evaluation, it’s worth defining synthetic versus application-based benchmarks.

Synthetic graphics and workstation benchmarks measure performance for discrete operations that exercise GPU, CPU, storage, or other functionality. They are typified by short run times and small models that provide a snapshot of performance taken out of the context of application overhead.

In contrast, application-based benchmarks produce performance metrics derived from how workstations run actual applications within a typical work environment.
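
To make the distinction concrete, here is a minimal sketch (in Python, and not drawn from any SPEC workload) of what a synthetic test looks like: it times one discrete operation, a large matrix multiply, in isolation, with none of the surrounding application overhead that an application-based benchmark would capture.

    import time
    import numpy as np

    def time_matmul(n=2048, repeats=5):
        # One discrete operation, measured with no application around it.
        a = np.random.rand(n, n)
        b = np.random.rand(n, n)
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            a @ b
            best = min(best, time.perf_counter() - start)
        gflops = (2 * n ** 3) / best / 1e9   # ~2*n^3 floating-point operations
        return best, gflops

    if __name__ == "__main__":
        seconds, gflops = time_matmul()
        print(f"best run: {seconds:.3f} s (~{gflops:.1f} GFLOPS)")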

The SPECapc for NX 9/10 benchmark runs on top of the actual application. It is a collaborative effort between Siemens and SPECapc.

A benchmarking hierarchy

Bill Martin-Otto, Lenovo’s SPEC/GWPG representative, offers this hierarchy of benchmarking methodologies, in descending order of effectiveness:

  1. Benchmarking your own workflow and applications
  2. Benchmarks that run on top of real applications
  3. Benchmarks based on application tracing
  4. Synthetic benchmarks

Rolling your own

Benchmarking your own workflow and applications can be challenging for the average user, who must come up with specific workloads, benchmarking methodologies and scoring values. It’s also difficult for vendors to help customers benchmark their own workflows and applications, according to Martin-Otto: “Most customers don’t allow me to have their data and procedures to run benchmarks.”
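
For those who do take the roll-your-own route, the basic recipe can be sketched briefly: script the steps of a typical work session, time each one, and reduce the timings to a single score against a reference machine. The Python sketch below illustrates the idea; the step names, commands and reference times are hypothetical placeholders, not a real workload.

    import subprocess
    import time
    from math import prod

    # Hypothetical workload: commands that reproduce a typical work session.
    STEPS = {
        "open_model": ["my_cad_app", "--batch", "open", "model.prt"],
        "regenerate": ["my_cad_app", "--batch", "regen", "model.prt"],
    }

    # Reference times (seconds) measured once on a baseline machine of your choice.
    REFERENCE = {"open_model": 12.0, "regenerate": 30.0}

    def run_step(cmd):
        start = time.perf_counter()
        subprocess.run(cmd, check=True)
        return time.perf_counter() - start

    def composite_score(times):
        # Geometric mean of speedups vs. the reference machine; higher is better.
        ratios = [REFERENCE[name] / t for name, t in times.items()]
        return prod(ratios) ** (1.0 / len(ratios))

    if __name__ == "__main__":
        times = {name: run_step(cmd) for name, cmd in STEPS.items()}
        print("composite score:", round(composite_score(times), 2))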

Running on top of applications

The second method in Martin-Otto’s hierarchy is exemplified by SPECapc benchmarks.

“The value of a SPECapc benchmark is that it demonstrates how the actual application will behave on your particular hardware configuration,” says Trey Morton of Dell, the SPECapc project group chair.

Customers running SPECapc benchmarks get a true representation of how a workstation will perform for their most frequently used applications. Vendors, particularly those who are SPEC/GWPG members, obtain results that provide valuable data for improving performance.

“We strive to provide the highest performance and reliability in our workstations,” says Peter Torvi, an HP representative to the SPEC/GWPG committee. “The SPECapc benchmarks are real-world examples that help us measure how well we are doing in achieving those goals.”

“The SPECapc benchmarks provide the best possible analysis of our hardware and how to improve it,” says Jonathan Konieczny, an AMD SPEC/GWPG representative. “These tests run the actual application, providing a very accurate representation of how the full system behaves over the course of a user session.”

The SPECwpc benchmark does not require a licensed application to run, but uses traces of applications in its workloads.

Tracing a real application

Third on Martin-Otto’s list are benchmarks such as SPECviewperf and SPECwpc, which do not require a licensed application to run, but use traces of a target application.

Benchmarks based on application traces are not as representative of true performance as a SPECapc benchmark, because they exercise only the portion of the application that was traced, not everything it does. But they are easy to install and run, and they provide a more realistic alternative to synthetic benchmarks.
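
The trace-and-replay idea itself can be sketched in a few lines: record the calls an application makes to a graphics layer during a session, then replay just that recorded portion later, timed in isolation, with no application present. The Python sketch below is purely illustrative; the stand-in “drawing” call bears no relation to how SPECviewperf or SPECwpc actually capture or replay traces.

    import time

    trace = []  # recorded (call, args) pairs from the "application session"

    def draw_triangles(count):
        # Stand-in for a real rendering call; placeholder work only.
        _ = sum(range(count))

    def traced(fn):
        # Wrap a call so every invocation is recorded for later replay.
        def wrapper(*args):
            trace.append((fn, args))
            return fn(*args)
        return wrapper

    # "Application session": the app issues calls through the traced layer.
    api = traced(draw_triangles)
    for n in (10_000, 50_000, 100_000):
        api(n)

    # "Benchmark run": replay only the recorded portion, timed in isolation.
    start = time.perf_counter()
    for fn, args in trace:
        fn(*args)
    print(f"replay took {time.perf_counter() - start:.4f} s")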

“Synthetic benchmarks can make very incorrect assumptions about what the application is doing,” says Allen Jensen, a SPEC/GWPG representative from Nvidia. “Real life is much messier than you can depict by synthetic tests. By using trace-based tests, we can reflect what the application is really doing.”

Passing speed or cup holders?

Dell’s Alex Shows, SPECgpc project group chair, says that application-based benchmarks, whether running on top of applications or using traces, measure performance where it matters most.

“Application-based benchmarks are like test-driving the car yourself. If you measure performance of the application, or at least measure performance based on the application, you measure what’s important—like the time it takes to pass an 18-wheeler in highway traffic.

“Unfortunately, with synthetic benchmarks, critical buying decisions might be based on factors analogous to judging a car by the number of cup holders it provides.”

Bob Cramblitt is communications director for SPEC. He writes frequently about performance issues and digital design, engineering and manufacturing technologies. To find out more about graphics and workstation benchmarking, visit the SPEC/GWPG website, subscribe to the SPEC/GWPG e-newsletter or join the Graphics and Workstation Benchmarking LinkedIn group: https://www.linkedin.com/groups/8534330.