Graphics board tests are not created equal

Most industry benchmarks can be slanted to emphasize or hide particular performance characteristics. Jon Peddie explains the Pmark, used by Jon Peddie Research to provide a common standard for measuring graphics board performance.

Graphics add-in boards (AIBs) get measured differently depending on their market segment. A workstation board generally gets measured using SPECviewperf, or one of the various SPECapc tests for a given application.

Consumer AIBs get benchmarked using a variety of games and/or Futuremark’s 3DMark 11; game engines such as Unigine’s “Heaven” are also used.

What we’ve observed, for example, is that the majority of enterprise buying decisions are based more on subjective power-user input to a centralized engineering-services IT team, drawing on a number of disassociated applications. One of the Big Three US automakers, for instance, made its desktop CAD seat decision based on Bunkspeed (CUDA) performance rather than pure CATIA performance. These applications tickle the GPU in entirely different ways.

This is now being observed in the SMB base as well, as Adobe and Autodesk drive forward with their products bundled as suites. For example, Adobe CS5 MPE (Mercury Playback Engine) performance is a function of CUDA plus graphics, and no existing benchmark correlates with it. Autodesk’s 2012 suites now span a full workflow, so users are going to be looking at real-time, rendering, and modeling performance.

All these tests can be, and are, used by suppliers and fans to show off their AIB in the most favorable light. So if a workstation AIB does well in the SPECapc test for SolidWorks 2007 but not in 3ds Max 9, the SolidWorks results get talked about and the 3ds Max scores don’t. If a consumer AIB does well in “Starfighter” but not in “Stalker,” the same thing happens.

We think this is, at the least, irresponsible and, at the worst, misleading.

The Pmark

In 2009 we established the Pmark. The Pmark takes three (and soon to be four) parameters and arrives at a single number. The scale of the number is not important; the Pmark is meant for comparing AIBs against one another.

The Pmark takes into consideration Price, Performance, and Power consumption. The next version (coming soon) will add Pressure (noise) to the equation.

The calculation for the Pmark is:
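Working backward from the published scores in the table below, the equation is:

Pmark = Performance / (Price × Power) × 100

Here Performance is the average benchmark score, Price is in dollars, and Power is in watts; the ×100 factor only keeps the result at a readable size, and, as noted, the scale is unimportant.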

Performance is measured in one of two ways on a graphics AIB: by frame rate (fps), or by an arbitrary score such as those generated by 3DMark 11 and SPEC. Unigine’s “Heaven” offers both.

The performance number used in the Pmark can be either, but not both. The score range for 3DMark 11 runs from 2,000 to 12,000, and for SPEC from 1 to 100.

The scores are influenced by a number of variables, such as screen resolution, the filters that are used (e.g., anti-aliasing, anisotropic filtering, shadows, lights, etc.), the version of the driver, and the PC used in the testing. Here again, it doesn’t matter what is turned on or off, or what value a filter is set to, as long as it is done consistently.

One rule of thumb is to turn on every feature and set it to the maximum to put the AIB under the greatest stress. That’s unreasonable and unrealistic, because anyone actually using an AIB will dial back the features to the point where they can get a reliable 30 fps, and preferably 60 fps.

Average performance

When we are testing, we use the average score across all the tests and all the resolutions (but leave the filters set the same). Our reasoning is that no one will buy an AIB to use with just one application, CAD program, or game; it will be used with a variety of applications, and therefore an average performance number is more realistic.
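As a minimal sketch of that averaging (the benchmark names and fps values below are hypothetical):

```python
from statistics import mean

# Hypothetical fps results per (benchmark, resolution) pair; filter
# settings are held constant across every run.
runs = {
    ("Heaven", "1680x1050"): 62.0,
    ("Heaven", "1920x1200"): 51.5,
    ("Heaven", "2560x1600"): 38.2,
    ("Game A", "1680x1050"): 88.1,
    ("Game A", "1920x1200"): 74.9,
    ("Game A", "2560x1600"): 55.0,
}

# The single performance number fed into the Pmark is the plain average
# over every test at every resolution.
average_performance = mean(runs.values())
print(f"Average performance: {average_performance:.1f} fps")
```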

For some AIB manufacturers the Pmark is annoying. The thing about benchmarking is that you only have one friend, the current winner; everyone else wishes you’d get hit by a bus. When an AIB manufacturer’s board gets a low Pmark score (usually for drawing too much power or being too expensive relative to the other AIBs tested), the complaints are: you should compare it to brand X, because their AIB isn’t in the same price range; or, power consumption isn’t important to enthusiasts and power users.

Our response is: don’t shoot the messenger, fix your damn board.

Dated results

Benchmarks are typically run on the latest PC. That means the scores for a given AIB are only good for a year, two at the most. That’s not too bad, since new graphics AIBs are introduced every six months. However, one has to be careful when comparing an old AIB’s benchmark score (regardless of which benchmark is used) with that of a new one under consideration. To get a truly correct view of the two AIBs, the newer one should be tested in the older PC; that will show the difference most accurately. That’s not usually convenient, however, so the old AIB is tested in the new PC. As long as the same PC is used for both, it’s really OK.

Some recent consumer AIB scores

Just to give a flavor of the scores, the following diagram shows the results for some consumer AIBs (higher is better).

The Pmark for a few popular AIBs.

For comparative purposes, the following table shows the individual values that went into generating the Pmark.

                     GTX 480   GTX 580   GTX 480 SLI   GTX 580 SLI   GTX 590   HD 5970
Power (watts)          244       250         488           500          365       294
Price ($)              504       499       1,008           998          695       638
Average performance   45.7      53.6        85.6          97.7         81.0      50.7
Pmark                0.037     0.043       0.017         0.020        0.032     0.027
Performance/watt      0.19      0.21        0.35          0.39         0.32      0.20
Performance/dollar    0.09      0.11        0.08          0.10         0.12      0.08
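For readers who want to check the arithmetic, here is a minimal sketch in Python using the raw values above (the ×100 scale factor is our inference from the published scores):

```python
# Recompute the Pmark and the derived ratios from the raw values in the
# table above. The x100 scale factor is inferred from the published
# scores, so treat it as an assumption.
boards = {
    "GTX 480":     {"watts": 244, "price": 504,  "perf": 45.7},
    "GTX 580":     {"watts": 250, "price": 499,  "perf": 53.6},
    "GTX 480 SLI": {"watts": 488, "price": 1008, "perf": 85.6},
    "GTX 580 SLI": {"watts": 500, "price": 998,  "perf": 97.7},
    "GTX 590":     {"watts": 365, "price": 695,  "perf": 81.0},
    "HD 5970":     {"watts": 294, "price": 638,  "perf": 50.7},
}

for name, b in boards.items():
    pmark = b["perf"] / (b["price"] * b["watts"]) * 100
    print(f"{name:12s} Pmark {pmark:.3f}  "
          f"perf/watt {b['perf'] / b['watts']:.2f}  "
          f"perf/$ {b['perf'] / b['price']:.2f}")
```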

As the table shows, the two GTX 580s in SLI give the highest performance, the GTX 480 uses the least amount of power, and the single GTX 580 is the least expensive. So which one is the best?

That’s exactly the dilemma most buyers face: how to decide. If they don’t have the resources and time to test several boards themselves, they turn to the web for benchmark data (in the past it was magazines, but things are moving too fast for a magazine publishing schedule).

Therefore we think the Pmark is the best measurement for a professional and/or a consumer who wants a balanced point of view on a graphics AIB.

Some recent professional AIB scores

To apply the Pmark to professional graphics AIBs, we used the latest SPECviewperf 11 results (http://www.spec.org/gwpg/gpc.data/vp11/summary.html) for the performance numbers, and the published prices and power consumption figures for the rest.

SPEC publishes eight viewset scores for a system. The platform varies, and the same AIB is used in various platforms. Since we are interested in AIB performance, we took the average of all the systems with the same AIB to arrive at a performance value.

Too noisy?

The next step will be the addition of noise. We will call it Pressure so we can keep it in the “P” family. Noise is measured in decibels; lower is better, so it will go in the denominator. (Sound is usually measured with microphones, which respond, approximately, in proportion to the sound pressure. The power in a sound wave, all else being equal, goes as the square of the pressure, and is expressed in decibels (dB).)
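In symbols, the standard relation is SPL = 20 · log10(p / p0) dB, where p0 = 20 µPa is the reference sound pressure; doubling the pressure adds roughly 6 dB.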

The equation will look like this:
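Presumably, extending the same form with Pressure joining Price and Power in the denominator (our extrapolation from the description above):

Pmark = Performance / (Price × Power × Pressure) × 100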


Pmark with noise

Adding noise to the Pmark is a good thing to do if a consistent and repeatable testing scenario can be established. The actual dB number will not be that exact, because we’re not going to be able to use an anechoic chamber (or build one), so ambient noise will be a factor. But a tare value can be established before running the tests, and several runs can be made, so this should be doable. Check back in a week, or at JonPeddie.com, to find out how we’ve done.

In search of objectivity

Benchmarking is the only objective way to evaluate competing products, especially when several variables are involved. A single benchmark, synthetic or real-world, can be misleading. And a benchmark that measures only performance, without rating the product in a realistic, holistic manner, gives the buyer no idea of what their experience might be like.

Therefore we propose adoption of the Pmark as a way to incorporate the parameters of importance. The Pmark can have weighting factors applied to the individual parameters if a user thinks one element is more important than another.
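As one sketch of how such weighting might work (the exponent scheme below is our illustration; the Pmark itself does not prescribe one):

```python
# One possible weighting scheme: raise each parameter to a user-chosen
# exponent. All weights at 1.0 reproduce the plain Pmark.
def weighted_pmark(perf, price, watts, w_perf=1.0, w_price=1.0, w_watts=1.0):
    return 100 * perf ** w_perf / (price ** w_price * watts ** w_watts)

# A buyer who cares less about power draw might down-weight watts:
print(weighted_pmark(53.6, 499, 250))               # plain Pmark: ~0.043
print(weighted_pmark(53.6, 499, 250, w_watts=0.5))  # power counts for less
```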

The Pmark is not trademarked or patented and is free to use by anyone who thinks it might be useful.

Jon Peddie is President of Jon Peddie Research, the publisher of GraphicSpeak.