Review: AMD FirePro W9100 raises the professional graphics bar

At $3,999 it isn’t for the average workstation user. If you need it, you really need it.

By Alex Herrera

Do you place heavy demands on GPU hardware, for rendering or other highly parallel workloads? Need more gigabytes of GPU memory than most users have in their workstation’s system memory? Got upwards of $4K burning a hole in your pocket? OK, maybe that doesn’t describe you — I know for sure it doesn’t describe me — but there are some professional users out there who would answer yes to all the above. And it’s specifically those users AMD targets with its new FirePro W9100 workstation-class graphics card.

The raw hardware specs for AMD’s new top-of-the-line professional GPU certainly impress. Built around AMD’s latest-and-greatest Hawaii GPU, the FirePro W9100 boasts 5.6 TeraFLOPS of peak single-precision throughput, and an even more impressive 2.8 TeraFLOPS double-precision rate. It comes supported by 16 GB of GDDR5 memory, sustaining 320 GB/sec of peak bandwidth. Not just one, but all of those aforementioned numbers set new high-water marks for AMD’s FirePro brand.

And here’s how the new W9100 stacks up in the context of its immediate predecessor, the FirePro W9000, as well as the rest of AMD’s mid-range and above FirePro line-up.

	FirePro W9100 (NEW)	FirePro W9000	FirePro W8000	FirePro W7000	FirePro W5000
MSRP	$3,999	$3,999	$1,599	$899	$599
GCN GPU	GCN 2.0 “Hawaii”	GCN 1.0 “Tahiti”	GCN 1.0 “Tahiti” (slower clock)	GCN 1.0 “Pitcairn”	GCN 1.0 “Pitcairn” (slower clock)
Peak FLOPS	5.6 TFLOPS	4.3 TFLOPS	3.23 TFLOPS	2.4 TFLOPS	1.3 TFLOPS
Memory size	16 GB GDDR5	6 GB GDDR5	4 GB GDDR5	4 GB GDDR5	2 GB GDDR5
Memory BW (peak)	320 GB/s	264 GB/s	176 GB/s	154 GB/s	102.4 GB/s
Display output	(6) Mini DP 3D stereo Framelock / genlock	(6) Mini DP 3D stereo Framelock / genlock	(4) DP 3D stereo Framelock / genlock	(4) DP 3D stereo (optional bracket) Framelock / genlock	(2) DP (1) dual-link DVI 3D stereo (optional bracket)
Typical board power	275 W (1×6 and 1×8 pin aux power required)	274 W (1×6 and 1×8 pin aux power required)	189 W	150 W	75 W
Form factor	Dual slot	Dual slot	Dual slot	Single slot	Single slot

The current top-half of the FirePro line-up, courtesy of Graphics Core Next (GCN) 1.0 and 2.0. (Source: AMD)

With an MSRP of $3,999 (the same price tag as the predecessor it will slowly replace, the W9000), the AMD FirePro W9100 will be available “this spring” from AMD’s global distribution partner Sapphire Technology, AMD FirePro Ultra Workstation providers, and in HP Z820 and HP Z620 Workstations for customer or channel integration. Other supporting OEMs and system integrators at launch include: Armari Ltd., BOXX Technologies, Colfax, LumaForge, Mouse Computer, PSSC Labs, Scan Computers Ltd., SilverDraft, Supermicro, TAROX, Versatile Distribution Services, Workstation Specialists and Wortmann AG.

GCN 2.0: Hawaii comes to FirePro

Some architectures represent overhauled, bottom-up redesigns, and some represent more modest incremental improvements over the previous generation. It’s the nature of the business, reflected by all vendors of big, complex semiconductors. Intel does it, denoting alternate generations as “tick” and “tock”, the former indicating mostly a shrink to a new process with modest enhancements, and the latter representing a more substantial upgrade or redesign of the microarchitecture.

GPU vendors do it, too, and there’s nothing inherently wrong with it. It’s a necessary strategy to provide incremental product improvements while keeping an already-too challenging design cycle somewhat sane. AMD’s well-executed and well-received Evergreen generation (2009) was pretty much a top-to-bottom redesign, while its successor, Northern Islands, turned out to be more of an incremental upgrade on the proven Evergreen architecture. And the next generation, Southern Islands (GCN 1.0) again proved to be a major architecture redesign.

Which leads us to the close of 2013, when AMD unveiled a follow-on graphics architecture, designated as GCN 2.0 and code-named Volcanic Islands. Given the company’s design cadence, one might assume GCN 2.0 would be more of an incremental improvement than a complete overhaul … and that assumption would be generally valid, though not completely.

Perhaps better referred to as something like GCN 1.5, Volcanic Islands maintained the same general architecture, pushing performance by populating more of the existing compute elements, adding more on-chip storage, and introducing a few ancillary features. Furthermore, GCN 2.0 wasn’t just a rehashing of existing GPU chips; it also introduced a new, bigger and badder flagship GPU, called Hawaii.

The first chip incarnations of GCN 2.0 and Hawaii shipped as the Radeon R9 and R7 2XX series, targeting gamers. As has become the norm, the professional, workstation-caliber GPU line adopts the technology a little later, with a lag anywhere from one quarter to several. Well, it turns out Hawaii’s lag was on the shorter side, as AMD on April 7 officially launched the FirePro W9100.

Block diagram of GCN 2.0’s flagship Hawaii GPU (Source: AMD)

AMD engineers crammed a lot more GCN resources onto the 28 nm Hawaii die than they did with the Southern Islands flagship, the 28 nm Tahiti. As is often the case with products from an experienced semiconductor design team, the successor is also much more efficient in its use of die area, and that’s precisely what AMD achieved with Hawaii. While its raw processing rates are 1.2 to 1.9 times as fast as GCN 1.0’s Tahiti (and 1.9X in arguably the most important metrics of geometry throughput and pixel fill rate), Hawaii consumes only 24% more silicon real estate. Compare that to a reported transistor count that is nearly 45% higher than Tahiti’s. Efficient physical design is something that takes significant design time, which is why it’s not always optimal in initial incarnations. But it’s something highly worthwhile to the business, and makes CFOs especially happy, as it goes directly to improving gross margins.

GPU family	“Southern Islands” GCN 1.0	“Volcanic Islands” GCN 2.0	Increase
Flagship GPU	Tahiti	Hawaii
Process	28 nm	28 nm
Stream processors	2048	2816	1.4X
Geometry processing (billion primitives/s)	2.1 B	4 B	1.9X
Single-precision Compute (TFLOPS)	4.3	5.6	1.3X
Double-precision ratio (DP:SP)	1:4	1:2	2X
Texel rate (Gtexels/s)	134.4	176	1.3X
Pixel rate (Gpixels/s)	33.6	64	1.9X
Peak memory bandwidth (GB/s)	264	320	1.2X
Die area (mm²)	352	438	1.24X
Peak GFLOPS/mm²	12.2	12.8	1.05X

AMD manages a lot more resources—and higher performance—in only 24% more die area. (Source: AMD)
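The ratios in the table can be rechecked with a few lines of arithmetic. The sketch below is only a sanity check of the published figures; the dictionary layout and the helper names `increase` and `gflops_per_mm2` are illustrative choices, not anything AMD supplies.

```python
# Figures copied from the table above (Tahiti = GCN 1.0, Hawaii = GCN 2.0).
tahiti = {"sp_tflops": 4.3, "die_mm2": 352, "mem_bw_gbs": 264, "pixel_rate": 33.6}
hawaii = {"sp_tflops": 5.6, "die_mm2": 438, "mem_bw_gbs": 320, "pixel_rate": 64.0}

def increase(key):
    """Hawaii's value divided by Tahiti's for one table row."""
    return hawaii[key] / tahiti[key]

def gflops_per_mm2(chip):
    """Peak single-precision GFLOPS per square millimetre of die."""
    return chip["sp_tflops"] * 1000 / chip["die_mm2"]

area_growth = increase("die_mm2")
compute_growth = increase("sp_tflops")
density_growth = gflops_per_mm2(hawaii) / gflops_per_mm2(tahiti)

print(f"die area:     {area_growth:.2f}X")
print(f"SP compute:   {compute_growth:.2f}X")
print(f"GFLOPS/mm^2:  {gflops_per_mm2(tahiti):.1f} -> {gflops_per_mm2(hawaii):.1f} "
      f"({density_growth:.2f}X)")
```

Compute grows faster than die area, which is why the density row comes out slightly above 1X.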

FirePro W9100’s 16 GB graphics memory the biggest yet

This board is big, expensive, and consumes about as many Watts as any deskside tower will allow an add-in card: 275 Watts. Part of that is due to the Hawaii über-chip at its heart, but it also has a lot to do with the memory AMD paired with the GPU: an eye-popping 16 GB of GDDR5 memory. That’s not only a record for a GPU (that we’ve ever seen, anyway), it’s a full 167% more memory than the 6 GB in the card’s ultra-high-end predecessor, the FirePro W9000.

Of course, a higher-FLOPS GPU paired with a lot more memory is going to see its performance handcuffed if the bandwidth to that memory isn’t up to snuff. To get the bandwidth up commensurately, AMD had to widen the bus (from 384 bits to 512), further complicating board design and driving up Watts.
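A quick way to see what the wider bus buys is to back out the per-pin data rate each card would need to reach its quoted peak bandwidth. The sketch below assumes peak bandwidth is simply bus width times per-pin rate; the function name is illustrative, and the resulting rates are derived here rather than quoted from AMD.

```python
def per_pin_rate_gbps(peak_bw_gbytes_s, bus_width_bits):
    """Per-pin data rate (Gbit/s) implied by a peak bandwidth and bus width.

    Assumes peak bandwidth (GB/s) = bus width (bits) / 8 * per-pin rate (Gbit/s).
    """
    return peak_bw_gbytes_s * 8 / bus_width_bits

w9000_rate = per_pin_rate_gbps(264, 384)  # FirePro W9000: 264 GB/s on 384 bits
w9100_rate = per_pin_rate_gbps(320, 512)  # FirePro W9100: 320 GB/s on 512 bits

print(f"W9000 implied rate: {w9000_rate} Gbit/s per pin")
print(f"W9100 implied rate: {w9100_rate} Gbit/s per pin")
```

Under that assumption, the W9100 reaches its higher bandwidth through width alone, with a per-pin rate that is actually a little lower than its predecessor’s.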

But why spec it at exactly 275 Watts, the same level as the W9000? It’s no coincidence, but rather a limit imposed by the server market. Thermal density in rackmount applications dictates a 275 Watt maximum. And with this card’s hefty TFLOPS numbers and memory, AMD will be getting interest from HPC and supercomputing markets looking to harness that power for GPGPU (general-purpose computing on GPUs).

Power efficiency improved … but performance of this level requires a lot of Watts

Two hundred and seventy-five Watts isn’t a low figure, not by any stretch of the imagination. But a card that boasts this kind of horsepower isn’t going to be winning business based on minimal Watts. When it comes to judging power for a card like this, the more appropriate metrics are maximum power and power efficiency. The former determines whether it can be supported electrically and thermally in its target host machines … more of a pass/fail criterion. The latter is an indication of how much bang you are getting for the buck (or in this case, the Watt).

Two hundred and seventy-five Watts turns out to be a lot to demand of a system, and it does require both an auxiliary 6-pin and an 8-pin connector, but it’s a level that can be supported by most higher-end deskside workstations. Furthermore, Hawaii’s power efficiency clearly improved over Tahiti’s, as AMD was able to populate more resources in a 24% bigger die without incurring any additional Watts at the board level. Part of that improved power-efficiency goodness came from upgrading AMD’s PowerTune technology, which leverages architectural, logic and circuit techniques to extract the maximum performance from the fewest Watts.
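For a rough sense of the efficiency gain, peak throughput can be divided by typical board power using the figures from the line-up table earlier in this review. This is a back-of-the-envelope sketch: peak GFLOPS per Watt is a crude proxy for real-world efficiency, and the function name is illustrative.

```python
def gflops_per_watt(peak_tflops, board_watts):
    """Peak single-precision GFLOPS delivered per Watt of typical board power."""
    return peak_tflops * 1000 / board_watts

w9000_eff = gflops_per_watt(4.3, 274)  # FirePro W9000 (Tahiti)
w9100_eff = gflops_per_watt(5.6, 275)  # FirePro W9100 (Hawaii)
gain = w9100_eff / w9000_eff

print(f"W9000: {w9000_eff:.1f} GFLOPS/W")
print(f"W9100: {w9100_eff:.1f} GFLOPS/W")
print(f"Improvement: {gain:.2f}X")
```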

The FirePro W9100 requires both 6-pin and 8-pin auxiliary power connectors to supply the necessary Watts. (Photo: Jon Peddie Research)

Target applications

Everything about this card is big: its specs, its size, its power, and its price tag. It’s clear right off the bat this is not a GPU for all. Rather, AMD sees applications with the following key demands as compelling homes for the FirePro W9100:

4K video

4K has become the rage in professional graphics circles, a rallying cry from content creators and hardware providers alike. No question, a strong voice in that cry comes from the former, who are being pushed to raise the bar beyond today’s hum-drum 2K / FullHD resolutions. But to be honest, more than a little hype is coming from suppliers of hardware that can deliver 4K, hoping to help stoke some fires of demand. Regardless of which is the dominant force, real-time and near-real-time 4K performance has become a level higher-end graphics-focused workstations now need to hit. And with its memory footprint, bandwidth, and GPU horsepower complemented by six Mini DisplayPort 1.2 outputs, each capable of driving one 4K display (3840×2160), the W9100 hits it quite well.
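To give a feel for the pixel load involved, the sketch below counts pixels for one 4K display and for all six outputs together. The pixel counts follow directly from the resolutions; the bytes-per-pixel and refresh-rate values are assumptions picked for illustration, not AMD specifications.

```python
UHD = (3840, 2160)      # per-display resolution quoted above
FULL_HD = (1920, 1080)  # the FullHD baseline
DISPLAYS = 6            # Mini DisplayPort outputs on the W9100

# Assumptions for illustration only, not AMD specifications:
BYTES_PER_PIXEL = 4     # e.g. 8 bits each for R, G, B, A
REFRESH_HZ = 60

def pixels(resolution):
    """Total pixel count for a (width, height) resolution."""
    width, height = resolution
    return width * height

uhd_pixels = pixels(UHD)
ratio_vs_full_hd = uhd_pixels / pixels(FULL_HD)
wall_pixels = uhd_pixels * DISPLAYS
wall_gbytes_per_s = wall_pixels * BYTES_PER_PIXEL * REFRESH_HZ / 1e9

print(f"One 4K display: {uhd_pixels:,} pixels ({ratio_vs_full_hd:.0f}x FullHD)")
print(f"Six 4K displays: {wall_pixels:,} pixels")
print(f"Uncompressed scan-out under the assumptions: {wall_gbytes_per_s:.2f} GB/s")
```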

Besides displaying 4K resolution on multiple monitors, what does “4K capability” really mean? Well, that will vary by context, but essentially a customer looking to upgrade to 4K support is going to need consistent performance, often real-time, from beginning to end in the creation workflow. The GPU’s traditional domain, rendering, is just one of several potential bottlenecks in the overall project pipeline.

Another key bottleneck AMD points to is pixel grading. Toward that end, AMD worked with Blackmagic, creator of DaVinci Resolve, a studio-caliber platform for delivering real-time color correction and effects (e.g. depth of field blur). With both supporting OpenCL 2.0, DaVinci Resolve can now tap the GPGPU capability of the W9100, seamlessly delivering real-time 4K throughput.

Blackmagic’s DaVinci Resolve in action. (Photo: Blackmagic)

The biggest datasets

A GPU rendering models and computing simulations out of local DRAM would prefer to wholly contain its dataset in memory. Operating in a piecemeal fashion—where only a portion of the dataset is physically resident at one time—is a workable solution. But as one would imagine, piecemeal execution comes at a significant cost in performance. As such, a big GPU memory, like the W9100’s record-setting 16 GB, can pay off handsomely when faced with big datasets.

Consider Hollywood-caliber CGI, for example. In making Avatar, artists wrapped so many layers of high-resolution textures on a Na’vi character, it took roughly 150 GB in all to bring a single character to life. In oil and gas exploration, surveys are both expansive and detailed, resulting in single data sets that can push well beyond 1 TB in size. If you’ve got big data, you want a big GPU memory … period.
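To put rough numbers on that, the sketch below counts how many memory-sized pieces a dataset would have to be split into if it cannot stay resident. It is a deliberately simplified model: real renderers and solvers stream, compress, and cache data, so the helper `min_chunks` and its one-piece-at-a-time assumption are illustrative only.

```python
import math

def min_chunks(dataset_gb, gpu_memory_gb):
    """Smallest number of pieces such that each piece fits in GPU memory."""
    return math.ceil(dataset_gb / gpu_memory_gb)

avatar_character_gb = 150  # figure quoted above for one Na'vi character

chunks_6gb = min_chunks(avatar_character_gb, 6)    # 6 GB card such as the W9000
chunks_16gb = min_chunks(avatar_character_gb, 16)  # the W9100's 16 GB

print(f"6 GB card:  at least {chunks_6gb} pieces")
print(f"16 GB card: at least {chunks_16gb} pieces")
```

Even 16 GB does not hold such a dataset whole, but fewer and larger pieces mean fewer swaps across the PCI Express bus.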

Along those lines, AMD garnered this testimonial from David Wortley, Technical Director at Taylor James studio: “In our lab tests, we took some of our heaviest data which previously required disabling textures and lighting to give a usable performance. With the AMD FirePro W9100 included in our test workflow, we were able to have all our data loaded with real-time shadows and realistic materials in the viewport in 3ds Max. That was never possible before.”

Simulating and rendering on the same hardware

Getting more with less is a win-win, and in the CAD space, hardware that can address more than one compute-intensive portion of the workflow is a scenario both Nvidia and AMD are pursuing with their GPUs. Beyond rendering, the two vendors are pushing GPGPU uses for CAD, in particular promising acceleration for both visualization and simulation with one common investment in hardware.

Where Nvidia positions both CUDA (first and foremost) and OpenCL as its GPGPU programming interfaces, AMD is squarely and singularly behind the OpenCL standard. Accordingly, the FirePro W9100 supports OpenCL 2.0, allowing seamless GPU acceleration of simulation applications for computational fluid dynamics (CFD), finite element analysis (FEA), and structural integrity analysis. Key applications that will tap W9100 acceleration include Simulia Abaqus, NX Nastran, Autodesk Moldflow, and CEI Ensight.

AMD is keenly aware that workstations sit at the intersection of high-performance graphics and GPGPU. (Source: AMD)

Benchmarking the AMD FirePro W9100

To benchmark the FirePro W9100, we chose the same tool we used previously on its W9000 predecessor: Viewperf 12. This latest version of SPEC’s venerable Viewperf benchmark is designed to isolate the stress on the graphics card specifically, rather than the system as a whole. As a result, its scores reflect the GPU installed and do not (at least should not) reflect differences in other key system components like CPU, memory and storage. It streams pre-defined viewsets (OpenGL and DirectX), representing the typical visual demands of popular workstation-caliber applications, including PTC Creo, Dassault Systèmes Catia and SolidWorks, Siemens NX, and Autodesk Maya. New for version 12 are energy and medical viewsets, which exercise volume rendering and dynamic data generation.

In this benchmarking exercise, we again attempted to follow our standard practice of using only a production driver publicly available at the time of testing. However, since this time we were benchmarking pre-launch, we relied on AMD’s commitment that the driver we used was the same one that would be posted for download on the first day the product was available for sale.

Viewperf 12 benchmark scores. (Source: Jon Peddie Research)

Simply put, the FirePro W9100 crushed Viewperf 12, delivering solid, across-the-board performance boosts on all viewsets, compared to results of the W9000. On average, the W9100 posted 18.2% better scores than its predecessor.

Also, bear in mind one of this card’s principal advantages, its 16 GB memory, isn’t being reflected in Viewperf 12 scores. While the benchmark will reward cards with more ample physical memory footprints, the reward peters out around 2 GB. As such, we think all of the W9100’s performance advantage over its predecessor, the 6 GB W9000, comes from Hawaii and the faster memory bus, and none of it from the bigger memory footprint. That, of course, should not be construed to mean that the other 14 GB of the W9100’s memory offer no value. They absolutely do for some workloads … just not the ones Viewperf 12 presents.

The FirePro W9100 also outscored any other card we’ve benchmarked. It should, given that it’s the second highest priced card out there, behind only Nvidia’s Quadro K6000 (a card we haven’t yet benchmarked, but hope to soon). Check out SPEC’s submitted results page for other submitted results.

Given it has the same MSRP as its predecessor, the W9100’s Viewperf 12 score per dollar increases commensurately with its raw scores.
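That relationship is simple enough to write down. The sketch below normalizes the W9000’s average score to 1.0 and applies the 18.2% average gain and the two list prices quoted in this review; the function name `relative_value` is illustrative.

```python
def relative_value(score_ratio, new_price, old_price):
    """Score per dollar of the new card relative to the old one."""
    return score_ratio / (new_price / old_price)

AVG_SCORE_RATIO = 1.182  # W9100 averaged 18.2% higher than the W9000
MSRP_W9100 = 3999
MSRP_W9000 = 3999

value_gain = relative_value(AVG_SCORE_RATIO, MSRP_W9100, MSRP_W9000)
print(f"Score per dollar vs. W9000: {value_gain:.3f}X")
```

With identical list prices the price term drops out, so the value gain equals the raw score gain.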

What do we think?

So at this point, you might still be thinking, “great product, but the price is just plain ridiculous. No one is paying that much for a graphics card.” And given where the prices for gaming-focused GPUs are these days, you’d be justified in that reaction. Well, I’m surely not going to be in the market for something like the W9100, and no consumer or corporate buyer will either. For that matter, even the vast majority of workstation buyers (up to 99%) won’t be giving this card serious consideration. But there are absolutely customers for it, and at the price, even modest volume can make for very appealing revenue on the back of hefty gross margins.

Who’s buying? Well, that other 1%, for whom a card’s price is a non-issue. And I don’t mean a non-issue in a figurative or hyperbolic sense, but the literal one. For many of those 1% applications, the cost of the hardware is meaningless in the big picture, less than a drop in the bucket when it comes to the scope of the project the hardware is serving. Within the conventional workstation market, we’re talking the highest demand segments of oil and gas exploration, real-time Hollywood-caliber DME, with a tad of CAD and medical applications tossed in. Consider the costs involved in drilling for oil in the wrong place, or delaying the delivery date of a new Boeing aircraft. When talking potential risks and rewards in the tens to hundreds of millions of dollars, who is going to care about a few extra thousand in cost outfitting each of a few (or more) graphics workstations?

For that matter, those in the other 99% would most likely not own (or be in the market for) a workstation that could even house such a card in the first place. Remember your system needs that auxiliary power and PCI Express slots that can accommodate a long, dual-slot card. The W9100 wouldn’t fit my default graphics card test bench, a compact desktop (but otherwise high end and dual-socket) Lenovo ThinkStation C30. I had to go back to my standby, max-capacity, full tower HP Z800 to perform the benchmarking.

Furthermore, this card (or a derivative with the video I/O stripped out) will no doubt end up being marketed to the HPC and supercomputer crowd as well. At 2.62 TFLOPS under OpenCL, and 50% of single-precision throughput, the W9100’s double-precision rate is one of the card’s most impressive traits, yet it’s virtually meaningless in conventional graphics markets. It’s also not easy to get that rate up, as it consumes a non-trivial amount of silicon. No, that double-precision rate was consciously designed for GPGPU applications, and not only in workstations but in server-side HPC and supercomputing markets (also precisely the markets that really care that a card won’t exceed 275 Watts).

How does it compare to the offerings of Nvidia, the company that’s currently commanding the bulk of the volume in this segment? Well, let’s assume the street price for the W9100 ends up being closer to $3,400 – $3,500 in short order—an assumption that appears safe, given both the current street price of its predecessor and the general pricing dynamics of the marketplace. Based on the relative street prices and other submitted benchmarks at SPEC.org, the W9100’s Viewperf 12 performance appears to sit sensibly between two of rival Nvidia’s offerings: slower than the more expensive K6000 (street price around $4,900) but faster than the less expensive Quadro K5000 (street price around $1,850).

The bottom line is this: if price is one of your major criteria, this isn’t a card to consider, regardless. But if you’re a high-demand user dealing with large, complex datasets, supporting projects with huge dollars in the balance, you’ll want to take a long hard look at the FirePro W9100.

Alex Herrera is a senior analyst for Jon Peddie Research.