Lucid takes GPU virtualization to the cloud and beyond

Intelligent real-time graphics flow can synchronize CPU and GPU output.

By Jon Peddie

Founded in 2004, Lucid came out at CES 2011, where it formally announced its GPU virtualization technology. It then demonstrated the technology at CeBIT in March 2011, and in April ODMs announced Virtu licenses (including Intel, Asus, Gigabyte, MSI, ASRock, Foxconn, Sapphire, and more). Today the company boasts over 60 motherboard design wins and more than 4 million motherboards shipped with Virtu licenses.

Virtu dynamically assigns graphics processing tasks to the best available graphics resource, and only turns on the power-hungry discrete GPU when needed. It operates in much the same general way as AMD's Switchable Graphics or Nvidia's Optimus, but Lucid's approach is not GPU-specific and can work with any combination of suppliers and generations.

Graphics pipeline redundancies  

In addition to resolving the disparity among GPUs and getting them to work together, Lucid has also developed an intelligent real-time graphics pipeline flow that balances visual quality, frame rate, and interaction responsiveness, the factors that affect game immersion. The idea is to improve gaming performance, responsiveness, and visual quality (on notebook or desktop) by intelligently predicting redundant rendering tasks and accurately removing or replacing them.

3D games are widely accepted as the most demanding applications for both the CPU and the GPU (and, in some cases, the memory subsystem as well), and the demand for ever more performance never lets up. Both CPUs and GPUs keep improving annually; they have to.

However, a PC is always connected to a display (CRT, LCD, plasma, built-in panel, and so on) to show the images. While CPUs and GPUs offer better performance and new capabilities every year, the display's refresh rate has stayed fairly flat at 60 Hz (some reach 75 Hz) for many years, with recent S3D displays at 120 Hz. So even if a GPU could render an image at 100 fps (in mono vision), 40% of that rendering would be thrown away because the display simply couldn't accept it.
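As a back-of-the-envelope check, a short calculation (a sketch built only from the figures quoted above) shows how much rendering a 60 Hz panel simply discards:

    # Fraction of rendered frames a fixed-refresh display can never show
    render_fps = 100.0   # GPU render rate from the example above
    refresh_hz = 60.0    # typical panel refresh rate

    wasted = max(0.0, (render_fps - refresh_hz) / render_fps)
    print(f"{wasted:.0%} of rendered frames are never displayed")   # -> 40%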

Lucid saw this disconnect as an opportunity to exploit its proprietary technologies, HyperFormance and Virtual Vsync. Consider the following scenario. You have a GPU that can deliver 87 frames per second in some 3D game. If the game's VSync option is turned off, the frame rate will run as high as 87 fps. Mouse and keyboard responsiveness will be on the order of 1-2 frames, or roughly 11.5-23 milliseconds, so the game feels as if you are playing at 42-85 fps. And yet the images are not in sync with the refresh rate of the display. So what do you expect to see? The GPU is rendering 45% faster than the 60 Hz refresh rate, which means a new image from the GPU will be ready before the display has finished drawing the current frame.
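Working those numbers through (again just a sketch using the article's figures):

    # Frame time and responsiveness window for the 87 fps example above
    refresh_hz = 60.0
    render_fps = 87.0

    frame_time_ms = 1000.0 / render_fps                  # ~11.5 ms per rendered frame
    lag_window_ms = (frame_time_ms, 2 * frame_time_ms)   # 1-2 frame responsiveness window
    overshoot = render_fps / refresh_hz - 1.0            # how far the GPU outruns the panel

    print(f"frame time: {frame_time_ms:.1f} ms")
    print(f"responsiveness window: {lag_window_ms[0]:.1f}-{lag_window_ms[1]:.1f} ms")
    print(f"GPU outruns the 60 Hz refresh by {overshoot:.0%}")   # -> 45%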

Enter XLR8

Having won the PC market and proven the concept of eliminating graphics pipeline redundancies among the CPU, GPUs, and display to improve gaming responsiveness and visual quality, Lucid is now introducing XLR8. The XLR8 technology promises to improve graphics performance in general (OpenGL, CAD/CAM), and gaming in particular, on PCs, phones, and tablets by intelligently managing display synchronization and GPU performance.

That tears it

When more frames are sent to the display than the display can show and VSync is off, part of one image (frame one) can be partially overwritten with part of frame two. That artifact is known as tearing, and it is illustrated by the two Neytiri images below.

Tearing occurs when the graphics system generates images faster than the display can show them. (Source: Lucid)

Lucid sees the extra portion of a frame as a waste of GPU rendering time and keyboard/mouse responsiveness. VSync, or vertical synchronization, prevents the GPU from updating the screen image until the display has refreshed. Other techniques, such as triple buffering, are used by various vendors to enable higher frame rates with no tearing; while triple buffering reduces tearing, it carries inherent latency.
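To make the tradeoff concrete, here is a small toy model (our sketch, not Lucid's code) comparing swapping the front buffer immediately with holding the swap for the next refresh; the 95% active-scanout figure is an assumption:

    # Toy model: a GPU finishing frames at 87 fps feeding a 60 Hz panel.
    # A buffer swap during active scanout can show up as a tear; holding the
    # swap for the next refresh (VSync) trades the tear for added latency.
    REFRESH_HZ, RENDER_FPS, ACTIVE = 60.0, 87.0, 0.95   # assume ~95% of each period is active scanout
    period = 1.0 / REFRESH_HZ

    frame_times = [i / RENDER_FPS for i in range(int(RENDER_FPS))]    # one second of finished frames

    tears = sum(1 for t in frame_times if (t % period) < ACTIVE * period)
    waits_ms = [1000 * (period - (t % period)) for t in frame_times]  # VSync: hold until next refresh

    print(f"VSync off: {tears} of {len(frame_times)} swaps land mid-scanout (potential tears)")
    print(f"VSync on : frames wait up to {max(waits_ms):.1f} ms for the next refresh")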

There is a complementary scenario in which the GPU can't quite deliver 60 fps, and we see what is called a dropped frame (actually a frame freeze, as the display waits for the next frame). The challenge, therefore, is detecting, predicting, removing, and replacing redundant rendering tasks to enable the best visual quality, with cleaner and smoother frames, and peak responsiveness. That means providing a more immersive experience that can be both seen and felt.

Comparing CPU and GPU data flow. (Source: Lucid)

‘B’ (in the diagram above) alone may help visual quality and tearing, while ‘A’ will better utilize the GPU by keeping it on only the "important" tasks and freeing the CPU sooner. Clearly ‘B’ affects ‘A’, and the ultimate solution is to implement ‘A’ and ‘B’ together. Yet even when redundant rendering tasks are well managed, the gains diminish greatly because the GPU is also snooping, displaying, and synchronizing to the display instead of working on a more suitable task. Lucid thinks this can be resolved in hybrid systems.

Consider a system with two (or more) graphics processors in which the integrated GPU (iGPU) is connected to the screen. The last part of the pipeline is responsible for display tasks (front/back frame buffering). The data flow can be improved if that last part of the graphics pipeline is managed by the iGPU, which is connected directly to the display. In that configuration the discrete GPU (dGPU) is independent of the display refresh rate and does not deal with the last part of the pipeline or the display. The dGPU is therefore free to work on more suitable tasks while the iGPU handles the last part of the pipeline and the display refresh rate. And yes, the two GPUs need to talk to each other.
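A hedged sketch of that division of labor (the thread names and single-slot "mailbox" are illustrative, not Lucid's API): one thread stands in for the dGPU producing frames at its own pace, while the other stands in for the iGPU picking up whatever frame is newest at each 60 Hz refresh.

    import threading, time

    # Illustrative only: the "dGPU" thread produces frames at its own pace; the
    # "iGPU" loop presents the newest one at the display's 60 Hz refresh.
    # Neither side waits on the other, which is the point of the decoupling.
    latest_frame = None
    lock = threading.Lock()
    stop = threading.Event()

    def dgpu_render_loop(render_fps=87.0):
        global latest_frame
        frame_id = 0
        while not stop.is_set():
            time.sleep(1.0 / render_fps)      # stand-in for actual rendering work
            frame_id += 1
            with lock:
                latest_frame = frame_id       # overwrite: stale frames are simply dropped

    def igpu_present_loop(refresh_hz=60.0, seconds=1.0):
        for _ in range(int(refresh_hz * seconds)):
            time.sleep(1.0 / refresh_hz)      # wait for the next refresh
            with lock:
                shown = latest_frame
            print(f"present frame {shown}")   # the iGPU scans out the newest complete frame

    threading.Thread(target=dgpu_render_loop, daemon=True).start()
    igpu_present_loop()
    stop.set()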

Suffice it to say, for now, that Lucid has come up with a way to manage this dual data flow, snooping, and integration, which it calls Virtual Vsync (removal of image tearing artifacts) and HyperFormance (leveraging the dual pipelines of an iGPU and a dGPU).

Neytiri, straight to the cloud with VGWare

If a modern, powerful GPU could run a game at 80, 90, even 200 FPS, wouldn’t it be interesting if you could serve multiple users different games (or even CAD apps) at 60 FPS from a single GPU? And if you could, a logical place for such a game server would be the cloud—or, maybe, a game parlor in China.

Lucid says its VGWare offers such an approach to cloud media processing and 3D gaming. It offers the potential of optimizing infrastructure cost for service providers, enables running multiple games over a cluster of GPUs, and should improve capacity utilization and performance. VGWare, Lucid claims, efficiently load-balances the maximum number of games per graphics processing unit.

Lucid says VGWare will offer low latency (less than 2 frames over WLAN), optimal rendering and responsiveness, and support for all DX9/10/11 games, and will run high-end PC games remotely.

Lucid claims VGWare will enable multiple concurrent games on a single GPU, scale to multiple games and applications across multiple GPUs, and offer optimal resource utilization with reduced latency.
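How that packing might look in principle (a minimal, hypothetical sketch; the session names, per-game costs, and greedy policy are our assumptions, not VGWare's actual scheduler): assign each incoming session to the GPU with the most headroom.

    # Minimal, hypothetical load balancer: pack game sessions onto the GPU with
    # the most remaining capacity.  Costs are made-up fractions of one GPU.
    def assign_sessions(sessions, gpu_count):
        load = [0.0] * gpu_count                    # fraction of each GPU in use
        placement = {}
        for name, cost in sessions:
            gpu = min(range(gpu_count), key=lambda g: load[g])   # most headroom
            if load[gpu] + cost > 1.0:
                raise RuntimeError(f"no capacity left for {name}")
            load[gpu] += cost
            placement[name] = gpu
        return placement, load

    sessions = [("game-A", 0.30), ("game-B", 0.25), ("cad-app", 0.40), ("game-C", 0.20)]
    placement, load = assign_sessions(sessions, gpu_count=2)
    print(placement)   # {'game-A': 0, 'game-B': 1, 'cad-app': 1, 'game-C': 0}
    print(load)        # per-GPU utilization after placement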

What do we think?

Lucid has been doing clever things with data-flow management since it started, and this latest wave of technologies demonstrates its understanding of the graphics pipeline and how to get the most out of it. The AIB suppliers aren't going to like multiple games and apps sharing a single board in the cloud somewhere, but it shows how much headroom is available until game developers really start to make use of the GPUs' power. So why not exploit it until the game developers wake up?