Three years ago, we had maybe six AI accelerators; today there are over two dozen, and more are coming.
One of the first commercially available AI training accelerators was the GPU, and the undisputed leader of that segment was Nvidia. Nvidia was already preeminent in machine learning (ML) and deep learning (DL) applications, and adding neural-net acceleration was a logical and rather straightforward step for the company.
Nvidia also brought a treasure trove of applications to its GPUs through the company's proprietary development language, CUDA. The company introduced CUDA in 2006 and empowered hundreds of universities to offer courses on it. As a result, the thousands of computer science graduates coming out of school every year knew how to use CUDA and wanted to. And they did, producing hundreds of applications based on CUDA.
With the explosive interest in AI, Nvidia extended the reach of CUDA by developing NGC (Nvidia GPU-accelerated Containers), with containers for AI software such as TensorFlow, PyTorch, MXNet, TensorRT, RAPIDS, and others.
But CUDA remains a closed, proprietary language that runs efficiently only on Nvidia GPUs. There are translation efforts, such as AMD's Boltzmann Initiative and its HIP (Heterogeneous-compute Interface for Portability), for porting CUDA source code to a common C++ programming model.
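To make the porting claim concrete, here is an illustrative sketch (ours, not from any vendor's documentation) of how little a simple CUDA kernel changes under HIP: the kernel body is untouched, and HIP's hipify tools automate most of the runtime-API renaming. It assumes a CUDA or HIP toolchain and a GPU, so it is a sketch rather than a tested program.

```cuda
// CUDA version (illustrative sketch):
__global__ void scale(float *x, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}
// Host side, CUDA:
//   cudaMalloc(&d_x, n * sizeof(float));
//   cudaMemcpy(d_x, h_x, n * sizeof(float), cudaMemcpyHostToDevice);
//   scale<<<blocks, threads>>>(d_x, 2.0f, n);
//
// HIP port: the __global__ kernel above compiles unchanged; only the
// runtime calls are renamed (cudaMalloc -> hipMalloc, cudaMemcpy ->
// hipMemcpy, and so on), a rename the hipify tools largely automate:
//   #include <hip/hip_runtime.h>
//   hipMalloc(&d_x, n * sizeof(float));
//   hipMemcpy(d_x, h_x, n * sizeof(float), hipMemcpyHostToDevice);
//   hipLaunchKernelGGL(scale, blocks, threads, 0, 0, d_x, 2.0f, n);
```

The same source then compiles for AMD hardware with AMD's HIP compiler, which is the "common C++ programming model" the Boltzmann Initiative aims at.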
Users prefer open software if they can get it; it's one of the reasons Linux is so popular. Open software allows more choices of processors, tools, and libraries. Khronos has been developing SPIR to provide a platform- and compiler-independent intermediate representation, and Intel has developed a related compiler of its own, ISPC (the Intel SPMD Program Compiler).
Edinburgh-based Codeplay has also contributed to the TensorFlow stack, enabling it with SYCL and allowing AI programs to run on any OpenCL-enabled CPU, GPU, custom AI accelerator, DSP, or FPGA. Intel has likewise adopted this approach, recently announcing OneAPI across its processors (GPU, AI, and FPGA) with SYCL inside.
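What "single-source" SYCL looks like in practice is worth a glance. The sketch below is a minimal SYCL 1.2.1-style vector add of our own devising; it requires a SYCL implementation (such as Codeplay's ComputeCpp or Intel's DPC++) rather than a plain C++ compiler, and the same C++ source then targets any OpenCL-enabled CPU, GPU, DSP, or FPGA.

```cpp
// Minimal SYCL sketch (assumes a SYCL implementation is installed;
// not compilable with a bare C++ toolchain).
#include <CL/sycl.hpp>
#include <vector>

int main() {
    std::vector<float> a(1024, 1.0f), b(1024, 2.0f), c(1024);
    cl::sycl::queue q;  // selects a default OpenCL device
    {
        // Buffers wrap host memory for the device
        cl::sycl::buffer<float, 1> A(a.data(), cl::sycl::range<1>(a.size()));
        cl::sycl::buffer<float, 1> B(b.data(), cl::sycl::range<1>(b.size()));
        cl::sycl::buffer<float, 1> C(c.data(), cl::sycl::range<1>(c.size()));
        q.submit([&](cl::sycl::handler &h) {
            auto ra = A.get_access<cl::sycl::access::mode::read>(h);
            auto rb = B.get_access<cl::sycl::access::mode::read>(h);
            auto wc = C.get_access<cl::sycl::access::mode::write>(h);
            // Kernel and host code live in the same C++ source file
            h.parallel_for<class vadd>(cl::sycl::range<1>(1024),
                [=](cl::sycl::id<1> i) { wc[i] = ra[i] + rb[i]; });
        });
    }  // buffers destruct here, copying results back into c
    return 0;
}
```

The point of the sketch is the portability story: nothing in it names a vendor, so the runtime can dispatch the kernel to whatever OpenCL device is present.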
Codeplay recently commissioned the Linley Group to write a white paper on it, and you can find it here.
Codeplay makes extensive use of SYCL, Khronos' C++ single-source heterogeneous programming language for OpenCL (which we have discussed in TechWatch several times). The company offers a fully supported and optimized version of OpenCL as part of its ComputeAorta toolkit. The product also supports SPIR-V and Vulkan, and runs on a variety of platforms, including Android, Linux, and Windows, on Arm, MIPS, and x86 hardware.
The toolkit is modular, so customers can build in only the standards that they need. It’s built to use the LLVM compiler, so the code is easily ported to new architectures. Customers can license the source code to perform their own ports, or Codeplay can handle any porting and customization, simplifying the customer’s development.
In their white paper, Linley concludes:
Competing vendors have found it difficult to match Nvidia’s software. Lacking CUDA, they have typically tried to hand-code a basic set of operations for TensorFlow and other popular AI frameworks, but this approach leaves 90% of the functions unsupported. Because of this limited support, customer applications written for Nvidia often fail to compile on other hardware, or the performance is far slower than anticipated.
An open software ecosystem addresses these shortcomings. Instead of each vendor having to start from scratch, they can adopt open standards that provide access to existing code. For example, SYCL provides an open alternative to CUDA that already has broad industry support. Whereas single-vendor efforts support only a few dozen TensorFlow operations, SYCL supports more than 400, greatly improving compatibility and enabling customers to innovate on non-Nvidia platforms. To further ease software development, Codeplay can assist accelerator vendors as they shift to these open standards, providing fully supported and optimized implementations of SYCL and OpenCL and helping to port them to new platforms.
What do we think?
A proprietary system can seldom match the sheer weight of a well-supported open system, with its multitude of developers from all types of industries and organizations. However, as clever as Codeplay is, and they are damn clever, and as robust as Khronos' programs and libraries are, neither is a match for Nvidia's installed base or amazing marketing machine. We don't see a stampede of CUDA users swapping out their CUDA code for OpenCL or any of its derivatives, but we can see organizations giving new projects a second thought. CUDA will hold a strong position in AI and other ML applications for the next decade, and because Nvidia continues to invest in it, it remains a very effective competitive vehicle.

Nonetheless, we do expect Nvidia to lose some market share to more open software and open-source alternatives; it's a natural occurrence, and one Nvidia is well aware of. However, open software runs just as well on Nvidia GPUs as on anyone else's, and we'd expect Nvidia to develop compilers and drivers that make it run a little better there than elsewhere. Codeplay and Khronos will be partners to Nvidia, not adversaries.