Nvidia’s latest research hits a new 3D creation note

Tunes up the process with an inverse rendering pipeline.

Nvidia Research is striking a chord with its work on an inverse rendering pipeline. (Source: Nvidia)

Nvidia is certainly getting into the swing of things at the Conference on Computer Vision and Pattern Recognition in New Orleans this week. At the conference, Nvidia Research is presenting a paper on what it calls 3D MoMa, an inverse rendering pipeline that turns up the volume on the 3D creation process, enabling creators to quickly generate and improvise high-quality 3D objects and scenes from 2D still images.

Indeed, creating 3D models from 2D imagery is nothing new, but existing methods, photogrammetry among them, are extremely time-consuming and require a lot of manual input. 3D MoMa, however, could make that process much faster and simpler. Although the technology is still in the research phase, Nvidia is eager to show it off. As they say, seeing is believing. So, what better way to accompany a paper on a new technology than with a snappy video that not only illustrates 3D MoMa's capabilities but, in the process, also pays homage to the conference's host city, New Orleans, with a jazz theme?

“By formulating every piece of the inverse rendering problem as a GPU-accelerated differentiable component, the Nvidia 3D MoMa rendering pipeline uses the machinery of modern AI and the raw computational horsepower of Nvidia GPUs to quickly produce 3D objects that creators can import, edit, and extend without limitation in existing tools,” said David Luebke, vice president of graphics research at Nvidia.
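Luebke's phrase "differentiable component" is the key idea: when every stage of rendering can be differentiated, scene properties can be recovered from photos by ordinary gradient descent. The toy PyTorch sketch below illustrates the principle on a deliberately simple case, recovering a material color and a light direction from shaded pixels. It is only a conceptual illustration, not Nvidia's code; 3D MoMa itself jointly recovers shape, materials, and lighting with GPU-accelerated differentiable rendering.

```python
# Toy inverse rendering: because the renderer below is differentiable,
# PyTorch can backpropagate a photometric loss through it and recover
# the unknown scene parameters. Conceptual sketch only, not 3D MoMa.
import torch

torch.manual_seed(0)

# Stand-in geometry: per-pixel surface normals (a real pipeline would
# rasterize a triangle mesh here).
normals = torch.nn.functional.normalize(torch.randn(1000, 3), dim=1)

# Hidden ground-truth parameters we pretend a camera observed.
true_albedo = torch.tensor([0.8, 0.3, 0.1])  # RGB material color
true_light = torch.nn.functional.normalize(torch.tensor([0.5, 0.7, 0.5]), dim=0)

def render(albedo, light_dir):
    # Differentiable Lambertian shading: color = albedo * max(n . l, 0)
    lambert = (normals @ light_dir).clamp(min=0.0).unsqueeze(1)
    return albedo * lambert

observed = render(true_albedo, true_light)  # the "photograph"

# Unknowns to recover, initialized arbitrarily.
albedo = torch.full((3,), 0.5, requires_grad=True)
light = torch.tensor([0.0, 1.0, 0.0], requires_grad=True)

opt = torch.optim.Adam([albedo, light], lr=0.05)
for step in range(500):
    opt.zero_grad()
    pred = render(albedo, torch.nn.functional.normalize(light, dim=0))
    loss = torch.mean((pred - observed) ** 2)  # photometric loss
    loss.backward()                            # gradients flow through the renderer
    opt.step()

print("recovered albedo:", albedo.detach())    # converges toward [0.8, 0.3, 0.1]
```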

The first step is creating a 3D model from 2D source images, and the result needs to be compatible with 3D game engines and popular 3D modeling programs. That means a triangle mesh format that is easily editable. While recent work on neural radiance fields can also quickly generate a 3D representation of an object or scene, it does not produce a triangle mesh that can be edited easily. 3D MoMa, however, uses an Nvidia Tensor Core GPU to generate the necessary mesh model in about an hour.
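The distinction matters because a triangle mesh is just explicit arrays of vertices and faces that standard tools can operate on directly, whereas a radiance field's scene is locked inside network weights. As a hedged sketch of what that editability looks like, using the open-source trimesh library and a hypothetical file name for an exported model:

```python
# A triangle mesh is plain data: vertices, faces, and materials that any
# standard 3D tool can read and modify. "trumpet.obj" is a hypothetical
# export from a mesh-producing pipeline, not an actual Nvidia asset.
import trimesh

mesh = trimesh.load("trumpet.obj", force="mesh")
print(mesh.vertices.shape, mesh.faces.shape)  # explicit, editable geometry

mesh.apply_scale(2.0)                         # editing is an array operation
mesh.export("trumpet_scaled.glb")             # hand off to an engine or DCC tool
```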

The 3D objects need to have triangle meshes that are easily edited. (Source: Nvidia)

The resulting 3D model can then be imported into a 3D graphics engine or popular 3D modeling software, where creators can fine-tune it for their desired project by adjusting the object's scale, swapping its material, and experimenting with lighting effects.

For the video demo, the Nvidia team gathered close to 100 2D images, captured from various angles, of five jazz instruments: a trumpet, trombone, saxophone, drum set, and clarinet. 3D MoMa reconstructed the images into 3D meshes, one per instrument. The instruments were then extracted from their original scenes and imported into Nvidia Omniverse for editing. Nvidia points out that in any traditional graphics engine, the material of a shape generated by 3D MoMa can easily be swapped for a different one. The video highlights this capability by swapping the trumpet model's original plastic material for gold, marble, cork, and then wood.
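The swap is possible because the pipeline separates a shape's geometry from its material parameters. In standard physically based rendering terms, turning plastic into gold is largely a matter of raising the metallic value and picking a gold base color. A sketch of that idea with trimesh, using illustrative values and file names rather than anything from Nvidia's demo:

```python
# Swap a mesh's material by assigning new PBR parameters. The values
# below are illustrative: a rough approximation of polished gold.
import trimesh
from trimesh.visual import TextureVisuals
from trimesh.visual.material import PBRMaterial

mesh = trimesh.load("trumpet.obj", force="mesh")  # hypothetical export

gold = PBRMaterial(
    baseColorFactor=[1.0, 0.77, 0.34, 1.0],  # warm yellow base color
    metallicFactor=1.0,                      # fully metallic
    roughnessFactor=0.2,                     # polished surface
)
mesh.visual = TextureVisuals(material=gold)
mesh.export("trumpet_gold.glb")              # same shape, new material
```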

The video also highlights how the 3D models react to light. The edited objects were placed inside a Cornell box, a commonly used test scene for gauging rendering accuracy. The instruments handled the test soundly, with the brass objects reflecting light and the matte drums absorbing it appropriately.
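For readers unfamiliar with it, a Cornell box is a simple enclosed scene with known wall colors, classically a red wall, a green wall, and white surfaces, which makes errors in light transport easy to spot. Below is a minimal sketch of assembling such a test scene around an edited model with trimesh, again with hypothetical file names; judging the lighting response would then require rendering the scene with a path tracer.

```python
# Assemble a Cornell-box-style test scene: colored walls with known
# reflectance, plus the object under test. File names are hypothetical.
import trimesh

scene = trimesh.Scene()

def wall(extents, position, rgba):
    w = trimesh.creation.box(extents=extents)
    w.apply_translation(position)
    w.visual.face_colors = rgba  # known, flat wall color
    return w

scene.add_geometry(wall([2, 2, 0.01], [0, 0, -1], [255, 255, 255, 255]))  # back: white
scene.add_geometry(wall([0.01, 2, 2], [-1, 0, 0], [255, 0, 0, 255]))      # left: red
scene.add_geometry(wall([0.01, 2, 2], [1, 0, 0], [0, 255, 0, 255]))       # right: green

instrument = trimesh.load("trumpet_gold.glb", force="mesh")  # edited model
scene.add_geometry(instrument)
scene.export("cornell_test.glb")  # render with a path tracer to judge lighting
```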