Docomo and Arcturus team up to bring volumetric video to 5G mobile customers in Japan

5G brings new possibilities for storytelling. Arcturus and Docomo are building a CDN for volumetric streaming, so that the audience can become part of the story.

Last week, Japanese telecom giant NTT Docomo held its annual open house to introduce its current and future technology and activities. In conjunction with this event, Docomo and US/Canadian startup Arcturus Studio announced a collaboration to stream high-quality volumetric videos of any length directly to customers on Docomo’s 5G mobile network. Volumetric videos are characterized by large and complex files that may include detailed 3D geometry and multiple camera angles. As a result, volumetric video has not been practical for anything beyond short clips. Now, however, Docomo is leveraging its 5G network along with Arcturus’ delivery platform, which includes its volumetric video content editing tool, HoloSuite.

HoloSuite consists of two components, HoloEdit and HoloStream. HoloEdit is a collection of editing, compression, and playback tools that take advantage of cloud-based computing to process volumetric video footage. Once the footage is processed, HoloStream is used to publish and stream the volumetric video to desktop and mobile devices over the internet. It includes adaptive bitrate encoding to deliver the highest visual quality the connection allows, whether the viewer is on a 4G or a 5G mobile network.

Image: a 3D character boxing, shown in the software interface. HoloSuite includes a plugin for Autodesk Maya that enables artists to make changes to physical performances in post-production (Source: Arcturus Studios)

If you’re new to volumetric video, here’s a summary, including what problems the Arcturus platform is addressing:

Volumetric Video Primer

Volumetric video is a technique for recording, editing, and distributing a performance in 3D space. What makes volumetric video different from traditional 2D video is that the viewer can change their viewing angle or position within this 3D space at any time during playback. Typically, volumetric footage is shot on a capture stage (see Microsoft’s Mixed Reality Capture Studio). A capture stage is a cylindrical area ringed by many cameras and 3D sensors, positioned uniformly to capture a performance from a full 360 degrees. Then, for every video frame, an extraction process fuses the footage from all of these devices into a single point cloud. Further computation transforms these point clouds into polygonal meshes coupled with their associated texture files.
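To make the fusion step concrete, here is a minimal sketch of how points seen by several cameras could be merged into one cloud for a single frame. It assumes each camera's pose is already known as a rotation matrix and a translation; the function names `to_world` and `fuse` and the data layout are illustrative only and do not come from any capture vendor's software. Real pipelines also filter noise and reconcile overlapping points, which this sketch skips.

```python
def to_world(point, rotation, translation):
    """Map one (x, y, z) point from camera space to stage space: R * p + t."""
    return tuple(
        sum(rotation[r][c] * point[c] for c in range(3)) + translation[r]
        for r in range(3)
    )


def fuse(captures):
    """Merge per-camera points into a single point cloud for one frame.

    captures: list of (points, rotation, translation) tuples, one per camera.
    The pose values are assumed to come from a prior calibration step.
    """
    cloud = []
    for points, rotation, translation in captures:
        cloud.extend(to_world(p, rotation, translation) for p in points)
    return cloud
```

Each camera contributes its points after they are moved into a shared coordinate system, which is what lets the later meshing step treat the frame as one object.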

Editing and encoding

A post-production team works to reduce, edit, and encode the mesh and texture files. During this process, an artist may use familiar post-production pipeline tools, such as Autodesk Maya, Nuke from The Foundry, and Side Effects Houdini. Out of the box, these tools are not well suited to editing such large geometry meshes and textures. They force the artist to work frame by frame, and they lack the intelligence necessary to perform advanced compression techniques that consider both spatial and temporal redundancies. The result is more processing time and unnecessarily large video files. Arcturus wrestled with these problems during its early days as a volumetric production studio. It began developing HoloEdit to solve them and later pivoted to become a technology vendor so that other studios could take advantage of its technology. Instead of requiring time-consuming and painstaking frame-by-frame edits, HoloEdit has the intelligence to edit geometry over multiple frames. It can also identify redundancies in both time and space within a sequence to achieve much higher compression ratios than static polygon reduction.
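The idea of exploiting temporal redundancy can be illustrated with a small sketch. This is not HoloEdit's algorithm, which the article does not describe; it is a generic keyframe-plus-delta scheme, and it assumes the frames already share the same vertex count and ordering and that positions are quantized to integers. The names `encode_sequence` and `decode_sequence` are hypothetical.

```python
def encode_sequence(frames):
    """Store the first frame in full, then only the vertices that moved.

    frames: list of frames, each a list of (x, y, z) integer tuples with
    identical vertex count and ordering (an assumption of this sketch).
    Returns (keyframe, deltas), where each delta maps vertex index -> offset.
    """
    key = list(frames[0])
    deltas = []
    prev = key
    for frame in frames[1:]:
        changed = {
            i: (v[0] - p[0], v[1] - p[1], v[2] - p[2])
            for i, (p, v) in enumerate(zip(prev, frame))
            if v != p
        }
        deltas.append(changed)
        prev = frame
    return key, deltas


def decode_sequence(key, deltas):
    """Rebuild every frame by applying each delta to the previous frame."""
    frames = [list(key)]
    for changed in deltas:
        frame = list(frames[-1])
        for i, (dx, dy, dz) in changed.items():
            x, y, z = frame[i]
            frame[i] = (x + dx, y + dy, z + dz)
        frames.append(frame)
    return frames
```

When most of a performer's mesh barely changes between frames, the deltas stay small, which is the kind of saving a frame-by-frame workflow cannot capture.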

Playback

Unlike conventional video streaming of pixels, volumetric videos are textured, lit polygons rendered within a web browser or game engine. They can be viewed on traditional computer monitors as well as AR and VR headsets. Most volumetric viewing applications today require a full download of the video before viewing it. This approach is not ideal because, given the large file sizes, the download time and storage required on the device can hurt the user experience. Instead, Arcturus developed the HoloStream platform to stream volumetric videos to a device using adaptive bitrate encoding that accounts for variations in network bandwidth. These videos are also cached close to the viewer using a CDN (content delivery network) to reduce latency further. Together, adaptive streaming and CDN caching enable a much better viewing experience.
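As a rough illustration of adaptive bitrate selection in general, the sketch below picks the highest-quality version of a stream that fits within the measured bandwidth. The quality ladder, its bitrates, the 0.8 safety margin, and the name `pick_rendition` are all assumptions made for the example; the article does not describe how HoloStream makes this decision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    name: str
    bitrate_kbps: int  # hypothetical combined mesh + texture bitrate


# Hypothetical quality ladder; real values depend on the content.
LADDER = [
    Rendition("low", 4_000),
    Rendition("medium", 12_000),
    Rendition("high", 40_000),
]


def pick_rendition(throughput_kbps, ladder=LADDER, safety=0.8):
    """Return the highest-bitrate rendition that fits the bandwidth budget.

    Only a fraction (safety) of the measured throughput is budgeted so that
    brief dips do not stall playback. Falls back to the lowest rendition.
    """
    budget = throughput_kbps * safety
    ordered = sorted(ladder, key=lambda r: r.bitrate_kbps)
    chosen = ordered[0]
    for rendition in ordered:
        if rendition.bitrate_kbps <= budget:
            chosen = rendition
    return chosen
```

A player would typically re-run a check like this for each segment of the video, stepping quality up on a fast 5G connection and down when the measured bandwidth drops.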

Click on this image, which links to Arcturus’ webpage, to view and interact with volumetric videos in your web browser (Source: Arcturus)

What do we think?

In the coming spatial era, volumetric video has strong potential to replace 2D video streaming entirely. Producers and publishers from a broad spectrum of industries, including social media, advertising, concerts, games, film, and e-commerce, can take advantage of this next evolution in video formats to deliver 3D content to their audiences, including live-streamed performances. Volumetric streaming is also a compelling use case for 5G networks, so it is no surprise that telecoms like Docomo are leaning in. As of March last year, Docomo had already launched 5G services in 29 of Japan’s 47 prefectures. Arcturus, for its part, has worked in volumetric video since 2016, proving its technology in production before developing HoloSuite. Docomo and Arcturus are therefore well suited to bring volumetric streaming to 5G mobile customers.

With the rollout of 5G underway, the next big challenge for volumetric video is generating enough content at a reasonable cost. Volumetric video production is expensive, averaging around US$10,000 to process one day of shooting. Volumetric capture studios are also scarce. At the end of last year, there were just 55 volumetric capture studios in 34 cities worldwide. This dearth of capture studios is not unlike the situation before Sony released its first consumer camcorder in 1983. Initially designed for television studios, video cameras were large and heavy. But over time, they became more compact and eventually made their way into the hands of consumers. With 3D capture technologies such as lidar making their way into phones and tablets, it is only a matter of time before capturing and producing a volumetric video becomes affordable.

Alex Kelley is a Senior Analyst at Jon Peddie Research and Vice-President at Garibaldi Capital Advisors (GCA), an investment bank to technology companies with offices in the USA and Canada. Alex specializes in Japan and Asia-Pacific management consulting to companies working in 3D computer graphics.