This week Intel’s research division has been showcasing new technologies at the IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR). Intel Labs, in partnership with Blockade Labs, has launched a diffusion model called Latent Diffusion Model for 3D (LDM3D), a generative artificial intelligence (AI) model designed to create realistic 3D visual content from text prompts.
“This research paper proposes a Latent Diffusion Model for 3D (LDM3D) that generates both image and depth map data from a given text prompt, allowing users to generate RGBD images from text prompts.”
LDM3D is the first model to generate a depth map using the diffusion process itself, producing vivid and immersive 3D images with a complete 360-degree view. Its potential uses span a variety of industries, including gaming, entertainment, architecture, and design, and it is poised to dramatically change the landscape of content creation and digital experiences.
“Generative AI technology aims to further augment and enhance human creativity and save time. However, most of today’s generative AI models are limited to generating 2D images and only very few can generate 3D images from text prompts.
“Unlike existing latent stable diffusion models, LDM3D allows users to generate an image and a depth map from a given text prompt using almost the same number of parameters. It provides more accurate relative depth for each pixel in an image compared to standard post-processing methods for depth estimation and saves developers significant time to develop scenes,” said Vasudev Lal, AI/ML research scientist, Intel Labs.
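For readers who want to experiment, below is a minimal sketch of text-to-RGBD generation. It assumes the Hugging Face diffusers integration and the public Intel/ldm3d checkpoint; the pipeline class, checkpoint ID, and output fields are based on Intel’s public release rather than details from this article.

```python
# Minimal sketch: text -> RGB image + depth map with LDM3D.
# Assumes the Hugging Face `diffusers` integration and the public
# "Intel/ldm3d" checkpoint are available in your environment.
import torch
from diffusers import StableDiffusionLDM3DPipeline

pipe = StableDiffusionLDM3DPipeline.from_pretrained(
    "Intel/ldm3d", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")  # omit this line to run on CPU (slower)

prompt = "a tropical beach at sunset, palm trees, gentle waves"
output = pipe(prompt)

# The pipeline returns the RGB image and its matching depth map together.
rgb_image = output.rgb[0]
depth_image = output.depth[0]
rgb_image.save("beach_rgb.png")
depth_image.save("beach_depth.png")
```

A single forward pass yields both modalities, which is the practical upshot of the quote above: the depth map is produced by the same diffusion model rather than by a separate post-processing step.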
360-degree images from text prompts
The potential impact of this research is far-reaching, promising to transform the way we interact with digital content. By allowing users to visualize their text prompts in entirely new ways, LDM3D enables the transformation of text descriptions of a tropical beach, a modern skyscraper, or a sci-fi universe into a detailed 360-degree panorama.
This capability to capture depth information can drastically enhance realism and immersion, opening up new applications for a wide range of industries, from gaming and entertainment to interior design and real estate listings, as well as virtual museums and immersive virtual reality (VR) experiences.
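As a concrete illustration of the panorama use case, the sketch below swaps in a panoramic checkpoint. The Intel/ldm3d-pano model ID and the 2:1 equirectangular output shape are assumptions based on Intel’s follow-up release, not details given in this article.

```python
# Sketch: 360-degree panorama generation with a panoramic LDM3D checkpoint.
# Assumes the "Intel/ldm3d-pano" model is available; not taken from this article.
from diffusers import StableDiffusionLDM3DPipeline

pipe = StableDiffusionLDM3DPipeline.from_pretrained("Intel/ldm3d-pano")
output = pipe(
    "360 view of a modern skyscraper lobby",
    width=1024,   # equirectangular panoramas typically use a 2:1 aspect ratio
    height=512,
    guidance_scale=7.0,
)
output.rgb[0].save("lobby_pano_rgb.png")
output.depth[0].save("lobby_pano_depth.png")
```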
To construct a dataset for training LDM3D, a subset of 10,000 samples from the LAION-400M database, which comprises over 400 million image-caption pairs, was used. The Dense Prediction Transformer (DPT) large depth-estimation model, previously developed at Intel Labs, was used to annotate the training corpus; it provides highly accurate relative depth for each pixel in an image.
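This annotation step can be approximated with the publicly released DPT model. The sketch below uses the transformers depth-estimation pipeline with the Intel/dpt-large checkpoint; the exact preprocessing Intel Labs applied to the LDM3D corpus is not described here, so treat this as an illustration, and the input filename is hypothetical.

```python
# Sketch: label an image with per-pixel relative depth using DPT-large,
# approximating the corpus annotation step described above.
# Assumes the public "Intel/dpt-large" checkpoint via `transformers`.
from transformers import pipeline
from PIL import Image

depth_estimator = pipeline("depth-estimation", model="Intel/dpt-large")

image = Image.open("sample_from_laion.jpg")  # hypothetical local sample
result = depth_estimator(image)

# `result["depth"]` is a PIL image of relative depth resized to the input;
# `result["predicted_depth"]` holds the raw model output tensor.
result["depth"].save("sample_depth.png")
```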
Source: Intel Labs