Meta’s latest foray into AI-powered 3D generation is a quick one. The company introduced its new “3D Gen” model on Tuesday, a “state-of-the-art, fast pipeline” that transforms input text into high-fidelity 3D assets in under a minute. What’s more, the system is reportedly able to apply new textures and skins to both generated and artist-created models using text prompts.
According to research from Meta’s GenAI team, 3D Gen offers high-resolution textures and material maps, supporting physically based rendering (PBR) and generative re-texturing. The team estimates an average inference time of just 30 seconds for creating the initial 3D asset with the Meta 3D AssetGen model.
Users can then refine the existing model texture or replace it with a new one, both via text prompts, using Meta 3D TextureGen. This process takes an additional 20 seconds of inference time. “By combining their strengths,” the team wrote in its study abstract, “3DGen represents 3D objects simultaneously in three ways: in view space, in volumetric space, and in UV (or texture) space.”
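Conceptually, the workflow the paper describes is a simple two-stage hand-off: AssetGen turns a text prompt into a textured 3D model, and TextureGen then refines or replaces that texture from a second prompt. Neither model has been released, so the class names, methods, and parameters in the Python sketch below are hypothetical placeholders meant only to illustrate that flow, not a real API.

```python
# Hypothetical sketch of the two-stage Meta 3D Gen workflow described above.
# The models are not publicly available; every name here is an illustrative
# placeholder, and the timings in comments are the figures reported by Meta.

from dataclasses import dataclass


@dataclass
class TexturedMesh:
    """A 3D asset: geometry plus PBR material maps (e.g. albedo, roughness, metallic)."""
    vertices: list
    faces: list
    material_maps: dict


class AssetGen:  # placeholder for Meta 3D AssetGen
    def generate(self, prompt: str) -> TexturedMesh:
        # Stage 1: text -> initial 3D model with PBR materials (~30 s reported).
        ...


class TextureGen:  # placeholder for Meta 3D TextureGen
    def retexture(self, mesh: TexturedMesh, prompt: str) -> TexturedMesh:
        # Stage 2: refine or replace the texture from a new text prompt (~20 s reported).
        ...


def generate_asset(shape_prompt: str, texture_prompt: str | None = None) -> TexturedMesh:
    """End-to-end pipeline: generate geometry and texture, then optionally re-texture."""
    mesh = AssetGen().generate(shape_prompt)
    if texture_prompt:
        mesh = TextureGen().retexture(mesh, texture_prompt)
    return mesh


# Example usage (prompts are illustrative):
# asset = generate_asset("a weathered bronze dragon statue",
#                        texture_prompt="covered in glowing neon circuitry")
```

Splitting the pipeline this way is also what would let the second stage run on its own, which is how re-texturing an existing, artist-created model from a text prompt would work.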
Meta compared its 3D Gen model against industry baselines, assessing factors like text-prompt fidelity, visual quality, texture detail, and artifacts. The integrated two-stage process, which combines both models, produced assets that annotators favored over single-stage counterparts 68% of the time.
While the system is still under development and not yet available to the public, the technical advances demonstrated in this study could transform creative fields such as visual effects for games and film, as well as VR applications. By letting users create and edit 3D content quickly and intuitively, the technology could significantly lower the barrier to entry for such work. The impact on game development, in particular, could be substantial.