MICROSOFT

Microsoft's TRELLIS: A high-quality 3D asset generation model

December 15, 2024

Microsoft recently proposed TRELLIS, a generative method for creating high-quality 3D assets. It is built on a unified Structured LATent (SLAT) representation and Rectified Flow Transformers, and aims for flexible and efficient 3D generation.

Core of the paper

Unified Structured LATent Representation (SLAT):

SLAT combines a sparsely populated 3D grid with dense multi-view visual features extracted from a vision foundation model.
It captures both geometric structure and texture information, and supports decoding into multiple formats, including Radiance Fields, 3D Gaussians, and Meshes.
This lets the same latent be decoded into whichever 3D format a given use case needs.

Powerful generative model architecture:

Uses a Rectified Flow Transformer specifically designed for SLAT as the core model.
Trained on a large-scale dataset of over 500,000 diverse 3D objects, with models of up to 2 billion parameters.

Flexible generation and editing capabilities:

Supports generating high-quality 3D assets from text or image prompts; the paper reports results that significantly surpass existing methods.
Provides flexible output format selection and local 3D editing, capabilities the authors say previous models did not offer.

Innovative application scenarios:

Generated 3D assets can be used for complex artistic designs, asset variant generation, and precise manipulation of local areas.

Key features and demonstrations

Text-to-3D asset generation

Image-to-3D asset generation

Asset variant generation

Local area manipulation

Method overview: SLAT and TRELLIS

Structured LATent Representation (SLAT)

SLAT combines sparse structures with visual representations:

Defines local latent variables on active voxels intersecting the object surface.
Attaches dense image features, extracted from multi-view renderings by a powerful pre-trained vision encoder, to those voxels.
Active voxels provide coarse geometry, while visual features capture fine geometry and texture details.
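To make the idea of "local latents on active voxels" more concrete, here is a minimal sketch in plain Python. It is only an illustration, not the TRELLIS implementation: the function name `build_structured_latent`, the tiny grid resolution, and the use of a simple per-voxel mean as a stand-in for the aggregated multi-view features are all assumptions made for this example.

```python
from collections import defaultdict

def build_structured_latent(points, features, resolution=4):
    """Group per-point features by the voxel each surface point falls in.

    points   : list of (x, y, z) surface samples in [0, 1)^3
    features : list of equal-length feature vectors, one per point
    Returns {voxel_index: mean_feature}; the keys are the "active voxels".
    (Hypothetical helper for illustration; not part of TRELLIS.)
    """
    sums = {}
    counts = defaultdict(int)
    for (x, y, z), feat in zip(points, features):
        # Integer grid cell that contains this surface point.
        voxel = (int(x * resolution), int(y * resolution), int(z * resolution))
        if voxel not in sums:
            sums[voxel] = [0.0] * len(feat)
        sums[voxel] = [s + f for s, f in zip(sums[voxel], feat)]
        counts[voxel] += 1
    # Average the features that landed in each voxel.
    return {v: [s / counts[v] for s in sums[v]] for v in sums}

# Three surface samples: two share a voxel, one sits in the opposite corner.
points = [(0.10, 0.10, 0.10), (0.20, 0.15, 0.05), (0.90, 0.90, 0.90)]
features = [[1.0, 0.0], [3.0, 2.0], [5.0, 5.0]]
slat = build_structured_latent(points, features, resolution=4)
```

Only 2 of the 64 cells in this toy grid end up holding a latent, which is the point of the sparse representation: empty space stores nothing, and each occupied cell carries its own local feature vector.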

TRELLIS model architecture

Two-stage generation pipeline:

Stage 1 generates the sparse structure of SLAT, i.e. which voxels are active.
Stage 2 generates the latent variables for those non-empty voxels.
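The two stages can be sketched as two functions, where the output of the first decides where the second does any work. Both functions below are hypothetical stand-ins invented for this example: the "structure generator" simply marks voxels near a sphere's surface, and the "latent generator" fills in random vectors. In TRELLIS both roles are played by learned generative models.

```python
import random

def generate_sparse_structure(resolution=8, radius=0.35):
    """Stage 1 stand-in: voxels whose centers lie near a sphere's surface."""
    active = set()
    half_cell = 0.5 / resolution
    for i in range(resolution):
        for j in range(resolution):
            for k in range(resolution):
                # Voxel center in coordinates centered on the grid middle.
                center = [(c + 0.5) / resolution - 0.5 for c in (i, j, k)]
                dist = sum(c * c for c in center) ** 0.5
                if abs(dist - radius) <= half_cell:
                    active.add((i, j, k))
    return active

def generate_local_latents(active_voxels, dim=8, seed=0):
    """Stage 2 stand-in: one latent vector per active voxel, none elsewhere."""
    rng = random.Random(seed)
    return {
        voxel: [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for voxel in sorted(active_voxels)
    }

structure = generate_sparse_structure(resolution=8)
latents = generate_local_latents(structure, dim=8)
```

The design point this illustrates is that the second stage never touches empty cells, so its cost scales with the number of active voxels rather than with the full grid volume.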

Rectified Flow Transformer:

Adapts to SLAT sparsity and serves as the backbone model.
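For readers unfamiliar with rectified flow, the toy example below shows the basic mechanics under one common convention: data sits at t=0, noise at t=1, and the two are connected by a straight line. The network is trained to predict the constant velocity along that line, and sampling integrates that velocity from noise back to data. This is a sketch under stated assumptions, not TRELLIS code: `oracle_velocity` is a hypothetical closed-form stand-in for the learned transformer, and it only works here because the target `x0` is known.

```python
def interpolate(x0, eps, t):
    """Point on the straight path from data x0 (t=0) to noise eps (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, eps)]

def target_velocity(x0, eps):
    """Constant velocity along that path; the regression target in training."""
    return [b - a for a, b in zip(x0, eps)]

def oracle_velocity(x_t, t, x0):
    """Stand-in for the learned network: exact velocity when x0 is known."""
    return [(a - b) / t for a, b in zip(x_t, x0)]

def euler_sample(eps, x0, num_steps=10):
    """Integrate from t=1 (noise) down to t=0 (data) with Euler steps."""
    x = list(eps)
    dt = 1.0 / num_steps
    for step in range(num_steps):
        t = 1.0 - step * dt  # never reaches 0 inside the loop
        v = oracle_velocity(x, t, x0)
        x = [xi - dt * vi for xi, vi in zip(x, v)]
    return x

x0 = [0.0, 2.0]   # toy "data" point
eps = [2.0, 0.0]  # toy "noise" point
midpoint = interpolate(x0, eps, 0.5)
velocity = target_velocity(x0, eps)
sample = euler_sample(eps, x0, num_steps=10)
```

Because the path is a straight line, plain Euler integration lands back on `x0` up to floating-point error, even with few steps. Straight paths are the usual motivation for rectified flow: they need fewer sampling steps than curved ones.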

Multi-format output and editing:

Maps SLAT into high-quality 3D representations through different decoders to meet diverse requirements.

Applications

I tried it on Hugging Face, and the results are decent. For commercial use, however, the controllability still falls short.

ABOUT THE AUTHOR

Renee's Entrepreneurial Journey

Essay Editor

This is my little corner of the internet where I share thoughts, ideas, and interesting stuff I come across in the world of AI. Things in this field move fast, and I use this space to slow down a bit—to reflect, explore, and hopefully spark some good conversations.
