AniPortrait - Audio-driven Realistic Portrait Animation Synthesis Technology

March 30, 2024

Today, let's take a look at this week's paper from Tencent: AniPortrait, an audio-driven technique for synthesizing realistic portrait animation.

AniPortrait aims to generate high-quality animation from audio and a reference portrait image.

The framework works in two stages.

  1. It extracts 3D facial meshes and head poses from the audio, then projects these two elements into a sequence of 2D keypoints.
  2. A diffusion model transforms the 2D keypoint sequence into a continuous portrait video. These two stages are trained simultaneously within the framework (a rough code sketch of the pipeline follows this list).
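
For readers who think in code, here is a minimal sketch of that two-stage flow in Python. All function names, tensor shapes, and frame counts below are hypothetical placeholders chosen for illustration; they are not the paper's actual architecture or API.

```python
# Minimal sketch of AniPortrait's two-stage pipeline, using assumed shapes and
# placeholder stage functions; not the authors' actual code or API.
import numpy as np

N_FRAMES = 8      # assumed number of output video frames
N_VERTS = 468     # assumed 3D face-mesh vertex count
N_LMK = 68        # assumed number of projected 2D keypoints
H, W = 512, 512   # assumed output resolution


def audio_to_mesh_and_pose(audio: np.ndarray):
    """Stage 1a (placeholder): predict a 3D face mesh and head pose per frame."""
    meshes = np.zeros((N_FRAMES, N_VERTS, 3))   # 3D vertices per frame
    poses = np.zeros((N_FRAMES, 6))             # rotation + translation per frame
    return meshes, poses


def project_to_keypoints(meshes: np.ndarray, poses: np.ndarray) -> np.ndarray:
    """Stage 1b (placeholder): project 3D meshes plus poses into 2D keypoints."""
    return np.zeros((N_FRAMES, N_LMK, 2))


def keypoints_to_video(keypoints: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Stage 2 (placeholder): a diffusion model turns the keypoint sequence plus
    a reference portrait into temporally consistent video frames."""
    return np.repeat(reference[None], N_FRAMES, axis=0)


if __name__ == "__main__":
    audio = np.zeros(16000)                      # 1 s of dummy 16 kHz audio
    reference_portrait = np.zeros((H, W, 3))     # dummy reference image
    meshes, poses = audio_to_mesh_and_pose(audio)
    keypoints = project_to_keypoints(meshes, poses)
    video = keypoints_to_video(keypoints, reference_portrait)
    print(video.shape)                           # (8, 512, 512, 3)
```

The point of the sketch is the data flow: audio drives the 3D intermediate representation, the projection step reduces it to 2D keypoints, and only the diffusion stage ever touches pixels.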

Experimental results show that AniPortrait excels in facial naturalness, pose diversity, and visual quality, offering viewers an enhanced perceptual experience. It also demonstrates considerable flexibility and controllability, making it well suited to applications such as facial motion editing and facial reenactment.

Showcase of diverse generated videos

  • Self-driven
  • Face reenactment
  • Audio-driven

ABOUT THE AUTHOR

Renee's Entrepreneurial Journey, Essay Editor

This is my little corner of the internet where I share thoughts, ideas, and interesting stuff I come across in the world of AI. Things in this field move fast, and I use this space to slow down a bit—to reflect, explore, and hopefully spark some good conversations.
