CVPR 2026 Highlight
MeshFlow: Efficient Artistic Mesh Generation via MeshVAE and Flow-based Diffusion Transformer
~1s parallel, quantization-free artistic mesh generation.
Abstract
Fast artistic mesh generation, without token-by-token decoding.
We present MeshFlow, a method for generating artist-like 3D meshes with continuous geometry and explicit connectivity. Instead of autoregressively predicting discrete face tokens, MeshFlow learns a compact continuous latent space with MeshVAE and generates the latent mesh representation in parallel using a flow-based diffusion transformer. This avoids coordinate quantization, scales linearly with mesh size, and produces high-quality meshes for unconditional, point-cloud-conditioned, and image-conditioned generation.
Continuous mesh latent
Positions, normals, and edge embeddings are compressed without discretizing vertex coordinates.
Parallel generation
A flow-based transformer denoises all latent tokens together instead of decoding one token at a time.
Artist-like outputs
The decoded meshes keep explicit vertices and edges, making them suitable for downstream 3D workflows.
Gallery
Parallel generation turns noisy geometry into clean artist-like meshes.
Method
MeshVAE makes meshes compact; flow matching makes generation fast.
MeshFlow compresses a continuous vertex-based mesh representation into compact MeshVAE latents, then uses a flow-based transformer to generate all latent tokens in parallel before decoding explicit geometry and connectivity.
Vertices are more compact than faces.
A mesh with n_v vertices is represented by n_v continuous vectors. Because meshes usually have two to three times as many faces as vertices, this representation is shorter than the sequences produced by face-oriented tokenizers, and it avoids coordinate quantization.
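The length argument can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes a naive face tokenizer that spends 9 tokens per triangle (3 vertices x 3 quantized coordinates), which is a hypothetical uncompressed baseline and not any specific method's tokenizer. The function names are made up for this example.

```python
def vertex_sequence_length(num_vertices: int) -> int:
    """Vertex-based representation: one continuous vector per vertex."""
    return num_vertices


def face_sequence_length(num_faces: int, tokens_per_face: int = 9) -> int:
    """Assumed naive face tokenizer: 3 vertices x 3 quantized coordinates per triangle."""
    return num_faces * tokens_per_face


n_v = 1000
for faces_per_vertex in (2, 3):  # the "two to three times" range quoted above
    n_f = faces_per_vertex * n_v
    print(
        f"faces/vertex={faces_per_vertex}: "
        f"vertex sequence={vertex_sequence_length(n_v)}, "
        f"face sequence={face_sequence_length(n_f)}"
    )
```

Compressed face tokenizers use fewer than 9 tokens per face, so the real gap is smaller than in this baseline, but the vertex sequence stays shorter because it grows with n_v rather than with the face count.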
Parallel flow generation from continuous latents.
MeshFlow encodes positions, normals, and edge features into MeshVAE latents. A flow-based diffusion transformer denoises these latents together, then the decoder recovers vertices, normals, and mesh connectivity.
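To illustrate what "denoises these latents together" means in practice, here is a minimal sketch of flow-style sampling with fixed-step Euler integration. It is not the MeshFlow sampler: the velocity function is a toy stand-in for the trained transformer, and the latent width of 16, the step count, and all names are assumptions made for this example.

```python
import numpy as np


def sample_flow(velocity_fn, shape, num_steps=20, seed=0):
    """Integrate dx/dt = v(x, t) from t = 0 (noise) to t = 1 with Euler steps."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)  # all latent tokens start as Gaussian noise
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        # One call updates every token at once; there is no per-token loop.
        x = x + dt * velocity_fn(x, t)
    return x


# Toy stand-in for the learned model: on a straight path x_t = (1 - t) x0 + t x1
# toward a fixed target x1, the velocity is (x1 - x_t) / (1 - t).
target = np.ones((512, 16))  # 512 latent tokens; the width 16 is hypothetical


def toy_velocity(x, t):
    return (target - x) / (1.0 - t)


latents = sample_flow(toy_velocity, target.shape)
print(np.allclose(latents, target))
```

The number of model evaluations equals the number of integration steps and does not depend on how many tokens the mesh has, which is the key contrast with token-by-token decoding.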
TokenMerge forms a more compact mesh latent.
Inspired by pixel shuffle, TokenMerge downsamples vertex tokens into fewer latent tokens. TokenSplit reverses this process, while attention blocks refine geometry, edge embeddings, and the validity mask.
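The pixel-shuffle analogy can be sketched as a pair of reshapes that trade sequence length for channel width. This shows only the rearrangement idea under simple assumptions: the actual TokenMerge and TokenSplit layers presumably add learned projections and attention, and the function names, merge factor, and shapes below are hypothetical.

```python
import numpy as np


def token_merge(tokens: np.ndarray, factor: int) -> np.ndarray:
    """Fold groups of `factor` consecutive tokens into the channel axis."""
    n, c = tokens.shape
    assert n % factor == 0, "token count must be divisible by the merge factor"
    return tokens.reshape(n // factor, factor * c)


def token_split(latents: np.ndarray, factor: int) -> np.ndarray:
    """Inverse rearrangement: unfold channels back into `factor` tokens each."""
    m, c = latents.shape
    assert c % factor == 0, "channel count must be divisible by the split factor"
    return latents.reshape(m * factor, c // factor)


vertex_tokens = np.arange(8 * 4, dtype=np.float32).reshape(8, 4)  # 8 tokens, 4 channels
merged = token_merge(vertex_tokens, factor=4)   # 2 tokens, 16 channels
restored = token_split(merged, factor=4)        # back to 8 tokens, 4 channels

print(merged.shape, restored.shape)
print(np.array_equal(restored, vertex_tokens))
```

The rearrangement itself is lossless, so any compression in such a design has to come from the layers that reduce the widened channel dimension afterwards.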
Continuous codes avoid quantization loss.
MeshVAE reconstructs meshes in continuous space with only 512 latent codes. Compared with quantized tokenizers, it preserves fine geometry and topology with far fewer variables.
What matters
Compact latents for fast mesh generation.
We introduce a continuous mesh representation that keeps vertex positions, outward normals, and connectivity on the vertices. Inspired by SpaceMesh, adjacency is encoded with contrastively learned edge embeddings instead of discrete face tokens.
We design MeshVAE to map this representation into compact continuous latents. Its TokenMerge module downsamples along the vertex dimension, so MeshVAE reconstructs meshes accurately with fewer variables and avoids coordinate quantization.
A flow-based transformer generates the latent mesh in parallel rather than autoregressively decoding tokens. Inference scales linearly with mesh size and runs significantly faster than strong autoregressive (AR) baselines.
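For the edge embeddings mentioned in the first point above, one generic way to turn per-vertex embeddings into connectivity is to link pairs whose embeddings are close. The sketch below assumes a plain Euclidean distance and a fixed threshold; SpaceMesh and MeshFlow may use a different similarity measure and decoding rule, so treat the function, the threshold, and the toy data as hypothetical.

```python
import numpy as np


def decode_adjacency(edge_embeddings: np.ndarray, threshold: float) -> np.ndarray:
    """Return a boolean adjacency matrix linking vertices with nearby embeddings."""
    diff = edge_embeddings[:, None, :] - edge_embeddings[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)   # pairwise distances, shape (n_v, n_v)
    adjacency = dist < threshold
    np.fill_diagonal(adjacency, False)     # a vertex is not its own neighbor
    return adjacency


# Toy embeddings: vertices 0 and 1 are close, vertex 2 is far from both.
embeddings = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
adjacency = decode_adjacency(embeddings, threshold=0.5)
edges = [(int(i), int(j)) for i, j in np.argwhere(adjacency) if i < j]
print(edges)
```

A contrastive objective fits this kind of decoding because it pulls embeddings of connected vertices together and pushes unconnected ones apart, so the connectivity can be read off without emitting discrete face tokens.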
Acknowledgements
We are deeply grateful to Minghao Chen, Jianyuan Wang, Zihang Lai, and Thu Nguyen-Phuoc for their discussions and support. We also thank the authors of related mesh generation works, including MeshGPT, PolyDiff, SpaceMesh, MeshCraft, FastMesh, LATTICE, and LATO. This project page is also greatly inspired by the VGGT-Omega project page; we sincerely thank them for their excellent work.
BibTeX
@inproceedings{li2026meshflow,
author = {Weiyu Li and Antoine Toisoul and Tom Monnier and Roman Shapovalov and Rakesh Ranjan and Ping Tan and Andrea Vedaldi},
title = {{MeshFlow}: Efficient Artistic Mesh Generation via MeshVAE and Flow-based Diffusion Transformer},
booktitle = {Proceedings of the {IEEE/CVF} Conference on Computer Vision and Pattern Recognition ({CVPR})},
year = {2026},
note = {Highlight},
}
