CVPR 2026 Highlight

MeshFlow: Efficient Artistic Mesh Generation via MeshVAE and Flow-based Diffusion Transformer

Parallel, quantization-free artistic mesh generation in ~1 s.

18x faster than AR-style mesh generation
Parallel flow transformer denoising
No quantization: continuous vertex coordinates
[Figure panels: denoising process; generated mesh]

Abstract

Fast artistic mesh generation, without token-by-token decoding.

We present MeshFlow, a method for generating artist-like 3D meshes with continuous geometry and explicit connectivity. Instead of autoregressively predicting discrete face tokens, MeshFlow learns a compact continuous latent space with MeshVAE and generates the latent mesh representation in parallel using a flow-based diffusion transformer. This avoids coordinate quantization, scales linearly with mesh size, and produces high-quality meshes for unconditional, point-cloud-conditioned, and image-conditioned generation.

Continuous mesh latent

Positions, normals, and edge embeddings are compressed without discretizing vertex coordinates.

Parallel generation

A flow-based transformer denoises all latent tokens together instead of decoding one token at a time.

Artist-like outputs

The decoded meshes keep explicit vertices and edges, making them suitable for downstream 3D workflows.

Method

MeshVAE makes meshes compact; flow matching makes generation fast.

MeshFlow compresses a continuous vertex-based mesh representation into compact MeshVAE latents, then uses a flow-based transformer to generate all latent tokens in parallel before decoding explicit geometry and connectivity.

Motivation

Vertices are more compact than faces.

A mesh with n_v vertices is represented by n_v continuous vectors. Because meshes usually have two to three times as many faces as vertices, this representation is shorter than the sequences produced by face-oriented tokenizers, and it avoids coordinate quantization.
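
The length argument can be checked with a small count. The sketch below assumes a closed, manifold triangle mesh, for which Euler's formula (V - E + F = 2 - 2g together with 3F = 2E) gives F = 2V - 4 + 4g, consistent with the lower end of the ratio quoted above. The face tokenizer it compares against is a hypothetical naive one that emits one token per coordinate of each face corner; real tokenizers compress this, so the numbers are illustrative only.

```python
def closed_triangle_mesh_faces(num_vertices: int, genus: int = 0) -> int:
    """Face count of a closed, manifold triangle mesh.

    Euler's formula V - E + F = 2 - 2g combined with 3F = 2E
    (every edge is shared by exactly two triangles) gives F = 2V - 4 + 4g.
    """
    return 2 * num_vertices - 4 + 4 * genus


def sequence_lengths(num_vertices: int,
                     coords_per_vertex: int = 3,
                     vertices_per_face: int = 3) -> dict:
    """Compare a per-vertex representation with a naive per-face one.

    The per-face count is an assumption for illustration: one token per
    coordinate of each face corner, with no compression.
    """
    num_faces = closed_triangle_mesh_faces(num_vertices)
    return {
        "vertex_vectors": num_vertices,
        "face_coordinate_tokens": num_faces * vertices_per_face * coords_per_vertex,
    }


lengths = sequence_lengths(1000)
print(lengths)
```

Under these assumptions the per-vertex sequence is shorter by more than an order of magnitude, and the gap grows linearly with the vertex count.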

Method overview

Parallel flow generation from continuous latents.

MeshFlow encodes positions, normals, and edge features into MeshVAE latents. A flow-based diffusion transformer denoises these latents together, then the decoder recovers vertices, normals, and mesh connectivity.
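
To make "denoises these latents together" concrete, here is a minimal sketch of parallel flow sampling with plain Euler integration. Everything in it is an assumption for illustration: the straight-line path between noise and data, the step count, the 16-channel latent width, and above all `oracle_velocity`, which stands in for the trained transformer by using a known target. Only the count of 512 latent tokens comes from the page.

```python
import numpy as np


def oracle_velocity(x: np.ndarray, t: float, target: np.ndarray) -> np.ndarray:
    """Stand-in for the trained network (hypothetical).

    For the straight path x_t = (1 - t) * noise + t * target, the velocity
    target - noise can be rewritten as (target - x_t) / (1 - t).
    """
    return (target - x) / (1.0 - t)


def sample(target: np.ndarray, num_steps: int = 8, seed: int = 0) -> np.ndarray:
    """Integrate the flow from noise (t = 0) to data (t = 1) with Euler steps."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(target.shape)  # every latent token starts as noise
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t stays below 1, so the division above is safe
        # One update moves all latent tokens at once; nothing is decoded
        # token by token.
        x = x + dt * oracle_velocity(x, t, target)
    return x


# 512 latent tokens (as stated on the page) with an assumed width of 16.
target_latents = np.linspace(-1.0, 1.0, 512 * 16).reshape(512, 16)
generated = sample(target_latents)
print(generated.shape, np.allclose(generated, target_latents))
```

The cost is a fixed number of network evaluations over the whole token set, independent of how many tokens an autoregressive decoder would have to emit one at a time.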

Detailed VAE

TokenMerge forms a more compact mesh latent.

Inspired by pixel shuffle, TokenMerge downsamples vertex tokens into fewer latent tokens. TokenSplit reverses this process, while attention blocks refine geometry, edge embeddings, and the validity mask.
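
The pixel-shuffle analogy can be sketched as a pure rearrangement: groups of neighbouring vertex tokens are folded into the channel dimension, and the inverse unfolds them. This is only the reshaping idea under assumed shapes (2048 vertex tokens of width 8, merge ratio 4); the actual TokenMerge and TokenSplit presumably also involve learned layers, which are not shown, and the function names are hypothetical.

```python
import numpy as np


def token_merge(tokens: np.ndarray, ratio: int) -> np.ndarray:
    """Fold `ratio` consecutive tokens into one wider token.

    (n, d) -> (n / ratio, ratio * d), analogous to pixel unshuffle in 1D.
    """
    n, d = tokens.shape
    if n % ratio != 0:
        raise ValueError("token count must be divisible by the merge ratio")
    return tokens.reshape(n // ratio, ratio * d)


def token_split(latents: np.ndarray, ratio: int) -> np.ndarray:
    """Inverse of token_merge: (m, ratio * d) -> (m * ratio, d)."""
    m, width = latents.shape
    if width % ratio != 0:
        raise ValueError("latent width must be divisible by the merge ratio")
    return latents.reshape(m * ratio, width // ratio)


vertex_tokens = np.arange(2048 * 8, dtype=np.float32).reshape(2048, 8)
merged = token_merge(vertex_tokens, ratio=4)   # fewer, wider tokens
restored = token_split(merged, ratio=4)        # back to per-vertex tokens
print(merged.shape, restored.shape)
```

Because the rearrangement is lossless, any compression has to come from the layers around it; the reshape only shortens the sequence that the attention blocks and the generator operate on.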

VAE comparisons

Continuous codes avoid quantization loss.

MeshVAE reconstructs meshes in continuous space with only 512 latent codes. Compared with quantized tokenizers, it preserves fine geometry and topology with far fewer variables.

What matters

Compact latents for fast mesh generation.

We introduce a continuous mesh representation that keeps vertex positions, outward normals, and connectivity on the vertices. Inspired by SpaceMesh, adjacency is encoded with contrastively learned edge embeddings instead of discrete face tokens.
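
One way to picture connectivity stored on the vertices is to read edges back from per-vertex embeddings by comparing pairs. The sketch below is a hypothetical decoding rule, not the method's: it assumes Euclidean distance and a fixed threshold, neither of which is specified on this page, and it uses hand-written toy embeddings rather than learned ones.

```python
import numpy as np


def decode_adjacency(embeddings: np.ndarray, threshold: float) -> np.ndarray:
    """Connect two vertices when their embeddings are closer than `threshold`.

    The Euclidean metric and the threshold are assumptions for illustration.
    Returns a symmetric boolean adjacency matrix with no self-loops.
    """
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    adjacency = dist < threshold
    np.fill_diagonal(adjacency, False)
    return adjacency


# Toy embeddings: vertices 0, 1, 2 are close together; vertex 3 is far away.
toy_embeddings = np.array([
    [0.0, 0.0],
    [0.1, 0.0],
    [0.0, 0.1],
    [5.0, 5.0],
])
adjacency = decode_adjacency(toy_embeddings, threshold=0.5)

# List each undirected edge once by reading the upper triangle.
edges = [(int(i), int(j)) for i, j in zip(*np.nonzero(np.triu(adjacency)))]
print(edges)
```

A contrastive objective fits this kind of readout naturally, since it pulls embeddings of connected vertices together and pushes unconnected ones apart.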

We design MeshVAE to map this representation into compact continuous latents. TokenMerge downsamples along the vertex dimension, so MeshVAE reconstructs meshes accurately with fewer variables and without coordinate quantization.

A flow-based transformer generates the latent mesh in parallel rather than autoregressively decoding tokens. Inference scales linearly with mesh size and runs 18x faster than AR-style mesh generation.

Acknowledgements

We are deeply grateful to Minghao Chen, Jianyuan Wang, Zihang Lai, and Thu Nguyen-Phuoc for their discussions and support. We also thank the authors of related mesh generation works, including MeshGPT, PolyDiff, SpaceMesh, MeshCraft, FastMesh, LATTICE, and LATO. This project page is also greatly inspired by the VGGT-Omega project page; we sincerely thank them for their excellent work.

BibTeX

@inproceedings{li2026meshflow,
  author    = {Weiyu Li and Antoine Toisoul and Tom Monnier and Roman Shapovalov and Rakesh Ranjan and Ping Tan and Andrea Vedaldi},
  title     = {{MeshFlow}: Efficient Artistic Mesh Generation via MeshVAE and Flow-based Diffusion Transformer},
  booktitle = {Proceedings of the {IEEE/CVF} Conference on Computer Vision and Pattern Recognition ({CVPR})},
  year      = {2026},
  note      = {Highlight},
}