teaser 1
3D models rendered with our synthesized neural materials.
teaser 2
Scenes rendered with our synthesized neural materials produce rich visuals.

Abstract

High-quality material synthesis is essential for replicating complex surface properties to create realistic scenes. Despite advances in the generation of material appearance based on analytic models, the synthesis of real-world measured BRDFs remains largely unexplored. To address this challenge, we propose M3ashy, a novel multi-modal material synthesis framework based on hyperdiffusion. M3ashy enables high-quality reconstruction of complex real-world materials by leveraging neural fields as a compact continuous representation of BRDFs. Furthermore, our multi-modal conditional hyperdiffusion model allows for flexible material synthesis conditioned on material type, natural language descriptions, or reference images, providing greater user control over material generation. To support future research, we contribute two new material datasets and introduce two BRDF distributional metrics for more rigorous evaluation. We demonstrate the effectiveness of M3ashy through extensive experiments, including a novel statistics-based constrained synthesis, which enables the generation of materials of desired categories.

Dataset and base model

Our NeuMERL dataset, which contains 2,400 BRDFs, is available on Hugging Face: NeuMERL dataset.
For material synthesis, the weights of the pre-trained base models are also available on Hugging Face: Synthesis model weights.
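
Both can be fetched programmatically. Below is a minimal sketch using huggingface_hub; the repository ids are placeholders and should be replaced with the repositories linked above.

from huggingface_hub import snapshot_download

# Download the NeuMERL BRDF collection (placeholder repo id; use the
# actual dataset repository linked above).
neumerl_dir = snapshot_download(repo_id="<user>/NeuMERL", repo_type="dataset")

# Download the pre-trained synthesis model weights (placeholder repo id).
weights_dir = snapshot_download(repo_id="<user>/M3ashy-synthesis", repo_type="model")

print("NeuMERL:", neumerl_dir, "| model weights:", weights_dir)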

Material synthesis framework

synthesis framework
Figure: an overview of M3ashy, our novel neural material synthesis framework, consisting of three main stages. 1 (top left): Data augmentation using RGB permutation and PCA interpolation to create an expanded dataset, AugMERL; 2 (middle): Neural fields fitted to individual materials, resulting in NeuMERL, a dataset of neural material representations; 3 (bottom): Training a multi-modal conditional hyperdiffusion model on NeuMERL to enable conditional synthesis of high-quality, diverse materials guided by inputs such as material type, text descriptions, or reference images. We further propose a novel statistics-based constrained synthesis method to generate materials of a specified type (top right).
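
To make stage 2 concrete, the sketch below fits a small MLP that maps the Rusinkiewicz angles (theta_h, theta_d, phi_d) of a measured MERL BRDF to RGB reflectance. The layer widths, activation, output mapping, and training loop are illustrative assumptions, not the exact architecture behind NeuMERL.

import torch
import torch.nn as nn

class NeuralBRDF(nn.Module):
    """Hypothetical neural-field BRDF: (theta_h, theta_d, phi_d) -> RGB."""
    def __init__(self, hidden: int = 64, depth: int = 4):
        super().__init__()
        layers, in_dim = [], 3  # Rusinkiewicz angle parameterization
        for _ in range(depth):
            layers += [nn.Linear(in_dim, hidden), nn.ReLU()]
            in_dim = hidden
        layers.append(nn.Linear(in_dim, 3))  # RGB reflectance
        self.net = nn.Sequential(*layers)

    def forward(self, angles: torch.Tensor) -> torch.Tensor:
        # expm1 keeps predictions non-negative under a log-style output
        # mapping; the mapping actually used for NeuMERL may differ.
        return torch.expm1(self.net(angles)).clamp(min=0.0)

# Fit one material by regressing against its measured samples (placeholders).
model = NeuralBRDF()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
angles = torch.rand(4096, 3)   # stand-in for sampled MERL coordinates
rgb = torch.rand(4096, 3)      # stand-in for measured reflectance values
for _ in range(200):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(angles), rgb)
    loss.backward()
    optimizer.step()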

Novel metrics for material synthesis

The evaluation mainly focuses on the fidelity and diversity of the synthesized materials, which remains an open problem in this field. To fill this gap, we propose using minimum matching distance (MMD), coverage (COV), and 1-nearest-neighbor accuracy (1-NNA), with either an image-quality metric or a BRDF distributional distance as the underlying distance function.
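
All three statistics can be computed from pairwise distance matrices between the generated and reference sets. The sketch below follows the standard definitions (MMD ↓, COV ↑, 1-NNA ideally near 50%); the choice of underlying distance, image-based or BRDF-based, is left to the caller and is detailed in the paper.

import numpy as np

def mmd_cov_1nna(d_gr: np.ndarray, d_gg: np.ndarray, d_rr: np.ndarray):
    """Distribution metrics from pairwise distances.

    d_gr: (G, R) distances between generated and reference materials;
    d_gg: (G, G) distances among generated; d_rr: (R, R) among reference.
    Any distance can be plugged in, e.g. an image-quality distance between
    renderings or a BRDF distributional distance.
    """
    G, R = d_gr.shape

    # MMD: average, over reference materials, of the closest generated sample.
    mmd = d_gr.min(axis=0).mean()

    # COV: fraction of reference materials matched as the nearest neighbour
    # of at least one generated sample.
    cov = np.unique(d_gr.argmin(axis=1)).size / R

    # 1-NNA: leave-one-out 1-NN classification accuracy on the merged set
    # (50% means the two distributions are indistinguishable).
    top = np.concatenate([d_gg, d_gr], axis=1)          # rows: generated
    bottom = np.concatenate([d_gr.T, d_rr], axis=1)     # rows: reference
    full = np.concatenate([top, bottom], axis=0).astype(float)
    np.fill_diagonal(full, np.inf)                      # exclude self-matches
    nn_idx = full.argmin(axis=1)
    labels = np.concatenate([np.zeros(G), np.ones(R)])  # 0 = generated, 1 = reference
    one_nna = (labels[nn_idx] == labels).mean()
    return mmd, cov, one_nna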

Please refer to our paper for details of the BRDF distributional metrics. We demonstrate the effectiveness of M3ashy through extensive experiments, including evaluations on this new family of metrics.

Unconditional synthesis evaluation

uncond-vis
Figure: material synthesis across various pipelines. All baseline models fail to capture the underlying distribution effectively, resulting in meaningless samples with severe artefacts in the synthesized materials. In contrast, M3ashy successfully captures the complex neural material distribution, achieving significantly better fidelity and diversity. Our materials also support spatially varying rendering configurations (last three columns).

unconditional-synthesis-metric-score
Figure: quantitative evaluation of unconditional synthesis with metrics assessing generation fidelity and diversity. ↓ indicates that a lower score is better and ↑ indicates the opposite. M3ashy significantly outperforms all baseline models across these metrics, underscoring its effectiveness in neural material synthesis.

Multi-modal synthesis evaluation

We demonstrate the conditional synthesis capabilities of M3ashy across various input modalities: material type, text description, or material image.

type-cond
Figure: type-conditioned synthesis.
The synthesized materials are diverse and closely align with the specified material type.


text-cond
Figure: text-conditioned synthesis.
M3ashy synthesizes materials aligning with the texts and generalizes to unseen inputs: “green metal”, “red plastic”, and “highly specular material”.


img-cond
Figure: image-conditioned synthesis.
In each of the six pairs, the left image is the conditioning input, while the right image is the synthesized result. M3ashy effectively generates realistic materials that closely align with the conditioning images.
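
As a rough sketch of how these three conditioning signals might enter the hyperdiffusion model, the module below maps a material-type id, a pre-computed text feature, or a pre-computed image feature to a single conditioning vector. The feature dimensions, the learned type embedding, the number of type tokens, and the assumption of CLIP-style pre-computed features are illustrative guesses, not necessarily the design used in M3ashy.

import torch
import torch.nn as nn

class ConditionEncoder(nn.Module):
    """Illustrative mapping of one conditioning signal to a shared vector.

    Dimensions, the learned type embedding, and the use of pre-computed
    CLIP-style text/image features are assumptions for exposition only.
    """
    def __init__(self, n_types: int = 7, feat_dim: int = 512, cond_dim: int = 256):
        super().__init__()
        self.type_emb = nn.Embedding(n_types, cond_dim)
        self.text_proj = nn.Linear(feat_dim, cond_dim)
        self.image_proj = nn.Linear(feat_dim, cond_dim)

    def forward(self, type_id=None, text_feat=None, image_feat=None):
        if type_id is not None:
            return self.type_emb(type_id)
        if text_feat is not None:
            return self.text_proj(text_feat)
        if image_feat is not None:
            return self.image_proj(image_feat)
        raise ValueError("provide exactly one conditioning signal")

# Example: a conditioning vector for a material-type token, which the
# hyperdiffusion denoiser would consume alongside the noisy MLP weights.
cond = ConditionEncoder()(type_id=torch.tensor([2]))   # shape (1, 256)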

Constrained synthesis evaluation

We classify materials into seven categories based on their reflective properties: diffuse, metallic, low-specular, medium-specular, high-specular, plastic, and mirror, and introduce a novel approach, constrained synthesis, to generate materials of a desired category.
Constrained synthesis complements our conditional pipeline by enforcing statistical constraints on unconditionally synthesized samples, allowing targeted material generation according to the desired reflective characteristics. Please refer to our paper for details of the statistical constraints.
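
One simple way to read this is as a rejection-style filter: draw unconditional samples and keep only those whose BRDF statistics satisfy the chosen category's constraint. The sketch below captures that reading; the sampler, the statistical test, and the loop itself are placeholders, and the actual constraint formulation is given in the paper.

def constrained_synthesis(sample_fn, constraint_fn, n_wanted=8,
                          batch_size=32, max_rounds=100):
    """Rejection-style sketch of statistics-based constrained synthesis.

    sample_fn(batch_size) -> list of unconditionally synthesized materials
    (e.g. neural-field weights); constraint_fn(material) -> bool, a
    statistics-based test for the desired category (e.g. thresholds on
    specular-lobe statistics). Both callables are hypothetical placeholders.
    """
    kept = []
    for _ in range(max_rounds):
        for material in sample_fn(batch_size):
            if constraint_fn(material):
                kept.append(material)
                if len(kept) == n_wanted:
                    return kept
    return kept  # may be shorter than n_wanted if the constraint is tight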

constrained-syn-vis
Figure: synthesized materials of seven distinct categories using our novel constrained synthesis. Grounded in BRDF statistical analysis, this approach provides enhanced explainability and interpretability compared to standard conditional synthesis methods.

Ablation Study

To assess the impact of our augmented material dataset AugMERL, we additionally train our model on the original MERL dataset for the unconditional synthesis task. We report both quantitative and qualitative results for this model in the “MERL100” columns. The results indicate that the model trained on AugMERL exhibits higher quality and diversity than the one trained on MERL, demonstrating the effectiveness of our augmented dataset in enhancing the synthesis pipeline.
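
For reference, the two augmentations behind AugMERL, RGB channel permutation and PCA interpolation (see the framework figure), can be sketched as follows. The number of PCA components, the pairing of materials, and the interpolation weight are placeholders rather than the exact AugMERL recipe.

import itertools
import numpy as np
from sklearn.decomposition import PCA

def rgb_permutations(brdf: np.ndarray) -> list:
    """All six colour-channel permutations of one measured BRDF (..., 3)."""
    return [brdf[..., list(p)] for p in itertools.permutations(range(3))]

def pca_interpolations(brdfs: np.ndarray, n_components: int = 32,
                       alpha: float = 0.5) -> np.ndarray:
    """Interpolate random pairs of materials in PCA coefficient space.

    brdfs: (N, D) flattened (e.g. log-mapped) BRDF vectors. The component
    count, pairing strategy, and weight alpha are illustrative placeholders.
    """
    pca = PCA(n_components=n_components).fit(brdfs)
    coeffs = pca.transform(brdfs)
    partner = np.random.permutation(len(brdfs))
    mixed = alpha * coeffs + (1.0 - alpha) * coeffs[partner]
    return pca.inverse_transform(mixed)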
Additionally, we conduct sparse BRDF reconstruction and BRDF compression experiments following a previous method (Gokbudak et al. 2023). For sparse reconstruction, we set the sample size to N = 4000, while for compression, we use a latent dimension of 40. In both experiments, we train the model on either the original MERL dataset or AugMERL. The results in the table below demonstrate that training on AugMERL consistently improves material quality across all evaluated metrics, further validating the effectiveness of our augmented dataset.

Metric        Sparse reconstruction        Compression
              MERL        AugMERL          MERL        AugMERL
PSNR (↑)      32.2        36.3             45.2        48.3
Delta E (↓)   2.1         1.8              0.693       0.623
SSIM (↑)      0.972       0.983            0.994       0.994
Table: Quantitative comparison of training on MERL versus AugMERL in the sparse BRDF reconstruction and BRDF compression experiments. The results demonstrate that training on AugMERL consistently enhances performance across all metrics.
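
The three table metrics can be reproduced from pairs of rendered images; a minimal sketch using scikit-image is below. The use of CIEDE2000 for Delta E and per-pixel averaging are assumptions about the exact evaluation protocol.

import numpy as np
from skimage.color import rgb2lab, deltaE_ciede2000
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def image_metrics(pred: np.ndarray, ref: np.ndarray):
    """PSNR, mean CIEDE2000 Delta E, and SSIM between two renderings.

    Both images are float arrays in [0, 1] with shape (H, W, 3).
    """
    psnr = peak_signal_noise_ratio(ref, pred, data_range=1.0)
    delta_e = deltaE_ciede2000(rgb2lab(ref), rgb2lab(pred)).mean()
    ssim = structural_similarity(ref, pred, channel_axis=-1, data_range=1.0)
    return psnr, delta_e, ssim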

Citation

If you find the paper or code useful, please consider citing:
@inproceedings{M3ashy2026,
    author    = {Chenliang Zhou and Zheyuan Hu and Alejandro Sztrajman and Yancheng Cai and Yaru Liu and Cengiz Oztireli},
    title     = {M$^{3}$ashy: Multi-Modal Material Synthesis via Hyperdiffusion},
    year      = {2026},
    booktitle = {Proceedings of the 40th AAAI Conference on Artificial Intelligence},
    location  = {Singapore},
    series    = {AAAI'26}
}


The website template was inspired by FrePolad.