teaser

Abstract

Implicit neural representations (INRs) have proven accurate and efficient in various domains. In this work, we explore how different neural networks can be designed as a new texture INR that operates continuously, rather than discretely, over the input UV coordinate space. Through thorough experiments, we demonstrate that these INRs achieve good image quality, and we analyze the trade-offs against memory usage and rendering inference time. In addition, we investigate related applications in real-time rendering and downstream tasks, e.g. mipmap fitting and INR-space generation.

Introduction

Implicit Neural Representations (INRs) use neural networks to directly represent continuous, coordinate-based signals (such as images, shapes, audio, or physical fields) rather than discrete grids or explicit parametrizations. Common backbone architectures include multilayer perceptrons (MLPs) and convolutional neural networks (CNNs).

Textures are among the largest consumers of GPU memory and account for the majority of the energy consumed by processors. INRs therefore offer a memory- and power-efficient representation while maintaining high rendering quality. Furthermore, the compact form also helps with downstream tasks, including material generation and asset baking.

A tiny multilayer perceptron (MLP), or another instant neural model designed for graphics primitives, takes UV coordinates as input and outputs the corresponding RGB color. The optimized network weights then form a compressed representation of the texture.
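As a minimal sketch of this idea, the PyTorch model below maps a batch of (u, v) coordinates to RGB colors through a Fourier feature encoding. The class name, layer sizes, and frequency schedule are illustrative assumptions, not the exact configuration used in our experiments.

```python
import torch
import torch.nn as nn

class FourierTextureMLP(nn.Module):
    """Tiny MLP mapping UV coordinates to RGB (illustrative sketch)."""
    def __init__(self, num_freqs=8, hidden=256, depth=2):
        super().__init__()
        # Fixed Fourier frequencies 2^0 .. 2^(num_freqs-1), applied per coordinate.
        freqs = 2.0 ** torch.arange(num_freqs, dtype=torch.float32) * torch.pi
        self.register_buffer("freqs", freqs)
        in_dim = 2 * 2 * num_freqs  # (sin, cos) x (u, v) x num_freqs
        layers = [nn.Linear(in_dim, hidden), nn.ReLU()]
        for _ in range(depth - 1):
            layers += [nn.Linear(hidden, hidden), nn.ReLU()]
        layers += [nn.Linear(hidden, 3), nn.Sigmoid()]  # RGB in [0, 1]
        self.net = nn.Sequential(*layers)

    def forward(self, uv):                       # uv: (N, 2) in [0, 1]
        x = uv[..., None] * self.freqs           # (N, 2, F)
        enc = torch.cat([x.sin(), x.cos()], dim=-1).flatten(-2)
        return self.net(enc)                     # (N, 3)
```

Once trained on a single texture, the model's weights are the compressed asset; decoding is a forward pass at the queried UVs.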

INR_texture_v0
Training loss curve, reconstructed texture, and residual error for a single data sample.
The contributions of our work are as follows:
(1) we implement four different INRs of textures and evaluate three of them in terms of performance, efficiency, and memory usage;
(2) we integrate INRs into Mitsuba 3, a customizable Python/C++ renderer, by extending it with a plugin that reconstructs textures from MLP weights during rendering;
(3) we explore downstream tasks, including mipmap fitting and INR weight-space generation.

Texture dataset

The Describable Textures Dataset (DTD) consists of 5640 images across 47 categories derived from human perception. From it, we select the most diverse samples based on image-complexity metrics.

To select the most diverse images I(u, v) in the frequency domain, we use the Laplacian variance (LAPV), also known as the Laplacian response, as a measure of focus or sharpness. While other image-complexity metrics exist, e.g. compression ratio, image entropy, and edge density, we use LAPV due to its correlation with perceived quality.
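LAPV is the variance of the Laplacian-filtered grayscale image; a minimal sketch using OpenCV follows (the helper name and the 64-bit response type are our own choices, not necessarily the paper's implementation):

```python
import cv2

def laplacian_variance(image_path: str) -> float:
    """Variance of the Laplacian response: higher means sharper / more detailed."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    lap = cv2.Laplacian(gray, cv2.CV_64F)  # second-derivative response
    return float(lap.var())
```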

The Laplacian response histogram of the dataset is shown in the figure below; from it, we sampled 25 images at regular intervals and visualize them in the subsequent figure.

laplacian_response_histogram
Laplacian response histogram with its Cumulative Distribution Function (red curve).
sampled_textures_grid
Sampled textures with their Laplacian response.
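A sketch of the regular-interval sampling, assuming it is performed over the sorted LAPV scores (the exact selection procedure in the paper may differ):

```python
import numpy as np

def sample_by_lapv(lapv_scores, k=25):
    """Pick k images spread evenly over the LAPV distribution (regular quantiles)."""
    order = np.argsort(lapv_scores)                      # ascending sharpness
    idx = np.linspace(0, len(order) - 1, k).round().astype(int)
    return order[idx]                                    # indices into the dataset
```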

Methodology

We include a summary of the main aspects of our methodology in the figure and table below. For more details, please refer to our paper.

INR Architectures

MLP architecture
| Aspect | Evaluated Variants |
| --- | --- |
| MLP architectures | (1) plain MLP (no positional encoding); (2) SIREN (sinusoidal activation); (3) MLP with Fourier positional encoding |
| Network depth | 1, 2, or 3 hidden layers |
| Network width | 128, 256, or 512 neurons per hidden layer (except 3 × 512) |
| Positional encoding | none, or Fourier feature encoding |
| Activation function | ReLU (plain MLP, Fourier MLP); sin (SIREN) |
| Optimizers | Adam, Rprop |
| Evaluation objective | performance comparison versus model bitrate |
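To make the SIREN variant concrete, here is a minimal sketch of a sine-activated layer with the uniform initialization from the original SIREN paper; the `w0 = 30` frequency scale is the paper's default, and may differ from the values used in our experiments:

```python
import math
import torch
import torch.nn as nn

class SineLayer(nn.Module):
    """Linear layer followed by sin(w0 * x), with SIREN's uniform init."""
    def __init__(self, in_dim, out_dim, w0=30.0, is_first=False):
        super().__init__()
        self.w0 = w0
        self.linear = nn.Linear(in_dim, out_dim)
        # First layer: U(-1/n, 1/n); later layers: U(-sqrt(6/n)/w0, sqrt(6/n)/w0).
        bound = 1.0 / in_dim if is_first else math.sqrt(6.0 / in_dim) / w0
        nn.init.uniform_(self.linear.weight, -bound, bound)

    def forward(self, x):
        return torch.sin(self.w0 * self.linear(x))
```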
For more details about mipmap fitting and INR-space generation, please refer to our paper.

Experiments and Results

For evaluation metrics, please refer to our paper.

Single texture performance evaluation

The figure below plots the performance of the INRs against model bitrate, for each architecture. To reduce noise, the plot is bucketed by rounding the bitrate to the nearest 2 bits; the images in the dataset have differing sizes, so the results span a wide range of bitrates, which produces visual noise when plotted directly. For LPIPS, both the SIRENs and the Fourier MLPs performed very well, coming very close to 0 (maximal similarity to the original), though the SIRENs showed a noticeable drop in performance at very high compression. The plain MLP, on the other hand, clearly struggles to match the other two. This suggests that the INRs effectively capture the high-level structure of the image.
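For concreteness, a minimal NumPy sketch of the bucketing step (the function name is ours; the paper's exact implementation may differ):

```python
import numpy as np

def bucket_by_bitrate(bitrates, scores, step=2.0):
    """Average scores whose model bitrates round to the same step-sized bucket."""
    buckets = np.round(np.asarray(bitrates) / step) * step
    return {b: float(np.mean([s for bb, s in zip(buckets, scores) if bb == b]))
            for b in np.unique(buckets)}
```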

The SSIM, VMAF, and PSNR results of the INRs are more mixed, with a noticeable difference from the original. Across these metrics, the best-performing INRs are the Fourier MLPs, followed by the SIRENs. All of the models have broadly similar curves, implying that they all struggle to accurately learn similar features; in other words, the INRs may struggle to capture fine details accurately. This could be improved by adjusting the frequency parameters of the Fourier MLPs and the SIRENs.

We also compare these INR textures against traditional texture compression methods such as ASTC; please see our paper for details.

Performance vs. Bitrate among models

Comparison of Adam and Rprop

Overall, Adam produced notably better and more consistent results than Rprop. The difference is visible in the learning snapshots presented in the figures. Across all three examples, the Rprop results are clearly blurrier, and some of the images show additional artefacts from the learning process, such as rectangular structures caused by the high frequencies of the Fourier encoding.

It is worth noting that the image shown for the pure MLPs represents an unusually large difference in performance between Adam and Rprop: for most of the images learned by the pure MLP, neither optimiser made significant progress towards accurately learning the image.

Conclusion: Adam has demonstrated itself to be the better optimiser across the board. Whilst hyper-parameter tuning could improve Rprop's performance, we doubt it would improve enough to consistently and significantly outperform Adam.
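A minimal sketch of the comparison loop, fitting the same INR to one texture with either optimiser (the helper name, step count, and learning rate are illustrative; gradients are taken over all pixels at once, which matches Rprop's full-batch design):

```python
import torch

def fit(model_fn, uv, rgb, optimizer_name="adam", steps=2000, lr=1e-3):
    """Fit one INR to one texture with Adam or Rprop; returns the final MSE."""
    model = model_fn()
    opt = (torch.optim.Adam(model.parameters(), lr=lr)
           if optimizer_name == "adam"
           else torch.optim.Rprop(model.parameters(), lr=lr))
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(model(uv), rgb)
        loss.backward()
        opt.step()
    return loss.item()
```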

Adam vs. Rprop
For more detailed results and analysis of mipmap fitting and INR-space generation, please refer to our paper.

Conclusion

In this work, we present how different INRs for textures can be built, and we evaluate their performance, efficiency, memory usage, and complexity. In addition, we integrate INRs into a graphics pipeline and explore different downstream tasks. We have shown that, perceptually, INRs can be very effective at image compression, though they can struggle with exact details. Despite this, INRs can outperform classical compression methods across all metrics at similar bitrates. We have shown that whilst simple MLPs can achieve good results, Fourier encoding and SIRENs provide significant improvements. We have also shown the importance of hyper-parameter tuning, which can significantly alter performance for different image types.

Future work

This paper has demonstrated that reliable INR compression requires a consistent process for selecting hyper-parameters. Notably, due to time and computation constraints, we were unable to tune the frequency values used by the SIRENs and Fourier MLPs, which could have had a significant impact on performance. For INR compression to be fast and reliable, an efficient way to select these hyper-parameters should be developed. Future work could explore this selection, either through traditional image-processing techniques such as the Fourier transform, or through machine-learning techniques such as training a model to predict the best hyper-parameters.

In addition to learning the image's pixels directly, the model could be trained on a greater variety of randomly sampled points, which may prevent grid-related artefacts during training.
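A sketch of such continuous sampling, using bilinear interpolation of the target texture as ground truth (the helper name and batch size are illustrative, and we assume u runs along image width and v along height):

```python
import torch
import torch.nn.functional as F

def random_uv_batch(texture, n=4096):
    """Sample random continuous UVs and bilinearly interpolate target colours."""
    # texture: (3, H, W) in [0, 1]
    uv = torch.rand(n, 2)                      # uniform in [0, 1]^2
    grid = (uv * 2 - 1).view(1, n, 1, 2)       # grid_sample expects [-1, 1]
    rgb = F.grid_sample(texture[None], grid, align_corners=False)
    return uv, rgb.reshape(3, n).t()           # (n, 2), (n, 3)
```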

This paper has also demonstrated the potential of encoding multiple images into a single network, in this case mipmap levels, without significantly impacting quality. This suggests that even higher compression ratios might be possible when compressing multiple similar images together, which could be useful for image libraries or animations.

The paper's exploration of mipmapping could also be taken further. Effective compression for anisotropic filtering could allow the technique to use less GPU memory. INRs could also offer novel opportunities to model the filtering exactly, taking the viewing angle and LOD as input and outputting the colour averaged over the pixel footprint.
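A sketch of such a filtering-aware INR, conditioning the network on an LOD scalar (the class name and architecture are hypothetical; the training targets would be appropriately pre-filtered samples of the texture):

```python
import torch
import torch.nn as nn

class FilteredINR(nn.Module):
    """Texture MLP conditioned on LOD, so the network itself learns the
    pixel-footprint average rather than relying on mip interpolation."""
    def __init__(self, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),   # input: (u, v, lod)
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Sigmoid(),
        )

    def forward(self, uv, lod):                # uv: (N, 2), lod: (N,)
        return self.net(torch.cat([uv, lod[..., None]], dim=-1))
```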

In addition to better representing filtering, INRs could also efficiently represent spatially varying bidirectional scattering distribution functions (SVBSDFs), allowing more expressive materials to be stored efficiently in memory.

Acknowledgements

We are deeply grateful to our supervisor, Dounia Hammou, for her insightful guidance and detailed suggestions throughout this project and in the preparation of this report.

Special thanks to Professor Rafał Mantiuk for lecturing and organizing the Advanced Graphics and Image Processing module, which has been enjoyable along the way.

Citation

If you found the paper or code useful, please consider citing:
@misc{KH2026INR-Tex,
      title={Implicit neural representation of textures},
      author={Albert Kwok and Zheyuan Hu and Dounia Hammou},
      year={2026},
      eprint={2602.02354},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2602.02354},
}


The website template was inspired by M3ashy.