LATTICE: Efficient In-Memory DNN Model Versioning (Conference)

Saha, MP, Ghosh, A, Rangaswami, R et al. (2025). LATTICE: Efficient In-Memory DNN Model Versioning. 16-29. 10.1145/3757347.3759139

cited authors

  • Saha, MP; Ghosh, A; Rangaswami, R; Wu, Y; Bhimani, J

abstract

  • DNN model versions are used for various tasks such as fine-tuning for downstream tasks, explainability, and debugging. Numerous checkpointing solutions exist that can be adapted to persist intermediate versions of a model, as it is being trained, at different storage locations. Additionally, version management tools allow us to log, visualize, compare, and query metadata related to ML, tracking changes made to previously built models. However, the version creation process of existing methods incurs high runtime and storage overheads. In this paper, we introduce LATTICE, a low-latency, direct persistence-based DNN versioning library for Non-Volatile Memory (NVM) expansion devices. LATTICE minimizes stalls during model versioning and reduces end-to-end versioning time by reorganizing the version creation workflow, streamlining memory allocation and deallocation for efficient snapshot creation, and leveraging multi-threaded parallelism. We also develop a user-friendly versioning API that transparently implements direct persistence. Our comprehensive evaluation with diverse DNN models shows that LATTICE can reduce persistence time by as much as 99.99%, decrease end-to-end versioning time by up to 72%, reduce versioning stalls by up to 35%, and increase versioning frequency by 0.2×-3.84× compared to state-of-the-art solutions. LATTICE also reduces space utilization for different workloads. The space savings are from 23.8% to 43.2% for workloads where model layers are progressively frozen and from 84.8% to 98.9% for fine-tuning workloads where only the last layers are tuned.

publication date

  • September 8, 2025

Digital Object Identifier (DOI)

  • 10.1145/3757347.3759139

start page

  • 16

end page

  • 29