Stochastic Additively Preconditioned Trust-Region Strategies for Distributed Neural Network Training
Faculty of Informatics - Study Administration Offices
Date: 16 December 2025 / 16:00 - 19:00
USI East Campus, Room D1.13
You are cordially invited to attend the PhD Dissertation Defence of Samuel Adolfo Cruz Alegria on Tuesday 16 December 2025 at 16:00 in room D1.13.
Abstract:
Training large-scale neural networks is computationally demanding, particularly when hyperparameter tuning is required for first-order optimization methods such as stochastic gradient descent and Adam. Domain decomposition methods from scientific computing offer a framework for distributed computation. Among them, additive domain decomposition methods enable fully parallel processing. This thesis investigates the stochastic additively preconditioned trust-region strategy (SAPTS), which combines domain decomposition with trust-region optimization to reduce hyperparameter sensitivity. We formulate three SAPTS variants for neural network training: one for data parallelism and two for parameter-space decomposition. We implement these algorithms in PyTorch and evaluate their performance on three distinct problem classes: physics-informed neural networks solving partial differential equations, image classification on MNIST and CIFAR-10, and language modelling on sequential text data. Our empirical evaluation characterizes the convergence behaviour, computational efficiency, and scalability properties of each variant relative to SGD and Adam baselines. Results indicate that SAPTS achieves competitive performance with minimal hyperparameter tuning on physics-informed problems. For image classification, SAPTS performs well on MNIST while showing slightly higher loss on CIFAR-10. Language modelling tasks reveal performance gaps compared to tuned first-order methods, suggesting that sequential data structures present challenges for domain decomposition approaches. We conclude that SAPTS is particularly well-suited to physics-informed applications and domains where hyperparameter tuning constraints are significant. Our findings further suggest that SAPTS could serve as an effective pre-training method, where the reduced hyperparameter tuning requirements may justify the higher per-iteration computational cost during an initial training phase before standard fine-tuning.
Dissertation Committee:
- Prof. Rolf Krause, Università della Svizzera italiana, Switzerland (Research Advisor)
- Prof. Alena Kopanicakova, Università della Svizzera italiana, Switzerland (Research co-Advisor)
- Prof. Cesare Alippi, Università della Svizzera italiana, Switzerland (Internal Member)
- Prof. Michael Multerer, Università della Svizzera italiana, Switzerland (Internal Member)
- Prof. Eric Cyr, Sandia National Laboratories, USA (External Member)
- Prof. Alexander Heinlein, TU Delft DIAM, Netherlands (External Member)