PhD defenses at the Faculty of Informatics

New Architectures for Very Deep Learning

You are cordially invited to attend the PhD Dissertation Defense of Rupesh Kumar SRIVASTAVA on Thursday, February 1st, 2018 at 14h30 in room SI-006 (Informatics building).



Artificial Neural Networks are increasingly being used in complex real-world applications because many-layered (i.e., deep) architectures can now be trained on large quantities of data. However, training even deeper, and therefore more powerful, networks has hit a barrier due to fundamental limitations in the design of existing networks. This thesis develops new architectures that, for the first time, allow very deep networks to be optimized efficiently and reliably. Specifically, it addresses two key issues that hamper credit assignment in neural networks: cross-pattern interference and vanishing gradients.

Cross-pattern interference leads to oscillations of the network's weights that make training inefficient. The proposed Local Winner-Take-All networks reduce interference among computation units in the same layer through local competition. An in-depth analysis of locally competitive networks provides generalizable insights and reveals unifying properties that improve credit assignment.
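As a rough illustration of local competition, the sketch below splits a layer's activations into fixed-size blocks and lets only the largest activation in each block pass through, zeroing the rest. The function name `lwta`, the `block_size` parameter, and the tie-breaking rule are assumptions made for this example; the thesis may define the competition differently.

```python
def lwta(activations, block_size):
    """Local winner-take-all over consecutive blocks of units.

    Hypothetical helper: within each block, the unit with the largest
    activation keeps its value and all other units output 0.0.
    """
    if block_size <= 0 or len(activations) % block_size != 0:
        raise ValueError("layer width must be a positive multiple of block_size")
    out = []
    for start in range(0, len(activations), block_size):
        block = activations[start:start + block_size]
        # max() returns the first maximal index, so ties go to the earlier unit
        winner = max(range(block_size), key=lambda i: block[i])
        out.extend(v if i == winner else 0.0 for i, v in enumerate(block))
    return out


layer_output = lwta([0.2, 0.9, -1.0, -0.5], block_size=2)
```

Because only one unit per block is active for a given input, different input patterns tend to update different subsets of weights, which is the intuition behind reduced interference.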

As network depth increases, vanishing gradients make a network's outputs increasingly insensitive to the weights close to the inputs, causing the failure of gradient-based training. To overcome this limitation, the proposed Highway networks regulate information flow across layers through additional skip connections which are modulated by learned computation units. Their beneficial properties are extended to the sequential domain with Recurrent Highway Networks that gain from increased depth and learn complex sequential transitions without requiring more parameters.
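A gated skip connection of this kind can be sketched as follows, assuming one common formulation in which a transform gate T(x) mixes a nonlinear transform H(x) with the unchanged input: y = H(x) * T(x) + x * (1 - T(x)). The helper names, the tanh and sigmoid choices, and the coupling of the carry gate to 1 - T(x) are assumptions made for this example, not a statement of the exact variants studied in the thesis.

```python
import math


def affine(W, b, x):
    """Compute W @ x + b for a weight matrix given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def highway_layer(x, W_h, b_h, W_t, b_t):
    """One highway layer (illustrative): y = H(x) * T(x) + x * (1 - T(x)).

    W_h and W_t must be square so that H(x), T(x), and x have equal length.
    """
    h = [math.tanh(a) for a in affine(W_h, b_h, x)]                 # transform H(x)
    t = [1.0 / (1.0 + math.exp(-a)) for a in affine(W_t, b_t, x)]   # gate T(x) in (0, 1)
    return [hi * ti + xi * (1.0 - ti) for hi, ti, xi in zip(h, t, x)]


x = [0.5, -2.0]
zeros = [[0.0, 0.0], [0.0, 0.0]]
# A strongly negative gate bias closes the gate, so the input is carried through.
carried = highway_layer(x, zeros, [0.0, 0.0], zeros, [-30.0, -30.0])
```

When the gate is closed the layer behaves almost like the identity, which gives information and gradients a direct path across many layers; the gate parameters are learned, so each unit can decide how much to transform and how much to carry.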


Dissertation Committee:

  • Prof. Jürgen Schmidhuber, Università della Svizzera italiana/IDSIA, Switzerland (Research Advisor)
  • Prof. Michael Bronstein, Università della Svizzera italiana, Switzerland (Internal Member)
  • Prof. Antonio Carzaniga, Università della Svizzera italiana, Switzerland (Internal Member)
  • Prof. Sepp Hochreiter, Johannes Kepler University Linz, Austria (External Member)
  • Prof. Ruslan Salakhutdinov, Carnegie Mellon University, USA (External Member)