Our research focuses on the co-design of algorithms and hardware to better understand, search for, and train deep neural networks at reduced complexity for machine learning applications.


Neural networks in machine learning are critical drivers of new technologies such as automatic speech recognition, self-driving cars, and computer vision.

Unfortunately, training neural networks is computationally intensive and can take weeks, which severely limits efficient search for network models and hyperparameters. As more data becomes available, the problem is further exacerbated because larger, more effective models become desirable.

The ultimate goal of our research is to democratize and distribute the ability to conduct large-scale neural network training and architecture search at high speed using hardware accelerators.

Project supported by NSF award CCF-1763747.

Research Plan

Our interdisciplinary research plan spans theory, hardware architecture and design, software control, and system integration.

We propose a new class of pre-defined sparse neural networks, in which the sparse connection pattern is fixed before training, co-designed with parallel and reconfigurable hardware architectures that maximize circuit speed in field-programmable gate arrays (FPGAs). This algorithm-hardware co-design differentiates our approach from previous work, where sparsity is enforced during or at the end of training.
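
To make the idea of pre-defined sparsity concrete, the following is a minimal sketch in plain Python, not the project's actual implementation. It assumes one simple scheme, a fixed fan-in per output neuron chosen at random before training; the function names (`make_predefined_mask`, `forward`, `sgd_step`) and the layer sizes are hypothetical and chosen only for illustration.

```python
import random

def make_predefined_mask(n_in, n_out, fan_in, seed=0):
    """Choose the connection pattern before any training takes place.

    Assumption for this sketch: every output neuron keeps exactly
    `fan_in` of its `n_in` possible inputs, picked at random.
    """
    rng = random.Random(seed)
    mask = []
    for _ in range(n_out):
        kept = set(rng.sample(range(n_in), fan_in))
        mask.append([i in kept for i in range(n_in)])
    return mask

def forward(x, weights, mask):
    """Compute one output per neuron using only the connections in the mask."""
    return [
        sum(w * xi for w, xi, keep in zip(row, x, mask_row) if keep)
        for row, mask_row in zip(weights, mask)
    ]

def sgd_step(x, grad_out, weights, mask, lr=0.1):
    """Update only the connections that exist; absent ones stay at zero."""
    return [
        [w - lr * g * xi if keep else 0.0
         for w, xi, keep in zip(row, x, mask_row)]
        for row, mask_row, g in zip(weights, mask, grad_out)
    ]

# Hypothetical layer: 8 inputs, 4 outputs, 2 connections per output.
n_in, n_out, fan_in = 8, 4, 2
mask = make_predefined_mask(n_in, n_out, fan_in)

rng = random.Random(1)
weights = [[rng.gauss(0, 1) if keep else 0.0 for keep in row] for row in mask]
x = [rng.gauss(0, 1) for _ in range(n_in)]

y = forward(x, weights, mask)
new_weights = sgd_step(x, [1.0] * n_out, weights, mask)

# Fraction of connections present, known before training starts.
density = fan_in / n_in
```

Because the mask never changes, the number of weights to store and the number of multiply-accumulate operations per neuron are known in advance, which is the property a hardware design can plan around. Pruning-based approaches, by contrast, discover the sparse pattern only during or after training.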

We apply this research to simplify and automate neural architecture search by developing software and algorithms for system performance analysis and for scheduling and managing multiple FPGA boards.