On the Design of Novel Attention Mechanism for Enhanced Efficiency of Transformers

Conference

Jha, SK; Jha, S; Ewetz, R; et al. (2024). On the Design of Novel Attention Mechanism for Enhanced Efficiency of Transformers. 10.1145/3649329.3658253

cited authors

  • Jha, SK; Jha, S; Ewetz, R; Velasquez, A

abstract

  • We present a new xor-based attention function for efficient hardware implementation of transformers. While the standard attention mechanism relies on matrix multiplication between the key and the transpose of the query, we propose replacing the computation of this attention function with bitwise xor operations. We mathematically analyze the information-theoretic properties of the standard multiplication-based attention, demonstrating that it preserves input entropy, and then computationally show that the xor-based attention approximately preserves the entropy of its input despite small variations in correlations between the inputs. Across various admittedly simple tasks, including arithmetic, sorting, and text generation, we show comparable performance to baseline methods using scaled GPT models. The xor-based computation of the attention function shows substantial improvement in power consumption, latency, and circuit area compared to the corresponding multiplication-based attention function. This hardware efficiency makes xor-based attention more compelling for the deployment of transformers under tight resource constraints, opening new application domains in sustainable energy-efficient computing. Additional optimizations to the xor-based attention function can further improve the efficiency of transformers.
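
  To make the contrast concrete, below is a minimal NumPy sketch of the idea described in the abstract. It compares baseline scaled dot-product attention with a hypothetical xor-based variant that quantizes queries and keys to 8-bit integers and scores each query/key pair by the number of matching bits (total bit width minus the popcount of the bitwise xor), so no multiplications are used to form the attention scores. The quantization scheme, bit width, and scoring rule here are illustrative assumptions, not the formulation defined in the paper.

  import numpy as np

  def standard_attention(Q, K, V):
      # Baseline scaled dot-product attention: softmax(Q K^T / sqrt(d)) V
      d = Q.shape[-1]
      scores = (Q @ K.T) / np.sqrt(d)
      w = np.exp(scores - scores.max(axis=-1, keepdims=True))
      return (w / w.sum(axis=-1, keepdims=True)) @ V

  def xor_attention(Q, K, V, bits=8):
      # Hypothetical xor-based scoring (illustrative assumption): quantize
      # Q and K to unsigned integers, then score each query/key pair by the
      # number of matching bits, i.e. total bit width minus popcount(xor).
      def quantize(X):
          lo, hi = X.min(), X.max()
          return ((X - lo) / (hi - lo + 1e-9) * (2**bits - 1)).astype(np.uint8)

      Qq, Kq = quantize(Q), quantize(K)
      n, m, d = Qq.shape[0], Kq.shape[0], Qq.shape[1]
      scores = np.empty((n, m))
      for i in range(n):
          for j in range(m):
              xor = np.bitwise_xor(Qq[i], Kq[j])                    # per-dimension xor
              scores[i, j] = d * bits - np.unpackbits(xor).sum()    # matching bits
      scores = scores / np.sqrt(d)
      w = np.exp(scores - scores.max(axis=-1, keepdims=True))
      return (w / w.sum(axis=-1, keepdims=True)) @ V

  rng = np.random.default_rng(0)
  Q, K, V = (rng.normal(size=(4, 16)) for _ in range(3))
  print(standard_attention(Q, K, V).shape, xor_attention(Q, K, V).shape)  # (4, 16) (4, 16)

  Because the scores in this sketch come from bit comparisons rather than products, the inner loop maps naturally onto xor-and-popcount hardware, which is where the savings in power, latency, and circuit area reported in the abstract would come from.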

publication date

  • November 7, 2024

Digital Object Identifier (DOI)