Attention in Machine Learning
Attention mechanisms in machine learning allow models to focus on the most relevant parts of input data when performing tasks such as translation, summarization, or image analysis. First introduced in neural machine translation, attention has since become a core component in deep learning architectures, most notably in the Transformer model (Vaswani et al., 2017).
Origins and Development
The attention mechanism was introduced by Bahdanau, Cho, and Bengio in 2014 to address a limitation of sequence-to-sequence translation models, which compressed the entire input into a single fixed-length vector. Their mechanism, now commonly called additive attention, instead learns to align each output word with the most relevant input positions (Bahdanau et al., 2014). This improved performance on long sequences and inspired further innovations in the field.
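The following is a minimal NumPy sketch of additive attention scoring. It is illustrative only: the weight matrices, dimensions, and random initialization are placeholders rather than the exact configuration used by Bahdanau et al.

```python
import numpy as np

def additive_attention(decoder_state, encoder_states, W_s, W_h, v):
    """Additive (Bahdanau-style) attention: score each encoder position
    against the current decoder state, then build a weighted context vector.

    decoder_state:  (d_dec,)    current decoder hidden state
    encoder_states: (T, d_enc)  encoder hidden states for T input positions
    W_s, W_h, v:    parameters of the scoring network (learned in practice)
    """
    # Score e_j = v^T tanh(W_s s + W_h h_j) for every input position j
    scores = np.tanh(encoder_states @ W_h.T + decoder_state @ W_s.T) @ v  # (T,)
    # Softmax turns scores into alignment weights that sum to 1
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Context vector: weighted average of the encoder states
    context = weights @ encoder_states                                     # (d_enc,)
    return context, weights

# Toy usage with illustrative dimensions
rng = np.random.default_rng(0)
T, d_enc, d_dec, d_att = 5, 8, 8, 16
context, weights = additive_attention(
    rng.standard_normal(d_dec),
    rng.standard_normal((T, d_enc)),
    rng.standard_normal((d_att, d_dec)),
    rng.standard_normal((d_att, d_enc)),
    rng.standard_normal(d_att),
)
```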
Luong et al. (2015) later refined this approach, distinguishing global attention, which attends to all input positions, from local attention, which attends to a window around a predicted position, and introduced multiplicative scoring, which simplified computation and improved training speed.
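For contrast, a sketch of multiplicative scoring in the spirit of Luong et al.: the score is a dot product or bilinear product rather than a small feed-forward network. Shapes and the random inputs are again purely illustrative.

```python
import numpy as np

def multiplicative_scores(decoder_state, encoder_states, W=None):
    """Multiplicative (Luong-style) attention scores.

    dot form:     score_j = s . h_j       (decoder and encoder dims must match)
    general form: score_j = s^T W h_j     (W is a learned d_dec x d_enc matrix)
    """
    if W is None:
        return encoder_states @ decoder_state       # (T,) dot-product scores
    return encoder_states @ W.T @ decoder_state     # (T,) bilinear ("general") scores

# Toy usage with illustrative dimensions
rng = np.random.default_rng(0)
scores = multiplicative_scores(rng.standard_normal(8),
                               rng.standard_normal((5, 8)))
```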
Self-Attention and Transformers
The major breakthrough came with self-attention, in which each element of the input sequence attends to all others, enabling the model to capture global dependencies regardless of distance. This idea was central to the Transformer architecture, which replaced recurrence entirely with attention mechanisms (Vaswani et al., 2017). As a result, Transformers became the foundation for models such as BERT, GPT, and T5.
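As a rough illustration of self-attention, the sketch below implements single-head scaled dot-product attention of the kind described by Vaswani et al. (2017). The projection matrices here are random placeholders; in a trained Transformer they are learned, and multiple heads are used in parallel.

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention.

    X:             (T, d_model)  input sequence of T token embeddings
    W_q, W_k, W_v: (d_model, d_k) projection matrices (random here, learned in practice)
    """
    Q, K, V = X @ W_q, X @ W_k, X @ W_v          # queries, keys, values: (T, d_k)
    d_k = K.shape[-1]
    # Every position attends to every other position: (T, T) score matrix
    scores = Q @ K.T / np.sqrt(d_k)
    # Row-wise softmax gives each position's attention weights over all positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                            # (T, d_k) contextualized outputs

# Toy usage
rng = np.random.default_rng(0)
T, d_model, d_k = 4, 16, 8
X = rng.standard_normal((T, d_model))
out = self_attention(X, *(rng.standard_normal((d_model, d_k)) for _ in range(3)))
```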
Applications
Attention is widely used in natural language processing, computer vision, and speech recognition. In NLP, it improves context understanding in tasks like question answering and summarization. In vision, visual attention helps models focus on relevant image regions, enhancing object detection and image captioning.
References
- Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural Machine Translation by Jointly Learning to Align and Translate. arXiv:1409.0473
- Luong, M. T., Pham, H., & Manning, C. D. (2015). Effective Approaches to Attention-based Neural Machine Translation. arXiv:1508.04025
- Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention Is All You Need. Advances in Neural Information Processing Systems