Attention in Machine Learning
Attention mechanisms in machine learning allow models to focus on the most relevant parts of input data when performing tasks such as translation, summarization, or image analysis. First introduced in neural machine translation, attention has since become a core component in deep learning architectures, most notably in the Transformer model (Vaswani et al., 2017).
Origins and Development
The attention mechanism was introduced by Bahdanau, Cho, and Bengio in 2014 to address a limitation of sequence-to-sequence translation models, which compressed the entire input into a single fixed-length vector. Their mechanism, now commonly called additive attention, instead learns to align each output word with the most relevant input positions (Bahdanau et al., 2014). This improved performance on long sequences and inspired further innovations in the field.
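The following is a minimal NumPy sketch of additive attention scoring. It is illustrative only: the weight matrices, dimensions, and random initialization are placeholders rather than the exact configuration used by Bahdanau et al.

```python
import numpy as np

def additive_attention(decoder_state, encoder_states, W_s, W_h, v):
    """Additive (Bahdanau-style) attention: score each encoder position
    against the current decoder state, then build a weighted context vector.

    decoder_state:  (d_dec,)    current decoder hidden state
    encoder_states: (T, d_enc)  encoder hidden states for T input positions
    W_s, W_h, v:    parameters of the scoring network (learned in practice)
    """
    # Score e_j = v^T tanh(W_s s + W_h h_j) for every input position j
    scores = np.tanh(encoder_states @ W_h.T + decoder_state @ W_s.T) @ v  # (T,)
    # Softmax turns scores into alignment weights that sum to 1
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Context vector: weighted average of the encoder states
    context = weights @ encoder_states                                     # (d_enc,)
    return context, weights

# Toy usage with illustrative dimensions
rng = np.random.default_rng(0)
T, d_enc, d_dec, d_att = 5, 8, 8, 16
context, weights = additive_attention(
    rng.standard_normal(d_dec),
    rng.standard_normal((T, d_enc)),
    rng.standard_normal((d_att, d_dec)),
    rng.standard_normal((d_att, d_enc)),
    rng.standard_normal(d_att),
)
```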
Luong et al. (2015) later refined this approach, distinguishing global attention, which attends to all input positions, from local attention, which attends to a window around a predicted position, and introduced multiplicative scoring, which simplified computation and improved training speed.
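For contrast, a sketch of multiplicative scoring in the spirit of Luong et al.: the score is a dot product or bilinear product rather than a small feed-forward network. Shapes and the random inputs are again purely illustrative.

```python
import numpy as np

def multiplicative_scores(decoder_state, encoder_states, W=None):
    """Multiplicative (Luong-style) attention scores.

    dot form:     score_j = s . h_j       (decoder and encoder dims must match)
    general form: score_j = s^T W h_j     (W is a learned d_dec x d_enc matrix)
    """
    if W is None:
        return encoder_states @ decoder_state       # (T,) dot-product scores
    return encoder_states @ W.T @ decoder_state     # (T,) bilinear ("general") scores

# Toy usage with illustrative dimensions
rng = np.random.default_rng(0)
scores = multiplicative_scores(rng.standard_normal(8),
                               rng.standard_normal((5, 8)))
```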
Self-Attention and Transformers
The major breakthrough came with self-attention, in which each element of the input sequence attends to all others, enabling the model to capture global dependencies regardless of distance. This idea was central to the Transformer architecture, which replaced recurrence entirely with attention mechanisms (Vaswani et al., 2017). As a result, Transformers became the foundation for models such as BERT, GPT, and T5.
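As a rough illustration of self-attention, the sketch below implements single-head scaled dot-product attention of the kind described by Vaswani et al. (2017). The projection matrices here are random placeholders; in a trained Transformer they are learned, and multiple heads are used in parallel.

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention.

    X:             (T, d_model)  input sequence of T token embeddings
    W_q, W_k, W_v: (d_model, d_k) projection matrices (random here, learned in practice)
    """
    Q, K, V = X @ W_q, X @ W_k, X @ W_v          # queries, keys, values: (T, d_k)
    d_k = K.shape[-1]
    # Every position attends to every other position: (T, T) score matrix
    scores = Q @ K.T / np.sqrt(d_k)
    # Row-wise softmax gives each position's attention weights over all positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                            # (T, d_k) contextualized outputs

# Toy usage
rng = np.random.default_rng(0)
T, d_model, d_k = 4, 16, 8
X = rng.standard_normal((T, d_model))
out = self_attention(X, *(rng.standard_normal((d_model, d_k)) for _ in range(3)))
```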
Applications
Attention is widely used in natural language processing, computer vision, and speech recognition. In NLP, it improves context understanding in tasks like question answering and summarization. In vision, visual attention helps models focus on relevant image regions, enhancing object detection and image captioning.
References
- Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural Machine Translation by Jointly Learning to Align and Translate. arXiv:1409.0473
- Luong, M. T., Pham, H., & Manning, C. D. (2015). Effective Approaches to Attention-based Neural Machine Translation. arXiv:1508.04025
- Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention Is All You Need. Advances in Neural Information Processing Systems