This deep learning course provides a comprehensive introduction to attention mechanisms and transformer models, the foundation of modern GenAI systems. Begin by exploring the shift from traditional neural networks to attention-based architectures. Understand how additive, multiplicative, and self-attention improve model accuracy in NLP and vision tasks. Dive into the mechanics of self-attention and how it powers models like GPT and BERT. Progress to mastering multi-head attention and transformer components, and explore their role in advanced text and image generation. Gain real-world insights through demos featuring GPT, DALL·E, LLaMA, and BERT.



What you'll learn
Apply self-attention and multi-head attention in deep learning models
Understand transformer architecture and its key components
Explore the role of attention in powering models like GPT and BERT
Analyze real-world GenAI applications in NLP and image generation
Details to know

June 2025
7 assignments
There are 2 modules in this course
Explore the power of attention mechanisms in modern deep learning. Compare traditional neural architectures with attention-based models to see how additive, multiplicative, and self-attention boost accuracy in NLP and vision tasks. Grasp the core math and flow of self-attention, the engine behind Transformer giants like GPT and BERT, and build a solid base for advanced AI development.
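The self-attention flow covered in this module can be sketched in a few lines of NumPy. The weight matrices, sequence length, and model dimension below are illustrative only, not taken from any pretrained model:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a token sequence X (seq_len, d_model).
    Wq, Wk, Wv are illustrative projection matrices."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # pairwise token similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over each row
    return weights @ V                               # weighted sum of value vectors

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                          # 4 tokens, model dim 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)                                     # (4, 8)
```

Each output row is a context-aware mixture of all value vectors, which is what lets attention "look at" the whole sequence at once.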
What's included
10 videos, 1 reading, 3 assignments
Master multi-head attention and transformer models in this advanced module. Learn how multi-head attention improves context understanding and powers leading transformer architectures. Explore transformer components, text and image generation workflows, and real-world use cases with models like GPT, BERT, LLaMa, and DALL·E. Ideal for building GenAI-powered applications.
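As a rough illustration of the multi-head idea covered in this module, the toy sketch below splits the model dimension into independent heads, attends within each, and concatenates the results. Real transformers also learn per-head query/key/value and output projections, omitted here for brevity:

```python
import numpy as np

def multi_head_attention(X, num_heads=2):
    """Toy multi-head self-attention: slice the model dimension into heads,
    run scaled dot-product attention in each, then concatenate."""
    seq_len, d_model = X.shape
    d_head = d_model // num_heads
    heads = []
    for h in range(num_heads):
        Xh = X[:, h * d_head:(h + 1) * d_head]       # this head's feature slice
        scores = Xh @ Xh.T / np.sqrt(d_head)
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)           # softmax per row
        heads.append(w @ Xh)
    return np.concatenate(heads, axis=-1)            # back to (seq_len, d_model)

X = np.random.default_rng(1).normal(size=(4, 8))
Y = multi_head_attention(X, num_heads=2)
print(Y.shape)                                       # (4, 8)
```

Because each head attends over a different subspace, the heads can specialize in different relationships (e.g. syntax vs. long-range context), which is the intuition behind improved context understanding.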
What's included
11 videos, 4 assignments
Earn a career certificate
Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.
Frequently asked questions
The attention mechanism allows transformer models to focus on relevant parts of input sequences, weighing relationships between tokens to improve context understanding and accuracy in tasks like translation or text generation.
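That weighing of relationships between tokens is a softmax over scaled dot-product scores. A minimal sketch, using made-up query and key vectors:

```python
import numpy as np

def attention_weights(q, K):
    """Weights with which one query token attends to each key token:
    softmax of scaled dot products. Vectors are illustrative."""
    scores = K @ q / np.sqrt(q.size)
    e = np.exp(scores - scores.max())
    return e / e.sum()

q = np.array([1.0, 0.0])                 # query for the current token
K = np.array([[1.0, 0.0],                # key aligned with the query
              [0.0, 1.0],                # orthogonal key
              [0.5, 0.5]])
w = attention_weights(q, K)
print(w)                                 # highest weight on the first key
```

The weights sum to 1, and the most query-similar token receives the largest weight, which is what "focusing on relevant parts of the input" means concretely.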
Yes, ChatGPT is built on the transformer architecture, specifically using a variant of the GPT (Generative Pre-trained Transformer) model, which enables it to generate human-like responses.
The Vision Transformer (ViT) applies self-attention to image patches instead of pixels, enabling the model to capture spatial relationships and global context for accurate image classification and understanding.
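The patch step can be sketched as follows. A patch size of 4 is used here to keep the example tiny; ViT-Base, for instance, uses 16×16-pixel patches:

```python
import numpy as np

def image_to_patches(img, patch=4):
    """Split an image (H, W, C) into flattened non-overlapping patches,
    mirroring ViT's input step. Each row becomes one 'token'."""
    H, W, C = img.shape
    rows, cols = H // patch, W // patch
    p = img[:rows * patch, :cols * patch].reshape(rows, patch, cols, patch, C)
    p = p.transpose(0, 2, 1, 3, 4)                   # group by patch position
    return p.reshape(rows * cols, patch * patch * C)

img = np.zeros((8, 8, 3))                            # tiny 8x8 RGB image
patches = image_to_patches(img)
print(patches.shape)                                 # (4, 48): 4 patches of 4*4*3 values
```

Each flattened patch is then linearly projected and fed to self-attention exactly like a word embedding, which is how ViT captures global context across the whole image.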
Financial aid available.