- Deep learning model: a type of neural network architecture https://aws.amazon.com/what-is/transformers-in-artificial-intelligence/
- Introduced in the 2017 paper "Attention Is All You Need" by Ashish Vaswani et al. at Google Brain https://arxiv.org/abs/1706.03762
- Can translate text and speech in near-real-time. Reference: https://www.ibm.com/topics/transformer-model
Used by:
- OpenAI’s popular ChatGPT
- BERT Model (Bidirectional Encoder Representations from Transformers)
Primary Innovations:
- Positional Encoding
- Self-Attention: lets the model pay more attention to the most relevant tokens in the input
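The two innovations above can be sketched in a few lines of NumPy. The sinusoidal positional encoding and scaled dot-product attention formulas follow the Vaswani et al. paper; the projection-matrix names (Wq, Wk, Wv) are common convention, not part of the notes above:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the max before exponentiating
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # Project each token vector into query, key, and value spaces
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    # Each token scores every other token; scaling by sqrt(d_k) keeps
    # the dot products from saturating the softmax
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)  # rows sum to 1
    # Output: each token becomes a weighted mix of all value vectors
    return weights @ V

def positional_encoding(seq_len, d_model):
    # Sinusoidal encoding: sin on even dimensions, cos on odd dimensions
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model)[None, :]
    angle = pos / np.power(10000, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))
```

The positional encoding is added to the token embeddings before attention, since attention itself is order-agnostic and needs positions injected explicitly.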
Key Terms:
- Tokens: text converted into numerical representations; each token is then contextualized within the scope of the context window alongside the other tokens https://en.wikipedia.org/wiki/Transformer_(deep_learning_architecture)
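As a rough illustration of tokenization, a hypothetical whitespace tokenizer is sketched below; production models such as GPT actually use subword schemes (e.g. byte-pair encoding), so this is only to show the text-to-integer mapping:

```python
def tokenize(text, vocab):
    """Map each whitespace-separated word to an integer ID,
    growing the vocabulary as new words appear."""
    ids = []
    for word in text.lower().split():
        if word not in vocab:
            vocab[word] = len(vocab)  # assign the next free ID
        ids.append(vocab[word])
    return ids

vocab = {}
print(tokenize("attention is all you need", vocab))  # → [0, 1, 2, 3, 4]
print(tokenize("attention is attention", vocab))     # → [0, 1, 0]
```

These integer IDs are what the embedding layer turns into the dense vectors the attention mechanism operates on.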
Visual Explanation:
- Attention in transformers, visually explained | Chapter 6, Deep Learning https://www.youtube.com/watch?v=eMlx5fFNoYc
Example:
https://aws.amazon.com/what-is/transformers-in-artificial-intelligence/
References:
- What is a transformer model? https://www.ibm.com/topics/transformer-model
- https://www.datacamp.com/tutorial/how-transformers-work
- https://towardsdatascience.com/transformers-141e32e69591
- https://mchromiak.github.io/articles/2017/Sep/12/Transformer-Attention-is-all-you-need/#.XIWlzBNKjOR
- https://jalammar.github.io/illustrated-transformer/
- https://towardsdatascience.com/openai-gpt-2-understanding-language-generation-through-visualization-8252f683b2f8