Transformers: Product Documentation
Overview
The Transformer model is built on the idea of paying "attention" to only a handful of relevant details while down-weighting the rest when completing a particular task.
Conceptually, it mimics the human cognitive process of selectively focusing on specific parts of an information resource, rather than on all of it, to solve a problem. From machine translation to other natural language processing tasks and beyond, Transformer models have come to redefine the state of the art across several machine learning (ML) research domains and continue to enable spectacular breakthroughs within the broader artificial intelligence community. This project therefore aims to explain:
what the Transformer model is
how the Transformer model works
all the important details you need to build a useful intuition
some technical implementations to solve specific real-world tasks
Scope
Based on the Divio documentation system, presented by Daniele Procida at PyCon Australia 2017, the project is designed to cover the following facets of the Transformer model:
Getting Started
Provides a quick walkthrough of the steps needed to use a Transformer for sentiment classification of review text (previewed in the sketch below).
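As a preview, here is a minimal sketch of what such a classifier could look like in Keras, the library assumed in the Prerequisites. The vocabulary size, sequence length, and layer widths are illustrative placeholders, not values prescribed by this documentation:

```python
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE = 20_000   # assumed vocabulary size (placeholder)
SEQ_LEN = 200         # assumed maximum review length in tokens (placeholder)
EMBED_DIM = 32        # assumed embedding width (placeholder)

class PositionalEmbedding(layers.Layer):
    """Sum of a learned token embedding and a learned position embedding,
    so the model knows both what each token is and where it occurs."""
    def __init__(self, seq_len, vocab_size, embed_dim):
        super().__init__()
        self.tok = layers.Embedding(vocab_size, embed_dim)
        self.pos = layers.Embedding(seq_len, embed_dim)
        self.seq_len = seq_len

    def call(self, x):
        positions = tf.range(start=0, limit=self.seq_len, delta=1)
        return self.tok(x) + self.pos(positions)

inputs = layers.Input(shape=(SEQ_LEN,), dtype="int32")
x = PositionalEmbedding(SEQ_LEN, VOCAB_SIZE, EMBED_DIM)(inputs)

# One encoder block: multi-head self-attention and a feed-forward layer,
# each wrapped in a residual connection with layer normalisation.
attn = layers.MultiHeadAttention(num_heads=2, key_dim=EMBED_DIM)(x, x)
x = layers.LayerNormalization()(x + attn)
ff = layers.Dense(EMBED_DIM, activation="relu")(x)
x = layers.LayerNormalization()(x + ff)

# Pool over the sequence and predict positive/negative sentiment.
x = layers.GlobalAveragePooling1D()(x)
outputs = layers.Dense(1, activation="sigmoid")(x)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```

Trained on tokenised reviews labelled positive or negative, this encoder-only setup is the pattern the walkthrough develops step by step.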
How-To Guide
Shows how a Transformer can be designed for machine translation, i.e., translating English sentences into Spanish (previewed in the sketch below).
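To make the how-to guide's goal concrete, the sketch below shows one minimal way the encoder-decoder attention pattern for translation could be wired up in Keras. Positional embeddings and feed-forward sublayers are omitted for brevity, the vocabulary sizes and sequence length are illustrative assumptions, and the `use_causal_mask` flag assumes a reasonably recent TensorFlow release:

```python
import tensorflow as tf
from tensorflow.keras import layers

EMBED_DIM = 64
ENG_VOCAB = 15_000   # assumed English vocabulary size (placeholder)
SPA_VOCAB = 15_000   # assumed Spanish vocabulary size (placeholder)
SEQ_LEN = 40         # assumed sentence length in tokens (placeholder)

# Encoder: self-attention over the English source sentence.
enc_in = layers.Input(shape=(SEQ_LEN,), dtype="int32", name="english")
enc_emb = layers.Embedding(ENG_VOCAB, EMBED_DIM)(enc_in)
enc_att = layers.MultiHeadAttention(num_heads=2, key_dim=EMBED_DIM)(enc_emb, enc_emb)
enc = layers.LayerNormalization()(enc_emb + enc_att)

# Decoder: causal self-attention over the Spanish tokens generated so far,
# then cross-attention that lets each target position look at the source.
dec_in = layers.Input(shape=(SEQ_LEN,), dtype="int32", name="spanish")
dec_emb = layers.Embedding(SPA_VOCAB, EMBED_DIM)(dec_in)
dec_att = layers.MultiHeadAttention(num_heads=2, key_dim=EMBED_DIM)(
    dec_emb, dec_emb, use_causal_mask=True)
dec = layers.LayerNormalization()(dec_emb + dec_att)
cross = layers.MultiHeadAttention(num_heads=2, key_dim=EMBED_DIM)(dec, enc)
dec = layers.LayerNormalization()(dec + cross)

# Predict the next Spanish token at every target position.
out = layers.Dense(SPA_VOCAB, activation="softmax")(dec)
model = tf.keras.Model([enc_in, dec_in], out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```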
Important Concepts
What is the Transformer model?
Why is it more successful than traditional sequence-to-sequence (seq2seq) deep learning models?
How does the theoretical design of the Transformer model ensure improved learning, in comparison to traditional deep recurrent neural networks?
References & Further Reading
Lists all references used to create this documentation, along with resources needed to correctly understand and implement Transformer models for real-world tasks.
Outcomes
At the end of the documentation, you will have a clear understanding of:
what the Transformer model is and why it is significant
why this model performs better than traditional seq2seq deep learning models
how you can build this model for your task
Target Audience
The documentation is meant for readers interested in or working on:
understanding how attention models work
understanding the mathematics underpinning attention models
understanding how attention models can be implemented
novel solutions to natural language processing tasks
the latest deep learning research ideas
Prerequisites
The documentation assumes:
You already have a background in machine learning and deep learning.
You understand the mathematical basics of deep neural networks, especially:
linear algebra
differential calculus
You understand how recurrent neural networks work.
You understand the basics of natural language processing, such as vectorisation, tokenisation, and embeddings.
You have prior knowledge of seq2seq models.
You understand the basics of the attention mechanism (a short refresher sketch follows this list).
You have prior experience with computer programming in Python.
You have prior experience with deep learning libraries such as Keras and TensorFlow.
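As a quick refresher on the attention prerequisite, here is a minimal NumPy sketch of scaled dot-product attention, the core operation the rest of this documentation builds on; the shapes and random inputs are purely illustrative:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over the keys
    return weights @ V                                # weighted sum of values

# Toy example: 3 query positions, 4 key/value positions, dimension 8.
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)    # (3, 8)
```

Each output row is a weighted average of the value vectors, with weights determined by how well the corresponding query matches each key.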