Transformers: Product Documentation

The Transformer Architecture

Overview

The Transformer model is built around "attention": the idea of focusing on only the handful of details relevant to a particular task while down-weighting the rest.

Conceptually, it mimics the human cognitive process of selectively focusing on specific parts of an information resource, rather than all of it, to solve a problem. From machine translation to text classification and beyond, Transformer models have redefined the state of the art across several machine learning (ML) research domains and continue to enable spectacular breakthroughs within the broader artificial intelligence community. This project therefore aims to explain:

  • what the Transformer model is

  • how the Transformer model works

  • the important details you need to build a useful intuition

  • some technical implementations to solve specific real-world tasks
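To make the idea of "attention" concrete before diving in, here is a minimal sketch of scaled dot-product attention, the core operation inside a Transformer, written with NumPy. The function name and toy inputs are illustrative, not taken from this project's codebase:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Score each query against every key, softmax the scores,
    and return a weighted blend of the values."""
    d_k = K.shape[-1]
    # Similarity between each query and each key, scaled for stability
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax turns scores into weights that focus on a few positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

# Toy example: 2 queries attending over 3 key/value pairs
Q = np.array([[1.0, 0.0], [0.0, 1.0]])
K = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
V = np.array([[10.0], [20.0], [30.0]])
out, w = scaled_dot_product_attention(Q, K, V)
```

Each row of `w` sums to 1, so every output row is a convex combination of the value vectors: the model "attends" most to whichever positions score highest.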

Scope

Based on the Divio documentation system, presented by Daniele Procida at PyCon Australia 2017, the project is designed to cover the following facets of the Transformer model:

  • Getting Started

    • Provides a quick walkthrough of the steps needed to use Transformers for sentiment classification of text reviews

  • How-To Guide

    • How Transformers can be designed for machine translation, i.e., translating English sentences into Spanish

  • Important Concepts

    • What is the Transformer model?

    • Why is it more successful than traditional sequence-to-sequence (seq2seq) deep learning models?

    • How does the theoretical design of the Transformer model ensure improved learning, in comparison to traditional deep recurrent neural networks?

  • References & Further Reading

    • All references used to create this documentation and resources needed to correctly understand and implement Transformer models for real-world tasks.

Outcomes

At the end of the documentation, you will have a clear understanding of:

  • what the Transformer model is and why it is significant

  • why this model performs better than traditional seq2seq deep learning models

  • how you can build this model for your task

Target Audience

The documentation is meant for readers interested in, or working on:

  • understanding how attention models work

  • understanding the mathematics underpinning attention models

  • understanding how attention models can be implemented

  • novel solutions to natural language processing tasks

  • the latest deep learning research ideas

Prerequisites

The documentation assumes:

  • You already have a background in machine learning and deep learning.

  • You understand the mathematical basics of deep neural networks, especially:

    • linear algebra

    • differential calculus

  • You understand how recurrent neural networks work.

  • You understand the basics of Natural Language Processing like vectorisation, tokenisation, and embeddings.

  • You have prior knowledge about seq2seq models.

  • You understand the basics of the attention mechanism.

  • You have prior experience with computer programming in Python.

  • You have prior experience with deep learning libraries such as Keras and TensorFlow.

Project Link

Transformer: Project Documentation (on GitHub)
