In this series of workshops, we learned about a relatively new kind of neural network called the Transformer. Like RNNs, transformers are great for working with sequences of data such as text, time series, audio, and even video and images.
In recent months, transformers have been at the core of some of the greatest breakthroughs in language models. If you've used ChatGPT or Bing Chat, you've seen what transformers are capable of.
This workshop series followed a two-week format.
In the first week, members learned how transformers use the attention mechanism to model complex relationships between tokens in a sequence, and how they can process an entire input in one forward pass, letting them train faster than recurrent networks, which handle tokens one at a time.
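As a rough illustration of that idea, here is a minimal sketch of scaled dot-product self-attention in plain NumPy (this is a simplified, single-head version, not the exact code from the workshop):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention over a whole sequence at once.

    Q, K, V: arrays of shape (seq_len, d_k) for queries, keys, and values.
    Every token attends to every other token via one matrix multiplication,
    which is what lets transformers process the full input in a single pass.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                      # (seq_len, seq_len) similarity scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax over each row
    return weights @ V                                   # weighted sum of value vectors

# Toy example: 4 tokens with 8-dimensional embeddings, self-attention (Q = K = V)
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (4, 8)
```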
In particular, we explored how transformers are used in language modelling to translate, generate, and classify text. Members trained their own transformer to classify product reviews, tweets, and other sentences as positive or negative.
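For a sense of what that kind of classifier looks like in practice, here is a short sketch using the Hugging Face transformers library (an assumption for illustration; the workshop may have used a different setup):

```python
from transformers import pipeline

# Assumes the Hugging Face transformers library is installed;
# the pipeline downloads a default pretrained sentiment model on first run.
classifier = pipeline("sentiment-analysis")

reviews = [
    "This product exceeded my expectations!",
    "Arrived broken and support never answered my emails.",
]
for review, result in zip(reviews, classifier(reviews)):
    print(f"{result['label']} ({result['score']:.2f}): {review}")
```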
Members learned that the process they used is called fine-tuning.
There are many publicly available models that were trained on massive datasets and perform well on general language tasks. A common approach is to start with one of these pretrained models as a base and train it further on a smaller, task-specific dataset.
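A minimal sketch of fine-tuning, assuming the Hugging Face transformers and datasets libraries and the public IMDB reviews dataset (the model, dataset, and hyperparameters here are illustrative choices, not the workshop's exact configuration):

```python
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)
from datasets import load_dataset

# Start from a general-purpose pretrained model (illustrative choice)...
model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# ...and a small labelled dataset for the specific task (here, movie-review sentiment)
dataset = load_dataset("imdb")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

dataset = dataset.map(tokenize, batched=True)

# Fine-tune: continue training the pretrained weights on the task-specific data
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned-sentiment",
                           num_train_epochs=1,
                           per_device_train_batch_size=16),
    train_dataset=dataset["train"].shuffle(seed=0).select(range(2000)),
    eval_dataset=dataset["test"].select(range(500)),
)
trainer.train()
```

Because the base model already knows a great deal about language in general, even a small labelled dataset and a single training epoch can produce a usable task-specific classifier.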