Friends of GPT

BERT (Bidirectional Encoder Representations from Transformers) is a language model developed by Google that uses a transformer encoder architecture to produce bidirectional representations of text. It is a "large" model, meaning it has a large number of parameters, which allows it to achieve strong performance on a wide range of NLP tasks. BERT is pre-trained on a large corpus of text, allowing it to learn the complex patterns and relationships present in natural language. It can then be fine-tuned on specific tasks, such as text classification or question answering, by adding task-specific layers on top of the pre-trained model. Overall, BERT is a powerful and widely used tool for NLP tasks. The code for BERT can be found at this GitHub repository.
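To make the fine-tuning step concrete, here is a minimal sketch using the Hugging Face transformers library (not the Google repository linked above); the checkpoint name, labels, and hyperparameters are illustrative placeholders, not a prescribed recipe.

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Load the pre-trained encoder and add a task-specific classification head.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=2,  # the classification head on top of BERT is randomly initialized
)

# A toy batch of labelled examples for the downstream task.
texts = ["a great movie", "a terrible movie"]
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

# One fine-tuning step: the loss updates both the new head and the BERT weights.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
outputs = model(**batch, labels=labels)
outputs.loss.backward()
optimizer.step()
```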


RoBERTa builds on BERT’s language masking strategy, wherein the system learns to predict intentionally hidden sections of text within otherwise unannotated language examples. RoBERTa, which was implemented in PyTorch, modifies key hyperparameters in BERT, including removing BERT’s next-sentence pretraining objective and training with much larger mini-batches and learning rates. This allows RoBERTa to improve on the masked language modeling objective compared with BERT and leads to better downstream task performance. RoBERTa was created by the Facebook AI Research group; you can view the paper describing RoBERTa at this link.
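As a quick illustration of that masked language modeling objective, the sketch below queries a RoBERTa checkpoint through the Hugging Face transformers fill-mask pipeline; the checkpoint name and example sentence are illustrative.

```python
from transformers import pipeline

# Load a pre-trained RoBERTa checkpoint behind the standard fill-mask pipeline.
fill_mask = pipeline("fill-mask", model="roberta-base")

# The model predicts the intentionally hidden token, marked here with <mask>.
for prediction in fill_mask("The capital of France is <mask>."):
    print(prediction["token_str"], round(prediction["score"], 3))
```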


XLNet shot to fame after it beat BERT on roughly 20 NLP tasks, sometimes by quite substantial margins. So, what is XLNet and how is it different from BERT? XLNet has a similar architecture to BERT; the major difference lies in its approach to pre-training. BERT is an autoencoding (AE) based model, while XLNet is an autoregressive (AR) based model. This allows XLNet to capture more dependency pairs for the same prediction target and gives it "denser" effective training signals than BERT. The paper describing XLNet can be found here.
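The toy sketch below (plain Python, not the actual XLNet implementation) illustrates why the autoregressive, permutation-based objective can capture dependency pairs that the autoencoding objective misses; the sentence, target positions, and factorization order are made up for illustration.

```python
tokens = ["New", "York", "is", "a", "city"]
targets = [0, 1]  # positions of "New" and "York" chosen as prediction targets

# Autoencoding (BERT-style): both targets are masked at the same time, so each
# is predicted only from the unmasked context and never from the other target.
ae_context = [t for i, t in enumerate(tokens) if i not in targets]
print("AE context for 'New' and for 'York':", ae_context)

# Autoregressive (XLNet-style): a factorization order over positions is
# sampled, and each target is predicted from everything that precedes it in
# that order, so one target can condition on the other (e.g. 'York' on 'New').
order = [2, 3, 4, 0, 1]  # one sampled permutation of the five positions
for pos in targets:
    seen = [tokens[i] for i in order[: order.index(pos)]]
    print(f"AR context for {tokens[pos]!r} under this order:", seen)
```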


If you are interested in comparing and contrasting the various Transformer models listed on this page, there is a nice cheat sheet posted to Medium by NL Plation which provides a great explanation. See the Medium article here.
