| Model | Year | Training data | Architecture | Key hyperparameters | Typical applications |
|---|---|---|---|---|---|
| GPT-3 | 2020 | Filtered Common Crawl, WebText2, books, English Wikipedia | Transformer (decoder-only) | Number of layers, hidden size, sequence length | Language generation, chatbots, Q&A systems |
| BERT | 2018 | BookCorpus, English Wikipedia | Transformer (encoder-only) | Number of layers, hidden size, sequence length | Language understanding, sentiment analysis, text classification |
| ELMo | 2018 | 1 Billion Word Benchmark | BiLSTM | Number of layers, hidden size, sequence length | Language understanding, sentiment analysis, text classification |
| GPT-2 | 2019 | WebText | Transformer (decoder-only) | Number of layers, hidden size, sequence length | Language generation, chatbots, Q&A systems |
| ULMFiT | 2018 | WikiText-103 (pretraining); IMDb, AG News (fine-tuning) | AWD-LSTM | Number of layers, hidden size, sequence length | Language understanding, sentiment analysis, text classification |
| Transformer-XL | 2019 | WikiText-103, enwik8, text8, One Billion Word, Penn Treebank | Transformer (decoder with segment-level recurrence) | Number of layers, hidden size, sequence length | Long-context language modeling, language generation |
| RoBERTa | 2019 | BookCorpus, English Wikipedia, CC-News, OpenWebText, Stories | Transformer (encoder-only) | Number of layers, hidden size, sequence length | Language understanding, sentiment analysis, text classification |
| ALBERT | 2019 | BookCorpus, English Wikipedia | Transformer (encoder-only) | Number of layers, hidden size, sequence length | Language understanding, sentiment analysis, text classification |
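
The hyperparameters listed for each model (number of layers, hidden size, sequence length) largely set the size and cost of a Transformer. The sketch below is a rough illustration, not any model's official accounting: it assumes a standard block with four hidden-by-hidden attention projections and a feed-forward layer four times the hidden size, and it ignores biases, layer norms, and embeddings. The function names and the example values (12 layers, hidden size 768, 12 heads, 512 tokens) are chosen here for illustration only.

```python
def transformer_block_params(num_layers: int, hidden_size: int) -> int:
    """Approximate weight count of the stacked Transformer blocks.

    Assumption: each layer holds 4 attention projections of shape
    (hidden, hidden) and a feed-forward pair of shape (hidden, 4 * hidden),
    which gives about 12 * hidden^2 weights per layer.
    Biases, layer norms, and embedding tables are not counted.
    """
    attention = 4 * hidden_size * hidden_size
    feed_forward = 2 * hidden_size * (4 * hidden_size)
    return num_layers * (attention + feed_forward)


def attention_score_entries(seq_len: int, num_heads: int) -> int:
    """Attention score entries per layer for one sequence.

    Each head builds a seq_len x seq_len score matrix, so this grows
    quadratically with sequence length.
    """
    return num_heads * seq_len * seq_len


if __name__ == "__main__":
    # Illustrative configuration, not taken from the table above.
    print(transformer_block_params(num_layers=12, hidden_size=768))
    print(attention_score_entries(seq_len=512, num_heads=12))
```

Under these assumptions, doubling the hidden size roughly quadruples the block weights, while doubling the sequence length quadruples the attention score entries, which is why sequence length is usually the tighter constraint on memory.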