Paper review on “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding” (2018-10-11)
Abstract
- BERT: Bidirectional Encoder Representations from Transformers
- pre-training on unlabeled text from English Wikipedia + BooksCorpus (3,300M words total), then transfer to downstream tasks by fine-tuning on labeled data
- two pre-training objectives: masked language modeling (MLM) and next sentence prediction (NSP); the MLM masking scheme is sketched after this list
- BERT advances the state of the art on eleven NLP tasks
- with minimal task-specific architecture changes (just one additional output layer) and relatively little fine-tuning data and few epochs
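
A minimal sketch of the MLM corruption step described in the paper: 15% of token positions are selected for prediction; of those, 80% are replaced with [MASK], 10% with a random token, and 10% are left unchanged. The toy vocabulary and function name below are illustrative assumptions, not from the paper:

```python
import random

# Hypothetical toy vocabulary; real BERT uses a ~30k WordPiece vocabulary.
VOCAB = ["the", "cat", "sat", "on", "mat", "dog", "ran"]

def mask_tokens(tokens, mask_prob=0.15, seed=None):
    """Apply BERT-style MLM corruption to a list of tokens.

    For each selected position (~15% of tokens):
      - 80% of the time: replace with [MASK]
      - 10% of the time: replace with a random token
      - 10% of the time: keep the original token
    Returns the corrupted tokens and the prediction targets
    (original token at selected positions, None elsewhere).
    """
    rng = random.Random(seed)
    corrupted, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            targets.append(tok)  # the model must predict the original token here
            r = rng.random()
            if r < 0.8:
                corrupted.append("[MASK]")
            elif r < 0.9:
                corrupted.append(rng.choice(VOCAB))
            else:
                corrupted.append(tok)
        else:
            corrupted.append(tok)
            targets.append(None)  # no MLM loss at unselected positions
    return corrupted, targets

if __name__ == "__main__":
    tokens = "the cat sat on the mat".split()
    corrupted, targets = mask_tokens(tokens, seed=0)
    print(corrupted)
    print(targets)
```

Because only the selected positions contribute to the loss and some of them keep their original token, the encoder cannot rely on seeing [MASK] at test time, which is the paper's motivation for the 80/10/10 split.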