BERT

2019/11/06

Paper review of "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding" (2018-10-11)

Abstract

  • Bidirectional Encoder Representations from Transformers
  • Pre-training on unlabeled data from English Wikipedia + BooksCorpus (3,300M words in total), followed by transfer learning on labeled data
  • Two pre-training objectives: masked language modeling (MLM) and next sentence prediction (NSP); see the masking sketch after this list
  • BERT advances the state of the art for eleven NLP tasks
  • Fine-tuning needs only minor architecture changes (typically one task-specific output layer), modest labeled data, and few epochs; a fine-tuning sketch follows below
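
To make the MLM objective concrete, here is a minimal sketch of BERT-style input masking. The 15% selection rate and the 80/10/10 replacement split follow the paper; the token list, vocabulary, and helper function are illustrative only.

```python
import random

MASK_TOKEN = "[MASK]"

def mask_tokens(tokens, vocab, mask_prob=0.15, seed=None):
    # BERT-style MLM masking: select ~15% of positions as prediction targets,
    # then replace 80% with [MASK], 10% with a random token, and keep 10% unchanged.
    rng = random.Random(seed)
    masked = list(tokens)
    labels = [None] * len(tokens)              # None = position not selected
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            labels[i] = tok                    # model must predict the original token
            r = rng.random()
            if r < 0.8:
                masked[i] = MASK_TOKEN         # 80%: replace with [MASK]
            elif r < 0.9:
                masked[i] = rng.choice(vocab)  # 10%: replace with a random token
            # remaining 10%: leave the token unchanged
    return masked, labels

# example from the paper's running sentence; vocab is a toy list
tokens = "my dog is hairy".split()
vocab = ["my", "dog", "is", "hairy", "cat", "runs"]
print(mask_tokens(tokens, vocab, seed=0))
```

The model is trained to predict the original tokens only at the selected positions, which forces it to use bidirectional context.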
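
And for the fine-tuning point, a minimal sketch of sentence classification on top of a pre-trained checkpoint, assuming the Hugging Face `transformers` and `torch` packages are installed; the checkpoint name, example sentence, and label are illustrative.

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# the only newly initialized parameters are in the classification head over [CLS];
# all other weights come from pre-training and are updated jointly
inputs = tokenizer("the movie was great", return_tensors="pt")
labels = torch.tensor([1])

outputs = model(**inputs, labels=labels)
outputs.loss.backward()  # fine-tune end-to-end for a few epochs
```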

BERT-PDF-download