BERT for text classification

BERT can achieve high accuracy with a small sample size (e.g., 1,000 examples): https://github.com/Socialbird-AILab/BERT-Classification-Tutorial/blob/master/pictures/Results.png

To simply extract features (embeddings) from BERT, this Keras package is easy to start with: https://pypi.org/project/keras-bert/
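
A minimal sketch of feature extraction with keras-bert, assuming the package is installed (`pip install keras-bert`) and a pretrained BERT checkpoint has been downloaded locally; the checkpoint path and the example sentences are hypothetical.

```python
from keras_bert import extract_embeddings

# Hypothetical path to an unzipped pretrained checkpoint (e.g. BERT-Base, Uncased).
model_path = 'uncased_L-12_H-768_A-12'
texts = ['this product is great', 'terrible customer service']

# Returns one array per text, shape (num_tokens, 768); pool these token-level
# features (e.g. mean or [CLS]) and feed them to a downstream classifier.
embeddings = extract_embeddings(model_path, texts)
```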

For fine-tuning with GPUs, this PyTorch version is handy (gradient accumulation is implemented): https://github.com/huggingface/pytorch-pretrained-BERT
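
Gradient accumulation is the trick that makes large effective batch sizes fit on a single GPU. Below is a minimal, self-contained sketch of the pattern; the tiny linear model and random tensors are stand-ins for a BERT classifier and real labeled batches, not the library's own code.

```python
import torch
from torch import nn

model = nn.Linear(768, 2)                      # stand-in for a BERT classification head
optimizer = torch.optim.Adam(model.parameters(), lr=2e-5)
loss_fn = nn.CrossEntropyLoss()

accumulation_steps = 4                         # effective batch = 4 * micro-batch size
micro_batches = [(torch.randn(8, 768), torch.randint(0, 2, (8,))) for _ in range(16)]

optimizer.zero_grad()
for step, (features, labels) in enumerate(micro_batches):
    loss = loss_fn(model(features), labels)
    (loss / accumulation_steps).backward()     # scale so accumulated gradients average
    if (step + 1) % accumulation_steps == 0:   # update once per accumulated batch
        optimizer.step()
        optimizer.zero_grad()
```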

For people who have access to TPUs: https://github.com/google-research/bert

Combine different word embeddings

How can you take advantage of different word embeddings in a text classification task? Please check my Kaggle post: https://www.kaggle.com/c/quora-insincere-questions-classification/discussion/71778

A subset of this field is called meta-embedding. Here is a list of papers: https://github.com/Shujian2015/meta-embedding-paper-list

I found that simply taking the average of different embeddings is already powerful enough.
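
A minimal sketch of averaging two pretrained embedding matrices (e.g. GloVe and fastText) that share the same vocabulary order and dimensionality; the random matrices below are placeholders for the real lookup tables.

```python
import numpy as np

vocab_size, dim = 50000, 300
glove_matrix = np.random.normal(size=(vocab_size, dim)).astype('float32')     # placeholder
fasttext_matrix = np.random.normal(size=(vocab_size, dim)).astype('float32')  # placeholder

# Element-wise mean; use the result as the weight of the embedding layer
# (frozen or trainable) in the classifier.
mean_embedding = np.mean([glove_matrix, fasttext_matrix], axis=0)

# Concatenation is the common alternative; it doubles the embedding dimension.
concat_embedding = np.concatenate([glove_matrix, fasttext_matrix], axis=1)
```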

One thing to try is BERT: https://gluebenchmark.com/leaderboard

Random seed optimization in GAN research?

Two quotes from the paper "Are GANs Created Equal? A Large-Scale Study":

  • Even with everything else being fixed, varying the random seed may influence the results
  • Authors often report the best FID, which opens the door for random seed optimization
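
A small sketch of why "best FID over seeds" is optimistic: with hypothetical FID scores from the same model trained under several random seeds, the best single-seed number can look much stronger than the mean and spread one would report to reflect seed variance.

```python
import numpy as np

fid_per_seed = np.array([24.1, 27.8, 31.5, 26.2, 29.9])  # hypothetical scores, one per seed

print('best seed FID:', fid_per_seed.min())
print('mean +/- std :', fid_per_seed.mean(), '+/-', fid_per_seed.std())
```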