BERT can achieve high accuracy with a small sample size (e.g. 1000): https://github.com/Socialbird-AILab/BERT-Classification-Tutorial/blob/master/pictures/Results.png
To simply get features (embeddings) from BERT, this Keras package is an easy place to start
For fine-tuning with GPUs, this PyTorch version is handy (gradient accumulation is implemented): https://github.com/huggingface/pytorch-pretrained-BERT
For people who have access to TPUs: https://github.com/google-research/bert
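
On the gradient accumulation mentioned for GPU fine-tuning: the idea is to split a large batch into micro-batches that fit in memory, sum their scaled gradients, and apply a single update. Below is a minimal framework-free sketch in plain Python on a toy linear model, not the code from any of the repositories above; the names `grad_sum`, `accumulated_step`, and `micro_batch_size` are hypothetical and chosen only for illustration.

```python
def grad_sum(w, b, batch):
    """Summed (not averaged) squared-error gradient for the model y ~ w*x + b."""
    gw = gb = 0.0
    for x, y in batch:
        err = (w * x + b) - y
        gw += 2 * err * x
        gb += 2 * err
    return gw, gb


def accumulated_step(w, b, data, micro_batch_size, lr):
    """One parameter update built from gradients accumulated over micro-batches."""
    n = len(data)
    acc_w = acc_b = 0.0
    for start in range(0, n, micro_batch_size):
        micro = data[start:start + micro_batch_size]
        gw, gb = grad_sum(w, b, micro)
        # Dividing by the full batch size makes the accumulated total
        # equal to the mean gradient over the whole batch.
        acc_w += gw / n
        acc_b += gb / n
    # A single update after all micro-batches have been processed.
    return w - lr * acc_w, b - lr * acc_b


# Toy data following y = 2x + 1 (made up for this example).
data = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0), (5.0, 11.0)]
full = accumulated_step(0.0, 0.0, data, micro_batch_size=len(data), lr=0.01)
accum = accumulated_step(0.0, 0.0, data, micro_batch_size=2, lr=0.01)
```

Up to floating-point rounding, `full` and `accum` match, which is the point: the update behaves like a large batch while only one micro-batch is processed at a time. In PyTorch the same pattern is usually written by dividing the loss by the number of accumulation steps, calling `backward()` on every micro-batch, and calling `optimizer.step()` and `optimizer.zero_grad()` only once per accumulation cycle.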