Syntax is the grammar: it describes how to construct a well-formed sentence. For example, "this water is triangular" is syntactically correct.
Semantics relates to meaning: "this water is triangular" does not mean anything, even though the grammar is fine.
BERT can achieve high accuracy with a small sample size (e.g., 1,000 examples): https://github.com/Socialbird-AILab/BERT-Classification-Tutorial/blob/master/pictures/Results.png
To simply get features (embeddings) out of BERT, this Keras package is an easy way to start; a rough sketch follows.
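The package is not named above, so this is a guess: assuming something like keras-bert (a common choice for this) and a downloaded Google BERT checkpoint, extraction looks roughly like this. The checkpoint path is a placeholder.

```python
# Sketch assuming the keras-bert package (pip install keras-bert) and a
# pretrained Google BERT checkpoint on disk; both are assumptions, not
# confirmed by the original post.
from keras_bert import extract_embeddings

model_path = 'uncased_L-12_H-768_A-12'  # placeholder: checkpoint directory
texts = ['this water is triangular', 'BERT works with small datasets']

# Returns one array of contextual vectors per input text.
embeddings = extract_embeddings(model_path, texts)
```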
For fine-tuning with GPUs, this PyTorch version is handy (gradient accumulation is implemented): https://github.com/huggingface/pytorch-pretrained-BERT
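The idea behind gradient accumulation is to run several small forward/backward passes before each optimizer step, so the effective batch size can exceed what fits in GPU memory. A minimal generic PyTorch sketch (toy linear model and random data, not the repo's actual training loop):

```python
import torch
import torch.nn as nn

# Toy setup; in practice `model` would be BERT and `loader` a real DataLoader.
model = nn.Linear(8, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()
loader = [(torch.randn(4, 8), torch.randint(0, 2, (4,))) for _ in range(8)]

accumulation_steps = 4  # effective batch size = 4 mini-batches of 4 = 16
optimizer.zero_grad()
for step, (x, y) in enumerate(loader):
    loss = loss_fn(model(x), y)
    (loss / accumulation_steps).backward()   # accumulate scaled gradients
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()                     # update once per 4 mini-batches
        optimizer.zero_grad()
```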
For people who have access to TPUs: https://github.com/google-research/bert
How do you take advantage of different word embeddings in a text classification task? Please check my Kaggle post: https://www.kaggle.com/c/quora-insincere-questions-classification/discussion/71778
A subset of this field is called meta-embedding. Here is a list of papers: https://github.com/Shujian2015/meta-embedding-paper-list
I found that simply averaging the different embeddings is already powerful enough; see the sketch below.
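For example, a mean meta-embedding just averages each word's vectors across the embedding tables that contain it. This assumes the embeddings share a dimensionality; the tiny 3-d vectors below are made up for illustration.

```python
import numpy as np

# Hypothetical lookup tables (word -> vector); real ones would be GloVe,
# fastText, etc., loaded from disk and sharing the same dimensionality.
glove = {"cat": np.array([0.1, 0.3, -0.2]), "dog": np.array([0.0, 0.5, 0.1])}
fasttext = {"cat": np.array([0.2, 0.1, -0.1]), "dog": np.array([-0.1, 0.4, 0.2])}

def mean_meta_embedding(word, tables):
    """Average a word's vectors across all embedding tables that contain it."""
    vecs = [t[word] for t in tables if word in t]
    return np.mean(vecs, axis=0) if vecs else None

print(mean_meta_embedding("cat", [glove, fasttext]))  # elementwise mean
```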
One thing to try is BERT: https://gluebenchmark.com/leaderboard
Two quotes from the paper "Are GANs Created Equal? A Large-Scale Study" (FID = Fréchet Inception Distance):
- "Even with everything else being fixed, varying the random seed may influence the results."
- "Authors often report the best FID, which opens the door for random seed optimization."
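To see why reporting only the best run is misleading, here is a toy demonstration of seed sensitivity: the same model on the same (synthetic) data, retrained under different seeds, gives a spread of scores. Everything here is made up purely to show the effect.

```python
import numpy as np
import torch
import torch.nn as nn

def train_once(seed):
    """Train a tiny classifier; only the random seed changes between runs."""
    torch.manual_seed(seed)
    X, y = torch.randn(200, 10), torch.randint(0, 2, (200,))
    model = nn.Linear(10, 2)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    for _ in range(50):
        opt.zero_grad()
        loss = nn.functional.cross_entropy(model(X), y)
        loss.backward()
        opt.step()
    return (model(X).argmax(1) == y).float().mean().item()

scores = [train_once(s) for s in range(10)]
print(f"best={max(scores):.3f} mean={np.mean(scores):.3f} std={np.std(scores):.3f}")
# Reporting only `best` hides the spread -- the "random seed optimization" above.
```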
Do people "cherry-pick" to get better results in RL research?