How can you take advantage of different word embeddings in a text classification task? Please check my Kaggle post: https://www.kaggle.com/c/quora-insincere-questions-classification/discussion/71778
A subset of this field is called meta-embedding. Here is a list of papers: https://github.com/Shujian2015/meta-embedding-paper-list
I found that simply taking the average of the different embeddings is already powerful enough.
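To make the averaging idea concrete, here is a minimal sketch, not the exact code from the Kaggle post. It assumes every embedding matrix has already been built over the same vocabulary index (row i is the same word in each matrix) and that all matrices have the same dimensionality. The names `average_embeddings`, `emb_a`, and `emb_b` are hypothetical, and the random matrices only stand in for real pretrained vectors.

```python
import numpy as np


def average_embeddings(matrices):
    """Return the element-wise mean of several embedding matrices.

    Assumption: all matrices have shape (vocab_size, dim) and share the
    same row order, i.e. row i refers to the same word in each matrix.
    """
    shapes = {m.shape for m in matrices}
    if len(shapes) != 1:
        raise ValueError(f"embedding matrices must share one shape, got {shapes}")
    # Stack to (n_embeddings, vocab_size, dim), then average over the first axis.
    return np.mean(np.stack(matrices, axis=0), axis=0)


# Toy stand-ins for two pretrained embedding matrices (hypothetical data).
rng = np.random.default_rng(0)
vocab_size, dim = 5, 4
emb_a = rng.normal(size=(vocab_size, dim))
emb_b = rng.normal(size=(vocab_size, dim))

meta_embedding = average_embeddings([emb_a, emb_b])
```

The resulting `meta_embedding` matrix can be used to initialize the embedding layer of a classifier in place of any single pretrained matrix. If the embeddings have different dimensions, plain averaging does not apply directly; concatenation or a learned projection would be needed instead.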
Another thing to try is BERT (see the GLUE leaderboard): https://gluebenchmark.com/leaderboard