Papers to Start with

Some papers I really love:

  • Zhang, Zhi, et al. “Bag of Freebies for Training Object Detection Neural Networks.” arXiv preprint arXiv:1902.04103 (2019).
  • He, Tong, et al. “Bag of Tricks for Image Classification with Convolutional Neural Networks.” arXiv preprint arXiv:1812.01187 (2018).
  • Howard, Jeremy, and Sebastian Ruder. “Universal language model fine-tuning for text classification.” Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2018.
  • Smith, Leslie N. “A disciplined approach to neural network hyper-parameters: Part 1–learning rate, batch size, momentum, and weight decay.” arXiv preprint arXiv:1803.09820 (2018).
  • Chahal, Karanbir, Manraj Singh Grover, and Kuntal Dey. “A Hitchhiker’s Guide On Distributed Training of Deep Neural Networks.” arXiv preprint arXiv:1810.11787 (2018).
  • Neishi, Masato, et al. “A bag of useful tricks for practical neural machine translation: Embedding layer initialization and large batch size.” Proceedings of the 4th Workshop on Asian Translation (WAT2017). 2017.
  • Joulin, Armand, et al. “Bag of tricks for efficient text classification.” arXiv preprint arXiv:1607.01759 (2016).
  • Covington, Paul, Jay Adams, and Emre Sargin. “Deep neural networks for YouTube recommendations.” Proceedings of the 10th ACM Conference on Recommender Systems. ACM, 2016.
  • He, Xinran, et al. “Practical lessons from predicting clicks on ads at Facebook.” Proceedings of the Eighth International Workshop on Data Mining for Online Advertising. ACM, 2014.
  • McMahan, H. Brendan, et al. “Ad click prediction: a view from the trenches.” Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2013.

NER with small strongly labeled and large weakly labeled data

Having a small amount of strongly labeled data alongside a large amount of weakly labeled data is a very common situation in NLP and ASR modeling. The Amazon search team used a three-stage framework, NEEDLE, to take advantage of large weakly labeled data to improve NER. Their noise-aware loss function is interesting and worth a deep dive. Paper link:
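
As a rough sketch of the noise-aware idea (a simplification under my own assumptions, not the paper's exact formulation): weight the usual negative log-likelihood by an estimated confidence that the weak label is correct, and weight a "push away from this label" term by the remainder. The function name, the scalar per-example inputs, and the eps clamp below are all hypothetical, chosen only for illustration.

```python
import math


def noise_aware_loss(p_label: float, confidence: float, eps: float = 1e-12) -> float:
    """Confidence-weighted loss for one weakly labeled example (illustrative only).

    p_label:    probability the model assigns to the weak label (assumed given)
    confidence: estimated probability that the weak label is correct, in [0, 1]
    """
    # If the weak label is trusted, the usual NLL pushes p_label up.
    nll = -math.log(max(p_label, eps))
    # If the weak label is distrusted, this term pushes p_label down instead.
    neg_log_unlikelihood = -math.log(max(1.0 - p_label, eps))
    return confidence * nll + (1.0 - confidence) * neg_log_unlikelihood
```

With confidence near 1 this reduces to ordinary negative log-likelihood, and with confidence near 0 the model is rewarded for disagreeing with the weak label, so noisy labels do less damage than they would under plain NLL.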

To Review Deep Learning

I am going back to working on deep learning after six months of writing Bash. Here is my plan for getting back up to speed.


  • Deep learning (Andrew Ng):
  • Book (Part 2):

CV or NLP:

  • Convolutional Neural Networks for Visual Recognition (Spring 2017):
  • CS224N: Natural Language Processing with Deep Learning | Winter 2019:

PyTorch and TensorFlow:

  • Fast AI (PyTorch):
  • TensorFlow: