Papers to Start with

Some papers I really love:

  • Zhang, Zhi, et al. “Bag of Freebies for Training Object Detection Neural Networks.” arXiv preprint arXiv:1902.04103 (2019).
  • He, Tong, et al. “Bag of Tricks for Image Classification with Convolutional Neural Networks.” arXiv preprint arXiv:1812.01187 (2018).
  • Howard, Jeremy, and Sebastian Ruder. “Universal language model fine-tuning for text classification.” Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Vol. 1. 2018.
  • Smith, Leslie N. “A disciplined approach to neural network hyper-parameters: Part 1–learning rate, batch size, momentum, and weight decay.” arXiv preprint arXiv:1803.09820 (2018).
  • Chahal, Karanbir, Manraj Singh Grover, and Kuntal Dey. “A Hitchhiker’s Guide On Distributed Training of Deep Neural Networks.” arXiv preprint arXiv:1810.11787 (2018).
  • Neishi, Masato, et al. “A bag of useful tricks for practical neural machine translation: Embedding layer initialization and large batch size.” Proceedings of the 4th Workshop on Asian Translation (WAT2017). 2017.
  • Joulin, Armand, et al. “Bag of tricks for efficient text classification.” arXiv preprint arXiv:1607.01759 (2016).
  • Covington, Paul, Jay Adams, and Emre Sargin. “Deep neural networks for YouTube recommendations.” Proceedings of the 10th ACM Conference on Recommender Systems. ACM, 2016.
  • He, Xinran, et al. “Practical lessons from predicting clicks on ads at Facebook.” Proceedings of the Eighth International Workshop on Data Mining for Online Advertising. ACM, 2014.
  • McMahan, H. Brendan, et al. “Ad click prediction: a view from the trenches.” Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2013.

ICML 2019 Best Paper

A key finding from Locatello et al., “Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations” (ICML 2019): the choice of random seed across different runs has a larger impact on disentanglement scores than the model choice and the strength of regularization (while naively one might expect that more regularization should always lead to more disentanglement). A good run with a bad hyperparameter can easily beat a bad run with a good hyperparameter.
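
One way to check this kind of claim on your own experiments is to train every hyperparameter setting with several seeds and compare the spread across seeds with the spread across settings. The sketch below is only an illustration. `toy_score` is a hypothetical stand-in for "train a model and compute a disentanglement score", and its numbers (a small hyperparameter effect, large seed noise) are assumptions chosen to mimic the situation described above, not results from the paper. `spread_by_factor` is likewise a made-up helper name.

```python
import random
import statistics


def toy_score(reg_strength, seed):
    """Hypothetical stand-in for training a model and scoring it.

    Assumption: the hyperparameter has a small effect and the seed adds
    large noise. Replace this with a real training/evaluation run.
    """
    rng = random.Random(seed * 1000 + int(reg_strength * 100))
    hyper_effect = 0.02 * reg_strength   # assumed small benefit
    seed_noise = rng.gauss(0.0, 0.15)    # assumed large run-to-run noise
    return 0.5 + hyper_effect + seed_noise


def spread_by_factor(reg_strengths, seeds, score_fn=toy_score):
    """Return (seed_spread, hyper_spread) over a full grid of runs.

    seed_spread:  std. dev. across seeds, averaged over hyperparameter settings.
    hyper_spread: std. dev. across settings of the per-setting mean score.
    """
    scores = {(r, s): score_fn(r, s) for r in reg_strengths for s in seeds}
    per_setting_std = [
        statistics.pstdev([scores[(r, s)] for s in seeds]) for r in reg_strengths
    ]
    per_setting_mean = [
        statistics.mean([scores[(r, s)] for s in seeds]) for r in reg_strengths
    ]
    seed_spread = statistics.mean(per_setting_std)
    hyper_spread = statistics.pstdev(per_setting_mean)
    return seed_spread, hyper_spread


if __name__ == "__main__":
    seed_spread, hyper_spread = spread_by_factor(
        reg_strengths=[1.0, 2.0, 4.0, 8.0], seeds=range(10)
    )
    print(f"spread across seeds:           {seed_spread:.3f}")
    print(f"spread across hyperparameters: {hyper_spread:.3f}")
```

If the spread across seeds is comparable to or larger than the spread across settings, a single run per setting says little about which hyperparameter is actually better, so comparisons should be made on averages (or distributions) over several seeds.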