- Alibaba: https://arxiv.org/abs/2106.09297
- Alibaba: https://arxiv.org/abs/2210.04170
- Amazon: https://arxiv.org/abs/1907.00937
- Amazon: https://arxiv.org/abs/2105.02978
- Coveo: https://arxiv.org/abs/2104.02061
- eBay: https://dl.acm.org/doi/abs/10.1145/3366424.3382715
- Facebook: https://arxiv.org/abs/2006.11632
- Facebook: https://research.facebook.com/publications/que2search-fast-and-accurate-query-and-document-understanding-for-search-at-facebook/
- Facebook: https://research.facebook.com/publications/que2engage-embedding-based-retrieval-for-relevant-and-engaging-products-at-facebook-marketplace/
- Google: https://arxiv.org/abs/2010.01195
- HomeDepot: https://arxiv.org/abs/2008.08180
- Instacart: https://sigir-ecom.github.io/ecom22Papers/paper_8392.pdf
- JD: https://arxiv.org/abs/2006.02282
- Pinterest: https://medium.com/pinterest-engineering/searchsage-learning-search-query-representations-at-pinterest-654f2bb887fc
- Spotify: https://engineering.atspotify.com/2022/03/introducing-natural-language-search-for-podcast-episodes
- Walmart: https://dl.acm.org/doi/abs/10.1145/3308560.3316603
- Walmart: https://dl.acm.org/doi/abs/10.1145/3534678.3539164
- Wayfair: https://arxiv.org/abs/2204.05231
Some papers I really love:
- Zhang, Zhi, et al. “Bag of Freebies for Training Object Detection Neural Networks.” arXiv preprint arXiv:1902.04103 (2019).
- He, Tong, et al. “Bag of Tricks for Image Classification with Convolutional Neural Networks.” arXiv preprint arXiv:1812.01187 (2018).
- Howard, Jeremy, and Sebastian Ruder. “Universal language model fine-tuning for text classification.” Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Vol. 1. 2018.
- Smith, Leslie N. “A disciplined approach to neural network hyper-parameters: Part 1–learning rate, batch size, momentum, and weight decay.” arXiv preprint arXiv:1803.09820 (2018).
- Chahal, Karanbir, Manraj Singh Grover, and Kuntal Dey. “A Hitchhiker’s Guide On Distributed Training of Deep Neural Networks.” arXiv preprint arXiv:1810.11787 (2018).
- Neishi, Masato, et al. “A bag of useful tricks for practical neural machine translation: Embedding layer initialization and large batch size.” Proceedings of the 4th Workshop on Asian Translation (WAT2017). 2017.
- Joulin, Armand, et al. “Bag of tricks for efficient text classification.” arXiv preprint arXiv:1607.01759 (2016).
- Covington, Paul, Jay Adams, and Emre Sargin. “Deep neural networks for youtube recommendations.” Proceedings of the 10th ACM Conference on Recommender Systems. ACM, 2016.
- He, Xinran, et al. “Practical lessons from predicting clicks on ads at Facebook.” Proceedings of the Eighth International Workshop on Data Mining for Online Advertising. ACM, 2014.
- McMahan, H. Brendan, et al. “Ad click prediction: a view from the trenches.” Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2013.
Bloomberg trained an LLM from scratch on AWS (64 × 8 = 512 A100 40GB GPUs for 53 days). They constructed a 363-billion-token dataset from Bloomberg’s extensive data sources, perhaps the largest domain-specific dataset yet, augmented with 345 billion tokens from general-purpose datasets.
BloombergGPT outperforms similarly sized open models on financial NLP tasks by significant margins, without sacrificing performance on general LLM benchmarks.
This guide has tons of tricks that are usually not covered in textbooks: https://github.com/google-research/tuning_playbook
Two recent papers on few-shot learning in NLP caught my eye: the first, on retrieval, is from Google Research; the second, on classification, is from Intel and Hugging Face.
Dai, Zhuyun, et al. “Promptagator: Few-shot Dense Retrieval From 8 Examples.” arXiv preprint arXiv:2209.11755 (2022).
We suggest working on Few-shot Dense Retrieval, a setting where each task comes with a short description and a few examples. To amplify the power of a few examples, we propose Prompt-based Query Generation for Retriever (Promptagator), which leverages large language models (LLMs) as a few-shot query generator and creates task-specific retrievers based on the generated data.
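A minimal sketch of the query-generation step as described in the abstract: build a few-shot prompt from the available (document, query) examples, ask an LLM to write a query for each unlabeled document, and train a retriever on the resulting pairs. The `llm_generate` callable is a hypothetical stand-in for whatever completion API you use, and the filter below is a simplification of the paper’s round-trip consistency check.

```python
from typing import Callable

def build_prompt(examples: list[tuple[str, str]], document: str) -> str:
    """Few-shot prompt: k (document, query) demonstrations, then the target document."""
    parts = [f"Document: {doc}\nQuery: {query}\n" for doc, query in examples]
    parts.append(f"Document: {document}\nQuery:")
    return "\n".join(parts)

def generate_synthetic_pairs(
    examples: list[tuple[str, str]],
    corpus: list[str],
    llm_generate: Callable[[str], str],  # hypothetical LLM completion call
) -> list[tuple[str, str]]:
    """Create (query, document) pairs to train a task-specific retriever on."""
    pairs = []
    for doc in corpus:
        query = llm_generate(build_prompt(examples, doc)).strip()
        if query:  # the paper filters harder, via round-trip consistency
            pairs.append((query, doc))
    return pairs
```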
Tunstall, Lewis, et al. “Efficient Few-Shot Learning Without Prompts.” arXiv preprint arXiv:2209.11055 (2022).
We propose SetFit (Sentence Transformer Fine-tuning), an efficient and prompt-free framework for few-shot fine-tuning of Sentence Transformers (ST). SetFit works by first fine-tuning a pretrained ST on a small number of text pairs in a contrastive, Siamese manner; the resulting model is then used to generate rich text embeddings, which are used to train a classification head.
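Here is a minimal sketch of that two-step recipe using sentence-transformers and scikit-learn directly (the official `setfit` library packages the same idea); the model name, toy data, and hyper-parameters are illustrative assumptions:

```python
import itertools

from sentence_transformers import InputExample, SentenceTransformer, losses
from sklearn.linear_model import LogisticRegression
from torch.utils.data import DataLoader

def contrastive_pairs(texts, labels):
    """Label 1.0 for same-class pairs and 0.0 for cross-class pairs."""
    return [
        InputExample(texts=[t1, t2], label=float(y1 == y2))
        for (t1, y1), (t2, y2) in itertools.combinations(zip(texts, labels), 2)
    ]

texts = ["great movie", "loved it", "terrible film", "waste of time"]
labels = [1, 1, 0, 0]

# Step 1: contrastive fine-tuning of a pretrained sentence transformer.
model = SentenceTransformer("paraphrase-MiniLM-L3-v2")  # illustrative choice
loader = DataLoader(contrastive_pairs(texts, labels), shuffle=True, batch_size=8)
model.fit(train_objectives=[(loader, losses.CosineSimilarityLoss(model))], epochs=1)

# Step 2: train a lightweight classification head on the tuned embeddings.
head = LogisticRegression().fit(model.encode(texts), labels)
print(head.predict(model.encode(["an instant classic"])))
```

The appeal of the design is that the expensive part, contrastive fine-tuning, needs only pairwise supervision, which grows combinatorially from a handful of labeled examples, while the classifier on top stays tiny.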
I'm quite amazed by the few-shot and even zero-shot learning capabilities of some recent (very) large language models. Here are three papers I read recently and would like to recommend:
- 540B PaLM by Google: https://arxiv.org/abs/2204.02311
- 11B Atlas by Meta: https://arxiv.org/abs/2208.03299
- 20B AlexaTM by Amazon: https://arxiv.org/abs/2208.01448
Grbovic, Mihajlo, and Haibin Cheng. “Real-time personalization using embeddings for search ranking at airbnb.” Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2018. [blog 1, blog 2]
Haldar, Malay, et al. “Applying deep learning to airbnb search.” Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2019.
Haldar, Malay, et al. “Improving deep learning for airbnb search.” Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2020.
Abdool, Mustafa, et al. “Managing diversity in airbnb search.” Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2020.
Zhang, Kai, et al. “LED: Lexicon-Enlightened Dense Retriever for Large-Scale Retrieval.” arXiv preprint arXiv:2208.13661 (2022).
Sciavolino, Christopher, et al. “Simple entity-centric questions challenge dense retrievers.” arXiv preprint arXiv:2109.08535 (2021).
Chen, Xilun, et al. “Salient Phrase Aware Dense Retrieval: Can a Dense Retriever Imitate a Sparse One?” arXiv preprint arXiv:2110.06918 (2021).
Version mismatch in embedding-based retrieval is challenging, especially on the infrastructure side: queries embedded with a new encoder are not comparable to documents indexed with the old one, so every encoder update forces a backfill of the document index (or serving both versions side by side during the transition).
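One common mitigation, sketched below purely under my own assumptions (none of the papers above prescribe this exact design): tag every vector with the encoder version that produced it and keep one index per version, so a query is only ever scored against document vectors it is compatible with; rolling out a new encoder then means backfilling the new index before cutting traffic over.

```python
from collections import defaultdict

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

class VersionedIndex:
    """Toy in-memory index keyed by encoder version."""

    def __init__(self):
        self._indexes = defaultdict(dict)  # version -> {doc_id: vector}

    def add(self, version: str, doc_id: str, vector) -> None:
        self._indexes[version][doc_id] = vector

    def search(self, version: str, query_vector, top_k: int = 10):
        """Score the query only against vectors from the same encoder version."""
        index = self._indexes.get(version)
        if not index:
            raise KeyError(f"no index for encoder version {version!r}")
        ranked = sorted(index.items(), key=lambda kv: dot(query_vector, kv[1]), reverse=True)
        return ranked[:top_k]
```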
A small amount of strongly labeled data alongside a large amount of weakly labeled data is a very common situation in NLP and ASR modeling. The Amazon search team used a three-stage framework, NEEDLE, to take advantage of large weakly labeled data to improve NER. Their noise-aware loss function is interesting and worth a deep dive; a schematic sketch follows below. Paper link: https://www.amazon.science/publications/named-entity-recognition-with-small-strongly-labeled-and-large-weakly-labeled-data
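For intuition only, here is a schematic of what a noise-aware token-level loss can look like in PyTorch: down-weight each weakly labeled token by an estimated confidence that its label is correct, while strongly labeled tokens keep full weight. This is my simplified reading of the idea, not NEEDLE's exact formulation:

```python
import torch
import torch.nn.functional as F

def noise_aware_loss(
    logits: torch.Tensor,      # (batch, seq_len, num_tags) from the NER model
    labels: torch.Tensor,      # (batch, seq_len) tag ids (weak or strong)
    confidence: torch.Tensor,  # (batch, seq_len) in [0, 1]; 1.0 for strong labels
) -> torch.Tensor:
    # Per-token cross-entropy, weighted by how much we trust each label.
    per_token = F.cross_entropy(logits.flatten(0, 1), labels.flatten(), reduction="none")
    return (confidence.flatten() * per_token).mean()
```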