- Alibaba: https://arxiv.org/abs/2106.09297
- Alibaba: https://arxiv.org/abs/2210.04170
- Amazon: https://arxiv.org/abs/1907.00937
- Amazon: https://arxiv.org/abs/2105.02978
- Coveo: https://arxiv.org/abs/2104.02061
- eBay: https://dl.acm.org/doi/abs/10.1145/3366424.3382715
- Facebook: https://arxiv.org/abs/2006.11632
- Facebook: https://research.facebook.com/publications/que2search-fast-and-accurate-query-and-document-understanding-for-search-at-facebook/
- Facebook: https://research.facebook.com/publications/que2engage-embedding-based-retrieval-for-relevant-and-engaging-products-at-facebook-marketplace/
- Google: https://arxiv.org/abs/2010.01195
- HomeDepot: https://arxiv.org/abs/2008.08180
- Instacart: https://sigir-ecom.github.io/ecom22Papers/paper_8392.pdf
- JD: https://arxiv.org/abs/2006.02282
- Pinterest: https://medium.com/pinterest-engineering/searchsage-learning-search-query-representations-at-pinterest-654f2bb887fc
- Spotify: https://engineering.atspotify.com/2022/03/introducing-natural-language-search-for-podcast-episodes
- Walmart: https://dl.acm.org/doi/abs/10.1145/3308560.3316603
- Walmart: https://dl.acm.org/doi/abs/10.1145/3534678.3539164
- Wayfair: https://arxiv.org/abs/2204.05231
Papers to Start with
Some papers I really love:
- Zhang, Zhi, et al. “Bag of Freebies for Training Object Detection Neural Networks.” arXiv preprint arXiv:1902.04103 (2019).
- Xie, Junyuan, et al. “Bag of Tricks for Image Classification with Convolutional Neural Networks.” arXiv preprint arXiv:1812.01187 (2018).
- Howard, Jeremy, and Sebastian Ruder. “Universal language model fine-tuning for text classification.” Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Vol. 1. 2018.
- Smith, Leslie N. “A disciplined approach to neural network hyper-parameters: Part 1–learning rate, batch size, momentum, and weight decay.” arXiv preprint arXiv:1803.09820 (2018).
- Chahal, Karanbir, Manraj Singh Grover, and Kuntal Dey. “A Hitchhiker’s Guide On Distributed Training of Deep Neural Networks.” arXiv preprint arXiv:1810.11787 (2018).
- Neishi, Masato, et al. “A bag of useful tricks for practical neural machine translation: Embedding layer initialization and large batch size.” Proceedings of the 4th Workshop on Asian Translation (WAT2017). 2017.
- Joulin, Armand, et al. “Bag of tricks for efficient text classification.” arXiv preprint arXiv:1607.01759 (2016).
- Covington, Paul, Jay Adams, and Emre Sargin. “Deep neural networks for youtube recommendations.” Proceedings of the 10th ACM Conference on Recommender Systems. ACM, 2016.
- He, Xinran, et al. “Practical lessons from predicting clicks on ads at Facebook.” Proceedings of the Eighth International Workshop on Data Mining for Online Advertising. ACM, 2014.
- McMahan, H. Brendan, et al. “Ad click prediction: a view from the trenches.” Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2013.
LLM Use Cases
Use AI to learn and automate boring tasks: https://nicholas.carlini.com/writing/2024/how-i-use-ai.html
Using Cursor AI to code fast: https://youtu.be/yk9lXobJ95E?si=gaulaloka2RUIMTF
Use GPT to revolutionize education: https://www.amazon.com/Brave-New-Words-Revolutionize-Education/dp/0593656954
AI-powered search: https://www.perplexity.ai
BloombergGPT
https://www.bloomberg.com/company/press/bloomberggpt-50-billion-parameter-llm-tuned-finance/
Bloomberg trained an LLM from scratch on AWS (64 nodes × 8 A100 40GB GPUs for 53 days). They constructed a 363-billion-token dataset from Bloomberg’s extensive data sources, perhaps the largest domain-specific dataset yet, and augmented it with 345 billion tokens from general-purpose datasets.
BloombergGPT outperforms similarly sized open models on financial NLP tasks by significant margins, without sacrificing performance on general LLM benchmarks.
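A quick back-of-the-envelope on the scale implied by those numbers (everything below is derived only from the figures quoted above):

```python
# Back-of-the-envelope scale check for the BloombergGPT training run,
# using only the figures quoted above.
gpus = 64 * 8                      # 512 A100 40GB GPUs in total
financial_tokens = 363e9           # domain-specific (Bloomberg) tokens
general_tokens = 345e9             # general-purpose tokens
total_tokens = financial_tokens + general_tokens    # ~708B tokens overall
financial_share = financial_tokens / total_tokens   # ~51% of the corpus is financial
print(gpus, int(total_tokens), round(financial_share, 3))
```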
Google’s Deep Learning Tuning Playbook
This guide has tons of tricks that are usually not covered in textbooks: https://github.com/google-research/tuning_playbook
Few-Shot Learning in NLP
Two recent papers on few-shot learning in NLP caught my eye: the first, on retrieval, from Google Research, and the second, on classification, from Intel Labs and Hugging Face.
Dai, Zhuyun, et al. “Promptagator: Few-shot Dense Retrieval From 8 Examples.” arXiv preprint arXiv:2209.11755 (2022).
The authors propose Few-shot Dense Retrieval, a setting where each task comes with a short description and a few examples. To amplify the power of those few examples, they introduce Prompt-based Query Generation for Retriever (Promptagator), which uses a large language model (LLM) as a few-shot query generator and trains task-specific retrievers on the generated data.
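The core recipe is simple enough to sketch. Below is a minimal, illustrative version of the few-shot query-generation step; the prompt format is simplified and `generate` stands in for whatever LLM completion call you have available (neither is taken from the paper verbatim):

```python
# Illustrative Promptagator-style query generation: a few (query, passage) examples
# prompt an LLM to write queries for unlabeled passages, yielding synthetic training
# pairs for a task-specific dense retriever. The examples and `generate` are placeholders.
FEW_SHOT_EXAMPLES = [
    ("what symptoms indicate low iron",
     "Iron-deficiency anemia often causes fatigue, pale skin, and shortness of breath."),
    ("how long to roast a whole chicken",
     "Roast at 425F, allowing roughly 15 minutes per pound, until the juices run clear."),
    # ... the paper uses at most 8 such examples per task
]

def build_prompt(passage: str) -> str:
    """Few-shot prompt asking the LLM to write a query that the passage answers."""
    lines = []
    for query, example_passage in FEW_SHOT_EXAMPLES:
        lines += [f"Passage: {example_passage}", f"Query: {query}", ""]
    lines += [f"Passage: {passage}", "Query:"]
    return "\n".join(lines)

def synthetic_pairs(corpus, generate):
    """Yield (synthetic query, passage) pairs used to train the task-specific retriever."""
    for passage in corpus:
        query = generate(build_prompt(passage)).strip()
        yield query, passage
```

The paper additionally filters the generated queries with a round-trip consistency check before training the final retriever.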
Tunstall, Lewis, et al. “Efficient Few-Shot Learning Without Prompts.” arXiv preprint arXiv:2209.11055 (2022).
The authors propose SetFit (Sentence Transformer Fine-tuning), an efficient and prompt-free framework for few-shot fine-tuning of Sentence Transformers (ST). SetFit works by first fine-tuning a pretrained ST on a small number of text pairs in a contrastive, Siamese manner, and then training a classification head on the resulting embeddings.
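The recipe is easy to reproduce with off-the-shelf libraries. Here is a minimal sketch built directly on sentence-transformers and scikit-learn (the authors ship a polished version of this as the setfit package; the toy dataset and pair-sampling below are my own simplifications):

```python
# Minimal SetFit-style recipe: contrastive fine-tuning of a Sentence Transformer on
# pairs built from a few labeled examples, then a simple classification head.
from itertools import combinations
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses
from sklearn.linear_model import LogisticRegression

# A tiny few-shot dataset: (text, class label) examples.
train_texts = ["great battery life", "screen cracked on day one", "fast shipping", "arrived broken"]
train_labels = [1, 0, 1, 0]

# Step 1: build contrastive pairs -- same class => similarity 1.0, different class => 0.0.
pairs = [
    InputExample(texts=[train_texts[i], train_texts[j]],
                 label=1.0 if train_labels[i] == train_labels[j] else 0.0)
    for i, j in combinations(range(len(train_texts)), 2)
]

# Step 2: contrastively fine-tune a pretrained Sentence Transformer on the pairs.
model = SentenceTransformer("sentence-transformers/paraphrase-mpnet-base-v2")
loader = DataLoader(pairs, shuffle=True, batch_size=8)
model.fit(train_objectives=[(loader, losses.CosineSimilarityLoss(model))],
          epochs=1, warmup_steps=2)

# Step 3: train a simple classification head on the fine-tuned embeddings.
head = LogisticRegression().fit(model.encode(train_texts), train_labels)

# Inference: encode, then classify -- no prompts involved.
print(head.predict(model.encode(["battery died after an hour"])))
```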
(Very) Large Language Models in 2022
I feel quite amazed by the few-shot or even zero-shot learning capabilities of some recent (very) large language models. Here are three papers I read recently and would like to recommend:
- 540B PaLM by Google: https://arxiv.org/abs/2204.02311
- 11B Atlas by Meta: https://arxiv.org/abs/2208.03299
- 20B AlexaTM by Amazon: https://arxiv.org/abs/2208.01448
Airbnb Search Papers
Grbovic, Mihajlo, and Haibin Cheng. “Real-time personalization using embeddings for search ranking at airbnb.” Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2018. [blog 1, blog 2]
Haldar, Malay, et al. “Applying deep learning to airbnb search.” Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2019.
Haldar, Malay, et al. “Improving deep learning for airbnb search.” Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2020.
Abdool, Mustafa, et al. “Managing diversity in airbnb search.” Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2020.
Haldar, Malay, et al. “Learning To Rank Diversely At Airbnb.” Proceedings of the 32nd ACM International Conference on Information and Knowledge Management. 2023.
Tan, Chun How, et al. “Optimizing Airbnb Search Journey with Multi-task Learning.” Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 2023.
Dense Retriever for Salient Phrase
Zhang, Kai, et al. “LED: Lexicon-Enlightened Dense Retriever for Large-Scale Retrieval.” arXiv preprint arXiv:2208.13661 (2022).
Sciavolino, Christopher, et al. “Simple entity-centric questions challenge dense retrievers.” arXiv preprint arXiv:2109.08535 (2021).
Chen, Xilun, et al. “Salient Phrase Aware Dense Retrieval: Can a Dense Retriever Imitate a Sparse One?.” arXiv preprint arXiv:2110.06918 (2021).
Version mismatch in embedding-based retrieval
Version mismatch in embedding-based retrieval is a challenging problem, especially on the infrastructure side: query embeddings must be scored against an item index built with the same model version, or the similarity scores become meaningless.
https://recsysml.substack.com/p/a-common-mistake-when-using-embeddings
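A minimal sketch of one way to guard against this on the serving side, assuming you stamp every index build and query encoder with the model version that produced its vectors (the names below are illustrative, not from the linked post):

```python
# Guard against embedding version mismatch: stamp the ANN index with the model version
# that produced its vectors and fail fast when the query encoder version disagrees.
import numpy as np
from dataclasses import dataclass

@dataclass
class VersionedIndex:
    model_version: str        # version of the item encoder that produced these vectors
    item_vectors: np.ndarray  # (num_items, dim); stand-in for a real ANN index (FAISS, ScaNN, ...)

def search(index: VersionedIndex, query_vector: np.ndarray,
           query_model_version: str, k: int = 10) -> np.ndarray:
    # Query and item embeddings are only comparable when they come from the same
    # (co-trained) model version; mixing versions silently degrades retrieval quality.
    if query_model_version != index.model_version:
        raise RuntimeError(
            f"Embedding version mismatch: index={index.model_version}, "
            f"query encoder={query_model_version}"
        )
    scores = index.item_vectors @ query_vector  # brute-force dot-product retrieval
    return np.argsort(-scores)[:k]
```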
NER with small strongly labeled and large weakly labeled data
Having a small amount of strongly labeled data and a large amount of weakly labeled data is a very common situation in NLP and ASR modeling. Amazon's search team used a three-stage framework (NEEDLE) to exploit the large weakly labeled data and improve NER. Their noise-aware loss function is interesting and worth a deep dive. Paper link: https://www.amazon.science/publications/named-entity-recognition-with-small-strongly-labeled-and-large-weakly-labeled-data
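To make the idea concrete, here is a hedged sketch of a noise-aware token-level loss in the spirit of NEEDLE: strongly labeled tokens keep full weight, while weakly labeled tokens are down-weighted by an estimated probability that their weak tag is correct (the paper estimates this confidence from the model itself; here it is simply an input):

```python
# Noise-aware NER loss sketch: weak labels contribute less, in proportion to an
# estimated confidence that the weak tag is correct. Simplified, not the paper's exact loss.
import torch
import torch.nn.functional as F

def noise_aware_loss(logits, tags, is_weak, weak_confidence):
    """
    logits:          (num_tokens, num_tags) model scores
    tags:            (num_tokens,) gold tags for strong data, heuristic tags for weak data
    is_weak:         (num_tokens,) bool, True where the tag comes from weak supervision
    weak_confidence: (num_tokens,) estimated P(weak tag is correct), in [0, 1]
    """
    per_token = F.cross_entropy(logits, tags, reduction="none")
    weights = torch.where(is_weak, weak_confidence, torch.ones_like(weak_confidence))
    return (weights * per_token).mean()
```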