Bag of Tricks in Machine Learning

Public Personal Notebook

Embedding-based Search Retrieval Papers and Blogs

Updated October 12, 2022; published July 17, 2022, by admin
  • Alibaba: https://arxiv.org/abs/2106.09297 
  • Alibaba: https://arxiv.org/abs/2210.04170
  • Amazon: https://arxiv.org/abs/1907.00937
  • Amazon: https://arxiv.org/abs/2105.02978
  • Coveo: https://arxiv.org/abs/2104.02061 
  • eBay: https://dl.acm.org/doi/abs/10.1145/3366424.3382715 
  • Facebook: https://arxiv.org/abs/2006.11632
  • Facebook: https://research.facebook.com/publications/que2search-fast-and-accurate-query-and-document-understanding-for-search-at-facebook/ 
  • Google: https://arxiv.org/abs/2010.01195 
  • HomeDepot: https://arxiv.org/abs/2008.08180 
  • Instacart: https://sigir-ecom.github.io/ecom22Papers/paper_8392.pdf
  • JD: https://arxiv.org/abs/2006.02282
  • Pinterest: https://medium.com/pinterest-engineering/searchsage-learning-search-query-representations-at-pinterest-654f2bb887fc
  • Spotify: https://engineering.atspotify.com/2022/03/introducing-natural-language-search-for-podcast-episodes
  • Walmart: https://dl.acm.org/doi/abs/10.1145/3308560.3316603 
  • Walmart: https://dl.acm.org/doi/abs/10.1145/3534678.3539164
  • Wayfair: https://arxiv.org/abs/2204.05231
Categories: nlp, papers
