Bag of Tricks in Machine Learning
Public Personal Notebook
Version mismatch in embedding-based retrieval is challenging, especially on the infrastructure side.
https://recsysml.substack.com/p/a-common-mistake-when-using-embeddings
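One way the mismatch can show up is a query encoded by one model version being scored against an index built from another. The sketch below is only an illustration of guarding against that, not something taken from the linked post: `VersionedEmbedding`, `VersionedIndex`, and `VersionMismatchError` are hypothetical names, and the brute-force cosine search stands in for whatever ANN index is actually used.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class VersionedEmbedding:
    """An embedding tagged with the (hypothetical) version of the model that produced it."""
    vector: Tuple[float, ...]
    model_version: str


class VersionMismatchError(ValueError):
    """Raised when an embedding and the index come from different model versions."""


def _cosine(a: Tuple[float, ...], b: Tuple[float, ...]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


class VersionedIndex:
    """Toy brute-force index that accepts vectors from a single model version only."""

    def __init__(self, model_version: str) -> None:
        self.model_version = model_version
        self._items: List[Tuple[str, Tuple[float, ...]]] = []

    def _check(self, emb: VersionedEmbedding) -> None:
        # Fail loudly instead of silently returning meaningless similarity scores.
        if emb.model_version != self.model_version:
            raise VersionMismatchError(
                f"index built with {self.model_version!r}, got {emb.model_version!r}"
            )

    def add(self, item_id: str, emb: VersionedEmbedding) -> None:
        self._check(emb)
        self._items.append((item_id, emb.vector))

    def search(self, query: VersionedEmbedding, k: int = 1) -> List[Tuple[str, float]]:
        self._check(query)
        scored = [(item_id, _cosine(query.vector, vec)) for item_id, vec in self._items]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```

With this guard, a query tagged `"v1"` against an index built with `"v2"` raises an error at request time. The harder infrastructure question of keeping the encoder and the index rolling out in lockstep is not addressed by the sketch.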
Having a small strongly labeled dataset alongside a large weakly labeled one is a very common situation in NLP or ASR modeling. The Amazon search team used the three-stage NEEDLE framework to take advantage of large weakly labeled data to improve NER. Their noise-aware loss function is interesting and worth a deep dive. Paper link: https://www.amazon.science/publications/named-entity-recognition-with-small-strongly-labeled-and-large-weakly-labeled-data
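As a rough illustration only, and not the paper's exact formulation, one generic way to make a loss noise-aware is to weight two terms by an estimated confidence that the weak label is correct: the usual negative log-likelihood when the label is trusted, and a term that pushes probability away from the label when it is not. The function name `noise_aware_loss`, the single-probability interface, and the way confidence is supplied are all assumptions made for this sketch; see the paper for how the confidence is actually estimated and applied at the sequence level.

```python
import math


def noise_aware_loss(p_label: float, confidence: float, eps: float = 1e-12) -> float:
    """Confidence-weighted loss for one weakly labeled example (illustrative sketch).

    p_label:    model probability assigned to the weak label, in [0, 1].
    confidence: estimated probability that the weak label is correct, in [0, 1].
    """
    # Trusted label: standard negative log-likelihood, small when p_label is high.
    nll = -math.log(max(p_label, eps))
    # Distrusted label: penalize assigning high probability to it.
    nlu = -math.log(max(1.0 - p_label, eps))
    return confidence * nll + (1.0 - confidence) * nlu
```

At `confidence=1.0` this reduces to ordinary cross-entropy on the weak label; at `confidence=0.0` the model is rewarded for moving away from it. Values in between interpolate, so noisy labels contribute a weaker and less one-sided training signal.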