BloombergGPT

https://www.bloomberg.com/company/press/bloomberggpt-50-billion-parameter-llm-tuned-finance/

Bloomberg trained a 50-billion-parameter LLM from scratch on AWS (64 nodes × 8 NVIDIA A100 40GB GPUs, for roughly 53 days). They constructed a 363-billion-token dataset from Bloomberg’s extensive financial data sources, perhaps the largest domain-specific dataset yet, and augmented it with 345 billion tokens from general-purpose datasets.


BloombergGPT outperforms similarly sized open models on financial NLP tasks by significant margins, without sacrificing performance on general LLM benchmarks.

Few-Shot Learning in NLP

Two recent papers on few-shot learning in NLP caught my eye: the first, on retrieval, from Google Research, and the second, on classification, from Intel and Hugging Face.

Dai, Zhuyun, et al. “Promptagator: Few-shot Dense Retrieval From 8 Examples.” arXiv preprint arXiv:2209.11755 (2022).

We suggest working on Few-shot Dense Retrieval, a setting where each task comes with a short description and a few examples. To amplify the power of a few examples, we propose Prompt-based Query Generation for Retriever (Promptagator), which leverages large language models (LLMs) as a few-shot query generator and creates task-specific retrievers based on the generated data.
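
To make the idea concrete, here is a minimal sketch of few-shot, prompt-based query generation using the Hugging Face transformers text-generation pipeline. The prompt template, example pairs, and stand-in model are my own illustrative assumptions, not the paper's setup; the actual method uses a far larger LLM and additional quality filtering of the generated queries before training the retriever.

```python
# Illustrative sketch of Promptagator-style few-shot query generation.
# Prompt format, examples, and model are assumptions for demonstration only.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # stand-in model

# A handful of (document, query) examples describing the target retrieval task.
few_shot_examples = [
    ("The Fed raised interest rates by 25 basis points on Wednesday.",
     "what did the fed decide about interest rates"),
    ("Apple reported record quarterly revenue driven by iPhone sales.",
     "apple quarterly earnings results"),
]

def build_prompt(new_document: str) -> str:
    """Assemble a few-shot prompt: task description + example pairs + the new document."""
    lines = ["Generate a search query for each document."]
    for doc, query in few_shot_examples:
        lines.append(f"Document: {doc}\nQuery: {query}")
    lines.append(f"Document: {new_document}\nQuery:")
    return "\n\n".join(lines)

doc = "Tesla delivered 1.3 million vehicles in 2022, up 40% year over year."
out = generator(build_prompt(doc), max_new_tokens=16, do_sample=True, num_return_sequences=1)
generated = out[0]["generated_text"].split("Query:")[-1].strip()
synthetic_query = generated.splitlines()[0] if generated else ""
print(synthetic_query)
# Synthetic (query, document) pairs like this become training data for a task-specific dense retriever.
```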

Tunstall, Lewis, et al. “Efficient Few-Shot Learning Without Prompts.” arXiv preprint arXiv:2209.11055 (2022).

We propose SetFit (Sentence Transformer Fine-tuning), an efficient and prompt-free framework for few-shot fine-tuning of Sentence Transformers (ST). SetFit works by first fine-tuning a pretrained ST on a small number of text pairs in a contrastive Siamese manner; the resulting model is then used to generate rich text embeddings, which are used to train a classification head.
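
The two-stage recipe is easy to sketch. Below is a rough illustration using sentence-transformers and scikit-learn rather than the authors' setfit library (which wraps this workflow); the model name, toy data, and pair-construction scheme are placeholders.

```python
# Rough sketch of the SetFit recipe: (1) contrastively fine-tune a Sentence Transformer on
# pairs built from a few labeled examples, (2) train a simple classification head on the
# resulting embeddings. Data, model name, and hyperparameters are illustrative only.
from itertools import combinations
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses
from sklearn.linear_model import LogisticRegression

# A tiny labeled dataset (8 examples, 2 classes) -- placeholder data.
texts = ["great movie", "loved it", "fantastic acting", "a true delight",
         "terrible plot", "waste of time", "awful pacing", "very boring"]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Stage 1: positive pairs (same class) and negative pairs (different class),
# fine-tuned with a cosine-similarity contrastive objective.
pairs = [InputExample(texts=[texts[i], texts[j]],
                      label=1.0 if labels[i] == labels[j] else 0.0)
         for i, j in combinations(range(len(texts)), 2)]
model = SentenceTransformer("sentence-transformers/paraphrase-MiniLM-L6-v2")
loader = DataLoader(pairs, shuffle=True, batch_size=16)
model.fit(train_objectives=[(loader, losses.CosineSimilarityLoss(model))],
          epochs=1, warmup_steps=5, show_progress_bar=False)

# Stage 2: encode the few examples and fit a lightweight classification head.
head = LogisticRegression().fit(model.encode(texts), labels)
print(head.predict(model.encode(["an absolute masterpiece", "dull and tedious"])))
```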

Dense Retriever for Salient Phrase

Zhang, Kai, et al. “LED: Lexicon-Enlightened Dense Retriever for Large-Scale Retrieval.” arXiv preprint arXiv:2208.13661 (2022).

Sciavolino, Christopher, et al. “Simple entity-centric questions challenge dense retrievers.” arXiv preprint arXiv:2109.08535 (2021).

Chen, Xilun, et al. “Salient Phrase Aware Dense Retrieval: Can a Dense Retriever Imitate a Sparse One?” arXiv preprint arXiv:2110.06918 (2021).

Embedding-based Search Retrieval Papers and Blogs

NER with small strongly labeled and large weakly labeled data

Having a small amount of strongly labeled data alongside a large amount of weakly labeled data is a very common situation in NLP and ASR modeling. The Amazon Search team’s three-stage NEEDLE framework takes advantage of the large weakly labeled data to improve NER. Their noise-aware loss function is interesting and worth a deep dive. Paper link: https://www.amazon.science/publications/named-entity-recognition-with-small-strongly-labeled-and-large-weakly-labeled-data
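
As a rough intuition (not the paper’s exact formulation), a noise-aware objective can down-weight the loss on weakly labeled tokens by an estimated confidence that the weak label is correct, while strongly labeled tokens keep full weight. A toy PyTorch version might look like this, with the confidence estimates assumed to come from the weak-labeling source or the model itself:

```python
# Toy illustration of a confidence-weighted (noise-aware) token classification loss.
# This is NOT the exact NEEDLE objective; it only shows the general idea of trusting
# weak labels in proportion to an estimated confidence score.
import torch
import torch.nn.functional as F

def noise_aware_loss(logits, labels, confidence, is_weak):
    """
    logits:     (batch, seq_len, num_tags) token classification scores
    labels:     (batch, seq_len) gold or weak tag ids
    confidence: (batch, seq_len) estimated probability that each weak label is correct
    is_weak:    (batch, seq_len) 1.0 where the label came from weak supervision, else 0.0
    """
    per_token = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)), labels.reshape(-1), reduction="none"
    ).reshape(labels.shape)
    # Strong labels get weight 1; weak labels are scaled by their confidence estimate.
    weights = (1.0 - is_weak) + is_weak * confidence
    return (weights * per_token).sum() / weights.sum()

# Tiny example: batch of 1, sequence of 4 tokens, 3 tag types.
logits = torch.randn(1, 4, 3)
labels = torch.tensor([[0, 2, 1, 0]])
confidence = torch.tensor([[1.0, 0.9, 0.4, 1.0]])  # e.g. from the weak-labeling heuristic
is_weak = torch.tensor([[0.0, 1.0, 1.0, 0.0]])     # middle two tokens are weakly labeled
print(noise_aware_loss(logits, labels, confidence, is_weak))
```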

AutoML for NLP

  • Jin, Haifeng, Qingquan Song, and Xia Hu. “Efficient neural architecture search with network morphism.” arXiv preprint arXiv:1806.10282 (2018). [code]
  • Pham, Hieu, et al. “Efficient Neural Architecture Search via Parameter Sharing.” arXiv preprint arXiv:1802.03268 (2018). [code]
  • Liu, Hanxiao, Karen Simonyan, and Yiming Yang. “DARTS: Differentiable Architecture Search.” arXiv preprint arXiv:1806.09055 (2018). [code] (see the sketch after this list)

  • H2O.ai: http://docs.h2o.ai/h2o-tutorials/latest-stable/h2o-world-2017/nlp/index.html
  • Google AI: https://cloud.google.com/natural-language/automl/docs/beginners-guide
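
As a quick illustration of the differentiable-NAS idea behind DARTS (referenced above): each edge of the network computes a softmax-weighted mixture of candidate operations, and the mixture weights are learned by gradient descent alongside the regular weights. The candidate operations below are arbitrary placeholders, and the full method’s bi-level optimization on a validation set is omitted.

```python
# Toy sketch of a DARTS-style mixed operation; candidate ops are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    """One DARTS-style edge: a softmax-weighted sum over candidate operations.
    The architecture parameters `alpha` are learned by gradient descent
    (DARTS optimizes them on validation data in a bi-level scheme)."""
    def __init__(self, channels: int):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Identity(),                                # skip connection
            nn.Conv1d(channels, channels, 3, padding=1),  # narrow convolution
            nn.Conv1d(channels, channels, 5, padding=2),  # wider convolution
        ])
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))  # architecture parameters

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))

# After search, the op with the largest alpha on each edge is kept (discretization).
edge = MixedOp(channels=16)
x = torch.randn(2, 16, 32)  # (batch, channels, sequence length)
print(edge(x).shape, F.softmax(edge.alpha, dim=0))
```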

More: https://github.com/markdtw/awesome-architecture-search