I'm quite amazed by the few-shot and even zero-shot learning capabilities of some recent (very) large language models. Here are three papers I read recently and would recommend:
– 540B PaLM by Google: https://arxiv.org/abs/2204.02311
– 11B Atlas by Meta: https://arxiv.org/abs/2208.03299
– 20B AlexaTM by Amazon: https://arxiv.org/abs/2208.01448