A large language model is a type of artificial intelligence (AI) model that is trained on a large data set of text to generate language outputs that are coherent and natural-sounding.
- Large refers to the enormous scale of these models, which gives them their nuanced understanding and capability.
- Language refers to how they are designed to understand and interact using human languages.
- Model refers to the computational model or algorithms that process the input and produce the output.1
Large language models (LLMs) are a category of foundation models trained on immense amounts of data, making them capable of understanding and generating natural language and other types of content to perform a wide range of tasks.2
It might seem like large language models have arrived out of the blue along with new developments in generative AI. However, many companies, including IBM, have spent years implementing LLMs at different levels to enhance their natural language understanding (NLU) and natural language processing (NLP) capabilities. This has occurred alongside advances in machine learning models, algorithms, neural networks and the transformer models that provide the architecture for these AI systems.
LLMs are a class of foundation models: trained on enormous amounts of data, they provide the foundational capabilities needed to drive multiple use cases and applications and to resolve a multitude of tasks. This stands in stark contrast to building and training a domain-specific model for each use case individually, which can be expensive and time-consuming.
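The contrast above can be sketched in code: instead of maintaining one specialized model per task, a single general-purpose model is steered toward different tasks purely through its prompt. This is a minimal illustration, not a real implementation; the `llm` function is a hypothetical stand-in for an actual model call (for example, a request to a hosted LLM API).

```python
def llm(prompt: str) -> str:
    """Hypothetical stand-in for a call to a large language model.

    In practice this would send the prompt to a real model; here it
    simply echoes the prompt so the example is self-contained.
    """
    return f"<model response to: {prompt!r}>"

# One foundation model handles many tasks, selected only by the prompt.
# With domain-specific models, each entry below would instead require
# building, training and maintaining a separate model.
tasks = {
    "summarize": "Summarize this support ticket: ...",
    "translate": "Translate this sentence to French: ...",
    "classify":  "Label the sentiment (positive/negative): ...",
}

responses = {name: llm(prompt) for name, prompt in tasks.items()}

for name, reply in responses.items():
    print(f"{name}: {reply}")
```

The design point is that the per-task cost shifts from model development to prompt design, which is why one foundation model can cover use cases that would otherwise each need their own trained model.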