Context window

A context window is the maximum amount of text, measured in tokens, that a transformer-based AI model, such as a large language model (LLM), can process in a single prompt. It sets the limit on how much information the model can “remember” while generating a response.
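As a minimal sketch, the check below counts prompt tokens with the open-source tiktoken tokenizer; the 8,192-token limit is an assumed example value, not any particular model’s actual window.

```python
# Minimal sketch: count prompt tokens and compare against a context window.
# The 8,192-token limit is an assumed example value, not a specific model's.
import tiktoken

CONTEXT_WINDOW = 8_192  # assumed limit, for illustration only

enc = tiktoken.get_encoding("cl100k_base")
prompt = "Summarize the following report: ..."
n_tokens = len(enc.encode(prompt))

if n_tokens > CONTEXT_WINDOW:
    print(f"Prompt ({n_tokens} tokens) exceeds the {CONTEXT_WINDOW}-token window")
else:
    print(f"Prompt fits: {n_tokens} of {CONTEXT_WINDOW} tokens used")
```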

In practical terms, a context window acts like a model’s working memory. It determines how far back in a conversation the model can refer and how much content it can analyze at once. When the input exceeds the context window, it must be shortened or summarized before the model can continue processing.
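One common way to handle overflow, sketched below, is to drop the oldest conversation turns until the history fits; summarization is the alternative not shown here. The trim_history helper name and the tiktoken tokenizer choice are illustrative assumptions.

```python
# Hedged sketch: keep a conversation inside an assumed token budget by
# dropping the oldest messages first (truncation; summarization is the
# other common option). trim_history is a hypothetical helper name.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def trim_history(messages: list[str], max_tokens: int) -> list[str]:
    """Drop the oldest messages until the total token count fits the budget."""
    messages = list(messages)  # avoid mutating the caller's list
    while len(messages) > 1 and sum(len(enc.encode(m)) for m in messages) > max_tokens:
        messages.pop(0)  # discard the oldest turn first
    return messages
```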

The size of the context window directly affects a model’s performance. A larger window lets the model follow long passages and the relationships between ideas, while a smaller one limits comprehension to shorter segments. In retrieval-augmented generation (RAG) systems, for example, the context window determines how much retrieved information the model can draw on to form accurate outputs.
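In a RAG pipeline this often takes the form of a token budget reserved for retrieved passages, as in the sketch below. The greedy packing strategy and the pack_chunks helper are illustrative assumptions, not a specific framework’s API.

```python
# Hedged sketch: fit retrieved passages into the share of the context
# window reserved for them. pack_chunks is a hypothetical helper; chunks
# are assumed to arrive pre-sorted by retrieval relevance.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def pack_chunks(chunks: list[str], token_budget: int) -> list[str]:
    """Greedily keep the most relevant chunks that fit the token budget."""
    packed, used = [], 0
    for chunk in chunks:
        cost = len(enc.encode(chunk))
        if used + cost > token_budget:
            break  # the next chunk would overflow the reserved budget
        packed.append(chunk)
        used += cost
    return packed
```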

Increasing context length can improve output quality, reduce errors, and support longer interactions. However, larger windows demand more computation, since the cost of self-attention grows quadratically with sequence length, and they widen the attack surface for threats such as prompt injection hidden deep in long inputs.
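To make the compute cost concrete, the toy calculation below shows how the size of the attention score matrix grows with context length; the token counts are arbitrary example values.

```python
# Toy calculation: the self-attention score matrix has n x n entries,
# so quadrupling the context length multiplies it by sixteen.
for n in (4_096, 16_384, 65_536):
    print(f"{n:>7} tokens -> {n * n:>14,} attention scores per head per layer")
```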
