Context Window LLM


I would like to know how the context window works in RAG. For example: GPT-3 has a 2048-token window, and GPT-4 has 8192 to 32768 tokens.

In the GPT-3 example, does that mean we have a window of 2048 tokens forward and 2048 tokens backward over the documents? Does this mean it can only retrieve within that window?

I am looking for an explanation of how the context window works in an LLM.

Answer from Mark McDonald:

The input part of the context window determines how much input you can provide: typically the request and prompt, plus any additional context that gets added.

The "Retrieval" part of RAG happens outside of the LLM, so is unaffected by the LLM context window (though if you use an embedding model, it likely has it's own input size limit). After retrieving content (e.g. from your database via a vector similarity search), the relevant document chunks are then added into the prompt that is passed to the LLM (this is the input part of the context window).

For a QA task, it might look like this:

You are a helpful question answering bot. Answer the question using the information provided here. If you cannot answer with this context, say "I don't know"

Question: Who is the president of the USA?

Context 1: The United States of America (USA or U.S.A.), commonly known as the United States...

Context 2: Abraham Lincoln was an American lawyer, politician, and statesman...

Here the input context is consumed by:

  1. The guidance and instruction part of the prompt ("You are a helpful...")
  2. The input question being answered ("Who is the president...")
  3. All context chunks added ("Context N: ...")

Depending on your library, you can usually control things like the chunk size for your docs/nodes (this is the size of the "Context" blocks) and the number of context nodes you supply, as sketched below. For smaller windows you can use smaller and fewer context nodes, but this means you are providing less information. You likely also have control over the instruction part of the prompt; you could make it bigger or smaller, but it is typically consistent across invocations.
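As a rough illustration of that trade-off, the sketch below budgets an illustrative 2048-token window across the instruction, the question, and as many retrieved chunks as fit. The whitespace word count is only a stand-in for a real tokenizer, and the numbers are examples, not defaults from any particular library.

    CONTEXT_WINDOW = 2048      # total tokens the model can handle (input + output)
    RESERVED_FOR_OUTPUT = 256  # tokens held back for the generated answer

    def rough_tokens(text: str) -> int:
        # Crude proxy for a tokenizer: count whitespace-separated words.
        return len(text.split())

    def fit_chunks(instruction: str, question: str, chunks: list[str]) -> list[str]:
        # Add retrieved chunks in relevance order until the input budget is exhausted.
        budget = (CONTEXT_WINDOW - RESERVED_FOR_OUTPUT
                  - rough_tokens(instruction) - rough_tokens(question))
        selected = []
        for chunk in chunks:
            cost = rough_tokens(chunk)
            if cost > budget:
                break  # smaller windows force smaller or fewer chunks
            selected.append(chunk)
            budget -= cost
        return selected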

Output context is purely the generated response, so the full "context window" is the sum of the two: input + output.
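Putting the two together, a quick way to sanity-check a request is to count the prompt tokens and add the output budget. This sketch assumes the tiktoken package is installed and uses an illustrative 2048-token window; any tokenizer that matches your model works the same way.

    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")

    prompt = "You are a helpful question answering bot. ... Question: Who is the president of the USA?"
    max_output_tokens = 256   # tokens reserved for the model's reply (the output part)
    context_window = 2048     # e.g. a GPT-3-sized window

    input_tokens = len(enc.encode(prompt))            # input part: instruction + question + chunks
    total = input_tokens + max_output_tokens          # full context window usage
    print(f"input={input_tokens}, output budget={max_output_tokens}, total={total}")
    if total > context_window:
        print("Request will not fit: trim context chunks or reduce the output budget.")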