Language Models and Their Limitations

Learning Objectives

By the end of this module you will understand:

What language models are
How modern Large Language Models (LLMs) generate text
The concepts of tokens, attention, and context windows
Why LLMs sometimes produce incorrect or fabricated information
The fundamental limitations of LLMs that lead to the need for Retrieval-Augmented Generation

This module explains how language models work internally and why they require external knowledge systems.

1. What is a Language Model?

A language model is a machine learning system trained to predict the next word in a sequence of text. At its core, a language model answers one question:

Given the previous words, what is the most probable next word?

Example:

Input: The capital of France is

Prediction: Paris

The model chooses Paris because during training it saw many examples where the sequence “capital of France” was followed by “Paris”.

Language models therefore learn statistical patterns in language.

2. Large Language Models (LLMs)

Modern systems like:

GPT-4
Claude 3
Llama 3

are called Large Language Models (LLMs). They are called “large” because:

They are trained on massive datasets
They contain billions or trillions of parameters
They learn complex patterns in language

These models can perform tasks such as:

answering questions
writing essays
summarizing documents
translating languages
generating code

Despite these impressive abilities, they still operate using the same fundamental mechanism: predicting tokens.

3. Tokens: How Models Read Text

Language models do not process text word by word. Instead they process tokens. A token is a small unit of text. It can be:

a word
part of a word
punctuation
a symbol

Example sentence: Artificial intelligence is transforming technology.

Possible tokenization:

Artificial
intelligence
is
transforming
technology
.

Another example: unbelievable

Might become:

un
believ
able

Tokens allow models to process language efficiently.

4. Context Windows

A context window defines how much text the model can consider at one time.

Example:

A model with a context window of: 8,000 tokens can only read 8,000 tokens of input at once. If the input exceeds that size, earlier tokens must be removed. Modern models have larger context windows.

For example:

GPT-4 supports very large contexts
Claude 3 can process extremely long documents

However, context is always limited. This means an LLM cannot read an entire knowledge base at once.

5. The Transformer Architecture

Most modern LLMs are built using the Transformer architecture. This architecture was introduced in 2017 and revolutionized natural language processing. Transformers rely on a mechanism called attention.

6. Attention: Understanding Relationships Between Words

Attention allows the model to decide which words in a sentence are important when predicting the next token.

Example sentence: The cat sat on the mat because it was tired. The word “it” refers to “cat”. Attention helps the model learn that relationship. Instead of reading text strictly left-to-right, attention allows the model to connect related words across the entire sentence.

This ability allows transformers to understand:

context
relationships
meaning

7. How Language Models Generate Responses

When you ask a question, the model performs the following steps:

Step 1 — Convert input into tokens
Step 2 — Process tokens through the neural network
Step 3 — Predict the most likely next token
Step 4 — Append the token to the response
Step 5 — Repeat until the response is complete

Example:

Input: What is the capital of France?

Output generation process:

The
capital
of
France
is
Paris
.

Each token is generated one step at a time.

8. Why LLMs Sometimes Produce Incorrect Answers

Because LLMs generate text based on probability, they sometimes produce answers that sound correct but are actually wrong. This phenomenon is called hallucination.

Example:

Question: What is the refund policy of my company?

If the model has never seen your company’s policy, it may invent an answer based on patterns from other companies. The model is not intentionally lying. It is simply predicting what a typical answer might look like.

9. Knowledge Limitations of LLMs

LLMs face several important knowledge limitations.

Static Knowledge

Models are trained once and then deployed. They do not automatically learn new information.

Example: A model trained in 2024 may not know events that occur in 2025.

No Access to Private Data

LLMs cannot access internal documents such as:
company policies
private databases
internal reports
confidential knowledge bases
Limited Context

Even with large context windows, LLMs cannot process millions of documents simultaneously. This creates a major challenge. How can a model answer questions about information stored in large knowledge bases?

10. The Core Problem

We now arrive at a critical realization:

Language models are excellent at generating explanations,
but they are not designed to retrieve information from large knowledge bases.

This creates a gap between generation and retrieval. To solve this problem, modern AI systems combine:

retrieval systems that search for relevant information
language models that interpret and explain the results

This approach is known as Retrieval-Augmented Generation.

Key Takeaways

Important ideas from this module:

Language models generate text by predicting tokens
Modern LLMs use the Transformer architecture
Tokens are the basic units used to process text
Context windows limit how much information a model can read
LLMs sometimes hallucinate because they generate answers probabilistically
Language models are not designed to retrieve knowledge from large document collections

These limitations motivate the need for retrieval-based AI systems.

Next Module

In the next module we will examine the knowledge retrieval problem in detail. We will explore:

why traditional search systems struggle with semantic understanding
how modern AI systems retrieve relevant information
the conceptual foundations behind Retrieval-Augmented Generation

Language Models and Their Limitations

Learning Objectives

1. What is a Language Model?

2. Large Language Models (LLMs)

3. Tokens: How Models Read Text

4. Context Windows

5. The Transformer Architecture

6. Attention: Understanding Relationships Between Words

7. How Language Models Generate Responses

8. Why LLMs Sometimes Produce Incorrect Answers

9. Knowledge Limitations of LLMs

Static Knowledge

No Access to Private Data

Limited Context

10. The Core Problem

Key Takeaways

Next Module