Foundations of AI Knowledge

Learning Objectives

By the end of this module you will understand:

What knowledge means in computer systems
The difference between data, information, and knowledge
How machines store and retrieve knowledge
Why traditional systems struggle with unstructured knowledge
Why modern AI systems require new approaches

This module builds the conceptual foundation required to understand Retrieval-Augmented Generation.

1. What is Knowledge?

Before learning RAG, we must first understand what knowledge means in computing systems. In simple terms:

Data → Raw facts
Information → Organized data
Knowledge → Meaningful understanding

Example:

Data:

Paris
France
Capital

Information: Paris is the capital of France.

Knowledge: Understanding that Paris functions as the political center of France.

Machines must transform raw data into usable knowledge.

2. Types of Knowledge in Computer Systems

There are two major categories of knowledge.

Structured Knowledge

Structured knowledge is organized in a predefined format.

Examples:

Databases
Tables
Spreadsheets

Example database table:

Country	Capital
France	Paris
Japan	Tokyo

This type of knowledge is easy for computers to query.

Example query:

SELECT capital FROM countries WHERE country = 'France'

The system returns: Paris

Structured data is easy to search but limited in flexibility.

Unstructured Knowledge

Most real-world knowledge is unstructured.

Examples:

Books
Research papers
PDFs
Emails
Documentation
Websites

Example paragraph:

Paris has served as the capital city of France since the 10th century and remains the country's political and cultural center.

Unlike structured data, this information cannot be easily queried using SQL. This creates a major challenge.

3. The Knowledge Retrieval Problem

If knowledge is stored as unstructured text, how do we retrieve it? Traditional search systems rely on keyword matching.

Example query: capital of france

A traditional search system scans documents and finds those containing the same words. This approach is used by many search engines including early versions of Google. However, keyword search has limitations.

Example:

Query: What city governs France?

A keyword system might fail because the words “capital” and “governs” are different. Humans understand the meaning. Machines struggle.

4. Semantic Understanding

Humans understand meaning, not just words.

Example: capital of france and which city governs france

These sentences have different words but identical meaning. Traditional search systems cannot easily recognize this relationship. Modern AI systems solve this using semantic representations. These representations allow machines to understand meaning instead of keywords.

This idea is the foundation for modern AI search systems.

5. Knowledge Systems

A complete knowledge system has three components:

Storage
Retrieval
Reasoning

Example:

Storage → documents, databases, knowledge bases
Retrieval → search systems that find relevant information
Reasoning → systems that interpret and explain knowledge

Traditional systems separate these components. Modern AI systems combine them.

6. The Rise of AI Knowledge Systems

Modern AI models can now reason over text. Examples include large language models such as:

GPT-4
Claude 3
Llama 3

These models can:

summarize documents
answer questions
generate explanations
analyze information

However, they have a major limitation. They do not have direct access to external knowledge sources. Understanding this limitation will lead us directly to the concept of RAG.

7. Key Takeaways

Important ideas from this module:

Knowledge systems must store, retrieve, and reason over information
Most real-world knowledge is unstructured
Traditional keyword search struggles with semantic meaning
Modern AI systems attempt to understand meaning rather than words
Large language models introduce new capabilities but also new challenges

These challenges lead directly to the need for Retrieval-Augmented Generation.

Next Module

In the next module we will explore how large language models work internally. Understanding how models like GPT-4 generate language will help explain why they need retrieval systems to access knowledge.