Lecture 08 — RAG, AI Agents & Agentic Multimodal Systems
~4–6 hours (modern AI systems design)
🧠 Why This Lecture Changes Everything
Models are not intelligent alone.
Systems are.
RAG and AI Agents represent a shift:
- ❌ From static models
- ✅ To interactive, grounded, tool-using intelligence
This lecture connects:
- LLMs
- Multimodality
- Memory
- Tools
- Reasoning
- Real-world deployment
🧩 What Is Retrieval-Augmented Generation (RAG)?
RAG = Knowledge + Reasoning
Instead of expecting the model to:
- memorize everything in its weights
- answer from memory alone, even when that means hallucinating confidently
We let it:
- Retrieve relevant information
- Reason over it
- Generate grounded answers
🔁 Classical LLM vs RAG
| Aspect | Classical LLM | RAG |
|---|---|---|
| Knowledge | Frozen | Dynamic |
| Hallucination | More likely | Less likely when retrieval is relevant |
| Updates | Retrain or fine-tune | Re-index documents |
| Traceability | Hard to trace | Answers can cite sources |
| Enterprise-ready | ❌ | ✅ |
RAG turns LLMs into “open-book thinkers.”
🧠 Core RAG Pipeline
User Query
↓
Embedding
↓
Retriever (Vector DB)
↓
Relevant Context
↓
LLM Reasoning
↓
Answer + Citations
📦 What Can Be Retrieved?
- 📄 Documents
- 🖼 Images
- 🎥 Videos
- 🧾 Tables
- 📊 Logs
- 🧠 Memories (Agent state)
Multimodal RAG = cross-modal retrieval + reasoning
🧠 Embeddings: The Heart of RAG
Embedding models map content to vectors, so that items with similar meaning end up close together.
Examples:
- Text: sentence-transformers
- Image: CLIP
- Video: InternVideo
- Document: Layout-aware embeddings
Improving retrieval often helps more than switching to a bigger model.
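To make "similar meaning, nearby vectors" concrete, here is a minimal sketch of similarity search. The `embed` function is a toy bag-of-words stand-in, not a real embedding model such as sentence-transformers or CLIP, and the documents are made up for illustration. Only the ranking logic (cosine similarity, then top-k) carries over to a real system.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve(query: str, docs: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:top_k]

# Illustrative corpus (hypothetical content).
docs = [
    "attention lets a transformer weigh tokens against each other",
    "vector databases store embeddings for fast similarity search",
    "bananas are rich in potassium",
]
print(retrieve("how does transformer attention work", docs, top_k=1))
```

A real embedding model would also match paraphrases that share no words with the query, which this word-overlap toy cannot do.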
🐍 Python: Minimal RAG Example
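    # Placeholders: embedder, vector_db, and llm stand for whichever
    # embedding model, vector store, and LLM client you use.
    query = "What is transformer attention?"
    q_emb = embedder.encode(query)             # embed the query
    docs = vector_db.search(q_emb, top_k=5)    # assumed to return a list of strings
    context = "\n".join(docs)
    answer = llm.generate(
        prompt=f"Answer using the context below:\n{context}\n\nQuestion: {query}"
    )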
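The pipeline ends in "Answer + Citations", which needs a prompt that lets the model point back to its sources. One possible sketch is to number each retrieved passage before building the prompt. The function name `build_prompt` and the instruction wording are assumptions for illustration, not a fixed recipe.

```python
def build_prompt(query: str, docs: list[str]) -> str:
    """Number each retrieved passage so the answer can cite it as [1], [2], ..."""
    sources = "\n".join(f"[{i}] {doc}" for i, doc in enumerate(docs, start=1))
    return (
        "Answer using only the numbered sources below. "
        "Cite the sources you use, for example [1].\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {query}"
    )

# Illustrative passages (hypothetical content).
prompt = build_prompt(
    "What is transformer attention?",
    [
        "Attention weighs tokens against each other.",
        "Transformers stack attention layers.",
    ],
)
print(prompt)
```

Because the numbers map back to retrieved passages, citations in the answer can later be checked against what was actually retrieved.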
⚠️ Common RAG Failure Modes
- Retrieving irrelevant chunks
- Context too long
- Context ignored
- Conflicting documents
- Over-trusting retrieved text
Mitigation:
- Chunking strategy
- Reranking
- Instruction tuning
- Answer verification
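As one example of a chunking strategy, the sketch below splits text into fixed-size windows that overlap, so a sentence cut at a boundary still appears whole in a neighboring chunk. It counts words for simplicity. Real systems often count tokens or split on document structure, and the default sizes here are arbitrary assumptions.

```python
def chunk_words(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into chunks of `size` words, each sharing `overlap` words
    with the previous chunk."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):   # this chunk reached the end of the text
            break
    return chunks

text = " ".join(f"w{i}" for i in range(10))   # "w0 w1 ... w9"
for chunk in chunk_words(text, size=4, overlap=1):
    print(chunk)
```

Smaller chunks make retrieval more precise but carry less context each, so chunk size is usually tuned against retrieval quality.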
🤖 What Is an AI Agent?
An agent is an LLM that can act.
Agent abilities:
- Decide next action
- Use tools
- Store memory
- Observe outcomes
- Iterate
🧠 Agent Loop (Canonical)
Observe → Think → Act → Reflect → Repeat
This is not a single prompt. It is control flow: the loop runs in code around the model.
🧩 Agent Components
| Component | Role |
|---|---|
| LLM | Reasoning |
| Memory | State |
| Tools | Actions |
| Planner | Decomposition |
| Executor | Tool calling |
| Critic | Self-evaluation |
🛠 Tools an Agent Can Use
- Search engines
- Databases
- Code execution
- APIs
- OCR
- Vision models
- File systems
Tools extend intelligence beyond tokens.
🐍 Python: Simple Agent Skeleton
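    # Placeholders: llm, planner, tools, and state are assumed to exist.
    task_done = False
    while not task_done:
        thought = llm.think(state)          # reason about the current state
        action = planner.select(thought)    # pick the next action
        result = tools.run(action)          # execute it with a tool
        state.update(result)                # record the observation
        task_done = state.is_done()         # hypothetical completion check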
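The skeleton can be turned into something runnable by swapping in toy parts. In the sketch below, `toy_policy` is a hand-written stand-in for the LLM and planner, and the two tools are trivial string functions. All names are hypothetical and chosen only to show the control flow, including a step limit so the loop cannot run forever.

```python
from typing import Callable, Optional, Tuple

# Tool registry: name -> function. Both tools are illustrative.
TOOLS: dict[str, Callable[[str], str]] = {
    "word_count": lambda text: str(len(text.split())),
    "shout": lambda text: text.upper(),
}

def toy_policy(state: dict) -> Optional[Tuple[str, str]]:
    """Stand-in for LLM + planner: pick the next (tool, input), or None when done."""
    for tool_name in ("word_count", "shout"):
        if tool_name not in state["memory"]:
            return (tool_name, state["task"])
    return None

def run_agent(task: str, max_steps: int = 5) -> dict:
    state = {"task": task, "memory": {}, "steps": 0}
    while state["steps"] < max_steps:            # step limit guards against loops
        action = toy_policy(state)               # think: decide the next action
        if action is None:                       # nothing left to do
            break
        tool_name, tool_input = action
        result = TOOLS[tool_name](tool_input)    # act: call the tool
        state["memory"][tool_name] = result      # observe: store the outcome
        state["steps"] += 1
    return state

print(run_agent("observe think act"))
```

In a real agent the policy would be a model call that reads the state and returns a structured action. The surrounding loop stays much the same.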
🧠 What Is Agentic AI?
Agentic AI means:
- Long-horizon goals
- Autonomous planning
- Self-correction
- Tool orchestration
- Memory persistence
Examples:
- Research agents
- Coding agents
- Multimodal assistants
- Automated data analysts
🔗 RAG + Agents = Power
RAG answers questions. Agents decide what to retrieve and why.
Agent
├── Query RAG
├── Verify answer
├── Ask follow-up
├── Use tools
└── Deliver result
Many production AI systems today follow this pattern.
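The tree above can be sketched as a small loop: query the retriever, check whether anything supports an answer, and reformulate the query if not. Everything here is a toy assumption. `KNOWLEDGE` stands in for a vector store, `REWRITES` stands in for an LLM-generated follow-up query, and "verification" is reduced to checking that at least one passage came back.

```python
# Toy knowledge base standing in for a vector store (illustrative content).
KNOWLEDGE = {
    "rag": "RAG retrieves documents and passes them to the model as context.",
    "agent": "An agent is an LLM that can choose actions and call tools.",
}
# Hard-coded rewrites standing in for an LLM-generated follow-up query.
REWRITES = {"retrieval-augmented generation": "rag"}

def retrieve(query: str) -> list[str]:
    """Toy retriever: return passages whose key appears in the query."""
    q = query.lower()
    return [text for key, text in KNOWLEDGE.items() if key in q]

def reformulate(query: str) -> str:
    """Follow-up step: rewrite the query using known synonyms."""
    q = query.lower()
    for phrase, short in REWRITES.items():
        q = q.replace(phrase, short)
    return q

def answer_with_agent(question: str, max_rounds: int = 2) -> dict:
    query = question
    for round_no in range(1, max_rounds + 1):
        passages = retrieve(query)          # query RAG
        if passages:                        # verify: is there supporting evidence?
            return {"answer": passages[0], "sources": passages, "rounds": round_no}
        query = reformulate(query)          # ask a follow-up and try again
    return {"answer": None, "sources": [], "rounds": max_rounds}

print(answer_with_agent("What is retrieval-augmented generation?"))
```

The point of the sketch is the decision structure: the agent, not the retriever, decides when the evidence is good enough and when to try a different query. Returning no answer when nothing is found is a deliberate choice over guessing.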
🧠 Multimodal Agent Example
Task:
“Analyze this traffic video and explain why the accident occurred.”
Agent flow:
- Extract video frames
- Retrieve traffic rules (RAG)
- Detect events
- Reason about causality
- Generate explanation
⚠️ Risks of Agentic Systems
- Tool misuse
- Infinite loops
- Overconfidence
- Hidden failures
- Alignment drift
Mitigation:
- Guardrails
- Cost limits
- Human approval
- Logging
- Evaluation
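Several of these mitigations can live in one wrapper around tool calls. The sketch below is one possible shape, with hypothetical names throughout (`GuardedExecutor`, `BudgetExceeded`, the toy tools). It enforces an allowlist (guardrails), a call budget (cost limits), an approval callback for risky tools (human approval), and logs every executed call.

```python
import logging
from typing import Callable

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent")

class BudgetExceeded(RuntimeError):
    """Raised when the agent has used up its tool-call budget."""

class GuardedExecutor:
    """Wraps tool calls with an allowlist, a call budget, approval, and logging."""

    def __init__(self, tools: dict[str, Callable[[str], str]],
                 needs_approval: set[str], max_calls: int,
                 approve: Callable[[str, str], bool]) -> None:
        self.tools = tools                    # only registered tools may run
        self.needs_approval = needs_approval  # tools that require human sign-off
        self.max_calls = max_calls
        self.approve = approve
        self.calls = 0

    def run(self, name: str, arg: str) -> str:
        if self.calls >= self.max_calls:
            raise BudgetExceeded(f"limit of {self.max_calls} tool calls reached")
        if name not in self.tools:
            raise PermissionError(f"tool {name!r} is not on the allowlist")
        if name in self.needs_approval and not self.approve(name, arg):
            raise PermissionError(f"reviewer rejected call to {name!r}")
        self.calls += 1
        log.info("call %d: %s(%r)", self.calls, name, arg)
        return self.tools[name](arg)

executor = GuardedExecutor(
    tools={"echo": lambda s: s, "delete_file": lambda path: f"deleted {path}"},
    needs_approval={"delete_file"},
    max_calls=2,
    approve=lambda name, arg: False,   # stand-in for a real human approval step
)
print(executor.run("echo", "hello"))
```

Rejected calls raise an exception instead of failing silently, which addresses the "hidden failures" risk: the agent loop has to handle the refusal explicitly.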
📏 Evaluating RAG & Agents
RAG Evaluation
- Retrieval recall
- Faithfulness
- Answer correctness
- Citation accuracy
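Two of these metrics are simple enough to compute directly. The sketch below uses one common reading of each: recall@k as the share of relevant documents found in the top k results, and citation accuracy as the share of cited documents that actually support the answer. Other definitions exist, and the document IDs are made up for illustration.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of relevant document IDs that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = relevant & set(retrieved[:k])
    return len(hits) / len(relevant)

def citation_accuracy(cited: list[str], supporting: set[str]) -> float:
    """Fraction of cited document IDs that actually support the answer."""
    if not cited:
        return 0.0
    return sum(1 for doc_id in cited if doc_id in supporting) / len(cited)

# Hypothetical IDs: the retriever returned d1, d7, d3; d1, d3, d9 were relevant.
retrieved = ["d1", "d7", "d3"]
relevant = {"d1", "d3", "d9"}
print(recall_at_k(retrieved, relevant, k=2))
print(recall_at_k(retrieved, relevant, k=3))
print(citation_accuracy(["d1", "d7"], supporting={"d1", "d3"}))
```

Faithfulness and answer correctness are harder to score automatically and usually need human labels or a separate judging model.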
Agent Evaluation
- Task success rate
- Step efficiency
- Error recovery
- Human satisfaction
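The first three agent metrics can be aggregated from run logs. The sketch below assumes each run is recorded as an `Episode` with a success flag, a step count, and error counts. This record layout is a hypothetical one chosen for illustration, and the numbers are made up.

```python
import math
from dataclasses import dataclass

@dataclass
class Episode:
    success: bool    # did the agent complete the task?
    steps: int       # number of actions taken
    errors: int      # tool errors encountered
    recovered: int   # errors the agent recovered from

def summarize(episodes: list[Episode]) -> dict[str, float]:
    """Aggregate task success rate, step efficiency, and error recovery."""
    if not episodes:
        raise ValueError("need at least one episode")
    wins = [e for e in episodes if e.success]
    total_errors = sum(e.errors for e in episodes)
    return {
        "success_rate": len(wins) / len(episodes),
        # Step efficiency: average steps, counted over successful runs only.
        "mean_steps_on_success": (
            sum(e.steps for e in wins) / len(wins) if wins else math.nan
        ),
        # Undefined (NaN) when no errors occurred at all.
        "error_recovery_rate": (
            sum(e.recovered for e in episodes) / total_errors
            if total_errors else math.nan
        ),
    }

runs = [Episode(True, 4, 1, 1), Episode(True, 6, 0, 0), Episode(False, 10, 2, 0)]
print(summarize(runs))
```

Human satisfaction does not reduce to a log field and is typically collected through ratings or review.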
🧠 Research Insight
Intelligence is no longer inside the model.
It is distributed across systems.
The future:
- Smaller models
- Better retrieval
- Smarter agents
- Human oversight
🧪 Student Knowledge Check
Q1 — Objective
What problem does RAG primarily solve?
Answer
Hallucination and static knowledge.
Q2 — MCQ
Which is NOT a core agent component?
- A. Memory
- B. Planner
- C. Tool interface
- D. Dataset labeler
Answer
D. Dataset labeler
Q3 — MCQ
Why combine RAG with agents?
- A. Reduce cost
- B. Improve UI
- C. Enable decision-driven retrieval
- D. Increase model size
Answer
C. Enable decision-driven retrieval
Q4 — Objective
What is agentic AI?
Answer
AI systems that plan, act, use tools, and self-correct toward goals.
Q5 — Objective
Why is human oversight important for agents?
Answer
To prevent unsafe, incorrect, or misaligned actions.
🌱 Final Reflection
If AI agents can act autonomously, what must humans always control?
Goals, values, boundaries, and accountability.
✅ Key Takeaways
- RAG grounds intelligence
- Agents enable action
- Agentic AI is system-level intelligence
- Multimodal agents are the future
- Humans must remain in the loop