Guest Lecture @ University of Moratuwa - LLM Penetration Testing
Delivered a session on LLM Penetration Testing — Breaking AI Systems for final-year Cyber Security students at University of Moratuwa.
🚨 Why LLM Security Matters
- LLMs don’t understand security — only text prediction
- 12+ major incidents (2023–2024)
- Attacks require zero technical skill
Natural language is the new attack surface.
🔍 LLM Attack Surface
- User Input → Prompt Injection
- System Prompt → Prompt Extraction
- LLM Core → Hallucination
- Tools → Command Injection
- External Data → Indirect Injection
⚠️ Real Incidents
- Samsung (2023): Source code leaked via ChatGPT
- Bing Chat: System prompt extracted (“Sydney”)
- DPD Bot: Manipulated to damage its own brand
💡 All attacks used simple prompts — not exploits.
🧠 Key Insight
LLMs treat all input equally
(System vs User vs Malicious)
🛡️ What Actually Works
- Allowlist actions
- Sandbox execution
- Human approval
- Input/output validation
- Least privilege
Prompts are NOT a security boundary.
🔴 Live Demo (AI Bot Exploit)
-
Bypassed 3 layers:
- Input WAF
- LLM Guardrails
- Command Filter
-
Used:
- Prompt injection
- Roleplay / impersonation
- Command tag abuse
[CMD]...[/CMD]
➡️ Result: Full system compromise
📌 Takeaway
AI systems are not hacked — they are manipulated.
Security must be enforced at the application layer, not the prompt.