From 162K Words to 226K Lessons: How We Built the Orb Platform
In March 2026, the Orb Platform serves 162,000 words across 47 languages, 226,000 structured lessons, 21,000 assessments, and 30,000 knowledge graph connections — all from Cloudflare's edge network at median latencies under 5ms. This is the technical story of how we got here.
Architecture Overview
The Orb Platform runs entirely on Cloudflare's developer platform:
- 32 Cloudflare Workers — Stateless compute at the edge, handling routing, API logic, content generation, and page rendering
- 4 D1 databases — SQLite-based storage, one each for words, lessons, assessments, and the knowledge graph
- 14 R2 buckets — Object storage for pronunciation audio (240,000 files), word images, Kelly avatar assets, and backups
- KV namespaces — Edge caching for hot paths (word lookups, pronunciation audio URLs)
- Cloudflare AI — On-demand inference for content classification and image generation
There is no origin server. No EC2 instance. No Kubernetes cluster. Every request is handled at the edge, typically within 10km of the user. The median response time for a word lookup is 3ms.
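All of those pieces attach to a Worker through its configuration. A hypothetical, heavily trimmed `wrangler.toml` for a single Worker is sketched below; the binding and resource names are illustrative, and the real platform splits this across 32 Workers:

```toml
name = "orb-word-api"
main = "src/index.ts"
compatibility_date = "2026-03-01"

# D1: one of the four SQLite databases
[[d1_databases]]
binding = "WORDS_DB"
database_name = "words"
database_id = "..." # elided

# R2: pronunciation audio bucket
[[r2_buckets]]
binding = "AUDIO"
bucket_name = "pronunciation-audio"

# KV: edge cache for hot word lookups
[[kv_namespaces]]
binding = "WORDS_KV"
id = "..." # elided

# Workers AI, used only in the authoring pipeline
[ai]
binding = "AI"

# content queue drain, every 5 minutes
[triggers]
crons = ["*/5 * * * *"]
```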
The Data Pipeline
Words: From Dictionary to API
The Word Orb database contains 162,000 words. Each word entry includes:
- Verified English definition
- Part of speech and IPA pronunciation
- Etymology
- Translations in up to 47 languages (native script + transliteration)
- Classification (tier, domain, complexity score)
- AI-generated visual aid (stored in R2)
- Pronunciation audio (stored in R2)
The pipeline for each word follows a multi-stage process. First, the word enters the content queue — either from a direct lookup (if the word is not in the database) or from batch processing. The scheduled Worker picks up queued words every 5 minutes, generates the full data package using Cloudflare AI for definitions and classifications, stores the result in D1, and uploads assets to R2.
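The scheduled stage of that pipeline looks roughly like the sketch below. This is a simplified illustration, not the production code: the binding names (`CONTENT_DB`, `ASSETS`, `AI`), table names, batch size, and model ID are all assumptions.

```typescript
// Hypothetical sketch of the 5-minute content pipeline (cron-triggered Worker).
// Binding names, table names, and the model ID are illustrative assumptions.
export interface Env {
  CONTENT_DB: {
    prepare(sql: string): {
      bind(...values: unknown[]): {
        all<T>(): Promise<{ results: T[] }>;
        run(): Promise<unknown>;
      };
      all<T>(): Promise<{ results: T[] }>;
    };
  };
  ASSETS: { put(key: string, value: string): Promise<unknown> }; // R2 bucket
  AI: { run(model: string, input: unknown): Promise<{ response: string }> };
}

export async function processQueue(env: Env): Promise<number> {
  // 1. Pull the next batch of queued words (direct-lookup misses + batch jobs).
  const { results } = await env.CONTENT_DB
    .prepare("SELECT word FROM content_queue ORDER BY queued_at LIMIT 50")
    .all<{ word: string }>();

  for (const { word } of results) {
    // 2. Author the full data package once, using an AI model.
    const { response } = await env.AI.run("@cf/meta/llama-3-8b-instruct", {
      prompt: `Define, classify, and annotate the word "${word}" as JSON.`,
    });

    // 3. Persist to D1 — the database, not the model, is the source of truth.
    await env.CONTENT_DB
      .prepare("INSERT OR REPLACE INTO words (word, data) VALUES (?1, ?2)")
      .bind(word, response)
      .run();

    // 4. Upload generated assets to R2 (audio and image handling elided).
    await env.ASSETS.put(`words/${word}.json`, response);

    // 5. Dequeue the processed word.
    await env.CONTENT_DB
      .prepare("DELETE FROM content_queue WHERE word = ?1")
      .bind(word)
      .run();
  }
  return results.length;
}
```

In a real deployment this would run behind a `[triggers] crons = ["*/5 * * * *"]` entry in `wrangler.toml`.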
The critical design decision was to make the database the source of truth, not the AI model. Once a word is generated and verified, it serves from D1 forever. The AI is used for authoring, not serving. This means responses are deterministic — the same word returns the same data every time — and we can guarantee content quality because we have reviewed what we serve.
Lessons: The 5-Phase Pedagogical Structure
Every lesson in the Orb Platform follows a 5-phase structure refined over 20 years of education research:
- Hook — Grab attention in 2-3 sentences. Create curiosity, not confusion.
- Story — Teach through narrative in 4-6 sentences. Humans remember stories, not facts.
- Wonder — Spark the "why" in 3-5 sentences. Transform passive reception into active inquiry.
- Action — Something to try right now in 3-5 sentences. Learning requires doing.
- Wisdom — Land the takeaway in 2-3 sentences. What will the learner carry forward?
This structure is not arbitrary. It maps to well-established learning science: attention capture (Hook), narrative memory encoding (Story), metacognitive activation (Wonder), experiential learning (Action), and long-term memory consolidation (Wisdom). Each phase targets a different cognitive process, and the sequence is designed to build on the previous phase.
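One way to picture a lesson record and its per-phase sentence budgets is the sketch below. The field names and the validator are hypothetical; only the five phases and their sentence ranges come from the structure above.

```typescript
// Hypothetical record shape for one lesson; field names are illustrative,
// but the five phases and their sentence budgets come from the text above.
interface Lesson {
  concept: string;
  track: "learn" | "grow" | "teach";
  ageGroup: "kid" | "teen" | "adult" | "elder";
  phases: {
    hook: string;   // 2-3 sentences: attention capture
    story: string;  // 4-6 sentences: narrative memory encoding
    wonder: string; // 3-5 sentences: metacognitive activation
    action: string; // 3-5 sentences: experiential learning
    wisdom: string; // 2-3 sentences: memory consolidation
  };
}

type Phase = keyof Lesson["phases"];

// Per-phase sentence budgets, as [min, max].
const BOUNDS: Record<Phase, [number, number]> = {
  hook: [2, 3],
  story: [4, 6],
  wonder: [3, 5],
  action: [3, 5],
  wisdom: [2, 3],
};

// Crude sentence counter — good enough for a budget check.
function sentenceCount(text: string): number {
  return (text.match(/[.!?]+(?=\s|$)/g) ?? []).length;
}

// Return the names of any phases that fall outside their budget.
function phasesOutOfBudget(lesson: Lesson): Phase[] {
  return (Object.keys(BOUNDS) as Phase[]).filter((phase) => {
    const n = sentenceCount(lesson.phases[phase]);
    const [min, max] = BOUNDS[phase];
    return n < min || n > max;
  });
}
```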
The 226,000 lessons are generated across 3 tracks (Learn, Grow, Teach), four age groups (kid, teen, adult, elder), and 10 teaching archetypes that rotate daily. The archetype system ensures instructional variety — a Scientist archetype teaches through evidence and experiments, while a Storyteller archetype uses narrative and analogy. Same concept, different pedagogical approach each day.
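The daily rotation can be sketched as a pure function. The selection scheme here (days since the Unix epoch, modulo the archetype count) is an assumption; the real logic is not described in this post.

```typescript
// Sketch of the daily archetype rotation. The modulo-over-days scheme is an
// assumption, not the platform's published algorithm.
function archetypeForDay(archetypes: readonly string[], date: Date): string {
  const MS_PER_DAY = 86_400_000;
  const daysSinceEpoch = Math.floor(date.getTime() / MS_PER_DAY); // UTC day index
  return archetypes[daysSinceEpoch % archetypes.length];
}
```

With the full list of 10 archetypes, every concept cycles through all ten pedagogical approaches every ten days.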
The Knowledge Graph: 30,000 Connections
The Knowledge Graph connects words to lessons to assessments. When a user looks up "photosynthesis," the graph returns not just the definition but which lessons teach it, which assessments test it, and which related words (chlorophyll, carbon dioxide, light energy) form a learning cluster.
The graph is stored in D1 as an adjacency list with weighted edges. Connections are typed: word-to-word (semantic similarity), word-to-lesson (appears in), lesson-to-assessment (tests), and cross-language (translation equivalence). The weighting allows the API to return the most relevant connections first, enabling AI agents to build coherent learning paths rather than random word lists.
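An in-memory sketch of that typed, weighted adjacency list is below. In D1 this would be a table along the lines of `edges(src, dst, edge_type, weight)`; every name in the sketch is illustrative.

```typescript
// In-memory sketch of the knowledge graph's typed, weighted adjacency list.
// All names are illustrative; D1 stores this as an edges table.
type EdgeType = "semantic" | "appears_in" | "tests" | "translation";

interface Edge {
  dst: string;
  type: EdgeType;
  weight: number; // relevance; higher = stronger connection
}

type Graph = Map<string, Edge[]>;

// Return the k most relevant neighbors, strongest edges first — the ordering
// that lets an agent build a coherent learning path instead of a random list.
function topConnections(graph: Graph, node: string, k: number): Edge[] {
  return [...(graph.get(node) ?? [])]
    .sort((a, b) => b.weight - a.weight)
    .slice(0, k);
}
```

For "photosynthesis", the strongest edges might be semantic links to "chlorophyll" and an appears-in link to a biology lesson, so those surface first.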
The Sovereign Infrastructure Thesis
The Orb Platform does not depend on any single AI provider. This is a deliberate architectural decision, not an accident of technology choice.
Consider the dependency chain of a typical EdTech API: content generated by GPT-4 (OpenAI), served from AWS (Amazon), translated by Google Translate API (Google), with audio from ElevenLabs. If OpenAI changes its content policy, your definitions change. If AWS has an outage, your service is down. If Google deprecates a translation endpoint, your multilingual support breaks.
The Orb Platform uses AI models during the authoring phase but stores all generated content in our own D1 databases. Audio pronunciation is generated locally using Kokoro TTS running on our own RTX 5090 hardware, then uploaded to R2. Translations are generated using Groq's free tier, then stored permanently. The models are tools in the authoring pipeline, not runtime dependencies.
This means we can survive any single provider shutting down, changing pricing, or altering their content policy. Our data is ours. Our infrastructure is ours. Our mission cannot be blocked by a vendor decision.
Edge Performance: Why 5ms Matters
Cloudflare Workers execute in every one of Cloudflare's 300+ data centers worldwide. When a robot in Tokyo makes an API call, it hits the Tokyo edge node. When a chatbot in São Paulo queries a word, it hits São Paulo. There is no round-trip to us-east-1.
The practical impact: a word lookup takes 3ms median, 8ms at P99. A full lesson retrieval (5 phases, metadata, audio URLs) takes 5ms median. These numbers matter when your product is a real-time conversation between a robot and a human. A 200ms API call is a noticeable pause. A 3ms API call is invisible.
We achieve this through aggressive caching (KV for hot words, R2 for audio, CDN for images) and by keeping the compute simple — database reads, not model inference. The Worker code for a word lookup is essentially: read from KV cache → if miss, read from D1 → return JSON. No chained API calls, no model inference, no complex orchestration.
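That hot path can be sketched as a small handler. The binding names (`WORDS_KV`, `WORDS_DB`), the table, and the cache TTL are assumptions for illustration.

```typescript
// Hypothetical sketch of the lookup hot path: KV cache -> D1 -> JSON.
// Binding names, the words table, and the 24h TTL are assumptions.
export interface Env {
  WORDS_KV: {
    get(key: string): Promise<string | null>;
    put(key: string, value: string, opts?: { expirationTtl?: number }): Promise<void>;
  };
  WORDS_DB: {
    prepare(sql: string): {
      bind(...values: unknown[]): { first<T>(): Promise<T | null> };
    };
  };
}

const JSON_HEADERS = { "content-type": "application/json" };

export async function lookupWord(word: string, env: Env): Promise<Response> {
  const key = `word:${word.toLowerCase()}`;

  // 1. Hot path: serve straight from the edge KV cache.
  const cached = await env.WORDS_KV.get(key);
  if (cached) return new Response(cached, { headers: JSON_HEADERS });

  // 2. Cache miss: read the authored record from D1.
  const row = await env.WORDS_DB
    .prepare("SELECT data FROM words WHERE word = ?1")
    .bind(word.toLowerCase())
    .first<{ data: string }>();
  if (!row) {
    return new Response(JSON.stringify({ error: "not found" }), {
      status: 404,
      headers: JSON_HEADERS,
    });
  }

  // 3. Populate the cache for the next request, then return.
  await env.WORDS_KV.put(key, row.data, { expirationTtl: 86_400 });
  return new Response(row.data, { headers: JSON_HEADERS });
}
```

No chained API calls and no inference on this path: the slowest step is a single D1 read, which is what keeps the P99 in single-digit milliseconds.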
Scaling to 8 Billion
The Orb Platform is designed for a world where every robot, every educational app, and every AI agent needs multilingual language capability. The architecture scales horizontally by default — Cloudflare Workers auto-scale to any request volume, D1 replicates globally, and R2 serves assets from the nearest edge.
The content pipeline scales linearly: more languages require more generation cycles, but the per-word cost is fixed and the infrastructure cost is near-zero thanks to Cloudflare's generous free tier for Workers and D1. We currently generate content at approximately 28,800 lessons per day across 18 non-English languages (about 1,600 per language per day, or one new lesson every three seconds).
The next frontier is offline delivery. We are building iLearn — a purpose-built educational computer that ships with the full Orb database pre-loaded. A child in a village without internet connectivity gets the same 162,000 words, 226,000 lessons, and 21,000 assessments as a student in Manhattan. The edge becomes the device itself.
This is infrastructure built for the long game. Not a startup looking for an exit. A public benefit corporation building language infrastructure that will serve 8 billion learners — and every robot that teaches them — for as long as language matters.