This winter, I shipped Glyf, a spaced repetition platform for learning writing systems: Greek and Hangul, with more to come. On the surface it looks like a fairly standard flashcard app. You open it, study a few characters, and come back tomorrow.
Under the hood, getting there required figuring out how to use AI tools not just as autocomplete-on-steroids, but as a genuine development system with real architecture decisions behind it. This post is about that process: what worked, what didn't, and what I'd do differently.
Start with what you're not building
The first and most important decision I made for Glyf had nothing to do with code. It was about scope.
Glyf covers syllabaries, abjads, and alphabetic scripts. It explicitly does not cover logographic systems like Kanji. That's not an oversight — Kanji has hundreds of characters with layered semantic meaning, and there are already excellent dedicated resources for it. Trying to include it in Glyf would've meant either building something mediocre or spending months on a problem that was already solved.
That kind of deliberate exclusion matters more when you're working with AI tools. Without clear scope, an AI assistant will happily help you build in every direction at once. Establishing what Glyf was not going to do gave every subsequent conversation a boundary to work within.
The tool stack and why
This project is the result of many months of playing around with every AI tool I could get my hands on — building new projects, cleaning up old ones, experimenting with what these tools actually do well versus what they just make look easy. Stepping into 2026, I wanted to build with purpose rather than novelty. The hype is still very much ongoing, but I think this year more than any before it, we're going to want to see real return on the investment in AI tools. Yes, they can code fast and tackle almost anything — but how do they fit into an existing workflow? How do you get consistent output that actually delivers value? That's the question Glyf was built to answer, at least for me.
Three tools did most of the work:
Pencil Dev for UI design. It's a native desktop design app with an agent (Opus 4.5) built in, which made it easy to spec out what I wanted through conversation rather than fighting with a complex design tool. For a solo developer, the speed of getting to something usable matters more than pixel-perfect Figma fidelity.
Claude Code for implementation. The agentic coding environment that handles the full loop: reading the codebase, writing code, running tools to verify the output, and cleaning up after itself.
Coda for documenting the process — decisions, dead ends, what changed and why.
The backend runs on Deno with Express, the frontend on SvelteKit, with PostgreSQL for persistence and PostHog for analytics. I chose Deno because its runtime is written in Rust on top of V8, which means fast startup and efficient request handling with a small memory footprint. It supports TypeScript natively, with no configuration needed. And when it came to deploying via Docker, the final image was noticeably leaner than a comparable Node setup. For a project like this, where the backend serves spaced repetition state and progress data rather than doing heavy computation, that efficiency profile is exactly what you want.
Design to code via MCP
One of the more interesting new tools I got to try on this project was Pencil. It raises the bar for going from design to code, and for someone who doesn't consider himself a designer, getting a helping hand with the visual language of the page was an extra win. Moving the Pencil design into the codebase was straightforward: Pencil ships with an MCP server that Claude Code can connect to, which means you can reference the design file directly from your terminal.
The command that kicked off the implementation of the landing page looked something like this:
❯ with pencil mcp get_screenshot of Landing page in @../docs/ui.pen
and make a plan to replace current landing page /routes/+page.svelte
with the new one
What happened next was more thorough than I expected. Claude batched through all the nodes in the design file, found the one named "Landing Page," extracted the ID, then pulled a detailed screenshot of that specific node. It generated an implementation plan — and then flagged that there was a color in the design that didn't exist yet in the CSS variables file, and asked whether to hard-code it or add a new variable to follow existing conventions.
That's a small thing, but it's the right question to ask. It meant the implementation would be consistent with the rest of the project rather than introducing a one-off value.
You can also watch Claude's attention move through the design file in real time in the Pencil desktop app — it highlights the nodes it's looking at as it works.
The algorithm that made everything harder
The core of Glyf is the SM-2 spaced repetition algorithm, published for SuperMemo in the late 1980s and the basis of Anki's scheduler ever since. The way it works: each card has an "easiness factor" that adjusts based on your performance. If you recall something easily, the interval before you see it again grows. If you struggle, the interval shrinks. Cards aren't just pass/fail: you rate yourself on a 0–5 quality scale, and the algorithm uses that gradient to calculate individual intervals per card.
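The update rule itself is compact. Here's a sketch of classic SM-2 in TypeScript; the names are illustrative, not Glyf's actual code. The easiness factor shifts with each 0–5 rating, and intervals grow from 1 day to 6 days to the previous interval times the easiness factor.

```typescript
// Illustrative sketch of classic SM-2 (not Glyf's internal code).
interface CardState {
  easiness: number;    // "easiness factor": starts at 2.5, floored at 1.3
  repetitions: number; // consecutive successful recalls
  interval: number;    // days until the next review
}

// quality: the 0-5 self-rating described above
function sm2Review(card: CardState, quality: number): CardState {
  // Adjust the easiness factor using the standard SM-2 formula
  const easiness = Math.max(
    1.3,
    card.easiness + (0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02)),
  );

  if (quality < 3) {
    // Failed recall: restart the learning sequence, review again tomorrow
    return { easiness, repetitions: 0, interval: 1 };
  }

  const repetitions = card.repetitions + 1;
  const interval =
    repetitions === 1 ? 1 :
    repetitions === 2 ? 6 :
    Math.round(card.interval * easiness);

  return { easiness, repetitions, interval };
}
```

Two perfect recalls in a row take a fresh card (easiness 2.5) to a 6-day interval with easiness 2.7; a single miss at any point resets it to tomorrow while keeping the lowered easiness, which is what makes the per-card gradient work.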
The user-facing version of this is a four-state journey: New → Learning → Good → Mastered, with a Difficult state that can interrupt that path if you miss a recall.
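The journey can be sketched as a small state machine. This is an illustrative version, not Glyf's actual code: I'm assuming here that a quality rating below 3 (a miss, in SM-2 terms) triggers the Difficult state, and that a successful recall advances one stage or rejoins the main path.

```typescript
// Illustrative four-state journey, with Difficult as the interrupt state.
// Thresholds and the recovery path are assumptions, not Glyf's exact rules.
type CardStage = "New" | "Learning" | "Good" | "Mastered" | "Difficult";

function nextStage(current: CardStage, quality: number): CardStage {
  if (quality < 3) return "Difficult"; // a miss interrupts the path at any point
  switch (current) {
    case "New":       return "Learning";
    case "Learning":  return "Good";
    case "Good":      return "Mastered";
    case "Difficult": return "Learning"; // recovering rejoins the main path
    case "Mastered":  return "Mastered";
  }
}
```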
Getting this to feel coherent across all those states — not just technically correct but communicatively right — was the hardest problem in the project. What does the UI show when you have no cards? When you've learned some but not all? When you've reached Mastered but your recall is slipping? Each state has a different emotional valence and needs different feedback.
The answer wasn't to prompt my way through it. It was to fully spec the feature before writing a line of code.
Context architecture, not just prompting
The most important shift in how I worked on this project came from treating context as something you design, not something that accumulates.
I used AgentOS — a set of pre-configured markdown files that structure how an agent understands a project — to create detailed feature specifications before implementing anything. For the spaced repetition system specifically, I wrote out the full user journey: every state, every transition, every edge case. What happens when a user is at the beginning of the learning phase versus the middle. How the platform communicates each of those states.
This sounds like a lot of upfront work, but it meant that when I handed the implementation to Claude Code, the context window was loaded with the right information at the right granularity. The agent wasn't guessing about what "done" meant. It had a spec to work against and could confirm when each piece was working.
The payoff was also downstream. When bugs appeared — and they did — there was documentation to trace back to. When I wanted to add a feature later, the foundation was already described in agent-readable terms.
What a cheaper model actually costs you
Partway through the project I ran a parallel experiment: implementing the same UI changes in Cursor using smaller, cheaper models versus Claude Code using a frontier Claude model.
The smaller models finished faster. They also introduced more bugs.
The pattern was consistent. A cheaper model working on a UI implementation would find the shortest path to something that looked right and stop. Parsing a JSON payload: mostly correct, but with gaps in the error and fallback handling. CSS consistency: close, but with one-off values that deviated from the existing system. Dead code from old attempts: left in place.
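To make that kind of gap concrete, here's a sketch with hypothetical field names, contrasting a shortest-path JSON parse with the defensive version the project actually needed. Neither is Glyf's real code.

```typescript
// Hypothetical payload shape; field names are illustrative.
interface ReviewPayload {
  cardId: string;
  quality: number;
}

// The "shortest path" a cheaper model tends to produce:
// assumes the payload is always well-formed.
function parseNaive(raw: string): ReviewPayload {
  return JSON.parse(raw) as ReviewPayload;
}

// The defensive version: validate the shape, clamp ranges,
// and return null instead of crashing downstream.
function parseReview(raw: string): ReviewPayload | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // malformed JSON
  }
  if (typeof data !== "object" || data === null) return null;
  const { cardId, quality } = data as Record<string, unknown>;
  if (typeof cardId !== "string" || typeof quality !== "number") return null;
  // Clamp to the SM-2 0-5 scale instead of trusting the client
  return { cardId, quality: Math.min(5, Math.max(0, Math.round(quality))) };
}
```

Both versions pass a happy-path test; only the second survives a malformed or out-of-range payload, which is exactly the difference that doesn't show up when a model stops at "looks right."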
Claude Code approached the same task differently. It spent more time — reading related files, running verification steps, double-checking that the changes didn't break adjacent functionality. It removed dead code without being asked. When it wasn't sure, it asked a targeted follow-up rather than making an assumption.
Cursor's smaller models also had a specific failure mode with SvelteKit: they confused it with Dart projects often enough to be a real problem. The fix was the same one that worked everywhere else — feed the context window with project standards created by AgentOS, and the confusion rate dropped significantly.
The conclusion I reached: model capability matters less than context quality, but when context is equal, capability is not interchangeable. For the core, high-stakes features that define whether the product works, the bigger model was worth it. For peripheral additions — a new PostHog event, a UI tweak on a well-documented component — the smaller models were fine.
Running parallel instances
Once the core architecture was stable and AgentOS was set up with enough project knowledge, I started running parallel Claude Code instances for the Phase 2 features: the notification service and the onboarding flow.
This is where the upfront investment in documentation paid off most clearly. Each instance could start with enough context to work independently without needing to re-establish what the project was, what conventions it followed, or what the other instance was doing. The handoff cost went down because the ground truth lived in the spec files, not in a single running conversation.
The remaining work — testing the edge cases, making sure the platform handles every state of the SM-2 journey correctly from a user's perspective — is what comes next.
What I'd tell someone starting a project like this
The temptation when you have capable AI tools is to start building immediately. That's usually wrong.
The investment that paid the highest return on this project was the work I did before writing code. As the software industry has known for decades: measure twice, cut once; 80% of the effort is in planning and preparation. The difference now is that the last 20%, which used to mean months of development after months of preparation, can be a week of running parallel AI agent instances. Scoping what Glyf wasn't going to be, writing detailed feature specs, setting up AgentOS with enough project context to make each subsequent session more informed than the last: that's where the real leverage was.
The AI tools didn't replace the need to think carefully about architecture. They amplified the quality of that thinking when it was present, and amplified the mess when it wasn't.
Glyf is live at glyf.bravno.com. If you're learning a new script, give it a try — and if something seems off, the feedback loop is short.
Tools used: Pencil Dev, Claude Code, AgentOS, Coda. Stack: SvelteKit, Deno/Express, PostgreSQL, PostHog.