<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Instant Essays</title>
    <link>https://instantdb.com/essays</link>
    <description>Essays from the Instateam</description>
    <language>en-us</language>
    <lastBuildDate>Thu, 26 Mar 2026 00:00:00 GMT</lastBuildDate>
    <atom:link href="https://instantdb.com/rss.xml" rel="self" type="application/rss+xml" />
    
    <item>
      <title>A backend for AI-coded apps</title>
      <link>https://instantdb.com/essays/architecture</link>
      <guid isPermaLink="true">https://instantdb.com/essays/architecture</guid>
      <description>&lt;p&gt;After 4 years, we’re releasing Instant 1.0!&lt;/p&gt;
&lt;p&gt;Instant turns your favorite coding agent into a full-stack app builder. And we’re fully open source. &lt;sup id=&quot;user-content-fnref-1&quot;&gt;&lt;a href=&quot;#user-content-fn-1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;Our claim is that Instant is the best backend you could use for AI-coded apps.&lt;/p&gt;
&lt;p&gt;In this post we’ll do two things. First we’ll show you a series of &lt;a href=&quot;#demos&quot;&gt;demos&lt;/a&gt;, so you can judge for yourself. Second, we’ll cover the &lt;a href=&quot;#architecture&quot;&gt;architecture&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The constraints behind a real-time, relational, and multi-tenant backend pushed us towards some interesting design choices. We built a multi-tenant database on top of Postgres, and a sync engine in Clojure. We’ll cover how all this works and what we’ve learned so far.&lt;/p&gt;
&lt;p&gt;Let’s get into it.&lt;/p&gt;
&lt;p&gt;&lt;a name=&quot;demos&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h1&gt;Demos&lt;/h1&gt;
&lt;p&gt;When you choose Instant you get three benefits:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;You can make unlimited apps and they’re never frozen.&lt;/li&gt;
&lt;li&gt;You get a sync engine, so your apps work offline, are real-time, and feel fast.&lt;/li&gt;
&lt;li&gt;And when you need more features you have built-in services: auth, file storage, presence, and streams.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;To get a sense of what we mean, I’ll dive into each point and show you what it looks like in practice.&lt;/p&gt;
&lt;h2&gt;Unlimited Apps&lt;/h2&gt;
&lt;p&gt;Traditionally, when you want to host apps online you either pay for VMs, or you’re limited. Many services cap how many free apps you can make, and freeze them when they’re idle. Unfreezing often takes more than 30 seconds, and sometimes several minutes.&lt;/p&gt;
&lt;p&gt;We thought this sucked. So with Instant, you can spin up as many projects as you like and we’ll never freeze them.&lt;/p&gt;
&lt;p&gt;We can do this because Instant is designed to be multi-tenant. When you create a new project, we don’t spin up a VM. We just insert a few database rows in a multi-tenant instance.&lt;/p&gt;
&lt;p&gt;If your app is inactive, there are no compute or memory costs at all. And when it is active, it’s only a few kilobytes of extra RAM in overhead — as opposed to the many hundreds of megabytes required for VMs.&lt;/p&gt;
&lt;p&gt;This means you can truly create unlimited apps. In fact, the process is so efficient that we can create an app for you right inside this essay. No sign up required.&lt;/p&gt;
&lt;p&gt;If you click the button, you’ll get an isolated backend:&lt;/p&gt;
&lt;p&gt;&lt;architecture-demo demo=&quot;create-app&quot;&gt;&lt;/architecture-demo&gt;&lt;/p&gt;
&lt;p&gt;And with that we have our backend. Including the round-trip to your computer, the whole process takes a few hundred milliseconds. Actual time: &lt;architecture-demo demo=&quot;creation-time&quot;&gt;&lt;/architecture-demo&gt;&lt;/p&gt;
&lt;p&gt;You get a public App ID to identify your backend, and a private Admin Token that lets you make privileged changes. This gives you a relational database, sync engine, and the additional services we mentioned, like auth and storage.&lt;/p&gt;
&lt;p&gt;Combine limitless apps with agents, and you’ll start building differently. Today you can already use agents to make lots of apps. With Instant you’ll never be blocked from pushing them to production.&lt;/p&gt;
&lt;h2&gt;Sync Engine&lt;/h2&gt;
&lt;p&gt;But once you create an app, how do you make it good?&lt;/p&gt;
&lt;p&gt;It’s easy to build a traditional CRUD app. Just get an agent to wire up some database migrations, backend endpoints, and client-side stores. But it’s hard to make these apps &lt;em&gt;delightful&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;Compare a traditional CRUD app to modern apps like Linear, Notion, and Figma. Modern apps are multiplayer, they work offline, and they feel fast. If you change a todo in Linear, it changes everywhere. If you go offline in Notion, you can still mark up your docs. When you color a shape in Figma, it doesn’t wait for a server, you just see it.&lt;/p&gt;
&lt;p&gt;These kinds of apps need custom infrastructure. For real-time you add stateful websocket servers. For offline mode you store caches in IndexedDB. And for optimistic updates, you figure out how to apply and undo mutations in the client.&lt;/p&gt;
&lt;p&gt;Linear, Notion, and Figma all built custom infra to handle this. As an industry we’ve called their infra sync engines &lt;sup id=&quot;user-content-fnref-2&quot;&gt;&lt;a href=&quot;#user-content-fn-2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;. Developers write UIs and query their data as though it were locally available. The sync engine handles all the data management under the hood.&lt;/p&gt;
&lt;p&gt;If modern apps need sync engines, then you shouldn’t have to build them from scratch each time.&lt;/p&gt;
&lt;p&gt;So we built a generalized sync engine in Instant. Every app comes with multiplayer, offline mode, and optimistic updates by default.&lt;/p&gt;
&lt;p&gt;You can try it yourself. Since we’ve created our isolated backend, let’s go ahead and use it:&lt;/p&gt;
&lt;p&gt;&lt;architecture-demo demo=&quot;todo-iframe&quot;&gt;&lt;/architecture-demo&gt;&lt;/p&gt;
&lt;p&gt;What you’re seeing are two iframes that render a todo app. They’re powered by the backend you just created (we passed the iframes your App ID).&lt;/p&gt;
&lt;p&gt;Now if you add a todo in one iframe, it will show up in the other. If you go offline, you can make changes and they will sync together. You can try degrading your network, and changes will still feel fast.&lt;/p&gt;
&lt;p&gt;And here’s what the todo app’s backend code is like:&lt;/p&gt;
&lt;p&gt;&lt;architecture-demo demo=&quot;todo-code&quot;&gt;&lt;/architecture-demo&gt;&lt;/p&gt;
&lt;p&gt;That’s about &lt;architecture-demo demo=&quot;todo-code-line-count&quot;&gt;&lt;/architecture-demo&gt; lines. This is even more concise than if you had built a traditional CRUD app. You would have needed to write backend endpoints and frontend stores. Instead you just make queries and transactions directly in your frontend.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;db.useQuery&lt;/code&gt; lets you write relational queries and they stay in sync. &lt;code&gt;db.transact&lt;/code&gt; lets you make changes and it works offline.&lt;/p&gt;
&lt;p&gt;This is better for you as a builder: the code is understandable and it’s easy to maintain. It’s better for your users: they get a delightful app. And it’s better for your agents. Sync engines are a tight abstraction &lt;sup id=&quot;user-content-fnref-3&quot;&gt;&lt;a href=&quot;#user-content-fn-3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;, so agents can use them to write more concise code with fewer tokens and fewer mistakes.&lt;/p&gt;
&lt;h2&gt;Additional Services&lt;/h2&gt;
&lt;p&gt;You saw data sync, but it doesn’t stop there; apps often need more.&lt;/p&gt;
&lt;p&gt;For example, right now every person who opens our demo app sees the same set of todos. What if we want to add auth or permissions? We may also want to support file uploads, or a “who’s online” section. Or heck, maybe we add an AI assistant and need infra to stream tokens to the client.&lt;/p&gt;
&lt;p&gt;These are common features that most apps need. But often we have to string together different services to get them. Not only is that annoying, but it introduces a new level of complexity. When you manage multiple services, you manage multiple sources of truth.&lt;/p&gt;
&lt;p&gt;So to make it easier to enhance your apps, we baked in a bunch of common services inside Instant. Each service is built to work together as a single, integrated system.&lt;/p&gt;
&lt;p&gt;To get a sense of these services, let’s look at our todo app again, but this time we’ll add support for file uploads:&lt;/p&gt;
&lt;p&gt;&lt;architecture-demo demo=&quot;file-upload&quot;&gt;&lt;/architecture-demo&gt;&lt;/p&gt;
&lt;p&gt;What would be the traditional way to do this? We would first create a &lt;code&gt;files&lt;/code&gt; table in our transactional database, and link it to &lt;code&gt;todos&lt;/code&gt;. But then we would need to store the actual file blobs, so we’d probably add S3.&lt;/p&gt;
&lt;p&gt;Once we add S3, we have multiple sources of truth to deal with. If we delete a todo for example, we’d need to run a background worker to get rid of the corresponding blob in S3.&lt;/p&gt;
&lt;p&gt;With Instant, all of this is a non-issue.&lt;/p&gt;
&lt;p&gt;You get File Storage by default, and file objects are just rows in your database. They’re just like any other entity: you can create them, link them to other data, and run real-time queries against them.&lt;/p&gt;
&lt;p&gt;This means you can even create CASCADE delete rules, so you can say “when you delete todos, delete files”. There’s no need for background workers. Instead of multiple sources of truth, you get one integrated database. The shared infra handles all the edge cases under the hood &lt;sup id=&quot;user-content-fnref-4&quot;&gt;&lt;a href=&quot;#user-content-fn-4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;And this is just Instant Storage. You also get Auth. You can use Magic Codes, OAuth, and Guest Auth out of the box. Plus when your users sign up, they’re just rows in your database too.&lt;/p&gt;
&lt;p&gt;If you want to share cursors, typing indicators, or ‘who’s online’ markers, you can use Instant Presence.&lt;/p&gt;
&lt;p&gt;And if you need to share durable streams, you get, well, Instant Streams.&lt;/p&gt;
&lt;p&gt;If you’re curious, we have a bunch of real examples you can play with in the &lt;a href=&quot;/recipes&quot;&gt;recipes&lt;/a&gt; page. You’ll notice that most of these services require little setup and little code. Both you and your agents can move faster and make your apps feature-rich. You don’t have to scour for different providers and deal with bi-directional data sync.&lt;/p&gt;
&lt;h2&gt;Bonus: What you can do, your agent can do&lt;/h2&gt;
&lt;p&gt;Throughout this essay, you may have wondered, how do all these demos work?&lt;/p&gt;
&lt;p&gt;Well, Instant is completely programmatic. You can create apps, push schemas, and update permissions through either an API or a CLI. This essay uses the API, but your agents will likely use the CLI.&lt;/p&gt;
&lt;p&gt;Most of the time you don’t have to click any dashboards. Your agents can just take actions on your behalf.&lt;/p&gt;
&lt;p&gt;At this point, we hope you’re excited enough to &lt;a href=&quot;/dashboard&quot;&gt;sign up&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;You technically don’t even need to sign up to play around, but we do notice that if you do, you’re more likely to stick around. So we really encourage you to!&lt;/p&gt;
&lt;p&gt;And if you want to get your agents playing with Instant right away, here are a few things you can do:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;# This scaffolds a new starter for you in either NextJS, Tanstack, Bun, Vite, or Expo
# Your agent will have everything it needs to build
npx create-instant-app

# If you have an existing app, you can also add our handy skill and tell your agent
# to make some new features
npx skills add instantdb/skills
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And with that, we can dive into the architecture that powers all of this.&lt;/p&gt;
&lt;p&gt;&lt;a name=&quot;architecture&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h1&gt;Architecture&lt;/h1&gt;
&lt;p&gt;There are three distinctive pieces to how Instant works: the Client SDK, the Clojure Backend, and the Multi-Tenant Database.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://paper-attachments.dropboxusercontent.com/s_331134A1AB81F48C9BB3AF9F0C08F3485C408CA845F0A79093D4B651B8B202E3_1775735518061_CleanShot+2026-04-09+at+04.51.412x.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;Your app sends queries and transactions directly to the Client SDK. It’s responsible for resolving your queries offline, and for applying transactions as soon as you make them.&lt;/p&gt;
&lt;p&gt;The Client SDK then talks to The Clojure Backend. The Clojure Backend keeps queries real-time. It takes transactions and figures out which clients need to know about them. It also implements all the additional services: permissions, auth, presence, storage, and streams.&lt;/p&gt;
&lt;p&gt;Finally, The Clojure Backend sends queries and transactions to a single Postgres Instance. We treat Postgres as a multi-tenant Triple store, and logically separate every database by App ID.&lt;/p&gt;
&lt;p&gt;That’s the sketch of our system. Now let’s get deeper.&lt;/p&gt;
&lt;h1&gt;The Client SDK&lt;/h1&gt;
&lt;p&gt;The design behind the Client SDK is motivated by two constraints: we need a system that works offline, and we need it to support optimistic updates.&lt;/p&gt;
&lt;p&gt;Here’s roughly where we ended up:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://paper-attachments.dropboxusercontent.com/s_331134A1AB81F48C9BB3AF9F0C08F3485C408CA845F0A79093D4B651B8B202E3_1775710460513_image.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;h2&gt;IndexedDB&lt;/h2&gt;
&lt;p&gt;Let’s start with the most obvious box. If we want to show the app offline, we need a place for data to live across refreshes.&lt;/p&gt;
&lt;p&gt;For the web you don’t have too many choices. IndexedDB is the best candidate. You can store many megabytes of data, and you even have some limited querying capabilities.&lt;/p&gt;
&lt;p&gt;So we chose IndexedDB &lt;sup id=&quot;user-content-fnref-5&quot;&gt;&lt;a href=&quot;#user-content-fn-5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;. The next question was, what kind of data would we store there?&lt;/p&gt;
&lt;h2&gt;Triple store&lt;/h2&gt;
&lt;p&gt;Consider a query like “Show me all the open todos and their attachments”. This is how you would write it in Instant:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-typescript&quot;&gt;{
  todos: {
    $: { where: { done: false } },
    attachments: { },
  },
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If we just wanted a read-only cache, we could store whatever the server returns to us. But we don’t just want a read-only cache.&lt;/p&gt;
&lt;p&gt;We need the client to respond to actions before the server acknowledges them. If a user adds a new todo for example, our query should just update right away.&lt;/p&gt;
&lt;p&gt;That means the client needs to understand queries. So what our client really needs is a database of its own. A database that can handle where clauses (i.e., ‘done is false’), and relations (‘todos &lt;em&gt;and&lt;/em&gt; their attachments’).&lt;/p&gt;
&lt;p&gt;One option would have been to use SQLite. We could store normalized tables there — like &lt;code&gt;todos&lt;/code&gt; and &lt;code&gt;files&lt;/code&gt; — and run SQL over them. But SQLite is about 300 KB gzipped, and for most apps it wouldn’t make sense to add such a heavy dependency.&lt;/p&gt;
&lt;p&gt;After some sleuthing though we discovered Triple stores and Datalog.&lt;/p&gt;
&lt;p&gt;Triple stores let you store data as &lt;code&gt;[entity, attribute, value]&lt;/code&gt; tuples. Here’s what todos would look like inside a Triple store:&lt;/p&gt;
&lt;p&gt;&lt;triple-demo&gt;&lt;/triple-demo&gt;&lt;/p&gt;
&lt;p&gt;This uniform structure can model both attributes and relationships. Once data is stored in this way, you can use Datalog to make queries against it.&lt;/p&gt;
&lt;p&gt;Datalog is a logic-based query language. Here’s what that looks like:&lt;/p&gt;
&lt;p&gt;&lt;datalog-demo&gt;&lt;/datalog-demo&gt;&lt;/p&gt;
&lt;p&gt;The syntax looks weird, but Datalog is powerful. It can support where clauses and relations just as well as SQL. And it’s simple to implement. In fact, you can write a basic Datalog engine in less than a hundred lines of code &lt;sup id=&quot;user-content-fnref-6&quot;&gt;&lt;a href=&quot;#user-content-fn-6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
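&lt;p&gt;To make that claim concrete, here’s a toy sketch of the idea (our illustration of the shape, not Instant’s actual engine): triples plus a pattern matcher with logic variables.&lt;/p&gt;

```typescript
// Toy Datalog-style matcher over [entity, attribute, value] triples.
// Illustrative sketch only; Instant's real engine is more involved.
type Value = string | boolean;
type Triple = [string, string, Value];
type Pattern = [string, string, Value]; // strings starting with "?" are variables
type Bindings = { [name: string]: Value };

const triples: Triple[] = [
  ["todo-1", "todos/title", "Buy milk"],
  ["todo-1", "todos/done", false],
  ["todo-2", "todos/title", "Ship 1.0"],
  ["todo-2", "todos/done", true],
];

const isVar = (x: Value): x is string =>
  typeof x === "string" ? x.startsWith("?") : false;

// Try to unify one pattern with one triple, extending the given bindings.
function matchPattern(pattern: Pattern, triple: Triple, bindings: Bindings): Bindings | null {
  const result: Bindings = { ...bindings };
  for (let i = 0; i !== 3; i += 1) {
    const p = pattern[i];
    const t = triple[i];
    if (isVar(p)) {
      if (p in result) {
        if (result[p] !== t) return null; // variable already bound to something else
      } else {
        result[p] = t;
      }
    } else if (p !== t) {
      return null; // constants must match exactly
    }
  }
  return result;
}

// Each pattern narrows the set of candidate bindings ("frames").
function query(patterns: Pattern[]): Bindings[] {
  let frames: Bindings[] = [{}];
  for (const pattern of patterns) {
    const next: Bindings[] = [];
    for (const frame of frames) {
      for (const triple of triples) {
        const extended = matchPattern(pattern, triple, frame);
        if (extended !== null) next.push(extended);
      }
    }
    frames = next;
  }
  return frames;
}

// "Show me open todos and their titles":
const open = query([
  ["?e", "todos/done", false],
  ["?e", "todos/title", "?title"],
]);
console.log(open); // → [{ "?e": "todo-1", "?title": "Buy milk" }]
```

&lt;p&gt;Joins fall out of shared variables like &lt;code&gt;?e&lt;/code&gt;, which is the whole trick: each pattern narrows the set of candidate bindings.&lt;/p&gt;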
&lt;p&gt;So we built a Triple store and a Datalog engine. This lets us evaluate queries completely in the client, without having to wait for the server.&lt;/p&gt;
&lt;p&gt;If a user creates a new todo, we have what we need to re-run the query and observe the change right away. Well, almost. We need a way to apply changes to our query.&lt;/p&gt;
&lt;h2&gt;Pending Queue&lt;/h2&gt;
&lt;p&gt;We can’t just mutate the result in place. We have to be mindful of the server too.&lt;/p&gt;
&lt;p&gt;For example, what would happen if the server rejects our transaction? If we mutated the query result, there would be no way for us to undo the change. &lt;sup id=&quot;user-content-fnref-7&quot;&gt;&lt;a href=&quot;#user-content-fn-7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;That’s where the Pending Queue comes in. When a user makes a change, we don’t apply it directly to the Triple store. Instead we track the change in a separate queue.&lt;/p&gt;
&lt;p&gt;To satisfy any query, we can apply pending changes to our triple store, and see the result:&lt;/p&gt;
&lt;p&gt;&lt;pending-queue-demo&gt;&lt;/pending-queue-demo&gt;&lt;/p&gt;
&lt;p&gt;This choice pushes us to make our Triple store immutable. This way we can apply the change and produce a new Triple store, rather than mutating the committed one. To make this work, we wrap the transact API with mutative, a library for immutable changes in JavaScript &lt;sup id=&quot;user-content-fnref-8&quot;&gt;&lt;a href=&quot;#user-content-fn-8&quot;&gt;[8]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;With that we have undo. If the server returns a failure, we simply remove the change from the pending queue and undo works out of the box.&lt;/p&gt;
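&lt;p&gt;Here’s a tiny sketch of that flow (a simplified model we made up for illustration, not the SDK’s real internals):&lt;/p&gt;

```typescript
// Sketch: optimistic updates via a pending queue over a committed store.
type Triple = [string, string, string | boolean];
type Mutation = { id: string; add: Triple };

const committed: Triple[] = [["todo-1", "todos/done", false]];
let pending: Mutation[] = [];

// Optimistic write: record the change without touching the committed store.
function transact(mutation: Mutation) {
  pending = [...pending, mutation];
}

// Queries see the committed triples plus any pending changes layered on top.
function currentTriples(): Triple[] {
  return [...committed, ...pending.map((m) => m.add)];
}

// Server rejected the change? Drop it from the queue: undo for free.
function onServerReject(mutationId: string) {
  pending = pending.filter((m) => m.id !== mutationId);
}

transact({ id: "mut-1", add: ["todo-2", "todos/done", false] });
console.log(currentTriples().length); // → 2 (change is visible immediately)
onServerReject("mut-1");
console.log(currentTriples().length); // → 1 (change rolled back)
```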
&lt;h2&gt;Bonus: InstaQL&lt;/h2&gt;
&lt;p&gt;You may have noticed that Instant queries don’t look like Datalog though. Instead they’re written in a language we call InstaQL:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-typescript&quot;&gt;{
  todos: {
    $: { where: { done: false } },
    attachments: { },
  },
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We made this because we thought that the most ergonomic way for apps to query for data was to describe the shape of the response they were looking for.&lt;/p&gt;
&lt;p&gt;This idea was heavily inspired by GraphQL. The main difference with our implementation is syntax sugar. Instead of introducing a specific grammar, InstaQL is built on top of plain JavaScript objects. This choice lets users skip a build step, and it lets them generate queries programmatically &lt;sup id=&quot;user-content-fnref-9&quot;&gt;&lt;a href=&quot;#user-content-fn-9&quot;&gt;[9]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
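&lt;p&gt;That programmatic angle is easy to see with a sketch. Here the &lt;code&gt;withAttachments&lt;/code&gt; flag is made up for illustration; the point is just that ordinary code can assemble a query object:&lt;/p&gt;

```typescript
// Because InstaQL queries are plain objects, ordinary code can assemble them.
// `withAttachments` is a hypothetical option, not part of Instant's API.
function todosQuery(opts: { done: boolean; withAttachments: boolean }) {
  const todos: { [key: string]: object } = {
    $: { where: { done: opts.done } },
  };
  if (opts.withAttachments) {
    todos.attachments = {};
  }
  return { todos };
}

console.log(JSON.stringify(todosQuery({ done: false, withAttachments: true })));
// → {"todos":{"$":{"where":{"done":false}},"attachments":{}}}
```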
&lt;h2&gt;Reactor&lt;/h2&gt;
&lt;p&gt;With that, we have a somewhat full view of the Client SDK!&lt;/p&gt;
&lt;p&gt;Users write InstaQL queries, which get turned into Datalog. Those queries are satisfied by Triple stores, which combine changes from a pending queue. Data gets cached to IndexedDB.&lt;/p&gt;
&lt;p&gt;That’s a lot of interesting choices generated from just two constraints!&lt;/p&gt;
&lt;p&gt;The final question on the client is this: how do all these boxes tie together?&lt;/p&gt;
&lt;p&gt;That’s where the Reactor comes in. It’s the main state machine that coordinates all these different processes. When an app wants a query, the Reactor is responsible for looking at IndexedDB, and for communicating with the server. It handles when the internet goes offline or pending changes fail.&lt;/p&gt;
&lt;p&gt;The Reactor communicates to the server through websockets. It sends requests for queries and transactions, and the server sends results and novelty from the database.&lt;/p&gt;
&lt;p&gt;Which brings us to the server.&lt;/p&gt;
&lt;h1&gt;Clojure Backend&lt;/h1&gt;
&lt;p&gt;The design behind the backend is motivated by two constraints: we need to make queries reactive, and we need to be fair about multi-tenant resources.&lt;/p&gt;
&lt;p&gt;Here’s roughly how the system looks:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://paper-attachments.dropboxusercontent.com/s_331134A1AB81F48C9BB3AF9F0C08F3485C408CA845F0A79093D4B651B8B202E3_1775737048834_image.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;h2&gt;Query Store&lt;/h2&gt;
&lt;p&gt;Let’s start by thinking through what happens when a user asks for a query.&lt;/p&gt;
&lt;p&gt;First the server can go ahead and ask the database. In a stateless system that would be just about the end of the story. We could return our response and call it a day.&lt;/p&gt;
&lt;p&gt;But remember, our queries have to be reactive. For that we need a place to store &lt;em&gt;which&lt;/em&gt; users have made &lt;em&gt;which&lt;/em&gt; queries. That’s what the Query Store is for:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://paper-attachments.dropboxusercontent.com/s_331134A1AB81F48C9BB3AF9F0C08F3485C408CA845F0A79093D4B651B8B202E3_1775739554767_CleanShot+2026-04-09+at+05.59.052x.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;If we were to track just the queries and the socket connections that asked for them, in principle we would have what we need to make an app reactive. For example, we could tail every transaction and refresh every query. That would work, but our database would get hammered with redundant refreshes.&lt;/p&gt;
&lt;p&gt;Ideally, we should only change queries that &lt;em&gt;need&lt;/em&gt; to be changed.&lt;/p&gt;
&lt;h2&gt;Topics&lt;/h2&gt;
&lt;p&gt;We scoured around for ideas, and found the architecture behind Asana’s Luna &lt;sup id=&quot;user-content-fnref-10&quot;&gt;&lt;a href=&quot;#user-content-fn-10&quot;&gt;[10]&lt;/a&gt;&lt;/sup&gt; and Figma&amp;#39;s LiveGraph &lt;sup id=&quot;user-content-fnref-11&quot;&gt;&lt;a href=&quot;#user-content-fn-11&quot;&gt;[11]&lt;/a&gt;&lt;/sup&gt; very promising. Asana wrote about how they turn queries into sets of “topics”. Roughly, a topic describes the part of the index that the query in question cares about.&lt;/p&gt;
&lt;p&gt;For something like “Give me all todos”, you could imagine a topic that says: “Track all updates to the TodosIndex”.&lt;/p&gt;
&lt;p&gt;We adapted this idea into our system. When we run a query, we also generate the set of topics it cares about:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://paper-attachments.dropboxusercontent.com/s_331134A1AB81F48C9BB3AF9F0C08F3485C408CA845F0A79093D4B651B8B202E3_1775739641762_CleanShot+2026-04-09+at+06.00.272x.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;Here’s our topic for “Watch all todos”:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://paper-attachments.dropboxusercontent.com/s_331134A1AB81F48C9BB3AF9F0C08F3485C408CA845F0A79093D4B651B8B202E3_1775744562817_CleanShot+2026-04-09+at+07.21.592x.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;Now we have a data structure we can use to describe the dependencies for a query. The next step is to track transactions and find these affected queries.&lt;/p&gt;
&lt;h2&gt;Invalidator&lt;/h2&gt;
&lt;p&gt;That’s where the invalidator comes in. The invalidator tracks Postgres’ WAL (Write-Ahead Log).&lt;/p&gt;
&lt;p&gt;We can take WAL entries and generate topics from them too. For example, if we had an update like “Set todo.done = false for id = 42”, we could transform it:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://paper-attachments.dropboxusercontent.com/s_331134A1AB81F48C9BB3AF9F0C08F3485C408CA845F0A79093D4B651B8B202E3_1775743743474_CleanShot+2026-04-09+at+07.08.572x.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;This gets us the exact same kind of topic structure that our queries make. Now we can match them together, and discover what’s stale:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://paper-attachments.dropboxusercontent.com/s_331134A1AB81F48C9BB3AF9F0C08F3485C408CA845F0A79093D4B651B8B202E3_1775744091203_CleanShot+2026-04-09+at+07.14.462x.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
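&lt;p&gt;In code, the matching step might look something like this (a simplified sketch with wildcard topics; the real structure is richer):&lt;/p&gt;

```typescript
// Sketch: topics as coarse [entity, attribute, value] patterns; "_" is a wildcard.
// Simplified from the description above; not Instant's actual data structure.
type Topic = [string, string, string];

const subscriptions: { queryId: string; topic: Topic }[] = [
  { queryId: "all-todos", topic: ["_", "_", "_"] },
  { queryId: "open-todos", topic: ["_", "todos/done", "_"] },
  { queryId: "profile-42", topic: ["user-42", "users/name", "_"] },
];

const partMatches = (pattern: string, actual: string) =>
  pattern === "_" ? true : pattern === actual;

// A WAL entry's topic is concrete; a query is stale when every part matches.
function staleQueries(walTopic: Topic): string[] {
  return subscriptions
    .filter((s) => s.topic.every((part, i) => partMatches(part, walTopic[i])))
    .map((s) => s.queryId);
}

// "Set todo.done = false for id = 42" becomes a concrete topic:
console.log(staleQueries(["todo-42", "todos/done", "false"]));
// → ["all-todos", "open-todos"]
```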
&lt;p&gt;Our version zero of this algorithm was very inefficient: we effectively did an N^2 comparison, checking every transaction topic against every query topic. But you can intuit how these topic vectors are amenable to indexes. We now keep them in a tree-like structure, so we only compare subsets and prune early. &lt;sup id=&quot;user-content-fnref-12&quot;&gt;&lt;a href=&quot;#user-content-fn-12&quot;&gt;[12]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;With that we can take a WAL entry and refresh the queries it affects. The next step is to parallelize.&lt;/p&gt;
&lt;h2&gt;Grouped Queues&lt;/h2&gt;
&lt;p&gt;Since our database is multi-tenant, our WAL includes updates from multiple apps.&lt;/p&gt;
&lt;p&gt;In order for the invalidation algorithm to work, transactions &lt;em&gt;within&lt;/em&gt; a single app have to be processed serially and in order. But, we can certainly parallelize invalidations across &lt;em&gt;different&lt;/em&gt; apps.&lt;/p&gt;
&lt;p&gt;We needed some way to guarantee order within a single app and parallelize across apps. We also needed to make sure that one high-traffic app didn’t hog all resources.&lt;/p&gt;
&lt;p&gt;This is where the Grouped Queue abstraction comes in:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://paper-attachments.dropboxusercontent.com/s_331134A1AB81F48C9BB3AF9F0C08F3485C408CA845F0A79093D4B651B8B202E3_1775746073426_CleanShot+2026-04-09+at+07.47.442x.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;Each app gets its own subqueue. This guarantees that all items for a particular app are handled serially.&lt;/p&gt;
&lt;p&gt;Workers however can take from multiple different subqueues. This lets us parallelize invalidations across apps.&lt;/p&gt;
&lt;p&gt;When we push a WAL entry into the grouped queue, it gets added to the end of the app’s subqueue, but the app’s position in the global rotation does not change. This makes it so that even if one app is adding thousands of items per second, other apps still get an equal chance to be picked up by an invalidator.&lt;/p&gt;
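&lt;p&gt;Here’s a toy version of the idea (the real implementation is in Clojure; this sketch just shows the shape):&lt;/p&gt;

```typescript
// Toy grouped queue: items within one app stay serial, apps round-robin fairly.
type Item = { app: string; payload: string };

class GroupedQueue {
  private subqueues = new Map(); // app id mapped to its array of payloads
  private rotation: string[] = []; // round-robin order of apps

  push(item: Item) {
    if (!this.subqueues.has(item.app)) {
      this.subqueues.set(item.app, []);
      this.rotation.push(item.app); // pushing more items never jumps the line
    }
    this.subqueues.get(item.app).push(item.payload);
  }

  // Workers take the next item, cycling across apps so none can hog them.
  take(): Item | null {
    for (let i = 0; i !== this.rotation.length; i += 1) {
      const app = this.rotation.shift() as string;
      this.rotation.push(app); // rotate to the back
      const q = this.subqueues.get(app);
      if (q.length !== 0) return { app, payload: q.shift() };
    }
    return null; // every subqueue is empty
  }
}

const gq = new GroupedQueue();
gq.push({ app: "noisy", payload: "wal-1" });
gq.push({ app: "noisy", payload: "wal-2" });
gq.push({ app: "noisy", payload: "wal-3" });
gq.push({ app: "quiet", payload: "wal-1" });

console.log(gq.take()); // → { app: "noisy", payload: "wal-1" }
console.log(gq.take()); // → { app: "quiet", payload: "wal-1" } (quiet isn't starved)
console.log(gq.take()); // → { app: "noisy", payload: "wal-2" }
```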
&lt;p&gt;This data structure has turned out to be very useful for us, and has seeped all across the code base, including the Session Manager.&lt;/p&gt;
&lt;h2&gt;The Session Manager, and Praise for Clojure and the JVM&lt;/h2&gt;
&lt;p&gt;Which brings us to the main coordinator inside the system. When the Client SDK opens up a websocket connection, it’s the session manager that picks up the messages:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://paper-attachments.dropboxusercontent.com/s_331134A1AB81F48C9BB3AF9F0C08F3485C408CA845F0A79093D4B651B8B202E3_1775747390276_image.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;The Session Manager’s job is to glue everything together. It makes reactive queries, it runs permissions, and it passes along requests to the other services.&lt;/p&gt;
&lt;p&gt;Notice the Grouped Queue abstraction makes an appearance here too. If different clients start bombarding the backend, the Grouped Queue makes sure to both parallelize as much as possible, and to prevent one bad socket from hogging all the resources.&lt;/p&gt;
&lt;p&gt;And with this it may be the right place to pause and praise Clojure and the JVM. They’ve been a huge win for us in building this infrastructure.&lt;/p&gt;
&lt;p&gt;First, Clojure comes with great concurrency primitives and has real threads. This lets us scale further with bigger machines and helped us avoid splitting the system up too early. The abstractions are also really simple and easy to compose. Our grouped queue for example is only 215 lines of code &lt;sup id=&quot;user-content-fnref-13&quot;&gt;&lt;a href=&quot;#user-content-fn-13&quot;&gt;[13]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;Second, the JVM has a thriving ecosystem and we really enjoy the libraries. For example, we needed a way for users to define permissions inside Instant. We wanted a language that would be fast and easy to sandbox. After some searching, we discovered Google’s CEL. Thankfully CEL Java was available, and we could just pick it off the shelf.&lt;/p&gt;
&lt;p&gt;And third, Clojure is great for DSLs and for experimental programming. When we started building Instant we had to discover a lot of these abstractions, and playing with them in the REPL was instrumental.&lt;/p&gt;
&lt;p&gt;Many folks deride DSLs, but I think we couldn’t have built Instant without them. Case in point: multi-tenant queries. We needed to make our database multi-tenant, which meant writing some pretty complex SQL. Rather than do this by hand, we made a DSL that made the queries easy to reason about and guaranteed that an App ID was always passed in.&lt;/p&gt;
&lt;p&gt;And this brings us to the Multi-Tenant Database.&lt;/p&gt;
&lt;h1&gt;The Multi-Tenant Database&lt;/h1&gt;
&lt;p&gt;Our database was also motivated by two constraints: we needed a way to spin up new databases cheaply, and we needed it to be relational.&lt;/p&gt;
&lt;p&gt;Here’s where we ended up:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://paper-attachments.dropboxusercontent.com/s_331134A1AB81F48C9BB3AF9F0C08F3485C408CA845F0A79093D4B651B8B202E3_1775751529695_image.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;h2&gt;The Triples Table&lt;/h2&gt;
&lt;p&gt;Let’s start with the question: how can we let users create lots of different databases?&lt;/p&gt;
&lt;p&gt;The most straightforward path would have been to spin up Postgres VMs. But as we mentioned, VMs come with lots of overhead in RAM. There’s no sustainable way to support unlimited apps if you’re spinning up VMs.&lt;/p&gt;
&lt;p&gt;Another option would have been to use Postgres schemas. We could have created different tables for different apps, and then kept a mapping of who can see what. This would work, but Postgres wasn’t designed to scale to many tables. From our research we saw that after about 6000 tables, Postgres starts having issues: too many files get created on disk, and pg_dump and autovacuum start failing.&lt;/p&gt;
&lt;p&gt;This makes sense. The average Postgres app has a few big tables rather than many small ones, so big tables are what gets optimized. Well, if big tables work, what if we reframed the problem as one giant table?&lt;/p&gt;
&lt;p&gt;And this brings us back to…Triple stores!&lt;/p&gt;
&lt;p&gt;They worked well on the client because they’re a simple DB that supports relational queries. We thought this could work well for us in Postgres too. So we added a &lt;code&gt;triples&lt;/code&gt; table:&lt;/p&gt;
&lt;p&gt;&lt;multi-tenant-demo&gt;&lt;/multi-tenant-demo&gt;&lt;/p&gt;
&lt;p&gt;All the data lives in a single &lt;code&gt;triples&lt;/code&gt; table, and they’re logically isolated by an &lt;code&gt;app_id&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;If we wanted to get &lt;code&gt;post_1&lt;/code&gt; from the app &lt;code&gt;blog&lt;/code&gt; for example, we could generate a SQL query that looks roughly like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;select *
from triples
where app_id = &amp;#39;blog&amp;#39; and entity_id = &amp;#39;post_1&amp;#39; and attr_id in (posts/id, posts/title)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With that, creating a new database is effectively free. Just as we mentioned in the demos, it’s a few rows in the database.&lt;/p&gt;
&lt;h3&gt;Surprising benefits&lt;/h3&gt;
&lt;p&gt;Our choice came with some surprising benefits too.&lt;/p&gt;
&lt;p&gt;Since we manage columns ourselves, we were able to optimize the developer experience.&lt;/p&gt;
&lt;p&gt;For example, Postgres locks the table when you create a column. Since we implemented columns ourselves, we could make them lock-free.&lt;/p&gt;
&lt;p&gt;When you delete a column in Postgres, the data is gone. But we thought this was way too dangerous in the world of agents. So we implemented soft deletes at the column level. Even if a rogue agent deletes your columns, you can undo it and get all your data back in milliseconds.&lt;/p&gt;
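&lt;p&gt;To illustrate (again with hypothetical names), a soft delete is just flipping a marker on the column’s metadata row, and the undo is flipping it back:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;-- sketch: &amp;quot;deleting&amp;quot; a column only marks it deleted
update attrs set deleted_at = now()
where app_id = &amp;#39;blog&amp;#39; and name = &amp;#39;posts/title&amp;#39;;

-- undo: clear the marker, and every triple is visible again
update attrs set deleted_at = null
where app_id = &amp;#39;blog&amp;#39; and name = &amp;#39;posts/title&amp;#39;;
&lt;/code&gt;&lt;/pre&gt;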
&lt;p&gt;These were the benefits, but of course there were costs too.&lt;/p&gt;
&lt;h2&gt;Partial Indexes&lt;/h2&gt;
&lt;p&gt;Consider a user who says, “I want my posts to have a unique ‘slug’”. In Postgres it’s easy to create unique columns. But since we’re implementing our own columns, we have to do this ourselves.&lt;/p&gt;
&lt;p&gt;This is where partial indexes came to the rescue. We could add boolean markers to our &lt;code&gt;triples&lt;/code&gt; table:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;table_name: triples
app_id | entity_id | attr_id | value | column_unique | ...
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Once we have that, we can create a partial index for the whole table, flipped on by the marker:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;create unique index unique_columns
  on triples(app_id, attr_id, value) where column_unique
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now if a user tries to insert two posts with the same slug:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;app_id  | entity_id | attr_id | value   | column_unique
&amp;#39;blog&amp;#39; | 1         | &amp;#39;slug&amp;#39; | &amp;#39;hello&amp;#39; | true
&amp;#39;blog&amp;#39; | 2         | &amp;#39;slug&amp;#39; | &amp;#39;hello&amp;#39; | true
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;unique_columns&lt;/code&gt; index triggers and prevents it!&lt;/p&gt;
&lt;p&gt;And this same trick makes our queries more efficient. If we want to find posts with the slug ‘hello’ for example, we can generate this query:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;select entity_id
from triples
where app_id = &amp;#39;blog&amp;#39; and attr_id = &amp;#39;slug&amp;#39; and value = &amp;#39;hello&amp;#39; and column_unique;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And we can extend this pattern to a whole range of queries: unique columns, indexes, dates, references, and so on.&lt;/p&gt;
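&lt;p&gt;The non-unique case looks much the same. A sketch, assuming a &lt;code&gt;column_indexed&lt;/code&gt; marker alongside &lt;code&gt;column_unique&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;-- a plain (non-unique) index, flipped on per column by its marker
create index indexed_columns
  on triples(app_id, attr_id, value) where column_indexed
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Only the marked rows pay the index-maintenance cost, so unindexed columns stay cheap to write.&lt;/p&gt;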
&lt;p&gt;Just using partial indexes and relying on Postgres to pick the right query plans worked great for us for a while. But once we reached a few hundred million tuples, Postgres started having trouble.&lt;/p&gt;
&lt;h2&gt;Count-Min Sketches&lt;/h2&gt;
&lt;p&gt;If you are a Postgres expert reading this, you may have taken a pause looking at that triples table. In Postgres circles this is called the EAV pattern, and is generally discouraged.&lt;/p&gt;
&lt;p&gt;It’s discouraged because Postgres relies on tables and columns for statistics.&lt;/p&gt;
&lt;p&gt;Those statistics are what let the query planner decide which indexes are most efficient and which joins to do in what order.&lt;/p&gt;
&lt;p&gt;Once you keep all data in one table, Postgres loses information about the underlying frequencies in the dataset. It can&amp;#39;t tell the difference between a column with 10 distinct values and one with 10 million.&lt;/p&gt;
&lt;p&gt;To solve this, we started keeping our own statistics. We use a data structure called a count-min sketch, which helps us estimate value frequencies for columns. If you’re curious about how that works, we wrote an essay about it &lt;sup id=&quot;user-content-fnref-14&quot;&gt;&lt;a href=&quot;#user-content-fn-14&quot;&gt;[14]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;We feed those statistics to our query engine, which makes those queries efficient again.&lt;/p&gt;
&lt;h2&gt;The Query Engine&lt;/h2&gt;
&lt;p&gt;Which brings us to the query engine.&lt;/p&gt;
&lt;p&gt;So far I’ve been showing you SQL queries that are simple and easy to understand. But imagine translating more complicated InstaQL queries. Even a query with one where clause will start to have CTEs in it. And then you’ll want to use those statistics to decide which indexes to turn on.&lt;/p&gt;
&lt;p&gt;That’s what the query engine does. It takes InstaQL queries as well as the count-min sketches, and generates SQL query plans:&lt;/p&gt;
&lt;p&gt;&lt;sql-demo&gt;&lt;/sql-demo&gt;&lt;/p&gt;
&lt;p&gt;This engine is written in the Clojure backend. We took a lot of inspiration from Postgres’ own query engine. Sometimes these queries can look scarily long, but we have been so darn surprised with how well Postgres can handle them. We pass in some hints with pg_hint_plan, and Postgres just churns away and produces results.&lt;/p&gt;
&lt;h1&gt;Four Years in the Making&lt;/h1&gt;
&lt;p&gt;And that covers the database, which covers our whole system!&lt;/p&gt;
&lt;p&gt;We hope you found this fun! This has been a labor of love. We built Instant because we want to power the next generation of builders. Every product we build, we build with Instant, and thousands of developers have trusted us to run their core infrastructure.&lt;/p&gt;
&lt;p&gt;If you&amp;#39;re building with agents, I think you will love using us.&lt;/p&gt;
&lt;p&gt;We hope you give us a &lt;a href=&quot;/dashboard&quot;&gt;try&lt;/a&gt;, and join us on &lt;a href=&quot;/discord&quot;&gt;Discord&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;sup id=&quot;user-content-fnref-1&quot;&gt;&lt;a href=&quot;#user-content-fn-1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;: Every single line of code behind the company lives on GitHub, including this &lt;a href=&quot;https://github.com/instantdb/instant/blob/main/client/www/_posts/architecture.md&quot;&gt;post&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;sup id=&quot;user-content-fnref-2&quot;&gt;&lt;a href=&quot;#user-content-fn-2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;: Nikita wrote a great blog post about this &lt;a href=&quot;https://www.instantdb.com/essays/sync_future&quot;&gt;here&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;user-content-fn-3&quot; href=&quot;#user-content-fnref-3&quot;&gt;[3]&lt;/a&gt;  LLMs have already learned about Instant in their training data, but there really isn’t that much to learn. Queries and transactions have a predictable DSL.&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;user-content-fn-4&quot; href=&quot;#user-content-fnref-4&quot;&gt;[4]&lt;/a&gt;  Fun fact, your files are still stored in S3. Since both services are built together though, the system can handle bi-directional data sync on your behalf!&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;user-content-fn-5&quot; href=&quot;#user-content-fnref-5&quot;&gt;[5]&lt;/a&gt;  On React Native we use react-native-async-storage, because it&amp;#39;s available on Expo Go. The API for storage is pluggable though, so you can replace this pretty easily.&lt;/p&gt;
&lt;p&gt;&lt;sup id=&quot;user-content-fnref-6&quot;&gt;&lt;a href=&quot;#user-content-fn-6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;: Check out &lt;a href=&quot;https://www.instantdb.com/essays/datalogjs&quot;&gt;Datalog in Javascript&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;sup id=&quot;user-content-fnref-7&quot;&gt;&lt;a href=&quot;#user-content-fn-7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;: There would be a lot more problems too. Check out the &lt;a href=&quot;https://www.instantdb.com/product/sync&quot;&gt;sync engine&lt;/a&gt; page, especially the conflict resolution demo.&lt;/p&gt;
&lt;p&gt;&lt;sup id=&quot;user-content-fnref-8&quot;&gt;&lt;a href=&quot;#user-content-fn-8&quot;&gt;[8]&lt;/a&gt;&lt;/sup&gt;: &lt;a href=&quot;https://mutative.js.org/&quot;&gt;https://mutative.js.org/&lt;/a&gt; -- it&amp;#39;s a great library!&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;user-content-fn-9&quot; href=&quot;#user-content-fnref-9&quot;&gt;[9]&lt;/a&gt;  This came very handy in our Explorer page. You can switch around a bunch of filters, and we&amp;#39;ll dynamically generate the query for it.&lt;/p&gt;
&lt;p&gt;&lt;sup id=&quot;user-content-fnref-10&quot;&gt;&lt;a href=&quot;#user-content-fn-10&quot;&gt;[10]&lt;/a&gt;&lt;/sup&gt;: See this &lt;a href=&quot;https://blog.asana.com/2020/09/worldstore-distributed-caching-reactivity-part-2/&quot;&gt;post&lt;/a&gt; to get started on the rabbit hole.&lt;/p&gt;
&lt;p&gt;&lt;sup id=&quot;user-content-fnref-11&quot;&gt;&lt;a href=&quot;#user-content-fn-11&quot;&gt;[11]&lt;/a&gt;&lt;/sup&gt;: See this great &lt;a href=&quot;https://www.figma.com/blog/livegraph-real-time-data-fetching-at-figma/&quot;&gt;essay&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;user-content-fn-12&quot; href=&quot;#user-content-fnref-12&quot;&gt;[12]&lt;/a&gt;  We do some even more cool things. For example we take where clauses and transform them into little programs for additional filtering.&lt;/p&gt;
&lt;p&gt;&lt;sup id=&quot;user-content-fnref-13&quot;&gt;&lt;a href=&quot;#user-content-fn-13&quot;&gt;[13]&lt;/a&gt;&lt;/sup&gt;: Check out the &lt;a href=&quot;https://github.com/instantdb/instant/blob/main/server/src/instant/grouped_queue.clj&quot;&gt;source&lt;/a&gt;!&lt;/p&gt;
&lt;p&gt;&lt;sup id=&quot;user-content-fnref-14&quot;&gt;&lt;a href=&quot;#user-content-fn-14&quot;&gt;[14]&lt;/a&gt;&lt;/sup&gt;: Check out &lt;a href=&quot;https://www.instantdb.com/essays/count_min_sketch&quot;&gt;Count-Min Sketches in JS&lt;/a&gt;&lt;/p&gt;
</description>
      <pubDate>Thu, 26 Mar 2026 00:00:00 GMT</pubDate>
      <author>Joe Averbukh, Stepan Parunashvili, Daniel Woelfel, Drew Harris</author>
    </item>
    <item>
      <title>Counter-Strike Bench: GPT 5.3 Codex vs Claude Opus 4.6</title>
      <link>https://instantdb.com/essays/codex_53_opus_46_cs_bench</link>
      <guid isPermaLink="true">https://instantdb.com/essays/codex_53_opus_46_cs_bench</guid>
      <description>&lt;p&gt;&lt;em&gt;We&amp;#39;re Instant. We give you and your agents unlimited databases and backends. Build whatever you like, from your next startup to an alternative Counter-Strike. &lt;a href=&quot;https://instantdb.com/dash&quot;&gt;Sign up&lt;/a&gt; and build your first app in minutes&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;GPT 5.3 Codex and Claude Opus 4.6 shipped within minutes of each other. How well do they compare building a multiplayer Counter-Strike?&lt;/p&gt;
&lt;p&gt;We tried and found out. Here&amp;#39;s how the gameplay feels:&lt;/p&gt;
&lt;iframe
  src=&quot;https://player.mux.com/w8hs9hQH902Vd01GU7zluTWVidil02BuV5opvzIz1XWc600?metadata-video-title=cs_bench_feb&amp;video-title=cs_bench_feb&amp;thumbnail-time=9&quot;
  style=&quot;width: 100%; border: none; aspect-ratio: 167/108;&quot;
  allow=&quot;accelerometer; gyroscope; autoplay; encrypted-media; picture-in-picture;&quot;
  allowfullscreen
&gt;&lt;/iframe&gt;

&lt;p&gt;And if you&amp;#39;re curious, you can play it yourself:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;GPT 5.3 Codex&amp;#39;s attempt:&lt;/strong&gt; &lt;a href=&quot;https://53strike.vercel.app/&quot;&gt;https://53strike.vercel.app/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Claude Opus 4.6&amp;#39;s attempt&lt;/strong&gt; &lt;a href=&quot;https://46strike.vercel.app/&quot;&gt;https://46strike.vercel.app/&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;We have a full recording of us building it &lt;a href=&quot;https://youtu.be/Kyef-cUUB0Q&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Here&amp;#39;s what surprised us:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Both models were a leap over any previous generation&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;You can compare the results with our &lt;a href=&quot;/essays/agents_building_counterstrike&quot;&gt;last&lt;/a&gt; benchmark. These models made much more realistic maps on the first try. Their weapons were better. And they got much more right on the first shot. Codex had some issues accounting for HP across respawns, and Claude had some issues spawning inside obstacles. But a simple paste of the error got them to fix it. At no point did they get stuck and require guidance.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;GPT 5.3 Codex was much faster than Claude&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;In just about every prompt GPT 5.3 Codex finished in about half the time. This could be because of the harness: We noticed Claude Code did much more upfront research than Codex.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Claude Opus 4.6 performed better on 5/6 prompts&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;But perhaps the upfront research paid off, because Claude Opus 4.6 beat out GPT 5.3 Codex on all prompts but one (and that last one was a tie).&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;GPT 5.3 Codex&lt;/th&gt;
&lt;th&gt;Claude Opus 4.6&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Frontend&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Boxes + Physics&lt;/td&gt;
&lt;td&gt;🥈&lt;/td&gt;
&lt;td&gt;🥇&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gun + Creativity&lt;/td&gt;
&lt;td&gt;🥈&lt;/td&gt;
&lt;td&gt;🥇&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sounds + Animations&lt;/td&gt;
&lt;td&gt;🥈&lt;/td&gt;
&lt;td&gt;🥇&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Backend&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multiplayer&lt;/td&gt;
&lt;td&gt;🥈&lt;/td&gt;
&lt;td&gt;🥇&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Maps&lt;/td&gt;
&lt;td&gt;🥈&lt;/td&gt;
&lt;td&gt;🥇&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Bonus&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;🤝&lt;/td&gt;
&lt;td&gt;🤝&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;p&gt;Claude drew more interesting maps. Claude made a nicer weapon. The gameplay UI was much nicer on Claude&amp;#39;s first try.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Both models struggled a bit with physics&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;At this point, neither model had issues drawing out the UI, setting up the backend, or getting caught up in bugs from three.js. The frontier now seems to be about physics.&lt;/p&gt;
&lt;p&gt;For example, Claude generated maps where players could end up stuck. Claude&amp;#39;s &amp;quot;inferno valley&amp;quot; and &amp;quot;nuke zone&amp;quot; maps produced four-walled obstacles in the center:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/posts/cs_feb/maps.png?lightbox&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;There would be no way for players to leave. Codex also had trouble with direction: the enemy&amp;#39;s &amp;quot;point of view&amp;quot; came out of the back of their head, rather than the front.&lt;/p&gt;
&lt;p&gt;With both models you could shoot through obstacles. Claude Opus 4.6 at least made it so you couldn&amp;#39;t walk through the obstacles -- but with Codex you could.&lt;/p&gt;
&lt;p&gt;With either one, it was fun to build and play!&lt;/p&gt;
</description>
      <pubDate>Thu, 05 Feb 2026 00:00:00 GMT</pubDate>
      <author>Stepan Parunashvili</author>
    </item>
    <item>
      <title>Team features are free through the end of February</title>
      <link>https://instantdb.com/essays/free_teams_through_february</link>
      <guid isPermaLink="true">https://instantdb.com/essays/free_teams_through_february</guid>
      <description>&lt;p&gt;February is all about celebrating the people you care for, so we&amp;#39;re making team features free through the end of February. You can invite as many members as you like to your free Instant apps and orgs as long as you add them before the end of February.&lt;/p&gt;
&lt;p&gt;At the end of February, the members you added will still be able to access your app. You just won&amp;#39;t be able to add new members until you convert to a paid app.&lt;/p&gt;
&lt;p&gt;This is an experiment for us. We&amp;#39;re hoping that free teams will encourage people to add their team members to their Instant apps earlier. It takes a lot to convince your coworkers to use a new database. If we make it easy to bring them in earlier, before you&amp;#39;re required to put down a credit card, will companies be more likely to adopt Instant for their projects? We&amp;#39;re going to find out!&lt;/p&gt;
</description>
      <pubDate>Mon, 19 Jan 2026 00:00:00 GMT</pubDate>
      <author>Instant</author>
    </item>
    <item>
      <title>GPT 5.2 on the Counter-Strike Benchmark</title>
      <link>https://instantdb.com/essays/gpt_52_on_the_counterstrike_benchmark</link>
      <guid isPermaLink="true">https://instantdb.com/essays/gpt_52_on_the_counterstrike_benchmark</guid>
      <description>&lt;p&gt;About 2 weeks ago we asked Codex 5.1 Max, Claude 4.5 Opus, and Gemini 3 Pro to &lt;a href=&quot;/essays/agents_building_counterstrike&quot;&gt;build Counter Strike&lt;/a&gt;. It had to be a 3D UI, and it had to be multiplayer.&lt;/p&gt;
&lt;p&gt;How good of a job does GPT 5.2 do at this task?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Here&amp;#39;s the TL;DR:&lt;/strong&gt; Even though GPT 5.2 is not a coding model, it did better than Codex 5.1 Max on almost every prompt. GPT 5.2 was still behind Claude on frontend changes, but it began to go toe-to-toe with Gemini on the backend.&lt;/p&gt;
&lt;p&gt;You can try out the version that GPT 5.2 built here:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;GPT 5.2&amp;#39;s first attempt&lt;/strong&gt;: &lt;a href=&quot;https://codex52strike.vercel.app/&quot;&gt;https://codex52strike.vercel.app/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;A second version with a map reminiscent of de_dust2&lt;/strong&gt;: &lt;a href=&quot;https://codex52dust.vercel.app/&quot;&gt;https://codex52dust.vercel.app/&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Here&amp;#39;s a full &lt;a href=&quot;https://www.youtube.com/watch?v=MeKBO9QOUFA&quot;&gt;video&lt;/a&gt; of us going through the build, but for those who prefer text you get this post.&lt;/p&gt;
&lt;h2&gt;Overview&lt;/h2&gt;
&lt;p&gt;We evaluated GPT 5.2 on the Codex CLI set to medium &lt;sup id=&quot;user-content-fnref-1&quot;&gt;&lt;a href=&quot;#user-content-fn-1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;. All prompts were the same as our last benchmark post. Take a look at how the leaderboard &lt;sup id=&quot;user-content-fnref-2&quot;&gt;&lt;a href=&quot;#user-content-fn-2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt; changed with GPT 5.2:&lt;/p&gt;
&lt;p&gt;&lt;gpt52-leaderboard&gt;&lt;/gpt52-leaderboard&gt;&lt;/p&gt;
&lt;p&gt;Claude still holds the top position on frontend changes (maps, characters, and threejs).&lt;/p&gt;
&lt;p&gt;But GPT 5.2 did much better than its predecessor overall. GPT 5.2&amp;#39;s frontend changes were noticeably better than Codex 5.1 Max&amp;#39;s.&lt;/p&gt;
&lt;p&gt;And the backend changes were about as good as Gemini 3 Pro: both of them effectively one shotted multiplayer positions, shots, and maps.&lt;/p&gt;
&lt;p&gt;You can see for yourself: let&amp;#39;s dive into the prompts.&lt;/p&gt;
&lt;h2&gt;1. Boxes and Physics&lt;/h2&gt;
&lt;p&gt;The first thing we asked it to do was build a basic 3D map with polygons.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Prompt&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I want you to create a browser-based version of counter strike, using three js.&lt;/p&gt;
&lt;p&gt;For now, just make this local: don&amp;#39;t worry about backends, Instant, or
anything like that.&lt;/p&gt;
&lt;p&gt;For the first version, just make the main character a first-person view with
a cross hair. Put enemies at random places. Enemies have HP. You can
shoot them, and kill them. When an enemy is killed, they respawn.&lt;/p&gt;
&lt;p&gt;Make everything simple polygons -- rectangles.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;GPT 5.2 hit one type error on its first try. But once we pasted the error back, it built a working frontend.&lt;/p&gt;
&lt;p&gt;Here&amp;#39;s GPT 5.2 versus its predecessor:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Codex 5.1 Max&lt;/th&gt;
&lt;th&gt;GPT 5.2&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;&lt;tr&gt;
&lt;td&gt;&lt;img src=&quot;/posts/counter_strike/map_codex.png?lightbox&quot; alt=&quot;&quot;&gt;&lt;/td&gt;
&lt;td&gt;&lt;img src=&quot;/posts/counter_strike_52/map_gpt52.png?lightbox&quot; alt=&quot;&quot;&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;p&gt;GPT 5.2 makes clear improvement over Codex 5.1 Max. If you compare this to Claude and Gemini, we think Claude still did the best job. The map and the lighting look the most interesting. But at this point it feels like GPT 5.2 did about as well as Gemini:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Claude 4.5 Opus&lt;/th&gt;
&lt;th&gt;Gemini 3 Pro&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;&lt;tr&gt;
&lt;td&gt;&lt;img src=&quot;/posts/counter_strike/map_claude.png?lightbox&quot; alt=&quot;&quot;&gt;&lt;/td&gt;
&lt;td&gt;&lt;img src=&quot;/posts/counter_strike/map_gemini.png?lightbox&quot; alt=&quot;&quot;&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;h2&gt;2. Characters&lt;/h2&gt;
&lt;p&gt;The next challenge was to make the characters more interesting. Instead of a simple box, we wanted enemies that looked like people:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Prompt&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I want you to make the enemies look more like people. Use a bunch of square polygons to represent a person, and maybe a little gun&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;GPT 5.2 improved a bunch here:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Codex 5.1 Max&lt;/th&gt;
&lt;th&gt;GPT 5.2&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;&lt;tr&gt;
&lt;td&gt;&lt;img src=&quot;/posts/counter_strike/enemy_codex.png?lightbox&quot; alt=&quot;&quot;&gt;&lt;/td&gt;
&lt;td&gt;&lt;img src=&quot;/posts/counter_strike_52/character_gpt52.png?lightbox&quot; alt=&quot;&quot;&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;p&gt;That&amp;#39;s a noticeable improvement in our book. If you compare to Claude and Gemini, it feels like Claude still wins, but GPT 5.2 is about as good as Gemini again:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Claude 4.5 Opus&lt;/th&gt;
&lt;th&gt;Gemini 3 Pro&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;&lt;tr&gt;
&lt;td&gt;&lt;img src=&quot;/posts/counter_strike/enemy_claude.png?lightbox&quot; alt=&quot;&quot;&gt;&lt;/td&gt;
&lt;td&gt;&lt;img src=&quot;/posts/counter_strike/enemy_gemini.png?lightbox&quot; alt=&quot;&quot;&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;h2&gt;3. Gun in our field-of-view&lt;/h2&gt;
&lt;p&gt;Next up was adding a gun in our field of view alongside an animation when we shoot:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Prompt&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I want you to make it so I also have a gun in my field of view. When I shoot, the gun moves a bit.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;We didn&amp;#39;t notice much of an improvement here. In fact, GPT 5.2 hit an error where 5.1 Max got it done in one shot. Here&amp;#39;s the side-by-side with its predecessor:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Codex 5.1 Max&lt;/th&gt;
&lt;th&gt;GPT 5.2&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;&lt;tr&gt;
&lt;td&gt;&lt;img src=&quot;/posts/counter_strike/recoil_codex.gif?lightbox&quot; alt=&quot;&quot;&gt;&lt;/td&gt;
&lt;td&gt;&lt;img src=&quot;/posts/counter_strike_52/recoil_gpt_52.gif?lightbox&quot; alt=&quot;&quot;&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;p&gt;It&amp;#39;s interesting to note that the error it had was similar to Gemini&amp;#39;s (troubles attaching the gun to the field of view).&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Claude 4.5 Opus&lt;/th&gt;
&lt;th&gt;Gemini 3 Pro&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;&lt;tr&gt;
&lt;td&gt;&lt;img src=&quot;/posts/counter_strike/recoil_claude.gif?lightbox&quot; alt=&quot;&quot;&gt;&lt;/td&gt;
&lt;td&gt;&lt;img src=&quot;/posts/counter_strike/recoil_gemini.gif?lightbox&quot; alt=&quot;&quot;&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;p&gt;In our last test Gemini 3 Pro got really stuck here, so despite the slight error from 5.2, the rankings didn&amp;#39;t change.&lt;/p&gt;
&lt;h2&gt;4. Adding sounds and animations&lt;/h2&gt;
&lt;p&gt;The final challenge for the frontend was sounds and animations:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Prompt&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I want you to use chiptunes to animate the sound of shots. I also want to animate deaths.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Here&amp;#39;s predecessor vs 5.2:&lt;/p&gt;
&lt;div class=&quot;essay-breakout grid grid-cols-1 gap-4 md:grid-cols-2&quot;&gt;
  &lt;div&gt;
    &lt;div class=&quot;text-center font-mono text-sm font-bold&quot;&gt;Codex 5.1 Max&lt;/div&gt;
    &lt;iframe
      src=&quot;https://player.mux.com/UTCslk3hyNOnXSVlZmodcIqkFoHwgkPIl007aJxQINJ00?metadata-video-title=sound_codex&amp;video-title=sound_codex&quot;
      style=&quot;width: 100%; border: none; aspect-ratio: 885/626;&quot;
      allow=&quot;accelerometer; gyroscope; autoplay; encrypted-media; picture-in-picture;&quot;
      allowfullscreen
    &gt;&lt;/iframe&gt;
  &lt;/div&gt;
  &lt;div&gt;
    &lt;div class=&quot;text-center font-mono text-sm font-bold&quot;&gt;GPT 5.2&lt;/div&gt;
    &lt;iframe
      src=&quot;https://player.mux.com/BnBA9Hg4HQgo19Ey8Ul02GnFzhTGZGTLOBY3shHvaF4Q&quot;
      style=&quot;width: 100%; border: none; aspect-ratio: 1024/703;&quot;
      allow=&quot;accelerometer; gyroscope; autoplay; encrypted-media; picture-in-picture;&quot;
      allowfullscreen
    &gt;&lt;/iframe&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;We didn&amp;#39;t change the ratings here. We like 5.2&amp;#39;s animation, but Claude&amp;#39;s version still felt more interesting.&lt;/p&gt;
&lt;div class=&quot;essay-breakout grid grid-cols-1 gap-4 md:grid-cols-2&quot;&gt;
  &lt;div&gt;
    &lt;div class=&quot;text-center font-mono text-sm font-bold&quot;&gt;Claude 4.5 Opus&lt;/div&gt;
    &lt;iframe
      src=&quot;https://player.mux.com/E2bse7dt01Pap3Yrwr9aKyr00Hxw7pV5rXsJSMIx006TqM?metadata-video-title=sound_claude&amp;video-title=sound_claude&quot;
      style=&quot;width: 100%; border: none; aspect-ratio: 769/631;&quot;
      allow=&quot;accelerometer; gyroscope; autoplay; encrypted-media; picture-in-picture;&quot;
      allowfullscreen
    &gt;&lt;/iframe&gt;
  &lt;/div&gt;
  &lt;div&gt;
    &lt;div class=&quot;text-center font-mono text-sm font-bold&quot;&gt;Gemini 3 Pro&lt;/div&gt;
    &lt;iframe
      src=&quot;https://player.mux.com/knIuqiEW9yVL4FB6BOCHl02102026GX4ZakdwIYb01y7WNg?metadata-video-title=sound_gemini&amp;video-title=sound_gemini&quot;
      style=&quot;width: 100%; border: none; aspect-ratio: 943/663;&quot;
      allow=&quot;accelerometer; gyroscope; autoplay; encrypted-media; picture-in-picture;&quot;
      allowfullscreen
    &gt;&lt;/iframe&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;h2&gt;5. Sharing positions&lt;/h2&gt;
&lt;p&gt;Things started to change when it came time to add the backend! Goal 1 was just to share positions for each player:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Prompt&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I want you to use Instant presence.
Don&amp;#39;t save anything in the database, just use presence and topics. You can look up the docs.
There should should just be one single room.
You no longer the need to have the enemies that are randomly placed. All the players are what get placed.
For now, don&amp;#39;t worry about shots. Let&amp;#39;s just make it so the positions of the players are what get set in presence.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Previously Codex 5.1 Max needed a few iterations to get things right. GPT 5.2 got this done out of the box. Here&amp;#39;s a snippet of how it felt:&lt;/p&gt;
&lt;iframe
  src=&quot;https://player.mux.com/027a01R1LsOJ00rv88aD1T8KpFEHSx1qHwZopOOo02MVI02A&quot;
  style=&quot;width: 100%; border: none; aspect-ratio: 1024/575;&quot;
  allow=&quot;accelerometer; gyroscope; autoplay; encrypted-media; picture-in-picture;&quot;
  allowfullscreen
&gt;&lt;/iframe&gt;

&lt;p&gt;It was interesting to note that like Codex, GPT 5.2 relied &lt;em&gt;very&lt;/em&gt; heavily on REPLing to understand an API, rather than reading docs.&lt;/p&gt;
&lt;h2&gt;6. Sharing shots&lt;/h2&gt;
&lt;p&gt;Next up was making sure shots worked, and GPT 5.2 got a lot better here too.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Prompt&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Now let&amp;#39;s make shots work. When I shoot, send the shot as a topic, and make it affect the target&amp;#39;s HP. When the target HP goes to zero, they should die and respawn.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Just like Claude, GPT 5.2 got this done in one shot:&lt;/p&gt;
&lt;iframe
  src=&quot;https://player.mux.com/MMhn300c6g1OF5J02WPMPULnQaygq7o8O02jTnXObJ800Tk&quot;
  style=&quot;width: 100%; border: none; aspect-ratio: 1024/575;&quot;
  allow=&quot;accelerometer; gyroscope; autoplay; encrypted-media; picture-in-picture;&quot;
  allowfullscreen
&gt;&lt;/iframe&gt;

&lt;p&gt;Codex 5.1 Max needed more shots to get this right.&lt;/p&gt;
&lt;h2&gt;7. Maps&lt;/h2&gt;
&lt;p&gt;The final part of the game was to build maps. This included creating schema, seeding data, and making sure permissions worked.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Prompt&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;So, now I want you to make it so the front page is actually a list of maps. Since our UI is using lots of polygons, make the style kind of polygonish
Make the UI look like the old counter strike map selection screen. I want you to save these maps in the database. Each map has a name. Use a script to generate 5 random maps with cool names.
Then, push up some permissions so that anyone can view maps, but they cannot create or edit them.
When you join a map, you can just use the map id as the room id for presence.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;We remember Claude having a lot of trouble with this. Codex 5.1 Max needed a few shots to get this right, and Gemini 3 Pro got this done in one shot.&lt;/p&gt;
&lt;p&gt;Well, GPT 5.2 now got this done in one shot too. We think Gemini&amp;#39;s UI is a bit better, but the backends were similar.&lt;/p&gt;
&lt;p&gt;One surprise here though, was that GPT 5.2 was a lot more sheepish about running CLI commands. It simply asked us to run the commands for it.&lt;/p&gt;
&lt;p&gt;We first thought this was a gotcha for this particular task, but after prodding it to push its changes to Vercel, it again just &amp;quot;told&amp;quot; us the vercel CLI commands, rather than running them.&lt;/p&gt;
&lt;h2&gt;Finishing thoughts&lt;/h2&gt;
&lt;p&gt;GPT 5.2 did do better than Codex 5.1 Max. It chose some surprising steps (like using REPLs instead of reading docs, or sharing commands rather than running them), but overall it&amp;#39;s an improvement. We&amp;#39;re excited to see how the 5.2 codex model feels.&lt;/p&gt;
&lt;p&gt;&lt;sup id=&quot;user-content-fnref-1&quot;&gt;&lt;a href=&quot;#user-content-fn-1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;: You may ask: why medium? Lots of hackers prefer using &lt;code&gt;high&lt;/code&gt;. For now we choose whatever the CLI default is. We didn&amp;#39;t want to start customizing CLIs and introduce bias that way.&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;user-content-fn-2&quot; href=&quot;#user-content-fnref-2&quot;&gt;[2]&lt;/a&gt;  A bit of a revealed preference in this leaderboard: we vibe-coded the animations using Claude 4.5 Opus.&lt;/p&gt;
</description>
      <pubDate>Fri, 12 Dec 2025 00:00:00 GMT</pubDate>
      <author>Stepan Parunashvili</author>
    </item>
    <item>
      <title>Codex, Opus, Gemini try to build Counter Strike</title>
      <link>https://instantdb.com/essays/agents_building_counterstrike</link>
      <guid isPermaLink="true">https://instantdb.com/essays/agents_building_counterstrike</guid>
      <description>&lt;p&gt;In the last week we’ve had three major model updates: Gemini 3 Pro, Codex Max 5.1, Claude Opus 4.5. We thought we’d give them a challenge:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Build a basic version of Counter Strike.&lt;/strong&gt; The game had to be a 3D UI and it had to be multiplayer.&lt;/p&gt;
&lt;p&gt;If you&amp;#39;re curious, pop open a browser (ideally on a large screen) and try out each model&amp;#39;s handiwork yourself:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Codex Max 5.1&lt;/strong&gt;: &lt;a href=&quot;https://cscodex.vercel.app/&quot;&gt;https://cscodex.vercel.app/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Claude Opus 4.5&lt;/strong&gt;: &lt;a href=&quot;https://csclaude.vercel.app/&quot;&gt;https://csclaude.vercel.app/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Gemini 3 Pro&lt;/strong&gt;: &lt;a href=&quot;https://csgemini.vercel.app/&quot;&gt;https://csgemini.vercel.app/&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;We have a full video of us going through the build &lt;a href=&quot;https://youtu.be/fm-OoCWQlmc&quot;&gt;here&lt;/a&gt;, but for those who prefer text, you get this post.&lt;/p&gt;
&lt;p&gt;We&amp;#39;ll go over some of our high-level impressions on each model, then dive deeper into the performance of specific prompts.&lt;/p&gt;
&lt;h2&gt;The Setup&lt;/h2&gt;
&lt;p&gt;We signed up for the highest-tier plan on each model provider and used the defaults set for their CLI. For Codex, that’s 5.1 codex-max on the medium setting. For Claude it’s Opus 4.5. And with Gemini it&amp;#39;s 3 pro.&lt;/p&gt;
&lt;p&gt;We then gave each model about 7 consecutive prompts. Prompts were divided into two categories:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Frontend:&lt;/strong&gt; At first, agents only had to worry about the game mechanics: design the scene, the enemies, the logic for shooting, and some sound effects.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Backend:&lt;/strong&gt; Once that was done, agents would make the game multiplayer. They would need to build a selection of rooms that users could join and start shooting.&lt;/p&gt;
&lt;h2&gt;A High-Level Overview&lt;/h2&gt;
&lt;p&gt;So, how&amp;#39;d each model do?&lt;/p&gt;
&lt;p&gt;In a familiar tune with the other Anthropic models, &lt;strong&gt;Opus 4.5 won out on the frontend&lt;/strong&gt;. It made nicer maps, nicer characters, nicer guns, and generally had the right scene from the get-go.&lt;/p&gt;
&lt;p&gt;Once the design was done, &lt;strong&gt;Gemini 3 Pro started to win on the backend&lt;/strong&gt;. It hit fewer errors adding multiplayer and persistence. In general Gemini did the best at making logical rather than visual changes.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Codex Max felt like an “in-between” model on both frontend and backend.&lt;/strong&gt; It got a lot of “2nd place” points in our book. It did reasonably well on the frontend and reasonably well on the backend, but felt less spiky than the other models.&lt;/p&gt;
&lt;p&gt;Here’s the scorecard in detail:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Codex&lt;/th&gt;
&lt;th&gt;Claude&lt;/th&gt;
&lt;th&gt;Gemini&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Frontend&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Boxes + Physics&lt;/td&gt;
&lt;td&gt;🥉&lt;/td&gt;
&lt;td&gt;🥇&lt;/td&gt;
&lt;td&gt;🥈&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Characters + guns&lt;/td&gt;
&lt;td&gt;🥉&lt;/td&gt;
&lt;td&gt;🥇&lt;/td&gt;
&lt;td&gt;🥈&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;POV gun&lt;/td&gt;
&lt;td&gt;🥈&lt;/td&gt;
&lt;td&gt;🥇&lt;/td&gt;
&lt;td&gt;🥉&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sounds&lt;/td&gt;
&lt;td&gt;🥈&lt;/td&gt;
&lt;td&gt;🥇&lt;/td&gt;
&lt;td&gt;🥈&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Backend&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Moving&lt;/td&gt;
&lt;td&gt;🥈&lt;/td&gt;
&lt;td&gt;🥉&lt;/td&gt;
&lt;td&gt;🥇&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Shooting&lt;/td&gt;
&lt;td&gt;🥉&lt;/td&gt;
&lt;td&gt;🥇&lt;/td&gt;
&lt;td&gt;🥉&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Saving rooms&lt;/td&gt;
&lt;td&gt;🥈&lt;/td&gt;
&lt;td&gt;🥉&lt;/td&gt;
&lt;td&gt;🥇&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Bonus&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;🥈&lt;/td&gt;
&lt;td&gt;🥉&lt;/td&gt;
&lt;td&gt;🥇&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;p&gt;Okay, now let’s get deeper into each prompt.&lt;/p&gt;
&lt;h1&gt;1. Boxes and Physics&lt;/h1&gt;
&lt;p&gt;Goal number 1 was to set up the physics for the game. Models needed to design a map with a first-person viewpoint, and the ability to shoot enemies.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Prompt&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I want you to create a browser-based version of counter strike, using three js.&lt;/p&gt;
&lt;p&gt;For now, just make this local: don&amp;#39;t worry about backends, Instant, or
anything like that.&lt;/p&gt;
&lt;p&gt;For the first version, just make the main character a first-person view with
a cross hair. Put enemies at random places. Enemies have HP. You can
shoot them, and kill them. When an enemy is killed, they respawn.&lt;/p&gt;
&lt;p&gt;Make everything simple polygons -- rectangles.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Here’s a side-by-side comparison of the visuals each model came up with:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Codex&lt;/th&gt;
&lt;th&gt;Claude&lt;/th&gt;
&lt;th&gt;Gemini&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;&lt;tr&gt;
&lt;td&gt;&lt;img src=&quot;/posts/counter_strike/map_codex.png?lightbox&quot; alt=&quot;&quot;&gt;&lt;/td&gt;
&lt;td&gt;&lt;img src=&quot;/posts/counter_strike/map_claude.png?lightbox&quot; alt=&quot;&quot;&gt;&lt;/td&gt;
&lt;td&gt;&lt;img src=&quot;/posts/counter_strike/map_gemini.png?lightbox&quot; alt=&quot;&quot;&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;p&gt;Visually Claude came up with the most interesting map. There were obstacles, a nice floor, and you could see everything well.&lt;/p&gt;
&lt;p&gt;Gemini got something nice working too.&lt;/p&gt;
&lt;p&gt;Codex had an error on its first run &lt;sup id=&quot;user-content-fnref-1&quot;&gt;&lt;a href=&quot;#user-content-fn-1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt; (it called a function without importing it), but it fixed it quickly. Once the bug was fixed, its map was the least visually pleasing: things were darker, there were no obstacles, and it was hard to make out the floor.&lt;/p&gt;
&lt;h1&gt;2. Characters&lt;/h1&gt;
&lt;p&gt;Now that we had a map and some polygons, we asked the models to style up the characters. This was our prompt:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I want you to make the enemies look more like people. Use a bunch of square polygons to represent a person, and maybe a little gun&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Here’s the result of their work:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Codex&lt;/th&gt;
&lt;th&gt;Claude&lt;/th&gt;
&lt;th&gt;Gemini&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;&lt;tr&gt;
&lt;td&gt;&lt;img src=&quot;/posts/counter_strike/enemy_codex.png?lightbox&quot; alt=&quot;&quot;&gt;&lt;/td&gt;
&lt;td&gt;&lt;img src=&quot;/posts/counter_strike/enemy_claude.png?lightbox&quot; alt=&quot;&quot;&gt;&lt;/td&gt;
&lt;td&gt;&lt;img src=&quot;/posts/counter_strike/enemy_gemini.png?lightbox&quot; alt=&quot;&quot;&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;p&gt;Again it feels like Claude did the best job here. The characters look quite human — almost at the level of design in Minecraft. Gemini did well too. Codex made its characters better, but everything was a single color, which really diminished them compared to the others.&lt;/p&gt;
&lt;h1&gt;3. Gun in our field-of-view&lt;/h1&gt;
&lt;p&gt;We then asked each model to add a gun to our first-person view. When we shoot, we wanted a recoil animation.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I want you to make it so I also have a gun in my field of view. When I shoot, the gun moves a bit.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Here’s the side-by-side of how the recoil felt for each model:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Codex&lt;/th&gt;
&lt;th&gt;Claude&lt;/th&gt;
&lt;th&gt;Gemini&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;&lt;tr&gt;
&lt;td&gt;&lt;img src=&quot;/posts/counter_strike/recoil_codex.gif?lightbox&quot; alt=&quot;&quot;&gt;&lt;/td&gt;
&lt;td&gt;&lt;img src=&quot;/posts/counter_strike/recoil_claude.gif?lightbox&quot; alt=&quot;&quot;&gt;&lt;/td&gt;
&lt;td&gt;&lt;img src=&quot;/posts/counter_strike/recoil_gemini.gif?lightbox&quot; alt=&quot;&quot;&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;p&gt;Here both Claude and Codex got the gun working in one shot. Claude’s gun looks like a real darn pistol though.&lt;/p&gt;
&lt;p&gt;Gemini had an issue trying to stick the gun to the camera. This got us into quite a back-and-forth, until we realized that the gun was transparent.&lt;/p&gt;
&lt;h1&gt;4. Adding sounds…and animations&lt;/h1&gt;
&lt;p&gt;We were almost done with the frontend: the final step was sound. Here’s what we asked:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I want you to use chiptunes to animate the sound of shots. I also want to animate deaths.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;All models added sounds pretty easily. The ending part of our prompt (“I also want to animate deaths.”) was added on the spur of the moment in the video. Our intention was to add sound to deaths. &lt;em&gt;But&lt;/em&gt; that’s not what happened.&lt;/p&gt;
&lt;p&gt;All 3 models misunderstood the sentence in the same way: they thought we wanted to animate how the characters died. Fair enough: re-reading the sentence, we would understand it that way too.&lt;/p&gt;
&lt;p&gt;Here’s the results they came up with:&lt;/p&gt;
&lt;div class=&quot;essay-breakout grid grid-cols-1 gap-4 md:grid-cols-3&quot;&gt;
  &lt;div&gt;
    &lt;div class=&quot;text-center font-mono text-sm font-bold&quot;&gt;Codex&lt;/div&gt;
    &lt;iframe
      src=&quot;https://player.mux.com/UTCslk3hyNOnXSVlZmodcIqkFoHwgkPIl007aJxQINJ00?metadata-video-title=sound_codex&amp;video-title=sound_codex&quot;
      style=&quot;width: 100%; border: none; aspect-ratio: 885/626;&quot;
      allow=&quot;accelerometer; gyroscope; autoplay; encrypted-media; picture-in-picture;&quot;
      allowfullscreen
    &gt;&lt;/iframe&gt;
  &lt;/div&gt;
  &lt;div&gt;
    &lt;div class=&quot;text-center font-mono text-sm font-bold&quot;&gt;Claude&lt;/div&gt;
    &lt;iframe
      src=&quot;https://player.mux.com/E2bse7dt01Pap3Yrwr9aKyr00Hxw7pV5rXsJSMIx006TqM?metadata-video-title=sound_claude&amp;video-title=sound_claude&quot;
      style=&quot;width: 100%; border: none; aspect-ratio: 769/631;&quot;
      allow=&quot;accelerometer; gyroscope; autoplay; encrypted-media; picture-in-picture;&quot;
      allowfullscreen
    &gt;&lt;/iframe&gt;
  &lt;/div&gt;
  &lt;div&gt;
    &lt;div class=&quot;text-center font-mono text-sm font-bold&quot;&gt;Gemini&lt;/div&gt;
    &lt;iframe
      src=&quot;https://player.mux.com/knIuqiEW9yVL4FB6BOCHl02102026GX4ZakdwIYb01y7WNg?metadata-video-title=sound_gemini&amp;video-title=sound_gemini&quot;
      style=&quot;width: 100%; border: none; aspect-ratio: 943/663;&quot;
      allow=&quot;accelerometer; gyroscope; autoplay; encrypted-media; picture-in-picture;&quot;
      allowfullscreen
    &gt;&lt;/iframe&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;All the models got the sound done easily, and they all added animations, but we thought Claude’s animation felt the most fun.&lt;/p&gt;
&lt;h1&gt;5. Sharing positions&lt;/h1&gt;
&lt;p&gt;Now that all models had a real frontend, we asked them to make it multiplayer.&lt;/p&gt;
&lt;p&gt;We didn’t want the models to worry about shots just yet: goal 1 was to share the movement positions. Here’s what we asked it to do:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I want you to use Instant presence.&lt;/p&gt;
&lt;p&gt;Don&amp;#39;t save anything in the database, just use presence and topics. You can
look up the docs.&lt;/p&gt;
&lt;p&gt;There should should just be one single room.&lt;/p&gt;
&lt;p&gt;You no longer the need to have the enemies that are randomly placed. All the players are what get placed.&lt;/p&gt;
&lt;p&gt;For now, don&amp;#39;t worry about shots. Let&amp;#39;s just make it so the positions of the players are what get set in presence.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Gemini got this right in one shot. Both Codex and Claude needed some more prodding.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Codex&lt;/th&gt;
&lt;th&gt;Claude&lt;/th&gt;
&lt;th&gt;Gemini&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;&lt;tr&gt;
&lt;td&gt;Moving&lt;/td&gt;
&lt;td&gt;🥈&lt;/td&gt;
&lt;td&gt;🥉&lt;/td&gt;
&lt;td&gt;🥇&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;p&gt;It was interesting to see how each model tried to solve problems:&lt;/p&gt;
&lt;p&gt;Codex used &lt;em&gt;lots&lt;/em&gt; of introspection. It would constantly inspect the TypeScript library to see which functions were available. It didn’t seem to look at the docs as much.&lt;/p&gt;
&lt;p&gt;Claude looked at the docs a bunch. It read and re-read our docs on presence, but rarely introspected the library like Codex did.&lt;/p&gt;
&lt;p&gt;Gemini seemed to do both. It looked at the docs, and because it constantly ran the build step, it caught any TypeScript errors it had and fixed them up.&lt;/p&gt;
&lt;p&gt;Gemini made the fastest progress here, though all of them got through, as long as we pasted the errors back.&lt;/p&gt;
&lt;h1&gt;6. Making shots work&lt;/h1&gt;
&lt;p&gt;Then we moved to getting shots to work. Here was the prompt:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Now let&amp;#39;s make shots work. When I shoot, send the shot as a topic, and
make it affect the target&amp;#39;s HP. When the target HP goes to zero, they should die and respawn.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Codex&lt;/th&gt;
&lt;th&gt;Claude&lt;/th&gt;
&lt;th&gt;Gemini&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;&lt;tr&gt;
&lt;td&gt;Shooting&lt;/td&gt;
&lt;td&gt;🥉&lt;/td&gt;
&lt;td&gt;🥇&lt;/td&gt;
&lt;td&gt;🥈&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;p&gt;Claude got this right in one shot. Gemini and Codex had a few issues to fix, but just pasting the errors got them through.&lt;/p&gt;
&lt;h1&gt;7. Multiple maps&lt;/h1&gt;
&lt;p&gt;Now that all models had a single room working, it was time to get them supporting &lt;em&gt;multiple&lt;/em&gt; rooms.&lt;/p&gt;
&lt;p&gt;The reason we added this challenge was to see (a) how they would deal with a new API (persistence), and (b) how they would handle the refactor necessary for multiple rooms.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;So, now I want you to make it so the front page is actually a list of
maps. Since our UI is using lots of polygons, make the style kind of
polygonish&lt;/p&gt;
&lt;p&gt;Make the UI look like the old counter strike map selection screen.
I want you to save these &lt;code&gt;maps&lt;/code&gt; in the database. Each map has a name.
Use a script to generate 5 random maps with cool names.&lt;/p&gt;
&lt;p&gt;Then, push up some permissions so that anyone can view maps, but they cannot
create or edit them.&lt;/p&gt;
&lt;p&gt;When you join a map, you can just use the map id as the room id for
presence.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;The maps UI&lt;/h2&gt;
&lt;p&gt;All models did great with the UI. Here’s how each looked:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Codex&lt;/th&gt;
&lt;th&gt;Claude&lt;/th&gt;
&lt;th&gt;Gemini&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;&lt;tr&gt;
&lt;td&gt;&lt;img src=&quot;/posts/counter_strike/ui_codex.png?lightbox&quot; alt=&quot;&quot;&gt;&lt;/td&gt;
&lt;td&gt;&lt;img src=&quot;/posts/counter_strike/ui_claude.png?lightbox&quot; alt=&quot;&quot;&gt;&lt;/td&gt;
&lt;td&gt;&lt;img src=&quot;/posts/counter_strike/ui_gemini.png?lightbox&quot; alt=&quot;&quot;&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;p&gt;We kind of like Gemini’s UI the most, but they were all pretty cool.&lt;/p&gt;
&lt;h2&gt;The Persistence&lt;/h2&gt;
&lt;p&gt;And the persistence worked well too. They all dutifully created a schema for maps, pushed a migration, and seeded 5 maps.&lt;/p&gt;
&lt;h2&gt;The Refactor&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;But&lt;/em&gt; things got complicated in the refactor.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;gpt 5.1 codex max (medium)&lt;/th&gt;
&lt;th&gt;Claude 4.5 Opus&lt;/th&gt;
&lt;th&gt;Gemini 3 Pro&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;&lt;tr&gt;
&lt;td&gt;Saving rooms&lt;/td&gt;
&lt;td&gt;🥈&lt;/td&gt;
&lt;td&gt;🥉&lt;/td&gt;
&lt;td&gt;🥇&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;p&gt;Gemini got things done in one shot. It also chose to keep the map id in the URL, which made it much handier to use. Codex took one back-and-forth to fix a query error.&lt;/p&gt;
&lt;p&gt;But Claude &lt;em&gt;really&lt;/em&gt; got stuck. The culprit was hooks. Because useEffect can run multiple times, it ended up having a few very subtle bugs. For example, it made 2 canvas objects instead of 1. It also had multiple animation refs running at once.&lt;/p&gt;
&lt;p&gt;It was hard to get it to fix things by itself. We had to put our engineer hats on and actually look at the code to unblock Claude here.&lt;/p&gt;
&lt;p&gt;This did give us a few ideas though:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Claude’s issues were human-like. How many of us get tripped up with useEffect running twice, or getting dependency arrays wrong? I think improving the React DX on these two issues could really push humans and agents further.&lt;/li&gt;
&lt;li&gt;And what would have happened if a non-programmer were building this? They would have gotten really stuck. We think there need to be more tools to go from “strictly vibe coding” to “real programming”. Right now the jump feels too steep.&lt;/li&gt;
&lt;/ol&gt;
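&lt;p&gt;To make the first point concrete, here is a minimal sketch (our own toy code, not Claude&amp;#39;s) of why a React effect without a cleanup ends up creating two canvases when it runs twice, and how returning a cleanup avoids it:&lt;/p&gt;

```typescript
// React 18 dev/strict mode runs a mount effect, its cleanup, then the
// effect again. We simulate that schedule with a plain function.
type Effect = () => null | (() => void);

function mountEffectTwice(effect: Effect) {
  const cleanup = effect();
  if (cleanup) cleanup(); // simulated dev-mode unmount
  effect(); // the remount
}

let canvases = 0;

// Buggy version: no cleanup, so the second run leaks a second canvas.
mountEffectTwice(() => {
  canvases++; // stand-in for document.createElement('canvas')
  return null;
});
console.log('without cleanup:', canvases); // 2

// Fixed version: the cleanup removes the first canvas before the rerun.
canvases = 0;
mountEffectTwice(() => {
  canvases++;
  return () => {
    canvases--; // remove the canvas again
  };
});
console.log('with cleanup:', canvases); // 1
```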
&lt;p&gt;In the end, all models built a real multiplayer FPS, with zero code written by hand! That’s pretty darn cool.&lt;/p&gt;
&lt;h2&gt;Parting thoughts&lt;/h2&gt;
&lt;p&gt;Well, models have definitely improved. They can take much higher-level feedback, and much higher-level documentation. What really strikes us though is how much they can iterate on their own work thanks to the CLI.&lt;/p&gt;
&lt;p&gt;There’s still lots to go though. The promise that you never have to look at the code doesn’t quite feel real yet.&lt;/p&gt;
&lt;p&gt;&lt;sup id=&quot;user-content-fnref-1&quot;&gt;&lt;a href=&quot;#user-content-fn-1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;: Interestingly, Gemini was very eager to run &lt;code&gt;npm run build&lt;/code&gt; over and over again, before terminating. Codex did not do this, and Claude did this more sparingly. This may explain why Gemini got fewer errors.&lt;/p&gt;
</description>
      <pubDate>Wed, 26 Nov 2025 00:00:00 GMT</pubDate>
      <author>Stepan Parunashvili</author>
    </item>
    <item>
      <title>Count-Min Sketches in JS — frequencies, but without the data</title>
      <link>https://instantdb.com/essays/count_min_sketch</link>
      <guid isPermaLink="true">https://instantdb.com/essays/count_min_sketch</guid>
      <description>&lt;div class=&quot;text-lg italic font-medium&quot;&gt;
  Our teammate Daniel introduced Count-Min Sketches in &lt;a href=&quot;/&quot;&gt;Instant&lt;/a&gt; (a sync engine you can spin up in less than a minute). Sketches were so small and so fast that I got into a rabbit hole learning about them. The following post came out of the process.
&lt;/div&gt;

&lt;p&gt;I have read and re-read just about every one of PG Wodehouse’s 71 novels. He’s one of my favorite authors. Wodehouse can take seemingly silly plots (quite a few involving stealing pigs) and twist them until you’re rapt with attention. And he’s a master of the English language.&lt;/p&gt;
&lt;p&gt;Wodehouse is known for eccentric diction. Instead of &amp;quot;Freddie walked over&amp;quot;, he’ll say &amp;quot;Freddie (shimmied | beetled | ambled) over&amp;quot;. You may wonder, how many times did he use the word &amp;#39;beetle&amp;#39;?&lt;/p&gt;
&lt;p&gt;Well I could tell you &lt;em&gt;approximately&lt;/em&gt; how many times Wodehouse used any word in his entire lexicon, just by loading the data structure embedded in this image:&lt;/p&gt;
&lt;div class=&quot;flex justify-center&quot;&gt;
  &lt;img class=&quot;m-0&quot; src=&quot;/posts/count_min_sketch/compressedSketch.png&quot; /&gt;
&lt;/div&gt;

&lt;p&gt;Compressed, it&amp;#39;s 50 kilobytes and covers a 23 megabyte text file, or 3.7 million words. We can use it to answer count estimates with 0.05% error rate and 99% confidence. (If you aren&amp;#39;t familiar with the probability terms here, no worries, we&amp;#39;ll go over them in this post.)&lt;/p&gt;
&lt;p&gt;You can try it yourself right here:&lt;/p&gt;
&lt;p&gt;&lt;sketch-demo demo=&quot;intro-try-sketch&quot;&gt;&lt;/sketch-demo&gt;&lt;/p&gt;
&lt;h1&gt;The Count-Min Sketch&lt;/h1&gt;
&lt;p&gt;The magic needed to make this happen is called the &lt;strong&gt;Count-Min Sketch&lt;/strong&gt; — a data structure that can give you frequency estimates over giant amounts of data &lt;em&gt;without&lt;/em&gt; becoming a giant object itself.&lt;/p&gt;
&lt;p&gt;You could use it to make passwords safer: track all known passwords on the internet, and detect whenever someone chooses a common password. &lt;sup id=&quot;user-content-fnref-3&quot;&gt;&lt;a href=&quot;#user-content-fn-3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;Or you could use it estimate the popularity of links: update a sketch whenever a user looks at a tweet, and you can query for approximate views. &lt;sup id=&quot;user-content-fnref-4&quot;&gt;&lt;a href=&quot;#user-content-fn-4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;Or, use it to make databases faster: track the values of different columns, so you can estimate how many rows a filter would return. This is how we use them in Instant: our query planner decides which indexes to use based on estimates from sketches. &lt;sup id=&quot;user-content-fnref-5&quot;&gt;&lt;a href=&quot;#user-content-fn-5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;So how do Count-Min Sketches work? In this post we&amp;#39;ll find out by building one from scratch, in JavaScript!&lt;/p&gt;
&lt;h1&gt;Setup&lt;/h1&gt;
&lt;p&gt;Let&amp;#39;s dust off Bun &lt;sup id=&quot;user-content-fnref-6&quot;&gt;&lt;a href=&quot;#user-content-fn-6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt; and spin up a project:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;mkdir sketches
cd sketches
bun init
cat &amp;gt; wodehouse.txt &amp;lt;&amp;lt; &amp;#39;EOF&amp;#39;
At the open window of the great library of Blandings Castle,
drooping like a wet sock, as was his habit when he had nothing
to prop his spine against, the Earl of Emsworth, that amiable
and boneheaded peer, stood gazing out over his domain.
EOF
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We&amp;#39;ve just made an &lt;code&gt;index.ts&lt;/code&gt; file, and a little toy &lt;code&gt;wodehouse.txt&lt;/code&gt; that we can play with as we go along.&lt;/p&gt;
&lt;p&gt;Time to &lt;code&gt;bun run --watch&lt;/code&gt;, and we&amp;#39;re ready to hack!&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;bun run --watch index.ts
&lt;/code&gt;&lt;/pre&gt;
&lt;h1&gt;An exact solution&lt;/h1&gt;
&lt;p&gt;First things first: let&amp;#39;s write a straightforward algorithm. If we wanted to count words &lt;em&gt;exactly&lt;/em&gt;, how would we do it?&lt;/p&gt;
&lt;p&gt;Well we could read &lt;code&gt;wodehouse.txt&lt;/code&gt;, parse each word and count them. Here we go:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-typescript&quot;&gt;// index.ts
import fs from &amp;#39;fs&amp;#39;;

// 1. Read the file
const wodehouse = fs.readFileSync(&amp;#39;wodehouse.txt&amp;#39;, &amp;#39;utf-8&amp;#39;);

// 2. Split it into words
function toWords(text: string): string[] {
  return text
    .split(&amp;#39;\n&amp;#39;)
    .flatMap((line) =&amp;gt; line.split(&amp;#39; &amp;#39;))
    .map((w) =&amp;gt; w.trim().toLowerCase())
    .filter((w) =&amp;gt; w);
}

// 3. Get exact counts
function countWords(words: string[]): { [w: string]: number } {
  const result: { [w: string]: number } = {};
  for (const word of words) {
    result[word] = (result[word] || 0) + 1;
  }
  return result;
}

const exactCounts = countWords(toWords(wodehouse));

console.log(&amp;#39;exactCounts&amp;#39;, exactCounts);
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This logs a little map in our terminal:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;exactCounts {
  at: 1,
  the: 3,
  // ...
  &amp;quot;castle,&amp;quot;: 1,
  drooping: 1,
  // ...
  &amp;quot;domain.&amp;quot;: 1,
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It works, but we&amp;#39;ll have a few problems.&lt;/p&gt;
&lt;h2&gt;Stems&lt;/h2&gt;
&lt;p&gt;What if the word &amp;quot;castle&amp;quot; was used without a comma? Or if instead of &amp;quot;drooping&amp;quot; Wodehouse wrote &amp;quot;drooped&amp;quot;?&lt;/p&gt;
&lt;p&gt;We would get different counts. It would be nice if we could normalize each word so no matter how Wodehouse wrote &amp;quot;droop&amp;quot;, we&amp;#39;d get the same count.&lt;/p&gt;
&lt;p&gt;This is a common natural-language processing task called &amp;quot;&lt;a href=&quot;https://en.wikipedia.org/wiki/Stemming&quot;&gt;stemming&lt;/a&gt;&amp;quot;. There are some great algorithms and libraries for this, but for our post we can write a rough function ourselves:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-typescript&quot;&gt;// index.ts
// ...
// 2. Split it into words
function stem(word: string) {
  let w = word.toLowerCase().replaceAll(/[^a-z]/g, &amp;#39;&amp;#39;);
  if (w.endsWith(&amp;#39;ing&amp;#39;) &amp;amp;&amp;amp; w.length &amp;gt; 4) {
    w = w.slice(0, -3);
  } else if (w.endsWith(&amp;#39;ed&amp;#39;) &amp;amp;&amp;amp; w.length &amp;gt; 3) {
    w = w.slice(0, -2);
  } else if (w.endsWith(&amp;#39;s&amp;#39;) &amp;amp;&amp;amp; w.length &amp;gt; 3 &amp;amp;&amp;amp; !w.endsWith(&amp;#39;ss&amp;#39;)) {
    w = w.slice(0, -1);
  } else if (w.endsWith(&amp;#39;ly&amp;#39;) &amp;amp;&amp;amp; w.length &amp;gt; 3) {
    w = w.slice(0, -2);
  } else if (w.endsWith(&amp;#39;er&amp;#39;) &amp;amp;&amp;amp; w.length &amp;gt; 4) {
    w = w.slice(0, -2);
  } else if (w.endsWith(&amp;#39;est&amp;#39;) &amp;amp;&amp;amp; w.length &amp;gt; 4) {
    w = w.slice(0, -3);
  }
  return w;
}

function toWords(text: string): string[] {
  return text
    .split(&amp;#39;\n&amp;#39;)
    .flatMap((line) =&amp;gt; line.split(&amp;#39; &amp;#39;))
    .map(stem)
    .filter((w) =&amp;gt; w);
}
// ...
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With it our console.log starts to show stemmed words:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;exactCounts {
  at: 1,
  the: 3,
  // ...
  castle: 1, // No more `,`
  droop: 1, // No more `ing`!
  // ...
  domain: 1, // No more `.`
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And now we have better exact counts. But there&amp;#39;s another problem.&lt;/p&gt;
&lt;h2&gt;Growth&lt;/h2&gt;
&lt;p&gt;What happens when you look at more words? Our &lt;code&gt;exactCounts&lt;/code&gt; grows with the vocabulary of &lt;code&gt;words&lt;/code&gt;:&lt;/p&gt;
&lt;p&gt;&lt;sketch-demo demo=&quot;exact-counts-growth&quot;&gt;&lt;/sketch-demo&gt;&lt;/p&gt;
&lt;p&gt;This isn&amp;#39;t &lt;em&gt;too big&lt;/em&gt; of an issue with Wodehouse specifically: after all the English dictionary itself could fit in memory.&lt;/p&gt;
&lt;p&gt;But as our vocabulary gets larger, our data structure gets more annoying. Imagine if we had to track &lt;em&gt;combinations&lt;/em&gt; of words: suddenly keeping counts would take more space than the words themselves. Could we do something different?&lt;/p&gt;
&lt;h1&gt;An intuition for sketches&lt;/h1&gt;
&lt;p&gt;Ideally, we would be able to divorce the size of our vocabulary from the size of our counts data structure. Here&amp;#39;s one way to do that.&lt;/p&gt;
&lt;h2&gt;Columns of Buckets&lt;/h2&gt;
&lt;p&gt;Our &lt;code&gt;exactCounts&lt;/code&gt; was an unbounded hash map. Let&amp;#39;s make a bounded version.&lt;/p&gt;
&lt;p&gt;We can spin up a &lt;em&gt;fixed&lt;/em&gt; number of buckets. Each bucket stores a count. We then take a word, hash it, and increment its corresponding bucket. Here&amp;#39;s how this could work:&lt;/p&gt;
&lt;p&gt;&lt;sketch-demo demo=&quot;single-row-insert&quot;&gt;&lt;/sketch-demo&gt;&lt;/p&gt;
&lt;p&gt;When we want to know the count of a word, we hash it, find the corresponding bucket, and that&amp;#39;s our count:&lt;/p&gt;
&lt;p&gt;&lt;sketch-demo demo=&quot;single-row-query&quot;&gt;&lt;/sketch-demo&gt;&lt;/p&gt;
&lt;p&gt;With this we&amp;#39;ve solved our growth problem! No matter how large our vocabulary gets, our buckets stay a fixed size.&lt;/p&gt;
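&lt;p&gt;As a rough sketch of the idea in code (the hash and bucket count here are toy choices of ours):&lt;/p&gt;

```typescript
// A toy single-row sketch: a fixed number of buckets and one hash.
const BUCKETS = 8;
const buckets = new Uint32Array(BUCKETS);

function bucketFor(word: string): number {
  let h = 0;
  for (const ch of word) {
    h = (h * 31 + ch.charCodeAt(0)) % BUCKETS;
  }
  return h;
}

function add(word: string) {
  buckets[bucketFor(word)]++;
}

function count(word: string): number {
  // Collisions only ever inflate a bucket, so this may over-count.
  return buckets[bucketFor(word)];
}

add('wet');
add('castle');
add('castle');
console.log(count('castle')); // 2, or 3 if 'wet' happens to share the bucket
```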
&lt;p&gt;But of course this comes with new consequences.&lt;/p&gt;
&lt;h2&gt;The &amp;#39;sketch&amp;#39; in sketches.&lt;/h2&gt;
&lt;p&gt;Our counts become estimates. If you look at the demo, both &amp;#39;wet&amp;#39; and &amp;#39;castle&amp;#39; ended up in the second bucket. If we asked &amp;quot;How many times is &amp;#39;castle&amp;#39; used?&amp;quot;, we&amp;#39;d get 622.&lt;/p&gt;
&lt;p&gt;Now, it does suck that we got 622 instead of 454 for &amp;#39;castle&amp;#39;. But if you think about it, it&amp;#39;s not such a big deal. Both words are used infrequently. Even when you put them together they pale in comparison to more common words. And if you&amp;#39;re worried about errors we can already intuit a way to reduce them.&lt;/p&gt;
&lt;h2&gt;More buckets, fewer errors&lt;/h2&gt;
&lt;p&gt;To reduce errors we can add more buckets. The more buckets we have, the fewer collisions we&amp;#39;ll have, and the lower our chances of errors are. (You may wonder how &lt;em&gt;much&lt;/em&gt; lower do our errors get? We&amp;#39;ll get to that soon!)&lt;/p&gt;
&lt;p&gt;&lt;sketch-demo demo=&quot;more-buckets&quot;&gt;&lt;/sketch-demo&gt;&lt;/p&gt;
&lt;p&gt;We may be feeling pretty good here, but we&amp;#39;re not done yet. We&amp;#39;re going to have a serious problem with high-frequency words.&lt;/p&gt;
&lt;h2&gt;Managing frequencies&lt;/h2&gt;
&lt;p&gt;What happens if we add a word like &amp;#39;like&amp;#39;? Say it landed where &amp;#39;peer&amp;#39; was:&lt;/p&gt;
&lt;p&gt;&lt;sketch-demo demo=&quot;high-frequency&quot;&gt;&lt;/sketch-demo&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;If we asked for the count of &amp;#39;peer&amp;#39;, we&amp;#39;d now get back 9,262.&lt;/strong&gt; That estimation is wildly inflated by &amp;#39;like&amp;#39;. Not very useful.&lt;/p&gt;
&lt;p&gt;If we want to make our estimations better, we would need a way to reduce the chance of very-high frequency words influencing counts. How can we do this?&lt;/p&gt;
&lt;h2&gt;Rows of Hashes&lt;/h2&gt;
&lt;p&gt;Here&amp;#39;s one way to reduce the influence of high-frequency words: we&amp;#39;ll add more hashes!&lt;/p&gt;
&lt;p&gt;We can set up a row of hash functions, each with their own buckets. To add a word, we go through each row, hash it and increment the corresponding bucket. Here&amp;#39;s how this looks:&lt;/p&gt;
&lt;p&gt;&lt;sketch-demo demo=&quot;two-rows-insert&quot;&gt;&lt;/sketch-demo&gt;&lt;/p&gt;
&lt;p&gt;When we want to know the count, we go through each row, find the corresponding bucket and pick the minimum value we find. &lt;sup id=&quot;user-content-fnref-10&quot;&gt;&lt;a href=&quot;#user-content-fn-10&quot;&gt;[10]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;&lt;sketch-demo demo=&quot;two-rows-query&quot;&gt;&lt;/sketch-demo&gt;&lt;/p&gt;
&lt;p&gt;This is pretty cool: a particular word could get unlucky in one hash function, but as long as it gets a lucky bucket from &lt;em&gt;some&lt;/em&gt; row, we&amp;#39;ll get a respectable count.&lt;/p&gt;
&lt;p&gt;We can look at &amp;#39;peer&amp;#39; again for an example. hash1 got us into the same bucket as &amp;#39;like&amp;#39;. But hash2 got us into our own bucket. That means a better estimation! And it also means we can intuit a way to improve our confidence even more.&lt;/p&gt;
&lt;h2&gt;More hash functions...more confidence&lt;/h2&gt;
&lt;p&gt;To improve confidence we can add more hash functions. The more hash functions we have, the higher the chance that we find at least &lt;em&gt;one&lt;/em&gt; good bucket. (You may wonder, how much more confident do we get? We&amp;#39;ll get to that soon!)&lt;/p&gt;
&lt;p&gt;&lt;sketch-demo demo=&quot;more-rows-confidence&quot;&gt;&lt;/sketch-demo&gt;&lt;/p&gt;
&lt;p&gt;Of course, this depends on how correlated the hash functions are. We&amp;#39;ll want to be sure that they are independent of each other, so adding a new hash function fully shuffles around the words.&lt;/p&gt;
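&lt;p&gt;One common trick for getting a family of independent hash functions is to seed a single hash with the row index. Later we&amp;#39;ll use Bun&amp;#39;s seeded &lt;code&gt;xxHash3&lt;/code&gt; for exactly this. Here&amp;#39;s the idea sketched with a toy seeded FNV-1a (a stand-in for illustration, not the hash we&amp;#39;ll actually use):&lt;/p&gt;

```typescript
// A toy seeded hash (illustrative stand-in): FNV-1a with the seed
// mixed into the offset basis. A different seed acts like a
// different hash function.
function seededHash(word: string, seed: number): number {
  let h = (0x811c9dc5 ^ seed) >>> 0;
  for (let i = 0; i < word.length; i++) {
    h ^= word.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

const columns = 5;
for (const seed of [0, 1, 2]) {
  // Each seed shuffles words into buckets differently
  console.log(seed, seededHash('peer', seed) % columns);
}
```

&lt;p&gt;A toy FNV variant like this isn&amp;#39;t truly independent across seeds; in practice a well-tested seeded hash shuffles well enough for a sketch.&lt;/p&gt;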
&lt;p&gt;If we do this right, and we build out columns of buckets and rows of hashes, we&amp;#39;ll have our Count-Min Sketch!&lt;/p&gt;
&lt;h1&gt;Implementing the Sketch&lt;/h1&gt;
&lt;p&gt;Let&amp;#39;s go ahead and write out our ideas in code then.&lt;/p&gt;
&lt;h2&gt;Creating a sketch&lt;/h2&gt;
&lt;p&gt;We&amp;#39;ll kick off by typing our &lt;code&gt;Sketch&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-typescript&quot;&gt;// index.ts

// 4. Create a sketch
type Sketch = {
  rows: number;
  columns: number;
  buckets: Uint32Array;
};
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We keep track of &lt;code&gt;rows&lt;/code&gt;, &lt;code&gt;columns&lt;/code&gt;, and all of our &lt;code&gt;buckets&lt;/code&gt;. Technically the &lt;code&gt;buckets&lt;/code&gt; are arranged as a matrix, so we &lt;em&gt;could&lt;/em&gt; use an array of arrays to store them. But a single flat array of buckets is more efficient. &lt;sup id=&quot;user-content-fnref-7&quot;&gt;&lt;a href=&quot;#user-content-fn-7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
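&lt;p&gt;To make the flat layout concrete: bucket &lt;code&gt;(rowIdx, columnIdx)&lt;/code&gt; lives at index &lt;code&gt;rowIdx * columns + columnIdx&lt;/code&gt;. Here&amp;#39;s a tiny sketch of the index math (the names are just illustrative):&lt;/p&gt;

```typescript
// Row-major layout: row 0 occupies indices 0..columns-1,
// row 1 occupies columns..2*columns-1, and so on.
const rows = 2;
const columns = 5;
const buckets = new Uint32Array(rows * columns);

function globalIdx(rowIdx: number, columnIdx: number): number {
  return rowIdx * columns + columnIdx;
}

buckets[globalIdx(1, 3)]++; // bucket 3 of row 1 -> flat index 8
console.log(globalIdx(1, 3)); // → 8
```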
&lt;p&gt;To make life easier let&amp;#39;s create a little builder function:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-typescript&quot;&gt;// index.ts

// 4. Create a sketch
// ...
function createSketch({
  rows,
  columns,
}: {
  rows: number;
  columns: number;
}): Sketch {
  return { rows, columns, buckets: new Uint32Array(rows * columns) };
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If we use it, we&amp;#39;ve got ourselves a sketch!&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-typescript&quot;&gt;const sketch = createSketch({ rows: 2, columns: 5 });

console.log(&amp;#39;created: &amp;#39;, sketch);
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Our console.log shows us a nifty object!&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;created: {
  rows: 2,
  columns: 5,
  buckets: Uint32Array(10) [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ],
}
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Adding words&lt;/h2&gt;
&lt;p&gt;Alright, now for the meat and potatoes. Let&amp;#39;s implement &lt;code&gt;add&lt;/code&gt;. We want to say:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Take a word&lt;/li&gt;
&lt;li&gt;For each row, hash it and find its corresponding bucket&lt;/li&gt;
&lt;li&gt;Increment the corresponding bucket&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Here we go:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-typescript&quot;&gt;function add({ rows, columns, buckets }: Sketch, word: string) {
  for (let rowIdx = 0; rowIdx &amp;lt; rows; rowIdx++) {
    const hash = Bun.hash.xxHash3(word, BigInt(rowIdx));
    const columnIdx = Number(hash % BigInt(columns));
    const globalIdx = rowIdx * columns + columnIdx;
    buckets[globalIdx]!++;
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We go through each row. &lt;code&gt;xxHash3&lt;/code&gt; takes a seed argument. We can pass the &lt;code&gt;rowIdx&lt;/code&gt; as our seed, so for every row we produce an independent hash value!&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-typescript&quot;&gt;const hash = Bun.hash.xxHash3(word, BigInt(rowIdx));
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;columnIdx&lt;/code&gt; tells us which bucket to use inside a particular row:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-typescript&quot;&gt;const columnIdx = Number(hash % BigInt(columns));
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And &lt;code&gt;globalIdx&lt;/code&gt; accounts for the particular row that we&amp;#39;re looking at:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-typescript&quot;&gt;const globalIdx = rowIdx * columns + columnIdx;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Increment that bucket, and we&amp;#39;re done!&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-typescript&quot;&gt;buckets[globalIdx]!++;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We can try it out and see how it feels.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-typescript&quot;&gt;add(sketch, stem(&amp;#39;castle&amp;#39;));
console.log(&amp;#39;after castle&amp;#39;, sketch);
&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;after castle {
  rows: 2,
  columns: 5,
  buckets: Uint32Array(10) [ 0, 0, 0, 1, 0, 0, 1, 0, 0, 0 ],
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Suuper cool! Notice the two increments in &lt;code&gt;buckets&lt;/code&gt;, accounting for our different rows.&lt;/p&gt;
&lt;h2&gt;Getting counts&lt;/h2&gt;
&lt;p&gt;All that&amp;#39;s left is to get a count. This is going to look similar to &amp;#39;add&amp;#39;. We want to:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Take a word&lt;/li&gt;
&lt;li&gt;For each row, hash it and nab the corresponding bucket&lt;/li&gt;
&lt;li&gt;Find the minimum value from all the corresponding buckets&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Let&amp;#39;s do it:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-typescript&quot;&gt;function check({ rows, columns, buckets }: Sketch, word: string) {
  let approx = Infinity;
  for (let rowIdx = 0; rowIdx &amp;lt; rows; rowIdx++) {
    const hash = Bun.hash.xxHash3(word, BigInt(rowIdx));
    const columnIdx = Number(hash % BigInt(columns));
    const globalIdx = rowIdx * columns + columnIdx;
    approx = Math.min(approx, buckets[globalIdx]!);
  }
  return approx;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We do the same math to get our &lt;code&gt;globalIdx&lt;/code&gt; for each row as we did in &lt;code&gt;add&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;We track the minimum number we see, and we have our &lt;code&gt;check&lt;/code&gt;! Let&amp;#39;s try it out:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-typescript&quot;&gt;console.log(&amp;#39;check castle&amp;#39;, check(sketch, stem(&amp;#39;castle&amp;#39;)));
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Aaand we get our result!&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;check castle 1
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Congratulations, you&amp;#39;ve implemented a Count-Min Sketch!&lt;/p&gt;
&lt;h1&gt;Getting real&lt;/h1&gt;
&lt;p&gt;Alright, now that we have a real Count-Min Sketch, let&amp;#39;s put it to the test. We&amp;#39;ll find out approximately how many times &amp;#39;beetle&amp;#39; is used in Wodehouse&amp;#39;s texts.&lt;/p&gt;
&lt;h2&gt;Get all of Wodehouse&lt;/h2&gt;
&lt;p&gt;I went ahead and compiled all 61 novels from Project Gutenberg into one giant text file. You can go ahead and download it:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;curl https://www.instantdb.com/posts/count_min_sketch/wodehouse-full.txt \
  -o wodehouse-full.txt
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We have a &lt;code&gt;wodehouse-full.txt&lt;/code&gt; file we can play with now. Let&amp;#39;s load it up:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-typescript&quot;&gt;// index.ts
// ...
const allWodehouse = fs.readFileSync(&amp;#39;wodehouse-full.txt&amp;#39;, &amp;#39;utf-8&amp;#39;);
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Getting exact counts&lt;/h2&gt;
&lt;p&gt;We can use our &lt;code&gt;toWords&lt;/code&gt; and &lt;code&gt;countWords&lt;/code&gt; helpers to get a feel for the vocabulary:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-typescript&quot;&gt;// index.ts
const allWodehouse = fs.readFileSync(&amp;#39;wodehouse-full.txt&amp;#39;, &amp;#39;utf-8&amp;#39;);
const allWords = toWords(allWodehouse);
const allExactCounts = countWords(allWords);

console.log(&amp;#39;exact beetle&amp;#39;, allExactCounts[stem(&amp;#39;beetle&amp;#39;)]);
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If we look at &amp;quot;beetle&amp;quot;, we can see it&amp;#39;s used exactly 59 times. What would a sketch return?&lt;/p&gt;
&lt;h2&gt;Trying out sketches&lt;/h2&gt;
&lt;p&gt;Let&amp;#39;s create a sketch for our Wodehouse words:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-typescript&quot;&gt;// index.ts
// ...
const allSketch = createSketch({ rows: 5, columns: 5437 });
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And add our words:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-typescript&quot;&gt;for (const word of allWords) {
  add(allSketch, word);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now if we check out &amp;#39;beetle&amp;#39;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-typescript&quot;&gt;console.log(&amp;#39;allSketch beetle&amp;#39;, check(allSketch, stem(&amp;#39;beetle&amp;#39;)));
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We&amp;#39;ll see 78!&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;allSketch beetle 78
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A bit over, but not so bad. &lt;sup id=&quot;user-content-fnref-8&quot;&gt;&lt;a href=&quot;#user-content-fn-8&quot;&gt;[8]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;If you&amp;#39;re curious, try out different sizes and see what you get:&lt;/p&gt;
&lt;p&gt;&lt;sketch-demo demo=&quot;configurable-try-sketch&quot;&gt;&lt;/sketch-demo&gt;&lt;/p&gt;
&lt;h1&gt;A breather to celebrate&lt;/h1&gt;
&lt;p&gt;Congratulations! You just built a Count-Min Sketch from scratch, and used it on Wodehouse. If you&amp;#39;d like to see the full code example, I put this up in its entirety on &lt;a href=&quot;https://github.com/instantdb/count-min-sketch&quot; target=&quot;_blank&quot;&gt;GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Hope you had a lot of fun :).&lt;/p&gt;
&lt;p&gt;If you&amp;#39;re still curious, there&amp;#39;s more to learn here. I present to you...2 bonus sections!&lt;/p&gt;
&lt;h1&gt;Bonus 1: Probabilities&lt;/h1&gt;
&lt;p&gt;When we created our sketch for Wodehouse, we chose some seemingly random numbers: 5437 columns and 5 rows. Is there a method to this madness?&lt;/p&gt;
&lt;p&gt;Absolutely. We can use some math to help set bounds around our estimations.&lt;/p&gt;
&lt;h2&gt;Error Rate &amp;amp; Confidence&lt;/h2&gt;
&lt;p&gt;There are two numbers we can play with:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The &lt;strong&gt;errorRate&lt;/strong&gt; tells us how far off we expect our estimation to be&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;confidence&lt;/strong&gt; tells us how likely it is that we are actually within our estimation.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Let&amp;#39;s make them concrete. The full text for Wodehouse is about 3.7 million words long (not unique words, here we are counting every occurrence).&lt;/p&gt;
&lt;p&gt;Say we want an error rate of 0.05% and a 99% confidence.&lt;/p&gt;
&lt;p&gt;0.05% of 3.7 million is 1850. We are in effect saying:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&amp;quot;You can expect the estimation we give you to be overcounted by at most 1850, and we&amp;#39;ll be right 99% of the time&amp;quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;That&amp;#39;s pretty cool! How can we be certain like this?&lt;/p&gt;
&lt;h2&gt;Formulas&lt;/h2&gt;
&lt;p&gt;Turns out, you can tie the &lt;code&gt;errorRate&lt;/code&gt; and the &lt;code&gt;confidence&lt;/code&gt; to the number of &lt;code&gt;rows&lt;/code&gt; and &lt;code&gt;columns&lt;/code&gt; in a sketch! Here are the formulas:&lt;/p&gt;
&lt;p&gt;Given an &lt;code&gt;errorRate&lt;/code&gt;, get this many &lt;code&gt;columns&lt;/code&gt;:&lt;/p&gt;
&lt;p&gt;$$
columns = \frac{e}{errorRate}
$$&lt;/p&gt;
&lt;p&gt;Given a &lt;code&gt;confidence&lt;/code&gt;, get this many &lt;code&gt;rows&lt;/code&gt;:&lt;/p&gt;
&lt;p&gt;$$
rows = \ln(\frac{1}{1 - confidence})
$$&lt;/p&gt;
&lt;p&gt;Now how did we get these formulas? Let&amp;#39;s derive them.&lt;/p&gt;
&lt;h2&gt;Variables&lt;/h2&gt;
&lt;p&gt;We can start by writing out some of the numbers that we just went through.&lt;/p&gt;
&lt;p&gt;We have:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The &lt;code&gt;totalWords&lt;/code&gt;. This tells us how many occurrences have been counted in our Sketch. For Wodehouse, that&amp;#39;s 3.7M&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;errorRate&lt;/code&gt;. How far off we expect our estimation to be as a percentage of totalWords. For us it&amp;#39;s 0.05%&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;maximumOvercount&lt;/code&gt;. Our maximum allowed overestimation for a particular &lt;code&gt;totalWords&lt;/code&gt;. In our case, it&amp;#39;s 1850.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;confidence&lt;/code&gt;. This tells us how likely we are to be within our estimation. We want 99%.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;And our sketch has two properties that we can influence:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The &lt;code&gt;columns&lt;/code&gt;. This is the number of buckets in one row. We &lt;em&gt;somehow&lt;/em&gt; picked 5,437 for our Wodehouse sketch.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;rows&lt;/code&gt;. This is the number of hash functions in our sketch. We &lt;em&gt;somehow&lt;/em&gt; picked 5 rows for our Wodehouse sketch.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;Goal&lt;/h2&gt;
&lt;p&gt;Our goal is to relate &lt;code&gt;errorRate&lt;/code&gt; and &lt;code&gt;confidence&lt;/code&gt; to a specific number of &lt;code&gt;columns&lt;/code&gt; and &lt;code&gt;rows&lt;/code&gt;.&lt;/p&gt;
&lt;h2&gt;Tying errorRate to columns&lt;/h2&gt;
&lt;p&gt;To build our intuition let&amp;#39;s consider a sketch with only 1 row:&lt;/p&gt;
&lt;p&gt;&lt;sketch-demo demo=&quot;single-row-buckets&quot;&gt;&lt;/sketch-demo&gt;&lt;/p&gt;
&lt;p&gt;Say we ask for a count of a word (&amp;#39;wet&amp;#39;). Our hash function will direct us to a bucket. What would we see if we looked into that bucket?&lt;/p&gt;
&lt;p&gt;&lt;sketch-demo demo=&quot;bucket-noise-breakdown&quot;&gt;&lt;/sketch-demo&gt;&lt;/p&gt;
&lt;p&gt;Well it would be composed of the &amp;quot;actual number of times&amp;quot; &amp;#39;wet&amp;#39; was used, and the noise that comes from all the other collisions that hit our bucket.&lt;/p&gt;
&lt;p&gt;If we write this out:&lt;/p&gt;
&lt;p&gt;$$
bucket_{word} = actualCount_{word} + noise_{word}
$$&lt;/p&gt;
&lt;h3&gt;Expected Noise&lt;/h3&gt;
&lt;p&gt;Now here&amp;#39;s a question: what is the expected value &lt;sup id=&quot;user-content-fnref-11&quot;&gt;&lt;a href=&quot;#user-content-fn-11&quot;&gt;[11]&lt;/a&gt;&lt;/sup&gt; of our noise for a word?&lt;/p&gt;
&lt;p&gt;The first thing we can remember is that our hash function distributes words uniformly across columns. This means that each word has a $\frac{1}{columns}$ chance of hitting our particular bucket.&lt;/p&gt;
&lt;p&gt;So if we write out our expectation, it would be:&lt;/p&gt;
&lt;p&gt;$$
expectedNoise_{word} = \frac{totalWords - actualCount_{word}}{columns}
$$&lt;/p&gt;
&lt;h3&gt;Simplifying Noise&lt;/h3&gt;
&lt;p&gt;If you think about it, do we really &lt;em&gt;need&lt;/em&gt; to subtract the $actualCount_{word}$? We can simplify this formula by getting more conservative about what we promise.&lt;/p&gt;
&lt;p&gt;We can bound ourselves to the worst case scenario, where we ask for a word that hasn&amp;#39;t been seen before:&lt;/p&gt;
&lt;p&gt;$$
expectedNoise_{word} \le \frac{totalWords}{columns}
$$&lt;/p&gt;
&lt;p&gt;Pretty cool. Now we have a simple relation for our expected noise!&lt;/p&gt;
&lt;h3&gt;Help from Markov&lt;/h3&gt;
&lt;p&gt;But an expected value for noise isn&amp;#39;t useful yet. It just gives us an average. What we want is the &lt;em&gt;probability&lt;/em&gt; that something is below a &lt;code&gt;maximumOvercount&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;That&amp;#39;s where &lt;strong&gt;Markov&amp;#39;s Inequality&lt;/strong&gt; &lt;sup id=&quot;user-content-fnref-9&quot;&gt;&lt;a href=&quot;#user-content-fn-9&quot;&gt;[9]&lt;/a&gt;&lt;/sup&gt; comes in. Markov&amp;#39;s Inequality is a proof about random variables that says:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;For any non-negative random variable, the probability that something is at least $n$ times its expected value is at most $\frac{1}{n}$.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;To get concrete, if we plug in $n = e$ &lt;sup id=&quot;user-content-fnref-14&quot;&gt;&lt;a href=&quot;#user-content-fn-14&quot;&gt;[14]&lt;/a&gt;&lt;/sup&gt; to Markov&amp;#39;s Inequality, we get:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The probability that something is at least $e$ times its expected value is at most $\frac{1}{e}$.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Well, our noise is a non-negative random variable &lt;sup id=&quot;user-content-fnref-12&quot;&gt;&lt;a href=&quot;#user-content-fn-12&quot;&gt;[12]&lt;/a&gt;&lt;/sup&gt;. And we have its expected value. If we use Markov&amp;#39;s Inequality we&amp;#39;ll get a real probability that we can use!&lt;/p&gt;
&lt;p&gt;$$
P(\text{Noise} \ge e \times expectedNoise_{word}) \le \frac{1}{e}
$$&lt;/p&gt;
&lt;h3&gt;A maximumOvercount with about 63% confidence&lt;/h3&gt;
&lt;p&gt;Let&amp;#39;s look at that probability a bit more:&lt;/p&gt;
&lt;p&gt;$$
P(\text{Noise} \ge e \times expectedNoise_{word}) \le \frac{1}{e}
$$&lt;/p&gt;
&lt;p&gt;Let&amp;#39;s get its complement:&lt;/p&gt;
&lt;p&gt;$$
P(\text{Noise} \le e \times expectedNoise_{word}) \ge 1 - \frac{1}{e}
$$&lt;/p&gt;
&lt;p&gt;And to make things more concrete, $1 - \frac{1}{e}$ is about 0.63.&lt;/p&gt;
&lt;p&gt;What is it saying then? Let&amp;#39;s write it out in English:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&amp;quot;The probability that noise is at most e times expectedNoise is at least ~63%&amp;quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;If you squint, we are talking about &lt;code&gt;maximumOvercount&lt;/code&gt; with about 63% confidence!&lt;/p&gt;
&lt;p&gt;If we set &lt;code&gt;maximumOvercount&lt;/code&gt; to $e \times expectedNoise$, we can say with $1 - \frac{1}{e}$ confidence that our estimation will be within our bounds!&lt;/p&gt;
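&lt;p&gt;We can sanity-check this with our Wodehouse numbers: 3.7 million total words, and the 5,437 columns from our sketch:&lt;/p&gt;

```typescript
// Plugging the Wodehouse numbers into the bound
const totalWords = 3_700_000;
const columns = 5437;

// Worst-case expected noise in a single bucket
const expectedNoise = totalWords / columns; // ≈ 680.5

// maximumOvercount = e * expectedNoise, with ~63% confidence
const maximumOvercount = Math.E * expectedNoise; // ≈ 1849.9

console.log(expectedNoise.toFixed(1), maximumOvercount.toFixed(1));
```

&lt;p&gt;Which lines up with the 1850 overcount we promised earlier.&lt;/p&gt;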
&lt;h3&gt;An errorRate with about 37% confidence&lt;/h3&gt;
&lt;p&gt;Now that we have a probability that uses &lt;code&gt;maximumOvercount&lt;/code&gt;, let&amp;#39;s start tying things back to &lt;code&gt;errorRate&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;We said before:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;You can expect the estimation we give you to be overcounted by at most 1850&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Translated to a formula, this was:&lt;/p&gt;
&lt;p&gt;$$
3.7 \text{ million} \times 0.05\% \le 1850
$$&lt;/p&gt;
&lt;p&gt;If we use variables:&lt;/p&gt;
&lt;p&gt;$$
totalWords \times errorRate \le maximumOvercount;
$$&lt;/p&gt;
&lt;p&gt;Now let&amp;#39;s start expanding &lt;code&gt;maximumOvercount&lt;/code&gt;, and see where we get:&lt;/p&gt;
&lt;p&gt;$$
totalWords \times errorRate \le e \times expectedNoise;
$$&lt;/p&gt;
&lt;p&gt;And since we know &lt;code&gt;expectedNoise&lt;/code&gt;:&lt;/p&gt;
&lt;p&gt;$$
totalWords \times errorRate \le \frac{e \times totalWords}{columns}
$$&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;We&amp;#39;ve just tied errorRate and columns together!&lt;/strong&gt; Let&amp;#39;s keep going:&lt;/p&gt;
&lt;p&gt;$$
errorRate \le \frac{e}{columns}
{} \
{} \
columns \ge \frac{e}{errorRate}
$$&lt;/p&gt;
&lt;p&gt;Voila! We&amp;#39;ve gotten a formula for columns.&lt;/p&gt;
&lt;h3&gt;A solution for 1 row&lt;/h3&gt;
&lt;p&gt;If our goal was to get a particular error rate with about 63% confidence, we could just set:&lt;/p&gt;
&lt;p&gt;$$
columns = \frac{e}{errorRate}
{} \
{} \
rows = 1
$$&lt;/p&gt;
&lt;p&gt;But 63% confidence kind of sucks. How can we improve that?&lt;/p&gt;
&lt;h2&gt;Tying confidence to rows&lt;/h2&gt;
&lt;p&gt;Let&amp;#39;s remember our initial Markov Inequality:&lt;/p&gt;
&lt;p&gt;$$
P(\text{Noise} \ge e \times expectedNoise_{word}) \le \frac{1}{e}
$$&lt;/p&gt;
&lt;h3&gt;All bad rows&lt;/h3&gt;
&lt;p&gt;When &lt;code&gt;Noise &amp;gt; maximumOvercount&lt;/code&gt;, it basically means that our estimation has failed.&lt;/p&gt;
&lt;p&gt;We&amp;#39;ve gotten a &amp;quot;bad row&amp;quot;, where the bucket has highly frequent words in it. In this case we can paraphrase our probability to:&lt;/p&gt;
&lt;p&gt;$$
P(\text{1 row is bad}) \le \frac{1}{e}
$$&lt;/p&gt;
&lt;p&gt;Now what happens if we add more rows? What is the chance that &lt;em&gt;2&lt;/em&gt; rows are bad?&lt;/p&gt;
&lt;p&gt;Since our hash functions are independent, we know that our probabilities will be too. This means:&lt;/p&gt;
&lt;p&gt;$$
P(\text{2 rows are bad}) \le \left(\frac{1}{e}\right)^{2}
$$&lt;/p&gt;
&lt;p&gt;Which generalizes. Given some number of rows, what is the probability that &lt;em&gt;all&lt;/em&gt; rows are bad?&lt;/p&gt;
&lt;p&gt;$$
P(\text{all rows are bad}) \le \left(\frac{1}{e}\right)^{rows}
$$&lt;/p&gt;
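&lt;p&gt;Just to see how quickly this shrinks, we can plug a few row counts into the bound:&lt;/p&gt;

```typescript
// P(all rows are bad) <= (1/e)^rows = e^(-rows)
for (const rows of [1, 2, 3, 5]) {
  const allBadBound = Math.exp(-rows);
  console.log(rows, allBadBound.toFixed(4));
}
// 5 rows gives a bound of about 0.0067: under a 1% chance
// that every single row fails us.
```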
&lt;p&gt;And now that we know the formula for &amp;quot;all rows are bad&amp;quot;, we actually &lt;em&gt;also&lt;/em&gt; know the formula for confidence.&lt;/p&gt;
&lt;h3&gt;Confidence&lt;/h3&gt;
&lt;p&gt;As long as we get 1 good row, we know that we&amp;#39;ll return a number within our estimation. In that case we can say our confidence is:&lt;/p&gt;
&lt;p&gt;$$
confidence = P(\text{at least 1 good row})
$$&lt;/p&gt;
&lt;p&gt;So what&amp;#39;s the probability of &lt;em&gt;at least&lt;/em&gt; 1 good row? It&amp;#39;s the complement of getting all bad rows:&lt;/p&gt;
&lt;p&gt;$$
P(\text{at least 1 good row}) = 1 - P(\text{all rows are bad})
$$&lt;/p&gt;
&lt;p&gt;Which gets us:&lt;/p&gt;
&lt;p&gt;$$
confidence = 1 - P(\text{all rows are bad})
$$&lt;/p&gt;
&lt;h3&gt;Expanding things out&lt;/h3&gt;
&lt;p&gt;Since we know $P(\text{all rows are bad})$, let&amp;#39;s expand it:&lt;/p&gt;
&lt;p&gt;$$
confidence = 1 - \left(\frac{1}{e}\right)^{rows}
$$&lt;/p&gt;
&lt;p&gt;Aand we&amp;#39;ve just connected &lt;code&gt;confidence&lt;/code&gt; to &lt;code&gt;rows&lt;/code&gt;! Let&amp;#39;s keep going.&lt;/p&gt;
&lt;p&gt;Isolate the term for rows:&lt;/p&gt;
&lt;p&gt;$$
\left(\frac{1}{e}\right)^{rows} = 1 - confidence
$$&lt;/p&gt;
&lt;p&gt;Remember, $\left(\frac{1}{e}\right)^{rows}$ is the same as $e^{-rows}$:&lt;/p&gt;
&lt;p&gt;$$
e^{-rows} = 1 - confidence
$$&lt;/p&gt;
&lt;p&gt;Take the natural log of both sides:&lt;/p&gt;
&lt;p&gt;$$
\ln(e^{-rows}) = \ln(1 - confidence)
$$&lt;/p&gt;
&lt;p&gt;Simplify the left side:&lt;/p&gt;
&lt;p&gt;$$
-rows = \ln(1 - confidence)
\ {}
rows = -\ln(1 - confidence)
$$&lt;/p&gt;
&lt;p&gt;Push the &lt;code&gt;-&lt;/code&gt; inside the &lt;code&gt;ln&lt;/code&gt;:&lt;/p&gt;
&lt;p&gt;$$
rows = \ln(\frac{1}{1 - confidence})
$$&lt;/p&gt;
&lt;p&gt;And we&amp;#39;ve gotten our formula for &lt;code&gt;rows&lt;/code&gt;!&lt;/p&gt;
&lt;h2&gt;Formulas to Code&lt;/h2&gt;
&lt;p&gt;Now we have formulas for &lt;em&gt;both&lt;/em&gt; &lt;code&gt;columns&lt;/code&gt; and &lt;code&gt;rows&lt;/code&gt;!&lt;/p&gt;
&lt;p&gt;$$
columns = \frac{e}{errorRate}
{} \
{} \
rows = \ln(\frac{1}{1 - confidence})
$$&lt;/p&gt;
&lt;p&gt;So if we wanted an error rate of 0.05% and a confidence of 99%, how many rows and columns would we need? Let&amp;#39;s calculate it in TypeScript:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-typescript&quot;&gt;function sketchWithBounds({
  errorRate,
  confidence,
}: {
  errorRate: number;
  confidence: number;
}): Sketch {
  const columns = Math.ceil(Math.E / errorRate);
  const rows = Math.ceil(Math.log(1 / (1 - confidence)));
  return createSketch({ rows, columns });
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We try it out:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-typescript&quot;&gt;const withBounds = sketchWithBounds({
  errorRate: 0.0005,
  confidence: 0.99,
});

console.log(&amp;#39;withBounds&amp;#39;, withBounds.columns, withBounds.rows);
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And we got 5437 columns and 5 rows!&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;withBounds 5437 5
&lt;/code&gt;&lt;/pre&gt;
&lt;h1&gt;Bonus 2: PNGs&lt;/h1&gt;
&lt;p&gt;Now, you may have wondered, how did we create our cool PNG? For posterity I thought I&amp;#39;d write out the algorithm.&lt;/p&gt;
&lt;p&gt;Let&amp;#39;s start off by installing a library to create PNGs:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;bun add pngjs
bun add -D @types/pngjs
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now, we&amp;#39;ll take a series of bytes. One pixel can be expressed as &lt;code&gt;R&lt;/code&gt; &lt;code&gt;G&lt;/code&gt; &lt;code&gt;B&lt;/code&gt; &lt;code&gt;A&lt;/code&gt; values, each one byte. So we can fit 4 bytes per pixel. Here&amp;#39;s a quick function to do that:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-typescript&quot;&gt;import { PNG } from &amp;#39;pngjs&amp;#39;;

function createPNG({
  width,
  buffer,
}: {
  width: number;
  buffer: Buffer;
}): Buffer {
  const bytesPerPixel = 4; // RGBA
  const height = Math.ceil(buffer.length / (width * bytesPerPixel));
  const png = new PNG({
    width,
    height,
    colorType: 6, // RGBA
  });

  for (let i = 0; i &amp;lt; png.data.length; i++) {
    png.data[i] = buffer[i] ?? 0;
  }

  return PNG.sync.write(png);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;A PNG for our Sketch&lt;/h2&gt;
&lt;p&gt;Let&amp;#39;s pick up our &lt;code&gt;allSketch&lt;/code&gt; we created before, and save it as a PNG:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-typescript&quot;&gt;const compressedSketch = await Bun.zstdCompress(allSketch.buckets);

fs.writeFileSync(
  &amp;#39;compressedSketch.png&amp;#39;,
  createPNG({ width: 150, buffer: compressedSketch }),
);
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Aand we get our image!&lt;/p&gt;
&lt;div class=&quot;flex justify-center&quot;&gt;
  &lt;img class=&quot;m-0&quot; src=&quot;/posts/count_min_sketch/compressedSketch.png&quot; /&gt;
&lt;/div&gt;

&lt;p&gt;But you may wonder, how would it look if we saved the exact counts?&lt;/p&gt;
&lt;h2&gt;A PNG for our exact counts&lt;/h2&gt;
&lt;p&gt;Let&amp;#39;s try that. We can pick up our &lt;code&gt;allExactCounts&lt;/code&gt; &lt;sup id=&quot;user-content-fnref-13&quot;&gt;&lt;a href=&quot;#user-content-fn-13&quot;&gt;[13]&lt;/a&gt;&lt;/sup&gt;, and save it as a PNG too:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-typescript&quot;&gt;const compressedExactCounts = await Bun.zstdCompress(
  JSON.stringify(allExactCounts),
);

fs.writeFileSync(
  &amp;#39;compressedExactCounts.png&amp;#39;,
  createPNG({ width: 150, buffer: compressedExactCounts }),
);
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Load it up, and we see:&lt;/p&gt;
&lt;div class=&quot;flex justify-center&quot;&gt;
  &lt;img class=&quot;m-0&quot; src=&quot;/posts/count_min_sketch/compressedExactCounts.png&quot; /&gt;
&lt;/div&gt;

&lt;p&gt;Let&amp;#39;s see them side by side:&lt;/p&gt;
&lt;div class=&quot;flex items-start justify-center space-x-2&quot;&gt;
  &lt;div&gt;
    &lt;h3 class=&quot;text-center&quot;&gt;Sketch&lt;/h3&gt;
    &lt;img class=&quot;m-0 ml-2&quot; src=&quot;/posts/count_min_sketch/compressedSketch.png&quot; /&gt;
  &lt;/div&gt;
  &lt;div&gt;
    &lt;h3&gt;Exact Counts&lt;/h3&gt;
    &lt;img class=&quot;m-0&quot; src=&quot;/posts/count_min_sketch/compressedExactCounts.png&quot; /&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;h1&gt;Fin&lt;/h1&gt;
&lt;p&gt;Congratulations, you made it all the way through the bonus too!&lt;/p&gt;
&lt;p&gt;If you&#039;re into this stuff, I&#039;d suggest reading &lt;a href=&quot;http://dimacs.rutgers.edu/~graham/ssbd.html&quot; target=&quot;_blank&quot;&gt;Small Summaries for Big Data&lt;/a&gt;. It goes over the Count-Min Sketch, as well as a bunch of other probabilistic data structures. Plus, one of the co-authors invented the Count-Min Sketch!
&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Thanks to Joe Averbukh, Daniel Woelfel, Predrag Gruevski, Irakli Safareli, Nicole Garcia Fischer, Irakli Popkhadze, Mark Shlick, Ilan Tzitrin, and Drew Harris for reviewing drafts of this post&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;sup id=&quot;user-content-fnref-1&quot;&gt;&lt;a href=&quot;#user-content-fn-1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;: A sync engine you can try &lt;a href=&quot;/tutorial&quot;&gt;without even signing up&lt;/a&gt;!&lt;/p&gt;
&lt;p&gt;&lt;sup id=&quot;user-content-fnref-3&quot;&gt;&lt;a href=&quot;#user-content-fn-3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;: See this &lt;a href=&quot;https://www.usenix.org/legacy/event/hotsec10/tech/full_papers/Schechter.pdf&quot;&gt;interesting paper&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;sup id=&quot;user-content-fnref-4&quot;&gt;&lt;a href=&quot;#user-content-fn-4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;: I &lt;em&gt;think&lt;/em&gt; &lt;a href=&quot;https://web.archive.org/web/20170707141519/https://skillsmatter.com/skillscasts/6844-count-min-sketch-in-real-data-applications&quot;&gt;X&lt;/a&gt; is doing this, though I am not sure if it&amp;#39;s still the case.&lt;/p&gt;
&lt;p&gt;&lt;sup id=&quot;user-content-fnref-5&quot;&gt;&lt;a href=&quot;#user-content-fn-5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;: For the curious, some of the code behind this lives &lt;a href=&quot;https://github.com/instantdb/instant/blob/main/server/src/instant/db/datalog.clj#L1349&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;user-content-fn-6&quot; href=&quot;#user-content-fnref-6&quot;&gt;[6]&lt;/a&gt;  Bun&amp;#39;s standard library comes with a bunch of cool hashing and compression functions, so we won&amp;#39;t have to install extra packages to get our algorithms working:&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;user-content-fn-7&quot; href=&quot;#user-content-fnref-7&quot;&gt;[7]&lt;/a&gt;  If we used a 2D array, each subarray would live in a separate place in memory. When we iterate, the CPU would have to jump around different places in memory, which would make its cache less useful.&lt;/p&gt;
&lt;p&gt;&lt;sup id=&quot;user-content-fnref-8&quot;&gt;&lt;a href=&quot;#user-content-fn-8&quot;&gt;[8]&lt;/a&gt;&lt;/sup&gt;: You may be wondering: can we improve the error rate even more? Yes. One idea: &lt;a href=&quot;https://en.wikipedia.org/wiki/Count%E2%80%93min_sketch#Reducing_bias_and_error&quot;&gt;conservative updating&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;sup id=&quot;user-content-fnref-9&quot;&gt;&lt;a href=&quot;#user-content-fn-9&quot;&gt;[9]&lt;/a&gt;&lt;/sup&gt;: This is a great &lt;a href=&quot;https://www.youtube.com/watch?v=onZSWfbTeho&quot;&gt;explainer&lt;/a&gt; on Markov&amp;#39;s inequality.&lt;/p&gt;
&lt;p&gt;&lt;sup id=&quot;user-content-fnref-10&quot;&gt;&lt;a href=&quot;#user-content-fn-10&quot;&gt;[10]&lt;/a&gt;&lt;/sup&gt;: Why do we pick the minimum value across rows? Well, when we added a word, we incremented the corresponding bucket in &lt;em&gt;every&lt;/em&gt; row. This means we know that at the minimum, a corresponding bucket will record the true count of our word. If some rows show a larger count, it&amp;#39;s because other words have collided and influenced the counts there.&lt;/p&gt;
&lt;p&gt;&lt;sup id=&quot;user-content-fnref-11&quot;&gt;&lt;a href=&quot;#user-content-fn-11&quot;&gt;[11]&lt;/a&gt;&lt;/sup&gt;: Intuitively, an Expected Value is a weighted average. This &lt;a href=&quot;https://www.youtube.com/watch?v=CBgCR1kHSUI&quot;&gt;video&lt;/a&gt; explains it well.&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;user-content-fn-12&quot; href=&quot;#user-content-fnref-12&quot;&gt;[12]&lt;/a&gt;  It&amp;#39;s non-negative because we only ever increment buckets.&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;user-content-fn-13&quot; href=&quot;#user-content-fnref-13&quot;&gt;[13]&lt;/a&gt;  You may wonder, is JSON stringify an efficient way to serialize it? At a glance it feels like it isn&amp;#39;t. But I ran a few tests with protobufs and msgpack, only to find out that JSON.stringify + zstd was more efficient. My guess is because zstd does a great job compressing the repetition in the JSON.&lt;/p&gt;
&lt;p&gt;&lt;sup id=&quot;user-content-fnref-14&quot;&gt;&lt;a href=&quot;#user-content-fn-14&quot;&gt;[14]&lt;/a&gt;&lt;/sup&gt;: The &lt;a href=&quot;https://dsf.berkeley.edu/cs286/papers/countmin-latin2004.pdf&quot;&gt;original paper&lt;/a&gt; chose to pick $e$, because it minimizes the number of buckets needed for a particular error rate and confidence. We could have picked any number here though, and we&amp;#39;d still be able to go through the proof.&lt;/p&gt;
</description>
      <pubDate>Mon, 13 Oct 2025 00:00:00 GMT</pubDate>
      <author>Stepan Parunashvili</author>
    </item>
    <item>
      <title>Video: Founding Firebase with James Tamplin</title>
      <link>https://instantdb.com/essays/founding_firebase</link>
      <guid isPermaLink="true">https://instantdb.com/essays/founding_firebase</guid>
      <description>&lt;p&gt;&lt;div class=&quot;md-video-container&quot;&gt;
  &lt;iframe
    width=&quot;100%&quot;
    src=&quot;https://www.youtube.com/embed/re64AhYrYBY?rel=0&amp;modestbranding=1&amp;playsinline=1&amp;autoplay=0&amp;cc_load_policy=1&quot;
    title=&quot;Founding Firebase with James Tamplin&quot;
    allow=&quot;autoplay; picture-in-picture&quot;
    allowfullscreen
  &gt;&lt;/iframe&gt;
&lt;/div&gt;&lt;/p&gt;
&lt;p&gt;Many of us have built on top of Firebase, but what was it like to build Firebase itself? We sat down with James Tamplin, the founder of Firebase, to go over the early days. We recorded a video of the interview. We hope you enjoy it!&lt;/p&gt;
</description>
      <pubDate>Mon, 29 Sep 2025 00:00:00 GMT</pubDate>
      <author>Stepan Parunashvili</author>
    </item>
    <item>
      <title>HeroUI helps people build beautiful apps, on top of Instant</title>
      <link>https://instantdb.com/essays/heroui</link>
      <guid isPermaLink="true">https://instantdb.com/essays/heroui</guid>
      <description>&lt;p&gt;&lt;em&gt;This is part of a series of posts about the people who power their startups with Instant. Stories like Junior’s are what drive us to keep working on making the best tool for builders.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Junior Garcia is on a mission to help everyone make beautiful apps with &lt;a href=&quot;https://www.heroui.com/&quot; target=&quot;_blank&quot;&gt;HeroUI&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Starting off in Venezuela, Junior built HeroUI components — a suite of primitives that have helped thousands of developers build frontends. From there Junior was accepted to Y Combinator and launched HeroUI Chat, an AI-powered tool that helps everyone build beautiful apps.&lt;/p&gt;
&lt;p&gt;In this post we’ll share his backstory, the lessons learned, and what’s ahead!&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/posts/heroui/junior.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;h1&gt;From Venezuela&lt;/h1&gt;
&lt;p&gt;We start off in Valencia, Venezuela. Junior was studying electrical engineering at University, when he learned about microchips and how he could program them. He got hooked with an idea: how cool would it be to write a program that could get a microchip to make millions of calculations a second? It felt like a superpower.&lt;/p&gt;
&lt;p&gt;He just had one problem.&lt;/p&gt;
&lt;h1&gt;To Paper&lt;/h1&gt;
&lt;p&gt;For the first 19 years of his life, Junior didn’t have a computer or an internet connection. He would have to be creative about how he learned to program.&lt;/p&gt;
&lt;p&gt;So, Junior got creative. He went to his University’s library and picked up a book on Java. He would go through each chapter and write out his solutions on paper. Then he would visit his girlfriend’s house, where he could borrow her computer and run his solutions. (You may be thinking, that’s a great girlfriend. Well, soon she would become Junior’s wife!)&lt;/p&gt;
&lt;p&gt;As Junior finished his book on Java, he got good enough to land a job at a tech company writing it. He dropped out of University and went into programming full time.&lt;/p&gt;
&lt;h1&gt;To Customizing Java&lt;/h1&gt;
&lt;p&gt;When Junior started building, he realized that software written in Java didn’t tend to look so good. Java has its own renderer — different from what’s native on Windows and macOS. This meant the software often came off as foreign and outdated.&lt;/p&gt;
&lt;p&gt;Bad UIs bothered Junior a lot more than many of his peers. In order to make Java programs beautiful, he would go through trouble after trouble. In those days it meant doing a bunch of black magic with PNGs to make Java programs feel natural. But once Junior went through the trouble he saw how users reacted, and he knew it was worth it.&lt;/p&gt;
&lt;h1&gt;To discovering a love of Design&lt;/h1&gt;
&lt;p&gt;Junior felt it in his bones that when you make apps beautiful, it’s not just about cosmetics. It’s about building something accessible and intuitive. People use intuitive software differently and it has a meaningful effect on their lives. After all, many of us use software for hours a day.&lt;/p&gt;
&lt;p&gt;This is when Junior realized that programming wasn’t his only passion. He loved design too.&lt;/p&gt;
&lt;p&gt;After writing apps in Java, he moved onto web technologies. Soon Junior started building a side project: he wanted to make it easy for developers to build portfolio sites.&lt;/p&gt;
&lt;h1&gt;To 25,000 stars on GitHub&lt;/h1&gt;
&lt;p&gt;To help developers build portfolio sites, Junior knew he’d need to use a series of shareable components. He couldn’t find anything that fit his needs, so he started to build them from scratch.&lt;/p&gt;
&lt;p&gt;Junior built component after component. He made sure that every primitive was accessible, came with animations, and all the best practices that delight users. Soon Junior realized that he was building a full component system.&lt;/p&gt;
&lt;p&gt;In early 2021, Junior packaged everything together and released HeroUI components (formerly NextUI). He was floored by the reception.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/posts/heroui/stars.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;Developers loved HeroUI components. &lt;strong&gt;Within a year, it hit 3,000 stars. Within 2 years, 9,000 stars. Today, over 25,000 stars.&lt;/strong&gt;&lt;/p&gt;
&lt;h1&gt;To the first check&lt;/h1&gt;
&lt;p&gt;GitHub stars grew, but it didn’t mean the journey was easy.&lt;/p&gt;
&lt;p&gt;Junior had a full-time job, which meant he did all of this work on nights and weekends. When he had doubts, Junior had his wife’s belief to fall back on. He would get energized and then focus on users.&lt;/p&gt;
&lt;p&gt;One user reached out and surprised Junior. It turns out Vicente Zavarce was a happy HeroUI user. He was also the founder of Yummy, one of Latin America’s biggest startups.&lt;/p&gt;
&lt;p&gt;Vicente wanted to meet the team behind the components and scheduled a call. He was expecting to see a large group, but he found just Junior.&lt;/p&gt;
&lt;p&gt;Vicente was so impressed that he offered to make an investment. That kicked off a pre-seed round and HeroUI became a company. Junior could now focus full time on making it easy to create beautiful apps.&lt;/p&gt;
&lt;h1&gt;To YC S24&lt;/h1&gt;
&lt;p&gt;What followed was a flurry of work. HeroUI kept getting better. Junior and his team launched support for Tailwind and started to work on a series of pro components.&lt;/p&gt;
&lt;p&gt;When the team was building a checkout page for pro components, they released a secret URL to dogfood it. Users were so excited that they actually found the link and started buying.&lt;/p&gt;
&lt;p&gt;At this point Junior knew he had to grow the team. So he applied to Y Combinator.&lt;/p&gt;
&lt;p&gt;Junior had no expectations, but YC saw the potential in him and in HeroUI. The YC partners knew how hard it is to build a startup as a solo founder. But after meeting Junior, they were convinced he could do it, and HeroUI joined YC S24.&lt;/p&gt;
&lt;h1&gt;To HeroUI Chat&lt;/h1&gt;
&lt;p&gt;Being surrounded by such talented people, Junior was invigorated and kept making HeroUI better. At this point he was able to hire some of the best HeroUI contributors full-time, and they were full-steam ahead.&lt;/p&gt;
&lt;p&gt;During YC they started off by building a tool to help companies with design systems. They built the product, but the more they talked to users, the more they realized they actually wanted something else.&lt;/p&gt;
&lt;p&gt;Users wanted help building their UIs. At this point Claude 3 Sonnet had come out. That’s when Junior thought, &lt;strong&gt;what if you could use AI to help you create truly beautiful UIs?&lt;/strong&gt; That got Junior and the team excited. So they started building &lt;a target=&quot;_blank&quot; href=&quot;https://heroui.chat/&quot;&gt;HeroUI Chat&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Optimizing for speed&lt;/h2&gt;
&lt;p&gt;The first step was to decide how to build it. If you’re making an app that helps users make beautiful apps, you better make sure the app itself is beautiful.&lt;/p&gt;
&lt;p&gt;Junior was confident in the UI. He wanted to make sure the backend felt great too:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I was obsessed with speed. I wanted HeroUI chat to be really fast. Creating a new conversation, modifying a title, every tiny detail should feel fast.&lt;/p&gt;
&lt;p&gt;Junior Garcia&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;They did an investigation, and they listed out what they needed to make apps feel fast. They would have to leverage the browser and work with IndexedDB. They would have to add caches and build out a suite of optimistic updates.&lt;/p&gt;
&lt;p&gt;Most of the solutions they found were too constraining: either they were full frameworks, or they were exceptionally difficult to use. So they decided they would build a sync engine from scratch. Until they found Instant.&lt;/p&gt;
&lt;h2&gt;Finding Instant: A fast &lt;em&gt;and&lt;/em&gt; realtime MVP in 2 days&lt;/h2&gt;
&lt;p&gt;Junior was scrolling Bookface, when he saw a post about Instant’s infra:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Not only did you have the optimistic updates I was looking for, but you had the real-time updates. You handled collisions too. Basically everything we were worried about.&lt;/p&gt;
&lt;p&gt;Junior Garcia&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Instant looked like an exact fit, so they decided to give it a try. &lt;strong&gt;Within 2 days, HeroUI had a full MVP on Instant.&lt;/strong&gt; When we asked Junior how he thought about Instant after that, he answered:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;At that point I did not want to use any database other than Instant&lt;/p&gt;
&lt;p&gt;Junior Garcia&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Junior and the team had used Firebase before, and knew how difficult it was to build apps when you don’t have relations. They were very happy with Instant’s relational query engine.&lt;/p&gt;
&lt;p&gt;And the optimistic updates paid off too. Many HeroUI users (and employees) lived across continents. Instant’s local caches made everything feel fast, without Junior and the team having to worry about setting up replicas across the globe.&lt;/p&gt;
&lt;h2&gt;Hitting #1 on Product Hunt&lt;/h2&gt;
&lt;p&gt;With the right infra in place, they could focus on their product, and they made a tool that was truly useful. They &lt;a href=&quot;https://www.producthunt.com/products/heroui-chat/launches/heroui-chat&quot; target=&quot;_blank&quot;&gt;launched&lt;/a&gt; on Product Hunt, and hit #1 for the day.&lt;/p&gt;
&lt;p&gt;&lt;div class=&quot;md-video-container&quot;&gt;
  &lt;iframe
    width=&quot;100%&quot;
    src=&quot;https://www.youtube.com/embed/rRT9lZfJjR0?rel=0&amp;modestbranding=1&amp;playsinline=1&amp;autoplay=0&amp;cc_load_policy=1&quot;
    title=&quot;HeroUI Demo&quot;
    allow=&quot;autoplay; picture-in-picture&quot;
    allowfullscreen
  &gt;&lt;/iframe&gt;
&lt;/div&gt;&lt;/p&gt;
&lt;p&gt;Users could build delightful UIs. Since everything was on top of hand-crafted components, AIs could focus on writing simpler code, which was easier for humans to maintain. And whenever the AIs made a mistake, Junior and the team would jump in and fix it in the platform.&lt;/p&gt;
&lt;p&gt;This flywheel of improvement kept on going and making HeroUI better.&lt;/p&gt;
&lt;h2&gt;The productivity benefits of real-time sync&lt;/h2&gt;
&lt;p&gt;Instant made it easy to build the MVP, but the HeroUI team found that Instant helped them scale too. The biggest lever came from the client-side abstraction.&lt;/p&gt;
&lt;p&gt;In traditional apps every feature requires (a) a frontend change, (b) an update to the store, (c) an update to endpoints, and (d) an update to the database. With Instant, all of this compresses into one change. This made the codebase easier for engineers to onboard to, and features simpler to implement.&lt;/p&gt;
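&lt;p&gt;As a rough sketch of that compression (the entity and variable names here are our own illustration, not HeroUI’s actual code), a single call to Instant’s client SDK covers all four layers at once:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;// Renaming a conversation: this one transact applies the change
// optimistically in the local UI, persists it to the backend, and
// syncs it in real time to every other subscribed client.
db.transact(db.tx.conversations[conversationId].update({ title: newTitle }));

// Any component reading a reactive query re-renders automatically —
// no endpoint, store update, or rollback handling to write by hand.
const { data } = db.useQuery({ conversations: {} });
&lt;/code&gt;&lt;/pre&gt;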
&lt;p&gt;When we asked Junior what he would miss the most if he couldn’t use Instant anymore, this abstraction was what he mentioned:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I think I couldn&amp;#39;t deal with the frontend-backend request schlep anymore. Having to call an endpoint, send data, receive data, update the UI, handle the update, handle the rollback. Instant just does this automatically. You don&amp;#39;t have to communicate to the backend, or listen to changes. Losing this would make our lives so much harder on Hero.&lt;/p&gt;
&lt;p&gt;Junior Garcia&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h1&gt;From UIs to full apps&lt;/h1&gt;
&lt;p&gt;It’s been 5 years of work, but Junior and the team are just getting started. Today developers, business owners, and big companies use HeroUI to build full frontends. And HeroUI keeps getting better, faster.&lt;/p&gt;
&lt;p&gt;Soon you’ll be able to build full-stack web apps and mobile apps, with the same design system across platforms. HeroUI keeps marching towards the same goal — to help make all apps beautiful — and we are so excited to support them!&lt;/p&gt;
</description>
      <pubDate>Mon, 25 Aug 2025 00:00:00 GMT</pubDate>
      <author>Stepan Parunashvili</author>
    </item>
    <item>
      <title>Mirando transforms Latin-American Real-Estate on top of Instant</title>
      <link>https://instantdb.com/essays/mirando</link>
      <guid isPermaLink="true">https://instantdb.com/essays/mirando</guid>
      <description>&lt;p&gt;&lt;em&gt;This is the first of a series of posts about the people who power their startups with Instant! Stories like Ignacio and Javier’s are what drive us to keep working on making the best tools for builders.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Ignacio De Haedo and Javier Rey left their software engineering jobs at Meta to build &lt;a href=&quot;https://www.mirando.com.uy/&quot; target=&quot;_blank&quot;&gt;Mirando&lt;/a&gt;, an AI-powered real-estate platform for Latin America. In this post we’ll share their backstory, the lessons learned, and what’s ahead!&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/posts/mirando/javier_and_ignacio.jpg&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;h1&gt;From Meta&lt;/h1&gt;
&lt;p&gt;Our story starts with Ignacio. Ignacio worked at Meta for six and a half years. Towards the tail end he started to burn out. On one flight from Latin America back to London he had a realization: he wasn’t tired of building, he was tired of building things he didn’t care about. Ignacio knew it was time to move on.&lt;/p&gt;
&lt;p&gt;There was of course one person he had to convince: his wife. He got her blessing, pushed the button, and started to hack on his own projects.&lt;/p&gt;
&lt;h1&gt;To Mentor&lt;/h1&gt;
&lt;p&gt;The next 9 months were a crash course in startups.&lt;/p&gt;
&lt;p&gt;When you work at Meta you get specialty tools to ship products and support to grow them. Ignacio had to get acquainted with infra outside of Meta and learn to grow his own products as a solo engineer.&lt;/p&gt;
&lt;h2&gt;Coming up with an idea&lt;/h2&gt;
&lt;p&gt;One app Ignacio wished he had when he was younger was a tool to help him pick careers. Talking to a lot of younger folks (including his younger brother), it felt more important than ever. And AI was just coming on the scene.&lt;/p&gt;
&lt;p&gt;What if AI could help you flesh out your thinking? You could start with rough goals, and AI would help you think through next steps. Ignacio soon saw that this was more general than careers.&lt;/p&gt;
&lt;p&gt;This product could help you achieve &lt;em&gt;any&lt;/em&gt; goal. It would be almost like having a…mentor!&lt;/p&gt;
&lt;h2&gt;Building Mentor&lt;/h2&gt;
&lt;p&gt;So Ignacio started building Mentor. He wanted to move quickly and focus on what made his product special: the UX for goals and a great AI integration.&lt;/p&gt;
&lt;p&gt;He searched for infra and discovered Instant. &lt;strong&gt;He tried it out and was able to build his version 1 within 2 weeks.&lt;/strong&gt; From his own words:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The fact that you include everything: from auth, a data layer, a client sdk, a way to mutate on the server, and permissions…it’s not any one thing. When you put this together they create a great experience.&lt;/p&gt;
&lt;p&gt;Ignacio De Haedo&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;When a tool is batteries-included, things just tend to work. Instant’s real-time abstractions also made it easier for Ignacio to create a delightful UX.&lt;/p&gt;
&lt;p&gt;Goals are inherently relational: every goal has a subgoal and so on. This was easy for him to express with Instant’s relational query language. And since everything is real-time, every tab auto-synced. Because Instant worked offline, it meant Ignacio’s users could run Mentor in spotty connections. And since optimistic updates came by default, every action on Mentor felt snappy.&lt;/p&gt;
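&lt;p&gt;As a sketch of what that can look like with Instant’s query language (assuming a schema where goals link to subgoals; these aren’t necessarily Mentor’s actual entity names), nesting in the query expresses the relation directly, and the results stay reactive:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;// Fetch goals together with their nested subgoals in one reactive query.
// Every subscribed tab re-renders automatically when the data changes.
const { data } = db.useQuery({
  goals: {
    subgoals: {},
  },
});
&lt;/code&gt;&lt;/pre&gt;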
&lt;p&gt;Soon Ignacio had a pretty darn compelling app. Here’s a quick demo of how Mentor can help you get in the best shape of your life:&lt;/p&gt;
&lt;p&gt;&lt;div class=&quot;md-video-container&quot;&gt;
  &lt;iframe
    width=&quot;100%&quot;
    src=&quot;https://www.youtube.com/embed/hp3byaULieQ?rel=0&amp;modestbranding=1&amp;playsinline=1&amp;autoplay=0&amp;cc_load_policy=1&quot;
    title=&quot;Mentor Demo&quot;
    allow=&quot;autoplay; picture-in-picture&quot;
    allowfullscreen
  &gt;&lt;/iframe&gt;
&lt;/div&gt;&lt;/p&gt;
&lt;p&gt;Now that Ignacio had a product he could focus on users.&lt;/p&gt;
&lt;h2&gt;Hitting the top of Product Hunt…twice&lt;/h2&gt;
&lt;p&gt;He launched on Product Hunt and hit the front page not &lt;a href=&quot;https://www.producthunt.com/products/mentor-v1/launches/mentor-v1&quot; target=&quot;_blank&quot;&gt;once&lt;/a&gt;, but &lt;a href=&quot;https://www.producthunt.com/products/mentor-v1/launches/mentor-ai-2&quot; target=&quot;_blank&quot;&gt;twice&lt;/a&gt;. He iterated until he felt Mentor reached a stable state: users were fans, and Ignacio himself used Mentor every day.&lt;/p&gt;
&lt;p&gt;As Ignacio kept improving Mentor, he traveled back to his hometown in Uruguay and met his close friend Javier. That’s when everything changed again.&lt;/p&gt;
&lt;h1&gt;To Meeting Javier&lt;/h1&gt;
&lt;p&gt;Javier was a veteran AI engineer. He had worked on AI for the last 10 years, long before people had ever heard of ChatGPT or transformers.&lt;/p&gt;
&lt;p&gt;They both shared a love for real-estate and prop-tech, and they both saw an opportunity in Latin America — a market they were deeply familiar with, and knew was underserved.&lt;/p&gt;
&lt;p&gt;Latin America is full of great engineers, but most of them export their work to the United States. This means a lot of technology used day-to-day feels outdated. Ignacio and Javier experienced this first-hand with real-estate.&lt;/p&gt;
&lt;h1&gt;To Building Mirando&lt;/h1&gt;
&lt;p&gt;As home buyers, Ignacio and Javier found themselves frustrated. There was no single place to find every listing. Agents all had separate sites, and many agents listed the same homes. This meant you’d have to spend lots of time scouring different websites and manually de-duping homes.&lt;/p&gt;
&lt;p&gt;They realized the experience was no better for agents either. When a home buyer signs up with an agent, they want a tailored experience. Home buyers want to see a list of places that fit their requirements. To build a list like this, agents would have to go across multiple sites, negotiate with their colleagues, and build custom documents. That would take days.&lt;/p&gt;
&lt;p&gt;So Ignacio and Javier started to think of solutions. What if you could get all the homes in one place, along with an AI that could help you find homes you love? Agents could use this too, and reduce their research time from days to minutes.&lt;/p&gt;
&lt;h2&gt;The Script that Started It&lt;/h2&gt;
&lt;p&gt;As a proof of concept Javier built a script that amalgamated homes across a few agent sites in one place. Just this was already a huge improvement. So they started to turn the script into a real app.&lt;/p&gt;
&lt;h2&gt;Convincing Javier on Instant&lt;/h2&gt;
&lt;p&gt;Javier first started to build a version 1 on an Instant competitor &lt;sup id=&quot;user-content-fnref-1&quot;&gt;&lt;a href=&quot;#user-content-fn-1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;. When Ignacio saw this, he knew he had to convince Javier to switch. The competitor’s product worked, but things weren’t real-time, it took longer to build, and the devex didn’t feel right.&lt;/p&gt;
&lt;p&gt;So what did Ignacio do? &lt;strong&gt;He shipped a PR to demonstrate the difference&lt;/strong&gt;. The PR was full of deletions:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The diff. It was crazy the number of lines of code I deleted. And we got wins. The live sync. The auth was better. And the free tier was a lot better. I showed him the diff, and [Javier] said I trust you.&lt;/p&gt;
&lt;p&gt;Ignacio De Haedo&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;a href=&quot;https://x.com/nachodeh/status/1871232369614582174&quot; target=&quot;_blank&quot;&gt;Ignacio reduced the code-size by 70%&lt;/a&gt;. With Javier on board, Ignacio built out the UX and Javier built out the AI.&lt;/p&gt;
&lt;h2&gt;Shipping Mirando&lt;/h2&gt;
&lt;p&gt;&lt;a href=&quot;https://www.mirando.com.uy/&quot;&gt;Mirando&lt;/a&gt; launched with very happy users. Home buyers and agents finally had a place where they could look through multiple homes.&lt;/p&gt;
&lt;p&gt;Search was a first class citizen on Mirando too. You could create a search, you could share it, and it was real-time. The more you customized your search, the more info the AI had to tailor your experience.&lt;/p&gt;
&lt;p&gt;Soon large agencies were knocking down their doors. They wanted to give Mirando to their agents, so they could create custom-tailored searches for clients.&lt;/p&gt;
&lt;h2&gt;Surprises in real-time sync&lt;/h2&gt;
&lt;p&gt;Ignacio and Javier believed that when you’re searching for homes, it should feel &lt;em&gt;fun.&lt;/em&gt; You should be able to share your search with friends and loved ones. If someone likes a home, everyone should see it right away.&lt;/p&gt;
&lt;p&gt;Instant’s real-time sync came surprisingly handy for that. With reactive queries, they could create a search experience that felt collaborative. Here’s a demo of how a Mirando user could collaborate on a search with someone else:&lt;/p&gt;
&lt;p&gt;&lt;div class=&quot;md-video-container&quot;&gt;
  &lt;iframe
    width=&quot;100%&quot;
    src=&quot;https://www.youtube.com/embed/3wda2j2yJCE?rel=0&amp;modestbranding=1&amp;playsinline=1&amp;autoplay=0&amp;cc_load_policy=1&quot;
    title=&quot;Mirando Demo&quot;
    allow=&quot;autoplay; picture-in-picture&quot;
    allowfullscreen
  &gt;&lt;/iframe&gt;
&lt;/div&gt;&lt;/p&gt;
&lt;p&gt;When we asked Ignacio what he loved the most about Instant, he picked sync:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The real-time sync. It&amp;#39;s the feature that reduces the most boilerplate code. It&amp;#39;s the feature that makes the app feel like magic. And it helps justify a lot of the UX efforts we want to build.&lt;/p&gt;
&lt;p&gt;Ignacio De Haedo&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h1&gt;From Montevideo to Latin America&lt;/h1&gt;
&lt;p&gt;It’s only the beginning for Ignacio and Javier. They started working on Mirando in January. Today they have large agencies signing up in Montevideo, Uruguay. They have a truly delightful experience, and an AI agent that keeps getting better after every search.&lt;/p&gt;
&lt;p&gt;They plan to grow to all of Latin America, and we’re so darn excited to be supporting their infra.&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;user-content-fn-1&quot; href=&quot;#user-content-fnref-1&quot;&gt;[1]&lt;/a&gt;  For gentlemanly reasons we will not mention names&lt;/p&gt;
</description>
      <pubDate>Mon, 18 Aug 2025 00:00:00 GMT</pubDate>
      <author>Stepan Parunashvili</author>
    </item>
    <item>
      <title>GPT 5 vs Opus 4.1 for Vibe-Coded Apps</title>
      <link>https://instantdb.com/essays/gpt_5_vs_opus_4</link>
      <guid isPermaLink="true">https://instantdb.com/essays/gpt_5_vs_opus_4</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;We&amp;#39;re InstantDB, we make it easy to add a backend with auth, file storage, and
real-time updates to your web and mobile apps.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I&amp;#39;ve been seeing posts comparing GPT-5 and Sonnet, but thought comparing GPT-5
and Opus 4.1 would be more interesting!&lt;/p&gt;
&lt;p&gt;So how do GPT-5 and Opus 4.1 perform with building apps? To find out I asked them both to build a full stack app for making chiptunes in Instant. Here’s the prompt I used:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Create a chiptunes app.

- Log in with magic codes
- Users should be able to compose songs
- Users should be able to share songs
- Users can only edit their own songs
- Make the theme really cool
- Let’s keep everything under 1000 lines of code.
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I recorded myself going through the process in this &lt;a href=&quot;https://youtu.be/yzjC0wcMvxI&quot; target=&quot;_blank&quot;&gt;video&lt;/a&gt;. In this post I’ll share the results and some of the surprises I discovered when prompting!&lt;/p&gt;
&lt;h1&gt;What a change in 4 months…&lt;/h1&gt;
&lt;p&gt;We actually ran the same test in April. We compared o4-mini with Claude 3 Sonnet. o4-mini made a barebones version (see &lt;a href=&quot;https://codex-chiptunes.vercel.app/&quot; target=&quot;_blank&quot;&gt;here&lt;/a&gt;). Sonnet made a good UI but couldn’t actually write the backend logic.&lt;/p&gt;
&lt;p&gt;Now both apps look pretty cool; both have auth, permissions, and a much slicker way to compose songs.&lt;/p&gt;
&lt;h1&gt;GPT-5’s work&lt;/h1&gt;
&lt;p&gt;Here’s the result that GPT-5 came up with: &lt;a href=&quot;https://gpt5-chiptunes.vercel.app/&quot;&gt;https://gpt5-chiptunes.vercel.app/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;You can log in, create songs, and share them. This was my creation:&lt;/p&gt;
&lt;p&gt;&lt;demo-iframe uri=&quot;https://gpt5-chiptunes.vercel.app/song/3b527d40-abab-43bc-ad82-61ad0f22b12c&quot;&gt;&lt;/demo-iframe&gt;&lt;/p&gt;
&lt;h1&gt;Opus’ Work&lt;/h1&gt;
&lt;p&gt;And this is what Opus came up with: &lt;a href=&quot;https://opus-chiptunes.vercel.app/&quot;&gt;https://opus-chiptunes.vercel.app/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;We needed a few more prompts to get sharing working, but once it worked, here’s one song our co-founder Joe came up with:&lt;/p&gt;
&lt;p&gt;&lt;demo-iframe uri=&quot;https://opus-chiptunes.vercel.app/?song=79a4353d-8886-44a3-b905-b57b7bae27fd&quot;&gt;&lt;/demo-iframe&gt;&lt;/p&gt;
&lt;h1&gt;How much got done in one shot&lt;/h1&gt;
&lt;p&gt;Both models got a &lt;em&gt;lot&lt;/em&gt; done in one shot.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;What got done in one shot&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;GPT-5&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Opus&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Schema?&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Permissions?&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Create songs?&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Share Songs?&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;UI?&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;p&gt;They both figured out auth, data models, permissions, and at least the flow to create songs in one go.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;One difference is that GPT-5 was able to get song sharing working in one shot.&lt;/strong&gt; Opus needed two additional nudges to get there. Initially, Opus talked about making songs shareable but did not actually implement it. With the first nudge, Opus added support for sharing songs, but gated it to logged-in users. A second prompt got Opus to open songs up for public consumption.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;However, Opus’ UI was slicker.&lt;/strong&gt; You can also see that GPT-5&amp;#39;s UI has some responsiveness issues on mobile. I do think OpenAI has improved its models’ UI skills a lot compared to earlier ones. For now, I think Opus has the edge in UI.&lt;/p&gt;
&lt;h1&gt;Hiccups&lt;/h1&gt;
&lt;p&gt;Both models made a few errors before the projects built. Here’s how that looked:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Places the models had an error&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;GPT-5&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Opus&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;db.SignedIn?&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Query Issues?&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Next Query Params?&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;p&gt;Both models made about 2 errors each. &lt;strong&gt;All the errors were related to new features.&lt;/strong&gt; Next.js has a new flow for query params, and Instant just added a &amp;quot;db.SignedIn&amp;quot; component.&lt;/p&gt;
&lt;p&gt;But &lt;strong&gt;both models fixed all errors in one shot&lt;/strong&gt;. They just needed me to paste an error message and they were able to solve it.&lt;/p&gt;
&lt;p&gt;It was interesting to see how GPT-5 made an error with &amp;quot;db.SignedIn&amp;quot;. Instructions for how to use it were already included in the &lt;a href=&quot;https://www.instantdb.com/llm-rules/next/cursor-rules.md&quot; target=&quot;_blank&quot;&gt;rules.md&lt;/a&gt; file. I think this is related to how closely the models follow rules.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Opus seemed to follow the rules file more closely, while GPT-5 seemed to explore more&lt;/strong&gt;. Opus used the exact same patterns that we provided in the rules file. This let it skip past the &amp;quot;db.SignedIn&amp;quot; bug. On the other hand, GPT-5 was freer with what it tried. It hit more bugs, but it wrote code that was noticeably more &amp;quot;different&amp;quot; than the examples we provided. In one case, it wrote a simpler schema file.&lt;/p&gt;
&lt;h1&gt;Gaps are closing&lt;/h1&gt;
&lt;p&gt;This is the &lt;a href=&quot;https://github.com/stopachka/gpt-5-chiptunes&quot; target=&quot;_blank&quot;&gt;GPT-5 source&lt;/a&gt;, and this is the &lt;a href=&quot;https://github.com/stopachka/opus-chiptunes&quot; target=&quot;_blank&quot;&gt;Opus source&lt;/a&gt;. In the last few months it has felt like Claude and Claude Code were the dominant choice for vibe-coding apps. With the new GPT-5 model it feels like the gap is closing.&lt;/p&gt;
&lt;p&gt;Really interesting times ahead!&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;em&gt;Thanks to Joe Averbukh and Daniel Woelfel for reviewing drafts of this post&lt;/em&gt;&lt;/p&gt;
</description>
      <pubDate>Fri, 08 Aug 2025 00:00:00 GMT</pubDate>
      <author>Stepan Parunashvili</author>
    </item>
    <item>
      <title>How and where will agents ship software?</title>
      <link>https://instantdb.com/essays/agents</link>
      <guid isPermaLink="true">https://instantdb.com/essays/agents</guid>
      <description>&lt;p&gt;We’re entering a new phase of software engineering. People are becoming addicted to agents. Beginners are vibe-coding apps and experts are maxing out their LLM subscriptions. This means that a lot more people are going to make a lot more apps, and for that we’re going to need new tools.&lt;/p&gt;
&lt;p&gt;Today we’re releasing an API that gives you and your agents full-stack backends. Each backend comes with a database, a sync engine, auth tools, file storage, and presence.&lt;/p&gt;
&lt;p&gt;Agents can use these tools to ship high-level code that’s easier for them to write and for humans to review. It’s all hosted on multi-tenant infrastructure, so you can spin up millions of databases in milliseconds. We have a &lt;a href=&quot;#demo&quot;&gt;demo&lt;/a&gt; at the end of this essay.&lt;/p&gt;
&lt;p&gt;Let us explain exactly why we built this. We think that humans and agents can make the most progress when they have (1) built-in abstractions that (2) can be hosted efficiently and (3) expose data.&lt;/p&gt;
&lt;h1&gt;Built-in Abstractions&lt;/h1&gt;
&lt;p&gt;To build an app you write two kinds of code. The business logic that solves your specific problem, and the generic stuff that most apps have to take care of: authenticating users, making queries, running permissions, uploading files, and executing transactions.&lt;/p&gt;
&lt;p&gt;These are simultaneously critical to get right, full of edge cases, and also not the differentiating factor for your app — unless they’re broken.&lt;/p&gt;
&lt;p&gt;If all this work isn’t differentiating, why build it? When a good abstraction exists, it’s a waste of tokens to build it again.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/posts/agents/good_abstractions.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;And agents need good abstractions even more than human programmers do.&lt;/p&gt;
&lt;h2&gt;Locality&lt;/h2&gt;
&lt;p&gt;To make agents work well we need to manage their context windows. It’s very easy to break through limits. Especially when agents write code that involves multiple moving pieces.&lt;/p&gt;
&lt;p&gt;Consider what happens when an agent adds a feature to a traditional client-server app. They change (a) the frontend (b) the backend and (c) the database. In order to safely make these changes, they have to remember more of the codebase and be exact about how things work together.&lt;/p&gt;
&lt;p&gt;Good abstractions can combine multiple moving pieces into one piece. This is more conducive to local reasoning. The agent only has to concern themselves with a smaller interface, so they don’t have to remember so much. They can use less context and write higher-level code. And that’s great for humans too. After all we have to review the agent’s work. Shorter, higher-level code is easier to understand. &lt;sup id=&quot;user-content-fnref-1&quot;&gt;&lt;a href=&quot;#user-content-fn-1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;And when both humans and agents make more progress, they build more apps. And when they build more apps, how will they host them?&lt;/p&gt;
&lt;h1&gt;Cost-Efficient Hosting&lt;/h1&gt;
&lt;p&gt;The dominant way to host applications has been to use virtual machines. VMs are efficient when you have a single app that serves many users. They’re inefficient when you have many apps that serve fewer users.&lt;/p&gt;
&lt;h2&gt;Overhead&lt;/h2&gt;
&lt;p&gt;Let me illustrate with some napkin math. Consider 1 app that serves 20,000 active users, versus 20,000 apps that each serve 1 active user:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/posts/agents/big_vs_small.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;For our 1 big app, we would need 2 beefy VMs. That’s about $800 a month. Not only is this affordable, but it makes for a fast app. Slow algorithms can take advantage of hefty CPUs and a lot more data can stay in memory.&lt;/p&gt;
&lt;p&gt;For our 20,000 small apps we would need 40,000 VMs. That’s about $95,000 a month. Not only is this expensive, but it makes for slow apps. Slow algorithms would choke tiny CPUs and less data would stay in memory.&lt;/p&gt;
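&lt;p&gt;Here is that napkin math written out. The per-VM prices are illustrative assumptions chosen to match the essay’s totals, not real quotes:&lt;/p&gt;

```javascript
// Napkin math: 1 big app vs 20,000 small apps, hosted on VMs.
// Per-VM prices are assumptions picked to match the essay's totals.
const BEEFY_VM_MONTHLY = 400;  // assumed: a large instance, per month
const TINY_VM_MONTHLY = 2.375; // assumed: a small instance, per month

// 1 app serving 20,000 users: 2 beefy VMs (e.g. primary plus failover).
const bigAppCost = 2 * BEEFY_VM_MONTHLY;

// 20,000 apps, each on 2 small VMs for redundancy: 40,000 VMs total.
const smallAppsCost = 20000 * 2 * TINY_VM_MONTHLY;

console.log(bigAppCost);    // 800
console.log(smallAppsCost); // 95000
```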
&lt;h2&gt;Friction&lt;/h2&gt;
&lt;p&gt;We’re not suggesting that people want to make 20,000 apps. We’re pointing out an inefficiency. Running applications today comes with overhead, particularly in RAM.&lt;/p&gt;
&lt;p&gt;And when there’s overhead there’s friction. Today platforms freeze machines or limit how many apps you can spin up. In an era where every human can create lots of apps, this feels like a bummer.&lt;/p&gt;
&lt;p&gt;Could we do better?&lt;/p&gt;
&lt;h2&gt;Getting Specific&lt;/h2&gt;
&lt;p&gt;Let’s think about why we needed VMs in the first place. VMs let programmers write code that’s arbitrarily different. But most apps aren’t arbitrarily different.&lt;/p&gt;
&lt;p&gt;If we can get specific about what applications actually do, we can choose better isolation strategies.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/posts/agents/getting_specific.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;For example what if we knew that an agent didn’t have to use the GPU? We could skip traditional VMs and use Micro VMs &lt;sup id=&quot;user-content-fnref-2&quot;&gt;&lt;a href=&quot;#user-content-fn-2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt; instead. That reduces the overhead by a few tens of megabytes of RAM, and lets us spin down inactive apps &lt;sup id=&quot;user-content-fnref-3&quot;&gt;&lt;a href=&quot;#user-content-fn-3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;. That’s better, but we can keep going.&lt;/p&gt;
&lt;p&gt;What if we knew that an agent wanted to write Javascript functions? We could skip VMs and use V8 Isolates &lt;sup id=&quot;user-content-fnref-4&quot;&gt;&lt;a href=&quot;#user-content-fn-4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;. Each isolate takes about 3 megabytes of RAM. That’s 2 orders of magnitude more efficient. But we can still keep going.&lt;/p&gt;
&lt;p&gt;What if we knew that an agent wanted to write access controls? We could give them a more restricted language like CEL &lt;sup id=&quot;user-content-fnref-5&quot;&gt;&lt;a href=&quot;#user-content-fn-5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;. CEL only needs a few kilobytes of overhead per function. That’s close to 4 orders of magnitude more efficient than VMs. And we can still keep going.&lt;/p&gt;
&lt;p&gt;What if the agent didn’t have to write any code at all? If we knew what the agent was trying to accomplish — say to authenticate users — we could give them a multi-tenant service which did that.&lt;/p&gt;
&lt;h2&gt;A maximally efficient future&lt;/h2&gt;
&lt;p&gt;We can create efficient apps by choosing appropriate isolation strategies.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/posts/agents/max_efficient.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;Shared abstractions could be served from multi-tenant services on big machines. Permissions could use CEL, javascript callbacks could run on V8 Isolates, and shell commands could run on Micro VMs. If we did that, 20,000 apps with 1 active user each would cost about the same as 1 app with 20,000 users.&lt;/p&gt;
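&lt;p&gt;A sketch of the overhead math behind this picture. The per-tenant figures are the essay’s order-of-magnitude numbers, so treat them as assumptions rather than measurements:&lt;/p&gt;

```javascript
// Approximate RAM overhead per tenant for each isolation strategy.
// These are order-of-magnitude assumptions, not measurements.
const overheadBytes = {
  microVm: 30 * 1024 * 1024, // tens of megabytes per Micro VM
  v8Isolate: 3 * 1024 * 1024, // about 3 MB per V8 isolate
  celProgram: 4 * 1024, // a few kilobytes per compiled CEL expression
};

// RAM needed to keep 20,000 tenants resident under each strategy:
const tenants = 20000;
for (const [strategy, bytes] of Object.entries(overheadBytes)) {
  const gb = (tenants * bytes) / 1024 ** 3;
  console.log(strategy, gb.toFixed(2), "GB");
}
```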
&lt;p&gt;Humans and agents would be able to deploy apps with little friction. Once these apps are deployed, how will people use them?&lt;/p&gt;
&lt;h1&gt;Exposed Data&lt;/h1&gt;
&lt;p&gt;Traditionally, end-users were non-technical and would be stuck with whatever the application developer gave them. But now every user has an LLM too.&lt;/p&gt;
&lt;p&gt;If one agent helps build the software, why shouldn’t another agent be able to extend it?&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/posts/agents/exposed_data.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;When every user has an agent, extendable software is an advantage. It’s in the application developer’s best interest: it can turn their apps into platforms, which are stickier. And it’s in the end-user’s best interest: they can get more out of their apps.&lt;/p&gt;
&lt;p&gt;To make software extendable, developers have generally used APIs. But APIs have a problem: application developers have to build them first. This means users are limited by what application developers &lt;em&gt;thought&lt;/em&gt; was needed.&lt;/p&gt;
&lt;p&gt;Databases are different. When apps are written on a database-like abstraction, users are free to make arbitrary queries and transactions. The application developer doesn’t have to foresee much. End-users can read and write whatever data they need to build all sorts of custom UIs &lt;sup id=&quot;user-content-fnref-6&quot;&gt;&lt;a href=&quot;#user-content-fn-6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;And if that&amp;#39;s true, database-like abstractions are going to be an advantage.&lt;/p&gt;
&lt;h1&gt;A Multi-Tenant Sync Engine&lt;/h1&gt;
&lt;p&gt;So if agents and humans work best when they have (1) built-in abstractions that are (2) hosted efficiently and (3) expose data, what infrastructure works best?&lt;/p&gt;
&lt;p&gt;Let&amp;#39;s start by thinking through what agents are good at. Agents are good at writing self-contained code. Code that they can reason about in one place, without too much extraneous state and edge cases. This is why the traditional client-server architecture is hard for them: it involves multiple parts that all need to work in unison — a server, a client, and a database.&lt;/p&gt;
&lt;p&gt;There are several ways to build self-contained apps. You can build a local-only desktop app (but then — no internet, multiple devices, or collaboration). You can build a server-only app (then you get latency, no offline mode, hosting costs). Or you could build a client-only app that treats the backend like a remote database.&lt;/p&gt;
&lt;p&gt;In other words, a sync engine.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/posts/agents/sync_engine.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;Sync engines let you work with data as if it was local and not worry about fetching it, persisting it, managing optimistic state, atomic transactions, retries and many other schleps. That’s a powerful abstraction (1).&lt;/p&gt;
&lt;p&gt;Queries and transactions are straight-forward to sandbox. You can host them on multi-tenant platforms. Which makes for efficient apps (2).&lt;/p&gt;
&lt;p&gt;And since you get a database-like abstraction, exposing data is relatively straightforward too (3).&lt;/p&gt;
&lt;p&gt;That’s the future we are building Instant for.&lt;/p&gt;
&lt;h1&gt;A Tool for Builders&lt;/h1&gt;
&lt;p&gt;When we started Instant, agents were nowhere in sight. We focused on builders. Turns out if you design for builders, you end up making something good for agents too.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/posts/agents/instant_arch.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;Builders want good abstractions. So we built a sync engine, permissions, auth, file storage, and ephemeral state (like cursors).&lt;/p&gt;
&lt;p&gt;Builders also want efficient hosting. They have lots of projects, and it sucks when apps end up frozen. So we made our sync engine and database multi-tenant. This way we could offer a generous free tier.&lt;/p&gt;
&lt;p&gt;&lt;a name=&quot;demo&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h1&gt;Exposing the API&lt;/h1&gt;
&lt;p&gt;Instant is already great for builders. Real startups use Instant, and push upwards of 10,000 concurrent connections.&lt;/p&gt;
&lt;p&gt;Today we&amp;#39;re making it even easier. We&amp;#39;re releasing three things:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/instantdb/instant/tree/main/client/packages/platform&quot; target=&quot;_blank&quot;&gt;A platform SDK&lt;/a&gt; that lets you create new apps on demand&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.instantdb.com/docs/using-llms#instant-mcp-server&quot; target=&quot;_blank&quot;&gt;A remote MCP server&lt;/a&gt; that makes it easy to integrate Instant in your editor.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.instantdb.com/docs/using-llms#instant-rules&quot; target=&quot;_blank&quot;&gt;A set of Agent rules&lt;/a&gt; that teach LLMs how to use Instant&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Put this together and you get a toolkit that lets humans and agents make more progress and do it efficiently. Let&amp;#39;s try them out.&lt;/p&gt;
&lt;p&gt;&lt;agents-essay-demo-section&gt;&lt;/agents-essay-demo-section&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;a href=&quot;https://news.ycombinator.com/item?id=44585002&quot;&gt;Discussion on HN&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Thanks to Joe Averbukh, Daniel Woelfel, Alex Kotliarskyi, Ian Alejandro Sinnott, Cam Glynn, Anupam Batra, Predrag Gruevski, Irakli Popkhadze, Cody Breene, Kote Mushegiani, Nicole Garcia Fischer for reviewing drafts of this essay&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;user-content-fn-1&quot; href=&quot;#user-content-fnref-1&quot;&gt;[1]&lt;/a&gt;  We can probably make the review experience even better. If code is high-level enough, maybe we don’t need to show it. We could build UIs around abstractions and use them to summarize changes.&lt;/p&gt;
&lt;p&gt;&lt;sup id=&quot;user-content-fnref-2&quot;&gt;&lt;a href=&quot;#user-content-fn-2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;: To learn more, check out &lt;a href=&quot;https://firecracker-microvm.github.io/&quot;&gt;Firecracker&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;sup id=&quot;user-content-fnref-3&quot;&gt;&lt;a href=&quot;#user-content-fn-3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;: Though there are some caveats to Micro VMs. Spinning up a VM still takes a few hundred milliseconds. Some operations are &lt;a href=&quot;https://github.com/kata-containers/kata-containers/issues/3452&quot;&gt;slow&lt;/a&gt;, and sometimes you can&amp;#39;t spin them down (if you have a database with logical replication, for example).&lt;/p&gt;
&lt;p&gt;&lt;sup id=&quot;user-content-fnref-4&quot;&gt;&lt;a href=&quot;#user-content-fn-4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;: Check out &lt;a href=&quot;https://blog.cloudflare.com/cloud-computing-without-containers/&quot;&gt;this essay&lt;/a&gt; from Cloudflare&lt;/p&gt;
&lt;p&gt;&lt;sup id=&quot;user-content-fnref-5&quot;&gt;&lt;a href=&quot;#user-content-fn-5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;: The &lt;a href=&quot;https://cel.dev/&quot;&gt;CEL website&lt;/a&gt; is a good place to learn more.&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;user-content-fn-6&quot; href=&quot;#user-content-fnref-6&quot;&gt;[6]&lt;/a&gt;  This opens up more questions. If you expose data, could you expose UIs too? What if every app shared their UI components. This is a bit too hazy to include in the essay, but it could make for an interesting experiment.&lt;/p&gt;
</description>
      <pubDate>Mon, 14 Jul 2025 00:00:00 GMT</pubDate>
      <author>Stepan Parunashvili, Nikita Prokopov</author>
    </item>
    <item>
      <title>Sync Engines are the Future</title>
      <link>https://instantdb.com/essays/sync_future</link>
      <guid isPermaLink="true">https://instantdb.com/essays/sync_future</guid>
      <description>&lt;p&gt;&lt;em&gt;Hi! Niki here, also known as @nikitonsky. You might know me for &lt;a href=&quot;https://github.com/tonsky/datascript&quot;&gt;DataScript&lt;/a&gt;, &lt;a href=&quot;https://tonsky.me/blog/the-web-after-tomorrow/&quot;&gt;The Web After Tomorrow&lt;/a&gt; or &lt;a href=&quot;https://www.hytradboi.com/2022/your-frontend-needs-a-database/&quot;&gt;Your frontend needs a database&lt;/a&gt;. Last December, I joined Instant to continue my journey of bringing databases into the browser. Here’s my mission:&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The modern browser is an OS. A modern web app is a distributed app. So any web app developer faces a well-known, well-understood, notoriously hard problem: syncing data.&lt;/p&gt;
&lt;p&gt;Look, I’ve been around. I’ve seen trends come and go. I’ve seen data sync treated as a non-existent problem for two decades now. You’ve got XHR. You’ve got fetch. You’ve got REST and GraphQL. What else might you want?&lt;/p&gt;
&lt;p&gt;The problem is, all these tools are low-level. They solve the problem of getting data once. But getting data is a continuous process: data changes over time and becomes stale, requests fail, updates arrive later than you might’ve wanted, or out of order. Errors will happen. Occasional &lt;code&gt;if (!response.ok)&lt;/code&gt; will not get you very far.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-js&quot;&gt;fetch(new Request(&amp;#39;/user/update&amp;#39;, { method: &amp;#39;POST&amp;#39; })).then((response) =&amp;gt; {
  if (!response.ok) {
    // Do what? Pretend it never happened?
    // Stop the entire application?
    // Retry? What if user already issued another update that invalidates this one?
    // What if update actually got through?
  }
});
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And you can’t just give up and declare everything invalid. You have to keep working. You need a system. &lt;em&gt;You can’t solve this problem at the level of a single request.&lt;/em&gt;&lt;/p&gt;
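&lt;p&gt;What would such a system look like? At minimum, an ordered queue of pending mutations that retries until the server acknowledges. This is a toy sketch under those assumptions; real sync engines also handle deduplication, conflicts, and persistence:&lt;/p&gt;

```javascript
// A toy mutation queue: mutations apply optimistically, then retry in
// order until acknowledged. Real sync engines also dedupe (so "what if
// the update actually got through?" is safe) and persist the queue.
function makeQueue(send) {
  const pending = [];
  let flushing = false;

  async function flush() {
    if (flushing) return;
    flushing = true;
    while (pending.length > 0) {
      try {
        await send(pending[0]); // at-least-once delivery
        pending.shift(); // acknowledged: drop it
      } catch (err) {
        // back off and retry the same mutation, preserving order
        await new Promise((resolve) => setTimeout(resolve, 100));
      }
    }
    flushing = false;
  }

  return {
    push(mutation) {
      pending.push(mutation); // optimistic: the UI already moved on
      flush();
    },
  };
}
```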
&lt;p&gt;It’s also ill-advised to try to solve data sync &lt;em&gt;while also working on a product&lt;/em&gt;. These problems require patience, thoroughness, and extensive testing. They can’t be rushed. And you already have a problem on your hands you don’t know how to solve: your product. Try solving both, fail at both &lt;sup id=&quot;user-content-fnref-1&quot;&gt;&lt;a href=&quot;#user-content-fn-1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;Funny enough, edge cases aren’t that unique from project to project. Everyone wants their data synced. Everyone wants their data correct and delivered exactly once. Everyone wants it fast, compact, and in time. A perfect case for a library.&lt;/p&gt;
&lt;p&gt;Such a library would be called a database. But we’re used to thinking of a database as something server-related, a big box that runs in a data center. It doesn’t have to be like that! Databases have two parts: a place where data is stored and a place where data is delivered. That second part is usually missing.&lt;/p&gt;
&lt;p&gt;Think about it: we want two computers to talk and coordinate how to sync data. It’s obvious that both computers will need to run some code, and that code will need to be compatible. In short, we want to run a database on the frontend. It’s not enough to “just fetch data” over some simple JSON protocol or a generic JDBC driver. As data changes on both sides on completely independent timelines, you need to push, pull, coordinate, negotiate, validate, retry, and guard against failure. Data sync is a complex problem, and the client needs to be as sophisticated as the backend. &lt;em&gt;They need to work together.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;But once you do that, you’re free. You’ll get your data synced for you—more reliably and efficiently than you could ever do by hand. You’ll be able to work with your data as if it’s all local and forget about sync most of the time.&lt;/p&gt;
&lt;p&gt;In a perfect world, where everything is solved, what would programming look like? 99% business logic, 1% setup, right? Pure data and operations on data. People don’t want quarter-inch drill bits, they want quarter-inch holes. Paraphrasing that for programming: people don’t want databases. They want data.&lt;/p&gt;
&lt;p&gt;Well, that’s what sync engines are supposed to solve—pure, clean, functional business code, decoupled from the horrors of an unreliable network. The best time of my life was when I was working with local data and &lt;em&gt;something else&lt;/em&gt; synced it in the background.&lt;/p&gt;
&lt;p&gt;You’d get a database on your hands, too. It might sound controversial, but databases can be good at managing data. Queries are more concise, access is faster, and data is more organized. I’m a minimalist myself, but some things are simply better when queried from a (local) database. Would be faster, too.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-js&quot;&gt;for (const id of ids) {
  const user = users[id];
  for (const post_id of user.post_ids) {
    const post = posts[post_id];
    for (const comment_id of post.comment_ids) {
      const comment = comments[comment_id];
      if (comment.author_id === id) {
        // there must be a better way...
      }
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Quick: what’s the data structure for when you want to query both posts by authors and authors by posts? Or: I’ve yet to see a code base that has maintained a separate in-memory index for data they are querying. Or does a hash join, for that matter. Usually it’s some form of four nested loops over an uncontrollable mix of maps and arrays. Not judging—I’ve been there—but there are tools that do it better and faster for you. Easier to read, too.&lt;/p&gt;
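&lt;p&gt;For the curious, the &amp;quot;better way&amp;quot; is mostly an index. Here’s the hand-rolled version for &amp;quot;posts by author&amp;quot;; a local database maintains structures like this for you automatically:&lt;/p&gt;

```javascript
// Build a hash index once, then answer "posts by author" with O(1)
// lookups instead of nested loops. This is the kind of structure a
// client-side database maintains for you behind the scenes.
const posts = [
  { id: 1, authorId: 7, title: "Hello" },
  { id: 2, authorId: 8, title: "World" },
  { id: 3, authorId: 7, title: "Again" },
];

const postsByAuthor = new Map(); // authorId -> array of posts
for (const post of posts) {
  const bucket = postsByAuthor.get(post.authorId) ?? [];
  bucket.push(post);
  postsByAuthor.set(post.authorId, bucket);
}

const titles = postsByAuthor.get(7).map((post) => post.title);
console.log(titles); // ["Hello", "Again"]
```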
&lt;p&gt;Then there’s SQL. It’s the best, and it’s the worst. I took a break from it for a few years, and I completely forgot what crazy things it can do—but also how crazy some simple things are. Something as simple as&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-js&quot;&gt;const query = {
  goals: {
    todos: {},
  },
};
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;turns into&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;SELECT g.*, gt.todos
FROM goals g
JOIN (
  SELECT g.id, json_agg(t.*) as todos
  FROM goals g
  LEFT JOIN todos t on g.id = t.goal_id
  GROUP BY 1
) gt on g.id = gt.id
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;when queried through SQL.&lt;/p&gt;
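&lt;p&gt;To make the nested form concrete, here is a toy resolver (a hypothetical illustration, not Instant’s implementation) that evaluates the query shape against in-memory data, with the join hard-coded on a &lt;code&gt;goalId&lt;/code&gt; foreign key:&lt;/p&gt;

```javascript
// Toy resolver for a nested query shape like { goals: { todos: {} } }.
// Hypothetical illustration: the join is hard-coded on goalId.
const data = {
  goals: [{ id: 1, title: "Ship 1.0" }],
  todos: [{ id: 10, goalId: 1, text: "Write docs" }],
};

function resolve(query) {
  const out = {};
  for (const entity of Object.keys(query)) {
    out[entity] = data[entity].map((row) => {
      const result = { ...row };
      for (const child of Object.keys(query[entity])) {
        // naive nested join: child rows pointing at this row's id
        result[child] = data[child].filter((c) => c.goalId === row.id);
      }
      return result;
    });
  }
  return out;
}

const result = resolve({ goals: { todos: {} } });
console.log(result.goals[0].todos[0].text); // "Write docs"
```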
&lt;p&gt;Of course, there’s legacy, there’s existing tooling, and there are all the teaching materials. It’s hard to replace SQL, and it’s twice as hard to beat it. All I’m saying is: if you don’t like databases because of SQL, I get you. Really. I understand. You are not alone.&lt;/p&gt;
&lt;p&gt;What I know for a fact is that you can get where you’re going without SQL. I worked with Datalog for a while, and did all the same things without ever touching SQL. I know it’s possible—I’ve seen it myself. There are other, equally powerful query languages that can get real work done with (possibly) better ergonomics. SQL is not the end of the road.&lt;/p&gt;
&lt;p&gt;So, what’s the significance of sync engines? I have a theory that every major technology shift happened when one part of the stack collapsed with another. For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Web apps collapsed cross-platform development. Instead of developing two or three versions of your app, you now develop one, available everywhere!&lt;/li&gt;
&lt;li&gt;Node.js collapsed client and server development. You get one language instead of two! You can share code between them!&lt;/li&gt;
&lt;li&gt;Docker collapsed the distinction between dev and prod.&lt;/li&gt;
&lt;li&gt;React collapsed HTML and JS, Tailwind collapsed JS and CSS.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So where does that leave sync engines? They collapse the database and the server. If your database is smart enough and capable enough, why would you even need a server? A hosted database saves you from the horrors of hosting and lets your data flow freely to the frontend.&lt;/p&gt;
&lt;p&gt;I never thought this was possible in practice, but then Roam Research proved me wrong. For the first few years after public release, they didn’t have a single server. Everything was synced to and served from Firebase. Living the dream.&lt;/p&gt;
&lt;p&gt;That more or less covers it. We are building a sync engine because syncing data ad hoc, situationally is both hard and error-prone. We are also building it because we believe it simplifies the stack in a meaningful way. After all, we want our AI overlords to have a good time programming, too.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://news.ycombinator.com/item?id=43397640&quot;&gt;Discussion on HN&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Thanks Stepan Parunashvili, Joe Averbukh, and Kevin Lynagh for reviewing drafts of this essay.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;user-content-fn-1&quot; href=&quot;#user-content-fnref-1&quot;&gt;[1]&lt;/a&gt;  Unless you have unlimited time and resources. Yes, Figma and Linear both built their sync engines while also building their product. Exceptions happen.&lt;/p&gt;
</description>
      <pubDate>Mon, 17 Mar 2025 00:00:00 GMT</pubDate>
      <author>Nikita Prokopov</author>
    </item>
    <item>
      <title>A Major Postgres Upgrade with Zero Downtime</title>
      <link>https://instantdb.com/essays/pg_upgrade</link>
      <guid isPermaLink="true">https://instantdb.com/essays/pg_upgrade</guid>
      <description>&lt;div class=&quot;text-lg font-medium&quot;&gt;
We’re Instant, a modern Firebase. You can spin up a database and make queries within a minute — no login required.
&lt;/div&gt;

&lt;p&gt;Right before Christmas we discovered that our Aurora Postgres instance needed a major version upgrade. We found a great essay by the &lt;a href=&quot;https://eng.lyft.com/postgres-aurora-db-major-version-upgrade-with-minimal-downtime-4e26178f07a0&quot;&gt;Lyft team&lt;/a&gt;, showing how they ran their upgrade with about 7 minutes of downtime.&lt;/p&gt;
&lt;p&gt;We started with Lyft’s checklist but made some changes, particularly with how we switched masters. &lt;strong&gt;In our process we got to 0 seconds of downtime.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Doing a major version upgrade is stressful, and reading others’ reports definitely helped us along the way. So we wanted to write an experience report of our own, in the hope that it’s as useful to you as theirs were for us.&lt;/p&gt;
&lt;p&gt;In this write-up we’ll share the path we took — from false starts, to gotchas, to the steps that ultimately worked. Fair warning, our system runs at a modest scale. We have less than a terabyte of data, we read about 1.8 million tuples per second, and write about 500 tuples per second as of this writing. If you run at a much higher scale, this may be less relevant to you.&lt;/p&gt;
&lt;p&gt;With all that said, let’s get into the story!&lt;/p&gt;
&lt;h1&gt;State of Affairs&lt;/h1&gt;
&lt;p&gt;Let’s start with a brief outline of our system:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/posts/pg_upgrade/the_system.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;Browsers connect to sync servers. Sync servers keep track of active queries. Sync servers also listen to Postgres’ write-ahead log; they take transactions, find affected queries, and send novelty back to browsers. &lt;sup id=&quot;user-content-fnref-1&quot;&gt;&lt;a href=&quot;#user-content-fn-1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt; Crucially, all Instant databases are hosted under one Aurora Postgres instance. &lt;sup id=&quot;user-content-fnref-2&quot;&gt;&lt;a href=&quot;#user-content-fn-2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
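&lt;p&gt;In pseudocode, that sync-server loop looks something like this. The names and structures here are assumptions for illustration, not Instant’s actual code:&lt;/p&gt;

```javascript
// Sketch of the loop described above: a WAL transaction comes in, we
// find the active queries it could affect, and push the novelty to
// subscribed browsers. Names here are illustrative assumptions.
const activeQueries = new Map(); // queryId -> { touches, subscribers }

function onWalTransaction(tx) {
  for (const [queryId, query] of activeQueries) {
    if (query.touches(tx)) {
      // send only the novelty, not a full re-run of the query
      for (const send of query.subscribers) {
        send({ queryId, novelty: tx.changes });
      }
    }
  }
}
```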
&lt;h2&gt;Trouble Erupts&lt;/h2&gt;
&lt;p&gt;After our open source launch in August &lt;sup id=&quot;user-content-fnref-3&quot;&gt;&lt;a href=&quot;#user-content-fn-3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;, we experienced about a 100x increase in throughput. For the first 2 months, whenever we saw perf issues they usually lived in our Client SDK or the Sync Server. When we hit a new high in December though, our Aurora Postgres instance started to spike in CPU and stumble.&lt;/p&gt;
&lt;p&gt;To give us breathing room, we kept upgrading the size of the machine, until we reached db.r6g.16xlarge. &lt;sup id=&quot;user-content-fnref-4&quot;&gt;&lt;a href=&quot;#user-content-fn-4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt; Bigger machines would only take us so far; we had to do something about the queries we were writing.&lt;/p&gt;
&lt;h2&gt;Sometimes, new is better than old&lt;/h2&gt;
&lt;p&gt;We started to reproduce slow queries locally and began to optimize them. Within the first hour we noticed something strange: one teammate constantly reported faster query results than the rest of us.&lt;/p&gt;
&lt;p&gt;Turns out this teammate was running Postgres 16, while most of us (and our production instance) were running Postgres 13.&lt;/p&gt;
&lt;p&gt;We did some more backtesting and realized that Postgres 16 improved many of the egregious queries by 30% or more. Not bad. That was our first learning: sometimes, just upgrading Postgres is a great way to improve perf. &lt;sup id=&quot;user-content-fnref-5&quot;&gt;&lt;a href=&quot;#user-content-fn-5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;So we thought, let’s upgrade to Postgres 16. Now how do we go about it?&lt;/p&gt;
&lt;h1&gt;False Starts&lt;/h1&gt;
&lt;p&gt;We were a team of 4 and we were in a crunch. If we could find a quick option we’d have been happy to take it. Here’s what we tried:&lt;/p&gt;
&lt;p&gt;&lt;a name=&quot;in-place-upgrade&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;1) In-Place Upgrades...but they take 15 minutes&lt;/h2&gt;
&lt;p&gt;&lt;img src=&quot;/posts/pg_upgrade/in_place.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;The easiest choice would have been to run an in-place upgrade. Put the database in maintenance mode, upgrade major versions, then turn it back on again. In RDS console you can do this with a &lt;a href=&quot;https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/USER_UpgradeDBInstance.PostgreSQL.MajorVersion.html#USER_UpgradeDBInstance.Upgrading.Manual:~:text=the%20RDS%20API.-,Console,-To%20upgrade%20the&quot;&gt;few button clicks&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The big problem is the downtime. Your DB is in maintenance mode for the entirety of the upgrade. The Lyft team said an in-place upgrade would have caused them a &lt;a href=&quot;https://eng.lyft.com/postgres-aurora-db-major-version-upgrade-with-minimal-downtime-4e26178f07a0#4831&quot;&gt;30 minute&lt;/a&gt; outage.&lt;/p&gt;
&lt;p&gt;We wanted to test this for ourselves though, in case a smaller database upgraded more quickly. So we cloned our production database and tested an in-place upgrade. Even with our smaller size, it took about 15 minutes for the clone to come back online.&lt;/p&gt;
&lt;p&gt;Crunch or not, a 15-minute outage was off the table for us. Since launch we had folks sign up across the U.S., Europe, and Asia; traffic ebbed and flowed, but there wasn’t a period where 15 minutes of downtime felt tolerable.&lt;/p&gt;
&lt;p&gt;&lt;a name=&quot;blue-green-deployment&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;2) Blue-Green Deployments...but you can’t have active replication slots&lt;/h2&gt;
&lt;p&gt;&lt;img src=&quot;/posts/pg_upgrade/blue_green.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;Well, Aurora Postgres also has &lt;a href=&quot;https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/blue-green-deployments-overview.html&quot;&gt;blue-green deployments&lt;/a&gt;. AWS spins up an upgraded replica for you, and you can switch masters with a button click. They promise about a minute of downtime.&lt;/p&gt;
&lt;p&gt;With such little operational effort, a minute of downtime sounded like a great option for us.&lt;/p&gt;
&lt;p&gt;So we cloned our DB and tested a blue-green deployment. Yup, the connection came back in a minute! It looked like we were done. Until we tried a full rehearsal.&lt;/p&gt;
&lt;p&gt;We spun up a complete staging environment, this time with active sync servers and connected clients. Now the blue-green deployment would go on for 30 minutes, and then break with a configuration error:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Creation of blue/green deployment failed due to incompatible parameter settings. See &lt;a href=&quot;https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/blue-green-deployments-creating.html#blue-green-deployments-creating-preparing-postgres&quot;&gt;link&lt;/a&gt; to help resolve the issues, then delete and recreate the blue/green deployment.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The next few hours were frustrating: we would change a setting, start again, wait 30 minutes, and invariably end up with the same error.&lt;/p&gt;
&lt;p&gt;Once we exhausted the suggestions from this error message, we began a process of elimination: when did the upgrade work, and what change made it fail? Eliminating the sync servers revealed the issue: active replication slots.&lt;/p&gt;
&lt;p&gt;Remember how our sync servers listen to Postgres’ write-ahead log? To do this, we opened &lt;a href=&quot;https://www.postgresql.org/docs/current/logicaldecoding-explanation.html#LOGICALDECODING-REPLICATION-SLOTS&quot;&gt;replication slots&lt;/a&gt;. We couldn’t create a blue-green deployment when the master DB had active replication slots. The AWS docs did not mention this. &lt;sup id=&quot;user-content-fnref-6&quot;&gt;&lt;a href=&quot;#user-content-fn-6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;At least this experience highlighted a learning: &lt;em&gt;always&lt;/em&gt; run a rehearsal that’s as close to production as possible; you never know what you’ll find.&lt;/p&gt;
&lt;p&gt;In order to stop using replication slots we’d have to disconnect our sync servers. But then we would lose reactivity, potentially for 30 minutes. Apps would appear broken if queries were out of sync for that long; blue-green deployments were off the table too.&lt;/p&gt;
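&lt;p&gt;If you want to check whether this constraint applies to you, you can list replication slots straight from Postgres. This is a generic catalog query, not something specific to our setup:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;-- On the master: any row with active = t indicates a
-- consumer holding a replication slot open
select slot_name, slot_type, active
from pg_replication_slots;
&lt;/code&gt;&lt;/pre&gt;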
&lt;h1&gt;A Plan for Going Manual&lt;/h1&gt;
&lt;p&gt;When the managed options don’t work, it’s time to go manual. We knew that a manual upgrade would have to involve three steps:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/posts/pg_upgrade/going_manual.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;First, we would stand up a new replica running Postgres 16 — let’s call this machine &amp;quot;16&amp;quot;. Once 16 was running, we could get our sync servers to subscribe to 16. The remaining step would be to switch writes to 16 &amp;quot;all in one go&amp;quot; (what exactly that meant was still TBD). Once writes flowed to 16, the migration would be done.&lt;/p&gt;
&lt;p&gt;Now to figure out the steps.&lt;/p&gt;
&lt;h1&gt;1) Replicate to 16&lt;/h1&gt;
&lt;p&gt;The first problem was to create our replica running Postgres 16.&lt;/p&gt;
&lt;h2&gt;a) Clone-Upgrade-Replicate led to...lost data&lt;/h2&gt;
&lt;p&gt;Lyft had a great &lt;a href=&quot;https://eng.lyft.com/postgres-aurora-db-major-version-upgrade-with-minimal-downtime-4e26178f07a0#a7df&quot;&gt;series of steps&lt;/a&gt; to create a replica, so we tried to follow it. There were three stages:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/posts/pg_upgrade/clone_upgrade_replicate.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;First, we clone our database, then we upgrade our clone, and then we start replication. By the end, our clone would have become a replica running Postgres 16.&lt;/p&gt;
&lt;p&gt;Steps 1 (clone) &amp;amp; 2 (upgrade) worked great. The trouble started with step 3 (replicate).&lt;/p&gt;
&lt;h3&gt;Lost PG functions&lt;/h3&gt;
&lt;p&gt;When we turned on replication, we saw this error:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-text&quot;&gt;:ERROR: function is_jsonb_valid_timestamp(jsonb) does not exist at character 1
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That’s weird. We &lt;em&gt;did&lt;/em&gt; have a custom Postgres function called &lt;code&gt;is_jsonb_valid_timestamp&lt;/code&gt;. And the function existed on both machines; if we logged in with psql, we could run queries against it:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;select is_jsonb_valid_timestamp(&amp;#39;1724344362000&amp;#39;::jsonb);
&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code class=&quot;language-text&quot;&gt; is_jsonb_valid_timestamp
--------------------------
 t
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We thought maybe there was an error with our WAL level, or maybe some input worked in 13, but stopped working in 16.&lt;/p&gt;
&lt;p&gt;&lt;a name=&quot;search-paths&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;Search paths&lt;/h3&gt;
&lt;p&gt;So we went down a rabbit hole investigating and searching in &lt;a href=&quot;https://www.postgresql.org/message-id/flat/D2B9F2A20670C84685EF7D183F2949E2373D64%40gigant.nidsa.net#8132cc2fa455dd1f1bb02c63cdd04678&quot;&gt;PG’s mailing list.&lt;/a&gt; Finally, we discovered the problem was &lt;a href=&quot;https://www.postgresql.org/docs/current/ddl-schemas.html#DDL-SCHEMAS-PATH&quot;&gt;search paths&lt;/a&gt;. &lt;sup id=&quot;user-content-fnref-7&quot;&gt;&lt;a href=&quot;#user-content-fn-7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;show search_path;
&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code class=&quot;language-text&quot;&gt;   search_path
-----------------
 &amp;quot;$user&amp;quot;, public
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Postgres stores custom functions in a &lt;a href=&quot;https://www.postgresql.org/docs/current/ddl-schemas.html#DDL-SCHEMAS-PUBLIC&quot;&gt;schema&lt;/a&gt;. When you write a function in your query, PG uses a &lt;code&gt;search_path&lt;/code&gt; to decide which schema to look into. During replication, Postgres was having trouble finding our function. To get around this issue, we &lt;a href=&quot;https://github.com/instantdb/instant/pull/593&quot;&gt;wrote a PR&lt;/a&gt; to add the &lt;code&gt;public&lt;/code&gt; prefix explicitly in all our function definitions:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;-- Before:
create or replace function is_jsonb_valid_timestamp(value jsonb)
-- After:                   👇
create or replace function public.is_jsonb_valid_timestamp(value jsonb)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Note to us: make sure to use &lt;code&gt;public&lt;/code&gt; in all our function definitions. &lt;sup id=&quot;user-content-fnref-8&quot;&gt;&lt;a href=&quot;#user-content-fn-8&quot;&gt;[8]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
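&lt;p&gt;If you want to double-check where a function actually lives, you can ask the catalogs directly (swap in your own function name):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;-- Which schema owns this function?
select n.nspname as schema, p.proname as function
from pg_proc p
join pg_namespace n on n.oid = p.pronamespace
where p.proname = &amp;#39;is_jsonb_valid_timestamp&amp;#39;;
&lt;/code&gt;&lt;/pre&gt;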
&lt;p&gt;With PG functions working, step 3 (replicate) ran smoothly! Or so we thought.&lt;/p&gt;
&lt;p&gt;&lt;a name=&quot;missing-data&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;Missing data&lt;/h3&gt;
&lt;p&gt;For all intents and purposes, our new clone looked like a functioning replica. But we wanted to make absolutely sure that we hadn’t lost any data.&lt;/p&gt;
&lt;p&gt;Thankfully, we had a special &lt;code&gt;transactions&lt;/code&gt; table — it’s an immutable table we use internally &lt;sup id=&quot;user-content-fnref-9&quot;&gt;&lt;a href=&quot;#user-content-fn-9&quot;&gt;[9]&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-text&quot;&gt;instant=&amp;gt; \d transactions;

   Column   |            Type             | -- ...
------------+-----------------------------+
 id         | bigint                      |
 app_id     | uuid                        |
 created_at | timestamp without time zone |
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Since we never modify rows, we could also use the &lt;code&gt;transactions&lt;/code&gt; table for quick sanity checks — was there any data lost in the table? Here’s the query we ran to do that:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;-- On 13
select max(id) from transactions;
select count(*) from transactions where id &amp;lt; :max-id;

-- Wait for :max-id to replicate ...
-- On 16
select COUNT(*) from transactions where id &amp;lt; :max-id;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;To our surprise...we found 13 missing transactions! That definitely stumped us. We weren’t quite sure where the data loss came from. &lt;sup id=&quot;user-content-fnref-10&quot;&gt;&lt;a href=&quot;#user-content-fn-10&quot;&gt;[10]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;h2&gt;b) Create, Replicate...worked great!&lt;/h2&gt;
&lt;p&gt;So we went back to the drawing board. One problem with our replica checklist was that it had about 13 steps. If we could reduce the number of steps, perhaps we could eliminate whatever caused the data loss.&lt;/p&gt;
&lt;p&gt;So we cooked up an alternate approach:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/posts/pg_upgrade/create_replicate.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;Instead of cloning, upgrading, and then replicating, we would start with a fresh database running Postgres 16, and replicate from scratch. Lyft chose to clone their DB, because they had over 30TB of data and could leverage &lt;a href=&quot;https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Aurora.Managing.Clone.html#Aurora.Clone.Overview&quot;&gt;Aurora Cloning&lt;/a&gt;. But we had less than a terabyte of data; starting replication from scratch wasn’t a big deal for us. &lt;sup id=&quot;user-content-fnref-11&quot;&gt;&lt;a href=&quot;#user-content-fn-11&quot;&gt;[11]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;So we created a checklist and ended up with 7 steps:&lt;/p&gt;
&lt;p&gt;&lt;a name=&quot;replica-checklist&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;div class=&quot;border&quot;&gt;

&lt;h3 class=&quot;font-bold font-mono text-center bg-gray-100 p-2 mt-0&quot;&gt;Checklist: Create an upgraded Replica&lt;/h3&gt;

&lt;div class=&quot;mr-2&quot;&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;16: Create a new Postgres Aurora Database on Postgres 16.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Make sure to set &lt;code&gt;wal_level = logical&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;13: Extract the schema&lt;/strong&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;pg_dump ${DATABASE_URL} --schema-only -f dump.schema.sql
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;16: Import the schema into 16&lt;/strong&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;psql ${NEW_DATABASE_URL} -f dump.schema.sql
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;13: Create a publication&lt;/strong&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;create publication pub_all_table for all tables;
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;16: Create a subscription with copy_data = true&lt;/strong&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;create subscription pub_from_scratch
connection &amp;#39;host=host_here dbname=name_here port=5432 user=user_here password=password_here&amp;#39;
publication pub_all_table
with (
  copy_data = true, create_slot = true, enabled = true,
  connect = true,
  slot_name = &amp;#39;pub_from_scratch&amp;#39;
);
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Confirm that there’s no data loss&lt;/strong&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt; -- On 13
 select max(id) from transactions;
 select count(*) from transactions where id &amp;lt; :max-id;

 -- Wait for :max-id to replicate ...
 -- On 16
 select count(*) from transactions where id &amp;lt; :max-id;
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;16: Run vacuum analyze&lt;/strong&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt; vacuum (verbose, analyze, full);
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;

&lt;/div&gt;

&lt;p&gt;We ran step 6 with bated breath...and it all turned out well! &lt;sup id=&quot;user-content-fnref-12&quot;&gt;&lt;a href=&quot;#user-content-fn-12&quot;&gt;[12]&lt;/a&gt;&lt;/sup&gt; Now we had a replica running Postgres 16.&lt;/p&gt;
&lt;h1&gt;2) Switching Subscriptions&lt;/h1&gt;
&lt;p&gt;Next step: switching subscriptions. Let’s remind ourselves what we’re looking to do:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/posts/pg_upgrade/switch_subs.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;We’d need to get our sync servers to create replication slots in 16, rather than 13.&lt;/p&gt;
&lt;p&gt;To do this, we added a &lt;code&gt;next-database-url&lt;/code&gt; variable to our sync servers. During startup, if &lt;code&gt;next-database-url&lt;/code&gt; was set, sync servers would subscribe from there:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-clojure&quot;&gt;;; invalidator.clj
;; `start` runs when the machine boots up
(defn start
  ([process-id]
    ; ...
    (wal/start-worker {:conn-config
                      (or (config/get-next-aurora-config)
                          ;; Use the next db so that we don&amp;#39;t
                          ;; have to worry about restarting the
                          ;; invalidator when failing over to a
                          ;; new db.
                          (config/get-aurora-config))})
    ; ...
    ))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Once we deployed this change, sync servers replicated from 16. Phew, this was at least one step in the story that didn’t feel nerve-wracking!&lt;/p&gt;
&lt;h1&gt;3) Switching Writes&lt;/h1&gt;
&lt;p&gt;Now to worry about writes:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/posts/pg_upgrade/switch_writes.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;Ultimately, we needed to click some button and trigger a switch. To make the switch work, we’d need to follow two rules:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;16 must be caught up&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;If there are &lt;em&gt;any&lt;/em&gt; writes in 13 that haven’t replicated to 16 yet, we can’t turn on writes to 16. Otherwise transactions would come in the wrong order.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Once caught up, all new writes must go to 16&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;If &lt;em&gt;any&lt;/em&gt; write accidentally goes to 13, we could lose data.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
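&lt;p&gt;As an aside, a generic way to gauge rule 1 on the publisher is to ask Postgres how far each consumer of the WAL has replayed. We ended up using a marker row instead (you’ll see it in our failover function), but this catalog query works for a quick look:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;-- On 13: replication lag in bytes per consumer;
-- 0 means that consumer has replayed everything
select application_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) as lag_bytes
from pg_stat_replication;
&lt;/code&gt;&lt;/pre&gt;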
&lt;p&gt;So, how could we follow these rules?&lt;/p&gt;
&lt;h2&gt;We could stop the world...but that’s downtime&lt;/h2&gt;
&lt;p&gt;The simplest way to switch writes would have been to stop the world:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Turn off all writes.&lt;/li&gt;
&lt;li&gt;Wait for 16 to catch up&lt;/li&gt;
&lt;li&gt;Enable writes again — this time they all go to 16&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;If we manually executed each step in ‘stop the world’, we’d have about a minute of downtime. We could write a function that did these steps for us and get down to only a few seconds of downtime. But we had already spent a day setting up our manual method; could we do better?&lt;/p&gt;
&lt;p&gt;Since we were switching manually we had finer control over our connections. We realized that with just a little bit more work...we could have no downtime at all!&lt;/p&gt;
&lt;p&gt;&lt;a name=&quot;zero-downtime-algo&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;Or we could write an algorithm with zero downtime!&lt;/h2&gt;
&lt;p&gt;Our co-author Daniel shared an algorithm he used at his previous startup:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/posts/pg_upgrade/no_downtime.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;First, we pause all new transactions. Then, we wait for active transactions to complete and for 16 to catch up. Finally we unpause all transactions and have them go to 16. If we did this right, we could switch major versions without any downtime at all!&lt;/p&gt;
&lt;h3&gt;The benefits of being small&lt;/h3&gt;
&lt;p&gt;Sounds good in theory, but it can be hard to pull off. Unless of course you run at a modest scale.&lt;/p&gt;
&lt;p&gt;Our switching algorithm hinges on being able to control all active connections. If you have tons of machines, how could you control all active connections?&lt;/p&gt;
&lt;p&gt;Well, since our throughput was still modest, we could temporarily scale our sync servers down to just one giant machine. Clojure and Java came in handy here too: threads plus an efficient JVM meant we could take full advantage of the &lt;a href=&quot;https://instances.vantage.sh/aws/ec2/m6a.16xlarge?region=us-east-1&amp;os=linux&amp;cost_duration=monthly&amp;reserved_term=Standard.noUpfront&quot;&gt;m6a.16xlarge&lt;/a&gt; sync server we moved to for the switch.&lt;/p&gt;
&lt;h3&gt;Writing out a failover function&lt;/h3&gt;
&lt;p&gt;So we went forward and translated our zero-downtime algorithm into code. Here’s how it looked:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-clojure&quot;&gt;(defn do-failover-to-new-db []
  (let [prev-pool aurora/-conn-pool
        next-pool (start-new-pool next-config)
        next-pool-promise (promise)]

    ;; 1. Make new connections wait
    (alter-var-root #&amp;#39;aurora/conn-pool (fn [_] (fn [] @next-pool-promise)))

    ;; 2. Give existing transactions 2.5 seconds to complete.
    (Thread/sleep 2500)
    ;; Cancel the rest
    (sql/cancel-in-progress sql/default-statement-tracker)

    ;; 3. Wait for 16 to catch up
    (let [tx (transaction-model/create! aurora/-conn-pool
                                        {:app-id (config/instant-config-app-id)})]
      (loop [i 0]
        (if-let [row (sql/select-one next-pool
                                      [&amp;quot;select * from transactions where app_id = ?::uuid and id = ?::bigint&amp;quot;
                                      (config/instant-config-app-id) (:id tx)])]
          (println &amp;quot;we are caught up!&amp;quot;)
          ;; Still waiting...
          (do (Thread/sleep 50)
              (recur (inc i))))))


    ;; 4 accept new connections!
    (deliver next-pool-promise next-pool)
    (alter-var-root #&amp;#39;aurora/-conn-pool (fn [_] next-pool))))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We spun up staging again and ran our failover function...but transactions failed. We were getting unique constraint violations on our transactions table.&lt;/p&gt;
&lt;h3&gt;Don’t forget sequences&lt;/h3&gt;
&lt;p&gt;This time the fix was easy to catch: sequences. Postgres does not &lt;a href=&quot;https://www.postgresql.org/docs/current/logical-replication-restrictions.html&quot;&gt;replicate sequence&lt;/a&gt; data. This meant that when a new &lt;code&gt;transactions&lt;/code&gt; row was created, we were reusing ids that already existed.&lt;/p&gt;
&lt;p&gt;To fix it, we incremented our sequences in the failover function:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-diff&quot;&gt;-           (println &amp;quot;we are caught up!&amp;quot;)
+           (sql/execute! next-pool
+                         [&amp;quot;select setval(&amp;#39;transactions_id_seq&amp;#39;, ?::bigint, true)&amp;quot;
+                         (+ (:id row) 1000)])
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This time we ran the failover function...and it worked great!&lt;/p&gt;
&lt;p&gt;If you’re curious, here’s how the actual failover &lt;a href=&quot;https://github.com/instantdb/instant/blob/main/server/src/instant/jdbc/failover.clj#L25-L87&quot;&gt;function&lt;/a&gt; looked for production.&lt;/p&gt;
&lt;h3&gt;Running in Prod&lt;/h3&gt;
&lt;p&gt;Now that we had a good practice run, we got ourselves ready, had our sparkling waters in hand, and began to run our steps in production.&lt;/p&gt;
&lt;p&gt;After about a 3.5 second pause &lt;sup id=&quot;user-content-fnref-13&quot;&gt;&lt;a href=&quot;#user-content-fn-13&quot;&gt;[13]&lt;/a&gt;&lt;/sup&gt;, the failover function completed smoothly! We had a new Postgres instance serving requests, and best of all, nobody noticed. &lt;sup id=&quot;user-content-fnref-14&quot;&gt;&lt;a href=&quot;#user-content-fn-14&quot;&gt;[14]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;h3&gt;Future Improvements&lt;/h3&gt;
&lt;p&gt;Our &lt;code&gt;do-failover-to-new-db&lt;/code&gt; worked at our scale, but will probably fail us in a few months. There are two improvements we plan to make:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;We paused &lt;em&gt;both&lt;/em&gt; writes and reads. But technically we don’t need to pause reads. Daniel pushed &lt;a href=&quot;https://github.com/instantdb/instant/pull/743&quot;&gt;up a PR&lt;/a&gt; to be explicit about read-only connections. In the future we can skip pausing them.&lt;/li&gt;
&lt;li&gt;In December we were able to scale down to one big machine. We’re approaching the limits of one big machine today. &lt;sup id=&quot;user-content-fnref-15&quot;&gt;&lt;a href=&quot;#user-content-fn-15&quot;&gt;[15]&lt;/a&gt;&lt;/sup&gt; We’re going to try to evolve this into a kind of &lt;code&gt;two-phase-commit&lt;/code&gt;, where each machine reports its stage, and a coordinator progresses when all machines hit the same stage.&lt;/li&gt;
&lt;/ol&gt;
&lt;h1&gt;Fin&lt;/h1&gt;
&lt;p&gt;And that’s the story of how we did our major version upgrade. We wanted to finish with a summary of learnings, in the hope that it’s easier to come back to this essay when you’re considering an upgrade. Here’s what we wish we knew when we started:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Sometimes, newer Postgres versions improve perf. Make sure to check this if you face perf issues.&lt;/li&gt;
&lt;li&gt;If you need to upgrade&lt;ol&gt;
&lt;li&gt;Pick a buddy if you can, it’s a lot more fun (and less nerve-racking) to do this with a partner.&lt;/li&gt;
&lt;li&gt;Before you do anything in production, do a full rehearsal. Use a staging environment that mimics production as closely as possible&lt;/li&gt;
&lt;li&gt;If you are okay with 15 minutes of downtime, do an &lt;a href=&quot;#in-place-upgrade&quot;&gt;in-place upgrade&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;If you don’t have active replication slots and are okay with a minute of downtime, try a &lt;a href=&quot;#blue-green-deployment&quot;&gt;blue-green deployment&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;When you need to do a manual upgrade:&lt;ol&gt;
&lt;li&gt;If you can, skip cloning and create a replica from scratch. There are only &lt;a href=&quot;#replica-checklist&quot;&gt;7 steps&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;If you wrote custom pg functions, make sure to check your &lt;a href=&quot;#search-paths&quot;&gt;search_path&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Do some sanity checks to make sure you don’t &lt;a href=&quot;#missing-data&quot;&gt;lose data&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;If you can get writes down to one machine, try our &lt;a href=&quot;#zero-downtime-algo&quot;&gt;algorithm for zero downtime&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Hopefully, this was a fun read for you :)&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://news.ycombinator.com/item?id=42867657&quot;&gt;Discussion on HN&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Thanks to Nikita Prokopov, Joe Averbukh, Martin Raison, Irakli Safareli, Ian Sinnott for reviewing drafts of this essay&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;sup id=&quot;user-content-fnref-1&quot;&gt;&lt;a href=&quot;#user-content-fn-1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;: Our sync strategy was inspired by Figma’s LiveGraph and Asana’s Luna. The LiveGraph team wrote a &lt;a href=&quot;https://www.figma.com/blog/livegraph-real-time-data-fetching-at-figma/&quot;&gt;great essay&lt;/a&gt; that explains the sync strategy. You can read our original &lt;a href=&quot;https://www.instantdb.com/essays/next_firebase&quot;&gt;design essay&lt;/a&gt; to learn more about Instant&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;user-content-fn-2&quot; href=&quot;#user-content-fnref-2&quot;&gt;[2]&lt;/a&gt;  You may be wondering: how do we host multiple &amp;quot;Instant databases&amp;quot;, under one &amp;quot;Aurora database&amp;quot;? The short answer is that we wrote a query engine on top of Postgres. This lets us create a multi-tenant system where we can &amp;quot;spin up&amp;quot; dbs on demand. I hope to share more about this in a separate essay.&lt;/p&gt;
&lt;p&gt;&lt;sup id=&quot;user-content-fnref-3&quot;&gt;&lt;a href=&quot;#user-content-fn-3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;: All of the code (including this blog) is open sourced &lt;a href=&quot;https://github.com/instantdb/instant&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;sup id=&quot;user-content-fnref-4&quot;&gt;&lt;a href=&quot;#user-content-fn-4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;: &lt;a href=&quot;https://instances.vantage.sh/aws/rds/db.r6g.16xlarge&quot;&gt;db.r6g.16xlarge&lt;/a&gt; would cost us north of 6K per month. That was out of the question for the kind of traffic we were handling.&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;user-content-fn-5&quot; href=&quot;#user-content-fnref-5&quot;&gt;[5]&lt;/a&gt;  In case you were wondering, we also looked to optimize the queries. After we upgraded (took about a day and a half), we added a partial index that improved perf another 50% or so.&lt;/p&gt;
&lt;p&gt;&lt;sup id=&quot;user-content-fnref-6&quot;&gt;&lt;a href=&quot;#user-content-fn-6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;: We did see a note about replication in &lt;a href=&quot;https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/blue-green-deployments-switching.html#blue-green-deployments-switching-guardrails&quot;&gt;&amp;quot;Switchover Guardrails&amp;quot;&lt;/a&gt;, but this note is about the second step: after 1) creating a green deployment, we 2) run the switch.&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;user-content-fn-7&quot; href=&quot;#user-content-fnref-7&quot;&gt;[7]&lt;/a&gt;  The key to discovering this issue was our co-author Daniel’s sleuthing. He planned test upgrades locally: going from 13 → 14 → 15 → 16, to see where things broke. When Daniel tried 13 → 14, it failed. To sanity check things, he then tried a migration from 13 → 13…and that failed too! From there we knew something had to be up with our process.&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;user-content-fn-8&quot; href=&quot;#user-content-fnref-8&quot;&gt;[8]&lt;/a&gt;  An alternative would have been to enhance the dump file with the search path. We like the idea of being more explicit in our definitions though; especially if we can find a good linter.&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;user-content-fn-9&quot; href=&quot;#user-content-fnref-9&quot;&gt;[9]&lt;/a&gt;  Why do we have it? We use the transaction’s id column for record-keeping inside sync servers.&lt;/p&gt;
&lt;p&gt;&lt;sup id=&quot;user-content-fnref-10&quot;&gt;&lt;a href=&quot;#user-content-fn-10&quot;&gt;[10]&lt;/a&gt;&lt;/sup&gt;: If you are curious, you can look at a slice of the checklist we used &lt;a href=&quot;https://gist.github.com/stopachka/f05d3682223e206ed6465cafe3ec9f2a&quot;&gt;here&lt;/a&gt;. If you have a hunch where the data loss could have come from, let us know.&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;user-content-fn-11&quot; href=&quot;#user-content-fnref-11&quot;&gt;[11]&lt;/a&gt;  Though even with 30TB, it would only take about a week to transfer at a modest 50 MB/second.&lt;/p&gt;
&lt;p&gt;&lt;sup id=&quot;user-content-fnref-12&quot;&gt;&lt;a href=&quot;#user-content-fn-12&quot;&gt;[12]&lt;/a&gt;&lt;/sup&gt;: You may be wondering — sure, the transactions table was okay, but what if there was data loss in other tables? We wrote a &lt;a href=&quot;https://github.com/instantdb/instant/blob/main/server/src/instant/jdbc/failover.clj#L258&quot;&gt;more involved script&lt;/a&gt; to check for every table too. We really wanted to make sure there was no data loss.&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;user-content-fn-13&quot; href=&quot;#user-content-fnref-13&quot;&gt;[13]&lt;/a&gt;  About 2.5 seconds to let active queries complete, and about 1 second for the replica to catch up&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;user-content-fn-14&quot; href=&quot;#user-content-fnref-14&quot;&gt;[14]&lt;/a&gt;  You may be wondering, how did we run the function? Where’s the feature flag? That’s one more Clojure win: we could SSH into production, and execute this function in our REPL!&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;user-content-fn-15&quot; href=&quot;#user-content-fnref-15&quot;&gt;[15]&lt;/a&gt;  The big bottleneck is all the active websocket connections on one machine — it slows down the sync engine too much. If we improve perf, perhaps we can get to one big machine again!&lt;/p&gt;
</description>
      <pubDate>Wed, 29 Jan 2025 00:00:00 GMT</pubDate>
      <author>Stepan Parunashvili, Daniel Woelfel</author>
    </item>
    <item>
      <title>Video: Building a Sync Engine in Clojure</title>
      <link>https://instantdb.com/essays/conj</link>
      <guid isPermaLink="true">https://instantdb.com/essays/conj</guid>
      <description>&lt;p&gt;&lt;div class=&quot;md-video-container&quot;&gt;
  &lt;iframe
    width=&quot;100%&quot;
    src=&quot;https://www.youtube.com/embed/6FikTQf8qho?rel=0&amp;modestbranding=1&amp;playsinline=1&amp;autoplay=0&amp;cc_load_policy=1&quot;
    title=&quot;Building a Sync Engine in Clojure&quot;
    allow=&quot;autoplay; picture-in-picture&quot;
    allowfullscreen
  &gt;&lt;/iframe&gt;
&lt;/div&gt;&lt;/p&gt;
&lt;p&gt;Following up from &lt;a href=&quot;https://www.instantdb.com/essays/next_firebase&quot;&gt;A Graph-Based
Firebase&lt;/a&gt;, Stopa (CTO of Instant) gave a talk about building Instant at Clojure Conj 2024! In this
talk he discusses the common schleps developers face when building apps, and how
Instant compresses them.&lt;/p&gt;
&lt;p&gt;Give this a watch if you’re interested in learning more about how Instant works under the hood!&lt;/p&gt;
</description>
      <pubDate>Thu, 24 Oct 2024 00:00:00 GMT</pubDate>
      <author>Joe Averbukh</author>
    </item>
    <item>
      <title>Instant raises $3.4M seed to build a modern Firebase</title>
      <link>https://instantdb.com/essays/seed</link>
      <guid isPermaLink="true">https://instantdb.com/essays/seed</guid>
      <description>&lt;p&gt;&lt;img src=&quot;/img/essays/seed_header.png&quot; alt=&quot;Instant raises $3.4M seed to build a modern Firebase&quot;&gt;&lt;/p&gt;
&lt;p&gt;One month ago we open sourced Instant and had one of the &lt;a href=&quot;https://news.ycombinator.com/item?id=41322281&quot;&gt;largest Show HN’s&lt;/a&gt; for a YC company. Today we’re &lt;a href=&quot;https://techcrunch.com/2024/10/02/instant-harkens-back-to-a-pre-google-firebase/&quot;&gt;announcing our $3.4M seed&lt;/a&gt;. We’re backed by YCombinator, SV Angel, and a number of technical angels, including James Tamplin, the original CEO of Firebase, Paul Graham, Co-founder of YCombinator, Greg Brockman, Co-founder of OpenAI, and Jeff Dean, chief scientist of Google DeepMind.&lt;/p&gt;
&lt;h2&gt;What is Instant?&lt;/h2&gt;
&lt;p&gt;In two sentences: Instant is a modern Firebase. We make you productive by giving your frontend a real-time database.&lt;/p&gt;
&lt;p&gt;What does that actually mean?&lt;/p&gt;
&lt;p&gt;Imagine you’re a hacker who loves building apps. You have an exciting idea, and are ready to &lt;strong&gt;make something people want.&lt;/strong&gt; You want to build an MVP fast, that doesn’t completely suck. So how do you do it?&lt;/p&gt;
&lt;p&gt;Most of the time we make a three-tier architecture with client, server, and a database. On the server side we write endpoints to glue our frontend with our database. We might use an ORM to make it easier to work with our db, and add a cache to serve requests faster. On the client we need to reify json from the server and paint a screen. We add stores to manage state, and write mutations to handle updates. This is just for basic functionality.&lt;/p&gt;
&lt;p&gt;If we want our UIs to feel fast, we write optimistic updates so we don’t need to wait for the server. If we want live updates without refreshing we either poll or add websockets. And if we want to support offline mode, we need to integrate IndexedDB and pending transaction queues.&lt;/p&gt;
&lt;p&gt;That’s a lot of work!&lt;/p&gt;
&lt;p&gt;To make things worse, whenever we add a new feature, we go through the same song and dance over and over again: add models to our DB, write endpoints on our server, create stores in our frontend, write mutations, optimistic updates, etc.&lt;/p&gt;
&lt;p&gt;Could it be better? We think so!&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://www.instantdb.com/readmes/compression.svg&quot; alt=&quot;Instant compresses the schleps!&quot;&gt;&lt;/p&gt;
&lt;p&gt;If you had a database on the client, you wouldn’t need to manage stores, selectors, endpoints, caches, etc. You could just write queries to fetch the data you want. If these queries were reactive, you wouldn’t have to write extra logic to re-fetch whenever new data appears. Similarly you could just make transactions to apply mutations. These transactions could apply changes optimistically and be persisted locally. Putting this all together, you can build delightful applications without the normal schleps.&lt;/p&gt;
&lt;p&gt;So we built Instant. Instant gives you a database you can use in the client, so you can focus on what’s important: &lt;strong&gt;building a great UX for your users, and doing it quickly&lt;/strong&gt;.&lt;/p&gt;
&lt;h2&gt;How is Instant different from Firebase or Supabase?&lt;/h2&gt;
&lt;p&gt;&lt;img src=&quot;/img/essays/comparison_matrix.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;You may be wondering: what makes Instant more modern than Firebase, and how is it different from Supabase?&lt;/p&gt;
&lt;p&gt;Both Firebase and Supabase provide a database on the client as well. Firebase comes with realtime, optimistic updates, and offline mode, but does not support relations. Supabase is relational at its core, but optimistic updates and offline mode need to be hand-rolled for every feature. If you could have Firebase with relations, you’d have infrastructure capable of building some of the best apps today, like Figma, Notion, or Linear.&lt;/p&gt;
&lt;p&gt;Our architecture is inspired by &lt;a href=&quot;https://www.figma.com/blog/livegraph-real-time-data-fetching-at-figma/&quot;&gt;Figma’s LiveGraph&lt;/a&gt; and &lt;a href=&quot;https://blog.asana.com/2020/09/worldstore-distributed-caching-reactivity-part-2/&quot;&gt;Asana’s LunaDB&lt;/a&gt;. We also built Instant to be multi-tenant, so we don’t need to spin up a separate database for each user. This lets us give users a database in &amp;lt;10ms at the click of a button. And unlike our competitors, we can offer a free tier where projects are never paused and there is no limit on the number of active projects.&lt;/p&gt;
&lt;p&gt;To learn more about how Instant works under the hood, check out our essay &lt;a href=&quot;https://www.instantdb.com/essays/next_firebase&quot;&gt;A Graph-Based Firebase&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Who is Instant?&lt;/h2&gt;
&lt;p&gt;We’re &lt;a href=&quot;https://linkedin.com/in/joeaverbukh&quot;&gt;Joe&lt;/a&gt; and &lt;a href=&quot;https://x.com/stopachka&quot;&gt;Stopa&lt;/a&gt;, engineers, best friends, and co-founders. We first met in San Francisco in 2014 and worked together as senior and staff engineers at Facebook and Airbnb.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/img/essays/joe_stopa.jpg&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;When we worked at Facebook, most designers used Sketch. At that time no one thought there could be something better. Figma came out and changed the game. Similarly, in the 2010s, Evernote was one of the best note taking apps. In 2024 most people use Notion instead.&lt;/p&gt;
&lt;p&gt;Features like multiplayer, optimistic updates, and offline mode are what differentiate the best apps. As app users grow accustomed to instant experiences, reactivity will become table stakes for modern applications. Today delivering these features is difficult and requires a bespoke solution from a team of engineers at top tech companies. In the future, there will be infrastructure that all developers use to get these features for free.&lt;/p&gt;
&lt;p&gt;That’s what we’re building with Instant, a platform to build applications of the future.&lt;/p&gt;
&lt;h2&gt;Instant is growing&lt;/h2&gt;
&lt;p&gt;After being heads down for two years, we &lt;a href=&quot;https://github.com/instantdb/instant&quot;&gt;open sourced&lt;/a&gt; Instant at the end of August 2024. The same day, we announced on Hacker News, amassed over 1k points, and &lt;a href=&quot;https://hnrankings.info/41322281/&quot;&gt;hit #1 for several hours&lt;/a&gt;. It’s been a whirlwind since.&lt;/p&gt;
&lt;p&gt;We’re getting a new office in San Francisco and looking for founding engineers to grow Instant. If you want to be part of a small team solving some of the hardest problems in web development &lt;a href=&quot;/hiring&quot;&gt;check out our hiring page!&lt;/a&gt;&lt;/p&gt;
</description>
      <pubDate>Tue, 01 Oct 2024 00:00:00 GMT</pubDate>
      <author>Joe Averbukh</author>
    </item>
    <item>
      <title>A Graph-Based Firebase</title>
      <link>https://instantdb.com/essays/next_firebase</link>
      <guid isPermaLink="true">https://instantdb.com/essays/next_firebase</guid>
<description>&lt;p&gt;In &lt;a href=&quot;/essays/db_browser&quot;&gt;A Database in the Browser&lt;/a&gt;, I wrote that the schleps we face as UI engineers are actually database problems in disguise &lt;sup id=&quot;user-content-fnref-1&quot;&gt;&lt;a href=&quot;#user-content-fn-1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;. This raised the question: would a database-looking solution solve them?&lt;/p&gt;
&lt;p&gt;My co-founder Joe and I decided to build one and find out. This became &lt;a href=&quot;https://instantdb.com/&quot;&gt;Instant&lt;/a&gt;. I’d describe it as a graph-based successor to Firebase.&lt;/p&gt;
&lt;p&gt;You have relational queries, auth, and permissions. Optimistic updates come out of the box, and everything is reactive. It&amp;#39;s an architecture you can use today.&lt;/p&gt;
&lt;p&gt;Working on Instant has felt like an evolutionary process. We picked constraints and followed the path that unfolded. This led us to places we would never have predicted. For example, we started with SQL but ended up with a triple store and a query language that transpiles to Datalog.&lt;/p&gt;
&lt;p&gt;What were these constraints? Why triple store? What query language? In this essay, I’ll walk you through the design journey — from problems to solve, to choices made, to what’s next.&lt;/p&gt;
&lt;p&gt;I hope by the end, you’re as excited as I am about what this could mean for building apps and the people who use them.&lt;/p&gt;
&lt;h1&gt;Delightful Apps&lt;/h1&gt;
&lt;p&gt;Our journey starts by looking at what exists today. Think about the most &lt;em&gt;delightful&lt;/em&gt; apps you’ve tried. What comes to mind? To me, it’s apps like Figma, Linear, and Notion. And if you asked why, I’d say three reasons: Optimistic Updates, Multiplayer, and Offline-Mode.&lt;/p&gt;
&lt;h2&gt;Optimistic Updates&lt;/h2&gt;
&lt;p&gt;Once you’re in the flow of Figma or Notion, you rarely see a loading screen. This is because every change you make is applied instantly. It’s painful to do this well. You need a method for applying changes on the client and server. You need a queue to maintain order. You need undo. And the edge cases get daunting: if you have multiple changes waiting and the first one fails, what should happen? You need some way to cancel the dependents &lt;sup id=&quot;user-content-fnref-2&quot;&gt;&lt;a href=&quot;#user-content-fn-2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
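&lt;p&gt;The queue-and-cancel machinery can be sketched like this (a deliberately simplified model, not how any of these apps actually implement it; real systems cancel only true dependents rather than everything queued afterwards):&lt;/p&gt;

```javascript
// Sketch of an optimistic mutation queue. Each mutation applies locally right
// away; if the server rejects one, it and everything queued after it (its
// potential dependents) roll back, newest first.
function createMutationQueue(state) {
  const pending = []; // { apply, undo } pairs, in order

  return {
    push(mutation) {
      mutation.apply(state); // optimistic: apply immediately
      pending.push(mutation);
    },
    // Server confirmed the oldest pending mutation.
    confirmOldest() {
      pending.shift();
    },
    // Server rejected the oldest pending mutation: undo it and every
    // later mutation, since they may depend on it.
    rejectOldest() {
      for (const m of pending.reverse()) m.undo(state);
      pending.length = 0;
    },
  };
}

// Usage: two dependent edits; the first fails, so both roll back.
const doc = { title: "draft", archived: false };
const queue = createMutationQueue(doc);
queue.push({ apply: (s) => { s.archived = true; }, undo: (s) => { s.archived = false; } });
queue.push({ apply: (s) => { s.title = "final"; }, undo: (s) => { s.title = "draft"; } });
queue.rejectOldest();
```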
&lt;p&gt;Challenging to build but transformative once done. Interaction time changes how you use an application. Get fast enough, and your fingertips become your primary constraint. I think this is the key to unlocking flow. &lt;sup id=&quot;user-content-fnref-3&quot;&gt;&lt;a href=&quot;#user-content-fn-3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;h2&gt;Multiplayer&lt;/h2&gt;
&lt;p&gt;Speed itself is delightful, but it’s taken further with multiplayer. Every feature in Linear is collaborative by default. Assigned a task? All active sessions see your change. &lt;sup id=&quot;user-content-fnref-4&quot;&gt;&lt;a href=&quot;#user-content-fn-4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;There’s a pattern to multiplayer too. Developers think it’s a nice-to-have. But then some company builds it, and we’re stunned by the result. Figma did this for Sketch, and Notion did this for Evernote.&lt;/p&gt;
&lt;p&gt;But most apps aren’t multiplayer. This isn’t because we’ve hit a sweet spot of text editors, task managers, and design tools. Multiplayer is just too hard to build. &lt;sup id=&quot;user-content-fnref-5&quot;&gt;&lt;a href=&quot;#user-content-fn-5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;h2&gt;Offline-Mode&lt;/h2&gt;
&lt;p&gt;Finally, delightful apps work offline. Some don’t work &lt;em&gt;completely&lt;/em&gt; offline, but they all handle spotty connections.&lt;/p&gt;
&lt;p&gt;And offline-mode has the same pattern as multiplayer. It feels like a nice-to-have, but build it and you leap past your competitors. Why? Two reasons:&lt;/p&gt;
&lt;p&gt;First, though internet connectivity is abundant, there’s a tail end. The subway, the airplane, the spotty cafe. Seems minor, but eliminating the tail end can be transformative. When we know that an app will work &lt;em&gt;no matter what&lt;/em&gt;, we use it differently. &lt;sup id=&quot;user-content-fnref-6&quot;&gt;&lt;a href=&quot;#user-content-fn-6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;Second, your app becomes &lt;em&gt;even faster&lt;/em&gt;. Offline-mode amortizes read latency. For example, the first time you load Linear, it may take time to fetch everything. But then, subsequent loads feel instant; you’ll just see offline data first. &lt;sup id=&quot;user-content-fnref-7&quot;&gt;&lt;a href=&quot;#user-content-fn-7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;h1&gt;Applications from The Future&lt;/h1&gt;
&lt;p&gt;Combine these features, and you get an application available everywhere, as fast as your fingertips, and multiplayer by default.&lt;/p&gt;
&lt;p&gt;Compared to the average web app, this is a difference in kind. Linear is so fast that you fall into flow states closing tasks. No one would say this about Jira. Notion’s offline-mode lets you store every note there. People don’t do this in Dropbox Paper. In Figma, two designers can collaborate on the same file. This was unheard of in the days of Sketch.&lt;/p&gt;
&lt;p&gt;These applications let you work in new ways. They become tools that you can master. And I think this is how most apps will be in the future. We prefer the experience, and the Notions of the world teach us to expect it.&lt;/p&gt;
&lt;p&gt;As an industry, we’ll need to find new abstractions that make building apps like this easy. I think it’s worth the effort to find them now.&lt;/p&gt;
&lt;h1&gt;Bespoke Solutions&lt;/h1&gt;
&lt;p&gt;So let’s try to discover this abstraction. What works today? Linear and Notion exist; how do they do it?&lt;/p&gt;
&lt;p&gt;Thankfully there’s lots &lt;sup id=&quot;user-content-fnref-8&quot;&gt;&lt;a href=&quot;#user-content-fn-8&quot;&gt;[8]&lt;/a&gt;&lt;/sup&gt; of &lt;sup id=&quot;user-content-fnref-9&quot;&gt;&lt;a href=&quot;#user-content-fn-9&quot;&gt;[9]&lt;/a&gt;&lt;/sup&gt; interesting &lt;sup id=&quot;user-content-fnref-10&quot;&gt;&lt;a href=&quot;#user-content-fn-10&quot;&gt;[10]&lt;/a&gt;&lt;/sup&gt; work &lt;sup id=&quot;user-content-fnref-11&quot;&gt;&lt;a href=&quot;#user-content-fn-11&quot;&gt;[11]&lt;/a&gt;&lt;/sup&gt; that explains their architecture. Here’s a simplified view:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/img/essays/architecture.png&quot; alt=&quot;The architecture&quot;&gt;&lt;/p&gt;
&lt;p&gt;Let’s go bottom-up:&lt;/p&gt;
&lt;h2&gt;A. DB&lt;/h2&gt;
&lt;p&gt;On the backend we start with a database. Users want a live view of some subset of data. We can keep live views by either polling the database or leveraging a write-ahead log. &lt;sup id=&quot;user-content-fnref-12&quot;&gt;&lt;a href=&quot;#user-content-fn-12&quot;&gt;[12]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
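&lt;p&gt;Polling is the simpler of the two to sketch (illustrative only; a production backend would tail the write-ahead log rather than re-run the query on a timer):&lt;/p&gt;

```javascript
// Sketch of a poll-based live query. Re-run the query, diff against the last
// result, and notify subscribers only when something actually changed.
function createLiveQuery(runQuery, notify) {
  let lastResult = JSON.stringify(runQuery());
  // Call this on a timer (e.g. setInterval); kept manual here for clarity.
  return function check() {
    const next = JSON.stringify(runQuery());
    if (next !== lastResult) {
      lastResult = next;
      notify(JSON.parse(next));
    }
  };
}

// Usage: a plain array stands in for a real table.
const tasks = [];
let pushed = null;
const check = createLiveQuery(
  () => tasks.filter((t) => !t.done),
  (rows) => { pushed = rows; }
);
check(); // no change yet, nothing pushed
tasks.push({ title: "write essay", done: false });
check(); // change detected, subscribers notified
```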
&lt;h2&gt;B. Permissions&lt;/h2&gt;
&lt;p&gt;The DB gives us a set of results, but we can’t just send this data up to users. We need to filter for what they are allowed to see.&lt;/p&gt;
&lt;p&gt;So we build a permission layer. This starts simple. But as an app gets complex, permissions resemble their own language. Facebook had the best design I’ve seen. Here’s how it looked:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;function IDenyIfArchived(_user, task) {
  if (task.isArchived) {
    return deny();
  }
  return allow();
}
// ...
{
  &amp;quot;task&amp;quot;: {
    read: [
      IAllowIfTeamUser,
    ],
    write: [
      IDenyIfArchived,
      IAllowIfTeamUser,
    ],
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Developers write a set of IAllow or IDeny rules per model. Since all reads and writes go through this layer, engineers can be sure that their queries are safe. &lt;sup id=&quot;user-content-fnref-13&quot;&gt;&lt;a href=&quot;#user-content-fn-13&quot;&gt;[13]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;h2&gt;C. Sockets&lt;/h2&gt;
&lt;p&gt;Now we reach the websocket layer. Clients subscribe to different topics. For Notion, it could be “documents and comments.” Or for Linear it could be “team, task, and users.”&lt;/p&gt;
&lt;p&gt;Backend developers hand-craft live queries to satisfy these topics. There’s a balancing act to play here. The more complicated the query, the harder it is to keep a live view. &lt;sup id=&quot;user-content-fnref-14&quot;&gt;&lt;a href=&quot;#user-content-fn-14&quot;&gt;[14]&lt;/a&gt;&lt;/sup&gt; So we need to simplify queries as much as possible. Most often, this means we skip pagination and overfetch. &lt;sup id=&quot;user-content-fnref-15&quot;&gt;&lt;a href=&quot;#user-content-fn-15&quot;&gt;[15]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;h2&gt;D. In-Memory Store&lt;/h2&gt;
&lt;p&gt;Now we move to the frontend. Sockets funnel all this data into an in-memory store:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;const Store = {
  teams: {
    teamIdA: {...}
  },
  users: {
    userIdA: {...}
  },
  tasks: {
    taskIdA: {..., teamId: &amp;quot;teamIdA&amp;quot;, ownerId: &amp;quot;userIdA&amp;quot;}
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We do this so all screens have consistent information. For example, if a user changes their profile picture, we should see updates everywhere. The best way to do that is to keep data normalized and in one place.&lt;/p&gt;
&lt;h2&gt;E. IndexedDB&lt;/h2&gt;
&lt;p&gt;But we need our app to work offline too. So we back our store with durable storage. For web this is IndexedDB. When our app loads, we hydrate the store with what was saved before. This is what enables offline-mode and amortizes read latency.&lt;/p&gt;
&lt;h2&gt;F. Screens&lt;/h2&gt;
&lt;p&gt;Okay, time to paint screens. Right now we have a store with normalized data. But normalized data isn’t directly useful for rendering. What a screen wants is a graph. Say we show a “team tasks” page in Linear; we’d want team info, all the tasks for the team, and the owner for the task:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/img/essays/screens_want_graphs.png&quot; alt=&quot;Screens want graphs&quot;&gt;&lt;/p&gt;
&lt;p&gt;We can build this with a javascript function:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;function dataForTaskPage(store, teamId) {
  return {
    ...store.teams[teamId],
    tasks: store.tasksForTeam(teamId).map((task) =&amp;gt; {
      return { ...task, owner: store.users[task.ownerId] };
    }),
  };
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If this causes too many re-renders, we can memoize it or use some kind of dirty-checking. With that, we have a page a user can interact with.&lt;/p&gt;
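&lt;p&gt;The dirty-checking can be as simple as a version number on the store. A sketch (the &lt;code&gt;version&lt;/code&gt; field is our invention here, not part of any real store):&lt;/p&gt;

```javascript
// Sketch of version-based memoization: reuse the last result while the store
// version and arguments are unchanged, recompute after any write.
function memoizeByVersion(selector) {
  let lastVersion = -1;
  let lastArgs = null;
  let lastResult = null;
  return function (store, ...args) {
    const sameArgs = lastArgs !== null
      ? args.every((a, i) => a === lastArgs[i])
      : false;
    if (store.version === lastVersion) {
      if (sameArgs) return lastResult; // store unchanged: reuse old object
    }
    lastVersion = store.version;
    lastArgs = args;
    lastResult = selector(store, ...args);
    return lastResult;
  };
}

// Usage: identical results share a reference, so the UI can skip re-renders.
const store = { version: 0, teams: { t1: { name: "Awesome Team" } } };
const getTeam = memoizeByVersion((s, id) => ({ ...s.teams[id] }));
const a = getTeam(store, "t1");
const b = getTeam(store, "t1"); // same version: same object
store.version++;                // a write happened
const c = getTeam(store, "t1"); // recomputed: new object
```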
&lt;h2&gt;G. Mutations&lt;/h2&gt;
&lt;p&gt;Then users make changes. We want those changes to feel instant, so we support optimistic updates. This is how it usually looks:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/img/essays/mutation_system.png&quot; alt=&quot;Mutation system&quot;&gt;&lt;/p&gt;
&lt;p&gt;Whatever mutations we make, our local store and our server both need to understand them. This way we can apply changes immediately.&lt;/p&gt;
&lt;p&gt;To do this well, we need to support undo. We need to maintain order, and we need to be able to cancel dependent mutations. Hard stuff, but Linear, Figma, and Notion all go through the schlep.&lt;/p&gt;
&lt;p&gt;Once this is done, we’ve got an application from the future on our hands.&lt;/p&gt;
&lt;h1&gt;What Exists&lt;/h1&gt;
&lt;p&gt;Oof. Lots of custom work. Could these apps have used an existing tool instead?&lt;/p&gt;
&lt;h2&gt;Firebase&lt;/h2&gt;
&lt;p&gt;Firebase comes closest. It has optimistic updates out of the box. It supports offline mode and is reactive by default. But, I think Firebase has two dealbreakers: relations and permissions.&lt;/p&gt;
&lt;h3&gt;Relations&lt;/h3&gt;
&lt;p&gt;The biggest dealbreaker is Firebase’s query strength. You’re limited to document lookups. When Firebase was built, this was a great tradeoff to make. It’s simpler to support optimistic updates and offline mode for document stores. But for sophisticated apps, you &lt;em&gt;need&lt;/em&gt; relations.&lt;/p&gt;
&lt;p&gt;Figma, Notion, and Linear all have relations. Notion has a recursive model where blocks reference other blocks. Linear has users, tasks, and teams. Figma has documents, objects and properties.&lt;/p&gt;
&lt;p&gt;If you need relations, document stores explode in complexity. You end up having to implement your own joins with hand-tuned caches. Another schlep.&lt;/p&gt;
&lt;h3&gt;Permissions&lt;/h3&gt;
&lt;p&gt;The second dealbreaker is Firebase’s permission system. &lt;sup id=&quot;user-content-fnref-16&quot;&gt;&lt;a href=&quot;#user-content-fn-16&quot;&gt;[16]&lt;/a&gt;&lt;/sup&gt; Firebase Realtime has a language that looks like a long boolean expression:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;auth != null &amp;amp;&amp;amp; (!data.exists() || data.child(&amp;#39;users&amp;#39;).hasChild(auth.id));
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This gets unmaintainable fast &lt;sup id=&quot;user-content-fnref-17&quot;&gt;&lt;a href=&quot;#user-content-fn-17&quot;&gt;[17]&lt;/a&gt;&lt;/sup&gt;. It improved in Firestore — there’s now a function-like abstraction:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;function isAuthorOrAdmin(userId, article) {
  let isAuthor = article.author == userId;
  let isAdmin = exists(/databases/$(database)/documents/admins/$(userId));
  return isAuthor || isAdmin;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;But again, this wasn’t built for complex use cases. There’s no way to write an early return statement, for example. If we’re aiming for Linear, Figma, or Notion, we need a system that can scale to complex rules.&lt;/p&gt;
&lt;h2&gt;Supabase, Hasura&lt;/h2&gt;
&lt;p&gt;So Firebase won’t work. What about Supabase or Hasura?&lt;/p&gt;
&lt;p&gt;They solve Firebase’s greatest dealbreaker: both Supabase and Hasura support relations.&lt;/p&gt;
&lt;p&gt;But they do this at the expense of a local abstraction. Neither supports offline-mode or optimistic updates. Multiplayer is still crude: you write basic subscriptions and manage the client yourself.&lt;/p&gt;
&lt;p&gt;Supabase and Hasura also don’t have a powerful permission system. They use Postgres’s Row-Level Security: permissions are written as policies. But this won’t work for sophisticated apps. You’ll need to write so many policies that they become impossible to reason about. It’ll get slow too — the planner will struggle with them.&lt;/p&gt;
&lt;h1&gt;The Missing Column&lt;/h1&gt;
&lt;p&gt;So Firebase has a great local abstraction, but no support for relations. Supabase and Hasura support relations, but have a poor local abstraction. Put this in a table and you have an interesting column to think about:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/img/essays/comparison_matrix.png&quot; alt=&quot;Matrix&quot;&gt;&lt;/p&gt;
&lt;p&gt;What if a tool could support relations and a local abstraction? You could write any query that a Figma, Linear, or Notion would need. And you could handle all of the hard work they do locally: optimistic updates, multiplayer, and offline-mode.&lt;/p&gt;
&lt;p&gt;Add support for complex permissions, and you have a tool to build applications from the future!&lt;/p&gt;
&lt;h1&gt;Inspiration&lt;/h1&gt;
&lt;p&gt;A daunting column to satisfy. But again, if we look at how Figma, Linear, and Notion work, we find clues. Squint, and their architecture looks like a database!&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/img/essays/generalization.png&quot; alt=&quot;Generalization&quot;&gt;&lt;/p&gt;
&lt;p&gt;Again, screens need consistent data. Previously, we wrote functions and got data from the store. Remember &lt;code&gt;dataForTasksPage&lt;/code&gt;?&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;function dataForTaskPage(store, teamId) {
  return {
    ...store.teams[teamId],
    tasks: store.tasksForTeam(teamId).map((task) =&amp;gt; {
      return { ...task, owner: store.users[task.ownerId] };
    }),
  };
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Well, this is just a query! If we had a local database — let’s call it Local DB — that understood some GraphQL-looking language, we could instead declare:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-clojure&quot;&gt;teams {
  ...
  tasks: {
    ...
    owner: {
      ...
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And voila, we’d have data for our screens.&lt;/p&gt;
&lt;p&gt;Next, recall that we backed our store with IndexedDB. Well, databases are good at caching. Our Local DB could back itself up in IndexedDB!&lt;/p&gt;
&lt;p&gt;And the mutation system? If our Local DB and Backend DB spoke the same language, both could understand and apply the same mutations. Local DB can handle undo/redo, and with that we have optimistic updates out of the box.&lt;/p&gt;
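&lt;p&gt;One way to picture “speaking the same language”: represent each mutation as plain data, and run the same interpreter on both sides. A sketch (the step format here is illustrative, not Instant’s actual wire format):&lt;/p&gt;

```javascript
// Sketch: mutations as data that both client and server can interpret.
// A tx step is [op, entity, id, attrs]; the interpreter is the same function
// on both sides, so replaying steps always produces the same state.
function applyTxSteps(db, steps) {
  for (const [op, entity, id, attrs] of steps) {
    db[entity] = db[entity] || {};
    if (op === "update") {
      db[entity][id] = { ...db[entity][id], ...attrs };
    } else if (op === "delete") {
      delete db[entity][id];
    }
  }
  return db;
}

// The client applies the steps optimistically...
const localDb = {};
applyTxSteps(localDb, [["update", "tasks", "t3", { title: "Code" }]]);

// ...and ships the same steps to the server, which replays them verbatim.
const serverDb = {};
applyTxSteps(serverDb, [["update", "tasks", "t3", { title: "Code" }]]);
```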
&lt;p&gt;What about sockets? Databases handle replication. So what if we made the client a special node? The Local DB already knows the queries to satisfy. So it can talk to the backend and get the data it needs.&lt;/p&gt;
&lt;p&gt;On the backend, what if we had the same kind of permission system that Facebook had? We’d have a fully expressive language that could scale to complex rules.&lt;/p&gt;
&lt;p&gt;Make the Backend DB handle live queries, and we have all the pieces for our missing column!&lt;/p&gt;
&lt;h1&gt;Local DB&lt;/h1&gt;
&lt;p&gt;Let’s dive into our Local DB first. This is what’s going to handle queries, caching, and talking to our server. If we do this right, we inform everything else.&lt;/p&gt;
&lt;h2&gt;Requirements&lt;/h2&gt;
&lt;p&gt;The minimum our Local DB needs is support for relations. Whatever we do, we should be able to express “Give me team info, related tasks, and the owner for each task”.&lt;/p&gt;
&lt;p&gt;We should also support recursive queries. For Notion, we need to say “Give me a block and expand all children recursively”.&lt;/p&gt;
&lt;p&gt;Our Local DB should also be easy to use. Firebase is famous for this. You can start working with a single index.html file. API calls are consistent and simple. You don’t need to specify a schema to get started. We should be just as easy to use. &lt;sup id=&quot;user-content-fnref-18&quot;&gt;&lt;a href=&quot;#user-content-fn-18&quot;&gt;[18]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;And our Local DB should be light. At least on the client. Yes we can cache the download. But I don’t think developers will take you up on an offer that doubles their bundle.&lt;/p&gt;
&lt;p&gt;Finally, our Local DB should be simple. Every feature in our Local DB needs to be supported by our multiplayer backend. This won’t ship if our spec is too large.&lt;/p&gt;
&lt;h1&gt;Exploring SQL&lt;/h1&gt;
&lt;p&gt;A SQL-based tool is closest at hand. I enjoyed looking at &lt;a href=&quot;https://github.com/jlongster/absurd-sql&quot;&gt;absurd-sql&lt;/a&gt;. This uses sql.js (SQLite compiled to WebAssembly) and persists state into IndexedDB.&lt;/p&gt;
&lt;p&gt;SQL is battle tested and supports a wide array of features. But if you take the constraints we set out, you’ll see it’s a bad bet.&lt;/p&gt;
&lt;h2&gt;Schema and Size&lt;/h2&gt;
&lt;p&gt;My investigation began with two light issues.&lt;/p&gt;
&lt;p&gt;First, SQL has a schema. A schema is useful, but it makes getting started harder than Firebase does. You can hack immediately in Firebase, but a schema requires upfront work. &lt;sup id=&quot;user-content-fnref-19&quot;&gt;&lt;a href=&quot;#user-content-fn-19&quot;&gt;[19]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;Second, there’s size. sql.js is about 400KB gzipped. Yes, it can be cached, but I just don’t see most apps adopting a library that adds this much overhead.&lt;/p&gt;
&lt;p&gt;Both reservations have reasonable counters. We could infer a schema on our user’s behalf, or write a lighter implementation of SQL. With problems like this we could have moved forward.&lt;/p&gt;
&lt;h2&gt;Language&lt;/h2&gt;
&lt;p&gt;But SQL as a language turns out to be a dealbreaker. SQL isn’t simple or easy. It’s a sprawling combination of features, with little of it useful for the frontend.&lt;/p&gt;
&lt;p&gt;Consider the most common query for UIs: Fetch nested relations. Remember our &lt;code&gt;dataForTaskPage&lt;/code&gt;?&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;function dataForTaskPage(store, teamId) {
  return {
    ...store.teams[teamId],
    tasks: store.tasksForTeam(teamId).map((task) =&amp;gt; {
      return { ...task, owner: store.users[task.ownerId] };
    }),
  };
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is one SQL query for it:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;SELECT
  teams.*, tasks.*, owner.*
FROM teams
JOIN tasks ON tasks.team_id = teams.id
JOIN users as owner ON tasks.owner_id = owner.id
WHERE teams.id = ?
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And it works. But it’s inconvenient. Our query will return an exploded list of rows. Each row represents an owner, with tasks and teams duplicated. But what we actually wanted was a nested structure. Something like:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;{
  teams: [{id: 2, name: &amp;quot;Awesome Team&amp;quot;, tasks: [{..., owner: {}}, ...]}, ...]
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;To make this work, we could use a &lt;code&gt;GROUP BY&lt;/code&gt; with &lt;code&gt;json_group_array&lt;/code&gt; and &lt;code&gt;json_object&lt;/code&gt;. Like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;SELECT
  teams.*,
  json_group_array(
    json_object(
      &amp;#39;id&amp;#39;, tasks.id,
      &amp;#39;title&amp;#39;, tasks.title,
      &amp;#39;owner&amp;#39;, json_object(&amp;#39;id&amp;#39;, owner.id, &amp;#39;name&amp;#39;, owner.name))
  ) as tasks
FROM teams
JOIN tasks ON tasks.team_id = teams.id
JOIN users as owner ON owner.id = tasks.owner_id
WHERE teams.id = ?
GROUP BY teams.id
&lt;/code&gt;&lt;/pre&gt;
&lt;p align=&quot;center&quot;&gt;
  &lt;em&gt;Try it &lt;a href=&quot;https://sqlime.org/#gist:3e02f01fdc8a0d131a5a07ac7b4a6d70&quot; target=&quot;_blank&quot;&gt;here&lt;/a&gt;.&lt;/em&gt;
&lt;/p&gt;

&lt;p&gt;But you can already see we’re going off the beaten path. What if we had subscribers for each task? We’d need at least two more joins. One more &lt;code&gt;GROUP BY&lt;/code&gt;. Likely we’d want a subquery. And if we wanted to support the Notion case? We’d want a &lt;code&gt;WITH RECURSIVE&lt;/code&gt; clause.&lt;/p&gt;
&lt;p&gt;Now we’re in a tough spot. The frontend’s common case is SQL’s advanced case. We shouldn’t need advanced features for common cases.&lt;/p&gt;
&lt;p&gt;Plus, what about all the SQL features we’d rarely use in the frontend? The spec for the core language is over 1700 pages long &lt;sup id=&quot;user-content-fnref-20&quot;&gt;&lt;a href=&quot;#user-content-fn-20&quot;&gt;[20]&lt;/a&gt;&lt;/sup&gt;. We’d have to implement reactivity for all 1700 pages. I don’t think the schlep is worth it.&lt;/p&gt;
&lt;h1&gt;Another Approach&lt;/h1&gt;
&lt;p&gt;SQL is out. Let’s start with a different question then: How do we make frontend queries easy?&lt;/p&gt;
&lt;p&gt;The most common query is our “fetch nested relations”. For Linear it’s “team, with related tasks and their owners”. Or for Notion, we want “blocks, with child blocks expanded”. Or for Figma, “documents with their comments, layers, and properties”.&lt;/p&gt;
&lt;p&gt;See a pattern here? They’re all graphs:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/img/essays/graphs_everywhere.png&quot; alt=&quot;Graphs everywhere&quot;&gt;&lt;/p&gt;
&lt;p&gt;And this pointed us to a question: would a graph database make frontend queries easy?&lt;/p&gt;
&lt;h1&gt;Triple Stores&lt;/h1&gt;
&lt;p&gt;So we wrote a graph database to find out. We chose Triple Stores, one of the simplest kinds of graph databases. If you haven’t tried one, here’s a quick intuition:&lt;/p&gt;
&lt;p&gt;Imagine we’re trying to express a graph with data structures. What do we need?&lt;/p&gt;
&lt;p&gt;Well, we need to be able to express a node with attributes. To say:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-markdown&quot;&gt;User with id 1 has name &amp;quot;Joe&amp;quot;
Team with id 2 has name &amp;quot;Awesome Team&amp;quot;
Task with id 3 has title &amp;quot;Code&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;These sentences translate to lists:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;[1, &amp;#39;name&amp;#39;, &amp;#39;Joe&amp;#39;]
[2, &amp;#39;name&amp;#39;, &amp;#39;Awesome Team&amp;#39;]
[3, &amp;#39;title&amp;#39;, &amp;#39;Code&amp;#39;]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then we want a way to describe references. To say:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-markdown&quot;&gt;Task with id 3 has an &amp;quot;owner&amp;quot; reference to User with id 1
Team with id 2 has a &amp;quot;task&amp;quot; reference to Task with id 3
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Well...these translate to lists just as well:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;[3, &amp;#39;owner&amp;#39;, 1]
[2, &amp;#39;tasks&amp;#39;, 3]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Put these lists in a table, and you have a triple store! &lt;em&gt;Triple&lt;/em&gt; is the name of the list we’ve been writing:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;[1, &amp;#39;name&amp;#39;, &amp;#39;Joe&amp;#39;];
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The first item is always an &lt;code&gt;id&lt;/code&gt;, the second the &lt;code&gt;attribute&lt;/code&gt;, and the third, the &lt;code&gt;value&lt;/code&gt;. Turns out triples are all we need to express a graph.&lt;/p&gt;
&lt;p&gt;Here’s a more fleshed out example:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/img/essays/triple_store_graph.png&quot; alt=&quot;Triple Store → Graph&quot;&gt;&lt;/p&gt;
&lt;p&gt;And once you’ve expressed a graph, you can traverse it. Triple stores have interesting query languages. Here’s Datalog:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-clojure&quot;&gt;(pull db &amp;#39;[* {:team/task [* {:task/owner [*]}]}] team-id)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With this we’ve replaced &lt;code&gt;dataForTasksPage&lt;/code&gt;!&lt;/p&gt;
&lt;h1&gt;Exploring Triple Stores&lt;/h1&gt;
&lt;p&gt;Triple stores felt like our Rubicon moment. An entire architecture unfolded from this choice.&lt;/p&gt;
&lt;h2&gt;Schema and Size&lt;/h2&gt;
&lt;p&gt;My investigation kicked off with two happy surprises.&lt;/p&gt;
&lt;p&gt;First, I always assumed that if we wanted relations, we would need a schema. But it turns out triple stores don’t need one. &lt;sup id=&quot;user-content-fnref-21&quot;&gt;&lt;a href=&quot;#user-content-fn-21&quot;&gt;[21]&lt;/a&gt;&lt;/sup&gt; I think a schema is helpful. But to compete with Firebase, it’s a win that we can make this optional.&lt;/p&gt;
&lt;p&gt;Then there’s size. Triple stores are notoriously light. DataScript is one of the most battle-tested triple stores. It’s compiled from ClojureScript and carries the extra weight of the Clojure runtime. But even then, the bundle size is about 90KB.&lt;/p&gt;
&lt;h2&gt;Simple&lt;/h2&gt;
&lt;p&gt;But the killer feature is how simple triple stores are. &lt;strong&gt;You can write a roughly complete implementation in less than a hundred lines of Javascript&lt;/strong&gt; &lt;sup id=&quot;user-content-fnref-22&quot;&gt;&lt;a href=&quot;#user-content-fn-22&quot;&gt;[22]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;The query planner uses 3 main indexes &lt;sup id=&quot;user-content-fnref-23&quot;&gt;&lt;a href=&quot;#user-content-fn-23&quot;&gt;[23]&lt;/a&gt;&lt;/sup&gt;. Datalog — the query language I mentioned — is so simple that there isn’t a spec &lt;sup id=&quot;user-content-fnref-24&quot;&gt;&lt;a href=&quot;#user-content-fn-24&quot;&gt;[24]&lt;/a&gt;&lt;/sup&gt;. The mutation system boils down to two primitives &lt;sup id=&quot;user-content-fnref-25&quot;&gt;&lt;a href=&quot;#user-content-fn-25&quot;&gt;[25]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;Even with the 100 LOC version, you can express a query like “Give me all the owners for the tasks where this person is a subscriber” &lt;sup id=&quot;user-content-fnref-26&quot;&gt;&lt;a href=&quot;#user-content-fn-26&quot;&gt;[26]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;h2&gt;80/20 for Multiplayer&lt;/h2&gt;
&lt;p&gt;Turns out triple stores are a great answer for multiplayer too. Once we make our Local DB collaborative, we’ll need to handle conflicts. What should happen when two people change something at the same time?&lt;/p&gt;
&lt;p&gt;Notion, Figma, and Linear all use last-write-wins. This means that whichever change reaches the server last wins.&lt;/p&gt;
&lt;p&gt;This can work well, but we need to be creative about it. Imagine if two of us changed the same Figma Layer. One of us changed the font size, and the other changed the background color. If we’re creative about how we save things, there shouldn’t be a conflict in the first place.&lt;/p&gt;
&lt;p&gt;How does Figma do this? They store their properties in a special way. They store them as...triples! &lt;sup id=&quot;user-content-fnref-27&quot;&gt;&lt;a href=&quot;#user-content-fn-27&quot;&gt;[27]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;[1, &amp;#39;fontSize&amp;#39;, 20]
[1, &amp;#39;backgroundColor&amp;#39;, &amp;#39;blue&amp;#39;]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;These triples say that the Layer with id 1 has a fontSize 20 and backgroundColor blue. Since they are different rows, there’s no conflict.&lt;/p&gt;
&lt;p&gt;And voila, we have the same kind of conflict-resolution as Figma. &lt;sup id=&quot;user-content-fnref-28&quot;&gt;&lt;a href=&quot;#user-content-fn-28&quot;&gt;[28]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
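&lt;p&gt;A sketch of the idea: apply last-write-wins per (id, attribute) pair, so concurrent edits to different properties never collide. (This is a simplification for illustration, not Figma’s or Instant’s actual protocol; the &lt;code&gt;ts&lt;/code&gt; field stands in for the order writes reach the server.)&lt;/p&gt;

```javascript
// Per-attribute last-write-wins: each (id, attribute) pair is its own
// register. `ts` stands in for the order writes reach the server.
function applyWrite(store, write) {
  const key = write.id + '|' + write.attr;
  const current = store.get(key);
  // A write loses only to a later write on the *same* (id, attribute) pair.
  if (!current || write.ts >= current.ts) {
    store.set(key, write);
  }
  return store;
}
```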
&lt;h2&gt;But Speed and Scale?&lt;/h2&gt;
&lt;p&gt;At this point, you may wonder: this is great and all, but what about speed and scale?&lt;/p&gt;
&lt;p&gt;Well, the core technology is old &lt;sup id=&quot;user-content-fnref-29&quot;&gt;&lt;a href=&quot;#user-content-fn-29&quot;&gt;[29]&lt;/a&gt;&lt;/sup&gt;. Datalog and triple stores have been around for decades. This also means that people have built reactive implementations &lt;sup id=&quot;user-content-fnref-30&quot;&gt;&lt;a href=&quot;#user-content-fn-30&quot;&gt;[30]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;But what makes me most optimistic here is that Facebook runs on a graph database. TAO is Facebook’s in-house data store, and if you look at it, it’s not so different from a triple store! &lt;sup id=&quot;user-content-fnref-31&quot;&gt;&lt;a href=&quot;#user-content-fn-31&quot;&gt;[31]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;h2&gt;Easy?&lt;/h2&gt;
&lt;p&gt;This is getting exciting. But what about ease of use? This is how the “Give me all the owners for the tasks where this person is a subscriber” query looks in Datalog:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-clojure&quot;&gt;{:find [?owner]
 :where [[?task :task/owner ?owner]
         [?task :task/subscriber sub-id]]}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Datalog as a language is elegant and simple. But it’s not easy the same way Firebase is. You need to learn a logic-based language. Then you get back triples. But in the UI you want typed objects.&lt;/p&gt;
&lt;p&gt;This would be a deal-breaker. But here’s where Datalog’s strength comes in. &lt;strong&gt;It’s so small that we can just keep it as our base layer, and write a friendlier language on top.&lt;/strong&gt;&lt;/p&gt;
&lt;h2&gt;InstaQL&lt;/h2&gt;
&lt;p&gt;That’s how InstaQL was born. If you look at what’s intuitive for the UI, I think GraphQL syntax comes closest:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-graphql&quot;&gt;teams {
  ...
  tasks: {
    ...
    owner: {
      ...
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You just declare what you want; the shape of the query looks like the result.&lt;/p&gt;
&lt;p&gt;InstaQL was heavily inspired by GraphQL. It’s a similar-looking language and produces Datalog. Here’s how queries look:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;{
  teams: {
    $: {where: {id: 1}},
    tasks: {owner: {}},
  },
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You can see the first departure from GraphQL: InstaQL is written with plain Javascript objects. This lets us avoid a build step; after all, Firebase doesn’t need one. And there’s another win: because the language itself is made of objects and arrays, engineers can write ordinary functions that build and manipulate queries.&lt;/p&gt;
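&lt;p&gt;For instance, since a query is just data, a plain function can rewrite it. Here’s a sketch that scopes any query to a single team. (&lt;code&gt;scopeToTeam&lt;/code&gt; is a hypothetical helper, not part of Instant’s API.)&lt;/p&gt;

```javascript
// Queries are plain objects, so ordinary functions can build and rewrite
// them. `scopeToTeam` is a hypothetical helper, not Instant's API.
function scopeToTeam(query, teamId) {
  return {
    ...query,
    teams: {
      ...query.teams,
      $: { where: { id: teamId } },
    },
  };
}

const base = { teams: { tasks: { owner: {} } } };
const scoped = scopeToTeam(base, 1);
```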
&lt;p&gt;The second departure is in the mutation system. In GraphQL you define mutations as functions in the backend. This is a problem because then you can’t do optimistic updates out of the box. Without talking to the server, there’s no way to know what a mutation does.&lt;/p&gt;
&lt;p&gt;In InstaQL, mutations look like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;transact([
  tx.tasks[taskId]
    .update({title: &amp;quot;New Task&amp;quot;})
    .link({owner: ownerId})
]);
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;These mutations produce triple store assertions and retractions. So our Local DB can apply them, and we have optimistic updates out of the box again. &lt;sup id=&quot;user-content-fnref-32&quot;&gt;&lt;a href=&quot;#user-content-fn-32&quot;&gt;[32]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
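&lt;p&gt;To illustrate, here’s a rough sketch of how a statement like the one above could lower into assertions. (&lt;code&gt;toTriples&lt;/code&gt; and its output shape are made up for this example; Instant’s real transaction format differs.)&lt;/p&gt;

```javascript
// Lowering a mutation statement into triple assertions. Illustrative only;
// Instant's real transaction format differs.
function toTriples(namespace, id, ops) {
  const triples = [];
  Object.entries(ops.update || {}).forEach(([attr, value]) => {
    triples.push(['assert', id, namespace + '/' + attr, value]);
  });
  Object.entries(ops.link || {}).forEach(([attr, otherId]) => {
    triples.push(['assert', id, namespace + '/' + attr, otherId]);
  });
  return triples;
}
```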
&lt;h1&gt;Instant Today&lt;/h1&gt;
&lt;p&gt;So we wrote a triple store, and Instant was born. Today you have a reactive database with offline mode, optimistic updates, multiplayer, auth, and permissions at your fingertips.&lt;/p&gt;
&lt;p&gt;Locally, there&amp;#39;s a triple store that understands InstaQL. You can write queries like:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;{
  teams: {
    $: {where: {id: 1}},
    tasks: {owner: {}}
  },
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And get back objects:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;{
  teams: [
    {
      id: 1,
      name: &amp;#39;Awesome Team&amp;#39;,
      tasks: [{ id: 3, title: &amp;#39;Code&amp;#39;, owner: [{ id: 1, name: &amp;#39;Joe&amp;#39; }] }],
    },
  ],
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Every query works offline, and all changes are applied instantly. The server has a reactive layer that broadcasts novelty. You can write permissions, and you have an SDK you can use for web, React Native, and Node.&lt;/p&gt;
&lt;p&gt;It&amp;#39;s been thrilling to see users try Instant. When they write their first relational query, I can see the delight in their eyes.&lt;/p&gt;
&lt;p&gt;If you’re excited about this stuff, &lt;a href=&quot;https://instantdb.com&quot;&gt;sign up and give us a try&lt;/a&gt;. We will reach out to you personally for feedback.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://news.ycombinator.com/item?id=32595895&quot;&gt;Discussion on HN&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Thanks Joe Averbukh, Alex Reichert, Mark Shlick, Slava Akhmechet, Nicole Garcia Fischer, Daniel Woelfel, Jake Teton-Landis, Rudi Chen, Dan Vingo, Dennis Heihoff for reviewing drafts of this essay.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;user-content-fn-1&quot; href=&quot;#user-content-fnref-1&quot;&gt;[1]&lt;/a&gt;  ​​Think optimistic updates, reactivity, and offline mode. I’ll cover them in this essay, so no need to jump into the previous one.&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;user-content-fn-2&quot; href=&quot;#user-content-fnref-2&quot;&gt;[2]&lt;/a&gt;  ​​Or you could put them in a failure queue and try again later. Lots to think about.&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;user-content-fn-3&quot; href=&quot;#user-content-fnref-3&quot;&gt;[3]&lt;/a&gt;  I am still on the lookout for a paper about this, but in the meantime, consider this thought experiment. Imagine a guitar. What would the experience be like if, when you plucked a string, there was a lag before you heard the sound?&lt;/p&gt;
&lt;p&gt;&lt;sup id=&quot;user-content-fnref-4&quot;&gt;&lt;a href=&quot;#user-content-fn-4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;: ​​Even &lt;a href=&quot;https://twitter.com/stopachka/status/1557485881539297282&quot;&gt;“Changed your profile info”&lt;/a&gt; is reactive!&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;user-content-fn-5&quot; href=&quot;#user-content-fnref-5&quot;&gt;[5]&lt;/a&gt;  ​​Streaming changes alone is a painful task. But consider the nuances. For example, you would think you could apply all changes everywhere immediately. But this doesn’t always work. Imagine Facebook comments. If new comments showed up as you viewed a post, your screen would constantly shift. This is why you see a button instead.&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;user-content-fn-6&quot; href=&quot;#user-content-fnref-6&quot;&gt;[6]&lt;/a&gt;  ​​Consider Dropbox Paper and Notion. Could you realistically keep a journal in Paper? What would you do on an airplane? Or what if you want to jot something down in some foreign place? Well, this is why Notion eats Paper’s cake.&lt;/p&gt;
&lt;p&gt;&lt;sup id=&quot;user-content-fnref-7&quot;&gt;&lt;a href=&quot;#user-content-fn-7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;: The CTO of Linear goes over this win and more in his &lt;a href=&quot;https://twitter.com/artman/status/1558081796914483201&quot;&gt;tweet thread&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;sup id=&quot;user-content-fnref-8&quot;&gt;&lt;a href=&quot;#user-content-fn-8&quot;&gt;[8]&lt;/a&gt;&lt;/sup&gt;: ​​This talk on &lt;a href=&quot;https://www.youtube.com/watch?v=WxK11RsLqp4&amp;t=2169s&quot;&gt;Linear’s architecture&lt;/a&gt; was great.&lt;/p&gt;
&lt;p&gt;&lt;sup id=&quot;user-content-fnref-9&quot;&gt;&lt;a href=&quot;#user-content-fn-9&quot;&gt;[9]&lt;/a&gt;&lt;/sup&gt;: &lt;a href=&quot;https://www.notion.so/blog/data-model-behind-notion&quot;&gt;​​The data model behind Notion’s flexibility&lt;/a&gt; is awesome.&lt;/p&gt;
&lt;p&gt;&lt;sup id=&quot;user-content-fnref-10&quot;&gt;&lt;a href=&quot;#user-content-fn-10&quot;&gt;[10]&lt;/a&gt;&lt;/sup&gt;: &lt;a href=&quot;https://www.figma.com/blog/how-figmas-multiplayer-technology-works/&quot;&gt;​​Figma’s multiplayer essay&lt;/a&gt; is a classic.&lt;/p&gt;
&lt;p&gt;&lt;sup id=&quot;user-content-fnref-11&quot;&gt;&lt;a href=&quot;#user-content-fn-11&quot;&gt;[11]&lt;/a&gt;&lt;/sup&gt;: ​​Figma’s &lt;a href=&quot;https://www.figma.com/blog/livegraph-real-time-data-fetching-at-figma/&quot;&gt;LiveGraph&lt;/a&gt; is very cool.&lt;/p&gt;
&lt;p&gt;&lt;sup id=&quot;user-content-fnref-12&quot;&gt;&lt;a href=&quot;#user-content-fn-12&quot;&gt;[12]&lt;/a&gt;&lt;/sup&gt;: ​​I say this like it’s no big deal. But live views are challenging. Here’s a &lt;a href=&quot;https://www.quora.com/What-is-the-point-of-RethinkDBs-push-capability&quot;&gt;quora answer&lt;/a&gt; that explains some nuances.&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;user-content-fn-13&quot; href=&quot;#user-content-fnref-13&quot;&gt;[13]&lt;/a&gt;  ​​The alternative approach is to make permission checks ad-hoc; some at the API layer, some inside functions, etc. But then, you can never be sure if you’re really allowed to see the data you’re manipulating.&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;user-content-fn-14&quot; href=&quot;#user-content-fnref-14&quot;&gt;[14]&lt;/a&gt;  If you poll, complicated queries can take a long time. If you tail a write-ahead log, it’s difficult to know which changes affect which queries.&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;user-content-fn-15&quot; href=&quot;#user-content-fnref-15&quot;&gt;[15]&lt;/a&gt;  ​​Eventually backend developers evolve their work into sophisticated systems. This is how Figma’s LiveGraph was born.&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;user-content-fn-16&quot; href=&quot;#user-content-fnref-16&quot;&gt;[16]&lt;/a&gt;  The examples that follow are from Firebase’s documentation.&lt;/p&gt;
&lt;p&gt;&lt;sup id=&quot;user-content-fnref-17&quot;&gt;&lt;a href=&quot;#user-content-fn-17&quot;&gt;[17]&lt;/a&gt;&lt;/sup&gt;: ​​At Airbnb I helped build &lt;a href=&quot;https://medium.com/airbnb-engineering/hacking-human-connection-the-story-of-awedience-ebf66ee6af0e&quot;&gt;Awedience&lt;/a&gt;. This worked on top of Firebase. I had to do hack after hack to make permissions work. I almost wrote a higher level language for it.&lt;/p&gt;
&lt;p&gt;&lt;sup id=&quot;user-content-fnref-18&quot;&gt;&lt;a href=&quot;#user-content-fn-18&quot;&gt;[18]&lt;/a&gt;&lt;/sup&gt;: ​​Kevin Lacker has a &lt;a href=&quot;https://www.youtube.com/watch?v=qCdpTji8nxo&quot;&gt;great talk&lt;/a&gt; about writing these kind of APIs.&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;user-content-fn-19&quot; href=&quot;#user-content-fnref-19&quot;&gt;[19]&lt;/a&gt;  ​​I think over the long-term a database benefits from a schema. But it hurts ease-of-use. This doesn’t mean we chuck schema entirely. It just means we should be upfront about this cost. As you’ll see, we can be creative about it too.&lt;/p&gt;
&lt;p&gt;&lt;sup id=&quot;user-content-fnref-20&quot;&gt;&lt;a href=&quot;#user-content-fn-20&quot;&gt;[20]&lt;/a&gt;&lt;/sup&gt;: ​​See &lt;a href=&quot;https://blog.ansi.org/2018/10/sql-standard-iso-iec-9075-2016-ansi-x3-135/&quot;&gt;this&lt;/a&gt;. I found the link in &lt;a href=&quot;https://www.scattered-thoughts.net/writing/against-sql/&quot;&gt;“Against SQL”&lt;/a&gt;. I loved the thoughtfulness in the essay.&lt;/p&gt;
&lt;p&gt;&lt;sup id=&quot;user-content-fnref-21&quot;&gt;&lt;a href=&quot;#user-content-fn-21&quot;&gt;[21]&lt;/a&gt;&lt;/sup&gt;: Here’s &lt;a href=&quot;https://github.com/threatgrid/asami&quot;&gt;asami&lt;/a&gt;, a schemaless implementation. Now, I think at the very least you should distinguish between attributes and references. But you don’t need to.&lt;/p&gt;
&lt;p&gt;&lt;sup id=&quot;user-content-fnref-22&quot;&gt;&lt;a href=&quot;#user-content-fn-22&quot;&gt;[22]&lt;/a&gt;&lt;/sup&gt;: &lt;a href=&quot;https://www.instantdb.com/essays/datalogjs&quot;&gt;We wrote a tutorial to do it!&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;sup id=&quot;user-content-fnref-23&quot;&gt;&lt;a href=&quot;#user-content-fn-23&quot;&gt;[23]&lt;/a&gt;&lt;/sup&gt;: ​​It gets more complicated, but honestly not much more complicated. If you’re curious, &lt;a href=&quot;https://github.com/juji-io/datalevin/blob/query/doc/query.md&quot;&gt;this doc&lt;/a&gt; links into great research.&lt;/p&gt;
&lt;p&gt;&lt;sup id=&quot;user-content-fnref-24&quot;&gt;&lt;a href=&quot;#user-content-fn-24&quot;&gt;[24]&lt;/a&gt;&lt;/sup&gt;: ​​The syntax for logic-based datalog can be expressed in &lt;a href=&quot;https://en.wikipedia.org/wiki/Datalog#Syntax&quot;&gt;8 lines&lt;/a&gt;. Edn-style datalog doesn’t have a spec, but it’s simpler than SparQL. SparQL is a competitive graph-based query language, and the spec there is &lt;a href=&quot;https://www.w3.org/TR/2013/REC-sparql11-query-20130321/&quot;&gt;less than a hundred pages long&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;user-content-fn-25&quot; href=&quot;#user-content-fnref-25&quot;&gt;[25]&lt;/a&gt;  ​​Every mutation is either an assertion or a retraction of a triple.&lt;/p&gt;
&lt;p&gt;&lt;sup id=&quot;user-content-fnref-26&quot;&gt;&lt;a href=&quot;#user-content-fn-26&quot;&gt;[26]&lt;/a&gt;&lt;/sup&gt;: ​​Here’s a &lt;a href=&quot;https://github.com/stopachka/datalogJS/blob/main/src/index.test.js#L110-L135&quot;&gt;query of similar complexity&lt;/a&gt;, tested over a datalog engine that’s less than a hundred lines.&lt;/p&gt;
&lt;p&gt;&lt;sup id=&quot;user-content-fnref-27&quot;&gt;&lt;a href=&quot;#user-content-fn-27&quot;&gt;[27]&lt;/a&gt;&lt;/sup&gt;: ​​Cmd +F for (ObjectID, Property, Value)​ in this &lt;a href=&quot;https://www.figma.com/blog/how-figmas-multiplayer-technology-works/&quot;&gt;essay&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;sup id=&quot;user-content-fnref-28&quot;&gt;&lt;a href=&quot;#user-content-fn-28&quot;&gt;[28]&lt;/a&gt;&lt;/sup&gt;: At this point, you may be thinking…last-write-wins? C’mon — what about more serious CRDTs or OTs? I think last-write-wins is a great 80/20, and gets us to the same level as Notion, Figma, and Linear. But there’s a lot of exciting research (&lt;a href=&quot;https://martin.kleppmann.com/2018/02/26/dagstuhl-data-consistency.html&quot;&gt;1&lt;/a&gt;, &lt;a href=&quot;https://fission.codes/blog/fission-reactor-dialog-first-look/&quot;&gt;2&lt;/a&gt;). It’s reassuring though that a lot of this research centers around triples and Datalog. We can do what Figma does today, and when the research is more mature, integrate it down the road.&lt;/p&gt;
&lt;p&gt;&lt;sup id=&quot;user-content-fnref-29&quot;&gt;&lt;a href=&quot;#user-content-fn-29&quot;&gt;[29]&lt;/a&gt;&lt;/sup&gt;: Datalog launched in &lt;a href=&quot;https://en.wikipedia.org/wiki/Datalog&quot;&gt;1986&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;sup id=&quot;user-content-fnref-30&quot;&gt;&lt;a href=&quot;#user-content-fn-30&quot;&gt;[30]&lt;/a&gt;&lt;/sup&gt;: &lt;a href=&quot;http://ceur-ws.org/Vol-2368/paper6.pdf&quot;&gt;Differential Datalog&lt;/a&gt; is interesting.&lt;/p&gt;
&lt;p&gt;&lt;sup id=&quot;user-content-fnref-31&quot;&gt;&lt;a href=&quot;#user-content-fn-31&quot;&gt;[31]&lt;/a&gt;&lt;/sup&gt;: &lt;a href=&quot;https://www.usenix.org/system/files/conference/atc13/atc13-bronson.pdf&quot;&gt;Here’s the paper&lt;/a&gt;. They store objects instead of triples, and store associations differently. They support 1 billion (!!) reads/sec and 1 million writes/sec.&lt;/p&gt;
&lt;p&gt;&lt;sup id=&quot;user-content-fnref-32&quot;&gt;&lt;a href=&quot;#user-content-fn-32&quot;&gt;[32]&lt;/a&gt;&lt;/sup&gt;: ​​If you’re curious, here’s an &lt;a href=&quot;https://paper.dropbox.com/doc/InstaQL--BgBK88TTiSE9OV3a17iCwDjCAg-yVxntbv98aeAovazd9TNL&quot;&gt;expanded spec&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;user-content-fn-33&quot; href=&quot;#user-content-fnref-33&quot;&gt;[33]&lt;/a&gt;  ​​You may be wondering — what about the details on the backend? We’re already 4000 words. Let us know if you’re interested and we’ll write a follow-on essay!&lt;/p&gt;
</description>
      <pubDate>Thu, 25 Aug 2022 00:00:00 GMT</pubDate>
      <author>Stepan Parunashvili</author>
    </item>
    <item>
      <title>Datalog in Javascript</title>
      <link>https://instantdb.com/essays/datalogjs</link>
      <guid isPermaLink="true">https://instantdb.com/essays/datalogjs</guid>
      <description>&lt;p&gt;Query engines make me feel like a wizard. I cast my incantation: “Give me all the directors and the movies where Arnold Schwarzenegger was a cast member”. Then charges zip through wires, algorithms churn on CPUs, and voila, an answer bubbles up.&lt;/p&gt;
&lt;p&gt;How do they work? In this essay, we will build a query engine from scratch and find out. In 100 lines of Javascript, we’ll support joins, indexes, &lt;em&gt;and&lt;/em&gt; find our answer for Arnold! Let’s get into it.&lt;/p&gt;
&lt;h1&gt;Choice&lt;/h1&gt;
&lt;p&gt;Our first step is to choose which language we’ll support. SQL is the most popular, but we wouldn’t get far in 100 lines. I suggest we amble off the beaten path and make Datalog instead.&lt;/p&gt;
&lt;p&gt;If you haven’t heard of Datalog, you’re in for a treat. It’s a logic-based query language that’s as powerful as SQL. We won’t cover it completely, but we’ll cover enough to fit a good weekend’s worth of hacking.&lt;/p&gt;
&lt;p&gt;To grok Datalog, we need to understand three ideas:&lt;/p&gt;
&lt;h1&gt;Data&lt;/h1&gt;
&lt;p&gt;The first idea is about how we store data.&lt;/p&gt;
&lt;h2&gt;SQL Tables&lt;/h2&gt;
&lt;p&gt;SQL databases store data in different tables:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/img/essays/datalog_query_example.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;Here we have a &lt;code&gt;movie&lt;/code&gt; table, which stores one movie per row. The record with the id &lt;code&gt;200&lt;/code&gt; is &lt;code&gt;&amp;quot;The Terminator&amp;quot;&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Notice the &lt;code&gt;director_id&lt;/code&gt;. This points to a row in yet another &lt;code&gt;person&lt;/code&gt; table, which keeps the director’s name, and so on.&lt;/p&gt;
&lt;h2&gt;Datalog Triples&lt;/h2&gt;
&lt;p&gt;In Datalog databases, there are no tables. Or rather, everything is stored in a single table: the &lt;code&gt;triple&lt;/code&gt; table:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/img/essays/datalog_graph_example.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;A &lt;code&gt;triple&lt;/code&gt; is a row with an &lt;code&gt;id&lt;/code&gt;, &lt;code&gt;attribute&lt;/code&gt;, and &lt;code&gt;value&lt;/code&gt;. Triples have a curious property; with just these three columns, they can describe any kind of information!&lt;/p&gt;
&lt;p&gt;How? Imagine describing a movie to someone:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;It&amp;#39;s called &amp;quot;The Terminator&amp;quot;
It was released in 1984&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Those sentences conveniently translate to triples:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;[200, movie/title, &amp;#39;The Terminator&amp;#39;]
[200, movie/year, 1984]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And those sentences have a general structure; if you can describe a movie this way, you can describe tomatoes or airplanes just as well.&lt;/p&gt;
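&lt;p&gt;A quick sketch makes this concrete: any plain record flattens into triples mechanically. (&lt;code&gt;objectToTriples&lt;/code&gt; is just for illustration.)&lt;/p&gt;

```javascript
// Flattening a record into (id, attribute, value) rows. Illustrative only.
function objectToTriples(id, namespace, obj) {
  return Object.entries(obj).map(([attr, value]) => [
    id,
    namespace + '/' + attr,
    value,
  ]);
}
```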
&lt;h1&gt;Queries&lt;/h1&gt;
&lt;p&gt;The second idea is about how we search for information.&lt;/p&gt;
&lt;h2&gt;SQL Algebra&lt;/h2&gt;
&lt;p&gt;SQL has roots in relational algebra. You give the query engine a combination of clauses and statements, and it gets you back your data:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;SELECT id FROM movie WHERE year = 1987
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This returns:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;[{ id: 202 }, { id: 203 }, { id: 204 }];
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Voila, the movie ids for Predator, Lethal Weapon, and RoboCop.&lt;/p&gt;
&lt;h2&gt;Datalog Pattern Matching&lt;/h2&gt;
&lt;p&gt;Datalog databases rely on pattern matching. We create “patterns” that match against triples. For example, to find all the movies released in 1987, we could use this pattern:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;[?id, movie/year, 1987]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here, &lt;code&gt;?id&lt;/code&gt; is a variable: we’re telling the query engine that it can be &lt;em&gt;any&lt;/em&gt; value. But, the &lt;code&gt;attribute&lt;/code&gt; &lt;em&gt;must&lt;/em&gt; be &lt;code&gt;movie/year&lt;/code&gt;, and the &lt;code&gt;value&lt;/code&gt; &lt;em&gt;must&lt;/em&gt; be &lt;code&gt;1987&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/img/essays/datalog_triples.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;Our query engine runs through triple after triple. Since &lt;code&gt;?id&lt;/code&gt; can be anything, this matches every triple. But, the attribute &lt;code&gt;movie/year&lt;/code&gt; and the value &lt;code&gt;1987&lt;/code&gt; filter us down to &lt;em&gt;just&lt;/em&gt; the triples we care about:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;[
  [202, movie/year, 1987],
  [203, movie/year, 1987],
  [204, movie/year, 1987],
];
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Notice the &lt;code&gt;?id&lt;/code&gt; portion; those are the ids for Predator, Lethal Weapon, and RoboCop!&lt;/p&gt;
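&lt;p&gt;The search the engine just did amounts to a filter over every triple: fixed parts must match exactly, and variable parts match anything. Here’s a sketch, encoding variables as strings that start with a question mark (a convention we’ll adopt properly in a moment):&lt;/p&gt;

```javascript
// One-pattern matching as a filter: a part that is a '?variable' matches
// anything; any other part must equal the triple's part exactly.
function matchesPattern(pattern, triple) {
  return pattern.every((part, i) => {
    const isVar = typeof part === 'string' ? part.startsWith('?') : false;
    return isVar ? true : part === triple[i];
  });
}

const triples = [
  [200, 'movie/title', 'The Terminator'],
  [202, 'movie/year', 1987],
  [203, 'movie/year', 1987],
  [204, 'movie/year', 1987],
];
const hits = triples.filter((t) => matchesPattern(['?id', 'movie/year', 1987], t));
```

&lt;p&gt;The three matches are exactly the triples we saw above.&lt;/p&gt;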
&lt;h2&gt;Datalog &lt;code&gt;find&lt;/code&gt;&lt;/h2&gt;
&lt;p&gt;In SQL we got back &lt;em&gt;just&lt;/em&gt; the ids, though, while our query engine returned whole triples. How can we support returning only the ids? Let’s adjust our syntax; here’s &lt;code&gt;find&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;{ find: [?id],
  where: [
    [?id, movie/year, 1987]
  ] }
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Our query engine can now use the &lt;code&gt;find&lt;/code&gt; section to return what we care about. If we implement this right, we should get back:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;[[202], [203], [204]];
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And now we’re as dandy as SQL.&lt;/p&gt;
&lt;h1&gt;Joins&lt;/h1&gt;
&lt;p&gt;The third idea is about how joins work. Datalog and SQL’s magic comes from them.&lt;/p&gt;
&lt;h2&gt;SQL clauses&lt;/h2&gt;
&lt;p&gt;In SQL, if we wanted to find “The Terminator’s” director, we could write:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;SELECT
  person.name
FROM movie
JOIN person ON movie.director_id = person.id
WHERE movie.title = &amp;#39;The Terminator&amp;#39;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Which gets us:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;[{ name: &amp;#39;James Cameron&amp;#39; }];
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Pretty cool. We used the &lt;code&gt;JOIN&lt;/code&gt; clause to connect the movie table with the person table, and bam, we got our director’s name.&lt;/p&gt;
&lt;h2&gt;Datalog…Pattern Matching&lt;/h2&gt;
&lt;p&gt;In Datalog, we still rely on pattern matching. The trick is to match &lt;em&gt;multiple&lt;/em&gt; patterns:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;{
  find: [?directorName],
  where: [
    [?movieId, movie/title, &amp;quot;The Terminator&amp;quot;],
    [?movieId, movie/director, ?directorId],
    [?directorId, person/name, ?directorName],
  ],
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here we tell the query engine to match &lt;em&gt;three&lt;/em&gt; patterns. The first pattern produces a list of successful triples. For each successful triple, we search again with the &lt;em&gt;second&lt;/em&gt; pattern, and so on. Notice how the &lt;code&gt;?movieId&lt;/code&gt; and &lt;code&gt;?directorId&lt;/code&gt; are repeated; this tells our query engine that for a successful match, those values would need to be the &lt;em&gt;same&lt;/em&gt; across our different searches.&lt;/p&gt;
&lt;p&gt;What do I mean? Let’s make this concrete; here’s how our query engine could find The Terminator’s director:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/img/essays/datalog_pattern_matching.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;The first pattern finds:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;[200, movie/title, &amp;quot;The Terminator&amp;quot;]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We bind &lt;code&gt;?movieId&lt;/code&gt; to &lt;code&gt;200&lt;/code&gt;. Now we start searching for the second pattern:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;[?movieId, movie/director, ?directorId]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Since &lt;code&gt;?movieId&lt;/code&gt; needs to be &lt;code&gt;200&lt;/code&gt;, this finds us&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;[200, movie/director, 100]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And we can now bind &lt;code&gt;?directorId&lt;/code&gt; to &lt;code&gt;100&lt;/code&gt;. Time for the third pattern:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;[?directorId, person/name, ?directorName]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Because &lt;code&gt;?directorId&lt;/code&gt; has to be &lt;code&gt;100&lt;/code&gt;, our engine finds us:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;[100, person/name, &amp;#39;James Cameron&amp;#39;]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And perfecto, the &lt;code&gt;?directorName&lt;/code&gt; is now bound to &lt;code&gt;&amp;quot;James Cameron&amp;quot;&lt;/code&gt;! The &lt;code&gt;find&lt;/code&gt; section would then return &lt;code&gt;[&amp;quot;James Cameron&amp;quot;]&lt;/code&gt;.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;Okey doke, now we grok the basics of Datalog! Let’s get to the code.&lt;/p&gt;
&lt;h1&gt;Syntax&lt;/h1&gt;
&lt;p&gt;First things first, we need a way to represent this syntax. If you look at:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;{ find: [?id],
  where: [
    [?id, movie/year, 1987]
  ] }
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We could &lt;em&gt;almost&lt;/em&gt; write this in Javascript. We use objects and arrays, but &lt;code&gt;?id&lt;/code&gt; and &lt;code&gt;movie/year&lt;/code&gt; get in the way; they would throw an error. We can fix this with a hack: let’s turn them into strings.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;{ find: [&amp;quot;?id&amp;quot;],
  where: [
    [&amp;quot;?id&amp;quot;, &amp;quot;movie/year&amp;quot;, 1987]
  ] }
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It’s less pretty, but we can now express our queries without fanfare. If a string begins with a question mark, it’s a variable. An attribute is just a string; it’s a good idea to include a namespace like &lt;code&gt;&amp;quot;movie/*&amp;quot;&lt;/code&gt;, but we won’t force our users.&lt;/p&gt;
&lt;h1&gt;Sample Data&lt;/h1&gt;
&lt;p&gt;The next thing we’ll need is sample data to play with. There’s a great datalog tutorial &lt;sup id=&quot;user-content-fnref-1&quot;&gt;&lt;a href=&quot;#user-content-fn-1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;, which has the movie dataset we’ve been describing. I’ve taken it and adapted it to Javascript. &lt;a href=&quot;https://github.com/stopachka/datalogJS/blob/main/src/exampeTriples.js&quot;&gt;Here’s the file&lt;/a&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;// exampleTriples.js
export default [
  [100, &amp;#39;person/name&amp;#39;, &amp;#39;James Cameron&amp;#39;],
  [100, &amp;#39;person/born&amp;#39;, &amp;#39;1954-08-16T00:00:00Z&amp;#39;],
  // ...
];
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Let’s plop this in and require it:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;import exampleTriples from &amp;#39;./exampleTriples&amp;#39;;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now for our query engine!&lt;/p&gt;
&lt;h1&gt;matchPattern&lt;/h1&gt;
&lt;h2&gt;Goal&lt;/h2&gt;
&lt;p&gt;Our first goal is to match &lt;em&gt;one&lt;/em&gt; pattern with &lt;em&gt;one&lt;/em&gt; triple. Here’s an example:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/img/essays/datalog_joins.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;We have some variable bindings: &lt;code&gt;{&amp;quot;?movieId&amp;quot;: 200}&lt;/code&gt;. Let’s call this a &lt;code&gt;context&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Our goal is to take a pattern, a triple, and a context. We’ll either return a new context:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;{&amp;quot;?movieId&amp;quot;: 200, &amp;quot;?directorId&amp;quot;: 100}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Or a failure. We can just say &lt;code&gt;null&lt;/code&gt; means failure.&lt;/p&gt;
&lt;p&gt;This could be the test we play with:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;expect(
  matchPattern(
    [&amp;#39;?movieId&amp;#39;, &amp;#39;movie/director&amp;#39;, &amp;#39;?directorId&amp;#39;],
    [200, &amp;#39;movie/director&amp;#39;, 100],
    { &amp;#39;?movieId&amp;#39;: 200 },
  ),
).toEqual({ &amp;#39;?movieId&amp;#39;: 200, &amp;#39;?directorId&amp;#39;: 100 });
expect(
  matchPattern(
    [&amp;#39;?movieId&amp;#39;, &amp;#39;movie/director&amp;#39;, &amp;#39;?directorId&amp;#39;],
    [200, &amp;#39;movie/director&amp;#39;, 100],
    { &amp;#39;?movieId&amp;#39;: 202 },
  ),
).toEqual(null);
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Code&lt;/h2&gt;
&lt;p&gt;Nice, we have a plan. Let’s write the larger function first:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;function matchPattern(pattern, triple, context) {
  return pattern.reduce((context, patternPart, idx) =&amp;gt; {
    const triplePart = triple[idx];
    return matchPart(patternPart, triplePart, context);
  }, context);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We take our pattern, and compare each part to the corresponding one in our triple:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/img/essays/datalog_where_clause.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;So, we’d compare &lt;code&gt;&amp;quot;?movieId&amp;quot;&lt;/code&gt; with &lt;code&gt;200&lt;/code&gt;, and so on.&lt;/p&gt;
&lt;h2&gt;matchPart&lt;/h2&gt;
&lt;p&gt;We can delegate this comparison to &lt;code&gt;matchPart&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;function matchPart(patternPart, triplePart, context) {
  if (!context) return null;
  if (isVariable(patternPart)) {
    return matchVariable(patternPart, triplePart, context);
  }
  return patternPart === triplePart ? context : null;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;First we address &lt;code&gt;context&lt;/code&gt;: if &lt;code&gt;context&lt;/code&gt; is &lt;code&gt;null&lt;/code&gt;, a previous part must have failed to match, so we just return early.&lt;/p&gt;
&lt;h2&gt;isVariable&lt;/h2&gt;
&lt;p&gt;Next, we check if we’re looking at a variable. &lt;code&gt;isVariable&lt;/code&gt; is simple enough:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;function isVariable(x) {
  return typeof x === &amp;#39;string&amp;#39; &amp;amp;&amp;amp; x.startsWith(&amp;#39;?&amp;#39;);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;matchVariable&lt;/h2&gt;
&lt;p&gt;Now, if we &lt;em&gt;are&lt;/em&gt; looking at a variable, we’d want to handle it specially:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;function matchVariable(variable, triplePart, context) {
  if (context.hasOwnProperty(variable)) {
    const bound = context[variable];
    return matchPart(bound, triplePart, context);
  }
  return { ...context, [variable]: triplePart };
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We would check if we &lt;em&gt;already&lt;/em&gt; have a binding for this variable. For example, when comparing &lt;code&gt;?movieId&lt;/code&gt;, we’d already have the binding: “&lt;code&gt;200&lt;/code&gt;”. In this case, we just compare the bound value with what’s in our triple.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;// ...
if (context.hasOwnProperty(variable)) {
  const bound = context[variable];
  return matchPart(bound, triplePart, context);
}
// ...
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;When we compare &lt;code&gt;?directorId&lt;/code&gt; though, we’d see that this variable wasn’t bound. In this case, we’d want to &lt;em&gt;expand&lt;/em&gt; our context. We’d attach &lt;code&gt;?directorId&lt;/code&gt; to the corresponding part in our triple (&lt;code&gt;100&lt;/code&gt;).&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;return { ...context, [variable]: triplePart };
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Finally, if we weren’t looking at a variable, we would have skipped this and just checked for equality. If the pattern part and the triple part match, we keep the context; otherwise we return null:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;// ...
return patternPart === triplePart ? context : null;
// ...
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And with that, &lt;code&gt;matchPattern&lt;/code&gt; works as we like!&lt;/p&gt;
&lt;h1&gt;querySingle&lt;/h1&gt;
&lt;h2&gt;Goal&lt;/h2&gt;
&lt;p&gt;Now for our second goal. We can already match one pattern with one triple. Let’s now match &lt;em&gt;one&lt;/em&gt; pattern with &lt;em&gt;multiple&lt;/em&gt; triples. Here’s the idea:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/img/essays/datalog_find_clause.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;We’ll have &lt;em&gt;one&lt;/em&gt; pattern and a database of triples. We’ll want to return the contexts for all the successful matches. Here’s the test we can play with:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;expect(
  querySingle([&amp;#39;?movieId&amp;#39;, &amp;#39;movie/year&amp;#39;, 1987], exampleTriples, {}),
).toEqual([{ &amp;#39;?movieId&amp;#39;: 202 }, { &amp;#39;?movieId&amp;#39;: 203 }, { &amp;#39;?movieId&amp;#39;: 204 }]);
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Code&lt;/h2&gt;
&lt;p&gt;Well, much of the work comes down to &lt;code&gt;matchPattern&lt;/code&gt;. Here’s all &lt;code&gt;querySingle&lt;/code&gt; needs to do:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;function querySingle(pattern, db, context) {
  return db
    .map((triple) =&amp;gt; matchPattern(pattern, triple, context))
    .filter((x) =&amp;gt; x);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We go over each triple and run &lt;code&gt;matchPattern&lt;/code&gt;. This would return either a &lt;code&gt;context&lt;/code&gt; (it’s a match!), or &lt;code&gt;null&lt;/code&gt; (it’s a failure). We &lt;code&gt;filter&lt;/code&gt; to remove the failures, and &lt;code&gt;querySingle&lt;/code&gt; works like a charm!&lt;/p&gt;
&lt;h1&gt;queryWhere&lt;/h1&gt;
&lt;h2&gt;Goal&lt;/h2&gt;
&lt;p&gt;Closer and closer. Now to support joins. We need to handle &lt;em&gt;multiple&lt;/em&gt; patterns:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/img/essays/datalog_queries_final.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;So we go pattern by pattern, and find successful triples. For each successful triple, we apply the next pattern. At the end, we’ll have produced progressively larger contexts.&lt;/p&gt;
&lt;p&gt;Here’s the test we can play with:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;expect(
  queryWhere(
    [
      [&amp;#39;?movieId&amp;#39;, &amp;#39;movie/title&amp;#39;, &amp;#39;The Terminator&amp;#39;],
      [&amp;#39;?movieId&amp;#39;, &amp;#39;movie/director&amp;#39;, &amp;#39;?directorId&amp;#39;],
      [&amp;#39;?directorId&amp;#39;, &amp;#39;person/name&amp;#39;, &amp;#39;?directorName&amp;#39;],
    ],
    exampleTriples,
  ),
).toEqual([
  { &amp;#39;?movieId&amp;#39;: 200, &amp;#39;?directorId&amp;#39;: 100, &amp;#39;?directorName&amp;#39;: &amp;#39;James Cameron&amp;#39; },
]);
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Code&lt;/h2&gt;
&lt;p&gt;This, too, is not so difficult. Here’s &lt;code&gt;queryWhere&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;function queryWhere(patterns, db) {
  return patterns.reduce(
    (contexts, pattern) =&amp;gt; {
      return contexts.flatMap((context) =&amp;gt; querySingle(pattern, db, context));
    },
    [{}],
  );
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We start off with one empty context. We then go pattern by pattern; for each pattern, we find all the successful contexts. We then take those contexts, and use them for the next pattern. By the end, we’ll have all the expanded contexts, and &lt;code&gt;queryWhere&lt;/code&gt; works like a charm too!&lt;/p&gt;
&lt;h1&gt;Query&lt;/h1&gt;
&lt;h2&gt;Goal&lt;/h2&gt;
&lt;p&gt;And now we’ve just about built ourselves the whole query engine! Next let’s handle &lt;code&gt;where&lt;/code&gt; and &lt;code&gt;find&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;This could be the test we can play with:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;expect(
  query(
    {
      find: [&amp;#39;?directorName&amp;#39;],
      where: [
        [&amp;#39;?movieId&amp;#39;, &amp;#39;movie/title&amp;#39;, &amp;#39;The Terminator&amp;#39;],
        [&amp;#39;?movieId&amp;#39;, &amp;#39;movie/director&amp;#39;, &amp;#39;?directorId&amp;#39;],
        [&amp;#39;?directorId&amp;#39;, &amp;#39;person/name&amp;#39;, &amp;#39;?directorName&amp;#39;],
      ],
    },
    exampleTriples,
  ),
).toEqual([[&amp;#39;James Cameron&amp;#39;]]);
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Code&lt;/h2&gt;
&lt;p&gt;Here’s &lt;code&gt;query&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;function query({ find, where }, db) {
  const contexts = queryWhere(where, db);
  return contexts.map((context) =&amp;gt; actualize(context, find));
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Our &lt;code&gt;queryWhere&lt;/code&gt; returns all the successful contexts. We can then map those, and &lt;code&gt;actualize&lt;/code&gt; our &lt;code&gt;find&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;function actualize(context, find) {
  return find.map((findPart) =&amp;gt; {
    return isVariable(findPart) ? context[findPart] : findPart;
  });
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;All &lt;code&gt;actualize&lt;/code&gt; does is handle variables; if we see a variable in find, we just replace it with its bound value. &lt;sup id=&quot;user-content-fnref-2&quot;&gt;&lt;a href=&quot;#user-content-fn-2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;h1&gt;Play&lt;/h1&gt;
&lt;p&gt;And voila! We have a query engine. Let’s see what we can do.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;When was Alien released?&lt;/strong&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;query(
  {
    find: [&amp;#39;?year&amp;#39;],
    where: [
      [&amp;#39;?id&amp;#39;, &amp;#39;movie/title&amp;#39;, &amp;#39;Alien&amp;#39;],
      [&amp;#39;?id&amp;#39;, &amp;#39;movie/year&amp;#39;, &amp;#39;?year&amp;#39;],
    ],
  },
  exampleTriples,
);
&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;[[1979]];
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;What do I know about the entity with the id &lt;code&gt;200&lt;/code&gt;?&lt;/strong&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;query(
  {
    find: [&amp;#39;?attr&amp;#39;, &amp;#39;?value&amp;#39;],
    where: [[200, &amp;#39;?attr&amp;#39;, &amp;#39;?value&amp;#39;]],
  },
  exampleTriples,
);
&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;[
  [&amp;#39;movie/title&amp;#39;, &amp;#39;The Terminator&amp;#39;],
  [&amp;#39;movie/year&amp;#39;, 1984],
  [&amp;#39;movie/director&amp;#39;, 100],
  [&amp;#39;movie/cast&amp;#39;, 101],
  [&amp;#39;movie/cast&amp;#39;, 102],
  [&amp;#39;movie/cast&amp;#39;, 103],
  [&amp;#39;movie/sequel&amp;#39;, 207],
];
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And, last but not least…&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Which directors shot Arnold for which movies?&lt;/strong&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;query(
  {
    find: [&amp;#39;?directorName&amp;#39;, &amp;#39;?movieTitle&amp;#39;],
    where: [
      [&amp;#39;?arnoldId&amp;#39;, &amp;#39;person/name&amp;#39;, &amp;#39;Arnold Schwarzenegger&amp;#39;],
      [&amp;#39;?movieId&amp;#39;, &amp;#39;movie/cast&amp;#39;, &amp;#39;?arnoldId&amp;#39;],
      [&amp;#39;?movieId&amp;#39;, &amp;#39;movie/title&amp;#39;, &amp;#39;?movieTitle&amp;#39;],
      [&amp;#39;?movieId&amp;#39;, &amp;#39;movie/director&amp;#39;, &amp;#39;?directorId&amp;#39;],
      [&amp;#39;?directorId&amp;#39;, &amp;#39;person/name&amp;#39;, &amp;#39;?directorName&amp;#39;],
    ],
  },
  exampleTriples,
);
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;🤯&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;[
  [&amp;#39;James Cameron&amp;#39;, &amp;#39;The Terminator&amp;#39;],
  [&amp;#39;John McTiernan&amp;#39;, &amp;#39;Predator&amp;#39;],
  [&amp;#39;Mark L. Lester&amp;#39;, &amp;#39;Commando&amp;#39;],
  [&amp;#39;James Cameron&amp;#39;, &amp;#39;Terminator 2: Judgment Day&amp;#39;],
  [&amp;#39;Jonathan Mostow&amp;#39;, &amp;#39;Terminator 3: Rise of the Machines&amp;#39;],
];
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now this is cool!&lt;/p&gt;
&lt;h1&gt;Indexes&lt;/h1&gt;
&lt;h2&gt;Problem&lt;/h2&gt;
&lt;p&gt;Okay, but you may have already been thinking, “Our query engine will get slow”.&lt;/p&gt;
&lt;p&gt;Let’s remember &lt;code&gt;querySingle&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;function querySingle(pattern, db, context) {
  return db
    .map((triple) =&amp;gt; matchPattern(pattern, triple, context))
    .filter((x) =&amp;gt; x);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is fine and dandy, but consider this query:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;querySingle([200, &amp;quot;movie/title&amp;quot;, &amp;quot;?movieTitle&amp;quot;], db, {})
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We want to find the movie title for the entity with the id &lt;code&gt;200&lt;/code&gt;. SQL would have used an index to quickly nab this for us.&lt;/p&gt;
&lt;p&gt;But what about our query engine? It’ll have to search every single triple in our database!&lt;/p&gt;
&lt;h2&gt;Goal&lt;/h2&gt;
&lt;p&gt;Let’s solve that. We shouldn’t need to search &lt;em&gt;every&lt;/em&gt; triple for a query like this; it’s time for indexes.&lt;/p&gt;
&lt;p&gt;Here’s what we can do: let’s create &lt;code&gt;entity&lt;/code&gt;, &lt;code&gt;attribute&lt;/code&gt;, and &lt;code&gt;value&lt;/code&gt; indexes. Something like:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;{
  entityIndex: {
    200: [
      [200, &amp;quot;movie/title&amp;quot;, &amp;quot;The Terminator&amp;quot;], [200, &amp;quot;movie/year&amp;quot;, 1984],
      //...
    ],
    // ...
  },
  attrIndex: {
    &amp;quot;movie/title&amp;quot;: [
      [200, &amp;quot;movie/title&amp;quot;, &amp;quot;The Terminator&amp;quot;],
      [202, &amp;quot;movie/title&amp;quot;, &amp;quot;Predator&amp;quot;],
      // ...
    ],
    // ...
  },
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now, if we had a pattern like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;[200, &amp;quot;movie/title&amp;quot;, &amp;quot;?movieTitle&amp;quot;]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We could be smart about how to get all the relevant triples: since &lt;code&gt;200&lt;/code&gt; isn’t a variable, we could just use the &lt;code&gt;entityIndex&lt;/code&gt;. We’d grab &lt;code&gt;entityIndex[200]&lt;/code&gt;, and voila, we’d have reduced our search to just 7 triples!&lt;/p&gt;
&lt;p&gt;We can do more, but with this we’d already have a big win.&lt;/p&gt;
&lt;h2&gt;createDB&lt;/h2&gt;
&lt;p&gt;Okay, let’s turn this into reality. We can start with a proper &lt;code&gt;db&lt;/code&gt; object. We were just using &lt;code&gt;exampleTriples&lt;/code&gt; before; now we’ll want to keep track of indexes too. Here’s what we can do:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;function createDB(triples) {
  return {
    triples,
    entityIndex: indexBy(triples, 0),
    attrIndex: indexBy(triples, 1),
    valueIndex: indexBy(triples, 2),
  };
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We’ll take our triples, and start to index them.&lt;/p&gt;
&lt;h2&gt;indexBy&lt;/h2&gt;
&lt;p&gt;And &lt;code&gt;indexBy&lt;/code&gt; will handle that. It can just take the triples and create a mapping:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;function indexBy(triples, idx) {
  return triples.reduce((index, triple) =&amp;gt; {
    const k = triple[idx];
    index[k] = index[k] || [];
    index[k].push(triple);
    return index;
  }, {});
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here &lt;code&gt;idx&lt;/code&gt; represents the position in the triple; 0 would be &lt;code&gt;entity&lt;/code&gt;, 1 would be &lt;code&gt;attribute&lt;/code&gt;, 2 would be &lt;code&gt;value&lt;/code&gt;.&lt;/p&gt;
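&lt;p&gt;As a quick sanity check, here’s roughly what this produces (restating &lt;code&gt;indexBy&lt;/code&gt; so the snippet stands on its own; the sample triples are just illustrative):&lt;/p&gt;

```javascript
// indexBy, restated from above so this snippet runs standalone
function indexBy(triples, idx) {
  return triples.reduce((index, triple) => {
    const k = triple[idx];
    index[k] = index[k] || [];
    index[k].push(triple);
    return index;
  }, {});
}

const triples = [
  [200, 'movie/title', 'The Terminator'],
  [200, 'movie/year', 1984],
  [202, 'movie/title', 'Predator'],
];

// idx = 0 groups triples by entity id...
const entityIndex = indexBy(triples, 0);
// entityIndex[200] holds both triples about entity 200

// ...and idx = 1 groups them by attribute
const attrIndex = indexBy(triples, 1);
// attrIndex['movie/title'] holds the two title triples
```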
&lt;h2&gt;querySingle, updated&lt;/h2&gt;
&lt;p&gt;Now that we have indexes, we can use them in &lt;code&gt;querySingle&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;function querySingle(pattern, db, context) {
  return relevantTriples(pattern, db)
    .map((triple) =&amp;gt; matchPattern(pattern, triple, context))
    .filter((x) =&amp;gt; x);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The only change is &lt;code&gt;relevantTriples&lt;/code&gt;. We’ll lean on it to figure out which index to use.&lt;/p&gt;
&lt;h2&gt;relevantTriples&lt;/h2&gt;
&lt;p&gt;Here’s all &lt;code&gt;relevantTriples&lt;/code&gt; does:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;function relevantTriples(pattern, db) {
  const [id, attribute, value] = pattern;
  // if a key is missing from an index, there are simply no matches
  if (!isVariable(id)) {
    return db.entityIndex[id] || [];
  }
  if (!isVariable(attribute)) {
    return db.attrIndex[attribute] || [];
  }
  if (!isVariable(value)) {
    return db.valueIndex[value] || [];
  }
  return db.triples;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We take the pattern and check the id, the attribute, and the value, in that order. The first one that isn’t a variable tells us which index we can safely use. If all three are variables, we fall back to scanning every triple.&lt;/p&gt;
&lt;p&gt;With that, we’ve made our query engine faster 🙂&lt;/p&gt;
&lt;h1&gt;Fin&lt;/h1&gt;
&lt;p&gt;I hope you had a blast making this and got a sense of how query engines work to boot. If you’d like to see the source in one place, &lt;a href=&quot;https://github.com/stopachka/datalogJS/blob/main/src/index.js&quot;&gt;here it is&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;More&lt;/h2&gt;
&lt;p&gt;This is just the beginning. How about functions like “greater than” or “smaller than”? How about an “or” query? Let’s not forget aggregate functions. If you’re curious about this, I’d suggest three things:&lt;/p&gt;
&lt;p&gt;First go through the &lt;a href=&quot;http://www.learndatalogtoday.org/&quot;&gt;Learn Datalog&lt;/a&gt; website; that’ll give you a full overview of Datalog. Next, I’d suggest you go through the &lt;a href=&quot;https://sarabander.github.io/sicp/html/4_002e4.xhtml#g_t4_002e4&quot;&gt;SICP chapter on logic programming&lt;/a&gt;. It goes much further than this essay. Finally, you can look at Nikita Tonsky’s &lt;a href=&quot;https://tonsky.me/blog/datascript-internals/&quot;&gt;datascript internals&lt;/a&gt; to see what a true production version looks like.&lt;/p&gt;
&lt;h2&gt;Credits&lt;/h2&gt;
&lt;p&gt;Huge credit goes to SICP. When I completed their logic chapter, I realized that query languages didn&amp;#39;t have to be so daunting. This essay is just a simplification of their chapter, translated into Javascript. The second credit needs to go to Nikita Tonsky’s essays. His &lt;a href=&quot;https://tonsky.me/blog/unofficial-guide-to-datomic-internals/&quot;&gt;Datomic&lt;/a&gt; and &lt;a href=&quot;https://tonsky.me/blog/datascript-internals/&quot;&gt;Datascript&lt;/a&gt; internals essays are a goldmine. Finally, I really enjoyed &lt;a href=&quot;http://www.learndatalogtoday.org/&quot;&gt;Learn Datalog&lt;/a&gt;, and used their dataset for this essay.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://news.ycombinator.com/item?id=31154039&quot;&gt;Discussion on HN&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Thanks to Joe Averbukh, Irakli Safareli, Daniel Woelfel, Mark Shlick, Alex Reichert, Ian Sinnott, for reviewing drafts of this essay.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;sup id=&quot;user-content-fnref-1&quot;&gt;&lt;a href=&quot;#user-content-fn-1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;: &lt;a href=&quot;http://www.learndatalogtoday.org/&quot;&gt;Learn Datalog Today&lt;/a&gt; — very fun!&lt;/p&gt;
&lt;p&gt;&lt;sup id=&quot;user-content-fnref-2&quot;&gt;&lt;a href=&quot;#user-content-fn-2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;: You may be wondering, won’t &lt;code&gt;find&lt;/code&gt; always have variables? Well, not always. You could include some constant, like &lt;code&gt;{find: [&amp;quot;movie/title&amp;quot;, &amp;quot;?title&amp;quot;]}&lt;/code&gt;&lt;/p&gt;
</description>
      <pubDate>Mon, 25 Apr 2022 00:00:00 GMT</pubDate>
      <author>Stepan Parunashvili</author>
    </item>
    <item>
      <title>Database in the Browser, a Spec</title>
      <link>https://instantdb.com/essays/db_browser</link>
      <guid isPermaLink="true">https://instantdb.com/essays/db_browser</guid>
      <description>&lt;p&gt;How will we build web applications in the future?&lt;/p&gt;
&lt;p&gt;If progress follows its usual strategy, then whatever is difficult and valuable to do today will become easy and normal tomorrow. I imagine we&amp;#39;ll discover new abstractions, which will make writing Google Docs as easy as the average web app is today.&lt;/p&gt;
&lt;p&gt;This begs the question — what will those abstractions look like? Can we discover them today? One way to find out is to look at all the schleps we have to go through in building web applications, and see what we can do about them.&lt;/p&gt;
&lt;p&gt;Dear reader, this essay is my attempt to follow that plan. We’ll take a tour of what it&amp;#39;s like to build a web application today: we&amp;#39;ll go over the problems we face, assess solutions like Firebase, Supabase, Hasura and friends, and see what&amp;#39;s left to do. I think by the end, you&amp;#39;ll agree with me that one of the most useful abstractions looks like a database in the browser. I&amp;#39;m getting ahead of myself though, let&amp;#39;s start at the beginning:&lt;/p&gt;
&lt;h1&gt;Client&lt;/h1&gt;
&lt;p&gt;The journey begins with Javascript in the browser.&lt;/p&gt;
&lt;h2&gt;A. Data Plumbing&lt;/h2&gt;
&lt;p&gt;The first job we have is to fetch information and display it in different places. For example, we may display a friends list, a friends count, a modal with a specific group of friends, etc.&lt;/p&gt;
&lt;p&gt;The problem we face is that all components need to see consistent information. If one component sees different data for friends, it’s possible that you’ll get the wrong &amp;quot;count&amp;quot; showing up, or a different nickname in one view versus another.&lt;/p&gt;
&lt;p&gt;To solve for this, we need to have a central source of truth. So, whenever we fetch anything, we normalize it and plop it in one place (often a &lt;em&gt;store&lt;/em&gt;). Then, each component reads and transforms the data it needs (using a &lt;em&gt;selector&lt;/em&gt;). It’s not uncommon to see something like:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;// normalise [posts] -&amp;gt; {[id]: post}
fetchRelevantPostsFor(user).then((posts) =&amp;gt; {
  posts.forEach((post) =&amp;gt; {
    store.addPost(post);
  });
});

// see all posts by author:
store.posts.values().reduce((res, post) =&amp;gt; {
  res[post.authorId] = res[post.authorId] || [];
  res[post.authorId].push(post);
  return res;
}, {});
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The question here is, &lt;em&gt;why&lt;/em&gt; should we need to do all this work? We write custom code to massage this data, while databases have solved this problem for a long time now. We should be able to &lt;em&gt;query&lt;/em&gt; for our data. Why can’t we just do:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-SQL&quot;&gt;SELECT posts WHERE post.author_id = ?;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;on the information that we have &lt;em&gt;inside&lt;/em&gt; the browser?&lt;/p&gt;
&lt;h2&gt;B. Change&lt;/h2&gt;
&lt;p&gt;The next problem is keeping data up to date. Say we remove a friend — what should happen?&lt;/p&gt;
&lt;p&gt;We send an API request, wait for it to complete, and write some logic to &amp;quot;remove&amp;quot; all the information we have about that friend. Something like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;deleteFriend(user, friend.id).then((res) =&amp;gt; {
  userStore.remove(friend.id);
  postStore.removeUserPosts(friend.id);
});
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;But, this can get hairy to deal with quick: we have to remember every place in our store that could possibly be affected by this change. It’s like playing garbage collector in our heads. Our heads are not good at this.&lt;/p&gt;
&lt;p&gt;One way folks avoid it, is to skip the problem and just re-fetch the whole world:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;deleteFriend(user, friend.id).then((res) =&gt; {
  fetchFriends(user);
  fetchPostsRelevantToTheUser(user);
});
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Neither solution is very good. In both cases, there are implicit invariants we need to be aware of (based on this change, what other changes do we need to make?) and we introduce lag in our application.&lt;/p&gt;
&lt;p&gt;The rub is, whenever we make a change to the database, it does its job without us having to be so prescriptive. Why can’t this just happen automatically for us in the browser?&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-SQL&quot;&gt;DELETE FROM friendships WHERE friend_one_id = ? AND friend_two_id = ?
-- Browser magically updates with all the friend and post information removed
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;C. Optimistic Updates&lt;/h2&gt;
&lt;p&gt;The problem you may have noticed with B. was that we had to &lt;em&gt;wait&lt;/em&gt; for friendship removal to update our browser state.&lt;/p&gt;
&lt;p&gt;In most cases, we can make the experience snappier with an optimistic update — after all, we know that the call will likely be a success. To do this, we do something like:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;friendPosts = userStore.getFriendPosts(friend);
userStore.remove(friend.id);
postStore.removeUserPosts(friend.id);
deleteFriend(user, friend.id).catch((e) =&gt; {
  // undo
  userStore.addFriend(friend);
  postStore.addPosts(friendPosts);
});
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is even more annoying. Now we need to manually handle both the success case &lt;em&gt;and&lt;/em&gt; the failure case.&lt;/p&gt;
&lt;p&gt;Why is that? On the backend, a database is able to do optimistic updates &lt;sup id=&quot;user-content-fnref-1&quot;&gt;&lt;a href=&quot;#user-content-fn-1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt; — why can’t we do that in the browser?&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-SQL&quot;&gt;DELETE friendship WHERE friend_one_id = ? AND friend_two_id = ?
-- local store optimistically updated, if operation fails we undo
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;D. Reactivity&lt;/h2&gt;
&lt;p&gt;And data doesn’t just change from our own actions. Sometimes we need to connect to changes that other users make. For example, someone could unfriend us, or someone could send us a message.&lt;/p&gt;
&lt;p&gt;To make this work, we need to do the same work that we did in our API endpoints, but this time on our websocket connection:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;ws.listen(`${user.id}/friends-removed`, friend =&amp;gt; {
  userStore.remove(friend.id);
  postStore.removeUserPosts(friend.id);
});
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;But, this introduces two problems. First, we need to play garbage collector again, and remember every place that could be affected by an event.&lt;/p&gt;
&lt;p&gt;Second, if we do optimistic updates, we have race conditions. Imagine you run an optimistic update, setting the color of a shape to &lt;code&gt;blue&lt;/code&gt;, while a stale reactive update comes in, saying it’s &lt;code&gt;red&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;1. Optimistic Update: `Blue`
2. Stale reactive update: `Red`
3. Successful Update, comes in through socket: `Blue`
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now, you’ll see a flicker. The optimistic update will set it to &lt;code&gt;blue&lt;/code&gt;, a reactive update will change it to &lt;code&gt;red&lt;/code&gt;, but once the optimistic update succeeds, a new reactive update will turn it back to &lt;code&gt;blue&lt;/code&gt; again. &lt;sup id=&quot;user-content-fnref-2&quot;&gt;&lt;a href=&quot;#user-content-fn-2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;Solving stuff like this has you dealing with consistency issues, scouring literature on…databases.&lt;/p&gt;
&lt;p&gt;It doesn’t have to be that way though. What if each query was &lt;em&gt;reactive&lt;/em&gt;?&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-SQL&quot;&gt;SELECT friends.* FROM users as friends JOIN friendships on friendship.user_one_id ...
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now, any change in friendships would automatically update the view subscribed to this query. You wouldn’t have to manage what changes, and your local database could figure out what the &amp;quot;most recent update&amp;quot; is, removing much of the complexity.&lt;/p&gt;
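&lt;p&gt;A toy version of that reactivity is easy to sketch (all names here are made up; a real sync engine has to be far smarter about which subscriptions a write affects):&lt;/p&gt;

```javascript
// A toy reactive store: every write re-runs every subscribed query.
// Real systems only re-run the queries a write could affect.
function createStore() {
  const rows = [];
  const subscribers = [];
  return {
    subscribe(predicate, callback) {
      subscribers.push({ predicate, callback });
      callback(rows.filter(predicate));
    },
    insert(row) {
      rows.push(row);
      for (const sub of subscribers) {
        sub.callback(rows.filter(sub.predicate));
      }
    },
  };
}

const store = createStore();
let view = [];
// "SELECT * FROM friendships WHERE userId = 1", as a predicate
store.subscribe(
  (row) => row.userId === 1,
  (result) => { view = result; },
);
store.insert({ userId: 1, friendId: 2 });
// view now holds the new friendship; no manual re-fetch needed
```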
&lt;h1&gt;Server&lt;/h1&gt;
&lt;p&gt;It only gets harder on the server.&lt;/p&gt;
&lt;h2&gt;E. Endpoints&lt;/h2&gt;
&lt;p&gt;Much of backend development ends up being a sort of glue between the database and the frontend.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;// db.js
function getRelevantPostsFor(userId) {
  return db.exec(&amp;#39;SELECT * FROM posts WHERE ...&amp;#39;);
}

// api.js
app.get(&amp;#39;relevantPosts&amp;#39;, (req, res) =&amp;gt; {
  res.status(200).send(getRelevantPostsFor(req.userId));
});
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is so repetitive that we end up creating scripts to generate these files. But why do we need to do this at all? They are often coupled very closely to the client anyways. Why can’t we just expose the database to the client?&lt;/p&gt;
&lt;h2&gt;F. Permissions&lt;/h2&gt;
&lt;p&gt;Well, the reason we don’t, is because we need to make sure permissions are correctly set. You should only see posts by your friends, for example. To do this, we add middleware to our API endpoints:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;app.put(&amp;quot;user&amp;quot;, auth, (req, res) =&amp;gt; {
  ...
});
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;But, this ends up getting more and more confusing. What about websockets? New code changes sometimes introduce ways to update database objects that you didn’t expect. All of a sudden, you’re in trouble.&lt;/p&gt;
&lt;p&gt;The question to ask here is: why is authorization at the API level? Ideally, we should have something &lt;em&gt;very close&lt;/em&gt; to the database, making sure any data access passes permission checks. There’s row-level security on databases like Postgres, but that can get hairy quick &lt;sup id=&quot;user-content-fnref-3&quot;&gt;&lt;a href=&quot;#user-content-fn-3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;. What if you could &amp;quot;describe&amp;quot; entities near the database?&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;User {
  view: [
    IAllowIfAdmin(),
    IAllowIfFriend(),
    IAllowIfSameUser(),
  ]
  write: [
    IAllowIfAdmin(),
    IAllowIfSameUser(),
  ]
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here we compose permission rules, and make sure that &lt;em&gt;any&lt;/em&gt; way you try to write to or update a user entity, you are guaranteed to be permitted. All of a sudden, instead of most code changes affecting permissions, only a few do.&lt;/p&gt;
&lt;h2&gt;G. Audits, Undo / Redo&lt;/h2&gt;
&lt;p&gt;And at some point, we get requirements that blow up complexity for us.&lt;/p&gt;
&lt;p&gt;For example, say we need to support &amp;quot;undo / redo&amp;quot;, for friendship actions. A user deletes a friend, and then they press &amp;quot;undo&amp;quot; — how could we support this?&lt;/p&gt;
&lt;p&gt;We can’t just delete the friendship relation, because if we did, then we wouldn’t know if this person was &amp;quot;already friends&amp;quot;, or was just asking now to become friends. In the latter case we may need to send a friend request.&lt;/p&gt;
&lt;p&gt;To solve this, we’d evolve our data model. Instead of a single friendship relation, we’d have &amp;quot;friendship facts&amp;quot;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;[
  { status: &amp;#39;friends&amp;#39;, friend_one_id: 1, friend_two_id: 2, at: 1000 },
  { status: &amp;#39;disconnected&amp;#39;, friend_one_id: 1, friend_two_id: 2, at: 10001 },
];
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then the &amp;quot;latest fact&amp;quot; would represent whether there is a friendship or not.&lt;/p&gt;
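&lt;p&gt;To make that concrete, here’s a sketch of reading the &amp;quot;latest fact&amp;quot; (&lt;code&gt;isFriends&lt;/code&gt; is a hypothetical helper, not part of any real API):&lt;/p&gt;

```javascript
// Hypothetical helper: the fact with the highest timestamp
// decides whether the friendship currently exists.
function isFriends(facts) {
  if (facts.length === 0) return false;
  const latest = facts.reduce((a, b) => (b.at > a.at ? b : a));
  return latest.status === 'friends';
}

const facts = [
  { status: 'friends', friend_one_id: 1, friend_two_id: 2, at: 1000 },
  { status: 'disconnected', friend_one_id: 1, friend_two_id: 2, at: 10001 },
];

isFriends(facts); // false, and the history is still there for undo
```

&lt;p&gt;An &amp;quot;undo&amp;quot; then just appends a new fact, rather than resurrecting a deleted row.&lt;/p&gt;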
&lt;p&gt;This works, but most databases weren’t designed for it: the queries don’t work as we expect, and optimizations are harder than we’d like. We end up having to be &lt;em&gt;very&lt;/em&gt; careful about how we do updates, in case we end up accidentally deleting records.&lt;/p&gt;
&lt;p&gt;All of a sudden, we become &amp;quot;sort of database engineers&amp;quot;, devouring literature on query optimization.&lt;/p&gt;
&lt;p&gt;This kind of requirement seems unique, but it’s getting more common. If you deal with financial transactions, you need something like this for auditing purposes. Undo / Redo is a necessity in lots of apps.&lt;/p&gt;
&lt;p&gt;And god forbid an error happens and we accidentally delete data. In a world of facts there would be no such thing — you can just undo the deletions. But alas, this is not the world most of us live in.&lt;/p&gt;
&lt;p&gt;There &lt;em&gt;are&lt;/em&gt; models that treat facts as first-class citizens (Datomic, which we’ll talk about soon), but right now they’re so foreign that it’s rarely what engineers reach for. What if it wasn&amp;#39;t so foreign?&lt;/p&gt;
&lt;h2&gt;H. Offline Mode&lt;/h2&gt;
&lt;p&gt;There are more examples of difficulty. What about offline mode? Many apps are long-running and can go for periods without an internet connection. How can we support this?&lt;/p&gt;
&lt;p&gt;We would have to evolve our data model again, but this time &lt;em&gt;really&lt;/em&gt; keep just about everything as a &amp;quot;fact&amp;quot;, and have a client-side database that evolves its internal state based on those facts. Once a connection is made, we should be able to reconcile changes.&lt;/p&gt;
&lt;p&gt;This gets extremely hard to do. In essence, anyone who implements this becomes a database engineer full-stop. But, if we had a database in the browser, and it acted like a &amp;quot;node&amp;quot; in a distributed database, wouldn’t this just happen automatically for us?&lt;/p&gt;
&lt;p&gt;Turns out, fact-based systems make this much, much easier. Many think we need to resort to operational transforms for stuff like this, but as Figma showed, as long as we’re okay with having a single leader, and are fine with last-write-wins semantics, we can drastically simplify things: facts alone are enough. When the time comes for more serious conflict resolution, you can open up the OT rabbit hole.&lt;/p&gt;
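&lt;p&gt;To make that concrete, here’s a minimal last-write-wins reconciler in the spirit of Figma’s approach. Every name is illustrative: each &amp;quot;fact&amp;quot; sets one property of one object, and on reconnect the newest fact per (object, property) pair wins:&lt;/p&gt;

```javascript
// Minimal last-write-wins merge of server facts and offline client facts.
// A fact sets one property of one object at a given timestamp.
function reconcile(serverFacts, clientFacts) {
  const byKey = new Map();
  for (const f of serverFacts.concat(clientFacts)) {
    const key = f.object + '/' + f.property;
    const existing = byKey.get(key);
    // Keep whichever fact for this (object, property) is newest.
    if (existing === undefined || f.at > existing.at) byKey.set(key, f);
  }
  return Array.from(byKey.values());
}
```

&lt;p&gt;A single leader assigns the ordering, and every client converges on the same winning facts.&lt;/p&gt;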
&lt;p&gt;Imagine… offline mode off the bat. What would most applications feel like after this?&lt;/p&gt;
&lt;h2&gt;I. Reactivity&lt;/h2&gt;
&lt;p&gt;We talked about reactivity from the client. On the server it’s a concern too. We have to ensure that &lt;em&gt;all&lt;/em&gt; the relevant clients are updated when data changes. For example, if a &amp;quot;post&amp;quot; is added, we &lt;em&gt;need&lt;/em&gt; to make sure that all possible subscriptions related to this post are notified.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;function addPost(post) {
  db.addPost(post);
  getAllFriends(post).forEach(notifyNewPost);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This can get hairy. It’s hard to know &lt;em&gt;all&lt;/em&gt; the topics that could be related. It could also be easy to miss: if a database is updated with a query outside of &lt;code&gt;addPost&lt;/code&gt;, we’d never know. This work is up to the developer to figure out. It starts off easy, but gets ever more complex.&lt;/p&gt;
&lt;p&gt;Yet, the database &lt;em&gt;could&lt;/em&gt; be aware of all these subscriptions too, and &lt;em&gt;could&lt;/em&gt; just handle updating the relevant queries. But most don’t. RethinkDB is the shining example that did this well. What if this was possible with the query language of your choice?&lt;/p&gt;
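&lt;p&gt;As a toy sketch of that idea (names invented for illustration), imagine a store that owns both the write path and the subscriptions, so no code path can forget to notify:&lt;/p&gt;

```javascript
// A store that tracks its own subscriptions: every write goes through
// insert(), so the store itself notifies the right queries.
class ReactiveStore {
  constructor() {
    this.rows = [];
    this.subscriptions = [];
  }
  subscribe(predicate, callback) {
    this.subscriptions.push({ predicate, callback });
  }
  insert(row) {
    this.rows.push(row);
    // Because the store owns the write path, no caller can forget
    // to notify subscribers.
    for (const sub of this.subscriptions) {
      if (sub.predicate(row)) {
        sub.callback(this.rows.filter(sub.predicate));
      }
    }
  }
}
```

&lt;p&gt;This is the inversion RethinkDB offered: the database tracks who cares about what, instead of leaving that bookkeeping to the developer.&lt;/p&gt;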
&lt;h2&gt;J. Derived Data&lt;/h2&gt;
&lt;p&gt;Eventually, we end up needing to put our data in different places: caches (Redis), search indexes (ElasticSearch), or analytics engines (Hive). Doing this becomes pretty daunting. You may need to introduce some sort of a queue (Kafka), so all of these derived sources are kept up to date. Much of this involves provisioning machines, introducing service discovery, and the whole shebang.&lt;/p&gt;
&lt;p&gt;Why is this so complicated though? In a normal database you can do something like:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-SQL&quot;&gt;CREATE INDEX ...
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Why can’t we do that for other services? Martin Kleppmann, in his &lt;em&gt;Designing Data-Intensive Applications&lt;/em&gt;, suggests a language like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;db |&amp;gt; ElasticSearch;
db |&amp;gt; Analytics;
db.user |&amp;gt; Redis;
// Bam, we&amp;#39;ve connected elastic search, analytics, and redis to our db
&lt;/code&gt;&lt;/pre&gt;
&lt;h1&gt;Monkey Wrenches&lt;/h1&gt;
&lt;p&gt;Wow, we’ve gone up to &lt;strong&gt;J.&lt;/strong&gt; But these are only issues you face once you start building your application. What about before?&lt;/p&gt;
&lt;h2&gt;K. TTP — Time to Prototype&lt;/h2&gt;
&lt;p&gt;Perhaps the most restrictive problem for developers today is how hard it is to get started. If you want to store user information and display a page, what do you do?&lt;/p&gt;
&lt;p&gt;Before, it was a matter of &lt;code&gt;index.html&lt;/code&gt; and FTP. Now it’s webpack, TypeScript, build processes galore, and often multiple services. There are so many moving pieces that it’s hard to take a step.&lt;/p&gt;
&lt;p&gt;This can seem like a problem only inexperienced people need to contend with, one that goes away once they spend some time. I think it’s more important than that. Most projects live on the fringe — they aren’t stuff you do as a day job. This means that even a few minutes’ delay in prototyping could kill an order of magnitude more projects.&lt;/p&gt;
&lt;p&gt;Making this step easier would dramatically increase the number of applications we get to use. What if it was &lt;em&gt;easier&lt;/em&gt; than &lt;code&gt;index.html&lt;/code&gt; and &lt;code&gt;FTP&lt;/code&gt;?&lt;/p&gt;
&lt;h1&gt;Current Solutions&lt;/h1&gt;
&lt;p&gt;Wow, that’s a lot of problems. It may seem bleak, but if you just look a few years back, it’s surprising how much has improved. After all, we don’t need to roll our own racks anymore. Many great folks are working on solutions to these problems. What are some of them?&lt;/p&gt;
&lt;h2&gt;1) Firebase&lt;/h2&gt;
&lt;p&gt;I think Firebase has done some of the most innovative work in moving web application development forward. The most important thing they got right was a &lt;strong&gt;database on the browser.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;With Firebase, you query your data the same way you would on the server. By creating this abstraction, they solved &lt;strong&gt;A-E.&lt;/strong&gt; Firebase handles optimistic updates, and is reactive by default. It obviates the need for endpoints by providing support for permissions.&lt;/p&gt;
&lt;p&gt;Their strength also shows in &lt;strong&gt;K:&lt;/strong&gt; I think it still has the &lt;em&gt;best&lt;/em&gt; time-to-prototype on the market. You can just start with index.html!&lt;/p&gt;
&lt;p&gt;However, it has two problems:&lt;/p&gt;
&lt;p&gt;First, query strength. Firebase’s choice of a document model makes the abstraction simpler to manage, but it destroys your query capability. Very often you’ll fall into a place where you have to de-normalize data, or querying for it becomes tricky. For example, to record a many-to-many relationship like a friendship, you’d need to do something like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;{
  userA: { friends: { userBId: true } },
  userB: { friends: { userAId: true } },
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You de-normalize friendships across two different paths (userA/friends/userBId) and (userB/friends/userAId). Grabbing the full data requires you to manually replicate a join:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;1. get `userA/friends`
2. for each id, get `/${id}`
&lt;/code&gt;&lt;/pre&gt;
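&lt;p&gt;Sketched against an in-memory stand-in for the document tree (&lt;code&gt;get&lt;/code&gt; here is a hypothetical helper, not Firebase’s actual API), the manual join looks like this:&lt;/p&gt;

```javascript
// An in-memory stand-in for the document tree shown above.
const tree = {
  userA: { name: 'Alice', friends: { userB: true } },
  userB: { name: 'Bob', friends: { userA: true } },
};

// Hypothetical path lookup, standing in for a document-store fetch.
function get(path) {
  return path.split('/').reduce((node, key) => node[key], tree);
}

function getFriends(userId) {
  const friendIds = Object.keys(get(userId + '/friends')); // step 1
  return friendIds.map((id) => get(id)); // step 2: one fetch per friend
}
```

&lt;p&gt;Note that step 2 is one round trip per friend; that’s the join you end up replicating by hand.&lt;/p&gt;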
&lt;p&gt;These kinds of relationships sprout up very quickly in your application. It would be great if a solution helped you handle them.&lt;/p&gt;
&lt;p&gt;Second, permissions. Firebase lets you write permissions using a limited language. In practice, these rules get hairy quickly — to the point that folks resort to writing some higher-level language themselves and compiling down to Firebase rules.&lt;/p&gt;
&lt;p&gt;We experimented a lot on this at Facebook, and came to the conclusion that you need a &lt;em&gt;real language&lt;/em&gt; to express permissions. If Firebase had that, it would be much more powerful.&lt;/p&gt;
&lt;p&gt;As for the remaining items (audits, Undo / Redo, Derived Data) — Firebase hasn’t tackled them yet.&lt;/p&gt;
&lt;h2&gt;2) Supabase&lt;/h2&gt;
&lt;p&gt;Supabase is trying to do for Postgres what Firebase did for the document model. If they pull this off, it would be quite an attractive option, as it would solve Firebase’s biggest problem: query strength.&lt;/p&gt;
&lt;p&gt;Supabase has some great wins so far. Their auth abstraction is great, which makes it one of the few platforms that are as easy to get started with as Firebase.&lt;/p&gt;
&lt;p&gt;Their realtime option allows you to subscribe to row-level updates. For example, if we wanted to know whenever a friendship gets created, updated, or deleted, we could write this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;const friendsChange = supabase
  .from(&amp;#39;friendships:friend_one_id=eq.200&amp;#39;)
  .on(&amp;#39;*&amp;#39;, handleFriendshipChange)
  .subscribe();
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In practice this can get you far. It can get hairy though. For example, if a friendship is created, we may not have the other user’s information, and we’d have to fetch it:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;function handleFriendshipChange(friendship) {
  if (!userStore.get(friendship.friend_two_id)) {
    fetchUser(...)
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This points to Supabase’s main weakness: it doesn’t have a &amp;quot;database on the browser&amp;quot; abstraction. Though you can make queries, you are responsible for normalizing and massaging data. This means that they can’t do optimistic updates automatically, reactive queries, etc.&lt;/p&gt;
&lt;p&gt;Their permission model is also similar to Firebase’s, in that they defer to Postgres’ row-level security. This can be great to start out, but like Firebase it gets hairy quickly. Often these rules can slow down the query optimizer, and the SQL itself gets harder and harder to reason about.&lt;/p&gt;
&lt;h2&gt;3) GraphQL + Hasura&lt;/h2&gt;
&lt;p&gt;GraphQL is an excellent way to declaratively define the data you want from the client. Services like Hasura can take a database like Postgres and do smart things with it, like give you a GraphQL API out of the box.&lt;/p&gt;
&lt;p&gt;Hasura is very compelling for reads. They do a smart job of figuring out joins, and can give you a good view of your data. With a flip, you can turn any query into a subscription. When I first tried turning a query into a subscription, it certainly felt magical.&lt;/p&gt;
&lt;p&gt;The big issue today with GraphQL tools in general is their time-to-prototype. You often need multiple libraries and build steps. Their write-story is less compelling too: optimistic updates don’t just happen automatically — you have to bust caches yourself.&lt;/p&gt;
&lt;h2&gt;Lay of the Land&lt;/h2&gt;
&lt;p&gt;We’ve looked at the three most promising solutions. Right now, Firebase solves the most problems off the bat. Supabase gives you query strength at the expense of less client-side support. Hasura gives you more powerful subscriptions and more powerful local state, at the expense of time-to-prototype. As far as I can see, none are handling conflict resolution, undo / redo, or powerful reactive queries on the client yet.&lt;/p&gt;
&lt;h1&gt;Future&lt;/h1&gt;
&lt;p&gt;Now the question: what will the evolution of these tools look like?&lt;/p&gt;
&lt;p&gt;In some ways, the future is happening now. I think Figma, for example, is an app from the future: it handles offline mode, undo / redo, and multiplayer beautifully.&lt;/p&gt;
&lt;p&gt;If we wanted to make an app like that, what would an ideal abstraction for data look like?&lt;/p&gt;
&lt;h2&gt;Requirements&lt;/h2&gt;
&lt;h3&gt;1) A database on the client, with a &lt;em&gt;powerful&lt;/em&gt; query language&lt;/h3&gt;
&lt;p&gt;From the browser, this abstraction would have to be like Firebase, &lt;em&gt;but with a strong query language.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;You should be able to query your local data, and it should be as powerful as SQL. Your queries should be reactive, and update automatically if there are changes. It should handle optimistic updates for you too.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;const user = useQuery(&amp;#39;SELECT * FROM users WHERE id = ?&amp;#39;, 10);
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;2) A real permission language&lt;/h3&gt;
&lt;p&gt;Next up, we’d need a composable permission language. FB’s EntFramework is the example I keep going back to, because of how powerful it was. We should be able to define rules on entities, and be guaranteed that we won’t accidentally see something we’re not allowed to see.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;User {
  view: [
    IAllowIfAdmin(),
    IAllowIfFriend(),
    IAllowIfSameUser(),
  ]
  write: [
    IAllowIfAdmin(),
    IAllowIfFriend(),
  ]
}
&lt;/code&gt;&lt;/pre&gt;
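&lt;p&gt;A runnable sketch of what those rule lists could mean (all names invented for illustration): each rule is a plain function, so rules compose and can be shared across entity types, and a viewer may see an entity if any rule passes:&lt;/p&gt;

```javascript
// Each rule returns a predicate over (viewer, entity). Because rules
// are plain functions, they compose and can be reused across entities.
const IAllowIfAdmin = () => (viewer, entity) => viewer.isAdmin === true;
const IAllowIfSameUser = () => (viewer, entity) => viewer.id === entity.id;

// A viewer can see an entity if any rule in the list allows it.
function canView(rules, viewer, entity) {
  return rules.some((rule) => rule(viewer, entity));
}

const userViewRules = [IAllowIfAdmin(), IAllowIfSameUser()];
```

&lt;p&gt;This is the appeal of a &lt;em&gt;real language&lt;/em&gt; for permissions: the rules are just code, so they can be tested and reused like any other code.&lt;/p&gt;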
&lt;h3&gt;3) Offline Mode &amp;amp; Undo / Redo&lt;/h3&gt;
&lt;p&gt;Third, this abstraction should make it easy for us to implement offline mode and undo / redo. If a local write happens, and there’s a conflicting write on the server, there should be a reconciler which does the right thing most of the time. If there are issues, we should be able to nudge it along in the right direction.&lt;/p&gt;
&lt;p&gt;Whatever abstraction we choose, it should give us the ability to run writes while we’re offline.&lt;/p&gt;
&lt;h3&gt;4) The Next Cloud&lt;/h3&gt;
&lt;p&gt;Finally, we should be able to express data dependencies without having to spin anything up. With a simple&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;db.user |&amp;gt; Redis;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;all queries to users would magically be cached by Redis.&lt;/p&gt;
&lt;h2&gt;Sketch of an Implementation&lt;/h2&gt;
&lt;p&gt;Okay, those requirements sound magical. What would an implementation look like today?&lt;/p&gt;
&lt;h3&gt;Datomic &amp;amp; Datascript&lt;/h3&gt;
&lt;p&gt;In the Clojure world, folks have long been fans of Datomic, a facts-based database that lets you &amp;quot;see every change over time&amp;quot;. Nikita Tonsky also implemented DataScript, &lt;em&gt;a client-side database and query engine&lt;/em&gt; with the same semantics as Datomic!&lt;/p&gt;
&lt;p&gt;They’ve been used to build offline-enabled applications like Roam, and collaborative applications like Precursor. If we were to package up a Datomic-like database on the backend, and a DataScript-like database on the frontend, it &lt;em&gt;could&lt;/em&gt; become the &amp;quot;database on the client with a powerful query language&amp;quot;!&lt;/p&gt;
&lt;h3&gt;Reactivity&lt;/h3&gt;
&lt;p&gt;Datomic makes it easy to subscribe to new facts committed to the database. What if we made a service on top of it, which kept track of queries and listened to these facts? On each change, we would update the relevant queries. All of a sudden, our database becomes realtime!&lt;/p&gt;
&lt;h3&gt;Permission Language&lt;/h3&gt;
&lt;p&gt;Our server could accept code fragments, which it runs when fetching data. These fragments would be responsible for permissions, giving us a powerful permission language!&lt;/p&gt;
&lt;h3&gt;Pipe&lt;/h3&gt;
&lt;p&gt;Finally, we can write a small DSL, which lets you pipe data to ElasticSearch, Redis, etc., all according to the user’s preferences.&lt;/p&gt;
&lt;p&gt;With that, we have a compelling offering.&lt;/p&gt;
&lt;h2&gt;Considerations&lt;/h2&gt;
&lt;p&gt;So, why doesn’t this exist yet? Well...&lt;/p&gt;
&lt;h3&gt;Datalog is unfamiliar&lt;/h3&gt;
&lt;p&gt;If we were to use a Datomic-like database, we wouldn’t use SQL anymore. Datomic uses a logic-based query language called Datalog. It is just as powerful as SQL, if not more so. The only gotcha is that for the uninitiated it looks very daunting:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-clojure&quot;&gt;[:find [(pull ?c [:conversation/user :conversation/message]) ...]
 :where [?e :session/thread ?thread-id]
        [?c :conversation/thread ?thread-id]]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This query would find all messages, along with the user information, for the active thread in the current &amp;quot;session&amp;quot;. Not bad!&lt;/p&gt;
&lt;p&gt;Once you get to know it, it’s an unbelievably elegant language. However, I don’t think that’s enough. Time-to-prototype needs to be blazing fast, and having to learn this may be too much.&lt;/p&gt;
&lt;p&gt;There have been some fun experiments in making this easier. Dennis Heihoff tried &lt;a href=&quot;https://twitter.com/denik/status/1290415892367540227&quot;&gt;using natural language&lt;/a&gt; for example. This points to an interesting solution: Could we write a slightly more verbose, but more natural query language that compiles to Datalog? I think so.&lt;/p&gt;
&lt;p&gt;The other problem is that data modeling is also different from what people are used to. Firebase is the gold standard here: you can write your first mutation without specifying any schema.&lt;/p&gt;
&lt;p&gt;Though it will be hard, I think we should aim to be as close to &amp;quot;easy&amp;quot; as possible. Datascript only requires you to indicate references and multi-valued attributes. Datomic requires a schema, but perhaps if we used an open-source, datalog-based database, we could enhance it to do something similar. Either as little schema as possible, or a &amp;quot;magically detectable schema&amp;quot;.&lt;/p&gt;
&lt;h3&gt;Datalog would be hard to make reactive&lt;/h3&gt;
&lt;p&gt;A big problem with both SQL and Datalog is that, given some new change, it’s hard to figure out &lt;em&gt;which&lt;/em&gt; queries need to be updated.&lt;/p&gt;
&lt;p&gt;I don’t think it’s impossible though. Hasura does polling, and it scaled &lt;sup id=&quot;user-content-fnref-4&quot;&gt;&lt;a href=&quot;#user-content-fn-4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;. We &lt;em&gt;could&lt;/em&gt; try having a specific language for subscriptions as well, similar to Supabase. If we can prove certain queries can only change from some subset of facts, we can move them out of polling.&lt;/p&gt;
&lt;p&gt;This is a hard problem, but I think it’s a tractable one.&lt;/p&gt;
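&lt;p&gt;For instance (an illustrative sketch, not a real engine), if each subscription declared the fact patterns it depends on, a new fact would only wake the queries whose pattern it matches:&lt;/p&gt;

```javascript
// A pattern constrains any of entity / attribute / value;
// an absent field acts as a wildcard.
function matches(pattern, fact) {
  return Object.keys(pattern).every((k) => pattern[k] === fact[k]);
}

// Only subscriptions whose pattern the new fact matches need re-running;
// everything else can stay out of the polling loop.
function affectedSubscriptions(subscriptions, fact) {
  return subscriptions.filter((sub) => matches(sub.pattern, fact));
}
```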
&lt;h3&gt;A permission language would slow things down&lt;/h3&gt;
&lt;p&gt;One problem with making permission checks a full-blown language, is that we’re liable to overfetch data.&lt;/p&gt;
&lt;p&gt;I think this is a valid concern, but with a database like Datomic, we could handle it. Reads are easy to scale and cache. Because everything’s a fact, we could create an interface that guides people to only fetch the values they need.&lt;/p&gt;
&lt;p&gt;Facebook was able to do it. It will be hard, but it’s possible.&lt;/p&gt;
&lt;h3&gt;It may be too large of an abstraction&lt;/h3&gt;
&lt;p&gt;Frameworks often fail to generalize. For example, what if we wanted to share mouse position? This is ephemeral state and doesn’t fit in a database, but we do need to make it realtime — where would we keep it? There are a lot of these kinds of things that will pop up if you build an abstraction like this, and you’re likely to get some of them wrong.&lt;/p&gt;
&lt;p&gt;I do think this is a problem. If someone were to tackle this, the best bet would be to go the Rails approach: Build a production app using it, and extract the internals out as a product. I think they’d have a good shot at finding the right abstraction.&lt;/p&gt;
&lt;h3&gt;It will only be used for toys&lt;/h3&gt;
&lt;p&gt;The common issue with these kinds of products is that people will only use them for hobby projects, and there won’t be a lot of money in it. I think Heroku and Firebase point to a bright future here.&lt;/p&gt;
&lt;p&gt;Large companies start as side-projects. Older engineers may look at Firebase like a toy, but many a successful startup now runs on it. Instead of being just a database, perhaps it’ll become a whole new platform — the successor to AWS.&lt;/p&gt;
&lt;h3&gt;The Market is very competitive&lt;/h3&gt;
&lt;p&gt;The market is competitive and the users are fickle. Slava’s &lt;a href=&quot;https://www.defmacro.org/2017/01/18/why-rethinkdb-failed.html&quot;&gt;Why RethinkDB Failed&lt;/a&gt; paints a picture for how hard it is to win in the developer tools market. I don’t think he is wrong. Doing this would require a compelling answer to how you’ll build a moat, and expand towards &lt;em&gt;The Next AWS&lt;/em&gt;.&lt;/p&gt;
&lt;h1&gt;Fin&lt;/h1&gt;
&lt;p&gt;Well, we covered the pains, covered the competitors, covered an ideal solution, and went through the considerations. Thank you for walking with me on this journey!&lt;/p&gt;
&lt;h2&gt;Like-Minded Folks&lt;/h2&gt;
&lt;p&gt;These ideas are not new. My friends Sean Grove and Daniel Woelfel built &lt;a href=&quot;https://www.youtube.com/watch?v=BiplJ4AFwCc&quot;&gt;Dato&lt;/a&gt;, a framework that integrated a bunch of these ideas. Nikita Tonsky wrote &lt;a href=&quot;https://tonsky.me/blog/the-web-after-tomorrow/&quot;&gt;Web After Tomorrow&lt;/a&gt;, an essay with a very similar spirit.&lt;/p&gt;
&lt;p&gt;It may require some iteration to figure out the interface, but there’s an interesting road ahead.&lt;/p&gt;
&lt;h2&gt;Next Up&lt;/h2&gt;
&lt;p&gt;I’m toying with some ideas in this direction. The big questions to answer here are how important this is for people, and whether a good abstraction can work. To address the first, I wrote this essay. Is this a hair-on-fire problem that you’re facing? If it is, to the point that you’re actively looking for solutions, please reach out to me on &lt;a href=&quot;https://twitter.com/stopachka&quot;&gt;Twitter&lt;/a&gt;! I’d love to learn your use case 🙂. As I create applications, I’ll certainly keep this in the back of my mind — who knows, maybe a good abstraction can be pulled out.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Thanks Joe Averbukh, Sean Grove, Ian Sinnott, Daniel Woelfel, Dennis Heihoff, Mark Shlick, Alex Reichert, Alex Kotliarskyi, Thomas Schranz, for reviewing drafts of this essay&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;user-content-fn-1&quot; href=&quot;#user-content-fnref-1&quot;&gt;[1]&lt;/a&gt;  You may not notice this as Postgres gives a consistency guarantee. However, for them to support multiple concurrent transactions, they in effect need to be able to keep &amp;quot;temporary alterations&amp;quot;.&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;user-content-fn-2&quot; href=&quot;#user-content-fnref-2&quot;&gt;[2]&lt;/a&gt;  Figma mentions this problem in &lt;a href=&quot;https://www.figma.com/blog/how-figmas-multiplayer-technology-works/&quot;&gt;their multiplayer essay&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;user-content-fn-3&quot; href=&quot;#user-content-fnref-3&quot;&gt;[3]&lt;/a&gt;  Plain SQL and boolean logic is hard to reuse, and can slow down the query planner. Many folks who have medium-sized apps experience this quickly.&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;user-content-fn-4&quot; href=&quot;#user-content-fnref-4&quot;&gt;[4]&lt;/a&gt;  Take a look at Hasura’s &lt;a href=&quot;https://github.com/hasura/graphql-engine/blob/master/architecture/live-queries.md&quot;&gt;notes&lt;/a&gt;&lt;/p&gt;
</description>
      <pubDate>Thu, 29 Apr 2021 00:00:00 GMT</pubDate>
      <author>Stepan Parunashvili</author>
    </item>
  </channel>
</rss>