If you open Google Docs and start typing with a friend, you’ll see cursors moving in real time, text appearing instantly, and edits merging without conflicts. It feels simple on the surface, but behind it sits a carefully designed architecture built for concurrency, consistency, and low-latency updates.
This article is a continuation of the Rich Text Editor System Design.
Understanding that design is a prerequisite for this post, since we will extend the same single-user editor into a collaborative system.
R — Requirements
Scope
We are designing a browser-based collaborative rich text editor similar to Google Docs that supports multiple users editing the same document simultaneously.
Since the core editor architecture is already explained in the previous post, this article focuses mainly on:
- Real-time conflict resolution
- Real-time communication
- Version history
Functional Requirements
- Real-time multi-user editing
- Automatic conflict resolution
- Live cursor and selection visibility
- Real-time communication channel (client ↔ server)
- Persistent version history
- Autosave and cloud sync
- Presence awareness (join/leave, active users)
Non-Functional Requirements
- Low-latency local typing
- Eventual consistency across clients
- Offline editing with safe resynchronization
- Scalability for multiple concurrent users
- Reliability under network failures
A — Architecture
In the single-user editor, our update pipeline looked like:
Events → Commands → Editor State → Reconciler → DOM
This works because there is only one source of edits. Collaboration breaks this assumption.
Now we must synchronize multiple editor states across multiple clients in real time, while keeping all documents consistent.
To solve this, we introduce a new layer between the editor and the server:
👉 The Collaboration Engine
Why collaboration breaks normal editors
In a collaborative editor:
Multiple users can edit the same position at the same time.
Example:
Initial text: Hello world
User A deletes world
User B types beautiful
Without a merge strategy, clients may end up with different documents.
The system must guarantee convergence — all clients eventually see the same document.
This problem is called:
Distributed Real-Time Editing
There are four main approaches to resolve the issue:
- Last Write Wins (LWW)
- Locking systems
- Operational Transform (OT)
- CRDT
Only OT and CRDT truly support modern collaborative editing, but we’ll briefly cover all four.
1) Last Write Wins (LWW)
Idea:
The latest update overwrites previous changes.
How updates are sent
Clients typically send:
- Entire document, or
- Entire edited section/block
Server simply replaces the previous version.
Result:
Last update wins → earlier edits may be lost.
Verdict: Too destructive for collaboration.
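The lost-update behavior can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical in-memory `LwwStore` and comparable write timestamps; neither comes from a real library.

```typescript
// Hypothetical in-memory store: one full-document value per document ID.
interface LwwWrite {
  content: string;   // entire document text sent by the client
  timestamp: number; // write time, assumed comparable across clients
}

class LwwStore {
  private docs = new Map<string, LwwWrite>();

  // Accept the write only if it is at least as new as what is stored.
  write(docId: string, incoming: LwwWrite): void {
    const current = this.docs.get(docId);
    if (!current || incoming.timestamp >= current.timestamp) {
      this.docs.set(docId, incoming);
    }
  }

  read(docId: string): string {
    return this.docs.get(docId)?.content ?? "";
  }
}

// Both users start from "Hello world" and each sends their whole document.
const lwwStore = new LwwStore();
lwwStore.write("doc-1", { content: "Hello world", timestamp: 1 });
lwwStore.write("doc-1", { content: "Hello", timestamp: 2 });                 // User A deleted "world"
lwwStore.write("doc-1", { content: "Hello beautiful world", timestamp: 3 }); // User B typed "beautiful"

// User A's deletion is silently overwritten by User B's later write.
const lwwResult = lwwStore.read("doc-1");
```

Nothing in the store can tell that User A's edit was dropped, which is why this model is unsuitable for an editor.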
2) Locking System (Single editor / Round robin)
Idea:
Only one user can edit a section at a time.
Other users must wait.
Example
- User A locks paragraph → edits
- User B must wait until lock released
Verdict: Safe but terrible UX. Not real-time.
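As a rough sketch of the locking idea, assuming a hypothetical `SectionLocks` table keyed by section ID (the class and method names are illustrative only):

```typescript
// Hypothetical lock table: at most one holder per section (e.g. a paragraph ID).
class SectionLocks {
  private holders = new Map<string, string>(); // sectionId -> userId

  // Returns true if the user now holds the lock, false if someone else does.
  tryAcquire(sectionId: string, userId: string): boolean {
    const holder = this.holders.get(sectionId);
    if (holder !== undefined && holder !== userId) return false;
    this.holders.set(sectionId, userId);
    return true;
  }

  // Only the current holder can release the lock.
  release(sectionId: string, userId: string): boolean {
    if (this.holders.get(sectionId) !== userId) return false;
    this.holders.delete(sectionId);
    return true;
  }
}

const sectionLocks = new SectionLocks();
const aGotLock = sectionLocks.tryAcquire("paragraph-1", "userA");      // User A starts editing
const bBlocked = sectionLocks.tryAcquire("paragraph-1", "userB");      // User B is refused and must wait
sectionLocks.release("paragraph-1", "userA");
const bAfterRelease = sectionLocks.tryAcquire("paragraph-1", "userB"); // now User B can edit
```

The waiting step is the whole problem: while one user holds the lock, nobody else can type in that section.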
3) Operational Transform (OT)
Used by: Google Docs, ShareDB
Think of OT as a central traffic controller for edits.
The intuition
Users do not send the whole document.
They send operations based on the version of the document they see.
Typical operations:
- Insert(text, position)
- Delete(range)
There is no “replace” operation.
Replace = Delete + Insert.
All operations go to a central server, which becomes the source of truth.
What problem does OT solve?
Multiple users can edit the same text at the same time.
Their operations are based on an older version of the document.
So when operations arrive at the server, they may no longer make sense.
The server must transform operations so they still work on the updated document.
This is the core idea of OT.
What does “transform” actually mean?
Transform =
Rewrite an operation so it still makes sense after earlier edits happened.
Let’s walk through a real example.
Example: Two users edit the same word
Initial text:
Hello world
Positions:
0 1 2 3 4 5 6 7 8 9 10
H e l l o _ w o r l d
Range 6–11 (end-exclusive) = "world"
Two users edit simultaneously
User A changes "world" → "helloo"
User B changes "world" → "averioo"
Both edits are based on the same old document.
Step 1 — Convert edits into operations
User A:
Delete(6–11)
Insert("helloo", 6)
User B:
Delete(6–11)
Insert("averioo", 6)
Both reach the server at the same time.
Step 2 — Server picks a deterministic order
Assume:
- Apply User A first
- Transform User B operations against A
Step 3 — Apply User A operations
Start: Hello world
After Delete(6–11): Hello
After Insert("helloo", 6): Hello helloo
This becomes the official server document.
Step 4 — Transform User B operations
User B created operations assuming "Hello world" still exists.
But now the document is "Hello helloo".
So we must transform B’s operations.
Transform Delete(6–11)
User B wanted to delete "world".
But it was already deleted by User A.
So this operation becomes:
👉 No-op (do nothing)
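Transform Insert("averioo",6)
Insert is still valid, but User A also inserted at position 6, so the server needs a tie-break rule for inserts at the same position. Here we assume the incoming insert keeps its position and lands before A's text.
Apply to current document:
Hello helloo → insert at 6 →
Final: Hello averioohelloo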
Final result (same for everyone)
Hello averioohelloo
All clients replay the same transformed operations in the same order, so everyone ends up with the same document.
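The walkthrough above can be reproduced with a small transform function. This is a simplified sketch, not how Google Docs or ShareDB implement OT: the names `TextOp`, `transformOp`, and `transformAll` are made up for illustration, delete ranges are end-exclusive, and the case where an insert lands strictly inside a concurrent delete is left out to keep it short.

```typescript
type TextOp =
  | { kind: "insert"; pos: number; text: string }
  | { kind: "delete"; start: number; end: number };

function applyOp(docText: string, op: TextOp): string {
  return op.kind === "insert"
    ? docText.slice(0, op.pos) + op.text + docText.slice(op.pos)
    : docText.slice(0, op.start) + docText.slice(op.end);
}

// Rewrite `op` so it can be applied AFTER `against` has already been applied.
// `opWins` breaks the tie when both operations insert at the same position.
function transformOp(op: TextOp, against: TextOp, opWins: boolean): TextOp {
  if (against.kind === "insert") {
    const len = against.text.length;
    if (op.kind === "insert") {
      const stays = op.pos < against.pos || (op.pos === against.pos && opWins);
      return stays ? op : { ...op, pos: op.pos + len };
    }
    if (against.pos <= op.start) return { ...op, start: op.start + len, end: op.end + len };
    if (against.pos >= op.end) return op;
    // A full implementation would split the delete around the inserted text.
    throw new Error("insert inside a concurrent delete is not handled in this sketch");
  }
  // `against` is a delete: map positions through the removed range.
  const { start: delStart, end: delEnd } = against;
  const shift = (p: number) =>
    p <= delStart ? p : p >= delEnd ? p - (delEnd - delStart) : delStart;
  if (op.kind === "insert") return { ...op, pos: shift(op.pos) };
  return { ...op, start: shift(op.start), end: shift(op.end) }; // may collapse to an empty range (no-op)
}

// Transform a client's operation list against the operations the server already applied.
function transformAll(incoming: TextOp[], applied: TextOp[]): TextOp[] {
  let serverOps = applied;
  const result: TextOp[] = [];
  for (let op of incoming) {
    const nextServerOps: TextOp[] = [];
    for (const s of serverOps) {
      nextServerOps.push(transformOp(s, op, false)); // server op as seen after `op`
      op = transformOp(op, s, true);                 // `op` as seen after the server op
    }
    serverOps = nextServerOps;
    result.push(op);
  }
  return result;
}

const opsA: TextOp[] = [{ kind: "delete", start: 6, end: 11 }, { kind: "insert", pos: 6, text: "helloo" }];
const opsB: TextOp[] = [{ kind: "delete", start: 6, end: 11 }, { kind: "insert", pos: 6, text: "averioo" }];

let serverDoc = "Hello world";
for (const op of opsA) serverDoc = applyOp(serverDoc, op);      // User A applied first
const transformedB = transformAll(opsB, opsA);                  // B's delete collapses to an empty range
for (const op of transformedB) serverDoc = applyOp(serverDoc, op);
```

With `opWins` set to `true` for the incoming operation, User B's insert keeps position 6, which matches the result in the walkthrough. Choosing the opposite tie-break would yield "Hello hellooaverioo"; either is fine as long as every participant uses the same rule.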
Why OT works well
The server guarantees:
- Same operation order
- Same transformed operations
- Same final document
4) CRDT (Conflict-free Replicated Data Types)
Used by: Yjs and Automerge (CRDT libraries); products such as Figma describe their approach as CRDT-inspired
CRDT solves collaboration by designing operations so they always merge safely, even if users edit simultaneously or while offline.
There is no central conflict resolver — every client can merge updates independently and still reach the same final document.
How CRDT resolves conflicts — Step-by-step example
Initial document:
Hello world
Two users go offline and edit at the same time.
User A replaces world → helloo
User B replaces world → averioo
Both edited the same word concurrently.
Step 1 — Convert edits into CRDT operations
CRDT does not send “replace text”.
Instead it sends character-level operations with unique IDs.
User A operations (conceptually)
User A deletes the characters that form "world" and inserts "helloo".
Insertion is recorded like this:
h(id101 after space)
e(id102 after id101)
l(id103 after id102)
l(id104 after id103)
o(id105 after id104)
o(id106 after id105)
Key idea:
Each new character has:
- A unique ID
- A reference to the character it was inserted after
User B operations
User B deletes "world" and inserts "averioo":
a(id201 after space)
v(id202 after id201)
e(id203 after id202)
r(id204 after id203)
i(id205 after id204)
o(id206 after id205)
o(id207 after id206)
Both users now have different local documents.
User A sees: Hello helloo
User B sees: Hello averioo
Step 2 — Users reconnect and sync
Clients exchange their operations.
Now each client receives:
- Insert operations from User A
- Insert operations from User B
- Delete markers for "world"
Step 3 — Merge rules resolve the conflict
Delete conflict
Both users deleted "world" → no conflict.
Those characters remain deleted.
Insert conflict (same position)
Both users inserted text after the same character (space after “Hello”).
So CRDT now has two insert chains at the same location:
Chain A → "helloo"
Chain B → "averioo"
CRDT uses deterministic ordering to decide which chain comes first.
Typical rule:
- Compare timestamps
- If equal → compare client IDs
Assume User B’s operations come first.
Step 4 — Final merged document
Final merged text becomes:
Hello averioohelloo
All clients apply the same rules →
All clients end up with the same document.
No server decision.
No transforms.
No lost edits.
Why this works
CRDT guarantees:
- All edits are preserved
- Ordering is deterministic
- Message order doesn’t matter
- Offline edits merge safely
This property is called:
👉 Strong eventual consistency
Every replica eventually becomes identical.
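To make the merge rules concrete, here is a minimal sketch of a sequence CRDT in the spirit of the example. It is not how Yjs or Automerge are implemented; `CharItem`, `Replica`, and `typeText` are illustrative names, and the sibling rule (newest timestamp first, then client ID) is an assumption chosen so that User B's chain comes first, as in the walkthrough.

```typescript
// Every character is an item that names the item it was inserted after.
interface CharItem {
  id: string;
  char: string;
  after: string | null; // id of the left neighbour at insert time (null = document start)
  clientId: string;
  timestamp: number;    // logical clock
}

type CrdtOp = { type: "insert"; item: CharItem } | { type: "delete"; id: string };

class Replica {
  private items = new Map<string, CharItem>();
  private tombstones = new Set<string>(); // deleted ids; items are never physically removed

  // State is two grow-only sets, so applying ops is idempotent and order-independent.
  apply(op: CrdtOp): void {
    if (op.type === "insert") this.items.set(op.item.id, op.item);
    else this.tombstones.add(op.id);
  }

  // Deterministic read: depth-first walk; siblings sorted newest timestamp first, then by clientId.
  toText(): string {
    const children = new Map<string | null, CharItem[]>();
    this.items.forEach((item) => {
      const siblings = children.get(item.after) ?? [];
      siblings.push(item);
      children.set(item.after, siblings);
    });
    const order = (a: CharItem, b: CharItem) =>
      b.timestamp - a.timestamp || (a.clientId < b.clientId ? -1 : a.clientId > b.clientId ? 1 : 0);
    const sortedChildren = (parentId: string | null) => (children.get(parentId) ?? []).sort(order);

    let out = "";
    const stack = sortedChildren(null).reverse(); // reversed so pop() yields the first sibling
    while (stack.length > 0) {
      const item = stack.pop()!;
      if (!this.tombstones.has(item.id)) out += item.char;
      stack.push(...sortedChildren(item.id).reverse());
    }
    return out;
  }
}

// Helper: typed text becomes a chain of inserts, each after the previous character.
function typeText(text: string, after: string | null, clientId: string, startTs: number): CrdtOp[] {
  return text.split("").map((char, i): CrdtOp => ({
    type: "insert",
    item: {
      id: `${clientId}-${startTs + i}`,
      char,
      after: i === 0 ? after : `${clientId}-${startTs + i - 1}`,
      clientId,
      timestamp: startTs + i,
    },
  }));
}

const seedOps = typeText("Hello world", null, "seed", 1); // ids seed-1 … seed-11; seed-6 is the space
const deleteWorld = [7, 8, 9, 10, 11].map((n): CrdtOp => ({ type: "delete", id: `seed-${n}` }));
const fromA = [...deleteWorld, ...typeText("helloo", "seed-6", "userA", 101)];
const fromB = [...deleteWorld, ...typeText("averioo", "seed-6", "userB", 201)];

// Two replicas receive the same operations in different orders.
const replica1 = new Replica();
const replica2 = new Replica();
[...seedOps, ...fromA, ...fromB].forEach((op) => replica1.apply(op));
[...fromB.slice().reverse(), ...fromA, ...seedOps].forEach((op) => replica2.apply(op));
```

Because the rendered text is a pure function of the set of items and tombstones, both replicas produce "Hello averioohelloo" regardless of delivery order. An item whose `after` target has not arrived yet is simply not rendered until it does.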
Comparison of approaches
| Approach | Advantages | Disadvantages |
|---|---|---|
| Last Write Wins (LWW) | Very simple to implement; easy storage model; low computation cost | Users can lose work; no real merging; terrible collaboration experience; not suitable for editors |
| Locking System (Single Editor / Round-Robin) | No merge conflicts; simple reasoning; predictable updates | Poor user experience; no true real-time collaboration; doesn’t scale with many users |
| Operational Transform (OT) | Mature and battle-tested; smaller metadata size; strong centralized control; works well for always-online apps | Complex transform logic; requires always-online central server; offline editing is difficult; server becomes bottleneck |
| CRDT (Conflict-free Replicated Data Types) | Offline-first friendly; automatic conflict resolution; no single point of failure; strong eventual consistency | Larger metadata per character; bigger document size; more complex data structures |
Why we chose CRDT over OT
Both OT and CRDT can power collaborative editors, but CRDT fits our requirements better. Google Docs successfully uses OT with a strong centralized server to order and transform operations, whereas our architecture prioritizes offline editing and easier horizontal scaling:
1) Offline editing requirement
Our requirements include offline tolerance.
- OT → needs a central server to transform operations
- CRDT → clients can edit fully offline and sync later
This is a major deciding factor.
2) No single point of conflict resolution
In OT:
- Server is responsible for resolving conflicts.
- If server is unavailable → collaboration breaks.
In CRDT:
- Every client can merge updates independently.
- Server acts mainly as relay + storage.
This improves reliability and scalability.
3) Simpler networking model
OT requires:
- Operation ordering
- Transformation pipelines
- Strict server coordination
CRDT requires:
- Exchange updates
- Apply deterministic merge rules
This makes the frontend-heavy architecture simpler.
4) Better support for modern apps
Many modern collaborative tools lean toward CRDT or CRDT-inspired designs because they need:
- Offline-first capability
- Mobile / unstable network support
- Fast reconnection sync
CRDT aligns better with these needs.
5) Frontend-first architecture
Our system is frontend-centric.
CRDT lets the browser:
- Keep a full document replica
- Continue working without server
- Sync when connection returns
This matches modern web architecture trends.
Final Architecture Diagram
In an interview, explain the editor diagram first, as in the previous post, and then talk about conflict resolution.
Now let's focus on the other architecture aspects.
HTTP/1.1 vs HTTP/2 vs HTTP/3
Modern web apps can run on HTTP/1.1, HTTP/2, or HTTP/3.
- HTTP/1.1 → Limited concurrency and head-of-line blocking.
- HTTP/2 → Multiplexing and better performance for APIs and asset loading.
- HTTP/3 (QUIC) → Faster connection setup, no transport-level head-of-line blocking, better performance on unstable/mobile networks.
Decision:
We assume the app runs on HTTP/2 or HTTP/3, with HTTP/3 being ideal for real-time apps due to better latency and reconnection performance.
Communication Protocol Choice
For live updates, we have four main options:
- Short polling → frequent requests; high latency and wasteful.
- Long polling → better than short polling but still request/response based.
- Server-Sent Events (SSE) → one-way streaming (server → client only).
- WebSockets → full duplex, persistent, low-latency communication.
Decision:
We choose WebSockets because collaborative editors require:
- Continuous two-way communication
- Instant delivery of updates
- Efficient streaming of small frequent messages
WebSockets are the best fit for CRDT updates, presence, and real-time sync.
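One way to carry updates, presence, and sync requests over a single socket is a small tagged envelope. The message shapes and the `SocketLike` interface below are assumptions for illustration, not a standard protocol; a browser `WebSocket` object would satisfy `SocketLike`.

```typescript
// Hypothetical wire format: one JSON envelope per WebSocket frame.
type CollabMessage =
  | { type: "update"; docId: string; payload: string } // encoded CRDT patch
  | { type: "presence"; docId: string; userId: string; cursor: number | null }
  | { type: "sync-request"; docId: string; lastSeenVersion: number };

// Minimal shape of a socket, so the sketch does not depend on a runtime.
interface SocketLike {
  send(data: string): void;
}

function sendMessage(socket: SocketLike, message: CollabMessage): void {
  socket.send(JSON.stringify(message));
}

// Route an incoming frame to the matching handler.
function handleFrame(
  raw: string,
  handlers: {
    onUpdate(payload: string): void;
    onPresence(userId: string, cursor: number | null): void;
  },
): void {
  const message = JSON.parse(raw) as CollabMessage;
  switch (message.type) {
    case "update":
      handlers.onUpdate(message.payload);
      break;
    case "presence":
      handlers.onPresence(message.userId, message.cursor);
      break;
    case "sync-request":
      break; // handled by the server; a client ignores it
  }
}

// Usage with an in-memory stand-in for the socket.
const sentFrames: string[] = [];
const fakeSocket: SocketLike = { send: (data) => { sentFrames.push(data); } };
sendMessage(fakeSocket, { type: "update", docId: "doc-123", payload: "patch-1" });
sendMessage(fakeSocket, { type: "presence", docId: "doc-123", userId: "user-456", cursor: 7 });

const received: string[] = [];
for (const frame of sentFrames) {
  handleFrame(frame, {
    onUpdate: (payload) => { received.push("update:" + payload); },
    onPresence: (userId, cursor) => { received.push("presence:" + userId + "@" + cursor); },
  });
}
```

A real deployment would likely use a binary encoding for CRDT patches; JSON is used here only to keep the sketch readable.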
M — Model (Collaborative Data Model)
In the single-user editor, the Editor State tree was the source of truth.
In a collaborative editor, this changes:
👉 The CRDT document becomes the source of truth
👉 The editor state becomes a derived view
New flow:
CRDT Document → Editor State Tree → DOM
The editor tree still exists, but it is now rebuilt from the CRDT document.
Two-Layer Document Model
We now maintain two representations of the document.
1) CRDT Document (persistent + collaborative)
This layer stores everything needed for collaboration:
- Characters with unique IDs
- Insert / delete operations
- Formatting metadata
- Author (clientId)
- Logical timestamps
Conceptual example of a CRDT character:
{
  id: "c101",
  char: "H",
  after: "c100",
  deleted: false,
  clientId: "userA",
  timestamp: 170001
}
This structure allows:
- Concurrent edits
- Offline edits
- Deterministic merging
This is the true source of truth.
2) Editor State Tree (rendering view)
The editor still needs the familiar tree model for:
- Rendering blocks
- Commands & plugins
- Keyboard navigation
- Accessibility
So we derive the editor tree from the CRDT document.
CRDT → build tree → render DOM
This keeps the editor reusable and independent of collaboration.
How Rich Text is Represented in CRDT
Instead of storing full text nodes:
{ type: "text", value: "Hello", bold: true }
CRDT stores characters + formatting metadata.
Conceptually:
[
  { id: "c1", char: "H" },
  { id: "c2", char: "e" },
  { id: "c3", char: "l", bold: true },
  { id: "c4", char: "l", bold: true },
  { id: "c5", char: "o" }
]
This enables:
- Concurrent text editing
- Concurrent formatting
- Conflict-free style merging
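Going the other way, the editor tree needs text nodes again. A minimal sketch of that derivation, assuming only a `bold` flag and hypothetical `FormattedChar` / `TextRun` types, could collapse adjacent characters with the same formatting:

```typescript
interface FormattedChar {
  id: string;
  char: string;
  bold?: boolean;
  deleted?: boolean;
}

interface TextRun {
  type: "text";
  value: string;
  bold: boolean;
}

// Collapse consecutive visible characters with identical formatting into one text node.
function toTextRuns(chars: FormattedChar[]): TextRun[] {
  const runs: TextRun[] = [];
  for (const c of chars) {
    if (c.deleted) continue; // tombstones are not rendered
    const bold = c.bold === true;
    const last = runs[runs.length - 1];
    if (last && last.bold === bold) last.value += c.char;
    else runs.push({ type: "text", value: c.char, bold });
  }
  return runs;
}

const helloRuns = toTextRuns([
  { id: "c1", char: "H" },
  { id: "c2", char: "e" },
  { id: "c3", char: "l", bold: true },
  { id: "c4", char: "l", bold: true },
  { id: "c5", char: "o" },
]);
// helloRuns holds three nodes: "He" (plain), "ll" (bold), "o" (plain)
```

The editor's renderer then works with these text nodes exactly as it did in the single-user design.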
Block Structure in CRDT
We also need to represent:
- Paragraphs
- Headings
- Lists
- Images
Each block becomes a CRDT node with an ID.
Conceptually:
{
  id: "block1",
  type: "paragraph",
  children: ["c1","c2","c3"]
}
Blocks can also be inserted, deleted, and reordered collaboratively.
Selection Model (Local vs Remote)
Single-user editor → one selection.
Collaborative editor → many selections.
Local selection
Stored in editor state (not in CRDT).
Remote selections (presence)
Stored separately:
type RemoteSelection = {
  userId: string;
  anchor: CRDTPosition;
  focus: CRDTPosition;
  color: string;
};
Used for:
- Live cursors
- Text highlights
- Presence indicators
Presence is ephemeral (not persisted).
Undo / Redo Model Changes
Single-user:
Undo = revert previous editor state snapshot.
Collaborative editor:
Undo must be per-user.
Each client stores:
- Local operation history
- Local undo / redo stack
Undo only reverts your own operations, not other users’ edits.
This matches Google Docs behavior.
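A rough sketch of per-user undo follows. `SharedDoc` is a deliberately tiny stand-in for the CRDT document (Map insertion order plays the role of document order), and `LocalUndoManager` is a hypothetical name; the point is only that remote operations never enter the local stacks.

```typescript
type EditOp =
  | { type: "insert"; id: string; char: string }
  | { type: "delete"; id: string };

// Stand-in for the shared CRDT document.
class SharedDoc {
  private chars = new Map<string, string>();
  private hidden = new Set<string>(); // tombstones

  apply(op: EditOp): void {
    if (op.type === "insert") {
      this.chars.set(op.id, op.char);
      this.hidden.delete(op.id); // re-inserting a known id un-deletes it
    } else {
      this.hidden.add(op.id);
    }
  }

  charOf(id: string): string {
    return this.chars.get(id) ?? "";
  }

  toText(): string {
    let out = "";
    this.chars.forEach((char, id) => {
      if (!this.hidden.has(id)) out += char;
    });
    return out;
  }
}

class LocalUndoManager {
  private undoStack: EditOp[] = [];
  private redoStack: EditOp[] = [];
  private doc: SharedDoc;

  constructor(doc: SharedDoc) {
    this.doc = doc;
  }

  // Only local edits pass through here; remote edits call doc.apply() directly.
  applyLocal(op: EditOp): void {
    this.doc.apply(op);
    this.undoStack.push(op);
    this.redoStack = [];
  }

  undo(): void {
    const op = this.undoStack.pop();
    if (!op) return;
    this.doc.apply(this.invert(op)); // the inverse would also be broadcast to peers
    this.redoStack.push(op);
  }

  redo(): void {
    const op = this.redoStack.pop();
    if (!op) return;
    this.doc.apply(op);
    this.undoStack.push(op);
  }

  private invert(op: EditOp): EditOp {
    return op.type === "insert"
      ? { type: "delete", id: op.id }
      : { type: "insert", id: op.id, char: this.doc.charOf(op.id) };
  }
}

const sharedDoc = new SharedDoc();
const undoForA = new LocalUndoManager(sharedDoc);
undoForA.applyLocal({ type: "insert", id: "a1", char: "A" }); // local edit by user A
sharedDoc.apply({ type: "insert", id: "b1", char: "B" });      // remote edit by user B
undoForA.undo();                                               // removes only A's character
const textAfterUndo = sharedDoc.toText();
undoForA.redo();
const textAfterRedo = sharedDoc.toText();
```

Undo is expressed as a new inverse operation rather than a state rollback, so it merges with concurrent remote edits like any other change.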
Version History Model
Beyond undo/redo, we now maintain a global change log.
Server stores:
- CRDT update log
- Periodic snapshots
This enables:
- Document timeline
- Restore previous versions
- Audit document evolution
I — Interface
In the single-user editor, the interface exposed APIs to update, read and extend the editor.
In the collaborative editor, we extend the same API to support:
- Collaboration lifecycle
- Sync status
- Presence (live users & cursors)
- Version history
We are adding APIs, not replacing the existing ones.
Editor Initialization (extended)
We extend the existing createEditor() config with collaboration options.
createEditor({
  rootElement,
  namespace: "doc-editor",
  initialEditorState,
  theme,
  errorBoundary,
  collaboration: {
    documentId: "doc-123",
    userId: "user-456",
    userName: "Pranav",
    userColor: "#7C3AED",
    websocketUrl: "wss://collab.server.com"
  }
});
New collaboration config:
- documentId → shared document identifier
- userId → unique user identity
- userName / userColor → presence metadata
- websocketUrl → collaboration server endpoint
You should also explain the editor's other interfaces, as described in the Rich Text Editor System Design post.
O — Optimization (Collaborative Editor)
A collaborative editor is far heavier than a single-user editor.
Every keystroke can trigger:
- Local state update
- CRDT update
- Network broadcast
- Remote updates from other users
- Presence updates
- DOM reconciliation
Without careful optimization, typing quickly becomes laggy.
1) Incremental CRDT Updates (Patch-based sync)
We never send the full document.
Instead we send small CRDT update patches:
- Insert character
- Delete character
- Apply formatting
- Insert block
Benefits:
- Very small network payloads
- Faster sync
- Scales to large documents
2) Local-first Updates (Optimistic UI)
Typing must feel instant.
Flow:
- Apply change locally immediately
- Send update to server asynchronously
- Merge remote updates later
Users never wait for the network before seeing their typing.
This is critical for perceived performance.
3) Update Batching (Network + Rendering)
Typing triggers many operations rapidly.
Instead of sending every keystroke immediately:
- Batch CRDT updates (e.g. every 50–100ms)
- Send updates in small bundles
Benefits:
- Fewer WebSocket messages
- Lower server load
- Better battery usage on mobile
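Batching can be sketched as a small buffer with a flush timer. `UpdateBatcher` is a hypothetical helper, and the 50 ms default is just the lower end of the range mentioned above.

```typescript
// Buffer outgoing updates and send them as one bundle per interval.
class UpdateBatcher<T> {
  private buffer: T[] = [];
  private timer: ReturnType<typeof setTimeout> | null = null;
  private sendBatch: (batch: T[]) => void;
  private intervalMs: number;

  constructor(sendBatch: (batch: T[]) => void, intervalMs = 50) {
    this.sendBatch = sendBatch;
    this.intervalMs = intervalMs;
  }

  // Queue an update; the first update in an idle period starts the flush timer.
  push(update: T): void {
    this.buffer.push(update);
    if (this.timer === null) {
      this.timer = setTimeout(() => this.flush(), this.intervalMs);
    }
  }

  // Send everything queued so far (also useful right before the page unloads).
  flush(): void {
    if (this.timer !== null) {
      clearTimeout(this.timer);
      this.timer = null;
    }
    if (this.buffer.length === 0) return;
    const batch = this.buffer;
    this.buffer = [];
    this.sendBatch(batch);
  }
}

// Three keystrokes inside one interval go out as a single message.
const sentBatches: string[][] = [];
const batcher = new UpdateBatcher<string>((batch) => { sentBatches.push(batch); }, 50);
batcher.push("insert:h");
batcher.push("insert:i");
batcher.push("insert:!");
batcher.flush(); // forced here for the example; normally the timer fires
```

The local editor state is still updated on every keystroke; only the network send is delayed.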
4) Incremental Reconciliation for Remote Updates
Remote updates may arrive frequently.
We must avoid full re-renders.
Process:
- Apply CRDT update
- Rebuild only affected nodes
- Reconcile only changed DOM parts
Similar to React diffing.
5) Presence Throttling
Cursor movement can fire dozens of updates per second.
We throttle presence updates:
- Send cursor position every ~50–100ms
- Not on every pixel movement
This prevents network flooding.
Presence is high frequency but low importance.
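A throttle for cursor updates might look like the following sketch. The `throttle` helper is written from scratch here, and the injectable `now` parameter is an assumption added so the behavior can be checked without real timers.

```typescript
// Let a call through at most once per `intervalMs`; drop calls that arrive too soon.
function throttle<A extends unknown[]>(
  fn: (...args: A) => void,
  intervalMs: number,
  now: () => number = Date.now,
): (...args: A) => void {
  let lastCall = -Infinity;
  return (...args: A) => {
    const t = now();
    if (t - lastCall < intervalMs) return; // too soon: skip this cursor update
    lastCall = t;
    fn(...args);
  };
}

// Simulate a cursor move every 10 ms for 100 ms with a fake clock.
let fakeTime = 0;
const sentCursors: number[] = [];
const sendCursor = throttle((offset: number) => { sentCursors.push(offset); }, 50, () => fakeTime);

for (let i = 0; i < 10; i++) {
  fakeTime = i * 10;
  sendCursor(i);
}
// Only the moves at 0 ms and 50 ms are sent.
```

This version drops the skipped positions entirely. A production version would probably also send the most recent position once the interval ends, so the remote cursor settles in the right place.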
6) Lazy Sync on Reconnect
When a user reconnects:
- Do NOT download full document immediately.
- Fetch missing CRDT updates since last version.
Benefits:
- Faster reconnect
- Lower bandwidth usage
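One way to compute "what is missing" is to track, per client, the highest update counter already applied. This is a sketch under assumptions: each update carries a gap-free per-client `clock`, and the names `LoggedUpdate`, `StateVector`, `missingUpdates`, and `advance` are illustrative.

```typescript
interface LoggedUpdate {
  clientId: string;
  clock: number;   // per-client counter starting at 1 (assumed gap-free)
  payload: string; // encoded CRDT patch
}

type StateVector = Record<string, number>; // clientId -> highest clock already applied

// Server side: return only the updates the reconnecting client has not seen.
function missingUpdates(log: LoggedUpdate[], seen: StateVector): LoggedUpdate[] {
  return log.filter((u) => u.clock > (seen[u.clientId] ?? 0));
}

// Client side: after applying updates, advance the state vector.
function advance(seen: StateVector, updates: LoggedUpdate[]): StateVector {
  const next: StateVector = { ...seen };
  for (const u of updates) {
    next[u.clientId] = Math.max(next[u.clientId] ?? 0, u.clock);
  }
  return next;
}

const serverLog: LoggedUpdate[] = [
  { clientId: "userA", clock: 1, payload: "a1" },
  { clientId: "userA", clock: 2, payload: "a2" },
  { clientId: "userB", clock: 1, payload: "b1" },
  { clientId: "userA", clock: 3, payload: "a3" },
  { clientId: "userB", clock: 2, payload: "b2" },
];

// The client went offline after seeing userA's first two updates and none of userB's.
const seenBeforeReconnect: StateVector = { userA: 2 };
const toDownload = missingUpdates(serverLog, seenBeforeReconnect); // b1, a3, b2
const seenAfterSync = advance(seenBeforeReconnect, toDownload);
```

Because CRDT updates merge in any order, the client can apply the downloaded updates as they arrive and then upload its own offline edits.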
7) Snapshot + Update Log Strategy
Server stores:
- Periodic snapshots
- Incremental CRDT updates
Why:
- Loading entire update history would be slow.
- Clients load latest snapshot + recent updates.
This keeps document load time fast.
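The load path can be sketched as "newest usable snapshot plus the tail of the log". The generic `loadDocument` helper and the toy string document below are assumptions for illustration; the same function also covers restoring an older version by passing a target version.

```typescript
interface StoredSnapshot<S> { version: number; state: S }
interface StoredUpdate<U> { version: number; update: U } // version = document version after this update

// Rebuild the document at `targetVersion`: newest snapshot at or before it, plus later updates.
// Assumes `updates` is sorted by version and `applyUpdate` does not mutate its input.
function loadDocument<S, U>(
  snapshots: StoredSnapshot<S>[],
  updates: StoredUpdate<U>[],
  applyUpdate: (state: S, update: U) => S,
  emptyState: S,
  targetVersion: number = Infinity,
): S {
  let base: StoredSnapshot<S> = { version: 0, state: emptyState };
  for (const snap of snapshots) {
    if (snap.version <= targetVersion && snap.version > base.version) base = snap;
  }
  let state = base.state;
  for (const entry of updates) {
    if (entry.version > base.version && entry.version <= targetVersion) {
      state = applyUpdate(state, entry.update);
    }
  }
  return state;
}

// Toy document: the state is a string and each update appends one character.
const updateLog: StoredUpdate<string>[] = ["H", "e", "l", "l", "o"].map((ch, i) => ({
  version: i + 1,
  update: ch,
}));
const snapshotStore: StoredSnapshot<string>[] = [{ version: 3, state: "Hel" }];
const appendText = (state: string, update: string) => state + update;

const latestDoc = loadDocument(snapshotStore, updateLog, appendText, "");  // snapshot v3 + updates 4 and 5
const docAtV2 = loadDocument(snapshotStore, updateLog, appendText, "", 2); // no usable snapshot: replay 1 and 2
```

How often to snapshot is a trade-off between storage and replay time; the sketch leaves that policy open.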