If you open Google Docs and start typing with a friend, you’ll see cursors moving in real time, text appearing instantly, and edits merging without conflicts. It feels simple on the surface, but behind it sits a carefully designed architecture built for concurrency, consistency, and low-latency updates.

This article is a continuation of the Rich Text Editor System Design.
Understanding that design is a prerequisite for this post, since we will extend the same single-user editor into a collaborative system.

R — Requirements

Scope

We are designing a browser-based collaborative rich text editor similar to Google Docs that supports multiple users editing the same document simultaneously.

Since the core editor architecture is already explained in the previous post, this article focuses mainly on:

Real-time conflict resolution
Real-time communication
Version history

Functional Requirements

Real-time multi-user editing
Automatic conflict resolution
Live cursor and selection visibility
Real-time communication channel (client ↔ server)
Persistent version history
Autosave and cloud sync
Presence awareness (join/leave, active users)

Non-Functional Requirements

Low-latency local typing
Eventual consistency across clients
Offline editing with safe resynchronization
Scalability for multiple concurrent users
Reliability under network failures

A — Architecture

In the single-user editor, our update pipeline looked like:

Events → Commands → Editor State → Reconciler → DOM

This works because there is only one source of edits. Collaboration breaks this assumption.

Now we must synchronize multiple editor states across multiple clients in real time, while keeping all documents consistent.

To solve this, we introduce a new layer between the editor and the server:

👉 The Collaboration Engine

Why collaboration breaks normal editors

In a collaborative editor:

Multiple users can edit the same position at the same time.

Example:

Initial text
Hello world

User A deletes world
User B types beautiful

Without a merge strategy, clients may end up with different documents.

The system must guarantee convergence — all clients eventually see the same document.
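
A minimal sketch of that divergence, assuming naive position-based edits that each client applies in arrival order. The `Edit` type and `applyEdit` helper are illustrative only, and we assume User B types "beautiful " just before "world":

```typescript
// Hypothetical position-based edit: replace the range [start, end) with `text`.
type Edit = { start: number; end: number; text: string };

function applyEdit(doc: string, e: Edit): string {
  return doc.slice(0, e.start) + e.text + doc.slice(e.end);
}

const initial = "Hello world";
const deleteWorld: Edit = { start: 6, end: 11, text: "" };            // User A
const typeBeautiful: Edit = { start: 6, end: 6, text: "beautiful " }; // User B (assumed position)

// Client A applies its own edit first, then B's edit exactly as received.
const onClientA = applyEdit(applyEdit(initial, deleteWorld), typeBeautiful);
// Client B applies its own edit first, then A's edit exactly as received.
const onClientB = applyEdit(applyEdit(initial, typeBeautiful), deleteWorld);

console.log(JSON.stringify(onClientA)); // "Hello beautiful "
console.log(JSON.stringify(onClientB)); // "Hello iful world"
```

The same two edits produce two different documents, because A's delete range was computed against text that B's client had already shifted.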

This problem is called:

Distributed Real-Time Editing

There are four main approaches to resolve the issue:

Last Write Wins (LWW)
Locking systems
Operational Transform (OT)
CRDT

Only OT and CRDT are practical for modern real-time collaborative editing, but we’ll briefly cover all four.

1) Last Write Wins (LWW)

Idea:
The latest update overwrites previous changes.

How updates are sent
Clients typically send:

Entire document, or
Entire edited section/block

Server simply replaces the previous version.

Result:
Last update wins → earlier edits may be lost.

Verdict: Too destructive for collaboration.
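
To make the data loss concrete, here is a small sketch of a last-write-wins register. The `Write` shape and `lwwMerge` function are assumptions for illustration, not an API from this design:

```typescript
// Hypothetical LWW store: the server keeps whichever full-document write is newest.
type Write = { content: string; timestamp: number; clientId: string };

function lwwMerge(current: Write, incoming: Write): Write {
  if (incoming.timestamp !== current.timestamp) {
    return incoming.timestamp > current.timestamp ? incoming : current;
  }
  // Equal timestamps: tie-break on client ID so every server picks the same winner.
  return incoming.clientId > current.clientId ? incoming : current;
}

const base: Write = { content: "Hello world", timestamp: 1, clientId: "server" };
const fromA: Write = { content: "Hello brave world", timestamp: 2, clientId: "A" };
const fromB: Write = { content: "Hello world!", timestamp: 3, clientId: "B" };

const stored = [fromA, fromB].reduce(lwwMerge, base);
console.log(stored.content); // "Hello world!" (User A's "brave" is lost)
```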

2) Locking System (Single editor / Round robin)

Idea:
Only one user can edit a section at a time.

Other users must wait.

Example

User A locks paragraph → edits
User B must wait until lock released

Verdict: Safe but terrible UX. Not real-time.
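
A rough sketch of such a lock table, assuming an in-memory map on the server. `SectionLocks` and its methods are hypothetical names:

```typescript
// Hypothetical in-memory lock table: one editor per section at a time.
class SectionLocks {
  private owners = new Map<string, string>(); // sectionId -> userId

  acquire(sectionId: string, userId: string): boolean {
    const owner = this.owners.get(sectionId);
    if (owner !== undefined && owner !== userId) return false; // someone else is editing
    this.owners.set(sectionId, userId);
    return true;
  }

  release(sectionId: string, userId: string): void {
    // Only the current owner may release the lock.
    if (this.owners.get(sectionId) === userId) this.owners.delete(sectionId);
  }
}

const locks = new SectionLocks();
const aGotLock = locks.acquire("paragraph-1", "userA");      // true
const bBlocked = locks.acquire("paragraph-1", "userB");      // false: B must wait
locks.release("paragraph-1", "userA");
const bAfterRelease = locks.acquire("paragraph-1", "userB"); // true
```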

3) Operational Transform (OT)

Used by: Google Docs, ShareDB

Think of OT as a central traffic controller for edits.

The intuition

Users do not send the whole document.
They send operations based on the version of the document they see.

Typical operations:

Insert(text, position)
Delete(range)

There is no “replace” operation.
Replace = Delete + Insert.

All operations go to a central server, which becomes the source of truth.

What problem does OT solve?

Multiple users can edit the same text at the same time.
Their operations are based on an older version of the document.

So when operations arrive at the server, they may no longer make sense.

The server must transform operations so they still work on the updated document.

This is the core idea of OT.

What does “transform” actually mean?

Transform =
Rewrite an operation so it still makes sense after earlier edits happened.

Let’s walk through a real example.

Example: Two users edit the same word

Initial text:

Hello world

Positions:

0 1 2 3 4 5 6 7 8 9 10
H e l l o _ w o r l d

Range 6–11 (start inclusive, end exclusive) = "world"

Two users edit simultaneously

User A changes "world" → "helloo"
User B changes "world" → "averioo"

Both edits are based on the same old document.

Step 1 — Convert edits into operations

User A:

Delete(6–11)
Insert("helloo", 6)

User B:

Delete(6–11)
Insert("averioo", 6)

Both reach the server at the same time.

Step 2 — Server picks a deterministic order

Assume:

Apply User A first
Transform User B operations against A

Step 3 — Apply User A operations

Start:
Hello world

After Delete(6–11):
"Hello " (the trailing space remains)

After Insert("helloo",6):
Hello helloo

This becomes the official server document.

Step 4 — Transform User B operations

User B created operations assuming "Hello world" still exists.
But now the document is "Hello helloo".

So we must transform B’s operations.

Transform Delete(6–11)

User B wanted to delete "world".
But it was already deleted by User A.

So this operation becomes:

👉 No-op (do nothing)

Transform Insert("averioo",6)

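Insert is still valid: position 6 exists in the new document. User A’s text was also inserted at position 6, so the server needs a tie-break rule; here, B’s text is placed at position 6, ahead of A’s.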

Apply to current document:

Hello helloo → insert at 6 →

Final:
Hello averioohelloo

Final result (same for everyone)

Hello averioohelloo

All clients replay the same transformed operations in the same order, so everyone ends up with the same document.
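
The walkthrough above can be sketched in code. This is a simplified transform, not the algorithm Google Docs or ShareDB actually ships: the `Op` types, `apply`, and `transform` are illustrative names, ranges are half-open, and the tie-break (the later insert keeps its position) is an assumption chosen to match the example. A complete OT engine also transforms the already-applied operations against the incoming ones; that step is skipped here.

```typescript
type Insert = { kind: "insert"; pos: number; text: string };
type Delete = { kind: "delete"; start: number; end: number }; // half-open [start, end)
type Op = Insert | Delete;

function apply(doc: string, op: Op): string {
  return op.kind === "insert"
    ? doc.slice(0, op.pos) + op.text + doc.slice(op.pos)
    : doc.slice(0, op.start) + doc.slice(op.end);
}

// Rewrites `op` (created against the old document) so it can run after `applied`.
// Returns zero ops for a no-op, or two when a delete is split by an insert.
function transform(op: Op, applied: Op): Op[] {
  if (applied.kind === "insert") {
    const n = applied.text.length;
    if (op.kind === "insert") {
      // Assumed tie-break: on equal positions the later op keeps its position.
      return [applied.pos < op.pos ? { ...op, pos: op.pos + n } : op];
    }
    if (applied.pos <= op.start) return [{ ...op, start: op.start + n, end: op.end + n }];
    if (applied.pos >= op.end) return [op];
    // The insert landed inside the deleted range: delete around it.
    return [
      { kind: "delete", start: op.start, end: applied.pos },
      { kind: "delete", start: op.start + n, end: op.start + n + (op.end - applied.pos) },
    ];
  }
  const del: Delete = applied;
  // How many characters before index x were removed by the applied delete.
  const removedBefore = (x: number): number => Math.max(0, Math.min(x, del.end) - del.start);
  if (op.kind === "insert") return [{ ...op, pos: op.pos - removedBefore(op.pos) }];
  const start = op.start - removedBefore(op.start);
  const end = op.end - removedBefore(op.end);
  return start === end ? [] : [{ kind: "delete", start, end }];
}

const opsA: Op[] = [{ kind: "delete", start: 6, end: 11 }, { kind: "insert", pos: 6, text: "helloo" }];
const opsB: Op[] = [{ kind: "delete", start: 6, end: 11 }, { kind: "insert", pos: 6, text: "averioo" }];

let serverDoc = "Hello world";
for (const a of opsA) serverDoc = apply(serverDoc, a); // "Hello helloo"

for (const b of opsB) {
  let pending: Op[] = [b];
  for (const a of opsA) {
    const next: Op[] = [];
    for (const p of pending) next.push(...transform(p, a));
    pending = next;
  }
  for (const p of pending) serverDoc = apply(serverDoc, p);
}
console.log(serverDoc); // "Hello averioohelloo"
```

User B's delete collapses to an empty list (the no-op from Step 4), and B's insert keeps position 6.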

Why OT works well

The server guarantees:

Same operation order
Same transformed operations
Same final document

4) CRDT (Conflict-free Replicated Data Types)

Used by: Yjs and Automerge (CRDT libraries), Notion; Figma uses a CRDT-inspired design with a central server

CRDT solves collaboration by designing operations so they always merge safely, even if users edit simultaneously or while offline.
There is no central conflict resolver — every client can merge updates independently and still reach the same final document.

How CRDT resolves conflicts — Step-by-step example

Initial document:

Hello world

Two users go offline and edit at the same time.

User A replaces world → helloo
User B replaces world → averioo

Both edited the same word concurrently.

Step 1 — Convert edits into CRDT operations

CRDT does not send “replace text”.

Instead it sends character-level operations with unique IDs.

User A operations (conceptually)

User A deletes the characters that form "world" and inserts "helloo".

Insertion is recorded like this:

h(id101 after space)
e(id102 after id101)
l(id103 after id102)
l(id104 after id103)
o(id105 after id104)
o(id106 after id105)

Key idea:
Each new character has:

A unique ID
A reference to the character it was inserted after

User B operations

User B deletes "world" and inserts "averioo":

a(id201 after space)
v(id202 after id201)
e(id203 after id202)
r(id204 after id203)
i(id205 after id204)
o(id206 after id205)
o(id207 after id206)

Both users now have different local documents.

User A sees:
Hello helloo

User B sees:
Hello averioo

Step 2 — Users reconnect and sync

Clients exchange their operations.

Now each client receives:

Insert operations from User A
Insert operations from User B
Delete markers for "world"

Step 3 — Merge rules resolve the conflict

Delete conflict

Both users deleted "world" → no conflict.
Those characters remain deleted.

Insert conflict (same position)

Both users inserted text after the same character (space after “Hello”).

So CRDT now has two insert chains at the same location:

Chain A → "helloo"
Chain B → "averioo"

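CRDT uses deterministic ordering to decide which chain comes first. Because every character references its predecessor, each chain stays contiguous and the two words do not interleave.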

Typical rule:

Compare logical timestamps
If equal → compare client IDs

Assume User B’s operations come first.

Step 4 — Final merged document

Final merged text becomes:

Hello averioohelloo

All clients apply the same rules →
All clients end up with the same document.

No server decision.
No transforms.
No lost edits.

Why this works

CRDT guarantees:

All edits are preserved
Ordering is deterministic
Message order doesn’t matter
Offline edits merge safely

This property is called:

👉 Strong eventual consistency

Every replica eventually becomes identical.
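
Here is a toy version of the merge described above. It is a sketch under assumptions, not how Yjs or Automerge are implemented: `TextCrdt`, `typeText`, and the ID format are hypothetical, siblings are ordered by higher logical counter first, and ties go to the larger client ID (which is what places User B's chain first).

```typescript
interface CrdtChar {
  id: string;           // unique, here `${clientId}:${counter}`
  char: string;
  after: string | null; // ID of the character this one was inserted after
  clientId: string;
  counter: number;      // logical timestamp
}

class TextCrdt {
  private chars = new Map<string, CrdtChar>();
  private deleted = new Set<string>(); // tombstones

  insert(c: CrdtChar): void { if (!this.chars.has(c.id)) this.chars.set(c.id, c); }
  remove(id: string): void { this.deleted.add(id); }

  // Merging is a set union, so it is order-independent and idempotent.
  merge(other: TextCrdt): void {
    other.chars.forEach((c) => this.insert(c));
    other.deleted.forEach((id) => this.remove(id));
  }

  text(): string {
    const children = new Map<string | null, CrdtChar[]>();
    this.chars.forEach((c) => {
      const list = children.get(c.after) ?? [];
      list.push(c);
      children.set(c.after, list);
    });
    let out = "";
    const visit = (parent: string | null): void => {
      const siblings = (children.get(parent) ?? []).sort(
        (x, y) => y.counter - x.counter || (x.clientId < y.clientId ? 1 : -1)
      );
      for (const c of siblings) {
        if (!this.deleted.has(c.id)) out += c.char;
        visit(c.id); // a chain follows its first character, so chains never interleave
      }
    };
    visit(null);
    return out;
  }
}

// Inserts `s` as a chain of characters starting after `after`; returns the new IDs.
function typeText(doc: TextCrdt, clientId: string, startCounter: number, after: string | null, s: string): string[] {
  const ids: string[] = [];
  let prev = after;
  for (let i = 0; i < s.length; i++) {
    const counter = startCounter + i;
    const id = `${clientId}:${counter}`;
    doc.insert({ id, char: s.charAt(i), after: prev, clientId, counter });
    ids.push(id);
    prev = id;
  }
  return ids;
}

const base = new TextCrdt();
const seedIds = typeText(base, "seed", 0, null, "Hello world");
const worldIds = seedIds.slice(6); // the characters of "world"
const spaceId = "seed:5";          // the space after "Hello"

const a = new TextCrdt(); a.merge(base);
const b = new TextCrdt(); b.merge(base);

// Both users edit offline.
worldIds.forEach((id) => a.remove(id)); typeText(a, "A", 11, spaceId, "helloo");
worldIds.forEach((id) => b.remove(id)); typeText(b, "B", 11, spaceId, "averioo");
const aBefore = a.text(); // "Hello helloo"
const bBefore = b.text(); // "Hello averioo"

// Reconnect: exchange state in either order.
a.merge(b);
b.merge(a);
console.log(a.text(), "|", b.text()); // Hello averioohelloo | Hello averioohelloo
```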

Comparison of approaches

Approach	Advantages	Disadvantages
Last Write Wins (LWW)	Very simple to implement; easy storage model; low computation cost	Users can lose work; no real merging; terrible collaboration experience; not suitable for editors
Locking System (Single Editor / Round-Robin)	No merge conflicts; simple reasoning; predictable updates	Poor user experience; no true real-time collaboration; doesn’t scale with many users
Operational Transform (OT)	Mature and battle-tested; smaller metadata size; strong centralized control; works well for always-online apps	Complex transform logic; requires always-online central server; offline editing is difficult; server becomes bottleneck
CRDT (Conflict-free Replicated Data Types)	Offline-first friendly; automatic conflict resolution; no single point of failure; strong eventual consistency	Larger metadata per character; bigger document size; more complex data structures

Why we chose CRDT over OT

Both OT and CRDT can power collaborative editors, but CRDT fits our requirements better. Google Docs successfully uses OT with a strong centralized server that orders and transforms operations; our architecture instead prioritizes offline editing and easier horizontal scaling. The deciding factors:

1) Offline editing requirement

Our requirements include offline tolerance.

OT → needs a central server to transform operations
CRDT → clients can edit fully offline and sync later

This is a major deciding factor.

2) No single point of conflict resolution

In OT:

Server is responsible for resolving conflicts.
If server is unavailable → collaboration breaks.

In CRDT:

Every client can merge updates independently.
Server acts mainly as relay + storage.

This improves reliability and scalability.

3) Simpler networking model

OT requires:

Operation ordering
Transformation pipelines
Strict server coordination

CRDT requires:

Exchange updates
Apply deterministic merge rules

This makes the frontend-heavy architecture simpler.

4) Better support for modern apps

Modern collaborative tools such as Figma, Notion, Excalidraw, and Linear lean toward CRDTs or CRDT-inspired sync models because they need:

Offline-first capability
Mobile / unstable network support
Fast reconnection sync

CRDT aligns better with these needs.

5) Frontend-first architecture

Our system is frontend-centric.

CRDT lets the browser:

Keep a full document replica
Continue working without server
Sync when connection returns

This matches modern web architecture trends.

Final Architecture Diagram

In an interview, explain the editor diagram first, as in the previous post, and then walk through conflict resolution.

Now let’s focus on the other aspects of the architecture.

HTTP/1.1 vs HTTP/2 vs HTTP/3

Modern web apps can run on HTTP/1.1, HTTP/2, or HTTP/3.

HTTP/1.1 → Limited concurrency and head-of-line blocking.
HTTP/2 → Multiplexing and better performance for APIs and asset loading.
HTTP/3 (QUIC) → Faster connection setup, no transport-level head-of-line blocking, better performance on unstable/mobile networks.

Decision:
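We assume the app runs on HTTP/2 or HTTP/3 for page, asset, and API traffic, with HTTP/3 preferred on mobile or unstable networks because of its faster connection setup. Note that the WebSocket connection itself is typically established through an HTTP/1.1 upgrade handshake, so the HTTP version mostly affects loading rather than the live channel.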

Communication Protocol Choice

For live updates, we have four main options:

Short polling → frequent requests; high latency and wasteful.
Long polling → better than short polling but still request/response based.
Server-Sent Events (SSE) → one-way streaming (server → client only).
WebSockets → full duplex, persistent, low-latency communication.

Decision:
We choose WebSockets because collaborative editors require:

Continuous two-way communication
Instant delivery of updates
Efficient streaming of small frequent messages

WebSockets are the best fit for CRDT updates, presence, and real-time sync.
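
One way to share a single socket between these kinds of traffic is a small typed message envelope. The message shapes and the `parseMessage` / `route` helpers below are assumptions for illustration, not a defined protocol:

```typescript
// Hypothetical wire format; field names are illustrative.
type ClientMessage =
  | { type: "update"; docId: string; payload: string }                        // encoded CRDT patch
  | { type: "presence"; docId: string; userId: string; cursor: number | null } // ephemeral
  | { type: "sync"; docId: string; since: number };                            // request missed updates

function parseMessage(raw: string): ClientMessage | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not JSON
  }
  if (typeof data !== "object" || data === null) return null;
  const msg = data as { type?: unknown; docId?: unknown };
  if (typeof msg.docId !== "string") return null;
  switch (msg.type) {
    case "update":
    case "presence":
    case "sync":
      return data as ClientMessage; // a real server would validate every field
    default:
      return null;
  }
}

// What the server does with each kind of message.
function route(msg: ClientMessage): string {
  switch (msg.type) {
    case "update": return "persist and broadcast";
    case "presence": return "broadcast only";
    case "sync": return "reply to sender";
  }
}

const incoming = parseMessage('{"type":"presence","docId":"doc-123","userId":"user-456","cursor":7}');
console.log(incoming ? route(incoming) : "dropped"); // "broadcast only"
```

On the client, the same parser could sit inside the socket's `onmessage` handler; document updates are persisted, while presence is relayed and forgotten.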

M — Model (Collaborative Data Model)

In the single-user editor, the Editor State tree was the source of truth.

In a collaborative editor, this changes:

👉 The CRDT document becomes the source of truth
👉 The editor state becomes a derived view

New flow:

CRDT Document → Editor State Tree → DOM

The editor tree still exists, but it is now rebuilt from the CRDT document.

Two-Layer Document Model

We now maintain two representations of the document.

1) CRDT Document (persistent + collaborative)

This layer stores everything needed for collaboration:

Characters with unique IDs
Insert / delete operations
Formatting metadata
Author (clientId)
Logical timestamps

Conceptual example of a CRDT character:

{
  id: "c101",
  char: "H",
  after: "c100",
  deleted: false,
  clientId: "userA",
  timestamp: 170001
}

This structure allows:

Concurrent edits
Offline edits
Deterministic merging

This is the true source of truth.

2) Editor State Tree (rendering view)

The editor still needs the familiar tree model for:

Rendering blocks
Commands & plugins
Keyboard navigation
Accessibility

So we derive the editor tree from the CRDT document.

CRDT → build tree → render DOM

This keeps the editor reusable and independent of collaboration.

How Rich Text is Represented in CRDT

Instead of storing full text nodes:

{ type: "text", value: "Hello", bold: true }

CRDT stores characters + formatting metadata.

Conceptually:

[
  { id: "c1", char: "H" },
  { id: "c2", char: "e" },
  { id: "c3", char: "l", bold: true },
  { id: "c4", char: "l", bold: true },
  { id: "c5", char: "o" }
]

This enables:

Concurrent text editing
Concurrent formatting
Conflict-free style merging
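
Going the other way, the editor tree needs text nodes again. A minimal sketch of that derivation, assuming only a `bold` flag and a `deleted` tombstone per character (`toTextNodes` is a hypothetical helper): adjacent characters with the same formatting are coalesced into one node.

```typescript
type CrdtChar = { id: string; char: string; bold?: boolean; deleted?: boolean };
type TextNode = { type: "text"; value: string; bold: boolean };

// Coalesce visible characters into runs that share the same formatting.
function toTextNodes(chars: CrdtChar[]): TextNode[] {
  const nodes: TextNode[] = [];
  for (const c of chars) {
    if (c.deleted) continue; // tombstones are not rendered
    const bold = c.bold === true;
    const last = nodes[nodes.length - 1];
    if (last !== undefined && last.bold === bold) last.value += c.char;
    else nodes.push({ type: "text", value: c.char, bold });
  }
  return nodes;
}

const nodes = toTextNodes([
  { id: "c1", char: "H" },
  { id: "c2", char: "e" },
  { id: "c3", char: "l", bold: true },
  { id: "c4", char: "l", bold: true },
  { id: "c5", char: "o" },
]);
console.log(nodes.map((n) => (n.bold ? `*${n.value}*` : n.value)).join("")); // He*ll*o
```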

Block Structure in CRDT

We also need to represent:

Paragraphs
Headings
Lists
Images

Each block becomes a CRDT node with an ID.

Conceptually:

{
  id: "block1",
  type: "paragraph",
  children: ["c1","c2","c3"]
}

Blocks can also be inserted, deleted, and reordered collaboratively.

Selection Model (Local vs Remote)

Single-user editor → one selection.
Collaborative editor → many selections.

Local selection

Stored in editor state (not in CRDT).

Remote selections (presence)

Stored separately:

type RemoteSelection = {
  userId: string;
  anchor: CRDTPosition;
  focus: CRDTPosition;
  color: string;
};

Used for:

Live cursors
Text highlights
Presence indicators

Presence is ephemeral (not persisted).
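
`CRDTPosition` is not defined above. One plausible definition, shown here as an assumption, anchors the caret to a character ID instead of a numeric offset, so remote cursors stay attached to the right text when other users insert or delete around them. `toOffset` is a hypothetical helper that resolves the anchor for rendering:

```typescript
// Assumption: a position means "immediately after the character with this ID" (null = document start).
type CRDTPosition = { afterId: string | null };
type OrderedChar = { id: string; deleted: boolean }; // document order, tombstones included

function toOffset(chars: OrderedChar[], pos: CRDTPosition): number {
  if (pos.afterId === null) return 0;
  let visible = 0;
  for (const c of chars) {
    if (!c.deleted) visible++;
    if (c.id === pos.afterId) return visible; // a deleted anchor collapses to where it used to be
  }
  return visible; // unknown ID: fall back to the end of the document
}

const chars: OrderedChar[] = ["c1", "c2", "c3", "c4", "c5"].map((id) => ({ id, deleted: false }));
const cursor: CRDTPosition = { afterId: "c3" };

const offsetBefore = toOffset(chars, cursor); // 3
// Another user inserts two characters at the start of the document.
const offsetAfterInsert = toOffset(
  [{ id: "x1", deleted: false }, { id: "x2", deleted: false }, ...chars],
  cursor
); // 5
// Another user deletes the anchor character itself.
const offsetAfterDelete = toOffset(
  chars.map((c) => (c.id === "c3" ? { ...c, deleted: true } : c)),
  cursor
); // 2
```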

Undo / Redo Model Changes

Single-user:
Undo = revert previous editor state snapshot.

Collaborative editor:
Undo must be per-user.

Each client stores:

Local operation history
Local undo / redo stack

Undo only reverts your own operations, not other users’ edits.

This matches Google Docs behavior.
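
A sketch of the per-user stack, with hypothetical names (`EditOp`, `InverseOp`, `LocalUndoStack`). It assumes each operation carries the `clientId` of its author and that undoing a delete can be expressed as a "restore" of the tombstoned character; redo is omitted for brevity.

```typescript
type EditOp = { clientId: string; kind: "insert" | "delete"; charId: string };
type InverseOp = { kind: "delete" | "restore"; charId: string };

class LocalUndoStack {
  private localClientId: string;
  private undoStack: EditOp[] = [];

  constructor(localClientId: string) {
    this.localClientId = localClientId;
  }

  // Called for every operation applied to the document, local or remote.
  track(op: EditOp): void {
    if (op.clientId === this.localClientId) this.undoStack.push(op); // remote ops are ignored
  }

  // Returns the inverse operation to apply and broadcast, or null if there is nothing to undo.
  undo(): InverseOp | null {
    const op = this.undoStack.pop();
    if (op === undefined) return null;
    return { kind: op.kind === "insert" ? "delete" : "restore", charId: op.charId };
  }
}

const undoStack = new LocalUndoStack("userA");
undoStack.track({ clientId: "userA", kind: "insert", charId: "c1" });
undoStack.track({ clientId: "userB", kind: "insert", charId: "c2" }); // remote edit
undoStack.track({ clientId: "userA", kind: "insert", charId: "c3" });

const firstUndo = undoStack.undo();  // { kind: "delete", charId: "c3" }
const secondUndo = undoStack.undo(); // { kind: "delete", charId: "c1" } (c2 is skipped)
const thirdUndo = undoStack.undo();  // null
```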

Version History Model

Beyond undo/redo, we now maintain a global change log.

Server stores:

CRDT update log
Periodic snapshots

This enables:

Document timeline
Restore previous versions
Audit document evolution

I — Interface

In the single-user editor, the interface exposed APIs to update, read and extend the editor.
In the collaborative editor, we extend the same API to support:

Collaboration lifecycle
Sync status
Presence (live users & cursors)
Version history

We are adding APIs, not replacing the existing ones.

Editor Initialization (extended)

We extend the existing createEditor() config with collaboration options.

createEditor({
  rootElement,
  namespace: "doc-editor",
  initialEditorState,
  theme,
  errorBoundary,

  collaboration: {
    documentId: "doc-123",
    userId: "user-456",
    userName: "Pranav",
    userColor: "#7C3AED",
    websocketUrl: "wss://collab.server.com"
  }
});

New collaboration config:

documentId → shared document identifier
userId → unique user identity
userName / userColor → presence metadata
websocketUrl → collaboration server endpoint

You should also explain the rest of the editor interface, as described in the Rich Text Editor System Design post.

O — Optimization (Collaborative Editor)

A collaborative editor is far heavier than a single-user editor.
Every keystroke can trigger:

Local state update
CRDT update
Network broadcast
Remote updates from other users
Presence updates
DOM reconciliation

Without careful optimization, typing quickly becomes laggy.

1) Incremental CRDT Updates (Patch-based sync)

We never send the full document.

Instead we send small CRDT update patches:

Insert character
Delete character
Apply formatting
Insert block

Benefits:

Very small network payloads
Faster sync
Scales to large documents

2) Local-first Updates (Optimistic UI)

Typing must feel instant.

Flow:

Apply change locally immediately
Send update to server asynchronously
Merge remote updates later

Users never wait for the network before seeing their typing.

This is critical for perceived performance.

3) Update Batching (Network + Rendering)

Typing triggers many operations rapidly.

Instead of sending every keystroke immediately:

Batch CRDT updates (e.g. every 50–100ms)
Send updates in small bundles

Benefits:

Fewer WebSocket messages
Lower server load
Better battery usage on mobile
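
A minimal batching sketch, assuming a `send` callback that would write to the WebSocket. `UpdateBatcher` is a hypothetical helper; the local edit is still applied immediately, and only the network send is delayed.

```typescript
class UpdateBatcher<T> {
  private queue: T[] = [];
  private timer: ReturnType<typeof setTimeout> | null = null;
  private send: (batch: T[]) => void;
  private delayMs: number;

  constructor(send: (batch: T[]) => void, delayMs = 50) {
    this.send = send;
    this.delayMs = delayMs;
  }

  // Queue an update; the first update in a window schedules the flush.
  push(update: T): void {
    this.queue.push(update);
    if (this.timer === null) this.timer = setTimeout(() => this.flush(), this.delayMs);
  }

  // Send everything queued so far as one message. Also useful before the tab closes.
  flush(): void {
    if (this.timer !== null) {
      clearTimeout(this.timer);
      this.timer = null;
    }
    if (this.queue.length === 0) return;
    const batch = this.queue;
    this.queue = [];
    this.send(batch);
  }
}

const sent: string[][] = [];
const batcher = new UpdateBatcher<string>((batch) => { sent.push(batch); });
batcher.push("H");
batcher.push("i");
batcher.push("!");
batcher.flush(); // three keystrokes leave as a single message
console.log(sent.length, sent[0]);
```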

4) Incremental Reconciliation for Remote Updates

Remote updates may arrive frequently.

We must avoid full re-renders.

Process:

Apply CRDT update
Rebuild only affected nodes
Reconcile only changed DOM parts

This is similar in spirit to React’s virtual DOM diffing.

5) Presence Throttling

Cursor movement can fire dozens of updates per second.

We throttle presence updates:

Send cursor position every ~50–100ms
Not on every pixel movement

This prevents network flooding.

Presence is high frequency but low importance.
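
A simple leading-edge throttle illustrates the idea. The `throttle` helper and its injectable clock are assumptions made for this sketch; the clock parameter exists only so the behavior can be shown without real timers.

```typescript
// Calls `fn` at most once per `intervalMs`; calls inside the window are dropped.
function throttle<T>(
  fn: (value: T) => void,
  intervalMs: number,
  now: () => number = Date.now
): (value: T) => void {
  let last = -Infinity;
  return (value: T): void => {
    const t = now();
    if (t - last >= intervalMs) {
      last = t;
      fn(value);
    }
  };
}

let fakeNow = 0;
const sentCursors: number[] = [];
const sendCursor = throttle<number>((pos) => { sentCursors.push(pos); }, 50, () => fakeNow);

// Five cursor moves at t = 0, 10, 60, 70, 120 ms.
for (const t of [0, 10, 60, 70, 120]) {
  fakeNow = t;
  sendCursor(t);
}
console.log(sentCursors); // [0, 60, 120]
```

Because dropped calls are discarded, a production version would usually also send the final position once the cursor stops moving.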

6) Lazy Sync on Reconnect

When a user reconnects:

Do NOT download the full document immediately.
Fetch only the CRDT updates missing since the last synced version.

Benefits:

Faster reconnect
Lower bandwidth usage
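
One way to compute "missing updates" is a per-client sequence number, similar in spirit to the state vectors some CRDT libraries use. The `Update` and `StateVector` shapes and the `missingUpdates` function are assumptions for illustration:

```typescript
type Update = { clientId: string; seq: number; payload: string };
type StateVector = Record<string, number>; // clientId -> highest seq already seen

// Server side: return only the updates the reconnecting client has not seen.
function missingUpdates(log: Update[], seen: StateVector): Update[] {
  return log.filter((u) => u.seq > (seen[u.clientId] ?? 0));
}

const serverLog: Update[] = [
  { clientId: "A", seq: 1, payload: "a1" },
  { clientId: "A", seq: 2, payload: "a2" },
  { clientId: "B", seq: 1, payload: "b1" },
  { clientId: "A", seq: 3, payload: "a3" },
  { clientId: "B", seq: 2, payload: "b2" },
];

// The client went offline after seeing A's first two updates and nothing from B.
const seenByClient: StateVector = { A: 2 };
const toSend = missingUpdates(serverLog, seenByClient);
console.log(toSend.map((u) => u.payload)); // ["b1", "a3", "b2"]
```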

7) Snapshot + Update Log Strategy

Server stores:

Periodic snapshots
Incremental CRDT updates

Why:

Loading entire update history would be slow.
Clients load latest snapshot + recent updates.

This keeps document load time fast.
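
The same idea also serves version history: to open any version, start from the nearest earlier snapshot and replay the log up to that version. In the sketch below, plain positional inserts stand in for real CRDT updates, and `Snapshot`, `LoggedUpdate`, and `loadAt` are hypothetical names:

```typescript
type Snapshot = { version: number; text: string };
type LoggedUpdate = { version: number; pos: number; insert: string }; // simplified stand-in for a CRDT update

// Rebuild the document at `target` from the latest snapshot at or before it.
// Assumes `log` is sorted by version.
function loadAt(snapshots: Snapshot[], log: LoggedUpdate[], target: number): string {
  let base: Snapshot = { version: 0, text: "" };
  for (const s of snapshots) {
    if (s.version <= target && s.version >= base.version) base = s;
  }
  let text = base.text;
  for (const u of log) {
    if (u.version > base.version && u.version <= target) {
      text = text.slice(0, u.pos) + u.insert + text.slice(u.pos);
    }
  }
  return text;
}

const updateLog: LoggedUpdate[] = [
  { version: 1, pos: 0, insert: "Hello" },
  { version: 2, pos: 5, insert: " world" },
  { version: 3, pos: 11, insert: "!" },
];
const snapshots: Snapshot[] = [{ version: 2, text: "Hello world" }];

console.log(loadAt(snapshots, updateLog, 3)); // "Hello world!" (snapshot v2 + one update)
console.log(loadAt(snapshots, updateLog, 1)); // "Hello" (restoring an older version)
```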
