If you open the Lexical playground, you’ll see a familiar writing experience — a toolbar, formatted text, headings, lists, links, code blocks, images, and smooth keyboard-driven editing. It feels simple on the surface, but behind it sits a carefully designed frontend architecture that keeps typing fast, state predictable, and features extensible.

In this article, we’ll design a rich text editor from a frontend system design perspective, starting with the requirements and gradually moving toward components, architecture, and data flow. We will use the RADIO framework (Requirements, Architecture, Model, Interface, Optimization) to structure the explanation.

R — Requirements

Scope

We’re designing a browser-based WYSIWYG editor for modern browsers, focused on single-user editing sessions; real-time collaboration, version history, and offline-first support are out of scope.

Functional Requirements

Text structure (blocks)
Support structured blocks including paragraphs, headings (H1–H3), blockquotes, bullet/numbered lists, and code blocks.
Inline formatting
Allow bold, italic, underline, strikethrough, inline code, and links via toolbar and keyboard shortcuts.
Media insertion
Allow inserting images, dividers, and external links, with room for future custom embeds.
Editing behavior
Provide undo/redo, copy/cut/paste with formatting, paste sanitization, selection-based formatting, and standard keyboard navigation.
Import / Export
Content should be storable as JSON and exportable as HTML, with Markdown support as a nice-to-have.
Extensibility
Support adding custom blocks, inline styles, input rules, and toolbar actions through a plugin-style system.

Non-Functional Requirements

Performance
Typing must feel instant and remain smooth even for large documents without full re-renders per keystroke.
Accessibility
Keyboard-only navigation, screen reader compatibility, proper focus management, and semantic structure are required.
Internationalization
Support RTL languages, IME input (CJK typing), and Unicode-safe editing.
Security
Sanitize pasted content, prevent XSS, and avoid rendering unsanitized HTML directly.

A — Architecture

Rendering Strategy

Before diving into architecture, we first need to decide how text will actually be rendered and edited in the browser.
This decision affects performance, accessibility, complexity, and extensibility.

There are four common approaches:

Textarea-based editing
DOM rendering with a custom cursor
ContentEditable-based editing
Canvas-based editors

1) Textarea-Based Editors

This is the simplest approach.

The editor is just a <textarea> and formatting is stored as plain text with markup (usually Markdown). The browser handles typing, cursor movement, and selection.

<textarea id="editor"></textarea>
<div id="preview"></div>

import { marked } from "marked";

const textarea = document.getElementById("editor");
const preview = document.getElementById("preview");

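textarea.addEventListener("input", () => {
  // marked does not sanitize its output; run it through a sanitizer
  // (for example DOMPurify) before assigning to innerHTML in production.
  preview.innerHTML = marked.parse(textarea.value);
});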

User types → we store a string → convert string → render formatted preview.

Limitations

Not WYSIWYG
Cannot visually format selected text
No inline images or rich blocks

2) DOM Rendering with a Custom Cursor

In this approach, the browser is only used to display text, not edit it.

We render text manually into DOM nodes and implement the entire editing engine ourselves.

<div id="editor" tabindex="0"></div>

Rendering text manually

const editor = document.getElementById("editor");
// In a real editor, a keydown listener on #editor would update this string.
let text = "Hello world";

function render() {
  editor.innerHTML = "";

  // Iterate by code point so emoji and other surrogate pairs stay intact.
  for (const char of text) {
    const span = document.createElement("span");
    span.textContent = char;
    editor.appendChild(span);
  }
}

render();

At this point the browser only displays characters.
It does NOT provide editing behavior.

Implementing a custom cursor

// Note: render() clears the container, so the cursor must be re-appended after each render.
const cursor = document.createElement("span");
cursor.className = "cursor";
cursor.textContent = "|";
editor.appendChild(cursor);

To allow typing, we must listen to keystrokes on the editor container and manually update the text state, then call render() again.

This means we must build:

Keyboard typing behavior
Cursor movement
Text selection
Copy / paste
Accessibility support
IME (international typing)

We are basically building a text engine from scratch.
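To make the size of that job concrete, here is a minimal sketch of the state such an engine has to track. It assumes a single-line buffer and a hypothetical `applyKey` helper (neither comes from a real library), and it covers only character insertion, Backspace, and the left/right arrows. Selection, clipboard, IME, and accessibility would all still be missing.

```typescript
// Hypothetical minimal text buffer: the document text plus a cursor index.
type TextBuffer = { text: string; cursor: number };

// Applies one key name (as reported by KeyboardEvent.key) and returns a new buffer.
function applyKey(buffer: TextBuffer, key: string): TextBuffer {
  const { text, cursor } = buffer;
  switch (key) {
    case "ArrowLeft":
      return { text, cursor: Math.max(0, cursor - 1) };
    case "ArrowRight":
      return { text, cursor: Math.min(text.length, cursor + 1) };
    case "Backspace":
      if (cursor === 0) return buffer;
      return { text: text.slice(0, cursor - 1) + text.slice(cursor), cursor: cursor - 1 };
    default:
      // Named keys such as "Shift" or "Enter" are ignored; only single
      // UTF-16 units are inserted, to keep the sketch short.
      if (key.length !== 1) return buffer;
      return { text: text.slice(0, cursor) + key + text.slice(cursor), cursor: cursor + 1 };
  }
}

// Replay a short key sequence against an empty buffer.
const emptyBuffer: TextBuffer = { text: "", cursor: 0 };
const typedKeys = ["H", "i", "!", "ArrowLeft", "Backspace", "o"];
const finalBuffer = typedKeys.reduce<TextBuffer>(applyKey, emptyBuffer);
console.log(finalBuffer);
```

Each keystroke would be followed by a call to render(), which is where the per-character DOM work adds up.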

Used by

Code editors (Monaco, CodeMirror 5)

Tradeoff
Maximum control, but extremely high complexity.

3) ContentEditable Editors

Browsers provide built-in editing using the contenteditable attribute.

<div contenteditable="true"></div>

The browser now provides for free:

Cursor rendering
Text selection
Keyboard navigation
Clipboard handling
IME support
Accessibility support

Example:

<div id="editor" contenteditable="true"></div>
<button id="bold">Bold</button>

document.getElementById("bold").onclick = () => {
  // execCommand is deprecated; it is shown here only to illustrate native editing.
  document.execCommand("bold");
};

The browser automatically updates the DOM as the user edits.

Modern editors do not treat the DOM as the source of truth.
Instead they maintain an internal document model and sync the DOM to it.

The DOM becomes a rendering layer, not the data layer.

4) Canvas-Based Editors

Everything is drawn manually on <canvas>.

<canvas id="editorCanvas"></canvas>

const canvas = document.getElementById("editorCanvas");
const ctx = canvas.getContext("2d");
ctx.font = "16px sans-serif";
ctx.fillText("Hello world", 10, 50);

We must implement:

Cursor rendering
Selection
Keyboard input
Clipboard support
Accessibility

Used by

Figma
Photoshop Web
Google Docs

Tradeoff
Extreme control but overkill for document editing.

Comparison

Strategy	Strengths	Weaknesses
Textarea	Simple	Not WYSIWYG
Custom DOM cursor	Full control	Very complex
ContentEditable	Native editing + rich features	Requires syncing DOM with state
Canvas	Maximum control	Overkill for text editing

Why We Choose ContentEditable

For a document editor we want:

Native typing and selection
Accessibility and IME support
Reasonable implementation complexity
Extensibility for plugins

ContentEditable gives the best balance between control and practicality, which is why most modern rich text editors use it.

With the rendering strategy chosen, we can now design the editor architecture around ContentEditable.

Diagram (Flux architecture)

State-Driven Editor Architecture

At the heart of the editor is a unidirectional update pipeline.
The DOM is only used to capture input and display content, while the editor state is the single source of truth.

The lifecycle of every change looks like this:

User input → Events → Commands → Editor State → Transformers → Reconciler → DOM update → Plugins

Let’s break down each piece.

1. Events — capturing user intent

Events are the raw signals coming from the browser.

Examples:

Keyboard input (typing, backspace, shortcuts)
Mouse selection changes
Paste / copy / cut
Toolbar clicks
Drag & drop

Events do not directly change the document.
They only describe what the user did and translate browser behavior into editor-friendly actions.
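As a rough illustration of that translation step, the sketch below maps a keyboard event to a named action. The `EditorCommand` union, the `KeyInput` shape, and `keyToCommand` are hypothetical names chosen for this example; the command names are borrowed from the list in the next section.

```typescript
// Hypothetical command and key-event shapes, for illustration only.
type EditorCommand =
  | { type: "InsertText"; text: string }
  | { type: "DeleteSelection" }
  | { type: "ToggleBold" };

// A subset of KeyboardEvent fields, so the sketch runs without a DOM.
type KeyInput = { key: string; metaKey: boolean; ctrlKey: boolean };

// Describes what the user did; it never touches the document itself.
function keyToCommand(input: KeyInput): EditorCommand | null {
  const modifier = input.metaKey || input.ctrlKey;
  if (modifier && input.key.toLowerCase() === "b") return { type: "ToggleBold" };
  if (modifier) return null; // leave other shortcuts to the browser
  if (input.key === "Backspace" || input.key === "Delete") return { type: "DeleteSelection" };
  if (input.key.length === 1) return { type: "InsertText", text: input.key };
  return null; // keys such as "Shift" carry no editing intent on their own
}

console.log(keyToCommand({ key: "b", metaKey: true, ctrlKey: false }));
```

In a browser, the listener would pass the real KeyboardEvent (which has the same three fields) and call preventDefault() whenever a command is returned.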

2. Commands — the editor’s action layer

Commands represent what should happen, not how it happens.

Examples:

InsertText
DeleteSelection
ToggleBold
ConvertToHeading
InsertImage

Multiple parts of the UI can trigger the same command:

Keyboard shortcut (Cmd+B)
Toolbar button
Plugin

This creates a single, consistent API for document updates.

3. Editor State — the source of truth

The editor state is a structured document model (usually a tree/JSON).
It stores:

Blocks (paragraphs, headings, lists)
Inline formatting (bold, italic, links)
Selection and cursor position

Every command produces a new editor state.
We never rely on the DOM as our data layer.

4. Transformers — converting content in and out

Transformers handle format conversion.

Responsibilities:

Import HTML / Markdown into editor state
Export editor state to HTML / Markdown
Clean and normalize pasted content

They make the editor portable and format-agnostic.
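A minimal export transformer might look like the sketch below. It assumes a small hypothetical node shape (`ExportTextNode`, `ExportBlockNode`) rather than any real editor's schema, and handles only paragraphs, headings, bold, and italic. The important detail is that text is escaped on the way out, so user content can never become markup.

```typescript
// Hypothetical node shapes for this sketch.
type ExportTextNode = { type: "text"; value: string; bold?: boolean; italic?: boolean };
type ExportBlockNode = {
  type: "paragraph" | "heading";
  level?: 1 | 2 | 3;
  children: ExportTextNode[];
};

// Escapes the characters that would otherwise be parsed as HTML.
function escapeHtml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Wraps escaped text in <em> and <strong> according to its format flags.
function textToHtml(node: ExportTextNode): string {
  let html = escapeHtml(node.value);
  if (node.italic) html = `<em>${html}</em>`;
  if (node.bold) html = `<strong>${html}</strong>`;
  return html;
}

// Serializes blocks to HTML, one element per block.
function blocksToHtml(blocks: ExportBlockNode[]): string {
  return blocks
    .map((block) => {
      const tag = block.type === "heading" ? `h${block.level ?? 1}` : "p";
      const inner = block.children.map((child) => textToHtml(child)).join("");
      return `<${tag}>${inner}</${tag}>`;
    })
    .join("\n");
}
```

An import transformer would run in the opposite direction: parse HTML or Markdown, discard anything the schema does not know, and emit the same node shapes.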

5. Reconciler — syncing state to the DOM

After state updates, the DOM must reflect the new document.

The reconciler compares:

Previous editor state
New editor state

Then updates only the DOM parts that changed.

This keeps typing fast and avoids full re-renders, much like React’s reconciliation.

6. Plugins — extending the editor safely

Plugins allow features to extend the editor without changing the core.

Examples:

Toolbar state updates
Word count
Auto-save
Mentions, hashtags
Custom blocks and embeds

Plugins subscribe to editor changes and dispatch commands when needed, making the editor extensible by design.
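To show the subscribe side of that contract, here is a sketch of a word-count plugin. `PluginHostEditor`, `registerUpdateListener`, and `setText` are hypothetical stand-ins for the real editor surface, and the state is reduced to a single string to keep the example short.

```typescript
// Hypothetical minimal editor surface that a plugin might see.
type PluginEditorState = { text: string };
type UpdateListener = (state: PluginEditorState) => void;

class PluginHostEditor {
  private state: PluginEditorState = { text: "" };
  private listeners = new Set<UpdateListener>();

  // Plugins subscribe here; the returned function unsubscribes.
  registerUpdateListener(listener: UpdateListener): () => void {
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }

  // Stand-in for a command handler that replaces the document text.
  setText(text: string): void {
    this.state = { text };
    this.listeners.forEach((listener) => listener(this.state));
  }
}

// Word-count plugin: reads state on every update and never touches the DOM.
function wordCountPlugin(editor: PluginHostEditor, onCount: (count: number) => void): () => void {
  return editor.registerUpdateListener((state) => {
    const words = state.text.trim().split(/\s+/).filter((word) => word.length > 0);
    onCount(words.length);
  });
}

const pluginEditor = new PluginHostEditor();
let latestCount = 0;
const removeWordCount = wordCountPlugin(pluginEditor, (count) => {
  latestCount = count;
});
pluginEditor.setText("Hello rich text world");
console.log(latestCount);
```

Because the plugin only receives state and returns an unsubscribe function, it can be added or removed without the core knowing anything about word counts.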

M — Model (Data Model)

The model is the foundation of the editor. Everything — rendering, commands, plugins, undo/redo — depends on how we represent the document in memory. Instead of storing HTML, we store a structured document model that describes content, formatting, and selection in a predictable way.

Why we don’t store HTML

Although the editor renders HTML in the browser, HTML is not a reliable format to use as the internal data model. Browser-generated HTML can be inconsistent, deeply nested, and difficult to normalize. It also makes transformations (like exporting to Markdown) and features such as undo/redo much harder to implement.

Instead, we maintain a structured JSON tree and treat HTML purely as a rendering format.

Document as a Tree

The document is represented as a hierarchical tree. At the top sits a root node, which contains a list of blocks, and each block contains inline content and text.

Example:

{
  type: "root",
  children: [
    {
      type: "paragraph",
      children: [
        { type: "text", value: "Hello " },
        { type: "text", value: "world", bold: true }
      ]
    }
  ]
}

This structure allows the editor to enforce consistent rules and makes transformations predictable.

Node Types

The model is built around three primary node categories.

Block nodes represent the structural layout of the document. Examples include paragraphs, headings, lists, quotes, and code blocks. These nodes define how content is grouped and arranged vertically.
Inline nodes exist inside block nodes and represent inline-level elements such as links, inline code, or mentions. They wrap around text and provide additional semantics.
Text nodes sit at the leaves of the tree and contain the actual characters. Formatting such as bold or italic is stored as properties on these nodes.

{
  type: "text",
  value: "Hello",
  bold: true,
  italic: false
}

This separation between block, inline, and text nodes keeps the document schema clean and extensible.
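One way to write the three categories down is as a TypeScript union, sketched below. The field names follow the JSON examples above, but the exact type layout (`TextLeaf`, `InlineNode`, `BlockNode`, `RootNode`) is an assumption made for illustration, with a link as the only inline node.

```typescript
// Assumed schema: leaves hold characters, inline nodes wrap leaves,
// block nodes hold a mix of both.
type TextLeaf = { type: "text"; value: string; bold?: boolean; italic?: boolean };
type InlineNode = { type: "link"; url: string; children: TextLeaf[] };
type BlockNode = {
  type: "paragraph" | "heading" | "quote";
  children: Array<InlineNode | TextLeaf>;
};
type RootNode = { type: "root"; children: BlockNode[] };

// Collects the plain text of a document, one line per block.
function toPlainText(rootNode: RootNode): string {
  return rootNode.children
    .map((block) =>
      block.children
        .map((child) =>
          child.type === "text"
            ? child.value
            : child.children.map((leaf) => leaf.value).join("")
        )
        .join("")
    )
    .join("\n");
}

const sampleDocument: RootNode = {
  type: "root",
  children: [
    {
      type: "paragraph",
      children: [
        { type: "text", value: "Hello " },
        { type: "text", value: "world", bold: true },
      ],
    },
    {
      type: "paragraph",
      children: [
        { type: "text", value: "See " },
        { type: "link", url: "https://example.com", children: [{ type: "text", value: "docs" }] },
      ],
    },
  ],
};
console.log(toPlainText(sampleDocument));
```

Because every node carries a `type` discriminator, the compiler can check that each traversal handles every category, which is most of what "clean and extensible" buys us in practice.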

Selection Model

The editor also stores the user’s cursor and text selection inside the state instead of relying purely on the DOM.

We read the current selection from the browser using:

const selection = window.getSelection();

// Node where the selection starts
const anchorNode = selection.anchorNode;
// Character position inside the anchor node
const anchorOffset = selection.anchorOffset;
// Node where the selection ends
const focusNode = selection.focusNode;
// Character position inside the focus node
const focusOffset = selection.focusOffset;

These values are converted into the editor’s selection state:

type EditorSelection = {
  anchorNode: EditorNode | null;
  anchorOffset: number;
  focusNode: EditorNode | null;
  focusOffset: number;
};

Storing selection in state allows formatting ranges, replacing text, and keeping the cursor stable during updates.
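As an example of formatting a range, the sketch below applies bold to part of a single text node by splitting it into up to three nodes. `RangeTextNode` and `boldRange` are hypothetical names, and the sketch assumes the selection starts and ends inside the same node; a real editor would repeat this across every node the selection touches.

```typescript
type RangeTextNode = { type: "text"; value: string; bold?: boolean };

// Applies bold to the characters between two offsets of one text node.
function boldRange(node: RangeTextNode, anchorOffset: number, focusOffset: number): RangeTextNode[] {
  // A backward selection has its focus before its anchor, so sort and clamp first.
  const from = Math.max(0, Math.min(anchorOffset, focusOffset));
  const to = Math.min(node.value.length, Math.max(anchorOffset, focusOffset));
  if (from >= to) return [node]; // collapsed selection: nothing to format

  const pieces: RangeTextNode[] = [
    { ...node, value: node.value.slice(0, from) },
    { ...node, value: node.value.slice(from, to), bold: true },
    { ...node, value: node.value.slice(to) },
  ];
  // Drop empty pieces so the tree never contains zero-length text nodes.
  return pieces.filter((piece) => piece.value.length > 0);
}

console.log(boldRange({ type: "text", value: "Hello world" }, 6, 11));
```

None of this could be done reliably from DOM offsets alone, because the reconciler may replace the DOM nodes that the browser selection points at.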

History Model (Undo / Redo)

Undo and redo become straightforward when the editor state is immutable. Each command produces a new version of the state, and we keep snapshots in a history stack.

{
  past: [state1, state2],
  present: currentState,
  future: [state4]
}

Undo moves backward through the stack, redo moves forward. There is no need to diff the DOM or manually reverse operations.
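The three stack operations can be sketched as pure functions over that shape. The names `EditorHistory`, `commitState`, `undoState`, and `redoState` are hypothetical, and a string stands in for the full editor state.

```typescript
type EditorHistory<T> = { past: T[]; present: T; future: T[] };

// A new edit pushes the current state onto the past and clears the redo stack.
function commitState<T>(history: EditorHistory<T>, next: T): EditorHistory<T> {
  return { past: [...history.past, history.present], present: next, future: [] };
}

// Undo restores the most recent past state and makes the present redoable.
function undoState<T>(history: EditorHistory<T>): EditorHistory<T> {
  if (history.past.length === 0) return history;
  const previous = history.past[history.past.length - 1] as T;
  return {
    past: history.past.slice(0, -1),
    present: previous,
    future: [history.present, ...history.future],
  };
}

// Redo is the mirror image of undo.
function redoState<T>(history: EditorHistory<T>): EditorHistory<T> {
  if (history.future.length === 0) return history;
  const next = history.future[0] as T;
  return {
    past: [...history.past, history.present],
    present: next,
    future: history.future.slice(1),
  };
}

const initialHistory: EditorHistory<string> = { past: [], present: "a", future: [] };
const afterTyping = commitState(commitState(initialHistory, "ab"), "abc");
console.log(undoState(afterTyping).present);
```

Note that committing a new state after an undo discards the future stack, which matches how undo behaves in most editors.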

I — Interface (Editor APIs)

Now that we have the architecture and data model, we need to define how the rest of the application interacts with the editor.
This is the public interface exposed to developers using the editor.

Think of this as the SDK surface of the editor.

Editor Initialization API

This API is used when creating an editor instance and attaching it to the page.

The editor should accept a configuration object that defines how it should behave.

Typical configuration:

rootElement → DOM element where the editor will mount
namespace → unique identifier to avoid conflicts between multiple editors
initialEditorState → initial document content
theme → styling and class mappings
errorBoundary → error handling during rendering

Example:

createEditor({
  rootElement,
  namespace: "blog-editor",
  initialEditorState,
  theme,
  errorBoundary,
});

This keeps the editor configurable and reusable across different parts of an application.

Update API

All changes to the document must go through controlled update methods.

editor.update(() => {
  // mutate editor state safely
});

editor.setEditorState(newState);

editor.update(cb) allows safe state mutations inside the editor’s controlled environment.
setEditorState allows replacing the entire state when loading saved content.
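One possible way to implement this controlled environment is to hand the callback a draft copy and commit it only if the callback finishes. The sketch below does that with a hypothetical `SketchEditor` whose state is just a list of paragraphs. Passing the draft as an argument is a simplification made for this example; the `update` call shown above takes a callback with no arguments.

```typescript
type DraftState = { paragraphs: string[] };

class SketchEditor {
  private current: DraftState;
  private updating = false;

  constructor(initial: DraftState) {
    this.current = { paragraphs: [...initial.paragraphs] };
  }

  getEditorState(): DraftState {
    return this.current;
  }

  // Replaces the whole state, for example when loading saved content.
  setEditorState(next: DraftState): void {
    this.current = { paragraphs: [...next.paragraphs] };
  }

  // Runs `mutate` against a draft copy and commits it only if the callback completes.
  update(mutate: (draft: DraftState) => void): void {
    if (this.updating) throw new Error("Nested updates are not supported in this sketch");
    this.updating = true;
    try {
      const draft: DraftState = { paragraphs: [...this.current.paragraphs] };
      mutate(draft);
      this.current = draft; // commit point: listeners and the reconciler would run here
    } finally {
      this.updating = false;
    }
  }
}

const sketchEditor = new SketchEditor({ paragraphs: ["a"] });
const snapshotBefore = sketchEditor.getEditorState();
sketchEditor.update((draft) => {
  draft.paragraphs.push("b");
});
console.log(sketchEditor.getEditorState().paragraphs);
```

Because the previous snapshot is never mutated, it can be kept for undo, and a callback that throws leaves the document untouched.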

Read API

Consumers of the editor often need to read the current content.

editor.getEditorState();
editor.getEditorCurrentVersion();

These methods allow features like autosave, preview rendering, and export.

Events / Commands API

The editor exposes a command system so external features and plugins can extend behavior without touching internal logic.

editor.registerCommand(command, handler);

Examples:

Keyboard shortcuts
Toolbar actions
Custom plugins

This keeps the editor extensible and modular.
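A command registry along these lines could be sketched as follows. `CommandRegistry` and `dispatchCommand` are hypothetical names, commands are identified by plain strings, and a handler returns true to signal that it handled the command so later handlers are skipped.

```typescript
// A handler returns true when it has handled the command.
type CommandHandler<P> = (payload: P) => boolean;

class CommandRegistry {
  private handlers = new Map<string, Array<CommandHandler<unknown>>>();

  // Registers a handler and returns a function that removes it again.
  registerCommand<P>(command: string, handler: CommandHandler<P>): () => void {
    const stored = handler as CommandHandler<unknown>;
    const list = this.handlers.get(command) ?? [];
    list.push(stored);
    this.handlers.set(command, list);
    return () => {
      const remaining = (this.handlers.get(command) ?? []).filter((h) => h !== stored);
      this.handlers.set(command, remaining);
    };
  }

  // Runs handlers in registration order until one reports success.
  dispatchCommand<P>(command: string, payload: P): boolean {
    const list = this.handlers.get(command) ?? [];
    for (let i = 0; i < list.length; i += 1) {
      const handler = list[i];
      if (handler !== undefined && handler(payload)) return true;
    }
    return false;
  }
}

const commandRegistry = new CommandRegistry();
let boldToggles = 0;
const unregisterBold = commandRegistry.registerCommand("ToggleBold", () => {
  boldToggles += 1;
  return true;
});

commandRegistry.dispatchCommand("ToggleBold", undefined); // from the toolbar button
commandRegistry.dispatchCommand("ToggleBold", undefined); // from the Cmd+B shortcut
console.log(boldToggles);
```

The toolbar and the shortcut never call formatting code directly; they only name the command, so a plugin can register an extra handler without either of them changing.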

O — Optimization

Rich text editors are deceptively heavy. Every keystroke can trigger state updates, DOM updates, selection updates, and plugin reactions. Without careful optimization, typing quickly becomes laggy. This section focuses on the key decisions that keep the editor fast and scalable.

1. Normalizing the document tree

Although the document is conceptually a tree, storing it as a deeply nested object is not ideal for performance. Modern editors like Lexical internally keep nodes in a normalized map structure.

Instead of repeatedly traversing a deep tree, nodes are stored by id:

{
  nodeMap: {
    n1: { type: "paragraph", children: ["n2", "n3"] },
    n2: { type: "text", value: "Hello " },
    n3: { type: "text", value: "world", bold: true }
  },
  root: "n1"
}

This gives several advantages:

Constant-time node lookup
Faster updates without deep cloning
Easier undo/redo snapshots
Efficient reconciliation

This hybrid approach (tree structure + normalized storage) is what editors such as Lexical use.
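Converting between the two representations is a simple traversal. The sketch below flattens a nested tree into the node map shape shown above; `normalizeTree` and the sequential `n1`, `n2`, ... ids are assumptions made for this example, and a real editor would assign stable keys when nodes are created.

```typescript
type NestedNode = { type: string; value?: string; bold?: boolean; children?: NestedNode[] };
type FlatNode = { type: string; value?: string; bold?: boolean; children?: string[] };
type NormalizedState = { nodeMap: Record<string, FlatNode>; root: string };

// Walks the tree in document order and stores each node under a generated id.
function normalizeTree(tree: NestedNode): NormalizedState {
  const nodeMap: Record<string, FlatNode> = {};
  let counter = 0;

  function visit(node: NestedNode): string {
    counter += 1;
    const id = `n${counter}`;
    const { children, ...rest } = node;
    const flat: FlatNode = { ...rest };
    nodeMap[id] = flat;
    // Children are replaced by their ids, so parents no longer embed them.
    if (children) flat.children = children.map((child) => visit(child));
    return id;
  }

  const rootId = visit(tree);
  return { nodeMap, root: rootId };
}

const normalized = normalizeTree({
  type: "paragraph",
  children: [
    { type: "text", value: "Hello " },
    { type: "text", value: "world", bold: true },
  ],
});
console.log(normalized);
```

After normalization, changing the text of one node means replacing a single map entry instead of cloning every ancestor on the path to the root.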

2. Incremental DOM reconciliation

Updating the entire editor DOM on every keystroke would be extremely slow. Instead, the reconciler compares the previous and next editor state and updates only the nodes that changed.

This is similar to React’s diffing strategy and is critical for maintaining smooth typing in long documents.
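With a normalized node map, the comparison step can be very cheap. The sketch below assumes that unchanged nodes are shared by reference between snapshots, so a reference check is enough to skip them. `diffNodeMaps` and `DomPatch` are hypothetical names, and actually applying the patch to the DOM is left out.

```typescript
type ReconcileNode = { type: string; value?: string; children?: string[] };
type ReconcileMap = Record<string, ReconcileNode>;
type DomPatch = { created: string[]; updated: string[]; removed: string[] };

// Compares two snapshots and lists only the node ids whose DOM needs work.
function diffNodeMaps(prev: ReconcileMap, next: ReconcileMap): DomPatch {
  const patch: DomPatch = { created: [], updated: [], removed: [] };
  Object.keys(next).forEach((id) => {
    if (!(id in prev)) patch.created.push(id);
    // Unchanged nodes are the same object in both maps, so they are skipped here.
    else if (prev[id] !== next[id]) patch.updated.push(id);
  });
  Object.keys(prev).forEach((id) => {
    if (!(id in next)) patch.removed.push(id);
  });
  return patch;
}

const prevMap: ReconcileMap = {
  n1: { type: "paragraph", children: ["n2", "n3"] },
  n2: { type: "text", value: "Hello " },
  n3: { type: "text", value: "world" },
};
// Typing "!" replaces only n3; n1 and n2 are reused as-is.
const nextMap: ReconcileMap = { ...prevMap, n3: { type: "text", value: "world!" } };
console.log(diffNodeMaps(prevMap, nextMap));
```

For a single keystroke the patch usually contains one updated text node, regardless of how long the document is.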

3. Batched updates

Typing often triggers multiple operations:

Insert character
Update selection
Run plugins
Update toolbar state

These updates should be batched into a single render cycle to avoid unnecessary DOM work.
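A simple way to batch is to collect dirty node ids and flush them once per tick. The sketch below uses a hypothetical `BatchedRenderer` whose scheduler can be injected; by default it flushes in a microtask, and the usage example swaps in a manual scheduler so the behavior is easy to observe.

```typescript
type FlushScheduler = (flush: () => void) => void;

class BatchedRenderer {
  private dirty = new Set<string>();
  private scheduled = false;
  private render: (dirtyIds: string[]) => void;
  private schedule: FlushScheduler;

  constructor(render: (dirtyIds: string[]) => void, schedule?: FlushScheduler) {
    this.render = render;
    // Default: flush in a microtask, after the current event handler finishes.
    this.schedule =
      schedule ??
      ((flush: () => void) => {
        void Promise.resolve().then(flush);
      });
  }

  // Marks a node as needing DOM work; only the first call in a batch schedules a flush.
  markDirty(id: string): void {
    this.dirty.add(id);
    if (this.scheduled) return;
    this.scheduled = true;
    this.schedule(() => this.flush());
  }

  private flush(): void {
    const ids = Array.from(this.dirty);
    this.dirty.clear();
    this.scheduled = false;
    this.render(ids);
  }
}

// Manual scheduler: flushes are queued and run when we decide.
const pendingFlushes: Array<() => void> = [];
const renderedBatches: string[][] = [];
const batchedRenderer = new BatchedRenderer(
  (ids) => {
    renderedBatches.push(ids);
  },
  (flush) => {
    pendingFlushes.push(flush);
  }
);

batchedRenderer.markDirty("n2"); // insert character
batchedRenderer.markDirty("selection"); // update selection
batchedRenderer.markDirty("n2"); // a plugin touches the same node again
pendingFlushes.forEach((flush) => flush());
console.log(renderedBatches);
```

Three operations produce one render call, and the duplicate entry for the same node collapses because the dirty ids live in a Set.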

4. History optimization

Storing every keystroke as a separate undo step can quickly consume memory. Editors typically:

Merge rapid typing into a single history step
Limit history size
Store structural diffs instead of full copies (optional optimization)

This keeps undo/redo responsive and memory usage under control.
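The first two ideas can be sketched together. The hypothetical `MergingHistory` below records the state that existed before each edit, merges edits that arrive within a time window into the previous step, and drops the oldest step once a size limit is reached. The 500 ms window and 100-entry limit are arbitrary values picked for the example.

```typescript
type TimedEntry<T> = { state: T; timestamp: number };

class MergingHistory<T> {
  private past: Array<TimedEntry<T>> = [];
  private readonly mergeWindowMs: number;
  private readonly maxEntries: number;

  // Both defaults are arbitrary values chosen for this sketch.
  constructor(mergeWindowMs: number = 500, maxEntries: number = 100) {
    this.mergeWindowMs = mergeWindowMs;
    this.maxEntries = maxEntries;
  }

  // Records the state that existed BEFORE an edit made at `timestamp` (in ms).
  record(previousState: T, timestamp: number): void {
    const last = this.past[this.past.length - 1];
    // Rapid typing: keep the older snapshot and just extend the merge window.
    if (last !== undefined && timestamp - last.timestamp < this.mergeWindowMs) {
      last.timestamp = timestamp;
      return;
    }
    this.past.push({ state: previousState, timestamp });
    // Enforce the size limit by forgetting the oldest step.
    if (this.past.length > this.maxEntries) this.past.shift();
  }

  // Returns the state to restore, or undefined when there is nothing to undo.
  undo(): T | undefined {
    const entry = this.past.pop();
    return entry === undefined ? undefined : entry.state;
  }

  size(): number {
    return this.past.length;
  }
}

const typingHistory = new MergingHistory<string>();
typingHistory.record("", 0); // user types "a"
typingHistory.record("a", 100); // then "b" almost immediately
typingHistory.record("ab", 200); // then "c"
typingHistory.record("abc", 2000); // pause, then types "d"
console.log(typingHistory.size());
```

Four keystrokes produce two undo steps: one for the burst "abc" and one for the later "d".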

Internationalization

Rich text editing becomes complex when supporting global languages.

Key requirements:

RTL support
Languages like Arabic and Hebrew require right-to-left text rendering and cursor movement.

IME support
Chinese, Japanese, and Korean typing uses composition events. The editor must avoid interfering during text composition and update state only after composition ends.

This is a major reason we rely on ContentEditable.
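The composition rule can be sketched as a small state machine. `CompositionGuard` and the simplified `CompositionEventLike` shape below are hypothetical, and the event order is idealized: real browsers differ in whether the final input event arrives before or after compositionend, which production editors have to account for.

```typescript
// Simplified stand-ins for the browser's composition and input events.
type CompositionEventLike =
  | { type: "compositionstart" }
  | { type: "compositionend"; data: string }
  | { type: "input"; data: string };

class CompositionGuard {
  private composing = false;
  committed = "";

  handle(event: CompositionEventLike): void {
    switch (event.type) {
      case "compositionstart":
        this.composing = true;
        break;
      case "compositionend":
        this.composing = false;
        this.committed += event.data; // commit the final composed text once
        break;
      case "input":
        // While composing, the browser owns the in-progress text; skip model updates.
        if (!this.composing) this.committed += event.data;
        break;
    }
  }
}

const imeGuard = new CompositionGuard();
const imeEvents: CompositionEventLike[] = [
  { type: "input", data: "a" },
  { type: "compositionstart" },
  { type: "input", data: "n" }, // intermediate candidate text, ignored
  { type: "input", data: "ni" }, // intermediate candidate text, ignored
  { type: "compositionend", data: "你" },
];
imeEvents.forEach((event) => imeGuard.handle(event));
console.log(imeGuard.committed);
```

The intermediate "n" and "ni" never reach the model, so the reconciler has no reason to touch the DOM node the IME is still writing into.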

Accessibility

Editors must be usable with assistive technologies.

Key areas:

Full keyboard navigation
Screen reader compatibility
Proper focus management
Speech-to-text integrations

Accessibility is not optional for production editors and must be considered part of the core design.

Clipboard and paste performance

Pasting large documents can introduce heavy HTML. Editors typically:

Sanitize pasted content
Convert HTML → editor state using transformers
Strip unsupported styles
Prevent layout thrashing during paste
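The sanitize-and-strip steps can be sketched as an allowlist filter over already-parsed HTML. The `PastedNode` shape, the tag lists, and `sanitizePasted` below are assumptions made for this example; in a browser the input would come from parsing the clipboard HTML, and production code would normally rely on a maintained sanitizer rather than a hand-written one.

```typescript
// Hypothetical pre-parsed HTML node, so the sketch runs without a DOM.
type PastedNode =
  | { kind: "text"; text: string }
  | { kind: "element"; tag: string; attrs: Record<string, string>; children: PastedNode[] };

const ALLOWED_TAGS = ["p", "strong", "em", "a", "ul", "ol", "li", "h1", "h2", "h3"];
const DROPPED_TAGS = ["script", "style", "iframe"];

function sanitizePasted(nodes: PastedNode[]): PastedNode[] {
  const result: PastedNode[] = [];
  nodes.forEach((node) => {
    if (node.kind === "text") {
      result.push(node);
      return;
    }
    const tag = node.tag.toLowerCase();
    if (DROPPED_TAGS.indexOf(tag) !== -1) return; // drop the element and its contents
    const children = sanitizePasted(node.children);
    if (ALLOWED_TAGS.indexOf(tag) === -1) {
      // Unsupported wrapper (div, span, font, ...): keep the content, lose the tag.
      children.forEach((child) => result.push(child));
      return;
    }
    // Keep only http(s) links; style, class, and event-handler attributes are dropped.
    const attrs: Record<string, string> = {};
    const href = node.attrs["href"];
    if (tag === "a" && href !== undefined && /^https?:\/\//i.test(href)) attrs["href"] = href;
    result.push({ kind: "element", tag, attrs, children });
  });
  return result;
}

const pastedNodes: PastedNode[] = [
  {
    kind: "element",
    tag: "div",
    attrs: { style: "color:red" },
    children: [
      { kind: "element", tag: "script", attrs: {}, children: [{ kind: "text", text: "alert(1)" }] },
      {
        kind: "element",
        tag: "a",
        attrs: { href: "javascript:alert(1)", onclick: "steal()" },
        children: [{ kind: "text", text: "link" }],
      },
      { kind: "text", text: "hi" },
    ],
  },
];
console.log(JSON.stringify(sanitizePasted(pastedNodes)));
```

The cleaned nodes would then go through an import transformer to become editor state, so pasted HTML is never inserted into the document directly.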

Plugin isolation

Plugins can run code on every update. Without safeguards, they can slow down typing. Modern editors ensure:

Plugins run after state updates
Expensive plugins can be throttled or debounced
Plugins cannot mutate DOM directly
