If you open the Lexical playground, you’ll see a familiar writing experience — a toolbar, formatted text, headings, lists, links, code blocks, images, and smooth keyboard-driven editing. It feels simple on the surface, but behind it sits a carefully designed frontend architecture that keeps typing fast, state predictable, and features extensible.

In this article, we’ll design a rich text editor from a frontend system design perspective, starting with the requirements and gradually moving toward components, architecture, and data flow. We will use the RADIO framework (Requirements, Architecture, Data model, Interface, Optimizations) to structure the discussion.

R — Requirements

Scope

We’re designing a browser-based WYSIWYG editor for modern browsers, focused on single-user editing sessions; real-time collaboration, version history, and offline-first support are out of scope.

Functional Requirements

  • Text structure (blocks)
    Support structured blocks including paragraphs, headings (H1–H3), blockquotes, bullet/numbered lists, and code blocks.
  • Inline formatting
    Allow bold, italic, underline, strikethrough, inline code, and links via toolbar and keyboard shortcuts.
  • Media insertion
    Allow inserting images, dividers, and external links, with room for future custom embeds.
  • Editing behavior
    Provide undo/redo, copy/cut/paste with formatting, paste sanitization, selection-based formatting, and standard keyboard navigation.
  • Import / Export
    Content should be storable as JSON and exportable as HTML, with Markdown support as a nice-to-have.
  • Extensibility
    Support adding custom blocks, inline styles, input rules, and toolbar actions through a plugin-style system.

Non-Functional Requirements

  • Performance
    Typing must feel instant and remain smooth even for large documents without full re-renders per keystroke.
  • Accessibility
    Keyboard-only navigation, screen reader compatibility, proper focus management, and semantic structure are required.
  • Internationalization
    Support RTL languages, IME input (CJK typing), and Unicode-safe editing.
  • Security
    Sanitize pasted content, prevent XSS, and avoid rendering unsanitized HTML directly.

A — Architecture

Rendering Strategy

Before diving into architecture, we first need to decide how text will actually be rendered and edited in the browser.
This decision affects performance, accessibility, complexity, and extensibility.

There are four common approaches:

  1. Textarea-based editing
  2. DOM rendering with a custom cursor
  3. ContentEditable-based editing
  4. Canvas-based editors

1) Textarea-Based Editors

This is the simplest approach.

The editor is just a <textarea> and formatting is stored as plain text with markup (usually Markdown). The browser handles typing, cursor movement, and selection.

<textarea id="editor"></textarea>
<div id="preview"></div>

import { marked } from "marked";

const textarea = document.getElementById("editor");
const preview = document.getElementById("preview");

textarea.addEventListener("input", () => {
  // marked does not sanitize its output; in production, pass the HTML
  // through a sanitizer (e.g. DOMPurify) before assigning it to innerHTML.
  preview.innerHTML = marked.parse(textarea.value);
});

User types → we store a string → convert the Markdown to HTML → render the formatted preview.

Limitations

  • Not WYSIWYG
  • Cannot visually format selected text
  • No inline images or rich blocks

2) DOM Rendering with a Custom Cursor

In this approach, the browser is only used to display text, not edit it.

We render text manually into DOM nodes and implement the entire editing engine ourselves.

<!-- tabindex="0" makes the div focusable so it can receive keydown events -->
<div id="editor" tabindex="0"></div>

Rendering text manually

const editor = document.getElementById("editor");
let text = "Hello world"; // in a real engine, key handlers update this state

function render() {
  editor.innerHTML = "";

  // One <span> per character: the browser only displays them.
  text.split("").forEach((char) => {
    const span = document.createElement("span");
    span.textContent = char;
    editor.appendChild(span);
  });
}

render();

At this point the browser only displays characters.
It does NOT provide editing behavior.

Implementing a custom cursor

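// A fake caret. render() clears innerHTML, so the caret must be
// re-inserted at the correct position after every render.
const cursor = document.createElement("span");
cursor.className = "cursor";
cursor.textContent = "|";
editor.appendChild(cursor);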

To allow typing, we must listen to keystrokes on the editor container and manually update the text state, then call render() again.

This means we must build:

  • Keyboard typing behavior
  • Cursor movement
  • Text selection
  • Copy / paste
  • Accessibility support
  • IME (international typing)

We are basically building a text engine from scratch.
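To make that list concrete, here is a minimal sketch of just the state half of such an engine. It assumes a single line of plain text and a collapsed caret, with no selection, no IME, and no DOM. `TextEngineState` and `applyKey` are hypothetical names; the key strings are standard `KeyboardEvent.key` values.

```typescript
// Hypothetical state for a hand-rolled editor: plain text plus a caret index.
interface TextEngineState {
  text: string;
  caret: number; // position between characters, 0..text.length
}

// Pure reducer over KeyboardEvent.key values. A real engine would also need
// selection, word/line movement, IME, clipboard, undo, and screen reader output.
function applyKey(state: TextEngineState, key: string): TextEngineState {
  const { text, caret } = state;
  switch (key) {
    case "ArrowLeft":
      return { text, caret: Math.max(0, caret - 1) };
    case "ArrowRight":
      return { text, caret: Math.min(text.length, caret + 1) };
    case "Backspace":
      if (caret === 0) return state;
      return { text: text.slice(0, caret - 1) + text.slice(caret), caret: caret - 1 };
    case "Delete":
      if (caret === text.length) return state;
      return { text: text.slice(0, caret) + text.slice(caret + 1), caret };
    default:
      // One-character keys are printable; named keys ("Shift", "Tab") are ignored.
      if (key.length !== 1) return state;
      return { text: text.slice(0, caret) + key + text.slice(caret), caret: caret + 1 };
  }
}

// Usage: type "Hi", move left, type "!"
let engine: TextEngineState = { text: "", caret: 0 };
for (const k of ["H", "i", "ArrowLeft", "!"]) engine = applyKey(engine, k);
// engine is now { text: "H!i", caret: 2 }
```

Even this toy version gets Unicode wrong. `ArrowLeft` over an emoji stops between the two halves of a UTF-16 surrogate pair, which is one of many details a real text engine has to handle.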

Used by

  • Code editors (Monaco, CodeMirror 5), which capture keystrokes through a hidden textarea and render the text and cursor themselves

Tradeoff
Maximum control, but extremely high complexity.


3) ContentEditable Editors

Browsers provide built-in editing using the contenteditable attribute.

<div contenteditable="true"></div>

The browser now provides for free:

  • Cursor rendering
  • Text selection
  • Keyboard navigation
  • Clipboard handling
  • IME support
  • Accessibility support

Example:

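<div id="editor" contenteditable="true"></div>
<button id="bold">Bold</button>

// document.execCommand is deprecated (though still widely supported) and behaves
// inconsistently across browsers; modern editors apply formatting in their own model.
document.getElementById("bold").onclick = () => {
  document.execCommand("bold");
};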


The browser automatically updates the DOM as the user edits.

Modern editors do not treat the DOM as the source of truth.
Instead, they maintain an internal document model and sync the DOM to it.

The DOM becomes a rendering layer, not the data layer.


4) Canvas-Based Editors

Everything is drawn manually on <canvas>.

<canvas id="editorCanvas"></canvas>

const canvas = document.getElementById("editorCanvas");
const ctx = canvas.getContext("2d");
ctx.font = "16px sans-serif";
ctx.fillText("Hello world", 10, 50);


We must implement:

  • Cursor rendering
  • Selection
  • Keyboard input
  • Clipboard support
  • Accessibility

Used by

  • Figma
  • Photoshop Web
  • Google Docs

Tradeoff
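Extreme control, but overkill for most document editors, since text layout, selection, accessibility, and IME handling must all be rebuilt from scratch.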


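Comparison

| Strategy          | Strengths                      | Weaknesses                      |
| ----------------- | ------------------------------ | ------------------------------- |
| Textarea          | Simple                         | Not WYSIWYG                     |
| Custom DOM cursor | Full control                   | Very complex                    |
| ContentEditable   | Native editing + rich features | Requires syncing DOM with state |
| Canvas            | Maximum control                | Overkill for text editing       |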


Why We Choose ContentEditable

For a document editor we want:

  • Native typing and selection
  • Accessibility and IME support
  • Reasonable implementation complexity
  • Extensibility for plugins

ContentEditable gives the best balance between control and practicality, which is why most modern rich text editors use it.

Now we can move on to the editor architecture, built on the ContentEditable strategy.


Diagram (Flux architecture)

[Figure not reproduced. Labels: UI, Toolbar, HTML Element (ContentEditable), Events, Commands, Action, Reducer, Store (current / updated state), Reconciler, DOM updates, Transformer, Plugins]

State-Driven Editor Architecture

At the heart of the editor is a unidirectional update pipeline.
The DOM is only used to capture input and display content, while the editor state is the single source of truth.

The lifecycle of every change looks like this:

User input → Events → Commands → Editor State → Transformers → Reconciler → DOM update → Plugins

Let’s break down each piece.

1. Events — capturing user intent

Events are the raw signals coming from the browser.

Examples:

  • Keyboard input (typing, backspace, shortcuts)
  • Mouse selection changes
  • Paste / copy / cut
  • Toolbar clicks
  • Drag & drop

Events do not directly change the document.
They only describe what the user did and translate browser behavior into editor-friendly actions.
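Here is a sketch of that translation step. The function turns a keyboard-event-like object into an editor command, or returns `null` when the editor should let the browser handle the key. The `EditorCommand` union and the shortcut table are assumptions for illustration; `KeyInput` mirrors a subset of the DOM `KeyboardEvent` fields.

```typescript
// A small, hypothetical command vocabulary.
type EditorCommand =
  | { type: "InsertText"; text: string }
  | { type: "DeleteBackward" }
  | { type: "ToggleFormat"; format: "bold" | "italic" | "underline" }
  | { type: "Undo" }
  | { type: "Redo" };

// The subset of KeyboardEvent fields this sketch looks at.
interface KeyInput {
  key: string;
  metaKey: boolean;
  ctrlKey: boolean;
  shiftKey: boolean;
}

// Translate a raw key event into editor intent; null means "not ours".
function keyToCommand(e: KeyInput, isMac: boolean): EditorCommand | null {
  const mod = isMac ? e.metaKey : e.ctrlKey; // Cmd on macOS, Ctrl elsewhere
  const k = e.key.toLowerCase(); // Shift+Z reports key "Z"
  if (mod) {
    if (k === "b") return { type: "ToggleFormat", format: "bold" };
    if (k === "i") return { type: "ToggleFormat", format: "italic" };
    if (k === "u") return { type: "ToggleFormat", format: "underline" };
    if (k === "z") return e.shiftKey ? { type: "Redo" } : { type: "Undo" };
    return null;
  }
  if (e.key === "Backspace") return { type: "DeleteBackward" };
  if (e.key.length === 1) return { type: "InsertText", text: e.key };
  return null;
}
```

In the browser, a `keydown` listener would call `keyToCommand` and, when it returns a command, call `event.preventDefault()` and dispatch it. Many editors route plain text insertion through `beforeinput` instead, so that IME and autocorrect keep working; this sketch ignores that.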

2. Commands — the editor’s action layer

Commands represent what should happen, not how it happens.

Examples:

  • InsertText
  • DeleteSelection
  • ToggleBold
  • ConvertToHeading
  • InsertImage

Multiple parts of the UI can trigger the same command:

  • Keyboard shortcut (Cmd+B)
  • Toolbar button
  • Plugin

This creates a single, consistent API for document updates.
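One way to get that single API is a small command bus. Handlers register per command, and every source (shortcut, toolbar, plugin) dispatches through the same entry point. This is a hypothetical sketch: `CommandBus`, the numeric priorities, and the "return `true` to stop propagation" convention are illustrative, though Lexical's `registerCommand` follows a comparable priority-based pattern.

```typescript
// A handler returns true when it fully handled the command, which stops
// lower-priority handlers from running.
type CommandHandler<P> = (payload: P) => boolean;

interface Registration {
  priority: number;
  handler: CommandHandler<any>;
}

class CommandBus {
  private handlers = new Map<string, Registration[]>();

  // Higher priority runs first. Returns a function that unregisters the handler.
  register<P>(command: string, handler: CommandHandler<P>, priority = 0): () => void {
    const reg: Registration = { priority, handler };
    const list = [...(this.handlers.get(command) ?? []), reg];
    list.sort((a, b) => b.priority - a.priority);
    this.handlers.set(command, list);
    return () => {
      const current = this.handlers.get(command) ?? [];
      this.handlers.set(command, current.filter((r) => r !== reg));
    };
  }

  // Toolbar buttons, shortcuts, and plugins all go through this one entry point.
  dispatch<P>(command: string, payload: P): boolean {
    for (const { handler } of this.handlers.get(command) ?? []) {
      if (handler(payload)) return true;
    }
    return false;
  }
}

// Usage: a plugin with higher priority sees ToggleBold before the core handler.
const bus = new CommandBus();
const log: string[] = [];
bus.register("ToggleBold", () => {
  log.push("core");
  return true;
});
const off = bus.register("ToggleBold", () => {
  log.push("plugin");
  return false; // observe only, let the core handle it
}, 10);
bus.dispatch("ToggleBold", undefined); // log is now ["plugin", "core"]
off();
```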

3. Editor State — the source of truth

The editor state is a structured document model (usually a tree/JSON).
It stores:

  • Blocks (paragraphs, headings, lists)
  • Inline formatting (bold, italic, links)
  • Selection and cursor position

Every command produces a new editor state.
We never rely on the DOM as our data layer.
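The sketch below illustrates "every command produces a new editor state" with a pure `insertText` update. Only the path from the root to the edited text node is copied; untouched blocks are shared by reference between the old and new state. The `DocState` shape, which stores a collapsed selection as block, leaf, and offset indices, is an assumption made for this example.

```typescript
// Hypothetical immutable document state.
interface TextLeaf {
  type: "text";
  value: string;
  bold?: boolean;
}

interface BlockData {
  type: "paragraph" | "heading";
  children: TextLeaf[];
}

interface DocState {
  blocks: BlockData[];
  // Collapsed selection: block index, leaf index, character offset in the leaf.
  selection: { block: number; leaf: number; offset: number };
}

// Pure command: returns a NEW state and never mutates the input. Only the
// path root -> block -> leaf is copied; every other node is shared.
function insertText(state: DocState, text: string): DocState {
  const { block: bi, leaf: li, offset } = state.selection;
  const block = state.blocks[bi];
  const leaf = block.children[li];
  const newLeaf: TextLeaf = {
    ...leaf,
    value: leaf.value.slice(0, offset) + text + leaf.value.slice(offset),
  };
  const newBlock: BlockData = {
    ...block,
    children: block.children.map((c, i) => (i === li ? newLeaf : c)),
  };
  return {
    blocks: state.blocks.map((b, i) => (i === bi ? newBlock : b)),
    selection: { block: bi, leaf: li, offset: offset + text.length },
  };
}

// Usage: fix a typo in the second block.
const prevState: DocState = {
  blocks: [
    { type: "heading", children: [{ type: "text", value: "Title" }] },
    { type: "paragraph", children: [{ type: "text", value: "Helo" }] },
  ],
  selection: { block: 1, leaf: 0, offset: 3 },
};
const nextState = insertText(prevState, "l");
// nextState.blocks[1].children[0].value === "Hello"; prevState still says "Helo"
// nextState.blocks[0] === prevState.blocks[0] (the heading is shared)
```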


4. Transformers — converting content in and out

Transformers handle format conversion.

Responsibilities:

  • Import HTML / Markdown into editor state
  • Export editor state to HTML / Markdown
  • Clean and normalize pasted content

They make the editor portable and format-agnostic.
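The export direction can be sketched as a function that serializes a small JSON tree to HTML. Escaping text and allow-listing link schemes inside the transformer is one way to meet the XSS requirement from the Requirements section. The node shapes and the `exportHtml`, `escapeHtml`, and `safeUrl` helpers are hypothetical.

```typescript
type ExportNode =
  | { type: "root"; children: ExportNode[] }
  | { type: "paragraph"; children: ExportNode[] }
  | { type: "heading"; level: 1 | 2 | 3; children: ExportNode[] }
  | { type: "link"; url: string; children: ExportNode[] }
  | { type: "text"; value: string; bold?: boolean; italic?: boolean };

// Escape characters that are special in HTML text and attribute values.
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Allow only a few schemes (blocks javascript: URLs; relative URLs too, here).
function safeUrl(url: string): string {
  const trimmed = url.trim();
  return /^(https?:|mailto:)/i.test(trimmed) ? trimmed : "#";
}

function exportHtml(node: ExportNode): string {
  switch (node.type) {
    case "text": {
      let html = escapeHtml(node.value);
      if (node.bold) html = `<strong>${html}</strong>`;
      if (node.italic) html = `<em>${html}</em>`;
      return html;
    }
    case "root":
      return node.children.map(exportHtml).join("");
    case "paragraph":
      return `<p>${node.children.map(exportHtml).join("")}</p>`;
    case "heading":
      return `<h${node.level}>${node.children.map(exportHtml).join("")}</h${node.level}>`;
    case "link":
      return `<a href="${escapeHtml(safeUrl(node.url))}">${node.children.map(exportHtml).join("")}</a>`;
  }
}

// Usage: markup-like text is escaped and the javascript: link is neutralized.
const doc: ExportNode = {
  type: "root",
  children: [
    { type: "paragraph", children: [
      { type: "text", value: "Hello " },
      { type: "text", value: "<world>", bold: true },
    ] },
    { type: "paragraph", children: [
      { type: "link", url: "javascript:alert(1)", children: [{ type: "text", value: "x" }] },
    ] },
  ],
};
const out = exportHtml(doc);
// '<p>Hello <strong>&lt;world&gt;</strong></p><p><a href="#">x</a></p>'
```

An importer would run the same mapping in reverse, parsing HTML or Markdown into nodes and discarding anything the schema does not know.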


5. Reconciler — syncing state to the DOM

After state updates, the DOM must reflect the new document.

The reconciler compares:

  • Previous editor state
  • New editor state

Then updates only the DOM parts that changed.

This keeps typing fast and avoids full re-renders, much like React's diffing of a virtual DOM.
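Here is a minimal sketch of that comparison for the children of one block. It assumes every node has a stable `key` and that states are immutable, so an unchanged node is literally the same object and can be skipped with a cheap `===` check. Rather than touching the DOM, it returns a list of operations. `VNode`, `DomOp`, and `diffChildren` are hypothetical names, and reordering (moves) is not handled.

```typescript
interface VNode {
  key: string;
  text: string;
}

type DomOp =
  | { op: "create"; key: string; index: number }
  | { op: "update"; key: string }
  | { op: "remove"; key: string };

// Compare previous and next children of one block and emit minimal ops.
function diffChildren(prev: VNode[], next: VNode[]): DomOp[] {
  const ops: DomOp[] = [];
  const prevByKey = new Map(prev.map((n) => [n.key, n] as [string, VNode]));
  const nextKeys = new Set(next.map((n) => n.key));

  // Nodes that disappeared: remove their DOM elements.
  for (const n of prev) {
    if (!nextKeys.has(n.key)) ops.push({ op: "remove", key: n.key });
  }
  next.forEach((n, index) => {
    const old = prevByKey.get(n.key);
    if (!old) ops.push({ op: "create", key: n.key, index });
    else if (old !== n) ops.push({ op: "update", key: n.key });
    // old === n: untouched node, no DOM work at all
  });
  return ops;
}

// Usage: one text node changed and one was appended; "a" is skipped entirely.
const nodeA: VNode = { key: "a", text: "Hello " };
const nodeB: VNode = { key: "b", text: "world" };
const ops = diffChildren([nodeA, nodeB], [nodeA, { ...nodeB, text: "world!" }, { key: "c", text: "?" }]);
// ops: update "b", then create "c" at index 2
```

A real reconciler would then apply these operations to the corresponding DOM elements.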


6. Plugins — extending the editor safely

Plugins allow features to extend the editor without changing the core.

Examples:

  • Toolbar state updates
  • Word count
  • Auto-save
  • Mentions, hashtags
  • Custom blocks and embeds

Plugins subscribe to editor changes and dispatch commands when needed, making the editor extensible by design.
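The subscription model can be sketched in a few lines. The core commits a new state and only then notifies listeners, and a word-count plugin derives its value by reading that state. `PluginHost` and the plain-string state are simplifications; Lexical exposes a similarly named `registerUpdateListener`, but this is not its implementation.

```typescript
type UpdateListener = (state: string, prevState: string) => void;

class PluginHost {
  private state = "";
  private listeners = new Set<UpdateListener>();

  // Plugins subscribe here; the returned function unsubscribes.
  registerUpdateListener(fn: UpdateListener): () => void {
    this.listeners.add(fn);
    return () => {
      this.listeners.delete(fn);
    };
  }

  // The core commits the new state first, THEN notifies plugins.
  commit(next: string): void {
    const prev = this.state;
    this.state = next;
    this.listeners.forEach((fn) => fn(next, prev));
  }
}

// A word-count plugin: a pure read of the state, no DOM access.
function countWords(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Usage
const host = new PluginHost();
let words = 0;
const unsubscribe = host.registerUpdateListener((state) => {
  words = countWords(state);
});
host.commit("Hello rich  text world"); // words === 4
```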

D — Data Model

The model is the foundation of the editor. Everything — rendering, commands, plugins, undo/redo — depends on how we represent the document in memory. Instead of storing HTML, we store a structured document model that describes content, formatting, and selection in a predictable way.

Why we don’t store HTML

Although the editor renders HTML in the browser, HTML is not a reliable format to use as the internal data model. Browser-generated HTML can be inconsistent, deeply nested, and difficult to normalize. It also makes transformations (like exporting to Markdown) and features such as undo/redo much harder to implement.

Instead, we maintain a structured JSON tree and treat HTML purely as a rendering format.


Document as a Tree

The document is represented as a hierarchical tree. At the top sits a root node, which contains a list of blocks, and each block contains inline content and text.

Example:

{
  type: "root",
  children: [
    {
      type: "paragraph",
      children: [
        { type: "text", value: "Hello " },
        { type: "text", value: "world", bold: true }
      ]
    }
  ]
}

This structure allows the editor to enforce consistent rules and makes transformations predictable.

Node Types

The model is built around three primary node categories.

  • Block nodes represent the structural layout of the document. Examples include paragraphs, headings, lists, quotes, and code blocks. These nodes define how content is grouped and arranged vertically.
  • Inline nodes exist inside block nodes and represent inline-level elements such as links, inline code, or mentions. They wrap around text and provide additional semantics.
  • Text nodes sit at the leaves of the tree and contain the actual characters. Formatting such as bold or italic is stored as properties on these nodes.

{
  type: "text",
  value: "Hello",
  bold: true,
  italic: false
}

This separation between block, inline, and text nodes keeps the document schema clean and extensible.
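The three categories map naturally onto a TypeScript discriminated union. The schema below is a hypothetical sketch: field names such as `url` and `level` are assumptions, and lists are omitted because they need an extra list-item level. It comes with a helper that walks the tree to extract plain text, the kind of traversal transformers and plugins perform.

```typescript
// Leaf: the actual characters, with formatting stored as properties.
interface TextNodeData {
  type: "text";
  value: string;
  bold?: boolean;
  italic?: boolean;
}

// Inline: wraps text with extra semantics (a link in this sketch).
interface LinkNodeData {
  type: "link";
  url: string;
  children: TextNodeData[];
}

type InlineNode = TextNodeData | LinkNodeData;

// Block: structural layout. Lists are omitted (they need a list-item level).
interface BlockNodeData {
  type: "paragraph" | "heading" | "quote" | "code";
  level?: 1 | 2 | 3; // only meaningful for headings
  children: InlineNode[];
}

interface RootNodeData {
  type: "root";
  children: BlockNodeData[];
}

// Plain-text extraction: inline text concatenated, blocks separated by "\n".
function getTextContent(root: RootNodeData): string {
  const inlineText = (node: InlineNode): string =>
    node.type === "text" ? node.value : node.children.map((c) => c.value).join("");
  return root.children.map((block) => block.children.map(inlineText).join("")).join("\n");
}

// Usage
const sample: RootNodeData = {
  type: "root",
  children: [
    { type: "heading", level: 1, children: [{ type: "text", value: "Intro" }] },
    { type: "paragraph", children: [
      { type: "text", value: "Read " },
      { type: "link", url: "https://example.com", children: [{ type: "text", value: "the docs", bold: true }] },
    ] },
  ],
};
const plain = getTextContent(sample); // "Intro\nRead the docs"
```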

Selection Model

The editor also stores the user’s cursor and text selection inside the state instead of relying purely on the DOM.

We read the current selection from the browser using:

const selection = window.getSelection();

// Node where selection starts
const anchorNode = selection.anchorNode;
// Character position inside anchor node
const anchorOffset = selection.anchorOffset;
// Node where selection ends
const focusNode = selection.focusNode;
// Character position inside focus node
const focusOffset = selection.focusOffset;

These values are converted into the editor’s selection state:

type EditorSelection = {
  anchorNode: EditorNode | null;
  anchorOffset: number;
  focusNode: EditorNode | null;
  focusOffset: number;
};

Storing selection in state allows formatting ranges, replacing text, and keeping the cursor stable during updates.
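The following sketch shows what formatting a range involves. It bolds a range within one paragraph by splitting text runs at the selection boundaries and then merging neighbors with identical formatting. For simplicity, it assumes the selection has already been converted to character offsets within the paragraph, rather than the node/offset pairs of `EditorSelection`. `TextRun` and `applyBold` are hypothetical.

```typescript
interface TextRun {
  value: string;
  bold: boolean;
}

// Bold the characters in [start, end) across a paragraph's text runs.
// Runs are split at the boundaries; text outside the range keeps its format.
function applyBold(runs: TextRun[], start: number, end: number): TextRun[] {
  const out: TextRun[] = [];
  let pos = 0; // paragraph offset of the current run's first character
  for (const run of runs) {
    const runStart = pos;
    const runEnd = pos + run.value.length;
    pos = runEnd;
    // Overlap between the selection and this run, relative to the run.
    const from = Math.max(start, runStart) - runStart;
    const to = Math.min(end, runEnd) - runStart;
    if (from >= to) {
      out.push(run); // no overlap: keep as-is
      continue;
    }
    const before = run.value.slice(0, from);
    const middle = run.value.slice(from, to);
    const after = run.value.slice(to);
    if (before) out.push({ value: before, bold: run.bold });
    out.push({ value: middle, bold: true });
    if (after) out.push({ value: after, bold: run.bold });
  }
  // Normalize: merge adjacent runs that now have identical formatting.
  const merged: TextRun[] = [];
  for (const run of out) {
    const last = merged[merged.length - 1];
    if (last && last.bold === run.bold) {
      merged[merged.length - 1] = { value: last.value + run.value, bold: run.bold };
    } else {
      merged.push(run);
    }
  }
  return merged;
}

// Usage: "Hello world", bold the range "lo wo" (offsets 3..8)
const runs: TextRun[] = [
  { value: "Hello ", bold: false },
  { value: "world", bold: false },
];
const bolded = applyBold(runs, 3, 8);
// bolded: "Hel" (plain), "lo wo" (bold), "rld" (plain)
```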

History Model (Undo / Redo)

Undo and redo become straightforward when the editor state is immutable. Each command produces a new version of the state, and we keep snapshots in a history stack.

{
  past: [state1, state2],
  present: state3,
  future: [state4]
}

Undo moves backward through the stack, redo moves forward. There is no need to diff the DOM or manually reverse operations.
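The stack above can be sketched as a few pure functions over a generic `HistoryStack`. The names are hypothetical. Storing whole states is affordable because immutable updates share unchanged subtrees between versions.

```typescript
interface HistoryStack<S> {
  past: S[];
  present: S;
  future: S[];
}

function createHistory<S>(initial: S): HistoryStack<S> {
  return { past: [], present: initial, future: [] };
}

// A new edit pushes the current state onto `past` and clears `future`.
function pushState<S>(h: HistoryStack<S>, next: S): HistoryStack<S> {
  return { past: [...h.past, h.present], present: next, future: [] };
}

function undo<S>(h: HistoryStack<S>): HistoryStack<S> {
  if (h.past.length === 0) return h;
  const previous = h.past[h.past.length - 1];
  return { past: h.past.slice(0, -1), present: previous, future: [h.present, ...h.future] };
}

function redo<S>(h: HistoryStack<S>): HistoryStack<S> {
  if (h.future.length === 0) return h;
  const [next, ...rest] = h.future;
  return { past: [...h.past, h.present], present: next, future: rest };
}

// Usage
let h = createHistory("a");
h = pushState(h, "ab");
h = pushState(h, "abc");
h = undo(h); // present "ab", future ["abc"]
h = redo(h); // present "abc"
```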

I — Interface (Editor APIs)

Now that we have the architecture and data model, we need to define how the rest of the application interacts with the editor.
This is the public interface exposed to developers using the editor.

Think of this as the SDK surface of the editor.

Editor Initialization API

This API is used when creating an editor instance and attaching it to the page.

The editor should accept a configuration object that defines how it should behave.

Typical configuration:

  • rootElement → DOM element where the editor will mount
  • namespace → unique identifier to avoid conflicts between multiple editors
  • initialEditorState → initial document content
  • theme → styling and class mappings
  • errorBoundary → error handling during rendering

Example:

createEditor({
  rootElement,
  namespace: "blog-editor",
  initialEditorState,
  theme,
  errorBoundary,
});

This keeps the editor configurable and reusable across different parts of an application.

Update API

All changes to the document must go through controlled update methods.

editor.update(() => {
  // mutate editor state safely
});

editor.setEditorState(newState);

editor.update(cb) allows safe state mutations inside the editor’s controlled environment.
setEditorState allows replacing the entire state when loading saved content.

Read API

Consumers of the editor often need to read the current content.

editor.getEditorState();
editor.getEditorCurrentVersion();

These methods allow features like autosave, preview rendering, and export.

Events / Commands API

The editor exposes a command system so external features and plugins can extend behavior without touching internal logic.

editor.registerCommand(command, handler);

Examples:

  • Keyboard shortcuts
  • Toolbar actions
  • Custom plugins

This keeps the editor extensible and modular.


O — Optimization

Rich text editors are deceptively heavy. Every keystroke can trigger state updates, DOM updates, selection updates, and plugin reactions. Without careful optimization, typing quickly becomes laggy. This section focuses on the key decisions that keep the editor fast and scalable.

1. Normalizing the document tree

Although the document is conceptually a tree, storing it as a deeply nested object is not ideal for performance. Modern editors like Lexical internally keep nodes in a normalized map structure.

Instead of repeatedly traversing a deep tree, nodes are stored by id:

{
  nodeMap: {
    n1: { type: "paragraph", children: ["n2", "n3"] },
    n2: { type: "text", value: "Hello " },
    n3: { type: "text", value: "world", bold: true }
  },
  root: "n1"
}

This gives several advantages:

  • Constant-time node lookup
  • Faster updates without deep cloning
  • Easier undo/redo snapshots
  • Efficient reconciliation

This hybrid approach (tree structure + normalized storage) is what most modern editors use.
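The flattening step might look like the sketch below. It assumes ids come from a simple counter (`n1`, `n2`, ...) assigned in document order. It also stores a `parent` id on each record, an addition of this sketch that makes walking up to an enclosing block a direct lookup. `normalizeTree` and the node shapes are hypothetical.

```typescript
// Nested input shape, as in the Data Model example.
interface NestedNode {
  type: string;
  value?: string;
  bold?: boolean;
  children?: NestedNode[];
}

// Flat record: children are ids, plus a parent id (an addition of this sketch).
interface FlatNode {
  type: string;
  value?: string;
  bold?: boolean;
  parent: string | null;
  children?: string[];
}

interface NormalizedDoc {
  nodeMap: Record<string, FlatNode>;
  root: string;
}

function normalizeTree(tree: NestedNode): NormalizedDoc {
  const nodeMap: Record<string, FlatNode> = {};
  let counter = 0;

  // Pre-order walk: a node gets its id before its children do.
  const visit = (node: NestedNode, parent: string | null): string => {
    const id = `n${++counter}`;
    const { children, ...rest } = node;
    const flat: FlatNode = { ...rest, parent };
    nodeMap[id] = flat;
    if (children) flat.children = children.map((child) => visit(child, id));
    return id;
  };

  const root = visit(tree, null);
  return { nodeMap, root };
}

// Usage: the paragraph from the nodeMap example above.
const normalized = normalizeTree({
  type: "paragraph",
  children: [
    { type: "text", value: "Hello " },
    { type: "text", value: "world", bold: true },
  ],
});
// normalized.root === "n1"; nodeMap["n1"].children is ["n2", "n3"]
// Lookup is a single property access instead of a tree walk:
const world = normalized.nodeMap["n3"]; // text "world", bold, parent "n1"
```

This yields the same ids and child lists as the example above, plus the `parent` fields.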

2. Incremental DOM reconciliation

Updating the entire editor DOM on every keystroke would be extremely slow. Instead, the reconciler compares the previous and next editor state and updates only the nodes that changed.

This is similar to React’s diffing strategy and is critical for maintaining smooth typing in long documents.

3. Batched updates

Typing often triggers multiple operations:

  • Insert character
  • Update selection
  • Run plugins
  • Update toolbar state

These updates should be batched into a single render cycle to avoid unnecessary DOM work.
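One simple way to batch is to let updates nest and to commit (reconcile the DOM and notify plugins) only when the outermost update finishes. The sketch below is synchronous and uses hypothetical names (`BatchingEditor`, `Draft`). Editors may also defer the commit to a microtask so that separate update calls in the same tick merge too, which is not modeled here. The draft is mutated in place for brevity.

```typescript
// Hypothetical editor state, mutated in place as a "draft" for brevity.
interface Draft {
  text: string;
  selection: number;
  toolbarBold: boolean;
}

class BatchingEditor {
  state: Draft = { text: "", selection: 0, toolbarBold: false };
  private depth = 0;
  private dirty = false;
  private onCommit: (state: Draft) => void;

  constructor(onCommit: (state: Draft) => void) {
    this.onCommit = onCommit;
  }

  // Nested update() calls join the outer batch instead of committing.
  update(fn: (draft: Draft) => void): void {
    this.depth++;
    try {
      fn(this.state);
      this.dirty = true;
    } finally {
      this.depth--;
    }
    // Only the outermost update commits: one reconcile, one plugin pass.
    if (this.depth === 0 && this.dirty) {
      this.dirty = false;
      this.onCommit(this.state);
    }
  }
}

// Usage: one keystroke triggers three updates but only one commit.
let renders = 0;
const batched = new BatchingEditor(() => {
  renders++;
});
batched.update((draft) => {
  draft.text += "a";
  batched.update((d) => {
    d.selection = d.text.length;
  });
  batched.update((d) => {
    d.toolbarBold = false;
  });
});
// renders === 1
```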

4. History optimization

Storing every keystroke as a separate undo step can quickly consume memory. Editors typically:

  • Merge rapid typing into a single history step
  • Limit history size
  • Store structural diffs instead of full copies (optional optimization)

This keeps undo/redo responsive and memory usage under control.
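The first two techniques can be sketched together. A keystroke that arrives within an assumed 500 ms of the previous one replaces the top history entry instead of pushing a new one, and the stack is capped at an assumed number of entries. `TypingHistory` and both limits are illustrative choices, not values taken from a specific editor. Each entry holds the state after a change, so undoing returns to the entry below.

```typescript
interface UndoEntry<S> {
  state: S;
  kind: string; // e.g. "typing", "format", "paste"
  time: number; // ms timestamp of the last change folded into this entry
}

class TypingHistory<S> {
  entries: UndoEntry<S>[] = [];
  private mergeWindowMs: number;
  private maxEntries: number;

  constructor(mergeWindowMs = 500, maxEntries = 100) {
    this.mergeWindowMs = mergeWindowMs;
    this.maxEntries = maxEntries;
  }

  record(state: S, kind: string, time: number): void {
    const top = this.entries[this.entries.length - 1];
    const canMerge =
      top !== undefined &&
      kind === "typing" &&
      top.kind === "typing" &&
      time - top.time <= this.mergeWindowMs;
    if (canMerge) {
      // Fold this keystroke into the current undo step.
      this.entries[this.entries.length - 1] = { state, kind, time };
    } else {
      this.entries.push({ state, kind, time });
      // Drop the oldest step once the cap is exceeded.
      if (this.entries.length > this.maxEntries) this.entries.shift();
    }
  }
}

// Usage: three quick keystrokes become one undo step.
const th = new TypingHistory<string>(500, 3);
th.record("", "init", 0);
th.record("h", "typing", 1000);
th.record("he", "typing", 1200);
th.record("hey", "typing", 1400);
// th.entries holds two steps: "" and "hey"
```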

Internationalization

Rich text editing becomes complex when supporting global languages.

Key requirements:

RTL support
Languages like Arabic and Hebrew require right-to-left text rendering and cursor movement.

IME support
Chinese, Japanese, and Korean typing uses composition events. The editor must avoid interfering during text composition and update state only after composition ends.

This is a major reason we rely on ContentEditable.
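The "do not interfere during composition" rule can be sketched as a small guard. Input that arrives while a composition is active is left to the browser, and the final string is committed to the editor state once, on `compositionend`. The method names mirror the DOM events (`compositionstart`, `input`, `compositionend`) and `InputEvent.isComposing`, but `ImeGuard` is hypothetical and is not wired to a real element here.

```typescript
class ImeGuard {
  private composing = false;
  private commitText: (text: string) => void;

  constructor(commitText: (text: string) => void) {
    this.commitText = commitText;
  }

  onCompositionStart(): void {
    this.composing = true; // the browser owns the DOM until composition ends
  }

  // Called for every input event; `isComposing` mirrors InputEvent.isComposing.
  onInput(data: string | null, isComposing: boolean): void {
    if (this.composing || isComposing) return; // intermediate text: ignore
    if (data) this.commitText(data);
  }

  onCompositionEnd(data: string): void {
    this.composing = false;
    if (data) this.commitText(data); // commit the final composed string once
  }
}

// Usage: Latin input commits immediately; CJK input commits only at the end.
const committed: string[] = [];
const ime = new ImeGuard((t) => committed.push(t));
ime.onInput("a", false);
ime.onCompositionStart();
ime.onInput("ni", true);
ime.onInput("你", true);
ime.onCompositionEnd("你好");
// committed: ["a", "你好"]
```

Browsers differ in the relative ordering of `input` and `compositionend` events, so production editors layer browser-specific handling on top of a rule like this.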

Accessibility

Editors must be usable with assistive technologies.

Key areas:

  • Full keyboard navigation
  • Screen reader compatibility
  • Proper focus management
  • Speech-to-text integrations

Accessibility is not optional for production editors and must be considered part of the core design.

Clipboard and paste performance

Pasting large documents can introduce heavy HTML. Editors typically:

  • Sanitize pasted content
  • Convert HTML → editor state using transformers
  • Strip unsupported styles
  • Prevent layout thrashing during paste
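An allow-list is the usual shape of that sanitization step. The sketch below works on an already-parsed tree rather than a live DOM (in the browser, `DOMParser` could produce it). It keeps a few known tags, drops scripts and styles together with their content, unwraps unknown tags so their text survives, and keeps only an `http(s)` or `mailto` `href` on links. `PastedNode`, the tag lists, and `sanitizePasted` are assumptions for illustration; for real input, a maintained sanitizer library is the safer choice.

```typescript
type PastedNode =
  | { kind: "text"; text: string }
  | { kind: "element"; tag: string; attrs: Record<string, string>; children: PastedNode[] };

const ALLOWED_TAGS = new Set([
  "p", "strong", "b", "em", "i", "u", "a", "ul", "ol", "li",
  "h1", "h2", "h3", "blockquote", "code", "pre",
]);
const DROP_WITH_CONTENT = new Set(["script", "style", "iframe", "object"]);

function sanitizePasted(nodes: PastedNode[]): PastedNode[] {
  const out: PastedNode[] = [];
  for (const node of nodes) {
    if (node.kind === "text") {
      out.push(node);
      continue;
    }
    const tag = node.tag.toLowerCase();
    if (DROP_WITH_CONTENT.has(tag)) continue; // remove tag and content
    const children = sanitizePasted(node.children);
    if (!ALLOWED_TAGS.has(tag)) {
      out.push(...children); // unwrap: keep the text, lose the tag (e.g. <span style>)
      continue;
    }
    // Drop every attribute (style, class, on*) except a safe href on links.
    const attrs: Record<string, string> = {};
    const href = node.attrs["href"];
    if (tag === "a" && href !== undefined && /^(https?:|mailto:)/i.test(href.trim())) {
      attrs["href"] = href.trim();
    }
    out.push({ kind: "element", tag, attrs, children });
  }
  return out;
}

// Usage
const pasted: PastedNode[] = [
  { kind: "element", tag: "SPAN", attrs: { style: "color:red" }, children: [{ kind: "text", text: "Hi " }] },
  { kind: "element", tag: "script", attrs: {}, children: [{ kind: "text", text: "alert(1)" }] },
  { kind: "element", tag: "a", attrs: { href: "javascript:alert(1)", onclick: "x()" }, children: [{ kind: "text", text: "link" }] },
];
const clean = sanitizePasted(pasted);
// clean: text "Hi ", then <a> with no attributes around "link"
```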

Plugin isolation

Plugins can run code on every update. Without safeguards, they can slow down typing. Modern editors ensure:

  • Plugins run after state updates
  • Expensive plugins can be throttled or debounced
  • Plugins cannot mutate DOM directly
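Two of those safeguards can be sketched together. Plugins run only after the state is committed and receive a frozen, read-only snapshot, and each plugin is wrapped in its own `try`/`catch` so that one failing plugin cannot break typing or the others. `EditorPlugin`, `runPlugins`, and `deepFreeze` are hypothetical. Throttling and debouncing are left out, and keeping plugins away from the DOM is modeled only as an API convention: they receive state, never elements.

```typescript
interface Snapshot {
  text: string;
  wordCount: number;
}

type EditorPlugin = (state: Readonly<Snapshot>) => void;

// Freeze recursively so plugins can read editor state but not mutate it.
// Freezing in place is acceptable because committed states are immutable anyway.
function deepFreeze<T>(obj: T): T {
  if (obj !== null && typeof obj === "object" && !Object.isFrozen(obj)) {
    Object.freeze(obj);
    for (const value of Object.values(obj as object)) deepFreeze(value);
  }
  return obj;
}

// Runs after the state update has committed; errors are contained per plugin.
function runPlugins(state: Snapshot, plugins: EditorPlugin[], onError: (e: unknown) => void): void {
  const frozen = deepFreeze(state);
  for (const plugin of plugins) {
    try {
      plugin(frozen);
    } catch (e) {
      onError(e);
    }
  }
}

// Usage: a buggy plugin throws, but the word-count plugin still runs.
const seen: number[] = [];
const errors: unknown[] = [];
const snap: Snapshot = { text: "hello world", wordCount: 2 };
runPlugins(
  snap,
  [
    () => {
      throw new Error("buggy plugin");
    },
    (s) => {
      seen.push(s.wordCount);
    },
  ],
  (e) => errors.push(e),
);
```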