Development May 23, 2026

Writing Custom Tiptap Extensions: From Concept to Production

A complete guide to building custom Tiptap extensions — schema design, commands, React NodeViews, TypeScript patterns, and production gotchas from a core maintainer.

#tiptap #prosemirror #extensions #tutorial #typescript

I work with ProseMirror every day. As a core maintainer of Tiptap, I have spent years building on top of it, shipping extensions to production, and reviewing PRs from the community. And I have seen the same questions come up over and over.

This is not an API reference. The Tiptap docs already cover that. This is more about how I think through custom extensions when they need to survive real content, product requirements, and future changes.

Most tutorials explain the API surface. That is useful, but it often skips the harder part: deciding what kind of extension you should build in the first place. They show addAttributes(), but they rarely explain when something should be a Node instead of a Mark, or how to design a schema that does not need a migration six months later.

This post is about the parts that usually show up later, once the extension is used in a real editor with real content: schema decisions, command APIs, React NodeViews, TypeScript ergonomics, tests, and migrations.

If you haven’t read my ProseMirror for Beginners post, start there. This builds on that foundation.

The example throughout this post is a media node. Image resize, caption, alignment, the works. It is the number one community-requested extension type, and it happens to cover the most API surface. It is a compact example, but it touches schema design, commands, parsing, rendering, NodeViews, and production concerns.

Extension vs Node vs Mark — What’s the Difference?

Before you write any code, you need to pick the right type. Tiptap gives you three primitives, and I see people pick the wrong one all the time.

Extension is for behavior only. No content, no schema. Toolbars, keyboard shortcut registries, plugin containers. If it does not hold or decorate content, it is an Extension.

Node is for structural content. Paragraphs, headings, images, code blocks, blockquotes. If it holds content and appears as a block or inline element in the document tree, it is a Node.

Mark is for inline formatting. Bold, italic, links, highlights, subscript. If it wraps a piece of text without changing the document structure, it is a Mark.

My usual rule: if it becomes part of the document structure, it is probably a node. If it only adds meaning or formatting to text, it is probably a mark. If it is behavior around the editor, it is probably an extension.

The skeleton for each looks like this:

import { Extension } from '@tiptap/core'

const MyExtension = Extension.create({
  name: 'myExtension',
})

import { Node } from '@tiptap/core'

const MyNode = Node.create({
  name: 'myNode',
})

import { Mark } from '@tiptap/core'

const MyMark = Mark.create({
  name: 'myMark',
})

I see people define toolbars as Nodes, or try to make an Extension hold content. The API is flexible. That does not mean every approach is equally maintainable. The distinction exists for a reason, and respecting it makes your code easier to reason about and maintain.

Anatomy of an Extension

Every Tiptap extension starts with .create({}). It is a builder pattern. You pass in a configuration object, and Tiptap wires everything together.

The name field is the only required property. Tiptap uses it for JSON serialization, schema registration, and internal lookups. If you serialize your document to JSON, the type key in each node matches the extension name.

import { Extension } from '@tiptap/core'

const ChangeCounter = Extension.create({
  name: 'changeCounter',

  addOptions() {
    return { logToConsole: false }
  },

  addStorage() {
    return { changes: 0 }
  },

  onUpdate() {
    this.storage.changes += 1
    if (this.options.logToConsole) {
      console.log(`Changes: ${this.storage.changes}`)
    }
  },
})

There are a few key pieces here. addOptions() returns default configuration. Users override these when they call .configure(). addStorage() returns mutable state that is namespaced to your extension. Unlike options, storage can change at any time without reconfiguring the extension. It is for runtime state.

The lifecycle hooks like onUpdate, onSelectionUpdate, onCreate, and onDestroy let you react to editor events. Inside each hook, this gives you this.editor, this.options, and this.storage without any extra wiring.

A quick note on priority. Every extension has a priority value that determines loading order. The default is 100. Higher numbers load earlier. This matters when two extensions register the same keybinding or the same ProseMirror plugin. If you need yours to run first, set a higher priority.

Designing Your Schema

The schema is the blueprint for your content. It defines what is valid, what attributes exist, and how everything serializes to HTML. Getting this right early saves you from painful migrations later.

Let me walk through the media node schema. This is a figure element wrapping an image and a caption.

import { Node, mergeAttributes } from '@tiptap/core'

export const Media = Node.create({
  name: 'media',

  group: 'block',
  content: 'inline*',
  draggable: true,

  addAttributes() {
    return {
      src: { default: null },
      alt: { default: '' },
      width: { default: '100%' },
      alignment: {
        default: 'center',
        parseHTML: (el) => el.getAttribute('data-alignment'),
        renderHTML: (attrs) => ({ 'data-alignment': attrs.alignment }),
      },
    }
  },

  parseHTML() {
    return [
      {
        tag: 'figure[data-media]',
        contentElement: 'figcaption',
        getAttrs: (element) => {
          if (!(element instanceof HTMLElement)) {
            return false
          }

          const img = element.querySelector('img')
          return {
            src: img?.getAttribute('src') ?? null,
            alt: img?.getAttribute('alt') ?? '',
            width: img?.style.width || element.getAttribute('data-width') || '100%',
            alignment: element.getAttribute('data-alignment') || 'center',
          }
        },
      },
    ]
  },

  renderHTML({ HTMLAttributes }) {
    // alignment's own renderHTML already emitted data-alignment, so it stays in rest.
    const { src, alt, width, ...rest } = HTMLAttributes

    return [
      'figure',
      mergeAttributes(rest, {
        'data-media': '',
        'data-width': width,
      }),
      ['img', { src, alt, style: `width: ${width}` }],
      ['figcaption', 0],
    ]
  },
})

Let me break this down. The group: 'block' tells ProseMirror this node belongs in block positions. The content: 'inline*' means it can hold zero or more inline children. That is the caption text. The 0 in the renderHTML array is the content hole. It tells ProseMirror where to render child content.

The parseHTML uses figure[data-media] as the tag selector so regular <figure> elements in pasted content are not accidentally captured. It also uses contentElement: 'figcaption' to tell ProseMirror which child element holds the inline content. The getAttrs callback reads src and alt from the inner <img> element instead of <figure>, which is where they actually live in the HTML. The width reads from the image’s style.width or falls back to a data-width attribute on the figure.

The renderHTML destructures src, alt, and width out of HTMLAttributes so they do not leak onto the <figure> element. The alignment attribute is deliberately not in that list. Its attribute-level renderHTML has already turned it into data-alignment, so HTMLAttributes has no alignment key, and destructuring one would overwrite data-alignment with undefined. The figure gets data-media, data-alignment, and data-width. The <img> gets src, alt, and the inline style for width. This produces clean, semantic HTML.

The addAttributes() block defines four attributes. Each one gets a default. The alignment attribute has custom parseHTML and renderHTML functions. I store it as a data-alignment attribute on the HTML element instead of something like a class name. This makes it explicit and easy to read back.

I learned this the hard way: schema design is the most important decision you make. Changing it later means writing content migrations. You cannot just rename a node type or change an attribute name without breaking existing documents. Think hard about what attributes you need, what defaults make sense, and how they serialize to HTML.
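One way to keep stored values predictable is to normalize them at the parsing boundary. The helper below is a minimal sketch and not part of Tiptap: normalizeAlignment and the ALIGNMENTS list are names I made up for illustration, and the fallback assumes the same 'center' default used in addAttributes().

```typescript
// Hypothetical helper, not a Tiptap API: keeps `alignment` inside a closed set.
const ALIGNMENTS = ['left', 'center', 'right'] as const
type Alignment = (typeof ALIGNMENTS)[number]

function normalizeAlignment(value: string | null | undefined): Alignment {
  const candidate = (value ?? '').trim().toLowerCase()
  // Anything outside the known set falls back to the schema default.
  return (ALIGNMENTS as readonly string[]).indexOf(candidate) !== -1
    ? (candidate as Alignment)
    : 'center'
}

// Intended use inside addAttributes():
//   parseHTML: (el) => normalizeAlignment(el.getAttribute('data-alignment')),
```

With something like this in place, pasted HTML carrying data-alignment="justify" ends up as 'center' instead of an unknown value that your NodeView has to handle later.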

Adding Commands with addCommands

Commands are how users interact with your extension programmatically. They are the API surface your consumers call.

A command is a function that takes your arguments and returns a second function. That inner function receives command props like commands, editor, state, dispatch, chain, and can, and it returns a boolean: true if the command ran successfully, false if it could not run.

addCommands() {
  return {
    setMedia:
      (options) =>
      ({ commands }) => {
        return commands.insertContent({
          type: this.name,
          attrs: options,
        })
      },
    updateMedia:
      (attrs) =>
      ({ commands }) => {
        return commands.updateAttributes(this.name, attrs)
      },
  }
}

The setMedia command inserts a new media node at the cursor position. It uses commands.insertContent which handles all the transaction logic for you. Because the schema uses content: 'inline*', no child content is required. ProseMirror allows zero or more inline children.

The updateMedia command updates the attributes of the media node at the current position. This is what the alignment toolbar buttons and resize handlers call.

One thing worth adding early is TypeScript augmentation.

declare module '@tiptap/core' {
  interface Commands<ReturnType> {
    media: {
      setMedia: (options: { src: string; alt?: string; width?: string; alignment?: string }) => ReturnType
      updateMedia: (attrs: Partial<{ src: string; alt: string; width: string; alignment: string }>) => ReturnType
    }
  }
}

Without this augmentation, editor.chain().setMedia(...) works at runtime, but TypeScript does not know the command exists. The call is reported as a missing property on the chain, and your users get no autocompletion on the options. In a production codebase, that is how bugs slip in.

Put this augmentation in a separate .d.ts file or at the top of your extension file. Either works.

Keyboard Shortcuts and Input Rules

Commands are great for programmatic access. But users also expect keyboard shortcuts and Markdown-style autoformatting.

addKeyboardShortcuts() {
  return {
    'Mod-Alt-m': () => this.editor.commands.setMedia({ src: '' }),
  }
}

Keyboard shortcuts return a boolean. Return true if the shortcut was handled, false if you want Tiptap to fall through to the next handler. This matters when multiple extensions register the same shortcut. The first one that returns true wins.
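To make the fall-through behavior concrete, here is a simplified model of it. This is not Tiptap's implementation; resolveShortcut, RegisteredShortcut, and the extension names are all hypothetical. It only illustrates the two rules described above: higher priority runs first, and the first handler that returns true stops the chain.

```typescript
// Simplified model of shortcut resolution; not Tiptap's actual code.
interface RegisteredShortcut {
  extension: string
  priority: number // Tiptap's default priority is 100
  handler: () => boolean
}

function resolveShortcut(registered: RegisteredShortcut[]): string | null {
  // Higher priority is tried first.
  const ordered = registered.slice().sort((a, b) => b.priority - a.priority)
  for (const entry of ordered) {
    // Returning true means "handled": nothing after this entry runs.
    if (entry.handler()) return entry.extension
  }
  return null
}

const winner = resolveShortcut([
  { extension: 'media', priority: 100, handler: () => true },
  { extension: 'tableOfContents', priority: 200, handler: () => false },
  { extension: 'customBlock', priority: 150, handler: () => true },
])
console.log(winner) // "customBlock"
```

The highest-priority handler declined, so the next one in line handled the key and the default-priority media shortcut never ran.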

Input rules fire as the user types. They are how Markdown shortcuts work. Type ![]() and the text transforms into a media node.

import { nodeInputRule } from '@tiptap/core'

addInputRules() {
  return [
    nodeInputRule({
      find: /!\[(.+?)\]\((.+?)\)$/,
      type: this.type,
      getAttributes: (match) => ({
        alt: match[1],
        src: match[2],
      }),
    }),
  ]
}

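The regex matches ![alt](url) directly in front of the cursor. The $ anchor is important. Input rules are tested against the text before the cursor, so without the anchor the rule could fire on an older ![]() earlier in the paragraph while the user is typing something else. The getAttributes callback receives the regex match and returns the attribute object for the new node.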
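You can check the pattern in isolation before wiring it into the editor. This sketch runs the same regex against a few strings; matchMediaShortcut is a hypothetical helper name, and the sample URLs are placeholders.

```typescript
// Same pattern as the input rule above.
const MEDIA_INPUT_REGEX = /!\[(.+?)\]\((.+?)\)$/

function matchMediaShortcut(text: string): { alt: string; src: string } | null {
  const match = MEDIA_INPUT_REGEX.exec(text)
  return match ? { alt: match[1], src: match[2] } : null
}

matchMediaShortcut('![A cat](https://example.com/cat.jpg)')
// { alt: 'A cat', src: 'https://example.com/cat.jpg' }

matchMediaShortcut('![A cat](https://example.com/cat.jpg) and more text')
// null: the pattern must end where the text ends

matchMediaShortcut('![](https://example.com/cat.jpg)')
// null: (.+?) needs at least one character of alt text
```

The last case is worth a decision. If you want empty alt text to work, change the first group to (.*?).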

There is also addPasteRules for handling content on paste. They work the same way but use the g flag instead of $ anchors. I use paste rules for URL detection. If someone pastes an image URL, I want to wrap it in a media node automatically.
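Here is a minimal sketch of the matching half of that URL detection. The regex and the findImageUrls helper are my own illustration, and the list of file extensions is an assumption you should adjust. The point is the g flag: a paste can contain several URLs, so the pattern has to find all of them instead of anchoring to one position.

```typescript
// Hypothetical pattern: http(s) URLs ending in a common image extension,
// optionally followed by a query string.
const IMAGE_URL_REGEX =
  /https?:\/\/[^\s<>"']+\.(?:png|jpe?g|gif|webp|avif)(?:\?[^\s<>"']*)?/gi

function findImageUrls(pastedText: string): string[] {
  // With a global regex, String.prototype.match returns every match (or null).
  return pastedText.match(IMAGE_URL_REGEX) ?? []
}

findImageUrls(
  'See https://example.com/a.png and https://example.com/b.JPG?size=2 plus https://example.com/page',
)
// ['https://example.com/a.png', 'https://example.com/b.JPG?size=2']
```

Inside addPasteRules you would hand a global regex like this to a paste rule helper such as nodePasteRule from @tiptap/core and map each match to the src attribute.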

Building Interactive NodeViews in React

Default DOM rendering works for simple nodes. A media node with resize handles, alignment controls, and drag behavior needs more. That is where NodeViews come in.

import { NodeViewWrapper, NodeViewContent } from '@tiptap/react'

export function MediaNodeView({ node, updateAttributes, selected }) {
  return (
    <NodeViewWrapper
      className={`media-node ${selected ? 'media-node--selected' : ''}`}
      data-alignment={node.attrs.alignment}
    >
      <div className="media-node__toolbar">
        <button onClick={() => updateAttributes({ alignment: 'left' })}>
          Left
        </button>
        <button onClick={() => updateAttributes({ alignment: 'center' })}>
          Center
        </button>
        <button onClick={() => updateAttributes({ alignment: 'right' })}>
          Right
        </button>
      </div>
      <img
        data-drag-handle
        src={node.attrs.src}
        alt={node.attrs.alt}
        style={{ width: node.attrs.width }}
      />
      <NodeViewContent className="media-node__caption" />
    </NodeViewWrapper>
  )
}

The NodeViewWrapper is the root element. It replaces the default DOM rendering for this node. The NodeViewContent is where ProseMirror renders the child content. In this case, the caption text. Without NodeViewContent, the caption would not render.

The selected prop is a boolean that reflects whether the node is currently selected. I use it to toggle the media-node--selected class, and the stylesheet shows or hides the toolbar based on that class. That way the alignment buttons only appear when the user is interacting with the media node.

import { ReactNodeViewRenderer } from '@tiptap/react'

addNodeView() {
  return ReactNodeViewRenderer(MediaNodeView)
}

That hands rendering of the node to React. ProseMirror still owns the document model, selection, and transactions. The NodeView is just the UI layer.

A React NodeView does not re-render on every editor update. It updates when its rendered node changes, when relevant props like selected change, or when Tiptap decides the NodeView needs to update. By default, selected only tracks NodeSelection; if you want it to become true for a text selection inside the node, use selectedOnTextSelection.

addNodeView() {
  return ReactNodeViewRenderer(MediaNodeView, {
    selectedOnTextSelection: true,
  })
}

Tiptap also has a trackNodeViewPosition option. Only enable this if your NodeView needs to know its exact position in the document. It causes React NodeViews to re-render when their position in the document shifts. With many NodeViews or frequent transactions, this can become expensive.

If you need a simpler approach for inline formatting, check out ReactMarkViewRenderer. Same idea but for marks. It is useful for things like custom link previews or inline badges.

ProseMirror Plugins Inside Tiptap

The extension API covers most cases. But sometimes you need to drop down to raw ProseMirror. That is what addProseMirrorPlugins() is for.

Plugins give you access to decorations, event handling, and state that the extension API does not expose directly.

Here is a plugin that adds a CSS class to media nodes without a source set. This lets you style empty placeholders differently.

import { Plugin, PluginKey } from '@tiptap/pm/state'
import { Decoration, DecorationSet } from '@tiptap/pm/view'

addProseMirrorPlugins() {
  return [
    new Plugin({
      key: new PluginKey('mediaPlaceholder'),
      props: {
        decorations: (state) => {
          const decos: Decoration[] = []
          state.doc.descendants((node, pos) => {
            if (node.type.name === 'media' && !node.attrs.src) {
              decos.push(
                Decoration.node(pos, pos + node.nodeSize, {
                  class: 'media-node--empty',
                })
              )
            }
          })
          return DecorationSet.create(state.doc, decos)
        },
      },
    }),
  ]
}

Import from @tiptap/pm instead of the raw prosemirror-state package. Tiptap re-exports the ProseMirror packages under @tiptap/pm. Using these ensures version compatibility.

ProseMirror plugins are the escape hatch. If the extension API is a high-level framework, plugins are raw ProseMirror. Use them when you need fine-grained control over decorations, event handling, or custom state that spans multiple nodes.

TypeScript Patterns for Extensions

Most extension examples online skip types. In a production codebase, skipping types is how bugs slip in.

Here is the full generic pattern for a typed extension.

import { Node } from '@tiptap/core'

interface MediaOptions {
  HTMLAttributes: Record<string, any>
}

interface MediaStorage {
  lastRender: number
}

export const Media = Node.create<MediaOptions, MediaStorage>({
  name: 'media',

  addOptions() {
    return {
      HTMLAttributes: {},
    }
  },

  addStorage() {
    return {
      lastRender: 0,
    }
  },

  onUpdate() {
    this.storage.lastRender = Date.now()
  },
})

The Node.create<Options, Storage> generics give you full type safety on this.options and this.storage inside the extension. Without them, you get any and lose autocompletion.

The command augmentation I showed earlier also needs types. Define an interface for your attributes and reuse it across addAttributes, the command augmentation, and your NodeView props.

interface MediaAttributes {
  src: string
  alt: string
  width: string
  alignment: 'left' | 'center' | 'right'
}

One type, three uses. That is the pattern.
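As a rough sketch of what that reuse can look like, the command input type and the defaults below are both derived from MediaAttributes. SetMediaOptions, MEDIA_DEFAULTS, and withMediaDefaults are hypothetical names, and the default values are assumed to mirror the ones in addAttributes().

```typescript
interface MediaAttributes {
  src: string
  alt: string
  width: string
  alignment: 'left' | 'center' | 'right'
}

// Command input: src is required, everything else is optional.
type SetMediaOptions = Pick<MediaAttributes, 'src'> &
  Partial<Omit<MediaAttributes, 'src'>>

// Assumed to mirror the defaults declared in addAttributes().
const MEDIA_DEFAULTS: Omit<MediaAttributes, 'src'> = {
  alt: '',
  width: '100%',
  alignment: 'center',
}

function withMediaDefaults(options: SetMediaOptions): MediaAttributes {
  return { ...MEDIA_DEFAULTS, ...options }
}

withMediaDefaults({ src: 'https://example.com/img.jpg', width: '50%' })
// { alt: '', width: '50%', alignment: 'center', src: 'https://example.com/img.jpg' }
```

If you later add an attribute to MediaAttributes, the compiler flags MEDIA_DEFAULTS until you give it a default, which is exactly the reminder you want.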

If you are extending an existing extension, this.parent() is available for calling the parent method. It is typed correctly as long as your generics match the parent’s. This is useful when you want to wrap an existing extension and add behavior on top of its existing hooks.

Testing Your Extensions

You can test parts of an extension without mounting a full editor. This is something most people don’t realize. Rendering can be tested with generateHTML(). Parsing can be tested with generateJSON(). Commands usually need a minimal editor instance, but that is still lightweight.

import { describe, it, expect } from 'vitest'
import { generateHTML } from '@tiptap/html'
import Document from '@tiptap/extension-document'
import Text from '@tiptap/extension-text'
import { Media } from './media'

describe('Media extension', () => {
  it('renders HTML correctly', () => {
    const html = generateHTML(
      {
        type: 'doc',
        content: [
          {
            type: 'media',
            attrs: {
              src: 'https://example.com/img.jpg',
              alt: 'test image',
              width: '50%',
              alignment: 'center',
            },
          },
        ],
      },
      [Document, Text, Media],
    )

    expect(html).toContain('src="https://example.com/img.jpg"')
    expect(html).toContain('data-alignment="center"')
  })
})

This test uses generateHTML() to produce HTML from a JSON document. It exercises the extension’s HTML rendering path, especially renderHTML, without requiring an editor instance or a browser-like DOM. The JSON structure passes media directly as a document child, so no Paragraph extension is needed here.

Testing commands requires a minimal editor instance.

import { describe, it, expect } from 'vitest'
import { Editor } from '@tiptap/core'
import StarterKit from '@tiptap/starter-kit'
import { Media } from './media'

describe('Media commands', () => {
  it('sets media attributes', () => {
    const editor = new Editor({
      extensions: [StarterKit, Media],
      content: '<p>hello</p>',
    })

    editor.chain().setMedia({ src: 'https://example.com/img.jpg' }).run()

    expect(editor.getHTML()).toContain('https://example.com/img.jpg')

    editor.destroy()
  })
})

Mounting a lightweight editor in tests is fast. The Editor constructor from @tiptap/core works in Node environments with a minimal DOM shim such as jsdom or happy-dom. You can test the full command pipeline, input rules, and paste handling without a browser.

Testing could be its own post, but even a small test setup catches a lot. The Tiptap core repo runs hundreds of these tests on every commit.

Production Checklist

The extension API is only half the work. The other half is keeping saved content stable over time.

ProseMirror versions. Import from @tiptap/pm instead of individual ProseMirror packages. The main reason is version compatibility: Tiptap re-exports specific ProseMirror versions that are tested to work together. This keeps your dependency tree consistent and avoids subtle bugs from version mismatches.

SSR safety. If you render your editor content on the server, use generateHTML() from @tiptap/html. It produces HTML from a JSON document without touching the DOM. Avoid window references in parseHTML rules. They will crash on the server.

import { generateHTML } from '@tiptap/html'
import { Media } from './extensions/media'

const html = generateHTML(jsonDoc, [Media, ...otherExtensions])

Security. The schema itself is a validation layer. ProseMirror will reject content that does not match your schema. But the schema validates document structure, not arbitrary attribute values. Treat renderHTML as an output boundary: sanitize user-provided URLs and attribute values before they reach the DOM, and make sure the output never carries unsanitized input in href, src, style, or event handler attributes. In the media node example, width should be constrained to safe values (like a percentage or pixel range) before it is rendered into a style attribute. The mergeAttributes helper is powerful, but it does not sanitize anything, so feeding it raw attributes from user data puts the rendered HTML at risk. Pasted content can carry unexpected attributes or nesting. Use transformPastedHTML to sanitize it before it enters the document. Never trust clipboard data.
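For the media node, the two values worth guarding are width and src. The helpers below are a minimal sketch under my own assumptions: sanitizeWidth and sanitizeMediaUrl are hypothetical names, the 1 to 100 percent and 1 to 2000 pixel ranges are arbitrary limits, and only http and https URLs are allowed. Adjust all of that to your product.

```typescript
// Hypothetical helpers; allowed ranges and protocols are assumptions.
function sanitizeWidth(value: unknown, fallback = '100%'): string {
  if (typeof value !== 'string') return fallback
  // Only a bare number followed by % or px, so nothing else can ride along
  // into the style attribute.
  const match = /^(\d{1,4})(%|px)$/.exec(value.trim())
  if (!match) return fallback
  const amount = Number(match[1])
  const unit = match[2]
  const max = unit === '%' ? 100 : 2000
  return amount >= 1 && amount <= max ? `${amount}${unit}` : fallback
}

function sanitizeMediaUrl(value: unknown): string | null {
  if (typeof value !== 'string') return null
  try {
    const url = new URL(value)
    // Rejects javascript:, data:, and anything else that is not plain http(s).
    return url.protocol === 'https:' || url.protocol === 'http:' ? url.href : null
  } catch {
    return null // relative or malformed URLs are rejected in this sketch
  }
}
```

I would call these in two places: in getAttrs when parsing, and again in renderHTML right before the values are written out.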

Content migrations. Once documents are stored in production, your schema is part of your data model. Renaming a node from oldImage to media is not just a refactor anymore. It is a content migration. Walk the JSON document tree and replace old type names with new ones. Store a version number in your document metadata so you know which migration to run. I have seen production outages from people renaming nodes without migrations.

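// Recursively rename legacy node types anywhere in a Tiptap JSON document.
function migrateDocument(node) {
  const migrated =
    node.type === 'oldImage' ? { ...node, type: 'media' } : { ...node }

  // Leaf nodes such as text have no content array.
  if (Array.isArray(node.content)) {
    migrated.content = node.content.map(migrateDocument)
  }

  return migrated
}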
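The version number is what turns one-off scripts into a repeatable process. Here is a rough sketch of that idea. The StoredDocument wrapper, the schemaVersion field, and the second migration (backfilling alignment) are all hypothetical; how you store document metadata depends on your backend.

```typescript
interface JSONNode {
  type: string
  attrs?: Record<string, unknown>
  content?: JSONNode[]
  text?: string
}

// Hypothetical storage shape: the document plus the schema version it was saved with.
interface StoredDocument {
  schemaVersion: number
  doc: JSONNode
}

type Migration = (node: JSONNode) => JSONNode

// Apply a per-node transform to the whole tree, depth first.
function mapTree(node: JSONNode, transform: Migration): JSONNode {
  const next = transform(node)
  return next.content
    ? { ...next, content: next.content.map((child) => mapTree(child, transform)) }
    : next
}

// The migration at index i upgrades a document from version i to version i + 1.
const MIGRATIONS: Migration[] = [
  (node) => (node.type === 'oldImage' ? { ...node, type: 'media' } : node),
  (node) =>
    node.type === 'media' && node.attrs && node.attrs.alignment === undefined
      ? { ...node, attrs: { ...node.attrs, alignment: 'center' } }
      : node,
]

function migrate(stored: StoredDocument): StoredDocument {
  let doc = stored.doc
  // Run only the migrations this document has not seen yet.
  for (let version = stored.schemaVersion; version < MIGRATIONS.length; version++) {
    doc = mapTree(doc, MIGRATIONS[version])
  }
  return { schemaVersion: Math.max(stored.schemaVersion, MIGRATIONS.length), doc }
}
```

Run migrate when a document is loaded and save the result with the new version, so every document converges on the current schema no matter how old it is.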

Monitoring. Wrap onTransaction with telemetry. If an extension throws during a transaction, catch it and log it. A crashing extension can make the entire editor unusable. Knowing about it before your users do makes all the difference.
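A small wrapper is usually enough to get started. This is a sketch, not a Tiptap feature: withTelemetry and Reporter are hypothetical names, and the report callback stands in for whatever logging or error tracking service you use.

```typescript
type Reporter = (error: unknown, context: { extension: string; hook: string }) => void

function withTelemetry<Args extends unknown[]>(
  extension: string,
  hook: string,
  handler: (...args: Args) => void,
  report: Reporter,
): (...args: Args) => void {
  return (...args: Args) => {
    try {
      handler(...args)
    } catch (error) {
      // Report and swallow, so one failing hook does not take the editor down.
      report(error, { extension, hook })
    }
  }
}

// Example with a plain function standing in for an extension hook.
const reported: string[] = []
const safeHook = withTelemetry(
  'media',
  'onTransaction',
  (changedNodes: number) => {
    if (changedNodes > 1) throw new Error('boom')
  },
  (_error, context) => reported.push(`${context.extension}.${context.hook}`),
)

safeHook(1) // runs normally
safeHook(2) // throws inside, gets reported instead of crashing
console.log(reported) // ['media.onTransaction']
```

Because the wrapper returns an arrow function, the wrapped handler should read what it needs from the props it receives instead of relying on this.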

What To Learn Next

If this post helped you, here is where to go next.

The Tiptap custom extensions docs cover the full API surface I did not touch here. The ProseMirror for Beginners post is the foundation this builds on.

I maintain Tiptap. If you found this useful, follow me on Bluesky or GitHub. I write about ProseMirror, Tiptap, and building tools for developers.

The best way to learn this stuff is to build something. Pick an extension you use in your editor, rebuild it from scratch, and ship it. You will run into every issue I covered here. That is the point.