n8n custom node for converting DOCX files to Markdown using Pandoc

    n8n-nodes-pandoc

    This is an n8n community node that converts DOCX files to Markdown using Pandoc.

    n8n is a fair-code licensed workflow automation platform.

    Prerequisites

    Pandoc must be installed on the machine that runs n8n before you can use this node:

    Windows

    # Using Chocolatey
    choco install pandoc
    
    # Using Scoop
    scoop install pandoc
    
    # Or download from: https://github.com/jgm/pandoc/releases

    macOS

    # Using Homebrew
    brew install pandoc
    
    # Using MacPorts
    sudo port install pandoc

    Linux

    # Ubuntu/Debian
    sudo apt-get install pandoc
    
    # CentOS/RHEL/Fedora
    sudo yum install pandoc
    # or
    sudo dnf install pandoc
    
    # Arch Linux
    sudo pacman -S pandoc
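
After installing, it can help to confirm that Pandoc is reachable from the environment that runs n8n. The following is a minimal sketch, not part of the node itself; the helper names `find_pandoc` and `pandoc_version` are hypothetical and only illustrate the check.

```python
import shutil
import subprocess


def find_pandoc():
    """Return the full path to the pandoc executable, or None if it is not on PATH."""
    return shutil.which("pandoc")


def pandoc_version(executable):
    """Return the first line printed by `pandoc --version`."""
    result = subprocess.run(
        [executable, "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()[0]


if __name__ == "__main__":
    path = find_pandoc()
    if path is None:
        print("pandoc not found on PATH; install it with one of the commands above")
    else:
        print(pandoc_version(path))
```

If n8n runs in a container or under a different user account, run the check in that same environment, since its PATH may differ from your shell's.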

    Installation

    Follow the installation guide in the n8n community nodes documentation.

    1. Go to Settings > Community Nodes
    2. Select Install
    3. Enter n8n-nodes-pandoc as the package name
    4. Select Install

    After installation, the Pandoc Converter node will be available in your n8n instance.

    Operations

    Convert DOCX to Markdown

    Converts a DOCX file to Markdown format using Pandoc with full image support.

    Configuration

    • Input Data Field Name: The name of the binary property containing the DOCX file (default: "data")
    • Output Data Field Name: The name of the binary property where the converted Markdown will be stored (default: "data")
    • Extract Images: Whether to extract and process images from the DOCX file (default: true)
    • Image Output Format: How to handle extracted images:
      • Embed as Base64: Images are embedded directly in the Markdown as base64 data URLs (self-contained)
      • Save as Separate Files: Images are saved as separate binary files that can be accessed individually
    • Additional Pandoc Options: Optional command-line arguments to pass to Pandoc (e.g., --wrap=none --reference-links)
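
This README does not show how the node invokes Pandoc internally. As a rough sketch of how the settings above could map onto a Pandoc command line, the example below builds an argument list from them. The function name, its parameters, and the `media` directory are assumptions made for illustration; the Pandoc flags used (`-f`, `-t`, `-o`, `--extract-media`) are standard Pandoc options.

```python
import shlex


def build_pandoc_command(input_path, output_path, extract_images=True,
                         media_dir="media", additional_options=""):
    """Build an argument list for converting a DOCX file to Markdown."""
    command = ["pandoc", input_path, "-f", "docx", "-t", "markdown",
               "-o", output_path]
    if extract_images:
        # --extract-media writes images found in the DOCX into media_dir.
        command.append(f"--extract-media={media_dir}")
    # Split the free-text options field the way a POSIX shell would, so
    # quoted values stay together and no shell is needed to run the command.
    command.extend(shlex.split(additional_options))
    return command


if __name__ == "__main__":
    print(build_pandoc_command(
        "document.docx", "document.md",
        additional_options="--wrap=none --reference-links",
    ))
```

Passing the result to `subprocess.run` as a list, rather than as a single shell string, avoids quoting problems with file names that contain spaces.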

    Example Workflow

    1. HTTP Request node to download a DOCX file
    2. Pandoc Converter node to convert DOCX to Markdown with images
    3. Write Binary File node to save the Markdown file
    4. (Optional) Additional nodes to process extracted image files

    Image Handling

    Base64 Embedding (Default):

    • Images are embedded directly in the Markdown as ... URLs
    • Creates a single, self-contained Markdown file
    • Well suited to sharing or storing as a single document
    • Preserves original alt text from the document
    • Larger file size due to base64 encoding

    Base64 Embedding (Compact):

    • Images are embedded as base64 with simplified alt text (the filename only)
    • Produces a smaller Markdown file than the default base64 mode when the original alt text is long
    • Still self-contained but more readable
    • Better for documents with many images or very long alt text
    • Handles large images (>5 MB) and emits size warnings

    Reference Links (Base64):

    • Uses Markdown reference-style links with base64 data
    • Cleaner, more readable Markdown content
    • Base64 data is stored at the end of the document
    • Better compatibility with Markdown viewers
    • Example: ![image1.png][img_1] in the text, with [img_1]: data:image/png;base64,... at the end of the document

    Separate Files:

    • Images are extracted as separate binary data properties
    • Markdown references images by filename
    • Allows individual processing of images
    • Smaller Markdown file size
    • Each image is accessible as image_1_filename.png, image_2_filename.jpg, etc.
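
To make the difference between inline base64 embedding and reference-style links concrete, here is a small sketch that produces both forms for one image. It illustrates the output shapes described above and is not the node's own code; the function names are hypothetical, and the `img_N` label follows the example given in the list.

```python
import base64
import mimetypes


def to_data_url(filename, data):
    """Encode raw image bytes as a base64 data URL."""
    mime, _ = mimetypes.guess_type(filename)
    mime = mime or "application/octet-stream"
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def inline_image(alt_text, filename, data):
    """Inline form: the data URL sits directly inside the image tag."""
    return f"![{alt_text}]({to_data_url(filename, data)})"


def reference_image(filename, data, index):
    """Reference form: a short tag in the text plus a definition for the end of the file."""
    label = f"img_{index}"
    usage = f"![{filename}][{label}]"
    definition = f"[{label}]: {to_data_url(filename, data)}"
    return usage, definition


if __name__ == "__main__":
    fake_png = b"abc"  # placeholder bytes, not a real image
    print(inline_image("A diagram", "image1.png", fake_png))
    usage, definition = reference_image("image1.png", fake_png, 1)
    print(usage)
    print(definition)
```

In the reference form only the short `![image1.png][img_1]` tag appears in the running text, which is why that mode reads more cleanly even though the total file size is similar.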

    Conversion Metadata

    The node provides detailed metadata about the conversion:

    {
      "conversion": {
        "originalFileName": "document.docx",
        "outputFileName": "document.md",
        "originalSize": 45234,
        "convertedSize": 12456,
        "timestamp": "2024-01-15T10:30:00.000Z",
        "extractedImages": true,
        "imageOutputFormat": "base64",
        "imageCount": 3
      }
    }
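
As an illustration of how this metadata might be consumed downstream, the sketch below parses the sample object above and derives a short summary. The `summarize` helper is hypothetical; only the field names are taken from the sample.

```python
import json

SAMPLE = """
{
  "conversion": {
    "originalFileName": "document.docx",
    "outputFileName": "document.md",
    "originalSize": 45234,
    "convertedSize": 12456,
    "timestamp": "2024-01-15T10:30:00.000Z",
    "extractedImages": true,
    "imageOutputFormat": "base64",
    "imageCount": 3
  }
}
"""


def summarize(metadata):
    """Return a one-line summary of a conversion metadata object."""
    conv = metadata["conversion"]
    # Size of the Markdown output relative to the original DOCX.
    ratio = conv["convertedSize"] / conv["originalSize"]
    return (
        f'{conv["originalFileName"]} -> {conv["outputFileName"]}: '
        f'{ratio:.1%} of original size, {conv["imageCount"]} image(s)'
    )


if __name__ == "__main__":
    print(summarize(json.loads(SAMPLE)))
```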

    Additional Pandoc Options Examples

    • --wrap=none - Disable text wrapping
    • --standalone - Produce a standalone document
    • --toc - Include table of contents
    • --reference-links - Use reference-style links
    • --preserve-tabs - Preserve tabs in code blocks

    Compatibility

    This node has been tested with:

    • n8n version 1.0.0+
    • Pandoc version 2.0+

    License

    MIT