Package Exports

@matbee/libreoffice-converter
@matbee/libreoffice-converter/browser
@matbee/libreoffice-converter/package.json
@matbee/libreoffice-converter/server


LibreOffice WASM Document Converter

A headless document conversion toolkit that uses LibreOffice compiled to WebAssembly. Convert documents between various formats (DOCX, PDF, ODT, XLSX, etc.) directly in Node.js or the browser without any native dependencies.

Features

🚀 Pure WebAssembly - No native LibreOffice installation required
📄 Wide Format Support - Convert between 15+ document formats
🌐 Cross-Platform - Works in Node.js and browsers
📦 Zero Dependencies - Self-contained WASM module
🔒 Secure - Documents never leave your environment
⚡ Fast Conversions - 1-5 seconds per document after initialization

Note: First browser initialization includes downloading ~240MB of WASM files. After that, conversions are fast. Reuse the converter instance for best performance.

Quick Start

Installation

npm install @matbee/libreoffice-converter

Basic Usage (Node.js)

import { createConverter } from '@matbee/libreoffice-converter';
import fs from 'fs';

// Initialize the converter (blocks main thread)
const converter = await createConverter({
  wasmPath: './node_modules/@matbee/libreoffice-converter/wasm',
  verbose: true,
  onProgress: (info) => console.log(`[${info.phase}] ${info.percent}%`),
});

// Read a document
const docxBuffer = fs.readFileSync('document.docx');

// Convert to PDF
const result = await converter.convert(docxBuffer, {
  outputFormat: 'pdf',
}, 'document.docx');

// Save the result
fs.writeFileSync('document.pdf', result.data);
console.log(`Converted in ${result.duration}ms`);

// Clean up
await converter.destroy();

Non-Blocking Conversion (Recommended for Servers)

import { createWorkerConverter } from '@matbee/libreoffice-converter';

// Runs in a worker thread - doesn't block the main thread
const converter = await createWorkerConverter({
  wasmPath: './wasm',
});

const result = await converter.convert(docxBuffer, { outputFormat: 'pdf' });
await converter.destroy();

One-Shot Conversion

import { convertDocument } from '@matbee/libreoffice-converter';

// Creates converter, converts, then destroys - best for single conversions
const result = await convertDocument(
  docxBuffer,
  { outputFormat: 'pdf' },
  { wasmPath: './wasm' }
);

System Requirements
Building from Source
Project Setup
API Reference
- Node.js Converters
- Conversion Validation
Supported Formats
Browser Usage
WASM Loading Progress
Document Inspection & Rendering API
Document Editing API
Browser Document Preview API
Configuration
Examples
Troubleshooting
License

System Requirements

Using Pre-built WASM (Recommended)

Node.js 18.0.0 or later
~150MB disk space for WASM files

Building from Source

Ubuntu 22.04+ / Debian 12+ (or compatible)
16GB+ RAM (32GB recommended)
50GB+ disk space
8+ CPU cores (32 recommended)
Build time: 1-4 hours

Building from Source

If you need to build the LibreOffice WASM module yourself:

Prerequisites

# Ubuntu/Debian
sudo apt-get update
sudo apt-get install -y \
    build-essential git cmake ninja-build \
    python3 python3-pip python3-dev \
    autoconf automake bison ccache flex gawk gettext \
    libarchive-dev libcups2-dev libcurl4-openssl-dev \
    libfontconfig1-dev libfreetype6-dev libglib2.0-dev \
    libharfbuzz-dev libicu-dev libjpeg-dev liblcms2-dev \
    libpng-dev libssl-dev libtool libxml2-dev libxslt1-dev \
    pkg-config uuid-dev xsltproc zip unzip wget curl \
    ca-certificates xz-utils gperf nasm

Build Steps

# Clone this repository
git clone https://github.com/matbeedotcom/libreoffice-document-converter.git
cd libreoffice-document-converter

# Run the build script (takes 1-4 hours)
BUILD_JOBS=32 ./build/build-wasm.sh

Build Options

Environment Variable	Default	Description
`BUILD_JOBS`	`$(nproc)`	Number of parallel compile jobs
`BUILD_DIR`	`~/libreoffice-wasm-build`	Build directory
`OUTPUT_DIR`	`./wasm`	Output directory for WASM files
`LIBREOFFICE_VERSION`	`libreoffice-24-8`	LibreOffice Git branch
`EMSDK_VERSION`	`3.1.51`	Emscripten SDK version
`SKIP_DEPS`	`0`	Skip dependency installation
`CLEAN_BUILD`	`0`	Clean before building


Build Output

After building, the `wasm/` directory contains:

File	Size (Raw)	Size (Brotli)	Description
`soffice.wasm`	112 MB	24.8 MB	Main WebAssembly binary
`soffice.data`	80 MB	15.2 MB	Filesystem image (fonts, configs)
`soffice.cjs`	230 KB	-	JavaScript loader
`soffice.worker.cjs`	4 KB	-	Web Worker script
`loader.cjs`	8 KB	-	Node.js module loader
Total	192 MB	40 MB	With Brotli compression


Project Setup

Development Setup

# Install dependencies
npm install

# Build TypeScript
npm run build

# Run tests
npm test

# Type checking
npm run typecheck

# Development mode with auto-reload
npm run dev

NPM Scripts

Script	Description
`npm run build`	Compile TypeScript to `dist/`
`npm run build:wasm`	Build LibreOffice WASM
`npm test`	Run tests
`npm run dev`	Development server with watch
`npm run typecheck`	TypeScript type checking
`npm run lint`	ESLint code linting

API Reference

`createConverter(options?)`

Creates and initializes a converter instance.

import { createConverter } from '@matbee/libreoffice-converter';

const converter = await createConverter({
  wasmPath: './wasm',
  verbose: false,
  onProgress: (info) => console.log(info.message),
  onReady: () => console.log('Ready!'),
  onError: (err) => console.error(err),
});

Options:

Option	Type	Default	Description
`wasmPath`	`string`	`'./wasm'`	Path to WASM files directory
`verbose`	`boolean`	`false`	Enable debug logging
`onProgress`	`(info: ProgressInfo) => void`	-	Progress callback
`onReady`	`() => void`	-	Called when initialization completes
`onError`	`(error: Error) => void`	-	Error callback

`converter.convert(input, options, filename?)`

Convert a document to a different format.

const result = await converter.convert(
  inputBuffer,
  {
    outputFormat: 'pdf',
    inputFormat: 'docx',  // Optional, auto-detected from filename
    password: 'secret',   // For encrypted documents
    pdf: {
      pdfaLevel: 'PDF/A-2b',
      quality: 90,
    },
  },
  'document.docx'  // Optional filename for format detection
);

Parameters:

Parameter	Type	Description
`input`	`Uint8Array | ArrayBuffer | Buffer`	Input document data
`options`	`ConversionOptions`	Conversion options
`filename`	`string`	Optional filename for format detection

Returns: Promise<ConversionResult>

interface ConversionResult {
  data: Uint8Array;      // Converted document bytes
  mimeType: string;      // MIME type of output
  filename: string;      // Suggested output filename
  duration: number;      // Conversion time in ms
}

`converter.destroy()`

Clean up resources. Call when done converting.

await converter.destroy();

`convertDocument(input, options, converterOptions?)`

One-shot conversion utility. Creates converter, converts, then destroys.

import { convertDocument } from '@matbee/libreoffice-converter';

const result = await convertDocument(
  inputBuffer,
  { outputFormat: 'pdf' },
  { wasmPath: './wasm' }
);

`createWorkerConverter(options?)`

Creates a converter that runs in a worker thread. Recommended for servers as it doesn't block the main thread.

import { createWorkerConverter } from '@matbee/libreoffice-converter';

const converter = await createWorkerConverter({
  wasmPath: './wasm',
  verbose: false,
});

// Same API as LibreOfficeConverter
const result = await converter.convert(docxBuffer, { outputFormat: 'pdf' });
await converter.destroy();

`createSubprocessConverter(options?)`

Creates a converter that runs in a separate child process. Best for memory isolation and automatic recovery from crashes.

import { createSubprocessConverter } from '@matbee/libreoffice-converter';

const converter = await createSubprocessConverter({
  wasmPath: './wasm',
});

const result = await converter.convert(xlsxBuffer, { outputFormat: 'pdf' }, 'report.xlsx');
await converter.destroy();

Converter Comparison

Converter	Thread	Memory	Use Case
`createConverter()`	Main	Shared	Simple scripts
`createWorkerConverter()`	Worker	Shared	Servers (recommended)
`createSubprocessConverter()`	Process	Isolated	High reliability, memory-constrained

`LibreOfficeConverter.getSupportedInputFormats()`

Get list of supported input formats.

import { LibreOfficeConverter } from '@matbee/libreoffice-converter';

const formats = LibreOfficeConverter.getSupportedInputFormats();
// ['doc', 'docx', 'xls', 'xlsx', 'ppt', 'pptx', 'odt', 'ods', 'odp', ...]

`LibreOfficeConverter.getSupportedOutputFormats()`

Get list of supported output formats.

const formats = LibreOfficeConverter.getSupportedOutputFormats();
// ['pdf', 'docx', 'doc', 'odt', 'rtf', 'txt', 'html', 'xlsx', ...]

`isConversionSupported(inputFormat, outputFormat)`

Check if a specific conversion path is supported.

import { isConversionSupported } from '@matbee/libreoffice-converter';

isConversionSupported('docx', 'pdf');   // true
isConversionSupported('pdf', 'docx');   // false - PDFs can't be converted to DOCX
isConversionSupported('xlsx', 'csv');   // true
isConversionSupported('pptx', 'xlsx');  // false - can't convert presentations to spreadsheets

`getValidOutputFormatsFor(inputFormat)`

Get valid output formats for a given input format.

import { getValidOutputFormatsFor } from '@matbee/libreoffice-converter';

getValidOutputFormatsFor('docx');
// ['pdf', 'docx', 'doc', 'odt', 'rtf', 'txt', 'html', 'png']

getValidOutputFormatsFor('xlsx');
// ['pdf', 'xlsx', 'xls', 'ods', 'csv', 'html', 'png']

getValidOutputFormatsFor('pdf');
// ['pdf', 'png', 'svg', 'html'] (PDFs are imported as Draw documents)

Conversion Validation Example

import {
  isConversionSupported,
  getValidOutputFormatsFor,
  getConversionErrorMessage,
} from '@matbee/libreoffice-converter';

function validateConversion(inputFile: string, outputFormat: string): void {
  const ext = inputFile.split('.').pop()?.toLowerCase();

  // split('.').pop() returns the whole name when there is no extension
  if (!ext || !inputFile.includes('.')) {
    throw new Error(`Cannot determine the format of ${inputFile}`);
  }

  if (!isConversionSupported(ext, outputFormat)) {
    throw new Error(getConversionErrorMessage(ext, outputFormat));
    // "Cannot convert PDF to DOCX. PDF files are imported as Draw documents
    //  and cannot be exported to Office formats. Valid output formats for PDF:
    //  pdf, png, svg, html"
  }
}
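Deriving the format with `split('.').pop()` also misreads paths whose directories contain dots. A slightly more defensive helper could look like the following; `inputFormatFromFilename` is a hypothetical sketch, not something the package exports.

```typescript
/**
 * Hypothetical helper (not part of the package): derive a lowercase format
 * string from a filename or path, or undefined when there is no usable extension.
 */
function inputFormatFromFilename(filename: string): string | undefined {
  // Keep only the last path segment so dots in directory names are ignored.
  const base = filename.split(/[\\/]/).pop() ?? '';
  const dot = base.lastIndexOf('.');
  // No dot, a leading dot only (".env"), or a trailing dot means no extension.
  if (dot <= 0 || dot === base.length - 1) return undefined;
  return base.slice(dot + 1).toLowerCase();
}

console.log(inputFormatFromFilename('Report.DOCX'));         // docx
console.log(inputFormatFromFilename('./v1.2/notes'));        // undefined
console.log(inputFormatFromFilename('C:\\docs\\deck.pptx')); // pptx
```

Only pass the result to `isConversionSupported` when it is defined.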

Supported Formats

Input Formats

Format	Extension	Description
Microsoft Word	`.doc`, `.docx`	Word 97-2003 and modern
Microsoft Excel	`.xls`, `.xlsx`	Excel 97-2003 and modern
Microsoft PowerPoint	`.ppt`, `.pptx`	PowerPoint 97-2003 and modern
OpenDocument Text	`.odt`	LibreOffice Writer
OpenDocument Spreadsheet	`.ods`	LibreOffice Calc
OpenDocument Presentation	`.odp`	LibreOffice Impress
Rich Text Format	`.rtf`	Cross-platform text
Plain Text	`.txt`	UTF-8 text
HTML	`.html`, `.htm`	Web pages
CSV	`.csv`	Comma-separated values
PDF	`.pdf`	For editing (limited)
EPUB	`.epub`	E-books

Output Formats

Format	Extension	MIME Type
PDF	`.pdf`	`application/pdf`
DOCX	`.docx`	`application/vnd.openxmlformats-officedocument.wordprocessingml.document`
DOC	`.doc`	`application/msword`
ODT	`.odt`	`application/vnd.oasis.opendocument.text`
RTF	`.rtf`	`application/rtf`
TXT	`.txt`	`text/plain`
HTML	`.html`	`text/html`
XLSX	`.xlsx`	`application/vnd.openxmlformats-officedocument.spreadsheetml.sheet`
XLS	`.xls`	`application/vnd.ms-excel`
ODS	`.ods`	`application/vnd.oasis.opendocument.spreadsheet`
CSV	`.csv`	`text/csv`
PPTX	`.pptx`	`application/vnd.openxmlformats-officedocument.presentationml.presentation`
PPT	`.ppt`	`application/vnd.ms-powerpoint`
ODP	`.odp`	`application/vnd.oasis.opendocument.presentation`
PNG	`.png`	`image/png`
JPG	`.jpg`	`image/jpeg`
SVG	`.svg`	`image/svg+xml`
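Conversion results already carry `result.mimeType`, but if you serve or store converted files yourself, the table above can be mirrored as a lookup. The map and `mimeTypeFor` below are a hypothetical helper built only from this table.

```typescript
// MIME types copied from the Output Formats table.
const OUTPUT_MIME_TYPES: Readonly<Record<string, string>> = {
  pdf: 'application/pdf',
  docx: 'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
  doc: 'application/msword',
  odt: 'application/vnd.oasis.opendocument.text',
  rtf: 'application/rtf',
  txt: 'text/plain',
  html: 'text/html',
  xlsx: 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
  xls: 'application/vnd.ms-excel',
  ods: 'application/vnd.oasis.opendocument.spreadsheet',
  csv: 'text/csv',
  pptx: 'application/vnd.openxmlformats-officedocument.presentationml.presentation',
  ppt: 'application/vnd.ms-powerpoint',
  odp: 'application/vnd.oasis.opendocument.presentation',
  png: 'image/png',
  jpg: 'image/jpeg',
  svg: 'image/svg+xml',
};

// Accepts 'pdf', 'PDF' or '.pdf'; unknown formats fall back to a generic binary type.
function mimeTypeFor(format: string): string {
  const key = format.toLowerCase().replace(/^\./, '');
  // hasOwnProperty avoids matching inherited keys such as "constructor".
  return Object.prototype.hasOwnProperty.call(OUTPUT_MIME_TYPES, key)
    ? OUTPUT_MIME_TYPES[key]
    : 'application/octet-stream';
}
```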

Browser Usage

Import

<script type="module">
// Bare package specifiers need a bundler or an import map to resolve in the browser
import {
  WorkerBrowserConverter,
  BrowserConverter,
  createWasmPaths
} from '@matbee/libreoffice-converter/browser';
</script>

Basic Browser Usage (Web Worker - Recommended)

The WorkerBrowserConverter runs LibreOffice in a Web Worker, keeping the main thread responsive:

import { WorkerBrowserConverter, createWasmPaths } from '@matbee/libreoffice-converter/browser';

// Create converter - loads WASM from /wasm/ by default
const converter = new WorkerBrowserConverter({
  ...createWasmPaths(), // Defaults to /wasm/
  browserWorkerJs: '/dist/browser-worker.js',
  onProgress: (info) => {
    progressBar.style.width = `${info.percent}%`;
    statusText.textContent = info.message;
  },
});

await converter.initialize();

// Convert a File object
const file = document.querySelector('input[type="file"]').files[0];
const arrayBuffer = await file.arrayBuffer();
const result = await converter.convert(new Uint8Array(arrayBuffer), {
  outputFormat: 'pdf',
}, file.name);

// Download the result
const blob = new Blob([result.data], { type: result.mimeType });
const url = URL.createObjectURL(blob);
const a = document.createElement('a');
a.href = url;
a.download = result.filename;
a.click();

Main Thread Converter (Alternative)

For simpler setups without a worker (blocks UI during conversion):

import { BrowserConverter, createWasmPaths } from '@matbee/libreoffice-converter/browser';

const converter = new BrowserConverter({
  ...createWasmPaths(), // Defaults to /wasm/
  onProgress: (info) => console.log(`${info.percent}%: ${info.message}`),
});

await converter.initialize();
const result = await converter.convert(fileData, { outputFormat: 'pdf' }, 'doc.docx');

Required WASM Paths

The browser converter requires paths to WASM files. Use createWasmPaths() which defaults to /wasm/:

import { createWasmPaths, DEFAULT_WASM_BASE_URL } from '@matbee/libreoffice-converter/browser';

// Use default /wasm/ path (same-origin)
const paths = createWasmPaths();
// Returns:
// {
//   sofficeJs: '/wasm/soffice.js',
//   sofficeWasm: '/wasm/soffice.wasm',
//   sofficeData: '/wasm/soffice.data',
//   sofficeWorkerJs: '/wasm/soffice.worker.js',
// }

// Or use your own CDN
const cdnPaths = createWasmPaths('https://cdn.example.com/wasm/');

// Or specify each path manually
const converter = new WorkerBrowserConverter({
  sofficeJs: 'https://cdn.example.com/wasm/soffice.js',
  sofficeWasm: 'https://cdn.example.com/wasm/soffice.wasm',
  sofficeData: 'https://cdn.example.com/wasm/soffice.data',
  sofficeWorkerJs: 'https://cdn.example.com/wasm/soffice.worker.js',
  browserWorkerJs: '/workers/browser-worker.js',
});

Note: For production, consider hosting WASM files on your own CDN for better reliability and caching.

Required HTTP Headers

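SharedArrayBuffer is only available on cross-origin isolated pages, which requires these two response headers (they are isolation headers, not CORS headers). With `require-corp`, any cross-origin subresources, such as WASM files on a separate CDN, must also be served with CORS or a `Cross-Origin-Resource-Policy` header.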

Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp
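On a plain Node.js server, one way to send these headers is to set them on every response before serving anything. A minimal sketch using only `node:http`; static-file handling is left out, and `setCrossOriginIsolationHeaders` is a hypothetical helper name.

```typescript
import * as http from 'node:http';

// The two headers that make a page cross-origin isolated.
const ISOLATION_HEADERS: ReadonlyArray<[string, string]> = [
  ['Cross-Origin-Opener-Policy', 'same-origin'],
  ['Cross-Origin-Embedder-Policy', 'require-corp'],
];

// Anything with setHeader works: http.ServerResponse, or Express's res which extends it.
interface HeaderSink {
  setHeader(name: string, value: string): unknown;
}

function setCrossOriginIsolationHeaders(res: HeaderSink): void {
  for (const [name, value] of ISOLATION_HEADERS) res.setHeader(name, value);
}

const server = http.createServer((req, res) => {
  setCrossOriginIsolationHeaders(res);
  // ...serve your HTML, JS and /wasm/ files here...
  res.statusCode = 404;
  res.end();
});

// server.listen(8080);
```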

WASM Loading Progress

The browser converter provides detailed progress tracking during WASM initialization. The progress system tracks download bytes, compilation phases, and LibreOffice initialization.

Progress Callback

import { WorkerBrowserConverter, createWasmPaths } from '@matbee/libreoffice-converter/browser';

const converter = new WorkerBrowserConverter({
  ...createWasmPaths('/wasm/'),
  browserWorkerJs: '/dist/browser-worker.js',
  onProgress: (progress) => {
    // progress is a WasmLoadProgress object
    console.log(`Phase: ${progress.phase}`);
    console.log(`Progress: ${progress.percent}%`);
    console.log(`Message: ${progress.message}`);

    // During download phases, bytes info is available
    if (progress.bytesLoaded !== undefined && progress.bytesTotal) {
      const mb = (progress.bytesLoaded / 1024 / 1024).toFixed(1);
      const totalMb = (progress.bytesTotal / 1024 / 1024).toFixed(1);
      console.log(`Downloaded: ${mb} MB / ${totalMb} MB`);
    }
  },
});

WasmLoadProgress Interface

interface WasmLoadProgress {
  /** Overall progress 0-100 */
  percent: number;
  /** Human-readable status message */
  message: string;
  /** Current loading phase */
  phase: WasmLoadPhase;
  /** Bytes downloaded (present during download phases) */
  bytesLoaded?: number;
  /** Total bytes to download (present during download phases) */
  bytesTotal?: number;
}

type WasmLoadPhase =
  | 'download-wasm'    // Downloading soffice.wasm (~142MB)
  | 'download-data'    // Downloading soffice.data (~96MB)
  | 'compile'          // WebAssembly compilation
  | 'filesystem'       // Emscripten filesystem setup
  | 'lok-init'         // LibreOfficeKit initialization
  | 'ready';           // Complete

Progress Phases

Phase	Weight	Description
`download-wasm`	30%	Downloading soffice.wasm (~142MB)
`download-data`	20%	Downloading soffice.data (~96MB)
`compile`	10%	WebAssembly compilation
`filesystem`	5%	Virtual filesystem setup
`lok-init`	35%	LibreOffice initialization
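The weights add up to 100%, so overall progress is the sum of the finished phases' weights plus the current phase's weight scaled by how far it has got. A sketch of that arithmetic, assuming phases run strictly in the listed order and each weight is applied linearly; it illustrates the table and is not necessarily how the library computes `percent`.

```typescript
type WasmLoadPhase = 'download-wasm' | 'download-data' | 'compile' | 'filesystem' | 'lok-init' | 'ready';

// Weights from the Progress Phases table, in execution order (sum = 100).
const PHASE_WEIGHTS: ReadonlyArray<[WasmLoadPhase, number]> = [
  ['download-wasm', 30],
  ['download-data', 20],
  ['compile', 10],
  ['filesystem', 5],
  ['lok-init', 35],
];

/** fraction: how far through the current phase, 0..1 (e.g. bytesLoaded / bytesTotal). */
function overallPercent(phase: WasmLoadPhase, fraction: number): number {
  if (phase === 'ready') return 100;
  const f = Math.min(Math.max(fraction, 0), 1);
  let done = 0;
  for (const [name, weight] of PHASE_WEIGHTS) {
    if (name === phase) return Math.round(done + weight * f);
    done += weight;
  }
  throw new Error(`Unknown phase: ${phase}`);
}

// Halfway through downloading soffice.data: 30 + 20 * 0.5
console.log(overallPercent('download-data', 0.5)); // 40
```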

Example: Progress Bar UI

<div id="progress-container">
  <div id="progress-bar" style="width: 0%"></div>
</div>
<div id="progress-text">Initializing...</div>
<div id="progress-bytes"></div>

<script type="module">
import { WorkerBrowserConverter, createWasmPaths } from '/dist/browser.js';

const progressBar = document.getElementById('progress-bar');
const progressText = document.getElementById('progress-text');
const progressBytes = document.getElementById('progress-bytes');

const converter = new WorkerBrowserConverter({
  ...createWasmPaths('/wasm/'),
  browserWorkerJs: '/dist/browser-worker.js',
  onProgress: (progress) => {
    progressBar.style.width = `${progress.percent}%`;
    progressText.textContent = progress.message;

    if (progress.bytesLoaded !== undefined && progress.bytesTotal) {
      const loaded = (progress.bytesLoaded / 1024 / 1024).toFixed(1);
      const total = (progress.bytesTotal / 1024 / 1024).toFixed(1);
      progressBytes.textContent = `${loaded} MB / ${total} MB`;
    } else {
      progressBytes.textContent = '';
    }
  },
});

await converter.initialize();
progressText.textContent = 'Ready!';
</script>

Document Inspection & Rendering API

All converters (Node.js and Browser) provide APIs for inspecting documents and rendering page previews without full conversion. This is useful for building document viewers, thumbnail galleries, and editors.

`converter.getDocumentInfo(input, inputFormat)`

Get document metadata including type, page count, and valid output formats.

// Node.js
import { createWorkerConverter } from '@matbee/libreoffice-converter';

const converter = await createWorkerConverter({ wasmPath: './wasm' });

const docInfo = await converter.getDocumentInfo(fileBuffer, 'docx');
console.log(docInfo);
// {
//   documentType: 0,           // 0=TEXT, 1=SPREADSHEET, 2=PRESENTATION, 3=DRAWING
//   documentTypeName: 'Text Document',
//   validOutputFormats: ['pdf', 'docx', 'odt', 'html', 'txt', 'png'],
//   pageCount: 5
// }

Returns: Promise<DocumentInfo>

interface DocumentInfo {
  documentType: number;           // LOK document type enum
  documentTypeName: string;       // Human-readable type name
  validOutputFormats: string[];   // Formats this document can be converted to
  pageCount: number;              // Number of pages/slides/sheets
}

`converter.getPageCount(input, inputFormat)`

Get just the page count for a document.

const pageCount = await converter.getPageCount(docxBuffer, 'docx');
console.log(`Document has ${pageCount} pages`);

Returns: Promise<number>

`converter.renderPage(input, inputFormat, pageIndex, width, height?)`

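Render a single page to an image. The result is raw RGBA pixel data rather than an encoded PNG file, so encode it yourself if you need a PNG (the Node.js example below uses the `canvas` package).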

import fs from 'fs';
import { createCanvas } from 'canvas'; // third-party package, used only to encode the PNG

// Render first page at 800px width (height auto-calculated to maintain aspect ratio)
const preview = await converter.renderPage(pptxBuffer, 'pptx', 0, 800);

// preview.data is a Uint8Array containing raw RGBA pixel data
console.log(`Rendered: ${preview.width}x${preview.height} pixels`);

// Save as PNG (Node.js): copy the pixels into a canvas, then encode
const canvas = createCanvas(preview.width, preview.height);
const ctx = canvas.getContext('2d');
const imageData = ctx.createImageData(preview.width, preview.height);
imageData.data.set(preview.data);
ctx.putImageData(imageData, 0, 0);
fs.writeFileSync('page-0.png', canvas.toBuffer('image/png'));

Parameters:

Parameter	Type	Description
`input`	`Uint8Array | Buffer`	Document data
`inputFormat`	`string`	Input format (e.g., 'docx', 'pptx')
`pageIndex`	`number`	0-based page index
`width`	`number`	Target width in pixels
`height`	`number`	Optional target height (0 = auto based on aspect ratio)

Returns: Promise<PagePreview>

interface PagePreview {
  page: number;      // Page index
  data: Uint8Array;  // Raw RGBA pixel data
  width: number;     // Actual rendered width
  height: number;    // Actual rendered height
}
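If you would rather not add the native `canvas` dependency, the RGBA buffer can be wrapped in a PNG using only Node's built-in `zlib`. A minimal sketch, assuming `data` is tightly packed 8-bit RGBA in row-major order (`width * height * 4` bytes) with straight, non-premultiplied alpha; `encodePng` and `crc32` are hypothetical helpers, not part of the package.

```typescript
import { deflateSync } from 'node:zlib';

// CRC-32 lookup table (IEEE polynomial, reflected), as PNG chunks require.
const CRC_TABLE = new Uint32Array(256).map((_, n) => {
  let c = n;
  for (let k = 0; k < 8; k++) c = c & 1 ? 0xedb88320 ^ (c >>> 1) : c >>> 1;
  return c >>> 0;
});

function crc32(bytes: Uint8Array): number {
  let c = 0xffffffff;
  for (let i = 0; i < bytes.length; i++) c = CRC_TABLE[(c ^ bytes[i]) & 0xff] ^ (c >>> 8);
  return (c ^ 0xffffffff) >>> 0;
}

// Chunk layout: length (4) + type (4) + data + CRC over type and data (4).
function chunk(type: string, data: Uint8Array): Buffer {
  const out = Buffer.alloc(12 + data.length);
  out.writeUInt32BE(data.length, 0);
  out.write(type, 4, 'ascii');
  out.set(data, 8);
  out.writeUInt32BE(crc32(out.subarray(4, 8 + data.length)), 8 + data.length);
  return out;
}

function encodePng(data: Uint8Array, width: number, height: number): Buffer {
  if (data.length !== width * height * 4) throw new Error('expected width * height * 4 RGBA bytes');
  const ihdr = Buffer.alloc(13);
  ihdr.writeUInt32BE(width, 0);
  ihdr.writeUInt32BE(height, 4);
  ihdr[8] = 8; // bit depth
  ihdr[9] = 6; // colour type 6 = RGBA; compression, filter, interlace stay 0
  // Each scanline is prefixed with filter type 0 (None).
  const stride = width * 4;
  const raw = Buffer.alloc((stride + 1) * height);
  for (let y = 0; y < height; y++) {
    raw[y * (stride + 1)] = 0;
    raw.set(data.subarray(y * stride, (y + 1) * stride), y * (stride + 1) + 1);
  }
  return Buffer.concat([
    Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]), // PNG signature
    chunk('IHDR', ihdr),
    chunk('IDAT', deflateSync(raw)), // zlib stream, as IDAT expects
    chunk('IEND', new Uint8Array(0)),
  ]);
}

// Usage: fs.writeFileSync('page-0.png', encodePng(preview.data, preview.width, preview.height));
```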

`converter.renderPagePreviews(input, inputFormat, options?)`

Render multiple pages as thumbnails.

// Render all pages at 400px width
const previews = await converter.renderPagePreviews(pptxBuffer, 'pptx', {
  width: 400,
});

console.log(`Rendered ${previews.length} pages`);
previews.forEach(p => console.log(`Page ${p.page}: ${p.width}x${p.height}`));

// Render only specific pages
const selectedPreviews = await converter.renderPagePreviews(pptxBuffer, 'pptx', {
  width: 800,
  pageIndices: [0, 2, 4],  // Only pages 1, 3, and 5
});

Options:

interface RenderOptions {
  /** Width of rendered image in pixels (default: 800) */
  width?: number;
  /** Height of rendered image in pixels (0 = auto based on aspect ratio) */
  height?: number;
  /** Specific page indices to render (0-based). If empty, renders all pages */
  pageIndices?: number[];
}

Returns: Promise<PagePreview[]>
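To reason about which pages a call will render, the `pageIndices` rule (empty or omitted means every page) can be written out as a small function. This is a hypothetical sketch; in particular, silently dropping out-of-range and duplicate indices is this sketch's choice, and the library may handle them differently (for example by throwing).

```typescript
// Hypothetical mirror of the documented `pageIndices` semantics.
function resolvePageIndices(pageCount: number, pageIndices?: number[]): number[] {
  if (!pageIndices || pageIndices.length === 0) {
    // Empty or omitted: every page, 0-based.
    return Array.from({ length: pageCount }, (_, i) => i);
  }
  // Keep valid 0-based indices, dropping duplicates but preserving order.
  const valid = pageIndices.filter((i) => Number.isInteger(i) && i >= 0 && i < pageCount);
  return Array.from(new Set(valid));
}
```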

`converter.getDocumentText(input, inputFormat)`

Extract all text content from a document.

const text = await converter.getDocumentText(docxBuffer, 'docx');
if (text) {
  console.log('Document text:', text);
} else {
  console.log('No text content found');
}

Returns: Promise<string | null>

`converter.getPageNames(input, inputFormat)`

Get slide names (for presentations) or sheet names (for spreadsheets).

// For presentations - get slide names
const slideNames = await converter.getPageNames(pptxBuffer, 'pptx');
console.log('Slides:', slideNames);
// ['Introduction', 'Overview', 'Conclusion']

// For spreadsheets - get sheet names
const sheetNames = await converter.getPageNames(xlsxBuffer, 'xlsx');
console.log('Sheets:', sheetNames);
// ['Sheet1', 'Data', 'Summary']

Returns: Promise<string[]>

Document Types

Type	Value	Description
TEXT	0	Writer documents (doc, docx, odt, rtf, txt)
SPREADSHEET	1	Calc documents (xls, xlsx, ods, csv)
PRESENTATION	2	Impress documents (ppt, pptx, odp)
DRAWING	3	Draw documents (odg, pdf)
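Since `getDocumentInfo` returns `documentType` as a plain number, a small constant object mirroring this table keeps comparisons readable. The constants and helpers below are a local sketch; check whether the package exports its own constants before defining these.

```typescript
// Values from the Document Types table.
const DocumentType = {
  TEXT: 0,
  SPREADSHEET: 1,
  PRESENTATION: 2,
  DRAWING: 3,
} as const;

// Reverse lookup: numeric code -> table name, or undefined for unknown codes.
function documentTypeKey(type: number): keyof typeof DocumentType | undefined {
  return (Object.keys(DocumentType) as Array<keyof typeof DocumentType>)
    .find((key) => DocumentType[key] === type);
}

// Hypothetical helper: what DocumentInfo.pageCount counts for each type.
function pageUnit(type: number): 'pages' | 'sheets' | 'slides' {
  switch (type) {
    case DocumentType.SPREADSHEET: return 'sheets';
    case DocumentType.PRESENTATION: return 'slides';
    default: return 'pages';
  }
}

// Usage: console.log(`${docInfo.pageCount} ${pageUnit(docInfo.documentType)}`);
```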

Document Editing API

The converters support opening documents for editing, making modifications, and saving the results.

`converter.openDocument(input, inputFormat)`

Open a document for editing. Returns a session that can be used for subsequent operations.

const session = await converter.openDocument(docxBuffer, 'docx');
console.log(session);
// {
//   sessionId: 'edit_session_0_1234567890',
//   documentType: 'writer',  // 'writer', 'calc', or 'impress'
//   pageCount: 5
// }

Returns: Promise<EditorSession>

interface EditorSession {
  sessionId: string;      // Unique session ID for this document
  documentType: string;   // 'writer', 'calc', or 'impress'
  pageCount: number;      // Number of pages/slides/sheets
}

`converter.editorOperation(sessionId, method, ...args)`

Execute an editing operation on an open document.

// Get document structure
const structure = await converter.editorOperation(session.sessionId, 'getStructure');
console.log(structure.data);

// Get document type
const docType = await converter.editorOperation(session.sessionId, 'getDocumentType');
console.log(docType.data);  // 'writer', 'calc', or 'impress'

// Insert text (Writer documents)
const result = await converter.editorOperation(
  session.sessionId,
  'insertText',
  'Hello, World!'
);

// Set cell value (Calc documents)
const cellResult = await converter.editorOperation(
  session.sessionId,
  'setCellValue',
  'A1',
  42
);

Returns: Promise<EditorOperationResult<T>>

interface EditorOperationResult<T = unknown> {
  success: boolean;      // Whether the operation succeeded
  verified?: boolean;    // Whether the result was verified
  data?: T;              // Operation result data
  error?: string;        // Error message if failed
  suggestion?: string;   // Suggested fix if failed
}
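Because `EditorOperationResult` reports failure through `success` and `error`, a small wrapper can turn failed results into exceptions so that call sites stay linear. `unwrapOperation` is a hypothetical helper; the interface is copied from above, and whether `editorOperation` also rejects in some failure cases isn't specified here.

```typescript
interface EditorOperationResult<T = unknown> {
  success: boolean;
  verified?: boolean;
  data?: T;
  error?: string;
  suggestion?: string;
}

// Return the data on success; on failure, throw with the error and any suggestion.
function unwrapOperation<T>(result: EditorOperationResult<T>, method: string): T | undefined {
  if (!result.success) {
    const hint = result.suggestion ? ` (suggestion: ${result.suggestion})` : '';
    throw new Error(`${method} failed: ${result.error ?? 'unknown error'}${hint}`);
  }
  return result.data;
}

// Usage:
// const docType = unwrapOperation(
//   await converter.editorOperation(session.sessionId, 'getDocumentType'), 'getDocumentType');
```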

`converter.closeDocument(sessionId)`

Close an editing session and get the modified document.

// Close and get modified document
const modifiedData = await converter.closeDocument(session.sessionId);

if (modifiedData) {
  fs.writeFileSync('modified.docx', modifiedData);
  console.log('Document saved!');
} else {
  console.log('No changes or save failed');
}

Returns: Promise<Uint8Array | undefined>

Complete Editing Example

import { createWorkerConverter } from '@matbee/libreoffice-converter';
import fs from 'fs';

const converter = await createWorkerConverter({ wasmPath: './wasm' });

// Read document
const docx = fs.readFileSync('template.docx');

// Open for editing
const session = await converter.openDocument(docx, 'docx');
console.log(`Opened ${session.documentType} document with ${session.pageCount} pages`);

// Get current structure
const structure = await converter.editorOperation(session.sessionId, 'getStructure');
console.log('Structure:', structure.data);

// Make modifications...
// (specific operations depend on document type)

// Close and save
const modified = await converter.closeDocument(session.sessionId);
if (modified) {
  fs.writeFileSync('output.docx', modified);
}

await converter.destroy();

Browser Document Preview API

The browser converter provides additional convenience methods for rendering.

Get Document Info (Browser)

import { WorkerBrowserConverter, createWasmPaths } from '@matbee/libreoffice-converter/browser';

const converter = new WorkerBrowserConverter({
  ...createWasmPaths('/wasm/'),
  browserWorkerJs: '/dist/browser-worker.js',
});
await converter.initialize();

const docInfo = await converter.getDocumentInfo(fileBuffer, 'document.docx');

Get LibreOffice Info

const lokInfo = await converter.getLokInfo();
console.log(lokInfo);
// {
//   version: "24.8.0.0.alpha0...",
//   buildInfo: "..."
// }

Example: Document Thumbnail Gallery

import { WorkerBrowserConverter, createWasmPaths } from '@matbee/libreoffice-converter/browser';

const converter = new WorkerBrowserConverter({
  ...createWasmPaths('/wasm/'),
  browserWorkerJs: '/dist/browser-worker.js',
});
await converter.initialize();

async function renderThumbnails(fileBuffer: Uint8Array, filename: string) {
  const docInfo = await converter.getDocumentInfo(fileBuffer, filename);
  const thumbnails: string[] = [];

  for (let i = 0; i < docInfo.pageCount; i++) {
    const pageData = await converter.renderSinglePage(fileBuffer, filename, {
      pageIndex: i,
      dpi: 72,  // Low DPI for thumbnails
    });

    const blob = new Blob([pageData], { type: 'image/png' });
    thumbnails.push(URL.createObjectURL(blob));
  }

  return thumbnails;
}

Configuration

PDF Options

const result = await converter.convert(input, {
  outputFormat: 'pdf',
  pdf: {
    // PDF/A compliance level
    pdfaLevel: 'PDF/A-2b',  // 'PDF/A-1b', 'PDF/A-2b', 'PDF/A-3b'
    
    // Image quality (0-100)
    quality: 90,
  },
});

Image Options

const result = await converter.convert(input, {
  outputFormat: 'png',
  image: {
    width: 1920,
    height: 1080,
    dpi: 150,
  },
});

Password-Protected Documents

const result = await converter.convert(encryptedDoc, {
  outputFormat: 'pdf',
  password: 'document-password',
});

Examples

Convert DOCX to PDF

import { createConverter } from '@matbee/libreoffice-converter';
import fs from 'fs';

const converter = await createConverter({ wasmPath: './wasm' });

const docx = fs.readFileSync('report.docx');
const pdf = await converter.convert(docx, { outputFormat: 'pdf' });

fs.writeFileSync('report.pdf', pdf.data);
await converter.destroy();

Batch Conversion

import { createConverter } from '@matbee/libreoffice-converter';
import fs from 'fs';
import path from 'path';

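const converter = await createConverter({ wasmPath: './wasm' });

// writeFileSync fails if the output directory doesn't exist yet
fs.mkdirSync('./output', { recursive: true });

const files = fs.readdirSync('./documents')
  .filter(f => f.toLowerCase().endsWith('.docx'));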

for (const file of files) {
  const input = fs.readFileSync(path.join('./documents', file));
  const result = await converter.convert(input, { outputFormat: 'pdf' }, file);
  fs.writeFileSync(
    path.join('./output', result.filename),
    result.data
  );
  console.log(`Converted: ${file} -> ${result.filename}`);
}

await converter.destroy();

Express.js Server

import express from 'express';
import multer from 'multer';
import { createWorkerConverter, isConversionSupported } from '@matbee/libreoffice-converter';

const app = express();
const upload = multer(); // memory storage, so req.file.buffer is available

// Initialize before accepting requests (worker converter keeps the event loop free)
const converter = await createWorkerConverter({ wasmPath: './wasm' });
console.log('Converter ready');

app.post('/convert', upload.single('file'), async (req, res) => {
  if (!req.file) {
    return res.status(400).json({ error: 'No file uploaded (expected field "file")' });
  }
  try {
    const inputFormat = req.file.originalname.split('.').pop()?.toLowerCase() ?? '';
    const outputFormat = req.body.format || 'pdf';

    // Validate conversion before attempting
    if (!isConversionSupported(inputFormat, outputFormat)) {
      return res.status(400).json({
        error: `Cannot convert ${inputFormat} to ${outputFormat}`,
      });
    }

    const result = await converter.convert(
      req.file.buffer,
      { outputFormat },
      req.file.originalname
    );

    res.set('Content-Type', result.mimeType);
    res.set('Content-Disposition', `attachment; filename="${result.filename}"`);
    res.send(Buffer.from(result.data));
  } catch (err) {
    res.status(500).json({ error: err instanceof Error ? err.message : String(err) });
  }
});

app.listen(3000, () => console.log('Server running on port 3000'));
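The server interpolates `result.filename`, which derives from the uploaded file's name, directly into `Content-Disposition`; quotes or non-ASCII characters in that name can produce a malformed header. A hedged sketch of a safer value, following RFC 6266 with an RFC 5987 `filename*` parameter. `contentDisposition` is a hypothetical helper; Express's own `res.attachment(filename)` is an alternative (note that it also sets `Content-Type` from the extension).

```typescript
// Hypothetical helper: an attachment header value that survives any filename.
function contentDisposition(filename: string): string {
  // ASCII fallback for old clients: printable ASCII only, quotes and backslashes replaced.
  const fallback = filename.replace(/[^\x20-\x7e]|["\\]/g, '_');
  // RFC 5987 ext-value: percent-encoded UTF-8. encodeURIComponent leaves
  // ' ( ) * unescaped, but they are not allowed there, so encode them too.
  const encoded = encodeURIComponent(filename).replace(
    /['()*]/g,
    (c) => '%' + c.charCodeAt(0).toString(16).toUpperCase(),
  );
  return `attachment; filename="${fallback}"; filename*=UTF-8''${encoded}`;
}

// In the route: res.set('Content-Disposition', contentDisposition(result.filename));
```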

React Component

import { useState, useEffect, useRef } from 'react';
import { WorkerBrowserConverter, createWasmPaths } from '@matbee/libreoffice-converter/browser';

function DocumentConverter() {
  const converterRef = useRef<WorkerBrowserConverter | null>(null);
  const [status, setStatus] = useState('Loading...');
  const [progress, setProgress] = useState(0);
  const [ready, setReady] = useState(false);

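  useEffect(() => {
    let cancelled = false;
    const converter = new WorkerBrowserConverter({
      ...createWasmPaths('/wasm/'),
      browserWorkerJs: '/dist/browser-worker.js',
      onProgress: (info) => {
        if (cancelled) return;
        setProgress(info.percent);
        setStatus(info.message);
      },
    });

    converter.initialize()
      .then(() => {
        if (cancelled) return; // unmounted while loading
        converterRef.current = converter;
        setReady(true);
        setStatus('Ready');
      })
      .catch((err) => {
        if (!cancelled) setStatus(`Error: ${err instanceof Error ? err.message : 'Unknown error'}`);
      });

    // Destroy this effect's own converter, even if initialization is still running
    return () => {
      cancelled = true;
      converterRef.current = null;
      converter.destroy();
    };
  }, []);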

  const handleFile = async (e: React.ChangeEvent<HTMLInputElement>) => {
    const file = e.target.files?.[0];
    if (!file || !converterRef.current) return;

    setStatus('Converting...');
    try {
      const arrayBuffer = await file.arrayBuffer();
      const result = await converterRef.current.convert(
        new Uint8Array(arrayBuffer),
        { outputFormat: 'pdf' },
        file.name
      );

      // Download the result
      const blob = new Blob([result.data], { type: result.mimeType });
      const url = URL.createObjectURL(blob);
      const a = document.createElement('a');
      a.href = url;
      a.download = result.filename;
      a.click();
      URL.revokeObjectURL(url);

      setStatus('Done!');
    } catch (err) {
      setStatus(`Error: ${err instanceof Error ? err.message : 'Unknown error'}`);
    }
  };

  return (
    <div>
      <h2>Document Converter</h2>
      <p>Status: {status}</p>
      <progress value={progress} max={100} />
      <input
        type="file"
        onChange={handleFile}
        accept=".doc,.docx,.odt,.rtf,.xls,.xlsx,.ppt,.pptx"
        disabled={!ready}
      />
    </div>
  );
}

Troubleshooting

Common Issues

"WASM module not found"

Ensure the wasm/ directory (the one wasmPath points to) contains all of these files:

soffice.wasm
soffice.cjs
soffice.data
soffice.worker.cjs
loader.cjs
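In Node.js, a missing file is easier to diagnose if you check the directory before calling createConverter. The sketch below does that with node:fs; findMissingWasmFiles is a hypothetical helper, not part of the package, and its file list simply mirrors the one above.

```typescript
import { existsSync } from 'node:fs';
import { join } from 'node:path';

// File list copied from the README's "WASM module not found" section.
const REQUIRED_WASM_FILES = [
  'soffice.wasm',
  'soffice.cjs',
  'soffice.data',
  'soffice.worker.cjs',
  'loader.cjs',
] as const;

// Hypothetical helper (not a package export): returns the required
// files that do not exist inside `wasmDir`.
export function findMissingWasmFiles(wasmDir: string): string[] {
  return REQUIRED_WASM_FILES.filter((name) => !existsSync(join(wasmDir, name)));
}

// Usage before initializing the converter:
// const missing = findMissingWasmFiles('./node_modules/@matbee/libreoffice-converter/wasm');
// if (missing.length > 0) {
//   throw new Error(`Missing WASM files: ${missing.join(', ')}`);
// }
```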

"SharedArrayBuffer is not defined" (Browser)

SharedArrayBuffer is only available on cross-origin isolated pages. Serve your page with these two response headers:

Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp
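How you set the headers depends on your server. As a minimal sketch, assuming a plain Node.js http server (ISOLATION_HEADERS, applyIsolationHeaders and createIsolatedServer are hypothetical names, not package exports):

```typescript
import { createServer, IncomingMessage, ServerResponse, Server } from 'node:http';

// The two headers that make a page cross-origin isolated.
export const ISOLATION_HEADERS: Readonly<Record<string, string>> = {
  'Cross-Origin-Opener-Policy': 'same-origin',
  'Cross-Origin-Embedder-Policy': 'require-corp',
};

// Anything with setHeader(), e.g. http.ServerResponse or an Express `res`.
export interface HeaderSink {
  setHeader(name: string, value: string): unknown;
}

export function applyIsolationHeaders(res: HeaderSink): void {
  for (const [name, value] of Object.entries(ISOLATION_HEADERS)) {
    res.setHeader(name, value);
  }
}

// Wraps a request handler so every response carries the headers.
// Serving the page and WASM files is left to `handler`.
export function createIsolatedServer(
  handler: (req: IncomingMessage, res: ServerResponse) => void,
): Server {
  return createServer((req, res) => {
    applyIsolationHeaders(res);
    handler(req, res);
  });
}
```

With Express, the same helper fits into a middleware: app.use((req, res, next) => { applyIsolationHeaders(res); next(); }). In the browser, self.crossOriginIsolated is true once both headers take effect. Note that require-corp also blocks cross-origin subresources unless they opt in via CORS or a Cross-Origin-Resource-Policy header.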

Browser initialization seems slow

Browser initialization downloads ~240MB of WASM files, so how long it takes depends mostly on network speed. The cost is paid once per converter instance. After initialization:

Conversions take 1-5 seconds depending on document size
Reuse the converter instance for multiple conversions
The browser can cache the WASM files (subject to your server's cache headers), so later page loads skip most of the download

For servers (Node.js), initialization is much faster (~1-2s) because the files load from disk.
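Reusing one instance is easiest when every caller goes through a shared, lazily created promise. A minimal sketch; lazySingleton is a hypothetical helper, and the usage comment assumes the WorkerBrowserConverter options shown in the React example above.

```typescript
// Hypothetical helper: the first call runs `factory`; every later call
// returns the same promise. If the factory rejects, the cache is cleared
// so the next call can retry.
export function lazySingleton<T>(factory: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | null = null;
  return () => {
    if (!pending) {
      pending = factory().catch((err: unknown) => {
        pending = null; // allow a retry on the next call
        throw err;
      });
    }
    return pending;
  };
}

// Usage (assumes the browser entry point):
// const getConverter = lazySingleton(async () => {
//   const converter = new WorkerBrowserConverter({
//     ...createWasmPaths('/wasm/'),
//     browserWorkerJs: '/dist/browser-worker.js',
//   });
//   await converter.initialize();
//   return converter;
// });
// const result = await (await getConverter()).convert(bytes, { outputFormat: 'pdf' }, 'a.docx');
```

Caching the promise rather than the resolved instance means callers that arrive while initialization is still running share the in-flight load instead of starting a second download.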

Memory issues

The WASM module uses ~1GB of RAM (set by the TOTAL_MEMORY build setting). For memory-constrained environments:

Call converter.destroy() once a batch of conversions is finished
Run conversions one at a time instead of in parallel
Consider running conversions in a separate process (see createSubprocessConverter)
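One way to avoid parallel conversions without restructuring calling code is to funnel every convert() call through a promise queue. A minimal sketch; createSerialQueue is a hypothetical helper, and the usage comment assumes a converter created as in the examples above plus a hypothetical files array.

```typescript
// Hypothetical helper: runs async tasks one at a time, in submission
// order. A rejected task does not block the tasks queued behind it.
export function createSerialQueue() {
  let tail: Promise<unknown> = Promise.resolve();
  return function run<T>(task: () => Promise<T>): Promise<T> {
    const result = tail.then(task);
    // Swallow errors on the chain only; callers still see the rejection.
    tail = result.catch(() => undefined);
    return result;
  };
}

// Usage (hypothetical `files` array of { name, data } objects):
// const run = createSerialQueue();
// const results = await Promise.all(
//   files.map((f) => run(() => converter.convert(f.data, { outputFormat: 'pdf' }, f.name))),
// );
```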

Reduce transfer size

Pre-compress the two largest WASM files to shrink the download (about 79% smaller with Brotli):

# Brotli (best - 40MB total)
brotli -9 wasm/soffice.wasm -o wasm/soffice.wasm.br
brotli -9 wasm/soffice.data -o wasm/soffice.data.br

# Gzip (63MB total)
gzip -9 -k wasm/soffice.wasm
gzip -9 -k wasm/soffice.data

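Configure your server to serve the .br or .gz file when the request's Accept-Encoding allows it, with Content-Encoding set to br or gzip and the original Content-Type kept (application/wasm for soffice.wasm).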
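A sketch of the selection logic, assuming the .br and .gz files sit next to the originals as produced by the commands above. pickPrecompressed is hypothetical; the caller supplies the set of files that exist and streams the chosen path with the returned headers.

```typescript
export interface CompressedChoice {
  path: string;                    // file to send
  headers: Record<string, string>; // headers to send with it
}

// Hypothetical helper: prefers Brotli, then gzip, then the raw file.
// For brevity, q-values (e.g. "br;q=0") are ignored.
export function pickPrecompressed(
  originalPath: string,
  acceptEncoding: string | undefined,
  available: ReadonlySet<string>, // paths of the variants that exist on disk
): CompressedChoice {
  const contentType = originalPath.endsWith('.wasm')
    ? 'application/wasm'
    : 'application/octet-stream';
  const accepted = (acceptEncoding ?? '')
    .split(',')
    .map((part) => part.split(';')[0].trim().toLowerCase());

  // Vary tells caches that the response depends on Accept-Encoding.
  const base = { 'Content-Type': contentType, Vary: 'Accept-Encoding' };

  if (accepted.includes('br') && available.has(`${originalPath}.br`)) {
    return { path: `${originalPath}.br`, headers: { ...base, 'Content-Encoding': 'br' } };
  }
  if (accepted.includes('gzip') && available.has(`${originalPath}.gz`)) {
    return { path: `${originalPath}.gz`, headers: { ...base, 'Content-Encoding': 'gzip' } };
  }
  return { path: originalPath, headers: base };
}
```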

Build fails with "out of memory"

Reduce parallel jobs:

BUILD_JOBS=4 ./build/build-wasm.sh

Process Doesn't Exit (Node.js)

The WASM module uses pthread workers that keep the Node.js process alive after your script's work is done. Solutions:

// Option 1: Scripts - destroy the converter, then exit explicitly
try {
  // ... run conversions ...
} finally {
  await converter.destroy();
}
process.exit(0);

// Option 2: Servers - no action needed. The process is meant to stay
// alive, and the workers are reused for subsequent conversions.

Debug Mode

Enable verbose logging:

const converter = await createConverter({
  wasmPath: './wasm',
  verbose: true,  // Shows LibreOffice internal logs
});

Error Codes

Code	Description
`WASM_NOT_INITIALIZED`	Module not loaded or initialized
`INVALID_INPUT`	Empty or invalid input document
`UNSUPPORTED_FORMAT`	Format not supported
`CORRUPTED_DOCUMENT`	Cannot parse input document
`PASSWORD_REQUIRED`	Document is encrypted
`CONVERSION_FAILED`	Generic conversion error
`LOAD_FAILED`	Could not load document
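To turn these codes into user-facing messages, a lookup table keyed by code is enough. This sketch assumes thrown errors carry the code in a string code property; check the package's error type before relying on that. describeConversionError and the message wording are hypothetical.

```typescript
// Codes from the table above.
export type ConversionErrorCode =
  | 'WASM_NOT_INITIALIZED'
  | 'INVALID_INPUT'
  | 'UNSUPPORTED_FORMAT'
  | 'CORRUPTED_DOCUMENT'
  | 'PASSWORD_REQUIRED'
  | 'CONVERSION_FAILED'
  | 'LOAD_FAILED';

// Hypothetical wording; adjust for your UI.
const MESSAGES: Record<ConversionErrorCode, string> = {
  WASM_NOT_INITIALIZED: 'The converter is still loading. Try again shortly.',
  INVALID_INPUT: 'The file is empty or not a valid document.',
  UNSUPPORTED_FORMAT: 'This file type cannot be converted.',
  CORRUPTED_DOCUMENT: 'The document appears to be damaged.',
  PASSWORD_REQUIRED: 'The document is password-protected.',
  CONVERSION_FAILED: 'The conversion failed.',
  LOAD_FAILED: 'The document could not be opened.',
};

// ASSUMPTION: errors expose the code as a string `code` property.
export function describeConversionError(err: unknown): string {
  const code =
    typeof err === 'object' && err !== null && 'code' in err
      ? (err as { code: unknown }).code
      : undefined;
  // hasOwnProperty avoids matching inherited keys such as "toString"
  if (typeof code === 'string' && Object.prototype.hasOwnProperty.call(MESSAGES, code)) {
    return MESSAGES[code as ConversionErrorCode];
  }
  return err instanceof Error ? err.message : 'Unknown error';
}
```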

Performance

Benchmarks

Node.js (filesystem-based):

Operation	Time
First initialization	~1s
DOCX → PDF	~100ms (first), ~35ms (subsequent)
XLSX → PDF	~65ms (first), ~35ms (subsequent)
PPTX → PDF	~290ms (first), ~250ms (subsequent)

Browser (Chromium, local server):

Operation	Time
WASM download (~240MB)	5-30s (depends on network)
LibreOfficeKit initialization	~2.5s
DOCX → PDF	~95ms
XLSX → PDF	~85ms
PPTX → PDF	~305ms

Note: Browser initialization time depends heavily on network speed for the initial WASM download. The browser can cache the ~240MB of WASM files after the first load. Node.js loads them from the filesystem, so its initialization is much faster.

Benchmarks measured on Node.js v22 / Chromium with 20KB DOCX, 5KB XLSX, and 937KB PPTX test files.
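To reproduce numbers like these on your own files, time a few sequential conversions and keep the first (cold) run separate. A Node.js sketch; timeRuns is hypothetical, and the usage comment assumes an initialized converter and a local test.docx.

```typescript
import { performance } from 'node:perf_hooks';

// Hypothetical helper: runs `op` sequentially `runs` times and reports the
// first duration and the mean of the rest, in milliseconds, matching the
// "first / subsequent" columns above.
export async function timeRuns(
  op: () => Promise<unknown>,
  runs: number,
): Promise<{ first: number; subsequentAvg: number }> {
  if (runs < 1) throw new RangeError('runs must be at least 1');
  const durations: number[] = [];
  for (let i = 0; i < runs; i += 1) {
    const start = performance.now();
    await op();
    durations.push(performance.now() - start);
  }
  const rest = durations.slice(1);
  const subsequentAvg = rest.length
    ? rest.reduce((sum, d) => sum + d, 0) / rest.length
    : Number.NaN; // no subsequent runs to average
  return { first: durations[0], subsequentAvg };
}

// Usage:
// const input = fs.readFileSync('test.docx');
// const t = await timeRuns(() => converter.convert(input, { outputFormat: 'pdf' }, 'test.docx'), 5);
// console.log(`first ${t.first.toFixed(0)}ms, subsequent ~${t.subsequentAvg.toFixed(0)}ms`);
```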

Optimization Tips

Reuse converter instances - Initialization cost is paid only once
Pre-initialize - Start loading during idle time or page load
Server keep-warm - In production, keep converter processes alive
Use Web Workers - Keep UI responsive (browser)
Enable Brotli compression - Reduces transfer size by 79% (192MB → 40MB)
Cache WASM files - Send long-lived Cache-Control headers so the browser keeps the files after the first load
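For the pre-initialize tip, one option in the browser is to start loading when the main thread is idle. A sketch using requestIdleCallback with a setTimeout fallback where it is unavailable; whenIdle is hypothetical, and the usage comment assumes a WorkerBrowserConverter created as in the React example.

```typescript
// Minimal typing so this compiles with or without the DOM lib.
type IdleScheduler = (cb: () => void, opts?: { timeout: number }) => number;

// Hypothetical helper: runs `task` when the main thread is idle and
// resolves with its result. Falls back to setTimeout where
// requestIdleCallback is missing (including Node.js).
export function whenIdle<T>(task: () => T | Promise<T>, timeoutMs = 2000): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    // Promise.resolve().then(task) also turns a synchronous throw into a rejection.
    const run = () => {
      Promise.resolve().then(task).then(resolve, reject);
    };
    const ric = (globalThis as unknown as { requestIdleCallback?: IdleScheduler })
      .requestIdleCallback;
    if (typeof ric === 'function') {
      // `timeout` makes the callback run even if the page never goes idle.
      ric.call(globalThis, run, { timeout: timeoutMs });
    } else {
      setTimeout(run, 0);
    }
  });
}

// Usage (assumes `converter` was constructed at page load):
// whenIdle(() => converter.initialize()).catch((err) => console.error(err));
```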

License

This project is licensed under the Mozilla Public License 2.0 (MPL-2.0), the same license as LibreOffice.

Dependencies

LibreOffice - MPL-2.0
Emscripten - MIT

Contributing

Contributions are welcome! Please read our contributing guidelines before submitting PRs.

Development

# Clone and setup
git clone https://github.com/matbeedotcom/libreoffice-document-converter.git
cd libreoffice-document-converter
npm install

# Build
npm run build

# Test
npm test

# Lint and auto-fix
npm run lint:fix
