unpdf

A collection of utilities to work with PDFs. Uses Mozilla's PDF.js under the hood.

unpdf takes advantage of export conditions to circumvent build issues in serverless environments. For example, PDF.js depends on the optional canvas module, which doesn't work inside worker threads.

This library is also intended as a modern alternative to the unmaintained pdf-parse.
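
To illustrate the export conditions mentioned above, here is a simplified sketch of how a runtime or bundler might pick an entry point from a conditional exports map. The map contents and file paths are hypothetical and are not taken from unpdf's actual package.json; `resolveExport` is an illustrative name, and real resolvers handle more cases (nested conditions, subpaths).

```typescript
// Hypothetical conditional-exports map. The file paths are illustrative and
// are NOT taken from unpdf's package.json.
type ExportsMap = Record<string, string>

const exampleExports: ExportsMap = {
  worker: './dist/index.worker.mjs',
  browser: './dist/index.browser.mjs',
  node: './dist/index.node.mjs',
  default: './dist/index.mjs',
}

// Simplified resolution: walk the keys in declaration order and return the
// first target whose condition is active. "default" always matches.
function resolveExport(
  exportsMap: ExportsMap,
  activeConditions: string[]
): string | undefined {
  const active = new Set(activeConditions)
  for (const [condition, target] of Object.entries(exportsMap)) {
    if (condition === 'default' || active.has(condition)) {
      return target
    }
  }
  return undefined
}
```

Because key order decides priority in this sketch, the more specific conditions are listed before `default`.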

Features

🏗️ Conditional exports for Browser, Node and worker environments
💬 Extract text from PDFs
🧱 Use a custom PDF.js build

Installation

Run one of the following commands to add unpdf to your project.

# pnpm
pnpm add -D unpdf

# npm
npm install -D unpdf

# yarn
yarn add -D unpdf

Usage

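import { decodePDFText } from 'unpdf'

// Download the PDF and read the response body as an ArrayBuffer
const pdfBuffer = await fetch('https://example.com/file.pdf').then(res => res.arrayBuffer())

// Extract the text; `mergePages: true` merges the text of all pages into a single string
const { totalPages, info, metadata, text } = await decodePDFText(
  new Uint8Array(pdfBuffer), { mergePages: true }
)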
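
The snippet above does not check whether the download succeeded. As a minimal sketch, a small wrapper could reject failed responses and bodies that do not look like a PDF before they reach `decodePDFText`. `fetchPdfBytes`, `FetchLike` and `ResponseLike` are hypothetical names and not part of unpdf; the fetch function is passed in so it can be stubbed.

```typescript
// Minimal structural type for the parts of a fetch response used here, so a
// stub can stand in for the global `fetch` in tests.
interface ResponseLike {
  ok: boolean
  status: number
  arrayBuffer(): Promise<ArrayBuffer>
}
type FetchLike = (url: string) => Promise<ResponseLike>

// "%PDF-" in ASCII. PDF files normally start with this signature.
const PDF_SIGNATURE = [0x25, 0x50, 0x44, 0x46, 0x2d]

// Hypothetical helper (not part of unpdf): downloads a PDF and returns its
// bytes, throwing on HTTP errors or when the signature is missing.
async function fetchPdfBytes(
  url: string,
  fetchImpl: FetchLike
): Promise<Uint8Array> {
  const res = await fetchImpl(url)
  if (!res.ok) {
    throw new Error(`Failed to fetch ${url}: HTTP ${res.status}`)
  }
  const bytes = new Uint8Array(await res.arrayBuffer())
  const looksLikePdf = PDF_SIGNATURE.every((byte, i) => bytes[i] === byte)
  if (!looksLikePdf) {
    throw new Error(`Response from ${url} does not start with %PDF-`)
  }
  return bytes
}
```

In an environment with a global `fetch`, the result of `fetchPdfBytes(url, fetch)` could be handed to `decodePDFText` in place of `new Uint8Array(pdfBuffer)`.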

Config

interface UnPDFConfiguration {
  /**
   * By default, UnPDF will use the latest version of PDF.js. If you want to
   * use an older version or the legacy build, provide a function that returns
   * a promise resolving to the PDF.js module.
   *
   * @example
   * () => import('pdfjs-dist/legacy/build/pdf.js')
   */
  pdfjs?: () => typeof PDFJS
}
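
The `pdfjs` option takes a function rather than the module itself, which allows the import to be deferred until PDF.js is actually needed. The following is only a sketch of how such a loader function could be resolved once and shared between callers. It makes no claim about unpdf's internals, and `createLazyLoader` is a hypothetical name.

```typescript
// Hypothetical helper: wraps a loader so it runs at most once and every
// caller shares the same promise.
function createLazyLoader<T>(load: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined
  return () => {
    if (!cached) {
      cached = load()
    }
    return cached
  }
}

// Possible usage with the loader from the @example above (left as a comment
// because it requires pdfjs-dist to be installed):
// const getPdfjs = createLazyLoader(() => import('pdfjs-dist/legacy/build/pdf.js'))
```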

Methods

`defineUnPDFConfig`

function defineUnPDFConfig({ pdfjs }: UnPDFConfiguration): Promise<void>

`decodePDFText`

interface PDFContent {
  totalPages: number
  info?: Record<string, any>
  metadata?: any
  text: string | string[]
}

function decodePDFText(
  data: ArrayBuffer,
  { mergePages }?: { mergePages?: boolean }
): Promise<PDFContent>
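
Since `text` is typed as `string | string[]`, calling code has to handle both shapes. A minimal sketch of two normalizing helpers follows; `toPages` and `toMergedText` are hypothetical names, not part of unpdf, and the interface is repeated here only to keep the example self-contained.

```typescript
// Shape copied from the PDFContent interface above.
interface PDFContent {
  totalPages: number
  info?: Record<string, any>
  metadata?: any
  text: string | string[]
}

// Hypothetical helper: always returns an array, wrapping a merged string
// in a single-element array.
function toPages(content: PDFContent): string[] {
  return Array.isArray(content.text) ? content.text : [content.text]
}

// Hypothetical helper: always returns one string, joining per-page text
// with the given separator.
function toMergedText(content: PDFContent, separator = '\n'): string {
  return Array.isArray(content.text)
    ? content.text.join(separator)
    : content.text
}
```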

`getImagesFromPage`

function getImagesFromPage(
  data: ArrayBuffer,
  pageNumber: number
): Promise<ArrayBuffer[]>
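
`getImagesFromPage` works on one page at a time, so collecting the images of a whole document needs a loop. The sketch below takes the function as a parameter that matches the signature above, which keeps the example independent of the package. `collectImages` is a hypothetical helper, and it assumes page numbers start at 1 (as in PDF.js), which the signature itself does not state.

```typescript
// Matches the getImagesFromPage signature shown above.
type GetImagesFromPage = (
  data: ArrayBuffer,
  pageNumber: number
) => Promise<ArrayBuffer[]>

// Hypothetical helper: gathers images page by page and returns only the
// pages that contain at least one image.
async function collectImages(
  getImages: GetImagesFromPage,
  data: ArrayBuffer,
  totalPages: number
): Promise<Map<number, ArrayBuffer[]>> {
  const imagesByPage = new Map<number, ArrayBuffer[]>()
  // Assumption: page numbers are 1-based.
  for (let page = 1; page <= totalPages; page++) {
    // Pages are processed sequentially rather than in parallel.
    const images = await getImages(data, page)
    if (images.length > 0) {
      imagesByPage.set(page, images)
    }
  }
  return imagesByPage
}
```

The `totalPages` value could come from the result of `decodePDFText`.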
