pdfjs-serverless
A redistribution of Mozilla's PDF.js for serverless environments such as Deno Deploy and Cloudflare Workers, with zero dependencies. All named exports of the PDF.js library are available, and the minified bundle is roughly 1.4 MB.
Installation
Run the following command to add pdfjs-serverless to your project.
# pnpm
pnpm add pdfjs-serverless
# npm
npm install pdfjs-serverless
# yarn
yarn add pdfjs-serverless
How It Works
First, some string replacements in the PDF.js library are necessary, i.e. removing browser context references and checks like typeof window. Additionally, we enforce Node.js compatibility (this might sound paradoxical at first, bear with me), i.e. mocking the canvas module and setting the isNodeJS flag to true.
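To make the idea of build-time string replacements concrete, here is a minimal sketch. The patterns, the `Replacement` type, and the `patchSource` helper are assumptions made up for illustration; the rules the package actually applies live in its rollup.config.ts and may look quite different.

```typescript
// Illustrative rules only: these names and patterns are assumptions,
// not the ones used by the package's actual build.
type Replacement = { pattern: RegExp; value: string }

const replacements: Replacement[] = [
  // Make `typeof window` checks behave as if no browser global exists.
  { pattern: /typeof window\b/g, value: '"undefined"' },
  // Force the Node.js code path regardless of runtime detection.
  { pattern: /const isNodeJS\s*=[^;]*;/g, value: 'const isNodeJS = true;' },
]

// Apply every rule in order to the given source text.
function patchSource(source: string): string {
  return replacements.reduce(
    (code, { pattern, value }) => code.replace(pattern, value),
    source,
  )
}

const input = 'const isNodeJS = detect(); if (typeof window !== "undefined") { init(); }'
console.log(patchSource(input))
// → const isNodeJS = true; if ("undefined" !== "undefined") { init(); }
```

After patching, the browser check compares two identical string literals and is always false, so a minifier can drop the browser-only branch.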
PDF.js uses a worker to parse and work with PDF documents. This worker is a separate file that is loaded by the main library. For the serverless build, we need to inline the worker code into the main library.
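One way such inlining could be done at build time is to swap the dynamic worker import for a statically imported binding, so that the bundler pulls the worker into the same file. The sketch below assumes this approach; the binding name `__inlinedWorker__`, the `inlineWorkerImport` helper, and the matched import pattern are all hypothetical and not taken from the package's build.

```typescript
// Hypothetical binding name chosen for this sketch.
const WORKER_BINDING = '__inlinedWorker__'

// Rewrite a dynamic worker import into a reference to a static import.
function inlineWorkerImport(mainSource: string, workerModule: string): string {
  // Replace `await import(<something with workerSrc>)` with the binding.
  const patched = mainSource.replace(
    /await import\([^)]*workerSrc[^)]*\)/g,
    WORKER_BINDING,
  )
  // Prepend the static import so the worker code ends up in the same bundle.
  return `import * as ${WORKER_BINDING} from ${JSON.stringify(workerModule)}\n${patched}`
}

console.log(
  inlineWorkerImport('const worker = await import(this.workerSrc)', './pdf.worker.mjs'),
)
```

With a static import in place, no second file has to be fetched at runtime, which matters on platforms that cannot load additional scripts.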
To achieve the final nodeless build, unenv does the heavy lifting by converting Node.js-specific code to be platform-agnostic. This ensures that Node.js built-in modules like fs are mocked.
See the rollup.config.ts file for more information.
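To illustrate what mocking a built-in module like fs means in practice, here is a small conceptual sketch. It is not unenv's implementation or API; `createMock` is a hypothetical helper that only shows the general idea of a stand-in object that can be accessed and called without throwing.

```typescript
// Conceptual stand-in for a Node.js built-in: any property access or call
// yields another stand-in, so code that touches the module does not throw.
function createMock(): any {
  const noop = () => {}
  return new Proxy(noop, {
    // Return undefined for symbols and `then` so the mock is not mistaken
    // for a promise or an iterable; everything else is another mock.
    get: (_target, prop) =>
      typeof prop === 'symbol' || prop === 'then' ? undefined : createMock(),
    // Calling the mock returns another mock instead of doing real work.
    apply: () => createMock(),
  })
}

const fs = createMock()
fs.readFileSync('missing.pdf') // no file system access, no exception
console.log(typeof fs.promises.readFile) // → function
```

A mock like this only keeps imports and incidental calls from failing. Code paths that depend on real file contents still need actual data, which is why the usage example passes the PDF bytes directly.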
Example Usage
🦕 Deno
import { getDocument } from 'https://esm.sh/pdfjs-serverless'

// Read the PDF into a Uint8Array
const data = Deno.readFileSync('dummy.pdf')
const doc = await getDocument(data).promise
console.log(await doc.getMetadata())

// Page numbers start at 1
for (let i = 1; i <= doc.numPages; i++) {
  const page = await doc.getPage(i)
  const textContent = await page.getTextContent()
  // Join the text fragments of the page into a single string
  const contents = textContent.items.map(item => item.str).join(' ')
  console.log(contents)
}
Inspiration
pdf.mjs, a nodeless build of PDF.js v2.
License
MIT License © 2023-PRESENT Johann Schopplich