webpage2pdf

A tool that converts web pages, HTML strings, Buffers, or Streams to PDF files. It can be run from the command line or called as a function, and it supports stream output and multi-page merging.

Installation

Option 1: Global Installation (Recommended)

Using npm:

npm install -g webpage2pdf

Or using pnpm:

pnpm add -g webpage2pdf

Then you can use it directly:

webpage2pdf https://www.example.com

Option 2: Local Installation

Using npm:

npm install webpage2pdf

# Run with npx
npx webpage2pdf https://www.example.com

Or using pnpm:

pnpm add webpage2pdf

# Run with pnpm exec
pnpm exec webpage2pdf https://www.example.com

Option 3: Install as Dependency

Using npm (`--save` is optional, since it has been the default behavior since npm 5):

npm install webpage2pdf --save

Or using pnpm:

pnpm add webpage2pdf

Usage

Method 1: Command Line

Basic Usage

# Convert a webpage to PDF (default filename: page title plus a YYYYMMDDHHmm timestamp)
webpage2pdf https://www.example.com
# Output: ./Example_202512251539.pdf

# Specify output path
webpage2pdf https://www.example.com -o ./my-pdf.pdf

# Specify page size
webpage2pdf https://www.example.com -s A4_PRINT

# Wait longer (5000 ms) so the page has time to load fully
webpage2pdf https://www.example.com -w 5000

# Wait for specific element
webpage2pdf https://www.example.com --selector "button"

Command Line Options

| Option | Short | Description | Default |
| --- | --- | --- | --- |
| `--output` | `-o` | Output file path | Uses page title (`document.title`) |
| `--size` | `-s` | Page size (A4, A4_PRINT, A3, LETTER) | `A4` |
| `--wait` | `-w` | Wait time (milliseconds) | `3000` |
| `--selector` | | Selector to wait for (e.g., `button`, `#content`) | None |
| `--header` | | Custom headers (format: `key:value`, can be used multiple times) | None |
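
The `key:value` format used by `--header` corresponds to the `headers` object accepted by `generatePdf`. The sketch below shows one way such strings could be turned into that object. `parseHeaderArgs` is a hypothetical helper, not part of webpage2pdf, and splitting on the first colon is an assumption made so that values containing colons (such as URLs) stay intact.

```javascript
// Hypothetical helper (not part of webpage2pdf): turns repeated
// --header "key:value" strings into a headers object.
function parseHeaderArgs(pairs) {
  const headers = {};
  for (const pair of pairs) {
    // Assumption: only the first colon separates key from value.
    const idx = pair.indexOf(':');
    if (idx <= 0) {
      throw new Error(`Invalid header "${pair}", expected key:value`);
    }
    const key = pair.slice(0, idx).trim();
    const value = pair.slice(idx + 1).trim();
    headers[key] = value;
  }
  return headers;
}

const headers = parseHeaderArgs([
  'Authorization:Bearer token',
  'X-Origin:https://example.com',
]);
console.log(headers);
```

The resulting object has the same shape as the `headers` option shown in the function-call examples.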

Method 2: Function Call

Basic Usage

const { generatePdf } = require('webpage2pdf');

// Basic usage
async function example() {
  const result = await generatePdf('https://www.example.com', './output.pdf');
  
  if (result.success) {
    console.log('PDF generated successfully:', result.path);
    console.log('File size:', result.size, 'bytes');
    console.log('Page title:', result.title);
  } else {
    console.error('Generation failed:', result.error);
  }
}

example();

Advanced Usage

const { generatePdf, PAGE_SIZE_CONFIG, setVerbose } = require('webpage2pdf');

// Disable verbose logging (recommended for function calls)
setVerbose(false);

async function advancedExample() {
  const result = await generatePdf('https://example.com', './output.pdf', {
    pageSize: 'A4_PRINT',        // Page size
    waitTime: 5000,              // Wait time (milliseconds)
    selector: '#content',         // Wait for specific element
    headers: {                   // Custom headers
      'Authorization': 'Bearer token',
      'X-Custom-Header': 'value'
    }
  });
  
  if (result.success) {
    console.log('Success:', result.path);
  }
}

advancedExample();

API Documentation

`generatePdf(input, outputPath, options)`

Convert a webpage or HTML to PDF.

Parameters:

- input (string|string[]|Buffer|Readable, required) - Input:
  - string - URL or HTML string
  - string[] - URL array (multi-page merge)
  - Buffer - HTML Buffer
  - Readable - HTML Stream
- outputPath (string|null, optional) - Output file path:
  - string - Save to file
  - null - Return Stream
- options (object, optional) - Configuration options:
  - pageSize (string) - Page size, options: A4, A4_PRINT, A3, LETTER, default: A4
  - waitTime (number) - Wait time (milliseconds), default: 3000
  - selector (string) - Selector to wait for (optional), default: null
  - headers (object) - Custom headers (optional), default: {}
  - margin (object) - Page margins, format: {top, right, bottom, left}, unit: mm, default: {top: '0mm', right: '0mm', bottom: '0mm', left: '0mm'}
  - scale (number) - Scale factor (0.1-2), default: 1
  - printBackground (boolean) - Print background, default: true
  - ignore (string|RegExp|Array) - Errors to ignore (string, regex, or array), default: []
  - debug (boolean) - Output debug information, default: false
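
To make the documented defaults concrete, here is a minimal sketch of how caller options could be merged over them. `resolveOptions` and `DEFAULT_OPTIONS` are hypothetical names, not exports of webpage2pdf. Merging `margin` side by side and throwing on an out-of-range `scale` are assumptions; the package may handle both differently.

```javascript
// Defaults as listed in the options documentation above.
const DEFAULT_OPTIONS = {
  pageSize: 'A4',
  waitTime: 3000,
  selector: null,
  headers: {},
  margin: { top: '0mm', right: '0mm', bottom: '0mm', left: '0mm' },
  scale: 1,
  printBackground: true,
  ignore: [],
  debug: false,
};

// Hypothetical helper (not part of webpage2pdf): merges caller options
// over the defaults and rejects a scale outside the documented 0.1-2 range.
function resolveOptions(options = {}) {
  const resolved = {
    ...DEFAULT_OPTIONS,
    ...options,
    // Assumption: a partial margin keeps the defaults for the other sides.
    margin: { ...DEFAULT_OPTIONS.margin, ...(options.margin || {}) },
  };
  if (!Number.isFinite(resolved.scale) || resolved.scale < 0.1 || resolved.scale > 2) {
    throw new RangeError('scale must be a number between 0.1 and 2');
  }
  return resolved;
}

const resolved = resolveOptions({ waitTime: 5000, margin: { top: '20mm' } });
console.log(resolved.waitTime, resolved.margin);
```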

Return Value:

{
  success: boolean,   // Whether successful
  path?: string,      // Output file path (when file output)
  stream?: Readable,  // PDF Stream (when stream output)
  size: number,       // File size (bytes)
  title?: string,     // Page title (when URL input)
  error?: string,     // Error message (when failed)
  ignored?: boolean   // Whether ignored (when error ignored)
}
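
Since most fields of the return value are optional, callers need to branch on which ones are present. The following sketch shows one way to do that. `describeResult` is a hypothetical helper, not part of webpage2pdf, and it checks `ignored` first because the documentation above does not say whether `success` is true or false when an error is ignored.

```javascript
// Hypothetical helper (not part of webpage2pdf): summarizes a result
// object shaped like the documented return value.
function describeResult(result) {
  if (result.ignored) {
    return `ignored error: ${result.error}`;
  }
  if (!result.success) {
    return `failed: ${result.error}`;
  }
  if (result.stream) {
    return 'ok: PDF available as a stream';
  }
  return `ok: ${result.path} (${result.size} bytes)`;
}

console.log(describeResult({ success: true, path: './output.pdf', size: 1024 }));
console.log(describeResult({ success: false, error: 'timeout', size: 0 }));
```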

`setVerbose(verbose)`

Set whether to output verbose logs.

Parameters:

verbose (boolean) - Whether to output logs, default: true

`PAGE_SIZE_CONFIG`

Page size configuration object containing all available page sizes.

Examples

Example 1: Convert Public Webpage

# Command line
webpage2pdf https://www.example.com -o example.pdf

// Function call (wrapped in an async function: CommonJS has no top-level await)
const { generatePdf } = require('webpage2pdf');
(async () => {
  await generatePdf('https://www.example.com', './example.pdf');
})();

Example 2: Convert Authenticated Page

# Command line
webpage2pdf https://api.example.com/page \
  --header "Authorization:Bearer token" \
  -o authenticated.pdf

// Function call
const { generatePdf } = require('webpage2pdf');

(async () => {
  await generatePdf('https://api.example.com/page', './authenticated.pdf', {
    headers: {
      'Authorization': 'Bearer token'
    }
  });
})();

Example 3: Wait for Dynamic Content

# Command line
webpage2pdf https://example.com/dynamic-page \
  --selector "#content" \
  -w 10000 \
  -o dynamic-page.pdf

// Function call
const { generatePdf } = require('webpage2pdf');

(async () => {
  await generatePdf('https://example.com/dynamic-page', './dynamic-page.pdf', {
    selector: '#content',
    waitTime: 10000
  });
})();

Example 4: Stream Output

// Function call - Return Stream
const { generatePdf } = require('webpage2pdf');
const fs = require('fs');

(async () => {
  // Passing null as outputPath returns the PDF as a Stream
  const result = await generatePdf('https://example.com', null);
  if (result.success) {
    result.stream.pipe(fs.createWriteStream('output.pdf'));
  }
})();

Example 5: HTML String Input

const { generatePdf } = require('webpage2pdf');

const html = `
<html>
  <head><title>Test</title></head>
  <body><h1>Hello World</h1></body>
</html>
`;

(async () => {
  const result = await generatePdf(html, './output.pdf');
  console.log(result.success ? 'PDF written' : result.error);
})();

Example 6: Multi-page Merge

const { generatePdf } = require('webpage2pdf');

const urls = [
  'https://example.com/page1',
  'https://example.com/page2',
  'https://example.com/page3'
];

// Merge multiple pages into one PDF
(async () => {
  const result = await generatePdf(urls, './combined.pdf', {
    pageSize: 'A4',
    waitTime: 5000
  });
})();

Note: The current multi-page merge simply concatenates the generated PDF Buffers. Concatenation does not rebuild the internal structure of the PDF, so the merged file may not open or display correctly. For professional PDF merging (preserving bookmarks, table of contents, etc.), it's recommended to:

Use pdf-lib or a similar library to implement the merge logic
Generate the PDFs separately first, then merge them with a dedicated tool

Example 7: Error Ignoring

const { generatePdf } = require('webpage2pdf');

(async () => {
  const result = await generatePdf('https://example.com', './output.pdf', {
    ignore: ['timeout', /network error/i],  // Ignore specific errors
    debug: true  // Enable debug mode
  });
})();
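
The `ignore` option accepts a string, a RegExp, or an array of either. As a rough illustration of what such matching could look like, the sketch below tests an error message against each pattern. `matchesIgnore` is a hypothetical helper, not part of webpage2pdf, and treating strings as case-sensitive substrings is an assumption about the matching rule.

```javascript
// Hypothetical helper (not part of webpage2pdf): checks an error message
// against an `ignore` value given as a string, a RegExp, or an array of both.
function matchesIgnore(message, ignore) {
  const patterns = Array.isArray(ignore) ? ignore : [ignore];
  return patterns.some((pattern) => {
    if (pattern instanceof RegExp) {
      return pattern.test(message);
    }
    // Assumption: string patterns match as case-sensitive substrings.
    return typeof pattern === 'string' && pattern !== '' && message.includes(pattern);
  });
}

const ignore = ['timeout', /network error/i];
console.log(matchesIgnore('Navigation timeout of 30000 ms exceeded', ignore)); // true
console.log(matchesIgnore('Network Error: connection reset', ignore));         // true
console.log(matchesIgnore('Invalid URL', ignore));                             // false
```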

Example 8: Custom Margins and Scaling

const { generatePdf } = require('webpage2pdf');

(async () => {
  const result = await generatePdf('https://example.com', './output.pdf', {
    margin: { top: '20mm', right: '15mm', bottom: '20mm', left: '15mm' },
    scale: 0.9,              // Scale to 90%
    printBackground: true    // Print background
  });
})();

Example 9: Batch Conversion

const { generatePdf, setVerbose } = require('webpage2pdf');

// Disable verbose logging
setVerbose(false);

const urls = [
  'https://example.com/page1',
  'https://example.com/page2',
  'https://example.com/page3'
];

async function batchConvert() {
  for (const url of urls) {
    const result = await generatePdf(url, `./${Date.now()}.pdf`);
    console.log(result.success ? '✓' : '✗', url);
  }
}

batchConvert();
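
File names based on `Date.now()` do not show which URL produced which file. One option is to derive the output path from the URL itself, as sketched below. `outputPathFor` is a hypothetical helper, not part of webpage2pdf, and the naming scheme is only one possible choice.

```javascript
// Hypothetical helper (not part of webpage2pdf): builds a readable,
// filesystem-safe output path from a URL and its position in the batch.
function outputPathFor(url, index) {
  const { hostname, pathname } = new URL(url);
  const slug = `${hostname}${pathname}`
    .replace(/[^a-zA-Z0-9.-]+/g, '-') // replace slashes and other unsafe characters
    .replace(/^-+|-+$/g, '');         // trim leading and trailing dashes
  const prefix = String(index + 1).padStart(2, '0');
  return `./${prefix}-${slug}.pdf`;
}

console.log(outputPathFor('https://example.com/page1', 0)); // ./01-example.com-page1.pdf
```

Inside the loop, the returned path can be passed as the second argument to `generatePdf` in place of the timestamp-based name.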

Supported Page Sizes

A4: 210mm × 297mm (Standard A4)
A4_PRINT: 216mm × 291mm (A4 Print Size)
A3: 297mm × 420mm (A3)
LETTER: 8.5in × 11in (US Letter)
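
The sizes above mix millimeters and inches. The sketch below converts them to a single unit for comparison. `PAGE_SIZES_MM` is an illustrative table built from the list above; it is not the package's `PAGE_SIZE_CONFIG`, whose actual shape is not documented here.

```javascript
// Dimensions copied from the list above. This object is an illustration only;
// it is not the PAGE_SIZE_CONFIG exported by webpage2pdf.
const PAGE_SIZES_MM = {
  A4: { width: 210, height: 297 },
  A4_PRINT: { width: 216, height: 291 },
  A3: { width: 297, height: 420 },
  LETTER: { width: 8.5 * 25.4, height: 11 * 25.4 }, // 8.5in x 11in
};

const MM_PER_INCH = 25.4;

// Converts a named size to inches, rounded to two decimals.
function sizeInInches(name) {
  const size = PAGE_SIZES_MM[name];
  if (!size) {
    throw new Error(`Unknown page size: ${name}`);
  }
  const round = (value) => Math.round((value / MM_PER_INCH) * 100) / 100;
  return { width: round(size.width), height: round(size.height) };
}

console.log(sizeInInches('A4'));     // { width: 8.27, height: 11.69 }
console.log(sizeInInches('LETTER')); // { width: 8.5, height: 11 }
```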

Technical Details

Uses Puppeteer for webpage rendering and PDF generation
Prefers system Chrome (if available), otherwise uses Puppeteer's bundled Chromium
Supports multiple input types: URL, HTML string, Buffer, Stream, URL array
Supports stream output: can return Stream instead of just files
Supports multi-page merging: can merge multiple webpages into one PDF (currently uses simple concatenation, suitable for simple scenarios)
Supports custom headers (for authenticated pages)
Supports waiting for specific elements (for dynamic content)
Supports error ignoring mechanism (can ignore specific errors)
Default filename: If the -o parameter is not specified, the page's document.title is used as the filename
Timestamp format: The timestamp in the filename uses the format YYYYMMDDHHmm (e.g., 202512251539)
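
As an illustration of the default naming scheme, the sketch below builds a `<title>_<timestamp>.pdf` name like the one in the command-line example. `formatTimestamp` and `defaultFileName` are hypothetical helpers, not part of webpage2pdf. Using local time and replacing unsafe characters with underscores are assumptions; the package may sanitize titles differently.

```javascript
// Formats a Date as YYYYMMDDHHmm. Assumption: local time is used.
function formatTimestamp(date) {
  const pad = (n) => String(n).padStart(2, '0');
  return (
    String(date.getFullYear()) +
    pad(date.getMonth() + 1) +
    pad(date.getDate()) +
    pad(date.getHours()) +
    pad(date.getMinutes())
  );
}

// Hypothetical helper (not part of webpage2pdf): builds "<title>_<timestamp>.pdf".
// Assumption: whitespace and characters unsafe in file names become underscores.
function defaultFileName(title, date = new Date()) {
  const safeTitle = title.trim().replace(/[\\\/:*?"<>|\s]+/g, '_') || 'untitled';
  return `${safeTitle}_${formatTimestamp(date)}.pdf`;
}

console.log(defaultFileName('Example', new Date(2025, 11, 25, 15, 39)));
// Example_202512251539.pdf
```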

New Features

✨ Stream Output

Supports returning Stream, suitable for pipe operations and stream processing:

const result = await generatePdf('https://example.com', null);
result.stream.pipe(fs.createWriteStream('output.pdf'));

✨ Multiple Input Types

URL: 'https://example.com'
HTML String: '<html>...</html>'
HTML Buffer: Buffer.from('<html>...</html>')
HTML Stream: fs.createReadStream('input.html')
URL Array: ['url1', 'url2'] (multi-page merge, currently uses simple concatenation)
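
Because a single `input` parameter accepts all of these types, the function has to tell them apart at runtime. The sketch below shows one plausible way to do so. `classifyInput` is a hypothetical helper, not part of webpage2pdf, and the rule that a string starting with `http://` or `https://` is a URL (and any other string is HTML) is an assumption.

```javascript
const { Readable } = require('stream');

// Hypothetical helper (not part of webpage2pdf): names the kind of input.
// Assumption: a string starting with http:// or https:// is a URL and any
// other string is treated as HTML; the package may use a different rule.
function classifyInput(input) {
  if (Array.isArray(input)) return 'url-array';
  if (Buffer.isBuffer(input)) return 'html-buffer';
  if (input instanceof Readable) return 'html-stream';
  if (typeof input === 'string') {
    return /^https?:\/\//i.test(input.trim()) ? 'url' : 'html';
  }
  throw new TypeError('Unsupported input type');
}

console.log(classifyInput('https://example.com'));            // url
console.log(classifyInput('<html><body>Hi</body></html>'));   // html
console.log(classifyInput(Buffer.from('<html></html>')));     // html-buffer
console.log(classifyInput(Readable.from(['<html></html>']))); // html-stream
console.log(classifyInput(['https://example.com/page1']));    // url-array
```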

✨ Error Ignoring

Can ignore specific errors to avoid interrupting the flow due to non-fatal errors:

await generatePdf(url, './output.pdf', {
  ignore: ['timeout', /network error/i]
});

✨ Enhanced Configuration

Custom margins: margin: { top: '20mm', right: '15mm', ... }
Scale factor: scale: 0.9
Background print control: printBackground: true/false
Debug mode: debug: true

Notes

First Run: If using Puppeteer's bundled Chromium, it will be automatically downloaded on first run (~200MB)
Network Connection: Ensure you can access the target webpage
Page Loading: For pages with a lot of dynamic content, consider increasing the wait time or using the --selector option
Authenticated Pages: To access authenticated pages, pass the authentication information with the --header option or the headers parameter
Multi-page Merge: The current implementation uses simple Buffer concatenation, which is suitable only for simple scenarios. For professional PDF merging (preserving bookmarks, table of contents, metadata, etc.), use pdf-lib or a similar library to implement the merge logic
Stream Output: When using stream output, consume the Stream promptly to avoid excessive memory usage
HTML Input: When passing an HTML string, make sure the HTML is well-formed, otherwise rendering may fail

Troubleshooting

Issue: Chrome Not Found

Solution:

macOS: Ensure Google Chrome is installed
Linux: Install Chromium or use Puppeteer's bundled version
Windows: Ensure Chrome is installed

Issue: Page Load Timeout

Solution:

Increase wait time: -w 10000 or waitTime: 10000
Use selector wait: --selector "#content" or selector: '#content'

Issue: Incomplete PDF Content

Solution:

Increase wait time
Use --selector or selector to wait for key elements to load
Check if the webpage has dynamically loaded content

Issue: Multi-page Merge Fails or Merged PDF Cannot Open Properly

Solution:

Ensure all URLs are accessible
Check network connection
Use the ignore option to skip non-fatal errors
Important: The current implementation uses simple Buffer concatenation, which may not properly handle complex PDF structures. For professional PDF merging, it's recommended to:
1. Generate the PDFs separately first
2. Merge them with pdf-lib, pdf-merger-js, or a similar library
3. Or merge them with a command-line tool such as pdftk or Ghostscript

Professional Merge Example:

const { PDFDocument } = require('pdf-lib');
const { generatePdf } = require('webpage2pdf');
const fs = require('fs');

async function mergePages(urls, outputPath) {
  // 1. Generate PDFs separately (the index keeps temp file names unique)
  const pdfFiles = [];
  for (const [index, url] of urls.entries()) {
    const result = await generatePdf(url, `./temp-${Date.now()}-${index}.pdf`);
    if (result.success) {
      pdfFiles.push(result.path);
    }
  }

  // 2. Merge properly using pdf-lib
  const mergedPdf = await PDFDocument.create();
  for (const pdfPath of pdfFiles) {
    const pdfBytes = fs.readFileSync(pdfPath);
    const pdf = await PDFDocument.load(pdfBytes);
    const pages = await mergedPdf.copyPages(pdf, pdf.getPageIndices());
    pages.forEach((page) => mergedPdf.addPage(page));
  }
  const mergedPdfBytes = await mergedPdf.save();
  fs.writeFileSync(outputPath, mergedPdfBytes);

  // 3. Clean up temporary files
  pdfFiles.forEach((pdfPath) => fs.unlinkSync(pdfPath));
}

// CommonJS has no top-level await, so the steps run inside an async function
mergePages(['url1', 'url2', 'url3'], './merged.pdf').catch(console.error);

Issue: Stream Output Not Working

Solution:

Ensure outputPath is set to null
Check the returned stream object
Ensure timely processing of Stream to avoid memory leaks

For other webpage-to-PDF solutions, see the following comparison:

Mainstream Solutions Comparison

| Solution | Rendering Quality | JS Support | Resource Usage | Speed | Maintenance | Cost |
| --- | --- | --- | --- | --- | --- | --- |
| Puppeteer (This Project) | ⭐⭐⭐⭐⭐ | ✅ Full | High (~200MB) | Medium | ✅ Active | Free |
| Playwright | ⭐⭐⭐⭐⭐ | ✅ Full | High (~300MB) | Medium | ✅ Active | Free |
| wkhtmltopdf | ⭐⭐⭐ | ⚠️ Limited | Low (~50MB) | Fast | ❌ Stopped | Free |
| html2pdf.js | ⭐⭐⭐ | ⚠️ Limited | Low | Fast | ✅ Active | Free |
| Gotenberg | ⭐⭐⭐⭐⭐ | ✅ Full | High | Medium | ✅ Active | Free |
| Prince XML | ⭐⭐⭐⭐⭐ | ⚠️ Limited | Medium | Fast | ✅ Active | 💰 Commercial |

Solution Details

1. Puppeteer (Used by This Project)

Tech Stack: Node.js + Chrome DevTools Protocol
Pros: Full modern web support, strong dynamic content handling, high rendering quality, feature-rich
Cons: High resource usage (~200MB), slower startup
Use Cases: Modern web applications, pages that load content dynamically, high-quality PDF output

2. Playwright

Tech Stack: Node.js + Multi-browser engines
Pros: Supports multiple browser engines, smarter auto-wait mechanism, more modern API design
Cons: Higher resource usage, relatively new
Use Cases: Cross-browser compatibility, automated testing combined with PDF generation

3. wkhtmltopdf

Tech Stack: C++ + Qt WebKit
Pros: Lightweight (~50MB), fast startup, low resource usage
Cons: Based on old WebKit, doesn't support modern JavaScript, limited CSS3 support, maintenance stopped
Use Cases: Simple static pages, batch processing, resource-constrained environments

4. html2pdf.js / jsPDF

Tech Stack: Pure frontend JavaScript
Pros: No backend support needed, client-side generation, lightweight
Cons: Average rendering quality (Canvas-based), doesn't support complex CSS, limited page control
Use Cases: Simple page conversion, no backend support needed, client-side generation

5. Gotenberg

Tech Stack: Docker + Chromium
Pros: Containerized deployment, Chromium-based high rendering quality, RESTful API
Cons: Requires Docker environment, requires server resources
Use Cases: Microservice architecture, containerized deployment, PDF generation behind an HTTP API

Selection Advice

Modern Web Applications (React/Vue/Angular): Recommend Puppeteer or Playwright
Simple Static Pages: Recommend wkhtmltopdf or html2pdf.js
Batch Processing: Choose wkhtmltopdf (simple) or Puppeteer (complex) based on complexity
Microservice Architecture: Recommend Gotenberg
Frontend Direct Generation: Recommend html2pdf.js or jsPDF
Professional Typesetting Needs: Recommend Prince XML (commercial)

Advantages of This Project (webpage2pdf)

Because it is based on Puppeteer, this project is especially suitable for:

✅ Modern web applications (React/Vue/Angular)
✅ Pages that need a wait for dynamic content to load
✅ High-quality PDF output
✅ Node.js environments

Differentiating Features:

Supports both command-line and function calls
Supports stream output
Supports multiple input types (URL, HTML, Buffer, Stream)
Supports multi-page merging
User-friendly API design

License

MIT

Contributing

Issues and Pull Requests are welcome!