JSPM

webpage2pdf

1.0.0
  • License MIT

Convert web pages to PDF - Command line tool and Node.js module

Package Exports

  • webpage2pdf
  • webpage2pdf/index.js

This package does not declare an exports field, so the exports above have been automatically detected and optimized by JSPM instead. If any package subpath is missing, it is recommended to post an issue to the original package (webpage2pdf) to support the "exports" field. If that is not possible, create a JSPM override to customize the exports field for this package.

Readme

webpage2pdf

A powerful tool to convert web pages, HTML strings, Buffers, or Streams to PDF files. Supports both command-line and function call usage, with advanced features like stream output and multi-page merging.

Installation

Option 1: Global Installation

Using npm:

npm install -g webpage2pdf

Or using pnpm:

pnpm add -g webpage2pdf

Then you can use it directly:

webpage2pdf https://www.example.com

Option 2: Local Installation

Using npm:

npm install webpage2pdf

# Run with npx
npx webpage2pdf https://www.example.com

Or using pnpm:

pnpm add webpage2pdf

# Run with pnpm exec
pnpm exec webpage2pdf https://www.example.com

Option 3: Install as Dependency

Using npm:

npm install webpage2pdf --save

Or using pnpm:

pnpm add webpage2pdf

Usage

Method 1: Command Line

Basic Usage

# Convert a webpage to PDF (default: uses page title as filename)
webpage2pdf https://www.example.com
# Output: ./Example_202512251539.pdf

# Specify output path
webpage2pdf https://www.example.com -o ./my-pdf.pdf

# Specify page size
webpage2pdf https://www.example.com -s A4_PRINT

# Wait longer (ensure page fully loads)
webpage2pdf https://www.example.com -w 5000

# Wait for specific element
webpage2pdf https://www.example.com --selector "button"

Command Line Options

| Option | Short | Description | Default |
| --- | --- | --- | --- |
| --output | -o | Output file path | Uses page title (document.title) |
| --size | -s | Page size (A4, A4_PRINT, A3, LETTER) | A4 |
| --wait | -w | Wait time (milliseconds) | 3000 |
| --selector | | Selector to wait for (e.g., button, #content) | None |
| --header | | Custom headers (format: key:value, can be used multiple times) | None |

Method 2: Function Call

Basic Usage

const { generatePdf } = require('webpage2pdf');

// Basic usage
async function example() {
  const result = await generatePdf('https://www.example.com', './output.pdf');
  
  if (result.success) {
    console.log('PDF generated successfully:', result.path);
    console.log('File size:', result.size, 'bytes');
    console.log('Page title:', result.title);
  } else {
    console.error('Generation failed:', result.error);
  }
}

example();

Advanced Usage

const { generatePdf, PAGE_SIZE_CONFIG, setVerbose } = require('webpage2pdf');

// Disable verbose logging (recommended for function calls)
setVerbose(false);

async function advancedExample() {
  const result = await generatePdf('https://example.com', './output.pdf', {
    pageSize: 'A4_PRINT',        // Page size
    waitTime: 5000,              // Wait time (milliseconds)
    selector: '#content',         // Wait for specific element
    headers: {                   // Custom headers
      'Authorization': 'Bearer token',
      'X-Custom-Header': 'value'
    }
  });
  
  if (result.success) {
    console.log('Success:', result.path);
  }
}

advancedExample();

API Documentation

generatePdf(input, outputPath, options)

Convert a webpage or HTML to PDF.

Parameters:

  • input (string|string[]|Buffer|Readable, required) - Input:
    • string - URL or HTML string
    • string[] - URL array (multi-page merge)
    • Buffer - HTML Buffer
    • Readable - HTML Stream
  • outputPath (string|null, optional) - Output file path
    • string - Save to file
    • null - Return Stream
  • options (object, optional) - Configuration options
    • pageSize (string) - Page size, options: A4, A4_PRINT, A3, LETTER, default: A4
    • waitTime (number) - Wait time (milliseconds), default: 3000
    • selector (string) - Selector to wait for (optional), default: null
    • headers (object) - Custom headers (optional), default: {}
    • margin (object) - Page margins, format: {top, right, bottom, left}, unit: mm, default: {top: '0mm', right: '0mm', bottom: '0mm', left: '0mm'}
    • scale (number) - Scale factor (0.1-2), default: 1
    • printBackground (boolean) - Print background, default: true
    • ignore (string|RegExp|Array) - Errors to ignore (string, regex, or array), default: []
    • debug (boolean) - Output debug information, default: false
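
The documented defaults and the scale range can be captured in a small helper that fills in missing options before calling generatePdf. This is a minimal sketch, not the package's own implementation: normalizeOptions, DEFAULT_OPTIONS and PAGE_SIZES are hypothetical names, and the only rules applied are the ones stated in the list above.

```javascript
// Hypothetical helper: merges user options with the defaults documented above.
// How webpage2pdf applies these defaults internally is not shown in the README.
const DEFAULT_OPTIONS = {
  pageSize: 'A4',
  waitTime: 3000,
  selector: null,
  headers: {},
  margin: { top: '0mm', right: '0mm', bottom: '0mm', left: '0mm' },
  scale: 1,
  printBackground: true,
  ignore: [],
  debug: false,
};

const PAGE_SIZES = ['A4', 'A4_PRINT', 'A3', 'LETTER'];

function normalizeOptions(options = {}) {
  const merged = {
    ...DEFAULT_OPTIONS,
    ...options,
    // Copy headers so the shared default object is never mutated.
    headers: { ...(options.headers || {}) },
    // Merge margins per side so a partial margin keeps the other defaults.
    margin: { ...DEFAULT_OPTIONS.margin, ...(options.margin || {}) },
  };
  if (!PAGE_SIZES.includes(merged.pageSize)) {
    throw new RangeError(`Unknown pageSize: ${merged.pageSize}`);
  }
  if (typeof merged.scale !== 'number' || merged.scale < 0.1 || merged.scale > 2) {
    throw new RangeError('scale must be a number between 0.1 and 2');
  }
  return merged;
}
```

The result can be passed as the third argument, for example generatePdf(url, './output.pdf', normalizeOptions({ scale: 0.9 })).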

Return Value:

{
  success: boolean,    // Whether successful
  path?: string,       // Output file path (when file output)
  stream?: Readable,   // PDF Stream (when stream output)
  size: number,       // File size (bytes)
  title?: string,      // Page title (when URL input)
  error?: string,      // Error message (when failed)
  ignored?: boolean   // Whether ignored (when error ignored)
}

setVerbose(verbose)

Set whether to output verbose logs.

Parameters:

  • verbose (boolean) - Whether to output logs, default: true

PAGE_SIZE_CONFIG

Page size configuration object containing all available page sizes.
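
The return value documented for generatePdf can be reduced to a one-line status message, which is handy for logging. The sketch below relies only on the documented field names. describeResult is a hypothetical helper, and checking ignored before success is an assumption, since the README does not say how the two fields combine.

```javascript
// Hypothetical helper: turns a generatePdf result into a short status line.
// Field names follow the documented return value; the order of checks is assumed.
function describeResult(result) {
  if (result.ignored) {
    return `Ignored error: ${result.error}`;
  }
  if (!result.success) {
    return `Failed: ${result.error}`;
  }
  if (result.stream) {
    return `Stream ready (${result.size} bytes)`;
  }
  return `Saved ${result.path} (${result.size} bytes)`;
}
```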

Examples

Example 1: Convert Public Webpage

# Command line
webpage2pdf https://www.example.com -o example.pdf

// Function call (wrapped in an async function, since CommonJS has no top-level await)
const { generatePdf } = require('webpage2pdf');

(async () => {
  await generatePdf('https://www.example.com', './example.pdf');
})();

Example 2: Convert Authenticated Page

# Command line
webpage2pdf https://api.example.com/page \
  --header "Authorization:Bearer token" \
  -o authenticated.pdf

// Function call
const { generatePdf } = require('webpage2pdf');

(async () => {
  await generatePdf('https://api.example.com/page', './authenticated.pdf', {
    headers: {
      'Authorization': 'Bearer token'
    }
  });
})();
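
The command line takes headers as key:value strings, while the function call takes an object. The mapping between the two can be sketched as follows. parseHeaderFlags is a hypothetical helper and the README does not show the package's actual parsing, so treat the splitting rule (first colon only, surrounding whitespace trimmed) as an assumption.

```javascript
// Hypothetical helper: converts repeated "key:value" strings (the --header format)
// into the object shape accepted by the `headers` option.
function parseHeaderFlags(flags) {
  const headers = {};
  for (const flag of flags) {
    // Split on the first colon only, so values such as URLs keep their own colons.
    const index = flag.indexOf(':');
    if (index <= 0) {
      throw new Error(`Invalid header "${flag}", expected key:value`);
    }
    const key = flag.slice(0, index).trim();
    const value = flag.slice(index + 1).trim();
    headers[key] = value;
  }
  return headers;
}

console.log(parseHeaderFlags(['Authorization:Bearer token', 'X-Origin: https://example.com']));
```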

Example 3: Wait for Dynamic Content

# Command line
webpage2pdf https://example.com/dynamic-page \
  --selector "#content" \
  -w 10000 \
  -o dynamic-page.pdf

// Function call
const { generatePdf } = require('webpage2pdf');

(async () => {
  await generatePdf('https://example.com/dynamic-page', './dynamic-page.pdf', {
    selector: '#content',
    waitTime: 10000
  });
})();

Example 4: Stream Output

// Function call - Return Stream
const { generatePdf } = require('webpage2pdf');
const fs = require('fs');

(async () => {
  const result = await generatePdf('https://example.com', null);
  if (result.success) {
    result.stream.pipe(fs.createWriteStream('output.pdf'));
  }
})();

Example 5: HTML String Input

const { generatePdf } = require('webpage2pdf');

const html = `
<html>
  <head><title>Test</title></head>
  <body><h1>Hello World</h1></body>
</html>
`;

(async () => {
  const result = await generatePdf(html, './output.pdf');
})();

Example 6: Multi-page Merge

const { generatePdf } = require('webpage2pdf');

const urls = [
  'https://example.com/page1',
  'https://example.com/page2',
  'https://example.com/page3'
];

(async () => {
  // Merge multiple pages into one PDF
  const result = await generatePdf(urls, './combined.pdf', {
    pageSize: 'A4',
    waitTime: 5000
  });
})();

Note: The current multi-page merge uses simple Buffer concatenation, which may not properly handle complex PDF structures. For professional PDF merging (preserving bookmarks, table of contents, etc.), it's recommended to:

  1. Use pdf-lib or similar libraries to implement merge logic
  2. Generate PDFs separately first, then merge using professional tools

Example 7: Error Ignoring

const { generatePdf } = require('webpage2pdf');

(async () => {
  const result = await generatePdf('https://example.com', './output.pdf', {
    ignore: ['timeout', /network error/i],  // Ignore specific errors
    debug: true  // Enable debug mode
  });
})();
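
Since ignore accepts a string, a RegExp, or an array of either, a matcher for it could look like the sketch below. shouldIgnore is a hypothetical name. Treating strings as case-sensitive substrings is an assumption, because the README does not specify how the package compares them.

```javascript
// Hypothetical matcher for the `ignore` option.
// Accepts a string, a RegExp, or an array mixing both.
function shouldIgnore(errorMessage, ignore = []) {
  const patterns = Array.isArray(ignore) ? ignore : [ignore];
  return patterns.some((pattern) =>
    pattern instanceof RegExp
      ? pattern.test(errorMessage)               // regex: use its own flags
      : errorMessage.includes(String(pattern))   // string: substring match (assumed)
  );
}

const ignoreList = ['timeout', /network error/i];
console.log(shouldIgnore('Navigation timeout exceeded', ignoreList));
console.log(shouldIgnore('Protocol error', ignoreList));
```

Avoid the g flag on patterns passed to a matcher like this one, because a global RegExp keeps state between test() calls.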

Example 8: Custom Margins and Scaling

const { generatePdf } = require('webpage2pdf');

(async () => {
  const result = await generatePdf('https://example.com', './output.pdf', {
    margin: { top: '20mm', right: '15mm', bottom: '20mm', left: '15mm' },
    scale: 0.9,              // Scale to 90%
    printBackground: true   // Print background
  });
})();

Example 9: Batch Conversion

const { generatePdf, setVerbose } = require('webpage2pdf');

// Disable verbose logging
setVerbose(false);

const urls = [
  'https://example.com/page1',
  'https://example.com/page2',
  'https://example.com/page3'
];

async function batchConvert() {
  for (const url of urls) {
    const result = await generatePdf(url, `./${Date.now()}.pdf`);
    console.log(result.success ? '✓' : '✗', url);
  }
}

batchConvert();
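
Naming batch outputs with Date.now() makes it hard to tell which file came from which URL. One alternative is to derive the name from the URL itself, as in this sketch. filenameFromUrl is a hypothetical helper, not part of webpage2pdf. It ignores query strings, so two URLs that differ only in their query would map to the same name.

```javascript
// Hypothetical helper: derives a readable, filesystem-safe PDF name from a URL.
// Uses the WHATWG URL class, which is a global in Node.js.
function filenameFromUrl(url) {
  const { hostname, pathname } = new URL(url);
  const slug = `${hostname}${pathname}`
    .replace(/[^a-zA-Z0-9.-]+/g, '_') // replace runs of unsafe characters with one underscore
    .replace(/^_+|_+$/g, '');         // trim leading and trailing underscores
  return `${slug}.pdf`;
}

console.log(filenameFromUrl('https://example.com/page1'));
```

In the loop above, the output path would then become `./${filenameFromUrl(url)}`.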

Supported Page Sizes

  • A4: 210mm × 297mm (Standard A4)
  • A4_PRINT: 216mm × 291mm (A4 Print Size)
  • A3: 297mm × 420mm (A3)
  • LETTER: 8.5in × 11in (US Letter)
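
The sizes above mix millimetres and inches. To compare them, everything can be converted to one unit, as in this sketch. EXAMPLE_PAGE_SIZES and its shape are illustrative only and may differ from the exported PAGE_SIZE_CONFIG, whose structure the README does not document.

```javascript
// Dimensions copied from the list above. The object name and shape are
// illustrative and are not taken from the package's PAGE_SIZE_CONFIG export.
const EXAMPLE_PAGE_SIZES = {
  A4: { width: 210, height: 297, unit: 'mm' },
  A4_PRINT: { width: 216, height: 291, unit: 'mm' },
  A3: { width: 297, height: 420, unit: 'mm' },
  LETTER: { width: 8.5, height: 11, unit: 'in' },
};

const MM_PER_INCH = 25.4;

// Converts a named size to inches so sizes given in different units can be compared.
function sizeInInches(name) {
  const size = EXAMPLE_PAGE_SIZES[name];
  if (!size) {
    throw new RangeError(`Unknown page size: ${name}`);
  }
  const factor = size.unit === 'mm' ? 1 / MM_PER_INCH : 1;
  return { width: size.width * factor, height: size.height * factor };
}

const a4 = sizeInInches('A4');
console.log(a4.width.toFixed(2), a4.height.toFixed(2));
```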

Technical Details

  • Uses Puppeteer for webpage rendering and PDF generation
  • Prefers system Chrome (if available), otherwise uses Puppeteer's bundled Chromium
  • Supports multiple input types: URL, HTML string, Buffer, Stream, URL array
  • Supports stream output: can return Stream instead of just files
  • Supports multi-page merging: can merge multiple webpages into one PDF (currently uses simple concatenation, suitable for simple scenarios)
  • Supports custom headers (for authenticated pages)
  • Supports waiting for specific elements (for dynamic content)
  • Supports error ignoring mechanism (can ignore specific errors)
  • Default filename: If -o parameter is not specified, automatically uses page's document.title as filename
  • Timestamp format: Timestamp in filename format is YYYYMMDDHHmm (e.g., 202512251539)
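
The YYYYMMDDHHmm timestamp and the title-based default name can be reproduced with a few lines. This is a sketch under two assumptions: the timestamp uses local time, and the title is joined to it with an underscore as in the Example_202512251539.pdf output shown earlier. formatTimestamp and defaultFilename are hypothetical names, and any title sanitising the package performs is not reproduced here.

```javascript
// Formats a Date as YYYYMMDDHHmm using local time (local time is an assumption).
function formatTimestamp(date) {
  const pad = (n) => String(n).padStart(2, '0');
  return (
    String(date.getFullYear()) +
    pad(date.getMonth() + 1) + // getMonth() is zero-based
    pad(date.getDate()) +
    pad(date.getHours()) +
    pad(date.getMinutes())
  );
}

// Hypothetical: joins a page title and timestamp the way the sample output suggests.
function defaultFilename(title, date = new Date()) {
  return `${title}_${formatTimestamp(date)}.pdf`;
}

console.log(defaultFilename('Example', new Date(2025, 11, 25, 15, 39)));
```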

New Features

✨ Stream Output

Supports returning Stream, suitable for pipe operations and stream processing:

const result = await generatePdf('https://example.com', null);
result.stream.pipe(fs.createWriteStream('output.pdf'));

✨ Multiple Input Types

  • URL: 'https://example.com'
  • HTML String: '<html>...</html>'
  • HTML Buffer: Buffer.from('<html>...</html>')
  • HTML Stream: fs.createReadStream('input.html')
  • URL Array: ['url1', 'url2'] (multi-page merge, currently uses simple concatenation)

✨ Error Ignoring

Can ignore specific errors to avoid interrupting the flow due to non-fatal errors:

await generatePdf(url, './output.pdf', {
  ignore: ['timeout', /network error/i]
});

✨ Enhanced Configuration

  • Custom margins: margin: { top: '20mm', right: '15mm', ... }
  • Scale factor: scale: 0.9
  • Background print control: printBackground: true/false
  • Debug mode: debug: true

Notes

  1. First Run: If using Puppeteer's bundled Chromium, it will be automatically downloaded on first run (~200MB)
  2. Network Connection: Ensure you can access the target webpage
  3. Page Loading: For pages with lots of dynamic content, consider increasing wait time or using --selector option
  4. Authenticated Pages: If accessing authenticated pages, use --header option or headers parameter to add authentication information
  5. Multi-page Merge: Current implementation uses simple Buffer concatenation, suitable for simple scenarios. For professional PDF merging (preserving bookmarks, table of contents, metadata, etc.), it's recommended to use pdf-lib or similar libraries to implement merge logic
  6. Stream Output: When using stream output, ensure timely processing of Stream to avoid excessive memory usage
  7. HTML Input: When using HTML string input, ensure HTML format is correct, otherwise rendering may fail

Troubleshooting

Issue: Chrome Not Found

Solution:

  • macOS: Ensure Google Chrome is installed
  • Linux: Install Chromium or use puppeteer bundled version
  • Windows: Ensure Chrome is installed

Issue: Page Load Timeout

Solution:

  • Increase wait time: -w 10000 or waitTime: 10000
  • Use selector wait: --selector "#content" or selector: '#content'
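
Increasing the wait time can also be automated by retrying with progressively longer waits. The wrapper below is a sketch, not a feature of webpage2pdf. generateWithRetry is a hypothetical name, and it assumes the function passed in behaves like generatePdf, resolving to an object with a boolean success field instead of throwing.

```javascript
// Hypothetical retry wrapper. `generate` is expected to behave like generatePdf:
// it resolves to an object with a boolean `success` field.
async function generateWithRetry(generate, input, outputPath, waitTimes = [3000, 10000]) {
  let result;
  for (const waitTime of waitTimes) {
    result = await generate(input, outputPath, { waitTime });
    if (result.success) {
      break; // stop at the first wait time that works
    }
  }
  return result; // undefined only if waitTimes is empty
}

// Usage with the real package (commented out so the sketch runs standalone):
// const { generatePdf } = require('webpage2pdf');
// generateWithRetry(generatePdf, 'https://example.com', './output.pdf').then(console.log);
```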

Issue: Incomplete PDF Content

Solution:

  • Increase wait time
  • Use --selector or selector to wait for key elements to load
  • Check if the webpage has dynamically loaded content

Issue: Multi-page Merge Fails or Merged PDF Cannot Open Properly

Solution:

  • Ensure all URLs are accessible
  • Check network connection
  • Use ignore option to ignore non-fatal errors
  • Important: Current implementation uses simple Buffer concatenation, which may not properly handle complex PDF structures
  • For professional PDF merging, it's recommended to:
    1. Generate PDFs separately first
    2. Use pdf-lib, pdf-merger-js or similar libraries for merging
    3. Or use command-line tools like pdftk, ghostscript for merging

Professional Merge Example:

const { PDFDocument } = require('pdf-lib');
const { generatePdf } = require('webpage2pdf');
const fs = require('fs');

// Wrapped in an async function, since CommonJS has no top-level await
async function mergePages() {
  // 1. Generate PDFs separately
  const urls = ['url1', 'url2', 'url3'];
  const pdfFiles = [];
  for (const url of urls) {
    const result = await generatePdf(url, `./temp-${Date.now()}.pdf`);
    if (result.success) {
      pdfFiles.push(result.path);
    }
  }

  // 2. Merge properly using pdf-lib
  const mergedPdf = await PDFDocument.create();
  for (const pdfPath of pdfFiles) {
    const pdfBytes = fs.readFileSync(pdfPath);
    const pdf = await PDFDocument.load(pdfBytes);
    const pages = await mergedPdf.copyPages(pdf, pdf.getPageIndices());
    pages.forEach((page) => mergedPdf.addPage(page));
  }
  const mergedPdfBytes = await mergedPdf.save();
  fs.writeFileSync('./merged.pdf', mergedPdfBytes);

  // 3. Clean up temporary files
  pdfFiles.forEach((pdfPath) => fs.unlinkSync(pdfPath));
}

mergePages();

Issue: Stream Output Not Working

Solution:

  • Ensure outputPath is set to null
  • Check the returned stream object
  • Ensure timely processing of Stream to avoid memory leaks
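
One way to consume the stream promptly and still see failures is Node's promise-based pipeline, which resolves once the file is fully written and rejects if either side errors. This is a sketch. saveStream is a hypothetical helper, and it assumes result.stream is a standard Node.js Readable, as the documented return type suggests.

```javascript
const fs = require('fs');
const { pipeline } = require('stream/promises');

// Writes a readable PDF stream to disk and resolves once the file is fully written.
// pipeline() destroys both streams if either side fails, and the error rejects
// the returned promise instead of going unnoticed as it can with a bare pipe().
async function saveStream(stream, destination) {
  await pipeline(stream, fs.createWriteStream(destination));
  return destination;
}

// Usage with the real package (commented out so the sketch runs standalone):
// const { generatePdf } = require('webpage2pdf');
// const result = await generatePdf('https://example.com', null);
// if (result.success) await saveStream(result.stream, './output.pdf');
```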

For context on other webpage-to-PDF solutions, see the following comparison:

Mainstream Solutions Comparison

| Solution | Rendering Quality | JS Support | Resource Usage | Speed | Maintenance | Cost |
| --- | --- | --- | --- | --- | --- | --- |
| Puppeteer (This Project) | ⭐⭐⭐⭐⭐ | ✅ Full | High (~200MB) | Medium | ✅ Active | Free |
| Playwright | ⭐⭐⭐⭐⭐ | ✅ Full | High (~300MB) | Medium | ✅ Active | Free |
| wkhtmltopdf | ⭐⭐⭐ | ⚠️ Limited | Low (~50MB) | Fast | ❌ Stopped | Free |
| html2pdf.js | ⭐⭐⭐ | ⚠️ Limited | Low | Fast | ✅ Active | Free |
| Gotenberg | ⭐⭐⭐⭐⭐ | ✅ Full | High | Medium | ✅ Active | Free |
| Prince XML | ⭐⭐⭐⭐⭐ | ⚠️ Limited | Medium | Fast | ✅ Active | 💰 Commercial |

Solution Details

1. Puppeteer (Used by This Project)

  • Tech Stack: Node.js + Chrome DevTools Protocol
  • Pros: Full modern web support, strong dynamic content handling, high rendering quality, feature-rich
  • Cons: High resource usage (~200MB), slower startup
  • Use Cases: Modern web applications, pages that need time to load dynamic content, high-quality PDF output

2. Playwright

  • Tech Stack: Node.js + Multi-browser engines
  • Pros: Supports multiple browser engines, smarter auto-wait mechanism, more modern API design
  • Cons: Higher resource usage, relatively new
  • Use Cases: Cross-browser compatibility, automated testing combined with PDF generation

3. wkhtmltopdf

  • Tech Stack: C++ + Qt WebKit
  • Pros: Lightweight (~50MB), fast startup, low resource usage
  • Cons: Based on old WebKit, doesn't support modern JavaScript, limited CSS3 support, maintenance stopped
  • Use Cases: Simple static pages, batch processing, resource-constrained environments

4. html2pdf.js / jsPDF

  • Tech Stack: Pure frontend JavaScript
  • Pros: No backend support needed, client-side generation, lightweight
  • Cons: Average rendering quality (Canvas-based), doesn't support complex CSS, limited page control
  • Use Cases: Simple page conversion, no backend support needed, client-side generation

5. Gotenberg

  • Tech Stack: Docker + Chromium
  • Pros: Containerized deployment, Chromium-based high rendering quality, RESTful API
  • Cons: Requires Docker environment, requires server resources
  • Use Cases: Microservice architecture, containerized deployment, PDF generation behind an HTTP API

Selection Advice

  • Modern Web Applications (React/Vue/Angular): Recommend Puppeteer or Playwright
  • Simple Static Pages: Recommend wkhtmltopdf or html2pdf.js
  • Batch Processing: Choose wkhtmltopdf (simple) or Puppeteer (complex) based on complexity
  • Microservice Architecture: Recommend Gotenberg
  • Frontend Direct Generation: Recommend html2pdf.js or jsPDF
  • Professional Typesetting Needs: Recommend Prince XML (commercial)

Advantages of This Project (webpage2pdf)

Based on Puppeteer, especially suitable for:

  • ✅ Modern web applications (React/Vue/Angular)
  • ✅ Pages that need to wait for dynamic content to load
  • ✅ High-quality PDF output
  • ✅ Node.js environment

Differentiating Features:

  • Supports both command-line and function calls
  • Supports stream output
  • Supports multiple input types (URL, HTML, Buffer, Stream)
  • Supports multi-page merging
  • User-friendly API design

For more detailed comparison, refer to the "Related Solutions Comparison" section in the README.

License

MIT

Contributing

Issues and Pull Requests are welcome!