PDF to JSON Converter
A TypeScript utility that converts PDF documents into structured JSON data while preserving text content, formatting, and hyperlinks. Useful for resume parsing, document analysis, and content extraction workflows.
✨ Features
- Text Extraction: Extract text content with precise positioning and styling
- Hyperlink Detection: Capture clickable links with their coordinates and target URLs
- Font Preservation: Maintains font information for each text element
- Multi-page Support: Processes documents of any length
- Type Safety: Built with TypeScript for better development experience
- Lightweight: Minimal dependencies
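Since each text item keeps its font size, a common resume-parsing step is to treat text set noticeably larger than the body text as a section heading. The sketch below shows one way to do that under stated assumptions: `PdfItem` is declared locally to mirror the item shape listed under Output Format (it is not imported from the package), and the helpers `bodyFontSize` and `findHeadings`, the "most frequent size is body text" heuristic, and the 1.2x ratio are all hypothetical.

```typescript
// Local mirror of one entry in `pages[].items` (see Output Format).
// Declared here for illustration only; not imported from the package.
interface PdfItem {
  type: 'text' | 'link';
  content: string;
  x: number;
  y: number;
  width: number;
  height: number;
  fontFamily?: string;
  fontSize?: number;
  color?: string;
  url?: string;
}

// Hypothetical helper: the most frequent font size among text items,
// used as a stand-in for the document's body text size.
function bodyFontSize(items: PdfItem[]): number | undefined {
  const counts: Record<string, number> = {};
  let best: number | undefined;
  let bestCount = 0;
  for (const item of items) {
    if (item.type !== 'text' || item.fontSize === undefined) continue;
    const key = String(item.fontSize);
    const count = (counts[key] || 0) + 1;
    counts[key] = count;
    if (count > bestCount) {
      bestCount = count;
      best = item.fontSize;
    }
  }
  return best;
}

// Hypothetical helper: text items at least `ratio` times the body size
// are treated as headings. The 1.2 default is an arbitrary assumption.
function findHeadings(items: PdfItem[], ratio = 1.2): string[] {
  const body = bodyFontSize(items);
  if (body === undefined) return [];
  return items
    .filter(i => i.type === 'text' && i.fontSize !== undefined && i.fontSize >= body * ratio)
    .map(i => i.content.trim());
}

// Example with made-up sample data:
const sampleItems: PdfItem[] = [
  { type: 'text', content: 'Experience', x: 50, y: 100, width: 80, height: 14, fontSize: 14 },
  { type: 'text', content: 'Built a PDF parser.', x: 50, y: 120, width: 150, height: 10, fontSize: 10 },
  { type: 'text', content: 'Wrote tests.', x: 50, y: 135, width: 90, height: 10, fontSize: 10 },
];
const headings = findHeadings(sampleItems); // → ['Experience']
```

A size-only rule will miss headings set in bold at body size; the `fontFamily` field may help as an extra signal there.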
📦 Installation
Prerequisites
Make sure you have the following installed on your system:
- Node.js (v16 or higher)
- npm (v7 or higher) or yarn
Install the package
Using npm:
```bash
npm install @shilendra-dev/pdf-to-json
```
Or using yarn:
```bash
yarn add @shilendra-dev/pdf-to-json
```
Peer Dependencies
This package requires the following peer dependencies. npm v7 and later installs peer dependencies automatically; with yarn, you may need to add them yourself:
- pdfjs-dist: ^3.4.120 (PDF.js library for PDF parsing)
- @types/node: ^18.0.0 (TypeScript types for Node.js)
🚀 Usage
```typescript
import { pdfToJson } from '@shilendra-dev/pdf-to-json';
import { readFile } from 'fs/promises';

async function convertPdfToJson() {
  try {
    // Read the PDF file into a Buffer
    const pdfBuffer = await readFile('path/to/your/document.pdf');

    // Convert to JSON
    const result = await pdfToJson(pdfBuffer, {
      outputPath: 'output.json' // Optional: path to save the JSON output
    });

    console.log('Conversion complete!');
    console.log(`Processed ${result.numPages} pages`);
  } catch (error) {
    console.error('Error converting PDF:', error);
  }
}

convertPdfToJson();
```
📝 API
pdfToJson(pdfSource: Buffer | string, options?: PdfToJsonOptions): Promise<PdfJsonResult>
Converts a PDF document to JSON.
Parameters:
- pdfSource: PDF file as a Buffer or a file path
- options: (Optional) Configuration options
  - outputPath: (string) Path to save the JSON output file
  - includeTextContent: (boolean) Whether to include raw text content (default: true)
  - includeStyles: (boolean) Whether to include font and style information (default: true)
  - includeLinks: (boolean) Whether to include hyperlinks (default: true)
Returns: Promise that resolves to the parsed PDF data
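The three boolean options all default to `true`, so leaving them out keeps text, styles, and links. A minimal sketch of that defaulting contract, assuming the option names listed above: `PdfToJsonOptions` is re-declared locally rather than imported, and `ResolvedOptions` and `resolveOptions` are hypothetical names, not the package's internal code.

```typescript
// Re-declared from the option list above for illustration; the package's
// own type may differ.
interface PdfToJsonOptions {
  outputPath?: string;
  includeTextContent?: boolean;
  includeStyles?: boolean;
  includeLinks?: boolean;
}

// Hypothetical shape after defaults are applied: every boolean is present,
// while outputPath stays optional.
interface ResolvedOptions {
  outputPath?: string;
  includeTextContent: boolean;
  includeStyles: boolean;
  includeLinks: boolean;
}

// Hypothetical helper: caller-supplied values win; anything left
// undefined falls back to the documented default of `true`.
function resolveOptions(options: PdfToJsonOptions = {}): ResolvedOptions {
  return {
    outputPath: options.outputPath,
    includeTextContent: options.includeTextContent ?? true,
    includeStyles: options.includeStyles ?? true,
    includeLinks: options.includeLinks ?? true,
  };
}

// Example: turn off style information only.
const resolved = resolveOptions({ includeStyles: false });
// → { outputPath: undefined, includeTextContent: true, includeStyles: false, includeLinks: true }
```

Using `??` rather than object spread means an option passed explicitly as `undefined` still falls back to its default instead of overriding it.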
📂 Output Format
The converter generates a JSON object with the following structure:
```typescript
{
  numPages: number;
  pages: Array<{
    pageNumber: number;
    width: number;
    height: number;
    items: Array<{
      type: 'text' | 'link';
      content: string;
      x: number;
      y: number;
      width: number;
      height: number;
      fontFamily?: string;
      fontSize?: number;
      color?: string;
      url?: string; // For links
    }>;
  }>;
}
```
🔍 Example
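Because the result is plain JSON, it can be post-processed without any further PDF parsing. The sketch below declares local types mirroring the Output Format, then collects every link URL and rebuilds rough reading-order text for one page by grouping items into lines. It assumes `y` grows downward from the top of the page; if the package reports raw PDF user-space coordinates (origin at the bottom-left), reverse the `y` ordering. The helper names `collectLinks` and `pageText` and the 2-unit line tolerance are hypothetical.

```typescript
// Local mirrors of the documented output format (not imported from the package).
interface PdfItem {
  type: 'text' | 'link';
  content: string;
  x: number;
  y: number;
  width: number;
  height: number;
  url?: string;
}

interface PdfPage {
  pageNumber: number;
  width: number;
  height: number;
  items: PdfItem[];
}

interface PdfJsonResult {
  numPages: number;
  pages: PdfPage[];
}

// Collect the target URL of every link item across all pages.
function collectLinks(result: PdfJsonResult): string[] {
  const urls: string[] = [];
  for (const page of result.pages) {
    for (const item of page.items) {
      if (item.type === 'link' && item.url) urls.push(item.url);
    }
  }
  return urls;
}

// Rebuild approximate reading order for one page: sort text items top to
// bottom, group items whose y is within `tolerance` of the line's first
// item, then order each line left to right. Assumes y grows downward.
function pageText(page: PdfPage, tolerance = 2): string {
  const textItems = page.items
    .filter(i => i.type === 'text')
    .sort((a, b) => a.y - b.y || a.x - b.x);
  const lines: PdfItem[][] = [];
  let current: PdfItem[] = [];
  let lineY = 0;
  for (const item of textItems) {
    if (current.length > 0 && Math.abs(item.y - lineY) <= tolerance) {
      current.push(item);
    } else {
      current = [item];
      lineY = item.y;
      lines.push(current);
    }
  }
  return lines
    .map(line => line.sort((a, b) => a.x - b.x).map(i => i.content).join(' '))
    .join('\n');
}

// Example with made-up sample data:
const sample: PdfJsonResult = {
  numPages: 1,
  pages: [{
    pageNumber: 1, width: 612, height: 792,
    items: [
      { type: 'text', content: 'World', x: 120, y: 50.5, width: 40, height: 12 },
      { type: 'text', content: 'Hello', x: 72, y: 50, width: 40, height: 12 },
      { type: 'link', content: 'Docs', x: 72, y: 70, width: 30, height: 12, url: 'https://example.com/docs' },
      { type: 'text', content: 'Second line', x: 72, y: 70, width: 80, height: 12 },
    ],
  }],
};
const links = collectLinks(sample); // → ['https://example.com/docs']
const text = pageText(sample.pages[0]); // → 'Hello World\nSecond line'
```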
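```typescript
import { pdfToJson } from '@shilendra-dev/pdf-to-json';

// Convert a PDF fetched from a URL.
// Note: the global fetch API requires Node.js 18 or later,
// and top-level await only works inside an ES module.
const response = await fetch('https://example.com/document.pdf');
if (!response.ok) {
  throw new Error(`Download failed with HTTP ${response.status}`);
}
const pdfBuffer = await response.arrayBuffer();
const result = await pdfToJson(Buffer.from(pdfBuffer));

// Process the extracted data
result.pages.forEach(page => {
  console.log(`Page ${page.pageNumber} (${page.width}x${page.height}):`);
  console.log(`- Contains ${page.items.length} items (text and links)`);
});
```
🤝 Contributing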
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
Made with ❤️ by Shilendra Singh