JSPM

node-pptx-parser

1.0.01
  • License MIT

A PowerPoint (PPTX) parser that extracts text content with preserved formatting

Package Exports

  • node-pptx-parser

This package does not declare an exports field, so the exports above have been automatically detected and optimized by JSPM instead. If any package subpath is missing, it is recommended to post an issue to the original package (node-pptx-parser) to support the "exports" field. If that is not possible, create a JSPM override to customize the exports field for this package.

Readme

node-pptx-parser

A Node.js library for parsing PowerPoint (PPTX) files and extracting text content. This library maintains text formatting, line breaks, and paragraph structures from the original presentation.

Features

  • Extract text content from PPTX files with preserved formatting

  • Parse PPTX structure into manageable JavaScript objects

  • Access raw XML content of presentation components

  • Written in TypeScript for type safety

  • Promise-based API

  • Preserves line breaks and paragraph formatting

  • Minimal dependencies

Installation

npm install node-pptx-parser

Usage

Once the package is installed, you can load it with either an import or a require statement:

// ESM import:
import PptxParser from "node-pptx-parser";

// CommonJS require:
const PptxParser = require("node-pptx-parser").default;

Basic Text Extraction

import PptxParser from "node-pptx-parser";

async function main() {
  const parser = new PptxParser("presentation.pptx");

  try {
    // Extract text from all slides
    const textContent = await parser.extractText();

    // Print text from each slide
    textContent.forEach((slide) => {
      console.log(`\nSlide ${slide.id}:`);

      console.log(slide.text.join("\n"));
    });
  } catch (error) {
    console.error("Error:", error.message);
  }
}

main();

Advanced Usage - Full Presentation Parsing

import PptxParser from "node-pptx-parser";

async function main() {
  const parser = new PptxParser("presentation.pptx");

  try {
    // Get complete parsed presentation content
    const parsedContent = await parser.parse();

    // Access presentation structure
    console.log(parsedContent.presentation.parsed);

    // Access individual slides
    parsedContent.slides.forEach((slide) => {
      console.log(`Slide ${slide.id}:`, slide.parsed);
    });

    // Access raw XML if needed
    console.log(parsedContent.presentation.xml);
  } catch (error) {
    console.error("Error:", error.message);
  }
}

main();
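The parsed field is typed as any, so its exact shape is not documented here. The following is a minimal sketch, assuming that parsed is an xml2js-style object (xml2js is listed under Dependencies) in which text runs appear under the DrawingML key a:t, either as plain strings or as objects carrying the text in the xml2js default "_" property. The helper name collectTextRuns and the sample object are hypothetical and not part of the library.

```typescript
// Hypothetical helper: returns the text of a single run, or undefined if
// the value does not look like an xml2js text node.
function runText(run: unknown): string | undefined {
  if (typeof run === "string") return run;
  if (run !== null && typeof run === "object") {
    const text = (run as Record<string, unknown>)["_"];
    if (typeof text === "string") return text;
  }
  return undefined;
}

// Hypothetical helper: walks an arbitrary nested object and collects every
// text run found under an "a:t" key, in traversal order.
function collectTextRuns(node: unknown, out: string[] = []): string[] {
  if (Array.isArray(node)) {
    for (const item of node) collectTextRuns(item, out);
    return out;
  }
  if (node === null || typeof node !== "object") return out;

  const record = node as Record<string, unknown>;
  for (const key of Object.keys(record)) {
    const value = record[key];
    if (key === "a:t") {
      const runs = Array.isArray(value) ? value : [value];
      for (const run of runs) {
        const text = runText(run);
        if (text !== undefined) out.push(text);
      }
    } else {
      collectTextRuns(value, out);
    }
  }
  return out;
}

// Hand-built sample (an assumption about the shape, not real library output).
const sampleParsed = {
  "p:sld": {
    "p:cSld": [
      {
        "p:txBody": [
          { "a:p": [{ "a:r": [{ "a:t": ["Hello"] }, { "a:t": [{ _: "world", $: {} }] }] }] },
        ],
      },
    ],
  },
};

console.log(collectTextRuns(sampleParsed));
```

With a real presentation, the same helper could be applied to each slide.parsed value returned by parse(). For most purposes extractText() is the simpler route; a walker like this is only useful when you need to inspect the parsed tree yourself.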

API Reference

PptxParser

The main class for parsing PPTX files.

Constructor

constructor(filePath: string)

Creates a new instance of PptxParser.

  • filePath: Path to the PPTX file to be parsed

Methods

parse()

async parse(): Promise<ParsedPresentation>

Parses the entire PPTX file and returns its content.

  • Returns: Promise resolving to a ParsedPresentation object containing the complete presentation structure

extractText()

async extractText(): Promise<SlideTextContent[]>

Extracts formatted text content from all slides.

  • Returns: Promise resolving to an array of SlideTextContent objects
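As an illustration of working with the array that extractText() resolves to, here is a small sketch of a case-insensitive keyword search across slides. The helper findSlidesContaining and the sample data are hypothetical. The code relies only on the id and text fields, so it declares a minimal local type instead of importing the library.

```typescript
// Minimal structural type: only the fields this sketch needs.
interface SlideText {
  id: string;
  text: string[];
}

// Hypothetical helper: returns the slides in which at least one text line
// contains the search term, ignoring case.
function findSlidesContaining<T extends SlideText>(slides: T[], term: string): T[] {
  const needle = term.toLowerCase();
  return slides.filter((slide) =>
    slide.text.some((line) => line.toLowerCase().includes(needle))
  );
}

// Made-up sample standing in for the result of `await parser.extractText()`.
const deck: SlideText[] = [
  { id: "1", text: ["Quarterly Review"] },
  { id: "2", text: ["Revenue by region", "Costs"] },
  { id: "3", text: ["Questions?"] },
];

for (const slide of findSlidesContaining(deck, "revenue")) {
  console.log(`Match on slide ${slide.id}`);
}
```

Because the function is generic over any object with id and text, it should accept SlideTextContent values directly and return them with their full type intact.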

Types

ParsedPresentation

interface ParsedPresentation {
  presentation: {
    path: string;
    xml: string;
    parsed: any;
  };
  relationships: {
    path: string;
    xml: string;
    parsed: any;
  };
  slides: ParsedSlide[];
}

ParsedSlide

interface ParsedSlide {
  id: string;
  path: string;
  xml: string;
  parsed: any;
}

SlideTextContent

interface SlideTextContent extends ParsedSlide {
  text: string[];
}
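To show how these types fit together, the sketch below flattens a SlideTextContent array into a single plain-text string. The interfaces are copied locally so the example compiles on its own. The helper slidesToPlainText, the ids, and the paths in the sample are hypothetical.

```typescript
// Local copies of the interfaces documented above.
interface ParsedSlide {
  id: string;
  path: string;
  xml: string;
  parsed: unknown; // documented as `any`; `unknown` is used here for stricter checking
}

interface SlideTextContent extends ParsedSlide {
  text: string[];
}

// Hypothetical helper: puts each slide's lines under a "Slide <id>" heading
// and separates slides with a blank line.
function slidesToPlainText(slides: SlideTextContent[]): string {
  return slides
    .map((slide) => [`Slide ${slide.id}`, ...slide.text].join("\n"))
    .join("\n\n");
}

// Made-up sample data; real values come from `parser.extractText()`.
const sample: SlideTextContent[] = [
  { id: "1", path: "ppt/slides/slide1.xml", xml: "", parsed: {}, text: ["Title", "Subtitle"] },
  { id: "2", path: "ppt/slides/slide2.xml", xml: "", parsed: {}, text: ["Agenda"] },
];

console.log(slidesToPlainText(sample));
```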

Error Handling

The library throws errors in the following cases:

  • Invalid PPTX file structure

  • File reading errors

  • XML parsing errors

Example error handling:

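import PptxParser from "node-pptx-parser";

async function main() {
  try {
    const parser = new PptxParser("presentation.ppt");
    const content = await parser.extractText();
    console.log(`Extracted text from ${content.length} slides`);
  } catch (error) {
    // A caught value is not guaranteed to be an Error, so normalize it first
    const message = error instanceof Error ? error.message : String(error);
    if (message.includes("Invalid PPTX file structure")) {
      console.error("The PPTX file is corrupted or invalid");
    } else {
      console.error("An error occurred:", message);
    }
  }
}

main();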
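If the same check is needed in several places, the message test can be moved into a small helper. This is a sketch only. The names PptxFailure, errorMessage, and classifyPptxError are hypothetical, and the only message text it relies on is the "Invalid PPTX file structure" substring used in the example above. The wording of other error messages is not documented here, so everything else falls into a generic category.

```typescript
// Hypothetical categories for errors raised while parsing.
type PptxFailure = "invalid-structure" | "other";

// Normalizes any thrown value to a message string.
function errorMessage(error: unknown): string {
  return error instanceof Error ? error.message : String(error);
}

// Hypothetical helper: classifies an error by the one message substring
// that the README example relies on.
function classifyPptxError(error: unknown): PptxFailure {
  return errorMessage(error).includes("Invalid PPTX file structure")
    ? "invalid-structure"
    : "other";
}

console.log(classifyPptxError(new Error("Invalid PPTX file structure")));
console.log(classifyPptxError(new Error("ENOENT: no such file or directory")));
```

Matching on message text is fragile, since wording can change between releases. Treat the "other" branch as the default path rather than as an exceptional one.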

Dependencies

  • unzipper: For extracting PPTX files
  • xml2js: For parsing XML content

License

MIT

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.