  • License ISC

Node-RED node to extract pdf data using pdfjs with legacy support


    Readme

    A Node-RED node that extracts text content from a PDF. It uses the Mozilla library found at https://github.com/mozilla/pdfjs-dist to read the text data.

    Inputs

    payload

    Either a buffer object containing the contents of a PDF file, or a file path to the PDF file to be decoded.

    Config

    filename

    If the payload does not provide a file path or buffer object, a file path to the PDF file to be decoded should be provided here.
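
The precedence between the payload and the configured filename can be sketched as below. This is only an illustration of the rule described above, not the node's actual code: `resolvePdfSource` and the `kind`/`value` result shape are hypothetical names, and the `config.filename` property name is assumed from the option label.

```javascript
// Hypothetical helper: decide where the PDF comes from.
// Order of preference: Buffer payload, string payload (file path),
// then the filename set in the node's configuration.
function resolvePdfSource(msg, config) {
    const payload = msg ? msg.payload : undefined;

    if (Buffer.isBuffer(payload)) {
        return { kind: "buffer", value: payload };
    }
    if (typeof payload === "string" && payload.length > 0) {
        return { kind: "path", value: payload };
    }
    // Assumed property name for the configured file path.
    if (config && typeof config.filename === "string" && config.filename.length > 0) {
        return { kind: "path", value: config.filename };
    }
    throw new Error("No PDF buffer or file path provided");
}
```

In a flow, this means a preceding node can set `msg.payload` to a file path or a buffer, and the configured filename only matters when it does neither.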

    Order text

    Check this option to force the text into a fixed order. If 'from top to bottom' is selected, text is ordered top down by its y value; if 'from left to right' is selected, text is ordered left to right by its x value. If both options are selected, text is ordered from top to bottom first, then from left to right.
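
A minimal sketch of this ordering follows. It is not taken from the node's source: `orderItems` and its option names are hypothetical. It assumes a larger y is nearer the top of the page (y being measured from the bottom) and a smaller x is further left; flip the x comparison if x is measured from the right edge.

```javascript
// Hypothetical sketch of the "Order text" option.
// items: array of { x, y, t } objects for a single page.
function orderItems(items, options) {
    const { topToBottom = false, leftToRight = false } = options || {};
    const sorted = items.slice(); // leave the input array untouched

    sorted.sort((a, b) => {
        if (topToBottom && a.y !== b.y) {
            return b.y - a.y; // assumption: larger y is higher on the page
        }
        if (leftToRight && a.x !== b.x) {
            return a.x - b.x; // assumption: smaller x is further left
        }
        return 0; // keep the original relative order
    });
    return sorted;
}
```

With both options enabled, items in the same row (equal y) end up ordered by x, which matches the "top to bottom, then left to right" behaviour described above.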

    Merge text with next text

    When text is inserted into the output payload array, if it has the same x value (same column) or the same y value (same row) as the previously inserted text, it is appended to that previous string, separated by a space, instead of being added as a new element.
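
The merge rule can be illustrated roughly as follows. This is a sketch under assumptions, not the node's implementation: `mergeAdjacent` is a hypothetical name, and the comparison uses exact equality of x or y, whereas the real node may compare differently.

```javascript
// Hypothetical sketch of "Merge text with next text".
// items: array of { x, y, t } objects in insertion order.
function mergeAdjacent(items) {
    const merged = [];
    for (const item of items) {
        const prev = merged[merged.length - 1];
        // Same column (x) or same row (y) as the previously inserted text:
        // append to it with a space instead of adding a new element.
        if (prev && (prev.x === item.x || prev.y === item.y)) {
            prev.t = prev.t + " " + item.t;
        } else {
            merged.push({ ...item }); // copy so the input is not modified
        }
    }
    return merged;
}
```

Note that in this sketch the merged element keeps the x and y of the first fragment, so a run of text on the same row collapses into one element.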

    Outputs

    payload

    The parsing results are returned as an array in which each element corresponds to one page of the PDF. Each page is itself an array of objects of the following form:

    [
        {
            "p": 1, // order on the page
            "x": 328.78, // distance away from the right edge
            "y": 1175.676, // distance away from the bottom of the page
            "t": "Survey Responses 1/02/21 - 31/04/21" // text content
        },
        {
            "p": 2, 
            "x": 428.78, 
            "y": 1175.676, 
            "t": "Survey Responses 1/05/21 - 31/07/21"
        }
    ]
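
As a usage illustration, a downstream step (for example a function node) could turn this structure into plain text per page. The helper below is a hypothetical sketch that relies only on the `p` and `t` fields shown above.

```javascript
// Hypothetical helper: convert the output payload (array of pages, each an
// array of { p, x, y, t } objects) into one string per page.
function pagesToText(pages) {
    return pages.map((page) =>
        page
            .slice()                    // do not reorder the original page array
            .sort((a, b) => a.p - b.p)  // follow the order on the page
            .map((item) => item.t)
            .join("\n")
    );
}
```

Each element of the returned array holds the text of one page, with one text object per line.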