JSPM

  • License MIT

Official TypeScript/JavaScript SDK for PDF Vector API - Parse PDF/Word/Image/Excel documents to clean, structured markdown format and search academic publications across multiple databases

Package Exports

  • @pdfvector/instance-client
  • @pdfvector/instance-client/.tsc/lib/index.js

This package does not declare an exports field, so the exports above have been automatically detected and optimized by JSPM instead. If any package subpath is missing, it is recommended to post an issue to the original package (@pdfvector/instance-client) to support the "exports" field. If that is not possible, create a JSPM override to customize the exports field for this package.

Readme

PDF Vector TypeScript/JavaScript SDK

The official TypeScript/JavaScript SDK for the PDF Vector API. With it you can:

  • Parse PDF, Word, Image, and Excel documents into clean, structured Markdown
  • Ask questions about documents using AI
  • Extract structured data from documents with JSON Schema
  • Search multiple academic databases through a unified API
  • Fetch specific publications by DOI, PubMed ID, ArXiv ID, and more
  • Find relevant academic citations for paragraphs of text
  • Explore paper citation graphs and find similar papers
  • Search for research grants across US, EU, and UK funding databases

Installation

npm install @pdfvector/instance-client
# or
yarn add @pdfvector/instance-client
# or
pnpm add @pdfvector/instance-client
# or
bun add @pdfvector/instance-client

Quick Start

import { createClient } from "@pdfvector/instance-client";

const client = createClient({
  apiKey: "your-api-key",
});

// Parse a document
const parseResult = await client.document.parse({
  url: "https://example.com/document.pdf",
});

console.log(parseResult.markdown);
console.log(`Pages: ${parseResult.pageCount}, Model: ${parseResult.model}`);

// Ask questions about documents
const askResult = await client.document.ask({
  url: "https://example.com/research-paper.pdf",
  question: "What are the key findings and conclusions?",
});

console.log(askResult.markdown);

// Extract structured data using JSON Schema
const extractResult = await client.document.extract({
  url: "https://example.com/research-paper.pdf",
  prompt: "Extract the research information",
  schema: {
    type: "object",
    properties: {
      title: { type: "string" },
      authors: { type: "array", items: { type: "string" } },
      abstract: { type: "string" },
      findings: { type: "array", items: { type: "string" } },
    },
    required: ["title", "abstract"],
  },
});

console.log(extractResult.data);

Authentication

Get your API key from the PDF Vector dashboard.

const client = createClient({
  apiKey: "your-api-key",
});

Verify your credentials:

const status = await client.authenticate.validateCredential();
console.log(status.version); // Server version

Custom Domain

By default, the SDK connects to global.pdfvector.com. For custom or self-hosted instances:

const client = createClient({
  domain: "your-instance.pdfvector.com",
  apiKey: "your-api-key",
});

// For local development
const localClient = createClient({
  domain: "localhost:34000",
  apiKey: "your-api-key",
});
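If the domain and API key differ between environments, one option is to build the options object from environment variables. The sketch below is only an illustration: the variable names PDFVECTOR_API_KEY and PDFVECTOR_DOMAIN and the helper optionsFromEnv are hypothetical, and the SDK is not documented here as reading the environment itself.

```typescript
// Minimal sketch. The variable names and this helper are hypothetical,
// not part of the SDK.
interface ClientOptionsLike {
  apiKey: string;
  domain?: string;
}

function optionsFromEnv(
  env: Record<string, string | undefined>,
): ClientOptionsLike {
  const apiKey = env["PDFVECTOR_API_KEY"];
  if (!apiKey) {
    throw new Error("PDFVECTOR_API_KEY is not set");
  }
  const domain = (env["PDFVECTOR_DOMAIN"] ?? "").trim();
  // Omit `domain` entirely when unset so the SDK default applies.
  return domain ? { apiKey, domain } : { apiKey };
}
```

In application code the result would be passed straight through, for example createClient(optionsFromEnv(process.env)).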

Document Processing

All document endpoints accept three input methods: url, file (File/Blob), or base64.

Supported file types: PDF, Word (.docx), Excel (.xlsx), CSV, and Image (.png, .jpg).

Parse

Extract text content from documents:

const result = await client.document.parse(
  {
    url: "https://example.com/document.pdf",
    model: "auto", // "auto" | "nano" | "mini" | "pro" | "max"
  },
  { context: { documentId: "my-doc-123" } }, // optional, for usage tracking
);

console.log(result.markdown);  // Extracted text
console.log(result.pageCount); // Number of pages
console.log(result.model);     // Model tier used
console.log(result.html);      // Full HTML (only with 'max' model)
console.log(result.documentId); // "my-doc-123"

Parse from file data

import { readFile } from "fs/promises";

const result = await client.document.parse(
  {
    file: new File([await readFile("document.pdf")], "document.pdf", {
      type: "application/pdf",
    }),
    model: "auto",
  },
  { context: { documentId: "uploaded-doc" } },
);

console.log(result.markdown);

Ask

Answer questions about a document:

const result = await client.document.ask(
  {
    url: "https://example.com/research-paper.pdf",
    question: "What are the main findings of this study?",
    model: "auto",
  },
  { context: { documentId: "research-paper-1" } },
);

console.log(result.markdown);

Extract

Extract structured data using a JSON Schema:

const result = await client.document.extract(
  {
    url: "https://example.com/research-paper.pdf",
    prompt: "Extract the title, authors, and publication year",
    schema: {
      type: "object",
      properties: {
        title: { type: "string" },
        authors: { type: "array", items: { type: "string" } },
        year: { type: "number" },
      },
      required: ["title", "authors", "year"],
    },
  },
  { context: { documentId: "research-paper-1" } },
);

console.log(result.data); // { title: "...", authors: [...], year: 2024 }
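How result.data is typed by the SDK is not shown here, so it can be worth checking its shape at runtime before using it. The guard below is a sketch for the schema in the example above; PaperMetadata and isPaperMetadata are illustrative names, not SDK exports.

```typescript
// Shape matching the JSON Schema used in the extract example above.
interface PaperMetadata {
  title: string;
  authors: string[];
  year: number;
}

// Runtime check that narrows an unknown value to PaperMetadata.
function isPaperMetadata(data: unknown): data is PaperMetadata {
  if (typeof data !== "object" || data === null) return false;
  const record = data as Record<string, unknown>;
  const authors = record["authors"];
  return (
    typeof record["title"] === "string" &&
    Array.isArray(authors) &&
    authors.every((author) => typeof author === "string") &&
    typeof record["year"] === "number" &&
    Number.isFinite(record["year"])
  );
}
```

After a successful check such as if (isPaperMetadata(result.data)) { ... }, the fields can be used with full type safety inside the branch.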

Invoice Processing

Specialized methods for processing invoices. Parse supports only the pro, max, and auto models (see Model Tiers); ask and extract support all model tiers.

Parse Invoice

const result = await client.invoice.parse(
  { url: "https://example.com/invoice.pdf" },
  { context: { documentId: "invoice-001" } },
);

console.log(result.markdown);

Ask Questions About Invoices

const result = await client.invoice.ask(
  {
    url: "https://example.com/invoice.pdf",
    question: "What is the total amount and due date for this invoice?",
  },
  { context: { documentId: "invoice-001" } },
);

console.log(result.markdown);

Extract Structured Invoice Data

const result = await client.invoice.extract(
  {
    url: "https://example.com/invoice.pdf",
    prompt: "Extract all invoice details including vendor, items, and totals",
    schema: {
      type: "object",
      properties: {
        invoiceNumber: { type: "string" },
        date: { type: "string" },
        totalAmount: { type: "number" },
        items: {
          type: "array",
          items: {
            type: "object",
            properties: {
              description: { type: "string" },
              quantity: { type: "number" },
              price: { type: "number" },
            },
          },
        },
      },
      required: ["invoiceNumber", "date", "totalAmount", "items"],
    },
  },
  { context: { documentId: "invoice-001" } },
);

console.log(result.data);

Identity Document Processing

Specialized methods for processing ID documents (passports, driver's licenses, ID cards). Parse supports only the pro, max, and auto models (see Model Tiers); ask and extract support all model tiers.

Parse ID Document

const result = await client.identity.parse(
  { url: "https://example.com/passport.pdf" },
  { context: { documentId: "passport-jane" } },
);

console.log(result.markdown);
console.log(result.documentType); // e.g., "passport"

Ask Questions About ID Documents

const result = await client.identity.ask(
  {
    url: "https://example.com/passport.pdf",
    question: "What is the full name and date of birth on this document?",
  },
  { context: { documentId: "passport-jane" } },
);

console.log(result.markdown);

Extract Structured ID Document Data

const result = await client.identity.extract(
  {
    url: "https://example.com/passport.pdf",
    prompt: "Extract passport details from this document",
    schema: {
      type: "object",
      properties: {
        fullName: { type: "string" },
        dateOfBirth: { type: "string" },
        documentNumber: { type: "string" },
        nationality: { type: "string" },
        expirationDate: { type: "string" },
      },
      required: ["fullName", "documentNumber"],
    },
  },
  { context: { documentId: "passport-jane" } },
);

console.log(result.data);

Bank Statement Processing

Specialized methods for processing bank statements. Parse supports only the pro, max, and auto models (see Model Tiers); ask and extract support all model tiers.

const result = await client.bankStatement.parse(
  { url: "https://example.com/statement.pdf" },
  { context: { documentId: "statement-2024-03" } },
);

console.log(result.markdown);

bankStatement.ask() and bankStatement.extract() are also available and follow the same patterns as the invoice and identity examples above.

Academic Research

Search Academic Publications

Search across multiple academic databases with a unified API. Costs 2 credits per request.

const result = await client.academic.search({
  query: "quantum computing",
  providers: ["semantic-scholar", "arxiv", "pubmed"],
  limit: 20,
  yearFrom: 2021,
  yearTo: 2024,
});

result.results.forEach((publication) => {
  console.log(`Title: ${publication.title}`);
  console.log(`Authors: ${publication.authors?.map((a) => a.name).join(", ")}`);
  console.log(`Year: ${publication.year}`);
  console.log("---");
});

Supported Providers:

Search Parameters:

  • query (required): 1-400 characters
  • providers: Array of provider names (default: ["semantic-scholar"])
  • offset: Pagination offset (default: 0)
  • limit: Results per provider, 1-100 (default: 20)
  • yearFrom / yearTo: Filter by publication year (1900-2100)
  • fields: Specific fields to return ("doi", "title", "url", "providerURL", "authors", "date", "year", "totalCitations", "totalReferences", "abstract", "pdfURL", "provider", "providerData")
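The offset and limit parameters above can be combined to page through results. The helper below is a sketch that takes the search call as a function, so it does not depend on the SDK's types. It assumes that advancing offset by limit yields the next page and that an empty results array marks the end; neither is confirmed by this README. Since limit applies per provider, a page may hold more than limit items. Each page is a separate request and is billed accordingly.

```typescript
interface SearchPage<T> {
  results: T[];
}

type SearchFn<T> = (params: {
  query: string;
  offset: number;
  limit: number;
}) => Promise<SearchPage<T>>;

// Requests successive pages until `maxResults` items are collected
// or a page comes back empty.
async function collectResults<T>(
  search: SearchFn<T>,
  query: string,
  pageSize: number,
  maxResults: number,
): Promise<T[]> {
  if (pageSize < 1) throw new RangeError("pageSize must be at least 1");
  const collected: T[] = [];
  let offset = 0;
  while (collected.length < maxResults) {
    const page = await search({ query, offset, limit: pageSize });
    if (page.results.length === 0) break; // assumed end-of-results signal
    collected.push(...page.results);
    offset += pageSize;
  }
  return collected.slice(0, maxResults);
}
```

With the SDK this might be called as collectResults((p) => client.academic.search(p), "quantum computing", 20, 60).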

Fetch Academic Publications by ID

Fetch specific papers by their identifiers with automatic provider detection. Costs 2 credits per request.

const result = await client.academic.fetch({
  ids: [
    "10.1038/nature12373", // DOI
    "12345678",            // PubMed ID
    "2301.00001",          // ArXiv ID
  ],
  fields: ["title", "authors", "year", "abstract", "doi"],
});

result.results.forEach((pub) => {
  console.log(`Title: ${pub.title}`);
  console.log(`Provider: ${pub.detectedProvider}`);
});

result.errors?.forEach((error) => {
  console.log(`Failed to fetch ${error.id}: ${error.error}`);
});

Supported ID types: DOI, PubMed ID, ArXiv ID, Semantic Scholar ID, ERIC ID, Europe PMC ID, OpenAlex ID.

Find Citations for a Paragraph

Find relevant academic citations for each sentence in a paragraph using semantic similarity. Costs 2 credits per sentence analyzed.

const result = await client.academic.findCitations({
  paragraph:
    "Transformers have revolutionized natural language processing. Attention mechanisms allow models to focus on relevant parts of the input.",
  providers: ["semantic-scholar", "arxiv", "pubmed"],
});

console.log(
  `Found ${result.totalCitations} citations across ${result.sentenceCount} sentences`,
);

for (const item of result.results) {
  console.log(`\nSentence: ${item.sentence}`);
  for (const citation of item.citations) {
    console.log(`  [Score: ${citation.score}/10] ${citation.title}`);
  }
}
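Because every citation carries a score, a small post-processing step can keep only the strongest matches per sentence. The sketch below assumes the result shape used in the loop above (sentence, citations, score, title) and treats score as a number out of 10; topCitations and its thresholds are illustrative, not part of the SDK.

```typescript
interface ScoredCitation {
  title: string;
  score: number;
}

interface SentenceCitations {
  sentence: string;
  citations: ScoredCitation[];
}

// Keeps citations scoring at least `minScore`, best first,
// with at most `perSentence` entries per sentence.
function topCitations(
  results: SentenceCitations[],
  minScore: number,
  perSentence: number,
): SentenceCitations[] {
  return results.map((item) => ({
    sentence: item.sentence,
    citations: item.citations
      .filter((citation) => citation.score >= minScore) // filter copies, so the input is not mutated
      .sort((a, b) => b.score - a.score)
      .slice(0, perSentence),
  }));
}
```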

Paper Citation Graph

Retrieve a paper's citing papers and referenced papers with pagination support. Costs 2+ credits per request (scales with result count).

const result = await client.academic.paperGraph({
  id: "10.1038/nature12373", // DOI, ArXiv ID, Semantic Scholar ID, OpenAlex ID, or URL
  citationsLimit: 20,
  referencesLimit: 20,
  citationsOffset: 0,
  referencesOffset: 0,
  fields: ["title", "authors", "year", "doi"],
});

console.log(`Paper: ${result.paper.title}`);
console.log(`Total Citations: ${result.totalCitations}`);
console.log(`Total References: ${result.totalReferences}`);

for (const citation of result.citations) {
  console.log(`  Cited by: ${citation.title} (${citation.year})`);
}

Parameters:

  • id (required): Paper identifier (DOI, ArXiv ID, Semantic Scholar ID, OpenAlex ID, or URL)
  • citationsLimit: Max citing papers to return, 0-1000 (default: 100)
  • referencesLimit: Max referenced papers to return, 0-1000 (default: 100)
  • citationsOffset / referencesOffset: Pagination offsets (default: 0)
  • fields: Specific fields to return
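For papers with more citations than one request can return, the offsets can be planned up front from totalCitations. The sketch below computes the citationsOffset/citationsLimit pair for each page; citationPages is a hypothetical helper, and the 1-1000 page size bound comes from the parameter list above.

```typescript
interface CitationPage {
  citationsOffset: number;
  citationsLimit: number;
}

// Splits `totalCitations` into consecutive pages of at most `pageSize` items.
function citationPages(totalCitations: number, pageSize: number): CitationPage[] {
  if (!Number.isInteger(pageSize) || pageSize < 1 || pageSize > 1000) {
    throw new RangeError("pageSize must be an integer between 1 and 1000");
  }
  const pages: CitationPage[] = [];
  for (let offset = 0; offset < totalCitations; offset += pageSize) {
    pages.push({
      citationsOffset: offset,
      citationsLimit: Math.min(pageSize, totalCitations - offset),
    });
  }
  return pages;
}
```

Each entry can be spread into a paperGraph request alongside the paper id. Setting referencesLimit to 0 on those requests would presumably avoid fetching references again, since 0 is within the documented range. Cost scales with result count, so large graphs can add up.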

Find Similar Papers

Find papers similar to a seed paper using citation network analysis. Costs 3 credits per request.

const result = await client.academic.similarPapers({
  id: "10.1038/nature12373",
  limit: 10,
  includeEdges: true, // include citation graph edges for visualization
  fields: ["title", "year", "doi"],
});

console.log(`Seed: ${result.seed.title}`);

for (const item of result.results) {
  console.log(`  [Similarity: ${item.similarity.toFixed(2)}] ${item.publication.title}`);
  if (item.citingIds) {
    console.log(`    Citing: ${item.citingIds.length} papers in result set`);
  }
}

Parameters:

  • id (required): Seed paper identifier
  • limit: Max similar papers, 1-100 (default: 30)
  • includeEdges: Include citingIds/citedByIds for graph construction (default: false)
  • fields: Specific fields to return

Search Grants

Search for research grants and funding opportunities across multiple databases with a unified API. Costs 2 credits per request.

const result = await client.academic.searchGrants({
  query: "machine learning healthcare",
  providers: ["grants-gov", "nih-reporter", "cordis", "ukri"],
  limit: 10,
  fundingMin: 100000,
  fundingMax: 1000000,
  deadlineFrom: "2026-01-01",
  deadlineTo: "2026-12-31",
});

console.log(`Found ${result.estimatedTotalResults} grants`);

for (const grant of result.results) {
  console.log(`Title: ${grant.title}`);
  console.log(`Agency: ${grant.agency} (${grant.region})`);
  console.log(`Funding: ${grant.currency} ${grant.fundingAmountMin}-${grant.fundingAmountMax}`);
  console.log(`Deadline: ${grant.deadlineDate ?? grant.closeDate ?? "N/A"}`);
  console.log(`URL: ${grant.url}`);
  console.log("---");
}

// Partial provider failures are reported in errors array
result.errors?.forEach((error) => {
  console.log(`Provider ${error.provider} failed: ${error.message}`);
});

Supported Grant Providers:

  • "grants-gov" - Grants.gov (US federal grants — NIH, NSF, DOE, DOD, etc.)
  • "nih-reporter" - NIH RePORTER (NIH-funded research projects)
  • "cordis" - CORDIS (EU-funded projects — Horizon Europe, ERC, etc.)
  • "ukri" - UKRI (UK-funded projects — EPSRC, BBSRC, MRC, etc.)

Search Parameters:

  • query (required): 1-400 characters
  • providers: Array of grant provider names (default: all 4 providers)
  • offset: Pagination offset (default: 0)
  • limit: Results per provider, 1-50 (default: 10)
  • fundingMin / fundingMax: Filter by funding amount
  • deadlineFrom / deadlineTo: Filter by deadline date (ISO format, e.g. "2026-01-01")
  • fields: Specific fields to return ("sourceId", "title", "url", "agency", "program", "description", "eligibility", "fundingAmountMin", "fundingAmountMax", "currency", "deadlineDate", "openDate", "closeDate", "grantType", "region", "keywords", "piName", "organizationName", "provider", "providerData")
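Results come from several providers, so it is often useful to re-sort them client-side. The sketch below orders grants by their nearest deadline, using the same deadlineDate ?? closeDate fallback as the example above. It assumes both fields are date strings that Date.parse understands, which this README does not state explicitly; grants with no usable date go last.

```typescript
interface GrantLike {
  title: string;
  deadlineDate?: string | null;
  closeDate?: string | null;
}

// Timestamp of the effective deadline, or +Infinity when none is usable.
function deadlineTime(grant: GrantLike): number {
  const raw = grant.deadlineDate ?? grant.closeDate;
  const parsed = raw ? Date.parse(raw) : Number.NaN;
  return Number.isNaN(parsed) ? Number.POSITIVE_INFINITY : parsed;
}

// Returns a sorted copy; the input array is left untouched.
function byNearestDeadline<T extends GrantLike>(grants: T[]): T[] {
  return grants.slice().sort((a, b) => {
    const ta = deadlineTime(a);
    const tb = deadlineTime(b);
    if (ta === tb) return 0;
    return ta < tb ? -1 : 1;
  });
}
```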

Document ID Tracking

Pass a documentId per request to track API usage. The ID is sent as a header and returned in responses for document, identity, invoice, and bank statement endpoints. Academic endpoints do not use documentId.

const result = await client.document.parse(
  { url: "https://example.com/document.pdf" },
  { context: { documentId: "invoice-456" } },
);

console.log(result.documentId); // "invoice-456"

Each request can have its own documentId:

const [resultA, resultB] = await Promise.all([
  client.document.parse(
    { url: "https://example.com/doc-a.pdf" },
    { context: { documentId: "doc-a" } },
  ),
  client.document.parse(
    { url: "https://example.com/doc-b.pdf" },
    { context: { documentId: "doc-b" } },
  ),
]);

console.log(resultA.documentId); // "doc-a"
console.log(resultB.documentId); // "doc-b"

Model Tiers

Tier  Best for                                            Max pages  Max size  Supported formats
nano  Simple text documents                               30         10 MB     PDF, Word, Excel, CSV
mini  Tables, structured content                          30         10 MB     PDF, Word, Excel, CSV
pro   Complex docs, handwriting, scans                    30         40 MB     PDF, Word, Excel, CSV
max   Large docs, images, full capabilities, HTML output  1000       500 MB    PDF, Word, Excel, CSV, Image
auto  Automatic selection with fallback (default)         1000       500 MB    PDF, Word, Excel, CSV, Image

Note: Identity, invoice, and bank statement parse only support pro, max, and auto models. Their ask and extract support all model tiers.
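If you prefer to choose a tier explicitly rather than rely on auto, the limits in the table can be checked before sending a request. The sketch below returns the lowest tier whose page, size, and format limits fit a document. It looks only at those hard limits and ignores the "Best for" guidance, and smallestTierFor is an illustrative helper rather than an SDK function.

```typescript
type Tier = "nano" | "mini" | "pro" | "max";

interface TierLimits {
  maxPages: number;
  maxSizeMB: number;
  acceptsImages: boolean;
}

// Limits copied from the Model Tiers table, lowest tier first.
const TIERS: Array<[Tier, TierLimits]> = [
  ["nano", { maxPages: 30, maxSizeMB: 10, acceptsImages: false }],
  ["mini", { maxPages: 30, maxSizeMB: 10, acceptsImages: false }],
  ["pro", { maxPages: 30, maxSizeMB: 40, acceptsImages: false }],
  ["max", { maxPages: 1000, maxSizeMB: 500, acceptsImages: true }],
];

// Returns the lowest tier that fits, or undefined if none does.
function smallestTierFor(
  pageCount: number,
  sizeMB: number,
  isImage = false,
): Tier | undefined {
  for (const [tier, limits] of TIERS) {
    const fits =
      pageCount <= limits.maxPages &&
      sizeMB <= limits.maxSizeMB &&
      (!isImage || limits.acceptsImages);
    if (fits) return tier;
  }
  return undefined;
}
```

A scanned or handwritten document may still need pro or max for quality reasons even when nano fits by size.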

Credit Costs

API                      nano  mini  pro   max   Unit
Document Parse           1     2     4     8     /page
Document Ask             2     4     8     16    /page
Document Extract         2     4     8     16    /page
Identity Parse           -     -     6     10    /page
Identity Ask             6     10    14    18    /page
Identity Extract         6     10    14    18    /page
Invoice Parse            -     -     6     10    /page
Invoice Ask              6     10    14    18    /page
Invoice Extract          6     10    14    18    /page
Bank Statement Parse     -     -     6     10    /page
Bank Statement Ask       6     10    14    18    /page
Bank Statement Extract   6     10    14    18    /page
Academic Search          2     2     2     2     /request
Academic Fetch           2     2     2     2     /request
Academic Find Citations  2     2     2     2     /sentence
Academic Paper Graph     2+    2+    2+    2+    /request
Academic Similar Papers  3     3     3     3     /request
Grant Search             2     2     2     2     /request

A dash marks a tier that the endpoint does not support.
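The per-page rates above can be turned into a rough cost estimate before a job is submitted. The sketch below encodes the document, identity, invoice, and bank statement rows. It assumes the two values listed for the specialized parse endpoints belong to the pro and max tiers, as the Model Tiers note suggests. estimateCredits is a hypothetical helper, and auto is left out because its cost depends on the tier that ends up being used.

```typescript
type PricedTier = "nano" | "mini" | "pro" | "max";
type Operation = "parse" | "ask" | "extract";
type Family = "document" | "identity" | "invoice" | "bankStatement";
type RateRow = Partial<Record<PricedTier, number>>;

// Credits per page, copied from the Credit Costs table.
const DOCUMENT_RATES: Record<Operation, RateRow> = {
  parse: { nano: 1, mini: 2, pro: 4, max: 8 },
  ask: { nano: 2, mini: 4, pro: 8, max: 16 },
  extract: { nano: 2, mini: 4, pro: 8, max: 16 },
};

// Identity, invoice, and bank statement endpoints share the same rates.
const SPECIALIZED_RATES: Record<Operation, RateRow> = {
  parse: { pro: 6, max: 10 }, // assumption: the two listed values are pro and max
  ask: { nano: 6, mini: 10, pro: 14, max: 18 },
  extract: { nano: 6, mini: 10, pro: 14, max: 18 },
};

function estimateCredits(
  family: Family,
  operation: Operation,
  tier: PricedTier,
  pageCount: number,
): number {
  if (!Number.isInteger(pageCount) || pageCount < 0) {
    throw new RangeError("pageCount must be a non-negative integer");
  }
  const table = family === "document" ? DOCUMENT_RATES : SPECIALIZED_RATES;
  const perPage = table[operation][tier];
  if (perPage === undefined) {
    throw new Error(`No rate listed for ${family} ${operation} on the "${tier}" tier`);
  }
  return perPage * pageCount;
}
```

For example, estimateCredits("document", "parse", "pro", 10) multiplies the pro parse rate of 4 by 10 pages. When using auto, the model field on the response shows which tier was applied.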

Error Handling

All API errors are thrown as PDFVectorError instances. The SDK transparently maps every server error into the most specific subclass it can, so you can branch on the type using instanceof and read typed metadata fields directly.

import { createClient, PDFVectorError } from "@pdfvector/instance-client";

const client = createClient({ apiKey: "your-api-key" });

try {
  const result = await client.document.parse({
    url: "https://example.com/document.pdf",
  });
  console.log(result.markdown);
} catch (error) {
  if (error instanceof PDFVectorError) {
    console.error(`API Error [${error.code}]: ${error.message}`);
    console.error(`HTTP Status: ${error.status}`);
    console.error(`Request ID: ${error.requestId}`);   // server-assigned, useful for support
    console.error(`Document ID: ${error.documentId}`); // echoed back if you set one
    console.error(`User error: ${error.userError}`);   // true if caused by your input
  } else {
    // Network errors (DNS, connection refused, timeout) bubble up as TypeError.
    console.error("Unexpected Error:", error);
  }
}

Branching on specific error types

Every error class extends PDFVectorError, so you can use instanceof to handle specific cases. Specialized subclasses expose typed fields pulled from the error's data payload:

import {
  createClient,
  FileTooLargeError,
  PageLimitExceededError,
  PasswordProtectedError,
  URLFetchError,
  UnauthorizedError,
  TooManyRequestsError,
  EmptyDocumentError,
  ExtractionFailedError,
  PDFVectorError,
} from "@pdfvector/instance-client";

const client = createClient({ apiKey: "your-api-key" });

try {
  await client.document.parse({ url: "...", model: "nano" });
} catch (error) {
  if (error instanceof FileTooLargeError) {
    console.error(
      `File ${error.fileSizeMB}MB exceeds ${error.limitMB}MB limit for the '${error.model}' model`,
    );
  } else if (error instanceof PageLimitExceededError) {
    console.error(
      `Document has ${error.pageCount} pages — ${error.model} only supports up to ${error.pageLimit}`,
    );
  } else if (error instanceof PasswordProtectedError) {
    console.error("Remove the password from the file and try again");
  } else if (error instanceof URLFetchError) {
    console.error(`Could not fetch ${error.url}: ${error.statusCode} ${error.statusText}`);
  } else if (error instanceof UnauthorizedError) {
    console.error("Invalid API key — check your dashboard");
  } else if (error instanceof TooManyRequestsError) {
    console.error(`Rate limit ${error.limit} exceeded; resets at ${error.resetAt}`);
  } else if (error instanceof EmptyDocumentError) {
    console.error("The document has no readable content");
  } else if (error instanceof ExtractionFailedError) {
    console.error(`Extraction failed. Hint: ${error.hint}`);
    if (error.rawText) console.error(`Model output sample: ${error.rawText}`);
  } else if (error instanceof PDFVectorError) {
    // Catch-all for any error code not specifically handled
    console.error(`API Error [${error.code}]: ${error.message}`);
  }
}

You can also branch on the error code if you prefer:

try {
  await client.document.parse({ url: "..." });
} catch (error) {
  if (error instanceof PDFVectorError) {
    switch (error.code) {
      case "UNAUTHORIZED":
        console.error("Invalid API key");
        break;
      case "BAD_REQUEST":
        console.error("Validation error:", error.message);
        break;
      case "UNPROCESSABLE_CONTENT":
        console.error("Could not process document:", error.message);
        break;
      case "INTERNAL_SERVER_ERROR":
        console.error(`Server error (requestId: ${error.requestId}):`, error.message);
        break;
    }
  }
}

Error Class Hierarchy

PDFVectorError
├── BadRequestError                 (400)
│   ├── FileTooLargeError                 — fileSizeMB, limitMB, model
│   ├── PageLimitExceededError            — pageCount, pageLimit, model
│   ├── PasswordProtectedError
│   ├── UnsupportedFormatError            — format, supportedFormats
│   ├── URLFetchError                     — url, statusCode, statusText
│   ├── TierNotSupportedError             — documentType, model, allowedTypes
│   ├── InvalidSchemaError                — reason
│   └── NoInputProvidedError
├── UnauthorizedError               (401)
├── NotFoundError                   (404)
├── ConflictError                   (409)
├── TooManyRequestsError            (429) — limit, resetAt
├── UnprocessableContentError       (422)
│   ├── EmptyDocumentError
│   ├── NoTextDetectedError
│   └── ExtractionFailedError             — hint, rawText
├── InternalServerError             (500)
└── NotImplementedError             (501)

Common fields on every PDFVectorError

Field       Type                     Description
code        string                   The ORPC error code (BAD_REQUEST, UNAUTHORIZED, etc.)
status      number                   HTTP status code (400, 401, 404, 409, 422, 429, 500, 501)
message     string                   Human-readable error message
data        Record<string, unknown>  Raw error payload from the server
requestId   number | undefined       Server-assigned request ID; include it in support tickets
documentId  string | undefined       Echoed back if you passed context.documentId
userError   boolean                  true if the failure was caused by your input (vs. a server-side issue)
cause       unknown                  Original error (the underlying ORPCError from the wire)

Type guard

If you'd rather not import PDFVectorError just to do an instanceof check, use the isPDFVectorError guard:

import { isPDFVectorError } from "@pdfvector/instance-client";

try {
  await client.document.parse({ url: "..." });
} catch (error) {
  if (isPDFVectorError(error)) {
    console.error(error.code, error.message, error.requestId);
  }
}

Error Codes

Code                   Status  Description
BAD_REQUEST            400     Input validation failed (e.g., missing fields, invalid URL, file too large, page limit exceeded, invalid JSON Schema)
UNAUTHORIZED           401     Missing or invalid API key
NOT_FOUND              404     Resource not found (e.g., academic paper ID, version)
CONFLICT               409     Operation conflicts with the current state
UNPROCESSABLE_CONTENT  422     Document could not be processed (empty, no readable text, extraction failed)
TOO_MANY_REQUESTS      429     Rate limit exceeded
INTERNAL_SERVER_ERROR  500     Server-side failure; capture the requestId for support
NOT_IMPLEMENTED        501     Endpoint not available on this instance
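Some of these codes describe transient conditions, so a caller may want to retry. The sketch below retries with exponential backoff when the thrown error's code is TOO_MANY_REQUESTS or INTERNAL_SERVER_ERROR. Treating those two as retryable is a judgment call made for this example, not documented SDK behavior. The helper checks the code field structurally, so it needs no SDK import, and both withRetry and isRetryable are illustrative names.

```typescript
// Codes treated as transient in this sketch (an assumption, not SDK policy).
const RETRYABLE_CODES = ["TOO_MANY_REQUESTS", "INTERNAL_SERVER_ERROR"];

function isRetryable(error: unknown): boolean {
  if (typeof error !== "object" || error === null) return false;
  const code = (error as { code?: unknown }).code;
  return typeof code === "string" && RETRYABLE_CODES.indexOf(code) !== -1;
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => {
    setTimeout(() => resolve(), ms);
  });

// Runs `fn`, retrying transient failures with delays of
// baseDelayMs, 2 * baseDelayMs, 4 * baseDelayMs, and so on.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
  sleep: (ms: number) => Promise<void> = defaultSleep,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (attempt >= maxAttempts || !isRetryable(error)) throw error;
      await sleep(baseDelayMs * Math.pow(2, attempt - 1));
    }
  }
}
```

A call might look like withRetry(() => client.document.parse({ url })). For rate limits, the resetAt field on TooManyRequestsError could give a better delay than a fixed backoff, but its exact type is not shown here, so it is not used in this sketch.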

TypeScript Support

The SDK is written in TypeScript and includes full type definitions:

import {
  createClient,
  isPDFVectorError,
  // Base error class — all errors inherit from this
  PDFVectorError,
  // HTTP-aligned error categories
  BadRequestError,
  UnauthorizedError,
  NotFoundError,
  ConflictError,
  TooManyRequestsError,
  UnprocessableContentError,
  InternalServerError,
  NotImplementedError,
  // Specialized error subclasses with typed metadata
  FileTooLargeError,
  PageLimitExceededError,
  PasswordProtectedError,
  UnsupportedFormatError,
  URLFetchError,
  TierNotSupportedError,
  InvalidSchemaError,
  NoInputProvidedError,
  EmptyDocumentError,
  NoTextDetectedError,
  ExtractionFailedError,
  // Underlying ORPC error — re-exported for advanced use cases
  ORPCError,
} from "@pdfvector/instance-client";

import type {
  Client,
  ClientContext,
  CreateClientOptions,
  ContractInputs,
  ContractOutputs,
  PDFVectorModel,
  PDFVectorErrorCode,
} from "@pdfvector/instance-client";

Runtime Support

  • Node.js: 20+
  • Bun: 1.0+
  • ESM only (CommonJS is not supported)
  • Uses standard fetch API

License

MIT