PDFVector TypeScript/JavaScript SDK
The official TypeScript/JavaScript SDK for the PDFVector API:
- Parse PDF, Word, Image, and Excel documents to clean, structured markdown format
- Ask questions about documents using AI
- Extract structured data from documents with JSON Schema
- Search across multiple academic databases with a unified API
- Fetch specific publications by DOI, PubMed ID, ArXiv ID, and more
- Find relevant academic citations for paragraphs of text
- Explore paper citation graphs
- Find similar papers
- Search for research grants across US, EU, and UK funding databases
Installation
npm install @pdfvector/instance-client
# or
yarn add @pdfvector/instance-client
# or
pnpm add @pdfvector/instance-client
# or
bun add @pdfvector/instance-client
Quick Start
import { createClient } from "@pdfvector/instance-client";
const client = createClient({
apiKey: "your-api-key",
});
// Parse a document
const parseResult = await client.document.parse({
url: "https://example.com/document.pdf",
});
console.log(parseResult.markdown);
console.log(`Pages: ${parseResult.pageCount}, Model: ${parseResult.model}`);
// Ask questions about documents
const askResult = await client.document.ask({
url: "https://example.com/research-paper.pdf",
question: "What are the key findings and conclusions?",
});
console.log(askResult.markdown);
// Extract structured data using JSON Schema
const extractResult = await client.document.extract({
url: "https://example.com/research-paper.pdf",
prompt: "Extract the research information",
schema: {
type: "object",
properties: {
title: { type: "string" },
authors: { type: "array", items: { type: "string" } },
abstract: { type: "string" },
findings: { type: "array", items: { type: "string" } },
},
required: ["title", "abstract"],
},
});
console.log(extractResult.data);
Authentication
Get your API key from the PDFVector dashboard.
const client = createClient({
apiKey: "your-api-key",
});
Verify your credentials:
const status = await client.authenticate.validateCredential();
console.log(status.version); // Server version
Custom Domain
By default, the SDK connects to global.pdfvector.com. For custom or self-hosted instances:
const client = createClient({
domain: "your-instance.pdfvector.com",
apiKey: "your-api-key",
});
// For local development
const localClient = createClient({
domain: "localhost:34000",
apiKey: "your-api-key",
});
Document Processing
All document endpoints accept three input methods: url, file (File/Blob), or base64.
Supported file types: PDF, Word (.docx), Excel (.xlsx), CSV, and Image (.png, .jpg).
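Before uploading, it can help to check locally that a file name matches one of the supported types listed above. The following is a minimal sketch, not part of the SDK: `detectDocumentKind` and `DocumentKind` are hypothetical names, and the check looks only at the file extension, so it says nothing about whether the API will accept the actual contents.

```typescript
// Hypothetical helper (not part of the SDK): maps a file name to one of the
// document kinds listed in the README. Only the extensions named there are
// recognized.
type DocumentKind = "pdf" | "word" | "excel" | "csv" | "image";

const KIND_BY_EXTENSION = new Map<string, DocumentKind>([
  ["pdf", "pdf"],
  ["docx", "word"],
  ["xlsx", "excel"],
  ["csv", "csv"],
  ["png", "image"],
  ["jpg", "image"],
]);

function detectDocumentKind(fileName: string): DocumentKind | undefined {
  const dot = fileName.lastIndexOf(".");
  // No extension, or a trailing dot with nothing after it.
  if (dot < 0 || dot === fileName.length - 1) return undefined;
  const extension = fileName.slice(dot + 1).toLowerCase();
  return KIND_BY_EXTENSION.get(extension);
}
```

A caller could skip the request, or show an error early, when the helper returns `undefined`.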
Parse
Extract text content from documents:
const result = await client.document.parse(
{
url: "https://example.com/document.pdf",
model: "auto", // "auto" | "nano" | "mini" | "pro" | "max"
},
{ context: { documentId: "my-doc-123" } }, // optional, for usage tracking
);
console.log(result.markdown); // Extracted text
console.log(result.pageCount); // Number of pages
console.log(result.model); // Model tier used
console.log(result.html); // Full HTML (only with 'max' model)
console.log(result.documentId); // "my-doc-123"
Parse from file data
import { readFile } from "fs/promises";
const result = await client.document.parse(
{
file: new File([await readFile("document.pdf")], "document.pdf", {
type: "application/pdf",
}),
model: "auto",
},
{ context: { documentId: "uploaded-doc" } },
);
console.log(result.markdown);
Ask
Answer questions about a document:
const result = await client.document.ask(
{
url: "https://example.com/research-paper.pdf",
question: "What are the main findings of this study?",
model: "auto",
},
{ context: { documentId: "research-paper-1" } },
);
console.log(result.markdown);
Extract
Extract structured data using a JSON Schema:
const result = await client.document.extract(
{
url: "https://example.com/research-paper.pdf",
prompt: "Extract the title, authors, and publication year",
schema: {
type: "object",
properties: {
title: { type: "string" },
authors: { type: "array", items: { type: "string" } },
year: { type: "number" },
},
required: ["title", "authors", "year"],
},
},
{ context: { documentId: "research-paper-1" } },
);
console.log(result.data); // { title: "...", authors: [...], year: 2024 }
Invoice Processing
Specialized methods for processing invoices. Parse supports the pro, max, and auto models only. Ask and extract support all model tiers.
Parse Invoice
const result = await client.invoice.parse(
{ url: "https://example.com/invoice.pdf" },
{ context: { documentId: "invoice-001" } },
);
console.log(result.markdown);
Ask Questions About Invoices
const result = await client.invoice.ask(
{
url: "https://example.com/invoice.pdf",
question: "What is the total amount and due date for this invoice?",
},
{ context: { documentId: "invoice-001" } },
);
console.log(result.markdown);
Extract Structured Invoice Data
const result = await client.invoice.extract(
{
url: "https://example.com/invoice.pdf",
prompt: "Extract all invoice details including vendor, items, and totals",
schema: {
type: "object",
properties: {
invoiceNumber: { type: "string" },
date: { type: "string" },
totalAmount: { type: "number" },
items: {
type: "array",
items: {
type: "object",
properties: {
description: { type: "string" },
quantity: { type: "number" },
price: { type: "number" },
},
},
},
},
required: ["invoiceNumber", "date", "totalAmount", "items"],
},
},
{ context: { documentId: "invoice-001" } },
);
console.log(result.data);
Identity Document Processing
Specialized methods for processing ID documents (passports, driver's licenses, ID cards). Parse supports the pro, max, and auto models only. Ask and extract support all model tiers.
Parse ID Document
const result = await client.identity.parse(
{ url: "https://example.com/passport.pdf" },
{ context: { documentId: "passport-jane" } },
);
console.log(result.markdown);
console.log(result.documentType); // e.g., "passport"
Ask Questions About ID Documents
const result = await client.identity.ask(
{
url: "https://example.com/passport.pdf",
question: "What is the full name and date of birth on this document?",
},
{ context: { documentId: "passport-jane" } },
);
console.log(result.markdown);
Extract Structured ID Document Data
const result = await client.identity.extract(
{
url: "https://example.com/passport.pdf",
prompt: "Extract passport details from this document",
schema: {
type: "object",
properties: {
fullName: { type: "string" },
dateOfBirth: { type: "string" },
documentNumber: { type: "string" },
nationality: { type: "string" },
expirationDate: { type: "string" },
},
required: ["fullName", "documentNumber"],
},
},
{ context: { documentId: "passport-jane" } },
);
console.log(result.data);
Bank Statement Processing
Specialized methods for processing bank statements. Parse supports the pro, max, and auto models only. Ask and extract support all model tiers.
const result = await client.bankStatement.parse(
{ url: "https://example.com/statement.pdf" },
{ context: { documentId: "statement-2024-03" } },
);
console.log(result.markdown);
Bank statements also support bankStatement.ask() and bankStatement.extract(), with the same patterns as above.
Academic Research
Search Academic Publications
Search across multiple academic databases with a unified API. Costs 2 credits per request.
const result = await client.academic.search({
query: "quantum computing",
providers: ["semantic-scholar", "arxiv", "pubmed"],
limit: 20,
yearFrom: 2021,
yearTo: 2024,
});
result.results.forEach((publication) => {
console.log(`Title: ${publication.title}`);
console.log(`Authors: ${publication.authors?.map((a) => a.name).join(", ")}`);
console.log(`Year: ${publication.year}`);
console.log("---");
});
Supported Providers:
- "semantic-scholar" (default) - Semantic Scholar
- "pubmed" - PubMed
- "arxiv" - ArXiv
- "google-scholar" - Google Scholar
- "eric" - ERIC
- "europe-pmc" - Europe PMC
- "openalex" - OpenAlex
- "crossref" - Crossref
Search Parameters:
- query (required): 1-400 characters
- providers: Array of provider names (default: ["semantic-scholar"])
- offset: Pagination offset (default: 0)
- limit: Results per provider, 1-100 (default: 20)
- yearFrom / yearTo: Filter by publication year (1900-2100)
- fields: Specific fields to return ("doi", "title", "url", "providerURL", "authors", "date", "year", "totalCitations", "totalReferences", "abstract", "pdfURL", "provider", "providerData")
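The documented ranges can be checked on the client before a request spends credits. The sketch below is an illustration only: `validateSearchParams` and `AcademicSearchParams` are hypothetical names, not SDK exports. The rules that `offset` must be non-negative and that `yearFrom` must not exceed `yearTo` are assumptions, since the README states only the default and the year range.

```typescript
// Hypothetical pre-flight validator (not part of the SDK). The numeric
// ranges come from the parameter list above.
interface AcademicSearchParams {
  query: string;
  limit?: number;
  offset?: number;
  yearFrom?: number;
  yearTo?: number;
}

function validateSearchParams(params: AcademicSearchParams): string[] {
  const problems: string[] = [];
  const inRange = (value: number, min: number, max: number): boolean =>
    Number.isInteger(value) && value >= min && value <= max;

  if (params.query.length < 1 || params.query.length > 400) {
    problems.push("query must be 1-400 characters");
  }
  if (params.limit !== undefined && !inRange(params.limit, 1, 100)) {
    problems.push("limit must be an integer from 1 to 100");
  }
  // Assumption: offsets are non-negative integers.
  if (params.offset !== undefined && !inRange(params.offset, 0, Number.MAX_SAFE_INTEGER)) {
    problems.push("offset must be a non-negative integer");
  }
  if (params.yearFrom !== undefined && !inRange(params.yearFrom, 1900, 2100)) {
    problems.push("yearFrom must be between 1900 and 2100");
  }
  if (params.yearTo !== undefined && !inRange(params.yearTo, 1900, 2100)) {
    problems.push("yearTo must be between 1900 and 2100");
  }
  // Assumption: an inverted year range is treated as a caller mistake.
  if (
    params.yearFrom !== undefined &&
    params.yearTo !== undefined &&
    params.yearFrom > params.yearTo
  ) {
    problems.push("yearFrom must not be later than yearTo");
  }
  return problems;
}
```

An empty array means the parameters fall within the documented limits; the server remains the authority on what is actually accepted.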
Fetch Academic Publications by ID
Fetch specific papers by their identifiers with automatic provider detection. Costs 2 credits per request.
const result = await client.academic.fetch({
ids: [
"10.1038/nature12373", // DOI
"12345678", // PubMed ID
"2301.00001", // ArXiv ID
],
fields: ["title", "authors", "year", "abstract", "doi"],
});
result.results.forEach((pub) => {
console.log(`Title: ${pub.title}`);
console.log(`Provider: ${pub.detectedProvider}`);
});
result.errors?.forEach((error) => {
console.log(`Failed to fetch ${error.id}: ${error.error}`);
});
Supported ID types: DOI, PubMed ID, ArXiv ID, Semantic Scholar ID, ERIC ID, Europe PMC ID, OpenAlex ID.
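The service detects the provider for each identifier automatically, so no client-side classification is required. For logging or input checks, though, a rough local guess can be useful. The sketch below is an assumption-laden illustration: `guessIdKind` is a hypothetical helper, it covers only the three identifier shapes shown in the example above, and it is not the detection logic the API uses.

```typescript
// Hypothetical helper (not part of the SDK). Patterns:
// - DOI: starts with "10.", a numeric registrant code, a slash, then a suffix
// - ArXiv (new-style): YYMM.NNNN or YYMM.NNNNN, optionally followed by vN
// - PubMed ID: digits only
type IdKind = "doi" | "arxiv" | "pubmed" | "unknown";

function guessIdKind(id: string): IdKind {
  const value = id.trim();
  if (/^10\.\d{4,9}\/\S+$/.test(value)) return "doi";
  if (/^\d{4}\.\d{4,5}(v\d+)?$/.test(value)) return "arxiv";
  if (/^\d+$/.test(value)) return "pubmed";
  // Semantic Scholar, ERIC, Europe PMC, and OpenAlex IDs are left to the server.
  return "unknown";
}
```

Identifiers that come back as "unknown" can still be sent to client.academic.fetch; any that the server cannot resolve are reported in result.errors.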
Find Citations for a Paragraph
Find relevant academic citations for each sentence in a paragraph using semantic similarity. Costs 2 credits per sentence analyzed.
const result = await client.academic.findCitations({
paragraph:
"Transformers have revolutionized natural language processing. Attention mechanisms allow models to focus on relevant parts of the input.",
providers: ["semantic-scholar", "arxiv", "pubmed"],
});
console.log(
`Found ${result.totalCitations} citations across ${result.sentenceCount} sentences`,
);
for (const item of result.results) {
console.log(`\nSentence: ${item.sentence}`);
for (const citation of item.citations) {
console.log(` [Score: ${citation.score}/10] ${citation.title}`);
}
}
Paper Citation Graph
Retrieve a paper's citing papers and referenced papers with pagination support. Costs 2+ credits per request (scales with result count).
const result = await client.academic.paperGraph({
id: "10.1038/nature12373", // DOI, ArXiv ID, Semantic Scholar ID, OpenAlex ID, or URL
citationsLimit: 20,
referencesLimit: 20,
citationsOffset: 0,
referencesOffset: 0,
fields: ["title", "authors", "year", "doi"],
});
console.log(`Paper: ${result.paper.title}`);
console.log(`Total Citations: ${result.totalCitations}`);
console.log(`Total References: ${result.totalReferences}`);
for (const citation of result.citations) {
console.log(` Cited by: ${citation.title} (${citation.year})`);
}
Parameters:
- id (required): Paper identifier (DOI, ArXiv ID, Semantic Scholar ID, OpenAlex ID, or URL)
- citationsLimit: Max citing papers to return, 0-1000 (default: 100)
- referencesLimit: Max referenced papers to return, 0-1000 (default: 100)
- citationsOffset / referencesOffset: Pagination offsets (default: 0)
- fields: Specific fields to return
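When a paper has more citations than one request returns, the offsets for the remaining pages can be computed from the reported total. This is a minimal sketch under stated assumptions: `pageOffsets` is a hypothetical helper, and the idea of feeding it `result.totalCitations` from a first response is an illustration, not a documented SDK pattern.

```typescript
// Hypothetical helper (not part of the SDK): lists the offsets needed to
// walk `total` items in pages of `pageSize`.
function pageOffsets(total: number, pageSize: number): number[] {
  if (!Number.isInteger(pageSize) || pageSize < 1) {
    throw new RangeError("pageSize must be a positive integer");
  }
  const offsets: number[] = [];
  for (let offset = 0; offset < total; offset += pageSize) {
    offsets.push(offset);
  }
  return offsets;
}

// 250 citations in pages of 100 need three requests.
console.log(pageOffsets(250, 100)); // [ 0, 100, 200 ]
```

Each offset would be passed as citationsOffset together with citationsLimit set to the page size. Since the cost scales with result count, fetching every page of a heavily cited paper can add up.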
Find Similar Papers
Find papers similar to a seed paper using citation network analysis. Costs 3 credits per request.
const result = await client.academic.similarPapers({
id: "10.1038/nature12373",
limit: 10,
includeEdges: true, // include citation graph edges for visualization
fields: ["title", "year", "doi"],
});
console.log(`Seed: ${result.seed.title}`);
for (const item of result.results) {
console.log(` [Similarity: ${item.similarity.toFixed(2)}] ${item.publication.title}`);
if (item.citingIds) {
console.log(` Citing: ${item.citingIds.length} papers in result set`);
}
}
Parameters:
- id (required): Seed paper identifier
- limit: Max similar papers, 1-100 (default: 30)
- includeEdges: Include citingIds / citedByIds for graph construction (default: false)
- fields: Specific fields to return
Search Grants
Search for research grants and funding opportunities across multiple databases with a unified API. Costs 2 credits per request.
const result = await client.academic.searchGrants({
query: "machine learning healthcare",
providers: ["grants-gov", "nih-reporter", "cordis", "ukri"],
limit: 10,
fundingMin: 100000,
fundingMax: 1000000,
deadlineFrom: "2026-01-01",
deadlineTo: "2026-12-31",
});
console.log(`Found ${result.estimatedTotalResults} grants`);
for (const grant of result.results) {
console.log(`Title: ${grant.title}`);
console.log(`Agency: ${grant.agency} (${grant.region})`);
console.log(`Funding: ${grant.currency} ${grant.fundingAmountMin}-${grant.fundingAmountMax}`);
console.log(`Deadline: ${grant.deadlineDate ?? grant.closeDate ?? "N/A"}`);
console.log(`URL: ${grant.url}`);
console.log("---");
}
// Partial provider failures are reported in errors array
result.errors?.forEach((error) => {
console.log(`Provider ${error.provider} failed: ${error.message}`);
});
Supported Grant Providers:
- "grants-gov" - Grants.gov (US federal grants: NIH, NSF, DOE, DOD, etc.)
- "nih-reporter" - NIH RePORTER (NIH-funded research projects)
- "cordis" - CORDIS (EU-funded projects: Horizon Europe, ERC, etc.)
- "ukri" - UKRI (UK-funded projects: EPSRC, BBSRC, MRC, etc.)
Search Parameters:
- query (required): 1-400 characters
- providers: Array of grant provider names (default: all 4 providers)
- offset: Pagination offset (default: 0)
- limit: Results per provider, 1-50 (default: 10)
- fundingMin / fundingMax: Filter by funding amount
- deadlineFrom / deadlineTo: Filter by deadline date (ISO format, e.g. "2026-01-01")
- fields: Specific fields to return ("sourceId", "title", "url", "agency", "program", "description", "eligibility", "fundingAmountMin", "fundingAmountMax", "currency", "deadlineDate", "openDate", "closeDate", "grantType", "region", "keywords", "piName", "organizationName", "provider", "providerData")
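Because the deadline filters expect ISO dates such as "2026-01-01", a small local check can catch malformed or impossible dates before a request is sent. The sketch below is illustrative only: `isIsoDate` is a hypothetical helper, and it assumes the plain YYYY-MM-DD form shown in the example; whether the API also accepts full timestamps is not stated here.

```typescript
// Hypothetical helper (not part of the SDK): accepts only real calendar
// dates written as YYYY-MM-DD.
function isIsoDate(value: string): boolean {
  if (!/^\d{4}-\d{2}-\d{2}$/.test(value)) return false;
  const parsed = new Date(`${value}T00:00:00Z`);
  if (Number.isNaN(parsed.getTime())) return false;
  // Round-trip to reject dates that a parser silently rolls over,
  // such as February 30.
  return parsed.toISOString().slice(0, 10) === value;
}
```

A caller might run this on deadlineFrom and deadlineTo and refuse to call client.academic.searchGrants when either check fails.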
Document ID Tracking
Pass a documentId per request to track API usage. The ID is sent as a header and returned in responses for document, identity, invoice, and bank statement endpoints. Academic endpoints do not use documentId.
const result = await client.document.parse(
{ url: "https://example.com/document.pdf" },
{ context: { documentId: "invoice-456" } },
);
console.log(result.documentId); // "invoice-456"
Each request can have its own documentId:
const [resultA, resultB] = await Promise.all([
client.document.parse(
{ url: "https://example.com/doc-a.pdf" },
{ context: { documentId: "doc-a" } },
),
client.document.parse(
{ url: "https://example.com/doc-b.pdf" },
{ context: { documentId: "doc-b" } },
),
]);
console.log(resultA.documentId); // "doc-a"
console.log(resultB.documentId); // "doc-b"
Model Tiers
| Tier | Best for | Max pages | Max size | Supported formats |
|---|---|---|---|---|
| nano | Simple text documents | 30 | 10MB | PDF, Word, Excel, CSV |
| mini | Tables, structured content | 30 | 10MB | PDF, Word, Excel, CSV |
| pro | Complex docs, images, handwriting | 30 | 40MB | PDF, Word, Excel, CSV, Image |
| max | Large docs, full capabilities, HTML output | 1000 | 500MB | PDF, Word, Excel, CSV, Image |
| auto | Automatic selection with fallback (default) | 1000 | 500MB | PDF, Word, Excel, CSV, Image |
Note: Identity, invoice, and bank statement parse only support pro, max, and auto models. Their ask and extract support all model tiers.
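The page, size, and format limits in the table can be turned into a simple lookup for choosing the lowest tier that is able to accept a given file. This is a sketch under stated assumptions: `lowestTierFor` and its input shape are hypothetical, and it considers only the hard limits from the table. It ignores content complexity (tables, handwriting), which is what the auto tier is meant to handle.

```typescript
// Hypothetical helper (not part of the SDK). Limits copied from the Model
// Tiers table, ordered from lowest to highest tier.
type Tier = "nano" | "mini" | "pro" | "max";

interface TierLimits {
  tier: Tier;
  maxPages: number;
  maxSizeMB: number;
  acceptsImages: boolean;
}

const TIER_LIMITS: TierLimits[] = [
  { tier: "nano", maxPages: 30, maxSizeMB: 10, acceptsImages: false },
  { tier: "mini", maxPages: 30, maxSizeMB: 10, acceptsImages: false },
  { tier: "pro", maxPages: 30, maxSizeMB: 40, acceptsImages: true },
  { tier: "max", maxPages: 1000, maxSizeMB: 500, acceptsImages: true },
];

interface DocumentInfo {
  pages: number;
  sizeMB: number;
  isImage: boolean;
}

// Returns the first tier whose limits fit, or undefined if none does.
function lowestTierFor(doc: DocumentInfo): Tier | undefined {
  const match = TIER_LIMITS.find(
    (limits) =>
      doc.pages <= limits.maxPages &&
      doc.sizeMB <= limits.maxSizeMB &&
      (!doc.isImage || limits.acceptsImages),
  );
  return match?.tier;
}
```

For example, a 10-page, 5MB PDF fits nano, while the same file as an image needs at least pro. A return value of undefined means the document exceeds every tier's limits.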
Credit Costs
| API | nano | mini | pro | max | Unit |
|---|---|---|---|---|---|
| Document Parse | 1 | 2 | 4 | 8 | /page |
| Document Ask | 2 | 4 | 8 | 16 | /page |
| Document Extract | 2 | 4 | 8 | 16 | /page |
| Identity Parse | — | — | 6 | 10 | /page |
| Identity Ask | 6 | 10 | 14 | 18 | /page |
| Identity Extract | 6 | 10 | 14 | 18 | /page |
| Invoice Parse | — | — | 6 | 10 | /page |
| Invoice Ask | 6 | 10 | 14 | 18 | /page |
| Invoice Extract | 6 | 10 | 14 | 18 | /page |
| Bank Statement Parse | — | — | 6 | 10 | /page |
| Bank Statement Ask | 6 | 10 | 14 | 18 | /page |
| Bank Statement Extract | 6 | 10 | 14 | 18 | /page |
| Academic Search | 2 | 2 | 2 | 2 | /request |
| Academic Fetch | 2 | 2 | 2 | 2 | /request |
| Academic Find Citations | 2 | 2 | 2 | 2 | /sentence |
| Academic Paper Graph | 2+ | 2+ | 2+ | 2+ | /request |
| Academic Similar Papers | 3 | 3 | 3 | 3 | /request |
| Grant Search | 2 | 2 | 2 | 2 | /request |
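For budgeting, the per-page rates for the general document endpoints can be multiplied out ahead of time. The following is a minimal sketch: `estimateDocumentCredits` is a hypothetical helper, the rates are copied from the Document Parse, Ask, and Extract rows above, and the auto tier is left out on the assumption that its cost depends on whichever tier ends up being used.

```typescript
// Hypothetical estimator (not part of the SDK). Rates are credits per page,
// taken from the Document rows of the Credit Costs table.
type Tier = "nano" | "mini" | "pro" | "max";
type Operation = "parse" | "ask" | "extract";

const DOCUMENT_CREDITS_PER_PAGE: Record<Operation, Record<Tier, number>> = {
  parse: { nano: 1, mini: 2, pro: 4, max: 8 },
  ask: { nano: 2, mini: 4, pro: 8, max: 16 },
  extract: { nano: 2, mini: 4, pro: 8, max: 16 },
};

function estimateDocumentCredits(
  operation: Operation,
  tier: Tier,
  pages: number,
): number {
  if (!Number.isInteger(pages) || pages < 0) {
    throw new RangeError("pages must be a non-negative integer");
  }
  return DOCUMENT_CREDITS_PER_PAGE[operation][tier] * pages;
}

// A 10-page parse on the pro tier: 4 credits/page * 10 pages.
console.log(estimateDocumentCredits("parse", "pro", 10)); // 40
```

The same table-driven shape would extend to the identity, invoice, and bank statement rows; actual billing is determined by the service, so treat the result as an estimate.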
Error Handling
All API errors are thrown as PDFVectorError instances with structured error information:
import { createClient, PDFVectorError } from "@pdfvector/instance-client";
const client = createClient({
apiKey: "your-api-key",
});
try {
const result = await client.document.parse({
url: "https://example.com/document.pdf",
});
console.log(result.markdown);
} catch (error) {
if (error instanceof PDFVectorError) {
console.error(`API Error [${error.code}]: ${error.message}`);
console.error(`HTTP Status: ${error.status}`);
console.error(`Error Data:`, error.data);
} else {
console.error("Unexpected Error:", error);
}
}
You can also use the static type guard:
try {
await client.document.parse({ url: "..." });
} catch (error) {
if (PDFVectorError.is(error)) {
switch (error.code) {
case "UNAUTHORIZED":
console.error("Invalid API key");
break;
case "BAD_REQUEST":
console.error("Validation error:", error.message);
break;
case "INTERNAL_SERVER_ERROR":
console.error("Server error:", error.message);
break;
}
}
}
Error Codes
| Code | Status | Description |
|---|---|---|
| BAD_REQUEST | 400 | Input validation failed (e.g., missing fields, invalid URL, question too short) |
| UNAUTHORIZED | 401 | Missing or invalid API key |
| UNPROCESSABLE_CONTENT | 422 | Document could not be processed by the requested model tier |
| TOO_MANY_REQUESTS | 429 | Rate limit exceeded |
| INTERNAL_SERVER_ERROR | 500 | Server-side failure |
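Some of these codes describe conditions that may clear up on a later attempt, which suggests a retry wrapper with backoff. The sketch below is one possible approach, not SDK behavior: `isRetryable`, `backoffDelayMs`, and `withRetry` are hypothetical names, treating TOO_MANY_REQUESTS and INTERNAL_SERVER_ERROR as retryable is an assumption, and the delay values are arbitrary. Errors are inspected structurally through their `code` property so the example runs without importing the package.

```typescript
// Hypothetical retry helpers (not part of the SDK).
// Assumption: only rate limits and server-side failures are worth retrying.
const RETRYABLE_CODES = new Set(["TOO_MANY_REQUESTS", "INTERNAL_SERVER_ERROR"]);

function isRetryable(code: string): boolean {
  return RETRYABLE_CODES.has(code);
}

// Exponential backoff: base, 2x base, 4x base, ... capped at capMs.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 8000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

async function withRetry<T>(
  call: () => Promise<T>,
  maxAttempts = 3,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (error) {
      // Read `code` defensively; thrown values are not always objects.
      const code = (error as { code?: unknown } | null | undefined)?.code;
      const retry = typeof code === "string" && isRetryable(code);
      if (!retry || attempt + 1 >= maxAttempts) throw error;
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```

Usage would look like `await withRetry(() => client.document.parse({ url }))`. BAD_REQUEST, UNAUTHORIZED, and UNPROCESSABLE_CONTENT are rethrown immediately, since repeating the same request is unlikely to change the outcome.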
TypeScript Support
The SDK is written in TypeScript and includes full type definitions:
import {
createClient,
PDFVectorError,
} from "@pdfvector/instance-client";
import type {
Client,
ClientContext,
CreateClientOptions,
ContractInputs,
ContractOutputs,
PDFVectorModel,
} from "@pdfvector/instance-client";
Runtime Support
- Node.js: 20+
- Bun: 1.0+
- ESM only (CommonJS is not supported)
- Uses the standard fetch API
Support
- API Reference: global.pdfvector.com/api/reference
- Dashboard: app.pdfvector.com
- Billing: app.pdfvector.com/workspace/billing
License
MIT