MetaNova
Live Explorer
Try MetaNova Explorer:
https://meta-nova-explorer.vercel.app/
MetaNova is a modular JavaScript and TypeScript library for extracting, analyzing, scoring, and normalizing metadata from web pages and public URLs.
It is designed for link previews, bots, bookmark managers, search systems, AI agents, browser extensions, dashboards, CMS integrations, and content aggregation platforms.
Features
- Open Graph, Twitter Cards, JSON-LD, Schema.org, oEmbed discovery, standard HTML metadata, canonical URLs, favicons, images, videos, and audio.
- Unified typed JSON output with diagnostics.
- Embedded data extraction from __NEXT_DATA__, Nuxt payloads, window state objects, Apollo-like caches, and JSON script blobs.
- MediaDiscoveryEngine for deterministic image/video/audio candidate unification.
- Smart image scoring for deterministic bestImage selection with human-readable diagnostics.
- ConfidenceEngine and completeness scoring from 0 to 100.
- Redirect-aware fetch pipeline with timeouts, retries, byte limits, and SSRF protections.
- Browser-like request headers by default, with user override options.
- Site adapter layer for Reddit, Pinterest, Behance, YouTube, TikTok, X/Twitter, Facebook, and Instagram.
- Plugin API for custom extractors, adapters, and scorers.
- ESM and CommonJS builds with TypeScript declarations.
Install
npm install metanova
Usage
import { createPreviewCard, fetchMetadata } from "metanova";
const metadata = await fetchMetadata("https://example.com/article");
const card = createPreviewCard(metadata);
console.log(metadata.bestImage);
console.log(card);
For already-downloaded HTML:
import { parseMetadata } from "metanova";
const metadata = parseMetadata(html, "https://example.com/article");
Unified Output
{
  ok: true,
  url: "https://example.com/article",
  finalUrl: "https://example.com/article",
  type: "article",
  title: "...",
  description: "...",
  siteName: "...",
  canonicalUrl: "...",
  confidence: 94,
  completeness: 88,
  bestImage: "...",
  images: [],
  videos: [],
  audio: [],
  favicons: [],
  article: {},
  product: {},
  diagnostics: {
    statusCode: 200,
    contentType: "text/html; charset=utf-8",
    redirects: [],
    sourcesUsed: ["openGraph", "jsonLd", "html"],
    warnings: [],
    trace: ["downloaded page", "parsed Open Graph", "selected image from openGraph"],
    selectedImageReason: "Selected openGraph image with 1200x630, score 100.",
    extractedAt: "..."
  }
}
Public API
fetchMetadata(url, options)
parseMetadata(html, url, options)
parseMetadataAsync(html, url, options)
normalizeMetadata(rawSources)
extractOpenGraph(html)
extractTwitterCards(html)
extractJsonLd(html)
extractOEmbed(html, url)
extractEmbeddedData(html)
extractImages(html, baseUrl)
extractVideos(html, baseUrl)
extractAudio(html, baseUrl)
resolveUrl(url, baseUrl)
scoreImages(images)
discoverMedia(rawSources, finalUrl)
calculateConfidence(input)
calculateCompleteness(input)
createPreviewCard(metadata)
MetaNova.use(plugin)
Architecture
MetaNova is a layered extraction pipeline:
- URL validation, short-link detection, redirect resolution, and secure fetch.
- Browser-like download with realistic User-Agent, Accept, Accept-Language, and Accept-Encoding headers.
- Source extractors for Open Graph, Twitter Cards, JSON-LD, embedded application data, oEmbed, HTML metadata, and media tags.
- Site adapters for social and content platforms.
- MediaDiscoveryEngine unifies media candidates from every source.
- Image scoring ranks candidates and explains bestImage.
- ConfidenceEngine and completeness scoring measure result quality from 0 to 100.
- Normalization returns a stable JSON shape plus diagnostics and extraction trace.
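The image scoring step can be pictured with a small sketch. This is not MetaNova's scorer: the ImageCandidate shape, the source weights, and the size thresholds are all assumptions chosen for illustration. It only shows how a deterministic ranking can produce both a best image and a human-readable reason in the style of selectedImageReason.

```typescript
// Hypothetical candidate shape; MetaNova's real types may differ.
interface ImageCandidate {
  url: string;
  source: string; // e.g. "openGraph", "jsonLd", "html"
  width?: number;
  height?: number;
}

interface ScoredImage extends ImageCandidate {
  score: number;
}

// Assumed weights, picked only for this example.
const SOURCE_WEIGHT: Record<string, number> = {
  openGraph: 40,
  twitter: 35,
  jsonLd: 30,
  html: 10,
};

function scoreImage(image: ImageCandidate): number {
  let score = SOURCE_WEIGHT[image.source] ?? 0;
  const { width, height } = image;
  if (width !== undefined && height !== undefined && height > 0) {
    if (width >= 1200 && height >= 600) score += 40; // large preview size
    else if (width >= 600 && height >= 315) score += 20; // medium preview size
    const ratio = width / height;
    if (ratio >= 1.5 && ratio <= 2.1) score += 20; // close to a wide card ratio
  }
  return Math.min(score, 100);
}

function selectBestImage(candidates: ImageCandidate[]): { best?: ScoredImage; reason: string } {
  const ranked: ScoredImage[] = candidates
    .map((c) => ({ ...c, score: scoreImage(c) }))
    // Sort by score, then by URL, so ties always resolve the same way.
    .sort((a, b) => b.score - a.score || a.url.localeCompare(b.url));
  const best = ranked[0];
  if (!best) return { reason: "No image candidates found." };
  const size =
    best.width !== undefined && best.height !== undefined
      ? ` with ${best.width}x${best.height}`
      : "";
  return { best, reason: `Selected ${best.source} image${size}, score ${best.score}.` };
}
```

With these assumed weights, a 1200x630 openGraph candidate scores 100 and outranks a 300x300 html candidate, which scores 10.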
Adapters
Built-in adapters recover title, description, images, videos, author, publish date, and identifiers for Reddit, Pinterest, Behance, YouTube, TikTok, X/Twitter, Facebook, and Instagram. They use embedded app data and discovered media as fallbacks when Open Graph is weak.
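As a rough illustration of adapter matching and fallback, the sketch below pairs a hostname lookup with a "first non-empty value wins" picker that records which source supplied the field. The SiteAdapter and FieldCandidate shapes, the host lists, and both function names are hypothetical; the README does not describe how the built-in adapters are implemented.

```typescript
// Hypothetical adapter shape; not MetaNova's actual interface.
interface SiteAdapter {
  name: string;
  hosts: string[]; // domains the adapter handles, including their subdomains
}

// Assumed host lists, for illustration only.
const ADAPTERS: SiteAdapter[] = [
  { name: "redditAdapter", hosts: ["reddit.com", "redd.it"] },
  { name: "youtubeAdapter", hosts: ["youtube.com", "youtu.be"] },
];

// Match on the exact host or on any subdomain of it.
function matchAdapter(pageUrl: string): SiteAdapter | undefined {
  const host = new URL(pageUrl).hostname.toLowerCase();
  return ADAPTERS.find((adapter) =>
    adapter.hosts.some((h) => host === h || host.endsWith(`.${h}`)),
  );
}

interface FieldCandidate {
  source: string;
  value?: string;
}

// Return the first non-empty value in priority order, together with its source.
function pickField(candidates: FieldCandidate[]): { value: string; source: string } | undefined {
  for (const { source, value } of candidates) {
    const trimmed = value?.trim();
    if (trimmed) return { value: trimmed, source };
  }
  return undefined;
}
```

Listing Open Graph first and the adapter second means the adapter value is used only when the Open Graph value is missing or blank, which is one way to read "fallbacks when Open Graph is weak".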
Confidence Engine
confidence is an integer from 0 to 100. It considers title quality, description quality, image quality, canonical URL, structured data, adapter success, embedded data, and warnings.
completeness is also 0 to 100. It measures how many useful preview fields are present.
reliability is 0 to 100 and combines confidence, completeness, adapter success, media quality, and warnings.
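To make the three scores more concrete, here is a minimal sketch of a completeness calculation and a simplified reliability blend. The field list, the 0.6/0.4 weights, and the per-warning penalty are assumptions, and the blend deliberately ignores adapter success and media quality; it is not the formula ConfidenceEngine uses.

```typescript
// Assumed list of "useful preview fields"; the real list is not documented here.
const PREVIEW_FIELDS = ["title", "description", "siteName", "canonicalUrl", "bestImage"] as const;

type PreviewField = (typeof PREVIEW_FIELDS)[number];
type PreviewInput = Partial<Record<PreviewField, string>>;

// Completeness: share of preview fields that are present and non-blank, as 0 to 100.
function completenessOf(input: PreviewInput): number {
  const present = PREVIEW_FIELDS.filter((field) => (input[field] ?? "").trim().length > 0).length;
  return Math.round((present / PREVIEW_FIELDS.length) * 100);
}

// Reliability: a weighted blend with a penalty per warning, clamped to 0..100.
// Weights and penalty are illustrative assumptions.
function reliabilityOf(confidence: number, completeness: number, warningCount: number): number {
  const blended = 0.6 * confidence + 0.4 * completeness - 5 * warningCount;
  return Math.max(0, Math.min(100, Math.round(blended)));
}
```

Under these assumptions, a result with three of the five fields present has a completeness of 60.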
Media Discovery Engine
The engine searches Open Graph, Twitter Cards, JSON-LD, embedded app data, oEmbed, HTML images, srcset, picture, lazy-loaded attributes, video posters, social platform media, and fallback images. It resolves relative URLs, filters weak candidates, and deduplicates near-identical media.
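The resolve-and-deduplicate step can be sketched with the standard URL class. The function names are hypothetical, and this version only removes exact duplicates after normalization; the near-identical matching described above would need extra heuristics that are not shown.

```typescript
// Resolve a possibly relative media URL; return undefined for unusable input.
function resolveMediaUrl(raw: string, baseUrl: string): string | undefined {
  const trimmed = raw.trim();
  if (trimmed === "") return undefined; // an empty value would resolve to the page itself
  try {
    const resolved = new URL(trimmed, baseUrl);
    // Keep only fetchable web URLs; this drops data: and javascript: values.
    if (resolved.protocol !== "http:" && resolved.protocol !== "https:") return undefined;
    resolved.hash = ""; // fragments do not change the fetched resource
    return resolved.href;
  } catch {
    return undefined; // malformed URL
  }
}

// Resolve every candidate and keep the first occurrence of each URL, in order.
function dedupeMedia(rawUrls: string[], baseUrl: string): string[] {
  const seen = new Set<string>();
  const result: string[] = [];
  for (const raw of rawUrls) {
    const url = resolveMediaUrl(raw, baseUrl);
    if (url !== undefined && !seen.has(url)) {
      seen.add(url);
      result.push(url);
    }
  }
  return result;
}
```

Keeping first-seen order means the output depends only on the input order, which keeps the result deterministic.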
Diagnostics And Extraction Trace
Diagnostics include sourcesUsed, warnings, redirects, selectedImageReason, and trace.
Important fields also include source attribution:
{
  "sources": {
    "title": "jsonLd",
    "description": "openGraph",
    "author": "youtubeAdapter",
    "image": "twitter"
  },
  "diagnostics": {
    "adapter": {
      "matched": true,
      "name": "youtubeAdapter",
      "confidence": 95
    }
  }
}
[
  "downloaded page",
  "parsed Open Graph",
  "parsed JSON-LD",
  "adapter matched: redditAdapter",
  "selected image from redditAdapter (openGraph)"
]
Performance
MetaNova stays lightweight by default: no browser automation in core, bounded response size, request timeouts, retry controls, cache hooks, and mostly synchronous parsing for already-downloaded HTML.
Security Defaults
fetchMetadata blocks risky targets by default:
- localhost
- loopback IPs
- private network IPs
- link-local and reserved networks
- unsupported protocols
- oversized responses
- malicious redirect targets
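The address-based checks in this list could be approximated as follows. This is a sketch under stated assumptions, not the library's implementation: isBlockedTarget and parseIPv4 are hypothetical names, only IPv4 literals and the IPv6 loopback are covered, and a production check would also resolve DNS and re-validate every redirect hop.

```typescript
// Parse a dotted-quad IPv4 literal into four octets, or return undefined.
function parseIPv4(host: string): number[] | undefined {
  const parts = host.split(".");
  if (parts.length !== 4) return undefined;
  const octets = parts.map((p) => (/^\d{1,3}$/.test(p) ? Number(p) : NaN));
  return octets.every((o) => o >= 0 && o <= 255) ? octets : undefined;
}

// Return true when the URL should be refused under the assumed default policy.
function isBlockedTarget(rawUrl: string): boolean {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return true; // unparseable input is rejected
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return true; // unsupported protocol

  const host = url.hostname.toLowerCase();
  if (host === "localhost" || host.endsWith(".localhost")) return true;
  if (host === "[::1]") return true; // IPv6 loopback

  const ip = parseIPv4(host);
  if (!ip) return false; // a plain hostname; a real check would resolve it first

  const [a = 0, b = 0] = ip;
  if (a === 127) return true; // loopback 127.0.0.0/8
  if (a === 10) return true; // private 10.0.0.0/8
  if (a === 172 && b >= 16 && b <= 31) return true; // private 172.16.0.0/12
  if (a === 192 && b === 168) return true; // private 192.168.0.0/16
  if (a === 169 && b === 254) return true; // link-local 169.254.0.0/16
  if (a === 0) return true; // reserved 0.0.0.0/8
  return false;
}
```

Checking the parsed hostname rather than the raw string avoids being fooled by credentials or paths that merely contain a blocked name.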
You can opt into trusted internal targets:
await fetchMetadata("http://localhost:3000", {
  allowLocalhost: true,
  allowPrivateNetwork: true
});
Plugins
import { MetaNova, type MetaNovaPlugin } from "metanova";
const plugin: MetaNovaPlugin = {
  name: "custom-docs",
  setup(api) {
    api.addExtractor("docs", ({ $ }) => ({
      source: "docs",
      title: $("meta[name='doc:title']").attr("content"),
      siteName: "Docs"
    }));
    api.addImageScorer((image) => (image.url.includes("/hero/") ? 10 : 0));
  }
};
MetaNova.use(plugin);
Development
npm install
npm run typecheck
npm test
npm run lint
npm run build
Mock examples use local HTML fixtures and do not require network access:
node examples/reddit.mjs
node examples/pinterest.mjs
node examples/behance.mjs
node examples/youtube.mjs
node examples/diagnostics.mjs
Live network examples take URLs from arguments and intentionally do not embed validation URLs:
node examples/live-fetch.mjs https://example.com
node examples/youtube-video.mjs https://example.com
node examples/youtube-playlist.mjs https://example.com
node examples/social-preview.mjs https://example.com
For a full from-zero walkthrough, see USAGE_GUIDE.md.
Project Layout
src/
  adapters/
  diagnostics/
  extractors/
  fetcher/
  normalizers/
  plugins/
  scorers/
  types/
  utils/
  index.ts
Publishing
The package is configured with dual ESM/CommonJS exports, generated type declarations, source maps, an npm files allowlist, and Node engine constraints. Run npm run build before publishing.