JSPM

Found 98 results for content-extraction

firecrawl-mcp

MCP server for Firecrawl web scraping integration. Supports both cloud and self-hosted instances. Features include web scraping, search, batch processing, structured data extraction, and LLM-powered content analysis.

  • v3.5.2
  • 66.74
  • Published

defuddle

Extract article content and metadata from web pages.

  • v0.6.6
  • 61.41
  • Published

fetcher-mcp

MCP server for fetching web content using Playwright browser

  • v0.3.5
  • 54.04
  • Published

hyperbrowser-mcp

Hyperbrowser Model Context Protocol Server

  • v1.0.25
  • 49.94
  • Published

valyu-js

Search for AIs - DeepSearch and Content API.

  • v2.1.2
  • 45.79
  • Published

@bonginkan/maria

🚀 MARIA v4.4.8 - Enterprise AI Development Platform with identity system and character voice implementation. Features 74 production-ready commands with comprehensive fallback implementation, local LLM support, and zero external dependencies. Includes nat

  • v4.4.8
  • 43.48
  • Published

playread

Web content extraction and automation via Playwright MCP

  • v1.2.8
  • 38.46
  • Published

docusaurus-plugin-copy-page-button

Docusaurus plugin that adds a copy page button to extract documentation content as markdown for AI tools like ChatGPT and Claude

  • v0.3.5
  • 36.04
  • Published

@just-every/crawl

Fast, token-efficient web content extraction - fetch web pages and convert to clean Markdown

  • v1.0.8
  • 35.79
  • Published

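crawl-mcp-server

MCP server for scraping WeChat Official Account articles - supports automatic image downloading, content cleanup, and smart scraping, and can generate complete, localized Markdown documents

  • v1.1.8
  • 34.24
  • Published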

agentql-mcp

Model Context Protocol (MCP) server that integrates AgentQL data extraction capabilities.

  • v1.0.0
  • 32.01
  • Published

n8n-nodes-olyptik

n8n community node for Olyptik web crawling and content extraction API

  • v0.1.4
  • 31.54
  • Published

h2m-parser

LLM-ready HTML to Markdown pipeline with Readability, htmlparser2, and post-processing utilities.

  • v1.0.0
  • 31.18
  • Published

reelflow

Elegant and powerful Instagram video downloader for seamless content extraction

  • v5.6.5
  • 30.13
  • Published

js-harvester

Harvester is a lightweight and highly optimized JavaScript library for extracting data from the DOM tree. It supports extraction of tag texts with specified types and attributes. It's tiny, has no dependencies, and also works with Puppeteer

  • v0.3.14
  • 29.74
  • Published

mcp-web-content-pick

A tool for extracting structured content from web pages with customizable selectors and crawling options

  • v0.0.25
  • 29.32
  • Published

@pinkpixel/web-scout-mcp

MCP server for web search and content extraction with multiple URL support and memory optimizations

  • v1.5.5
  • 27.45
  • Published

@langgraph-js/crawler

A powerful web crawler designed specifically for LLM applications, capable of extracting clean, readable content from various web pages and converting it to Markdown format.

  • v1.7.0
  • 25.76
  • Published

linkd-mcp

Linkd Model Context Protocol Server

  • v1.0.25
  • 25.58
  • Published

doc-to-readable

Universal document-to-markdown and section splitter for HTML, URLs, and PDFs.

  • v1.5.3
  • 25.54
  • Published

fieldcraft-document-reader

An efficient React Native file reader library designed for comprehensive document handling with support for multiple file types and advanced content extraction capabilities

  • v1.0.0
  • 25.23
  • Published

webscraping-ai-mcp

Model Context Protocol server for WebScraping.AI API. Provides LLM-powered web scraping tools with Chromium JavaScript rendering, rotating proxies, and HTML parsing.

  • v1.0.2
  • 24.44
  • Published

@jtsang/fetcher-mcp

MCP server for fetching web content using Playwright browser

  • v0.3.3
  • 24.33
  • Published

@tyronerossjr/blog-scraper

Powerful web scraping SDK for extracting blog articles and content. No LLM required.

  • v0.1.0
  • 23.09
  • Published

@febbyrg/pdf-decomposer

A powerful PDF text and image extraction library with universal browser and Node.js support (Dual Licensed: Free for non-commercial, Paid for commercial use)

  • v1.0.5
  • 22.43
  • Published

@tyroneross/blog-scraper

Powerful web scraping SDK for extracting blog articles and content. No LLM required.

  • v0.1.0
  • 22.39
  • Published

tyroneross-blog-scraper

Powerful web scraping SDK for extracting blog articles and content. No LLM required.

  • v0.1.0
  • 22.36
  • Published

ethos-crawler

Web crawler and API for aggregating and serving digital rights organizations' publications.

  • v1.3.0
  • 22.10
  • Published

@supadata/mcp

MCP server for Supadata video & web scraping integration. Features include YouTube, TikTok, Instagram, Twitter, and file video transcription, web scraping, batch processing and structured data extraction.

  • v1.0.1
  • 21.12
  • Published

@iflow-mcp/firecrawl-mcp

MCP server for Firecrawl web scraping integration. Supports both cloud and self-hosted instances. Features include web scraping, batch processing, structured data extraction, and LLM-powered content analysis.

  • v1.12.0
  • 20.80
  • Published

askbudi-context

Provide up-to-date context about any library, built by askbudi.ai

  • v1.2.3
  • 20.31
  • Published

@clado-ai/mcp

Clado Model Context Protocol Server

  • v1.0.34
  • 20.08
  • Published

url-to-json-markdown

A TypeScript library that fetches URLs and converts them to structured JSON and Markdown format.

  • v1.0.7
  • 19.85
  • Published

web-fetch-mcp

MCP server for web content fetching, summarizing, comparing, and extracting information

  • v0.1.1
  • 19.85
  • Published

markdown-crawler

A powerful web crawler that extracts content from web pages and converts them to clean Markdown format, with support for code blocks and GitHub Flavored Markdown

  • v1.0.18
  • 19.40
  • Published

@cyanheads/jinaai-mcp-server

A Model Context Protocol (MCP) server that provides intelligent web reading capabilities using the Jina AI Reader API. It extracts clean, LLM-ready content from any URL.

  • v1.0.4
  • 19.16
  • Published

threads-harvester

A TypeScript library for extracting threaded content from discussion platforms like Reddit, Twitter, and Hacker News

  • v1.1.2
  • 18.17
  • Published

@stellarwp/archivist

A Bun-based tool for archiving web content as LLM context using Pure.md API

  • v0.1.0-beta.8
  • 17.75
  • Published

url-to-markdown-cli-tool

CLI tool for converting web pages to clean, LLM-friendly markdown. Fetches content from URLs and converts HTML to optimized markdown format perfect for LLM training, RAG systems, and AI applications.

  • v1.1.0
  • 17.29
  • Published

@cladoai/mcp

Clado Model Context Protocol Server

  • v1.0.27
  • 16.78
  • Published

crawl-to-markdown

Crawl-to-markdown is a powerful TypeScript package designed to search search engines for a given keyword, crawl the resulting websites, and deliver the content in clean, readable Markdown format. Additionally, it can directly crawl specified websites for

  • v1.0.1
  • 16.59
  • Published

llm-gen

A CLI tool to extract text from a static Next.js export and generate llm.txt for LLM ingestion.

  • v1.0.3
  • 16.10
  • Published

graby-ts

TypeScript version of Graby content extraction library

  • v1.1.0
  • 15.91
  • Published

scoopit

A tool that generates content files from website routes in multiple formats (text, JSON, markdown)

  • v2.1.2
  • 15.34
  • Published

@mseep/firecrawl-mcp

MCP server for Firecrawl web scraping integration. Supports both cloud and self-hosted instances. Features include web scraping, batch processing, structured data extraction, and LLM-powered content analysis.

  • v1.9.0
  • 15.16
  • Published

a1hul-mcp

MCP server for extracting content from web pages

  • v0.1.6
  • 14.86
  • Published

html-content-processor

A professional library for processing, cleaning, filtering, and converting HTML content to Markdown. Features advanced customization options, presets, plugin support, fluent API, and TypeScript integration for reliable content extraction.

  • v1.0.5
  • 14.64
  • Published

web-scraper-pro

Professional web scraper with Puppeteer & Mozilla Readability. Extract clean content from any website with full TypeScript support.

  • v1.1.1
  • 14.56
  • Published

@langgraph-js/crawler-mcp

A powerful web crawler designed specifically for LLM applications, capable of extracting clean, readable content from various web pages and converting it to Markdown format.

  • v1.5.3
  • 14.19
  • Published

@iflow-mcp/agentql-mcp

Model Context Protocol (MCP) server that integrates AgentQL data extraction capabilities.

  • v1.0.0
  • 13.81
  • Published

ohmyreader

A powerful web content extractor that converts articles to clean markdown

  • v0.1.1
  • 13.62
  • Published

cleanweb-mcp

A lightweight MCP server for extracting clean web content with intelligent content filtering and Markdown conversion

  • v1.0.1
  • 13.47
  • Published

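alou-fetch-mcp

MCP server for fetching web page content, supporting HTML, Markdown, plain text, and JSON output formats, with special optimizations for fetching WeChat Official Account articles and academic papers

  • v1.0.0
  • 13.47
  • Published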

next-llms-generator

Generate LLM-friendly text files from Next.js applications by crawling sitemaps and extracting content

  • v0.1.0
  • 12.91
  • Published

graby-ts-site-config

Site configuration loader for Graby-TS with dynamic imports

  • v1.1.1
  • 12.91
  • Published

youtube-scrap-mcp

MCP server for extracting YouTube video content with transcript processing.

  • v0.1.1
  • 12.90
  • Published

web-scraper-mcp

MCP server for scraping images and text from websites with comprehensive web content extraction capabilities

  • v1.1.3
  • 12.30
  • Published

medium-mcp-server

LLM-optimized MCP server for fetching and processing Medium articles

  • v1.0.0
  • 11.64
  • Published

@udx/mcurl

curl but in markdown - fetches content from URLs and converts to markdown

  • v0.3.1
  • 11.51
  • Published

simple-reader-mode

A lightweight alternative to Mozilla's Readability library for extracting readable content from web pages

  • v0.1.1
  • 11.20
  • Published

web-search-extract-mcp

A Model Context Protocol server for web search with content extraction

  • v1.0.1
  • 9.99
  • Published

@mseep/fetcher-mcp

MCP server for fetching web content using Playwright browser

  • v0.2.6
  • 9.98
  • Published

@mseep/agentql-mcp

Model Context Protocol (MCP) server that integrates AgentQL data extraction capabilities.

  • v1.0.0
  • 9.97
  • Published

iftp-client

IFTP Service JavaScript client library for browser integration

  • v1.0.0
  • 8.90
  • Published

defuddler

A command-line interface for extracting main content from web pages and articles

  • v1.0.1
  • 8.60
  • Published

node-merle

A utility for cataloguing the metadata for a URL

  • v0.0.1
  • 8.59
  • Published

vacuumjs

A low-level node.js web page content extractor based on `parse5`.

  • v1.0.1
  • 8.59
  • Published

web-content-extract-mcp

MCP server for extracting web content using web-content-extract library

  • v1.0.0
  • 8.14
  • Published

@philwrenn/fetcher-mcp

MCP server for fetching web content using Playwright browser

  • v0.3.1
  • 7.49
  • Published

@iflow-mcp/webscraping-ai-mcp-server

Model Context Protocol server for WebScraping.AI API. Provides LLM-powered web scraping tools with Chromium JavaScript rendering, rotating proxies, and HTML parsing.

  • v1.0.2
  • 0.00
  • Published

@iflow-mcp/fetcher-mcp

MCP server for fetching web content using Playwright browser

  • v0.3.5
  • 0.00
  • Published

search-agent

Oblien Search SDK - AI-powered web search, content extraction, and website crawling. Full documentation at https://oblien.com/docs/search-api

  • v1.0.2
  • 0.00
  • Published

@seokaka/mcp-search

Enhanced MCP Server for intelligent search with real-time data extraction, AI integration, and Vietnamese financial content support

  • v1.0.5
  • 0.00
  • Published