
cleanweb-mcp

1.0.1
    • License MIT

    A lightweight MCP server for extracting clean web content with intelligent content filtering and Markdown conversion

    Package Exports

      This package does not declare an "exports" field, so its exports have been automatically detected and optimized by JSPM. If any package subpath is missing, consider filing an issue with the original package (cleanweb-mcp) asking it to support the "exports" field. If that is not possible, create a JSPM override to customize the exports field for this package.

      Readme

      🌐 CleanWeb MCP


      A lightweight Model Context Protocol (MCP) server

      Specializes in intelligently extracting the core content of web pages, automatically filtering out ads and irrelevant elements, and converting the result to clean Markdown.

      🚀 Quick Start • 📖 Documentation • 🔧 Configuration • 🤝 Contributing

      ✨ Features

      • 🌐 Smart Extraction: Axios + Cheerio + Readability
      • 🧹 Content Cleaning: auto-filters ads and distractions
      • 📝 Format Conversion: HTML → Markdown
      • ⚡ Lightweight Deploy: zero browser dependencies
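The content-cleaning step can be pictured as pruning an element tree. The project does this on a Cheerio DOM; the sketch below instead uses a hypothetical minimal `El` type so it runs without dependencies, and the noise tag list and class/id pattern (`NOISE_TAGS`, `NOISE_PATTERN`) are illustrative assumptions, not the tool's actual rules.

```typescript
// Hypothetical minimal element tree; the real tool works on a Cheerio DOM.
interface El {
  tag: string;
  attrs?: { class?: string; id?: string };
  children?: (El | string)[];
}

// Tags that rarely carry article content (assumed list).
const NOISE_TAGS = new Set(["script", "style", "nav", "aside", "footer", "iframe", "form"]);
// Whole-word class/id fragments often used for ads and page chrome (assumed pattern).
const NOISE_PATTERN = /\b(ad|ads|advert|banner|sidebar|popup|cookie|share)\b/i;

function isNoise(el: El): boolean {
  if (NOISE_TAGS.has(el.tag)) return true;
  const hint = `${el.attrs?.class ?? ""} ${el.attrs?.id ?? ""}`;
  return NOISE_PATTERN.test(hint);
}

// Returns a copy of the tree with noisy elements (and their subtrees) removed.
function clean(el: El): El {
  const children = (el.children ?? [])
    .filter((c) => typeof c === "string" || !isNoise(c))
    .map((c) => (typeof c === "string" ? c : clean(c)));
  return { ...el, children };
}

// Concatenates the text nodes of a subtree, separated by spaces.
function textOf(node: El | string): string {
  return typeof node === "string" ? node : (node.children ?? []).map(textOf).join(" ");
}

const page: El = {
  tag: "body",
  children: [
    { tag: "nav", children: ["Home", "About"] },
    { tag: "div", attrs: { class: "ad-slot" }, children: ["Buy now!"] },
    { tag: "article", children: [{ tag: "p", children: ["Core content."] }] },
  ],
};

console.log(textOf(clean(page)));
```

Matching whole words keeps a class like `header` from being mistaken for an ad just because it contains the letters "ad".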

      🎯 Core Advantages

      • 🌐 Smart Content Extraction: uses Axios, Cheerio, and the Readability algorithm to extract the main content of a web page
      • 🧹 Intelligent Content Cleaning: automatically removes ads, navigation, sidebars, and other distracting elements
      • 📝 Markdown Conversion: converts HTML content to clean Markdown
      • 🖼️ Image Link Optimization: automatically handles overly long image links for better readability
      • ⚡ Lightweight Deployment: no browser dependencies, so deployment is simple and fast
      • 🔧 Multiple Output Formats: returns either pure Markdown or JSON with metadata
      • 🚀 MCP Protocol: fully compatible with the Model Context Protocol standard
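Readability-style extraction weighs several signals to locate the main content. One of them is link density: the share of a block's text that sits inside links, which is high for menus and footers and low for article prose. The sketch below is a simplified illustration that assumes text has already been collected per block and uses a hypothetical 0.5 cutoff; the actual algorithm combines this with other scores.

```typescript
// One Readability-style signal: link density (link text length / total text length).
interface Block {
  text: string;     // all visible text in the block
  linkText: string; // the portion of that text found inside <a> elements
}

function linkDensity(block: Block): number {
  const total = block.text.trim().length;
  if (total === 0) return 1; // treat empty blocks as pure noise
  return block.linkText.trim().length / total;
}

// Keeps blocks whose link density is below the (assumed) threshold.
function filterContentBlocks(blocks: Block[], maxDensity = 0.5): Block[] {
  return blocks.filter((b) => linkDensity(b) < maxDensity);
}

const blocks: Block[] = [
  { text: "Home About Contact Blog", linkText: "Home About Contact Blog" },
  { text: "The article body explains how the parser works, step by step.", linkText: "" },
  { text: "See also: the parser docs for details.", linkText: "parser docs" },
];

console.log(filterContentBlocks(blocks).map((b) => b.text));
```

The navigation block (density 1.0) is dropped, while the article paragraph and the short "See also" line (about 0.29) are kept.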

      🛠️ Tech Stack

      TypeScript, Node.js, Axios, Cheerio

      🚀 Quick Start

      📦 Installation

      # Install from npm
      npm install cleanweb-mcp
      
      # Or clone the repository
      git clone https://github.com/guangxiangdebizi/cleanweb-mcp.git
      cd cleanweb-mcp
      npm install

      💡 Advantage: a lightweight HTTP client is used, so no browser download is required and deployment is simpler. The focus is on content cleaning and optimization.
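Fetching a page without a browser comes down to a plain HTTP GET bounded by a timeout. The project uses Axios for this; the sketch below substitutes Node 18+'s built-in `fetch` with `AbortSignal.timeout` to stay dependency-free. `fetchHtml` and the `FetchLike` type are hypothetical names, and the fetch implementation is injectable so the example runs offline.

```typescript
// Minimal fetch signature this helper relies on; Node 18+'s global fetch satisfies it.
type FetchLike = (
  url: string,
  init?: { signal?: AbortSignal; headers?: Record<string, string> },
) => Promise<{ ok: boolean; status: number; text(): Promise<string> }>;

// Fetches raw HTML. AbortSignal.timeout aborts the request after `timeoutMs`,
// which makes the fetch promise reject instead of hanging on slow sites.
async function fetchHtml(
  url: string,
  timeoutMs = 30000,
  fetchImpl: FetchLike = fetch,
): Promise<string> {
  const res = await fetchImpl(url, {
    signal: AbortSignal.timeout(timeoutMs),
    headers: { Accept: "text/html" },
  });
  if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`);
  return res.text();
}

// Stub standing in for the network so the example runs without connectivity.
const stubFetch: FetchLike = async () => ({
  ok: true,
  status: 200,
  text: async () => "<p>Hello</p>",
});

fetchHtml("https://example.com/article", 5000, stubFetch).then((html) => console.log(html));
```

In real use, the injected argument is omitted so the global `fetch` performs the request. Any JavaScript-rendered content is absent from the returned HTML, which is why this approach suits static pages.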

      🔧 Build Project

      npm run build

      🎯 Usage

      1. Stdio Mode (Local Development)

      npm run mcp:stdio
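In stdio mode, the MCP stdio transport exchanges JSON-RPC messages as newline-delimited JSON over the server's stdin and stdout. A client reading stdout may receive a message split across chunks, so it has to buffer until a newline arrives. `LineDecoder` below is a hypothetical client-side helper illustrating that framing; it is not part of cleanweb-mcp.

```typescript
// Accumulates stdout chunks and returns complete JSON-RPC messages.
// Assumes newline-delimited framing, as used by the MCP stdio transport.
class LineDecoder {
  private buffer = "";

  // Feeds one chunk; returns every complete message it completed.
  push(chunk: string): unknown[] {
    this.buffer += chunk;
    const messages: unknown[] = [];
    let newline: number;
    while ((newline = this.buffer.indexOf("\n")) !== -1) {
      const line = this.buffer.slice(0, newline).trim();
      this.buffer = this.buffer.slice(newline + 1);
      if (line.length > 0) messages.push(JSON.parse(line));
    }
    return messages;
  }
}

const decoder = new LineDecoder();
// A response split across two chunks, as can happen with pipe reads.
const first = decoder.push('{"jsonrpc":"2.0","id":1,');
const second = decoder.push('"result":{}}\n');
console.log(first.length, second.length); // 0 1
```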

      2. SSE Mode (via Supergateway)

      npm run mcp:sse

      The server will start at http://localhost:3100/sse
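Supergateway exposes the stdio server over Server-Sent Events. Under the MCP HTTP+SSE transport (2024-11-05 revision), the server first sends an `endpoint` event naming the URL the client should POST messages to, then delivers responses as `message` events. The simplified parser below follows the SSE text format (`event:` and `data:` fields, a blank line ending each event); it ignores `id:` and `retry:`, and the `/message?sessionId=abc` path is a made-up example.

```typescript
interface SseEvent {
  event: string; // "message" when no event: field was sent
  data: string;  // multiple data: lines are joined with "\n"
}

// Parses an in-memory SSE text block (simplified: id:, retry: and comments ignored).
function parseSse(text: string): SseEvent[] {
  const events: SseEvent[] = [];
  let event = "";
  let data: string[] = [];
  for (const rawLine of text.split(/\r?\n/)) {
    if (rawLine === "") {
      // A blank line dispatches the pending event, if it carried any data.
      if (data.length > 0) events.push({ event: event || "message", data: data.join("\n") });
      event = "";
      data = [];
      continue;
    }
    const colon = rawLine.indexOf(":");
    if (colon === 0) continue; // comment line
    const field = colon === -1 ? rawLine : rawLine.slice(0, colon);
    let value = colon === -1 ? "" : rawLine.slice(colon + 1);
    if (value.startsWith(" ")) value = value.slice(1); // one leading space is dropped
    if (field === "event") event = value;
    else if (field === "data") data.push(value);
  }
  return events;
}

const stream =
  "event: endpoint\ndata: /message?sessionId=abc\n\n" +
  'event: message\ndata: {"jsonrpc":"2.0","id":1,"result":{}}\n\n';

console.log(parseSse(stream).map((e) => e.event)); // [ 'endpoint', 'message' ]
```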

      3. WebSocket Mode

      npm run mcp:ws

      4. Development Mode (Watch file changes)

      npm run mcp:dev

      🛠️ Claude Configuration

      Stdio Mode Configuration

      Add the following to Claude's configuration file:

      {
        "mcpServers": {
          "cleanweb-mcp": {
            "command": "node",
            "args": ["path/to/your/project/build/index.js"]
          }
        }
      }
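The `args` entry above is a placeholder path. Because the client may not launch the server from the project directory, an absolute path is the safer choice (an assumption, not a documented requirement). This sketch generates the stdio entry with `path.resolve`; `buildClaudeConfig` and the `./cleanweb-mcp` directory are hypothetical.

```typescript
import { resolve } from "node:path";

// Builds the stdio server entry, resolving build/index.js to an absolute path.
function buildClaudeConfig(projectDir: string) {
  return {
    mcpServers: {
      "cleanweb-mcp": {
        command: "node",
        args: [resolve(projectDir, "build", "index.js")],
      },
    },
  };
}

// "./cleanweb-mcp" stands in for wherever the repository was cloned.
console.log(JSON.stringify(buildClaudeConfig("./cleanweb-mcp"), null, 2));
```

The printed JSON can be merged into the existing `mcpServers` object of Claude's configuration file.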

      SSE Mode Configuration

      {
        "mcpServers": {
          "cleanweb-mcp-sse": {
            "type": "sse",
            "url": "http://localhost:3100/sse",
            "timeout": 600
          }
        }
      }

      🔨 API Reference

      extract_web_content

      Extracts the main content of a web page and converts it to Markdown.

      Parameters

      Parameter | Type   | Required | Default  | Description
      url       | string | ✅ Yes   | -        | The web URL to extract content from
      format    | string | ❌ No    | markdown | Return format: markdown or json
      timeout   | number | ❌ No    | 30000    | Page loading timeout (milliseconds)
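The parameter table implies a small normalization step: `url` is required, while `format` and `timeout` fall back to `markdown` and `30000`. `normalizeArgs` below is a hypothetical helper mirroring the table, not the tool's actual code; restricting URLs to http(s) is an added assumption.

```typescript
type OutputFormat = "markdown" | "json";

interface ExtractArgs {
  url: string;
  format: OutputFormat;
  timeout: number;
}

function isOutputFormat(value: unknown): value is OutputFormat {
  return value === "markdown" || value === "json";
}

// Applies the documented defaults and rejects invalid arguments.
function normalizeArgs(input: Record<string, unknown>): ExtractArgs {
  const { url, format = "markdown", timeout = 30000 } = input;
  if (typeof url !== "string") {
    throw new TypeError("url is required and must be a string");
  }
  // new URL() throws on malformed input; limiting to http(s) is an assumption.
  const { protocol } = new URL(url);
  if (protocol !== "http:" && protocol !== "https:") {
    throw new TypeError(`unsupported protocol: ${protocol}`);
  }
  if (!isOutputFormat(format)) {
    throw new TypeError('format must be "markdown" or "json"');
  }
  if (typeof timeout !== "number" || !Number.isFinite(timeout) || timeout <= 0) {
    throw new TypeError("timeout must be a positive number of milliseconds");
  }
  return { url, format, timeout };
}

// Only url is supplied, so format and timeout take their defaults.
console.log(normalizeArgs({ url: "https://example.com/article" }));
```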

      Usage Examples

      // Basic usage
      extract_web_content({
        url: "https://example.com/article"
      })
      
      // Advanced usage
      extract_web_content({
        url: "https://example.com/article",
        format: "json",
        timeout: 60000
      })
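The calls above are shorthand. On the wire, an MCP client invokes a tool by sending a JSON-RPC 2.0 `tools/call` request whose `params` carry the tool `name` and its `arguments`, after the usual `initialize` handshake has completed. This sketch builds that request for the advanced example; `toolCall` is a hypothetical helper and the `id` value is arbitrary.

```typescript
// Shape of an MCP "tools/call" JSON-RPC request.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function toolCall(id: number, name: string, args: Record<string, unknown>): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

const request = toolCall(2, "extract_web_content", {
  url: "https://example.com/article",
  format: "json",
  timeout: 60000,
});

// In stdio mode, this line (plus "\n") is what gets written to the server's stdin.
console.log(JSON.stringify(request));
```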

      📁 Project Structure

      cleanweb-mcp/
      ├── 📄 README.md                 # Project documentation
      ├── 📦 package.json              # Project configuration
      ├── ⚙️ tsconfig.json             # TypeScript configuration
      ├── 🔧 claude-config-example.json # Claude configuration example
      ├── 📖 example-usage.md          # Usage examples
      ├── 🏗️ build/                    # Compiled output
      │   ├── index.js
      │   └── tools/
      │       └── web-content-extractor.js
      └── 📁 src/                      # Source code
          ├── index.ts                 # MCP server main entry
          └── tools/
              └── web-content-extractor.ts # Web content extraction tool
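A tool such as the one in `src/tools/web-content-extractor.ts` is advertised to clients through a descriptor returned by `tools/list`, with the fields `name`, `description`, and a JSON Schema `inputSchema`. The descriptor below is reconstructed from the parameter table as an illustration; it is not copied from the project's source.

```typescript
// Hypothetical descriptor for extract_web_content, derived from the parameter table.
const extractWebContentTool = {
  name: "extract_web_content",
  description: "Extract the main content of a web page and convert it to Markdown.",
  inputSchema: {
    type: "object",
    properties: {
      url: { type: "string", description: "The web URL to extract content from" },
      format: { type: "string", enum: ["markdown", "json"], default: "markdown" },
      timeout: { type: "number", default: 30000, description: "Timeout in milliseconds" },
    },
    required: ["url"],
  },
} as const;

console.log(Object.keys(extractWebContentTool.inputSchema.properties)); // [ 'url', 'format', 'timeout' ]
```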

      🔄 Migration from Express Server

      The original Express server (server.js) can still run independently:

      npm start

      The MCP version provides the same core functionality but integrates with AI assistants through the MCP protocol.

      🚨 Important Notes

      1. Lightweight Implementation: an HTTP client fetches static content, so no browser dependencies are required
      2. Network Access: the server must be able to reach the target websites
      3. Static Content: best suited to static HTML; content rendered dynamically in the browser may not be captured
      4. Timeout Settings: for slow-loading websites, increase the timeout parameter
      5. Content Optimization: image link display is optimized automatically for better readability
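The README does not specify how long image links are handled. One possible approach, assumed here rather than taken from the project, is to move image URLs above a length threshold into reference-style Markdown definitions, so the body stays readable while the links remain valid. `shortenImageLinks` and its 80-character default are hypothetical.

```typescript
// Moves image URLs longer than maxLength into reference-style definitions.
// Simplified: URLs containing ")" or whitespace are not matched.
function shortenImageLinks(markdown: string, maxLength = 80): string {
  const refs: string[] = [];
  const body = markdown.replace(
    /!\[([^\]]*)\]\(([^)\s]+)\)/g,
    (match: string, alt: string, url: string) => {
      if (url.length <= maxLength) return match; // short links stay inline
      refs.push(url);
      return `![${alt}][img${refs.length}]`;
    },
  );
  if (refs.length === 0) return markdown;
  const definitions = refs.map((url, i) => `[img${i + 1}]: ${url}`).join("\n");
  return `${body}\n\n${definitions}`;
}

const longUrl = "https://cdn.example.com/images/" + "x".repeat(100) + ".png";
console.log(shortenImageLinks(`Intro\n\n![Diagram](${longUrl})\n\n![Logo](https://example.com/a.png)`));
```

A production version would also need to avoid clashing with reference labels already present in the document.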

      🤝 Contributing

      Issues and pull requests are welcome! If you have any questions or suggestions, feel free to get in touch.

      📞 Contact

      📄 License

      MIT License. See the LICENSE file for details.