# GPT Research

GPT Research is an autonomous AI research agent that conducts comprehensive research on any topic, searches the web for real-time information, and generates detailed reports with proper citations.
Built with TypeScript and optimized for both local development and serverless deployment (Vercel, AWS Lambda, etc.).
## Features
- Multi-source Research: Integrates multiple search providers:
  - Tavily - AI-optimized search engine
  - Serper - Google Search API (2,500 free searches/month)
  - Google Custom Search - Direct Google integration
  - DuckDuckGo - Privacy-focused search
- Smart Web Scraping: Cheerio and Puppeteer for content extraction
- Multiple LLM Support: OpenAI, Anthropic, Google AI, Groq, and more
- MCP Integration: Model Context Protocol for external tool connections
- Various Report Types: Research, Detailed, Summary, Resource, Outline
- Streaming Support: Real-time updates via Server-Sent Events
- Vercel Optimized: Built for serverless deployment
- Memory Management: Tracks research context and history
- Cost Tracking: Monitor LLM usage and costs
## Quick Start

### Installation
```bash
npm install gpt-research
# or
yarn add gpt-research
# or
pnpm add gpt-research
```
### Configuration

Create a `.env` file in the root directory:
```bash
# Required
OPENAI_API_KEY=your-openai-api-key

# Optional Search Providers (at least one recommended)
TAVILY_API_KEY=your-tavily-api-key   # https://tavily.com (best for AI research)
SERPER_API_KEY=your-serper-api-key   # https://serper.dev (Google search, 2,500 free/month)
GOOGLE_API_KEY=your-google-api-key   # Google Custom Search
GOOGLE_CX=your-google-custom-search-engine-id

# Optional LLM Providers
ANTHROPIC_API_KEY=your-anthropic-api-key
GOOGLE_AI_API_KEY=your-google-ai-api-key
GROQ_API_KEY=your-groq-api-key
```
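Note that Node does not load `.env` automatically when you run a script directly. A minimal sketch using the `dotenv` package (whether gpt-research reads `.env` on its own is not documented here):

```javascript
// Load .env into process.env before constructing GPTResearch.
// Assumes dotenv is installed: npm install dotenv
require('dotenv').config();
```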
### Basic Usage
```javascript
const { GPTResearch } = require('gpt-research');
// or for TypeScript/ES modules:
// import { GPTResearch } from 'gpt-research';

async function main() {
  const researcher = new GPTResearch({
    query: 'What are the latest developments in quantum computing?',
    reportType: 'research_report',
    llmProvider: 'openai',
    apiKeys: {
      openai: process.env.OPENAI_API_KEY,
      tavily: process.env.TAVILY_API_KEY
    }
  });

  // Conduct research
  const result = await researcher.conductResearch();

  console.log(result.report);
  console.log(`Sources used: ${result.sources.length}`);
  console.log(`Cost: $${result.costs.total.toFixed(4)}`);
}

main().catch(console.error);
```
### Streaming Research
```javascript
const researcher = new GPTResearch(config);

// Stream research updates in real-time
for await (const update of researcher.streamResearch()) {
  switch (update.type) {
    case 'progress':
      console.log(`[${update.progress}%] ${update.message}`);
      break;
    case 'data':
      if (update.data?.reportChunk) {
        process.stdout.write(update.data.reportChunk);
      }
      break;
    case 'complete':
      console.log('\nResearch complete!');
      break;
  }
}
```
## Configuration Options
```typescript
interface ResearchConfig {
  // Required
  query: string;                 // Research query

  // Report Configuration
  reportType?: ReportType;       // Type of report to generate
  reportFormat?: ReportFormat;   // Output format (markdown, pdf, docx)
  tone?: Tone;                   // Writing tone

  // LLM Configuration
  llmProvider?: string;          // LLM provider (openai, anthropic, etc.)
  smartLLMModel?: string;        // Model for complex tasks
  fastLLMModel?: string;         // Model for simple tasks
  temperature?: number;          // Generation temperature
  maxTokens?: number;            // Max tokens per generation

  // Search Configuration
  defaultRetriever?: string;     // Default search provider
  maxSearchResults?: number;     // Max results per search

  // Scraping Configuration
  defaultScraper?: string;       // Default scraper (cheerio, puppeteer)
  scrapingConcurrency?: number;  // Concurrent scraping operations

  // API Keys
  apiKeys?: {
    openai?: string;
    tavily?: string;
    serper?: string;
    google?: string;
    anthropic?: string;
    groq?: string;
  };
}
```
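As a concrete illustration, a fuller configuration might look like the sketch below. The option names come from the interface above; the specific string values (retriever and scraper names, numeric limits) are illustrative assumptions rather than documented values:

```javascript
const { GPTResearch } = require('gpt-research');

// Illustrative configuration sketch. Option names follow ResearchConfig;
// the string and numeric values are assumptions to adapt to your setup.
const researcher = new GPTResearch({
  query: 'How do solid-state batteries compare to lithium-ion?',
  reportType: 'research_report',  // value used in Basic Usage above
  llmProvider: 'openai',
  temperature: 0.4,
  maxTokens: 4000,
  defaultRetriever: 'tavily',
  maxSearchResults: 8,
  defaultScraper: 'cheerio',
  scrapingConcurrency: 3,
  apiKeys: {
    openai: process.env.OPENAI_API_KEY,
    tavily: process.env.TAVILY_API_KEY
  }
});
```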
## Report Types
- ResearchReport: Comprehensive research with citations
- DetailedReport: In-depth analysis with extensive coverage
- QuickSummary: Concise overview of key points
- ResourceReport: Curated list of resources and references
- OutlineReport: Structured outline for further research
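The exact `reportType` strings for these names are not listed here; Basic Usage above uses `'research_report'`, and snake_case equivalents for the others are a reasonable guess to verify against the package's `ReportType` definition:

```javascript
// 'research_report' appears in Basic Usage; the identifier below is a
// guessed snake_case equivalent. Check ReportType to confirm it.
const researcher = new GPTResearch({
  query: 'State of WebAssembly',
  reportType: 'outline_report' // assumed identifier for OutlineReport
});
```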
## Search Providers

### Available Providers
| Provider | Best For | Free Tier | API Key Required |
|---|---|---|---|
| Tavily | AI-optimized research | 1,000/month | Yes - [Get Key](https://tavily.com) |
| Serper | Google search results | 2,500/month | Yes - [Get Key](https://serper.dev) |
| Google Custom Search | Custom search | 100/day | Yes - Setup |
| DuckDuckGo | Privacy-focused | Unlimited | No |
### Choosing the Right Provider
- Tavily: Best for AI research, academic papers, technical topics
- Serper: Best for current events, general web search, Google quality
- Google Custom Search: Best for specific domains, controlled results
- DuckDuckGo: Best for privacy-sensitive research, no API needed
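For example, to research without any search API key you could point `defaultRetriever` at DuckDuckGo, which the table above lists as requiring no key. A minimal sketch; the exact `'duckduckgo'` identifier is an assumption:

```javascript
// Keyless search sketch: DuckDuckGo needs no API key per the table above.
// The retriever identifier 'duckduckgo' is an assumed value.
const researcher = new GPTResearch({
  query: 'Privacy-preserving machine learning techniques',
  defaultRetriever: 'duckduckgo',
  apiKeys: { openai: process.env.OPENAI_API_KEY } // LLM key still required
});
```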
### Using Multiple Providers
```javascript
// Configure multiple providers for redundancy
const researcher = new GPTResearch({
  query: 'Your research topic',
  retrievers: ['tavily', 'serper'], // Falls back if one fails
  apiKeys: {
    tavily: process.env.TAVILY_API_KEY,
    serper: process.env.SERPER_API_KEY
  }
});
```
## MCP (Model Context Protocol) Support
GPT Research now supports MCP for connecting to external tools and services!
### What is MCP?
MCP (Model Context Protocol) is a standardized protocol for connecting AI systems to external tools and data sources. It enables seamless integration with various services through a unified interface.
### MCP Features
- Stdio MCP Servers - Local process spawning for NPX/binary tools (Node.js/Docker/VPS)
- HTTP MCP Servers - RESTful API connections (works everywhere including Vercel)
- WebSocket MCP - Real-time bidirectional communication (works everywhere)
- Tool Discovery - Automatic discovery of available tools from all server types
- Smart Selection - AI-powered tool selection based on research query
- Streaming Updates - Real-time progress tracking via SSE
- Mixed Mode - Combine stdio, HTTP, and WebSocket in the same application
### MCP Usage Examples

#### HTTP/WebSocket MCP (works everywhere, including Vercel)
```javascript
const researcher = new GPTResearch({
  query: "Latest AI developments",
  mcpConfigs: [
    {
      name: "research-tools",
      connectionType: "http",
      connectionUrl: "https://mcp.example.com",
      connectionToken: process.env.MCP_TOKEN
    }
  ],
  useMCP: true
});
```
#### Stdio MCP (local tools, Node.js environments)
```javascript
const researcher = new GPTResearch({
  query: "Analyze this codebase",
  mcpConfigs: [
    {
      name: "filesystem",
      connectionType: "stdio",
      command: "npx",
      args: ["@modelcontextprotocol/filesystem-server"],
      env: { READ_ONLY: "false" }
    },
    {
      name: "git",
      connectionType: "stdio",
      command: "git-mcp",
      args: ["--repo", "."]
    }
  ]
});
```
#### Mixed Mode (combine all connection types)
```javascript
const researcher = new GPTResearch({
  query: "Research topic",
  mcpConfigs: [
    // Local tools via stdio
    { name: "local-fs", connectionType: "stdio", command: "npx", args: ["fs-mcp"] },
    // Remote API via HTTP
    { name: "api", connectionType: "http", connectionUrl: "https://api.example.com/mcp" },
    // Real-time via WebSocket
    { name: "stream", connectionType: "websocket", connectionUrl: "wss://realtime.example.com" }
  ]
});
```
### MCP Deployment Compatibility
| MCP Type | Local/Node.js | Vercel | Docker | VPS/Cloud |
|---|---|---|---|---|
| HTTP Servers | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| WebSocket | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| Stdio | ✅ Full | ❌ Not Supported | ✅ Full | ✅ Full |
Stdio MCP Notes:
- Works perfectly in Node.js, Docker, VPS, and self-hosted environments
- Not supported on Vercel, AWS Lambda, or other serverless platforms
- For serverless deployments, use HTTP/WebSocket MCP or deploy a proxy server
### Popular Stdio MCP Servers
These MCP servers can be run locally via stdio:
```bash
# File System Access
npx @modelcontextprotocol/filesystem-server

# Git Repository Tools
npx @modelcontextprotocol/git-server

# Database Query Execution
npm install -g mcp-database
mcp-database

# Custom Python MCP Server
python -m mcp.server

# Shell Command Execution
cargo install mcp-shell
mcp-shell
```
### Learn More

- See `examples/demo-mcp.js` for the HTTP/WebSocket demo
- See `examples/demo-mcp-stdio.js` for the stdio demo
- Read `MCP.md` for implementation details
- Check the MCP Specification for protocol docs
## Vercel Deployment

### API Routes
Create API routes in your Next.js/Vercel project:
```javascript
// api/research/route.js
import { GPTResearch } from 'gpt-research';

export async function POST(request) {
  const { query, reportType } = await request.json();

  const researcher = new GPTResearch({
    query,
    reportType,
    apiKeys: {
      openai: process.env.OPENAI_API_KEY,
      tavily: process.env.TAVILY_API_KEY
    }
  });

  const result = await researcher.conductResearch();
  return Response.json(result);
}
```
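A client can then call this route with plain `fetch`; the `/api/research` path below is assumed from the route's file location above:

```javascript
// Client-side sketch: POST to the route defined above.
// The '/api/research' path is assumed from the route's file location.
const res = await fetch('/api/research', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    query: 'What are the latest developments in quantum computing?',
    reportType: 'research_report'
  })
});

const { report, sources } = await res.json();
console.log(report, `Sources: ${sources.length}`);
```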
### Streaming API
```javascript
// api/research/stream/route.js
import { GPTResearch } from 'gpt-research';

export async function POST(request) {
  const { query } = await request.json();
  const encoder = new TextEncoder();

  const stream = new ReadableStream({
    async start(controller) {
      const researcher = new GPTResearch({ query });

      for await (const update of researcher.streamResearch()) {
        // Encode to bytes: streamed Response bodies expect Uint8Array chunks
        controller.enqueue(encoder.encode(`data: ${JSON.stringify(update)}\n\n`));
      }
      controller.close();
    }
  });

  return new Response(stream, {
    headers: {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache',
      'Connection': 'keep-alive'
    }
  });
}
```
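Because this route expects a POST body, the browser's `EventSource` (which only issues GET requests) doesn't apply; one option is to read the response body stream directly. A minimal sketch, assuming the route is mounted at `/api/research/stream` and that each chunk arrives as whole `data: ...` events (real code should buffer partial chunks):

```javascript
// Consume the SSE-style stream produced by the route above.
// Assumes the route is mounted at '/api/research/stream'.
const res = await fetch('/api/research/stream', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ query: 'Latest AI developments' })
});

const reader = res.body.getReader();
const decoder = new TextDecoder();

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // Simplified parsing: assumes each read yields complete "data: ..." lines.
  for (const line of decoder.decode(value, { stream: true }).split('\n')) {
    if (line.startsWith('data: ')) {
      const update = JSON.parse(line.slice('data: '.length));
      console.log(update.type, update.message ?? '');
    }
  }
}
```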
### Environment Variables
Add to your Vercel project settings:
```bash
OPENAI_API_KEY=your-key
TAVILY_API_KEY=your-key
SERPER_API_KEY=your-key
```
## Examples
```bash
# Basic example
npm run example

# OpenAI-only example (no web search)
npm run example:simple

# Full research with Tavily web search
npm run example:tavily

# Research using Serper (Google Search API)
npm run example:serper
```
Check the `examples/` directory for more detailed usage examples.
## Use Cases
- Market Research: Analyze competitors, trends, and market opportunities
- Academic Research: Gather and synthesize information for papers and studies
- Content Creation: Research topics thoroughly for articles and blog posts
- Technical Documentation: Research technical topics and generate comprehensive guides
- Due Diligence: Conduct thorough research on companies, people, or topics
- News Aggregation: Gather and summarize news from multiple sources
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## License
MIT License - see LICENSE file for details.
## Performance Considerations
- Token Limits: Automatically manages context within token limits
- Concurrent Operations: Configurable concurrency for searches and scraping
- Cost Optimization: Uses appropriate models for different tasks
- Caching: Caches scraped content to avoid redundant operations
- Memory Management: Efficient in-memory storage with export/import capabilities
## Security
- API Key Management: Never commit API keys to version control
- Input Validation: All URLs and inputs are validated
- Rate Limiting: Built-in rate limiting for API calls
- Error Handling: Comprehensive error handling and recovery
## Roadmap
- Add multi-language support
- Add more LLM providers (Cohere, Together AI)
- Implement research templates
- Add PDF and DOCX report export
## Tips
- Use Tavily for best results - It's specifically designed for AI research
- Configure multiple search providers - Automatic fallback ensures reliability
- Adjust concurrency based on your limits - Prevent rate limiting
- Use streaming for long research - Better user experience
- Monitor costs - Track LLM usage to manage expenses
## Troubleshooting

### Common Issues
**Build Errors**: Make sure you have Node.js 18+ and run `npm install`.

**API Key Errors**: Verify your API keys are correct in `.env`.

**Rate Limiting**: Reduce `scrapingConcurrency` and `maxSearchResults`.

**Memory Issues**: For large research jobs, increase Node.js memory:

```bash
node --max-old-space-size=4096 your-script.js
```
## Support
- Issues: GitHub Issues
## Show Your Support
If you find GPT Research helpful, please consider:
- Giving us a star on GitHub
- Sharing with your network
- Contributing to the project
Built with ❤️ by Pablo Schaffner

*Autonomous research for everyone*