@podx/scraper

2.0.2
  • License ISC

🔍 Advanced Twitter/X scraping, bot detection, and crypto analysis toolkit for PODx

Package Exports

  • @podx/scraper
  • @podx/scraper/analyzers
  • @podx/scraper/auth
  • @podx/scraper/crypto
  • @podx/scraper/scrapers
  • @podx/scraper/services
  • @podx/scraper/signals

Readme

@podx/scraper

The scraper package collects Twitter/X data, analyzes it, and generates signals for the PODx ecosystem. It covers account, search, and reply scraping; bot detection; sentiment analysis; cryptocurrency token extraction; and trading signal generation.

📦 Installation

# Install from workspace
bun add @podx/scraper@workspace:*

# Or install from npm (when published)
bun add @podx/scraper

🏗️ Architecture

The scraper package is organized into several specialized modules:

packages/scraper/src/
├── scrapers/         # Core scraping functionality
│   ├── baseScraper.ts    # Base scraper with authentication
│   ├── searchScraper.ts  # Search-based tweet scraping
│   ├── commentScraper.ts # Comment/reply scraping
│   └── index.ts          # Scraper exports
├── services/         # Service layer
│   └── index.ts          # Main ScraperService
├── auth/             # Authentication handling
├── analyzers/        # Data analysis modules
│   ├── BotDetector.ts    # Bot detection algorithms
│   ├── SentimentAnalyzer.ts # Sentiment analysis
│   ├── SignalGenerator.ts # Trading signal generation
│   ├── TokenExtractor.ts # Cryptocurrency token extraction
│   └── index.ts          # Analyzer exports
├── crypto/           # Cryptocurrency analysis
├── signals/          # Signal processing and generation
├── types/            # TypeScript type definitions
└── index.ts          # Main exports

🚀 Quick Start

import { ScraperService, SentimentAnalyzer, TokenExtractor } from '@podx/scraper';

// Initialize scraper service
const scraper = new ScraperService();

// Scrape tweets from a user
const tweets = await scraper.scrapeAccount({
  targetUsername: 'cryptowhale',
  maxTweets: 100,
  progressCallback: (progress) => {
    console.log(`Scraped ${progress.count}/${progress.max} tweets`);
  }
});

// Analyze sentiment
const analyzer = new SentimentAnalyzer();
const sentiment = await analyzer.analyze(tweets);

// Extract cryptocurrency tokens
const extractor = new TokenExtractor();
const tokens = await extractor.extract(tweets);

// Save results
const result = await scraper.saveTweetsToFile(tweets, 'cryptowhale');
console.log(`Saved ${tweets.length} tweets to ${result.filename}`);

🔐 Authentication

The scraper supports multiple authentication methods for Twitter/X API access:

Environment Variables Setup

# Required credentials
export XSERVE_USERNAME="your_twitter_username"
export XSERVE_PASSWORD="your_twitter_password"
export XSERVE_EMAIL="your_email@example.com"  # Optional, for account recovery
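
A missing variable is easier to diagnose before any scraping starts than in the middle of a run. The following is a minimal sketch of a pre-flight check. `readCredentials`, `Credentials`, and `Env` are hypothetical helpers, not part of `@podx/scraper`; only the three variable names come from the setup above.

```typescript
// Hypothetical helper: not part of @podx/scraper.
interface Credentials {
  username: string;
  password: string;
  email?: string; // optional, mirrors XSERVE_EMAIL
}

type Env = Record<string, string | undefined>;

function readCredentials(env: Env): Credentials {
  // Collect the names of required variables that are unset or blank.
  const missing = ['XSERVE_USERNAME', 'XSERVE_PASSWORD'].filter(
    (name) => (env[name] ?? '').trim() === ''
  );
  if (missing.length > 0) {
    // Report variable names only; never echo credential values.
    throw new Error(`Missing required environment variables: ${missing.join(', ')}`);
  }
  const email = (env.XSERVE_EMAIL ?? '').trim();
  return {
    username: env.XSERVE_USERNAME as string,
    password: env.XSERVE_PASSWORD as string,
    ...(email !== '' ? { email } : {}),
  };
}
```

In a Bun or Node process this could be called once at startup as `readCredentials(process.env)`, before constructing `ScraperService`.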

Authentication Flow

import { ScraperService } from '@podx/scraper';

const scraper = new ScraperService();

// Authentication happens automatically on first API call
try {
  const tweets = await scraper.scrapeAccount({
    targetUsername: 'example',
    maxTweets: 10
  });

  console.log('Authentication successful!');
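} catch (error) {
  // `error` is `unknown` in strict TypeScript, so narrow it before reading `code`
  const code = (error as { code?: string } | null)?.code;
  if (code === 'AUTHENTICATION_FAILED') {
    console.error('Please check your Twitter credentials');
  } else {
    throw error; // surface unrelated failures instead of swallowing them
  }
}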

📊 Core Scraping Features

Account Scraping

import { ScraperService } from '@podx/scraper';

const scraper = new ScraperService();

// Scrape tweets from a specific account
const tweets = await scraper.scrapeAccount({
  targetUsername: 'VitalikButerin',
  maxTweets: 500,
  progressCallback: (progress) => {
    const percent = Math.round((progress.count / progress.max) * 100);
    console.log(`Progress: ${percent}% (${progress.count}/${progress.max})`);
  }
});

// Process scraped tweets
tweets.forEach(tweet => {
  console.log(`@${tweet.username}: ${tweet.text}`);
  console.log(`Likes: ${tweet.likes}, Retweets: ${tweet.retweets}`);
});

Search-Based Scraping

import { SearchScraper } from '@podx/scraper/scrapers';

// Search for tweets with specific criteria
const searchScraper = new SearchScraper();

const tweets = await searchScraper.search({
  query: 'bitcoin OR ethereum',
  maxTweets: 1000,
  filters: {
    language: 'en',
    dateRange: {
      from: new Date('2024-01-01'),
      to: new Date('2024-01-31')
    },
    minLikes: 10,
    minRetweets: 5
  }
});

Comment/Reply Scraping

import { CommentScraper } from '@podx/scraper/scrapers';

const commentScraper = new CommentScraper();

// Scrape replies to a specific tweet
const replies = await commentScraper.scrapeComments({
  tweetId: '1234567890123456789',
  maxReplies: 200,
  includeNested: true  // Include replies to replies
});

// Analyze conversation threads
const threads = commentScraper.buildConversationThreads(replies);

🧠 Advanced Analysis

Bot Detection

import { BotDetector } from '@podx/scraper/analyzers';

const detector = new BotDetector();

// Analyze tweets for bot-like behavior
const analysis = await detector.analyze(tweets);

analysis.results.forEach(result => {
  console.log(`@${result.username}: ${result.botProbability}% bot probability`);
  console.log(`Reasons: ${result.reasons.join(', ')}`);
});

// Filter out likely bots
const humanTweets = analysis.results
  .filter(result => result.botProbability < 30)
  .map(result => result.tweet);

Sentiment Analysis

import { SentimentAnalyzer } from '@podx/scraper/analyzers';

const sentimentAnalyzer = new SentimentAnalyzer();

// Analyze sentiment of tweets
const sentimentResults = await sentimentAnalyzer.analyze(tweets);

sentimentResults.forEach(result => {
  console.log(`Tweet: ${result.tweet.text}`);
  console.log(`Sentiment: ${result.sentiment} (${result.confidence}%)`);
  console.log(`Emotions: ${result.emotions.join(', ')}`);
});

// Aggregate sentiment over time
const timeSeries = sentimentAnalyzer.createSentimentTimeSeries(sentimentResults);
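
`createSentimentTimeSeries` is not documented further in this README. As a rough illustration of what aggregating sentiment over time can mean, the sketch below averages `score` per UTC day. `ScoredItem`, `DailyPoint`, and `dailyAverageScore` are hypothetical stand-ins that use only the `score` and `tweet.createdAt` fields from the Data Types section; they make no claim to match the package's own implementation.

```typescript
// Hypothetical stand-in for the fields of SentimentResult used here.
interface ScoredItem {
  score: number;
  tweet: { createdAt: Date };
}

interface DailyPoint {
  day: string;     // UTC date, YYYY-MM-DD
  average: number; // mean score for that day
  count: number;   // number of tweets contributing
}

function dailyAverageScore(items: ScoredItem[]): DailyPoint[] {
  const buckets: Record<string, { sum: number; count: number }> = {};
  for (const item of items) {
    // toISOString() is always UTC, so the first 10 characters are the UTC date.
    const day = item.tweet.createdAt.toISOString().slice(0, 10);
    const bucket = buckets[day] ?? (buckets[day] = { sum: 0, count: 0 });
    bucket.sum += item.score;
    bucket.count += 1;
  }
  // ISO dates sort chronologically when sorted as strings.
  return Object.keys(buckets)
    .sort()
    .map((day) => ({
      day,
      average: buckets[day].sum / buckets[day].count,
      count: buckets[day].count,
    }));
}
```

Bucketing by UTC day keeps results independent of the machine's time zone; a different bucket size (hourly, weekly) would only change how the key is derived.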

Cryptocurrency Token Extraction

import { TokenExtractor } from '@podx/scraper/analyzers';

const tokenExtractor = new TokenExtractor();

// Extract cryptocurrency mentions and addresses
const tokenResults = await tokenExtractor.extract(tweets);

tokenResults.forEach(result => {
  console.log(`Found ${result.tokens.length} tokens in tweet`);
  result.tokens.forEach(token => {
    console.log(`- ${token.symbol}: ${token.address ?? 'no address'} (${token.blockchain})`);
    console.log(`  Context: ${token.context}`);
    console.log(`  Confidence: ${token.confidence}%`);
  });
});

// Get trending tokens
const trending = tokenExtractor.getTrendingTokens(tokenResults, {
  timeframe: '24h',
  minMentions: 5
});
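
The `minMentions` option suggests a simple frequency threshold. The sketch below shows one way such a threshold could work: it counts mentions per symbol and keeps those at or above the minimum. `countMentions`, `MentionSource`, and `MentionCount` are hypothetical names, the `timeframe` option is ignored, and nothing here is taken from the package's implementation of `getTrendingTokens`.

```typescript
// Hypothetical stand-in for the parts of TokenResult used here.
interface MentionSource {
  tokens: { symbol: string }[];
}

interface MentionCount {
  symbol: string;
  mentions: number;
}

function countMentions(results: MentionSource[], minMentions: number): MentionCount[] {
  const counts: Record<string, number> = {};
  for (const result of results) {
    for (const token of result.tokens) {
      // Treat "eth" and "ETH" as the same token; every occurrence counts once.
      const symbol = token.symbol.toUpperCase();
      counts[symbol] = (counts[symbol] ?? 0) + 1;
    }
  }
  return Object.keys(counts)
    .map((symbol) => ({ symbol, mentions: counts[symbol] }))
    .filter((entry) => entry.mentions >= minMentions)
    // Most mentioned first; ties broken alphabetically for a stable order.
    .sort((a, b) => b.mentions - a.mentions || a.symbol.localeCompare(b.symbol));
}
```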

📈 Signal Generation

Trading Signals

import { SignalGenerator } from '@podx/scraper/analyzers';

const signalGenerator = new SignalGenerator();

// Generate trading signals from tweet analysis
const signals = await signalGenerator.generateSignals({
  tweets,
  sentimentResults,
  tokenResults,
  marketData: {
    btcPrice: 45000,
    ethPrice: 2500
  }
});

signals.forEach(signal => {
  console.log(`Signal: ${signal.type} for ${signal.token}`);
  console.log(`Strength: ${signal.strength}/10`);
  console.log(`Reason: ${signal.reason}`);
  console.log(`Confidence: ${signal.confidence}%`);
  console.log(`Timeframe: ${signal.timeframe}`);
});

// Filter high-confidence signals
const strongSignals = signals.filter(s => s.confidence > 80 && s.strength >= 7);

Market Sentiment Signals

// Generate market sentiment signals
const marketSignals = await signalGenerator.generateMarketSignals({
  tweets,
  sentimentData: sentimentResults,
  tokenData: tokenResults,
  marketContext: {
    overallSentiment: 'bullish',
    fearGreedIndex: 75,
    volume24h: 1250000000
  }
});

marketSignals.forEach(signal => {
  console.log(`Market Signal: ${signal.type}`);
  console.log(`Direction: ${signal.direction}`);
  console.log(`Strength: ${signal.strength}`);
  console.log(`Timeframe: ${signal.timeframe}`);
  console.log(`Rationale: ${signal.rationale}`);
});

🔧 Advanced Configuration

Custom Scraping Options

import { ScraperService } from '@podx/scraper';

// Advanced scraping with custom options
const scraper = new ScraperService();

const tweets = await scraper.scrapeAccount({
  targetUsername: 'crypto_influencer',
  maxTweets: 1000,
  filters: {
    minLikes: 10,
    minRetweets: 5,
    dateRange: {
      from: new Date('2024-01-01'),
      to: new Date('2024-01-31')
    },
    language: 'en',
    excludeReplies: false,
    excludeRetweets: true
  },
  rateLimit: {
    requestsPerMinute: 30,
    delayBetweenRequests: 2000
  },
  retryPolicy: {
    maxRetries: 3,
    backoffMultiplier: 2,
    initialDelay: 1000
  }
});
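
How `retryPolicy` is applied internally is not documented here. Under the usual exponential-backoff reading, the wait before retry `n` is `initialDelay * backoffMultiplier^(n-1)`, which for the values above gives 1000 ms, 2000 ms, and 4000 ms. The sketch below computes that schedule. `backoffDelays` is a hypothetical helper, and the formula is an assumption about the package's behavior.

```typescript
// Field names mirror the retryPolicy option above; the formula is an assumption.
interface RetryPolicy {
  maxRetries: number;
  backoffMultiplier: number;
  initialDelay: number; // milliseconds
}

// Returns the delay (ms) to wait before each retry attempt, in order.
function backoffDelays(policy: RetryPolicy): number[] {
  const delays: number[] = [];
  let delay = policy.initialDelay;
  for (let attempt = 0; attempt < policy.maxRetries; attempt++) {
    delays.push(delay);
    delay *= policy.backoffMultiplier;
  }
  return delays;
}

const schedule = backoffDelays({ maxRetries: 3, backoffMultiplier: 2, initialDelay: 1000 });
console.log(schedule.join(', ')); // 1000, 2000, 4000
```

Under this reading, a request that exhausts all three retries waits 7 seconds in total on top of the request time itself.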

Custom Analysis Pipeline

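import { 
  SentimentAnalyzer, 
  TokenExtractor, 
  BotDetector,
  SignalGenerator 
} from '@podx/scraper/analyzers';
// Assumption: the types from the Data Types section are exported from the package root
import type { Tweet, SentimentResult, TokenResult, Signal } from '@podx/scraper';

// Shape of the pipeline output (defined locally for this example)
interface AnalysisResult {
  originalTweetCount: number;
  humanTweets: number;
  sentiment: SentimentResult[];
  tokens: TokenResult[];
  signals: Signal[];
  analysisTimestamp: Date;
}

// Create custom analysis pipeline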
class CryptoAnalysisPipeline {
  constructor(
    private sentimentAnalyzer = new SentimentAnalyzer(),
    private tokenExtractor = new TokenExtractor(),
    private botDetector = new BotDetector(),
    private signalGenerator = new SignalGenerator()
  ) {}

  async analyze(tweets: Tweet[]): Promise<AnalysisResult> {
    // Step 1: Filter out bots
    const botAnalysis = await this.botDetector.analyze(tweets);
    const humanTweets = botAnalysis.results
      .filter(r => r.botProbability < 50)
      .map(r => r.tweet);

    // Step 2: Analyze sentiment
    const sentiment = await this.sentimentAnalyzer.analyze(humanTweets);

    // Step 3: Extract tokens
    const tokens = await this.tokenExtractor.extract(humanTweets);

    // Step 4: Generate signals
    const signals = await this.signalGenerator.generateSignals({
      tweets: humanTweets,
      sentimentResults: sentiment,
      tokenResults: tokens
    });

    return {
      originalTweetCount: tweets.length,
      humanTweets: humanTweets.length,
      sentiment,
      tokens,
      signals,
      analysisTimestamp: new Date()
    };
  }
}

// Use the pipeline
const pipeline = new CryptoAnalysisPipeline();
const result = await pipeline.analyze(tweets);

💾 Data Storage and Export

File Storage

import { ScraperService } from '@podx/scraper';

const scraper = new ScraperService();

// Scrape and save to file automatically
const result = await scraper.scrapeAndSave({
  targetUsername: 'cryptopunk',
  maxTweets: 200
});

console.log(`Saved ${result.tweets.length} tweets to ${result.filename}`);

// Custom file naming and organization
const customResult = await scraper.saveTweetsToFile(
  tweets, 
  'custom_username',
  {
    format: 'json',
    compress: true,
    includeMetadata: true,
    splitByDate: true
  }
);

Database Integration

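import { DatabaseService } from '@podx/core';
import { ScraperService } from '@podx/scraper';
import { SentimentAnalyzer, TokenExtractor } from '@podx/scraper/analyzers';

// `config` is your application's configuration object (not provided by this package)
const db = new DatabaseService(config.database);
const scraper = new ScraperService();
const analyzer = new SentimentAnalyzer();
const tokenExtractor = new TokenExtractor();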

// Scrape and store in database
const tweets = await scraper.scrapeAccount({
  targetUsername: 'defi_pulse',
  maxTweets: 100
});

// Store with analysis
for (const tweet of tweets) {
  const analysis = await analyzer.analyze([tweet]);
  const tokens = await tokenExtractor.extract([tweet]);

  await db.save('analyzed_tweets', {
    ...tweet,
    sentiment: analysis[0]?.sentiment,
    tokens: tokens[0]?.tokens || [],
    analyzedAt: new Date()
  });
}

📊 Analytics and Reporting

Generate Reports

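// Note: '@podx/scraper/analytics' is not in the Package Exports list; verify it resolves in your installed version
import { AnalyticsEngine } from '@podx/scraper/analytics';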

const analytics = new AnalyticsEngine();

// Generate comprehensive report
const report = await analytics.generateReport({
  tweets,
  sentimentResults,
  tokenResults,
  signals,
  timeframe: {
    from: new Date('2024-01-01'),
    to: new Date('2024-01-31')
  }
});

// Export report
await analytics.exportReport(report, {
  format: 'pdf',
  includeCharts: true,
  includeRawData: false
});

// Get insights
const insights = analytics.extractInsights(report);
console.log('Key Insights:');
insights.forEach(insight => {
  console.log(`- ${insight.category}: ${insight.description}`);
});

Real-time Monitoring

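// Note: '@podx/scraper/monitoring' is not in the Package Exports list; verify it resolves in your installed version
import { RealtimeMonitor } from '@podx/scraper/monitoring';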

const monitor = new RealtimeMonitor({
  targetUsernames: ['cryptowhale', 'defi_pulse'],
  keywords: ['bitcoin', 'ethereum', 'defi'],
  updateInterval: 30000  // 30 seconds
});

// Monitor with callbacks
monitor.onTweet((tweet) => {
  console.log(`New tweet from @${tweet.username}: ${tweet.text}`);
});

monitor.onSignal((signal) => {
  console.log(`New signal: ${signal.type} for ${signal.token}`);
  // Send notification, update dashboard, etc.
});

// Start monitoring
await monitor.start();

🔧 API Reference

ScraperService

scrapeAccount(options: ScrapingOptions): Promise<Tweet[]>

Scrapes tweets from a specific Twitter account.

Parameters:

  • targetUsername: string - Twitter username to scrape
  • maxTweets: number - Maximum number of tweets to scrape
  • progressCallback?: (progress: ScrapingProgress) => void - Progress callback

scrapeAndSave(options: ScrapingOptions): Promise<{ tweets: Tweet[]; filename: string }>

Scrapes tweets and saves them to file.

saveTweetsToFile(tweets: Tweet[], username: string): Promise<string>

Saves tweets to a JSON file.

Analyzers

SentimentAnalyzer.analyze(tweets: Tweet[]): Promise<SentimentResult[]>

Analyzes sentiment of tweets.

TokenExtractor.extract(tweets: Tweet[]): Promise<TokenResult[]>

Extracts cryptocurrency tokens from tweets.

BotDetector.analyze(tweets: Tweet[]): Promise<BotAnalysis>

Detects bot-like behavior in tweets.

SignalGenerator.generateSignals(params: SignalParams): Promise<Signal[]>

Generates trading signals from tweet analysis.

📋 Data Types

Tweet

interface Tweet {
  id: string;
  username: string;
  text: string;
  createdAt: Date;
  likes: number;
  retweets: number;
  replies: number;
  isReply: boolean;
  isRetweet: boolean;
  media?: MediaData[];
  hashtags: string[];
  mentions: string[];
  urls: string[];
}

SentimentResult

interface SentimentResult {
  tweet: Tweet;
  sentiment: 'positive' | 'negative' | 'neutral';
  confidence: number;
  emotions: string[];
  score: number;
}

TokenResult

interface TokenResult {
  tweet: Tweet;
  tokens: TokenMention[];
}

interface TokenMention {
  symbol: string;
  address?: string;
  blockchain: string;
  context: string;
  confidence: number;
}

Signal

interface Signal {
  id: string;
  type: 'buy' | 'sell' | 'hold' | 'alert';
  token: string;
  strength: number;  // 1-10
  confidence: number; // 0-100
  reason: string;
  timeframe: string;
  timestamp: Date;
  supportingTweets: Tweet[];
}

🧪 Testing

import { describe, test, expect, mock } from 'bun:test';
import { ScraperService } from '@podx/scraper';

describe('ScraperService', () => {
  test('should scrape tweets from account', async () => {
    const scraper = new ScraperService();

    // Mock the scraper
    mock.module('agent-twitter-client', () => ({
      Scraper: class {
        async login() {}
        async getTweets() {
          return [
            {
              id: '1',
              username: 'testuser',
              text: 'Hello world!',
              createdAt: new Date(),
              likes: 10,
              retweets: 5,
              replies: 2
            }
          ];
        }
      }
    }));

    const tweets = await scraper.scrapeAccount({
      targetUsername: 'testuser',
      maxTweets: 1
    });

    expect(tweets).toHaveLength(1);
    expect(tweets[0].username).toBe('testuser');
  });

  test('should handle authentication errors', async () => {
    const scraper = new ScraperService();

    // Mock authentication failure
    mock.module('agent-twitter-client', () => ({
      Scraper: class {
        async login() {
          throw new Error('Invalid credentials');
        }
      }
    }));

    // An async failure surfaces as a rejected promise, so assert with `.rejects`
    await expect(
      scraper.scrapeAccount({
        targetUsername: 'testuser',
        maxTweets: 1
      })
    ).rejects.toThrow('Invalid credentials');
  });
});

⚡ Performance Optimization

Rate Limiting

// Configure rate limiting to avoid Twitter API limits
const scraper = new ScraperService({
  rateLimit: {
    requestsPerMinute: 30,
    delayBetweenRequests: 2000
  }
});
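
The two settings interact: 30 requests per minute already implies at least 60000 / 30 = 2000 ms between requests, so the `delayBetweenRequests` value above matches that budget exactly. The sketch below works through the arithmetic, assuming the stricter of the two limits wins. How the package actually combines them is not documented here, and `effectiveDelayMs` is a hypothetical helper.

```typescript
// Field names mirror the rateLimit option above.
interface RateLimit {
  requestsPerMinute: number;
  delayBetweenRequests: number; // milliseconds
}

// Minimum spacing between requests (ms) that satisfies both settings.
function effectiveDelayMs(limit: RateLimit): number {
  if (limit.requestsPerMinute <= 0) {
    throw new RangeError('requestsPerMinute must be positive');
  }
  const perMinuteSpacing = 60000 / limit.requestsPerMinute;
  return Math.max(perMinuteSpacing, limit.delayBetweenRequests);
}
```

For example, lowering `requestsPerMinute` to 10 while leaving `delayBetweenRequests` at 2000 would make the per-minute budget the binding limit, at 6000 ms between requests.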

Caching

// Cache analysis results to improve performance
const cache = new AnalysisCache();

const analyzer = new SentimentAnalyzer({
  cache: cache
});

// Results are cached automatically
const result1 = await analyzer.analyze(tweets);
const result2 = await analyzer.analyze(tweets); // Uses cache
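
`AnalysisCache` is not described further in this README. To illustrate the idea, the sketch below memoizes per-tweet results by tweet `id`, so a repeated call only analyzes tweets it has not seen before. `memoizeByTweetId` and `HasId` are hypothetical names, and this is not the package's cache.

```typescript
interface HasId {
  id: string;
}

// Wraps a per-tweet analysis function so each tweet id is analyzed at most once.
function memoizeByTweetId<T extends HasId, R>(
  analyzeOne: (tweet: T) => R
): (tweets: T[]) => R[] {
  const cache = new Map<string, R>();
  return (tweets) =>
    tweets.map((tweet) => {
      if (cache.has(tweet.id)) {
        return cache.get(tweet.id) as R; // cache hit: skip the analysis
      }
      const result = analyzeOne(tweet);
      cache.set(tweet.id, result);
      return result;
    });
}
```

This cache grows without bound, which is fine for a batch job; a long-running monitor would likely need eviction by size or age.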

Parallel Processing

// Process multiple accounts in parallel
const usernames = ['user1', 'user2', 'user3'];
const results = await Promise.allSettled(
  usernames.map(username =>
    scraper.scrapeAccount({ targetUsername: username, maxTweets: 100 })
  )
);
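
`Promise.allSettled` never rejects, so failed accounts have to be picked out of `results` explicitly. The following is a minimal sketch of that step. `partitionSettled` is a hypothetical helper, and the local `Settled` type mirrors the standard `PromiseSettledResult` shape.

```typescript
type Settled<T> =
  | { status: 'fulfilled'; value: T }
  | { status: 'rejected'; reason: unknown };

interface Partitioned<T> {
  succeeded: { key: string; value: T }[];
  failed: { key: string; reason: unknown }[];
}

// Pairs each settled result with the key (e.g. username) at the same index.
function partitionSettled<T>(keys: string[], results: Settled<T>[]): Partitioned<T> {
  const out: Partitioned<T> = { succeeded: [], failed: [] };
  results.forEach((result, i) => {
    const key = keys[i];
    if (result.status === 'fulfilled') {
      out.succeeded.push({ key, value: result.value });
    } else {
      out.failed.push({ key, reason: result.reason });
    }
  });
  return out;
}
```

With the snippet above, `const { succeeded, failed } = partitionSettled(usernames, results);` would separate the scraped tweet arrays from the accounts that need a retry.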

🔒 Security Considerations

  • Credential Protection: Never store credentials in code
  • Rate Limiting: Respect Twitter's API limits
  • Data Privacy: Handle user data responsibly
  • Error Handling: Don't expose sensitive information in errors
  • Logging: Be careful with sensitive data in logs

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Ensure all tests pass
  5. Submit a pull request

📝 License

This package is licensed under the ISC License. See the LICENSE file for details.

📞 Support

For support and questions: