@hotglue/gluestick-ts 0.1.7 (License: MIT)

    TypeScript version of the gluestick ETL library for the hotglue iPaaS platform

    Gluestick TypeScript

    A TypeScript library for data processing and ETL operations on the hotglue iPaaS platform, built on Polars for high-performance data manipulation. Supports multiple export formats, including CSV, JSON, Parquet, and the Singer specification.

    Installation

    npm install @hotglue/gluestick-ts


    Quick Start

    import * as gs from '@hotglue/gluestick-ts';
    
    // Create a Reader to access your data
    const reader = new gs.Reader();
    
    // Get available data streams
    const streams = reader.keys();
    console.log('Available streams:', streams);
    
    // Read and process a specific stream
    const dataFrame = reader.get('your_stream_name', { catalogTypes: true });
    
    // Export processed data (defaults to Singer format)
    gs.toExport(dataFrame, 'output_name', './etl-output');
    
    // Export as CSV
    gs.toExport(dataFrame, 'output_name', './etl-output', { exportFormat: 'csv' });

    Core Components

    Reader Class

    The Reader class is your main interface for accessing data streams:

    const reader = new gs.Reader(inputDir?, rootDir?);

    Methods:

    • get(stream, options) - Read a specific stream as a Polars DataFrame
    • keys() - Get all available stream names
    • getPk(stream) - Get primary keys for a stream from catalog

    Options:

    • catalogTypes: boolean - Use catalog for automatic type inference
    • Any other options are passed through to Polars when reading; see the Polars ReadCSV and ReadParquet options for details

    Export Functions

    Export your processed data in multiple formats:

    gs.toExport(dataFrame, outputName, outputDir, options?);

    Supported formats:

    • Singer (default) - Singer specification format for data integration
    • CSV - Comma-separated values
    • JSON - Single JSON array
    • JSONL - Newline-delimited JSON
    • Parquet - Columnar storage format
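
    To illustrate the difference between the JSON and JSONL shapes, here is a small sketch that serializes the same rows both ways. It is plain TypeScript, independent of the library, and the rows are invented for illustration:

    ```typescript
    // Sample rows, as they might come out of a DataFrame.
    const rows = [
      { id: 1, name: 'Ada' },
      { id: 2, name: 'Grace' },
    ];

    // JSON: a single array containing every record.
    const asJson = JSON.stringify(rows);
    // → '[{"id":1,"name":"Ada"},{"id":2,"name":"Grace"}]'

    // JSONL: one JSON object per line, newline-delimited.
    const asJsonl = rows.map((r) => JSON.stringify(r)).join('\n');
    // → '{"id":1,"name":"Ada"}\n{"id":2,"name":"Grace"}'
    ```

    JSONL is generally preferable for large exports, since each line can be parsed independently without loading the whole file.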

    Development

    Build the project:

    npm run build

    Run examples:

    # Run CSV processing example
    npm run run:example:csv
    
    # Run Parquet processing example  
    npm run run:example:parquet

    API Reference

    Reader Constructor

    new Reader(inputDir?: string, rootDir?: string)
    • inputDir - Custom input directory (default: ${rootDir}/sync-output)
    • rootDir - Root directory (default: process.env.ROOT_DIR || '.')
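
    As a sketch of how these two defaults combine (illustrative only; the actual Reader internals may differ), the resolved input directory would be:

    ```typescript
    // Sketch of the default directory resolution described above.
    // Hypothetical helper, not part of the library's API.
    function resolveInputDir(inputDir?: string, rootDir?: string): string {
      const root = rootDir ?? process.env.ROOT_DIR ?? '.';
      return inputDir ?? `${root}/sync-output`;
    }

    resolveInputDir();                   // './sync-output' when ROOT_DIR is unset
    resolveInputDir(undefined, '/data'); // → '/data/sync-output'
    resolveInputDir('/tmp/in');          // → '/tmp/in' (explicit inputDir wins)
    ```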

    Reader Methods

    get(stream: string, options?: ReadOptions): DataFrame | null

    Read a data stream as a Polars DataFrame.

    const df = reader.get('users', { catalogTypes: true });

    Options:

    • catalogTypes: boolean - Use catalog for automatic type inference

    keys(): string[]

    Get all available stream names.

    const streams = reader.keys();
    // Returns: ['users', 'orders', 'products']

    getPk(stream: string): string[] | null

    Get primary keys for a stream from the catalog.

    const primaryKeys = reader.getPk('users');
    // Returns: ['id']

    Export Function

    toExport(
      dataFrame: DataFrame,
      outputName: string,
      outputDir: string,
      options?: ExportOptions
    ): void

    Parameters:

    • dataFrame - Polars DataFrame to export
    • outputName - Name for the output file (without extension)
    • outputDir - Directory to write the output file
    • options - Export configuration options

    Export Options:

    interface ExportOptions {
      exportFormat?: 'csv' | 'json' | 'jsonl' | 'parquet' | 'singer';
      outputFilePrefix?: string;
      keys?: string[];  // Primary keys for the data
      stringifyObjects?: boolean;
      reservedVariables?: Record<string, string>;
      allowObjects?: boolean;  // For Singer format
      schema?: SingerHeaderMap;  // For Singer format
    }

    Examples:

    // Export as CSV with prefix
    gs.toExport(dataFrame, 'processed_users', './output', {
      exportFormat: 'csv',
      outputFilePrefix: 'tenant_123_',
      keys: ['user_id']
    });
    
    // Export as Singer format
    gs.toExport(dataFrame, 'processed_users', './output', {
      exportFormat: 'singer',
      allowObjects: true,
      keys: ['user_id']
    });

    Singer Format Support

    Export data in Singer specification format for data integration pipelines:

    // Basic Singer export
    gs.toExport(dataFrame, 'users', './output', {
      exportFormat: 'singer',
      keys: ['id']
    });
    
    // Singer export with object support
    gs.toExport(dataFrame, 'users', './output', {
      exportFormat: 'singer',
      allowObjects: true,
      keys: ['id']
    });

    The Singer export automatically generates SCHEMA, RECORD, and STATE messages according to the Singer specification.
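
    For reference, the three message types look roughly like this on the wire. This is a hand-written sketch of the Singer specification, not output captured from this library, and all field values are invented:

    ```typescript
    // Hand-written sketch of Singer messages for a 'users' stream.
    const schemaMsg = {
      type: 'SCHEMA',
      stream: 'users',
      key_properties: ['id'],
      schema: {
        type: 'object',
        properties: { id: { type: 'integer' }, name: { type: 'string' } },
      },
    };

    const recordMsg = {
      type: 'RECORD',
      stream: 'users',
      record: { id: 1, name: 'Ada' },
    };

    const stateMsg = { type: 'STATE', value: { bookmarks: { users: { id: 1 } } } };

    // A Singer target consumes these as newline-delimited JSON,
    // with the SCHEMA message preceding the RECORDs it describes.
    const output = [schemaMsg, recordMsg, stateMsg]
      .map((m) => JSON.stringify(m))
      .join('\n');
    ```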