@hotglue/gluestick-ts 0.1.7 (License: MIT)

    TypeScript version of the gluestick ETL library for the hotglue iPaaS platform

    Gluestick TypeScript

    A TypeScript library for data processing and ETL operations on the hotglue iPaaS platform, built on Polars for high-performance data manipulation. Supports multiple export formats, including CSV, JSON, Parquet, and the Singer specification.

    Installation

    npm install @hotglue/gluestick-ts


    Quick Start

    import * as gs from '@hotglue/gluestick-ts';
    
    // Create a Reader to access your data
    const reader = new gs.Reader();
    
    // Get available data streams
    const streams = reader.keys();
    console.log('Available streams:', streams);
    
    // Read and process a specific stream
    const dataFrame = reader.get('your_stream_name', { catalogTypes: true });
    
    // Export processed data (defaults to Singer format)
    gs.toExport(dataFrame, 'output_name', './etl-output');
    
    // Export as CSV
    gs.toExport(dataFrame, 'output_name', './etl-output', { exportFormat: 'csv' });

    Core Components

    Reader Class

    The Reader class is your main interface for accessing data streams:

    const reader = new gs.Reader(inputDir?, rootDir?);

    Methods:

    • get(stream, options) - Read a specific stream as a Polars DataFrame
    • keys() - Get all available stream names
    • getPk(stream) - Get primary keys for a stream from catalog

    Options:

    • catalogTypes: boolean - Use catalog for automatic type inference
    • Any other options are passed through to Polars when reading; see the Polars ReadCSV and ReadParquet options for details

    Export Functions

    Export your processed data in multiple formats:

    gs.toExport(dataFrame, outputName, outputDir, options?);

    Supported formats:

    • Singer (default) - Singer specification format for data integration
    • CSV - Comma-separated values
    • JSON - Single JSON array
    • JSONL - Newline-delimited JSON
    • Parquet - Columnar storage format
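
    To illustrate the difference between the JSON and JSONL shapes, here is a small sketch that serializes the same rows both ways. It is plain TypeScript, independent of the library, and the rows are invented for illustration:

    ```typescript
    // Sample rows, as they might come out of a DataFrame.
    const rows = [
      { id: 1, name: 'Ada' },
      { id: 2, name: 'Grace' },
    ];

    // JSON: a single array containing every record.
    const asJson = JSON.stringify(rows);
    // → '[{"id":1,"name":"Ada"},{"id":2,"name":"Grace"}]'

    // JSONL: one JSON object per line, newline-delimited.
    const asJsonl = rows.map((r) => JSON.stringify(r)).join('\n');
    // → '{"id":1,"name":"Ada"}\n{"id":2,"name":"Grace"}'
    ```

    JSONL is generally preferable for large exports, since each line can be parsed independently without loading the whole file.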

    Development

    Build the project:

    npm run build

    Run examples:

    # Run CSV processing example
    npm run run:example:csv
    
    # Run Parquet processing example  
    npm run run:example:parquet

    API Reference

    Reader Constructor

    new Reader(inputDir?: string, rootDir?: string)
    • inputDir - Custom input directory (default: ${rootDir}/sync-output)
    • rootDir - Root directory (default: process.env.ROOT_DIR || '.')
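
    As a sketch of how these two defaults combine (illustrative only; the actual Reader internals may differ), the resolved input directory would be:

    ```typescript
    // Sketch of the default directory resolution described above.
    // Hypothetical helper, not part of the library's API.
    function resolveInputDir(inputDir?: string, rootDir?: string): string {
      const root = rootDir ?? process.env.ROOT_DIR ?? '.';
      return inputDir ?? `${root}/sync-output`;
    }

    resolveInputDir();                   // './sync-output' when ROOT_DIR is unset
    resolveInputDir(undefined, '/data'); // → '/data/sync-output'
    resolveInputDir('/tmp/in');          // → '/tmp/in' (explicit inputDir wins)
    ```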

    Reader Methods

    get(stream: string, options?: ReadOptions): DataFrame | null

    Read a data stream as a Polars DataFrame.

    const df = reader.get('users', { catalogTypes: true });

    Options:

    • catalogTypes: boolean - Use catalog for automatic type inference

    keys(): string[]

    Get all available stream names.

    const streams = reader.keys();
    // Returns: ['users', 'orders', 'products']

    getPk(stream: string): string[] | null

    Get primary keys for a stream from the catalog.

    const primaryKeys = reader.getPk('users');
    // Returns: ['id']

    Export Function

    toExport(
      dataFrame: DataFrame,
      outputName: string,
      outputDir: string,
      options?: ExportOptions
    ): void

    Parameters:

    • dataFrame - Polars DataFrame to export
    • outputName - Name for the output file (without extension)
    • outputDir - Directory to write the output file
    • options - Export configuration options

    Export Options:

    interface ExportOptions {
      exportFormat?: 'csv' | 'json' | 'jsonl' | 'parquet' | 'singer';
      outputFilePrefix?: string;
      keys?: string[];  // Primary keys for the data
      stringifyObjects?: boolean;
      reservedVariables?: Record<string, string>;
      allowObjects?: boolean;  // For Singer format
      schema?: SingerHeaderMap;  // For Singer format
    }

    Examples:

    // Export as CSV with prefix
    gs.toExport(dataFrame, 'processed_users', './output', {
      exportFormat: 'csv',
      outputFilePrefix: 'tenant_123_',
      keys: ['user_id']
    });
    
    // Export as Singer format
    gs.toExport(dataFrame, 'processed_users', './output', {
      exportFormat: 'singer',
      allowObjects: true,
      keys: ['user_id']
    });

    Singer Format Support

    Export data in Singer specification format for data integration pipelines:

    // Basic Singer export
    gs.toExport(dataFrame, 'users', './output', {
      exportFormat: 'singer',
      keys: ['id']
    });
    
    // Singer export with object support
    gs.toExport(dataFrame, 'users', './output', {
      exportFormat: 'singer',
      allowObjects: true,
      keys: ['id']
    });

    The Singer export automatically generates SCHEMA, RECORD, and STATE messages according to the Singer specification.
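
    For reference, the three message types look roughly like this on the wire. This is a hand-written sketch of the Singer specification, not output captured from this library, and all field values are invented:

    ```typescript
    // Hand-written sketch of Singer messages for a 'users' stream.
    const schemaMsg = {
      type: 'SCHEMA',
      stream: 'users',
      key_properties: ['id'],
      schema: {
        type: 'object',
        properties: { id: { type: 'integer' }, name: { type: 'string' } },
      },
    };

    const recordMsg = {
      type: 'RECORD',
      stream: 'users',
      record: { id: 1, name: 'Ada' },
    };

    const stateMsg = { type: 'STATE', value: { bookmarks: { users: { id: 1 } } } };

    // A Singer target consumes these as newline-delimited JSON,
    // with the SCHEMA message preceding the RECORDs it describes.
    const output = [schemaMsg, recordMsg, stateMsg]
      .map((m) => JSON.stringify(m))
      .join('\n');
    ```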