@db2lake/driver-bigquery

0.1.3

BigQuery destination driver for db2lake

Package Exports

  • @db2lake/driver-bigquery
  • @db2lake/driver-bigquery/lib/index.js

This package does not declare an "exports" field, so the exports above were detected and optimized by JSPM automatically. If a package subpath is missing, consider filing an issue with the original package (@db2lake/driver-bigquery) requesting "exports" field support. If that is not possible, create a JSPM override to customize the exports field for this package.

Readme

@db2lake BigQuery Destination Driver

This package provides a BigQuery destination driver for the @db2lake data pipeline framework. It loads data into BigQuery tables with support for both batch and streaming writes. The driver builds on the @google-cloud/bigquery SDK and manages resources for you: connections are opened lazily, batches are buffered and flushed automatically, and resources are released on close.

Features

  • Lazy connection initialization for optimal resource usage
  • Automatic dataset and table creation with schema management
  • High-performance batch processing with configurable sizes
  • Support for streaming writes via BigQuery write streams
  • Intelligent batch buffering and automatic flushing
  • Robust error handling and resource cleanup
  • Full TypeScript support with generic types
  • Configurable write modes (append/truncate)

Installation & Setup

Install the package:

npm install @db2lake/driver-bigquery

Set up Google Cloud credentials:

  • Create a service account and download the JSON key file
  • Provide the key via bigQueryOptions.keyFilename or set GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json (both approaches are sketched below)
  • Never commit credentials to source control
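
Both approaches pass standard @google-cloud/bigquery client settings through bigQueryOptions. A minimal sketch of the two options; the project ID, dataset, table, and file paths are placeholders:

import { BigQueryConfig } from '@db2lake/driver-bigquery';

// Option A: point the driver at the key file explicitly
const fileConfig: BigQueryConfig = {
  bigQueryOptions: {
    keyFilename: './service-account.json', // downloaded JSON key
    projectId: 'my-project-id'
  },
  dataset: 'my_dataset',
  table: 'users'
};

// Option B: omit keyFilename and rely on Application Default Credentials,
// e.g. export GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json
const adcConfig: BigQueryConfig = {
  bigQueryOptions: { projectId: 'my-project-id' },
  dataset: 'my_dataset',
  table: 'users'
};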

Project Structure

├── src/
│   ├── index.ts  # BigQueryDestinationDriver implementation
│   └── type.ts   # Configuration type definitions
└── package.json  # Package metadata and dependencies

Usage Examples

Basic Table Insert

import { BigQueryDestinationDriver, BigQueryConfig } from '@db2lake/driver-bigquery';

const config: BigQueryConfig = {
  bigQueryOptions: {
    keyFilename: './service-account.json',
    projectId: 'my-project-id'
  },
  dataset: 'my_dataset',
  table: 'users',
  batchSize: 1000,
  // Optional: use streaming for real-time inserts
  writeOptions: {
    sourceFormat: 'NEWLINE_DELIMITED_JSON'
  }
};

const driver = new BigQueryDestinationDriver<{name: string; age: number}>(config);
try {
  // Calling connect() is optional; the connection is established on first insert
  await driver.connect();
  
  const users = [
    { name: 'John', age: 30 },
    { name: 'Jane', age: 25 }
  ];
  
  await driver.insert(users);
  // Batches are automatically flushed when reaching batchSize
} finally {
  await driver.close(); // Ensures all pending data is written
}

Advanced Usage with Schema Creation and Streaming

import { BigQueryDestinationDriver, BigQueryConfig } from '@db2lake/driver-bigquery';

interface OrderRecord {
  id: number;
  customer: string;
  amount: number;
  created_at: Date;
}

const config: BigQueryConfig = {
  bigQueryOptions: {
    keyFilename: './service-account.json',
    projectId: 'my-project-id'
  },
  dataset: 'my_dataset',
  table: 'orders',
  createTableOptions: {
    schema: [
      { name: 'id', type: 'INTEGER' },
      { name: 'customer', type: 'STRING' },
      { name: 'amount', type: 'NUMERIC' },
      { name: 'created_at', type: 'TIMESTAMP' }
    ],
    description: 'Order transactions table with automatic timestamp'
  },
  writeDisposition: 'WRITE_APPEND',
  batchSize: 500,
  writeOptions: {
    sourceFormat: 'NEWLINE_DELIMITED_JSON',
    createDisposition: 'CREATE_IF_NEEDED',
    writeDisposition: 'WRITE_APPEND',
    schema: {
      fields: [
        { name: 'id', type: 'INTEGER' },
        { name: 'customer', type: 'STRING' },
        { name: 'amount', type: 'NUMERIC' },
        { name: 'created_at', type: 'TIMESTAMP' }
      ]
    }
  }
};

const driver = new BigQueryDestinationDriver<OrderRecord>(config);
try {
  const orders: OrderRecord[] = [
    { 
      id: 1, 
      customer: 'John Doe', 
      amount: 150.75,
      created_at: new Date()
    }
  ];
  
  // Table will be created automatically if needed
  await driver.insert(orders);
} finally {
  await driver.close();
}

Configuration Options

Connection Options

  • bigQueryOptions: BigQuery client configuration (required)
    {
      keyFilename?: string;      // Path to service account key file
      projectId: string;         // Google Cloud project ID
      credentials?: Credentials; // Or direct credentials object
      // ... other BigQuery options
    }

Dataset and Table Options

  • dataset: BigQuery dataset ID (required)
  • table: BigQuery table ID (required)
  • batchSize: Maximum rows per batch (default: 1000)
  • writeDisposition: 'WRITE_APPEND' (append to the table) or 'WRITE_TRUNCATE' (replace its contents); see the sketch below
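
For full-reload pipelines, 'WRITE_TRUNCATE' replaces the table contents on each load instead of appending. A minimal sketch, with placeholder project, dataset, and table names:

import { BigQueryConfig } from '@db2lake/driver-bigquery';

const reloadConfig: BigQueryConfig = {
  bigQueryOptions: { projectId: 'my-project-id' },
  dataset: 'my_dataset',
  table: 'daily_snapshot',
  writeDisposition: 'WRITE_TRUNCATE', // replace existing rows on each run
  batchSize: 1000
};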

Table Creation Configuration

  • createTableOptions: Settings for automatic table creation
    {
      schema: string | TableSchema; // "name:STRING,age:INTEGER" or schema object
      expirationTime?: number;      // Table expiration in ms from epoch
      description?: string;         // Table description
    }
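
Because schema accepts the compact string form as well as a TableSchema object, simple tables can be declared in one line. A sketch using the string form described above (project, dataset, and table names are placeholders):

import { BigQueryConfig } from '@db2lake/driver-bigquery';

const config: BigQueryConfig = {
  bigQueryOptions: { projectId: 'my-project-id' },
  dataset: 'my_dataset',
  table: 'users',
  createTableOptions: {
    schema: 'name:STRING,age:INTEGER', // compact "field:TYPE" form
    description: 'Users table, created automatically on first insert'
  }
};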

Write Stream Options

  • writeOptions: Configuration for streaming writes
    {
      sourceFormat?: string;        // e.g., 'NEWLINE_DELIMITED_JSON'
      createDisposition?: string;   // e.g., 'CREATE_IF_NEEDED'
      writeDisposition?: string;    // e.g., 'WRITE_APPEND'
      schema?: TableSchema;         // Schema for streaming writes
      // ... other load job options
    }

Best Practices

Resource Management

const driver = new BigQueryDestinationDriver(config);
try {
  // Connection is established automatically on first insert
  await driver.insert(batch1);
  await driver.insert(batch2);
} finally {
  // ALWAYS close to ensure pending data is written
  await driver.close();
}

Batch Size Optimization

  • For standard inserts: 500-1000 rows per batch
  • For streaming: 100-500 rows for lower latency
  • Monitor memory usage and adjust accordingly; a starting-point sketch follows
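
The only knob involved is batchSize. The values below follow the guidance above and are assumptions to tune against your own data volume and latency needs:

import { BigQueryConfig } from '@db2lake/driver-bigquery';

const base = {
  bigQueryOptions: { projectId: 'my-project-id' },
  dataset: 'my_dataset',
  table: 'events'
};

// Standard batch loading: larger batches amortize per-request overhead
const batchConfig: BigQueryConfig = { ...base, batchSize: 1000 };

// Streaming writes: smaller batches flush sooner, for lower latency
const streamConfig: BigQueryConfig = {
  ...base,
  batchSize: 250,
  writeOptions: { sourceFormat: 'NEWLINE_DELIMITED_JSON' }
};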

Error Handling

const driver = new BigQueryDestinationDriver(config);
try {
  await driver.insert(rows);
} catch (error: any) { // typed as any so the BigQuery error code is accessible
  if (error.code === 404) {
    // Handle table/dataset not found
  } else if (error.code === 400) {
    // Handle invalid data format
  } else {
    // Handle other errors
  }
  throw error;
} finally {
  await driver.close();
}

TypeScript Integration

Type-Safe Row Definitions

interface UserRecord {
  name: string;
  age: number;
  active: boolean;
  lastLogin?: Date;
}

const driver = new BigQueryDestinationDriver<UserRecord>(config);
// TypeScript will ensure all inserted rows match UserRecord
await driver.insert([
  { name: 'John', age: 30, active: true },
  { name: 'Jane', age: 25, active: true, lastLogin: new Date() }
]);

License

MIT