@db2lake/driver-bigquery

0.1.3

BigQuery destination driver for db2lake

Package Exports

  • @db2lake/driver-bigquery
  • @db2lake/driver-bigquery/lib/index.js

This package does not declare an "exports" field, so the exports above were detected and optimized by JSPM automatically. If a package subpath is missing, consider filing an issue with the original package (@db2lake/driver-bigquery) requesting "exports" field support. If that is not possible, create a JSPM override to customize the exports field for this package.

Readme

@db2lake BigQuery Destination Driver

This package provides a BigQuery destination driver for the @db2lake data pipeline framework. It loads data into BigQuery tables with support for both batch and streaming writes. The driver builds on the @google-cloud/bigquery SDK and manages resources for you: connections are opened lazily, batches are buffered and flushed automatically, and resources are released on close.

Features

  • Lazy connection initialization for optimal resource usage
  • Automatic dataset and table creation with schema management
  • High-performance batch processing with configurable sizes
  • Support for streaming writes via BigQuery write streams
  • Intelligent batch buffering and automatic flushing
  • Robust error handling and resource cleanup
  • Full TypeScript support with generic types
  • Configurable write modes (append/truncate)

Installation & Setup

Install the package:

npm install @db2lake/driver-bigquery

Set up Google Cloud credentials:

  • Create a service account and download the JSON key file
  • Provide the key via bigQueryOptions.keyFilename or set GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json (both approaches are sketched below)
  • Never commit credentials to source control
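
Both approaches pass standard @google-cloud/bigquery client settings through bigQueryOptions. A minimal sketch of the two options; the project ID, dataset, table, and file paths are placeholders:

import { BigQueryConfig } from '@db2lake/driver-bigquery';

// Option A: point the driver at the key file explicitly
const fileConfig: BigQueryConfig = {
  bigQueryOptions: {
    keyFilename: './service-account.json', // downloaded JSON key
    projectId: 'my-project-id'
  },
  dataset: 'my_dataset',
  table: 'users'
};

// Option B: omit keyFilename and rely on Application Default Credentials,
// e.g. export GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json
const adcConfig: BigQueryConfig = {
  bigQueryOptions: { projectId: 'my-project-id' },
  dataset: 'my_dataset',
  table: 'users'
};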

Project Structure

├── src/
│   ├── index.ts  # BigQueryDestinationDriver implementation
│   └── type.ts   # Configuration type definitions
└── package.json  # Package metadata and dependencies

Usage Examples

Basic Table Insert

import { BigQueryDestinationDriver, BigQueryConfig } from '@db2lake/driver-bigquery';

const config: BigQueryConfig = {
  bigQueryOptions: {
    keyFilename: './service-account.json',
    projectId: 'my-project-id'
  },
  dataset: 'my_dataset',
  table: 'users',
  batchSize: 1000,
  // Optional: use streaming for real-time inserts
  writeOptions: {
    sourceFormat: 'NEWLINE_DELIMITED_JSON'
  }
};

const driver = new BigQueryDestinationDriver<{name: string; age: number}>(config);
try {
  // Calling connect() is optional; the connection is established on first insert
  await driver.connect();
  
  const users = [
    { name: 'John', age: 30 },
    { name: 'Jane', age: 25 }
  ];
  
  await driver.insert(users);
  // Batches are automatically flushed when reaching batchSize
} finally {
  await driver.close(); // Ensures all pending data is written
}

Advanced Usage with Schema Creation and Streaming

import { BigQueryDestinationDriver, BigQueryConfig } from '@db2lake/driver-bigquery';

interface OrderRecord {
  id: number;
  customer: string;
  amount: number;
  created_at: Date;
}

const config: BigQueryConfig = {
  bigQueryOptions: {
    keyFilename: './service-account.json',
    projectId: 'my-project-id'
  },
  dataset: 'my_dataset',
  table: 'orders',
  createTableOptions: {
    schema: [
      { name: 'id', type: 'INTEGER' },
      { name: 'customer', type: 'STRING' },
      { name: 'amount', type: 'NUMERIC' },
      { name: 'created_at', type: 'TIMESTAMP' }
    ],
    description: 'Order transactions table with automatic timestamp'
  },
  writeDisposition: 'WRITE_APPEND',
  batchSize: 500,
  writeOptions: {
    sourceFormat: 'NEWLINE_DELIMITED_JSON',
    createDisposition: 'CREATE_IF_NEEDED',
    writeDisposition: 'WRITE_APPEND',
    schema: {
      fields: [
        { name: 'id', type: 'INTEGER' },
        { name: 'customer', type: 'STRING' },
        { name: 'amount', type: 'NUMERIC' },
        { name: 'created_at', type: 'TIMESTAMP' }
      ]
    }
  }
};

const driver = new BigQueryDestinationDriver<OrderRecord>(config);
try {
  const orders: OrderRecord[] = [
    { 
      id: 1, 
      customer: 'John Doe', 
      amount: 150.75,
      created_at: new Date()
    }
  ];
  
  // Table will be created automatically if needed
  await driver.insert(orders);
} finally {
  await driver.close();
}

Configuration Options

Connection Options

  • bigQueryOptions: BigQuery client configuration (required)
    {
      keyFilename?: string;      // Path to service account key file
      projectId: string;         // Google Cloud project ID
      credentials?: Credentials; // Or direct credentials object
      // ... other BigQuery options
    }

Dataset and Table Options

  • dataset: BigQuery dataset ID (required)
  • table: BigQuery table ID (required)
  • batchSize: Maximum rows per batch (default: 1000)
  • writeDisposition: 'WRITE_APPEND' (append to the table) or 'WRITE_TRUNCATE' (replace its contents); see the sketch below
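
For full-reload pipelines, 'WRITE_TRUNCATE' replaces the table contents on each load instead of appending. A minimal sketch, with placeholder project, dataset, and table names:

import { BigQueryConfig } from '@db2lake/driver-bigquery';

const reloadConfig: BigQueryConfig = {
  bigQueryOptions: { projectId: 'my-project-id' },
  dataset: 'my_dataset',
  table: 'daily_snapshot',
  writeDisposition: 'WRITE_TRUNCATE', // replace existing rows on each run
  batchSize: 1000
};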

Table Creation Configuration

  • createTableOptions: Settings for automatic table creation
    {
      schema: string | TableSchema; // "name:STRING,age:INTEGER" or schema object
      expirationTime?: number;      // Table expiration in ms from epoch
      description?: string;         // Table description
    }
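
Because schema accepts the compact string form as well as a TableSchema object, simple tables can be declared in one line. A sketch using the string form described above (project, dataset, and table names are placeholders):

import { BigQueryConfig } from '@db2lake/driver-bigquery';

const config: BigQueryConfig = {
  bigQueryOptions: { projectId: 'my-project-id' },
  dataset: 'my_dataset',
  table: 'users',
  createTableOptions: {
    schema: 'name:STRING,age:INTEGER', // compact "field:TYPE" form
    description: 'Users table, created automatically on first insert'
  }
};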

Write Stream Options

  • writeOptions: Configuration for streaming writes
    {
      sourceFormat?: string;        // e.g., 'NEWLINE_DELIMITED_JSON'
      createDisposition?: string;   // e.g., 'CREATE_IF_NEEDED'
      writeDisposition?: string;    // e.g., 'WRITE_APPEND'
      schema?: TableSchema;         // Schema for streaming writes
      // ... other load job options
    }

Best Practices

Resource Management

const driver = new BigQueryDestinationDriver(config);
try {
  // Connection is established automatically on first insert
  await driver.insert(batch1);
  await driver.insert(batch2);
} finally {
  // ALWAYS close to ensure pending data is written
  await driver.close();
}

Batch Size Optimization

  • For standard inserts: 500-1000 rows per batch
  • For streaming: 100-500 rows for lower latency
  • Monitor memory usage and adjust accordingly; a starting-point sketch follows
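
The only knob involved is batchSize. The values below follow the guidance above and are assumptions to tune against your own data volume and latency needs:

import { BigQueryConfig } from '@db2lake/driver-bigquery';

const base = {
  bigQueryOptions: { projectId: 'my-project-id' },
  dataset: 'my_dataset',
  table: 'events'
};

// Standard batch loading: larger batches amortize per-request overhead
const batchConfig: BigQueryConfig = { ...base, batchSize: 1000 };

// Streaming writes: smaller batches flush sooner, for lower latency
const streamConfig: BigQueryConfig = {
  ...base,
  batchSize: 250,
  writeOptions: { sourceFormat: 'NEWLINE_DELIMITED_JSON' }
};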

Error Handling

const driver = new BigQueryDestinationDriver(config);
try {
  await driver.insert(rows);
} catch (error: any) { // typed as any so the BigQuery error code is accessible
  if (error.code === 404) {
    // Handle table/dataset not found
  } else if (error.code === 400) {
    // Handle invalid data format
  } else {
    // Handle other errors
  }
  throw error;
} finally {
  await driver.close();
}

TypeScript Integration

Type-Safe Row Definitions

interface UserRecord {
  name: string;
  age: number;
  active: boolean;
  lastLogin?: Date;
}

const driver = new BigQueryDestinationDriver<UserRecord>(config);
// TypeScript will ensure all inserted rows match UserRecord
await driver.insert([
  { name: 'John', age: 30, active: true },
  { name: 'Jane', age: 25, active: true, lastLogin: new Date() }
]);

License

MIT