Package Exports
- @wholebuzz/fs/lib/csv
- @wholebuzz/fs/lib/csv.js
- @wholebuzz/fs/lib/fs
- @wholebuzz/fs/lib/fs.js
- @wholebuzz/fs/lib/gcp
- @wholebuzz/fs/lib/gcp.js
- @wholebuzz/fs/lib/json
- @wholebuzz/fs/lib/json.js
- @wholebuzz/fs/lib/local
- @wholebuzz/fs/lib/local.js
- @wholebuzz/fs/lib/merge
- @wholebuzz/fs/lib/merge.js
- @wholebuzz/fs/lib/parquet
- @wholebuzz/fs/lib/parquet.js
- @wholebuzz/fs/lib/s3
- @wholebuzz/fs/lib/s3.js
- @wholebuzz/fs/lib/stream
- @wholebuzz/fs/lib/stream.js
- @wholebuzz/fs/lib/tfrecord
- @wholebuzz/fs/lib/tfrecord.js
- @wholebuzz/fs/lib/util
- @wholebuzz/fs/lib/util.js
This package does not declare an exports field, so the exports above have been automatically detected and optimized by JSPM instead. If any package subpath is missing, it is recommended to post an issue to the original package (@wholebuzz/fs) to support the "exports" field. If that is not possible, create a JSPM override to customize the exports field for this package.
Readme
@wholebuzz/fs

File system abstraction with implementations for GCP GCS, AWS S3, Azure, SMB, HTTP, and Local file systems. Provides atomic primitives enabling multiple readers and writers.
- LocalFileSystem employs content hashing to approximate GCS Object Versioning.
- GoogleCloudFileSystem provides consistent parallel access patterns.
- S3FileSystem provides basic file system primitives.
- SMBFileSystem provides basic file system primitives.
- HTTPFileSystem provides a basic HTTP file system.
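The idea of using a content hash as a stand-in for an object version can be sketched as follows. This is a minimal illustration, not the package's implementation: `contentVersion` is a hypothetical helper, and the choice of SHA-256 is an assumption (the README does not say which hash LocalFileSystem uses).

```typescript
import { createHash } from 'crypto'

// Hypothetical helper: derive a version token from the file's bytes.
// Any change to the content yields a different token, which is what lets
// a hash approximate a storage-side object version.
function contentVersion(content: string | Buffer): string {
  return createHash('sha256').update(content).digest('hex')
}

// A writer remembers the version it read and can later detect whether the
// file changed underneath it by comparing against a fresh hash.
function hasChanged(rememberedVersion: string, currentContent: string | Buffer): boolean {
  return contentVersion(currentContent) !== rememberedVersion
}
```

Unlike a server-assigned generation number, a content hash cannot distinguish two writes that produce identical bytes, which is one reason the README describes this as an approximation.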
Provides file format implementations for:
- Lines
- CSV (via csv)
- JSON, ND-JSON / JSONL (via JSONStream and ndjson)
- Parquet including streaming Parquet codec and parquetjs.
- TFRecord including tfrecord-stream.
Additionally provides sharding & merging utilities.
Dependencies
The FileSystem implementations require peer dependencies:
- AnyFileSystem: None. URL resolution as a FileSystem. Files have URLs and HTTP is a file system.
- AzureBlobStorageFileSystem: @azure/storage-blob and @azure/identity
- AzureFileShareFileSystem: @azure/storage-file-share
- GoogleCloudFileSystem: @google-cloud/storage
- HTTPFileSystem: axios
- LocalFileSystem: fs-ext, glob, and glob-stream
- S3FileSystem: aws-sdk, s3-stream-upload, and athena-express
- SMBFileSystem: @marsaud/smb2
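The "URL resolution as a FileSystem" idea behind AnyFileSystem can be pictured as a prefix lookup. The sketch below is an assumption about the general technique, not the package's code: `NamedFileSystem`, `Mount`, and `resolveFileSystem` are hypothetical names, and first-match-wins ordering is assumed.

```typescript
// Hypothetical stand-in for a FileSystem implementation; only a name is needed here.
interface NamedFileSystem {
  name: string
}

// Pairs a URL prefix with the file system that should handle it.
interface Mount {
  urlPrefix: string
  fs: NamedFileSystem
}

// Returns the file system of the first mount whose prefix matches the URL.
// The empty prefix matches every URL, so it acts as a fallback when listed last.
function resolveFileSystem(mounts: Mount[], url: string): NamedFileSystem {
  for (const mount of mounts) {
    if (url.startsWith(mount.urlPrefix)) return mount.fs
  }
  throw new Error(`No file system registered for ${url}`)
}

const mounts: Mount[] = [
  { urlPrefix: 'gs://', fs: { name: 'gcs' } },
  { urlPrefix: 's3://', fs: { name: 's3' } },
  { urlPrefix: 'https://', fs: { name: 'http' } },
  { urlPrefix: '', fs: { name: 'local' } },
]
```

With this ordering, `resolveFileSystem(mounts, 's3://bucket/file')` selects the `s3` entry, while a plain path such as `/tmp/file` falls through to the empty-prefix entry.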
Credits
Built with the tree-stream primitives ReadableStreamTree and WritableStreamTree.
Project history
The project started to support @wholebuzz/archive, a terabyte-scale archive for GCS. The focus has since expanded to include powering dbcp and @wholebuzz/mapreduce with a collection of file system implementations under a common interface. The atomic primitives are only available for Google Cloud Storage and local.
Example
import { AnyFileSystem } from '@wholebuzz/fs/lib/fs'
import { GoogleCloudFileSystem } from '@wholebuzz/fs/lib/gcp'
import { HTTPFileSystem } from '@wholebuzz/fs/lib/http'
import { LocalFileSystem } from '@wholebuzz/fs/lib/local'
import { S3FileSystem } from '@wholebuzz/fs/lib/s3'
import { readJSON, writeJSON } from '@wholebuzz/fs/lib/json'
const httpFileSystem = new HTTPFileSystem()
const fs = new AnyFileSystem([
{ urlPrefix: 'gs://', fs: new GoogleCloudFileSystem() },
{ urlPrefix: 's3://', fs: new S3FileSystem() },
{ urlPrefix: 'http://', fs: httpFileSystem },
{ urlPrefix: 'https://', fs: httpFileSystem },
{ urlPrefix: '', fs: new LocalFileSystem() },
])
await writeJSON(fs, 's3://bucket/file', { foo: 'bar' })
const foobar = await readJSON(fs, 's3://bucket/file')
CLI
node lib/cli.js ls .
node lib/cli.js --help
API Reference
Modules
Methods
- appendToFile
- copyFile
- createFile
- ensureDirectory
- fileExists
- getFileStatus
- moveFile
- openReadableFile
- openWritableFile
- queueRemoveFile
- readDirectory
- readDirectoryStream
- removeDirectory
- removeFile
- replaceFile
Constructors
constructor
+ new FileSystem(): FileSystem
Returns: FileSystem
Methods
appendToFile
▸ Abstract appendToFile(urlText: string, writeCallback: (stream: WritableStreamTree) => Promise<boolean>, createCallback?: (stream: WritableStreamTree) => Promise<boolean>, createOptions?: CreateOptions, appendOptions?: AppendOptions): Promise<null | FileStatus>
Appends to the file, safely. Either writeCallback or createCallback is called.
For simple appends, the same parameter can be supplied for both writeCallback and
createCallback.
Parameters
| Name | Type | Description |
|---|---|---|
| urlText | string | The URL of the file to append to. |
| writeCallback | (stream: WritableStreamTree) => Promise<boolean> | Stream callback for appending to the file. |
| createCallback? | (stream: WritableStreamTree) => Promise<boolean> | Stream callback for initializing the file, if necessary. |
| createOptions? | CreateOptions | Initial metadata for initializing the file, if necessary. |
| appendOptions? | AppendOptions | - |
Returns: Promise<null | FileStatus>
Defined in: src/fs.ts:209
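The "either writeCallback or createCallback is called" contract can be illustrated with an in-memory sketch. Everything here is hypothetical: `files`, `ChunkWriter`, and `appendToFileSketch` are invented for illustration, the callbacks are synchronous and receive a plain array instead of a WritableStreamTree, and the return value is a boolean rather than a FileStatus.

```typescript
// Hypothetical in-memory store standing in for a real file system.
const files = new Map<string, string[]>()

// Simplified callback: receives the file's chunk list and reports success.
type ChunkWriter = (chunks: string[]) => boolean

// Calls createCallback when the file does not exist yet, otherwise writeCallback.
// Exactly one of the two runs per call. Passing a single callback uses it for both cases.
function appendToFileSketch(
  url: string,
  writeCallback: ChunkWriter,
  createCallback: ChunkWriter = writeCallback
): boolean {
  const existing = files.get(url)
  if (existing === undefined) {
    const chunks: string[] = []
    if (!createCallback(chunks)) return false
    files.set(url, chunks)
    return true
  }
  return writeCallback(existing)
}
```

A typical use is a log or CSV file where the create path writes a header and the append path writes a record.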
copyFile
▸ Abstract copyFile(sourceUrlText: string, destUrlText: string): Promise<boolean>
Copies the file.
Parameters
| Name | Type | Description |
|---|---|---|
| sourceUrlText | string | The URL of the source file to copy. |
| destUrlText | string | The destination URL to copy the file to. |
Returns: Promise<boolean>
Defined in: src/fs.ts:178
createFile
▸ Abstract createFile(urlText: string, createCallback?: (stream: WritableStreamTree) => Promise<boolean>, options?: CreateOptions): Promise<boolean>
Creates file, failing if the file already exists.
Parameters
| Name | Type | Description |
|---|---|---|
| urlText | string | The URL of the file to create. |
| createCallback? | (stream: WritableStreamTree) => Promise<boolean> | Stream callback for initializing the file. |
| options? | CreateOptions | - |
Returns: Promise<boolean>
Defined in: src/fs.ts:155
ensureDirectory
▸ Abstract ensureDirectory(urlText: string, options?: EnsureDirectoryOptions): Promise<boolean>
Ensures the directory exists.
Parameters
| Name | Type | Description |
|---|---|---|
| urlText | string | The URL of the directory. |
| options? | EnsureDirectoryOptions | - |
Returns: Promise<boolean>
Defined in: src/fs.ts:109
fileExists
▸ Abstract fileExists(urlText: string): Promise<boolean>
Returns true if the file exists.
Parameters
| Name | Type | Description |
|---|---|---|
| urlText | string | The URL of the file whose existence is checked. |
Returns: Promise<boolean>
Defined in: src/fs.ts:121
getFileStatus
▸ Abstract getFileStatus(urlText: string, options?: GetFileStatusOptions): Promise<FileStatus>
Determines the file status. The file version is used to implement atomic mutations.
Parameters
| Name | Type | Description |
|---|---|---|
| urlText | string | The URL of the file to retrieve the status for. |
| options? | GetFileStatusOptions | - |
Returns: Promise<FileStatus>
Defined in: src/fs.ts:127
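How a file version supports atomic mutations can be sketched as an optimistic read-modify-write loop: read the status, attempt a replace conditioned on the version that was read, and retry on mismatch. The store and function names below (`store`, `getStatus`, `replaceIfVersion`, `updateWithRetry`) are hypothetical and synchronous, and a numeric counter is assumed for the version.

```typescript
// Hypothetical versioned entry; the version changes on every successful write.
interface Entry {
  version: number
  content: string
}

// Hypothetical in-memory store standing in for a versioned file system.
const store = new Map<string, Entry>()

// Returns a snapshot of the entry, including the version to condition writes on.
function getStatus(url: string): Entry | undefined {
  const entry = store.get(url)
  return entry ? { ...entry } : undefined
}

// Writes only if the stored version still equals expectedVersion.
function replaceIfVersion(url: string, expectedVersion: number, content: string): boolean {
  const current = store.get(url)
  if (!current || current.version !== expectedVersion) return false
  store.set(url, { version: current.version + 1, content })
  return true
}

// Read-modify-write with retries: a lost race shows up as a version mismatch.
function updateWithRetry(url: string, transform: (old: string) => string, maxAttempts = 5): boolean {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = getStatus(url)
    if (!status) return false
    if (replaceIfVersion(url, status.version, transform(status.content))) return true
  }
  return false
}
```

The same shape applies when the version is a content hash or a storage-side generation number; only the comparison token changes.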
moveFile
▸ Abstract moveFile(sourceUrlText: string, destUrlText: string): Promise<boolean>
Moves the file.
Parameters
| Name | Type | Description |
|---|---|---|
| sourceUrlText | string | The URL of the source file to move. |
| destUrlText | string | The destination URL to move the file to. |
Returns: Promise<boolean>
Defined in: src/fs.ts:185
openReadableFile
▸ Abstract openReadableFile(url: string, options?: OpenReadableFileOptions): Promise<ReadableStreamTree>
Opens a file for reading.
If the optional version is given, fails if the version doesn't match, for GCS URLs.
Parameters
| Name | Type | Description |
|---|---|---|
| url | string | The URL of the file to read from. |
| options? | OpenReadableFileOptions | - |
Returns: Promise<ReadableStreamTree>
Defined in: src/fs.ts:134
openWritableFile
▸ Abstract openWritableFile(url: string, options?: OpenWritableFileOptions): Promise<WritableStreamTree>
Opens a file for writing.
If the optional version is given, fails if the version doesn't match, for GCS URLs.
Parameters
| Name | Type | Description |
|---|---|---|
| url | string | The URL of the file to write to. |
| options? | OpenWritableFileOptions | - |
Returns: Promise<WritableStreamTree>
Defined in: src/fs.ts:144
queueRemoveFile
▸ Abstract queueRemoveFile(urlText: string): Promise<boolean>
Queues deletion, e.g. after DaysSinceCustomTime.
Parameters
| Name | Type | Description |
|---|---|---|
| urlText | string | The URL of the file to remove. |
Returns: Promise<boolean>
Defined in: src/fs.ts:171
readDirectory
▸ Abstract readDirectory(urlText: string, options?: ReadDirectoryOptions): Promise<DirectoryEntry[]>
Returns the URLs of the files in a directory.
Parameters
| Name | Type | Description |
|---|---|---|
| urlText | string | The URL of the directory to list files in. |
| options? | ReadDirectoryOptions | - |
Returns: Promise<DirectoryEntry[]>
Defined in: src/fs.ts:94
readDirectoryStream
▸ Abstract readDirectoryStream(urlText: string, options?: ReadDirectoryOptions): Promise<ReadableStreamTree>
Returns a stream of the URLs of the files in a directory.
Parameters
| Name | Type | Description |
|---|---|---|
| urlText | string | The URL of the directory to list files in. |
| options? | ReadDirectoryOptions | - |
Returns: Promise<ReadableStreamTree>
Defined in: src/fs.ts:100
removeDirectory
▸ Abstract removeDirectory(urlText: string, options?: RemoveDirectoryOptions): Promise<boolean>
Removes the directory.
Parameters
| Name | Type | Description |
|---|---|---|
| urlText | string | The URL of the directory. |
| options? | RemoveDirectoryOptions | - |
Returns: Promise<boolean>
Defined in: src/fs.ts:115
removeFile
▸ Abstract removeFile(urlText: string): Promise<boolean>
Deletes the file.
Parameters
| Name | Type | Description |
|---|---|---|
| urlText | string | The URL of the file to remove. |
Returns: Promise<boolean>
Defined in: src/fs.ts:165
replaceFile
▸ Abstract replaceFile(urlText: string, writeCallback: (stream: WritableStreamTree) => Promise<boolean>, options?: ReplaceFileOptions): Promise<boolean>
Replaces the file, failing if the file version doesn't match.
Parameters
| Name | Type | Description |
|---|---|---|
| urlText | string | The URL of the file to replace. |
| writeCallback | (stream: WritableStreamTree) => Promise<boolean> | Stream callback for replacing the file. |
| options? | ReplaceFileOptions | - |
Returns: Promise<boolean>
Defined in: src/fs.ts:194
Module: json
Table of contents
Variables
Functions
- newJSONLinesFormatter
- newJSONLinesParser
- parseJSON
- parseJSONLines
- pipeJSONFormatter
- pipeJSONLinesFormatter
- pipeJSONLinesParser
- pipeJSONParser
- readJSON
- readJSONHashed
- readJSONLines
- serializeJSON
- serializeJSONLines
- writeJSON
- writeJSONLines
- writeShardedJSONLines
Variables
JSONStream
• Const JSONStream: any
Defined in: src/json.ts:11
Functions
newJSONLinesFormatter
▸ Const newJSONLinesFormatter(): Transform
Returns: Transform
Defined in: src/json.ts:146
newJSONLinesParser
▸ Const newJSONLinesParser(): ThroughStream
Returns: ThroughStream
Defined in: src/json.ts:147
parseJSON
▸ parseJSON(stream: ReadableStreamTree): Promise<unknown>
Parses JSON object from stream. Used to implement readJSON.
Parameters
| Name | Type | Description |
|---|---|---|
| stream | ReadableStreamTree | The stream to read a JSON object from. |
Returns: Promise<unknown>
Defined in: src/json.ts:72
parseJSONLines
▸ parseJSONLines(stream: ReadableStreamTree): Promise<unknown[]>
Parses JSON Lines records from stream. Used to implement readJSONLines.
Parameters
| Name | Type | Description |
|---|---|---|
| stream | ReadableStreamTree | The stream to read a JSON object from. |
Returns: Promise<unknown[]>
Defined in: src/json.ts:80
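For readers unfamiliar with the JSON Lines (ND-JSON) format these functions handle: each record is one JSON value on its own line. The sketch below shows the format on plain strings; `toJSONLines` and `fromJSONLines` are hypothetical helpers, not part of this module, which operates on streams instead.

```typescript
// Serializes each value as one JSON document per line, newline-terminated.
function toJSONLines(values: unknown[]): string {
  return values.map((value) => JSON.stringify(value) + '\n').join('')
}

// Parses newline-delimited JSON, skipping blank lines such as a trailing newline.
function fromJSONLines(text: string): unknown[] {
  return text
    .split('\n')
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line))
}

const encoded = toJSONLines([{ a: 1 }, { b: [2, 3] }])
const decoded = fromJSONLines(encoded)
```

Because every line is independently parseable, the format suits streaming and appending, unlike a single top-level JSON array.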
pipeJSONFormatter
▸ pipeJSONFormatter(stream: WritableStreamTree, isArray: boolean): WritableStreamTree
Create JSON formatter stream.
Parameters
| Name | Type | Description |
|---|---|---|
| stream | WritableStreamTree | - |
| isArray | boolean | Accept array objects or property tuples. |
Returns: WritableStreamTree
Defined in: src/json.ts:127
pipeJSONLinesFormatter
▸ pipeJSONLinesFormatter(stream: WritableStreamTree): WritableStreamTree
Create JSON-lines formatter stream.
Parameters
| Name | Type |
|---|---|
| stream | WritableStreamTree |
Returns: WritableStreamTree
Defined in: src/json.ts:142
pipeJSONLinesParser
▸ pipeJSONLinesParser(stream: ReadableStreamTree): ReadableStreamTree
Create JSON-lines parser stream.
Parameters
| Name | Type |
|---|---|
| stream | ReadableStreamTree |
Returns: ReadableStreamTree
Defined in: src/json.ts:119
pipeJSONParser
▸ pipeJSONParser(stream: ReadableStreamTree, isArray: boolean): ReadableStreamTree
Create JSON parser stream.
Parameters
| Name | Type |
|---|---|
| stream | ReadableStreamTree |
| isArray | boolean |
Returns: ReadableStreamTree
Defined in: src/json.ts:110
readJSON
▸ readJSON(fileSystem: FileSystem, url: string): Promise<unknown>
Reads a serialized JSON object or array from a file.
Parameters
| Name | Type | Description |
|---|---|---|
| fileSystem | FileSystem | - |
| url | string | The URL of the file to parse a JSON object or array from. |
Returns: Promise<unknown>
Defined in: src/json.ts:17
readJSONHashed
▸ readJSONHashed(fileSystem: FileSystem, url: string): Promise<[unknown, null | string]>
Reads a serialized JSON object from a file, and also hashes the file.
Parameters
| Name | Type | Description |
|---|---|---|
| fileSystem | FileSystem | - |
| url | string | The URL of the file to parse a JSON object from. |
Returns: Promise<[unknown, null | string]>
Defined in: src/json.ts:25
readJSONLines
▸ readJSONLines(fileSystem: FileSystem, url: string): Promise<unknown[]>
Reads a serialized JSON-lines array from a file.
Parameters
| Name | Type | Description |
|---|---|---|
| fileSystem | FileSystem | - |
| url | string | The URL of the file to parse a JSON-lines array from. |
Returns: Promise<unknown[]>
Defined in: src/json.ts:35
serializeJSON
▸ serializeJSON(stream: WritableStreamTree, obj: object | any[]): Promise<boolean>
Serializes JSON object to stream. Used to implement writeJSON.
Parameters
| Name | Type | Description |
|---|---|---|
| stream | WritableStreamTree | The stream to write a JSON object to. |
| obj | object \| any[] | - |
Returns: Promise<boolean>
Defined in: src/json.ts:88
serializeJSONLines
▸ serializeJSONLines(stream: WritableStreamTree, obj: any[]): Promise<boolean>
Serializes JSON object to stream. Used to implement writeJSONLines.
Parameters
| Name | Type | Description |
|---|---|---|
| stream | WritableStreamTree | The stream to write a JSON object to. |
| obj | any[] | - |
Returns: Promise<boolean>
Defined in: src/json.ts:103
writeJSON
▸ writeJSON(fileSystem: FileSystem, url: string, value: object | any[]): Promise<boolean>
Serializes object or array to a JSON file.
Parameters
| Name | Type | Description |
|---|---|---|
| fileSystem | FileSystem | - |
| url | string | The URL of the file to serialize a JSON object or array to. |
| value | object \| any[] | The object or array to serialize. |
Returns: Promise<boolean>
Defined in: src/json.ts:44
writeJSONLines
▸ writeJSONLines(fileSystem: FileSystem, url: string, obj: object[]): Promise<boolean>
Serializes array to a JSON Lines file.
Parameters
| Name | Type | Description |
|---|---|---|
| fileSystem | FileSystem | - |
| url | string | The URL of the file to serialize a JSON array to. |
| obj | object[] | - |
Returns: Promise<boolean>
Defined in: src/json.ts:53
writeShardedJSONLines
▸ writeShardedJSONLines(fileSystem: FileSystem, url: string, obj: object[], shards: number, shardFunction?: (x: object, modulus: number) => number): Promise<boolean>
Parameters
| Name | Type |
|---|---|
| fileSystem | FileSystem |
| url | string |
| obj | object[] |
| shards | number |
| shardFunction | (x: object, modulus: number) => number |
Returns: Promise<boolean>
Defined in: src/json.ts:57
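This function carries no description, so the following is only a sketch of what a shardFunction with the signature `(x: object, modulus: number) => number` might look like and how records could be grouped by it. `hashShard` and `partition` are hypothetical; the library's default shard function and its output file naming are not documented here and are not assumed.

```typescript
// Hypothetical shard function: hashes the record's JSON text and reduces it
// modulo the shard count, so equal records always land in the same shard.
function hashShard(x: object, modulus: number): number {
  const text = JSON.stringify(x)
  let hash = 0
  for (let i = 0; i < text.length; i++) {
    // Simple polynomial rolling hash kept within unsigned 32-bit range.
    hash = (hash * 31 + text.charCodeAt(i)) >>> 0
  }
  return hash % modulus
}

// Groups records into `shards` buckets using the supplied shard function.
function partition(
  records: object[],
  shards: number,
  shardFunction: (x: object, modulus: number) => number = hashShard
): object[][] {
  const buckets: object[][] = Array.from({ length: shards }, () => [])
  for (const record of records) {
    buckets[shardFunction(record, shards)].push(record)
  }
  return buckets
}
```

A custom function such as `(x, modulus) => (x as { id: number }).id % modulus` would instead shard on a key field, which keeps all records with the same key together.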