JSPM

de-identify-sql

1.0.4
  • ESM via JSPM
  • ES Module Entrypoint
  • Export Map
  • Keywords
  • License
  • Repository URL
  • TypeScript Types
  • README
  • Created
  • Published
  • 0
  • Score
    100M100P100Q8101F
  • License MIT

De-identify data in SQL statements

Package Exports

  • de-identify-sql
  • de-identify-sql/app.js

This package does not declare an exports field, so the exports above have been automatically detected and optimized by JSPM instead. If any package subpath is missing, it is recommended to post an issue to the original package (de-identify-sql) to support the "exports" field. If that is not possible, create a JSPM override to customize the exports field for this package.

Readme

De-identify SQL

De-identify data in SQL statements.

De-identify data in SQL example

About

De-identify SQL removes protected data from a .sql file or piped input (e.g. mysqldump) and sends results to stdout or a new .sql file. De-identify SQL replaces protected data / PII / PHI using the following techniques:

  • Replacing data with a fabricated value.
  • Redacting data with a constant value.
  • Generating new data (customizable JavaScript).

De-identify SQL supports:

  • Reading from piped input or a passed-in file.
  • Writing to piped output or creating a file.

Installation

Install globally using npm i -g de-identify-sql

Usage

> de-identify-sql -h
Options:
  -v, --version            Show version number                         [boolean]
  -h, --help               Show help                                   [boolean]
  -o, --outputFile         Output to file instead of stdout             [string]
  -i, --inputFile          Read from file instead of stdin              [string]
  -s, --strategyDirectory  Location of SQL strategy files
                                   [string] [default: {installed-dir}/strategy}]
  -f, --functionsFile      Location of file that contains all custom functions
                           (optional, overwrites .js per table)         [string]

Examples

# Read from piped input, write to piped output ...
> mysqldump my-database | de-identify-sql | gzip > de-id-mysqldump.sql.gz
# Read from piped input, write to .sql ...
> mysqldump my-database | de-identify-sql -o de-id-mysqldump.sql
# Read from .sql, write to piped output ...
> de-identify-sql -i mysqldump.sql | gzip > de-id-mysqldump.sql.gz
# Read from .sql, write to .sql ...
> de-identify-sql -i mysqldump.sql -o de-id-mysqldump.sql
# Read from .sql, write to screen (test de-identification) ...
> de-identify-sql -i mysqldump.sql
# or
> cat mysqldump.sql | de-identify-sql

De-identifying data within SQL

De-identify SQL uses strategy files which are mapped to tables to modify SQL statements. Strategy files should be placed in the /strategy folder (configurable) and named after the table they act on. If the incoming SQL contains USING the format is DATABASE_NAME.TABLE_NAME.json otherwise the format is TABLE_NAME.json. A CREATE TABLE statement must be part of the input file.

Example: /strategy/user.json

{
  "columns": [
    {
      "redactWith": "internet.email",
      "columnKey": "email",
      "tracked": false
    },
    {
      "redactWith": "NAME REMOVED",
      "columnKey": "name",
      "tracked": false
    },
    {
      "redactWith": "generateDatetime",
      "columnKey": "last_visit",
      "tracked": false
    },
    {
      "redactWith": "{{datatype.number({\"min\":18,\"max\":90})}}",
      "columnKey": "age",
      "tracked": false
    }
  ]   
}

The columns array describes how each SQL column should be modified. Omitted columns pass-through without modification.

  • columnKey - (string) The column to be modified by de-identify-sql.

  • redactWith - (string) There are four ways to de-identify data:

    • A faker function - Possible functions : e.g. name.lastName or phone.phoneNumber
    • A faker template - A mustache template of faker methods: e.g. {{name.firstName}} {{name.lastName}}, {{name.jobTitle}} or {{address.streetAddress}} {{address.city}}
    • Custom JavaScript - You can call a function to create a value, these are defined in a .js file that matches the name of the .json file
    • A constant value - Replace with a constant, e.g. VALUE REMOVED
  • tracked - (boolean) Tracking preserves data relationships while de-identifying SQL.

Tracking: Data Relationships

De-identify SQL can preserve relationships within data. If original data repeats, de-identify SQL recognizes and replaces it with the same value it previously used.

This allows the generated SQL to retain its structure while removing protected data. For example, if there are multiple INSERTs which contain the same email address these would become different email addresses. However, if the tracked parameter is true, then every instance will be replaced with the same made-up value.