Spam Assassin public mail corpus.

Package Exports

  • @stdlib/datasets-spam-assassin

This package does not declare an exports field, so the exports above have been automatically detected and optimized by JSPM instead. If any package subpath is missing, it is recommended to post an issue to the original package (@stdlib/datasets-spam-assassin) to support the "exports" field. If that is not possible, create a JSPM override to customize the exports field for this package.

Readme

Spam Assassin


The Spam Assassin public mail corpus.

Installation

npm install @stdlib/datasets-spam-assassin

Usage

var corpus = require( '@stdlib/datasets-spam-assassin' );

corpus()

Returns the Spam Assassin public mail corpus.

var data = corpus();
// returns [{...},{...},...]

Each array element has the following fields:

  • id: message id (relative to message group)
  • group: message group
  • checksum: object containing checksum info
  • text: message text (including headers)
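Because the text field includes headers, it is often useful to separate the header block from the message body. The following is a minimal sketch, assuming the text follows the usual e-mail convention of an empty line between headers and body. The splitMessage helper and the sample text are hypothetical and are not part of this package.

```javascript
// Hypothetical helper (not provided by the package): split raw message text
// into a header block and a body at the first empty line.
function splitMessage( text ) {
    // Accept both LF and CRLF line endings:
    var match = /\r?\n\r?\n/.exec( text );
    if ( match === null ) {
        // No empty line found, so treat the entire text as headers:
        return {
            'headers': text,
            'body': ''
        };
    }
    return {
        'headers': text.slice( 0, match.index ),
        'body': text.slice( match.index + match[ 0 ].length )
    };
}

// Made-up text standing in for `corpus()[ i ].text`:
var sampleText = 'From: alice@example.com\nSubject: Hello\n\nSee you at noon.\n';

var parts = splitMessage( sampleText );
console.log( parts.headers );
console.log( parts.body );
```

With the actual dataset, the same helper could be applied to data[ i ].text for each element returned by corpus().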

The message group may be one of the following:

  • easy-ham-1: easier to detect non-spam e-mails (2500 messages)
  • easy-ham-2: easier to detect non-spam e-mails collected at a later date (1400 messages)
  • hard-ham-1: harder to detect non-spam e-mails (250 messages)
  • spam-1: spam e-mails (500 messages)
  • spam-2: spam e-mails collected at a later date (1396 messages)
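The group names above can be used to derive a spam/non-spam label and to tally messages per group. The sketch below is illustrative only: isSpam and countByGroup are hypothetical helpers, and the sample array is made up and carries only the group field.

```javascript
// Hypothetical helper: a message is labeled spam if its group name begins
// with "spam" (i.e., "spam-1" or "spam-2").
function isSpam( group ) {
    return group.indexOf( 'spam' ) === 0;
}

// Hypothetical helper: count the number of messages in each group.
function countByGroup( messages ) {
    var counts = {};
    var g;
    var i;
    for ( i = 0; i < messages.length; i++ ) {
        g = messages[ i ].group;
        counts[ g ] = ( counts[ g ] || 0 ) + 1;
    }
    return counts;
}

// Made-up sample standing in for the array returned by `corpus()`:
var sampleMessages = [
    { 'group': 'spam-1' },
    { 'group': 'easy-ham-1' },
    { 'group': 'spam-1' },
    { 'group': 'hard-ham-1' }
];

var groupCounts = countByGroup( sampleMessages );
console.log( groupCounts );
console.log( isSpam( 'spam-2' ) );
console.log( isSpam( 'hard-ham-1' ) );
```

Matching on the group prefix keeps the label independent of the collection date suffix (-1 or -2).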

The checksum object contains the following fields:

  • type: checksum type (e.g., MD5)
  • value: checksum value

Examples

var corpus = require( '@stdlib/datasets-spam-assassin' );

var data;
var i;

data = corpus();
for ( i = 0; i < data.length; i++ ) {
    console.log( 'Character Count: %d', data[ i ].text.length );
}

CLI

Installation

To use the module as a general utility, install the module globally:

npm install -g @stdlib/datasets-spam-assassin

Usage

Usage: spam-assassin [options]

Options:

  -h,    --help                Print this message.
  -V,    --version             Print the package version.
         --format fmt          Output format: 'txt' or 'ndjson'.

Notes

  • The CLI supports two output formats: plain text (txt) and newline-delimited JSON (NDJSON). The default output format is txt.
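Output produced with --format ndjson can be consumed one line at a time. The following is a rough sketch of such a parser. It assumes that each non-empty line holds one JSON object, which this README does not spell out. The parseNDJSON helper and the sample string are hypothetical.

```javascript
// Hypothetical helper: parse newline-delimited JSON into an array of values.
function parseNDJSON( str ) {
    var lines = str.split( /\r?\n/ );
    var out = [];
    var i;
    for ( i = 0; i < lines.length; i++ ) {
        // Skip blank lines (e.g., after a trailing newline):
        if ( lines[ i ].trim() !== '' ) {
            out.push( JSON.parse( lines[ i ] ) );
        }
    }
    return out;
}

// Made-up input standing in for captured CLI output (fields are assumed):
var ndjson = '{"id":"a","group":"spam-1"}\n{"id":"b","group":"easy-ham-1"}\n';

var records = parseNDJSON( ndjson );
console.log( records.length );
console.log( records[ 0 ].group );
```

For the full corpus, reading the output as a stream and parsing each line as it arrives would avoid holding the entire string in memory.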

Examples

$ spam-assassin

License

The data files (databases) are licensed under an Open Data Commons Public Domain Dedication & License 1.0 and their contents are licensed under Creative Commons Zero v1.0 Universal. The software is licensed under the Apache License, Version 2.0.


Notice

This package is part of stdlib, a standard library for JavaScript and Node.js, with an emphasis on numerical and scientific computing. The library provides a collection of robust, high performance libraries for mathematics, statistics, streams, utilities, and more.

For more information on the project, filing bug reports and feature requests, and guidance on how to develop stdlib, see the main project repository.


Copyright © 2016-2021. The Stdlib Authors.