Package Exports
- parse-domain
Readme
parse-domain
Splits a hostname into subdomains, domain and (effective) top-level domains. Written in TypeScript.
Since domain name registrars are organizing their namespaces in different ways, it's not straight-forward to recognize subdomains, domain and top-level domains in a hostname. parse-domain validates the given hostname against RFC 1034 and uses a large list of known top-level domains from publicsuffix.org to split a hostname into different parts:
import {parseDomain} from "parse-domain";
const {subDomains, domain, topLevelDomains} = parseDomain(
// This should be a string with basic latin characters only.
// More information below.
"www.some.example.co.uk",
);
console.log(subDomains); // ["www", "some"]
console.log(domain); // "example"
console.log(topLevelDomains); // ["co", "uk"]
This module uses a trie data structure under the hood to ensure the smallest possible library size and the fastest lookup. The library is roughly 30KB minified and gzipped.
Installation
npm install parse-domain
Updates
💡 Please note: publicsuffix.org is updated several times per month. This package comes with a prebuilt list that has been downloaded at the time of npm publish
. In order to get an up-to-date list, you should run npx parse-domain-update
everytime you start or build your application. This will download the latest list from https://publicsuffix.org/list/public_suffix_list.dat
.
Expected input
⚠️ parseDomain
does not parse whole URLs. You should only pass the puny-encoded hostname section of the URL:
❌ Wrong | ✅ Correct |
---|---|
https://user@www.example.com:8080/path?query |
www.example.com |
münchen.de |
xn--mnchen-3ya.de |
食狮.com.cn?query |
xn--85x722f.com.cn |
There is the utility function fromUrl
which tries to extract the hostname from a (partial) URL and puny-encodes it:
import {parseDomain, fromUrl} from "parse-domain";
const {subDomains, domain, topLevelDomains} = parseDomain(
fromUrl("https://www.münchen.de?query"),
);
console.log(subDomains); // ["www"]
console.log(domain); // "xn--mnchen-3ya"
console.log(topLevelDomains); // ["de"]
// You can use the 'punycode' NPM package to decode the domain again
import {toUnicode} from "punycode";
console.log(toUnicode(domain)); // "münchen"
fromUrl
uses the URL
constructor under the hood. Depending on your target environments you need to make sure that there is a polyfill for it. It's globally available in all modern browsers (no IE) and in Node v10.
Expected output
When parsing a domain there are 4 possible results:
- invalid
- formally correct and the domain name
- is reserved
- is not listed in the public suffix list
- is listed
parseDomain
returns a ParseResult
with a type
property that allows to distinguish these cases.
👉 Invalid domains
The given input is first validated against RFC 1034. If the validation fails, parseResult.type
will be ParseResultType.Invalid
:
import {parseDomain, ParseResultType} from "parse-domain";
const parseResult = parseDomain("münchen.de");
console.log(parseResult.type === ParseResultType.Invalid); // true
Check out the API if you need more information about the validation error.
👉 Reserved domains
There are 5 top-level domains that are not listed in the public suffix list but reserved according to RFC 6761 and RFC 6762:
localhost
local
example
invalid
test
In these cases, parseResult.type
will be ParseResultType.Reserved
:
import {parseDomain, ParseResultType} from "parse-domain";
const parseResult = parseDomain("pecorino.local");
console.log(parseResult.type === ParseResultType.Reserved); // true
console.log(parseResult.domains); // ["pecorino", "local"]
👉 Domains that are not listed
If the given hostname is valid, but not listed in the downloaded public suffix list, parseResult.type
will be ParseResultType.NotListed
:
import {parseDomain, ParseResultType} from "parse-domain";
const parseResult = parseDomain("this.is.not-listed");
console.log(parseResult.type === ParseResultType.NotListed); // true
console.log(parseResult.domains); // ["this", "is", "not-listed"]
If a domain is not listed, it can be caused by an outdated list. Make sure to update the list once in a while.
⚠️ Do not treat parseDomain as authoritative answer. It cannot replace a real DNS lookup to validate if a given domain is known in a certain network.
👉 Effective top-level domains
Technically, the term top-level domain describes the very last domain in a hostname (uk
in example.co.uk
). Most people, however, use the term top-level domain for the public suffix which is a namespace "under which Internet users can directly register names".
Some examples for public suffixes:
com
inexample.com
co.uk
inexample.co.uk
co
inexample.co
com.co
inexample.com.co
If the hostname contains domains from the public suffix list, the parseResult.type
will be ParseResultType.Listed
:
import {parseDomain, ParseResultType} from "parse-domain";
const parseResult = parseDomain("example.co.uk");
console.log(parseResult.type === ParseResultType.Listed); // true
console.log(parseResult.domains); // ["example", "co", "uk"]
Now parseResult
will also provide a subDomains
, domain
and topLevelDomains
property:
const {subDomains, domain, topLevelDomains} = parseResult;
console.log(subDomains); // []
console.log(domain); // "example"
console.log(topLevelDomains); // ["co", "uk"]
👉 Switch over parseResult.type
to distinguish between different parse results
We recommend switching over the parseResult.type
:
switch (parseResult.type) {
case ParseResultType.Listed: {
const {hostname, topLevelDomains} = parseResult;
console.log(`${hostname} belongs to ${topLevelDomains.join(".")}`);
break;
}
case ParseResultType.Reserved:
case ParseResultType.NotListed: {
const {hostname} = parseResult;
console.log(`${hostname} is a reserved or unknown domain`);
break;
}
default:
throw new Error(`${hostname} is an invalid domain`);
}
⚠️ Effective top-level domains vs. ICANN
What's surprising to a lot of people is that the definition of public suffix means that regular business domains can become effective top-level domains:
const {subDomains, domain, topLevelDomains} = parseDomain(
"parse-domain.github.io",
);
console.log(subDomains); // []
console.log(domain); // "parse-domain"
console.log(topLevelDomains); // ["github", "io"] 🤯
In this case github.io
is nothing else than a private domain name registrar. github.io
is the effective top-level domain and browsers are treating it like that.
If you're only interested in top-level domains listed in the ICANN section of the public suffix list, there's an icann
property:
const parseResult = parseDomain("parse-domain.github.io");
const {subDomains, domain, topLevelDomains} = parseResult.icann;
console.log(subDomains); // ["parse-domain"]
console.log(domain); // "github"
console.log(topLevelDomains); // ["io"]
⚠️ domain
can also be undefined
const {subDomains, domain, topLevelDomains} = parseDomain("co.uk");
console.log(subDomains); // []
console.log(domain); // undefined
console.log(topLevelDomains); // ["co", "uk"]
⚠️ ""
is a valid (but reserved) domain
The empty string ""
represents the DNS root and is considered to be valid. parseResult.type
will be ParseResultType.Reserved
in that case:
const parseResult = parseDomain("");
console.log(parseResult.type === ParseResultType.Reserved); // true
console.log(subDomains); // []
console.log(domain); // undefined
console.log(topLevelDomains); // []
API
🧩 = JavaScript export
🧬 = TypeScript export
🧩 export parseDomain(hostname: string | typeof NO_HOSTNAME): ParseResult
Takes a hostname (e.g. "www.example.com"
) and returns a ParseResult
. The hostname must only contain letters, digits, hyphens and dots. International hostnames must be puny-encoded. Does not throw an error, even with invalid input.
import {parseDomain} from "parse-domain";
const parseResult = parseDomain("www.example.com");
🧩 export fromUrl(input: string): string | typeof NO_HOSTNAME
Takes a URL-like string and tries to extract the hostname. Requires the global URL
constructor to be available on the platform. Returns the NO_HOSTNAME
symbol if the input was not a string or the hostname could not be extracted. Take a look at the test suite for some examples. Does not throw an error, even with invalid input.
🧩 export NO_HOSTNAME: unique symbol
NO_HOSTNAME
is a symbol that is returned by fromUrl
when it was not able to extract a hostname from the given string. When passed to parseDomain
, it will always yield a ParseResultInvalid
.
🧬 export ParseResult
A ParseResult
is either a ParseResultInvalid
, ParseResultReserved
, ParseResultNotListed
or ParseResultListed
.
All parse results have a type
property that is either "INVALID"
, "RESERVED"
, "NOT_LISTED"
or "LISTED"
. Use the exported ParseResultType to check for the type instead of checking against string literals.
All parse results also have a hostname
property that stores the original hostname that was passed to parseDomain
.
🧩 export ParseResultType
An object that holds all possible ParseResult type
values:
const ParseResultType = {
Invalid: "INVALID",
Reserved: "RESERVED",
NotListed: "NOT_LISTED",
Listed: "LISTED",
};
🧬 export ParseResultType
This type represents all possible ParseResult type
values.
🧬 export ParseResultInvalid
Describes the shape of the parse result that is returned when the given hostname does not adhere to RFC 1034:
- The hostname is not a string
- The hostname is longer than 253 characters
- A domain label is shorter than 1 character
- A domain label is longer than 63 characters
- A domain label contains a character that is not a letter, digit or hyphen
type ParseResultInvalid = {
type: ParseResultType.INVALID;
hostname: string | typeof NO_HOSTNAME;
errors: Array<ValidationError>;
};
// Example
{
type: "INVALID",
hostname: ".com",
errors: [...]
}
🧬 export ValidationError
Describes the shape of a validation error as returned by parseDomain
type ValidationError = {
type: ValidationErrorType;
message: string;
column: number;
};
// Example
{
type: "LABEL_MIN_LENGTH",
message: `Label "" is too short. Label is 0 octets long but should be at least 1.`,
column: 1,
}
🧩 export ValidationErrorType
An object that holds all possible ValidationError type
values:
const ValidationErrorType = {
NoHostname: "NO_HOSTNAME",
DomainMaxLength: "DOMAIN_MAX_LENGTH",
LabelMinLength: "LABEL_MIN_LENGTH",
LabelMaxLength: "LABEL_MAX_LENGTH",
LabelInvalidCharacter: "LABEL_INVALID_CHARACTER",
};
🧬 export ValidationErrorType
This type represents all possible type
values of a ValidationError.
🧬 export ParseResultReserved
This type describes the shape of the parse result that is returned when the given hostname
- is the root domain (the empty string
""
) - belongs to the top-level domain
localhost
,local
,example
,invalid
ortest
type ParseResultReserved = {
type: ParseResultType.Reserved;
hostname: string;
domains: Array<string>;
};
// Example
{
type: "RESERVED",
hostname: "pecorino.local",
domains: ["pecorino", "local"]
}
🧬 export ParseResultNotListed
Describes the shape of the parse result that is returned when the given hostname is valid and does not belong to a reserved top-level domain, but is not listed in the public suffix list.
type ParseResultNotListed = {
type: ParseResultType.NotListed;
hostname: string;
domains: Array<string>;
};
// Example
{
type: "NOT_LISTED",
hostname: "this.is.not-listed",
domains: ["this", "is", "not-listed"]
}
🧬 export ParseResultListed
Describes the shape of the parse result that is returned when the given hostname belongs to a top-level domain that is listed in the public suffix list:
type ParseResultListed = {
type: ParseResultType.Listed;
hostname: string;
domains: Array<string>;
subDomains: Array<string>;
domain: string | undefined;
topLevelDomains: Array<string>;
icann: {
subDomains: Array<string>;
domain: string | undefined;
topLevelDomains: Array<string>;
};
};
// Example
{
type: "LISTED",
hostname: "parse-domain.github.io",
domains: ["parse-domain", "github", "io"]
subDomains: [],
domain: "parse-domain",
topLevelDomains: ["github", "io"],
icann: {
subDomains: ["parse-domain"],
domain: "github",
topLevelDomains: ["io"]
}
}
License
Unlicense