JSPM

table-scraper

1.0.3
  • Downloads 153
  • License MIT

Easily scrape any website's HTML table data into an array of JavaScript objects.

Package Exports

  • table-scraper

This package does not declare an exports field, so the exports above have been automatically detected and optimized by JSPM instead. If any package subpath is missing, consider opening an issue on the original package (table-scraper) asking it to support the "exports" field. If that is not possible, create a JSPM override to customize the exports field for this package.

Readme


table-scraper

Simple utility for scraping data from HTML tables on a given website into a list of JavaScript objects.

installation

npm install --save table-scraper

methods

get(url)

Returns a promise that resolves to a list of the tables found on the page at url. Each HTML table row is converted to a JavaScript object keyed by the table's column headings.

For example, suppose the page at http://www.some-fake-url.com consisted of the following HTML:

<html>
<head>
</head>
<body>
  <table>
    <thead>
    <tr><th>State</th><th>Capital City</th><th>Pop.</th></tr>
    </thead>
    <tbody>
    <tr><td>Minnesota</td><td>Saint Paul</td><td>3</td></tr>
    <tr><td>New York</td><td>Albany</td><td>Eight Million</td></tr>
    </tbody>
  </table>
</body>
</html>

The following code would result in the array displayed below:

var scraper = require('table-scraper');
scraper
  .get('http://www.some-fake-url.com')
  .then(function(tableData) {
    /*
       tableData ===
        [
          [
            { State: 'Minnesota', 'Capital City': 'Saint Paul', 'Pop.': '3' },
            { State: 'New York', 'Capital City': 'Albany', 'Pop.': 'Eight Million' }
          ]
        ]
    */
  })
  .catch(function(err) {
    // Log the error if the promise rejects (for example, if the URL cannot be fetched).
    console.error(err);
  });
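To make the row-to-object mapping concrete, here is a minimal sketch (not table-scraper's actual source) of how a table's heading cells could be zipped with each body row's cells. It assumes the cells have already been extracted from the HTML as arrays of strings; rowsToObjects is a hypothetical name. When a column has no heading it falls back to the column index, matching the README's note on tables without <th> elements.

```javascript
// Hypothetical sketch of the row-to-object mapping.
// Assumes cells were already extracted from the HTML as strings;
// this is NOT table-scraper's actual implementation.
function rowsToObjects(headings, rows) {
  return rows.map(function (cells) {
    var obj = {};
    cells.forEach(function (cell, i) {
      // Use the heading text as the key when this column has one,
      // otherwise fall back to the column index.
      var key = i < headings.length ? headings[i] : String(i);
      obj[key] = cell;
    });
    return obj;
  });
}

var headings = ['State', 'Capital City', 'Pop.'];
var rows = [
  ['Minnesota', 'Saint Paul', '3'],
  ['New York', 'Albany', 'Eight Million']
];

// Logs the same two row objects shown in the tableData comment above.
console.log(rowsToObjects(headings, rows));
```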

Important to note: the tableData returned is a list of lists, with one inner list per table. So if some-fake-url.com contained three tables, the response would have this structure:

[
  [ /* list of data from the first table */ ],
  [ /* list of data from the second table */ ],
  [ /* list of data from the third table */ ]
]
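Because the result is nested, callers usually need to pick out one inner list. The sketch below assumes only that get(url) resolves to the list of lists described above. getTable is a hypothetical helper, not part of the package; it takes the scraper as a parameter so it can be exercised with a stand-in object, and it rejects with a RangeError when the page has fewer tables than requested.

```javascript
// Hypothetical helper (not part of table-scraper): resolve to the rows of a
// single table, selected by its zero-based position on the page.
async function getTable(scraper, url, index) {
  var tables = await scraper.get(url); // list of lists, one per <table>
  if (index < 0 || index >= tables.length) {
    throw new RangeError(
      'Page has ' + tables.length + ' table(s); no table at index ' + index
    );
  }
  return tables[index];
}

// Usage with the real package (requires network access):
// var scraper = require('table-scraper');
// getTable(scraper, 'http://www.some-fake-url.com', 0).then(function (rows) {
//   console.log(rows.length + ' rows in the first table');
// });
```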

If a table has NO headings (no <th> elements), the object keys are simply the column indices:

[
  {'0': <first column data of first row>, '1': <second column data of first row>, .... }
]
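If downstream code prefers plain arrays for heading-less tables, the index-keyed objects can be converted back. A small sketch; rowToArray is a hypothetical helper, not part of the package.

```javascript
// Hypothetical helper: turn an index-keyed row such as {'0': 'a', '1': 'b'}
// back into an array ordered by column index.
function rowToArray(row) {
  return Object.keys(row)
    .sort(function (a, b) { return Number(a) - Number(b); }) // numeric, not lexicographic
    .map(function (key) { return row[key]; });
}

console.log(rowToArray({ '0': 'Minnesota', '1': 'Saint Paul', '2': '3' }));
// logs [ 'Minnesota', 'Saint Paul', '3' ]
```

The numeric comparator matters for tables with ten or more columns: a default string sort would place '10' before '2'.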

Contributing

Feedback/PRs welcome! Please include tests for any new functionality, and make sure existing tests pass:

npm test

Credits

The following Node libraries make this utility super easy: