Package Exports
- table-scraper
This package does not declare an exports field, so the exports above have been automatically detected and optimized by JSPM instead. If any package subpath is missing, it is recommended to post an issue to the original package (table-scraper) to support the "exports" field. If that is not possible, create a JSPM override to customize the exports field for this package.
Readme
table-scraper
Simple utility for scraping data from html tables on a given website into a list of javascript objects.
installation
npm install --save table-scrapermethods
get(url)
Returns a promise that resolves to a list of tables found on the input website. HTML table rows are converted to javascript objects
For example: suppose the website at http://www.some-fake-url.com consisted of the following:
<html>
<head>
</head>
<body>
<table>
<thead>
<tr><th>State</th><th>Capital City</th><th>Pop.<th></tr>
</thead>
<tbody>
<tr><td>Minnesota</td><td>Saint Paul</td><td>3</td></tr>
<tr><td>New York</td><td>Albany</td><td>Eight Million</td></tr>
</tbody>
</table>
</body>
</html>The following code would result in the array displayed below:
var scraper = require('table-scraper');
scraper
.get('http://www.some-fake-url.com')
.then(function(tableData) {
/*
tableData ===
[
[
{ State: 'Minnesota', 'Capital City': 'Saint Paul', 'Pop.': '3' },
{ State: 'New York', 'Capital City': 'Albany', 'Pop.': 'Eight Million' }
]
]
*/
});Important to note: the tableData returned is a list of lists. So, if some-fake-url.com
contained three tables, the structure of the response would look like
[
[ /* list of data from the first table */ ],
[ /* list of data from the second table */ ],
[ /* list of data from the third table */ ]
]If a table has NO headings (no <th> elements), the object keys are simply the column index:
[
{'0': <first column data of first row>, '1': <second column data of first row>, .... }
]Contributing
Feedback/PRs welcome! Please include tests around any new functionality, and make sure existing tests pass:
npm testCredits
The following node libraries make this utility super easy: