The Seven Truths About Scraping

These are the Key Truths About Scraping the Interwebs for Data

Scraping is legal in most cases.

But not all cases. If you scrape data that is secured in any way, you may be on shaky ground.

No one wants you to scrape their data.

They worked really hard to create their data; they generally don’t want it taken and repurposed for any reason, even if that reason is in the spirit they intended.

Scraping is a brittle science.

Let’s face it; you’re building a data interchange model on sand, because no one is obligated to keep, or wedded to, any particular HTML structure.
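To make the "built on sand" point concrete, here is a minimal sketch using only Python's standard library. The markup, the `price` class name, and the `extract_prices` helper are all hypothetical, invented for illustration. The extractor works until the site owner renames one tag and one class, and then it quietly returns nothing.

```python
from html.parser import HTMLParser


class PriceParser(HTMLParser):
    """Collects the text inside <span class="price"> elements."""

    def __init__(self):
        super().__init__()
        self._in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        # The extractor is tied to one tag name and one class name.
        if tag == "span" and ("class", "price") in attrs:
            self._in_price = True

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_price = False

    def handle_data(self, data):
        if self._in_price:
            self.prices.append(data.strip())


def extract_prices(html):
    """Return every price found in the given HTML string."""
    parser = PriceParser()
    parser.feed(html)
    parser.close()
    return parser.prices


# Hypothetical page before and after a redesign.
old_markup = '<div><span class="price">$19.99</span></div>'
new_markup = '<div><p class="product-price">$19.99</p></div>'

print(extract_prices(old_markup))  # ['$19.99']
print(extract_prices(new_markup))  # []
```

Note that the second call does not raise an error. The page still shows the same price to a human reader, but the scraper sees an empty list.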

Scraping recipes are almost always one-offs.

You might be able to compartmentalize portions of a scraping recipe, but for the most part, you start from scratch for every automation process.

Scraping is not sustainable.

Many variables conspire to ensure what works today may not work tomorrow.

Nothing about scraping is strategic; it is purely a tactical measure to gather data.

You generally won’t know when a scrape fails.

Scraping generally comes with no built-in monitoring or reporting, so a failed scrape goes unnoticed unless you add your own checks.
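One way to soften this truth is to wrap each scrape in a sanity check that fails loudly. The following is a rough sketch under stated assumptions, not a full monitoring setup: `checked_scrape`, `ScrapeError`, the `scraper` logger name, and the two stand-in scraper functions are hypothetical names made up for this example.

```python
import logging

logger = logging.getLogger("scraper")  # hypothetical logger name


class ScrapeError(Exception):
    """Raised when a scrape fails or returns too few records."""


def checked_scrape(scrape_fn, min_records=1):
    """Run scrape_fn and fail loudly instead of passing along a silent empty result."""
    try:
        records = scrape_fn()
    except Exception as exc:
        logger.error("scrape raised %r", exc)
        raise ScrapeError("scrape raised an exception") from exc
    if len(records) < min_records:
        logger.error(
            "scrape returned %d records, expected at least %d",
            len(records),
            min_records,
        )
        raise ScrapeError(
            f"expected at least {min_records} records, got {len(records)}"
        )
    return records


# Stand-in scrapers; a real one would fetch and parse a page.
def healthy_scrape():
    return ["$19.99", "$4.50"]


def broken_scrape():
    return []  # what a selector typically yields after the markup changes


print(checked_scrape(healthy_scrape))
try:
    checked_scrape(broken_scrape)
except ScrapeError as err:
    print("scrape failed:", err)
```

A record-count threshold is a crude signal, and it is only an assumption that an empty result means failure for your data. It will not catch a scrape that returns the right number of wrong values, but it does turn the most common silent failure into one you can see.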

