Web scraping is the process of turning unstructured web pages into structured data.
We specialise in two types of crawling:
- Targeted crawl
- Broad crawl
A crawl is targeted when we know precisely:
- The target site
- What information we want to extract from it
- How we will find that information
For instance, targeted extraction of a Wikipedia article:
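As a minimal sketch of what such a targeted extraction might look like, the snippet below pulls a title and first paragraph out of an article page with Python's standard-library `html.parser`. The HTML string stands in for a downloaded Wikipedia page, and the field names in the output record are illustrative, not our actual schema.

```python
from html.parser import HTMLParser

class ArticleExtractor(HTMLParser):
    """Pulls the title and the first paragraph out of an article page."""
    def __init__(self):
        super().__init__()
        self._tag = None
        self.title = ""
        self.first_paragraph = ""

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "p"):
            self._tag = tag

    def handle_endtag(self, tag):
        if tag == self._tag:
            self._tag = None

    def handle_data(self, data):
        # Keep only the first h1 and the first non-empty paragraph.
        if self._tag == "h1" and not self.title:
            self.title = data.strip()
        elif self._tag == "p" and not self.first_paragraph:
            self.first_paragraph = data.strip()

# Illustrative HTML standing in for a fetched Wikipedia page source.
html = """
<html><body>
  <h1>Web scraping</h1>
  <p>Web scraping is data scraping used for extracting data from websites.</p>
</body></html>
"""

parser = ArticleExtractor()
parser.feed(html)
record = {"title": parser.title, "summary": parser.first_paragraph}
print(record)
```

In a real targeted crawl the HTML would come from an HTTP response, and the selectors would be tuned to the target site's markup, since we know in advance exactly where the information lives.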
Typical applications for targeted crawls are:
- E-commerce extraction and monitoring
- Monitoring of booking and reservation sites
- Social networks and media
- Many others
A broad crawl, on the other hand, is how search engines index the internet. When we do a broad crawl, we fetch an entire website, store it in our database, and then process it with our machine learning algorithms.
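The fetch-everything step of a broad crawl can be sketched as a breadth-first traversal of a site's link graph. Here a tiny in-memory dictionary stands in for real HTTP fetching and link extraction; the structure of the loop is the point, not the fake site.

```python
from collections import deque

# A tiny in-memory "website": URL -> (page source, outgoing links).
# Stands in for real HTTP fetching, which a production crawler would do.
SITE = {
    "/":  ("<html>home</html>",   ["/a", "/b"]),
    "/a": ("<html>page a</html>", ["/b", "/c"]),
    "/b": ("<html>page b</html>", []),
    "/c": ("<html>page c</html>", ["/"]),
}

def broad_crawl(start):
    """Breadth-first crawl: fetch every reachable page once, keep its raw source."""
    stored = {}                   # our "database" of raw page sources
    queue = deque([start])
    seen = {start}
    while queue:
        url = queue.popleft()
        source, links = SITE[url]  # real code: HTTP GET + link extraction
        stored[url] = source       # keep the raw source for later processing
        for link in links:
            if link not in seen:   # never fetch the same page twice
                seen.add(link)
                queue.append(link)
    return stored

pages = broad_crawl("/")
print(sorted(pages))  # every page reachable from the start URL
```

The stored raw sources are what the downstream machine learning pipeline would consume; a production crawler would add politeness delays, robots.txt handling, and error retries around the same loop.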
We have built excellent scraping tools and assembled them into our internal platform.
Our platform helps us:
- Completely hide bots, making all crawling activity look like human behaviour
- Emulate popular web browsers
- Crawl target sites gracefully
- Easily test results
- Save all page sources in the database for further investigation, fixes, and data re-extraction
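The last point, saving page sources for later re-extraction, can be sketched with an in-memory SQLite database. The table name and schema here are assumptions for illustration, not our platform's actual storage layout.

```python
import sqlite3
import time

# In-memory SQLite stands in for the platform's real storage backend.
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE page_sources (
        url        TEXT,
        fetched_at REAL,
        body       TEXT
    )
""")

def save_page(url, body):
    """Store the raw page source so data can be re-extracted without re-crawling."""
    db.execute("INSERT INTO page_sources VALUES (?, ?, ?)",
               (url, time.time(), body))
    db.commit()

def latest_source(url):
    """Return the most recent stored copy of a page, or None if never crawled."""
    row = db.execute(
        "SELECT body FROM page_sources WHERE url = ? ORDER BY rowid DESC LIMIT 1",
        (url,)).fetchone()
    return row[0] if row else None

save_page("https://example.com/item/1", "<html>v1</html>")
save_page("https://example.com/item/1", "<html>v2</html>")
print(latest_source("https://example.com/item/1"))  # prints the v2 source
```

Keeping every fetched source means that when an extractor has a bug, we can fix it and re-run extraction over stored pages instead of crawling the target site again.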