
Profile of Zyte.com
Every day, hundreds of crawlers collect data from the websites of radio, TV, online and print press publishers. They serve monitoring, analysis and summarization activities, and supply the high-quality big data that AI systems depend on…
Profile of Zyte.com, “the best place to host Scrapy spiders”
On its website, Zyte offers a “News & Article data” service: “Accurate articles and news data from global publishers and the largest news websites in the world”.

Zyte shows that it knows news sites well: “Articles and news data comes in all shapes and sizes. We get it all”.
1-“Mainstream broadcast: These are large organizations that have dominated the news world for many years. They include TV networks, newspapers, press releases, and radio stations that are widely recognized and trusted by the public”.
2-“Industry and vertical: These websites focus on specific industries or niches, providing news and information that is relevant to professionals in those fields”.
3-“Alternative media and independents: These websites operate outside of the traditional, corporate-owned media landscape. They may provide alternative perspectives on news and events”.
4-“Groups, individuals, and influencer: These web pages are created and run by individuals or groups, such as bloggers, vloggers, or podcasters”.
5-“Online aggregators: These websites collect and curate crucial news data from various sources and present them to users in a single location”.
6-“News blogs: These websites are dedicated to latest news articles and opinion, often with a specific focus or niche”.
7-“Video news: Video news websites provide news coverage through video content, which can be more difficult to collect and parse data from than text-based news”.
8-“Social media: Social media platforms where journalists and publications source stories and where many brands self-publish and promote their content”.

What will this scraping be used for?
“Brand monitoring & reputation management, Market research, Content optimization (SEO), News aggregation, Tackling misinformation, Building AI models and algorithms, Creating dashboards, Ad and affiliate tracking”.
Pooling the scraping work across customers makes for a fairly attractive service price, whether for managing one's own scraping or for buying data that has already been extracted and formatted.
Note that IPXO, a marketplace for leasing IP addresses, presents itself as “a trusted IP leasing partner for businesses in more than 75 industries”, and lists a few partner services, including “Zyte”.

Indeed, among the various possible uses of leased IPs is “data crawling and data extraction”:
“With the support of professional IP leasing services provided by IPXO, a leading web exfiltration company in Europe can continue introducing innovations and improving the quality of services to guarantee quick and efficient data crawling and data extraction”.
2. Estimated revenue, valuation, employee data
- IPXO.com: estimated annual revenue is currently $8M.
- Employees: 62
- Zyte (ScrapingHub): estimated annual revenue is currently $27M.
- Employees: 171
3. Stats on Botscorner

