Web Scraper for Octopus
Developing and supporting a customizable engine to collect data from multiple unstructured sources for Octopus Ventures, the UK’s largest venture capital firm.
• Back-End Development
• Data Aggregation Engine Development
• Python & Tornado for high-throughput parsing
• RESTful API design
• MySQL for structured storage
• HTTP & Google App Engine deployment
• DeepCrawl integration for site discovery
- Java Tech/Team Lead
- 2 Java Developers
- NodeJS Developer
- DevOps
About the Client
Octopus Ventures is one of Europe’s largest venture capital teams, headquartered in London and New York, with partners in San Francisco, Singapore, and China. It invests from £350k in seed rounds up to £25m in Series A rounds.
Challenge
• Automatically track new product listings, removals, and price changes across dozens of financial services sites in real time.
• Extract key fields such as bank name, product name, interest rate, min/max investment, notice period, and account type from unstructured web pages.
• Provide a reliable data feed for Octopus’s analytics teams to spot market trends in real time, enabling faster decision-making and data-driven insights.
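The field extraction described in the list above can be sketched roughly as follows. This is a minimal illustration, not the production parser: the HTML snippet, the label wording ("Interest rate:", "Minimum deposit:", and so on), and the names ProductRecord and parse_product are all assumptions made for the example. Real source pages would each need their own patterns.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical page fragment; real sites will use different markup and labels.
SAMPLE = """
<div class="product">
  <h2>Example Bank - 90 Day Notice Saver</h2>
  <p>Interest rate: 4.25% AER</p>
  <p>Minimum deposit: £1,000. Maximum deposit: £250,000</p>
  <p>Notice period: 90 days</p>
  <p>Account type: Notice account</p>
</div>
"""


@dataclass
class ProductRecord:
    bank_name: str
    product_name: str
    interest_rate: float                 # percent
    min_investment: Optional[int]        # pounds
    max_investment: Optional[int]        # pounds
    notice_period_days: Optional[int]
    account_type: Optional[str]


def _find(pattern: str, html: str) -> Optional[str]:
    """Return the first capture group of pattern, or None if it is absent."""
    match = re.search(pattern, html)
    return match.group(1).strip() if match else None


def _to_int(amount: Optional[str]) -> Optional[int]:
    """Convert a string such as '1,000' to 1000; pass None through."""
    return int(amount.replace(",", "")) if amount else None


def parse_product(html: str) -> ProductRecord:
    title = _find(r"<h2>(.*?)</h2>", html)
    rate = _find(r"Interest rate:\s*(\d+(?:\.\d+)?)\s*%", html)
    if title is None or rate is None:
        # Required fields are missing: the page layout has probably changed.
        raise ValueError("required fields not found")
    bank, _, product = title.partition(" - ")
    return ProductRecord(
        bank_name=bank.strip(),
        product_name=product.strip(),
        interest_rate=float(rate),
        min_investment=_to_int(_find(r"Minimum deposit:\s*£([\d,]+)", html)),
        max_investment=_to_int(_find(r"Maximum deposit:\s*£([\d,]+)", html)),
        notice_period_days=_to_int(_find(r"Notice period:\s*(\d+)\s*days", html)),
        account_type=_find(r"Account type:\s*([^<\n]+)", html),
    )
```

Calling parse_product(SAMPLE) yields a single ProductRecord. Optional fields come back as None when a page omits them, while a missing title or rate raises an error so that layout changes are noticed rather than silently stored as bad data.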
Solution
Agile Project Management
Led requirements workshops and sprint planning to define the scraper’s scope, cadence, and error-handling policies, ensuring on-time delivery and adherence to professional standards.
Interactive Prototyping
Built UI mockups and data-flow diagrams on Google App Engine to demonstrate how parsed data would be collected, indexed, and exposed via API endpoints, and how the system's components would integrate.
Robust Parser Implementation
Developed a Tornado-based parser with comprehensive exception handling: it logs failures, retries transient errors, and raises alerts when a source's page structure changes, while keeping parsing throughput high.
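The error-handling policy described here (log, retry transient failures, alert on structure changes) might look something like the sketch below. It is synchronous and uses only the standard library rather than Tornado, to keep the example self-contained. The exception classes, the fetch_with_retry name, and the retry and back-off defaults are illustrative assumptions, not details from the project.

```python
import logging
import time

logger = logging.getLogger("scraper")


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, 5xx, ...)."""


class StructureChanged(Exception):
    """Raised when a page was fetched but no longer matches the parser."""


def fetch_with_retry(fetch, parse, url, retries=3, base_delay=0.5, sleep=time.sleep):
    """Fetch and parse url, retrying transient fetch errors with back-off.

    fetch(url) returns HTML or raises TransientError.
    parse(html) returns a record, or None if the expected fields are missing.
    """
    for attempt in range(1, retries + 1):
        try:
            html = fetch(url)
        except TransientError as exc:
            logger.warning("attempt %d/%d failed for %s: %s", attempt, retries, url, exc)
            if attempt == retries:
                raise  # out of retries: surface the failure to the caller
            sleep(base_delay * 2 ** (attempt - 1))  # exponential back-off
            continue
        record = parse(html)
        if record is None:
            # Retrying will not help here; the source layout needs attention.
            logger.error("source structure changed for %s", url)
            raise StructureChanged(url)
        return record
```

Keeping the two failure modes as separate exception types lets the caller route them differently: transient errors can simply be rescheduled, while a StructureChanged error is the natural hook for notifying whoever maintains that source's parser.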
Scalable Data Aggregation
Created a custom aggregation engine that normalizes and indexes scraped data, providing a simple REST API that non-technical teams can use to query and filter market information.
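As a rough illustration of the normalize-and-index step, the sketch below maps records from differently shaped sources onto one schema and answers the kind of filter query a REST endpoint might pass through. The field aliases, the ProductIndex class, and the query parameters are assumptions for the example. The actual engine's schema and API are not described in this case study.

```python
from collections import defaultdict

# Hypothetical mapping from the common schema to per-source key names.
ALIASES = {
    "bank": ("bank", "bank_name"),
    "product": ("product", "product_name"),
    "rate": ("rate", "interest_rate"),
    "account_type": ("account_type", "type"),
}


def _pick(raw, field):
    """Return the value of the first alias of field present in raw."""
    for key in ALIASES[field]:
        if key in raw:
            return raw[key]
    raise KeyError(field)


def normalize(raw):
    """Convert one scraped record to the common schema."""
    rate = _pick(raw, "rate")
    if isinstance(rate, str):
        rate = rate.strip().rstrip("%")  # accept "4.25%" as well as 4.25
    return {
        "bank": str(_pick(raw, "bank")).strip(),
        "product": str(_pick(raw, "product")).strip(),
        "rate": float(rate),
        "account_type": str(_pick(raw, "account_type")).strip().lower(),
    }


class ProductIndex:
    """In-memory index of normalized records, grouped by account type."""

    def __init__(self):
        self._by_type = defaultdict(list)

    def add(self, raw):
        record = normalize(raw)
        self._by_type[record["account_type"]].append(record)
        return record

    def query(self, account_type=None, min_rate=None):
        """Filter by account type and/or minimum rate, highest rate first."""
        if account_type is None:
            candidates = [r for recs in self._by_type.values() for r in recs]
        else:
            candidates = list(self._by_type.get(account_type.lower(), []))
        if min_rate is not None:
            candidates = [r for r in candidates if r["rate"] >= min_rate]
        return sorted(candidates, key=lambda r: r["rate"], reverse=True)
```

A request such as "notice accounts paying at least 4%" would then translate to index.query(account_type="notice", min_rate=4.0), and onboarding a new source mostly means adding its key names to ALIASES.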
Business Value
The web scraper engine accumulates pricing data on over 2,000 products from 20+ websites, enabling Octopus to visualize pricing trends and inform investment decisions.
Delivered a generic, modular solution that can onboard new data sources and parsing algorithms, future-proofing Octopus’s competitive analytics capabilities and ensuring scalable data integration for long-term growth.