Overview
Scrapers for JustLease.nl, DirectLease.nl, and 123Lease.nl that retrieve structured car leasing data by navigating paginated listings and extracting detail-level attributes from each offer.
Hybrid Scraping Architecture
Given the real-world diversity of the Dutch car leasing websites—ranging from fully server-rendered pages (e.g., JustLease and DirectLease) to JavaScript-dependent platforms like 123Lease—AutoCompare requires a hybrid scraping architecture to operate effectively at scale.
This approach strategically combines:
- Scrapy + BeautifulSoup: efficiently scrapes structured data from HTML-based sites using fast, lightweight HTTP parsing.
- Selenium (Headless Chrome): a fallback for JS-heavy or dynamic sites where standard scrapers cannot reach critical data.

This architecture ensures:
- Scalability across thousands of listings per site.
- Resilience against site structure differences and partial page loads.
- Maintainability: roughly 90% of use cases remain efficient and fast via Scrapy, while Selenium is invoked only when necessary, minimizing overhead and complexity.
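The dispatch logic implied by this architecture can be sketched as follows. This is a minimal illustration, not the project's actual code: the domain lists, function names, and the `a.listing-link` selector are all assumptions.

```python
# Sketch of the hybrid dispatch: fast HTTP parsing by default, Selenium only
# for domains known to require JavaScript rendering. Domain lists, helper
# names, and selectors are illustrative assumptions.
import requests
from bs4 import BeautifulSoup

# Domains whose listings render server-side (safe for plain HTTP parsing).
STATIC_DOMAINS = {"justlease.nl", "directlease.nl"}
# Domains that need a real browser to populate the DOM.
JS_DOMAINS = {"123lease.nl"}

def fetch_html(url: str, domain: str) -> str:
    """Return page HTML, choosing the cheapest engine that works."""
    if domain in JS_DOMAINS:
        return fetch_with_selenium(url)   # slow path: headless Chrome
    resp = requests.get(url, timeout=15)  # fast path: plain HTTP
    resp.raise_for_status()
    return resp.text

def fetch_with_selenium(url: str) -> str:
    # Imported lazily so the fast path has no Selenium dependency.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    opts = Options()
    opts.add_argument("--headless=new")
    driver = webdriver.Chrome(options=opts)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()

def parse_listings(html: str) -> list:
    # BeautifulSoup parses the HTML regardless of which engine fetched it.
    soup = BeautifulSoup(html, "html.parser")
    return [a["href"] for a in soup.select("a.listing-link")]
```

Because both paths return plain HTML, the parsing layer stays identical for every site; only the fetch step varies.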
Technical Components
Input Configuration (per Website)
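A per-website configuration could look like the sketch below. Every key, URL path, and CSS selector here is an illustrative assumption about what such a config might contain; none of it is taken from the actual sites.

```python
# Hypothetical per-website configuration. All keys, paths, and selectors are
# illustrative assumptions, not values from the real sites.
SITE_CONFIGS = {
    "justlease.nl": {
        "start_url": "https://www.justlease.nl/private-lease",
        "requires_js": False,                     # server-rendered: Scrapy path
        "listing_selector": "div.offer-card",     # assumed CSS selector
        "next_page_selector": "a.pagination-next",
        "fallback_selectors": {
            "price": ["span.price", "div.monthly-price"],
        },
    },
    "123lease.nl": {
        "start_url": "https://www.123lease.nl/aanbod",
        "requires_js": True,                      # JS-heavy: Selenium fallback
        "listing_selector": "article.lease-item",
        "next_page_selector": "button.load-more",
        "fallback_selectors": {},
    },
}
```

Keeping the engine choice (`requires_js`) and all selectors in per-site config means new sites can be added without touching the scraping logic itself.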
Data Fields Extracted
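One possible shape for the extracted record is sketched below. The field names are assumptions inferred from the attributes this document mentions (price, mileage, lease terms, specs, variant), not a confirmed schema.

```python
# A possible record shape for extracted offers. Field names are assumptions
# inferred from attributes mentioned in this document, not a confirmed schema.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class LeaseOffer:
    source_site: str                           # e.g. "justlease.nl"
    url: str                                   # detail-page URL
    make: Optional[str] = None
    model: Optional[str] = None
    variant: Optional[str] = None
    monthly_price_eur: Optional[float] = None
    mileage_per_year_km: Optional[int] = None
    lease_term_months: Optional[int] = None

    def missing_fields(self) -> List[str]:
        """Fields left as None, so they can be flagged in the scrape logs."""
        return [name for name, value in self.__dict__.items() if value is None]
```

Defaulting every optional field to `None` matches the validation behavior described later: missing values are marked null rather than dropped.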
Scraping Logic and Behavior
The system iterates through all paginated listing pages and stops when a "next" page link is no longer found. For each listing:
- The listing's summary is processed for available inline data.
- If needed, the scraper follows the link to the listing's detail page and extracts additional data (e.g., full specs, lease terms, variant).
- Fallback selectors are applied if the primary ones fail.

Politeness and reliability:
- Delay: randomized 1.5–3 s delay between requests.
- Retries: up to 3 retries per failed request.
- Headers: dynamic User-Agent strings rotated from a pool.
- robots.txt: checked before scraping any site.

JavaScript Rendering (Fallback Mode)
If listings or content fail to load with standard Scrapy/requests:
- Selenium is triggered for that domain only.
- The browser waits for the DOM-ready state or a specific element to load.
- The page content is extracted and returned to the main pipeline.
Validation & Logging
- Missing fields (e.g., no mileage or price) are marked null and flagged in the logs.
- Each scrape run generates a count of total listings with errors or missing fields.
- Logs are stored in the cloud (e.g., S3, CloudWatch, Azure Log).

Site-Specific Behavior Examples