PRD Web Scraping

Monitoring Dashboard & Logs

Technical Specification for the Scraper Modules

Overview

This module provides real-time visibility into the health, stability, and performance of the AutoCompare scraping and data processing pipeline. It enables both developers and stakeholders to track the status of scraping jobs, understand trends over time, and identify anomalies that may require investigation. By offering visual and downloadable insights, it supports proactive maintenance and fast incident response.

Technical Components

| Component | Technology / Tool | Purpose |
|---|---|---|
| Logging System | Python `logging` module | Records job metadata, errors, and field anomalies |
| Log Aggregator | AWS CloudWatch / Azure Monitor / Logstash | Centralizes logs across scraping modules |
| Metrics Collector | Prometheus (optional) | Tracks scrape counts, duration, and error rates |
| Dashboard | Grafana / Kibana / Streamlit | Visual UI to monitor pipeline behavior and trends |
| Alert Engine (optional) | CloudWatch Alarms / Grafana Alerts | Notifies the team of anomalies or failures |
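A minimal sketch of the logging component, using only the standard library. The logger name and the `job` field schema (`site`, `records`, `errors`) are illustrative assumptions, not part of this spec:

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Emit each log record as a single JSON line for the aggregator."""
    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Job metadata attached via logging's `extra` kwarg, if present.
            "job": getattr(record, "job", None),
        }
        return json.dumps(payload)

logger = logging.getLogger("autocompare.scraper")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Record one job summary; the keys are examples, not a fixed schema.
logger.info("scrape finished",
            extra={"job": {"site": "example-site", "records": 852, "errors": 3}})
```

One-line JSON records keep the aggregator (CloudWatch, Logstash, etc.) able to parse and filter fields without custom grok patterns.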

Input/Output Specification

| Type | Format | Description |
|---|---|---|
| Input | Logs, JSON metadata | Job summaries, scraper results, anomaly events |
| Output | Visual charts, alerts, exported logs | Displayed via the dashboard or downloadable as CSV |
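As an illustration of the "JSON metadata" input, one job summary might look like the following; the field names are hypothetical and would be fixed during implementation:

```json
{
  "site": "example-site",
  "run_id": "2024-01-15T03:00:00Z",
  "status": "success",
  "records": 852,
  "error_rate": 0.03,
  "duration_s": 412,
  "retries": 1
}
```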

Metrics Tracked

Job Execution Status: Success / Failure per site and run.
Scraped Records Count: Number of valid listings per site.
Error Rate: Percentage of listings with missing critical fields.
Scrape Duration: Time taken for each scraping run.
Retry Count: How often retry logic was triggered per domain.
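Assuming each run yields a list of listing dicts, the per-run metrics above could be derived roughly as follows; the critical-field names and function signature are assumptions for the sketch:

```python
from dataclasses import dataclass

# Assumed critical fields; the real set comes from the extraction spec.
CRITICAL_FIELDS = ("price", "model", "url")

@dataclass
class RunMetrics:
    site: str
    records: int       # valid listings scraped
    error_rate: float  # share of listings missing a critical field
    duration_s: float  # wall-clock scrape duration
    retries: int       # retry attempts triggered for the domain

def compute_metrics(site, listings, duration_s, retries):
    """Summarize one scraping run into the metrics tracked above."""
    missing = sum(
        1 for listing in listings
        if any(not listing.get(field) for field in CRITICAL_FIELDS)
    )
    rate = missing / len(listings) if listings else 0.0
    return RunMetrics(site, len(listings), rate, duration_s, retries)
```

A `RunMetrics` record per run is also a convenient unit to ship to Prometheus or to persist for the time-series views below.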

Time-Series Insights

Daily/Weekly Scrape Volume Trends
Error Rate Trends per Site
Duration Performance over Time
Anomaly Markers (e.g., drop in listings from 1000 → 50 overnight)

Capabilities

| Feature | Description |
|---|---|
| Job Status View | Tabular display of past scraping runs, status, and data volume |
| Anomaly Alerts | Visual or email/Slack alerts for unusual patterns |
| Export Logs | CSV or JSON export of logs and job data |
| UI Panel (optional) | Web dashboard to browse scraping sessions and anomalies |
| Filterable Logs | Logs searchable by site, field name, and timestamp |
| Graphical Analytics | Time-series charts of scraping activity, record counts, and error frequency |

Validation & Alerting Rules

| Condition | Action |
|---|---|
| 0 listings scraped from any site | Raise alert: "Zero Listings" |
| Error rate > 30% | Raise alert: "Field Extraction Error Surge" |
| Listing count drops > 70% vs. the previous day | Flag as anomaly; display in dashboard |
| Run time > expected threshold (e.g., 15 min) | Log as performance warning |
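The rules above can be sketched as a single pure function that the alert engine evaluates after each run; thresholds mirror the table, while the function and label names are illustrative:

```python
def evaluate_run(records, error_rate, prev_records, duration_s,
                 max_duration_s=15 * 60):
    """Return the alert/warning labels triggered by one scraping run.

    prev_records may be None on the first run for a site.
    """
    findings = []
    if records == 0:
        findings.append("Zero Listings")
    if error_rate > 0.30:
        findings.append("Field Extraction Error Surge")
    if prev_records and records < 0.30 * prev_records:  # >70% day-over-day drop
        findings.append("Listing Count Anomaly")
    if duration_s > max_duration_s:
        findings.append("Performance Warning")
    return findings
```

Keeping the rules in one side-effect-free function makes the thresholds easy to unit-test and to tune without touching the notification plumbing.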

Access & Availability

Admin-only access to dashboard or logs UI (if exposed via browser)
Role-based access for log exports
Logs retained for at least 30 days for debugging

Extensibility Strategy

Can integrate with Sentry / Datadog for exception monitoring
Easily connected to notification systems (Slack, PagerDuty)
Dashboard can support additional modules in future (e.g., compare site performance)

Example Alerts

🚨 JustLease.nl failed to return results - status code 500
⚠️ 123Lease.nl error rate exceeds 40% for three consecutive days
📉 DirectLease record count dropped from 852 → 27
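Messages like these could be produced by a small formatter before hand-off to Slack or email; the emoji prefixes and message shapes come from the samples above, while the template keys are assumptions:

```python
def format_alert(kind, site, **details):
    """Render a human-readable alert string; `kind` selects the template."""
    templates = {
        "http_failure": "🚨 {site} failed to return results - status code {status}",
        "error_surge": "⚠️ {site} error rate exceeds {rate:.0%} for {days} consecutive days",
        "count_drop": "📉 {site} record count dropped from {before} → {after}",
    }
    return templates[kind].format(site=site, **details)
```

For example, `format_alert("count_drop", "DirectLease", before=852, after=27)` reproduces the third sample alert.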