Open Questions
During the preparation of these requirements, several points arose that require clarification from the client or decisions by the development team. Resolving these open questions will help ensure the project meets expectations and runs smoothly:
- Update Frequency and Schedule: How often does AutoCompare expect the data to be refreshed? Daily (our assumption), or multiple times per day? This affects how we configure the scheduler and the load we place on the source websites.
- Scope of Websites: Are JustLease, DirectLease, and 123Lease the only sites to scrape in the initial phase? Does the client anticipate adding more Dutch lease websites (or perhaps dealerships) soon? Knowing this helps us make the system ready for additional sources.
- Data Change Notifications: Does AutoCompare need to highlight when a price changes or a car appears/disappears, or is it sufficient to always show the current data? In other words, do we need to track historical changes in the data, or only maintain the latest snapshot? (This could be a future feature, but it is not in the current scope unless needed.)
- Use of Existing APIs: Do any of the target websites offer official APIs or data feeds for their listings? If so, using those might be more reliable than scraping (subject to licensing). If not, or if they are not freely available, we proceed with scraping as planned.
- Matching Granularity: Should we match offers only by model, or also by lease terms? For example, if two sites offer an Audi A3 but one is a 36-month lease and the other a 48-month lease, do we consider them "the same car" for comparison? Likely yes (the user can then see the different terms), but we should confirm the comparison logic desired on the platform side.
- Image Handling: Should images be stored by AutoCompare or hot-linked from the source? We plan to use URLs to the source images and should confirm this is acceptable (most likely yes; the front-end can simply use the URL). If the client prefers cached images (to avoid broken links or for faster loading), we would need to download and host them, which is a different approach requiring more storage.
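The matching-granularity question above can be made concrete with a small sketch. This is illustrative only: the field names and the `match_key` grouping rule are assumptions, not a decided schema, and the default of ignoring lease terms is exactly the behavior we want the client to confirm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    """One scraped lease offer (field names are illustrative, not final)."""
    source: str            # e.g. "JustLease"
    make: str
    model: str
    term_months: int       # lease duration
    monthly_price_eur: float


def match_key(offer: Offer, include_terms: bool = False):
    """Grouping key for deciding whether two offers are 'the same car'.

    With include_terms=False (the assumed default), an Audi A3 at 36
    months and one at 48 months group together and the user sees both
    terms side by side; with include_terms=True they are distinct.
    """
    key = (offer.make.lower(), offer.model.lower())
    return key + (offer.term_months,) if include_terms else key
```

Whichever rule the client chooses would only change this one key function, so the decision can be deferred without reworking the scrapers.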
- Output Format Details: In exactly what format does the downstream system expect the data? We assume JSON with a particular structure. If the AutoCompare development team has a preferred schema, or wants the data as CSV or in a database, that is important to know. (We have provided a sample schema in the Appendix to facilitate this discussion.)
- Integration Method: How will the AutoCompare platform ingest the data? Will it pull from an API we provide, or should the scraping platform push data into their database? If an API is needed on our side, we may need to implement a simple service; if they prefer to handle the integration themselves, dropping a JSON file into cloud storage they can access may be sufficient.
- Technology Preferences: Does the client prefer AWS, Azure, or another cloud for deployment? We assume AWS and Azure are both options; this may depend on the client's existing infrastructure. If the client's team will maintain the system, their familiarity also matters (e.g., if they are more comfortable with Azure, we would lean that way).
- Licensing and Terms Compliance: Has the client secured any permissions, or do they have any concerns, regarding scraping these sites? Although the data is public, we want to ensure AutoCompare is comfortable with the legal standpoint. If needed, we can throttle requests to avoid being blocked by the sites. It is also an open question whether the client wants to approach the data sources for partnerships (not in our scope, but it affects how conservative our scraping needs to be).
- Post-launch Maintenance: Who will maintain the scrapers once live: the same development team, or the client's in-house team? This affects how much we focus on ease of use (for example, handing off to non-developers might call for a simple admin interface or a detailed guide). Knowing the plan lets us prepare training or documentation accordingly (in Phase 7).
- Error Escalation: If scraping fails repeatedly (say, a site redesign breaks our scrapers until they are fixed), is there an agreed SLA for data freshness? For instance, is it acceptable for a site's data to be missing for 1-2 days until fixed? Setting this expectation helps prioritize how quickly the team must react to alerts.
- Future Features: Are there anticipated features beyond what is described here? (For example, AutoCompare might later want analytics on the pricing data, or to integrate user reviews.) While not directly related to scraping, knowing the longer-term vision ensures our design is not short-sighted (e.g., if they want to track price trends, we might store historical data from the start).
- Testing Environment: Will the client provide a test or staging environment for integrating our data before going live? We assume yes; we would like to test the end-to-end flow (scrape -> feed to platform) safely before real users see it.
- Notification Preferences: If we implement alerting, how does the client want to receive alerts? Email to the DevOps team? Integration with a Slack channel or another tool? We can set up multiple channels if needed.
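To anchor the output-format discussion, the sketch below shows the kind of JSON feed we currently assume. Every field name, the placeholder URLs, and the wrapper structure are assumptions to be confirmed against the sample schema in the Appendix, not a final contract.

```python
import json

# Illustrative record shape only; field names and values are assumptions
# pending the schema discussion (see the sample schema in the Appendix).
sample_listing = {
    "source": "JustLease",
    "source_url": "https://example.com/listing/123",  # placeholder URL
    "make": "Audi",
    "model": "A3",
    "term_months": 36,
    "monthly_price_eur": 399.0,
    "image_url": "https://example.com/img/123.jpg",   # hot-linked, per current plan
    "scraped_at": "2024-01-01T06:00:00Z",             # ISO 8601, UTC
}


def to_feed(listings):
    """Serialize a batch of listings into the JSON feed the platform would ingest."""
    return json.dumps({"listings": listings}, indent=2)


print(to_feed([sample_listing]))
```

A flat file in this shape could be dropped into shared cloud storage or served from a simple API endpoint, so the same record structure supports either integration method discussed above.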