The final phase of the workflow focuses on refining your design and ML models to deliver long-term value. Iteration and monitoring help you:
Catch issues early before they impact users.
Adapt to changes in data, behavior, or market needs.
Build a product that balances technical performance with user satisfaction.

🎨 Design Iteration
Early GPS systems often misled drivers, and similarly, bad AI interfaces can confuse users. Iterative design helps fix these problems before they reach users in the wild. Here are two methods to refine interface design: Heuristic Evaluation and Usability Testing.
Heuristic Evaluation
Assesses the interface against usability guidelines.
Visibility of System Status (#1): Designs should provide timely, appropriate feedback to keep users informed.
Ex: "You Are Here" indicators on mall maps help people navigate their current location and next steps.

Usability Testing
While heuristic evaluation gives us great guidelines on what to look out for, nothing beats watching real users interact with your product. Usability testing helps us measure both usability and utility by having end-users complete specific tasks while we observe their behavior and gather feedback.
"Usefulness = Usability + Utility" - Jakob Nielsen

Key Dimensions of Usability Testing
🎯 Performance Metrics
Effectiveness: Can users accomplish their goals?
Efficiency: How quickly can users complete tasks?
Error tolerance: How does the system handle user mistakes?
Satisfaction: How do users feel about their interaction?

🧠 Learning Curve
Learnability: How easily can users learn the system initially?
Memorability: How easily can users return after time away?

The beauty of usability testing lies in its directness: we learn about users from users through structured observation of real interactions. The key is ensuring your test scenarios are relevant to your actual users and align with their real-world goals.
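As an illustrative sketch, the performance metrics above can be computed from simple task-session logs. The log format, field names, and numbers here are all hypothetical, not from any real study:

```python
# Hypothetical usability-test session logs: one record per task attempt.
sessions = [
    {"task": "find_defect_report", "completed": True,  "seconds": 42,  "errors": 0},
    {"task": "find_defect_report", "completed": True,  "seconds": 65,  "errors": 1},
    {"task": "find_defect_report", "completed": False, "seconds": 120, "errors": 3},
]

def usability_metrics(sessions):
    """Summarize effectiveness (completion rate), efficiency (time on task),
    and error tolerance (errors per attempt) from session logs."""
    n = len(sessions)
    completed = [s for s in sessions if s["completed"]]
    return {
        "effectiveness": len(completed) / n,  # task success rate
        "efficiency": sum(s["seconds"] for s in completed) / len(completed),
        "error_rate": sum(s["errors"] for s in sessions) / n,
    }

print(usability_metrics(sessions))
```

Satisfaction, learnability, and memorability need qualitative feedback and repeat sessions; they don't reduce to a one-pass calculation like this.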
Remember: It's far better to test, fail, learn, and test again than to face the cost of a bad design in production.
⚙️ ML Monitoring
Think of AI systems like athletes - they need constant health monitoring to stay in peak condition. Whether you're building a recommendation system or quality inspection AI, you need to watch three key areas:
Data Drift: Has input data changed from what you trained on?
Concept Drift: Are user needs and goals shifting?
System Health: Is your AI system performing efficiently?

Example: Phone Screen Quality
Let's explore each through a Phone Screen Quality Check system, where AI inspects screens for defects on a manufacturing line to help maintain product quality.
UX promises to Users:
Fast inspection times (<100ms)
Accurate defect detection
Clear feedback to operators
Easy override options when needed

1. Data Drift: Domain is Changing
When input data patterns differ from training data.
Example: New, brighter lighting is installed in the factory. The images taken are now different from the training data, so the model gets confused.
UX Impact: False alerts and missed defects reduce operator trust.
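A minimal sketch of how this kind of shift might be flagged, here by comparing recent mean image brightness against a training baseline. The baseline numbers, function name, and z-score threshold are illustrative assumptions, not part of the original system:

```python
import statistics

# Baseline statistics computed from the training set (illustrative numbers).
TRAIN_MEAN_BRIGHTNESS = 128.0
TRAIN_STD_BRIGHTNESS = 12.0

def brightness_drift(recent_brightness, z_threshold=3.0):
    """Flag drift when recent mean brightness is far from the training mean."""
    recent_mean = statistics.fmean(recent_brightness)
    z = abs(recent_mean - TRAIN_MEAN_BRIGHTNESS) / TRAIN_STD_BRIGHTNESS
    return z > z_threshold, z

# New, stronger factory lighting shifts images brighter than anything in training.
drifted, z = brightness_drift([170.0, 168.5, 172.3, 169.9])
print(drifted, round(z, 2))  # True 3.51 -> drift flagged
```

Real pipelines would track full input distributions (and often use statistical tests rather than a single z-score), but the principle is the same: compare live inputs against what the model was trained on.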
Key Metrics to Watch:
Input distributions: Changes in image properties
Error rates: Sudden changes in detection accuracy
Data quality: Image clarity and consistency

2. Concept Drift: Changes in "Correct"
This occurs when the link between the model's input and output changes, often along with shifts in what is seen as the correct output.
Example: New quality standards change what a "high-quality phone" looks like.

UX Impact: Outdated standards lead to incorrect classifications and cause production delays.
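One hedged way to surface this drift is to watch the operator override rate over a rolling window. The window size, alert threshold, and class name below are assumptions for illustration only:

```python
from collections import deque

class OverrideMonitor:
    """Track how often operators overrule the AI's pass/fail decision."""

    def __init__(self, window=200, alert_rate=0.10):
        self.decisions = deque(maxlen=window)  # True = operator overrode the AI
        self.alert_rate = alert_rate

    def record(self, overridden):
        """Record one decision; return True if the rolling override rate
        exceeds the alert threshold."""
        self.decisions.append(overridden)
        rate = sum(self.decisions) / len(self.decisions)
        return rate > self.alert_rate

monitor = OverrideMonitor(window=50, alert_rate=0.10)
alerts = [monitor.record(i % 5 == 0) for i in range(50)]  # 20% overrides
print(alerts[-1])  # True: rate is above 10%, worth checking the quality standards
```

A sustained rise in overrides is a strong hint that the definition of "correct" has moved, even before accuracy metrics catch up.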
Key Metrics to Watch:
Override rates: Operators correcting AI decisions
Customer returns: New types of quality issues
Standard updates: Changes in quality requirements

3. System Health: Your AI's Vital Signs
The basic vital signs of your AI system's performance.
Example: Must process hundreds to thousands of screens per hour
Key Metrics to Watch:
Response time: ≤100ms per screen inspection
Resource usage: CPU/GPU utilization during peak production

UX Impact: Operators need real-time feedback to maintain production flow.
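These vital signs can be checked with a simple latency probe around each inspection call. The 100 ms budget comes from the UX promise above; the function names and the stand-in model are hypothetical:

```python
import time

LATENCY_BUDGET_S = 0.100  # the <100 ms UX promise

def inspect_screen(image):
    """Stand-in for the real defect-detection model."""
    time.sleep(0.005)  # simulate a fast inference call
    return "pass"

def timed_inspection(image):
    """Run one inspection and flag it if it blows the latency budget."""
    start = time.perf_counter()
    result = inspect_screen(image)
    elapsed = time.perf_counter() - start
    within_budget = elapsed <= LATENCY_BUDGET_S
    return result, elapsed, within_budget

result, elapsed, ok = timed_inspection(image=None)
print(result, f"{elapsed * 1000:.1f} ms", "OK" if ok else "SLOW")
```

In production you would aggregate these timings (p95/p99 latency, not just averages) and pair them with CPU/GPU utilization dashboards.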
Action Triggers: When to Take Action
In our phone screen quality inspection system, we set clear thresholds to maintain the promised user experience: fast, accurate quality checks that operators can trust.
UX Promises & Monitoring Thresholds
Pro Tip: These thresholds help maintain both technical performance and user trust. When accuracy drops below threshold, it's a signal to investigate and potentially retrain - either manually or through automation.
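A sketch of such an action trigger, wiring an accuracy threshold to a retraining signal. The 95% threshold, function name, and label format are illustrative assumptions, not prescribed values:

```python
ACCURACY_THRESHOLD = 0.95  # illustrative; derive yours from the UX promise

def check_action_trigger(recent_labels, recent_predictions):
    """Compare recent AI decisions against operator-confirmed labels and
    signal an investigation/retraining when accuracy falls below threshold."""
    correct = sum(l == p for l, p in zip(recent_labels, recent_predictions))
    accuracy = correct / len(recent_labels)
    if accuracy < ACCURACY_THRESHOLD:
        return {"accuracy": accuracy, "action": "investigate_and_retrain"}
    return {"accuracy": accuracy, "action": "none"}

# 90 of the last 100 inspections matched the operator-confirmed label.
labels = ["pass"] * 100
preds = ["pass"] * 90 + ["fail"] * 10
print(check_action_trigger(labels, preds))  # accuracy 0.90 -> retrain signal
```

Whether the trigger kicks off an automated retraining job or just pages a human, the point is the same: the threshold turns a monitoring metric into a concrete action.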
Model Accuracy and Retraining after Data Change
Pro Tip: Whether manual or automated, always validate changes with operators. The goal is maintaining their trust and workflow efficiency, not just model metrics!
Looking Forward 🚀
The best AI features feel invisible. Users shouldn’t think about the complexity behind the scenes; they should just think, “Wow, that was easy!”
Coming soon:
Advanced Monitoring Techniques

Stay tuned for more!
🚀 Let's Connect
I'm always excited to discuss the intersection of AI and user experience, or explore potential collaborations.
🌎 Citizenship: US, Canada
📍 Location: Toronto, ON, Canada