The final phase of the workflow focuses on refining your design and ML models to deliver long-term value. Iteration and monitoring help you:
Catch issues early before they impact users.
Adapt to changes in data, behavior, or market needs.
Build a product that balances technical performance with user satisfaction.
[Image: MLOps vs. design cycle — iteration and monitoring]

🎨 Design Iteration

Early GPS systems often misled drivers, and bad AI interfaces can confuse users in much the same way. Iterative design helps fix these problems before they reach users in the wild. Here are two methods to refine interface design: Heuristic Evaluation and Usability Testing.

Heuristic Evaluation

Heuristic evaluation assesses the interface against established usability guidelines, such as Nielsen's ten usability heuristics.
Visibility of System Status (#1): Designs should provide timely, appropriate feedback to keep users informed.
Ex: "You Are Here" indicators on mall maps help people navigate their current location and next steps.
[Image: "You Are Here" mall map illustrating visibility of system status]

Usability Testing

Heuristic evaluation gives us great guidelines on what to look out for, but nothing beats watching real users interact with your product. Usability testing measures both usability and utility by having end users complete specific tasks while we observe their behavior and gather feedback.
Usefulness = Usability + Utility - Jakob Nielsen

Key Dimensions of Usability Testing

🎯 Performance Metrics

Effectiveness: Can users accomplish their goals?
Efficiency: How quickly can users complete tasks?
Error tolerance: How does the system handle user mistakes?
Satisfaction: How do users feel about their interaction?

🧠 Learning Curve

Learnability: How easily can users learn the system initially?
Memorability: How easily can users return after time away?
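To make these dimensions concrete, here is a minimal sketch of how usability-test results might be summarized. The session fields (completed, seconds, errors, sus_score) and the sample data are purely illustrative, not from the article:

```python
# Hypothetical usability-test results: one dict per participant task attempt.
sessions = [
    {"completed": True,  "seconds": 42, "errors": 0, "sus_score": 85},
    {"completed": True,  "seconds": 65, "errors": 2, "sus_score": 70},
    {"completed": False, "seconds": 90, "errors": 4, "sus_score": 40},
]

def summarize(sessions):
    n = len(sessions)
    return {
        # Effectiveness: share of users who accomplished the task
        "completion_rate": sum(s["completed"] for s in sessions) / n,
        # Efficiency: average time spent on the task
        "avg_seconds": sum(s["seconds"] for s in sessions) / n,
        # Error tolerance proxy: mean errors per attempt
        "avg_errors": sum(s["errors"] for s in sessions) / n,
        # Satisfaction: mean SUS-style questionnaire score
        "avg_satisfaction": sum(s["sus_score"] for s in sessions) / n,
    }

print(summarize(sessions))
```

Tracking the same summary across test rounds also exposes the learning-curve dimensions: learnability shows up as first-round scores, memorability as scores after a break.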


The beauty of usability testing lies in its directness – we learn about users from users through structured observation of real interactions. The key is ensuring your test scenarios are relevant to your actual users and align with their real-world goals.
Remember: It's far better to test, fail, learn, and test again than to face the cost of a bad design in production.

⚙️ ML Monitoring

Think of AI systems like athletes - they need constant health monitoring to stay in peak condition. Whether you're building a recommendation system or quality inspection AI, you need to watch three key areas:
Data Drift: Has input data changed from what you trained on?
Concept Drift: Are user needs and goals shifting?
System Health: Is your AI system performing efficiently?

Example: Phone Screen Quality

Let's explore each through a phone screen quality check system, where AI inspects screens for defects on a manufacturing line to help maintain product quality.
UX promises to Users:
Fast inspection times (<100ms)
Accurate defect detection
Clear feedback to operators
Easy override options when needed

1. Data Drift: Domain is Changing

When input data patterns differ from training data.
Example: The factory installs brighter lighting; the captured images now differ from the training data, so the model gets confused.
UX Impact: False alerts and missed defects reduce operator trust.
Key Metrics to Watch:
Input distributions: Changes in image properties
Error rates: Sudden changes in detection accuracy
Data quality: Image clarity and consistency
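A minimal sketch of how an input-distribution check could work for the lighting example: compare the mean brightness of live images against the training distribution and flag a drift when it deviates too far. The function name, sample values, and z-score threshold are illustrative assumptions, not part of any particular monitoring library:

```python
import statistics

def brightness_drift(train_brightness, live_brightness, z_threshold=3.0):
    """Flag drift when the live mean brightness deviates from the training
    distribution by more than z_threshold standard deviations."""
    mu = statistics.mean(train_brightness)
    sigma = statistics.stdev(train_brightness)
    live_mu = statistics.mean(live_brightness)
    z = abs(live_mu - mu) / sigma
    return z > z_threshold, z

# Normalized per-image brightness values (hypothetical).
train = [0.42, 0.45, 0.44, 0.43, 0.46, 0.44]
after_new_lighting = [0.61, 0.63, 0.60, 0.62]  # brighter factory lighting

drifted, z = brightness_drift(train, after_new_lighting)
print(drifted, round(z, 1))
```

In practice you would run this kind of check per feature (brightness, contrast, sharpness) on a rolling window of production images, and alert before accuracy visibly degrades.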


2. Concept Drift: Changes in "Correct"

This occurs when the link between the model's input and output changes, often along with shifts in what is seen as the correct output.
Example: New quality standards change what a "high-quality phone" looks like.
UX Impact: Outdated standards lead to incorrect classifications and cause production delays
Key Metrics to Watch:
Override rates: Operators correcting AI decisions
Customer returns: New types of quality issues
Standard updates: Changes in quality requirements
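The override-rate metric can be watched with a simple sliding window. This is a sketch under illustrative assumptions (window size, 30% alert threshold, class and method names are all hypothetical):

```python
from collections import deque

class OverrideMonitor:
    """Track the share of AI decisions that operators override, over a
    sliding window. A rising rate can signal concept drift: the definition
    of a 'good' screen has shifted out from under the model."""

    def __init__(self, window=200, alert_rate=0.30):
        self.decisions = deque(maxlen=window)  # True = operator overrode the AI
        self.alert_rate = alert_rate

    def record(self, overridden):
        self.decisions.append(bool(overridden))

    @property
    def override_rate(self):
        if not self.decisions:
            return 0.0
        return sum(self.decisions) / len(self.decisions)

    def needs_review(self):
        return self.override_rate > self.alert_rate

monitor = OverrideMonitor(window=100, alert_rate=0.30)
for i in range(100):
    monitor.record(i % 2 == 0)  # simulate 50% overrides after a standards change
print(monitor.override_rate, monitor.needs_review())
```

The sliding window matters: concept drift is about the recent relationship between inputs and correct outputs, so an all-time average would hide a sudden shift.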

3. System Health: Your AI's Vital Signs

The basic vital signs of your AI system's performance.
Example: Must process hundreds to thousands of screens per hour
Key Metrics to Watch:
Response time: ≤100ms per screen inspection
Resource usage: CPU/GPU utilization during peak production
UX Impact: Operators need real-time feedback to maintain production flow
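A sketch of how the latency promise could be checked in code. The inspection function here is a stand-in that just sleeps, and the 100ms budget mirrors the UX promise above; everything else (names, sample sizes) is illustrative:

```python
import random
import statistics
import time

def inspect_screen(image):
    # Stand-in for the real defect-detection model.
    time.sleep(random.uniform(0.01, 0.03))
    return "ok"

def timed_inspection(image, budget_ms=100):
    """Run one inspection and report whether it met the latency promise."""
    start = time.perf_counter()
    result = inspect_screen(image)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return result, elapsed_ms, elapsed_ms <= budget_ms

latencies = []
for _ in range(20):
    _, ms, within_budget = timed_inspection(image=None)
    latencies.append(ms)

# Tail latency (p95) is a more honest health signal than the average:
# operators feel the slowest inspections, not the typical ones.
p95 = statistics.quantiles(latencies, n=20)[18]
print(f"p95 latency: {p95:.1f} ms")
```

In production you would export these measurements to a metrics system rather than compute them inline, but the principle is the same: measure against the promise, not against an abstract benchmark.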


Action Triggers: When to Take Action

In our phone screen quality inspection system, we set clear thresholds to maintain the promised user experience: fast, accurate quality checks that operators can trust.
UX Promises & Monitoring Thresholds

| Metric | Promise | Warning Level | Required Action |
| --- | --- | --- | --- |
| Response Time | Fast inspections | Above 100ms | Scale infrastructure |
| Model Accuracy | Reliable detection | Drops by 5% | Investigate and retrain |
| Override Rate | Easy control | Exceeds 30% | Review recommendation logic |
| Feature Adoption | Clear feedback | Below 20% | Conduct user research |
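The threshold table above translates naturally into a small rules check. The dictionary keys, metric names, and sample values below are illustrative; the limits and actions come from the table:

```python
# Warning thresholds from the UX promises table, encoded as simple rules.
THRESHOLDS = {
    "response_time_ms":  {"limit": 100,  "worse_if": "above", "action": "Scale infrastructure"},
    "accuracy_drop_pct": {"limit": 5,    "worse_if": "above", "action": "Investigate and retrain"},
    "override_rate":     {"limit": 0.30, "worse_if": "above", "action": "Review recommendation logic"},
    "feature_adoption":  {"limit": 0.20, "worse_if": "below", "action": "Conduct user research"},
}

def triggered_actions(metrics):
    """Return the required actions for every threshold the metrics cross."""
    actions = []
    for name, value in metrics.items():
        rule = THRESHOLDS[name]
        if rule["worse_if"] == "above":
            breached = value > rule["limit"]
        else:
            breached = value < rule["limit"]
        if breached:
            actions.append(rule["action"])
    return actions

# Hypothetical snapshot: inspections are slow and operators override often.
current = {"response_time_ms": 130, "accuracy_drop_pct": 2,
           "override_rate": 0.35, "feature_adoption": 0.45}
print(triggered_actions(current))
```

Encoding thresholds as data rather than scattered `if` statements keeps the UX promises reviewable in one place, so product and engineering can update them together.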
Pro Tip: These thresholds help maintain both technical performance and user trust. When accuracy drops below threshold, it's a signal to investigate and potentially retrain, either manually or through automation.
[Image: Model accuracy and retraining after a data change]
Pro Tip: Whether manual or automated, always validate changes with operators. The goal is maintaining their trust and workflow efficiency, not just model metrics!

Looking Forward 🚀

The best AI features feel invisible. Users shouldn’t think about the complexity behind the scenes; they should just think, “Wow, that was easy!”
Coming soon:
Scaling AI Systems
Advanced Monitoring Techniques
Stay tuned for more!

Thanks for reading!

🚀 Let's Connect

I'm always excited to discuss the intersection of AI and user experience, or explore potential collaborations.