Configuring experiments:
Ability to support any number (n) of buckets.
For example, if we choose 2 buckets, A and B:
A will be the control. The control is generally the current version of your user experience.
B will be the variant. The bucket percentages always sum to 100%, and the default distribution is even, i.e. a 50/50 split for two buckets.
However, it should be possible to change the distribution percentages. Traffic is then distributed according to the percentage assigned to each bucket.
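The distribution rules above can be sketched in a few lines. This is a minimal illustration only: the function names (even_split, validate_distribution, bucket_for_point) are hypothetical, and it assumes a caller supplies a number in [0, 100) for each incoming request.

```python
from typing import Dict, List


def even_split(bucket_names: List[str]) -> Dict[str, float]:
    """Default distribution: every bucket gets an equal share of 100%."""
    share = 100.0 / len(bucket_names)
    return {name: share for name in bucket_names}


def validate_distribution(weights: Dict[str, float]) -> None:
    """Raise ValueError unless percentages are non-negative and sum to 100."""
    if any(pct < 0 for pct in weights.values()):
        raise ValueError("bucket percentages must be non-negative")
    if abs(sum(weights.values()) - 100.0) > 1e-9:
        raise ValueError("bucket percentages must sum to 100")


def bucket_for_point(weights: Dict[str, float], point: float) -> str:
    """Map a point in [0, 100) to a bucket using cumulative percentages."""
    cumulative = 0.0
    for name, pct in weights.items():
        cumulative += pct
        if point < cumulative:
            return name
    # Assumption: fall back to the last bucket if rounding leaves a gap.
    return list(weights)[-1]
```

With an 80/20 configuration, points below 80 land in A and the rest land in B, so the share of traffic follows the configured percentages.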
Creating and managing variants:
Ability to manage variants, each with a variant name and an optional payload. The payload could be used to change the form or path in the experiment's implementation. For example, by changing the payload I might change the colour of a button from red to green.
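One possible shape for a variant with an optional payload is sketched below. The Variant class, the button_colour helper, and the "button_colour" payload key are assumptions made for illustration; the payload schema is not defined in this document.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class Variant:
    name: str
    # The payload is optional; a variant may carry none at all.
    payload: Optional[Dict[str, Any]] = None


def button_colour(variant: Variant, default: str = "red") -> str:
    """Read the colour from the payload, falling back to the default."""
    if variant.payload is None:
        return default
    return variant.payload.get("button_colour", default)
```

Here the control could be Variant("A") with no payload, while Variant("B", {"button_colour": "green"}) changes the button from red to green without a code change.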
Dialling traffic distribution:
Ability to dial traffic up. For example, one can start with an 80/20 split, with 80% on control, and over time dial traffic up to the variant by reducing the control's share. Dialling down is not in scope for v1.
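A dial-up operation could look like the sketch below. The dial_up function and its parameters are hypothetical; the only rules taken from the requirement are that traffic moves from control to the variant and that dialling down is rejected in v1.

```python
from typing import Dict


def dial_up(weights: Dict[str, float], control: str, variant: str,
            points: float) -> Dict[str, float]:
    """Move `points` percentage points of traffic from control to variant."""
    if points <= 0:
        # Dialling down (or a no-op) is out of scope for v1.
        raise ValueError("points must be positive")
    if points > weights[control]:
        raise ValueError("cannot move more traffic than control holds")
    updated = dict(weights)  # leave the caller's configuration untouched
    updated[control] -= points
    updated[variant] += points
    return updated
```

Starting from {"A": 80, "B": 20}, a call with points=10 yields a 70/30 split, and the total stays at 100%.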
Segment selection:
Ability to choose the segment on which the experiment will be executed. A segment is essentially a cohort of users, for example: all users, new users, repeat users, or custom-defined users.
Segment size exposure:
Ability to select the percentage of users in a segment to which the experiment will be exposed. For example, open up the experiment to 30% of new users or 100% of repeat users, where new users and repeat users are segments.
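Segment selection and exposure can be combined into a single eligibility check, sketched below. Hashing the user id is an assumption chosen so that the same user always gets the same answer; the function names and the "all_users" segment label are hypothetical.

```python
import hashlib


def exposure_point(experiment_id: str, user_id: str) -> float:
    """Deterministically map a user to a number in [0, 100)."""
    key = f"{experiment_id}:exposure:{user_id}".encode()
    digest = hashlib.sha256(key).hexdigest()
    # Use the first 32 bits of the hash as a uniform fraction.
    return int(digest[:8], 16) / 0x100000000 * 100.0


def is_exposed(experiment_id: str, user_id: str, user_segment: str,
               target_segment: str, exposure_pct: float) -> bool:
    """True if the user is in the target segment and inside the exposed share."""
    if target_segment != "all_users" and user_segment != target_segment:
        return False
    return exposure_point(experiment_id, user_id) < exposure_pct
```

With target_segment="new_users" and exposure_pct=30.0, roughly 30% of new users would be admitted and everyone else would see the default experience.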
Sticky buckets:
Users, once allocated to a particular bucket, continue to stay in that bucket until the experiment has been concluded.
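One way to keep buckets sticky is to record the first allocation and reuse it, so that a later change in distribution does not move existing users. The sketch below uses an in-memory dictionary as a stand-in for persistent storage; the StickyAllocator class and its methods are hypothetical.

```python
import hashlib
from typing import Dict, Tuple


class StickyAllocator:
    def __init__(self) -> None:
        # (experiment_id, user_id) -> bucket name; a real system would persist this.
        self._assignments: Dict[Tuple[str, str], str] = {}

    def bucket_for(self, experiment_id: str, user_id: str,
                   weights: Dict[str, float]) -> str:
        """Return the stored bucket, allocating one only on first sight."""
        key = (experiment_id, user_id)
        if key not in self._assignments:
            self._assignments[key] = self._allocate(experiment_id, user_id, weights)
        return self._assignments[key]

    def conclude(self, experiment_id: str) -> None:
        """Release all assignments once the experiment has been concluded."""
        for key in [k for k in self._assignments if k[0] == experiment_id]:
            del self._assignments[key]

    @staticmethod
    def _allocate(experiment_id: str, user_id: str,
                  weights: Dict[str, float]) -> str:
        digest = hashlib.sha256(f"{experiment_id}:{user_id}".encode()).hexdigest()
        point = int(digest[:8], 16) / 0x100000000 * 100.0  # in [0, 100)
        cumulative = 0.0
        for name, pct in weights.items():
            cumulative += pct
            if point < cumulative:
                return name
        return list(weights)[-1]  # rounding guard at the upper edge
```

Storing the assignment, rather than recomputing it from the hash each time, is what keeps a user in place when the split is later dialled from 80/20 to 50/50.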
Configuring and measuring success metrics:
Ability to configure success metrics for a given experiment. Ability to configure watch metrics (l1).
Ability to measure the movement of those metrics through simple user interfaces.
Exclusivity:
Ability to reserve users for one experiment only, so that they cannot be part of any other ongoing experiment.
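A reservation check of this kind might be sketched as follows. The ExclusivityRegistry class is hypothetical, it keeps reservations in memory only, and it does not cover users who were already enrolled elsewhere before being reserved.

```python
from typing import Dict


class ExclusivityRegistry:
    def __init__(self) -> None:
        # user_id -> id of the experiment that has reserved the user
        self._reserved: Dict[str, str] = {}

    def reserve(self, user_id: str, experiment_id: str) -> bool:
        """Reserve the user; fails if another experiment already holds them."""
        holder = self._reserved.get(user_id)
        if holder is not None and holder != experiment_id:
            return False
        self._reserved[user_id] = experiment_id
        return True

    def can_enter(self, user_id: str, experiment_id: str) -> bool:
        """A reserved user may only enter the experiment that reserved them."""
        holder = self._reserved.get(user_id)
        return holder is None or holder == experiment_id

    def release(self, experiment_id: str) -> None:
        """Free all users held by an experiment, e.g. when it concludes."""
        for user_id in [u for u, e in self._reserved.items() if e == experiment_id]:
            del self._reserved[user_id]
```

Every other experiment would call can_enter before allocating a bucket, and skip users for whom it returns False.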
Statistical significance:
An experiment remains inconclusive until enough evidence has been collected. Ability to determine whether enough evidence has been collected for a conclusion to be drawn from the experiment.
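The requirement does not name a statistical method. As one illustration, a two-sided two-proportion z-test on conversion counts is a common way to decide whether the evidence is sufficient. The function names and the 0.05 threshold below are assumptions, not part of the requirement.

```python
import math


def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for the difference in conversion rate between A and B."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return 1.0  # no variation observed, so no evidence of a difference
    z = (rate_b - rate_a) / std_err
    # 2 * (1 - Phi(|z|)) expressed through the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2))


def is_conclusive(p_value: float, alpha: float = 0.05) -> bool:
    """Assumed decision rule: conclusive when the p-value falls below alpha."""
    return p_value < alpha
```

With 100 conversions out of 1000 users on control and 150 out of 1000 on the variant, the p-value falls well below 0.05, so the experiment would be flagged as conclusive under this rule; with 10 versus 12 conversions out of 100 each, it would stay inconclusive.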