Aibo, in its current iteration, is effectively code-less: it stitches together various SaaS technologies and platforms to deliver the final product. Although this architecture has the benefits of simplicity and ease, scalability is its main challenge, and one that Aibo will have to evolve to overcome in time. This section provides an overview of the tools used in Aibo's process flow:
Coda is used by clients as an order sheet service.
Landbot is where the WhatsApp bots are created; they deliver these surveys to participants and collect their answers.
The demographic data of participants is stored in a Google Sheet, as well as the answers to each survey (in separate sheets).
This data is then cleaned in another sheet and visualised in Data Studio; the visualisation is delivered to the client as a report alongside an exported CSV of the survey results.
Finally, Simcloud is used to pay participants in airtime.
Coda documents effectively serve as the front-end webpage, communicating the details of the service between the client and Aibo. A client is directed to a Coda document where they can fill out their details, agree on a pricing model, enter the questions they want included in a survey along with the answer types that should be presented, and indicate their target audience demographic.
Any data entered in the Coda document is manually re-entered elsewhere in the process, and these documents (as well as the relevant client data) persist until manually deleted [Double check].
Landbot manages the interface with participants through WhatsApp chatbots; the Landbot account is where the various bots are created and managed. The account is linked to a channel: when a participant sends a message to the Aibo channel on WhatsApp, a signup flow is initiated which walks them through registration and stores their demographic details (with their permission) both in the Landbot account's internal database and, later, in a Google spreadsheet. When a client requests a new survey, after the Coda ordering process the demographic details of participants are queried in Landbot to find those matching the client's specified audience. These participants are then sent a message template inviting them to take part in the new survey, which is facilitated by the creation of a new bot flow. The bot flow prompts participants who accept the invitation with questions hard-coded into the flow by inspection of the Coda document, and stores their answers in variables in Landbot's internal database. These variables are then automatically written to the Google spreadsheet for that survey, matched to the columns sharing the variable's name.
Acting as the database for the retrieval and manipulation of data in Aibo's process flow, Google Sheets contains two main tables used in each survey:
The Users table, which contains the demographic information of participants as they sign up to take part in surveys. This table persists throughout operations and builds up information with each new participant and survey. More verbose than the user information held in Landbot's internal database, this table also includes columns for each user's survey count and other meta-information. Note that this naturally opens the possibility of inconsistencies between the two copies of user data; ideally a single source of truth should exist and be used, with the other table assisting the operation of the main one.
The Survey sheet, one per survey. This contains the information of respondents for a particular survey and their answers to the questions. Each question is its own field, with respondents' answers inserted cell by cell as they work through the survey. Once the desired number of responses has been reached, the table is used to clean the data and identify fraud or improper answers. Automatic validation checks exist within table formulas to determine whether a participant passes or fails, based on a bad-actor score assigned to each participant.
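The pass/fail check described above could be expressed in code rather than sheet formulas. A minimal sketch, in which the field names, penalty weights, and threshold are all illustrative assumptions rather than Aibo's actual scoring rules:

```python
# Hypothetical bad-actor scoring; the fields and weights are assumptions.

def bad_actor_score(row):
    """Accumulate penalty points for suspicious response patterns."""
    score = 0
    answers = row["answers"]
    # Penalise straight-lining: every answer identical.
    if len(set(answers)) == 1:
        score += 2
    # Penalise suspiciously fast completion.
    if row["seconds_per_question"] < 3:
        score += 2
    # Penalise empty open-text answers.
    if any(a.strip() == "" for a in answers):
        score += 1
    return score

def passes_validation(row, threshold=3):
    """A participant fails once their score reaches the threshold."""
    return bad_actor_score(row) < threshold
```

Moving this out of sheet formulas would make the rules testable and versionable, and keep validation out of the table that stores the data.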
Once the data from the survey sheet has been cleaned, it is copied to another table for visualisation, with all personally identifiable information removed; this sheet is then exported to a CSV file for the client.
Visualisation is done in Data Studio, which has built-in integration with Google Sheets. A generic template is filled in with relevant demographic information from the cleaned survey sheet. Apart from a few generic questions that can be visualised in a context-agnostic way, most of the visualisation must be created manually for each survey. To help draw insight from open-text questions, a formula extracts the most-used word(s) across all respondents' answers. The report is then exported and sent to the client alongside the CSV.
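The most-used-words extraction above is a natural candidate for plain code instead of a sheet formula. A minimal sketch; the stop-word list is a small illustrative assumption:

```python
from collections import Counter

# Assumed small stop-word list; a real one would be more complete.
STOP_WORDS = {"the", "a", "an", "and", "or", "is", "it", "to", "of", "i"}

def top_words(answers, n=3):
    """Return the n most frequent words across all open-text answers."""
    words = []
    for answer in answers:
        for word in answer.lower().split():
            word = word.strip(".,!?\"'")
            if word and word not in STOP_WORDS:
                words.append(word)
    return [word for word, _ in Counter(words).most_common(n)]
```

For example, `top_words(["The service is great", "Great coverage", "great price, poor service"], 2)` returns `["great", "service"]`.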
Simcloud is used to pay respondents in airtime for their participation, in bulk. All users who passed the validation checks in the survey sheet have their number, cell provider, and recharge amount exported into a CSV. This CSV is uploaded to the Simcloud website along with payment details to complete the bulk transaction. An API might exist that would make this process far easier.
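Building that payment CSV is mechanical and could be automated today. A minimal sketch; the column names and the format Simcloud expects on upload are assumptions:

```python
import csv
import io

def payment_rows(participants):
    """Keep only validated participants, with the fields assumed for Simcloud."""
    return [
        {"number": p["number"], "provider": p["provider"], "amount": p["amount"]}
        for p in participants
        if p["passed_validation"]
    ]

def to_csv(rows):
    """Serialise the payment rows to CSV text ready for upload."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["number", "provider", "amount"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

The same row-building step could later feed an API call instead of a manual upload, without changing the filtering logic.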
Database inconsistencies: Manual updating and copying of multiple databases could (and eventually will) lead to inconsistencies between databases that serve similar purposes. Landbot's internal database contains a table of users and their details; this is copied to a Google Sheet, which is also used for other information (bad-actor scores). In a new survey, this information is re-entered by participants, creating data redundancy and inconsistency that can have erroneous downstream effects.
Database structure: A flat database is used for operations, with no normalisation applied. Insertion, deletion, and update anomalies limit the flexibility of the database and will lead to errors at larger scale.
Participant compensation: Some automation is used to pass/fail participants for a survey, but ultimately discretion is used to validate the decision. Fraudulent participants can still game this system, while honest participants might accidentally be failed. Withholding compensation in the latter case could have adverse implications, especially at scale.
Referrer compensation: Because of the database structure and consistency issues above, paying a referrer is fraught with problems; but if referral payment is promised, it is important that it is honoured.
Lack of an answer parser: Without a program and API managed by the team, participant survey answers are parsed in Landbot with minimal functionality to ensure correctness or edit the data. This is instead done in the Google Sheet, which should not be responsible for data correctness. For example, when a participant provides a telephone number, the country code (+27 for SA) is often left out, leading to errors that are patched within the Google Sheet. Furthermore, validating and editing data for correctness within a spreadsheet carries a high risk of inconsistencies spreading through the database.
Missing systems/protocols for when the number of responses requested by a client is not met: While currently more a business problem than an engineering hazard, these systems should eventually be translated into code, and this gap will become a problem when that happens. In general, more systems need a defined structure that can be translated into code for an automated protocol.
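As one concrete example of the parsing gap described above, the country-code fix currently patched in the sheet could move into code before a number ever reaches the database. A minimal sketch, assuming South African numbers only:

```python
def normalise_za_number(raw):
    """Normalise a South African phone number to E.164 (+27...) form.

    Handles the common cases seen from participants: a local 0-prefixed
    number, a bare 27-prefixed number, or an already-correct +27 number.
    """
    digits = "".join(ch for ch in raw if ch.isdigit() or ch == "+")
    if digits.startswith("+27"):
        return digits
    if digits.startswith("27") and len(digits) == 11:
        return "+" + digits
    if digits.startswith("0") and len(digits) == 10:
        # Local format: drop the leading 0, prepend the country code.
        return "+27" + digits[1:]
    raise ValueError(f"Unrecognised number format: {raw!r}")
```

Rejecting unrecognised formats at the parser, rather than patching them later, keeps the bad value from ever entering the sheet.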
Coda has an API that is suitable for all purposes needed to scale and automate systems within Aibo. Most important is the ability to make an API call to a table containing the list of survey questions or other details entered by the client, which the API supports. All client details and survey questions can be inserted into a database for further processing via the API.
This API also supports extending functionality to send information back to the client via the Coda document, if the client wants to see partial survey information before it is delivered to them by the Aibo team.
For authentication, a user token is required to make API calls (this means a service requires a Coda account with relevant permissions).
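A minimal sketch of the question-table call described above, using Coda's REST API (rows are listed per doc and table, authenticated with the bearer token noted above). The doc ID, table ID, and token here are placeholders:

```python
# Coda's v1 API base URL; the doc/table IDs below are placeholders.
CODA_BASE = "https://coda.io/apis/v1"

def build_rows_request(doc_id, table_id, token):
    """Return the URL and headers for listing a Coda table's rows."""
    url = f"{CODA_BASE}/docs/{doc_id}/tables/{table_id}/rows"
    headers = {"Authorization": f"Bearer {token}"}
    return url, headers

# The actual call could then be made with, e.g., the requests library:
#   import requests
#   url, headers = build_rows_request("AbCDeFGH", "grid-Questions", token)
#   questions = requests.get(url, headers=headers).json()["items"]
```

Keeping the request-building separate from the network call makes the integration easy to test without hitting the API.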
The Landbot API can retrieve data from a Landbot account as well as manage channels and agents on that account. However, it is limited in its ability to integrate with and configure the bots themselves, which still need to be created within Landbot.
The API can create and send message templates, as well as react to incoming messages through MessageHooks (similar to Webhooks).
Although there are no direct database integration capabilities other than Google Sheets, stored customer data for a bot is retrievable via the API.
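Reacting to an incoming message via a MessageHook could look something like the sketch below. The payload shape and reply format here are assumptions for illustration; the real field names are defined by Landbot's webhook documentation:

```python
def handle_message_hook(payload, users):
    """Route an incoming WhatsApp message: greet new numbers, store answers.

    `payload` mimics an assumed webhook body; `users` stands in for the
    participant store (today, Landbot's internal database / Google Sheet).
    """
    number = payload["customer"]["phone"]
    text = payload["message"]["text"]
    if number not in users:
        # Unknown number: start the signup flow.
        users[number] = {"answers": []}
        return {"reply": "Welcome to Aibo! Reply JOIN to sign up."}
    # Known number: record the message as a survey answer.
    users[number]["answers"].append(text)
    return {"reply": "Thanks, your answer has been recorded."}
```

A handler like this, sitting behind the MessageHook, is where answer parsing and validation could live instead of in the sheet.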
Simcloud currently does not have an API for automated payments.
UPDATE: Although Simcloud itself does not have an API, its parent company does, and that is what will be used in the future.
Pain points and short term wins
What the Aibo team finds super ouch at the moment:
Manually creating landbot flows for each survey.
No generic flow blueprint for handling different types of questions/responses makes this take a significant amount of time.
This is also limited in its testing scope. Errors or unintended behavior are not automatically caught, and mistakes in linking questions to the wrong tables or other subtle database interfacing errors are possible and likely to occur. Because the manual database interaction is abstracted away, these errors could go undetected for quite some time and result in undesirable outcomes.
Data cleaning in Google Sheets.
Although much of this is automated and the team learns from past surveys, this method does not scale as the team and the customer base grow. This task should not take long at all and can be entirely automated by an intermediate service between Landbot and the database. From an architecture point of view, cleaning the data should not be the database's role, nor should it be conducted within the same table (risking inconsistent/incorrect data); it should instead be handed off to another service. A contributing cause of issues is that sheets are designed to be human-interpretable, which often makes the data harder to manipulate programmatically.
Data visualisation in Data Studio.
Another area that has a limited generic template but ultimately requires far too much manual work for each survey. This will prove to be a significant bottleneck as the company grows, and an automated solution should be found with haste.
An automated solution would also improve the consistency of the quality of results produced by Aibo.
This area has significant potential for meaningful results by leveraging machine learning.
Addressing these, the short term priorities of Aibo are, accordingly:
Create an automated or reusable bot flow.
With a proof of concept already created, an API server with a webhook integrated into a cyclical bot flow means the bot can be reused rather than manually created for each new survey. The PoC uses hardcoded survey questions asked in a random order that does not depend on Landbot.
RESTful APIs are stateless by definition, meaning this ordering can't be specifically requested by Landbot, and a workaround will have to be developed (I am not experienced with APIs or web dev, so someone with more context can weigh in).
A far more short-term alternative to a fully automated bot flow, if that option is deemed to take too much time and effort, is a generic bot flow in Landbot that can be copied and tweaked slightly between surveys.
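One common workaround for the statelessness issue above is to keep each participant's progress on the API server itself, keyed by an ID that Landbot sends with every webhook call. A minimal in-memory sketch, with illustrative questions (a real service would persist this state in a database):

```python
import random

# Illustrative question set; in practice these would come from the Coda order.
QUESTIONS = [
    "How often do you shop online?",
    "Which store do you visit most?",
    "What would make you shop more?",
]

class SurveySession:
    """Per-participant survey state living on the API server, not in Landbot."""

    def __init__(self):
        self.sessions = {}  # participant_id -> remaining question indices

    def next_question(self, participant_id):
        """Return the next question for this participant, or None when done."""
        if participant_id not in self.sessions:
            order = list(range(len(QUESTIONS)))
            random.shuffle(order)  # randomised ordering, as in the PoC
            self.sessions[participant_id] = order
        order = self.sessions[participant_id]
        return QUESTIONS[order.pop(0)] if order else None
```

Each webhook request stays stateless from Landbot's point of view; the server reconstructs the context from the participant ID it carries.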
Restructure database design.
Although not directly addressing the issue of data cleaning, restructuring the database will allow the cleaning process to be conducted by an automated system, with far fewer errors and inconsistencies. Restructuring the database and moving away from Google Sheets is necessary in order to implement further systems that rely on database access.
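As a sketch of what a restructured design could look like, the flat survey sheet could be split into normalised tables. The table and column names below are illustrative assumptions, shown here in SQLite:

```python
import sqlite3

# One possible normalised layout replacing the flat one-column-per-question
# sheet. An in-memory database is used here just to demonstrate the schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE participants (
    id INTEGER PRIMARY KEY,
    phone TEXT UNIQUE NOT NULL,
    provider TEXT,
    bad_actor_score INTEGER DEFAULT 0
);
CREATE TABLE surveys (
    id INTEGER PRIMARY KEY,
    client TEXT NOT NULL
);
CREATE TABLE questions (
    id INTEGER PRIMARY KEY,
    survey_id INTEGER REFERENCES surveys(id),
    text TEXT NOT NULL
);
-- One row per (participant, question) answer instead of one column per
-- question, so adding a question is an INSERT, not a schema change.
CREATE TABLE answers (
    participant_id INTEGER REFERENCES participants(id),
    question_id INTEGER REFERENCES questions(id),
    answer TEXT,
    PRIMARY KEY (participant_id, question_id)
);
""")
```

This removes the insertion/deletion/update anomalies of the flat layout: participant details live in exactly one row, and a survey's questions and answers can grow without reshaping any table.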
Implement intermediary services between SaaS API endpoints.
With a coherent database, services can then manipulate the data across the various SaaS technologies using their API endpoints. As the company grows, the services used will grow too, making manual processing of data unsuitable. These services should be cloud-based. As examples, they could insert data from the Coda form into the database, or retrieve participants' payment info from the database and pay them automatically.
These services could also process the data without certain SaaS tools, but that turns the product into a more autonomous application, which may be a longer-term goal.
Speed up data visualisation.
Lastly, addressing the problem of data visualisation. This task might not have any short term solutions other than to move to a software or service that is faster and easier to use than Data Studio, like Tableau.
A very long term goal would be to automate this step entirely, but that would require an extensive machine learning implementation (which I am most excited for out of everything so I hope we do it).
Not directly related to a particular problem the team is facing, but could definitely improve the product:
Implementing (3) opens the way to more client interaction with the process: gaining insight before the visualisation is complete, or displaying how many respondents have answered so far.
Enriching respondents' answers with a record of their answers to past questions (as the product grows, so does the historic data for each participant, which could be leveraged). With a robust database of past questions, machine learning could detect whether a current client's question is similar enough to pull data from that record.
Web UI for respondents to interact with their profile and business incentives - leaderboards, earnings to date, question history, etc.