Refine Call Topics/Categories

L1-L2 Topic Prediction Tables

L1 Call Topic Prediction Table

Where to find predictions

Predictions for the L1 call topic model can be found in the following BigQuery table inside the pg-zinnia-data-production-v1 project:
pg-zinnia-data-production-v1.zowie_ml.topic_model_predictions

Technical Context

This table has predictions for all call IDs found in the pg-zinnia-data-production-v1.call_center_v1.call_transcriptions_ner table.
A production Airflow DAG also pulls new transcriptions daily and updates the topic_model_predictions table with new L1 call topic predictions.
The model used inside the DAG was deployed to Vertex AI.

Table Context

The call topic model originally output only the top call topic prediction for a given transcription. On October 25th, 2024, we updated the model to output the top 3 call topic predictions. This means that:
For backward compatibility, we kept the original topic_pred_id field, which holds the top call topic prediction.
With the model update, we added 6 new fields: topic_pred_id_n and topic_pred_id_n_name, for n = 1, 2, 3. Note that topic_pred_id and topic_pred_id_1 are essentially the same prediction (the top call topic).
A full backfill on all historical transcriptions was also run after the model update. This means that historical call IDs have duplicate entries in the table: 1 row with only the top call topic prediction, and 1 row with the top 3 call topic predictions.
To get predictions from the most recent model, filter the topic_model_predictions table to rows created on or after 2024-10-26. For example:
SELECT * FROM `pg-zinnia-data-production-v1.zowie_ml.topic_model_predictions`
WHERE created_at >= '2024-10-26'
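Because of the duplicate rows described above, it can be useful to keep only the most recent prediction per call. The query below is a minimal sketch of that idea, not a confirmed production query. It assumes the call identifier column is named call_id, which is a hypothetical name not stated on this page, so adjust it to the actual column in the table.

```sql
-- Sketch: keep only the latest top-3 prediction row for each call.
-- ASSUMPTION: `call_id` is a hypothetical name for the call identifier column.
SELECT
  call_id,
  topic_pred_id_1,
  topic_pred_id_1_name,
  topic_pred_id_2,
  topic_pred_id_2_name,
  topic_pred_id_3,
  topic_pred_id_3_name,
  created_at
FROM `pg-zinnia-data-production-v1.zowie_ml.topic_model_predictions`
-- Only rows written by the top-3 model (see the cutoff date above).
WHERE created_at >= TIMESTAMP('2024-10-26')
-- Rank rows per call from newest to oldest and keep the newest one.
QUALIFY ROW_NUMBER() OVER (PARTITION BY call_id ORDER BY created_at DESC) = 1;
```

The date filter already excludes the older top-1 rows; the QUALIFY clause is an extra guard in case a call was predicted more than once after the update.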
L1 Topic Model Prediction Table

| Field name | Type | Description |
| --- | --- | --- |
| topic_pred_id | | Top predicted topic id |
| topic_pred_id_1 | | Top predicted topic id |
| topic_pred_id_2 | | 2nd top predicted topic id |
| topic_pred_id_3 | | 3rd top predicted topic id |
| topic_pred_id_1_name | | Name of top predicted topic id |
| topic_pred_id_2_name | | Name of 2nd top predicted topic id |
| topic_pred_id_3_name | | Name of 3rd top predicted topic id |
| deployed_model_id | | ID of the VertexAI model in production that made the predictions |
| model | | Full URI of the model’s endpoint |
| model_display_name | | Name of the VertexAI model in production that made the predictions |
| model_version_id | | Version of the VertexAI model in production that made the predictions |
| created_at | | Timestamp at which the prediction was made (UTC) |
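As an illustration of how the fields above can be combined, the following sketch counts top-topic predictions per model version. It uses only columns listed in the table and the 2024-10-26 cutoff described earlier. Note that it counts rows, not unique calls, so any call predicted more than once is counted more than once.

```sql
-- Sketch: distribution of the top predicted L1 topic, split by model version.
SELECT
  model_version_id,
  topic_pred_id_1,
  topic_pred_id_1_name,
  COUNT(*) AS n_predictions
FROM `pg-zinnia-data-production-v1.zowie_ml.topic_model_predictions`
-- Restrict to rows produced by the top-3 model.
WHERE created_at >= TIMESTAMP('2024-10-26')
GROUP BY
  model_version_id,
  topic_pred_id_1,
  topic_pred_id_1_name
ORDER BY n_predictions DESC;
```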

L2 Call Topic Prediction Table

Where to find predictions

Predictions for the L2 call topic model can be found in the following BigQuery table inside the pg-zinnia-data-production-v1 project:
pg-zinnia-data-production-v1.zowie_ml.topic_model_l2_predictions

Technical Context

This table has predictions for the more specific L2 topics generated by clustering algorithms. All the L2 topics can be found in the following .
This model runs as a daily batch prediction job and updates the topic_model_l2_predictions table. Implementation details can be found in this
.
Contrary to the L1 topic model, this model was not deployed to Vertex AI. Instead, the job loads the model from GCS and makes offline predictions.

Table Context

Because of how the L2 model is implemented, L2 topics depend on L1 topics.
Predictions from the L2 model depend directly on the L1 topic_model_predictions table, which is why L1 predictions also appear in the L2 table.
This batch prediction job was created on October 23rd, 2024, and no backfill has been run on historical transcripts so far. This means we only have L2 predictions for call IDs from October 23rd, 2024 onward.
Because the original L2 topic names were not very clean or meaningful, they were updated to refined L2 call topic names, which can all be found in this .
The update was made on 2024-10-31, so the refined L2 topic names are available from that date forward.
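To see the effect of the two dates mentioned above, a query along the following lines splits the L2 table into rows written before and after the naming update. This is a sketch under one assumption: the exact time of day of the 2024-10-31 update is not stated here, so rows created on that day may fall on either side of the naming change.

```sql
-- Sketch: compare L2 rows written before vs. on or after the 2024-10-31 naming update.
SELECT
  created_at >= TIMESTAMP('2024-10-31') AS on_or_after_name_update,
  COUNT(*) AS n_rows,
  COUNT(DISTINCT L2_topic_name) AS n_distinct_l2_names,
  MIN(created_at) AS first_prediction_at,
  MAX(created_at) AS last_prediction_at
FROM `pg-zinnia-data-production-v1.zowie_ml.topic_model_l2_predictions`
GROUP BY on_or_after_name_update
ORDER BY on_or_after_name_update;
```

Since no backfill has been run, first_prediction_at in the earlier group should not be before October 23rd, 2024.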
L2 Topic Model Prediction Table

| Field name | Type | Description |
| --- | --- | --- |
| L1_raw_topic_id | | Top prediction for L1 model taken from L1 prediction table |
| L1_raw_topic_name | | Raw name of top topic prediction for L1 model |
| L1_updated_topic_name | | Updated name of top topic prediction for L1 model |
| L2_topic_id | | Id for the top L2 predicted topic |
| L2_topic_name | | Name of the top L2 predicted topic |
| L2_preds_topic_seq | | Prediction id for each text chunk within the transcript |
| chunks_pred_probs | | Probability of prediction id for each text chunk within the transcript |
| model_used | | Name of model used for L2 predictions |
| created_at | | Timestamp at which the prediction was made (UTC) |
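Since the L2 table carries the L1 prediction alongside the L2 prediction, the L1 to L2 breakdown can be read from this table alone. The query below is a sketch using only the columns listed above; it restricts to rows from 2024-10-31 onward so that only refined L2 names are expected to appear, and it counts rows rather than unique calls.

```sql
-- Sketch: how L2 topics are distributed under each L1 topic.
SELECT
  L1_updated_topic_name,
  L2_topic_id,
  L2_topic_name,
  COUNT(*) AS n_predictions
FROM `pg-zinnia-data-production-v1.zowie_ml.topic_model_l2_predictions`
-- Refined L2 topic names are available from this date forward.
WHERE created_at >= TIMESTAMP('2024-10-31')
GROUP BY
  L1_updated_topic_name,
  L2_topic_id,
  L2_topic_name
ORDER BY
  L1_updated_topic_name,
  n_predictions DESC;
```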
