Glean Collaboration Exercise

Test the Data ingestion flow

Diagram:

Screen Shot 2023-03-21 at 3.41.59 PM.png
Screen Shot 2023-03-21 at 2.32.23 PM.png

Steps:

Overview
We can verify different parts of the flow by going through the logs and seeing if the data is flowing correctly.
I would use the , , and the
Check whether creating the new doc in Confluence triggered a webhook event (webhooks provide crawl hints and trigger the system to make the API call that fetches the modified content).
Keep track of the docId; you’ll need it later.
Was a scheduled content cron job then executed?
Check the configured crawl frequency and whether the enterprise app has a push notification API. Either could also delay data sync.
Was that data then sent to the Task Queue, where the rest of the crawl was scheduled?
Was the secret then successfully verified against the secret store so that the content connector handler was called?
Check that the credentials aren’t tied to an IT admin or anyone else who has already left the company.
Did the content connector handler make an API call to grab the rest of the data from Atlassian?
Did that data make it into the Document Store + Cache?
GCP Audit Logs: resource.type = "cloudsql_database" AND resource.labels.database_id="DATABASE_ID" AND log_id("cloudaudit.googleapis.com/activity")
Or ask whether an IT admin can run a SQL query to search for the doc ID in these stores.
Did that in turn trigger the Pub-Sub system and the ranking quality pipeline?
After the Pub-Sub system was triggered, did the doc go through the DocBuilder pipeline, to the Indexer, and into the Kubernetes Search Index?
GCP Audit Logs: resource.type = "gke_cluster" AND log_id("cloudaudit.googleapis.com/activity")
Or ask whether an IT admin can run a SQL query to search for the doc ID in these stores.
Did the crawler kick off a crawl so that the doc moved into the ACLed Cloud Storage, where it can be used to populate the machine learning models?
GCP Audit Logs: resource.type = "gcs_bucket" AND resource.labels.bucket_name="BUCKET_NAME"
Or ask whether an IT admin can run a SQL query to search for the doc ID in these stores.
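
The three audit log queries in the steps above share one shape, so they can be generated rather than retyped. The following is a minimal sketch, not part of any Glean or GCP tooling: build_filter and the variable names are hypothetical, and DATABASE_ID and BUCKET_NAME remain placeholders to be replaced with real values. The Cloud Storage query is left without a log_id clause, as written in the notes.

```python
# Hypothetical helper for assembling the Cloud Logging filters used in the steps above.
# DATABASE_ID and BUCKET_NAME are placeholders, not real resource names.

ACTIVITY_LOG = 'log_id("cloudaudit.googleapis.com/activity")'


def build_filter(resource_type, labels=None, include_activity_log=True):
    """Join a resource type, optional resource labels, and the activity log ID with AND."""
    clauses = [f'resource.type = "{resource_type}"']
    for key, value in (labels or {}).items():
        clauses.append(f'resource.labels.{key}="{value}"')
    if include_activity_log:
        clauses.append(ACTIVITY_LOG)
    return " AND ".join(clauses)


# Document Store + Cache check
document_store_filter = build_filter(
    "cloudsql_database", {"database_id": "DATABASE_ID"}
)
# Kubernetes Search Index check
search_index_filter = build_filter("gke_cluster")
# ACLed Cloud Storage check (no log_id clause in the original notes)
cloud_storage_filter = build_filter(
    "gcs_bucket", {"bucket_name": "BUCKET_NAME"}, include_activity_log=False
)

for name, log_filter in [
    ("Document Store", document_store_filter),
    ("Search Index", search_index_filter),
    ("Cloud Storage", cloud_storage_filter),
]:
    print(f"{name}: {log_filter}")
```

Each printed string can be pasted into the Logs Explorer query box or passed as the filter argument to gcloud logging read, assuming you have access to the relevant GCP project.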
