The GCP architecture diagram is divided into three parts: the Public App Engine Service, the Internal App Engine Services, and the GCP Private IPs.
Public App Engine Service
User machines log in and search through the Query Endpoint, which also handles SSO authentication and query processing. The Query Endpoint then calls the Query Processing App Engine Services, which process /search API calls and return the search results.

Query Endpoint → Query Processing App Engine Services, SQL, and Kubernetes Search Index

The Datasource Events App Engine Service is called through the API by the Glean client on the user's machine. The client uses it to report query and visit activity events, which help provide relevant search results to the user. No information is returned to the user, and the calls must be authenticated. The service also accepts webhooks from cloud applications (Slack, Salesforce, etc.). These must be authenticated with an app secret rather than SSO, because a webhook is not authenticated as a user.
Internal App Engine Services
Query Processing App Engine Services: process /search API calls and return the search results.
Crawler App Engine Services: called only from Google Cloud Scheduler and Google Cloud Tasks.
GCP Private IPs
Identity information (used to enforce permissions in search results):
the users in each application
the roles/groups those users are members of

Document content from SQL is parsed and processed using NLP and ML pipelines, producing entity concepts, synonyms, antonyms, important phrases, and other artifacts inferred from the parsed content. The content is processed from:
Document store + Cache/SQL: Content Connector Handlers/Crawler App Engine Services: Teams/365/Salesforce/other app servers

Populated search index hosted in the Kubernetes cluster:
an inverted index that maps a word to the IDs of the documents the word is a part of
metadata used to populate the search results
text content, stored in the index, needed to generate the snippets in the search results

The user first needs to log in. On a first login, the request goes through the Identity Connector Handlers, and the user is validated against the Identity and Permissions store using the Scio App. On later logins, the user is simply signed in through SSO authentication and sent to the Query Endpoint. The global-web-app is no longer involved at this point; the user works with their own company's Cloud project and Query Endpoint. The query is then parsed and sent to the Query Processing App Engine Services, which look it up in the search index and return the result.

The search index is built from two sources. The Ranking quality pipeline uses data from Cloud Storage (ML and NLP) and the Document store + Cache. The Indexer is built from the Doc builder pipeline, which reads from the Pub-sub system that pulls data from third-party cloud services such as Teams/365/Salesforce.
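
The inverted index and the permission check described above can be illustrated with a small sketch. This is not Glean's implementation: the function names (build_inverted_index, search), the sample documents, and the allowed_doc_ids parameter are all hypothetical, and the real index also stores metadata and snippet text that are left out here. It only shows how a word-to-document-ID map answers a query, and how identity information could restrict the results to documents the user is allowed to see.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each lowercase word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query, allowed_doc_ids):
    """Return IDs of documents that contain every query word,
    restricted to the IDs the user is permitted to see."""
    words = query.lower().split()
    if not words:
        return set()
    # Intersect the posting sets of all query words.
    matches = set.intersection(*(index.get(word, set()) for word in words))
    # Enforce permissions (hypothetical: a precomputed set of visible IDs).
    return matches & allowed_doc_ids


# Hypothetical sample data, for illustration only.
documents = {
    "doc-1": "quarterly sales report",
    "doc-2": "sales onboarding guide",
    "doc-3": "engineering onboarding guide",
}
index = build_inverted_index(documents)
result = search(index, "onboarding guide", allowed_doc_ids={"doc-2"})
```

Here both doc-2 and doc-3 match the query words, but only doc-2 is returned because it is the only document in the user's allowed set. Filtering after the lookup keeps the sketch short; a production system might apply permissions inside the index itself.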