For every enterprise data source connected to Glean, we run content and identity connectors in the cloud project that fetch data and permission maps from that source. These connectors run periodically and also in response to webhook events. The connectors store the fetched information in Glean’s document and identity store. A dataflow pipeline then reads the newly fetched content and stores it in our secure search index.
In terms of data flow, the code running inside the GCP project fetches content from the enterprise applications over HTTPS, either across the public web (if the application is hosted on the Internet, e.g. Google Drive) or across a private internal connection (if the application is hosted inside the customer’s network, e.g. on-premises Jira).
Glean Build Process
A central, locked-down build service (implemented using Google Cloud Build) periodically reads code from trusted branches on GitHub, builds the relevant Docker containers, and signs them using Binary Authorization. The service is locked down in that release engineers can only trigger a build and cannot modify the pipeline. The service also has authorization policies that allow only engineers to trigger release builds and deploys.
The central deployment workflow only has the capability to invoke a specific Cloud Function in the customer’s GCP project. The Cloud Function that can be invoked takes the name of the release to upgrade to.
Software upgrade trust model
The customer’s Glean GCP project has a “deployer” service account (typically called glean-deployer), whose key is shared with the Glean on-call team. The deployer service account has minimal IAM privileges: it has the IAM role to invoke Cloud Functions in the GCP project and the ability to view the contents of the config Cloud Storage bucket, but nothing else.
To deploy a software upgrade in the customer’s Glean instance, Glean’s central build server uses the deployer service account’s key to invoke the deploy_build Cloud Function exposed in the GCP project. The deploy_build Cloud Function accepts only one input: the release version to upgrade to. Once the Cloud Function is invoked, the Cloud Build component in the customer’s GCP project downloads the specified version’s release artifacts from a locked-down, trusted central GCP Docker container registry that customer GCP projects have read access to. The Cloud Build component then verifies, using Binary Authorization, that the downloaded release artifacts were signed correctly and, if so, proceeds to upgrade the Glean system in place.
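The narrow interface described above can be illustrated with a minimal Python sketch. It is not Glean's actual deploy_build implementation. The payload field name `release`, the release-name pattern, and the three injected callables (`fetch_artifact`, `is_signature_valid`, `upgrade`) are hypothetical stand-ins for the registry download, the Binary Authorization check, and the in-place upgrade.

```python
import re
from typing import Callable

# Hypothetical release-name format; the real naming scheme is not given in the text.
RELEASE_PATTERN = re.compile(r"^release-\d+\.\d+\.\d+$")


def deploy_build(
    payload: dict,
    fetch_artifact: Callable[[str], bytes],
    is_signature_valid: Callable[[bytes], bool],
    upgrade: Callable[[bytes], None],
) -> str:
    """Upgrade to the named release, refusing any other kind of input."""
    # The caller may supply a release name and nothing else.
    if set(payload) != {"release"}:
        raise ValueError("payload must contain exactly one field: 'release'")

    release = payload["release"]
    if not isinstance(release, str) or not RELEASE_PATTERN.fullmatch(release):
        raise ValueError("malformed release name")

    # Stand-in for downloading the artifacts from the trusted registry.
    artifact = fetch_artifact(release)

    # Stand-in for the signature verification step; unsigned or wrongly
    # signed artifacts never reach the upgrade step.
    if not is_signature_valid(artifact):
        raise PermissionError("artifact signature check failed")

    upgrade(artifact)
    return release
```

Because the only caller-controlled value is a release name, a leaked deployer key could at most request an upgrade to an artifact that already passes the signature check.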
Security
The Glean instance inherits all security settings of the customer’s cloud environment.
The only system exposed as a web service is the Query Endpoint service. On every request, the user must either authenticate with Enterprise SSO or present an authenticated cookie that was issued to them as part of an SSO login.
For datasources that support webhooks, the webhook endpoints listen for notifications about modified content from the datasources, but they do not return data. The notifications are mostly crawl hints that trigger the system to make an API call to fetch the modified content. In addition, the webhook notification payload is signed with a secret established out of band with the datasource, to prevent spoofing.
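As an illustration of the webhook signing check, here is a minimal Python sketch. The text does not say which signing scheme each datasource uses, so HMAC-SHA256 over the raw request body with a hex-encoded signature is an assumption. The `document_id` field, the `crawl_queue`, and the status codes are hypothetical as well.

```python
import hashlib
import hmac
import json
from collections import deque

# Hypothetical in-memory queue of crawl hints; a real system would use a durable queue.
crawl_queue = deque()


def is_authentic(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Check the payload signature (assumed here to be hex-encoded HMAC-SHA256)."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), signature_hex.encode())


def handle_webhook(secret: bytes, body: bytes, signature_hex: str) -> int:
    """Record a crawl hint and return only an HTTP status code, never document data."""
    if not is_authentic(secret, body, signature_hex):
        return 401  # Reject spoofed or tampered notifications.

    event = json.loads(body)
    # The notification is only a hint; the content itself is fetched later
    # through the datasource API. "document_id" is a hypothetical field name.
    crawl_queue.append(event["document_id"])
    return 204
```

The handler returns a bare status code and queues an identifier, which mirrors the point that webhook endpoints accept hints but expose no content.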