Infrastructure Components

Infrastructure Diagram:

flask-backend cloud run (repo:flask-backend)
Face-detection cloud run (repo:pre-processor)
cloud function triggered by bucket events on the envison-us-central1-prod bucket
envision_endpoint_a1111_pubsub (id: w7gv1tqn0cgi4h): Runpod endpoint for inference jobs
cloud function (not in use)
cloud function for billing account alerts
CDN for jetrr buckets (not in use)
load balancers for cdn
custom kubernetes job logger deployment (repo: k8s-event-streamer-service) (not in use)
gke-job-status-update-gpu-cluster-auto log sink for logging (in cloud console under log routers) (not in use)
daemonset for caching training image in cpu nodes (repo: kubernetes-client) (not in use)
kubernetes cluster gpu_cluster_auto (not in use):
image caching is enabled
basic node pool with 1 cpu node for image caching and log streaming service
default node pool for gpu nodes for running stable diffusion jobs
temporary: NO_SCHEDULE
cloud schedulers
pub-sub-topics and subscriptions (description in terraform and kubernetes-client repo)
# This topic is not used by the envision backend; it is used for jetrr-cloud project billing alerts
dead-letter-topic (to be created)
dead-letter-sub (to be created)
Slack channels
api timeout notification channel dev
failure recovery job summary
admin panel

Deployment strategy for Production

update configs-prod to match configs dev, i.e. update the configs collection in firestore
update pre-processor
update backend
check fcm, slack and other webhooks
update runpod prod endpoint
update and test cloud scheduler for Failure Recovery Cron Job
update and test the cloud function for replication of data. This matters especially after updating infra with terraform: terraform resets some permissions, which stops the function from working until you manually grant that permission back to the cloud function.
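
The first deployment step above (bringing configs-prod in line with configs dev) can be checked with a simple diff before anything is written to firestore. This is only a minimal sketch, assuming each config document has already been loaded into a plain Python dict; the function name diff_configs and the example keys are hypothetical and do not come from the Envision repos.

```python
from typing import Any, Dict


def diff_configs(dev: Dict[str, Any], prod: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Compare two config dicts and report what prod is missing or has different.

    Hypothetical helper: both arguments are plain dicts, however they were loaded.
    """
    # Keys present in dev but absent from prod
    missing = {k: dev[k] for k in dev if k not in prod}
    # Keys present in prod but absent from dev
    extra = {k: prod[k] for k in prod if k not in dev}
    # Keys present in both whose values differ
    changed = {
        k: {"dev": dev[k], "prod": prod[k]}
        for k in dev
        if k in prod and dev[k] != prod[k]
    }
    return {"missing_in_prod": missing, "extra_in_prod": extra, "changed": changed}


if __name__ == "__main__":
    # Example values are made up for illustration only.
    dev_cfg = {"max_retries": 3, "timeout_s": 120, "new_flag": True}
    prod_cfg = {"max_retries": 3, "timeout_s": 60}
    print(diff_configs(dev_cfg, prod_cfg))
```

Values that legitimately differ between environments will appear under "changed", so the output is meant to be reviewed by hand rather than copied into prod blindly.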

