The system is modeled around dedicated GPU allocation and time-based usage. Each active session consumes a full GPU, and infrastructure cost scales directly with how long users play rather than how many users are registered. For this reason, all planning is based on GPU hours, concurrent sessions, and realistic utilization instead of theoretical capacity.
The platform operates under a fixed subscription model where each user receives 100 hours of gameplay per month. This defines the maximum infrastructure exposure per user. In practice, user behavior is expected to fall below this limit, with average usage between 40 and 70 hours per month. This gap between allocated and actual usage is critical, as it allows unused capacity to be redistributed and improves overall efficiency.
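The allocation gap can be illustrated with a small calculation. The figures below are the ranges stated above (100 allocated hours, 40–70 hours of actual usage), not measured data:

```python
# Sketch of the allocation gap: how much of the 100-hour allowance is
# actually consumed, and how much slack can be redistributed.
ALLOCATED_HOURS = 100                     # subscription allowance per user/month
avg_usage_low, avg_usage_high = 40, 70    # expected average usage range (hours)

# Fraction of the allocation actually consumed, on average.
utilization_low = avg_usage_low / ALLOCATED_HOURS     # 0.40
utilization_high = avg_usage_high / ALLOCATED_HOURS   # 0.70

# Unused allocation that can, in principle, be absorbed by other users.
slack_low = 1 - utilization_high    # worst case for slack
slack_high = 1 - utilization_low    # best case for slack
print(f"allocation consumed: {utilization_low:.0%}-{utilization_high:.0%}")
print(f"redistributable slack: {slack_low:.0%}-{slack_high:.0%}")
```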
System Requirements per Game Session
Each session is provisioned with a consistent compute environment to ensure stable performance across a wide range of games. The system is built around consumer-grade NVIDIA RTX GPUs (30/40/50 series), which provide strong performance and hardware encoding support at a significantly lower cost than datacenter GPUs.
Each session is assigned a full GPU. This avoids performance contention, simplifies scheduling, and ensures predictable latency and frame consistency regardless of the game being played.
The baseline requirements per session are:
This configuration is designed to support 1080p60 gameplay with consistent encoding quality and low input latency.
GPU Cost Assumptions (Corrected Model)
The economic model relies on using low-cost infrastructure rather than cloud GPUs. Consumer GPUs deployed on bare metal or equivalent environments provide the necessary cost efficiency.
This table defines the acceptable operating range. The system is designed to operate near the lower end of this range, with higher values representing temporary or less efficient conditions.
GPU Capacity and Real Utilization
Each GPU has a theoretical maximum of approximately 720 hours per month (24 hours × 30 days). However, this capacity cannot be fully utilized because user demand is concentrated in time.
Gaming activity is concentrated in peak hours, typically evenings and weekends. During off-peak periods, particularly late night and early morning, GPUs remain idle. This creates a gap between theoretical and usable capacity.
This table reflects the actual usable capacity of a GPU in a real system. Even with perfect infrastructure, time-of-day demand limits utilization.
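The gap between theoretical and usable capacity can be sketched as follows. The utilization rates in the loop are illustrative assumptions, not the figures from the capacity table; actual values come from observed demand curves:

```python
# Theoretical vs usable GPU hours per month. Utilization rates below are
# assumed for illustration only.
HOURS_PER_DAY, DAYS_PER_MONTH = 24, 30
theoretical_hours = HOURS_PER_DAY * DAYS_PER_MONTH   # 720 hours

for utilization in (0.3, 0.4, 0.5):                  # assumed effective rates
    usable = theoretical_hours * utilization
    print(f"{utilization:.0%} utilization -> {usable:.0f} usable GPU hours")
```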
Subscription Allocation vs GPU Coverage
The subscription model defines how GPU capacity is distributed across users.
This allocation intentionally exceeds strict capacity limits to account for real usage patterns. Not all users consume their full allocation, and usage is spread unevenly over time.
Real Usage Model
Actual efficiency depends on how users consume their allocated hours.
This table represents the realistic operating range. The system is designed to stay within this window to balance utilization and service quality.
Implications for System Design
Assigning one GPU per session simplifies the system and guarantees performance, but it increases the importance of utilization. Unlike in shared GPU systems, unused capacity cannot be dynamically reassigned during idle periods.
This makes time-based utilization the central factor in cost efficiency. Infrastructure must be provisioned for peak demand, even if it results in idle capacity during off-peak hours.
The subscription model partially offsets this limitation by allowing unused user allocation to be absorbed by other active users. However, this effect is constrained by demand patterns, which remain the dominant factor.
Core Constraint
The entire system is built around a single requirement:
Cost per GPU hour must remain close to $0.06–$0.10.

This constraint determines infrastructure selection, pricing strategy, and scaling approach. If costs rise significantly above this range, the model becomes difficult to sustain under the current pricing structure.
The cost structure of the platform is derived directly from GPU pricing, server density, and supporting infrastructure. Since each session is mapped to a dedicated GPU, the system is primarily compute-bound, and GPU cost defines the overall economics.
The base deployment unit is a bare metal server equipped with multiple consumer GPUs. This approach is selected because it provides predictable performance and significantly lower cost per GPU hour compared to cloud-based solutions.
Bare Metal Server Model
A standard server configuration is defined as follows:
This table defines the physical capacity of a single server. Each GPU runs one session, so the server supports eight concurrent users. Over a full month, the server provides approximately 5,760 GPU hours of total capacity.
The cost range reflects realistic pricing for consumer GPU bare metal deployments, assuming optimized sourcing rather than hyperscale cloud providers.
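The server-level arithmetic can be sketched as follows. The monthly server cost used here is an assumed placeholder; the source specifies only the resulting target range of $0.06–$0.10 per GPU hour:

```python
# Deriving cost per GPU hour from server-level pricing.
GPUS_PER_SERVER = 8
GPU_HOURS_PER_MONTH = 24 * 30                            # 720 per GPU
server_capacity = GPUS_PER_SERVER * GPU_HOURS_PER_MONTH  # 5,760 GPU hours/month

assumed_server_cost = 450.0                              # USD/month (placeholder)
cost_per_gpu_hour = assumed_server_cost / server_capacity
print(f"{cost_per_gpu_hour:.4f} USD per GPU hour")       # within the target range
```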
Cost per GPU and Session
Using the server-level cost, the effective cost per GPU hour can be derived.
This table shows how infrastructure pricing translates into per-hour cost. The target operating range is $0.06–$0.10, while higher values represent less optimized conditions or temporary scaling scenarios.
Since each session consumes one GPU, the session cost is equal to the GPU cost per hour.
Cost per User (Subscription Model)
Under the subscription model, each user is allocated 100 hours per month. This allows direct calculation of maximum cost exposure.
This table represents the worst-case scenario where a user consumes their full allocation.
Real Usage Adjustment
Actual user behavior is expected to fall below the allocation limit.
This table reflects the realistic cost per user under typical usage patterns. The gap between allocated and actual usage is the primary mechanism that improves system efficiency.
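The per-user cost exposure follows directly from the target rate and the usage ranges stated above:

```python
# Per-user monthly cost under the target rate of $0.06-$0.10 per GPU hour.
rate_low, rate_high = 0.06, 0.10
allocated = 100                     # hours allocated per month (worst case)
actual_low, actual_high = 40, 70    # expected real usage range (hours)

worst_case = (allocated * rate_low, allocated * rate_high)    # full allocation
realistic = (actual_low * rate_low, actual_high * rate_high)  # typical usage
print(f"worst case: ${worst_case[0]:.2f}-${worst_case[1]:.2f} per user")
print(f"realistic:  ${realistic[0]:.2f}-${realistic[1]:.2f} per user")
```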
Backend and Platform Cost
In addition to GPU infrastructure, the platform requires several backend services to operate. These costs are mostly fixed and do not scale linearly with usage during the early phase.
This table represents the operational layer of the platform. Compared to GPU infrastructure, these costs are relatively small and decrease per user as the system scales.
Total Platform Capacity and Cost (Initial Deployment)
For the beta phase, the system targets approximately 50 to 100 concurrent users. Since each session requires one GPU, infrastructure scales directly with peak concurrency.
At the same time, the platform serves a significantly larger number of monthly subscribers, as users do not consume GPU time continuously.
Infrastructure Requirements (Real-Time Capacity)
50 concurrent users → 50 GPUs → ~6–7 servers
100 concurrent users → 100 GPUs → ~12–13 servers

This represents peak system capacity, defining how many users can play simultaneously.
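Since each session maps to one GPU and each server carries eight GPUs, the server count is a simple ceiling division:

```python
import math

GPUS_PER_SERVER = 8

def servers_needed(concurrent_users: int) -> int:
    # One session = one GPU, so server count is peak concurrency
    # divided by GPUs per server, rounded up.
    return math.ceil(concurrent_users / GPUS_PER_SERVER)

print(servers_needed(50))    # 7
print(servers_needed(100))   # 13
```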
Monthly User Capacity
This reflects how infrastructure is utilized over time. Each GPU serves multiple users per month due to time-based usage patterns.
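How many monthly subscribers a single GPU can serve follows from usable hours divided by average per-user usage. The usable-hours figure below is an assumed value well below the 720-hour theoretical maximum, not a number from the capacity table:

```python
# Monthly subscribers served per GPU (usable hours are an assumption).
usable_gpu_hours = 300                    # assumed usable hours per GPU/month
avg_usage_low, avg_usage_high = 40, 70    # hours per user per month

users_per_gpu_high = usable_gpu_hours / avg_usage_low    # lighter users
users_per_gpu_low = usable_gpu_hours / avg_usage_high    # heavier users
print(f"~{users_per_gpu_low:.1f}-{users_per_gpu_high:.1f} users per GPU")
```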
Monthly Infrastructure Cost
This table shows the total compute cost required to support peak concurrent demand.
Full Platform Cost
This represents the full monthly cost of operating the platform at different concurrency levels.
Cost per User (Corrected Model)
Using total platform cost distributed across monthly users, rather than only concurrent users:
Conservative Scenario
Utilization-Adjusted Scenario
If average usage is lower (40–70 hours per user), effective capacity increases:
Interpretation
Infrastructure is provisioned based on peak concurrency, while cost efficiency is achieved through time-based usage across a larger subscriber base.
This distinction is critical. Without separating concurrent users from monthly users, the system appears significantly more expensive than it actually is.
Higher utilization directly reduces cost per user without requiring additional infrastructure, making GPU time distribution the key driver of unit economics.
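The distinction between concurrent and monthly users can be sketched as a simple division. The platform cost and subscriber counts below are assumed placeholders, chosen only to show how a larger subscriber base lowers unit cost:

```python
# Cost per user when total platform cost is spread over monthly subscribers
# rather than concurrent sessions. All inputs here are placeholders.
def cost_per_monthly_user(total_platform_cost: float, monthly_users: int) -> float:
    return total_platform_cost / monthly_users

# Same assumed $5,000/month platform cost; lower average usage lets the
# same GPUs serve more subscribers, cutting cost per user.
print(cost_per_monthly_user(5000, 500))   # 10.0
print(cost_per_monthly_user(5000, 800))   # 6.25
```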
The beta phase is structured as a controlled rollout where infrastructure grows in parallel with real user activity. Instead of provisioning large capacity upfront, the system expands incrementally based on observed load, concurrency patterns, and utilization efficiency. This approach reduces risk while providing accurate data for future scaling decisions.
Growth Model and Load Evolution
User growth is expected to follow a staged pattern, with early users showing higher engagement and more concentrated activity during peak hours. As a result, infrastructure planning is based on daily active users and peak concurrent sessions rather than total registered users.
This table shows how infrastructure requirements scale directly with concurrency. Peak concurrency is assumed to be approximately 20–30% of daily active users, which reflects typical gaming usage behavior.
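Sizing from daily active users can be sketched using the 20–30% peak ratio stated above (the DAU figure in the example is illustrative):

```python
import math

GPUS_PER_SERVER = 8
PEAK_CONCURRENCY_RATIO = (0.20, 0.30)   # share of DAU online at peak

def capacity_for_dau(daily_active_users: int) -> tuple[int, int]:
    # Size for the upper end of the peak ratio; returns (GPUs, servers).
    peak = math.ceil(daily_active_users * PEAK_CONCURRENCY_RATIO[1])
    return peak, math.ceil(peak / GPUS_PER_SERVER)

print(capacity_for_dau(200))   # (60, 8)
```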
Daily Load Distribution
User activity is uneven throughout the day, which directly impacts infrastructure efficiency. Most sessions occur during evening hours and weekends, while late-night and early-morning periods remain underutilized.
This table explains why full GPU utilization is not achievable in practice. Even with sufficient demand, time-based usage patterns limit how efficiently resources can be used.
Dynamic Scaling Strategy
The system is designed to scale based on real-time conditions rather than static projections. The orchestration layer continuously monitors system load and performance indicators to determine when additional capacity is required.
Scaling is triggered when:
GPU utilization exceeds 75–80% for sustained periods
session allocation becomes delayed or queues begin to form
latency increases beyond acceptable thresholds
available nodes cannot immediately serve incoming sessions

When these conditions occur, additional servers are provisioned to maintain performance and prevent degradation of user experience.
During the beta phase, scaling is performed manually or semi-automatically to maintain control over cost. Over time, this process transitions to automated scaling based on predefined thresholds.
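The trigger conditions can be sketched as a single check. The parameter names and the latency threshold are illustrative assumptions, not a real orchestration API; only the 75–80% utilization trigger comes from the source:

```python
# Hypothetical scaling check mirroring the trigger conditions above.
def should_scale(gpu_utilization: float, queue_depth: int,
                 p95_latency_ms: float, free_nodes: int) -> bool:
    return (
        gpu_utilization >= 0.75        # sustained high utilization
        or queue_depth > 0             # sessions waiting for allocation
        or p95_latency_ms > 80.0       # assumed latency threshold
        or free_nodes == 0             # no node can serve new sessions
    )

print(should_scale(0.60, 0, 45.0, 3))   # False: all indicators healthy
print(should_scale(0.82, 0, 45.0, 3))   # True: utilization trigger fires
```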
Success Metrics and Operational Indicators
The success of the beta phase is measured through a combination of performance and economic indicators. These metrics define whether the system is stable, efficient, and ready to scale.
This table defines acceptable operating conditions. Maintaining these thresholds ensures both a good user experience and sustainable infrastructure cost.
Infrastructure Risks and Constraints
Several risks must be considered when operating and scaling the system.
This table highlights the main constraints that affect both performance and economics.
Mitigation Strategy
The system is designed to reduce these risks through controlled expansion and flexible infrastructure sourcing.
Capacity is added incrementally to avoid over-provisioning, and a utilization buffer of approximately 20–30% is maintained to absorb unexpected demand spikes. This ensures that the system can handle peak load without degrading performance.
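Provisioning with the headroom described above amounts to scaling expected peak demand by the buffer. The 25% figure below is the assumed midpoint of the stated 20–30% range:

```python
import math

# Provision GPUs for expected peak sessions plus a utilization buffer.
def provisioned_gpus(expected_peak_sessions: int, buffer: float = 0.25) -> int:
    # buffer defaults to the midpoint of the stated 20-30% headroom
    return math.ceil(expected_peak_sessions * (1 + buffer))

print(provisioned_gpus(100))   # 125 GPUs for 100 expected peak sessions
print(provisioned_gpus(50))    # 63 GPUs for 50 expected peak sessions
```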
Over time, infrastructure sourcing is diversified. Bare metal servers provide the core capacity due to their cost efficiency, while additional capacity can be introduced through alternative providers or decentralized nodes to improve flexibility and reduce dependency on a single supply source.
Improving orchestration is also a key factor. More efficient scheduling allows better distribution of sessions across time, increasing utilization without compromising user experience.
Future Infrastructure Evolution
The next stage of development focuses on improving efficiency and expanding coverage.
First, utilization will be improved through better understanding of user behavior and more advanced scheduling. This allows more users to be served on the same hardware without increasing cost.
Second, geographic expansion will reduce latency and improve user experience in additional regions. This becomes necessary as the platform grows beyond its initial deployment area.
Third, infrastructure sourcing will evolve into a hybrid model. While bare metal remains the foundation, additional capacity from decentralized or alternative sources will enable more flexible scaling and cost optimization.
Final Observations
The system is designed around a balance between cost, capacity, and user experience. GPU allocation defines the base constraint, while utilization determines overall efficiency.
The corrected cost model, based on $0.06–$0.10 per GPU hour, makes the platform economically viable under realistic usage conditions. The subscription model ensures predictable cost exposure, while average usage below the allocation improves margins.
The beta phase is primarily focused on validating these assumptions. The most important outcomes are understanding how users behave, how load is distributed, and how effectively GPU time can be utilized.
Long-term success depends on maintaining low GPU cost, improving utilization, and scaling infrastructure in a controlled and efficient manner.