The system is modeled around dedicated GPU allocation and time-based usage. Each active session consumes a full GPU, and infrastructure cost scales directly with how long users play rather than how many users are registered. For this reason, all planning is based on GPU hours, concurrent sessions, and realistic utilization instead of theoretical capacity.
The platform operates under a fixed subscription model where each user receives 100 hours of gameplay per month. This defines the maximum infrastructure exposure per user. In practice, user behavior is expected to fall below this limit, with average usage between 40 and 70 hours per month. This gap between allocated and actual usage is critical, as it allows unused capacity to be redistributed and improves overall efficiency.
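The allocation gap can be illustrated with a small calculation. The figures below are the ranges stated above (100 allocated hours, 40–70 hours of actual usage), not measured data:

```python
# Sketch of the allocation gap: how much of the 100-hour allowance is
# actually consumed, and how much slack can be redistributed.
ALLOCATED_HOURS = 100                     # subscription allowance per user/month
avg_usage_low, avg_usage_high = 40, 70    # expected average usage range (hours)

# Fraction of the allocation actually consumed, on average.
utilization_low = avg_usage_low / ALLOCATED_HOURS     # 0.40
utilization_high = avg_usage_high / ALLOCATED_HOURS   # 0.70

# Unused allocation that can, in principle, be absorbed by other users.
slack_low = 1 - utilization_high    # worst case for slack
slack_high = 1 - utilization_low    # best case for slack
print(f"allocation consumed: {utilization_low:.0%}-{utilization_high:.0%}")
print(f"redistributable slack: {slack_low:.0%}-{slack_high:.0%}")
```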
System Requirements per Game Session
Each session is provisioned with a consistent compute environment to ensure stable performance across a wide range of games. The system is built around consumer-grade NVIDIA RTX GPUs (30/40/50 series), which provide strong performance and hardware encoding support at a significantly lower cost than datacenter GPUs.
Each session is assigned a full GPU. This avoids performance contention, simplifies scheduling, and ensures predictable latency and frame consistency regardless of the game being played.
The baseline requirements per session are:
This configuration is designed to support 1080p60 gameplay with consistent encoding quality and low input latency.
GPU Cost Assumptions (Corrected Model)
The economic model relies on using low-cost infrastructure rather than cloud GPUs. Consumer GPUs deployed on bare metal or equivalent environments provide the necessary cost efficiency.
This table defines the acceptable operating range. The system is designed to operate near the lower end of this range, with higher values representing temporary or less efficient conditions.
GPU Capacity and Real Utilization
Each GPU has a theoretical maximum of approximately 720 hours per month (24 hours × 30 days). However, this capacity cannot be fully utilized because user demand is concentrated in time.
Gaming activity is concentrated in peak hours, typically evenings and weekends. During off-peak periods, particularly late night and early morning, GPUs remain idle. This creates a gap between theoretical and usable capacity.
This table reflects the actual usable capacity of a GPU in a real system. Even with perfect infrastructure, time-of-day demand limits utilization.
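The gap between theoretical and usable capacity can be sketched as follows. The utilization rates in the loop are illustrative assumptions, not the figures from the capacity table; actual values come from observed demand curves:

```python
# Theoretical vs usable GPU hours per month. Utilization rates below are
# assumed for illustration only.
HOURS_PER_DAY, DAYS_PER_MONTH = 24, 30
theoretical_hours = HOURS_PER_DAY * DAYS_PER_MONTH   # 720 hours

for utilization in (0.3, 0.4, 0.5):                  # assumed effective rates
    usable = theoretical_hours * utilization
    print(f"{utilization:.0%} utilization -> {usable:.0f} usable GPU hours")
```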
Subscription Allocation vs GPU Coverage
The subscription model defines how GPU capacity is distributed across users.
This allocation intentionally exceeds strict capacity limits to account for real usage patterns. Not all users consume their full allocation, and usage is spread unevenly over time.
Real Usage Model
Actual efficiency depends on how users consume their allocated hours.
This table represents the realistic operating range. The system is designed to stay within this window to balance utilization and service quality.
Implications for System Design
Assigning one GPU per session simplifies the system and guarantees performance, but it increases the importance of utilization. Unlike in shared GPU systems, unused capacity cannot be dynamically reassigned during idle periods.
This makes time-based utilization the central factor in cost efficiency. Infrastructure must be provisioned for peak demand, even if it results in idle capacity during off-peak hours.
The subscription model partially offsets this limitation by allowing unused user allocation to be absorbed by other active users. However, this effect is constrained by demand patterns, which remain the dominant factor.
Core Constraint
The entire system is built around a single requirement:
Cost per GPU hour must remain close to $0.06–$0.10.

This constraint determines infrastructure selection, pricing strategy, and scaling approach. If costs rise significantly above this range, the model becomes difficult to sustain under the current pricing structure.
The cost structure of the platform is derived directly from GPU pricing, server density, and supporting infrastructure. Since each session is mapped to a dedicated GPU, the system is primarily compute-bound, and GPU cost defines the overall economics.
The base deployment unit is a bare metal server equipped with multiple consumer GPUs. This approach is selected because it provides predictable performance and significantly lower cost per GPU hour compared to cloud-based solutions.
Bare Metal Server Model
A standard server configuration is defined as follows:
This table defines the physical capacity of a single server. Each GPU runs one session, so the server supports eight concurrent users. Over a full month, the server provides approximately 5,760 GPU hours of total capacity.
The cost range reflects realistic pricing for consumer GPU bare metal deployments, assuming optimized sourcing rather than hyperscale cloud providers.
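The server-level arithmetic can be sketched as follows. The monthly server cost used here is an assumed placeholder; the source specifies only the resulting target range of $0.06–$0.10 per GPU hour:

```python
# Deriving cost per GPU hour from server-level pricing.
GPUS_PER_SERVER = 8
GPU_HOURS_PER_MONTH = 24 * 30                            # 720 per GPU
server_capacity = GPUS_PER_SERVER * GPU_HOURS_PER_MONTH  # 5,760 GPU hours/month

assumed_server_cost = 450.0                              # USD/month (placeholder)
cost_per_gpu_hour = assumed_server_cost / server_capacity
print(f"{cost_per_gpu_hour:.4f} USD per GPU hour")       # within the target range
```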
Cost per GPU and Session
Using the server-level cost, the effective cost per GPU hour can be derived.
This table shows how infrastructure pricing translates into per-hour cost. The target operating range is $0.06–$0.10, while higher values represent less optimized conditions or temporary scaling scenarios.
Since each session consumes one GPU, the session cost is equal to the GPU cost per hour.
Cost per User (Subscription Model)
Under the subscription model, each user is allocated 100 hours per month. This allows direct calculation of maximum cost exposure.
This table represents the worst-case scenario where a user consumes their full allocation.
Real Usage Adjustment
Actual user behavior is expected to fall below the allocation limit.
This table reflects the realistic cost per user under typical usage patterns. The gap between allocated and actual usage is the primary mechanism that improves system efficiency.
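The per-user cost exposure follows directly from the target rate and the usage ranges stated above:

```python
# Per-user monthly cost under the target rate of $0.06-$0.10 per GPU hour.
rate_low, rate_high = 0.06, 0.10
allocated = 100                     # hours allocated per month (worst case)
actual_low, actual_high = 40, 70    # expected real usage range (hours)

worst_case = (allocated * rate_low, allocated * rate_high)    # full allocation
realistic = (actual_low * rate_low, actual_high * rate_high)  # typical usage
print(f"worst case: ${worst_case[0]:.2f}-${worst_case[1]:.2f} per user")
print(f"realistic:  ${realistic[0]:.2f}-${realistic[1]:.2f} per user")
```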
Backend and Platform Cost
In addition to GPU infrastructure, the platform requires several backend services to operate. These costs are mostly fixed and do not scale linearly with usage during the early phase.
This table represents the operational layer of the platform. Compared to GPU infrastructure, these costs are relatively small and decrease per user as the system scales.
Total Platform Capacity and Cost (Initial Deployment)
For the beta phase, the system targets approximately 50 to 100 concurrent users. Since each session requires one GPU, infrastructure scales directly with peak concurrency.
At the same time, the platform serves a significantly larger number of monthly subscribers, as users do not consume GPU time continuously.
Infrastructure Requirements (Real-Time Capacity)
50 concurrent users → 50 GPUs → ~6–7 servers
100 concurrent users → 100 GPUs → ~12–13 servers

This represents peak system capacity, defining how many users can play simultaneously.
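Since each session maps to one GPU and each server carries eight GPUs, the server count is a simple ceiling division:

```python
import math

GPUS_PER_SERVER = 8

def servers_needed(concurrent_users: int) -> int:
    # One session = one GPU, so server count is peak concurrency
    # divided by GPUs per server, rounded up.
    return math.ceil(concurrent_users / GPUS_PER_SERVER)

print(servers_needed(50))    # 7
print(servers_needed(100))   # 13
```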
Monthly User Capacity
This reflects how infrastructure is utilized over time. Each GPU serves multiple users per month due to time-based usage patterns.
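How many monthly subscribers a single GPU can serve follows from usable hours divided by average per-user usage. The usable-hours figure below is an assumed value well below the 720-hour theoretical maximum, not a number from the capacity table:

```python
# Monthly subscribers served per GPU (usable hours are an assumption).
usable_gpu_hours = 300                    # assumed usable hours per GPU/month
avg_usage_low, avg_usage_high = 40, 70    # hours per user per month

users_per_gpu_high = usable_gpu_hours / avg_usage_low    # lighter users
users_per_gpu_low = usable_gpu_hours / avg_usage_high    # heavier users
print(f"~{users_per_gpu_low:.1f}-{users_per_gpu_high:.1f} users per GPU")
```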
Monthly Infrastructure Cost
This table shows the total compute cost required to support peak concurrent demand.
Full Platform Cost
This represents the full monthly cost of operating the platform at different concurrency levels.
Cost per User (Corrected Model)
Using total platform cost distributed across monthly users, rather than only concurrent users:
Conservative Scenario
Utilization-Adjusted Scenario
If average usage is lower (40–70 hours per user), effective capacity increases:
Interpretation
Infrastructure is provisioned based on peak concurrency, while cost efficiency is achieved through time-based usage across a larger subscriber base.
This distinction is critical. Without separating concurrent users from monthly users, the system appears significantly more expensive than it actually is.
Higher utilization directly reduces cost per user without requiring additional infrastructure, making GPU time distribution the key driver of unit economics.
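The distinction between concurrent and monthly users can be sketched as a simple division. The platform cost and subscriber counts below are assumed placeholders, chosen only to show how a larger subscriber base lowers unit cost:

```python
# Cost per user when total platform cost is spread over monthly subscribers
# rather than concurrent sessions. All inputs here are placeholders.
def cost_per_monthly_user(total_platform_cost: float, monthly_users: int) -> float:
    return total_platform_cost / monthly_users

# Same assumed $5,000/month platform cost; lower average usage lets the
# same GPUs serve more subscribers, cutting cost per user.
print(cost_per_monthly_user(5000, 500))   # 10.0
print(cost_per_monthly_user(5000, 800))   # 6.25
```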
The beta phase is structured as a controlled rollout where infrastructure grows in parallel with real user activity. Instead of provisioning large capacity upfront, the system expands incrementally based on observed load, concurrency patterns, and utilization efficiency. This approach reduces risk while providing accurate data for future scaling decisions.
Growth Model and Load Evolution
User growth is expected to follow a staged pattern, with early users showing higher engagement and more concentrated activity during peak hours. As a result, infrastructure planning is based on daily active users and peak concurrent sessions rather than total registered users.
This table shows how infrastructure requirements scale directly with concurrency. Peak concurrency is assumed to be approximately 20–30% of daily active users, which reflects typical gaming usage behavior.
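Sizing from daily active users can be sketched using the 20–30% peak ratio stated above (the DAU figure in the example is illustrative):

```python
import math

GPUS_PER_SERVER = 8
PEAK_CONCURRENCY_RATIO = (0.20, 0.30)   # share of DAU online at peak

def capacity_for_dau(daily_active_users: int) -> tuple[int, int]:
    # Size for the upper end of the peak ratio; returns (GPUs, servers).
    peak = math.ceil(daily_active_users * PEAK_CONCURRENCY_RATIO[1])
    return peak, math.ceil(peak / GPUS_PER_SERVER)

print(capacity_for_dau(200))   # (60, 8)
```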
Daily Load Distribution
User activity is uneven throughout the day, which directly impacts infrastructure efficiency. Most sessions occur during evening hours and weekends, while late-night and early-morning periods remain underutilized.
This table explains why full GPU utilization is not achievable in practice. Even with sufficient demand, time-based usage patterns limit how efficiently resources can be used.
Dynamic Scaling Strategy
The system is designed to scale based on real-time conditions rather than static projections. The orchestration layer continuously monitors system load and performance indicators to determine when additional capacity is required.
Scaling is triggered when:
GPU utilization exceeds 75–80% for sustained periods
session allocation becomes delayed or queues begin to form
latency increases beyond acceptable thresholds
available nodes cannot immediately serve incoming sessions

When these conditions occur, additional servers are provisioned to maintain performance and prevent degradation of user experience.
During the beta phase, scaling is performed manually or semi-automatically to maintain control over cost. Over time, this process transitions to automated scaling based on predefined thresholds.
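The trigger conditions can be sketched as a single check. The parameter names and the latency threshold are illustrative assumptions, not a real orchestration API; only the 75–80% utilization trigger comes from the source:

```python
# Hypothetical scaling check mirroring the trigger conditions above.
def should_scale(gpu_utilization: float, queue_depth: int,
                 p95_latency_ms: float, free_nodes: int) -> bool:
    return (
        gpu_utilization >= 0.75        # sustained high utilization
        or queue_depth > 0             # sessions waiting for allocation
        or p95_latency_ms > 80.0       # assumed latency threshold
        or free_nodes == 0             # no node can serve new sessions
    )

print(should_scale(0.60, 0, 45.0, 3))   # False: all indicators healthy
print(should_scale(0.82, 0, 45.0, 3))   # True: utilization trigger fires
```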
Success Metrics and Operational Indicators
The success of the beta phase is measured through a combination of performance and economic indicators. These metrics define whether the system is stable, efficient, and ready to scale.
This table defines acceptable operating conditions. Maintaining these thresholds ensures both a good user experience and sustainable infrastructure cost.
Infrastructure Risks and Constraints
Several risks must be considered when operating and scaling the system.
This table highlights the main constraints that affect both performance and economics.
Mitigation Strategy
The system is designed to reduce these risks through controlled expansion and flexible infrastructure sourcing.
Capacity is added incrementally to avoid over-provisioning, and a utilization buffer of approximately 20–30% is maintained to absorb unexpected demand spikes. This ensures that the system can handle peak load without degrading performance.
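Provisioning with the headroom described above amounts to scaling expected peak demand by the buffer. The 25% figure below is the assumed midpoint of the stated 20–30% range:

```python
import math

# Provision GPUs for expected peak sessions plus a utilization buffer.
def provisioned_gpus(expected_peak_sessions: int, buffer: float = 0.25) -> int:
    # buffer defaults to the midpoint of the stated 20-30% headroom
    return math.ceil(expected_peak_sessions * (1 + buffer))

print(provisioned_gpus(100))   # 125 GPUs for 100 expected peak sessions
print(provisioned_gpus(50))    # 63 GPUs for 50 expected peak sessions
```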
Over time, infrastructure sourcing is diversified. Bare metal servers provide the core capacity due to their cost efficiency, while additional capacity can be introduced through alternative providers or decentralized nodes to improve flexibility and reduce dependency on a single supply source.
Improving orchestration is also a key factor. More efficient scheduling allows better distribution of sessions across time, increasing utilization without compromising user experience.
Future Infrastructure Evolution
The next stage of development focuses on improving efficiency and expanding coverage.
First, utilization will be improved through better understanding of user behavior and more advanced scheduling. This allows more users to be served on the same hardware without increasing cost.
Second, geographic expansion will reduce latency and improve user experience in additional regions. This becomes necessary as the platform grows beyond its initial deployment area.
Third, infrastructure sourcing will evolve into a hybrid model. While bare metal remains the foundation, additional capacity from decentralized or alternative sources will enable more flexible scaling and cost optimization.
Final Observations
The system is designed around a balance between cost, capacity, and user experience. GPU allocation defines the base constraint, while utilization determines overall efficiency.
The corrected cost model, based on $0.06–$0.10 per GPU hour, makes the platform economically viable under realistic usage conditions. The subscription model ensures predictable cost exposure, while average usage below the allocation improves margins.
The beta phase is primarily focused on validating these assumptions. The most important outcomes are understanding how users behave, how load is distributed, and how effectively GPU time can be utilized.
Long-term success depends on maintaining low GPU cost, improving utilization, and scaling infrastructure in a controlled and efficient manner.