
Random Timeouts

Scenario: “Random Timeouts – Actually a Deep OutOfMemory in Pricing-Service”

Architecture:
API Gateway → order-service → pricing-service → Redis cache / DB

Users report:
“Sometimes placing an order works, but around peak time (11:30–13:00) it spins for a long time and then shows ‘Request timed out’.”

Full-Length Logs Across Services

API Gateway Log

2025-11-10 11:42:05,101 INFO [gw-req-34210] Incoming request: POST /orders/place for user=2098
2025-11-10 11:42:10,112 WARN [gw-req-34210] Downstream timeout calling order-service /api/orders/place (timeout=5000ms)
2025-11-10 11:42:10,114 ERROR [gw-req-34210] Returning HTTP 504 Gateway Timeout to client

order-service Log

2025-11-10 11:42:05.108 INFO 24561 --- [nio-8081-exec-7] c.c.o.controller.OrderController : Received request to place order userId=2098, items=[P1200, P3400]

2025-11-10 11:42:05.325 INFO 24561 --- [nio-8081-exec-7] c.c.o.service.PricingClient : Calling pricing-service for items=[P1200, P3400], url=http://pricing-service:8083/api/pricing/calc

2025-11-10 11:42:10.110 ERROR 24561 --- [nio-8081-exec-7] c.c.o.service.PricingClient : Timeout while calling pricing-service

org.springframework.web.client.ResourceAccessException: I/O error on POST request for "http://pricing-service:8083/api/pricing/calc": Read timed out; nested exception is java.net.SocketTimeoutException: Read timed out
at org.springframework.web.client.RestTemplate.doExecute(RestTemplate.java:744)
at org.springframework.web.client.RestTemplate.execute(RestTemplate.java:670)
at org.springframework.web.client.RestTemplate.postForObject(RestTemplate.java:414)
at com.company.order.service.PricingClient.calculatePrice(PricingClient.java:62)
at com.company.order.service.OrderService.createOrder(OrderService.java:91)
at com.company.order.controller.OrderController.placeOrder(OrderController.java:59)
...

Caused by: java.net.SocketTimeoutException: Read timed out
at java.base/java.net.SocketInputStream.socketRead0(Native Method)
at java.base/java.net.SocketInputStream.socketRead(SocketInputStream.java:115)
at java.base/java.net.SocketInputStream.read(SocketInputStream.java:168)
...

2025-11-10 11:42:10.112 WARN 24561 --- [nio-8081-exec-7] c.c.o.controller.OrderController : Pricing call timed out for userId=2098, returning HTTP 500
2025-11-10 11:42:10.113 INFO 24561 --- [nio-8081-exec-7] c.c.o.controller.OrderController : Responding with message='Unable to place order at this time, please retry'

pricing-service Log (The Deep Problem)

2025-11-10 11:41:40.501 INFO 26789 --- [ main] c.c.p.PricingServiceApplication : Starting PricingServiceApplication v3.0.1 on app-node-03 with PID 26789
2025-11-10 11:41:42.003 INFO 26789 --- [ main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat initialized with port(s): 8083 (http)
2025-11-10 11:41:43.287 INFO 26789 --- [ main] c.c.p.PricingServiceApplication : Started PricingServiceApplication in 3.152 seconds

# Regular traffic earlier
2025-11-10 11:41:55.101 INFO 26789 --- [nio-8083-exec-3] c.c.p.controller.PricingController : Received pricing request for items=[P1001, P1002]
2025-11-10 11:41:55.845 INFO 26789 --- [nio-8083-exec-3] c.c.p.service.PricingEngine : Calculated price=2400.0 in 712ms

# Around peak time, GC / OOM patterns start
2025-11-10 11:42:00.201 WARN 26789 --- [GC Monitor] o.s.p.m.e.PricingGCMonitor : High GC activity detected, heap usage=82%
2025-11-10 11:42:02.452 WARN 26789 --- [GC Monitor] o.s.p.m.e.PricingGCMonitor : Full GC occurred, pause=2150ms, heap after GC=78%

2025-11-10 11:42:05.327 INFO 26789 --- [nio-8083-exec-7] c.c.p.controller.PricingController : Received pricing request for items=[P1200, P3400]

# No immediate response logged for this request...

2025-11-10 11:42:06.998 ERROR 26789 --- [Finalizer] o.a.c.loader.WebappClassLoaderBase : The web application [pricing-service] appears to have started a thread named [pricing-bulk-cache-loader] but has failed to stop it. This is very likely to create a memory leak.

2025-11-10 11:42:08.215 WARN 26789 --- [GC Monitor] o.s.p.m.e.PricingGCMonitor : Full GC occurred, pause=2985ms, heap after GC=92%

2025-11-10 11:42:09.504 ERROR 26789 --- [nio-8083-exec-7] c.c.p.service.PricingEngine : Failed to calculate price due to memory error

java.lang.OutOfMemoryError: Java heap space
at java.base/java.util.Arrays.copyOf(Arrays.java:3720)
at java.base/java.util.Arrays.copyOf(Arrays.java:3689)
at java.base/java.util.ArrayList.grow(ArrayList.java:237)
at java.base/java.util.ArrayList.grow(ArrayList.java:244)
at java.base/java.util.ArrayList.add(ArrayList.java:454)
at java.base/java.util.ArrayList.add(ArrayList.java:467)
at com.company.pricing.engine.DiscountMatrixBuilder.buildMatrix(DiscountMatrixBuilder.java:88)
at com.company.pricing.engine.PricingEngine.calculate(PricingEngine.java:112)
at com.company.pricing.service.PricingService.calculatePrice(PricingService.java:57)
at com.company.pricing.controller.PricingController.calculate(PricingController.java:49)
...

2025-11-10 11:42:09.505 ERROR 26789 --- [nio-8083-exec-7] o.s.w.s.m.m.a.ExceptionHandlerExceptionResolver :
Resolved [java.lang.OutOfMemoryError: Java heap space]

# After OOM, service becomes unresponsive or extremely slow

2025-11-10 11:42:10.501 WARN 26789 --- [GC Monitor] o.s.p.m.e.PricingGCMonitor : Unable to collect heap, OOM already triggered, service may be unstable

Breakdown You Can Walk Through in Class


Step 1 – Start from user symptom

Users see: timeout / very slow order placement.
Gateway reports: 504 Gateway Timeout.
Ask:
“Does the 504 come from the backend app or from the gateway?” → It’s the gateway saying “I waited too long for a response.”

Step 2 – Check order-service (upstream app)

Key lines:
PricingClient : Calling pricing-service ...
ResourceAccessException ... Read timed out
Caused by: SocketTimeoutException: Read timed out

Discuss:
order-service sent the request.
It waited 5000ms and got no answer → timeout.
So the likely problem is inside or around pricing-service.
Teach phrase:
“If you see a SocketTimeoutException in a client, the client was fine; the server was too slow or dead.”
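The client-side view of this failure is easy to reproduce with the JDK's built-in HttpClient against a deliberately slow handler. This is a sketch, not the real services: the slow handler stands in for a GC-stalled pricing-service, and the 500ms timeout stands in for order-service's 5000ms read timeout.

```java
import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.net.http.HttpTimeoutException;
import java.time.Duration;

public class ReadTimeoutDemo {
    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for a stalled pricing-service: the handler
        // sleeps far longer than the client is willing to wait.
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/api/pricing/calc", exchange -> {
            try { Thread.sleep(2_000); } catch (InterruptedException ignored) {}
            exchange.sendResponseHeaders(200, 0);
            exchange.close();
        });
        server.start();
        int port = server.getAddress().getPort();

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:" + port + "/api/pricing/calc"))
                .timeout(Duration.ofMillis(500))  // like order-service's 5000ms read timeout
                .GET()
                .build();
        try {
            client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println("unexpected: response arrived");
        } catch (HttpTimeoutException e) {
            // The client gave up waiting: nothing is wrong on the client side.
            System.out.println("timeout: " + e.getClass().getSimpleName());
        } finally {
            server.stop(0);
        }
    }
}
```

The exception surfaces on the caller even though the fault is entirely inside the callee, which is exactly why the order-service log alone cannot pinpoint the root cause.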

Step 3 – Dive into pricing-service logs

Here is where the story is:
Increasing GC activity:
High GC activity detected, heap usage=82%
Full GC... pause=2150ms, heap after GC=78%

Then:
java.lang.OutOfMemoryError: Java heap space
at ... DiscountMatrixBuilder.buildMatrix(DiscountMatrixBuilder.java:88)

And a warning about a thread not stopped → memory leak hint:
appears to have started a thread ... failed to stop it. This is very likely to create a memory leak.

Ask:
“Where is the real root cause line?” → OutOfMemoryError: Java heap space (plus class & line).
“Is this App / DB / Infra?” → App/JVM memory issue, not DB, not network.
“Why would this cause timeouts for the caller?” → Pricing-service is so busy with GC / OOM that it never replies; client waits and times out.
“What code is suspicious?” → DiscountMatrixBuilder.buildMatrix continuously growing an ArrayList.
Connect dots:
Deep down, a pricing algorithm is consuming too much memory → heap fills → GC storms → OOM → service becomes sluggish/unresponsive → order-service’s REST call times out → gateway sends 504 to user.
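The stack trace only tells us that an ArrayList inside DiscountMatrixBuilder.buildMatrix keeps growing; the sketch below is a hypothetical reconstruction of that pattern (the class and method names come from the trace, the bodies are invented). It contrasts unbounded cross-product growth, whose ArrayList.grow() → Arrays.copyOf() path matches the OOM trace, with a bounded aggregation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reconstruction of the suspect code at
// DiscountMatrixBuilder.java:88 -- the real source is not in the logs.
public class DiscountMatrixSketch {

    // Anti-pattern: materializing the full (item x rule) cross-product.
    // With N items and M rules the list holds N*M rows, so at peak traffic
    // the heap fills and ArrayList.grow() triggers the OOM seen in the trace.
    static List<double[]> buildMatrixUnbounded(int items, int rules) {
        List<double[]> matrix = new ArrayList<>();
        for (int i = 0; i < items; i++) {
            for (int r = 0; r < rules; r++) {
                matrix.add(new double[]{i, r, discountFor(i, r)});
            }
        }
        return matrix;
    }

    // Safer shape: keep only the running best discount per item,
    // so memory is O(items) instead of O(items * rules).
    static double[] bestDiscountPerItem(int items, int rules) {
        double[] best = new double[items];
        for (int i = 0; i < items; i++) {
            for (int r = 0; r < rules; r++) {
                double d = discountFor(i, r);
                if (d > best[i]) best[i] = d;
            }
        }
        return best;
    }

    static double discountFor(int item, int rule) {
        return (item % 7 + rule % 5) / 100.0;  // placeholder rule evaluation
    }

    public static void main(String[] args) {
        System.out.println("rows (unbounded): " + buildMatrixUnbounded(100, 50).size());
        System.out.println("rows (bounded): " + bestDiscountPerItem(100, 50).length);
    }
}
```

The point for the class is not the fix itself but the shape of the suspicion: an allocation inside a nested loop whose size scales with traffic.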

Step 4 – First Actions for an L1/L2

Ask them:
“If you were on support duty, what 3 immediate actions would you take?”
Guide them to:
Stabilize service
Possibly restart pricing-service (short term).
Note: “Restart is NOT a fix, but buys time.”
Observe JVM with tools
Attach JConsole / VisualVM to pricing-service (in a test or staging env first)
Look at: heap usage, GC frequency, thread count.
Create a clear ticket to dev team
Include:
OOM stack trace snippet.
Class & line: DiscountMatrixBuilder.java:88.
Note increasing GC warnings before OOM.
Impact: pricing-service slow → order placement timing out.
Example ticket text you can teach them to write:
“At 11:42, pricing-service experienced java.lang.OutOfMemoryError: Java heap space in DiscountMatrixBuilder.buildMatrix(DiscountMatrixBuilder.java:88). Prior to this, GC monitor logs show repeated full GCs with high heap usage (80–90%). As a result, pricing-service became unresponsive, causing order-service pricing calls to time out (SocketTimeoutException) and API Gateway to return 504 to clients. Suspect memory leak or unbounded list growth in DiscountMatrixBuilder.”
That’s exactly the kind of thinking you want from L1/L2.
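Besides attaching JConsole/VisualVM, the heap and GC numbers that PricingGCMonitor logs can also be read in-process via the standard java.lang.management API. A minimal probe is sketched below; the 80% threshold mirrors the warning in the logs and is an assumption, not an official default.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class HeapProbe {
    public static void main(String[] args) {
        // Heap usage as the GC monitor would report it.
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        long usedPct = heap.getMax() > 0 ? heap.getUsed() * 100 / heap.getMax() : -1;
        System.out.println("heap used: " + heap.getUsed() / (1024 * 1024) + " MB"
                + " (" + usedPct + "% of max)");

        // Collection counts and cumulative pause time per collector.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println("gc " + gc.getName()
                    + ": count=" + gc.getCollectionCount()
                    + " totalPauseMs=" + gc.getCollectionTime());
        }

        // Heuristic threshold matching the PricingGCMonitor warnings (assumed).
        if (usedPct >= 80) {
            System.out.println("WARN High GC pressure likely, heap usage=" + usedPct + "%");
        }
    }
}
```

The same MXBeans are what JConsole itself reads, so numbers from this probe and from the GUI should agree.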

Discussion Prompts

To make this a live exercise, you can ask:
“What’s the difference between fixing this at JVM config level (increasing -Xmx) vs code level?”
“How would you convince a non-technical stakeholder that this is not a network problem?”
“If this kept happening daily at peak, what monitoring/alert would you set up?”