A thread dump is what the JVM prints when it’s asked to describe what every thread is doing right now. You ask for one with jstack <pid>, by sending SIGQUIT to the process, or via JMX. The output looks like an alien language the first time you see it: dozens of stack traces, lock identifiers, thread states.
Most engineers, faced with a 200-thread dump, either skim for something familiar or close the file and ask someone else. There's a better procedure: six steps, each taking less than a minute, and by the end you usually know which thread is causing the problem.
Step 1: Take three dumps, ten seconds apart
A single thread dump is a snapshot. You can’t tell from one snapshot whether a thread is genuinely stuck or just briefly busy. Take three dumps, ten seconds apart. Threads that are stuck in the same stack frame across all three are the ones to focus on.
jstack -l 12345 > dump1.txt
sleep 10 && jstack -l 12345 > dump2.txt
sleep 10 && jstack -l 12345 > dump3.txt
The -l flag includes “long” output with lock-ownership info. Always include it.
Step 2: Count thread states
Scan the dump and count how many threads are in each state. The states you care about are RUNNABLE, WAITING, TIMED_WAITING, and BLOCKED.
A healthy steady-state JVM has lots of WAITING threads (idle thread-pool workers, IO threads parked) and a small number of RUNNABLE. A pathological state is many BLOCKED threads (lock contention) or many RUNNABLE threads doing the same thing (CPU saturation, hot spinning).
If 200 of your 250 threads are BLOCKED, you have a lock contention problem. Skip straight to step 5.
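Counting states needs no tooling. A sketch: the heredoc below fabricates a few State lines purely for illustration (a real dump has one such line per thread); in practice you'd point the same pipeline at dump1.txt from step 1.

```shell
# Fabricated fragment standing in for a real dump's per-thread State lines
cat > sample-dump.txt <<'EOF'
   java.lang.Thread.State: RUNNABLE
   java.lang.Thread.State: WAITING (parking)
   java.lang.Thread.State: WAITING (parking)
   java.lang.Thread.State: BLOCKED (on object monitor)
EOF

# Tally threads per state, most common first
grep 'java.lang.Thread.State' sample-dump.txt | awk '{print $2}' | sort | uniq -c | sort -rn
```

Run against a real dump, a healthy output is dominated by WAITING; a big BLOCKED count sends you to step 5.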
Step 3: Look for monitor / lock identifiers
Each BLOCKED thread has a line like:
- waiting to lock <0x00000000fce8c478> (a com.example.Cache)
The hex value is a monitor identifier. Find which thread owns it. That thread will have:
- locked <0x00000000fce8c478> (a com.example.Cache)
The thread holding the lock is the one blocking everyone else. Read its stack — that’s where the problem is.
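Finding the owner is a single grep for the "- locked" line. A sketch with a fabricated two-thread fragment (the monitor id and thread names are made up for illustration); the -B 3 context pulls in the thread header above the match so you can see who the owner is.

```shell
# Fabricated fragment: worker-7 waits on a monitor that worker-3 holds
cat > sample-dump.txt <<'EOF'
"worker-7" #17 waiting for monitor entry
   java.lang.Thread.State: BLOCKED (on object monitor)
    at com.example.Cache.get(Cache.java:42)
    - waiting to lock <0x00000000fce8c478> (a com.example.Cache)

"worker-3" #13 runnable
   java.lang.Thread.State: RUNNABLE
    at com.example.Cache.rebuild(Cache.java:88)
    - locked <0x00000000fce8c478> (a com.example.Cache)
EOF

ID='0x00000000fce8c478'   # copied from the blocked thread's "waiting to lock" line
# "- locked <id>" marks the owner; -B 3 shows the owning thread's header and top frames
grep -B 3 -- "- locked <$ID>" sample-dump.txt
```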
Step 4: Diff the three dumps for stuck threads
Threads that have the exact same top-of-stack across dump 1, dump 2, and dump 3 are stuck. Threads that are RUNNABLE in all three at the same frame are particularly suspicious — that frame is likely doing CPU-bound work or busy-waiting.
A quick way to do this without tooling: grep -h -E '^[[:space:]]+at ' dump?.txt | sort | uniq -c | sort -rn | head -20. The -h flag suppresses the per-file prefixes so uniq -c can aggregate identical frames across all three files. The most-frequent stack frame across the three dumps is your prime suspect.
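If you want strictly the top-of-stack frame per thread rather than every frame, a short awk does it: grab only the first "at ..." line after each thread header. A sketch, with two fabricated threads that both sit in the same read call:

```shell
# Fabricated fragment: two threads stuck in the same top frame
cat > sample-dump.txt <<'EOF'
"http-1" #21 runnable
    at java.net.SocketInputStream.read(SocketInputStream.java:168)
    at com.example.Client.call(Client.java:30)

"http-2" #22 runnable
    at java.net.SocketInputStream.read(SocketInputStream.java:168)
    at com.example.Client.call(Client.java:30)
EOF

# Print only the first "at ..." line after each thread header, then tally
awk '/^"/ {want=1; next} want && /^[[:space:]]+at / {print $2; want=0}' sample-dump.txt \
  | sort | uniq -c | sort -rn
```

The same command run over dump1.txt, dump2.txt, and dump3.txt tells you which top frames persist across all three snapshots.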
Step 5: Recognise the common patterns
Most production thread-dump pathologies fall into a small number of patterns:
- Database connection pool exhaustion. Many threads WAITING on HikariPool.getConnection() or similar. Fix: increase pool size, add backpressure, or find the leak (a connection not being returned).
- Synchronous external HTTP call without timeout. Threads RUNNABLE deep in SocketInputStream.read. Fix: configure timeouts on every HTTP client. Always.
- Single-threaded bottleneck. Many threads BLOCKED on a synchronized method. The owner's stack shows a long-running computation. Fix: remove the synchronization, narrow its scope, or move the computation off the hot path.
- Thread-pool starvation. The active thread count equals the pool maximum and the queue is full. Fix: identify what's slow, increase the pool, or split work across pools so a single slow operation doesn't starve unrelated work.
- Deadlock. Two threads each waiting on a lock the other holds. jstack usually announces this at the end of the dump with "Found one Java-level deadlock". Read it; the diagnosis is in the announcement.
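The deadlock check is worth doing first, since jstack has already done the analysis for you. A sketch with a fabricated fragment of the announcement block (thread names and monitor address are made up for illustration):

```shell
# Fabricated tail of a dump where jstack detected a deadlock
cat > sample-dump.txt <<'EOF'
Found one Java-level deadlock:
=============================
"worker-1":
  waiting to lock monitor 0x00007f0a1c004e28 (object 0x00000000fce8c478, a com.example.Cache),
  which is held by "worker-2"
EOF

# Surface the announcement and the threads involved before reading anything else
grep -A 4 'Found one Java-level deadlock' sample-dump.txt
```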
Step 6: Confirm with a heap snapshot only if step 5 didn’t resolve it
If the thread dump points clearly at, say, "HikariPool.getConnection() with a 60-second wait", you don't need a heap dump. The diagnosis is already conclusive.
Heap dumps are useful for memory pressure investigations, retainer chain analysis, and finding object leaks. They’re a step after a thread dump, not a replacement.
The one-liner you actually want
Once you’ve done this enough times, your muscle memory is “take three dumps, count BLOCKED, find the lock owner, recognise the pattern”. We built ErrorLens partly because that muscle memory takes years to build, and an on-call engineer who gets paged at 2am doesn’t have it yet.
Paste a thread dump into New Analysis and ErrorLens will identify the dominant pattern, point to the blocking thread, and tell you which of the five categories above your problem belongs to. It won’t replace your engineering judgment for the genuinely novel cases, but it will get a junior engineer to the right diagnosis fast on the 90% of incidents that fit a known pattern.