We have a situation where a build request hangs in the CHECKING_BUILD_CONDITION state while the build with the same build version appears to have finished already.
After digging through the heap dump I found that the thread processing that request is stuck in a socket read:
"Thread-5248962" daemon prio=10 tid=0x00007f21800cb000 nid=0x1924 runnable [0x00007f217f848000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
- locked <0x0000000708777a90> (a java.io.BufferedInputStream)
at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:632)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1195)
- locked <0x0000000708777ad0> (a sun.net.www.protocol.http.HttpURLConnection)
at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:379)
at com.caucho.hessian.client.HessianProxy.invoke(HessianProxy.java:166)
at $Proxy100.cacheBuildStatus(Unknown Source)
at com.pmease.quickbuild.DefaultBuildEngine.cacheBuildStatusInGrid(DefaultBuildEngine.java:1275)
at com.pmease.quickbuild.DefaultBuildEngine.process(DefaultBuildEngine.java:286)
at com.pmease.quickbuild.DefaultBuildEngine.access$1(DefaultBuildEngine.java:242)
at com.pmease.quickbuild.DefaultBuildEngine$2.run(DefaultBuildEngine.java:753)
at java.lang.Thread.run(Thread.java:662)
Checking the parameters of the connection QB uses to communicate with the grid, I found that you set connectionTimeout but no readTimeout, meaning the call will wait forever if a node disappears in the middle of a socket session without closing the connection properly. We cannot assume the network is 100% stable, given the distributed nature of our grid.
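To illustrate the difference, here is a minimal sketch using plain java.net.HttpURLConnection, the class at the bottom of the trace above. It is not QB's actual code: the class name ReadTimeoutDemo, the method timesOut and the timeout values are made up for the example, and it uses try-with-resources, so it needs Java 7 or later. The "silent node" is a local server socket that accepts the TCP connection but never sends a response, which is roughly what a vanished grid node looks like from the client side.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.ServerSocket;
import java.net.SocketTimeoutException;
import java.net.URI;

// Hypothetical demo class; not part of QuickBuild or Hessian.
class ReadTimeoutDemo {

    /**
     * Sends a request to a peer that never answers.
     * Returns true if the call was abandoned because the read timeout expired.
     */
    static boolean timesOut(int connectTimeoutMillis, int readTimeoutMillis) throws IOException {
        // Stand-in for a node that disappeared mid-session: the OS completes the
        // TCP handshake through the listen backlog, but nothing ever replies.
        try (ServerSocket silentNode = new ServerSocket(0)) {
            URI uri = URI.create("http://127.0.0.1:" + silentNode.getLocalPort() + "/");
            HttpURLConnection conn = (HttpURLConnection) uri.toURL().openConnection();

            // Limits how long establishing the connection may take.
            conn.setConnectTimeout(connectTimeoutMillis);
            // Limits how long a single read may block. The default of 0 means
            // "wait forever", which matches the stuck socketRead0 frame above.
            conn.setReadTimeout(readTimeoutMillis);

            try {
                conn.getResponseCode(); // blocks while waiting for the HTTP status line
                return false;
            } catch (SocketTimeoutException e) {
                return true; // the read gave up instead of hanging
            } finally {
                conn.disconnect();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("timed out: " + timesOut(5000, 1000));
    }
}
```

With the read timeout left at 0, the same call never returns. If the grid proxy is created through Hessian's HessianProxyFactory, I believe it has a setReadTimeout(long) setter that passes this value down to the connection, but whether QB builds its proxy that way is an assumption on my part.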
What's most annoying is that it's impossible to remove the hanging build from the queue; it simply does not go away. The only way to get rid of it is to restart QB, which we cannot afford to do very often.