
Key: QB-949
Type: Bug
Status: Resolved
Resolution: Fixed
Priority: Critical
Assignee: Unassigned
Reporter: Rene Raasuke
Votes: 0
Watchers: 0
QuickBuild

Build requests hanging in CHECKING_BUILD_CONDITION state because socket readTimeout is not set

Created: 14/Jun/11 02:12 PM   Updated: 18/Jun/11 12:13 PM
Component/s: None
Affects Version/s: 3.1.45
Fix Version/s: 3.1.48

Original Estimate: Unknown Remaining Estimate: Unknown Time Spent: Unknown
File Attachments: None
Image Attachments: blocked_queue.png (52 kB)
Environment: Debian Squeeze x86-64


Description
We have a situation where a build request hangs in the CHECKING_BUILD_CONDITION state even though the build with the same build version appears to have finished already.

After digging through the heap dump, I found that the thread processing that request is stuck in a socket read:
"Thread-5248962" daemon prio=10 tid=0x00007f21800cb000 nid=0x1924 runnable [0x00007f217f848000]
   java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
- locked <0x0000000708777a90> (a java.io.BufferedInputStream)
at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:632)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1195)
- locked <0x0000000708777ad0> (a sun.net.www.protocol.http.HttpURLConnection)
at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:379)
at com.caucho.hessian.client.HessianProxy.invoke(HessianProxy.java:166)
at $Proxy100.cacheBuildStatus(Unknown Source)
at com.pmease.quickbuild.DefaultBuildEngine.cacheBuildStatusInGrid(DefaultBuildEngine.java:1275)
at com.pmease.quickbuild.DefaultBuildEngine.process(DefaultBuildEngine.java:286)
at com.pmease.quickbuild.DefaultBuildEngine.access$1(DefaultBuildEngine.java:242)
at com.pmease.quickbuild.DefaultBuildEngine$2.run(DefaultBuildEngine.java:753)
at java.lang.Thread.run(Thread.java:662)

Checking the parameters of the connection QB uses to communicate with the grid, I found that you set connectionTimeout but not readTimeout. As a result, the call waits forever if a node disappears in the middle of a socket session without closing the connection properly. We cannot assume the network is 100% stable, given the distributed nature of our grid.
What's most annoying is that it is impossible to remove the hanging build from the queue; it simply does not go away. The only way to get rid of it is to restart QB, which we cannot afford to do very often.
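
To illustrate the missing setting, here is a minimal sketch using plain java.net.HttpURLConnection, the class that appears in the stack trace above. It is not QuickBuild's or Hessian's actual code: the class name ReadTimeoutSketch, the method names, and the timeout values are all hypothetical. The local ServerSocket stands in for a grid node that accepts the TCP connection and then goes silent.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.Proxy;
import java.net.ServerSocket;
import java.net.SocketTimeoutException;
import java.net.URI;
import java.net.URL;

// Hypothetical illustration; names and timeout values are assumptions.
public class ReadTimeoutSketch {

    // Requests the URL with both timeouts set and returns the HTTP status code.
    // Without setReadTimeout, getResponseCode() can block indefinitely in
    // SocketInputStream.socketRead0 if the peer never answers.
    static int fetchStatus(URL url, int connectTimeoutMs, int readTimeoutMs) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection(Proxy.NO_PROXY);
        conn.setConnectTimeout(connectTimeoutMs); // limit on establishing the connection
        conn.setReadTimeout(readTimeoutMs);       // limit on waiting for response data
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    // Simulates a node that disappears mid-session: the listening socket lets the
    // OS complete the TCP handshake, but nothing ever accepts or replies.
    // Returns true if the read timeout fired.
    static boolean timesOutAgainstSilentPeer(int readTimeoutMs) throws IOException {
        try (ServerSocket silentPeer = new ServerSocket(0)) {
            URL url = URI.create("http://127.0.0.1:" + silentPeer.getLocalPort() + "/").toURL();
            try {
                fetchStatus(url, 1000, readTimeoutMs);
                return false; // unexpected: the silent peer produced a response
            } catch (SocketTimeoutException expected) {
                return true;  // the read was abandoned instead of hanging forever
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("timed out: " + timesOutAgainstSilentPeer(500));
    }
}
```

With a read timeout in place, the calling thread gets a SocketTimeoutException it can handle (for example by failing or retrying the request) rather than staying blocked until the server is restarted.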
