
[QB-3148] QuickBuild Agents drop running build on timeout
Created: 16/Mar/18  Updated: 17/Mar/18

Status: Closed
Project: QuickBuild
Component/s: None
Affects Version/s: 6.1.36
Fix Version/s: None

Type: Improvement Priority: Major
Reporter: John Landers Assigned To: Robin Shen
Resolution: Fixed Votes: 0
Remaining Estimate: Unknown Time Spent: Unknown
Original Estimate: Unknown
Environment: Windows Build Server, build agents running on Azure


 Description   
The build server runs inside the company network and the build agents run in the cloud on Azure. Occasionally our internet switch from the company to Azure has a blip and drops the VPN connection between the build server and an agent for a minute. Our builds take about 45 minutes, so a drop in the middle of a build means lost time. This happens about once a week. We are looking into the network issue, but it would be nice to be able to increase the timeout.

A running build fails with the exception shown at the end of this issue.

It seems the connect/read timeouts for the remote connection are hard-coded. It would be nice if they were configurable.
v8.0.0 appears to have the same values; we plan to upgrade to 8.0.0 soon.

com.pmease.quickbuild.RemotingProxyFactory.RemotingProxyFactory(String)
public RemotingProxyFactory(String token) {
    this.token = token;
    setOverloadEnabled(true);
    setConnectTimeout(Bootstrap.NET_CONNECT_TIMEOUT*1000L);
    setReadTimeout(Bootstrap.NET_READ_TIMEOUT*1000L);
}

com.pmease.quickbuild.bootstrap.Bootstrap
    public static final int NET_CONNECT_TIMEOUT = 120; // in seconds
    
    public static final int NET_READ_TIMEOUT = 300; // in seconds
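
To illustrate the request, here is a minimal sketch of how the two values might be read from JVM system properties, falling back to the current hard-coded defaults. The class name NetTimeouts and both property names are hypothetical and chosen only for illustration; they are not part of QuickBuild.

```java
/** Hypothetical helper for illustration only; not part of QuickBuild. */
final class NetTimeouts {
    // Defaults mirror the hard-coded Bootstrap values quoted above (in seconds).
    static final int DEFAULT_CONNECT_TIMEOUT = 120;
    static final int DEFAULT_READ_TIMEOUT = 300;

    // Assumed property names; any real setting could be named differently.
    static final String CONNECT_PROP = "quickbuild.net.connectTimeout";
    static final String READ_PROP = "quickbuild.net.readTimeout";

    private NetTimeouts() {}

    /** Connect timeout in milliseconds, overridable via CONNECT_PROP. */
    static long connectTimeoutMillis() {
        return seconds(CONNECT_PROP, DEFAULT_CONNECT_TIMEOUT) * 1000L;
    }

    /** Read timeout in milliseconds, overridable via READ_PROP. */
    static long readTimeoutMillis() {
        return seconds(READ_PROP, DEFAULT_READ_TIMEOUT) * 1000L;
    }

    private static int seconds(String property, int defaultSeconds) {
        // Integer.getInteger returns the default when the property is
        // unset or does not parse as an integer.
        int value = Integer.getInteger(property, defaultSeconds);
        // Reject zero and negative values rather than disabling the timeout.
        return value > 0 ? value : defaultSeconds;
    }
}
```

With something like this, the constructor could call setConnectTimeout(NetTimeouts.connectTimeoutMillis()) and setReadTimeout(NetTimeouts.readTimeoutMillis()), and an agent could be started with, for example, -Dquickbuild.net.readTimeout=900 to ride out a short VPN drop.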


Exception that fails build:

09:46:44,417 ERROR - Build is failed.
    java.lang.RuntimeException: Error executing step execution job.
        at com.pmease.quickbuild.stepsupport.StepExecutionTask.reduce(StepExecutionTask.java:29)
        at com.pmease.quickbuild.stepsupport.StepExecutionTask.reduce(StepExecutionTask.java:19)
        at com.pmease.quickbuild.grid.GridTaskFuture.get(GridTaskFuture.java:116)
        at com.pmease.quickbuild.DefaultBuildEngine.run(DefaultBuildEngine.java:532)
        at com.pmease.quickbuild.DefaultBuildEngine.process(DefaultBuildEngine.java:400)
        at com.pmease.quickbuild.DefaultBuildEngine.access$000(DefaultBuildEngine.java:139)
        at com.pmease.quickbuild.DefaultBuildEngine$2.run(DefaultBuildEngine.java:1142)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
    Caused by: com.pmease.quickbuild.QuickbuildException: Error testing job.
        at com.pmease.quickbuild.grid.GridTaskFuture.testJobs(GridTaskFuture.java:63)
        at com.pmease.quickbuild.grid.GridTaskFuture.get(GridTaskFuture.java:98)
        ... 7 more
    Caused by: com.caucho.hessian.client.HessianRuntimeException: com.caucho.hessian.client.HessianRuntimeException: Error connecting 'http://172.18.12.8:8811/service/node'
        at com.caucho.hessian.client.HessianProxy.sendRequest(HessianProxy.java:285)
        at com.caucho.hessian.client.HessianProxy.invoke(HessianProxy.java:171)
        at com.sun.proxy.$Proxy77.testGridJob(Unknown Source)
        at com.pmease.quickbuild.grid.GridTaskFuture.testJobs(GridTaskFuture.java:51)
        ... 8 more
    Caused by: com.caucho.hessian.client.HessianRuntimeException: Error connecting 'http://172.18.12.8:8811/service/node'
        at com.caucho.hessian.client.HessianURLConnection.getOutputStream(HessianURLConnection.java:113)
        at com.caucho.hessian.client.HessianProxy.sendRequest(HessianProxy.java:283)
        ... 11 more
    Caused by: java.net.ConnectException: Connection timed out: connect
        at java.net.DualStackPlainSocketImpl.waitForConnect(Native Method)
        at java.net.DualStackPlainSocketImpl.socketConnect(DualStackPlainSocketImpl.java:85)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:172)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:589)
        at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
        at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
        at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
        at sun.net.www.http.HttpClient.<init>(HttpClient.java:211)
        at sun.net.www.http.HttpClient.New(HttpClient.java:308)
        at sun.net.www.http.HttpClient.New(HttpClient.java:326)
        at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1169)
        at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1105)
        at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:999)
        at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:933)
        at sun.net.www.protocol.http.HttpURLConnection.getOutputStream0(HttpURLConnection.java:1283)
        at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1258)
        at com.caucho.hessian.client.HessianURLConnection.getOutputStream(HessianURLConnection.java:101)

 Comments   
Comment by Robin Shen [ 17/Mar/18 12:00 AM ]
Please upgrade to QB8 and set the step disconnect tolerance value to work around this issue.
Generated at Fri May 03 09:58:55 UTC 2024 using JIRA 189.