[QB-3148] QuickBuild Agents drop running build on timeout
Status: | Closed |
Project: | QuickBuild |
Component/s: | None |
Affects Version/s: | 6.1.36 |
Fix Version/s: | None |
Type: | Improvement | Priority: | Major |
Reporter: | John Landers | Assigned To: | Robin Shen |
Resolution: | Fixed | Votes: | 0 |
Remaining Estimate: | Unknown | Time Spent: | Unknown |
Original Estimate: | Unknown | ||
Environment: | Windows Build Server, build agents running on Azure |
Description |
The build server runs inside the company network and the build agents run in the cloud on Azure. Occasionally our internet switch from the company to Azure has a blip and drops the VPN connection between the build server and the agents for about a minute. Our builds are about 45 minutes long, so a drop in the middle means lost time. This happens once a week or so. We are looking into the network issue, but it would be nice to be able to increase the timeout.
A running build fails with the exception shown at the end of this issue. It seems the connect/read timeouts for the remote connection are hard coded. It would be nice if these were configurable. It looks like v8.0.0 has the same values; we are looking to upgrade to 8.0.0 soon.

com.pmease.quickbuild.RemotingProxyFactory.RemotingProxyFactory(String)

    public RemotingProxyFactory(String token) {
        this.token = token;
        setOverloadEnabled(true);
        setConnectTimeout(Bootstrap.NET_CONNECT_TIMEOUT*1000L);
        setReadTimeout(Bootstrap.NET_READ_TIMEOUT*1000L);
    }

com.pmease.quickbuild.bootstrap.Bootstrap

    public static final int NET_CONNECT_TIMEOUT = 120; // in seconds
    public static final int NET_READ_TIMEOUT = 300; // in seconds

Exception that fails build:

09:46:44,417 ERROR - Build is failed.
java.lang.RuntimeException: Error executing step execution job.
    at com.pmease.quickbuild.stepsupport.StepExecutionTask.reduce(StepExecutionTask.java:29)
    at com.pmease.quickbuild.stepsupport.StepExecutionTask.reduce(StepExecutionTask.java:19)
    at com.pmease.quickbuild.grid.GridTaskFuture.get(GridTaskFuture.java:116)
    at com.pmease.quickbuild.DefaultBuildEngine.run(DefaultBuildEngine.java:532)
    at com.pmease.quickbuild.DefaultBuildEngine.process(DefaultBuildEngine.java:400)
    at com.pmease.quickbuild.DefaultBuildEngine.access$000(DefaultBuildEngine.java:139)
    at com.pmease.quickbuild.DefaultBuildEngine$2.run(DefaultBuildEngine.java:1142)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: com.pmease.quickbuild.QuickbuildException: Error testing job.
    at com.pmease.quickbuild.grid.GridTaskFuture.testJobs(GridTaskFuture.java:63)
    at com.pmease.quickbuild.grid.GridTaskFuture.get(GridTaskFuture.java:98)
    ... 7 more
Caused by: com.caucho.hessian.client.HessianRuntimeException: com.caucho.hessian.client.HessianRuntimeException: Error connecting 'http://172.18.12.8:8811/service/node'
    at com.caucho.hessian.client.HessianProxy.sendRequest(HessianProxy.java:285)
    at com.caucho.hessian.client.HessianProxy.invoke(HessianProxy.java:171)
    at com.sun.proxy.$Proxy77.testGridJob(Unknown Source)
    at com.pmease.quickbuild.grid.GridTaskFuture.testJobs(GridTaskFuture.java:51)
    ... 8 more
Caused by: com.caucho.hessian.client.HessianRuntimeException: Error connecting 'http://172.18.12.8:8811/service/node'
    at com.caucho.hessian.client.HessianURLConnection.getOutputStream(HessianURLConnection.java:113)
    at com.caucho.hessian.client.HessianProxy.sendRequest(HessianProxy.java:283)
    ... 11 more
Caused by: java.net.ConnectException: Connection timed out: connect
    at java.net.DualStackPlainSocketImpl.waitForConnect(Native Method)
    at java.net.DualStackPlainSocketImpl.socketConnect(DualStackPlainSocketImpl.java:85)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:172)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
    at sun.net.www.http.HttpClient.<init>(HttpClient.java:211)
    at sun.net.www.http.HttpClient.New(HttpClient.java:308)
    at sun.net.www.http.HttpClient.New(HttpClient.java:326)
    at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1169)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1105)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:999)
    at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:933)
    at sun.net.www.protocol.http.HttpURLConnection.getOutputStream0(HttpURLConnection.java:1283)
    at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1258)
    at com.caucho.hessian.client.HessianURLConnection.getOutputStream(HessianURLConnection.java:101) |
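To illustrate the request for configurable timeouts, here is a minimal sketch of how the two values could be read from JVM system properties, falling back to the 120 s and 300 s defaults quoted above. This is not QuickBuild code: the class name `TimeoutConfig` and the property names `example.net.connectTimeout` and `example.net.readTimeout` are hypothetical and chosen only for this example.

```java
// Illustrative sketch only; not part of QuickBuild.
final class TimeoutConfig {
    // Defaults mirror the Bootstrap constants quoted in the description (in seconds).
    static final int DEFAULT_CONNECT_TIMEOUT = 120;
    static final int DEFAULT_READ_TIMEOUT = 300;

    // Hypothetical property names, assumed for this example.
    static final String CONNECT_PROP = "example.net.connectTimeout";
    static final String READ_PROP = "example.net.readTimeout";

    private TimeoutConfig() {}

    /** Connect timeout in milliseconds, taken from the system property if set. */
    public static long connectTimeoutMillis() {
        return seconds(CONNECT_PROP, DEFAULT_CONNECT_TIMEOUT) * 1000L;
    }

    /** Read timeout in milliseconds, taken from the system property if set. */
    public static long readTimeoutMillis() {
        return seconds(READ_PROP, DEFAULT_READ_TIMEOUT) * 1000L;
    }

    private static int seconds(String property, int fallback) {
        // Integer.getInteger returns the fallback when the property is
        // unset or is not a valid integer.
        int value = Integer.getInteger(property, fallback);
        // Reject zero and negative values rather than disabling the timeout.
        return value > 0 ? value : fallback;
    }
}
```

With something like this in place, a longer connect timeout could be passed at startup, for example `-Dexample.net.connectTimeout=600`, without rebuilding.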
Comments |
Comment by Robin Shen [ 17/Mar/18 12:00 AM ] |
Please upgrade to QB8 and set the step disconnect tolerance value to work around this issue.
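For readers unfamiliar with the idea of a disconnect tolerance, the general pattern is to keep probing a lost connection for a bounded window instead of failing on the first error. The sketch below shows that generic pattern only; it is not how QB8 implements the setting, and `DisconnectTolerance`, `waitForReconnect`, and its parameters are hypothetical names for this example.

```java
import java.util.function.BooleanSupplier;

// Generic illustration of a tolerance window; not QuickBuild's implementation.
final class DisconnectTolerance {
    private DisconnectTolerance() {}

    /**
     * Polls the probe until it succeeds or the tolerance window elapses.
     * Returns true if the probe succeeded within the window, false otherwise.
     */
    public static boolean waitForReconnect(BooleanSupplier probe,
                                           long toleranceMillis,
                                           long pollMillis) {
        long deadline = System.nanoTime() + toleranceMillis * 1_000_000L;
        while (true) {
            if (probe.getAsBoolean()) {
                return true; // connection is back, the build can continue
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // tolerance exceeded, treat as a real failure
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and stop waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

With a tolerance longer than the roughly one-minute VPN drops described above, a build would survive the outage instead of failing partway through.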