[QB-2446] Experiences Socket Read Timeout
Status: Closed
Project: QuickBuild
Component/s: None
Affects Version/s: 6.0.10
Fix Version/s: None
Type: Bug
Priority: Major
Reporter: Phong Trinh
Assigned To: Robin Shen
Resolution: Won't Fix
Votes: 0
Remaining Estimate: Unknown
Time Spent: Unknown
Original Estimate: Unknown
Environment: QuickBuild on Unix platform
Description
Our builds experience socket read timeouts several times per week. We lose nightly builds because QB then cancels all of the queued builds that are assigned to the node. I see a configuration setting for the connection timeout, but not for the read timeout. Can you provide a retry capability for when this happens?
07:53:46,032 ERROR - Build is failed.
java.lang.RuntimeException: Error executing step execution job.
    at com.pmease.quickbuild.stepsupport.StepExecutionTask.reduce(StepExecutionTask.java:29)
    at com.pmease.quickbuild.stepsupport.StepExecutionTask.reduce(StepExecutionTask.java:19)
    at com.pmease.quickbuild.grid.GridTaskFuture.get(GridTaskFuture.java:102)
    at com.pmease.quickbuild.DefaultBuildEngine.run(DefaultBuildEngine.java:526)
    at com.pmease.quickbuild.DefaultBuildEngine.process(DefaultBuildEngine.java:394)
    at com.pmease.quickbuild.DefaultBuildEngine.access$000(DefaultBuildEngine.java:139)
    at com.pmease.quickbuild.DefaultBuildEngine$2.run(DefaultBuildEngine.java:1102)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:701)
Caused by: com.pmease.quickbuild.QuickbuildException: Error testing job.
    at com.pmease.quickbuild.grid.GridTaskFuture.testJobs(GridTaskFuture.java:49)
    at com.pmease.quickbuild.grid.GridTaskFuture.get(GridTaskFuture.java:84)
    ... 7 more
Caused by: com.caucho.hessian.client.HessianConnectionException: 500: java.net.SocketTimeoutException: Read timed out
    at com.caucho.hessian.client.HessianURLConnection.sendRequest(HessianURLConnection.java:165)
    at com.caucho.hessian.client.HessianProxy.sendRequest(HessianProxy.java:300)
    at com.caucho.hessian.client.HessianProxy.invoke(HessianProxy.java:171)
    at com.sun.proxy.$Proxy70.testGridJob(Unknown Source)
    at com.pmease.quickbuild.grid.GridTaskFuture.testJobs(GridTaskFuture.java:41)
    ... 8 more
Caused by: java.net.SocketTimeoutException: Read timed out
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:534)
    at sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1547)
    at sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1543)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1541)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1192)
    at com.caucho.hessian.client.HessianURLConnection.sendRequest(HessianURLConnection.java:145)
    ... 12 more
Caused by: java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:146)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
    at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:642)
    at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:590)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1248)
    at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:397)
    at com.caucho.hessian.client.HessianURLConnection.sendRequest(HessianURLConnection.java:132)
    ... 12 more
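The trace shows that Hessian's call goes through `sun.net.www.protocol.http.HttpURLConnection`. The JDK's `java.net.URLConnection` has two separate limits for this kind of connection:

- `setConnectTimeout` bounds the connection attempt.
- `setReadTimeout` bounds each blocking read.

This failure is the read kind. It happens in `SocketInputStream.socketRead0` while `HttpClient.parseHTTPHeader` waits for the response headers. The sketch below only shows how the two limits are set on a plain JDK connection. The helper class `TimeoutSettings`, the URL, and the millisecond values are hypothetical, and the sketch does not reflect how QuickBuild configures its own Hessian client.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLConnection;

// Hypothetical helper, for illustration only.
final class TimeoutSettings {
    private TimeoutSettings() {}

    /**
     * Creates, but does not connect, an HTTP connection with separate connect
     * and read timeouts. openConnection() performs no network I/O; the limits
     * take effect once connect() or getInputStream() is called.
     */
    static HttpURLConnection open(String url, int connectTimeoutMillis, int readTimeoutMillis) {
        try {
            URLConnection conn = URI.create(url).toURL().openConnection();
            // Limits the TCP connection attempt; on expiry the JDK throws
            // java.net.SocketTimeoutException.
            conn.setConnectTimeout(connectTimeoutMillis);
            // Limits each blocking read after connecting, such as waiting for the
            // response headers; on expiry the JDK throws SocketTimeoutException,
            // as in the trace above. A value of 0 means wait indefinitely.
            conn.setReadTimeout(readTimeoutMillis);
            return (HttpURLConnection) conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Raising the read timeout only helps if the agent is slow rather than unreachable. Under sustained overload, retrying or reducing the agent's load is the more direct fix.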
Comments
Comment by Phong Trinh [ 20/May/15 06:53 PM ]
Corrected my typo:
Our builds experience socket read timeouts, and we get several incidents per week, so we lose nightly builds, since QB cancels all of the builds in the queue that are assigned to the node. I see a configuration for connection timeout, but not for read timeout. Can you provide QB retry capability when this issue happens?
Comment by Robin Shen [ 20/May/15 09:56 PM ]
Most probably this is caused by agent overload or an unstable network connection. You can have QB retry the build in this case by scripting the post-build field in the advanced settings of the configuration. The script checks the build error message and resubmits the build if it contains a pattern such as "timed out". A pseudo script could look like this:
groovy:
import com.pmease.quickbuild.*

if (build.errorMessage.contains("timed out")) {
    def newRequest = new BuildRequest();
    newRequest.configurationId = request.configurationId;
    // continue to populate the new request with other properties as necessary
    BuildEngine.instance.requestBuild(newRequest);
}
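Matching on the message text works here because the `HessianConnectionException` in the trace above carries the string "500: java.net.SocketTimeoutException: Read timed out". The script itself only works with `build.errorMessage`, which is a string. Where code has the `Throwable` itself, walking the cause chain is a sturdier check. Below is a minimal sketch, assuming direct access to the exception; `TimeoutDetector` is a hypothetical name.

```java
import java.net.SocketTimeoutException;

// Hypothetical helper, for illustration only.
final class TimeoutDetector {
    private TimeoutDetector() {}

    /** True if t or any of its causes is a SocketTimeoutException or mentions "timed out". */
    static boolean isTimeout(Throwable t) {
        // Bound the walk so a pathological (cyclic) cause chain cannot loop forever.
        for (int depth = 0; t != null && depth < 32; depth++, t = t.getCause()) {
            if (t instanceof SocketTimeoutException) {
                return true;
            }
            // Wrappers such as HessianConnectionException may carry only the text,
            // e.g. "500: java.net.SocketTimeoutException: Read timed out".
            String msg = t.getMessage();
            if (msg != null && msg.contains("timed out")) {
                return true;
            }
        }
        return false;
    }
}
```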
Comment by Phong Trinh [ 08/Jun/15 08:58 PM ]
I tried this, but it could not invoke the method contains():
groovy:
import com.pmease.quickbuild.*

if (build.errorMessage.contains("timed out")) {
    def newRequest = new BuildRequest();
    newRequest.configurationId = request.configurationId;
    // continue to populate the new request with other properties as necessary
    BuildEngine.instance.requestBuild(newRequest);
}
Comment by Robin Shen [ 08/Jun/15 11:31 PM ]
What error is it giving?
Comment by Phong Trinh [ 09/Jun/15 01:57 PM ]
Hi Robin,
The error is as follows:
22:29:58,584 ERROR - Step 'master>script' is failed: java.lang.NullPointerException: Cannot invoke method contains() on null object
    at org.codehaus.groovy.runtime.NullObject.invokeMethod(NullObject.java:77)
    at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:45)
    at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:42)
    at org.codehaus.groovy.runtime.callsite.NullCallSite.call(NullCallSite.java:32)
    at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:42)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:108)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:116)
    at script1433802598566941514341.run(script1433802598566941514341.groovy:10)
    at com.pmease.quickbuild.plugin.basis.BasisPlugin$28.evaluate(BasisPlugin.java:348)
    at com.pmease.quickbuild.DefaultScriptEngine.evaluate(DefaultScriptEngine.java:81)
    at com.pmease.quickbuild.plugin.basis.ScriptStep.run(ScriptStep.java:47)
    at com.pmease.quickbuild.plugin.basis.ScriptStep$$EnhancerByCGLIB$$456f74c1.CGLIB$run$0(<generated>)
    at com.pmease.quickbuild.plugin.basis.ScriptStep$$EnhancerByCGLIB$$456f74c1$$FastClassByCGLIB$$70430117.invoke(<generated>)
    at net.sf.cglib.proxy.MethodProxy.invokeSuper(MethodProxy.java:215)
    at com.pmease.quickbuild.DefaultScriptEngine$Interpolator.intercept(DefaultScriptEngine.java:273)
    at com.pmease.quickbuild.plugin.basis.ScriptStep$$EnhancerByCGLIB$$456f74c1.run(<generated>)
    at com.pmease.quickbuild.stepsupport.Step.execute(Step.java:539)
    at com.pmease.quickbuild.stepsupport.StepExecutionJob.executeStepAwareJob(StepExecutionJob.java:30)
    at com.pmease.quickbuild.stepsupport.StepAwareJob.executeBuildAwareJob(StepAwareJob.java:45)
    at com.pmease.quickbuild.BuildAwareJob.execute(BuildAwareJob.java:60)
    at com.pmease.quickbuild.grid.GridJob.run(GridJob.java:106)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Comment by Robin Shen [ 09/Jun/15 11:31 PM ]
It looks like the error message is contained in a child step. Try improving the script as below:
groovy:
import com.pmease.quickbuild.*

for (step in build.steps) {
    // please check the actual error message and replace "timed out" with the text that really indicates the timeout
    if (step.errorMessage != null && step.errorMessage.contains("timed out")) {
        def newRequest = new BuildRequest();
        newRequest.configurationId = request.configurationId;
        // continue to populate the new request with other properties as necessary
        BuildEngine.instance.requestBuild(newRequest);
        // request only one new build, even if several steps timed out
        break;
    }
}
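The null check in the script above avoids the `NullPointerException` reported earlier for steps that have no error message. The script does not cap resubmissions, though. A post-build script that resubmits on every timeout could loop indefinitely if the agent keeps timing out.

The decision logic can be sketched in Java as follows. `ResubmitDecision` and the `attemptsSoFar` counter are hypothetical. Carrying such a counter from one build to the next (for example, as a build variable) is an assumption, not something this thread shows QuickBuild doing.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical helper, for illustration only.
final class ResubmitDecision {
    private ResubmitDecision() {}

    /**
     * Returns true when a new build should be requested: at least one step error
     * message contains the pattern, and the retry budget is not yet spent.
     * attemptsSoFar counts the builds already run, including the one that just failed.
     * Null messages (steps without an error) are skipped rather than dereferenced.
     * Callers should request at most one build per true result.
     */
    static boolean shouldResubmit(List<String> stepErrorMessages, String pattern,
                                  int attemptsSoFar, int maxAttempts) {
        Objects.requireNonNull(pattern, "pattern");
        // Cap retries so a persistently failing agent cannot cause endless resubmission.
        if (attemptsSoFar >= maxAttempts) {
            return false;
        }
        for (String message : stepErrorMessages) {
            if (message != null && message.contains(pattern)) {
                return true;
            }
        }
        return false;
    }
}
```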
Comment by Phong Trinh [ 11/Jun/15 03:55 AM ]
Thank you for the script. It seems to me that the script retries the whole configuration, so it reruns all of the steps in it. I am looking for a way to retry just the step in the configuration that failed.
Comment by Robin Shen [ 11/Jun/15 10:28 PM ]
QB itself does not track which step to restart from. However, you can script the step conditions to check the status of each step in the previously failed build and decide which steps need to rerun.
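Here is a sketch of the decision Robin describes. It assumes the step condition can see a map from step name to the status that step had in the previously failed build. `RerunPlanner` and `StepStatus` are hypothetical stand-ins, not QuickBuild's API.

Only steps that already succeeded are skipped. This further assumes that a skipped step's results are still available to later steps in the rerun (for example, the workspace was not cleaned). If they are not, that step has to run again as well.

```java
import java.util.Map;

// Hypothetical helper, for illustration only; QuickBuild's own step model may differ.
final class RerunPlanner {
    private RerunPlanner() {}

    /** Hypothetical status values recorded for each step of the previous build. */
    enum StepStatus { SUCCESSFUL, FAILED, SKIPPED, CANCELLED }

    /**
     * Decides whether a step should execute in a rerun. Steps that succeeded in the
     * previous build are skipped; failed, skipped, cancelled, or unrecorded steps run.
     */
    static boolean shouldRun(String stepName, Map<String, StepStatus> previousStatuses) {
        // No previous build to resume from: run every step.
        if (previousStatuses == null) {
            return true;
        }
        return previousStatuses.get(stepName) != StepStatus.SUCCESSFUL;
    }
}
```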
Comment by AlSt [ 23/May/16 08:12 AM ]
Hi Robin,
Is it possible to retry the Hessian connection instead of rerunning the whole build? For example, when the timeout occurs in a publish step of a build that takes 2 hours, rerunning the whole build is not the best option. In general, more retries would be nice, because they would get rid of a lot of failed builds. Thanks, Alex
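Alex is asking to retry only the remote call that timed out. Below is a generic sketch of that pattern with bounded attempts and exponential backoff. The names `RetryingCall` and `IoCall` are hypothetical, and this is not a QuickBuild or Hessian API.

The sketch retries only on `SocketTimeoutException`. It should only wrap calls that are safe to repeat, because a read timeout does not tell the caller whether the server already acted on the request.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.SocketTimeoutException;

// Hypothetical helper, for illustration only.
final class RetryingCall {
    private RetryingCall() {}

    /** A remote call that may fail with an IOException. */
    @FunctionalInterface
    interface IoCall<T> {
        T call() throws IOException;
    }

    /**
     * Invokes call up to maxAttempts times, retrying only on SocketTimeoutException
     * and doubling the wait between attempts. Other IOExceptions are rethrown
     * immediately; the last timeout is rethrown once attempts run out.
     */
    static <T> T withRetry(IoCall<T> call, int maxAttempts, long initialBackoffMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long backoff = initialBackoffMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return call.call();
            } catch (SocketTimeoutException e) {
                if (attempt >= maxAttempts) {
                    throw new UncheckedIOException(e);
                }
                sleep(backoff);
                backoff *= 2;
            } catch (IOException e) {
                // Not a timeout (e.g. connection refused): retrying is unlikely to help.
                throw new UncheckedIOException(e);
            }
        }
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting to retry", e);
        }
    }
}
```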
Comment by Robin Shen [ 24/May/16 12:28 AM ]
To retry a certain step, try this approach:
http://forum.pmease.com/viewtopic.php?f=5&t=4022
Comment by Phong Trinh [ 26/May/16 01:41 PM ]
I tried this approach, and it works. However, there may be an issue. The first execution failed and the second (the retry) succeeded, but QB still reported the sibling step as failed, so the next step in the configuration was skipped.
Comment by Robin Shen [ 27/May/16 12:27 AM ]
In this case you can enclose the repeat step inside a sequential step and set the success condition of the container step to "if any of the child steps succeeds".
Comment by Phong Trinh [ 27/May/16 02:47 AM ]
Thank you for the prompt response. Could you give me an example of how to do that?
Comment by Robin Shen [ 27/May/16 11:39 PM ]
For instance, suppose you want to retry the step "compile":
1. Add a container step of type "sequential".
2. Put the compile step into it as its only child.
3. Set the success condition of the container step to "success if any child step succeeds".

That way, as long as one retry succeeds, the container itself is successful. Of course, you also need to add the container step to your current step workflow.
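The effect of the container's success condition can be modelled in a few lines. In this hypothetical sketch (`AnyChildSucceeds` is not a QuickBuild class), the same child is attempted until it succeeds or attempts run out. The container is judged by "success if any child succeeds", so a failed first attempt followed by a successful retry leaves the container successful.

The `allChildrenSucceeded` rule is included only for contrast. It shows how an earlier failed attempt would otherwise fail the whole container, which matches the symptom Phong described.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

// Hypothetical model, for illustration only.
final class AnyChildSucceeds {
    private AnyChildSucceeds() {}

    /** Runs the child up to maxAttempts times, stopping at the first success; returns each outcome. */
    static List<Boolean> runAttempts(BooleanSupplier child, int maxAttempts) {
        List<Boolean> outcomes = new ArrayList<>();
        for (int i = 0; i < maxAttempts; i++) {
            boolean ok = child.getAsBoolean();
            outcomes.add(ok);
            if (ok) {
                break; // no need to retry once an attempt has succeeded
            }
        }
        return outcomes;
    }

    /** "Success if any child step succeeds". */
    static boolean containerSucceeded(List<Boolean> childOutcomes) {
        return childOutcomes.contains(Boolean.TRUE);
    }

    /** Stricter rule for contrast: any failed attempt fails the container. */
    static boolean allChildrenSucceeded(List<Boolean> childOutcomes) {
        return !childOutcomes.contains(Boolean.FALSE);
    }
}
```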
Comment by Phong Trinh [ 29/May/16 06:55 PM ]
Thanks Robin. It works for me.