
[QB-2446] Experiences Socket Read Timeout
Created: 20/May/15  Updated: 29/May/16

Status: Closed
Project: QuickBuild
Component/s: None
Affects Version/s: 6.0.10
Fix Version/s: None

Type: Bug
Priority: Major
Reporter: Phong Trinh
Assigned To: Robin Shen
Resolution: Won't Fix
Votes: 0
Remaining Estimate: Unknown
Time Spent: Unknown
Original Estimate: Unknown
Environment: QuickBuild on Unix platform


 Description   
 Our builds experience socket read timeouts several times per week, and each time we lose the nightly builds, because QB cancels all of the queued builds that are assigned to the node. I see a configuration setting for the connection timeout, but not for the read timeout. Can you provide a retry capability for when this issue happens?

07:53:46,032 ERROR - Build is failed.
java.lang.RuntimeException: Error executing step execution job.
    at com.pmease.quickbuild.stepsupport.StepExecutionTask.reduce(StepExecutionTask.java:29)
    at com.pmease.quickbuild.stepsupport.StepExecutionTask.reduce(StepExecutionTask.java:19)
    at com.pmease.quickbuild.grid.GridTaskFuture.get(GridTaskFuture.java:102)
    at com.pmease.quickbuild.DefaultBuildEngine.run(DefaultBuildEngine.java:526)
    at com.pmease.quickbuild.DefaultBuildEngine.process(DefaultBuildEngine.java:394)
    at com.pmease.quickbuild.DefaultBuildEngine.access$000(DefaultBuildEngine.java:139)
    at com.pmease.quickbuild.DefaultBuildEngine$2.run(DefaultBuildEngine.java:1102)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:701)
Caused by: com.pmease.quickbuild.QuickbuildException: Error testing job.
    at com.pmease.quickbuild.grid.GridTaskFuture.testJobs(GridTaskFuture.java:49)
    at com.pmease.quickbuild.grid.GridTaskFuture.get(GridTaskFuture.java:84)
    ... 7 more
Caused by: com.caucho.hessian.client.HessianConnectionException: 500: java.net.SocketTimeoutException: Read timed out
    at com.caucho.hessian.client.HessianURLConnection.sendRequest(HessianURLConnection.java:165)
    at com.caucho.hessian.client.HessianProxy.sendRequest(HessianProxy.java:300)
    at com.caucho.hessian.client.HessianProxy.invoke(HessianProxy.java:171)
    at com.sun.proxy.$Proxy70.testGridJob(Unknown Source)
    at com.pmease.quickbuild.grid.GridTaskFuture.testJobs(GridTaskFuture.java:41)
    ... 8 more
Caused by: java.net.SocketTimeoutException: Read timed out
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:534)
    at sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1547)
    at sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1543)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1541)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1192)
    at com.caucho.hessian.client.HessianURLConnection.sendRequest(HessianURLConnection.java:145)
    ... 12 more
Caused by: java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:146)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
    at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:642)
    at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:590)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1248)
    at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:397)
    at com.caucho.hessian.client.HessianURLConnection.sendRequest(HessianURLConnection.java:132)
    ... 12 more
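The distinction the reporter draws maps onto two independent settings of the JDK's `HttpURLConnection`, which the trace shows the Hessian client sitting on: `setConnectTimeout` bounds only the TCP connect, while `setReadTimeout` bounds each wait for response bytes. The standalone sketch below (not QuickBuild code; `ReadTimeoutDemo` and `readTimesOut` are made-up names) opens a local socket that accepts the connection but never replies, so the request fails in `getResponseCode()` with `SocketTimeoutException: Read timed out`, the same failure as in the trace. Whether QuickBuild exposes the read timeout as a user setting is not confirmed here.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetAddress;
import java.net.Proxy;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.net.URI;
import java.net.URL;

// Illustrative demo: the connect timeout and the read timeout are separate knobs.
class ReadTimeoutDemo {

    // Returns true if the request fails with SocketTimeoutException while waiting
    // for the response, i.e. the read timeout (not the connect timeout) fired.
    static boolean readTimesOut(int connectTimeoutMs, int readTimeoutMs) throws IOException {
        InetAddress loopback = InetAddress.getByName("127.0.0.1");
        try (ServerSocket server = new ServerSocket(0, 50, loopback)) {
            // A "silent peer": accepts the TCP connection, then never sends a byte.
            Thread silentPeer = new Thread(() -> {
                try (Socket accepted = server.accept()) {
                    Thread.sleep(readTimeoutMs * 5L); // hold the connection open without replying
                } catch (IOException | InterruptedException ignored) {
                    // server socket closed or thread interrupted; nothing to clean up
                }
            });
            silentPeer.setDaemon(true);
            silentPeer.start();

            URL url = URI.create("http://127.0.0.1:" + server.getLocalPort() + "/").toURL();
            HttpURLConnection conn = (HttpURLConnection) url.openConnection(Proxy.NO_PROXY);
            conn.setConnectTimeout(connectTimeoutMs); // bounds the TCP handshake only
            conn.setReadTimeout(readTimeoutMs);       // bounds each blocking read of the response
            try {
                conn.getResponseCode(); // blocks reading the status line, as in the trace
                return false;
            } catch (SocketTimeoutException e) {
                return true;
            } finally {
                conn.disconnect();
            }
        }
    }
}
```

The connection itself succeeds immediately here, so a generous connect timeout does not help; only the read timeout decides how long the client waits for a busy or stalled peer.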


 Comments   
Comment by Phong Trinh [ 20/May/15 06:53 PM ]
 Corrected my typo:
----------------------------------------------------------------------------------

Our builds experience socket read timeouts several times per week, and each time we lose the nightly builds, because QB cancels all of the queued builds that are assigned to the node. I see a configuration setting for the connection timeout, but not for the read timeout. Can you provide a retry capability in QB for when this issue happens?

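What is being asked for is, in essence, a bounded retry around the remote call. A minimal sketch in Java, assuming the call is safe to repeat; `TimeoutRetry` and `callWithRetry` are hypothetical names, not part of QuickBuild or Hessian. It retries only when a `SocketTimeoutException` appears somewhere in the cause chain, because Hessian reports the timeout wrapped inside a `HessianConnectionException`, and it doubles the pause between attempts.

```java
import java.net.SocketTimeoutException;
import java.util.concurrent.Callable;

// Hypothetical helper: retry a call only on read timeouts, with a bounded
// number of attempts and exponential backoff between them.
class TimeoutRetry {

    static <T> T callWithRetry(Callable<T> call, int maxAttempts, long initialBackoffMs) throws Exception {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        long backoff = initialBackoffMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                // Only timeouts are treated as transient; any other failure is rethrown at once.
                if (!isReadTimeout(e) || attempt >= maxAttempts) throw e;
                Thread.sleep(backoff);
                backoff *= 2;
            }
        }
    }

    // Walk the cause chain: the timeout is usually wrapped by the RPC layer.
    static boolean isReadTimeout(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof SocketTimeoutException) return true;
        }
        return false;
    }
}
```

Failing fast on everything except timeouts keeps genuine build errors from being retried and masking real problems.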
Comment by Robin Shen [ 20/May/15 09:56 PM ]
Most probably this is caused by agent overload or network connection instability. You can have QB retry the build in this case by scripting the post-build field in the advanced settings of the configuration: check the build error message, and resubmit the build if it contains a pattern such as "timed out". A pseudo script could look like this:

groovy:
import com.pmease.quickbuild.*

if (build.errorMessage.contains("timed out")) {
  def newRequest = new BuildRequest();
  newRequest.configurationId = request.configurationId;
  // continue to populate the new request with other properties as necessary
  BuildEngine.instance.requestBuild(newRequest);
}
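One thing worth deciding before resubmitting from a post-build script: the new build runs the same script, so a timeout that keeps recurring could resubmit builds indefinitely. A minimal sketch of the decision logic with a retry cap, assuming a count of earlier resubmits is available somehow (how to carry it between builds, for example as a build variable, is left open and is an assumption); `ResubmitPolicy` is a hypothetical name, not QuickBuild API. It also treats a null message as "do not resubmit" instead of dereferencing it.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical policy: resubmit only for known transient errors, and only a bounded number of times.
class ResubmitPolicy {
    private final List<Pattern> transientPatterns;
    private final int maxResubmits;

    ResubmitPolicy(List<Pattern> transientPatterns, int maxResubmits) {
        this.transientPatterns = transientPatterns;
        this.maxResubmits = maxResubmits;
    }

    boolean shouldResubmit(String errorMessage, int resubmitsSoFar) {
        // No message, or retry budget used up: let the failure stand.
        if (errorMessage == null || resubmitsSoFar >= maxResubmits) return false;
        for (Pattern p : transientPatterns) {
            if (p.matcher(errorMessage).find()) return true;
        }
        return false;
    }
}
```

Using a list of patterns rather than a single literal makes it easy to add other transient signatures later without touching the logic.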
Comment by Phong Trinh [ 08/Jun/15 08:58 PM ]
 I tried this, but it could not invoke the method contains():
groovy:
 import com.pmease.quickbuild.*
 if (build.errorMessage.contains("timed out")) {
   def newRequest = new BuildRequest();
   newRequest.configurationId = request.configurationId;
   // continue to populate the new request with other properties as necessary
   BuildEngine.instance.requestBuild(newRequest);
 }
Comment by Robin Shen [ 08/Jun/15 11:31 PM ]
What error is it giving?
Comment by Phong Trinh [ 09/Jun/15 01:57 PM ]
 Hi Robin,
 
The error is as follows:
 22:29:58,584 ERROR - Step 'master>script' is failed: java.lang.NullPointerException: Cannot invoke method contains() on null object
    at org.codehaus.groovy.runtime.NullObject.invokeMethod(NullObject.java:77)
    at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:45)
    at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:42)
    at org.codehaus.groovy.runtime.callsite.NullCallSite.call(NullCallSite.java:32)
    at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:42)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:108)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:116)
    at script1433802598566941514341.run(script1433802598566941514341.groovy:10)
    at com.pmease.quickbuild.plugin.basis.BasisPlugin$28.evaluate(BasisPlugin.java:348)
    at com.pmease.quickbuild.DefaultScriptEngine.evaluate(DefaultScriptEngine.java:81)
    at com.pmease.quickbuild.plugin.basis.ScriptStep.run(ScriptStep.java:47)
    at com.pmease.quickbuild.plugin.basis.ScriptStep$$EnhancerByCGLIB$$456f74c1.CGLIB$run$0(<generated>)
    at com.pmease.quickbuild.plugin.basis.ScriptStep$$EnhancerByCGLIB$$456f74c1$$FastClassByCGLIB$$70430117.invoke(<generated>)
    at net.sf.cglib.proxy.MethodProxy.invokeSuper(MethodProxy.java:215)
    at com.pmease.quickbuild.DefaultScriptEngine$Interpolator.intercept(DefaultScriptEngine.java:273)
    at com.pmease.quickbuild.plugin.basis.ScriptStep$$EnhancerByCGLIB$$456f74c1.run(<generated>)
    at com.pmease.quickbuild.stepsupport.Step.execute(Step.java:539)
    at com.pmease.quickbuild.stepsupport.StepExecutionJob.executeStepAwareJob(StepExecutionJob.java:30)
    at com.pmease.quickbuild.stepsupport.StepAwareJob.executeBuildAwareJob(StepAwareJob.java:45)
    at com.pmease.quickbuild.BuildAwareJob.execute(BuildAwareJob.java:60)
    at com.pmease.quickbuild.grid.GridJob.run(GridJob.java:106)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
 
Comment by Robin Shen [ 09/Jun/15 11:31 PM ]
It looks like the error message is contained in a child step, so build.errorMessage itself is null. Try improving the script as follows:

groovy:
import com.pmease.quickbuild.*;

for (step in build.steps) {
  // check the actual error message and replace "timed out" with the text that really indicates the timeout
  if (step.errorMessage != null && step.errorMessage.contains("timed out")) {
    def newRequest = new BuildRequest();
    newRequest.configurationId = request.configurationId;
    // continue to populate the new request with other properties as necessary
    BuildEngine.instance.requestBuild(newRequest);
    break; // request only one new build even if several steps timed out
  }
}
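The fix boils down to two things: scan every step rather than only the build, and skip null messages instead of calling `contains()` on them. In Groovy itself, `step.errorMessage?.contains("timed out")` (the safe-navigation operator) is the idiomatic shorthand for the explicit null check. The same logic in plain Java, with `StepScan` and `StepResult` as hypothetical stand-ins for QuickBuild's step objects:

```java
import java.util.List;
import java.util.Objects;

// Hypothetical model of step results; in the QuickBuild script the equivalent
// data comes from build.steps and step.errorMessage.
class StepScan {
    static final class StepResult {
        final String name;
        final String errorMessage; // null when the step has no error

        StepResult(String name, String errorMessage) {
            this.name = name;
            this.errorMessage = errorMessage;
        }
    }

    // True if any step's error message contains the pattern. Null messages are
    // filtered out first, which avoids the NullPointerException reported above.
    static boolean anyStepFailedWith(List<StepResult> steps, String pattern) {
        return steps.stream()
                .map(s -> s.errorMessage)
                .filter(Objects::nonNull)
                .anyMatch(m -> m.contains(pattern));
    }
}
```

`anyMatch` short-circuits and yields a single yes/no answer, so the caller decides once whether to resubmit rather than once per matching step.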

Comment by Phong Trinh [ 11/Jun/15 03:55 AM ]
Thank you for the script. It seems that the script retries the whole configuration, so it re-runs all of the steps in it. I am looking for a way to retry just the step in the configuration that failed.
Comment by Robin Shen [ 11/Jun/15 10:28 PM ]
QB itself does not track which step to start from; however, you can script your step condition to check the step status of the previously failed build and determine which step has to rerun.
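A sketch of that step-condition idea, assuming the previous build's step statuses are available in execution order (how a QuickBuild step condition reads them is not shown in this thread and is an assumption; `RerunPlanner`, `Status`, and the step names are hypothetical). A step runs again if it, or any step before it, did not succeed last time, which amounts to resuming from the first failure.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model: step statuses of the previous (failed) build, in execution order.
class RerunPlanner {
    enum Status { SUCCESSFUL, FAILED, SKIPPED }

    // Step condition: run this step if it or any earlier step did not succeed last time.
    static boolean shouldRun(String step, LinkedHashMap<String, Status> previous) {
        if (!previous.containsKey(step)) return true; // no record (e.g. a new step): run it
        for (Map.Entry<String, Status> e : previous.entrySet()) {
            if (e.getValue() != Status.SUCCESSFUL) return true; // this or an earlier step failed
            if (e.getKey().equals(step)) return false;          // reached this step, all succeeded
        }
        return false;
    }
}
```

Resuming from the first failure, rather than rerunning only the failed step, keeps later steps that depend on its output from running against stale results.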
Comment by AlSt [ 23/May/16 08:12 AM ]
Hi Robin.
Is it possible to retry the Hessian connection instead of rerunning the whole build? For example, when the timeout occurs in a publish step and the build takes 2 hours, rerunning the whole build is not the best option.
In general, more retries would be nice, because they would let us get rid of a lot of failed builds.
Thanks,
Alex
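For illustration, retrying at the connection level could look like wrapping the remote-service proxy (the trace shows the Hessian proxy as `$Proxy70.testGridJob`) in a JDK dynamic proxy that re-invokes a call when it fails with a read timeout. This is a sketch of the general pattern only: `RetryingProxy` and the `GridService` interface are hypothetical, and applying it would be a change inside QuickBuild, not something a build configuration can do. It is only safe for idempotent calls; a publish request that timed out on the client may still have completed on the server.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.net.SocketTimeoutException;

// Hypothetical remote-service interface standing in for the real grid service.
interface GridService {
    String testGridJob(String jobId);
}

// Hypothetical wrapper: re-invokes interface calls that fail because of a read timeout.
final class RetryingProxy {
    private RetryingProxy() {}

    static <T> T wrap(Class<T> iface, T target, int maxAttempts) {
        InvocationHandler handler = (proxy, method, args) -> invokeWithRetry(target, method, args, maxAttempts);
        return iface.cast(Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface}, handler));
    }

    private static Object invokeWithRetry(Object target, Method method, Object[] args, int maxAttempts)
            throws Throwable {
        for (int attempt = 1; ; attempt++) {
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                Throwable cause = e.getCause(); // the exception the real method threw
                if (!causedByTimeout(cause) || attempt >= maxAttempts) throw cause;
            }
        }
    }

    // Hessian wraps the SocketTimeoutException, so inspect the whole cause chain.
    private static boolean causedByTimeout(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof SocketTimeoutException) return true;
        }
        return false;
    }
}
```

Because callers only see the interface, the retry stays invisible to the rest of the code; the trade-off is that every method on the interface gets the same retry behavior unless the handler filters by `method.getName()`.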
Comment by Robin Shen [ 24/May/16 12:28 AM ]
To retry a certain step, try this approach:
http://forum.pmease.com/viewtopic.php?f=5&t=4022

Comment by Phong Trinh [ 26/May/16 01:41 PM ]
I tried this approach, and it works. However, there is one issue: the first execution failed and the second retry succeeded, but QB still reported the sibling step as failed, so the next step in the configuration was skipped.
Comment by Robin Shen [ 27/May/16 12:27 AM ]
In this case, you can enclose the repeat step inside a sequential step and set the success condition of the container step to "if any of the child steps succeeds".
Comment by Phong Trinh [ 27/May/16 02:47 AM ]
Thank you for the prompt response. Could you give me an example of how to do that?
Comment by Robin Shen [ 27/May/16 11:39 PM ]
For instance, if you want to retry step "compile", you can add a container step of type "sequential" and then put the compile step into the container step as its only child. Then specify the success condition of the container step as "success if any child step succeeds". That way, as long as one retry succeeds, the container itself will be successful. Of course, you also need to add the container step to your current step workflow.
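To make the effect of the success condition concrete, here is a small model, treating each execution of the repeated "compile" step as one child result. `ContainerStep` and `SuccessCondition` are hypothetical names; QuickBuild's own evaluation code is not shown in this thread. With a failed first attempt and a successful retry, "all children succeed" fails the container (the behavior reported above), while "any child succeeds" passes it.

```java
import java.util.List;

// Hypothetical model of a container step's success condition.
class ContainerStep {
    enum SuccessCondition { ALL_CHILDREN_SUCCEED, ANY_CHILD_SUCCEEDS }

    // childResults: one entry per child execution, true if that execution succeeded.
    static boolean succeeded(SuccessCondition condition, List<Boolean> childResults) {
        return condition == SuccessCondition.ANY_CHILD_SUCCEEDS
                ? childResults.stream().anyMatch(Boolean::booleanValue)
                : childResults.stream().allMatch(Boolean::booleanValue);
    }
}
```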
Comment by Phong Trinh [ 29/May/16 06:55 PM ]
Thanks Robin. It works for me.
Generated at Sat May 04 00:57:40 UTC 2024 using JIRA 189.