Key: QB-2446
Type: Bug
Status: Closed
Resolution: Won't Fix
Priority: Major
Assignee: Robin Shen
Reporter: Phong Trinh
Votes: 0
Watchers: 0
QuickBuild

Experiences Socket Read Timeout

Created: 20/May/15 05:27 PM   Updated: 29/May/16 06:55 PM
Component/s: None
Affects Version/s: 6.0.10
Fix Version/s: None

Original Estimate: Unknown Remaining Estimate: Unknown Time Spent: Unknown
Environment: QuickBuild on Unix platform


 Description
 Our builds experience socket read timeouts several times per week, so we lose nightly builds, because QB cancels all of the queued builds that are assigned to the node. I see a configuration for connection timeout, but not for read timeout. Can you provide retry capacity when this issue happens?

07:53:46,032 ERROR - Build is failed.
    java.lang.RuntimeException: Error executing step execution job.
                at com.pmease.quickbuild.stepsupport.StepExecutionTask.reduce(StepExecutionTask.java:29)
                at com.pmease.quickbuild.stepsupport.StepExecutionTask.reduce(StepExecutionTask.java:19)
                at com.pmease.quickbuild.grid.GridTaskFuture.get(GridTaskFuture.java:102)
                at com.pmease.quickbuild.DefaultBuildEngine.run(DefaultBuildEngine.java:526)
                at com.pmease.quickbuild.DefaultBuildEngine.process(DefaultBuildEngine.java:394)
                at com.pmease.quickbuild.DefaultBuildEngine.access$000(DefaultBuildEngine.java:139)
                at com.pmease.quickbuild.DefaultBuildEngine$2.run(DefaultBuildEngine.java:1102)
                at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
                at java.lang.Thread.run(Thread.java:701)
    Caused by: com.pmease.quickbuild.QuickbuildException: Error testing job.
                at com.pmease.quickbuild.grid.GridTaskFuture.testJobs(GridTaskFuture.java:49)
                at com.pmease.quickbuild.grid.GridTaskFuture.get(GridTaskFuture.java:84)
                ... 7 more
    Caused by: com.caucho.hessian.client.HessianConnectionException: 500: java.net.SocketTimeoutException: Read timed out
                at com.caucho.hessian.client.HessianURLConnection.sendRequest(HessianURLConnection.java:165)
                at com.caucho.hessian.client.HessianProxy.sendRequest(HessianProxy.java:300)
                at com.caucho.hessian.client.HessianProxy.invoke(HessianProxy.java:171)
                at com.sun.proxy.$Proxy70.testGridJob(Unknown Source)
                at com.pmease.quickbuild.grid.GridTaskFuture.testJobs(GridTaskFuture.java:41)
                ... 8 more
    Caused by: java.net.SocketTimeoutException: Read timed out
                at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
                at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
                at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
                at java.lang.reflect.Constructor.newInstance(Constructor.java:534)
                at sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1547)
                at sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1543)
                at java.security.AccessController.doPrivileged(Native Method)
                at sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1541)
                at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1192)
                at com.caucho.hessian.client.HessianURLConnection.sendRequest(HessianURLConnection.java:145)
                ... 12 more
    Caused by: java.net.SocketTimeoutException: Read timed out
                at java.net.SocketInputStream.socketRead0(Native Method)
                at java.net.SocketInputStream.read(SocketInputStream.java:146)
                at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
                at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
                at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
                at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:642)
                at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:590)
                at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1248)
                at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:397)
                at com.caucho.hessian.client.HessianURLConnection.sendRequest(HessianURLConnection.java:132)
                ... 12 more
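
The difference between the connection timeout and the read timeout mentioned in the description can be illustrated at the `java.net` level, which is where the trace above ends. This is only a minimal sketch: it does not show how QuickBuild or Hessian is configured, and the class name, host, and port are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URL;

class TimeoutConfigSketch {
    // Prepares (but does not yet open) an HTTP connection with both timeouts set.
    // The connect timeout bounds the TCP handshake; the read timeout bounds each
    // wait for response data, which is the one that fired in the trace above.
    static HttpURLConnection open(String url, int connectTimeoutMs, int readTimeoutMs) {
        try {
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            conn.setConnectTimeout(connectTimeoutMs);
            conn.setReadTimeout(readTimeoutMs);
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical agent address, used only for illustration.
        HttpURLConnection conn = open("http://agent.example.com:8811/", 10_000, 60_000);
        System.out.println("connect=" + conn.getConnectTimeout()
                + "ms read=" + conn.getReadTimeout() + "ms");
    }
}
```

A longer read timeout only hides slow responses from an overloaded agent; it does not remove the need for a retry when the agent never answers.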


Phong Trinh [20/May/15 06:53 PM]
 Corrected my typo:
----------------------------------------------------------------------------------

Our builds experience socket read timeouts several times per week, so we lose nightly builds, because QB cancels all of the queued builds that are assigned to the node. I see a configuration for connection timeout, but not for read timeout. Can you provide QB retry capability when this issue happens?


Robin Shen [20/May/15 09:56 PM]
Most probably this is caused by agent overload or network connection instability. You can have QB retry the build in this case by scripting the post-build field in the advanced settings of the configuration to check the build error message and resubmit the build if it contains a pattern such as "timed out". A pseudo script could look something like this:

groovy:
if (build.errorMessage.contains("timed out")) {
  def newRequest = new BuildRequest();
  newRequest.configurationId = request.configurationId;
  // continue to populate the new request with other properties as necessary
  BuildEngine.instance.requestBuild(newRequest);
}

Phong Trinh [08/Jun/15 08:58 PM]
 I tried this, but it could not invoke the method contains():
groovy:
 import com.pmease.quickbuild.*
 if (build.errorMessage.contains("timed out")) {
   def newRequest = new BuildRequest();
   newRequest.configurationId = request.configurationId;
   // continue to populate the new request with other properties as necessary
   BuildEngine.instance.requestBuild(newRequest);
 }

Robin Shen [08/Jun/15 11:31 PM]
What error is it giving?

Phong Trinh [09/Jun/15 01:57 PM]
 Hi Robin,
 
The error is as follows:
 22:29:58,584 ERROR - Step 'master>script' is failed: java.lang.NullPointerException: Cannot invoke method contains() on null object
    at org.codehaus.groovy.runtime.NullObject.invokeMethod(NullObject.java:77)
    at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:45)
    at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:42)
    at org.codehaus.groovy.runtime.callsite.NullCallSite.call(NullCallSite.java:32)
    at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:42)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:108)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:116)
    at script1433802598566941514341.run(script1433802598566941514341.groovy:10)
    at com.pmease.quickbuild.plugin.basis.BasisPlugin$28.evaluate(BasisPlugin.java:348)
    at com.pmease.quickbuild.DefaultScriptEngine.evaluate(DefaultScriptEngine.java:81)
    at com.pmease.quickbuild.plugin.basis.ScriptStep.run(ScriptStep.java:47)
    at com.pmease.quickbuild.plugin.basis.ScriptStep$$EnhancerByCGLIB$$456f74c1.CGLIB$run$0(<generated>)
    at com.pmease.quickbuild.plugin.basis.ScriptStep$$EnhancerByCGLIB$$456f74c1$$FastClassByCGLIB$$70430117.invoke(<generated>)
    at net.sf.cglib.proxy.MethodProxy.invokeSuper(MethodProxy.java:215)
    at com.pmease.quickbuild.DefaultScriptEngine$Interpolator.intercept(DefaultScriptEngine.java:273)
    at com.pmease.quickbuild.plugin.basis.ScriptStep$$EnhancerByCGLIB$$456f74c1.run(<generated>)
    at com.pmease.quickbuild.stepsupport.Step.execute(Step.java:539)
    at com.pmease.quickbuild.stepsupport.StepExecutionJob.executeStepAwareJob(StepExecutionJob.java:30)
    at com.pmease.quickbuild.stepsupport.StepAwareJob.executeBuildAwareJob(StepAwareJob.java:45)
    at com.pmease.quickbuild.BuildAwareJob.execute(BuildAwareJob.java:60)
    at com.pmease.quickbuild.grid.GridJob.run(GridJob.java:106)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
 

Robin Shen [09/Jun/15 11:31 PM]
Looks like the error message is contained in a child step. Try improving the script as below:

groovy:
import com.pmease.quickbuild.*;

for (step in build.steps) {
  // Check the actual error message and replace "timed out" with the text that really indicates the timeout.
  if (step.errorMessage != null && step.errorMessage.contains("timed out")) {
    def newRequest = new BuildRequest();
    newRequest.configurationId = request.configurationId;
    // continue to populate the new request with other properties as necessary
    BuildEngine.instance.requestBuild(newRequest);
  }
}


Phong Trinh [11/Jun/15 03:55 AM]
Thank you for the script. It seems to me that the script retries the whole configuration, so it re-runs all of the steps in it. I am looking to retry just the step in the configuration that failed.

Robin Shen [11/Jun/15 10:28 PM]
QB itself does not track which step to start with; however, you can script your step condition to check the step status of the previously failed build to see which step has to be rerun.
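
The idea of a step condition that consults the previous failed build can be sketched in plain Java as follows. It deliberately does not use QuickBuild's scripting API; the `Status` enum, the map of step names to statuses, and `shouldRun` are all hypothetical stand-ins for whatever the real step condition would inspect.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class StepRerunSketch {
    enum Status { SUCCESSFUL, FAILED }

    // Decides whether a step should run, given the step statuses recorded for the
    // previous failed build (null when there is no such build). Steps that already
    // succeeded are skipped; failed steps and steps that never ran are executed.
    static boolean shouldRun(String stepName, Map<String, Status> previousStatuses) {
        if (previousStatuses == null) {
            return true; // nothing to resume from, so run everything
        }
        return previousStatuses.get(stepName) != Status.SUCCESSFUL;
    }

    public static void main(String[] args) {
        Map<String, Status> previous = new LinkedHashMap<>();
        previous.put("checkout", Status.SUCCESSFUL);
        previous.put("compile", Status.FAILED);
        // "publish" never ran in the previous build, so it has no entry.
        for (String step : new String[] {"checkout", "compile", "publish"}) {
            System.out.println(step + " -> run=" + shouldRun(step, previous));
        }
    }
}
```

Skipping earlier steps this way presumably only makes sense when their outputs, such as a checked-out workspace, are still available to the new build.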

AlSt [23/May/16 08:12 AM]
Hi Robin.
Is it possible to retry the Hessian connection instead of rerunning the whole build? For example, when the timeout occurs in a publish step and the build takes 2 hours, rerunning the whole build is not the best option.
In general, more retries would be nice, because we could get rid of a lot of failed builds.
Thanks,
Alex
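
A connection-level retry of the kind requested here could, in principle, look like the sketch below. It is not QuickBuild's or Hessian's implementation: `callWithRetry` and `isReadTimeout` are hypothetical helpers, and the sketch assumes the remote call is safe to repeat, which is not guaranteed for every grid operation.

```java
import java.net.SocketTimeoutException;
import java.util.function.Supplier;

class ReadTimeoutRetrySketch {
    // Walks the cause chain, because the timeout in the trace above is wrapped
    // in several layers of other exceptions.
    static boolean isReadTimeout(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof SocketTimeoutException) {
                return true;
            }
        }
        return false;
    }

    // Invokes the call up to maxAttempts times, doubling the delay after each
    // read timeout. Any other failure, or the last timeout, is rethrown as-is.
    static <T> T callWithRetry(Supplier<T> call, int maxAttempts, long initialDelayMs) {
        long delay = initialDelayMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                if (!isReadTimeout(e) || attempt >= maxAttempts) {
                    throw e;
                }
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw e; // give up promptly if the thread is interrupted
                }
                delay *= 2;
            }
        }
    }
}
```

Retrying only on read timeouts keeps genuine build errors from being masked, and the growing delay gives an overloaded agent some time to recover.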

Robin Shen [24/May/16 12:28 AM]
To retry a certain step, try this approach:
http://forum.pmease.com/viewtopic.php?f=5&t=4022


Phong Trinh [26/May/16 01:41 PM]
I tried this approach, and it works. However, there may be an issue: the first execution failed and the retry succeeded, but QB still reported the sibling step as failed, so the next step in the configuration was skipped.

Robin Shen [27/May/16 12:27 AM]
In this case, you can enclose the repeat step inside a sequential step and set the success condition of the container step to "if any of the child step succeeds".

Phong Trinh [27/May/16 02:47 AM]
Thank you for the prompt response. Could you give me an example of how to do that?

Robin Shen [27/May/16 11:39 PM]
For instance, if you want to retry the step "compile", you can add a container step of type "sequential" and put the compile step into it as its only child. Then specify the success condition of the container step as "success if any child step succeeds". That way, as long as one retry succeeds, the container itself will be successful. Of course, you need to add the container step to your current step workflow.
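
The effect of that success condition can be modelled in a few lines of Java. This is a plain illustration of the "any child succeeds" rule, not QuickBuild code; `runAttempts` and `anyChildSucceeds` are hypothetical names, and each attempt is reduced to a boolean outcome.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

class RetryContainerSketch {
    // Runs the child up to maxAttempts times and records each outcome,
    // stopping as soon as one attempt succeeds.
    static List<Boolean> runAttempts(BooleanSupplier child, int maxAttempts) {
        List<Boolean> outcomes = new ArrayList<>();
        for (int i = 0; i < maxAttempts; i++) {
            boolean ok = child.getAsBoolean();
            outcomes.add(ok);
            if (ok) {
                break;
            }
        }
        return outcomes;
    }

    // The container is successful if at least one recorded attempt succeeded,
    // even though earlier attempts are still recorded as failed.
    static boolean anyChildSucceeds(List<Boolean> outcomes) {
        return outcomes.contains(Boolean.TRUE);
    }
}
```

With a failed first attempt and a successful retry, the recorded outcomes are one failure and one success, and the container still counts as successful, so the next step in the workflow is not skipped.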

Phong Trinh [29/May/16 06:55 PM]
Thanks Robin. It works for me.