[QB-2708] Builds are cancelled randomly
Status: | Closed |
Project: | QuickBuild |
Component/s: | None |
Affects Version/s: | 6.1.0 |
Fix Version/s: | None |
Type: | Bug | Priority: | Major |
Reporter: | Erki Männiste | Assigned To: | Robin Shen |
Resolution: | Cannot Reproduce | Votes: | 0 |
Remaining Estimate: | Unknown | Time Spent: | Unknown |
Original Estimate: | Unknown | ||
Environment: |
Linux 3.2.0-4-amd64
Java HotSpot(TM) 64-Bit Server VM 1.8.0_45 |
Description |
Lately we have been seeing a lot of our builds get CANCELLED. We don't have a 100% reliable repro, but it happens quite often, usually under heavy link/compilation load.
The build log says:

21:00:23,539 ERROR - Step 'master>Compile' is failed.
java.lang.RuntimeException: java.lang.InterruptedException: Composite step 'Compile' is cancelled.
    at com.pmease.quickbuild.stepsupport.CompositeStep.run(CompositeStep.java:121)
    at com.pmease.quickbuild.stepsupport.Step.execute(Step.java:539)
    at com.pmease.quickbuild.stepsupport.StepExecutionJob.executeStepAwareJob(StepExecutionJob.java:30)
    at com.pmease.quickbuild.stepsupport.StepAwareJob.executeBuildAwareJob(StepAwareJob.java:45)
    at com.pmease.quickbuild.BuildAwareJob.execute(BuildAwareJob.java:60)
    at com.pmease.quickbuild.grid.GridJob.run(GridJob.java:106)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.InterruptedException: Composite step 'Group Compile' is cancelled.
    ... 11 more

I can't find anything with matching timestamps in the agent log. We have investigated possible causes in our build infrastructure but didn't find anything. Do you have any idea what might cause this?

I'm not sure whether it's relevant, but the build agents log the following error every now and then:

2016-05-02 20:57:24,024 [Thread-14] ERROR com.pmease.quickbuild.Quickbuild - Error connecting server.
com.caucho.hessian.client.HessianRuntimeException: java.net.SocketException: Software caused connection abort: recv failed
    at com.caucho.hessian.client.HessianProxy.sendRequest(HessianProxy.java:285)
    at com.caucho.hessian.client.HessianProxy.invoke(HessianProxy.java:171)
    at com.sun.proxy.$Proxy19.connect(Unknown Source)
    at com.pmease.quickbuild.grid.AgentConnectivityTask.run(AgentConnectivityTask.java:51)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.net.SocketException: Software caused connection abort: recv failed
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(Unknown Source)
    at java.net.SocketInputStream.read(Unknown Source)
    at java.net.SocketInputStream.read(Unknown Source)
    at sun.security.ssl.InputRecord.readFully(Unknown Source)
    at sun.security.ssl.InputRecord.read(Unknown Source)
    at sun.security.ssl.SSLSocketImpl.readRecord(Unknown Source)
    at sun.security.ssl.SSLSocketImpl.performInitialHandshake(Unknown Source)
    at sun.security.ssl.SSLSocketImpl.startHandshake(Unknown Source)
    at sun.security.ssl.SSLSocketImpl.startHandshake(Unknown Source)
    at sun.net.www.protocol.https.HttpsClient.afterConnect(Unknown Source)
    at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(Unknown Source)
    at sun.net.www.protocol.http.HttpURLConnection.getOutputStream0(Unknown Source)
    at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(Unknown Source)
    at sun.net.www.protocol.https.HttpsURLConnectionImpl.getOutputStream(Unknown Source)
    at com.caucho.hessian.client.HessianURLConnection.getOutputStream(HessianURLConnection.java:101)
    at com.caucho.hessian.client.HessianProxy.sendRequest(HessianProxy.java:283)
    ... 4 more

We didn't see this behavior in 5.1.30. |
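Checking whether anything in the agent log lines up with the cancellation time can be done systematically with a small filter that keeps only agent log entries within a time window around the build log's ERROR line. The following is a minimal Java sketch, not part of QuickBuild. The class name `LogWindow`, its command-line arguments, and the assumption that every agent log entry starts with a `yyyy-MM-dd HH:mm:ss,SSS` timestamp (as in the agent excerpt in the description) are all illustrative. Because the build log line shows only a time of day, the date has to be supplied by hand.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: filters log lines by proximity to a target timestamp.
public class LogWindow {
    // Assumed agent log prefix, e.g. "2016-05-02 20:57:24,024"
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");
    private static final int STAMP_LENGTH = "yyyy-MM-dd HH:mm:ss,SSS".length(); // 23

    /** Returns the lines whose leading timestamp is within +/- window of target. */
    public static List<String> linesNear(List<String> lines, LocalDateTime target, Duration window) {
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            if (line.length() < STAMP_LENGTH) {
                continue; // too short to carry a timestamp
            }
            LocalDateTime stamp;
            try {
                stamp = LocalDateTime.parse(line.substring(0, STAMP_LENGTH), FORMAT);
            } catch (DateTimeParseException e) {
                continue; // stack-trace continuation lines have no timestamp
            }
            if (Duration.between(target, stamp).abs().compareTo(window) <= 0) {
                result.add(line);
            }
        }
        return result;
    }

    // Hypothetical usage: java LogWindow agent.log "2016-05-02 21:00:23,539" 5
    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        LocalDateTime target = LocalDateTime.parse(args[1], FORMAT);
        Duration window = Duration.ofMinutes(Long.parseLong(args[2]));
        for (String line : linesNear(lines, target, window)) {
            System.out.println(line);
        }
    }
}
```

The filter compares wall-clock timestamps only, so it is meaningful only if the server and agent clocks are reasonably in sync. Stack-trace continuation lines are dropped, so a matching entry shows only its first line.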
Comments |
Comment by Erki Männiste [ 03/May/16 09:21 AM ] |
To clarify: these SocketExceptions are not always present in the agent log when a build is cancelled. |
Comment by Robin Shen [ 03/May/16 11:50 PM ] |
The server might be overloaded. You may consider appointing some agents as artifact servers via the artifact storage setting in the advanced settings of the relevant configurations. This reduces the I/O load on the server; please check whether the situation improves. |
Comment by Robin Shen [ 01/Dec/16 08:22 AM ] |
Please reopen this issue if there are more clues. |