
Key: QB-2708
Type: Bug
Status: Closed
Resolution: Cannot Reproduce
Priority: Major
Assignee: Robin Shen
Reporter: Erki Männiste
Votes: 0
Watchers: 1
QuickBuild

Builds are cancelled randomly

Created: 03/May/16 09:19 AM   Updated: 01/Dec/16 08:22 AM
Component/s: None
Affects Version/s: 6.1.0
Fix Version/s: None

Original Estimate: Unknown Remaining Estimate: Unknown Time Spent: Unknown
Environment:
Linux 3.2.0-4-amd64
Java HotSpot(TM) 64-Bit Server VM 1.8.0_45


Description
Lately, many of our builds are ending up CANCELLED. We don't have a reliable repro, but it happens quite often, usually under heavy link/compilation load.
The build log says:

21:00:23,539 ERROR - Step 'master>Compile' is failed.
    java.lang.RuntimeException: java.lang.InterruptedException: Composite step 'Compile' is cancelled.
        at com.pmease.quickbuild.stepsupport.CompositeStep.run(CompositeStep.java:121)
        at com.pmease.quickbuild.stepsupport.Step.execute(Step.java:539)
        at com.pmease.quickbuild.stepsupport.StepExecutionJob.executeStepAwareJob(StepExecutionJob.java:30)
        at com.pmease.quickbuild.stepsupport.StepAwareJob.executeBuildAwareJob(StepAwareJob.java:45)
        at com.pmease.quickbuild.BuildAwareJob.execute(BuildAwareJob.java:60)
        at com.pmease.quickbuild.grid.GridJob.run(GridJob.java:106)
        at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
        at java.util.concurrent.FutureTask.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
    Caused by: java.lang.InterruptedException: Composite step 'Group Compile' is cancelled.
        ... 11 more

I can't find anything with corresponding timestamps in the agent log. We have investigated possible causes in our build infrastructure but didn't find anything. Do you have any idea what might cause this?

Not sure if it's relevant, but the build agents do log the following errors every now and then:

2016-05-02 20:57:24,024 [Thread-14] ERROR com.pmease.quickbuild.Quickbuild - Error connecting server.
    com.caucho.hessian.client.HessianRuntimeException: java.net.SocketException: Software caused connection abort: recv failed
        at com.caucho.hessian.client.HessianProxy.sendRequest(HessianProxy.java:285)
        at com.caucho.hessian.client.HessianProxy.invoke(HessianProxy.java:171)
        at com.sun.proxy.$Proxy19.connect(Unknown Source)
        at com.pmease.quickbuild.grid.AgentConnectivityTask.run(AgentConnectivityTask.java:51)
        at java.lang.Thread.run(Unknown Source)
    Caused by: java.net.SocketException: Software caused connection abort: recv failed
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.socketRead(Unknown Source)
        at java.net.SocketInputStream.read(Unknown Source)
        at java.net.SocketInputStream.read(Unknown Source)
        at sun.security.ssl.InputRecord.readFully(Unknown Source)
        at sun.security.ssl.InputRecord.read(Unknown Source)
        at sun.security.ssl.SSLSocketImpl.readRecord(Unknown Source)
        at sun.security.ssl.SSLSocketImpl.performInitialHandshake(Unknown Source)
        at sun.security.ssl.SSLSocketImpl.startHandshake(Unknown Source)
        at sun.security.ssl.SSLSocketImpl.startHandshake(Unknown Source)
        at sun.net.www.protocol.https.HttpsClient.afterConnect(Unknown Source)
        at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(Unknown Source)
        at sun.net.www.protocol.http.HttpURLConnection.getOutputStream0(Unknown Source)
        at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(Unknown Source)
        at sun.net.www.protocol.https.HttpsURLConnectionImpl.getOutputStream(Unknown Source)
        at com.caucho.hessian.client.HessianURLConnection.getOutputStream(HessianURLConnection.java:101)
        at com.caucho.hessian.client.HessianProxy.sendRequest(HessianProxy.java:283)
        ... 4 more

We didn't see this behavior in 5.1.30.




Comments
Erki Männiste [03/May/16 09:21 AM]
As a clarification, these SocketExceptions are not always present in the agent log when a build is cancelled.
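
For anyone wanting to check this correlation on their own logs, here is a minimal sketch in Python. It is not part of QuickBuild. The function name `agent_errors_near`, the regular expression, and the default five-minute window are all assumptions for illustration, and the pattern only assumes that agent log lines look like the `2016-05-02 20:57:24,024 [Thread-14] ERROR ...` sample quoted in the description. Because the build log shows only a time of day, the caller has to supply the date of the failure.

```python
import re
from datetime import datetime, timedelta

# Assumed agent log layout, taken from the sample line in this issue:
# 2016-05-02 20:57:24,024 [Thread-14] ERROR com.pmease.quickbuild.Quickbuild - Error connecting server.
AGENT_ERROR_LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3}) \[[^\]]*\] ERROR (.*)$"
)


def agent_errors_near(agent_lines, failure_time, window_seconds=300):
    """Return (timestamp, message) pairs for agent ERROR lines that fall
    within window_seconds before or after failure_time.

    Hypothetical helper: stack-trace continuation lines and non-ERROR
    lines do not match the pattern and are skipped.
    """
    window = timedelta(seconds=window_seconds)
    hits = []
    for line in agent_lines:
        match = AGENT_ERROR_LINE.match(line)
        if match is None:
            continue
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        stamp += timedelta(milliseconds=int(match.group(2)))
        if abs(stamp - failure_time) <= window:
            hits.append((stamp, match.group(3)))
    return hits
```

To use it, read the agent log into a list of lines and pass the failure time from the build log as a `datetime`, for example `datetime(2016, 5, 2, 21, 0, 23, 539000)` if the `21:00:23,539` failure happened on 2 May (the date here is an assumption, since the build log does not show it). An empty result for a cancelled build would support the observation that the SocketExceptions are not always present.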

Robin Shen [03/May/16 11:50 PM]
The server might be overloaded. Consider designating some agents as artifact servers, via the artifact storage setting in the advanced settings of the relevant configurations, to reduce the server's I/O load and see whether the situation improves.

Robin Shen [01/Dec/16 08:22 AM]
Please reopen this issue if there are more clues.