History | Log In     View a printable version of the current page.  
Issue Details (XML | Word | Printable)

Key: QB-3089
Type: Bug Bug
Status: Closed Closed
Resolution: Won't Fix
Priority: Critical Critical
Assignee: Robin Shen
Reporter: Piotr Jedrys
Votes: 0
Watchers: 1
Operations

If you were logged in you would be able to see more operations.
QuickBuild

Publish artifacts fails with java.io.IOException

Created: 01/Dec/17 11:37 AM   Updated: 14/Jan/18 11:35 PM
Component/s: None
Affects Version/s: 7.0.28
Fix Version/s: None

Original Estimate: Unknown Remaining Estimate: Unknown Time Spent: Unknown
Environment: Linux


 Description  « Hide
12:24:05,427 ERROR - Step 'master>........>publish - artifacts' is failed.
    java.lang.RuntimeException: Error executing node job execute job.
        at com.pmease.quickbuild.grid.NodeJobExecuteTask.reduce(NodeJobExecuteTask.java:32)
        at com.pmease.quickbuild.grid.NodeJobExecuteTask.reduce(NodeJobExecuteTask.java:10)
        at com.pmease.quickbuild.grid.GridTaskFuture.get(GridTaskFuture.java:123)
        at com.pmease.quickbuild.grid.GridTaskFuture.get(GridTaskFuture.java:127)
        at com.pmease.quickbuild.grid.GridNode.executeJob(GridNode.java:491)
        at com.pmease.quickbuild.grid.GridImpl.transferFiles(GridImpl.java:171)
        at com.pmease.quickbuild.model.Build.publish(Build.java:1158)
        at com.pmease.quickbuild.plugin.artifact.ArtifactPublishStep.run(ArtifactPublishStep.java:129)
        at com.pmease.quickbuild.plugin.artifact.ArtifactPublishStep$$EnhancerByCGLIB$$9cd97b49.CGLIB$run$6(<generated>)
        at com.pmease.quickbuild.plugin.artifact.ArtifactPublishStep$$EnhancerByCGLIB$$9cd97b49$$FastClassByCGLIB$$9dcb1444.invoke(<generated>)
        at net.sf.cglib.proxy.MethodProxy.invokeSuper(MethodProxy.java:215)
        at com.pmease.quickbuild.DefaultScriptEngine$Interpolator.intercept(DefaultScriptEngine.java:261)
        at com.pmease.quickbuild.plugin.artifact.ArtifactPublishStep$$EnhancerByCGLIB$$9cd97b49.run(<generated>)
        at com.pmease.quickbuild.stepsupport.Step.doExecute(Step.java:619)
        at com.pmease.quickbuild.stepsupport.Step.execute(Step.java:521)
        at com.pmease.quickbuild.stepsupport.StepExecutionJob.executeStepAwareJob(StepExecutionJob.java:31)
        at com.pmease.quickbuild.stepsupport.StepAwareJob.executeBuildAwareJob(StepAwareJob.java:45)
        at com.pmease.quickbuild.BuildAwareJob.execute(BuildAwareJob.java:60)
        at com.pmease.quickbuild.grid.GridJob.run(GridJob.java:107)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
    Caused by: java.lang.RuntimeException: java.io.IOException: missing CR
        at com.pmease.quickbuild.util.FileUtils.untar(FileUtils.java:891)
        at com.pmease.quickbuild.grid.CopyFilesJob.execute(CopyFilesJob.java:121)
        at com.pmease.quickbuild.grid.NodeJobExecuteJob.execute(NodeJobExecuteJob.java:25)
        at com.pmease.quickbuild.grid.GridJob.run(GridJob.java:107)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:473)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1152)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
        at java.lang.Thread.run(Thread.java:748)
    Caused by: java.io.IOException: missing CR
        at sun.net.www.http.ChunkedInputStream.processRaw(ChunkedInputStream.java:405)
        at sun.net.www.http.ChunkedInputStream.readAheadNonBlocking(ChunkedInputStream.java:520)
        at sun.net.www.http.ChunkedInputStream.readAhead(ChunkedInputStream.java:611)
        at sun.net.www.http.ChunkedInputStream.available(ChunkedInputStream.java:725)
        at java.io.FilterInputStream.available(FilterInputStream.java:168)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:342)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
        at org.apache.commons.compress.archivers.tar.TarBuffer.readBlock(TarBuffer.java:224)
        at org.apache.commons.compress.archivers.tar.TarBuffer.readRecord(TarBuffer.java:195)
        at org.apache.commons.compress.archivers.tar.TarArchiveInputStream.getRecord(TarArchiveInputStream.java:302)
        at org.apache.commons.compress.archivers.tar.TarArchiveInputStream.getNextTarEntry(TarArchiveInputStream.java:230)
        at com.pmease.quickbuild.util.FileUtils.untar(FileUtils.java:863)
        ... 8 more

Sometimes it is premature EOF, sometimes missing CR.

 All   Comments   Work Log   Change History      Sort Order:
Robin Shen [01/Dec/17 11:38 PM]
Is this happening all the time, or occasionally? If occasionally, is the server under heavy load?

Piotr Jedrys [04/Dec/17 09:50 AM]
It is happening occasionally. Yes after moving to QB7.0 we can see that the heap memory can be sometimes up to 90%, although the load on the server has not changed between QB6 and QB7

Piotr Jedrys [04/Dec/17 10:36 AM]
btw. What is the suggested amount of heap memory for QB 7.0?

Robin Shen [05/Dec/17 12:08 AM]
4G should be enough for most cases. Do you have many builds publishing artifacts/retrieving artifacts while this happens? How is the I/O load of the server.

Piotr Jedrys [05/Dec/17 09:40 AM]
Well, we have 16GB and it varies from 40% to 95%

As for the builds, at that time there were 14 builds requests on master, all in all we have many configurations (>100), most of them are publishing.
There is a peak in Upload (Fifteen-minute Rate: 24.12) and Download Visits (Fifteen-minute Rate: 5.83 )

As for I/O, I do not see any strange peaks on our external measurements system, but maybe QB has some internal measurements for it?

Robin Shen [06/Dec/17 12:32 AM]
16G mem should be more than enough. QB does not have internal measurement for I/O. I'd suggest to reduce the concurrent jobs doing file publishing to see if the issue still exist.

Piotr Jedrys [06/Dec/17 01:56 AM]
I believe limiting is not an option for us.
Moreover our publish steps are already distributed across nodes, we no longer store all the data on server.

Robin Shen [07/Dec/17 12:29 AM]
So you are storing artifacts on agents based on this document?
http://wiki.pmease.com/display/QB70/Store+Build+Artifacts+on+Agent

I suggest to limit the concurrency temporarily at least to see if this is the cause. If yes, you may set up more agents as artifact storage.

Piotr Jedrys [07/Dec/17 10:40 PM]
Yes we are storing the artifacts on agents like defined here http://wiki.pmease.com/display/QB70/Store+Build+Artifacts+on+Agent
All of our agents are set up as artifact storage.

How do you suggest to limit the concurrency on a production environment? Any Wiki how to do it?

Robin Shen [08/Dec/17 12:48 AM]
You may define a grid resource on QB server with desired number. Then add an intermediate sequential step between your existing publish step and its parent and make the intermediate step require that resource. Note that if your publish step's node selection rule is "On the same node running parent step" (which is default), you should change it to script selection rule with below script:
node.address == parent.parent.node.address

Robin Shen [14/Jan/18 11:35 PM]
Please also check if there any anti-virus software running interfering with the communication between QB nodes. We have other customers encountering the same situation, and it turns out to be anti-virus software. Please reopen it if you believe it is still an issue of QB.