|
|
|
Yes, I'm aware of this pattern behavior, but the files are published by the other build, so they exist on the server and should be matched by the pattern (as they are in other builds of the same configuration).
And if I understand that right: when two builds check out the same artifacts, will one (or maybe both) fail to check out the files? It will work even if multiple builds are checking out artifacts. There will only be a problem if some other build is writing artifacts of this build at the same time.
Is this problem happening intermittently or consistently? It happens intermittently, but it is still annoying.
Also, if we did not have this post-execute action, the checkout step would not fail, but the build would be doomed to fail anyway, because the files are missing in the steps executed afterwards. So sometimes we hold the resource/machine even though we could already know that these later steps are pointless (they will fail because not all files are there). Can you please send me [robin AT pmease DOT com] your database backup and let me know which configuration is in trouble? And which file is missing?
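The fail-fast idea described above can be sketched as a check run right after checkout. This is only an illustration, not QuickBuild's actual API: the class, method, and the list of expected files are all hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

public class CheckoutVerifier {
    // Returns the subset of expected files that are missing from the
    // workspace after checkout, so the build can fail immediately
    // instead of holding the machine until a later step fails.
    static List<Path> findMissing(Path workspace, List<String> expected) {
        return expected.stream()
                .map(workspace::resolve)
                .filter(p -> !Files.isRegularFile(p))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws Exception {
        Path ws = Files.createTempDirectory("ws");
        Files.createFile(ws.resolve("present.jar"));
        List<Path> missing = findMissing(ws, List.of("present.jar", "absent.jar"));
        if (!missing.isEmpty()) {
            // Failing here releases the agent right away.
            System.out.println("Missing after checkout: " + missing);
        }
    }
}
```

Running such a check in a post-execute action (as the user does) has the same effect: the step fails at checkout time rather than several steps later.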
As our DB dump is too big, we did some further investigation with application monitoring instead. We have 8 file retrieval patterns in our QB repository, but only the first pattern is used. All other file retrievals in the list are skipped (presumably because some error occurs).
I took a deeper look at the code where this artifact download is made and discovered some potential problems (also with respect to performance). As I understand it, the highlighted code makes a call to the server on every loop iteration: it fetches the build object, and also fetches the artifact storage, more than once if more than one FileRetrieval is defined. My suggestion would be to refactor this method to fetch the dependencyBuild and the artifact storage object only once. This would need two loops, one for the remote server and one for the "own" server, with the IF directly after method entry:

downloadDependencies([...]) {
    if (getServer() != null) {
        for ([...]) {
            [...]
        }
    } else {
        Build dependencyBuild = [...]
        ArtifactStorage artifactStorage = [...]
        for ([...]) {
            [...]
        }
    }
}

Also, some more log output would be much appreciated, to see which FileRetrieval is currently in progress. We may then be able to investigate further why only the first pattern is used. Thanks for the insight. This snippet of code has been changed according to your suggestions, and debug logging for file retrieval has also been added. Changes are available in:
https://build.pmease.com/build/2700 To check the debug output, edit the configuration to use debug logging mode and then run the build. Debugging statements for file retrieval will then appear in the build log. This problem also occurs on 5.1.32.
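The performance point behind the suggested refactor can be illustrated with stub classes. Every name here is a hypothetical stand-in for QuickBuild's internals; the sketch only shows the "own server" branch, counting server round trips to make the effect visible.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class DownloadSketch {
    static final AtomicInteger serverCalls = new AtomicInteger();

    static class Build {}
    static class ArtifactStorage {}
    static class FileRetrieval {
        final String pattern;
        FileRetrieval(String p) { pattern = p; }
    }

    // Stand-ins for the expensive per-iteration server calls.
    static Build fetchDependencyBuild() { serverCalls.incrementAndGet(); return new Build(); }
    static ArtifactStorage fetchArtifactStorage() { serverCalls.incrementAndGet(); return new ArtifactStorage(); }

    // Refactored version: both objects are fetched once, before the loop,
    // instead of once per configured FileRetrieval.
    static void downloadDependencies(List<FileRetrieval> retrievals) {
        Build dependencyBuild = fetchDependencyBuild();
        ArtifactStorage storage = fetchArtifactStorage();
        for (FileRetrieval r : retrievals) {
            // ... retrieve files matching r.pattern using
            //     dependencyBuild and storage ...
        }
    }

    public static void main(String[] args) {
        downloadDependencies(List.of(new FileRetrieval("a/**"),
                new FileRetrieval("b/**"), new FileRetrieval("c/**")));
        // Two server calls regardless of how many retrievals are configured.
        System.out.println("server calls: " + serverCalls.get());
    }
}
```

With the original per-iteration fetches, three retrievals would have cost six round trips instead of two.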
The problem here is that if we look in our software monitoring tool, we see that there was a SocketException (Connection reset by peer: socket write error) in FileUtils.tar() on the server side, but no exception on the client side (which is the build agent). So the step does not fail, but (because of the SocketException) not all files are available. Another thing is that we get an IOException with the message "This archives contains unclosed entries." on line 816 in FileUtils.tar(). From past experience supporting QB, the connection reset issue is normally caused by server overload. Is this happening when the system is busy?
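The "unclosed entries" error suggests a tar entry was opened but the stream was cut before it was closed, which fits the connection-reset theory. A minimal stdlib sketch of that invariant (the class is hypothetical, loosely modeled on how archive writers such as Apache Commons Compress track open entries):

```java
import java.io.IOException;

public class EntryTrackingWriter {
    private boolean haveUnclosedEntry = false;

    // Begins a new archive entry; real writers emit the entry header here.
    void putEntry(String name) {
        haveUnclosedEntry = true;
    }

    // Ends the current entry; real writers pad the data to block size here.
    void closeEntry() {
        haveUnclosedEntry = false;
    }

    // Finishing while an entry is still open indicates a truncated write,
    // e.g. the connection was reset before the entry body completed.
    void finish() throws IOException {
        if (haveUnclosedEntry) {
            throw new IOException("archive contains unclosed entries");
        }
    }
}
```

If the SocketException interrupts the tar mid-entry and finish() is still attempted during cleanup, an error like the one quoted above is exactly what you would expect to see on the server side.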
I think that is the case, as the system is always busy. It is just weird that the build does not fail on checkout but fails afterwards; on a connection reset there should be an exception of some kind, or am I wrong?
And the connection reset is (as far as I understood it) on the agent side, while the server complains with "Connection reset by peer". Not sure why the agent node is reporting this. In my experiments, this error is reported if I disconnect the agent while artifacts are being transferred. Does this issue always happen for a specific configuration or a specific agent, or randomly across different configurations and agents?
It seems that this happens for different configurations and agents, but the checkout is the same. At least we know about this checkout, because we make some changes to the files right after the checkout in a post-execute action.
The server complains about a connection loss while the agent just thinks the checkout has finished successfully. It is somewhat weird, and we were not able to reproduce it on purpose. I'd suggest putting artifacts on some agent instead of the server, to reduce server load, and seeing if the problem still arises. Artifacts can be configured to be stored on an agent via the advanced settings of a configuration.
|
As to the file-not-being-retrieved issue itself, could it be possible that the file does not exist yet at the time of retrieval? This can happen if some other builds/processes are touching artifacts of the source build while they are being retrieved.
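One way to at least detect the race described above is to snapshot a file's size and modification time before transferring it and compare afterwards. This is a hypothetical helper, not part of QB, and it can only flag a concurrent write, not prevent it:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.attribute.FileTime;

public class StableCopy {
    // Copies src to dst and returns true if src's size and mtime were
    // unchanged across the copy, i.e. no other process appeared to touch
    // the artifact while it was being retrieved.
    static boolean copyIfStable(Path src, Path dst) throws Exception {
        long sizeBefore = Files.size(src);
        FileTime mtimeBefore = Files.getLastModifiedTime(src);
        Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
        return Files.size(src) == sizeBefore
                && Files.getLastModifiedTime(src).equals(mtimeBefore);
    }
}
```

A retrieval step that logs a warning when this check fails would make it obvious whether concurrent artifact writes are behind the intermittently missing files.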