|
|
|
Yes, I'm aware of this pattern behavior, but the files are published by the other build, so they exist on the server and should be matched by the pattern (as they are in other builds of the same configuration).
And if I understand that right: when two builds check out the same artifacts, will one (or maybe both) fail to check out the files? It will work even if multiple builds are checking out artifacts. There will only be a problem if some other build is writing artifacts of this build at the same time.
Is this problem happening intermittently or consistently? It happens intermittently, but it is still annoying.
Also, if we did not have this post-execute action, the checkout step would not fail, but the build would be doomed to fail anyway, because the files are missing in the steps executed afterwards. So sometimes we hold the resource/machine even though we could already know that these later steps are pointless (they will fail because not all files are there). Can you please send me [robin AT pmease DOT com] your database backup and let me know which configuration is in trouble? And which file is missing?
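The fail-fast idea described above can be sketched as a check run right after checkout. This is only an illustration, not QuickBuild's actual API: the class, method, and the list of expected files are all hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

public class CheckoutVerifier {
    // Returns the subset of expected files that are missing from the
    // workspace after checkout, so the build can fail immediately
    // instead of holding the machine until a later step fails.
    static List<Path> findMissing(Path workspace, List<String> expected) {
        return expected.stream()
                .map(workspace::resolve)
                .filter(p -> !Files.isRegularFile(p))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws Exception {
        Path ws = Files.createTempDirectory("ws");
        Files.createFile(ws.resolve("present.jar"));
        List<Path> missing = findMissing(ws, List.of("present.jar", "absent.jar"));
        if (!missing.isEmpty()) {
            // Failing here releases the agent right away.
            System.out.println("Missing after checkout: " + missing);
        }
    }
}
```

Running such a check in a post-execute action (as the user does) has the same effect: the step fails at checkout time rather than several steps later.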
As our DB dump is too big, we did some further investigation with application monitoring instead. We have 8 file retrieval patterns in our QB repository, but only the first pattern is used. All other file retrievals in the list are skipped (presumably because some error occurs).
I took a deeper look at the code where this artifact download is made and discovered some potential problems (also with respect to performance). As I understand it, the highlighted code makes a call to the server on every loop iteration: it fetches the build object, and also fetches the artifact storage, more than once if more than one FileRetrieval is defined. My suggestion would be to refactor this method to fetch the dependencyBuild and the artifact storage object only once. This would need two loops, one for the remote server and one for the "own" server, with the IF directly after method entry:

downloadDependencies([...]) {
    if (getServer() != null) {
        for ([...]) {
            [...]
        }
    } else {
        Build dependencyBuild = [...]
        ArtifactStorage artifactStorage = [...]
        for ([...]) {
            [...]
        }
    }
}

Also, some more log output would be much appreciated, to see which FileRetrieval is currently in progress. We may then be able to investigate further why only the first pattern is used. Thanks for the insight. This snippet of code has been changed according to your suggestions, and debug logging for file retrieval has also been added. Changes are available in:
https://build.pmease.com/build/2700 To check the debug output, edit the configuration to use debug logging mode and then run the build. Debugging statements for file retrieval will then appear in the build log. This problem also occurs on 5.1.32.
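The performance point behind the suggested refactor can be illustrated with stub classes. Every name here is a hypothetical stand-in for QuickBuild's internals; the sketch only shows the "own server" branch, counting server round trips to make the effect visible.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class DownloadSketch {
    static final AtomicInteger serverCalls = new AtomicInteger();

    static class Build {}
    static class ArtifactStorage {}
    static class FileRetrieval {
        final String pattern;
        FileRetrieval(String p) { pattern = p; }
    }

    // Stand-ins for the expensive per-iteration server calls.
    static Build fetchDependencyBuild() { serverCalls.incrementAndGet(); return new Build(); }
    static ArtifactStorage fetchArtifactStorage() { serverCalls.incrementAndGet(); return new ArtifactStorage(); }

    // Refactored version: both objects are fetched once, before the loop,
    // instead of once per configured FileRetrieval.
    static void downloadDependencies(List<FileRetrieval> retrievals) {
        Build dependencyBuild = fetchDependencyBuild();
        ArtifactStorage storage = fetchArtifactStorage();
        for (FileRetrieval r : retrievals) {
            // ... retrieve files matching r.pattern using
            //     dependencyBuild and storage ...
        }
    }

    public static void main(String[] args) {
        downloadDependencies(List.of(new FileRetrieval("a/**"),
                new FileRetrieval("b/**"), new FileRetrieval("c/**")));
        // Two server calls regardless of how many retrievals are configured.
        System.out.println("server calls: " + serverCalls.get());
    }
}
```

With the original per-iteration fetches, three retrievals would have cost six round trips instead of two.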
The problem here is that if we look in our software monitoring tool, we see that there was a SocketException (Connection reset by peer: socket write error) in FileUtils.tar() on the server side, but no exception on the client side (which is the build agent). So the step does not fail, but (because of the SocketException) not all files are available. Another thing is that we get an IOException with the message "This archives contains unclosed entries." on line 816 in FileUtils.tar(). From past experience supporting QB, the connection reset issue is normally caused by server overload. Is this happening when the system is busy?
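The "unclosed entries" error suggests a tar entry was opened but the stream was cut before it was closed, which fits the connection-reset theory. A minimal stdlib sketch of that invariant (the class is hypothetical, loosely modeled on how archive writers such as Apache Commons Compress track open entries):

```java
import java.io.IOException;

public class EntryTrackingWriter {
    private boolean haveUnclosedEntry = false;

    // Begins a new archive entry; real writers emit the entry header here.
    void putEntry(String name) {
        haveUnclosedEntry = true;
    }

    // Ends the current entry; real writers pad the data to block size here.
    void closeEntry() {
        haveUnclosedEntry = false;
    }

    // Finishing while an entry is still open indicates a truncated write,
    // e.g. the connection was reset before the entry body completed.
    void finish() throws IOException {
        if (haveUnclosedEntry) {
            throw new IOException("archive contains unclosed entries");
        }
    }
}
```

If the SocketException interrupts the tar mid-entry and finish() is still attempted during cleanup, an error like the one quoted above is exactly what you would expect to see on the server side.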
I think that is the case, as the system is always busy. It is just weird that the build does not fail on checkout but fails afterwards; on a connection reset there should be an exception of some kind, or am I wrong?
And the connection reset is (as far as I understood it) on the agent side, while the server complains with "Connection reset by peer". Not sure why the agent node is reporting this. In my experiments, this error is reported if I disconnect the agent while artifacts are being transferred. Does this issue always happen for a specific configuration or a specific agent, or randomly across different configurations and agents?
It seems that this happens for different configurations and agents, but the checkout is the same. At least we know about this checkout, because we make some changes to the files right after the checkout in a post-execute action.
The server complains about a connection loss while the agent just thinks the checkout has finished successfully. It is somewhat weird, and we were not able to reproduce it on purpose. I'd suggest putting artifacts on some agent instead of the server, to reduce server load, and seeing if the problem still arises. Artifacts can be configured to be stored on an agent via the advanced settings of a configuration.
|
As to the file-not-being-retrieved issue itself, could it be possible that the file does not exist yet at the time of retrieval? This can happen if some other builds/processes are touching artifacts of the source build while they are being retrieved.
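One way to at least detect the race described above is to snapshot a file's size and modification time before transferring it and compare afterwards. This is a hypothetical helper, not part of QB, and it can only flag a concurrent write, not prevent it:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.attribute.FileTime;

public class StableCopy {
    // Copies src to dst and returns true if src's size and mtime were
    // unchanged across the copy, i.e. no other process appeared to touch
    // the artifact while it was being retrieved.
    static boolean copyIfStable(Path src, Path dst) throws Exception {
        long sizeBefore = Files.size(src);
        FileTime mtimeBefore = Files.getLastModifiedTime(src);
        Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
        return Files.size(src) == sizeBefore
                && Files.getLastModifiedTime(src).equals(mtimeBefore);
    }
}
```

A retrieval step that logs a warning when this check fails would make it obvious whether concurrent artifact writes are behind the intermittently missing files.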