Can you please upgrade your prod server to the build below?
https://build.pmease.com/build/2983/overview

I simplified multi-file transferring to use exactly the same mechanism as single-file transferring (except for the additional tar/untar); previously there was some logic to send keep-alive bits while tar/untar was ongoing.

Ok. I tried with a separate server instance to be able to track everything better, because with 100+ agents connected the debug log gets very noisy. So I ran the copy job from a newly set up server on a Windows Server 2012 R2 machine.
The copy task (without TAR_TO_DISK) suddenly took 10 seconds. I switched the node back to the production QB server -> again 45 minutes. This drove me nuts. So I ran the copy task from the production QB server to an Ubuntu 12 node -> 12 seconds. Then I started modifying the Groovy script like this:

groovy:
def source = grid.getNode('<node>:8811')
grid.transferFiles(source, '/tmp/multifile', 'aixtest.tar', node, '/tmp/multifile', false, null, null);

I tried the script with different source nodes:

Node running Windows Server 2003 -> ~45 minutes
Server running Windows Server 2008 -> ~45 minutes
Node running Windows Server 2012 R2 -> ~10 seconds
Node running Ubuntu 12 -> ~10 seconds

We also tested disabling delayed ACK on AIX, and suddenly it only takes about 2 minutes from the server running Windows Server 2008. So this seems to be a strange communication issue between Windows and AIX. Disabling delayed ACK is also not recommended, because it produces heavy network load. Maybe you know more or have some suggestions. The interesting thing is that only the multi-file transfer using the TAR functionality is that slow.

I overlooked that message. Using this option on the CopyFilesJob side is enough. Please keep this option enabled on the file receiving side, and edit the file "conf/log4j.properties" on both the file receiving and sending sides to add the log statement below:
log4j.logger.com.pmease.quickbuild.grid=DEBUG

Then wait a minute for this file to take effect, and run the transferFiles script. QB will log file transferring timings in the system log of both the file receiving and sending sides, something like below:

At file receiving side:
Downloading files ...
Finished receiving files...
Finished extracting files...

At file sending side:
Serving files...
Finished archiving files...
Finished sending files...

Please post the exact log statements so that I can check the timing.

Does this really have any effect on the server when I set this system property? I took a look at CopyFilesJob, and the option seems to only be used on the receiver side. And as I already said in the issue description:
* checkout on agent with TAR_TO_DISK system property set to true: 15 minutes

So this option speeds up the transfer by about 75%. I can do the test you've suggested, but setting the system property on the server side is a bit difficult: I would have to start a separate server for that, because restarting the server is not easy with 130 builds currently running. So basically my main question here is whether the setting is really needed on the server side.

This seems odd. For the script below, can you please turn on the option TAR_TO_DISK (defined as a system property via JVM -D in conf/wrapper.conf) on both the server node and the transfer target node, and test the script to see how it performs?
grid.transferFiles(grid.serverNode, 'C:/aixfull', 'aixtest.tar', node, '/tmp/multifile', false, null, null);

Now I also tested the slow checkout again... around 40 minutes for the checkout. But I had the idea of testing with the server option in the QuickBuild repository:
So I checked the checkbox for "Server" and added the same QB server used before, just to force QB into using the code for "remote server". The checkout took about 5 minutes.

I ran the script on the AIX node and it took about 20 seconds.
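For context: transferFiles pipes a tar stream from a producer thread into a consumer thread, which is what the Groovy test below exercises locally. This is a minimal, self-contained Java sketch of that piped producer/consumer pattern; the plain byte payload is only a stand-in for QB's FileUtils.tar/untar calls, not their actual implementation.

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;

public class PipeDemo {
    public static void main(String[] args) throws Exception {
        PipedInputStream in = new PipedInputStream();
        PipedOutputStream out = new PipedOutputStream(in);
        StringBuilder received = new StringBuilder();

        // Producer thread: stands in for FileUtils.tar() writing archive bytes.
        Thread producer = new Thread(() -> {
            try {
                out.write("archive-bytes".getBytes("UTF-8"));
                out.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });

        // Consumer thread: stands in for FileUtils.untar() reading the stream.
        Thread consumer = new Thread(() -> {
            try {
                int b;
                while ((b = in.read()) != -1) {
                    received.append((char) b);
                }
                in.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });

        producer.start();
        consumer.start();
        producer.join();
        consumer.join();
        System.out.println(received);  // prints archive-bytes
    }
}
```

Because both ends run in the same JVM, a test like this isolates the tar/untar pipeline from the network entirely.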
I also suspected that maybe it is not the tar. So I tested tar and untar together with this script:

groovy:
def inStream = new java.io.PipedInputStream()
def outStream = new java.io.PipedOutputStream(inStream)
def t1 = Thread.start {
    com.pmease.quickbuild.util.FileUtils.tar(new File("/tmp/multifile"), "aixtest.tar", false, outStream)
    outStream.close()
}
def t2 = Thread.start {
    com.pmease.quickbuild.util.FileUtils.untar(inStream, false, new File("/tmp/destdir"))
    inStream.close()
}
t1.join()
t2.join()

This also took about 20 seconds. So the problem seems to be multi-file transfer over the network with this tar mechanism. I will test the tar functionality on the QuickBuild server to see how it behaves there.

Thanks for your detailed test. When transferFiles is called, QB has to tar all matched files (even if only one file matches). Looks like the tar utility used by QB is causing the issue. Please run the test below on the server to verify that this is indeed the case:
groovy:
OutputStream os = new FileOutputStream("/temp/multifile");
com.pmease.quickbuild.util.FileUtils.tar(new File("C:/aixfull"), "aixtest.tar", false, os);
os.close();

If yes, please turn off any anti-virus software, or change "/temp/multifile" to a different drive to see if it speeds things up.

I did some more testing. What I did:
First test:
1. Use grid.transferFiles to transfer the artifacts from the server to the AIX node as they are published at the dependency build.
2. Use grid.transferFiles to transfer the same artifacts, but prepackaged into 5 tar files of approximately the same size (also adding up to about 500 MB).

Both times the transfer took around 45 minutes.

Second test:
1. Tar all files into one.
2. Use the following script:

groovy:
logger.info('transferFile')
grid.transferFile(grid.serverNode, 'C:/aixfull/aixtest.tar', node, '/tmp/singlefile', null, null);
logger.info('transferFiles')
grid.transferFiles(grid.serverNode, 'C:/aixfull', 'aixtest.tar', node, '/tmp/multifile', false, null, null);

Transferring the file via transferFile takes about 10 seconds. Transferring the file via transferFiles again takes around 45 minutes. So the file scanning does not seem to be the problem. You can also see the transfer rate in the attached screenshot (networkperformance.png). The red marked range is the single-file transfer; after that the transferFiles method started. As you can see, the rate is about 150 Mbit/s with transferFile and about 2 Mbit/s with transferFiles. I could also do the tests the other way round, transferring from the AIX node to the server, if needed. For now I only tested the direction server -> AIX.
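As an aside on the measurements above: a ~2 Mbit/s stream that jumps to normal speed once delayed ACK is disabled on the receiver is the classic symptom of the Nagle/delayed-ACK interaction, where a sender issuing many small writes stalls waiting for ACKs. The usual sender-side counterpart is disabling Nagle's algorithm via TCP_NODELAY. This is only an illustrative sketch of how that socket option is set in Java, not QuickBuild's actual code:

```java
import java.net.Socket;

public class NoDelayDemo {
    public static void main(String[] args) throws Exception {
        // Illustrative only: TCP_NODELAY disables Nagle's algorithm so
        // small writes are sent immediately instead of being held back
        // waiting for the peer's (possibly delayed) ACK.
        Socket socket = new Socket();
        socket.setTcpNoDelay(true);
        System.out.println(socket.getTcpNoDelay());  // prints true
        socket.close();
    }
}
```

Note that batching writes into larger chunks (buffering) avoids the stall without either TCP tweak, which is why the buffering option discussed later matters.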
Thanks for doing the test. Looks like file transferring itself is not the problem. Can you please modify your script a bit so that it scans/tars and transfers the slowly downloading artifacts on AIX?
def aixNode = grid.getNode('aixnode:8811')
grid.transferFiles(aixNode, "/absolute/path/to/artifacts", "**", node, "C:/temp", false, null, null);

If it is slow, then limit "/absolute/path/to/artifacts" to a small folder containing only several large files to see if it performs better.

I tried two scenarios. The file size was always about 500 MB (zip file).
Running a build on the server with the script:

def aixNode = grid.getNode('aixnode:8811')
grid.transferFile(aixNode, "/opt/test.zip", node, "C:/test.zip", null, null)
grid.transferFile(node, "C:/test.zip", aixNode, "/tmp/test2.zip", null, null)

and running a build on AIX with the script:

grid.transferFile(node, "/opt/test.zip", grid.serverNode, "C:/test.zip", null, null)
grid.transferFile(grid.serverNode, "C:/test.zip", node, "/tmp/test2.zip", null, null)

Both ways the entire script took about 20 seconds.

The pattern will be matched on the node storing the artifacts. Can you please write a script running on the checkout node to transfer a single file from AIX to see how it performs?
groovy:
grid.transferFile(aixNode, "/path/to/a/large/file", node, "/path/to/local/file", null, null);

How are the "source path" and "file patterns" specified? Sometimes scanning files can be very slow when a directory containing many files is specified.
Oh... I missed the last parameter. After base64auth there is an additional parameter <useBuffered>, which has to be true or false and indicates whether BufferedInputStream and BufferedOutputStream should be used.
Also, CPU and memory usage are OK during the download.
Thanks for your help.