|
|
|
[
Permlink
| « Hide
]
Piotr Jedrys [24/Jan/18 11:58 AM]
Finally found it - we get 502 response from QB server.
jvm 1 | 2018-01-23 03:07:05,882 WARN - Error connecting task node 'qb:8810', will cancel running job...
jvm 1 | com.caucho.hessian.client.HessianConnectionException: 502: java.io.IOException: Server returned HTTP response code: 502 for URL: http://qb/service/node jvm 1 | at com.caucho.hessian.client.HessianURLConnection.sendRequest(HessianURLConnection.java:165) jvm 1 | at com.caucho.hessian.client.HessianProxy.sendRequest(HessianProxy.java:300) jvm 1 | at com.caucho.hessian.client.HessianProxy.invoke(HessianProxy.java:171) jvm 1 | at com.sun.proxy.$Proxy47.isGridJobActive(Unknown Source) jvm 1 | at com.pmease.quickbuild.grid.GridJob$1.run(GridJob.java:85) jvm 1 | at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1152) jvm 1 | at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622) jvm 1 | at java.lang.Thread.run(Thread.java:748) jvm 1 | Caused by: java.io.IOException: Server returned HTTP response code: 502 for URL: http://qb/service/node jvm 1 | at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) jvm 1 | at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) jvm 1 | at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) jvm 1 | at java.lang.reflect.Constructor.newInstance(Constructor.java:526) jvm 1 | at sun.net.www.protocol.http.HttpURLConnection$7.run(HttpURLConnection.java:1725) jvm 1 | at sun.net.www.protocol.http.HttpURLConnection$7.run(HttpURLConnection.java:1723) jvm 1 | at java.security.AccessController.doPrivileged(Native Method) jvm 1 | at sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1721) jvm 1 | at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1292) jvm 1 | at com.caucho.hessian.client.HessianURLConnection.sendRequest(HessianURLConnection.java:145) jvm 1 | ... 7 more jvm 1 | Caused by: java.io.IOException: Server returned HTTP response code: 502 for URL: http://qb/service/node jvm 1 | at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1676) jvm 1 | at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:468) jvm 1 | at com.caucho.hessian.client.HessianURLConnection.sendRequest(HessianURLConnection.java:132) jvm 1 | ... 7 more Can you explain why server returned 502. What should we check/fix This status code seems odd. Can you please check QB server log to see if there is any errors relating to this during this time period?
Only a warning with this :
2018-01-23 03:07:05,949 [pool-1-thread-14980] WARN com.pmease.quickbuild.grid.GridTaskFuture - Job still exists on job node and cancel command is issued (job id: f27d88f1-7bfd-42cd-a934-21c6143b9b70, bu ild id: 10140887, job node: agent:8811)... 2018-01-23 03:07:05,987 [pool-1-thread-15517] WARN com.pmease.quickbuild.grid.GridTaskFuture - Job still exists on job node and cancel command is issued (job id: 2b29eb96-1581-474a-bd6f-4e99e02e3b78, bu ild id: 10140889, job node: agent:8811)... 2018-01-23 03:07:06,828 [qtp536855282-165183] WARN org.eclipse.jetty.servlet.ServletHandler - /service/server com.pmease.quickbuild.AccessDeniedException: Access denied for connection from '172.18.0.4'. at com.pmease.quickbuild.security.SecurityHelper.checkAgentToken(SecurityHelper.java:757) at com.pmease.quickbuild.grid.GridServlet.service(GridServlet.java:34) at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:812) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1669) at com.pmease.quickbuild.Quickbuild$DisableTraceFilter.doFilter(Quickbuild.java:1076) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:221) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) at org.eclipse.jetty.server.Server.handle(Server.java:499) at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311) at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:258) at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555) at java.lang.Thread.run(Thread.java:748) Do you have any form of proxy between QB server and agent?
yes, there is an nginx set up as a proxy (no buffering enabled on it though)
Please check at Nginx side as it seems that the 405 message is not reported by QB server.
No 405 messages in nginx . At that time only these:
10.117.225.47 - - [23/Jan/2018:03:06:33 +0000] "POST /service/connect HTTP/1.1" 200 219 "-" "Java/1.7.0_121" "-" 10.117.225.50 - - [23/Jan/2018:03:06:33 +0000] "POST /service/server HTTP/1.1" 200 73 "-" "Java/1.7.0_121" "-" 10.66.205.31 - - [23/Jan/2018:03:07:05 +0000] "POST /service/node HTTP/1.1" 502 173 "-" "Java/1.7.0_151" "-" 10.66.110.77 - - [23/Jan/2018:03:07:05 +0000] "POST /service/server HTTP/1.1" 502 173 "-" "Java/1.7.0_121" "-" 10.66.185.224 - - [23/Jan/2018:03:07:05 +0000] "POST /service/node HTTP/1.1" 502 173 "-" "Java/1.7.0_121" "-" 10.117.225.50 - - [23/Jan/2018:03:07:05 +0000] "POST /service/node HTTP/1.1" 502 173 "-" "Java/1.7.0_121" "-" 10.66.225.242 - - [23/Jan/2018:03:07:05 +0000] "POST /service/node HTTP/1.1" 502 173 "-" "Java/1.7.0_121" "-" 10.66.104.181 - - [23/Jan/2018:03:07:05 +0000] "POST /service/server HTTP/1.1" 502 173 "-" "Java/1.7.0_121" "-" 10.66.193.118 - - [23/Jan/2018:03:07:05 +0000] "POST /service/server HTTP/1.1" 502 173 "-" "Java/1.7.0_121" "-" 10.66.152.89 - - [23/Jan/2018:03:07:05 +0000] "POST /service/node HTTP/1.1" 502 173 "-" "Java/1.7.0_151" "-" 10.111.8.183 - - [23/Jan/2018:03:07:05 +0000] "GET /wicket/page?38-1.IBehaviorListener.0-&action=ajaxRefresh&random=0.10830004480240407 HTTP/1.1" 502 575 "http://qb/build/10139551/junit_report" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36" "-" 10.66.127.110 - - [23/Jan/2018:03:07:05 +0000] "POST /service/server HTTP/1.1" 502 173 "-" "Java/1.7.0_121" "-" 10.66.104.173 - - [23/Jan/2018:03:07:05 +0000] "POST /service/node HTTP/1.1" 502 173 "-" "Java/1.7.0_121" "-" 10.66.104.173 - - [23/Jan/2018:03:07:05 +0000] "POST /service/connect HTTP/1.1" 200 219 "-" "Java/1.7.0_121" "-" 10.66.193.128 - - [23/Jan/2018:03:07:05 +0000] "POST /service/connect HTTP/1.1" 200 219 "-" "Java/1.7.0_121" "-" This seems quite odd. Are you able to reproduce this issue?
It is a random issue on our production server, (I cannot reproduce it locally), which seems like a problem of overloaded QB. Any suggestions?
If this happens when server is overloaded, please reduce server load to see if the situation is better. Several ways to reduce server load:
1. make sure all master step runs on build agents 2. if possible, manage artifacts on a build agent by using "store artifacts on specified agent" on advanced setting of the root configuration We already had both this points done.
All steps run on build agents and we store artifacts on specified agents not on master. How many build agents do you have and how many builds are running at peak time?
120 agents, and from what I could see around 50-60 builds (but that is hard to measure, maybe you know exact command how we can check how many builds are running)
You may check the queue to see how many of them are running when this issue happens. I can not find any clue why this happens. Can you please telnet against the server port from the problematic agent to see if there are any network connection issues to server?
|