History | Log In     View a printable version of the current page.  
Issue Details (XML | Word | Printable)

Key: QB-3628
Type: Bug Bug
Status: Closed Closed
Resolution: Won't Fix
Priority: Major Major
Assignee: Robin Shen
Reporter: Yoongeon Lee
Votes: 0
Watchers: 0

If you were logged in you would be able to see more operations.

Regarding QB-3627, the read timeout issue occurs only in the node where the cron scheduler is configured.

Created: 15/Oct/20 08:50 AM   Updated: 18/Oct/20 05:29 AM
Component/s: None
Affects Version/s: 10.0.24
Fix Version/s: None

Original Estimate: Unknown Remaining Estimate: Unknown Time Spent: Unknown
File Attachments: 1. Zip Archive logs - Copy.zip (52 kb)

Image Attachments:

1. 2020-10-15-02-52 PM - Cron Schedule Build.png
(126 kb)

2. 2020-10-15-02-52 PM - Timed out.png
(146 kb)

3. 2020-10-15-02-52 PM.png
(474 kb)

4. KOR-BUILD26 - Could not send reponse error 500.png
(195 kb)

 Description  « Hide

Hi, Robin

Since the timeout issue ( QB-3627) , I have been working on the following tasks.

1. Deleted quite a few old logs on each product node. ( that's because some node has a lot of build histories )
2. Paused the Cron Schedule build set on many nodes and changed the Build Condition from "if changes found in referenced repositories" to "always build".
This is because when the build condition is set to "if changes found in referenced repositories", it seems to be making more network traffic.
(It compares the latest commit & files on remote repository of github with the latest ones on the build agent. )

There was no timeout issue on the node that did the above two tasks.
Although this is not a perfect solution, I think above things I did can reduce timeout issues at least.

And I enabled cron schedule build on only one node. the read timeout issue occurs only in the node where the cron scheduler is configured.
Please see attached screenshots and log files from build agent machine.

Take a closer look at what more we need to do. this issue bother us a lot.

 All   Comments   Work Log   Change History      Sort Order:
Yoongeon Lee [15/Oct/20 10:01 AM]

The important thing is that a jetty eof exception error occurs before a timeout issue occurs. Please take a look at the log below.

jvm 2 | javax.servlet.ServletException: org.eclipse.jetty.io.EofException
jvm 2 | at com.caucho.hessian.server.HessianServlet.service(HessianServlet.java:385)
jvm 2 | at com.pmease.quickbuild.grid.GridServlet.service(GridServlet.java:36)
jvm 2 | at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:812)
jvm 2 | at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1669)
jvm 2 | at com.pmease.quickbuild.Quickbuild$DisableTraceFilter.doFilter(Quickbuild.java:1186)
jvm 2 | at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
jvm 2 | at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
jvm 2 | at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:221)
jvm 2 | at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
jvm 2 | at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
jvm 2 | at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
jvm 2 | at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
jvm 2 | at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
jvm 2 | at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
jvm 2 | at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
jvm 2 | at org.eclipse.jetty.server.Server.handle(Server.java:499)
jvm 2 | at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311)
jvm 2 | at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:258)
jvm 2 | at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544)
jvm 2 | at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
jvm 2 | at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
jvm 2 | at java.lang.Thread.run(Unknown Source)
jvm 2 | Caused by: org.eclipse.jetty.io.EofException
jvm 2 | at org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:192)
jvm 2 | at org.eclipse.jetty.io.WriteFlusher.flush(WriteFlusher.java:408)
jvm 2 | at org.eclipse.jetty.io.WriteFlusher.write(WriteFlusher.java:302)
jvm 2 | at org.eclipse.jetty.io.AbstractEndPoint.write(AbstractEndPoint.java:129)
jvm 2 | at org.eclipse.jetty.server.HttpConnection$SendCallback.process(HttpConnection.java:724)
jvm 2 | at org.eclipse.jetty.util.IteratingCallback.processing(IteratingCallback.java:246)
jvm 2 | at org.eclipse.jetty.util.IteratingCallback.iterate(IteratingCallback.java:208)
jvm 2 | at org.eclipse.jetty.server.HttpConnection.send(HttpConnection.java:472)
jvm 2 | at org.eclipse.jetty.server.HttpChannel.sendResponse(HttpChannel.java:756)
jvm 2 | at org.eclipse.jetty.server.HttpChannel.write(HttpChannel.java:794)
jvm 2 | at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:155)
jvm 2 | at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:148)
jvm 2 | at org.eclipse.jetty.server.HttpOutput.flush(HttpOutput.java:250)
jvm 2 | at com.caucho.hessian.io.Hessian2Output.flush(Hessian2Output.java:1615)
jvm 2 | at com.caucho.hessian.io.Hessian2Output.close(Hessian2Output.java:1654)
jvm 2 | at com.caucho.hessian.server.HessianSkeleton.invoke(HessianSkeleton.java:201)
jvm 2 | at com.caucho.hessian.server.HessianServlet.invoke(HessianServlet.java:399)
jvm 2 | at com.caucho.hessian.server.HessianServlet.service(HessianServlet.java:379)
jvm 2 | ... 21 more
jvm 2 | Caused by: java.io.IOException: An existing connection was forcibly closed by the remote host
jvm 2 | at sun.nio.ch.SocketDispatcher.write0(Native Method)
jvm 2 | at sun.nio.ch.SocketDispatcher.write(Unknown Source)
jvm 2 | at sun.nio.ch.IOUtil.writeFromNativeBuffer(Unknown Source)
jvm 2 | at sun.nio.ch.IOUtil.write(Unknown Source)
jvm 2 | at sun.nio.ch.SocketChannelImpl.write(Unknown Source)
jvm 2 | at org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:170)
jvm 2 | ... 38 more
jvm 2 | 2020-10-14 21:18:42,534 WARN Could not send response error 500: javax.servlet.ServletException: javax.servlet.ServletException: org.eclipse.jetty.io.EofException
jvm 2 | 2020-10-14 21:23:43,431 WARN

Robin Shen [15/Oct/20 10:24 AM]
Please try below things:
1. Disable virus scan software on QB server and problematic agent if there is any
2. Reduce server load by pausing other schedules, and leave only the schedule on the problematic agent to see if it still fails
3. If the build generates large log file, change the log level to WARN in general setting of relevant configurations

Yoongeon Lee [18/Oct/20 05:23 AM]
Hi, Robin

We have added ssl encryption to the build agent, referring to the link below, and confirmed that the issue disappeared.
- https://wiki.pmease.com/display/QB10/Protect+QuickBuild+using+SSL
-> "Enable https access to agents" section.

All of the following issues have been resolved. Thank you so far.
QB-3617 There are two types error ( Read timed out and Error writing to server errors) that occur in many nodes of our quick build.
QB-3627 Two big issues occur now 1) Read timed out 2) Unable to find job
QB-3628 Regarding QB-3627, the read timeout issue occurs only in the node where the cron scheduler is configured.

Robin Shen [18/Oct/20 05:29 AM]
Thanks for the info. Seems that some security software discards QB packets for some reason, and encryption protected the packets.