[QB-3650] There was a deadlock in QuickBuild that caused the memory spike
Status: | Closed |
Project: | QuickBuild |
Component/s: | None |
Affects Version/s: | 9.0.46 |
Fix Version/s: | None |
Type: | Bug | Priority: | Major |
Reporter: | Phong Trinh | Assigned To: | Robin Shen |
Resolution: | Incomplete | Votes: | 0 |
Remaining Estimate: | Unknown | Time Spent: | Unknown |
Original Estimate: | Unknown |
File Attachments: | QB Server - DB Config.PNG QuickBuild_Server_Deadlock.txt |
Description |
Recently our QuickBuild server went down by itself several times. We checked the server logs and found a deadlock, as follows, which caused the memory spike. We set wrapper.java.maxmemory=4096, which I think should be sufficient. I have attached the server log file for your reference.
=======================================================================================================================================
2020-12-17 10:17:22,470 [C3P0PooledConnectionPoolManager[identityToken->z8kfltaesj634k1iyllxx|58cae5cb)-AdminTaskTimer] WARN com.mchange.v2.async.ThreadPoolAsynchronousRunner - com.mchange.v2.async.ThreadPoolAsynchronousRunner$DeadlockDetector@44547507 -- APPARENT DEADLOCK!!! Creating emergency threads for unassigned pending tasks!
2020-12-17 10:20:56,673 [pool-1-thread-21150] ERROR com.pmease.quickbuild.grid.GridJob - Error notifying task node of job finishing (job class: 86efdbe6-673b-4b9b-8d92-373f930e71aa, job id: com.pmease.quickbuild.stepsupport.StepProcessJob, task node: QB_SERVER.com:8810)
java.lang.OutOfMemoryError: Java heap space
=======================================================================================================================================
Please look into the issue and advise. Thank you in advance, ptrinh |
Comments |
Comment by Robin Shen [ 18/Dec/20 01:49 AM ] |
When the OutOfMemory error occurs, please get me a memory dump by running the below command as the same user running the QB server process:
/path/to/jdk/bin/jmap -dump:format=b,file=/path/to/dump <QB server JVM process id>
Then please upload the memory dump to Dropbox and send me the link |
Comment by Phong Trinh [ 22/Dec/20 03:14 PM ] |
It occurred again today, but we forgot to take the memory dump before rebooting the server. We will do that next time.
Thanks, Robin |
Comment by Robin Shen [ 22/Dec/20 11:17 PM ] |
In the meantime, please get a stack trace of the server when it happens:
/path/to/jdk/bin/jstack <QB server JVM process id> |
Comment by Phong Trinh [ 13/Jan/21 03:54 AM ] |
The server was down again on 12/24/2020 because the Java heap ran out of memory. Unfortunately the system admin was not available to take the memory dump at that time. The issue has not occurred again since that day. I looked into the logs, and it seems that a number of build servers had network hiccups or went down in the middle of builds, and the server kept trying to connect to them. That might have caused the deadlock shown in the log file, and the deadlock consumed memory and brought the server down. What do you think?
We are still keeping an eye on the server and build servers, and will take a memory dump as you advised if the issue occurs again. Thanks, ptrinh |
Comment by Robin Shen [ 13/Jan/21 04:29 AM ] |
How many database connections have you configured in conf/hibernate.properties? |
Comment by Phong Trinh [ 13/Jan/21 03:23 PM ] |
We are using the PostgreSQL database. The number of database connections is configured as 25, as shown in the attached file. Thanks |
Comment by Robin Shen [ 13/Jan/21 11:16 PM ] |
Please increase it to 50 if you run many builds per day. |
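(Editor's note: for reference, the pool size change discussed here would look something like the fragment below in conf/hibernate.properties. The exact property name is an assumption based on the standard Hibernate c3p0 settings, which the C3P0PooledConnectionPoolManager entries in the attached log suggest QuickBuild uses; check your existing file for the property actually present before editing.)

```properties
# conf/hibernate.properties (assumed property name; verify against your file)
# Maximum number of pooled JDBC connections, raised from 25 to 50 as advised
hibernate.c3p0.max_size=50
```

A QuickBuild server restart is typically required for changes to this file to take effect.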
Comment by Phong Trinh [ 14/Jan/21 03:20 AM ] |
Thanks, Robin. We are going to increase it to 50 as you advised, since there are many builds to run per day. |
Comment by Phong Trinh [ 18/Jan/21 12:58 AM ] |
We made the change to our server yesterday. Is there any reason why you think the database connection configuration might cause the issue? Thanks. |
Comment by Robin Shen [ 18/Jan/21 01:03 AM ] |
Increasing the number of database connections may not solve the issue completely, but it should reduce the likelihood of it occurring, especially when the site is busy. |