
Key: QB-3650
Type: Bug
Status: Open
Priority: Major
Assignee: Robin Shen
Reporter: Phong Trinh
Votes: 0
Watchers: 0
QuickBuild

There was a deadlock in QuickBuild that caused the memory spike

Created: 17/Dec/20 11:04 PM   Updated: 18/Jan/21 01:03 AM
Component/s: None
Affects Version/s: 9.0.46
Fix Version/s: None

Original Estimate: Unknown Remaining Estimate: Unknown Time Spent: Unknown
File Attachments: 1. Text File QuickBuild_Server_Deadlock.txt (168 kb)

Image Attachments:

1. QB Server - DB Config.PNG (105 kb)


 Description
 Recently our QuickBuild server went down by itself several times. We checked the server logs and found the deadlock shown below, which caused the memory spike. We set wrapper.java.maxmemory=4096, which I think should be sufficient. I attached the server log file for your reference.

 =======================================================================================================================================

 2020-12-17 10:17:22,470 [C3P0PooledConnectionPoolManager[identityToken->z8kfltaesj634k1iyllxx|58cae5cb)-AdminTaskTimer] WARN com.mchange.v2.async.ThreadPoolAsynchronousRunner - com.mchange.v2.async.ThreadPoolAsynchronousRunner$DeadlockDetector@44547507 -- APPARENT DEADLOCK!!! Creating emergency threads for unassigned pending tasks!
2020-12-17 10:20:56,673 [pool-1-thread-21150] ERROR com.pmease.quickbuild.grid.GridJob - Error notifying task node of job finishing (job class: 86efdbe6-673b-4b9b-8d92-373f930e71aa, job id: com.pmease.quickbuild.stepsupport.StepProcessJob, task node: QB_SERVER.com:8810)
    java.lang.OutOfMemoryError: Java heap space
=======================================================================================================================================

 Please look into the issue and advise.

 Thank you in advance,
ptrinh

Comments
Robin Shen [18/Dec/20 01:49 AM]
When the OutOfMemory error occurs, please get me a memory dump by running the command below as the same user that runs the QB server process:

/path/to/jdk/bin/jmap -dump:format=b,file=/path/to/dump <QB server JVM process id>

Then please upload the memory dump to Dropbox and send me the link.

Phong Trinh [22/Dec/20 03:14 PM]
It occurred again today, but we forgot to take the memory dump before rebooting the server. We will do that the next time it happens.
Thanks, Robin
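
Since the dump was missed before the reboot, one option is to let the JVM write a heap dump by itself when it runs out of memory. The fragment below is a sketch only: it assumes the server is launched through the Java Service Wrapper configuration that holds the wrapper.java.maxmemory setting mentioned in the description, and `<n>`, `<n+1>`, and /path/to/dumps are placeholders. The indices typically need to continue the existing wrapper.java.additional sequence without gaps.

```properties
# Assumption: added to the same wrapper configuration file that sets wrapper.java.maxmemory.
# Replace <n> and <n+1> with the next unused indices in the existing sequence.

# Write a heap dump automatically when an OutOfMemoryError is thrown.
wrapper.java.additional.<n>=-XX:+HeapDumpOnOutOfMemoryError

# Directory (or file) where the dump is written; needs free space of roughly the heap size.
wrapper.java.additional.<n+1>=-XX:HeapDumpPath=/path/to/dumps
```

With a 4096 MB maximum heap, the resulting file can be several gigabytes, so the target directory should have enough free space.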

Robin Shen [22/Dec/20 11:17 PM]
In the meantime, please get a stack trace of the server when it happens:
/path/to/jdk/bin/jstack <QB server JVM process id>
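
The two requests above (a thread dump and a heap dump) can be combined so that both are captured before the server is restarted. This is a minimal sketch, not a QuickBuild-provided tool: the function name capture_diagnostics and the variables JDK_BIN, OUT_DIR, and DRY_RUN are hypothetical, and it uses the JDK's jstack and jmap tools.

```shell
#!/bin/sh
# Sketch: capture a thread dump and a heap dump from a running JVM.
# JDK_BIN, OUT_DIR and DRY_RUN are hypothetical names used only in this example.
JDK_BIN="${JDK_BIN:-/path/to/jdk/bin}"
OUT_DIR="${OUT_DIR:-/tmp}"

capture_diagnostics() {
    pid="$1"
    stamp="$(date +%Y%m%d-%H%M%S)"
    stack_file="$OUT_DIR/qb-threads-$stamp.txt"
    heap_file="$OUT_DIR/qb-heap-$stamp.hprof"

    if [ -n "${DRY_RUN:-}" ]; then
        # Print the commands instead of running them.
        echo "$JDK_BIN/jstack $pid > $stack_file"
        echo "$JDK_BIN/jmap -dump:format=b,file=$heap_file $pid"
        return 0
    fi

    # Thread dump first: it is quick and shows which threads are blocked.
    "$JDK_BIN/jstack" "$pid" > "$stack_file" || return 1
    # Heap dump second: it can take a while and pauses the JVM while it runs.
    "$JDK_BIN/jmap" "-dump:format=b,file=$heap_file" "$pid" || return 1
}
```

Run it as the same user that runs the QB server process, for example `capture_diagnostics <QB server JVM process id>`. Setting DRY_RUN=1 first prints the commands without executing them.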

Phong Trinh [13/Jan/21 03:54 AM]
The server went down again on 12/24/2020 because the Java heap ran out of memory. Unfortunately the system admin was not available to take the memory dump at that time. The issue has not occurred again since that day. I looked into the logs, and it seems that a number of build servers had network hiccups or went down in the middle of builds, and the server kept trying to connect to them. That might have caused the deadlock shown in the log file, and the deadlock consumed the memory and brought the server down. What do you think?
 We are still keeping an eye on the server and build servers and will take a memory dump as you advised if the issue occurs again.

 Thanks,
ptrinh

Robin Shen [13/Jan/21 04:29 AM]
How many database connections have you configured in conf/hibernate.properties?

Phong Trinh [13/Jan/21 03:23 PM]
We are using a PostgreSQL database. The number of database connections is set to 25, as shown in the attached file. Thanks

Robin Shen [13/Jan/21 11:16 PM]
Please increase it to 50 if you have many builds to run per day.
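
For reference, the pool size discussed here would typically be a single line in conf/hibernate.properties. The fragment below is a sketch that assumes the standard Hibernate C3P0 key hibernate.c3p0.max_size (the log in the description shows C3P0 managing the pool). The exact key name in a given QuickBuild installation is an assumption and should be checked against the existing file.

```properties
# Assumption: the pool is sized through the standard Hibernate C3P0 setting.
# Previous value per the attached screenshot: 25
hibernate.c3p0.max_size=50
```

The PostgreSQL server's own max_connections limit has to be at least as large as the pool, plus whatever other clients need, and the QB server would normally need a restart for the new value to take effect.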

Phong Trinh [14/Jan/21 03:20 AM]
Thanks, Robin. We are going to increase it to 50 as you advised, since there are many builds to run per day.

Phong Trinh [18/Jan/21 12:58 AM]
We made the change to our server yesterday. Is there any reason you think the database connection configuration might have caused the issue? Thanks.

Robin Shen [18/Jan/21 01:03 AM]
Increasing database connections may not solve the issue completely, but it should reduce the likelihood of the issue, especially when the site is busy.