
Key: QB-3330
Type: Improvement
Status: Closed
Resolution: Cannot Reproduce
Priority: Major
Assignee: Robin Shen
Reporter: U. Artie Eoff
Votes: 0
Watchers: 0
QuickBuild

Resource Starvation

Created: 25/Jan/19 06:05 PM   Updated: 08/Apr/19 11:32 PM
Component/s: None
Affects Version/s: 8.0.33
Fix Version/s: None

Original Estimate: Unknown Remaining Estimate: Unknown Time Spent: Unknown
Environment: Linux


Description
I have many configurations that require a resource I've defined. I often observe that older queued jobs are starved of the resource when competing configurations are triggered later. All configurations have the same priority.

That is, say I have configurations A, B, and C, all of which require resource X. When A, B, and C are queued at the same time, only one of them acquires resource X and the others wait until it is released. However, if A, B, or C is queued again before the older queued jobs acquire the resource, some of the older queued jobs end up acquiring the resource after the newer ones.

For example, A, B, and C are queued at time 0; call them A0, B0, and C0. C0 acquires the resource, runs, and then releases it. Next, A0 acquires the resource (B0 is still waiting). While A0 is holding the resource, C gets queued again (call it C1). Then A0 releases the resource. Next, C1 acquires the resource and B0 is STILL waiting.

In extreme cases, I've observed that queued jobs wait for hours to acquire the resource because new competing jobs continue to get queued.

It seems the resource should be given to the oldest queued job when multiple jobs are competing for it.
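To make the observed ordering concrete, here is a rough Python sketch of the timeline above (purely illustrative, not QuickBuild code; the "newest first" policy is only my guess at what would explain the behavior):

    # Purely illustrative model of the timeline above (not QuickBuild code).
    # A single-instance resource "X" is handed to one waiting job at a time.

    def pick_next(waiting, policy):
        """Choose which waiting (job, queued_at) pair gets the freed resource."""
        if policy == "oldest":                        # serve the earliest-queued job
            return min(waiting, key=lambda j: j[1])
        return max(waiting, key=lambda j: j[1])       # "newest" reproduces what I see

    def simulate(policy):
        waiting = [("C0", 0), ("A0", 0), ("B0", 0)]   # all queued at time 0
        order, clock = [], 0
        while waiting:
            job = pick_next(waiting, policy)
            waiting.remove(job)
            order.append(job[0])
            clock += 1
            if job[0] == "A0":                        # C is re-queued while A0 runs
                waiting.append(("C1", clock))
        return order

    print(simulate("newest"))   # ['C0', 'A0', 'C1', 'B0'] -> B0 keeps waiting
    print(simulate("oldest"))   # ['C0', 'A0', 'B0', 'C1'] -> oldest waiter goes next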





Comments
Robin Shen [31/Jan/19 08:17 AM]
My test shows that QB is respecting the queue order. Here is what I did to simulate your example:

1. Define a resource "X" with only one instance on server
2. Define a configuration "root/test" and edit node selection setting of the master step to run on node with resource "X"
3. Add a sleep step inside "master" to sleep for 30 seconds
4. Create configurations "root/test/A", "root/test/B", and "root/test/C", each inheriting settings from "root/test"
5. Manually run configurations "root/test/C", "root/test/A", and "root/test/B" in that order
6. QB will run "root/test/C" first; "root/test/A" and "root/test/B" will be waiting
7. After "root/test/C" finishes, "root/test/A" will be running. "root/test/B" is still waiting (the same as you've observed)
8. While "root/test/A" is running, request another build of "root/test/C", and "root/test/C" is queued again (still the same as you've observed)
9. After "root/test/A" finishes, QB runs "root/test/B" instead of "root/test/C". This is different from what you've observed

However, if "root/test/C" has a higher build priority than "root/test/B", it will be picked up for execution first; that is the purpose of build priority.
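As an illustration only (this is not QuickBuild's actual code), the dispatch order above is what you would get by sorting queued builds by build priority first and queue time second:

    # Illustrative sketch of the assumed ordering, not QuickBuild internals.
    import heapq

    def dispatch_key(priority, queued_at):
        # heapq pops the smallest item, so negate priority to pop the highest
        # priority first; among equal priorities the earliest queued wins.
        return (-priority, queued_at)

    queue = []
    heapq.heappush(queue, (dispatch_key(priority=5, queued_at=0), "root/test/B"))
    heapq.heappush(queue, (dispatch_key(priority=5, queued_at=1), "root/test/C"))
    print(heapq.heappop(queue)[1])  # root/test/B: same priority, queued earlier

    heapq.heappush(queue, (dispatch_key(priority=7, queued_at=2), "root/test/C"))
    print(heapq.heappop(queue)[1])  # root/test/C: higher priority jumps ahead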

U. Artie Eoff [05/Feb/19 04:44 PM]
Ok, yes I suppose it could work fine in the general case.

To be more specific about our case: we have about 40 configurations competing for resources (all with the same build priority). Each configuration has a parallel step, and we have defined an inner step inside the parallel step. This inner step has a repeat parameter defined with a list of resource names (e.g. X, Y, Z). The node selection of that step uses the repeat parameter to pick the resource (e.g. run on node with resource '${params.get("resource")}'). Each defined resource has anywhere from 4 to 6 instances available.
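Schematically, the fan-out looks roughly like this (made-up names and counts, not our actual configuration):

    # Schematic model of our setup, for illustration only (names are made up;
    # this is not the real QuickBuild configuration).
    RESOURCES = {"X": 4, "Y": 6, "Z": 5}                     # resource name -> instances
    CONFIGURATIONS = [f"config-{i:02d}" for i in range(40)]  # ~40 competing configs

    def inner_steps(configuration):
        # The parallel step repeats the inner step once per resource name; the
        # inner step's node selection then asks for that named resource.
        return [
            {"configuration": configuration,
             "node_selection": f"run on node with resource '{name}'"}
            for name in RESOURCES
        ]

    for config in CONFIGURATIONS[:2]:                        # show the fan-out for two configs
        for step in inner_steps(config):
            print(step)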

I will continue to observe this on my side to see if I can narrow down more details. Otherwise, if you can't reproduce, I'm not sure that there's much you can do about it.

U. Artie Eoff [08/Apr/19 06:47 PM]
Ok, feel free to close this as invalid. I will keep observing on my end to see if I can narrow down additional details to reproduce, as we don't always run into this problem. If I discover anything new, I can open a new ticket. Thanks.